Database Caching and Invalidation using Database Provided Facilities for Query Dependency Analysis Harward; Nathaniel D. ; et al. [TERRACOTTA, INC.]

Database Caching and Invalidation using Database Provided Facilities for Query Dependency Analysis

Harward; Nathaniel D. ; et al.

Patent Application Summary

U.S. patent application number 11/420451 was filed with the patent office on 2006-11-30 for database caching and invalidation using database provided facilities for query dependency analysis. This patent application is currently assigned to TERRACOTTA, INC.. Invention is credited to Andrew R. Geweke, Nathaniel D. Harward, Alexander Voskoboynik.

Application Number	20060271510 11/420451
Document ID	/
Family ID	37452936
Filed Date	2006-11-30

United States Patent Application	20060271510
Kind Code	A1
Harward; Nathaniel D. ; et al.	November 30, 2006

Database Caching and Invalidation using Database Provided Facilities for Query Dependency Analysis

Abstract

Database data is maintained reliably and invalidated based on actual changes to data in the database. Updates or changes to data are detected without parsing queries submitted to the database. The dependencies of a query can be determined by submitting a version of the received query to the database through a native facility provided by the database to analyze how query structures are processed. The caching system can access the results of the facility to determine the tables, rows, or other partitions of data a received query is dependent upon or modifies. An abstracted form of the query can be cached with an indication of the tables, rows, etc. that queries of that structure access or modify. The tables a write or update query modifies can be cached with a time of last modification. When a query is received for which the results are cached, the system can readily determine dependency information for the query, the last time the dependencies were modified, and compare this time with the time indicated for when the cached results were retrieved. By passing versions of write queries to the database, updates to the database can be detected.

Inventors:	Harward; Nathaniel D.; (San Francisco, CA) ; Geweke; Andrew R.; (Berkeley, CA) ; Voskoboynik; Alexander; (Foster City, CA)
Correspondence Address:	VIERRA MAGEN MARCUS & DENIRO LLP 575 MARKET STREET SUITE 2500 SAN FRANCISCO CA 94105 US
Assignee:	TERRACOTTA, INC. 650 Townsend Street Suite 325 San Francisco CA
Family ID:	37452936
Appl. No.:	11/420451
Filed:	May 25, 2006

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
60684610	May 25, 2005

Current U.S. Class:	1/1 ; 707/999.002
Current CPC Class:	G06F 9/526 20130101; G06F 9/445 20130101; G06F 9/544 20130101; G06F 11/1479 20130101; G06F 16/217 20190101; G06F 9/45504 20130101; G06F 9/52 20130101; G06F 16/24552 20190101; G06F 2209/522 20130101; G06F 11/203 20130101
Class at Publication:	707/002
International Class:	G06F 17/30 20060101 G06F017/30

Claims

1. A method of caching database query results, comprising: receiving a request for information from a database; determining if previously received results for said request are maintained locally; if previously received results are not maintained locally for said request: passing a form of said request to said database by calling a native analysis facility at said database to assess how said database processes requests of said form; accessing results of said assessment by said native analysis facility; determining from said results one or more dependencies of said database for requests of said form; maintaining data indicating that requests of said form for said database include said one or more dependencies; passing said request to said database and receiving results; and maintaining information including said request and said results of said request.

2. The method of claim 1, wherein: maintaining said information includes maintaining an indication of a time said results were determined.

3. The method of claim 2, further comprising: if previously received results are being maintained for said request: accessing said data to determine said one or more dependencies; accessing information indicating a time of last update to said one or more dependencies; determining whether said previously received results are valid by comparing said indication of said time said results were determined with said time of last update to said one or more dependencies; returning said results in response to said request if said previously received results are valid.

4. The method of claim 3, further comprising: if said previously received results are determined to be invalid: passing said request to said database and receiving updated results; updating said information maintained for said request to indicate said updated results.

5. The method of claim 1, wherein: said native analysis facility is "EXPLAIN PLAN."

6. The method of claim 1, wherein: said native analysis facility is a debugging facility provided by said database.

7. The method of claim 1, wherein: said native analysis facility is a performance analysis facility provided by said database.

8. The method of claim 1, wherein: said one or more dependencies of said database for requests of said form are one or more tables of said database upon which requests of said form depend for results.

9. The method of claim 1, wherein: said one or more dependencies of said database for requests of said form are one or more rows of one or more tables of said database upon which requests of said form depend for results.

10. The method of claim 1, further comprising, if previously received results are not being maintained for said request: abstracting said request by removing any literal expressions included therein to generate said form of said request; wherein said request is a database query and said form of said request is a skeleton of said database query.

11. The method of claim 1, wherein: receiving said request includes receiving said request at a first caching system; passing said form of said request includes passing said form of said request from a second caching system to said database after receiving at least one of said request and said request at said second caching system from said first caching system; and maintaining said data and maintaining said information include maintaining said data and said information at said second caching system.

12. The method of claim 11, wherein: maintaining said data and maintaining said information include maintaining said data and information at said first caching system and said second caching system.

13. A database caching system, comprising: a database including a native analysis facility to analyze query structures; a caching intermediary in communication with said database and an application accessing said database, said caching intermediary maintains one or more queries and results received from said database in response to said one or more queries, said caching intermediary receives a first query from said application and determines if said first query and previously received results for said first query are maintained by said caching intermediary, said caching intermediary passes a form of said first query to said native analysis facility at said database if said caching intermediary is not maintaining results for said first query and accesses results of said analysis facility to determine one or more tables of said database upon which queries of said form depend, said caching intermediary further maintains said form of said first query and an indication of said one of more tables of said database upon which queries of said form depend.

14. The database caching system of claim 13, wherein: said caching intermediary maintains an indication of dependencies from said database and a time of last modification of each of said dependencies; said caching intermediary receives a second query from said application and after determining that said second query is not maintained at said caching intermediary, passes a form of said second query to said native analysis facility and accesses results of said analysis facility to determine if said second query reads or modifies data at said database, said caching intermediary determines from said results one or more tables of said database modified by queries of said form of said second query if said second query updates data at said database, said caching intermediary passes said second query to said database and receives results in return, said caching intermediary maintains said second query with an indication of said results of said second query and also maintains an indication of said one or more tables of said database modified by queries of said form of said second query with an indication of a time of most recent modification to said one or more tables.

15. The database caching system of claim 14, wherein: said caching intermediary maintains said form of said second query with an indication of said one or more tables modified by queries of said form of said second query.

16. The database caching system of claim 13, wherein: said form of said first query is a structure of said first query generated by removing any literal expressions from said first query.

17. The database caching system of claim 13, wherein: said caching intermediary includes a first caching intermediary at a first processing system and a second caching intermediary at a second processing system.

18. The database caching system of claim 17, wherein: said caching intermediary further includes one or more additional caching intermediaries; said second caching intermediary receives queries received at said first caching intermediary and said one or more additional caching intermediaries.

19. One or more processor readable storage devices having processor readable code embodied on said one or more processor readable storage devices, said processor readable code for programming one or more processors to perform a method comprising: receiving a request for information from a database; determining if previously received results for said request are maintained locally; and if previously received results are not maintained locally for said request: passing a form of said request to said database by calling a native analysis facility at said database to assess how said database processes requests of said form; accessing results of said assessment by said native analysis facility; determining from said results one or more dependencies of said database for requests of said form; maintaining data indicating that requests of said form for said database include said one or more dependencies; passing said request to said database and receiving results; and maintaining information including said request and said results of said request.

Description

PRIORITY CLAIM

[0001] This application claims the benefit of U.S. Provisional Patent Application No. 60/684,610, filed May 25, 2005, entitled "Terracotta Virtualization Server", and incorporated by reference herein in its entirety.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0002] The following applications are cross-referenced and incorporated by reference herein in their entirety:

[0003] U.S. patent application Ser. No. ______, filed concurrently, entitled "Database Caching and Invalidation Based on Detected Database Updates," by Harward et al., filed concurrently (Attorney Docket No. TERA-01008US0); and

[0004] U.S. patent application Ser. No. ______, filed concurrently, entitled "Database Caching and Invalidation for Stored Procedures," by Harward et al., filed concurrently (Attorney Docket No. TERA-01009US0).

BACKGROUND OF THE INVENTION

[0005] 1. Field of the Invention

[0006] The present invention relates generally to databases and caching systems for databases.

[0007] 2. Description of the Related Art

[0008] Databases are an integral part of many operations, businesses and organizations where access to structured data is needed. Databases offer manageability for the vast amounts of data that many of today's large scale enterprises rely upon in day to day operations. Through the use of database storage, seemingly incomprehensible amounts of data can be maintained in a structured environment so that requesting users can easily locate and retrieve needed information. Many of today's large-scale applications such as e-commerce, banking, etc. are able to present end-users with seemingly instant access to personalized information by using databases. Without database storage in many of these environments, the task of methodically sorting through the amount of information necessary to access that which is needed would likely frustrate the ultimate purpose of the application to the point of ending many operations.

[0009] Databases can be implemented in many languages and may support many protocols such as Structured Query Language (SQL) which provides users and applications the ability to interface with the database through SQL statements to select, insert, update, and delete information in the database. Of course, the complexity of operations performed by and submitted to databases continues to expand. SQL statements can be combined to provide a high level of logic and interaction with data in a database.

[0010] As the amount of data maintained by databases continues to expand, the efficiency and performance of database access and response becomes ever more crucial. For instance, a database maintaining banking records for users across the country or world must be able to efficiently respond to requests from numerous users accessing the database at the same time from various remote geographic locations.

[0011] Database caching techniques have been developed to decrease the amount of aggregate traffic transmitted between the database and requesting client devices. By caching database information in smaller faster memories, performance can be increased. Certain access requests will be satisfied from local cached versions of the database. This will not only increase the performance as seen by requesting applications or entities, but also decrease the amount of load placed on the database.

[0012] An inherent difficulty with database caching is the structured nature of both the data storage and retrieval. In traditional data caching scenarios, such as caching recently accessed data from a large memory volume that is indexed according to a physical or logical address, discrete data segments can be cached in a smaller memory and referenced according to their location on the large memory volume. The cache simply looks for an update to the location on the memory volume corresponding to the cached data and when the data at that location changes, the cache can simply invalidate or discard its cached results. In databases, however, data is retrieved by passing queries such as SQL statements. These statements can access, combine, and manipulate data from different locations to satisfy the query request.

[0013] Because of the structured nature of databases, database caches relying on these traditional caching techniques are required to store a subset of an entire database's data structures. Data is cached based on memory locations and entries invalidated based on updates to those memory locations. These systems do not cache results based on an actual query. Because queries may combine or access data from various locations in the database, and request data based on logic that is dependent upon the data stored therein, it is difficult to accurately determine when the previously retrieved results for a query are no longer valid because of changes to data in the database. Complex analysis to parse a received query is thus required to determine what the query does at the database. While in theory such a technique is useful, in practice it presents obstacles. Databases are typically written in a proprietary format and the underlying source code is not made available to those wishing to implement a caching system. Because an intimate knowledge of the database's internal operations and code can not be had, parsing the queries is in many instances deficient or even wrong. Moreover, individual database providers frequently alter their databases, requiring these parsing techniques to continually be updated as the database is updated.

[0014] In other implementations, schedule or time-based caching has been used for databases to overcome the difficulty with parsing queries. Entries in a cache are made for particular queries and the results of those queries. However, the entries are automatically invalidated after a specified amount of time without any regard to actual changes at the database. If changes to the database are made between invalidation periods and a request is received, the cache may return invalid results. Additionally, this technique can degrade performance and efficiency by needlessly invalidating accurate query results.

[0015] Accordingly, an improved system and technique for database caching is needed to provide efficient, reliable, and accurate results.

SUMMARY OF THE INVENTION

[0016] A database caching system and related techniques are provided that can reliably maintain and invalidate database data based on actual changes to data in the database. Updates or changes to data at the database are detected without parsing queries submitted to the database. The dependencies of a received query can be determined by submitting a version of the received query to the database through a native facility provided by the database itself to analyze how query structures are processed at the database. The caching system can access the results of the facility to determine the tables or rows (or other data partition) a received query is dependent upon or modifies. In addition to the results of a query that can be cached with an indication of the query itself, an abstracted form of the query can be cached with an indication of the tables, rows, etc. that queries of that structure access or modify. The tables a write or update query modifies can be cached with a time of their last modification. When a query is received for which the results are cached, the system can readily determine dependency information for the query, the last time the dependencies were modified, and compare this time with the time indicated for when the cached results were retrieved. By passing versions of write queries to the database, updates to the database can be detected. In one embodiment, a component is implemented at or on the system of the database to directly detect changes to the database data. This component can monitor transactional information maintained by the database itself to determine when changes to the database occur. This component can communicate with the cache to provide notification of changes to the database.

[0017] A method of caching database query results is provided in one embodiment that includes receiving a request for information from a database and determining if previously received results for the request are maintained locally. If previously received results are not maintained locally for the request, the method can include passing a form of the request to the database by calling a native analysis facility at the database to assess how the database processes requests of the form, accessing results of the assessment by the native analysis facility, determining from the results one or more dependencies of the database for requests of the form, maintaining data indicating that requests of the form for the database include the one or more dependencies, passing the request to the database and receiving results, and maintaining information including the request and the results of the request.

[0018] In one embodiment, a database caching system is provided that includes a database including a native analysis facility to analyze query structures and a caching intermediary in communication with the database and an application accessing the database. The caching intermediary maintains one or more queries and results received from the database in response to the one or more queries. The caching intermediary receives a first query from the application and determines if the first query and previously received results for the first query are maintained by the caching intermediary. The caching intermediary passes a form of the first query to the native analysis facility at the database if the caching intermediary is not maintaining results for the first query and accesses results of the analysis facility to determine one or more tables of the database upon which queries of the form depend. The caching intermediary further maintains the form of the first query and an indication of the one of more tables of the database upon which queries of the form depend.

BRIEF DESCRIPTION OF THE DRAWINGS

[0019] FIG. 1 is a block diagram of a database caching system in accordance with one embodiment.

[0020] FIG. 2 depicts a database cache in accordance with one embodiment.

[0021] FIG. 3 is a block diagram of a database caching system in accordance with one embodiment, wherein processing and caching database queries is depicted.

[0022] FIG. 4 is a flowchart for processing and caching database queries in accordance with one embodiment.

[0023] FIG. 5 is a block diagram of a database caching system in accordance with one embodiment, wherein the detection of a backend update to the database, and subsequent update to the cache is depicted.

[0024] FIG. 6 is a flowchart for detecting updates to a database and updating a database cache according to the update in accordance with one embodiment.

[0025] FIG. 7 is a flowchart for monitoring updates to a database using a listener component implemented as a debugger for at least a portion of the database.

[0026] FIG. 8 is a block diagram of a database caching system in accordance with one embodiment, wherein processing and caching of database stored procedures is depicted.

[0027] FIG. 9 is a flowchart for processing and caching database stored procedures in accordance with one embodiment.

[0028] FIG. 10 is a flowchart for caching database queries and invalidating the results of the database queries based on row level updates to the database.

[0029] FIG. 11 is a block diagram of a multi-level database caching system in accordance with one embodiment.

DETAILED DESCRIPTION

Database Caching Intermediary

[0030] FIG. 1 is a block diagram depicting one embodiment of the technology described herein executing in an application server--database server environment. FIG. 1 depicts an application server 102 and database server 122. A variety of applications 104 can be hosted and executed upon application server 102. Applications 104 can be implemented in any suitable language, including but not limited to the Java.TM. programming language, C++, etc. The application servers on which applications 104 are executed will vary by embodiment, but can support such platforms as the Java.TM. 2 Platform, Enterprise Edition (J2EE) Platform.

[0031] In one embodiment, servers, processing systems, and other processor based devices as described can include, by way of non-limiting example, one or more processors capable of executing instructions provided on processor-readable media to perform the methods and tasks described herein. The processing system may include a volatile memory, a mass storage device or other non-volatile memory, a portable storage device, one or more communications or network interfaces and various I/O devices. A processing system hardware architecture as described is just one suitable example of a processing system suitable for implementing the present technology. While embodiments are presented with respect to exemplary server-based implementations, variations for stand-alone, client, and other suitable processing environments can be made without departing from the scope of the present disclosure.

[0032] The database server 122 hosts a database application 120. Database server 122 includes an operating system 124 and a non-volatile storage device 126 (such as a disk drive or RAID array). Numerous types of databases 120 can be used in accordance with embodiments of the disclosed technology. For example, database 120 can include a database provided by Oracle.TM. Corporation of Sunnyvale, Calif., a database provided by Sybase.TM. Corporation of Dublin, Calif., or a database provided by Microsoft.TM. Corporation of Redmond, Wash. The technology described herein is not limited to any particular type of database application. References to particular types and brands of databases and the particular functions of a type of database are presented for exemplary purposes and are not intended to limit the scope of the present disclosure or claimed subject matter. The extension of an embodied concept from an exemplary described database to a different type of database will be readily apparent to those of ordinary skill in the art. It is further noted that the disclosed server based implementations are exemplary and other implementations can be made on client devices hosting the database and/or caching system, for example. Moreover, multiple levels of caching in accordance with embodiments can be provided at one or more servers in addition to those indicated in FIG. 1. For example, a caching system implemented at a central server could interface between the components of database server 122 and those of multiple application servers. Cached results can be maintained and retrieved from the central server in addition to or in place of maintaining and receiving them at an intermediary cache at the application server level.

[0033] Interface 106 is provided at application server 102 to enable applications 104 access to database 120. Interface 106 will vary by implementation according to the type of programming platform and/or database provided. Interface 106 is provided in accordance with an Application Programming Interface (API) that provides connectivity between the programming language(s) and platforms associated with applications 104 and database 120. In many modern database implementations, interface 106 is implemented in accordance with the Java.TM. Database Connectivity (JDBC) API, an industry standard interface that provides database independent connectivity between the Java.TM. programming language and a wide range of databases. Interface 106 is a JDBC driver in one embodiment, implementing the JDBC API to provide a call level API for SQL-based database access. The JDBC driver can provide transparent access for applications 104 to database 120.

[0034] A database caching intermediary 108 is provided at application server 102 to increase the performance, reliability and availability of the information maintained by database 120 for requesting applications 104. Cache 108 can provide previously received results for queries or other requests/calls submitted to database 120 that are maintained locally at application server 102 to increase the performance and availability of database 120 while decreasing the response time for requests. Applications 104 request data from database 120 by submitting queries or other calls. For example, applications can submit commands formatted in accordance with the specifications of interface 106 (e.g., as JDBC commands) to provide SQL commands to database 120. Intermediary 108 can more quickly provide the results of a query if valid results are currently stored in the cache. Moreover, intermediary 108 can increase database availability by providing cached results when database 120 is inaccessible.

[0035] The structure of caching intermediary 108 may vary by implementation according to the requirements of a particular application. In the embodiment depicted in FIG. 1, caching intermediary 108 includes an intermediary interface or driver 112 and a cache 110. Interface 112 can provide and execute many of the disclosed operations while cache 110 maintains portions of the data from database 120 and internal data used by the caching intermediary 108 that can maintain portions of database 120 in local memory. Interface 112 conceptually wraps itself around native interface 106 to provide transparent intermediation between applications 104 and database 120. Applications 104 can communicate with interface 112 in the same manner they would with the native database interface. To requesting applications 104, interface 112 will appear the same as the native interface 106. By wrapping around the native interface, the intermediary interface forces applications 104 to communicate with the intermediary, instead of directly accessing interface 106.

[0036] In one embodiment, interface 112 is a HA-JDBC High Availability Java.TM. Database Connectivity (HA-JDBC) driver. Interface 112 will appear as a standard JDBC driver and thus provide a transparent solution to applications 104 for caching query results. Intermediary 108 can be installed and integrated into an existing system without modifying any application source or object code and without modifying any database source or object code. Applications 104 continue to access database 120 in the typical manner, appearing to interface directly with the native database driver 106, while in fact, interfacing with the intermediary driver 112 provided in accordance with one embodiment.

[0037] Caching intermediary 108 provides cached results that are maintained current and invalidated where appropriate based on updates to database 120 or events processed by database 120. Changed data within the database is the impetus for determining that cached database results are no longer valid. This contrasts with traditional database caches that operate as scheduled or time-based caches. For example, traditional schedule-based caches invalidate cached results based on a set schedule or time interval. Entries stored in the cache expire or are invalidated periodically based solely on an elapsed period of time. Invalidations occur in response to the mere passage of time without regard to any actual changes at the database. By using such a technique, these systems stand not only to return invalid data to requesting applications, but also to needlessly invalidate valid cached data and decrease performance.

Depending Analysis Using Native Database Facilities

[0038] Intermediary 108 can detect actual changes to database 120 and invalidate and/or update entries in cache 110 based on these updates rather than an elapsed period of time according to a predetermined schedule. In one embodiment, changes to database 120 are detected through queries submitted by applications 104 and received at the caching intermediary. In one embodiment, caching intermediary 108 is able to detect changes to database 120 made in response to queries without parsing the submitted queries. In one embodiment, the caching intermediary 108 calls a native analysis facility provided by and inherent to the database to analyze queries received from requesting applications. The analysis facility provides information regarding how a query submitted for analysis is processed by the database. Caching intermediary 108 can determine details of how the database processes the query, including the tables or other data partitions the database accesses to fulfill the requests.

[0039] It is important to note that the analysis facility called by caching intermediary 108 is a native facility or process provided by the database. This enables the intermediary to determine which tables of a database are accessed in responding to a query. Importantly, the intermediary is able to determine this information without parsing the actual query. Databases typically provide debugging and performance analysis facilities as native database procedures available to end-users. These facilities are traditionally used by database administrators to debug queries or to increase the performance associated with a particular query. An administrator may view how a query is processed and modify the query structure based on this review to maximize performance. Caching intermediary 108 can exploit one or more of these native facilities to determine the dependencies of queries it receives.

[0040] The use of native database facilities to determine this information is a strong departure from traditional caching techniques that rely on parsing queries. For example, intermediary 108 may call an "EXPLAIN PLAN" facility to determine the dependencies of a query submitted to an Oracle.TM. database. An abstracted or skeleton version of the query can be issued in a call to this facility to determine how such a query is processed and this, the tables accessed. Numerous types of native database facilities may be used in various embodiments depending on the type of database platform, programming language, or server platform.

[0041] The use of native database facilities to determine dependency and other information for a query is a significant and strong departure from traditional caching techniques that rely on query parsing, for example. To provide cached results based on changes to data in a database, previous techniques have relied on parsing the actual query received from an application to determine query dependencies, etc. These systems include an inherent limitation that it is difficult if not impossible to precisely parse a query without knowledge of the relevant database source code. Any cache provided by an entity without intimate knowledge of the database code will not be able to fully understand and parse queries designed for that database. The technology described herein avoids these inherent limitations by relying on the database itself to assess and provide the necessary information for caching to the intermediary.

[0042] FIG. 2 depicts an exemplary implementation of intermediary cache 110 in accordance with one embodiment. In FIG. 2, cache 110 includes three individual tables to facilitate caching and cache invalidation. A results table 204 maintains a list 210 of queries and the results 212 of those queries as retrieved from the database. Table 204 further includes an indication 214 of the time at which the query was processed and the results obtained from the database. In one embodiment, the results are time-stamped and the third column is not used.

[0043] Dependency table 206 maintains a list of query structures, also referred to as skeletons or footprints, along with a list of the tables from database 120 upon which queries of that structure are dependent. A query skeleton is a form of an actual query without any variable data or literal expressions. Literal expressions refer to the individual variables within a query for which input data is passed to the database to retrieve unique records. For example, a simple query (shown as entry 216 in table 204) that selects a record from an employee table where the user_id column is 12 will have a skeleton as shown by entry 222 in table 206. The skeleton shown is a form of the original query with the user_id literal expression "12" replaced with ":v." All queries having the same skeleton will be processed in a similar manner and retrieve results from the same tables in the database. Any query having the structure of entry 222 is dependent upon the employee table. Replacing the literal expressions with ":v" is but one example of abstracting a query. Any suitable place holder can be used and in one embodiment, the literal sections are simply "marked off" using some external method (e.g., recording the indices of the start and end characters of each literal section). Modification table 208 tracks modifications or updates to the tables in database 120. Table 208 lists table from database 102 and the time at which the table was last updated. Using the tables of cache 110, the caching intermediary can determine if valid results are cached for a particular query. If results are listed in table 204 for a received query, table 206 is accessed to determine what tables the query is dependent upon. Table 208 is then accessed to determine when those tables were last modified. If the time of the last modification is before the time the results were obtained (table 204), the results are determined to be valid.

[0044] A listener component 128 is also provided to detect changes to data maintained by database 120. This component provides the additional benefit of being able to detect so-called backend updates to database 120 that do not pass through intermediary 108. For example, database 120 can be modified by a database administrator or other entity not in communication with intermediary 108. Listener component 128 can interface directly with the database at the database server to detect any backdoor updates that originate from beyond the reach or control of the caching intermediary.

[0045] In accordance with one embodiment, invalidations of cached database results are based on actual changes to database data. Updates to the database trigger the invalidation of cache entries within a caching intermediary. A more efficient cache is provided by avoiding unnecessary invalidations of cached results, as inherent in time-based caches. Furthermore, invalidations based on changed data as determined from the database provides consistency between data in the cache and data in the cache and database. Moreover, the reliance on the database to analyze queries and determine dependency information avoids the limitations associated with parsing queries submitted to a database for which the underlying code and operation is not well understood.

[0046] FIG. 3 is a simplified block diagram depicting the communication flow of a database caching system in accordance with one embodiment. Application 104 (e.g., Application A) at application server 102 communicates with caching intermediary 108 to access database 120, implemented at database server 122. Application 104 issues query 140 for execution by database 120. Query 140 is received at caching intermediary 108. When query 140 is received at the intermediary, the interface 112 accesses intermediary cache 110 to determine if the intermediary is maintaining valid results for query 140, retrieved from the database in response to a previous request including query 140.

[0047] Interface 112 accesses cache 110 as indicated at 141 and determines whether results for query 140 are currently maintained locally at the cache. If the results are currently stored--in results table 204 for example--the intermediary accesses a dependency table 206 at cache 110 to determine which tables at the database queries of the received query's structure are dependent upon. Table 208 is then accessed to determine the last time each table query 140 is dependent upon was last updated. The time the results stored in results table 204 were obtained is compared with the last time each dependent table was updated to determine whether the cached results were retrieved from the database after the last update to any dependent table of the query. If the results were retrieved after the last update the cached results are returned as indicated at 138.

[0048] If the results were retrieved before the last update or if no results are maintained, the database is accessed to process query 140. Interface 112 abstracts the query by removing the literal expressions from the query string to create a query skeleton 142 having the structure of query 140. After abstracting the query, intermediary interface 112 passes the skeleton 142 to the native database interface which forwards it to database 120. In one embodiment, interface 112 checks cache 110 first to determine if the dependency analysis information for the abstracted query is already stored. If so, the abstracted query is not passed.

[0049] If the query structure and dependencies of query 140 are not already stored, interface 112 passes query structure 142 by calling a native analysis or tuning facility 162 provided by database 120. The term native indicates that the facility or process called at the database is a process provided by and originating from the database itself, not caching intermediary 108. Databases typically provide various functionality for performance analysis, tuning, and debugging, etc. Debugging facilities can provide information to database administrators (DBAs) and developers about how a query will be processed by the database. The facilities may analyze a query structure and specify which tables are accessed to process queries of that structure, how or in what order the tables are accessed, how the data from the tables are combined, and other types of information about how the database processes a query having the structure that was passed when calling the analysis facility. These facilities provide information to DBAs, etc. so that queries can be restructured or modified to optimize processing at the database. In one embodiment for example, an Oracle.TM. database provides a debugging facility commonly referred to as "EXPLAIN PLAN." In response to a call to the "EXPLAIN PLAN" facility passing a selected query structure for analysis, the database will analyze how the database processes that particular query structure.

[0050] When database 120 receives the abstracted query skeleton 142, it will execute the facility called by the intermediary when passing the skeleton. After analyzing skeleton 142 to determine how queries having the particular structure are processed, intermediary interface 112 accesses the results in order to determine the tables upon which queries of structure 142 are dependent. Different types of databases make these results available in different formats. For example, database 120 may create a table, file, or other data structure 150 with the results of its analysis and store it locally on disk 126, for example. Interface 112 can access the results from disk to determine the dependencies for query structure 142. An exemplary communication 144 from intermediary 112 to listener 128 instructing the listener to read results 150 from disk 126 is depicted in FIG. 3. After reading the data from disk 126, listener 128 provides the requested information to interface 112 as depicted by reply 145. In other embodiments, a database may provide the data from an analysis by such a facility directly to the requesting application such as interface 112.

[0051] Numerous types of analysis facilities are provided by databases that can be accessed in accordance with embodiments to determine query structures' dependency information. Embodiments of the technology described herein are not limited to any particular analysis facility provided by any particular database. It is important to note that a cache in accordance with one embodiment can call a native database analysis facility without having any predetermined knowledge of the query or the database protocols to be able to parse the query itself. This enables the cache to be implemented transparently to both requesting applications and databases. Importantly, the cache does not parse the query received from the requesting application. This enables the cache to interface with databases having different protocols, platforms, programming languages, and interfaces. Because the system calls a native facility at the database, it does not need to fully comprehend or be able to parse the queries it receives. The native database facility is relied upon for the required dependency analysis.

[0052] It is also important to note that these native analysis facilities such as "EXPLAIN PLAN" are not intended or designed for use in caching operations as presently described. These facilities are typically provided for database administrators to manually issue queries to the database to analyze the performance of the database when executing a particular query. These tools are valuable to database administrators because they can design and restructure queries to maximize efficiency and performance by the database. By taking advantage of such facilities for caching query results, the complicated task of parsing queries is avoided while improving performance be reducing or eliminating parsing errors. Because of their intended application, these analysis facilities typically provide large amounts of information in addition to indicating the tables the database accesses in response to a particular query structure. In one embodiment, listener 128 filters the information received from the database in response to the query analysis to determine the tables accessed when executing the query.

[0053] After receiving dependency information, intermediary 108 passes the actual query 140 to database 120. Database 120 executes the query and returns results 148. If intermediary 108 determines that query 140 is a read query (as can be determined from the query structure analysis), a new entry is created in a dependency table listing skeleton 142 (if an entry does not already exist) and the dependencies determined by the database analysis facility. Intermediary 108 also creates an entry in results table 204 for query 140, specifying the query 140, the results of the query provided by database 120, and the time at which the results were obtained. FIG. 3 conceptually depicts an update 143 issued from interface 112 to cache 110 specifying the data to be stored in the results and dependency tables. Processing of the query is complete when results 148 are provided to application 104. It should be noted that the query and skeleton can be passed to the database in any order and the cache may determine the dependency information after receiving the actual query results.

[0054] If intermediary 108 determines that query 140 is a write query, it updates or creates an entry in a modification table listing the tables effected by such a write query and the time at which the write query modified the tables. For example, if the write query modifies an employee table, and an entry already exists in modification table 208 for this table, the time at which that table is indicated to have last been modified is updated to reflect the new modification time. After updating or creating entries in the modification table, an indication is returned to application 104 that the query has been successfully processed.

[0055] In one embodiment, an entry is included in dependency table 206 for write query structures and the tables modified by those write query structures. A database analyzing a write query structure will provide information regarding the tables updated or modified by that query structure. When a query is subsequently received, the dependency table can be accessed to determine if dependency information for that query's structure is already stored. If so, the actual query can be submitted to the database without passing the query structure to the database to determine the dependent tables. When the results are received, the modification table can be directly updated. In one embodiment, the dependency table indicates whether a query is a write or read query and thus, whether the listed tables are modified by the query or whether the query is dependent upon those tables.

[0056] In an alternate embodiment, entries in table 204 affected by a write query are directly invalidated instead of updating times of last modification in table 208 that will be used later to determine if cached results are still valid. For example, each entry from table 204 can be read and the results of any queries that are dependent upon an updated table immediately invalidated. In one embodiment, the tables each query listed in the results table 204 is dependent upon is determined from dependency table 206. Intermediary 108 invalidates any results from the results table that depend upon an updated table. While such embodiments are possible, efficiency may decrease if numerous entries are listed in table 204. By maintaining a list of the last update times for each table, better efficiency can be achieved by invalidating results only when a subsequent query corresponding to those results is received to avoid reading every entry from table 204 in response to any update to the database.

[0057] FIG. 4 is a flowchart of a method for processing the database queries and caching results thereof in accordance with one embodiment. A caching system, such as intermediary 108 executing at application server 102, receives a query from an application at step 302. A results table is accessed at step 304 to determine whether an entry exists for the received query. If results are not being maintained for the received query, the query is abstracted at step 306 by removing any literal expressions or variable data.

[0058] After abstracting the query, a dependency table maintained by the caching system is accessed at step 308 to determine if an entry exists for the structure of the abstracted query. If an entry exists, the tables the query is dependent upon (read query) or the tables the query modifies (write query) are determined at step 310. After determining the tables at step 310, the unabstracted query is passed to the database at step 312 for processing. After processing the query, the database returns the results to the cache.

[0059] If the abstracted query structure is not stored in the dependency table, the abstracted query skeleton is passed to the database at step 314 by calling a native analysis facility provided by the database. The query is passed onto the database and the results received at step 316. The results of the abstracted query analysis by the database are accessed at step 318. From the results accessed at step 318 or step 310, the cache determines whether the query was a write or read query at step 320. If the query is a read query, an entry is created in the dependency table at step 322 setting forth the query structure and the table(s) the structure is dependent upon. If an entry already exists as determined at step 308, step 322 is skipped. An entry is created in results table 204 at step 324 listing the actual query and the results received from the database upon passing the query at step 312 or 316. These results are stored with an indication of the time they were received to facilitate subsequent data validity determinations. The results of the query are returned to the requesting application at step 326.

[0060] Returning to step 320, if the received query is determined to be a write query, an entry is created in the dependency table at step 328 setting forth the query structure and the table(s) queries of that structure modify. Any entries in modification table 208 that correspond to tables affected by the received query are updated at step 330 to reflect the time of the latest update by the query. If no entry or entries exists in table 208 for the table(s) updated by the received query, new entries are created. After updating and/or creating the entries at step 330, a response is returned to the requesting application at step 332 indicating that the query has been successfully processed.

[0061] If it is determined at step 304 that an entry in results table 204 for the received query already exists, table 206 is accessed at step 334 to determine what tables the query is dependent upon after abstracting the query. After determining the dependent tables, modification table 208 is accessed at step 336 to determine the last time each dependent table was last updated. If any dependent table was updated after the time of retrieval for the query results as reflected in the time stamp at the results table, the results are determined to be invalid at step 338. The actual query is then passed to the database at step 340 and the results of the query received. Results table 204 is updated at step 342 to reflect the new results for the query that were received from the database. The cache also adjusts the time for the query in the results table to reflect the time the new results were retrieved.

[0062] The results of the query are returned to the requesting application at step 344 from the cache after determining that the query results are valid at step 336. Otherwise, the results received at step 338 from the database are returned at step 344.

Database Listener

[0063] FIG. 2 illustrates the detection of updates to a database directly by a caching intermediary in response to a received query and structural analysis provided by the database for the query. It is also possible for updates to a database to be made without a caching intermediary detecting them. For example, an application accessing the database from a location outside of application server 102 may send queries that do not pass through the caching intermediary. Other applications implemented on different application servers with their own communication links to database server 122 may send updates that do not pass through the intermediary. Database administrators or other users may directly modify the database through an administration interface provided by the database. Such updates may be referred to generally as backend updates.

[0064] FIG. 5 illustrates a backend update 166 to database 120 originating from an entity beyond application server 102. Update 166 is received directly at database server 120 and does not pass through the caching intermediary 108 at application server 102. Although a backend update is described for exemplary purposes, it is noted that updates that pass through the caching intermediary may also be detected through the hereinafter explained techniques, in addition to being directly detected by the intermediary.

[0065] As understood by those of ordinary skill in the art, databases typically maintain portions of user data in local memory for performance considerations. Databases may aggregate updates to the database over a period of time and write them all to disk at once in the background when it has ample bandwidth and resources to do so. Databases typically maintain a log or other identification reflecting all updates to the database in order to track the updates that have propagated to disk and those that are currently only maintained in local memory. A query that updates one or more tables in database 120 can be processed, and the updated results maintained in a memory local to the database such as a software memory or a fast volatile memory such as RAM. Subsequently received queries that reference this data before it is written to the database tables in non-volatile memory are processed using the updated data in local memory. Database 120 may accumulate numerous updates to the database from various queries before writing updated data to the tables or other data structures stored on disk 126 or other non-volatile memory.

[0066] The log maintained by a database, often referred to as a transaction log, is continuously updated and written to a non-volatile memory such as disk 126. If power loss occurs, causing data maintained in local memory at the database to be lost, the transaction log enables the database to "replay" any lost transactions to recover data not written to disk before the power loss. Upon reboot, the database can read the transaction log, compare it with the data stored on disk, and if any updates reflected in the log have not yet propagated to disk, update the appropriate tables on disk in accordance with the information in the transaction log.

[0067] In FIG. 5, a transaction log 152 is maintained by database 120 and stored on disk 126. Listener 128 can utilize this transaction log to detect all changes to database 120 regardless of origin. When database 120 receives a query, such as query 166, specifying an update to one or more tables, the database calls the operating system to write information relating to the query to the transaction log 152 at disk 126. Database 120 causes the transaction log to be updated, in addition to making the changes specified by the query in local memory. Listener 128 can access the updated transaction log 152 on disk 126 to determine if changes were made to database data. When listener 128 detects an update to the database, it notifies caching intermediary 108 that an update has been made as shown by communication 188. The intermediary can update a modification table to reflect the time at which the tables corresponding to any previously listed entries were updated, and to add new entries and modification times for any updated tables not already listed. In one embodiment where cached results are invalidated immediately upon detecting an update and a table such as table 208 is not used, intermediary 108 can update a results table 204 by invalidating or deleting those entries dependent upon an updated table.

[0068] FIG. 6 is a flowchart depicting one embodiment for updating a cache based on detecting changes to data at the database. In FIG. 6, a database server-side component monitors changes to a database transaction log to detect and report changes to database data to a caching intermediary remote from the database server.

[0069] A transaction log or other indication of transactional updates to the database is accessed at step 352. A server-side listener component 128 reads the transaction log and at step 354 compares the most recent data from the transaction log with data read from a previous version of the transaction log. If the listener detects changes to the transaction log, it makes a determination at step 356 that the database data has been modified. If no changes to the transaction log are detected, the listener determines that the database data has not been modified. In one embodiment, selected tables of the database are monitored while others are ignored. Listener 128 can be configured to determine that a change to monitored database data has occurred when a change to the transaction by reflecting an update to one of the monitored tables if detected. In one embodiment, listener 128 detects all changes to the log and database but only reports changes to specified tables.

[0070] When an update to database data, or an update to selectively monitored data is detected, the listener formulates an update to be sent to the caching intermediary at the application server. An update is sent to the caching intermediary at the application server at step 358. The intermediary will update its local cache results and tables at step 360 to reflect the changes to the database. For example, an indication of the time a table was last modified as maintained in a modification table 208 can be updated to reflect the time of this latest update. After the local cache is updated, the caching intermediary can optionally send a reply at step 362 to the listener indicating that the necessary updates have been completed. After receiving the success response at step 362 or determining that no monitored tables have been updated at step 356, the listener will read the transaction log again at step 352 to determine if more updates have been made. In one embodiment, a delay is implemented before accessing the transaction log again at step 352. Additionally, the listener may not wait for a success response from the cache before accessing the transaction log again after sending an update at step 358.

[0071] In one exemplary embodiment, listener 128 is configured as a debugger for a native facility or process associated with database 120. As a debugger for a process executed by the database, listener 128, via an operating system 124, can gain access to and control operations performed by database 120.

[0072] Databases typically provide a facility or process that is responsible for writing data to transaction log 152 in accordance with updates to the database. This type of facility is commonly referred to as a log writer process for the database. By disguising itself as a debugger for a database's log writer process, listener 128 can be notified when transaction log 152 is updated, access the transaction log or information stored therein, and notify the cache of any relevant updates.

[0073] Referring again to FIG. 5, the exemplary embodiment presented therein includes a listener 128 configured as a debugger for a native log writer process 154 of database 120. Listener 128 attaches itself to the database through log writer 154 in such a manner to be notified of successful write operations to disk 126 that update or create transaction log 152.

[0074] Database 120 receives query 166, specifying an update to one or more tables of the database. The database initially updates any data it is storing in local memory to reflect the changes made by query 166. Database 120 also updates any copy of transaction log 152 it is maintaining in local memory to reflect the updates.

[0075] Log writer process 154 is responsible for updating and writing transaction log 152 to disk 126 when updates to the database are made. Database 120, via logwriter process 154, sends a write command 182 to update the file for transaction log 152 stored on disk 126 in order to reflect the updates made by query 166. Write command 182 is received by operating system 124, which forwards the write command 182 to disk 126, typically via a disk driver (file system) for the disk. When disk 126 has successfully written the data for transaction log 152, it sends a response 184 to operating system 124 indicating that the data was successfully written.

[0076] As a debugger for log writer process 154, listener 128 instructs operating system 124 to notify it when updates are made to transaction log 152 by log writer process 154. In FIG. 5, listener 128 is configured to be notified when operating system 124 receives a write request for transaction log 152 from log writer process 154. When operating system 124 receives write request 182, it notifies listener 128 at 186 that a write request for the transaction log has been received. Operating system 124 passes the new or updated data written to transaction log 152 in response to update 166 to listener 128, as indicated at 186. In another embodiment, the operating system passes the full transaction log to listener 128. Listener 128 parses the information received from operating system 124 to determine the nature of the database changes. If the listener determines that the updates made by query 166 effect one or more tables of database 120 that it is configured to monitor, a notification 188 is provided to caching intermediary 108 so that it can invalidate any query results that are dependent on the updated table(s). In one embodiment, the intermediary updates information maintained locally to indicate the time at which the tables were modified, by modifying or creating new entries in a modification table such as table 208. In one embodiment, listener 128 is configured to be notified when a write operation for transaction log 152 has been successfully completed. When operating system 124 receives the success response from disk 126, it can notify listener 128 at 186 that a successful write operation to the transaction log has been completed.

[0077] When operating system 124 receives success response 184 from disk 126, it will also reply to database 120 with success response 190 informing it that the transaction log was successfully updated. Log writer process 154 receives response 190 and can issue a response to the requesting application that its update was successfully processed.

[0078] It is possible in a configuration like that of FIG. 5 that an update from listener 128 to intermediary 108 may not be processed in sufficient time to avoid returning old or no longer valid data to requesting applications. Typically, databases will temporarily stop processing updates to database data while the transaction log is writing the log to disk. Databases my further refrain from responding to read queries with updated data until that data has been successfully written to the log file maintained in non-volatile memory. This avoids data inconsistencies should the data in local memory be lost after it has been provided in response to a read request but before it is written to non-volatile memory. Log writer process 154 can pause database 120 from further write and/or read query processing while it is writing the transaction log. When success response 190 is received, log writer process 154 knows the updated data for the database is now maintained in non-volatile memory. Accordingly, the log writer process will free the database 120 to continue processing queries.

[0079] Consider a situation where listener 128 and database 120 receive responses 186 and 190, respectively, at the same time. Log writer process 154 will process success response 190 and then release the database to use and respond with the newly updated data as well as process new queries. When intermediary 108 receives update 188 from listener 128, it updates its cached data as required to invalidate any effected result entries in results table 204. If the intermediary updates its records after database 120 is released to begin responding and using the new data from query 166, intermediary 108 may return old data in response to a query. Imagine that a read query is received by intermediary 108 after database 120 is released to use and respond with the new data from query 166, but before the intermediary updates its records to reflect that the relevant old entries are no longer valid. The intermediary may return the cached results from its local memory to the requesting application since it is unaware of the changes made to the database by query 150.

[0080] FIG. 7 is a flowchart of a method in accordance with one embodiment for detecting changes to a database using a listener configured as a debugger for a native database facility responsible for updating transactional information to document changes to the database. In the embodiment of FIG. 7, a technique is implemented to ensure that a caching intermediary is updated in response to changes to the database before the database is released to begin using and responding to requests with the new data. The listener presents itself as a debugger to the operating system at the database server. As understood by those of ordinary skill, debuggers can gain access and control over select portions of code, applications, or entire systems. By implementing the listener as a debugger for a portion of the database, the listener can take control of that portion to receive information regarding updates at the database from the database itself. Because the listener is disguised as a debugger, the operating system is instructed to notify the listener of select actions and allow the listener to take control of select actions to receive this updated information. The listener can thus take control of a portion of the database to ensure that the caching intermediary is updated before the database begins using updated data. While such an implementation is broadly and generally described as a debugger, it will be understood that the listener can gain access and control of portions of the database using other suitable techniques, such as through various instrumentations of the database source or object code.

[0081] The database receives an update to data stored in the database (e.g., write query) at step 370. The database updates its local memory at step 372 to reflect the changes made by the write query. The database updates its transaction log in local memory at step 374 to reflect the updates made by the query. The database sends a command to the operating system at step 376 to write the new information to the transaction log stored on a local non-volatile storage medium disk. In one embodiment, a log writer process is responsible for writing the transaction log to disk and issues a write command to the operating system at step 376.

[0082] At step 378, the operating system issues a write command to update the transaction log at the disk responsible for storing the transaction log. The write command, which includes the new information from the query, is received at the listener component at step 378. The listener component is configured as a debugger for the log writer process so that the operating system forwards the write request to it for processing. The listener presents itself as a debugger in order to gain access and control of the log writer process to receive the transaction log updates. The listener forwards the write request to the disk at step 382 and receives a response that the data was successfully written at step 380.

[0083] The listener is further configured to pause the log writer process at step 834 when a success response is received from the disk indicating an update to the transaction log. In one embodiment, listener 128 pauses the log writer process by instructing the operating system not to return the success response to the database. The listener temporarily pauses the log writer process to insure that the caching intermediary at the application server has updated its local cache results updated in accordance with the write or update query. Database 120 will not use the new data provided by the query until it receives a notification that the transaction log has successfully been written to disk.

[0084] The listener parses the information received from the operating system at step 386 after the log writer process has been paused. The listener determines the tables modified by the query. If tables the listener was configured to monitor were updated, an update is sent at step 388 to the caching intermediary at the application server. At step 390, the caching intermediary updates its locally stored information (e.g., modification table 208) to reflect the time at which the tables were modified by the query.

[0085] After the caching intermediary successfully updates its data, it provides a success response to the listener at step 392. Upon receipt of the response, the listener instructs the operating system at step 394 to release the log writer process. At step 396, a success response indicating that the transaction log was successfully written is issued to the log writer process. The log writer process releases database 120 to use the new data from the query at step 398. In this manner, the database will only use new data after that data has propagated to the caching intermediary. Thus, each cache can remain in a consistent and accurate data state with respect to the actual database.

[0086] In one embodiment, the listener is configured to be notified of updates to the transaction log before they are written to disk. For example, the listener component can be notified of updates by the operating system when a write command is issued to the disk to update the transaction log. If updates are made to tables which the listener is monitoring, the listener begins updating the caches at the application level prior to the disk handling the write command. It is possible that the transaction log will not be successfully written to disk and thus, the database not updated with the new data. The caching intermediary may invalidate cached results that are still valid as a result. While invalidations may occur more often than necessary in this scenario, no data consistency issues will exist. The intermediary will simply provide queries to the database for which it was maintaining valid results in its cache.

[0087] The listener is not limited to the particular structure of any disclosed arrangement and can be tailored to the requirements of individual implementations. In one embodiment, the listener is configured as a file system driver for the disk on which the database transaction log is to be stored. After the database receives an update to the database data, it issues a command to the operating system to write the new information to the transaction log on disk. The operating system issues this write command to an interface for the disk adapted to translate internal addresses used by the operating system to addresses understood by the disk. The listener can disguise itself as the disk interface or driver for the disk storing the transaction log in order to intercept commands issued to the disk via the disk interface.

[0088] The listener will receive the command and forward it to the actual file system driver for the disk. The file system will pass the command to the disk and receive a success response in return when the data is successfully written. The listener can be configured to access the transaction log or the data passed in the write command to be written to the transaction log to defect updates to database data. In one embodiment, the listener accesses the data written to the transaction log when the write command is issued to the file system for the disk. In another embodiment, the listener accesses this information when a success response is received from the disk after the data has been written. In either case, the listener parses the information it receives and provides an update to the caching intermediary if necessary.

[0089] Similarly to the debugger configuration, the listener can pass the success response received from the disk to the operating system immediately upon receipt. The operating system will send a success response to the database, freeing the database to use the upload information. This scenario can lead to data inconsistencies between the cache and database as previously described.

[0090] In another embodiment, the listener pauses the log writer process after receiving the success response from disk. The listener will wait until it receives a success response from the caching intermediary before issuing the success response to the operating system. By pausing the log writer process until the cache is updated, the listener ensures data consistency as already described.

Caching Stored Procedures and Invalidating Results

[0091] Stored procedures represent an increased level of complexity for caching systems associated with databases. Stored procedures define a set of one or more operations and are stored at a database in a compiled format to be called by requesting applications to receive the results of the operations. Stored procedures typically include a set of SQL statements. Applications or developers wishing to access a stored procedure simply need to know the stored procedure's name, rather than schema, indexing, and column information required to develop a query. In addition to SQL statements, stored procedures can include operational code. Stored procedures can execute a set of code, call other databases to access or modify records, directly access other files, and/or access and transmit other information, such as by calling other applications. Stored procedures are valuable, if not essential, components of many database installations. They provide centralized query management for institutions, grouping operations and call commands for increased security and to avoid errors, etc. when writing individual queries and application code. Stored procedures improve database performance by reducing the total traffic between applications and the database. An application can call a stored procedure once to execute numerous queries and receive a single set of results, rather than issue each individual query and receive a result for each issued query. The pre-compiled nature of stored procedures provides significant improvements in performance when compared to traditional query statements. Stored procedures are compiled just once by the database which reuses the compiled process whenever the procedure is called. Queries on the other hand, are compiled each time the query is executed.

[0092] Caching techniques that rely on parsing queries received from requesting applications are unable to handle stored procedures. A stored procedure is only definable and understood by the database on which it is stored. For example, a typical stored procedure as seen by an application or application server will simply be a textual invocation of the procedure such as foo( ). Any application or caching system interfacing with the database does not know the internal operations defined by foo( ), and thus, can only see this textual call or issue similar calls to access the procedure. Parsing such a call to a stored procedure will not reveal anything about the individual queries the stored procedure references or the commands it executes at the database.

[0093] In order to run stored procedures, databases typically perform an analysis before run-time to assess how the stored procedure is processed. For example, the database may compile the stored procedure, examining the structure of the procedure, the individual queries the database tables accessed, in what manner and order the tables are accessed, as well as the various other operations that the database may perform as part of the stored procedure. The database may maintain this information or compiled process locally to increase performance when a request for this stored procedure is received. In this manner, the database does not have to determine this information each time the stored procedure is called. Conceptually, the preparatory work by the database for a stored procedure amounts to compiling the stored procedure for run-time execution.

[0094] In accordance with one embodiment of the technology described herein, stored procedures are successfully cached and the results invalidated based on changed data in the database. FIG. 8 depicts one embodiment for caching stored procedures. Application 104 issues a request 402 for a stored procedure 403 maintained at database 120.

[0095] Application 104 (e.g., Application A) can issue a call 402 to a stored procedure 403 and pass one or more variable values. Intermediary 108 can intercept the request as previously described for simple queries by wrapping itself around the native database driver 106. Intermediary 108 opens a trace session at the database when it determines the request is for a stored procedure. As understood by those of ordinary skill in the art, trace sessions are typically used and designed for debugging. A tracing process (e.g., 161) can collect various types of information associated with an application, system, or section of code during a period of time during which the trace is open. For example, a tracing facility provided by an application can allow a user of the application to observe files accessed, files modified, new files created, applications called, methods called, and various other types of operations associated with the application. Various facilities provided by databases to track and document operations the database performs during a period of time may commonly be referred to as tracing facilities. An Oracle.TM. database application provides a facility called "trace." Other database vendors and types have similar but differently named facilities to perform these processes.

[0096] After opening the trace session with database 120, the intermediary issues the stored procedure call 402 to the database. Database 120 receives call 402 and executes stored procedure 403 by performing the various functions and queries specified by the stored procedure. Database 120 returns the results 406 of the stored procedure to intermediary 108 when it is complete. The results 406 of stored procedure 403 are cached in a results table such as table 204 at the intermediary in the same manner query results are cached. An identification of stored procedure 403, including the literal expressions passed with stored procedure call 402, are maintained along with the results 406 received from the database 120.

[0097] When intermediary 108 receives the results, indicating that the stored procedure has been executed, it also closes the trace session with database 120. When the trace session is closed, database 120 generates a results file 409, which in the embodiment of FIG. 8 is stored at local disk 206 in response to write command 408. The database may create and update the trace file while the trace session is opened in other embodiments. Different databases record the results of trace sessions in different ways. Databases may create a trace file containing information about the actions taken while the trace session was opened. Other databases may return information directly to the application opening the trace session. The technology described herein is not limited to any particular type of database or native facility, and it is anticipated that the particular facility used may vary according to the requirements of a particular implementation.

[0098] Intermediary 108 accesses the trace information from file 409 to determine the queries executed at the database when performing stored procedure 403. In the embodiment of FIG. 8, intermediary 108 issues a request 410 to listener 128 to read trace file 409 from disk 126. Listener 128 responds by issuing a read request 412 to operating system 124. The request is forwarded to the disk which returns the file to the operating system as indicated at 414. The operating system forwards the file to listener 128. Listener 128 parses the information in the trace file to determine the individual queries that users executed to perform stored procedure 403. Listener 128 can select the queries executed in response to the stored procedure by identifying those queries from the trace file that include an identification associated with the caching intermediary. Trace files may contain information identifying all operations, queries, etc. a database performs while a trace session is open, regardless of the originating application to which the database was responding. Databases typically store an identification of the application or process initiating each query in the trace file which the listener can use to select only those queries executed for procedure 403.

[0099] A list 416 of queries executed in response to the stored procedure is sent from listener 128 to intermediary 108 after the trace file has been parsed. From this list, intermediary 108 determines every query executed at the database in response to stored procedure 403. The intermediary determines each query executed by a stored procedure without parsing the stored procedure or the call issued by application 104.

[0100] Having determined the queries executed in response to stored procedure 403, intermediary 108 determines which, if any, tables at the database were modified by the stored procedure. This determination is made as previously described for a standard query received at the cache. Intermediary 108 abstracts each individual query and passes it to the database by calling a native database analysis facility to analyze how it processes a query of that structure. The abstracted query 418 is passed to database 120 which analyzes it and returns results 420 to interface 112 identifying any modified tables.

[0101] Intermediary 108 creates or modifies any entries in tables 206 and 208 associated with portions of the database affected by stored procedure 403. An entry is created (if not already existing) in table 206 for the structure of stored procedure 403, along with the tables the stored procedure is dependent upon. The intermediary can abstract a stored procedure in the same manner as a query by removing any literal expressions or variable data from the stored procedure call. Interface 112 can list the abstracted stored procedure call in cache 110 with an indication of the tables the stored procedure is dependent on. The dependencies are determined from individual query abstraction analyzed by the database. For stored procedures, table 206 can further include an indication of the tables the stored procedure modifies. Table 206 can include an indication for each listed table whether the procedure depends on that tables or modifies that table. In table 208, intermediary 108 lists any tables the stored procedure modifies with an indication of the last time that table was modified. For any pre-existing entries, the intermediary updates the indication of the last time of modification.

[0102] FIG. 9 is a flowchart depicting a method in accordance with one embodiment for caching database stored procedures. A request or call for a stored procedure maintained at a database is received at step 702. At step 704, a results table containing previously received results for various stored procedures and/or queries is accessed to determine whether an entry is present for the stored procedure referenced by the call. If an entry does not exist for the particular stored procedure, the stored procedure is abstracted at step 706 and a dependency table such as table 206 is accessed to determine if an entry is present for stored procedures having the structure of the received query at step 708. If an entry for the query's structure is not present in the dependency table, a trace session is opened at the database at step 710. A request can be issued to the database to track the operations, transaction, queries, etc. processed while the trace session is open. After opening the trace session, the stored procedure request is forwarded to the database at step 712. The database executes the stored procedure at step 714 by performing the actions specified in the stored procedure using the data passed with the request. The database traces its actions performed while executing the stored procedure at step 716 and creates a file or other indication of the trace analysis. After executing the stored procedure, the database returns the results at step 718. The caching intermediary can receive the results and store them in a results table at step 720. An indication of the stored procedure along with the results of its execution can be stored in a results table such as 204. The results are returned to the requesting application at step 722. After caching the results of the stored procedure, the trace session at the database is closed at step 724. In one embodiment, the trace session is closed when the results from the database are received.

[0103] The caching intermediary can issue a request to access the trace information at step 725 if it is not automatically returned as previously described. For example, a request can be issued to a listener installed at the database machine to read a file such as trace file 409 storing the trace data. The listener can issue a read command and parse the information received from the requested trace file. The listener can return all or a portion of this data to the caching system at step 726. In one embodiment, the listener parses the data and returns a list of the queries executed by the database for the stored procedure. In other embodiments, the listener can pass the entire trace file to the intermediary which can perform the proper analysis to determine the queries. IN one embodiment, the trace information is automatically returned to the requesting application such that steps 725-726 can be skipped. In any event, the queries that are executed as part of the stored procedure are determined at step 727.

[0104] Each query executed as part of the stored procedure is abstracted at step 728 by removing any literal expressions from the query. Each abstracted query or skeleton is passed to the database at step 730 by calling a native facility of the database to analyze the structure of each query. The results for each query skeleton are accessed at step 732. The intermediary creates an entry in a dependency table such as table 206 at step 734. Each entry will include a stored procedure skeleton and the tables it is dependent upon. In one embodiment, any tables modified by the stored procedure are also listed in the corresponding entry. The table can include an indication of whether a listed table is a dependency of the query or a table modified by the query. At step 736, entries in a dependency modification table such as table 208 are created or updated for any tables modified by the stored procedure. For each table that was updated in response to the stored procedure command, an existing entry in the modification table can be updated with the time the stored procedure modified the corresponding database table if an entry for the table already exists. If an entry for the table does not already exist, a new entry is created by listing the table and time of modification. The results table, dependency table, and modification table can be used for future stored procedure requests to determine whether results at the cache are validated and can be returned instead of retrieving them from the database itself.

[0105] If it is determined at step 708 that the structure of the stored procedure is already cached in a table such as 206, for example, the dependencies of the stored procedure structure are determined at step 756. A call to the stored procedure is issued at step 758 and the results received. At step 760, the results are cached by the intermediary in the results table. The results are returned to the requesting application at step 762. The modification table 208 is updated at step 764 if the stored procedure includes write queries.

[0106] If it is determined at step 704 that an entry in the results table already exists for the stored procedure, the dependency table is accessed at step 736 to determine the tables the stored procedure is dependent upon. The stored procedure can be abstracted to determine if an entry in the dependency table exists for queries of the received query's structure. The time of last modification of each dependent table is determined at step 742. At step 744, it is determined whether the cached results are still valid. In one embodiment, a time stamp for the results determined from the results table is compared with the times listed in the modification table for each table the stored procedure is dependent upon. If the results in the results table were retrieved before an update to one or more of the tables modified by the stored procedure as indicated in the modification table, the cached results are not valid and are not returned to the requesting application. A call to the stored procedure at the database is then issued at step 746 and the results received. The results of the stored procedure are stored in the cache at step 748 with an indication of the time they were obtained. The results are returned to the requesting application at step 750. The modification table is updated at step 752 to reflect the time of the latest updates if the stored procedure modifies any tables.

[0107] If the cached results are determined at step 744 to be valid, the results are returned to the requesting application at step 754. As mentioned, stored procedures can include write queries as well as read queries. In one embodiment, a call to a stored procedure is issued to the database even when the results of the procedure maintained by the cache are valid and can be returned to the requesting application. The stored procedure is issued so that the updates to the database by the write portions of the procedure can be made. The cached results can still be returned to increase performance in responding to applications. When the results of passing the stored procedure to the database are returned, the caching system can update the modification table to reflect the time of the new updates. Additionally, the results can be used to update the results table with new results for the stored procedure and the time these results were retrieved.

[0108] In one embodiment, the results table and/or dependency table can include an indication for each stored procedure entry whether that stored procedure modifies any data at the database. An embodiment that specifies whether each table in the dependency table is updated by the corresponding stored procedure or whether the stored procedure modifies the listed table has already been described. When the intermediary determines that a requested procedure contains a valid entry in the results table and the procedure does not modify data at the database, the cached results are returned and the stored procedure is not passed to the database. If the entry indicates that the stored procedure does modify data, it can be passed to the database to modify database data after returning the cached results to the requesting application. In yet another embodiment, a caching system is implemented to only cache stored procedure results and other information for procedures that do not modify data at the database.

Row-Level Caching and Invalidation

[0109] The embodiments described thus far have described table based caching systems that invalidate entries when a dependent table of a query is modified. The disclosed techniques are extendable to other levels of invalidation based on more refined analysis of modified data. For example, the disclosed technology can be used to invalidate cached query results when a row the query is dependent upon is modified. Likewise, results can be invalidated only when an actual entry in the database the query is dependent upon is modified. These techniques can increase efficiency and performance by further reducing the number of times valid query results in the cache are invalidated.

[0110] FIG. 10 is a flowchart depicting a row-based caching and invalidation technique in accordance with one embodiment. In FIG. 10, row based caching is implemented for queries whose dependencies are limited to the retrieval of records (row of data) based on a unique identifier for the row. For example, if a query specifies "select * from employee where user_id=1111", and user_id is a unique user_id assigned to each individual in the organization, such a query can be cached and invalidated in accordance with the disclosed techniques. An update to the individual record (row) of the employee table specified in the query can be the basis for invalidating the cached query results. If an update is made to another record (row) in the employee table, the cached results are not invalidated since the update does not affect the cached results.

[0111] For queries that depend on something more than a unique record identifier, the table-based caching techniques previously described are employed. Consider an exemplary query that specifies, "select * from employee where salary >100000." The results of this query depend upon a column in the employee table other than the unique record identifier for that row (e.g., user_id or social security number). The query is not merely dependent upon a part of the entry that is unique for each individual record, but rather the salary of an employee which is a non-unique portion of a record. If the results for this query are cached and a subsequent update increases the salary of an employee that was previously below $100,000 to above $100,000, the cached results will no longer be valid. The employee record with the salary column updated to a value above $100,000 should now be included in the query results. However, the update does not affect any rows that were returned in response to the original query. If invalidation is based only on changes to those records or rows included in the original results, row-based caching in this example would fail to properly invalidate the cached results and invalid results returned for a subsequent received version of this query.

[0112] In FIG. 10, selective row-based caching is implemented to guarantee accurate cache results while also improving performance by reducing the number of invalidations inherent in a table-based invalidation implementation. A query is received from a requesting application at step 802 and at 804, it is determined whether an entry corresponding to the received query exists in a results table. If there is no corresponding entry in the results table, the query is abstracted at step 806 and a determination made at step 808 as to whether an entry for the abstracted query skeleton exists in a dependency table.

[0113] If no entry exists for the query skeleton, the resulting skeleton is passed to the database at step 810 by calling a native analysis facility of the database to assess how the query structure is handled. The results of the analysis are accessed at step 812. At step 814, an entry is created in the dependency table for the query skeleton. Dependency tables in the row-based technique depicted in FIG. 10 are implemented as previously described with respect to table-based caching. The entry lists the abstracted query skeleton and any tables queries having that structure are dependent upon.

[0114] At step 816, it is determined whether the query requests records based solely on a unique identifier for individual records of the corresponding table. If the query depends solely on a unique identifier for a row of the relevant table, the process continues at step 818 where it is determined if the query is a write or read query. If the query is a read query, the query is modified at step 820 to include a request for the row ID number of the record requested by the query. Each record in a table includes an internal identifier used by the database to uniquely identify every record. This identifier or row ID number is different than the unique user data identifier referred to previously.

[0115] The modified query is passed to the database at step 822. The results of the query are received from the database at step 824. The query results are used to create an entry in results table 204 at step 826. The entry created in table 204 will not only include the query, the results, and the time the results were obtained, but also the row ID number associated with that query that was retrieved via the modified query. The row ID number is used to determine whether cached query results are valid when subsequent requests including the cached query are received. The query results are returned to the requesting application at step 828. If the cached system determines at step 818 that the query is a write query, a modification table is updated at step 830 to indicate the modifications to the tables effected by the write query. To facilitate the row-based caching technique of FIG. 10, the modification table stores an indication of the times rows within a table were last updated. At step 830, the modification table is updated to include the table dependencies determined at step 812. For each entry, the modification will specify the row_id for the specific record(s) modified by the query as well as the table that was modified and the time of modification. If it is determined at step 816 that the query does not requests records based solely on a unique identifier for individual records of the corresponding table, the method continues at step 832 where table based caching is instituted according to steps 320-332 of FIG. 4. Steps 322 and 328 are skipped if the query's structure is already cached.

[0116] If it is determined at step 804 that an entry exists in results table 204 for the query received at step 802, the dependencies for the query are determined at step 834. The tables the query is dependent upon are determined from dependency table 206. The row_id number of the query is determined from results table 204 at step 836. Modification table 208 is then searched at step 838 for an entry matching the dependent table of the query and the row_id number of the query. If an entry is found and the time of last modification to the specified row was before the results stored in table 204 were obtained, the cached results are determined to be valid at step 840. The cached results are returned to the requesting application at step 842. If no entry is found, it is determined that the cached results are valid and they are returned at step 842. If an entry is found and it indicates an update to the row after the results were obtained, the query is passed to the database at step 844 and the results received. The results are provided to the requesting application at step 846.

[0117] It should be noted that row-based caching and table-based caching can be contemporaneously implemented in accordance with one embodiment. For example, table 208 may only list table identifications for certain entries along with the time the table was modified when the entry is for a query that is dependent on something other than a unique record identifier. Table 208 may also include entries including row ID numbers and table IDs for other entries. Likewise, entries in table 204 for the query results may list the row_id numbers for queries meeting the unique row identifier dependency limitation. For more complicated queries with additional dependencies, only the results and time will be maintained. In this manner, when a query is received that has an entry in table 204, an intermediary can determine the row ID dependency from table 206 in order to compare that dependency with the time of last update from table 208 to determine whether the results in table 204 are valid.

[0118] In one embodiment, a caching system can be distributed across one or more servers or other systems at various levels to provide further performance and availability. FIG. 11 depicts an exemplary embodiment including caching intermediaries 108 distributed at application servers 102 as well as a central server 902. Central server 902 is implemented between application servers 102 and database server 122 to provide an additional level of caching.

[0119] Each application server includes a caching intermediary 108 as previously described. Central server 902 also includes a caching intermediary 108, providing a centralized cache for results and invalidation information that can be used by the individual intermediaries at each application server. In FIG. 11, the caching intermediary 108 at the central server includes similar cache 110 and interface 112 components as the local caching intermediaries. These components may include additional operations and perform slightly different operations in such a multi-level environment. Again, different implementations of intermediary 108 can be made in accordance with embodiments. In FIG. 11, the listener component 128 interfaces with the caching intermediary 108 at the central server 902 to provide information regarding updates to the database. The intermediary at the central server can maintain this information for invalidating cache entries. In one embodiment, the central server intermediary can pass this information to the local intermediaries at the application servers so it can be used there for invalidations. In yet another embodiment, the listener component 128 can interface directly with each caching intermediary 108 at the application server level and central server level.

[0120] In an exemplary operation of the system of FIG. 11, an application 104 at any of the application servers can issue a query or stored procedure call which is intercepted at the local caching intermediary 108 associated with that application. The caching intermediary can determine if it is storing dependency information for the structure of the received query. If the structure is stored locally, the local intermediary can determine if it is storing valid results for the received query as previously described. If the local intermediary is storing valid results, it can further access the central server's intermediary 108 to assess the validity of the results.

[0121] An application at one of the other application servers in communication with the central server may update database 120 in a transaction not seen at the local intermediary processing the current request. However, a query received from any of the local application servers will pass through central server 902. Accordingly, the local intermediary can access the central server to determine the last time any table, row, etc. upon which the received query is dependent was last modified by any of the applications at any of the application servers. If the local intermediary determines that its local results are valid, those results can be returned to the requesting application. If the local results are not valid, the local intermediary passes the query to the caching intermediary at the central server 902. The caching intermediary at central server 902 will return its locally cached results if they are valid.

[0122] If neither the local intermediary nor the central intermediary are maintaining valid results for the query, the intermediary at the central server will forward the query to the database and receive results. The intermediary at the central server will update its cache with the new results, time of retrieval, and/or any portions of the database modified by the query if necessary. The results will be returned to the local intermediary which will also update its cache in accordance with the new results, time, etc.

[0123] In one embodiment, the central server's caching intermediary can also send the results and any update information to the local intermediaries at the application servers in response to updates to the database from any of the application servers and/or in response to a direct update detected by listener component 128. In this manner, the local intermediaries need only check their local caches to determine if their cached results are still valid.

[0124] If the local intermediary receiving the query initially determines that it is not storing dependency information for the structure of the received query, it can abstract the query and pass it to the intermediary at the central server to determine if the intermediary at the central server is maintaining the dependency information. For example, dependency analysis may have already been performed by the database for the received structure in response to a query from another application server. If the intermediary at the central server is maintaining the information, it can pass it to the local intermediary which can cache the information for later use. Upon receiving the dependency information in one embodiment, the local intermediary determines that it is not storing valid results for the query and forwards the query to the central server's intermediary. In other embodiments, the local intermediary can pass the query to the central server's intermediary as soon as it determines that it is not storing the dependency information, rather than wait to receive this information.

[0125] If the database is not storing the dependency information for a query of this structure, it will send a response to the local intermediary informing it that the query structure has not yet been analyzed or that queries of that structure are not cacheable. In one embodiment, the central server intermediary stores information to indicate query structures that were previously determined to not be cacheable to avoid the performance of dependency analysis more than once for the same structure. Additionally, the local intermediaries can maintain information about query structures that were previously determined to not be cacheable so that a request to the central server can be avoided when it is known that queries of the received structure are not cacheable.

[0126] After receiving a notification from the central server intermediary that the dependency information is not being stored or that queries of the received query's structure are not cacheable, the local intermediary sends the actual query to the intermediary at the central server. The local intermediary also sends the actual query to the intermediary after receiving dependency information and determining that it is not maintaining valid results for the actual query as mentioned above. The central server intermediary can again check whether it is storing dependency information for the received query. In other embodiments, the central server intermediary only performs this check once. The central server intermediary will pass the abstracted query to the database by calling a facility such as "EXPLAIN PLAN" as previously described. The intermediary at the central server will also pass the actual query to the database. When the actual query results are received, the intermediary at the central server can cache the results and forward them to the local caching intermediary which will also cache the results. Likewise, when the results of the dependency analysis are received, the central server intermediary will cache the dependency information, etc. and forward the results to the local intermediary which will cache them as well. In one embodiment, the central server intermediary and local intermediaries wait to receive the dependency analysis information before caching query results or dependency information.

[0127] In FIG. 11, a dual-level caching system is provided with caches at the application server level and another server level implemented between the application servers and database server. In one embodiment, additional levels of caching can be used. For example, application server caching systems may interface with a regional server running a caching system in accordance with an embodiment. Multiple regional servers may interface with a central server running a caching system that interfaces with the database system. Other variations with additional levels of caching or further distribution of components across these servers can be implemented in accordance with various embodiments.

[0128] While exemplary embodiments have been described in which caching systems or intermediaries execute on application servers and other servers, any type of processing or computing device may be used, including personal computers, minicomputers, mainframes, handheld computing devices, mobile computing devices, and so forth. Typically, these computing devices will include one or more processors in communication with one or more processor readable storage devices, communication interfaces, peripheral devices, and so forth. Examples of storage devices include RAM, ROM, hard disk drives, floppy disk drives, CD ROMS, DVDs, flash memory, and so forth. Examples of peripherals include printers, monitors, keyboards, pointing devices, and so forth. Examples of communication interfaces include network cards, modems, wireless transmitters/receivers, and so forth. In some embodiments, all or part of the functionality is implemented in software, including firmware and/or micro code, that is stored on one or more processor readable storage devices and is used to program one or more processors to achieve the functionality described herein.

[0129] The foregoing detailed description has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the disclosure and any claimed subject matter to the precise form disclosed. Many modifications and variations are possible in light of the above teachings. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the present disclosure and its claimed subject matter be defined by the claims appended hereto.

* * * * *