U.S. patent application number 11/539,111 was filed with the patent office on 2006-10-05 and published on 2007-04-12 as publication number 20070083856 for a dynamic temporal optimization framework. This patent application is currently assigned to Microsoft Corporation. Invention is credited to Trishul A. Chilimbi and Martin Hirzel.

United States Patent Application 20070083856
Kind Code: A1
Inventors: Chilimbi; Trishul A.; et al.
Published: April 12, 2007
Family ID: 32325361
DYNAMIC TEMPORAL OPTIMIZATION FRAMEWORK
Abstract
A temporal profiling framework useful for dynamic optimization
with hot data stream prefetching provides profiling of longer
bursts and lower overhead. For profiling longer bursts, the
framework employs a profiling phase counter, as well as a checking
phase counter, to control transitions to and from instrumented code
for sampling bursts of a program execution trace. The temporal
profiling framework further intelligently eliminates some checks at
procedure entries and loop back-edges, while still avoiding
unbounded execution without executing checks for transition to and
from instrumented code. Fast hot data stream detection analyzes a
grammar of a profiled data reference sequence, calculating a heat
metric for recurring subsequences based on length and number of
unique occurrences outside of other hot data streams in the
sequence, with sufficiently low overhead to permit use in a dynamic
optimization framework.
Inventors: Chilimbi; Trishul A.; (Seattle, WA); Hirzel; Martin; (Boulder, CO)
Correspondence Address: KLARQUIST SPARKMAN LLP, 121 S.W. SALMON STREET, SUITE 1600, PORTLAND, OR 97204, US
Assignee: Microsoft Corporation, Redmond, WA
Family ID: 32325361
Appl. No.: 11/539,111
Filed: October 5, 2006
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
10/305,056 | Nov 25, 2002 | 7,140,008
11/539,111 | Oct 5, 2006 |
Current U.S. Class: 717/128; 714/E11.207; 714/E11.211
Current CPC Class: G06F 11/3612 20130101
Class at Publication: 717/128
International Class: G06F 9/44 20060101 G06F009/44
Claims
1. A method of detecting a hot data stream in a data reference
sequence from sampled bursts of a program execution trace, the
method comprising: parsing the data reference sequence to extract a
compressed grammar representation of the data reference sequence,
the compressed grammar representation comprising a plurality of
language elements each representing a number of occurrences of
unique subsequences and related as a directed acyclic graph;
numbering the language elements according to a reverse postorder
numbering; calculating a heat measure of each language element
related to a product of the length of the subsequence represented
by the language element together with a number of occurrences of
the subsequence represented by the language element that are not
included in a heat measure of a predecessor language element
according to the numbering that meets a hot criteria; comparing the
heat measure of each language element to the hot criteria; and
identifying the subsequence represented by a language element
meeting the hot criteria as a hot data stream.
2. The method of claim 1 wherein the reverse postorder numbering
results in non-terminal nodes being numbered such that whenever a
non-terminal node is a child of another non-terminal node, a number
assigned to the child node is greater.
3. The method of claim 1 wherein a data reference comprises a data
pair of a program counter value which indicates an address of a
data load or store instruction and a memory location accessed by
the data load or store instruction.
4. The method of claim 1 wherein the hot data stream is a data
reference subsequence in a profiled burst with a regularity
magnitude that exceeds the hot criteria.
5. The method of claim 4 wherein the regularity magnitude of a data
reference subsequence v is defined as v.heat=v.length*v.frequency,
where v.frequency is the number of non-overlapping occurrences of v
in the profiled burst.
6. The method of claim 4 wherein the regularity magnitude of a
non-terminal node A comprises A.heat=w_A.length*A.coldUses,
where A.coldUses is a number of times A occurs in a unique parse
tree not counting occurrences in sub-trees belonging to hot
non-terminals other than A.
7. A dynamic optimizer comprising: a temporal profiling framework
insertion tool operating to modify a program to provide
instrumentation for capturing a temporal data reference sequence
for sampled bursts of an execution trace of the program; a hot data
stream detector operating to parse the temporal data reference
sequence to extract a compressed grammar representation of the data
reference sequence, the compressed grammar representation
comprising a plurality of language elements each representing a
number of occurrences of unique subsequences and related as a
directed acyclic graph, the hot data stream detector further
numbering the language elements according to a reverse postorder
numbering, the hot data stream detector further calculating a heat
measure of each language element related to a product of the length
of the subsequence represented by the language element together
with a number of occurrences of the subsequence represented by the
language element that are not included in a heat measure of a
predecessor language element according to the numbering that meets
a hot criteria, the hot data stream detector further comparing the
heat measure of each language element to the hot criteria, and
identifying the subsequence represented by a language element
meeting the hot criteria as a hot data stream; and a prefetching
code injector for inserting prefetching instructions at locations
in the program corresponding to occurrences of the identified hot
data stream in the data reference sequence.
8. The dynamic optimizer of claim 7 wherein the temporal profiling
framework insertion tool modifies an executable binary version of
the program.
9. The dynamic optimizer of claim 8 wherein the modified executable
binary version of the program comprises a profiling phase counter
and a checking phase counter controlling transitions to and from
instrumented code for sampled bursts.
10. The dynamic optimizer of claim 7 wherein the reverse postorder
numbering results in numbering language elements such that when a
non-terminal language element is a child of another non-terminal
language element, a number assigned to the child is greater.
11. The dynamic optimizer of claim 7 operating in phases comprising
a profiling phase, analysis and optimization phase, and a
hibernation phase.
12. The dynamic optimizer of claim 11 wherein the profiling phase
comprises sampling bursts of the execution trace of the
program.
13. The dynamic optimizer of claim 11 wherein the analysis and
optimization phase comprises operation of the hot data stream
detector.
14. The dynamic optimizer of claim 11 wherein the hibernation phase
comprises the program executing as optimized with prefetch
instructions.
15. A computer-readable program carrying medium having a program
carried thereon executable on a computer to perform a method of
detecting a hot data stream in a data reference sequence from
sampled bursts of a program execution trace, the method comprising:
parsing the data reference sequence to extract a compressed grammar
representation of the data reference sequence, the compressed
grammar representation comprising a plurality of language elements
each representing a number of occurrences of unique subsequences
and related as a directed acyclic graph; numbering the language
elements according to a reverse postorder numbering; calculating a
heat measure of each language element related to a product of the
length of the subsequence represented by the language element
together with a number of occurrences of the subsequence
represented by the language element that are not included in a heat
measure of a predecessor language element according to the
numbering that meets a hot criteria; comparing the heat measure of
each language element to the hot criteria; and identifying the
subsequence represented by a language element meeting the hot
criteria as a hot data stream.
16. The computer-readable program carrying medium of claim 15
wherein the reverse postorder numbering comprises non-terminal
nodes being numbered such that when a non-terminal node is a child
of another non-terminal node the child node is assigned a greater
number.
17. The computer-readable program carrying medium of claim 15
wherein a data reference in a sequence comprises: a program counter
value which indicates an address of a data load or store
instruction; and a memory location accessed by the data load or
store instruction.
18. The computer-readable program carrying medium of claim 15
wherein the hot data stream is a data reference subsequence in a
profiled burst with a regularity magnitude that exceeds the hot
criteria.
19. The computer-readable program carrying medium of claim 18
wherein the regularity magnitude of a data reference subsequence v
comprises: v.heat=v.length*v.frequency, where v.frequency is the
number of non-overlapping occurrences of v in the profiled
burst.
20. The computer-readable program carrying medium of claim 18
wherein the regularity magnitude of a non-terminal node A comprises
A.heat=w_A.length*A.coldUses, where A.coldUses is the number of
times A occurs in a unique parse tree not counting occurrences in
sub-trees belonging to hot non-terminals other than A.
Description
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This is a divisional of U.S. patent application Ser. No.
10/305,056, filed Nov. 25, 2002, which application is incorporated
herein by reference in its entirety.
TECHNICAL FIELD
[0002] The present invention relates to temporal profiling and
memory access optimization of computer programs, and particularly
for dynamic optimization during program execution.
BACKGROUND
[0003] With processor speed increasing much more rapidly than
memory access speed, there is a growing performance gap between
processor and memory in computers. More particularly, processor
speed continues to adhere to Moore's law (approximately doubling
every 18 months). By comparison, memory access speed has been
increasing at the relatively glacial rate of 10% per year.
Consequently, there is a rapidly growing processor-memory
performance gap. Computer architects have tried to mitigate the
performance impact of this imbalance with small high-speed cache
memories that store recently accessed data. This solution is
effective only if most of the data referenced by a program is
available in the cache. Unfortunately, many general-purpose
programs, which use dynamic, pointer-based data structures, often
suffer from high cache miss rates, and therefore are limited by
memory system performance.
[0004] Due to the increasing processor-memory performance gap,
memory system optimizations have the potential to significantly
improve program performance. One such optimization involves
prefetching data ahead of its use by the program, which has the
potential of alleviating the processor-memory performance gap by
overlapping long latency memory accesses with useful computation.
Successful prefetching is accurate (i.e., correctly anticipates the
data objects that will be accessed in the future) and timely
(fetching the data early enough so that it is available in the
cache when required). For example, T. Mowry, M. Lam and A. Gupta,
"Design And Analysis Of A Compiler Algorithm For Prefetching,"
Architectural Support For Programming Languages And Operating
Systems (ASPLOS) (1992) describe an automatic prefetching
technique for scientific codes that access dense arrays in tightly
nested loops, which relies on static compiler analyses to predict
the program's data accesses and insert prefetch instructions at
appropriate program points. However, the reference pattern of
general-purpose programs, which use dynamic, pointer-based data
structures, is much more complex, and the same techniques do not
apply.
[0005] An alternative to static analyses for predicting data access
patterns is to perform program data reference profiling. Recent
research has shown that programs possess a small number of "hot
data streams," which are data reference sequences that frequently
repeat in the same order, and these account for around 90% of a
program's data references and more than 80% of cache misses. (See,
e.g., T. M. Chilimbi, "Efficient Representations And Abstractions
For Quantifying And Exploiting Data Reference Locality,"
Proceedings Of The ACM SIGPLAN '01 Conference On Programming
Language Design And Implementation (June 2001); and S. Rubin, R.
Bodik and T. Chilimbi, "An Efficient Profile-Analysis Framework For
Data-Layout Optimizations," Principles Of Programming Languages,
POPL '02 (January 2002).) These hot data streams can be prefetched
accurately since they repeat frequently in the same order and thus
are predictable. They are long enough (15-20 object references on
average) so that they can be prefetched ahead of use in a timely
manner.
[0006] In prior work, Chilimbi instrumented a program to collect
the trace of its data memory references; then used a compression
technique called Sequitur to process the trace off-line and extract
hot data streams. (See, T. M. Chilimbi, "Efficient Representations
And Abstractions For Quantifying And Exploiting Data Reference
Locality," Proceedings Of The ACM SIGPLAN '01 Conference On
Programming Language Design And Implementation (June 2001).)
Chilimbi further demonstrated that these hot data streams are
fairly stable across program inputs and could serve as the basis
for an off-line static prefetching scheme. (See, T. M. Chilimbi,
"On The Stability Of Temporal Data Reference Profiles,"
International Conference On Parallel Architectures And Compilation
Techniques (PACT) (2001).) However, this off-line static
prefetching scheme may not be appropriate for programs with
distinct phase behavior.
[0007] Dynamic optimization uses profile information from the
current execution of a program to decide what and how to optimize.
This can provide an advantage over static and even
feedback-directed optimization, such as in the case of the programs
with distinct phase behavior. On the other hand, dynamic
optimization must be more concerned with the profiling overhead,
since the slow-down from profiling has to be recovered by the
speed-up from optimization.
[0008] One common way to reduce the overhead of profiling is
through use of sampling: instead of recording all the information
that may be useful for optimization, sample a small, but
representative fraction of it. In a typical example, sampling
counts the frequency of individual events such as calls or loads.
(See, J. Anderson et al., "Continuous Profiling: Where Have All The
Cycles Gone?," ACM Transactions On Computer Systems (TOCS) (1997).)
Other dynamic optimizations exploit causality between two or more
events. One example is prefetching with Markov-predictors using
pairs of data accesses. (See, D. Joseph and D. Grunwald,
"Prefetching Using Markov Predictors," International Symposium On
Computer Architecture (ISCA) (1997).) Some recent transparent
native code optimizers focus on single-entry, multiple-exit code
regions. (See, e.g., V. Bala, E. Duesterwald and S. Banerjia,
"Dynamo: A Transparent Dynamic Optimization System," Programming
Languages Design And Implementation (PLDI) (2000); and D. Deaver,
R. Gorton and N. Rubin, "Wiggins/Redstone: An On-Line Program
Specializer," Hot Chips (1999).) Another example provides
cache-conscious data placement during generational garbage
collection to lay out sequences of data objects. (See, T. Chilimbi,
B. Davidson and J. Larus, "Cache-Conscious Structure Definition,"
Programming Languages Design And Implementation (PLDI) (1999); and
T. Chilimbi and J. Larus, "Using Generational Garbage Collection To
Implement Cache-Conscious Data Placement," International Symposium
On Memory Management (ISMM) (1998).) However, for lack of
low-overhead temporal profilers, these systems usually employ event
profilers. But, as Ball and Larus point out, event (node or edge)
profiling may misidentify frequencies of event sequences. (See, T.
Ball and J. Larus, "Efficient Path Profiling," International
Symposium On Microarchitecture (MICRO) (1996).)
[0009] The sequence of all events occurring during execution of a
program is generally referred to as the "trace." A "burst" on the
other hand is a subsequence of the trace. Arnold and Ryder present
a framework that samples bursts. (See, M. Arnold and B. Ryder, "A
Framework For Reducing The Cost Of Instrumented Code," Programming
Languages Design And Implementation (PLDI) (2001).) In their
framework, the code of each procedure is duplicated. (Id., at FIG.
2.) Both versions of the code contain the original instructions,
but only one version is instrumented to also collect profile
information. The other version only contains checks at procedure
entries and loop back-edges that decrement a counter "nCheck,"
which is initialized to "nCheck_0." Most of the time, the
(non-instrumented) checking code is executed. Only when the nCheck
counter reaches zero is a single intraprocedural acyclic path of the
instrumented code executed, after which nCheck is reset to
nCheck_0.
[0010] A limitation of the Arnold-Ryder framework is that it stays
in the instrumented code only for the time between two checks.
Since it has checks at every procedure entry and loop back-edge,
the framework captures a burst of only one acyclic intraprocedural
path's worth of trace. In other words, only the burst between the
procedure entry check and a next loop back-edge is captured. This
limitation can fail to profile many longer "hot data stream"
bursts, and thus fail to optimize such hot data streams. Consider
for example the code fragment:
[0011]

    for (i=0; i<n; i++)
        if ( . . . ) f( );
        else g( );

Because the Arnold-Ryder framework ends burst profiling at
loop back-edges, the framework would be unable to distinguish the
traces fgfgfgfg and ffffgggg. For optimizing single-entry
multiple-exit regions of programs, this profiling limitation may
make the difference between executing optimized code most of the
time or not.
[0014] Another limitation of the Arnold-Ryder framework is that the
overhead of the framework can still be too high for dynamic
optimization of machine executable code binaries. The Arnold-Ryder
framework was implemented for a Java virtual machine execution
environment, where the program is a set of Java class files. These
Java programs typically have a higher execution overhead, so the
overhead of the instrumentation checks is small relative to the
program's execution time. The overhead of the Arnold-Ryder
framework's instrumentation checks may make dynamic optimization
with the framework impractical in other settings for programs with
lower execution overhead (such as statically compiled machine code
programs).
[0015] A further problem is that the overhead of hot data stream
detection has been overly high for use in dynamic optimization
systems, such as the Arnold-Ryder framework.
SUMMARY
[0016] Techniques described herein provide low-overhead temporal
profiling and analysis, such as for use in dynamic memory access
optimization.
[0017] In accordance with one technique described herein, temporal
profiling of longer bursts in a program trace is achieved by
incorporating symmetric "checking code" and "instrumented code"
counters in a temporal profiling framework employing
non-instrumented (checking) code and instrumented code versions of
a program. Rather than immediately transitioning back to the
checking code at a next proximate check in the instrumented code as
in the prior Arnold-Ryder framework, a counter also is placed on
checks in the instrumented code. After transitioning to the
instrumented code, a count of plural checks in the instrumented
code is made before returning to the checking code. This permits
the instrumented code to profile longer continuous bursts sampled
out of the program trace.
[0018] In accordance with further techniques, the overhead of
temporal profiling is reduced by intelligently eliminating checks.
In the prior Arnold-Ryder framework, checks were placed at all
procedure entries and loop back-edges in the code to ensure that
the program can never loop or recurse for an unbounded amount of
time without executing a check. The techniques intelligently
eliminate checks from procedure entries and loop back-edges. In one
implementation, the intelligent check elimination performs a static
call graph analysis of the program to determine where checks should
be placed on procedure entries to avoid unbounded execution without
checking. Based on the call graph analysis, the intelligent check
elimination places checks at entries to root procedures, procedures
whose address is taken, and procedures with recursion from below.
On the other hand, the intelligent check elimination does not place
checks on leaf procedures (that call no other code in the program)
in the call graph. Further, the intelligent check elimination
eliminates checks at loop back-edges of tight inner loops, and at
"k-boring loops" (loops with no calls and at most k profiling
events of interest, since these are easy for a compiler to
statically optimize). Other techniques to reduce checks also can be
employed. This reduction in temporal profiling overhead can make
dynamic optimization practical for faster executing programs (e.g.,
binary code), as well as improving efficiency of dynamic
optimization of just-in-time compiled (JITed) code and interpreted
programs.
[0019] In accordance with another technique, an improved hot data
stream detection more quickly identifies hot data streams from
profiled bursts of a program, which can make dynamic prefetching
practical for dynamic optimization of programs. In one
implementation, the improved hot data stream detection constructs a
parse tree of the profiled bursts, then forms a Sequitur grammar
from the parse tree. The improved hot stream detection then
traverses the grammar tree in reverse postorder. At
each grammar element, the improved hot stream detection calculates
a regularity magnitude or "heat" of the element based on a length
of the burst sequence represented by the element multiplied by its
number of "cold" uses (i.e., number of times the element occurs in
the complete parse tree, not counting occurrences as sub-trees of
another "hot" element). The improved hot stream detection
identifies elements as representing "hot data streams" if their
heat exceeds a heat threshold.
[0020] Additional features and advantages of the invention will be
made apparent from the following detailed description that proceeds
with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] FIG. 1 is a data flow diagram of a dynamic optimizer
utilizing a low overhead, long burst temporal profiling framework
and fast hot data stream detection to dynamically optimize a
program with dynamic hot data stream prefetching.
[0022] FIG. 2 is a block diagram of a program modified according to
the prior Arnold-Ryder framework for burst profiling.
[0023] FIG. 3 is a block diagram of a program modified according to
an improved framework for longer burst profiling in the dynamic
optimizer of FIG. 1.
[0024] FIG. 4 is a program code listing for a check to control
transitions between checking and instrumented code versions in the
improved framework of FIG. 3 for longer burst profiling.
[0025] FIG. 5 is a call graph of an example program to be modified
according to an improved framework for low-overhead burst
profiling.
[0026] FIG. 6 is an illustration of an analysis of the call graph
of FIG. 5 for modifying the example program according to the
improved framework for low-overhead burst profiling.
[0027] FIG. 7 is a data flow diagram illustrating processing for
dynamic optimization of a program image in the dynamic optimizer of
FIG. 1.
[0028] FIG. 8 is a timeline showing phases of the low-overhead,
long burst temporal profiling by the dynamic optimizer of FIG.
1.
[0029] FIG. 9 is an illustration of grammar analysis of an
exemplary data reference sequence in bursts profiled with the
low-overhead, long burst temporal profiling forming part of the
processing by the dynamic optimizer shown in FIG. 7.
[0030] FIG. 10 is a program code listing for fast hot data stream
detection in the processing by the dynamic optimizer shown in FIG.
7.
[0031] FIG. 11 is an illustration of the fast hot data stream
detection performed according to the program code listing of FIG.
10 on the grammar of the exemplary data reference sequence from
FIG. 9.
[0032] FIG. 12 is a table listing results of the fast hot data
stream detection illustrated in FIG. 11.
[0033] FIG. 13 is a block diagram of a suitable computing
environment for implementing the dynamic optimizer of FIG.
1.
DETAILED DESCRIPTION
[0034] The following description is directed to techniques for
low-overhead, long burst temporal profiling and fast hot data
stream detection, which can be utilized in dynamic optimization of
computer programs. More particularly, these techniques are described
in their particular application to a dynamic optimization involving
hot data stream prefetching to optimize a program's memory
accesses. However, the techniques can be applied in contexts other
than the described hot data stream prefetching dynamic
optimization.
1. Overview of Dynamic Optimizer
[0035] With reference to FIG. 1, an exemplary dynamic optimizer 100
utilizes techniques described more fully herein below for
low-overhead, long burst temporal profiling and fast hot data
stream detection in a process of dynamically optimizing a computer
program. The exemplary dynamic optimizer 120 includes a program
editing tool 122 to build a program image 130 in accordance with a
low-overhead temporal profiling framework described below,
including inserting instrumentation and checking code for profiling
long burst samples of a trace of the program's execution. In the
exemplary dynamic optimizer, the program editing tool 122 inserts
the instrumentation and checking code for the low-overhead temporal
profiling framework by editing an executable or binary version 115
of the program to be optimized, after compiling and linking by a
conventional compiler from the program's source code version. For
example, the source code 105 of the program to be optimized may be
initially written by a programmer in a high level programming
language, such as C or C++. Such program source code is then
compiled using an appropriate conventional compiler 110, such as a
C/C++ compiler available in the Microsoft® Visual Studio
development platform, to produce the machine-executable program
binary 115. The executable editing tool for the instrumentation
insertion 122 can be the Vulcan executable editing tool for x86
computer platform program binaries, which is described in detail by
A. Srivastava, A. Edwards, and H. Vo, "Vulcan: Binary
Transformation In A Distributed Environment," Technical Report
MSR-TR-2001-50, Microsoft Research (2001). This has the advantage
that the dynamic optimizer does not require access to the source
code, and can be employed to optimize programs where only an
executable binary version is available. In other embodiments, the
profiling framework can be built into the program image 130 as part
of the process of compiling the program from source code or an
intermediate language form, such as for use with programs written
in Java, or intermediate code representations for the Microsoft .NET
platform. In such other embodiments, the compiler that inserts
instrumentation and checks embodies the tool 122.
[0036] The temporal profiling framework provided in the program
image 130 produces profiled burst data 135 representing sampled
bursts of the program's execution trace. The exemplary dynamic
optimizer 120 includes a hot data stream analyzer 140 and hot
stream prefetching code injection tool 142. The hot data stream
analyzer 140 implements fast hot data stream detection described
herein below that processes the profiled burst data to identify "hot
data streams," which are frequently recurring sequences of data
accesses by the program. The hot stream prefetching code injection
tool 142 then dynamically modifies the program image 130 to perform
prefetching so as to optimize cache utilization and data accesses
by the program, based on the identified hot data streams.
2. Temporal Profiling Framework
[0037] The program image 130 (FIG. 1) is structured according to a
low-overhead, long burst temporal profiling framework 300
illustrated in FIG. 3, which is an improvement on the prior
Arnold-Ryder framework 200 (FIG. 2).
[0038] In the prior Arnold-Ryder framework 200, the code of each
procedure from an original program version (e.g., original
procedure 210 with code blocks 212-213) is duplicated. Both
duplicate versions of the code in the framework 200 contain the
original instructions, but only one version is instrumented to also
collect profile information (referred to herein as the
"instrumented code" 220). The other version (referred to herein as
the "checking code" 230) only contains checks 240-241 at procedure
entries and loop back-edges that decrement a counter "nCheck,"
which is initialized to "nCheck_0." Most of the time, the
(non-instrumented) checking code 230 is executed. Only when the
nCheck counter reaches zero is a single intraprocedural acyclic path
of the instrumented code 220 executed, after which nCheck is reset
to nCheck_0. All back-edges 250 in the instrumented code 220
transition back to the checking code 230.
[0039] While executing in the instrumented code 220, the
Arnold-Ryder framework 200 profiles a burst out of the program
execution trace, which begins at a check (e.g., procedure entry
check 240 or back-edge check 241) and extends to the next check. In
other words, the profiling captures one intraprocedural acyclic
path. The profile of the program captured during execution of this
path can be, for example, the data accesses made by the
program.
[0040] Profiling Longer Bursts
[0041] The improved framework 300 extends the prior Arnold-Ryder
framework 200 (FIG. 2) so that profiled bursts can extend over
multiple checks, possibly crossing procedure boundaries. This way,
the improved framework can obtain interprocedural,
context-sensitive and flow-sensitive profiling information.
[0042] As in the Arnold-Ryder framework 200, the improved framework
300 is structured to include duplicate non-instrumented ("checking
code") 330 and instrumented code 320 versions of at least some
original procedures 310 of the program. Further, checks 340-341 are
placed at procedure entry and loop back-edges.
[0043] The extension in the improved framework 300 adds a second
"profiling phase" counter (labeled "nInstr") to make execution flow
in the instrumented code 320 symmetric with the checking code 330.
Further, the loop back-edges 350 from the instrumented code 320 do
not transition directly back to the procedure entry as in the prior
Arnold-Ryder framework 200, but instead go to a back-edge check
341.
[0044] The program logic or code 400 for the checks 340-341 is
shown in FIG. 4. Initially, the value of the checking phase counter
("nCheck") is set to its initial value, "nCheck.sub.0." While in
the checking code 400, the framework 300 decrements the checking
phase counter (nCheck) (statement 410) at every check 340-341. The
framework 300 continues to execute in the checking code (statement
420) as long as the value of the checking phase counter has not yet
reached zero. For example, from the entry and back-edge checks
340-341, the framework 300 takes the paths 360-361 to the checking
code 330.
[0045] When the checking phase counter (nCheck) reaches zero, the
framework 300 initializes the profiling phase counter (nInstr) to
an initial value, nInstr_0, and transitions to the instrumented
code 320 (statement 430). In general, the checking phase counter's
initial value is selected to be much greater than that of the
profiling phase counter (i.e., nInstr_0 << nCheck_0),
which determines the sampling rate of the framework
(r = nInstr_0/(nCheck_0 + nInstr_0)).
[0046] While executing in the instrumented code, the framework 300
decrements the profiling phase counter (nInstr) at every check
340-341 (statement 440). The framework 300 continues to execute in
the instrumented code (statement 450) as long as the value of the
profiling phase counter has not yet reached zero. For example, from
the entry and back-edge checks 340-341, the framework 300 takes the
paths 370-371 to the instrumented code 320. When the profiling
phase counter reaches zero, the framework again initializes the
checking phase counter to the initial value, nCheck_0, and
returns to the checking code 330 (statement 460).
[0047] The check code 400 is structured so that in the common case
where the framework is executing in the checking code and is to
continue executing the checking code (checking phase), the check
consists of a decrement of the checking phase counter and a
conditional branch.
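
By way of illustration, the check logic of FIG. 4 can be sketched in C++ as follows. The counter names and statement numbers follow FIG. 4; the initial values and the dispatch helpers (enterCheckingCode, enterInstrumentedCode) are placeholders standing in for the framework's actual transition mechanism, not part of the described implementation.

    // Sketch of the check code 400 (FIG. 4); an illustration, not the
    // patent's actual implementation.
    constexpr long NCHECK0 = 9900;   // nCheck_0 (example value used later in the text)
    constexpr long NINSTR0 = 100;    // nInstr_0

    void enterCheckingCode();        // placeholder: continue in checking code 330
    void enterInstrumentedCode();    // placeholder: continue in instrumented code 320

    static long nCheck = NCHECK0;    // checking phase counter
    static long nInstr = 0;          // profiling phase counter

    void check() {
        if (nCheck > 0) {
            // Checking phase (common case): a decrement and a conditional branch.
            if (--nCheck > 0) {
                enterCheckingCode();        // statement 420
            } else {
                nInstr = NINSTR0;           // statement 430: start profiling phase
                enterInstrumentedCode();
            }
        } else {
            if (--nInstr > 0) {
                enterInstrumentedCode();    // statement 450
            } else {
                nCheck = NCHECK0;           // statement 460: back to checking phase
                enterCheckingCode();
            }
        }
    }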
[0048] Compared to the prior Arnold-Ryder framework 200, the
improved framework 300 profiles longer bursts of the program trace
and provides more precise profiles. For example, consider the
following code fragment:
[0049]

    for (i=0; i<n; i++)
        if ( . . . ) f( );
        else g( );

In this example code fragment, the Arnold-Ryder framework
returns to the checking code upon the back-edge path from each
execution of the procedures, f( ) and g( ). Accordingly, the
Arnold-Ryder framework profiles only one acyclic intraprocedural
path of the program trace, and would be unable to distinguish the
traces, fgfgfgfg and ffffgggg. The improved framework 300 profiles
longer bursts across procedure boundaries. In the dynamic optimizer
120 (FIG. 1), this can make a difference between executing
optimized code most of the time or not.
[0052] Low-overhead Temporal Profiling
[0053] For the dynamic optimization to effectively enhance the
performance of the program, the overhead imposed by the temporal
profiling framework desirably is relatively small compared to the
overall program execution, so that performance gains are achieved
from dynamically optimizing the program. The overhead of the
temporal profiling framework can be particularly significant in the
exemplary dynamic optimizer 120 in which the program image 130 is
built from editing an executable program binary 115, to which the
compiler 110 has already applied many static optimizations. In such
case, the overhead of the prior Arnold-Ryder framework may be too
high for effective dynamic optimization. The prior Arnold-Ryder
framework has checks at all procedure entries and loop back-edges
to ensure that the program can never loop or recurse for an
unbounded amount of time without executing a check. Otherwise,
sampling could miss too much profiling information (when the
program spends an unbounded amount of time in the checking code),
or the overhead could become too high (when the program spends an
unbounded amount of time in the instrumented code).
[0054] The low-overhead temporal profiling framework described
herein decreases the overhead of the burst sampling by
intelligently eliminating some checks (i.e., placing checks at
fewer than all procedure entries and loop back-edges), while still
ensuring that the program does not spend an unbounded amount of
time without executing a check.
[0055] Eliminating Checks at Procedure Entries
[0056] In the low-overhead temporal profiling framework, the
instrumentation tool 122 places checks at an approximated minimum
set of procedure entries so that the program cannot recurse for an
unbounded amount of time without executing a check. The
instrumentation tool 122 performs a static call graph analysis of
the program 115 to determine this approximate minimum set (C ⊆ N)
of nodes in the program's call graph, such that every
cycle in the call graph contains at least one node of the set.
[0057] In the dynamic optimizer 120, the instrumentation tool 122
selects this set (C ⊆ N) of procedures f at which to place
procedure entry checks, according to the criteria represented in
the following expression:

    C = { f ∈ N | ¬is_leaf(f) ∧ (is_root(f) ∨ addr_taken(f) ∨ recursion_from_below(f)) }
[0058] In accordance with these criteria, the instrumentation tool
122 does not place any check on any entry to a leaf procedure
(i.e., a procedure that calls nothing), since such leaf procedures
cannot be part of a recursive cycle. Otherwise, the instrumentation
tool 122 places a check on entries to all root procedures (i.e.,
procedures that are only called from outside the program), so as to
ensure that execution starts in the correct version of the code.
Also, the tool places a check on entry to every procedure whose
address is taken, since such procedures may be part of recursion
with indirect calls. Further, the tool places a check on entry to
every procedure with recursion from below. A procedure f has
recursion from below iff it is called by a procedure g in the same
strongly connected component as f that is at least as far away from
the roots. The distance of a procedure f from the roots is the
length of the shortest path from a root to f.
[0059] The "recursion_from_below" heuristic in this criteria
guarantees that there is no recursive cycle without a check and
breaks the ties to determine where in the cycle to put the check
(similarly to back-edges in loops). The tool breaks ties so that
checks are as far up in the call-stack as possible. This should
reduce the number of dynamic checks.
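
A C++ sketch of such a call graph analysis follows. The CallGraph record and its field names are assumptions made for illustration; strongly connected components are computed with Tarjan's algorithm and distances with a breadth-first search from the root procedures.

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <vector>

    // Hypothetical call graph representation for illustration.
    struct CallGraph {
        int n = 0;                              // procedures numbered 0..n-1
        std::vector<std::vector<int>> callees;  // callees[f] = procedures f calls
        std::vector<bool> isRoot;               // called only from outside the program
        std::vector<bool> addrTaken;            // procedures whose address is taken
    };

    // Returns, for each procedure, whether it belongs to the set C of
    // procedures that receive an entry check.
    std::vector<bool> entryCheckSet(const CallGraph& cg) {
        // Tarjan's algorithm for strongly connected components.
        std::vector<int> scc(cg.n, -1), low(cg.n), idx(cg.n, -1), stack;
        std::vector<bool> onStack(cg.n, false);
        int counter = 0, numSccs = 0;
        std::function<void(int)> dfs = [&](int v) {
            idx[v] = low[v] = counter++;
            stack.push_back(v); onStack[v] = true;
            for (int w : cg.callees[v]) {
                if (idx[w] < 0) { dfs(w); low[v] = std::min(low[v], low[w]); }
                else if (onStack[w]) low[v] = std::min(low[v], idx[w]);
            }
            if (low[v] == idx[v]) {             // v heads a component
                int w;
                do {
                    w = stack.back(); stack.pop_back();
                    onStack[w] = false; scc[w] = numSccs;
                } while (w != v);
                ++numSccs;
            }
        };
        for (int v = 0; v < cg.n; ++v) if (idx[v] < 0) dfs(v);

        // Breadth-first distances from the root procedures.
        std::vector<int> dist(cg.n, -1);
        std::queue<int> q;
        for (int v = 0; v < cg.n; ++v) if (cg.isRoot[v]) { dist[v] = 0; q.push(v); }
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (int w : cg.callees[v])
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
        }

        std::vector<bool> inC(cg.n, false);
        for (int f = 0; f < cg.n; ++f) {
            if (cg.callees[f].empty()) continue;  // leaf procedures get no check
            // recursion_from_below: f has a caller in its own strongly
            // connected component at least as far from the roots as f.
            bool fromBelow = false;
            for (int caller = 0; caller < cg.n && !fromBelow; ++caller)
                for (int w : cg.callees[caller])
                    if (w == f && scc[caller] == scc[f] && dist[caller] >= dist[f])
                        fromBelow = true;
            inC[f] = cg.isRoot[f] || cg.addrTaken[f] || fromBelow;
        }
        return inC;
    }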
[0060] For example, FIG. 5 illustrates a call graph 500 of an
exemplary program being structured by the tool 122 according to the
low-overhead temporal profiling framework. In this call graph 500,
the only root is procedure main 510, and the only leaf procedure is
delete-digram 520. The only non-trivial strongly connected
component in the call graph 500 is the component 650 (of procedures
{check, match, substitute} 530-532).
[0061] FIG. 6 illustrates an analysis 600 of the call graph 500 by
the tool 122 to determine the set of procedures for entry check
placement. For this analysis, the tool 122 begins with a
breadth-first search of the call graph. The tool calculates the
distances (e.g., from 0 to 4 in this example) of each procedure
from the root procedure (main 510), and determines that only the
procedure check 530 has recursion from below, since it is called
from the procedure substitute 532 which is further away from the
root procedure main 510. The tool 122 thus determines that for this
example with call graph 500, only the procedures main 510 and check
530 meet the above criteria for placing an entry check (i.e., the
above expression evaluates to the minimum set C={main,check} for
this call graph). Accordingly, by placing a check on entry to every
procedure in this minimum set C={main,check}, the program cannot
recurse indefinitely without executing checks.
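
Running the sketch above on a small hypothetical graph consistent with this example reproduces the set C={main, check}. Only the procedure names main, check, match, substitute, and delete-digram come from FIG. 5; the "helper" procedure and the exact edge set are invented for illustration.

    #include <cstdio>

    int main() {
        // 0=main, 1=helper (hypothetical), 2=check, 3=match, 4=substitute,
        // 5=delete_digram (leaf). substitute calls check from further below.
        CallGraph cg;
        cg.n = 6;
        cg.callees = {{1}, {2}, {3}, {4, 5}, {2}, {}};
        cg.isRoot = {true, false, false, false, false, false};
        cg.addrTaken = std::vector<bool>(6, false);

        const char* names[] = {"main", "helper", "check", "match",
                               "substitute", "delete_digram"};
        std::vector<bool> inC = entryCheckSet(cg);
        for (int f = 0; f < cg.n; ++f)
            if (inC[f]) std::printf("entry check on: %s\n", names[f]);
        // Prints "main" and "check" only.
        return 0;
    }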
[0062] Eliminating Checks at Loop Back-Edges
[0063] In the low-overhead temporal profiling framework, the
instrumentation tool 122 also places checks at fewer than all loop
back-edges in the program. In particular, the instrumentation tool
122 eliminates checks for some tight inner loops. This is because a
dynamic optimizer that complements a static optimizer may often
find the profiling information from tight inner loops to be of
little interest because static optimization excels at optimizing
such loops. At the same time, checks at the back-edges of tight
inner loops can become extremely expensive (i.e., create excessive
overhead relative to potential optimization performance gain). With
the dynamic optimizer 100 that prefetches data into cache memory
based on hot data streams, loops that compare or copy arrays
preferably should not have checks. Such loops typically are easy to
optimize statically, the check on the back-edge is almost as
expensive as the loop body, and the loop body contains too little
work to overlap with the prefetch.
[0064] More particularly, the instrumentation tool 122 eliminates
checks on loop back-edges of loops meeting a "k-boring loops"
criteria. According to this criteria, k-boring loops are defined as
loops with no calls and at most a number (k) of profiling events of
interest. The instrumentation tool 122 does not instrument either
version of the code of a k-boring loop, and does not place a check
on its back-edge. Since the loop is not included in the
instrumented code 320 (FIG. 3) version, the program image 130 does
not spend an unbounded amount of time executing in instrumented
code. The program image may spend an unbounded amount of time
executing such a loop in uninstrumented code (checking code 330 of
FIG. 3) without executing a check. But, if the k-boring loop
hypothesis holds (i.e., there is little or no gain from optimizing
such loops with hot data stream prefetching), the dynamic optimizer
120 does not miss interesting profiling information. Experiments
have shown that the quality of the profile actually improved when
instrumentation and back-edge checks were eliminated from 4-boring
loops (i.e., k=4) in an experimental program image, where the
quality of the profile is measured by the ability to detect hot
data streams. Accordingly, eliminating k-boring loops from profiling
helps focus sampling on more interesting events (for optimizing
with hot data stream prefetching).
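
A minimal sketch of this test, assuming a hypothetical per-loop summary record maintained by the instrumentation tool:

    // Hypothetical per-loop summary gathered by the instrumentation tool.
    struct LoopSummary {
        int numCalls;            // calls anywhere in the loop body
        int numProfilingEvents;  // profiling events of interest (e.g., loads/stores)
    };

    // A k-boring loop makes no calls and has at most k profiling events;
    // it receives neither instrumentation nor a back-edge check.
    bool isKBoring(const LoopSummary& loop, int k = 4) {
        return loop.numCalls == 0 && loop.numProfilingEvents <= k;
    }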
[0065] In alternative implementations, the instrumentation tool 122
may eliminate additional checks on loop back-edges. For example,
the instrumentation tool may eliminate back-edge checks from a loop
that has only a small, fixed number of iterations. Further, if a
check is always executed within a loop body, the loop does not
need a check on the loop's back-edge. In yet further alternative
implementations, the instrumentation tool 122 can combine the loop
counter with the profiling phase counter; if the counters are
linearly related, the program image can execute checks for the loop
via a predicate on the loop counter, rather than updating the
profiling counter each iteration of the loop.
3. Hot Data Stream Prefetching
[0066] With reference now to FIG. 7, the temporal profiling 710
using the above-described low-overhead, long burst temporal
profiling framework 300 (FIG. 3) is a first phase in an overall
dynamic optimization process 700 based on hot data stream
pre-fetching. The dynamic optimization process 700 operates in
three phases: profiling 710, analysis and optimization 720, and
hibernation 730. First, the profiling phase collects (740) a
temporal data reference profile 135 from a running program with
low-overhead, which is accomplished using the program image 130
(FIG. 1) structured according to the improved temporal profiling
framework 300. As described in more detail below, a grammar
analysis using the Sequitur compression process 750 incrementally
builds an online grammar representation 900 of the traced data
references.
[0067] Once sufficient data references have been traced, profiling
is turned off, and the analysis and optimization phase 720
commences. First, a fast hot data stream detection 140 extracts hot
data streams 760 from the Sequitur grammar representation 900.
Then, a prefetching engine 142 builds a stream prefix matching
deterministic finite state machine (DFSM) 770 for these hot data
streams, and dynamically injects checks at appropriate program
points to detect and prefetch these hot data streams in the program
image 130. This dynamic prefetching based on a DFSM is described in
more detail in DYNAMIC PREFETCHING OF HOT DATA STREAMS, U.S. Pat.
No. 7,058,936, issued on Jun. 6, 2006, which is hereby incorporated
herein by reference.
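
The combined DFSM construction is described in the referenced patent and is not reproduced here. As a simplified illustration of the idea, the following C++ sketch matches each hot data stream's prefix independently and prefetches the stream's remaining addresses once the prefix has been seen; issuePrefetch and the headLen parameter are hypothetical.

    #include <cstdint>
    #include <vector>

    void issuePrefetch(std::uintptr_t addr);  // placeholder for a prefetch instruction

    // Simplified per-stream matcher: after the first headLen addresses of a
    // hot data stream are observed in order, prefetch the remaining addresses.
    struct StreamMatcher {
        std::vector<std::uintptr_t> stream;   // data addresses of one hot data stream
        std::size_t headLen;                  // prefix length that triggers prefetching
        std::size_t matched = 0;              // how much of the prefix is matched so far

        void onReference(std::uintptr_t addr) {
            if (addr == stream[matched]) {
                if (++matched == headLen) {
                    for (std::size_t i = headLen; i < stream.size(); ++i)
                        issuePrefetch(stream[i]);
                    matched = 0;              // rearm for the next occurrence
                }
            } else {
                matched = (addr == stream[0]) ? 1 : 0;  // simple restart, not a full DFSM
            }
        }
    };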
[0068] Finally, the process enters the hibernation phase where no
profiling or analysis is performed, and the program continues to
execute (780) as optimized with the added prefetch instructions. At
the end of the hibernation phase, the program image 130 is
de-optimized (790) to remove the inserted checks and prefetch
instructions, and control returns to the profiling phase 710. For
long-running programs, this profiling 710, analysis and
optimization 720 and hibernate 730 cycle may repeat multiple
times.
[0069] FIG. 8 shows a timeline 800 for the three-phase profiling,
analysis and optimization, and hibernation cycle operation of the
dynamic optimizer 100 (FIG. 1). As discussed above, the
low-overhead, long burst temporal profiling framework uses the
checking phase and profiling phase counters (nCheck, nInstr) to
control its overhead and sampling rate of profiling, by
transitioning between a checking phase 810 in which the program
image 130 (FIG. 1) executes in its non-instrumented checking code
330 (FIG. 3) and a profiling phase 820 in which it
executes in its instrumented code 320 (FIG. 3). The time periods
for these checking and profiling phases are parameterized by the
nCheck_0 and nInstr_0 counter initialization values. For
example, setting nCheck_0 to 9900 and nInstr_0 to 100
results in a sampling rate of profiling of 100/10000=1% and a burst
length of 100 dynamic checks. The time spent for one iteration of
the checking and profiling phases (nCheck_0 + nInstr_0) is
referred to as a burst period 850.
[0070] For dynamic optimization, the above-described low-overhead
temporal profiling framework 300 (FIG. 3) is further extended to
alternate between two additional phases, awake 830 and hibernating
840, which are controlled via two additional (awake and
hibernating) counters. The temporal profiling framework starts out
in the awake phase 830, and continues operating in the awake phase
for a number (nAwake_0) of burst-periods, yielding
(nAwake_0 × nInstr_0) checks (860) worth of traced data
references (or "bursts"). Then, as described above and illustrated
in FIG. 7, the dynamic optimizer 100 performs the optimizations,
and then the profiler hibernates while the optimized program
executes. This is done by setting nCheck_0 to
(nCheck_0 + nInstr_0 - 1) and nInstr_0 to 1 for the next
nHibernate_0 burst-periods (which causes the check code 400 in
FIG. 4 to keep the program image executing in the non-instrumented
checking code 330), where nHibernate_0 is much greater than
nAwake_0. When the hibernating phase 840 is over, the profiling
framework is "woken up" by resetting nCheck_0 and nInstr_0
to their original values.
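
For illustration, this phase control can be sketched in C++ as follows; the onBurstPeriodEnd callback, the example phase lengths, and the analyzeAndOptimize/deoptimize placeholders are assumptions, not the optimizer's actual interfaces.

    // Hypothetical sketch of awake/hibernate control over burst periods.
    void analyzeAndOptimize();   // placeholder for phase 720 (FIG. 7)
    void deoptimize();           // placeholder for de-optimization step 790 (FIG. 7)

    long nCheck0 = 9900, nInstr0 = 100;        // current counter initialization values
    const long ORIG_NCHECK0 = 9900, ORIG_NINSTR0 = 100;
    const long NAWAKE0 = 5, NHIBERNATE0 = 50;  // example: hibernate 10x longer than awake
    long periods = 0;
    bool hibernating = false;

    // Assumed to be invoked by the framework at the end of each burst period.
    void onBurstPeriodEnd() {
        ++periods;
        if (!hibernating && periods == NAWAKE0) {
            analyzeAndOptimize();
            // Hibernate: the counters still cycle once per burst period, but
            // essentially all time is spent in the non-instrumented checking code.
            nCheck0 = nCheck0 + nInstr0 - 1;
            nInstr0 = 1;
            hibernating = true;
            periods = 0;
        } else if (hibernating && periods == NHIBERNATE0) {
            deoptimize();
            nCheck0 = ORIG_NCHECK0;            // "wake up": restore original values
            nInstr0 = ORIG_NINSTR0;
            hibernating = false;
            periods = 0;
        }
    }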
[0071] While the profiling framework is hibernating, the program
image traces next to no data references and hence incurs only the
basic overhead of executing the checks 400 (FIG. 4). With the
values of nCheck_0 and nInstr_0 set as described above
during hibernation, the burst-periods correspond to the same time
(in executed checks 860) in both awake and hibernating phases. This
facilitates control over the relative length of the awake and
hibernating phases by appropriately setting the initial value
parameters nAwake_0 and nHibernate_0 of the awake and
hibernating counters relative to each other.
[0072] Fast Hot Data Stream Detection
[0073] When the temporal profiling framework 300 executes in the
instrumented code 320 (FIG. 3), the temporal profiling
instrumentation produces data reference bursts or temporal data
reference sequences 135 (FIGS. 1 and 7). A data reference r is a
load or store operation on a particular address, represented in the
exemplary dynamic optimizer 120 as a data pair (r.pc, r.addr). The
"pc" value (i.e., r.pc), is the value of the program counter, which
indicates the address in the executing program of the data load or
store instruction being executed. The "addr" value (i.e., r.addr),
is the memory location accessed by the load or store operation. The
profiled burst is a temporal sequence or stream of these data
references.
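
In C++ terms, a data reference and a profiled burst might be represented as follows (a sketch; the type and field names are illustrative rather than taken from the implementation):

    #include <cstdint>
    #include <vector>

    // One data reference r = (r.pc, r.addr).
    struct DataRef {
        std::uintptr_t pc;    // address of the executing load/store instruction
        std::uintptr_t addr;  // memory location accessed by that instruction
    };

    using Burst = std::vector<DataRef>;  // a temporal sequence of data references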
[0074] During the profiling phase 710 (FIG. 7) as discussed above,
this data reference sequence is incrementally processed into a
compressed "Sequitur" grammar representation 900 using the Sequitur
grammar analysis processing, as described in T. M. Chilimbi,
"Efficient Representations And Abstractions For Quantifying And
Exploiting Data Reference Locality," Proceedings Of The ACM SIGPLAN
'01 Conference On Programming Language Design And Implementation
(June 2001). FIG. 9 illustrates an example of a grammar 900
produced from an input data reference sequence (input string 910).
The grammar 900 represents a hierarchical structure (a directed
acyclic graph 920) of the data references.
[0075] More particularly, each observed data reference (r.pc,
r.addr) is conceptually represented as a symbol in a grammar, and
the concatenation of the profiled bursts is a string w of symbols
(910). The Sequitur grammar analysis constructs a context-free
grammar for the language {w} consisting of exactly one word, the
string w. The Sequitur grammar analysis runs in time O(w.length).
It is incremental (one symbol can be appended at a time), and
deterministic. Thus, the grammar analysis can be performed as the
profiled data is sampled during the profiling phase 710 (FIG. 7).
The grammar 900 is a compressed representation of the input burst
910. Further, it is unambiguous and acyclic in the sense that no
non-terminal directly or indirectly defines itself.
[0076] In the Sequitur grammar 900, the terminal nodes (denoted in
lowercase letters) represent individual data references (r.pc,
r.addr), which may be repeated in the profiled burst. The
intermediate nodes (denoted in capital letters) represent temporal
sequences of the data references. For example, the grammar 900
produced from the example input string 910 shows that the string S
consists of the sequence "AaBB." A, in turn, consists of the data
references a and b. The intermediate node B represents a sequence
with two occurrences of the intermediate node C, which is a
sequence of the intermediate node A and data reference c.
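
The Sequitur construction itself is described in the cited paper and is not reproduced here. The following C++ sketch fixes one plausible representation of such a grammar and populates it with the FIG. 9 example; the Symbol and Grammar types and the terminal numbering are assumptions made for illustration.

    #include <vector>

    // A grammar symbol: a terminal (an interned data reference) or a
    // reference to another rule (a non-terminal).
    struct Symbol {
        bool isTerminal;
        int id;  // terminal id, or index of the referenced rule
    };

    // One rule per non-terminal; rules[0] is the start symbol S. The grammar
    // is acyclic: no rule directly or indirectly reaches itself.
    struct Grammar {
        std::vector<std::vector<Symbol>> rules;
    };

    // The FIG. 9 grammar: S = A a B B, A = a b, B = C C, C = A c,
    // with terminals a, b, c interned as 0, 1, 2 and rules S=0, A=1, B=2, C=3.
    Grammar fig9Grammar() {
        auto T = [](int t) { return Symbol{true, t}; };
        auto N = [](int r) { return Symbol{false, r}; };
        return Grammar{{
            {N(1), T(0), N(2), N(2)},  // S = A a B B
            {T(0), T(1)},              // A = a b
            {N(3), N(3)},              // B = C C
            {N(1), T(2)},              // C = A c
        }};
    }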
[0077] After construction of the grammar 900 in the profiling phase
710, the dynamic optimizer 100 performs a fast hot data stream
detection 140 (FIGS. 1 and 7) to identify frequently recurring data
reference subsequences (the "hot data streams") in the profiled
bursts. For the fast hot data stream detection, the exemplary
dynamic optimizer performs analysis of the grammar as represented
in a hot data stream detection code 1000 shown in FIG. 10. The
purpose of the fast hot data stream analysis is to identify hot
data streams, which are data reference subsequences in the
profiled bursts whose regularity magnitude exceeds a predetermined
"heat" threshold, H. The regularity magnitude, given a data
reference subsequence v, is defined as v.heat=v.length*v.frequency,
where v.frequency is the number of non-overlapping occurrences of v
in the profiled bursts.
[0078] The analysis in code 1000 is based on the observation that
each non-terminal node (A) of a Sequitur grammar generates a
language L(A)={w_A} with just one word w_A. For the fast
hot data stream detection analysis, the regularity magnitude of a
non-terminal A is defined instead as
A.heat=w_A.length*A.coldUses, where A.coldUses is the number of
times A occurs in the (unique) parse tree of the complete grammar,
not counting occurrences in sub-trees belonging to hot
non-terminals other than A. A non-terminal A is hot iff
minLen <= w_A.length <= maxLen and H <= A.heat, where H is the
predetermined heat threshold. The result of the analysis is the set
{w_A | A is a hot non-terminal} of hot data streams.
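
A C++ sketch of this analysis follows, using the grammar representation sketched above. It illustrates the stated definitions rather than transcribing the code 1000 of FIG. 10: the reverse postorder numbering guarantees that every predecessor of a non-terminal is processed first, so each non-terminal's cold uses are final when its heat is computed, and a hot non-terminal does not propagate its uses to its children.

    #include <algorithm>
    #include <functional>
    #include <vector>

    // Expand a non-terminal into its word w_A (a sequence of terminal ids).
    std::vector<int> expand(const Grammar& g, int r) {
        std::vector<int> out;
        for (const Symbol& s : g.rules[r]) {
            if (s.isTerminal) out.push_back(s.id);
            else {
                std::vector<int> sub = expand(g, s.id);
                out.insert(out.end(), sub.begin(), sub.end());
            }
        }
        return out;
    }

    struct HotStream { int rule; long heat; std::vector<int> word; };

    std::vector<HotStream> detectHotStreams(const Grammar& g, long H,
                                            long minLen, long maxLen) {
        int n = (int)g.rules.size();

        // 1. Reverse postorder numbering from S: parents get smaller numbers.
        std::vector<int> order;
        std::vector<bool> visited(n, false);
        std::function<void(int)> dfs = [&](int r) {
            visited[r] = true;
            for (const Symbol& s : g.rules[r])
                if (!s.isTerminal && !visited[s.id]) dfs(s.id);
            order.push_back(r);   // postorder; reversed below
        };
        dfs(0);
        std::reverse(order.begin(), order.end());

        // 2. w_A.length for every non-terminal (children before parents).
        std::vector<long> length(n, 0);
        for (auto it = order.rbegin(); it != order.rend(); ++it)
            for (const Symbol& s : g.rules[*it])
                length[*it] += s.isTerminal ? 1 : length[s.id];

        // 3. Cold uses and heat, visiting parents before children. A hot
        //    non-terminal keeps its uses, so occurrences inside hot sub-trees
        //    are not counted again for its children.
        std::vector<long> coldUses(n, 0);
        coldUses[0] = 1;  // the start symbol occurs once in the parse tree
        std::vector<HotStream> hot;
        for (int r : order) {
            long heat = length[r] * coldUses[r];
            if (minLen <= length[r] && length[r] <= maxLen && heat >= H) {
                hot.push_back({r, heat, expand(g, r)});
                continue;
            }
            for (const Symbol& s : g.rules[r])  // propagate uses to children
                if (!s.isTerminal) coldUses[s.id] += coldUses[r];
        }
        return hot;
    }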
[0079] FIGS. 11 and 12 show an example 1100 of the analysis in the
code 1000 (FIG. 10) for the input data reference sequence 910 and
grammar 900 in FIG. 9. As a result of the Sequitur grammar analysis
750 (FIG. 7), the input data reference sequence has been parsed (as
shown by parse tree 1110) and sub-sequences grouped under
intermediate (non-terminal) nodes into the Sequitur grammar (1120).
Further, the Sequitur grammar analysis also yields the length of
the subsequence represented in each non-terminal node of the
grammar 1120. Accordingly, the information shown in the first three
columns (the non-terminal nodes, their children, and their lengths)
of the table 1200 is provided to the fast hot data stream detection
analysis. As shown in FIGS. 11 and 12, a non-terminal node is
considered the child of another non-terminal node if it is listed
on the right-hand side of the grammar rule of the other
non-terminal node in the Sequitur grammar 900 (FIG. 9).
[0080] In the fast hot data stream analysis code 1000, the analyzer
140 (FIG. 1) first executes instructions (1010) to perform a
reverse post-order numbering of the non-terminal nodes in the
grammar. For the example grammar, this results in numbering the
nodes S, A, B, and C as 0, 3, 1, and 2, respectively, as shown in
the index column of the table 1200 (FIG. 12) and illustrated in the
reverse postorder numbering tree 1130 (FIG. 11). This results in
the non-terminal nodes being numbered such that whenever a
non-terminal node (e.g., node C) is a child of another non-terminal
node (e.g., B), the number of the child node is greater (e.g.,
B.index < C.index). This property guarantees that the analysis does
not visit a non-terminal node before having visited all its
predecessors.
[0081] The analyzer 140 next determines at instructions 1020 in
code 1000 how often each non-terminal node occurs in the parse-tree
1110 (FIG. 11), which is represented in the "use" column of the
table 1200 (FIG. 12). Each of the non-terminal nodes is now
associated with two values, its number of "uses" and its
length, which are depicted conceptually in the uses:length tree
1140 (FIG. 11).
[0082] Finally, the analyzer 140 finds the number of "cold uses"
for each non-terminal node, which is the number of its uses not
attributable to a "hot" predecessor node. More
specifically, the analyzer finds hot non-terminal nodes such that a
non-terminal node is only considered hot if it accounts for enough
of the trace on its own, where it is not part of the expansion of
the other hot non-terminals. In the example grammar with a heat
threshold (H=8) and length restrictions (minLen=2, maxLen=7), only
the non-terminal node B is considered "hot," since its "heat"
(cold uses × length = 2 × 6 = 12) exceeds the heat threshold
(12 > 8). All uses of the non-terminal node C are completely
subsumed in its predecessor "hot" non-terminal node B, and C
therefore is not considered hot (its heat = cold
uses × length = 0 × 3 = 0). The non-terminal node A has a
single use apart from as a subsequence of the "hot" non-terminal
node B, but this single use is not sufficient to exceed the heat
threshold (A's cold uses × length = 1 × 2 = 2 < 8). The single
hot non-terminal node B represents the hot data stream
w_B = abcabc, which accounts for 12/15 = 80% of all data references
in this example burst.
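
Running the detection sketch on the FIG. 9 grammar with these parameters reproduces this result:

    #include <cstdio>

    int main() {
        Grammar g = fig9Grammar();
        // H = 8, minLen = 2, maxLen = 7, as in the worked example above.
        for (const HotStream& hs : detectHotStreams(g, 8, 2, 7)) {
            std::printf("hot non-terminal (rule %d), heat %ld, stream:",
                        hs.rule, hs.heat);
            for (int t : hs.word) std::printf(" %c", (char)('a' + t));
            std::printf("\n");
        }
        // Prints one line: the rule for B, heat 12, stream "a b c a b c".
        return 0;
    }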
4. Computing Environment
[0083] FIG. 13 illustrates a generalized example of a suitable
computing environment 1300 in which the described techniques can be
implemented. The computing environment 1300 is not intended to
suggest any limitation as to scope of use or functionality of the
invention, as the present invention may be implemented in diverse
general-purpose or special-purpose computing environments.
[0084] With reference to FIG. 13, the computing environment 1300
includes at least one processing unit 1310 and memory 1320. In FIG.
13, this most basic configuration 1330 is included within a dashed
line. The processing unit 1310 executes computer-executable
instructions and may be a real or a virtual processor. In a
multi-processing system, multiple processing units execute
computer-executable instructions to increase processing power. The
memory 1320 may be volatile memory (e.g., registers, cache, RAM),
non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or
some combination of the two. The memory 1320 stores software 1380
implementing the dynamic optimizer 100 (FIG. 1).
[0085] A computing environment may have additional features. For
example, the computing environment 1300 includes storage 1340, one
or more input devices 1350, one or more output devices 1360, and
one or more communication connections 1370. An interconnection
mechanism (not shown) such as a bus, controller, or network
interconnects the components of the computing environment 1300.
Typically, operating system software (not shown) provides an
operating environment for other software executing in the computing
environment 1300, and coordinates activities of the components of
the computing environment 1300.
[0086] The storage 1340 may be removable or non-removable, and
includes magnetic disks, magnetic tapes or cassettes, CD-ROMs,
CD-RWs, DVDs, or any other medium which can be used to store
information and which can be accessed within the computing
environment 1300. The storage 1340 stores instructions for the
dynamic optimizer software 1380.
[0087] The input device(s) 1350 may be a touch input device such as
a keyboard, mouse, pen, or trackball, a voice input device, a
scanning device, or another device that provides input to the
computing environment 1300. For audio, the input device(s) 1350 may
be a sound card or similar device that accepts audio input in
analog or digital form, or a CD-ROM reader that provides audio
samples to the computing environment. The output device(s) 1360 may
be a display, printer, speaker, CD-writer, or another device that
provides output from the computing environment 1300.
[0088] The communication connection(s) 1370 enable communication
over a communication medium to another computing entity. The
communication medium conveys information such as
computer-executable instructions, audio/video or other media
information, or other data in a modulated data signal. A modulated
data signal is a signal that has one or more of its characteristics
set or changed in such a manner as to encode information in the
signal. By way of example, and not limitation, communication media
include wired or wireless techniques implemented with an
electrical, optical, RF, infrared, acoustic, or other carrier.
[0089] The techniques herein can be described in the general
context of computer-readable media.
Computer-readable media are any available media that can be
accessed within a computing environment. By way of example, and not
limitation, with the computing environment 1300, computer-readable
media include memory 1320, storage 1340, communication media, and
combinations of any of the above.
[0090] The techniques herein can be described in the general
context of computer-executable instructions, such as those included
in program modules, being executed in a computing environment on a
target real or virtual processor. Generally, program modules
include routines, programs, libraries, objects, classes,
components, data structures, etc. that perform particular tasks or
implement particular abstract data types. The functionality of the
program modules may be combined or split between program modules as
desired in various embodiments. Computer-executable instructions
for program modules may be executed within a local or distributed
computing environment.
[0091] For the sake of presentation, the detailed description uses
terms like "determine," "generate," "adjust," and "apply" to
describe computer operations in a computing environment. These
terms are high-level abstractions for operations performed by a
computer, and should not be confused with acts performed by a human
being. The actual computer operations corresponding to these terms
vary depending on implementation.
[0092] In view of the many possible embodiments to which the
principles of our invention may be applied, we claim as our
invention all such embodiments as may come within the scope and
spirit of the following claims and equivalents thereto.
* * * * *