U.S. patent application number 09/792783, filed on 2001-02-23, was published by the patent office on 2002-11-14 as publication number 20020170034 for a method for debugging a dynamic program compiler, interpreter, or optimizer.
Invention is credited to Dean M. Deaver, Chris L. Reeve, and Norman Rubin.
Publication Number | 20020170034 |
Application Number | 09/792783 |
Family ID | 26906904 |
Filed Date | 2001-02-23 |
Publication Date | 2002-11-14 |
United States Patent Application | 20020170034 |
Kind Code | A1 |
Reeve, Chris L.; et al. | November 14, 2002 |
Method for debugging a dynamic program compiler, interpreter, or optimizer
Abstract
A method of debugging a dynamic computer program optimizer begins
by creating two copies of the contents of the registers in a
computer processor. One copy is loaded into pseudo-registers
and the other is saved for a verification test. A test sequence
comprising an intermediate representation of a program hot path is
loaded in a software buffer and executed. Register and memory read
and write commands in the test sequence are executed with the
pseudo-registers and a memory buffer. The second copy of the
register contents is then loaded back to the processor registers.
The program hot path is executed and register and memory read and
write commands are executed with the processor registers and system
memory. The contents of the registers and memories are compared and
if the contents match, the test sequence is valid. The test
sequence may also comprise a translated copy of the program hot
path.
Inventors: | Reeve, Chris L. (Brookline, MA); Deaver, Dean M. (Sterling, MA); Rubin, Norman (Cambridge, MA) |
Correspondence Address: |
CONLEY ROSE & TAYON, P.C.
P. O. BOX 3267
HOUSTON, TX 77253-3267
US |
Family ID: | 26906904 |
Appl. No.: | 09/792783 |
Filed: | February 23, 2001 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60212223 | Jun 16, 2000 | |
Current U.S. Class: | 717/127; 714/E11.21 |
Current CPC Class: | G06F 11/3688 20130101 |
Class at Publication: | 717/127 |
International Class: | G06F 009/44 |
Claims
What is claimed is:
1. A computer system, comprising: a processor configured with
read/write register locations and configured to execute a computer
program; a system memory coupled to said processor; a run-time
program optimizer loaded into the memory configured to create and
execute an alternate representation of the computer program; and at
least one input/output device coupled to at least one processor;
wherein the optimizer is debugged by a debugger that compares the
outputs of the optimizer versus the outputs of the processor.
2. The computer system of claim 1 wherein: the alternate
representation of the computer program is an intermediate
representation or a translation of a hot path within the computer
program; wherein the debugger completes a verification process that
compares the output of the alternate representation of the computer
program with the output of the original hot path.
3. The computer system of claim 2 further comprising: an
interpreter within the optimizer that computes the results of the
alternate representation.
4. The computer system of claim 3 further comprising:
pseudo-registers configured to hold copies of the contents of the
processor registers; and a memory buffer configured to hold updated
copies of system memory blocks; wherein the pseudo-registers and
pseudo-memories are used to store data changes as instructed during
execution of the alternate representation.
5. The computer system of claim 4 wherein: the debugger compares
the contents of the pseudo-registers to the contents of the
processor registers and wherein if the contents are the same, the
alternate representation is classified as being as correct as the
original code to which the alternate representation is
compared.
6. The computer system of claim 4 wherein: the debugger compares
the contents of the memory buffer to the contents of the processor
memory and wherein if the contents are the same, the alternate
representation is classified as being as correct as the original
code to which the alternate representation is compared.
7. A method of debugging a dynamic program optimizer, comprising:
creating a plurality of copies of the contents of the registers in
a computer processor; loading a first copy of the register contents
to pseudo-registers; loading a test sequence comprising an
intermediate representation of a program hot path in a software
buffer; executing instructions in the test sequence and fulfilling
register read and write commands with the pseudo-registers; loading
a second copy of the register contents to the processor registers;
executing instructions in the program hot path and fulfilling
register read and write commands with the processor registers; and
checking contents of the registers and pseudo-registers; wherein if
the register contents match, the test sequence is valid and wherein
if the register contents do not match, the test sequence is
invalid.
8. The method of claim 7 wherein: the test sequence comprises a
translated copy of the program hot path.
9. The method of claim 8 wherein: the program hot path comprises an
intermediate representation of a program hot path trace.
10. The method of claim 7 further comprising: executing
instructions in the test sequence and fulfilling memory write
commands to a memory buffer; fulfilling memory read commands from
the memory buffer if the requested memory exists in the memory
buffer; fulfilling memory read commands from system memory if the
requested memory does not exist in the memory buffer; executing
instructions in the program hot path and fulfilling memory read and
write commands with system memory; and checking contents of the
memory and memory buffer; wherein if the memory contents match, the
test sequence is valid and wherein if the memory contents do not
match, the test sequence is invalid.
11. The method of claim 7 further comprising: debugging the
intermediate representation before the optimizer analyzes the
interpreter output to translate the hot path.
12. The method of claim 8 further comprising: debugging the
translated copy before the optimizer overwrites the program hot
path with the translated copy.
13. The method of claim 7 further comprising: storing the starting
and stopping decision points in the program hot path.
14. The method of claim 13 further comprising: creating bailout
points in the test sequence corresponding to decision points in the
original program and using the same start and stop points for the
test sequence as for the program hot path.
15. A computer program optimizer debugger, comprising: read/write
access to registers in a computer system processor; read/write
access to computer system memory; a temporary memory location in
the computer system memory; a temporary register location in the
computer system memory; and wherein the debugger is configured to
make duplicate copies of the contents of the registers of the
computer processor and wherein one copy of the register contents is
placed in the temporary register location; and wherein after a
program optimizer creates an intermediate representation of a
portion of a computer program, the debugger performs a test
execution of the instructions in the intermediate representation
using data in the temporary register and temporary memory locations
and verifies that the contents of the temporary register match the
contents of the processor registers and that the temporary memory
location matches the contents of the system memory after a
verification execution of the original portion of the computer
program by the computer system.
16. The system of claim 15 wherein: during the test execution, the
debugger reads and writes exclusively to the temporary register
location, writes exclusively to the temporary memory location and
reads from the temporary memory location or from the system memory
location if the requested data does not exist in the temporary
memory location.
17. The system of claim 16 wherein: during the verification
execution, the computer system reads and writes exclusively to the
processor register location and reads and writes exclusively to the
system memory.
18. The system of claim 15 wherein: the intermediate representation
is an interpretable alternative representation of a hot path in a
program image.
19. The system of claim 15 wherein: the intermediate
representation is a translated copy containing machine instructions
of a hot path in a program image.
20. The system of claim 15 wherein: if debugger verification shows
that the contents of the temporary register location do not match
the contents of the processor registers or that the temporary
memory location does not match the contents of the system memory,
the debugger reports an error.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to the following commonly
assigned provisional application entitled:
[0002] "A Dynamic Optimization and Specialization Tool," Serial No.
60/212,223, filed Jun. 16, 2000, which is hereby incorporated by
reference herein.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0003] Not applicable.
BACKGROUND OF THE INVENTION
[0004] 1. Field of the Invention
[0005] The present invention generally relates to dynamic, run-time
optimization and translation of binary executables. More
particularly, the invention relates to real-time debugging of
components that copy or create new code in such optimization
systems.
[0006] 2. Background of the Invention
[0007] Improving run-time software application performance in
microprocessor systems is an important means of improving processor
throughput and execution speeds. While it is possible to optimize
application executables at compile time (before the application is
ever run by an end-user), such optimizations cannot account for all
the possible variables that may affect run-time performance. A
priori run-time optimization is difficult to predict and implement
because most executable programs operate in varying systems with
varying shared libraries and varying inputs. Thus, while these
applications may be executed on high-performance computer systems
and the executables may be optimized using a static optimizing
compiler, true run-time optimization may still offer an additional
measure of improved application performance.
[0008] As with any software program, a run-time optimizer may be
debugged. However, unlike other software programs, a dynamic
optimizer is particularly difficult to debug because each time a
program is run, small differences in timing or machine load can
cause a dynamic optimizer to produce different output. The
optimizer may also start running an optimization at different
places on different runs because of different timing situations.
Thus, unlike conventional software programs, debug situations and
start points for dynamic optimizer programs are almost never
repeatable.
[0009] Another problem with debugging dynamic optimizers has to do
with one particular function of the optimizer. Ideally, a dynamic
optimizer will analyze a frequently executed path of executable
program code and determine if that path can be optimized by taking
advantage of invariance or pseudo-invariance of instructions within
that path. More specifically, the optimizer will often invoke an
interpreter that interprets the instructions in a program path and
provides the results of the interpretation to the optimizer. The
optimizer will then analyze the results and determine, among other
things, if instructions in the program path are pseudo-invariant.
An instruction is invariant or constant if it produces the same
output value every time it is executed. An instruction is
pseudo-invariant if it is invariant or if it produces a limited set
of output values almost every time it is executed. An optimizer may
advantageously use this pseudo-invariance information to calculate
values for variables and instructions ahead of time and substitute
a translated, less costly (in terms of system resources) series of
instructions in place of the original program code. Thus, because
the dynamic optimizer executes a code translation, the optimizer
may also be referred to as a dynamic translator.
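The distinction between invariant and pseudo-invariant instructions
can be illustrated with a minimal Python sketch. It is not taken
from the disclosure: the function name classify_instruction and the
thresholds max_values and coverage are hypothetical stand-ins for
"a limited set of output values almost every time".

```python
from collections import Counter


def classify_instruction(observed_outputs, max_values=2, coverage=0.95):
    """Classify an instruction from the output value seen on each execution.

    max_values and coverage are illustrative assumptions; the text does not
    give numeric thresholds. An invariant instruction is also pseudo-invariant
    by the definition in the text; the stronger label is reported here.
    """
    if not observed_outputs:
        return "unknown"
    counts = Counter(observed_outputs)
    if len(counts) == 1:
        return "invariant"
    # Share of executions covered by the max_values most frequent outputs.
    covered = sum(n for _, n in counts.most_common(max_values))
    if covered / len(observed_outputs) >= coverage:
        return "pseudo-invariant"
    return "variant"
```

An optimizer could precompute results only for instructions reported
as invariant or pseudo-invariant and leave the rest untouched.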
[0010] Within one execution of a program, a dynamic optimizer may
rewrite or translate code multiple times. Thus, a given code
sequence with an error may be overwritten by a subsequent code
sequence and the exact nature of the error, the time the error was
generated, and any possible reasons for the error may be lost. A
post-processing debugger is therefore incapable of capturing
real-time debug information and will not completely aid a software
developer in debugging the dynamic optimizer.
[0011] One prior method used to debug dynamic optimizers involves a
deterministic playback technique. In this method, an initial
execution records the results of all decision points into a file. A
decision point in a program is a point in a computer program where
a decision determines the subsequent path. For example, an IF
statement or a WHILE statement may qualify as a decision point.
After the first execution records these decision points, a second
execution then uses this information to remake the decisions once
again. The results of the executions are compared and checked for
discrepancies. This particular method is useful, but may be
difficult to employ if the dynamic optimizer uses multiple threads
or if the decision points are difficult to locate. This method is
also problematic in that it only checks end results and is not
capable of checking intermediate interpreter results or
intermediate translations. In general, this method is also complex
to implement.
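The deterministic playback technique described above can be sketched
in a few lines of Python. This is an illustration under stated
assumptions, not the prior-art implementation: the class DecisionLog
and the toy program collatz_steps are hypothetical, and the log is
kept in memory rather than written to a file.

```python
class DecisionLog:
    """Records decision outcomes on a first run and replays them on a second."""

    def __init__(self, recorded=None):
        self.replaying = recorded is not None
        self.outcomes = list(recorded) if self.replaying else []
        self._next = 0

    def decide(self, condition):
        """Return the branch outcome: computed when recording, stored when replaying."""
        if self.replaying:
            outcome = self.outcomes[self._next]
            self._next += 1
            return outcome
        outcome = bool(condition)
        self.outcomes.append(outcome)
        return outcome


def collatz_steps(n, log):
    """Toy program whose WHILE and IF decision points all go through the log."""
    steps = 0
    while log.decide(n != 1):
        if log.decide(n % 2 == 0):
            n //= 2
        else:
            n = 3 * n + 1
        steps += 1
    return steps


# First execution records every decision; second execution remakes them.
recording = DecisionLog()
first_result = collatz_steps(6, recording)
second_result = collatz_steps(6, DecisionLog(recording.outcomes))
```

Only the end results (first_result and second_result) are available
for comparison, which reflects the limitation noted in the text:
intermediate interpreter results are not checked.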
[0012] It is therefore desirable to develop a method of debugging
a dynamic optimizer program that provides error information during
the interpretation and translation processes. The method may
advantageously offer software developers more detailed run-time
information and provide a precise means of debugging an optimizer,
including the interpreter and translator operations within the
optimizer.
BRIEF SUMMARY OF THE INVENTION
[0013] The problems noted above are solved in large part by a
method of debugging a dynamic computer program optimizer. The
method may be applied to portions of the optimizer, including the
interpreter or the translator or any component that creates a copy
or a new version of computer code. The preferred embodiment permits
checking of the newly created code against existing code that is
presumed to be correct before proceeding with interpretation or
code replacement.
[0014] The debug method begins after the new code or an
intermediate representation of the existing code is generated. The
debugger then reads the computer processor registers and creates
two copies of the contents of those registers. One copy is loaded
into temporary pseudo-registers and the other is saved for a
verification test. If the debugger is checking the interpreter, the
new code, which may be called the test sequence, comprises an
intermediate representation of a program hot path. The intermediate
representation is loaded in a software buffer and executed. Any
register read and write commands in the test sequence are executed
with the pseudo-registers. In addition, a memory buffer is created
but is initially left empty. Any memory write requests in the test
sequence are executed to the memory buffer instead of system
memory. A memory read request will force the debugger to first
check the memory buffer for the requested data and if it does not
exist in the memory buffer, the data is read from system
memory.
[0015] At the end of the test sequence execution, the second copy
of the register contents is loaded back to the processor
registers. The original program hot path is executed and all
register and memory read and write commands are executed with the
processor registers and system memory. Following this verification
test, the contents of the registers are compared to the
pseudo-registers and if the contents match, the test sequence is
potentially valid. The debugger then proceeds to check the memory
contents. The memory buffer is checked against the relevant
addresses in system memory and if the contents match (and if the
register contents matched), the test sequence is then considered
valid. If either the register or memory contents do not match the
contents of the pseudo-registers and the memory buffer,
respectively, the test sequence is considered invalid and the
debugger reports a mismatch.
[0016] In addition to testing the interpreter, the debugger can
test every phase of the optimizer that produces a copy or
intermediate representation of the existing code. For example, the
test sequence may instead comprise a translated copy of the program
hot path. The results of executing the translated copy of the code
are then verified against the original code in the same manner as
above.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] For a detailed description of the preferred embodiments of
the invention, reference will now be made to the accompanying
drawings in which:
[0018] FIG. 1 is an illustrative diagram of a simple computer which
executes a program and implements a program optimizer that uses the
preferred embodiment;
[0019] FIG. 2 is a functional block diagram of the logical
components in the computer of FIG. 1;
[0020] FIG. 3 is a flow diagram showing the procedure by which the
preferred embodiment creates an optimized hot path IR and
translation in a program running on the computer of FIG. 1;
[0021] FIG. 4 shows the trace extraction of a hot path in a
computer program running on the computer of FIG. 1; and
[0022] FIG. 5 shows a flow diagram describing the debug validation
used in the preferred embodiment.
NOTATION AND NOMENCLATURE
[0023] Certain terms are used throughout the following description
and claims to refer to particular system components. As one skilled
in the art will appreciate, computer companies may refer to a
component by different names. This document does not intend to
distinguish between components that differ in name but not
function. In the following discussion and in the claims, the terms
"including" and "comprising" are used in an open-ended fashion, and
thus should be interpreted to mean "including, but not limited to .
. . ". Also, the term "couple" or "couples" is intended to mean
either an indirect or direct electrical connection. Thus, if a
first device couples to a second device, that connection may be
through a direct electrical connection, or through an indirect
electrical connection via other devices and connections.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0024] The preferred embodiment is directed to a technique and
method for verifying the correctness of translation steps in an
executable program optimizer. The technique involves translating a
portion of original program code into an alternate representation,
such as an intermediate representation (IR) or machine
instructions, then providing real-time verification of the
translation process by ensuring that the translated representation,
when executed or interpreted, produces the same results as a
version of code that is presumed to be correct.
[0025] FIG. 1 shows a general purpose computer 10 that is suitable
for this technique. The computer preferably includes a processor
tower or housing 20, in which the computer processor, memory and
storage media are housed. The computer 10 may be a desktop
computer, a dedicated server, or some other type of computer such
as a laptop or portable computer. The computer 10 also preferably
includes input and output devices such as a keyboard 30, mouse,
display 40, printer, or other devices that permit user
interface.
[0026] FIG. 2 shows a simplified diagram of the main chipset for
computer 10. The computer 10 preferably includes a processor 200, a
data cache 210, a logic device 220 which may operate as a memory
controller and/or a bus bridge device, an I/O controller 230, a
graphics controller 260, and a memory 240. The logic device 220
couples the processor 200 to the memory 240 and to various
peripheral devices through a primary expansion bus (Host Bus) 250
such as a Peripheral Component Interconnect (PCI) bus or some other
suitable architecture. The I/O controller 230 typically interfaces
to basic input/output devices such as the keyboard 30 of FIG. 1.
The graphics controller 260 may be coupled to the logic device 220
via an Accelerated Graphics Port bus 270 to drive the display
device 40 of FIG. 1. Processor 200 comprises a data cache 210 and
processor registers 280. Execution units within the processor 200
are capable of reading data more quickly from the cache 210 than
from main memory 240. The processor registers 280 include general
purpose registers (e.g., integer and floating point registers) and
control and status registers such as program counters and interrupt
control registers.
[0027] It should be noted that the devices shown in FIG. 2
represent a simplified chipset commonly found in a computer 10 and
may include other devices not shown in FIG. 2. For instance, the
computer 10 may include a plurality of processors 200, memory
arrays 240, and logic devices 220. The computer 10 may also provide
access to a plurality of expansion buses and include other
expansion devices. In general, any of a wide range of computer
systems using a variety of program optimizers may implement the
preferred embodiment.
[0028] Referring now to FIG. 3, the computer 10 is configured to
execute any number of conventional programs (i.e., an executable
image, EXE). The program image 300 is created and placed in memory
240 where it is accessed and executed by processor 200 (not shown).
Within the program image 300, there are any number of hot paths
310, which are program paths that are executed frequently. A
program optimizer, such as the Wiggins/Redstone optimizer, is
capable of locating such hot paths 310 and the preferred embodiment
of the invention provides a means of verifying the conversion
and/or optimization of the code in any given hot path 310. The
process of extracting code from a hot path is shown graphically in
FIG. 4.
[0029] On the left side of FIG. 4 is a network of decision points,
numbered 1 through 7, that may fall within the instructions of a
computer program. At each decision point, the program may follow
one of several possible paths. The solid line graphically depicts
the path actually followed by a program as it is being executed.
The dotted lines depict paths that may have been chosen at a
decision point.
[0030] A dynamic program optimizer is capable of tracking the paths
taken by a program and, if it is determined that a path is taken
more often than others, that path may be labeled as a hot path.
This hot path is then examined and executed a plurality of times to
determine possible ways to improve the code within the hot path.
Once a hot path is identified, it is converted to a linear trace
such as shown on the right side of FIG. 4. The trace code does not
include the decision points present in the original code, but it
does include bailout points corresponding to each of the original
decision points. Each bailout point provides a landmark that
signifies where in the original code trace instructions belong and
also provides a means of returning to the original code in the
proper location.
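Hot path selection and trace extraction can be sketched as follows.
This Python fragment is only an illustration: the function names
find_hot_path and linearize, the threshold parameter, and the
tuple-based trace format are assumptions, not details of the
Wiggins/Redstone optimizer.

```python
from collections import Counter


def find_hot_path(executed_paths, threshold):
    """Return the most frequently taken path if it was taken at least
    threshold times, otherwise None. threshold is an assumed tuning knob."""
    counts = Counter(tuple(path) for path in executed_paths)
    if not counts:
        return None
    path, hits = counts.most_common(1)[0]
    return list(path) if hits >= threshold else None


def linearize(path, blocks):
    """Flatten a hot path into a linear trace.

    blocks maps each decision-point number to the instructions that follow
    it on the hot path. Each decision point becomes a bailout marker that
    records where in the original code the following instructions belong.
    """
    trace = []
    for point in path:
        trace.append(("bailout", point))
        trace.extend(("instr", instruction) for instruction in blocks[point])
    return trace
```

For example, with decision points numbered as in FIG. 4, a program
that follows 1, 2, 4, 7 on three of four runs yields that path as
the hot path, and linearize emits one bailout marker per decision
point ahead of the instructions that follow it.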
[0031] Referring again to FIG. 3, the trace code is then used by
the program optimizer to create an alternate representation of the
instructions within the hot path. Whereas the original executable
program image 300 is placed in memory 240, an alternate
representation of a hot path 310 is created and copied to a
software buffer 320. The alternate representation is preferably
referred to as an Intermediate Representation (IR).
[0032] The IR is interpreted to check for pseudo-invariance in
instructions or other non-varying information such as memory reads
and writes. The results of the IR interpretation are analyzed by
the program optimizer which then rewrites or translates the code
within the trace in a way that preferably takes advantage of any
invariance or pseudo-invariance within the code. The translated
code 340 is then written in place of the hot path 310 into the
original program image 300.
[0033] As shown in FIG. 3, the optimization process involves
writing any number of distinct versions of code or intermediate
representations. The preferred embodiment is capable of verifying a
version immediately after it is created. The method by which a
newly created code is verified is shown in FIG. 5.
[0034] Referring now to FIG. 5, the verification process begins 500
after the new code is created. The first step in the verification
process is to make two copies 505 of the contents of the CPU
registers 280. Each copy will be used in a separate execution path.
One execution path is a test execution path and the other is a
verification execution path. In the test execution path shown on
the left side of FIG. 5, the first step in the path is to copy the
register contents from step 505 to pseudo-registers 510, which may
be nothing more than temporary memory locations capable of storing
the register values in unique locations.
[0035] Once the pseudo-registers are created, test execution 520 of
the newly created code may begin. The instructions within the new
code are computed as they would be by the CPU. However, instead of
reading register contents from the CPU registers 280, all register
reads and writes 540 are performed through the pseudo-registers.
Similarly, memory accesses differ as well. If a block of data must
be written to memory, that data is written instead to a memory
buffer 530, which like the pseudo-registers, may simply be a
temporary memory location capable of storing memory, address,
coherence or any other information that is stored in system memory
240.
[0036] If a block of data must be read from memory, a decision is
first made 545 as to whether that particular memory address has
been written to a memory buffer during this test execution 520. If
a memory address has not been accessed (i.e., not written), the
data is read from system memory 550. If the data block has been
altered (i.e., written), then the data must be retrieved from the
memory buffer 555. This decision process guarantees that the
correct version of memory data is retrieved and that the contents
of system memory are not changed. At the end of the test execution,
the contents of the pseudo-registers and the memory buffer are kept
for comparison as described below.
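The read and write rules for the memory buffer amount to a
copy-on-write overlay on system memory. A minimal Python sketch
follows, assuming memory is modeled as a dictionary from address to
value; the class name MemoryBuffer and its methods are hypothetical.

```python
class MemoryBuffer:
    """Overlay used during test execution: writes stay in the buffer and
    reads fall through to system memory for addresses never written."""

    def __init__(self, system_memory):
        self._system = system_memory  # read only; never modified here
        self.written = {}             # address -> value written during the test

    def write(self, address, value):
        # Memory writes go to the buffer instead of system memory.
        self.written[address] = value

    def read(self, address):
        # Written during this test execution? Use the buffered value.
        if address in self.written:
            return self.written[address]
        # Otherwise the data is read from system memory.
        return self._system[address]
```

Because write never touches the underlying dictionary, system memory
is unchanged when the verification execution starts.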
[0037] After the test execution 520 is complete, the verification
execution path is started by copying the second copy of the CPU
register contents 280 from step 505 back into the CPU registers
280. This is done to guarantee that the starting point for the
verification path is the same as it was for the test path. Once the
register contents are copied, verification execution 525 begins and
the instructions within the original program code (e.g., hot path)
are executed by the CPU 200. All memory reads and writes 560 and
register reads and writes 570 are performed as during normal
program execution. That is, no pseudo-registers or memory buffer is
used in the verification execution 525. Verification execution 525
stops when the end of the code is reached. It should be noted that
the decision points and bailout points described above in
conjunction with FIG. 4 provide start and stop points to guarantee
that the new code that is checked in the test execution 520 and the
original code that is checked in the verification execution 525
begin and end at the same points in the original program.
[0038] In accordance with the preferred embodiment, once the test
and verification executions 520, 525 are completed, the contents of
the memory buffer are compared with the contents of system memory
575 and the contents of the pseudo-registers are compared with the
contents of the CPU registers 580. If all the values are the same,
then the code creation process was successful and the program
optimizer may then proceed. If the contents of the registers and
memories are not equal, then the verification will indicate an
error and the code will be flagged as invalid.
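The complete test and verification flow of FIG. 5 can be sketched
end to end in Python. This is a simplified model, not the preferred
embodiment itself: registers and memory are dictionaries, and the
four-operation instruction set (li, add, load, store) and the
function names run_ops and verify are hypothetical.

```python
def run_ops(ops, read_reg, write_reg, read_mem, write_mem):
    """Execute a toy instruction list through the supplied accessors."""
    for op, dst, src in ops:
        if op == "li":       # load immediate: reg[dst] = src
            write_reg(dst, src)
        elif op == "add":    # reg[dst] = reg[dst] + reg[src]
            write_reg(dst, read_reg(dst) + read_reg(src))
        elif op == "load":   # reg[dst] = mem[src]
            write_reg(dst, read_mem(src))
        elif op == "store":  # mem[dst] = reg[src]
            write_mem(dst, read_reg(src))
        else:
            raise ValueError(f"unknown op {op!r}")


def verify(test_sequence, original, registers, memory):
    """Return a list of mismatches; an empty list means the test sequence
    produced the same registers and memory as the original code."""
    pseudo = dict(registers)  # first copy, loaded into pseudo-registers
    saved = dict(registers)   # second copy, kept for the verification run
    buffer = {}               # memory buffer, initially empty

    # Test execution: pseudo-registers and memory buffer only.
    run_ops(test_sequence,
            pseudo.__getitem__, pseudo.__setitem__,
            lambda addr: buffer[addr] if addr in buffer else memory[addr],
            buffer.__setitem__)

    # Verification execution: reload the saved copy, then use real state.
    registers.clear()
    registers.update(saved)
    run_ops(original,
            registers.__getitem__, registers.__setitem__,
            memory.__getitem__, memory.__setitem__)

    # Each mismatch is (kind, location, test value, correct value).
    mismatches = []
    for name in sorted(set(pseudo) | set(registers)):
        if pseudo.get(name) != registers.get(name):
            mismatches.append(("reg", name, pseudo.get(name), registers.get(name)))
    # Only addresses present in the memory buffer are compared.
    for addr, value in buffer.items():
        if memory.get(addr) != value:
            mismatches.append(("mem", addr, value, memory.get(addr)))
    return mismatches


original = [("load", "r1", 100), ("add", "r1", "r2"), ("store", 104, "r1")]
good = [("li", "r1", 7), ("add", "r1", "r2"), ("store", 104, "r1")]
bad = [("li", "r1", 8), ("add", "r1", "r2"), ("store", 104, "r1")]
good_report = verify(good, original, {"r1": 0, "r2": 5}, {100: 7})
bad_report = verify(bad, original, {"r1": 0, "r2": 5}, {100: 7})
```

Here good replaces the memory load with a precomputed constant and
passes, while bad uses the wrong constant and is reported with both
the incorrect value and the value the original code produced.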
[0039] The advantage of this method is that the IR and the
translated code may be checked immediately after the code is
created. Furthermore, if errors are generated, a system programmer
will know which registers and memory locations were incorrect as
well as what their correct values should be. The preferred
embodiment therefore provides an efficient method of providing
real-time debug information as well as preventing the incorporation
of code that will lead to faulty results. Note also that the above
preferred embodiment may be implemented after an IR is created or
after a translated piece of code is created. Thus, the debugger is
fully capable of debugging any phase of the optimizer that produces
an alternate representation of the original program code or a
portion thereof.
[0040] The above discussion is meant to be illustrative of the
principles and various embodiments of the present invention.
Numerous variations and modifications will become apparent to those
skilled in the art once the above disclosure is fully appreciated.
For example, the description above included a test execution
performed prior to a verification execution. The procedure may
equally be executed in reverse order, with the verification
execution path coming first. It is intended that the
following claims be interpreted to embrace all such variations and
modifications.
* * * * *