U.S. patent application number 09/153382 was filed with the patent office on 2001-08-09 for method and apparatus for finding bugs related to garbage collection in a virtual machine.
Invention is credited to UNGAR, DAVID.
Application Number | 20010013117 09/153382 |
Document ID | / |
Family ID | 22546986 |
Filed Date | 2001-08-09 |
United States Patent
Application |
20010013117 |
Kind Code |
A1 |
UNGAR, DAVID |
August 9, 2001 |
METHOD AND APPARATUS FOR FINDING BUGS RELATED TO GARBAGE COLLECTION
IN A VIRTUAL MACHINE
Abstract
A method and apparatus for finding bugs related to garbage
collection in a virtual machine. For each possible garbage
collection point in a stream of execution, a compiler in the
virtual machine provides a map that specifies live pointer
locations in the stack. In addition, the map identifies those
locations in the stack that contain other forms of live data, such
as integers. All other locations are considered "dead," i.e., no
longer in use or never used. At each garbage collection point,
"dead" locations in the stack are overwritten with an invalid
pointer value. Because of the overwriting process, any bug in the
compiler that causes a live pointer to be omitted from the map also
causes the omitted pointer to be overwritten with the invalid
pointer value. Regardless of whether garbage collection is actually
performed at the garbage collection point where the pointer was
omitted from the compiler-generated map, subsequent execution steps
that reference the omitted pointer trigger an invalid pointer
error. The invalid pointer error may be trapped and identified as a
compiler bug related to map generation in the garbage collection
process.
Inventors: |
UNGAR, DAVID; (PALO ALTO,
CA) |
Correspondence
Address: |
THE HECKER LAW GROUP
1925 CENTURY PARK EAST
SUITE 2300
LOS ANGELES
CA
90067
US
|
Family ID: |
22546986 |
Appl. No.: |
09/153382 |
Filed: |
September 15, 1998 |
Current U.S.
Class: |
717/141 ;
714/E11.024 |
Current CPC
Class: |
G06F 11/0766 20130101;
G06F 11/0718 20130101; Y10S 707/99942 20130101; Y10S 707/99953
20130101; G06F 11/0751 20130101 |
Class at
Publication: |
717/5 |
International
Class: |
G06F 009/44 |
Claims
1. In a computer system, a method comprising: determining, at each
gc-point, a memory location that does not contain a live pointer or
live data, said determination based on a map; and overwriting said
memory location with an invalid pointer value.
2. The method of claim 1, further comprising implementing a trap
for a reference made via said invalid pointer value.
3. The method of claim 1, further comprising generating said map in
a compiler.
4. The method of claim 1, wherein said method is implemented within
a virtual machine environment.
5. The method of claim 1 wherein said memory location comprises a
register or local variable of a stack.
6. The method of claim 1, further comprising: determining from said
map a second memory location designated as containing a live data
value; and issuing a warning if the contents of said second memory
location match a possible object reference.
7. The method of claim 6, wherein said step of issuing a warning
comprises logging said warning in a log file.
8. The method of claim 6, wherein said step of issuing a warning
comprises displaying a warning dialog on a display device.
9. A computer program product comprising: a computer usable medium
having computer readable code embodied therein for debugging a
garbage collection process, said computer program product
comprising: computer readable code configured to cause a computer
to determine, at each gc-point, a memory location that does not
contain a live pointer or live data, said determination based on a
map; and computer readable code configured to cause a computer to
overwrite said memory location with an invalid pointer value.
10. The computer program product of claim 9, further comprising
computer readable code configured to cause a computer to implement
a trap for a reference made via said invalid pointer value.
11. The computer program product of claim 9, further comprising
computer readable code configured to cause a computer to generate
said map in a compiler.
12. The computer program product of claim 9, wherein said computer
readable code is configured to be executed within a virtual machine
environment.
13. The computer program product of claim 9 wherein said memory
location comprises a register or local variable of a stack.
14. The computer program product of claim 9, further comprising:
computer readable code configured to cause a computer to determine
from said map a second memory location designated as containing a
live data value; and computer readable code configured to cause a
computer to issue a warning if the contents of said second memory
location match a possible object reference.
15. The computer program product of claim 14, wherein said computer
readable code configured to cause a computer to issue a warning
comprises computer readable code configured to cause a computer to
log said warning in a log file.
16. The computer program product of claim 14, wherein said computer
readable code configured to cause a computer to issue a warning
comprises computer readable code configured to cause a computer to
display a warning dialog on a display device.
17. A virtual machine comprising: a stack having a plurality of
memory locations; a map associating a respective state with one or
more of said plurality of memory locations; and a process
operatively coupled to said map and said stack, said process
configured to overwrite at each gc-point one or more memory
locations associated with a dead state, said overwriting comprising
replacing the contents of a respective memory location with an
invalid pointer value.
18. The virtual machine of claim 17, wherein said virtual machine
is configured to implement a trap for operations that access said
invalid pointer value.
19. The virtual machine of claim 17, further comprising a compiler
configured to generate said map.
20. The virtual machine of claim 17, wherein said one or more
memory locations comprise registers and local variables.
21. The virtual machine of claim 17, wherein said process is
further configured to issue a warning regarding one or more memory
locations that are associated with a live data state and whose
contents match an object reference.
22. The virtual machine of claim 21, wherein said warning is issued
as a log entry to a log file.
23. The virtual machine of claim 21, wherein said warning is issued
as a warning dialog to a display.
24. A virtual machine comprising: means for determining, at each
gc-point, a memory location that does not contain a live pointer or
live data, said determination based on a map; and means for
overwriting said memory location with an invalid pointer value.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] This invention relates to the field of computer systems,
and, more specifically, to memory management garbage collection
processes.
[0003] Sun, Sun Microsystems, the Sun logo, Java and all Java-based
trademarks and logos are trademarks or registered trademarks of Sun
Microsystems, Inc. in the United States and other countries. All
SPARC trademarks are used under license and are trademarks of SPARC
International, Inc. in the United States and other countries.
Products bearing SPARC trademarks are based upon an architecture
developed by Sun Microsystems, Inc.
[0004] 2. Background Art
[0005] An important aspect of memory management in any computer
system is garbage collection. Garbage collection (GC) refers to the
process of reclaiming portions of main memory that are no longer in
use by the system or any running applications. In an
object-oriented system, garbage collection is typically carried out
to reclaim memory allocated to objects and other data structures
(e.g., arrays, etc.) that are no longer referenced by an
application. The reclaimed memory can then be re-allocated to store
new objects or data structures.
[0006] In a Java.TM. virtual machine, garbage collection is
performed to reclaim memory space from a region of memory known as
the heap. The heap is used to store objects and arrays that are
referenced by pointers stored as local variables in activation
records, or "stack frames," of a stack associated with an
individual thread of execution in the virtual machine. The
invocation of a method by a given thread results in the creation of
a new stack frame that is "pushed" onto the stack of that thread.
References to objects on the heap may be removed by an active
(i.e., currently executing) method setting the respective pointer
to a "null" value, or by removal of a respective stack frame in
response to completion of its associated method.
[0007] In any thread of execution, there may be many garbage
collection points, or "gc-points," where garbage collection can
occur. However, actual garbage collection typically takes place at
only a fraction of these possible gc-points each time the given
thread of execution is run. In virtual machine implementations
using a compiler, the compiler provides information at each
gc-point about the set of locations in the stack frames that
contain pointers to objects or arrays. Garbage collection is
performed by determining which objects and arrays in the heap are
referenced from within the set of locations specified by the
compiler, and reclaiming those objects and arrays that are no
longer referenced.
[0008] Unfortunately, the compiler may have an error (i.e., a
"bug") that causes a stack location to be mistakenly omitted from
the specified set of pointer locations. This type of compiler bug
can result in the reclaiming of an object or array when a reference
still exists. Also, for a type of garbage collection known as
"copying" garbage collection, this compiler bug may result in a
failure to update a pointer reference to point to the appropriate
copy of the associated object or array. In either case, future
references made to the object or array through the omitted stack
location can result in improper execution of an application. This
bug is garbage collection-related, but it may appear to be a code
generation bug, making detection and correction difficult.
[0009] To provide a better understanding of the problems associated
with garbage collection in a virtual machine, an overview of
garbage collection techniques is provided below.
[0010] Garbage Collection
[0011] Garbage collection may be either conservative or exact.
Conservative garbage collection involves scanning memory space for
stored values that match the address of an object (or other memory
structure) that is being considered for collection. If a matching
value is not found in the memory being scanned, then no references
to the object exist, and the object may be safely collected. If a
matching value is found, it is assumed that the value is a
reference (e.g., a pointer) to the object under consideration, and
the object is not collected. This assumption means that an object
is not collected even if the matching memory value is not a
reference to the object, but rather a data value that
coincidentally matches the base address of the object.
[0012] In exact garbage collection, only true references (pointers)
are considered in a scan, so coincidentally matching data values
are ignored in the collection process. This means that an object
without any associated references is always considered garbage in a
scan, and more efficient collection is achieved. However, to
perform exact garbage collection, the scanning process must have
information regarding which memory locations contain live
references (i.e., active, non-null references). Only those memory
locations containing live references are scanned to determine
reference matches for objects under consideration for
collection.
[0013] To provide more efficient use of memory space in terms of
compaction, "copying" garbage collection is commonly implemented.
In copying garbage collection, the memory space is divided into
regions and an object transfer is performed. When garbage
collection is carried out, objects in a portion of memory referred
to as "from" space are copied to a portion referred to as "to"
space. Those objects in "from" space that are considered "garbage"
by the scan process are not copied to "to" space. The process of
copying the objects results in reduced fragmentation of the memory
space and better compaction.
[0014] FIG. 1 is a flow diagram illustrating a copying garbage
collection process. In step 100, the set of references to be
scanned is determined. For example, a mechanism may be provided
that tracks the creation of references, and maintains a list of
current references for exact garbage collection. This list may be
used to define the set to be scanned in step 100. In step 101, the
garbage collection process obtains the first reference from the set
of references. In step 102, the reference is analyzed to determine
if the reference points to an object in "from" space. If the
reference does not point to an object in "from" space, the process
jumps to step 107. If, however, the reference does point to an
object in "from" space, the process continues at step 103.
[0015] In step 103, the referenced object in "from" space is
examined to determine whether the object is marked as copied. If
the referenced object is marked, the process jumps to step 106.
However, if the referenced object in "from" space is not marked as
copied, the process continues at step 104, in which the referenced
object is copied into "to" space. In subsequent step 105, the
referenced object in "from" space is marked as copied (e.g.,
replaced with a marker), with the location of the new copy in "to"
space identified in the marker. In step 106, the current reference
is updated to point to the location of the new copy of the object
in "to" space, as identified by the marked object in "from" space.
The process continues in step 107.
[0016] In step 107, a check is performed to determine whether the
current reference is the last reference in the set of references to
be scanned. If the current reference is not the last reference, in
step 108, the next reference in the set is obtained, and the
process returns to Step 102. If, however, in step 107, the current
reference is the last reference in the set, the process completes
in step 109 where "from" space is collected in its entirety.
Ideally, no references will be made in the future to objects in
"from" space. In a subsequent garbage collection, "from" space
becomes "to" space and "to" space becomes "from" space for purposes
of copying.
[0017] The copying garbage collection scheme described above may be
expanded to implement a generational approach. Generational
collection schemes are predicated on the general assumption that
newly created objects are more prone to collection than objects
that have survived several garbage collection cycles. Using the
generational approach, objects are segregated into generational
groups of objects according to the number of garbage collection
cycles survived, with each generational group having its own
respective "to" space and "from" space. Garbage collection is then
carried out separately for each generational group of objects, with
garbage collection being carried out more frequently for younger
generations.
[0018] Exact garbage collection is required when objects are
copied, to prevent a coincidental data value match from causing
mutation of the data value during updating of object references. As
stated previously, exact garbage collection requires information
about which locations contain active or "live" references to
objects. Problems can arise when this information is incorrect. For
example, if a live reference fails to be identified in step 100 of
FIG. 1 due to misidentification of a live reference, the garbage
collection process may erroneously collect the associated object
without copying, causing unpredictable performance when the
reference is used by a method to access the object in the
future.
[0019] Also, if the associated object is copied to "to" space
(e.g., because other references to the object exist and are
correctly identified), the misidentified reference is not updated
in step 106 to refer to the new object copy. Thus, while object
access through other identified references will address the new
object copy, object access through the misidentified reference will
continue to address the obsolete object with unknown and
undesirable consequences. This undesirable behavior will appear as
a code generation bug associated with the executing application,
when it is in fact associated with garbage collection, and more
specifically associated with the component that provides the
information about live object references.
[0020] In the prior art, stress tests have been performed to test
for execution bugs. Stress tests attempt to test extreme execution
conditions that will result in the triggering and resulting
detection of any bugs in the system. However, with respect to
garbage collection, a stress test will only result in testing at
gc-points where garbage collection actually occurs. Because garbage
collection occurs at only a subset of gc-points, and because that
subset of gc-points may not differ from one execution to the next
for a particular application or input data set, stress tests are
insufficient to reliably and exhaustively find bugs associated with
the misidentification of a live reference at possibly a single
gc-point out of many in the execution of the application. Further,
with respect to a virtual machine environment where a compiler
identifies the live references for applications it compiles, a
stress testing application may be insufficient to bring about
conditions that will result in misidentification of a stack
location by the compiler, whereas another application may
consistently trigger such a bug in the compiler.
SUMMARY OF THE INVENTION
[0021] A method and apparatus for finding bugs related to garbage
collection in a virtual machine are described. For each possible
garbage collection point in a stream of execution, a compiler in
the virtual machine provides a map that specifies live pointer
locations in the stack. In addition, the map identifies those
locations in the stack that contain other forms of live data, such
as integers. All other locations are considered "dead," i.e., no
longer in use or never used. At each garbage collection point,
"dead" locations in the stack are overwritten with an invalid
pointer value. Because of the overwriting process, any bug in the
compiler that causes a live pointer to be omitted from the map also
causes the omitted pointer to be overwritten with the invalid
pointer value. Regardless of whether garbage collection is actually
performed at the garbage collection point where the pointer was
omitted from the compiler-generated map, subsequent execution steps
that reference the omitted pointer trigger an invalid pointer
error. The invalid pointer error may be trapped and identified as a
compiler bug related to map generation in the garbage collection
process.
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] FIG. 1 is a flow diagram of a copying garbage collection
process.
[0023] FIG. 2 is a block diagram of compile and runtime
environments.
[0024] FIG. 3 is a block diagram of the runtime data areas of an
embodiment of a virtual machine.
[0025] FIG. 4A is a block diagram illustrating an example of the
use of pointers on a stack to reference objects on a heap.
[0026] FIG. 4B is a block diagram illustrating the application of
copying garbage collection to the stack and heap example of FIG.
4A.
[0027] FIG. 5 is a flow diagram, in accordance with an embodiment
of the invention, of a process for finding bugs related to garbage
collection in a virtual machine.
[0028] FIG. 6 is a block diagram, in accordance with an embodiment
of the invention, of apparatus for finding bugs related to garbage
collection.
[0029] FIG. 7 is a block diagram of one embodiment of a computer
system capable of providing a suitable execution environment for an
embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0030] The invention is a method and apparatus for finding bugs
related to garbage collection in a virtual machine. In the
following description, numerous specific details are set forth to
provide a more thorough description of embodiments of the
invention. It will be apparent, however, to one skilled in the art,
that the invention may be practiced without these specific details.
In other instances, well known features have not been described in
detail so as not to obscure the invention.
[0031] Though discussed herein with respect to the Java programming
language and the Java virtual machine, the invention may be
implemented in any environment that supports object or data access
through references, and that provides information about live object
references for use in garbage collection.
[0032] In virtual machines that implement compiling of bytecode
input, such as the Java virtual machine, the component providing
the information about live object references is the compiler. The
compiler is responsible for compiling method code, and is therefore
knowledgeable about the contents of stack frames at garbage
collection points. An embodiment of a processing environment and
virtual machine implementation are more fully described below.
[0033] The Processing Environment
[0034] The processing environment of the Java programming language
is object-oriented in nature. To provide a better understanding of
object-oriented principles, an overview of object-oriented
programming follows.
[0035] 1. Object-Oriented Programming
[0036] Object-oriented programming is a method of creating computer
programs by combining certain fundamental building blocks, and
creating relationships among and between the building blocks. The
building blocks in object-oriented programming systems are called
"objects." A software application can be written using an
object-oriented programming language whereby the program's
functionality is implemented using these objects.
[0037] An object is a programming unit that groups together a data
structure (one or more instance variables) and the operations
(methods) that can use or affect that data. Thus, an object
consists of data and one or more operations or procedures that can
be performed on that data. The joining of data and operations into
a unitary building block is called "encapsulation."
[0038] An object can be instructed to perform one of its methods
when it receives a "message." A message is a command or instruction
sent to the object to execute a certain method. A message consists
of a method selection (e.g., method name) and zero or more
arguments. A message tells the receiving object what operations to
perform.
[0039] One advantage of object-oriented programming is the way in
which methods are invoked. When a message is sent to an object, it
is not necessary for the message to instruct the object how to
perform a certain method. It is only necessary to request that the
object execute the method. This greatly simplifies program
development.
[0040] Object-oriented programming languages are predominantly
based on a "class" scheme. An example of a class-based
object-oriented programming scheme is generally described in
"Smalltalk-80: The Language," by Adele Goldberg and David Robson,
published by Addison-Wesley Publishing Company, 1989.
[0041] An object class provides a definition for an object which
typically includes both fields (e.g., variables) and methods. An
object class is used to create a particular object "instance." (The
term "object" by itself is often used interchangeably to refer to a
particular class or a particular instance.) An instance of an
object class includes the variables and methods defined for that
class. Multiple instances can be created from the same object
class. Each instance that is created from the object class is said
to be of the same type or class.
[0042] To illustrate, an employee object class can include "name"
and "salary" instance variables and a "set_salary" method.
Instances of the employee object class can be created, or
instantiated, for each employee in an organization. Each object
instance is said to be of type "employee." Each employee object
instance includes "name" and "salary" instance variables and the
"set_salary" method. The values associated with the "name" and
"salary" variables in each employee object instance contain the
name and salary of an employee in the organization. A message can
be sent to an employee's employee object instance to invoke the
"set_salary" method to modify the employee's salary (i.e., the
value associated with the "salary" variable in the employee's
employee object).
[0043] A hierarchy of classes can be defined such that an object
class definition has one or more subclasses. A subclass inherits
its parent's (and grandparent's etc.) definition. Each subclass in
the hierarchy may add to or modify the behavior specified by its
parent class. Some object-oriented programming languages support
multiple inheritance where a subclass may inherit a class
definition from more than one parent class. Other programming
languages, such as the Java programming language, support only
single inheritance, where a subclass is limited to inheriting the
class definition of only one parent class. The Java programming
language also provides a mechanism known as an "interface" which
comprises a set of constant and abstract method declarations. An
object class can implement the abstract methods defined in an
interface. Both single and multiple inheritance are available to an
interface. That is, an interface can inherit an interface
definition from more than one parent interface.
[0044] 2. Programming and Execution
[0045] Java applications typically comprise one or more object
classes and interfaces. Unlike many programming languages in which
a program is compiled into machine-dependent, executable program
code, classes written in the Java programming language are compiled
into machine independent bytecode class files. Each class contains
code and data in a platform-independent format called the class
file format. The computer system acting as the execution vehicle
contains a program called a virtual machine, which is responsible
for executing the code in each class.
[0046] Applications may be designed as standalone Java
applications, or as Java "applets" which are identified by an
applet tag in an HTML (hypertext markup language) document, and
loaded by a browser application. The class files associated with an
application or applet may be stored on the local computing system,
or on a server accessible over a network. Each class is loaded into
the Java virtual machine, as needed, by the "class loader."
[0047] To provide a client with access to class files from a server
on a network, a web server application is executed on the server to
respond to HTTP (hypertext transport protocol) requests containing
URLs (universal resource locators) to HTML documents, also referred
to as "web pages." When a browser application executing on a client
platform receives an HTML document (e.g., as a result of requesting
an HTML document by forwarding a URL to the web server), the
browser application parses the HTML and automatically initiates the
download of the specified bytecode class files when it encounters
an applet tag in the HTML document.
[0048] The classes of a Java applet are loaded on demand from the
network (stored on a server), or from a local file system, when
first referenced during the Java applet's execution. The virtual
machine locates and loads each class file, parses the class file
format, allocates memory for the class's various components, and
links the class with other already loaded classes. This process
makes the code in the class readily executable by the virtual
machine.
[0049] FIG. 2 illustrates the compile and runtime environments for
a processing system. In the compile environment, a software
developer creates source files 200, which contain the programmer
readable class definitions written in the Java programming
language, including data structures, method implementations and
references to other classes. Source files 200 are provided to Java
compiler 201, which compiles source files 200 into compiled
".class" files 202 that contain bytecodes executable by a Java
virtual machine. Bytecode class files 202 are stored (e.g., in
temporary or permanent storage) on a server, and are available for
download over a network. Alternatively, bytecode class files 202
may be stored locally in a directory on the client platform.
[0050] The Java runtime environment contains a Java virtual machine
(JVM) 205 which is able to execute bytecode class files and execute
native operating system ("O/S") calls to operating system 209 when
necessary during execution. Java virtual machine 205 provides a
level of abstraction between the machine independence of the
bytecode classes and the machine-dependent instruction set of the
underlying computer hardware 210, as well as the platform-dependent
calls of operating system 209.
[0051] Class loader and bytecode verifier ("class loader") 203 is
responsible for loading bytecode class files 202 and supporting
class libraries 204 into Java virtual machine 205 as needed. Class
loader 203 also verifies the bytecodes of each class file to
maintain proper execution and enforcement of security rules. Within
the context of runtime system 208, either an interpreter 206
executes the bytecodes directly, or a "just-in-time" (JIT) compiler
207 transforms the bytecodes into machine code, so that they can be
executed by the processor (or processors) in hardware 210.
[0052] The runtime system 208 of virtual machine 205 supports a
general stack architecture. The manner in which this general stack
architecture is supported by the underlying hardware 210 is
determined by the particular virtual machine implementation, and
reflected in the way the bytecodes are interpreted or JIT-compiled.
Other elements of the runtime system include thread management
(e.g., scheduling) and garbage collection mechanisms.
[0053] FIG. 3 illustrates runtime data areas which support the
stack architecture within runtime system 208. In FIG. 3, runtime
data areas 300 comprise one or more thread-based data areas 307.
Each thread-based data area 307 comprises a program counter
register (PC REG) 308, a local variables pointer register (VARS
REG) 309, a frame register (FRAME REG) 310, an operand stack
pointer register (OPTOP REG) 311, and a stack 312. Stack 312
comprises one or more frames 313 which contain an operand stack 314
and local variables 315. Separate frame formats may be implemented
for interpreted code and compiled code.
[0054] Runtime data areas 300 further comprises shared heap 301.
Heap 301 is the runtime data area from which memory for all class
instances and arrays is allocated. Shared heap 301 comprises method
area 302, which is shared among all threads. Method area 302
comprises one or more class-based data areas 303 for storing
information extracted from each loaded class file. For example,
class-based data area 303 may comprise class structures such as
constant pool 304, field and method data 305, and code for methods
and constructors 306. Methods access class structures by reference.
Pointers to classes are stored in local variables 315 or in
registers associated with a given stack.
[0055] A virtual machine can support many threads of execution at
once. Each thread has its own thread-based data area 307. At any
point, each thread is executing the code of a single method, the
"current method" for that thread. The program counter register 308
contains the address of the virtual machine instruction currently
being executed. Frame register 310 points to the location of the
current method in method area 302.
[0056] Each thread has a private stack 312, created at the same
time as the thread. Stack 312 stores one or more frames 313
associated with methods invoked by the thread. Frames 313 are used
to store data and partial results, as well as to perform dynamic
linking, return values for methods and dispatch exceptions. A new
frame is created and pushed onto the stack each time a method is
invoked, and an existing frame is popped from the stack and
destroyed when its method completes. A frame that is created by a
thread is local to that thread and typically cannot be directly
referenced by any other thread.
[0057] Only one frame, the frame for the currently executing
method, is active at any point in a given thread of control. This
frame is referred to as the "current frame," and its method is
known as the "current method." A frame ceases to be current if its
method invokes another method or if its method completes. When a
method is invoked, a new frame is created and becomes current when
control transfers to the new method. On method return, the current
frame passes back the results of its method invocation, if any, to
the previous frame. The current frame is then discarded while the
previous frame becomes the current one.
[0058] Each frame 313 has its own set of local variables 315 and
its own operand stack 314. The local variables pointer register 309
contains a pointer to the base of an array of words containing
local variables 315 of the current frame. The operand stack pointer
register 311 points to the top of operand stack 314 of the current
frame. Most virtual machine instructions take values from the
operand stack of the current frame, operate on them, and return
results to the same operand stack. Operand stack 314 is also used
to pass arguments to methods and receive method results.
[0059] FIG. 4A is a block diagram illustrating the use of pointers
on the stack to reference objects on the heap in a virtual machine.
The data structures shown include stack 408 comprising stack
locations 400-407, and heap 409 comprising "from" space 410 and
"to" space 411. "From" space 410 comprises object A (412), object B
(413) and object C (414). FIG. 4 illustrates the state of the stack
and heap prior to a garbage collection cycle, so no objects are
shown copied to "to" space 411.
[0060] For purposes of example, stack locations 400-407 appear as a
single array of local variables. However, in actual application,
each stack frame contains its own set of local variables. Further,
pointers to objects may be stored in registers as well. Stack
locations 400-407 may therefore represent registers and local
variables associated with multiple stack frames.
[0061] When an object is instantiated by invocation of a class
constructor, and the object is assigned to a local variable, it is
the pointer to the object that is stored in the local variable on
the stack 408. As examples, stack locations 400 and 405 contain
references to object A in the form of a pointer value. Similarly,
stack locations 401 and 404 contain references to object C, and
stack locations 402 and 407 contain references to object B. Local
variables and registers in stack frames may also contain actual
data, i.e., data that is not indirectly referenced, such as integer
data. For example, stack locations 403 and 406 contain integers X
and Y, respectively.
[0062] FIG. 4B illustrates the stack and heap of FIG. 4A after a
garbage collection cycle has taken place at a gc-point in the
virtual machine's execution of a program. It is assumed in FIG. 4B
that prior to garbage collection, the references to object B were
explicitly released by a method assigning a "null" value to the
respective local variables 402 and 407. During compiling of the
program code, the compiler of the virtual machine prepares a map of
stack locations containing live pointers. An updated map is
generated by the compiler for each gc-point. In FIG. 4B, the map
generated by the compiler for the recent gc-point is OOP map 417
(OOP referring to object-oriented pointers) which indicates that
stack locations 400, 404 and 405 contain live OOPs or pointers. In
this example, stack location 401 is erroneously omitted from the
live OOPs designated by OOP map 417.
[0063] In carrying out garbage collection, references to object A
were found in stack locations 400 and 405, and a reference to
object C was found in stack location 404. No references to object B
were found in the stack locations of stack 408, because locations
402 and 407 were assigned null values and/or were not specified by
OOP map 417 as live pointers. As a result, objects A and C are
copied to "to" space 411 as object A.copy 415 and object C.copy
416. Further, using OOP map 417 to determine the locations of the
live pointers, stack locations 400 and 405 are updated to point to
object A.copy 415 in "to" space 411, and stack location 404 is
updated to point to object C.copy 416.
[0064] Unfortunately, due to its omission from OOP map 417, stack
location 401 is not updated. Therefore, any future reference to
object C made via stack location 401 will erroneously access the
obsolete object C 414 in "from" space 410, whereas reference made
via stack location 404 will access object C.copy 416 appropriately.
The multiple copies of object C are likely to diverge in respective
data values over time, causing inconsistent performance. Also,
obsolete object C 414 may be written over in a subsequent garbage
collection cycle, resulting in indeterminate behavior and possibly
a terminal error if accessed via stack location 401.
[0065] Zapping of Stack Locations in Debugging Mode
[0066] A debugging mode is provided in an embodiment of the
invention, that, when enabled, initiates a debugging process within
the virtual machine to find bugs associated with compiler error in
the identification of live OOPs. The activation of the debugging
mode enables a "zapping" process that overwrites unused stack
locations at each gc-point in the virtual machine's execution of a
program. Those stack locations that the compiler fails to correctly
identify as live OOPs are also overwritten by the zapping process.
As a result, during program execution, attempted object access via
a misidentified live OOP generates an error that may be trapped and
identified. The zapping process takes place at each gc-point, thus
providing an exhaustive test of compiler OOP identification for a
given executed application.
[0067] The OOP map generated by the compiler is extended in
debugging mode to identify live stack locations that do not contain
OOPS. This permits the zapping process to identify unused stack
locations as those locations which are not identified by the
compiler either as live OOPS or as live non-OOPs (e.g., directly
referenced data such as integers).
[0068] In one embodiment of the invention, a comparison process is
carried out, in addition to the zapping process, to handle another
possible form of misidentification that may occur in the revised
OOP map. This new form of misidentification is that of a live OOP
being misidentified as a live non-OOP. If a live OOP is
misidentified in this manner, the zapping process assumes that the
register location contains a live non-OOP value, such as an
integer, and thus does not overwrite the register location.
However, in the additional comparison process, live non-OOPs are
compared with possible OOP values (e.g., valid object reference
values) to ascertain whether the live non-OOPs may, in fact, be
misidentified live OOPs. If a match is obtained for any live
non-OOP, a warning is issued. The person performing the debugging
process may then determine where a possible bug of this nature may
be occurring based on the issued warning.
[0069] Where actual OOP values are intentionally being manipulated
in the form of live non-OOPs, these warnings may be ignored.
Screening capability may be built into the compiler and/or
comparison process to track where live OOP values have been written
intentionally as live non-OOPs. The comparison process may then
automatically omit a comparison operation, and subsequent warning
issuance, at the register locations of tracked live non-OOPs.
[0070] FIG. 5 is a flow diagram of a debugging process in
accordance with an embodiment of the invention. In step 500, the
debugging mode is enabled and the virtual machine begins execution
of an application. At step 501, a branch occurs based on whether a
gc-point has been reached. If execution is not at a gc-point, the
debugging process continues at step 508, where any attempts to use
an invalid pointer value during program execution are trapped. If,
at step 501, execution has reached a gc-point, the debugging
process continues at step 502.
[0071] In step 502, the debugging process determines, based on the
compiler OOP map, which stack locations contain a live OOP and
which stack locations do not contain a live OOP. In step 503, for
those stack locations specified as not containing live OOPs, the
process determines which stack locations contain other forms of
live data, or live non-OOPs. Those stack locations that are not
identified by the compiler OOP map as containing live OOPs or live
non-OOPs are assumed to be "dead" or unused locations. In step 504,
those dead or unused stack locations are overwritten with an
invalid pointer value, such as "0000." Steps 501-504 may be
implemented by the zapping process previously described.
[0072] In the embodiment of FIG. 5, in step 505, those locations
specified as live non-OOPs (or a subset thereof) in the OOP map are
compared with possible object reference values to determine whether
each live non-OOP is possibly a misidentified live OOP. If, in step
506, no live non-OOP matches a reference to an object, the
debugging process continues at step 508. If, in step 506, the value
stored in a live non-OOP matches a possible pointer value of an
object, a warning is issued in step 507 before proceeding to step
508. The warning may comprise, for example, a dialog message sent
to a display, or a warning entry written to a log file. Steps
505-507 may be implemented by the comparison process previously
described. It will be obvious to one skilled in the art that the
invention may be practiced without steps 505-507.
[0073] In step 508, during program execution, any attempts to use
invalid pointers (i.e., pointers with invalid pointer values such
as "0000") to access objects on the heap are trapped, or otherwise
registered as errors, and identified. The step of trapping the use
of invalid pointers is carried out continuously during program
execution, and may be implemented in a virtual machine process
separate from the zapping and comparison process(es). In step 509,
if program execution is completed, the debugging process also
completes in step 510. If, in step 509, execution has not yet
completed, the debugging process returns to step 501.
[0074] FIG. 6 is a block diagram of a virtual machine
implementation comprising a zapping/comparing component 600 in
accordance with an embodiment of the invention. Zapping/comparing
component 600 performs the zapping and comparison processes
described with respect to FIG. 5, and may be implemented as one or
more objects, components, methods, procedures, or elements thereof,
within a virtual machine environment.
[0075] In FIG. 6, stack 408 comprises stack locations 400-407 and
heap 409 comprises "from" space 410 and "to" space 411. "From"
space 410 comprises object A 412, object B 413 and object C 414.
"To" space 411 comprises object A.copy 415 and object B.copy 416.
Stack locations 400 and 405 point to object A.copy 415; stack
location 404 points to object C.copy 416; and stack locations 401,
402 and 407 point to an invalid address 601.
[0076] Compiler-generated OOP map 617 identifies stack locations
400, 404 and 405 as live OOPs. Further, in accordance with an
embodiment of the invention, OOP map 617 identifies stack locations
403 and 406 as live non-OOPs, because they contain integer data
values. An example representation of OOP map 617 is as follows:
1 Stack Location* 400 Live OOP -- 401 (Not Live OOP) (Dead) 402
(Not Live OOP) (Dead) 403 (Not Live OOP) Live Non-OOP 404 Live OOP
-- 405 Live OOP -- 406 (Not Live OOP) Live Non-OOP 407 (Not Live
OOP) (Dead)
[0077] *register or local variable
[0078] The implementation of the OOP map may be as flag bits or
bytes associated with stack locations. A first bit or byte may be
used to indicate whether the corresponding stack location is a Live
OOP. A second bit or byte may be used to indicate, in the event the
first bit or byte is not set, whether the corresponding stack
location is a live non-OOP. Any map implementation may be used
which provides three states for each stack location: "Live OOP,"
"Live non-OOP," and "dead."
[0079] Zapping/comparing component 600 accesses OOP map 617 to
identify "dead" locations in stack 408 for the zapping process and
to identify "Live Non-OOP" locations for the comparing process.
Zapping/comparing component 600 accesses stack 408 to carry out the
overwriting operations of the zapping process, as well as the
individual comparing operations of the comparison process. When
directed by the comparison process, zapping/comparing component 600
may issue warnings to warning destination 602.
[0080] In this example, stack locations 401, 402 and 407 are
overwritten with the pointer value 0000, which points to invalid
location 601. The zapping of stack locations 402 and 407 results in
no effect because those locations are dead stack locations
previously assigned a null value. The zapping of stack location
401, however, redirects the unidentified OOP from object C 414 to
invalid location 601. Attempted access of object C via stack
location 401 results in a trap that may be used to detect the error
in the OOP map, and to correct the source of the error in the
compiler.
[0081] The overwriting of stack location 401 occurs at the gc-point
regardless of whether garbage collection actually takes place.
Thus, even in execution situations where, due to the absence of a
garbage collection cycle, an error in the OOP map does not result
in a pointer error, the zapping process overwrites misidentified
stack locations to force a pointer error to occur and be
trapped.
[0082] If the entry for location 401 in the OOP map were to
erroneously identify the location as containing a live non-OOP
rather than a "dead" value, the entry for location 401 in OOP map
617 would read, for example, as follows:
2 401 (Not Live OOP) Live Non-OOP
[0083] In this case, the zapping process of zapping/comparing
component 600 does not overwrite location 401 because the location
is not "dead." However, the comparison process of zapping/comparing
component 600 examines locations identified as "Live Non-OOP"
(e.g., locations 401, 403 and 406), and compares the stored values
with possible OOP reference values. Location 401 is flagged for
storing a value matching the OOP reference value for object C 414
in "from" space 410, and a warning is issued to warning destination
602 to note the occurrence of this match. As with the zapping
process, the comparison process is carried out at each gc-point.
Thus, a warning may be issued by zapping/comparing component 600
regardless of whether a garbage collection cycle is carried
out.
[0084] Embodiment of Computer Execution Environment (Hardware)
[0085] An embodiment of the invention can be implemented as
computer software in the form of computer readable code executed on
a general purpose computer such as computer 700 illustrated in FIG.
7, or in the form of bytecode class files executable within a Java
runtime environment running on such a computer. A keyboard 710 and
mouse 711 are coupled to a bi-directional system bus 718. The
keyboard and mouse are for introducing user input to the computer
system and communicating that user input to processor 713. Other
suitable input devices may be used in addition to, or in place of,
the mouse 711 and keyboard 710. I/O (input/output) unit 719 coupled
to bidirectional system bus 718 represents such I/O elements as a
printer, A/V (audio/video) I/O, etc.
[0086] Computer 700 includes a video memory 714, main memory 715
and mass storage 712, all coupled to bidirectional system bus 718
along with keyboard 710, mouse 711 and processor 713. The mass
storage 712 may include both fixed and removable media, such as
magnetic, optical or magnetic optical storage systems or any other
available mass storage technology. Bus 718 may contain, for
example, address lines for addressing video memory 714 or main
memory 715. The system bus 718 also includes, for example, a data
bus for transferring data between and among the components, such as
processor 713, main memory 715, video memory 714 and mass storage
712. Alternatively, multiplex data/address lines may be used
instead of separate data and address lines.
[0087] In one embodiment of the invention, the processor 713 is a
microprocessor manufactured by Motorola, such as the 680.times.0
processor or a microprocessor manufactured by Intel, such as the
80.times.86, or Pentium processor, or a SPARC microprocessor from
Sun Microsystems, Inc. However, any other suitable microprocessor
or microcomputer may be utilized. Main memory 715 is comprised of
dynamic random access memory (DRAM). Video memory 714 is a
dual-ported video random access memory. One port of the video
memory 714 is coupled to video amplifier 716. The video amplifier
716 is used to drive the cathode ray tube (CRT) raster monitor 717.
Video amplifier 716 is well known in the art and may be implemented
by any suitable apparatus. This circuitry converts pixel data
stored in video memory 714 to a raster signal suitable for use by
monitor 717. Monitor 717 is a type of monitor suitable for
displaying graphic images. Alternatively, the video memory could be
used to drive a flat panel or liquid crystal display (LCD), or any
other suitable data presentation device.
[0088] Computer 700 may also include a communication interface 720
coupled to bus 718. Communication interface 720 provides a two-way
data communication coupling via a network link 721 to a local
network 722. For example, if communication interface 720 is an
integrated services digital network (ISDN) card or a modem,
communication interface 720 provides a data communication
connection to the corresponding type of telephone line, which
comprises part of network link 721. If communication interface 720
is a local area network (LAN) card, communication interface 720
provides a data communication connection via network link 721 to a
compatible LAN. Communication interface 720 could also be a cable
modem or wireless interface. In any such implementation,
communication interface 720 sends and receives electrical,
electromagnetic or optical signals which carry digital data streams
representing various types of information.
[0089] Network link 721 typically provides data communication
through one or more networks to other data devices. For example,
network link 721 may provide a connection through local network 722
to local server computer 723 or to data equipment operated by an
Internet Service Provider (ISP) 724. ISP 724 in turn provides data
communication services through the world wide packet data
communication network now commonly referred to as the "Internet"
725. Local network 722 and Internet 725 both use electrical,
electromagnetic or optical signals which carry digital data
streams. The signals through the various networks and the signals
on network link 721 and through communication interface 720, which
carry the digital data to and from computer 700, are exemplary
forms of carrier waves transporting the information.
[0090] Computer 700 can send messages and receive data, including
program code, through the network(s), network link 721, and
communication interface 720. In the Internet example, remote server
computer 726 might transmit a requested code for an application
program through Internet 725, ISP 724, local network 722 and
communication interface 720.
[0091] The received code may be executed by processor 713 as it is
received, and/or stored in mass storage 712, or other non-volatile
storage for later execution. In this manner, computer 700 may
obtain application code in the form of a carrier wave. In
accordance with an embodiment of the invention, an example of such
a downloaded application is the apparatus for debugging a virtual
machine described herein.
[0092] Application code may be embodied in any form of computer
program product. A computer program product comprises a medium
configured to store or transport computer readable code or data, or
in which computer readable code or data may be embedded. Some
examples of computer program products are CD-ROM disks, ROM cards,
floppy disks, magnetic tapes, computer hard drives, servers on a
network, and carrier waves.
[0093] The computer systems described above are for purposes of
example only. An embodiment of the invention may be implemented in
any type of computer system or programming or processing
environment, including embedded devices (e.g., web phones, etc.)
and "thin" client processing environments (e.g., network computers
(NC's), etc.) that support a virtual machine.
[0094] Thus, a method and apparatus for finding bugs related to
garbage collection in a virtual machine have been described in
conjunction with one or more specific embodiments. The invention is
defined by the claims and their full scope of equivalents.
* * * * *