U.S. patent application number 14/681790 was filed with the patent office on 2015-04-08 and published on 2015-07-30 for automatic identification of interesting interleavings in a multithreaded program.
The applicant listed for this patent is F5 Networks, Inc. Invention is credited to Kaya Bekiroglu, Peter J. Godman, Lucas M. Kreger-Stickles, Mark H. Oskin, Andrew M. Schwerin, Andrew J. Whitaker.
Application Number | 14/681790 |
Publication Number | 20150212835 |
Family ID | 52782339 |
Filed Date | 2015-04-08 |
United States Patent Application | 20150212835 |
Kind Code | A1 |
Oskin; Mark H.; et al. | July 30, 2015 |
AUTOMATIC IDENTIFICATION OF INTERESTING INTERLEAVINGS IN A
MULTITHREADED PROGRAM
Abstract
A facility is described that, for a multithreaded program executing
on a root machine, causes the threads of the program to be executed
in a relative scheduling that produces an interesting result. The
facility suspends execution of the program. The facility then tests
a plurality of relative thread schedulings on one or more virtual
machines and observes the result. Based upon the observed result,
the facility selects one of the tested relative thread schedulings.
The facility then resumes execution of the program using the
selected relative thread scheduling.
Inventors: | Oskin; Mark H. (Seattle, WA); Godman; Peter J. (Seattle, WA); Schwerin; Andrew M. (Seattle, WA); Whitaker; Andrew J. (Seattle, WA); Kreger-Stickles; Lucas M. (Seattle, WA); Bekiroglu; Kaya (Edmonds, WA) |
Applicant: |
Name | City | State | Country | Type |
F5 Networks, Inc. | Seattle | WA | US | |
Family ID: | 52782339 |
Appl. No.: | 14/681790 |
Filed: | April 8, 2015 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
13047716 | Mar 14, 2011 | 9009020 |
14681790 | | |
61313693 | Mar 12, 2010 | |
61313692 | Mar 12, 2010 | |
61313695 | Mar 12, 2010 | |
Current U.S. Class: | 717/135 |
Current CPC Class: | G06F 9/461 20130101; G06F 2009/45575 20130101; G06F 2209/46 20130101; G06F 9/4881 20130101; G06F 9/4825 20130101; G06F 8/4434 20130101; G06F 9/455 20130101; G06F 2009/45591 20130101; G06F 2209/483 20130101; G06F 9/45504 20130101 |
International Class: | G06F 9/455 20060101 G06F009/455; G06F 9/45 20060101 G06F009/45; G06F 9/48 20060101 G06F009/48; G06F 9/46 20060101 G06F009/46 |
Claims
1. A method in a computing system that, for a multithreaded program
executing on a root machine, causes the threads of the
multithreaded program to be executed in a scheduling that produces
an interesting result, the method comprising: suspending execution
of all the threads of the multithreaded program on the root
machine; while execution of all the threads of the multithreaded
program on the root machine is suspended, saving the state of the
root machine; performing a plurality of simulations, each
simulation comprising: initializing a simulation machine with the
saved state; determining a set of communication interleavings among
threads of the multithreaded program; executing the multithreaded
program on the simulation machine in accordance with the determined
set of communication interleavings; and attributing a single score
to the simulation based on the results of execution of the
multithreaded program; selecting, based on a comparison of the
attributed scores, one of the plurality of simulations; and
resuming execution of the multithreaded program on the root machine
in accordance with the set of communication interleavings
determined for the selected simulation.
2. The method of claim 1, further comprising: storing a package
containing the saved state and an indication of the set of
communication interleavings determined for the selected simulation;
initializing a replay machine with the saved state from the
package; and executing the multithreaded program on the replay
machine in accordance with the set of communication interleavings
determined for the selected simulation indicated by the
package.
3. The method of claim 1 wherein at least one of: the root machine
and/or the simulation machine is a physical machine.
4. The method of claim 1 wherein at least one of: the root machine and/or
the simulation machine is a virtual machine.
5. The method of claim 1 wherein at least one of: the root machine
and/or the simulation machine is a user-level process.
6. The method of claim 1 wherein executing the multithreaded
program in accordance with the determined set of communication
interleavings is performed by, for each of one or more threads of
the multithreaded program, suspending execution of the thread at a
point before it initiates a communication to another thread.
7. The method of claim 1 wherein executing the multithreaded
program in accordance with the determined set of communication
interleavings is performed by, for each of one or more threads of
the multithreaded program, delaying delivery of the communication
to another thread initiated by the thread.
8. The method of claim 1 wherein communication interleavings
include memory access interleavings.
9. The method of claim 1 wherein the single score determined for
each simulation reflects the extent to which the simulation's
execution of the multithreaded program produces undesirable
behavior.
10. The method of claim 1 wherein the single score attributed to at
least one of the simulations of the plurality is based at least in
part on an error reported by the simulation machine.
11. The method of claim 1 wherein the single score attributed to a
selected simulation of the plurality is based, at least in part, on
encountering a failed assertion in the execution of the
multithreaded program during the selected simulation.
12. The method of claim 1 wherein, for each of at least a portion
of the plurality of simulations, determining a set of communication
interleavings for the simulation comprises: determining an order in
which thread communications were performed during execution of the
multithreaded program in a previous simulation; identifying at
least one possible reordering within the determined thread
communications order; selecting one of the identified possible
reorderings; and determining a set of communication interleavings
that incorporates the identified reordering.
13. The method of claim 12 wherein the identification of possible
reorderings identifies a proper subset of all possible reorderings
that are functionally distinct.
14. The method of claim 13, further comprising, for each of the
plurality of simulations, maintaining clock vectors for each thread
of the multithreaded program that are used as a basis for
identifying reorderings among the possible reorderings that are
functionally distinct.
15. The method of claim 12 wherein the selection of one of the
identified possible reorderings is based at least in part on the
extent to which each of the identified possible reorderings relates
to a portion of the code of the multithreaded program explicitly
identified by its developer.
16. A method for causing threads of a multithreaded program
executing on a root machine to be executed in a relative scheduling
that produces an interesting result, the method comprising:
suspending execution of all the threads of the multithreaded
program; for each selected relative thread scheduling of a
plurality of relative thread schedulings: performing a test of the
selected relative thread scheduling on one or more virtual
machines; and observing the result of the test; attributing a score
to each test based on an extent to which the test identified
undesirable behavior; based on a comparison between the various
scores attributed to each of the tests, selecting one of the tested
relative thread schedulings; and resuming execution of the
multithreaded program using the selected relative thread
scheduling.
17. The method of claim 16 wherein testing at least one of the
plurality of relative thread schedulings comprises executing
through an interrupt to an interrupt handler.
18. The method of claim 16 wherein testing at least one of the
plurality of relative thread schedulings comprises executing
through a physical device access.
19. The method of claim 16 wherein testing at least one of the
plurality of relative thread schedulings comprises executing
through a logical device access.
20. A method for collectively storing, in one or more memories, a
data structure relating to a multithreaded program whose threads
each access memory locations, the method comprising: selecting,
from among multiple observed orders of communication interleavings,
a prescribed order of communication interleavings, wherein each
observed order of communication interleavings of the multiple
observed orders of communication interleavings indicates, for each
selected memory location of one or more memory locations, an order
in which the threads accessed the selected memory location during a
previous execution of the multithreaded program corresponding to
that order of communication interleavings, wherein each of the
multiple observed orders of communication interleavings is
associated with a score indicating an extent to which the
corresponding previous execution of the multithreaded program was
interesting, and wherein the selection of the prescribed order of
communication interleavings is based on the score associated with
the prescribed order of communication interleavings indicating that
the prescribed order of communication interleavings resulted in the
most interesting result; and storing, in the memories, an
indication of the selected prescribed order of communication
interleavings; wherein: the contents of the data structure can be
used to reproduce the most interesting result in a virtual machine
other than the distinguished virtual machine.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent
application Ser. No. 13/047,716, filed on Mar. 14, 2011 and
entitled "AUTOMATIC IDENTIFICATION OF INTERESTING INTERLEAVINGS IN
A MULTITHREADED PROGRAM," which is incorporated herein by reference
in its entirety. The Ser. No. 13/047,716 application claims the
benefit of U.S. Provisional Patent Application No. 61/313,693,
filed on Mar. 12, 2010 and entitled "AUTOMATIC IDENTIFICATION OF
INTERESTING INTERLEAVINGS IN A MULTITHREADED PROGRAM," U.S.
Provisional Patent Application No. 61/313,692, filed on Mar. 12,
2010 and entitled "IDENTIFYING A SET OF FUNCTIONALLY DISTINCT
REORDERINGS IN A MULTITHREADED PROGRAM," and U.S. Provisional
Patent Application No. 61/313,695, filed on Mar. 12, 2010 and
entitled "USING HINTS TO DIRECT THE EXPLORATION OF INTERLEAVINGS IN
A MULTITHREADED PROGRAM," all of which are hereby incorporated by
reference in their entireties.
[0002] This application is related to the following applications,
each of which is incorporated by reference in its entirety: U.S.
patent application Ser. No. 13/047,713 (patent counsel's docket
number 34897-8007.US01), filed on Mar. 14, 2011 and entitled
"IDENTIFYING A SET OF FUNCTIONALLY DISTINCT REORDERINGS IN A
MULTITHREADED PROGRAM," now U.S. Pat. No. 8,843,910, and U.S.
patent application Ser. No. 13/047,695 (patent counsel's docket
number 34897-8009.US01), filed Mar. 14, 2011 and entitled "USING
HINTS TO DIRECT THE EXPLORATION OF INTERLEAVINGS IN A MULTITHREADED
PROGRAM," now U.S. Pat. No. 8,893,092; U.S. Provisional Application
No. 61/013,019, filed Dec. 12, 2007; U.S. application Ser. No.
12/334,336, filed Dec. 12, 2008, now U.S. Pat. No. 8,694,997; U.S.
Provisional Application No. 61/035,490, filed Mar. 11, 2008; and
U.S. application Ser. No. 12/402,395, filed Mar. 11, 2009, now U.S.
Pat. No. 8,739,163.
TECHNICAL FIELD
[0003] The described technology is directed to the field of
software development and testing techniques, and, more
particularly, to the field of debugging techniques.
BACKGROUND
[0004] A multithreaded program is a program that is capable of
having two or more threads that can each be simultaneously executed
by a different processor of a multiprocessor system or a different
core of a multicore processor. Executing two or more threads of the
program simultaneously can increase the processing speed of the
program and/or the efficiency of the computer. Although each thread
is executed by a single processor or core, it is common for threads
to share other resources (e.g., memory) of the computer. As a
result, two or more threads of a multithreaded program may access
the same shared resource. For example, one thread may execute code
that writes a value to a shared memory location at approximately
the same time (e.g., shortly before or after) that another thread
executes code that reads the shared memory location.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is a block diagram showing an example architecture of
a computing system 100 on which the facility executes in some
embodiments.
[0006] FIG. 2 is a data flow diagram depicting the exploration and
retirement performed by the facility in some embodiments.
[0007] FIG. 3 is a flow diagram showing steps typically performed
by the facility to conduct a simulation round and retirement in
some embodiments.
[0008] FIG. 4 is a flow diagram showing steps performed by the
facility in one or more embodiments to record a journal describing
a simulation of a multithreaded program.
[0009] FIG. 5 is a diagram showing access operations performed by
each thread as the multithreaded program executes over a period of
time.
[0010] FIG. 6 is a flow diagram showing steps typically performed
by the facility in some embodiments to maintain clock vectors for
the program's threads and memory locations accessed by the
program's threads during a single simulation of the program.
[0011] FIG. 7 is a sample clock vector diagram that shows how the
facility maintains clock vectors for a program's threads and memory
locations accessed by them in some embodiments.
[0012] FIG. 8 is a flow diagram showing steps typically performed
by the facility in some embodiments to identify functionally
distinct reorderings using the vector clock history for a
simulation.
[0013] FIG. 9 is a sample path diagram showing paths containing
functionally distinct reorderings identified by the facility.
[0014] FIG. 10 is a flow diagram showing steps typically performed
by the facility in some embodiments in order to employ hinting to
guide the exploration performed by the facility.
[0015] FIG. 11 is a sample path diagram showing path scores
established by the facility in some embodiments using hints.
[0016] FIG. 12 is a thread timing diagram showing the relative
timing of execution of interesting regions by different
threads.
DETAILED DESCRIPTION
[0017] The inventors have recognized that the different orders in
which threads of a multithreaded program--or tasks of another type
that are executing some sort of programmatic entity--access (e.g.,
read, write, etc.) one or more memory locations may produce
different program behaviors or results, at least some of which are
of interest to the developer of the multithreaded program. As one
example, threads' access to a particular memory location in a
particular order may cause a multithreaded program to crash or
perform other undesirable behavior. Where the developer of the
multithreaded program is able to detect this situation, the
developer may wish to modify the multithreaded program code, such
as by adding a synchronization mechanism to guarantee that the
threads access the memory locations in an order that avoids this
result, removing or modifying one of the accessing instructions,
etc.
[0018] Accordingly, a hardware and/or software facility is
described that explores different execution schedules for the
threads of the program in which the threads access shared memory
locations in different orders to identify thread execution
schedules that produce interesting results, such as undesirable
results. As used herein, the term "schedule" refers to any subset
of potentially communicating memory accesses and the order in which
those accesses are executed by threads of the multithreaded
program. Potentially communicating memory accesses are two or more
accesses, at least one of which is a write, that overlap in memory
and are present in a program execution. In some embodiments, a
schedule may define one or more pairs of potentially communicating
memory accesses to the same location (referred to herein as
"interleavings"). These explorations of schedules by the facility
are called "simulations." A group of simulations performed together
is called a "simulation round."
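As an illustration of the definition above, the following is a minimal Python sketch that finds pairs of potentially communicating memory accesses in a recorded execution. The `Access` record, the `overlaps` and `potentially_communicating` helpers, and the sample trace are hypothetical names and data introduced only for this example; the restriction to accesses by different threads is likewise an assumption made here, reflecting that the accesses of interest are communications between threads.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical record of one memory access observed during an execution.
Access = namedtuple("Access", ["thread", "kind", "start", "length"])

def overlaps(a, b):
    """True when the byte ranges touched by the two accesses intersect."""
    return a.start < b.start + b.length and b.start < a.start + a.length

def potentially_communicating(accesses):
    """Return pairs of accesses that overlap in memory and include at
    least one write. Assumption: only pairs from different threads count."""
    pairs = []
    for a, b in combinations(accesses, 2):
        if a.thread != b.thread and overlaps(a, b) and "write" in (a.kind, b.kind):
            pairs.append((a, b))
    return pairs

# Illustrative trace: thread 1 writes four bytes that thread 2 partly reads;
# the two reads of address 0x200 involve no write and so do not qualify.
trace = [
    Access(thread=1, kind="write", start=0x100, length=4),
    Access(thread=2, kind="read", start=0x102, length=4),
    Access(thread=3, kind="read", start=0x200, length=4),
    Access(thread=2, kind="read", start=0x200, length=4),
]
pairs = potentially_communicating(trace)
```

In this sketch, only the write by thread 1 and the overlapping read by thread 2 form a potentially communicating pair.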
[0019] Communication between threads can occur when a thread reads
a memory location that was written by another thread or when a
thread writes to a memory location that is subsequently read by
another thread. Where multiple threads communicate by all accessing
at least one same memory location in common, multiple schedules are
possible. A schedule specifies an order in which different threads
access one or more locations in common. A schedule typically
specifies this order in a way that identifies the accesses of each
thread. In various embodiments, a schedule further specifies the
location accessed in each access operation; the type of each access
operation (e.g., read, write, or exchange); the instruction at
which each access is performed; an absolute or relative time at
which each access is performed; and/or other related information.
For example, where location A is the only communicating location
and thread 1 writes location A once and thread 2 reads location A
once, two schedules are possible: {<location A, thread
1>,<location A, thread 2>}--meaning that thread 1 writes
location A before thread 2 reads location A--and {<location A,
thread 2>,<location A, thread 1>}--meaning that thread 2
reads location A before thread 1 writes location A. The facility's
exploration of different thread execution schedules in a simulation
round involves selecting a number of "paths" for execution as
individual simulations in a simulation round, where each path is a
combination of one possible interleaving for each of multiple
locations. Continuing the example, where location A is accessed as
described above and location B is written once by thread 2 and once
by thread 3, the 24 possible schedules may be:
[0020] {<location A, thread 1>, <location A, thread 2>, <location B, thread 2>, <location B, thread 3>}
[0021] {<location A, thread 1>, <location A, thread 2>, <location B, thread 3>, <location B, thread 2>}
[0022] {<location A, thread 1>, <location B, thread 2>, <location A, thread 2>, <location B, thread 3>}
[0023] {<location A, thread 1>, <location B, thread 2>, <location B, thread 3>, <location A, thread 2>}
[0024] {<location A, thread 1>, <location B, thread 3>, <location A, thread 2>, <location B, thread 2>}
[0025] {<location A, thread 1>, <location B, thread 3>, <location B, thread 2>, <location A, thread 2>}
[0026] {<location A, thread 2>, <location A, thread 1>, <location B, thread 2>, <location B, thread 3>}
[0027] {<location A, thread 2>, <location A, thread 1>, <location B, thread 3>, <location B, thread 2>}
[0028] {<location A, thread 2>, <location B, thread 2>, <location A, thread 1>, <location B, thread 3>}
[0029] {<location A, thread 2>, <location B, thread 2>, <location B, thread 3>, <location A, thread 1>}
[0030] {<location A, thread 2>, <location B, thread 3>, <location A, thread 1>, <location B, thread 2>}
[0031] {<location A, thread 2>, <location B, thread 3>, <location B, thread 2>, <location A, thread 1>}
[0032] {<location B, thread 2>, <location A, thread 1>, <location A, thread 2>, <location B, thread 3>}
[0033] {<location B, thread 2>, <location A, thread 1>, <location B, thread 3>, <location A, thread 2>}
[0034] {<location B, thread 2>, <location A, thread 2>, <location A, thread 1>, <location B, thread 3>}
[0035] {<location B, thread 2>, <location A, thread 2>, <location B, thread 3>, <location A, thread 1>}
[0036] {<location B, thread 2>, <location B, thread 3>, <location A, thread 1>, <location A, thread 2>}
[0037] {<location B, thread 2>, <location B, thread 3>, <location A, thread 2>, <location A, thread 1>}
[0038] {<location B, thread 3>, <location A, thread 1>, <location A, thread 2>, <location B, thread 2>}
[0039] {<location B, thread 3>, <location A, thread 1>, <location B, thread 2>, <location A, thread 2>}
[0040] {<location B, thread 3>, <location A, thread 2>, <location A, thread 1>, <location B, thread 2>}
[0041] {<location B, thread 3>, <location A, thread 2>, <location B, thread 2>, <location A, thread 1>}
[0042] {<location B, thread 3>, <location B, thread 2>, <location A, thread 1>, <location A, thread 2>}
[0043] {<location B, thread 3>, <location B, thread 2>, <location A, thread 2>, <location A, thread 1>}
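The enumeration above can be reproduced mechanically. The following Python sketch, offered only as an illustration, generates the 24 orderings of the four example accesses and then groups them by the per-location interleaving they induce, which is how a path is characterized above. The `accesses` list, the `interleaving` helper, and the tuple representation are assumptions made for this example.

```python
from itertools import permutations

# The four accesses from the example, written as (location, thread) pairs.
accesses = [("A", 1), ("A", 2), ("B", 2), ("B", 3)]

# Every ordering of the four accesses is one candidate schedule.
schedules = list(permutations(accesses))

def interleaving(schedule, location):
    """Order in which threads access one location within a schedule."""
    return tuple(thread for loc, thread in schedule if loc == location)

# A path combines one interleaving for each location, so schedules that
# agree on the per-location orders collapse onto the same path.
paths = sorted({(interleaving(s, "A"), interleaving(s, "B")) for s in schedules})
```

Under these assumptions the 24 schedules collapse onto four paths: two possible orders for location A combined with two possible orders for location B.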
[0044] In some embodiments, to cause the threads to access an
identified memory location in a different order as part of pursuing
a particular path in an individual simulation, the facility
suspends the execution of one or more threads. For example, in
response to a thread requesting to access an identified memory
location, the facility may suspend the thread's execution for a
period of time determined to be adequate for another thread to
access the identified memory location. As another example, in
response to a thread requesting to access an identified memory
location, the facility may suspend the thread's execution until
another thread actually accesses the selected memory location. In
some embodiments, if another thread has not accessed the selected
memory location within a threshold period of time, the facility
determines that the reordering was unsuccessful. Typically, a
reordering is unsuccessful when the multithreaded program includes
a synchronization mechanism to control the threads' access to
the selected memory location. If the facility identifies a successful
reordering for a selected memory location, then the facility
determines that the threads' access to the selected memory location
is not controlled and, as a result, the program may behave
unpredictably or crash.
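The suspension technique described above can be sketched with ordinary threading primitives. The following Python example is a simplified illustration, not the facility's implementation: `ReorderGate` and `try_reordering` are hypothetical names, the timeouts are arbitrary, and a real embodiment would intercept memory accesses inside a virtual machine rather than rely on cooperating application threads.

```python
import threading

class ReorderGate:
    """Hypothetical helper: holds back one thread's access to a location
    until another thread has accessed it or a threshold time elapses."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._other_accessed = threading.Event()

    def other_thread_accessed(self):
        # Called by the thread that should go first, right after its access.
        self._other_accessed.set()

    def wait_for_other(self):
        # Called by the suspended thread before its access. Returns True if
        # the other thread accessed the location in time, False otherwise.
        return self._other_accessed.wait(self.timeout)

def try_reordering(other_thread_runs, timeout):
    """Attempt one reordering; return (succeeded, observed access order)."""
    gate = ReorderGate(timeout)
    order, outcome = [], []

    def delayed():
        outcome.append(gate.wait_for_other())  # suspended here
        order.append("delayed")

    def other():
        order.append("other")
        gate.other_thread_accessed()

    threads = [threading.Thread(target=delayed)]
    if other_thread_runs:
        threads.append(threading.Thread(target=other))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome[0], order

success = try_reordering(other_thread_runs=True, timeout=5.0)
failure = try_reordering(other_thread_runs=False, timeout=0.05)
```

In the second call the other thread never reaches the location, which stands in for the case where a synchronization mechanism prevents the reordering, so the attempt is reported as unsuccessful once the threshold elapses.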
[0045] In some embodiments, the facility selects an execution point
from which, or after which, the facility attempts to perform a
round of simulations to reorder the threads' access to one or more
identified memory locations. At the selected execution point, the
facility captures and saves the state of the multithreaded program
and the state of the "root machine" on which the program is
executing. This state is collectively referred to herein as the
"saved state." The saved state includes the state necessary to
execute the multithreaded program from the selected execution
point. For example, the saved state may include: control
registers, general purpose registers, floating point registers,
data registers, program counters, flags, the address space of the
multithreaded program (e.g., the page table, stack, heap, etc.),
the state of emulated or physical devices (e.g., clocks, network,
storage, I/O, timers, network interface card, advanced
programmable interrupt controller, etc.), or the like. After the
facility captures a saved state, the facility enters into a period
of simulation in which a number of different paths are explored
beginning at the saved state.
[0046] In some embodiments, to simulate execution of the
multithreaded program from the saved state, the facility
initializes a virtual machine with the entire saved state. The
facility observes execution of the multithreaded program from the
selected execution point within the virtual machine to identify one
or more memory locations that are accessed by two or more threads.
The facility may initialize the same or another virtual machine
with the saved state and deterministically execute the
multithreaded program to reorder the threads' access to at least
one of the identified memory locations, and thus simulate a
different path.
[0047] In some embodiments, to simulate execution of the
multithreaded program from the saved state, the facility performs
speculative execution using a copy-on-write ("COW") strategy. That
is, instead of initializing the virtual machine with the entire
saved state at the outset, the facility waits for a thread to
perform an access operation that would change a region of memory in
the saved state and, in response, creates a copy of that region and
performs the access operation on the copy. When a COW
strategy is used to simulate execution, the facility is able to
perform multiple simulations to explore multiple schedules because
the saved state is left unchanged.
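The copy-on-write strategy can be illustrated with a small Python sketch. The `CowMemory` class, the page-granular dictionary layout, and the sample values are assumptions made for this example; an actual embodiment would more likely rely on page protection in the virtual machine's memory management.

```python
class CowMemory:
    """Copy-on-write view over a saved state, at page granularity."""

    def __init__(self, saved_pages):
        self._saved = saved_pages   # shared by all simulations, never modified
        self._private = {}          # pages copied by this simulation only

    def read(self, page, offset):
        # Prefer this simulation's private copy; otherwise read the saved page.
        source = self._private[page] if page in self._private else self._saved[page]
        return source[offset]

    def write(self, page, offset, value):
        if page not in self._private:
            # First write to this page: copy it, then write to the copy.
            self._private[page] = bytearray(self._saved[page])
        self._private[page][offset] = value

# Two simulations start from the same saved state.
saved_state = {0: bytes([1, 2, 3, 4]), 1: bytes([5, 6, 7, 8])}
sim_a = CowMemory(saved_state)
sim_b = CowMemory(saved_state)
sim_a.write(0, 1, 99)
```

After the write, only `sim_a` sees the new value; `sim_b` and the saved state are unaffected, and only the page that was written has been copied.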
[0048] After the facility has conducted a number of different
simulations in a simulation round, the facility selects one of the
simulations based on judging it the most interesting. The facility
"retires" the selected simulation by causing the root machine to
resume executing the program from the saved state while imposing
the path used in the selected simulation. The facility may also or
instead "package" the selected simulation for later replay on the
same or another virtual machine. In some embodiments, rather than
considering for simulation all possible paths--that is, every
possible combination of different schedules--the facility considers
only paths that include a functionally distinct set of possible
schedules by eliminating possible schedules that are functionally
equivalent with possible schedules in the functionally distinct
set. In particular, in such embodiments, the facility employs a
system of vector clocks that track, throughout the course of a
simulation, for each thread, the most recent state of each of the
other threads that it could have been made aware of by
communications from the other thread. In maintaining such vector
clocks, the facility regards any write to an address by a first
thread that is followed by a read from the same address by a second
thread as a communication from the first thread to the second
thread. Each update to a clock component of a first thread
regarding a communication from a second thread that is completed in
a particular read instruction executed by the first thread is said
to establish a reorderability window of the second thread with
respect to the location read by the completing read instruction.
For each reorderability window established in this way by the
facility, it identifies as a functionally distinct reordering any
reordering that reconstitutes a reordering between the completing
read instruction of the second thread and the write instructions
for the same location by the first thread that are within the
window.
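The vector clock bookkeeping described above can be sketched as follows. This Python example is a generic vector clock illustration under stated assumptions, not the facility's exact algorithm: the `VectorClocks` class is hypothetical, each access is assumed to advance the accessing thread's own clock component, and each location is assumed to remember the clock of its most recent writer.

```python
class VectorClocks:
    """Per-thread and per-location vector clocks for one simulation."""

    def __init__(self, num_threads):
        # thread_clocks[t][u] is the latest state of thread u that thread t
        # could have been made aware of through communication.
        self.thread_clocks = [[0] * num_threads for _ in range(num_threads)]
        self.location_clocks = {}

    def write(self, thread, address):
        clock = self.thread_clocks[thread]
        clock[thread] += 1                            # assumed: each access ticks
        self.location_clocks[address] = list(clock)   # location remembers writer

    def read(self, thread, address):
        clock = self.thread_clocks[thread]
        clock[thread] += 1
        written = self.location_clocks.get(address)
        if written is not None:
            # A write followed by a read of the same address is treated as a
            # communication, so the reader absorbs the writer's clock.
            for other, seen in enumerate(written):
                clock[other] = max(clock[other], seen)

clocks = VectorClocks(num_threads=2)
clocks.write(0, "X")   # thread 0 writes X
clocks.read(1, "X")    # thread 1 reads X: communication from thread 0
clocks.write(0, "Y")   # thread 0 writes Y, which thread 1 has not read
```

In this sketch, the update to thread 1's component for thread 0 at the read of X is the kind of clock update that would establish a reorderability window; thread 1 remains unaware of thread 0's later write to Y.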
[0049] In some embodiments, the facility exhaustively explores the
minimized set of possible schedules, conducting a separate
simulation for each path that can be constructed from the
functionally distinct set of schedules. In some embodiments, the
facility explores the paths through the functionally distinct
reorderings incompletely. As an example, in some embodiments, the
facility randomly selects fewer than all such paths. As another
example, in some embodiments, the facility actively identifies
promising paths among these based upon explicit hints included in
the program by its developer that identify points or regions in the
code that the developer expected to be interesting to explore. The
facility observes, during the foregoing simulation, which points or
regions identified by hints were executed. The facility then
selects one or more schedules proximate to the identified points
and regions to perturb in the path for the prior simulation to
arrive at the path for the next simulation.
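One way to picture the hint-guided selection is the following Python sketch. It assumes, purely for illustration, that each schedule in the prior path is tagged with an instruction position, that each executed hint also has a position, and that proximity is measured as a simple distance between the two; the function name `schedules_near_hints`, the `max_distance` parameter, and the sample data are all hypothetical.

```python
def schedules_near_hints(prior_path, hint_positions, max_distance):
    """Return the locations whose accesses in the prior simulation occurred
    within max_distance instructions of a hinted point that was executed."""
    chosen = []
    for position, location in prior_path:
        if any(abs(position - hint) <= max_distance for hint in hint_positions):
            chosen.append(location)
    return chosen

# Hypothetical prior path: (instruction position, location accessed).
prior_path = [(10, "A"), (48, "B"), (52, "C"), (200, "D")]

# A single hinted point was executed at position 50 in the prior simulation.
to_perturb = schedules_near_hints(prior_path, hint_positions=[50], max_distance=5)
```

The schedules for the selected locations would then be perturbed in the prior path to obtain the path for the next simulation, while the schedules for the remaining locations are left as they were.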
[0050] In some embodiments, the facility attributes a "retirement
score" to each simulation in a simulation round and uses these
scores as a basis for selecting the simulation from the simulation
round to retire or replay. In some embodiments, a simulation's
retirement score is
based upon whether a crash or other error was encountered during
the simulation, including processor faults such as a general
protection fault, a divide by zero exception, accessing a null
pointer, etc. In some embodiments, a simulation's retirement score
is based upon explicit hints included in the program by its
developer. As one example of hints, the developer can include one
or more assertions in the program. Each assertion specifies a
condition that the developer always expects to be true when
execution reaches the point in the program where the assertion is
placed. In some embodiments, whether the condition specified by one
of these assertions evaluates as false during a simulation
contributes to the simulation's retirement score. As another
example of hints, the developer can include specialized function
calls or other constructs that identify a point in the program or a
region in the program as being interesting, such as points or
regions that contain error-prone code or are otherwise sensitive.
In some embodiments, the number and/or weight of interesting points
and sections encountered by the simulation contributes to the
simulation's retirement score. In some embodiments, when a
simulation round is completed, the facility selects the simulation
having the highest retirement score to be retired and/or replayed.
In some embodiments, as soon as a simulation is completed whose
retirement score exceeds a predetermined threshold, the simulation
round is terminated and the simulation in question is retired
and/or replayed.
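A minimal Python sketch of retirement scoring and selection follows. The `SimulationResult` fields, the specific weights, and the additive scoring formula are assumptions chosen for illustration; the description above identifies the contributing factors but does not prescribe how they are combined.

```python
from dataclasses import dataclass, field

# Assumed weights; the relative importance of each factor is illustrative.
CRASH_WEIGHT = 100
ASSERTION_WEIGHT = 50

@dataclass
class SimulationResult:
    path: str
    crashed: bool = False
    failed_assertions: int = 0
    hint_weights: list = field(default_factory=list)  # interesting points hit

def retirement_score(result):
    """Combine crashes, failed assertions, and hinted regions into one score."""
    return (CRASH_WEIGHT * result.crashed
            + ASSERTION_WEIGHT * result.failed_assertions
            + sum(result.hint_weights))

def select_for_retirement(results, threshold=None):
    """Pick the highest-scoring simulation, or stop early at the first
    simulation whose score exceeds the threshold, if one is given."""
    best = None
    for result in results:
        score = retirement_score(result)
        if threshold is not None and score > threshold:
            return result  # simulation round terminated early
        if best is None or score > retirement_score(best):
            best = result
    return best

round_results = [
    SimulationResult("path-1", hint_weights=[1, 2]),
    SimulationResult("path-2", failed_assertions=1),
    SimulationResult("path-3", crashed=True, hint_weights=[5]),
]
selected = select_for_retirement(round_results)
early = select_for_retirement(round_results, threshold=40)
```

Without a threshold the crashing simulation is selected; with a threshold of 40 the round ends as soon as the simulation with the failed assertion completes.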
[0051] In some embodiments, the facility provides a mechanism for
delivering at least some types of interrupts to simulations
executing on a virtual machine--such as inter-processor interrupts
("IPIs") and timer device interrupts ("TDIs")--and/or permits
simulations to continue to execute in the virtual machine through
the invocation of an interrupt handler after an interrupt of one of
these types occurs. In some embodiments, the facility provides such
a mechanism by providing a synthetic advanced programmable
interrupt controller ("APIC") that sends and receives IPIs, and/or
by providing virtual timers to generate TDIs.
[0052] By operating in some or all of the ways described above, the
facility enables developers to more efficiently and/or effectively
identify interesting schedules in a multithreaded program, such as
schedules that surface concurrency defects.
[0053] FIG. 1 is a block diagram showing an example architecture of
a computing system 100 on which the facility executes in some
embodiments. Certain well-known structures and functions have not
been shown or described in detail to avoid obscuring the
description. The computing system 100 includes one or more
processors 105 and memory 110 coupled to an interconnect system
115. The processors 105 are the central processing units ("CPUs")
of the computing system 100 and, thus, control its overall
operation. In some embodiments, the processors 105 accomplish this
by executing software stored in memory 110. In some embodiments,
the computing system 100 has at least one processor 105 having two
or more independent cores in a package composed of a single
integrated circuit (referred to as a "die"), one or more dies
packaged together, multiple packages, and so on. In some
embodiments, the computing system 100 has at least one
hyper-threaded processor 105 that, despite having only a single
core, is capable of performing as a multi-core processor. A
processor 105 may be, or may include, one or more programmable
general-purpose or special-purpose microprocessors, digital signal
processors ("DSPs"), programmable controllers, application specific
integrated circuits ("ASICs"), programmable logic devices ("PLDs"),
or the like, or a combination of such devices.
[0054] The interconnect system 115 shown in FIG. 1 is an
abstraction that represents any one or more separate physical buses
and/or point-to-point connections, connected by appropriate
bridges, adapters and/or controllers. The interconnect system 115
may include, for example, a system bus, a form of Peripheral
Component Interconnect (PCI) bus, a HyperTransport or industry
standard architecture (ISA) bus, a small computer system interface
(SCSI) bus, a universal serial bus (USB), an Institute of
Electrical and Electronics Engineers (IEEE) standard 1394 bus
(sometimes referred to as "Firewire"), and so on.
[0055] System memory 110 includes a memory 120 for storing programs
and data while they are being used; a persistent storage device
125, such as a hard drive, for persistently storing programs and
data; and a computer-readable media drive 130, such as a CD-ROM or
DVD-ROM drive, for reading programs and data stored on a
computer-readable medium. As used herein, system memory 110
includes any form of volatile, nonvolatile, removable, and
non-removable media, or any combination of such media devices that
are capable of storing information such as computer-readable
instructions, data structures, program modules, and other data of
the computing system 100.
[0056] Also connected to the processors 105 through the
interconnect system 115 is a network adapter 135 and one or more
input devices and output devices ("I/O devices") 140. The network
adapter 135 provides the computing system 100 with the ability to
communicate with other computing systems over a network and may be,
for example, an Ethernet adapter. The I/O devices 140 provide a
user of the computing system 100 with the ability to access
programs and data stored in system memory 110. For example, I/O
devices 140 may include input devices such as a keyboard, pointing
device, microphone, etc., and output devices such as a display
device, speakers, a printer, and so on.
[0057] While computing systems configured as described above are
typically used to support the operation of the facility, those
skilled in the art will appreciate that the facility may be
implemented using devices of various types and configurations, and
having various components.
[0058] FIG. 2 is a data flow diagram depicting the exploration and
retirement performed by the facility in some embodiments. In the
illustrated embodiment, the facility includes a root machine 200
and a virtual machine 205. The root machine 200 may be a physical
machine or a virtual machine. The root machine 200 executes a
multithreaded program (not shown), while the virtual machine 205 is
used to explore various schedules of the threads of the
multithreaded program. In some embodiments, at arbitrary and/or
specified points in execution, the facility pauses execution of the
multithreaded program on the root machine 200 to explore various
schedules and determine whether a replay package should be
generated for one or more explored schedules. While execution of
the multithreaded program is paused, the facility captures the
state 210 of the root machine 200 and the address space 215 of the
multithreaded program. The address space 215 of the multithreaded
program is unchanged while execution of the multithreaded program
is paused. The virtual machine 205 is initialized with a copy 220
of the machine state 210 and a copy 225 of the address space 215 of
the multithreaded program. The virtual machine 205 simulates
execution of the multithreaded program to identify one or more
memory locations accessed by at least two threads. For at least one
identified memory location, the virtual machine 205 attempts to
reorder the threads' access to the identified memory location. For
one or more of the attempted reorderings ("explored schedules"),
the facility maintains a record of the regions of memory 230
accessed by the threads and the path 235. One or more of the paths
235 are included in a replay package 250 together with the copied
machine state and the regions of memory 230 accessed by the
threads. The regions of memory 230 that are included in the replay
package 250 have the same contents as the corresponding regions of
memory at the time that the multithreaded program was paused. The
regions of memory 230 that are included in the replay package 250
may be copied to the replay package 250 from the address space 215
or 225 in a variety of ways. For example:
[0059] In some embodiments,
while simulating execution of the multithreaded program on the
virtual machine 205, the facility speculatively executes the
multithreaded program using a copy-on-write ("COW") strategy. In
response to a thread's request to write to a memory location, the
facility creates a copy of the region of memory that contains the
memory location and the write operation is performed on the copied
region. When a COW strategy is used to simulate execution, the
facility is able to perform multiple simulations to explore
multiple schedules because the copied address space 225 is left
unchanged. For each explored schedule, the facility maintains a
record of the regions of memory accessed (e.g., read, written,
etc.) by threads of the multithreaded program. For each explored
schedule that is included in the replay package 250, the facility
includes a copy of the regions of memory 230 from the address space
215 or 225. In some embodiments, if multiple explored schedules are
included in the replay package 250, the facility includes the union
of the regions of memory accessed by the threads to perform each of
the explored schedules.
[0060] In some embodiments, while
simulating execution of the multithreaded program on the virtual
machine 205, the facility executes the multithreaded program using
a copy-on-access ("COA") strategy. In response to a thread's
request to access (e.g., read, write, etc.) a location within a
region of memory, the facility determines whether the request is
the first request to access any location within the region of memory.
If the request is the first request, the facility creates a copy of
the region of memory 230 from the address space 215 or 225 before
the access operation is performed. If the simulation is included in
the replay package 250, the copied regions of memory 230 are
included in the replay package 250. In some embodiments, if
multiple simulations are performed using the COA strategy, the
facility includes the union of the regions of memory accessed by
the threads for each of the simulations included in the replay
package 250.
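By way of illustration only, the copy-on-write strategy described above might be sketched in Python as follows. The class name CowSimulationMemory, the function package_regions, and the representation of the address space as a dictionary of region contents are assumptions made for this sketch and are not taken from the facility itself.

```python
class CowSimulationMemory:
    """Hypothetical per-simulation view of a shared, read-only address space copy."""

    def __init__(self, base):
        self.base = base        # region id -> bytes; never modified by a simulation
        self.private = {}       # region id -> bytearray copied on first write
        self.accessed = set()   # regions read or written during this simulation

    def read(self, region, offset):
        self.accessed.add(region)
        source = self.private.get(region, self.base[region])
        return source[offset]

    def write(self, region, offset, value):
        self.accessed.add(region)
        if region not in self.private:
            # Copy the region before the first write so the base stays unchanged.
            self.private[region] = bytearray(self.base[region])
        self.private[region][offset] = value


def package_regions(base, simulations):
    """Union of accessed regions, with contents as of the time of the pause."""
    regions = set().union(*(sim.accessed for sim in simulations))
    return {region: base[region] for region in sorted(regions)}


base = {1: bytes([0, 0]), 2: bytes([7, 7]), 3: bytes([9])}
sim_a = CowSimulationMemory(base)
sim_b = CowSimulationMemory(base)
sim_a.write(1, 0, 5)             # the write lands in sim_a's private copy only
value_seen_by_a = sim_a.read(1, 0)
value_seen_by_b = sim_b.read(2, 1)
package = package_regions(base, [sim_a, sim_b])
```

Because each simulation writes only to its own private copies, any number of simulations can start from the same base, and the packaged regions keep the contents they had when the program was paused.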
[0061] In some embodiments, a selected path 245 is "retired" on the
root machine 200 by deterministically executing the multithreaded
program on the root machine 200 in accordance with the path
245.
[0062] FIG. 3 is a flow diagram showing steps typically performed
by the facility to conduct a simulation round and retirement for a
multithreaded program in some embodiments. In step 301, the
facility begins execution of the program on the root machine. In
some embodiments, the root machine is a physical machine, while in
other embodiments the root machine is a root virtual machine. In
step 302, the facility pauses execution of the program on the root
machine. In step 303, the facility saves the state of the root
machine. In some embodiments, the state saved in step 303 includes
all programmer visible registers (e.g., the control register, the
general purpose registers, and the floating point register), the
stack, the heap, and the page table. In some embodiments, step 303
includes saving the state persistently, such as on hard disk. In
step 304, the facility determines an initial path to use in the
first simulation of the simulation round. In some embodiments, the
path determined by the facility of step 304 is the path that
results from permitting the program to execute in a natural order,
without artificially suspending execution of any of the program's
threads for the purpose of perturbing the execution schedule of the
threads. In some embodiments, the facility uses information
determined in or from past simulation rounds for the same program
in order to determine in step 304 an initial path other than the
natural path. In step 305, the facility initializes a virtual
machine using the state saved in step 303. In some embodiments, the
state is initialized with the saved state using a hypercall
interface that alters the state of the virtual machine in a
predefined way (e.g., by placing values in registers, program
counters, etc.) and then performs an action that causes a transfer
of control to the virtual machine monitor (e.g., a VMEXIT). When
the virtual machine monitor runs it examines the state of the
machine and uses this to determine what action to perform on behalf
of the virtual machine. In step 306, the facility performs a
simulation by, beginning with the saved state with which the
virtual machine was initialized in step 305, executing the program
on the virtual machine, imposing a determined path and observing
the results. Because this execution begins with the saved state of
the root machine, this execution begins at the point at which the
program was paused on the root machine. For example, the program
counter for each thread at the time the program is executed on the
virtual machine is the same as the program counter that was saved
on the physical machine. Details of step 306 are described in
greater detail below in connection with FIGS. 4-7. In step 307,
based upon the results of the simulation conducted in step 306, the
facility attributes to the simulation a retirement score reflecting
the strength of the simulation's candidacy for the simulation that
will be retired or packaged for replay. In step 307, the facility
further stores the attributed score, together with information
defining the path pursued during the simulation. Details of step
307 are discussed in greater detail below in connection with FIG.
10. In step 308, if additional simulations are to be performed as
part of the current simulation round, then the facility continues
in step 309, else the facility continues in step 310. In step 309,
based upon the results of the simulation, the facility determines
an additional path to explore in a future simulation of the round.
Details of step 309 are discussed in greater detail below in
connection with FIGS. 10-12. After step 309, the facility continues
in step 305 to begin the next simulation. In step 310, the facility
selects the simulation of the round that received the highest
retirement score in step 307. In step 311, the facility retires the
selected simulation by resuming execution of the program on the
root machine, imposing a path pursued in the simulation selected in
step 310. Additionally or alternatively, in step 311, the facility
packages the simulation selected in step 310 for replay. Additional
details about such packaging and replay are provided in U.S.
Provisional Patent Application No. 61/393,691 entitled
"DETERMINISTIC REPLAY PACKAGE," filed on Mar. 12, 2010, hereby
incorporated by reference in its entirety. During either retirement
or replay, the execution of the program consistent with the path of
the selected simulation can be observed using a debugger or a
variety of other tools to identify and analyze events of interest
that occur when the program is executed consistent with the path of
the selected simulation. After step 311, these steps conclude.
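By way of illustration only, the simulation round of steps 304-310 might be sketched in Python as follows. The function run_simulation_round and its callback parameters simulate, score, and next_path are hypothetical names introduced for this sketch; the toy callbacks stand in for the virtual machine and the scoring logic, which are not modeled here.

```python
import copy


def run_simulation_round(saved_state, simulate, score, initial_path,
                         next_path, max_simulations):
    """Return (retirement_score, path) for the best simulation of the round."""
    results = []
    path = initial_path                          # step 304
    for _ in range(max_simulations):             # step 308 bounds the round
        vm_state = copy.deepcopy(saved_state)    # step 305: fresh VM state
        outcome = simulate(vm_state, path)       # step 306
        results.append((score(outcome), path))   # step 307
        path = next_path(outcome)                # step 309
        if path is None:
            break
    return max(results, key=lambda item: item[0])   # step 310


# Toy stand-ins (assumptions): a path is a tuple of thread ids, and only
# the path (2, 1) makes the simulated program crash.
saved = {"x": 0}
candidates = iter([(2, 1), (1, 2, 1)])


def toy_simulate(vm_state, path):
    vm_state["x"] += len(path)   # mutates only the virtual machine's copy
    return {"crashed": path == (2, 1)}


best = run_simulation_round(
    saved,
    toy_simulate,
    score=lambda outcome: 1.0 if outcome["crashed"] else 0.0,
    initial_path=(1, 2),
    next_path=lambda outcome: next(candidates, None),
    max_simulations=10,
)
```

Each simulation starts from a fresh copy of the saved state, so the saved state itself remains available for retirement or packaging of the selected path in step 311.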
[0063] While FIG. 3 shows the use of only a single virtual machine
to perform simulations, in some embodiments, the facility uses
multiple virtual machines to simultaneously perform different simulations.
These multiple virtual machines may execute on the same or
different physical machines.
[0064] Those skilled in the art will appreciate that the steps
shown in FIG. 3 and in each of the flow diagrams discussed below
may be altered in a variety of ways. For example, the order of the
steps may be rearranged; some steps may be performed in parallel;
shown steps may be omitted, or other steps may be included;
etc.
[0065] FIG. 4 is a flow diagram showing steps performed by the
facility in one or more embodiments to record a journal describing
a simulation of a multithreaded program. In some embodiments, the
facility records a journal by trapping thread requests to perform
access operations before the access operations are actually
performed. An access operation can be a read, a write, or more
complex operation such as an operation to acquire or release a
synchronization mechanism, which may be treated as both a read and
a write.
[0066] The facility loops through steps 405-430 for each request by
a thread of the multithreaded program to access a memory location
within a region of memory. FIG. 4 can be better understood in the
context of FIG. 5 and the table listed below, which shows the
regions of memory accessed as a result of the operations shown in
FIG. 5. In various embodiments, those regions are pages of memory,
pages in the page table, or portions of pages, such as
quarter-pages, cache lines, or words.
TABLE-US-00001
location      memory region   accessed?
A, B, C       1               Yes
D             2               Yes
E, F, G       3               No
H             4               No
W, X, Y, Z    5               Yes
[0067] FIG. 5 is a table diagram showing access operations
performed by each thread as the multithreaded program executes over
a period of time. To facilitate description, the table 500 is
indexed by a time column 505. Columns 510, 515 and 520 list the
access operations performed by thread 1, thread 2, and thread 3,
respectively. These columns do not show operations executed by the
threads that are not access operations. Thread 1 executes access
operations at times 1, 2, 3, 4, and 5; thread 2 executes access operations
at times 2, 3, and 6; and thread 3 executes access operations at
times 4 and 5. The facility loops through steps 405-430 for each
access operation shown in FIG. 5. For example, at time 1, the
facility loops through steps 405-430 for the access operation
("write A") performed by thread 1. As another example, at time 4,
the facility loops through steps 405-430 for the access operation
("write A") performed by thread 1 and the access operation ("read
X") performed by thread 3.
[0068] While FIG. 5 and each of the table diagrams discussed below
show a table whose contents and organization are designed to make
them more comprehensible by a human reader, those skilled in the
art will appreciate that actual data structures used by the
facility to store this information may differ from the table shown,
in that they, for example, may be organized in a different manner;
may contain more or less information than shown; may be compressed
and/or encrypted; etc.
[0069] Returning to FIG. 4, in step 405, the facility traps a
request by a thread to access a memory location within a region of
memory. For example, at time 1, the facility traps the access
operation ("write A") that is to be performed by thread 1. In step
410, the facility determines whether the request is the first
request to access any location within the region of memory. If the
request is the first request to access any location within the
region of memory, then the facility continues at step 415; else the
facility continues at step 420. In step 415, the facility records
an identifier associated with the accessed region of memory to
identify that the region of memory has been accessed by a thread of
the multithreaded program. For example, a region identifier of an
accessed region may be recorded in a table or other data structure.
As another example, the facility can mark a region as accessed in
response to a thread requesting to access a location within the
region. In some embodiments, instead of or in addition to recording
an identifier associated with the accessed region of memory, the
facility creates a copy of the region of memory before the access
operation is performed. If the simulation is included in the replay
package, the copied regions of memory are included in the replay
package. In the example illustrated in FIG. 5, the facility
determines that memory region 1 is first accessed by thread 1 at
time 1; memory region 2 is first accessed by thread 1 at time 2;
and memory region 5 is first accessed by thread 1 at time 3. Each
of these access operations is trapped by the facility and an
identifier associated with the accessed region is recorded in step
415. In the example illustrated in FIG. 5, the remaining
instructions are trapped and the facility determines that the
requested locations are within regions of memory previously
accessed. For example, for the access operation ("write B")
performed by thread 2 at time 3, the facility determines that the
request for location B is not the first access to the region of
memory (region 1), in which case the facility continues at step
420.
[0070] In step 420, the facility determines whether the requested
memory location has been accessed by another thread during the
simulation. If the memory location has been accessed by another
thread, the facility continues at step 425; else the facility
continues at step 430. To determine whether a memory location has
been accessed by another thread, the facility may maintain a list
of each memory location that has been accessed during the
simulation, for example. In step 425, the facility records the
order in which the threads accessed the memory location ("schedule
order"). For example, for the access operation performed at time 2
by thread 2, the facility determines that memory location A has
been previously accessed by thread 1. In this example, in step 425
for the access operation performed at time 2 by thread 2, the
facility records the schedule order for memory location A as A(1,
2). As another example, in step 425 for the access operation
performed at time 4 by thread 1, the facility records the schedule
order for memory location A as A(1, 2, 1). As yet another example,
in step 425 for the access operation performed at time 5 by thread
3, the facility records the schedule order for memory location B as
B(2, 3).
[0071] In step 430, the facility determines whether the simulation
is complete. For example, the facility may determine that a
simulation is complete if the multithreaded program crashes. As
another example, the facility may determine that a simulation is
complete if the multithreaded program reaches an unexpected state.
As yet another example, the facility may determine that a
simulation is complete if the multithreaded program ends. If the
simulation is not complete, the facility continues at step 405 to
trap the next access operation; else, the process 400 ends.
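By way of illustration only, the journal recording of steps 405-430 might be sketched in Python as follows. The function record_journal is a hypothetical name. The trace is loosely modeled on the FIG. 5 example: the threads, times, and the locations A, B, C, and X follow the description above, while the operation kinds not stated in the text and the choice of locations D and W for the first accesses to regions 2 and 5 are assumptions.

```python
# Location-to-region mapping from the table above.
REGION_OF = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 3, "F": 3, "G": 3,
             "H": 4, "W": 5, "X": 5, "Y": 5, "Z": 5}

# (time, thread, operation, location); entries marked "assumed" are not
# spelled out in the description.
TRACE = [
    (1, 1, "write", "A"),
    (2, 1, "read", "D"),    # assumed: first access to region 2
    (2, 2, "read", "A"),    # assumed operation kind
    (3, 1, "read", "W"),    # assumed: first access to region 5
    (3, 2, "write", "B"),
    (4, 1, "write", "A"),
    (4, 3, "read", "X"),
    (5, 1, "read", "C"),    # assumed operation kind
    (5, 3, "read", "B"),    # assumed operation kind
    (6, 2, "write", "C"),   # assumed operation kind
]


def record_journal(trace, region_of):
    accessed_regions = []   # step 415: regions in order of first access
    accessors = {}          # location -> threads in access order
    schedule_order = {}     # step 425: recorded only for shared locations
    for _time, thread, _operation, location in trace:   # step 405
        region = region_of[location]
        if region not in accessed_regions:               # step 410
            accessed_regions.append(region)              # step 415
        history = accessors.setdefault(location, [])
        shared = any(other != thread for other in history)   # step 420
        history.append(thread)
        if shared:
            schedule_order[location] = tuple(history)    # step 425
    return accessed_regions, schedule_order


regions, orders = record_journal(TRACE, REGION_OF)
```

With this trace, the recorded regions are 1, 2, and 5, and the recorded schedule orders include A(1, 2, 1) and B(2, 3), in agreement with the examples given for steps 415 and 425.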
[0072] In some embodiments, the schedule is the subset of
potentially communicating accesses in the order in which those
accesses are executed by the threads of a multithreaded program.
For example, the schedule illustrated in FIG. 5 is {<location A,
thread 1>, <location A, thread 2>, <location B, thread 2>,
<location A, thread 1>, <location C, thread 1>,
<location B, thread 3>, <location C, thread 2>}.
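By way of illustration only, such a schedule might be derived from a list of accesses as in the following Python sketch. The function communicating_schedule is a hypothetical name, and the sketch assumes that an access is "potentially communicating" when its location is accessed by at least two threads during the simulation. The accesses to locations D and W are assumed stand-ins for the first accesses to regions 2 and 5.

```python
from collections import defaultdict


def communicating_schedule(accesses):
    """Keep, in execution order, accesses to locations shared by 2+ threads."""
    threads_by_location = defaultdict(set)
    for location, thread in accesses:
        threads_by_location[location].add(thread)
    return [(location, thread) for location, thread in accesses
            if len(threads_by_location[location]) >= 2]


# (location, thread) pairs in execution order, modeled on FIG. 5.
accesses = [("A", 1), ("D", 1), ("A", 2), ("W", 1), ("B", 2),
            ("A", 1), ("X", 3), ("C", 1), ("B", 3), ("C", 2)]
schedule = communicating_schedule(accesses)
```

The accesses to D, W, and X are dropped because each of those locations is touched by a single thread, leaving the seven entries of the schedule listed above.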
[0073] FIG. 6 is a flow diagram showing steps typically performed
by the facility in some embodiments to maintain clock vectors for
the program's threads and memory locations accessed by the
program's threads during a single simulation of the program. In
step 601, the facility traps an access operation to an accessed
memory location by an accessing thread. In some embodiments, the
facility performs this trapping in a hypervisor on top of which the
program is executing. In various embodiments, the facility performs
this trapping inside the operating system kernel, or in the
processor. In some embodiments, the facility performs this trapping
when the thread reaches the access operation, but before the access
operation is actually completed. In some embodiments, the access
operation can be a read, a write, or a more complex operation such
as an operation to acquire or release a synchronization mechanism,
which is treated as both a read and a write. In various
embodiments, operations treated by the facility as both read and
write include operations such as atomic compare and swap
instructions. In some embodiments, this trapping is performed only
with respect to certain memory access operations, such as memory
access operations with respect to memory locations in certain
memory pages or memory words. The remaining steps of FIG. 6 can be
better understood in the context of FIG. 7.
[0074] FIG. 7 is a sample clock vector diagram that shows how the
facility maintains clock vectors for a program's threads and memory
locations accessed by them in some embodiments. The clock vector
diagram comprises a table 700 that indicates, as the simulation
progresses, the state of clock vectors 714-715 of the program's two
threads and the state of clock vectors 716-718 of the memory
locations accessed by the threads. The table is indexed by an
absolute time column 711. In some embodiments (not shown), the
facility makes no effort to maintain an absolute time, and simply
uses a system of local times that is local to each thread. Columns
712-713 show the memory access instructions executed by thread 1
and thread 2, respectively. These columns do not show instructions
executed by the threads that are not memory access instructions.
From these columns, it can be seen that no thread executes a memory
access instruction at time 0. Thread 1 executes memory access
instructions at times 1-4; and thread 2 executes memory access
instructions at times 5-8. Each clock vector contains two
components each corresponding to one of the threads. For example,
in the clock vector state <4, 6> occurring at time 6 in
column 715, the first component, 4, corresponds to thread 1; and
the second component, 6, corresponds to thread 2. In particular,
this state of the clock vector for thread 2 indicates that, at time
6, based upon memory access operations performed by the threads up
until time 6: it is possible that thread 2 has been influenced by
processing of thread 1 up through the memory access instruction at
time 4; and it is possible that thread 2 has been influenced by its
own processing through the memory access instruction at time 6.
Where the state of a particular clock vector does not change at a
particular time, the box at that time in the clock vector's column
is blank. For example, because the state of the clock vector for
location A does not change at time 2, the box at time 2 in column
716 is blank, which should be understood to mean that the clock
vector for location A at time 2 remains <1, 0> from time
1.
[0075] Returning to FIG. 6, in step 602, the facility advances the
accessing thread's component of its own clock vector to reflect the
advancement of absolute time. As an example, for the memory access
instruction performed by thread 1 at time 2, the clock vector for
accessing thread 1 in column 714 is changed from <1, 0> to
<2, 0>, such that its first component corresponding to
accessing thread 1 is set equal to time 2.
[0076] Returning to FIG. 6, in step 603, the facility determines
the type of memory access operation that the accessing thread is
performing; if the performed access operation is a write (shown as
"wr" in FIG. 7), then the facility continues in step 604; if the
performed access operation is a read (shown as "rd" in FIG. 7),
then the facility continues in step 605. As mentioned above, the
facility treats some complex memory access operations, such as the
exchange operation (shown as "x" in FIG. 7) as both a read and a
write, and so continues at both step 604 and step 605.
[0077] In step 604 where the access operation is a write, the
facility modifies the clock vector for the accessed memory location
based upon the clock vector of the accessing thread. In particular,
the facility sets the clock vector for the accessed memory location
to be the component-wise maximum of the current clock vector for
the accessed memory location and the clock vector of the accessing
thread. The facility performs step 604 as a recognition that, by
writing information to the accessed memory location, the accessing
thread could have stored in the accessed memory location
information that is based on its current state, or the state of any
other thread of which the accessing thread is currently aware. This
information may be later obtained and used by any thread that
subsequently reads the accessed memory location. For example, for
the access operation performed at time 6 by thread 2 that involves
writing to location C, the facility determines the new clock vector
state <4, 6> for location C at time 6 in column 718 by
performing a component-wise maximum of the clock vector state for
location C at times 0-5, <0, 0>, and the clock vector state
for thread 2 at time 6, <4, 6>. In some embodiments, if a
clock vector does not exist for the accessed memory location, the
facility creates and initializes a new clock vector for the
accessed memory location as part of step 604. After step 604, the
facility continues in step 601 to trap the next access
operation.
[0078] In step 605 where the access operation is a read, the
facility modifies the clock vector of the accessing thread based
upon the clock vector of the accessed memory location. In
particular, the facility sets the clock vector for the accessing
thread to be the component-wise maximum of the current clock vector
for the accessing thread and the clock vector for the accessed
memory location. The facility performs step 605 as a recognition
that, by reading information from the accessed memory location, the
accessing thread could have obtained and used information stored in
the accessed memory location by any earlier writer to the accessed
memory location that reflects the writing thread's state on
writing, as well as the state of any other thread of which the
writing thread is then aware. For example, for the access operation
performed at time 2 by thread 1 that involves reading from location
B, the facility determines the new clock vector state <2, 0>
for thread 1 in the box for time 2 in column 714 by performing a
component-wise maximum of the current clock vector state for
accessing thread 1 as advanced in step 602, <2, 0>, and the clock
vector state for location B, <0, 0>. After step 605, the facility continues
in step 601 to trap the next access operation.
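By way of illustration only, the clock vector maintenance of steps 601-605 might be sketched in Python as follows. The function update_clocks is a hypothetical name, and the trace contains only those accesses of the FIG. 7 example that are spelled out in the description above; the remaining accesses of FIG. 7 are omitted rather than guessed.

```python
def update_clocks(trace, n_threads):
    """Maintain one clock vector per thread and per accessed memory location."""
    thread_clock = {t: [0] * n_threads for t in range(1, n_threads + 1)}
    location_clock = {}
    for time, thread, operation, location in trace:       # step 601
        tc = thread_clock[thread]
        tc[thread - 1] = time                              # step 602
        # Create a zeroed clock vector for a location on first access.
        lc = location_clock.setdefault(location, [0] * n_threads)
        if operation in ("wr", "x"):                       # step 604
            lc[:] = [max(a, b) for a, b in zip(lc, tc)]
        if operation in ("rd", "x"):                       # step 605
            tc[:] = [max(a, b) for a, b in zip(tc, lc)]
    return thread_clock, location_clock


# (time, thread, operation, location); "x" is an exchange, treated as
# both a read and a write.
trace = [
    (1, 1, "x", "A"),
    (2, 1, "rd", "B"),
    (4, 1, "x", "A"),
    (5, 2, "x", "A"),
    (6, 2, "wr", "C"),
]
thread_clock, location_clock = update_clocks(trace, n_threads=2)
```

For an exchange, applying the write update before the read update or the reverse leaves both clock vectors at the same component-wise maximum, so the order chosen in this sketch does not affect the result.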
[0079] FIG. 8 is a flow diagram showing steps typically performed
by the facility in some embodiments to identify functionally unique
schedules using the vector clock history for a simulation. In step
801, the facility identifies in the vector clock history for a
simulation any instance where, in the vector clock of a first
thread at an identified time, a component of the vector clock
corresponding to a second thread increases from the preceding
vector clock of the first thread. By reviewing FIG. 7, it can be
seen that there is one such instance in the sample vector clock
history; the increased thread vector clock component for this
instance is underlined. In steps 802-806, the facility loops
through each instance identified in step 801. In step 803, the
facility establishes a reorderability window for the instance. The
established window extends from the time shown in the component of
the preceding vector clock of the first thread corresponding to the
second thread to the time shown in the component of the vector
clock of the first thread at the identified time corresponding to
the second thread. In the example, the reorderability window
extends from time 0 to time 4. In step 804, the facility identifies
any accesses that are within the reorderability window established
in step 803, that are by the second thread, and that are to or from
the location that is accessed by the first thread at the identified
time, except where the accesses by both the first and second
threads are reads. In the example, the facility identifies the
exchange operations by thread 1 at times 1 and 4. In step 805,
the facility identifies as functionally unique a reordering of
the access by the first thread at the identified time with
each of the accesses by the second thread identified in step
804. In the example, the facility identifies the following two
reorderings as functionally unique reorderings: a reordering of the
exchange by thread 2 at time 5 with the exchange by thread 1 at
time 1, and a reordering of the exchange by thread 2 at time 5 with
the exchange by thread 1 at time 4. In step 806, if additional
identified instances remain to be processed, then the facility
continues in step 802 to process the next identified instance, else
these steps conclude. In the example, the facility identifies
one-third as many functionally distinct reorderings as identified
by a more naive approach to selecting reorderings to consider
pursuing in a simulation. Accordingly, where there was a concurrency
bug that could only be exposed by simulating a path incorporating a
single possible reordering that is a functionally distinct
reordering, the facility is three times as likely to discover the
concurrency bug if it pursues paths based on functionally distinct
reorderings than it is if it pursues paths based on all possible
reorderings.
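By way of illustration only, steps 801-805 might be sketched in Python as follows. The function functionally_unique_reorderings is a hypothetical name. The sketch recomputes the clock vectors inline, treats both ends of the reorderability window as inclusive (an assumption), and uses only those accesses of the FIG. 7 example that are spelled out in the description.

```python
def functionally_unique_reorderings(trace, n_threads):
    """Return pairs ((time, thread), (earlier_time, earlier_thread))."""
    thread_clock = {t: [0] * n_threads for t in range(1, n_threads + 1)}
    location_clock = {}
    earlier = []   # accesses already processed
    found = []
    for time, thread, operation, location in trace:
        tc = thread_clock[thread]
        before = list(tc)                  # preceding vector clock
        tc[thread - 1] = time
        lc = location_clock.setdefault(location, [0] * n_threads)
        if operation in ("wr", "x"):
            lc[:] = [max(a, b) for a, b in zip(lc, tc)]
        if operation in ("rd", "x"):
            tc[:] = [max(a, b) for a, b in zip(tc, lc)]
        for other in range(1, n_threads + 1):
            if other == thread:
                continue
            start, end = before[other - 1], tc[other - 1]
            if end <= start:               # step 801: no increase, no instance
                continue
            # Step 803: the window runs from start to end (inclusive here).
            for t2, th2, op2, loc2 in earlier:           # step 804
                both_reads = operation == "rd" and op2 == "rd"
                if (th2 == other and loc2 == location
                        and start <= t2 <= end and not both_reads):
                    found.append(((time, thread), (t2, th2)))   # step 805
        earlier.append((time, thread, operation, location))
    return found


trace = [
    (1, 1, "x", "A"),
    (2, 1, "rd", "B"),
    (4, 1, "x", "A"),
    (5, 2, "x", "A"),
    (6, 2, "wr", "C"),
]
reorderings = functionally_unique_reorderings(trace, n_threads=2)
```

On this trace the only instance arises at time 5, when the component of thread 2's vector clock corresponding to thread 1 increases from 0 to 4, and the sketch pairs the exchange by thread 2 at time 5 with the exchanges by thread 1 at times 1 and 4.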
[0080] FIG. 9 is a sample path diagram showing paths containing
functionally distinct reorderings identified by the facility. The
path diagram shows in double lines the schedule pursued by the
facility in the preceding simulation as shown in FIG. 7. In
particular, each of the accesses 902-909 corresponds to accesses
shown in columns 712 and 713 of FIG. 7. As described above in
connection with FIG. 8, and the example shown in FIG. 7, the
facility identified two functionally distinct reorderings that each
lead to a functionally distinct schedule: reordering the exchange
of A by thread 2 at time 5--access 911 in FIG. 9--to occur before
the exchange of A performed at time 1 by thread 1--shown as access
902; and reordering the exchange of A at time 5 by thread 2--shown
as access 914--before the exchange of A at time 4 by thread
1--access 905. Also shown as broken single lines are possible
schedules that are based upon interleavings not found by the
facility to be functionally distinct, such as those in which access
914 occurs after access 903 or 904. In some embodiments, the
facility makes only path schedules based upon the two functionally
distinct schedules available for selection for the next
simulation.
[0081] FIG. 10 is a flow diagram showing steps typically performed
by the facility in some embodiments in order to employ hinting to
guide the exploration performed by the facility. Such hinting is
sometimes referred to as "paravirtualized speculation hinting." In
step 1001, the developer of the program includes hints in the
program's code. In some embodiments, the developer also or instead
specifies that particular hints apply to either points or regions
within the code without modifying the code, such as by storing
indications of the hints separately from the code with some form of
index into the code that can be used by the facility to identify
the particular point or region to which each hint applies, such as
instruction counts, instruction signatures, etc. In various
embodiments, the hints established for the program in step 1001 can
be of a variety of types.
[0082] One type of hint recognized by the facility in some
embodiments is assertions. Each assertion specifies a condition
that the developer always expects to be true when execution reaches
the point in the program where the assertion is placed. For
example:
TABLE-US-00002
    ...
    price = *A;
    Assert (price >= 0);
    ChargeCustomer (quantity * price);
    ...
[0083] By including the assertion in this code snippet, the
developer is indicating that something has gone wrong if the value
for the variable price, retrieved from location A, is negative. In
some cases, the facility leverages assertions added to the code by
the developer for a purpose other than guiding exploration, such as
for use with a debugger. An example of such assertions that are
leveraged in some embodiments is a language level assertion. In
some embodiments, the facility causes a compiler or a compiler
pre-processor to substitute specialized simulation hinting
assertions for language-level assertions.
[0084] One type of hint recognized by the facility in some
embodiments is hints relating to a level of interestingness of code
near a point or within a region, referred to herein as "code
interestingness hints." Interestingness hints may be either
positive, indicating that nearby code is interesting and nearby
schedules should be explored, or negative, indicating that nearby
code is not interesting and nearby schedules should not be
explored. In some embodiments, the facility can specify weightings
for interestingness hints, particularly positive interestingness
hints, indicating how interesting code nearby the hint is. In some
embodiments, interestingness hints are further subdivided, such as
into a class where the nearby code is interesting because it is
sensitive, as contrasted with another class where the nearby code
is interesting because it is expected to be rarely executed. In
some embodiments, hints, such as interestingness hints, can be
conditional, such as a hint that specifies that code near a certain
point is interesting if, at that point in the program's execution,
the amount of available resources of a certain type is below a specified
threshold.
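As an illustrative sketch only, weighted, classified, and conditional interestingness hints of the kind described above might be recorded as follows. The names Hint, HintLog, encounter, and the free_pages condition are hypothetical and are not part of the facility as described; the sketch only shows one way a conditional hint could be ignored unless its condition holds when execution reaches it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hint:
    """One code interestingness hint (hypothetical structure)."""
    location: str              # point or region identifier in the code
    weight: float = 1.0        # > 0 for a positive hint, < 0 for a negative hint
    kind: str = "sensitive"    # e.g. "sensitive" or "rarely_executed"
    # Optional predicate over program state; the hint applies only if it holds.
    condition: Optional[Callable[[dict], bool]] = None


@dataclass
class HintLog:
    """Collects the hints that actually fire during one simulation."""
    fired: List[Hint] = field(default_factory=list)

    def encounter(self, hint: Hint, state: dict) -> bool:
        # A conditional hint is recorded only when its condition is true
        # at this point in the program's execution.
        if hint.condition is None or hint.condition(state):
            self.fired.append(hint)
            return True
        return False


log = HintLog()
low_memory = Hint("alloc_path", weight=3.0, kind="rarely_executed",
                  condition=lambda s: s["free_pages"] < 16)
log.encounter(low_memory, {"free_pages": 100})   # condition false: ignored
log.encounter(low_memory, {"free_pages": 4})     # condition true: recorded
log.encounter(Hint("logging", weight=-1.0), {})  # negative, unconditional hint
```

In this sketch the first encounter is dropped because the resource level is above the threshold, so only two hints end up in the log.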
[0085] It is typical to add a hint relating to a particular point
in the code by adding a single function call specifying the type of
hint to the code at the point in question. For example:
TABLE-US-00003
...
InterestingPoint( );
average_price = price / 0;
...
[0086] It is typical to add a hint relating to a particular region
of code by adding two different function calls specifying the type
of the hint to the code, one at the beginning of the region and one
at the end of the region. For example:
TABLE-US-00004
...
BeginInterestingRegion(23);
price = *A;
quantity = *B;
EndInterestingRegion(23);
...
[0087] One type of hint recognized by the facility in some
embodiments is hints relating to a level of interestingness of a
piece of data or a data structure, referred to herein as "data
interestingness hints." It is typical to add a data interestingness
hint relating to a particular piece of data or data structure by
adding a function call specifying the type of hint to the code
adjacent to the declaration for the data or data structure. For
example:
TABLE-US-00005
...
int foo;
InterestingnessData( );
...
[0088] One type of hint recognized by the facility in some
embodiments is "directive hints," which specify that particular
schedules of memory accesses should occur, either during all
simulations or during a particular percentage of simulations. In
some embodiments, the developer adds such a directive hint to the
code by adding hints at the beginning and end of a region of code
executed by a first thread, and adding a third hint to a point in
code executed by a second thread that indicates that the second
thread should be suspended at the point of the third hint until the
first thread finishes executing the region defined by the first two
hints.
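The three-hint arrangement described above can be approximated with ordinary threads, as in the following minimal sketch. The function names begin_region, end_region, and wait_for_region are hypothetical stand-ins for the three hints, and the sketch uses a Python threading.Event as an analogy; the facility itself would impose the ordering through its control of thread scheduling in the simulation rather than through a user-level primitive.

```python
import threading

region_done = threading.Event()   # set when the first thread leaves the region
trace = []                        # order in which the steps actually ran


def begin_region():               # hypothetical hint at the start of the region
    region_done.clear()


def end_region():                 # hypothetical hint at the end of the region
    region_done.set()


def wait_for_region():            # hypothetical third hint, placed in thread 2
    region_done.wait()


def first_thread():
    begin_region()
    trace.append("t1: read price")
    trace.append("t1: read quantity")
    end_region()


def second_thread():
    wait_for_region()             # suspended here until the region completes
    trace.append("t2: update price")


# Start the second thread first to show that it still runs last.
t2 = threading.Thread(target=second_thread)
t1 = threading.Thread(target=first_thread)
t2.start()
t1.start()
t1.join()
t2.join()
```

Regardless of which thread the operating system happens to run first, the second thread's step is recorded only after both steps of the first thread's region.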
[0089] In some embodiments, the facility automatically adds hints
to the code in response to actions by users. For example, in some
embodiments, where a user uses a debugger to establish a watch
point for a piece of data or data structure, the facility
automatically adds a data interestingness hint for that data or
data structure. As another example, in some embodiments, in
response to a user using a debugger to add a break point to the
program's code at a particular point, the facility automatically
establishes a code interestingness hint at that point.
[0090] Returning to FIG. 10, in step 1002, during each simulation,
the facility uses some or all of the code and/or data hints it
encounters while executing the simulation to score paths that are
related to the path pursued during the current simulation. These
scores are then used by the facility in step 3010 to select a path
to be pursued in a future simulation. In some embodiments, the
facility uses some or all of the interestingness hints and directive
hints as a basis for scoring paths in step 1002.
[0091] FIG. 11 is a sample path diagram showing path scores
established by the facility in some embodiments using hints. The
path diagram shows in double lines the path pursued by the facility
in the preceding simulation, made up of schedules 1111, 1112, 1113,
1114, and 1115. As described above in connection with FIG. 8, the
facility identified functionally distinct reorderings that lead to
several other possible schedules: schedules 1122 and 1132 on
location D; schedules 1143 and 1153 on location E; schedules 1164,
1174, and 1184 on location F; and schedule 1195 on location G. During
the simulation, the facility encountered an interestingness hint at
point 1100, most proximal to the node for location E, and somewhat
less proximal to the node for location F. Accordingly, the facility
scores schedules 1143 and 1153 on location E most highly, and
scores schedules 1164, 1174, and 1184 on location F somewhat less
highly. In some embodiments, the facility uses these scores to
select among the shown additional schedules for the next simulation
by randomly selecting among the schedules when weighted with their
scores. Under this approach, only schedules on locations E and F
can be selected, and each of the schedules on location E is four
times as likely to be selected as each of the schedules on location
F.
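The score-weighted random selection described above can be sketched as follows. The numeric scores 4 and 1 are assumptions chosen only to reproduce the four-to-one ratio stated in the text; the description does not give actual score values. The function names are likewise hypothetical. Schedules with a score of zero (here, those on location D) can never be chosen.

```python
import random
from fractions import Fraction

# Assumed scores echoing FIG. 11: schedules on location E score 4,
# schedules on location F score 1, and unscored schedules score 0.
scores = {
    1122: 0, 1132: 0,            # location D
    1143: 4, 1153: 4,            # location E
    1164: 1, 1174: 1, 1184: 1,   # location F
}


def pick_next_schedule(scores, rng):
    """Pick one schedule with probability proportional to its score."""
    candidates = [s for s, w in scores.items() if w > 0]
    weights = [scores[s] for s in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def selection_probability(scores, schedule):
    """Exact probability that a given schedule is picked."""
    return Fraction(scores[schedule], sum(scores.values()))


rng = random.Random(0)           # seeded only to make the sketch repeatable
next_schedule = pick_next_schedule(scores, rng)
```

With these assumed scores, each schedule on location E is selected with probability 4/11 and each schedule on location F with probability 1/11.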
[0092] Returning to FIG. 10, in step 1003, after each simulation
has completed, the facility uses some or all of the hints included
in the code in step 1001 to score the path pursued during the
simulation as a basis for selecting a simulation of the round to
retire. In some embodiments, the facility uses failed assertions as
a basis for determining this retirement score for the simulation.
In some embodiments, the facility uses crashes or other errors
encountered during the simulation as a basis for determining the
retirement score for the simulation, including such errors as
general protection faults, divide by zero exceptions, accessing a
null pointer, etc. After step 1003, these steps conclude.
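One way to picture the retirement scoring of step 1003 is the following sketch, in which failed assertions and crashes both raise a simulation's score and the highest-scoring simulation of the round is chosen. The weights, the SimulationResult structure, and the function names are assumptions made for illustration; the description does not specify how the two factors are combined.

```python
from dataclasses import dataclass


@dataclass
class SimulationResult:
    name: str
    failed_assertions: int = 0
    crashed: bool = False   # e.g. general protection fault or divide by zero


# Assumed weights; the description gives no numeric values.
ASSERTION_WEIGHT = 10
CRASH_WEIGHT = 100


def retirement_score(result):
    """Higher scores mark simulations that exposed more interesting results."""
    score = ASSERTION_WEIGHT * result.failed_assertions
    if result.crashed:
        score += CRASH_WEIGHT
    return score


def select_simulation_to_retire(results):
    """Choose the simulation of the round with the highest retirement score."""
    return max(results, key=retirement_score)


sims = [
    SimulationResult("a"),
    SimulationResult("b", failed_assertions=2),
    SimulationResult("c", crashed=True),
]
winner = select_simulation_to_retire(sims)
```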
[0093] In some embodiments, interestingness hints are instead or
also used by the facility to choose paths to pursue in future
simulations in a different way. In such embodiments, the facility
determines at what times during the current simulation interesting
points or regions occur in different threads, and designs a future
path intended to cause interesting points and regions in different
threads to intersect in time. FIG. 12 is a thread timing diagram
showing the relative timing of execution of interesting regions by
different threads. Timeline 1210 shows the execution schedule for
thread 1, and timeline 1220 shows the execution schedule for thread
2. Timeline 1210 shows that thread 1 encounters an interesting
region during time period 1211, which is completely distinct from
time period 1221 in which thread 2 encounters an interesting
region. In these embodiments, the facility designs a path for a
future simulation in which thread 1 is suspended when it reaches
point 1212, and is only allowed to resume execution when thread 2
reaches point 1222, thus causing these interesting regions to
intersect in time.
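The alignment of interesting regions can be illustrated with a small calculation. The sketch below assumes that each region is known as a (start, end) pair in the time units of the observed simulation, and that suspending a thread simply shifts its later events by the length of the suspension; both are simplifying assumptions, and the times and function names are hypothetical.

```python
def plan_alignment(region_1, region_2):
    """Return (thread_to_suspend, delay) so that both regions start together.

    Each region is a (start, end) pair observed in the current simulation.
    """
    start_1, start_2 = region_1[0], region_2[0]
    if start_1 <= start_2:
        return 1, start_2 - start_1
    return 2, start_1 - start_2


def overlap(region_a, region_b):
    """Length of time during which both regions are executing."""
    return max(0, min(region_a[1], region_b[1]) - max(region_a[0], region_b[0]))


observed_1 = (10, 20)   # hypothetical times for thread 1's interesting region
observed_2 = (35, 50)   # hypothetical times for thread 2's interesting region

thread, delay = plan_alignment(observed_1, observed_2)
# Suspending thread 1 at the start of its region shifts the region by `delay`.
shifted_1 = (observed_1[0] + delay, observed_1[1] + delay)
```

In the observed simulation the two regions do not overlap at all; after thread 1 is held for 25 time units, its region runs entirely within thread 2's region.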
[0094] In some embodiments, the facility provides a mechanism for
delivering at least some types of interrupts to simulations
executing on a virtual machine--such as inter-processor interrupts
("IPIs") and timer device interrupts ("TDIs")--and/or permits
simulations to continue to execute in the virtual machine through
the invocation of an interrupt handler after an interrupt of one of
these types occurs. In some embodiments, the facility provides such
a mechanism by providing a synthetic advanced programmable
interrupt controller ("APIC") that sends and receives IPIs, and/or
by providing virtual timers to generate TDIs.
[0095] In some embodiments, the facility employs a fully or
partly-virtualized advanced programmable interrupt controller
("APIC") that has the ability to send IPIs and deliver and
acknowledge incoming IPIs and timer interrupts. Such APICs effect
IPIs by calling an IPI handler callback. By default, this callback
posts IPIs to the physical APIC.
[0096] In some embodiments, the facility prohibits simulation at a
time when a pending IPI exists in the physical APIC. In some
embodiments, the facility similarly prohibits simulation when any
bit is set in the physical APIC IRR.
[0097] In some embodiments, when an IPI is sent to a virtual
processor executing a thread that is presently suspended by the
facility, the facility unsuspends the thread and virtual processor
so that the IPI can be acknowledged.
[0098] When an IPI is sent to a virtual processor that is presently
in a halted or power saving state, the facility unhalts the virtual
processor so that the IPI can be acknowledged and appropriate
interesting actions taken, such as the scheduling of a new thread
on the virtual processor.
[0099] In some embodiments, if a simulation sets a bit of the
in-service register ("ISR") but does not clear it, the facility
does not allow that simulation to retire. In some embodiments, the
facility enforces this constraint using its simulation scoring
mechanism.
[0100] In some embodiments, the facility takes steps to avoid
suspending, for an extended period of time, any thread and
corresponding virtual processor that may be an IPI target. In some cases,
the facility achieves this by observing a rule that no thread and
associated virtual processor can be suspended in an interval of a
predetermined length that begins when an IPI is sent.
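The rule that no thread may be suspended within a fixed interval after an IPI is sent can be sketched as a simple guard. The class name SuspensionGuard, its methods, and the window length of 5 time units are hypothetical; the description says only that the interval has a predetermined length.

```python
class SuspensionGuard:
    """Blocks thread suspension for a fixed window after each IPI is sent."""

    def __init__(self, window):
        self.window = window          # predetermined interval length (assumed units)
        self.last_ipi_sent = None     # time of the most recent IPI, if any

    def note_ipi_sent(self, now):
        self.last_ipi_sent = now

    def may_suspend(self, now):
        # Suspension is allowed if no IPI has been sent yet, or if the
        # window that began at the last IPI has fully elapsed.
        if self.last_ipi_sent is None:
            return True
        return now - self.last_ipi_sent >= self.window


guard = SuspensionGuard(window=5)
allowed_before_ipi = guard.may_suspend(0)
guard.note_ipi_sent(10)
allowed_inside_window = guard.may_suspend(12)
allowed_after_window = guard.may_suspend(15)
```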
[0101] In some embodiments, the facility is able to begin a
simulation with an ISR bit set in the physical APIC. If the ISR bit
is cleared in the simulation by an end of interrupt signal ("EOI"),
the facility replays the EOI to the physical APIC during replay of
the simulation. If the ISR bit is not cleared by an EOI during the
simulation, then the physical APIC is still in the correct state
when replay concludes.
[0102] In some embodiments, the facility forces additional thread
schedules by delaying the delivery of interrupts such as IPIs and
TDIs. In some cases, the facility delivers broadcast IPIs to some
processors before others, which in some cases can cause the
operating system scheduler to create new scheduling combinations.
In some embodiments, the facility uses this technique to reorder
across a context switch: on a two-processor machine on which thread
set (1, 2) is running concurrently before an IPI and thread set
(3, 4) is running concurrently after the IPI, by delaying the IPI
to thread 2's virtual processor but delivering it to thread 1's
virtual processor, the facility causes thread set (1, 4) to run
concurrently, and similarly causes thread set (3, 2) to run
concurrently.
[0103] In some embodiments, the facility forces additional thread
schedules by delaying or creating timer device interrupts (TDIs),
so as to manipulate the perceived time the program has spent
executing and thus influence the operating system scheduler into
creating rare thread scheduling combinations. In other embodiments,
manipulation of the operating system scheduler by varying time
inputs may be accomplished, or augmented, by accelerating or
decelerating the observed passage of time through various
virtualized clock devices, such as the real time timestamp counter
(TSC).
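Accelerating or decelerating the observed passage of time can be pictured as a virtual clock that scales real elapsed time by an adjustable rate. The following is a minimal sketch under that assumption; the VirtualClock class and its methods are hypothetical and do not model any particular virtualized clock device.

```python
class VirtualClock:
    """Reports time to the guest at an adjustable rate relative to real time."""

    def __init__(self, real_start, rate=1.0):
        self.real_anchor = real_start      # real time of the last rate change
        self.virtual_anchor = real_start   # virtual time at that moment
        self.rate = rate                   # > 1 accelerates, < 1 decelerates

    def read(self, real_now):
        return self.virtual_anchor + (real_now - self.real_anchor) * self.rate

    def set_rate(self, real_now, rate):
        # Re-anchor first so virtual time stays continuous across the change.
        self.virtual_anchor = self.read(real_now)
        self.real_anchor = real_now
        self.rate = rate


clock = VirtualClock(real_start=0)
before = clock.read(100)          # rate 1.0: virtual time tracks real time
clock.set_rate(100, 2.0)          # from now on, time appears to pass twice as fast
after = clock.read(150)           # 50 real units appear as 100 virtual units
```

A scheduler that reads such a clock would perceive the program as having run longer than it really has, which is the kind of time input manipulation the paragraph above describes.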
[0104] In some embodiments, the facility speculates based on the
state of inputs other than the timing of inter-thread
communications, such as input from a variety of devices. In some
embodiments, the facility speculates based upon network card inputs
in order to perform security testing or fuzz testing. In some
embodiments, the facility speculates based upon clock inputs to
ensure more complete test coverage. The facility can use the
hypervisor to inject these inputs on behalf of devices that cannot
easily be simulated by the developer of the program.
[0105] While various embodiments are described with reference to
multithreaded program code, in some embodiments, the facility
identifies bugs and/or generates a replay package for other types
of program code, including a single-threaded program executing
alone, a single-threaded program running on top of an operating
system, a task-based application, operating system code, or the
like.
[0106] It will be appreciated by those skilled in the art that the
above-described facility may be straightforwardly adapted or
extended in various ways. While the foregoing description makes
reference to particular embodiments, the scope of the invention is
defined solely by the claims that follow and the elements recited
therein.
* * * * *