U.S. patent application number 13/326587 was filed with the patent office on 2011-12-15 and published on 2013-06-20 for a control flow-based approach in implementing exception handling on a graphics processing unit.
This patent application is currently assigned to ADVANCED MICRO DEVICES, INC. The applicants listed for this patent are Gang Chen, Dz-ching Ju, and Norman Rubin. Invention is credited to Gang Chen, Dz-ching Ju, and Norman Rubin.
Application Number | 20130159685 13/326587 |
Document ID | / |
Family ID | 47472078 |
Filed Date | 2011-12-15 |
United States Patent Application | 20130159685 |
Kind Code | A1 |
Ju; Dz-ching; et al. | June 20, 2013 |
CONTROL FLOW-BASED APPROACH IN IMPLEMENTING EXCEPTION HANDLING ON A
GRAPHICS PROCESSING UNIT
Abstract
A function in source code is processed by a compiler for
execution on a graphics processing unit, wherein the function
includes an exception handling structure. An exception raising
block is converted into a first control flow and an exception
handler block is converted into a second control flow. The first
control flow includes setting an exception raised indicator and
finding an exception handler to process the raised exception. The
exception raised indicator remains set until an appropriate
exception handler is found. The second control flow includes
clearing the exception raised indicator and processing the
exception.
Inventors: | Ju; Dz-ching (Saratoga, CA); Rubin; Norman (Cambridge, MA); Chen; Gang (Southborough, MA) |
Applicant: |
Name | City | State | Country | Type |
Ju; Dz-ching | Saratoga | CA | US | |
Rubin; Norman | Cambridge | MA | US | |
Chen; Gang | Southborough | MA | US | |
Assignee: | ADVANCED MICRO DEVICES, INC. (Sunnyvale, CA) |
Family ID: | 47472078 |
Appl. No.: | 13/326587 |
Filed: | December 15, 2011 |
Current U.S. Class: | 712/244; 712/E9.06 |
Current CPC Class: | G06F 8/443 20130101 |
Class at Publication: | 712/244; 712/E09.06 |
International Class: | G06F 9/38 20060101 G06F009/38 |
Claims
1. A method for processing a source code function by a compiler for
execution on a graphics processing unit (GPU), the source code
function including an exception handling structure, the method
comprising: converting an exception raising block in the source
code function into a first control flow for execution on the GPU,
wherein the first control flow includes: setting an exception
raised indicator; and finding an exception handler to process the
raised exception; and converting an exception handler block in the
source code function into a second control flow for execution on
the GPU, wherein the second control flow includes: clearing the
exception raised indicator; and processing the exception.
2. The method according to claim 1, wherein the exception raised
indicator remains set until an appropriate exception handler is
found.
3. The method according to claim 1, wherein the finding includes:
comparing an exception object type against one or more candidate
exception handlers in a current lexical scope; jumping to the
exception handler if the exception object type matches an exception
handler type; and jumping to a landing pad block in a lexical scope
one layer outside the current lexical scope if no matching
exception handler is found.
4. The method according to claim 3, wherein the landing pad block
includes: calling destructors for objects local to the current
lexical scope; comparing the exception object type against each
candidate exception handler in the current lexical scope; jumping
to the exception handler if the exception object type matches an
exception handler type; and jumping to a return function in the
current lexical scope if no matching exception handler is
found.
5. The method according to claim 4, wherein the return function
includes: calling destructors for objects local to the current
function; and returning to a calling function.
6. The method according to claim 1, wherein the first control flow
further includes: calling destructors for objects local to a
current lexical scope.
7. The method according to claim 1, wherein the second control flow
further includes: jumping to a location in the function after the
exception raising block.
8. A system, comprising: a processor; and a compiler executed by
the processor to perform operations to process a source code
function for execution on a graphics processing unit (GPU), the
operations including: converting an exception raising block in the
source code function into a first control flow for execution on the
GPU, wherein the first control flow includes: setting an exception
raised indicator; and finding an exception handler to process the
raised exception; and converting an exception handler block in the
source code function into a second control flow for execution on
the GPU, wherein the second control flow includes: clearing the
exception raised indicator; and processing the exception.
9. The system according to claim 8, wherein the exception raised
indicator remains set until an appropriate exception handler is
found.
10. The system according to claim 8, wherein the finding includes:
comparing an exception object type against one or more candidate
exception handlers in a current lexical scope; jumping to the
exception handler if the exception object type matches an exception
handler type; and jumping to a landing pad block in a lexical scope
one layer outside the current lexical scope if no matching
exception handler is found.
11. The system according to claim 10, wherein the landing pad block
includes: calling destructors for objects local to the current
lexical scope; comparing the exception object type against each
candidate exception handler in the current lexical scope; jumping
to the exception handler if the exception object type matches an
exception handler type; and jumping to a return function in the
current lexical scope if no matching exception handler is
found.
12. The system according to claim 11, wherein the return function
includes: calling destructors for objects local to the current
function; and returning to a calling function.
13. The system according to claim 8, wherein the first control flow
further includes: calling destructors for objects local to a
current lexical scope.
14. The system according to claim 8, wherein the second control
flow further includes: jumping to a location in the function after
the exception raising block.
15. A non-transitory computer-readable storage medium storing a set
of instructions for execution by a computer to process a source
code function for execution on a graphics processing unit (GPU),
the source code function including an exception handling structure,
the set of instructions comprising: a first converting code segment
for converting an exception raising block in the source code
function into a first control flow for execution on the GPU,
wherein the first control flow includes: setting an exception
raised indicator; and finding an exception handler to process the
raised exception; and a second converting code segment for
converting an exception handler block in the source code function
into a second control flow for execution on the GPU, wherein the
second control flow includes: clearing the exception raised
indicator; and processing the exception.
16. The non-transitory computer-readable storage medium according
to claim 15, wherein the exception raised indicator remains set
until an appropriate exception handler is found.
17. The non-transitory computer-readable storage medium according
to claim 15, wherein the finding includes: comparing an exception
object type against one or more candidate exception handlers in a
current lexical scope; jumping to the exception handler if the
exception object type matches an exception handler type; and
jumping to a landing pad block in a lexical scope one layer outside
the current lexical scope if no matching exception handler is
found.
18. The non-transitory computer-readable storage medium according
to claim 17, wherein the landing pad block includes: calling
destructors for objects local to the current lexical scope;
comparing the exception object type against each candidate
exception handler in the current lexical scope; jumping to the
exception handler if the exception object type matches an exception
handler type; and jumping to a return function in the current
lexical scope if no matching exception handler is found.
19. The non-transitory computer-readable storage medium according
to claim 18, wherein the return function includes: calling
destructors for objects local to the current function; and
returning to a calling function.
20. The non-transitory computer-readable storage medium according
to claim 15, wherein the first control flow further includes:
calling destructors for objects local to a current lexical
scope.
21. The non-transitory computer-readable storage medium according
to claim 15, wherein the second control flow further includes:
jumping to a location in the function after the exception raising
block.
22. The non-transitory computer-readable storage medium of claim
15, wherein the instructions are hardware description language
(HDL) instructions used for the manufacture of a device.
Description
FIELD OF THE INVENTION
[0001] The present invention is generally directed to implementing
exception handling in computer program code, and in particular, to
implementing exception handling in computer program code run on a
graphics processing unit.
BACKGROUND
[0002] The problem that is addressed herein is to support the
exception handling feature of programming languages for graphics
processing unit (GPU) computing applications on GPU architectures.
Many programming languages, such as C++, Java, C#, Python, Ada,
Ruby, and more, support exception handling (EH), which provides a
way to react to exceptional circumstances (like runtime errors) in
a program by transferring control and information from the
exception point to an exception handler. The purpose of EH is to
cleanly separate the error handling from the rest of the program
logic. The C++ EH feature is used herein to discuss the issues and
to illustrate the proposed techniques, but the discussions and
techniques are also applicable to the EH support in other
languages.
[0003] The current C++ EH mechanism is defined with respect to
single thread execution. There are certain extensions beyond the
current C++ language standard to propagate exceptions across
multiple threads. A number of such extensions, and how they may be
handled on the Fusion System Architecture (FSA), have been
previously described. FSA is an architecture for accelerated
processing units (APUs), as manufactured by Advanced Micro Devices,
Inc., which combine features of a central processing unit (CPU) and
a GPU on a single die.
[0004] C++ EH primarily consists of the try, catch, throw, and
re-throw constructs. A try block encloses a portion of code under
exception inspection. An exception is thrown by using a throw
clause from inside a try block. Exception handlers are declared
with a catch clause, which is placed immediately after the
corresponding try block. If no exception is thrown, the program
execution continues normally and all handlers are ignored. Matching
a thrown exception object to an exception handler is based on the
type specified in the catch clause. If an exception is thrown but
is not caught by any immediate catch clause, the exception is
propagated to the enclosing try blocks to check against their
respective catch clauses. If an exception handler is not located
within the current function, the current function returns and the
call stack is unwound to the caller to search for a proper
exception handler. This process continues until an exception
handler is found or the execution is terminated when the search
exhausts all call stack frames.
[0005] The following is a simple C++ EH example.
TABLE-US-00001
#include <cstdio>

int foo()
{
    int x = 30;
    try {
        throw 20;
    } catch (float e) { // int e?
        x = 40;
    }
    return x;
}

int main()
{
    int x = 0;
    try {
        x = foo();
    } catch (int e) {
        x = 10;
    }
    printf("%d\n", x);
}
[0006] Because the function foo( ) throws an exception object with
a value of 20, which is an integer, the exception is not caught
within the function foo( ) because the exception handler has a
float type, and the function foo( ) immediately returns and
propagates the exception to the caller. The exception is caught by
the handler "catch (int e)" in the function main( ), and the output
of the program is 10. If the exception handler in the function foo(
) were "catch (int e)" as noted in the comment, the thrown
exception would have been caught and handled by the handler in the
function foo( ). The function foo( ) would have returned normally
without an exception, and the output would have been 40.
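Both outcomes described above can be checked side by side. The following is a minimal sketch, not part of the original example: the names foo_catches_float, foo_catches_int, and run are hypothetical, and run stands in for main( ) so that either variant of foo( ) can be passed in.

```cpp
// Original foo(): the float handler does not match the thrown int,
// so the exception propagates to the caller.
int foo_catches_float() {
    int x = 30;
    try {
        throw 20;
    } catch (float) {
        x = 40;
    }
    return x;
}

// Variant noted in the comment: the int handler matches and the
// exception is handled locally.
int foo_catches_int() {
    int x = 30;
    try {
        throw 20;
    } catch (int) {
        x = 40;
    }
    return x;
}

// Stand-in for main(): returns the value that main() would print.
int run(int (*foo)()) {
    int x = 0;
    try {
        x = foo();
    } catch (int) {
        x = 10;
    }
    return x;
}
```

Calling run(foo_catches_float) yields 10 and run(foo_catches_int) yields 40, because C++ handler matching does not apply an int-to-float conversion.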
[0007] While the C++ language including the EH feature has been
fully supported on CPUs for decades, the differences between CPU
and GPU architectures as well as the differences in their
associated tool chains present new challenges to support some
existing C++ language features on a GPU.
[0008] Because GPUs use the SIMD (single instruction, multiple
data) execution model (e.g., a vector instruction) to support data
parallelism (each thread executing with one piece of data), a set
of work-items share the same instruction pointer and are executed
in lockstep. But there are times when these work-items want to
execute different code paths due to the differences in the data
processed by the respective work-items.
[0009] Predication is one mechanism to handle such thread
divergence, where the predicated-off work-items still execute the
same instruction stream along with the predicated-on work-items,
except that they do not write any results to affect the
architectural states. But predication usually handles only a
limited set of control flow divergence found in regular control
flow structures. With divergence in more complex control flows, the
GPU architecture may serialize the execution of diverged work-items
through a pair of specially marked branch and join instructions.
Because compilers typically generate code one function at a time,
it is infeasible to place the branch and join instructions in
different functions in such cases. This practically limits the
support of thread divergence to within a function scope. If the
execution of work-items may diverge across a function boundary,
this restriction would require the functions to join at the
function granularity and then diverge again immediately after
function return.
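The predication mechanism described above can be illustrated on a CPU by simulating a small group of lanes. This is only a sketch under stated assumptions: the lane count kLanes, the type alias Lanes, and the function predicated_abs_or_double are hypothetical, and real GPU hardware applies the predicate mask per instruction rather than in loops.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical number of work-items
using Lanes = std::array<int, kLanes>;

// Per-lane source logic: if (v < 0) v = -v; else v = v * 2;
// Every lane steps through both paths; the predicate decides which
// lanes are allowed to write their result.
Lanes predicated_abs_or_double(Lanes v) {
    std::array<bool, kLanes> pred{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        pred[i] = v[i] < 0;
    }
    // "then" path: only predicated-on lanes commit the result.
    for (std::size_t i = 0; i < kLanes; ++i) {
        int result = -v[i];
        if (pred[i]) v[i] = result;
    }
    // "else" path: only predicated-off lanes commit the result.
    for (std::size_t i = 0; i < kLanes; ++i) {
        int result = v[i] * 2;
        if (!pred[i]) v[i] = result;
    }
    return v;
}
```

The predicate is computed once before either path runs, so a lane that was negated in the first path is not also doubled in the second.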
[0010] The GPU tool chains are noticeably different from the CPU
tool chains. Because GPU architectures evolve quickly and have
proprietary instruction set architectures (ISAs), GPU vendors
typically provide an abstract intermediate representation (as
opposed to an actual ISA) to software, and this abstract
intermediate representation is stable over multiple generations of
GPU architectures.
[0011] In the case of FSA, this layer is called FSAIL (FSA
Intermediate Language). Because of the gap between FSAIL and the
native GPU ISA, a layer of software is required to dynamically
translate FSAIL instructions to the native GPU ISA, and this
software is called the FSA just-in-time (JIT) compiler. The
compiler which generates the FSAIL instructions has a similar role
to the typical CPU compiler, and in contrast to the JIT compiler,
this compiler is a high-level compiler. Because the JIT compiler
translates FSAIL instructions and possibly re-orders the produced
native GPU instructions, the FSAIL instruction order produced by
the high-level compiler may not be preserved in the JIT-produced
native instruction sequence. This potentially inconsistent order
presents a challenge if a high-level compiler intends to
communicate certain information to the runtime system by relying on
the FSAIL instruction order, which is part of the design in a
"zero-cost" EH implementation on CPUs (discussed below).
[0012] EH is considered a high-productivity language feature
instead of a performance feature. Exceptions are expected to occur
infrequently and hence the performance of handling exceptions is
usually less of a concern. One of the key design issues in
implementing EH is to minimize any adverse performance impact when
EH constructs are present, but no exceptions are actually thrown,
which is expected to be the common case. Unless a compiler is told
otherwise, C++ programs by default have to assume that any function
call may throw an exception.
[0013] The EH features in C++ and other programming languages have
been well-supported on CPUs. Some initial EH implementations simply
mapped EH constructs back to setjmp/longjmp instructions, which had
been an error handling mechanism predating the general EH language
feature. There are a number of issues with the setjmp/longjmp
approach. First, it imposes a fair amount of overhead to save and
set up information on regular execution paths to be ready to handle
exceptions even if no exceptions are eventually thrown. Second, the
presence of setjmp/longjmp instructions often requires shutting down
many subsequent compiler optimizations to preserve the states that
need to be saved and restored. In addition, setjmp/longjmp
instructions may transfer control flows across a function boundary.
While this is not an issue on a CPU, it does not work well with the
current GPU data parallel architectures as mentioned above.
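For readers unfamiliar with the setjmp/longjmp style of error handling, the following minimal sketch shows the pattern on a CPU. The names g_handler, may_fail, and guarded are hypothetical. The sketch uses only trivially destructible locals, because jumping over objects with non-trivial destructors via longjmp has undefined behavior in C++.

```cpp
#include <csetjmp>

// Hypothetical single saved context that plays the role of a handler.
static std::jmp_buf g_handler;

// Plays the role of a throwing function: on error, control jumps
// back to the setjmp point, crossing the function boundary.
void may_fail(int v) {
    if (v < 0) {
        std::longjmp(g_handler, 1);
    }
}

int guarded(int v) {
    // setjmp returns 0 when called directly and the longjmp value
    // (here 1) when control arrives through longjmp. The context is
    // saved on every call, even when no error occurs.
    if (setjmp(g_handler) == 0) {
        may_fail(v);
        return v;   // normal path
    }
    return -1;      // "handler" path
}
```

Note that the cost of saving the context in guarded is paid on the regular execution path, which is the first issue described above, and that the jump out of may_fail is a control transfer across a function boundary.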
[0014] In one implementation, the Itanium application binary
interface (ABI) Exception Handling Specification defines a
methodology for providing outlying data in the form of exception
tables, without inlining the testing of exception occurrence to
conditionally branch to exception handling code in the flow of an
application's main algorithm. Thus, the specification is said to
add "zero-cost" to the normal execution of an application.
[0015] In the "zero-cost" EH implementation, a C++ compiler
generates exception tables stored in data sections of object files
and retrieved by the C++ EH runtime library when an exception is
thrown during program execution. The runtime system first attempts
to find an exception frame corresponding to the function where the
exception was thrown. The exception frame contains a reference to
an exception table describing how to process the exception. If the
exception needs to be forwarded to a prior activation (i.e., a
caller), the exception frame contains information about how to
unwind the current activation and restore the state of the prior
activation. An exception handling personality is defined by way of
a personality function (e.g., __gxx_personality_v0 in C++), which
receives the context of the exception, an exception structure
containing the exception object type and value, and a reference to
the exception table for the current function.
[0016] An exception table is organized as a series of code ranges
defining what to do if an exception occurs in that range.
Typically, the information associated with a range defines which
types of exception objects (using C++ type information) are
handled in that range, and an associated action that should take
place. Actions typically pass control to a landing pad. A landing
pad corresponds to the code found in the catch portion of a
try/catch sequence. When execution resumes at a landing pad, it
receives the exception structure and a selector corresponding to
the type of exception thrown. The selector is then used to
determine which catch clause should process the exception.
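The range-based lookup described above can be sketched as follows. This is a deliberately simplified illustration and not the actual Itanium ABI table layout, which uses separate call-site, action, and type tables. The struct CallSite, its fields, and the function find_landing_pad are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified exception table entry: a code range, the
// exception type it handles, and the landing pad to transfer to.
struct CallSite {
    std::uintptr_t start;        // first address of the code range
    std::uintptr_t length;       // size of the code range
    std::uintptr_t landing_pad;  // address of the handler entry
    int type_id;                 // stand-in for C++ type information
};

// Returns the landing pad for an exception of type thrown_type raised
// at address pc, or 0 if this table has no matching entry (in which
// case the runtime would unwind to the caller).
std::uintptr_t find_landing_pad(const std::vector<CallSite>& table,
                                std::uintptr_t pc, int thrown_type) {
    for (const CallSite& entry : table) {
        bool in_range = pc >= entry.start &&
                        pc < entry.start + entry.length;
        if (in_range && entry.type_id == thrown_type) {
            return entry.landing_pad;
        }
    }
    return 0;
}
```

The lookup depends on the address of the throwing instruction falling inside the recorded range, so it only works if the final instruction layout matches the layout the table was built for.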
[0017] While the steps to identify exception handlers and handle
exceptions are elaborate in the "zero-cost" EH implementation,
the biggest advantage of this approach is that these costs are
incurred only when exceptions occur. Normal execution paths, where
no exceptions are thrown, have minimal performance impact. Another
benefit is that this approach puts a fair amount of work, e.g.,
stack unwinding and exception frames, into the common ABI of a
given architecture. This common support can be shared across the EH
features often with slight variations among different programming
languages and can reduce the amount of language-specific work. This
also allows EH to work when mixing functions written in different
languages in an application.
[0018] Applying the "zero-cost" EH implementation to a GPU
encounters certain issues. First, the unwinding step could lead to
thread divergence across a function boundary. This is related to a
hardware limitation of GPUs, in that thread divergence on a GPU
only works within a function boundary. But, for EH to work
properly, it needs to be able to work across function boundaries
(to be able to locate an exception handler for a thrown
exception).
[0019] Second, the FSAIL instructions that are generated by
high-level compilers are abstract instructions and may be
subsequently re-ordered by the JIT compiler. Checking an
exception-throwing instruction against the code ranges tracked in
the exception tables generated by the high-level compilers may be
problematic, because the re-ordered instructions may not be in the
original code range as shown in the exception tables. In contrast,
the instruction sequence generated by a CPU compiler is final and
checking a given instruction against code ranges in exception
tables is not an issue.
SUMMARY
[0020] A method is described for processing a function in source
code by a compiler for execution on a graphics processing unit,
wherein the function includes an exception handling structure. The
method includes converting an exception raising block into a first
control flow and converting an exception handler block into a
second control flow. The first control flow includes setting an
exception raised indicator and finding an exception handler to
process the raised exception. The second control flow includes
clearing the exception raised indicator and processing the
exception. The exception raised indicator remains set until an
appropriate exception handler is found.
[0021] A system includes a processor and a compiler executed by the
processor to perform operations. The operations performed by the
compiler include converting an exception raising block into a first
control flow and converting an exception handler block into a
second control flow. The first control flow includes setting an
exception raised indicator and finding an exception handler to
process the raised exception. The second control flow includes
clearing the exception raised indicator and processing the
exception.
[0022] A computer-readable storage medium stores a set of
instructions for execution by a computer to process a function in
source code for execution on a graphics processing unit, wherein
the function includes an exception handling structure. The set of
instructions includes a first converting code segment for
converting an exception raising block into a first control flow and
a second converting code segment for converting an exception
handler block into a second control flow. The first control flow
includes setting an exception raised indicator and finding an
exception handler to process the raised exception. The second
control flow includes clearing the exception raised indicator and
processing the exception.
BRIEF DESCRIPTION OF THE DRAWINGS
[0023] A more detailed understanding may be had from the following
description, given by way of example in conjunction with the
accompanying drawings wherein:
[0024] FIG. 1 is a block diagram of an example device in which one
or more disclosed embodiments may be implemented;
[0025] FIG. 2 is a flowchart of a method for processing C++ code to
implement exception handling on a GPU;
[0026] FIG. 3 is a flowchart of a method for processing a catch
clause in a current try block;
[0027] FIG. 4 is a flowchart of a method for processing a throw
clause or a function call in a current try block;
[0028] FIG. 5 is a flowchart of a method for processing a catch
clause in an enclosing try block;
[0029] FIG. 6 is a flowchart of a method for processing a found
handler flag in an enclosing try block;
[0030] FIG. 7 is a flowchart of a method for processing a found
handler flag in a current try block; and
[0031] FIG. 8 is a flowchart of a method for processing a function
located outside a try block.
DETAILED DESCRIPTION
[0032] A function in source code is processed by a compiler for
execution on a graphics processing unit, wherein the function
includes an exception handling structure. An exception raising
block is converted into a first control flow and an exception
handler block is converted into a second control flow. The first
control flow includes setting an exception raised indicator and
finding an exception handler to process the raised exception. The
exception raised indicator remains set until an appropriate
exception handler is found. The second control flow includes
clearing the exception raised indicator and processing the
exception.
[0033] FIG. 1 is a block diagram of an example device 100 in which
one or more disclosed embodiments may be implemented. The device
100 may include, for example, a computer, a gaming device, a
handheld device, a set-top box, a television, a mobile phone, or a
tablet computer. The device 100 includes a processor 102, a memory
104, a storage 106, one or more input devices 108, and one or more
output devices 110. The device 100 may also optionally include an
input driver 112 and an output driver 114. It is understood that
the device 100 may include additional components not shown in FIG.
1.
[0034] The processor 102 may include a central processing unit
(CPU), a graphics processing unit (GPU), a CPU and GPU located on
the same die, or one or more processor cores, wherein each
processor core may be a CPU or a GPU. The memory 104 may be located
on the same die as the processor 102, or may be located separately
from the processor 102. The memory 104 may include a volatile or
non-volatile memory, for example, random access memory (RAM),
dynamic RAM, or a cache.
[0035] The storage 106 may include a fixed or removable storage,
for example, a hard disk drive, a solid state drive, an optical
disk, or a flash drive. The input devices 108 may include a
keyboard, a keypad, a touch screen, a touch pad, a detector, a
microphone, an accelerometer, a gyroscope, a biometric scanner, or
a network connection (e.g., a wireless local area network card for
transmission and/or reception of wireless IEEE 802 signals). The
output devices 110 may include a display, a speaker, a printer, a
haptic feedback device, one or more lights, an antenna, or a
network connection (e.g., a wireless local area network card for
transmission and/or reception of wireless IEEE 802 signals).
[0036] The input driver 112 communicates with the processor 102 and
the input devices 108, and permits the processor 102 to receive
input from the input devices 108. The output driver 114
communicates with the processor 102 and the output devices 110, and
permits the processor 102 to send output to the output devices 110.
It is noted that the input driver 112 and the output driver 114 are
optional components, and that the device 100 will operate in the
same manner if the input driver 112 and the output driver 114 are
not present.
[0037] To be an acceptable solution to support EH on a GPU, the
solution has to (1) allow excepting and non-excepting work items
(i.e., threads) to join their execution at each function boundary
(due to the GPU hardware limitations), and (2) have minimal
performance overhead when no exceptions are thrown, because the CPU
zero-cost case described above cannot be achieved on a GPU.
[0038] In this approach, a high-level C++ compiler transforms throw
clauses and the functions in a try block that may throw exceptions
to a sequence of control flows, which compare the exception object
type against each candidate exception handler. If there is a match,
a branch instruction jumps to the matched handler to handle the
thrown exception and then resumes normal execution. The sequence of
checking candidate exception handlers traverses enclosing try
blocks and their associated exception handlers from inner to outer
scopes. If the exception object type is known at the compilation
time, the compiler may simplify the control flows and jump directly
to the corresponding handler. It is noted that the compiler also
has to generate code to destruct live objects local to the scope
that is being exited.
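The transformation described above can be sketched by hand for a pair of nested try blocks, an inner one with a float handler and an outer one with an int handler. This is an illustrative sketch only: the enum TypeId, the function lowered_nested, and the label names are hypothetical, an enum tag stands in for the exception object type, and destructor calls are omitted because the sketch has no local objects with destructors.

```cpp
// Hypothetical type tags standing in for C++ type information.
enum class TypeId { Int, Float, Other };

// Hand-lowered control flow for a throw inside:
//   try { try { throw ...; } catch (float) { x = 1; } }
//   catch (int) { x = 2; }
// The parameter is the type of the thrown exception object.
int lowered_nested(TypeId thrown) {
    int x = 0;
    TypeId exception_type = thrown;  // captured exception object type

    // Landing pad: check candidate handlers from inner to outer scope.
    if (exception_type == TypeId::Float) goto inner_catch_float;
    if (exception_type == TypeId::Int) goto outer_catch_int;
    return -1;  // no handler in this function; the caller must search

inner_catch_float:
    x = 1;
    goto join;

outer_catch_int:
    x = 2;
    goto join;

join:
    return x;   // normal execution resumes after the try blocks
}
```

If the compiler knows the thrown type statically, the chain of comparisons collapses to a single direct jump, which is the simplification noted above.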
[0039] Current GPUs are already capable of dealing with thread
divergence under arbitrary control flows expressed in FSAIL
instructions within a given function. The JIT compiler translates
FSAIL branch instructions into predicated code or special native
conditional branch and join instructions. Both excepting threads
(those threads that throw an exception) and non-excepting threads
have to join back together at a function boundary. When execution
reaches a function boundary, the execution waits and does not begin
to unwind the stack if there are still exceptions to be handled.
The execution waits for non-excepting threads to execute and return
(i.e., complete execution). If and only if both the excepting
threads and the non-excepting threads reach the return point (e.g.,
the end of the function), both will return back to the caller. This
restriction on returning back to the caller is imposed because of
the nature of SIMD execution.
[0040] When the excepting thread returns and has not found an error
handler for the exception, the thread needs to continue to look for
an appropriate error handler. The execution flow returns to the
caller, and the excepting and non-excepting threads may diverge
again. Branches are used to lead the diverging thread forward. To
allow an excepting work-item whose exception has not been handled
upon a function return to continue looking for a handler instead of
executing the normal code paths as the non-excepting work-items, a
reserved FSAIL variable "private_b8 hasexceptionhappened" is
defined. This variable (referred to generally herein as an
"exception flag") has a global scope but is unique to each
work-item, and it is allocated outside of any function. The
convention is to set the exception flag as soon as an exception is
raised, and reset the exception flag only after the exception is
handled. Upon the return from a call, a work-item needs to check
the value of the exception flag. If the flag is set, this work-item
needs to follow a code path divergent from the non-excepting
work-items to continue searching for a proper handler. The JIT compiler
has to recognize this special variable and map it to a fixed memory
location (or possibly a register, as an optimization). Because the
C++ EH specification does not allow multiple outstanding exceptions
in each thread, a single variable is sufficient for each work-item.
This provides the appearance of no thread divergence across
function boundaries.
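The exception flag convention can be sketched by hand-lowering the foo( ) and main( ) example from the background discussion. This is a CPU-side illustration under stated assumptions: the struct WorkItemState, the enum TypeId, and the functions foo_lowered and main_lowered are hypothetical, and the flag is carried in a struct passed by reference rather than in a reserved FSAIL variable.

```cpp
// Hypothetical type tags standing in for C++ type information.
enum class TypeId { None, Int, Float };

// Hypothetical per-work-item state holding the exception flag and
// the captured exception object.
struct WorkItemState {
    bool hasexceptionhappened = false;
    TypeId exception_type = TypeId::None;
    int exception_value = 0;
};

// Lowered foo(): "throw 20" inside a try with a "catch (float)".
int foo_lowered(WorkItemState& s) {
    int x = 30;
    // throw 20: set the flag and capture the exception object.
    s.hasexceptionhappened = true;
    s.exception_type = TypeId::Int;
    s.exception_value = 20;
    // Compare against the only candidate handler in this function.
    if (s.exception_type == TypeId::Float) {
        s.hasexceptionhappened = false;  // the handler clears the flag
        x = 40;
    }
    // No match: return normally with the flag still set.
    return x;
}

// Lowered main(): the flag is checked right after the call returns.
int main_lowered(WorkItemState& s) {
    int x = 0;
    int result = foo_lowered(s);
    if (s.hasexceptionhappened) {
        // Excepting work-item: keep searching for a handler.
        if (s.exception_type == TypeId::Int) {
            s.hasexceptionhappened = false;  // catch (int e)
            x = 10;
        }
        // Otherwise the flag stays set for the next caller to check.
    } else {
        x = result;  // non-excepting path: x = foo();
    }
    return x;
}
```

Every function in the sketch returns through an ordinary return statement, so excepting and non-excepting work-items rejoin at the function boundary and diverge again only at the flag check in the caller.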
[0041] The costs of checking exception types against exception
handlers and performing control flow transfers are incurred only
when an exception is thrown, except for the case that the exception
flag is checked upon a function return even if no exceptions have
been thrown. This is an artifact of not allowing thread divergence
across function boundaries. This comparison is an unavoidable but
acceptable overhead, because functions may continue to be
aggressively inlined for GPU offload functions. If a C++ high-level
compiler is informed through a compilation option that no
exceptions are thrown in an application, it does not need to
generate the code to check the exception flag.
[0042] An implementation in a C++ high-level compiler is described
in the following pseudo code. The C++ code is processed, and
additional structure and mechanisms are added to the code to
perform the EH on the GPU. The resulting code follows the same
semantics of the original source code.
[0043] This implementation uses a global variable (the exception
flag) to indicate that any thread has thrown an exception. In the
examples below, this variable is referred to as
hasexceptionhappened. It is noted that a person skilled in the art
could devise other ways of tracking whether any thread has thrown
an exception (e.g., an exception raised indicator), without
altering the overall operation of the method. Once an appropriate
exception handler has been found, the exception handler resets this
variable to indicate that the exception has been handled, and the
thread can resume normal execution upon returning to the calling
function. The variable will remain set (as indicating that there is
an exception that has not yet been handled) upon the excepting
thread returning to the caller, unless the exception handler
routine resets it.
TABLE-US-00002
// Perform this during the start of each work-item.
Allocate hasexceptionhappened and initialize it to False;

ProcessExceptionHandling(Function currFunc) {
  Visit all try blocks in currFunc in a lexical and outer-to-inner order {
    currTry = the current try block;
    Add a label, join_label, following the currTry block;
    For each catch clause in currTry {
      currCatch = the current catch clause;
      Create a label attached to the beginning of currCatch;
      Append an instruction to reset hasexceptionhappened in currCatch;
      Append an unconditional jump to join_label in currCatch;
    }
    For each throw or function call in currTry {
      currInst = the current instruction, i.e. a throw or a call;
      foundHandler = False;
      Create a landing pad block, landing_pad;
      if (currInst is a throw) {
        Replace the throw with the landing_pad block;
        Capture the exception object in exception_object;
        Append an instruction to set hasexceptionhappened in landing_pad;
      } else if (currInst is a call) {
        Insert a conditional branch after currInst based on a True value
          of hasexceptionhappened and the branch target is landing_pad;
        // The exception object is already in exception_object;
      }
      Visit the enclosing try blocks of currTry following the
          inner-to-outer order {
        currScopeTry = the try block of the current scope;
        Append destructor calls in landing_pad for currently live objects
          local to currScopeTry;
        For each catch clause for currScopeTry {
          currCatch = the current catch clause;
          if (the type of exception_object is known &&
              the type equals the type of currCatch) {
            Append a jump to the label of currCatch in landing_pad;
            foundHandler = True;
          } else {
            Append a conditional branch in landing_pad by comparing the
              type in exception_object against the type in currCatch and
              branch on a True condition to the label of currCatch;
            if (currCatch is a catch-all case) {
              foundHandler = True;
            }
          }
          if (foundHandler == True) break;
        }
        if (foundHandler == True) {
          break;
        } else {
          Create a block, new_landing_pad;
          Append a jump in landing_pad to new_landing_pad;
          landing_pad = new_landing_pad;
        }
      }
      if (foundHandler == False) {
        // The exception may have to go to the callers to find handlers.
        Append destructor calls in landing_pad for the currently live
          objects local to currFunc;
        Append a return instruction to landing_pad;
      }
    }
  }
  Visit all function calls in currFunc that are not enclosed in any
      try block {
    currInst = the current call instruction;
    Create a landing pad block, landing_pad;
    Insert a conditional branch after currInst based on a True value of
      hasexceptionhappened and the branch target is landing_pad;
    Append destructor calls in landing_pad for the currently live objects
      local to currFunc;
    Append a return instruction to landing_pad;
  }
}
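The handler search in the pseudo code above (visiting the enclosing try blocks from inner to outer) may be sketched in C++ as follows. This is an illustrative simplification under stated assumptions, not the disclosed compiler: emitted instructions are modeled as strings, the chained landing pads are flattened into a single list, and the types CatchClause and TryScope and the function emit_handler_search are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a catch clause and of a try block with its clauses.
struct CatchClause { std::string type; std::string label; bool catch_all; };
struct TryScope { std::string name; std::vector<CatchClause> catches; };

// Emits the landing-pad contents for one throw or call.
// known_type is empty when the thrown type is not known statically
// (for example, when the landing pad follows a function call).
std::vector<std::string> emit_handler_search(
        const std::vector<TryScope>& inner_to_outer,
        const std::string& known_type) {
    std::vector<std::string> out;
    bool found_handler = false;
    for (const TryScope& scope : inner_to_outer) {
        out.push_back("destroy live objects local to " + scope.name);
        for (const CatchClause& c : scope.catches) {
            assert(!c.label.empty());  // every catch clause was given a label earlier
            if (!known_type.empty() && known_type == c.type) {
                out.push_back("goto " + c.label);  // type known: unconditional jump
                found_handler = true;
            } else {
                out.push_back("if (exception_object.type == " + c.type +
                              ") goto " + c.label);
                if (c.catch_all) found_handler = true;  // catch-all always matches
            }
            if (found_handler) break;
        }
        if (found_handler) break;
    }
    if (!found_handler) {
        // The exception may have to propagate to the caller.
        out.push_back("destroy live objects local to function");
        out.push_back("return");
    }
    return out;
}
```

When the thrown type is unknown and no catch-all exists, the emitted list ends with the function-level cleanup and a return, so the search continues in the caller with the exception flag still set.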
[0044] FIG. 2 is a flowchart of a method 200 for processing C++
code to implement exception handling on a GPU. The method 200 is
performed for each function block in the program code and shows an
overview of the code processing; several procedures will be further
described in additional detail below. The method 200 begins by
allocating an exception flag and initializing it to false (step
202).
[0045] The method 200 processes all of the try blocks in the
current function in a lexical and outer to inner order. A current
try block is selected and processed (step 204). The try block
processing includes adding a "join label" at the end of the current
try block, and is used as an exit point for the current try
block.
[0046] Each catch clause in the current try block is processed
(step 206). This catch clause processing will be described in
greater detail in connection with FIG. 3. Each throw clause or
function call in the current try block is processed (step 208).
This throw clause and function call processing will be described in
greater detail in connection with FIG. 4.
[0047] As part of processing each throw clause or function call in
the current try block, the try blocks that enclose the throw or
call (referred to as "enclosing try blocks"), beginning with the
current try block, are visited in an inner to outer order (step
210). Destructor calls are added to the landing pad block for
currently live objects that are local to the enclosing try block
being evaluated.
[0048] The landing pads as used herein follow the concept from the
CPU side, in that they are convenient locations for common branches
to go to if an appropriate exception handler for the exception
object type cannot be found. The landing pad acts as a placeholder
to call a destructor for live objects that are local to the current
function (because the function is being exited, this is part of the
necessary clean up). After this cleaning up of the function is
complete, the next "outer" enclosing scope is checked for an
appropriate exception handler for the exception object type. If an
appropriate exception handler is not found as the code moves back
up the layers of function calls, the landing pads are used at each
layer where an appropriate exception handler is not found.
[0049] When performing EH on a CPU, the branches are not explicit
(e.g., not directly to a landing pad). In a CPU implementation, an
EH routine performs a lookup in a table, and if there is no match
in the table, then the landing pad is used. This implementation
defers its cost until an exception actually happens: when there is
no exception (which is the normal case), the code executes
efficiently, without table lookups, etc., and a high penalty is
paid only when an exception is thrown. In a GPU implementation, by
contrast, the execution flow uses explicit conditional branches,
rather than indirect branches through lookup tables, to jump to
the landing pads. This is due to the SIMD design of a GPU, in
which the efficiencies realized in a CPU implementation cannot be
utilized.
[0050] Referring back to FIG. 2, each catch clause within the
enclosing try block is processed (step 212). This catch clause
processing will be described in greater detail in FIG. 5. After
each catch clause in the enclosing try block has been processed, a
found handler flag at the enclosing try block level is checked
(step 214). The found handler flag indicates whether an exception
handler for the thrown exception has been found. This process will
be described in greater detail in FIG. 6.
[0051] After visiting all of the enclosing try blocks of the
current try block, the found handler flag at the current try block
level is checked (step 216). This process will be described in
greater detail in FIG. 7. All function calls within the current
function that are not enclosed in any try block are processed
(step 218) and the method terminates (step 220). Processing these
function calls will be described in greater detail in FIG. 8.
[0052] The method 200 imposes only a low performance overhead on
non-excepting execution paths. A small amount of overhead is added
after each function return to check for excepting threads, but the
method does not add any other execution overhead if no exceptions
occur. While this approach adds a slight overhead compared to the
"zero-cost" approach on CPUs, it is more efficient than previous
approaches such as those using setjmp/longjmp.
[0053] The method 200 does not rely on any handshake between the
FSAIL instructions and the exception tables generated by a
high-level compiler as in the CPU "zero-cost" approach. Because the
JIT compiler may expand the FSAIL instructions and alter the
instruction order, such a handshake is challenging to maintain
correctly in the GPU tool chains, where the JIT compiler is an
essential component.
[0054] FIG. 3 is a flowchart of a method for processing a catch
clause in a current try clause block (step 206 in FIG. 2). An
identifier label is added to the current catch clause (step 302).
In the current catch clause, an instruction is added to reset the
exception flag (step 304) and an instruction is added to jump to
the join label location, to exit the try block (step 306). The
processing of the current catch clause then terminates (step
308).
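As an illustration of the catch clause processing of FIG. 3, the following C++ sketch shows what a lowered try/catch might look like once the catch clause has its label, its flag reset, and its jump to the join label. It is a hand-written approximation: the function lookup_or_minus_one, the label names, and the use of thread_local for the flag are assumptions made for this example only.

```cpp
#include <cassert>

// Hypothetical per-work-item exception flag (thread_local models
// work-item-private storage).
thread_local bool has_exception_happened = false;

// Source form being modeled:
//   try { if (i is out of range) throw OutOfRange(); result = data[i]; }
//   catch (OutOfRange&) { result = -1; }
int lookup_or_minus_one(const int* data, int n, int i) {
    int result = 0;
    // --- try block ---
    if (i < 0 || i >= n) {
        has_exception_happened = true;  // the throw sets the flag ...
        goto catch_out_of_range;        // ... and jumps; the type is known statically
    }
    result = data[i];
    goto join_try;                      // normal exit from the try block

catch_out_of_range:                     // label attached to the catch clause
    has_exception_happened = false;     // reset: the exception is now handled
    result = -1;                        // handler body
    goto join_try;                      // unconditional jump to the join label

join_try:                               // join label following the try block
    assert(!has_exception_happened);    // both paths arrive with the flag clear
    return result;
}
```

Both the normal path and the handler path converge at join_try, which is the role the join label plays as the exit point of the try block.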
[0055] FIG. 4 is a flowchart of a method for processing a throw
clause or a function call in a current try clause block (step 208
in FIG. 2). The found handler flag is cleared (step 402). A landing
pad block is created (step 404) and the instruction is evaluated to
determine whether it is a throw or a call (step 406). If the
instruction is a throw, then the throw instruction is replaced with
the landing pad block (step 408). The exception object is captured
(step 410), an instruction is added to the landing pad to set the
exception flag (step 412), and the processing of the throw
instruction terminates (step 414). If the instruction being
evaluated is a call (step 406), then a conditional branch
instruction is added after the call instruction (step 416). This
conditional branch will be taken if the exception flag is set, with
a branch target of the landing pad. The processing of the call
instruction then terminates (step 414).
[0056] FIG. 5 is a flowchart of a method for processing a catch
clause in an enclosing try block (step 212 in FIG. 2). A
determination is made whether the exception object type is known
and matches the catch clause type (step 502). If the exception
object type is known and matches the catch clause type, then a jump
instruction is added to the landing pad (step 504). The target of
the jump instruction is the label of the current catch clause. The
found handler flag is set (step 506) and the processing of the
catch clause terminates (step 508). If the exception object type is
not known to match the catch clause type (step 502), a conditional
branch instruction is added to the landing pad (step 510). This
conditional branch is taken if the exception object type matches
the catch clause type, and the branch target is the label of the
current catch clause. A determination is made whether
the current catch clause is a "catch-all" case (step 512). If the
current catch clause is a "catch-all" case, then the found handler
flag is set (step 506) and the processing of the catch clause
terminates (step 508). If the current catch clause is not a
"catch-all" case (step 512), then the processing of the catch
clause terminates (step 508).
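The case in which the exception type is not known statically can be sketched in C++ as a chain of conditional branches in the landing pad, ending in a catch-all. This is only an illustrative model: the enum ExcType, the variable exception_object, and the functions scale and run are hypothetical and do not appear in the disclosure.

```cpp
#include <cassert>

enum class ExcType { None, BadInput, Overflow, Unknown };

// Hypothetical per-work-item state: the flag and the captured exception object.
thread_local bool has_exception_happened = false;
thread_local ExcType exception_object = ExcType::None;

// Callee that may "throw" one of several types; the caller cannot
// know which one at compile time.
int scale(int x) {
    ExcType t = ExcType::None;
    if (x < 0) t = ExcType::BadInput;
    else if (x > 1000) t = ExcType::Overflow;
    else if (x == 13) t = ExcType::Unknown;
    if (t != ExcType::None) {
        exception_object = t;           // capture the exception object
        has_exception_happened = true;  // set the flag
        return 0;
    }
    return x * 2;
}

// Models: try { result = scale(x); }
//         catch (BadInput) {..} catch (Overflow) {..} catch (...) {..}
int run(int x) {
    int result = 0;
    result = scale(x);
    if (has_exception_happened) goto landing_pad;  // branch added after the call
    goto join_try;

landing_pad:  // type unknown statically: one conditional branch per catch clause
    if (exception_object == ExcType::BadInput) goto catch_bad_input;
    if (exception_object == ExcType::Overflow) goto catch_overflow;
    goto catch_all;                                // catch-all always matches

catch_bad_input:
    has_exception_happened = false; result = -1; goto join_try;
catch_overflow:
    has_exception_happened = false; result = -2; goto join_try;
catch_all:
    has_exception_happened = false; result = -3; goto join_try;

join_try:
    assert(!has_exception_happened);
    return result;
}
```

Because the catch-all clause terminates the chain, no further landing pad or return-to-caller path is needed, which corresponds to the found handler flag being set at step 506.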
[0057] FIG. 6 is a flowchart of a method for processing a found
handler flag in an enclosing try block (step 214 in FIG. 2). A
determination is made whether the found handler flag is set (step
602). If the found handler flag is not set, then a new landing pad
block is created (step 604). A jump instruction is added to the
current landing pad block, and the jump destination is the new
landing pad (step 606). The processing of the found handler flag
then terminates (step 608). If the found handler flag is set (step
602), then processing of the found handler flag terminates (step
608).
[0058] FIG. 7 is a flowchart of a method for processing a found
handler flag in a current try block (step 216 in FIG. 2). A
determination is made whether the found handler flag is set (step
702). If the found handler flag is not set, then a destructor for
live objects that are local to the function is added to the landing
pad (step 704). A return instruction is added to the landing pad
(step 706), and processing of the found handler flag terminates
(step 708). If the found handler flag is set (step 702), then
processing of the found handler flag terminates (step 708).
[0059] FIG. 8 is a flowchart of a method for processing a function
located outside a try block (step 218 of FIG. 2). A landing pad
block is created (step 802). A conditional branch is added after
the current call instruction (step 804). This conditional branch is
taken if the exception flag is set, with the branch target being
the landing pad. A destructor for live objects that are local to
the function is added to the landing pad (step 806). A return
instruction is added to the landing pad (step 808), and processing
of the function terminates (step 810).
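The handling of a call outside any try block can be sketched in C++ as follows. Destructor calls are modeled as explicit destroy calls so that the cleanup in the landing pad is visible; the names Buffer, construct, destroy, live_buffers, may_throw, middle, and outer are hypothetical and serve only this illustration.

```cpp
#include <cassert>

// Hypothetical per-work-item exception flag.
thread_local bool has_exception_happened = false;
// Counts live objects so that the effect of the cleanup is observable.
thread_local int live_buffers = 0;

struct Buffer { bool live = false; };
void construct(Buffer& b) { b.live = true; ++live_buffers; }
void destroy(Buffer& b) { if (b.live) { b.live = false; --live_buffers; } }

// Callee that "throws" by setting the flag.
void may_throw(bool fail) { if (fail) has_exception_happened = true; }

// No try block here, so this function cannot handle the exception itself.
int middle(bool fail) {
    Buffer scratch;
    construct(scratch);
    may_throw(fail);
    if (has_exception_happened) goto landing_pad;  // conditional branch after the call
    destroy(scratch);                              // normal end of scope
    return 1;

landing_pad:
    destroy(scratch);  // destructor calls for live objects local to the function
    return 0;          // return with the flag still set; the caller keeps searching
}

// The caller checks the flag after the call and acts as the handler.
int outer(bool fail) {
    assert(!has_exception_happened);
    int r = middle(fail);
    if (has_exception_happened) {
        has_exception_happened = false;  // handler found: reset the flag
        return -1;
    }
    return r;
}
```

The flag stays set across the return from middle, so outer sees it in its own post-call check, and the local object is released on both the normal and the excepting path.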
[0060] The following is an example which applies this approach in
translating the C++ EH constructs into control flows on the current
GPU architectures.
TABLE-US-00003
try { // try 1
  try { // try 2
    if (..) throw e1;
    ..
  }
  catch (t1) { // handler 1
  }
  catch (t2) { // handler 2
  }
  try { // try 3
    if (..) throw e2;
    ..
    if (..) foo( );
    ..
  }
  catch (t3) { // handler 3
  }
  catch (t4) { // handler 4
  }
  ...
}
catch (t5) { // handler 5
}
catch (t6) { // handler 6
}
...
return;

The transformed pseudo code will look like the following.

try1: {
  ...
  try2: {
    if (..) { // throw e1;
      hasexceptionhappened = True;
      Calling destructors for live objects local to try2;
      if ( e1.type == t1 ) { goto catch_t1; }
      else if ( e1.type == t2 ) { goto catch_t2; }
      else { goto land_pad_e1_try1; }
    }
    ...
  }
  join_try2:
  try3: {
    if (..) { // throw e2;
      hasexceptionhappened = True;
      Calling destructors for live objects local to try3;
      if ( e2.type == t3 ) { goto catch_t3; }
      else if ( e2.type == t4 ) { goto catch_t4; }
      else { goto land_pad_e2_try1; }
    }
    ...
    if (..) {
      foo( );
      if (hasexceptionhappened) { goto landing_pad_foo_try3; }
    }
  }
  join_try3:
}
join_try1:
...
return;

catch_t1: // handler 1
  hasexceptionhappened = False;
  goto join_try2;
catch_t2: // handler 2
  hasexceptionhappened = False;
  goto join_try2;
land_pad_e1_try1:
  Calling destructors for live objects local to try1;
  if ( e1.type == t5 ) { goto catch_t5; }
  else if ( e1.type == t6 ) { goto catch_t6; }
  else { goto land_pad_e1_rtn; }
land_pad_e1_rtn:
  // Did not find an exception handler in the current function
  Calling destructors for live objects local to the current function;
  return;
catch_t3: // handler 3
  hasexceptionhappened = False;
  goto join_try3;
catch_t4: // handler 4
  hasexceptionhappened = False;
  goto join_try3;
land_pad_e2_try1:
  Calling destructors for live objects local to try1;
  if ( e2.type == t5 ) { goto catch_t5; }
  else if ( e2.type == t6 ) { goto catch_t6; }
  else { goto land_pad_e2_rtn; }
land_pad_e2_rtn:
  // Did not find an exception handler in the current function
  Calling destructors for live objects local to the current function;
  return;
landing_pad_foo_try3:
  Calling destructors for live objects local to try3 for leaving try3;
  if ( exception_object.type == t3 ) { goto catch_t3; }
  else if ( exception_object.type == t4 ) { goto catch_t4; }
  else { goto land_pad_foo_try1; }
land_pad_foo_try1:
  Calling destructors for live objects local to try1;
  if ( exception_object.type == t5 ) { goto catch_t5; }
  else if ( exception_object.type == t6 ) { goto catch_t6; }
  else { goto land_pad_foo_rtn; }
catch_t5: // handler 5
  hasexceptionhappened = False;
  goto join_try1;
catch_t6: // handler 6
  hasexceptionhappened = False;
  goto join_try1;
land_pad_foo_rtn:
  // Did not find an exception handler in the current function
  Calling destructors for live objects local to the current function;
  return;
// end of the transformed example
[0061] It should be understood that many variations are possible
based on the disclosure herein. Although features and elements are
described above in particular combinations, each feature or element
may be used alone without the other features and elements or in
various combinations with or without other features and
elements.
[0062] The methods provided may be implemented in a general purpose
computer, a processor, or a processor core. Suitable processors
include, by way of example, a general purpose processor, a special
purpose processor, a conventional processor, a digital signal
processor (DSP), a plurality of microprocessors, one or more
microprocessors in association with a DSP core, a controller, a
microcontroller, Application Specific Integrated Circuits (ASICs),
Field Programmable Gate Array (FPGA) circuits, any other type of
integrated circuit (IC), and/or a state machine. Such processors
may be manufactured by configuring a manufacturing process using
the results of processed hardware description language (HDL)
instructions and other intermediary data including netlists (such
instructions capable of being stored on a computer readable medium).
The results of such processing may be maskworks that are then used
in a semiconductor manufacturing process to manufacture a processor
which implements aspects of the present invention.
[0063] The methods or flow charts provided herein may be
implemented in a computer program, software, or firmware
incorporated in a non-transitory computer-readable storage medium
for execution by a general purpose computer or a processor.
Examples of computer-readable storage mediums include a read only
memory (ROM), a random access memory (RAM), a register, cache
memory, semiconductor memory devices, magnetic media such as
internal hard disks and removable disks, magneto-optical media, and
optical media such as CD-ROM disks, and digital versatile disks
(DVDs).
* * * * *