U.S. patent application number 12/633840 was filed with the patent office on 2009-12-09 and published on 2010-04-08 for arithmetic device for concurrently processing a plurality of threads.
This patent application is currently assigned to FUJITSU LIMITED. Invention is credited to Norihito GOMYO, Ryuichi SUNAYAMA.
Application Number | 20100088544 12/633840 |
Document ID | / |
Family ID | 40155968 |
Filed Date | 2009-12-09 |
United States Patent
Application |
20100088544 |
Kind Code |
A1 |
GOMYO; Norihito ; et
al. |
April 8, 2010 |
ARITHMETIC DEVICE FOR CONCURRENTLY PROCESSING A PLURALITY OF THREADS
Abstract
A processor is provided that is capable of concurrently
processing a sequence of instructions for a plurality of threads
while achieving a retry success rate equivalent to the success rate in
processors that process a sequence of instructions for a single
thread. An arithmetic device 200 is provided with an instruction
execution circuit 201 for executing a plurality of threads, and an
execution control circuit 202 for controlling the execution state
or rerunning of the threads.
Inventors: |
GOMYO; Norihito; (Kawasaki,
JP) ; SUNAYAMA; Ryuichi; (Kawasaki, JP) |
Correspondence
Address: |
STAAS & HALSEY LLP
SUITE 700, 1201 NEW YORK AVENUE, N.W.
WASHINGTON
DC
20005
US
|
Assignee: |
FUJITSU LIMITED
Kawasaki
JP
|
Family ID: |
40155968 |
Appl. No.: |
12/633840 |
Filed: |
December 9, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
PCT/JP2007/000661 | Jun 20, 2007 | |
12633840 | | |
Current U.S. Class: | 714/17 ; 714/E11.114 |
Current CPC Class: | G06F 11/1405 20130101 |
Class at Publication: | 714/17 ; 714/E11.114 |
International Class: | G06F 11/14 20060101 G06F011/14 |
Claims
1. An arithmetic device, comprising: an instruction execution
circuit to execute a plurality of threads, and to detect and notify
a hardware error that occurs during the execution; and an execution
control circuit, once it is notified of the hardware error by the
instruction execution circuit, to order the instruction execution
circuit to cancel execution of all threads and wait until all the
threads are cancelled, and to order the instruction execution
circuit to rerun an error thread in which the hardware error
occurred for only one instruction.
2. The arithmetic device according to claim 1, wherein the
execution control circuit comprises for each of the threads: an
execution state control circuit to order the instruction execution
circuit to cancel execution of the thread so as to control an
execution state of the thread; and a retry control circuit to
detect a timing at which it would be possible to rerun the one
instruction on a basis of an execution state of the error thread in
the instruction execution circuit, and to order the instruction
execution circuit to rerun the one instruction.
3. The arithmetic device according to claim 1, wherein the
execution control circuit comprises a thread wait circuit to
monitor an execution state of the thread until all the threads are
cancelled, and to order the retry control circuit to let the
instruction execution circuit rerun the one instruction when an
execution state of all the threads is cancelled.
4. The arithmetic device according to claim 1, wherein the
execution control circuit comprises for each of the threads: an
execution state control circuit to order the instruction execution
circuit to cancel execution of the thread so as to control an
execution state of the thread; and a retry control circuit to
detect a timing at which it would be possible to rerun the one
instruction from an execution state of the error thread in the
instruction execution circuit, and to order the instruction
execution circuit to rerun the one instruction, and the execution
control circuit further comprises a thread wait circuit to order a
second execution state control circuit other than a first execution
state control circuit to cancel execution of the thread when
receiving a notification indicating that the first execution state
control circuit of which the hardware error is notified instructed
the instruction execution circuit to cancel execution of the error
thread, to monitor an execution state of the thread until all
threads are cancelled, and to order the retry control circuit to
let the instruction execution circuit rerun the one instruction
when an execution state of all the threads is cancelled.
5. The arithmetic device according to claim 2, wherein the retry
control circuit determines a timing at which it would be possible
to rerun the one instruction on a basis of a time of being notified
by the instruction execution circuit of completion of execution of
the error thread.
6. The arithmetic device according to claim 3, wherein the thread
wait circuit orders the execution state control circuit to cancel
execution of a normal thread other than the error thread when the
execution state control circuit orders the instruction execution
circuit to cancel the error thread in response to a notification of
the hardware error.
7. An instruction retry method for ordering an arithmetic device to
perform processing, the processing comprising: an instruction
execution processing of executing a plurality of threads, and
detecting a hardware error that occurs during the execution; a
retry order processing of ordering an instruction execution circuit
that performs the instruction execution processing to cancel
execution of all threads and of waiting until all the threads are
cancelled, and ordering the instruction execution circuit to rerun
an error thread in which the hardware error occurred for only one
instruction; and a retry processing of rerunning the error thread
in which the hardware error occurred only for one instruction in
response to the retry order.
8. The instruction retry method according to claim 7, wherein the
retry order processing comprises: ordering the instruction
execution circuit to cancel execution of the error thread and to
cancel execution of each normal thread other than the error thread;
waiting until the error thread and the normal thread are all
cancelled in the instruction execution circuit; and detecting a
timing at which it would be possible to rerun the one instruction
from an execution state of the error thread in the instruction
execution circuit, and ordering the instruction execution circuit
to rerun the one instruction.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of PCT application
PCT/JP2007/000661, which was filed on Jun. 20, 2007.
FIELD
[0002] The present invention relates to an arithmetic device that
concurrently processes a plurality of threads, and that has a retry
function for a hardware instruction.
BACKGROUND
[0003] In processors for servers in which reliability is important,
instruction retry processing, which reruns in the hardware the
instruction being processed at that time, is performed when an
error is detected during the instruction processing. For example,
when errors such as the following occur, the processing may
continue without abnormally terminating the program by executing
the instruction retry:
[0004] (1) Errors that occur as the internal state of the
hardware temporarily changes due to alpha rays or the like
[0005] As this error is not caused by a failure of the hardware
itself, the possibility is very low that the same error would occur
when the instruction is rerun. Therefore, this type of error may
almost certainly be recovered by performing the instruction
retry.
[0006] (2) Errors that occur due to noise from the adjacent wiring
inside the hardware
[0007] When a signal line inside the processor is nearly damaged by
electromigration or the like, an error may occur in signal lines
adjacent to that signal line.
[0008] The possibility of recovering from this type of error may be
increased by rerunning a single instruction, as the probability
that the adjacent wiring changes state at the time of rerunning is
greatly decreased.
[0009] FIG. 1 is a diagram illustrating a conventional instruction
retry method.
[0010] A conventional arithmetic device 100 that has an instruction
retry mechanism includes an instruction execution circuit 101, an
execution state control circuit 102, and a retry control circuit
103, as illustrated in FIG. 1.
[0011] The instruction execution circuit 101 fetches an arbitrary
instruction from a storage device, and decodes the fetched
instruction. Then, the instruction execution circuit 101 performs
arithmetic operation on the basis of the decoded instruction.
Moreover, when executing the instruction, the instruction execution
circuit 101 sequentially signals an instruction for updating a
programmable resource, and checks the existence of an error in the
instruction execution. Furthermore, when the instruction execution
or the resource update is completed, the instruction execution
circuit 101 notifies the retry control circuit 103 of that
completion.
[0012] The execution state control circuit 102 orders the
instruction execution circuit 101 to cancel the instruction
execution. The retry control circuit 103 determines a timing at
which it would be possible for the instruction execution circuit
101 to perform an instruction retry, and controls the ON/OFF of a
flag (e.g., register) that indicates the determination of
performing the instruction retry. Then, when determining that it is
possible to perform a retry, the retry control circuit
103 orders the instruction execution circuit 101 to execute a
single instruction.
[0013] When the instruction is for updating only one resource, the
instruction execution circuit 101 signals a notification of the
completion of an instruction execution and a resource update at the
same time. When the instruction is for updating the resource in two
or more cycles, the instruction execution circuit 101 signals a
notification of completing an instruction execution and a
notification of completing a resource update at different times. In
this case, the retry control circuit 103 determines that it is not
possible to perform a retry between the time of the completion of a
resource update and the time of the completion of instruction
execution.
[0014] In the above configuration, retry processing is performed as
in the following.
[0015] (1) When detecting an error while executing an instruction,
the instruction execution circuit 101 notifies the execution state
control circuit 102 and the retry control circuit 103 of the
occurrence of the error.
[0016] (2) When receiving a notification of the occurrence of the
error from the instruction execution circuit 101, the execution
state control circuit 102 instantly orders the instruction
execution circuit 101 to cancel the instruction execution in order
to prevent the updating of resources from being performed using
error data.
[0017] (3) When receiving a notification of the occurrence of the
error from the instruction execution circuit 101, the retry control
circuit 103 determines whether it is possible to perform a retry.
If it is determined that it is possible to perform a retry, the
retry control circuit 103 sets a flag indicating that it is
possible to perform an instruction retry, and orders the
instruction execution circuit 101 to rerun the instruction.
[0018] On the other hand, (4) when receiving an order cancelling
the instruction execution from the execution state control circuit
102, the instruction execution circuit 101 clears all the
processing in the instruction execution circuit 101. Moreover, when
the order cancelling the instruction execution from the execution
state control circuit 102 is negated, the instruction execution
circuit 101 reruns the instruction in accordance with the order
from the retry control circuit 103.
[0019] (5) When the rerunning of the instruction is completed, the
instruction execution circuit 101 notifies the retry control
circuit 103 of the completion of the instruction execution.
[0020] (6) When receiving a notification of completing the
instruction execution from the instruction execution circuit 101,
the retry control circuit 103 resets the flag that indicates that
it is possible to perform the instruction retry processing, and
negates the order of the rerunning to the instruction execution
circuit 101.
[0021] (7) When the order of the rerunning is negated, the
instruction execution circuit 101 completes the retry processing,
and resumes normal instruction execution processing.
[0022] For processors in which performance is important, designs
have been proposed in which the performance is improved by concurrently
processing a sequence of instructions for two or more threads.
[0023] For example, processors are proposed that use a method
called "fine grained vertical multi-threading" that performs a
sequence of instructions for a thread different for every cycle, or
a method called "simultaneous multi-threading" that performs a
sequence of instructions for two or more threads at the same time.
Those methods realize the concurrent processing of a sequence of
instructions for two or more threads using the instruction
execution circuit.
[0024] As high performance and high reliability are required in
processors for servers, both high performance in processing a
sequence of instructions for two or more threads at the same time
and high reliability in performing retry processing upon the
occurrence of an error are required.
[0025] As a method of performing retry processing in a processor
that processes a sequence of instructions for two or more threads,
the following two methods are possible.
[0026] (A) A method in which only one instruction retry mechanism
of a processor that processes a sequence of instructions for a
single thread, as in the conventional art, is provided for the
processor, wherein the mechanism is common to all the threads.
[0027] (B) A method in which an instruction retry mechanism of a
processor that processes a sequence of instructions for a single
thread, as in the conventional art, is provided for each thread of
the processor.
[0028] In method (A), however, it is not possible to perform an
instruction retry if any one of the two or more threads being
processed is in a state unable to perform a retry at the time of
detecting the occurrence of an error. In other words, the greater
the number of threads there are, the greater the possibility that
it will be determined to be not possible to perform an instruction
retry. Accordingly, the success rate of retries becomes lower than
that of a processor for a single thread.
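The effect described in paragraph [0028] can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that each thread is independently in a retry-capable state with probability p when an error is detected; the application gives no such figures, and the function name is hypothetical. Under that assumption, method (A) can retry only when all n threads are retry-capable at once.

```python
def shared_retry_possible_probability(p: float, n_threads: int) -> float:
    """Probability that all n_threads are retry-capable at the same time.

    Assumption (not from the application): each thread is retry-capable
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return p ** n_threads


if __name__ == "__main__":
    # The value shrinks as the number of threads grows.
    for n in (1, 2, 4, 8):
        print(n, shared_retry_possible_probability(0.9, n))
```

With any p below 1, the result decreases as the thread count increases, which matches the qualitative claim that a shared retry mechanism lowers the retry success rate.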
[0029] In method (B), an instruction retry is performed for every
thread. In other words, a sequence of instructions for threads in
which no error is detected is normally performed while instruction
retry processing is being performed due to the detection of an
error in another thread. Accordingly, in comparison to a processor
that processes a single thread, a larger portion of the circuitry is
operating while an instruction retry is processed.
Therefore, when there is an error due to the noise from the other
wiring, the success rate of retries becomes lower than that of a
processor for a single thread.
[0030] In relation to the technique described in the above, in
Patent Document 1 an information processing device is disclosed
that achieves the instruction retry function of a high quality by
configuring the device such that an instruction retry is repeatedly
performed and thereby verification is made.
[0031] In Patent Document 2, an information processing device is
disclosed in which a command that accesses operand data two or more
times is divided into commands which each access the operand data
only one time, and in which when an error has occurred during the
execution, only that command is rerun.
[0032] Patent Document 1: Japanese Laid-open Patent Publication No.
2006-040174
[0033] Patent Document 2: U.S. Pat. No. 5,564,014
SUMMARY
[0034] The present invention has been made in view of the
above-described problems, and an object of the present invention is
to provide a processor capable of concurrently processing a
sequence of instructions for a plurality of threads while achieving a
retry success rate equivalent to the success rate in processors
that process a sequence of instructions for a single thread.
[0035] In order to solve the above problems, an arithmetic device
according to the present invention is provided with an instruction
execution circuit to execute a plurality of threads in parallel,
and to detect and notify a hardware error that occurs during the
execution, and an execution control circuit to order the
instruction execution circuit to cancel execution of all threads
and wait until all the threads are cancelled, and to order the
instruction execution circuit to rerun an error thread in which the
hardware error occurred for only one instruction.
[0036] According to the present invention, the execution control
circuit cancels the execution of all the threads, and waits until
all the threads are cancelled. When all the threads are cancelled,
the execution control circuit orders the instruction execution
circuit to rerun only one instruction.
[0037] Accordingly, the other circuits excluding the circuit
operated by the rerunning instruction are not operated, and thus it
is possible to prevent an error due to the noise from the other
circuits from occurring.
[0038] Moreover, the execution control circuit waits until all the
threads are cancelled, and orders the instruction execution circuit
to perform rerunning, and thus the instruction execution circuit
may rerun the instructed thread without fail.
[0039] As a result, the success rate of rerunning may be improved
when a hardware error occurs in the processor capable of
concurrently processing a sequence of instructions for two or more
threads.
[0040] As described in the above, according to the present
invention, a processor capable of concurrently processing a
sequence of instructions for two or more threads and achieving a
retry success rate equivalent to the success rate in processors
that process a sequence of instructions for a single thread may be
provided.
BRIEF DESCRIPTION OF DRAWINGS
[0041] FIG. 1 is a diagram illustrating a conventional instruction
retry method.
[0042] FIG. 2 is a diagram illustrating the operating principle of
an arithmetic device according to the present embodiment.
[0043] FIG. 3 is a diagram illustrating a general outline of the
configuration of an arithmetic device as a whole according to the
present embodiment.
[0044] FIG. 4 is a diagram illustrating a configuration example of
an execution state control circuit according to the present
embodiment.
[0045] FIG. 5 is a diagram illustrating a configuration example of
a retry control circuit according to the present embodiment.
[0046] FIG. 6 is a diagram illustrating a configuration example of
a thread wait circuit according to the present embodiment.
[0047] FIG. 7 is a diagram illustrating a configuration example of
an instruction execution circuit according to the present
embodiment.
[0048] FIG. 8 is a flowchart illustrating the operation of an
arithmetic device according to the present embodiment.
DESCRIPTION OF EMBODIMENTS
[0049] Some embodiments of the present invention will be described
with reference to FIG. 2 through FIG. 8.
[0050] FIG. 2 is a diagram illustrating the operating principle of
an arithmetic device 200 according to the present embodiment.
[0051] The arithmetic device 200 of FIG. 2 includes an instruction
execution circuit 201 for executing a plurality of threads, and an
execution control circuit 202 for controlling the execution state
or the rerunning of the threads.
[0052] The instruction execution circuit 201 is a circuit for
processing in parallel a plurality of threads including more than
one sequence of instructions. For example, the instruction
execution circuit 201 signals instructions such as an instruction
fetch, instruction decoding, execution of an arithmetic operation,
instruction completion, and programmable resource updating, to the
threads, or checks for a hardware error while performing
instruction execution.
[0053] Then, when detecting a hardware error while performing
instruction execution for the threads, the instruction execution
circuit 201 notifies the execution control circuit 202 of the
hardware error (hereinafter, this notification is simply referred
to as an "error notification").
[0054] Moreover, when receiving an order cancelling the execution
of a specific thread from the execution control circuit 202
(hereinafter, this order is referred to as a "cancel order"),
the instruction execution circuit 201 cancels the execution of that
thread.
[0055] Furthermore, from among the specific threads for which a
rerun order (hereinafter, this order is referred to as a "retry
order") is signaled from the execution control circuit 202, the
instruction execution circuit 201 reruns, for only one instruction,
the thread for which an order resetting the cancellation of
execution (hereinafter, this order is referred to as a "cancel
order cancelation") is signaled from an execution state control
circuit 302 (not illustrated in FIG. 2).
[0056] When receiving an error notification from the instruction
execution circuit 201, the execution control circuit 202 signals a
cancel order for the thread that was being executed when a hardware
error occurred (hereinafter, this thread is referred to as an
"error thread"), and also signals a retry order for that error
thread.
[0057] At this time, the execution control circuit 202 orders the
instruction execution circuit 201 to cancel threads other than the
error thread (hereinafter, these threads are referred to as "normal
threads").
[0058] Then, the execution control circuit 202 monitors the
execution state of the threads in the instruction execution circuit
201, and waits until the execution of all the threads is
cancelled.
[0059] Once the execution of all the threads has been cancelled,
the execution control circuit 202 signals a cancel order
cancelation to the instruction execution circuit 201 so as to cause
it to rerun the error threads for only one instruction each.
The execution control circuit 202 reruns the threads one by one,
and completes the retry processing when the rerunning of all the
error threads is completed.
[0060] FIG. 3 is a diagram illustrating a general outline of the
configuration of an arithmetic device 200 as a whole according to
the present embodiment.
[0061] As illustrated in FIG. 3, the arithmetic device 200 includes
an instruction execution circuit 201 for executing a plurality of
threads, an execution control circuit 202 for controlling the
execution state or the rerunning of the threads, and programmable
resources 301-1, 301-2, . . . , 301-n ("n" is a natural number;
hereinafter, an arbitrary programmable resource is referred to as a
"programmable resource 301") for executing the threads.
[0062] Furthermore, the execution control circuit 202 includes
execution state control circuits 302-1, 302-2, . . . , 302-n for
controlling the execution state of the threads, and retry control
circuits 303-1, 303-2, . . . , 303-n for controlling the rerunning
of the threads, for each of the threads, and further includes a
thread wait circuit 304 for waiting until all the threads are
cancelled in the retry execution.
[0063] One of the execution state control circuits 302-1 to 302-n,
selected arbitrarily, is referred to as an "execution state control
circuit 302". Similarly, one of the retry control
circuits 303-1 to 303-n, selected arbitrarily, is referred to as a
"retry control circuit 303".
[0064] As described with reference to FIG. 2, the instruction execution circuit
201 processes in parallel a plurality of threads including more
than one sequence of instructions. For example, the instruction
execution circuit 201 signals instructions such as an instruction
fetch, instruction decoding, execution of an arithmetic operation,
instruction completion, and programmable resource updating, to the
threads, or checks for a hardware error while performing
instruction execution.
[0065] When updating the programmable resource 301 while executing
the instructions or at the time of completing the instruction
execution, the instruction execution circuit 201 notifies the retry
control circuit 303 that manages the threads used for the updated
programmable resource 301 of the updating of the resource
(hereinafter, this notification is referred to as a "resource
update notification").
[0066] When completing the instruction execution of an arbitrary
thread, the instruction execution circuit 201 notifies the retry
control circuit 303 that manages the thread and the thread wait
circuit 304 of the completion of the instruction execution
(hereinafter, this notification is referred to as an "instruction
completion notification").
[0067] Moreover, when detecting a hardware error while performing
instruction execution, the instruction execution circuit 201
signals an error notification to the execution state control
circuit 302 that manages the error thread.
[0068] Furthermore, when receiving a cancel order from the
execution state control circuit 302, the instruction execution
circuit 201 clears (cancels) all the instruction execution
processing of the thread that is managed by that execution state
control circuit 302, and terminates the updating of the
programmable resource of that thread. Then, the instruction
execution circuit 201 maintains this state until a cancel order
cancelation is signaled from the execution state
control circuit 302.
[0069] Moreover, from among the threads for which a retry order
is signaled from the retry control circuit 303, the instruction
execution circuit 201 reruns, for only one instruction, the
thread for which a cancel order cancelation is signaled from
an execution state control circuit 302.
[0070] When receiving an error notification from the instruction
execution circuit 201 or receiving a cancel order from the thread
wait circuit 304, the execution state control circuit 302 signals a
cancel order to the instruction execution circuit 201.
[0071] At this time, the execution state control circuit 302
notifies the thread wait circuit 304 that the thread is in the
cancelled state.
[0072] Moreover, when receiving a cancel order cancelation from the
thread wait circuit 304, the execution state control circuit 302
signals a cancel order cancelation for the thread to the
instruction execution circuit 201, though this is not
illustrated.
[0073] When receiving an error notification from the instruction
execution circuit 201, the retry control circuit 303 determines
whether it is at a timing at which it would be possible to retry
the thread on the basis of the instruction completion notification
or resource update notification from the instruction execution
circuit 201. When it is determined to be at a timing at which it
would be possible to retry the thread (hereinafter, this state is
referred to as a "retry determined state"), the retry control
circuit 303 signals a retry order to the instruction execution
circuit 201, and notifies the thread wait circuit 304 that the
thread is in a retry determined state (hereinafter, this
notification is referred to as a "retry determination
notification").
[0074] When receiving a retry determination notification from the
retry control circuit 303, the thread wait circuit 304 signals a
cancel order to the execution state control circuit 302 of all the
threads to which an instruction completion notification is signaled
from the instruction execution circuit 201.
[0075] Then, the thread wait circuit 304 monitors the cancelled
state notification from the execution state control circuit 302
that manages the threads, and waits until all the threads are
cancelled.
[0076] Then, the thread wait circuit 304 selects one thread out of
the threads that are in the retry determined state, and signals a
cancel order cancelation to the execution state control circuit 302
that manages the thread. The execution state control circuit 302
that received the cancel order cancelation signals the cancel order
cancelation to the instruction execution circuit 201 as described
in the above (not illustrated in FIG. 3), and thus retry processing
is performed for the thread only for one instruction (this
processing is performed for all the threads that are in the retry
determined state).
[0077] Then, once the retry processing of all the threads that are
in the retry determined state is completed, the thread wait circuit
304 signals a cancel order cancelation to all the execution state
control circuits 302, thereby completing the retry processing.
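The sequence in paragraphs [0074] to [0077] can be sketched as a small software model. This is a minimal sketch, not the circuit itself: it collapses the per-thread circuits 302 and 303 and the thread wait circuit 304 into one hypothetical class, treats cancellation as immediate, and assumes that a thread is held again after its single retried instruction so that later retries also run alone. The names `ThreadState` and `ThreadWaitModel` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadState:
    name: str
    retry_determined: bool = False  # thread is in the retry determined state
    cancelled: bool = False         # cancelled state of the thread
    retried: int = 0                # number of instructions rerun


@dataclass
class ThreadWaitModel:
    """Hypothetical stand-in for the flow driven by thread wait circuit 304."""
    threads: List[ThreadState]
    log: List[str] = field(default_factory=list)

    def run_retry(self) -> None:
        # Cancel order to every thread (error threads and normal threads).
        for t in self.threads:
            t.cancelled = True
            self.log.append(f"cancel {t.name}")

        # Wait until all threads report the cancelled state. Cancellation
        # is immediate in this model, so the condition already holds.
        if not all(t.cancelled for t in self.threads):
            raise RuntimeError("not all threads are cancelled")

        # Release one retry-determined thread at a time for one instruction.
        for t in self.threads:
            if t.retry_determined:
                t.cancelled = False
                t.retried += 1
                self.log.append(f"retry {t.name}")
                t.retry_determined = False
                # Assumption of this sketch: hold the thread again so that
                # any later retry also runs alone.
                t.cancelled = True

        # All retries done: cancel order cancelation to every thread.
        for t in self.threads:
            t.cancelled = False
            self.log.append(f"resume {t.name}")


if __name__ == "__main__":
    model = ThreadWaitModel(
        [ThreadState("T0"), ThreadState("T1", retry_determined=True), ThreadState("T2")]
    )
    model.run_retry()
    print(model.log)
```

The point of the ordering is that the single retried instruction runs while every other thread is idle, which is what removes noise from other circuits during the retry.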
[0078] Specific configuration examples of each circuit will now be
described.
[0079] FIG. 4 is a diagram illustrating a configuration example of
the execution state control circuit 302 according to the present
embodiment.
[0080] The execution state control circuit 302 of FIG. 4 includes a
logical sum circuit a for performing logical operation, and a
register (e.g., RS-FF) 401 capable of holding/transiting the
state.
[0081] The logical sum circuit a has inputs of an error
notification from the instruction execution circuit 201 and a
cancel order from the thread wait circuit 304. The register 401 has
inputs of an output from the logical sum circuit a on a setting
side, and a cancel order cancelation from the thread wait circuit
304 on a resetting side.
[0082] The logical sum circuit a outputs "1" when the error
notification is "1" or the cancel order is "1", and thereby "1" is
set to the register 401. Then the register 401 holds the set value.
The state of the register 401 at this time is referred to as the
"cancelled state". Further, when the cancel order cancelation becomes
"1", the register 401 is reset to "0" (the cancelled state is
reset).
[0083] An output of the register 401 is input into the instruction
execution circuit 201 and the thread wait circuit 304. The output
to the instruction execution circuit 201 is a cancel order (cancel
order cancelation), and the output to the thread wait circuit 304
is a cancelled state notification.
[0084] Accordingly, when receiving an error notification from the
instruction execution circuit 201 or a cancel order from the thread
wait circuit 304, the execution state control circuit 302 is set to
the cancelled state, and signals a cancel order to the instruction
execution circuit 201 and also signals a cancelled state
notification to the thread wait circuit 304. Moreover, when
receiving a cancel order cancelation from the thread wait circuit
304, the execution state control circuit 302 is reset from the
cancelled state, and signals a cancel order cancelation to the
instruction execution circuit 201.
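The behavior of FIG. 4 described above can be modeled in a few lines. The sketch below is a software approximation under stated assumptions: the class names are hypothetical, signals are the integers 0 and 1, each call to `step` stands for one evaluation of the inputs, and the latch gives priority to the set input when both inputs are 1 (the application does not state the priority for register 401).

```python
class RSFlipFlop:
    """Set/reset latch. Set wins if both inputs are 1 (an assumption)."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, s: int, r: int) -> int:
        if s:
            self.q = 1
        elif r:
            self.q = 0
        return self.q


class ExecutionStateControl:
    """Sketch of execution state control circuit 302 (FIG. 4)."""

    def __init__(self) -> None:
        self.register_401 = RSFlipFlop()

    def step(self, error_notification: int, cancel_order: int,
             cancel_order_cancelation: int) -> int:
        # Logical sum circuit a drives the setting side of register 401.
        set_input = error_notification | cancel_order
        # The cancel order cancelation drives the resetting side.
        # Output 1: cancel order / cancelled state notification.
        # Output 0: cancel order cancelation.
        return self.register_401.clock(set_input, cancel_order_cancelation)


if __name__ == "__main__":
    circuit = ExecutionStateControl()
    print(circuit.step(1, 0, 0))  # error notification sets the cancelled state
    print(circuit.step(0, 0, 0))  # state is held
    print(circuit.step(0, 0, 1))  # cancel order cancelation resets it
```

The same output value serves both destinations, which mirrors the text: the instruction execution circuit 201 reads it as a cancel order and the thread wait circuit 304 reads it as a cancelled state notification.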
[0085] FIG. 5 is a diagram illustrating a configuration example of
the retry control circuit 303 according to the present
embodiment.
[0086] The retry control circuit 303 of FIG. 5 includes registers
(e.g., RS-FF) 501 and 502 capable of holding/transiting the state,
and also logical product circuit b, non-conjunction circuit c, and
negation circuit d, which perform logical operation.
[0087] The register 501 has inputs of an instruction completion
notification from the instruction execution circuit 201 on a
setting side, and a resource update notification from the
instruction execution circuit 201 on a resetting side. Moreover,
the register 502 has inputs of an output from the logical product
circuit b on a setting side, and an instruction completion
notification from the instruction execution circuit 201 on a
resetting side. The logical product circuit b has inputs of an
output from the register 501, an output from the non-conjunction
circuit c, and an error notification from the instruction execution
circuit 201. Furthermore, the non-conjunction circuit c has inputs
of an output from the negation circuit d that has an input of an
instruction completion notification from the instruction execution
circuit 201, and a resource update notification from the
instruction execution circuit 201.
[0088] The register 501 is set to "1" when the instruction
completion notification becomes "1", and is set to "0" when the
resource update notification becomes "1". Accordingly, the register
501 is set when an instruction completion notification is signaled,
and is reset when a resource update notification is signaled at any
time other than the timing of the instruction completion.
Hereinafter, this register 501 is referred to as RETRY_POINT.
[0089] The logical product circuit b outputs "1" only when the
RETRY_POINT is "1", the output of the non-conjunction circuit c is
"1", and the error notification is "1".
[0090] The non-conjunction circuit c outputs "1" except when the
instruction completion notification is "0" and the resource update
notification is "1". Accordingly, the non-conjunction circuit c
outputs "1" unless the resource update notification is input
without an accompanying instruction completion notification.
[0091] The register 502 is set to "1" when the output of the
logical product circuit b becomes "1", and is set to "0" when the
instruction completion notification becomes "1". Hereinafter, this
register 502 is referred to as RETRY_TGR, and it is expressed as
"retry is determined" when the RETRY_TGR is set to "1".
[0092] The output of the RETRY_TGR is input to the instruction
execution circuit 201 and the thread wait circuit 304. The output
to the instruction execution circuit 201 is a retry order, and the
output to the thread wait circuit 304 is a retry determination
notification.
[0093] As described above, the retry control circuit 303 determines
a retry if an error notification is signaled between the receipt of
the instruction completion notification and the receipt of the
resource update notification, and resets the retry determination if
an instruction completion notification is signaled.
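For illustration only, the gates and registers of FIG. 5 described above may be modeled as follows. The class and method names are hypothetical. Evaluating the logical product circuit b against the register values held before the current update, and letting the instruction completion notification take priority when register 502 receives set and reset inputs together, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class RetryControl:
    retry_point: int = 0  # register 501 (RETRY_POINT)
    retry_tgr: int = 0    # register 502 (RETRY_TGR)

    def step(self, completion: int, resource_update: int, error: int) -> int:
        d = 1 - completion                # negation circuit d
        c = 1 - (d & resource_update)     # non-conjunction circuit c
        b = self.retry_point & c & error  # logical product circuit b
        # Register 501: set on completion, otherwise reset on resource update.
        if completion:
            self.retry_point = 1
        elif resource_update:
            self.retry_point = 0
        # Register 502: reset on completion (assumed priority), set by b.
        if completion:
            self.retry_tgr = 0
        elif b:
            self.retry_tgr = 1
        return self.retry_tgr
```

With this model, an error that arrives after an instruction completion and before any resource update sets RETRY_TGR, whereas an error that arrives after a resource update does not.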
[0094] FIG. 6 is a diagram illustrating a configuration example of
the thread wait circuit 304 according to the present
embodiment.
[0095] The thread wait circuit 304 of FIG. 6 includes a cancel
order unit 601 for signalling a cancel order to the execution state
control circuit 302 that manages the threads, and a cancel order
cancelation unit 602 for signalling a cancel order cancelation to
the execution state control circuit 302 that manages the
threads.
[0096] The cancel order unit 601 includes a cancel order unit 603-1
for signalling a cancel order to thread 1, a cancel order unit
603-2 for signalling a cancel order to thread 2, . . . , and a
cancel order unit 603-n for signalling a cancel order to thread
n.
[0097] For example, the cancel order unit 603-1 includes logical
sum circuit e, which has an input of the RETRY_TGR from all the
threads (threads 2 to n) other than thread 1, and logical product
circuit f, which has inputs of the instruction completion of thread
1 from the instruction execution circuit 201 and the output from
logical sum circuit e.
[0098] The logical sum circuit e outputs "1" when the RETRY_TGR of
at least one thread other than thread 1 becomes "1". The
logical product circuit f outputs "1" when the instruction
completion notification of thread 1 from the instruction execution
circuit 201 is "1" and the output of the logical sum circuit e is
"1".
[0099] Accordingly, the cancel order unit 603-1 outputs a cancel
order for thread 1 when a retry is determined for a thread other
than thread 1 and the instruction for thread 1 is completed.
[0100] Similarly, the cancel order unit 603-m (m is a natural
number) includes logical sum circuit e, which has an input of the
RETRY_TGR from all the threads (threads 1 to m-1 and threads m+1 to
n) other than thread m, and logical product circuit f, which has
inputs of
the instruction completion of thread m from the instruction
execution circuit 201 and the output from logical sum circuit
e.
[0101] The logical sum circuit e outputs "1" when the RETRY_TGR of
at least one thread other than thread m becomes "1", and
the logical product circuit f outputs "1" when the instruction
completion notification of thread m from the instruction execution
circuit 201 is "1" and the output of the logical sum circuit e is
"1".
[0102] Accordingly, the cancel order unit 603-m outputs a cancel
order for thread m when a retry is determined for a thread other
than thread m and the instruction for thread m is completed.
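For illustration only, the cancel order unit 603-m described above reduces to one logical sum and one logical product. In the following sketch the function name is hypothetical and the thread index m is 0-based, unlike the 1-based numbering used in the description.

```python
def cancel_order(m: int, retry_tgr: list[int], completion: list[int]) -> int:
    """Cancel order for thread m (0-based index in this sketch)."""
    # RETRY_TGR of every thread except thread m.
    others = retry_tgr[:m] + retry_tgr[m + 1:]
    e = int(any(others))      # logical sum circuit e
    return e & completion[m]  # logical product circuit f
```

For example, with RETRY_TGR set only for the second thread, `cancel_order(0, [0, 1, 0], [1, 0, 0])` evaluates to 1, because the first thread has just completed an instruction while another thread has a retry determined.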
[0103] The cancel order cancelation unit 602 includes a cancel
order cancelation unit 604-1 for signalling a cancel order
cancelation for thread 1, a cancel order cancelation unit 604-2 for
signalling a cancel order cancelation for thread 2, . . . , and a
cancel order cancelation unit 604-n for signalling a cancel order
cancelation for thread n, to the execution state control circuit
302, and a wait unit 605 for waiting until all the threads are
cancelled.
[0104] The cancel order cancelation unit 604-1 includes a negation
circuit c1 whose input is the RETRY_TGR of thread 1, a logical sum
circuit d whose input is the RETRY_TGR of threads 2-n, a logical
product circuit e whose inputs are the outputs of the negation
circuit c1 and the logical sum circuit d, a logical sum circuit f
whose inputs are the outputs of the logical product circuit e and a
wait unit 605, and a negation circuit g whose input is the output
of the logical sum circuit f.
[0105] The negation circuit c1 outputs "1" when the RETRY_TGR of
thread 1 is "0". The logical sum circuit d outputs "1" when the
RETRY_TGR of at least one of the threads other than thread 1 is
"1". The logical product circuit e outputs "1" to the logical sum
circuit f only when the output of the negation circuit c1 is "1"
and the output of the logical sum circuit d is "1".
[0106] Accordingly, the cancel order cancelation unit 604-1 signals
a cancel order cancelation to thread 1 when instruction retry of
the threads other than thread 1 is determined and also the wait
process by the wait unit 605 is completed.
[0107] Similarly, a cancel order cancelation unit 604-m includes
negation circuits c1, c2, . . . , cm whose input is the RETRY_TGR
of thread 1, 2, . . . , m, the logical sum circuit d whose input is
the RETRY_TGR of thread m+1, m+2, . . . , n, the logical product
circuit e whose inputs are the outputs of the negation circuit c1,
c2, . . . , cm, and the output of the logical sum circuit d, the
logical sum circuit f whose inputs are the outputs of the logical
product circuit e and the output of the wait unit 605, and the
negation circuit g whose input is the output of the logical sum
circuit f.
[0108] Each of the negation circuits c1, c2, . . . , cm outputs "1"
when the RETRY_TGR of the corresponding thread 1, 2, . . . , m is
"0". The logical sum circuit d outputs "1" when the RETRY_TGR of at
least one of the threads other than threads 1, 2, . . . , m is "1".
The logical product circuit e outputs "1" to the logical sum
circuit f only when the outputs of the negation circuits c1, c2, .
. . , cm are all "1" and also the RETRY_TGR of at least one of
threads m+1, m+2, . . . , n is "1".
[0109] Accordingly, the cancel order cancelation unit 604-m signals
a cancel order cancelation to thread m when the instruction retry
of the threads other than threads 1, 2, . . . , m is determined and
also the wait process by the wait unit 605 is completed.
[0110] The wait unit 605 includes a logical product circuit h whose
input is the cancelled state notification of threads 1, 2, . . . ,
n from the respective execution state control circuits 302, a
negation circuit i whose input is the output of the logical product
circuit h, a logical sum circuit j whose input is the RETRY_TGR of
threads 1, 2, . . . , n, and a logical product circuit k whose
inputs are the outputs of the negation circuit i and the logical
sum circuit j.
[0111] The wait unit 605 outputs "0" to the logical sum circuit f
only when the RETRY_TGR of at least one thread out of threads 1, 2,
. . . , n becomes "1" and also the cancelled state notification of
all the threads becomes "1". In other cases, the wait unit 605
outputs "1" to the logical sum circuit f.
[0112] Accordingly, the wait unit 605 waits until all the threads
are cancelled when a retry is determined for at least one thread
out of threads 1, 2, . . . , n. Then, when all the threads are
cancelled, the wait unit 605 allows the cancel order cancelation
for the threads to be signaled.
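For illustration only, the four gates listed for the wait unit 605 may be transcribed as follows. The function name is hypothetical, and the sketch is exercised here only for the case in which a retry is determined for at least one thread, which is the case discussed above.

```python
def wait_unit(cancelled_state: list[int], retry_tgr: list[int]) -> int:
    """Output of logical product circuit k, fed to logical sum circuit f."""
    h = int(all(cancelled_state))  # logical product circuit h
    i = 1 - h                      # negation circuit i
    j = int(any(retry_tgr))        # logical sum circuit j
    return i & j                   # logical product circuit k
```

With a retry determined for the second of three threads, `wait_unit([1, 0, 1], [0, 1, 0])` evaluates to 1 (keep waiting) and `wait_unit([1, 1, 1], [0, 1, 0])` evaluates to 0 (all threads cancelled, so the cancel order cancelation may proceed).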
[0113] FIG. 7 is a diagram illustrating a configuration example of
the instruction execution circuit 201 according to the present
embodiment.
[0114] The instruction execution circuit 201 of FIG. 7 includes an
instruction fetch control circuit 701 for controlling fetch
processing of a sequence of instructions for the threads, an
instruction buffer 702 for temporarily storing the fetched sequence
of instructions, an instruction decoder 703 for decoding the
sequence of instructions, a branch instruction control circuit 704
for performing, for example, computation of the branch address of a
branch instruction, an arithmetic unit 705 for performing an
arithmetic
operation in accordance with the instruction, and an instruction
commitment control circuit 706 for ordering updating of the
programmable resource by completing the instructions in the order
of the sequence of instructions.
[0115] The instruction fetch control circuit 701 orders a cache
circuit 707 to fetch a sequence of instructions for each thread,
and stores the fetched sequence of instructions in the instruction
buffer 702. Moreover, when a retry order is signaled from the retry
control circuit 303 and also the cancelled state is reset by a
cancel order cancelation from the execution state control circuit
302, the instruction fetch control circuit 701 orders the cache
circuit 707 to fetch the sequence of instructions for the
thread.
[0116] The instruction decoder 703 fetches and decodes a sequence
of instructions for each thread from the instruction buffer 702,
and issues an instruction to the branch instruction control circuit
704, the arithmetic unit 705, the instruction commitment control
circuit 706, or the like, in accordance with a result of decoding.
Moreover, when a retry order is received from the retry control
circuit 303 and also the cancelled state is reset by a cancel
order cancelation from the execution state control circuit 302, the
instruction decoder 703 fetches an instruction from the instruction
buffer 702 and reruns only that one instruction.
[0117] The branch instruction control circuit 704 performs branch
instruction processing such as computing of a branch address or
determination
of a branch direction of the branch instruction for a plurality of
threads, and subsequently notifies the instruction commitment
control circuit 706 of the completion of branch instruction
processing.
[0118] The arithmetic unit 705 performs arithmetic operation in
accordance with the instruction for the threads, and notifies the
instruction commitment control circuit 706 of the completion of the
arithmetic operation.
[0119] When all the processing necessary for an instruction is
completed after receiving a completion notification of the
processing from the branch instruction control circuit 704 or the
arithmetic unit 705, the instruction commitment control circuit 706
completes the instruction in the order of the sequence of
instructions. Then,
the instruction commitment control circuit 706 notifies the retry
control circuit 303 and the thread wait circuit 304 of the
instruction completion. Furthermore, the instruction commitment
control circuit 706 signals an instruction of updating the
programmable resource as necessary, and notifies the retry control
circuit 303 of the resource update.
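For illustration only, completing instructions in the order of the sequence of instructions may be sketched with a queue held in program order. The embodiment does not specify the mechanism used inside the instruction commitment control circuit 706, so the queue, the class name, and the method name below are all assumptions of this sketch.

```python
from collections import deque


class CommitControl:
    """In-order completion (a sketch; the queue is an assumed mechanism)."""

    def __init__(self, program_order):
        self.pending = deque(program_order)  # instructions in program order
        self.finished = set()                # processing done, not yet completed

    def notify_finished(self, instr):
        """Handle a processing-completion notice from circuit 704 or unit 705."""
        self.finished.add(instr)
        committed = []
        # Complete instructions only from the head of the program order.
        while self.pending and self.pending[0] in self.finished:
            committed.append(self.pending.popleft())
        return committed
```

An instruction whose processing finishes early is therefore held back until every older instruction of the same thread has been completed.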
[0120] The circuits in the above-described instruction execution
circuit 201 are provided with an error detection circuit for
detecting a hardware error (not illustrated in FIG. 7). When
detecting a hardware error while executing an instruction, the
error detection circuit signals an error notification to the
execution state control circuit 302 and the retry control circuit
303 that manage the error thread.
[0121] When receiving a cancel order from the execution state
control circuit 302, the respective circuits of the instruction
fetch control circuit 701, the instruction buffer 702, the
instruction decoder 703, the branch instruction control circuit
704, the arithmetic unit 705, and the instruction commitment
control circuit 706 stop the processing of the thread that is
managed by that execution state control circuit 302, and thereby
clear the state (i.e., the thread is cancelled).
[0122] Furthermore, when receiving a retry order from the retry
control circuit 303 and receiving a cancel order cancelation from
the execution state control circuit 302 as described above, the
instruction execution circuit 201 performs the retry processing of
the thread that is managed by the execution state control circuit
302 and the retry control circuit 303. A detailed description of
the processing in which only one instruction is executed is
omitted, since the functions generally provided for arithmetic
devices may be used.
[0123] FIG. 8 is a flowchart illustrating the operation of the
arithmetic device 200 according to the present embodiment.
[0124] In step S801, when detecting a hardware error while
instruction execution of the threads is being performed, the
instruction execution circuit 201 signals an error notification to
the execution state control circuit 302 and the retry control
circuit 303 that manage an error thread.
[0125] In step S802, the execution state control circuit 302 that
has received the error notification signals to the instruction
execution circuit 201 a cancel order for the instruction execution
of the thread that it manages.
[0126] In step S803, the retry control circuit 303 that has
received the error notification determines whether or not it is
possible to retry the thread that it manages. When it is determined
that the retry is not possible, the retry control circuit 303
shifts the processing to step S804.
[0127] In step S804, the arithmetic device 200 sets an error trap,
and executes processing such as error processing (e.g., an
interrupt handler) by using the software that is being executed.
[0128] In step S803, when it is determined to be possible to
perform the retry, the retry control circuit 303 shifts the
processing to step S805.
[0129] In step S805, the retry control circuit 303 sets "1" to the
RETRY_TGR so as to set the retry to be in the determined state.
[0130] In step S806, the thread wait circuit 304 determines whether
or not all the threads are cancelled. When there are some threads
that are not cancelled, the thread wait circuit 304 shifts the
processing to step S807.
[0131] In step S807, the thread wait circuit 304 determines whether
or not the instruction execution of the normal threads that are not
cancelled is completed. Until the instruction execution is
completed, the processing of step S807 is repeated.
[0132] In step S807, when receiving an instruction completion
notification from the normal threads that are not cancelled, the
thread wait circuit 304 shifts the processing to step S808. In step
S808, the thread wait circuit 304 orders the execution state
control circuit 302 to cancel the instruction execution of the
thread. Then, the thread wait circuit 304 returns the processing to
step S806.
[0133] In steps S806-S808 described above, the thread wait circuit
304 waits until all the threads are cancelled.
[0134] In step S806, when the instruction execution of all the
threads is cancelled, the thread wait circuit 304 shifts the
processing to step S809.
[0135] In step S809, the thread wait circuit 304 selects one thread
out of the threads in which "1" is set to the RETRY_TGR in the
processing of step S805. Hereinafter, the selected thread is
referred to as a "selection thread".
[0136] In step S810, the thread wait circuit 304 signals a cancel
order cancelation to the execution state control circuit 302 that
manages a selection thread.
[0137] In step S811, when the cancelation of the selection thread
is reset, the instruction execution circuit 201 reruns the
selection thread for only one instruction. Then, the instruction
execution circuit 201 shifts the processing to step S812.
[0138] In step S812, the retry control circuit 303 determines
whether or not the rerunning is completed. For example, the retry
control circuit 303 detects an instruction completion notification
of the selection thread transmitted from the instruction execution
circuit 201.
[0139] When the instruction completion notification is not
detected, the retry control circuit 303 repeats the processing of
step S812.
[0140] In step S812, when the instruction completion notification
is detected, the retry control circuit 303 determines that the
rerunning is completed and thereby shifts the processing to step
S813.
[0141] In step S813, the retry control circuit 303 resets the
RETRY_TGR set in step S805 to "0" to cancel the determined state of
the retry.
[0142] In step S814, the thread wait circuit 304 determines whether
or not the rerun processing of S809-S813 is completed for all the
threads in which the instruction retry is set to be in the
determined state in step S805.
[0143] When there remain threads that have not yet been rerun, the
thread wait circuit 304 shifts the processing to step S815 and
orders the execution state control circuit 302 to cancel the
instruction execution of the selection thread. Then, the thread
wait circuit 304 again performs the processing of steps S809-S814.
[0144] In step S814, when it is determined that the rerun
processing of S809-S813 is completed for all the threads in which
the retry is determined in step S805, the thread wait circuit 304
shifts the processing to step S816.
[0145] In step S816, the thread wait circuit 304 orders all the
execution state control circuits 302 to reset the cancelation of
the threads. Then, the thread wait circuit 304 shifts the
processing to step S817.
[0146] In step S817, the instruction execution circuit 201 starts
the normal processing for all the threads.
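For illustration only, the flow of steps S801-S817 described above may be traced in software as follows. The function and parameter names are hypothetical, thread indices are 0-based, and selecting the lowest-numbered thread in step S809 is an assumption, since the selection order is not specified above. The wait in step S807 is collapsed into a single log entry.

```python
def recover(n_threads, error_threads, can_retry=lambda t: True):
    """Trace steps S801-S817 for 0-based thread indices (illustrative only)."""
    log = []
    cancelled = [False] * n_threads
    retry_tgr = [False] * n_threads
    for t in error_threads:
        log.append(("S801", t))        # error notification
        cancelled[t] = True
        log.append(("S802", t))        # cancel the error thread
        if not can_retry(t):           # S803: retry not possible
            log.append(("S804", t))    # error trap; software takes over
            return log
        retry_tgr[t] = True
        log.append(("S805", t))        # retry determined
    for t in range(n_threads):         # S806-S808
        if not cancelled[t]:
            log.append(("S807", t))    # wait for instruction completion
            cancelled[t] = True
            log.append(("S808", t))    # cancel the normal thread
    while any(retry_tgr):
        sel = retry_tgr.index(True)    # S809: lowest index is an assumption
        cancelled[sel] = False
        log.append(("S810", sel))      # cancel order cancelation
        log.append(("S811", sel))      # rerun exactly one instruction
        retry_tgr[sel] = False
        log.append(("S813", sel))      # rerun completed, RETRY_TGR reset
        if any(retry_tgr):             # S814: other threads still to rerun
            cancelled[sel] = True
            log.append(("S815", sel))  # cancel again before the next rerun
    for t in range(n_threads):
        cancelled[t] = False           # S816: reset all cancelations
    log.append(("S817", None))         # normal processing resumes
    return log
```

Calling `recover(3, [1])` produces a trace in which both normal threads are cancelled before the single-instruction rerun of the error thread, which is the ordering the flowchart is intended to guarantee.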
[0147] As described above, when a hardware error occurs
while the instruction execution is being performed in the
instruction execution circuit 201, the arithmetic device 200
cancels the execution of all the threads. Then, the arithmetic
device 200 waits until all the threads are cancelled. Then, when
all the threads are cancelled, the arithmetic device 200 reruns the
error thread for only one instruction.
[0148] As the arithmetic device 200 reruns one error thread with
all the threads cancelled, the arithmetic device 200 may, for
example, avoid a hardware error due to the noise from the other
wiring.
[0149] Moreover, the arithmetic device 200 performs the rerunning
after waiting until all the threads are cancelled, and thus may
rerun the error thread without fail.
[0150] As a result, a high retry success rate may be achieved.
[0151] Moreover, as the retry success rate becomes high, the
reliability of the arithmetic device 200 may be improved.
[0152] Furthermore, as the size of circuits that operate during the
retry processing may be reduced, the arithmetic device 200 may
successfully perform the retry in a manner similar to a processor
that processes a sequence of instructions for a single thread, even
though the arithmetic device 200 concurrently processes a sequence
of instructions for a plurality of threads.
* * * * *