U.S. patent application number 13/086629 was published by the patent office on 2012-04-26 for a reconfigurable processor and method for processing a nested loop.
This patent application is currently assigned to Samsung Electronics Co., Ltd. Invention is credited to Hee-Jin Ahn, Min-Wook Ahn, Bernhard Egger, Tai-Song Jin, Won-Sub Kim, Jin-Seok Lee, Dong-Hoon Yoo.
Publication Number | 20120102496 |
Application Number | 13/086629 |
Family ID | 45974099 |
Publication Date | 2012-04-26 |
United States Patent Application 20120102496
Kind Code: A1
Ahn; Min-Wook; et al.
April 26, 2012
RECONFIGURABLE PROCESSOR AND METHOD FOR PROCESSING A NESTED LOOP
Abstract
A reconfigurable processor which merges an inner loop and an
outer loop which are included in a nested loop and allocates the
merged loop to processing elements in parallel, thereby reducing
the time needed to process the nested loop. The reconfigurable
processor may extract loop execution frequency information from the
inner loop and the outer loop of the nested loop, and may merge the
inner loop and the outer loop based on the extracted loop execution
frequency information.
Inventors: Ahn; Min-Wook (Seoul, KR); Yoo; Dong-Hoon (Seoul, KR); Lee; Jin-Seok (Seoul, KR); Egger; Bernhard (Seoul, KR); Jin; Tai-Song (Seoul, KR); Kim; Won-Sub (Anyang-si, KR); Ahn; Hee-Jin (Seoul, KR)
Assignee: Samsung Electronics Co., Ltd., Suwon-si, KR
Family ID: 45974099
Appl. No.: 13/086629
Filed: April 14, 2011
Current U.S. Class: 718/102; 712/241; 712/E9.045
Current CPC Class: G06F 9/5066 20130101
Class at Publication: 718/102; 712/241; 712/E09.045
International Class: G06F 9/46 20060101 G06F009/46; G06F 9/38 20060101 G06F009/38
Foreign Application Data
Date | Code | Application Number
Oct 21, 2010 | KR | 10-2010-0103095
Claims
1. A reconfigurable processor for processing a nested loop, the
reconfigurable processor comprising: an extracting unit configured
to extract loop execution frequency information from each of an
inner loop and an outer loop which are included in the nested loop;
and a loop merging unit configured to merge the inner loop and the
outer loop based on the extracted loop execution frequency
information.
2. The reconfigurable processor of claim 1, further comprising: a
scheduler configured to allocate a command of the inner loop to at
least one of a plurality of processing elements and to allocate a
command of the outer loop to at least one of the remaining
processing elements which have not been allocated the command of
the inner loop.
3. The reconfigurable processor of claim 1, wherein the loop
merging unit is further configured to generate third loop execution
frequency information based on first loop execution frequency
information of the inner loop and second loop execution frequency
information of the outer loop, and to generate a loop control
command to control commands of the merged loop to be executed as
many times as a value of the third loop execution frequency
information.
4. The reconfigurable processor of claim 1, wherein the loop
merging unit is further configured to copy a command of the inner
loop and/or a command of the outer loop using a loop peeling
technique.
5. The reconfigurable processor of claim 1, further comprising: a
plurality of processing elements, each comprising a frequency
register file configured to store the first loop execution
frequency information, and a counter configured to increase a
counter value at each occasion of a predefined event, and to output
the counter value, wherein the loop merging unit is further
configured to generate a storage command to store the first loop
execution frequency information in the frequency register file.
6. The reconfigurable processor of claim 5, wherein the loop
merging unit is further configured to insert an execution operand
into the command of the outer loop, the execution operand
comprising location information indicating where the first loop
execution frequency information is stored in the frequency register
file.
7. The reconfigurable processor of claim 6, wherein each of the
processing elements is further configured to determine whether the
execution operand is present in the allocated command, and in
response to determining that the execution operand is present, to
execute the allocated command each time the first loop execution
frequency information stored in the frequency register file becomes
the same as the counter value.
8. The reconfigurable processor of claim 5, wherein the counter is
further configured to increase the counter value each time a
command included in the loop is executed or each time the loop is
executed.
9. The reconfigurable processor of claim 3, wherein the loop
merging unit is further configured to extract the first loop
execution frequency information and the second loop execution
frequency information from loop control commands of the inner loop
and outer loop, respectively.
10. The reconfigurable processor of claim 3, wherein the value of
the third loop execution frequency information is the product of a
value of the first loop execution frequency information and a value
of the second loop execution frequency information.
11. The reconfigurable processor of claim 1, further comprising: a
scheduler configured to allocate a command of the inner loop and a
command of the outer loop to a plurality of processing elements,
wherein the plurality of processing elements are configured to
process the command of the inner loop and the command of the outer
loop in parallel.
12. A method of processing a nested loop, the method comprising:
extracting loop execution frequency information from each of an
inner loop and an outer loop which are included in the nested loop;
and merging the inner loop and the outer loop of the nested loop
based on the extracted loop execution frequency information.
13. The method of claim 12, further comprising: allocating a
command of the inner loop to at least one of a plurality of
processing elements; and allocating a command of the outer loop to
at least one of the remaining processing elements which have not
been allocated the command of the inner loop.
14. The method of claim 12, wherein the merging of the inner loop
and the outer loop comprises generating third loop execution
frequency information based on first loop execution frequency
information of the inner loop and second loop execution frequency
information of the outer loop; and generating a loop control
command to control commands of the merged loop to be executed as
many times as a value of the third loop execution frequency
information.
15. The method of claim 12, wherein the merging of the inner loop
and the outer loop comprises copying a command of the inner loop
and/or a command of the outer loop using a loop peeling
technique.
16. The method of claim 12, wherein the merging of the inner loop
and the outer loop comprises generating a storage command to store
the first loop execution frequency information in a frequency
register file included in each of the processing elements.
17. The method of claim 16, wherein the merging of the inner loop
and the outer loop comprises inserting an execution operand into
the command of the outer loop, the execution operand including
location information indicating where the first loop execution
frequency information is stored in the frequency register file.
18. The method of claim 17, further comprising: determining whether
the execution operand is present in the allocated command; and in
response to determining that the execution operand is present,
executing the allocated command each time the first loop execution
frequency information stored in the frequency register file becomes
the same as a counter value output from a counter.
19. The method of claim 18, wherein the counter increases a counter
value each time a command included in the loop is executed or each
time the loop is executed, and the counter outputs the counter
value.
20. The method of claim 14, wherein the extracting of the loop
execution frequency information comprises extracting the first loop
execution frequency information and the second loop execution
frequency information from loop control commands of the inner loop
and outer loop, respectively.
21. The method of claim 14, wherein the value of the third loop
execution frequency information is the product of a value of the
first loop execution frequency information and a value of the
second loop execution frequency information.
22. The method of claim 12, further comprising allocating a command
of the inner loop and a command of the outer loop to a plurality of
processing elements, wherein the plurality of processing elements
are configured to process the command of the inner loop and the
command of the outer loop in parallel.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims the benefit under 35 U.S.C.
§ 119(a) of Korean Patent Application No. 10-2010-0103095,
filed on Oct. 21, 2010, in the Korean Intellectual Property Office,
the entire disclosure of which is incorporated herein by reference
for all purposes.
BACKGROUND
[0002] 1. Field
[0003] The following description relates to a technique for
processing a nested loop, and additionally, to a technique for
processing a nested loop by allocating commands included in a
nested loop to a plurality of processing elements and processing
the allocated commands.
[0004] 2. Description of the Related Art
[0005] Reconfigurable architecture refers to an architecture in which
the hardware configuration of a computing apparatus may be changed to
process a given task more efficiently.
[0006] When a task is processed in hardware, even a slight change
in the task may not be handled efficiently because hardware
functions are fixed. In contrast, when a task is processed in
software, the software can be adapted to the task more easily, but
processing is slower than in hardware.
[0007] Reconfigurable architecture combines the advantages of
hardware and software processing. For example, reconfigurable
architecture has drawn significant attention in the field of digital
signal processing, in which the same process is executed
repeatedly.
[0008] In general, a digital signal processing procedure includes
multiple loop operations that repeat the same task. To increase
loop operation speed, loop-level parallelism (LLP) may be used. A
representative example of LLP is software pipelining.
[0009] Typically, an inner loop and an outer loop which are
included in a nested loop are processed serially in a reconfigurable
architecture. However, such serial processing may substantially
lengthen the processing time of the loop operation.
SUMMARY
[0010] In one general aspect, there is provided a reconfigurable
processor for processing a nested loop, the reconfigurable
processor including an extracting unit configured to extract loop
execution frequency information from each of an inner loop and an
outer loop which are included in the nested loop, and a loop
merging unit configured to merge the inner loop and the outer loop
based on the extracted loop execution frequency information.
[0011] The reconfigurable processor may further comprise a
scheduler configured to allocate a command of the inner loop to at
least one of a plurality of processing elements and to allocate a
command of the outer loop to at least one of the remaining
processing elements which have not been allocated the command of
the inner loop.
[0012] The loop merging unit may be further configured to generate
third loop execution frequency information based on first loop
execution frequency information of the inner loop and second loop
execution frequency information of the outer loop, and to generate
a loop control command to control commands of the merged loop to be
executed as many times as a value of the third loop execution
frequency information.
[0013] The loop merging unit may be further configured to copy a
command of the inner loop and/or a command of the outer loop using
a loop peeling technique.
[0014] The reconfigurable processor may further comprise a
plurality of processing elements, each comprising a frequency
register file configured to store the first loop execution
frequency information, and a counter configured to increase a
counter value at each occasion of a predefined event, and to output
the counter value, wherein the loop merging unit is further
configured to generate a storage command to store the first loop
execution frequency information in the frequency register file.
[0015] The loop merging unit may be further configured to insert an
execution operand into the command of the outer loop, the execution
operand comprising location information indicating where the first
loop execution frequency information is stored in the frequency
register file.
[0016] Each of the processing elements may be further configured to
determine whether the execution operand is present in the allocated
command, and in response to determining that the execution operand
is present, to execute the allocated command each time the first
loop execution frequency information stored in the frequency
register file becomes the same as the counter value.
[0017] The counter may be further configured to increase the
counter value each time a command included in the loop is executed
or each time the loop is executed.
[0018] The loop merging unit may be further configured to extract
the first loop execution frequency information and the second loop
execution frequency information from loop control commands of the
inner loop and outer loop, respectively.
[0019] The value of the third loop execution frequency information may
be the product of a value of the first loop execution frequency
information and a value of the second loop execution frequency
information.
[0020] The reconfigurable processor may further comprise a
scheduler configured to allocate a command of the inner loop and a
command of the outer loop to a plurality of processing elements,
wherein the plurality of processing elements are configured to
process the command of the inner loop and the command of the outer
loop in parallel.
[0021] In another aspect, there is provided a method of processing
a nested loop, the method including extracting loop execution
frequency information from each of an inner loop and an outer loop
which are included in the nested loop, and merging the inner loop
and the outer loop of the nested loop based on the extracted loop
execution frequency information.
[0022] The method may further comprise allocating a command of the
inner loop to at least one of a plurality of processing elements,
and allocating a command of the outer loop to at least one of the
remaining processing elements which have not been allocated the
command of the inner loop.
[0023] The merging of the inner loop and the outer loop may
comprise generating third loop execution frequency information
based on first loop execution frequency information of the inner
loop and second loop execution frequency information of the outer
loop, and generating a loop control command to control commands of
the merged loop to be executed as many times as a value of the
third loop execution frequency information.
[0024] The merging of the inner loop and the outer loop may
comprise copying a command of the inner loop and/or a command of
the outer loop using a loop peeling technique.
[0025] The merging of the inner loop and the outer loop may
comprise generating a storage command to store the first loop
execution frequency information in a frequency register file
included in each of the processing elements.
[0026] The merging of the inner loop and the outer loop may
comprise inserting an execution operand into the command of the
outer loop, the execution operand including location information
indicating where the first loop execution frequency information is
stored in the frequency register file.
[0027] The method may further comprise determining whether the
execution operand is present in the allocated command, and in
response to determining that the execution operand is present,
executing the allocated command each time the first loop execution
frequency information stored in the frequency register file becomes
the same as a counter value output from a counter.
[0028] The counter may increase a counter value each time a command
included in the loop is executed or each time the loop is executed,
and the counter may output the counter value.
[0029] The extracting of the loop execution frequency information
may comprise extracting the first loop execution frequency
information and the second loop execution frequency information
from loop control commands of the inner loop and outer loop,
respectively.
[0030] The value of the third loop execution frequency information
may be the product of a value of the first loop execution frequency
information and a value of the second loop execution frequency
information.
[0031] The method may further comprise allocating a command of the
inner loop and a command of the outer loop to a plurality of
processing elements, wherein the plurality of processing elements
are configured to process the command of the inner loop and the
command of the outer loop in parallel.
[0032] Other features and aspects may be apparent from the
following detailed description, the drawings, and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] FIG. 1 is a diagram illustrating an example of a
reconfigurable processor.
[0034] FIGS. 2A to 2D are diagrams illustrating examples of
procedures of merging a nested loop.
[0035] FIG. 3 is a diagram illustrating an example of the statuses
of processing elements (PEs) to which commands of an inner loop and
an outer loop are allocated.
[0036] FIG. 4 is a flowchart of an example of a method of merging a
nested loop.
[0037] FIG. 5 is a flowchart of an example of a method of
processing a nested loop.
[0038] Throughout the drawings and the detailed description, unless
otherwise described, the same drawing reference numerals should be
understood to refer to the same elements, features, and structures.
The relative size and depiction of these elements may be
exaggerated for clarity, illustration, and convenience.
DETAILED DESCRIPTION
[0039] The following description is provided to assist the reader
in gaining a comprehensive understanding of the methods,
apparatuses, and/or systems described herein. Accordingly, various
changes, modifications, and equivalents of the methods,
apparatuses, and/or systems described herein may be suggested to
those of ordinary skill in the art. Also, descriptions of
well-known functions and constructions may be omitted for increased
clarity and conciseness.
[0040] FIG. 1 illustrates an example of a reconfigurable
processor.
[0041] Referring to the example illustrated in FIG. 1,
reconfigurable processor 100 includes an extracting unit 105, a
loop merging unit 110, a scheduler 120, a reconfigurable array 130,
and a memory 140. The reconfigurable processor 100 may
be included in various devices, for example, a mobile terminal, a
computer, a personal digital assistant (PDA), an MP3 player, and
the like.
[0042] As an example, a command may be an operation or instruction
to be executed by a processing element. For example, the command
may correspond to a plurality of operations or instructions
included in a loop body. A nested loop is a loop that includes
an inner loop and an outer loop.
[0043] The extracting unit 105 may extract information about the
number of times the inner loop and the outer loop are executed. For
example, the extracting unit 105 may extract information about the
number of times the inner loop is executed (hereinafter referred
to as "first loop execution frequency information") and
information about the number of times the outer loop is executed
(hereinafter referred to as "second loop execution frequency
information") from a loop control command. The loop control command
may be used to control the number of times a loop is executed, and
may include loop execution frequency information.
[0044] The loop merging unit 110 may merge the inner loop and the
outer loop based on the extracted loop execution frequency
information.
[0045] For example, the loop merging unit 110 may generate third
loop execution frequency information based on the first loop
execution frequency information about the inner loop and the second
loop execution frequency information about the outer loop. The
loop merging unit 110 may then generate a loop control
command to execute commands of the merged loop as many times as the
value indicated by the third loop execution frequency information.
The third loop execution frequency information may be the product
of the first loop execution frequency information and the second
loop execution frequency information. As an example, if the first
loop execution frequency is 10 and the second loop execution
frequency is 8, the third loop execution frequency is 10 × 8 = 80.
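The merge described above resembles the compiler transformation usually called loop coalescing (or loop flattening). The following Python sketch is illustrative only and is not taken from the application: it assumes the loop bodies need only the original outer index i and inner index j, and the function names `nested_trace` and `merged_trace` are hypothetical. It checks that one loop running (first × second) times visits the same (i, j) pairs as the original nested loop.

```python
def nested_trace(n_outer: int, n_inner: int) -> list[tuple[int, int]]:
    """(outer, inner) index pairs visited by a conventional nested loop."""
    trace = []
    for i in range(n_outer):
        for j in range(n_inner):
            trace.append((i, j))
    return trace


def merged_trace(n_outer: int, n_inner: int) -> list[tuple[int, int]]:
    """The same pairs produced by a single merged loop."""
    # Third loop execution frequency = first (inner) * second (outer).
    third = n_inner * n_outer
    trace = []
    for k in range(third):
        # Recover the original indices from the single merged counter k.
        i, j = divmod(k, n_inner)
        trace.append((i, j))
    return trace


# Example from the text: inner loop executed 10 times, outer loop 8 times.
print(merged_trace(8, 10) == nested_trace(8, 10))  # True
print(len(merged_trace(8, 10)))                    # 80
```

Recovering indices with `divmod` is one common software approach; the processor described here instead keeps the outer-loop commands in step by means of a counter and a frequency register file.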
[0046] The loop merging unit 110 may generate a storage command to
store the first loop execution frequency information in a frequency
register file 150 of the processing element 132. For example, the
loop merging unit 110 may change a location of the generated loop
control command such that the generated loop control command
becomes the last command executed from among the commands included
in the merged loop.
[0047] As another example, the generated loop control command is
not necessarily the last command executed from among the commands
included in the merged loop. For example, the loop merging unit 110
may change the location of the command through loop peeling. The
loop merging unit 110 may create an execution operand that includes
location information that indicates where the first loop execution
frequency information is stored, and may insert the generated
execution operand into a command of the outer loop. As an example,
the location information may be an address, which is further
described with reference to FIGS. 2A to 2D.
[0048] The scheduler 120 may allocate the plurality of commands
included in the nested loop to multiple processing elements
included in the reconfigurable array 130. For example, the
scheduler 120 may allocate commands of the inner loop of the merged
loop to at least one of the processing elements, and
commands of the outer loop of the merged loop to at least one of the
remaining processing elements that were not allocated commands of
the inner loop.
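The allocation rule above (inner-loop commands to some PEs, outer-loop commands only to PEs left over) can be sketched as follows. This is a minimal illustration under a hypothetical one-command-per-PE policy that assigns PEs in order; the function name and command labels are invented for the example and are not the scheduler 120's actual algorithm.

```python
def allocate(inner_cmds, outer_cmds, pes):
    """Give inner-loop commands to some PEs and outer-loop commands to the rest.

    Hypothetical policy: one command per PE, taken in PE order.
    """
    if len(inner_cmds) + len(outer_cmds) > len(pes):
        raise ValueError("not enough processing elements for one command per PE")
    free = list(pes)
    allocation = {}
    for cmd in inner_cmds:
        allocation[free.pop(0)] = ("inner", cmd)
    for cmd in outer_cmds:
        # Only PEs that received no inner-loop command remain in `free`.
        allocation[free.pop(0)] = ("outer", cmd)
    return allocation


pes = [f"PE{n}" for n in range(1, 13)]  # PE1 ... PE12, as in FIG. 1
alloc = allocate(["inner_a", "inner_b"], ["outer_a"], pes)
print(alloc)
# {'PE1': ('inner', 'inner_a'), 'PE2': ('inner', 'inner_b'), 'PE3': ('outer', 'outer_a')}
```

A practical scheduler would also account for the connection statuses between PEs in the reconfigurable array, which this sketch ignores.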
[0049] In this example, the reconfigurable array 130 includes a
register file 131 and a plurality of processing elements (PE) 132.
The reconfigurable array 130 may change its hardware configuration to
perform an operation more efficiently. For example, the
reconfigurable array 130 may change the connection statuses between
PEs based on the type of operation to be processed.
[0050] The register file 131 may store various types of data used for
executing commands or for transferring data between the PEs 132. For
example, each PE 132 may access the register file 131 to read or
write data to be used for executing a command. As an example, not
all PEs 132 may be connected to one another. In this example, some
PEs may access the register file 131 via other PEs.
[0051] The PEs 132 may execute allocated commands. The connection
and the operation order of each PE 132 may be changed based on a
task to be processed.
[0052] Each of the processing elements PE1, PE2, PE3, PE4, PE5,
PE6, PE7, PE8, PE9, PE10, PE11, and PE12 may include a frequency
register file and a control unit. In this example, the PE1 includes
a frequency register file 150 and a control unit 151. The control
unit 151 may include a counter 152.
[0053] The frequency register file 150 may store the first loop
execution frequency information of the inner loop. The control unit
151 may control the operation of the PE1.
[0054] The counter 152 may output a counter value by increasing a
value at each occasion of a predefined event. For example, the
counter 152 may increase a value and output the value each time a
command included in a loop is executed. As another example, the
counter 152 may increase a value and output the value each time a
loop is executed. The former example (hereinafter referred to as the
"first example"), in which the counter value is increased each time
a command included in a loop is executed, may be more precise than
the latter example (hereinafter referred to as the "second example"),
in which the counter value is increased each time the loop is
executed. In the first example, the location of the command currently
being executed, or of a command that causes execution to stop,
may be easily identified from the counter value. In contrast,
in the second example, counting takes place on a loop-by-loop
basis, so it may not be possible to accurately identify the command
being executed.
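The difference in precision between the two counting modes can be shown with a short sketch. It assumes the counter starts at 0, that in the first example it counts commands already executed, and that every iteration executes the same `body_len` commands in a fixed order; the function names are hypothetical.

```python
def locate_per_command(counter: int, body_len: int) -> tuple[int, int]:
    """First example: counter = number of commands executed so far.

    Returns (iteration index, position of the next command in the loop body).
    """
    return divmod(counter, body_len)


def locate_per_loop(counter: int) -> tuple[int, None]:
    """Second example: counter = number of iterations completed.

    Only the iteration is known; the position inside the body is not.
    """
    return counter, None


print(locate_per_command(37, 5))  # (7, 2): 8th iteration, 3rd command of the body
print(locate_per_loop(7))         # (7, None)
```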
[0055] The control unit 151 may determine whether the execution
operand is present in the command allocated by the scheduler 120.
For example, in response to the execution operand being present,
the control unit 151 may execute the command allocated to the PE1
each time the first loop execution frequency stored in the
frequency register file 150 and the counter value of the counter
152 become the same as each other. As another example, in response
to the execution operand not being present, the control unit 151
may execute the command allocated to the PE1 regardless of the
first loop execution frequency and the counter value. The presence
of the execution operand may be used to indicate that the command
of the outer loop has been allocated to the PE, and the
non-presence of the execution operand may be used to indicate that
the command of the inner loop has been allocated to the PE.
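The per-command decision made by the control unit 151 can be sketched as below. The `Command` type, its field names, and the dictionary standing in for the frequency register file are hypothetical modeling choices, not structures defined in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    name: str
    # Execution operand: slot in the frequency register file (e.g. "f0").
    # Present only on outer-loop commands.
    exec_operand: Optional[str] = None


def should_execute(cmd: Command, freq_regs: dict, counter: int) -> bool:
    """Decision made by a PE's control unit for one allocated command."""
    if cmd.exec_operand is None:
        # No execution operand: inner-loop command, executed regardless of the counter.
        return True
    # Execution operand present: outer-loop command, executed only when the stored
    # first loop execution frequency equals the counter value.
    return freq_regs[cmd.exec_operand] == counter


freq_regs = {"f0": 16}
print(should_execute(Command("inner_cmd"), freq_regs, 3))         # True
print(should_execute(Command("outer_cmd", "f0"), freq_regs, 3))   # False
print(should_execute(Command("outer_cmd", "f0"), freq_regs, 16))  # True
```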
[0056] For example, if the first loop execution frequency is 16 and
the second loop execution frequency is 8, the third loop execution
frequency may be 128 (i.e., 16 × 8). In this example, each time the
outer loop is executed once, the inner loop is executed 16 times.
Accordingly, because the outer loop is executed 8 times, the total
number of times the inner loop is executed is 128. The control unit
151 may determine whether an execution operand is present in the
allocated command. For example, in response to the execution
operand not being present, the control unit 151 may execute the
command allocated to PE1 128 times, which is the third loop
execution frequency. In other words, PE1 may execute the
inner-loop command (no execution operand present) 128 times.
[0057] As another example, in response to the execution operand
being present, the control unit 151 may execute the command
allocated to PE1 each time the counter value of the counter
152 reaches 16, which is the first loop execution frequency. In
other words, PE1 may execute the outer-loop command (execution
operand present) 8 times while the merged loop is executed a total
of 128 times.
[0058] The number (128) of times for executing the inner loop and
the number (8) of times for executing the outer loop may be the
same before and after merging the inner and outer loops.
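The counts in [0056] to [0058] can be checked with a small simulation. The sketch assumes the counter advances once per merged-loop iteration (the second counting mode of [0054]) and is reset to zero after each match; the application does not say how the counter restarts, so the reset is an assumption, and `run_merged_loop` is a hypothetical name.

```python
def run_merged_loop(first_freq: int, second_freq: int) -> tuple[int, int]:
    """Count executions of one inner-loop and one outer-loop command."""
    freq_regs = {"f0": first_freq}  # written by the storage command `freq f0, ...`
    third_freq = first_freq * second_freq
    counter = 0
    inner_runs = outer_runs = 0
    for _ in range(third_freq):
        counter += 1        # counter advances once per merged-loop iteration
        inner_runs += 1     # inner command: no execution operand, always executes
        if counter == freq_regs["f0"]:
            outer_runs += 1  # outer command carries execution operand f0
            counter = 0      # assumption: counter restarts after each match
    return inner_runs, outer_runs


print(run_merged_loop(16, 8))  # (128, 8), matching the counts before merging
```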
[0059] In the example described above, the reconfigurable processor
may use the frequency register file and the control unit to control
the PE to execute a command of the outer loop the same number of
times as before the merging.
[0060] The memory 140 may store information about the connection
statuses between PEs 132, commands for processing, and information
on processing results. For example, the memory 140 may store data
to be processed or results of processing. As another example, the
memory 140 may store information used to drive reconfigurable
processor 100, and connection status information and an operation
method of the reconfigurable processor 100.
[0061] The reconfigurable processor 100 may merge the inner loop
and the outer loop which are included in the nested loop, and may
allocate the merged loop to the PEs in a parallel manner, thereby
reducing both the processing time of the nested loop and the code length.
[0062] FIGS. 2A to 2D illustrate examples of procedures of merging
a nested loop. For example, the nested loop may be merged by the
loop merging unit shown in the example illustrated in FIG. 1.
[0063] Referring to the examples illustrated in FIGS. 1 and 2A, an
assembly code may include a plurality of code blocks, for example,
five code blocks. In this example, `mov A, B` represents a command
to store the value of B in A, `add A←B, C` represents a command
to add the values of B and C and store the result in A, and
`bne A, B, C` represents a command to execute the code block
corresponding to C if A and B are not equal to each other, and
otherwise to execute the code block subsequent to the current code block.
#0x0, #0x10, and the like indicate hexadecimal numbers.
In this example, the `bne` command is a loop control command, and
`#0x10` indicates loop execution frequency information. For
example, `#0x10` corresponds to 16 in the decimal
system.
[0064] The extracting unit 105 may extract first loop execution
frequency information 202 from a loop control command 201 from
among commands included in an inner loop 200. In this example, the
first loop execution frequency information 202 is "#0x10,"
which is represented by 16 in the decimal system. The extracting
unit 105 may extract second loop execution frequency information
212 from a loop control command 211 from among commands included in
an outer loop 210. In this example, the second loop execution
frequency information is "#0x8," which is represented by 8 in
the decimal system.
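Extraction of the frequency from a loop control command can be sketched as a small parser. The textual form `bne <reg>, #<hex>, <label>` is modeled on the `bne A, B, C` description above; the register and label names in the usage lines are hypothetical, and real assembler syntax may differ.

```python
import re

# Hypothetical textual form: bne <register>, #<hex immediate>, <target code block>
_BNE = re.compile(r"^\s*bne\s+(\w+)\s*,\s*#(0x[0-9a-fA-F]+)\s*,\s*(\w+)\s*$")


def extract_frequency(loop_control_cmd: str) -> int:
    """Return the loop execution frequency encoded in a `bne` loop control command."""
    m = _BNE.match(loop_control_cmd)
    if m is None:
        raise ValueError(f"not a loop control command: {loop_control_cmd!r}")
    return int(m.group(2), 16)  # int() with base 16 accepts the 0x prefix


print(extract_frequency("bne r1, #0x10, cb1"))  # 16 (first loop execution frequency)
print(extract_frequency("bne r2, #0x8, cb0"))   # 8  (second loop execution frequency)
```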
[0065] Referring to the example illustrated in FIG. 2B, the loop
merging unit 110 may generate third loop execution frequency
information 232 based on the first loop execution frequency
information 202 of the inner loop and based on the second loop
execution frequency information 212 of the outer loop. For example,
the loop merging unit 110 may multiply the first loop execution
frequency information 202 and the second loop execution frequency
information 212 to obtain the third loop execution frequency
information 232. As an example, the loop merging unit 110 may
multiply 16 which is the first loop execution frequency and 8 which
is the second loop execution frequency, and use the multiplication
result of 128 as the third loop execution frequency information
232.
[0066] The loop merging unit 110 may generate a loop control
command 231 for executing commands of the merged loop as many times
as the third loop execution frequency information 232. The loop
merging unit 110 may delete the loop control command 201 from the
inner loop and the loop control command 211 from the outer
loop.
[0067] The loop merging unit 110 may generate a storage command 241
for storing the first loop execution frequency information in the
frequency register file 150. In this example, `freq A, B` indicates
a command to store B in A. For example, `freq f0, #0x10` may
be a command to store "#0x10" at a location corresponding to
f0 in the frequency register file 150. The loop merging unit 110
may generate a new code block 242, and insert the generated storage
command 241 in the newly generated code block 242.
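The two commands that the loop merging unit 110 generates in [0066] and [0067] can be sketched as text generation. The `freq` syntax follows the example above; the counter register `r11` comes from [0069], while the branch label `loop_head` and the function name are hypothetical.

```python
def build_merged_commands(first_freq: int, second_freq: int,
                          counter_reg: str, loop_label: str,
                          freq_slot: str = "f0") -> tuple[int, str, str]:
    """Return (third frequency, storage command, merged loop control command)."""
    third_freq = first_freq * second_freq
    # Storage command: place the inner-loop frequency in the frequency register file.
    storage_cmd = f"freq {freq_slot}, #{hex(first_freq)}"
    # New loop control command: repeat the merged loop body third_freq times.
    loop_ctrl_cmd = f"bne {counter_reg}, #{hex(third_freq)}, {loop_label}"
    return third_freq, storage_cmd, loop_ctrl_cmd


print(build_merged_commands(0x10, 0x8, "r11", "loop_head"))
# (128, 'freq f0, #0x10', 'bne r11, #0x80, loop_head')
```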
[0068] Referring to the example illustrated in FIG. 2C, the loop
merging unit 110 may utilize a loop peeling technique to copy
commands that are present in a code block cb1 (shown in FIG. 2B) to
the code blocks prologue cb and cb3 (shown in FIG. 2C). The loop
merging unit 110 may insert execution operands `f0` 250, which
include location information indicating where the first loop
execution frequency information is stored in the frequency register
file, into a command of the outer loop.
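Loop peeling in general moves one or more iterations out of a loop as copies of the loop body. The sketch below shows only that generic idea, peeling the first iteration so the remaining loop no longer needs a first-iteration test; it does not reproduce the specific code blocks of FIG. 2C, and the function names are hypothetical.

```python
def original(xs):
    out = []
    for k, x in enumerate(xs):
        if k == 0:
            out.append(("first", x))  # work done only on the first iteration
        out.append(("body", x))       # work done on every iteration
    return out


def peeled(xs):
    out = []
    if xs:
        # Peeled first iteration: a copy of the loop body placed before the loop.
        out.append(("first", xs[0]))
        out.append(("body", xs[0]))
    for x in xs[1:]:
        # Remaining iterations: the k == 0 test is no longer needed.
        out.append(("body", x))
    return out


print(peeled([3, 1, 4]) == original([3, 1, 4]))  # True
```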
[0069] Referring to the example illustrated in FIG. 2D, the loop
merging unit 110 may move the loop control command 231 to the
bottom of code block cb3. In this example, the loop control
command 231 is the last command to be executed among the commands
of the inner and outer loops. For example, the loop merging unit
110 may generate a move command 260 that copies the value stored in
register `r11`, which is used by the loop control command 231, to a
different register `r21`, insert the move command 260 into code
block cb2, and change register `r11` in the loop control command
231 to register `r21`. The changed loop control command 231 may
then be moved to the bottom of code block cb3.
[0070] If the loop control command 231 were moved to the bottom of
code block cb3 without changing the register, the value in register
`r11` used by the loop control command 231 could be overwritten by
the command `move r11, #0x0 (f0)` in code block cb3. To prevent
this, the register of the loop control command is changed after the
move command 260 has been generated and inserted into code block
cb2.
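The register hazard and its fix can be reproduced with a tiny Python interpreter. The tuple encoding of commands, the interpreter itself, and the starting value 7 in `r11` are hypothetical, and the `(f0)` execution operand is not modeled here. Placing the unchanged loop control after cb3 makes it read the overwritten `r11`. Copying `r11` to `r21` in cb2 and reading `r21` preserves the original value.

```python
def run(program: list[tuple], regs: dict[str, int]) -> list[int]:
    """Tiny interpreter for ('move', dst, src) and ('loopctl', reg) commands.

    Returns the register values read by each loop control command.
    """
    seen = []
    for op, *args in program:
        if op == "move":
            dst, src = args
            regs[dst] = regs[src] if isinstance(src, str) else src  # register or immediate
        elif op == "loopctl":
            seen.append(regs[args[0]])
    return seen


cb2 = [("move", "r5", 3)]                 # hypothetical inner-loop work
cb3 = [("move", "r11", 0)]                # models `move r11, #0x0 (f0)`

# Loop control moved to the bottom of cb3 unchanged: it reads the overwritten r11.
naive = cb2 + cb3 + [("loopctl", "r11")]
# Fix described above: copy r11 to r21 in cb2 and let the loop control read r21.
renamed = cb2 + [("move", "r21", "r11")] + cb3 + [("loopctl", "r21")]

print(run(naive, {"r11": 7}))    # [0]
print(run(renamed, {"r11": 7}))  # [7]
```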
[0071] The loop merging unit 110 may merge the inner loop and the
outer loop into one loop 240.
[0072] The control unit 151 may determine whether an execution
operand `f0` is present in the allocated commands. For example, in
response to the execution operand being present, the control unit
151 may allow the command allocated to PE1 to be executed each time
the first loop execution frequency stored in the frequency register
file 150 becomes equal to the counter value of the counter 152. As
another example, in response to the execution operand not being
present, the control unit 151 may allow the command allocated to
PE1 to be executed regardless of the first loop execution frequency
and the counter value. The counter 152 may increment its value each
time a predefined event occurs. For example, the counter 152 may
increment its value each time the loop is executed once.
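The control unit's decision can be sketched in Python as a predicate over each allocated command. The Command type and simulate function are hypothetical. The counter increments once per execution of the merged loop, as in the example above. The sketch also assumes, which the text does not state, that the counter restarts after it matches f0. Under these assumptions, over 128 merged iterations with f0 holding 16, a command without an operand runs 128 times and a command tagged with f0 runs 8 times.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    name: str
    freq_operand: Optional[str] = None  # e.g. "f0"; None means no execution operand


def may_execute(cmd: Command, freq_rf: dict[str, int], counter: int) -> bool:
    """No operand: always run. Operand present: run when stored frequency == counter."""
    if cmd.freq_operand is None:
        return True
    return freq_rf[cmd.freq_operand] == counter


def simulate(commands: list[Command], freq_rf: dict[str, int], iterations: int) -> dict[str, int]:
    executed = {c.name: 0 for c in commands}
    counter = 0
    for _ in range(iterations):
        counter += 1  # counter incremented once per execution of the loop
        for c in commands:
            if may_execute(c, freq_rf, counter):
                executed[c.name] += 1
        if counter == freq_rf["f0"]:
            counter = 0  # ASSUMPTION: counter restarts after each full inner-loop pass
    return executed


cmds = [Command("inner_add"), Command("outer_store", "f0")]
print(simulate(cmds, {"f0": 16}, 128))  # {'inner_add': 128, 'outer_store': 8}
```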
[0073] FIG. 3 illustrates an example of the statuses of PEs to
which commands of an inner loop and an outer loop are
allocated.
[0074] Referring to the examples illustrated in FIGS. 1 and 3, the
PEs operate at each cycle. For example, one execution of the merged
loop may correspond to the execution of cycles 0, 1, and 2.
Accordingly, executing the merged loop 8 times may indicate that
cycles 0, 1, and 2 are each executed 8 times. That is, the merged
loop may include a plurality of cycles, for example, three cycles.
[0075] The counter 152 may increment its value each time a
predefined event occurs. As an example, the counter 152 may
increment its value each time a command included in the loop is
executed; in other words, the counter 152 may increment its value
at each cycle. As another example, the counter 152 may increment
its value each time the loop is executed, that is, after all of
cycles 0, 1, and 2 have been executed.
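The two counter policies can be compared with a short Python sketch. It assumes the three-cycle merged loop described above. The CounterMode enum and counter_trace function are hypothetical names. For two executions of the loop, counting per cycle reaches 6, while counting per loop execution reaches 2.

```python
from enum import Enum


class CounterMode(Enum):
    PER_CYCLE = 1      # increment at every cycle
    PER_ITERATION = 2  # increment once all cycles of one loop execution have run


def counter_trace(iterations: int, cycles: int, mode: CounterMode) -> list[int]:
    """Counter value observed at the end of each cycle of the merged loop."""
    counter, trace = 0, []
    for _ in range(iterations):
        for cycle in range(cycles):
            if mode is CounterMode.PER_CYCLE or cycle == cycles - 1:
                counter += 1
            trace.append(counter)
    return trace


print(counter_trace(2, 3, CounterMode.PER_CYCLE))      # [1, 2, 3, 4, 5, 6]
print(counter_trace(2, 3, CounterMode.PER_ITERATION))  # [0, 0, 1, 1, 1, 2]
```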
[0076] The scheduler 120 may allocate commands of the inner loop of
the merged loop to some of a plurality of PEs PE1, PE2, PE3, PE4,
PE5, PE6, PE7, PE8, PE9, PE10, PE11, and PE12. The scheduler 120
may allocate commands of the outer loop of the merged loop to the
PEs to which commands of the inner loop are not allocated.
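The partitioning of PEs between the two loops can be sketched as a simple greedy assignment. This is a deliberate simplification: it assumes one command per PE and ignores cycle-by-cycle scheduling. The allocate function is hypothetical. Inner-loop commands take PEs first, and outer-loop commands are placed only on PEs that remain free, so the two sets never overlap.

```python
def allocate(inner_cmds: list[str], outer_cmds: list[str], num_pes: int = 12) -> dict[str, tuple[str, str]]:
    """Map PE names to ('inner' | 'outer', command); one command per PE (simplification)."""
    if len(inner_cmds) + len(outer_cmds) > num_pes:
        raise ValueError("not enough PEs for a one-command-per-PE allocation")
    pes = [f"PE{i}" for i in range(1, num_pes + 1)]
    mapping: dict[str, tuple[str, str]] = {}
    for pe, cmd in zip(pes, inner_cmds):          # inner-loop commands first
        mapping[pe] = ("inner", cmd)
    free = [pe for pe in pes if pe not in mapping]
    for pe, cmd in zip(free, outer_cmds):         # outer-loop commands on remaining PEs
        mapping[pe] = ("outer", cmd)
    return mapping


m = allocate(["ld", "mul", "add"], ["st", "move"])
print(m["PE1"], m["PE4"])  # ('inner', 'ld') ('outer', 'st')
```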
[0077] By allocating the commands included in the inner loop to
some of the PEs and the commands included in the outer loop to the
remaining PEs, the reconfigurable processor may process the outer
loop in parallel with the inner loop without affecting the inner
loop. Therefore, the reconfigurable processor may reduce the time
required to process a nested loop that includes both an inner loop
and an outer loop.
[0078] FIG. 4 illustrates an example of a method of merging a
nested loop. Referring to the examples illustrated in FIGS. 1 and
4, the first loop execution frequency information and the second
loop execution frequency information may be extracted from the loop
control commands of the inner loop and the outer loop,
respectively. The third loop execution frequency information is
generated based on the first loop execution frequency information
of the inner loop and the second loop execution frequency
information of the outer loop, in 400.
[0079] A loop control command is generated to execute the commands
of a merged loop as many times as a value of the third loop
execution frequency information, in 410. For example, the value of
the third loop execution frequency information may be a product of
the first loop execution frequency information and the second loop
execution frequency information.
[0080] A storage command to store the first loop execution
frequency information in the frequency register file is generated,
in 420. The location of a command is changed using a loop peeling
technique, in 430. An execution operand including location
information indicating where the first loop execution frequency
information is stored in the frequency register file is generated
and inserted into a command of the outer loop, in 440. The
generated loop control command is changed, in 450. For example, the
generated loop control command may be moved such that it becomes
the last command to be executed among the commands included in the
merged loop. In this manner, the nested loop may be merged.
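Operations 400 through 450 can be strung together in a rough Python sketch that emits a textual listing. The function merge_nested_loop is hypothetical, and so are the mnemonics (`loop #N`, `st`, `add`). The split of outer-loop commands into those before the inner loop (outer_pre) and those after it (outer_post) is also an assumption. The sketch does not model the register renaming of 450 or the extra peeled copy that would execute after the final outer iteration.

```python
def merge_nested_loop(inner_count: int, outer_count: int,
                      inner_body: list[str], outer_pre: list[str],
                      outer_post: list[str], slot: str = "f0") -> tuple[list[str], list[str]]:
    """Return (prologue, loop_body) for the merged loop; mnemonics are hypothetical."""
    total = inner_count * outer_count                  # 400: third frequency
    loop_ctl = f"loop #{total}"                        # 410: single loop control command
    prologue = [f"freq {slot}, #{inner_count:#x}"]     # 420: storage command
    prologue += outer_pre                              # 430: peeled copy in the prologue
    outer_cmds = outer_post + outer_pre                # 430: pre-inner commands moved after post
    tagged = [f"{c} ({slot})" for c in outer_cmds]     # 440: execution operand added
    return prologue, inner_body + tagged + [loop_ctl]  # 450: loop control executed last


prologue, body = merge_nested_loop(16, 8, ["add r1, r1, r2"], ["move r11, #0x0"], ["st r1"])
print(prologue)  # ['freq f0, #0x10', 'move r11, #0x0']
print(body)      # ['add r1, r1, r2', 'st r1 (f0)', 'move r11, #0x0 (f0)', 'loop #128']
```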
[0081] As another example, more than two loops may be merged
together. For example, if a loop nest includes three or more loops,
those loops may be merged into one loop using the procedures
described above. For example, if a first loop, a second loop, and a
third loop are nested and the first loop is the innermost loop, the
first and second loops may be merged as described above, and the
resulting merged loop may then be merged with the third loop in the
same manner.
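Repeated pairwise merging can be summarized numerically in a short Python sketch. It assumes constant trip counts listed innermost first. The outermost count of 4 and the fold_nest name are hypothetical. Merging the first and second loops, and then merging that result with the third loop, gives a total equal to the product of all three counts. Commands of each level then run once every "period" merged iterations, where the period is the product of the trip counts of the loops nested inside it.

```python
import operator
from itertools import accumulate


def fold_nest(trip_counts: list[int]) -> tuple[int, list[int]]:
    """Merge a loop nest pairwise, innermost loop first.

    Returns the merged trip count and, for each level, the number of merged-loop
    iterations between executions of that level's commands.
    """
    periods = [1] + list(accumulate(trip_counts[:-1], operator.mul))
    total = periods[-1] * trip_counts[-1]
    return total, periods


total, periods = fold_nest([16, 8, 4])  # first (innermost), second, third loop
print(total, periods)  # 512 [1, 16, 128]
# Commands of the second loop run once per 16 merged iterations: 8 * 4 = 32 times.
assert sum(1 for k in range(total) if (k + 1) % periods[1] == 0) == 32
```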
[0082] FIG. 5 illustrates an example of a method of processing a
nested loop.
[0083] Referring to FIGS. 4 and 5, loop execution frequency
information may be extracted from each of the inner loop and the
outer loop which are included in the nested loop. The inner loop
and the outer loop of the nested loop are merged together based on
the extracted loop execution frequency information, in 500. The
merging may be performed in the same manner as the example
illustrated in FIG. 4.
[0084] A command of the inner loop of the merged loop is allocated
to at least one of a plurality of PEs, in 510. A command of the
outer loop of the merged loop is allocated to one or more of the
PEs to which the command of the inner loop has not been allocated,
in 520. Whether an execution operand is present in the allocated
command is determined, in 530. In response to the execution operand
being present, the PEs execute the allocated commands each time the
first loop execution frequency information becomes equal to the
counter value, in 540. In response to the execution operand not
being present, the PEs execute the allocated commands regardless of
the first loop execution frequency and the counter value, in 550.
[0085] As described above, the method of merging the nested loop
merges the inner loop and the outer loop which are included in the
nested loop, and allocates the merged loop to the PEs in parallel,
thereby reducing processing time to process the nested loop.
[0086] The processes, functions, methods, and/or software described
herein may be recorded, stored, or fixed in one or more
computer-readable storage media that include program instructions
to be implemented by a computer to cause a processor to execute or
perform the program instructions. The media may also include, alone
or in combination with the program instructions, data files, data
structures, and the like. The media and program instructions may be
those specially designed and constructed, or they may be of the
kind well-known and available to those having skill in the computer
software arts. Examples of computer-readable storage media include
magnetic media, such as hard disks, floppy disks, and magnetic
tape; optical media such as CD ROM disks and DVDs; magneto-optical
media, such as optical disks; and hardware devices that are
specially configured to store and perform program instructions,
such as read-only memory (ROM), random access memory (RAM), flash
memory, and the like. Examples of program instructions include
machine code, such as produced by a compiler, and files containing
higher level code that may be executed by the computer using an
interpreter. The described hardware devices may be configured to
act as one or more software modules that are recorded, stored, or
fixed in one or more computer-readable storage media, in order to
perform the operations and methods described above, or vice versa.
In addition, a computer-readable storage medium may be distributed
among computer systems connected through a network and
computer-readable codes or program instructions may be stored and
executed in a decentralized manner.
[0087] A number of examples have been described above.
Nevertheless, it should be understood that various modifications
may be made. For example, suitable results may be achieved if the
described techniques are performed in a different order and/or if
components in a described system, architecture, device, or circuit
are combined in a different manner and/or replaced or supplemented
by other components or their equivalents. Accordingly, other
implementations are within the scope of the following claims.