U.S. patent application number 11/231397 was filed with the patent office on 2005-09-20 and published on 2006-05-11 for maintaining even and odd array pointers to extreme values by searching and comparing multiple elements concurrently where a pointer is adjusted after processing to account for a number of pipeline stages.
Invention is credited to Jose Fridman, Ravi Kolagotla, Charles P. Roth.
Application Number | 20060101230 11/231397 |
Family ID | 24708922 |
Filed Date | 2005-09-20 |
United States Patent
Application |
20060101230 |
Kind Code |
A1 |
Roth; Charles P.; et al. |
May 11, 2006 |
Maintaining even and odd array pointers to extreme values by
searching and comparing multiple elements concurrently where a
pointer is adjusted after processing to account for a number of
pipeline stages
Abstract
In one embodiment, a programmable processor searches an array of
N data elements in response to N/M machine instructions, where the
processor has a pipeline configured to process M data elements in
parallel. In response to the machine instructions, a control unit
directs the pipeline to retrieve M data elements from the array of
elements in a single fetch cycle, concurrently compare the data
elements to M current extreme values, and update the current
extreme values, as well as M references to the current extreme
values, based on the comparisons.
Inventors: | Roth; Charles P. (Austin, TX); Kolagotla; Ravi (Austin, TX); Fridman; Jose (Brookline, MA) |
Correspondence
Address: |
FISH & RICHARDSON, PC
P.O. BOX 1022
MINNEAPOLIS
MN
55440-1022
US
|
Family ID: | 24708922 |
Appl. No.: | 11/231397 |
Filed: | September 20, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
09675066 | Sep 28, 2000 | 6948056 |
11231397 | Sep 20, 2005 | |
Current U.S. Class: | 712/9; 712/221; 712/E9.017; 712/E9.02 |
Current CPC Class: | G06F 7/22 20130101; G06F 9/30036 20130101; G06F 9/30021 20130101 |
Class at Publication: | 712/009; 712/221 |
International Class: | G06F 15/00 20060101 G06F015/00 |
Claims
1. An apparatus comprising: a processor coupled to a memory device,
wherein the processor includes a pipeline configured to process M
data elements in parallel and a control unit configured to direct
the pipeline to search an array of N data elements for an extreme
value in response to N/M machine instructions, wherein in response
to the machine instructions, the pipeline being configured to:
retrieve M data elements from the array of N data elements in a
single fetch cycle; concurrently compare the retrieved M data
elements to corresponding M current extreme values, and update
accumulators and pointers associated with the M current extreme
values based on said comparing, the pointers including one or more
pointer registers to store information indicative of addresses of
extreme values in the array of N data elements; and analyze results
of the N/M machine instructions to identify at least a value of at
least one extreme value in the array, wherein the at least one
extreme value comprises an extreme value occurring more than once
in the array, and wherein the position of the at least one extreme
value in the array comprises a position of a predetermined one of a
first occurrence and a last occurrence of the extreme value
occurring more than once in the array.
2. An apparatus as in claim 1, further comprising the memory
device.
3. An apparatus as in claim 2, wherein the memory device comprises
static random access memory.
4. An apparatus as in claim 2, wherein the memory device comprises
FLASH memory.
5. An apparatus as in claim 1, wherein the pipeline includes M
registers configured to store the accumulators and pointers.
6. An apparatus as in claim 5, wherein the registers include first
and second pointer registers to store information indicative of
addresses of first and second extreme values of the array.
7. An apparatus as in claim 5, wherein the registers are
general-purpose data registers.
8. An apparatus comprising: a processor coupled to a memory device,
wherein the processor comprises: a pipeline configured to process M
data elements in parallel; a control unit configured to direct the
pipeline to search an array of N data elements by issuing N/M
search instructions; and M registers configured to store
accumulators and pointers; wherein in response to the search
instructions, the pipeline being configured to: store references to
a location of a data element value for each of the M data elements;
determine an array value based on the stored references to the data
element values; update an accumulator to hold the array value; and
update a pointer to reference data quantity corresponding to the
array value.
9. An apparatus as in claim 8, further comprising the memory
device.
10. An apparatus as in claim 9, wherein the memory device comprises
static random access memory.
11. An apparatus as in claim 9, wherein the memory device comprises
FLASH memory.
12. An apparatus for searching an array of N data elements for an
extreme value, the apparatus comprising: means for issuing N/M
machine instructions to a processor, wherein the processor is
adapted to process M data elements in parallel; means for
concurrently comparing M data elements to corresponding M current
extreme values, means for retrieving another M elements in a single
fetch cycle to be compared when executing a subsequent machine
instruction; means for updating accumulators and pointers
associated with the M current extreme values based on said means
for concurrently comparing, the pointers including one or more
pointer registers to store information indicative of addresses of
extreme values in the array of N data elements; and means for
analyzing results of the machine instructions to identify at least
a value and a position of at least one extreme value in the array,
wherein the at least one extreme value comprises an extreme value
occurring more than once in the array, and wherein the position of
the at least one extreme value in the array comprises a position of
a predetermined one of a first occurrence and a last occurrence of
the extreme value occurring more than once in the array.
13. An apparatus as in claim 12, further comprising: means for
determining an address of a first extreme value based on a value in
a pointer register and based on a correction factor to compensate
for one or more errors.
14. An apparatus as in claim 12, further comprising: means for
storing the M current extreme values in M accumulators; and means
for copying the M data elements to the accumulators based on said
means for concurrently comparing.
15. An apparatus as in claim 12, wherein said means for
concurrently comparing the M data elements to M corresponding
current extreme values comprises means for determining whether each
of the data elements is less than the corresponding current extreme
value.
16. An apparatus as in claim 12, wherein said means for
concurrently comparing the M data elements to M corresponding
current extreme values comprises means for determining whether each
of the data elements is greater than the corresponding current
extreme value.
17. An apparatus as in claim 12, further comprising: means for
setting up registers for said accumulators and pointers.
18. An apparatus as in claim 12, wherein M=2 and N is greater than
two.
19. An apparatus as in claim 17, wherein said means for
concurrently comparing the M data elements comprises means for
processing a first data element with a first execution unit of a
pipelined processor and means for processing a second data element
with a second execution unit of the pipelined processor.
20. An apparatus as in claim 17, wherein said means for
concurrently comparing the M data elements comprises means for
concurrently processing a first data element and a second data
element within a single execution unit of a pipelined processor.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a divisional application of and claims
priority to U.S. patent application Ser. No. 09/675,066, filed Sep.
28, 2000.
BACKGROUND
[0002] This invention relates to array searching operations for a
computer.
[0003] Many conventional programmable processors, such as digital
signal processors (DSP), support a rich instruction set that
includes numerous instructions for manipulating arrays of data.
These operations are typically computationally intensive and can
require significant computing time, depending upon the number of
execution units, such as multiply-accumulate units (MACs), within
the processor.
DESCRIPTION OF DRAWINGS
[0004] FIG. 1 is a block diagram illustrating an example of a
pipelined programmable processor.
[0005] FIG. 2 is a block diagram illustrating an example execution
pipeline for the programmable processor.
[0006] FIG. 3 is a flowchart for implementing an example array
manipulation machine instruction.
[0007] FIG. 4 is a flowchart of an example routine for invoking the
machine instruction.
[0008] FIG. 5 is a flowchart for a single SEARCH instruction.
[0009] FIG. 6 is a flowchart where a software application issues
N/M SEARCH instructions and, upon completion of the N/M SEARCH
instructions, determines an extreme value for an entire array.
DESCRIPTION
[0010] FIG. 1 is a block diagram illustrating a programmable
processor 2 having an execution pipeline 4 and a control unit 6.
Processor 2, as explained in detail below, reduces the
computational time required by array manipulation operations. In
particular, processor 2 may support a machine instruction, referred
to herein as the SEARCH instruction, that reduces the computational
time to search an array of numbers in a pipelined processing
environment.
[0011] Pipeline 4 has a number of stages for processing
instructions. Each stage processes concurrently with the other
stages and passes results to the next stage in pipeline 4 at each
clock cycle. The final results of each instruction emerge at the
end of the pipeline in rapid succession.
[0012] Control unit 6 controls the flow of instructions and data
through the various stages of pipeline 4. During the processing of
an instruction, for example, control unit 6 directs the various
components of the pipeline 4 to fetch and decode the instruction,
perform the corresponding operation and write the results back to
memory or local registers.
[0013] FIG. 2 illustrates an example pipeline 4 configured
according to the invention. Pipeline 4, for example, has five
stages: instruction fetch (IF), decode (DEC), address calculation
(AC), execute (EX) and write back (WB). Instructions are fetched
from memory, or from an instruction cache, during the IF stage by
fetch unit 21 and decoded within address registers 22 during the
DEC stage. At the next clock cycle, the results pass to the AC
stage, where data address generators 23 calculate any memory
addresses that are necessary to perform the operation.
[0014] During the EX stage, execution units 25A through 25M perform
the specified operation such as, for example, adding or multiplying
numbers, in parallel. Execution units 25 may contain specialized
hardware for performing the operations including, for example, one
or more arithmetic logic units (ALUs), floating-point units (FPUs)
and barrel shifters. A variety of data can be applied to execution
units 25 such as the addresses generated by data address generator
23, data retrieved from data memory 18 or data retrieved from data
registers 24. During the final stage (WB), the results are written
back to data memory or to data registers 24.
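The overlap of the five stages can be pictured with a short Python
sketch. It assumes an ideal pipeline with no stalls, in which each
instruction advances one stage per clock cycle; the function names
`pipeline_schedule` and `total_cycles` are illustrative and do not
appear in the application.

```python
# Stage names taken from the example pipeline of FIG. 2.
STAGES = ["IF", "DEC", "AC", "EX", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the instruction occupying each stage.

    Assumption: an ideal, stall-free pipeline in which instruction i
    enters the first stage at cycle i and moves one stage per cycle.
    """
    schedule = {}
    for instr in range(num_instructions):
        for depth, stage in enumerate(stages):
            schedule.setdefault(instr + depth, {})[stage] = instr
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to drain the pipeline under the same assumption."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1
```

Under this model, while one instruction is in the EX stage the next
instruction is already in the AC stage, which is the kind of overlap
that makes a pointer updated in AC run ahead of a comparison
performed in EX.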
[0015] The SEARCH instruction supported by processor 2 may allow
software applications to search an array of N data elements by
issuing N/M search instructions, where M is the number of data
elements that can be processed in parallel by execution units 25 of
pipeline 4. Note, however, that a single execution unit may be
capable of executing two or more operations in parallel. For
example, an execution unit may include a 32-bit ALU capable of
concurrently comparing two 16-bit numbers.
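As a rough illustration of the packed comparison mentioned above,
the following Python sketch models a 32-bit quantity holding two
16-bit elements and compares both halves against two running values
in one call. The helper names (`pack_pair`, `unpack_pair`,
`compare_pair_le`, `instructions_needed`) and the treatment of the
elements as unsigned are assumptions made for illustration only.

```python
MASK16 = 0xFFFF


def pack_pair(hi, lo):
    """Pack two unsigned 16-bit values into one 32-bit quantity."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack_pair(word):
    """Split a 32-bit quantity into its (high, low) 16-bit halves."""
    return (word >> 16) & MASK16, word & MASK16


def compare_pair_le(word, current):
    """Compare both halves of `word` with the matching halves of `current`.

    Returns one boolean per 16-bit lane, modelling a single ALU pass
    that handles two elements at once.
    """
    w_hi, w_lo = unpack_pair(word)
    c_hi, c_lo = unpack_pair(current)
    return (w_hi <= c_hi, w_lo <= c_lo)


def instructions_needed(n, m):
    """SEARCH instructions for n elements and m lanes.

    Assumption: n is an exact multiple of m.
    """
    return n // m
```

With M=2, an array of 16 elements would be covered by 8 SEARCH
instructions under this model.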
[0016] Generally, the sequence of SEARCH instructions allows the
processor 2 to process M sets of elements in parallel to identify
an "extreme value", such as a maximum or a minimum, for each set.
During the execution of the search instructions, processor 2 stores
references to the location of the extreme value of each of the M
sets of elements. Upon completion of the N/M instructions, as
described in detail below, the software application analyzes the
references to the extreme values for each set to quickly identify
an extreme value for the array. For example, the search instruction
allows the software applications to quickly identify either the
first or last occurrence of a maximum or minimum value.
Furthermore, as explained in detail below, processor 2 implements
the operation in a fashion suitable for vectorizing in a pipelined
processor across the M execution units 25.
[0017] As described above, a software application searches an array
of data by issuing N/M SEARCH machine instructions to processor 2.
FIG. 3 is a flowchart illustrating an example mode of operation 300
for processor 2 when it receives a single SEARCH machine
instruction. Process 300 is described with reference to identifying
the last occurrence of a minimum value within the array of
elements; however, process 300 can be easily modified to perform
other functions such as identifying the first occurrence of a
minimum value, the first occurrence of a maximum value or a last
occurrence of a maximum value.
[0018] For exemplary purposes, process 300 is described assuming
M equals 2, i.e., processor 2 concurrently processes two sets of
elements, each set having N/2 elements. However, the process is not
limited as such and is readily extensible to concurrently process
more than two sets of elements. In general, process 300 facilitates
vectorization of the search process by fetching pairs of elements
as a single data quantity and processing the element pairs through
pipeline 4 in parallel, thereby reducing the total number of clock
cycles necessary to identify the minimum value within the array.
Although applicable to other architectures, process 300 is well
suited for a pipelined processor 2 having multiple execution units
in the EX stage. For the two sets of elements, process 300
maintains two pointer registers, P.sub.Even and P.sub.Odd, that
store locations for the current extreme value within the
corresponding set. In addition, process 300 maintains two
accumulators, A0 and A1, that hold the current extreme values for
the sets. The pointer registers and the accumulators, however, may
readily be implemented as general-purpose data registers without
departing from process 300.
[0019] Referring to FIG. 3, in response to each SEARCH instruction,
processor 2 fetches a pair of elements in one clock cycle as a
single data quantity (301). For example, processor 2 may fetch two
adjacent 16-bit values as one 32-bit quantity. Next, processor 2
compares the even element of the pair to a current minimum value
for the even elements (302) and the odd element of the pair to a
current minimum value for the odd elements (304).
[0020] When a new minimum value for the even elements is detected,
processor 2 updates accumulator A0 to hold the new minimum value
and updates a pointer register P.sub.Even to hold a pointer to
point to a corresponding data quantity within the array (303).
Similarly, when a new minimum value for the odd elements has been
detected, processor 2 updates accumulator A1 and a pointer register
P.sub.Odd (305). In this example, each pointer register P.sub.Even
and P.sub.Odd points to the data quantity and not the individual
elements, although the process is not limited as such. Processor 2
repeats the process until all of the elements within the array have
been processed (306). Because processor 2 is pipelined, element
pairs may be fetched until the array is processed.
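The per-instruction behavior of process 300 for M=2 can be sketched
in Python as follows. This is a behavioral model only, not the
hardware implementation: the class `SearchState` and the functions
`search_step` and `run_search` are hypothetical names, pointers are
modelled as indices of data quantities (element pairs), and pipeline
effects on the pointer values are ignored.

```python
from dataclasses import dataclass


@dataclass
class SearchState:
    a0: int      # accumulator A0: current minimum of the even elements
    a1: int      # accumulator A1: current minimum of the odd elements
    p_even: int  # index of the data quantity holding the even minimum
    p_odd: int   # index of the data quantity holding the odd minimum


def search_step(state, pair, quantity_index):
    """Model one SEARCH instruction (steps 301-305) for M=2.

    `pair` is the (even, odd) element pair fetched as one quantity.
    Using <= keeps the last occurrence of each minimum.
    """
    even, odd = pair
    if even <= state.a0:          # steps 302-303
        state.a0 = even
        state.p_even = quantity_index
    if odd <= state.a1:           # steps 304-305
        state.a1 = odd
        state.p_odd = quantity_index
    return state


def run_search(array):
    """Issue N/2 steps over an even-length array of at least 2 elements."""
    state = SearchState(a0=array[0], a1=array[1], p_even=0, p_odd=0)
    for q in range(len(array) // 2):   # step 306: repeat until done
        search_step(state, (array[2 * q], array[2 * q + 1]), q)
    return state
```

For the array [7, 3, 2, 9, 2, 3, 8, 4], the even elements are
7, 2, 2, 8 and the odd elements are 3, 9, 3, 4, so the model ends
with A0=2 and A1=3, both pointers referring to the third data
quantity (index 2).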
[0021] The following illustrates exemplary syntax for invoking the
machine instruction:
(P.sub.Odd, P.sub.Even)=SEARCH R.sub.Data LE, R.sub.Data=[P.sub.fetch.sub.--.sub.addr++]
[0022] Data register R.sub.Data is used as a scratch register to
store each newly fetched data element pair, with the least
significant word of R.sub.Data holding the odd element and the most
significant word of R.sub.Data holding the even element. Two
accumulators, A0 and A1, are implicitly used to store the actual
values of the results. An additional register,
P.sub.fetch.sub.--.sub.addr, is incremented when the SEARCH
instruction is issued and is used as a pointer to iterate over the
N/2 data quantities within the array. The defined condition, such
as "less than or equal" (LE) in the above example, controls which
comparison is executed and when the pointer registers P.sub.Even
and P.sub.Odd, as well as the accumulators A0 and A1, are updated.
The "LE", for example, directs processor 2 to identify the last
occurrence of the minimum value.
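The effect of the condition on which occurrence is retained can be
illustrated with a small Python sketch that scans a single set of
elements. Only "LE" is named in this description; the mnemonics
"LT", "GE" and "GT" used below, along with the names `CONDITIONS`
and `search_lane`, are assumptions introduced for illustration.

```python
import operator

# "LE" comes from the example syntax; the other three mnemonics are
# hypothetical and shown only to contrast first/last occurrence.
CONDITIONS = {
    "LE": operator.le,  # minimum, last occurrence
    "LT": operator.lt,  # minimum, first occurrence
    "GE": operator.ge,  # maximum, last occurrence
    "GT": operator.gt,  # maximum, first occurrence
}


def search_lane(values, cond):
    """Scan one set of elements; return (extreme value, index in the set)."""
    compare = CONDITIONS[cond]
    best, best_idx = values[0], 0
    for i in range(1, len(values)):
        # An inclusive comparison replaces the stored value on a tie,
        # so the pointer ends on the last occurrence; a strict
        # comparison leaves it on the first.
        if compare(values[i], best):
            best, best_idx = values[i], i
    return best, best_idx
```

For the values [4, 1, 6, 1, 5], an inclusive "LE" comparison yields
index 3 for the minimum 1, whereas a strict comparison yields
index 1.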
[0023] In a typical application, a programmer develops a software
application or subroutine that issues the N/M search instructions,
probably from within a loop construct. The programmer may write the
software application in assembly language or in a high-level
software language. A compiler is typically invoked to process the
high-level software application and generate the appropriate
machine instructions for processor 2, including the SEARCH machine
instructions for searching the array of data.
[0024] FIG. 4 is a flowchart of an example software routine 30 for
invoking the example machine instructions illustrated above. First,
the software routine 30 initializes the registers including
initializing A0 and A1 and pointers P.sub.Even and P.sub.Odd to the
first data quantity within the array (31). In one embodiment,
software routine 30 initializes a loop count register with the
number of SEARCH instructions to issue (N/M). Next, routine 30
issues the SEARCH machine instruction N/M times (32). This can be
accomplished a number of ways, such as by invoking a hardware loop
construct supported by processor 2. Often, however, a compiler may
unroll a software loop into a sequence of identical SEARCH
instructions (32).
[0025] After issuing N/M search instructions, A0 and A1 hold the
last occurrence of the minimum even value and the last occurrence
of the minimum odd value, respectively. Furthermore, P.sub.Even and
P.sub.Odd store the locations of the two data quantities that hold
the last occurrence of the minimum even value and the last
occurrence of the minimum odd value.
[0026] Next, in order to identify the last occurrence of the
minimum value for the entire array, routine 30 first increments
P.sub.Odd by a single element, such that P.sub.Odd points directly
at the minimum odd element (33). Routine 30 compares the
accumulators A0 and A1 to determine whether the accumulators
contain the same value, i.e., whether the minimum of the odd
elements equals the minimum of the even elements (34). If so, the
routine 30 compares the pointers to determine whether P.sub.Odd is
less than P.sub.Even and, therefore, whether the minimum even value
occurred earlier or later in the array (35). Based on the
comparison, the routine determines whether to copy P.sub.Odd into
P.sub.Even (37).
[0027] When the accumulators A0 and A1 are not the same, the
routine compares A0 to A1 in order to determine which holds the
minimum value (36). If A1 is less than A0 then routine 30 sets
P.sub.Even equal to P.sub.Odd, thereby copying the pointer to the
minimum value from P.sub.Odd into P.sub.Even (37).
[0028] At this point, P.sub.Even points to the last occurrence of
the minimum value for the entire array. Next, routine 30 adjusts
P.sub.Even to compensate for errors introduced by the pipelined
architecture of processor 2 (38). For example, the comparisons
described above are typically performed in the EX stage of pipeline
4 while incrementing the pointer register
P.sub.fetch.sub.--.sub.addr typically occurs during the AC stage,
thereby causing P.sub.Odd and P.sub.Even to be incorrect by a
known quantity. After adjusting P.sub.Even, routine 30 returns
P.sub.Even as a pointer to the last occurrence of the minimum value
within the array (39).
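Steps 33 through 39 of routine 30 can be sketched in Python as
follows, for the last occurrence of a minimum with M=2. The function
name `finish_search` is hypothetical. Pointers are modelled as
element addresses, so that the data quantity for pair q sits at
address 2q. The parameter `pipeline_offset` stands for the known
quantity by which the pointers are incorrect; its magnitude, and the
choice to subtract it, are assumptions, since the description does
not state them.

```python
def finish_search(a0, a1, p_even, p_odd, pipeline_offset=0):
    """Reduce the even/odd results to one pointer (steps 33-39).

    a0, a1        : minimum of the even and odd elements
    p_even, p_odd : element addresses of the data quantities that
                    hold those minima
    """
    p_odd += 1                        # step 33: point at the odd element
    if a0 == a1:                      # step 34: same minimum in both sets
        if p_odd > p_even:            # step 35: odd occurrence is later
            p_even = p_odd            # step 37
    elif a1 < a0:                     # step 36: odd set holds the minimum
        p_even = p_odd                # step 37
    return p_even - pipeline_offset   # steps 38-39: assumed correction
```

Because both pointers carry the same offset, applying the correction
once at the end does not change the outcome of the pointer
comparison in step 35. For the array [7, 3, 2, 9, 2, 3, 8, 4] with
no offset, the inputs are A0=2, A1=3 and both pointers equal to 4,
and the routine returns 4, the address of the last occurrence of the
value 2.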
[0029] FIG. 5 illustrates the operation for a single SEARCH
instruction as generalized to the case where processor 2 is capable
of processing M elements of the array in parallel, such as when
processor 2 includes M execution units. The SEARCH instruction
causes processor 2 to fetch M elements in a single fetch cycle
(51). Furthermore, in this example, processor 2 maintains M pointer
registers to store addresses (locations) of corresponding
extreme values for the M sets of elements. After fetching the M
elements, processor 2 concurrently compares the M elements to
current extreme values for the respective element sets, as stored
in M accumulators (52). Based on the comparisons, processor 2
updates the M accumulators and the M pointer registers (53).
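A behavioral Python sketch of the generalized instruction of FIG. 5
follows. The function name `search_step_m` and its parameters are
illustrative assumptions, and the sequential loop merely stands in
for comparisons that the hardware would perform concurrently.

```python
import operator


def search_step_m(accumulators, pointers, elements, fetch_index,
                  compare=operator.le):
    """Model one SEARCH instruction over M lanes (steps 51-53).

    accumulators : list of M current extreme values, updated in place
    pointers     : list of M fetch indices, updated in place
    elements     : the M elements retrieved in one fetch cycle (51)
    fetch_index  : index of the fetched group within the array
    """
    for lane, value in enumerate(elements):
        if compare(value, accumulators[lane]):   # step 52
            accumulators[lane] = value           # step 53
            pointers[lane] = fetch_index
    return accumulators, pointers
```

With four lanes holding the value 5 and a fetched group of
[6, 5, 1, 9], the default inclusive comparison updates the second
and third lanes only.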
[0030] FIG. 6 illustrates the general case where a software
application issues N/M SEARCH instructions and, upon completion of
the instructions, determines the extreme value for the entire
array. First, the software application initializes a loop counter,
the M accumulators used to store the current extreme values for the
M element sets and the M pointers used to store the locations of
the extreme values (61). Next, the software application issues N/M
SEARCH instructions (62). After completion of the instructions, the
software application may adjust each of the M pointer registers to
correctly reference its respective extreme value, instead of the
data quantity holding the extreme value (63). After adjusting the
pointer registers, the software application compares the M extreme
values for the M element sets to identify an extreme value for the
entire array, i.e., a maximum value or a minimum value (64). Then,
the software application may use the pointer registers to determine
whether more than one of the element sets have an extreme value
equal to the array extreme value and, if so, determine which
extreme value occurred first, or last, depending upon the desired
search function (65).
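The overall flow of FIG. 6 can be sketched end to end in Python. The
function `search_array` and its parameters are hypothetical, the
array length is assumed to be a nonzero multiple of M, pointers are
modelled as indices of fetched groups, and pipeline-related pointer
corrections are omitted.

```python
import operator


def search_array(array, m, find_max=False, last=True):
    """Return (extreme value, element position) using N/M steps."""
    if len(array) == 0 or len(array) % m != 0:
        raise ValueError("array length must be a nonzero multiple of m")

    # Inclusive comparisons keep the last occurrence, strict ones the first.
    if find_max:
        compare = operator.ge if last else operator.gt
    else:
        compare = operator.le if last else operator.lt

    # Step 61: initialize the M accumulators and M pointers.
    acc = list(array[:m])
    ptr = [0] * m

    # Step 62: issue N/M SEARCH instructions.
    for q in range(len(array) // m):
        for lane in range(m):
            value = array[q * m + lane]
            if compare(value, acc[lane]):
                acc[lane] = value
                ptr[lane] = q

    # Step 63: turn each group pointer into an element position.
    positions = [ptr[lane] * m + lane for lane in range(m)]

    # Step 64: compare the M extreme values.
    extreme = max(acc) if find_max else min(acc)

    # Step 65: among lanes that tie, pick the first or last occurrence.
    candidates = [positions[lane] for lane in range(m) if acc[lane] == extreme]
    position = max(candidates) if last else min(candidates)
    return extreme, position
```

For the array [4, 9, 1, 7, 1, 9, 3, 2], the minimum 1 occurs at
positions 2 and 4. The sketch returns position 4 when searching for
the last occurrence and position 2 when searching for the first, for
both M=2 and M=4.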
[0031] Various embodiments of the invention have been described.
For example, a single machine instruction has been described that
searches an array of data in a manner that facilitates
vectorization of the search process within a pipelined processor.
The processor may be implemented in a variety of systems including
general purpose computing systems, digital processing systems,
laptop computers, personal digital assistants (PDAs) and cellular
phones. For example, cellular phones often maintain an array of
values representing signal strength for services available
360 degrees around the phone. In this context, the process
discussed above can be readily used upon initialization of the
cellular phone to scan the available services and quickly select
the best service. In such a system, the processor may be coupled to
a memory device, such as a FLASH memory device or a static random
access memory (SRAM), that stores an operating system and other
software applications. These and other embodiments are within the
scope of the following claims.
* * * * *