U.S. patent application number 11/528326 was filed with the patent office on 2006-09-28 and published on 2007-03-29 for parameterizable clip instruction and method of performing a clip operation using the same.
This patent application is currently assigned to ARC International (UK) Limited. Invention is credited to Aris Aristodemou, Carl Norman Graham, Simon Jones, Yazid Nemouchi, Nigel Topham, Kar-Lik Wong.
Application Number | 20070074007 11/528326 |
Document ID | / |
Family ID | 37968194 |
Filed Date | 2006-09-28 |
United States Patent
Application |
20070074007 |
Kind Code |
A1 |
Topham; Nigel ; et
al. |
March 29, 2007 |
Parameterizable clip instruction and method of performing a clip
operation using the same
Abstract
A parameterizable clip instruction for SIMD microprocessor
architecture and method of performing a clip operation using the same. A
single instruction is provided with three input operands: a
destination address, a source address and a controlling parameter.
The controlling parameter includes a range type and a range
specifier. The range type is a multi-bit integer in the operand
that is used to index a table of range types. The range specifier
plugs into the range type to define a range. The data input at the
source address is clipped according to the controlling parameters.
The instruction is particularly suited to video encoding/decoding
applications where the result of an interpolation or other
calculation may lie outside the maximum value, so that the final
result must be clipped to a saturation value, for example, the
maximum pixel value.
Signed and unsigned clipping ranges may be used that are not only
powers of two.
Inventors: |
Topham; Nigel; (Midlothian,
GB) ; Nemouchi; Yazid; (Sandhurst, GB) ;
Jones; Simon; (London, GB) ; Graham; Carl Norman;
(London, GB) ; Wong; Kar-Lik; (Wokingham, GB)
; Aristodemou; Aris; (Frien Barnet, GB) |
Correspondence
Address: |
HUNTON & WILLIAMS LLP;INTELLECTUAL PROPERTY DEPARTMENT
1900 K STREET, N.W.
SUITE 1200
WASHINGTON
DC
20006-1109
US
|
Assignee: |
ARC International (UK)
Limited
|
Family ID: |
37968194 |
Appl. No.: |
11/528326 |
Filed: |
September 28, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60721108 | Sep 28, 2005 | |
Current U.S.
Class: |
712/221 ;
712/E9.019; 712/E9.034 |
Current CPC
Class: |
G06F 9/30018 20130101;
H04N 19/86 20141101; H04N 19/14 20141101; G06F 9/3802 20130101;
G06F 9/3808 20130101; H04N 19/436 20141101; G06F 9/3885 20130101;
H04N 19/61 20141101; H04N 19/117 20141101; G06F 13/28 20130101;
H04N 19/43 20141101; H04N 19/182 20141101; H04N 19/176 20141101;
G06F 9/30003 20130101; G06F 9/3875 20130101; G06F 9/3887 20130101;
G06F 9/3893 20130101; G06F 9/3897 20130101; H04N 19/82 20141101;
G06F 9/3867 20130101; G06F 9/30076 20130101; G06F 9/3877 20130101;
H04N 19/523 20141101; G06F 9/30032 20130101; G06T 3/4007
20130101 |
Class at
Publication: |
712/221 |
International
Class: |
G06F 9/44 20060101
G06F009/44 |
Claims
1. A method of causing a microprocessor to perform a clip operation
comprising: providing an assembly instruction to the
microprocessor, the instruction comprising an input address, an
output address and a controlling parameter; decoding the
instruction with logic in the microprocessor; retrieving a data
input from the input address; determining a specific clip operation
based on the controlling parameter; performing the clip operation
on the data input; and writing the result to the output address.
2. The method according to claim 1, wherein determining a clip
operation based on the controlling parameter comprises decoding the
controlling parameter into a range type and a range specifier.
3. The method according to claim 2, wherein the range type is a
type selected from the group consisting of [0, 2^N - 1], [-N,
N], [-2^N, 2^N - 1] and [0, N], where N is the range
specifier.
4. The method according to claim 2, wherein decoding the
controlling parameter into a range type comprises performing a
table lookup of an X-bit number in the controlling parameter, where
2^X is the number of range types.
5. The method according to claim 2, wherein performing the clip
operation comprises clipping the input value according to the range
type and range specifier.
6. The method according to claim 1, wherein the input address
and the output address comprise vector registers.
7. A method of performing a clip operation with a single
parameterizable assembly language-based clip instruction executing
on a microprocessor comprising: specifying a source address of a
data input, a destination address of a clipped output and a
controlling parameter in a single instruction; obtaining the data
input at the source address; performing the clip operation on the
data input in accordance with the controlling parameter; and
storing the result at the destination address.
8. The method according to claim 7, wherein specifying a
controlling parameter comprises specifying a Y bit number including
a range type and a range specifier, where Y is an integer power of
2.
9. The method according to claim 8, wherein the range type is
a type selected from the group consisting of [0, 2^N - 1], [-N,
N], [-2^N, 2^N - 1] and [0, N], where N is the range
specifier.
10. The method according to claim 9, wherein the range specifier is
a positive integer.
11. The method according to claim 8, wherein performing the clip
operation in accordance with the controlling parameter comprises
clipping the data input based on the instruction's range specifier
and range type.
12. The method according to claim 7, wherein the source address and
destination address comprise vector registers and performing the
clip operation comprises performing the clip operation in
accordance with the controlling parameter on each slice of the
source address vector registers and storing the results at a
corresponding slice of the destination address vector register.
13. A parameterizable assembly language program instruction for
performing a clip operation in a video processing application
comprising: an instruction name for a particular microprocessor
instruction; a first instruction input operand comprising a
destination register address to write an instruction result; a
second instruction input operand comprising a source register
address containing a value to be clipped; and a third instruction
input operand comprising a controlling parameter.
14. The instruction according to claim 13, wherein the controlling
parameter comprises a Z-bit number wherein Z is an integer power of
2.
15. The instruction according to claim 13, wherein the controlling
parameter includes a range type and a range specifier.
16. The instruction according to claim 15, wherein the range type
is a type selected from the group consisting of [0, 2^N - 1],
[-N, N], [-2^N, 2^N - 1] and [0, N], where N is the range
specifier.
17. The instruction according to claim 16, wherein N is a positive
integer.
18. The instruction according to claim 13, wherein the destination
register address and the source register address are vector
register addresses.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent
Application No. 60/721,108 titled "SIMD Architecture and Associated
Systems and Methods," filed Sep. 28, 2005, the disclosure of which
is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] The invention relates generally to embedded microprocessor
architectures and more specifically to a clip instruction for SIMD
microprocessor architectures and a method of performing a clip
operation using such a clip instruction.
BACKGROUND OF THE INVENTION
[0003] Single instruction multiple data (SIMD) architectures have
become increasingly important as demand for video processing in
electronic devices has increased. The SIMD architecture exploits
the data parallelism that is abundant in data manipulations often
found in media related applications, such as discrete cosine
transforms (DCT) and filters. Data parallelism exists when a large
mass of data of uniform type needs the same instruction performed
on it. Thus, in contrast to a single instruction single data (SISD)
architecture, in a SIMD architecture a single instruction may be
used to effect an operation on a wide block of data. SIMD
architecture exploits parallelism in the data stream while SISD can
only operate on data sequentially.
[0004] An example of an application that takes advantage of SIMD is
one where the same value is being added to a large number of data
points, a common operation in many media applications. One example
of this is changing the brightness of a graphic image. Each pixel
of the image may consist of three values for the brightness of the
red, green and blue portions of the color. To change the brightness,
the R, G and B values, or alternatively the YUV values, are read
from memory, a value is added to each, and the resulting values are
written back to memory. A SIMD processor enhances performance of
this type of operation over that of a SISD processor. A reason for
this improvement is that in SIMD architectures, data is
understood to be in blocks and a number of values can be loaded at
once. Instead of a series of instructions to incrementally fetch
individual pixels, a SIMD processor will have a single instruction
that effectively says "get all these pixels." Another advantage of
SIMD machines is that multiple pieces of data are operated on
simultaneously. Thus, a single instruction can say "perform this
operation on all the pixels." As a result, SIMD machines are much
more efficient in exploiting data parallelism than SISD machines.
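The brightness example can be made concrete with a short sketch. The
Python below is illustrative only and is not taken from the
application: the function names and the block width of eight values
are assumptions, and plain Python does not actually execute the block
in parallel. It only contrasts the one-value-per-step structure of a
SISD loop with the block-at-a-time structure of a SIMD operation.

```python
def brighten_scalar(pixels, delta):
    """SISD-style: one fetch, one add and one store per loop iteration."""
    out = []
    for p in pixels:
        out.append(p + delta)
    return out


def brighten_blocks(pixels, delta, lanes=8):
    """SIMD-style: each step conceptually handles a block of `lanes` values.

    `lanes=8` is an assumed block width chosen for illustration.
    """
    out = []
    for i in range(0, len(pixels), lanes):
        block = pixels[i:i + lanes]            # "get all these pixels"
        out.extend(v + delta for v in block)   # "perform this operation on all the pixels"
    return out
```

Both functions return the same values; only the number of loop steps
differs. Note that a sum such as 250 + 10 exceeds an 8-bit pixel
maximum of 255, which is the situation that clipping addresses.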
[0005] SIMD architectures have particular promise for video
encoding/decoding applications where many repetitive numerical
computations must be performed on relatively large blocks of data.
Numerical computation algorithms, such as those common in video
encoding/decoding, often require results to be clipped to be within
a specified range of values. For example, in video processing, a
system will have a maximum pixel depth depending on the system's
resolution. If the value of an intermediate calculation result,
such as an interpolation or other calculation, lies outside the
maximum value, the final result will have to be clipped to the
saturation value, for example, the maximum pixel value.
[0006] Clipping is typically implemented in software using a
sequence of instructions that first test the intermediate value and
then conditionally assign the final value, for example, if
value>maximum, then value=maximum. Such a software clipping
implementation incurs a high overhead due to the number of
calculations required to test each value. The sequential nature of
a software implementation makes it very difficult to optimize
in processors designed to exploit instruction level parallelism,
such as, for example, SISD reduced instruction set (RISC) machines
or very long instruction word (VLIW) machines. Some processors do
implement clipping at the hardware level using specialized
processor instructions; however, the clipping ranges of these
instructions are fixed to some value, typically a power of two.
SUMMARY OF THE INVENTION
[0007] Thus, there exists a need for a SIMD microprocessor
architecture that ameliorates at least some of the above-noted
deficiencies of conventional systems. At least one embodiment of
the invention may provide a parameterizable microprocessor clip
instruction. The parameterizable microprocessor clip instruction
according to this embodiment may comprise a destination register
operand, a source register operand of a value to be clipped, and a
second source operand containing the control parameter specifying
the manner in which clipping is to be performed, wherein the
control parameter comprises a range type and range specifier. It
should be appreciated that in the context of a SIMD machine, the
source operand containing the "value" to be clipped is really
referring to the values to be clipped because a 128-bit register is
used to hold eight 16-bit values to be clipped by a single
instruction.
[0008] Accordingly, at least one embodiment of the invention may
provide a method of causing a microprocessor to perform a clip
operation. The method according to this embodiment may comprise
providing an assembly instruction to the microprocessor, the
instruction comprising an input address, an output address and a
controlling parameter, decoding the instruction with logic in the
microprocessor, retrieving a data input from the input address,
determining a specific clip operation based on the controlling
parameter, performing the clip operation on the data input, and
writing the result to the output address.
[0009] Another embodiment of the invention may provide a method of
performing a clip operation with a single parameterizable assembly
language-based clip instruction executing on a microprocessor. The
method of performing a clip operation with a single parameterizable
assembly language-based clip instruction executing on a
microprocessor may comprise specifying a source address of a data
input, a destination address of a clipped output and a controlling
parameter in a single instruction, obtaining the data input at the
source address, performing the clip operation on the data input in
accordance with the controlling parameter, and storing the result
at the destination address.
[0010] At least one other embodiment of the invention may provide a
parameterizable assembly language program instruction for
performing a clip operation in a video processing application. The
parameterizable assembly language program instruction according to
this embodiment may comprise an instruction name for a particular
microprocessor instruction, a first instruction input operand
comprising a destination register address to write an instruction
result, a second instruction input operand comprising a source
register address containing a value to be clipped, and a third
instruction input operand comprising a controlling parameter.
[0011] These and other embodiments and advantages of the present
invention will become apparent from the following detailed
description, taken in conjunction with the accompanying drawings,
illustrating by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] In order to facilitate a fuller understanding of the present
disclosure, reference is now made to the accompanying drawings, in
which like elements are referenced with like numerals. These
drawings should not be construed as limiting the present
disclosure, but are intended to be exemplary only.
[0013] FIG. 1 is a diagram illustrating the components of a
parameterizable clip instruction for either SISD or SIMD processor
architectures according to at least one embodiment of the
invention;
[0014] FIG. 2 illustrates the format of a 32-bit parameter input to
the parameterizable clip instruction of FIG. 1 according to at
least one embodiment of the invention;
[0015] FIG. 3 is a table illustrating the ways in which the
parameters of the parameterizable clip instruction may be
specified; and
[0016] FIG. 4 is a flow chart of an exemplary method of performing
a clip operation with a parameterizable clip instruction according
to at least one embodiment of the invention.
DETAILED DESCRIPTION
[0017] The following description is intended to convey a thorough
understanding of the embodiments described by providing a number of
specific embodiments and details involving microprocessor
architecture and systems and methods for performing clip operations
with a parameterizable clip instruction. It should be appreciated,
however, that the present invention is not limited to these
specific embodiments and details, which are exemplary only. It is
further understood that one possessing ordinary skill in the art,
in light of known systems and methods, would appreciate the use of
the invention for its intended purposes and benefits in any number
of alternative embodiments, depending upon specific design and
other needs.
[0018] Referring now to FIG. 1, a diagram illustrating the
components of a parameterizable clip instruction for either SISD or
SIMD processor architectures according to at least one embodiment
of the invention is provided. As discussed above, algorithms in
numerical computations, such as those common in video
encoding/decoding, often require results to be clipped to be within
a specified range of values. For example, in video processing, a
system will have a maximum pixel depth depending on the system's
resolution. If the value of an intermediate calculation result,
such as an interpolation or other calculation, lies outside the
maximum value, the final result will have to be clipped to a
saturation value, for example, the maximum pixel value.
[0019] Conventionally, clipping is implemented in software using a
sequence of instructions that first test the intermediate value and
then conditionally assign the final value, for example, if
value>maximum, then value=maximum. Such a software clipping
implementation incurs a high overhead due to the number of
calculations required to test each value. The sequential nature of
a software implementation makes it very difficult to optimize
in processors designed to exploit instruction level parallelism,
such as, for example, SISD reduced instruction set (RISC) machines
or very long instruction word (VLIW) machines. Some processors do
implement clipping at the hardware level using specialized
processor instructions; however, the clipping ranges of these
instructions are fixed to some value, typically a power of two.
Therefore, various embodiments of this invention provide a
parameterizable clip instruction for a microprocessor that enables
adjustment of clipping parameters.
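For comparison, the conventional test-and-assign approach described
above might look like the following sketch. It is a hypothetical
illustration, not code from the application. The text only mentions
the test against the maximum; the symmetric test against a minimum is
an added assumption.

```python
def clip_software(value, minimum, maximum):
    """Test the intermediate value, then conditionally assign the final value."""
    if value > maximum:        # "if value>maximum, then value=maximum"
        value = maximum
    elif value < minimum:      # assumed lower-bound test, not stated in the text
        value = minimum
    return value


def clip_all(values, minimum, maximum):
    """Every element pays for its own compare-and-branch sequence."""
    return [clip_software(v, minimum, maximum) for v in values]
```

Each value needs up to two data-dependent comparisons before it can
be stored, which is the per-value overhead that a single clip
instruction is meant to remove.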
[0020] Referring to FIG. 1, the instruction 100 labeled "VBCLIP"
contains three elements, rd, rb and rc. Rb and rd are the source
and destination register addresses, respectively. That is, rb is the
register address of the value to be clipped and rd is the register
address where the clipped value is to be written. Rc is the
controlling parameter for the instruction. The value of rc dictates
how the value located at address rb will be clipped. This
instruction permits eight 16-bit values to be clipped within the
range specified by the control parameter rc.
[0021] FIG. 2 illustrates the format of controlling parameter rc in
the form of a 32-bit operand and FIG. 3 is a table illustrating the
ways in which the parameters of the parameterizable clip
instruction may be specified. As seen from these Figures, in this
example, the input rc is a 32-bit input. However, it should be
appreciated that depending upon the native word size of the
processor, rc may be 16, 32, 64 or 128 bits, or another size. In
various embodiments, the most significant 16 bits, that is, bits 31
to 16, are unused, as seen in the table. In various embodiments,
bits 15 and 14 are reserved for the range type, while bits 13-0 are
used for the range specifier.
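The bit layout just described can be illustrated with a small
packing and unpacking sketch. The helper names `encode_rc` and
`decode_rc` and the constant names are hypothetical; only the field
positions (bits 15-14 for the range type, bits 13-0 for the range
specifier, bits 31-16 unused) come from the description.

```python
RANGE_TYPE_SHIFT = 14      # range type occupies bits 15..14
RANGE_TYPE_MASK = 0x3      # two bits
RANGE_SPEC_MASK = 0x3FFF   # range specifier N occupies bits 13..0


def encode_rc(range_type, n):
    """Pack a range type and range specifier into a controlling parameter."""
    if not 0 <= range_type <= RANGE_TYPE_MASK:
        raise ValueError("range type must fit in 2 bits")
    if not 0 <= n <= RANGE_SPEC_MASK:
        raise ValueError("range specifier must fit in 14 bits")
    return (range_type << RANGE_TYPE_SHIFT) | n


def decode_rc(rc):
    """Split a controlling parameter into (range_type, n).

    Bits 31..16 are masked off, matching their description as unused.
    """
    range_type = (rc >> RANGE_TYPE_SHIFT) & RANGE_TYPE_MASK
    n = rc & RANGE_SPEC_MASK
    return range_type, n
```

For example, range type 11 with N = 255 packs to 0xC0FF, and decoding
0xC0FF returns the pair (3, 255).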
[0022] In the example of FIG. 3, four range types are available.
Specifically, the range types [0, 2^N - 1], [-N, N], [-2^N,
2^N - 1] and [0, N] correspond to the 2-bit binary values 00, 01,
10 and 11, respectively. The remaining 14 least significant bits,
bits 13 to 0, are used to represent N, the range specifier. These
bits contain a binary number having a maximum value of
11111111111111 (16383). Thus, by using range type 01 or 11, ranges
not limited to powers of two may be used.
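The mapping from range type and range specifier to actual limits can
be written out as a short sketch. The function name `clip_bounds` is
hypothetical. The description does not say how large N may be for
the power-of-two types when the data are 16-bit values, so no such
restriction is applied here.

```python
def clip_bounds(range_type, n):
    """Return (low, high) for a 2-bit range type and range specifier N."""
    if range_type == 0b00:
        return 0, (1 << n) - 1            # [0, 2^N - 1]
    if range_type == 0b01:
        return -n, n                      # [-N, N]
    if range_type == 0b10:
        return -(1 << n), (1 << n) - 1    # [-2^N, 2^N - 1]
    if range_type == 0b11:
        return 0, n                       # [0, N]
    raise ValueError("range type must be a 2-bit value")
```

With N = 8, type 00 gives the familiar 8-bit pixel range [0, 255].
Type 11 with N = 219 gives [0, 219], a limit that is not one less
than a power of two and therefore cannot be expressed by type 00.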
[0023] In the table 110 of FIG. 3, the range specifier N is itself
a parameter supplied to the VBCLIP instruction 100. The range type
RT specifies one of the four possible ways the clipping range can be
defined using the range specifier N. Range types 00 and 10 are
designed to work with unsigned and signed clipping ranges,
respectively, while types 01 and 11 are designed to work with
signed and unsigned clipping ranges, respectively, that are not
powers of two. The
VBCLIP instruction is therefore a highly flexible processor
implementation of clipping. In addition, though the example of
FIGS. 2 and 3 describes VBCLIP as an SISD instruction, the
instruction syntax can easily be extended to SIMD architectures in
which both registers rb and rd are vector registers. In this case,
clipping, as specified in rc, is applied to each slice of the
vector register rb with the results assigned to the corresponding
slice in rd. An additional advantage of a SIMD version of the
clipping instruction is bypassing the data dependent sequential
nature of clipping operations that is awkward to implement in
parallel machines.
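The slice-wise behaviour can be sketched as follows. This is an
illustrative model only: the name `clip_vector` is hypothetical, the
vector register is modelled as a Python list of eight integers, and
the limits are passed in directly rather than decoded from rc.

```python
LANES = 8  # eight 16-bit slices per 128-bit vector register, per the description


def clip_vector(rb, low, high):
    """Clamp every slice of the source vector to [low, high].

    Each slice is clamped independently of its neighbours, so hardware
    is free to process all slices at the same time.
    """
    if len(rb) != LANES:
        raise ValueError("expected %d slices" % LANES)
    return [min(max(v, low), high) for v in rb]
```

Because no slice depends on the result of another, the compare and
select for all eight slices can be carried out without any
data-dependent branching between them.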
[0024] Referring now to FIG. 4, this figure is a flow chart of an
exemplary method for performing a clip operation with a
parameterizable clip instruction according to at least one
embodiment of the invention. The method begins in step 200 and
proceeds to step 205 where the clip instruction is fed to the
microprocessor pipeline. As discussed above in the context of FIGS.
1-3, in various embodiments, the instruction takes the form of a
name and three input operands: a destination address, a source
address and a controlling parameter.
Then, in step 210, the data to be operated on is fetched from the
source address specified in the instruction. Also, in step 215, the
range type indicated in the instruction is referenced to determine
the actual range after decoding the instruction. In various
embodiments, the range type is represented by two bits of the input
operand's controlling parameter rc. In various embodiments, a table
is stored in a memory register of the processor that maintains a
list of the range types indexed by the two-bit code. In step 220,
the range specifier is extracted from the instruction and, using the
range type, a range is determined. In step 225, the value fetched
in step 210 is clipped in accordance with the range determined in
step 220. In step 230 the result is written to the destination
address specified in the destination address input operand rd of
the instruction. Operation of the method stops in step 235.
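The steps of FIG. 4 can be followed in a small software model. The
sketch below is an assumption-laden illustration rather than the
disclosed hardware: the register file is modelled as a Python
dictionary, the register names are invented, rc is assumed to name a
register holding the scalar controlling parameter, and the range
table is modelled as a tuple of functions indexed by the two-bit
range type.

```python
# Hypothetical table of range types, indexed by the two-bit code.
RANGE_TABLE = (
    lambda n: (0, (1 << n) - 1),          # 00: [0, 2^N - 1]
    lambda n: (-n, n),                    # 01: [-N, N]
    lambda n: (-(1 << n), (1 << n) - 1),  # 10: [-2^N, 2^N - 1]
    lambda n: (0, n),                     # 11: [0, N]
)


def vbclip(regs, rd, rb, rc):
    """Model steps 210 to 230 for one clip instruction.

    `regs` maps register names to contents (an assumed representation).
    """
    values = regs[rb]                          # step 210: fetch the source data
    control = regs[rc]
    range_type = (control >> 14) & 0x3         # step 215: look up the range type
    n = control & 0x3FFF                       # step 220: extract N ...
    low, high = RANGE_TABLE[range_type](n)     # ... and determine the range
    clipped = [min(max(v, low), high) for v in values]  # step 225: clip
    regs[rd] = clipped                         # step 230: write the result
    return clipped
```

As a usage example, with `regs = {"vr1": [-50, 0, 100, 219, 220, 300,
1000, -1], "r2": (0b11 << 14) | 219}`, calling `vbclip(regs, "vr0",
"vr1", "r2")` stores `[0, 0, 100, 219, 219, 219, 219, 0]` in
`regs["vr0"]` and leaves the source register unchanged.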
[0025] The embodiments of the present inventions are not to be
limited in scope by the specific embodiments described herein. For
example, although many of the embodiments disclosed herein have
been described with reference to systems and methods for performing
clip operations with a parameterizable clip instruction, the
principles herein are equally applicable to other aspects of
microprocessor design and function. Indeed, various modifications
of the embodiments of the present inventions, in addition to those
described herein, will be apparent to those of ordinary skill in
the art from the foregoing description and accompanying drawings.
Thus, such modifications are intended to fall within the scope of
the following appended claims. Further, although some of the
embodiments of the present invention have been described herein in
the context of a particular implementation in a particular
environment for a particular purpose, those of ordinary skill in
the art will recognize that its usefulness is not limited thereto
and that the embodiments of the present inventions can be
beneficially implemented in any number of environments for any
number of purposes. Accordingly, the claims set forth below should
be construed in view of the full breadth and spirit of the
embodiments of the present inventions as disclosed herein.
* * * * *