U.S. patent application number 10/621908 was filed with the patent office on 2005-01-20 for byte execution unit for carrying out byte instructions in a processor.
This patent application is currently assigned to International Business Machines Corporation. Invention is credited to Dhong, Sang Hoo, Michael, Brad William, Mueller, Silvia Melitta, Oh, Hwa-Joon, Tran, Kevin D..
Application Number | 20050015576 10/621908 |
Document ID | / |
Family ID | 34063092 |
Filed Date | 2005-01-20 |
United States Patent
Application |
20050015576 |
Kind Code |
A1 |
Dhong, Sang Hoo ; et
al. |
January 20, 2005 |
Byte execution unit for carrying out byte instructions in a
processor
Abstract
A disclosed byte execution unit receives byte instruction
information and two operands, and performs an operation specified
by the byte instruction information upon one or both of the
operands, thereby producing a result. The byte instruction
specifies either a count ones in bytes operation, an average bytes
operation, an absolute differences of bytes operation, or a sum
bytes into halfwords operation. In one embodiment, the byte
execution unit includes multiple byte units. Each byte unit
includes multiple population counters, two compressor units, adder
input multiplexer logic, adder logic, and result multiplexer logic.
A data processing system is described including a processor coupled
to a memory system. The processor includes the byte execution unit.
The memory system includes a byte instruction, wherein the byte
instruction specifies either the count ones in bytes operation, the
average bytes operation, the absolute differences of bytes
operation, or the sum bytes into halfwords operation.
Inventors: |
Dhong, Sang Hoo; (Austin,
TX) ; Oh, Hwa-Joon; (Austin, TX) ; Michael,
Brad William; (Cedar Park, TX) ; Mueller, Silvia
Melitta; (St. Ingbert, DE) ; Tran, Kevin D.;
(Austin, TX) |
Correspondence
Address: |
IBM CORPORATION (CS)
C/O CARR LLP
670 FOUNDERS SQUARE
900 JACKSON STREET
DALLAS
TX
75202
US
|
Assignee: |
International Business Machines
Corporation
Armonk
NY
|
Family ID: |
34063092 |
Appl. No.: |
10/621908 |
Filed: |
July 17, 2003 |
Current U.S.
Class: |
712/221 ;
712/E9.017 |
Current CPC
Class: |
G06F 9/30036 20130101;
G06F 9/30014 20130101 |
Class at
Publication: |
712/221 |
International
Class: |
G06F 009/00 |
Claims
What is claimed is:
1. A byte execution unit, comprising: logic coupled to receive byte
instruction information and two operands and configured to perform
an operation specified by the byte instruction information upon at
least one of the two operands, thereby producing a result, wherein
the byte instruction specifies either a count ones in bytes
operation, an average bytes operation, an absolute differences of
bytes operation, or a sum bytes into halfwords operation.
2. The byte execution unit as recited in claim 1, wherein each of
the two operands comprises a plurality of bits, and wherein the
bits of the two operands are grouped to form a plurality of
corresponding 8-bit bytes.
3. The byte execution unit as recited in claim 2, wherein each of
the two operands comprises 128 bits, and wherein the bits of the
two operands are grouped to form 16 corresponding bytes.
4. The byte execution unit as recited in claim 2, wherein in the
event the byte instruction information specifies the count ones in
bytes operation, the byte execution unit is configured to count a
number of logic one bits in each of the bytes of one of the two
operands, and wherein the result is indicative of the number of
logic one bits in each of the bytes.
5. The byte execution unit as recited in claim 2, wherein in the
event the byte instruction information specifies the average bytes
operation, the byte execution unit is configured to compute
averages of corresponding bytes of the two operands, and wherein
the result is indicative of the averages.
6. The byte execution unit as recited in claim 2, wherein in the
event the byte instruction information specifies the absolute
differences of bytes operation, the byte execution unit is
configured to subtract a value of a byte of one of the two operands
from a value of a corresponding byte of the other operand, and to
compute an absolute value of a result of the subtraction operation,
and wherein the result is indicative of the absolute value of the
result of the subtraction operation.
7. The byte execution unit as recited in claim 2, wherein in the
event the byte instruction information specifies the sum bytes into
halfwords operation, the byte execution unit is configured to
compute sums of values of a number of consecutive bytes of the two
operands, and wherein the result is indicative of the sums.
8. A byte execution unit, comprising: pre-processing logic coupled
to receive a plurality of operands and configured to perform an
operation upon the operands dependent upon an operation specified
by a byte instruction, thereby producing an intermediate result;
adder logic coupled to receive the intermediate result and
configured to perform an addition operation upon the intermediate
result, thereby producing a sum and a sum+1; and post-processing
logic coupled to receive the sum and sum+1 and configured to
perform an operation upon the sum and sum+1 dependent upon the
operation specified by a byte instruction, thereby producing a
result.
9. The byte execution unit as recited in claim 8, wherein the byte
instruction specifies either a count ones in bytes operation, an
average bytes operation, an absolute differences of bytes
operation, or a sum bytes into halfwords operation.
10. The byte execution unit as recited in claim 8, wherein the
pre-processing logic and the post-processing logic are each coupled
to receive control signals indicative of the operation specified by
the byte instruction.
11. The byte execution unit as recited in claim 8, wherein the
pre-processing logic comprises population counter logic coupled to
receive the operands and configured to produce population output
signals indicative of numbers of logic ones in portions of the
operands.
12. The byte execution unit as recited in claim 8, wherein the
pre-processing logic comprises compressor logic coupled to receive
the operands and configured to perform a compression function.
13. The byte execution unit as recited in claim 8, wherein the
post-processing logic comprises end-around carry logic configured
to perform an end-around carry function.
14. The byte execution unit as recited in claim 8, wherein the
post-processing logic is configured to perform bit shift
operations.
15. A logic system, comprising: compressor logic coupled to receive
an 4n-bit operand and configured to perform a compression operation
upon the operand, thereby producing a plurality of compressor
output signals; and adder logic coupled to receive the compressor
output signals and configured to perform an addition operation upon
the compressor output signals, thereby producing an (n+2)-bit
result.
16. The logic system as recited in claim 15, wherein the addition
operation comprises an (n+1)-bit addition operation.
17. The logic system as recited in claim 16, wherein the adder
logic comprises an n-bit adder.
18. The logic system as recited in claim 17, wherein n=8.
19. A byte execution unit, comprising: a plurality of byte units,
wherein each byte unit comprises: a plurality of population
counters each coupled to receive a portion of a first operand and
configured to produce a population output signal indicative of a
number of logic ones in the corresponding portion of the first
operand; a first compressor unit coupled to receive a portion of
the first operand and configured to produce a first plurality of
compressor output signals dependent upon the first operand; a
second compressor unit coupled to receive a portion of the second
operand and configured to produce a second plurality of compressor
output signals dependent upon the second operand; adder input
multiplexer logic coupled to receive the population output signals
and the first and second pluralities of compressor output signals
as data input signals, and a first plurality of control signals,
and configured to produce a portion of the data input signals as
output signals dependent upon the first plurality of control
signals; adder logic coupled to receive the output signals produced
by the adder input multiplexer logic and configured to produce a
plurality of adder output signals dependent upon the output signals
produced by the adder input multiplexer logic; and result
multiplexer logic coupled to receive the adder output signals as
data input signals, and a second plurality of control signals, and
configured to produce a portion of the data input signals as a
result signal dependent upon the second plurality of control
signals; wherein the byte execution unit is coupled to receive byte
instruction information, and wherein the first and second
pluralities of control signals are indicative of the byte
instruction information, and wherein the byte instruction
information specifies either a count ones in bytes operation, an
average bytes operation, an absolute differences of bytes
operation, or a sum bytes into halfwords operation.
20. The byte execution unit as recited in claim 19, wherein the
first and second operands each comprise a plurality of bits, and
wherein the bits of the first and second operands are grouped to
form a plurality of corresponding 8-bit bytes.
21. The byte execution unit as recited in claim 20, wherein the
first and second operands each comprise 128 bits, and wherein the
bits of the first and second operands are grouped to form 16
corresponding bytes.
22. The byte execution unit as recited in claim 20, wherein in the
event the byte instruction information specifies the count ones in
bytes operation, the result signal is indicative of a number of
logic one bits in each of the bytes of the first operand.
23. The byte execution unit as recited in claim 20, wherein in the
event the byte instruction information specifies the average bytes
operation, the result signal is indicative of averages of
corresponding bytes of the first and second operands.
24. The byte execution unit as recited in claim 20, wherein in the
event the byte instruction information specifies the absolute
differences of bytes operation, the result signal is indicative of
an absolute value of a result of subtraction operations wherein
bytes of the first operand are subtracted from the corresponding
bytes of the second operand.
25. The byte execution unit as recited in claim 20, wherein in the
event the byte instruction information specifies the sum bytes into
halfwords operation, the result signal is indicative of sums of
values of consecutive bytes of the first and second operands.
26. The byte execution unit as recited in claim 20, wherein the
first compressor unit of one of the byte units is coupled to
receive a 32-bit portion A[0:31] of the first operand, and wherein
the first plurality of compressor output signals produced by the
second compressor unit comprises output signals F2[0:7], F2[8], and
F3[0:7], and wherein the F2[0] signal conveys a carry value
resulting from an addition operation A[0]+A[8]+A[16], and wherein
the F2[1:8] signal conveys a sum vector, and wherein the F2[8]
signal conveys a sum value resulting from an addition operation
A[7]+A[15]+A[23]+A[31], and wherein the F3[0:7] signal conveys a
carry vector.
27. The byte execution unit as recited in claim 20, wherein the
second compressor unit of one of the byte units is coupled to
receive a portion B[0:31] of the second operand, and wherein the
second plurality of compressor output signals produced by the
second compressor unit comprises output signals F0[0:7], F0[8], and
F1[0:7], and wherein the F0[0] signal conveys a carry value
resulting from an addition operation B[0]+B[8]+B[1 6], and wherein
the F0[1 :8] signal conveys a sum vector, and wherein the F0[8]
signal conveys a sum value resulting from an addition operation
B[7]+B[15]+B[23]+B[31], and wherein the F1[0:7] signal conveys a
carry vector.
28. The byte execution unit as recited in claim 20, wherein the
adder logic comprises a plurality of 8-bit compound adders.
29. A data processing system, comprising: a memory system
comprising a byte instruction, wherein the byte instruction
specifies either a count ones in bytes operation, an average bytes
operation, an absolute differences of bytes operation, or a sum
bytes into halfwords operation; and a processor coupled to the
memory system and configured to fetch and execute instructions from
the memory system, wherein the processor comprises: a byte
execution unit coupled to receive byte instruction information and
two operands and configured to perform an operation specified by
the byte instruction information upon at least one of the two
operands, thereby producing a result.
30. The data processing system as recited in claim 29, wherein each
of the two operands comprises a plurality of bits, and wherein the
bits of the two operands are grouped to form a plurality of
corresponding 8-bit bytes.
31. The data processing system as recited in claim 30, wherein each
of the two operands comprises 128 bits, and wherein the bits of the
two operands are grouped to form 16 corresponding bytes.
Description
TECHNICAL FIELD
[0001] This invention relates generally to data processing systems
and, more particularly, to instruction execution units of
processors of data processing systems.
BACKGROUND OF THE INVENTION
[0002] In many audio-visual or multimedia applications involving
images, sound, and/or moving pictures (i.e., videos), the basic
unit of data is the 8-bit byte. An 8-bit data byte can represent
any one of 2.sup.8=256 different binary levels, and two 8-bit bytes
can represent any one of 2.sup.16=65,536 different binary levels.
The levels may be equally sized (linear quantization) or different
sizes (e.g., logarithmic quantization). For example, in the United
States, telephone voice signals are typically sampled using
logarithmic u-law encoding.
[0003] Images and individual frames of moving pictures or videos
are made up of two-dimensional arrays of picture elements (i.e.,
"pixels") called bitmaps. Each pixel is typically represented by a
collection of bits conveying intensity and/or color. For example, a
single bit allows only two values (e.g., black and white), while 8
bits allows 2.sup.8=256 different values (e.g., black, white, and
254 intermediate shades of gray).
[0004] The acronym "MPEG" is commonly used to refer to the family
of standards developed by the Moving Picture Experts Group (MPEG)
for coding audio-visual information (e.g., movies, video, music) in
a digital compressed format. MPEG data compression has greatly
facilitated the storing and distribution of digital video and audio
signals.
[0005] In general, MPEG video data compression predicts motion from
frame to frame in time, then uses discrete cosine transforms (DCTs)
to organize redundancy in other dimensions (i.e., other "spatial
directions"). Motion prediction is typically performed on
16.times.16 pixel blocks called "macroblocks," and DCTs are
performed on 8.times.8 pixel blocks of the macroblocks. For
example, given a 16.times.16 macroblock in a current frame, an
attempt is made to find a closely matching macroblock in a previous
or future frame. If a close match is found, DCTs are performed on
differences between the 8.times.8 pixel blocks of the current
macroblock and the close match. On the other hand, if a close match
is not found, DCTs are performed directly on the 8.times.8 pixel
blocks of the current macroblock. The resulting DCT coefficients
are then divided by a determined value (i.e., "quantized") and
Huffman coded using fixed tables.
[0006] In the MPEG standards, the fundamental unit of data is the
8-bit byte. Each pixel of a video frame typically has three color
components, each represented by one or more bytes. For example,
each pixel may be represented by a 24-bit red-green-blue (RGB)
value having one byte for red, one byte for green, and one byte for
blue.
[0007] It would thus be advantageous to have a computer system
capable of efficiently operating on 8-bit data bytes.
SUMMARY OF THE INVENTION
[0008] A disclosed byte execution unit receives byte instruction
information and two operands, and performs an operation specified
by the byte instruction information upon one or both of the
operands, thereby producing a result. The byte instruction
specifies either a count ones in bytes operation, an average bytes
operation, an absolute differences of bytes operation, or a sum
bytes into halfwords operation. In one embodiment, the byte
execution unit includes multiple byte units. Each byte unit
includes multiple population counters, two compressor units, adder
input multiplexer logic, adder logic, and result multiplexer
logic.
[0009] A data processing system is described including a processor
coupled to a memory system. The processor includes the byte
execution unit. The memory system includes a byte instruction,
wherein the byte instruction specifies either the count ones in
bytes operation, the average bytes operation, the absolute
differences of bytes operation, or the sum bytes into halfwords
operation.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The invention may be understood by reference to the
following description taken in conjunction with the accompanying
drawings, in which like reference numerals identify similar
elements, and in which:
[0011] FIG. 1 is a diagram of one embodiment of a data processing
system including a processor coupled to a memory system, wherein a
byte instruction is stored in the memory system, and wherein the
processor includes a byte execution unit for performing operations
on 8-bit data units (i.e., byte operations);
[0012] FIG. 2 is a diagram of one embodiment of the byte
instruction of FIG. 1;
[0013] FIG. 3 is a diagram of one embodiment of the byte execution
unit of FIG. 1, wherein the byte execution unit includes four byte
units and a control unit;
[0014] FIGS. 4A-4D in combination form a diagram of one embodiment
of a representative one of the byte units of FIG. 3, wherein FIGS.
4A and 4C each includes a 4:2 compressor;
[0015] FIG. 5 is a diagram of one embodiment of the 4:2 compressor
of FIG. 4A; and
[0016] FIG. 6 is a diagram illustrating how logic of the byte unit
of FIGS. 4A-4D computes a 10-bit sum of bytes using a 4:2
compressor and an 8-bit compound adder.
DETAILED DESCRIPTION
[0017] In the following discussion, numerous specific details are
set forth to provide a thorough understanding of the present
invention. However, those skilled in the art will appreciate that
the present invention may be practiced without such specific
details. In other instances, well-known elements have been
illustrated in schematic or block diagram form in order not to
obscure the present invention in unnecessary detail. Additionally,
for the most part, details concerning network communications,
electromagnetic signaling techniques, and the like, have been
omitted inasmuch as such details are not considered necessary to
obtain a complete understanding of the present invention, and are
considered to be within the understanding of persons of ordinary
skill in the relevant art.
[0018] It is further noted that, unless indicated otherwise, all
functions described herein may be performed in either hardware or
software, or some combination thereof. In a preferred embodiment,
however, the functions are performed by a processor, such as a
computer or an electronic data processor, in accordance with code,
such as computer program code, software, and/or integrated circuits
that are coded to perform such functions, unless indicated
otherwise.
[0019] FIG. 1 is a diagram of one embodiment of a data processing
system 100 including a processor 102 coupled to a memory system
104. The processor 102 includes a byte execution unit 106 for
performing operations on 8-bit data units (i.e., byte operations).
In general, the processor 102 fetches and executes instructions of
a predefined instruction set stored in the memory system 104. As
illustrated in FIG. 1, the memory system 104 includes a software
program (i.e., code) 108 including instructions from the
instruction set. The code 108 includes a byte instruction 110 of
the instruction set. The byte instruction 110 specifies a byte
operation, and the byte execution unit 106 performs the specified
byte operation.
[0020] As indicated in FIG. 1, the processor 102 receives a clock
signal "CLOCK."The processor 102 executes instructions dependent
upon the CLOCK signal. In one embodiment, the processor 102
includes several functional units, and operations performed by the
functional units are synchronized by the CLOCK signal.
[0021] In the embodiment of FIG. 1, the byte instruction 110 may be
one of several byte instructions each specifying a different byte
operation. In general, the byte execution unit 106 is used to carry
out the byte operation specified by the byte instruction 110. When
the processor 102 fetches the byte instruction 110 from the memory
system 104, the byte execution unit 106 receives byte instruction
information and operand values. The byte instruction information
indicates the particular byte instruction and specifies the byte
operation to be performed. The byte execution unit 106 performs the
specified byte operation, thereby producing a result value, and
provides the result value to other circuitry within the processor
102 as indicated in FIG. 1.
[0022] In the embodiment of FIG. 1, the processor 102 also includes
a register file 112. As described below, the operand values
provided to the byte execution unit 106 are values stored in
specific registers of the register file 112, and the result value
is to be stored in a specific register of the register file
112.
[0023] In one embodiment, the instruction set executable by the
processor 102 of FIG. 1 includes the following four byte
instructions described in detail below: "CNTB" (count ones in
bytes), "AVGB" (average bytes), "ABSDB" (absolute differences of
bytes), and "SUMB" (sum bytes into half words). As used herein, the
term "word" refers to a 32-bit data unit, thus a half word is a
16-bit data unit. The byte instruction 110 of FIG. 1 may be, for
example, a CNTB instruction, an AVGB instruction, an ABSDB
instruction, or a SUMB instruction.
[0024] FIG. 2 is a diagram of one embodiment of the byte
instruction 110 of FIG. 1. In the embodiment of FIG. 2, the byte
instruction 10 includes an opcode field 200, a source register 2
field 202, a source register 1 field 204, and a destination
register field 206. The opcode field 200 contains a value
identifying the instruction as a byte instruction and specifying
the byte operation to be performed.
[0025] The source register 2 field 202 specifies a register of the
register file 112 of FIG. 1 containing a first operand value. The
source register 1 field 204 specifies another register of the
register file 112 containing a second operand value. The
destination register field 206 specifies a third register of the
register file 112 in which the result value produced by the byte
execution unit 106 of FIG. 1 is to be stored.
[0026] In one embodiment, the byte instruction 110 is the "count
ones in bytes" instruction having the mnemonic "CNTB." The opcode
field 200 of the CNTB instruction is an 11-bit field identifying
the instruction as the CNTB instruction, the source register 2
field 202 is ignored, the source register 1 field 204 specifies a
source register "RA," and the destination register field 206
specifies a destination register "RT." In one embodiment, the
register file 112 of FIG. 1 includes 128 128-bit registers each
having a different 7-bit address. The source register RA is a
128-bit register containing 16 8-bit "byte slots" referred to as
RA[0] through RA[15]. Similarly, the destination register RT is a
128-bit register containing 16 8-bit byte slots referred to as
RT[0] through RT[15].
[0027] An assembly language instruction using the CNTB instruction
is expressed "CNTB RT,RA" wherein the RT register is the
destination register and the RA register is the source register. In
general, execution of the CNTB instruction involves carrying out
the following operations for each of 16 byte slots of the source
register RA and the destination register RT: (i) count the number
of bits in a byte slot of the source register RA, and (ii) store
the count in the corresponding byte slot of the destination
register. The following pseudo code expresses the operation the
processor 102 of FIG. 1 carries out when executing the CNTB
instruction:
1 FOR J = 0 TO 15 `16 BYTE SLOTS C = 0 `COUNT OF ONES B = RA[J]
`SET B TO VALUE OF BYTE SLOT J OF RA FOR M = 0 TO 7 IF B[M] = 1
THEN C = C + 1 END RT[J] = C `STORE COUNT C IN BYTE SLOT J OF RT
END
[0028] In another embodiment, the byte instruction 110 is the
"average bytes" instruction having the mnemonic "AVGB." The opcode
field 200 of the AVGB instruction is an 11-bit field identifying
the instruction as the AVGB instruction, the source register 2
field 202 specifies a source register "RB," the source register 1
field 204 specifies the source register RA, and the destination
register field 206 specifies the destination register RT.
[0029] As described above, the source register RA is a 128-bit
register containing 16 8-bit byte slots referred to as RA[0]
through RA[15], and the destination register RT is a 128-bit
register containing 16 8-bit byte slots referred to as RT[0]
through RT[15]. Similarly, the source register RB is a 128-bit
register containing 16 8-bit byte slots referred to as RB[0]
through RB[15].
[0030] An assembly language instruction using the AVGB instruction
is expressed "AVGB RT,RA,RB" wherein the RT register is the
destination register and the RA and RB registers are the source
registers. In general, execution of the AVGB instruction involves
carrying out the following operations for each of the 16 byte slots
of the source and destination registers: (i) compute an average of
values stored in the corresponding byte slots of the RA and RB
source registers, and (ii) store the average of values in the
corresponding byte slot of the destination register. The following
pseudo code expresses the operation the processor 102 of FIG. 1
carries out when executing the AVGB instruction:
2 FOR J = 0 TO 15 `16 BYTE SLOTS RT[J] = (RA[J] + RB[J] + 1)
>> 1 `RIGHT SHIFT TO DIVIDE BY 2 END
[0031] In another embodiment, the byte instruction 110 is the
"absolute difference of bytes" instruction having the mnemonic
"ABSDB." The opcode field 200 of the ABSDB instruction is an 11-bit
field identifying the instruction as the ABSDB instruction, the
source register 2 field 202 specifies the source register RB, the
source register 1 field 204 specifies the source register RA, and
the destination register field 206 specifies the destination
register RT.
[0032] An assembly language instruction using the ABSDB instruction
is expressed "ABSDB RT,RA,RB" wherein the RT register is the
destination register and the RA and RB registers are the source
registers. In general, execution of the ABSDB instruction involves
carrying out the following operations for each of 16 byte slots of
the source and destination registers: (i) subtract a value stored
in a byte slot of the RA source register from a value stored in the
corresponding byte slot of the RB source register, (ii) compute an
absolute value of a result of the subtraction operation, and (iii)
store the absolute value of the result of the subtraction operation
in the corresponding byte slot of the destination register. The
following pseudo code expresses the operation the processor 102 of
FIG. 1 carries out when executing the ABSDB instruction:
3 FOR J = 0 TO 15 `16 BYTE SLOTS RT[J] = ABS(RB[J] - RA[J])
`ABSOLUTE DIFFERENCE OF OPERANDS END
[0033] In another embodiment, the byte instruction 110 is the "sum
bytes into half words" instruction having the mnemonic "SUMB." The
opcode field 200 of the SUMB instruction is an 11-bit field
identifying the instruction as the SUMB instruction, the source
register 2 field 202 specifies the source register RB, the source
register 1 field 204 specifies the source register RA, and the
destination register field 206 specifies the destination register
RT.
[0034] An assembly language instruction using the SUMB instruction
is expressed "SUMB RT,RA,RB" wherein the RT register is the
destination register and the RA and RB registers are the source
registers. In general, execution of the SUMB instruction involves
carrying out the following operations for each of 4 16-bit (double
byte) word slots: (i) compute a first sum of values stored in the
next 4 consecutive byte slots of the source register RB, (ii) store
the sum in the next 2 consecutive byte slots of the destination
register RT, (iii) compute a second sum of values stored in the
next 4 consecutive byte slots of the source register RA, and (iv)
store the second sum in the next 2 consecutive byte slots of the
destination register RT. The following pseudo code expresses the
operation the processor 102 of FIG. 1 carries out when executing
the SUMB instruction:
[0035] RT[0:1]=RB[0]+RB[1]+RB[2]+RB[3]
[0036] RT[2:3]=RA[0]+RA[1]+RA[2]+RA[3]
[0037] RT[4:5]=RB[4]+RB[5]+RB[6]+RB[7]
[0038] RT[6:7]=RA[4]+RA[5]+RA[6]+RA[7]
[0039] RT[10:11]=RA[8]+RAB[9]+RA[10]+RA[11]
[0040] RT[12:13]=RB[12]+RB[13]+RB[14]+RB[15]
[0041] RT[14:15]=RA[12]+RAB[13]+RA[14]+RA[15]
[0042] FIG. 3 is a diagram of one embodiment of the byte execution
unit 106 of FIG. 1. In the embodiment of FIG. 3, the byte execution
unit 106 includes four byte units 300A-300D and a control unit 302.
The control unit receives the byte instruction information
specifying the byte operation, and produces control signals
dependent upon the byte instruction information. Each of the four
byte units 300A-300D receives 32-bit portions of two 128-bit
operands "A" and "B," and control signals from the control unit
302, and produces a 32-bit portion of a 128-bit result value
"RESULT" dependent upon the control signals. For example, as
indicated in FIG. 3, the byte unit 300A receives the 32-bit
portions "A[0:31]" and "B[0:31]" of the respective operands A and
B, and control signals from the control unit 302, and performs an
operation on the operand values dependent on the control signals,
thereby producing a 32-bit result value "RESULT[0:31]."
[0043] In the embodiment described below, ordered sets of bits are
numbered such that higher valued (i.e., more significant) bits have
lower numbers than lower valued (i.e., less significant) bits. For
example, the A[0:31] operand includes bits A[0] through A[31],
wherein the bit A[0] is the highest valued (most significant) bit
and bit A[31] is the lowest valued (least significant) bit.
[0044] In one embodiment, the four byte units 300A-300D are
substantially identical and operate similarly. FIGS. 4A-4D in
combination form a diagram of one embodiment of the byte unit 300A
of FIG. 3. A first population counter unit 400A, a first 4:2
compressor 402A, and a portion of adder input multiplexer (MUX)
logic 404 are shown in FIG. 4A. The population counter unit 400A
includes four separate population counters 406A-406D. Each of the
population counters 406A-406D receives a different 4-bit quantity
of the A[0:15] portion of the A[0:31] operand and produces an
output signal indicating a number of logic `1`s in the received
4-bit quantity.
[0045] The 4:2 compressor 402A receives the B[0:31] operand and
produces output signals "F0[0:7]," "F0[8]," and "F1[0:7]" wherein
the F0[0] signal conveys a carry value resulting from an addition
operation (B[0]+B[8]+B[16]), the F0[1:8] signal conveys a sum
vector (see FIG. 5), the F0[8] signal conveys a sum value resulting
from an addition operation (B[7]+B[15]+B[23]+B[31]), and the
F1[0:7] signal conveys a carry vector (see FIG. 5).
[0046] The portion of the adder input MUX logic 404 shown in FIG.
4A includes several multiplexers receiving output signals produced
by the population counter unit 400 and the 4:2 compressor 402 as
input signals. Each of the multiplexers also receives one or more
control signals from the control unit 302 of FIG. 3, and produces
one of the input signals as an output signal dependent upon the one
or more control signals. The portion of the adder input MUX logic
404 shown in FIG. 4A produces output signals "X0[0:7]" or simply
"X0," "Y0[0:7]" or simply "Y0," "C0," "X1[0:7]" or simply "X1,"
"Y1[0:7]" or simply "Y1," and "C1."
[0047] Table 1 below shows the output signals produced by the
portion of the adder input MUX logic 404 shown in FIG. 4A:
4TABLE 1 Output Signals Produced by The Portion of the Adder Input
MUX Logic 404 Shown in FIG. 4A. INSTR. X0[0:7] Y0[0:7] C0 X1[1:7]
Y1[0:7] C1 ABSDB A[0:7] B_b[0:7] 0 A[8:15] B_b[8:15] 0 AVGB A[0:7]
B[0:7] 1 A[8:15] B[8:15] 1 CNTB `0000` `0000` 0 `0000` `0000` 0
+P0[0:2] +P1[0:2] +P2[0:2] +P3[0:2] SUMB 0 0 0 F0[0:7] F1[0:7]
0
[0048] As used herein, the suffix "_b" following a signal name
indicates the logical complement of the signal. For example, the
B_b[0:7] signal is the bitwise logical complement of the B[0:7]
signal. The `+` symbols in Table 1 above represent a concatenation
operation.
[0049] In FIG. 4B, two 8-bit compound adders 408A and 408B and a
portion of result multiplexer (MUX) logic 410 are shown. The 8-bit
compound adder 408A receives the X0, Y0, and C0 signals produced by
the portion of the MUX logic 404 shown in FIG. 4A, and produces
output signals "S0[0]," "SUM1/T0[0:8]," "SUM_b/S0_b[0:8]," and
"SUM/S0[0:8]," wherein the SUM_b/S0_b[0:8] signal is the bitwise
logical complement of the SUM/S0[0:8] signal.
[0050] The compound adder 408A produces the SUM/S0[0:8] signal by
summing X0, Y0, and C0, i.e. SUM/S0[0:8]=X0[0:7]+Y0[0:7]+C0. The
most significant bit S0[0] is the carry out, and bits S0[1:8]
represent the 8-bit sum. The SUM1/T0[0:8] signal is produced by
summing X0, Y0, and a carry in of `1`, i.e.,
SUM1/T0[0:8]=X0[0:7]+Y0[0:7]+1.
[0051] The 8-bit compound adder 408B receives the X1, Y1, and C1
signals produced by the portion of the MUX logic 404 shown in FIG.
4A, and produces output signals "S1[0]," "SUM1/T1[0:8],"
"SUM_b/S1_b[0:8]," and "SUM/S1[0:8]," wherein the SUM_b/S1_b[0:8]
signal is the bitwise logical complement of the SUM/S1[0:8]
signal.
[0052] The compound adder 408B produces the SUM/S1[0:8] signal by
summing X1, Y1, and C1, i.e. SUM/S1[0:8]=X[0:7]+Y1[0:7]+C1. The
most significant bit S1[0] is the carry out, and bits S1[1:8]
represent the 8-bit sum. The SUM1/T1[0:8] signal is produced by
summing X1, Y1, and a carry in of `1`, i.e.,
SUM1/T1[0:8]=X1[0:7]+Y1[0:7]+1.
[0053] The portion of the result MUX logic 410 shown in FIG. 4B
includes several multiplexers receiving output signals produced by
the 8-bit compound adders 408A and 408B as input signals. Each of
the multiplexers also receives one or more control signals from the
control unit 302 of FIG. 3, and produces one of the input signals
as an output signal dependent upon the one or more control signals.
The portion of the result MUX logic 410 shown in FIG. 4A produces
output signals "RESULT[0:7]" and "RESULT[8:15]."
[0054] Table 2 below shows the output signals produced by the
portion of the result MUX logic 410 shown in FIG. 4B:
5TABLE 2 Output Signals Produced by The Portion of the Result MUX
Logic 410 Shown in FIG. 4B. CONDITION RESULT[0:7] RESULT[8:15]
ABSDB IF S0[0] = 1 IF S1[0] = 1 THEN T0[1:8] THEN T1[1:8] ELSE
S0_b[1:8] ELSE S1_b[1:8] AVGB T0[0:7] T1[0:7] CNTB `0000`S0[5:8]
`0000`S1[5:8] SUMB `000000`S1[0:1] S1[2:8]F0[8]
[0055] It is noted that, as indicated in Table 2 above, when a SUMB
instruction is fetched and executed, the 8-bit compound adder 408B
produces a 10-bit result conveyed by the concatenated RESULT[0:7]
and RESULT[8:15] signals.
[0056] A second population counter unit 400B, a second 4:2
compressor 402B, and another portion of the adder input MUX logic
404 are shown in FIG. 4C. The population counter unit 400B includes
four separate population counters 412A-412D. Each of the population
counters 412A-412D receives a different 4-bit quantity of the
A[16:31] portion of the A[0:31] operand and produces an output
signal indicating a number of logic `1`s in the received 4-bit
quantity.
[0057] The 4:2 compressor 402B receives the A[0:31] operand and
produces output signals "F2[0:7]," "F2[8]," and "F3[0:7]" wherein
the F2[0] signal conveys a carry value resulting from an addition
operation (A[0]+A[8]+A[16]), the F2[1:8] signal conveys a sum
vector, the F2[8] signal conveys a sum value resulting from an
addition operation (A[7]+A[15]+A[23]+A[31]), and the F3[0:7] signal
conveys a carry vector.
[0058] The portion of the adder input MUX logic 404 shown in FIG.
4C includes several multiplexers receiving output signals produced
by the population counter unit 400B and the 4:2 compressor 402B as
input signals. Each of the multiplexers also receives one or more
control signals from the control unit 302 of FIG. 3, and produces
one of the input signals as an output signal dependent upon the one
or more control signals. The portion of the adder input MUX logic
404 shown in FIG. 4C produces output signals "X2[0:7]" or simply
"X2," "Y2[0:7]" or simply "Y2," "C2," "X3[0:7]" or simply "X3,"
"Y3[0:7]" or simply "Y3," and "C3."
[0059] Table 3 below shows the output signals produced by the
multiplexers of the portion of the MUX logic 404 shown in FIG.
4C:
6TABLE 3 Output Signals Produced by The Portion of the Adder Input
MUX Logic 404 Shown in FIG. 4C. INSTR. X2[0:7] Y2[0:7] C2 X3[1:7]
Y3[0:7] C3 ABSD A[16:23] B_b[16:23] 0 A[24:31] B_b[24:31] 0 AVGB
A[16:23] B[16:23] 1 A[24:31] B[24:31] 1 CNTB `0000` `0000` 0 `0000`
`0000` 0 +P4[0:2] +P5[0:2] +P6[0:2] +P7[0:2] SUMB 0 0 0 F2[0:7]
F3[0:7] 0
[0060] In Table 3 above, the `+` signal represents the
concatenation operation. In FIG. 4D, two more 8-bit compound adders
408C and 408D and another portion of the result MUX logic 410 are
shown. The 8-bit compound adder 408C receives the X2, Y2, and C2
signals produced by the portion of the MUX logic 404 shown in FIG.
4C, and produces output signals "S2[0]," "SUM1/T2[0:8],"
"SUM_b/S2_b[0:8]," and "SUM/S2[0:8]," wherein the SUM_b/S2_b[0:8]
signal is the bitwise logical complement of the SUM/S2[0:8]
signal.
[0061] The compound adder 408C produces the SUM/S2[0:8] signal by
summing X2, Y2, and C2, i.e. SUM/S2[0:8]=X2[0:7]+Y2[0:7]+C2. The
most significant bit S2[0] is the carry out, and bits S2[1:8]
represent the 8-bit sum. The SUM1/T2[0:8] signal is produced by
summing X2, Y2, and a carry in of `1`, i.e.,
SUM1/T2[0:8]=X2[0:7]+Y2[0:7]+1.
[0062] The 8-bit compound adder 408D receives the X3, Y3, and C3
signals produced by the portion of the MUX logic 404 shown in FIG.
4C, and produces output signals "S3[0] ," "SUM1/T3[0:8],"
"SUM_b/S3_b[0:8]," and "SUM/S3[0:8]," wherein the SUM_b/S3_b[0:8]
signal is the bitwise logical complement of the SUM/S3[0:8]
signal.
[0063] The compound adder 408D produces the SUM/S3[0:8] signal by
summing X3, Y3, and C3, i.e. SUM/S3[0:8]=X3[0:7]+Y3[0:7]+C3. The
most significant bit S3[0] is the carry out, and bits S3[1:8]
represent the 8-bit sum. The SUM1/T3[0:8] signal is produced by
summing X3, Y3, and a carry in of `1`, i.e.,
SUM1/T3[0:8]=X3[0:7]+Y3[0:7]+1.
[0064] The portion of the result MUX logic 410 shown in FIG. 4D
includes several multiplexers receiving output signals produced by
the 8-bit compound adders 408C and 408D as input signals. Each of
the multiplexers also receives one or more control signals from the
control unit 302 of FIG. 3, and produces one of the input signals
as an output signal dependent upon the one or more control signals.
The portion of the result MUX logic 410 shown in FIG. 4D produces
output signals "RESULT[16:23]" and "RESULT[24:3 1]."
[0065] Table 4 below shows the output signals produced by the
portion of the result MUX logic 410 shown in FIG. 4D:
7TABLE 4 Output Signals Produced by The Portion of the Result MUX
Logic 410 Shown in FIG. 4D. CONDITION RESULT[0:7] RESULT[8:15] ABSD
IF S0[0] = 1 IF S3[0] = 1 THEN T2[1:8] THEN T3[1:8] ELSE S2_b[1:8]
ELSE S3_b[1:8] AVGB T2[0:7] T3[0:7] CNTB `0000`S2[5:8]
`0000`S3[5:8] SUMB `000000`S3[0:1] S3[2:8]F2[8]
[0066] It is noted that, as indicated in Table 4 above, when a SUMB
instruction is fetched and executed, the 8-bit compound adder 408D
produces a 10-bit result conveyed by the concatenated RESULT[16:23]
and RESULT[24:31] signals.
[0067] FIG. 5 is a diagram of one embodiment of the 4:2 compressor
402A of FIG. 4A. In the embodiment of FIG. 5, the 4:2 compressor
402A includes several full adder (FA) circuits 500 and a half adder
circuit 502 arranged to form a two-dimensional array having two
rows and eight columns. Each full adder circuit 500 receives three
bits as inputs, and produces a 1-bit sum "S" and a 1-bit carry "C."
The half adder circuit 502 receives two bits as inputs, and
produces a 1-bit sum "S" and a 1-bit carry "C." Each row of adders
essentially functions as a 3:2 compressor, the two 3:2 compressors
being connected in cascade as shown in FIG. 5 to form the 4:2
compressor 402A.
[0068] The byte execution unit 106 of FIGS. 1 and 3 includes four
separate byte units. Each of the four byte units includes two
population counter units, two 4:2 compressors, and adder input MUX
logic forming pre-processing logic. In general, the pre-processing
logic performs an operation upon input operands dependent upon
control signals produced by a corresponding control unit.
[0069] Each of the four byte units also includes four 8-bit
compound adders that constitute adder logic. In general, the adder
logic receives signals produced by the pre-processing logic and
performs an addition operation upon the received signals, thereby
producing a result. In some cases the result includes a sum signal
and a sum+1 signal.
[0070] Each of the four byte units also includes result MUX logic
forming post-processing logic. The post-processing logic receives
the result produced by the adder logic and performs an operation
upon the result dependent upon control signals produced by the
corresponding control unit.
[0071] For example, as described above, an assembly language
instruction using the "absolute differences of bytes" instruction
ABSDB is expressed "ABSDB RT,RA,RB" wherein the RT register is the
destination register and the RA and RB registers are the source
registers. In general, for each byte, RT=ABS(RA-RB). The ABSDB
instruction may be implemented as RT=((RA+RB_b+EAC) XOR EAC) where
EAC=end-around-carry=CARRY(RA+RB_b). In this situation, the
pre-processing logic may produce values X, Y, and C (i.e., CIN) for
an 8-bit compound adder such that X=RA, Y=NOT(RB), and CIN=0. In
general, the 8-bit compound receives the X, Y, and CIN signals, and
produces signals "S[0:8]" and "T[0:8]" such that S[0:8]=X+Y and
T[0:8]=X+Y+1. The post-processing logic produces a "RESULT" signal
such that if S[0]=0 then RESULT=NOT(S[1:8]) else RESULT=T[1:8].
[0072] The "average of bytes" instruction AVGB may be expressed
"AVGB RT,RA,RB" wherein the RT register is the destination register
and the RA and RB registers are the source registers. In general,
for each byte, RT=(RA+RB+1)>>1. The pre-processing logic may
produce the values X, Y, and CIN for the 8-bit compound adder such
that X=RA, Y=RB, and CIN=1. As described above, the 8-bit compound
receives the X, Y, and CIN signals, and produces the signals S[0:8]
and T[0:8] such that S[0:8]=X+Y and T[0:8]=X+Y+1. The
pre-processing logic produces the RESULT signal such that
RESULT=S[0:7].
[0073] The "count ones in bytes" instruction CNTB may be expressed
"CNTB RT,RA" wherein the RT register is the destination register
and the RA register is the source register. In general, for each
byte, RT=COUNT_ONES(RA). The CNTB instruction may be implemented by
counting the number of logic ones in 4-bit units of RA and adding
the results. (This approach allows more generalized hardware to be
used.) The pre-processing logic may produce the values X, Y, and
CIN for the 8-bit compound adder such that X=COUNT_ONES(RA[0:3]),
Y=COUNT_ONES(RA[4:7]), and CIN=0. As described above, the 8-bit
compound receives the X, Y, and CIN signals, and produces the
signals S[0:8] and T[0:8] such that S[0:8]=X+Y and T[0:8]=X+Y+1.
The pre-processing logic produces the RESULT signal such that
RESULT=S[1:8].
[0074] The "sum bytes into half words" instruction SUMB may be
expressed "SUMB RT,RA,RB" wherein the RT register is the
destination register and the RA and RB registers are the source
registers. In general, for word slot:
[0075] RT[0:15]=RB[0:7]+RB[8:15]+RB[16:23]+RB[24:31]
[0076] RT[16:31]=RA[0:7]+RA[8:15]+RA[16:23]+RA[24:31]
[0077] A 4:2 compressor receives four 8-bit input vectors and
produces two intermediate result vectors: a 9-bit vector F0[0:8]
and an 8-bit vector F1[0:7]. The 8-bit compound adder receives
F0[0:7] as X and F1[0:7] as Y, and computes the signal S[0:8] such
that S[0:8]=F0[0:7]+F1[0:7]. The post-processing logic produces a
10-bit result signal "R[0:9]" such that R[0:9]={S[0:8], F0[8]}.
[0078] FIG. 6 is a diagram illustrating how logic of the byte unit
300A of FIGS. 4A-4D computes a 10-bit sum of bytes using the 4:2
compressor 402A and the 8-bit compound adder 408B. In general, the
4:2 compressor 402A receives the 32-bit operand B[0:31] and
performs a compression operation upon the operand, thereby
producing signals F0[0:8] and F1[0:7] as illustrated in FIG. 4A and
described above. As indicated in FIG. 6, the 4:2 compressor 402A
includes two 3:2 compressors in cascade. The 8-bit compound adder
408B receives the signals F0[0:7] and F1[0:7] produced by the 4:2
compressor 402A and sums the F0[0:7] and F1[0:7] signals, thereby
producing a 10-bit result generally referred to as sum signal
"S[0:9]." As indicated in FIG. 6, the signal F0[8] bypasses the
adder 408B and becomes the least significant bit S[9] of the sum
signal (i.e., S[9]=F0[8]). The carry out signal "CARRY" of the
adder 408B becomes the most significant bit S[0] of the sum
signal.
[0079] The particular embodiments disclosed above are illustrative
only, as the invention may be modified and practiced in different
but equivalent manners apparent to those skilled in the art having
the benefit of the teachings herein. Furthermore, no limitations
are intended to the details of construction or design herein shown,
other than as described in the claims below. It is therefore
evident that the particular embodiments disclosed above may be
altered or modified and all such variations are considered within
the scope and spirit of the invention. Accordingly, the protection
sought herein is as set forth in the claims below.
* * * * *