U.S. patent application number 09/840026 was filed with the patent office on 2001-10-25 for risc microprocessor architecture implementing multiple register sets.
Invention is credited to Chen, Sho Long, Garg, Sanjiv, Lentz, Derek J., Nguyen, Le Trong.
Application Number | 20010034823 09/840026 |
Document ID | / |
Family ID | 24919957 |
Filed Date | 2001-10-25 |
United States Patent
Application |
20010034823 |
Kind Code |
A1 |
Garg, Sanjiv ; et
al. |
October 25, 2001 |
RISC microprocessor architecture implementing multiple register
sets
Abstract
A register system for a data processor which operates in a
plurality of modes. The register system provides multiple,
identical banks of register sets, the data processor controlling
access such that instructions and processes need not specify any
given bank. An integer register set includes first (RA[23:0]) and
second (RA[31:24]) subsets, and a shadow subset (RT[31:24]). While
the data processor is in a first mode, instructions access the
first and second subsets. While the data processor is in a second
mode, instructions may access the first subset, but any attempts to
access the second subset are re-routed to the shadow subset
instead, transparently to the instructions, allowing system
routines to seemingly use the second subset without having to save
and restore data which user routines have written to the second
subset. A re-typable register set provides integer width data and
floating point width data in response to integer instructions and
floating point instructions, respectively. Boolean comparison
instructions specify particular integer or floating point registers
for source data to be compared, and specify a particular Boolean
register for the result, so there are no dedicated, fixed-location
status flags. Boolean combinational instructions combine specified
Boolean registers, for performing complex Boolean comparisons
without intervening conditional branch instructions, to minimize
pipeline disruption.
Inventors: |
Garg, Sanjiv; (Fremont,
CA) ; Lentz, Derek J.; (Los Gatos, CA) ;
Nguyen, Le Trong; (Monte Sereno, CA) ; Chen, Sho
Long; (Saratoga, CA) |
Correspondence
Address: |
STERNE, KESSLER, GOLDSTEIN & FOX PLLC
1100 NEW YORK AVENUE, N.W., SUITE 600
WASHINGTON
DC
20005-3934
US
|
Family ID: |
24919957 |
Appl. No.: |
09/840026 |
Filed: |
April 24, 2001 |
Related U.S. Patent Documents
|
|
|
|
|
|
Application
Number |
Filing Date |
Patent Number |
|
|
09840026 |
Apr 24, 2001 |
|
|
|
09480136 |
Jan 10, 2000 |
|
|
|
6249856 |
|
|
|
|
09480136 |
Jan 10, 2000 |
|
|
|
09188708 |
Nov 10, 1998 |
|
|
|
6044449 |
|
|
|
|
09188708 |
Nov 10, 1998 |
|
|
|
08937361 |
Sep 25, 1997 |
|
|
|
5838986 |
|
|
|
|
08937361 |
Sep 25, 1997 |
|
|
|
08665845 |
Jun 19, 1996 |
|
|
|
5682546 |
|
|
|
|
08665845 |
Jun 19, 1996 |
|
|
|
08465239 |
Jun 5, 1995 |
|
|
|
5560035 |
|
|
|
|
08465239 |
Jun 5, 1995 |
|
|
|
07726773 |
Jul 8, 1991 |
|
|
|
5493687 |
|
|
|
|
Current U.S.
Class: |
712/208 ;
712/225; 712/E9.018; 712/E9.025; 712/E9.027; 712/E9.035;
712/E9.053 |
Current CPC
Class: |
G06F 9/30029 20130101;
G06F 9/30116 20130101; G06F 9/30189 20130101; G06F 9/3012 20130101;
G06F 9/30138 20130101; G06F 9/30036 20130101; G06F 9/3013 20130101;
G06F 9/30112 20130101; G06F 9/30123 20130101; G06F 9/3851 20130101;
G06F 9/30167 20130101 |
Class at
Publication: |
712/208 ;
712/225 |
International
Class: |
G06F 009/00 |
Claims
What is claimed is:
1. An apparatus executing a set of instructions, the instructions
including one or more fields, wherein a field of a given
instruction specifies a source of an operand of the given
instruction or a destination of a result of the given instruction,
and wherein the apparatus comprises: processing means for executing
the instructions; and a register file, coupled to the processing
means, for storing operands and results of the instructions,
wherein, the register file includes a plurality of register sets,
and the register file is responsive to one or more of the fields in
a given instruction to retrieve an operand of the given instruction
from, or store a result of the given instruction into, a given
register in a given one of the register sets as identified by the
one or more fields in the given instruction.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of application Ser. No.
09/480,136, filed Jan. 10, 2000, now allowed, which is a
continuation of application Ser. No. 09/188,708, filed Nov. 10,
1998, now U.S. Pat. No. 6,044,449, which is a continuation of
application Ser. No. 08/937,361, filed Sep. 25, 1997, now U.S. Pat.
No. 5,838,986, which is a continuation of application Ser. No.
08/665,845, filed Jun. 19, 1996, now U.S. Pat. No. 5,682,546, which
is a continuation of application Ser. No. 08/465,239, filed Jun. 5,
1995, now U.S. Pat. No. 5,560,035, which is a continuation of
application Ser. No. 07/726,773, filed Jul. 8, 1991, now U.S. Pat.
No. 5,493,687. Each of the above-referenced applications is
incorporated by reference in its entirety herein.
[0002] Applications of particular interest to the present
application, include:
[0003] 1. High-Performance, Superscalar-Based Computer System with
Out-of-Order Instruction Execution, application Ser. No.
07/817,810, filed Jan. 8, 1992, now U.S. Pat. No. 5,539,911, by Le
Trong Nguyen et al.;
[0004] 2. High-Performance Superscalar-Based Computer System with
Out-of-Order Instruction Execution and Concurrent Results
Distribution, application Ser. No. 08/397,016, filed Mar. 1, 1995,
now U.S. Pat. No. 5,560,032, by Quang Trang et al.;
[0005] 3. RISC Microprocessor Architecture with Isolated
Architectural Dependencies, application Ser. No. 08/292,177, filed
Aug. 18, 1994, now abandoned, which is a FWC of application Ser.
No. 07/817,807, filed Jan. 8, 1992, which is a continuation of
application Ser. No. 07/726,744, filed Jul. 8, 1991, by Yoshiyuki
Miyayama;
[0006] 4. RISC Microprocessor Architecture Implementing Fast Trap
and Exception State, application Ser. No. 08/345,333, filed Nov.
21, 1994, now U.S. Pat. No. 5,481,685, by Quang Trang;
[0007] 5. Page Printer Controller Including a Single Chip
Superscalar Microprocessor with Graphics Functional Units,
application Ser. No. 08/267,646, filed Jun. 28, 1994, now U.S. Pat.
No. 5,394,515, by Derek Lentz et al., and
[0008] 6. Microprocessor Architecture Capable with a Switch Network
for Data Transfer Between Cache, Memory Port, and IOU, application
Ser. No. 07/726,893, filed Jul. 8, 1991, now U.S. Pat. No.
5,440,752, by Derek Lentz et al.
BACKGROUND OF THE INVENTION
[0009] 1. Field of the Invention
[0010] The present invention relates generally to microprocessors,
and more specifically to a RISC microprocessor having plural,
symmetrical sets of registers.
[0011] 2. Background
[0012] In addition to the usual complement of main memory storage
and secondary permanent storage, a microprocessor-based computer
system typically also includes one or more general purpose data
registers, one or more address registers, and one or more status
flags. Previous systems have included integer registers for holding
integer data and floating point registers for holding floating
point data. Typically, the status flags are used for indicating
certain conditions resulting from the most recently executed
operation. There generally are status flags for indicating whether,
in the previous operation: a carry occurred, a negative number
resulted, and/or a zero resulted.
[0013] These flags prove useful in determining the outcome of
conditional branching within the flow of program control. For
example, if it is desired to compare a first number to a second
number and upon the conditions that the two are equal, to branch to
a given subroutine, the microprocessor may compare the two numbers
by subtracting one from the other, and setting or clearing the
appropriate condition flags. The numerical value of the result of
the subtraction need not be stored. A conditional branch
instruction may then be executed, conditioned upon the status of
the zero flag. While being simple to implement, this scheme lacks
flexibility and power. Once the comparison has been performed, no
further numerical or other operations may be performed before the
conditional branch upon the appropriate flag; otherwise, the
intervening instructions will overwrite the condition flag values
resulting from the comparison, likely causing erroneous branching.
The scheme is further complicated by the fact that it may be
desirable to form greatly complex tests for branching, rather than
the simple equality example given above.
[0014] For example, assume that the program should branch to the
subroutine only upon the condition that a first number is greater
than a second number, and a third number is less than a fourth
number, and a fifth number is equal to a sixth number. It would be
necessary for previous microprocessors to perform a lengthy series
of comparisons heavily interspersed with conditional branches. A
particularly undesirable feature of this serial scheme of comparing
and branching is observed in any microprocessor having an
instruction pipeline.
[0015] In a pipelined microprocessor, more than one instruction is
being executed at any given time, with the plural instructions
being in different stages of execution at any given moment. This
provides for vastly improved throughput. A typical pipeline
microprocessor may include pipeline stages for: (a) fetching an
instruction, (b) decoding the instruction, (c) obtaining the
instruction's operands, (d) executing the instruction, and (e)
storing the results. The problem arises when a conditional branch
instruction is fetched. It may be the case that the conditional
branch's condition cannot yet be tested, as the operands may not
yet be calculated, if they are to result from operations which are
yet in the pipeline. This results in a "pipeline stall," which
dramatically slows down the processor.
[0016] Another shortcoming of previous microprocessor-based systems
is that they have included only a single set of registers of any
given data type. In previous architectures, when an increased
number of registers has been desired within a given data type, the
solution has been simply to increase the size of the single set of
those type of registers. This may result in addressing problems,
access conflict problems, and symmetry problems.
[0017] On a similar note, previous architectures have restricted
each given register set to one respective numerical data type.
Various prior systems have allowed general purpose registers to
hold either numerical data or address "data," but the present
application will not use the term "data" to include addresses. What
is intended may be best understood with reference to two prior
systems. The Intel 8085 microprocessor includes a register pair
"HL" which can be used to hold either two bytes of numerical data
or one two-byte address. The present application's improvement is
not directed to that issue. More on point, the Intel 80486
microprocessor includes a set of general purpose integer data
registers and a set of floating point registers, with each set
being limited to its respective data type, at least for purposes of
direct register usage by arithmetic and logic units.
[0018] This proves wasteful of the microprocessor's resources, such
as the available silicon area, when the microprocessor is
performing operations which do not involve both data types. For
example, user applications frequently involve exclusively integer
operations, and perform no floating point operations whatsoever.
When such a user application is run on a previous microprocessor
which includes floating point registers (such as the 80486), those
floating point registers remain idle during the entire
execution.
[0019] Another problem with previous microprocessor register set
architecture is observed in context switching or state switching
between a user application and a higher access privilege level
entity such as the operating system kernel. When control within the
microprocessor switches context, mode, or state, the operating
system kernel or other entity to which control is passed typically
does not operate on the same data which the user application has
been operating on. Thus, the data registers typically hold data
values which are not useful to the new control entity but which
must be maintained until the user application is resumed. The
kernel must generally have registers for its own use, but typically
has no way of knowing which registers are presently in use by the
user application. In order to make space for its own data, the
kernel must swap out or otherwise store the contents of a
predetermined subset of the registers. This results in considerable
loss of processing time to overhead, especially if the kernel makes
repeated, short-duration assertions of control.
[0020] On a related note, in prior microprocessors, when it is
required that a "grand scale" context switch be made, it has been
necessary for the microprocessor to expend even greater amounts of
processing resources, including a generally large number of
processing cycles, to save all data and state information before
making the switch. When context is switched back, the same
performance penalty has previously been paid, to restore the system
to its former state. For example, if a microprocessor is executing
two user applications, each of which requires the full complement
of registers of each data type, and each of which may be in various
stages of condition code setting operations or numerical
calculations, each switch from one user application to the other
necessarily involves swapping or otherwise saving the contents of
every data register and state flag in the system. This obviously
involves a great deal of operational overhead, resulting in
significant performance degradation, particularly if the main or
the secondary storage to which the registers must be saved is
significantly slower than the microprocessor itself.
[0021] Therefore, we have discovered that it is desirable to have
an improved microprocessor architecture which allows the various
component conditions of a complex condition to be calculated
without any intervening conditional branches. We have further
discovered that it is desirable that the plural simple conditions
be calculable in parallel, to improve throughput of the
microprocessor.
[0022] We have also discovered that it is desirable to have an
architecture which allows multiple register sets within a given
data type.
[0023] Additionally, we have discovered it to be desirable for a
microprocessor's floating point registers to be usable as integer
registers, in case the available integer registers are inadequate
to optimally to hold the necessary amount of integer data. Notably,
we have discovered that it is desirable that such re-typing be
completely transparent to the user application.
[0024] We have discovered it to be highly desirable to have a
microprocessor which provides a dedicated subset of registers which
are reserved for use by the kernel in lieu of at least a subset of
the user registers, and that this new set of registers should be
addressable in exactly the same manner as the register subset which
they replace, in order that the kernel may use the same register
addressing scheme as user applications. We have further observed
that it is desirable that the switch between the two subsets of
registers require no microprocessor overhead cycles, in order to
maximally utilize the microprocessor's resources.
[0025] Also, we have discovered it to be desirable to have a
microprocessor architecture which allows for a "grand scale"
context switch to be performed with minimal overhead. In this vein,
we have discovered that is desirable to have an architecture which
allows for plural banks of register sets of each type, such that
two or more user applications may be operating in a multi-tasking
environment, or other "simultaneous" mode, with each user
application having sole access to at least a full bank of
registers. It is our discovery that the register addressing scheme
should, desirably, not differ between user applications, nor
between register banks, to maximize simplicity of the user
applications, and that the system should provide hardware support
for switching between the register banks so that the user
applications need not be aware of which register bank which they
are presently using or even of the existence of other register
banks or of other user applications.
[0026] These and other advantages of our invention will be
appreciated with reference to the following description of our
invention, the accompanying drawings, and the claims.
SUMMARY OF THE INVENTION
[0027] The present invention provides a register file system
comprising: an integer register set including first and second
subsets of integer registers, and a shadow subset; a re-typable set
of registers which are individually usable as integer registers or
as floating point registers; and a set of individually addressable
Boolean registers.
[0028] The present invention includes integer and floating point
functional units which execute integer instructions accessing the
integer register set, and which operate in a plurality of modes. In
any mode, instructions are granted ordinary access to the first
subset of integer registers. In a first mode, instructions are also
granted ordinary access to the second subset. However, in a second
mode, instructions attempting to access the second subset are
instead granted access to the shadow subset, in a manner which is
transparent to the instructions. Thus, routines may be written
without regard to which mode they will operate in, and system
routines (which operate in the second mode) can have at least the
second subset seemingly at their disposal, without having to expend
the otherwise-required overhead of saving the second subset's
contents (which may be in use by user processes operating in the
first mode).
[0029] The invention further includes a plurality of integer
register sets, which are individually addressable as specified by
fields in instructions. The register sets include read ports and
write ports which are accessed by multiplexers, wherein the
multiplexers are controlled by contents of the register
set-specifying fields in the instructions.
[0030] One of the integer register sets is also usable as a
floating point register set. In one embodiment, this set is
sixty-four bits wide to hold double-precision floating point data,
but only the low order thirty-two bits are used by integer
instructions.
[0031] The invention includes functional units for performing
Boolean operations, and further includes a Boolean register set for
holding results of the Boolean operations such that no dedicated,
fixed-location status flags are required. The integer and floating
point functional units execute numerical comparison instructions,
which specify individual ones of the Boolean registers to hold
results of the comparisons. A Boolean functional unit executes
Boolean combinational instructions whose sources and destination
are specified registers in the Boolean register set. Thus, the
present invention may perform conditional branches upon a single
result of a complex Boolean function without intervening
conditional branch instructions between the fundamental parts of
the complex Boolean function, minimizing pipeline disruption in the
data processor.
[0032] Finally, there are multiple, identical register banks in the
system, each bank including the above-described register sets. A
bank may be allocated to a given process or routine, such that the
instructions within the routine need not specify upon which bank
they operate.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] FIG. 1 is a block diagram of the instruction execution unit
of the microprocessor of the present invention, showing the
elements of the register file.
[0034] FIGS. 2, 2A, 3, 3A and 4 are simplified schematic and block
diagrams of the floating point, integer and Boolean portions of the
instruction execution unit of FIG. 1, respectively.
[0035] FIGS. 5-6 are more detailed views of the floating point and
integer portions, respectively, showing the means for selecting
between register sets.
[0036] FIG. 7 illustrates the fields of an exemplary microprocessor
instruction word executable by the instruction execution unit of
FIG. 1.
DETAILED DESCRIPTION OF THE INVENTION
[0037] I. Register File
[0038] FIG. 1 illustrates the basic components of the instruction
execution unit (IEU) 10 of the RISC (reduced instruction set
computing) processor of the present invention. The IEU 10 includes
a register file 12 and an execution engine 14. The register file 12
includes one or more register banks 16-0 to 16-n. It will be
understood that the structure of each register bank 16 is identical
to all of the other register banks 16. Therefore, the present
application will describe only register bank 16-0. The register
bank includes a register set A 18, a register set FB 20, and a
register set C 22.
[0039] In general, the invention may be characterized as a RISC
microprocessor having a register file optimally configured for use
in the execution of RISC instructions, as opposed to conventional
register files which are sufficient for use in the execution of
CISC (complex instruction set computing) instructions by CISC
processors. By having a specially adapted register file, the
execution engine of the microprocessor's IEU achieves greatly
improved performance, both in terms of resource utilization and in
terms of raw throughput. The general concept is to tune a register
set to a RISC instruction, while the specific implementation may
involve any of the register sets in the architecture.
[0040] A. Register Set A
[0041] Register set A 18 includes integer registers 24 (RA[31:0]),
each of which is adapted to hold an integer value datum. In one
embodiment, each integer may be thirty-two bits wide. The RA[ ]
integer registers 24 include a first plurality 26 of integer
registers (RA[23:0]) and a second plurality 28 of integer registers
(RA[31:24]). The RA[ ] integer registers 24 are each of identical
structure, and are each addressable in the same manner, albeit with
a unique address within the integer register set 24. For example, a
first integer register 30 (RA[0]) is addressable at a zero offset
within the integer register set 24.
[0042] RA[0] always contains the value zero. It has been observed
that user applications and other programs use the constant value
zero more than any other constant value. It is, therefore,
desirable to have a zero readily available at all times, for
clearing, comparing, and other purposes. Another advantage of
having a constant, hard-wired value in a given register, regardless
of the particular value, is that the given register may be used as
the destination of any instruction whose results need not be
saved.
[0043] Also, this means that the fixed register will never be the
cause of a data dependency delay. A data dependency exists when a
"slave" instruction requires, for one or more of its operands, the
result of a "master" instruction. In a pipelined processor, this
may cause pipeline stalls. For example, the master instruction,
although occurring earlier in the code sequence than the slave
instruction, may take considerably longer to execute. It will be
readily appreciated that if a slave "increment and store"
instruction operates on the result data of a master "quadruple-word
integer divide" instruction, the slave instruction will be fetched,
decoded, and awaiting execution many clock cycles before the master
instruction has finished execution. However, in certain instances,
the numerical result of a master instruction is not needed, and the
master instruction is executed for some other purpose only, such as
to set condition code flags. If the master instruction's
destination is RA[0], the numerical results will be effectively
discarded. The data dependency checker (not shown) of the IEU 10
will not cause the slave instruction to be delayed, as the ultimate
result of the master instruction--zero--is already known.
[0044] The integer register set A 24 also includes a set of shadow
registers 32 (RT[31:24]). Each shadow register can hold an integer
value, and is, in one embodiment, also thirty-two bits wide. Each
shadow register is addressable as an offset in the same manner in
which each integer register is addressable.
[0045] Finally, the register set A includes an IEU mode integer
switch 34. The switch 34, like other such elements, need not have a
physical embodiment as a switch, so long as the corresponding
logical functionality is provided within the register sets. The IEU
mode integer switch 34 is coupled to the first subset 26 of integer
registers on line 36, to the second subset of integer registers 28
on line 38, and to the shadow registers 32 on line 40. All accesses
to the register set A 18 are made through the IEU mode integer
switch 34 on line 42. Any access request to read or write a
register in the first subset RA[23:0] is passed automatically
through the IEU mode integer switch 34. However, accesses to an
integer register with an offset outside the first subset RA[23:0]
will be directed either to the second subset RA[31:24] or the
shadow registers RT[31:24], depending upon the operational mode of
the execution engine 14.
[0046] The IEU mode integer switch 34 is responsive to a mode
control unit 44 in the execution engine 14. The mode control unit
44 provides pertinent state or mode information about the IEU 10 to
the IEU mode integer switch 34 on line 46. When the execution
engine performs a context switch such as a transfer to kernel mode,
the mode control unit 44 controls the IEU mode integer switch 34
such that any requests to the second subset RA[31:24] are
re-directed to the shadow RT[31:24], using the same requested
offset within the integer set. Any operating system kernel or other
then-executing entity may thus have apparent access to the second
subset RA[31:24] without the otherwise-required overhead of
swapping the contents of the second subset RA[31:24] out to main
memory, or pushing the second subset RA[31:24] onto a stack, or
other conventional register-saving technique.
[0047] When the execution engine 14 returns to normal user mode and
control passes to the originally-executing user application, the
mode control unit 44 controls the IEU mode integer switch 34 such
that access is again directed to the second subset RA[31:24]. In
one embodiment, the mode control unit 44 is responsive to the
present state of interrupt enablement in the IEU 10. In one
embodiment, the execution engine 14 includes a processor status
register (PSR) (not shown), which includes a one-bit flag (PSR[7])
indicating whether interrupts are enabled or disabled. Thus, the
line 46 may simply couple the IEU mode integer switch 34 to the
interrupts-enabled flag in the PSR. While interrupts are disabled,
the IEU 10 maintains access to the integers RA[23:0], in order that
it may readily perform analysis of various data of the user
application. This may allow improved debugging, error reporting, or
system performance analysis.
[0048] B. Register Set FB
[0049] The re-typable register set FB 20 may be thought of as
including floating point registers 48 (RF[31:0]); and/or integer
registers 50 (RB[31:0]). When neither data type is implied to the
exclusion of the other, this application will use the term RFB[ ].
In one embodiment, the floating point registers RF[ ] occupy the
same physical silicon space as the integer registers RB[ ]. In one
embodiment, the floating point registers RF[ ] are sixty-four bits
wide and the integer registers RB[ ] are thirty-two bits wide. It
will be understood that if double-precision floating point numbers
are not required, the register set RFB[ ] may advantageously be
constructed in a thirty-two-bit width to save the silicon area
otherwise required by the extra thirty-two bits of each floating
point register.
[0050] Each individual register in the register set RFB[ ] may hold
either a floating point value or an integer value. The register set
RFB[ ] may include optional hardware for preventing accidental
access of a floating point value as though it were an integer
value, and vice versa. In one embodiment, however, in the interest
of simplifying the register set RFB[ ], it is simply left to the
software designer to ensure that no erroneous usages of individual
registers are made. Thus, the execution engine 14 simply makes an
access request on line 52, specifying an offset into the register
set RFB[ ], without specifying whether the register at the given
offset is intended to be used as a floating point register or an
integer register. Within the execution engine 14, various entities
may use either the full sixty-four bits provided by the register
set RFB[ ], or may use only the low order thirty-two bits, such as
in integer operations or single-precision floating point
operations.
[0051] A first register RFB[0] 51 contains the constant value zero,
in a form such that RB[0] is a thirty-two-bit integer zero
(0000.sub.hex) and RF[0] is a sixty-four-bit floating point zero
(00000000.sub.hex). This provides the same advantages as described
above for RA[0].
[0052] C. Register Set C
[0053] The register set C 22 includes a plurality of Boolean
registers 54 (RC[31:0]). RC[ ] is also known as the "condition
status register" (CSR). The Boolean registers RC[ ] are each
identical in structure and addressing, albeit that each is
individually addressable at a unique address or offset within RC[
].
[0054] In one embodiment, register set C further includes a
"previous condition status register" (PCSR) 60, and the register
set C also includes a CSR selector unit 62, which is responsive to
the mode control unit 44 to select alternatively between the CSR 54
and the PCSR 60. In the one embodiment, the CSR is used when
interrupts are enabled, and the PCSR is used when interrupts are
disabled. The CSR and PCSR are identical in all other respects. In
the one embodiment, when interrupts are set to be disabled, the CSR
selector unit 62 pushes the contents of the CSR into the PCSR,
overwriting the former contents of the PCSR, and when interrupts
are re-enabled, the CSR selector unit 62 pops the contents of the
PCSR back into the CSR. In other embodiments it may be desirable to
merely alternate access between the CSR and the PCSR, as is done
with RA[31:24] and RT[31:24]. In any event, the PCSR is always
available as a thirty-two-bit "special register."
[0055] None of the Boolean registers is a dedicated condition flag,
unlike the Boolean registers in previously known microprocessors.
That is, the CSR 54 does not include a dedicated carry flag, nor a
dedicated a minus flag, nor a dedicated flag indicating equality of
a comparison or a zero subtraction result. Rather, any Boolean
register may be the destination of the Boolean result of any
Boolean operation. As with the other register sets, a first Boolean
register 58 (RC[0]) always contains the value zero, to obtain the
advantages explained above for RA[0]. In the preferred embodiment,
each Boolean register is one bit wide, indicating one Boolean
value.
[0056] II. Execution Engine
[0057] The execution engine 14 includes one or more integer
functional units 66, one or more floating point functional units
68, and one or more Boolean functional units 70. The functional
units execute instructions as will be explained below. Buses 72,
73, and 75 connect the various elements of the IEU 10, and will
each be understood to represent data, address, and control
paths.
[0058] A. Instruction Format
[0059] FIG. 7 illustrates one exemplary format for an integer
instruction which the execution engine 14 may execute. It will be
understood that not all instructions need to adhere strictly to the
illustrated format, and that the data processing system includes an
instruction fetcher and decoder (not shown) which are adapted to
operate upon varying format instructions. The single example of
FIG. 7 is for ease in explanation only. Throughout this application
the identification I[ ] will be used to identify various bits of
the instruction. I[31:30] are reserved for future implementations
of the execution engine 14. I[29:26] identify the instruction class
of the particular instruction. Table 1 shows the various classes of
instructions performed by the present invention.
1TABLE 1 Instruction Classes Class Instructions 0-3 Integer and
floating point register-to-register instructions 4 Immediate
constant load 5 Reserved 6 Load 7 Store 8-11 Control Flow 12
Modifier 13 Boolean operations 14 Reserved 15 Atomic (extended)
[0060] Instruction classes of particular interest to this
Application include the Class 0-3 register-to-register instructions
and the Class 13 Boolean operations. While other classes of
instructions also operate upon the register file 12, further
discussion of those classes is not believed necessary in order to
fully understand the present invention.
[0061] I[25] is identified as B0, and indicates whether the
destination register is in register set A or register set B.
I[24:22] are an opcode which identifies, within the given
instruction class, which specific function is to be performed. For
example, within the register-to-register classes an opcode may
specify "addition." I[21] identifies the addressing mode which is
to be used when performing the instruction--either register source
addressing or immediate source addressing. I[20:16] identify the
destination register as an offset within the register set indicated
by B0. I[15] is identified as B1 and indicates whether the first
operand is to be taken from register set A or register set B.
I[14:10] identify the register offset from which the first operand
is to be taken. I[9:8] identify a function selection--an extension
of the opcode I[24:22]. I[7:6] are reserved. I[5] is identified as
B2 and indicates whether a second operand of the instruction is to
be taken from register set A or register set B. Finally, I[4:0]
identify the register offset from which the second operand is to be
taken.
[0062] With reference to FIG. 1, the integer functional unit 66 and
floating point functional unit 68 are equipped to perform integer
comparison instructions and floating point comparisons,
respectively. The instruction format for the comparison instruction
is substantially identical to that shown in FIG. 7, with the caveat
that various fields may advantageously be identified by slightly
different names. I[20:16] identifies the destination register where
the result is to be stored, but the addressing mode field I[21]
does not select between register sets A or B. Rather, the
addressing mode field indicates whether the second source of the
comparison is found in a register or is immediate data. Because the
comparison is a Boolean type instruction, the destination register
is always found in register set C. All other fields function as
shown in FIG. 7. In performing Boolean operations within the
integer and floating point functional units, the opcode and
function select fields identify which Boolean condition is to be
tested for in comparing the two operands. The integer and the
floating point functional units fully support the IEEE standards
for numerical comparisons.
[0063] The IEU 10 is a load/store machine. This means that when the
contents of a register are stored to memory or read from memory, an
address calculation must be performed in order to determine which
location in memory is to be the source or the destination of the
store or load, respectively. When this is the case, the destination
register field I[20:16] identifies the register which is the
destination or the source of the load or store, respectively. The
source register 1 field, I[14:10], identifies a register in either
set A or B which contains abase address of the memory location. In
one embodiment, the source register 2 field, I[4:0], identifies a
register in set A or set B which contains an index or an offset
from the base. The load/store address is calculated by adding the
index to the base. In another mode, I[7:0] include immediate data
which are to be added as an index to the base.
[0064] B. Operation of the Instruction Execution Unit and Register
Sets
[0065] It will be understood by those skilled in the art that the
integer functional unit 66, the floating point functional unit 68,
and the Boolean functional unit 70 are responsive to the contents
of the instruction class field, the opcode field, and the function
select field of a present instruction being executed.
[0066] 1. Integer Operations
[0067] For example, when the instruction class, the opcode, and
function select indicate that an integer register-to-register
addition is to be performed, the integer functional unit may be
responsive thereto to perform the indicated operation, while the
floating point functional unit and the Boolean functional unit may
be responsive thereto to not perform the operation. As will be
understood from the cross-referenced applications, however, the
floating point functional unit 68 is equipped to perform both
floating point and integer operations. Also, the functional units
are constructed to each perform more than one instruction
simultaneously.
[0068] The integer functional unit 66 performs integer functions
only. Integer operations typically involve a first source, a second
source, and a destination. A given integer instruction will specify
a particular operation to be performed on one or more source
operands and will specify that the result of the integer operation
is to be stored at a given destination. In some instructions, such
as address calculations employed in load/store operations, the
sources are utilized as a base and an index. The integer functional
unit 66 is coupled to a first bus 72 over which the integer
functional unit 66 is connected to a switching and multiplexing
control (SMC) unit A 74 and an SMC unit B 76. Each integer
instruction executed by the integer functional unit 66 will specify
whether each of its: sources and destination reside in register set
A or register set B.
[0069] Suppose that the IEU 10 has received, from the instruction
fetch unit (not shown), an instruction to perform an integer
register-to-register addition. In various embodiments, the
instruction may specify a register bank, perhaps even a separate
bank for each source and destination. In one embodiment, the
instruction I[ ] is limited to a thirty-two-bit length, and does
not contain any indication of which register bank 16-0 through 16-n
is involved in the instruction. Rather, the bank selector unit 78
controls which register bank is presently active. In one
embodiment, the bank selector unit 78 is responsive to one or more
bank selection bits in a status word (not shown) within the IEU
10.
[0070] In order to perform the integer addition instruction, the
integer functional unit 66 is responsive to the identification in
I[14:10] and I[4:0] of the first and second source registers. The
integer functional unit 66 places an identification of the first
and second source registers at ports S1 and S2, respectively, onto
the integer functional unit bus 72 which is coupled to both SMC
units A and B 74 and 76. In one embodiment, the SMC units A and B
are each coupled to receive B0-2 from the instruction I[ ]. In one
embodiment, a zero in any respective Bn indicates register set A,
and a one indicates register set B. During load/store operations,
the source ports of the integer and floating point functional units
66 and 68 are utilized as a base port and an index port, B and I,
respectively.
[0071] After obtaining the first and second operands from the
indicated register sets on the bus 72, as explained below, the
integer functional unit 66 performs the indicated operation upon
those operands, and provides the result at port D onto the integer
functional unit bus 72. The SMC units A and B are responsive to B0
to route the result to the appropriate register set A or B.
[0072] The SMC unit B is further responsive to the instruction
class, opeode, and function selection to control whether operands
are read from (or results are stored to) either a floating point
register RF[ ] or an integer register RB[ ]. As indicated, in one
embodiment, the registers RF[ ] may be sixty-four bits wide while
the registers are RB[ ] are only thirty-two bits wide. Thus, SMC
unit B controls whether a word or a double word is written to the
register set RFB[ ]. Because all registers within register set A
are thirty-two bits wide, SMC unit A need not include means for
controlling the width of data transfer on the bus 42.
[0073] All data on the bus 42 are thirty-two bits wide, but other
sorts of complexities exist within register set A. The IEU mode
integer switch 34 is responsive to the mode control unit 44 of the
execution engine 14 to control whether data on the bus 42 are
connected through to bus 36, bus 38 or bus 40, and vice versa.
[0074] IEU mode integer switch 34 is further responsive to
I[20:16], I[14:10], and I[4:0]. If a given indicated destination or
source is in RA[23:0], the IEU mode integer switch 34 automatically
couples the data between lines 42 and 36. However, for registers
RA[31:24], the IEU mode integer switch 34 determines whether data
on line 42 is connected to line 38 or line 40, and vice versa. When
interrupts are enabled, IEU mode integer switch 34 connects the SMC
unit A to the second subset 28 of integer registers RA[31:24]. When
interrupts are disabled, the IEU mode integer switch 34 connects
the SMC unit A to the shadow registers RT[31:24]. Thus, an
instruction executing within the integer functional unit 66 need
not be concerned with whether to address RA[31:24] or RT[31:24]. It
will be understood that SMC unit A may advantageously operate
identically whether it is being accessed by the integer functional
unit 66 or by the floating point functional unit 68.
[0075] 2. Floating Point Operations
[0076] The floating point functional unit 68 is responsive to the
class, opcode, and function select fields of the instruction, to
perform floating point operations. The S1, S2, and D ports operate
as described for the integer functional unit 66. SMC unit B is
responsive to retrieve floating point operands from, and to write
numerical floating point results to, the floating point registers
RF[ ] on bus 52.
[0077] 3. Boolean Operations
[0078] SMC unit C 80 is responsive to the instruction class,
opcode, and function select fields of the instruction I[ ]. When
SMC unit C detects that a comparison operation has been performed
by one of the numerical functional units 66 or 68, it writes the
lean result over bus 56 to the Boolean register indicated at the D
port of the functional unit which performed the comparison.
[0079] The Boolean functional unit 70 does not perform comparison
instructions as do the integer and floating point functional units
66 and 68. Rather, the Boolean functional unit 70 is only used in
performing bitwise logical combination of Boolean registers
contents, according to the Boolean functions listed in TABLE 2.
2TABLE 2 Boolean Functions I[23,22,9,8] Boolean result calculation
0000 ZERO 0001 S1 AND S2 0010 S1 AND (NOT S2) 0011 S1 0100 (NOT S1)
AND S2 0101 S2 0110 S1 XOR S2 0111 S1 OR S2 1000 S1 NOR S2 1001 S1
XNOR S2 1011 S1 OR (NOT S2) 1100 NOT S1 1101 (NOT S1) OR S2 1110 S1
NAND S2 1111 ONE
[0080] The advantage which the present invention obtains by having
a plurality of homogenous Boolean registers, each of which is
individually addressable as the destination of a Boolean operation,
will be explained with reference to Tables 3-5. Table 3 illustrates
an example of a segment of code which performs a conditional branch
based upon a complex Boolean function. The complex Boolean function
includes three portions which are OR-ed together. The first portion
includes two sub-portions, which are AND-ed together.
3TABLE 3 Example of Complex Boolean Function 1 RA[1]: = 0; 2 IF
(((RA[2] = RA[3]) AND (RA[4] > RA[5])) OR 3 (RA[6] < RA[7])
OR 4 (RA[8] <> RA[9])) THEN 5 X() 6 ELSE 7 Y(); 8 RA[10]: =
1;
[0081] Table 4 illustrates, in pseudo-assembly form, one likely
method by which previous microprocessors would perform the function
of Table 3. The code in Table 4 is written as though it were
constructed by a compiler of at least normal intelligence operating
upon the code of Table 3. That is, the compiler will recognize that
the condition expressed in lines 2-4 of Table 3 is passed if any of
the three portions is true.
4TABLE 4 Execution of Complex Boolean Function Without Boolean
Register Set 1 START LDI RA[1],0 2 TEST1 CMP RA[2],RA[3] 3 BNE
TEST2 4 CMP RA[4],RA[5] 5 BGT DO-IF 6 TEST2 CMP RA[6],RA[7] 7 BLT
DO-IF 8 TEST3 CMP RA[8],RA[9] 9 BEQ DO-ELSE 10 DO-IF JSR ADDRESS OF
X() 11 JMP PAST-ELSE 12 DO-ELSE JSR ADDRESS OF Y() 13 PAST-ELSE LDI
RA[10],1
[0082] The assignment at line 1 of Table 3 is performed by the
"load immediate" statement at line 1 of Table 4. The first portion
of the complex Boolean condition, expressed at line 2 of Table 3,
is represented by the statements in lines 2-5 of Table 4. To test
whether RA[2] equals RA[3], the compare statement at line 2 of
Table 4 performs a subtraction of RA[2] from RA[3] or vice versa,
depending upon the implementation, and may or may not store the
result of that subtraction. The important function performed by the
comparison statement is that the zero, minus, and carry flags will
be appropriately set or cleared.
[0083] The conditional branch statement at line 3 of Table 4
branches to a subsequent portion of code upon the condition that
RA[2] did not equal RA[3]. If the two were unequal, the zero flag
will be clear, and there is no need to perform the second
sub-portion. The existence of the conditional branch statement at
line 3 of Table 4 prevents the further fetching, decoding, and
executing of any subsequent statement in Table 4 until the results
of the comparison in line 2 are known, causing a pipeline stall. If
the first sub-portion of the first portion (TEST1) is passed, the
second sub-portion at line 4 of Table 4 then compares RA[4] to
RA[5], again setting and clearing the appropriate status flags.
[0084] If RA[2] equals RA[3], and RA[4] is greater than RA[5],
there is no need to test the remaining two portions (TEST2 and
TEST3) in the complex Boolean function, and the statement at Table
4, line 5, will conditionally branch to the label DO-IF, to perform
the operation inside the "IF" of Table 3. However, if the first
portion of the test is failed, additional processing is required to
determine which of the "IF" and "ELSE" portions should be
executed.
[0085] The second portion of the Boolean function is the comparison
of RA[6] to RA[7], at line 6 of Table 4, which again sets and
clears the appropriate status flags. If the condition "less than"
is indicated by the status flags, the complex Boolean function is
passed, and execution may immediately branch to the DO-IF label. In
various prior microprocessors, the "less than" condition may be
tested by examining the minus flag. If RA[7] was not less than
RA[6], the third portion of the test must be performed. The
statement at line 8 of Table 4 compares RA[8] to RA[9]. If this
comparison is failed, the "ELSE" code should be executed;
otherwise, execution may simply fall through to the "IF" code at
line 10 of Table 4, which is followed by an additional jump around
the "ELSE" code. Each of the conditional branches in Table 4, at
lines 3, 5, 7 and 9, results in a separate pipeline stall,
significantly increasing the processing time required for handling
this complex Boolean function.
[0086] The greatly improved throughput which results from employing
the Boolean register set C of the present invention will now
readily be seen with specific reference to Table 5.
5TABLE 5 Execution of Complex Boolean Function With Boolean
Register Set 1 START LDI RA[1],0 2 TEST1 CMP RC[11],RA[2],RA[3],EQ
3 CMP RC[12],RA[4],RA[5],GT 4 TEST2 CMP RC[13],RA[6],RA[7],LT 5
TEST3 CMP RC[14],RA[8],RA[9],NE 6 COMPLEX AND RC[15],RC[11],RC[12]
7 OR RC[16],RC[13],RC[14] 8 OR RC[17],RC[15],RC[16] 9 BC
RC[17],DO-ELSE 10 DO-IF JSR ADDRESS OF X() 11 JMP PAST-ELSE 12
DO-ELSE JSR ADDRESS OF Y() 13 PAST-ELSE LDI RA[10],1
[0087] Most notably seen at lines 2-5 of Table 5, the Boolean
register set C allows the microprocessor to perform the three test
portions back-to-back without intervening branching. Each Boolean
comparison specifies two operands, a destination, and a Boolean
condition for which to test. For example, the comparison at line 2
of Table 5 compares the contents of RA[2] to the contents of RA[3],
tests them for equality, and stores into RC[1] the Boolean value of
the result of the comparison. Note that each comparison of the
Boolean function stores its respective intermediate results in a
separate Boolean register. As will be understood with reference to
the above-referenced related applications, the IEU 10 is capable of
simultaneously performing more than one of the comparisons.
[0088] After at least the first two comparisons at lines 2-3 of
Table 5 have been completed, the two respective comparison results
are AND-ed together as shown at line 6 of Table 3. RC[15] then
holds the result of the first portion of the test. The results of
the second and third sub-portions of the Boolean function are OR-ed
together as seen in Table 5, line 7. It will be understood that,
because there are no data dependencies involved, the AND at line 6
and the OR-ed in line 7 may be performed in parallel. Finally, the
results of those two operations are OR-ed together as seen at line
8 of Table 5. It will be understood that register RC[17] will then
contain a Boolean value indicating the truth or falsity of the
entire complex Boolean function of Table 3. It is then possible to
perform a single conditional branch, shown at line 9 of Table 5. In
the mode shown in Table 5, the method branches to the "ELSE" code
if Boolean register RC[17] is clear, indicating that the complex
function was failed. The remainder of the code may be the same as
it was without the Boolean register set as seen in Table 4.
[0089] The Boolean functional unit 70 is responsive to the
instruction class, opcode, and function select fields as are the
other functional units. Thus, it will be understood with reference
to Table 5 again, that the integer and/or floating point functional
units will perform the instructions in lines 1-5 and 13, and the
Boolean functional unit 70 will perform the Boolean bitwise
combination instructions in lines 6-8. The control flow and
branching instructions in line 9-12 will be performed by elements
of the IEU 10 which are not shown in FIG. 1.
[0090] III. Data Paths
[0091] FIGS. 2-5 illustrate further details of the data paths
within the floating point, integer, and Boolean portions of the
IEU, respectively.
[0092] A. Floating Point Portion Data Paths
[0093] As seen in FIG. 2, the register set FB 20 is a multi-ported
register set. In one embodiment, the register set FB 20 has two
write ports WFB0-1, and five read ports RDFB0-4. The floating point
functional unit 68 of FIG. 1 is comprised of the ALU2 102, FALU
104, MULT 106, and NULL 108 of FIG. 2. All elements of FIG. 2
except the register set 20 and the elements 102-108 comprise the
SMC unit B of FIG. 1.
[0094] External, bidirectional data bus EX-DATA[ ] provides data to
the floating point load/store unit 122. Immediate floating point
data bus LDF_IMED[ ] provides data from a "load immediate"
instruction. Other immediate floating point data are provided on
busses RFF1_IMED and RFF2_IMED, such as is involved in an "add
immediate" instruction. Data are also provided on bus EX_SR_DT[ ],
in response to a "special register move" instruction. Data may also
arrive from the integer portion, shown in FIG. 3, on busses 114 and
120.
[0095] The floating point register set's two write ports WFB0 and
WFB1 are coupled to write multiplexers 110-0 and 110-1,
respectively. The write multiplexers 110 receive data from: the
ALU0 or SHF0 of the integer portion of FIG. 3; the FALU; the MULT;
the ALU2; either EX_SR_DT[ ] or LDF-IMED[ ]; and EX_DATA[ ]. Those
skilled in the art will understand that control signals (not shown)
determine which input is selected at each port, and address signals
(not shown) determine to which register the input data are written.
Multiplexer control and register addressing are within the skill of
persons in the art, and will not be discussed for any multiplexer
or register set in the present invention.
[0096] The floating point register set's five read ports RDFB0 to
RDFB4 are coupled to read multiplexers 112-0 to 112-4,
respectively. The read multiplexers each also receives data from:
either EX_SR_DT[ ] or LDF_IMED[ ], on load immediate bypass bus
126; a load external data bypass bus 127, which allows external
load data to skip the register set FB; the output of the ALU2 102,
which performs non-multiplication integer operations; the FALU 104,
which performs non-multiplication floating point operations; the
MULT 106, which performs multiplication operations; and either the
ALU0 140 or the SHF0 144 of the integer portion shown in FIG. 3,
which respectively perform non-multiplication integer operations
and shift operations. Read multiplexers 112-1 and 112-3 also
receive data from RFF1_IMED[ ] and RFF2_IMED[ ], respectively.
[0097] Each arithmetic-type unit 102-106 in the floating point
portion receives two inputs, from respective sets of first and
second source multiplexers S1 and S2. The first source of each unit
ALU2, FALU, and MULT comes from the output of either read
multiplexer 112-0 or 112-2, and the second source comes from the
output of either read multiplexer 112-1 or 112-3. The sources of
the FALU and the MULT may also come from the integer portion of
FIG. 3 on bus 114.
[0098] The results of the ALU2, FALU, and MULT are provided back to
the write multiplexers 110 for storage into the floating point
registers RF[ ], and also to the read multiplexers 112 for re-use
as operands of subsequent operations. The FALU also outputs a
signal FALU_BD indicating the Boolean result of a floating point
comparison operation. FALU_BD is calculated directly from internal
zero and sign flags of the FALU.
[0099] Null byte tester NULL 108 performs null byte testing
operations upon an operand from a first source multiplexer, in one
mode that of the ALU2. NULL 108 outputs a Boolean signal NULLB_BD
indicating whether the thirty-two-bit first source operand includes
a byte of value zero.
[0100] The outputs of read multiplexers 112-0, 112-1, and 112-4 are
provided to the integer portion (of FIG. 3) on bus 118. The output
of read multiplexer 112-4 is also provided as STDT_FP[ ] store data
to the floating point load/store unit 122.
[0101] FIG. 5 illustrates further details of the control of the S1
and S2 multiplexers. As seen, in one embodiment, each S1
multiplexer may be responsive to bit B1 of the instruction I[ ],
and each S2 multiplexer may be responsive to bit B2 of the
instruction I[ ]. The S1 and S2 multiplexers select the sources for
the various functional units. The sources may come from either of
the register files, as controlled by the B1 and B2 bits of the
instruction itself. Additionally, each register file includes two
read ports from which the sources may come, as controlled by
hardware not shown in the FIGS.
[0102] B. Integer Portion Data Paths
[0103] As seen in FIG. 3, the register set A 18 is also
multi-ported. In one embodiment, the register set A 18 has two
write ports WA0-1, and five read ports RDA0-4. The integer
functional unit 66 of FIG. 1 is comprised of the ALU0 140, ALU1
142, SHF0 144, and NULL 146 of FIG. 3. All elements of FIG. 3
except the register set 18 and the elements 140-146 comprise the
SMC unit A of FIG. 1.
[0104] External data bus EX_DATA[ ] provides data to the integer
load/store unit 152. Immediate integer data on bus LDI_IMED[ ] are
provided in response to a "load immediate" instruction. Other
immediate integer data are provided on busses RFA1_IMED and
RFA2_IMED in response to non-load immediate instructions, such as
an "add immediate." Data are also provided on bus EX_SR_DT[ ] in
response to a "special register move" instruction. Data may also
arrive from the floating point portion (shown in FIG. 2) on busses
116 and 118.
[0105] The integer register set's two write ports WA0 and WA1 are
coupled to write multiplexers 148-0 and 148-1, respectively. The
write multiplexers 148 receive data from: the FALU or MULT of the
floating point portion (of FIG. 2); the ALU0; the ALU1; the SHF0;
either EX_SR_DT[ ] or LDI_IMED[ ]; and EX_DATA[ ].
[0106] The integer register set's five read ports RDA0 to RDA4 are
coupled to read multiplexers 150-0 to 150-4, respectively. Each
read multiplexer also receives data from: either EX_SR_DT[ ] or
LDI_IMED[ ] on load immediate bypass bus 160; a load external data
bypass bus 154, which allows external load data to skip the
register set A; ALU0; ALU1; SHF0; and either the FALU or the MULT
of the floating point portion (of FIG. 2). Read multiplexers 150-1
and 150-3 also receive data from RFA1_IMED[ ] and RFA2_IMED[ ],
respectively.
[0107] Each arithmetic-type unit 140-144 in the integer portion
receives two inputs, from respective sets of first and second
source multiplexers S1 and S2. The first source of ALU0 comes from
either the output of read multiplexer 150-2, or a thirty-two-bit
wide constant zero (0000.sub.hex) or floating point read
multiplexer 112-4. The second source of ALU0 comes from either read
multiplexer 150-3 or floating point read multiplexer 112-1. The
first source of ALU1 comes from either read multiplexer 150-0 or
IF_PC[ ]. IF_PC[ ] is used in calculating a return address needed
by the instruction fetch unit (not shown), due to the IEU's ability
to perform instructions in an out-of-order sequence. The second
source of ALU1 comes from either read multiplexer 150-1 or
CF_OFFSET[ ]. CF_OFFSET[ ] is used in calculating a return address
for a CALL instruction, also due to the out-of-order
capability.
[0108] The first source of the shifter SHF0 144 is from either:
floating point read multiplexer 112-0 or 112-4; or any-integer read
multiplexer 150. The second source of SHF0 is from either: floating
point read multiplexer 112-0 or 112-4; or integer read
multiplexer-150-0, 150-2, or 150-4. SHF0 takes a third input from a
shift amount multiplexer (SA). The third input controls how far to
shift, and is taken by the SA multiplexer from either: floating
point read multiplexer 112-1; integer read multiplexer 150-1 or
150-3; or a five-bit wide constant thirty-one (11111.sub.2 or
31.sub.10) The shifter SHF0 requires a fourth input from the size
multiplexer (S). The fourth input controls how much data to shift,
and is taken by the S multiplexer from either: read multiplexer
150-1; read multiplexer 150-3; or a five-bit wide constant sixteen
(10000.sub.2 or 16.sub.10).
[0109] The results of the ALU0, ALU1, and SHF0 are provided back to
the write multiplexers 148 for storage into the integer registers
RA[ ], and also to the read multiplexers 150 for re-use as operands
of subsequent operations. The output of either ALU0 or SHF0 is
provided on bus 120 to the floating point portion of FIG. 3. The
ALU0 and ALU1 also output signals ALU0_BD and ALU1_BD,
respectively, indicating the Boolean results of integer comparison
operations. ALU0_BD and ALU1_BD are calculated directly from the
zero and sign flags of the respective functional units. ALU0 also
outputs signals EX_TADR[ ] and EX_VM_ADR. EX_TADR[ ] is the target
address generated for an absolute branch instruction, and is sent
to the IFU (not shown) for fetching the target instruction.
EX_VM_ADR[ ] is the virtual address used for all loads from memory
and stores to memory, and is sent to the VMU (not shown) for
address translation.
[0110] Null byte tester NULL 146 performs null byte testing
operations upon an operand from a first source multiplexer. In one
embodiment, the operand is from the ALU0. NULL 146 outputs a
Boolean signal NULLA_BD indicating whether the thirty-two-bit first
source operand includes a byte of value zero.
[0111] The outputs of read multiplexers 150-0 and 150-1 are
provided to the floating point portion (of FIG. 2) on bus 114. The
output of read multiplexer 150-4 is also provided as STDT_INT[ ]
store data to the integer load/store unit 152.
[0112] A control bit PSR[7] is provided to the register set A 18.
It is this signal which, in FIG. 1, is provided from the mode
control unit 44 to the IEU mode integer switch 34 on line 46. The
IEU mode integer switch is internal to the register set A 18 as
shown in FIG. 3.
[0113] FIG. 6 illustrates further details of the control of the S1
and S2 multiplexers.
[0114] C. Boolean Portion Data Paths
[0115] As seen in FIG. 4, the register set C 22 is also
multi-ported. In one embodiment, the register set C 22 has two
write ports WC0-1, and five read ports RDA0-4. All elements of FIG.
4 except the register set 22 and the Boolean combinational unit 70
comprise the SMC unit C of FIG. 1.
[0116] The Boolean register set's two write ports WC0 and WC1 are
coupled to write multiplexers 170-0 and 170-1, respectively. The
write multiplexers 170 receive data from: the output of the Boolean
combinational unit 70, indicating the Boolean result of a Boolean
combinational operation; ALU0_BD from the integer portion of FIG.
3, indicating the Boolean result of an integer comparison; FALU_BD
from the floating point portion of FIG. 2, indicating the Boolean
result of a floating point comparison; either ALU1_BD-P from ALU1,
indicating the results of a compare instruction in ALU1, or
NULLA_BD from NULL 146, indicating a null byte in the integer
portion; and either ALU2_BD_P from ALU2, indicating the results of
a compare operation in ALU2, or NULLB_BD from NULL 108, indicating
a null byte in the floating point portion. In one mode, the
ALU0_BD, ALU1_BD, ALU2_BD, and FALU_BD signals are not taken from
the data paths, but are calculated as a function of the zero flag,
minus flag, carry flag, and other condition flags in the PSR. In
one mode, wherein up to eight instructions may be executing at one
instant in the IEU, the IEU maintains up to eight PSRs.
[0117] The Boolean register set C is also coupled to bus EX_SR DT[
], for use with "special register move" instructions. The CSR may
be written or read as a whole, as though it were a single
thirty-two-bit register. This enables rapid saving and restoration
of machine state information, such as may be necessary upon certain
drastic system errors or upon certain forms of grand scale context
switching.
[0118] The Boolean register set's five read ports RDC0 to RDC3 are
coupled to read multiplexers 172-0 to 172-4, respectively. The read
multiplexers 172 receive the same set of inputs as the write
multiplexers 170 receive. The Boolean combinational unit 70
receives inputs from read multiplexers 170-0 and 170-1. Read
multiplexers 172-2 and 172-3 respectively provide signals
BLBP_CPORT and BLBP_DPORT. BLBP_CPORT is used as the basis for
conditional branching instructions in the IEU. BLBP_DPORT is used
in the "add with Boolean" instruction, which sets an integer
register in the A or B set to zero or one (with leading zeroes),
depending upon the content of a register in the C set. Read port
RDC4 is presently unused, and is reserved for future enhancements
of the Boolean functionality of the IEU.
[0119] IV. Conclusion
[0120] While the features and advantages of the present invention
have been described with respect to particular embodiments thereof,
and in varying degrees of detail, it will be appreciated that the
invention is not limited to the described embodiments. The
following claims define the invention to be afforded patent
coverage.
* * * * *