U.S. patent application number 10/238276 was filed with the patent office on 2002-09-10 and published on 2004-03-11 for extended register space apparatus and methods for processors.
Invention is credited to Kling, Ralph M.
Application Number | 20040049657 10/238276 |
Family ID | 31990940 |
Filed Date | 2002-09-10 |
United States Patent
Application |
20040049657 |
Kind Code |
A1 |
Kling, Ralph M. |
March 11, 2004 |
Extended register space apparatus and methods for processors
Abstract
Methods and apparatus for accessing an extended register space
associated with a processor are disclosed. In an example method, a
first portion of an encoding field of an instruction is compared to
a value associated with the extended register space. A first
operand of the instruction is associated with a second portion of
the encoding field if the first portion of the encoding field
matches the value associated with the extended register space.
Inventors: |
Kling, Ralph M.; (Sunnyvale,
CA) |
Correspondence
Address: |
RALPH M. KLING
1422 BEDFORD AVE.
SUNNYVALE
CA
94087
US
|
Family ID: |
31990940 |
Appl. No.: |
10/238276 |
Filed: |
September 10, 2002 |
Current U.S.
Class: |
712/208 ;
712/E9.023; 712/E9.035; 712/E9.041 |
Current CPC
Class: |
G06F 9/30134 20130101;
G06F 9/30138 20130101 |
Class at
Publication: |
712/208 |
International
Class: |
G06F 009/30 |
Claims
What is claimed is:
1. A method of accessing an extended register space associated with
a processor, the method comprising: comparing a first portion of a
first encoding field of an instruction to a value associated with
the extended register space; and associating a first operand of the
instruction with a second portion of the first encoding field if
the first portion of the first encoding field matches the value
associated with the extended register space.
2. The method of claim 1, wherein comparing the first portion of
the first encoding field to the value associated with the extended
register space includes comparing a portion of a displacement field
of the instruction to the value associated with the extended
register space.
3. The method of claim 2, wherein comparing the portion of the
displacement field of the instruction to the value associated with
the extended register space includes comparing a page identifier
portion of the displacement field to a page identifier associated
with the extended register space.
4. The method of claim 3, wherein comparing the page identifier
portion of the displacement field to the page identifier associated
with the extended register space includes comparing a predetermined
number of most significant bits of the displacement field to the
page identifier associated with the extended register space.
5. The method of claim 1, wherein associating the first operand of
the instruction with the second portion of the first encoding field
if the first portion of the first encoding field matches the value
associated with the extended register space includes associating
one of a source and a destination operand with the second portion
of the first encoding field.
6. The method of claim 1, further including configuring a memory
associated with the processor so that a portion of the memory
corresponding to the extended register space is used exclusively by
the processor.
7. The method of claim 6, further including storing information
from the extended register space within the portion of the memory
corresponding to the extended register space in response to a
context switch.
8. The method of claim 1, further including associating a portion
of a second encoding field of the instruction with a second operand
of the instruction.
9. The method of claim 8, wherein associating the portion of the
second encoding field with the second operand includes associating
a portion of an Mr/m field with one of a source and a destination
operand.
10. The method of claim 1, further including associating third and
fourth portions of the first encoding field and a portion of a
second encoding field of the instruction with second and third
operands if the first portion of the first encoding field matches
the value associated with the extended register space.
11. A method of accessing a register space associated with a
processor, the method comprising: comparing a first portion of a
displacement field of an instruction to a value associated with the
register space; and associating an operand of the instruction with
a second portion of the displacement field if the first portion of
the displacement field matches the value associated with the
register space.
12. The method of claim 11, wherein comparing the first portion of
the displacement field to the value associated with the register
space includes comparing a tag portion of the displacement field to
a tag associated with the register space.
13. The method of claim 12, wherein comparing the tag portion of
the displacement field to the tag associated with the register
space includes comparing a predetermined number of most significant
bits of the displacement field to the tag associated with the
register space.
14. The method of claim 11, wherein associating the operand of the
instruction with the second portion of the displacement field if
the first portion of the displacement field matches the value
associated with the register space includes associating one of a
source and a destination operand with the second portion of the
displacement field.
15. The method of claim 11, further including configuring a system
memory so that a portion of the system memory corresponding to the
register space is not shared.
16. The method of claim 11, further including associating third and
fourth portions of the displacement field and a portion of an Mr/m
field with second and third operands if the first portion of the
displacement field matches the value associated with the register
space.
17. A method of processing an instruction requiring access to a
register space associated with a processor, the method comprising:
comparing a tag portion of a first encoding field of the
instruction to a value associated with the register space; and
decoding the instruction to associate first and second operands of
the instruction with respective first and second registers within
the register space.
18. The method of claim 17, wherein comparing the tag portion of
the first encoding field of the instruction to the value associated
with the register space includes comparing a portion of a
displacement field to the value associated with the register
space.
19. The method of claim 17, wherein decoding the instruction to
associate the first and second operands of the instruction with the
respective first and second registers includes associating a
register index portion of the first encoding field with the first
operand and a portion of a second encoding field of the instruction
with the second operand.
20. The method of claim 19, wherein associating the register index
portion of the first encoding field with the first operand and the
portion of the second encoding field with the second operand
includes associating a portion of a displacement field with the
first operand and a portion of an Mr/m field with the second
operand.
21. The method of claim 17, further including tracking changes
within a base register and restarting the instruction in response
to detection of a change affecting the instruction.
22. The method of claim 17, further including comparing the tag
portion of the first encoding field of the instruction to a
plurality of values, each of which is associated with a portion of
the register space.
23. The method of claim 17, further including saving the
information stored within the register space in response to a
context switch.
24. The method of claim 17, further including mapping registers not
located within the register space into the register space.
25. The method of claim 17, further including using a microcode
trace to store information associated with accessing the register
space.
26. The method of claim 17, further including emulating the
functionality of the instruction within a fault handler in response
to an attempt by the instruction to access an unmapped memory
address.
27. A processor, comprising: a register space; an instruction
decoding pipeline; and microcode adapted to cause the processor to
process an instruction requiring access to the register space
within the instruction decoding pipeline so that a first portion of
a first encoding field of the instruction is compared to a value
associated with the register space and so that a first operand of
the instruction is associated with a second portion of the first
encoding field if the first portion of the first encoding field
matches the value associated with the register space.
28. The processor of claim 27, wherein the register space includes
first and second portions and wherein the first portion contains
fewer registers than the second portion.
29. The processor of claim 27, wherein the register space is
physically integrated within the processor.
30. The processor of claim 27, wherein the instruction decoding
pipeline includes a plurality of decoders adapted to perform
parallel decoding of the instruction.
31. The processor of claim 27, wherein the first encoding field is
a displacement addressing field and wherein the first portion of
the first encoding field is one of a memory page identifier and a
tag.
32. The processor of claim 27, wherein the second portion of the
first encoding field is one of a register index and an offset.
33. A computer system, comprising: a memory controller; a system
memory coupled to the memory controller; and a processor having a
register space and coupled to the memory controller, wherein the
processor is programmed to process an instruction requiring access
to the register space so that a first portion of an encoding field
of the instruction is compared to a value associated with the
register space and so that an operand of the instruction is
associated with a second portion of the encoding field if the first
portion of the encoding field matches the value associated with the
register space.
34. The computer system of claim 33, wherein a portion of the
register space portion corresponds to a page of the system
memory.
35. The computer system of claim 33, wherein the register space is
physically integrated within the processor.
36. The computer system of claim 33, wherein the register space
includes first and second portions and wherein the first portion
contains fewer registers than the second portion.
37. A method of processing computer readable instructions, the
method comprising: providing a first processor having a first
number of registers; defining a field in an instruction set
associated with a second processor having a second number of
registers less than the first number of registers so that an
instruction from the instruction set that accesses an off-chip
resource when executed by the second processor accesses an on-chip
resource when executed by the first processor.
38. The method of claim 37, wherein providing the first processor
having the first number of registers includes providing a processor
having more than eight general purpose registers.
39. The method of claim 37, wherein defining the field in the
instruction set includes defining a tag within a displacement field
of the instruction set.
40. The method of claim 39, wherein defining the field in the
instruction set associated with the second processor having the
second number of registers less than the first number of registers
so that the instruction from the instruction set that accesses the
off-chip resource when executed by the second processor accesses
the on-chip resource when executed by the first processor includes
using a portion of the field to access the on-chip resource.
41. A method of executing an instruction, comprising: providing a
processor with an extended set of on-chip registers; encoding the
instruction with an address of a first register in the extended set
of on-chip registers; executing the instruction using only data
from the first register and data from a second on-chip register
associated with the processor.
42. The method of claim 41, wherein the second register is in the
extended set of registers.
43. The method of claim 41, wherein the second register is in a
second set of on-chip registers associated with the processor.
44. A processor, comprising: a first set of registers; an extended
set of registers; a decoder to decode an instruction so that the
instruction is executed using only data from at least one of the
first set and extended set of registers.
45. The processor of claim 44, wherein the decoder comprises: a
first decoder to decode an opcode portion of an instruction; and a
second decoder to decode a tag portion of the instruction at
substantially the same time the first decoder decodes the opcode
portion of the instruction.
Description
FIELD OF THE DISCLOSURE
[0001] The present disclosure relates generally to microprocessors
and, more particularly, to apparatus and methods that extend the
register space available to a processor without requiring
modification of the instruction set encodings associated with that
processor.
BACKGROUND
[0002] The architectural register set or register space of a
processor is typically physically integrated within the processor
(i.e., is on-chip). Register space or registers may be used to
facilitate the rapid execution of instructions and manipulation of
operand values by a processor. As is well known, the registers
making up a register space are not a shared resource and, as a
result, can be accessed more rapidly than other resources that are
physically external or separate from the processor chip (i.e.,
off-chip) and/or which are shared with other agent resources. The
register space of a processor is not subject to memory coherency
schemes (such as those that are used within multiprocessor systems)
and other operational overhead associated with the management of
shared memory resources. Also, using a memory stack in lieu of a
larger register file introduces additional overhead associated with
address calculations.
[0003] Some microprocessors or processors provide a relatively
limited register space or architectural register set. For example,
the thirty-two bit Intel processor families, which are collectively
referred to as IA-32 processors, provide eight thirty-two bit
general purpose registers, which are located on-chip.
Unfortunately, many compiler optimizations, which are usually used
to increase the effective instruction-per-clock-cycle (IPC) rate of
processors, typically require more than eight general purpose
registers. Additionally, a larger number of registers is generally
beneficial because a larger number of registers enables program
execution to be carried out using fewer memory-based operations,
thereby reducing the overhead associated with accessing stack-based
operands and, thus, reducing cache occupation and bandwidth (i.e.,
cache ports) overhead. Reducing the number of stack-based memory
operations performed by a processor can free a substantial amount
of cache space and bandwidth for use by other load, store and
prefetch instructions, which can substantially increase the IPC
rate of the processor.
[0004] While it is a relatively simple matter to redesign a
processor to have a larger register space, such a processor
redesign typically requires changes to the instruction set
encodings to enable the redesigned processor to efficiently use the
additional register space. Furthermore, instruction set encoding
changes are typically not backward compatible with earlier versions
of the processor that have a smaller register space.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is a block diagram of an example processor system
that uses the extended register space apparatus and methods
described herein;
[0006] FIG. 2 is a more detailed block diagram of the processor
shown in FIG. 1;
[0007] FIG. 3 is a block diagram that depicts an example manner in
which an instruction encoding can be used by the processor shown in
FIGS. 1 and 2 to access an extended register space;
[0008] FIG. 4 is a flow diagram that depicts an example manner in
which the processor shown in FIGS. 1 and 2 can process an
instruction encoding to access an extended register space; and
[0009] FIG. 5 is a block diagram that depicts another example
manner in which an instruction encoding can be used by the
processor shown in FIGS. 1 and 2 to access an extended register
space.
DESCRIPTION OF THE PREFERRED EXAMPLES
[0010] FIG. 1 is a block diagram of an example processor system 10
that uses the extended register space apparatus and methods
described herein. As shown in FIG. 1, the processor system 10
includes a processor 12 that is coupled to an interconnection bus
or network 14. The processor 12 includes an architectural register
set or register space 16, which is depicted in FIG. 1 as being
entirely on-chip, but which could alternatively be located entirely
or partially off-chip and directly coupled to the processor 12 via
dedicated electrical connections and/or via the interconnection
network or bus 14. The processor 12 may be any suitable processor,
processing unit or microprocessor such as, for example, an Intel
Itanium™ processor, Intel X-Scale™ processor, Intel
Pentium™ processor, etc. However, in the example described in
detail below, the processor 12 is a thirty-two bit Intel processor,
which is commonly referred to as an IA-32 processor.
[0011] In the example shown in FIG. 1, regardless of whether the
register space 16 is implemented on-chip, off-chip, or some
combination of on-chip and off-chip, the register space 16 is
extended to provide more than eight thirty-two bit general purpose
registers, which are currently provided by existing IA-32
processors. Although not shown in FIG. 1, the system 10 may be a
multi-processor system and, thus, may include one or more
additional processors that are identical or similar to the
processor 12 and which are coupled to the interconnection bus or
network 14.
[0012] The processor 12 of FIG. 1 is coupled to a chipset 18, which
includes a memory controller 20 and an input/output (I/O)
controller 22. As is well known, a chipset typically provides I/O
and memory management functions as well as a plurality of general
purpose and/or special purpose registers, timers, etc. that are
accessible or used by one or more processors coupled to the
chipset. The memory controller 20 performs functions that enable
the processor 12 (or processors if there are multiple processors)
to access a system memory 24, which may include any desired type of
volatile memory such as, for example, static random access memory
(SRAM), dynamic random access memory (DRAM), etc. The I/O
controller 22 performs functions that enable the processor 12 to
communicate with peripheral input/output (I/O) devices 26 and 28
via an I/O bus 30. The I/O devices 26 and 28 may be any desired
type of I/O device such as, for example, a keyboard, a video
display or monitor, a mouse, etc. While the memory controller 20
and the I/O controller 22 are depicted in FIG. 1 as separate
functional blocks within the chipset 18, the functions performed by
these blocks may be integrated within a single semiconductor
circuit or may be implemented using two or more separate integrated
circuits.
[0013] FIG. 2 is a more detailed block diagram of the processor 12
shown in FIG. 1. In the example of FIG. 2, the register space 16 of
the processor 12 includes eight on-chip general purpose registers
36 that are currently provided by existing IA-32 processors and an
extended on-chip register space or set of registers 38. In
addition, the processor 12 includes instruction processing hardware
and/or logic 40 which, in addition to the pipeline hardware
provided with known IA-32 processors, includes two decoding blocks
42 and 44 that are adapted to process or decode instructions or
portions of an instruction in parallel. Still further, the
processor 12 includes microcode 46 that, in addition to enabling
the processor 12 to carry out the functions of a known IA-32
processor, enables the processor 12 to utilize the extended
register space 38 for carrying out existing IA-32 instruction set
encodings.
[0014] FIG. 3 is a block diagram that depicts an example manner in
which an existing or standard IA-32 instruction encoding can be
used by the processor 12 of FIGS. 1 and 2 to access the extended
register space 38. As shown in FIG. 3, the encoding fields 50 of a
standard instruction for an IA-32 processor include an optional
prefix field 52, an opcode field 54, an Mr/m field 56, an Sib field
58, a displacement addressing field 60, and an immediate addressing
field 62. Because the IA-32 processor instruction encoding fields
50 shown in FIG. 3 are well known, additional detailed description
of these fields is not required. However, for purposes of
facilitating an understanding of the examples described herein,
some additional description of the purpose and operation of these
fields is provided below.
[0015] The opcode field 54 contains the binary encoding, which in
this example is one-byte or eight bits of encoding, required to
carry out a particular processor operation such as, for example, an
arithmetic operation, a memory access operation, a register
contents manipulation (e.g., shift), or any combination of these
operations. The Mr/m field 56, among other things, is a one-byte
field that determines the addressing mode to be used in carrying
out an instruction (e.g., execution of an instruction by a
processor such as the processor 12 shown in FIG. 1). For example, a
displacement addressing mode or an immediate addressing mode may be
used depending on the status of the bits within the Mr/m field 56.
As is known, a displacement addressing mode uses the contents of
the displacement field 60 to address an operand associated with an
instruction relative to another memory address such as, for
example, the starting address of the instruction. On the other
hand, an immediate addressing mode uses the immediate addressing
field 62 to address an operand associated with the instruction
based on the contents of the immediate addressing field 62. In
other words, if used, the immediate addressing field 62 typically
contains an absolute (as opposed to a relative) memory address,
which is associated with an operand of the instruction.
[0016] The example instruction described in connection with FIG. 3
is an add with carry instruction, which is represented mnemonically
as ADC. As is known, the ADC instruction for an IA-32 processor
requires two operands, one of which is referred to as a source
(SRC) operand and the other of which is referred to as a
destination (DEST) operand. With existing IA-32 processors, one of
the two operands (i.e., SRC or DEST) must be located within an
on-chip register and the other one of the operands may be located
within system memory. When executed by an existing IA-32 processor,
the ADC instruction results in the summation of the contents of the
SRC, the DEST and the carry flag (CF) and storage of the sum in the
location associated with the DEST operand. Mnemonically, this
operation can be represented as DEST ← DEST + SRC + CF. Thus, the
DEST location functions as both an operand and a storage location
for the result of the instruction. For processor architectures that
allow more than one memory operand, the methods described herein
can be individually applied to each memory operand.
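As an illustration of the DEST, SRC and CF relationship described above, the following Python sketch models a thirty-two bit add with carry. The names adc32 and MASK32 are assumptions made for this illustration and do not appear in the disclosure. The sketch models only the stored sum and the carry out, and does not model any other processor flags.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # width of an IA-32 general purpose register


def adc32(dest: int, src: int, carry_in: int) -> Tuple[int, int]:
    """Return (new_dest, carry_out) for DEST <- DEST + SRC + CF."""
    total = (dest & MASK32) + (src & MASK32) + (carry_in & 1)
    # The low 32 bits are stored back in DEST; bit 32 is the carry out.
    return total & MASK32, total >> 32


print(adc32(0xFFFFFFFF, 0, 1))  # (0, 1)
```

Because the result is written back to DEST, the DEST location is both read and written, which is why the sketch takes dest as an input and returns its new value.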
[0017] When executing an ADC instruction, existing or known IA-32
processors interpret bits three to five of the Mr/m field 56 as an
address for one of the eight known or traditional general purpose
on-chip registers (e.g., registers zero to seven). Depending on the
particular encodings used for the ADC instruction, the register
address represented in the Mr/m field 56 may be either the location
of the SRC operand or the DEST operand. In the example depicted in
connection with reference numeral 66, the on-chip register is the
DEST operand and existing IA-32 processors use the displacement
field 60 to address a portion of system memory (e.g., a portion of
the memory 24 shown in FIG. 1) for the SRC operand. On the other
hand, in the example depicted in connection with reference numeral
68, the on-chip register is the SRC operand and existing IA-32
processors use the displacement field 60 to address system memory
for the DEST operand.
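The register address carried in bits three to five of the Mr/m field 56 can be illustrated with the following Python sketch. The function name modrm_fields and the sub-field names mod, reg and rm are assumptions for this illustration; they follow the naming commonly used in IA-32 documentation rather than anything stated in this disclosure.

```python
from typing import Tuple

# Conventional names of the eight IA-32 general purpose registers, indexed
# by the three-bit register address (registers zero to seven).
GPR_NAMES = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")


def modrm_fields(modrm: int) -> Tuple[int, int, int]:
    """Split a one-byte Mr/m value into (mod, reg, rm)."""
    mod = (modrm >> 6) & 0b11   # bits six and seven: addressing mode
    reg = (modrm >> 3) & 0b111  # bits three to five: on-chip register address
    rm = modrm & 0b111          # bits zero to two: second operand selector
    return mod, reg, rm


mod, reg, rm = modrm_fields(0x15)
print(mod, GPR_NAMES[reg], rm)  # 0 EDX 5
```

In this sketch the byte 0x15 selects register two (EDX) as the on-chip operand; whether that register is the SRC or the DEST operand depends on the particular instruction encoding, as described above.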
[0018] For the example IA-32 processor of FIG. 2, the register
space 16 (FIG. 1) is extended and, thus, contains more than the
eight traditional general purpose registers currently provided with
IA-32 processors. In the example of FIG. 2, the register space 16
is extended to contain an additional 1024 thirty-two bit registers.
However, any other number of additional registers may be used
instead. As described in greater detail in connection with FIG. 4
below, the apparatus and methods described herein enable the
instruction encoding fields 50 shown in FIG. 3 to access the
register space 16 of the processor 12. In particular, when
executing an ADC instruction in a displacement addressing mode as
depicted in FIG. 3, the processor 12 reads the most significant
(i.e., the upper) twenty bits of the displacement field 60 as a
page identifier or tag and then compares this page identifier or
tag to a predetermined identifier value associated with the
extended register space 38. As described in detail in connection
with FIG. 4 below, if the page identifier or tag read from the
displacement field 60 matches the identifier value associated with
the extended register space 38, the processor 12 processes the
instruction by using the lower twelve bits of the displacement
field 60 to access one of the two operands of the instruction
within the extended register space 38.
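The tag comparison described above can be sketched in Python as follows. The constant EXT_REG_TAG and its value 0xFFFFE are hypothetical; the disclosure does not specify which page identifier is associated with the extended register space 38. The function names are likewise assumptions for this illustration.

```python
from typing import Tuple

TAG_BITS = 20          # upper twenty bits: page identifier or tag
OFFSET_BITS = 12       # lower twelve bits: offset within the page
EXT_REG_TAG = 0xFFFFE  # hypothetical tag value, chosen only for this sketch


def split_displacement(disp: int) -> Tuple[int, int]:
    """Split a thirty-two bit displacement into (tag, offset)."""
    disp &= 0xFFFFFFFF
    return disp >> OFFSET_BITS, disp & ((1 << OFFSET_BITS) - 1)


def targets_extended_space(disp: int, tag: int = EXT_REG_TAG) -> bool:
    """True when the displacement's tag matches the extended register space tag."""
    return split_displacement(disp)[0] == tag


print(targets_extended_space(0xFFFFE010))  # True
print(targets_extended_space(0x00401000))  # False
```

A displacement whose tag does not match is treated as an ordinary memory address, so instructions that do not target the reserved page are unaffected.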
[0019] As depicted in FIG. 3, the lower twelve bits or offset of
the displacement field 60 are used as a register index to the
extended register space 38. Specifically, bits two to eleven are
used to address the 1024 thirty-two bit registers. The lowest two
bits (i.e., zero and one) are ignored because these bits correspond
to (i.e., may be used to individually address or select) the four
bytes making up each thirty-two bit register word. Thus, if bits
three to five of the Mr/m field 56 address the SRC operand, then
bits two to eleven of the displacement field 60 are used by the
processor 12 to address the DEST operand within the extended
register space 38. On the other hand, if bits three to five of the
Mr/m field 56 address the DEST operand, then bits two to eleven of
the displacement field 60 are used by the processor 12 to address
the SRC operand within the extended register space 38.
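The mapping from the twelve-bit offset to a register index can be sketched in Python as follows. The function names register_index and byte_within_register are assumptions for this illustration.

```python
def register_index(offset: int) -> int:
    """Bits two to eleven of the offset select one of 1024 registers."""
    return (offset >> 2) & 0x3FF


def byte_within_register(offset: int) -> int:
    """Bits zero and one select a byte within a thirty-two bit register.

    These bits are ignored when a whole register is addressed.
    """
    return offset & 0b11


print(register_index(0x010))  # 4
print(register_index(0xFFC))  # 1023
```

Offsets 0x010 through 0x013 all map to register four, because they differ only in the two low-order bits that are ignored.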
[0020] Although the example described in connection with FIG. 3
uses a single page identifier or tag that corresponds to a four
kilobyte page or 1024 thirty-two bit words within the memory map of
the processor 12, additional page identifiers or tags could be used
to enable the processor 12 to access more than 1024 thirty-two bit
registers within the extended register space 38. Likewise, fewer
than 1024 thirty-two bit registers may be provided within the
extended register space 38, in which case some of the register
addresses provided by the lower twelve bits of the displacement
field 60 may be unused or ignored. Alternatively, a tag having more
than twenty bits may be used to access registers within the
extended register space 38. In that case, the offset or register
index portion of the displacement field 60 would have fewer than
twelve bits and, thus, would enable addressing and access to fewer
than 1024 thirty-two bit registers. Additionally, although the
example depicted in FIG. 3 is based on an add with carry
instruction, any other instruction using memory operands could be
used instead. Still further, while the example depicted in FIG. 3
is based on using an instruction set for an IA-32 processor, other
instruction sets associated with other processor types could be
used instead. In particular, for implementations based on these
other instruction sets and processor types, the fields associated
with the native register address and memory address would be used
instead of the IA-32 fields "Mr/m" and "displacement."
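The trade-off between tag width and the number of addressable registers, and the use of more than one tag, can be sketched in Python as follows. The function names, the EXT_REG_TAGS table, and the two tag values in it are hypothetical and are chosen only for this illustration.

```python
from typing import Optional


def addressable_registers(tag_bits: int, disp_bits: int = 32,
                          reg_bytes: int = 4) -> int:
    """Number of registers one tag can address for a given tag width."""
    offset_bits = disp_bits - tag_bits
    return (1 << offset_bits) // reg_bytes


# Hypothetical table mapping each page tag to the index of the first
# register in the corresponding bank of 1024 registers.
EXT_REG_TAGS = {0xFFFFE: 0, 0xFFFFF: 1024}


def global_register_index(disp: int) -> Optional[int]:
    """Return the register index for a displacement, or None if no tag matches."""
    disp &= 0xFFFFFFFF
    base = EXT_REG_TAGS.get(disp >> 12)
    if base is None:
        return None
    return base + ((disp >> 2) & 0x3FF)


print(addressable_registers(20))  # 1024
print(addressable_registers(22))  # 256
print(global_register_index(0xFFFFF008))  # 1026
```

With a twenty bit tag, each tag covers 1024 thirty-two bit registers; widening the tag to twenty-two bits leaves a ten bit offset and therefore 256 registers per tag.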
[0021] In the example described in connection with FIGS. 1-3, the
processor 12 is an IA-32 processor and the register space 16
includes the eight general purpose on-chip registers that are
traditionally provided by known IA-32 processors and an additional
1024 thirty-two bit on-chip registers, which have not previously
been provided with IA-32 processors. To enable the processor 12 to
access the extended register space 38 using instruction encodings
compatible with existing IA-32 processors (i.e., processors which
do not have the extended register space 38), the processor 12
includes microarchitecture (e.g., microcode) for causing the
processor 12 to carry out the instruction processing technique
described in detail in connection with FIG. 4 below. In addition,
the operating system (OS) and/or basic input/output system (BIOS)
of the computer system 10 is configured so that the memory map of
the system 10 reserves the memory page associated with the extended
register space 38 for exclusive use by the processor 12. In other
words, the memory page identifier that would normally be used by
existing IA-32 processors to address a physical page of memory
within the system memory 24 is instead used exclusively by the
processor 12 (i.e., is not shared by other resources within the
system 10) to address registers within the extended register space
38.
[0022] FIG. 4 is a flow diagram that depicts an example manner in
which the processor 12 shown in FIGS. 1 and 2 can process existing
or standard IA-32 instruction encodings to access the extended
register space 38. In particular, the flow diagram shown in FIG. 4
depicts an example manner in which the front-end instruction
processing pipeline within the instruction processing hardware or
logic 40 of the processor 12 is configured to operate when
processing a standard IA-32 instruction encoding such as, for
example, the instruction depicted in FIG. 3. As shown in FIG. 4,
the processor 12 accesses the cache (block 100), fetches the next
instruction to be processed (block 102) and decodes the length of
the instruction to be processed (block 104). As is known, decoding
the length of an instruction enables a processor to parse the
instruction into its component encoding fields (i.e., opcode field,
Mr/m field, displacement field, etc.). The instruction to be
processed by the processor 12 is then decoded (blocks 106 and 108),
renamed (block 110) and then queued for execution (block 112). It
should be recognized that the activities associated with blocks
100-112 of FIG. 4 are currently employed by existing IA-32
processors and, thus, are well known and are not described in
greater detail herein.
[0023] The processor 12 is adapted to perform additional activities
in parallel to the instruction processing activities associated
with blocks 100-112 described in connection with FIG. 4. The
processor 12 uses the decoding blocks 42 and 44 to carry out the
decoding activities associated with blocks 106 and 108. In
addition, the decoding blocks 42 and 44 are used to determine
whether the page identifier or tag portion of the displacement
field 60 matches an identifier value or tag associated with the
extended register space 38 of the processor 12 (block 114). If the
tag portion of the displacement field 60 does not match the tag
associated with the extended register space 38 of the processor 12,
then the decoding hardware or logic performing parallel decoding
(i.e., in parallel to blocks 106 and 108) of the instruction
currently being processed takes no further action in connection
with the instruction. On the other hand, if the page identifier or
tag portion of the displacement field 60 does match the tag
associated with the extended register space 38, then the processor
12 uses one of the decoders 42 and 44 to decode (block 116) the
register pointer bits (i.e., bits three to five) of the Mr/m field
56 and the register index bits (i.e., the lower twelve bits) of the
displacement field 60 to determine whether the SRC operand or DEST
operand is located within the extended register space 38 and, thus,
is to be addressed by the register index portion of the
displacement field 60.
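The tag comparison (block 114) and register decode (block 116)
described above can be illustrated with a minimal sketch. The sketch
assumes a thirty-two bit displacement whose upper twenty bits hold the
tag and whose lower twelve bits hold the register index. The function
names and the returned dictionary are hypothetical and are not part of
the disclosed hardware.

```python
TAG_SHIFT = 12                      # lower twelve bits hold the register index
INDEX_MASK = (1 << TAG_SHIFT) - 1   # 0xFFF


def split_displacement(disp32):
    """Split a 32-bit displacement into (tag, register index)."""
    disp32 &= 0xFFFFFFFF
    return disp32 >> TAG_SHIFT, disp32 & INDEX_MASK


def modrm_reg_bits(modrm):
    """Return bits three to five of the Mr/m byte."""
    return (modrm >> 3) & 0b111


def decode_extended_operand(modrm, disp32, extended_tag):
    """Model blocks 114 and 116; return None when the tag does not match."""
    tag, index = split_displacement(disp32)
    if tag != extended_tag:
        # No match: the parallel decode path takes no further action.
        return None
    return {"reg_bits": modrm_reg_bits(modrm), "register_index": index}
```

In this sketch a mismatch simply yields None, which mirrors the
parallel decoding logic taking no further action while the standard
decode (blocks 106 and 108) proceeds unchanged.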
[0024] As can be seen from the example in FIG. 4, the number of
clock cycles required to decode an instruction that utilizes the
extended register space 38 can be minimized by providing additional
decoding hardware and/or logic that performs register decoding
operations (e.g., block 116) in parallel to instruction decoding
activities (e.g., blocks 106 and 108). For example, with the
example processor 12 shown in FIG. 2, one of the decoders 42 and 44
can be used for register decoding operations while the other one of
the decoders 42 and 44 is used for instruction decoding activities.
However, the addressing mode used by the instruction affects the
extent to which instruction decoding and register decoding
operations can be performed in parallel. For instance, for the
example instruction shown and described in connection with FIG. 3,
displacement addressing is used. With displacement addressing, an
operand address is directly encoded within the instruction (i.e.,
within the displacement field 60 and/or the Mr/m field 56), thereby
enabling substantial parallel processing of the encoding fields
within the instruction.
[0025] In the case where the page identifier or tag portion of the
displacement field 60 is contained within a register (i.e., the tag
value is stored in the register) such as, for example, with
addressing that uses indirection through a base register, the
technique shown
in FIG. 4 may be used to compare (block 114) the value stored in
the base register to the tag or value associated with the extended
register space 38. However, such a comparison may be speculative
because the comparison is performed at the front-end of the
instruction processing pipeline and a subsequent processor
operation could change the value stored in the base register. Thus,
with indirect or other more complex addressing modes, the processor
12 is preferably configured to track changes to the base register
and, upon recognition of changes to the base register value,
restart any instruction affected by the change. In any event,
changes to the page identifier or tag portion (i.e., the upper
twenty bits) of the base register are a relatively rare occurrence
and, thus, instruction restarts and the like would have a minimal
impact on overall execution speed or the effective IPC rate of the
processor 12.
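The tracking of base register changes described above might be
modeled as follows. This is a sketch only. It assumes that a
speculative comparison is invalidated when the tag portion (i.e., the
upper twenty bits) of the base register changes, and the class name,
method names and instruction identifiers are hypothetical.

```python
class SpeculativeTagTracker:
    """Track speculative tag comparisons made through a base register."""

    def __init__(self, extended_tag):
        self.extended_tag = extended_tag
        self.pending = {}  # instruction id -> (base register name, tag seen)

    @staticmethod
    def _tag(value):
        return (value & 0xFFFFFFFF) >> 12

    def speculate(self, instr_id, base_reg, base_value):
        """Front-end comparison (block 114) using the current base value."""
        tag = self._tag(base_value)
        self.pending[instr_id] = (base_reg, tag)
        return tag == self.extended_tag

    def on_base_write(self, base_reg, new_value):
        """Return the ids of instructions to restart after a write."""
        new_tag = self._tag(new_value)
        restart = [i for i, (reg, tag) in self.pending.items()
                   if reg == base_reg and tag != new_tag]
        for i in restart:
            del self.pending[i]
        return restart

    def retire(self, instr_id):
        """Stop tracking an instruction once it has completed."""
        self.pending.pop(instr_id, None)
```

Because a write that leaves the upper twenty bits unchanged returns an
empty restart list in this model, restarts occur only in the
relatively rare case in which the tag portion itself changes.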
[0026] From the above example, it can be seen that standard or
known IA-32 instruction encodings can be used to enable an
IA-32 processor having an extended register space (e.g., the
extended register space 38 of the processor 12) to use that
extended register space to store operand values that would
traditionally be stored within system memory (e.g., within off-chip
shared memory). The use of register-based operations in place of
operations that would otherwise be memory-based reduces the use of
stack-based operations and other memory access overhead, thereby
resulting in an increased IPC rate for the processor having the
extended register space.
[0027] Software written for a processor having an extended register
set such as the example processor described in connection with
FIGS. 1-4 above is backward compatible with (i.e., can run natively
on or can be executed by) an existing IA-32 processor having only
the eight traditional on-chip general purpose registers. To enable
such backward compatibility, software or instructions utilizing the
extended register set are compiled so that an instruction requiring
access to a register within the extended register set is reduced to
a memory access operation. However, the BIOS and/or OS executed by
the existing IA-32 processor must ensure that the system memory
used as register space is available to the existing IA-32
processor. In other words, if software is written for use by an
IA-32 processor having an additional 1024 thirty-two bit on-chip
registers, executing this software on a currently available IA-32
processor having only eight on-chip general purpose registers
requires the BIOS and/or OS of the existing IA-32 processor to map
a page (i.e., 1024 thirty-two bit words) with the same base address
as the extended register tag within its system memory. However,
executing software that makes use of the extended register space 38
on an existing IA-32 processor does not provide a performance
advantage (e.g., an increased IPC rate) because operands addressed
within the extended register space physically reside within system
memory and, thus, accessing these operands involves memory
operations and the processing overhead associated therewith.
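The page mapping described above can be checked with simple
arithmetic, sketched below. The sketch assumes that each extended
register occupies four consecutive bytes and that the page base
address is the twenty-bit tag followed by twelve zero bits. The
function names are hypothetical.

```python
REGISTER_COUNT = 1024                          # thirty-two bit registers per page
REGISTER_BYTES = 4                             # bytes per thirty-two bit word
PAGE_BYTES = REGISTER_COUNT * REGISTER_BYTES   # size of one page in bytes


def page_base_address(extended_tag):
    """Base address of the page the BIOS and/or OS would map."""
    return (extended_tag << 12) & 0xFFFFFFFF


def fallback_address(extended_tag, register_number):
    """Memory address that stands in for one extended register."""
    if not 0 <= register_number < REGISTER_COUNT:
        raise ValueError("register number outside the page")
    return page_base_address(extended_tag) + register_number * REGISTER_BYTES
```

Under these assumptions, one page of 1024 thirty-two bit words spans
exactly the range addressed by the lower twelve bits of the
displacement field.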
[0028] As noted above, the extended register space 38 provided
within the processor 12 can be more or less than 1024 thirty-two
bit words (e.g., more than one page) if desired. For example, in a
case where the processor 12 is executing a single thread or process
that uses multiple pages of register space within the extended
register space 38, the tag match or comparison (block 114) shown in
FIG. 4 compares the tag portion of the displacement field 60 of
each instruction executed in the thread to identifier values or
tags that correspond to the multiple pages of register space. If
any one of the identifiers or tags matches the tag portion of the
displacement field 60, the processor 12 carries out the register
decoding (block 116) as described in connection with FIG. 4
above.
[0029] On the other hand, in a case where the processor 12 uses its
operating system to carry out multiple threads or processes, each
thread or process can be associated with a different page
identifier or tag so that each thread or process has its own page
of register space. Thus, in the case where the processor 12 is
executing multiple threads or processes, each of which is
associated with a different page identifier or tag, the tag match
or comparison (block 114) shown in FIG. 4 compares the tag portion
of the displacement field 60 to the identifier associated with the
page used for the current thread or process.
[0030] Still further, the processor 12 may execute multiple threads
or processes where some or all of those threads or processes use a
plurality of pages within the extended register space 38. In other
words, there may be multiple threads and each of those threads may
have access to more than one page within the extended register
space 38. In this case, the tag match or comparison (block 114)
compares the tag portion of the displacement field 60 to the
identifier values or tags associated with the current thread.
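The single-thread and multiple-thread cases described above reduce to
one membership test in the following sketch, which assumes that each
thread is associated with a set of one or more tags. The table layout,
thread identifiers and function names are hypothetical.

```python
def tag_of(disp32):
    """Upper twenty bits of a 32-bit displacement."""
    return (disp32 & 0xFFFFFFFF) >> 12


def tag_matches(disp32, current_thread, tags_by_thread):
    """Model block 114 when each thread owns one or more pages."""
    allowed = tags_by_thread.get(current_thread, frozenset())
    return tag_of(disp32) in allowed
```

For example, with `tags_by_thread = {"A": {0xABCDE, 0xABCDF}, "B":
{0x12345}}`, a displacement of 0xABCDF010 matches for thread "A" but
not for thread "B".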
[0031] For single-threaded processors or multi-threaded processors
(i.e., processors that execute multiple processes simultaneously)
that have the
extended register space 38, the operating system is preferably
adapted to save and restore the extended register space 38 for each
thread or process in response to a context switch (i.e., when
switching from execution of one process or thread to another
process or thread). Additionally, an efficient transfer of operands
between the eight traditional on-chip general purpose registers and
the extended register space 38 can be implemented by mapping the
traditional registers into the extended register space 38.
Alternatively, the eight traditional registers associated with
known IA-32 processors may be kept physically and logically
separate from the extended register space 38 and specific encodings
of the Mr/m field 56 can be used to indicate that a source or
destination operand is located in one of the eight traditional
on-chip registers.
[0032] Further optimization of the use of the extended register
space 38 can be achieved with processors having trace cache-based
microarchitectures. In particular, when a processor having a trace
cache-based microarchitecture identifies an instruction that
requires access to the extended register space 38, information
relating to that instruction and the extended register space to
which it requires access can be stored in the microcode trace to
enable more efficient processing of that instruction during
subsequent invocations of the instruction.
[0033] FIG. 5 is a block diagram that depicts another example
manner in which instruction encoding fields 150 of a standard IA-32
instruction can be used by the processor 12 shown in FIG. 1 to
access the extended register space 38. As shown in FIG. 5, the
example instruction is composed using standard IA-32 processor
instruction encoding fields (i.e., the encoding fields that are
used with IA-32 processors having only eight on-chip general
purpose registers). As with the instruction shown in FIG. 3, the
example encoding fields 150 include a prefix field 152, an opcode
field 154, an Mr/m byte or field 156, an Sib field 158, a
displacement addressing field 160 and an immediate addressing field
162.
[0034] As depicted in FIG. 5, bits three to five of the Mr/m field
156 and an offset portion (i.e., bits zero to eleven) 163 of the
displacement field 160 are used by the processor 12 to access three
operands within three different registers. In the example shown in
FIG. 5, bits within the Mr/m field 156 and the offset portion 163
of the displacement field 160 are decoded as a three operand add
with carry (ADC) instruction 164. However, the principles depicted
in FIG. 5 could be applied to any other instruction. Mnemonically,
the ADC instruction 164 can be depicted as DEST←SRC1+SRC2+CF.
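The arithmetic of a three operand add with carry can be sketched as
follows, assuming thirty-two bit operands and a carry flag (CF) that
is either zero or one. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def adc32(src1, src2, carry_in):
    """Return (DEST, carry out) for a 32-bit add with carry."""
    total = (src1 & MASK32) + (src2 & MASK32) + (1 if carry_in else 0)
    return total & MASK32, total >> 32
```

The second element of the result is the carry out of bit thirty-one,
which would become the new value of CF.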
[0035] To process the instruction shown in FIG. 5, the processor 12
executes the register decode process (block 116 of FIG. 4) so that
bits three to five of the Mr/m field 156 and bits ten and eleven of
the offset 163 are used to address the destination (DEST) operand,
bits five to nine of the offset 163 are used to address the first
source operand (SRC1) and bits zero to four of the offset 163 are
used to address the second source operand (SRC2). Thus, each of the
three operands shown in FIG. 5 is represented by a five-bit value
and, as a result, each of the operands can randomly access any one
of thirty-two registers located in the extended register space 38
of the processor 12.
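The bit assignments described above can be illustrated with a short
sketch. The description does not state how the three Mr/m bits and
the two offset bits are ordered within the five-bit DEST value, so
the sketch assumes that the Mr/m bits form the upper three bits. The
function name is hypothetical.

```python
def decode_three_operands(modrm, offset12):
    """Return (DEST, SRC1, SRC2) as five-bit register numbers."""
    reg_bits = (modrm >> 3) & 0b111        # bits three to five of Mr/m
    dest_low = (offset12 >> 10) & 0b11     # bits ten and eleven of the offset
    dest = (reg_bits << 2) | dest_low      # assumed ordering of the five bits
    src1 = (offset12 >> 5) & 0b11111       # bits five to nine of the offset
    src2 = offset12 & 0b11111              # bits zero to four of the offset
    return dest, src1, src2
```

Each returned value lies in the range zero to thirty-one, which
corresponds to the thirty-two registers that each operand can address.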
[0036] The example manner of enabling the processor 12 to access an
extended register space depicted in FIG. 5 is similar to the
technique depicted in FIG. 4. However, as can be seen from a
comparison of FIGS. 3 and 5, the manner in which the bits of the
displacement field are decoded in the example of FIG. 3 enables
native backward compatibility of software written using the
standard IA-32 encodings on known IA-32 processors.
[0037] On the other hand, software written using the standard IA-32
instruction encodings for a processor such as that shown in the
example of FIG. 5 is not natively backward compatible with known
IA-32 processors. However, backward compatibility can be achieved
by using a modified exception handler. In particular, because the
tag field of the pseudo memory displacement points to an unmapped
memory address, the fault handler can be used to inspect an
instruction that is attempting to access this unmapped memory, and
emulate the functionality of the instruction. Upon completion, the
fault handler returns program execution to the instruction
following the emulated instruction. Of course, a substantial
performance penalty is incurred as a result of using a fault
handler to emulate each software instruction that attempts to
access the extended register space within a processor that does not
have the extended register space.
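A fault handler of the kind described above might be modeled as shown
below. This is a sketch under several assumptions: the faulting
instruction has already been inspected and reduced to a dictionary of
operand numbers, the extended registers are held in a software array,
and the emulated instruction is the three operand ADC of FIG. 5. All
of the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def emulate_faulting_adc(state, instr):
    """Emulate DEST <- SRC1 + SRC2 + CF, then skip past the instruction."""
    regs = state["ext_regs"]               # software copy of the extended registers
    total = regs[instr["src1"]] + regs[instr["src2"]] + state["cf"]
    regs[instr["dest"]] = total & MASK32
    state["cf"] = total >> 32              # carry out of bit thirty-one
    # Resume at the instruction following the emulated instruction.
    state["eip"] = instr["address"] + instr["length"]
    return state
```

Every emulated access in this model passes through the handler, which
is consistent with the substantial performance penalty noted above.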
[0038] Although certain methods and apparatus implemented in
accordance with the teachings of the invention have been described
herein, the scope of coverage of this patent is not limited
thereto. On the contrary, this patent covers all embodiments of the
teachings of the invention fairly falling within the scope of the
appended claims either literally or under the doctrine of
equivalents.
* * * * *