U.S. patent application number 12/411913 was filed with the patent office on 2009-03-26 and published on 2009-07-23 for method and apparatus for improved computer load and store operations.
Invention is credited to Stephen Melvin, Enrique Musoll, Mario NEMIROVSKY, Narendra Sankar.
Application Number | 20090187739 12/411913 |
Family ID | 39078450 |
Filed Date | 2009-03-26 |
United States Patent Application | 20090187739 |
Kind Code | A1 |
NEMIROVSKY; Mario ; et al. | July 23, 2009 |
Method and Apparatus for Improved Computer Load and Store Operations
Abstract
Load and store operations in computer systems are extended to
provide for Stream Load and Store and Masked Load and Store. In
Stream operations, a CPU executes a Stream instruction that
indicates, by appropriate arguments, a first address in memory or a
first register in a register file from which to begin reading data
entities, a first address or register at which to begin storing the
entities, and a number of entities to be read and written. In
Masked Load and Masked Store operations, stored masks are used to
indicate patterns relative to first addresses and registers for
loading and storing. Bit-string vector methods are taught for
masks.
Inventors: |
NEMIROVSKY; Mario; (Los Gatos, CA) ;
Musoll; Enrique; (San Jose, CA) ;
Sankar; Narendra; (Campbell, CA) ;
Melvin; Stephen; (Los Gatos, CA) |
Correspondence Address: |
STERNE, KESSLER, GOLDSTEIN & FOX P.L.L.C.
1100 NEW YORK AVENUE, N.W.
WASHINGTON, DC 20005
US |
Family ID: | 39078450 |
Appl. No.: | 12/411913 |
Filed: | March 26, 2009 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11876442 | Oct 22, 2007 | 7529907 |
12411913 | | |
09629805 | Jul 31, 2000 | |
11876442 | | |
09240012 | Jan 27, 1999 | 6292888 |
09629805 | | |
09273810 | Mar 22, 1999 | 6389449 |
09240012 | | |
09216017 | Dec 16, 1998 | 6477562 |
09273810 | | |
09312302 | May 14, 1999 | 7020879 |
09216017 | | |
Current U.S. Class: | 712/204 ; 712/E9.034 |
Current CPC Class: | G06F 9/30036 20130101; G06F 9/30043 20130101 |
Class at Publication: | 712/204 ; 712/E09.034 |
International Class: | G06F 9/315 20060101 G06F009/315 |
Claims
1. In computer operation, a method for selecting data entities from
a memory and writing the data entities to a register file,
comprising: consulting a first map of entities to copy relative to
a first address; selecting and reading those entities indicated by
the map; consulting a second map of positions to write the entities
copied from the memory, relative to a first register; and writing
the entities to the register file according to the second map.
2. The method of claim 1, wherein the steps follow from a Masked
Load instruction implemented according to an instruction set
architecture (ISA).
3. The method of claim 2, wherein the ISA is MIPS.
4. The method of claim 3, wherein arguments of the Masked Load
instruction indicate a beginning memory address for positioning a
mask, a mask number to be used, and a first register where to begin
writing data entities in the register file.
5. The method of claim 1, wherein the first and second maps are
implemented as bit strings, wherein the position of bits in the
string indicate the positions for data entities to be selected from
memory, and the registers to which data entities are to be
written.
6. The method of claim 2, wherein the execution of the Masked Load
is performed in a Dynamic Multi-streaming (DMS) processor by a
first stream running a first thread, and the first stream remains
inactive while the Masked Load instruction is executed.
7. In computer operation, a method for selecting data entities from
a register file and writing the data entities to a memory,
comprising: consulting a first map of entities to read relative to
a first register; selecting and reading those entities indicated by
the map; consulting a second map of positions to write the entities
read from the register file, relative to a first address; and
writing the entities to the memory file according to the second
map.
8. The method of claim 7, wherein the steps follow from a Masked
Store instruction implemented according to an instruction set
architecture (ISA).
9. The method of claim 8, wherein the ISA is MIPS.
10. The method of claim 9, wherein arguments of the Masked Store
instruction indicate a beginning register for positioning a mask, a
mask number to be used, and a first register where to begin
writing data entities in the memory.
11. The method of claim 7, wherein the first and second maps are
implemented as bit strings, wherein the position of bits in the
string indicate the positions for data entities to be read, and the
registers to which data entities are to be written.
12. For use in computer operations, a Stream Load instruction
comprising: an indication of the instruction; a first argument
indicating a first address in a memory from which to begin reading
data entities; a second argument indicating a first register in a
register file from which to write the data entities read from the
memory; and a third argument indicating a number of data entities
to be read and written.
13. For use in computer operations, a Masked Load instruction
comprising: an indication of the instruction; a first argument
indicating a first address in a memory at which to position a mask
to indicate data entities to be read; a second argument indicating
a first register in a register file beginning at which to write the
data entities read from the memory; and a third argument indicating
a mask number to be used to select the data entities to be read and
written.
14. For use in computer operations, a Masked Store instruction
comprising: an indication of the instruction; a first argument
indicating a first register in a register file at which to position
a mask to indicate data entities to be read; a second argument
indicating a first address in a memory beginning at which to write
the data entities read from the register file; and a third argument
indicating a mask number to be used to select the data entities to
be copied and written.
15. A computing system, comprising: a CPU; a memory; and a register
file, characterized in that the CPU, in loading data entities from
the memory into the register file, reads a predetermined number of
data entities, and writes the data entities into registers of the
register file in the same order as in the memory, beginning at a
predetermined first register.
16. The system of claim 15, wherein the transferring of data
entities from memory into the register file follow from a Stream
Load instruction implemented according to an instruction set
architecture (ISA) and executed by the CPU.
17. The system of claim 16, wherein the ISA is MIPS.
18. The system of claim 17, wherein arguments of the Stream Load
instruction indicate a beginning memory address from which to read
data entities, a first register in the register file from which to
write the data entities, and a number indicating the number of data
entities to read and write.
19. The system of claim 16, wherein the execution of the Stream
Load is performed in a Dynamic Multi-streaming (DMS) processor by a
first stream running a first thread, and the first stream remains
inactive while the Stream Load instruction is executed.
20. The system of claim 16, wherein the execution of the Stream
Load is performed in a Dynamic Multi-streaming (DMS) processor by a
first stream running a first thread, and the first stream executes
instructions that do not depend on values in memory affected by the
Stream Load instruction while the Stream Load instruction is
executed.
21. A computing system, comprising: a CPU; a memory; and a register
file, characterized in that the CPU, in storing data entities into
the memory from the register file, reads a predetermined number of
data entities from the register file, and writes the data entities
into addressed locations in memory in the same order as in the
register file, beginning at a predetermined first address.
22. The system of claim 21, wherein the storing of data entities
from the register file into memory follows from a Stream Store
instruction implemented according to an instruction set
architecture (ISA) and executed by the CPU.
23. The system of claim 22, wherein the ISA is MIPS.
24. The system of claim 23, wherein arguments of the Stream Store
instruction indicate a first register file from which to read data
entities, a first address in memory to which to write the data
entities, and a number indicating the number of data entities to
read and write.
25. The system of claim 22, wherein the execution of the Stream
Store is performed in a Dynamic Multi-streaming (DMS) processor by
a first stream running a first thread, and the first stream remains
inactive while the Stream Store instruction is executed.
26. The system of claim 22, wherein the execution of the Stream
Store is performed in a Dynamic Multi-streaming (DMS) processor by
a first stream running a first thread, and the first stream
executes instructions that do not depend on values in memory
affected by the Stream Store instruction while the Stream Store
instruction is executed.
27. A computing system, comprising: a CPU; a memory; and a register
file; characterized in that the CPU, in loading data entities from
the memory into the register file, enters the memory at a first
address, reads data entities according to a pre-determined pattern
relative to the first address, and writes the data entities into
registers of the register file in a pre-determined pattern relative
to a first register.
28. The system of claim 27, wherein the loading of data entities
from memory into the register file follows from a Masked Load
instruction implemented according to an instruction set
architecture (ISA) and executed by the CPU.
29. The system of claim 28, wherein the ISA is MIPS.
30. The system of claim 29, wherein arguments of the Masked Load
instruction indicate a beginning memory address from which to read
data entities, a first register in the register file beginning at
which to write the data entities, and a Mask Number indicating a
stored mask to be employed to indicate the relative positions in
the memory and register file for reading and writing data
entities.
31. The system of claim 30, wherein the stored masks are
implemented as two bit-string vectors, a first vector indicating
which data entities relative to the first address to read, and the
second indicating into which registers relative to the first
register to write the data entities.
32. The method of claim 31, wherein bit string maps are expressed
as sub-masks, and sub masks are linkable in different combinations
to provide combined masks.
33. The system of claim 29, wherein the execution of the Masked
Load is performed in a Dynamic Multi-streaming (DMS) processor by a
first stream running a first thread, and the first stream remains
inactive while the Masked Load instruction is executed.
34. The system of claim 29, wherein the execution of the Masked
Load is performed in a Dynamic Multi-streaming (DMS) processor by a
first stream running a first thread, and the first stream executes
instructions that do not depend on values in memory affected by the
Masked Load instruction while the Masked Load instruction is
executed.
35. A computing system, comprising: a CPU; a memory; and a register
file; characterized in that the CPU, in storing data entities into
the memory from the register file, enters the register file at a
first register, reads data entities from the register file
according to a pre-determined pattern, and writes the data entities
into addressed locations in memory also according to a
pre-determined pattern, beginning at a first address.
36. The system of claim 35, wherein the storing of data entities
from the register file into memory follows from a Masked Store
instruction implemented according to an instruction set
architecture (ISA) and executed by the CPU.
37. The system of claim 36, wherein the ISA is MIPS.
38. The system of claim 37, wherein arguments of the Masked Store
instruction indicate a beginning memory address from which to read
data entities, a first register in the register file beginning at
which to write the data entities, and a Mask Number indicating a
stored mask to be employed to indicate the relative positions in
the memory and register file for reading and writing the data
entities.
39. The system of claim 38, wherein the stored masks are
implemented as two bit-string vectors, a first vector indicating
which data entities relative to the first register to read, and the
second indicating into which registers relative to the first
address to write the data entities.
40. The method of claim 39, wherein bit string maps are expressed
as sub-masks, and sub masks are linkable in different combinations
to provide combined masks.
41. The system of claim 36, wherein the execution of the Masked
Store is performed in a Dynamic Multi-streaming (DMS)
processor by a first stream running a first thread, and the first
stream remains inactive while the Masked Store instruction is
executed.
42. The system of claim 36, wherein the execution of the Masked
Store is performed in a Dynamic Multi-streaming (DMS) processor by
a first stream running a first thread, and the first stream
executes instructions that do not depend on values in memory
affected by the Masked Store instruction while the Masked Store
instruction is executed.
43. A dynamic multistreaming processor, comprising: a first
plurality k of individual streams; and a second plurality m of
masks or mask sets, wherein individual masks or mask sets of the
second plurality m are dedicated to exclusive use of individual
ones of the first plurality of k streams for performing Masked Load
and/or Masked Store operations.
44. The DMS processor of claim 43, wherein individual masks or mask
sets are amendable only by the stream to which the individual mask
or mask sets are dedicated.
45. A dynamic multistreaming (DMS) processor system, comprising: a
plurality k of individual streams; a set of masks or mask sets for
use in performing Masked Load and Masked Store operations, wherein
multiple data entities are loaded or stored as a result of
executing a single instruction, and according to the masks; a cache
memory; and a system memory, characterized in that the system, in
performing a Masked Load or a Masked Store operation transfers data
entities directly between the system memory and one or more
register files.
Description
CROSS-REFERENCE TO RELATED DOCUMENTS
[0001] The present application is a continuation of U.S.
application Ser. No. 11/876,442, filed Oct. 22, 2007 (allowed), and
claims priority to U.S. Provisional Application Ser. No.
60/176,937, filed Jan. 18, 2000. The Ser. No. 11/876,442
application is a divisional of co-pending U.S. application Ser. No.
09/629,805. The Ser. No. 09/629,805 application is a
continuation-in-part of U.S. Pat. Nos. 6,292,888, 6,389,449,
6,477,562, and 7,020,879, all of which are incorporated into the
present application by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The present invention is in the field of digital processing
and pertains more particularly to apparatus and methods for loading
and storing data entities in computer operations.
[0004] 2. Background Art
[0005] The present invention is in the area of CPU operations in
executing instructions from software. As is known in the art there
are many kinds of instruction set architectures (ISA), and certain
architectures have become favored in many computer operations. One
of those architectures is the well-known MIPS ISA, and the MIPS ISA
is used in the present specification in several examples. The
invention, however, is not limited to MIPS ISA.
[0006] One of the necessary operations in computer processes when
executing instructions is moving data entities between
general-purpose or cache memory and register files in a CPU where
the data is readily accessible. When more than one data entity must
be loaded or stored before execution can commence or continue,
several instructions are needed in a conventional instruction set
architecture. In applications that need to access data the present
inventors have discovered that it would be desirable to have a
single instruction that could load or store data entities that are
related in a known pattern, and that a single instruction capable
of such operation would significantly improve the speed and
efficiency of many computer operations.
[0007] What is therefore clearly needed is a method and apparatus
comprising a single instruction for indicating data entities having
a known positional relationship in memory, and for loading or
storing a series of such data entities as a result of executing the
single instruction.
BRIEF SUMMARY OF THE INVENTION
[0008] In a preferred embodiment of the present invention, in
computer operation, a method for selecting data entities from a
memory and writing the data entities to a register file is
provided, comprising steps of (a) selecting and reading N entities
beginning at a first address; and (b) writing the entities to the
register file from a first register in the order of the entities in
the memory. In preferred embodiments, the steps follow from a
Stream Load instruction implemented according to an instruction set
architecture (ISA), and the ISA may be MIPS. Also, in a preferred
embodiment, arguments of the Stream Load instruction indicate a
beginning memory address from which to read data entities, a first
register in the register file at which to begin writing the data
entities, and a number indicating the number of data entities to
read and write.
[0009] In another aspect of the invention, in computer operation, a
method for selecting data entities from a register file and writing
the data entities to a memory is provided, comprising steps of (a)
selecting and reading N entities beginning at a first register; and
(b) writing the entities to the memory from a first address in the
order of the entities in the register file. In preferred
embodiments, the steps follow from a Stream Store instruction
implemented according to an instruction set architecture (ISA), and
the ISA is MIPS. Also, in preferred embodiments, arguments of the
Stream Store instruction indicate a beginning register from which
to read data entities, an address in memory at which to begin
writing the data entities, and a number indicating the number of
data entities to read and write.
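The Stream Store behavior summarized above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed hardware: the function name `stream_store`, the use of Python lists to stand in for the register file and the memory, and word-indexed addressing with equal word and register widths are all hypothetical choices made for the example.

```python
def stream_store(registers, memory, first_register, addr, n):
    """Copy n consecutive registers into n consecutive memory words.

    Hypothetical model: registers and memory are lists of words, addr is a
    word index, and the register width equals the memory word width.
    """
    if first_register < 0 or addr < 0 or n < 0:
        raise ValueError("arguments must be non-negative")
    if first_register + n > len(registers) or addr + n > len(memory):
        raise IndexError("transfer exceeds register file or memory bounds")
    # Entities keep the order they had in the register file.
    for i in range(n):
        memory[addr + i] = registers[first_register + i]
    return memory


registers = [20, 21, 22, 23, 24, 25]
memory = [0] * 8
stream_store(registers, memory, first_register=1, addr=4, n=3)
print(memory)
```

In this sketch one call replaces what would otherwise be three separate store operations, which is the saving the summary attributes to a single Stream instruction.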
[0010] In another aspect of the invention, in computer operations,
a method for selecting data entities from a memory and writing the
data entities to a register file is provided, comprising steps of
(a) consulting a first map of entities to copy relative to a first
address; (b) selecting and reading those entities indicated by the
map; (c) consulting a second map of positions to write the entities
copied from the memory, relative to a first register; and (d)
writing the entities to the register file according to the second
map. In preferred embodiments, the steps follow from a Masked Load
instruction implemented according to an instruction set
architecture (ISA). Also, in preferred embodiments, the ISA is
MIPS. Also, in preferred embodiments, arguments of the Masked Load
instruction indicate a beginning memory address for positioning a
mask, a mask number to be used, and a first register where to begin
writing data entities in the register file. In some embodiments,
the first and second maps are implemented as bit strings, wherein
the position of bits in the string indicate the positions for data
entities to be selected from memory, and the registers to which
data entities are to be written.
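The bit-string form of the two maps can be illustrated with a short software model. The sketch below rests on several assumptions that the text does not state: bit position 0 is the leftmost character of each string, the entities selected by the first map are paired in order with the positions set in the second map, and masks are kept in a dictionary keyed by mask number. The names `set_positions` and `masked_load` are hypothetical.

```python
def set_positions(bits):
    """Return the indices of the '1' characters in a bit string."""
    return [i for i, b in enumerate(bits) if b == "1"]


def masked_load(memory, registers, addr, first_register, masks, mask_number):
    """Load the memory words selected by a stored mask into selected registers.

    Hypothetical model: masks[mask_number] is a pair of bit strings. The
    first marks words to read relative to addr; the second marks registers
    to write relative to first_register.
    """
    select_bits, write_bits = masks[mask_number]
    sources = set_positions(select_bits)
    targets = set_positions(write_bits)
    if len(sources) != len(targets):
        raise ValueError("both maps must select the same number of entities")
    for src, dst in zip(sources, targets):
        registers[first_register + dst] = memory[addr + src]
    return registers


masks = {0: ("1010010", "111")}       # read words 0, 2, 5; write registers 0, 1, 2
memory = list(range(100, 110))        # memory[i] == 100 + i
registers = [0] * 6
masked_load(memory, registers, addr=3, first_register=2, masks=masks, mask_number=0)
print(registers)
```

Here the first map gathers three non-adjacent words starting at word 3, and the second map packs them into three consecutive registers starting at register 2.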
[0011] In yet another aspect of the invention, a method for
selecting data entities from a register file and writing the data
entities to a memory is provided, comprising steps of (a)
consulting a first map of entities to read relative to the first
register; (b) selecting and reading those entities indicated by the
map; (c) consulting a second map of positions to write the entities
read from the register file, relative to the first address; and (d)
writing the entities to the memory file according to the second
map. In preferred embodiments, the steps follow from a Masked Store
instruction implemented according to an instruction set
architecture (ISA), and the ISA may be MIPS. Also, in preferred
embodiments, arguments of the Masked Store instruction indicate a
beginning register for positioning a mask, a mask number to be
used, and a first address at which to begin writing data entities
in the memory. In some embodiments, the first and second maps are
implemented as bit strings, wherein the position of bits in the
string indicates the positions for data entities to be read, and
the memory locations to which data entities are to be written.
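A Masked Store can be modeled in the same spirit, with the first map applied to the register file and the second to memory. As before, this is a sketch under assumptions not found in the text: leftmost-bit-first ordering, in-order pairing of selected registers with selected memory positions, and a dictionary of masks keyed by mask number. The function name `masked_store` is hypothetical.

```python
def masked_store(registers, memory, first_register, addr, masks, mask_number):
    """Store the registers selected by a stored mask into selected memory words.

    Hypothetical model: masks[mask_number] is a pair of bit strings. The
    first marks registers to read relative to first_register; the second
    marks memory words to write relative to addr.
    """
    read_bits, write_bits = masks[mask_number]
    sources = [i for i, b in enumerate(read_bits) if b == "1"]
    targets = [i for i, b in enumerate(write_bits) if b == "1"]
    if len(sources) != len(targets):
        raise ValueError("both maps must select the same number of entities")
    for src, dst in zip(sources, targets):
        memory[addr + dst] = registers[first_register + src]
    return memory


masks = {1: ("1101", "10011")}        # read registers 0, 1, 3; write words 0, 3, 4
registers = [7, 8, 9, 10, 11, 12]
memory = [0] * 8
masked_store(registers, memory, first_register=1, addr=2, masks=masks, mask_number=1)
print(memory)
```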
[0012] In yet another embodiment of the invention, for use in
computer operations, a Stream Load instruction is provided
comprising an indication of the instruction; a first argument
indicating a first address in a memory from which to begin reading
data entities; a second argument indicating a first register in a
register file from which to write the data entities read from the
memory; and a third argument indicating a number of data entities
to be read and written.
[0013] In another aspect, a Stream Store instruction is provided
comprising an indication of the instruction; a first argument
indicating a first register in a register file from which to begin
reading data entities; a second argument indicating a first address
in a memory beginning at which to write the data entities read
from the register file; and a third argument indicating a number of
data entities to be read and written.
[0014] In still another aspect, a Masked Load instruction is
provided comprising an indication of the instruction; a first
argument indicating a first address in a memory at which to
position a mask to indicate data entities to be read; a second
argument indicating a first register in a register file beginning
at which to write the data entities read from the memory; and a
third argument indicating a mask number to be used to select the
data entities to be read and written.
[0015] In still another aspect, a Masked Store instruction is
provided comprising an indication of the instruction; a first
argument indicating a first register in a register file at which to
position a mask to indicate data entities to be read; a second
argument indicating a first address in a memory beginning at which
to write the data entities read from the register file; and a third
argument indicating a mask number to be used to select the data
entities to be copied and written.
[0016] In another aspect, a computing system is provided comprising
a CPU; a memory; and a register file. The system is characterized
in that the CPU, in loading data entities from the memory into the
register file, reads a predetermined number of data entities, and
writes the data entities into registers of the register file in the
same order as in the memory, beginning at a predetermined first
register. In preferred embodiments of the system, the transferring
of data entities from memory into the register file follows from a
Stream Load instruction implemented according to an instruction set
architecture (ISA) and executed by the CPU, and the ISA may be
MIPS. In some embodiments, arguments of the Stream Load instruction
indicate a beginning memory address from which to read data
entities, a first register in the register file from which to write
the data entities, and a number indicating the number of data
entities to read and write.
[0017] In yet another aspect, a computing system is provided
comprising a CPU; a memory; and a register file. The system is
characterized in that the CPU, in storing data entities into the
memory from the register file, reads a predetermined number of data
entities from the register file, and writes the data entities into
addressed locations in memory in the same order as in the register
file, beginning at a predetermined first address. In preferred
embodiments, the storing of data entities from the register file
into memory follows from a Stream Store instruction implemented
according to an instruction set architecture (ISA) and executed by
the CPU, and the ISA may be MIPS. Also in preferred embodiments,
arguments of the Stream Store instruction indicate a first register
file from which to read data entities, a first address in memory to
which to write the data entities, and a number indicating the
number of data entities to read and write.
[0018] In another aspect, a computing system is provided comprising
a CPU; a memory; and a register file. This system is characterized
in that the CPU, in storing data entities into the memory from the
register file, reads a predetermined number of data entities from
the register file, and writes the data entities into addressed
locations in memory in the same order as in the register file,
beginning at a predetermined first address. In preferred
embodiments, the storing of data entities from the register file
into memory follows from a Stream Store instruction implemented
according to an instruction set architecture (ISA) and executed by
the CPU, and the ISA may be MIPS. In some embodiments, arguments of
the Stream Store instruction indicate a first register file from
which to read data entities, a first address in memory to which to
write the data entities, and a number indicating the number of data
entities to read and write.
[0019] In another aspect, a computing system is provided comprising
a CPU; a memory; and a register file. The CPU, in loading data
entities from the memory into the register file, reads data
entities according to a pre-determined pattern relative to a first
address, and writes the data entities into registers of the
register file in a pre-determined pattern relative to a first
register. In preferred embodiments, the loading of data entities
from memory into the register file follows from a Masked Load
instruction implemented according to an instruction set
architecture (ISA) and executed by the CPU, and the ISA may be
MIPS. In some embodiments, arguments of the Masked Load instruction
indicate a beginning memory address from which to read data
entities, a first register in the register file beginning at which
to write the data entities, and a Mask Number indicating a stored
mask to be employed to indicate the relative positions in the
memory and register file for reading and writing data entities.
Further, the stored masks may be implemented as two bit-string
vectors, a first vector indicating which data entities relative to
the first address to read, and the second indicating into which
registers relative to the first register to write the data
entities.
[0020] In still another aspect, a computing system is provided
comprising a CPU; a memory; and a register file. In the system, the
CPU, in storing data entities into the memory from the register
file, reads data entities from the register file according to a
pre-determined pattern, and writes the data entities into addressed
locations in memory also according to a pre-determined pattern,
beginning at a first address. In preferred embodiments, the storing
of data entities from the register file into memory follows from a
Masked Store instruction implemented according to an instruction
set architecture (ISA) and executed by the CPU, and the ISA may be
MIPS. In preferred embodiments, arguments of the Masked Store
instruction indicate a first register in the register file from
which to begin reading data entities, a beginning memory address at
which to write the data entities, and a Mask Number indicating a
stored mask to be employed to indicate the relative positions in
the register file and memory for reading and writing the data
entities. In some embodiments, the stored masks are implemented as
two bit-string vectors, a first vector indicating which data
entities relative to the first register to read, and the second
indicating into which memory locations relative to the first
address to write the data entities.
[0021] In still another aspect, a dynamic multistreaming (DMS)
processor is provided, comprising a first plurality k of individual
streams, and a second plurality m of masks or mask sets. Individual
masks or mask sets of the second plurality m are dedicated to
exclusive use of individual ones of the first plurality of k
streams for performing Masked Load and/or Masked Store operations.
In preferred embodiments, individual masks or mask sets are
amendable only by the stream to which the individual mask or mask
sets are dedicated.
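The dedication of mask sets to individual streams can be pictured with a small ownership table. The sketch below is purely illustrative: the class name `MaskTable`, its methods, and the use of `PermissionError` to signal a rejected access are assumptions made for the example and do not describe how a DMS processor enforces the restriction.

```python
class MaskTable:
    """m mask sets, each dedicated to one of k streams (hypothetical model)."""

    def __init__(self, owners):
        # owners[i] is the id of the stream that mask set i is dedicated to.
        self._owners = list(owners)
        self._masks = [None] * len(self._owners)

    def _check(self, stream_id, mask_number):
        if self._owners[mask_number] != stream_id:
            raise PermissionError("mask set is dedicated to another stream")

    def write(self, stream_id, mask_number, mask):
        """Amend a mask set; only the owning stream may do so."""
        self._check(stream_id, mask_number)
        self._masks[mask_number] = mask

    def read(self, stream_id, mask_number):
        """Fetch a mask set for a Masked Load or Masked Store by its owner."""
        self._check(stream_id, mask_number)
        return self._masks[mask_number]


# Four mask sets shared between two streams: sets 0-1 for stream 0, 2-3 for stream 1.
table = MaskTable(owners=[0, 0, 1, 1])
table.write(stream_id=0, mask_number=1, mask=("101", "11"))
print(table.read(stream_id=0, mask_number=1))
```

An attempt by stream 1 to write mask set 1 in this model raises `PermissionError`, mirroring the statement that a mask set is amendable only by the stream to which it is dedicated.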
[0022] In still another aspect, a dynamic multistreaming (DMS)
processor system is provided, comprising a plurality k of
individual streams, a set of masks or mask sets for use in
performing Masked Load and Masked Store operations, wherein
multiple data entities are loaded or stored as a result of
executing a single instruction, and according to the masks, a cache
memory, and a system memory. The system is characterized in that
the system, in performing a Masked Load or a Masked Store
operation, transfers data entities directly between the system
memory and one or more register files.
[0023] In embodiments of the invention taught in enabling detail
below, for the first time methods and apparatus are provided for
load and store operations in computer systems wherein multiple data
entities may be read and written according to a single instruction,
saving many cycles in execution, and data entities may be selected
for reading and writing consecutively, or according to pre-stored
position masks.
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES
[0024] FIG. 1A is a schematic diagram of a memory and a register
file illustrating a Stream Load operation according to an
embodiment of the present invention.
[0025] FIG. 1B is a schematic diagram of a memory and a register
file illustrating a Stream Store operation according to an
embodiment of the present invention.
[0026] FIG. 2A is a schematic diagram of a memory and a register
file illustrating a Masked Load operation according to an
embodiment of the present invention.
[0027] FIG. 2B illustrates an exemplary mask according to an
embodiment of the present invention.
[0028] FIG. 2C illustrates a set of masks according to an
embodiment of the present invention.
[0029] FIG. 3A illustrates a mask comprising submasks implemented
as vectors according to an embodiment of the invention.
[0030] FIG. 3B illustrates a memory and a register file in masked
operations according to an embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0031] As was described briefly above, there exist in the technical
field of computer operations a number of different instruction set
architectures (ISA). An instruction set architecture, generally
speaking, is the arrangement of bits and sets of bits in a binary
word that a CPU interprets as an instruction. The well-known MIPS
ISA is the architecture used by the present Inventors in
implementing the present invention in a preferred embodiment, but
the invention is certainly not limited to the MIPS ISA. For this
reason the specific use of portions of an instruction word as known
in MIPS architecture will not be described in detail herein. It is
well-known that the MIPS architecture provides unused op-codes that
can be used to implement new instructions, and the present
inventors, in the MIPS preferred embodiment, have taken advantage
of this feature.
[0032] Because the invention will apply to conceivably any ISA, the
inventors will specify and describe the instructions that initiate
new and non-obvious functions in the following manner:
[0033] Instruction A, B, C where A, B, and C are arguments defining
parameters for functions to be performed in executing the
instruction.
[0034] FIG. 1A is a schematic diagram illustrating a memory 11,
which may be any memory, such as a cache memory or a system memory
from which a CPU may fetch data, and a register file 15. Memory 11
has a Word Width, which in a preferred embodiment is 32 bits, and
register file 15 similarly has a register width. The word width and
the register width are preferably the same, but may differ in
different embodiments of the invention.
[0035] Below the schematic of memory and register file in FIG. 1A
there is a logical structure for a Stream Load instruction
according to an embodiment of the present invention. In the
instruction structure there is an instruction opcode (for Stream
Load), and three arguments, being a first argument @, a second
argument "first register," and a third argument "N." Referring to
the diagram, when the CPU executes this instruction, it knows from
the instruction opcode what the order of operations is to be,
taking words from memory 11 and writing these words into register
file 15. The arguments provide the parameters.
[0036] In the example shown the CPU will read N consecutive words,
beginning at address @ in memory 11, shown in FIG. 1A as words 13
in the shaded area, and will write those N words in the same order
to register file 15, beginning at register "first register,"
providing in the register file the block of words 17. In one
embodiment of the Stream Load and/or Stream Store instruction, the
number N is specified by means of an immediate value or the
contents of a register specified by the instruction.
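By way of illustration only, the Stream Load behavior just described may be sketched in software as follows. The sketch assumes that memory 11 and register file 15 are modeled as Python lists with equal word and register widths; the function name stream_load and its parameter names are hypothetical and are not part of any ISA.

```python
def stream_load(memory, regs, addr, first_reg, n):
    """Copy n consecutive words from memory, starting at addr, into
    consecutive registers starting at first_reg, preserving order."""
    # Bounds check is an assumption; the specification does not describe
    # what happens when the stream runs past either structure.
    if addr + n > len(memory) or first_reg + n > len(regs):
        raise IndexError("stream exceeds memory or register file")
    for i in range(n):
        regs[first_reg + i] = memory[addr + i]


# Eight memory words and an eight-entry register file (example values).
memory = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
regs = [0] * 8
stream_load(memory, regs, addr=2, first_reg=1, n=3)
```

In this example the three words at addresses 2 through 4 land in registers 1 through 3, and all other registers are left untouched.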
[0037] In alternative embodiments of the invention, because the
width of a word in memory may differ from the width of a register
in the register file, words selected from memory may affect more
than a single register, or may not fill a register. If the memory
word, for example, is twice the register width, one memory word
will fill two consecutive registers, and a selected number of
memory words will fill twice that number of registers. On the other
hand, if a memory word is one-half the register width it will take
two memory words to fill a single register.
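The two width relationships described above may be sketched as follows. The ordering of the halves (least significant part first) is an assumption made for illustration, since the description does not fix it, and the helper names split_word and pack_words are hypothetical.

```python
def split_word(word, word_bits, reg_bits):
    """Split one memory word into word_bits // reg_bits register values.
    Assumed order: least significant part goes to the first register."""
    part_mask = (1 << reg_bits) - 1
    return [(word >> (i * reg_bits)) & part_mask
            for i in range(word_bits // reg_bits)]


def pack_words(words, word_bits):
    """Pack several narrow memory words into one wider register value.
    Assumed order: the first word fills the least significant part."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & ((1 << word_bits) - 1)) << (i * word_bits)
    return value


# A 32-bit memory word fills two consecutive 16-bit registers.
two_regs = split_word(0xAABBCCDD, word_bits=32, reg_bits=16)

# Two 16-bit memory words fill a single 32-bit register.
one_reg = pack_words([0x1234, 0x5678], word_bits=16)
```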
[0038] FIG. 1B is a schematic diagram similar to FIG. 1A, but
depicting a companion Stream Store instruction, wherein the CPU,
executing the instruction, will read N consecutive words (words 17)
from register file 15, beginning at register "first register", and
will write those N words in the same order to memory 11 beginning
at address @ defined in the arguments, providing words 13.
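A corresponding sketch of Stream Store, under the same assumptions as before (memory and register file modeled as Python lists of equal width, hypothetical function and parameter names), simply reverses the direction of the copy.

```python
def stream_store(memory, regs, addr, first_reg, n):
    """Copy n consecutive registers, starting at first_reg, into
    consecutive memory words starting at addr, preserving order."""
    # Bounds check is an assumption, not taken from the specification.
    if addr + n > len(memory) or first_reg + n > len(regs):
        raise IndexError("stream exceeds memory or register file")
    for i in range(n):
        memory[addr + i] = regs[first_reg + i]


regs = [0xA0, 0xA1, 0xA2, 0xA3]   # example register contents
memory = [0] * 6
stream_store(memory, regs, addr=3, first_reg=1, n=2)
```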
[0039] The new instructions defined herein have important
application in several instances, one of which is in application of
multi-streaming processors to processing packets in network packet
routing. These instructions, however, will find many other uses
with virtually any sort of processor in a wide range of
applications.
[0040] In packet processing, many packets have identical structure,
and it is necessary, once a packet is brought into a router and
stored in a memory such as memory 11, to load certain header fields
into a register file to be processed according to certain rules.
Because the structure is known, and because the bytes that comprise
the header may be stored in memory consecutively, the arguments of
the new Stream Load and Stream Store instructions may be structured
to load all of the necessary data for a packet to a register file
for processing, and to store registers after processing. The
registers that are stored may, of course, be the same as or
different from the registers that are used in Load.
[0041] There are similarly many other potential applications for
Stream Load and Stream Store, which will improve computer
operations in many instances.
[0042] In an alternative embodiment of the present invention, the
inventors have determined that the functionality of the invention may be
significantly enhanced by structuring new commands to load and
store multiple words without a limitation that the words be
consecutive in either the memory or in the register file. The new
commands are named Masked Load and Masked Store respectively.
[0043] FIG. 2A is a schematic diagram of memory 11 and register
file 15 illustrating an example of Masked Load. Memory 11 in this
example is 1 byte wide, and 8 memory words are shown in memory 11,
arbitrarily numbered 0 through 7. Each word has a memory address as
is known in the art. Register file 15 in this example is 4 bytes
wide, and is shown organized into registers arbitrarily numbered on
the left from 0 to 7. Below the schematic is an example of the
organization of a Masked Load instruction, having three arguments.
A first argument is an address in memory 11, the second argument is
a first register in the register file, and the third argument is
now a mask number.
[0044] FIG. 2B illustrates a Mask example having two columns, the
left-most column for memory byte number, as shown, and the
right-most column for relative register number.
[0045] This is the mask for the Masked Load example of FIG. 2A.
Note that memory byte numbers 0, 3, 5, and 7 are listed in the
left-most column, and relative register numbers 0, 0, 2, and 3 are
listed in the right-most column. The mask tells the Masked Load
instruction which memory bytes to read, and where to write these
bytes into the register file.
[0046] Referring again to FIG. 2A, note that relative memory bytes
0, 3, 5, and 7 are shaded (each differently). The address (@)
argument of the Masked Load instruction tells the CPU where to
position the mask in memory, and the mask selects the bytes to read
relative to the starting address. Since the register file is four
bytes wide, four bytes from memory can be written side-by-side in a
single register of the register file. In this example the default
is that selected bytes will be written into the register file
beginning in the least significant byte of each register, which is,
by default, the right-most byte in this example.
[0047] The mask says that relative memory byte number 0 is to go to
relative register number 0. This is the first register indicated by
the second argument of the instruction. Memory byte 0 is thus shown
as written to the least significant byte of relative register 0 in
the register file. The mask indicates next that relative memory
byte 3 is also to be written to relative register 0 of the register
file. Since this is the second byte to go to relative register 0,
it is written to the second least significant byte in
the indicated register of the register file. Memory byte 5 is
written to relative register 2, and since it is the only byte to go
to register 2, it goes in the l.s. position. Relative memory byte 7
goes to relative register 3 according to the mask, and this is
shown in FIG. 2A as well. The cross-hatching has been made common
to illustrate the movement of data from the memory to the register
file.
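The Masked Load example of FIGS. 2A and 2B may be sketched in software as follows. The sketch assumes that the mask is held as a list of (relative memory byte, relative register) pairs, that registers are modeled as integers four bytes wide, and that a register is cleared before its first byte is written; the function name masked_load and the example byte values are hypothetical.

```python
def masked_load(memory, regs, addr, first_reg, mask, reg_bytes=4):
    """Load the bytes named by mask into the register file.
    Each mask entry is (relative memory byte, relative register).
    Bytes bound for the same register are packed from the least
    significant byte upward, in mask order."""
    filled = {}  # relative register -> number of bytes already placed
    for mem_off, rel_reg in mask:
        pos = filled.get(rel_reg, 0)
        if pos >= reg_bytes:
            raise ValueError("too many bytes for one register")
        if pos == 0:
            # Assumption: a register is cleared before its first byte.
            regs[first_reg + rel_reg] = 0
        regs[first_reg + rel_reg] |= memory[addr + mem_off] << (8 * pos)
        filled[rel_reg] = pos + 1


# Mask of FIG. 2B: memory bytes 0, 3, 5, 7 go to registers 0, 0, 2, 3.
MASK_FIG_2B = [(0, 0), (3, 0), (5, 2), (7, 3)]

memory = bytes([0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7])
regs = [0] * 8
masked_load(memory, regs, addr=0, first_reg=0, mask=MASK_FIG_2B)
```

With these example values, relative register 0 holds memory byte 0 in its least significant byte and memory byte 3 in the next byte, while relative registers 2 and 3 each hold a single byte.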
[0048] By default in this example, data entities selected from
memory are written to registers beginning at the least significant
byte until a next entity is to be written to a different register.
This is just one example of placement of selected bytes in
registers. Any other placement may also be indicated by a mask, and
the simple mask shown could have more columns indicating byte
placement in registers. Many mask implementations and defaults are
possible within the spirit and scope of the invention.
[0049] Just as illustrated above in the case of the Stream Load and
Stream Store operations, the Masked Load operation has a matching
Masked Store instruction as well. In the Store case, in the
instruction architecture selected bits indicate the Store as
opposed to Load operation, and the arguments have the same
structure as for the Masked Load.
[0050] It will be apparent to the skilled artisan that the masks
can be of arbitrary number in different embodiments of the
invention, and the length of each mask, defining the number and
position of bytes to be loaded, can vary in different embodiments
as well. In one embodiment of the present invention, the masks are
useful in the situation discussed briefly above, that of processing
data packets in routing machines. In this particular case, the
masks can be implemented to capture certain patterns of data
entities from a memory, such as certain headers of packets for
example, in processing data packets for routing.
[0051] Also in some embodiments of the present invention Masked
Load and Masked Store instructions are used in threads (software)
used for packet processing using dynamic multi-streaming
processors. These processors have plural physical streams, each
capable of supporting a separate thread, and each stream typically
has a dedicated register file. In this case mask sets can be stored
and dedicated to individual streams, or shared by two or more, or
all streams. Such dynamic multi-streaming (DMS) processors are
described in detail in the priority documents listed in the
Cross-Reference to related documents above.
[0052] In a preferred embodiment, masks are programmable, such that
mask sets can be exchanged and amended as needed. Masks may be
stored in a variety of ways. They may be stored and accessible from
system memory for example, or in hidden registers on or off a
processor, or in programmable ROM devices. In some embodiments,
facility is provided wherein masks may be linked, making larger
masks, and providing an ability to amend masks without
reprogramming. In one embodiment of the invention 32 masks are
provided and up to 8 masks may be linked. In some cases, masks may
be stored in the instruction itself, if the instruction is of
sufficient width to afford the bits needed for masking. If the
instruction width is, for example, 64 bits, and only 32 bits are
needed for the instruction itself, the other 32 bits may be a mask
vector.
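As a purely illustrative sketch of a mask carried in the instruction itself, a 64-bit instruction may be separated into a 32-bit instruction word and a 32-bit mask vector. The placement of the mask in the upper half is an assumption, as the description does not say which bits hold the mask, and the instruction word value shown is an arbitrary placeholder rather than a real encoding.

```python
MASK_BITS = 32  # width of the embedded mask vector


def split_instruction(instr64):
    """Return (instruction_word, mask_vector) from a 64-bit instruction.
    Assumed layout: low 32 bits are the instruction proper, high 32
    bits are the mask vector."""
    instruction_word = instr64 & 0xFFFFFFFF
    mask_vector = (instr64 >> MASK_BITS) & 0xFFFFFFFF
    return instruction_word, mask_vector


# Placeholder instruction word combined with a mask having ones in
# bit positions 0, 1, 7, 12, and 13 (value 0x3083).
instr = (0x00003083 << MASK_BITS) | 0x7C221234
word, mask = split_instruction(instr)
```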
[0053] In the matter of programmability, masks may be programmed
and/or amended in a variety of ways. Programming can be manual, in
the sense of requiring human intervention, or amendable by dynamic
action of the processing system using the masks. In the latter
case, in application to DMS processors, there may be certain
software burden, because, if one stream is using a mask or a set of
masks in a load or store operation, it must be guaranteed that no
other stream will update that mask or mask set. So in the case of
DMS processors it is preferred that masks be dedicated to streams.
In such a processor system, having k streams, there might be a mask
or a set of masks dedicated to each of the k streams, such that a
particular stream can only use and update its own mask or set of
masks.
[0054] In the descriptions above, no particular distinction has
been made to the memory source and destination of data entities for
a Masked Load or a Masked Store operation. It is well known in the
art, however, that state-of-the-art processors operate typically
with cache memory rather than directly with system memory only.
Cache memory and cache operations are notoriously well-known in the
art, and need not be described in detail here.
[0055] In one embodiment of Masked Load and Store operations used
with DMS processors according to the present invention, the masked
load/store could choose to bypass the cache (i.e., the access goes
directly to the memory without consulting whether the required data
resides in the cache), even if the memory access belongs to a
cacheable space. Then, it is up to software to guarantee the
coherency of the data. If the data cache is bypassed, the
read/write ports to the data cache are freed for other accesses
performed by the regular load/stores by other streams. Ports to
caches are expensive.
[0056] In a preferred embodiment of the invention, masks (or in
some cases parts of masks) are implemented as two vectors, each
written and stored as a 32-bit word. FIG. 3A is an illustration of
vector-masks, and FIG. 3B illustrates a memory 17 and a register
file (context register) 19 wherein bytes from memory 17 are
transferred into file 19 according to the vector-mask of FIG.
3A.
[0057] Referring now to FIG. 3A, in each submask there are two
vectors, being a select vector and a register vector. A submask as
illustrated in FIG. 3A may be a complete mask, and a complete mask
may consist of up to eight (in this embodiment) submasks. This is
described in more detail below.
[0058] Referring now to submask 0 in FIG. 3A, there are ones in
bits 0, 1, 7, 12, and 13 in the Select vector. A one in any
position in the select vector selects a relative byte to be
transferred from a memory to a register file. Other bits are zero.
Of course the opposite convention, with zeros selecting, could be used.
[0059] Referring now to FIG. 3B, memory 17 is organized as 32 bytes
wide. In this example the application is packet processing, and the
data entities manipulated are bytes from header fields for packets.
As described before, the beginning position for selecting data
entities is given in the Masked Load instruction as the first
argument @ (for address, see FIG. 2A). The third argument provides
the Mask number, which is, in this case the two-vector submask of
FIG. 3A. The relevant bytes of the packet header stored in memory
17 and indicated as to-be-transferred by submask 0 of FIG. 3A are
shown in memory 17 of FIG. 3B as shaded, each a different shading.
Thus any combination or all of the bytes from the packet header of
32 bytes may be selected for transfer to a register file.
[0060] The Register vector of submask 0 indicates the relative
position within the register file to write the selected bytes. Note
there is a one in only one position in the Register vector in this
particular example, that at position 12. The significance of the
one in the register vector is to index the register wherein bytes
are to be stored in the register file. There may in other examples
be more than a single one in the register vector.
[0061] Referring now to FIG. 3B, bytes are stored in the register
file beginning at a first register (FR). The first register for
storage (start loading register) is the second argument of the
Masked Load instruction. In other applications and embodiments
there may be different defaults for different reasons. The Masked
Load instruction in this example begins loading selected bytes from
memory 17 into register file 19 at the first register and the
default is to load in order from the least significant position,
and adjacent, until the register is indexed by the register vector.
Another order could well be used in another embodiment. Accordingly
bytes 0, 1, and 7 are loaded into the first register from the right
(l.s.). The one at position 12 in the Register vector of FIG. 3A
indexes the register, so bytes 12 and 13 are loaded into the first
two positions of register FR+1. As there are no more bytes from
memory 17 selected, this is the end of the operation.
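The vector-mask operation just described may be sketched as follows. The sketch assumes that bit position 0 is the least significant bit of each 32-bit vector, that registers are four bytes wide and are cleared when first used, and that a one in the register vector advances to the next register before the byte at that position is placed. The names vector_masked_load and bits, and the example byte values, are hypothetical.

```python
def bits(*positions):
    """Build a vector with ones at the given bit positions."""
    value = 0
    for p in positions:
        value |= 1 << p
    return value


def vector_masked_load(memory, regs, addr, first_reg, select, register,
                       reg_bytes=4):
    """Masked Load driven by a select vector and a register vector."""
    reg = first_reg
    pos = 0
    regs[reg] = 0  # assumption: destination registers start cleared
    for bit in range(32):
        if (register >> bit) & 1:
            # A one in the register vector indexes to the next register.
            reg += 1
            pos = 0
            regs[reg] = 0
        if (select >> bit) & 1:
            if pos >= reg_bytes:
                raise ValueError("register full before being indexed")
            # Place the selected byte at the next free position,
            # starting from the least significant byte.
            regs[reg] |= memory[addr + bit] << (8 * pos)
            pos += 1


SELECT = bits(0, 1, 7, 12, 13)   # select vector of submask 0
REGISTER = bits(12)              # register vector of submask 0

memory = bytes(range(0x40, 0x60))  # 32 example header bytes
regs = [0xFFFFFFFF] * 4
vector_masked_load(memory, regs, addr=0, first_reg=0,
                   select=SELECT, register=REGISTER)
```

With these example values, bytes 0, 1, and 7 occupy the three least significant bytes of the first register, bytes 12 and 13 occupy the two least significant bytes of the next register, and the remaining registers are not written.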
[0062] As described above and illustrated herein, submask 0 is a
complete mask. In a preferred embodiment, however, up to eight
submasks may be combined to make a mask. Each submask in this
embodiment has an end-of-mask bit as indicated in FIG. 3B. A one in
the end-of-mask bit indicates that submask is the last submask to
be combined to form the mask for a particular instruction.
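One way the end-of-mask bit might be used to gather linked submasks is sketched below. The sketch assumes that submasks are stored consecutively, that each is represented as a (select vector, register vector, end-of-mask) tuple, and that linking begins at the mask number named in the instruction; the name collect_mask and the storage layout are hypothetical.

```python
MAX_LINKED = 8  # up to eight submasks may be combined in this embodiment


def collect_mask(submasks, start):
    """Return the chain of submasks forming one mask, beginning at index
    start and ending at the first submask whose end-of-mask bit is set."""
    chain = []
    for sub in submasks[start:start + MAX_LINKED]:
        chain.append(sub)
        select, register, end_of_mask = sub
        if end_of_mask:
            return chain
    raise ValueError("no end-of-mask bit within %d submasks" % MAX_LINKED)


# Example store: submasks 0 and 1 are linked; submask 2 stands alone.
submasks = [
    (0x00003083, 0x00001000, False),
    (0x0000000F, 0x00000001, True),
    (0x000000FF, 0x00000010, True),
]
linked = collect_mask(submasks, 0)
single = collect_mask(submasks, 2)
```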
[0063] It is emphasized that the example of vector masks described
just above is a single example. Many other masking schemes are
possible within the spirit and scope of the invention. For example,
selection and placement could be indicated by a single vector
wherein a first data entity indicated to be selected beginning at a
first address would be copied to a first register, and one or more
zeros between data entities to be selected would indicate an index
in the register in which following entities are to be placed in the
register file. Many such schemes are possible, and a relatively few
are indicated by example herein.
[0064] It will be apparent to the skilled artisan that, just as
described above in the case of Stream Load and Store instructions,
Masked Store may be accomplished in much the same fashion as the
Masked Load instruction described in detail.
[0065] In the store operations of the example, note that there are
bytes of the register file to which data entities are not written.
There is a choice of whether to leave these bytes or to clear them.
In a preferred embodiment the unused bytes are cleared.
[0066] It will be apparent to the skilled artisan that there are
many variations that may be made in the embodiments of the present
invention described above without departing from the spirit and
scope of the invention. For example, there are a wide variety of
ways that masks may be structured and implemented, and a wide
variety of ways that masks may be stored, programmed, exchanged,
and amended. There are similarly a variety of ways Masked Load and
Store instructions may be defined and implemented, depending on the
Instruction Set Architecture used. There are similarly many
applications for such unique instructions beyond the
packet-processing applications used as examples herein, and the new
instructions may be useful with many kinds of processors, including
Dynamic Multi-Streaming (DMS) Processors, which are a particular
interest of the present inventors.
[0067] In the matter of DMS processors, the present application is
related to four cases teaching aspects of DMS processors and their
functioning, all four of which are listed in the Cross-Reference
section above, and all four of which are incorporated into the
present case by reference. The use of the stream and masked
load/store instructions as taught above is especially interesting
in DMS processors, since the stream that executes the new
instructions in a thread can remain inactive while the masked
load/store instruction is being executed in a functional unit.
Therefore, other streams can make use of the rest of the resources
of the processor. The stream executing the new instructions does
not need to sit idle until the masked load/store completes,
however. That stream can go on and execute more instructions, as
long as the instructions do not depend on the values in the
registers affected by the masked load/store instruction in
execution. In other words, the stream could execute instructions
out-of-order.
[0068] In addition to the above, there is a wide choice of
granularity in different embodiments of the invention. In the
example used, bytes are selected, but in other embodiments the
granularity may be bits, words, or even blocks of memory. If words
are used, there need not be a register vector, if the register is
of the same word width. It should further be noted that the Stream
Load and Store operations are simply a particular case of the
Masked Load and Store operations.
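The observation that Stream Load is a particular case of Masked Load may be sketched at word granularity as follows. The sketch assumes equal word and register widths, so that no register vector is needed and each selected word goes to the next register, and it models the select vector as a Python integer whose bit 0 corresponds to the starting address. The function names are hypothetical.

```python
def masked_load_words(memory, regs, addr, first_reg, select):
    """Word-granularity Masked Load: each selected word is written to
    the next register in turn, starting at first_reg."""
    reg = first_reg
    bit = 0
    while select >> bit:           # stop once no higher ones remain
        if (select >> bit) & 1:
            regs[reg] = memory[addr + bit]
            reg += 1
        bit += 1


def stream_load(memory, regs, addr, first_reg, n):
    """Stream Load expressed as a Masked Load whose select vector has
    ones in its n lowest positions."""
    masked_load_words(memory, regs, addr, first_reg, (1 << n) - 1)


memory = [10, 11, 12, 13, 14, 15]

stream_regs = [0] * 4
stream_load(memory, stream_regs, addr=1, first_reg=0, n=3)

sparse_regs = [0] * 4
masked_load_words(memory, sparse_regs, addr=0, first_reg=0, select=0b101)
```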
[0069] Given the broad application of the invention and the broad
scope, the invention should be limited only by the claims which
follow.
* * * * *