U.S. patent application number 11/820,350 was filed with the patent office on 2007-06-19 and published on 2008-12-25 as publication number 20080320274 for an age matrix for queue dispatch order.
This patent application is currently assigned to Raza Microelectronics, Inc. Invention is credited to Gaurav Singh, Srivatsan Srinivasan, and Lintsung Wong.
United States Patent Application 20080320274
Kind Code: A1
Singh, Gaurav; et al.
December 25, 2008
Age matrix for queue dispatch order
Abstract
An apparatus for queue allocation. An embodiment of the
apparatus includes a dispatch order data structure, a bit vector,
and a queue controller. The dispatch order data structure
corresponds to a queue. The dispatch order data structure stores a
plurality of dispatch indicators associated with a plurality of
pairs of entries of the queue to indicate a write order of the
entries in the queue. The bit vector stores a plurality of mask
values corresponding to the dispatch indicators of the dispatch
order data structure. The queue controller interfaces with the
queue and the dispatch order data structure. The queue controller
excludes at least some of the entries from a queue operation based
on the mask values of the bit vector.
Inventors: Singh, Gaurav (Los Altos, CA); Srinivasan, Srivatsan (San Jose, CA); Wong, Lintsung (Santa Clara, CA)
Correspondence Address: ZILKA-KOTAB, PC - RMI, P.O. Box 721120, San Jose, CA 95172-1120, US
Assignee: Raza Microelectronics, Inc. (Cupertino, CA)
Family ID: 40137740
Appl. No.: 11/820,350
Filed: June 19, 2007
Current U.S. Class: 712/23; 712/E9.086
Current CPC Class: G06F 9/3836 (20130101); G06F 9/3814 (20130101); G06F 9/3838 (20130101)
Class at Publication: 712/23; 712/E09.086
International Class: G06F 9/30 (20060101)
Claims
1. An apparatus for queue allocation, the apparatus comprising: a
dispatch order data structure corresponding to a queue, the
dispatch order data structure to store a plurality of dispatch
indicators associated with a plurality of pairs of entries of the
queue to indicate a write order of the entries in the queue; a bit
vector to store a plurality of mask values corresponding to the
dispatch indicators of the dispatch order data structure; and a
queue controller to interface with the queue and the dispatch order
data structure, the queue controller to exclude at least some of
the entries from a queue operation based on the mask values of the
bit vector.
2. The apparatus according to claim 1, wherein the queue operation
comprises a dispatch operation to write a new entry in the
queue.
3. The apparatus according to claim 1, wherein the mask values of
the bit vector comprise a replay mask to mask a dispatch indicator
for an entry of the queue associated with a replay operation.
4. The apparatus according to claim 1, wherein the mask values of
the bit vector comprise an atomic flush mask to mask a dispatch
indicator for an entry of the queue associated with an atomic flush
operation.
5. The apparatus according to claim 1, wherein the mask values of
the bit vector comprise a hazard mask to mask a dispatch indicator
for an entry of the queue associated with prevention of a hazard
event.
6. The apparatus according to claim 5, wherein the hazard event
comprises a structural hazard event.
7. The apparatus according to claim 5, wherein the hazard event
comprises a data hazard event.
8. The apparatus according to claim 5, wherein the hazard event
comprises a control hazard event.
9. The apparatus according to claim 1, wherein the mask values of
the bit vector comprise a thread mask to mask a subset of dispatch
indicators for corresponding entries of the queue associated with a
thread of a plurality of threads in a multi-threaded processing
system.
10. The apparatus according to claim 1, further comprising a flop
bank with a plurality of flip-flops, each flip-flop to store a bit
value indicative of the dispatch order of the entries of a
corresponding pair of entries.
11. The apparatus according to claim 10, the queue controller
further comprising dispatch logic to interface with the dispatch
order data structure, the dispatch logic to flip the bit value for
at least one of the dispatch indicators in response to the queue
operation to write the new entry in the queue.
12. The apparatus according to claim 11, further comprising a
random access memory (RAM) device to store the queue and the
dispatch order data structure, wherein the queue comprises a fully
associative RAM structure and the dispatch order data structure
comprises a control structure separate from the fully associative
RAM structure.
13. The apparatus according to claim 1, further comprising a mapper
coupled to the queue, the mapper to dispatch the queue operation to
insert a new entry in the queue.
14. The apparatus according to claim 1, the queue controller
further comprising least recently used (LRU) logic, the LRU logic
to implement a queue entry replacement strategy for the queue based
on the dispatch order data structure, wherein the queue entry
replacement strategy comprises a true LRU replacement strategy or a
pseudo LRU replacement strategy.
15. A method for managing a dispatch order of entries in a queue,
the method comprising: storing a plurality of dispatch indicators
corresponding to pairs of entries in a queue, each dispatch
indicator indicative of the dispatch order of the corresponding
pair of entries; storing a bit vector comprising a plurality of
mask values corresponding to the dispatch indicators of the
dispatch order data structure; and performing a queue operation on
a subset of the entries in the queue, wherein the subset excludes
at least some of the entries of the queue based on the mask values
of the bit vector.
16. The method according to claim 15, wherein performing the queue
operation comprises dispatching a new entry into the queue.
17. The method according to claim 16, further comprising masking a
replay instruction stored in an entry of the queue to avoid
dispatching the new entry in the location of the replay
instruction.
18. The method according to claim 16, further comprising masking an
instruction stored in an entry of the queue from an atomic flush
operation to flush a plurality of instructions from the queue.
19. The method according to claim 16, further comprising masking an
instruction stored in an entry of the queue to prevent a hazard
event.
20. The method according to claim 19, wherein the hazard event
comprises a structural hazard, a data hazard, or a control
hazard.
21. The method according to claim 16, further comprising masking a
plurality of instructions associated with a first thread to give
priority to instructions associated with a second thread.
22. The method according to claim 15, further comprising storing
the dispatch indicators in a dispatch order data structure
corresponding to a representation of at least a partial matrix with
intersecting rows and columns, each row corresponding to one of the
entries of the queue and each column corresponding to one of the
entries of the queue, the intersections of the rows and columns
corresponding to the pairs of entries in the queue.
23. The method according to claim 15, further comprising storing
the dispatch indicators in a plurality of flip-flops of a flop
bank, each flip-flop comprising a bit value indicative of the
dispatch order of the corresponding pair of entries.
24. The method according to claim 15, further comprising
implementing a least recently used (LRU) replacement strategy for
the queue based on at least some of the dispatch indicators.
25. A computer readable storage medium embodying a program of
machine-readable instructions, executable by a digital processor,
to perform operations to facilitate queue allocation, the
operations comprising: store a plurality of dispatch indicators
corresponding to pairs of entries in a queue, each dispatch
indicator indicative of the dispatch order of the corresponding
pair of entries; store a bit vector comprising a plurality of mask
values corresponding to the dispatch indicators of the dispatch
order data structure; and perform a queue operation on a subset of
the entries in the queue, wherein the subset excludes at least some
of the entries of the queue based on the mask values of the bit
vector.
26. The computer readable storage medium according to claim 25, the
operations further comprising an operation to dispatch a new entry
into the queue.
27. The computer readable storage medium according to claim 25, the
operations further comprising an operation to mask a replay
instruction stored in an entry of the queue to avoid dispatching
the new entry in the location of the replay instruction.
28. The computer readable storage medium according to claim 25, the
operations further comprising an operation to mask an instruction
stored in an entry of the queue from an atomic flush operation to
flush a plurality of instructions from the queue.
29. The computer readable storage medium according to claim 25, the
operations further comprising an operation to mask an instruction
stored in an entry of the queue to prevent a hazard event.
30. The computer readable storage medium according to claim 29, the
operations further comprising an operation to mask a plurality of
instructions associated with a first thread to give priority to
instructions associated with a second thread.
Description
BACKGROUND
[0001] An instruction scheduling queue is used to store
instructions prior to execution. There are many different ways to
manage the dispatch order, or age, of instructions in an
instruction scheduling queue. A common queue implementation uses a
first-in-first-out (FIFO) data structure. In this implementation,
instruction dispatches arrive at the tail, or end, of the FIFO data
structure. A look-up mechanism finds the first instruction ready
for issue from the head, or start, of the FIFO data structure.
[0002] In conventional out-of-order implementations, instructions
are selected from anywhere in the FIFO data structure. This creates
"holes" in the FIFO data structure at the locations of the selected
instructions. To maintain absolute ordering of instruction
dispatches in the FIFO data structure (e.g., for fairness), all of
the remaining instructions after the selected instructions are
shifted forward in the FIFO, and the data structure is collapsed to
form a contiguous chain of instructions. Shifting and collapsing
the remaining queue entries in this manner allows new entries to be
added to the tail, or end, of the FIFO data structure. However,
with a robust out-of-order issue rate, several instructions are
shifted and collapsed every cycle. Hence, maintaining a contiguous
sequence of queue entries without "holes" consumes a significant
amount of power and processing resources.
SUMMARY
[0003] Embodiments of an apparatus are described. In one
embodiment, the apparatus is an apparatus for queue allocation. An
embodiment of the apparatus includes a dispatch order data
structure, a bit vector, and a queue controller. The dispatch order
data structure corresponds to a queue. The dispatch order data
structure stores a plurality of dispatch indicators associated with
a plurality of pairs of entries of the queue to indicate a write
order of the entries in the queue. The bit vector stores a
plurality of mask values corresponding to the dispatch indicators
of the dispatch order data structure. The queue controller
interfaces with the queue and the dispatch order data structure.
The queue controller excludes at least some of the entries from a
queue operation based on the mask values of the bit vector. Other
embodiments of the apparatus are also described.
[0004] Embodiments of a method are also described. In one
embodiment, the method is a method for managing a dispatch order of
queue entries in a queue. An embodiment of the method includes
storing a plurality of dispatch indicators corresponding to pairs
of entries in a queue. Each dispatch indicator is indicative of the
dispatch order of the corresponding pair of entries. The method
also includes storing a bit vector comprising a plurality of mask
values corresponding to the dispatch indicators of the dispatch
order data structure. The method also includes performing a queue
operation on a subset of the entries in the queue. The subset
excludes at least some of the entries of the queue based on the
mask values of the bit vector. Other embodiments of the method are
also described.
[0005] Embodiments of a computer readable storage medium are also
described. In one embodiment, the computer readable storage medium
embodies a program of machine-readable instructions, executable by
a digital processor, to perform operations to facilitate queue
allocation. The operations include operations to store a plurality
of dispatch indicators corresponding to pairs of entries in a
queue. Each dispatch indicator is indicative of the dispatch order
of the corresponding pair of entries. The operations also include
operations to store a bit vector comprising a plurality of mask
values corresponding to the dispatch indicators of the dispatch
order data structure, and to perform a queue operation on a subset
of the entries in the queue. The subset excludes at least some of
the entries of the queue based on the mask values of the bit
vector. Other embodiments of the computer readable storage medium
are also described.
[0006] Other aspects and advantages of embodiments of the present
invention will become apparent from the following detailed
description, taken in conjunction with the accompanying drawings,
illustrated by way of example of the principles of the
invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] FIG. 1 depicts a schematic block diagram of one embodiment
of a plurality of instruction scheduling queues with corresponding
dispatch order data structures.
[0008] FIG. 2 depicts a schematic diagram of one embodiment of a
dispatch order data structure in a matrix configuration.
[0009] FIG. 3 depicts a schematic diagram of one embodiment of a
sequence of data structure states of the dispatch order data
structure shown in FIG. 2.
[0010] FIG. 4 depicts a schematic diagram of another embodiment of
a dispatch order data structure with masked duplicate entries.
[0011] FIG. 5 depicts a schematic diagram of one embodiment of a
sequence of data structure states of the dispatch order data
structure shown in FIG. 4.
[0012] FIG. 6 depicts a schematic diagram of another embodiment of
a dispatch order data structure in a partial matrix
configuration.
[0013] FIG. 7 depicts a schematic diagram of one embodiment of a
sequence of data structure states of the dispatch order data
structure shown in FIG. 6.
[0014] FIG. 8 depicts a schematic block diagram of one embodiment
of an instruction queue scheduler which uses a dispatch order data
structure.
[0015] FIG. 9 depicts a schematic flow chart diagram of one
embodiment of a queue operation method for use with the instruction
queue scheduler of FIG. 8.
[0016] Throughout the description, similar reference numbers may be
used to identify similar elements.
DETAILED DESCRIPTION
[0017] FIG. 1 depicts a schematic block diagram of one embodiment
of a plurality of instruction scheduling queues 102 with
corresponding dispatch order data structures 104. In general, the
instruction scheduling queues 102 store instructions, or some
representative indicators of the instructions, prior to execution.
The instruction scheduling queues 102 are also referred to as issue
queues. The stored instructions are referred to as entries. It
should be noted that although the following description references
a specific type of queue (i.e., an instruction scheduling queue),
embodiments may be implemented for other types of queues.
[0018] Instead of implementing shifting and collapsing operations
to continually adjust the positions of the entries in each queue
102, the dispatch order data structure 104 is kept separately from
the queue. In one embodiment, each issue queue 102 is a
fully-associative structure in a random access memory (RAM) device.
The dispatch order data structures 104 are separate control
structures to maintain the relative dispatch order, or age, of the
entries in the corresponding issue queues 102. An associated
instruction scheduler may be implemented as a RAM structure or,
alternatively, as another type of structure.
[0019] In one embodiment, the dispatch order data structures 104
correspond to the queues 102. Each dispatch order data structure
104 stores a plurality of dispatch indicators associated with a
plurality of pairs of entries of the corresponding queue 102. Each
dispatch indicator indicates a dispatch order of the entries in
each pair.
[0020] In one embodiment, the dispatch order data structure 104
stores a representation of at least a partial matrix with
intersecting rows and columns. Each row corresponds to one of the
entries of the queue, and each column corresponds to one of the
entries of the queue. Hence, the intersections of the rows and
columns correspond to the pairs of entries in the queue. Since the
dispatch order data structure 104 stores dispatch, or age,
information, and may be configured as a matrix, the dispatch order
data structure 104 is also referred to as an age matrix.
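As a purely illustrative aid (not part of the patent disclosure), the age matrix can be modeled in software as a small two-dimensional array of flags. The sketch below assumes a hypothetical AgeMatrix class in which older[r][c] is true when the entry for row r was written before, and is therefore older than, the entry for column c (the same row/column convention used below with reference to FIG. 2); the diagonal carries no information.

    class AgeMatrix:
        """Minimal software model of an age matrix (illustrative only)."""

        def __init__(self, num_entries, initial_order=None):
            n = num_entries
            # older[r][c] is True when entry r is older than entry c.
            self.older = [[False] * n for _ in range(n)]
            # Optionally seed the matrix from an explicit write order,
            # given oldest first (e.g., [0, 1, 2, 3] as in FIG. 2).
            if initial_order is not None:
                for i, old in enumerate(initial_order):
                    for new in initial_order[i + 1:]:
                        self.older[old][new] = True

        def oldest(self):
            """Return the entry that is older than every other entry."""
            n = len(self.older)
            for r in range(n):
                if all(self.older[r][c] for c in range(n) if c != r):
                    return r
            raise ValueError("matrix does not encode a total order")

    # The dispatch order of FIG. 2, oldest to newest: Entry_0 .. Entry_3.
    m = AgeMatrix(4, initial_order=[0, 1, 2, 3])
    assert m.oldest() == 0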
[0021] FIG. 2 depicts a schematic diagram of one embodiment of a
dispatch order data structure 110 in a matrix configuration. The
dispatch order data structure 110 is associated with a specific
issue queue 102. The dispatch order of the entries in the queue 102
depends on the relative age of each entry, or when the entry is
written into the queue, compared to the other entries in the queue
102. The dispatch order data structure 110 provides a
representation of the dispatch order for the corresponding issue
queue 102.
[0022] The illustrated dispatch order data structure 110 has four
rows, designated as rows 0-3, corresponding to entries of the issue
queue 102. Similarly, the dispatch order data structure has four
columns, designated as columns 0-3, corresponding to the same
entries of the issue queue 102. Other embodiments of the dispatch
order data structure 110 may include fewer or more rows and
columns, depending on the number of entries in the corresponding
issue queue 102.
[0023] The intersections between the rows and columns correspond to
different pairs, or combinations, of entries in the issue queue
102. As described above, each entry of the dispatch order data
structure 110 indicates a relative dispatch order, or age, of the
corresponding pair of entries in the queue 102. Since there is not
a relative age difference between an entry in the queue 102 and
itself (i.e., where the row and column correspond to the same entry
in the queue 102), the diagonal of the dispatch order data
structure 110 is not used and is masked. Masked dispatch indicators are
designated by an "X."
[0024] For the remaining entries, arrows are shown to indicate the
relative dispatch order for the corresponding pairs of entries in
the queue 102. As a matter of convention in FIG. 2, the arrow
points toward the older entry, and away from the newer entry, in
the corresponding pair of entries. Hence, a left arrow indicates
that the issue queue entry corresponding to the row is older than
the issue queue entry corresponding to the column. In contrast, an
upward arrow indicates that the issue queue entry corresponding to
the column is older than the issue queue entry corresponding to the
row.
[0025] For example, Entry_0 of the queue 102 is older than all of
the other entries, as shown in the bottom row and the rightmost
column of the dispatch order data structure 110 (i.e., all of the
arrows point toward the older entry, Entry_0). In contrast, Entry_3
of the queue 102 is newer than all of the other entries, as shown
in the top row and the leftmost column of the dispatch order data
structure 110 (all of the arrows point away from the newer entry,
Entry_3). By looking at all of the dispatch indicators of the
dispatch order data structure 110, it can be seen that the dispatch
order, from oldest to newest, of the corresponding issue queue 102
is: Entry_0, Entry_1, Entry_2, Entry_3.
[0026] FIG. 3 depicts a schematic diagram of one embodiment of a
sequence 112 of data structure states of the dispatch order data
structure 110 shown in FIG. 2. At time T0, the dispatch order data
structure 110 has the same dispatch order as shown in FIG. 2 and
described above. At time T1, a new entry is written in Entry_0 of
the issue queue 102. As a result, the dispatch indicators of the
dispatch order data structure 110 are updated to show that Entry_0
is the newest entry in the issue queue 102. Since Entry_0 was
previously the oldest entry in the issue queue 102, all of the
dispatch indicators for Entry_0 are updated.
[0027] At time T2, a new entry is written in Entry_2. As a result,
the dispatch indicators of the dispatch order data structure 110
are updated to show that Entry_2 is the newest entry in the issue
queue 102. Since Entry_2 was previously older than Entry_3 and
Entry_0 at time T1, the corresponding dispatch indicators for the
pairs Entry_2/Entry_3 and Entry_2/Entry_0 are updated, or flipped.
Since Entry_2 is already marked as newer than Entry_1 at time T1,
the corresponding dispatch indicator for the pair Entry_2/Entry_1
is not changed.
[0028] At time T3, a new entry is written in Entry_1. As a result,
the dispatch indicators of the dispatch order data structure 110
are updated to show that Entry_1 is the newest entry in the issue
queue 102. Since Entry_1 was previously the oldest entry in the
issue queue 102 at time T2, all of the corresponding dispatch
indicators for Entry_1 are updated, or flipped.
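A minimal sketch of this update rule (illustrative only, with hypothetical helper names): when a new value is written to an entry, every dispatch indicator involving that entry is set so that the entry reads as the newest. Replaying the writes of FIG. 3 (Entry_0 at T1, Entry_2 at T2, Entry_1 at T3) leaves Entry_3 as the oldest entry.

    def make_matrix(order):
        """Build older[r][c] flags from an explicit write order, oldest first."""
        n = len(order)
        older = [[False] * n for _ in range(n)]
        for i, old in enumerate(order):
            for new in order[i + 1:]:
                older[old][new] = True
        return older

    def record_write(older, idx):
        """Entry idx was just written: flip its indicators so that idx is
        newer than every other entry."""
        for other in range(len(older)):
            if other != idx:
                older[idx][other] = False   # idx is no longer older than other
                older[other][idx] = True    # every other entry is older than idx

    older = make_matrix([0, 1, 2, 3])   # state at time T0 (FIG. 2)
    for written in (0, 2, 1):           # writes at times T1, T2, T3 (FIG. 3)
        record_write(older, written)
    # After T3, Entry_3 is the oldest entry and the next eviction candidate.
    assert all(older[3][c] for c in range(4) if c != 3)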
[0029] FIG. 4 depicts a schematic diagram of another embodiment of
a dispatch order data structure 120 with masked duplicate entries.
Since the dispatch indicators above and below the masked diagonal
entries are duplicates, either the top or bottom half of the
dispatch order data structure 120 may be masked. In the embodiment
of FIG. 4, the top portion is masked. However, other embodiments
may use the top portion and mask the bottom portion.
[0030] FIG. 5 depicts a schematic diagram of one embodiment of a
sequence 122 of data structure states of the dispatch order data
structure 120 shown in FIG. 4. In particular, the sequence 122
shows how the dispatch indicators in the lower portion of the
dispatch order data structure 120 are changed each time an entry in
the corresponding queue 102 is changed. At time T1, a new entry is
written in Entry_2, and the dispatch indicator for the pair
Entry_2/Entry_3 is updated. At time T2, a new entry is written in
Entry_0, and the dispatch indicators for all the pairs associated
with Entry_0 are updated. At time T3, a new entry is written in
Entry_3, and the dispatch indicators for the pairs Entry_3/Entry_0
and Entry_3/Entry_2 are updated. At time T4, a new entry is written
in Entry_1, and the dispatch indicators for all of the pairs
associated with Entry_1 are updated.
[0031] FIG. 6 depicts a schematic diagram of another embodiment of
a dispatch order data structure 130 in a partial matrix
configuration. Instead of masking the duplicate and unused dispatch
indicators, the dispatch order data structure 130 only stores one
dispatch indicator for each pair of entries in the queue.
[0032] In this embodiment, the partial matrix configuration has
fewer entries, and may be stored in less memory space, than the
previously described embodiments of the dispatch order data
structures 110 and 120. In particular, for an issue queue 102 with
a number of entries, N, the dispatch order data structure 130 may
store the same number of dispatch indicators, n, as there are pairs
of entries, according to the following:
n = \binom{N}{2} = \frac{N!}{2!\,(N-2)!}
where n designates the number of pairs of entries of the queue 102,
and N designates a total number of entries in the queue 102. For
example, if the queue 102 has 4 entries, then the number of pairs
of entries is 6. Hence, the dispatch order data structure 130
stores six dispatch indicators, instead of 16 (i.e., a 4×4
matrix) dispatch indicators. As another example, an issue queue 102
with 16 entries has 120 unique pairs, and the corresponding
dispatch order data structure 130 stores 120 dispatch
indicators.
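The pair count, and one possible flat layout for the lower-triangle indicators, can be checked with a short sketch. The pair_index helper is an illustrative indexing scheme, not one specified by the disclosure.

    from math import comb

    def num_indicators(N):
        # Number of unordered pairs of queue entries: C(N, 2) = N! / (2!(N-2)!)
        return comb(N, 2)

    def pair_index(row, col):
        # Flat position of the indicator for pair (row, col), with row > col,
        # when only the lower triangle of the matrix is stored.
        assert row > col >= 0
        return row * (row - 1) // 2 + col

    assert num_indicators(4) == 6      # 4-entry queue  -> 6 indicators
    assert num_indicators(16) == 120   # 16-entry queue -> 120 indicators
    assert pair_index(3, 2) == 5       # last stored indicator for a 4-entry queue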
[0033] FIG. 7 depicts a schematic diagram of one embodiment of a
sequence 132 of data structure states of the dispatch order data
structure 130 shown in FIG. 6. However, instead of showing the
dispatch indicators as arrows, the illustrated dispatch order data
structures 130 of FIG. 7 are shown as binary values. As a matter of
convention, a binary "1" corresponds to a left arrow, and a binary
"0" corresponds to an upward arrow. However, other embodiments may
be implemented using a different convention. Other than using
binary values for a limited number of dispatch indicators, the
sequence 132 of queue operations for times T0-T4 is the same as
described above for FIG. 5.
[0034] FIG. 8 depicts a schematic block diagram of one embodiment
of an instruction queue scheduler 140 which uses a dispatch order
data structure 104 such as one of the dispatch order data
structures 110, 120, or 130. In one embodiment, the scheduler 140
is implemented in a processor (not shown). The processor may be
implemented in a reduced instruction set computer (RISC) design.
Additionally, the processor may implement a design based on the
MIPS instruction set architecture (ISA). However, alternative
embodiments of the processor may implement other instruction set
architectures. It should also be noted that other embodiments of
the scheduler 140 may include fewer or more components than are
shown in FIG. 8.
[0035] In conjunction with the scheduler 140, the processor also
may include execution units (not shown) such as an arithmetic logic
unit (ALU), a floating point unit (FPU), a load/store unit (LSU),
and a memory management unit (MMU). In one embodiment, each of
these execution units is coupled to the scheduler 140, which
schedules instructions for execution by one of the execution units.
Once an instruction is scheduled for execution, the instruction may
be sent to the corresponding execution unit where it is stored in
an instruction queue 102.
[0036] The illustrated scheduler 140 includes a queue 102, a mapper
142, and a queue controller 144. The mapper 142 is configured to
issue one or more queue operations to insert new entries in the
queue 102. In one embodiment, the mapper 142 dispatches up to two
instructions per cycle to each issue queue 102. The queue
controller 144 also interfaces with the queue 102 to update a
dispatch order data structure 104 in response to a queue operation
to insert a new entry in the queue 102.
[0037] In order to receive two instructions per cycle, each issue
queue 102 has two write ports, which are designated as Port_0 and
Port_1. Alternatively, the mapper 142 may dispatch a single
instruction on one of the write ports. In other embodiments, the
issue queue 102 may have more than two write ports. If multiple
instructions are dispatched at the same time to multiple write
ports, then the write ports may have a designated order to indicate
the relative dispatch order of the instructions which are issued
together. For example, an instruction issued on Port_0 may be
designated as older than an instruction issued in the same cycle on
Port_1. In one embodiment, write addresses are generated internally
in each issue queue 102.
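A small sketch of the same-cycle ordering rule described above (illustrative only): writes are applied in port order, so the instruction on Port_0 is recorded as older than the instruction dispatched on Port_1 in the same cycle. The record_write helper repeats the hypothetical update rule from the earlier sketch.

    def record_write(older, idx):
        # The entry at idx becomes the newest relative to all other entries.
        for other in range(len(older)):
            if other != idx:
                older[idx][other] = False
                older[other][idx] = True

    def dispatch_cycle(older, port0_entry=None, port1_entry=None):
        """Apply up to two same-cycle writes; Port_0 is treated as older, so
        it is recorded first and the Port_1 write then becomes the newest."""
        if port0_entry is not None:
            record_write(older, port0_entry)
        if port1_entry is not None:
            record_write(older, port1_entry)

    older = [[False] * 4 for _ in range(4)]
    dispatch_cycle(older, port0_entry=1, port1_entry=2)
    assert older[1][2]   # Port_0's write (entry 1) is older than Port_1's (entry 2)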
[0038] The queue controller 144 keeps track of the dispatch order
of the entries in the issue queue 102 to determine which entries
can be overwritten (or evicted). In order to track the dispatch
order of the entries in the queue 102, the queue controller 144
includes dispatch logic 146 with least recently used (LRU) logic
148. The queue controller 144 also includes a bit mask vector 150
and an age matrix flop bank 152. In one embodiment, the flop bank
152 includes a plurality of flip-flops. Each flip-flop stores a bit
value indicative of the dispatch order of the entries of a
corresponding pair of entries. In other words, each flip-flop
corresponds to a dispatch indicator, and the flop bank 152
implements the dispatch order data structure 104. The bit value of
each flip-flop is a binary bit value. In one embodiment, a logical
high value of the binary bit value indicates one dispatch order of
the pair of entries (e.g., the corresponding row is older than the
corresponding column), and a logical low value of the binary bit
value indicates a reverse dispatch order of the pair of entries
(e.g., the corresponding column is older than the corresponding
row). When a dispatch indicator is updated in response to a new
instruction written to the queue 102, the dispatch logic 146 is
configured to potentially flip the binary bit value for the
corresponding dispatch indicators. As described above, the number
of flip-flops in the flop bank 152 may be determined by the number
of pairs (e.g., combinations) of entries in the queue 102.
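Under the one-flip-flop-per-pair arrangement, the age information can be modeled as a flat array of bits rather than a full matrix. The sketch below is illustrative only: it stores just the C(N, 2) lower-triangle bits, uses the convention stated above (a bit is 1 when the row entry is older than the column entry), and flips the affected bits when an entry is written.

    def flop_count(N):
        return N * (N - 1) // 2        # one flip-flop per pair of entries

    def bit_index(row, col):
        # row > col: position of the pair's flip-flop within the flop bank
        return row * (row - 1) // 2 + col

    def write_entry(bits, N, idx):
        """Entry idx was just written: update every bit involving idx so that
        idx reads as the newest entry of each pair."""
        for other in range(N):
            if other == idx:
                continue
            row, col = max(idx, other), min(idx, other)
            # The bit is 1 exactly when the row entry of the stored pair is
            # the older one, i.e. when the other (older) entry is the row.
            bits[bit_index(row, col)] = 1 if row == other else 0

    N = 4
    bits = [0] * flop_count(N)          # 6 flip-flops for a 4-entry queue
    write_entry(bits, N, 2)             # entry 2 becomes the newest
    assert bits[bit_index(3, 2)] == 1   # entry 3 (row) is older than entry 2 (column)
    assert bits[bit_index(2, 0)] == 0   # entry 2 (row) is not older than entry 0 (column)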
[0039] In order to determine which entries may be overwritten in
the queue 102, the dispatch logic 146 includes least recently used
(LRU) logic 148 to implement a LRU replacement strategy. In one
embodiment, the LRU replacement strategy is based, at least in
part, on the dispatch indicators of the corresponding dispatch
order data structure 104 implemented by the flop bank 152. As
examples, the LRU logic 148 may implement a true LRU replacement
strategy or a pseudo LRU replacement strategy. In a true LRU
replacement strategy, the LRU entries in the queue 102 are
replaced. The LRU entries are designated by LRU replacement
addresses. However, generating the LRU replacement addresses, which
is a serial operation, can be logically complex. A pseudo LRU
replacement strategy approximates the true LRU replacement strategy
using a less complicated implementation.
[0040] When the mapper dispatches a new entry to the queue 102 as a
part of a queue operation, the queue 102 interfaces with the queue
controller 144 to determine which existing entry to discard to make
room for the newly dispatched entry. In some embodiments, the
dispatch logic 146 uses the age matrix flop bank 152 to determine
which entry to replace based on the absolute dispatch order of the
entries in the queue 102. However, in other embodiments, it may be
useful to identify an entry to discard from among a subset of the
entries in the queue 102.
[0041] As one example, some entries in the queue 102 may be
associated with a replay operation, so it may be useful to maintain
the corresponding entries in the queue 102, regardless of the
dispatch order of the entries. Thus, the entry to be discarded may
be selected from a subset that excludes the entries associated with
the replay operation.
[0042] As another example, it may be useful to maintain certain
entries in the queue 102 in order to prevent a hazard event such as
a structural, data, or control hazard. Thus, the entry to be
discarded may be selected from a subset that excludes the entries
that, if discarded, would potentially create a hazard event.
[0043] As another example, it may be useful to preserve entries of
the queue 102 that are related to a particular thread of a
multi-threaded processing system. Thus, the entry to be discarded
may be selected from a subset that excludes entries related to the
identified thread. In this way, the preserved entries corresponding
to the identified thread are given priority, because the entries
associated with the thread are not discarded.
[0044] In order to identify a subset of the entries in the queue
102, the queue controller 144 may use one or more bit mask vectors
150. In one embodiment, each bit mask vector 150 is used to mask
out one or more dispatch indicators of a dispatch order data
structure 104 such as the age matrix flop bank 152. In other words,
each bit mask vector 150 (or bit vector) is configured to store a
plurality of mask values corresponding to the dispatch indicators
of the dispatch order data structure 104. Thus, the queue
controller 144 can exclude at least some of the entries of the
queue 102 from a queue operation based on the mask values of the
bit vector 150. For example, instead of selecting the absolute
oldest entry of the queue 102 to be discarded, the dispatch logic
146 may select the oldest entry of the subset of entries that are
not masked by the bit mask vector 150. In an alternative
embodiment, the bit mask vector 150 is used to identify entries
that may be discarded in a dispatch operation, rather than entries
to be maintained in the queue 102 (i.e., excluded from potentially
discarding) in a dispatch operation.
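One illustrative reading of this scheme, simplified to one mask bit per queue entry rather than one per dispatch indicator: a set bit marks an entry that must be preserved (replay, hazard avoidance, or thread priority), and the victim is the oldest entry whose bit is clear. The names below are hypothetical.

    def oldest_unmasked(older, masked):
        """Pick the oldest entry among those not excluded by the mask.

        older[r][c] is True when entry r is older than entry c; masked[i] is
        True when entry i must be preserved and may not be discarded."""
        candidates = [i for i in range(len(older)) if not masked[i]]
        # The oldest candidate is the one older than every other candidate.
        for r in candidates:
            if all(older[r][c] for c in candidates if c != r):
                return r
        raise ValueError("no unmasked candidate found")

    # Dispatch order oldest -> newest: 0, 1, 2, 3 (as in FIG. 2).
    older = [[j > i for j in range(4)] for i in range(4)]

    replay_mask = [True, False, False, False]   # entry 0 holds a replay instruction
    # Without the mask, entry 0 (the absolute oldest) would be discarded;
    # with the mask applied, the oldest discardable entry is entry 1.
    assert oldest_unmasked(older, [False] * 4) == 0
    assert oldest_unmasked(older, replay_mask) == 1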
[0045] FIG. 9 depicts a schematic flow chart diagram of one
embodiment of a queue operation method 160 for use with the
instruction queue scheduler 140 of FIG. 8. Although the queue operation
method 160 is described with reference to the instruction queue
scheduler 140 of FIG. 8, other embodiments may be implemented in
conjunction with other schedulers.
[0046] In the illustrated queue operation method 160, the queue
controller 144 initializes 162 the dispatch order data structure
104. As described above, the queue controller 144 may initialize
the dispatch order data structure 104 with a plurality of dispatch
indicators based on the dispatch order of the entries in the queue
102. In this way, the dispatch order data structure 104 maintains
an absolute dispatch order for the queue 102 to indicate the order
in which the entries are written into the queue 102. Although some
embodiments are described as using a particular type of dispatch
order data structure 104 such as the age matrix, other embodiments
may use other implementations of the dispatch order data
structure.
[0047] The illustrated queue operation method 160 continues as the
queue 102 receives 164 a command for a queue operation such as a
dispatch operation. As explained above, the queue controller 144
selects an existing entry of the queue 102 to be discarded from all
of the entries in the queue 102 or from a subset of the entries in
the queue 102. In order to identify a subset of the entries in the
queue 102, the queue controller 144 determines 166 if there is a
bit mask vector 150 to use with the received queue operation. If
there is a bit mask vector 150, then the dispatch logic 146 applies
168 the bit mask vector 150 to the dispatch order data structure
104 before executing 170 the queue operation. In this situation,
the candidate entries which may be discarded from the queue 102 are
limited to some subset of the entries in the queue 102. Otherwise,
if there is not an applicable bit mask vector 150, then the
dispatch logic 146 may directly execute 170 the queue operation. In
this situation, the candidate entries which may be discarded from
the queue 102 are not limited to a subset of the entries in the
queue 102. After executing 170 the queue operation, the dispatch
logic 146 updates 172 the dispatch order data structure 104, and
the depicted queue operation method 160 ends.
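Tying the steps of FIG. 9 together, the following compact sketch (illustrative only; all names are hypothetical) walks one dispatch through the method: choose a victim entry, optionally restricted by a bit mask, write the new entry in its place, and then update the dispatch order data structure.

    def handle_dispatch(older, mask, new_payload, queue):
        """One pass through the queue operation method: select a victim entry
        (optionally restricted by a mask), overwrite it, update the ages."""
        n = len(queue)
        # Determine and apply the mask, if any (steps 166 and 168).
        candidates = range(n) if mask is None else [i for i in range(n) if not mask[i]]
        # Execute the queue operation: evict the oldest candidate (step 170).
        victim = next(r for r in candidates
                      if all(older[r][c] for c in candidates if c != r))
        queue[victim] = new_payload
        # Update the dispatch order data structure (step 172): victim is newest.
        for other in range(n):
            if other != victim:
                older[victim][other] = False
                older[other][victim] = True
        return victim

    # Initialize the dispatch order data structure (step 162): entries written 0..3.
    queue = ["i0", "i1", "i2", "i3"]
    older = [[j > i for j in range(4)] for i in range(4)]
    assert handle_dispatch(older, None, "i4", queue) == 0   # no mask: oldest replaced
    assert handle_dispatch(older, [False, True, False, False], "i5", queue) == 2  # entry 1 masked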
[0048] It should be noted that embodiments of the methods,
operations, functions, and/or logic may be implemented in software,
firmware, hardware, or some combination thereof. Additionally, some
embodiments of the methods, operations, functions, and/or logic may
be implemented using a hardware or software representation of one
or more algorithms related to the operations described above. To
the degree that an embodiment may be implemented in software, the
methods, operations, functions, and/or logic are stored on a
computer-readable medium and accessible by a computer
processor.
[0049] As one example, an embodiment may be implemented as a
computer readable storage medium embodying a program of
machine-readable instructions, executable by a digital processor,
to perform operations to facilitate queue allocation. The
operations may include operations to store a plurality of dispatch
indicators corresponding to pairs of entries in a queue. Each
dispatch indicator is indicative of the dispatch order of the
corresponding pair of entries. The operations also include
operations to store a bit vector comprising a plurality of mask
values corresponding to the dispatch indicators of the dispatch
order data structure, and to perform a queue operation on a subset
of the entries in the queue. The subset excludes at least some of
the entries of the queue based on the mask values of the bit
vector. Other embodiments of the computer readable storage medium
may facilitate fewer or more operations.
[0050] Embodiments of the invention also may involve a number of
functions to be performed by a computer processor such as a central
processing unit (CPU), a graphics processing unit (GPU), or a
microprocessor. The microprocessor may be a specialized or
dedicated microprocessor that is configured to perform particular
tasks by executing machine-readable software code that defines the
particular tasks. The microprocessor also may be configured to
operate and communicate with other devices such as direct memory
access modules, memory storage devices, Internet related hardware,
and other devices that relate to the transmission of data. The
software code may be configured using software formats such as
Java, C++, XML (Extensible Mark-up Language) and other languages
that may be used to define functions that relate to operations of
devices required to carry out the functional operations described
herein. The code may be written in different forms and
styles, many of which are known to those skilled in the art.
Different code formats, code configurations, styles and forms of
software programs and other means of configuring code to define the
operations of a microprocessor may be implemented.
[0051] Within the different types of computers, such as computer
servers, that utilize the invention, there exist different types of
memory devices for storing and retrieving information while
performing some or all of the functions described herein. In some
embodiments, the memory/storage device where data is stored may be
a separate device that is external to the processor, or may be
configured in a monolithic device, where the memory or storage
device is located on the same integrated circuit, such as
components connected on a single substrate. Cache memory devices
are often included in computers for use by the CPU or GPU as a
convenient storage location for information that is frequently
stored and retrieved. Similarly, a persistent memory is also
frequently used with such computers for maintaining information
that is frequently retrieved by a central processing unit, but that
is not often altered within the persistent memory, unlike the cache
memory. Main memory is also usually included for storing and
retrieving larger amounts of information such as data and software
applications configured to perform certain functions when executed
by the central processing unit. These memory devices may be
configured as random access memory (RAM), static random access
memory (SRAM), dynamic random access memory (DRAM), flash memory,
and other memory storage devices that may be accessed by a central
processing unit to store and retrieve information. Embodiments may
be implemented with various memory and storage devices, as well as
any commonly used protocol for storing and retrieving information
to and from these memory devices respectively.
[0052] Although the operations of the method(s) herein are shown
and described in a particular order, the order of the operations of
each method may be altered so that certain operations may be
performed in an inverse order or so that certain operations may be
performed, at least in part, concurrently with other operations. In
another embodiment, instructions or sub-operations of distinct
operations may be implemented in an intermittent and/or alternating
manner.
[0053] Although specific embodiments of the invention have been
described and illustrated, the invention is not to be limited to
the specific forms or arrangements of parts so described and
illustrated. The scope of the invention is to be defined by the
claims appended hereto and their equivalents.
* * * * *