U.S. patent application number 09/820967 was filed with the patent office on 2001-03-30 and published on 2002-07-04 for a system and method for maintaining prefetch stride continuity through the use of prefetch bits.
Invention is credited to Abdallah, Mohammad A., Al-Dajani, Khalid.
United States Patent Application 20020087802
Kind Code: A1
Al-Dajani, Khalid; et al.
Publication Date: July 4, 2002
Application Number: 09/820967
Document ID: /
Family ID: 25015840
System and method for maintaining prefetch stride continuity
through the use of prefetch bits
Abstract
A processor includes a cache that has a plurality of lines to store data. The
processor also includes prefetch bits each of which is associated
with one of the cache lines. The processor further includes a
prefetch manager that calculates prefetch data as if a cache miss
occurred whenever a cache request results in a cache hit to a cache
line that is associated with a prefetch bit that is set. In a
further embodiment, the prefetch manager prefetches data into the
cache based on the distance between cache misses for an
instruction.
Inventors: Al-Dajani, Khalid (Orangevale, CA); Abdallah, Mohammad A. (Folsom, CA)
Correspondence Address: KENYON & KENYON, 1500 K STREET, N.W., SUITE 700, WASHINGTON, DC 20005, US
Family ID: 25015840
Appl. No.: 09/820967
Filed: March 30, 2001
Related U.S. Patent Documents

Application Number   Filing Date     Patent Number
09820967             Mar 30, 2001
09749936             Dec 29, 2000
Current U.S. Class: 711/137; 711/204; 711/E12.057; 712/E9.047
Current CPC Class: G06F 9/3832 20130101; G06F 9/3455 20130101; G06F 12/0862 20130101; G06F 9/383 20130101; G06F 2212/6026 20130101
Class at Publication: 711/137; 711/204
International Class: G06F 012/00
Claims
What is claimed is:
1. A processor comprising: a cache having a plurality of lines to
store data; a plurality of prefetch bits each associated with one
of the cache lines; and a prefetcher to calculate prefetch data as
though a cache miss occurred if a cache request results in a cache
hit to a line that is associated with a set prefetch bit.
2. The processor of claim 1, wherein the prefetch data includes
miss distance information for instructions.
3. The processor of claim 1, wherein the prefetch bits are stored
in the cache.
4. The processor of claim 1, wherein the prefetcher has logic
to reset a prefetch bit associated with a cache line whenever a
cache request results in a cache hit to the cache line and the
prefetch bit was set.
5. The processor of claim 1 further comprising logic to store
recency of use information for the plurality of cache lines which
logic uses information from the prefetch bits to determine the
recency of use information.
6. A cache comprising: a data array having a plurality of lines to
store data; and a plurality of prefetch bits each associated with
one of the data array lines to indicate that data stored in the
associated line was prefetched into the cache.
7. The cache of claim 6, wherein the cache contains logic to reset
a prefetch bit in response to a read from the data array line
associated with the prefetch bit.
8. The cache of claim 6, wherein the cache further comprises a
least recently used (LRU) array, and wherein said plurality of
prefetch bits are located in the LRU array.
9. The cache of claim 6, wherein the cache has logic to store
recency of use information associated with each data array line,
and wherein said logic stores information indicating that a data
array line has a status of least recently used whenever the data
array line is updated with data that was prefetched into the
cache.
10. The cache of claim 9, wherein said logic stores information
indicating that the data array line last read has a status of most
recently used unless the data array line is associated with a
prefetch bit that indicates data being stored in this data array
line was prefetched into the cache.
11. A processor comprising: a cache; an instruction decoder to
decode instructions and to cause cache requests to be sent for data
to be used by the instructions decoded; a cache manager to generate
a cache miss response for a cache request if data requested is
stored in the cache and the cache manager determines that the data
was prefetched into the cache; and a cache prefetcher to receive
cache miss responses from the cache manager and to prefetch data
into the cache based on the distance between cache misses for an
instruction.
12. The processor of claim 11, wherein the processor further
includes a plurality of prefetch bits that store information used
by the cache prefetcher to determine if data was prefetched into
the cache.
13. The processor of claim 12, wherein the cache contains a Read
request buffer, and wherein the plurality of prefetch bits are
attached to the Read request buffer.
14. The processor of claim 12, wherein the processor contains logic
to reset the prefetch bit associated with prefetched data in
response to the first hit to the prefetched data, and wherein the
cache manager will determine that data was not prefetched if the
prefetch bit associated with said data is reset.
15. The processor of claim 11, wherein the cache contains bits to
store information about the status of each line in the cache, and
wherein the cache contains logic to update the status of a cache
line to least recently used whenever prefetched data is stored in
the cache line and to update the status of said cache line to most
recently used whenever the prefetched data is read a second time
after the data is prefetched into the cache.
16. A method of maintaining the continuity of prefetch information,
the method comprising: decoding an instruction a first time;
sending a first request to a cache for data to be used by said
instruction; determining that the data requested in the first
request is stored in a line in a cache; determining that a prefetch
bit associated with said cache line indicates that the cache line
stores data that was prefetched into the cache; and calculating
prefetch information for said instruction, wherein the prefetch
information is calculated based on the first request having
resulted in a cache miss.
17. The method of claim 16, wherein the method further comprises
resetting the prefetch bit associated with the cache line.
18. The method of claim 16, wherein the calculation of prefetch
information for an instruction comprises calculating miss distance
information for the instruction.
19. The method of claim 16, further comprising: decoding a second
instruction which is to use said data; sending a second request to
the cache for said data; determining that the data requested in the
second request is stored in a line in the cache; determining that
the prefetch bit associated with said cache line indicates that the
cache line stores data that was not prefetched into the cache;
calculating prefetch information for the instruction, wherein the
prefetch information is calculated based on the second request
having resulted in a cache hit; and updating the status information
corresponding to the cache line to indicate that the cache line was
most recently used.
20. A processor comprising: a cache having a plurality of cache
lines; a means for prefetching data into one of the cache lines;
and a means for indicating that a cache line contains prefetched
data.
21. The processor of claim 20, further comprising a means for
determining that a virtual miss has occurred in response to a
request for data sent to the cache whenever the data is stored in a
cache line and the means for indicating indicates that the cache
line contains prefetched data.
22. The processor of claim 20, wherein the means for prefetching
updates a prefetch table to indicate that a miss response was
received whenever a response was received for an actual miss or a
virtual miss.
23. The processor of claim 20 further comprising a means for
preventing the calculation of a miss distance of zero for
instructions that have a miss distance that is greater than zero
but less than the size of a cache line.
24. The processor of claim 20, wherein the means for indicating
that a cache line contains prefetched data stores a data structure
that is used to determine whether a cache line contains prefetched
data, and wherein the processor further comprises a means for
reducing cache pollution that uses the same data structure as said
means for indicating.
Description
FIELD OF THE INVENTION
[0001] Embodiments of the present invention relate to prefetching
data from a memory. In particular, the present invention relates to
methods and apparatus for prefetching data from a memory for use by
a processor.
BACKGROUND
[0002] Instructions executed by a processor often use data that may
be stored in a system memory device such as a Random Access Memory
(RAM). For example, a processor may execute a LOAD instruction to
load a register with data that is stored at a particular memory
address. In many systems, because the access time for the system
memory is relatively slow, frequently used data elements are copied
from the system memory into a faster memory device called a cache
and, if possible, the processor uses the copy of the data element
in the cache when it needs to access (i.e., read from or write to)
that data element. If the memory location that is accessed by an
instruction has not been copied into a cache, then the access to
the memory location by the instruction is said to cause a "cache
miss" because the data needed could not be obtained from the cache.
Computer systems operate more efficiently if the number of cache
misses is minimized.
[0003] One way to decrease the time spent waiting to access a RAM
is to "prefetch" data from the system memory before it is needed
and, thus, before the cache miss occurs. Many processors have an
instruction cycle in which instructions to be executed are obtained
from memory in one step (i.e., an instruction fetch) and executed
in another step. If the instruction to be executed accesses a
memory location (e.g., a memory LOAD), then the data at that
location must be fetched into the appropriate section of the
processor from a cache or, in the case of a cache miss, from system memory. A
cache prefetcher attempts to anticipate which data addresses will
be accessed by instructions in the future and to prefetch this data
from the memory before the data is needed. A cache prefetcher
typically determines and maintains a data access pattern for an
instruction and prefetches data into the cache based on this data
access pattern. As used herein, "instruction" refers to a
particular instance of an instruction in the program, with each
instruction being identified by a different instruction pointer
("IP") value.
[0004] The performance of a cache prefetching scheme degrades if
the data access pattern is not properly managed. A prefetcher
maintains access pattern "continuity" if the prefetcher maintains a
discovered access pattern as long as the pattern is active and
relinquishes an access pattern that is no longer active. A
prefetcher operates less efficiently if the continuity of the
access patterns is not maintained.
DESCRIPTION OF THE DRAWINGS
[0005] FIG. 1 is a partial block diagram of a computer system
having a processor that maintains prefetch stride continuity
through the use of prefetch bits according to an embodiment of the
present invention.
[0006] FIG. 2 is a partial block diagram of a cache having prefetch
bits according to an embodiment of the present invention.
[0007] FIG. 3 is a flow diagram of a method of maintaining prefetch
stride continuity through the use of prefetch bits according to an
embodiment of the present invention.
[0008] FIG. 4 is a partial block diagram of a computer system
having prefetch bits according to another embodiment of the present
invention.
DETAILED DESCRIPTION
[0009] Embodiments of the present invention relate to a prefetcher
which prefetches data for an instruction based on an access pattern
that has been determined and maintained for the instruction. In one
embodiment, the access pattern used is the distance between cache
misses caused by the instruction. This distance is the stride for
the cache misses and may be referred to as the "miss distance" for
the instruction. The miss distance may be stored in a prefetch
table.
[0010] A prefetcher may incorrectly determine that an access
pattern has been dropped when in fact the access pattern is still
active. This situation may occur, for example, when the prefetched
data is stored in a storage medium, such as a cache, that is not
controlled by the prefetcher. If the prefetcher is using a pattern
of cache misses as the access pattern, data prefetched into the
cache could falsely interrupt the access pattern detected and
result in a loss of stride continuity because the prefetched data
may cause a cache hit even though this data would have caused a
cache miss if it had not been prefetched. That is, the act of
prefetching data into the cache causes requests that would have
resulted in cache misses to instead result in a cache hit. Thus,
while the ultimate object of prefetching is to decrease the number
of cache misses, a prefetcher that relies on a pattern of cache
misses may disrupt the detected access pattern by the very act of
prefetching data into the cache.
[0011] In order to maintain stride continuity, the invention
disclosed in this application handles a cache request that results
in a cache hit to prefetched data as if a cache miss had occurred.
Such a miss may be referred to as a "virtual miss" because an
actual cache miss ("actual miss") did not occur. In an embodiment,
a plurality of prefetch bits (or "virtual bits") are associated
with each line in the cache and are used to indicate that the data
stored in the associated line was prefetched into the cache. In an
embodiment, the prefetcher will calculate the miss pattern and
other prefetch data as though a cache miss occurred if a cache
request results in a cache hit to a line that is associated with a
set prefetch bit. Thus, the prefetch bits store information that is
used by a prefetch manager to determine if data was prefetched into
the cache.
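As a rough sketch of this behavior, the check the prefetcher performs can be written as follows. The names (CacheLine, classify_request) are illustrative assumptions rather than structures named in this application, and the clearing of the bit on the first hit anticipates the embodiment discussed in the next paragraph:

```python
# Illustrative model of the "virtual miss" check; a real implementation
# would operate on tag-array state in hardware, not a Python dict.

class CacheLine:
    def __init__(self, tag, data, prefetched):
        self.tag = tag
        self.data = data
        self.prefetch_bit = prefetched  # set when the fill came from the prefetcher

def classify_request(cache, tag):
    """Return how the prefetcher should treat a cache request."""
    line = cache.get(tag)
    if line is None:
        return "actual miss"
    if line.prefetch_bit:
        # Hit to prefetched data: treated as a miss for stride tracking.
        line.prefetch_bit = False  # reset after the first hit
        return "virtual miss"
    return "hit"

cache = {0x20: CacheLine(0x20, b"...", prefetched=True)}
assert classify_request(cache, 0x40) == "actual miss"
assert classify_request(cache, 0x20) == "virtual miss"
assert classify_request(cache, 0x20) == "hit"  # bit was cleared on first hit
```

Both actual and virtual misses feed the same stride calculation, which is what preserves continuity.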
[0012] Other embodiments use the prefetch bits for purposes in
addition to maintaining the continuity of the prefetch access
pattern. For example, in an embodiment a prefetch bit may be reset
after the first hit to a prefetched cache line (i.e., the first hit
occurring after the line is updated with prefetched data) in order
to prevent the calculation of a miss distance of zero for
instructions with stride that is smaller than the size of a cache
line. In a further embodiment, prefetch bits are also used to
prevent cache pollution.
[0013] FIG. 1 is a partial block diagram of a computer system
having a processor that maintains prefetch stride continuity
through the use of prefetch bits according to an embodiment of the
present invention. Computer system 100 includes a processor 101
that has a decoder 110 that is coupled to a prefetcher 120.
Computer system 100 also has an execution unit 107 that is coupled
to decoder 110 and prefetcher 120. The term "coupled" encompasses a
direct connection, an indirect connection, an indirect
communication, etc. Processor 101 may be any microprocessor
capable of processing instructions, such as for example a general
purpose processor in the INTEL PENTIUM family of processors.
Execution unit 107 is a device which performs instructions. Decoder
110 may be a device or program that changes one type of code into
another type of code that may be executed. For example, decoder 110
may decode a LOAD instruction that is part of a program, and the
decoded LOAD instruction may later be executed by execution unit
107. Processor 101 is coupled to Random Access Memory (RAM) 140.
RAM 140 is a system memory. In other embodiments, a type of system
memory other than a RAM may be used in computer system 100 instead
of or in addition to RAM 140.
[0014] In the embodiment shown in FIG. 1, processor 101 contains a
cache 130 that is coupled to execution unit 107, prefetcher 120,
and RAM 140. In another embodiment, cache 130 may be located
outside of processor 101. Cache 130 may be a Static Random Access
Memory (SRAM). In an embodiment, cache 130 contains prefetch bits
135. In a further embodiment, cache 130 contains a plurality of lines
to store data and each prefetch bit is associated with one of the
cache lines. Further details of prefetch bits are discussed below
with reference to other figures.
[0015] As shown in FIG. 1, prefetcher 120 includes a prefetch
manager 122 and a prefetch memory 125. Prefetch manager 122 may
include logic to prefetch data for an instruction based on the
distance between cache misses caused by the instruction. As used in
this application, "logic" may include hardware logic, such as
circuits that are wired to perform operations, or program logic,
such as firmware that performs operations. Prefetch memory 125 may
store a prefetch table 126 that contains entries including the
distance between cache misses caused by an instruction. In an
embodiment, prefetch memory 125 is a content addressable memory
(CAM). Prefetch manager 122 may determine the addresses of data
elements to be prefetched based on the miss distance that is
recorded for instructions in the prefetch table.
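A minimal sketch of such a prefetch table follows, assuming one entry per instruction pointer (IP) that holds the last miss address and the current miss distance. The field layout is an assumption; the application specifies only that the table record the distance between cache misses for an instruction:

```python
# Hypothetical prefetch table keyed by IP; not the actual CAM layout.

class PrefetchTable:
    def __init__(self):
        self.entries = {}  # IP -> (last_miss_addr, miss_distance)

    def record_miss(self, ip, addr):
        """Update the entry for ip; return a predicted next-miss address."""
        if ip in self.entries:
            last_addr, _ = self.entries[ip]
            distance = addr - last_addr      # the miss distance (stride)
            self.entries[ip] = (addr, distance)
            return addr + distance           # candidate address to prefetch
        self.entries[ip] = (addr, None)      # first miss: no stride known yet
        return None

table = PrefetchTable()
assert table.record_miss(0x1000, 0x0005) is None
assert table.record_miss(0x1000, 0x000A) == 0x000F  # stride of 5 detected
```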
[0016] FIG. 2 is a partial block diagram of a cache having prefetch
bits 135 according to an embodiment of the present invention. FIG.
2 shows cache 130 including a data array 240 and a least recently
used (LRU) array 250. As shown in FIG. 2, data array 240 contains a
plurality of cache lines 245. Each cache line may be, for example,
32 bytes long. In an embodiment, data array 240 may be organized
into sets and ways, as per conventional techniques, and cache 130
may contain other arrays such as for example a tag array. As would
be appreciated by a person of skill in the art, LRU array 250 may
contain recency of use information that is used, for example, to
determine cache lines to be evicted when a portion of the data
array 240 becomes full. In an embodiment of the present invention,
prefetch bits 135 are stored as part of the LRU array 250. In
this embodiment, LRU array 250 contains a prefetch bit for each
cache line 245 in data array 240. In this embodiment, each cache
line 245 is associated with one of the prefetch bits 135. In
another embodiment, the prefetch bits 135 may be located in a part
of the cache 130 other than LRU array 250.
[0017] FIG. 2 shows a cache manager 260 that is coupled to data
array 240 and LRU array 250. In the embodiment shown, cache manager
260 contains prefetch bit management logic 261 and recency of use
logic 262. In an embodiment, the prefetch bit management logic 261
manages the values stored in the prefetch bits 135. For example,
the prefetch bit management logic 261 may set a prefetch bit each
time that a cache line is updated with data that was prefetched
into the cache. In an embodiment, the prefetcher 120 sends a signal
to prefetch bit management logic 261 whenever the data loaded into the
cache is prefetched data. In a further embodiment, prefetch bit
management logic 261 resets a prefetch bit in response to a read
from a data array line associated with the prefetch bit. Recency of
use logic 262 may store, in LRU array 250, recency of use information
associated with each data array line. In an
embodiment, the recency of use logic 262 stores information
indicating that a data array line has a status of least recently
used whenever the data array line is updated with data that was
prefetched into the cache. In a further embodiment, the recency of
use logic 262 stores information indicating that the data array
line last read has a status of most recently used unless the data
array line is associated with a prefetch bit that indicates data
being stored in this data array line was prefetched into the
cache.
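The set/reset and recency-of-use rules just described can be sketched as two hooks on the cache manager. This is an illustrative model with a simplified two-level recency status, not the actual LRU encoding of the array:

```python
# Assumed sketch of cache-manager behavior; names are illustrative.

LRU, MRU = 0, 1  # simplified two-level recency status

class CacheManagerSketch:
    def __init__(self):
        self.prefetch_bit = {}  # line index -> bool
        self.recency = {}       # line index -> LRU/MRU status

    def on_fill(self, line, prefetched):
        """Called when a line is updated with new data."""
        self.prefetch_bit[line] = prefetched
        # Prefetched-but-unused data is marked least recently used so it
        # is evicted first if it is never referenced.
        self.recency[line] = LRU if prefetched else MRU

    def on_read(self, line):
        """Called when a line is read (a cache hit)."""
        if self.prefetch_bit[line]:
            # First hit to prefetched data: clear the bit, but do not
            # promote the line to most recently used yet.
            self.prefetch_bit[line] = False
        else:
            self.recency[line] = MRU

mgr = CacheManagerSketch()
mgr.on_fill(0, prefetched=True)
assert mgr.recency[0] == LRU
mgr.on_read(0)                   # first hit: line stays LRU
assert mgr.recency[0] == LRU
mgr.on_read(0)                   # second hit: promoted to MRU
assert mgr.recency[0] == MRU
```

The same prefetch bit thus serves both the stride-continuity check and the cache-pollution policy described later in paragraph [0023].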
[0018] In an embodiment, a set prefetch bit may indicate that the
associated data array line contains prefetched data. As shown in
FIG. 2, two of the prefetch bits shown are set (i.e., they have a
value of "1") and five of the prefetch bits shown are not set
(i.e., have a value of "0"). If data is loaded into the cache in
response to a cache miss, this data would not have been prefetched
and thus the associated prefetch bit may indicate that the data was
not prefetched. Of course, any value may be used to indicate that
the prefetch bit is set. In an embodiment that is discussed in more
detail below, the prefetch bit may be cleared the first time that
the prefetched data is read, even though the associated cache line
will still contain prefetched data, to handle the case where more
than one miss occurs for an instruction in the same cache line.
[0019] An example of the operation of the present invention is
described with reference to FIG. 3. FIG. 3 is a flow diagram of a
method of maintaining prefetch stride continuity through the use of
prefetch bits according to an embodiment of the present invention.
The method shown in FIG. 3 may be used with a system such as that
shown in FIGS. 1-2. The processor 101 may be executing a program
that contains instructions. As shown in FIG. 3, decoder 110 may
decode an instruction (301). This instruction may be, for example,
a LOAD instruction that has an IP of XXXX. The LOAD instruction may
load data from a location in RAM 140, for example the data element
at address YYY. In the example shown in FIG. 3, the instruction
decoded has been executed a number of times in the past. This
allowed prefetcher 120 to determine an access pattern for the
instruction (information on which is stored in prefetch table 126)
and to prefetch the next data element to be loaded from RAM
according to the access pattern. Thus, in this example the data at
address YYY has already been prefetched into a line of cache 130.
Because this data was prefetched from the RAM into a line of cache
130, at the time the data was prefetched a prefetch bit associated
with the cache line was set by prefetch bit management logic
261.
[0020] According to the example shown in FIG. 3, after decoding the
instruction the decoder 110 may cause a request to be sent to cache
130 for a data element (e.g., the data stored at address YYY) that
is to be used by the instruction (302). Prefetcher 120 will receive
information about the response to the cache request and will
determine whether the request resulted in a cache hit (303). In
this example, the request would have resulted in an actual miss if
the data had not been prefetched into the cache. If the request
resulted in a cache miss, prefetcher 120 calculates prefetch
information for the instruction based on the request having
resulted in a cache miss (304). If the request resulted in a cache
hit, the prefetcher 120 obtains information for the prefetch bit
associated with the cache line that contains the data element
requested (305). The prefetcher 120 then determines if the
information indicates that the data element was prefetched into the
cache (306). If the information indicates that the data element had
been prefetched into the cache, the prefetcher 120 treats the
request as a virtual miss and calculates prefetch information for
the instruction based on the request having resulted in a cache
miss (304). If the information indicates that the data element had
not been prefetched into the cache, the prefetcher 120 calculates
prefetch information for the instruction based on the request
having resulted in a cache hit (307). In this embodiment, the cache
manager will generate a cache miss response for a cache request if
data requested is stored in the cache and the cache manager
determines that the data was prefetched into the cache. The cache
prefetcher receives cache miss responses from the cache manager and
prefetches data into the cache based on the distance between cache
misses for an instruction. In an embodiment, the prefetcher 120
updates prefetch table 126 to indicate that a miss response was
received whenever either an actual miss or a virtual miss response
was received.
[0021] In the example above, the prefetcher may have detected an
access pattern of every fifth address because a cache miss has been
detected occurring at every fifth address (e.g., 0x0005, 0x0010,
0x0015, 0x0020, . . . ) for this instruction. Thus, the prefetcher
will prefetch the address that is five addresses away from the last
address accessed by that instruction (because that is the next
expected miss). Once the data element at address 0x0025 has been
prefetched into the cache, however, it will not cause an actual
cache miss. If the prefetching scheme is based on a detected
pattern of cache misses, the presence of the prefetched data
element from address 0x0025 in the cache could cause the prefetcher
to determine that the pattern has been interrupted (because the
request for address 0x0025 did not cause an actual cache miss) even
though the access pattern is actually still valid. Thus, the
learned stride access pattern of 5 may become corrupted. According
to embodiments of the invention disclosed in this application, the
prefetcher will determine, based on the content of the prefetch bit
for the cache line in question, that the request caused a virtual
miss. Thus, the prefetcher will update the prefetch information
(e.g., the miss distance) for the instruction as if the request
generated an actual miss.
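The example above can be simulated end to end in a simplified model, here using decimal addresses 5, 10, 15, ... with a stride of 5. All structures below (a set for the cache, dictionaries for the prefetch bits and the prefetch table) are illustrative assumptions; the point is only that treating the hit to a prefetched address as a virtual miss keeps the learned miss distance intact:

```python
# Simplified end-to-end simulation of stride continuity (illustrative).

prefetch_bit = {}  # address -> True if the cached line was prefetched
cache = set()      # addresses currently in the cache
table = {}         # IP -> (last_miss_addr, miss_distance)

def access(ip, addr):
    if addr in cache and not prefetch_bit.get(addr):
        return                      # ordinary hit: no prefetch update
    prefetch_bit[addr] = False      # actual miss or virtual miss
    last, _ = table.get(ip, (None, None))
    table[ip] = (addr, addr - last if last is not None else None)
    _, dist = table[ip]
    if dist is not None:            # prefetch the predicted next address
        cache.add(addr + dist)
        prefetch_bit[addr + dist] = True
    cache.add(addr)

# Addresses 15, 20, and 25 are already prefetched when accessed, yet the
# stride of 5 survives because each hit is classified as a virtual miss.
for a in (5, 10, 15, 20, 25):
    access(ip=0x42, addr=a)
assert table[0x42] == (25, 5)
```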
[0022] In a further embodiment, the prefetch bit management logic
261 prevents the calculation of a miss distance of zero for
instructions that have a stride greater than zero but less than the
size of a cache line. In this embodiment, whenever a request
results in a virtual miss, the prefetch bit management logic 261
resets the prefetch bit associated with the cache line that
contains the data requested. The next time that this data is
requested from the cache, the cache will respond to the request by
indicating that an actual hit has resulted, even though the data
had been prefetched, because the prefetch bit will have been reset.
If the stride of the instruction is less than a cache line, the
addresses requested by two or more accesses of the instruction could
fall in the same cache line. Without the reset, a virtual miss with
an apparent stride of zero would be generated every time such an
access hit the same cache line while its prefetch bit was set. Clearing the
prefetch bit after the first hit to the prefetched cache line
prevents this case from occurring.
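The hazard and its fix can be illustrated concretely. Assume a 32-byte line and an instruction striding 8 bytes, so four consecutive accesses touch the same line; the addresses and line size below are illustrative, not taken from the application:

```python
# Why the bit is cleared on the first hit: if it stayed set, every touch
# after the first would report a virtual miss at the same line address,
# teaching the prefetcher a bogus miss distance of zero.

LINE_SIZE = 32

def line_of(addr):
    return addr // LINE_SIZE

prefetch_bit = {line_of(0x0100): True}  # this line was just prefetched

def is_virtual_miss(addr):
    line = line_of(addr)
    if prefetch_bit.get(line):
        prefetch_bit[line] = False      # clear on first hit
        return True
    return False

assert is_virtual_miss(0x0100) is True   # first touch: virtual miss
assert is_virtual_miss(0x0108) is False  # same line, 8 bytes on: plain hit
assert is_virtual_miss(0x0110) is False  # no zero-distance update occurs
```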
[0023] In a further embodiment, the cache manager 260 stores
recency of use information for the plurality of cache lines and
uses information from the prefetch bits to determine this recency
of use information. In an embodiment, the recency of use logic 262
stores information in LRU array 250 indicating that a data array
line has a status of least recently used whenever the data array
line is updated with data that was prefetched into the cache.
According to this embodiment, data that has been prefetched into
the cache, but has not yet been used, may be selected first for
eviction. The recency of use logic 262 stores information
indicating that the data array line last read has a status of most
recently used unless the data array line is associated with a
prefetch bit that indicates data being stored in this data array
line was prefetched into the cache. According to this embodiment, a
cache line containing prefetched data that is hit a first time will
not be changed to a status of most recently used. Thus, prefetched
data that is hit only once may also be evicted first. The prefetch
bit is cleared once the cache line is hit, and thus upon the second
hit to the cache line the recency of use logic 262 will treat the
cache line as if it were not prefetched and will change its status
to most recently used. The above embodiments for reducing cache
pollution use the same data structure (i.e., the prefetch bits) as
is used to indicate that a cache line contains prefetched data. If
data is prefetched into the cache but never accessed or reused,
that data will be replaced first.
[0024] FIG. 4 is a partial block diagram of a computer system
having prefetch bits according to another embodiment of the present
invention. Similar to FIG. 1, FIG. 4 shows a computer system 400
that contains a processor 401 that is coupled to a RAM 440.
Processor 401 contains a decoder 410 coupled to an execution unit
407. Processor 401 also contains a prefetcher 420 that is coupled
to decoder 410 and execution unit 407. Computer system 400 contains
a cache 430 that is coupled to processor 401 and to RAM 440. Unlike
the processor 101 of FIG. 1, processor 401 also contains a read
request buffer 470 that is coupled to prefetcher 420, cache 430,
and RAM 440. In this embodiment, prefetch bits 475 are attached to
read request buffer 470. Read request buffer 470 may be a cache
fill buffer that starts the prefetch request to memory. When this
embodiment is used, the prefetch bit may be associated with the
cache line before the data is brought into the cache. If the same
instruction hits the prefetch line when it is still in the request
stage, then the stride continuity may be maintained and the new
prefetch request may be issued while the old prefetch request is in
progress.
[0025] Embodiments of the present invention relate to a prefetcher
which prefetches data for an instruction based on an access pattern
that has been determined and maintained for the instruction. The
present invention maintains stride continuity by handling cache
requests resulting in a cache hit to prefetched data as if a cache
miss had occurred. Several embodiments of the present invention are
specifically illustrated and/or described herein. However, it will
be appreciated that modifications and variations of the present
invention are covered by the above teachings and within the purview
of the appended claims without departing from the spirit and
intended scope of the invention. For example, any combination of
one or more of the aspects described above may be used. In
addition, the invention may be used with physical or linear
addresses, and with prefetch schemes based on different types of
access patterns, including sequential, linear, or series patterns.
* * * * *