U.S. patent application number 13/846731, filed with the patent office on 2013-03-18, was published on 2013-10-24 for method for performing cache coherency in a computer system.
The applicants listed for this patent application are Martin M. Deneroff, Steve C. Miller, and Kenneth C. Yeager. Invention is credited to Martin M. Deneroff, Steve C. Miller, and Kenneth C. Yeager.
Application Number | 20130282988 13/846731 |
Family ID | 42734020 |
Publication Date | 2013-10-24 |
United States Patent Application | 20130282988
Kind Code | A1
Miller; Steve C.; et al. | October 24, 2013
Method for Performing Cache Coherency in a Computer System
Abstract
In a computing system, cache coherency is performed by selecting
one of a plurality of coherency protocols for a first memory
transaction. Each of the plurality of coherency protocols has a
unique set of cache states that may be applied to cached data for
the first memory transaction. Cache coherency is performed on
appropriate caches in the computing system by applying the set of
cache states of the selected one of the plurality of coherency
protocols.
Inventors: Miller; Steve C. (Livermore, CA); Deneroff; Martin M. (Oakhurst, NJ); Yeager; Kenneth C. (Sunnyvale, CA)
Applicant:
Name | City | State | Country | Type
Miller; Steve C. | Livermore | CA | US |
Deneroff; Martin M. | Oakhurst | NJ | US |
Yeager; Kenneth C. | Sunnyvale | CA | US |
Family ID: 42734020
Appl. No.: 13/846731
Filed: March 18, 2013
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
12887374 | Sep 21, 2010 | 8402225
13846731 | |
10837057 | Apr 30, 2004 | 7802058
12887374 | |
60467102 | Apr 30, 2003 |
Current U.S. Class: 711/141
Current CPC Class: G06F 12/0808 20130101; G06F 12/0815 20130101
Class at Publication: 711/141
International Class: G06F 12/08 20060101 G06F012/08
Claims
1. A method for performing cache coherency in a computer system,
comprising: selecting one of a plurality of coherency protocols for
a memory transaction to be applied to cached data stored in the
computer system, each of the plurality of coherency protocols
having a unique set of cache states that may be applied to cached
data for the memory transaction; and performing cache coherency for
the memory transaction on caches in the computer system storing the
cached data by applying the set of cache states of the selected one
of the plurality of coherency protocols.
2. The method of claim 1, wherein selection among the plurality of
coherency protocols is performed for one or more separate memory
transactions.
3. The method of claim 1, wherein the selected one of the plurality
of coherency protocols is a write invalidate protocol.
4. The method of claim 3 wherein the write invalidate protocol
generates invalidate messages in response to a change in data
pursuant to the memory transaction for caches in the computer
system holding a copy of the data.
5. The method of claim 1, wherein the selected one of the plurality
of coherency protocols is an update protocol.
6. The method of claim 5, wherein the update protocol does not make
data associated with the memory transaction exclusive before
executing memory location stores.
7. The method of claim 6, wherein new data to be stored is
transmitted to all caches in a computing system that hold copies of
the original data.
8. The method of claim 1, wherein the selected one of the plurality
of coherency protocols is an exclusive protocol.
9. The method of claim 8, wherein data associated with the memory
transaction is requested to be placed into an exclusive state in
response to the data not being stored on a local cache associated
with the origination of the memory transaction.
10. The method of claim 1, wherein the selected one of the
plurality of coherency protocols is an update once protocol.
11. The method of claim 10, wherein data associated with the memory
transaction is requested to be placed into an exclusive state in
response to the data being updated twice by a first processor
before being updated by another processor.
12. A non-transitory computer readable storage medium tangibly
storing code for performing cache coherency in a computer system,
the code operable to: select one of a plurality of coherency
protocols for a memory transaction to be applied to cached data
stored in the computer system, each of the plurality of coherency
protocols having a unique set of cache states that may be applied
to the cached data for the memory transaction; and perform cache
coherency for the memory transaction on caches in the computer
system storing the cached data by applying the set of cache states
of the selected one of the plurality of coherency protocols.
13. The non-transitory computer readable storage medium of claim
12, wherein selection among the plurality of coherency protocols is
performed for each separate memory transaction.
14. The non-transitory computer readable storage medium of claim
12, wherein the plurality of cache coherency protocols from which
the code selects includes a write invalidate protocol, an update
protocol, an exclusive protocol, and an update once protocol.
15. The non-transitory computer readable storage medium of claim
12, wherein the code is operable to dynamically select different
cache coherency protocols for each cache in the computer
system.
16. The non-transitory computer readable storage medium of claim
12, wherein selection of one of the plurality of cache coherency
protocols is performed based on the cached data associated with the
memory transaction.
17. The non-transitory computer readable storage medium of claim
12, wherein the code is further operable to update a directory
associated with each cache pursuant to the selected one of the
plurality of coherency protocols.
18. A system for performing cache coherency in a computer system,
comprising: means for selecting one of a plurality of coherency
protocols for a memory transaction to be applied to cached data
stored in the computer system, each of the plurality of coherency
protocols having a unique set of cache states that may be applied
to the cached data for the memory transaction; and means for
performing cache coherency for the memory transaction on caches in
the computer system storing the cached data by applying the set of
cache states of the selected one of the plurality of coherency
protocols.
19. The system of claim 18, wherein selection among the plurality
of coherency protocols is performed for each separate memory
transaction.
20. The system of claim 18, wherein the plurality of cache
coherency protocols from which selection is made includes a write
invalidate protocol, an update protocol, an exclusive protocol, and
an update once protocol.
21. The system of claim 18, further comprising means for
dynamically selecting different cache coherency protocols for each
cache in the computer system.
22. The system of claim 18, wherein selection of one of the
plurality of cache coherency protocols is performed based on the
cached data associated with the memory transaction.
23. The system of claim 18, further comprising means for updating a
directory associated with each cache pursuant to the selected one
of the plurality of coherency protocols.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. application Ser.
No. 12/887,374, filed Sep. 21, 2010; which is a continuation of
U.S. patent application Ser. No. 10/837,057, which is now U.S. Pat.
No. 7,802,058; which claims priority to U.S. Provisional
Application No. 60/467,102 filed Apr. 30, 2003, all of which are
incorporated by reference herein in their entireties.
TECHNICAL FIELD OF THE INVENTION
[0002] The present invention relates in general to multi-processor
computing systems and more particularly to a method for performing
cache coherency in a computer system.
BACKGROUND OF THE INVENTION
[0003] In computer systems, there is a disparity between processor
cycle time and memory access time. Since this disparity limits
processor utilization, caches have been introduced to solve this
problem. Caches, which are based on the principle of locality,
provide a small amount of extremely fast memory directly connected
to a processor to avoid the delay in accessing the main memory and
reduce the bandwidth needed to the main memory. Even though caches
significantly improve system performance, a coherency problem
occurs as a result of the main memory being updated with new data
while the cache contains old data. For shared multi-processor
systems, a cache is almost a necessity since access latency to
memory is further increased due to contention for the path to the
memory. It is not possible for the operating system to ensure
coherency since processors need to share data to run parallel
programs and processors cannot share a cache due to bandwidth
constraints.
[0004] Various algorithms and protocols have been developed to
handle cache coherency. For example, in a directory based caching
structure, a write invalidate scheme allows for a processor to
modify the data in its associated cache at a particular time and
force the other processors to invalidate that data in their
respective caches. When a processor reads the data previously
modified by another processor, the modifying processor is then
forced to write the modified data back to the main memory. Though
such a scheme handles cache coherency in theory, limitations in
system performance are still apparent.
SUMMARY OF THE INVENTION
[0005] From the foregoing, it may be appreciated by those skilled
in the art that a need has arisen for a scheme to provide
significant performance benefits for cache coherency in a computer
system. In accordance with the present invention, there is provided
a method for performing cache coherency in a computing system that
substantially eliminates or greatly reduces disadvantages and
problems associated with conventional cache coherency
techniques.
[0006] According to an embodiment of the present invention, there
is provided a method of performing cache coherency in a computer
system that includes selecting one of a plurality of coherency
protocols for a first memory transaction. Cache coherency is
performed for the first memory transaction on caches in the
computer system in accordance with the one of the plurality of
coherency protocols selected for the first memory transaction. The
plurality of cache coherency protocols includes invalidation,
update, exclusive, and update once. Each cache coherency protocol
provides a specific process for maintaining cache coherency in a
computing system. Selection of a cache coherency protocol can be
performed on a dynamic basis for each memory transaction to be
processed.
[0007] The present invention provides various technical advantages
over conventional cache coherency techniques. For example, one
technical advantage is the capability to selectively use other
coherency and consistency mechanisms for memory update
transactions. Another technical advantage is to develop a computer
system with dramatically increased delivered performance with
respect to other more standard computers designed with similar
integrated circuit technology. Embodiments of the present invention
may incorporate all, some, or none of these technical advantages
while other technical advantages may be readily apparent to those
skilled in the art from the following figures, description, and
claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] For a more complete understanding of the present invention
and the advantages thereof, reference is now made to the following
description taken in conjunction with the accompanying drawings,
wherein like reference numerals represent like parts, in which:
[0009] FIG. 1 illustrates an example multiprocessor computing
system.
DETAILED DESCRIPTION OF THE INVENTION
[0010] FIG. 1 shows a simplified block diagram of a multiprocessor
computing system 10. Computing system 10 includes a plurality of
nodes 12. Each node includes a processor 14, input/output 16, a
directory 18, and a cache 20. As shown, computing system 10 is a
directory based multiprocessor system where data can be shared
across all of the processors 14. As a result of data being shared
throughout computing system 10, a coherency mechanism is used to
track and update data in the various caches 20 to ensure that valid
data is used during execution of applications. Various coherency
protocols may be implemented within computing system 10. These
coherency protocols may be dynamically selected for each cache and
memory transaction performed in computing system 10. The coherency
protocols may be associated with particular data stored within
computing system 10 to facilitate selection of a specific coherency
protocol. The technique of the present invention may be implemented
as one or more software modules residing anywhere within computing
system 10.
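As a rough illustration of selecting a protocol for each memory transaction based on the data involved, the following sketch looks up a protocol by target address. The address-range table, the function name `select_protocol`, and the choice of write invalidate as the default are assumptions made for illustration; the description does not specify how the association between data and protocol is stored.

```python
from enum import Enum
from typing import List, Tuple


class Protocol(Enum):
    INVALIDATE = "invalidate"
    UPDATE = "update"
    UPDATE_ONCE = "update_once"
    EXCLUSIVE = "exclusive"


# Hypothetical association of address ranges [start, end) with protocols.
RANGES: List[Tuple[int, int, Protocol]] = [
    (0x1000, 0x2000, Protocol.UPDATE),     # e.g. a shared message buffer
    (0x2000, 0x3000, Protocol.EXCLUSIVE),  # e.g. thread-private data
]


def select_protocol(address: int) -> Protocol:
    """Pick the protocol for one memory transaction by its target address."""
    for start, end, protocol in RANGES:
        if start <= address < end:
            return protocol
    return Protocol.INVALIDATE  # assumed default: the base protocol


print(select_protocol(0x1800).value)
```

Because the lookup runs once per transaction, two consecutive stores to different ranges can be handled under different protocols.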
Base Protocol
[0011] The computing system supports the base distributed shared
memory model used in current systems: MESI cache coherency with
system state maintained in a directory associated with main memory.
Caches are copyback with read and write allocation. A three level
cache structure may be employed, where the second level is
inclusive of the first and the third level is a victim cache. The
directory maintains a sharing vector for shared data, which
identifies all caches holding copies. When a processor intends to
perform a store to a line in shared state, it sends an upgrade
request to the directory, which then sends Invalidate messages to
all caches whose bit is set in the sharing vector. The directory
also sends a sharing count back to the requesting node. When a node
receives an Invalidate, it changes its cache state to Invalid and
sends an Acknowledge message back to the requesting node. The
requesting node counts the Acknowledges it receives, remaining
stalled until the count matches the sharing count that the
directory sent back. This guarantees sequential consistency,
meaning that all processors in the system will see all
modifications to a particular memory location in the same order,
and that any processor will always see modifications to memory made
by a particular processor in the same order that the program on
that processor performs them.
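The upgrade sequence described above can be sketched as a small software model. The class and function names are hypothetical, and the model assumes the requester's own bit is excluded from the set of Invalidate targets, which the description does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Directory:
    # line address -> node ids whose bit is set in the sharing vector
    sharers: Dict[int, Set[int]] = field(default_factory=dict)

    def upgrade(self, line: int, requester: int):
        """Return (nodes to invalidate, sharing count sent to the requester)."""
        # Assumption: the requester itself is not sent an Invalidate.
        targets = self.sharers.get(line, set()) - {requester}
        self.sharers[line] = {requester}  # requester becomes the sole holder
        return sorted(targets), len(targets)


def store_to_shared_line(directory, caches, line, requester):
    """Model a store to a Shared line; returns (acks received, sharing count)."""
    targets, sharing_count = directory.upgrade(line, requester)
    acks = 0
    for node in targets:
        caches[node][line] = "I"  # the node invalidates its copy ...
        acks += 1                 # ... and acknowledges to the requester
    # In hardware the requester stays stalled until acks == sharing_count.
    if acks == sharing_count:
        caches[requester][line] = "M"
    return acks, sharing_count


directory = Directory({0x40: {0, 1, 2}})
caches = {0: {0x40: "S"}, 1: {0x40: "S"}, 2: {0x40: "S"}}
print(store_to_shared_line(directory, caches, 0x40, requester=0))
```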
[0012] Performance achieved by a system using only this protocol
and consistency model should be considered the baseline against
which the improvements of the present invention are compared. It is
expected that this model is actually optimal for some programs, and
frequently optimal for a significant subset of the data used by
many programs. However, very large improvements can be achieved if
other models can be selected for use on specific data used within a
program.
[0013] It is possible to implement weak consistency in a MESI
protocol, restricted to the specific situation of upgrades from
Shared to Exclusive state. This is done by stalling only until the
directory sends back the sharing count (instead of stalling until
all acknowledgments have been received) and then using a SYNC
instruction to stall until the total outstanding acknowledgment
count goes to zero. The value of this capability is relatively
small on most applications and, though this functionality may be
included, the benefits of weakly ordered updates, discussed below,
are expected to be much larger.
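A minimal sketch of the bookkeeping this implies is shown below, assuming a single counter per processor. The class and method names are hypothetical and only model when a SYNC would stall; they do not model the stall itself.

```python
class WeakOrderingRequester:
    """Tracks outstanding acknowledgments instead of stalling on every store."""

    def __init__(self) -> None:
        self.outstanding_acks = 0

    def on_sharing_count(self, count: int) -> None:
        # The store may proceed as soon as the directory's count arrives.
        self.outstanding_acks += count

    def on_ack(self) -> None:
        self.outstanding_acks -= 1

    def sync_would_stall(self) -> bool:
        # SYNC stalls until the total outstanding count reaches zero.
        return self.outstanding_acks != 0


requester = WeakOrderingRequester()
requester.on_sharing_count(2)   # first upgrade: two sharers to invalidate
requester.on_sharing_count(1)   # second upgrade issued without waiting
print(requester.sync_would_stall())
```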
Update Protocol
[0014] An Update protocol is one in which data is not made
exclusive before executing stores. Instead, store data is actually
transmitted to all caches in the system that hold copies of that
data, and those caches update their copies accordingly. Updates may
be implemented in concert with either sequential or weak
consistency models. Update protocol has strong advantages in the
following cases (other cases may benefit as well):
[0015] Data is shared at the same time--that is, one or more
processors are writing the data in the same time period where one or
more processors are reading the data. In many cases, sequential
ordering is not required here.
[0016] Producer-consumer situations in which it is important to
minimize the startup time for the consumer. MPI messages are a good
example of this case: if the consumer first obtains a shared copy of
the message buffer, the producer can `push` the data directly into
the consumer's cache, thus avoiding a cache miss when the consumer
tries to access the data. Sequential ordering is usually not required
in this application, since there is only one writer.
[0017] False sharing--This is a situation where data within a cache
line is not actually shared; instead different portions of the line
are private to two or more processors. As a result, in a MESI
protocol, the line must be moved whenever one of the processors
desires to perform a write of its data. Since updates do not require
making the line exclusive, this problem is avoided.
[0018] Barriers--This is actually a special case of the first
situation. All processors in a group must inform each other that they
have reached a particular point in their program. Barriers require
that sequential ordering be maintained, since processors are
accumulating data in a location and must always add their result to
that of the other processors.
[0019] Unfortunately, the Update protocol has some disadvantages.
Since all writes are transmitted to all nodes that cache the data,
the bandwidth reduction normally achieved by copyback caches is
eliminated. In most systems, individual messages would need to be
sent by the directory to each cache holding a copy, generating
traffic proportional to the number of stores times the number of
sharers. Since interconnect bandwidth tends to be a scarce
resource, excessive updates can lead to a performance reduction
over MESI if the interconnect becomes saturated. This problem can
be mitigated by a combination of careful selection of data to be
maintained by updates and by incorporating hardware features to
reduce the traffic generated by updates. Also, support for updates
tends to be significantly more complex to implement than standard
MESI systems. Large transistor counts implementable today make this
less of an issue, although managing the complexity remains a
problem.
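The scaling of update traffic noted above (number of stores times number of sharers) can be made concrete with a short calculation. The workload figures are hypothetical and serve only to show how the message count grows.

```python
def update_messages(stores: int, sharers: int) -> int:
    """Cache update messages when each store is sent separately to each sharer."""
    return stores * sharers


# Hypothetical workload: 1,000 stores to one line with a varying sharer count.
for sharers in (2, 8, 32):
    print(sharers, "sharers:", update_messages(1000, sharers), "messages")
```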
[0020] The proposed implementation would include a processor, which
can generate updates on selected data. Stores of a full word would
generate update messages. Stores of less than a word would use MESI
protocol for simplicity of hardware implementation. When in update
mode, caches would use read allocation only. Performing a store
would not cause a fill on a cache miss. This allows producers to
avoid forcing out data and to cache data they are producing and
will not consume. It may be worth performing write allocation as an
optional implementation. Outgoing update messages would be sent to
the directory and memory controller. The memory would be written,
and a new cache update message would be sent to all caches marked
in the sharing vector. The directory would also send a sharing
count back to the originator of the update, so the originator knows
how many acknowledges to expect. To reduce traffic, hardware would
be implemented to fan out a single cache update within the routers
at each vertex in the interconnection network, resulting in a
maximum of one message traveling down any particular link for each
update performed. This mechanism would also collapse the
acknowledges on the way back, resulting in a maximum of one
acknowledgment per update per link.
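The effect of fanning out updates in the routers can be sketched by counting messages per link. The tree topology, node names, and function names below are assumptions for illustration; the description does not specify the shape of the interconnection network.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tree interconnect: child -> parent, with the directory at "root".
PARENT: Dict[str, str] = {
    "r1": "root", "r2": "root",
    "n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2",
}


def path_links(node: str) -> List[Tuple[str, str]]:
    """Links (parent, child) on the route from the root down to `node`."""
    links = []
    while node in PARENT:
        links.append((PARENT[node], node))
        node = PARENT[node]
    return links


def unicast_link_messages(sharers: List[str]) -> int:
    """One separate message per sharer: every hop of every path is counted."""
    return sum(len(path_links(s)) for s in sharers)


def fanout_link_messages(sharers: List[str]) -> int:
    """Routers replicate the update, so each link carries it at most once."""
    used: Set[Tuple[str, str]] = set()
    for s in sharers:
        used.update(path_links(s))
    return len(used)


all_sharers = ["n1", "n2", "n3", "n4"]
print(unicast_link_messages(all_sharers), fanout_link_messages(all_sharers))
```

The same link set, traversed in the opposite direction, bounds the acknowledgments when they are collapsed on the way back.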
[0021] Upon receiving an update message, a node treats the update
as an Invalidate to its primary cache and writes the data to its
secondary cache. It also sends an acknowledgment back to the
directory, which relays the acknowledgment back to the requestor.
If the updated line is no longer cached, this fact is included in
the acknowledgment message. The directory then clears the
associated bit in the sharing vector though this is somewhat
difficult if a coarse sharing vector is used.
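The receive path for an update message might be modeled as follows. This is a sketch under stated assumptions: the names are hypothetical, a full (not coarse) sharing vector is represented as a set of node ids, and a line counts as "no longer cached" when it is absent from the secondary cache.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    primary: Dict[int, int] = field(default_factory=dict)    # line -> data
    secondary: Dict[int, int] = field(default_factory=dict)  # line -> data

    def on_update(self, line: int, data: int) -> dict:
        """Handle an update message and build the acknowledgment."""
        self.primary.pop(line, None)      # treated as an Invalidate
        still_cached = line in self.secondary
        if still_cached:
            self.secondary[line] = data   # write the new data to the secondary
        return {"line": line, "still_cached": still_cached}


def directory_on_ack(sharing_vector: set, node_id: int, ack: dict) -> None:
    """Clear the node's bit when the ack reports the line is no longer cached."""
    if not ack["still_cached"]:
        sharing_vector.discard(node_id)


vector = {1, 2}
holder = Node(primary={0x40: 1}, secondary={0x40: 1})
evicted = Node()
directory_on_ack(vector, 1, holder.on_update(0x40, 7))
directory_on_ack(vector, 2, evicted.on_update(0x40, 7))
print(vector, holder.secondary[0x40])
```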
[0022] If operating in sequential consistency mode, the requestor
stalls until it has received all acknowledges for the update. If
operating in weak consistency mode, it keeps count of the total
number of expected acknowledgments, but does not stall until a SYNC
instruction is executed. SYNC causes a stall until the outstanding
acknowledgment count is zero. An intermediate mode may be needed
wherein the processor stalls until it receives its own update back
from the directory. This would support special memory
operations.
[0023] In weak consistency mode, it is possible to use hardware
`write gatherers` to reduce update bandwidth by storing outgoing
updates and coalescing multiple updates to the same line into a
single larger transaction. The write gatherers would flush to the
system when full, when another gatherer is needed and all are in
use, and when a SYNC instruction executes. A single instruction or
separate instructions may be used to flush the gatherers and to
stall until the acknowledgments complete. Weak consistency should
greatly reduce stall time in the writing processor, but does
require that the programmer, or ideally the compiler or other
`smart` intermediary, know when it is safe to use. Most parallel
programs, which use barriers to divide program phases, and most
messaging systems, can safely use weak consistency except on the
barriers themselves.
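A software model of the write gatherers illustrates the three flush conditions named above. The gatherer count, the line size in words, and the choice to evict the oldest gatherer when all are in use are assumptions; the description does not specify them.

```python
from typing import Dict, List, Tuple


class WriteGatherers:
    """Coalesces outgoing updates per cache line (weak consistency only)."""

    def __init__(self, num_gatherers: int = 2, words_per_line: int = 4) -> None:
        self.num_gatherers = num_gatherers
        self.words_per_line = words_per_line
        self.pending: Dict[int, Dict[int, int]] = {}       # line -> {word: value}
        self.sent: List[Tuple[int, Dict[int, int]]] = []   # flushed transactions

    def _flush(self, line: int) -> None:
        self.sent.append((line, self.pending.pop(line)))

    def store(self, line: int, word: int, value: int) -> None:
        if line not in self.pending:
            if len(self.pending) == self.num_gatherers:
                # All gatherers in use: flush the oldest one to free it.
                self._flush(next(iter(self.pending)))
            self.pending[line] = {}
        self.pending[line][word] = value
        if len(self.pending[line]) == self.words_per_line:
            self._flush(line)  # the gatherer is full

    def sync(self) -> None:
        """Flush everything, as a SYNC instruction would."""
        for line in list(self.pending):
            self._flush(line)


g = WriteGatherers()
g.store(0, 0, 10)
g.store(0, 1, 11)   # coalesced with the previous store to line 0
g.store(1, 0, 20)
g.store(2, 0, 30)   # no free gatherer: line 0 is flushed first
g.sync()
print(len(g.sent), "transactions for 4 stores")
```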
[0024] In very large systems, the system is currently divided into
regions and allows only Exclusive (not Shared) access to data whose
home is outside the requestor's region. This is done to control the
size of the required sharing vector. Since write allocation is not
used with updates, it seems possible that one could perform
update-mode stores from outside the data's home region without
acquiring the data exclusively and without growing the sharing
vector.
[0025] Extra link bandwidth needed for selective updating cannot be
predicted as it depends on how often the data is written. Updates
are most useful when the reader needs to know the results of a
different processor's writes quickly. Since this may consume
significant bandwidth, it is important to restrict use of updates
to situations where this is true. Predicting the proper use of
updates by examining the source code only is not practical as a
knowledge of the high level functionality of the code is needed. A
wizard application may be useful in accomplishing this. It would
also be possible to automatically choose coherency modes by
gathering statistics about access patterns during execution and
changing the modes dynamically.
Exclusive Protocol
[0026] A mechanism can be created wherein both Load and Store
misses always request data in Exclusive state. This is desirable in
cases where the programmer/compiler/other tool knows that the data
is eventually going to be written in Exclusive state, and that no
other processor will be accessing the data before the data is
written. This condition certainly applies to private data, which is
not ever shared, as well as public data, which is shared, but only
during specific known intervals. If done correctly, this reduces
Invalidate traffic in the system and associated stall time spent
waiting for acknowledgments.
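The difference this makes to a later store can be sketched with two small functions that follow the load rows of the Invalid-state table in this description. The function names and the string encoding of protocols and states are hypothetical.

```python
def load_miss_state(protocol: str, dir_state: str) -> str:
    """Cache state after a load miss, per the Invalid-state table."""
    if protocol == "exclusive":
        return "E"   # data is always requested in Exclusive state
    if protocol == "invalidate" and dir_state == "I":
        return "E"   # no other copies exist
    return "S"


def store_needs_upgrade(cache_state: str) -> bool:
    """Under the Invalidate or Exclusive protocol, a store to a Shared line
    must first send an upgrade request and wait for acknowledgments."""
    return cache_state == "S"


for protocol in ("invalidate", "exclusive"):
    state = load_miss_state(protocol, dir_state="S")
    print(protocol, state, store_needs_upgrade(state))
```

With the Exclusive protocol the line arrives in E state, so the subsequent store proceeds without an upgrade request or the associated stall.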
[0027] Exclusive data includes data that is private to an execution
thread, data that is read by only one thread though it may be
written by a different thread, and data that is known to be
destined to be written but is initially accessed by a Load
instruction. In this latter instance, it is also known that other
threads will not usually be accessing the data between the Load and
Store instructions. Compilers can detect some of these conditions.
For other conditions, the programmer can specify the condition in
an appropriate manner. It is possible to develop hardware and/or
software to provide such detection automatically. For example, a
wizard application may be implemented to provide a possible
detection approach. The programmer's ability to specify coherency
modes may be performed through new instructions, TLB entries, or
special addresses. It may be beneficial to have a user level
instruction that modifies the coherency fields in the TLB or page
tables. Though difficult, performance benefits may be quantified in
the context of a specific program fragment.
Non-Coherent Cached Access
[0028] An access mode can be implemented where a read receives the
current snapshot of the requested data, but the directory does not
take note of the caching and therefore does not send updates or
invalidates. This of course eliminates all coherency overhead for
this data, but creates numerous system issues.
[0029] Much academic research has suggested that allowing the
software to use this mode and manage its own coherency is
beneficial, but this work generally ignores issues of process
migration, reclaiming of the memory at the end of the process, IO,
etc. Strangely, it appears that completely private data gets no
benefit from this treatment--it generates no coherency traffic if
handled in Exclusive mode, which does not suffer from any of the
issues described above.
[0030] Actually, this mechanism appears best suited to acquiring
data produced by another thread after that thread finishes and
generates some sort of completion event (message or barrier) which
tells the consumer to pull in a new snapshot. To enable use of this
mechanism, there must be a completely reliable way to guarantee
that the user's store will miss in cache and not access stale
(un-updated) cached data. The obvious FLUSH instruction doesn't
necessarily do the job in the presence of an operating system. It
is always possible for the thread to flush the cache, then be
suspended and restarted on another processor. This can occur
between execution of the flush (on processor 1) and the load (on
processor 2). In that case, the cache could hit on stale
information, and the user thread would never know. It would seem
the operating system would need to track all non-coherent data and
guarantee proper flushing when it restarts a thread.
Update Once Protocol
[0031] An interesting variant is something termed Update Once
protocol, modeled after the conventional Write Once protocol
originally used for copyback caches. This is an attempt to have the
hardware automatically select between Update and Invalidate
mechanisms. By default, the first write to a cache line generates
an update. If an update from a second processor has not been
received before the processor attempts a second write to the same
line, the hardware assumes the line is not being actively shared
and issues an Invalidate instead of an update and converts the line
to exclusive state. This approach is useful for situations with
multiple writers, but not for the other cases discussed above where
updates may add value. It therefore seems necessary to offer pure
update mode as well. With the various protocols, coherency mode
decisions can be made dynamically.
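The Update Once decision for a line held in Shared state can be sketched as below, using the Update Sent (US) bit that this description places in the cache tags. The class name, method names, and returned message labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SharedLine:
    state: str = "S"
    update_sent: bool = False   # the US bit kept in the cache tag

    def store(self) -> str:
        """Return the message a store generates for this line."""
        if self.state != "S":
            return "none"        # line already owned: no message modeled here
        if not self.update_sent:
            self.update_sent = True
            return "update"      # first write: behave like the Update protocol
        # Second write with no intervening remote update: take ownership.
        self.state = "M"
        self.update_sent = False  # US is cleared on leaving S state
        return "upgrade"

    def remote_update_received(self) -> None:
        self.update_sent = False  # another writer is active: keep updating


line = SharedLine()
print(line.store(), line.store(), line.state)
```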
Coherency States and Actions
[0032] This section discusses the system behavior for the various
cache states under the extended coherency model needed to support
the functions described above. As usual, stable cache states are
Modified, Exclusive, Shared, and Invalid. Directory states are
Exclusive, Shared, and Invalid. Additional transitional states used
in implementation are not discussed here. The cache tags include an
Update Sent (US) bit that is used to implement the Update Once
protocol.
Cache in Invalid State--Coherency Actions
TABLE-US-00001 [0033]
Event | Invalidate Protocol | Update Protocol | Update Once Protocol | Exclusive Protocol
Load Instruction - Dir I state | Send data; Dir & cache -> E | Send data; Dir & cache -> S | Send data; Dir & cache -> S | Send data; Dir & cache -> E
Load Instruction - Dir S state | Send data; Dir & cache -> S | Send data; Dir & cache -> S | Send data; Dir & cache -> S | Send Invalidate to sharers; send data; Dir & cache -> E
Load Instruction - Dir E state | Send downgrade to owner; owner sends data; Dir & caches -> S | Send downgrade to owner; owner sends data; Dir & caches -> S | Send downgrade to owner; owner sends data; Dir & caches -> S | Send Invalidate to owner; owner sends data; Dir & caches -> E
Store Instruction - Dir I state | Send data; Dir -> E & cache -> M | Update memory; No state change | Update memory; No state change | Send data; Dir -> E & cache -> M
Store Instruction - Dir S state | Send Invalidate to sharers; send data; Dir -> E & cache -> E | Update memory and sharers; No state change | Update memory and sharers; No state change | Send Invalidate to sharers; send data; Dir -> E & cache -> M
Store Instruction - Dir E state | Send Invalidate to owner; owner sends data; Dir -> E & cache -> M | Update memory and owner; No state change | Update memory and owner; No state change | Send Invalidate to owner; owner sends data; Dir -> E & cache -> M
Invalidate Received | Acknowledge | Acknowledge | Acknowledge | Acknowledge
Update Received | Acknowledge Invalid; dir removes from sharing vector | Acknowledge Invalid; dir removes from sharing vector | Acknowledge Invalid; dir removes from sharing vector | Acknowledge Invalid; dir removes from sharing vector
Cache in Shared State (Directory Will Also be in S
State)--Coherency Actions
TABLE-US-00002 [0034]
Event | Invalidate Protocol | Update Protocol | Update Once Protocol * | Exclusive Protocol
Load Instruction | Cache hit - no action | Cache hit - no action | Cache hit - no action | ** Cache hit - no action
Store Instruction | Send upgrade request; Dir sends Invalidates to sharers; Dir -> E & cache -> M | Update memory and all sharers; Dir & cache -> S | IF (not US) {set US; Update memory and all sharers; Dir & cache -> S} Else {Send upgrade request; Dir sends Invalidates to sharers; Dir -> E & cache -> M} | ** Send upgrade request; Dir sends Invalidates to sharers; Dir -> E & cache -> M
Invalidate Received | Cache -> I; Dir -> E (pointing at new owner) | Cache -> I; Dir -> E (pointing at new owner) | Cache -> I; Dir -> E (pointing at new owner) | ** Cache -> I; Dir -> E (pointing at new owner)
Update Received | Update cache; send Ack; No state change | Update cache; send Ack; No state change | Update cache; send Ack; clear US | ** Update cache; send Ack; No state change
* Note US bit is always cleared by transition out of S state
** Unexpected cache state for this protocol
Cache in Exclusive State (Directory Will Also be in E
State)--Coherency Actions
TABLE-US-00003 [0035]
Event | Invalidate Protocol | Update Protocol | Update Once Protocol * | Exclusive Protocol
Load Instruction | Cache hit - no action | Cache hit - no action | Cache hit - no action | Cache hit - no action
Store Instruction | Cache -> M | ? Either Cache -> M or Send Update to memory; Update memory Dir & cache -> S ? | Cache -> M | Cache -> M
Invalidate Received | Cache -> I; Dir -> E (pointing at new owner) | Cache -> I; Dir -> E (pointing at new owner) | Cache -> I; Dir -> E (pointing at new owner) | Cache -> I; Dir -> E (pointing at new owner)
Update Received | Update cache; send Ack; No state change | Update cache; send Ack; No state change | Update cache; send Ack; No state change | ** ? Update cache; send Ack; No state change
Shared Intervention Received | Cache & Dir -> S; send Clean Ack | Cache & Dir -> S; send Clean Ack | Cache & Dir -> S; send Clean Ack | Cache & Dir -> S; send Clean Ack
Exclusive Intervention Received | Cache -> I; Dir -> E (points to new owner); send Clean Ack | Cache -> I; Dir -> E (points to new owner); send Clean Ack | Cache -> I; Dir -> E (points to new owner); send Clean Ack | Cache -> I; Dir -> E (points to new owner); send Clean Ack
** Not an expected event in this protocol
Cache in Modified State (Directory Will Also be in E
State)--Coherency Actions
TABLE-US-00004 [0036]
Event | Invalidate Protocol | Update Protocol | Update Once Protocol * | Exclusive Protocol
Load Instruction | Cache hit - no action | Cache hit - no action | Cache hit - no action | Cache hit - no action
Store Instruction | Cache hit - no action | ? Either Cache hit no action or Send Update to memory; Update memory Dir & cache -> S ? | Cache hit - no action | Cache hit - no action
Invalidate Received | Cache -> I; Dir -> E (points to new owner) | Cache -> I; Dir -> E (points to new owner) | Cache -> I; Dir -> E (points to new owner) | Cache -> I; Dir -> E (points to new owner)
Update Received | Update cache; send Ack; No state change | Update cache; send Ack; No state change | Update cache; send Ack; No state change | ** ? Update cache; send Ack; No state change
Shared Intervention Received | Cache & Dir -> S; send Dirty Ack & data (data is written back and sent to requestor) | Cache & Dir -> S; send Dirty Ack & data (data is written back and sent to requestor) | Cache & Dir -> S; send Dirty Ack & data (data is written back and sent to requestor) | Cache & Dir -> S; send Dirty Ack & data (data is written back and sent to requestor)
Exclusive Intervention Received | Cache -> I; Dir -> E (points to new owner); send Dirty Ack & data (data is written back and sent to requestor) | Cache -> I; Dir -> E (points to new owner); send Dirty Ack & data (data is written back and sent to requestor) | Cache -> I; Dir -> E (points to new owner); send Dirty Ack & data (data is written back and sent to requestor) | Cache -> I; Dir -> E (points to new owner); send Dirty Ack & data (data is written back and sent to requestor)
** Not an expected event in this protocol
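The intervention rows of the Exclusive-state and Modified-state tables are identical across all four protocols, so they can be captured in one small function. The function name and the string labels are hypothetical; the transitions themselves follow the tables.

```python
def on_intervention(cache_state: str, kind: str):
    """Return (new cache state, reply) for an owner receiving an intervention.

    cache_state is "E" or "M"; kind is "shared" or "exclusive".
    """
    if cache_state not in ("E", "M") or kind not in ("shared", "exclusive"):
        raise ValueError("only owned lines (E or M) receive interventions")
    new_state = "S" if kind == "shared" else "I"
    # A Modified line holds the only up-to-date copy, so data is written
    # back and sent to the requestor along with the acknowledgment.
    reply = "Dirty Ack + data" if cache_state == "M" else "Clean Ack"
    return new_state, reply


print(on_intervention("M", "shared"))
print(on_intervention("E", "exclusive"))
```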
[0037] In summary, the goal is to develop a computer with
dramatically increased delivered performance with respect to other
more standard computers designed with similar integrated circuit
technology. The application space of interest is defense-related
and tends to use very large datasets, which are often accessed in a
sparse, and sometimes random, pattern. Integer arithmetic is at least
as important as floating point. To achieve this goal, a
variety of new hardware and software mechanisms are incorporated.
Focus is placed on those mechanisms relating to cache coherency and
memory consistency. The target system will be a shared memory
multiprocessor which employs a directory for cache coherency and a
high bandwidth interconnect network to transmit information between
nodes. Most current systems of this type employ writeback caches,
MESI protocols, and sequential ordering memory models. Significant
performance can be gained by adding the capability to selectively
use other coherency and consistency mechanisms for memory update
transactions.
[0038] Thus, it is apparent that there has been provided, in
accordance with the present invention, a method for performing
cache coherency in a computer system that satisfies the advantages
set forth above. Although the present invention has been described
in detail, it should be understood by those skilled in the art that
various changes, substitutions, and alterations may be readily
ascertainable by those skilled in the art and may be made herein
without departing from the spirit and scope of the present
invention as defined by the following claims. Moreover, the present
invention is not intended to be limited in any way by any statement
made herein that is not reflected in the appended claims.
* * * * *