U.S. patent application number 09/344660 was filed on 1999-06-25 and published by the patent office on 2002-06-06 for a caching method using cache data stored in dynamic RAM embedded in a logic chip and a cache tag stored in static RAM external to the logic chip.
Invention is credited to JANAKIRAMAN, GOPALAKRISHNAN, PONG, FONG.
Publication Number | 20020069325 |
Application Number | 09/344660 |
Family ID | 23351449 |
Publication Date | 2002-06-06 |
United States Patent
Application |
20020069325 |
Kind Code |
A1 |
PONG, FONG ; et al. |
June 6, 2002 |
CACHING METHOD USING CACHE DATA STORED IN DYNAMIC RAM EMBEDDED IN
LOGIC CHIP AND CACHE TAG STORED IN STATIC RAM EXTERNAL TO LOGIC
CHIP
Abstract
A caching method for using cache data stored in dynamic RAM
embedded in a logic chip and cache tags stored in static RAM
external to the logic chip. In general, there are at least two
cache applications where this method can be employed. First, there
are caches integral to a processor and interfaced to a processor
pipeline. Second, there are caches external to a processor and
interfaced with a shared bus.
Inventors: |
PONG, FONG; (MOUNTAIN VIEW,
CA) ; JANAKIRAMAN, GOPALAKRISHNAN; (SANTA CLARA,
CA) |
Correspondence
Address: |
HEWLETT PACKARD COMPANY
P O BOX 272400, 3404 E. HARMONY ROAD
INTELLECTUAL PROPERTY ADMINISTRATION
FORT COLLINS
CO
80527-2400
US
|
Family ID: |
23351449 |
Appl. No.: |
09/344660 |
Filed: |
June 25, 1999 |
Current U.S. Class: | 711/118; 711/104; 711/105; 711/E12.038; 711/E12.041 |
Current CPC Class: | Y02D 10/00 20180101; Y02D 10/13 20180101; G06F 12/0893 20130101; G06F 12/084 20130101 |
Class at Publication: | 711/118; 711/105; 711/104 |
International Class: | G06F 012/00 |
Claims
What is claimed is:
1. A method of caching memory for a device comprising a logic chip
having embedded logic and embedded DRAM and an external SRAM
connected to the logic chip, the method comprising the steps of:
storing at least a portion of the cache data in the embedded DRAM;
and storing at least a portion of the cache tags in the external
SRAM.
2. A cache memory comprising: a logic chip having embedded logic
and embedded DRAM; an external SRAM connected to the logic chip;
means for storing at least a portion of the cache data in the
embedded DRAM; and means for storing at least a portion of the
cache tags in the external SRAM.
3. A cache memory comprising: a logic chip having embedded logic
and embedded DRAM wherein at least a portion of the cache data is
stored; and an external SRAM connected to the logic chip wherein at
least a portion of the cache tags are stored.
4. A computer system comprising: a processor having embedded logic;
and a cache memory comprising: a DRAM embedded in the processor
wherein at least a portion of the cache data is stored; and an
external SRAM connected to the processor wherein at least a portion
of the cache tags are stored.
5. The computer system according to claim 4, wherein the processor
further comprises: an address buffer connected to the embedded
DRAM; a data buffer connected to the embedded DRAM; a register file
connected to the data buffer; and a pipeline connected to the
address buffer, the data buffer, and the register file.
6. A shared bus computer system comprising: at least one shared
bus; at least one processor connected to the at least one shared
bus; a bus interface having embedded logic connected to the at
least one shared bus; and a cache memory comprising: a DRAM
embedded in the bus interface wherein at least a portion of the
cache data is stored; and an external SRAM connected to the bus
interface wherein at least a portion of the cache tags are
stored.
7. The shared bus computer system according to claim 6, further
comprising a memory connected to the bus interface.
8. The shared bus computer system according to claim 6, further
comprising a second processor connected to the at least one shared
bus.
9. The shared bus computer system according to claim 6, further
comprising a memory connected to the bus interface; and a second
processor connected to the at least one shared bus.
10. The shared bus computer system according to claim 6, further
comprising: a second shared bus connected to the bus interface; a
second bus interface connected to the second shared bus; and a
memory connected to the second bus interface.
11. The shared bus computer system according to claim 10, further
comprising a second processor connected to the at least one shared
bus.
12. The shared bus computer system according to claim 10, further
comprising: a third bus interface having embedded logic connected
to the second shared bus; a second cache memory comprising: a
second DRAM embedded in the third bus interface wherein at least a
portion of the second cache data is stored; and a second external
SRAM connected to the third bus interface wherein at least a
portion of the second cache tags are stored; a third shared bus
connected to the third bus interface; and a second processor
connected to the third shared bus.
13. The shared bus computer system according to claim 12, further
comprising a third processor connected to the at least one shared
bus.
14. The shared bus computer system according to claim 12, further
comprising a third processor connected to the third shared bus.
15. The shared bus computer system according to claim 12, further
comprising: a third processor connected to the at least one shared
bus; and a fourth processor connected to the third shared bus.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of Invention
[0002] The present invention relates generally to the field of
computer system memory and pertains more particularly to a caching
method using cache data stored in dynamic RAM embedded in a logic
chip and cache tag stored in static RAM external to the logic
chip.
[0003] 2. Discussion of the Prior Art
[0004] Modern computer systems are often comprised of multiple
forms and locations of memory. The memory subsystem is typically
organized hierarchically. For example, from cache memory of various
levels at the top to main memory and finally to hard disc memory. A
processor in search of data or instructions looks first in the
cache memory, which is closest to the processor. If the information
is not found there, then the request is passed next to the main
memory and finally to the hard disc. The relative sizes and
performance of the memory units are conditioned primarily by
economic considerations. Generally, the higher the memory unit is
in the hierarchy the higher its performance and the higher its
cost. For reference purposes, the memory subsystem will be divided
into "caches" and "memory." The term memory will cover every form
of memory other than caches. Information that is frequently
accessed is stored in caches and information that is less
frequently accessed is stored in memory. Caches allow higher system
performance because the information can typically be accessed from
the cache faster than from the memory. Relatively speaking, this is
especially true when the memory is in the form of a hard disk.
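The hierarchical search described above can be sketched as follows; the level contents and relative latencies are illustrative assumptions, not figures from the application.

```python
# Minimal sketch of a hierarchical memory lookup: the request is
# satisfied by the highest (fastest) level that holds the address.
# Levels, contents, and latencies are hypothetical.

LEVELS = [
    ("cache", {0x100: "A"}, 1),                            # smallest, fastest
    ("main memory", {0x100: "A", 0x200: "B"}, 100),
    ("hard disk", {0x100: "A", 0x200: "B", 0x300: "C"}, 1_000_000),
]

def lookup(address):
    """Search each level in order; return (level name, data, latency)."""
    for name, contents, latency in LEVELS:
        if address in contents:
            return name, contents[address], latency
    raise KeyError(address)
```

Information cached at the top is returned with the lowest latency, which is the motivation for keeping frequently accessed information there.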
[0005] A cache consists of a cache data portion and a cache tag
portion. The cache data portion contains the information that is
currently stored in the cache. The cache tag portion contains the
addresses of the locations where the information is stored.
Generally, the cache data will be larger than the cache tags. The
cache data and the cache tags will not necessarily be stored
together depending on the design. When a specific piece of
information is requested, one or more of the cache tags are
searched for the address of the requested information. Which cache
tags are searched will depend on the cache design. If the address
of the requested information is present in the cache tags, then the
information will be available from that address in the cache data.
If the address is not present, then the information may be
available from memory.
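The tag check described above can be sketched as follows, assuming for illustration a simple direct-mapped organization; the application does not fix a particular cache design or size.

```python
# Hedged sketch of a cache with separate tag and data portions.
# A read first checks the tag store for the requested address; only
# on a tag match is the value available from the data store.

NUM_SETS = 4  # illustrative size

class Cache:
    def __init__(self):
        self.tags = [None] * NUM_SETS   # cache tag portion (addresses)
        self.data = [None] * NUM_SETS   # cache data portion (values)

    def fill(self, address, value):
        index = address % NUM_SETS
        self.tags[index] = address
        self.data[index] = value

    def read(self, address):
        """Return (hit, value): a hit when the stored tag matches."""
        index = address % NUM_SETS
        if self.tags[index] == address:
            return True, self.data[index]   # address present in the tags
        return False, None                  # miss: fall back to memory
```

Which tags are searched for a given address (here, a single indexed entry) is exactly the design choice the paragraph above refers to.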
[0006] In general, there are two cache applications that will be
considered. First, there are caches integral to a processor and
interfaced to a processor pipeline. Second, there are caches
external to a processor and interfaced with a shared bus. Caches
must be designed in such a way that their latency meets the timing
requirements of the requesting components such as the processor
pipeline or the shared bus. For example, consider the design of the
shared bus. A cache or other agent on the bus that requires a
specific piece of information will issue the address of the
information on the bus. This is known as the address phase.
Subsequently, all caches or other agents attached to the bus must
indicate whether the information at the issued address is located
there. This is known as the snoop phase. Typically, the bus design
specifies that the cache must supply its snoop response within a
fixed time interval after the address has been issued on the bus.
If the cache is not designed to satisfy this timing requirement,
bus usage will be sub-optimal, thus lowering system
performance.
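The snoop-timing constraint described above can be illustrated with a small sketch; the window length, agent names, and tag-lookup latencies are hypothetical, and the model reduces each cache's snoop check to a membership test over its tags.

```python
# Sketch of the address/snoop phases of a shared bus: after an address
# is issued, every agent must report within a fixed window whether it
# holds that address. All numbers here are assumed for illustration.

SNOOP_WINDOW = 5  # bus cycles allowed between address and snoop phases

def snoop_phase(agents, address):
    """Each (name, tags, tag_latency) agent reports whether it holds
    the address; an agent whose tag lookup exceeds the window cannot
    meet the bus timing requirement."""
    responses = {}
    for name, tags, tag_latency in agents:
        if tag_latency > SNOOP_WINDOW:
            raise RuntimeError(f"{name} cannot meet the snoop window")
        responses[name] = address in tags
    return responses

agents = [
    ("cache A", {0x40, 0x80}, 2),  # fast tag store meets the window
    ("cache B", {0xC0}, 3),
]
```

A tag store too slow for the window forces the bus design to stretch the interval, which is the sub-optimal usage the paragraph describes.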
[0007] Examples of prior art systems will now be discussed in
greater detail. Turning first to FIGS. 1-3, block diagrams of a
processor 10 having an integral cache 12 that is interfaced to a
processor pipeline 14 are shown. The processor 10 further consists
of a register file 16, an address buffer 18, and a data buffer 20.
The various elements are connected together by unidirectional and
bidirectional conductors as shown. When the cache 12 of FIG. 1 is
integral to the processor 10, conventionally both the cache tags
and the cache data are stored in fast static random access memory
(SRAM) technology. In general, such an implementation is shown as
cache 12 in FIG. 2. Sometimes the cache integral to the processor
is insufficient, so a supplemental cache is provided external to
the processor. Such an implementation is shown as caches 12a and
12b in FIG. 3. Among the drawbacks of implementing caches
exclusively in SRAM is that, relative to dynamic random access
memory (DRAM) technology, SRAM is more expensive, less dense, and
consumes more power.
[0008] With reference to FIGS. 4-6, block diagrams of a cache 12
external to a processor 10 and interfaced with a shared bus 22 are
shown. Also interfaced with the shared bus 22 is a memory 24. The
cache 12 and the memory 24 are interfaced with the shared bus 22
through a bus interface 26 as shown. When the cache 12 of FIG. 4 is
external to the processor 10, conventionally the cache tags are
stored in a SRAM cache and the cache data is stored in a DRAM
cache. In one implementation, both the SRAM cache 12a containing
cache tags and the DRAM cache 12b containing cache data are
external to the bus interface 26 as shown in FIG. 5. In another
implementation, only the DRAM cache 12b containing cache data is
external to the bus interface 26 while the SRAM cache 12a
containing cache tags is integral to the bus interface as shown in
FIG. 6. Among the drawbacks of these implementations is that the
latency of accessing the cache data is long, since the data is
stored in slower DRAM external to the logic chip. This may force a
delay in transferring data to the shared bus, thus degrading system
performance. Further, when the cache tags are implemented in SRAM
embedded on the logic chip, the size of the cache is limited by the
higher cost, the lower density, and the greater power consumption
of SRAM.
[0009] A definite need exists for a system having an ability to
meet the latency timing requirements of the requesting components
of the system. In particular, a need exists for a system which is
capable of accessing cache memory in a timely manner. Ideally, such
a system would have a lower cost and a higher capacity than
conventional systems. With a system of this type, system
performance can be enhanced. A primary purpose of the present
invention is to solve this need and provide further, related
advantages.
SUMMARY OF THE INVENTION
[0010] A caching method is disclosed for using cache data stored in
dynamic RAM embedded in a logic chip and cache tags stored in
static RAM external to the logic chip. In general, there are at
least two cache applications where this method can be employed.
First, there are caches integral to a processor and interfaced to a
processor pipeline. Second, there are caches external to a
processor and interfaced with a shared bus.
BRIEF DESCRIPTION OF THE DRAWING
[0011] The above and other objects and advantages of the present
invention will be more readily appreciated from the following
detailed description when read in conjunction with the accompanying
drawing, wherein:
[0012] FIG. 1 is a block diagram of a processor having an integral
cache that is interfaced to a processor pipeline according to the
prior art;
[0013] FIG. 2 is a prior art block diagram of a processor having an
integral SRAM cache that is interfaced to a processor pipeline;
[0014] FIG. 3 is a prior art block diagram of a processor having an
integral SRAM cache and an external supplemental SRAM cache both of
which are interfaced to a processor pipeline;
[0015] FIG. 4 is a prior art block diagram of a cache external to a
processor and interfaced with a shared bus;
[0016] FIG. 5 is a prior art block diagram of a SRAM cache
containing cache tags and a DRAM cache containing cache data both
of which are external to a processor and interfaced with a shared
bus;
[0017] FIG. 6 is a prior art block diagram of a DRAM cache
containing cache data and a SRAM cache containing cache tags which
is integral to a bus interface both of which are external to a
processor and interfaced with a shared bus;
[0018] FIG. 7 is a block diagram of a logic chip having embedded
logic and embedded DRAM cache containing cache data according to
one embodiment of the present invention;
[0019] FIG. 8 is a block diagram of a processor having an embedded
DRAM cache containing cache data that is interfaced to a processor
pipeline according to another embodiment of the present
invention;
[0020] FIG. 9 is a block diagram of a SRAM cache containing cache
tags and an embedded DRAM cache containing cache data which is
integral to a bus interface both of which are external to a
processor and interfaced with a shared bus according to a further
embodiment of the present invention; and
[0021] FIG. 10 is a block diagram of a pair of SRAM caches
containing cache tags and a pair of embedded DRAM caches containing
cache data each of which is integral to one of a pair of bus
interfaces both pairs of which are external to a processor and
interfaced with a shared sub-bus according to still another
embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0022] Turning now to FIG. 7, a block diagram of a logic chip 30
having embedded logic 32 and embedded DRAM cache 34 containing
cache data according to one embodiment of the present invention is
shown. The embedded logic 32 can be any of a wide variety of logic
that is well known to one of ordinary skill in the art. For
example, the embedded logic 32 may be a floating point unit or a
bus interface. The logic chip 30 is connected to an external SRAM
cache 36 containing cache tags. In general, there are at least two
cache applications where this method can be employed. First, there
are caches integral to a processor and interfaced to a processor
pipeline. Second, there are caches external to a processor and
interfaced with a shared bus. For example, in a shared bus design,
the external SRAM cache 36 can be accessed within the minimum time
delay specified between the address and snoop phases of the shared
bus. Concurrent with the tag access, the cache data can also be
accessed from the embedded DRAM cache 34 on the logic chip 30. The
latency of accessing the embedded DRAM cache 34 is substantially
lower than accessing the external DRAM cache 12b as in FIGS. 5 and
6 above. Among the advantages of the method of the present
invention are that the embedded DRAM cache results in faster data
access and lower pin count than an external DRAM cache. Further, by
storing the cache tags in external SRAM, the method of the present
invention allows a cache with a larger capacity than a cache
implemented with integral SRAM, since DRAM is cheaper, denser, and
consumes less power.
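The benefit of accessing the external SRAM tags concurrently with the embedded DRAM data can be sketched numerically; the latency figures below are assumed for illustration and do not appear in the application.

```python
# Because the tag lookup (external SRAM) and the data read (embedded
# DRAM) are started together, the hit latency is the maximum of the
# two access times rather than their sum, as it would be if the data
# access waited for the tag match. Latencies are hypothetical cycles.

SRAM_TAG_LATENCY = 3    # external SRAM tag access (assumed)
EDRAM_DATA_LATENCY = 4  # embedded DRAM data access (assumed)

def hit_latency(concurrent):
    """Cache-hit latency for concurrent vs. serialized tag/data access."""
    if concurrent:
        return max(SRAM_TAG_LATENCY, EDRAM_DATA_LATENCY)
    return SRAM_TAG_LATENCY + EDRAM_DATA_LATENCY
```

With these assumed figures, concurrent access hides the tag lookup entirely behind the data access, which is how the embedded DRAM cache can still meet the bus timing requirements.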
[0023] With reference to FIG. 8, a block diagram of a processor 10
having an embedded DRAM cache 34 containing cache data that is
interfaced to a processor pipeline 14 according to one embodiment
of the present invention is shown. As above with respect to FIGS.
1-3, the processor 10 further consists of a register file 16, an
address buffer 18, and a data buffer 20. The processor 10 is
connected to an external SRAM cache 36 containing cache tags. Such
an implementation is able to meet the stringent timing requirements
of the processor.
[0024] FIGS. 9 and 10 are block diagrams of caches external to a
processor and interfaced with a shared bus. The implementation
shown in FIG. 9 is for a single shared bus while the implementation
shown in FIG. 10 is for a hierarchical shared bus. FIG. 9 shows a
block diagram of a SRAM cache 36 containing cache tags and an
embedded DRAM cache 34 containing cache data which is integral to a
bus interface 26, both of which are external to a processor 10 and
interfaced with a shared bus 22 according to one embodiment of the
present invention. FIG. 10 is a block diagram of a pair of SRAM
caches 36 containing cache tags and a pair of embedded DRAM caches
34 containing cache data each of which is integral to one of a pair
of bus interfaces 26 both pairs of which are external to a
processor 10 and interfaced with a shared sub-bus 38 according to
another embodiment of the present invention. As above with respect
to FIGS. 4-6, also interfaced with the shared bus 22 is a memory
24. Both such implementations support faster access to cache data
than conventional approaches while continuing to meet the
requirements of the shared bus.
[0025] While the invention has been illustrated and described by
means of specific embodiments, it is to be understood that numerous
changes and modifications may be made therein without departing
from the spirit and scope of the invention as defined in the
appended claims and equivalents thereof.
* * * * *