U.S. patent application number 10/786250 was filed with the patent
office on February 25, 2004, and published on August 25, 2005, as
application 20050188158, for a cache memory with improved replacement
policy. The invention is credited to Richard P. Schubert.

United States Patent Application 20050188158
Kind Code: A1
Schubert, Richard P.
August 25, 2005
Cache memory with improved replacement policy
Abstract
A processor system having a cache memory. The replacement policy
for the cache is augmented with a consideration of priority so that
higher priority items are not displaced by lower priority items.
The priority based replacement policy can be used to allow
processes that are of lower priority to share the same cache with
processes that are of higher priority. A processor including
digital signal processing and general purpose logic function is
shown to employ the priority based replacement policy to allow
processes executing generalized logic functions to use the cache
when not needed for digital signal processing operations that are
time critical. A processor having digital signal processing
capability is shown to employ the priority system to reserve a
block of memory configured for a cache. The block of memory is
reserved by setting the priority of those cache locations to a
priority higher than any other executing process.
Inventors: Schubert, Richard P. (Medfield, MA)
Correspondence Address: Edmund J. Walsh, Wolf, Greenfield & Sacks, P.C., 600 Atlantic Avenue, Boston, MA 02210, US
Family ID: 34861740
Appl. No.: 10/786250
Filed: February 25, 2004
Current U.S. Class: 711/133; 711/128; 711/E12.075
Current CPC Class: G06F 12/126 20130101
Class at Publication: 711/133; 711/128
International Class: G06F 012/00
Claims
What is claimed is:
1. A method of operating a cache in a digital computer system, the
cache having a plurality of memory locations, the method
comprising: a) associating a priority indicator with memory locations
in the cache; b) storing a new item in the cache by: i) associating
a priority with the new item; ii) selecting a memory location in
the cache based in part on the priority indicators of the memory
locations in the cache relative to the priority of the new item;
iii) storing the new item in the selected memory location; c)
associating the priority of the new item with the selected memory
location in the cache.
2. The method of operating a cache as in claim 1 wherein selecting
a memory location in the cache based in part on the priority
indicators comprises: a) when the cache has an empty memory
location suitable for storing the new item, storing the new item in
an empty memory location; b) when the cache has no empty memory
location suitable for storing the new item, storing the new item in
the least frequently used memory location with a priority indicator
that is the same or lower than the new item, if one exists,
otherwise not storing the new item in the cache and treating the
new item as not cacheable.
3. The method of operating a cache as in claim 1 wherein selecting
a memory location in the cache based in part on the priority
indicators comprises storing the new item in the least frequently
used memory location with a priority indicator that is the same or
lower than the new item, if one exists.
4. The method of operating a cache as in claim 3 wherein selecting
a memory location in the cache based in part on the priority
indicators comprises: a) when the cache has an empty memory
location suitable for storing the new item, storing the new item in
an empty memory location; b) when the cache has no empty memory
location suitable for storing the new item, storing the new item in
the least frequently used memory location with a priority indicator
that is lower than the new item.
5. The method of operating a cache as in claim 1 wherein selecting
a memory location in cache based in part on the priority indicators
comprises: a) when the cache has an empty memory location suitable
for storing the new item, storing the new item in an empty memory
location; b) when the cache has no empty memory location suitable
for storing the new item, storing the new item in the least
recently used memory location with a priority indicator that is the
same or lower than the new item, if one exists, otherwise not
storing the new item and treating the new item as not
cacheable.
6. The method of operating a cache as in claim 1 wherein selecting
a memory location in cache based in part on the priority indicators
comprises: storing the new item in the least recently used memory
location with a priority indicator that is the same or lower than
the new item, if one exists.
7. The method of operating a cache as in claim 6 wherein selecting
a memory location in cache based in part on the priority indicators
comprises: storing the new item in the least recently used memory
location with a priority indicator that is lower than the new item,
if one exists.
8. The method of operating a cache as in claim 1 wherein selecting
a memory location in cache based in part on the priority indicators
comprises: storing the new item in the least recently loaded memory
location with a priority indicator that is the same or lower than
the new item, if one exists.
9. The method of operating a cache as in claim 8 wherein selecting
a memory location in cache based in part on the priority indicators
comprises: storing the new item in the least recently loaded memory
location with a priority indicator that is lower than the new item,
if one exists.
10. The method of operating a cache as in claim 1 wherein selecting
a memory location in cache based in part on the priority indicators
comprises: storing the new item in a pseudo randomly selected
memory location with a priority indicator that is the same or lower
than the new item, if one exists.
11. The method of operating a cache as in claim 10 wherein
selecting a memory location in cache based in part on the priority
indicators comprises: storing the new item in a pseudo randomly
selected memory location with a priority indicator that is lower
than the new item, if one exists.
12. The method of operating a cache as in claim 1 wherein the cache
contains a data array and a tag array and associating a priority
indicator with a memory location comprises storing a value in a
field in the tag array.
13. The method of operating a cache as in claim 1 wherein the
digital computer system executes a plurality of processes, each
process having a priority associated with it and the priority
associated with the new item is derived from the priority of the
process that generated the new item.
14. The method of operating a cache as in claim 1 additionally
comprising: a) assigning a first priority to a first portion of the
plurality of memory locations; b) assigning a second priority,
lower than the first priority, to a second portion of the plurality
of memory locations; c) generating new items to store in the cache
with priorities lower than or equal to the second priority; and d)
using the first portion of the plurality of memory locations for
non-cache memory operations.
15. The method of operating a cache as in claim 14 wherein the
digital computer system comprises a digital signal processor and
using the first portion of the plurality of memory locations for
non-cache operations comprises using the first portion of the
plurality of memory locations for digital signal processing operations.
16. The method of claim 14 wherein assigning a first priority to a
first portion of the plurality of memory locations comprises
writing to a control register.
17. The method of claim 1 wherein associating a priority with a new
item comprises reading a priority from a table associating
priorities with memory addresses.
18. The method of claim 1 additionally comprising altering the
priority associated with a plurality of memory locations in the
cache by writing to a control register.
19. A processor system having a cache, the cache comprising: a) a
data array having a plurality of memory locations for storing
items; b) a tag array having a plurality of memory locations, each
location associated with a location in the data array, each
location in the tag array having associated therewith: a first
field, indicating a relative priority of the item stored in the
associated location in the data array; and a second field,
indicating a portion of an address identifying the item stored in
the associated location in the data array.
20. The processor system of claim 19 additionally comprising a
memory management unit controlling storage of items in the cache
coupled to the tag array whereby locations in the data array are
assigned to new items according to a policy in which an empty
location is used, where available, and where no empty location is
available, a location associated with a priority that is the same
or less than a priority of the new item.
21. The processor system of claim 20 comprising at least one
address bus with a plurality of address bits wherein the cache has
an address input with a plurality of address bits coupled to at
least a portion of the address bus, and the cache further comprises
a plurality of ways, each of the ways having a location in the tag
array addressed by a subset of the plurality of address bits, the
cache further comprising selection circuitry that, upon application
of an address to the address input, couples at least the first
fields and second fields associated with the addressed location in
each of the tag arrays in each of the ways to the memory management
unit.
22. The processor system of claim 19 wherein each location in the
tag array additionally has associated therewith a third field
indicating whether a valid item is stored in the associated
location in the data array.
23. The processor system of claim 19 additionally comprising a
control register having at least one control bit controlling the
value stored in the first field of a plurality of memory locations
in the tag array.
24. The processor system of claim 19 wherein the cache is
implemented in SRAM.
25. The processor system of claim 19 additionally comprising: a) a
memory structure storing priorities associated with addresses; and
b) performance monitoring hardware monitoring a parameter
indicative of cache efficiency and dynamically altering priorities
stored in the memory structure.
26. A processor system, comprising: a) a system bus; b) a
semiconductor chip comprising: i) a processor core; ii) at least
one cache memory coupled to the processor core, the cache
comprising a plurality of memory locations for storing items; iii)
a plurality of control bits associated with each memory location in
the cache, with at least a first control bit for
each memory location indicating whether valid information is stored
in the memory location and at least a second control bit for each
memory location indicating a priority of information stored in the
memory location; iv) a memory management unit coupled to the core,
the memory management unit configured to receive as an input at
least a first control bit and a second control bit, the memory management unit
having control outputs connected to the cache, the memory
management unit having circuitry implementing a priority based
cache replacement policy; v) an interface to the bus; and c)
semiconductor memory outside the semiconductor chip coupled to the
system bus.
27. The processor system of claim 26 wherein the processor core
comprises circuitry to execute general purpose microprocessor
instructions and digital signal processing functions.
28. The processor system of claim 26 wherein the cache memory is
implemented as SRAM and the semiconductor memory is DRAM.
29. The processor system of claim 26 wherein the plurality of
control bits additionally comprises a bit for each memory location
indicating a replacement policy.
30. The processor system of claim 29 wherein the plurality of
control bits additionally comprises a bit for each memory location
indicating whether the information stored in the memory location
differs from information stored in a corresponding location in the
semiconductor memory.
Description
BACKGROUND OF INVENTION
[0001] 1. Field of Invention
[0002] This invention relates generally to computerized data
processors and more specifically to the memory subsystems of such
processors.
[0003] 2. Discussion of Related Art
[0004] Computer data processors are widely used in modern
electronic systems. Some are designed for specialized functions.
One example is a digital signal processor (DSP). A digital signal
processor is configured to quickly perform complex mathematical
operations used in processing of digitized signals.
[0005] FIG. 1 shows a high level block diagram of a computerized
data processor. FIG. 1 could represent a general purpose
computerized data processor or it could represent a special purpose
data processor, such as a digital signal processor. FIG. 1
illustrates a processor chip 100. Within processor chip 100 is a
processor core 110. In operation, processor core 110 reads
instructions from memory and then performs functions dictated by
the instruction. In many cases, these instructions operate on data
that is also stored. When an operation performed by processor core
110 manipulates data, the data is read from memory and results are
generally stored in memory after the instruction is executed.
[0006] FIG. 1 shows that processor chip 100 includes a level 1
instruction memory unit 112 and a level 1 data memory unit 116.
Both the instruction memory unit 112 and data memory unit 116 are
controlled by a memory management unit 114. Instruction memory unit
112 and data memory unit 116 each contain memory that stores
information accessed by processor core 110 as instructions or data,
respectively.
[0007] Level 1 memory is the fastest memory in a computer system.
The area required on an integrated circuit chip to implement level
1 memory often makes it impossible to build a processor chip with
enough level 1 memory to store all the instructions and all the
data needed to run a program. Therefore, a computer system includes
level 2 or level 3 memory. Level 3 memory is generally very slow
but stores a lot of information. Disk drives, tapes or other bulk
storage devices are generally used to implement level 3 memory.
Level 2 memory is typically semiconductor memory that is slower
than level 1 memory. Level 2 memory might be located off-chip. In
some cases, level 2 memory is implemented on processor chip 100,
but is slower than level 1 memory. For example, level 1 memory
might be SRAM and level 2 memory might be DRAM.
[0008] The computer system of FIG. 1 shows off-chip memory 150 that
could be level 2 or level 3 memory. Integrated circuit 100 includes
a memory interface 132 that can read or write instructions or data
in memory 150. Memory 150 is not implemented on semiconductor chip
100.
[0009] In designing a computerized data processing system where
speed of operation is a concern, an effort is made to use level one
memory as much as possible. Semiconductor chip 100 is configured so
that memory operations involving instructions or data pass first
through level one instruction memory unit 112 or level one data
memory unit 116, respectively. If the needed instruction or data is
not located within those units, those units can access memory
interface 132 through internal bus interface 130. In this way,
processor core 110 receives the required instruction or data
regardless of where it is stored.
[0010] To make maximum use of L1 memory, a memory architecture
called a cache is often used. A cache stores a small amount of
information in comparison to what can be stored in level two and
level three memories. Initially the cache stores a copy of
information contained in a level two or level three memory
location. As processor core 110 needs to read or write to that
memory location, it uses the information in the cache instead of
accessing the level 2 or level 3 memory. "Policies" that determine
what information is stored in the cache are intended to increase
the likelihood that information required by processor core 110 is
stored within the cache.
[0011] Control circuitry implements the cache "policies" by
controlling when information read from the level 2 or level 3
memory is stored in the cache and when information in the cache is
written into the level 2 or level 3 memory. Policies also dictate
when the control circuit can overwrite or delete information in the
cache. Before information in a cache is overwritten or deleted, if
it has been changed from what is in the level 2 or level 3 memory,
it must be written back to the level 2 or level 3 memory. Policies
also control timing of writes of cached information back to level 2
or level 3 memory.
[0012] In the following description, a cache is explained in terms
of data read from memory. It should be appreciated, though, that a
cache can store information to be written into level 2 or level 3
memory.
[0013] Control circuitry for the cache must take into account that
not all data accessed by processor core 110 should be stored in a
cache. For example, FIG. 1 illustrates a computerized processor
with a memory mapped architecture. Data may be acquired from or
sent to locations other than a memory storage device. Processor
core 110 may perform an operation based on data from timer 136 or
may send data to timer 136 to control its operation. Likewise, data
may be sent or received from peripherals, such as a printer,
attached to semiconductor chip 100. To interface to peripherals, a
serial interface 134 may be used.
[0014] Timer 136 and serial interface 134 are assigned memory
addresses. When processor core 110 performs an operation that
requires data from these locations or generates data to be sent to
these locations, internal bus interface 130 routes the information
to the appropriate location based on the address that has been
assigned to these devices. It should be appreciated, though, that
reading from a cache a copy of information read from a timer at a
previous instant in time is not the same as reading from the timer
at a later instant of time because the value in the timer may
change. Accordingly, the control circuitry for a cache must
preclude reads or writes to the memory addresses assigned to the
timer from using or storing data in the cache.
[0015] More complicated examples of the need to control whether a
memory operation can be performed using information in a cache
exist in multi-process systems. Software programs executing in
processor chip 100 may create processes. Each process may exist for
a period of time and terminate when the operation performed by the
process is completed. A first process may store information in a
particular memory location. When the first process terminates, a
second process may use that same memory location. But, if the
processor provides the second process with data stored in the cache
for the first process, incorrect operation may result.
[0016] As a further example, the contents of some memory locations
may be altered by "Direct Memory Access" (DMA) operations. DMA
operations do not initiate in processor core 110. DMA operations
would be controlled by DMA controller 138 and would not pass
through memory units 112 and 116. Thus, the information in
the cache would not be updated if a DMA operation involving an
address stored in the cache occurred.
[0017] The portion of the cache control circuit that determines
which locations in the level two or level 3 memory can be cached is
sometimes called a "Cacheability Protection Look aside Buffer"
(CPLB). In prior processors with CPLB circuits, the CPLB is
implemented as a memory table storing information about blocks of
memory--such as which process uses the information in each block
and whether memory locations within a block are subject to updating
by circuitry other than the processor core 110.
[0018] FIG. 2 shows a block diagram of a cache 200, including a
CPLB 250. Other control circuitry is not specifically shown.
However, it is well known in the art that semiconductor circuits,
including those relating to memories, contain timing and control
circuits so that the circuitry achieves the desired operation.
[0019] In a preferred embodiment, cache 200 represents the cache
circuits within L1 instruction memory unit 112 and L1 data memory
unit 116. The physical architecture of the cache does not depend on
the type of data stored in the cache. In operation, processor core
110 generates an address on address line 202. The specific number
of bits in the address line is not important. The address is shown
to have an X portion and a Y portion. Each portion of the address
is made up of some number of the total bits in the address. The X
portion and the Y portion of the address together define the
address of the smallest "item" of memory that cache 200 stores.
[0020] An "item" of information in a cache may be an individual
word or byte. However, most semiconductor memories are organized in
rows. Time is required to set up the memory to access any row. Once
the memory is set up to access the row, the incremental time to
read another location in the row is relatively small. For this
reason, when information is read from level two or level three
memory to store in a cache, an entire row is often read from the
memory and stored in the cache. Little additional time is required
to store an entire row, but significant time savings results if a
subsequent memory operation needs to access another location in the
row. In this case, the "item" stored in the cache corresponds to an
entire row in the level 2 or level 3 memory.
[0021] Additional address bits are applied to the cache 200 to
select a particular piece of information from the item. For
simplicity, FIG. 2 shows address lines to access an "item" but does
not show additional circuitry or address lines that may be present
to access a particular memory location within any item. Also, a
cache that stores "items" with multiple words will some times have
a fill buffer. The fill buffer holds words being read from level 2
or level 3 memory until an entire item is read and transferred from
the fill buffer to the cache. Such circuitry is not expressly shown
because the invention will work with or without a fill buffer.
Where a fill buffer is used, the tag array location associated with
an item to be stored in the data array might be updated before or
during the processes of reading values into the fill buffer.
Alternatively, the tag array could be updated after information is
stored in the data array. The specific process of updating the
array, as well as other processes and features not critical to the
invention, are not fully described for simplicity, but one of skill
in the art will understand that such processes or features might be
used.
[0022] FIG. 2 shows that cache 200 contains a tag array 210 and a
data array 220. Each location 222.sub.1, . . . 222.sub.N in data
array 220 can store an "item". Tag array 210 contains corresponding
locations 212.sub.1 . . . 212.sub.N. The locations in tag array 210
indicate whether an item is stored in the corresponding location in
data array 220 and, if so, which memory address the item is
associated with. Each of the locations 212.sub.1 . . . 212.sub.N
has multiple fields (not numbered). A first field stores an
indication of whether valid data is stored in the corresponding
location in data array 220. This field is sometimes called the
"data valid" field. The second field in each of the locations
212.sub.1 . . . 212.sub.N identifies the address in level 2 or
level 3 memory that is stored in the cache. This field is sometimes
called the "tag" field. The tag array has fields to store other
control bits. For example, a bit might indicate whether the
information stored in the data array is a current copy of
information in the corresponding level 2 or level 3 memory location
or whether it has been modified. Another field might store bits
indicating a "policy" applicable to that cache location.
[0023] To simplify the construction and increase the speed of
operation of cache 200, the locations within cache 200 in which
the information for any level 2 or level 3 memory location may be
stored are constrained. As shown, the Y portion of the address bits
of each external memory address are applied to tag array 210 and
data array 220. The Y portion of the address bits are used to
select one of the locations within these arrays. If information
from a level 2 or level 3 address having those Y portions is stored
in the cache, it will be stored at the selected location. To
indicate that information has been stored in the data array, the
data valid field in the corresponding location in the tag array is
set.
[0024] Because many external addresses have the same values for
their Y bits but different values for the X bits, the information
stored in the data array could correspond to multiple external
addresses. To distinguish between the many locations that might
correspond to the same Y bits, the tag field in the tag array
stores the X bits of the address that is being represented by the
information in the cache.
[0025] To determine whether cache 200 stores information for a
specific address in external memory, the Y bits are used to access
a particular location in tag array 210. If the data valid field in
that location is set, the tag field in the location addressed by
the Y address bits is applied to comparator 230. A second input to
comparator 230 comes from the X bits on address line 202. If the X
bits match, then the location within data array 220 addressed by
the same Y bits can be used in place of making an access to
external memory.
[0026] Where information already stored in cache 200 can be used in
place of making an access to external memory, it is said that the
access resulted in a cache "hit." Conversely, where the cache does
not store information corresponding to the external address being
accessed, a "miss" is said to occur.
[0027] To increase the chance of a "hit," cache 200 is constructed
with multiple "ways." A way is sometimes also called a bank. In the
illustration of FIG. 2, two ways 210A and 210B are shown in tag
array 210 and a corresponding two ways, 220A and 220B, are shown
for data array 220. Each way is addressed by the Y bits of the
external address as described above. However, because the tag array
can store a different tag in each way for the same Y values, having
two ways allows two locations with the same Y bits to be stored in
the cache. Being able to store twice as many values nearly doubles
the chances of a "hit" and therefore reduces the time required for
memory access.
[0028] Increasing the number of ways to 4 or more would further
increase the chance of a hit and create a corresponding reduction
in memory access time. However, the number of ways cannot be
arbitrarily increased. First, doubling the number of ways doubles
the space required on a processor chip 100 to implement the cache.
A main reason for having a cache is because it is uneconomical to
make large memories on a processor chip. Further, to achieve an
increase in speed by having multiple ways, it is necessary that
accessing the information in the ways not take significant
additional time.
[0029] Accordingly, comparator 230 contains additional circuitry
for each way to simultaneously compare the value in the tag field
with the X address bits of the applied address. The output of
comparator 230 indicates whether there is a match between the X
bits of the applied address and the X bits at the location in any
of the ways of the tag array addressed by the Y bits.
[0030] The output of comparator 230 also indicates in which way the
match was found. The output of comparator 230 is provided to
multiplexer 240. Multiplexer 240 selects the output of the appropriate
way when there is a cache hit. If information corresponding to the
applied address is not stored in any way, then there is a cache
"miss."
[0031] When a cache miss occurs, the level 2 or level 3 memory
location containing the addressed information is read. Cache
control circuitry causes this information to be stored in cache
200. If there is a location in at least one of the ways addressed
by the same Y address bits as the applied address that does not
already hold valid data, cache control circuitry causes the new
information to be stored in an unused location. However, if all the
locations with the same Y address bits in all of the ways hold
valid information, the information in one of the ways must be
replaced by the new information to be stored in the cache.
[0032] One of the "policies" implemented by the cache control
circuitry is a "replacement policy." The replacement policy
dictates which way is selected for replacement. Commonly used
replacement policies include the Least Recently Used (LRU), Least
Recently Loaded (LRL) and Least Frequently Used (LFU). In other
instances, the way to be replaced is selected pseudo randomly.
Pseudo randomly means that the location is not selected based on
the contents of the location. For example, pseudo random
replacement could be achieved with a random number generator,
though other mechanisms of selecting a location are possible.
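Two of the replacement policies named above can likewise be sketched. The LRU timestamp mechanism and the use of the C library rand() are assumptions made for the example; paragraph [0032] names the policies without prescribing how they are implemented.

```c
#include <stdint.h>
#include <stdlib.h>

#define NWAYS 4

/* Least Recently Used: evict the way with the oldest access stamp. */
static int pick_lru(const uint32_t last_used[NWAYS])
{
    int victim = 0;
    for (int w = 1; w < NWAYS; w++)
        if (last_used[w] < last_used[victim])
            victim = w;
    return victim;
}

/* Pseudo random: the victim does not depend on the ways' contents. */
static int pick_random(void)
{
    return rand() % NWAYS;
}
```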
SUMMARY OF INVENTION
[0033] It is an object of the invention to provide a cache with an
improved replacement policy.
[0034] The foregoing and other objects are achieved in a cache that
has a priority indication associated with locations in the cache.
The replacement policy selects a location for replacement based in
part on the priorities.
[0035] In a preferred embodiment, priorities are associated with
blocks of memory in the CPLB. When an item is stored in a cache,
the priority indication associated with the block containing that
item is copied into a priority field in the tag array.
[0036] In one aspect, the invention allows low priority and high
priority processes to use the same cache. In a preferred
embodiment, priority indications are assigned to blocks of memory
based on the processes with which those blocks of memory are
associated. Processes performing time critical functions are given
higher priority than processes performing less time critical
functions.
[0037] In another aspect, the priority indications are used to
dynamically reserve portions of the on-chip memory for processes
that require predictable timing for memory access. To reserve a
portion of the on-chip memory, the highest priority is assigned to
the cache locations that would otherwise occupy that portion of
on-chip memory, guaranteeing that the memory locations in the data
array corresponding to those locations will not be used by the
cache. In a preferred embodiment, the cache is used in connection
with a processor chip that contains digital signal processing
circuitry and circuitry for performing general processor functions.
The portion of the on-chip memory associated with the reserved
cache locations can be used for direct access by processes
performing time critical digital signal processing tasks.
BRIEF DESCRIPTION OF DRAWINGS
[0038] The accompanying drawings are not intended to be drawn to
scale. In the drawings, each identical or nearly identical
component that is illustrated in various figures is represented by
a like numeral. For purposes of clarity, not every component may be
labeled in every drawing. In the drawings:
[0039] FIG. 1 is a block diagram of a prior art processor chip;
[0040] FIG. 2 is a block diagram of a prior art memory cache for
the processor chip of FIG. 1;
[0041] FIG. 3 is a block diagram of an improved memory cache for
the processor chip of FIG. 1;
[0042] FIG. 4 is a flow chart of a method of storing information in
the cache of FIG. 3; and
[0043] FIG. 5 is a block diagram showing dynamic reservation of
cache memory.
DETAILED DESCRIPTION
[0044] This invention is not limited in its application to the
details of construction and the arrangement of components set forth
in the following description or illustrated in the drawings. The
invention is capable of other embodiments and of being practiced or
of being carried out in various ways. Also, the phraseology and
terminology used herein is for the purpose of description and
should not be regarded as limiting. The use of "including,"
"comprising," "having," "containing," "involving," and
variations thereof herein is meant to encompass the items listed
thereafter and equivalents thereof as well as additional items.
[0045] We have recognized that significant improvement can result
in a processor chip from slight changes in the replacement policy
of the on-chip cache memories. FIG. 3 shows the architecture of an
improved memory cache 300, which can be used in a processor such as
shown in FIG. 1. Cache 300 may be used as part of level 1
instruction memory unit 112 or level 1 data memory unit 116.
Alternatively, cache 300 may represent a cache implemented in other
memory, such as level 2 memory.
[0046] Cache 300 contains a data array 220, which can have the same
structure as data array 220 shown in FIG. 2. As in the prior art,
tag array 310 has locations that correspond to the locations in
data array 220. Both the tag array and the data array are shown
with multiple ways. In the illustration of FIG. 3, tag array 310
includes ways 310A and 310B. The ways in the tag array 310
correspond to ways 220A and 220B in data array 220. Also as in the
prior art, each location 312.sub.1, 312.sub.2 . . . 312.sub.N in
each way of tag array 310 includes a tag field and a data valid
field. Other status or control fields as in the prior art could be
present, but are not shown. In addition, each location is augmented
with a priority field 354.sub.1, 354.sub.2 . . . 354.sub.N. In a
preferred embodiment, the priority fields do not impact the manner
in which data is read from cache 300. As described above, the Y
portion of an address applied on bus 202 is used to index a
location in tag array 310. The tag value stored in the indexed
location for each way is provided to comparator 230. Comparator 230
compares the tag values with the X portion of the address applied
on bus 202. If there is a match, the output of comparator 230 is
provided to multiplexer 240 to select the output of the appropriate
way in data array 220.
[0047] The priority fields are used in the event of a miss. As
described above, when a cache miss occurs, information is fetched
from level 2 or level 3 memory. In the prior art, the information
fetched from memory was then stored in the cache, sometimes
requiring the cache control circuit to select a location to store
the new information that would result in information already in the
cache being replaced. FIG. 4 shows a modification to the method
used by the cache control circuitry for cache 300 that may be used
for more efficient cache operation.
[0048] FIG. 4 shows a process for selecting a location in the cache
to store an item newly fetched from level 2 or level 3 memory. At
step 408, the priority of the new item to be stored is obtained. In
the illustrated embodiment, CPLB 350 stores priorities associated
with blocks of memory addresses. The priority of any specific
address is determined by finding the block in CPLB 350 containing
that address and reading the priority associated with that block.
In a preferred embodiment, step 408 can be performed at the same
time that CPLB 350 is consulted to determine whether the new
information should be stored in the cache.
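As an illustration of step 408, a CPLB holding per-block priorities might be modeled as a small table searched by address. The (start, end, priority) layout, the table size, and the name cplb_priority are assumptions for this sketch; the patent specifies only that a priority is stored for each block of memory in CPLB 350.

```c
#include <stdint.h>

#define NCPLB 16

typedef struct {
    uint32_t start;     /* first address of the block     */
    uint32_t end;       /* last address of the block      */
    int      priority;  /* priority assigned to the block */
} CplbEntry;

static CplbEntry cplb[NCPLB];

/* Find the block containing addr and return its priority
 * (lowest priority if no block matches).                 */
static int cplb_priority(uint32_t addr)
{
    for (int i = 0; i < NCPLB; i++)
        if (addr >= cplb[i].start && addr <= cplb[i].end)
            return cplb[i].priority;
    return 0;
}
```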
[0049] At step 410 a check is made to determine whether the
location in any of the ways corresponding to the Y address bits of
the item of information fetched from level 2 or level 3 memory is
empty. If the location in one of the ways is empty, meaning that
the data valid bit is not set, processing proceeds to step 412.
[0050] At step 412 one of the ways with an empty location is
selected. This process can be as in the prior art.
[0051] Processing continues to step 414 where the item fetched from
off-chip memory is stored in the selected way. Step 414 can also be
as in the prior art. For example, this step may include buffering
individual words in an item until the full item is ready to write
in the cache.
[0052] At step 416, information is stored in the priority field of
the selected location. In a preferred embodiment, the priority bit
indicates the importance of maintaining the item of information
available in cache memory. In the preferred embodiment, priorities
are assigned to items in memory based upon the process which
accessed that item of information. For example, processes
performing digital signal processing functions that must be
performed in real time are assigned higher priorities than
processes performing generalized logic functions. For example, if
cache 300 is used in a processor chip that drives a cell phone, a
process that filters the incoming signal to be presented to a human
user as an audio signal is given a higher priority than a process
that periodically updates a status display.
[0053] At step 416 other control or status bits can also be stored.
For example, the bits indicating the replacement policy might be
stored. Also, as part of overwriting a location, the information in
that location might be written back to level 2 or level 3 memory
before it is destroyed by the overwrite. The process of determining
when information must be written back to memory may be as in the
prior art.
[0054] In the presently preferred embodiment, process priorities
are assigned by a human programmer developing the software that
runs on a processor chip using cache 300. In the presently
preferred embodiment, the priorities associated with each process
are stored in CPLB 350. As described above, each process is
assigned certain blocks within memory and CPLB stores the
correspondence between the processes and the allocated memory
blocks. The CPLB can be readily augmented to include a priority
assignment for each block of memory. In the presently contemplated
embodiment, the priority field stores a single bit, allowing two
levels of priority. However, any convenient number of priority bits
can be used, allowing more than two priorities to be available.
[0055] Once the item is stored at step 414 and the priority is
stored at step 416, the data valid field corresponding to the
selected location is set at step 418. Steps 414, 416, 418 show
the logical steps in storing an item in a cache. More or fewer
control steps may be required when the process is implemented in a
semiconductor memory. For example, an item, priority and data valid
bit may be stored simultaneously in one write operation.
Conversely, if an item contains multiple words, step 414 may
require multiple write operations. Furthermore, items may be
retrieved from a fill buffer rather than from the cache while they
are contained in the fill buffer. This may relax the ordering of
steps 414, 416 and 418 within the cache during this interval. In
this instance, the cache and fill buffer collectively achieve the
same result as executing steps 414, 416 and 418 as shown.
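Steps 414, 416 and 418 might be modeled as a single store routine, as in the sketch below. The field widths, the 32-byte item size, and the single memcpy standing in for the (possibly multi-word) write of step 414 are illustrative assumptions, consistent with the note above that the three steps may collapse into one write operation or expand into several.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NWAYS 2
#define NSETS 256
#define ITEM_BYTES 32           /* assumed row ("item") size */

typedef struct {
    bool     valid;
    uint32_t tag;
    int      priority;
} TagEntry;

static TagEntry tag_array[NWAYS][NSETS];
static uint8_t  data_array[NWAYS][NSETS][ITEM_BYTES];

static void store_item(int way, uint32_t set, uint32_t xtag,
                       int priority, const uint8_t item[ITEM_BYTES])
{
    memcpy(data_array[way][set], item, ITEM_BYTES); /* step 414 */
    tag_array[way][set].tag      = xtag;
    tag_array[way][set].priority = priority;        /* step 416 */
    tag_array[way][set].valid    = true;            /* step 418 */
}
```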
[0056] When an empty way is available, the process of storing an
item in the cache is similar to the prior art, except that a
priority field is also stored for the item. However, when an empty
way is not available, processing proceeds from step 410 to step
430. At step 430 a check is made to determine whether the location
in any of the ways corresponding to the same Y address bits as the
item to be stored has a priority lower than or equal to the item to
be stored. If none of the corresponding locations in any of the
ways has the same or lower priority, the storing process shown in
FIG. 4 ends without the item being stored, effectively treating the
item as not cacheable. Higher priority items are retained in the
cache.
[0057] However, if a way with the same or lower priority is
available, processing proceeds to step 432. At step 432 candidates
for replacement are chosen. Preferably, all ways having the lowest
priority provided to step 432 are selected as candidates for
replacement. However, alternative implementations of the step are
possible. One possible alternative is that all ways having the same
or lower priority locations are selected as candidates for
replacement.
[0058] At step 434 the replacement policy of the cache is applied
to only the selected ways. As a result, when the new item is stored
in cache 300, it overwrites an item previously stored in the cache
only if the item being overwritten has the same or lower priority,
or, depending on the selection process used at step 432, a lower
priority. The specific replacement policy applied at step 434 is
not critical to the invention. A least recently used or a least
recently loaded replacement policy as in the prior art may be
used.
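The complete selection flow of FIG. 4, steps 410 through 434, might be sketched as follows. LRU stamps are used as the underlying replacement policy purely as an example (the patent leaves the policy at step 434 open), the candidate set uses the "same or lower priority" variant, and the -1 return models ending the process with the item treated as not cacheable.

```c
#include <stdbool.h>
#include <stdint.h>

#define NWAYS 2
#define NSETS 256

typedef struct {
    bool     valid;
    int      priority;
    uint32_t last_used;         /* LRU stamp, an assumed mechanism */
} TagEntry;

static TagEntry tag_array[NWAYS][NSETS];

/* Returns the way to fill, or -1 to treat the item as not cacheable. */
static int select_way(uint32_t set, int new_priority)
{
    /* Steps 410/412: use an empty way if one exists. */
    for (int w = 0; w < NWAYS; w++)
        if (!tag_array[w][set].valid)
            return w;

    /* Steps 430/432: candidates are ways of the same or lower priority. */
    int victim = -1;
    for (int w = 0; w < NWAYS; w++) {
        const TagEntry *e = &tag_array[w][set];
        if (e->priority > new_priority)
            continue;           /* higher priority item: never displaced */
        /* Step 434: apply the base policy (here LRU) among candidates. */
        if (victim < 0 || e->last_used < tag_array[victim][set].last_used)
            victim = w;
    }
    return victim;              /* -1: every way holds a higher priority item */
}
```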
[0059] Once the way to be replaced is selected, processing proceeds
to step 414 where the new item is stored in the selected way.
Thereafter processing proceeds to steps 416 and 418 where the
priority of the new item is stored and the data valid bit is
retained in a set "true" state.
[0060] One benefit of the process shown in FIG. 4 is that processes
that are of different priorities can run on the same processor chip
and both use the same cache memory. Concern that a lower priority
process will cause an overwrite of a location in the cache that
stores information needed by a higher priority process is
eliminated or, depending on the selection process used in step 432,
reduced. As a result, there is less chance that a higher priority
process will be delayed by needing to fetch information from
off-chip memory that could have been stored in the cache.
[0061] The architecture of cache 300 provides an added benefit of
allowing dynamic reservation of memory locations in the cache
memory. As shown in FIG. 5, a set of cache locations denoted
312.sub.Z-312.sub.N have their priority bits set to one. All other
priority bits are set to zero. Likewise, the priorities for all
executing processes are set to zero in CPLB 350. Priorities for the
process or processes using locations 312.sub.Z . . . 312.sub.N
might not be set to zero. This arrangement of priority bits ensures
that the memory locations in the data array corresponding to
tag locations 312.sub.Z . . . 312.sub.N are never used for caching
information from off-chip memory. This use of the priority fields
354.sub.Z . . . 354.sub.N effectively creates two blocks of memory
in the same way of the data array. Block 520 is used for normal
cache operations. In contrast, block 530 is not used for the
cache.
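The reservation scheme of FIG. 5 can be modeled by marking a run of tag locations valid at the top priority, so that a fill routine such as select_way() above can never choose them. The two-level priority (a single bit, per the presently contemplated embodiment) and the helper name reserve_block are assumptions for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define NWAYS 2
#define NSETS 256
#define TOP_PRIORITY 1          /* highest value of the one-bit priority */

typedef struct {
    bool valid;
    int  priority;
} TagEntry;

static TagEntry tag_array[NWAYS][NSETS];

/* Mark a run of sets in one way as permanently occupied at top
 * priority; the corresponding data array locations then serve as
 * directly addressed on-chip memory (block 530) rather than cache. */
static void reserve_block(int way, uint32_t first_set, uint32_t last_set)
{
    for (uint32_t s = first_set; s <= last_set; s++) {
        tag_array[way][s].valid    = true;          /* never "empty"    */
        tag_array[way][s].priority = TOP_PRIORITY;  /* never displaced  */
    }
}
```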
[0062] Block 530 is shown as a contiguous block of addresses for
simplicity. Block 530 could be fragmented across multiple ways and
addresses.
[0063] When a processor such as processor 100 is running a program,
memory block 530 provides fast on-chip memory. For example, on-chip
memory 530 might be used to store information for a process where
time of execution is critical. However, the remainder of the way in
data array is available for use as a cache. All portions of other
ways may be reserved for fast memory access or may be allocated for
use as a cache.
[0064] Because processor 100 uses a memory mapped architecture,
each location in tag array 310 can be addressed separately.
Separately addressing the locations in tag array 310 allows the
priority bits of certain locations to be set to reserve a block 530
in one of the ways of the data array.
[0065] Having thus described several aspects of at least one
embodiment of this invention, it is to be appreciated that various
alterations, modifications, and improvements will readily occur to
those skilled in the art.
[0066] For example, the Y portion of the address for external
memory locations is described as being used to address the tag
array and the data array within a cache. It will be appreciated
that this value need not be used as a direct, physical address. It
is possible that the Y portion of the address is used as a logical
address. The logical address may be converted to the actual
physical address of the cache tag array and data array by adding an
offset, scaling it or otherwise manipulating the logical
address.
[0067] As another example, FIG. 4 shows that lower priority ways
are first identified and then a replacement policy is applied.
Alternatively, the replacement policy may be applied to select a
specific way, but that way may be overwritten only if it contains
an item of lower priority, or, depending on implementation, the
same or lower priority, than the new item to be stored.
[0068] Further, FIG. 4 shows that step 414 stores information in
the data array and then fields in the tag array are updated at
steps 416 and 418. The ordering of these steps is not a limitation
on the invention.
[0069] Also, it was described that a process may replace only items
in the cache with the same or lower priority. Similar results may
be achieved if replacement of only items with lower priority is
permitted.
[0070] Further, it is described that priority fields and the data
valid fields are stored in the tag array. A convenient
implementation of such structure is to have the priority and data
valid fields in the same semiconductor memory as the tag array. It
should be appreciated, though, that the fields can be physically
located in any memory so long as the information they store can be
accessed when needed. As a further example, it was described that
the process of storing an item in a cache includes steps 410 and
412. Those steps verify whether empty cache locations exist before
checking which location to use. Such steps might be omitted because
they impact operation of the cache for only a small percentage of
its operation. A program running on a data processor makes many
accesses to memory and all locations in cache quickly get full. All
locations could be treated as initially storing data. In this
scenario, the locations in the cache will preferably be initialized
with the lowest possible priority. The operation of the replacement
policies might be adequate to ensure that unused locations get used
before information in other locations is overwritten.
[0071] A further variation that is employed in the presently
preferred embodiment is the ability to set the priority bits of all
locations in the tag array with one write operation. In the
presently preferred embodiment, a bit in a control register is
mapped to all of the priority fields in the tag array. By writing
to that one control bit, all priority fields change. Such a
structure is useful, for example, in clearing the priority bit to
release memory that was previously reserved or to clear the
priority bits from the cache when a high priority process
terminates. Upon termination of a high priority process, changing
all priority bits in the cache to the lowest priority might be the
only practical approach to ensure that cache locations accessed by
that high priority process are returned to normal priority level
for use by other processes.
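The one-write broadcast might be modeled as below. In hardware, a single store to the mapped control bit fans out to every priority field in the tag array at once; the loop in this software sketch only models the effect of that fan-out, and the function name is invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NWAYS 2
#define NSETS 256

typedef struct {
    bool valid;
    int  priority;
} TagEntry;

static TagEntry tag_array[NWAYS][NSETS];

/* Model of writing the control register bit that is mapped to all
 * priority fields: every tag location drops to the lowest priority,
 * releasing reserved memory or cleaning up after a high priority
 * process terminates.                                              */
static void clear_all_priorities(void)
{
    for (int w = 0; w < NWAYS; w++)
        for (uint32_t s = 0; s < NSETS; s++)
            tag_array[w][s].priority = 0;   /* lowest priority */
}
```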
[0072] In a further variation, the priority bits of all locations
in the tag array that match a chosen level, or range of levels, may
be converted to a new priority level--either higher or lower--with
one write operation. This is useful, for example, in providing a
system that is highly adaptive to changes of process priority, and
providing a system that can easily consolidate process priorities
as new processes are initiated when priority levels are scarce.
[0073] Such alterations, modifications, and improvements are
intended to be part of this disclosure, and are intended to be
within the spirit and scope of the invention. Accordingly, the
foregoing description and drawings are by way of example only.
* * * * *