U.S. patent application number 08/818060 was filed with the patent office on 1997-03-14 and published on 2002-03-14 for structure and method for providing multiple externally accessible on-chip caches in a microprocessor.
Invention is credited to CHU, RAYMOND M. and NGUYEN, DE H.
Application Number | 08/818060 |
Family ID | 24900223 |
Filed Date | 1997-03-14 |
United States Patent Application | 20020032827 |
Kind Code | A1 |
NGUYEN, DE H.; et al. | March 14, 2002 |
STRUCTURE AND METHOD FOR PROVIDING MULTIPLE EXTERNALLY ACCESSIBLE
ON-CHIP CACHES IN A MICROPROCESSOR
Abstract
A structure and a method provide read and write access to a
microprocessor's internal cache. During write access, an external
data bus transmits to an internal data bus an address, cache tags
and data in accordance with a clock provided externally. During
read access, the external data bus transmits an address and
receives from the internal data bus data and cache tags. In one
embodiment, during write access, the external data bus is
time-multiplexed to transmit an address, cache tags and data in two
clock periods of an externally provided clock signal. During read
access, the external data bus is time-multiplexed to transmit to
the internal data bus an address in the first clock period of the
external clock signal, and to receive cache tags and data in the next
two successive clock periods of the externally provided clock signal.
In this embodiment, reserved pins are used to specify a cache
access mode. Control signals for the cache access are provided via
pins which are used during functional operation to receive external
interrupt signals.
Inventors: NGUYEN, DE H. (MILPITAS, CA); CHU, RAYMOND M. (SARATOGA, CA)
Correspondence Address: SKJERVEN MORRILL MACPHERSON LLP, 25 METRO DRIVE, SUITE 700, SAN JOSE, CA 95110, US
Appl. No.: 08/818060
Filed: March 14, 1997
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
08818060 | Mar 14, 1997 |
07722026 | Jun 27, 1991 |
Current U.S. Class: 711/3; 711/123; 711/144; 711/203; 711/207; 711/E12.017
Current CPC Class: G06F 12/0802 (20130101)
Class at Publication: 711/3; 711/123; 711/203; 711/207; 711/144
International Class: G06F 012/08
Claims
We claim:
1. A structure for reading and writing an internal memory of an
integrated circuit having a plurality of pins, comprising: an
internal bus interfaced to said internal memory; means for
receiving at one of said pins a clock signal; means for receiving
at one of said pins a read signal indicating reading of said
internal memory is desired; means for receiving at one of said pins
a write signal indicating writing of said internal memory is
desired; means for providing the data on said internal bus to a
first group of said pins; and means for providing the data on a
second group of said pins to said internal bus.
2. A structure as in claim 1, wherein said first and second groups
of pins include common pins belonging to both said first and
second groups of pins, said common pins provided with tristate
buffers to effectuate bidirectional operations.
3. A structure as in claim 1, wherein said internal memory has a
bit-width exceeding the number of pins in said first group of pins,
said means for providing the data on said internal bus provides
said data by time-multiplexing said first group of pins.
4. A structure as in claim 1, wherein said internal memory has a
bit-width exceeding the number of pins in said second group of
pins, said means for providing the data on said second group of
pins provides said data by time-multiplexing said second group of
pins.
5. A method for writing an internal memory of an integrated circuit
having a plurality of pins, comprising the steps of: providing an
internal bus interfaced to said internal memory; receiving at one
of said pins a clock signal; receiving at one of said pins a write
signal indicating writing of said internal memory is desired; and
providing the data on a group of said pins to said internal
bus.
6. A method for reading an internal memory of an integrated circuit
having a plurality of pins, comprising the steps of: providing an
internal bus interfaced to said internal memory; receiving at one
of said pins a clock signal; receiving at one of said pins a read
signal indicating reading of said internal memory is desired; and
providing the data on said internal bus to a group of said
pins.
7. A method as in claim 6, wherein said internal memory has a
bit-width exceeding the number of pins in said group of pins, said
step of providing the data on said internal bus provides said data
by time-multiplexing said group of pins.
8. A method as in claim 5, wherein said internal memory has a
bit-width exceeding the number of pins in said group of pins, said
step of providing the data on said group of pins provides said data
by time-multiplexing said group of pins.
Description
FIELD OF THE INVENTION
[0001] This invention relates to integrated circuits, and in
particular, relates to the design of microprocessors.
DESCRIPTION OF RELATED ART
[0002] Exploiting the property of locality of memory references,
cache memories have been successfully used to achieve high
performance in many computer systems. In the past, cache memories
of microprocessor-based systems were provided off-chip using high
performance memory components. This is primarily because the amount
of silicon area necessary to provide an on-chip cache memory of
reasonable performance would have been impractical, since
increasing the size of an integrated circuit to accommodate a cache
memory will adversely impact the yield of the integrated circuit in
a given manufacturing process. However, with the density achieved
recently in integrated circuit technology, it is now possible to
provide on-chip cache memory economically.
[0003] In a computer system in which a cache memory is provided,
when a memory word is needed, the central processing unit (CPU)
looks into the cache memory system for a copy of the memory word.
If the memory word is found in the cache memory, a cache "hit" is
said to have occurred, and the main memory is not accessed. Thus, a
figure of merit which can be used to measure the effectiveness of
the cache memory is the "hit" ratio. The hit ratio is the
percentage of total memory references in which the desired datum is
found in the cache memory without accessing the main memory. When
the desired datum is not found in the cache memory, a "cache miss"
is said to have occurred. In addition, in many computer systems,
one or more portions of the address space are not mapped to the
cache memory. Such a portion of the address space is said to be
"uncached" or "uncacheable". For example, the addresses assigned to
input/output (I/O) devices are almost always uncached. Either a
cache miss or an uncacheable memory reference results in an access
to the main memory.
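The hit-ratio definition above reduces to simple arithmetic over a sequence of memory references. The following Python sketch is illustrative only: the trace format (a list of the strings "hit", "miss" and "uncached") and the function name `summarize_trace` are hypothetical. Uncacheable references are counted in the denominator because the ratio is defined over total memory references, and both misses and uncacheable references are counted as main-memory accesses.

```python
from collections import Counter
from typing import Iterable, Tuple

VALID_OUTCOMES = {"hit", "miss", "uncached"}

def summarize_trace(outcomes: Iterable[str]) -> Tuple[float, int]:
    """Return (hit_ratio, main_memory_accesses) for a hypothetical reference trace.

    The hit ratio is hits divided by all memory references. Both cache misses
    and uncacheable references require an access to main memory.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty trace")
    return counts["hit"] / total, counts["miss"] + counts["uncached"]

# Example: 95 hits, 3 misses and 2 uncacheable references.
trace = ["hit"] * 95 + ["miss"] * 3 + ["uncached"] * 2
ratio, memory_accesses = summarize_trace(trace)
print(f"hit ratio = {ratio:.2f}, main-memory accesses = {memory_accesses}")
```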
[0004] In the course of developing or debugging a computer system,
it is often necessary to monitor program execution by the CPU or to
interrupt one instruction stream to direct the CPU to execute
certain alternate instructions. For example, a technique for
testing a microprocessor in a system under development uses an
in-circuit emulator (ICE) which provides facilities to monitor and
intervene in the CPU's instruction stream. The ICE typically
monitors the signals on the microprocessor's pins. In one mode of
ICE operation, when a predetermined condition in the program
execution is encountered, the ICE causes alternative instructions
to be executed for such purposes as reading or altering the internal
states of the CPU. Such alternative instructions can be preloaded
into, or excluded from, the cache memory. The ability to load or exclude
instructions from the cache memory from a source external to the
CPU can be very useful in many applications. Such ability is not
known in the prior art.
[0005] When the cache memory is implemented off-chip, the ICE can
easily isolate the cache memory and perform diagnostic tests on each
cell of the cache memory, using such techniques as exhaustive
standard memory test algorithms, independently of the operation of
the CPU. In addition, the transactions between the cache memory and
the CPU can be monitored by the ICE on the off-chip bus between the
cache memory and the CPU. Hence, no difficulty is created in
testing or using an off-chip cache. However, when the cache memory
is implemented on-chip, the transactions between the cache and the
CPU occur on an on-chip bus, which cannot be probed from the pins
of the integrated circuit. As a result, debugging operations using
an ICE in a system with an on-chip cache system can be very
restricted. The inability to access and exhaustively test the
internal cache makes diagnosing certain system problems difficult.
When the on-chip cache achieves a high hit ratio, only the
relatively infrequent accesses to main memory due to cache misses
or references to uncacheable parts of memory can be monitored from
the pins.
SUMMARY OF THE INVENTION
[0006] In accordance with the present invention, a structure and a
method provide read and write accesses to a microprocessor's
internal cache. During write access, an external data bus transmits
to an internal data bus an address, cache tags and data in
accordance with a clock signal provided externally. During read
access, the external data bus transmits an address and receives
from the internal data bus data and cache tags, also in accordance
with the externally provided clock signal.
[0007] In one embodiment, during write access, the external data
bus is time-multiplexed to transmit the address, the cache tags and
data in two clock periods of an externally provided clock signal.
In the same embodiment, during read access, the external data bus
is time-multiplexed to transmit to the internal data bus an address
in the first clock period of the external clock signal, and to
receive cache tags and data in the next two successive clock
periods of the externally provided clock signal. In this
embodiment, "reserved" pins are used to specify a cache access
mode. Control signals for the cache access are provided via pins
which are used during functional operation to receive external
interrupt signals.
[0008] The present invention allows the user of the microprocessor
to exhaustively test the on-chip cache using standard memory test
algorithms. The present invention also allows preloading the
on-chip cache under control of signals external to the
microprocessor. Such preloading operations can be useful in certain
applications. In addition, the present invention provides a
facility for external testing equipment to monitor or intervene in
internal operations of the microprocessor.
[0009] The present invention is better understood upon
consideration of the below detailed description and the
accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] FIG. 1a shows a computer system 100 having a processor 101
with an on-chip instruction cache system 102 and a main memory
system 150 external to the processor 101, in accordance with the
present invention.
[0011] FIG. 1b is a block diagram of the processor 101 of FIG.
1a.
[0012] FIG. 2 is a block diagram showing the addressing scheme used
in instruction cache 102a of the cache system 102 of FIGS. 1a and
1b.
[0013] FIG. 3 is a block diagram in further detail than FIG. 2 of
the interface between CPU core 103 and the instruction and data
caches 102a and 102b, including the control signals ICLK, DCLK,
{overscore (IWr)}, {overscore (DWr)}, {overscore (IRd)} and
{overscore (DRd)}.
[0014] FIG. 4 summarizes some control signals generated from
signals received on the microprocessor's pins for controlling
reading and writing the instruction and data caches 102a and 102b,
in accordance with the present invention.
[0015] FIG. 5 shows the data flow between one pin of processor 101
and one bit each of the DATA[31:0] bus and either the ADRLO[12:0]
bus or the TAG[31:11] bus, in accordance with the present invention.
[0016] FIG. 6 shows a timing diagram for a read cycle and a write
cycle involving either the instruction cache memory 102a, or the
data cache memory 102b, in accordance with the present
invention.
DETAILED DESCRIPTION
[0017] FIG. 1a shows, as an example, a computer system 100 having a
processor 101 with an on-chip cache system 102 and a main memory
system 150 external to the processor, in accordance with the
present invention. As shown in FIG. 1a, the external read/write
memory ("main memory") system 150, which is interfaced to the
processor 101 over a bus 153, comprises a dynamic random access
memory (DRAM) controller 151, a main memory 152 implemented by
banks 152a and 152b of DRAMs and a bus interface 154. In addition,
the address space of computer system 100 is also used to access
other memory-mapped devices such as I/O controller 141, I/O devices
142 and 143, and programmable read-only memory (PROM) 144. To
facilitate reference, the memory-mapped devices other than the main
memory 150 defined above are collectively referred to as the I/O
system 140, even though read-only memories, such as PROM 144, are
often not considered part of the I/O system. I/O system 140 is also
interfaced to the bus 153. Bus 153 comprises address/data bus 153a
and control bus 153b. Memory data and memory addresses are
time-multiplexed on the 32-bit address/data bus 153a. Other device
configurations using the memory address space are also possible
within the scope of the present invention.
[0018] The organization of processor 101 is shown in FIG. 1b. As
shown in FIG. 1b, processor 101 includes two co-processors 103a and
103b, controlled by a master pipeline control unit 103c.
Coprocessor 103a is also referred to as the integer CPU, and
includes 32 32-bit general registers 103a-1, an ALU 103a-2, a
shifter 103a-3, a multiplication and division unit 103a-4, an
address adder 103a-5, and program counter control unit 103a-6.
Coprocessor 103a executes the instruction set known as the MIPS-I
Instruction Set Architecture (ISA). Coprocessor 103b, also known as
the System Control Coprocessor, comprises exception/control
registers 103b-1, a memory management registers unit 103b-2 and a
translation look-aside buffer (TLB) 103b-3. The TLB unit 103b-3
uses a 64-entry look-up table to provide an efficient mapping
between virtual and physical addresses. In this embodiment, the TLB
unit 103b-3 is provided at the user's option and can be disabled.
The above units of the coprocessors 103a
and 103b can be implemented by conventional or any suitable designs
known in the art. The coprocessor units 103a and 103b, and the
pipeline control unit 103c are collectively referred to as the CPU
core 103.
[0019] The cache system 102 of processor 101 comprises two cache
memories 102a and 102b. Cache 102a is an instruction cache. In the
embodiment shown, cache 102a has a capacity of 4K or 8K bytes, and
block refill and line sizes of four memory words each. Cache 102b
is a data cache, and has a selectable block refill size of one or
four memory words, a line size of one memory word, and a capacity
of 2K bytes. Other cache, block refill and line sizes can be
provided within the scope of the present invention. Both the
capacities of cache 102a and cache 102b, and their respective block
refill and line sizes, are matters of design choice. In addition,
it is also not necessary to provide separate data and instruction
caches. A joint data and instruction cache is also within the scope
of the present invention. The TLB unit 103b-3 receives from the CPU
core 103 on bus 109 a virtual address and provides to either cache
102a or cache 102b on bus 107 the corresponding physical memory
address. Although cache accessing using virtual addresses is also
possible, by using physical addressing in the instruction and data
caches, the present embodiment simplifies software requirements and
avoids the cache flushing operations necessary during a context
switch in a virtually addressed cache. The cache addressing scheme
of the present embodiment is discussed below in conjunction with
FIG. 2. Other cache addressing schemes are also possible within the
scope of the present invention.
[0020] Bus interface unit (BIU) 106 interfaces processor 101 with
the main memory 150 when a read or write access to main memory is
required. BIU 106 comprises a 4-deep write buffer 106-4, a 4-deep
read buffer 106-3, a DMA arbiter 106-2 and BIU control unit 106-1.
BIU control unit 106-1 provides, on bus 153b (which comprises buses
153b-1 to 153b-3), all control signals necessary to interface with
the main memory 150 and the I/O system 140. Both addresses and data
are multiplexed on the address/data bus 153a, and the control
signals are provided on the {overscore (Rd)}/{overscore (Wr)}
control bus 153b-1, the system clock signal 153b-2, and the DMA
control bus 153b-3.
[0021] FIG. 2 is a block diagram showing the addressing scheme used
in the instruction cache 102a of the cache system 102, which is
shown in FIGS. 1a and 1b. As shown in FIG. 2, the higher order 20
bits of a virtual address (generated by CPU core 103, as shown in
FIG. 1b), represented by block 202, are provided to the
cache addressing mechanism represented by block 201. The remaining
10 bits of the memory word address are common between the virtual
and the physical addresses. (The lowest two address bits are byte
addresses, which are not used in cache addressing.) These common
bits are directly provided to index into the cache memory 102a,
represented by blocks 204 and 205. Block 205 represents the data
portion of the cache line, which comprises four 32-bit memory words
in this embodiment. Block 204 represents the "tag" portion
(TAG[31:11]) of the cache line; this tag portion contains both
a "valid" TAGV bit and the higher order 20 bits of the memory word
addresses of the data words stored in the cache line. (Since the
addresses of memory words within the cache line are contiguous, the
higher order 20 bits are common to all of the memory words in the
cache line). The valid bit TAGV indicates that the cache word
contains valid data. Invalid data may exist if the cache line does
not hold current copies of the corresponding memory words. This condition may
arise, for example, after a reset period.
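The address slicing described above can be illustrated with a short Python sketch. It assumes the 4K-byte configuration of instruction cache 102a with four-word lines (256 lines), in which the 10 untranslated word-address bits split into an 8-bit line index and a 2-bit word select, and the upper 20 bits form the tag. The type and function names are hypothetical.

```python
from typing import NamedTuple

class ICacheFields(NamedTuple):
    tag: int           # address bits [31:12]: upper 20 bits, subject to translation
    line_index: int    # address bits [11:4]: one of 256 four-word lines (4K-byte cache assumed)
    word_in_line: int  # address bits [3:2]: one of the four words in the line
    byte_offset: int   # address bits [1:0]: byte address, not used in cache addressing

def split_icache_address(addr: int) -> ICacheFields:
    """Split a 32-bit byte address into the fields used to address a 4K-byte cache."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    return ICacheFields(
        tag=addr >> 12,
        line_index=(addr >> 4) & 0xFF,
        word_in_line=(addr >> 2) & 0x3,
        byte_offset=addr & 0x3,
    )

fields = split_icache_address(0x12345678)
print({name: hex(value) for name, value in fields._asdict().items()})
```

Because bits [11:2] lie below the 20 translated bits, they are identical in the virtual and physical addresses, which is why the sketch can compute the line index and word select directly from the address supplied by the CPU core.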
[0022] Each virtual address is associated with a particular process
identified by a unique "process id" PID, which is represented by
block 203. Block 201 represents the virtual address to the physical
address translation, which is performed using the TLB unit 103b-3
when the TLB is present. (FIG. 1b.) When the TLB is present, a TLB
miss occurs if either a mapping between the virtual address and the
corresponding physical address cannot be found in the 64 entries of
the TLB unit 103b-3, the PID stored in the TLB unit 103b-3 does not
match the PID of the virtual address, or if the valid bit in the
data word is not set. Block 207 represents the determination of
whether a TLB miss has occurred. The TLB miss condition raises an
exception condition, which is handled by CPU core 103. If a virtual
address to physical address mapping is found, the higher order 20
bits of the physical memory word address are compared (block 206)
with the memory address portion of the tag. The valid bit is
examined to ensure the data portion of the cache line contains
valid data. If the comparison (block 206) indicates a cache hit,
the selected 32-bit word in the cache line is the desired data.
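The translate-then-compare flow of FIG. 2 can be sketched in Python as follows. This is a behavioral sketch under stated assumptions, not the patent's circuit: the `TLBEntry` layout, the `TLBMiss` exception and the list-based tag and data banks are hypothetical, a sequential search stands in for the 64-entry TLB lookup, and the 4K-byte instruction-cache configuration with four-word lines is assumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

class TLBMiss(Exception):
    """Raised when no valid translation exists; the CPU core would take an exception."""

@dataclass
class TLBEntry:
    vpn: int     # upper 20 bits of the virtual address
    pid: int     # process id that owns the mapping
    pfn: int     # upper 20 bits of the physical address
    valid: bool

def translate(tlb: List[TLBEntry], vaddr: int, pid: int) -> int:
    """Return the 20-bit physical page number, or raise TLBMiss (blocks 201 and 207)."""
    vpn = vaddr >> 12
    for entry in tlb:  # sequential search for clarity
        if entry.vpn == vpn and entry.pid == pid and entry.valid:
            return entry.pfn
    raise TLBMiss(hex(vaddr))

def icache_lookup(tag_bank: List[Tuple[int, bool]], data_bank: List[int],
                  tlb: List[TLBEntry], vaddr: int, pid: int) -> Optional[int]:
    """Return the cached word on a hit, or None on a cache miss.

    tag_bank[line] holds (20-bit tag, valid); data_bank[line * 4 + word] holds the words.
    """
    pfn = translate(tlb, vaddr, pid)
    line = (vaddr >> 4) & 0xFF      # untranslated index bits (4K-byte cache assumed)
    word = (vaddr >> 2) & 0x3
    tag, valid = tag_bank[line]
    if valid and tag == pfn:        # comparison of block 206 plus the valid-bit check
        return data_bank[line * 4 + word]
    return None

tlb = [TLBEntry(vpn=0x00400, pid=1, pfn=0x01234, valid=True)]
tag_bank = [(0, False)] * 256
data_bank = [0] * 1024
tag_bank[0x10] = (0x01234, True)
data_bank[0x10 * 4 + 1] = 0xDEADBEEF
print(hex(icache_lookup(tag_bank, data_bank, tlb, 0x00400104, pid=1)))
```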
[0023] If a cache miss is indicated, BIU 106 is invoked and CPU
core 103 stalls until BIU 106 indicates that the requested data is
available. A cache miss can also be generated when the memory
access is to an "uncacheable" portion of memory. When BIU 106
receives a datum from main memory, the CPU core 103 executes either
a "refill", a "fix-up", or a "stream" cycle. In a refill cycle, an
instruction datum received (in the read buffer 106-3) is brought
into the cache 102a. In a fix-up cycle, the CPU core 103
transitions from a refill cycle to execute the instruction brought
out of the read buffer 106-3. In a stream cycle, the CPU core 103
simultaneously refills cache memory 102a and executes the
instruction brought out of the read buffer 106-3. For uncacheable
references, the CPU core 103 executes a fixup cycle to bring out
the fetched memory word from the read buffer 106-3, but the
uncacheable memory word is not brought into the cache memory 102a.
Otherwise, the CPU core 103 executes refill cycles until the miss
address is reached. At that time, a fixup cycle is executed.
Subsequent cycles are stream cycles until the end of the 4-memory
word block is reached and normal run operation resumes. If
sequential execution is interrupted, e.g. by a taken branch,
refill cycles are executed to refill the cache before
execution is resumed at the branch address.
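The cycle sequence that follows an instruction-cache miss can be summarized with a small Python function. It is a sketch of the ordering described above, assuming the four-word block is returned starting at its first word; the function name is hypothetical, and the branch-interruption case is not modeled.

```python
from typing import List

def icache_miss_cycles(miss_word: int, uncacheable: bool = False,
                       block_words: int = 4) -> List[str]:
    """Order of refill, fixup and stream cycles after an instruction-cache miss.

    Assumes the block is returned starting at its first word, so refill cycles
    cover the words before the miss address, a fixup cycle handles the missed
    word, and stream cycles cover the rest of the block.
    """
    if uncacheable:
        return ["fixup"]  # word is taken from read buffer 106-3 but not cached
    if not 0 <= miss_word < block_words:
        raise ValueError("miss_word must index a word within the block")
    return (["refill"] * miss_word
            + ["fixup"]
            + ["stream"] * (block_words - miss_word - 1))

for w in range(4):
    print(w, icache_miss_cycles(w))
print("uncacheable", icache_miss_cycles(0, uncacheable=True))
```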
[0024] The operation of the data cache 102b is similar to that of
instruction cache 102a, except that only one fixup cycle is used
after one or four refill cycles, depending upon the refill block
size selected. Because the size of the data cache is 2K bytes, a
21-bit "tag" is required. Hence, because of the different sizes of
the instruction and data caches, the data cache's tag is 1 bit
longer than the instruction cache's tag. In order to have the data
and instruction caches share a common cache addressing scheme, the
instruction cache routes one of its lower order address bits back
as a tag bit, so as to appear as if the tag portion of the
instruction cache is 21-bit. If the refill block size selected for
the data cache is four memory words, as will be apparent below, the
present invention provides the same benefit in the data cache as in
the instruction cache.
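The tag widths quoted above follow from the cache sizes if each cache is treated as direct-mapped, which is an assumption of this sketch (it matches the direct indexing described for FIG. 2). The Python below computes the widths and forms the shared 21-bit TAG[31:11] field; that the bit routed back by the instruction cache is address bit 11 is likewise an assumption, and the function names are hypothetical. Under the same assumption, the 8K-byte instruction-cache option would need only 19 tag bits.

```python
def tag_bits(cache_bytes: int, address_bits: int = 32) -> int:
    """Tag width of a direct-mapped cache: address bits above the cache-size boundary."""
    if cache_bytes <= 0 or cache_bytes & (cache_bytes - 1):
        raise ValueError("cache size must be a positive power of two")
    offset_bits = cache_bytes.bit_length() - 1  # log2 of the cache size in bytes
    return address_bits - offset_bits

def shared_tag(paddr: int) -> int:
    """TAG[31:11]: a 21-bit tag format usable by both caches.

    For the 4K-byte instruction cache, address bit 11 is also an index bit, so
    carrying it in the tag is redundant but harmless.
    """
    return (paddr >> 11) & ((1 << 21) - 1)

icache_tag = tag_bits(4 * 1024)
dcache_tag = tag_bits(2 * 1024)
print(icache_tag, dcache_tag, hex(shared_tag(0x12345678)))
```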
[0025] FIG. 3 is a more detailed block diagram of the interface
between CPU core 103 and the instruction cache memory 102a and the
data cache memory 102b. As shown in FIG. 3, CPU core 103 provides
the lower order bits of the physical cache addresses on bus 107-1
(ADRLO[12:0]) to address either of the cache memories 102a and
102b, and receives the tag and data contents of the cache memory
addressed respectively on 22-bit bus 108-1 (TAG[31:11] and TAGV,
hereinafter "TAG BUS") and 32-bit bus 108-2 ("DATA[31:0]"). CPU
core 103 provides to instruction cache 102a the clock signal ICLK,
the read signal {overscore (IRd)}, and the write signal {overscore
(IWr)} for reading and writing cache 102a. An analogous set of
signals DCLK, {overscore (DRd)} and {overscore (DWr)} is provided
to the data cache memory 102b. Instruction cache 102a is divided
into two banks 102a-1 and 102a-2. Bank 102a-1 stores the tags of
the cache entries, and bank 102a-2 stores the data words. Since
instruction cache 102a has a line size of four memory words, there
are four times as many entries in the data bank 102a-2 as in the tag
bank 102a-1. Data cache 102b is similarly divided into tag and data
banks 102b-1 and 102b-2, respectively.
[0026] Processor 101 is an 84-pin microprocessor. Other than the
power and ground signals, processor 101 receives or provides: a
32-bit multiplexed address/data bus ADBUS[31:0], lower address bus ADR[3:2],
address latch enable signal ALE, data input enable signal
{overscore (DataEn)}, burst transfer or write near signal
{overscore (Burst)}/{overscore (WrNear)}, read signal {overscore
(Rd)}, write signal {overscore (Wr)}, acknowledge signal {overscore
(ACK)}, read buffer clock enable signal {overscore (RdCEn)}, bus
error signal {overscore (BusError)}, diagnostic signals Diag[1:0],
DMA bus request signal {overscore (BusReq)}, DMA bus grant signal
{overscore (BusGnt)}, branch condition port BrCond[3:0], interrupt
signals {overscore (Int[5:0])}, clock signals Clk2xIn and
{overscore (SysClk)}, reset signal {overscore (Reset)}, and
reserved signals RSVD[4:0]. The functional descriptions of these
signals can be found in the "IDT79R3051 Family Hardware User's
Manual," available from Integrated Device Technology, Inc., Santa
Clara, Calif. This hardware manual is hereby incorporated by
reference in its entirety.
[0027] In order to provide the benefits of the present invention,
the pins receiving reserved signals RSVD[4:0] (i.e. the "reserved
pins RSVD[4:0]") are used to place processor 101 into the "cache
memory access" mode. Processor 101 enters this mode when the bit
pattern `011` is detected on the reserved pins RSVD[4:2]. Reserved
pins RSVD[4:0] are provided for general testing purposes, such as
testing the cache memories 102a and 102b as provided by the present
invention. To avoid accidentally placing processor 101 into a testing mode,
reserved pins RSVD[4:0] are each provided with a weak pull-down
device. Consequently, since the user of processor 101 will normally
leave reserved pins RSVD[4:0] floating, each of the reserved pins
RSVD[4:0] will settle at ground voltage.
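The mode-entry rule and the effect of the weak pull-downs can be captured in a few lines of Python. This is a logical sketch only: representing a floating pin as `None` and the function names are assumptions, and the actual decode is performed in hardware.

```python
from typing import Optional, Sequence

CACHE_ACCESS_PATTERN = (0, 1, 1)  # `011` on RSVD[4], RSVD[3], RSVD[2]

def pin_level(level: Optional[int]) -> int:
    """Resolve a pin: None means floating, which the weak pull-down takes to ground."""
    if level is None:
        return 0
    if level not in (0, 1):
        raise ValueError("pin level must be 0, 1 or None")
    return level

def in_cache_access_mode(rsvd: Sequence[Optional[int]]) -> bool:
    """rsvd[i] is the level driven on RSVD[i] for i = 0..4 (None = floating)."""
    if len(rsvd) != 5:
        raise ValueError("expected five levels for RSVD[4:0]")
    return tuple(pin_level(rsvd[i]) for i in (4, 3, 2)) == CACHE_ACCESS_PATTERN

print(in_cache_access_mode([None] * 5))             # all reserved pins left floating
print(in_cache_access_mode([None, None, 1, 1, 0]))  # RSVD[4:2] driven to 011
```

With every reserved pin floating, the pins resolve to `000`, which does not match the cache access pattern, so an unconnected part stays in functional mode.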
[0028] When cache memory access mode is entered, the CPU core 103
stalls to yield control of the data busses DATA[31:0] (108-2),
ADRLO[12:0] (107-1), TAG BUS (108-1) and the leads for the cache
control signals ICLK, DCLK, {overscore (IWr)}, {overscore (IRd)},
{overscore (DWr)} and {overscore (DRd)} to the external testing
device desiring to access the cache memory. Because processor 101
stalls in cache memory access mode, the signals on tag and data
buses TAG BUS (108-1) and DATA[31:0] and the control signals ICLK,
DCLK, {overscore (IRd)}, {overscore (DRd)}, {overscore (IWr)} and
{overscore (DWr)} are provided externally. In the cache memory
access mode, the pins ("{overscore (INT[5:0])} pins") normally
receiving interrupt signals {overscore (INT[5:0])}, and the
reserved pin RSVD[1] are used to provide these control signals from
the external testing device. Specifically, the {overscore (INT[0])}
pin provides a clock signal CA_CLK, the {overscore (INT[1])} pin
provides a read signal {overscore (CA_Rd)}, and the {overscore
(INT[2])} pin provides a write signal {overscore (CA_Wr)}. In
addition, the signal I/{overscore (D)} on reserved pin RSVD[1]
indicates whether the signals on the {overscore (INT[2:0])} pins
are directed to data cache 102b (RSVD[1] at logic low) or to the
instruction cache 102a (RSVD[1] at logic high). Using the signals
on these pins, the control signals ICLK, DCLK, {overscore (IRd)},
{overscore (DRd)}, {overscore (IWr)}, and {overscore (DWr)} are
generated internally. In cache memory access mode, the combined
width of the TAG BUS, ADRLO and DATA buses is 67 bits (22 + 13 +
32), which, together with the control signals, exceeds the total
number of functional pins (i.e. pins other than power and ground)
available. The pins ADBUS[31:0] and ADR[3:2], which are used for
reading or writing the cache memories 102a and 102b, must therefore
be time-multiplexed. Specifically, data flowing to and from the
DATA[31:0] bus (108-2) and data flowing to and from the TAG BUS
(108-1) must occur at different phases of CA_CLK. During a read
cycle (see below) the tag and data phases of the clock are
indicated by the logic state of the signal ("T/{overscore (D)}") on
the {overscore (INT[5])} pin. Consequently, the following pin
assignments are made:
FUNCTIONAL MODE | CACHE MEMORY ACCESS MODE
{overscore (INT[0])} | CA_CLK
{overscore (INT[1])} | {overscore (CA_Rd)}
{overscore (INT[2])} | {overscore (CA_Wr)}
{overscore (INT[5])} | T/{overscore (D)}
RSVD[1] | I/{overscore (D)}
ADBUS[31:11] | TAG[31:11], DATA[31:11]
ADBUS[10:4] | ADRLO[10:4], DATA[10:4]
ADBUS[3:2] | ADRLO[12:11], DATA[3:2]
ADBUS[0] | TAGV
ADR[3:2] | ADRLO[3:2]
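The pin assignment table can be read as a set of bit-field mappings for the address/tag phase of a cache access. The Python sketch below packs ADRLO[12:2], TAG[31:11] and TAGV onto ADBUS[31:0] and ADR[3:2] and unpacks them again. The convention that bit i of each integer carries signal bit i, the mask and function names, and leaving ADBUS[1] at 0 (the table assigns it nothing in this phase) are assumptions of the sketch.

```python
from typing import Tuple

TAG_MASK = 0xFFFFF800       # bits 31..11: TAG[31:11] <-> ADBUS[31:11]
MID_ADDR_MASK = 0x000007F0  # bits 10..4:  ADRLO[10:4] <-> ADBUS[10:4]
LOW_ADDR_MASK = 0x0000000C  # bits 3..2

# Internal bus widths: TAG BUS (TAG[31:11] plus TAGV), ADRLO[12:0], DATA[31:0].
INTERNAL_BITS = (21 + 1) + 13 + 32
EXTERNAL_BITS = 32 + 2      # ADBUS[31:0] plus ADR[3:2]

def pack_tag_address(adrlo: int, tag: int, tagv: int) -> Tuple[int, int]:
    """Pin values for the address/tag phase; returns (ADBUS, ADR)."""
    adbus = ((tag & TAG_MASK)
             | (adrlo & MID_ADDR_MASK)
             | ((adrlo >> 9) & LOW_ADDR_MASK)  # ADRLO[12:11] -> ADBUS[3:2]
             | (tagv & 1))                     # TAGV -> ADBUS[0]
    adr = adrlo & LOW_ADDR_MASK                # ADRLO[3:2] -> ADR[3:2]
    return adbus, adr

def unpack_tag_address(adbus: int, adr: int) -> Tuple[int, int, int]:
    """Inverse of pack_tag_address: recover (ADRLO[12:2], TAG[31:11], TAGV)."""
    adrlo = ((adbus & MID_ADDR_MASK)
             | ((adbus & LOW_ADDR_MASK) << 9)  # ADBUS[3:2] -> ADRLO[12:11]
             | (adr & LOW_ADDR_MASK))
    return adrlo, adbus & TAG_MASK, adbus & 1

print(INTERNAL_BITS, "internal bits vs", EXTERNAL_BITS, "multiplexed pins")
adbus, adr = pack_tag_address(adrlo=0x1FFC, tag=0xFFFFF800, tagv=1)
print(hex(adbus), hex(adr))
```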
[0029] In order to provide time-multiplexing of ADBUS[31:0],
control signals must be generated according to (i) whether a read
cycle or a write cycle is desired, and (ii) which one of the TAG BUS
108-1, the ADRLO[12:0] bus 107-1 and the DATA[31:0] bus 108-2 is to
exchange data with ADBUS[31:0]. A set of control signals TEST[4:2, 0]
is generated accordingly. Some
control signals generated from the values of the control pins
discussed above for accomplishing the present invention are
summarized in FIG. 4.
[0030] As shown above, each bit on an external pin (any pin on the
ADBUS[31:0] bus or the ADR[3:2] bus) is time-multiplexed between a
bit on the DATA[31:0] bus 108-2 and a bit from either the TAG BUS
108-1 or the ADRLO[12:0] bus 107-1. The present invention provides
datapaths between an ADBUS bit and its corresponding DATA (108-2)
bit and ADRLO (107-1) or TAG BUS (108-1) bit in the manner provided
in FIG. 5. As shown in FIG. 5, an external pin 501 is provided with
both receiving (i.e. input) and driving (i.e. output) abilities by
input buffer 505 and output buffer 504 respectively. When
inputting, the output buffer 504 is disabled by control signal
ADOUTEN (ADBUS output enable). The input buffer 505 is always
enabled. During functional operations, pin 501 is multiplexed
between the read buffer 106-3 (FIG. 1b) and the write buffer 106-4.
An output signal from write buffer 106-4, for example, is provided
on lead 513 for output to pin 501 through tristate buffers 511 and
504. Tristate buffer 511 is controlled by NOR gate 512, which
receives as input signals the control signals TEST[0] and TEST[2].
During cache access mode, however, the write buffer 106-4 and the
read buffer 106-3 are deselected by placing tristate buffer 511 in
the high impedance state.
[0031] Depending on whether pin 501 is associated with a TAG BUS
(108-1) bit or an ADRLO (107-1) bit, only one of the circuits
enclosed in the boxes 502 and 503 is present at any given pin. Thus,
FIG. 5 is a generalized data path description of one external pin.
For example, ADBUS[11], which is multiplexed between DATA[11] and
TAG[11], does not have the circuit enclosed in box 503. Conversely,
ADBUS[4], which is multiplexed between DATA[4] and ADRLO[4], does
not have the circuit enclosed in box 502.
[0032] As shown in FIG. 5, the signal received by input buffer 505
is provided to the tristate buffer 510 and to either the latch 506
or the tristate buffer 512 depending on whether pin 501 is
associated with the TAG BUS (108-1) or the ADRLO[12:0] bus (107-1).
Latch 506 is clocked by a signal TAG_LC, which is a derivative of
the clock signal CA_CLK driven from the {overscore (INT[0])} pin,
to latch a tag bit from pin 501. Tristate buffer 507 is controlled
by the control signal TEST[3] for driving the TAG BUS 108-1 at the
predetermined phase of the CA_CLK. In the circuit enclosed in box
503, a similar tristate buffer 512 is controlled by the control
signal TEST[4] to drive the ADRLO[12:0] bus (107-1). When
outputting a TAG BUS (108-1) bit, the control signal TEST[2]
activates tristate buffer 508.
[0033] To output a bit from DATA bus 108-2, tristate buffer 509,
which is controlled by control signal TEST[0], is activated.
Conversely, to input a bit from pin 501, tristate buffer 510, which
is controlled by control signal TEST[3], is activated.
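The per-pin datapath of FIG. 5 can be modeled behaviorally in Python to show how the TEST signals steer a single pin. This is a sketch, not a gate-level description: `PinControls` and `pin_datapath` are hypothetical names, the TAG_LC strobe of latch 506 is treated as a level, the contention check is an addition of the sketch, and the ADRLO driver is called buffer 512 following paragraph [0032].

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class PinControls:
    test0: bool = False    # buffer 509: DATA bit -> pin
    test2: bool = False    # buffer 508: TAG BUS bit -> pin (tag pins only)
    test3: bool = False    # buffer 510: pin -> DATA bit; buffer 507: latch 506 -> TAG BUS
    test4: bool = False    # buffer 512: pin -> ADRLO bit (address pins only)
    adouten: bool = False  # output buffer 504 enable
    tag_lc: bool = False   # latch 506 strobe, modeled as a level in this sketch

def pin_datapath(kind: str, ctl: PinControls, pin_in: int, data_bit: int,
                 tag_bit: int, wbuf_bit: int, latch: int
                 ) -> Tuple[Optional[int], Dict[str, int], int]:
    """One evaluation of the datapath for a pin of kind 'tag' or 'adrlo'.

    Returns (pin_out, driven_nets, new_latch); pin_out is None when output
    buffer 504 is disabled, i.e. the pin is acting as an input.
    """
    if kind not in ("tag", "adrlo"):
        raise ValueError("kind must be 'tag' (box 502) or 'adrlo' (box 503)")

    # Sources for the node behind output buffer 504.
    sources = []
    if ctl.test0:
        sources.append(data_bit)   # buffer 509
    if ctl.test2 and kind == "tag":
        sources.append(tag_bit)    # buffer 508
    if not (ctl.test0 or ctl.test2):
        sources.append(wbuf_bit)   # buffer 511, enabled by NOR(TEST[0], TEST[2])
    if len(sources) > 1:
        raise RuntimeError("contention: more than one output source enabled")
    pin_out = sources[0] if (ctl.adouten and sources) else None

    nets: Dict[str, int] = {}
    if ctl.test3:
        nets["DATA"] = pin_in      # buffer 510
    if kind == "tag":
        if ctl.tag_lc:
            latch = pin_in         # latch 506 captures the tag bit
        if ctl.test3:
            nets["TAG"] = latch    # buffer 507 drives the latched bit
    elif ctl.test4:
        nets["ADRLO"] = pin_in     # buffer 512
    return pin_out, nets, latch

# Write cycle on a tag pin: the first phase latches the tag bit, the second carries data.
_, _, latch = pin_datapath("tag", PinControls(tag_lc=True), pin_in=1,
                           data_bit=0, tag_bit=0, wbuf_bit=0, latch=0)
_, nets, latch = pin_datapath("tag", PinControls(test3=True), pin_in=0,
                              data_bit=0, tag_bit=0, wbuf_bit=0, latch=latch)
print(nets)
```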
[0034] FIG. 6 is a timing diagram showing a write cycle and a read
cycle for either the instruction cache memory 102a or the data
cache memory 102b, depending on whether the I/{overscore (D)}
signal on the RSVD[1] pin is at logic high (instruction cache) or
at logic low (data cache). As mentioned above, in the cache memory
access mode, the read buffer 106-3 and the write buffer 106-4 are
deselected from the ADBUS[31:0] pins to which they are connected
during functional operation.
[0035] As shown in FIG. 6, the write cycle, which is two {overscore
(SysClk)} periods long, is initiated at time t0. The cache address
ADRLO[12:2], split as specified in the pin assignment table above,
is placed on the ADBUS[3:2, 10:4] and the ADR[3:2] pins. At the same time, the tag data to be
written TAG[31:11] and TAGV are placed on the ADBUS[31:11] and the
ADBUS[0] pins. The CA_CLK signal on the {overscore (INT[0])} pin
latches the ADRLO[12:2] data in the address latches of the cache
memory specified by the signal I/{overscore (D)} on the RSVD[1]
pin. At the same time, the tag data TAG[31:11] and the TAGV bit are
latched into latches provided, such as latch 506. The control
signal TEST[4] is activated to drive the input signals on the
ADBUS[3:2], the ADBUS[10:4] and the ADR[3:2] pins onto the target
ADRLO bus. At the next {overscore (SysClk)} cycle, i.e. after time
t2, the data to be written DATA[31:0] are placed on the ADBUS[31:0]
pins. At time t3, the {overscore (CA_Wr)} signal on the {overscore
(INT[2])} pin is asserted, and both the tag data TAG[31:11]
previously latched, and the data DATA[31:0] on the ADBUS[31:0] are
written into the location specified by ADRLO[12:2] in the selected
cache memory. The control signal TEST[3] is activated to drive both
the signals on ADBUS[31:0] and the tag data previously latched
onto the respective targets, i.e. the DATA[31:0] bus (108-2) and
the TAG BUS (108-1).
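From the external tester's side, the write cycle amounts to two SysClk periods of pin values. The Python sketch below generates those two vectors. The dictionary keys, the function names and the choice to model active-low strobes as 0 when asserted are assumptions; the exact CA_CLK edge placement within the first period is not specified above, so it is recorded only as an event string, and ADR[3:2] is omitted from the second period because the text does not define its use there.

```python
from typing import Dict, List

TAG_MASK = 0xFFFFF800  # TAG[31:11] travels on ADBUS[31:11]

def pack_tag_address(adrlo: int, tag: int, tagv: int):
    """First-period pin values per the pin assignment table: returns (ADBUS, ADR)."""
    adbus = (tag & TAG_MASK) | (adrlo & 0x7F0) | ((adrlo >> 9) & 0xC) | (tagv & 1)
    return adbus, adrlo & 0xC

def write_cycle_vectors(adrlo: int, tag: int, tagv: int, data: int,
                        icache: bool) -> List[Dict[str, object]]:
    """Pin values a tester would drive during the two SysClk periods of a write.

    Active-low strobes use 0 = asserted. Bit i of adrlo is ADRLO[i] and bit i
    of tag is TAG[i].
    """
    adbus, adr = pack_tag_address(adrlo, tag, tagv)
    i_d = 1 if icache else 0  # RSVD[1]: high selects instruction cache 102a
    return [
        # Period 1 (from t0): address and tag on the pins; CA_CLK latches them
        # while TEST[4] drives ADRLO and latches such as 506 capture the tag bits.
        {"ADBUS": adbus, "ADR": adr, "RSVD1": i_d, "CA_Wr_n": 1,
         "event": "CA_CLK latches ADRLO[12:2], TAG[31:11] and TAGV"},
        # Period 2 (after t2): data on the pins; CA_Wr asserted at t3 and
        # TEST[3] drives DATA and the latched tag into the selected cache.
        {"ADBUS": data & 0xFFFFFFFF, "RSVD1": i_d, "CA_Wr_n": 0,
         "event": "CA_Wr at t3 writes tag and data"},
    ]

for vector in write_cycle_vectors(adrlo=0x0124, tag=0xABCDE800, tagv=1,
                                  data=0x12345678, icache=True):
    print({k: hex(v) if isinstance(v, int) else v for k, v in vector.items()})
```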
[0036] At time t4, a read cycle is initiated. The address
ADRLO[12:2] of the location in the cache memory selected by the
I/{overscore (D)} signal on RSVD[1] is placed on the assigned
ADBUS[3:2, 10:4] and ADR[3:2] pins. At time t5, this address is
latched into the address latches of the selected cache memory, the
control signal TEST[4] having driven this address onto the
ADRLO[12:0] bus. At the same time, the T/{overscore (D)} signal on
the {overscore (INT[5])} pin goes to logic low to select the
DATA[31:0] bus (108-2) for output in the next {overscore (SysClk)}
cycle, i.e. after time t6. At time t7, {overscore (CA_Rd)} signal
is asserted to cause the selected cache memory to place the tag and
data bits respectively onto the TAG BUS (108-1) and the DATA[31:0]
bus (108-2), and the control signal ADOUTEN enables the ADBUS[31:0]
pins for output. Control signal TEST[0] is also asserted to
activate tristate buffer 509, so as to allow the data on DATA[31:0]
bus (108-2) to be output on the ADBUS[31:0] pins. At time t8, the
signal T/{overscore (D)} on the {overscore (INT[5])} pin goes to logic high,
activating control signal TEST[2] and deactivating control signal
TEST[0], so that the tag data on TAG BUS 108-1 (TAG[31:11] and TAGV
bit) can be output on the ADBUS[31:11] and ADBUS[0] pins. The read cycle
completes at time t10, when the read signal {overscore (CA_Rd)} is
negated.
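On the chip side, the read cycle presents the data word first (T/{overscore (D)} low) and the tag second (T/{overscore (D)} high). The Python sketch below models what appears on ADBUS[31:0] in each phase. The `CacheArrays` class, its list-based tag and data banks and the function name are hypothetical, the 4K-byte instruction-cache geometry is assumed in the example, and ADBUS pins not assigned during the tag phase are shown as 0 because the text does not define them.

```python
from typing import List, Tuple

TAG_MASK = 0xFFFFF800  # TAG[31:11] appears on ADBUS[31:11]

class CacheArrays:
    """Hypothetical model of one cache: a tag bank indexed by line, a data bank by word."""

    def __init__(self, words: int, words_per_line: int):
        if words <= 0 or words & (words - 1):
            raise ValueError("words must be a power of two")
        self.words = words
        self.words_per_line = words_per_line
        self.tags: List[Tuple[int, int]] = [(0, 0)] * (words // words_per_line)
        self.data: List[int] = [0] * words

    def locate(self, adrlo: int) -> Tuple[int, int]:
        """Map ADRLO[12:2] to (word index, line index)."""
        word = (adrlo >> 2) & (self.words - 1)
        return word, word // self.words_per_line

def read_cycle_outputs(cache: CacheArrays, adrlo: int) -> List[Tuple[str, int]]:
    """ADBUS values presented during a read, in the order described above."""
    word, line = cache.locate(adrlo)
    tag, tagv = cache.tags[line]
    return [
        # T/D low (from t7): TEST[0] and ADOUTEN put DATA[31:0] on ADBUS[31:0].
        ("data", cache.data[word]),
        # T/D high (from t8): TEST[2] puts TAG[31:11] on ADBUS[31:11] and TAGV on ADBUS[0].
        ("tag", (tag & TAG_MASK) | (tagv & 1)),
    ]

icache = CacheArrays(words=1024, words_per_line=4)  # 4K-byte instruction cache assumed
icache.data[5] = 0xCAFEF00D
icache.tags[1] = (0xABCDE800, 1)
for phase, value in read_cycle_outputs(icache, adrlo=5 << 2):
    print(phase, hex(value))
```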
[0037] Using these read and write cycles, every location in each of
the instruction cache memory 102a and the data cache memory 102b
can be accessed. Standard exhaustive memory testing algorithms can
be applied to each of the instruction and data cache memories 102a
and 102b. In addition, the present invention allows testing
processor 101 using methods requiring preloading the cache memories
with data and instructions. Further, during testing by an
in-circuit emulator, the contents of the cache memory can be
examined and monitored.
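The paragraph above leaves the choice of test algorithm open. As one example of a standard exhaustive memory test, the Python sketch below runs March C-, a well-known march algorithm, through two callbacks. In practice the callbacks would issue the read and write cycles of FIG. 6; here they are hypothetical and backed by a Python list with an injected stuck-at-0 bit to show detection. Using all-zeros and all-ones data backgrounds for word-wide memory is an assumption of the sketch.

```python
from typing import Callable, Iterable, List, Optional

def march_c_minus(n: int, read: Callable[[int], int],
                  write: Callable[[int, int], None],
                  zero: int = 0x00000000, one: int = 0xFFFFFFFF) -> List[int]:
    """Run March C- over addresses 0..n-1 and return the failing addresses."""
    failures = set()
    up: Iterable[int] = range(n)
    down: Iterable[int] = range(n - 1, -1, -1)

    def element(order: Iterable[int], expect: Optional[int], new: Optional[int]) -> None:
        for addr in order:
            if expect is not None and read(addr) != expect:
                failures.add(addr)
            if new is not None:
                write(addr, new)

    element(up, None, zero)   # any order:  w0
    element(up, zero, one)    # ascending:  r0, w1
    element(up, one, zero)    # ascending:  r1, w0
    element(down, zero, one)  # descending: r0, w1
    element(down, one, zero)  # descending: r1, w0
    element(up, zero, None)   # any order:  r0
    return sorted(failures)

# Hypothetical backing store standing in for the cache read and write cycles.
memory = [0] * 16
STUCK_ADDR, STUCK_BIT = 3, 5  # injected stuck-at-0 fault

def write_word(addr: int, value: int) -> None:
    if addr == STUCK_ADDR:
        value &= ~(1 << STUCK_BIT)
    memory[addr] = value & 0xFFFFFFFF

def read_word(addr: int) -> int:
    return memory[addr]

print(march_c_minus(len(memory), read_word, write_word))
```

The tag bank can be exercised the same way by passing a narrower `one` background, e.g. 22 set bits for TAG[31:11] plus TAGV.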
[0038] The above detailed description is provided to illustrate the
specific embodiments provided above and is not intended to limit
the present invention. Many modifications and variations
within the scope of the present invention are possible. The present
invention is defined by the following Claims.
* * * * *