Control of pipelined operation in a microcomputer system employing dynamic bus sizing with 80386 processor and 82385 cache controller Patent Grant Begun , et al. June 23, 1 [IBM Corporation]

Control of pipelined operation in a microcomputer system employing dynamic bus sizing with 80386 processor and 82385 cache controller

Begun , et al. June 23, 1

Patent Grant 5125084

U.S. patent number 5,125,084 [Application Number 07/198,894] was granted by the patent office on 1992-06-23 for control of pipelined operation in a microcomputer system employing dynamic bus sizing with 80386 processor and 82385 cache controller. This patent grant is currently assigned to IBM Corporation. Invention is credited to Ralph M. Begun, Patrick M. Bland, Mark E. Dean.

United States Patent	5,125,084
Begun , et al.	June 23, 1992

**Please see images for: ( Certificate of Correction ) **

Control of pipelined operation in a microcomputer system employing dynamic bus sizing with 80386 processor and 82385 cache controller

Abstract

Any incompatibility between pipelined operations (such as is available in the 80386) and dynamic bus sizing (allowing the processor to operate with devices of 8-, 16- and 32-bit sizes) is accommodated by use of an address decoder and ensuring that device addresses for cacheable devices are in a first predetermined range and any device addresses for non-cacheable devices are not in that predetermined range. Since by definition cacheable devices are 32-bit devices, pipelined operation is allowed only if the address decoder indicates the access is to a cacheable device. In that event, a next address signal is provided to the 80386. This allows the 80386 to proceed to a following cycle prior to completion of the previous cycle. For accesses which are to devices whose address indicate they are non-cacheable, a next address signal is withheld until the cycle is completed, i.e. without pipelining. The invention further provides for proper interface between a DMA mechanism (driven by a first clock) and a CPU local bus subsystem (driven by an entirely different clock). Data provided by the DMA mechanism is latched into an interface between the CPU local bus and the system bus, and a DMA cycle completed. Only after completion of the DMA cycle is detected, is the cycle on the CPU local bus allowed to complete. In this fashion, the CPU can go on to a following operation and be assured that the DMA mechanism is no longer driving the system bus.

Inventors:	Begun; Ralph M. (Boca Raton, FL), Bland; Patrick M. (Delray Beach, FL), Dean; Mark E. (Delray Beach, FL)
Assignee:	IBM Corporation (Armonk, NY)
Family ID:	22735315
Appl. No.:	07/198,894
Filed:	May 26, 1988

Current U.S. Class:	712/33; 711/E12.021; 712/E9.055; 711/200
Current CPC Class:	G06F 12/0888 (20130101); G06F 9/3802 (20130101)
Current International Class:	G06F 12/08 (20060101); G06F 9/38 (20060101); G06F 009/38 ()
Field of Search:	;364/200,900

References Cited [Referenced By]

U.S. Patent Documents


4016541	April 1977	Delagi et al.
4257095	March 1981	Nadir
4314334	February 1982	Daughton et al.
4315308	February 1982	Jackson
4480307	October 1984	Budde et al.
4503534	March 1985	Budde et al.
4649476	March 1987	Sibigtroth
4853846	August 1989	Johnson et al.
4933845	June 1990	Hayes

Other References

Burn's et al., within the 68020, Jan. 1987, pp. 103-106. .
Intel, Introduction to the 80386, Sep. 1985, pp. 5-1 to 5-8. .
Gandhi, Cache-Controller auf einem chip, Elektronik, Jan. 22, 1988, pp. 54, 55, 58, 60. .
Dean et al., "16-Bit Data Transfer Using 8-Bit Direct-Memory Access Controller", IBM TDB, vol. 27, No. 3, Aug. 1984, pp. 1699-1700. .
Intel Corporation, "Introduction to the 80386 Including the 80386 Data Sheet", Apr. 1986. .
Intel Corporation, "82385 High Performance 32-Bit Cache Controller", Jul. 1987. .
Intel Corporation, "80386 Hardware Reference Manual"-Chapter 7, Cache Subsystems, 1986..

Primary Examiner: Shaw; Gareth D.
Assistant Examiner: Von Buhr; Maria N.
Attorney, Agent or Firm: Black; John C.

Claims

We claim:

1. A microcomputer system comprising:

a processor of a given bit width for executing instructions,

first means including the processor for generating an address of a next instruction and initiating fetching of said next instruction during the execution of at least some currently executing instructions, whereby instructions may be prefetched for pipelined instruction execution,

additional means including the processor for generating a next instruction address and initiating fetching of a next instruction upon completing execution of a currently executing instruction for non-pipelined instruction execution,

a cache memory subsystem of said given bit width, having an address range for cacheable data, and coupled to said processor by a local bus,

a further bus connecting said local bus with addressable other components having said given bit width and with at least one addressable component of a reduced bit width for transferring data between said components and said processor and subsystem via said further bus and local bus during execution of certain of said instructions, at least some of said components having said given bit width having an address in said address range of said cache subsystem and at least one component of reduced bit width having an address outside said address range;

said processor further including means responsive to the execution of each of said certain instructions for asserting on said local bus a component address alternatively within or outside of said address range,

address decoder means responsive to each said asserted address on said local bus during the execution of said certain instructions for generating a signal indicating whether said asserted address is alternatively within or outside of said address range, and

logic means responsive to the signal generated by said address decoder for rendering said first means effective to generate a next instruction address for pipelined operation when said address decoder means indicates an asserted address within said address range and for rendering said additional means effective to generate a next instruction address for non-pipelined operation when said address decoder means indicates an asserted address outside said address range.

2. A system as recited in claim 1 further comprising:

a first interface coupled between said local bus and said further bus, and wherein said further bus comprises:

a system bus directly connected to said interface, at least some of said other components connected to said system bus;

a second interface directly connected between said system bus and an optional feature bus, where at least some of said other components are coupled to said optional feature bus for transferring data between said last mentioned components and said processor and subsystem via said first and second interfaces, said system bus and said optional feature bus.

3. A system as recited in claim 2 wherein said one component of a reduced bit width is directly connected to said optional feature bus for transferring data with said processor and subsystem via said first and second interface, said system bus and said optional feature bus.

4. A system as recited in claim 1 further comprising:

DMA control means coupled directly to said further bus and including

means effective during the execution of said certain instructions by the processor to initiate DMA cycles for controlling communication and data transfer between the processor and said other components via said local bus and said further bus, and

means effective at the end of each DMA cycle for notifying the additional means that the execution of a currently executing instruction is completed whereby the processor can initiate fetching of a next instruction.

5. A microprocessor system comprising:

a CPU, a cache subsystem having an address range for cacheable data and a local bus interconnecting said CPU and subsystem,

a system bus connected to the local bus by a buffer means for coupling functional units to said CPU and cache subsystem,

said CPU, cache subsystem, local bus, system bus and certain of said functional units having equal bit widths and others of said functional units having a smaller bit width,

at least some of said certain functional units, but none of said other functional units, having addresses within the address range of the cache subsystem,

said CPU including means for initiating a next cycle of operation,

an address decoder responsive to functional unit addresses generated by the CPU on the local bus during cycles of operation to determine whether or not they fall within the address range of the cache subsystem, and

logic means responsive to the address decoder for controlling operation of the CPU means by allowing it to proceed to a next cycle of operation prior to completion of a current cycle of operation when the decoded address is within said address range or alternatively stopping it from proceeding to the next cycle of operation until completion of the current cycle of operation when the decoded address is outside said address range.

6. A microprocessor system comprising:

a CPU, a cache subsystem having an address range for cacheable data and a local bus interconnecting said CPU and subsystem, said CPU, cache subsystem and local bus having equal bit widths,

a system bus coupled to the local bus by a buffer and an optional feature bus coupled to the system bus by an interface for coupling respective functional units to the CPU and cache subsystem,

said system bus, optional feature bus and certain of said functional units having a bit width equal to that of said CPU, cache subsystem and local bus and others of said functional units having a smaller bit width;

at least some of said certain functional units, but none of said other functional units, having addresses within the address range of the cache subsystem,

said CPU including means for initiating a next cycle of operation,

an address decoder responsive to functional unit addresses generated by the CPU on the local bus during each cycle of operation to determine whether or not they fall within the address range of the cache subsystem, and

logic means responsive to the address decoder for controlling operation of the CPU means by allowing it to proceed to a next cycle of operation prior to completion of a current cycle of operation when the decoded address is within said address range or alliteratively stopping it from proceeding to the next cycle of operation until completion of the current cycle of operation when the decoded address is outside said address range.

7. A microprocessor system comprising:

a CPU, a cache subsystem and a local bus interconnecting the CUP and cache subsystem, said CPU, cache subsystem and local bus having equal data bus widths,

said CPU including means for executing data transfer instructions directed to functional units and for generating a functional unit address on the local bus during the execution of each of said instructions;

a system bus coupled to said local bus for routing data transfers to and from the CPU during execution of said data transfer instructions;

said cache subsystem having a range of addresses including addresses assignable to functional unit having said equal data bus width;

addresses outside of said range of addresses including addresses assignable to other functional units of said equal data bus width and of smaller data bus widths;

said CPU including means for initiating a next instruction cycle;

an address decoder responsive to a functional unit address generated by the CPU on the local bus during the execution of each of said data transfer instructions to determine whether or not it falls within the address range of the cache subsystem; and

logic means responsive to the address decoder for controlling operation of the CPU means by allowing it to proceed to a next instruction cycle prior to completion of a current instruction cycle when the decoded address is within said address range or alliteratively stopping it from proceeding to the next instruction cycle until completion of the current instruction cycle when the decoded address is outside said address range.

8. A microcomputer system comprising: a processor of a given bit width for executing instructions including data transfer instructions directed to functional units,

first means including the processor for generating an address of a next instruction and initiating fetching of said next instruction during the execution of at least some currently executing instructions, whereby instructions may be prefeteched for pipelined instruction execution,

additional means including the processor for generating a next instruction address and initiating fetching of a next instruction upon completing execution of a currently executing instruction for non-pipelined instruction execution,

a cache memory subsystem of said given bit width, having an address range for cacheable data, and coupled to said processor by a local bus,

a system bus for routing data transfers to and from the processor via said system bus and local bus during execution of said data transfer instructions,

addressees within said address range including addresses assignable to functional units having said given bit width,

addresses outside of said address range including addresses assignable to other functional units having said given bit width and to functional units having a smaller bit width than said given bit width,

said processor further including means responsive to the execution of each of said data transfer instructions for asserting on said local bus a functional unit address alternatively within or outside of said address range,

address decoder means responsive to each said asserted address on said local bus during the execution of said data transfer instructions for generating a signal indicating whether said asserted address is alternatively within or outside of said address range, and

logic means responsive to the signal generated by said address decoder for rendering said first means effective to generate a next instruction address for pipelined operation when said address decoder means indicates an asserted address within said address range and for rendering said additional means effective to generate a next instruction address for non-pipelined operation when said address decoder means indicates an asserted address outside said address range.

9. A microprocessor system comprising:

a CPU for executing certain instructions including data transfers directed to functional units and for generating functional unit address information on a local bus during the execution of said instructions;

a cache subsystem and a local bus interconnecting said CUP and subsystem;

a system bus and a buffer coupling the system bus to the local bus for routing said data transfers between the CPU and said functional units, said CPU, local bus and system bus having equal bit widths;

an address decoder responsive to said address information placed on the local bus by the CPU to determine whether or not the address falls within an address range of the cache subsystem;

logic means responsive to the address decoder during each data transfer for controlling operation of the CPU by allowing it to proceed to a next CPU instruction prior to completion of the data transfer when the decoded address is within said address range, and alliteratively stopping the CPU from proceeding to the next CPU instruction until completion of the data transfer when the decoded address is outside said address range;

addresses within said address range including addresses assignable to functional units having said equal data bus width; and

addresses outside said address range including addresses assignable to other functional units having said equal data bus width and functional units having a data bus width smaller than said equal data bus width.

Description

TECHNICAL FIELD

The present invention relates to improvements in microcomputer systems, particularly microcomputer systems which employ dynamic bus sizing and pipelined operations, and more particularly in such systems which further include a cache subsystem.

BACKGROUND ART

Background information respecting the 80386, its characteristics and its use in microcomputer systems including cache memory subsystems are described in Intel's "Introduction to the 80386", April 1986 and the 80386 Hardware Reference Manual (1986). The characteristics and operating performance of the 82385 are described in the Intel publication "82385 High Performance 32-Bit Cache Controller" (1987).

In microcomputer systems, as in other computer systems, speed of operations is an important criterion which in most cases has to be balanced against system cost. Many features which were first introduced to speed up operations in mainframe computers and minicomputers are now finding their way into microcomputer systems. These include cache memory subsystems and pipelined operations.

In some microcomputer systems (for example those employing the Intel 80386), pipelined operations are an attractive operating option. For 80386/82385 microcomputer systems the 82385 cache controller provides an NA signal which it is suggested should be connected to a corresponding input on the 80386. This allows the 82385 to assert the NA signal to the 80386 prior to completion of a given bus cycle to enable the 80386 to output information (data, address and/or control) for the next operating cycle. The timing is arranged, along with connected components, so that if a cache operation is the given operation, information for the given operation has already been accepted by the cache memory and therefore changing the 80386 output information to correspond to the next cycle will not interfere with the operation which is in the process of being completed. This also holds true for situations in which a cache miss condition has occurred requiring reference not to the cache memory but to main memory since access to main memory is via latched buffers which therefore store information respecting the given operation.

The 80386 also has the capability of operating with what is referred to as dynamic bus sizing. The 80386 is nominally a 32-bit machine, i.e. the width of the data bus is 32 bits. However, the 80386 can operate with 16-bit devices (devices which will transfer only 16 bits of data) by responding to a BS16 signal which is provided to the 80386 to indicate the presence of a 16-bit device. The BS16 signal is important to the 80386 in the event that it has performed a 32-bit operation. Of course the 16-bit device cannot, in one operation, transfer the 32 bits of data which the 80386 is capable of generating and/or accepting. Thus, a second operation (cycle) is necessary and the 80386 responds to the BS16 signal to automatically generate the next, necessary, cycle.

In order for this type of operation to be successful when the 82385 is present it is also necessary for the 82385 to have information respecting the character of the device taking part in the operation so that the NA signal to the 80386 is not improvidently generated. As described in the referenced Intel publications the 82385 requires device information at the beginning of a cycle in contrast to the 80386 which can accept and act on device information substantially later in the cycle.

In some microcomputer systems, this constraint is accepted by correlating the address of a device with its size so that for example all 32-bit devices are in a first predetermined address range and all 16-bit devices are in a second, different address range. By this technique then, the size of a device can be determined by its address.

However, in systems which require the flexibility of locating a wide variety of devices within widely varying address ranges, this constraint cannot be complied with. One class of such systems are those employing the IBM Micro-Channel (TM) bus.

In many systems, including systems of the foregoing type, a device when addressed will return an indication of its size. However, since the 82385 requires device size information to be available at the beginning of a bus cycle, in systems where this information is only available later in the cycle, there is the possibility of an improvident NA signal.

Therefore it is an object of the present invention to provide an arrangement in an 80386/82385 microcomputer system which selectively controls the NA signal so that it is coupled to the 80386 only under appropriate circumstances. In other circumstances the NA signal is prevented from reaching the 80386. Preventing the NA signal from reaching the 80386 prevents pipelined operation, i.e. it prevents the generation of information for a next cycle before the given cycle has been completed. With this flexible control of the NA signal, the microcomputer system can now operate with dynamic bus sizing, i.e. it can now operate with devices of various sizes without information at the beginning of the cycle as to the size of the device being operated with.

Cacheable devices, i.e. devices generating data which can be cached, are necessarily 32-bit devices. All such devices have addresses (specifically, tags) indicating cacheability. All other devices (whether or not 32-bit devices) have addresses indicating information they generate will not be found in cache. Control of the NA signal depends in part on cacheability. Specifically, since any cacheable device is a 32-bit device, there is no question of improvident NA signals when operating with cacheable devices. Furthermore, in the event of a cache hit, there is no question but that the NA signal is appropriate since only cacheable devices (32-bit devices) will produce a cache hit. Thus an important feature of allowing dynamic bus sizing and pipelined operations is preventing or inhibiting the NA signal from reaching the 80386 in the event of assertion of an address to a non-cacheable device.

If, during the course of a given cycle (where the NA signal has been inhibited), it turns out the device is a 32-bit device, then the cycle completes in normal fashion (without pipelining). On the other hand, if it turns out during the course of the cycle that the device being operated with is a 16-bit device (and the cycle is a 32-bit cycle), then the 80386 generates the additional, necessary cycle. This operation is entirely transparent to the 82385, i.e. the 82385 takes no part in controlling the second of the two cycles. Finally, if the device being operated with turns out to be an 8-bit device, then the 80386 "sees" a 16-bit device so that it operates exactly as has been described for the case when the device is a 16-bit device. In other words, the 80386 "sees" a 16-bit device so that after the first cycle is completed (during which cycle it is informed that it is operating with a 16-bit device), a second cycle is generated. However, not only transparent to the 82385, but also transparent to the 80386, other logic converts each of the 16-bit cycles of the 80386 into two 8-bit cycles. Thus in the case where a 32-bit cycle is directed at an 8-bit device, the 82385 takes no part in the operation, the 80386 generates two 16-bit cycles, and other logic operates on each of the 80386 16-bit cycles to generate two 8-bit cycles.

Thus in accordance with one aspect, the invention provides a microcomputer system with pipelined instruction sequencing responsive to a next address signal prior to completion of a pending operation, said microcomputer system comprising:

a processor of a given bit width,

a cache memory subsystem of said given bit width coupled to said processor by a local bus,

a further bus connecting said local bus with other components having said given bit width and with at least one component of a reduced bit width, at least some of said components having said given bit width having an address in an address range associated with said cache subsystem and said at least one component of reduced bit width having an address outside an address range associated with said cache subsystem, wherein said microcomputer system further includes:

a) address decoder means responsive to an asserted address on said local bus for generating a signal indicating whether said asserted address is or is not within said address range associated with said cache memory subsystem, and

b) logic means responsive to said signal from said address decoder means for generating a next address signal to said microprocessor for pipelined operation unless said address decoder means indicates an asserted address outside a range associated with said cache subsystem.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an overall three-dimensional view of a typical microcomputer system employing the present invention;

FIG. 2 is a detailed block diagram of a majority of the components of a typical microcomputer system employing the present invention;

FIG. 3 is a block diagram, in somewhat more detail than FIG. 2, showing those connections between the 80386 and 82385 recommended by the manufacturer and which is useful to an understanding of why such an arrangement is inconsistent with dynamic bus sizing;

FIG. 4 is a more detailed block diagram similar to FIG. 3 but illustrating application of the present invention which allows both pipelined instruction sequencing and dynamic bus sizing;

FIG. 5 is further detailed block diagram of apparatus which handles 16-to-8-bit cycle conversion transparent to both the 82385 and 80386; and

FIGS. 6 and 7 are a timing diagram and a block diagram of the latch DL and its control signal LEDMA (Latch Enable from DMA) to illustrate how the synchronous DMA subsystem is interfaced to the CPU local bus (operating under a different clock) to avoid possible contention on the system bus 250.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

FIG. 1 shows a typical microcomputer system in which the present invention can be employed. As shown, the microcomputer system 10 comprises a number of components which are interconnected together. More particularly, a system unit 30 is coupled to and drives a monitor 20 (such as a conventional video display). The system unit 30 is also coupled to input devices such as a keyboard 40 and a mouse 50. An output device such as a printer 60 can also be connected to the system unit 30. Finally, the system unit 30 may include one or more disk drives, such as the disk drive 70. As will be described below, the system unit 30 responds to input devices such as the keyboard 40 and the mouse 50, and input/output devices such as the disk drive 70 for providing signals to drive output devices such as the monitor 20 and the printer 60. Of course, those skilled in the art are aware that other and conventional components can also be connected to the system unit 30 for interaction therewith. In accordance with the present invention, the microcomputer system 10 includes (as will be more particularly described below) a cache memory subsystem such that there is a CPU local bus interconnecting a processor, a cache control and a cache memory which itself is coupled via a buffer to a system bus. The system bus is interconnected to and interacts with the I/0 devices such as the keyboard 40, mouse 50, disk drive 70, monitor 20 and printer 60. Furthermore, in accordance with the present invention, the system unit 30 may also include a third bus comprising a Micro Channel (TM) architecture for interconnection between the system bus and other input/output devices.

FIG. 2 is a high level block diagram illustrating the various components of a typical microcomputer system in accordance with the present invention. A CPU local bus 230 (comprising data, address and control components) provides for the connection of a microprocessor 225 (such as an 80386), a cache control 260 (which may include an 82385 cache controller) and a random access cache memory 255. Also coupled on the CPU local bus 230 is a buffer 240. The buffer 240 is itself connected to the system bus 250, also comprising address, data and control components. The system bus 250 extends between the buffer 240 and a further buffer 253.

The system bus 250 is also connected to a bus control and timing element 265 and a DMA controller 325. An arbitration control bus 340 couples the bus control and timing element 265 and a central arbitration element 335. The main memory 350 is also connected to the system bus 250. The main memory includes a memory control element 351, an address multiplexer 352 and a data buffer 353. These elements are interconnected with memory elements 361 through 364, as shown in FIG. 2. Address lines 390 couple the address multiplexer 352 to the memory elements 361-364. Memory data lines 400 couple the data buffer 353 to the memory elements 361-364 inclusive.

A further buffer 267 is coupled between the system bus 250 and a planar bus 270 of I/O subsystem 200. The planar bus 270 includes address, data and control components, respectively. Coupled along the planar bus 270 are a variety of I/O adapters and other components such as the display adapter 275 (which is used to drive the monitor 20), a clock 280, additional random access memory 285, an RS 232 adapter 290 (used for serial I/O operations), a printer adapter 295 (which can be used to drive the printer 60), a timer 300, a diskette adapter 305 (which cooperates with the disk drive 70), an interrupt controller 310 and read only memory 315.

The buffer 253 (coupled on one side to the system bus 250) is coupled on the other side to the Micro-Channel (TM) sockets 401 and 330 of the Micro-Channel (TM) bus 320. Bus 320 can support devices of various size including 32-bit devices, 16-bit devices and 8-bit devices. A typical memory device 331 is shown coupled on the bus 320. One of the advantages of this bus is the flexibility the user is provided with to freely insert various devices in the different sockets, at will. The mixing of these devices and their appearance at different sockets prevents rigid addressing allocation and is a prime reason for the necessity of the present invention. Absent the present invention, the 80386/82385 microcomputer system as described herein would be either wholly prevented from using pipelining operations or alternatively subjected to faulty operation caused by improvident or inappropriate pipelining operations.

FIG. 3 shows selected interconnections between the 80386 and 82385, relative to generation and use of the Next Address (NA) signal, as recommended by the manufacturer. More particularly, the interface between the CPU local bus 230 and the system bus 250 is the buffer 240 whose components include a control buffer DL (for data) and AL (for address). As shown in FIG. 3, the data component of the CPU local bus (CPULBD) is one input/output connection for the DL, on the CPU local bus side. Likewise, the data component SBD of the system local bus 250 is the input/output connection to the DL, on the side of the system bus 250. The buffer DL is controlled by its input signals LDSTB, BT/R and DOE. LDSTB latches data into DL and DOE enables the output of DL. The direction from which input is accepted (CPU local bus 230 or system bus 250) and to which output is directed (system bus 250 or CPU local bus 230) is determined by BT/R.

Likewise, the address component of the buffer AL has an input from the address component of the CPU local bus (CPULBA) and an output to the address component of the system local bus 250 (SBA). In a similar fashion, the AL component of the buffer is controlled by its two control signals BACP and BAOE, the former latches address information from CPULBA into AL and the latter enables the output, i.e. to the system bus 250. The control signals LDSTB, DOE, BACP and BAOE are provided for by the 82385.

The address component CPULBA of the CPU local bus 230 originates at the 80386 and the data component CPULBD of the CPU local bus 230 originates/terminates at the 80386.

The 82385 uses two additional signals X16 and NCA in the generation of the NA signal. The X16 signal indicates whether the device which is being operated with is a 16-bit or 32-bit device. One source of the signal X16 is the Local Bus Address Decoder (LBAD). In addition, the manufacturer recommends that the signal BS16 be tied to the signal X16. The signal BS16 is input to the 80386 to indicate to the 80386 whether the device being operated with is 16 or 32 bits in size. A source of the signal BS16 is a device address decoder, such as MAD. MAD represents the address decoder of any device coupled to the system bus 250 or any other bus coupled thereto such as the bus 270 or 320. When a device such as MAD recognizes its address on a bus address component such as MA, it returns the signal BS16, which is high or low depending on whether the device is a 16- or 32-bit device.

The Non-Cacheable Address (NCA) is another input to the 82385 which indicates whether an address asserted is a cacheable address. A cacheable address is defined as an address for a device which is compatible with the cache subsystem and hence is a 32-bit device. This signal is generated by the Cache Address Decoder (CAD) by virtue of an input from the address component of the CPU local bus 230. The manufacturer suggests that a portion of memory can be set aside as non-cacheable by use of this input to the 82385.

The recommended architecture illustrated in FIG. 3 provides for effective pipelined operations (by use of the NA signal) under any one of the following circumstances:

1) where all devices connected to the system bus 250 or any bus coupled thereto are 32-bit devices, or

2) where the devices connected to the system bus 250 or any bus coupled thereto may be 32-bit or 16-bit devices, so long as the signal X16 is available at the beginning of any bus cycle.

When the X16 signal is available at the beginning of a given bus cycle, the 82385 based on that information determines whether the NA signal is appropriate, and if it is not appropriate, it is not generated. Thus in the event the 80386 is operating with a 32-bit device, then the NA signal is normally generated prior to completion of a given cycle for effecting pipelined operation. On the other hand, if the device is a 16-bit device, then the NA signal is not generated, allowing the 80386 to generate the second necessary cycle for proper operation with a 16-bit device.

However, the architecture of some bus subsystems (for example the Micro-Channel (TM) bus) provides for the mixing of 8-bit and 16-bit and 32-bit devices. Because the signal BS16 is not returned until sometime after the typical device MAD has recognized its address on the bus subsystem, it is not generated early enough for the 82385 to take into account in determining whether to generate NA. Accordingly, the peculiarities of the 82385 make it incompatible, particularly in the generation of the NA signal, with bus subsystems having the flexibility such as is exhibited by the Micro-Channel (TM) bus.

FIG. 4 is a block diagram similar to FIG. 3 but altered for use in accordance with the present invention. Before describing the operation of the apparatus shown in FIG. 4, a number of the differences between FIGS. 3 and 4 will be described.

One major difference is that the outputs X16 and NA from the 82385 are no longer utilized. In addition, a new logic element CL is interposed between the 82385 and the 80386. The logic element CL generates the CPUNA signal (replacing the unused NA signal from the 82385). The logic unit CL is responsive to the NCA signal from the CADR and BADS signal from the 82385 (indicating whether there is a cache miss). The logic element CL also has an input BW/R indicating whether any system bus cycle is a read or a write. As will be described, while BW/R is used in an embodiment of the invention, it is a signal which is not essential.

The decoder CADR receives, in addition to its inputs from bits A17-A26 and A31 of the CPULBA, Programmable Cacheability Bits (PCB). In an embodiment actually constructed, the first of these three bits represents a decision to allow caching ROM space. A second bit either enables the decoder CADR to output NCA based on the decoded information, or alternatively, to declare all accesses non-cacheable, i.e. assert NCA regardless of the asserted address on the CPULBA. Finally, a third bit indicates, in one state, that address space between 0 and 8 megabytes is cacheable and that address space between 8 to 16 megabytes is non-cacheable. In the other state, the bit indicates that address space in the range 0 to 16 megabytes is all cacheable. If desired, the effect of this bit can be multiplied by simply reflecting the status of the first 16 megabytes through the succeeding 16 megabyte ranges. Of course, the use of these particular cacheability indicators (or others) is not essential to the invention. It is, however, important to arrange the CADR so that it can quickly decode its inputs and produce NCA. In an embodiment actually constructed, the timing window for production of NCA, from valid addresses on the CPULBA, was very short, on the order of ten nanoseconds.

A further major difference between FIGS. 3 and 4 is that the typical MAD now generates not a single BS16 signal (as is the case in FIG. 3), but two signals, CS32 and CS16. This enables the MAD to identify itself as a 32-bit device, a 16-bit device or an 8-bit device. The signal CS32 is input to a flip-flop FF which is now used to generate the BS16 signal which is in one condition in the event the MAD is a 32-device and in a different condition if the MAD is an 8-bit or a 16-bit device. If neither CS32 nor CS16 is generated by the device, then by default the device is an 8-bit device.

The logic equations related to the generation of CPUNA are reproduced immediately below. In this material the symbols have the following meanings associated with them:

______________________________________ Symbol Definition ______________________________________ / Negation := A registered term, equal to = A combinatorial term, equal to & Logical AND + Logical OR ______________________________________ ##EQU1##

In the foregoing logic equations the following signals are described or referred to in the cited Intel publications:

BADS

BREADY

(BW/R) actually referred to as BW/R, the parenthesis are used to indicate that the entire term is one signal

CLK

(W/R) actually referred to as W/R, the parenthesis are used to indicate the entire term is one signal

BADS, when active indicates a valid address on the system bus 250. BREADY is a ready signal from the system bus 250 to the CPU local bus 230. BW/R defines a system bus 250 Write or Read. CLK is a processor clocking signal which is in phase with the processor 225. (W/R) is the conventional Write or Read signal for the CPU local bus 230.

Equations (1)-(5) define:

BT2

BUSCYC385

CPUNA

MISSl

PIPECYC385

in terms of the defined signals, the signals described or referred to in the cited Intel publications and NCA and NACACHE.

BT2 reflects the state of the system bus 250. The state BT2 is a state defined in the cited Intel publications.

BUSCYC385 also reflects the state of the system bus 250. It is high for bus states BTI, BT1 BT1P and low for bus states BT2, BT2P and BT2I (again these are bus states referenced in the cited Intel publications).

CPUNA is a signal to the 80386 allowing pipelined operation. This is the signal which replaces the NA signal from the 82385.

MISS1 is active defining the first cycle in a double cycle for handling 64 bit read to cacheable devices.

PIPECYC385 is active during BT1P (which is a bus state referred to in the cited Intel publications.

NCA is a signal created by decoding the address component on the CPU local bus 230 to reflect, when active, a non-cacheable access. Cacheability is determined by a tag component (A31 to A17) and programmable information defining what tags (if any) refer to cacheable as opposed to non-cacheable addresses.

NACACHE is a signal similar to the BNA signal. BNA is a system generated signal requesting a next address from the CPU local bus 230, and is referenced in the cited Intel publications. NACACHE differs from BNA only in respect of the fact that BNA is created for 32K cache while NACACHE is created for a 64K cache. So long as the cache memory is 32K, as cited in the Intel publications the NACACHE signal referred to here could be replaced by the BNA signal.

Reviewing Equation 3, the set terms are found in lines 1 and 2. The term on line 1 indicates a pipelined operation. /MISS1 indicates a 32-bit operation (as will be described) and so CPUNA is appropriate. The term on line 2 is a cycle that ends early without pipelining, e.g. /BREADY. Referring to Equation 4, the terms on lines 1 and 2 are the set terms. Both depend on NCA.

FIG. 5 is a detail of apparatus associated with the buffer and decoder 267 (associated with bus 270) and buffer 253 (associated with a Micro-Channel (TM) bus 320). As has already been described, the apparatus of FIG. 4 provides for two 16-bit cycles when 32-bit cycles are directed to 8- or 16-bit devices, and does so in a manner transparent to the 82385. The apparatus of FIG. 5 is employed so as to generate, for each 16-bit cycle directed to an 8-bit device, two 8-bit cycles, in a manner which is transparent to the 80386.

More particularly, the apparatus includes control logic CLA, a sequencer S and a controlled coupler ALS which couples selective ones of the data lines to others of the data lines. More particularly, as shown in FIG. 5, the low order eight bits (D0-D7) can be coupled to the second 8-bit segment (D8-D15) under the control of the signal DIR and /G. Further, the direction of transfer (left to right or vice versa) is also controlled by DIR.

More particularly, in any 16-bit cycle generated by the 80386, data is placed on the low order sixteen bits of the data bus. In order to convert a 16-bit cycle generated by the 80386 into two 8-bit cycles for the appropriate MAD, the hardware shown in FIG. 5 forces the 16-bit operation to look like two 8-bit operations to the 8-bit device MAD while to the 80386 the two 8-bit cycles look like a single 16-bit cycle.

The control logic CLA performs a decode function for the conversion cycle control. It detects when a conversion cycle must occur and controls the GATE 245 and DIR 245 signals of the ALS which connect the low order eight bits of the data bus to a second 8-bit segment of the data bus. The logic equations for CLA are listed below. Once the conversion cycle conditions have been detected, the signal DATACONV activates the sequence. The signal CS16, when inactive, signals presence of an 8-bit device and hence the necessity for the conversion. The signal CS16, when active, indicates a 16- or 32-bit device. Q1 is used in the sequence to indicate the end of the first half of the conversion cycle. The signals DATACONV and Q1 are input to the sequencer S. The sequencer S performs two functions. One is to hold the 80386 not ready until the second half of the cycle has been initiated. ARDYEN is driven to a logic 1 to signal the 80386 not to end the cycle. The sequencer also stimulates the end of one 8-bit cycle and the beginning of a second. ALTALE clocks a logic 1 onto the A0 address line and causes the wait state generator to restart. CNTLOFF causes the active command signal to go inactive and active again to stimulate the end of one cycle and the beginning of another. CONVA0 is used as an input to the A0 address latch to force A0 to a logic 1 for the second half of the conversion cycle.

The completed 16- to 8-bit conversion cycle is equivalent in time and function to two consecutive 8-bit cycles.

The signals referred to above and shown in FIG. 5 are defined as follows:

______________________________________ MEMCS16 16 bit memory device select (I/OCS16) 16 bit I/O device select. Parenthesis are used to indicate the expression relates to a single signal AEN1, AEN2 Address enables from DMA controllers MEMR, MEMW, CPU command signals IOR, IOW XBHE Bus high enable, indicates when data should appear on a higher order eight bit segment of the data bus XA0 Least significant address order line Q1 Wait state generator output (2d state output) D0-D7 Lowest order eight bit segment of the data bus D8-D15 Second eight bit segment of the data bus ARDYEN Asynchronous ready enable to ready synchronization logic. ADDS wait states in conversion cycle to provide for the generation of two 8 bit cycles. ALT ALE Alternate address latch. (active high) User to clock A0 to a logic one and to restart the wait state generator for second 8 bit transfer CONV.A0 Conversion cycle A0. (active high) Provides a logic one to the A0 address latch so that A0 will increment for the second 8 bit cycle CNTLOFF Control off. Turns off the active command signal for 3 clocks to simulate the end of one 8 bit cycle and the start of the second 8 bit cycle. CNTLOFF is also used to latch the low order data bus to the CPU (during a read cycle) on the first 8 bit transfer. An input to the system bus interface unit. ______________________________________ ##EQU2##

FIG. 7 shows a detail of the latch DL, the data input/output connections between the system bus 250 and the CPU local bus 230 as well as the control signal LEDMA. FIG. 6 relates the signal LEDMA to other signals on the CPU local bus 230 (indicated in the top third of FIG. 6 with the legend "82385 Local Bus Signals & 25 MHz"), signals on the optional feature bus 320 (indicated by the reference "Micro-Channel (TM)") bus 320 signals and signals on the system bus 250 (indicated in the bottom third by the legend "DMA Signals"). As indicated in the upper third of FIG. 6, the CPU local bus 230 is a synchronously clocked bus (clocks CPUCLK2 and CPUCLK are related as indicated). The central portion (comprising the major portion of FIG. 6) shows a single cycle on the CPU local bus 230. The beginning of the cycle is indicated by the left-most vertical line, and the end of the cycle is indicated by the right-most vertical line. As shown in FIG. 6 in the middle third, once a cycle has begun on the CPU local bus 230, for a cycle which extends through the system bus 250 to the bus 320, the command signals SS0* and SS1* become active (low). Shortly thereafter, an address is latched into the buffer 253 (referenced ADL*). Subsequently, the signal CMD* also becomes active (low). It is this point which defines the beginning of the DMA cycle on the system bus 250 and the optional feature bus 320. This is indicated by a continuation of the third vertical line (from the left) extending into the lower third of FIG. 6. Shortly after the beginning of the DMA cycle, the signals DMAS0 and DMAS1 become active (low). As seen in FIG. 6, the signal DMARDY* subsequently changes state. On the next transition in DMARDY* (the low going transition), the required data is located on the system bus 250. Accordingly, the signal LEDMA changes state (rising transition). This has the effect of latching the data on the system bus 250 into the latch DL. The same transition which produced the change of state in LEDMA also has an effect on BREADY* (as shown by the arrow in FIG. 6). In other words, once DMARDY* is sampled low, the required data has already latched into the latch DL and the DMA channel has released the bus 250. Thus as shown in FIG. 6, the transition in DMARDY* leads to a transition in BREADY*. On the next subsequent high going transition of BT2, the CPU cycle can safely end (and this is the right-most vertical line in FIG. 6). As seen in FIG. 6, ending the CPU cycle produces a change of state in CMD* (on the Micro-Channel (TM) bus 320) and also has the effect of releasing LEDMA, since at the termination of the cycle the CPU 225 accepted the data from the latch DL.

The foregoing illustrates that antecedent to terminating the CPU cycle, and a requirement for terminating the CPU cycle, is the prior termination of the DMA cycle. This guarantees that when the CPU initiates a following cycle (to the right of the right-most vertical line in FIG. 6), the DMA cycle had earlier completed and so the bus is free of any data that may have been driven during the course of the DMA cycle.

Thus the invention provides an accommodation between dynamic bus sizing (which provides the flexibility essential to the Micro-Channel (TM) bus 320) and pipelining operation through selected generation of CPUNA, at the appropriate time. As has been indicated, CPUNA depends either on detection of a cacheable access, where CPUNA is produced prior to completion of the pending cycle. On the other hand, where a non-cacheable access is detected, then pipelining is withheld, i.e. CPUNA is not asserted until the system bus 250 indicates that the cycle has completed (BREADY is asserted).

At the same time, an accommodation is created between the synchronous DMA mechanism and the synchronous (but driven by a different clock) CPU local bus through the buffer DL. While the DMA cycle is initiated by the CPU local bus 230, the cycle on the CPU local bus 230 will not terminate until after the termination of the DMA cycle.

While a particular embodiment of the invention has been described herein, it should be apparent that the invention can be embodied in many varied forms. Accordingly, the invention is not to be construed by the example described herein but rather is to be construed in accordance with the attached claims.

* * * * *