U.S. patent application number 10/212682, for a multiple-mode memory system, was filed with the patent office on 2002-08-07 and published on 2003-01-23.
Invention is credited to Elkhlifi, Fatima-Zahra; Lalama, Mustapha; Litaize, Daniel; Mzoughi, Abdelaziz; Sainrat, Pascal; Salinier, Jean-Claude.
Application Number: 20030018880 (10/212682)
Family ID: 21822481
Publication Date: 2003-01-23
United States Patent Application 20030018880
Kind Code: A1
Litaize, Daniel; et al.
January 23, 2003
Multiple-mode memory system
Abstract
A system includes a first and second integrated circuit and
associated interconnect. The first integrated circuit is a memory
component and includes a RAM, one or more configuration registers,
and associated control logic. The second integrated circuit is a
memory controller that is a master to the memory component. The
behavior of the memory component, including selection from a number
of different operating modes, is controllable via configuration
register mode bits. The various modes include several
transfer-length modes, where each mode corresponds to data
transfers of a predetermined length. Based on the mode selection
specified by the mode bits, the controller determines the length of
the data transfers.
Inventors: Litaize, Daniel (Saint Orens de Gameville, FR); Salinier, Jean-Claude (Ramonville Saint-Agne, FR); Mzoughi, Abdelaziz (Toulouse, FR); Elkhlifi, Fatima-Zahra (Toulouse, FR); Lalama, Mustapha (Toulouse, FR); Sainrat, Pascal (Toulouse, FR)
Correspondence Address:
WALSTEIN BENNETT SMITH III
P.O. BOX 1668
GEORGETOWN, TX 78628, US
Family ID: 21822481
Appl. No.: 10/212682
Filed: August 7, 2002
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
10/212682 | Aug 7, 2002 |
10/022421 | Dec 20, 2001 |
09/577447 | May 24, 2000 | 6,345,321
08/024803 | Mar 1, 1993 | 6,112,287
07/400113 | Aug 14, 1989 |

(Each application in this table is a continuation of the one listed below it.)
Current U.S. Class: 712/11; 711/E12.025
Current CPC Class: G06F 15/8015 20130101; G06F 12/0813 20130101; G11C 7/1036 20130101
Class at Publication: 712/11
International Class: G06F 015/00
Foreign Application Data

Date | Code | Application Number
Dec 14, 1987 | FR | 87.18103
Dec 9, 1988 | | PCT/FR88/00608
Claims
1. A multiprocessor system, of the type comprising a central memory
(RAM) organized in blocks of information (bi), treatment processors
(CPU.sub.1 . . . CPU.sub.j . . . CPU.sub.n), a cache memory
(MC.sub.j) connected to each treatment processor (CPU.sub.j) and
organized in blocks of information (bi) of the same size as those
of the central memory, a directory (RG.sub.j) and its management
processor (PG.sub.j) associated with each cache memory (MC.sub.j),
means for communicating addresses of blocks between processors
(CPU.sub.j) and the central memory (RAM), said multiprocessor
system being characterized in that it is provided with: an assembly
of shift registers, called memory registers (RDM.sub.1 . . .
RDM.sub.j . . . RDM.sub.n), each register (RDM.sub.j) of this
assembly being connected to the central memory (RAM) in such a
manner as to permit, in one cycle of this memory, a parallel
transfer in read or write of a block of information (bi) between
said register and said central memory, shift registers, called
processor registers (RDP.sub.1 . . . RDP.sub.j . . . RDP.sub.n),
each processor shift register (RDP.sub.j) being connected to the
cache memory (MC.sub.j) of a processor (CPU.sub.j) in such a manner
as to permit a parallel transfer in read or write of a block of
information (bi) between said shift register (RDP.sub.j) and said
cache memory (MC.sub.j), an assembly of series links (LS.sub.1 . .
. LS.sub.j . . . LS.sub.n), each connecting a memory shift register
(RDM.sub.j) and a processor shift register (RDP.sub.j) and adapted
to permit the transfer of blocks of information (bi) between the
two registers considered (RDM.sub.j, RDP.sub.j).
2. A multiprocessor system as in claim 1, characterized in that:
each memory shift register (RDM.sub.j) and each processor shift
register (RDP.sub.j) are doubled into two registers, one
specialized for transfer in one direction, the other for transfer
in the other direction, each series link (LS.sub.j) comprising two
unidirectional series links for transfer bit by bit, connecting the
doubled memory shift register (RDM.sub.j) and the corresponding
doubled processor shift register (RDP.sub.j), these links being
connected to said registers for permitting, for one, a transfer in
one direction, and for the other a transfer in the other
direction.
3. A multiprocessor system as in claim 1, characterized in that
each series link (LS.sub.j) comprises a bidirectional link for bit
by bit transfer, connected to the memory shift register (RDM.sub.j)
and to the corresponding processor shift register (RDP.sub.j) and a
validation logic (LV) for the direction of transfer in such a
manner as to permit an alternate transfer in the two
directions.
4. A multiprocessor system as in one of claims 1, 2 or 3,
characterized in that the address communication means comprises a
common bus for parallel communication of addresses of blocks (BUSA)
connecting the processors (CPU.sub.j) and the central memory (RAM)
and a bus arbitrator (AB) adapted to manage access conflicts on
said bus.
5. A multiprocessor system as in one of claims 1, 2 or 3,
characterized in that the address communication means comprises a
complementary shift register (RDC.sub.j) connected on each series
link (LS.sub.j) in parallel with the corresponding memory shift
register (RDM.sub.j) in such a manner as to permit the transmission
of addresses by the series links and their loading in said
complementary shift registers (RDC.sub.j), an access management
arbitrator (ABM) being connected to said complementary shift
registers (RDC.sub.j) and to the central memory (RAM) in order to
select the addresses contained in said registers (RDC.sub.j) and to
manage the access conflicts to the central memory (RAM).
6. A multiprocessor system as in one of claims 1, 2, 3, 4 or 5,
comprising means for managing the data shared between processors,
in order to assure their coherence.
7. A multiprocessor system as in claim 6, characterized in that the
means for managing the shared data comprises: a special
parallel communication bus for words (BUSD) connecting the
processors (CPU.sub.j) and the central memory (RAM), a partition
logic (LP.sub.j), associated with each processor (CPU.sub.j) and
adapted to differentiate the addresses of shared data and those of
non-shared data in such a manner as to transmit the same on the
address communication means with their identification, a decoding
logic (DEC) associated with the central memory (RAM) and adapted to
receive the addresses with their identification and direct the data
in the memory output either to the corresponding memory shift
register (RDM.sub.j) for non-shared data, or to the special word
communication bus (BUSD) for shared data.
8. A multiprocessor system as in claim 6, characterized in that the
shared data management means comprises, for one part, a special
parallel word communication bus (BUSD) and a special common bus for
communication of word addresses (BUSAM) connecting the processors
(CPU.sub.j) and the central memory (RAM), and for the other part a
partition logic (LP.sub.j) associated with each processor
(CPU.sub.j) and adapted to differentiate the addresses of shared
data and those of non-shared data, in such a manner as to direct
the former to the special common bus (BUSAM) and the latter to the
means for communication of block addresses.
9. A multiprocessor system as in claims 4 and 6 taken together,
characterized in that the shared data management means comprises a
memory management processor (PGM) associated with the central
memory (RAM) and a snooper processor (PE.sub.j) with a bus
associated with each treatment processor (CPU.sub.j) and the
corresponding management directory (RG.sub.j), each snooper
processor (PE.sub.j) with a bus and the memory management processor
(PGM) being connected to the address communication bus (BUSA) in
order respectively to supervise and treat the addresses of blocks
transmitted on said bus in such a manner as to permit an updating
of the central memory (RAM) and of the associated cache memory
(MC.sub.j) in case of detection of a block address present in the
associated directory.
10. A multiprocessor system as in claims 4 and 6 taken together,
characterized in that the shared data management means comprises a
memory management processor (PGM) associated with the central
memory (RAM) and a coherence maintenance processor for shared data
(PMC.sub.j) associated with each treatment processor (CPU.sub.j)
and with the corresponding management directory (RG.sub.j), each
coherence maintenance processor (PMC.sub.j) being connected to a
synchronization bus (SYNCHRO) controlled by the memory management
processor (PGM) in such a manner as to permit an updating of the
central memory (RAM) and of the associated cache memory (MC.sub.j)
in case of detection of a block address, an updating of the central
memory (RAM) and of the cache memories (MC.sub.j) upon each
selection of addresses on the common address bus BUSA.
11. A multiprocessor system as in claims 5 and 6 taken together,
characterized in that the shared data management means comprises a
memory management processor (PGM) associated with the central
memory (RAM) and a shared data coherence maintenance processor
(PMC.sub.j) associated with each treatment processor (CPU.sub.j)
and with the corresponding managing directory (RG.sub.j), each
coherence maintenance processor (PMC.sub.j) being connected to a
synchronization bus (SYNCHRO) controlled by the memory management
processor (PGM) in such a manner as to permit an updating of the
central memory (RAM) and of the associated cache memory (MC.sub.j)
in case of detection of a block address, an updating of the central
memory (RAM) and of the cache memories (MC.sub.j) upon each address
selection in the complementary shift registers (RDC.sub.j).
12. A multiprocessor system as in one of claims 1 to 11,
characterized in that: several processor shift registers
(RDP.sub.k, RDP.sub.k+1 . . . ) corresponding to a given assembly
of processors (CPU.sub.k, CPU.sub.k+1 . . . ) are connected in
parallel to a same series link (LS.sub.k), a local arbitrator being
associated with each assembly of processors (CPU.sub.k, CPU.sub.k+1
. . . ) in order to arbitrate access conflicts to the series link
(LS.sub.k), a memory management processor is connected to block
address communication means and to the central memory (RAM) and
comprises coding means adapted to associate with each block of
information (BI) a leading identification of the processor
concerned among each assembly (CPU.sub.k, CPU.sub.k+1 . . . )
sharing a series link (LS.sub.k), management processors (PG.sub.k,
PG.sub.k+1 . . . ) associated with the cache memories (MC.sub.k,
MC.sub.k+1 . . . ) of the processors of said assembly (CPU.sub.k,
CPU.sub.k+1 . . . ) comprising decoding means for the leading
identification.
13. A multiprocessor system as in one of claims 1 to 12,
characterized in that each memory shift register (RDM.sub.j) is
connected in a static manner to a series link (LS.sub.j)
specifically provided to said register.
14. A multiprocessor system as in one of claims 1 to 12,
characterized in that: a memory management processor (PGM) is associated with the central memory (RAM) and comprises a periodic allocation logic (ALLOC) for assigning the memory shift registers to the series
links, the memory shift registers (RDM, . . . RDM.sub.j, . . .
RDM.sub.n) are connected in a dynamic manner to the series links
(LS.sub.1 . . . LS.sub.j) by the intervention of an interconnection
network (RI) controlled by the memory management processor
(PGM).
15. A multiprocessor system as in one of claims 1 to 14, in which
the central memory (RAM) is constituted by m memory banks
(RAM.sub.1 . . . RAM.sub.p . . . RAM.sub.m) arranged in parallel,
characterized in that each memory shift register comprises m
elementary shift registers (RDM.sub.j1 . . . RDM.sub.jp . . .
RDM.sub.jm) connected in parallel to the corresponding series link
(LS.sub.j), each elementary register (RDM.sub.jp) being connected
to a memory bank (RAM.sub.p) in such a manner as to permit, in one
cycle of said memory bank, a parallel transfer in reading or
writing of a block of information (bi) between said elementary
register and said memory bank.
16. A multiprocessor system as in claim 15, in which each series
link (LS.sub.j) is divided into m series links (LS.sub.jp),
connecting point to point each processor (CPU.sub.j) to the
elementary shift register (RDM.sub.jp).
17. A multiprocessor system as in claim 15, in which each memory
bank (RAM.sub.p) is of the random access type provided with an
input/output for data of a size corresponding to a block of
information (bi), characterized in that said input/output for each
memory bank (RAM.sub.p) is connected by a parallel bus to the
assembly of elementary registers (RDM.sub.1p . . . RDM.sub.jp).
18. A multiprocessor system as in one of claims 15, 16 or 17,
synchronized by a clock of a frequency F at least equal to 100
megahertz, characterized in that each elementary memory shift
register (RDM.sub.jp) and each processor shift register (RDP.sub.j)
are of a type adapted to present a shift frequency at least equal
to F.
19. A multiprocessor system as in one of claims 15, 16 or 17,
synchronized by a clock of a frequency F at least equal to 100
megahertz, characterized in that each elementary memory shift
register and/or each processor shift register comprises an assembly
of 2.sup.u multiplexed sub-registers (RDM.sub.jp, RDP.sub.jp), each
able to present a shift frequency at least equal to F/2.sup.u.
20. A process for the exchange of information between a central
memory (RAM) organized in blocks of information (bi) and processors
(CPU.sub.1 . . . CPU.sub.j . . . CPU.sub.n) each provided with a
cache memory (MC.sub.j) organized in blocks of the same size (bi),
and a directory (RG.sub.j) and its management processor (PG.sub.j),
in such a manner that the exchange between the central memory (RAM)
and each processor (CPU.sub.j) is carried out via the cache memory
(MC.sub.j) of the latter, said process being characterized in that
each transfer of a block of information (bi) from the central
memory (RAM) to the cache memory (MC.sub.j) of a given processor
consists of: transferring in one cycle of the central memory, the
block (bi) of said central memory (RAM) to a memory shift register
(RDM.sub.j) of the size of one block, making part of an assembly of
shift registers (RDM.sub.1 . . . RDM.sub.j . . . RDM.sub.n)
connected to the central memory, transferring on a series link
(LS.sub.j) the contents of the memory shift register (RDM.sub.j) to
a processor shift register (RDP.sub.j) of the same capacity,
associated with the cache memory (MC.sub.j) of the processor
considered (CPU.sub.j), transferring the contents of said processor
shift register (RDP.sub.j) to said cache memory (MC.sub.j).
21. A process for the exchange of information between a central
memory (RAM) organized in blocks of information (bi) and processors
(CPU.sub.1 . . . CPU.sub.j . . . CPU.sub.n) each provided with a
cache memory (MC.sub.j) organized in blocks of the same size (bi),
and a directory (RG.sub.j) and its management processor (PG.sub.j),
in such a manner that the exchange between the central memory (RAM)
and each processor (CPU.sub.j) is carried out by the cache memory
(MC.sub.j) of the latter, said process being characterized in that
each transfer of a block of information (bi) from the cache memory
(MC.sub.j) of a given processor (CPU.sub.j) to the central memory
consists of: transferring the block (bi) of said cache memory
considered (MC.sub.j) to a processor shift register (RDP.sub.j) of
the size of one block, associated with said cache memory
(MC.sub.j), transferring on a series link (LS.sub.j) the contents
of the processor shift register (RDP.sub.j) to a memory shift
register (RDM.sub.j) of the same capacity, provided to the
processor considered in an assembly of shift registers (RDM.sub.1 .
. . RDM.sub.j . . . RDM.sub.n) connected to the central memory
(RAM), transferring in one cycle of the central memory the contents
of the memory shift register (RDM.sub.j) to said central memory
(RAM).
22. A multiport series memory component, capable of equipping a
multiprocessor system according to one of claims 1 to 19,
characterized in that it comprises an integrated circuit comprising
a random access memory (RAM) of a predetermined size corresponding
to a block of information (bi), an assembly of shift registers
(RDM.sub.1 . . . RDM.sub.j . . . RDM.sub.n), each of a capacity
corresponding to the size of the memory, an internal parallel bus
(BUSI) connecting the access of the memory and the shift registers,
a shift register selection logic (LSR) adapted to validate the link
on the internal bus between the memory and a predetermined shift
register, and an assembly of external input/output pins (adbloc,
admot, numreg, cs, wr, rd, bitbloc, normal/config, hi, di) for the
input of addresses to the memory (RAM), for the input of addresses
to the selection logic (LSR), for the input and the validation of
transfer commands in reading or writing of a block of information
(bi) between the memory (RAM) and the shift registers (RDM.sub.j),
for the input of a clock signal to each shift register (RDM.sub.j)
, for the bit by bit input of a block of information (bi) to each
shift register (RDM.sub.j) and for the bit by bit output of a block
of information from each shift register (RDM.sub.j).
23. A component as in claim 22, characterized in that it comprises
at least one configuration register (RC.sub.1, RC.sub.2 . . . )
having programming inputs, each configuration register being
connected to a forcing logic (LF) connected to the memory (RAM) and
to the shift registers (RDM.sub.j) in order to assure the forcing
of states of said memory and of said shift registers.
24. A component as in claim 23, permitting the choice of the size
of blocks of information (bi) treated, characterized in that: the
memory (RAM) is divided into zones combinable for permitting the
storage of various possible sizes of blocks of information, each
shift register (RDM.sub.j) is divided into combinable legs for
permitting the loading of various possible sizes of blocks of
information, with branches able to assure the corresponding shift
for each size, the internal bus (BUSI) is provided with a
multiplexing logic (MT) for permitting the transfer of blocks of
information (bi) of various sizes between the combinations of zones
of the memory (RAM) and the corresponding combinations of legs of
the shift registers (RDM.sub.j), a configuration register
(RC.sub.1) is provided with a capacity corresponding to the number
of sizes of blocks possible, the forcing logic (LF) connected to
the register (RC.sub.1) comprises a logic unit adapted to control
the multiplexing logic (MT) in order to validate the transfers of
blocks of information (bi) in a given size corresponding to the
parameter contained on the configuration register (RC.sub.1).
25. A component as in one of claims 23 or 24, characterized in
that: the input and the output of each shift register (RDM.sub.j)
are connected to a same external pin by the intermediary of a logic
gate (PL.sub.j), a configuration register (RC.sub.2) is provided of
a capacity corresponding to the number of shift registers
(RDM.sub.j), the forcing logic (LF) connected to the configuration
register (RC.sub.2) comprises a logic unit adapted to control the
logic gates (PL.sub.j) in order to force the operation of each
shift register (RDM.sub.j) in the input mode or in the output mode
as a function of a bit contained in the configuration register
(RC.sub.2) provided to said shift register (RDM.sub.j).
26. A component as in one of claims 23, 24 or 25, characterized in
that: the input and the output of each shift register (RDM.sub.j)
are connected to the same external pin by the intermediary of a
logic gate (PL.sub.j), a configuration register (RC.sub.3) is
provided with a capacity corresponding to the number of shift
registers (RDM.sub.j), the forcing logic (LF) connected to the
configuration register (RC.sub.3) comprises a logic unit connected
to control the reading of the memory (RAM) and adapted to control
each logic gate (PL.sub.j) either in the output mode at the moment
of reading of the memory (transfer of the memory RAM to the
corresponding register RDM.sub.j) during the entire duration of the
emptying of said shift register (RDM.sub.j), or in the input mode
the remainder of the time.
27. A component as in one of claims 23, 24, 25 or 26, characterized
in that it comprises an external input pin (bit/bloc), one or more
external data input/output pins, a control logic (COM) connected to
the input pin (bit/bloc), to the data input/output pins, to the
memory (RAM) and to the selection logic (LSR) and adapted according
to the state of the input (bit/bloc) for generating either the
transfers of blocks of information (bi) between memory (RAM) and
shift registers (RDM.sub.j), or the transfers of bits directly
between the memory (RAM) and the data pins.
28. A component as in claims 23 and 27 taken together characterized
in that the configuration registers (RC.sub.1, RC.sub.2 . . . ) are
connected: to the selection logic (LSR) which is adapted for
selecting said configuration registers for the predetermined
addresses affected by said registers, to the control logic (COM)
which is adapted for transmitting the data originating from the
data input/output pins to said configuration registers for their
programming.
29. A component as in one of claims 22 to 28, in which on the
internal bus (BUSI) connecting the access of the memory (RAM) and
the shift registers (RDM.sub.j) is interposed a barrel shifter type
logic (BS) adapted to assure a circular permutation on the bits of
each block of information, said logic (BS) having a control input
for the shift step, in word units, connected to the input pins (admot).
Description
[0001] The invention relates to a multiprocessor system of the type
comprising a central memory, treatment processors and cache
memories associated with treatment processors. It also relates to a
process for the exchange of information between central memory and
treatment processors via the cache memory associated with each of
these processors. It also provides a new integrated circuit
component, capable of equipping the multiprocessor system.
[0002] It is known that, in the most common known multiprocessor
systems, all the information (data, address) is relayed by a common
parallel communication bus between the central memory and the
various treatment processors, which constitutes a bottleneck: its
transfer rate is in effect insufficient to feed all the processors
at full efficiency from a common central memory.
[0003] For increasing the information transfer rate, a first
solution consists in associating with each treatment processor a
cache memory which, by the location of the information, permits
reducing the demands on the central memory. However, in the case in
which the volume of data shared between processors is substantial,
the maintenance of coherence of the data between memories generates
complementary information traffic on the communication bus which
prevents any significant reduction of the overall flow on this bus, and therefore removes a large part of the interest of this solution.
[0004] Another solution consists in providing the communication bus
in the form of a grid network designed as a "crossbar", which
permits a direct communication between each treatment processor and
each subassembly of the central memory (memory bank). However, this
solution is very heavy and very costly to achieve because of the
very great number of interconnections, and it becomes completely
unrealistic beyond about ten treatment processors. Moreover, in the
case of multiple demands of several processors on the same memory
bank, such a solution implies access conflicts, a source of slowing
up the exchanges.
[0005] Another solution, more common by reason of its architectural simplicity, consists in associating a local memory with each
treatment processor for storing specific data therein, and storing
the transferred data in the common central memory. However, the
great deficiency of this architecture is its non-transparency, that
is, the need for the programmer to organize the detail of the
allocation of data in the various memories, such that this solution
is of a very constrained usefulness. Moreover, in the case of high
volume of transferred data, it may lead as before to a saturation
of the access bus in the central memory.
[0006] A solution which has been called the "Aquarius architecture" has been proposed by the University of Berkeley and consists in improving the aforementioned crossbar solution by associating, for the non-shared data, cache memories which are connected to the crossbar network, and, for the shared data, distinct cache memories which are connected to a common
exchange but remains very heavy and very costly to achieve.
[0007] The present invention seeks to provide a new solution,
permitting considerably increasing the flow rate of information
exchange, while retaining an architecture which is transparent for
the user, much simpler than the crossbar architecture.
[0008] An object of the invention is thus to permit notably
increasing the number of treatment processors of the system, while
benefitting from a high efficiency for each processor.
[0009] Another object is to provide a structure of an integrated
circuit component, permitting a very simple realization of the
architecture of this new multiprocessor system.
[0010] To this end, the multiprocessor system provided by the
invention is of the type comprising a central memory (RAM)
organized in blocks of information (bi), treatment processors
(CPU.sub.1 . . . CPU.sub.j . . . CPU.sub.n), a cache memory (MC.sub.j) connected to each treatment processor (CPU.sub.j) and organized in blocks of information (b.sub.i) of the same size as those of the central
memory, a directory (RG.sub.j) and its management processor
(PG.sub.j) associated with each cache memory (MC.sub.j), means for
communication of addresses of blocks between processors (CPU.sub.j)
and a central memory (RAM). According to the present invention, the
multiprocessor system is provided with:
[0011] an assembly of shift registers, termed memory shift
registers (RDM1 . . . RDM.sub.j . . . RDMn), each register
(RDM.sub.j) of this assembly being connected to the central memory
(RAM) in such a manner as to permit, in one cycle of this memory, a
parallel transfer in read or write of a block of information
(b.sub.i) between said register and said central memory;
[0012] shift registers, termed processor shift registers (RDP1 . .
. RDP.sub.j . . . RDPn), each processor shift register (RDP.sub.j)
being connected to the cache memory (MC.sub.j) of a processor
(CPU.sub.j) in such a manner as to permit a parallel transfer in
reading or writing of a block of information (b.sub.i) between said
shift register (RDP.sub.j) and said cache memory (MC.sub.j);
[0013] an assembly of series links (LS.sub.1 . . . LS.sub.j . . .
LS.sub.n), each connecting a memory shift register (RDM.sub.j) and
a processor shift register (RDP.sub.j) and adapted to permit the
transfer of blocks of information (b.sub.i) between the two
registers considered (RDM.sub.j, RDP.sub.j).
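By way of illustration only, the architecture enumerated above can be sketched as data structures. The following minimal C model is not part of the original disclosure; the block size, processor count, memory capacity, and cache frame count are all assumptions:

```c
#include <stdint.h>

#define BLOCK_BITS  512                 /* assumed size of a block (bi)    */
#define BLOCK_BYTES (BLOCK_BITS / 8)
#define N_CPUS      8                   /* assumed number of processors n  */
#define N_BLOCKS    1024                /* assumed central memory capacity */
#define N_FRAMES    64                  /* assumed cache frames per MC_j   */

/* A block-wide shift register: loaded or unloaded in parallel in one
 * memory cycle, and shifted bit by bit over the serial link. */
typedef struct {
    uint8_t  bits[BLOCK_BYTES];
    unsigned shift_pos;                 /* next bit to go onto the link */
} shift_reg;

/* Central memory (RAM) plus its memory shift registers RDM_1..RDM_n. */
typedef struct {
    uint8_t   blocks[N_BLOCKS][BLOCK_BYTES];
    shift_reg rdm[N_CPUS];
} multiport_memory;

/* One central unit: cache memory MC_j and processor register RDP_j.
 * The series link LS_j is implicit in the pairing of rdm[j] and rdp. */
typedef struct {
    uint8_t   cache[N_FRAMES][BLOCK_BYTES];
    shift_reg rdp;
} central_unit;
```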
[0014] Thus, in the multiprocessor system according to the
invention, the exchanges between the cache memories and the
associated processors are carried out as in the conventional
systems provided with cache memories. By contrast, the exchanges
between the central memory and the cache memories is carried out in
an entirely original manner.
[0015] Each transfer of an information block (b.sub.i) from the
central memory (RAM) to the cache memory (MC.sub.j) of a given
processor (CPU.sub.j) consists of:
[0016] transferring, in a cycle of the central memory, the block
(b.sub.i) of said central memory (RAM) to the memory shift register
(RDM.sub.j) (of the size of one block) which is directly connected
to the central memory and which corresponds to the processor
(CPU.sub.j) considered,
[0017] transferring on the corresponding series link (LS.sub.j) the
contents of this memory shift register (RDM.sub.j) to the processor
shift register (RDP.sub.j) (of the same capacity) which is
associated with the cache memory (MC.sub.j) of the processor
considered (CPU.sub.j),
[0018] transferring the contents of said processor shift register
(RDP.sub.j) to the cache memory (MC.sub.j).
[0019] In the opposite direction, each transfer of information
blocks (b.sub.i) from the cache memory (MC.sub.j) of a given
processor (CPU.sub.j) to the central memory (RAM) consists of:
[0020] transferring the block (b.sub.i) of said cache memory
considered (MC.sub.j) to the processor shift register (RDP.sub.j)
which is associated with said cache memory (MC.sub.j),
[0021] transferring on the corresponding series link (LS.sub.j) the
contents of the processor shift register (RDP.sub.j) to the memory
shift register (RDM.sub.j), allocated to the processor considered
among the assembly of shift registers (RDM.sub.1 . . . RDM.sub.j . . . RDM.sub.n) connected to the central memory (RAM),
[0022] transferring in a cycle of the central memory, the contents
of the memory shift register (RDM.sub.j) to said central memory
(RAM).
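The three-step transfer just described, and its inverse, can be sketched as functions over the illustrative model above (again an assumption-laden sketch: the bit-by-bit shifting over LS.sub.j is collapsed into a single copy):

```c
#include <string.h>

/* Memory -> cache direction, in the three steps of [0016]-[0018]. */
static void ram_to_rdm(multiport_memory *m, unsigned j, unsigned blk)
{
    /* one RAM cycle: parallel load of block bi into RDM_j */
    memcpy(m->rdm[j].bits, m->blocks[blk], BLOCK_BYTES);
}

static void rdm_to_rdp(multiport_memory *m, unsigned j, central_unit *u)
{
    /* serial link LS_j: bit-by-bit shift, modeled here as one copy */
    memcpy(u->rdp.bits, m->rdm[j].bits, BLOCK_BYTES);
}

static void rdp_to_cache(central_unit *u, unsigned frame)
{
    /* parallel unload of RDP_j into a frame of MC_j */
    memcpy(u->cache[frame], u->rdp.bits, BLOCK_BYTES);
}

/* The cache -> memory direction of [0020]-[0022] is the same three
 * steps run in the reverse order. */
```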
[0023] In these conditions, the transfer of each block of
information (b.sub.i) is carried out, no longer through a parallel
bus as is the case in the known systems, but by the series links of
high flow rate. These series links permit transfer times for each block (b.sub.i) that are comparable to, and even lower than, the transfer times in known parallel bus systems. The comparative example given hereinbelow, with typical parameter values for current technology, clearly illustrates this seemingly paradoxical fact.
[0024] It is assumed that each block of information (b.sub.i) is of
a size equal to 64 octets.
[0025] In the system of the invention, the transfer time between
the central memory and a cache memory breaks down into:
[0026] a central memory transfer time (RAM)/memory shift register
(RDM.sub.j): 100 nanoseconds (performance of a central random
access memory of known type),
[0027] a series transfer time on the corresponding series link:
64 x 8 x 1/(500 x 10^6) seconds, that is, 1024 nanoseconds, assuming a transfer frequency of 500 megahertz (not exceptional with current technology, which permits attaining frequencies of 3000 megahertz),
[0028] a processor shift register (RDP.sub.j)/cache memory (MC.sub.j) transfer time: 50 nanoseconds (cache memory of a very common type).
[0029] The total time of transfer of a block is therefore on the order of 1200 nanoseconds (integrating the second-order chaining delays).
[0030] In known systems with cache memories, in which the exchanges of information are carried out directly in parallel by words of 4 octets (the most current systems, leading to buses of the conventional type with 32 data lines), the transfer time for one block is equal to the transfer time of the 16 words of 4 octets which comprise this block, that is: 16 x 100 = 1600 nanoseconds.
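The arithmetic of this comparison can be checked with a short standalone computation, using only the figures given in the text:

```c
#include <stdio.h>

int main(void)
{
    const double block_bits   = 64 * 8;  /* 64-octet block           */
    const double ram_cycle_ns = 100.0;   /* RAM <-> RDM_j, one cycle */
    const double f_serial_hz  = 500e6;   /* serial link frequency    */
    const double cache_ns     = 50.0;    /* RDP_j <-> cache memory   */

    /* invention: parallel load + bit-serial transfer + parallel unload */
    double serial_ns = ram_cycle_ns
                     + block_bits / f_serial_hz * 1e9   /* = 1024 ns */
                     + cache_ns;

    /* conventional bus: 16 words of 4 octets at 100 ns each */
    double bus_ns = 16 * 100.0;

    printf("serial link: ~%.0f ns, common bus: %.0f ns\n",
           serial_ns, bus_ns);           /* ~1174 ns vs 1600 ns */
    return 0;
}
```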
[0031] Thus, it is seen that, with the average hypotheses in the
two solutions, these times are comparable. But, if one compares the
architecture of the system according to the invention with that of a common parallel bus with cache memories (the first solution mentioned previously), it will be realized that:
[0032] in the conventional solution (common parallel bus), the
central memory and the common bus are occupied at 100% during the
transfer, since the information circulates between the two for the
entire transfer time,
[0033] in the system according to the invention, the series link is occupied 100% during the transfer, but the central memory is occupied less than 10% of the transfer time (time of memory reading and loading of the memory shift register (RDM.sub.j)), such that
the central memory may serve 10 times more processors than in the
preceding case (the use of the series link being without
significance since it is private and directed to the
processor).
[0034] It is important to emphasize that in this system of the
invention, each series connection which connects each processor in
an individual manner to the central memory is a simple connection
(of one or two data leads), such that the series network thus constituted is not comparable, in overall complexity, with, for example, a crossbar network, of which each connection is a parallel connection with a multiplicity of leads (32 data leads in the comparative example above), with all of the necessary switches.
[0035] Further, as will be seen below on the comparative curves,
the system according to the invention has greatly improved
performance with respect to the traditional common bus systems and
permits in practice operating a much higher number of processors
(of several tens to a hundred processors). This performance is
compatible with that of a crossbar system, but the system,
according to the invention is of a much greater architectural
simplicity.
[0036] In the system of the invention, each series link may in
practice be achieved either by means of two unidirectional series
links for bit by bit transfer, or by means of a single
bidirectional series link.
[0037] In the first case, each memory shift register (RDM.sub.j)
and each processor shift register (RDP.sub.j) are divided into two
registers, one specialized for the transfer in one direction, the
other for the transfer in the other direction. The two
unidirectional series links are then connected to the divided
memory shift register (RDM.sub.j) and to the corresponding divided
processor shift register (RDP.sub.j), in such a manner as to
permit, for one, a transfer in one direction, and for the other, a
transfer in the other direction.
[0038] This embodiment with two unidirectional links presents the advantage of not requiring any transfer management on the link, but the drawback of doubling the necessary resources (link, registers).
[0039] In the second case, a validation logic of the transfer
direction is associated with the bidirectional link such as to
permit an alternate transfer in the two directions on said link.
This logic may be integrated in the management processor (PG.sub.j)
associated with the cache memory (MC.sub.j) to which said
bidirectional link is connected.
[0040] It will be understood that each series link may ultimately be made up of a higher number of series links.
[0041] In the multiprocessor system according to the invention, the address communication means may take essentially two forms of embodiment: in the first case, it may consist of a parallel communication bus for block addresses (BUSA), common to all of the processors (CPU.sub.j) and connecting the latter and the central memory (RAM) in a conventional manner, with a bus arbitrator (AB) adapted to manage access conflicts on said bus. It should be noted that this address bus is only utilized for the communication of addresses of blocks: structurally, this bus is identical to the parallel address communication bus of known systems, but no problem of saturation arises for it, since it is freed right after the transfer of each block address.
[0042] However, another embodiment of this address communication
means may be considered in the multiprocessor system of the
invention, consisting in using the series links, provided for the transfer of blocks of information (b.sub.i), to also transfer the addresses of these blocks.
[0043] In this case, a complementary shift register (RDC.sub.j) is connected to each series link (LS.sub.j) in parallel with the corresponding memory shift register (RDM.sub.j). The addresses
transmitted by said series link are thus loaded into each of these
complementary registers (RDC.sub.j). An access management
arbitrator connected to said registers (RDC.sub.j) and to the
central memory (RAM) is thus provided for selecting the addresses
contained in said registers and for managing the conflicts of
access to the central memory (RAM). Such an arbitrator is known in itself, this type of access conflict having been resolved for a number of years. In this embodiment, the presence of a parallel address communication bus is avoided, but the management resources are heavier.
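A hedged sketch of this address path follows, assuming a simple round-robin policy for the arbitrator (the text does not fix a selection policy, and the field names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Complementary shift register RDC_j: catches a block address sent
 * over the serial link LS_j, in parallel with RDM_j. */
typedef struct {
    uint32_t block_addr;
    bool     pending;       /* an address is waiting for the RAM */
} rdc_reg;

/* Access-management arbitrator (ABM): selects one pending address
 * per RAM cycle.  Returns the index of the processor served, or -1
 * when no request is pending. */
static int abm_select(rdc_reg rdc[], unsigned n, unsigned *next)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned j = (*next + i) % n;
        if (rdc[j].pending) {
            rdc[j].pending = false;
            *next = (j + 1) % n;
            return (int)j;
        }
    }
    return -1;
}
```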
[0044] Further, the multiprocessor system according to the
invention is particularly well suited for managing in an efficient
manner the problems of coherence of the data shared between
treatment processors. In effect, the conventional solutions for
managing these shared data find their limits in the known systems because of the bottleneck at the level of the communication of information, but become, on the contrary, perfectly satisfactory and efficient in the system of the invention, where such a bottleneck no longer exists, such that this system may be equipped with shared data management means of a concept analogous to that of known systems.
[0045] For example, one traditional solution of shared data
management consists in avoiding the relay of shared data by the
cache memories: in a conventional manner, a partition logic
(LP.sub.j) is associated with each treatment processor (CPU.sub.j)
in order to differentiate the addresses of the shared data and
those of the non-shared data, so as to direct the former directly toward the central memory (RAM) and the latter toward the
corresponding cache memory (MC.sub.j).
[0046] In a first version of the architecture according to the
invention, the system comprises:
[0047] a special bus for parallel communication of words (BUSD)
connecting the processors (CPU.sub.j) and the central memory
(RAM),
[0048] a partition logic (LP.sub.j) associated with each processor
(CPU.sub.j) and adapted to differentiate the addresses of the
shared data and those of the non-shared data in such a manner as to
transmit these addresses on the address communication means
with their identification,
[0049] a decoding logic (DEC) associated with the central memory
(RAM) and adapted to receive the addresses with their
identification and to direct the data into the memory output either
to the corresponding memory shift register (RDM.sub.j) for the
non-shared data, or to the special word communication bus (BUSD)
for the shared data.
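A minimal sketch of this partition and decoding behavior, assuming a hypothetical address-range test for "shared" (the text does not specify how the partition logic classifies addresses):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared-data region; the classification criterion used
 * by the partition logic LP_j is not given in the text. */
#define SHARED_BASE 0x00100000u
#define SHARED_END  0x00200000u

static bool is_shared(uint32_t addr)
{
    return addr >= SHARED_BASE && addr < SHARED_END;
}

/* Decoding logic DEC at the memory output: shared words go to the
 * special word bus BUSD, non-shared blocks to the register RDM_j. */
typedef enum { TO_BUSD, TO_RDM } route;

static route route_access(uint32_t addr)
{
    return is_shared(addr) ? TO_BUSD : TO_RDM;
}
```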
[0050] This solution presents the advantage of being architecturally very simple. The presence of the special parallel communication bus (BUSD) leads to better performance with respect
to a solution which consists in utilizing the series connections
for transferring not only the blocks of non-shared data but also
the words of shared data. It should be noted that this latter
solution may, in some cases, be provided in case of low flow of
shared data.
[0051] In another version, the system is provided with a special bus for parallel communication of words (BUSD) and a special common bus for communication of addresses of words (BUSAM), in order to transfer the shared data by the special word bus (BUSD) and to direct the non-shared data to the address communication means (which may comprise a parallel communication bus, or in which the communication is carried out by the series links).
[0052] The presence of a special bus for communication of addresses of words permits, in this version, pushing back the saturation limit of the address communication means in case of high demand for shared data.
[0053] Another version which will be preferred in practice in the
case in which the address communication means comprises a parallel
address communication bus (BUSA) consists in providing the system
with a memory management processor (PGM) associated with the memory
(RAM) and a bus snooper processor (PE.sub.j) associated with each treatment processor (CPU.sub.j) and with the corresponding management directory (RG.sub.j). The memory management processor (PGM) and each snooper processor, of structures known in themselves, are connected to the address communication bus (BUSA)
in order respectively to oversee and to treat the addresses of
blocks transmitted on said bus in such a manner as to permit an
updating of the central memory (RAM) and of the associated cache
memory (MC.sub.j) in case of detection of an address of a block
present in the associated directory (RG.sub.j).
[0054] The memory management processor (PGM) and each snooper processor (PE.sub.j) associate status bits with each block of information, keeping them up to date as a function of the nature (read or write) of the requests for the block transiting on the bus (BUSA), and assure the coherence of the shared data by using these status bits, which permit them to force or not the writing of a block into the central memory at the moment of the requests on the bus (BUSA).
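The status-bit mechanism can be sketched with an assumed clean/dirty encoding; the text names status bits but no particular states, so this MSI-like fragment is illustrative only:

```c
/* Assumed per-block status encoding (not taken from the text). */
typedef enum { INVALID, CLEAN, DIRTY } block_state;

/* A snooper PE_j holding the block observes a request on BUSA and
 * updates its copy's state from the nature (read or write) of the
 * request, forcing a write-back of a dirty block when needed. */
static block_state snoop(block_state s, int remote_is_write)
{
    if (s == DIRTY) {
        /* force the block back into the central memory (RAM) */
        return remote_is_write ? INVALID : CLEAN;
    }
    if (s == CLEAN && remote_is_write)
        return INVALID;
    return s;
}
```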
[0055] In the case referred to previously where the communications
of addresses are made by the series connections, the management of
shared data may also be assured in a centralized manner, by a
memory management processor (PGM) associated with the central
memory (RAM) and a processor for maintaining the coherence of the
shared data (PMC.sub.j) associated with each treatment processor
(CPU.sub.j) and with the corresponding management directory
(RG.sub.j), each coherence maintenance processor being connected to
a synchronization bus (SYNCHRO) controlled by the memory management
processor (PGM), in such a manner as to permit an updating of the
central memory (RAM) and of the associated cache memories
(MC.sub.j) in case of detection of an address block, an updating of
the central memory (RAM) and the cache memories (MC.sub.j) at each
address selection in the complementary shift registers
(RDC.sub.j).
[0056] As before this operation is assured due to the status bits
associated with each block of information by the processor
(PGM).
[0057] It should be noted that a synchronization bus of the type defined hereinabove may, in some cases, be provided in the preceding architecture, where the addresses of blocks move on a common address bus BUSA. In this case, the snooper processors (PE.sub.j) are solicited by the memory management processor (PGM) via the synchronization bus, and only when they are concerned by the transfer. Thus, unnecessary accesses to the cache memories are avoided. The snooper processors then become passive (since driven by the processor PGM) and are better designated by the more appropriate expression "coherence maintenance processor," according to the terminology hereinabove utilized.
[0058] Another solution consists in reserving the parallel address
communication bus (BUSA) for the transfer of addresses of blocks of
shared data and using the series links for the transfer of blocks
of non-shared data.
[0059] Further, the multiprocessor system according to the invention lends itself to the regrouping of treatment processors on a same series link, in such a manner as to limit the number of series links and of corresponding memory shift registers (RDM.sub.j) necessary.
[0060] The number of memory shift registers (RDM.sub.j) may
correspond to the number of series links (LS.sub.j), in which case
each memory shift register (RDM.sub.j) is connected in a static
manner to a series link (LS.sub.j) specifically appropriated to
said register.
[0061] The number of memory shift registers (RDM.sub.j) may also be
different from that of the series connections (LS.sub.j) and in
particular less, in which case these registers are connected in a
dynamic manner to the series links (LS.sub.j) through an
interconnection network.
[0062] As in conventional systems, the central memory (RAM) may be
divided into `m` memory banks (RAM.sub.1 . . . RAM.sub.p . . .
RAM.sub.m) arranged in parallel. Each memory shift register
(RDM.sub.j) is then comprised of m elementary registers (RDM.sub.j1
. . . RDM.sub.jp . . . RDM.sub.jm) connected in parallel to the
corresponding series link (LS.sub.j). However, a level of
supplementary parallelism and a better electrical or optical
adaptation of the connection is obtained in a variation in which
each memory bank RAM.sub.p is connected to each processor CPU.sub.j
by a point-to-point series link LS.sub.jp.
[0063] In order to provide transfer performance at least equal to
those of conventional systems with a parallel bus, the system
according to the invention is preferably synchronized by a clock of
a frequency F at least equal to 100 megahertz. The memory shift
registers (RDM.sub.j) and processor shift registers (RDP.sub.j) may
very simply be of a type adapted to present a shift frequency at
least equal to F.
[0064] In the case of very high frequencies (particularly greater
than 500 megahertz with current technology), the registers may be
divided into sub-registers of a lower shift frequency, and then
multiplexed.
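A sketch of such multiplexing, with u = 2 assumed for illustration: bit i of the block is assigned to sub-register i mod 2^u, so each sub-register shifts at only F/2^u while the link as a whole still carries F bits per second. The sizes below are assumptions:

```c
#include <stdint.h>
#include <string.h>

#define U_LOG     2             /* u = 2, hence 2^u = 4 sub-registers */
#define NSUB      (1u << U_LOG)
#define SUB_BYTES 64            /* assumed capacity per sub-register  */

/* Bit interleaving across 2^u multiplexed sub-registers. */
static void demultiplex(const uint8_t *block, unsigned nbits,
                        uint8_t sub[NSUB][SUB_BYTES])
{
    memset(sub, 0, NSUB * SUB_BYTES);
    for (unsigned i = 0; i < nbits; i++) {
        unsigned bit = (block[i / 8] >> (i % 8)) & 1u;
        if (bit)
            sub[i % NSUB][(i / NSUB) / 8] |=
                (uint8_t)(1u << ((i / NSUB) % 8));
    }
}
```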
[0065] The invention also relates to a multiport series memory
component, susceptible of equipping the multiprocessor system
previously described, in order to simplify the fabrication. This
component, which may have different applications, is constituted by
an integrated circuit comprising a random access memory (RAM) of a
pre-determined width corresponding to a block of information (bi),
an assembly of shift registers (RDM.sub.1 . . . RDM.sub.j . . .
RDM.sub.n), each of a capacity corresponding to the size of the
memory, an internal parallel bus (BUSI) connecting the access of
the memory and the shift registers, a selection logic of a shift
register (LSR) adapted to validate the connection on the internal
bus between the memory and a predetermined shift register, and an
assembly of external input/output pins for the input of addresses
to the memory (RAM), for the input of addresses to the selection logic (LSR), for the input and the validation of transfer
commands in read or write of a block of information (bi) between
the memory (RAM) and the shift registers (RDM.sub.j), for the input
of a clock signal to each shift register (RDM.sub.j), for the input
bit by bit of a block of information (bi) to each shift register
(RDM.sub.j) and for the output bit by bit of a block of information
from each shift register (RDM.sub.j).
[0066] This component may be made parameterizable by the addition of configuration registers (RC.sub.1, RC.sub.2, . . . ), permitting in particular a choice of the sizes of blocks of information (bi) and of diverse modes of operation of the shift registers.
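As an illustration of such parameterization (the bit-level layout of the configuration registers is not given in the text; the field and encodings below are assumptions), a block-size mode decode might look like:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 2-bit block-size field of configuration register RC1. */
#define BLOCK_SIZE_MASK 0x3u

static unsigned block_size_bits(uint8_t rc1)
{
    /* each mode selects a predetermined block size, in the 256-bit
     * to 2-Kbit range cited for the blocks (bi) */
    switch (rc1 & BLOCK_SIZE_MASK) {
    case 0:  return 256;
    case 1:  return 512;
    case 2:  return 1024;
    default: return 2048;
    }
}

int main(void)
{
    uint8_t rc1 = 0x2;   /* value programmed through the data pins */
    printf("configured block size: %u bits\n", block_size_bits(rc1));
    return 0;
}
```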
[0067] The invention, having been described in its general form, is illustrated by the description which follows, with reference to the accompanying drawings, which show several embodiments without limitation. In these drawings, which form an integral part of the present description:
[0068] FIG. 1 is a block diagram of a first embodiment of the
multiprocessor system according to the invention;
[0069] FIG. 2 is a diagram giving the calculated performance curve
of this system (A) and, by way of comparison, the corresponding
curve (B) for a conventional multiprocessor architecture with a
common bus;
[0070] FIGS. 3, 4 and 5 are detailed logic schematics of functional
units of the system of FIG. 1;
[0071] FIG. 6 is a block diagram of another embodiment of the
system;
[0072] FIG. 7 is a block diagram of a system of the type as that of
FIG. 1, provided with shared data management means;
[0073] FIG. 8 is a detailed logic diagram of a sub-assembly of the
system of FIG. 7;
[0074] FIG. 9 is a block diagram of a system analogous to that of
FIG. 7 with a variation in the shared data management means;
[0075] FIG. 10 is a block diagram of an analogous system, provided
with a different shared data management means;
[0076] FIGS. 11, 12a, 12b, 12c, 12d, 13, 14, 15, 16, 17 are detailed
logic schematics of the functional units of the processor system of
FIG. 10;
[0077] FIG. 18 is a block diagram of a system of the type as that
of FIG. 6, provided with means for shared data management;
[0078] FIG. 19 is a simplified block diagram of a variation of the
system, in which several central units share the same series
link;
[0079] FIG. 20a is a block diagram of a preferred embodiment, in
which the central memory is organized into several memory
banks,
[0080] FIG. 20b is a variation of the architecture shown in FIG. 20a;
[0081] FIGS. 21a and 21b schematically show another structure of
the memory RAM susceptible of equipping said system; and
[0082] FIG. 22 is a block diagram showing the structure of a
multiport series memory component, capable of equipping the
system.
[0083] The apparatus presented in the form of a block diagram in
FIG. 1 is a multiprocessor system having n treatment processors
CPU.sub.1 . . . CPU.sub.j . . . CPU.sub.n. Shown in this figure are two treatment processors, CPU.sub.1 and CPU.sub.j, with their associated logic. Each of these treatment processors is of a conventional type, for example a "Motorola 68020" or an "Intel 80386," with peripheral interfaces, and is provided with a virtual memory device.
[0084] The apparatus comprises a central random access memory RAM
provided in a conventional manner from integrated memory circuits:
in particular dynamic RAM "INTEL," "NEC," "TOSHIBA" . . . of 256
Kbits, 1 Mbits, 4 Mbits . . . depending upon the application. This
memory is organized in blocks of information b.sub.o . . . b.sub.i
. . . of predetermined size t (usually 256 bits to 2 Kbits), and the access width of said memory corresponds to the size of one block.
[0085] The central memory is connected in parallel to n shift
registers RDM.sub.1 . . . RDM.sub.j . . . RDM.sub.n called memory
registers, each memory register having the size t of one block of
information. Each of these registers is provided from high speed
technology ("ASGA"), one latch able to be loaded or unloaded in one
cycle of the central memory RAM. The number n of registers is equal
to the number of processors CPU.sub.j.
[0086] A cache memory MC.sub.j is associated in a known manner with
each processor CPU.sub.j. Each cache memory comprises a
conventional high speed random access memory, of low capacity with
respect to the central memory RAM. A directory RG.sub.j and a
management processor PG.sub.j are, in a traditional manner,
connected to the cache memory and to the treatment processor for
managing the information flowing in the cache memory.
[0087] Further, in the system of the invention, a shift register
RDP.sub.j called a processor register is connected by its parallel
port to each cache memory MC.sub.j. Each processor register
RDP.sub.j is of a size corresponding to that of a block bi and of a
structure similar to that of the memory registers RDM.sub.j.
[0088] Each memory register RDM.sub.j is connected by its series
port to the series port of a processor register RDP.sub.j by a
series link LS.sub.j. Examples of the provision of this series
link, which may comprise a bidirectional link or two unidirectional
links, are illustrated in FIGS. 4 and 5. The control of the
transfer of the blocks b.sub.i between corresponding registers
RDM.sub.j and RDP.sub.j is assured by the transfer logics TFR.sub.j and TFR'.sub.j, which are associated in a symmetrical manner with the
memory register RDM.sub.j and with the processor register
RDP.sub.j. One embodiment of these transfer logics (known in
themselves) is detailed in FIG. 3.
[0089] The central memory RAM, the memory shift registers RDM.sub.1 . . . RDM.sub.n, and the associated transfer logics TFR.sub.1 . . . TFR.sub.n constitute a functional assembly called a "multiport series memory" MMS. The treatment processor CPU.sub.j, cache memory MC.sub.j, cache management directory RG.sub.j, cache management processor PG.sub.j, processor shift register RDP.sub.j and the associated transfer logic TFR'.sub.j constitute a functional assembly called a "central unit" UC.sub.j.
[0090] Further, the system comprises means for communication of
addresses of blocks of the processors CPU.sub.j to the central
memory RAM, comprising in the example a common parallel
communication bus BUSA on which are connected the processors
CPU.sub.j (through their management processors PG.sub.j) and the
central memory RAM.
[0091] Access to the bus BUSA is controlled in a conventional
manner by a bus arbitrator AB.
[0092] The general operation of the architecture hereinabove
defined is as follows:
[0093] A processor CPU.sub.j carries out its own program comprising instructions, which are found in the form of words in the central memory RAM, with extracts in the associated cache memory MC.sub.j. Under these program instructions, the processor CPU.sub.j is brought either to read words of data, which are likewise found in the central memory RAM or, in the form of extracts, in the cache memory MC.sub.j, or to write words of data in the central memory RAM and in the cache memory MC.sub.j.
[0094] Each operation of a processor CPU.sub.j (called a "request") requires the furnishing of the address adr of the word concerned, the nature r, w of the operation (read, write) and the exchange of the data of the word concerned.
[0095] Each request for a word activates the processor PG.sub.j, which then consults in a conventional manner the directory of the cache RG.sub.j, which indicates whether the block bi containing the word concerned is present in the cache memory MC.sub.j and, if so, the frame of the cache memory where the searched-for block is found.
[0096] If the block bi containing the word concerned is in the
cache memory MC.sub.j, then in case of reading, this word is read
in said cache memory and sent to the processor CPU.sub.j. In case
of writing, the word furnished by the processor CPU.sub.j is
written in the cache memory: the memory transaction is
terminated.
[0097] If the block containing the word concerned is not in the
cache memory MC.sub.j, then a reading of the block bi into the
central memory RAM is necessary. Two cases may be produced.
[0098] First Case
[0099] The cache memory MC.sub.j has at least one free block frame available, determined by the processor PG.sub.j with the help of status bits associated with each entry of the directory RG.sub.j. In this case, the processor PG.sub.j requests, in a conventional manner, the bus BUSA by referring its demand to the bus arbitrator AB. The latter grants, in its turn, the bus BUSA to the processor PG.sub.j, which proceeds to a reading of the central memory RAM, the block read in the memory being loaded into the register RDM.sub.j identified by the origin j of the call. The end of the reading cycle results in the liberation of the bus BUSA and the activation of the transfer on the series link LS.sub.j, permitting the transfer of the contents of the memory register RDM.sub.j into the processor register RDP.sub.j. The end of the transfer activates the writing into the cache memory MC.sub.j of the contents of the processor register, in the block frame reserved for this purpose, and the transaction may terminate as before.
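The sequence of this first case can be summarized in a sketch, with the bus, memory, link and cache operations reduced to stubs (all function names here are hypothetical, not taken from the specification):

```c
#include <stdint.h>

/* Stub primitives standing in for the arbitrator AB, the multiport
 * series memory, the serial link LS_j and the cache MC_j. */
static void bus_request(unsigned j)                 { (void)j; }
static void ram_read_to_rdm(uint32_t a, unsigned j) { (void)a; (void)j; }
static void bus_release(unsigned j)                 { (void)j; }
static void serial_transfer(unsigned j)             { (void)j; }
static void cache_store(unsigned j, unsigned f)     { (void)j; (void)f; }

/* First case: the cache has a free frame for the missing block. */
static void handle_read_miss(unsigned j, uint32_t block_addr,
                             unsigned frame)
{
    bus_request(j);                  /* PG_j refers its demand to AB */
    ram_read_to_rdm(block_addr, j);  /* one RAM cycle loads RDM_j    */
    bus_release(j);                  /* BUSA freed after that cycle  */
    serial_transfer(j);              /* LS_j shifts RDM_j into RDP_j */
    cache_store(j, frame);           /* RDP_j written into MC_j      */
}
```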
[0100] Second case
[0101] The cache memory MC.sub.j does not have a free placement available; in that case, by a conventional algorithm, a placement of the cache is made a candidate for receiving the requested block. Two situations may be encountered:
[0102] The block contained in the candidate position has not been
modified since its installation: it is simply eliminated while
freeing the frame of the block by a simple writing of a status bit
in the directory (RG.sub.j) and the transaction may proceed as
before.
[0103] The block contained in the candidate position has been modified, and an updating of the central memory is required. To this end, the management processor PG.sub.j transfers the candidate block to the processor register RDP.sub.j, activates the transfer of the processor register RDP.sub.j to the memory register RDM.sub.j, then requests the common bus BUSA by referring its request to the arbitrator AB. When the arbitrator grants the bus to the management processor PG.sub.j, the latter activates a write command, which has the effect of transferring the contents of the memory register RDM.sub.j to its position in the central memory RAM. The updating of the memory RAM is terminated and the transaction may continue as before.
[0104] Thus, in the apparatus of the invention, the exchanges
between the treatment processors CPU.sub.j and their cache memory
MC.sub.j and associated logics RG.sub.j, PG.sub.j, are carried out
in a conventional manner. By contrast, the transfers of blocks
between central memory RAM and cache memories MC.sub.j proceed, no
longer by a common parallel bus, but by the series links LS.sub.j
dedicated to each treatment processor CPU.sub.j, the common bus
BUSA only serving for the transfer of the addresses and having thus
a considerably reduced traffic.
[0105] It is known that, for conventional common bus architectures,
a model studied by Patel ("Analysis of Multiprocessors With Private
Cache", Janak H. Patel, IEEE Transactions on Computers, Vol. C-31,
No. 4, April 1982) has led to the following approximate formula
giving the efficiency U as a function of the number of processors
present:

    U = 1 / (1 + m(W + tf))
[0106] where the efficiency U is the average rate of utilization of
each treatment processor,
[0107] m is the probability that a treatment processor makes a
memory request for information not present in its cache memory
(this probability m=.alpha..Pa is proportional to the probability
of absence Pa of the information in the cache memory and to a
factor .alpha. which is a function of the power of the treatment
processor, expressed as a percentage of memory requests),
[0108] W is the average waiting time of the common bus, which is a
function of the number of processors,
[0109] tf is the transfer time of a block from the central memory
to a cache memory.
[0110] The hypotheses from which this formula has been established
show that it is applicable to the architecture according to the
invention, with a level of approximation comparable to the level of
approximation of the formula for conventional common bus
architecture.
[0111] It is thus possible to compare the performances of the two
types of architecture while assuming that the components common to
the two architectures have identical characteristics.
[0112] FIG. 2 gives the curves obtained of the efficiency U as a
function of the number n of processors for the following
parameters, the parameters common to the two systems being
identical, and all of a usual value:
[0113] size of block b.sub.i=64 octets,
[0114] size of word for the parallel transfer on common bus=4
octets,
[0115] central memory RAM access time=100 nanoseconds,
[0116] cycle time of the bus BUSA=50 nanoseconds,
[0117] series transfer frequency=500 MHz,
[0118] probability of absence Pa=0.005 (cache memory of 16K
octets),
[0119] power factor of the processors: .alpha.=0.5.
[0120] It is established, by comparing the curves A (architecture
of the invention) and B (conventional architecture), that the
architecture according to the invention has an efficiency clearly
superior to that of the conventional architecture. The architecture
of the invention permits putting in place a number of processors
much greater than the conventional common bus architecture, which,
in practice, cannot exceed about ten processors. For example, in
the conventional case, an efficiency of 0.75 is obtained at the
tenth processor, whereas it is obtained for more than 80 processors
in the case of the invention.
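A minimal numeric sketch of the formula of paragraph [0105] is
given below, assuming that W and tf are expressed in the same
normalized time unit as the processor's mean interval between
memory requests; the linear contention model W(n) and the numeric
values are placeholders of this sketch, not the values behind the
curves of FIG. 2.

    def efficiency(m, W, tf):
        """Average utilization U of one treatment processor."""
        return 1.0 / (1.0 + m * (W + tf))

    alpha, Pa = 0.5, 0.005      # power factor and probability of absence
    m = alpha * Pa              # probability a request misses the cache

    # Example: transfer costing 100 time units, bus wait growing with n.
    for n in (1, 10, 80):
        W = 3.0 * (n - 1)       # assumed linear contention, illustrative only
        print(n, round(efficiency(m, W, tf=100), 3))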
[0121] FIG. 3 presents an embodiment of a transfer logic TFR.sub.j
or TFR'.sub.j permitting the transfer of a block bi of information
from a memory register RDM.sub.j to a processor register RDP.sub.j
(the inverse transfer is assured by the symmetric means not shown
in this figure). Each logic TFR.sub.j or TFR'.sub.j comprises a
part for control of the output TFRE.sub.j and TFRE'.sub.j, and a
part for control of the reception TFRR.sub.j and TFRR'.sub.j which
are activated in a crossed manner (output TFRE.sub.j activated in
synchronism with the reception TFRR'.sub.j). The system comprises a
clock generator H, the frequency of which fixes the speed of
transmission and which furnishes the clock signal h to the output
part TFRE.sub.j and to the reception part TFRR'.sub.j.
[0122] In the output part TFRE.sub.j, a countdown register DC,
receiving at its load input {overscore (load)}2 the reading signal
{overscore (r)} from the management processor PG.sub.j, allows t+1
pulses of the clock h to pass through a logic gate ET1 controlled
by a pass-to-zero signal "{overscore (borrow)}", the output of this
gate ET1 being connected to the down-count input of the countdown
device DC and to the shift input shift1 of the memory register
RDM.sub.j.
[0123] In the reception part TFRR'.sub.j, a flip flop B is
connected by its data input D to the series output of the processor
register RDP.sub.j, the clock input clk of this flip flop being
connected to the clock H for receiving the signal h. An
initialization signal "{overscore (init)}" furnished by the
management processor PG.sub.j is connected to the input {overscore
(S)} of the flip flop B and to the loading input {overscore
(load)}3 of the processor register RDP.sub.j. The output Q of the
flip flop transmits a control signal {overscore (end-transfer)} to
the logic gate ET2, permitting the clock signal h to pass toward
the shift input shift2 of the processor register RDP.sub.j. This
control signal is also delivered to the management processor
PG.sub.j for indicating the end of transfer of the block.
[0124] The operation of the assembly is as follows: the management
processor PG.sub.j, after having obtained access to the central
memory RAM via the bus BUSA, carries out its reading of the block
bi, furnishing the address of the block concerned and the reading
signal r. This signal releases the activation of the output part
TFRE.sub.j: the trailing edge of the reading signal r provokes the
loading of the block bi into the memory register RDM.sub.j by
activating the signal {overscore (load)}1, and the loading of the
value t+1, corresponding to the size in bits of the block bi plus
one supplementary bit called "start", into the countdown register
DC by the signal {overscore (load)}2. This has the effect of
setting the signal {overscore (borrow)} to 1 and of authorizing the
transfer clock H to furnish, through the logic gate ET1 conditioned
by this borrow signal, t+1 clock pulses h: these pulses have the
effect of shifting t+1 bits out of the memory register RDM.sub.j
via the input shift1 and of counting the countdown device DC down
to the value 0 via its down input: the signal {overscore (borrow)}
is then reset to zero and locks the operation of the output part
TFRE.sub.j.
[0125] Thus, the series connection LS.sub.j, initially in the logic
rest state 1, transfers the start bit 0, then the t bits of the
block bi, and then returns to the logic rest state 1, the last bit
sent being the value 1 forced on the series input of the memory
register RDM.sub.j.
[0126] As a preliminary to the reading request, the management
processor PG.sub.j has initialized the receiving part TFRR'.sub.j
by activating the {overscore (init)} signal, which has the effect
of loading the processor register RDP.sub.j with t bits set to 1
through the input {overscore (load)}3 and of setting the output Q
of the flip flop B to the logic state 1 by the input {overscore
(S)}. This output Q then validates the logic gate ET2, which allows
the clock signal h to pass toward the input shift2 of the processor
register RDP.sub.j. At each clock pulse this processor register
furnishes a bit on its series output, which is stored in the flip
flop B. The first bit 0 which is presented has the effect of
setting to zero the output Q of the flip flop B and of locking the
clock signal h at the gate ET2. Since this first bit 0 is the start
bit which precedes the block bi, the block is at that moment
trapped in the processor register RDP.sub.j; the management
processor PG.sub.j, notified of the change of state of the flip
flop B by the {overscore (end-transfer)} signal, need only come to
read this block bi on the parallel output of the register
RDP.sub.j.
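The framing just described can be summarized by the following
behavioral sketch (Python, purely illustrative; the bit ordering
and the list representation are assumptions): the transmitter sends
a 0 start bit followed by the t bits of the block, and the
receiver, preloaded with ones, shifts until the start bit emerges
on its series output, at which moment exactly the t block bits are
trapped in the register.

    T = 8   # block size in bits (512 in the text; kept small for clarity)

    def transmit(block_bits):
        """Line states: a 0 start bit, the t block bits, then rest state 1."""
        yield 0                        # start bit
        yield from block_bits          # the t bits of the block bi
        while True:
            yield 1                    # logic rest state of the link LS_j

    def receive(line):
        rdp = [1] * T                  # RDP_j preloaded with t bits set to 1
        for bit in line:
            out = rdp[-1]              # series output, sampled by flip flop B
            rdp = [bit] + rdp[:-1]     # the same clock pulse h shifts (shift2)
            if out == 0:               # first 0 seen is the start bit:
                return rdp[::-1]       # the block is trapped in the register

    block = [1, 1, 0, 1, 0, 0, 1, 0]
    assert receive(transmit(block)) == block

Note that exactly t+1 clock pulses are consumed, matching the t+1
count loaded into the countdown register DC of the output part.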
[0127] The writing of a block bi to the central memory RAM requires
the presence of a logic TFRE'.sub.j, identical to the logic
TFRE.sub.j, associated with the processor register RDP.sub.j, and a
logic TFRR.sub.j, identical to the logic TFRR'.sub.j, associated
with the memory register RDM.sub.j. In this case, the signal init
of the logic TFRR.sub.j is connected to the writing signal
{overscore (w)}: the liberation of the memory register RDM.sub.j
automatically rearms the receive logic TFRR.sub.j.
[0128] This embodiment of the transfer control logic is only one
possible example: the transmitter register may, for instance, shift
permanently, the receiver register then being activated for t
pulses of the clock upon detection of the start bit at the
beginning of the transfer.
[0129] The clock H may be connected to the two registers, or two
independent local clocks may be used, the synchronization being
obtained in a conventional manner by the insertion of
synchronization information.
[0130] The system shown in FIG. 4 comprises a divided memory shift
register RDM1.sub.j and RDM2.sub.j, a divided processor shift
register RDP1.sub.j and RDP2.sub.j, and two unidirectional series
links LS1.sub.j and LS2.sub.j, one connecting the memory register
RDM1.sub.j to the processor register RDP1.sub.j in such a manner as
to transmit the contents of the first to the second, the other
connecting the memory register RDM2.sub.j to the processor register
RDP2.sub.j in such a manner as to transmit the contents of the
second to the first, and the associated logics for the control of
the transfer: TFRE1.sub.j for RDM1.sub.j; TFRR2.sub.j for
RDM2.sub.j; TFRE2.sub.j for RDP2.sub.j; TFRR1.sub.j for
RDP1.sub.j.
[0131] For reading a block of information bi in the central memory
RAM, the management processor PG.sub.j initializes, by the signal
{overscore (init)}, the logic TFRR1.sub.j associated with the
processor register RDP1.sub.j, then activates its read request to
the memory RAM by the read signal r. This signal activates the
logic TFRE1.sub.j associated with the memory register RDM1.sub.j,
which assures the transfer of the block bi of information on the
link LS1.sub.j. The end of the transfer is detected by the logic
TFRR1.sub.j associated with the processor register RDP1.sub.j,
which notifies the management processor PG.sub.j of the arrival of
the block bi by the signal {overscore (end_transfer)}. The
management processor PG.sub.j then transfers the contents of the
processor register RDP1.sub.j into the cache memory MC.sub.j.
[0132] For writing a block bi to memory, the management processor
PG.sub.j loads the processor register RDP2.sub.j with the block bi
concerned, extracted from the cache memory MC.sub.j, which
activates the transfer of this block on the link LS2.sub.j. The
transfer logic TFRR2.sub.j associated with the memory register
RDM2.sub.j assures the proper reception of this block. The
management processor PG.sub.j is notified of the end of transfer by
the change of state of the signal {overscore (borrow)} issued from
the transmission logic TFRE2.sub.j. The management processor
PG.sub.j then carries out its writing request, which becomes
effective at the time of activation of the writing signal w. This
has the effect of transferring the contents of the register
RDM2.sub.j into the central memory RAM and of rearming the logic
TFRR2.sub.j for a next transfer.
[0133] This system authorizes a simultaneous transfer of blocks in
the two directions and permits treating block misses in the cache
memory MC.sub.j more rapidly when the latter is saturated. It also
authorizes the operation of a conventional mechanism for
anticipation of the reading of blocks.
[0134] In another embodiment shown in FIG. 5, the connection
LS.sub.j comprises a single bidirectional link provided at each
extremity with a validation logic LV1 and LV2, constituted by a
logic gate with two open collector inputs OC1 and OC2, one of the
inputs being connected to the series output of the memory register
RDM.sub.j for the gate OC1 and of the processor register RDP.sub.j
for the gate OC2, the other input being connected to the output Q
of a control flip flop BC1 and BC2; each of these is connected by
its inputs {overscore (S)} and {overscore (R)} to the transfer
logic TFR for the flip flop BC1 and TFR' for the flip flop BC2.
[0135] Readings and writings are carried out in an exclusive
manner, at the sole initiative of the management processor
PG.sub.j.
[0136] A memory read activates the reading signal {overscore (r)},
which causes the setting to 1 of the flip flop BC1 by its input
{overscore (S)}, the resetting to zero being controlled, on the
input {overscore (R)}, by the transfer logic TFR at the end of the
transfer of the block.
[0137] A memory write releases an identical mechanism involving the
validation logic LV2.
[0138] Other combinations of registers/links are possible, and in
the case of a bidirectional link, bidirectional shift registers may
in particular be used, receiving a signal giving the direction of
transfer. This solution leads to the use of shift registers of more
complex logic, therefore a priori less efficient in speed of
transfer.
[0139] The speed of transfer being very high, the shift registers
RDM.sub.j and RDP.sub.j, their associated control logics TFR and
TFR', and the validation logics LV1 and LV2 are selected from a
fast technology (ECL, GaAs), and synchronized by a clock of a
frequency F at least equal to 100 MHz.
[0140] Another solution with multiplexed registers, shown in FIG.
21, permits, as will be understood below, considerably reducing the
quantity of fast, and therefore costly, logic necessary.
[0141] The multiprocessor of FIG. 1 was provided both with a common
communication bus for the block addresses and with series links for
the data transfers. FIG. 6 shows, as a variation, a multiprocessor
system of the same general principle, but in which data and
addresses both transfer over the series connections, in the absence
of a common bus.
[0142] This system comprises, besides the memory registers
RDM.sub.j, complementary shift registers RDC.sub.j able to store
the addresses of the blocks called for, controlled by a logic of
the type TFR.sub.j. Further, an access management arbitrator ABM is
connected to the central memory RAM and to the complementary
registers RDC.sub.j by their parallel outputs. Each logic TFR.sub.j
is connected to this arbitrator ABM, which is of conventional
structure. The management processor PG.sub.j of each cache memory
MC.sub.j is connected to one part of the parallel input of the
processor register RDP.sub.j, in order to have write access
thereto.
[0143] For reading a block bi in the central memory RAM, the
management processor PG.sub.j places the address of the block
called for and the nature of the request (given by a prefix bit:
1=read, 0=write) in the part of the processor register RDP.sub.j to
which it has access, which has the effect of initializing the
transfer of this information. The transfer logic TFR.sub.j detects
the end of the transfer into the complementary register RDC.sub.j
and activates a service demand toward the arbitrator ABM. The
latter is charged with serializing and treating the block read
requests in the central memory RAM, first going to read the address
of the block requested in the complementary register RDC.sub.j
corresponding to the transfer logic selected by the arbitrator ABM,
then going to read the block in the central memory RAM, which is
then loaded into the memory register RDM.sub.j and transmitted as
before.
[0144] For writing a block in the central memory RAM, the
management processor PG.sub.j chains the transmission of the
address and then of the block to be written through the processor
register RDP.sub.j. The complementary register RDC.sub.j thus first
receives the address and the nature of the request.
[0145] The transfer logic TFR.sub.j analyzes this request and
validates the reception of the block in the memory register
RDM.sub.j on account of the nature of the request (write). The
transfer logic TFR.sub.j is notified of the end of the transfer of
the block b.sub.i and then transmits its request for service to the
arbitrator ABM. This request is treated, in its turn, by the
arbitrator, which activates the writing of the block bi into
memory.
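For illustration, a possible encoding of the request frame just
described (prefix bit followed by the block address) is sketched
below. The 26-bit block address width matches the 18-bit tag plus
8-bit frame fields given later with reference to FIG. 12a, but the
bit ordering and the function names are assumptions of this sketch.

    ADDR_BITS = 26   # block address width: 18-bit tag + 8-bit frame (assumed)

    def request_frame(block_addr, read=True):
        """Prefix bit (1=read, 0=write) followed by the block address bits."""
        prefix = 1 if read else 0
        return [prefix] + [(block_addr >> i) & 1 for i in range(ADDR_BITS)]

    def decode(frame):
        prefix, addr_bits = frame[0], frame[1:]
        addr = sum(b << i for i, b in enumerate(addr_bits))
        return ("read" if prefix else "write"), addr

    assert decode(request_frame(0x123456, read=False)) == ("write", 0x123456)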
[0146] Further, the multiprocessor system shown in FIG. 7 comprises
means for management of the shared data, permitting treating, in a
static manner, the classical problem of maintaining the coherence
of the shared data. This system comprises the resources of the
system of FIG. 1 (with the same reference characters) with the
following logic and supplementary resources:
[0147] A special parallel communication bus for words, BUSD,
connects the processors CPU.sub.j and the central memory RAM. A
partition logic LP.sub.j is associated with each processor
CPU.sub.j. Each logic LP.sub.j is constituted, in a conventional
manner, by an assembly of coupled register-comparators connected in
parallel on the address bus adr of the processor CPU.sub.j, in
order to provide a division of the memory space of the central
memory RAM into zones of non-shared data and of shared data, said
logic LP.sub.j delivering for this purpose a signal p (indicating
the nature of the data, shared or not). A decoding logic DEC is
associated with the central memory RAM, the latter being arranged
to permit writing by word or by block under the control of said
logic DEC.
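As an illustration of the partition logic LP.sub.j, the following
sketch derives the signal p from assumed address zones; the zone
boundaries and the software representation are inventions of this
sketch, the hardware using register-comparator pairs as stated
above.

    SHARED_ZONES = [(0x00800000, 0x00A00000)]   # assumed shared-data ranges

    def p_signal(adr):
        """1 if adr falls in a shared-data zone, 0 otherwise."""
        return int(any(base <= adr < limit for base, limit in SHARED_ZONES))

    assert p_signal(0x00900000) == 1            # shared data
    assert p_signal(0x00100000) == 0            # non-shared data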
[0148] The decoding logic DEC is detailed in FIG. 8 and comprises a
decoder DECL, receiving on its data input the address portion word
adrm of the address adr, and connected through its validation input
to the output of a logic gate ET3, each output i of said decoder
being connected to a validation buffer of the type BFS.sub.i. The
logic gate ET3 receives on its inputs the signal p and the inverted
signal {overscore (r)}. A decoder DECE is connected through its
validation input to the output of a logic gate ET4, its outputs
being connected to an assembly of logic gates OU1.sub.i of a number
equal to the number of words in a block. The logic gate ET4
receives on its inputs the signal p and the inverted signal
{overscore (w)}. The output of the gate ET4 is also connected to an
assembly of validation input buffers BFE.sub.1, BFEi . . . The
central memory RAM may be controlled in writing by word. Each
clipped word thus defined has its own writing control input
w.sub.i. The output of each logic gate OU1.sub.i is connected to
the input w.sub.i of the corresponding clipped word of the central
memory RAM.
[0149] FIG. 8 shows further the detail of the addressing of the
memory registers RDM.sub.j, which comprises in the first place a
decoder DECEB connected through its data input to the common bus
BUSA, in order to receive the number .sub.j of the processor
concerned by the request of the central unit UC.sub.j. This decoder
DECEB is connected through its validation input to the output of a
logic gate ET5 and through its outputs 1, 2 . . . .sub.j to the
validation buffers BV1, BV.sub.j . . . The logic gate ET5 receives
on its inputs the signal p and the inverted signal {overscore (w)}.
In the same manner, a decoder DECLB is connected through its data
input to the field .sub.j of the common bus BUSA, and the outputs
1, 2 . . . .sub.j of this decoder DECLB are connected to the
loading inputs ld1 . . . ld.sub.j of the memory shift registers
RDM.sub.j. This decoder DECLB is connected through its validation
input to the output of a logic gate ET6, which receives on its
inputs the signal p and the inverted signal {overscore (r)}.
[0150] The operation of the system is as follows: at each reference
memory, the processor CPU.sub.j provides an address on its address
bus adr, and the nature of the request: reading {overscore (r)} or
writing {overscore (w)}. It waits for a datum in case of reading
and furnishes a datum in case of writing. The address adr traverses
the partition logic LP.sub.j, which indicates, by the signal p, if
the address adr belongs to a zone of non-shared data (p=0) or of
shared data (p=1). In the first case, the request is directed to
the management processor PG.sub.j and is treated according to the
mode of operation described in reference to FIG. 1. In the second
case, the request is sent directly through the common bus BUSA. The
address bus comprises supplementary address lines permitting
identification of the word concerned: the address adr is composed
of a block address part adrb and a word address part adrm. Thus,
after agreement of the bus arbitrator AB, the central memory RAM
receives either a block transaction request (p=0) and in this case,
only the block part adrb of the address adr is significant, or a
word transaction request (p=1) and, in this case, the whole address
adr (block adrb and word adrm) is significant.
[0151] In the case of a block read, p=0 and r=0, the logic gate ET6
validates the decoder DECLB, which delivers a loading signal
ld.sub.j to the shift register RDM.sub.j, permitting the block read
from the central memory RAM at the address adrb by the reading
signal {overscore (r)} to be loaded into the latter.
[0152] In the case of a block write, p=0 and w=0, the logic gate
ET5 validates the decoder DECEB, which delivers a validation signal
to the buffer BV.sub.j, permitting the contents of the register
RDM.sub.j to be presented to the central memory RAM and thus to be
written at the address adrb, the output of the logic gate ET5
providing the block writing signal. The latter is broadcast on the
writing inputs w.sub.1 . . . w.sub.i . . . of the clipped words of
the central memory RAM across the logic gates OU1.sub.i.
[0153] In the case of a word read, p=1 and r=0, the logic gate ET3
validates the decoder DECL, which delivers a validation signal to
the buffer BFS.sub.i, permitting the requested word (address adrm
in the block adrb), the reading of which is assured by the signal
{overscore (r)}, to be directed toward the special communication
bus BUSD. This word is recovered directly by the processor
CPU.sub.j on its data input data.
[0154] In the case of a word write, p=1 and w=0, the logic gate ET4
validates the decoder DECE, which furnishes on its output .sub.i a
signal directed, across the logic gate OU1.sub.i, toward the
writing input w.sub.i of the clipped word of the central memory RAM
concerned. This signal, present at the input w.sub.i, permits
writing into this sole clipped word the word furnished by the
processor CPU.sub.j on the data bus BUSD. The contents of this bus
are presented in parallel on all the clipped words of the central
memory RAM, due to an activation of the buffers BFE.sub.i by the
signal emitted by the logic gate ET4.
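The four cases of paragraphs [0151] to [0154] can be summarized,
purely for illustration, by the following decision sketch; the
signal names follow the text, but the active-high convention and
the dictionary representation are assumptions of the sketch.

    def decode_controls(p, op, j=None, adrm=None, words_per_block=16):
        """Control signals asserted for one transaction (True = asserted)."""
        if p == 0 and op == "r":                # block read ([0151])
            return {"ld": j}                    # load RDM_j from the RAM
        if p == 0 and op == "w":                # block write ([0152])
            return {"bv": j,                    # RDM_j drives the RAM inputs
                    "w": list(range(words_per_block))}  # every clipped word
        if p == 1 and op == "r":                # word read ([0153])
            return {"bfs": adrm}                # word adrm driven onto BUSD
        if p == 1 and op == "w":                # word write ([0154])
            return {"bfe": True, "w": [adrm]}   # only clipped word adrm written

    assert decode_controls(0, "r", j=3) == {"ld": 3}
    assert decode_controls(1, "w", adrm=5) == {"bfe": True, "w": [5]}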
[0155] An essential characteristic of the architecture of the
invention is to present a minimum load of requests on the common
bus BUSA. In the architecture shown schematically in FIG. 7, the
common bus BUSA is solicited by both the block addresses and the
word addresses. The frequency of the word address requests is a
function of the proportion of shared data and may lead to a
saturation of the common bus BUSA.
[0156] FIG. 9 shows, as a variation, a solution for reducing this
load. The system provided comprises, in addition to the resources
of FIG. 7, a bus BUSAM for the word addresses, an arbitrator AB'
for arbitrating the access conflicts on the bus BUSAM, and an
arbitrator ABM for arbitrating the access conflicts originating
from the busses BUSA and BUSAM, connected to a multiplexer MUX
which in turn is connected through its inputs to the two busses
BUSA and BUSAM.
[0157] The operation of this system is as follows:
[0158] If the request concerns non-shared data (p=0), any miss
generates a memory request of the block type, which passes through
the common bus BUSA.
[0159] If the request relates to shared data (p=1), the request is
directed toward the common bus BUSAM. Thus, the central memory RAM
may receive simultaneous requests on the two busses BUSA and BUSAM,
which must then be arbitrated. The arbitrator ABM grants, in a
conventional manner, access to the central memory RAM to one of the
two requests and reconstructs the signal p from the origin of the
request (p=0 for BUSA, p=1 for BUSAM). The signal p thus controls
both the multiplexer MUX, which passes the signals of the bus
concerned by the request, and the decoding logic DEC: the situation
is then the same as in the preceding system.
[0160] It will be noted that the load is only shifted from the
common bus to the central memory RAM, since the rate of requests at
the level of the latter remains the same, and its cycle time is of
the same order of magnitude as, or even greater than, that of the
bus cycle.
[0161] This solution is therefore only of interest if the central
memory RAM comprises independent memory banks organized according
to the description given later with reference to FIG. 20: several
transactions may in this case, if they affect different memory
banks, take place simultaneously.
[0162] FIG. 10 presents a block diagram of an embodiment of the
architecture according to the invention, in which the problem of
shared data is treated in a dynamic fashion. To this end, the
system conforming to this embodiment comprises a bus snooper
processor PE.sub.j coupled to a parallel link management processor
PGP.sub.j. A series link management processor PGS.sub.j is
connected to the snooper processor PE.sub.j by a waiting line
FIFO.sub.j. A management processor PGU.sub.j for the requests from
the central unit is connected, first, to the treatment processor
CPU.sub.j, and also to the parallel link management processor
PGP.sub.j and to the series link management processor PGS.sub.j.
The logic corresponding to the management processor PG.sub.j of
each cache memory is, in this embodiment, split into the various
processors presented above. The access to the cache memory MC.sub.j
and to its directory RG.sub.j is controlled by a directory and
cache management processor PGR.sub.j.
[0163] Finally, a management processor PGM of the central memory
RAM is connected to the bus BUSA and to the central memory RAM, and
to its shift registers RDM.sub.j.
[0164] The functioning of the assembly is as follows:
[0165] Each transaction on the common bus BUSA corresponds to a
reading or writing request for a block b.sub.i. The snooper
processors of the bus PE.sub.j are activated by each reading
request for a block of data. This operation, carried out in the
same cycle by all of the snooper processors, permits preserving the
uniqueness of the value of the shared data. The snooper processor
PE.sub.j has an access to the directory RG.sub.j. The mapping
function used for the management of the cache memory MC.sub.j is,
in the described embodiment, of the direct-mapped type. Each
element of the directory is a descriptor of a block, which contains
a "tag" field (address of the block), the conventional status bits
of the block (a validation bit v and a modification bit m), and two
supplementary bits: a, for noting that the block is known to the
cache memory but still in the course of transfer on the series
link, and f, for indicating that the block is in the waiting line
FIFO, thus avoiding its being placed there several times.
[0166] The memory management processor PGM maintains a waiting line
AFIFO of addresses of blocks bi and of addresses of processors,
accessible in an associative manner, and a status directory of
blocks constituted by 2 bits per block, ro and rw, indicating the
following possible states of the block:
[0167] ro=rw=0: block not yet broadcast,
[0168] ro=1; rw=0: block already broadcast in reading: one or
several copies of this block being found in the cache memories,
ro=0; rw=1: block already broadcast in writing: the current copy of
this block being found in one cache memory.
[0169] The evolution of the block status bits is as follows,
differing according to the nature of the request of the treatment
processor CPU.sub.j:
[0170] If the processor CPU.sub.j makes a request for reading
non-shared data (program space or explicitly non-shared data), the
block is marked already broadcast in reading (ro=1; rw=0) on the
central memory side at the time of transfer of said block from the
central memory RAM to the register RDM.sub.j, and marked
non-modified (m=0) on the cache memory side, in the same cycle, in
the directory RG.sub.j of the cache memory (valid block). The
snoopers have not reacted to the request on the common bus (the
request having been made with the indication "shared" or
"non-shared").
[0171] If the processor CPU.sub.j makes a request for reading data
(a priori shared), the common bus BUSA is occupied for the time of
passage of the address information and of the type of request, plus
the time of their treatment by the processor PGM and by the
snoopers of the common bus PE.sub.j. In the central memory RAM,
this block may be:
[0172] 1. Not yet broadcast: ro=rw=0. It is then transmitted to the
central unit UC.sub.j and takes the non-modified state.
[0173] 2. Already broadcast in reading: ro=1; rw=0. It is then
transmitted to the central unit UC.sub.j. Its state does not
change.
[0174] 3. Already broadcast in writing: ro=0; rw=1. The current
copy of this block is found in a cache memory MC.sub.i. The snooper
processor associated with this cache memory has noted the request
of the central unit UC.sub.j at the time of passage of the address
on the common bus and undertakes the transfer of the block to the
central memory RAM as soon as possible on the series link LS.sub.i.
While awaiting this effective transfer, the memory management
processor PGM places the request in the associative waiting line,
which comprises a number of elements equal to the number of
processors.
[0175] At the time of the reading request on the common bus BUSA,
all the snooper processors PE.sub.i have reacted by consulting the
directory RG.sub.i associated with their cache memory MC.sub.i. The
common bus BUSA is only freed when all the snooper processors
PE.sub.i have had their access to the management directory
RG.sub.i, which assures the same state of the block in the entire
system. The processor which has the current copy in its cache
memory carries out, as soon as its series link is free, the
transfer of this block into the register RDM.sub.i and makes a
writing request for the block on the common bus, which will have
the effect of freeing the request waiting in the associative file
AFIFO and of updating the status bits of the block.
[0176] The update of the block only requires a writing to the
central memory RAM, without activation of the snoopers.
[0177] If the processor CPU.sub.j requests the writing of a datum
in a block present in its cache memory MC.sub.j with a non-modified
state, an informative writing request must be emitted on the common
bus BUSA as it is possible that other cache memories MC.sub.i have
this block with the same status. These other memories must be
informed of the change of state. To this end, all the snooper
processors PE.sub.i (activated by the informative writing broadcast
on the common bus BUSA) consult their management directory and
invalidate this block, while the central memory notes at the same
time the change of state of this block, as does the parallel
management processor PGP.sub.j in the management directory
RG.sub.j. The liberation of the common bus BUSA by all the snooper
processors and the central memory RAM permits the processor
CPU.sub.j to carry out the writing in its cache memory MC.sub.j,
the updating of the status bit of the management directory RG.sub.j
having been carried out.
[0178] If another central unit is awaiting access to the bus BUSA
for the same request on the same block, its request is transformed
into a simple write and then follows the protocol of the block
write request.
[0179] If the processor CPU.sub.j requests writing of a datum in a
block absent from the cache memory MC.sub.j, this block is read
into the central memory RAM and brought into the cache memory
MC.sub.j in order that the writing will be carried out
effectively.
[0180] In the central memory RAM, this block may be:
[0181] 1. Not yet broadcast: ro=rw=0. The block is then emitted on
the series connection LS.sub.j to the cache memory MC.sub.j. It
takes the states ro=0; rw=1 in the central memory and the modified
state (m=1) in the cache memory,
[0182] 2. Already broadcast in reading: ro=1; rw=0. The block is
sent on the series connection LS.sub.j to the cache memory
MC.sub.j. It takes the states ro=0, rw=1 in the central memory, and
the modified state (m=1) in the cache memory. At the time of the
request on the common bus BUSA, the snooper processors PE.sub.i
have noted the request and invalidated this block in their cache
memory MC.sub.i.
[0183] 3. Already broadcast in writing: ro=0; rw=1. The request is
placed in the associative waiting line AFIFO and the common bus
BUSA is liberated.
[0184] The snooper processor PE.sub.i of the cache memory MC.sub.i,
holder of the current copy, activates as soon as possible the
transfer of the block requested from its cache memory MC.sub.i to
the central memory RAM. This block is then invalidated in the cache
memory MC.sub.i.
[0185] The central unit UC.sub.j makes a writing request to purge
the block in the two following cases:
[0186] a) the cache memory is saturated and the purging of a block
requires sending this block into the central memory,
[0187] b) a central unit UC.sub.i is awaiting a block the only
current copy of which is found in the cache memory MC.sub.j. The
snooper processor notes the request and provides as soon as
possible the purging of this block.
[0188] On the central memory RAM side, each request for a writing
operation entails a consultation of the associative waiting line
AFIFO and, in the case of discovery of a central unit UC.sub.i
awaiting this block, the loading of this block into the shift
register RDM.sub.i and the updating of the status bits
corresponding to this block. This type of writing request does not
activate the snooper processors.
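Purely as an illustration of the protocol of paragraphs [0170] to
[0188], the following self-contained Python sketch models the
memory-side status bits ro, rw and the associative waiting line
AFIFO; cache-side snooping and invalidation are not modeled, and
the encoding is an assumption of this sketch.

    status = {}    # block -> (ro, rw); absent means not yet broadcast (0, 0)
    afifo = []     # associative waiting line: (block, processor, op) triples

    def request(block, processor, op):
        """op is 'r' (read) or 'w' (fetch a block for writing)."""
        ro, rw = status.get(block, (0, 0))
        if (ro, rw) == (0, 1):                  # current copy lives in a cache:
            afifo.append((block, processor, op))    # wait for its write-back
            return "wait"
        status[block] = (1, 0) if op == "r" else (0, 1)
        return "send block to " + processor     # via RDM_i and the series link

    def write_back(block):
        """A cache purges the block; serve any unit awaiting it."""
        status[block] = (0, 0)                  # the copy is home again
        for i, (b, proc, op) in enumerate(afifo):
            if b == block:
                afifo.pop(i)
                return request(block, proc, op)
        return "written to RAM"

    assert request(5, "UC_1", "w") == "send block to UC_1"
    assert request(5, "UC_2", "r") == "wait"
    assert write_back(5) == "send block to UC_2"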
[0189] The directory management processor PGR.sub.j which is added
in this embodiment, permits the execution of the algorithm
mentioned above, while coordinating the accesses to the directory
of the cache memory MC.sub.j which receives the requests of three
asynchronous functional units:
[0190] 1. The treatment processor CPU.sub.j, in order to read the
instructions of the program during execution and to read or write
the data manipulated by this program,
[0191] 2. The common bus BUSA, in order to maintain the coherence
of the data in the cache memory MC.sub.j,
[0192] 3. The series link LS.sub.j, in order to load/unload a block
of information from/to the central memory RAM.
[0193] Each of these requests arrives at the management directory
RG.sub.j of the cache memory. The serialization of these accesses
to said management directory assures the proper operation of the
algorithm mentioned above, with coherence of the information in the
cache memories. There is thus obtained a strong coupling of the
requests at the level of the management directory RG.sub.j, but the
synchronization which must exist at the level of the treatment of
these requests is sufficiently loose to permit an asynchronous
functioning of the treatment logic for these requests, which leads
to the following functional segmentation:
[0194] The interface of each processor CPU.sub.j and of its
auxiliaries (PGS.sub.j, PGU.sub.j) with the common bus BUSA is
composed of two parts having a mutually exclusive functioning: the
management processor for the parallel link PGP.sub.j is charged
with requesting the common bus BUSA at the request of the request
management processor PGU.sub.j or of the series link management
processor PGS.sub.j, and with controlling the common bus BUSA in
writing. The snooper processor of the bus PE.sub.j assures the
snooping function, which amounts to controlling the common bus BUSA
in reading. It frequently accesses the directory RG.sub.j of the
cache memory MC.sub.j.
[0195] The management processor of the series link PGS.sub.j
manages the interface with the series link LS.sub.j. It assures the
loading and unloading of blocks of information bi upon request of
the request management processor PGU.sub.j and of the snooper
processor of the bus PE.sub.j. It accesses the cache memory
MC.sub.j and the corresponding management directory RG.sub.j less
frequently.
[0196] The request management processor PGU.sub.j assures the
handling of the requests issued from the processor CPU.sub.j. It
accesses the cache memory MC.sub.j and the management directory
RG.sub.j very frequently. This interface includes the possible
"MMU" logic (memory management unit) usually associated with the
treatment processor CPU.sub.j.
[0197] The management processor PGR.sub.j of the management
directory RG.sub.j is the arbitrator charged with granting access
to the cache memory MC.sub.j.
[0198] FIGS. 11, 12, 13, 14, 15, 16 and 17 show by way of examples
the embodiments of the various functional units of the system of
FIG. 10. The designations of signals or inputs and outputs of these
units are chosen in a conventional manner. The signals of the same
functionality which are generated in each functional unit from a
base signal will be designated by the same reference, for example:
dnp=non-shared data, dl=reading request, maj=updating, de=writing
demand, e.sub.i=informative writing. The system has several
processors, and the index `.sub.j` used heretofore refers to a
current processor and its auxiliaries; to lighten the description,
this index has been omitted in these figures, and it will be
understood that the description which follows is directed at each
of the functional units which are connected with each treatment
processor. Further, the signals denoted x_YZ define the
name and the origin of the signal in the case where YZ=RG, MC, UC,
and the source and the destination of the signal in the other
cases, with Y and Z representing: U=PGU, R=PGR, P=PGP or PE,
S=PGS.
[0199] The cache memory MC shown in FIG. 11 has, for example, a
capacity of 16K octets. It is organized in 16 modules of high speed
read/write memory of 1K octets, MC.sub.0 . . . MC.sub.15, each
accessible on a front of 4 octets: the address bus of the cache
memory MC (noted adr_MC) comprises a partial block address adr_bloc
and a partial word address in the block, adr_mot. The address bus
adr_MC comprises 14 lines, permitting addressing the 16K octets of
the cache memory MC. The part adr_bloc comprises 8 lines,
permitting addressing the 256 block sites of the cache memory, and
the part adr_mot 6 lines, permitting addressing a word in the
block, the size of which is, in the example, 64 octets.
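This address decomposition may be illustrated by the following
sketch; the bit positions follow the line counts just given, while
the function name and the tuple representation are assumptions of
the sketch.

    def split_cache_address(adr_mc):
        """14-bit cache address -> (block frame, octet in block, module)."""
        adr_mot = adr_mc & 0x3F            # 6 low bits: octet within the block
        adr_bloc = (adr_mc >> 6) & 0xFF    # 8 bits: one of 256 block frames
        module = adr_mot >> 2              # 4 significant bits: MC_0 .. MC_15
        return adr_bloc, adr_mot, module

    assert split_cache_address(0b10110101_011100) == (0b10110101, 0b011100, 7)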
[0200] The address part adr_bloc is connected to the address input
of each of the memory modules MC.sub.0 . . . MC.sub.15. The word
address part adr_mot is connected to the input of two decoders DEC0
and DEC1 (only the 4 significant bits of the address bus adr_mot
are used: the address is an octet address and the cache has an
access unit which is a word of 4 octets). The read signal
{overscore (r)}_MC is delivered to each of the read inputs of the
memory modules MC.sub.0 . . . MC.sub.15 and to one of the inputs of
a logic gate OU1. The other input of this logic gate OU1 receives
the inverted signal {overscore (bloc)}. The writing signal
{overscore (w)}_MC is delivered to one of the two inputs of the
logic gates OU2 and OU3. The logic gate OU2 receives on its other
input the inverted signal {overscore (bloc)}. The logic gate OU3
receives on its other input the signal {overscore (bloc)}. The
output of the logic gate OU1 is connected to the validation input
{overscore (en1)} of the decoder DEC1, and the output of digit
position i of this decoder DEC1 activates a validation buffer BVL
of digit position i. The output of the logic gate OU2 is connected
to the validation input {overscore (en0)} of the decoder DECO and
to the validation buffers BVE. The output of the logic gate OU3 is
connected to the logic gates ET1.sub.0 . . . ET1.sub.15, which
receive on their other input the output of the corresponding logic
position of the decoder DEC0. The output i of each logic gate
ET1.sub.0 . . . ET1.sub.15 is connected to the writing input
{overscore (w)}.sub.0 . . . {overscore (w)}.sub.15 of each memory
module MC.sub.0 . . . MC.sub.15. A data bus connects each memory
module MC.sub.0 . . . MC.sub.15 to one of the validation buffers
BVL and to one of the validation buffers BVE. The outputs of the
buffers BVL and the inputs of the buffers BVE are connected in
parallel to a data bus datamot_MC (connected to the request
management processor PGU).
[0201] The operation of the example of the cache memory described
above is as follows:
[0202] Case 1
[0203] The request comes from the management processor of the
series link PGS. This case is signalled by the presence of a logic
state zero on the signal bloc.
[0204] For reading the cache memory, the management processor of
the series link PGS presents on the address bus adr_MC the address
of the placement of the block to be read (in this case, only the
part adr_bloc of the bus adr_MC is used) and activates the reading
signal {overscore (r)}_MC. At the end of the access time, the block
is available on the bus databloc_MC.
[0205] In writing the cache memory, the management processor of the
series link presents on the address bus adr_MC the address of
placement of the block to write, on the data bus databloc_MC the
data to be written there, and activates the line {overscore
(w)}_MC. The zero state of the signal bloc directs the signal
{overscore (w)}_MC toward the writing control inputs of the cache
memory modules MC.sub.0 . . . MC.sub.15, via the logic gates OU3
and ET1. The information presented on the data bus databloc_MC is
written in the cache memory at the end of the writing.
[0206] Case 2
[0207] The request comes from the request management processor PGU
of the treatment processor CPU. This case is signaled by the
presence of a logic state one on the signal bloc.
[0208] For reading the cache memory, the management processor PGU
presents on the bus adr_MC the address of the word requested and
activates the reading signal {overscore (r)}_MC. The block
corresponding to the part adr_bloc is read in the cache memory, and
the word requested is directed, via one of the validation buffers
BVL, toward the data bus datamot_MC. The validation buffer BVL
concerned is activated by the output of the decoder DEC1
corresponding to the word address adr_mot requested.
[0209] In writing the cache memory, the management processor PGU
presents on the bus adr_MC the address of the word to be written,
on the data bus datamot_MC the data to be written, and activates
the writing signal {overscore (w)}_MC. The data presented on the
bus datamot_MC is distributed on each cache memory module, via the
buffers BVE validated by the writing signal. The writing signal
{overscore (w)}_MC is then presented to only the memory module
concerned: it is delivered by the output of the decoder DEC0
corresponding to the address adr_mot concerned.
[0210] In the embodiment described above, the problems of octet and
double-octet accesses, and of double-octet and word accesses
straddling two memory modules, are resolved in the same manner as
in conventional information systems and are not described here.
[0211] FIGS. 12a, 12b, 12c and 12d show, by way of example, the
characteristics of a management directory of the cache RG and of an
associated management processor PGR. FIG. 12a illustrates the
logical structure of the address adr_RG under the hypothesis of a
32-bit address space and with the characteristics of the cache
memory described previously. The field --tag--, constituting the
block address, is coded on 18 bits. The field --cadre-- is coded on
8 bits and permits addressing the 256 block positions of the cache
memory MC. The last 6 bits define the address of the word in the
block, in octet units.
[0212] FIG. 12b shows the structure of the management directory of
the cache RG, which is a simple fast read/write memory of 256 words
of 22 bits. Each word of the address i contains the descriptor of
the block written in the position i of the cache memory.
[0213] FIG. 12c shows schematically the structure of the descriptor
which comprises:
[0214] a field tag of 18 bits, defining the address of the block in
the position or frame of the current block,
[0215] the validation bit v,
[0216] the modification bit m,
[0217] the wait-for-end-of-transfer bit a,
[0218] the wait-for-purge bit f.
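For illustration, the descriptor just enumerated can be represented
as follows; the field list follows the text and FIG. 12b (256 words
of 22 bits), while the Python encoding is merely illustrative.

    from dataclasses import dataclass

    @dataclass
    class Descriptor:      # one entry of RG per block frame
        tag: int           # 18-bit block address field
        v: int = 0         # validation bit
        m: int = 0         # modification bit
        a: int = 0         # wait-for-end-of-transfer bit
        f: int = 0         # wait-for-purge bit

    rg = [Descriptor(tag=0) for _ in range(256)]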
[0219] FIG. 12d provides the structure of the processor PGR, which
is nothing other than a conventional arbitrator with fixed
priority.
[0220] This arbitrator comprises a register LATCH, of which three
inputs receive respectively the signals {overscore (rqst)}_UR,
{overscore (rqst)}_PR, {overscore (rqst)}_SR, also delivered
respectively to the logic gates ET2, ET3, ET4. The corresponding
outputs of the register LATCH are connected to the inputs of a
priority encoder PRI, the outputs of which are connected to the
inputs of a decoder DECPRI. The outputs of the digit position
corresponding to those of the register LATCH are connected to the
signals {overscore (grnt)}_UR, {overscore (grnt)}_PR, {overscore
(grnt)}_SR as well as, in an inverted manner, respectively to the
inputs of the logic gates ET2, ET3, ET4. The outputs of the logic
gates ET2, ET3, ET4 are connected to the inputs of the logic gate
NOU1. The output of the logic gate NOU1 is connected to a flip flop
B1, which receives on its input D the output {overscore (eO)} of
the priority encoder PRI. The assembly of the apparatus is
synchronized by a general clock which delivers a signal h to one of
the inputs clk of a logic gate ET5 and, in inverted form, to the
clock input of the flip flop B1. The output {overscore (Q)} of the
flip flop B1 is connected to the other input of the logic gate ET5.
The output of the logic gate ET5 is connected to the input load of
the register LATCH.
[0221] The operation of this arbitrator is as follows: in the
absence of any request on the lines {overscore (rqst)}, the flip
flop B1 permanently stores the state of the line {overscore (eO)},
inactive, and thus validates, through the logic gate ET5, the
loading of the register LATCH.
[0222] The arrival of a signal {overscore (rqst)} provokes the
locking of the clock and the activation of the signal {overscore
(grnt)} associated with the signal {overscore (rqst)}, until
deactivation of the latter: the arbitrator is frozen in its state
during the entire transaction in progress.
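The behavior just described may be illustrated by the following
sketch of one fixed-priority arbitration decision; the priority
order PR > SR > UR is an assumption of the sketch, the text fixing
only that the priority is fixed and that the grant is frozen for
the duration of the transaction.

    def arbitrate(requests, held=None):
        """requests: dict name -> asserted; held: requester now granted."""
        if held is not None and requests.get(held):
            return held                      # frozen during the transaction
        for name in ("PR", "SR", "UR"):      # assumed fixed priority order
            if requests.get(name):
                return name
        return None

    assert arbitrate({"UR": True, "SR": True}) == "SR"
    assert arbitrate({"UR": True, "SR": True}, held="UR") == "UR"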
[0223] The request management processor PGU, represented in FIG.
13, constitutes an interface between the treatment processor CPU
and:
[0224] the various processors with which it must exchange
information: parallel management processor PGP, series management
processor PGS, directory management processor PGR, and
[0225] the management directory of the cache memory RG and the
cache memory MC.
[0226] The treatment processor CPU triggers the activity of the
request management processor PGU by activating the signal
{overscore (as)} ("address strobe"). This signal validates the
address bus adr_CPU, the reading signal {overscore (r)}_CPU, the
writing signal {overscore (w)}_CPU, and the function lines fc_CPU
of the treatment processor CPU. The treatment processor CPU is then
placed in a waiting state until acknowledgement of the request by
the signal {overscore (dtack)}_CPU.
[0227] The signal as is connected to the input of a differentiator
circuit DI0. The output of this circuit is connected to one of the
three inputs of a logic gate ET12, the two other inputs receive
respectively the signals {overscore (ack)}_US and {overscore
(ack)}_UP. This latter signal is also delivered to the input R of a
flip flop B13. The input {overscore (S)} of the flip flop B13
receives the output of a logic gate NET10. The output of the logic
gate ET12 is connected to the input {overscore (S)} of a flip flop
B11, to the input {overscore (S)} of the flip flop B10 and to the
input clear10 of a shift register SR10. The output {overscore (Q)}
of the flip flop B11 furnishes the signal {overscore (rqst)}_UR.
The flip flop B11 receives on its input {overscore (R)} the
inverted phase .theta.13 and the flip flop B10 the inverted phase
.theta.11. The output Q of the flip flop B10 is connected to the
series input serial_in10 of the register SR10. The shift register
SR10 receives on its input clk10 the clock signal --h-- and on its
validation input en10 the signal grnt_UR.
[0228] The activation of the signal {overscore (as)} releases the
operation of the differentiator circuit DI0. The pulse produced by
this circuit, across the logic gate ET12, sets to the logic state
one the flip flops B10 and B11 by their inputs {overscore (S)}, and
also carries out the resetting to zero of the shift register SR10
by its input {overscore (clear)}10.
[0229] The flip flop B10 and the shift register SR10 constitute the
logical sub-assembly "phase distributor" DP_U. The activation of
this phase distributor is released by the setting to one of the
flip flop B10 and the resetting to zero of the shift register SR10.
If the shift register is validated by the presence of a level zero
on its input {overscore (en10)}, then the next pulse of the clock h
on the input clk of the shift register produces the shifting of one
step of said register.
[0230] The logic state one of the flip flop B10 is propagated to
the output .theta.11 of the shift register SR10 via its series
input serial_in10. The output Q11, called phase .theta.11,
inverted, resets the flip flop B10 to zero through its input R.
Thus, one single bit is introduced into the shift register SR10 on
each activation of the phase distributor DP_U. At each clock pulse
h, this bit is shifted in the shift register SR10 and produces the
consecutive disjoint phases .theta.11, .theta.12, .theta.13.
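The phase distributor's behavior can be illustrated by the
following toy model, in which a single injected bit walks through
the shift register, producing the disjoint phases; the stage count
and the list representation are illustrative assumptions.

    def phases(n_pulses, stages=3):
        """One bit walks through SR10: theta11, theta12, theta13."""
        sr = [0] * stages
        inject = 1                      # flip flop B10 set by the request pulse
        history = []
        for _ in range(n_pulses):
            sr = [inject] + sr[:-1]     # shift on each clock pulse h
            inject = 0                  # B10 is reset by the inverted theta11
            history.append(list(sr))
        return history

    assert phases(3) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]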
[0231] The setting of the flip flop B11 to the logic state one
causes activation of the signal {overscore (rqst)}_UR. This signal
is emitted to the directory management processor PGR. The latter,
as soon as possible, grants access to the management directory RG
and to the cache memory MC by activating the signal {overscore
(grnt)}_UR, which validates the assembly of passage buffers BV10,
BV11 and BV12, situated respectively on the busses of the
management directory and on the busses of the cache memory. This
signal {overscore (grnt)}_UR also validates the phase distributor,
which then produces sequentially the phases .theta.11, .theta.12,
.theta.13.
[0232] The phase .theta.11 corresponds to a delay permitting the
reading of the descriptor of the block requested by the treatment
processor CPU in the management directory RG, addressed by the
field frame of the address adr_CPU, connected to the bus adr_RG
through the passage buffers BV10. The signal {overscore (r)}_RG is
always active on the input of a buffer BV10, the signal {overscore
(w)}_RG always inactive on the input of a buffer BV10. At the end
of this delay, the descriptor is returned to the processor PGU via
the bus data_RG. The tag part of this descriptor and the validation
bit v are delivered to one of the comparison inputs of a comparator
COMP10, the other input being connected to the tag part of the
address adr_CPU. The bit compared opposite the validation bit v is
always at one. The comparator COMP10 is validated permanently by
the presence of a level one on its input en11.
[0233] The access time of the management directory RG and the
frequency of the clock h are in a ratio such that, at the end of
the phase .theta.11, the output eg10 of the comparator COMP10 is
positioned and furnishes the information "the block requested is
present in the cache memory or absent from the cache memory."
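The presence test performed by COMP10 may be sketched as follows;
the field positions follow FIG. 12a, while the dictionary form of
the descriptors is an assumption of the sketch.

    def present(rg, adr_cpu):
        """eg10: tag and validation bit compared together (COMP10)."""
        tag = (adr_cpu >> 14) & 0x3FFFF         # 18-bit tag field
        cadre = (adr_cpu >> 6) & 0xFF           # 8-bit frame field
        desc = rg[cadre]                        # descriptor read from RG
        return desc["v"] == 1 and desc["tag"] == tag

    rg = [{"tag": 0, "v": 0} for _ in range(256)]
    rg[3] = {"tag": 0x2A, "v": 1}
    assert present(rg, (0x2A << 14) | (3 << 6) | 12)
    assert not present(rg, (0x2B << 14) | (3 << 6))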
[0234] If the block requested is present in the cache memory MC
(eg10=1), then the signal eg10, delivered to one of the two inputs
of the logic gate ET10, furnishes a signal calibrated by the phase
.theta.12, present on the other input of the logic gate ET10.
[0235] This calibrated signal present on the output of the logic
gate ET10 is connected to the inputs of the logic gates NET10,
NET11, NET12.
[0236] The logic gate NET10 receives on its inputs, in addition to
the output of the logic gate ET10, the inverted status bit m coming
from the descriptor, and the inverted writing request signal
{overscore (w)}_CPU.
[0237] The activation of the logic gate NET10 corresponds to the
state "writing request for a word in a block present in the cache
and which has not yet been modified (m=0)". The output of the logic
gate NET10 is connected to the input S of a flip flop B13. The
activation of the logic gate NET10 sets the flip flop B13 in the
logic state of one, which releases an informative writing request
by the line {overscore (rqst)}_UP to the management processor of
the parallel link PGP. The address of the block concerned is
furnished by the lines adr_bloc_UP, derived from the lines
adr_CPU.
[0238] The request management processor PGU has terminated the
first part of its task: the management directory RG and the cache
memory are freed by deactivation of the signal {overscore
(rqst)}_UR, as a consequence of the arrival of the inverted phase
.theta.13 on the input R of the flip flop B11.
[0239] The informative writing mechanism is described in the
paragraph "management processor of the parallel link PGP", and has
for effect to place the requested block in the modified state (m=1)
or the invalid state (v=0). One will note that the liberation of
the management directory RG and the cache memory MC by the request
management processor PGU is necessary in order that the management
processor of the parallel link PGP may have access thereto. The end
of the "informative writing" operation is signalled to the request
management processor PGU by the activation of the signal {overscore
(ack)}_UP, which has the effect of resetting the flip flop B13 to
zero and of activating, through the logic gate ET12, the flip flop
B10, the flip flop B11 and the phase distributor: the cycle
initially activated by the signal {overscore (as)} is reproduced,
but the sequence consequent upon the activation of the gate NET10
is not reproduced a second time in this request cycle.
[0240] The logic gate NET11 receives on its inputs, other than the
output of the logic gate ET10, the status bit m from the
descriptor, and the inverted signal of the writing request
{overscore (w)}_CPU. The activation of the gate NET11 corresponds
to the state "writing request for a word in a block present in the
cache and which has already been modified". The output of the gate
NET11 is
connected, through one of the buffers BV11, to the writing signal
{overscore (w)}_MC of the cache memory MC. This signal permits
writing in the cache memory, at the address present on the bus
adr_MC, connected to the bus adr_CPU, via the buffers BV11, the
data present on the bus data_MC, connected to the bus data_CPU via
the bidirectional buffers BV12. The direction of activation of
these buffers is furnished by the signal {overscore (w)}_MC.
[0241] The output of the gate NET11 is also connected to one of the
inputs of the logic gate ET11, which then sends back the signal
{overscore (dtack)}_CPU to the treatment processor CPU. The writing
operation in the cache is done in parallel with the activation of
the signal {overscore (dtack)}_CPU, which conforms to the usual
specifications of treatment processors.
[0242] The operation is terminated by freeing of the management
directory RG and of the cache memory MC by deactivation of the
signal {overscore (rqst)}_UR, as a consequence of the arrival of
the inverted phase .theta.13 on the flip flop B11.
[0243] The logic gate NET12 receives on its inputs, in addition to
the output of the logic gate ET10, the inverted reading request
signal {overscore (r)}_CPU. The activation of the logic gate NET12
corresponds to the state "reading request for a word in a block
present in the cache."
[0244] The continuation of the operations is identical to the
preceding operation, with the sole difference of the activated
signal ({overscore (r)}_MC rather than {overscore (w)}_MC)
associated with the direction of transit of the data on the busses
data_CPU and data_MC.
[0245] If the block requested is absent from the cache memory
(eg10=0), then the signal eg10, inverted, connected to one of the
two inputs of the logic gate NET13, furnishes a signal calibrated
by the phase .theta.12, present on the other input of the logic
gate NET13. The output of the logic gate NET13 is connected to the
input {overscore (S)} of the flip flop B12. This calibrated signal
forces the flip flop to one, which has the effect of sending a
service request {overscore (rqst)}_US to the management processor
of the series link PGS. This processor also receives the address of
the block to request on the lines adr_bloc_US and the nature of the
request on the lines {overscore (w)}_US, {overscore (r)}_US and
fc_US.
[0246] The request management processor PGU has terminated the
first part of its task: the management directory RG and the cache
memory MC are freed by deactivation of the line {overscore
(rqst)}_UR, as a consequence of the arrival of the inverted phase
.theta.13 on the flip flop B11.
[0247] The mechanism for updating the cache is described in the
paragraph "management processor for the series link PGS".
[0248] One will note that the liberation of the management
directory RG and the cache memory MC is necessary in order that the
management processor for the series link PGS may have access
thereto.
[0249] The updating of the cache is signalled to the request
processor PGU by the activation of the signal {overscore (ack)}_US.
This signal is delivered to the input {overscore (R)} of the flip
flop B12 and to the input of the gate ET12. It thus has the effect
of resetting to zero the flip flop B12 and activating through the
logic gate ET12, the flip flop B11 and the phase distributor: the
cycle initially started by the signal {overscore (as)} is
reproduced, but this time with success due to the presence of the
block in the cache memory.
[0250] The series management processor PGS shown by way of example
in FIG. 14 is charged with managing the series link LS, in this way
carrying out the requested block transfers between the central
memory RAM and the cache memory MC and producing the corresponding
operations in the management directory RG. It treats in priority
the requests from the snooper processor PE, which wait in a waiting
line FIFO. It also treats the requests from the request management
processor PGU.
[0251] This management processor for the series link PGS comprises
a flip flop B20 which receives on its data input D the signal
{overscore (empty)} coming from the waiting line FIFO and on its
clock input the output {overscore (Q)} of a flip flop B22. A flip
flop B21 receives on its data input the output of a logic gate
OU20. This logic gate OU20 validates the signal {overscore
(rqst)}_US, received on one of its two inputs, the other input
being connected to the signal {overscore (empty)}. The clock input
of the flip flop B21 comes from the output Q of the flip flop B22.
The output {overscore (Q)} of the flip flop B22 is looped back to
its data input D, which makes it divide the clock by two. The
clock input of the flip flop B22 is connected to the output of a
logic gate ET20, which receives on one of its inputs the general
operating clock h and on the other input a validation signal. This
validation signal comes from the output of a logic gate ET24, which
receives respectively on its two inputs the outputs {overscore (Q)}
and Q of the flip flops B20 and B21.
[0252] The flip flop B20 receives on its input {overscore (R)} the
phase .theta.25, inverted, issuing from a phase distributor DP_S.
The flip flop B21 receives on its input {overscore (S)} the output
of a logic gate NOU20. This logic gate NOU20 receives on its two
inputs the respective outputs of the logic gates ET22 and ET23. The
logic gate ET22 receives on its inputs the phase .theta.25 from the
phase distributor and the signal maj from a logic gate ET29. The
logic gate ET23 receives on its inputs the phase .theta.27 from the
phase distributor and the inverted signal maj.
[0253] The assembly of the circuits B20, B21, B22, ET20, ET22,
ET23, OU20, ET24, NOU20 constitutes a fixed-priority arbitrator
ARB_S. Its operation is as follows: the flip flop B22 furnishes on
its outputs {overscore (Q)} and Q alternating signals at half the
frequency of the general clock. These signals alternately validate
the flip flops B20 and B21. If a request for service is present on
one of the inputs of the flip flops B20 or B21, then these
alternating signals store the request in the corresponding flip
flop (B20 for the snooper processor PE, B21 for the request
management processor PGU) which, in return, locks out the
alternation. One will note that the signal {overscore (rqst)}_US
originating from the request management processor PGU is
conditioned by the signal {overscore (empty)} through the gate
OU20: the signal is only considered if the waiting line FIFO is
empty. The flip flop B20 (respectively B21) is only reset to zero
when the transaction to be carried out is terminated.
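The sampling discipline of this arbitrator can be summarized by a short sketch (a hypothetical Python model of the behaviour, not of the circuit; the function name and flags are invented):

```python
# ARB_S behaviour: B22 toggles each clock; one half-cycle samples the
# snooper request (B20), the other the PGU request (B21), the latter
# only when the waiting line FIFO is empty.
def arb_s(snoop_request, pgu_request, fifo_empty, toggle):
    """Return which request is latched on this clock edge, if any."""
    if toggle and snoop_request:
        return "snooper"                      # priority path: FIFO purge requests
    if not toggle and pgu_request and fifo_empty:
        return "PGU"                          # gated by empty through OU20
    return None                               # alternation continues

assert arb_s(True, True, True, toggle=True) == "snooper"
assert arb_s(False, True, True, toggle=False) == "PGU"
assert arb_s(False, True, False, toggle=False) is None  # FIFO not yet drained
```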
[0254] The sampling of a request on one or the other of the flip
flops B20 and B21 is reflected by a change of state of the output
of the logic gate ET24. The output of the logic gate ET24 is also
connected to a differentiator circuit D20 which delivers a pulse
upon a change of state of the output of the logic gate ET24. The
output of the differentiator circuit is connected, on the one hand,
to the phase distributor DP_S (input {overscore (S)} of flip flop
B36 and {overscore (clr)}20 of a shift register SR20) of the
management processor for the series link, and on the other hand to
one of the two inputs of the logic gate OU22.
logic gate is connected to the inputs {overscore (S)} of the two
flip flops B24 and B25. The flip flop B24 receives on its input
{overscore (R)} the output of a logic gate NOU21 and the flip flop
B25 the signal {overscore (grnt)}_SR. The logic gate NOU21 receives
on its two inputs the outputs of the logic gates ET36 and ET37. The
logic gate ET36 receives on its inputs the phase .theta.23 from the
phase distributor and the signal maj, and the logic gate ET37 the
phase .theta.27 from the phase distributor and the signal maj
inverted.
[0255] The pulse from the differentiator circuit D20 thus
initializes the phase distributor and sets the logic state of the
flip flops B24 and B25 to one across the logic gate OU22.
[0256] The phase distributor DP_S is composed of shift register
SR20 and the flip flop B36. Its operation is identical to that
described in the paragraph concerning the request management
processor PGU.
[0257] The output {overscore (Q)} of the flip flop B24 is connected
to the signal {overscore (rqst)}_SR, to the destination of the
directory management processor PGR. Its activation releases a
request for service to this processor, which responds by the line
{overscore (grnt)}_SR, connected to the input {overscore (R)} of
the flip flop B25. The output Q of the flip flop B25 is connected
to one of the inputs of a logic gate OU23. The output of the logic
gate OU23 is connected to the input {overscore (en20)} of the shift
register SR20.
[0258] The logic assembly B24 and B25 constitute a
resynchronization logic RESYNC_S between asynchronous units. Its
operation is as follows:
[0259] A service request {overscore (rqst)}_SR to the directory
management processor PGR is made by activation of the flip flops
B24 and B25 through the logic gate OU22, which authorizes two
origins of activation. The logic proper to the directory management
processor PGR assures a response {overscore (grnt)}_SR in an
indeterminate time, which resets the flip flop B25 to zero. The
flip flop B24 maintains its request until it is reset to zero by
activation of its input {overscore (R)}. In return, the directory
management processor PGR deactivates its line {overscore (grnt)}_SR: the
resynchronization logic is ready to operate for a next request for
service. The output Q of the flip flop B25 serves to block the
phase distributor by action on its input {overscore (en20)} via the
logic gate OU23: the resetting to zero of the flip flop B25 frees
the phase distributor which furnishes the first phase .theta.21
upon the next active transition of the general clock h, connected
to the input clk20 of the phase distributor.
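The handshake can be restated as a small state model (again a hypothetical sketch; signal polarities and clocking are abstracted away):

```python
# RESYNC_S behaviour: B24 holds the request until the transaction ends;
# B25 blocks the phase distributor (via OU23) until the grant arrives.
class Resync:
    def __init__(self):
        self.b24 = False      # request latched (drives rqst_SR)
        self.b25 = False      # while set, en20 blocks the phase distributor

    def request(self):        # pulse from D20 or D21 through OU22
        self.b24 = self.b25 = True

    def grant(self):          # grnt_SR from the directory processor PGR
        self.b25 = False      # distributor resumes on the next clock h

    def release(self):        # end of transaction (theta.23 or theta.27)
        self.b24 = False      # PGR may then withdraw grnt_SR

resync = Resync()
resync.request(); assert resync.b25      # distributor blocked, rqst_SR active
resync.grant();   assert not resync.b25  # phases can advance again
```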
[0260] The line {overscore (grnt)}_SR is also connected to the
validation buffers BV20, BV25 and BV21, BV22 which open access
respectively to the management directory RG and to the cache memory
MC.
[0261] If the flip flop B20 is active, then the transaction in
progress is a purge of the block requested by the snooper processor
PE via the waiting line FIFO. The output Q of the flip flop B20 is
connected to the validation buffers BV24. These buffers receive on
their input the output of a register REG20. The output {overscore
(Q)} of the flip flop B20 is connected to one of the two inputs of
a logic gate OU21, which receives on its other input the output of
the differentiator circuit D20. The output of the logic gate OU21
is connected to the input {overscore (load20)} of the register
REG20 and to the input {overscore (read20)} of the waiting line
FIFO.
[0262] Thus, the activation of the flip-flop B20 provokes:
[0263] 1. Initialization of the phase distributor,
[0264] 2. A request for access to the management directory RG and
to the cache memory MC,
[0265] 3. The loading of the element at the head of the waiting
line FIFO into the register REG20 and advancing of the waiting
line,
[0266] 4. The validation of the buffer BV24: the bus adr_x contains
the address of the block to be purged. The nature of the operation
to be carried out will be deduced from the combination of the bits
v and m (signal maj).
[0267] If the flip flop B21 is active, then the transaction in
progress comes from an information defect in the cache memory MC.
The output {overscore (Q)} of the flip flop B21 is connected to the
validation buffers BV23. These buffers receive on their input the
information originating from the request management processor
PGU.
[0268] Thus, the activation of the flip flop B21 provokes:
[0269] 1. The initialization of the phase distributor,
[0270] 2. A request for access to the management directory RG and
to the cache memory MC,
[0271] 3. The validation of the buffers BV23: the bus adr_x
contains the address of the block which has provoked the
information defect in the cache memory MC, and the nature of the
request: reading or writing, shared data or non-shared data (lines
fc_US).
[0272] The field "frame" of the bus adr_x is connected to the
address lines adr_RG of the management directory through the
validation buffers BV20. the outputs {overscore (Q)} of the flip
flop B26 and B27 are respectively connected to the lines {overscore
(r)}_RG and {overscore (w)}_RG of the management directory through
the validation buffers BV20. The flip flop B26 receives on its
input {overscore (S)} the output of the logic gate OU22 and on its
input {overscore (R)} the phase 022 inverted, coming from the phase
distributor. The flip flop B27 receives on its input {overscore
(S)} the output of a logic gate NOU22 and on input {overscore (R)}
the output of a loghic gate NOU23. The logic gate NOU22 receives on
its two inputs the respective outputs of the logic gates ET25 and
ET26, themselves receiving on their inputs, for the gate ET25 the
phase .theta.22 and the signal maj, and for the gate ET26 the phase
.theta.26 and the signal maj. The logic gate NOU23 receives on its
two inputs the respective outputs of the logic gates ET27 and ET28,
themeselves receiving on their inputs, for the gate ET27 the phase
.theta.23 and the signalmaj inverted, and for the gate ET28 the
phase .theta.27 and the signal maj inverted.
[0273] The field tag of the bus adr_x is connected to the field tag
of the lines data_RG through the validation buffers BV25. These
latter receive on their validation input the output of a logic gate
OU24 which receives on its inputs the signal {overscore (grnt)}_SR
and the output Q from the flip flop B27.
[0274] The inputs D of the flip flops B28 and B29 are respectively
connected to the bit validation lines v and bit modification lines
m of the bus data_RG. The clock inputs of these flip flops B28 and
B29 are connected to the phase .theta.22 of the phase distributor. The
output Q of the flip flop B28 and the output Q of the flip flop B29
are connected to two inputs of a logic gate ET29, which furnishes
on its output the signal maj. A logic gate OU25 receives on its
inputs the signal maj and the output {overscore (Q)} of a flip flop
B31. The output of the logic gate OU25 is connected to the
selection input sel20 of a multiplexer MUX20, which receives on its
two data inputs the output Q of the flip flop B28 (bit v) and the
constant 0, the constant zero being selected when the selection
input sel20 is at the logic state of one. The output of the
multiplexer MUX20 is connected to the bit validation line v of the
bus adr_x. The awaiting-transfer bit --a-- of the bus adr_x is
forced to the logic state zero. The modification bit --m-- is
connected to the reading line {overscore (r)}_adr_x of the bus
adr_x.
[0275] The logic assembly described above constitutes the access
logic and operation logic of the management directory RG. Its
operation is as follows: the activation of the signal {overscore
(grnt)}_SR, authorizing access to the management director and to
the cache memory, validates the validation buffers BV20. The
reading of the descriptor concerned is controlled from the
beginning of the access authorization until the arrival of the
phase .theta.22, the storage instant of the bits --v-- and --m-- in
the flip flops B28 and B29. The combined state of these two bits
produces, across the logic gate ET29, the signal maj which
conditions the succeeding operations.
[0276] First Case: maj=1. This case is produced when the validation
bit is equal to 1 and the modification bit is equal to 1. This case
corresponds either to a request to purge the block from the snooper
processor PE, or to an information defect established by the
request management processor PGU on a block placement that is
occupied and modified: in the two cases, the block concerned must
be written into the central memory RAM.
[0277] To this effect, the field --cadre-- of the bus adr_x is
connected to the lines adr_MC via the validation buffers BV21. The
outputs {overscore (Q)} of the flip flops B30 and B31 are
respectively connected to the lines {overscore (r)}_MC and
{overscore (w)}_MC of the cache memory MC through the validation
buffers BV21. The line bloc is forced to zero via one of the
validation buffers BV21. The data lines data_MC of the cache memory
are connected, via the bidirectional validation buffers BV22 to the
inputs of the shift register RDP and to the outputs of the
validation buffers BV26, which receive on their inputs the output
lines of the shift register RDP. The buffers BV22 are validated by
the line {overscore (grnt)}_SR, and their direction of validation
controlled by the output {overscore (Q)} of the flip flop B31. The
flip flop B30 receives on its inputs {overscore (S)} and {overscore
(R)} respectively the phases .THETA.21 and .THETA.23, inverted,
coming from the phase distributor. The flip flop B31 receives on
its inputs {overscore (S)} and {overscore (R)} respectively the
outputs of the logic gates ET32 and ET33. The logic gate ET32
receives on its inputs the phase .theta.26 and the signal maj
inverted, the logic gate ET33 the phase .theta.27 and the signal
maj inverted. A logic gate NET20 receives on its inputs the phase
.theta.23 and the signal maj. The output of a logic gate ET35
controls the validation buffers BV26, and receives on its two
inputs respectively the signal maj inverted and the output Q of the
flip flop B31. The output of the logic gate NET20 is connected to
the signal {overscore (load21)} of the shift register RDP and to
the input S of a flip flop B32. The input R of this flip flop B32
receives the signal {overscore (fin_transfert_maj)} coming from the
logic TFR associated with the shift register RDP. The output Q of
the flip flop B32 is connected to one of the inputs of the logic
gate OU23.
[0278] The logic described above permits purging a block from the
cache memory. Its operation is as follows:
[0279] In parallel with the access to the management directory RG,
a reading of the cache memory MC is activated by the line
{overscore (r)}_MC, coming from the flip flop B30, during the
phases .THETA.21 and .THETA.22. At the end of this reading, the
data read, representing the block to be downloaded, are presented
at the input of the shift register RDM. The activation of the
signal maj, the state of which is known from the beginning of the
phase .theta.22, provokes:
[0280] 1. The invalidation of the block in the management
directory: the input sel20 of the multiplexer MUX20 being at the
logic state one, the value zero is forced on the validation bit,
the descriptor being written in the management directory RG with
the activation of the signal {overscore (w)}_RG, controlled by the
flip flop B27 during the cycle .THETA.22,
[0281] 2. The loading of the shift register RDM and the activation
of the transfer, upon passage of the phase .theta.22 to the phase
.theta.23,
[0282] 3. The placement at the logic state one of the flip flop
B32, which comes to block the phase distributor on the phase
.theta.23 until the end of the transfer, signalled by the signal
{overscore (fin_transfert_maj)} coming from the logic TFR,
[0283] 4. The liberation of the management directory RG and of the
cache memory MC by resetting to zero the flip flop B24 on the phase
.theta.23.
[0284] Thus, the access to the directory is freed (therefore
accessible to the snooper processor PE), the update transfer is in
progress, and the phase distributor is blocked on .theta.23.
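The purge sequence of paragraphs [0279] to [0284] can be condensed into a procedural sketch (a hypothetical Python model using plain dictionaries; not the circuit itself):

```python
# Purge of a block (first case, maj = 1), modelled procedurally.
def purge_block(directory, cache, frame):
    """Return the block to write back to the central RAM, or None."""
    desc = directory[frame]          # descriptor read, theta.21-theta.22
    data = cache[frame]              # cache read via r_MC, theta.21-theta.22
    if not (desc["v"] and desc["m"]):
        return None                  # maj = 0: the second case applies
    desc["v"] = 0                    # MUX20 forces the validation bit to zero
    directory[frame] = desc          # descriptor re-written with w_RG on theta.22
    # The shift register is loaded and the serial transfer started; the phase
    # distributor blocks on theta.23 until fin_transfert_maj, after which
    # rqst_SP asks the parallel link processor PGP to write the RAM.
    return data

directory = {5: {"v": 1, "m": 1}}
cache = {5: "block-data"}
assert purge_block(directory, cache, 5) == "block-data"
assert directory[5]["v"] == 0        # block invalidated in the directory
```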
[0285] When the transfer is terminated, the block is in waiting
state in the shift register RDM and it is necessary to activate the
management processor of the parallel link in order that the writing
into central memory RAM will be effective.
[0286] To this effect, the flip flop B33 is connected by its output
{overscore (Q)} to the service request line {overscore (rqst)}_SP at the
destination of the parallel link management processor PGP. The
output Q is connected to one of the inputs of the logic gate OU23,
the input {overscore (S)} to the phase .theta.24 inverted from the
phase distributor and the input {overscore (R)} to the signal
{overscore (ack)}_SP. The bus adr_x is connected to the bus
adr_bloc_SP of the parallel link management processor PGP. One of
the lines of the bus adr_bloc_SP receives the signal maj in order
to indicate the nature of the request: updating (maj).
[0287] After the liberation of the phase distributor by the signal
{overscore (fin_transfert_maj)}, the next active transition of the
general clock h causes the passage from the phase .theta.23 to the
phase .theta.24. The phase .theta.24 provokes a service request to
the parallel link management processor PGP (activation of the line
{overscore (rqst)}_SP) and the blocking of the phase distributor
until discharge of the request by the signal {overscore (ack)}_SP.
At this moment, the update writing will be effectively achieved by
the parallel link management processor PGP. The phase distributor,
on the next active transition of the clock h, passes from the phase
.theta.24 to the phase .theta.25.
[0288] The updating of the central memory RAM is terminated: the
flip flop B20 is set to zero by activation of its input R by the
phase .theta.25 inverted. The flip flop B21 is set to one by
activation of its input S by the phase .theta.25, conditioned by
the signal maj by means of the logic gate ET22 via the logic gate
NOU20. In case of information defect in the cache memory, the purge
of a block only constitutes the first part of the request: the
request {overscore (rqst)}_US is always present, but the liberation
of the flip flop B21 is going to permit eventual consideration of
update requests waiting in the waiting line FIFO. When the waiting
line FIFO is empty ({overscore (empty)}=0), the whole cycle
described previously is reproduced, with, this time, the validation
bit at zero. One is then in the following case:
[0289] Second Case: maj=0. This case arises when the validation bit
is equal to zero (perhaps following a purge of a block) or when the
validation bit is equal to one, but the modification bit is equal
to zero: the current copy of this block is already in the central
memory RAM.
[0290] Thus, the service request {overscore (rqst)}_SR obtains in
return an access grant to the management directory RG and to the
cache memory MC, with reading of the descriptor, storing of the
bits m and v, generation of the signal maj, re-writing of the
descriptor with v=0 (maj is at the logic state zero, the flip flop
B31 is at the logic state zero and its output {overscore (Q)} has a
value of one, which imposes a logic signal at the state one on the
input sel20 of the multiplexer MUX20 and therefore forces the
constant 0) and the freeing of access to the management directory
RG and the cache memory MC. The operations are effective upon
activation of the phase .theta.23.
[0291] The requested block must now be read from the central
memory RAM. To this effect, the signal maj inverted is received at
one of the inputs of a logic gate NET21, which receives on its
other input the phase .theta.25. The output of the logic gate NET21
is connected to the inputs {overscore (S)} of the flip flops B34
and B35. The flip flop B34 receives on its input {overscore (R)}
the signal {overscore (fin_reception)} coming from the transfer
logic TFR. Its output Q is connected to the differentiator circuit
D21. This circuit is connected to the logic gate OU22. The input
{overscore (R)} of the flip flop B35 is connected to the signal
{overscore (grnt)}_SR and the output {overscore (Q)} to one of the
inputs of the logic gate OU23.
[0292] The reading of a block in the central memory RAM and its
transfer into the cache memory are realized in the following
manner: the transition of the phase .theta.23 to the phase
.theta.24 unlatches a service request to the parallel link
management processor by activation of the line {overscore
(rqst)}_SP, coming from the flip flop B33. The type of operation is
this time a reading (r_adr_x=0) or a writing (w_adr_x=0) and the bus
adr_x furnishes the address of the block requested on the bus
adr_bloc_SP. The phase distributor is blocked until arrival of the
release signal {overscore (ack)}_SP: the reading or writing request
has been made to the central memory RAM by the parallel connection
management processor PGP, and the block has been at the same time
validated and marked "in waiting" by this same processor. the
transfer is then in progress, from the central memory RAM to the
shift register RDP.
[0293] The phase distributor, freed, furnishes then the phase
.theta.25. This phase, by the intermediary of the logic gate NET21
which receives on its other input the signal maj inverted, sets
the flip flops B34 and B35 to the logic state one. The flip
flop B35 blocks the phase distributor. The flip flop B34 is reset
to zero upon arrival of the block request in the register RDP,
signalled by the activation of the line {overscore (fin_reception)}
coming from the transfer logic TFR, which releases the activation
of the differentiator circuit D21 and an access demand to the
management directory RG and the cache memory MC, by the activation
of the line {overscore (rqst)}_SR from the flip flop B24.
[0294] The access grant, signalled by the line {overscore
(grnt)}_SR, frees the phase distributor by resetting the flip flops
B25 and B35 to zero, and opens the access to the management
directory RG and to the cache memory MC. The phase distributor,
upon the next active transition of the general clock h, furnishes
the phase .theta.26.
[0295] The flip flop B27, connected to the signal {overscore
(w)}_RG of the management directory, is activated from .theta.26 to
.theta.27, which permits providing the updating of the management
directory with:
[0296] field tag of the bus data_RG=field tag of the bus adr_x,
[0297] validation bit v=1 (maj=0 and the output {overscore (Q)} of
the flip flop B31 is at zero: the multiplexer MUX20 allows v to
pass, which has been forced to one by the parallel connection
management processor PGP, or already reset to zero by the snooper
processor PE),
[0298] modification bit m=state of the line {overscore (r)}_adr_x
of the bus adr_x (writing request causes m=1, reading request
m=0),
[0299] awaiting transfer bit a=0,
[0300] bit f=0.
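The descriptor written at this point can be summarized by a small sketch (a hypothetical Python encoding of the field values just listed):

```python
# Descriptor written during theta.26-theta.27 after a block fill.
def fill_descriptor(tag, write_request):
    return {
        "tag": tag,                       # field tag of the bus adr_x
        "v": 1,                           # validated (or already invalidated by PE)
        "m": 1 if write_request else 0,   # state of the line r_adr_x
        "a": 0,                           # no longer awaiting transfer
        "f": 0,                           # not queued for a purge
    }

assert fill_descriptor(0x3A, write_request=True)["m"] == 1
```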
[0301] The flip flop B31 activates the signal {overscore (w)}_MC of
the cache memory MC from .theta.26 to .theta.27, which permits
writing the contents of the shift register RDM (the buffers BV26
are validated by the output {overscore (Q)} of the flip flop B31
and the signal maj inverted) at the correct placement (field
--cadre-- of the bus adr_x connected to the bus adr_MC) of the
cache memory, via the validation buffers BV22, controlled in the
direction of transfer by the flip flop B31.
[0302] Upon arrival of the phase .theta.27, the operation of the
management directory and of the cache memory is terminated. The
arrival of the phase .theta.27 provokes the resetting of the flip
flop B24 to zero, which frees the access to the management
directory RG and to the cache memory MC, and the activation of the
output of the logic gate ET23, resetting the flip flop B21 to zero
and activating the signal {overscore (ack)}_US to the destination
of the request management processor PGU: the request is
terminated.
[0303] In addition, the snooper processor PE is charged with
maintaining the coherence of the data in the cache memory MC. By
way of example, there is shown in FIG. 15 an architecture of this
processor. It is activated at each active transition of the signal
{overscore (valid)} from the common bus BUSA.
[0304] If the signal {overscore (valid)} is activated by a parallel
connection management processor PGP other than that associated with
the snooper processor PE, then the task of this latter is as
follows, as a function of the request:
[0305] request for reading of the block of non-shared data, or
request for update writing of the block: nothing,
[0306] request for reading of the block of shared data:
[0307] block absent: nothing,
[0308] block present and not modified: nothing,
[0309] block present and modified: request for purge of the block
(whether this block is present or in the course of transfer from
the central memory RAM to the cache memory MC, a state signalled by
the bit --s-- of the descriptor),
[0310] request for writing of the block of shared data:
[0311] block absent: nothing,
[0312] block present unmodified: invalidate the block,
[0313] block present modified: request for purge of the block (same
remark as above),
[0314] request for informative writing of the block:
[0315] block absent: nothing,
[0316] block present unmodified: invalidate the block,
[0317] block present modified: impossible case,
[0318] if a request for informative writing is in waiting on this
same block, it is transformed into a request for writing of the
block, since this block has been invalidated: to this effect, the
request in progress is annulled and a release is sent to the
request management processor PGU. This latter consults the
directory RG and finds the block absent: a request is sent
to the management processor PGS. This operation is taken into
consideration by the parallel management processor PGP.
[0319] If the signal {overscore (valid)} is activated by the
parallel connection management processor PGP associated with the
snooper processor, then the task of the latter is as follows,
as a function of the type of request (termed local request):
[0320] request for reading of the block of non-shared data or
request for update writing of this block: nothing,
[0321] request for reading of the block of shared data: mark the
block valid, while waiting for transfer and modified (v=m=s=1),
[0322] request for informative writing of the block: mark the block
"modified" (m=1).
[0323] To assure the functions described above, the snooper
processor PE comprises a phase distributor DP_E, comprising a
shift register SR40 and a flip flop B40. The output of a
differentiator circuit D40 is connected to the input {overscore
(S)} of the flip flop B40 and the input {overscore (clear40)} of
the register SR40, as well as the input {overscore (S)} of a flip
flop B41. The output Q of the flip flop B40 is connected to the
input serial_in40 of the register SR40. The input clk40 of the
register SR40 receives the signal h from the general clock and the
validation input {overscore (en40)} is connected to the output Q of
the flip flop B43. The phase .theta.41, inverted, coming from the
register SR40 is connected to the input {overscore (R)} of the flip
flop B40.
[0324] The operation of the phase distributor is in accordance with
the description given for the request management processor PGU.
[0325] The signal {overscore (valid)} of the bus BUSA is connected
to the input of the differentiator circuit D40 and to the
validation input {overscore (en41)} of a decoder DEC41. The flip
flop B41 receives on its input {overscore (R)} the output of the
logic gate NOU40. The output {overscore (Q)} of the flip flop B41,
with an open collector, is connected to the signal {overscore
(done)} of the bus BUSA.
[0326] The signals {overscore (valid)} and {overscore (done)}
assure the synchronization of the snooper processor PE with the
other snooper processors of the multiprocessor system: the negative
transition of the signal {overscore (valid)} unlatches the
differentiator circuit D40 which produces a pulse permitting
activation of the phase distributor and sets the signal {overscore
(done)} at the logic state zero through the intermediary of the
flip flop B41. The end of the work of the snooper is signalled by a
change of state on the output of the logic gate NOU40, which
produces the setting of the signal {overscore (done)} to the logic
state one through the intermediary of the flip flop B41.
[0327] The work of the snooper is a function of the nature of the
request and, to this effect, the field type of the bus BUSA is
connected to the input of the decoder DEC41. The outputs dnp and
dmaj of the decoder DEC41 are connected to the inputs of a logic
gate OU40. The outputs dl, de, dei of the decoder DEC41 are
connected to the inputs of a logic gate OU41, the outputs de and
dei being both connected to the inputs of a logic gate OU42. The
output of the logic gate OU40 is connected to one of the inputs of
the logic gate NOU40 by the intermediary of a logic gate ET40,
which receives on its other input the phase signal .THETA.41. The
output of the logic gate OU41 is connected respectively to one of
the inputs of the logic gates ET42, ET43, which receive on their
other input respectively the phase signals .THETA.41 and .THETA.44.
[0328] The outputs of the logic gates ET42 and ET43 are connected
respectively to the inputs {overscore (S)} and {overscore (R)} of a
flip flop B44. The output of the logic gate ET42 is also connected
to the inputs {overscore (S)} of the flip flops B42 and B43. A
logic gate NOU41 receives on its inputs the phase signal .THETA.45
and the output of a logic gate ET45, which receives on its two
inputs the phase .theta.44 and the inverted output of a logic gate
OU43. The output of the logic gate NOU41 is connected to the input
{overscore (R)} of the flip flop B42. The output {overscore (Q)} of
the flip flop B42 is connected to the signal {overscore (rqst)}_PR
and the signal {overscore (grnt)}_PR is delivered to the input
{overscore (R)} of the flip flop B43, to the validation inputs of
the passage buffers BV40, and to one of the inputs of the logic
gate OU44. The logic gate OU44 receives on its other input the
output {overscore (Q)} of a flip flop B45, and its output validates
the passage buffers BV41. The output of the logic gate ET45 is also
connected to one of the inputs of the logic gate NOU40 which
receives on another input the phase signal .THETA.45. The output of
the logic gate OU42 is connected to the inputs of the logic gates
ET46 and ET47, which receive also on their inputs the output of a
logic gate OU43 and respectively the phases .THETA.44 and
.THETA.45.
[0329] The output {overscore (Q)} of the flip flop B44 delivers the
signal {overscore (r)}_RG via a buffer BV40, the output {overscore
(Q)} of the flip flop B45 delivers the signal {overscore (w)}_RG
via a buffer BV40. The field cadre of the common bus BUSA is
connected to bus adr_RG via a buffer BV40. The bus data_RG is
connected to the outputs of the validation buffers BV41 and to the
input of the register REG40, which receives on its input load40 the
phase signal .THETA.43. The output of the register REG40 is
connected, for the part tag, to the inputs of the buffers BV41 and
to one of the inputs of a comparator COMP40. The comparator COMP40
receives on its other input the part tag from the bus BUSA.
[0330] The validation bit v, issued by the register REG40 and
connected to the comparator COMP40, is compared with the constant
value 1. The bits v, a, m, f in the output of the register REG40 are
connected respectively to one of the inputs of the multiplexers
MUX40, MUX41, MUX42, MUX43. The outputs of these multiplexers
furnish the state of these same bits to the inputs of the buffers
BV41. The multiplexer MUX40 receives on its other inputs the
constants zero and one, its inputs sel40 being connected to the
output of a logic gate ET48 and to a signal {overscore (dlle)}. The
multiplexer MUX41 receives on its other input the constant one,
selected when its input sel41, receiving a signal dlle from the
management processor PGP, is at the logic state of one. The
multiplexer MUX42 receives on its other inputs the constant 1 and
the signal {overscore (r)}_adrbloc_SP, on its inputs sel42
receiving the signals {overscore (dlei)} and {overscore (dlle)}
coming from the management processor PGP. The multiplexer MUX43
receives on its other input the constant one, selected when its
input sel43, connected to the output of a logic gate ET49, is at
logic state one.
[0331] The logic gate ET49 receives on its inputs the output eg40
of the comparator COMP40, the signal f inverted and the signal m.
The logic gate ET48 receives on its inputs the output eg40, the
signal m inverted, the signal dlei and the output of the logic gate
OU42. The logic gate OU43 receives on one of its inputs the signal
eg40, and on its other input the signal dlle. The output of the
logic gate ET49 is also connected to the input load41 of the
waiting line FIFO, already described for the series link management
processor PGS. The fields cadre and tag of the bus BUSA are
connected to the inputs of the waiting line FIFO. The signals
{overscore (dlei)}, {overscore (dlle)} and {overscore
(r)}_adrbloc_SP come from the parallel link management processor
PGP, which also receives the signal dei of the decoder DEC41.
[0332] The operation of the assembly is as follows: the activation
of the signal {overscore (valid)} initializes the phase distributor
DP_E and validates the decoder DEC41, which produces the activation
of an output as a function of the information type coding the
nature of the request in progress on the common bus BUSA. The
active output may be:
[0333] dnp: request for reading of non-shared data. The signal
{overscore (done)} is deactivated upon appearance of the phase
.theta.41;
[0334] dmaj: update writing request for the block. The signal
{overscore (done)} is deactivated at phase .theta.41;
[0335] dl: request for reading the block;
[0336] de: request for writing the block;
[0337] dei: request for informative writing.
[0338] These three cases necessitate an access in reading to the
directory RG, and the two latter an eventual re-writing of the
directory. To this effect, a request for access is sent to the
directory management processor PGR by the flip flop B42 (signal
{overscore (rqst)}_PR), the flip flop B43 blocking the phase
distributor until access agreement, signified by the signal
{overscore (grnt)}_PR. The reading is thus carried out from
.THETA.41 to .THETA.44 (flip flop B44) and the eventual writing
from .THETA.44 to .THETA.45 (flip flop B45), with storage of the
descriptor in the register REG40 from the phase .theta.43. If the
block is absent from the cache memory MC (eg40=0), then the signal
{overscore (done)} is deactivated on the phase .theta.44. If the
block is present in the cache (eg40=1), then:
[0339] if m=1, a request to purge is activated (activation of the
logic gate ET49) on the condition that this block is not already in
the waiting line (f=0); the only bit modified is f, set to one, at
the time of re-writing of the descriptor,
[0340] if m=0, the block is invalidated by the multiplexer MUX40
(activation of the logic gate ET48),
[0341] if the request is local ({overscore (dlle)} or {overscore
(dlei)} active) then:
[0342] 1) in case of reading or writing, the bits v and a are set
to 1 and m is set to 0 (reading) or 1 (writing) (state of the
signal r_adrbloc_SP),
[0343] 2) in the case of informative writing, the bit m is forced
to 1.
[0344] In these latter cases, which require a re-writing in the
management directory RG, the signal {overscore (done)} is
deactivated on the phase .theta.45.
[0345] The parallel link management processor PGP, an example of
which is shown in FIG. 16, is charged with requesting the common
bus BUSA and providing the transaction requested, either by the
request management processor PGU, or by the series link management
processor PGS.
[0346] A request coming from the request management processor PGU
can only be an informative writing request. A request coming from
the series link management processor PGS is either a request for
reading or writing of the block, or a request for updating of the
block.
[0347] The parallel link management processor PGP comprises a flip
flop B60, connected through its data input D to the signal
{overscore (rqst)}_UP. A flip flop B61 is connected through its
data input D to the signal {overscore (rqst)}_SP. The outputs Q and
{overscore (Q)} of a flip flop B62 are connected respectively to
the clock inputs of the flip flops B60 and B61. The output
{overscore (Q)} of the flip flop B62 is looped back to its data
input. The outputs {overscore (Q)} of the flip flops B60 and B61
are connected to the inputs of a logic gate OU60. The output of the
logic gate OU60 is connected, first, to a differentiator circuit
D60, and in an inverse manner, to an input of a logic gate ET60,
which receives on its other input the general clock signal h. The
output of the logic gate ET60 is connected to the clock input of
the flip flop B62. The output of the differentiator circuit D60 is
connected to the input {overscore (S)} of a flip flop B63. The
output {overscore (Q)} of the flip flop B63 delivers a signal
{overscore (rqsti)} to the destination of the arbitrator AB, and
its input {overscore (R)} is connected to the output of a logic
gate ET62. The signal {overscore (grnti)} coming from the
arbitrator AB is connected to the logic gates OU62 and NOU60. The
logic gate OU62 receives on its other input the signal {overscore
(valid)} inverted, and its output is connected to the input of a
differentiator circuit D61. The output of this circuit D61 is
connected to one of the inputs of the logic gate ET62 and to the
input {overscore (S)} of a flip flop B64. The output Q of this flip
flop B64 is connected to the validation input of the passage
buffers BV60 and in an inverted manner, across an inverter with an
open collector I60, to the signal {overscore (valid)}. The signal
{overscore (done)} is connected to the input of a differentiator
D62. The output of this circuit D62 is connected to the input
{overscore (R)} of the flip flop B64 and to one of the inputs of
the logic gate NOU60. The output of this logic gate NOU60 is
connected to one of the inputs of the logic gates NET60 and NET61, which
receive on their other input, respectively the outputs {overscore
(Q)} of the flip flops B60 and B61. The outputs {overscore (Q)} of
the flip flops B60 and B61 are also connected respectively to the
validation inputs of the passage buffers BV61 and BV62. The output
of the logic gate NET60 is connected to the input {overscore (S)}
of the flip flop B60 and to one of the inputs of a logic gate ET63,
which receives on its other input the output of a differentiator
circuit D63. The output of the logic gate ET63 delivers the signal
{overscore (ack)}_UP to the management processor PGU. The output of
the logic gate NET61 is connected to
the input {overscore (S)} of the flip flop B61 and furnishes the
signal {overscore (ack)}_SP. The bus adr_bloc_UP is connected to
the input of the validation buffers BV61 and to one of the inputs
of a comparator COMP60.
[0348] The bus adr_bloc_SP is connected to the input of the
validation buffers BV62. The outputs of the buffers BV61 and BV62
are connected together and to the input of the validation buffer
BV60. The output of the buffer BV60 is connected to the common bus
BUSA. The logic gates OU63 and OU64 receive on their inputs
respectively the outputs {overscore (Q)} of the flip flops B60 and
B61, the logic signal {overscore (grnti)} and, for OU64, the signal
maj. The output of the logic gate OU63 delivers the signal
{overscore (dlei)}, and the output of the logic gate OU64 the
signal {overscore (dlle)}. The other input of comparator COMP60
receives the fields tag and cadre from the common bus BUSA. The
input en60 of the comparator COMP60 is connected to the output of
the logic gate ET61, which receives on its inputs the signal
{overscore (grnti)} inverted, the signal dei and the signal
{overscore (rqst)}_UP inverted. The output eg60 of the comparator
COMP60 is connected to the input of the differentiator circuit D63.
The output of this circuit is also connected to the inputs
{overscore (R)} of the flip flops B60 and B61 and the other input
of the logic gate ET62.
[0349] The operation of the assembly is as follows: the flip flops
B60, B61 and B62, associated with the logic gates OU60 and ET60,
constitute a local arbitrator. This arbitrator examines alternately
the requests {overscore (rqst)}_UP and {overscore (rqst)}_SP, and
resends them to the arbitrator AB of the common bus BUSA by the
signal {overscore (rqsti)}. The access accord is furnished by the
validation of the signal {overscore (grnti)} and the bus cycle
takes place upon deactivation of the signal {overscore (valid)},
which frees the arbitrator AB. The activation of the signal
{overscore (done)} frees the local arbitrator: the transaction is
terminated.
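The ordering of events in one bus cycle can be sketched as follows (a hypothetical Python model; arbitration and the open-collector lines are abstracted into callables):

```python
# One BUSA cycle: local arbitration, rqsti/grnti, address broadcast under
# valid, completion when every snooper has raised the wired done line.
def bus_cycle(requests, grant, snoopers, address):
    """requests: {'UP': bool, 'SP': bool}; grant: callable -> bool;
    snoopers: callables consulted while valid is active."""
    source = "UP" if requests.get("UP") else "SP" if requests.get("SP") else None
    if source is None or not grant():
        return None                  # no rqsti sent, or grnti withheld by AB
    for snoop in snoopers:           # valid active: every PE examines the address
        snoop(address)
    return source                    # done raised by all: ack_UP or ack_SP issued

log = []
assert bus_cycle({"SP": True}, lambda: True, [log.append], 0x40) == "SP"
assert log == [0x40]                 # every snooper saw the broadcast address
```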
[0350] If the request comes from the series link management
processor PGS, then the signals {overscore (dlei)} and {overscore
(dlle)} indicate to the associated snooper processor PE the nature
of the updating of the status bits of the block to be carried out
in the directory RG.
[0351] If the request comes from the request management processor
PGU, then in case of detection of informative writing on the same
block (signal dei coming from the snooper processor PE), an
immediate liberation takes place: the request management processor
PGU, after consultation of the directory (the block has been
invalidated) will direct its request to the series link management
processor PGS.
[0352] The memory management processor PGM, one example of which is
shown in FIG. 17, is charged with assuring the reading or writing
of the block in the central memory RAM and with participating in
maintaining the coherence of the information in the various cache
memories MC of the multiprocessor system.
[0353] To this end, it comprises a differentiator circuit D80
receiving on its input a signal {overscore (valid)} and connected
by its output to the inputs {overscore (S)} of the flip flops B80
and B81 as well as to the input {overscore (clr)}_80 of a shift
register SR80. On the output Q of the flip flop B80, with an open
collector, is delivered the signal {overscore (done)}. The output Q
of the flip flop B81 is connected to the input serial_in80 of the
register SR80. This flip flop B81 is connected by its input
{overscore (R)} to the output Q81, inverted, of the register SR80.
The register SR80 receives on its input clk80 the general clock
signal h and its validation input {overscore (en80)} is always
active. The flip flop B81 and the shift register SR80 constitute a
phase distributor DP_M. The signal {overscore (valid)} is also
delivered on the validation input {overscore (en81)} of a decoder
DEC80. This decoder is connected by its data input to the part type
of the common bus BUSA, and furnishes the signals dnp, dl, de, ei
and maj. A read-write memory RAMFG, two bits wide (the bits termed
respectively ro and rw), receives on its address bus the fields tag
and cadre of the common bus BUSA. The data bus of this memory,
constituted of the bits ro and rw, is connected on the one hand to
a logic PAL80, and on the other hand to a logic gate ET80, directly
for rw, and inverted for
ro. The logic PAL80 is connected to the field type and receives the
logic signals cl, r/w, s/n, mff and en82: the signal cl is emitted
from a flip flop B82, the signals r/w and s/n from an associative
waiting line AFIFO, the signal mff from a logic gate ET81, and the
signal en82 from a flip flop B83, which receives on its inputs
{overscore (S)} and {overscore (R)} respectively the phases
.theta.82 and .theta.81, inverted, from the phase distributor DP_M.
The logic PAL80 implements on its outputs ro.rw the following logic
equations: dnp=10; dl.{overscore (mff)}=10; dl.mff=01; de=01;
maj.cl=00; ei=01; maj.cl.s/{overscore (n)}.{overscore (r/w)}=10;
maj.cl.s/{overscore (n)}.(r/w)=01. The output of the logic gate
ET80 is
connected to the input D of a flip flop B84 which receives on its
clock input the phase .THETA.82. The output Q of this flip flop is
connected to one of the inputs of the logic gate ET81, which
receives on its other input the output of the logic gate OU80. The
two inputs of this logic gate OU80 are connected to the outputs de
and dl of the decoder DEC80. The reading input r of the memory
RAMFG is connected to the phase .THETA.81, and the writing input w
to the output of a logic gate ET82. The logic gate ET82 receives on
its inputs the signal .THETA.83 and the output of a logic gate
ET83, the inputs of which are connected to the signal s/{overscore
(n)} and the phase .THETA.87. The output dnp of the decoder DEC80
is connected to the logic gates ET84 and ET85, which receive on
their other input respectively the phases .THETA.81 and .THETA.85.
The signal s/{overscore (n)} is also delivered to the logic gates
ET86 and ET87 which receive on their other input respectively the
phases .THETA.86 and .THETA.90. The output mff of the logic gate
ET81 is also connected to a logic gate ET88, which receives on its
other input the phase .THETA.83, and in an inverted manner to the
logic gates ET89 and ET90 which receive on their other input
respectively the phases .THETA.83 and .THETA.87. The output of the
logic gate ET88 is connected to the input wff of the line AFIFO.
The outputs of the logic gates ET84, ET86 and ET89 are connected to
the inputs of a logic gate OU81, the output of which is connected
to the input S of a flip flop B85. The outputs of the logic gates
ET85, ET87, ET90 are connected to the inputs of a logic gate OU82
of which the output is connected to the input R of the flip flop
B85. The signal s/{overscore (n)} is also connected, in an
inverted manner, to a logic gate ET91 which receives on its other
input the phase .theta.89. The output of the logic gate ET91 is
connected
to the input of a logic gate NOU80 which receives on its other
input the output of the logic gate OU82. The output of the logic
gate NOU80 is connected to the input {overscore (R)} of the flip
flop B80. The output maj of the decoder DEC80 is connected to the
input of the logic gates ET92, ET93, ET94, ET95, which receive on
their other input respectively the phases .THETA.81, .THETA.85,
.THETA.85, .THETA.91. The outputs of the logic gates ET92 and ET93
inverted are connected respectively to the inputs S and R of a flip
flop B86, and those of logic gates ET94 and ET95 to the inputs S
and R of a flip flop B82. The output Q of the flip flop B82
produces the logic signal cl also delivered to the input cff of the
line AFIFO and to the control input sel80 of a multiplexer MUX80.
The part tag, cadre of the common bus BUSA is connected to the data
input of the line AFIFO and to one of the data inputs of the
multiplexer MUX80. The output dl of the decoder DEC80 is also
connected to one of the data inputs of the line AFIFO in order to
produce the reading/writing signal l/e. The data output of the line
AFIFO is connected to the other input of the multiplexer MUX80. The
output of the multiplexer MUX80 is connected to the address bus of
the central memory RAM for the part tag.cadre, and to the inputs of
the decoders DEC81 and DEC82 for the part field cpu. The output
{overscore (Q)} of the flip flop B86 is connected to the writing
input of the central memory RAM and to the input {overscore (en84)}
of the decoder DEC82. The output Q of the flip flop B85, slightly
delayed, is connected to the reading input of the central memory
RAM and to one of the inputs of a logic gate ET96, which receives
on its other input the output of the logic gate OU82. The output of
the logic gate ET96 is connected to the input en83 of the decoder
DEC81. The output j of the decoder DEC81 is connected to the
validation input of the passage buffers of the memory shift
register RDM.sub.j and the output j of the decoder DEC82 to the
loading input of said memory shift register RDM.sub.j.
[0354] The operation of this assembly is as follows:
[0355] The activation of the signal {overscore (valid)} provokes
the unlatching of the phase distributor DP_M, and the validation of
the decoder DEC80, which permits the determination of the nature of
the request. The phase .theta.81 is used for reading the state of
the bits corresponding to the block requested in the memory RAMFG,
and the combination {overscore (ro)}.rw is stored in the flip flop
B84. A first writing takes place in the memory RAMFG on the phase
.theta.83
which permits updating the status bits. Their value is furnished by
the logic PAL80 and permits obtaining the following chains:
[0356] in case of request for a block of non-shared data (dnp) then
regardless of the state of the bits ro.rw (rw is forced to zero),
the state 10 is forced ("block broadcast in reading");
[0357] in case of request for a block for reading (dl) or writing
(de), if ro.rw=01, then the request is placed in the waiting line
on the phase .THETA.83 and the state 01 is forced (in fact this is
the preceding state), if not the state 10 is forced in case of
reading ("block broadcast in reading") and the state 01 is forced
in case of writing ("block broadcast in writing");
[0358] in case of request for updating (maj), the state 00 is
forced ("block not broadcast"). In these various cases, a reading
or a writing into the central memory RAM is carried out, to or
from the memory shift register RDM.sub.j identified by the field
cpu of the common bus BUSA. In the example chosen, the duration of
the RAM memory cycle is 4 periods of the general clock
h. In the case of reading of non-shared data, the cycle is carried
out from .theta.81 to .theta.85, in the other cases from .theta.83
to .theta.87. The writing is carried out from .theta.81 to
.theta.85;
[0359] in case of informative writing, this does not cause movement
of data, but forces the status bits to the value 01 (the starting
state in this case is necessarily 10);
[0360] in case of request for updating, a consultation of the
waiting line AFIFO is systematically carried out. This consultation
may cause the reading of a block, in the case where a central unit
CPU is waiting in the line AFIFO for this block.
[0361] The reading is carried out from .theta.86 to .theta.90 and
the state of the bits is forced to 10 (request for a reading) or 01
(request for writing). The end of every operation is reflected by
the resetting to zero of the flip flop B80, which deactivates the
signal {overscore (done)}. This deactivation may be produced on the
phases .THETA.85, .THETA.87 or .THETA.91 according to the requested
operation, or on .THETA.89 if the consultation of the line gives a
negative result.
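The ro.rw transitions listed above amount to a small state machine; the following is a hypothetical sketch ("10" = broadcast in reading, "01" = broadcast in writing, "00" = not broadcast; the function name is invented):

```python
# Status-bit transitions of the memory RAMFG, per the chains above.
def next_state(request, state):
    """Return (new ro.rw state, queued-in-AFIFO flag) for one request."""
    if request == "dnp":
        return "10", False               # block broadcast in reading
    if request in ("dl", "de"):
        if state == "01":                # already broadcast in writing elsewhere
            return "01", True            # request queued in the AFIFO (wff)
        return ("10", False) if request == "dl" else ("01", False)
    if request == "ei":                  # informative writing (start state 10)
        return "01", False
    if request == "maj":                 # update: block no longer broadcast;
        return "00", False               # a queued request may then re-force 10/01

assert next_state("dl", "01") == ("01", True)
assert next_state("de", "10") == ("01", False)
assert next_state("maj", "01") == ("00", False)
```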
[0362] The associative waiting line is not detailed. It is
constituted in a conventional manner by an associative store used
as a waiting line. The number of words of this memory is equal to
the number of central units CPU of the multiprocessor system. An
internal "daisy chain" identifies on each phase .THETA.81 the next
word that is a candidate for a writing, which is produced on the
phase .THETA.83 by the signal wff. The signal cff unlatches a
comparison starting from the phase .THETA.85, the flip flops of the
response memory having been reset to zero on the phase .THETA.84.
The result of the comparison is propagated on the signal
s/{overscore (n)} (some/{overscore (none)}) and the contents of the
word concerned is available on the data output from the phase
.THETA.86. This word is then invalidated on the phase .THETA.90.
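Functionally, the line behaves like the following sketch (a hypothetical software analogue; the daisy chain and phase timing are abstracted away):

```python
# Associative waiting line AFIFO: one pending word per central unit,
# searched by block address when an update (maj) arrives.
class AFIFO:
    def __init__(self, cpu_count):
        self.words = []                        # (tag_cadre, cpu, is_read)
        self.capacity = cpu_count

    def push(self, tag_cadre, cpu, is_read):   # writing, signal wff
        assert len(self.words) < self.capacity
        self.words.append((tag_cadre, cpu, is_read))

    def match(self, tag_cadre):                # comparison, signal cff
        for i, (addr, cpu, is_read) in enumerate(self.words):
            if addr == tag_cadre:
                self.words.pop(i)              # word invalidated after use
                return cpu, is_read            # s/n active, contents delivered
        return None                            # s/n inactive: negative result

line = AFIFO(4)
line.push(0x12, cpu=2, is_read=True)
assert line.match(0x12) == (2, True) and line.match(0x12) is None
```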
[0363] In the architecture described above, the snooper processors
PE.sub.j are solicited on each address transfer on the common bus
BUSA, with eventual consultation of their cache memory MC.sub.j.
This consultation is most of the time useless (low probability of
the presence of the address of the block corresponding to the
transfer in the cache memories).
[0364] It should be noted that the memory management processor PGM
keeps the state bits of the blocks and makes possible a central
management for maintaining coherence. To this effect, one can add
to the architecture described above (FIG. 10) a parallel
synchronization bus operating according to the same algorithm as
the synchronization bus SYNCHRO of the variation which is described
hereafter. The snooper processors are then no longer snoopers
properly speaking (since they are connected on the synchronization
bus and not on the common bus BUSA), and are designated as
coherence maintenance processors (PMC.sub.j for the variation in
FIG. 18). Thus, the memory management processor PGM remains
solicited on each transfer on the common bus BUSA, but the
coherence maintenance processors are solicited by the processor PGM
only when they are concerned by the transfer.
[0365] FIG. 18 presents a variation in which the coherence is
maintained according to the principle discussed above. This
variation takes the general architecture of FIG. 6, with the
addresses of the blocks transferring by the series links LS.sub.j.
This system comprises a parallel synchronization bus SYNCHRO, of
the same logic structure as the common bus BUSA, but directed at
the sole initiative of the memory management processor PGM.
[0366] The structure of the central unit UC.sub.j conforms to that
presented in FIG. 10, with some modifications:
[0367] the structure of the cache memory MC.sub.j remains the same,
as well as the structure of the management directory RG.sub.j,
[0368] the parallel management processor PGP.sub.j disappears,
since the common bus BUSA no longer exists and the functions which
had been attributed to it are turned over to the series link
management processor PGS.sub.j;
[0369] the snooper processor PE.sub.j is replaced by a coherence
maintenance processor PMC.sub.j which takes care of keeping the
state bits of the blocks in the cache memory MC.sub.j in order to
assure the coherence and which is active at the sole initiative of
the memory management processor PGM.sub.j via the synchronization
bus SYNCHRO;
[0370] the request management processor PGU now knows but a single
partner: the series link management processor PGS.sub.j, to which
it reports all its requests;
[0371] the series link management processor PGS.sub.j is charged
with the transfer of the addresses and the data, conforming to the
principle described for the system of FIG. 6, each address being
prefixed by the nature of the request;
[0372] the functionalities of the memory management processor PGM
are those described in reference to FIG. 17, its activation no
longer being assured by the signal {overscore (valid)}, which
disappears (since previously associated with the common bus BUSA),
but by the arbitrator ABM described in the system of FIG. 6, which
serializes the requests for service which move by the series links.
The memory RAMFG also includes a supplementary field cpu associated
with the state bits ro.rw.
[0373] The general functioning of the embodiment shown in FIG. 18
is as follows:
[0374] Each request of the treatment processor CPU activates the
request management processor PGU.sub.j with the reading or writing
indication and code or data. This processor requests an access to
the management directory RG.sub.j from the directory management
processor PGR.sub.j. The consultation of the directory leads to one
of the
following cases:
[0375] the block is present in the cache memory MC.sub.j, with an
unmodified state (m=0): if the request is for reading, the
information requested is extracted from the cache memory MC.sub.j
and furnished to the treatment processor CPU.sub.j. If the request
is for writing, then an informative writing request ei is
transmitted to the series link management processor PGS.sub.j.
[0376] the block is present in the cache memory MC.sub.j, with a
modified state (m=1): the request, reading or writing, is
satisfied;
[0377] the block is absent from the cache memory MC.sub.j; a block
request for reading or writing is transmitted to the series link
management processor PGS.sub.j.
[0378] Thus, the requests made to the series link management
processor may be: a request for reading of non-shared data (code):
dnp; a request for reading of a block: dl; a request for reading of
a block in order to carry out a writing there: de; a request for
informative writing: ei.
[0379] To these various requests, it is necessary to add the request
maj, corresponding to the purging of a block, either at the request
of the coherence maintenance processor PMC.sub.j, or to liberate a
placement for a block in the cache memory. The addresses thus
prefixed move over the series links LS.sub.j and, in conformity with
the principle set out during the description of the architecture of
FIG. 6, solicit the arbitrator ABM, whereby:
[0380] in case of reading of a block, the address is transmitted,
[0381] in case of writing of a block, the address and the data are
transmitted.
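These five request prefixes may be made concrete by a small sketch.
The enum and the message layout below are hypothetical illustrations,
not taken from the specification; each address sent over a series
link LS.sub.j simply carries the nature of the request as a prefix.

    from enum import Enum

    class Request(Enum):
        DNP = "dnp"   # reading of non-shared data (code)
        DL = "dl"     # reading of a block
        DE = "de"     # reading of a block in order to carry out a writing there
        EI = "ei"     # informative writing
        MAJ = "maj"   # writing of a block back after a purge

    def prefixed_address(request: Request, block_address: int) -> tuple:
        """An address moving over a series link, prefixed by the request nature."""
        return (request, block_address)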
[0382] These requests are treated sequentially by the memory
management processor PGM, of the same general structure as that
described in FIG. 17. Their treatment is as follows:
[0383] 1/dnp: request for non-shared data. The block is transmitted
and takes the state ro.rw=10.
[0384] 2/dl: request for reading of a block.
[0385] If the block is in the non-broadcast state (ro.rw=00) or the
"broadcast in reading" state (ro.rw=10), it is transmitted and takes
or keeps the state ro.rw=10.
[0386] If the block is in the "broadcast in writing" state, the
request is sent into the waiting line AFIFO. The memory management
processor PGM then finds in the field cpu of the memory RAMFG the
address of the cache memory MC.sub.i which contains the current
version of the requested block. A purge request is then emitted on
the synchronization bus SYNCHRO, to the sole destination of the
coherence maintenance processor PMC.sub.i associated with the cache
memory MC.sub.i concerned. This request may be qualified as an
addressed command.
[0387] One will note that the coherence maintenance processor
PMC.sub.i does not consult the associated management directory
RG.sub.i, since the memory management processor PGM knows that it is
the sole possessor of the current copy. Its role consists simply of
picking up the request and depositing it in the associated queue
FIFO.sub.i.
[0388] 3/de: request for reading of a block in order to carry out a
writing.
[0389] If the block is in the "non-broadcast" state (ro.rw=00), it
is transmitted and takes the "broadcast in writing" state
(ro.rw=01).
[0390] If the block is in the "broadcast in reading" state
(ro.rw=10), then the memory management processor emits a universal
block invalidation command, then transmits the block with the state
"broadcast in writing" (ro.rw=01). The universal command provokes
the activation of all of the coherence maintenance processors
PMC.sub.j, which execute strictly the same operations as those
described for the system of FIG. 10.
[0391] If the block is in the "broadcast in writing" state, the
request is sent into the waiting line AFIFO. As before, the memory
management processor PGM sends a command addressed to the sole
possessor of the current copy.
[0392] 4/maj: request for writing of a block after a purge.
[0393] The operating algorithm is in this case strictly the same as
that described in reference to FIG. 17 for the PGM processor.
[0394] It will be noted that the problem of the discharge of the
writing finds its natural solution in this embodiment through an
addressed discharge command.
[0395] 5/ei: informative writing.
[0396] This case is treated directly on the common bus BUSA in the
architecture presented in FIG. 10. In the embodiment provided here,
and in order to assure the synchronization, this operation is
controlled by the memory management processor PGM.
[0397] If the block is in the "broadcast in reading" state, then a
command, at once universal and addressed, is sent out: addressed in
the sense that the coherence maintenance processor PMC.sub.j
concerned notes the discharge of the informative writing request and
passes the block concerned into the "modified" state in the
management directory RG.sub.j; universal in the sense that all the
other processors PMC.sub.i must invalidate this block in their
directory.
[0398] If the block is in the "broadcast in writing" state, this
indicates that an informative writing request has already been
treated on this same block while the present request was awaiting
treatment. In this case, the informative writing request is
transformed into a writing request de, and follows the same
treatment as the corresponding case of writing.
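To fix ideas, the sequential treatment of the five request types by
PGM, as set out in paragraphs [0383] to [0398], may be summarized in
the following sketch. It is a simplified reading aid under stated
assumptions: the helper names, the SynchroBus class and the
dictionary-based RAMFG are hypothetical stand-ins for the hardware
described (in particular, the treatment of maj and the service of
requests waiting in AFIFO are reduced to comments), not an
implementation of FIG. 18.

    class SynchroBus:
        """Stand-in for the parallel synchronization bus SYNCHRO."""
        def addressed(self, op, addr, target):
            print(f"SYNCHRO: {op} of block {addr:#x}, addressed to PMC_{target}")
        def universal(self, op, addr):
            print(f"SYNCHRO: {op} of block {addr:#x}, universal (all PMC_j)")

    def transmit_block(addr, requester):
        print(f"block {addr:#x} transmitted to CPU_{requester}")

    def treat_request(kind, addr, requester, ramfg, afifo, synchro):
        """One PGM step; ramfg maps a block address to {'state', 'cpu'} where
        '00' = non-broadcast, '10' = broadcast in reading, '01' = broadcast in writing."""
        e = ramfg.setdefault(addr, {"state": "00", "cpu": None})
        if kind == "dnp":                        # 1/ non-shared data (code)
            transmit_block(addr, requester)
            e["state"] = "10"
        elif kind == "dl":                       # 2/ reading of a block
            if e["state"] in ("00", "10"):
                transmit_block(addr, requester)
                e["state"] = "10"
            else:                                # current copy is in a cache MC_i:
                afifo.append((kind, addr, requester))
                synchro.addressed("purge", addr, e["cpu"])
        elif kind == "de":                       # 3/ reading in order to write
            if e["state"] == "10":               # invalidate the read copies first
                synchro.universal("invalidation", addr)
            if e["state"] in ("00", "10"):
                transmit_block(addr, requester)
                e["state"], e["cpu"] = "01", requester
            else:
                afifo.append((kind, addr, requester))
                synchro.addressed("purge", addr, e["cpu"])
        elif kind == "maj":                      # 4/ writing back after a purge:
            print(f"block {addr:#x} written back")  # algorithm of FIG. 17; requests
            e["state"], e["cpu"] = "00", None        # waiting in afifo then proceed
        elif kind == "ei":                       # 5/ informative writing
            if e["state"] == "10":               # command at once addressed and
                synchro.addressed("discharge", addr, requester)  # universal, shown
                synchro.universal("invalidation", addr)          # here as two emissions
                e["state"], e["cpu"] = "01", requester
            elif e["state"] == "01":             # block changed while waiting:
                treat_request("de", addr, requester, ramfg, afifo, synchro)

    # Example: CPU_3 obtains block 0x12 for writing; the subsequent read by CPU_5
    # is queued until PMC_3 purges the block.
    ramfg, afifo, synchro = {}, [], SynchroBus()
    treat_request("de", 0x12, 3, ramfg, afifo, synchro)
    treat_request("dl", 0x12, 5, ramfg, afifo, synchro)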
[0399] The parallel synchronization bus SYNCHRO is charged with
broadcasting the block addresses, prefixed by a processor number and
a type of request, amounting to about 30 to 40 bits according to the
characteristics of the multiprocessor. These data are moreover
transmitted in a unidirectional manner. Their transfer may therefore
advantageously be made by a series link. Their transfer rate is less
critical than for the blocks, and simplified solutions may therefore
be envisioned, for example by means of the "TAXI" circuits made by
the A.M.D. Company.
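The 30-to-40-bit figure may be illustrated by packing the three
fields into a single word. The field widths in the following sketch
are hypothetical examples consistent with the text, not values fixed
by the specification.

    def pack_synchro_word(cpu: int, req: int, block_addr: int,
                          cpu_bits: int = 5, req_bits: int = 3,
                          addr_bits: int = 26) -> int:
        """Pack processor number, request type and block address into one word."""
        assert cpu < (1 << cpu_bits) and req < (1 << req_bits)
        assert block_addr < (1 << addr_bits)
        return (cpu << (req_bits + addr_bits)) | (req << addr_bits) | block_addr

    # Example: 5 + 3 + 26 = 34 bits, within the 30 to 40 bit range of the text.
    word = pack_synchro_word(cpu=3, req=2, block_addr=0x123456)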
[0400] FIG. 19 shows a partial block diagram of one architecture
according to the invention, in which several central units UC.sub.k
. . . are connected in clusters and share the same series link
LS.sub.k. To this end, a local arbitrator ABL.sub.k associated with
the cluster is charged with arbitrating the conflicts of access to
the block address communication means, and shares with the memory
management processor PGM, for this purpose, a signal busy.sub.k
permanently indicating the free or occupied state of the series link
LS.sub.k. Means for coding and decoding, at the head of each block,
the identification of the processor concerned within a cluster are
associated with the logic for emission and reception of blocks of
data.
[0401] In the case where the block address communication means are
constituted by the common bus BUSA, the operation is as follows:
[0402] If the central unit CPU.sub.k+j desires to make a transfer of
a block from the central memory RAM toward a cache memory MC.sub.k+j
(cases dnp, dl, de) or to carry out an informative writing ei, then
it requests access to the common bus BUSA from the local arbitrator
ABL.sub.k, which propagates the request, when its turn comes, to the
arbitrator AB. The grant of access to the common bus BUSA is sent
back to the central unit CPU.sub.k+j and the transfer is carried out
in the manner described in reference to FIG. 10. All blocks
transmitted from the central memory RAM toward the cache memory
MC.sub.k+j must then be identified, since the order of the requests
is not respected, owing to the possibility of their waiting in the
line AFIFO of the memory management processor PGM. If the central
unit CPU.sub.k+j desires to make a transfer of a block from the
cache memory MC.sub.k+j toward the central memory RAM (case maj),
then it first requests access to the series link LS.sub.k from the
local arbitrator ABL.sub.k. The local arbitrator ABL.sub.k and the
memory management processor PGM are both capable of allocating the
series link LS.sub.k: contention is avoided by synchronizing the
modification of the signal busy.sub.k with the signal valid (the
memory management processor PGM only activates or reactivates a
transfer during a memory transaction). The grant of occupation of
the series link LS.sub.k leads the central unit CPU.sub.k+j to
transfer its block of information to the memory shift register
RDM.sub.k, then to request from the local arbitrator ABL.sub.k
access to the common bus BUSA in order to carry out the requested
operation there, which is performed according to the algorithm
described in reference to FIG. 10. The writing operation may cause
the liberation of a block in the waiting line AFIFO of the memory
management processor PGM and request a shift register RDM.sub.j that
is occupied; in this case, the requested transfer is delayed and
chained with the transfer in progress.
[0403] In the case where the block address communication means are
the series links themselves, the operation is identical to the
preceding case for the preemption of the series link, and identical,
for the general operating algorithm, to that shown with reference to
FIG. 17.
[0404] For example, a block reading request emitted by the central
unit CPU.sub.k+j first necessitates a grant of access to the series
link LS.sub.k, a grant given by the local arbitrator ABL.sub.k in
concert with the memory management processor PGM. The access grant
causes the transfer of the address of the requested block on the
series link LS.sub.k, which is immediately liberated: it is
available for any other transaction if necessary. A block writing
request follows the same protocol for access to the series link
LS.sub.k.
[0405] In the architectures described with reference to FIGS. 1 to
19, there exist as many memory shift registers RDM.sub.j as central
units CPU.sub.j: one series link LS.sub.j is provided in a static
manner for each couple (RDM.sub.j, CPU.sub.j).
[0406] While there must evidently be at least one series link
LS.sub.j between a central unit and the central memory RAM, the
number of shift registers RDM.sub.j may be smaller. In effect, if
tacc is the time of access to the central memory RAM, and ttfr the
time of transfer of a block, it is not possible to keep more than
n=ttfr/tacc shift registers simultaneously occupied. For example,
for tacc=100 ns and ttfr=1200 ns, one obtains n=12.
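A minimal sketch of this dimensioning rule (the function name is
hypothetical):

    def max_busy_shift_registers(tacc_ns: float, ttfr_ns: float) -> int:
        """Maximum number of registers RDM_j that can be simultaneously occupied."""
        return int(ttfr_ns // tacc_ns)

    assert max_busy_shift_registers(tacc_ns=100, ttfr_ns=1200) == 12  # example above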
[0407] Tacc and ttfr are thus criteria characteristic of the
performance of the multiprocessor system according to the invention,
and putting in place n memory shift registers RDM.sub.j is
compatible with a greater number of series links LS.sub.j, on the
condition of interposing a logic of the interconnection network type
RI between registers and links, the allocation of a memory register
RDM.sub.j to a series link LS.sub.j then being achieved in a dynamic
manner by the memory management processor PGM.
[0408] Further, the central memory RAM will generally be constituted
of m memory banks RAM.sub.1, . . . RAM.sub.p, . . . RAM.sub.m
arranged in parallel, each memory bank comprising n shift registers
RDM.sub.j connected by an interconnection network RI.sub.p to the
assembly of series links LS.sub.j. It is then possible, provided
that the addresses of the blocks are uniformly distributed among the
memory banks RAM.sub.p, to obtain a theoretical performance of
m.times.n shift registers simultaneously active. The uniform
distribution of the addresses is assured by conventional mechanisms
for interleaving the addresses.
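As a sketch of this interleaving, the following function is a
hypothetical illustration of the conventional mechanism: low order
block address bits select the bank, so that consecutive blocks fall
in different banks and sequential traffic can keep up to m.times.n
shift registers busy.

    def bank_of(block_addr: int, m: int) -> int:
        """Address interleaving: consecutive block addresses map to distinct banks."""
        return block_addr % m

    # With m=4 banks, blocks 0,1,2,3 land in banks 0,1,2,3 and block 4 wraps to bank 0.
    assert [bank_of(a, 4) for a in range(5)] == [0, 1, 2, 3, 0]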
[0409] In FIG. 20a, there is partially shown an architecture
conforming to the invention, comprising m memory banks RAM.sub.p
with n shift registers RDM.sub.j per memory bank and q central
units UC.sub.j. Each memory bank is of the random access type,
provided with a data input/output of a size corresponding to a block
of information bi, this input/output being (as before for the memory
RAM) connected by a parallel bus to the assembly of elementary
registers RDM.sub.1p . . . RDM.sub.jp.
[0410] The interconnection network is of a known structure
("cross-bar," "delta," "banyan," . . . ). One will note that a
multi-stage network is well suited in this case, where the time for
establishing a path is negligible compared with its time of
occupation (the time of transfer of a block) and where the network
only concerns one bit per link.
[0411] The memory management processor PGM is adapted to assure the
dynamic allocation of an output and an input of the network, that
is, to put in relation a memory shift register RDM.sub.j and a
series link LS.sub.i.
[0412] In the case where the block address communication means is
constituted by the common bus BUSA, the operation is as follows:
[0413] In the case of a request for reading of a block on the part
of the central unit CPU.sub.j, the memory management processor
PGM.sub.p concerned allocates a shift register RDM.sub.i, controls
the interconnection network RI accordingly, and initializes the
transfer.
[0414] In the case of a request for writing of a block on the part
of the central unit CPU.sub.j, a path must first be established. To
this end, a first request for establishment of the path is sent on
the common bus BUSA, followed by the writing request effecting the
transfer of the block from the cache memory MC.sub.j to the shift
register RDM.sub.i. At the time of the first request, the memory
management processor PGM is charged with allocating a path and
controlling the interconnection network RI.
[0415] In the case where the block address communication means are
the series links themselves, a path must be established prior to any
transfer. This problem is identical to the classical problem of the
sharing of an assembly of n resources by m users and may be handled
by classical solutions for the arbitration of access conflicts
(communication protocols, supplementary signals).
[0416] In the example of the architecture shown in FIG. 5, the shift
registers RDM.sub.j and RDP.sub.j and their validation logics LV1
and LV2 were provided in high speed technology, the assembly being
synchronized by a clock of a frequency F at least equal to 100
MHz.
[0417] FIG. 20b shows a variation of the architecture proposed in
FIG. 20a, a solution conforming to the invention in which each
series link LS.sub.j, which connects the processor CPU.sub.j to all
of the memory banks, is divided into m series links LS.sub.jp
connecting point to point the processor CPU.sub.j to each of the
memory banks RAM.sub.p.
[0418] This arrangement presents the following double advantage:
[0419] each link, being of the point to point type, may be better
adapted from the electrical or fiber optic point of view;
[0420] a supplementary level of parallelism is obtained as soon as
the treatment processor is in a state to anticipate block requests,
which is actually the case for the highest performance processors.
[0421] The interface logic (noted previously TFR.sub.j and
RDP.sub.j) which is associated with the series link LS.sub.j on the
processor CPU.sub.j side is then duplicated into m copies I.sub.1
. . . I.sub.p . . . I.sub.m. One will note the presence of a link
for maintaining the coherence of information, proper to each memory
bank RAM.sub.p. The operation of this link is analogous to that of
the bus SYNCHRO of FIG. 18.
[0422] There is shown in FIGS. 21a and 21b another structure of the
memory RAM, which comprises 2.sup.u memory planes, each memory plane
having a front of t/2.sup.u binary data (for reasons of clarity,
FIG. 21a shows the means necessary for the reading of a block bi,
and FIG. 21b the means necessary for writing). The shift registers
RDM.sub.j or RDP.sub.j are constituted of 2.sup.u elementary
sub-shift registers RDM.sub.jp of t/2.sup.u bits of capacity. The
example shown in FIG. 21 is a realization with 8 memory planes
(u=3). (For clarity of the drawing, there is shown a single shift
register RDM.sub.j, formed of the assembly of sub-registers
RDM.sub.jp.) Each memory plane RAM.sub.p comprises, opposite its
access front, an assembly of elementary shift registers RDM.sub.jp
and is capable of operating with a shift frequency of at least
F/2.sup.u.
[0423] The operation of the assembly in the case of reading is
illustrated in FIG. 21a. A block is read in a synchronous manner in
the assembly of 2.sup.u memory planes and loaded in the same manner
into the elementary registers of the same rank. The series outputs
of these registers are connected to the inputs of a multiplexer MUXR
made in high speed technology (ASGA). A circuit of this type,
perfectly adapted, is available from "Gigabit Logic" under the name
"LOGO40", and is capable of delivering a logic signal at a frequency
of 2.7 GHz. It also furnishes a clock frequency divided by eight,
which constitutes the shift clock of the elementary registers
RDM.sub.jp.
[0424] In the case of writing, a symmetrical operation, shown in
FIG. 21b, is obtained with a demultiplexer circuit DMUXR of the same
manufacturer (known as "LOG41"), with the same performance
characteristics.
[0425] Thus, a transfer frequency of 500 MHz is obtained with 8
elementary registers operating at a frequency of 500/8=62.5 MHz,
which then makes them realizable in more conventional technology
("MOS" for example).
[0426] The multiplexer and demultiplexer circuits referenced
hereabove are combinable into assemblies of 16, 32, . . . bits.
Thus, by associating respectively 16 or 32 memory planes operating
at 62.5 MHz, it is possible to obtain flow rates of 1 and 2 GHz, or
a level of performance 2 to 4 times greater.
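A minimal sketch of this dimensioning (the function name is
hypothetical; the figures are those of paragraphs [0425] and
[0426]):

    def plane_shift_frequency_mhz(f_link_mhz: float, planes: int) -> float:
        """Shift frequency of each plane when `planes` registers feed one mux."""
        return f_link_mhz / planes

    assert plane_shift_frequency_mhz(500, 8) == 62.5    # 8 planes: 500 MHz link
    assert plane_shift_frequency_mhz(1000, 16) == 62.5  # 16 planes: 1 GHz link
    assert plane_shift_frequency_mhz(2000, 32) == 62.5  # 32 planes: 2 GHz link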
[0427] It should be noted that the logic TFR may be provided on one
of the elementary registers, and that the validation logic LV is
integrated with the "ASGA" circuits (open collector output).
[0428] FIG. 22 shows the general structure of a component of the
"VLSI" integrated circuit type, called a multiport series memory
and capable of equipping a multiprocessor system according to the
invention. This component may be used in the multiprocessor
architectures described previously, either to provide the central
memory RAM and the associated shift registers RDM.sub.j, or to
provide each cache memory MC.sub.j and its shift register RDP.sub.j.
To simplify the notation, in the description which follows, the
symbols relative to the central memory RAM and the associated shift
registers will be retained.
[0429] The list of pins of this circuit with the corresponding
signals is as follows:
[0430] adbloc.sub.0-adbloc.sub.m-1: m address bits of block bi,
[0431] admot.sub.0-admot.sub.k-1: k bits of word addresses in the
block,
[0432] numreg.sub.0-numreg.sub.n-1: n bits of register addresses
rd,
[0433] {overscore (cs)}: "chip select": circuit selection
signal,
[0434] {overscore (wr)}: "write" signal for writing,
[0435] {overscore (rd)}: "read" signal for reading,
[0436] bit/{overscore (bloc)}: control signal for the multiport
function,
[0437] normal/{overscore (config)}: signal for the operating
mode,
[0438] data.sub.0-data.sub.l-1: l bits of data,
[0439] h.sub.1-h.sub.n: n clock signals,
[0440] d.sub.1-d.sub.n: n data signals.
[0441] The values m, n, l are functions of the current state of
technology. Actual values may be as follows:
[0442] m=16, i.e. 2.sup.16 blocks bi of 64 bits each (or 4 Mbits),
[0443] n=3, i.e. 8 registers rd,
[0444] l=8, i.e. a parallel interface of the octet type,
[0445] k=3, owing to the 8 octets per block.
[0446] The component provided comprises about 50 pins.
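The pin budget may be checked with a short sketch. The breakdown
below assumes, as the pin list suggests, 2.sup.n clock pins and
2.sup.n data pins (one of each per register); this count is an
assumption made for illustration.

    # Example parameter set of paragraphs [0442]-[0445].
    m, n, l, k = 16, 3, 8, 3
    capacity_bits = (2 ** m) * 64               # 2^16 blocks of 64 bits
    assert capacity_bits == 4 * 2 ** 20         # 4 Mbits of storage
    assert 2 ** k == 64 // l                    # 8 octets per block gives k = 3

    # adbloc + admot + numreg + data + 5 control pins + clock and data pins
    pins = m + k + n + l + 5 + 2 * (2 ** n)     # assumption: 2^n clocks, 2^n data
    assert pins == 51                           # of the order of the stated 50 pins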
[0447] This series multiport memory circuit is composed of a random
access read-write memory RAM, of a predetermined size t, capable of
being controlled in writing on independent slices of size t/4 (value
chosen by way of example in FIG. 22) and t/l. The data lines of this
memory RAM are connected to the inputs of a logic of the "barrel
shifter" type BS, or of the multiplexing type MT, depending on the
version of the component; the multiplexing logic MT may be
considered as offering a sub-assembly of the possibilities of the
barrel shifter logic and is therefore simpler to provide. The
address signals and the control signals of this memory RAM, namely
csi, wri, rdi, adbloci, are delivered by a control logic COM. This
logic COM further receives the information signals from the pins
{overscore (cs)}, {overscore (wr)}, {overscore (rd)},
bit/{overscore (bloc)}, normal/{overscore (config)}, numreg, is
connected by the control lines "format" to the barrel shifter logic
BS and to the output of a configuration register RC1, and is
connected to the input of a selection logic LSR providing the
signals srd.sub.0, . . . srd.sub.n-1 and src.sub.1, src.sub.2,
src.sub.3. The outputs of the barrel shifter type logic BS
constitute an internal parallel communication bus BUSI, connected to
an assembly of shift registers RD.sub.0, . . . RD.sub.n-1, on the
one hand on their parallel inputs, and on the other hand on their
parallel outputs through the validation buffers BV100.sub.0, . . .
BV100.sub.n-1, and to the parallel input of the configuration
registers RC.sub.1, RC.sub.2 . . . RC.sub.i.
[0448] The l bits of low significance of the bus BUSI are also
received on the l pins data.sub.0, . . . data.sub.l-1. Each shift
register RD.sub.i and the associated logic gates constitute a
functional unit ELRD.sub.i, controlled by an assembly of logic
elements which constitute a forcing logic LF.sub.i. Each functional
unit ELRD.sub.i comprises logic gates ET100.sub.i and ET101.sub.i,
connected on one of their inputs to the output srd.sub.i of the
selection logic LSR, and receiving on their other input respectively
the signals rd.sub.i and wr.sub.i. The output of the logic gate
ET100.sub.i is connected to the input load100.sub.i of the shift
register RD.sub.i, as well as to the input load101.sub.i and to the
input S, respectively, of a counter CPT100.sub.i and of a flip flop
B100.sub.i belonging to the forcing logic LF.sub.i. The output of
the logic gate ET101.sub.i is connected to the control input of the
validation buffers BV100.sub.i. The output d.sub.i is connected to
the output of a logic gate PL.sub.i, which receives on its data
input the series output of the shift register RD.sub.i and on its
control input the output of a logic gate OU100.sub.i. The signal
sent from the pin h.sub.i is delivered to the input clk100.sub.i of
the register RD.sub.i as well as to the input down100.sub.i of the
counter CPT100.sub.i. The output zero100.sub.i of the counter
CPT100.sub.i is connected to the input R of the flip flop
B100.sub.i.
[0449] The forcing logic LF.sub.i comprises in addition a
multiplexer MUX100.sub.i which receives on its data inputs the
values t and t/4. The data output of the multiplexer MUX100.sub.i is
connected to the data input of the counter CPT100.sub.i, and the
selection command sel100.sub.i of the multiplexer MUX100.sub.i is
connected to the output i of the register RC.sub.1. The output Q of
the flip flop B100.sub.i is connected to one of the inputs of a
logic gate ET102.sub.i, which receives on its other input the signal
issued from the pin i of a register RC.sub.2. The output of the
logic gate ET102.sub.i is connected to one of the inputs of the
logic gate OU100.sub.i, which receives on its other input the signal
issued from the pin i of a register RC.sub.3. The loading inputs of
the registers RC.sub.1, RC.sub.2, RC.sub.3 receive respectively the
signals src.sub.1, src.sub.2, src.sub.3 issued from the selection
logic LSR.
[0450] This component presents a duality of functioning: if the
signal bit/{overscore (bloc)} is in the "bit" state, then the
functioning of this component is that of a conventional
semiconductor memory: the signals adbloc associated with the signals
admot constitute the address bus at the word granularity (8 bits in
the example), the signals {overscore (cs)}, {overscore (rd)},
{overscore (wr)} have the usual meaning attributed to these signals,
and the data pins convey the data.
[0451] In an internal manner, in reading, the block of information
designated by adbloc is read in the memory RAM and presented at the
input of the barrel shifter logic BS or of the multiplexing logic
MT. The combination of the signals admot and bit/{overscore (bloc)}
permits the control logic COM to provide the "format" signals to the
barrel shifter logic BS or the multiplexing logic MT. The word
concerned is then right justified at the output of the barrel
shifter or multiplexing logic and presented thus on the data pins.
[0452] In an internal manner, in writing, the word presented on the
data lines is aligned by the barrel shifter logic BS or the
multiplexing logic MT, by the same format control signals as in
reading, with regard to its position in the block. The control logic
COM then emits a partial writing signal wri on the only slice of the
memory concerned, at the address designated by the signals
adbloc.
[0453] If the signal bit/{overscore (bloc)} is in the "bloc" state,
then the operation depends on the state of the normal/{overscore
(config)} signal. The config mode programs the configuration
registers RC.sub.1, RC.sub.2, RC.sub.3, addressed by the signals
numreg and programmed from the data lines data. The register
RC.sub.1 permits modifying the size of the block: t or t/4 in the
example, that is, 64 bits or 16 bits. In an internal manner, the
functioning is similar to that described for the "bit" mode: t or
t/4 bits are aligned on the internal bus BUSI (in reading), or
opposite the slice of the block concerned (in writing). Multiple
block sizes may be envisioned (t, t/2, t/4 . . . ).
[0454] The register RC.sub.3 permits selecting, for each register, a
permanent direction of operation: either input (RC3.sub.i=0) or
output (RC3.sub.i=1). This permanent direction permits adapting the
component to series links with permanent unidirectional connections.
The register RC.sub.2 permits choosing for each register, provided
that the corresponding bit of RC.sub.3 is at the logic state 0, a
mode of operation with alternating bidirectional links: on a reading
of the memory RAM, the shift register RD.sub.i concerned passes to
the output mode for the time of transmission of the block, then
returns to the rest state in the input mode. In an internal manner,
the flip flop B100.sub.i, which controls the logic gate PL.sub.i, is
set to 1 on a loading signal of the register RD.sub.i and reset to
zero at the end of the transfer of t or t/4 bits, by means of the
counter CPT100.sub.i, initialized at t or t/4 according to the state
of the register RC.sub.1, and which receives on its counting input
the clock pulses h.sub.i. In normal operation (normal/{overscore
(config)} signal in the normal state), for a reading, the block
addressed by the adbloc pins is loaded into the register RD.sub.i
addressed by the numreg pins. If the block is partial (t/4), then it
is transmitted in a position of low significance on the internal bus
BUSI by the barrel shifter logic BS or the multiplexer MT. This
block is then transmitted upon activation of the clock signal
h.sub.i.
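The roles of the three configuration registers may be summarized in
a small sketch. The function names are hypothetical, and the
polarity of the RC.sub.1 bit (which of its two values selects t
rather than t/4) is an assumption made for illustration.

    def register_mode(i: int, rc2: int, rc3: int) -> str:
        """Direction of the shift register RD_i from the configuration bits ([0454])."""
        if (rc3 >> i) & 1:
            return "permanent output"           # RC3_i = 1
        if (rc2 >> i) & 1:
            return "alternating bidirectional"  # output only while a block is emitted
        return "permanent input"                # RC3_i = 0 and RC2_i = 0

    def block_length(rc1_bit: int, t: int = 64) -> int:
        """Block size selected by RC_1: full block t or partial block t/4."""
        return t if rc1_bit == 0 else t // 4    # polarity assumed

    assert register_mode(0, rc2=0b0001, rc3=0b0000) == "alternating bidirectional"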
[0455] In normal operation, for a writing, the contents of the
register RD.sub.i addressed by the numreg pins are written into the
memory RAM at the block address adbloc. If the block is partial, it
is transmitted into a position of high significance on the internal
bus BUSI, then aligned opposite the slice of the block concerned by
the barrel shifter logic BS or the multiplexer MT, and finally a
partial writing signal wri is sent out on the slice concerned.
[0456] It should be noted that if a partial block is in service,
then the address of this partial block within the block is furnished
by the address lines admot.
[0457] This component is perfectly adapted to the diverse variations
of the architecture described. Associated in parallel, 8, 16 . . .
circuits of this type permit providing the apparatus described in
FIGS. 20a, 20b. If the memory RAM is of high speed technology, then
this component may also be used at the level of the cache memory,
while multiplexing, according to the device described in FIGS. 20a,
20b, the internal registers of a same component.
* * * * *