U.S. patent application number 16/450933 was filed with the patent office on 2019-06-24 and published on 2019-12-19 for system and method for self-healing of a dynamic link.
This patent application is currently assigned to ARTERIS, INC. The applicant listed for this patent is ARTERIS, INC. Invention is credited to Alexis BOUTILLIER, Dee LIN, Jonah PROBELL, Monica TANG.
Application Number | 20190384875 16/450933 |
Family ID | 59226555 |
Filed Date | 2019-06-24 |
United States Patent Application | 20190384875 |
Kind Code | A1 |
PROBELL; Jonah; et al. | December 19, 2019 |
SYSTEM AND METHOD FOR SELF-HEALING OF A DYNAMIC LINK
Abstract
A Network-on-Chip (NoC) link with an upstream bypassable
narrowing serialization adapter and a downstream bypassable
widening serialization adapter, which are able to heal a link,
without losing throughput, by using one or a small number of
sideband signals to bypass individual known-bad wires. The
serialization adapters are normally bypassed. To avoid sending
information on broken wires, bypassing is disabled so that
information is serialized to only a portion of the link.
Serialization can be applied to any portion of a link down to as
little as one bit wire.
Inventors: | PROBELL; Jonah (ALVISO, CA); BOUTILLIER; Alexis (CAMPBELL, CA); LIN; Dee (FREMONT, CA); TANG; Monica (SAN JOSE, CA) |
Applicant: |
Name | City | State | Country | Type |
ARTERIS, INC. | CAMPBELL | CA | US | |
Assignee: | ARTERIS, INC. (CAMPBELL, CA) |
Family ID: | 59226555 |
Appl. No.: | 16/450933 |
Filed: | June 24, 2019 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
15391850 | Dec 28, 2016 | 10331846 |
16450933 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 30/18 20200101; G06F 2117/02 20200101 |
International Class: | G06F 17/50 20060101 G06F017/50 |
Claims
1. A method of adapting the serialization of a link in a NoC, the
method comprising: disabling a bypass of a narrowing serialization
adapter upstream of a link; and disabling a bypass of a widening
serialization adapter downstream of a link.
2. The method of claim 1, further comprising detecting a bad wire
of the link.
3. The method of claim 2, wherein the detecting step comprises:
driving a copy of a signal of a first wire of the link on a
sideband wire; comparing the value of the signal of the first wire
to the value of the sideband wire; driving a copy of the signal of
a second wire of the link; and comparing the value of the signal of
the second wire to the value of the sideband wire.
4. A system for adapting serialization of a link comprising: means
for disabling a bypass of a narrowing serialization adapter
upstream of a link; and means for disabling a bypass of a widening
serialization adapter downstream of a link.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This utility patent application is a continuation of U.S.
patent application Ser. No. 15/391,850 titled DYNAMIC LINK
SERIALIZATION IN NETWORK-ON-CHIP and filed on Dec. 28, 2016 by
Alexis BOUTILLIER, et al., which claims the benefit of U.S.
Provisional Application Ser. No. 62/272,845 titled SYSTEM AND
METHOD FOR DYNAMIC LINK SERIALIZATION IN NOC filed on Dec. 30, 2015
by Alexis BOUTILLIER et al., the entire disclosures of which are
incorporated herein by reference.
FIELD OF THE INVENTION
[0002] The invention is in the field of semiconductor chips and,
more specifically, fault-tolerant semiconductor chips.
BACKGROUND
[0003] Some kinds of chips must be fault tolerant. They must
continue to operate correctly, even when errors occur. Some kinds
of errors occur from time to time, such as when an alpha particle
or a power supply fluctuation causes a data bit to change value.
Such errors are known as soft errors. Errors can also occur
consistently, such as when a manufacturing defect or a process of
wearing out causes signal wires within the chip to be stuck at a 1
or 0, shorted to another wire, or broken. Such errors are known as
hard errors. Any errors would cause a conventional chip to
fail.
SUMMARY OF THE INVENTION
[0004] The invention is directed to safely and dependably enabling
a failed chip to function correctly when faced with hard
errors.
[0005] Fault-tolerant chips are often designed with an
error-correcting code (ECC) to correct data errors. An ECC that
provides just a desired level of statistical protection against
soft errors might provide sufficient
protection to compensate for hard errors. However, such an ECC
code, in a chip with a hard error, provides insufficient protection
against soft errors. The invention bypasses or provides a detour
around the point in a chip with a hard error. In embodiments that
use bypassing, data throughput is reduced through parts of the chip
with errors. In embodiments that provide detours, no throughput
decrease occurs.
[0006] Network-on-chip (NoC) is the set of wires and logic used to
transfer data and other information between functional units in
different parts of a chip. NoCs tend to have the longest wires and
the longest distances spanned of any logical connections within
chips. Because of their large amount of wire length, and their
relatively high wire to logic ratio, NoCs are particularly prone to
hard errors and are furthermore the parts of a chip that can benefit
most from error-resilient technologies.
[0007] NoCs transport information in packets. Network Interface
Units (NIUs) near the edges of the network create and consume
packets as necessary to complete requested transactions. Packets
comprise a header, and may comprise multiple bytes or words of
data. Packets are transferred along links, which comprise a set of
wires that can carry packet headers and data. Wider links have more
wires and provide for more data throughput, whereas narrower links
have fewer wires and use fewer resources. Where the link is
narrower than the amount of header and data information
transferred, the information is sent serially, generally in
successive clock cycles. Different links within a NoC can have
different serialization. That is, some are wider and some are
narrower. Serialization adapters are used in order to connect links
of different serializations. A narrowing serialization adapter
receives and stores a wide amount of information from a wide link
and transmits the information sequentially on the narrower link. A
widening serialization adapter receives and stores multiple pieces
of information from a narrow link and then transmits the information
together on the wider link.
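The behavior of the two adapter types can be illustrated with a minimal sketch. It is not taken from the specification: the function names are hypothetical, and sending the least significant beat first is an assumption, since the text does not state a beat order.

```python
def narrow(word: int, wide_bits: int, narrow_bits: int) -> list[int]:
    """Split one wide word into narrow beats, least significant beat first.

    Models a narrowing serialization adapter: the wide word is stored and
    then sent as successive beats on the narrower link.
    """
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide width must be a multiple of the narrow width")
    mask = (1 << narrow_bits) - 1
    return [(word >> shift) & mask for shift in range(0, wide_bits, narrow_bits)]


def widen(beats: list[int], narrow_bits: int) -> int:
    """Reassemble narrow beats (least significant first) into one wide word.

    Models a widening serialization adapter: beats are stored until the
    full word is available and then sent together on the wider link.
    """
    word = 0
    for index, beat in enumerate(beats):
        word |= beat << (index * narrow_bits)
    return word
```

For example, `narrow(0xC3A5, 16, 8)` yields the beats `0xA5` then `0xC3`, and `widen` applied to those beats restores `0xC3A5`.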
[0008] Some embodiments of the invention use software procedures to
identify the location of hard errors. Some embodiments use
automatic physical link self-checking. Some embodiments use data
transport functional checking in a transport network to identify
the location of hard errors.
[0009] A transport network that couples the units is a means of
communication that transfers at least all semantic information
necessary, between units, to implement coherence. The transport
network, in accordance with some aspects and some embodiments of
the invention, is a network-on-chip, though other known means for
coupling interfaces on a chip can be used and the scope of the
invention is not limited thereby. The transport network provides a
separation of the interfaces between the agent interface unit
(AIU), coherence controller, and memory interface units such that
they may be physically separated.
[0010] A transport network is a component of a system that provides
standardized interfaces to other components and functions to
receive transaction requests from initiator components, issue a
number (zero or more) of consequent requests to target components,
receive corresponding responses from target components, and issue
responses to initiator components in correspondence to their
requests. A transport network, according to some embodiments of the
invention, is packet-based. It supports both read and write
requests and issues a response to every request. In other
embodiments, the transport network is message-based. Some or all
requests cause no response. In some embodiments, multi-party
transactions are used such that initiating agent requests go to a
coherence controller, which in turn forwards requests to other
caching agents, and in some cases a memory, and the agents or
memory send responses directly to the initiating requestor. In some
embodiments, the transport network supports multicast requests such
that a coherence controller can, as a single request, address some
or all of the agents and memory. According to some embodiments the
transport network is dedicated to coherence-related communication
and in other embodiments at least some parts of the transport
network are used to communicate non-coherent traffic. In some
embodiments, the transport network is a network-on-chip with a
grid-based mesh or depleted-mesh type of topology. In other
embodiments, a network-on-chip has a topology of switches of varied
sizes. In some embodiments, the transport network is a crossbar. In
some embodiments, a network-on-chip uses virtual channels.
[0011] Some links send header information and data sequentially on
the same wires as in FIG. 1(a). That is, the header of each packet
is transmitted in one cycle, and therefore has a data throughput
penalty of one cycle per packet. Such links are generally as wide
as the wider of the header or the data word width. Some links send
different portions of header information sequentially as in FIG.
1(b). They penalize the data throughput by multiple cycles, as needed
to send the header. Such links are generally just as wide as the
data, regardless of the amount of information in the header. Some
links send header information simultaneously, in parallel, with
data as in FIG. 1(c). That is, the penalty for sending the header
before the data is None. Such links are generally as wide as the
sum of the data word width and the size of the header information.
The choice of serialization determines the throughput available for
any particular pattern of packets and amounts of packet data.
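The width and throughput trade-off among the three cases of FIG. 1 can be worked through numerically. The sketch below is an illustration under stated assumptions: the function name and the mode labels `"one"`, `"multi"`, and `"none"` are hypothetical, and the 48-bit header and 32-bit data word used in the usage note are example values, not figures from the specification.

```python
import math


def link_cost(header_bits: int, data_bits: int, data_words: int, mode: str):
    """Return (link width in wires, cycles per packet) for one packet.

    mode is a hypothetical label for the three cases of FIG. 1:
      "one"   - header sent in one cycle on the shared wires, FIG. 1(a)
      "multi" - header sent over several cycles on data-width wires, FIG. 1(b)
      "none"  - header sent in parallel with data, FIG. 1(c)
    """
    if mode == "one":
        return max(header_bits, data_bits), 1 + data_words
    if mode == "multi":
        return data_bits, math.ceil(header_bits / data_bits) + data_words
    if mode == "none":
        # A header-only packet still occupies the link for one cycle.
        return header_bits + data_bits, max(1, data_words)
    raise ValueError(f"unknown mode: {mode}")
```

With a 48-bit header, 32-bit data words, and four data words per packet, the three modes give 48 wires and 5 cycles, 32 wires and 6 cycles, and 80 wires and 4 cycles respectively.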
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 illustrates header penalty versus wire count in links
with configurable serialization according to the invention.
[0013] FIG. 2 illustrates an error-prone link with bypassable
serialization adapters on the upstream and downstream ends of the
link according to the invention.
[0014] FIG. 3 illustrates muxing bypass wires into half of the
datapath wires to avoid an error-prone part of a link according to
the invention.
[0015] FIG. 4 illustrates muxing bypass wires into both halves of
the datapath wires according to the invention.
[0016] FIG. 5 illustrates bypassable narrowing and widening
serialization adapters according to the invention.
[0017] FIG. 6 illustrates an embodiment with Patch and Fix signals
on an error-prone link according to the invention.
[0018] FIG. 7 illustrates a transmitter for an error-prone link
with Patch and Fix signals according to the invention.
[0019] FIG. 8 illustrates a receiver for an error-prone link with
Patch and Fix signals according to the invention.
[0020] FIG. 9a and FIG. 9b illustrate timing diagrams for test
sequences on a link according to the invention.
DETAILED DESCRIPTION
[0021] Reference throughout this specification to "one embodiment,"
"an embodiment," or similar language means that a particular
feature, structure, or characteristic described in connection with
the various aspects and embodiments is included in at least one
embodiment of the invention. Thus, appearances of the phrases "in
one embodiment," "in an embodiment," "in certain embodiments," and
similar language throughout this specification refer to the various
aspects and embodiments of the invention. It is noted that, as used
in this description, the singular forms "a," "an" and "the" include
plural referents, unless the context clearly dictates
otherwise.
[0022] The described features, structures, or characteristics of
the invention may be combined in any suitable manner in accordance
with the aspects and one or more embodiments of the invention. In
the following description, numerous specific details are recited to
provide an understanding of various embodiments of the invention.
One skilled in the relevant art will recognize, however, that the
invention may be practiced without one or more of the specific
details, or with other methods, components, materials, and so
forth. In other instances, well-known structures, materials, or
operations are not shown or described in detail to avoid obscuring
the aspects of the invention.
[0023] In accordance with various aspects and embodiments of the
invention, a distributed system implementation for cache coherence
includes distinct agent interface units, coherency controllers, and
memory interface units. The agents send requests in the form of
read and write transactions. The system also includes a memory. The
memory includes coherent memory regions. The memory is in
communication with the agents. The system includes a coherent
interconnect in communication with the memory and the agents. The
system includes a second coherent interconnect in communication
with the memory and the agents. The system also includes a
comparator for comparing at least two inputs; the comparator is in
communication with the two coherent interconnects. The features of
the system are outlined and discussed below.
[0024] A cache coherence system performs at least three essential
functions:
[0025] 1. Interfacing to coherent agents--This function includes
accepting transaction requests on behalf of a coherent agent and
presenting zero, one, or more transaction responses to the coherent
agent, as required. In addition, this function presents snoop
requests, which operate on the coherent agent's caches to enforce
coherence, and accepts snoop responses, which signal the result of
the snoop requests.
[0026] 2. Enforcing coherence--This function includes serializing
transaction requests from coherent agents and sending snoop
requests to a set of agents to perform coherence operations on
copies of data in the agent caches. The set of agents may include
any or all coherent agents and may be determined by a directory or
snoop filter (or some other filtering function) to minimize the
system bandwidth required to perform the coherence operations. This
function also includes receiving snoop responses from coherent
agents and providing the individual snoop responses or a summary of
the snoop responses to a coherent agent as part of a transaction
response.
[0027] 3. Interfacing to the next level of the memory
hierarchy--This function includes issuing read and write requests
to a memory, such as a DRAM controller or a next-level cache, among
other activities.
[0028] According to some embodiments a bypassable narrowing
serialization adapter is coupled to an error-prone link on its
upstream end. A bypassable widening serialization adapter is
coupled to the error-prone link on its downstream end. The
serialization adapters are normally bypassed. When it is known that a
bit is bad, the serialization adapters are enabled (not bypassed).
This allows the chip to function, though with lower throughput than
when the serialization adapters are bypassed.
[0029] In one such embodiment only a data portion of the
error-prone link carries serialized information, and a parallel
header portion of the link does not. This avoids additional logic
delay in the header path, where timing paths are more critical, and
uses the data portion of the link to protect headers. Headers need
more protection because they do not use ECC as the data portion
does.
[0030] FIG. 2 shows such an embodiment in which data is
sufficiently protected by ECC, and so needs no protection from hard
errors. However, the header is not protected by ECC, and must be
correct in order to properly transport packets. A link between
serialization adapters is serialized with header penalty None to
provide maximum data throughput. If a bad header bit is detected,
according to this embodiment, a None-to-one serialization adapter
is enabled upstream of the link and a One-to-none serialization
adapter is enabled downstream of the link. As a result, a portion
of the header uses the data wires. This reduces bandwidth, but
allows proper transmission of the header across one or more damaged
header link bits.
[0031] In another embodiment with header penalty One, two
bypassable narrowing serialization adapters, in series, but using
opposite halves of the datapath wires, are coupled upstream of the
error-prone link. Two corresponding widening serialization adapters
are coupled downstream.
[0032] Each serialization adapter adds a mux in the datapath. In
some embodiments, such as the ones above, they are 2:1 muxes. In
other embodiments the number of mux inputs can be as many as the
number of wires on the wide link. In some embodiments, muxes might
have additional inputs for sideband bits to replace any known bad
bit(s).
[0033] FIG. 3 shows the datapath of an embodiment with
serialization of just 1/2 of a link. The right side of the link is
error-prone and carries header information that cannot be corrected
with ECC. The left side of the link serves double duty when
serialization adapters are enabled. The result is a single 2:1 mux
on each link datapath wire.
[0034] FIG. 4 shows the datapath of an embodiment with
serialization of both halves of a link. Both sides of the link are
error-prone and need redundancy. Either side can be bypassed onto
the other side. The cost is two 2:1 muxes on each link datapath
wire.
[0035] In one embodiment, when a NoC link is known to be bad, all
NIUs with a route that transits the link enable such serialization
adapters. This keeps the serialization logic near the edges of the
NoC to avoid cluttering a simple transport network. In a simpler
embodiment, all NIUs in the chip enable bypass serialization
adapters when any link is known bad.
[0036] Some embodiments have NIUs that perform a hardware built-in
self-test (BIST) procedure when the chip comes out of its reset
state. Various BIST methods are known in the art. Some embodiments
allow for software to request and control BIST through certain NIU
transaction requests. In some such embodiments a simple dependable
subsystem, such as one with a dedicated microcontroller, checks the
BIST results and controls the serialization adapters.
[0037] Some embodiments provide hardware for automatic checking
during operation. In some embodiments the automatic checking is
performed by NIUs sending special packets that carry no useful
transaction data, but use various bits of links. In some
embodiments one or more links, themselves, send test signals on
various wires at various times to confirm expected operation.
[0038] FIG. 5 shows another kind of embodiment. On the upstream end
ECC is encoded on the packet header and data. In some embodiments a
simple parity is used on the physical data and link control
signals. Under normal operating conditions information progresses
through a double bypassable narrowing serialization adapter in
bypass mode. Information traverses the error-prone link, which
operates as two parallel bypassable links. Information then progresses
through the double bypassable widening serialization adapter in
bypass mode. Next, information flows into a checking unit. In some
embodiments it is a simple parity checking unit. In other
embodiments it does an ECC correction on downstream-going data. In
some embodiments the checking unit also performs protocol checks,
such as detecting that a packet is on a valid route based on the ID
of its source and destination. If an error is detected on a wire in
the upper half of the long link, then the serialization adapters
are enabled (not bypassed) to use only the wires of the lower half
of the long link. If an error is detected on a bit in the lower
half of the long link, then the serialization adapters are enabled
(not bypassed) to use only the wires of the upper half of the long
link.
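The half-selection rule described for FIG. 5 can be sketched in software. This is a minimal illustration, not the circuit itself: the function names are hypothetical, and both the low-beat-first order and the convention that wire indices at or above half the link width form the upper half are assumptions.

```python
def healthy_half(link_bits: int, bad_wire: int) -> str:
    """Pick the half of the link that does not contain the bad wire."""
    return "lower" if bad_wire >= link_bits // 2 else "upper"


def send_over_half(word: int, link_bits: int, bad_wire: int) -> list[int]:
    """Serialize one full-width word into two beats on the healthy half.

    Each returned integer is the value driven on the whole link in one
    cycle; the wires of the unused half are driven to 0.
    """
    half = link_bits // 2
    mask = (1 << half) - 1
    shift = 0 if healthy_half(link_bits, bad_wire) == "lower" else half
    beats = [word & mask, (word >> half) & mask]  # low beat first (assumed)
    return [beat << shift for beat in beats]


def receive_from_half(cycles: list[int], link_bits: int, bad_wire: int) -> int:
    """Widen two beats received on the healthy half back into one word."""
    half = link_bits // 2
    mask = (1 << half) - 1
    shift = 0 if healthy_half(link_bits, bad_wire) == "lower" else half
    low, high = [(value >> shift) & mask for value in cycles]
    return low | (high << half)
```

Because the receiver only samples the healthy half, a wire that is stuck in the unused half has no effect on the recovered word. The cost is two cycles per word instead of one.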
[0039] The check uses heuristics to ensure that only permanent
errors, not transient errors, trigger an error. In some embodiments
that is a simple counter, and the bypass is enabled if the count
reaches a threshold. In other embodiments it is an error counter
with a clock-based decrementer so that the bypasses are only
enabled if a threshold of errors is reached within a certain time
window.
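The error counter with a clock-based decrementer behaves like a leaky counter. The following sketch shows one way such a filter could work; the class name, the latching behavior, and the parameter values in the usage note are assumptions made for illustration.

```python
class LeakyErrorCounter:
    """Counts errors and leaks one count every decay_period cycles."""

    def __init__(self, threshold: int, decay_period: int):
        self.threshold = threshold
        self.decay_period = decay_period
        self.count = 0
        self.cycle = 0
        self.tripped = False  # latched once the threshold is reached (assumed)

    def tick(self, error: bool) -> bool:
        """Advance one clock cycle; return True once a hard error is declared."""
        self.cycle += 1
        if error:
            self.count += 1
        if self.count >= self.threshold:
            self.tripped = True
        # Clock-based decrementer: forget one error per decay period.
        if self.cycle % self.decay_period == 0 and self.count > 0:
            self.count -= 1
        return self.tripped
```

With a threshold of 3 and a decay period of 100 cycles, three isolated errors spread over 300 cycles never trip the counter, while three errors in consecutive cycles do.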
Self Healing
[0040] Other embodiments of the invention are able to heal a link,
without losing throughput, by using one or a small number of
sideband signals to bypass individual known-bad wires. Since wires
of bits within the same link are at high probability of short
circuits, having two bypass wires provides more than double the
safety benefit.
[0041] Some embodiments detect errors and report them for software
to enable a bypass. In other embodiments, hardware identifies a
failure point and stops using it. At any or every step of the
process, an interrupt is asserted to inform software of the
problem.
[0042] In some embodiments diagnosis is done with a tester shortly
after fabrication. In some embodiments diagnosis is part of a
power-on self-test. In some embodiments diagnosis occurs during
normal operation.
Diagnosis for Self Healing
[0043] According to some embodiments each initiator NIU has a
table of links through which it has routes. It also knows the
serialization of each link. From time to time the initiator NIU
picks the next link from its table and sends a special kind of test
packet on a route that traverses the link. In some embodiments the
packet header format includes one or more special error codes for
physical errors.
[0044] Test packets should be sent rarely enough to have negligible
effect on link throughput, but often enough to detect serious
problems within a time frame in which corrective action can be
safely taken. That might be every 10,000 cycles for some systems.
[0045] In some embodiments, the link table does not have an entry
for every link through which an NIU has routes. Instead, only links
that can be healed are included in the link table.
[0046] If a test packet indicates an error, the initiator NIU marks
the link in the table as suspicious and records which link bit (or
byte) has the error. If the link is narrower than the NIU packet
interface, then in some embodiments the initiator checks all copies
of the narrow link word within the test packet error to confirm the
consistency of the failure. In other embodiments, confirmation may
be achieved by one or more following errors from the same bit (or
byte) of the same link.
[0047] If a link marked suspicious encounters enough errors, or
exhibits a sufficiently high error rate to indicate a hard error,
then the NIU marks it as confirmed bad.
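The link table and its suspicious and confirmed-bad states can be modeled as a small data structure. The sketch below is a hypothetical illustration: the class names, the round-robin selection, and the confirmation threshold of three errors per bit are assumptions, since the text leaves the confirmation criterion open.

```python
from dataclasses import dataclass, field
from enum import Enum


class LinkStatus(Enum):
    GOOD = "good"
    SUSPICIOUS = "suspicious"
    CONFIRMED_BAD = "confirmed bad"


@dataclass
class LinkEntry:
    name: str
    status: LinkStatus = LinkStatus.GOOD
    errors_by_bit: dict = field(default_factory=dict)  # bit index -> error count


class LinkTable:
    """Per-initiator table of links, tested one at a time in rotation."""

    def __init__(self, link_names, confirm_threshold=3):
        self.entries = [LinkEntry(name) for name in link_names]
        self.confirm_threshold = confirm_threshold  # assumed value
        self._next = 0

    def next_link(self) -> LinkEntry:
        """Pick the next link to receive a test packet."""
        entry = self.entries[self._next]
        self._next = (self._next + 1) % len(self.entries)
        return entry

    def record_result(self, entry: LinkEntry, bad_bit=None) -> LinkStatus:
        """Record a test result; bad_bit is None when the packet was clean."""
        if bad_bit is None:
            return entry.status
        count = entry.errors_by_bit.get(bad_bit, 0) + 1
        entry.errors_by_bit[bad_bit] = count
        if count >= self.confirm_threshold:
            entry.status = LinkStatus.CONFIRMED_BAD
        elif entry.status is LinkStatus.GOOD:
            entry.status = LinkStatus.SUSPICIOUS
        return entry.status
```

A first error on a bit marks the link suspicious and records the bit; repeated errors on the same bit of the same link promote it to confirmed bad.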
[0048] In some embodiments, when a packet reaches a wrong
destination NIU it uses logic to compare the incorrect route ID bit
to data bits to confirm the error.
[0049] In some embodiments target NIUs simply forward test packets
back to the originating initiator NIU. In other embodiments target
NIUs forward test packets back to initiators, and if an error is
detected on a word, the target NIU marks the word as having an
error. That way the initiator NIU can confirm if it is a request or
response link error.
[0050] Different embodiments use different test packet formats, but
one kind of test packet includes a data sequence of walking 1s,
walking 0s, and a 5-A-5-A-3-C-3-C type of sequence. Test sequences
are known in the art. Since the wires of a link are likely to be
near each other within the chip floorplan, any short is likely to
be with another bit of the same link. Therefore, test sequences
should be designed to look for double-bad bits.
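A short between two wires can only be observed when the test drives the two wires to different values, which is why walking patterns are useful for finding double-bad bits. The sketch below generates walking patterns and checks whether a pattern set can expose a short between a given pair of bits; the function names are hypothetical.

```python
def walking_ones(width: int) -> list[int]:
    """One word per bit position, with only that bit set."""
    return [1 << i for i in range(width)]


def walking_zeros(width: int) -> list[int]:
    """One word per bit position, with only that bit cleared."""
    mask = (1 << width) - 1
    return [mask ^ (1 << i) for i in range(width)]


def can_expose_short(pattern_words: list[int], bit_a: int, bit_b: int) -> bool:
    """True if some word drives bit_a and bit_b to different values."""
    return any(((word >> bit_a) ^ (word >> bit_b)) & 1 for word in pattern_words)
```

A walking-1s sequence on a 16-bit link distinguishes every pair of bits, whereas the single constant word `0x5A5A` drives bits 1 and 3 to the same value and so cannot expose a short between them.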
[0051] In some embodiments, initiator NIUs diagnose their
accessible links independently and act accordingly. In other
embodiments a centralized controller aggregates reports of
suspicious or confirmed-bad links. A centralized approach is more
helpful for pinpointing exactly which link is bad because most
individual initiators will have all routes through certain links so
that it is impossible to distinguish one from another.
[0052] In some embodiments, rather than detecting precisely which
bit or byte of the link is bad, all that is noted is which half, or
which quarter, or which of some portion of the link has the
problem.
[0053] These are just some novel methods of identifying problems.
Other methods of identifying problems in systems are known in the
art.
Methods of Healing
[0054] Links with multiple bytes can be used for healing. The
invention is applicable at bit granularity or at granularity of
data words larger than one byte, but the applicability will be
obvious in light of the following discussion referring to bytes.
The following discussion is of links with power of two numbers of
bytes, but application to links of other numbers of bytes will be
obvious.
[0055] According to some embodiments, links have serialization
adapters at their ends. Serialization adapters do not modify packet
headers. Serialization affects throughput, but the serialization at
different links within the network is invisible to endpoint network
interface units. Serialization adapters go hand in hand with
buffers (FIFOs for narrowing and rate adapters for widening). The
relationship is known in the art.
[0056] According to some embodiments, healing is applied to 1/2 of
the link, 1/4 of the link, just one byte from the link, or any
portion of the link. In some embodiments headers and data are
treated separately. The trade-off is mux logic depth (logic delay)
versus the granularity of the throughput-versus-wires trade-off for
redundancy.
[0057] Some embodiments perform healing at the transport layer.
Some initiator-target pairs have multiple routes. When an initiator
NIU detects that a route is bad, it switches route ID to use the
other route. To avoid ordering ID violations, the initiator NIU
backpressures any requests matching a pending transaction ID until
all requests of that ID have provided their response before
switching.
[0058] In some embodiments, the sending end of an error-prone link
duplicates one or more selected bits of the bus. FIG. 6 shows one
such embodiment. It comprises a transmitter and a receiver coupled
to the upstream and downstream ends of a link, respectively. The
link comprises an error-prone Data bus, driven by the transmitter;
two single patch signals, Patch 0 and Patch 1, also driven by the
transmitter; and two Fix signals, Fix 0 and Fix 1, driven by the
receiver. All signals pass through a pipeline stage register.
Various embodiments have any number, including zero, of pipeline
stages on links, generally as needed to meet clock speed
requirements given long distances for link signal propagation.
[0059] FIG. 7 shows details of the transmitter. During normal
operation, from time to time a Trigger signal is asserted. This
causes a Sequencer to send a one-cycle pulse on the Patch 0 signal.
The Sequencer proceeds to count cycles through a test pattern
during which it drives certain data bus signals with copies of
different bits on the Patch 0 and Patch 1 signals. The Sequencer is
programmable for different patterns. Walking 1s, walking 0s,
0x5A5A, and 0xC3C3 are common patterns.
[0060] The Sequencer is configured, at design time, to know the
number of cycles of delay on the link due to pipelining. It drives
a delay Position signal indicating the bits being tested. If the
Fix 0 or Fix 1 signal is asserted, there is a known bad bit, and
the transmitter locks one of the Patch 0 or Patch 1 signals on to
copy the bad signal from the Data bus.
[0061] FIG. 8 shows details of the receiver. When a pulse is
received on the Patch 0 signal, a Detect module begins an expected
test sequence. Patch 0 and Patch 1 signals are compared to data
bits selected according to the sequence. If a comparison finds a
mismatch then the detector increments a counter associated with the
data bit. Once every million cycles all counters decrement,
saturating at zero. If the count reaches three for a bit on a Patch
0 or Patch 1 mismatch, the Detect module asserts Fix 0 or Fix 1
respectively. It also locks a mux on the Data signal to use the
Patch signal rather than the Data bit signal.
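The receiver behavior described above can be modeled as follows. This is a behavioral sketch, not the hardware of FIG. 8: the class and method names and the tuple format of the comparisons are hypothetical, while the threshold of three and the decrement once every million cycles come from the text.

```python
class DetectModule:
    """Behavioral model of per-bit mismatch counters in the receiver."""

    THRESHOLD = 3
    DECAY_PERIOD = 1_000_000

    def __init__(self, data_bits: int = 16, decay_period: int = DECAY_PERIOD):
        self.counters = [0] * data_bits
        self.decay_period = decay_period
        self.cycle = 0
        # fix[n] holds the data bit locked to Patch n, or None if Fix n is low.
        self.fix = [None, None]

    def clock(self, comparisons=()):
        """Advance one cycle.

        comparisons holds (patch_index, data_bit, data_value, patch_value)
        tuples for the bits under test in this cycle.
        """
        self.cycle += 1
        for patch, bit, data_value, patch_value in comparisons:
            if data_value != patch_value:
                self.counters[bit] += 1
                if self.counters[bit] >= self.THRESHOLD and self.fix[patch] is None:
                    self.fix[patch] = bit  # assert Fix n and lock the mux
        if self.cycle % self.decay_period == 0:
            # All counters decrement, saturating at zero.
            self.counters = [max(0, count - 1) for count in self.counters]

    def repair(self, data_word: int, patch_values) -> int:
        """Apply the locked muxes: replace each bad bit with its Patch value."""
        for patch, bit in enumerate(self.fix):
            if bit is not None:
                data_word = (data_word & ~(1 << bit)) | (patch_values[patch] << bit)
        return data_word
```

The decay period is a constructor parameter only so that the model can be exercised quickly; a transient mismatch decays away, while three mismatches on the same bit within the window lock the corresponding patch.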
[0062] FIG. 9 (a) shows a timing diagram for signals at a
transmitter link interface for a correctly functioning link. At
clock cycle 0, a pulse on Patch 0 signals the beginning of the test
sequence. The transmitter drives hex value 0xC3A5 on the 16-bit
data bus, which in binary, starting from the least significant bit,
is 10_10_01_01_11_00_00_11. The transmitter drives the even
numbered bits on the Patch 0 signal and the odd numbered bits on
the Patch 1 signal. Correct operation is confirmed by the fact that
the Fix 0 and Fix 1 signals are low starting at cycle 3, two cycles
after the transmitter begins sending. The two cycles are one each
for the downstream Patch pipeline registers and the upstream Fix
pipeline registers.
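The bit ordering in this example can be checked mechanically. In the sketch below the helper names are hypothetical, and the assumption that Patch 0 and Patch 1 step through the even and odd bits in ascending order, one pair per cycle, is an inference from the timing described for FIG. 9 (b) rather than something the text states.

```python
def lsb_first(word: int, width: int = 16) -> list[int]:
    """List the bits of word starting from the least significant bit."""
    return [(word >> i) & 1 for i in range(width)]


def grouped(bits: list[int]) -> str:
    """Format bits in pairs separated by underscores, as in the text."""
    text = "".join(str(bit) for bit in bits)
    return "_".join(text[i:i + 2] for i in range(0, len(text), 2))


def patch_streams(word: int, width: int = 16):
    """Return (Patch 0 stream, Patch 1 stream): even bits and odd bits."""
    bits = lsb_first(word, width)
    return bits[0::2], bits[1::2]
```

Calling `grouped(lsb_first(0xC3A5))` reproduces the string `10_10_01_01_11_00_00_11` given above, and each underscore-separated pair is the Patch 0 and Patch 1 value for one cycle under the stated assumption.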
[0063] FIG. 9 (b) shows a timing diagram for signals at a
transmitter link interface for one manifestation of a link that has
a short between bit 4 and bit 11. The receiver identifies a
mismatch with the Patch 0 signal when compared to data bus bit 4.
That is signaled, and received by the transmitter in cycle 5. The
receiver identifies a mismatch with the Patch 1 signal when
compared to data bus bit 11. That is signaled, and received by the
transmitter in cycle 8. Upon receiving those signals, at the end of
the test procedure (after cycle 10) the transmitter permanently
ties the Patch 0 signal to bit 4 and the Patch 1 signal to bit 11.
The receiver will mux those patch signals into the data bus.
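The cycle numbers in the two timing diagrams are consistent with a simple model, sketched below. The model is an assumption, not a statement from the specification: it supposes that testing starts in cycle 1, that each cycle tests one even bit on Patch 0 and one odd bit on Patch 1 in ascending order, and that a Fix signal reaches the transmitter two cycles after the tested bit is sent.

```python
PIPELINE_DELAY = 2  # one cycle for the Patch register, one for the Fix register


def fix_arrival_cycle(bit: int, first_test_cycle: int = 1,
                      delay: int = PIPELINE_DELAY) -> int:
    """Cycle in which the transmitter sees the Fix result for a data bit."""
    send_cycle = first_test_cycle + bit // 2  # bits 2k and 2k+1 share a cycle
    return send_cycle + delay
```

Under these assumptions the result for bit 4 arrives in cycle 5, the result for bit 11 arrives in cycle 8, and the last results of a 16-bit test arrive in cycle 10, matching the cycles cited for FIG. 9 (b).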
Dynamic Serialization for Power Saving
[0064] Some datapaths must be wide to handle maximum-case bandwidth
requirements, and must remain powered on to provide access during
low-bandwidth use cases. Keeping the full datapath logic powered up
wastes clock and leakage power. The waste is significant if
low-bandwidth use cases are much more common than
maximum-bandwidth use cases.
[0065] Some embodiments of the invention dynamically apply
narrowing and widening serialization adapter pairs to avoid using
parts of the datapath, and then power off the unused datapath
portions when not needed. This is done with a separate clock tree
for eliminating clock toggling power and a separate power net for
supply power-off.
[0066] One embodiment of a chip according to the invention has a
high-throughput 64-bit wide video display interface near a
low-throughput 16-bit wide microcontroller, both on the opposite
side of the chip from a DRAM memory interface. The video display
interface and the microcontroller share a 64-bit link to the DRAM
interface. When the video display is turned off, only the
microcontroller uses the link, so the chip powers off 48 bits of
the datapath, and leaves just 16 bits enabled to provide the
throughput needed for the microcontroller.
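The arithmetic of this example is simple enough to state as code. The function name and the returned field names are hypothetical; the sketch also reports how many beats a full-width word would need if it had to cross the narrowed link.

```python
def power_plan(link_bits: int, needed_bits: int) -> dict:
    """Compute how much of a link can be powered off for a narrower user."""
    if needed_bits <= 0 or needed_bits > link_bits:
        raise ValueError("needed width must be between 1 and the link width")
    if link_bits % needed_bits != 0:
        raise ValueError("link width must be a multiple of the needed width")
    return {
        "powered_off_bits": link_bits - needed_bits,
        "beats_per_wide_word": link_bits // needed_bits,
    }
```

For the 64-bit link shared with a 16-bit microcontroller, `power_plan(64, 16)` reports 48 bits powered off and 4 beats per 64-bit word.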
[0067] Embodiments should provide for safely powering off parts of
the datapath to avoid the loss of data transiting the link. Some
embodiments indicate the datapath width, on a sideband signal, with
each data beat. Some embodiments include a width signal in
conjunction with the clock and reset system-control signals. The
width signal acts as a clock gate on the logic that is turned
off.
[0068] Embodiments should also ensure that the width does not
change when a partial word is stored in a serialization adapter.
Some embodiments do so by indicating a datapath width in packet
headers. In some embodiments a single bit is sufficient to indicate
whether links are to operate as wide or as narrow.
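The constraint that the width must not change while a partial word
is held can be illustrated with the sketch below. It defers a
requested width change until the adapter is empty, which is an
assumed mechanism chosen for illustration and differs from the
header-based indication described above. The class and method names
are hypothetical.

```python
class WideningAdapter:
    """Behavioral model of a widening adapter that defers width changes."""

    def __init__(self, full_width=64, active_width=16):
        self.full_width = full_width
        self.active_width = active_width
        self.pending_width = None   # width requested but not yet applied
        self.beats = []             # beats of the partially assembled word

    def request_width(self, width):
        """Record a new width; apply it only if no partial word is stored."""
        self.pending_width = width
        self._apply_if_idle()

    def _apply_if_idle(self):
        if self.pending_width is not None and not self.beats:
            self.active_width = self.pending_width
            self.pending_width = None

    def push(self, beat):
        """Accept one beat; return the full word when complete, else None."""
        self.beats.append(beat)
        if len(self.beats) * self.active_width < self.full_width:
            return None
        word = 0
        for i, b in enumerate(self.beats):
            word |= b << (i * self.active_width)
        self.beats = []
        self._apply_if_idle()       # safe point: the adapter is empty
        return word
```

In this model a width request that arrives mid-word takes effect
only after the last beat of that word has been received.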
[0069] In some embodiments NIUs have a sideband output indicating
the width of the widest pending packet. A sideband manager unit is
used for software to monitor that state in order to decide when it
is safe to power off part of the datapath. As with DVFS, software
is responsible for monitoring operating conditions.
[0070] Starting in high bandwidth mode, software first changes the
mode to a narrower one. After the widest pending packet signal
becomes narrower, software powers off the upper parts of the
datapath. Starting in low bandwidth mode, software first powers on
the upper parts of the datapath. Once power is applied, software
changes the mode to a wider one.
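The ordering of these software steps can be sketched as a small
Python model. The class, its attributes, and the two-valued
"wide"/"narrow" encoding are assumptions for illustration; actual
embodiments may expose this state differently through the sideband
manager unit.

```python
class LinkPowerController:
    """Hypothetical software-side view of one link's width and power state."""

    def __init__(self):
        self.mode = "wide"            # width mode requested by software
        self.widest_pending = "wide"  # sideband state reported by the NIUs
        self.upper_powered = True     # power state of the upper datapath

    def request_narrow(self):
        """Step 1 of power-down: change the mode to a narrower one."""
        self.mode = "narrow"

    def try_power_off(self):
        """Step 2 of power-down: power off only once no wide packet is pending."""
        if self.mode == "narrow" and self.widest_pending == "narrow":
            self.upper_powered = False
        return not self.upper_powered

    def power_on(self):
        """Step 1 of power-up: restore power to the upper datapath."""
        self.upper_powered = True

    def widen(self):
        """Step 2 of power-up: widen the mode only after power is applied."""
        if not self.upper_powered:
            raise RuntimeError("upper datapath must be powered before widening")
        self.mode = "wide"
```

The two sequences are mirror images: the mode narrows before power
is removed, and power is restored before the mode widens, so no
packet is ever sent on unpowered wires.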
[0071] In some embodiments, all datapath bits are used to transfer
header information, and therefore must be powered on for any packet
to transit the link. In some embodiments, dynamic serialization for
power savings is only used on links with no header penalty,
whereby header bits are in parallel with data. In other
embodiments, header and data are serialized, but only certain
datapath wires are used for header information. The other datapath
wires are candidates for power-off. In some embodiments, links use
multiple cycles on fewer than 1/2 of the datapath wires to send
headers, at the cost of extra latency, but with the benefit of
being able to power off a larger portion of the datapath. In some
embodiments, header penalty is adapted as part of the power-off and
power-on procedure.
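The trade-off between header latency and the portion of the
datapath that can be powered off can be made concrete with a short
calculation. The function name and the 64-bit header size used in
the example are assumptions for illustration only.

```python
def header_tradeoff(link_width, header_bits, header_wires):
    """Return (cycles to send one header, wires that may be powered off).

    Assumes headers are confined to `header_wires` of the datapath and
    that every other wire is a candidate for power-off.
    """
    if not 0 < header_wires <= link_width:
        raise ValueError("header_wires must be between 1 and link_width")
    cycles = (header_bits + header_wires - 1) // header_wires  # ceiling division
    return cycles, link_width - header_wires


full = header_tradeoff(64, 64, 64)     # header uses every wire
quarter = header_tradeoff(64, 64, 16)  # header confined to 16 wires
```

Under these assumptions, confining a 64-bit header to 16 wires of a
64-bit link costs three extra header cycles but makes 48 wires
candidates for power-off.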
[0072] In some embodiments width adaptation for power savings is
used only within the NoC transport. In other embodiments, width
adaptation is used at transaction interfaces between the NoC and
other units in the system. In some embodiments, width adaptation
within a NoC is used to match the width of dynamic inter-chip
links.
[0073] To the extent that the terms "including", "includes",
"having", "has", "with", or variants thereof are used in either the
detailed description or the claims, such terms are intended to be
inclusive in a similar manner to the term "comprising". The
invention is described in accordance with the aspects and
embodiments in the following description with reference to the
figures, in which like numbers represent the same or similar
elements.
[0074] Unless defined otherwise, all technical and scientific terms
used herein have the same meaning as commonly understood by one of
ordinary skill in the art to which this invention belongs. The verb
couple, its gerundial forms, and other variants, should be
understood to refer to either direct connections or operative
manners of interaction between elements of the invention through
one or more intermediating elements, whether or not any such
intermediating element is recited. Any methods and materials
similar or equivalent to those described herein can also be used in
the practice of the invention. Representative illustrative methods
and materials are also described.
[0075] All publications and patents cited in this specification are
herein incorporated by reference as if each individual publication
or patent were specifically and individually indicated to be
incorporated by reference and are incorporated herein by reference
to disclose and describe the methods and/or system in connection
with which the publications are cited. The citation of any
publication is for its disclosure prior to the filing date and
should not be construed as an admission that the invention is not
entitled to antedate such publication by virtue of prior invention.
Further, the dates of publication provided may be different from
the actual publication dates which may need to be independently
confirmed.
[0076] Additionally, it is intended that such equivalents include
both currently known equivalents and equivalents developed in the
future, i.e., any elements developed that perform the same
function, regardless of structure. The scope of the invention,
therefore, is not intended to be limited to the exemplary
embodiments shown and described herein.
[0077] In accordance with the teaching of the invention a computer
and a computing device are articles of manufacture. Other examples
of an article of manufacture include: an electronic component
residing on a motherboard, a server, a mainframe computer, or
other special purpose computer each having one or more processors
(e.g., a Central Processing Unit, a Graphics Processing Unit, or a
microprocessor) that is configured to execute a computer readable
program code (e.g., an algorithm, hardware, firmware, and/or
software) to receive data, transmit data, store data, or perform
methods.
[0078] The article of manufacture (e.g., computer or computing
device) includes a non-transitory computer readable medium or
storage that may include a series of instructions, such as computer
readable program steps or code encoded therein. In certain aspects
of the invention, the non-transitory computer readable medium
includes one or more data repositories. Thus, in certain
embodiments that are in accordance with any aspect of the
invention, computer readable program code (or code) is encoded in a
non-transitory computer readable medium of the computing device.
The processor or a module, in turn, executes the computer readable
program code to create or amend an existing computer-aided design
using a tool. The term "module" as used herein may refer to one or
more circuits, components, registers, processors, software
subroutines, or any combination thereof. In other aspects of the
embodiments, the creation or amendment of the computer-aided design
is implemented as a web-based software application in which
portions of the data related to the computer-aided design or the
tool or the computer readable program code are received or
transmitted to a computing device of a host.
[0079] An article of manufacture or system, in accordance with
various aspects of the invention, is implemented in a variety of
ways: with one or more distinct processors or microprocessors,
volatile and/or non-volatile memory and peripherals or peripheral
controllers; with an integrated microcontroller, which has a
processor, local volatile and non-volatile memory, peripherals and
input/output pins; discrete logic which implements a fixed version
of the article of manufacture or system; and programmable logic
which implements a version of the article of manufacture or system
which can be reprogrammed either through a local or remote
interface. Such logic could implement a control system either in
logic or via a set of commands executed by a processor.
[0080] Accordingly, the preceding merely illustrates the various
aspects and principles as incorporated in various embodiments of
the invention. It will be appreciated that those of ordinary skill
in the art will be able to devise various arrangements which,
although not explicitly described or shown herein, embody the
principles of the invention and are included within its spirit and
scope. Furthermore, all examples and conditional language recited
herein are principally intended to aid the reader in understanding
the principles of the invention and the concepts contributed by the
inventors to furthering the art, and are to be construed as being
without limitation to such specifically recited examples and
conditions. Moreover, all statements herein reciting principles,
aspects, and embodiments of the invention, as well as specific
examples thereof, are intended to encompass both structural and
functional equivalents thereof. Additionally, it is intended that
such equivalents include both currently known equivalents and
equivalents developed in the future, i.e., any elements developed
that perform the same function, regardless of structure.
[0081] Therefore, the scope of the invention is not intended to be
limited to the various aspects and embodiments discussed and
described herein. Rather, the scope and spirit of the invention are
embodied by the appended claims.
* * * * *