U.S. patent application number 15/583316 was filed with the patent office on 2017-05-01 and published on 2017-08-17 for methods and systems for functional analysis of an integrated circuit.
The applicant listed for this patent is MICROSEMI STORAGE SOLUTIONS (U.S.), INC. Invention is credited to Theodore WILSON.
Application Number | 20170235864 15/583316 |
Document ID | / |
Family ID | 58615692 |
Filed Date | 2017-05-01 |
United States Patent
Application |
20170235864 |
Kind Code |
A1 |
WILSON; Theodore |
August 17, 2017 |
METHODS AND SYSTEMS FOR FUNCTIONAL ANALYSIS OF AN INTEGRATED
CIRCUIT
Abstract
An apparatus for monitoring operation of a design under test
(DUT) comprises a plurality of inputs comprising: an incoming clock
edge input connected to detect active clock edges provided to a
monitored clock gate; an outgoing clock edge input connected to
detect active clock edges sent from the monitored clock gate; an
enable input connected to detect enable signals provided to the
monitored clock gate and any leaf clock gates connected to receive
clock edges through the monitored clock gate; and a protocol input
connected to receive protocol signals specifying when the monitored
clock gate is required to output active clock edges. The apparatus
also comprises a memory in communication with the inputs for
storing values from the inputs, and a processor in communication
with the memory and the inputs, the processor programmed to
determine protocol compliance and to calculate energy consequences
of dropping of active clock edges.
Inventors: |
WILSON; Theodore;
(Vancouver, CA) |
|
Applicant: |
Name | City | State | Country | Type |
MICROSEMI STORAGE SOLUTIONS (U.S.), INC. | Aliso Viejo | CA | US | |
Family ID: |
58615692 |
Appl. No.: |
15/583316 |
Filed: |
May 1, 2017 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14831505 | Aug 20, 2015 | 9639641 |
15583316 | | |
Current U.S.
Class: |
716/108 |
Current CPC
Class: |
G06F 2119/06 20200101;
G06F 2119/12 20200101; G06F 30/396 20200101; G06F 30/327 20200101;
G06F 30/3312 20200101 |
International
Class: |
G06F 17/50 20060101
G06F017/50 |
Claims
1. An apparatus for monitoring operation of a design under test
(DUT), the DUT comprising a plurality of combinational logic
elements, a plurality of clocked sequential logic elements, and a
plurality of clock gate elements connected to selectively provide
clock edges to the clocked sequential logic elements, the apparatus
comprising: a plurality of inputs comprising: an incoming clock
edge input connected to detect active clock edges provided to a
monitored clock gate of the plurality of clock gate elements of the
DUT; an outgoing clock edge input connected to detect active clock
edges sent from the monitored clock gate; an enable input connected
to detect enable signals provided to the monitored clock gate and
any leaf clock gates of the plurality of clock gate elements of the
DUT connected to receive clock edges through the monitored clock
gate; and a protocol input connected to receive protocol signals
specifying when the monitored clock gate is required to output
active clock edges; a memory in communication with the plurality of
inputs for storing values from the plurality of inputs; and a
processor in communication with the memory and the plurality of
inputs, the processor programmed to determine protocol compliance
and to calculate energy consequences of dropping of active clock
edges at the monitored clock gate.
2. The apparatus of claim 1 wherein the plurality of inputs
comprises: a data-in input connected to detect signals on D-pins of
sequential logic elements within a fanout of the monitored clock
gate, the fanout of the monitored clock gate comprising all of the
clocked sequential logic elements connected to receive clock
signals through the monitored clock gate, and the combinational
logic elements that receive data from the clocked sequential logic
elements connected to receive clock signals through the monitored
clock gate; and a data-out input connected to detect signals on
Q-pins of sequential logic elements within the fanout of the
monitored clock gate, and wherein the processor is programmed to
calculate energy consumed in the fanout of the monitored clock
gate.
3. The apparatus of claim 2 wherein the plurality of inputs
comprises: an upstream clocking input connected to detect active
clock edges output from clock gates controlling sequential logic
elements upstream from the sequential logic elements controlled by
the monitored clock gate; and, a downstream clocking input
connected to detect active clock edges output to clock gates
controlling sequential logic elements downstream from the
sequential logic elements controlled by the monitored clock gate,
wherein the processor is programmed to determine unnecessary active
clock edges sent from the monitored clock gate and calculate
potential energy savings realizable through elimination of the
unnecessary active clock edges.
4. The apparatus of claim 1 wherein the plurality of inputs
comprises a timing input connected to receive a time window, and
wherein the processor is programmed to determine power savings due
to dropping of active clock edges at the monitored clock gate for
the time window.
5. The apparatus of claim 2 wherein the plurality of inputs
comprises a timing input connected to receive a time window, and
wherein the processor is programmed to determine power savings due
to dropping of active clock edges at the monitored clock gate, and
power consumed in the fanout of the monitored clock gate, for the
time window.
6. The apparatus of claim 3 wherein the plurality of inputs
comprises a timing input connected to receive a time window, and
wherein the processor is programmed to determine power savings due
to dropping of active clock edges at the monitored clock gate,
power consumed in the fanout of the monitored clock gate, and
potential power savings realizable through elimination of the
unnecessary active clock edges for the time window.
7. The apparatus of claim 1 wherein the processor is programmed to
attribute energy which would have been consumed by any leaf clock
gates connected to receive clock edges through the monitored clock
gate as energy saved due to dropping of an active clock edge at the
monitored clock gate when the enable input indicates that the leaf
clock gates were enabled when the active edge was dropped.
8. The apparatus of claim 2 wherein the processor is programmed to
calculate energy consumed in the fanout of the monitored clock gate
based on a number of sequential logic elements that change signal
levels on their Q-pins.
9. The apparatus of claim 3 wherein the processor is programmed to
determine that an unnecessary active clock edge is sent from the
monitored clock gate when the upstream clocking input indicates
that no active clock edge is sent to sequential logic elements
upstream from the sequential logic elements controlled by the
monitored clock gate.
10. The apparatus of claim 3 wherein the processor is programmed to
determine that an unnecessary active clock edge is sent from the
monitored clock gate when the downstream clocking input indicates
that no active clock edge is sent to sequential logic elements
downstream from the sequential logic elements controlled by the
monitored clock gate.
11. A method for monitoring operation of a design under test (DUT),
the DUT comprising a plurality of combinational logic elements, a
plurality of clocked sequential logic elements, and a plurality of
clock gate elements connected to selectively provide clock edges to
the clocked sequential logic elements, the method comprising:
detecting active clock edges provided to a monitored clock gate of
the plurality of clock gate elements of the DUT; detecting active
clock edges sent from the monitored clock gate; detecting enable
signals provided to the monitored clock gate and any leaf clock
gates of the plurality of clock gate elements of the DUT connected
to receive clock edges through the monitored clock gate; receiving
protocol signals specifying when the monitored clock gate is
required to output active clock edges; determining protocol
compliance by comparing the active clock edges sent from the
monitored clock gate to a set of required edges specified by the
protocol signals; and calculating energy consequences of dropping
of active clock edges at the monitored clock gate by comparing the
active clock edges provided to a monitored clock gate with the
active clock edges sent from the monitored clock gate.
12. The method of claim 11 comprising attributing energy which
would have been consumed by any leaf clock gates connected to
receive clock edges through the monitored clock gate as energy
saved due to dropping of an active clock edge at the monitored
clock gate when the enable signals indicate that the leaf clock
gates were enabled when the active edge was dropped.
13. The method of claim 12 comprising: detecting signals on D-pins
of sequential logic elements within a fanout of the monitored clock
gate, the fanout of the monitored clock gate comprising all of the
clocked sequential logic elements connected to receive clock
signals through the monitored clock gate, and the combinational
logic elements that receive data from the clocked sequential logic
elements connected to receive clock signals through the monitored
clock gate; detecting signals on Q-pins of sequential logic
elements within the fanout of the monitored clock gate; and
calculating energy consumed in the fanout of the monitored clock
gate based on the detected signals on the D-pins and Q-pins.
14. The method of claim 13 comprising calculating energy consumed
in the fanout of the monitored clock gate based on a number of
sequential logic elements that change signal levels on their
Q-pins.
15. The method of claim 13 comprising: detecting active clock edges
output from clock gates controlling sequential logic elements
upstream from the sequential logic elements controlled by the
monitored clock gate; detecting active clock edges output to clock
gates controlling sequential logic elements downstream from the
sequential logic elements controlled by the monitored clock gate;
and determining unnecessary active clock edges sent from the
monitored clock gate and calculating potential energy savings
realizable through elimination of the unnecessary active clock
edges.
16. The method of claim 15 comprising determining that an
unnecessary active clock edge is sent from the monitored clock gate
when the upstream clocking input indicates that no active clock
edge is sent to sequential logic elements upstream from the
sequential logic elements controlled by the monitored clock
gate.
17. The method of claim 15 comprising: determining that an
unnecessary active clock edge is sent from the monitored clock gate
when the downstream clocking input indicates that no active clock
edge is sent to sequential logic elements downstream from the
sequential logic elements controlled by the monitored clock
gate.
18. The method of claim 15 comprising: receiving a time window; and
determining power savings due to dropping of active clock edges at
the monitored clock gate, power consumed in the fanout of the
monitored clock gate, and potential power savings realizable
through elimination of the unnecessary active clock edges for the
time window.
19. The method of claim 15 comprising: defining a new transaction
as commencing at each detected active clock edge sent from the
monitored clock gate; and, maintaining statistics for at least a
current transaction and a previous transaction.
20. The method of claim 19 comprising, upon detecting each active
clock edge sent from the monitored clock gate: replacing a set of
previous transaction statistics with a set of current transaction
statistics; resolving energy consequences of the previous
transaction based on the set of previous transaction statistics;
and, resetting the set of current transaction statistics based on
data corresponding to a newest detected active clock edge sent from
the monitored clock gate.
Description
CROSS-REFERENCE
[0001] This application is a continuation of U.S. application Ser.
No. 14/831,505, filed on Aug. 20, 2015, which is hereby
incorporated by reference.
FIELD
[0002] The present disclosure relates to analysis of integrated
circuit designs.
BACKGROUND
[0003] The design of an integrated circuit typically includes,
among other aspects, functional verification and power analysis.
Functional verification refers to a practice of testing the circuit
and analyzing the results of the test to determine whether the
circuit is performing to specification. For example, given a set of
inputs, does the circuit generate the expected output? Functional
verification can be executed with a relatively large degree of
automation to cover all of the various operation conditions of the
circuit. Briefly, functional verification ensures that the logical
design of the circuit is correct.
[0004] In contrast, power analysis is an aspect of circuit design
that is directed to the physical requirements of the design
specification. Therefore, power analysis is generally performed
separately from functional verification, and the tools for power
analysis are different from the tools for functional
verification.
[0005] Conventional power analysis can report power consumption of
each cell and activity in each net of a design, given a design and
netlist activity file. However, these power reports do not indicate
whether the power consumption of a cell is correlated to the
functional workload of the cell. In practice, a cell may be
consuming power but not producing useful work. In this case,
conventional power analysis would not indicate whether power
consumption could be reduced.
[0006] It is desirable to obviate or mitigate these shortcomings of
conventional power analysis.
BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Embodiments of the present disclosure will now be described,
by way of example only, with reference to the attached Figures.
[0008] FIG. 1 is an example of a clock-gated circuit for connecting
to a clock gate monitor according to an embodiment of the present
disclosure.
[0009] FIG. 2 schematically illustrates an example clock gate
monitor for performing functional power analysis according to an
embodiment of the present disclosure.
[0010] FIG. 3 is an example of a clock-gated circuit for connecting
to a clock gate monitor according to an embodiment of the present
disclosure.
[0011] FIG. 4 is an example of a clock-gated circuit for connecting
to a clock gate monitor according to an embodiment of the present
disclosure.
[0012] FIG. 5 schematically illustrates another example clock gate
monitor for performing functional power analysis according to an
embodiment of the present disclosure.
[0013] FIG. 6 is an example of a clock-gated circuit for connecting
to a clock gate monitor according to an embodiment of the present
disclosure.
[0014] FIG. 7 is an example of a clock-gated circuit for connecting
to a clock gate monitor according to an embodiment of the present
disclosure.
[0015] FIG. 8 is a flowchart illustrating an example method of
processing inputs received at a clock gate monitor according to an
embodiment of the present disclosure.
DETAILED DESCRIPTION
[0016] Generally, the present disclosure provides methods and
systems for verifying a clock-gated integrated circuit using tools
that perform both functional verification and power analysis on
cells of the clock-gated integrated circuit.
[0017] An example tool examines the power consumption of cells
under a set of functional workloads. Examining power consumption
across a set of workloads enables prediction of power consumption
under related, but unmeasured, workloads. Thus, by correlating the
functional workload to the power consumption, the power analysis
performed by this tool may be considered a functional power
analysis. The tool can further determine, from the correlation
between the functional workload and the power consumption of a
cell, whether the power consumption of the cell, or set of cells,
may be reduced.
[0018] One aspect of the present disclosure provides an apparatus
for monitoring operation of a design under test (DUT) comprising a
plurality of combinational logic elements, a plurality of clocked
sequential logic elements, and a plurality of clock gate elements
connected to selectively provide clock edges to the clocked
sequential logic elements. The apparatus comprises a plurality of
inputs comprising: an incoming clock edge input connected to detect
active clock edges provided to a monitored clock gate; an outgoing
clock edge input connected to detect active clock edges sent from
the monitored clock gate; an enable input connected to detect
enable signals provided to the monitored clock gate and any leaf
clock gates connected to receive clock edges through the monitored
clock gate; and a protocol input connected to receive protocol
signals specifying when the monitored clock gate is required to
output active clock edges. The apparatus also comprises a memory in
communication with the plurality of inputs for storing values from
the plurality of inputs, and a processor in communication with the
memory and the plurality of inputs, the processor programmed to
determine protocol compliance and to calculate energy consequences
of dropping of active clock edges at the monitored clock gate.
[0019] One aspect of the present disclosure provides a method for
monitoring operation of a DUT. The method comprises detecting
active clock edges provided to a monitored clock gate, detecting
active clock edges sent from the monitored clock gate, detecting
enable signals provided to the monitored clock gate and any leaf
clock gates connected to receive clock edges through the monitored
clock gate, receiving protocol signals specifying when the
monitored clock gate is required to output active clock edges,
determining protocol compliance by comparing the active clock edges
sent from the monitored clock gate to a set of required edges
specified by the protocol signals; and calculating energy
consequences of dropping of active clock edges at the monitored
clock gate by comparing the active clock edges provided to a
monitored clock gate with the active clock edges sent from the
monitored clock gate.
[0020] Other aspects and features of the present disclosure will
become apparent to those ordinarily skilled in the art upon review
of the following description of specific embodiments in conjunction
with the accompanying figures.
[0021] FIG. 1 is an example of a clock-gated circuit which can be
used to demonstrate the functional power analysis operations of
various embodiments of the present disclosure. The circuit 100
comprises a first clock gate 101 connected to a second clock gate
102 and a third clock gate 103. In the illustrated example, clock
signals must pass through the first clock gate 101 before reaching
the second or third clock gate 102 or 103, and as such the first
clock gate 101 may be referred to as a "root", and the second and
third clock gates 102 and 103 may each be referred to as a "leaf",
of a clock gate "tree".
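By way of illustration only, the root and leaf relationship described
above can be sketched as a small tree structure. The `ClockGate` class,
its fields, and the `leaves` helper are hypothetical names introduced
for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClockGate:
    """Hypothetical model of one clock gate in a clock gate tree."""
    name: str
    children: List["ClockGate"] = field(default_factory=list)

    def leaves(self) -> List["ClockGate"]:
        """Return every gate below this one that has no gates downstream of it."""
        found = []
        for child in self.children:
            found.extend(child.leaves() if child.children else [child])
        return found


# Circuit 100 of FIG. 1: gate 101 is the root, gates 102 and 103 are leaves.
gate_102 = ClockGate("102")
gate_103 = ClockGate("103")
gate_101 = ClockGate("101", children=[gate_102, gate_103])
leaf_names = [gate.name for gate in gate_101.leaves()]
```

In this sketch a gate with no children reports no leaves of its own,
which matches the usage above where only gates downstream of the
monitored gate are of interest.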
[0022] The first clock gate 101 is connected to each enable input
of a first flip-flop 111 and a second flip-flop 112. The second
clock gate 102 is connected to each enable input of a third
flip-flop 113 and a fourth flip-flop 114. The data inputs of the
first and second flip-flops 111 and 112 are triggered by other
upstream combinational logic elements (combinational cloud 121) of
which the exact nature is unimportant for the purpose of the
present disclosure. The data inputs of the third and fourth
flip-flops 113 and 114 are triggered by a second combinational
cloud 122, which is connected to the outputs of the first and
second flip-flops 111 and 112. Thus, the data inputs of the third
and fourth flip-flops 113 and 114 are indirectly connected to the
outputs of the first and second flip-flops 111 and 112.
[0023] Clock gating is a technique that selectively disables
synchronous flip-flops from switching states, which reduces the
power consumption of the flip-flops and consequently also power
dissipation of combinational cells driven by these flip-flops. If
the circuit 100 did not have any clock gates, the clock inputs of
the flip-flops would be triggered by a common clock, and each
flip-flop would switch state on each active clock edge. (As one of
skill in the art will appreciate, depending on the design of the
circuit in question, positive or negative edges may be active clock
edges.)
[0024] Clock gating selectively passes the clock signal to the
clock input of the flip-flop. If a certain flip-flop does not need
to change states (to pass the state of data input to the output)
then the clock signal can be gated off by the clock gate in order
to reduce power consumption in the flip-flop, as well as in the
fanout of the flip-flop (e.g., the combinational logic elements
receiving data from the flip-flop).
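By way of illustration only, the effect of gating off a clock edge can
be sketched with a simple behavioral model of a D flip-flop behind a
clock gate. The function name and the list-based trace format are
assumptions made for this sketch; one list entry stands for one active
clock edge arriving at the gate.

```python
def simulate_flip_flop(d_values, enables, initial_q=0):
    """Simulate a D flip-flop behind a clock gate, one entry per active clock edge.

    The gate passes the edge only when its enable is high; otherwise the
    flip-flop receives no edge and holds its previous state.
    """
    q = initial_q
    q_trace = []
    edges_delivered = 0
    for d, enable in zip(d_values, enables):
        if enable:
            q = d  # edge delivered: Q captures D
            edges_delivered += 1
        q_trace.append(q)
    return q_trace, edges_delivered


q_trace, edges_delivered = simulate_flip_flop(
    d_values=[1, 1, 0, 0], enables=[1, 0, 0, 1]
)
```

Here only two of the four incoming edges reach the flip-flop, and the
output holds its value while the gate is disabled even though the data
input changes in the third cycle.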
[0025] It is difficult to assess the impact of clock gating in the
design under representative operation. From a functional
verification perspective, the implementation of the clock gating
technique in a design should not destroy critical information (as
defined by relevant protocols for that design) that would otherwise
propagate through the design if clock edges were provided. The
destruction of this information would change the required
functional behavior of the circuit and would in effect be a
violation of relevant protocols (either design specific protocols
or industry protocols) applicable to the design.
[0026] From a power analysis perspective, clock gating should not
provide additional clock edges over what is minimally necessary to
move critical information through the design. Otherwise, the
dynamic power consumed consequential to delivering these edges is
wasted.
[0027] Ideally, clock gating should only add minimal complexity to
the clock tree--individual clock gates for every flip-flop would
not typically save enough power to justify their insertion. It is
an optimization problem to find a set of clock gates and enable
logic that saves power by reducing clock edges and discarding
propagated information at a small incremental cost in added clock
gate cells and combinational cells that define the enable logic for
each clock gate.
[0028] FIG. 2 shows an example clock gate monitor 200 for
performing functional power analysis according to an embodiment of
the present disclosure. The clock gate monitor 200 can be used to
help find the optimal clock gating logic that is protocol compliant
yet power efficient.
[0029] In an embodiment, the monitor 200 connects to a design under
test or device under test (DUT). The DUT comprises a plurality of
combinational logic elements (e.g. combinational clouds 121 and 122
of FIG. 1), a plurality of clocked sequential logic elements (e.g.
flip-flops 111, 112, 113, and 114 of FIG. 1), and a plurality of
clock gate elements (e.g. clock gates 101, 102 and 103 of FIG. 1)
connected to selectively provide clock edges to the clocked
sequential logic elements.
[0030] In a typical implementation, a DUT would be provided with a
plurality of clock gate monitors 200, with one clock gate monitor
200 connected to each clock gate of the DUT. In some
implementations, additional clock gate monitors 200 may be
connected to the clock gate input of each un-clock-gated flip-flop
in the DUT (i.e., at locations where additional clock gates could
be added to the DUT), for example in order to assist in evaluation
of whether or not to add additional clock gates.
[0031] The monitor 200 comprises a plurality of inputs, comprising:
an incoming clock edge input 201, an outgoing clock edge input 202,
an enable input 203, and a protocol input 204. The monitor 200 also
comprises a memory 205 and a processor 206.
[0032] The incoming clock edge input 201 is connected to the DUT to
detect active clock edges provided to a monitored clock gate.
[0033] The outgoing clock edge input 202 is connected to the DUT to
detect active clock edges sent from the monitored clock gate.
[0034] The enable input 203 is connected to the DUT to detect
enable signals provided to the monitored clock gate and any leaf
clock gate connected to receive clock edges through the monitored
clock gate.
[0035] The protocol input 204 is connected to the DUT to receive
protocol signals specifying when the monitored clock gate is
required to output active clock edges. A protocol signal active
edge is preferably provided prior to the delivery time of each
required output active clock edge.
[0036] In some embodiments, the protocol input 204 comprises two
bits and when either bit is high a required output active clock
edge is indicated. With such a configuration, a continuous series
of required output active clock edges can be represented in the
protocol input as {[0,1], [1,0], [0,1], [1,0], . . . }.
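By way of illustration only, the two-bit encoding described above can
be decoded as follows. The sketch assumes one protocol sample per
potential output edge, which the disclosure does not specify, and the
function name is hypothetical.

```python
def required_edge_flags(protocol_samples):
    """Return one flag per sample: True when either protocol bit is high."""
    return [bool(bit0 or bit1) for bit0, bit1 in protocol_samples]


# A continuous series of four required edges, followed by one idle sample.
samples = [(0, 1), (1, 0), (0, 1), (1, 0), (0, 0)]
flags = required_edge_flags(samples)
required_count = sum(flags)
```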
[0037] The memory 205 is in communication with the plurality of
inputs 201-204 and stores values from the plurality of inputs.
[0038] The processor 206 is in communication with the memory 205
and the plurality of inputs 201-204. The processor 206 is
programmed to determine protocol compliance and to calculate energy
consequences of dropping of active clock edges at a monitored clock
gate.
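By way of illustration only, the arrangement of inputs, memory, and
processor described above might be modeled in software as follows. The
class and field names are hypothetical, and the enable input is reduced
to a single flag for brevity even though the enable input 203 may also
carry the enable signals of leaf clock gates.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    """One set of values seen on the four inputs (names are illustrative)."""
    incoming_edge: bool        # corresponds to input 201
    outgoing_edge: bool        # corresponds to input 202
    enable: bool               # corresponds to input 203 (simplified)
    protocol: Tuple[int, int]  # corresponds to input 204, two-bit form


@dataclass
class ClockGateMonitorSketch:
    """Hypothetical stand-in for monitor 200: a memory plus simple processing."""
    memory: List[Sample] = field(default_factory=list)

    def record(self, sample: Sample) -> None:
        """Store the latest input values in memory."""
        self.memory.append(sample)

    def dropped_edges(self) -> int:
        """Count incoming active edges that were not passed on by the gate."""
        return sum(
            1 for s in self.memory if s.incoming_edge and not s.outgoing_edge
        )


monitor = ClockGateMonitorSketch()
monitor.record(Sample(True, True, True, (0, 1)))
monitor.record(Sample(True, False, False, (0, 0)))
dropped = monitor.dropped_edges()
```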
[0039] The monitor 200 provides dynamic analysis of a clock gate to
allow confirmation that, cycle to cycle, the clock gate is well
coordinated with other clock gates and conforms to relevant
protocols. Monitor 200 provides advantages over conventional static
analysis techniques, which are only based on toggle counts per net
over a time interval, not cycle to cycle behavior of the
design.
[0040] FIG. 3 shows an example of using a monitor (not shown), such
as the monitor 200, to determine a protocol violation. For example,
the flip-flop 111 may be part of the write address slave AXI
interface of a design. This flip-flop 111 must follow the protocol
defined for a slave on the AXI write address channel, governed by
AWVALID and AWREADY. The behavior of AWVALID and AWREADY defines
when this sequential cell must receive a clock edge to capture the
attributes of the AXI write address. The clock gate 102 for the
flip-flop 111 inherits the requirements for the specific
flip-flop.
[0041] The monitor connected to the clock gate 102 will determine
whether the clock gate 102 violates the AXI protocol. In the
example shown in FIG. 3, an incoming clock edge input (not shown),
similar to the incoming clock edge input 201 of the clock gate
monitor 200, receives the input of the clock gate 102 (which is the
output of clock gate 101); an outgoing clock edge input (not
shown), similar to the outgoing clock edge input 202 of the clock
gate monitor 200, receives the output of clock gate 102; an enable
input (not shown), similar to the enable input 203 of the clock
gate monitor 200, receives the enable signal input to clock gate
102; and a protocol input (not shown), similar to the protocol
input 204 of the clock gate monitor 200, receives the protocol
signal 301.
[0042] A processor (not shown), similar to the processor 206 of the
clock gate monitor 200, determines, from the information provided
by the outgoing clock edge input and the protocol input, that a
protocol violation occurred. In particular, in the example of FIG.
3, the protocol signal 301 indicates that two edges should be
output from the clock gate 102, but only one edge is actually
output as indicated by the signal above protocol signal 301 in FIG.
3.
[0043] FIG. 4 shows an example of using a monitor (not shown), such
as the clock gate monitor 200, to detect a clock edge that is
spurious with respect to a protocol. For example, the AXI protocol
requires that flip-flops
111 and 112 receive two clock edges, as indicated by protocol
signal 401. The AXI protocol requires that flip-flops 113 and 114
receive one clock edge, as indicated by protocol signal 402.
Consequently, the monitor can determine that clock gate 103 is
outputting an additional unrequired or "spurious" clock signal that
is not correlated to the AXI protocol. As a result, the flip-flops
113 and 114 are consuming more power than is necessary given the
protocol.
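By way of illustration only, the two checks described with reference
to FIGS. 3 and 4 (a required edge that is not output, and an output
edge that is not required) can be sketched as a per-cycle comparison.
The function name, the trace values, and the alignment of one flag per
incoming active clock edge are assumptions made for this sketch.

```python
def classify_edges(required, outgoing):
    """Compare per-cycle required-edge flags with per-cycle outgoing-edge flags.

    Returns (missing, spurious): cycle indices where a required edge was not
    output, and cycle indices where an edge was output but not required.
    """
    pairs = list(zip(required, outgoing))
    missing = [i for i, (req, out) in enumerate(pairs) if req and not out]
    spurious = [i for i, (req, out) in enumerate(pairs) if out and not req]
    return missing, spurious


# Hypothetical traces, one entry per incoming active clock edge.
required = [1, 1, 0, 0]
outgoing = [1, 0, 0, 1]
missing, spurious = classify_edges(required, outgoing)
```

A missing edge corresponds to a protocol violation, while a spurious
edge corresponds to power consumed without a protocol requirement.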
[0044] A processor (not shown), similar to the processor 206 of the
clock gate monitor 200, can also calculate how much energy is saved
by dropping a clock edge at the monitored clock gate. The processor
calculates the savings by comparing an incoming clock edge input
(not shown), similar to the incoming clock edge input 201 of the
clock gate monitor 200, to an outgoing clock edge input (not
shown), similar to the outgoing clock edge input 202 of the clock
gate monitor 200. For example, when the incoming clock edge input
has two edges and the outgoing clock edge input has one edge, the
difference between the two shows an energy saving of a single clock
edge credited to the monitored clock gate. Where the monitored
clock gate has one or more leaf clock gates further downstream in
its clock gate tree, the processor can also attribute additional
energy savings to the monitored clock gate for dropping an edge
when an enable input (not shown), similar to the enable input 203
of the clock gate monitor 200, indicates that such leaf clock gates
are enabled. But for the dropping of the edge at the monitored
clock gate, that edge would also have propagated to the leaf clock
gates, so the monitored clock gate is credited with saving the
energy that would have been consumed by the leaf clock gates and
their fanouts.
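By way of illustration only, the crediting of energy savings described
above can be sketched as follows. The per-edge energy values are
placeholder assumptions, as are the function name and the per-cycle
trace format; the disclosure does not prescribe a particular energy
model.

```python
def energy_saved(incoming_edges, outgoing_edges, leaf_enables,
                 gate_energy_per_edge, leaf_energy_per_edge):
    """Estimate energy credited to a monitored clock gate for dropped edges.

    incoming_edges / outgoing_edges: per-cycle flags (1 = active edge seen).
    leaf_enables: per-cycle list of enable flags, one per leaf clock gate.
    The energy-per-edge values are placeholder assumptions.
    """
    total = 0.0
    for inc, out, enables in zip(incoming_edges, outgoing_edges, leaf_enables):
        if inc and not out:  # edge dropped at the monitored gate
            total += gate_energy_per_edge
            # Enabled leaves would have passed the edge on, so their
            # energy is also credited to the monitored gate.
            total += leaf_energy_per_edge * sum(enables)
    return total


saved = energy_saved(
    incoming_edges=[1, 1],
    outgoing_edges=[1, 0],
    leaf_enables=[[1, 1], [1, 0]],
    gate_energy_per_edge=1.0,
    leaf_energy_per_edge=2.0,
)
```

In this example two edges arrive and one is dropped; because one leaf
clock gate was enabled when the edge was dropped, its energy is added
to the saving credited to the monitored clock gate.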
[0045] FIG. 5 shows a clock gate monitor 500 according to a further
embodiment of the present disclosure. The clock gate monitor 500
can be used to help find the optimal clock gating logic that is
protocol compliant yet power efficient.
[0046] The monitor 500 connects to a design under test or device
under test (DUT). The DUT comprises a plurality of combinational
logic elements (e.g. combinational clouds 121 and 122), a plurality
of clocked sequential logic elements (e.g. flip-flops 111, 112,
113, and 114), and a plurality of clock gate elements (clock gates
101 and 102) connected to selectively provide clock edges to the
clocked sequential logic elements.
[0047] In a typical implementation, a DUT would be provided with a
plurality of clock gate monitors 500, with one clock gate monitor
500 connected to each clock gate of the DUT. In some
implementations, additional clock gate monitors 500 may be
connected to the clock gate input of sets of un-clock-gated
flip-flops in the DUT (i.e., at locations where additional clock
gates could be added to the DUT), for example in order to assist in
evaluation of whether or not to add additional clock gates.
[0048] The monitor 500 comprises a plurality of inputs, comprising:
an incoming clock edge input 501, an outgoing clock edge input 502,
an enable input 503, a protocol input 504, a data-in input 507, a
data-out input 508, an upstream clocking input 509, a downstream
clocking input 510, and a time window input 511. The monitor 500
also comprises a memory 505 and a processor 506.
[0049] The incoming clock edge input 501 is connected to the DUT to
detect active clock edges provided to a monitored clock gate.
[0050] The outgoing clock edge input 502 is connected to the DUT to
detect active clock edges sent from the monitored clock gate.
[0051] The enable input 503 is connected to the DUT to detect
enable signals provided to the monitored clock gate and any leaf
clock gate connected to receive clock edges through the monitored
clock gate.
[0052] The protocol input 504 is connected to the DUT to receive
protocol signals specifying when the monitored clock gate is
required to output active clock edges. A protocol signal is
preferably provided just prior to the time required for
outputting each active clock edge. In some embodiments, the
protocol input 504 comprises two bits as described above with
reference to FIG. 2.
[0053] The memory 505 is in communication with the plurality of
inputs and stores values from the plurality of inputs.
[0054] The processor 506 is in communication with the memory and
the plurality of inputs. The processor is programmed to determine
protocol compliance and to calculate power consequences of clock
gating.
[0055] The data-in input 507 is connected to detect signals on data
input pins (D-pins) of sequential logic elements within a fanout of
the monitored clock gate. The fanout of the monitored clock gate
comprises all of the clocked sequential elements connected to
receive clock signals through the monitored clock gate.
[0056] The data-out input 508 is connected to detect signals on
data output pins (Q-pins) of sequential logic elements within the
fanout of the monitored clock gate.
[0057] The upstream clocking input 509 is connected to detect
active clock edges output from the clock gates controlling the
sequential logic elements upstream from the sequential logic
elements controlled by the monitored clock gate.
[0058] The downstream clocking input 510 is connected to detect
active clock edges output from the clock gates controlling the
sequential logic elements downstream from the sequential logic
elements controlled by the monitored clock gate.
[0059] The time window input 511 receives a time window range
instructing the processor 506 to perform certain operations for
that time window range. The time window range may, for example, be
a fixed or adjustable number of clock cycles. In an embodiment, the
processor 506 determines power saving based on the energy saved due
to dropping of active clock edges at the monitored clock gate for
the time window. In another embodiment, the processor 506
determines power saving based on energy saved due to dropping of
active clock edges at the monitored clock gate, and energy saved in
the fanout of the monitored clock gate, for the time window. In yet
another embodiment, the processor 506 determines power savings
based on energy saved due to dropping of active clock edges at the
monitored clock gate, energy saved in the fanout of the monitored
clock gate, and also determines potential additional power savings
realizable through elimination of the unnecessary active clock
edges for the time window. In yet another embodiment, the processor
506 determines power savings based on energy saved due to dropping
of active clock edges at the monitored clock gate, energy saved in
the fanout of the monitored clock gate, and potential additional
power savings realizable through elimination of the unnecessary
active clock edges for the time window. In yet another embodiment, the
processor 506 determines power savings based on energy saved due to
dropping of active clock edges at the monitored clock gate, energy
saved in the fanout of the monitored clock gate, potential
additional power savings realizable through elimination of the
unnecessary active clock edges for the time window, and also
determines additional power savings realizable through elimination
of unnecessary combinational activity.
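By way of non-limiting illustration, the simplest of the above embodiments (power saving from edges dropped at the monitored clock gate over a time window) may be sketched in Python as follows. The function name, the per-edge energy value and the window duration are assumptions made for illustration only and are not taken from the disclosure.

```python
def window_power_saving_uw(ck_edges, eck_edges, fj_per_eck_edge, window_ns):
    """Average power (microwatts) saved by edges dropped at the monitored gate.

    ck_edges:        active edges seen on the incoming clock edge input
    eck_edges:       active edges seen on the outgoing clock edge input
    fj_per_eck_edge: assumed energy cost of one output edge, in femtojoules
    window_ns:       assumed duration of the time window, in nanoseconds
    """
    if window_ns <= 0:
        raise ValueError("window_ns must be positive")
    if not 0 <= eck_edges <= ck_edges:
        raise ValueError("expected 0 <= eck_edges <= ck_edges")
    dropped_edges = ck_edges - eck_edges
    saved_fj = dropped_edges * fj_per_eck_edge
    # 1 fJ per 1 ns is 1e-15 J / 1e-9 s = 1e-6 W, i.e. one microwatt.
    return saved_fj / window_ns


# 60 dropped edges at an assumed 2.5 fJ each, over an assumed 200 ns window.
saving_uw = window_power_saving_uw(ck_edges=100, eck_edges=40,
                                   fj_per_eck_edge=2.5, window_ns=200.0)
print(saving_uw)
```

The further embodiments would add terms for energy saved in the fanout and for potential additional savings to the numerator before dividing by the window duration.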
[0060] The monitor 500 provides the ability to determine if clock
gates for flip-flops upstream and downstream of each other are well
coordinated so that required information propagates with a minimum
number of clock edges. Protocol violations and compliance, spurious
clock edges, actual energy/power savings due to clock gating and
potential additional energy/power savings may be determined by the
monitor 500 in substantially the same manner as described above
with respect to the monitor 200 of FIG. 2. Energy consumption in
the fanout of the monitored clock gate may be determined, for
example, based on the data-in and data-out inputs 507 and 508 which
indicate the set of sequential elements in the fanout that change
output values in response to a clock edge.
[0061] FIG. 6 shows an example of using the monitor 500 to
determine coordination of upstream and downstream clock edges. For
example, the clock gate 101 propagates an edge, while the clock
gate 102 drops the edge. The clock gate 103, however, propagates
the same edge that was dropped by clock gate 102, but clock gate
103 has a fanout that is located downstream of the fanout of clock
gate 102. Downstream means that the fanout of the clock gate 103
(the clocked sequential logic elements connected to receive clock
edges through clock gate 103) receives data from the fanout of the
clock gate 102.
[0062] In this case, the monitor 500 is connected to the clock gate
103 and the processor 506 compares the upstream clocking input 509 and
the outgoing clock edge input 502 and determines that the upstream
clock gate 102 dropped an edge and no new information will be
propagated to the fanout of the clock gate 103 for that edge.
Therefore, the processor will determine that the clock gate 103
could have dropped the edge that was dropped by clock gate 102,
such that additional energy/power savings could be realized.
[0063] FIG. 7 shows another example of using the monitor 500 to
determine coordination of upstream and downstream clock edges. For
example, the clock gate 101 propagates an edge, while the clock
gate 103 drops the edge. The clock gate 102, however, propagates
the same edge but has a fanout that is located upstream of the
fanout of clock gate 103. Upstream means that the fanout of the
clock gate 102 (the clocked sequential logic elements connected to
receive clock edges through clock gate 102) provides data to the
fanout of the clock gate 103.
[0064] In this case, the monitor 500 is connected to the clock gate
102 and the processor 506 will compare the downstream clocking
input 510 and the outgoing clock edge input 502 and determine that the
downstream clock gate 103 dropped an edge and the information
propagated by the fanout of clock gate 102 to the fanout of the
clock gate 103 for that edge will simply be discarded. Therefore,
the processor will determine that the clock gate 102 could have
dropped the edge that was dropped by clock gate 103, such that
additional energy/power savings could be realized.
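By way of non-limiting illustration, the comparisons described for FIG. 6 and FIG. 7 may be sketched in Python as follows. The sketch assumes each clock gate's behavior is recorded as one boolean per incoming clock edge (True meaning the edge was propagated) and follows the "same edge" wording above, ignoring any cycle offset between launch and capture. The function name and the example traces are hypothetical.

```python
def droppable_edges(monitored, neighbor):
    """Indices of edges the monitored gate propagated although the
    neighboring (upstream or downstream) gate dropped them.

    monitored, neighbor: sequences of booleans, one entry per incoming
    clock edge; True means the gate propagated that edge.
    """
    if len(monitored) != len(neighbor):
        raise ValueError("traces must cover the same clock edges")
    return [i for i, (m, n) in enumerate(zip(monitored, neighbor))
            if m and not n]


# FIG. 6 style: the monitor is on gate 103 and the upstream gate 102
# drops edge 1, so gate 103 could also have dropped edge 1.
gate_102 = [True, False, True]
gate_103 = [True, True, True]
fig6_candidates = droppable_edges(monitored=gate_103, neighbor=gate_102)

# FIG. 7 style: the monitor is on gate 102 and the downstream gate 103
# drops edge 2, so gate 102 could also have dropped edge 2.
gate_102_b = [True, True, True]
gate_103_b = [True, True, False]
fig7_candidates = droppable_edges(monitored=gate_102_b, neighbor=gate_103_b)
```

Each returned index marks an edge for which additional energy/power savings could have been realized.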
[0065] In some embodiments, one or more clock gate monitors (such
as monitor 200 or 500 described above) are implemented on a chip,
with the inputs implemented on pins of the chip. The following
table describes the pins of an example clock gate monitor, with
reference to corresponding inputs of monitors 200/500 described
above where applicable:
TABLE-US-00001 TABLE 1
Pin | Width | Purpose/Notes
ECK | 1 | Provides an ability to attribute all past activity to a net power savings or power loss, and check compliance to protocol. (Corresponds to outgoing clock edge input 202/502.)
D_PINS | R bits determined by total sequential cell input pins. | Together with CK, provides an ability to sample values at the inputs of sequential fanout that may not propagate to the outputs of the sequential fanout because of clock gating (CK moved but not ECK), thereby attributing the reduction in power to the clock gate behavior. (Corresponds to data-in input 507.)
Q_PINS | Q bits determined by total sequential cell output pins. | Provides an ability to observe activity at the outputs of sequential fanout that did stimulate the combinational fanout, which in turn provides the ability to attribute all dynamic power in the design to the behavior of individual clock gates. Without Q_PINS it is not possible to accurately model combinational power consumption nor power dissipated internal to sequential cells. (Corresponds to data-out input 508.)
U_ECK | N bits, one bit per upstream clock gate that propagates information to the sequential fanout of the monitored clock gate. | Provides an ability to observe whether or not previous to this ECK active edge at least 1 upstream clock gate provided a clock edge to its own fanout, thereby propagating information that could be captured by this clock gate. Lacking at least 1 upstream clock gate ECK active edge, there is no new information for this clock gate ECK to capture. (Corresponds to upstream clocking input 509.)
D_ECK | P bits, one bit per downstream clock gate that propagates information from the monitored clock gate. | Provides an ability to observe whether or not after this ECK active edge at least 1 downstream clock gate provided a clock edge to its own fanout, thereby capturing information propagated by this clock gate. Lacking at least 1 downstream clock gate ECK active edge, the information propagated by this clock gate has been discarded and hence the ECK edge produced wasted power. (Corresponds to downstream clocking input 510.)
MUST_ECK | 2 | Provides an ability to observe whether or not this ECK edge is required by protocol. The count of past MUST_ECK active edges is either equal to 0 (protocol spurious), 1 (protocol required), or >1 (some protocol required edges were not delivered). One of a pair of bits transitions low-high ahead of the CK active edge to indicate that protocol requires an ECK active edge before the end of the current clock cycle. Two bits are used so that an OR indicates times when clocks should be provided by the clock gate. Two bits are used so that back to back clock edges can produce positive edges on WINDOW near inactive edges of CK, and ahead of the active edge of CK. (Corresponds to protocol input 204/504.)
CK | 1 | Provides an ability to count the number of input clock active edges dropped before ECK has an active edge. For leaf clock gates in particular this ensures they are not credited with dropping edges that were dropped by root clock gates because ECK didn't toggle because CK input didn't toggle either. (Corresponds to incoming clock edge input 201/501.)
E | S bits, one bit per fanout leaf clock gate in the fanout of the monitored clock gate. | Provides an ability to credit a root clock gate with power savings in leaf clock gates when CK does not propagate to ECK because specific leaf clock gates have their enable pin active. (Corresponds to enable input 203/503.)
WINDOW | 1, both edges active. | Typical practice is to define static time windows in which power is computed for the activity in the design. In contrast, WINDOW allows the clock gate to track the power consequence of each pin toggle as they occur. WINDOW can be arbitrarily set to the clock period itself or to the duration of a packet passing through the design and similar. The WINDOW pin exposes the time-accuracy of the monitor so that a user can see power dissipated in each packet etc. This flexibility means that unlike current practice it is not necessary for a user to, for example, work backwards from reported power consumed between 100 and 120 nanoseconds to specific clock cycles nor specific design-specific events. Ultimately the clock gate monitor is still a digital, event driven, apparatus. It cannot look within a single clock cycle and determine that there is high power dissipation because the clock wave form is very crisp, with a lot of high frequency components. But within limitations of digital event-based modelling the clock gate monitor can be made arbitrarily time accurate. (Corresponds to time window input 511.)
[0066] In some embodiments, the clock gate monitor assesses all
dropped and propagated clock edges for power impact. The power
impact of a dropped or propagated clock edge is defined as the
dynamic power consumed (or saved) by the clock gate, the sequential
fanout of the clock gate, and the combinational fanout of the
sequential fanout of the clock gate consequential to the propagated
(or dropped) clock edge.
[0067] In some embodiments, the clock gate monitor also assesses
all enable toggles for power impact. The power impact of an enable
toggle may be defined as the dynamic power dissipated in the clock
gate and combinational cloud that solely provides the clock gate
enable signal consequential to the enable toggle. The word solely
here allows a clock gate monitor to, in extreme cases, attribute
very small power consequence to enable toggles when that enable is
generated by a combinational cloud that drives other design cells
than the clock gate alone.
[0068] By assessing the output clock edges and enable toggles for
total power impact, a set of clock gate monitors, one per clock
gate in the design, is able to assess dynamic power consumption in
the entire design on the basis firstly of enable toggles and output
clock edges. In such a full design assessment, ungated sequential
cells may be provided with a set of virtual clock gates that pass
all clock edges.
[0069] To increase the accuracy of the power consequences
attributed to the clock gate, in some embodiments the D (input) and
Q (output) pins of the sequential fanout of each clock gate are
also monitored. These additional inputs allow the specific power
consequence within each sequential cell of propagation of D (input)
edges to Q (output) edges to be assessed, as well as the specific
power consequence of Q (output) edges on the combinational
fanout.
[0070] The total dynamic power impact of dropped or propagated
clock edges, and in aggregate the entire design, may be defined as
either required or wasted on the basis of: protocol requirements;
the previous active clock edges of upstream clock gates; the
subsequent active clock edges of downstream clock gates; the
current enable state of fanout clock gates; and, activity at D
(input) and Q (output) pins of fanout sequential cells.
[0071] The power consequence of the clock gate may be assessed as
follows:
[0072] 1. For all clock gates:
[0073] a) A missing protocol-required output active clock edge is
defined as missing required power. This indicates that the design
does not meet requirements. A dropped output active clock edge is
defined as saving power otherwise.
[0074] b) An additional output active clock edge that is not
required by protocol is defined as producing wasted power.
[0075] c) All other output active clock edges are defined as
producing required power unless the previous behavior of upstream
clock gates dictates otherwise.
[0076] 2. For a clock gate with upstream clock gates:
[0077] a) An output active clock edge without at least one previous
upstream active clock edge is defined as wasted power as it is not
possible for this output active clock edge to capture any new
information propagated by an upstream active clock edge.
[0078] b) All other output active clock edges are defined as
producing required power unless the subsequent behavior of
downstream clock gates dictates otherwise.
[0079] 3. For a clock gate with downstream clock gates:
[0080] a) An output active clock edge without at least one
subsequent downstream active clock edge is defined as wasted power
as it is not possible that any information propagated by this clock
gate was captured by downstream sequential elements.
[0081] b) All other output active clock edges are defined as
producing required power.
[0082] 4. For a root clock gate with leaf clock gates that drops an
output clock edge:
[0083] a) The power consequence of this dropped output active edge
is increased for each leaf clock gate with an asserted enable
signal. The root clock gate is credited with larger power impact
for each leaf clock gate that would otherwise have propagated the
clock edge.
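By way of non-limiting illustration, the assessment rules 1 to 4 above may be sketched in Python as a cascade of checks. The function names, the use of None to mean that a check does not apply to a given clock gate, and the per-edge energy values are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def resolve_output_edge(must_eck_count: Optional[int],
                        upstream_edges: Optional[int],
                        downstream_edges: Optional[int]) -> str:
    """Classify one output active clock edge as 'required' or 'wasted'.

    Each argument is a count observed for this edge, or None when the
    corresponding check does not apply (no protocol, no upstream clock
    gates, or no downstream clock gates).
    """
    if must_eck_count is not None and must_eck_count == 0:
        return "wasted"      # rule 1(b): edge not required by protocol
    if upstream_edges is not None and upstream_edges == 0:
        return "wasted"      # rule 2(a): nothing new to capture
    if downstream_edges is not None and downstream_edges == 0:
        return "wasted"      # rule 3(a): propagated data never captured
    return "required"        # rules 1(c), 2(b) and 3(b)


def dropped_root_edge_credit_fj(leaf_enables: Sequence[bool],
                                fj_per_root_edge: float,
                                fj_per_leaf_edge: float) -> float:
    """Rule 4(a): credit for an edge dropped at a root clock gate grows
    with each leaf clock gate whose enable signal was asserted."""
    enabled_leaves = sum(1 for enabled in leaf_enables if enabled)
    return fj_per_root_edge + enabled_leaves * fj_per_leaf_edge


verdict = resolve_output_edge(must_eck_count=1, upstream_edges=0,
                              downstream_edges=2)
credit = dropped_root_edge_credit_fj([True, False, True],
                                     fj_per_root_edge=4.0,
                                     fj_per_leaf_edge=1.5)
```

In this example the edge is required by protocol but has no preceding upstream edge, so it is classed as wasted, and the dropped root edge is credited with its own assumed cost plus that of the two enabled leaf clock gates.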
[0084] The power consequence of the clock gate behavior is
increased as follows:
[0085] 1. For all dropped active clock edges, the power consequence
of the dropped active edge is increased by: the set of D input
toggles that are not propagated to Q toggles (power which would
have been dissipated by sequential fanout); and, the set of Q
toggles that do not excite the combinational cloud (power which
would have been dissipated by combinational fanout).
[0086] 2. For all propagated active clock edges, the power
consequence of the propagated active edge is increased by: the set
of D input toggles that are propagated to Q toggles (internal power
dissipated by sequential fanout); and, the set of Q toggles that do
excite the combinational cloud (power dissipated by combinational
fanout).
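By way of non-limiting illustration, the sequential part of the increase for a dropped active clock edge may be sketched in Python as follows. The sketch relies on the fact that a flip-flop whose D value differs from its Q value would have toggled its output had the edge been delivered. The function name and the per-bit energy values are hypothetical.

```python
def q_edges_saved_fj(d_bits, q_bits, fj_per_q_edge):
    """Energy (fJ) saved in the sequential fanout by one dropped edge.

    d_bits:        D pin values sampled at the dropped edge, one per flop
    q_bits:        Q pin values held at the dropped edge, one per flop
    fj_per_q_edge: assumed energy of a Q toggle for each flop, in fJ
    """
    if not (len(d_bits) == len(q_bits) == len(fj_per_q_edge)):
        raise ValueError("all inputs must have one entry per flop")
    # A flop with D != Q would have toggled if the edge had arrived.
    return sum(fj for d, q, fj in zip(d_bits, q_bits, fj_per_q_edge)
               if d != q)


# Flops 1 and 2 hold stale values, so their Q toggles were avoided.
saved = q_edges_saved_fj(d_bits=[1, 0, 1, 1],
                         q_bits=[1, 1, 0, 1],
                         fj_per_q_edge=[2.0, 3.0, 1.0, 5.0])
```

The combinational part would be added in a similar manner for each avoided Q toggle that would otherwise have excited the combinational cloud.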
[0087] In some embodiments, the activity of the monitored clock
gate is viewed by the clock gate monitor as a sequence of
transactions. Each transaction is defined by a single output active
edge of the clock gate. All activity of the clock gate and cells
grouped with the clock gate is associated with specific output
active edges of the clock gate in a manner that indicates the net
positive (negative) impact of the clock gate on device power. A set
of transactions may be grouped together within a window defined by
a time window input as discussed above.
[0088] In some embodiments, a clock gate monitor maintains
statistics for each of a total window, a previous transaction, and
a current transaction. Such statistics may, for example, be
determined by incrementing various counters for the window,
previous transaction and current transaction, as discussed further
below. The total window statistics indicate the energy consumed and
saved by the clock gate, accumulated over a period that spans all
ECK transactions completed within a power window defined by WINDOW
pin edges. The previous transaction statistics are maintained
because some pin activity is only possible due to an ECK active
edge and occurs after the ECK edge. Consequently, this activity can
only be recorded for the previous ECK transaction when the current
ECK active edge is observed. The current transaction statistics may
be used to track pin activity that is not due to an ECK active
edge, and as such may be recorded for the current ECK
transaction.
[0089] As an extension to basic functionality, if a clock gate
monitor is observing netlist activity with functional timing and
many non-zero transition delays, then ECK edges may be used to
define delays at which point pins are examined for activity.
[0090] In some embodiments, a clock gate monitor retains a set of 3
statistics only (i.e., window, previous transaction and current
transaction), replacing content as events occur. In some
embodiments, statistics could be forwarded or copied to an external
agent or memory prior to replacing content.
[0091] The following table illustrates examples of how pin activity
is mapped to transactions in some embodiments:
TABLE-US-00002 TABLE 2
Pin | Transaction | Notes
ECK | Definition of Boundaries | Each active edge of ECK defines a new transaction, completes the previous transaction.
D_PINS | Current | Toggles on D pins are sampled on ECK active edges and recorded for the current transaction. Some or none of these toggles will later propagate to Q pin toggles.
Q_PINS | Previous | Toggles on Q pins occur after ECK active edges and are recorded for the previous transaction.
U_ECK | Current | Output active edges from upstream clock gates must have occurred prior to the output active edge of this clock gate otherwise the current output active edge is spurious - not possibly capturing new information.
D_ECK | Previous | Active edges of downstream clock gates must occur after the previous output active edge of this clock gate and before the current output active edge of this clock gate otherwise the previous output active edge is spurious - information was discarded by downstream clock gates.
MUST_ECK | Current | Prior to the current output active edge, protocol monitors have either provided 1, none, or more than 1 MUST_ECK positive edge indicating that the current output active edge is either required, spurious, or insufficient with regards to protocol.
CK | Current | Additional CK active edges prior to the output active edge of the clock gate are claimed as saved energy for the current transaction.
E | Current | Any enable edges prior to the output active edge are claimed for the current transaction to provide indication of when the enable pin is toggling too frequently to allow the clock gate to save any net energy.
WINDOW | Total Window | Edges of the window pin define the end of the total window, triggering computation of energy to power followed by clearing all statistics for the next window.
[0092] FIG. 8 is a flowchart showing an example method 800 carried
out by a clock gate monitor according to one embodiment. Throughout
method 800, the clock gate monitor monitors the outgoing clock edge
input for an active clock edge at 802. When an outgoing active
clock edge is detected (block 802 YES output), the previous
transaction statistics are replaced with the current transaction
statistics at 804, and the consequences of the previous transaction
are resolved at 806. After the previous transaction has been
resolved, counters for the total window are incremented based on
the previous transaction statistics at 808.
[0093] Incrementing counters for the total window may, for example,
involve any or all of the following counters:
[0094] Protocol_spurious_fJ: incremented if the last ECK edge is
spurious with respect to protocol.
[0095] Protocol_ck_edges_lost_fJ: incremented if missing protocol
required clock cycles were detected.
[0096] No_receiver_fJ: incremented if no downstream clock gate
accepted data from last ECK edge.
[0097] No_transmitter_fJ: incremented if no upstream clock gate sent
data for capture by last ECK edge.
[0098] Q_edges_saved_fJ: incremented by difference between observed D
edges and Q edges across fanout for last ECK edge using
fJ_PER_Q_EDGE[k] for Q bit k, all k.
[0099] Combo_edges_saved_fJ: incremented by difference between
observed D edges and Q edges across fanout for last ECK edge.
[0100] ECK_edges_saved_fJ: incremented across fanout clock gates for
all edges that otherwise would have been delivered to their fanout.
[0101] E_fJ: incremented for each E (enable) edge between the last
active ECK edge and the previous active ECK edge.
[0102] Total_fJ, power consumed by the cluster of cells grouped with
the clock gate: incremented by:
[0103] Q edges*(fJ_PER_D_TO_Q_EDGE+fJ_PER_Q_EDGE)
[0104] +fJ_PER_ECK_EDGE[0]
[0105] +total_leakage_fJ, (leakage is calculated based on time as
known in the art)
[0106] Net_fJ, negative representing net energy savings:
leakage_fj+E_fj-q_edges_saved_fj-eck_edges_saved_fj-fanout_saved-combo_saved_fJ.
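By way of non-limiting illustration, the Total_fJ increment listed above may be worked through in Python as follows. The numeric values are assumptions chosen for the example and do not come from the disclosure.

```python
def total_fj_increment(q_edges, fj_per_d_to_q_edge, fj_per_q_edge,
                       fj_per_eck_edge, total_leakage_fj):
    """Amount added to Total_fJ for one resolved transaction, in fJ.

    Mirrors the listed terms: Q edges times the per-edge sequential and
    output energies, plus one ECK edge, plus leakage for the interval.
    """
    return (q_edges * (fj_per_d_to_q_edge + fj_per_q_edge)
            + fj_per_eck_edge
            + total_leakage_fj)


# Assumed example: 3 Q edges, 1.5 fJ per D-to-Q edge, 2.0 fJ per Q edge,
# 4.0 fJ per ECK edge and 0.5 fJ of leakage: 3 * 3.5 + 4.0 + 0.5 fJ.
increment = total_fj_increment(q_edges=3, fj_per_d_to_q_edge=1.5,
                               fj_per_q_edge=2.0, fj_per_eck_edge=4.0,
                               total_leakage_fj=0.5)
```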
[0107] At 810, the current transaction counters are cleared and the
current transaction statistics are reset based on the most recent
outgoing active clock edge detected. Resetting the current
transaction statistics at 810 may, for example, involve: recording
the total CK active edges dropped prior to the current ECK active
edge (this contributes to power savings by this clock gate);
recording total ECK active edges not transmitted to leaf clock
gates when they were otherwise enabled (this contributes to power
savings by this clock gate); recording total E edges prior to the
new ECK edge (this contributes to power cost of this clock gate);
and/or, if protocol applies, recording the number of protocol
required ECK active edges prior to the current ECK active edge
(this determines if the current edge is required by protocol and
whether previous required edges were not delivered).
[0108] At 812, counters for the current transaction are incremented
based on events at the inputs of the clock gate monitor.
Incrementing the current transaction counters at 812 continues as
long as no new outgoing active clock edge is detected (block 802 NO
output). When a new outgoing active clock edge is detected (block
802 YES output), the method 800 returns to 804 and continues as
discussed above.
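By way of non-limiting illustration, the loop of method 800 may be sketched in Python as follows. The sketch is a simplified reading that tracks only pins mapped to the current transaction (CK, E, MUST_ECK and U_ECK) and counts events rather than energy. It further assumes that the monitored clock gate is subject to protocol and has upstream clock gates, and that each ECK event is preceded by the CK event that caused it. All class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TransactionStats:
    ck_edges: int = 0        # CK active edges seen in this transaction
    e_edges: int = 0         # enable (E) edges
    must_eck_edges: int = 0  # protocol-required edge indications
    u_eck_edges: int = 0     # upstream clock gate output edges


@dataclass
class WindowStats:
    transactions: int = 0
    ck_edges_dropped: int = 0
    e_edges: int = 0
    protocol_spurious: int = 0
    protocol_edges_lost: int = 0
    no_transmitter: int = 0


class MonitorSketch:
    def __init__(self):
        self.window = WindowStats()
        self.previous = TransactionStats()
        self.current = TransactionStats()

    def on_event(self, pin):
        """Process one pin event; returns WindowStats on a WINDOW edge."""
        if pin == "ECK":                       # block 802, YES output
            self.previous = self.current       # block 804
            self._resolve_previous()           # blocks 806 and 808
            self.current = TransactionStats()  # block 810
        elif pin == "WINDOW":                  # end of the total window
            finished, self.window = self.window, WindowStats()
            self.previous = TransactionStats()
            self.current = TransactionStats()
            return finished
        elif pin == "CK":                      # block 812 from here down
            self.current.ck_edges += 1
        elif pin == "E":
            self.current.e_edges += 1
        elif pin == "MUST_ECK":
            self.current.must_eck_edges += 1
        elif pin == "U_ECK":
            self.current.u_eck_edges += 1
        else:
            raise ValueError(f"unknown pin: {pin}")
        return None

    def _resolve_previous(self):
        p, w = self.previous, self.window
        w.transactions += 1
        # One CK edge per transaction was propagated rather than dropped.
        w.ck_edges_dropped += max(p.ck_edges - 1, 0)
        w.e_edges += p.e_edges
        if p.must_eck_edges == 0:
            w.protocol_spurious += 1
        elif p.must_eck_edges > 1:
            w.protocol_edges_lost += p.must_eck_edges - 1
        if p.u_eck_edges == 0:
            w.no_transmitter += 1


monitor = MonitorSketch()
for pin in ["MUST_ECK", "U_ECK", "CK", "ECK",
            "CK", "CK", "E", "CK", "ECK"]:
    monitor.on_event(pin)
window_report = monitor.on_event("WINDOW")
```

In this example the first output edge is required by protocol and preceded by an upstream edge, while the second follows two dropped CK edges and one enable edge but has neither a protocol indication nor an upstream edge. A fuller monitor would also record Q pin toggles and downstream edges against the previous transaction and would weight each count by a per-event energy.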
[0109] The examples above include descriptions of ideal clock
gating behavior. This is defined by the functional specification of
the circuit design. Conventionally, comparing ideal design behavior
to actual design behavior is a functional verification exercise.
The clock gate monitor embodiments of the present disclosure
provide functional power analysis by allowing for comparison of the
ideal clock gate behavior with actual clock gate behavior through
dynamic, cycle-by-cycle, clock gate power monitoring and
processing.
[0110] In the preceding description, for purposes of explanation,
numerous details are set forth in order to provide a thorough
understanding of the embodiments. However, it will be apparent to
one skilled in the art that these specific details are not
required. In other instances, well-known electrical structures and
circuits are shown in block diagram form in order not to obscure
the understanding. For example, specific details are not provided
as to whether the embodiments described herein are implemented as a
software routine, hardware circuit, firmware, or a combination
thereof.
[0111] Embodiments of the disclosure can be represented as a
computer program product stored in a machine-readable medium (also
referred to as a computer-readable medium, a processor-readable
medium, or a computer usable medium having a computer-readable
program code embodied therein). The machine-readable medium can be
any suitable tangible, non-transitory medium, including magnetic,
optical, or electrical storage medium including a diskette, compact
disk read only memory (CD-ROM), memory device (volatile or
non-volatile), or similar storage mechanism. The machine-readable
medium can contain various sets of instructions, code sequences,
configuration information, or other data, which, when executed,
cause a processor to perform steps in a method according to an
embodiment of the disclosure. Those of ordinary skill in the art
will appreciate that other instructions and operations necessary to
implement the described implementations can also be stored on the
machine-readable medium. The instructions stored on the
machine-readable medium can be executed by a processor or other
suitable processing device, and can interface with circuitry to
perform the described tasks.
[0112] The above-described embodiments are intended to be examples
only. Alterations, modifications and variations can be effected to
the particular embodiments by those of skill in the art. The scope
of the claims should not be limited by the particular embodiments
set forth herein, but should be construed in a manner consistent
with the specification as a whole.
* * * * *