U.S. patent application number 11/752035 was filed with the patent office on 2007-05-22 and published on 2007-09-20 for method and apparatus for converting globally clock-gated circuits to locally clock-gated circuits.
Invention is credited to Allen P. Haar, Joseph A. Iadanza, Sebastian T. Ventrone, Ivan L. Wemple.
Application Number | 20070220468 11/752035 |
Family ID | 36317786 |
Filed Date | 2007-05-22 |
United States Patent
Application |
20070220468 |
Kind Code |
A1 |
Haar; Allen P.; et al. |
September 20, 2007 |
Method and Apparatus for Converting Globally Clock-Gated Circuits
to Locally Clock-Gated Circuits
Abstract
A method for converting globally clock-gated circuits to locally
clock-gated circuits is disclosed. A timing analysis is initially
performed on an integrated circuit (IC) design to generate a slack
time report for all globally clock-gated circuits within the IC
design. Based on their respective slack time indicated in the slack
time report, all globally clock-gated circuits that should be
connected to locally generated clocks are identified. After
disconnecting from a global clock tree, each of the identified
globally clock-gated circuits is subsequently connected to a
locally generated clock having a clock delay comparable to its
slack time indicated in the slack time report.
Inventors: |
Haar; Allen P.; (State College, PA); Iadanza; Joseph A.; (Hinesburg, VT);
Ventrone; Sebastian T.; (South Burlington, VT); Wemple; Ivan L.;
(Shelburne, VT) |
Correspondence
Address: |
DILLON & YUDELL LLP
8911 N. CAPITAL OF TEXAS HWY.,
SUITE 2110
AUSTIN
TX
78759
US
|
Family ID: |
36317786 |
Appl. No.: |
11/752035 |
Filed: |
May 22, 2007 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10904397 | Nov 8, 2004 | |
11752035 | May 22, 2007 | |
Current U.S. Class: | 716/103; 716/108; 716/133 |
Current CPC Class: | G06F 30/30 20200101 |
Class at Publication: | 716/006 |
International Class: | G06F 17/50 20060101 G06F017/50 |
Claims
1. A method for converting globally clock-gated circuits to locally
clock-gated circuits within an integrated circuit design, said
method comprising: identifying globally clock-gated circuits that
change state infrequently; and converting said identified globally
clock-gated circuits to corresponding locally clock-gated
circuits.
2. The method of claim 1, wherein said method further includes
assigning said identified globally clock-gated circuits to a
plurality of groups according to their slack time indicated in a
slack time report.
3. The method of claim 2, wherein said method further includes
providing said identified globally clock-gated circuits with
respective locally generated clocks having a clock delay comparable
to their slack time indicated in said slack time report.
4. The method of claim 1, wherein said method further includes
performing a timing analysis on said integrated circuit design
after said identified globally clock-gated circuits have been
connected to said respective locally generated clocks.
5. The method of claim 1, wherein said method further includes
determining whether or not a globally clock-gated circuit should be
converted to a locally clock-gated circuit.
6. The method of claim 5, wherein said determining further includes
utilizing a logic circuit netlist, a switching factor, and a
switching factor threshold to determine whether or not a globally
clock-gated circuit should be converted to a locally clock-gated
circuit.
Description
RELATED PATENT APPLICATION
[0001] The present application is a divisional of U.S. Patent
application Ser. No. 10/904,397 (Atty. Docket No. BUR920040011US1),
filed on Nov. 8, 2004, and entitled, "Method and Apparatus for
Converting Globally Clock-Gated Circuits to Locally Clock-Gated
Circuits," which is incorporated herein by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Technical Field
[0003] The present invention relates to integrated circuit design
methods in general, and, in particular, to a method for assigning
clock-gated circuits within an integrated circuit design. Still
more particularly, the present invention relates to a method for
converting globally clock-gated circuits to locally clock-gated
circuits within an integrated circuit design.
[0004] 2. Description of Related Art
[0005] A digital integrated circuit (IC) design typically employs
many clock-gated circuits, such as flip-flops, latches, etc., that
are periodically clocked by edges of a clock signal. Since there is
a very large number (thousands or millions) of clock-gated circuits
within an IC design, a single clock signal driver normally cannot
directly supply a clock signal to all of the clock-gated circuits.
Instead, a global clock tree having a set of buffers arranged in a
tree-like network is utilized to supply clock signals to various
clock-gated circuits. All circuits clocked by a global clock tree
are considered as globally clock-gated circuits.
[0006] In order to ensure proper synchronization between various
parts of a circuit design, each clock signal edge should reach all
synchronization points at substantially the same time. Thus, the
time required for a clock signal edge to travel from its source to
any clock-gated circuit should be substantially the same for all
paths it follows through the global clock tree. The time required
for a clock signal edge to work its way through the global clock
tree from its source to a globally clock-gated circuit depends on
many factors, such as the lengths of conductors in the path, the
number of buffers the edge must pass through, the switching delay
of each buffer, the amount of attenuation the clock signal
incurs between buffer stages, and the load each buffer must drive.
Accordingly, the global clock tree needs to be balanced by ensuring
that all clock signal paths between any two tree levels are of
substantially similar length and impedance, that all buffers at any
level of the global clock tree drive the same number of buffers or
globally clock-gated circuits at the next level of the global clock
tree, and that all buffers on any given level have similar
characteristics.
[0007] Generally speaking, global clock trees consume a relatively
large amount of power. Global clock trees typically account for
approximately 30-60% of the total power consumption of an IC
design. In addition, the clocking of a global clock tree requires a
rigid boundary between pipeline stages such that all logic must
line up upon the boundaries. Thus, the ability to improve
performance either in the current pipeline stage or in the next
pipeline stage becomes locked to the clock boundary. The present
disclosure provides a method for reducing overall clocking power
consumption of an IC design such that additional flexibility in
clock management can be achieved.
SUMMARY OF THE INVENTION
[0008] In accordance with a preferred embodiment of the present
invention, a timing analysis is initially performed on an
integrated circuit (IC) design to generate a slack time report for
all globally clock-gated circuits within the IC design. Based on
their respective slack time indicated in the slack time report, all
globally clock-gated circuits that should be connected to locally
generated clocks are identified. After disconnecting from a global
clock tree, each of the identified globally clock-gated circuits is
subsequently connected to a locally generated clock having a clock
delay comparable to its slack time indicated in the slack time
report.
[0009] All features and advantages of the present invention will
become apparent in the following detailed written description.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The invention itself, as well as a preferred mode of use,
further objects, and advantages thereof, will best be understood by
reference to the following detailed description of an illustrative
embodiment when read in conjunction with the accompanying drawings,
wherein:
[0011] FIG. 1 is a block diagram of a conventional global clock
tree for providing a common clock signal input to globally
clock-gated circuits within an integrated circuit;
[0012] FIG. 2 is a high-level logic flow diagram of a method for
converting globally clock-gated circuits to locally clock-gated
circuits, in accordance with a preferred embodiment of the present
invention;
[0013] FIG. 3 is a block diagram of a locally generated clock
connected to two locally clock-gated circuits, in accordance with a
preferred embodiment of the present invention;
[0014] FIG. 4 is a high-level logic flow diagram of a method for
determining whether or not a globally clock-gated circuit should be
converted to a locally clock-gated circuit, in accordance with a
preferred embodiment of the present invention; and
[0015] FIG. 5 is a block diagram of a computer system in which a
preferred embodiment of the present invention is incorporated.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
[0016] Referring now to the drawings and in particular to FIG. 1,
there is depicted a block diagram of a conventional global clock
tree for providing a common clock signal input to clock-gated
circuits, such as flip-flops or latches, within an integrated
circuit (IC). As shown, a global clock tree 10 includes an array of
buffers 12-13 to fan out a CLOCK signal generated from a clock
signal source 11. Typically, global clock tree 10 is locked tightly
to a specific frequency with virtually zero jitter and clock drift
across an entire IC design. In the embodiment shown in FIG. 1, two
first stage buffers 12 fan the CLOCK signal out to four second
stage buffers 13 that, in turn, fan the CLOCK signal out to
thirty-two sinks 14. The number of buffer stages, the number of
buffers per stage and the number of buffers or sinks each buffer
drives are matters of design choice that depend on factors such as
load capacity of buffers forming global clock tree 10, input
impedance of the devices being driven, path impedances and
allowable signal attenuation between stages, etc.
[0017] Many circuits in the digital portion of an IC design change
their logic states very infrequently but continue to be clocked in
a synchronous fashion by a high-power clock tree, such as global
clock tree 10 in FIG. 1, on every clock cycle. Such practice adds
to unnecessary power consumption in clock distributions and latch
activities. The present invention allows some globally clock-gated
circuits within an IC design that switch infrequently to be
converted to locally clock-gated circuits (i.e., using a locally
generated delay clock). By reducing the number of circuits that
switch simultaneously on the high-power clock tree or global clock
tree within an IC design, power consumption and chip noise can both
be reduced.
[0018] Although the localized delay clock still consumes power, an
overall power reduction can be achieved if the new clock topology
(i.e., one with a smaller global clock tree and the locally
generated clock circuits) demands less power than the original
unmodified global clock tree. Another advantage of reducing the
number of globally clock-gated circuits locked to a global clock
tree is that the launch noise of the set of globally clock-gated
circuits driven on the global clock tree can also be reduced.
Basically, the amount of simultaneous noise is reduced via a
frequency spectrum spreading, which is an effect of using localized
delay clocking.
[0019] With reference now to FIG. 2, there is illustrated a
high-level logic flow diagram of a method for converting globally
clock-gated (or synchronous) circuits to locally clock-gated
circuits, in accordance with a preferred embodiment of the present
invention. Starting at block 21, a synchronous IC design having
multiple globally clock-gated circuits, such as latches,
flip-flops, etc., is simulated using functional test vectors that
are deemed to cover a wide range of normal operating conditions. If
no functional test vectors are available, the synchronous IC design
may be simulated using automatic test pattern generation (ATPG)
vectors. In either case, a logic circuit is formed with simulation
results for the IC design in question. A timing analysis is then
performed on the synchronous IC design, as shown in block 22.
[0020] Based on the result of the timing analysis, each globally
clock-gated circuit is categorized in a respective group according
to its slack time, as depicted in block 23. For the purpose of the
present invention, slack time is defined to include the amount of
time margin for a globally clock-gated circuit to receive an input
signal, and the amount of time margin for the globally clock-gated
circuit to deliver an output signal to another circuit. Each
globally clock-gated circuit can be generally placed under a
positive slack time group or a negative slack time group according
to the timing analysis. Globally clock-gated circuits with a
positive slack time are defined as globally clock-gated circuits
that are able to complete their switch operation before their
allocated time under the IC design specification. Each globally
clock-gated circuit in the positive slack time group is then
further categorized according to a specific range of slack time
under which the globally clock-gated circuit falls.
[0021] For the globally clock-gated circuits with a positive slack
time, a process is performed to identify all the globally
clock-gated circuits that can be connected to a locally generated
clock, as shown in block 24. This process is explained in further
detail with reference to FIG. 4.
[0022] A locally generated clock is generated for each slack time
range, as depicted in block 25. For example, a slack time of 1 ns
to 10 ns can be divided into three ranges, with range 1 for slack
time from 1 to less than 4 ns, range 2 for slack time from 4 to
less than 7 ns, and range 3 for slack time from 7 to less than 10
ns (the above-mentioned slack times include both input and output
timing margins). In order to accommodate the three slack time
ranges, three locally generated clocks are then generated, with the
first one designed for slack time range 1, the second one designed
for slack time range 2 and the third one designed for slack time
range 3.
[0023] Each globally clock-gated circuit that has been identified
for connecting to a locally generated clock is then disconnected
from a global clock tree and connected to a locally generated
clock for the specific range of slack time under which the globally
clock-gated circuit falls, as shown in block 26. For example, if a
globally clock-gated circuit has been identified (from block 24)
for connecting to a locally generated clock, and the globally
clock-gated circuit has been determined (from block 22) to have a
slack time of 5 ns, the globally clock-gated circuit is then
disconnected from a global clock tree and connected to a locally
generated clock designed for slack time from 4 to less than 7 ns.
In some instances, manual adjustments to the circuit delays
associated with locally generated delay clocks may be required.
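As an illustration only, the range assignment described in paragraphs [0022] and [0023] can be sketched in Python. The helper names make_ranges and range_number are hypothetical and do not appear in the disclosure, and equal-width, half-open ranges are assumed because the 1 ns to 10 ns example uses them.

```python
def make_ranges(low_ns, high_ns, count):
    """Split [low_ns, high_ns) into `count` equal, half-open slack ranges."""
    width = (high_ns - low_ns) / count
    return [(low_ns + i * width, low_ns + (i + 1) * width) for i in range(count)]


def range_number(slack_ns, ranges):
    """Return the 1-based number of the range containing slack_ns, else None."""
    for number, (lower, upper) in enumerate(ranges, start=1):
        if lower <= slack_ns < upper:  # "from lower to less than upper"
            return number
    return None  # slack falls outside every range; circuit stays on the global tree


# Three ranges over 1 ns to 10 ns, one locally generated clock per range.
ranges = make_ranges(1.0, 10.0, 3)

# A circuit with 5 ns of slack maps to range 2 (4 ns to less than 7 ns).
print(range_number(5.0, ranges))
```

Returning None for an out-of-range slack is a design choice of this sketch; the disclosure does not say how such circuits are handled beyond leaving unidentified circuits on the global clock tree.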
[0024] After the completion of the synthesis, placement and wiring,
etc., a timing analysis is performed on the entire IC design again,
as shown in block 27. This timing analysis is performed to
ensure that, after the above-mentioned clock modification, the
entire IC design functions as intended and the timing specification
of the entire IC design is satisfied.
[0025] A determination is made as to whether or not the IC design
meets the timing requirement, as shown in block 28. If the IC
design does not meet the timing requirement, the process returns to
block 23 for a different slack time grouping. Otherwise, if the IC
design meets the timing requirement, the process is complete.
[0026] Referring now to FIG. 3, there is depicted a block diagram
of a locally generated clock connected to two locally clock-gated
circuits, in accordance with a preferred embodiment of the present
invention. As shown, a local clock generator 31 is connected to
locally clock-gated circuits 32 and 33 (both clock-gated circuits
32 and 33 were formerly globally clock-gated circuits connected to
a global clock tree) via two different groups of delay elements.
For example, locally clock-gated circuit 32 receives clock signals
from local clock generator 31 via two delay elements, and locally
clock-gated circuit 33 receives clock signals from local clock
generator 31 via three delay elements.
[0027] In the generation of delayed clocks that are routed within
an IC design, each delayed clock must fall within the required
timing specification to guarantee the slack time for the entire
process range of the technology. If the delay chain is generated in
an open ended fashion where a source clock (from a local clock
generator) is injected at the beginning of the delay chain and
delayed clocks are tapped off from the delay chain, each stage of
the delay chain is more susceptible to process, voltage, and
temperature variation than the previous stage because each tapped
delay is additive. To provide low jitter for each tap of the tapped
delay line, the delay line may be closed with feedback in a ring
fashion and a master source clock may be used as a reference
comparison to the delay chain input. The master source clock and
feedback input to the first stage of the delay chain can be
compared to align with one another. If the two clocks do not align,
tail currents can be added or subtracted equally to each stage of
the delay chain until the two clocks align. Such calibration
procedure allows for multiple delay chains to be calibrated to a
single master source clock and provide a solution where each
delayed clock phase used on the IC design has comparable
jitter.
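The additive behavior of an open-ended delay chain can be illustrated with a small Python sketch. It assumes, purely for illustration, that every stage has the same nominal delay and deviates from it by the same fraction under a given process, voltage, and temperature condition. The function name tap_delays and the numeric values are hypothetical.

```python
def tap_delays(stage_delay_ns, stages, variation=0.0):
    """Delay seen at each tap of an open-ended chain.

    `variation` is an assumed fractional deviation applied to every stage
    (0.10 means each stage is 10% slower than nominal).
    """
    actual_stage = stage_delay_ns * (1.0 + variation)
    # Tap k sits after k stages, so its delay is the sum of k stage delays.
    return [actual_stage * k for k in range(1, stages + 1)]


nominal = tap_delays(1.0, 3)
slow = tap_delays(1.0, 3, variation=0.10)

# The absolute error grows with each tap because the stage errors add up.
errors = [s - n for s, n in zip(slow, nominal)]
for tap, err in enumerate(errors, start=1):
    print(f"tap {tap}: error {err:.2f} ns")
```

The closed-loop calibration described above is not modeled here; the sketch only shows why later taps of an uncalibrated chain drift further from their nominal delay.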
[0028] In order to determine whether or not a globally clock-gated
circuit should be converted to a locally clock-gated circuit, four
inputs are preferably utilized: a logic circuit netlist, a
switching factor associated with the clock-gated circuit, a
switching factor threshold, and "don't touch" markers.
[0029] The "switching factor" for a data input to a globally
clock-gated circuit is derived from two values in the simulation
results: (1) the total number of clock-signal switches present at
the globally clock-gated circuit, and (2) the total number of data
input switches present at the same globally clock-gated circuit.
The switching factor is the ratio of data input switches to
clock-signal switches within the same time interval.
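A minimal Python sketch of this ratio follows. The function name switching_factor and the sample counts are assumptions made for illustration; in practice the counts would come from the simulation results.

```python
def switching_factor(data_switches, clock_switches):
    """Ratio of data-input switches to clock-signal switches in one interval."""
    if clock_switches <= 0:
        raise ValueError("clock_switches must be positive")
    if data_switches < 0:
        raise ValueError("data_switches must not be negative")
    return data_switches / clock_switches


# Example: 50 data-input switches observed over 1000 clock-signal switches.
factor = switching_factor(50, 1000)
print(f"{factor:.2f}")
```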
[0030] A user-specified "switching factor threshold" may be used to
indicate which globally clock-gated circuits should be converted to
corresponding locally clock-gated circuits. Specifically,
clock-gated circuits whose data-input switching factors exceed the
switching factor threshold are targeted for conversion. The
switching factor threshold may be selected by a user to be any
value between 0 and 1, although it may not be advisable, for
example, to use a switching factor threshold greater than 0.5.
[0031] A circuit designer may desire to override the conversion
process for any globally clock-gated circuit within an IC design. A
don't touch marker can be applied to any globally clock-gated
circuit within an IC design that is intended to remain connected to
a global clock tree (instead of being connected to a localized
delay clock).
[0032] With reference now to FIG. 4, there is illustrated a
high-level logic flow diagram of a method for determining whether
or not a globally clock-gated circuit should be converted to a
locally clock-gated circuit, in accordance with a preferred
embodiment of the present invention. Starting at block 41, a
determination is made as to whether or not a globally clock-gated
circuit is a "don't touch" circuit (i.e., whether or not a "don't
touch" marker has been applied), as shown in block 42. If the
globally clock-gated circuit is not a "don't touch" circuit, then a
determination is made as to whether or not a switching factor of
the globally clock-gated circuit is greater than a predetermined
switching factor threshold, as shown in block 43. Each globally
clock-gated circuit in the IC design is checked for a "don't touch"
marker in this manner. Any globally clock-gated circuit marked
"don't touch" is left unchanged.
[0033] If the switching factor of the globally clock-gated circuit
is greater than the predetermined switching factor threshold, then
the globally clock-gated circuit is converted to a corresponding
locally clock-gated circuit, as shown in block 44. The globally
clock-gated circuit can be converted to a corresponding locally
clock-gated circuit by disconnecting the globally clock-gated
circuit from a global clock tree and connecting the globally
clock-gated circuit to a locally generated delay clock. Otherwise,
if the switching factor of the globally clock-gated circuit is not
greater than the predetermined switching factor threshold, the
process proceeds to block 45.
[0034] As depicted in block 45, a determination is made as to
whether or not there is any other globally clock-gated circuit left
to be processed. If there is a globally clock-gated circuit left to
be processed, the process returns to block 42. Otherwise, if there
is no globally clock-gated circuit left to be processed, the
process is completed, as shown in block 46.
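The flow of FIG. 4 can be sketched in Python as follows. This is an illustrative sketch rather than the disclosed implementation: the GatedCircuit record, the function select_for_conversion, and the sample values are hypothetical, and the comparison follows the description, which converts circuits whose switching factor is greater than the threshold.

```python
from dataclasses import dataclass


@dataclass
class GatedCircuit:
    name: str
    switching_factor: float
    dont_touch: bool = False


def select_for_conversion(circuits, threshold):
    """Return names of circuits to move from the global tree to a local clock."""
    selected = []
    for circuit in circuits:                      # block 45: repeat until none left
        if circuit.dont_touch:                    # block 42: marked circuits stay
            continue
        if circuit.switching_factor > threshold:  # block 43: compare to threshold
            selected.append(circuit.name)         # block 44: convert
    return selected


circuits = [
    GatedCircuit("ff_a", 0.30),
    GatedCircuit("ff_b", 0.02),
    GatedCircuit("ff_c", 0.40, dont_touch=True),
]
print(select_for_conversion(circuits, threshold=0.10))
```

The function only selects circuits; the actual disconnection from the global clock tree and connection to a locally generated delay clock would be carried out on the netlist by the design tools.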
[0035] As has been described, the present invention provides a
method and apparatus for converting globally clock-gated circuits
to locally clock-gated circuits. In essence, all globally
clock-gated circuits with a switching factor greater than a
switching factor threshold are converted to corresponding locally
clock-gated circuits, and globally clock-gated circuits with a
switching factor less than or equal to the switching factor
threshold are left unchanged. Once all the globally clock-gated
circuits that were targeted for conversion have been converted,
simulation is again performed on the entire IC design, with a focus
on the locally clock-gated circuit cuts.
[0036] By allowing the set of clocks to be generated based upon
actual layout and timing reports, the noise spectrum can be spread
in such a way as to minimize the overall effect on the more
timing-critical paths, and to reduce noise due to coupling and
simultaneous switching. In addition, by maximizing the number of
local clocks versus the total number of global synchronously
generated clocks, the overall power consumption can be reduced.
[0037] Referring now to FIG. 5, there is depicted a block diagram
of a computer system in which a preferred embodiment of the present
invention is incorporated. As shown, a computer system 50 includes
a processor 51, a system memory 52 and a hard drive 55. Processor
51 executes instructions and operates on data that are stored in system
memory 52. In addition, computer system 50 includes input devices 53,
such as a keyboard and a mouse, and output devices 54, such as a
display monitor and a printer.
[0038] Although the present invention has been described in the
context of a fully functional computer system, those skilled in the
art will appreciate that the mechanisms of the present invention
are capable of being distributed as a program product in a variety
of forms, and that the present invention applies equally regardless
of the particular type of signal bearing media utilized to actually
carry out the distribution. Examples of signal bearing media
include, without limitation, recordable type media such as floppy
disks or CD ROMs and transmission type media such as analog or
digital communications links.
[0039] While the invention has been particularly shown and
described with reference to a preferred embodiment, it will be
understood by those skilled in the art that various changes in form
and detail may be made therein without departing from the spirit
and scope of the invention.
* * * * *