Method And System For Grouping Logic In An Integrated Circuit Design To Minimize Number Of Transistors And Number Of Unique Geometry Patterns Moe; Matthew D. ; et al. [Hersan; Thiago]

Patent Application Summary

U.S. patent application number 12/938226 was filed with the patent office on 2011-03-03 for method and system for grouping logic in an integrated circuit design to minimize number of transistors and number of unique geometry patterns. Invention is credited to Thiago Hersan, Veerbhan Kheterpal, Matthew D. Moe, Dipti Motiani, Lawrence T. Pileggi, Vyacheslav V. Rovner.

Application Number	20110050281 12/938226
Document ID	/
Family ID	43016084
Filed Date	2011-03-03

United States Patent Application	20110050281
Kind Code	A1
Moe; Matthew D. ; et al.	March 3, 2011

METHOD AND SYSTEM FOR GROUPING LOGIC IN AN INTEGRATED CIRCUIT DESIGN TO MINIMIZE NUMBER OF TRANSISTORS AND NUMBER OF UNIQUE GEOMETRY PATTERNS

Abstract

A method and system are described to group logic terms at a higher level of abstraction than that found using standard cells to implement the logic functions using a reduced number of transistors, and to reduce the total number of unique geometry patterns needed to create the integrated circuit implementation. By grouping the logic functions in terms of a larger number of literals (logic variable inputs), the functions can be implemented in terms of a number of transistors that is often less and no more than equal to that which is required for implementing the same function with a number of logic primitives, or simpler standard logic cells. The optimized transistor level designs are further optimized and physically constructed to reduce the total number of unique geometry patterns required to implement the integrated circuit.

Inventors:	Moe; Matthew D.; (Pittsburgh, PA) ; Pileggi; Lawrence T.; (Pittsburgh, PA) ; Rovner; Vyacheslav V.; (Pittsburgh, PA) ; Hersan; Thiago; (Pittsburgh, PA) ; Motiani; Dipti; (Santa Clara, CA) ; Kheterpal; Veerbhan; (Sunnyvale, CA)
Family ID:	43016084
Appl. No.:	12/938226
Filed:	November 2, 2010

Related U.S. Patent Documents


Application Number	Filing Date	Patent Number
11969214	Jan 3, 2008	7827516
12938226
11619587	Jan 3, 2007	7784013
11969214
60883332	Jan 3, 2007

Current U.S. Class:	326/38
Current CPC Class:	G06F 30/30 20200101
Class at Publication:	326/38
International Class:	H03K 19/173 20060101 H03K019/173

Claims

1. A logic circuit that includes a logic brick that implements a non-standard complex Boolean logic function that has at least three inputs, the logic circuit made by: using a computer to determine a circuit that implements the non-standard complex Boolean logic function, the determining including identifying transistors, associated connections and the at least three inputs to implement the circuit, the identifying reducing a number of the transistors to be a fewest possible that satisfy predetermined logic, layout and electrical constraints, and wherein the determining restricts the circuit to a stack depth of no more than 3; determining a layout for the circuit to specify the logic brick using the computer; and, using the logic brick layout to implement the logic circuit.

2. A logic circuit made according to claim 1, wherein the determining uses a minimal negative gate algorithm.

3. A logic circuit that includes a logic brick that implements a non-standard complex Boolean logic function that has at least three inputs, the logic circuit made by: determining a circuit that implements the non-standard complex Boolean logic function using the computer, the determining including identifying transistors, associated connections and the at least three inputs to implement the circuit, the identifying reducing a number of the transistors to be a fewest possible that satisfy predetermined logic, layout and electrical constraints, and wherein the determining the circuit uses a recursive decomposition to select an output function for the circuit, and wherein a stack height of the output function is no more than 2; determining a layout for the circuit to specify the logic brick using the computer; and, using the logic brick layout to implement the logic circuit.

4. A logic circuit made according to claim 3 wherein the determining uses a recursive decomposition and a template matching, wherein the template matching requires that the circuit be substantially obtained from design templates used in the template matching, and wherein each of the design templates is restricted to having a stack depth of no more than 3.

5. A logic circuit made according to claim 3 wherein the identifying reduces a number of the transistors to be the fewest possible after the determining (a) uses a minimal gate algorithm, (b) finds a set of Don't Cares that minimizes transistor count, and (c) ensures that the selected transistors are achieved at or below a pre-specified stack height restriction.

6. A logic circuit made according to claim 3 wherein one of the predetermined electrical constraints is stack height, one of the predetermined logic constraints is a selected type of logic.

7. A logic circuit made according to claim 6 wherein one of the predetermined logic constraints is a logic family that does not include pass transistors.

8. A logic circuit made according to claim 6 wherein one of the predetermined layout constraints is using a merged diffusion region for at least some of the transistors.

9. A logic circuit made according to claim 3 wherein one of the predetermined layout constraints is using a merged diffusion region for at least some of the transistors.

10. A logic circuit made according to claim 9 wherein one of the predetermined logic constraints is a logic family that does not include pass transistors.

11. A computer-designed, transistor-implemented logic circuit, comprising: a plurality of interconnected transistors formed from overlapping polysilicon and diffusion patterns, said interconnected transistors corresponding to a plurality of Transistor Level Bricks (TL bricks), each TL brick corresponding to a set of transistor-based logical gates synthesized from a logical representation of a non-standard complex Boolean logic function; wherein each TL brick is limited to a stack depth of no more than three.

12. A logic circuit, as defined in claim 11, wherein each TL brick is limited to an output stack height of no more than two.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] The present invention is a continuation of U.S. application Ser. No. 11/969,214 filed Jan. 3, 2008 entitled "A Method and System for Grouping Logic in an Integrated Circuit Design to Minimize Number of Transistors and Number of Unique Geometry Patterns" which is a continuation-in-part of U.S. application Ser. No. 11/619,587 filed Jan. 3, 2007 entitled "Method for the Definition of a Library of Application-Domain-Specific Logic Cells" which claims priority to U.S. Provisional Application No. 60/883,332 filed Jan. 3, 2007 entitled "A Method and System For Grouping Logic in an Integrated Circuit Design to Minimize Number of Transistors and Number of Unique Geometry Patterns," all of which are incorporated by reference herein in their entirety.

FIELD OF THE INVENTION

[0002] The present invention relates to a method and system for grouping logic in an integrated circuit design to minimize number of transistors and number of unique geometry patterns.

BACKGROUND OF THE RELATED ART

[0003] Carnegie Mellon University has published research that describes grouping of logic into macro-regular "bricks" to allow the use of pushed design rules within the bricks. U.S. Pat. No. 7,278,118 entitled "Method and Process For Design of Integrated Circuits Using Regular Geometry Patterns to Obtain Geometrically Consistent Component Features" describes other aspects of such bricks.

[0004] While the invention in the '118 patent and other pending applications of the current assignee set forth advantageous aspects relating to the creation of such logic bricks, refinements and advances continue, and some of those are described herein.

SUMMARY OF THE INVENTION

[0005] The present invention relates to a method and system for grouping logic in an integrated circuit design to minimize number of transistors and number of unique geometry patterns.

[0006] In one aspect, there is described a method of determining a logic brick that contains a non-standard complex Boolean logic function that has at least three inputs that includes determining a circuit that implements the non-standard complex Boolean logic function, the step of determining including the step of identifying transistors, associated connections and the at least three inputs to implement the circuit, the step of identifying reducing a number of the transistors to be a fewest possible that satisfy predetermined logic, layout and electrical constraints; and determining a layout for the circuit to specify the logic brick.

[0007] In a preferred embodiment the step of determining the circuit can have a number of different aspects, examples of which include:

[0008] restricting the circuit to a stack depth of no more than 3;

[0009] using a minimal negative gate algorithm;

[0010] using a recursive decomposition to select an output function for the circuit wherein a stack height of the output function is no more than 2;

[0011] using a recursive decomposition and a template matching, wherein the template matching requires that the circuit is substantially obtained from design templates used in the template matching, and wherein each of the design templates is restricted to having a stack depth of no more than 3.

[0012] In another aspect, the invention reduces a number of the transistors to be the fewest possible.

BRIEF DESCRIPTION OF THE DRAWINGS

[0013] These and other aspects and features of the present invention will become apparent to those of ordinary skill in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures, wherein:

[0014] FIG. 1 is an overview flow diagram of the present invention;

[0015] FIG. 2 illustrates an example of a logic function implemented with standard cell functions;

[0016] FIG. 3 illustrates an example of the logic function of FIG. 2 implemented with a transistor level optimized brick according to one embodiment of the present invention;

[0017] FIGS. 4(a)-(b) illustrate footprints of a conventional standard cell and a transistor level optimized brick according to the present invention, respectively;

[0018] FIG. 5 illustrates a conventional fixed pitch polysilicon fabric;

[0019] FIGS. 6(a)-(b) illustrate two examples of circuits that have a different stack depth;

[0020] FIG. 7 shows an overview of one transistor level synthesis algorithm flowchart according to one embodiment of the present invention;

[0021] FIG. 8 illustrates a minimal gate transformation according to one embodiment of the present invention;

[0022] FIG. 9 illustrates encoding using a directed graph according to one embodiment of the present invention;

[0023] FIG. 10 illustrates an optimized standard cell implementation obtained using a conventional design process;

[0024] FIG. 11 illustrates an optimized standard cell implementation obtained using a transistor level synthesis algorithm according to the embodiment of the present invention described with respect to FIG. 7 above;

[0025] FIG. 12 illustrates recursive decomposition according to one embodiment of the present invention;

[0026] FIG. 13 illustrates examples of Boolean functions recursively decomposed into sub-functions that drive an output function according to the present invention;

[0027] FIG. 14 illustrates an overview of the algorithm that recursively decomposes Boolean functions according to the present invention;

[0028] FIG. 15 shows an overview of one transistor level synthesis algorithm flowchart according to another embodiment of the present invention;

[0029] FIG. 16 shows an overview of one transistor level synthesis algorithm flowchart according to a further embodiment of the present invention;

[0030] FIGS. 17(a)-(b) illustrate two H-tree functions that are used as design templates according to the present invention; and

[0031] FIG. 18 illustrates an undesired high capacitance template.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0032] IC design with increased regularity for manufacturability can incur a penalty in terms of area and performance. It is important, therefore, to exploit this regularity with circuits and methodologies that can overcome some or all of these penalties. Exploiting the large logic functions that are grouped for macro-regularity to reduce the number of transistors required to perform one or more logic functions on the chip can provide a significant improvement in both area and performance. Improvements in IC area and performance are of great commercial value in all IC application domains.

[0033] A method and system are described that assist with obtaining this regularity, and in one aspect to group logic terms at a higher level of abstraction than that found using standard cells to implement the logic functions using a reduced number of transistors, and to reduce the total number of unique geometry patterns needed to create the integrated circuit implementation. This description is considered in conjunction with U.S. application Ser. No. 11/619,587 filed Jan. 3, 2007 entitled "Method for the Definition of a Library of Application-Domain-Specific Logic Cells" and U.S. Pat. No. 7,278,118 entitled "Method and Process For Design of Integrated Circuits Using Regular Geometry Patterns to Obtain Geometrically Consistent Component Features," filed Nov. 5, 2005, which applications are hereby expressly incorporated by reference herein. By geometry patterns is meant an arrangement of patterns for the masks which define a physical implementation of transistors, logic cells, logic bricks, etc. The area covered by such patterns can be of any size or shape, but for this invention we are referring to the set of patterns which would lie within a circle that defines the range of influence between patterns for lithography (e.g. impacts OPC and RETs) or electrical interaction (e.g. stress).

[0034] Referring to FIG. 1, this invention begins with taking one or more system level netlists, such as those described at the RTL (register transfer level), and deriving the set of non-standard complex Boolean logic functions, herein referred to as logic bricks. One possible objective would be to find the set of non-standard complex Boolean logic functions that implement the design to meet all specifications, but with the fewest number of such logic bricks. One goal of such a methodology is to reduce the total number of unique geometry patterns required to implement the system or design. In such a design flow, these logic bricks can be physically implemented using a number of standard cells, but the invention described herein optimally implements these logic bricks by finding the best transistor level topology, sizing and interconnections that will provide for a superior power, delay and/or physical area brick implementation.

[0035] Specifically with reference to the flowchart in FIG. 1, the library contained in Step 120 can be a complete standard cell library or a limited set of standard cells or logic primitives that form a sufficient set to derive the set of large logic functions in Step 160. In Step 130 the RTL netlist or netlists of Step 110 are synthesized using a logic synthesis tool and the library from Step 120 to produce a netlist in Step 140 in terms of the logic gates from the library of Step 120. At this point the number of transistors used to implement the netlist or netlists can be counted in Step 150. One or more logic gates can then be grouped into larger functions (sometimes referred to as bricks) in Step 160. Our preferred method for Step 160 is detailed in the U.S. application Ser. No. 11/619,587, referred to previously. The individual TL bricks can then each be synthesized directly into a transistor level implementation using TranSynth as part of Step 170. In Step 180 the newly re-synthesized bricks are substituted for the equivalent logic gates in the original netlist. In Step 190 the number of transistors in the netlist is once again counted. The netlist at Step 180 will have fewer transistors than the netlist at Step 140.

[0036] With this overall description, further particulars will now be provided.

[0037] Brick Discovery is the process of finding a limited set of Boolean functions, some of which are non-standard complex Boolean logic functions, that most efficiently implement a design. Integral to this process is the evaluation of a single Boolean function in terms of transistor level efficiency. Transistor Level Synthesis (henceforth referred to as TranSynth) is the process by which a Logical Brick, which is the logic and/or physical representation of a non-standard complex Boolean logic function, is transformed into a set of transistor-based logical gates, called a Transistor Level Brick (henceforth referred to as a TL Brick). The transformation process includes gate level synthesis, netlist generation and transistor sizing. Once transformed, these TL Bricks can be evaluated in terms of performance, area and power. Without these runtime evaluations during Brick Discovery, the chosen Logical Bricks could result in a design implementation that is significantly inferior to that which is otherwise possible.

[0038] By grouping the logic functions into bricks that are based on functions with a large number of literals (input logic variables), the TL brick implementation of those functions can be implemented in terms of a number of transistors that is often less than, and guaranteed to be no worse than equal to, that which is required for implementing the same function with a number of logic primitives, such as standard cells. As an example, consider the logic function A(DE+FG)+BC, which is an example of a 7-input logic brick. This function can be implemented very efficiently with standard cells, using two AO22s (standard cell AND-ORs). The transistor-level schematic for this standard-cell-based implementation is shown in FIG. 2. It comprises 20 transistors and requires 4 stages of logic to implement the function A(DE+FG)+BC.

[0039] In contrast, consider the transistor-level optimized implementation of a single TL brick for the same function, A(DE+FG)+BC, as shown in schematic form in FIG. 3. This implementation requires only 16 transistors, and only 2 stages of logic.
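As a quick sanity check on the standard-cell side of this comparison, the following Python sketch confirms that two cascaded AO22 cells compute A(DE+FG)+BC and tallies the 20 transistors. The function names are illustrative only, and the per-cell count assumes that each AO22 is built as an 8-transistor AOI22 followed by a 2-transistor inverter, which is a common construction but is not stated in this description. The 16-transistor topology of FIG. 3 is not reproduced here.

```python
from itertools import product

# Assumption: an AO22 cell is an AOI22 (8 transistors) plus an output inverter (2).
AO22_TRANSISTORS = 8 + 2


def ao22(a, b, c, d):
    """AND-OR cell: (a AND b) OR (c AND d)."""
    return (a & b) | (c & d)


def target(A, B, C, D, E, F, G):
    """The example function A(DE+FG)+BC."""
    return (A & ((D & E) | (F & G))) | (B & C)


def two_ao22(A, B, C, D, E, F, G):
    """Standard-cell style implementation: one AO22 feeding a second AO22."""
    inner = ao22(D, E, F, G)      # DE + FG
    return ao22(A, inner, B, C)   # A(DE+FG) + BC


# Exhaustively compare both forms over all 2**7 input combinations.
equivalent = all(
    target(*bits) == two_ao22(*bits) for bits in product((0, 1), repeat=7)
)
total_transistors = 2 * AO22_TRANSISTORS

print(equivalent, total_transistors)
```

Under the stated assumption, each AO22 also contributes two gate stages (AOI22 then inverter), which is consistent with the 4 stages quoted for FIG. 2.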

[0040] This optimization of logic functions, such as logical or TL bricks, can be performed for any size logic function, but in this invention it is intended for complex functions in the range of 3-12 inputs. Moreover, this invention, as described in FIG. 1, includes taking one or more system-level or design-level (such as RTL) netlists and deriving a set of TL bricks and corresponding transistor-level optimizations of said TL bricks that will facilitate implementation of the netlist with fewer total transistors. Optimization of logic is also achieved by choosing a logic family that does NOT include pass transistors because they are inefficiently laid out in the unidirectional pattern fabrics (those with patterns in each layer only in one direction).

[0041] By reducing the number of transistors, the complete logic design can be implemented more efficiently in terms of power, area and performance (including timing). Furthermore, by grouping the logic into TL bricks that are larger than typical standard cells, further improvement in area and performance is obtained by optimizing the physical implementation (layout) of the transistor-level optimized functions. This invention further considers the co-optimization of the layout and transistor level topology and sizing to achieve the best possible area, power and performance. This optimization could include the minimization or reduction in the total number of geometry patterns required to implement the design.

[0042] One such layout optimization is to merge diffusions between neighboring transistors to avoid the need to make a connection between them. For example, the physical implementation of A(DE+FG)+BC based on standard cells (FIG. 2) is shown in FIG. 4 (left). The physical implementation of A(DE+FG)+BC based on a single transistor-level optimized brick (FIG. 3) is shown in FIG. 4 (right). Note that the TL brick footprint is 25% smaller than that for the standard cell implementation based on the use of the same regular pattern design rules for both. This improvement in footprint is attributable to both the reduced number of transistors, as well as the ability to better order those transistors for diffusion sharing and other physical layout improvements.

[0043] By following this design flow, one can reduce the total number of transistors required to implement a system-level or design-level logic description, and further reduce the total number of unique geometry patterns that are required for that implementation. Further specifics regarding this design flow will now be described.

TranSynth Metrics

[0044] The two fundamental metrics used within TranSynth are Area and Stage Depth. In a regular Fabric such as that shown in FIG. 5, where the gates occur only at a fixed pitch, a lower bound area estimate is Transistor Count*Gate Pitch*Brick Height/2. More accurate area estimation is achievable through the use of design templates, which are described later in the TranSynth-3 section. Area is also saved by avoiding interconnected transistors that require so much routing that, for a particular fabric, a track where a transistor could be located must be skipped (for example, with pass transistors).
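The lower bound above can be written as a one-line helper. This is a minimal sketch: the function name, parameter names and the numeric values are hypothetical, and units are whatever the caller uses for pitch and height. One reading of the division by two, which is an interpretation rather than something stated here, is that each polysilicon track carries one NFET and one PFET.

```python
def lower_bound_area(transistor_count, gate_pitch, brick_height):
    """Lower bound area estimate: Transistor Count * Gate Pitch * Brick Height / 2."""
    return transistor_count * gate_pitch * brick_height / 2


# Illustrative values only: pitch and height are made-up numbers in arbitrary units.
area_20 = lower_bound_area(20, gate_pitch=0.25, brick_height=2.0)
area_16 = lower_bound_area(16, gate_pitch=0.25, brick_height=2.0)
print(area_20, area_16)
```

Because pitch and height are fixed by the fabric, the bound scales linearly with transistor count. It is only a bound and does not capture layout effects such as diffusion sharing.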

[0045] The other fundamental metric, Stack Depth, is measured as the maximum number of gates traversed from inputs to output. With all other things being equal, a shorter stage depth will result in a faster design because of a reduction in intermediate node capacitance. FIGS. 6(a)-(b) show an example of how Stack Depth is measured in two possible implementations. If choosing between these two implementations, the implementation of FIG. 6(a) on the left is likely to be faster than the implementation of FIG. 6(b) on the right because of the shorter Stack Depth.
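The depth measurement described above amounts to a longest-path computation over the gate-level netlist. The sketch below illustrates this on a hypothetical netlist; the net names and the dictionary encoding are assumptions, and the structure shown (AOI22, inverter, AOI22, inverter) is an assumed reading of a two-AO22 implementation of A(DE+FG)+BC rather than a transcription of any figure.

```python
# Hypothetical gate-level netlist: each driven net maps to the nets feeding its gate.
NETLIST = {
    "n1": ["D", "E", "F", "G"],   # AOI22
    "x": ["n1"],                  # INV, x = DE + FG
    "n2": ["A", "x", "B", "C"],   # AOI22
    "out": ["n2"],                # INV, out = A(DE+FG) + BC
}


def gate_depth(net, netlist):
    """Maximum number of gates traversed from any primary input to `net`."""
    if net not in netlist:        # primary input: no gate traversed yet
        return 0
    return 1 + max(gate_depth(src, netlist) for src in netlist[net])


print(gate_depth("out", NETLIST))  # 4
```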

TranSynth Constraints

[0046] The transistor level synthesis process will generate the implementation with the fewest number of transistors that are required to implement the logic function and satisfy a number of electrical, layout and logical constraints. Sometimes these constraints create conflicting goals for TranSynth.

[0047] Stack height, which defines the number of series connected NFET or PFET devices in a logic cell or brick, has a logical impact, but also is constrained in terms of electrical performance requirements. High stack heights create large stack resistances that limit leakage power, but also limit the performance. The stack height constraint is carefully chosen to balance logic efficiency, leakage power, and timing. For example, in modern CMOS technologies, stack heights of more than 3 MOSFETs are generally not used because the switching performance will be degraded too severely. Electrical constraints also have an impact on layout and logic efficiency. For example, logic cells that have a high internal and output capacitance (e.g., AOI333 or OAI333, and other such cells, as are known) can be undesirable for power and delay reasons. Such logic cells are often not used as part of the library for these reasons, even though their non-use can cause an increase in the overall block or IC layout area due to a reduction in efficiency for the mapping of the RTL design into the netlist of library elements.

[0048] Layout constraints such as cell height, cell area and limited pattern choices (based on lithography or manufacturability considerations) can have both electrical and logical impacts. Namely, there can be limitations on the choice of logic family and/or the sizes for the transistors within the cells. Patterning choices such as unidirectionality (all patterns oriented in vertical or horizontal direction only) and pitch selection (wire widths and spacings) of various layers make certain logic families area inefficient and undesirable. Transistor sizes in the final netlist are constrained not only by the patterning choices but also by the cell height. These layout constraints impact both the leakage power and logic efficiency.

[0049] Importantly, a central portion of the TranSynth methodology is to efficiently and effectively co-optimize the number of transistors subject to these constraints.

TranSynth-1

[0050] FIG. 7 shows one method 700, referred to as TranSynth-1, of turning a Logical Brick into a TL Brick. Each Logical Brick, shown as the input logic function in block 710, is transformed into a series of logical gates using Nakamura's Minimal Negative Gate Algorithm, described in K. Nakamura, N. Tokura, and T. Kasami. "Minimal Negative Gate Networks." IEEE Transactions on Computers, C-21(1):5-11, January 1972 and shown at block 720. The logical gate level netlist obtained is then transformed into a static CMOS transistor level netlist using simple substitution, as shown at block 730. Transistor sizing and timing estimation use a logical effort based algorithm, as shown at block 740. As discussed earlier, a lower bound area estimate can be derived from the transistor count. The final result is either a TL Brick netlist suitable for further implementation or an area and timing estimate which can be used within Brick Discovery. The next few sections describe each of these tasks in more detail.

Minimal Negative Gate Algorithm

[0051] Nakamura's Minimal Gate Algorithm, shown at block 720 in FIG. 7, transforms a Boolean function F(a, b, c, d, ...), where a, b, c, d, etc. are Primary Inputs to the function, into a series of intermediate functions F_m(a, b, c, d, ...). Each F_n is a function of the primary inputs and other intermediate functions F_m, where m<n, as shown in FIG. 8.

[0052] The F_m functions are derived by encoding binate functions in a directed graph as shown in FIG. 9 for the example function a'c'+b'c'+abc. Directed connections in the graph are placed between minterms that have a Gray code distance of 1, that is, the minterm values differ by only 1 bit. Labels are assigned to each minterm such that the Least Significant Bit (LSB) is the function value and the labels between connected minterms are monotonically increasing.

[0053] The Most Significant Bit (MSB) in the labels becomes the function values for F_0 given only the primary inputs. The next MSB in the labels becomes the function values for the F_1 function given the primary inputs and F_0 as inputs to the function. This continues until the function F_n is derived from the LSBs in the labels as a function of the primary inputs and all of the previously evaluated intermediate functions.
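The labeling idea can be illustrated on the example function a'c'+b'c'+abc. The Python sketch below uses a simple greedy pass; it is not claimed to be Nakamura's published procedure or the TranSynth implementation. It assumes that the monotonic condition means a label must not decrease when one input is lowered from 1 to 0, and the helper names are hypothetical.

```python
from itertools import product


def f(a, b, c):
    """Example function from the text: a'c' + b'c' + abc."""
    return int((not a and not c) or (not b and not c) or (a and b and c))


def greedy_labels(func, n):
    """Give each input vertex the smallest label whose LSB equals the function
    value and which is >= the label of every vertex one bit above it."""
    labels = {}
    # Visit vertices with the most ones first so neighbors above are already labeled.
    for x in sorted(product((0, 1), repeat=n), key=sum, reverse=True):
        floor = 0
        for i in range(n):
            if x[i] == 0:
                above = x[:i] + (1,) + x[i + 1:]
                floor = max(floor, labels[above])
        label = floor
        if label & 1 != func(*x):   # force the LSB to match the function value
            label += 1
        labels[x] = label
    return labels


labels = greedy_labels(f, 3)
num_gates = max(labels.values()).bit_length()      # one gate per label bit
# MSB of each label gives F_0 as a function of the primary inputs only.
f0 = {x: (lab >> (num_gates - 1)) & 1 for x, lab in labels.items()}

print(num_gates)
print(f0)
```

For this function the labels need two bits, so two gates suffice. The MSB is 0 only at a=b=c=1, so under these assumptions F_0 is a 3-input NAND, and the LSB column reproduces the original function as F_1 of (a, b, c, F_0).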

[0054] Nakamura's algorithm only shows how to find an implementation of a function in the minimum number of logic gates. The algorithm does not address transistor count minimization or stack height restrictions--two of the goals of the TranSynth algorithm.

[0055] In order to find an implementation with the minimum number of transistors with a stack height restriction, a large solution space must be explored. There are two main causes of the large solution space: Don't Care selection and Label Assignment. Each F_n, except for n equal to 0, is an incompletely specified function. As the n index increases, so does the Don't Care solution space because proportionally more minterms of the function have values that are unspecified. Finding the right set of Don't Cares that minimizes transistor count for each gate is not trivial. The current implementation exhaustively searches the solution space. Published Boolean minimization algorithms may not give an optimal transistor count.

[0056] In the label assignment process, each directed connection has a label value that is monotonically increasing. There are sometimes many label values that will satisfy this constraint. Exhaustive search of this solution space is necessary to guarantee optimality. The current solution for design space exploration utilizes a branch and bound algorithm based on transistor count. The addition of stack height constraints makes finding any solution difficult for some functions. Without an initial solution, the branch and bound algorithm must explore the entire design space. Some functions are not implementable in the minimum number of gates given a stack height constraint which results in long runtimes and no solution.

[0057] FIG. 10 shows the optimized implementation in terms of standard cells (or standard logic primitives) for the function a'c'+b'c'+abc.

[0058] FIG. 11 shows the CMOS implementation for the same function following TranSynth-1 without any stack height restrictions.

[0059] Compared to the optimized standard cell implementation, TranSynth-1 is able to reduce the number of stages from 3 to 2 without impacting the transistor count. The reduced stage depth will translate into a faster implementation.

TranSynth-2

[0060] A further refinement, and one way of decreasing the search space and resultant runtime found in TranSynth-1, is to recursively decompose the Boolean functions (into sub-functions driving an output function) through algebraic tree decomposition before applying the TranSynth-1 methodology, as shown in FIG. 12.

[0061] In tree decomposition, only logical gates with a tree structure like that shown in FIG. 13 are possible.

[0062] Transistor stack height restrictions help speed up the decomposition by limiting the number of possible logic gates at each stage of the recursion. With a stack height of 2, there are only 7 possible gates as the final output gate: INV, NAND2, NOR2, AOI21, AOI22, OAI21 and OAI22. With a stack height of 3, the present inventors have identified that there are only 67 possible logical gates in tree decomposition. With a stack height of 4 there is still a limited number of functions, and not all of those necessarily are needed, just as not all of the 69 functions are needed to implement TL bricks if the stack height is limited to 3. Limiting the stack height is a performance constraint, and it also limits the total number of possible combinations. Having a reduced set of combinations allows those combinations to be characterized, which can in turn simplify the overall design of integrated circuits, particularly since from these useful Boolean functions an almost limitless set of TL bricks can be built. There is some loss of optimality in the search space reduction of TranSynth-2, but there is also a significant reduction in runtime when utilizing TranSynth-2 compared to TranSynth-1 for certain complex functions. Embedded XOR decompositions are only found because they are explicitly searched for. Other similar structures are not found. It is noted that TranSynth-1 can search for the best solution. TranSynth-2 simplifies the search space by first decomposing a large function. This will improve the search efficiency, but it can result in a solution that is inferior to that from TranSynth-1.
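To make the stack height restriction concrete, the following Python sketch computes the NFET and PFET series stack of each of the seven gates listed for a stack height of 2. The tuple encoding and helper names are assumptions made for illustration. The sketch models a static CMOS gate whose pull-down network follows the AND/OR expression (AND as series, OR as parallel) and whose pull-up network is its dual. It does not attempt to reproduce the counts quoted for a stack height of 3.

```python
# Pull-down expressions as nested tuples: a string is one transistor input,
# ("and", ...) is a series connection, ("or", ...) is a parallel connection.
GATES = {
    "INV": "a",
    "NAND2": ("and", "a", "b"),
    "NOR2": ("or", "a", "b"),
    "AOI21": ("or", ("and", "a", "b"), "c"),
    "AOI22": ("or", ("and", "a", "b"), ("and", "c", "d")),
    "OAI21": ("and", ("or", "a", "b"), "c"),
    "OAI22": ("and", ("or", "a", "b"), ("or", "c", "d")),
}


def nfet_stack(expr):
    """Longest series chain in the pull-down network."""
    if isinstance(expr, str):
        return 1
    op, *kids = expr
    heights = [nfet_stack(k) for k in kids]
    return sum(heights) if op == "and" else max(heights)


def pfet_stack(expr):
    """Longest series chain in the dual pull-up network."""
    if isinstance(expr, str):
        return 1
    op, *kids = expr
    heights = [pfet_stack(k) for k in kids]
    return max(heights) if op == "and" else sum(heights)


def stack_height(expr):
    return max(nfet_stack(expr), pfet_stack(expr))


for name, expr in GATES.items():
    print(name, nfet_stack(expr), pfet_stack(expr))

# A gate outside the list: AOI211 needs three series PFETs.
AOI211 = ("or", ("and", "a", "b"), "c", "d")
print("AOI211", stack_height(AOI211))
```

All seven listed gates stay within a stack of 2 in both networks, while adding a third parallel branch, as in AOI211, pushes the pull-up stack to 3.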

[0063] The TranSynth-2 flow starting from a Boolean logic function is shown in FIG. 14. The first step 1410 is consideration of an XOR decomposition, which step can also include the minimal gate algorithm described previously, if desired. If there is a naturally occurring XOR decomposition visible from the Binary Decision Diagram implementation of the function, then an XOR gate is selected as the top level gate, as shown at step 1420. This will yield two or more simplifying sub-functions. The number of subfunctions is equal to the number of inputs to the XOR gate. Each of these sub-functions can then be synthesized using TranSynth-1 or recursively decomposed using TranSynth-2, as shown at step 1430, in order to obtain the best results.
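Whether a function splits into an XOR of two sub-functions on disjoint input sets can also be tested directly on its truth table. The sketch below does this by brute force; it is a stand-in for illustration and is not the Binary Decision Diagram based check described above. The function names, the argument convention (`left` lists the input positions given to the first sub-function) and the example function are all hypothetical.

```python
from itertools import product


def xor_decomposable(func, n, left):
    """True if func(x) == g(x[left]) ^ h(x[rest]) for some g and h.

    Uses the identity f(l, r) ^ f(l, 0) ^ f(0, r) ^ f(0, 0) == 0 for all l, r,
    which holds exactly when such g and h exist.
    """
    left = sorted(left)
    right = [i for i in range(n) if i not in left]

    def evaluate(left_vals, right_vals):
        x = [0] * n
        for i, v in zip(left, left_vals):
            x[i] = v
        for i, v in zip(right, right_vals):
            x[i] = v
        return func(*x)

    l0 = (0,) * len(left)
    r0 = (0,) * len(right)
    base = evaluate(l0, r0)
    for lv in product((0, 1), repeat=len(left)):
        for rv in product((0, 1), repeat=len(right)):
            if evaluate(lv, rv) ^ evaluate(lv, r0) ^ evaluate(l0, rv) ^ base:
                return False
    return True


def example(a, b, c, d):
    """Hypothetical test function: (a AND b) XOR (c OR d)."""
    return (a & b) ^ (c | d)


print(xor_decomposable(example, 4, [0, 1]))  # True
print(xor_decomposable(example, 4, [0, 2]))  # False
```

When the check succeeds, the two sub-functions can be read off as g(l) = f(l, 0) and h(r) = f(0, r) XOR f(0, 0), and each could then be handed to a further synthesis step.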

[0064] If an XOR decomposition does not naturally exist for the function, as determined by step 1410, then a gate is selected in step 1450 that satisfies all of the chosen constraints such as stack height, electrical constraints or layout constraints. The tree decomposition algorithm can explicitly limit the stack height. Other electrical or layout constraints can be met by either disallowing or penalizing inferior gates explicitly. The one or more sub-functions that are the inputs to the chosen gate can then be synthesized as shown in step 1460 using TranSynth-1 or recursively decomposed using TranSynth-2, to obtain saved results. Once the sub-functions are synthesized, the circuit can be evaluated based on a set of metrics and the metric values and circuit stored. Once all possible candidate gates have been considered as the top level gate that satisfies the chosen constraints, as shown at step 1470, the best implementation is selected as shown in step 1480 as that with the best overall quality measure (e.g., minimum number of transistors or smallest area), depending on the determined quality measure that is input to the TranSynth algorithm.

[0065] FIG. 15 shows the overall flow 1500 of the TranSynth-2 methodology. As shown in FIG. 15, each Logical Brick, shown as the input logic function in block 1510, is decomposed as described above in step 1520. The logical gate level netlist obtained is then transformed into a static CMOS transistor level netlist using simple substitution, as shown at block 1530. Transistor sizing and timing estimation use a logical effort based algorithm, as shown at block 1540. As discussed earlier, a lower bound area estimate can be derived from the transistor count. The final result is either a TL Brick netlist suitable for further implementation or an area and timing estimate which can be used within Brick Discovery.
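The substitution of block 1530 can be pictured as follows. This Python sketch assumes a conventional complementary static CMOS mapping, in which an AND term becomes series NMOS devices and parallel PMOS devices, and an OR term the reverse. The data layout, the node names (`Y`, `VDD`, `GND`, `n1`, ...) and the functions `place` and `cmos_netlist` are hypothetical and are not taken from the described implementation.

```python
from itertools import count


def place(expr, a, b, series_op, mos, fresh, out):
    """Place the transistors for expr between circuit nodes a and b.

    expr is an input name (str) or a pair (op, terms) with op "and"/"or".
    Terms joined by series_op are chained through fresh internal nodes;
    terms joined by the other operator are placed in parallel.
    """
    if isinstance(expr, str):
        out.append((mos, expr, a, b))  # (device type, gate input, node, node)
        return
    op, terms = expr
    if op == series_op:
        nodes = [a] + [next(fresh) for _ in terms[:-1]] + [b]
        for term, hi, lo in zip(terms, nodes, nodes[1:]):
            place(term, hi, lo, series_op, mos, fresh, out)
    else:
        for term in terms:
            place(term, a, b, series_op, mos, fresh, out)


def cmos_netlist(expr, out_node="Y"):
    """Return a transistor list for the static CMOS gate Y = NOT(expr)."""
    fresh = (f"n{i}" for i in count(1))
    netlist = []
    # Pull-down: AND is series, OR is parallel.
    place(expr, out_node, "GND", "and", "nmos", fresh, netlist)
    # Pull-up is the dual network: OR is series, AND is parallel.
    place(expr, "VDD", out_node, "or", "pmos", fresh, netlist)
    return netlist


if __name__ == "__main__":
    aoi21 = ("or", [("and", ["A", "B"]), "C"])  # Y = !(AB + C)
    for device in cmos_netlist(aoi21):
        print(device)
```

The length of the returned list is the transistor count of the gate, which is the quantity from which a lower bound area estimate of the kind mentioned above can be derived.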

TranSynth-3

[0066] One last modification to the TranSynth methodology is the use of design templates to increase the accuracy for timing and area modeling. Design Templates are precharacterized logic gates that can be used as building blocks for a TL Brick. In a stack height of 3, there are 69 useful Boolean functions that can be built, and from these useful Boolean functions an almost limitless set of TL bricks can be built. Because the templates are limited in number, each of these functions can be implemented in silicon and well-characterized in terms of timing and area.

[0067] 67 of the 69 functions are most efficiently implemented utilizing a traditional static CMOS tree structure such as that found in the examples of FIG. 13. These functions, most of which are trees as mentioned above, are provided in the table below:

TABLE I
#define OAI333 1 // !((A+B+C)(D+E+F)(G+H+I))
#define AOI333 2 // !(ABC+DEF+GHI)
#define OAI332 3 // !((A+B+C)(D+E+F)(G+H))
#define AOI332 4 // !(ABC+DEF+GH)
#define OAI331 5 // !((A+B+C)(D+E+F)G)
#define AOI331 6 // !(ABC+DEF+G)
#define OAI33 7 // !((A+B+C)(D+E+F))
#define AOI33 8 // !(ABC+DEF)
#define OAI32 9 // !((A+B+C)(D+E))
#define AOI32 10 // !(ABC+DE)
#define OAI31 11 // !((A+B+C)D)
#define AOI31 12 // !(ABC+D)
#define NOR3 13 // !(A+B+C)
#define NAND3 14 // !(ABC)
#define NOR2 15 // !(A+B)
#define NAND2 16 // !(AB)
#define INV 17 // !A
#define OAI322 18 // !((A+B+C)(D+E)(F+G))
#define AOI322 19 // !(ABC+DE+FG)
#define OAI321 20 // !((A+B+C)(D+E)F)
#define AOI321 21 // !(ABC+DE+F)
#define OAI311 22 // !((A+B+C)DE)
#define AOI311 23 // !(ABC+D+E)
#define OAI222 24 // !((A+B)(C+D)(E+F))
#define AOI222 25 // !(AB+CD+EF)
#define OAI221 26 // !((A+B)(C+D)E)
#define AOI221 27 // !(AB+CD+E)
#define OAI22 28 // !((A+B)(C+D))
#define AOI22 29 // !(AB+CD)
#define OAI211 30 // !((A+B)CD)
#define AOI211 31 // !(AB+C+D)
#define OAI21 32 // !((A+B)C)
#define AOI21 33 // !(AB+C)
#define OA22OAI23 34 // !((((A+B)(C+D))+E)(F+G+H))
#define AO22AOI23 35 // !(((AB+CD)E)+FGH)
#define OA22OAI22 36 // !((((A+B)(C+D))+E)(F+G))
#define AO22AOI22 37 // !(((AB+CD)E)+FG)
#define OA22OAI21 38 // !((((A+B)(C+D))+E)F)
#define AO22AOI21 39 // !(((AB+CD)E)+F)
#define OA22NOR2 40 // !(((A+B)(C+D))+E)
#define AO22NAND2 41 // !((AB+CD)E)
#define OA21OAI23 42 // !((((A+B)C)+D)(E+F+G))
#define AO21AOI23 43 // !(((AB+C)D)+EFG)
#define OA21OAI22 44 // !((((A+B)C)+D)(E+F))
#define AO21AOI22 45 // !(((AB+C)D)+EF)
#define OA21OAI21 46 // !((((A+B)C)+D)E)
#define AO21AOI21 47 // !(((AB+C)D)+E)
#define OA21NOR2 48 // !(((A+B)C)+D)
#define AO21NAND2 49 // !((AB+C)D)
#define AND2OAI23 50 // !(((AB)+C)(D+E+F))
#define OR2AOI23 51 // !(((A+B)C)+DEF)
#define AND2OAI22 52 // !(((AB)+C)(D+E))
#define OR2AOI22 53 // !(((A+B)C)+DE)
#define AND2OAI21 54 // !(((AB)+C)D)
#define OR2AOI21 55 // !(((A+B)C)+D)
#define OA222NOR2 56 // !(((A+B)(C+D)(E+F))+G)
#define AO222NAND2 57 // !((AB+CD+EF)G)
#define OA221NOR2 58 // !(((A+B)(C+D)E)+F)
#define AO221NAND2 59 // !((AB+CD+E)F)
#define OR2AOI31 60 // !(((A+B)CD)+E)
#define AND2OAI31 61 // !((AB+C+D)E)
#define AND2OAI33 62 // !(((AB)+C+D)(E+F+G))
#define OR2AOI33 63 // !(((A+B)CD)+EFG)
#define AND2OAI32 64 // !(((AB)+C+D)(E+F))
#define OR2AOI32 65 // !(((A+B)CD)+EF)
#define AND2OA22NOR2 66 // !(((AB+C)(D+E))+F)
#define OR2AO22NAND2 67 // !((((A+B)C)+(DE))F)
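The expressions in Table I can be checked mechanically against the 3-stack limit. The Python sketch below parses the table's notation (a leading `!`, single-letter inputs, juxtaposition for AND and `+` for OR) and reports the transistor count and the NMOS and PMOS stack heights. It assumes a complementary static CMOS implementation with two transistors per literal. The parser and the names `parse_gate`, `stack_heights` and `describe` are hypothetical helpers written for this illustration.

```python
def parse_gate(text):
    """Parse a Table I style string such as "!((A+B)C)" into a nested tree."""
    s = text.replace(" ", "")
    if not s.startswith("!"):
        raise ValueError("expected a leading '!'")
    tree, pos = _factor(s, 1)
    if pos != len(s):
        raise ValueError("unexpected trailing characters")
    return tree


def _expr(s, pos):
    """expr := term ('+' term)*"""
    node, pos = _term(s, pos)
    terms = [node]
    while pos < len(s) and s[pos] == "+":
        node, pos = _term(s, pos + 1)
        terms.append(node)
    return (terms[0] if len(terms) == 1 else ("or", terms)), pos


def _term(s, pos):
    """term := factor factor*  (juxtaposition means AND)"""
    factors = []
    while pos < len(s) and (s[pos].isalpha() or s[pos] == "("):
        node, pos = _factor(s, pos)
        factors.append(node)
    if not factors:
        raise ValueError("expected an input or '('")
    return (factors[0] if len(factors) == 1 else ("and", factors)), pos


def _factor(s, pos):
    """factor := LETTER | '(' expr ')'"""
    if pos < len(s) and s[pos] == "(":
        node, pos = _expr(s, pos + 1)
        if pos >= len(s) or s[pos] != ")":
            raise ValueError("missing ')'")
        return node, pos + 1
    if pos < len(s) and s[pos].isalpha():
        return s[pos], pos + 1
    raise ValueError("expected an input or '('")


def stack_heights(tree):
    """Return (nmos_stack, pmos_stack); AND is series NMOS, OR is series PMOS."""
    if isinstance(tree, str):
        return 1, 1
    op, terms = tree
    n_vals = [stack_heights(t)[0] for t in terms]
    p_vals = [stack_heights(t)[1] for t in terms]
    if op == "and":
        return sum(n_vals), max(p_vals)
    return max(n_vals), sum(p_vals)


def literals(tree):
    """Count input occurrences in the expression."""
    return 1 if isinstance(tree, str) else sum(literals(t) for t in tree[1])


def describe(text):
    """Return (transistor_count, nmos_stack, pmos_stack) for one table entry."""
    tree = parse_gate(text)
    return (2 * literals(tree),) + stack_heights(tree)


if __name__ == "__main__":
    print(describe("!((((A+B)(C+D))+E)(F+G+H))"))  # entry 34, OA22OAI23
```

For entry 34 the sketch reports 16 transistors with an NMOS stack of 3 and a PMOS stack of 3, which is consistent with the stated limit.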

[0068] To this set the following non-tree functions are added that also satisfy the 3-stack limit. We count this H-tree function, shown in FIG. 17, as a single template, although it can be used to form more than one logic function if one or more of the inputs are repeated. It should be further noted that such H-tree functions are extremely efficient implementations of some large logic functions, and thus significant.

[0069] The other two functions are most efficiently implemented utilizing the H-Tree structures shown in FIG. 17. These structures can be found with the TranSynth-1 algorithm. In TranSynth-2, the currently employed algebraic tree decomposition algorithm cannot find H-Trees except at the first logic stage (closest to the inputs). In TranSynth-3, when algebraic decomposition is replaced with Boolean division, any template--tree-like or H-Tree--can easily be found.

[0070] The number of templates that are considered in TranSynth-3 can be reduced further when factors other than stack height are considered such as layout efficiency and electrical properties. The logical gate implemented in FIG. 18 could be removed from the considered template set because of high output capacitance that will result in poor timing.

[0071] FIG. 16 shows the overall flow 1600 of the TranSynth-3 methodology. As shown in FIG. 16, each Logical Brick, shown as the input logic function in block 1610, is decomposed as described above in step 1620, which step can also include the minimal gate algorithm described previously, if desired. The logical gate level netlist obtained is then transformed into a static CMOS transistor level netlist using template matching as described above and indicated at block 1630, wherein the functions that make up the circuit are substantially obtained from the design templates. By substantially obtained is intended that typically 100% of the transistors in the circuit for the TL brick are obtained from transistors that are in the design templates, though this aspect of the invention cannot be avoided merely by using some percentage, even up to 20%, of transistors from a source that is not the design templates as described herein. Transistor sizing and timing estimation use a logical effort based algorithm, as shown at block 1640. As discussed earlier, a lower bound area estimate can be derived from the transistor count. The final result is either a TL Brick netlist suitable for further implementation or an area and timing estimate which can be used within Brick Discovery.
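The logical effort based algorithm of block 1640 is not detailed here. As a rough illustration of the kind of estimate involved, the Python sketch below applies the standard textbook logical effort model: a path of N stages with path effort F = G*B*H has a minimum delay of N*F^(1/N) + P in units of the technology time constant. The effort and parasitic values assume a 2:1 PMOS-to-NMOS width ratio and an inverter parasitic delay of 1; these values and the names `gate_effort` and `min_path_delay` are assumptions for this example and are not taken from the described system.

```python
def gate_effort(name):
    """Return (logical effort g, parasitic delay p) for INV, NANDn or NORn.

    Textbook values, assuming a 2:1 PMOS-to-NMOS width ratio and an
    inverter parasitic delay normalized to 1.
    """
    if name == "INV":
        return 1.0, 1.0
    if name.startswith("NAND"):
        n = int(name[4:])
        return (n + 2) / 3.0, float(n)
    if name.startswith("NOR"):
        n = int(name[3:])
        return (2 * n + 1) / 3.0, float(n)
    raise ValueError("unsupported gate: " + name)


def min_path_delay(gates, electrical_effort, branching=1.0):
    """Minimum path delay D = N * F**(1/N) + P, in units of tau.

    electrical_effort is the path load capacitance divided by the path
    input capacitance; branching is the product of branching efforts.
    """
    path_g = 1.0
    path_p = 0.0
    for name in gates:
        g, p = gate_effort(name)
        path_g *= g
        path_p += p
    path_f = path_g * branching * electrical_effort
    stages = len(gates)
    return stages * path_f ** (1.0 / stages) + path_p


if __name__ == "__main__":
    print(min_path_delay(["NAND2", "NOR2", "INV"], electrical_effort=8.0))
```

A precharacterized template would replace the closed-form values in `gate_effort` with measured ones, which is where the accuracy benefit of design templates would enter such an estimate.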

Optimal TranSynth Implementation

[0072] The optimal TranSynth implementation for a given function is dependent on the stack height for a given technology. If the allowable stack height is greater than 3, TranSynth-1 can sometimes be more efficient because of the algebraic decomposition solution space explosion with TranSynth-2 and the exponential increase in templates of TranSynth-3.

[0073] Although the present invention has been particularly described with reference to embodiments thereof, it should be readily apparent to those of ordinary skill in the art that various changes, modifications and substitutes are intended within the form and details thereof, without departing from the spirit and scope of the invention. Accordingly, it will be appreciated that in numerous instances some features of the invention will be employed without a corresponding use of other features. Further, those skilled in the art will understand that variations can be made in the number and arrangement of components illustrated in the above figures. It is intended that the scope of the appended claims include such changes and modifications.

* * * * *