U.S. patent application number 13/491935 was filed with the patent office on 2012-11-01 for hardware definition method including determining whether to implement a function as hardware or software.
Invention is credited to Frank MAY, Martin VORBACH.
Application Number | 20120278772 13/491935 |
Document ID | / |
Family ID | 38093028 |
Filed Date | 2012-11-01 |
United States Patent Application | 20120278772 |
Kind Code | A1 |
VORBACH; Martin; et al. | November 1, 2012 |
HARDWARE DEFINITION METHOD INCLUDING DETERMINING WHETHER TO
IMPLEMENT A FUNCTION AS HARDWARE OR SOFTWARE
Abstract
A hardware definition system and method includes a computer
processor analyzing software function modules of a software
program, and generating, for each of at least a subset of the
software function modules, and on the basis of the analyzing step,
a respective setting indicating whether the respective function
module is to be implemented as a respective hardware module or as a
software module executed on a hardware module defined in a hardware
module library.
Inventors: | VORBACH; Martin (Lingenfeld, DE); MAY; Frank (Munich, DE) |
Family ID: | 38093028 |
Appl. No.: | 13/491935 |
Filed: | June 8, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
12087916 | Dec 2, 2008 | 8250503 |
PCT/EP2007/000380 | Jan 17, 2007 | |
13491935 | | |
Current U.S. Class: | 716/103 |
Current CPC Class: | G06F 30/30 20200101; G06F 30/34 20200101 |
Class at Publication: | 716/103 |
International Class: | G06F 17/50 20060101 G06F017/50 |
Foreign Application Data
Date | Code | Application Number |
Jan 18, 2006 | EP | 06 001 043.6 |
Jan 18, 2006 | EP | 06 400 003.7 |
Jan 23, 2006 | DE | 10 2006 003 275.6 |
Jan 27, 2006 | DE | 10 2006 004 151.8 |
Claims
1. (canceled)
2. A method for automatically generating at least a part of a
hardware netlist of a chip, the method comprising: assembling a
definition for a chip, the definition defining a plurality of
hardware modules, wherein, for each of at least some of the defined
plurality of hardware modules, one or more features of the
respective hardware modules is selectable from a respective
plurality of features by selection of one or more of a plurality of
corresponding parameters, the definition not defining the parameter
selection; transforming each of a plurality of algorithms into a
respective form that is specific to execution on the defined
plurality of hardware modules; for each of the defined plurality of
hardware modules, determining required functionality of the
respective hardware module based on the transformed algorithms; and
for each of the defined hardware modules, in accordance with the
determined required functionality of the respective hardware
module, defining a respective selection of parameters from the
respective plurality of corresponding parameters of the hardware
module.
3. The method of claim 2, wherein the defined parameter selection
for each of at least one of the defined hardware modules, defines a
selection of fewer than all of the plurality of corresponding
parameters of the respective hardware module, such that at least
one of the plurality of corresponding parameters of the respective
hardware modules is not selected by the definition.
4. The method of claim 3, wherein the parameter selections are
performed to provide a minimal number of parameter selections.
5. The method of claim 2, wherein the parameter selections are
performed for optimizing at least one of a size of the chip, an
operation speed, and a power dissipation.
Description
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of and claims priority to
U.S. patent application Ser. No. 12/087,916, filed on Dec. 2, 2008,
which is the National Stage of International Patent Application
Serial No. PCT/EP2007/000380, filed on Jan. 17, 2007, the entire
contents of each of which are expressly incorporated herein by
reference.
FIELD OF THE INVENTION
[0002] The present invention relates to a preferably reconfigurable
architecture, or a preferably partially reconfigurable
architecture, and a method for programming a cell element field,
the elements of the field being able to execute a number of
different functions, in particular such a multitude of functions
that an all-purpose processor is obtained.
BACKGROUND INFORMATION
[0003] A method according to the related art with respect to a design
flow is shown in FIG. 1. FIG. 1 shows a known method of creating
and programming a reconfigurable architecture in the sense of the
remarks below. The figure shows on the right that a library
containing modules for a larger chip is provided, which concerns,
among other things, an ALU-PAE definition, a RAM-PAE definition
etc. As required and specified, these different definitions are
combined in an XPP generator and afterwards a synthesis is
performed for the output obtained from the XPP generator in order
to generate a mask set for the synthesized hardware on the basis of
the result of the synthesis such that a chip may be produced.
[0004] The left side of the diagram shows a library for a number of
programs (software parts) in a language such as NML, this special
language being known from other publications of the applicant. Then
a program is written by using such library software parts, it being
obviously possible to use additionally and/or exclusively also
software parts not contained in the library. The program is then
compiled, compiling here being understood to include also placing
and routing, as required. For this purpose, the compiler needs
information that refers to the actual target hardware design. The
compiler also has such information. The configuration(s) generated
by the compiler are then made to run on the hardware as run time
configuration.
[0005] It has also already been proposed (WO 2004/114166) to
provide a so-called bottom-up approach in hardware design, an
integrated circuit development system having been provided, which
included a description library of a multitude of hardware objects,
which are each structured to operate on message packets, each
object being intended to have relatively similar electrical load
characteristics; and the integrated circuit development system
further including a modeler, which refers to the library and is to
be structured to accept an instruction that creates an
instantiation of one of the descriptions and to accept a command
that combines two or more of the created instantiations with one
another. The laborious programming of this known method of
instantiated hardware objects then provides for a collection of
software objects to be accepted which are themselves to be
abstractions of the instantiated hardware objects, each software
object being intended to include a list of hardware objects that
are used in the software object as well as a list of rules for
combining the listed hardware objects and an instruction file that
is to be loaded into the listed hardware objects; a description of
the collection of physically instantiated hardware objects then
having to be accepted; an identifier having to be allocated to each
of the physically instantiated hardware objects from the list of
hardware objects and an initialization file having to be created
for the collection of physically instantiated hardware objects by
using the identifier in order to replace symbolic information in
the instruction files. The last-mentioned technique as shown in WO
2004/114166 is disadvantageous particularly because of the fact
that it can neither be assumed with absolute reliability that a
hardware-software isomorphism is actually given and not merely
claimed, and because in addition the applications designed in
accordance with the system must often provide for an excess of
unnecessary hardware on a silicon chip. At the same time there is
no assurance that in the known procedure according to WO
2004/114166 an optimal execution speed of the hardware objects tied
together from predefined, invariable hardware modules is
realized.
[0006] Furthermore, in the cited related art as shown in WO
2004/114166, it remains necessary for hardware engineers to design
the hardware. It is not possible to leave the construction of a
chip for a dedicated application to the programmer of the dedicated
application entirely or at least largely.
SUMMARY
[0007] In the present application, a reconfigurable architecture is
understood in the broadest sense as an architecture in which at
least one of the elements processing, storing and/or transmitting
cross-linkages of data is itself modifiable; in a preferred
variant, the term reconfigurable architecture being understood,
without this being referenced each time, as a dynamically
reconfigurable architecture, unless the respective semantic context
indicates otherwise. In this connection, dynamic can refer to the
capability of reconfiguration occurring at a speed that allows for
a complete and/or partial reconfiguration at run time; the
reconfiguration may thus occur for all cell elements, connecting
elements etc. of a field, only for a subgroup of a field and/or for
an individual element of the field. The reconfiguration may be
induced, reference being made here for disclosure purposes to
earlier patent documents of the applicant which are all
incorporated to their full extent, e.g., by a possibly separately
built up and/or pre-loaded central entity, by an adjacent cell
and/or a cell within the element itself, which determines in the
course of the data processing performed by it that subsequently
another or additional data processing is required prior to or
during the transmission and/or output of the data to another cell
or outside the cell element field. A reconfiguration of elements
lying upstream in the data path may also be brought about. The
reconfiguration may be forced from the outside, i.e., outside of
the field, and/or from inside and/or may be requested.
Reconfiguration information is transmittable over separate
reconfiguration lines, (data) buses and/or in direct connection
from cell to cell.
[0008] The direct data connection from cell to cell may occur
alternatively and/or additionally to an interconnection of multiple
cells by connection to longer regions stretching over extended
parts of the field and/or by a reconfiguration entity and/or
external units such as data memories, data sources and/or data
receivers. Such data receivers or data sources may be, for example,
displays, data interfaces, external (host) processors,
co-processors, microcontrollers and/or chip-integrated sequencer
units and the like.
[0009] Reconfiguration information may, e.g., also be transmitted
together with the data, e.g., also interspersed in data words of a
longer data packet, it being in any event possible for the data
exchange between the cell elements to occur preferably in an
asynchronous manner. The transmission of configuration data from
cell to cell may occur by transmitting actual configuration words
for configuring a configurable cell element and/or by transmitting
triggers, in particular in trigger vector form, a selection being
made by these triggers between a plurality of configurations that
are still to be fed in and/or have already been fed in for the
trigger vector target receiver cell element.
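The trigger-based selection described above can be illustrated with a minimal sketch. It is not taken from the application: the class `CellElement`, the function `run_stream`, and the tuple-based stream format are hypothetical names chosen for illustration, and the sketch assumes that the configurations have already been fed into the cell element.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CellElement:
    """Hypothetical cell element holding several preloaded configurations."""
    configs: Dict[int, Callable[[int, int], int]] = field(default_factory=dict)
    active_id: int = 0

    def load(self, config_id: int, func: Callable[[int, int], int]) -> None:
        # Feed in a configuration ahead of time (assumed to happen before use).
        self.configs[config_id] = func

    def trigger(self, config_id: int) -> None:
        # A trigger only selects among stored configurations; it carries no
        # configuration words itself.
        if config_id not in self.configs:
            raise KeyError(f"configuration {config_id} has not been fed in")
        self.active_id = config_id

    def process(self, a: int, b: int) -> int:
        return self.configs[self.active_id](a, b)


def run_stream(cell: CellElement, stream: List[Tuple]) -> List[int]:
    """Process a stream in which triggers are interposed between data items."""
    results = []
    for item in stream:
        if item[0] == "trigger":
            cell.trigger(item[1])
        else:  # ("data", a, b)
            results.append(cell.process(item[1], item[2]))
    return results


cell = CellElement()
cell.load(0, lambda a, b: a + b)   # configuration 0: addition
cell.load(1, lambda a, b: a * b)   # configuration 1: multiplication
results = run_stream(cell, [("data", 2, 3), ("trigger", 1), ("data", 2, 3)])
```

In this sketch the same operands yield different results before and after the trigger, because the trigger switches the active configuration without any configuration words being transmitted.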
[0010] It is preferred, but not absolutely necessary for the
purposes of the present application, if at least one, preferably
multiple configurations are stored for current and/or subsequent
processing in or at the cell elements, it being possible to provide
either a configuration memory in each cell and/or for a group of
cells as known per se from the earlier patent documents of the
applicant.
[0011] Reference should be made to hierarchical structures, which
may be established by and for processor fields of the present kind,
be it for configuration data and/or data to be processed. It should
be mentioned that in a data stream, trigger vectors may also be
interposed in order to select between a plurality of different
configurations, in particular configurations stored in advance, in
the manner of a configuration ID. If, which is regarded as
possible, several configurations are executable on one configurable
cell element in a time-blending manner, as is provided for example
in PCT/EP 02/02402 (PACT25/PCTE), all originating from the present
applicant, then it may be possible in a preferred manner, to
transmit along to the cell elements even in the data transmission
information that relates to the association of a data packet with a
certain task to be processed. With respect to these identifying
specifications to be transmitted along with the data, reference is
made to PCT/EP 02/02403 (PACT18/PCT), where particularly the
explanations regarding APID should be compared, as well as in
PCT/EP 02/10572 (PACT31/PCToe), where the explanations regarding
CONFIGID should be compared. As far as the cell elements are
concerned, it is per se possible that a currently considered
reconfigurable architecture, for which a specific program is to be
compiled, is a (fully) homogeneous field, in which for example as
in the known XPP of the applicant a plurality of cells having in
particular segmented buses in between are provided, it being
possible, but not absolutely necessary, for the cells to be ALUs,
in part having an extended range of function (EALUs), compare
PCT/DE 97/02949 (PACT02/PCT), and (multi-stage) register units
coupled to the input and output buses being possibly provided on
both sides of the ALU, compare, e.g., FREG, BREG in PCT/EP 01/11299
(PACT22a/PCT), as well as respective refinements in other patent
documents of the applicant. Furthermore, reference is made in this
regard to input-output registers in front of the ALU itself, which
under a different name are also found in other writings of the
applicant.
[0012] For this purpose, the communication of the cell elements is
preferably subjected to protocols such as the applicant has already
described in connection with the XPP architecture.
[0013] Mention should be made in particular of the RDY/ACK
protocol, the RDY/ABLE protocol from PCT/DE 03/00489 (PACT16/PCTD)
as well as the additional protocols described there such as CREDIT
protocols etc., e.g., protocols having a reject option. As
applicant has pointed out in earlier applications, possibly
received, but no longer needed, data packets may be discarded. Here
mention should be made only by way of example of PCT/EP 2004/003603
(PACT50/PCTE), which is likewise in its full extent relevant also
for other purposes, such as for application purposes with respect
to the reconfigurable architecture for instance in connection with
hyperthreading, processor-coupling etc., and which for disclosure
purposes is to be regarded as incorporated in its full extent.
[0014] The cell elements may take the form of and/or include in
particular ALU-PAEs, EALU-PAEs, RAM-PAEs, RAM+ALU-PAEs,
function-folding PAEs (see DE 10 2005 005 766.7, DE 10 2005 010
846.6, DE 10 2005 014 860.3, DE 10 2005 023 785.1, EP 05 005 832.0,
EP 05 019 296.2, EP 05 020 297.7, EP 05 020 772.9, and (PACT62
ff)), graph-folding PAEs, sequencer structures connected via
command lines as well as PAEs, which may have, in addition to a
configurable or adjustable unit such as an ALU, a memory such as a
circular buffer and the like, in particular those having several
pointers etc., also parts firmly defined once in their function,
for example FPGA-like logic circuits that are defined, FPGA-like
groups that are reconfigurable only seldom and preferably without
recourse to preferred, in particular faster configuration methods
and/or logic circuits fixed in their functionality such as ASICs,
which may be used for example for certain I/O protocols such as
RS232, LAN, VGA, XVGA, DVI, USB, S/PDIF, Firewire, RAMBUS etc.
[0015] Furthermore, using the ASIC-like logic circuits, which may
belong to the cell elements, it is possible to fall back on fixed
functions, for example ASIC-like programmed DCT algorithms, FIR
filters or IIR filters, VITERBI algorithms etc., which may be of
significance for various applications such as in general purpose
processors, general purpose co-processors, microcontrollers,
sequencers, image editing and/or image processing such as for HDTV,
cameras, base stations, mobile telephones, radio receivers
(software-defined radio), smart antennas, CODECs and/or parts for
these.
[0016] In order to be able to use such structures and methods of
structure operation, the corresponding hardware must now be
designed and data processing processes capable of being executed on
this hardware must be defined.
[0017] Experience has shown that, as discussed above with respect
to FIG. 1, it is already possible without problem to design
hardware having the aforementioned architecture, protocols etc. and
to write programs for it. As far as programs for the architecture
are concerned, reference is made in particular to the NML language
and the documentation, manuals and general descriptions existing
for it. It should be mentioned that programming languages are known
per se and are optionally applicable to the specific architecture
as well. BASIC, LISP, COBOL, PL-M, ADA, ALGOL, FORTRAN, BASH, TCL,
but also JAVA, C in various dialects such as C++, PASCAL, OBERON,
EIFFEL, PERL, A, B, XML, UML are examples of possibly relevant
high level programming languages.
[0018] Embodiments of the present invention make possible at least
partial improvements in the design and/or with respect to the
usability of structures and architectures mentioned at the
outset.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] FIG. 1 shows a procedure according to the related art.
[0020] FIG. 2 shows an improved method in accordance with the
present invention for creating and/or programming hardware.
[0021] FIG. 3 shows a parameterizable hyper-PAE according to an
example embodiment of the present invention.
[0022] FIG. 4 shows insertion and removal of registers according to
an example embodiment of the present invention.
[0023] FIG. 5a shows a connection of an element array to a hardware
module by way of FIFO memories according to an example embodiment
of the present invention.
[0024] FIG. 5b shows a connection of an element array to a hardware
module with RAM memories as coupling elements according to an
example embodiment of the present invention.
[0025] FIG. 6a shows a connection of an element array to a hardware
module by way of FIFO memories, where trigger vectors are
transmitted, according to an example embodiment of the present
invention.
[0026] FIG. 7 shows a coupling arrangement according to an example
embodiment of the present invention.
[0027] FIGS. 8a-8d show spatial arrangements of hardware modules
with respect to an NW field, according to an example embodiment of
the present invention.
[0028] FIG. 9 shows various interconnections between hardware
modules and between the hardware modules and parts of an element
array, according to an example embodiment of the present
invention.
DETAILED DESCRIPTION
[0029] As is yet to be explained below, FIG. 2 shows essentially
parts of the design flow as is also known in FIG. 1 from the
related art, but it supplements and extends or modifies it in an
inventive manner. As will become apparent and be explained below,
the following is of particular importance in this regard.
[0030] First, a high level language program is provided, in which
initially no reference needs to be made to actual hardware
characteristics. This program may be written in the conventional
high level languages such as C++, JAVA, MATLAB etc. Thus,
programming is performed in abstraction from any hardware, ergo at
this point one preferably, but not necessarily, uses an entirely,
at least partially hardware-abstracted language. These
hardware-abstracted programs or this hardware-abstracted program is
then translated as known per se preferably with reference to a
quasi-maximally free hyperset, that is, a superset of possible
hardware objects, which for individual objects may include a
plurality of variants, these variants also for example, which is
preferred, possibly differing from one another in a manner
determinable by parameters in one or in multiple characteristics.
When the hardware-abstracted high level language program is
translated with reference to a quasi-maximally free hyperset of
possible hardware structures etc., for the purpose of which a
transformation compiler is used, then for this purpose one may fall
back on a multitude of PAEs parameterized for this hyperset and
similar suitable modules stored in a software library. The modules
in the library may be intended for parameterized or still
parameterizable elements of the hyperset, and, like the translation
performed by the transformation compiler as described above, may
be created both by machine coding and, if desired, by entirely and/or
partially manual coding. It should be mentioned that the use of
modules in machine and/or manual translation is not absolutely
necessary.
[0031] The parameterization may be performed interactively by a
programmer, in particular by interaction with a place-and-route
tool, but may also be suggested by the latter, possibly even in a
fully automatic manner, and possibly only be confirmed and/or
stipulated without confirmation. Alternatively, heuristic methods
are possible as well, possibly even interactively and/or by
open-loop and closed-loop control of a place-and-route tool. In
heuristic methods, an iterative procedure using the place-and-route
tool or another tool in the programming and hardware definition
environment may be performed. It should be pointed out that such
iterations may occur manually, semi-automatically and/or
alternatively and particularly preferentially in a fully automatic
manner.
[0032] With the heuristic, SETPOINT variables may be specified for
this purpose, which are to be reached by the iteration, by trial
and error for example. In this connection, for purposes of
disclosure, explicit reference should be made to the methods of
"simulated annealing."
[0033] In addition to methods of simulated annealing, obviously,
evolutionary methods such as genetic algorithms may be readily used
as well.
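A heuristic iteration toward a setpoint can be sketched as follows. This is only an illustrative simulated-annealing loop under stated assumptions: the parameter names `bit_width` and `num_alus`, the toy area model in `cost`, and the linear cooling schedule are all hypothetical and are not taken from the application. In a real flow the cost would come from a place-and-route or estimation tool.

```python
import math
import random


def cost(params, setpoint):
    """Hypothetical cost: distance of a toy area estimate from the setpoint."""
    area = params["bit_width"] * params["num_alus"]  # assumed area model
    return abs(area - setpoint)


def neighbour(params, rng):
    """Change one parameter at random to obtain a nearby parameterization."""
    new = dict(params)
    if rng.random() < 0.5:
        new["bit_width"] = rng.choice([8, 16, 32, 64])
    else:
        new["num_alus"] = max(1, new["num_alus"] + rng.choice([-1, 1]))
    return new


def anneal(start, setpoint, steps=2000, t0=50.0, seed=0):
    """Iterate by trial and error, accepting worse candidates less and less often."""
    rng = random.Random(seed)
    current = best = start
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-9  # linear cooling (assumed)
        candidate = neighbour(current, rng)
        delta = cost(candidate, setpoint) - cost(current, setpoint)
        # Always accept improvements; accept deteriorations with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if cost(current, setpoint) < cost(best, setpoint):
            best = current
    return best


start = {"bit_width": 8, "num_alus": 1}
best = anneal(start, setpoint=128)
```

The loop keeps the best parameterization seen so far, so the returned result is never worse than the starting point with respect to the chosen cost. A genetic algorithm could replace `anneal` without changing `cost`.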
[0034] In this connection, quasi-maximally free incidentally means
for the hyperset that the number of limitations to generally
available objects is as low as possible, that is, that as many
degrees of freedom as possible remain. Notwithstanding the demand
for as many degrees of freedom as possible, however, limitations
may be necessitated by certain factors such as, e.g., the
constructability of modules in the target semiconductor
implementation, which is why the term "quasi-" maximally free is
chosen. Incidentally, it should be pointed out that in certain
cases the quasi-maximally free hyperset needs to contain only one
PAE, which then however must be largely and in many parameters
parameterizable, from which by parameterization many mutually
distinct PAEs are derivable.
[0035] The final result is thus a program from a multitude of
functional blocks, which are indicated in FIG. 2 as f(n) for
different n.
[0036] On the basis of this program, which was already generated
with recourse to hyperset elements from the high level language
program and was thus generated in a manner according to the present
invention, novel with respect to the related art and regarded as
patentable in itself, a further
improvement may now be achieved. First it is possible (proceeding
to the right in the illustration) to select certain of the program
parts for processing on the hardware later executing the program
not by elements provided for general purposes, selected from the
hyperset and determined by parameterization etc. entirely in their
hardware construction, which, programmable or configurable, are
available also for quasi any other task to be processed in the
reconfigurable field, but to be implemented individually and/or
jointly in a hardware system specialized and optimized or
optimizable by dedication. In FIG. 2, program parts f(3), f(n),
f(n-2) are selected for this purpose. Typically, such program parts
may and will be configurations or configuration parts or an
individual configuration for an XPP field or the like, which is
composed of an at least partially reconfigurable set of elements
such as ALU-PAEs, graph-folding PAEs, function PAEs, MAC PAEs, RAM
PAEs, ROP PAEs and/or input-output PAEs, which are described in the
hyperset or describable by the latter, in particular completely
describable by parameterization. The selection of the type of
modules to be implemented may occur in various ways; the following
possibilities being mentioned only by way of example, it being
obvious that it is possible and preferred in a practically
preferred embodiment of the present invention not to fall back
exclusively on a single one of the possibilities, but rather to
provide multiple or all of the possibilities for simultaneous or
successive implementation as hardware module of program parts:
[0037] selection of program parts by hand, which may be done
particularly by inserting suitable text passages in the program
code such as, e.g., by inserting control characters;
[0038] selection of those program parts that occur and/or must be
executed particularly frequently in the entire program code or in a
multitude of program codes, which are to be executed independently
of one another on the hardware to be produced, will probably come
to be executed, that is, a selection according to execution time
and/or execution frequency;
[0039] modules, from which one is able to ascertain that with
respect to other elements they are otherwise executable only with
difficulty or at a higher clock frequency, that is, program parts
that prove to be critical with respect to performance; the
selection of such program parts may be preferred so as to be able
to execute certain program parts on a certain piece of hardware at
all;
[0040] selection of program parts, which otherwise would generate a
particularly high power loss on the hardware to be produced;
[0041] program parts, which could result in a particularly high
surface area requirement of the hardware chip;
[0042] selection of program parts according to heuristic methods,
which allows, particularly on the basis of the program code, for
a--even for itself possible--parameterization; and
[0043] selection of program parts by profiling or comparable
techniques;
it may be provided either to identify on the basis of a source code
analysis those parts for which dedicated hardware modules are
particularly suitable, for example with a view to the
above-mentioned parameters with respect to executability,
implementability etc. Alternatively and/or additionally it is
possible to perform a profiling during the execution of programs.
For this purpose, an analysis may be made as to which program
parts, subprogram parts, configurations, configuration parts etc.
are subject to a particularly frequent execution, are
performance-critical, surface area-critical, require many and/or
long memory accesses, are particularly frequently used in various
configurations etc. The advantage of such a profiling lies in the
fact that for typical applications that call up a multitude of
programs, for example the application of a processor as a general
purpose processor on a server, a laptop or a workstation,
processors, co-processors and the like may be defined that are
optimized for one or more typical users. To be sure, it is possible
to perform such a profiling on a simulator as well, but the particular
advantage of the present top-down approach is that an already
high-performing chip, which thus meets real time conditions and
does not compromise a user whose profile is to be detected, may
initially be made available. Thus, by using the target architecture,
it is possible
to detect how it may best be subjected to a design change process
without performance losses, but rather while improving the
performance with respect to critical parameters. It is pointed out
that, apart from the circuits described here, corresponding
precisely to the later desired architecture by the definition of
hardware modules, the idea of starting from the actual target
architecture for defining modified circuits by selecting particular
program parts and described definition of the hardware parts is
regarded as inventive for itself; in particular, the submission of
partial applications and the like is reserved for this purpose
and/or for parts of these aspects. Reference should be made to the
possibility of performing a successive processor improvement by
transmitting a multitude of profiles to a central unit, e.g., a
processor manufacturing firm, in particular by transmission over
the Internet. This may be used, e.g., for standard programs and for
other processors.
[0044] In this connection, it should be mentioned incidentally that
by using the data obtained by profiling, a manual selection and/or
an automated selection may be made.
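An automated selection from profiling data can be sketched as follows. The sketch assumes, purely for illustration, that profiling yields an execution trace listing the name of each program part every time it runs; the function `select_hot_parts` and the trace contents are hypothetical.

```python
from collections import Counter


def select_hot_parts(trace, top_n=2):
    """Return the names of the most frequently executed program parts."""
    counts = Counter(trace)
    return [name for name, _ in counts.most_common(top_n)]


# Hypothetical trace recorded while the program runs on the target hardware.
trace = ["f1", "f3", "f3", "f2", "f3", "fn", "fn", "f1", "fn", "fn"]
hot = select_hot_parts(trace)
```

Here execution frequency is the only criterion. The same structure would apply if the trace carried execution times, memory accesses or power estimates instead of plain occurrences.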
[0045] It should be mentioned that in the selection it is not
necessary always to pay attention only to one parameter. Rather, it
may be possible, for example by recourse to methods of fuzzy logic,
to take into account multiple or all of the above-mentioned
influencing parameters, particularly with a suitable weighting
and/or in a nonlinear manner. The selected program parts are
initially mapped onto the already known PAEs existing in the hyperset, which
incidentally may include, in addition to the previously mentioned
PAEs, also PAEs that are made up of a combination of the
functionalities of the above-listed PAEs, that is, for example, a
parameterizable or parameterized PAE having a parameterizable set
of ALUs of parameterizable bit width and parameterizable range of
function, it being possible for this PAE to include additional
graph-folding, parameterizable elements, just as function-folding,
parameterizable elements parameterizable with respect to the bit
width for example and/or in particular parameterizable memory areas
having pointers and/or command-control line of one or multiple
ALUs, or other data-modifying parts in the PAE, in order to
implement sequencers or microprocessors, input-output elements and
the like.
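Taking several influencing parameters into account with a weighting, as mentioned at the start of this paragraph, can be sketched as a simple scoring function. The sketch is an assumption-laden illustration rather than the method of the application: the criteria names, the normalization of all values to the range 0 to 1, the weights, the threshold, and the choice to square the power term as an example of a nonlinear contribution are all hypothetical. It is also a plain weighted sum, not a fuzzy logic system.

```python
def score(part, weights):
    """Combine several normalized criteria into one selection score."""
    return (
        weights["freq"] * part["freq"]
        # Nonlinear term (assumed): squaring emphasizes parts with high power loss.
        + weights["power"] * part["power"] ** 2
        + weights["area"] * part["area"]
    )


def select_parts(parts, weights, threshold):
    """Select every program part whose combined score reaches the threshold."""
    return [name for name, p in parts.items() if score(p, weights) >= threshold]


# Hypothetical criteria per program part, each normalized to 0..1.
parts = {
    "f1": {"freq": 0.1, "power": 0.2, "area": 0.1},
    "f3": {"freq": 0.9, "power": 0.5, "area": 0.3},
    "fn": {"freq": 0.4, "power": 0.9, "area": 0.8},
}
weights = {"freq": 0.5, "power": 0.3, "area": 0.2}
selected = select_parts(parts, weights, threshold=0.5)
```

With these values, f3 is selected mainly because of its execution frequency and fn mainly because of its power loss and area, which shows how no single parameter needs to dominate the decision.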
[0046] An example of a still parameterizable hyper-PAE is shown in
FIG. 3. There one finds various parameterizable units such as,
e.g., bus inputs having m inputs, m representing a parameter, that
is, m different operands may be supplied to one PAE. The buses are
respectively k bits wide, k in turn representing a parameter, and n
different buses are provided, from which the m different inputs are
picked off. The total number of buses, n, also represents a
parameter. Within the PAE, different operand-combining units are
then shown by way of example, in the exemplary embodiment shown in
FIG. 3 for example a divider having a combinatorial network, a
multiplier, an ALU stage, a Boolean logic, a barrel shifter stage
as well as a floating point unit. It should be pointed out that the
aforementioned units in turn are parameterizable, for example with
respect to the operand width, that is, they may be, e.g., 8 bit, 16
bit, 32 bit or 64 bit stages or obviously stages of other bit width
as well, it being additionally possible for the range of function,
for example of the ALU, the floating point unit etc. to be defined
via parameters. It should be pointed out that, for reasons of
simplicity of the drawing, certain elements that may also be
provided in a hyper PAE, such as sequencer units and function-folding
PAEs (compare PCT/EP 03/09957), have been omitted. It
should be mentioned that memories of parameterizable width and
depth may also be provided etc. In this connection, reference is
made in particular to the previous applications of the present
applicant, in which a multiplicity of different logic elements such
as also FPGA-like structures, SIMD units etc. for PAEs are
disclosed, this disclosure being incorporated in its full
extent.
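The parameters m, k and n named for FIG. 3 may be illustrated, only
by way of example, by the following minimal sketch. The class name
HyperPaeParams, its field names, the unit names and the derived
select-bit count are assumptions made for illustration and are not
taken from the disclosure.

```python
from dataclasses import dataclass

# Hypothetical names for the operand-combining units listed for FIG. 3.
ALL_UNITS = frozenset(
    {"divider", "multiplier", "alu", "boolean", "barrel_shifter", "fpu"}
)


@dataclass(frozen=True)
class HyperPaeParams:
    """Hypothetical parameter record for one still parameterizable hyper PAE."""
    m_inputs: int                  # m: number of operands supplied to the PAE
    k_bus_width: int               # k: width of each bus in bits
    n_buses: int                   # n: total number of buses to pick from
    units: frozenset = ALL_UNITS   # operand-combining units kept in the PAE

    def __post_init__(self):
        if min(self.m_inputs, self.k_bus_width, self.n_buses) < 1:
            raise ValueError("m, k and n must all be at least 1")
        unknown = self.units - ALL_UNITS
        if unknown:
            raise ValueError(f"unknown units: {sorted(unknown)}")

    def select_bits_per_input(self) -> int:
        # Bits needed for one input to choose one of the n buses.
        return (self.n_buses - 1).bit_length()

    def input_select_bits(self) -> int:
        # Each of the m inputs needs its own bus selection.
        return self.m_inputs * self.select_bits_per_input()


pae = HyperPaeParams(m_inputs=2, k_bus_width=32, n_buses=8,
                     units=frozenset({"alu", "multiplier"}))
print(pae.input_select_bits())  # 2 inputs * 3 select bits each = 6
```

Fixing a parameter such as k early while leaving the set of units
open corresponds to the partially parameterized state described
further below.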
[0047] Regarding the parameterizable range of function, the floating
point unit may be, only by way of example, a floating point unit
that is capable of at least one, preferably several of the
following combinations in the still parameterizable definition:
multiplication, addition, subtraction, division, floating point
combination, look-up tables, possibly having an interpolation
option for certain functions such as trigonometric functions (sine,
cosine, tangent), sequential calculations as for Taylor series, it
being possible for special hardware to be provided for certain
approximations/interpolations and it being possible preferably in
addition for a parameterization of the floating point unit to be
provided with respect to the data word width in the mantissa and/or
exponent.
[0048] A parameterizable library for such a hyper PAE may have
recourse, for example, to a procedure in which so-called ifdef
constructs are used. These supply certain program segments to a
translation (in hardware circuits, which must be actually provided
on a chip) only if corresponding definitions are provided for this,
for example by specifying the parameters, for example the range of
function. It should be mentioned that this is also possible for
variables and elements of the hyper PAE such as the configuration
registers specified also at varying depth, possibly the protocols
(compare RDY/ACK, credit protocols, RDY/ABLE etc.) capable of being
implemented on a PAE, just as the parameterization of an output,
different multiplexer stages in a PAE etc.
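The effect of such ifdef constructs may be sketched as follows. The
sketch is a deliberately simplified stand-in for a real hardware
description preprocessor: it only handles `ifdef and `endif, and the
function name filter_ifdef, the template ALU_TEMPLATE and the macro
names HAS_ADDER and HAS_MULTIPLIER are hypothetical. The template is
an illustrative fragment, not a complete ALU.

```python
def filter_ifdef(source: str, defined: set) -> str:
    """Keep only the lines whose enclosing `ifdef blocks are all defined.

    Simplified on purpose: handles nested `ifdef/`endif, but no `else,
    `ifndef or macro expansion.
    """
    kept = []
    active = []          # one flag per currently open `ifdef block
    for line in source.splitlines():
        words = line.split()
        if words and words[0] == "`ifdef":
            active.append(words[1] in defined)
        elif words and words[0] == "`endif":
            active.pop()
        elif all(active):
            kept.append(line)
    return "\n".join(kept)


# Hypothetical library fragment of a parameterizable ALU-PAE.
ALU_TEMPLATE = """\
module alu_pae #(parameter W = 32) (input [W-1:0] a, b, output [W-1:0] y);
`ifdef HAS_ADDER
  wire [W-1:0] sum = a + b;
`endif
`ifdef HAS_MULTIPLIER
  wire [W-1:0] prod = a * b;
`endif
endmodule
"""

# Only the adder is requested, so the multiplier never reaches synthesis.
hardware = filter_ifdef(ALU_TEMPLATE, {"HAS_ADDER"})
print(hardware)
```

Only the segments whose definitions are supplied are passed on, so
that no silicon is spent on the multiplier in this example.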
[0049] In order to achieve the desired improvements either with
respect to some of the previously selected critical criteria such
as power consumption, surface area efficiency or execution
performance and/or a particularly great improvement in at least one
of the areas combined with at best a partial improvement of other
areas or a complete disregard of the same, for example, if in high
performance-critical program parts, power and/or surface area do
not matter, a preferably automatic and/or partially automatic
converter step is now executed in a preferred embodiment. This is
indicated in the figure as NML2V and represents a converter step by
which a hardware language description is determined for the program
parts that were selected, possibly by taking into account the
reason for the selection. In light of the fact that the program
parts for the hardware modules were selected with reference to one
or more elements in a hyperset, it is possible to find an identical
translation, that is, it is ensured that no errors occur in the
conversion into a hardware-describing code such as VERILOG, which,
if this is desired, may be confirmable by intermediately executable
simulation steps. Thus, one first obtains a hardware-describing
code, e.g., a VERILOG code, which has the corresponding
functionality of the parameterized PAE in the investigated
configuration(s).
[0050] Surprisingly, the use of hyper PAEs in the definition of the
program parts, which are then used for implementing hardware
modules, proves to be nondisruptive for the converter to hardware
code. The reason for this is that certain of the parameterizable
characteristics such as the bit width of the PAE, for example, must
already be determined when determining the actual program for the
transformation compiler, while other characteristics such as the
actual ranges of function for example, that is, for example the
provision of a divider stage, a multiplier stage, an adder stage
and/or a subtracter stage in an ALU-PAE do not yet have to be
defined. In other words, simultaneously with the transformation
compilation the quasi-maximally free hyperset is reduced to a
parameterized and/or partially parameterized hypersubset, in
particular fewer degrees of freedom being specified, that is, no
modification being required. In this instance, the bus widths to
the cells may already be defined for example. It should be
mentioned that the already defined parameters, which were defined
in the transformation compilation for example, are made available
to the NML-to-VERILOG converter or, more generally, to the hardware
language description-generating converter, which may be done by
corresponding indications on the program parts, for example in the
form of comment lines and/or by data separated from the actual
program part. The transformation compiler is thus designed for the
generation of parameterization information of hardware on which it
is to be based. In contrast to conventional compilers,
hardware-describing code, that is, code describing degrees of
freedom, is also generated.
[0051] The program parts, for which a hardware module is to be
implemented in an optimized manner, now not only have parameters
defined with respect to the PAEs, but rather it is at the same time
clear in which configuration a certain PAE is to be operated in the
program part that is to be converted to a hardware module. This
configuration now has the consequence that it is, if applicable,
immediately clear that certain parts of the PAE are not used, which
is the case for example if in the transformation compiler a
floating point unit must still be provided for other program parts,
but no floating point operations are required in a currently
considered program part that is to be translated into a hardware
module. The configuration that is defined for purposes of the
present consideration (bearing in mind that multiple configurations
to be processed successively may be present in the PAE for
sequencer-like PAEs or PAEs operated in a sequencer-like manner)
thus indicates that certain units are not required and it is then
possible to ascertain that for example a multiplexer connected
downstream from an operand combination stage, which is used to
select which operand combination unit should set its output or
outputs to an output region, is dispensable or partially
dispensable. The multiplexer typically situated behind the multiple
operand combination units of a typical PAE may thus as a rule be
readily simplified in a given hardware module. An invention per se
is likewise seen in the removal of multiplexer stages and/or
complete multiplexer units in the determination of hardware modules
with recourse to hyper PAEs or a quasi-maximally free set of hyper
PAEs. It should be mentioned that the removal of elements not
required in a configuration to be executed in a PAE may occur by
the NML2V converter, that is, in the isomorphous hardware
simplification means, and/or that the selection of hardware
elements to be removed as not required may also be performed by way
of a synthesis. Incidentally, it should be pointed out that in the
hardware module or the parts intended for the latter the
configuration register does not necessarily have to contain only
one constant value as was, e.g., depicted for reasons of better
illustration. Rather, particularly if wave-like changes or
reconfigurations of the operating mode and/or conditional changes
of the operating mode of an individual element are required for the
hardware module, for example as a function of data processing
stages above or below, multiple possible configurations may be
stored in the configuration register. The selection among such
previously stored configurations, which are disclosed by the
applicant in other applications, is pointed out in a manner fully
incorporating by reference, compare in particular, although not
exclusively, PCT/DE 98/00334 (PACT08/PCT). Incidentally, it should
also be pointed out that not only trigger vectors etc. are
transmittable, but possibly, within the hardware module and/or from
outside in an accordingly limited range of function, also data are
transmittable directly to a unit, which may be regarded as
configuration data, work instructions (commands etc.) and/or which
may contain respective instructions, in particular set between
operands. Incidentally, it should be pointed out that the hardware
module may also be defined in such a way that freely definable
configurations are still executable on the defined hardware module,
these freely definable configurations then in each individual
element accessing a reduced set of functions and/or a limited
connectivity, for example only with respect to next-neighbor
connections instead of global bus connection extending over many
cells being possibly provided between the individual elements of
the thus defined hardware module, whereas nevertheless a
multidimensional, that is, also possibly clearly more than
two-dimensional connectivity and/or a toroidal, even
multi-dimensionally toroidal connectivity is feasible.
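The simplification of the output multiplexer described above may be
sketched as follows, under the assumption that each configuration
names exactly one operand combination unit that drives the output.
The function name simplify_output_mux and the dictionary layout of
the result are hypothetical.

```python
def simplify_output_mux(available_units, configurations):
    """Return (kept_units, mux) for one PAE of a hardware module.

    available_units: units present in the parameterized hyper PAE.
    configurations:  for each configuration the PAE will run, the unit
                     that drives the output in that configuration.
    """
    unknown = set(configurations) - set(available_units)
    if unknown:
        raise ValueError(f"configuration uses missing units: {sorted(unknown)}")
    # Keep the original unit order, drop everything no configuration selects.
    kept = [u for u in available_units if u in configurations]
    if len(kept) <= 1:
        mux = None                      # the multiplexer is dispensable
    else:
        mux = {"inputs": kept, "select_bits": (len(kept) - 1).bit_length()}
    return kept, mux


units = ["adder", "multiplier", "divider", "fpu"]
print(simplify_output_mux(units, ["adder"]))
print(simplify_output_mux(units, ["adder", "multiplier", "adder"]))
```

With a single configuration the multiplexer disappears entirely;
with several stored configurations it is only partially dispensable
and shrinks to the units that are still selected.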
[0052] The hardware description code of the NML2V converter thus
generated preferably automatically is now still optimized in a
particularly preferred variant of the present invention. The aim of
this optimization is on the one hand to allow for the elimination
of the registers, combination units etc. in a parameterizable PAE
that are not required for the respective functionality; reference
being made in this connection to the earlier applications of the
present applicant, compare PCT/EP 03/08080 (PACT30/PCTE) and PCT/EP
03/08081 (PACT33/PCTE). These provided for a configuration of a
field or of an individual PAE to be defined once by the use of
fuses, that is, breakable elements and the like in order to allow
for a problem-free construction of chips having ASIC functionality
without the requirement of a mask construction for each ASIC
embodiment; although in this previously known variant possibly not
required elements of functionality remained in the ALU or another
unit of a PAE. If, for example, a PAE having an ALU, which included
a subtracter, a divider, an adder and a multiplier, was configured
in a fixed manner in order to provide an adder, then the silicon
surface area used for producing the multiplier had to be provided
nevertheless. The present application and invention among other
things in one of its aspects aims to avoid this, which contributes
toward a reduction of the size and thus possibly also of the
execution times of a dedicated hardware area. The corresponding
changes in the parameterizable and already partially parameterized
hyper PAE take place in a retiming stage, in which initially
unnecessary registers are removed. The removal of the registers
first results in a decapsulation of functional parts previously
encapsulated by the use of the PAE definition. This is by no means
critical, however. On the contrary, in the case of a suitably
intelligent design chain, it is rather very advantageous.
[0053] The design chain hereby provided according to the present
invention inherently features the intelligent layout, which renders
obsolete the complex encapsulation required in the related art, for
example by input-output FIFOs and/or registers, which are
practically controllable only via suitable protocols such as
RDY/ACK protocols. For this purpose, e.g., initially the internal
registers are removed, that is, the registers situated between the
considered cells at their mutual junctions. The removal of the
registers, however, does not occur blindly for all registers, but
rather there is a preferably readily automated selection of
registers that are removable or that must remain in the piece of
hardware. First, constants should remain in the piece of hardware.
Further, it is strongly preferred if registers for preloading
values (PRELOAD registers) are not removed. Additional registers
are initially not required in a given implementation of the
method.
[0054] This obviously changes the timing behavior of the overall
system. Now, the present invention provides for the registers to be
removed nevertheless, but for a synthesis step to be performed in
order to ensure a correct timing of the data processing by the
considered piece of hardware. Preferably, therefore, a synthesis
step is performed according to the present invention. This also
applies to the inputs/outputs of the hardware module to be
constructed.
[0055] It should be pointed out that by and for suitable logging
bus-internal registers may readily be used, that a
feeding-in/reading-out of data in RAM-PAEs having sufficient memory
depth is possible and/or a reading-in/reading-out of data in the
preload memory, if required at all, may occur, or the provision of
input-output registers at the end and at the beginning of the piece
of hardware, unless for example the long-familiar FORWARD-BACKWARD
registers are also to be provided for purposes of use by other
PAEs. In a preferred design, constant contents of RAMs are
implemented by ROMs or mapped onto ROMs.
[0056] The removal of the registers will now change the timing
behavior. Initially, the frequency behavior of the considered
circuit to be provided may deteriorate, possibly even
significantly. This may be compensated by again inserting registers
in suitable places, which are either arranged according to fixed
rules, for example by inserting less deep register stages in places
where previously deeper register stages were provided, by inserting
register stages of the same depth as those that were previously
removed, or, particularly preferably, by considering the signal run
times through the remaining hardware circuits in order to identify
places at which registers are required to increase the frequency;
one skilled in the art being able to perform such a procedure per
se without deeper explanation.
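The run-time-based insertion of registers may be illustrated, only
by way of example, for an unbranched chain of combinational stages.
The greedy strategy, the function name place_registers and the
delay figures are assumptions made for this sketch; an actual
retimer may work differently.

```python
def place_registers(stage_delays, clock_period):
    """Return the stage indices in front of which a register is inserted.

    stage_delays: combinational delay of each stage, in data flow order.
    clock_period: longest register-to-register delay that is acceptable.
    Greedy sketch: cut as late as possible along the chain.
    """
    cuts = []
    accumulated = 0
    for index, delay in enumerate(stage_delays):
        if delay > clock_period:
            raise ValueError(f"stage {index} alone exceeds the clock period")
        if accumulated + delay > clock_period:
            cuts.append(index)      # register goes in front of this stage
            accumulated = 0
        accumulated += delay
    return cuts


# Delays in arbitrary time units for five stages, target period 6 units.
print(place_registers([2, 3, 4, 1, 5], 6))  # [2, 4]
```

In the example the chain is cut in front of stages 2 and 4, so that
no register-to-register segment exceeds the target period.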
[0057] It must further be kept in mind that, while the considered
software part may be regarded as initially balanced, balancing is
normally performed or could be performed by providing register
stages between different data-processing functionality areas in or
between the PAEs etc. The initial removal of the registers now
impairs the possible or already given balance of the
data-processing paths, which must be coupled at certain points. In
another register insertion step, the attempt is now made either to
arrange the registers already provided again in such a way that not
only the possibly demanded and required frequency increase is
obtained, but rather at the same time also a data run time balance
is achieved. Thus an automatic balancing in the retiming means by
register insertion is brought about by retiming only on the basis
of program parts possibly to be made into hardware modules, in
which it is pointed out that certain data paths are to be balanced
against one another.
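The balancing of data paths that must be coupled at a certain point
may be sketched as follows. The sketch assumes that the latency of
each path is known as a number of register stages; the function name
balance_paths and the path names are hypothetical.

```python
def balance_paths(latencies):
    """Return the register stages to add to each path joining at one point.

    latencies: mapping of path name to its latency in register stages.
    Every path is padded up to the latency of the slowest path.
    """
    target = max(latencies.values())
    return {path: target - stages for path, stages in latencies.items()}


# Two operand paths and one control path meeting at the same unit.
print(balance_paths({"operand_a": 3, "operand_b": 1, "control": 3}))
# {'operand_a': 0, 'operand_b': 2, 'control': 0}
```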
[0058] Something else has to be kept in mind as well when retiming:
The hyper PAE, even in the case of a given parameterization, will
normally still include functionalities that are not required in the
hardware module. For example, it would be conceivable that a
hardware module is written for a program part in which no divisions
are required at all. In this case, a divider stage could be omitted
in a PAE. A division now requires a certain delay, that is, a run
time across the module. This will be significantly greater than for
example the run time across an adder stage. The primarily given
data run time balance of the parameterized or hyper PAE will be
such that the run times of a divider stage are taken into account
as well. If, however, in a hardware module at a certain point a
divider stage is no longer required, which is discernible, then
such non-required unit may and preferably will be removed from the
PAE, which then changes the delay of the data run through the unit.
The hardware module should also be adapted accordingly when
retiming. Fundamentally, it should be pointed out that this is not
absolutely necessary, however. A certain advantage is already
obtained if between the individual stages of a hardware module
composed of multiple hyper PAEs non-required register stages are
removed. In the preferred case, however, non-required parts are
also removed from the hyper PAEs, which may occur during the
synthesis, for example, such as, e.g., the removal of a divider
stage discussed above, other stages such as memory stage elements,
multipliers, floating point units etc. being also removable, if
indicated. This too may be taken into account when retiming. For
this purpose, a synthesis is preferably performed, by which the
timing behavior is analyzed in an automated manner in order then
either automatically to insert registers in required places and/or
to provide indications where a programmer should insert registers
in order to ensure a proper timing behavior.
[0059] Incidentally, it should be pointed out that divider stages
were mentioned above. With regard to this and to the removability
of registers it should be pointed out explicitly, although
exemplarily, that on the one hand protocol-relevant and data
communication-relevant registers may be provided in a module or
array; such being readily removed at first. Precisely the division
shows, however, that certain registers shall not and/or cannot be
removed. The division may be implemented in two ways if a division
stage to be provided in hardware is to be constructed. The first
possibility provides for a combinatorial network, in which no
registers are required. The second variant provides for a
sequential division, in which a value is computed iteratively again
and again, comparable to the manual computation of a division. In
the latter case, intermediary results must be written into
registers. These must not be removed when retiming since they are
algorithmically required. The non-removal may be brought about,
e.g., by indications in the hardware-defining code of the hyper
PAE, which may lead to comments in a compiler code of the
transformation compiler that are not required for actual program
purposes. Alternatively and/or additionally, variants are
conceivable, in which first a removal and subsequently a
reinsertion may be performed.
[0060] In a particularly preferred variant, therefore, the hyper
PAEs may be marked as to whether certain registers are
algorithmically required such that they are not automatically
removed in an initial removal of the registers. Alternatively
and/or additionally, when removing superfluous registers, analyses
may be performed to the effect that registers having a feedback to
circuit regions located upstream of the data flow are not removed.
For such registers are automatically algorithmically relevant
registers. It should be pointed out that even algorithmically
required registers are obviously removable if the algorithm with
which they are associated is not executed; something that happens
for example in the case of a sequential division generally provided
in a hyper PAE if the division per se is not implemented in the
hardware module to be constructed. Incidentally, it should be
pointed out that feedbacks are implemented in the standard PAEs
provided by the applicant by backward registers. If these are
actually required in a given program part, it is advantageous not
to remove them or not to remove them without verification and/or
not to remove them completely.
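The two criteria named above, explicit marking and feedback to
upstream circuit regions, may be sketched on a small netlist graph.
The graph encoding, the function names and the example of a
sequential divider with an accumulator register are assumptions
made for illustration.

```python
def on_feedback_loop(graph, node):
    """True if node can reach itself again by following the data flow."""
    stack = list(graph.get(node, ()))
    seen = set()
    while stack:
        current = stack.pop()
        if current == node:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return False


def removable_registers(graph, registers, marked_required=frozenset()):
    """Registers that are neither marked as required nor part of a feedback."""
    return sorted(
        r for r in registers
        if r not in marked_required and not on_feedback_loop(graph, r)
    )


# Hypothetical sequential divider: r_acc feeds its result back into "sub".
netlist = {
    "in": ["r_in"],
    "r_in": ["sub"],
    "sub": ["r_acc"],
    "r_acc": ["sub", "r_out"],
    "r_out": ["out"],
}
print(removable_registers(netlist, {"r_in", "r_acc", "r_out"}))
# ['r_in', 'r_out']
```

The accumulator register is retained because it lies on a feedback
loop, whereas the two protocol registers at the boundary are offered
for removal.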
[0061] If indicated, registers are then inserted with the retimer.
It should be mentioned that it is in principle possible to insert
the registers at any place in the hardware module, as required. In
particular, if performance efficiency is the sole concern, then it
is possible to insert registers within a (parameterized) hyper PAE
that is provided in the hardware module. It should be pointed out,
however, that a simpler method of register insertion is obtained if
on the interfaces between multiple hyper PAEs in the hardware
module to be designed again those registers or a part of those
registers are inserted, which were initially removed. The reason
for this is to be seen in the fact that an optimum insertion in
these places is possible for the reason that the entire starting
definition of the hyper PAEs is selected to be such that an
insertion is automatically possible in these places. Reference is
made to FIG. 4, which shows how initially registers are removed for
an only exemplarily pipeline-like, only exemplarily unbranched
hardware module. These are shown as "removed registers" in a shaded
manner. In the hyper PAEs, which are drawn upon in the
parameterized form of the hardware module description, these
registers are the input/output protocol registers, that is, for
example the FREG/BREG of the hyper PAEs. Alternatively it is
possible to provide PAEs without FREG/BREG only with those
registers that are provided in the direct coupling path of the ALUs
and other logic elements for operand combination in the PAE in the
connections to the buses as protocol registers. Reference is made
in particular to OREG and PREG from PCT/DE 97/02949 (PACT02/PCT).
The newly inserted registers, which ensure the balancing or the
desired performance/surface area efficiency/latency following the
removal of the shaded registers, are drawn in FIG. 4 by dashed
lines and indicated as "inserted register" or for multi-stage
registers as "inserted FIFO."
[0062] It should be pointed out once more that the represented
insertion of registers, FIFOs and the like between the predefined
hyper PAEs not only simplifies the structural layout, but also
facilitates the verification and calculation of the delay times
across the circuits provided in the hardware module since the run
time behavior etc. of the underlying elements may be assumed as
well-known in the register removal step or the retiming step, which
facilitates a possibly iterative approach to the retiming task. In
addition, the insertion of registers between the previously used
and underlying (parameterized) hyper PAEs is particularly surface
area-efficient since, e.g., the use of general ALUs in the hyper
PAEs would there require a multitude of registers even though the
insertion would readily be possible there as well, for example in
order to achieve particularly high frequencies. In addition, there
is hardly a positive effect in a cut within an ALU or a PAE
core.
[0063] Incidentally, it should be mentioned that it is readily
possible to design a hardware module in such a way that it has to
run at a different working frequency than that provided for an XPP
field or another reconfigurable unit field. On the one hand, it is
possible to select the frequencies to be lower, for example to
reduce latencies, to reduce the surface area and/or to reduce the
power consumption. For the sake of completeness, it should also be
disclosed that for lowering the power consumption it is also
possible, if indicated, to work with other hardware definitions
such as for example different gate thicknesses of transistors in
comparison to a reconfigurable processor field to be likewise
provided. As an alternative to the purely power-saving
considerations, it is also possible in certain cases to design the
hardware modules for a certain frequency, which may be advantageous
especially if a particularly high data throughput is required in
the hardware module and/or if highly computing-intensive tasks must
be processed in it.
[0064] Incidentally, it should be pointed out that the register
stages or FIFO stages to be inserted or to be newly introduced or
reintroduced are usable not only with a view to, e.g., latencies,
but rather also in order to restore a balance of data flow paths
possibly destroyed by the initial removal of registers. It should
be pointed out that initially after removing the registers a
balanced data path automatically exists, but that possibly timing
conditions are not maintained; and that then initially the timing
behavior is restored by inserting additional registers, but that
because of that the balance between the individual data paths may
be disturbed. In order to restore the balance as well after having
restored the timing conditions and timing requirements, recourse
may be taken readily to the techniques for balancing from the
related art, particularly to those stemming from the applicant, by
using in particular precisely those algorithms to which recourse is
taken also in the construction of compilers such as for an NML
compiler for computing executable configurations. Reference is made
in particular to the applications PCT/EP 02/10065 (PACT11/PCTE),
PCT/EP 02/06865 (PACT20/PCTE), PCT/EP 03/00624 (PACT27/PCTE),
PCT/EP 2004/009640 (PACT48/PCTE). In these applications suitable
methods for balancing are described.
[0065] The output from the retimer is then a hardware code
definitely executable by recourse to the hyper PAEs or elements of
the quasi-maximally free hyperset, which is frequency-optimized
and/or throughput-optimized through the retiming. In addition, the
surface area is automatically optimized. The definition thus
obtained of new hardware areas as hard modules may now be
integrated into the XPP library. There are many possibilities for
the thus determined hardware module functionalities to be
integrated into and/or be connected to an XPP field or, more
generally, a field of reconfigurable and/or partially
reconfigurable elements. One possibility, for example, is to
provide a complete PAE, which does not have an ALU or an individual
sequencer as central functionality, but rather the specified
hardware functionality of the hardware module. In this instance, it
is particularly preferred if in such a PAE, as is provided in
PCT/EP 01/08534 (PACT14/PCT), an outwardly identical geometry and
in particular a connection geometry is provided, as in other PAEs
of the field. This has the great advantage that the homogeneity of
the field remains largely unimpaired. Alternatively and/or
additionally, it is possible to achieve a connection without a
corresponding consideration of form factors and the like, compare
DE 102 36 269.6 and DE 102 38 172.0-53 (PACT36 and 36I), by setting
the specific hardware modules next to the actual field. For this
purpose, it is possible to provide for an integral manufacture
and/or to manufacture the parts separately and then to let them
communicate via buses, via RAMs and the like with the
reconfigurable field, compare SOC technology etc.
[0066] Other possibilities of connection are described in the
figures.
[0067] FIG. 5a shows on the left a combination of an XPP field or
an FPGA field with a hardware module of the present invention, the
connection of the hardware module to the field occurring via FIFO
memories in the input path and/or output path, preferably via FIFOs
in both paths. By providing a FIFO memory between the field and the
or each hardware module of the present invention, a decoupling is
already
achieved in principle, which allows for a more independent
procedure and a deviating timing etc. especially with respect to
the processing speed.
[0068] It is particularly advantageous, however, if the exchanged
data packets are given identification information in the form of a
packet header or additional identification bits on each individual
word or the like. In this manner it is possible, for example, to
execute different tasks in a multithreading, hyperthreading,
multitasking, timeslotted or other manner either with the hardware
module and/or the XPP or FPGA field or another field and then, in
spite of the comparatively loose coupling effected via the FIFOs,
to ensure nevertheless that an exchanged data packet or data word
undergoes the correct processing in the receiver, that is, in the
hardware module or the XPP field or the like.
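The use of identification information on each exchanged data word
may be sketched as follows. The FIFO is modeled by a deque, and the
task names, the handler functions and the names send and receive
are hypothetical; they merely illustrate that the receiver selects
the processing from the identifier carried with the word.

```python
from collections import deque


def send(fifo, task_id, word):
    """Put a data word into the FIFO together with its identification."""
    fifo.append((task_id, word))


def receive(fifo, handlers):
    """Take the oldest word and process it as its identification demands."""
    task_id, word = fifo.popleft()
    return task_id, handlers[task_id](word)


# Two interleaved tasks share one loosely coupled FIFO to the module.
to_module = deque()
handlers = {"scale": lambda x: 2 * x, "offset": lambda x: x + 100}

send(to_module, "scale", 21)
send(to_module, "offset", 1)
send(to_module, "scale", 5)

results = [receive(to_module, handlers) for _ in range(3)]
print(results)  # [('scale', 42), ('offset', 101), ('scale', 10)]
```

Because the identifier is returned unchanged with the result, the
sender can in turn continue processing with the correct
configuration.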
[0069] It should be mentioned that it is already helpful if
identification information in the hardware module remains unchanged
and/or is changed only in such a way that the associated data
packet is processed in the manner provided following the return of
the processing result to the receiver, that is, for example the XPP
field, for example in that it is processed further using the
correct configuration.
[0070] Alternatively and/or additionally it is possible, however,
to co-transmit, in place of pure identification information and/or
in addition to the latter, also control instructions or the like,
in order to choose for example in marginally changeable hardware
modules, whether an addition or a subtraction of consecutive
operands is to be performed and the like. In this manner, an
increased flexibility of programming all the way to self-modifying
code may be achieved.
[0071] FIG. 5b shows how for coupling a hardware module in the
input path and/or output path it is possible to provide coupling
elements in the form of RAM memories, in particular even of RAM
PAEs, rather than in the form of FIFOs. This makes it possible in
particular to provide respectively dedicated memory areas both for
writing data from the field for the hardware module as well as when
writing from the hardware module for the field, that is, when
transferring data from the field to the hardware module on the one
hand, and when (here:) returning results from the hardware module
to the field on the other hand. This facilitates on the one hand the
handling of different configurations and on the other hand allows
for example for prioritizations in that work, that is, reading and
writing, is performed primarily and preferably in a first memory
area, and only if a data processing has been performed sufficiently
often in the first memory area and/or no data are present there,
will recourse be taken to other memory areas and thus other
tasks.
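The prioritization among dedicated memory areas may be sketched as
follows. The class name PrioritizedAreas, the parameter burst_limit
as a measure of "sufficiently often" and the fallback to the first
area when all other areas are empty are assumptions made for this
sketch.

```python
from collections import deque


class PrioritizedAreas:
    """Hypothetical model of dedicated memory areas with a preferred area."""

    def __init__(self, area_names, burst_limit):
        # Insertion order defines priority: the first area is preferred.
        self.areas = {name: deque() for name in area_names}
        self.burst_limit = burst_limit
        self.served_from_first = 0

    def write(self, area, word):
        self.areas[area].append(word)

    def next_word(self):
        names = list(self.areas)
        first, others = names[0], names[1:]
        first_ready = bool(self.areas[first])
        # Work primarily in the first area until it was served often enough.
        if first_ready and self.served_from_first < self.burst_limit:
            self.served_from_first += 1
            return first, self.areas[first].popleft()
        # Otherwise take recourse to the other areas, in order.
        for name in others:
            if self.areas[name]:
                self.served_from_first = 0
                return name, self.areas[name].popleft()
        # Nothing else is waiting, so continue with the first area.
        if first_ready:
            return first, self.areas[first].popleft()
        return None


ram = PrioritizedAreas(["high", "low"], burst_limit=2)
for word in (1, 2, 3):
    ram.write("high", word)
ram.write("low", 9)
order = [ram.next_word() for _ in range(5)]
print(order)
# [('high', 1), ('high', 2), ('low', 9), ('high', 3), None]
```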
[0072] Incidentally, it should be mentioned that in those places in
the present document where there is talk of data input from the XPP
field and the transfer of the result to the latter, an opposite
course is possible as well in that the hardware module may use the
XPP field as a more flexible data processing element and/or where
mixed forms are possible, that is, where data are shifted back and
forth in ping-pong-like fashion or in a less regular manner for
overall processing.
[0073] FIG. 6a shows a variant in which again data are exchangeable
via FIFO memories and in particular again FIFO memories are present
either in the input branch or, preferably, also in the output
branch. In addition, trigger vectors are now transmitted. Regarding
the significance and application of trigger vectors, reference is
made to WO 98/35299 (PACT08/PCT). The combination of identification
information with programming information and/or trigger information
or status information, which are exchanged in order to trigger
certain data processing events or processes, should be mentioned
once more explicitly.
[0074] FIG. 6b shows that in the data exchange, a "time stamp",
that is, information regarding the temporal priority or the
temporal validity of a data packet is transmitted as well. In the
present case shown in FIG. 6b, this transmission occurs in
read-write memories (RAMs). The time mark that is co-transmitted
may be used to select those data packets, which are to be processed
next.
[0075] It should be pointed out that in this manner the data
processing may be controlled particularly favorably. The actual
procedure of transmitting time marks along with a data word or data
packet in data flow processing was already described in WO
02/071249 (PACT18/PCTE, butcher protocol). Irrespective of the fact
that document WO 02/071249 is in its full extent incorporated by
reference for disclosure purposes, it should be pointed out
explicitly that the assignment of a time mark to data packets
allows for both a data sequence to be reconstructed and/or restored
at any later time as well as for operands to be combined with each
other as required, which is advantageous particularly if branches
supplying operands are balanced with respect to the latencies.
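The combination of operands by way of their time marks may be
sketched as follows. The function name pair_by_time_mark and the
representation of a data packet as a (time mark, value) tuple are
assumptions made for illustration; the protocol of WO 02/071249 is
not reproduced here.

```python
def pair_by_time_mark(branch_a, branch_b):
    """Combine operands from two branches that carry the same time mark.

    Each branch is an iterable of (time_mark, value) tuples that may
    arrive in any order. Marks are assumed unique within a branch.
    """
    b_values = dict(branch_b)
    pairs = [
        (mark, a, b_values[mark]) for mark, a in branch_a if mark in b_values
    ]
    pairs.sort(key=lambda item: item[0])    # restore the original sequence
    return pairs


# Branch a delivers out of order; branch b has not yet delivered mark 1.
a = [(2, "x2"), (0, "x0"), (1, "x1")]
b = [(0, "y0"), (2, "y2")]
print(pair_by_time_mark(a, b))  # [(0, 'x0', 'y0'), (2, 'x2', 'y2')]
```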
[0076] It should be mentioned that, just as reference was made to
WO 02/071249 (PACT18/PCTE) for FIG. 6b, reference is made for the
other figures as follows: to WO 02/071248 (PACT15/PCTE) and WO
02/071196 (PACT25/PCTE) for FIG. 5a; to WO 02/071196 (PACT25/PCTE)
and WO 98/29952 (PACT04/PCT) for FIG. 5b; and to WO 98/35299
(PACT08/PCT) and WO 02/071196 (PACT25/PCTE) for FIG. 6a. All of the
mentioned documents are incorporated in their full extent,
individually and in combination, for disclosure purposes.
[0077] It should be mentioned that incidentally the coupling
methods mentioned and disclosed here are also combinable, for
example by connecting a FIFO upstream to a RAM-PAE while
co-transmitting time marks in parallel and the like.
[0078] It should be mentioned that as far as the physical
connectability of the hardware modules is concerned, the latter may
be connected either by integration into the internal bus system of
the XPP or of another processor field and/or via external, possibly
bundled input/output lines. The possibility should be pointed out
of combining a multitude of individual input/output lines to form
buses in order to obtain a coupling of the hardware modules for
example in finely granular fields. In this connection, reference is
made to WO 02/29600 (PACT22aII/PCTE) and the parallel patents
connected with this by way of priority, which are all incorporated
in their full extent for disclosure purposes.
[0079] With respect to the spatial arrangement of hardware modules
or hardware parts of the present invention in XPP fields or other
fields, FIG. 8 shows that these may be provided either as columns
or lines at the edge of a field, it obviously also being possible
for a field to be surrounded by such hardware modules or hardware
parts, and/or that individual elements or field groups may be
distributed over the field, as shown in FIG. 8 on the lower left.
Alternatively it should be mentioned that a hardware element and/or
a group of hardware elements according to the present invention
could also be set next to an XPP field or other field or, assuming
appropriate manufacturing processes, could be placed on top or
underneath such a field. Integration on a single, jointly
manufactured chip is disclosed as a possibility in the same manner
as manufacturing the separate elements independently and
connecting them. It is understood that in the
variants in which the hardware modules of the present invention are
connected most closely to the field because they form an interposed
column, represent a column at the edge, a frame and/or elements
provided in a field, the setup is preferably connected via internal
buses, whereas in an arrangement next to the field a connection via
I/O connectors is preferred. In the case of an arrangement on the
edge, connections may be established alternatively via I/O ports
and/or via internal buses. It should also be mentioned that bus
lines or other lines may be drawn across the hardware elements that
are set between field elements, if necessary. Hardware elements
that are set into a field may also be connectable by separate
lines, as required. The arrangement in columns is incidentally
clearly preferred, a positioning of the column at the edge or in
between being preferred depending on the purposes for which a data
processing unit having a hardware part of the present invention is
to be used.
[0080] In a preferred variant, the number of hardware modules will
be selected in such a way that on the one hand the data processing
tasks to be performed may be solved quickly and efficiently, and
that on the other hand the form factor is observed when inserting
into or next to a field.
[0081] In this connection, it should be mentioned incidentally that
the hardware modules of the present invention, even when closely
coupled to a field, may additionally have separate I/O connections
for communication with external elements such as memories and the
like, if necessary.
[0082] According to FIG. 9, it is possible incidentally to allow
for a connection between the hardware modules of the present
invention among one another and/or certain field element parts,
which is either permanently fixed or, alternatively and preferably,
set up only temporarily. This is readily possible
particularly when a hierarchically ordered bus system allows for
global bus lines that may be set up and/or dismantled. Regarding
the setup and/or dismantling of bus lines, reference is made by
incorporation of its entire disclosure to the document WO 98/35294
(PACT07/PCT).
[0083] It should be pointed out that prior to building a mask for
manufacturing a dedicated chip, recourse may be taken, if
necessary, to an emulation, the hardware parts being emulated using
FPGA. It is pointed out that the applicant has already proposed
building XPP fields, in which PAEs are provided that may represent
small FPGA fields. By suitable wiring of several such FPGA PAEs,
the hardware structures may then be emulated, if indicated. It is
then possible to perform a verification or emulation of a
personalized or customer-specific design by a suitably designed XPP
test chip having FPGA PAEs.
[0084] While it was indicated above, only by way of example, that
hardware modules may be constructed from suitably parameterized
and defined hyper PAEs arranged linearly one behind the other,
this is not absolutely necessary. It may be advantageous
not to assign a separate PAE to each operand combination in the
program part and provide for a linear processing. Rather, in
particular in especially complex program parts, it would also be
possible to break up the program parts in turn into a multiplicity
of different configurations to be processed in the hard module. In
such a case it may be established, for example, that
a certain cut for breaking up the program part into two
configurations would be advantageous. The manner in which such cuts
may be applied is known per se. Reference is made in particular to
PCT/EP 02/10065 (PACT11/PCTE). If such a procedure is desired,
typically the range of function of the hard module is selected to
be such that the range of function at the desired place
respectively corresponds to the set union of the operand
combinations etc. executed or to be executed using different
configurations. It should be pointed out that in a
multi-configuration hard module definition, fixed configurations
may be provided, if indicated, which are provided in a fixed manner
in the hardware module, compare PCT/EP 03/08080 (PACT30/PCTE).
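The selection of the range of function as the set union over
several configurations may be illustrated by the following minimal
sketch. It is given under assumptions that are not part of the
disclosure: each configuration is modeled as a list with one
operation name per PAE position, None marks an unused position,
and the function name required_function_range is hypothetical.

```python
def required_function_range(configurations):
    """For each PAE position, return the set union of the operations
    that any of the configurations executes at that position."""
    positions = max(len(config) for config in configurations)
    ranges = [set() for _ in range(positions)]
    for config in configurations:
        for pos, op in enumerate(config):
            if op is not None:      # None: position unused in this configuration
                ranges[pos].add(op)
    return ranges


# Two configurations obtained by cutting one program part (illustrative values).
config_1 = ["mul", "add", None]
config_2 = ["mul", "sub", "shift"]

ranges = required_function_range([config_1, config_2])
```

In this example the first position needs only a multiplier, the
second needs both addition and subtraction, and the third needs
only a shift, so that no position is provided with functions that
no configuration uses.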
[0085] Further, it should be pointed out that if multiple
configurations are to be processed successively on the hardware
module, the ranges of function of the individual hard module areas,
which are obtained by parameterization, that is, the definition of
parameters of the hyper PAEs, are preferably selected to be such
that respective computing units, considered individually, still
have a minimal range of function. This may be achieved, for
example, in that the divided configurations are executed in such a
way that multiplications are always performed in the same PAE if
only one multiplication is required in each configuration, and
that, in another PAE, instead of a multiplier stage, the data lines
to be addressed by a certain configuration and required for the
return of data or the transmission of data, in particular lines to
be implemented here too as next-neighbor connections, are
implemented, possibly at a lower surface area requirement.
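The effect of always performing the multiplication in the same PAE
may be illustrated by the following minimal sketch. It is not
taken from the disclosure: configurations are modeled as lists
with one operation name per PAE position, it is assumed for the
purpose of illustration that the assignment of operations to
positions may be chosen freely, and the function name
positions_needing is hypothetical.

```python
def positions_needing(op, configurations):
    """Return the PAE positions at which any configuration uses `op`."""
    return {
        pos
        for config in configurations
        for pos, used in enumerate(config)
        if used == op
    }


# Each configuration needs exactly one multiplication.
unaligned = [["mul", "add"], ["add", "mul"]]   # multiplication changes position
aligned = [["mul", "add"], ["mul", "add"]]     # multiplication stays in one PAE

multipliers_unaligned = len(positions_needing("mul", unaligned))
multipliers_aligned = len(positions_needing("mul", aligned))
```

With the unaligned assignment both positions would have to be
provided with a multiplier stage, whereas the aligned assignment
requires a multiplier at one position only, which leaves the other
position free for a smaller implementation.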
* * * * *