U.S. patent application number 10/654,136 was filed with the patent office on 2003-09-04 and published on 2004-06-10 as publication number 20040111248 for a polymorphic computational system and method. Invention is credited to Brisudova, Martina M. and Granny, Nicola V.

United States Patent Application 20040111248
Kind Code: A1
Granny, Nicola V.; et al.
June 10, 2004
Family ID: 31981536
Polymorphic computational system and method
Abstract
Configuration software is used for generating hardware-level
code and data that may be used with reconfigurable/polymorphic
computing platforms, such as logic emulators. A user may use
development tools to create visual representations of desired
process algorithms, data structures, and interconnections, and the
system may generate intermediate data from this visual
representation. The intermediate data may be used to consult a
database of predefined code segments, and segments may be assembled
to generate a monolithic block of hardware-synthesizable (RTL, VHDL,
etc.) code for implementing the user's process in hardware.
Efficiencies may be accounted for to minimize circuit components or
processing time. Floating point calculations may be supported by a
defined data structure that is readily implemented in hardware.
Inventors: Granny, Nicola V. (Bloomington, IN); Brisudova, Martina M. (Bloomington, IN)
Correspondence Address: BANNER & WITCOFF, 1001 G STREET N.W., SUITE 1100, WASHINGTON, DC 20001, US
Family ID: 31981536
Appl. No.: 10/654,136
Filed: September 4, 2003
Related U.S. Patent Documents

Application Number | Filing Date | Patent Number
60/407,702 | Sep 4, 2002 |
60/407,703 | Sep 4, 2002 |
Current U.S. Class: 703/22
Current CPC Class: G06F 30/331 20200101
Class at Publication: 703/022
International Class: G06F 009/45
Claims
We hereby claim the following inventions:
1. A method for programming reconfigurable target hardware,
comprising the steps of: storing a plurality of computer code
segments containing executable computer code for performing a
plurality of algorithms; displaying a graphical workspace;
receiving a user request to display a plurality of predefined
graphical icons in said workspace, and at least one interconnection
between two of said icons, wherein said icons correspond to
respective ones of said computer code segments; receiving a user
request to prepare computer code for executing a process defined by
said icons and interconnection; and responsive to said request,
causing one or more data processors to compile a plurality of said
computer code segments in accordance with said displayed icons and
interconnections to generate a download file for said
reconfigurable target hardware, wherein said download file is used
to cause said target hardware to be configured to perform said
process.
2. The method of claim 1, wherein at least one of said icons
represents a predefined circuit for executing a predefined
algorithm.
3. The method of claim 2, wherein a first one of said icons
includes an input data handle, a second one of said icons includes
an output data handle, and said at least one interconnection
connects said input data handle to said output data handle.
4. The method of claim 2, wherein said at least one interconnection
represents an event trigger signal.
5. The method of claim 1, further comprising the step of using said
download file to configure said reconfigurable target hardware.
6. The method of claim 1, further comprising the step of outputting
said computer code in a human-readable computer language
format.
7. The method of claim 1, wherein at least one of said plurality of
predefined graphical icons represents a data structure, said method
further comprising the step of determining whether said data
structure should be instantiated in hardware as a multi-port memory
based at least in part on a number of interconnections connected to
said at least one of said plurality of predefined graphical
icons.
8. The method of claim 1, further comprising the step of analyzing
said graphical icons and said at least one interconnection to
determine whether any data dependencies exist in a process defined
by said icons and connection.
9. The method of claim 8, further comprising the step of
instantiating a plurality of circuits corresponding to said icons
in parallel based on said step of analyzing.
10. The method of claim 1, further comprising the step of prompting
said user for argument data associated with one of said graphical
icons.
11. The method of claim 10, wherein said code segments include one
or more argument placeholders, said method further comprising the
step of copying said code segments and substituting said argument
data for said argument placeholders.
12. The method of claim 1, wherein said download file is in a
Register Transfer Level format.
13. The method of claim 1, wherein said user request to display a
plurality of predefined graphical icons is entered using a stylus
and a display sensitive to said stylus.
14. The method of claim 1, further comprising the step of storing
said computer code for executing said process defined by said icons
and interconnection, and associating said computer code with an
icon.
15. The method of claim 1, further comprising the step of
transmitting said download file to a location of said target
hardware, where said location of said target hardware is different
from a location of said user.
16. The method of claim 1, wherein said graphical icons are assigned
to a hierarchy including one or more theater, stage, actor and prop
abstractions.
17. A computing system, comprising: one or more processors; a
display, communicatively coupled to said one or more processors; an
input device, communicatively coupled to said one or more
processors; and an electronically-readable storage medium,
communicatively coupled to said one or more processors, and
containing executable program code that causes said one or more
processors to perform the following steps: store a plurality of
computer code segments containing executable computer code for
performing a plurality of algorithms; display a graphical workspace
on said display; receive, via said input device, a user request to
display a plurality of predefined graphical icons in said
workspace, and at least one interconnection between two of said
icons, wherein said icons correspond to respective ones of said
computer code segments; receive, via said input device, a user
request to prepare computer code for executing a process defined by
said icons and interconnection; and responsive to said request,
cause one or more data processors to compile a plurality of said
computer code segments in accordance with said displayed icons and
interconnections to generate a download file for said
reconfigurable target hardware.
18. A computing device, comprising: one or more processors; a user
input device; a display configured to detect said user input
device; one or more memories, storing program instructions that
cause said one or more processors to perform the following steps:
display a workspace on said display; detect a pattern defined by
said user input device on said display; compare said detected
pattern with a library of predefined graphical patterns; when said
pattern matches a predefined pattern in said library, extract a
computer code segment from a database associated with said device,
said computer code segment representing programming instructions
for performing an algorithm associated with said predefined
pattern; use said computer code segment to generate a data file;
and download said data file to a reconfigurable computing
platform, such that said reconfigurable computing platform executes
said algorithm at hardware speed.
19. The computing device of claim 18, wherein said program
instructions further cause said one or more processors to display a
plurality of user-selected graphical icons on said display, and one
or more interconnections between said icons.
20. The computing device of claim 19, wherein at least one of said
one or more interconnections represents an event trigger
signal.
21. The computing device of claim 18, wherein said one or more
memories store a plurality of computer code segments, each
corresponding to an algorithm represented by said detected
pattern.
22. The computing device of claim 18, wherein said reconfigurable
computing platform is a field-programmable array of logic, and said
data file is a binary download file for said field-programmable
array of logic.
23. The computing device of claim 22, wherein said download file
causes said target hardware to perform said algorithm with parallel
processing.
24. The computing device of claim 18, wherein said program
instructions further cause said one or more processors to detect a
second pattern on said display, said second pattern corresponding
to a data structure.
25. The computing device of claim 24, wherein said program
instructions further cause said one or more processors to identify
a plurality of algorithms that interact with said data structure;
determine whether a data dependency exists with respect to said
data structure, and if no data dependency exists, write said data
file to permit simultaneous execution of said plurality of
algorithms.
26. A method for preparing a download file for target hardware,
comprising the steps of: receiving configuration information
identifying a target hardware; displaying, responsive to a user
request, a plurality of graphical icons representing predefined
algorithms, and a plurality of graphical icons representing data
elements; receiving user requests to create interconnections
between two or more of said graphical icons; and automatically
converting said display of graphical icons and interconnections
into programming instructions for performing said algorithms in
accordance with said interconnections, wherein said programming
instructions are optimized in accordance with said configuration
information.
27. A method for configuring a reconfigurable hardware platform,
comprising the steps of: using a graphical authoring utility to
create a plurality of logically-connected abstractions of physical
phenomena, wherein said abstractions represent a plurality of
triggered behaviors, and logical connections represent cues that
trigger said behaviors; forwarding said plurality of
logically-connected abstractions to a distiller, wherein said
distiller redefines said logically-connected abstractions into
hardware description language constructs suitable for a target
reconfigurable hardware platform; and transferring at least some of
said constructs to a host of said reconfigurable hardware computing
platform for synthesis into one or more target primitives and
execution.
28. The method of claim 27, wherein at least one of said behaviors
is instantiated as a triggered lookup table.
29. The method of claim 27, wherein interconnections between said
abstractions are dynamic.
30. The method of claim 27, wherein two or more of said behaviors
are synthesized as parallel blocks of logic within said
reconfigurable hardware platform, such that said behaviors may be
executed in parallel.
31. The method of claim 27, wherein said abstractions are
precompiled.
32. The method of claim 27, further comprising the step of storing
said constructs as a predefined abstraction for future use.
33. The method of claim 27, wherein said step of transferring
further includes the step of transferring a first portion of said
constructs to a host of a first reconfigurable hardware computing
platform, and transferring a second portion of said constructs to a
host of a second reconfigurable hardware computing platform, and
using said first and second platforms to jointly execute said
behaviors.
34. The method of claim 33, wherein said first and second platforms
are located in separate locations.
35. The method of claim 34, wherein said second platform includes
replicate hardware, and lacks said authoring utility.
36. The method of claim 27, wherein said reconfigurable hardware
computing platform is located remotely from a location of said
graphical authoring utility.
37. A method of analyzing a behavior, comprising the steps of:
defining a computational model for said behavior; preparing an
abstraction flow of said computational model; responsive to a user
request, automatically converting said abstraction flow into
computer code for configuring a reconfigurable target platform;
using said computer code to configure said reconfigurable target
platform; causing said target platform to execute said
computational model; recording data values during said execution of
said computational model; using said data values to define a
behavioral model for said computational model.
38. The method of claim 37, wherein said abstraction flow is a
graphical representation of an algorithm defined by said
computational model.
39. The method of claim 38, wherein said step of preparing further
comprises the step of using a pointing device on a display that is
configured to detect said pointing device to create graphical
symbols on said display; and comparing said graphical symbols with
a library of predefined graphical symbols to identify algorithms
associated with said graphical symbols.
40. The method of claim 38, wherein said step of preparing further
includes the step of using a graphical symbol to represent an
unknown behavior under study, wherein said graphical symbol is
associated with data collection computer code.
41. The method of claim 37, further comprising the step of defining
a new hardware primitive corresponding to said behavioral
model.
42. The method of claim 37, further comprising the step of defining
a second computational model existing at a higher level of
abstraction than said computational model, and using said
computational model in defining said second computational
model.
43. The method of claim 42, further comprising the step of
configuring said reconfigurable target platform to execute said
second computational model using said behavioral model.
44. The method of claim 42, further comprising the step of
configuring a second reconfigurable target platform to execute said
second computational model using said behavioral model.
Description
[0001] The present application claims priority, under 35 U.S.C.
119(e), to copending U.S. provisional application serial No.
60/407,703, entitled "A Device, Methodology and Development
Environment for the Modeling of Physical Phenomena Within a
Reconfigurable Computational Platform," filed Sep. 4, 2002, and
U.S. provisional application serial No. 60/407,702, entitled "A
Device, Methodology and Application Development for Signals
Intelligence Using a Reconfigurable Computational Platform," filed
Sep. 4, 2002, the disclosures of which are both hereby incorporated
by reference.
[0002] A portion of the disclosure of this patent document contains
material that is subject to copyright protection. The copyright
owner has no objection to the facsimile reproduction by anyone of
the patent document or the patent disclosure, as it appears in the
Patent and Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
[0003] The present invention relates generally to the field of
reconfigurable computing platforms. The reconfigurable nature of
these platforms indicates that their physical hardware need not be
static, and that it may be readily reconfigured after manufacture.
Such platforms are typically made up of single devices such as
field-programmable gate arrays (FPGAs), collections of FPGAs
assembled into a fabric of reconfigurable hardware, or highly
complex logic emulation systems. Some embodiments are particularly
advantageous in logic emulation systems, which may be large-scale
platforms with reconfigurable logic such as the V-Station family of
products offered by Mentor Graphics Corporation.
some embodiments of the present invention relate to user interface
systems and methods for simplifying configuration of these
reconfigurable platforms. Other aspects relate to software design
concepts for configuration of polymorphic computational systems,
which broadly refers to systems employing one or more
reconfigurable computing platforms or emulation systems that may
treat an entire problem holistically, involving not only the
reconfigurable platform, but also its related software, methods,
and practices. Still further aspects relate to using reconfigurable
(and/or polymorphic) computing platforms to provide an easy-to-use,
dynamic development environment that may be used by even those
unfamiliar with computer programming and/or FPGA or emulation
system programming.
BACKGROUND OF THE INVENTION
[0004] The power of modern computing can hardly be overstated.
Calculations that once took anywhere from hours to months to
manually perform can be accomplished literally in the blink of an
eye. Calculation-intensive tasks are now accomplished in a mere
fraction of the time previously required, and with each passing
year computing power is greater than before. These days, the power
of computing is even applied to the process of making computers
themselves, a self-fulfilling process that will inevitably lead to
more powerful computers.
[0005] One tool that is often used in the design of integrated
circuits is the logic emulation system (emulator). The emulator may
be used to simulate hardware circuitry, in real time, prior to the
circuit's formal manufacturing process. The circuit's design, once
emulated, can be analyzed and tested to identify any design errors.
Since the emulator (by design) is reconfigurable, errors in a
circuit's design, once detected, may be corrected by reconfiguring
the emulator. In this manner, a designer can be confident in a
particular design even before a single actual component is
manufactured.
[0006] Although the emulator has gained wide acceptance in certain
fields (specifically electronic design automation), the full
potential for this technology has not yet been reached. This is
partly due to the complexity and difficulty in writing the programs
and download files that are necessary for configuring an
emulator--those outside of the circuit design art have, until now,
simply avoided using the emulator for tasks other than hardware
functional and performance verification.
[0007] The inventors of the present application have realized,
however, that the emulator possesses great promise in computing
power. The emulator can be configured to create dedicated hardware
for executing any desired process or algorithm, and this
configuration may be optimized such that the process is carried out
at hardware speeds--much faster than programs written for general
purpose computers. The potential uses are limitless, as emulators
may be used by geneticists, mathematicians, image analysis experts,
signals intelligence analysts, pattern recognition specialists, and
practitioners in any other area where programs are executed on
general purpose computers.
[0008] To a geneticist, however, the typical emulator may as well
be a ship's anchor. Writing typical computer programs or download
files for an emulator takes special skill in computer programming
and logic synthesis (such as knowledge of various hardware
description languages such as Verilog, Verilog Hardware Description
Language (VHDL) and/or Register Transfer Level (RTL)), and may
require significant amounts of time to write. For example, working
exclusively in RTL and/or VHDL, a simple circuit might require a
skilled semiconductor designer no less than two days to write the
code, and another full day to verify its functionality. Many of us,
geneticists included, simply may not have the time or ability to do
such coding. Accordingly, there is a general need for improved
computing power, and if emulators (or other large scale "fabrics"
of reconfigurable logic) are to be used to offer this power, there
is a specific need for a simpler, user-friendly way to generate the
complex code and download files necessary to program today's
reconfigurable platforms.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] FIG. 1 illustrates an example of a computing environment in
which one or more embodiments of the present invention may be
implemented.
[0010] FIG. 2 illustrates an example of a configuration of a logic
element in a reconfigurable computing platform.
[0011] FIG. 3 depicts an example of a user interface that may be
used in accordance with embodiments of the present invention to
create a visual representation of a desired process.
[0012] FIG. 4a illustrates an example of an icon for an algorithm
according to some embodiments of the present invention.
[0013] FIG. 4b illustrates an example of an icon that represents
data according to some embodiments of the present invention.
[0014] FIG. 5 illustrates an example of how icons may be assembled
and interconnected to create a desired process in some embodiments
of the present invention.
[0015] FIG. 6 depicts an example of a flow diagram showing steps
involved in generating computer code corresponding to the user's
desired process in some embodiments of the present invention.
[0016] FIG. 7 illustrates an example of a process having a data
dependency.
[0017] FIG. 8 shows a hierarchy diagram illustrating how the user's
desired process may be abstracted and analogized to a theater
production in some embodiments.
[0018] FIG. 9 illustrates a block diagram example of how the FIG. 8
abstractions may be implemented in the final hardware.
[0019] FIG. 10 shows a block diagram process flow used in some
embodiments of the present invention, and represents a process that
is similar to that shown in FIG. 6 above.
[0020] FIGS. 11a and 11b illustrate block diagrams showing
communications in an example embodiment.
[0021] FIGS. 12a and 12b show block diagram examples of how some
embodiments of the present invention may interface with target
hardware.
[0022] FIG. 13 illustrates an example of a model for the
distribution of a theater according to some embodiments of the
present invention.
[0023] FIG. 14 illustrates a block diagram example of a
collaborative distribution of theaters according to some
embodiments of the present invention.
[0024] FIG. 15 illustrates a flow diagram of an example
computational/behavioral modeling process using one or more
embodiments of the present invention.
[0025] FIG. 16 illustrates an example of a block diagram showing
relationships between various elements used in some embodiments of
the present invention.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
[0026] FIG. 1 illustrates a computing environment in which one or
more embodiments of the present invention may be used. This
environment uses a reconfigurable computing device 101, which may
be an emulator, although other forms of reconfigurable computing
platforms may work equally well. Emulator 101 contains an array of
reconfigurable logic elements 102, each of which includes circuitry
that allows the particular logic element 102 to perform predefined
functions supporting or implementing a portion of the desired
algorithm. The emulator 101 may also include circuitry, such as an
interconnect 103, that performs interconnections between the
various logic elements 102 to form a larger circuit. Other
approaches to interconnections are also possible, such as on-chip
wiring, circuitry, using logic elements 102 to control
interconnectivity, and/or time division multiplexing of the
interconnections. Some approaches to such interconnections, and
other features that may be pertinent to the disclosure herein, are
described in U.S. Pat. Nos. 5,036,473; 5,109,353; 5,596,742;
5,854,752; 6,009,531; 6,061,511; and 6,223,148, the disclosures of
which are hereby incorporated by reference. Using reconfigurable
computing platforms, one may take advantage of their massively
parallel nature in order to partition a problem to be solved into
manageable elements with fast and reliable communication pathways,
allowing them to be solved by the hardware. Circuits and algorithms
may be implemented on the platforms in a parallel fashion and
executed at hardware speeds, which may be several orders of
magnitude faster than traditional general-purpose computers
(depending upon the nature of the application).
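The partitioning described above can be illustrated with a minimal sketch (the function and graph names below are invented for illustration and do not appear in the application): operations with no mutual data dependencies are grouped into "waves," and each wave could in principle be instantiated as parallel hardware blocks.

```python
# Hypothetical sketch: partition a dataflow graph into waves of
# operations with no mutual data dependencies, so each wave could be
# instantiated as parallel hardware blocks.

def parallel_waves(deps):
    """deps maps each operation to the set of operations it depends on."""
    remaining = dict(deps)
    done, waves = set(), []
    while remaining:
        # Every op whose inputs are already computed can run in parallel.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic dependency: cannot synthesize")
        waves.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return waves

# Example: 'add1' and 'add2' are independent, so they share a wave
# and their circuits could execute simultaneously.
graph = {"add1": set(), "add2": set(), "mul": {"add1", "add2"}}
print(parallel_waves(graph))  # [['add1', 'add2'], ['mul']]
```

Each wave's operations have all of their inputs available at the same time, which is what allows the hardware speedup the paragraph describes.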
[0027] A user may configure the logic elements 102 and their
interconnections by using computer workstation 104. Workstation 104
may include one or more processors 105, which may execute
instructions from software contained in one or more
computer-readable memories 106 to perform the various steps and
functions described herein. Workstation 104 may also include one or
more displays 107, which may be used to provide visual information
to a user, as well as one or more input devices 108 to allow user
input. Any form of display and input device may be used, although
in some embodiments, display 107 is sensitive to a stylus input
device 108. For example, display 107 may be touch-sensitive, or may
electromagnetically detect the presence of an input device 108,
which may be a hand-held stylus, pen, or other type of pointing
device. Embodiments of the present invention may be implemented
using commercially-available emulation hardware, such as the
V-Station/5M, V-Station/15M and V-Station/30M emulation systems
offered by Mentor Graphics Corporation, and may be used with system
compilers such as the Mentor Graphics VLE 4.0.3 and VLE 4.0.4, also
offered by Mentor Graphics Corporation.
[0028] FIG. 2 illustrates an example of a logic element 102, which
may be referred to as a common logic block (CLB) in some
embodiments. As shown in FIG. 2, a particular logic element or CLB
102 may include a number of inputs 201. In some systems, a CLB 102
may receive 32 to 64 inputs. CLB 102 may also include a
reconfigurable computational element 202, which may include
reconfigurable circuitry for performing a variety of predefined
operations on one or more of inputs 201, and may be configured to
perform one or more of these operations by downloading binary data
files from host workstation 104. CLB 102 may present the output
signal or signals as output 203, which in some embodiments may
include 32 to 64 outputs. As will be discussed below, some
embodiments of the present invention may be used to configure one
or more CLBs 102 to perform a complex table lookup implementing a
behavioral model of a physical behavior.
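The "complex table lookup" idea can be sketched behaviorally (the bit widths, function names, and choice of sine as the modeled behavior are illustrative assumptions, not taken from the application): a physical behavior is precomputed into a table indexed by a quantized input, so that a configured logic element evaluates it in a single lookup.

```python
import math

# Hypothetical behavioral sketch of one configured logic element (CLB):
# a physical behavior (here, a sine response) is precomputed into a
# table indexed by an 8-bit input, which the hardware reads in a single
# lookup. IN_BITS and OUT_SCALE are illustrative assumptions.

IN_BITS = 8    # assume an 8-bit quantized input
OUT_SCALE = 127

# Precomputed once at "configuration" (download) time.
SIN_TABLE = [round(OUT_SCALE * math.sin(2 * math.pi * i / 2**IN_BITS))
             for i in range(2**IN_BITS)]

def clb_lookup(value):
    """Behavioral model of the configured logic element: pure table read."""
    return SIN_TABLE[value & (2**IN_BITS - 1)]  # mask to table width

print(clb_lookup(64))  # quarter period: round(127 * sin(pi/2)) == 127
```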
[0029] Logic elements 102 may be implemented in a variety of
different configurations, such as having different numbers of
inputs or outputs. Similarly, while FIG. 1 depicts a single
emulator 101, that emulator 101 may in turn be composed of a
plurality of smaller emulation circuit boards working in concert,
and/or may be combined with other emulators in a collaborative
arrangement. Other types of reconfigurable computing platforms,
besides emulators, may also be used. Embodiments of the present
invention may be used in any variety of platforms and
configurations.
[0030] Before getting into details regarding the example
embodiments, it will be helpful to understand the basics of several
general steps that may be found in some embodiments of the present
invention. In the first such general step, the user uses
workstation 104 to access a graphical user interface (described
below) to assemble a visual representation of a process using a
collection of predefined graphical icons. These icons represent
predefined algorithms, software functions, data structures, or the
like. The user places these icons in a graphical workspace, and
creates a number of interconnections between the icons to represent
the transfer of information and/or control signals, thus
effectively defining the flow of the desired process. In some
embodiments, the user accomplishes this by simply drawing symbols
on a display device using a pointing device. When the user has
finished preparing the visual representation of the desired
process, the system may enter the second general step. In the
second step, the system may automatically analyze the various
interconnected icons to construct computer code that will carry out
the user's desired process. In some embodiments, this code may be a
program of human-readable computer code (e.g., in the C, C++,
Pascal, Delphi, ADA, Fortran, etc. computer language) that will
carry out the user's process. To accomplish this, the system may
store one or more databases in memory 106 containing program code
segments corresponding to the various icons, as well as additional
characteristic (e.g., header) information relating to the
algorithms represented by the icons. The system may assemble these
code segments according to their orientation in the visual
representation. In further embodiments, the system may prepare a
machine-readable version of the program code, such as in a Hardware
Description Language (HDL) such as RTL, VHSIC Hardware Description
Language (VHDL, an industry standard tool for the description of
electronic circuits in structural or behavioral frameworks), or
Structural Verilog, or a downloadable binary file,
that may be used to configure a reconfigurable computing device,
such as emulator 101, to carry out the desired process in hardware.
In preparing this machine-readable code, the system may
automatically analyze the user's process to determine an efficient
hardware configuration for carrying out the user's process. Through
this process, a user who is relatively unfamiliar with the
technical programming of a reconfigurable computing device may
easily create a hardware component custom-tailored to implement the
user's desired process. These general steps are discussed below in
greater detail.
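The second general step can be sketched as follows (the segment texts, placeholder syntax, and names below are invented for illustration; the application does not specify them): each icon maps to a stored code segment containing argument placeholders, and the segments are copied, filled in with the user's argument data, and concatenated into one monolithic block.

```python
# Hypothetical sketch: assemble stored code segments, keyed by icon
# type, into one monolithic code block. Segment texts and placeholder
# syntax are invented for illustration.

SEGMENT_DB = {
    "adder":      "y{n} <= a{n} + b{n};  -- {label}",
    "multiplier": "y{n} <= a{n} * b{n};  -- {label}",
}

def assemble(icons):
    """icons: ordered list of (icon_type, user_argument_data) pairs."""
    lines = []
    for n, (kind, args) in enumerate(icons):
        segment = SEGMENT_DB[kind]                 # consult the database
        lines.append(segment.format(n=n, **args))  # substitute arguments
    return "\n".join(lines)                        # one monolithic block

code = assemble([("adder", {"label": "sum stage"}),
                 ("multiplier", {"label": "scale stage"})])
print(code)
```

In this sketch the ordering of the list stands in for the icons' orientation and interconnections in the visual representation; a real system would derive that ordering from the graph the user drew.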
[0031] An Example Graphical User Interface (GUI)
[0032] FIG. 3 depicts an example user interface that may be used to
create a visual representation of a desired process. This user
interface may include an overall workspace 300 through which the
user may graphically assemble an iconic representation of a
particular process that the user wishes to implement in emulator
101 hardware circuitry. Workspace 300 may include control features,
such as menu bar 301, containing a number of control commands that
the user may wish to enter. In some embodiments, the particular
control features that are available are context sensitive, such
that command options are only displayed and/or available for
selection when they are contextually appropriate. Example functions
are described further below in connection with FIG. 8.
[0033] Workspace 300 may include a Library Icon Panel 302
containing a number of library element icons 303 representing
predefined algorithms that the user may use as "building blocks" to
construct the desired process. Library elements may be any type of
predetermined algorithm, such as a known mathematical function, a
computer function, or a computer subroutine. The library element
may also represent a previously-defined circuit that performs an
algorithm or carries out some process.
[0034] Workspace 300 may also include a Library Space 304, which
allows a user to manage the various icons 303 that are displayed in
the Library Icon Panel 302. The various library element icons 303
may be organized by category and/or subject matter to simplify the
process of locating a particular element. For example, icons
corresponding to mathematical functions may be located together in
one library, while other icons corresponding to predefined circuits
may be located in another library. In the FIG. 3 example, Library
Space 304 includes a pull-down menu of available libraries, and a
listing of the various libraries that the user has already
opened.
[0035] Workspace 300 may include an Abstraction Window 305, which
may serve as the area in which the user assembles the visual
representation of the desired process. The user does this by
placing various icons in the Abstraction Window 305, and by
defining relationships, such as data transfer and timing
relationships, between the icons.
[0036] Workspace 300 may also include a Collaboration Panel 306.
Collaboration may allow a number of individuals to simultaneously
work on the same project using different computer terminals. In
some embodiments, workspace 300 may be displayed on each of those
computer terminals. One of the terminals may be given a proverbial
"token," and may have control over workspace 300 while others may
view the display as it is modified. Alternatively, multiple
terminals may be given control over workspace 300, where the
terminals simultaneously update the various displays to reflect the
collaborators' changes. In some embodiments, different
collaborators may work on different aspects of an overall project,
and their individual computer terminal workspaces 300 may display
different portions of the graphic algorithm. For example, one
collaborator's workspace 300 may show an algorithm for calculating
a first value, while another collaborator's workspace 300 may show
a subsequent algorithm that uses the first value in a further
calculation.
[0037] Collaboration Panel 306 may include an area identifying the
various collaborators who are currently actively working on the
workspace 300, and may also include an area identifying the various
collaborators who are authorized to work on the same project.
[0038] Workspace 300 may also include an area, such as Status
Messages Panel 307, in which status messages, context sensitive
help, and/or other information may be provided to the user. For
example, context-sensitive help messages may be dynamically
displayed as the user positions a cursor or pointer over various
parts of workspace 300. Such help messages may also be displayed in
a pop-up window in proximity to the cursor or pointer, or the
messages may be displayed across both the pop-up window and the
Status Messages Panel 307. In some embodiments, the Status Messages
Panel 307 may display the current status of various collaboration
activities. Status Message Panel 307 may also be used to prompt the
user for certain types of information.
[0039] FIGS. 4a and 4b illustrate example library element icons
that may be used in various embodiments. FIG. 4a illustrates an
example icon 401 for an algorithm, such as one that performs the
following mathematical function: .SIGMA..sub.n=x.sup.y (2n+1)
[0040] This example mathematical function receives three integers
as input (x, y and n), and produces an output that is the sum of
the function (2n+1) for all integer values of n ranging from x to
y. As will be discussed further below, one unique feature of
certain embodiments of the invention is the intrinsic capability
to perform floating point operations in conformance with
ANSI/IEEE Std-754 (IEEE Standard for Binary Floating-Point
Arithmetic).
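The behavior of this example algorithm can be sketched in C++ as follows. This is a minimal illustration only, not the library's actual code segment; the function name and types are hypothetical, and the summation bounds x and y are passed explicitly, with n serving as the running index of the sum.

```cpp
// Illustrative sketch of the summation algorithm represented by
// icon 401: the sum of (2n + 1) for all integer values of n
// ranging from x to y.
long summation(int x, int y) {
    long total = 0;
    for (int n = x; n <= y; ++n) {
        total += 2L * n + 1;  // accumulate the function (2n + 1)
    }
    return total;
}
```

For example, summation(1, 3) evaluates (2·1+1) + (2·2+1) + (2·3+1).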
[0041] Icon 401 may include one or more input data handles 402 to
represent the input data that is to be provided to the algorithm.
Using the above example, these inputs would be the values x, y and
n. For algorithms that require more than one input, a single input
data handle 402 may be used to represent all inputs. In alternate
embodiments, there may be multiple data input handles, and each
distinct input may have its own handle. Having separate handles
increases the complexity of the icon, but allows for an easy way to
see each distinct input to an algorithm.
[0042] Icon 401 may also include one or more output data handles
403. Similar to input data handles 402, output data handles 403
represent the output of the algorithm. Using the above example, the
output would be a single integer value representing the sum of the
function (2n+1) for all integer values of n ranging from x to y. An
algorithm having multiple outputs may be represented by an icon
having a single output data handle 403, or alternatively may be
represented by an icon having multiple output data handles 403.
[0043] Since icon 401 may represent just one algorithm that is used
with other algorithms in an overall desired process, there is often
a need to coordinate the timing of when the algorithm will be
performed, particularly when several of the algorithms may be
asynchronous in nature. Using the above example, the output of the
summation function may be needed by another algorithm, and as such,
that other algorithm may need to know when the summation algorithm
has completed its calculations. This sequencing of algorithms may
be accomplished using event trigger signals, which are signals
produced by algorithms to indicate their progress. An algorithm may
receive one or more input event trigger signals, and may produce
one or more output event trigger signals.
[0044] These event trigger signals may be represented in icon 401
as well. Icon 401 may include one or more output event handles 404,
representing the various event trigger signals that may be produced
by the algorithm. Icon 401 may also include one or more input event
handles 405, representing the various event trigger signals that
may be accepted by the algorithm.
[0045] Icon 401 shown in FIG. 4a is merely one example of how an
algorithm may be visually represented. Variations may be used in
other alternate embodiments, such as the addition or omission of
one or more handles discussed above, variation in the shape (e.g.,
circular, square, trapezoidal, three-dimensional etc.) of the icon
or handle, the placement of the handles (e.g., on edges, on
corners, external to the rectangle, etc.), the presence or absence
of labeling on the icons, etc.
[0046] The icon 401 shown in FIG. 4a represents an algorithm, or a
kind of active process. Such algorithms and processes will often
act upon some type of data, and as such, other types of icons may
be used. FIG. 4b shows an example of an icon 406 that represents
data. The data represented by the icon 406 may be any data,
database and/or data structure stored in a memory or other circuit.
Since data, by itself, does not perform any steps, it has no need
for input, output, or event trigger signals, and does not need the
corresponding handles described above. Connections to and from the
data icon 406 may simply be made to the icon itself. The same may
be done for algorithm icons 401 as well, although in such alternate
embodiments, there would preferably be some manner of
differentiating the input data, output data, input event trigger,
and/or output event trigger signals for ease of use. Such
differentiation may be accomplished by, for example, varying the
line width and/or color of the various lines.
[0047] In addition to placing these icons in the Abstraction Window
305, a user will likely wish to identify how the various algorithms
and/or data structures are interrelated for the particular desired
process. The user may want to specify that the output of one
algorithm is to be the input to another, or that a particular data
structure is an input to yet another algorithm. The user may create
these relationships by simply drawing a connection line between the
various icons and their handles. A line drawn from the output data
handle of one algorithm to the input data handle of a second
algorithm indicates that the output of the first algorithm is the
input of the second. The lines may be given different appearances
based on the information they represent. For example, thick lines
may be used to represent data, while thin lines may be used to
represent event trigger signals. Other variations in format, such
as dashed lines, line color, multiple lines, arrows, etc. may also
be used to differentiate the lines.
[0048] For data structure icon 406, although no explicit handles
are shown in that example, connections may still be drawn between
the icon 406 and other input/output data handles to indicate when
the data is the input/output of an algorithm. These connections may
be referred to as data pipelines, where the input/output data may
be referred to as data elements, and the input/output event
triggers may be referred to as semaphores.
[0049] FIG. 5 illustrates an example of how these icons may be
assembled and interconnected to create a desired process. In this
example process, a circuit (a Multi-Channel Transport circuit)
captures an image using a variety of light-sensitive devices and
provides it to a first filter. The filter processes the image and
produces a filtered image that is then supplied to a second filter.
The second filter further processes the image, and provides the
twice-filtered image to another circuit (another Multi-Channel
Transport circuit) that finishes the process by displaying the
filtered image on a monitor.
[0050] As shown in FIG. 5, the user has placed the first circuit,
MCT Input 501, in the upper-left portion of the Abstraction Window
305. Since the MCT Input 501 circuit receives no external input,
and receives no input event trigger, its icon does not show handles
for these elements. In alternate embodiments, unused handles may
nevertheless be displayed to serve as a reminder to the user of
their availability, or to consolidate the types of icons that are
displayed.
[0051] The MCT Input 501 icon has an output data handle that is
connected to Image Data icon 502. The Image Data icon 502 is a data
structure icon, and its connection to the output data handle of MCT
Input 501 signifies that this data structure is the output of the
MCT Input 501 circuit (e.g., the data representing the image that
was captured by the MCT Input circuitry). This image data is also
connected to the input data handle of the first Pass Filter
algorithm 503, meaning that the Image Data 502 is provided as an
input to the Pass Filter algorithm 503. Pass Filter algorithm 503
also has an input event trigger handle, which is shown connected to
the output event handle of MCT Input 501. This connection may be
used to ensure that the Pass Filter algorithm 503 does not begin
its filtering until it receives the appropriate trigger signal from
MCT Input 501 (e.g., when the MCT Input circuit 501 has captured a
complete image).
[0052] Through this series of connections, the user can easily
define the particular desired process. The first Pass Filter
algorithm 503 may produce a filtered image that is output as Image
Data 504, and may supply an output event trigger signal to a second
Pass Filter algorithm 505. The second Pass Filter algorithm 505 may
receive the filtered image from Image Data 504, and upon receipt of
the appropriate input event trigger signal, may perform a second
filtration on the image. The second Pass Filter algorithm 505 may
output the twice-filtered image directly to another algorithm, MCT
Output circuit 506, and may also supply it with an output event
trigger signal as well. Upon receiving the appropriate trigger
event signal, MCT Output circuit 506 may complete the process by
displaying the twice-filtered image on a monitor.
[0053] When icons are placed in Abstraction Window 305, some
embodiments of the present invention will permit users to access
help information by right-clicking on the icon. Thus, for example,
a user may right-click on an icon to quickly see the types of input data
required for the algorithm represented by the icon, the types of
output data produced, and whether any trigger events are produced
or used by the algorithm. This help information may also provide
contextual information explaining how the algorithm works and/or
what the algorithm does. This help information simplifies the
user's task of assembling the algorithms necessary for the desired
process, and producing a logical graphical representation that can
ultimately be converted to working computer code. Additionally, in
some embodiments, a user may right-click on a portion of an icon,
such as a handle, and obtain help specific to the particular
portion or handle that was clicked. For example, a user might click
on an icon's output event handle and see a message informing the
user that the algorithm represented by the icon produces an output
trigger signal, and may inform the user of the characteristics of
this output signal (e.g., how many signals are produced, the type
of signal, when they are produced, etc.).
[0054] Generating Computer Code
[0055] The example graphical user interface described above
provides an easy way for a user to conceptualize and assemble a
visual representation of a desired process. Once this visual
representation is completed, however, the user may wish to have an
executable computer program to carry out the process and/or format
a reconfigurable computing platform to execute the process in
hardware. The following description addresses various aspects that
may be used for this process.
[0056] To help illustrate an example process of preparing such
computer code, FIG. 6 depicts an example flow diagram showing steps
involved in generating computer code corresponding to the user's
desired process. The example process begins with an initialization
step 600. The step represents the preparation necessary to support
the graphical assembly of code described above.
[0057] Several databases may be created during initialization and
stored in a computer-readable medium, such as memory 106. One such
database, referred to herein as the Code Database 109, may store
individual segments of executable program code. Each segment may,
when executed, carry out the performance of a predefined algorithm,
such as the summation algorithm described above. The segments of
code may be written in any computer language, such as C++, and
there may be multiple segments for each algorithm. For example, the
Code Database 109 may store multiple versions of the summation
algorithm, to allow compatibility with a wider variety of software
and hardware.
[0058] The individual code segments may require a number of
input/output arguments and variables. To allow for
interchangeability, the code segments may be stored in Code
Database 109 with generic placeholder values for these arguments
and variables. As will be explained below, these placeholders may
be replaced with actual values as the code segments are assembled
into a final program.
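The placeholder scheme described above can be pictured with the following sketch. The segment text, the placeholder token, and the helper name are all hypothetical illustrations, not taken from the actual Code Database format.

```cpp
#include <string>

// Replaces every occurrence of a generic placeholder token in a
// stored code segment with an actual argument name chosen when
// the segments are assembled into a final program.
std::string bind_placeholder(std::string segment,
                             const std::string& placeholder,
                             const std::string& actual) {
    std::string::size_type pos = 0;
    while ((pos = segment.find(placeholder, pos)) != std::string::npos) {
        segment.replace(pos, placeholder.size(), actual);
        pos += actual.size();  // skip past the substituted text
    }
    return segment;
}
```

For example, binding "$ARG1$" to "x" in the stored text "summation($ARG1$, $ARG2$)" yields "summation(x, $ARG2$)".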
[0059] Another database that may be created is the Header Database
110. The Header Database 110 may specify the header format for each
code segment stored in the Code Database. The header format may
provide characteristic information regarding the algorithm, such as
the number and types of input/output arguments. For example, the
Header Database 110 may contain the following header for a C++ code
segment implementing the summation algorithm described above,
showing that the algorithm receives three integer values (n, x and
y) and produces a single integer output:
pmc_int summation(int, int, int)
[0060] Header Database 110 provides a rapid way for the system to
determine what input/output data is appropriate for each given
algorithm, and may be used during the compilation process to ensure
that the user properly identifies all necessary inputs/outputs.
Although Header Database 110 is shown separate from Code Database
109, the header information need not be stored separately. In some
embodiments, the header information may simply be stored with the
code segments in the Code Database, and Header Database 110 might
not even be created. This may save memory space, but may lead to
slightly longer compilation times. The Header Database information
may also be used by the contextual help facility.
[0061] Another initialization task that may occur is the
association of the various algorithms with one or more graphical
icons. These icons, such as summation icon 401, may be used to
visually represent the algorithm in the workspace 300. In some
embodiments, the icons include predefined images, such as the
summation symbol (".SIGMA."), that may help the user easily
identify the particular algorithm being represented. These various
initialization tasks may be performed by a computer program,
sometimes referred to herein as a "librarian," that manages the
various databases and/or libraries available in the system.
[0062] Once the various code segments and databases are prepared,
the process may then move to step 601, in which the user
graphically assembles the various icons to create the desired
process. The user may add icons representing the various
algorithms, as well as interconnections showing the flow of
input/output data and event trigger signals. As the user adds a
connection between two icons in the Abstraction Window 305, the
system may consult the database(s) to determine the types and
numbers of input/output data required by each icon's respective
algorithm, and may inform the user when the user attempts to
provide incompatible data variables, such as connecting an icon's
output of type "a" with another icon's input of type "b." This
check may be performed by comparing the header information for the
algorithms. In some situations, an algorithm's output will match
precisely another algorithm's input (e.g., one algorithm outputs a
single data element of type "a," and the user connects that output
to an input of an algorithm that accepts a single input of type
"a").
[0063] In other situations, there may be a difference in the number
and/or types of output/input at either end of the connection. In
such situations, the system may prompt the user to supply
information regarding how the various arguments are to be
distributed. Using as an example the connection between the output
data handle of Pass Filter 503 and the input data handle of Pass
Filter 505, if Pass Filter 503 outputs three arguments of type "a,"
and Pass Filter 505 requires only two inputs of type "a," the user
may be prompted to identify which of the Pass Filter 503 outputs
are to be the Pass Filter 505 inputs. This identification
information may be stored in the netlist. As another example, if
Pass Filter 505 requires four inputs, the user may be prompted to
identify which of the four inputs are provided by Pass Filter 503,
and may be reminded that Pass Filter 505 requires a fourth input
that has not yet been assigned. To assign this additional input,
the user may simply create another connection between Pass Filter
505's input data handle and whatever source is to provide this
additional input. Again, this argument information may be stored
in the netlist.
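The header-based compatibility check described above might be sketched as follows. The data structure and type names are hypothetical assumptions; the actual Header Database format is not specified at this level of detail.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of one algorithm's header information,
// as might be derived from the Header Database.
struct HeaderInfo {
    std::vector<std::string> input_types;   // e.g. {"int", "int", "int"}
    std::vector<std::string> output_types;  // e.g. {"pmc_int"}
};

// Returns true when the source algorithm's output at out_index may
// be connected to the destination algorithm's input at in_index,
// i.e. when both ports exist and their data types match.
bool connection_compatible(const HeaderInfo& src, std::size_t out_index,
                           const HeaderInfo& dst, std::size_t in_index) {
    return out_index < src.output_types.size() &&
           in_index < dst.input_types.size() &&
           src.output_types[out_index] == dst.input_types[in_index];
}
```

A connection of an output of type "a" to an input of type "b" would fail this check, and the system could then inform the user of the incompatibility.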
[0064] In some embodiments, the icons are displayed in the
Abstraction Window 305 with a unique name to identify that
particular instance of the algorithm. For example, the summation
icon 401 may be displayed with the following legend:
"summation.sub.--01." The user may choose the unique name, and the
system may also automatically generate a custom name for the
algorithm.
[0065] When the user has completed the process of creating the
graphical representation of the desired process, the system may
then move to step 602, in which the user's graphical representation
is analyzed to generate a network description, or netlist, to be
used in further processing. This analysis may be performed by a
separate software process, referred to herein as the "analyzer."
The netlist may contain information identifying the various icons
that the user placed in Abstraction Window 305, an identification
of the icons' corresponding algorithms and/or data structures,
identification of the data and/or event trigger signal transfers
that the user specified, and may also store positional data
regarding the placement and arrangement of the various icons and
lines.
[0066] In generating this netlist, the system (or the analyzer) may
check to make sure that all of the required data arguments and/or
variables are accounted for, and may prompt the user when an error
or missing argument has been detected. In some embodiments, the
netlist may be a high-level code database containing function
prototype calls with blank (or placeholder) argument values for the
necessary arguments. An example netlist used in some embodiments
appears further below, in connection with the discussion of the
thespian analogy.
[0067] In some embodiments, the netlist may be generated by a
Netlist Builder routine that may be crafted as a compiled PROLOG
program. This routine may access the libraries of information
corresponding to the various icons in the graphical representation,
and retrieve information to generate a netlist "node" data
structure. The node data structure may include information
necessary to effect an interface of the symbol into the matrix
formed by the resulting netlist. This matrix definition may contain
grouping, data flow and data type information that is needed for
the downstream processing utilities, and may include a symbolic
token ID, the number of input ports, the format of the input ports,
the number of output ports, the format of the output ports, the
time of execution (which may be in a predefined standard time unit,
such as nanoseconds), and a pointer to a location of help
information for the particular symbol. If the Netlist Builder
cannot define an interface between two nodes due to mismatches in
data types or parameter counts, the discrepancy may be flagged and
presented to the user for resolution. Such resolution may include
modification to the original algorithm design or the development of
one or more new library entries.
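The node data structure enumerated above might be sketched as follows. The field names and types are illustrative assumptions; the patent does not specify a concrete layout.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative netlist "node" record holding the matrix-definition
// fields enumerated above for one library symbol instance.
struct NetlistNode {
    std::string symbol_id;                   // symbolic token ID
    std::vector<std::string> input_formats;  // one format per input port
    std::vector<std::string> output_formats; // one format per output port
    unsigned execution_time_ns;              // time of execution, in nS
    std::string help_location;               // pointer to help information

    // The port counts follow directly from the per-port format lists.
    std::size_t num_inputs() const { return input_formats.size(); }
    std::size_t num_outputs() const { return output_formats.size(); }
};
```

A node for the DeltaV adder entry described below, for instance, would carry two input ports, one output port, and a 27 nS execution time.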
[0068] Embodiments of the present invention may also include an
Input/Output Definition File to provide information to the Netlist
Builder concerning the input-output and memory requirements of the
library entry. The file may be formatted as follows:
    // ***** DeltaV_Adder.ios *******************************************
    // * IO specification file for the DeltaV floating point adder entry *
    // * Copyright (c) 2003 Mentor Graphics Corporation                  *
    // * All rights reserved.                                            *
    // ******************************************************************
    // Identity information
    info_symbol "DeltaV::adder"          // library symbol string
    info_id "DeltaV::1001"               // library index entry
    info_version "1.0.1"                 // version number
    info_status "RELEASED"               // release status
    info_date "28-Aug-2003"              // date of current status
    info_author "Mentor Graphics Corp."  // library entry author
    info_technology "MGVS"               // target technology name
    // Library security information
    security PROTECTED                   // write-delete status
    encryption NONE                      // source encryption
    // Timing information
    parameter_latency 27                 // execution latency 27 nS
    parameter_setup 2                    // minimum setup time 2 nS
    parameter_hold 2                     // minimum hold time 2 nS
    parameter_min_clock 20               // minimum clock period 20 nS
    // Inputs and Outputs
    parameter_inputs 2                   // it has two input ports
    parameter_outputs 1                  // it has one output port
    parameter_in_width 32                // it accepts 32-bit input
    parameter_in_width 64                // it accepts 64-bit input
    parameter_out_width 32               // it outputs 32-bit data
    parameter_out_width 64               // it outputs 64-bit data
    parameter_io_format IEEE754          // uses IEEE-754 float data
    parameter_in_event NONE              // it uses no event triggers
    parameter_out_event NONE             // it generates no events
    parameter_in_prop NONE               // it uses no props.
    parameter_out_prop NONE              // it generates no props.
    // Memory interface
    memory_discrete NONE                 // no external discrete mem.
    memory_shared NONE                   // no external shared mem.
[0069] In some embodiments, the netlist generated by the Netlist
Builder may be further optimized using another routine, called a
Semantics and Structure Analyzer, which may also be crafted as a
compiled PROLOG program. The Semantics and Structure Analyzer
(hereafter, SSA) may accept as its input the netlist produced by
the Netlist Builder (which may be just a "first pass," or initial,
netlist). It may also accept a symbols library and a Semantics and
Structure rules library (SSRL). The SSA is an artificial
intelligence application that applies the rules found in the SSRL
to the first pass netlist and determines the most efficient manner
to restructure the netlist for hardware implementation. In
particular, the SSA may determine which data paths in the netlist
are serially dependent and which are not, and may adjust data type
parameters of each netlist node such that information is properly
passed among the nodes. The SSA can also ensure that the resulting
netlist is compliant with the generally-accepted rules of
mathematics.
[0070] In some embodiments, serially-dependent data paths may
require that their related nodes be clustered together and
structured in a pipelined manner for hardware efficiency and
fidelity of the algorithm, and the SSA may repartition the netlist
such that the serially-dependent sub-sets are isolated from those
nodes with no serial dependencies. Non-serially dependent data
paths may be instantiated as semi-autonomous hardware blocks that
may operate in parallel with each other and with the serially
dependent blocks. The ability to restructure the operational
elements of the algorithm based upon data dependency ensures
maximum possible performance by utilizing parallel hardware and
pipelining to the greatest possible extent. The output may be a
netlist with pipelined serial segments and parallel non-serial
segments.
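One way to picture this repartitioning is as a topological leveling of the netlist's data-flow graph: nodes in a dependency chain form a pipelined serial segment, while nodes assigned the same level have no serial dependency between them and may be instantiated as parallel hardware blocks. The following is a simplified sketch of that idea only; the actual SSA is described as a rule-driven PROLOG application.

```cpp
#include <cstddef>
#include <vector>

// Assigns each netlist node a pipeline stage: a node's stage is one
// past the latest stage among the nodes it depends on. Nodes sharing
// a stage have no serial dependency and may operate in parallel.
// deps[i] lists the indices of nodes whose outputs node i consumes;
// nodes are assumed to be listed in a valid topological order.
std::vector<int> assign_stages(
        const std::vector<std::vector<std::size_t>>& deps) {
    std::vector<int> stage(deps.size(), 0);
    for (std::size_t i = 0; i < deps.size(); ++i) {
        for (std::size_t d : deps[i]) {
            if (stage[d] + 1 > stage[i]) {
                stage[i] = stage[d] + 1;  // serially dependent: later stage
            }
        }
    }
    return stage;
}
```

A serial chain such as the FIG. 5 example (input, two filters, output) yields four consecutive stages, while two independent source nodes feeding one consumer share stage 0 and may run in parallel.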
[0071] The output of the SSA is a spatially-architectured netlist
that embodies the original user algorithm, and may be in a
language-independent format. The optimizing feature of the SSA then
reviews the resulting netlist to determine if there is any
redundant hardware. Based on timing estimates derived from each
library element's "execution time" entry (stated in standard time
units), identical hardware instantiations that spend most of their
time "waiting" are shared by inserting data multiplexors into the
netlist. This optimization leaves blocks of hardware that are never
exercised, and those blocks are therefore deleted from the
netlist.
[0072] When the netlist is ready, it may then be passed on to a
Distiller/Behavior Generator (DBG) software program in step 603.
The DBG analyzes the netlist and the various algorithms identified
therein, and extracts the corresponding program code segments from
Code Database 109. The DBG may substitute data variable values for
placeholders in the code segments (or may leave placeholders as-is,
depending on implementation), and then each of these segments may
then be passed to a conversion utility that converts the code
segments from their current format to a format more suitable for
implementing the process in hardware. For example, the PRECISION C
program, of Mentor Graphics Corporation, is able to convert
computer code from the C programming language to a block of
Register Transfer Level (RTL) code that implements the process in
digital electronic elements. Other conversion utilities, such as
Los Alamos National Laboratory's STREAMS-C, Celoxica's HANDEL-C,
Y-Explorations' EXCITE, and Synopsys's SCENIC, may also be used to
perform some of the conversion process. At this stage, the code
prepared by the DBG program may still include one or more
placeholder variables that can be addressed by the Spatial
Architect discussed further below. Further details regarding
features found in the PRECISION C program may be found in U.S. Pat.
No. 6,611,952, entitled "Interactive Memory Allocation in a
Behavioral Synthesis Tool," and copending, commonly-assigned U.S.
patent application Ser. No. 10/126,911, filed Apr. 19, 2002,
entitled "Interactive Loop Configuration in a Behavior Synthesis
Tool," and Ser. No. 10/126,913, filed Apr. 19, 2002, entitled
"Graphical Loop Profile Analysis Tool," the disclosures of which
are hereby incorporated by reference.
[0073] The DBG may require configuration information to identify
the target hardware in order to select and use the appropriate code
segments. For example, the user may need to inform the DBG of the
type of reconfigurable hardware, the number of units it contains,
the type of memory it needs, etc., so that the DBG knows what kind
of hardware will be running the process, and can extract the
correct type of code segment for use. The output of the DBG may be
individual code segments in a hardware format, such as RTL. RTL
code may be written in either the VHDL or Verilog hardware
description language, and is readily synthesized into formats
(using any number of commercially-available compilers) suitable
for hardware instantiation.
[0074] In some embodiments, the Code Database 109 may store code
segments in RTL format, in which case the DBG might not be needed
for the conversion. For example, technology libraries may be
written for use with the Precision-C user's library. Additionally,
emulator primitives may be provided by the manufacturer of the
particular target hardware, and those primitives may also be stored
within Code Database 109.
[0075] In alternative embodiments, the DBG may output the code
segments in a high-level format, such as the C++ programming
language. The high-level format may then be compiled and executed
on a general-purpose computer (as opposed to reconfigurable
hardware), allowing the particular process to be tested even before
it is converted and downloaded into the reconfigurable hardware,
potentially saving time if an error is detected. For example, the
code may be output in an ANSI C format. The ANSI C output format
may be used with "pure" C compilers, when the purpose is to produce
a C program that will run on a conventional computational platform.
This program may be used, for example, for debugging the algorithm.
Alternatively, the code may be output as Structural Verilog.
Targeting structural Verilog may simplify the use of the algorithm
in high-end logic emulation systems and in the translation into
ASIC (Application Specific Integrated Circuit) form.
[0076] As part of the DBG's operations, an Output Formatter routine
may be written in Tcl/Tk to accept the optimized netlist from the
SSA and the user's output language selection, and build a table of
information for each node in the netlist. From this tabular
information it may extract the output code from one of the product
libraries. Each library entry may contain a sub-section of code for
each target language. In some embodiments, the root language for
developing library entries is "pure C," which is the dialect of the
C programming language that is fully supported by both C and C++
compilers.
[0077] Then, in step 604, the various blocks of RTL code may be
passed to another program, referred to as the Spatial Architect
utility. The Spatial Architect takes the blocks of RTL code, as
well as the netlist data (which identifies the various data
input/output assignments for each algorithm), and determines the
best way to assemble the code fragments into a monolithic block of
code representing the user's desired process. In doing so, the
Spatial Architect accesses the netlist to obtain the necessary
data/event trigger transfers, and may stitch the individual code
fragments' port sections together such that the necessary
input/output data transfers are implemented.
[0078] The Spatial Architect may also make modifications to add
security parameters, such as the introduction of encryption,
password features, serial numbers, etc. into the code, and can also
add code for handling input/output (IO) capabilities. For example,
the Spatial Architect may note, from the netlist, that a particular
process is to receive an input from a satellite data receiver. The
Spatial Architect may access a library of predefined code (such as
from Code Database 109) and retrieve code segments, such as
software drivers or "Transactors," that interact with the satellite
data receiver and produce a predefined type of output. The Spatial
Architect may automatically insert this code as the source of input
to the algorithm. If the output from the satellite data receiver
code is not of the proper type (e.g., an integer output when a
floating point input is needed), the Spatial Architect may include
predefined code for converting data types, and may apply some of
this predefined code to match the input/output.
[0079] The Spatial Architect may also make certain decisions
concerning the manner in which the various algorithms will be
implemented in hardware. As one example, the Spatial Architect can
examine the netlist to determine whether a particular data
structure should be instantiated as a single- or multi-ported
memory. Referring again to the process shown in FIG. 5, Image Data
502 represents data that is accessed by two distinct algorithms:
MCT Input 501 and Pass Filter 503. When this data element is
instantiated in hardware, it may be instantiated as a multi-port
memory, with a separate port for each separate algorithm that will
need access to the memory. In alternative embodiments, some or all
of this analysis may be performed by the DBG.
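The single- versus multi-port decision may be modeled as a simple count of distinct accessors, as in the following sketch; the function and label names are invented for the example and do not reflect the actual implementation.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only: count the distinct algorithms that access each
// data element in the netlist; that count is the number of memory ports
// the element would need if instantiated as a multi-port memory.
std::map<std::string, int> portCounts(
        const std::vector<std::pair<std::string, std::string>>& accesses) {
    // accesses holds (dataElement, algorithm) pairs taken from the netlist
    std::map<std::string, std::vector<std::string>> users;
    for (const auto& a : accesses) {
        auto& u = users[a.first];
        if (std::find(u.begin(), u.end(), a.second) == u.end())
            u.push_back(a.second);  // count each accessing algorithm once
    }
    std::map<std::string, int> ports;
    for (const auto& u : users)
        ports[u.first] = static_cast<int>(u.second.size());
    return ports;
}
```

For the FIG. 5 example, Image Data 502 accessed by MCT Input 501 and Pass Filter 503 would receive a count of two, i.e., a two-port memory.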
[0080] If Image Data 502 and 504 are both of the same type, the
Spatial Architect may decide to instantiate both memories as a
single circuit. In this way, circuit components may be conserved,
but a slower operating speed may result, as both processes will be
sharing the same circuit for storage of their images. As an
alternative, the Spatial Architect may instantiate the memories as
two distinct circuits. Doing so allows for a faster operation,
since the two algorithms can now be pipelined for streamlined
operation. Pipelining refers generally to situations where two
algorithms may be sequential within a single process (such as the
two Pass Filters in the FIG. 5 example), but where both algorithms
may operate simultaneously as data is "piped" through the
abstraction. For example, while the second Pass Filter 505 is
processing the Image Data 504 produced by the first Pass Filter
503, that first Pass Filter 503 may move on and begin processing
the next Image Data 502. In this streamlined manner, sequential
algorithms may operate simultaneously, increasing the throughput of
the overall process. The decision between size and speed may be a
configuration option chosen by the user.
[0081] To determine whether particular algorithms are capable of
being pipelined, the Spatial Architect may examine the process to
determine whether any data dependencies exist between the
algorithms. In general, a data dependency exists when two or more
algorithms require access to the same data element. FIG. 7
illustrates an example process having a data dependency. Image Data
701 is written to by both MCT Input 702 and Pass Filter 703, and as
such, those two algorithms are data dependent on one another and
cannot be pipelined for simultaneous operation. If desired by the
user, the Spatial Architect may assemble the RTL code in a manner
that instantiates non-data-dependent algorithms in parallel
hardware. This assembly may be performed based on the directions
provided in the netlist.
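The data-dependency test may be sketched as follows, treating two algorithms as dependent when both write the same data element (as in FIG. 7); the structure and function names here are assumptions for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative sketch only: treat two algorithms as data dependent when
// both write the same data element (as MCT Input 702 and Pass Filter 703
// both write Image Data 701 in FIG. 7); dependent algorithms are not
// eligible for pipelining.
struct Write { std::string algorithm, dataElement; };

bool canPipeline(const std::string& a, const std::string& b,
                 const std::vector<Write>& writes) {
    std::set<std::string> writtenByA, writtenByB;
    for (const auto& w : writes) {
        if (w.algorithm == a) writtenByA.insert(w.dataElement);
        if (w.algorithm == b) writtenByB.insert(w.dataElement);
    }
    for (const auto& d : writtenByA)
        if (writtenByB.count(d)) return false;  // shared written element
    return true;
}
```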
[0082] In some embodiments, the Spatial Architect (or other system
software, such as the librarian) may store this block of code in
Code Database 109, and may create an icon associated with it such
that the user's desired process may be used as an icon in the
future. This flexibility allows the user to create an adaptive,
up-to-date library of algorithms.
[0083] When the Spatial Architect has prepared the block of RTL
code representing the user's desired process, this block of RTL
code may then be passed on, in step 605, to a hardware compilation
manager that can compile RTL code into a format suitable for
downloading into the target emulation system. For some emulation
systems, this downloadable format is a binary file that sets forth
the "routing tables" for the various memory elements 102 of the
reconfigurable hardware 101. One such compiler is the VIRTUAL WIRES
series of compilers offered by Mentor Graphics Corporation. One
piece of information needed for this process is the identity of the
reconfigurable hardware 101 that is to be used (since different
manufacturers may have different ways of configuring their
hardware). The user may be prompted for this additional information
at any stage in the process.
[0084] The compilation manager may also generate one or more
scripts that may be used to download the compiled code into a
reconfigurable platform 101. In this manner, the scripts and binary
files may be generated at one location, and distributed to the
locations of the reconfigurable hardware for execution and loading.
This may avoid the necessity of having additional development
stations at each reconfigurable hardware location. Then, in step
606, the various scripts may be executed on a workstation (such as
workstation 104) to configure the reconfigurable hardware 101.
[0085] Several advantages may be realized by this process. For
example, the binary files that are used by typical reconfigurable
computing platforms 101 are nearly impossible to reverse engineer.
This is because the binary code is essentially the "truth table"
contents of the various elements 102 in the reconfigurable platform
and includes not only the algorithm, but all of the routing and
timing data for signal multiplexing as well; it is, by its nature, an
unintelligible string of ones and zeros. Anyone intercepting these
download files would need to know at least the specific hardware
configuration of the target reconfigurable platform and all of the
compiler switches, and would need access to the original library
elements, to even begin to decipher the string of ones and
zeros. Accordingly, these binary files offer a secure way to
transmit signals intelligence analysis (SIA) information. A
plurality of target hardware stations may be placed around the
world, and whenever a user modifies a process to generate a new
download file and process, the user can use insecure channels to
transmit that download file to the worldwide hardware stations, and
have reasonable confidence that the transmitted algorithm is still
secure. To further increase security, some embodiments of the
present invention may still encrypt the download files, and may
also use authentication such as RSA Corporation's SecurID
protocol.
[0086] The discussion above gives illustrative examples of several
embodiments of inventions disclosed herein. However, those of
ordinary skill will readily see that many variations may be made.
For example, in an alternate embodiment, workspace 300 may be
displayed on a display 107 having a screen that can detect the
presence of a pointing device, such as a stylus. The user may use a
stylus to handwrite symbols in Abstraction Window 305. In such an
embodiment, the system may employ handwriting recognition software
to detect when a user has drawn a predefined symbol, such as one of
the icons 303. Upon detection of such a symbol, the system may
automatically consult the various libraries to assemble the
computer code necessary for implementing an algorithm represented
by the icon. In this manner, the user need not drag-and-drop the
predefined icons 303 into Abstraction Window 305, but instead can
simply draw them by hand--much like the way an instructor may write
on a chalkboard. In such alternative embodiments, Icon Window 302
need not even be displayed, or may be displayed simply as an assist
to the user who is writing in the Abstraction Window 305. The
necessary computer code can be dynamically assembled as the user is
writing in the Abstraction Window 305, allowing for the rapid
preparation of computer code to implement the author's
algorithm--without requiring the author to be proficient in
computer programming. Furthermore, as a user writes out the various
symbols, the system may automatically output high-level (e.g., C,
C++, ADA, etc.) code representing the symbol's algorithm and/or the
entire process thus far, and/or may output lower-level code
versions of the same, such as VHDL or RTL. As a user edits and/or
deletes from the image being drawn, the system may even
automatically erase the code segments that it had prepared in
response to the user's creation of the symbol. The computer system
can thus serve as a natural, and near invisible, assistant to the
author such that the author need not even know how to program a
computer or reconfigurable platform.
[0087] A variety of input formats may be used, in addition to (or
instead of) the ones described above. For example, inputs may be
provided in three types. The first, referred to herein as the Type-1
format, may be the netlist described above. It may be a
language-neutral intermediate format that treats each node as a
call to the various algorithm libraries. Type-1 format nodes may be
referenced in an existing library, such as one of the following, to
support their use:
[0088] Theater Library
[0089] Stage Library
[0090] Actor Library
[0091] Prop Library
[0092] Directions Library
[0093] Core Math Library
[0094] Optional Application Libraries
[0095] User Defined Theater Library
[0096] User Defined Stage Library
[0097] User Defined Actor Library
[0098] User Defined Prop Library
[0099] User Defined Directions Library
[0100] User Defined Core Math Library
[0101] User Defined Optional
[0102] A second type, Type 2, may be a vector, bitmap or other
visual graphics format, including JPEG, GIF or BMP formatted
documents. Type-2 formatted input can come from any type of
graphics (drawing) program, web page image captures, etc. In some
embodiments, an interactive digital whiteboard may be used to
generate such images. This commercially-available device (e.g., the
Panasonic KX-BP800) provides a large drawing surface in the form of
a whiteboard. The image drawn on the whiteboard is then converted
into a bitmap or vector image and transferred, upon command, to the
host computer via an RS-232 serial interface. Alternatively, a
digitizing tablet may be used. The digitizing tablet is typically
interfaced to a graphics program, and the output is then saved in a
bitmap (.bmp) or other image (.jpg, .tif, .gif) format (also Type 2
formats).
[0103] The Type 2 formats may produce visual images that need to be
converted to a logical form (e.g., Type 1) for further processing.
Conventional Optical Character Recognition (OCR) software (such as
those offered by ScanSoft Corporation) may be used to scan these
images and convert the image into a series of image tokens, where
each token represents a single character from the image. The user
may then review the captured image on the computer screen, make
any necessary corrections or adjustments, and then accept the
corrected tokenized image.
[0104] The tokenized image may then be passed to an Equation Parser
(EP) where it is analyzed syntactically and structurally and parsed
into token groups that represent the parenthesized equation(s). At
this point superscripts and subscripts may also be structured into
the new image. The re-tokenized image may be presented to the user
for concurrence or adjustment (as may be needed).
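One way to sketch the structural pass is by parenthesis nesting depth, as below. This is a toy model only: each non-parenthesis character stands in for one image token, and the superscript/subscript structuring described above is omitted.

```cpp
#include <string>
#include <vector>

// Toy sketch only: each non-parenthesis character stands in for one image
// token, and the recorded nesting depth identifies the parenthesized group
// it belongs to. Superscript/subscript structuring is omitted.
bool groupByDepth(const std::string& tokens, std::vector<int>& depths) {
    int depth = 0;
    for (char t : tokens) {
        if (t == '(') {
            ++depth;                        // open a new token group
        } else if (t == ')') {
            if (--depth < 0) return false;  // unbalanced: needs user correction
        } else {
            depths.push_back(depth);        // token belongs to the current group
        }
    }
    return depth == 0;                      // balanced input
}
```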
[0105] The Netlist Builder (NB) may consult a symbols database
(discussed below) that compares the tokens in each token group with
its contents to determine if a hardware instantiation for each
token (or token group) exists. Where no hardware instantiation
exists, the user is prompted to create one as described above. Once
all tokens or token groups have associated library elements the NB
may output its "first pass" netlist of the algorithm.
[0106] The third type, Type 3, may be a plain ASCII text file in
which equation elements are specified using normal keyboard
characters and macro definitions. By using the internal reference
names for the symbols in the symbols library, the user may elect to
manually enter an equation using only a simple ASCII text editor.
The practice is analogous to manually entering equations using
Mathematica or MATLAB. This may be useful if the user is working
with a device that cannot run a graphics program capable of
producing a Type-2 format output (e.g., using a PDA or handheld
organizer).
[0107] The Macro Expander (hereafter, ME) may be a utility crafted
in Tcl/Tk that accepts the output of the EP, ASCII text file or
graphical authoring utility described above and expands the
equation macros into a Type-1 data file.
[0108] As a further feature, the system may be expandable. The
system software, which may be the librarian discussed above, may
update its libraries of algorithms and processes as the user
creates them. In some embodiments, when a user has decided that a
particular process is worth saving, the librarian may automatically
store the code segment(s) that it assembled for the process, and
may add it to the library of available algorithms. In this manner,
the user may access dynamic, up-to-date libraries of the various
processes and algorithms she has created.
[0109] To facilitate expandability, some embodiments may use a
"mainframe" and "snap-in" modular approach to the software code.
The mainframe may allow simultaneous revisions to the various
processes described above, and may provide a consistent foundation
for adding features and functionality embodied in modular "snap-in"
code. For example, in some embodiments, a core mainframe program
may include a Tool Command Language (TCL) and/or Tool Kit (TK)
scripting engine to allow for internal scripting. Some snap-ins may
be written in TCL/TK scripting form, as opposed to, for example,
the higher-level C++ language. The mainframe may also include code
for generating the workspace 300 described above, and its related
features. The mainframe may also include code for managing the
various libraries of algorithms and processes, and may include some
basic libraries such as basic math functions, architecture
functions, and/or input/output functions for transfer of data
between a target hardware and its host (workstation). The DBG and
Spatial Architect described above may also be incorporated in the
software mainframe, as well as a compilation manager, which may be
a TCL/TK snap-in that generates script files for performing various
compilation steps associated with the creation of binary download
files for the target hardware. The compilation manager may also
supervise execution of the scripts on the target hardware's host
workstation or other compilation station. The mainframe may also
include a snap-in coordinator to manage the various snap-ins and
coordinate their activities, and may also serve as an interface to
the license manager(s) (if any) required by software used in the
system.
[0110] In some embodiments, a data collection algorithm may be
defined to represent an "unknown" algorithm whose process is under
study. For example, in studying an unknown physical phenomenon
(example discussed below), the user may wish to create a process
having a large number of known behaviors or algorithms, and these
algorithms may provide their outputs to the "unknown" data
collection algorithm. The "unknown" data collection algorithm may
simply include a process for collecting and/or recording the data
it receives, such as by placing it in a predefined data structure.
The "unknown" algorithm may also include logic to react to certain
predefined conditions, such as sending an alert signal when a
received input exceeds a predefined amount. The data collected by
the "unknown" algorithm may subsequently be analyzed to discern
patterns that may help the user define the behavior under study.
For example, a researcher may be interested to know how a variation
in temperature may affect a particular physical mass as a whole.
The user may already know how individual portions of the mass
react. Using an unknown data collection element, the user can
define a process to simulate variations in temperature, and cause
sample temperature data to be collected by the unknown data
collection element. The data collected by this element can then be
studied to discern a behavioral pattern to the mass' thermal
characteristics.
[0111] In some embodiments, the user may be given a greater degree
of control over the amount of serialization of the various nodes in
the netlist. The Spatial Architect (SA) may provide a tool that
allows the user to adjust the architecture of the algorithm, as it
will be instantiated in hardware. To accomplish this, the SA may
work on the netlist after it has been processed by the Semantics
and Structure Analyzer (SSA). For example, the SA may scan the
netlist and identify the various serially-dependent nodes, and
display them onscreen in a graphical manner that depicts their
dependencies. For example, the workspace 300 may be used to display
the nodes on the computer screen in a manner where the Y-axis
(vertical axis) represents time and the X-axis (horizontal axis)
represents parallel displacement. The SA may display data flow by
connecting the nodes with lines of varying weight and color, with
the line weight indicating the relative width of the data transfers
in bits, and the line color indicating data dependencies; none,
serial, pipelined, etc. Other visual representations may be used as
well. Using a pointing device, the user may move the icons
representing the netlist nodes around within the workspace 300.
Orientation of the non-serially dependent nodes in time allows for
optimization in later steps. When the user is satisfied with the
spatial and time orientation of the nodes, the SA may be called again
to scan the netlist for hardware elements that, because of their
time displacement, may be shared. The data flow of the netlist may
then be modified by including multiplexors in the logic, and a new
version of the netlist may be produced. In some embodiments, the
user may, capacity permitting, elect to split the input data set
and prepare multiple instantiations of the algorithm. The SA
includes a "replicate" option that will create multiple copies of
the netlist in parallel in the hardware, separating them by
isolating their IO facilities.
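The sharing scan may be sketched as a time-overlap test; the Node structure and time-slot fields below are invented for illustration and are not the actual netlist representation.

```cpp
#include <string>

// Illustrative sketch only: the Node structure and time-slot fields are
// invented for the example. Two nodes can share one hardware instance
// (behind a multiplexor) only if they use the same kind of element and
// their scheduled time slots do not overlap.
struct Node {
    std::string element;     // kind of hardware element the node uses
    int startSlot, endSlot;  // scheduled occupancy, half-open [start, end)
};

bool canShare(const Node& a, const Node& b) {
    if (a.element != b.element) return false;  // different hardware kinds
    return a.endSlot <= b.startSlot || b.endSlot <= a.startSlot;  // disjoint in time
}
```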
[0112] Further embodiments may also include a graphical Memory Map
utility (hereinafter, "MMU"). The MMU may display the finished
netlist on the screen, and the user may then determine which nodes
should use autonomous local memory and which should use shared
memory. For any node, the user may request to see the node's
embedded memories on the computer screen by, for example,
"control-left-clicking" on the node. The user may select a specific
memory and determine if it should be instantiated as a local,
protected memory, or a shared global memory. Memory use may be
graphically identified in a variety of formats, such as by color
and border style. In some embodiments, the user may simply draw a
rectangle around the various nodes that are to be in a shared
memory or local memory. When the user attaches a global memory
resource to a node, it causes the NB to generate (synthesize) a
multi-ported memory. For each node connected to the memory, a
unique port is generated to that memory. Arbitration on shared
memories may be determined by node ID number. When multiple nodes
desire to access the memory at the same time the node with the
lower ID number may be given priority. After all nodes have had
their access to the memory (on that bus cycle) the process repeats
the next time multiple nodes conflict. Local memories require no
arbitration.
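The arbitration rule above may be modeled as a fixed-priority sort, lowest node ID first. This is a behavioral sketch only, not hardware code.

```cpp
#include <algorithm>
#include <vector>

// Behavioral sketch only, not hardware code: on a contested bus cycle the
// requesting node with the lowest ID number is granted the shared memory
// first, and the remaining requesters are served on subsequent cycles.
std::vector<int> grantOrder(std::vector<int> requestingNodeIds) {
    std::sort(requestingNodeIds.begin(), requestingNodeIds.end());
    return requestingNodeIds;  // one grant per cycle, lowest ID first
}
```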
[0113] As a further alternative, a Library Builder (hereinafter,
"LB") program may be written, for example, in C++ to carry out
various library management functions, and may serve as a database
manager. For example, the following types of libraries may be used:
Theater, Actor and Prop libraries defined by the user; direction
libraries defined by the user or supplied by an Original Equipment
Manufacturer (OEM). Application libraries may also be used, such as
OEM core math libraries and other application libraries, or
user-generated libraries. Referring to the Theater Abstraction
concept presented above, the Direction, Prop, Actor and Theater
libraries may be collections of completed algorithms that have been
saved as discrete entities for later use. The LB may store these
library entries in a tree structured database.
[0114] The application libraries may be somewhat different. Since
they are the core building blocks for actors, props, directions,
theaters, etc., they may be written in "pure C" and then translated
using commercially-available translation utilities into RTL. The
RTL may then be translated into structural Verilog using a
commercially-available synthesis tool. The new library entry may
thus end up with three forms: C, RTL and Verilog, each of which may
be maintained in the database(s) described above. Since each
library entry may be entirely autonomous, there is no need to
manage memory or memory sharing outside the library entry, thus
simplifying its maintenance and instantiation.
[0115] The Library Builder may manage the libraries above as a tree
structure. For example, the library master index may be at the top
of this structure, and there may be a number of branches to the
tree. Three possible primary branches of the tree are protected,
secure and open. The "protected" library entries may be read by any
user but can only be written to by the library creator (Mentor
Graphics). The contents of the "protected" library are those
entries that are directly supported by the library creator. The
"secure" libraries are those that are created by the user but for
reasons of security have restricted read access. The "open"
libraries allow both read and write access to all authorized
users.
[0116] FIG. 8 illustrates an example hierarchy of a library
structure. In actual use, some embodiments could include thousands
of entries. Each library entry may consist of multiple files, each
of which has a distinct function. In order to keep the library
organized, each entry (symbol) has a unique directory (as noted in
the FIG. 8 diagram). The individual library entry structure
(including superior directories leading to it) may be as indicated
below (the reference to "theater" will be described below):
    Library_Root (directory)
      Protected (directory)
        Core_Math (directory)
          Arithmetic (directory)
            Adder (top directory)
              Adder.ios (io-specification file)
              Adder.ico (icon file)
              Adder.sym (symbol file)
              Adder.hlp (help file)
              Adder_C ("c" directory)
                Adder.c (source file)
                Adder.h (header file)
              Adder_CPP ("c++" directory)
                Adder.cpp (source file)
                Adder.hpp (header file)
              Adder_RTL ("RTL" directory)
                Adder.rtl (source file)
              Adder_V ("Verilog" directory)
                Adder.v (source file)
[0117] A number of databases may also be stored and used to support
the various features described above. For example, a Symbol Library
may be a graphics library that contains all the symbols
recognizable by the OCR engine for handling Type 2 data. The OCR
engine compares the entries in the symbols library with the symbol
under conversion to determine its identity. Maintenance of the
symbol library may be handled by the OCR engine embedded in the
product. A Rules Database may be a non-structured, non-indexed
collection of PROLOG rules that affect the operation of the
Equation Parser contained within a single ASCII text file. It may
be maintained with any ASCII text editor. A Macro Database may be a
b-tree organized, indexed random access database driven by the
Microsoft "JET" database engine, or alternatively, any OLE
DB-compliant database engine using SQL constructs and semantics. This
database contains the methods of expanding the equation macros
(single symbols or their text representation) into core math
elements found in the main libraries. It is initially populated by
the OEM and then maintained by the user. A Netlist Symbols Database
is a b-tree organized, indexed random access database driven by the
Microsoft "JET" database engine, or alternatively, any OLE
DB-compliant database engine using SQL constructs and semantics. This
database contains the methods of expanding internal primitive types
in the Type-1 data into target language objects. This database may
be initially populated by the OEM and then maintained by the user
with the system software, such as the librarian. Some or all of the
database and/or librarian functions described above may use
database engines, such as the Microsoft JET engine, for
management.
[0118] A user's desired process essentially seeks to accomplish, or
act out, some behavior. To help users who may be unfamiliar with
computer programming concepts, the development process may be
analogized, in some embodiments, to a thespian stage production,
where the "play" (e.g., "Romeo and Juliet") represents the process
to be "acted out." FIG. 9a shows a hierarchy diagram illustrating
how the user's desired process may be abstracted and analogized to
a theater production. The overall project may be referred to as a
production 901. A production may be created using a computer
workstation 104 and/or mainframe by the end user, and may organize
libraries and source files that are used by the overall process.
Within a production may be a number of Theaters 902, and within
each theater may be a number of stages 903. In some embodiments, a
first theater (Theater A) may represent a local site, such as the
system on which the development is to take place, while other
theaters (e.g., Theater B) may be either remote or local.
[0119] The various theaters and stages on Broadway are different
locations in which events may be acted out, and in keeping with
that analogy, the distinct theaters and stages in the FIG. 9a
production may represent distinct areas in which events may take
place. In some embodiments, each stage may have its own visual
representation and Abstraction Window 305, and their resulting
circuitry may each be instantiated as distinct circuits. Data
connections may exist among theaters and stages to allow them to
exchange control and/or data signals. Collaboration Stages may
effect the virtual interconnection of the various theaters,
allowing them to communicate with one another through a consistent
mechanism. Users in different locations may share the Collaboration
Stage to work together on a particular process. In some
embodiments, separate stages may be created for Input and Output.
These stages may represent the physical mechanism by which the
system, or theater, receives or supplies information. For example,
the FIG. 5 process may be an Input Stage for the capture and
initial processing of image data. If a particular production
employs multiple theaters and/or stages in a single piece of
hardware, the various theaters and stages may share the use of a
single Input Stage and Output Stage.
[0120] On any given stage, there may be a number of actors 904.
Actors 904 represent the algorithms that carry out some predefined
functionality. These algorithms may be control-enabled or
autonomous. Control-enabled algorithms await the receipt of one or
more event trigger signals prior to execution, while autonomous
algorithms may continuously execute (or execute whenever necessary
data is received). The data and other elements used by the Actors
are represented as props 905.
[0121] The prop, actor, stage and theater levels of abstractions
are just that--abstractions. They provide a logical approach to
arranging and managing the various algorithms in the user's
process. These abstractions may be implemented in code prior to
their hardware instantiation, and the following sections include
some example software code (in C++) for these abstractions. The
software architecture of a prop may be a data element defined as
follows:
    pmc_Prop propname (
        pmc_PropFlag = "bit vector string";  // register may be used for error
                                             // and semaphore traffic
        <data type> elementName1;
        <data type> elementName2;
    );
[0122] The software architecture for an actor may be defined as
follows:
    pmc_Actor actorName (
        pmc_InputHandle inputHandleName = {
            <input_type> inputHandleName1;
            <input_type> inputHandleName2;
        };
        pmc_OutputHandle outputHandleName = {
            <output_type> outputHandleName1;
            <output_type> outputHandleName2;
        };
        pmc_PropList stagePropNameList = {
            prop-01-01;
            prop-01-02;
            prop-01-03;
        };
        pmc_EventProcessor stageEventProcessor;
    );
[0123] The software architecture of a stage may be defined as
follows:
    pmc_Stage stageName (
        pmc_InputHandle inputHandleName = {
            <input_type> inputHandleName1;
            <input_type> inputHandleName2;
        };
        pmc_OutputHandle outputHandleName = {
            <output_type> outputHandleName1;
            <output_type> outputHandleName2;
        };
        pmc_PropList stagePropNameList = {
            prop-01-01;
            prop-01-02;
            prop-01-03;
        };
        pmc_ActorList stageActorListName = {
            actor-01-01;
            actor-01-02;
            actor-01-03;
        };
        pmc_EventProcessor stageEventProcessor;
    );
[0124] The software architecture of a theater may be defined as
follows:
    pmc_Theater theaterName (
        pmc_InputStage inputStageName;
        pmc_OutputStage outputStageName;
        pmc_CollaborationStage collaborationStageName;
        pmc_PropList theaterPropNameList = {
            prop-01-01;
            prop-01-02;
            prop-01-03;
        };
        pmc_StageList theaterStageListName = {
            stage-01-01;
            stage-01-02;
            stage-01-03;
        };
        pmc_EventProcessor theaterEventProcessor;
    );
[0125] A netlist generated by the analyzer may appear as follows in
some embodiments:
    // Sample output of the analyzer.
    start theater
    actor_embodiment "001"                  // naming an actor "001"
    // interface
    use actor_library "DeltaV_core_math"    // importing an existing library
    use prop_library "DeltaV_core_props"
    in_handle a, b                          // create input handles named a and b
    out_handle ret_val                      // create output handle named ret_val
    event_handle input_available, output_ready
                                            // create event handles for two
                                            // predefined events
    timing async                            // indicates that the timing is
                                            // asynchronous, with no external
                                            // timing dependencies
    target_dependency NONE                  // indicates that the actor is not
                                            // target-specific, and will work on
                                            // a variety of platforms
    security NONE                           // indicates that no encryption is used
    help "DeltaV_core_math_multiply"        // defines where to get the help file
                                            // for this actor
    // abstractions
    cast                                    // identifies the other predefined
                                            // actors included in this theater
    actor "parse_float"                     // includes an actor of the type
                                            // "parse_float" in the theater
    actor "32-bit_multiply"
    actor "make_float"
    event "input_available"                 // defines the two events that are needed
    event "output_ready"
    props
    data "pmc_float" a, b, ret_val          // defines three props of the type
                                            // "pmc_float", named a, b and ret_val.
                                            // Using the predefined handle names a, b
                                            // and ret_val creates connections - two
                                            // inputs and an output - to actor 001
    data "pmc_word" ahi, alo, bhi, blo      // defines props of data type "pmc_word",
                                            // not yet used
    data "pmc_dword" term_1, term_2, term_3, term_4
    data "pmc_fStruct" in_s_a, in_s_b, out_s
    // process
    direction                               // defines how the actors and props interact
    pipeline on input_available accept a, b
                                            // pipeline indicates that this step in the
                                            // direction can occur continuously, each
                                            // time the input_available event trigger is
                                            // asserted. As an alternative to pipeline,
                                            // "static" may be used to indicate an action
                                            // that occurs once. "accept a, b" means that
                                            // the data handles a and b accept their input.
    pipeline on a & b parse_float a, b to in_s_a, in_s_b
                                            // when a and b are both ready, use the
                                            // parse_float function on a and b, with
                                            // output sent to in_s_a and in_s_b
    pipeline on in_s_a & in_s_b do 32-bit_multiply to out_s
                                            // when in_s_a and in_s_b are ready, do a
                                            // 32-bit multiply of those values, and
                                            // provide output to out_s
    pipeline on out_s do make_float out_s to ret_val
    pipeline on ret_val trigger output_ready
    end "001"
    // Subsequent instantiations
    actor_embodiment "002"
    replicate "001"                         // make duplicate actor of 001, named 002
    end "002"
    actor_embodiment "003"
    replicate "001"                         // make duplicate actor of 001, named 003
    end "003"
    // Structure - tells spatial architect how to assemble the actors and props
    stage_embodiment "top_001"
    place "001" & "002" & "003"             // puts 001, 002 and 003 into this stage
    link "MCT_port_1_1" to "001_a"          // provide data from MCT_port_1_1 to
                                            // input "a" of actor 001
    link "MCT_port_1_2" to "001_b"          // provide data from MCT_port_1_2 to
                                            // input "b" of actor 001
    link "MCT_port_2_1" to "002_a"
    link "MCT_port_2_2" to "002_b"
    link "001" to "003_a"                   // links output of 001 to input "a" of 003
    link "002" to "003_b"                   // links output of 002 to input "b" of 003
    link "003" to "MCT_port_3_1"            // links output of 003 to port MCT_port_3_1
    end "top_001"
    end theater
    abstract "theater" to actor in library "user_actor_library" as "Y_multiplier"
[0126] FIG. 9b illustrates a block diagram example of how these
abstractions may be implemented in the final hardware. A single
theater 911 may contain circuitry located at a first location, such
as the location of the development platform on which the user
created the desired process. The hardware for the theater 911 may
include a number of stages 912 (a hardware subset described below),
and a data pipe circuit 913 that may be accessed by various
elements in the theater to transfer data. Each stage 912 may
include a number of actors 914 (e.g., circuits that carry out an
algorithm) and props 915 (e.g., circuits that store predefined data
structures), as well as common circuitry 916 that may be shared by
the various elements of the stage to help carry out handshaking of
the various asynchronous processes in the system.
[0127] The Data Pipe 913 may include circuitry for carrying out the
exchange of data between the various circuits of the system. In
some embodiments, this Data Pipe 913 may be instantiated as a 37-
or 69-bit wide port for the uni- or bi-directional transportation
of information, the specific configuration of which may be
established by the user. A number of memory registers may be used
to temporarily hold this data while it is awaiting collection by a
destination circuit, and the circuitry may also include address and
timing control logic to coordinate this transfer of data. Multiple
instances of Data Pipe 913 may also be used to increase
transmission capacity.
[0128] The common circuitry 916 may include an input port for
receiving a clock signal from the target hardware's main clock to
synchronize the transfer of data. When a circuit needs to output
data, it may place this data in static registers on the Data Pipe
913, and the destination circuit may read the data from the Data
Pipe 913 when the clock signal enables the read. This may be
helpful for deskewing and synchronizing data transfers. Since the
local clock may be hardware dependent, this clock input port may be
instantiated when the overall RTL code is generated. The common
circuitry may include circuitry for receiving an Input Ready signal
from each circuit that is ready to accept input data, and an Output
Available signal from each circuit that has placed output data on
the Data Pipe 913, and may manage the timing of the transfer of
data from these outputs to the inputs. The common circuitry may
also include circuitry for sending and receiving a Data Mode signal
that can allow a data recipient to understand the data that is on
the Data Pipe 913. The Data Modes may be statically defined at
compile time.
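The handshake described above may be sketched behaviorally. The following C++ model is illustrative only: the DataPipeModel name and its members are hypothetical stand-ins, not the pmc_* circuitry of the disclosure. It shows a producer placing data in a static register and asserting Output Available, with the transfer completing only on a shared clock edge once the consumer has asserted Input Ready.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Behavioral sketch (not the disclosed RTL) of the common-circuitry
// handshake: a producer places data in a static register on the Data Pipe
// and raises Output Available; the consumer raises Input Ready; the
// transfer completes only on a clock edge when both flags are asserted.
struct DataPipeModel {
    uint64_t reg = 0;               // static register holding the in-flight word
    bool output_available = false;  // producer has placed data on the pipe
    bool input_ready = false;       // consumer is ready to accept data

    void put(uint64_t word) {       // producer side
        reg = word;
        output_available = true;
    }

    // Called on the shared clock edge; returns the word only when both
    // handshake flags are asserted, then clears them for the next transfer.
    std::optional<uint64_t> clock_edge() {
        if (output_available && input_ready) {
            output_available = false;
            input_ready = false;
            return reg;
        }
        return std::nullopt;
    }
};
```

Gating the transfer on the clock edge is what provides the deskewing and synchronization benefit described above: data sits in the static register until the destination is known to be ready.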
[0129] A stage's common circuit 916 may include circuitry for
receiving a START signal, which may cause the particular process
carried out in the stage to begin execution. A stage's common
circuit 916 may also include circuitry for receiving a HALT signal,
which may cause every circuit in the stage to immediately halt
processing. This may be carried out by gating the local clock
signal, and processing may resume where it left off when the HALT
signal is deasserted. A stage's common circuit 916 may also include
circuitry for receiving an ABORT signal, which causes the circuits
in the stage to terminate processing and/or return to a default
state.
[0130] Similar to the common circuit 916 associated with each stage
912, each theater 911 may also include its own common circuit 917
that is shared by the various stages 912. The components of the
theater's common circuit 917 may contain some or all of the same
components found in the stage common circuit 916, but may affect a
larger scale of abstraction. For example, the Input/Output signals
may indicate that the particular theater is ready to
receive/transmit data to a circuit outside of the theater 911, such
as another theater in a different location.
[0131] By using the common circuitry 916/917, the various
algorithms and/or processes that become instantiated may operate on
hardware platforms that are geographically dispersed. The common
circuitry may include circuitry for using telephone,
radio-frequency, Internet, and other forms of communication between
physically-separate devices to allow the sharing of data and
collaboration of effort. Processes may be executed in parallel not
only within a given hardware platform, but across multiple
platforms.
[0132] This abstraction may be used to create simple-to-understand
menu commands for Workspace 300. For example, the Workspace 300
Menu Bar 301 may contain a variety of menu options that apply this
theater analogy for the user. In the FIG. 3 example, the Menu Bar
301 may contain the following general options: FILE, CREATE, EDIT,
VIEW, ARRANGE, CODE, BUILD, RUN, TOOLS, and HELP. The FILE menu
option may contain options for opening, saving, closing,
replicating, or deleting an existing theater, prop, actor, etc.,
and may also allow the user to simply exit the program.
[0133] The CREATE and EDIT menu options may allow the user to
create or edit the various theaters, stages, actors, or props in the
user's process. The user may also be given options for creating a
new library of code segments, and may also create a new set of help
messages for use with an existing or new library.
[0134] The VIEW menu option may contain options concerning the
arrangement of Workspace 300, such as the windows to be shown and the
toolbar elements to include. The menu may also include options for
displaying the user's production as an overall abstraction (e.g.,
displaying a chart similar to FIG. 8 illustrating the various
processes), displaying a listing of the currently-enabled hardware
details, and even displaying a graphic representation of the data
flow within the process. The View menu option may permit the user
to place various icons and interconnections on the workspace, and
can be used to select a view of the production, theater, stage,
actor and/or prop.
[0135] The ARRANGE menu may contain options that allow the user to
rearrange the theater and/or stage, and may include commands for
altering the topography of the current view (such as replicating,
deleting, moving, editing icons, etc.), which may affect how the
spatial architect will render it in hardware. The menu may also
include an option specifying how the code is to be optimized (e.g.,
whether the Spatial Architect should favor serialization over
parallelization, or vice versa, and whether the system should be
optimized for speed or size).
[0136] The CODE menu option may include options for generating
computer code that carries out the user's desired process. The menu
may include options for generating code in a selected language
(such as C, RTL, Verilog Netlist, etc.). This option may be useful
when a particular process needs to be provided to a variety of
systems with differing hardware.
[0137] The BUILD menu option may include a variety of options
relating to generation of the binary download files from the
computer code. This may include options for building the files for
the actual target hardware, and may also include options for
building the files to be used by other software programs that
emulate reconfigurable hardware platforms, such as System-C or
ModelSim. This menu may also include configuration options, such as
setting the target hardware details, compilation details, and/or
translation details for the compilation and/or software.
[0138] The RUN menu option may contain a number of options for
executing the user's desired process. This may be done, for
example, by using a number of software simulators (e.g., System-C,
ModelSim, etc.). This menu may also include the option of causing
the target emulation hardware to begin execution of the desired
process.
[0139] Event Processing
[0140] As discussed above, many algorithms (such as control-enabled
actors) may use event trigger signals to control the timing of
their execution. In some embodiments, a single generic data type
may be defined for these event trigger signals. By using a common
data type, generic circuitry may be used to handle the event
trigger signals. In some embodiments, each algorithm that is
interested in an event signal may include an Event Processor to
handle the event signals. Alternatively, the Event Processor
circuitry may be instantiated for each abstraction, such as an
actor or stage. The Event Processor may be supplied with
information, such as the netlist or a simple lookup table, that
identifies the various input/output event trigger signals for each
algorithm. The generic event trigger data type handled by this
Event Processor may include the following types of event trigger
signals:
[0141] EVENT_ACTIVITY_COMPLETE--is a signal that an algorithm
(actor, stage or theater level of abstraction) may assert when it
has completed its execution. Upon receipt of this signal, the Event
Processor may determine which other algorithms are "interested" in
this completion (e.g., which algorithms receive this as an input
trigger, also known as "interested parties"), and may transmit a
signal to those algorithms indicating that the completion has
occurred.
[0142] EVENT_ACTIVITY_WARNING--is a signal that an algorithm may
assert to indicate that it has failed to complete its execution,
but that the error was not a fatal one, and that it largely
completed its execution. When an algorithm asserts this event
trigger signal, it may also transfer a "semaphore" containing
warning data describing its progress and/or the error to the
interested parties.
[0143] EVENT_ACTIVITY_ERROR--is a signal that an algorithm may
assert to indicate that it has failed to finish execution due to a
fatal error. The algorithm may also transmit a semaphore containing
data describing the error to the interested parties.
[0144] EVENT_ENTITY_READY--is a signal that an algorithm may assert
to indicate that the algorithm is ready to receive new or
additional input, such as raw data or a data type.
[0145] EVENT_PROP_ARRIVAL--is a signal to indicate that a completed
prop or data structure has been received by a particular theater or
stage (or a data structure associated with a theater or stage). The
Event Processor may use this signal in determining whether to send
an activation signal to interested parties. The signal may be
generated by a reduced version of the data pipe, referred to as a
prop transporter, which may be a shared memory utility. The reduced
version is possible if the prop is referenced using a relatively
small pointer.
[0146] EVENT_PROP_DISPATCH--is a signal that may be generated when
a prop or data structure is transmitted to a different location,
such as a different stage or theater. The Event Processor may
transmit a signal to interested parties indicating that the prop is
on its way.
[0147] EVENT_PROP_CHANGE--is a signal that may be generated when an
algorithm modifies an existing prop. Upon receipt of this signal,
the Event Processor may consult a netlist or lookup table to
determine which other algorithms need to be notified of the change
in the prop, and may send such notification to those interested
parties.
[0148] EVENT_PROP_INITIALIZATION--is a signal that may be generated
when an algorithm creates a new prop. Upon receipt of this signal,
the Event Processor may consult a table or listing to determine
which other algorithms need to be notified of the creation of the
prop, and may send such notification to those interested parties.
The initialization of a data structure essentially reserves memory
space in software, and sets the data to a predefined initialization
value. When implemented in hardware, the circuitry for the new data
structure may have been previously allocated to the prop, and
initialization may simply refer to setting the memory contents to
the predefined initialization value.
[0149] EVENT_PROP_DESTRUCTION--is a signal that may be generated
when an algorithm destroys an existing prop. Upon receipt of this
signal, the Event Processor may consult a table or listing to
determine which other algorithms need to be notified of the
destruction of the prop, and may send such notification to those
interested parties. The concept of "destroying" a data structure
essentially clears memory in software, but when the program is
implemented in hardware, the circuitry previously used to store the
data structure need not physically be destroyed. Instead, that
circuitry might simply be cleared to a predefined neutral value
(which may or may not be its initialization value).
[0150] USER_DEFINED_X--are event trigger signals that the user may
define. These user-defined events may be transmitted using an 8-bit
dedicated port used by each Event Processor. In some embodiments,
the most significant bit may define the direction of the signal,
and the remaining seven bits may simply be used to identify the
user-defined event trigger signal being sent.
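The lookup-and-notify behavior of the Event Processor, together with the 8-bit USER_DEFINED_X packing just described, may be sketched as follows in C++. The disclosure leaves the table format open (a netlist or a simple lookup table), so a map from event name to "interested parties" is assumed here; all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch of the Event Processor's lookup-and-notify behavior.
struct EventProcessor {
    // event name -> list of interested parties (from the netlist/table)
    std::map<std::string, std::vector<std::string>> interested;
    // record of (party, event) notifications that were delivered
    std::vector<std::pair<std::string, std::string>> delivered;

    void raise(const std::string& event) {
        auto it = interested.find(event);
        if (it == interested.end()) return;  // no interested parties
        for (const auto& party : it->second)
            delivered.emplace_back(party, event);
    }
};

// USER_DEFINED_X packing on the dedicated 8-bit port: the most significant
// bit gives the direction of the signal, and the remaining seven bits
// identify the user-defined event trigger being sent.
uint8_t pack_user_event(bool outbound, uint8_t id7) {
    return static_cast<uint8_t>((outbound ? 0x80 : 0x00) | (id7 & 0x7F));
}
```

For example, raising EVENT_ACTIVITY_COMPLETE would cause the processor to consult its table and notify each algorithm that receives that completion as an input trigger.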
[0151] The common circuitry within each stage or theater may also
include a Semaphore Processor, which may be circuitry used to
handle the transportation of the various semaphore control data
described above. Like the Event Processor, the Semaphore Processor
receives the various event semaphore data sent above, consults a
lookup table (or netlist) to identify the recipient algorithm, and
forwards the semaphore data to the recipient. The Semaphore
Processor may handle event transfers, but may also transfer other
types of data, and may be user-definable. To support this
transmission, each stage or theater may instantiate a separate
communication port (or circuitry) for the various other Semaphore
Processors with which it will communicate. At their heart,
semaphores may be viewed as data structures that may contain any
reasonable data type consistent with the physical method of
transport within the target hardware. They may be similar to props,
although instead of carrying data to be manipulated, they carry
control data. The transport mechanism for semaphores may simply be
wires interconnecting the input/output registers of the Semaphore
Processors of the various stages and/or theaters.
[0152] To support the transfer of event trigger signals, the system
may instantiate a separate port, also referred to as an Event Pipe,
for each event trigger connection that an algorithm has. The Event
Pipe circuitry may facilitate the transfer and buffering of event
trigger signal data. In some embodiments, the Event Pipe is
instantiated to carry out one-way communication, and might not be
as simple as a wire bus. Using such unidirectional communication
circuits helps minimize the risk of erroneous event trigger signal
transfer. However, it is also possible to instantiate an Event Pipe
as a bi-directional circuit, which may be helpful in situations
where two algorithms each send event trigger signals to each
other.
[0153] Since various embodiments of the present invention may be
used for mathematical algorithms, some embodiments offer native
support for one of the more troublesome aspects of computer
math--floating point calculations. In existing computing systems, a
processor's arithmetic logic unit typically includes a predefined
data structure for handling floating point values (if they are
handled at all). This predefined data structure may allow a certain
number of bits for the exponent and mantissa. The predefined size
requires that floating point calculations first conform the data
values to the predefined size, which may require execution time to
do. Additionally, the conversion to the predefined size may even be
irrelevant to the particular calculation in question. For example,
if the processor requires a 13-bit exponent, but the particular
calculation in question will never need more than 4 bits for the
exponent, the time spent to conform the data value to the
processor's requirement will be wasted time.
[0154] Some embodiments of the present invention overcome this
deficiency by providing support for arbitrary floating point
values. In such embodiments, the system may define a separate
hardware circuit for each algorithm that needs one, and may define
a custom-sized floating point data architecture for use in the
calculation.
[0155] Thus, for example, embodiments may support 32- and/or 64-bit
floating point data architectures. Under a 37-bit data pipe
architecture, a floating point value may be represented using a
1-bit sign, 8-bit exponent (bias of decimal 127), and 23-bit
mantissa/significand. The remaining bits may be a 1-bit data clock
port, a 1-bit Ready for Input flag, a 1-bit Output Available port,
and a 2-bit mode select port (to allow
input/output/bi-directional). Using a 69-bit data pipe, the same
Data Clock port, Ready for Input port, Output Available port and
Mode Select ports may be used, and the sign bit may again be a
single bit, but the exponent may be expanded to 11-bits (a bias of
decimal 1023), and mantissa/significand may be 52-bits.
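The 37-bit data pipe word just described (1 data-clock bit, 1 Ready-for-Input bit, 1 Output-Available bit, a 2-bit mode select, then a 1-bit sign, 8-bit exponent with bias 127, and 23-bit mantissa) may be sketched as a bit-packing routine. The text does not specify the bit ordering, so the layout below, with the control bits in the most significant positions, is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Packs one 37-bit data-pipe word into the low 37 bits of a uint64_t.
// Bit positions (an assumed ordering, not specified in the text):
//   [36] data clock, [35] Ready for Input, [34] Output Available,
//   [33:32] mode select, [31] sign, [30:23] exponent, [22:0] mantissa.
uint64_t pack37(bool clk, bool rdy, bool oav, uint8_t mode2,
                bool sign, uint8_t exp8, uint32_t man23) {
    uint64_t w = 0;
    w |= static_cast<uint64_t>(clk)         << 36;
    w |= static_cast<uint64_t>(rdy)         << 35;
    w |= static_cast<uint64_t>(oav)         << 34;
    w |= static_cast<uint64_t>(mode2 & 0x3) << 32;  // 2-bit mode select
    w |= static_cast<uint64_t>(sign)        << 31;
    w |= static_cast<uint64_t>(exp8)        << 23;  // 8-bit biased exponent
    w |= static_cast<uint64_t>(man23 & 0x7FFFFF);   // 23-bit mantissa
    return w;
}
```

The 69-bit variant would follow the same pattern with an 11-bit exponent and 52-bit mantissa; the field widths sum to exactly 37 (or 69) bits, matching the pipe width.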
[0156] Each instantiated element or circuit, whether at the actor,
stage, or theater level of abstraction, may be instantiated with a
circuit that uses the 37- or 69-bit data pipe. An example argument
may be as follows (in the C++ language):
pmc_io37[input, output]   or   pmc_io69[input, output]
[0157] The single bit vector (or data pipe) may then be overloaded
with smaller individual registers such that individual components
(e.g., sign, exponent and mantissa) of the bit vector may be
immediately transacted into target registers. The process may then
declare the target registers "on top" of the input/output data
pipe, and may have the following arguments to define where, in the
data pipe, the various floating point values begin, as well as
other data that may be needed, such as a clock and ready
signal:
pmc_bit  clk  = *(pmc_bitPointer*)  input[msb];
pmc_bit  rdy  = *(pmc_bitPointer*)  input[msb-1];
pmc_byte exp  = *(pmc_bytePointer*) input[msb-6, msb-13];
pmc_bit  sign = *(pmc_bitPointer*)  input[msb-14];
pmc_fMan man  = *(pmc_bytePointer*) input[msb-16, msb-24];
[0158] The following data structure may then be defined and used to
accurately reflect a floating point value:
typedef struct pmc_fStruct {
    pmc_bit  sign;       // defines a sign bit
    pmc_byte exp;        // defines an exponent byte
    pmc_fMan mantissa;   // defines the mantissa as type fMan
};

static const pmc_fMan fpDivisor = 0x800000;   // defines a static variable
                                              // used to convert binary
                                              // to decimal
static pmc_fStruct workData;                  // instantiates an example
                                              // variable workData of
                                              // type pmc_fStruct

workData.sign = inputSign;
workData.exp  = inputExp - 0x7F;       // remove the bias
workData.man  = inputMan | 0x800000;   // the value is OR-ed to restore
                                       // the hidden mantissa bit
[0159] Using this data structure (or one like it), any value may be
represented as a fraction consisting of an integer dividend and an
integer divisor. The fraction is then multiplied by the constant 2
raised to the exponent power:
Value = workData.sign * (workData.man / fpDivisor) * 2^(workData.exp)
[0160] Consequently, the original value becomes a fixed-point
number (fp) greater than or equal to zero, but less than 2. Such an
approach is readily accomplished in hardware, since the fixed-point
number is efficiently manipulated, and addition/subtraction/shifting
of exponents efficiently determines the radix point for computational
results.
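The reconstruction above may be checked numerically against a standard 32-bit float bit pattern. The function below mirrors the workData recipe (remove the 0x7F bias, OR in the hidden bit, divide by fpDivisor = 0x800000, scale by 2^exp) using plain C++ types rather than the disclosed pmc_* hardware types, and is valid only for normalized values, consistent with the text.

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Worked example of the fractional reconstruction:
// Value = sign * (man / fpDivisor) * 2^exp, with fpDivisor = 0x800000 (2^23).
// Illustration only: uses standard C++ types, not the pmc_* types, and
// assumes a normalized input (exponent field neither 0 nor 255).
double decode_float32(uint32_t bits) {
    int sign = (bits >> 31) ? -1 : 1;
    int exp  = static_cast<int>((bits >> 23) & 0xFF) - 0x7F;  // remove bias
    uint32_t man = (bits & 0x7FFFFF) | 0x800000;  // restore hidden bit
    const double fpDivisor = 0x800000;            // 2^23
    return sign * (man / fpDivisor) * std::ldexp(1.0, exp);
}
```

For instance, the bit pattern 0x40200000 (the float 2.5) yields man/fpDivisor = 1.25 and exp = 1, so Value = 1.25 * 2 = 2.5, confirming that the fraction always lands in the stated range before the exponent scaling is applied.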
[0161] Fractional-format notation can readily represent this
floating point value, eliminating the need for a fixed-point
divider circuit. Thus, using a 32-bit float as an example, the value
may be
static const pmc_fixed<26, 4> RECFPDIV = 2^(-23);
Value = workData.sign * workData.man * RECFPDIV * 2^(workData.exp)
[0162] This definition may be instantiated in hardware using a
pipelined pair of shifters, since both RECFPDIV and the value
2^(workData.exp) are powers of two.
[0163] The above example assumes that the system is using the ANSI
754 float type. This fractional-format notation works equally well for
ANSI 754 doubles and for the non-standard extended (80-bit)
double.
[0164] Arbitrary range and precision floating point storage that do
not use the ANSI-754 standard may use the following specialized
types:
typedef pmc_arb_float<WL, EXP><name>
[0165] Here, WL represents the total word length including the sign
bit, and EXP represents the exponent width (which must be an even
number); the bias for the exponent will always be one half of the
maximum exponent. The mantissa (or significand) width will simply be
(WL-EXP-1), and the ANSI 754 method of using an "implied" or "hidden"
initial bit in the mantissa (for normalized numbers, per the
standard) may be used as well. As the circuits are
instantiated, computer code referencing this newly defined data
type will result in circuitry that has been modified to handle the
architecture described above. In this manner, floating point values
may efficiently be handled.
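The parameter arithmetic of the arbitrary-format layout may be sketched at compile time. The template below echoes the pmc_arb_float<WL, EXP> declaration but is a standalone illustration, not the disclosed type: it derives the mantissa width (WL-EXP-1) and the bias (one half of the maximum exponent) from the two template parameters.

```cpp
#include <cassert>

// Compile-time sketch of the arbitrary-precision layout: for word length
// WL and exponent width EXP (which must be even, per the text), the
// mantissa width is WL-EXP-1 (the sign takes one bit) and the bias is
// one half of the maximum exponent. Illustrative, not the disclosed type.
template <int WL, int EXP>
struct arb_float_layout {
    static_assert(EXP % 2 == 0, "exponent width must be an even number");
    static constexpr int mantissa_bits = WL - EXP - 1;
    static constexpr int max_exponent  = (1 << EXP) - 1;
    static constexpr int bias          = max_exponent / 2;  // half of max
};
```

Note that for WL = 32 and EXP = 8 this reproduces the familiar single-precision parameters (23 mantissa bits, bias 127), so the ANSI 754 layout falls out as a special case of the arbitrary scheme.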
[0166] By permitting such arbitrary width of the floating point
data value, some embodiments of the present system provide a more
efficient way to handle floating point calculations. Defining the
data structures in this way may also automatically modify the
algorithm for implementation, as the system (e.g., the DBG or
Spatial Architect) may discern the size of the value directly from
the data structure, and may automatically modify the algorithm to,
for example, include a predetermined amount of shifting operations
to match the data sizes of two floating point values that are being
summed.
EXAMPLE APPLICATIONS
[0167] Embodiments of the present invention may be used in any
field where a user may wish to have a hardware implementation of a
software process. Given the inherent speed advantages of running
software using dedicated hardware, the applications to which the
present invention may be put are nearly limitless. The following
discussion addresses example fields
where one or more embodiments of the present invention may be
advantageously used.
EXAMPLE: MODELING OF PHYSICAL PHENOMENA
[0168] The first field deals with the use of a reconfigurable
platform to model physical phenomena. Research in the
areas of physical phenomena (e.g. Chemistry, Physics, Cosmology,
Meteorology, Geology, etc.) is largely dependent upon and
frequently restricted by the availability of sufficiently powerful
computational platforms. This difficulty is compounded by the
inappropriateness of generally available computer programming
languages (e.g. C, C++, Fortran, ADA, Basic, etc.) when applied to
the solution of parallel dependency problems. Research efforts
would be significantly expedited and their accuracy improved if the
researcher had a computational engine that was specifically
designed to solve the specific issue facing the researcher and an
applications development environment that makes the reconfigurable
platform easy to use.
[0169] An additional problem is that conventional languages are
generally procedural in nature and designed for use by computer
programming experts. The majority of physical sciences researchers
view the computer as a "necessary evil," a cumbersome tool that
conforms neither to the thought process of scientific study nor to
the actual real-world behavior of the physical phenomena to be
studied. The vast majority of physical phenomena manifest themselves
not as step-by-step changes, but rather as complex interactions with
many simultaneous (parallel) events. This complex real-world scenario
is not always effectively modeled using conventional practices.
Because of these problems, the resultant programs and their
performance frequently prove slow, unreliable and nondeterministic.
[0170] Embodiments of the present invention may include a
structured methodology and a rules-based applications development
environment (as discussed above) that addresses and can be used to
solve the problems defined in the above paragraphs. FIG. 10 shows a
block diagram process flow used in some embodiments of the present
invention, and represents a process that is similar to that shown
in FIG. 6 above. Aspects of the invention represent a unique
application of commercially available reconfigurable platforms such
as Mentor Graphics Corporation's V-Station family of emulation
systems and existing reconfigurable logic systems technology, such
as described in U.S. Pat. Nos. 5,596,742; 5,854,752; 6,009,531;
6,061,511; and 6,223,148, the disclosures of which are incorporated
herein by reference. U.S. Pat. Nos. 5,036,473 and 5,109,353 also
describe technology to which aspects of the present invention may
be applied, and are also incorporated by reference. Embodiments of
the present invention may also be adapted for use with other logic
emulation systems such as those manufactured by AXIS Systems, Inc.,
and Cadence Design Systems, Inc. as well.
[0171] By using a commercially available, very large scale,
reconfigurable computational platform, combined with aspects of the
present invention, the researcher does not need to actually design
and build an application specific compute engine. Additionally, the
researcher does not have to attempt to adapt a sequentially
threaded, procedurally based programming language for use in
solving event triggered, behaviorally-organized phenomena.
[0172] The massively parallel nature of the reconfigurable platform
allows the problem to be partitioned into manageable elements with
fast and reliable communications pathways allowing them to be
solved by the hardware. Since the hardware (target platform) is
actually configured to solve the specific problem and operates in a
truly parallel manner, the time to calculate the solution is
dramatically accelerated: depending upon the level of interactivity
between elements, by as much as 1000 times over the same
calculations performed on a conventional computational
platform.
[0173] As shown in FIG. 10, some embodiments of the present
invention contain four key components. First, there may be a
Physical Phenomenon Modeling Language (PPML) 1001. The PPML may be
a loosely structured application development language specifically
engineered for the modeling of physical phenomena. PPML is unique
in that it need not be a procedurally organized language, but rather
may be structured behaviorally, allowing the creation of both
independent and interactive "actors" that respond to event triggers,
thereby emulating the real-world behavior of the phenomenon being
studied. The PPML 1001 may take the form of the
various code segments stored in Code Database 109 and their
associated icons.
[0174] Second, there may be a PPML to HDL Distiller 1002. The
Distiller 1002 may accept the PPML definitions of the individual
"actors," "stages," and "theaters," and may distill them into HDL
descriptions for carrying out a user's defined process. The
"distiller" may be configured to support whatever HDL is used by
the target emulation platform, e.g. RTL, VHDL or Verilog. These
PPML definitions may be a netlist generated in step 602 above, and
may perform the DBG step 603 described above.
[0175] Third, there may be a Director Utility 1003. The Director
Utility is a tool that may accept the PPML constructs for "props"
and "cues," and synthesize them into HDL statements that form the
data pathways and event triggers that interconnect the "actors" and
"stages" into a cohesive "theater" in which the phenomenon is
studied. The director's output may be piped into the distiller
utility for incorporation with the other theater elements. The
Director Utility may perform tasks as discussed above with respect
to the spatial architect, and may be a process running in the
background while the user creates the graphical representation of
the process. As the user connects the various actors and props
graphically the director utility (running in the background)
generates the netlist commands that define the control architecture
of the theater.
[0176] Fourth, there may be an Authoring Utility 1004. The
"authoring utility" may be a graphical user interface to the PPML,
Distiller and Director. It allows the model's author to construct
actors, props, stages, scripts and directions at any reasonable
level of abstraction by defining fundamental behaviors for each of
these elements. Once defined, the elements (actors, props, stages,
etc.) may be collected into libraries and/or logically
interconnected into the final theater form. Operating at its
highest levels of abstraction, the authoring utility allows
drag-and-drop authoring of even extremely complex phenomena. The
authoring tool also provides a mechanism for creation of stimulus
events to be acted upon by the final theater and an event capture
utility for recording and analyzing the results of the phenomenon's
study. The Authoring Utility 1004 may use the Abstraction Window
305 and icons described above to generate the graphic
representation of the user's desired process.
[0177] Some aspects of the present invention provide a "front-end"
to any number of commercially-available reconfigurable platforms.
These platforms have been brought to the marketplace for use as
logic emulation systems. Their single largest application is in the
verification of the integrity of the design of integrated circuits.
These systems are available from several vendors serving the EDA
(Electronic Design Automation) industry. One or more of these
systems serves as a target platform for embodiments of the
invention. A computer workstation (such as workstation 104)
suitable for use with the target platform is also to be
provided.
[0178] Since the output of the distiller and director utilities may
be machine-independent text files, aspects of the invention may be
operated on any suitable computer and use nearly any computer
operating system. The output of the distiller may be, in some
embodiments, the DBG output from step 603, and may be a
hardware-level description of a configuration that may carry out
the user's desired process.
[0179] A method of communications between the target platform's
workstation and the computer hosting aspects of the present
invention may need to be provided, unless the target platform's
workstation is also hosting these aspects. For example, and as
discussed above, several theaters may be implemented on different
pieces of reconfigurable hardware, with communications between the
two reconfigurable hardware platforms.
[0180] It is first important to understand that the invention may
be more than simply a new "programming language." Embodiments of
the invention may provide a fundamentally new and unique
methodology for researching physical phenomena that dismantles the
differentiation between the "theorist" and the
"experimentalist."
[0181] Traditional scientific method relies upon the theorist to
create highly simplified models of an expected behavior that
largely are analyzed outside the real-world domain (and its
inherent complexity) in which the subject of the study would
normally exist. Once the theorist determines the mathematical model
of the expected behavior, the experimentalist contrives some suite
of controlled environment, conditions and instruments to prove or
disclaim the accuracy of the theoretical model. This process is
repeated, continually adding complexity to the model until it is
believed to match the real-world behavior of the phenomenon under
study.
[0182] A simulation of the theory using conventional computational
techniques may be performed prior to the experimental activities to
reduce the cost of research by limiting how many times the
experiments must be run. These traditional methods are best
described as event-driven cycle simulators. While their results are
often quite accurate, the actual computational process is very
slow. Embodiments of the present invention allow the distillation
of complex, but well understood, phenomena into behavioral models.
The behavioral models, very highly abstracted entities, are then
combined with the new model under investigation, to allow highly
deterministic and non-granular analysis of the entire phenomenon
under study.
[0183] Using some aspects of the present invention, the theorist is
given a suite of tools that allows rapid and accurate replication
of the actual experimental environment (as known behaviors) and may
then trigger and observe the phenomenon to be studied as it
performs in this virtual environment. FIGS. 11a and 11b illustrate
block diagrams showing how various stages may communicate with one
another within a theater, and how props, actors and directions may
interact on a given stage. Thus, the effects of the environment on
the subject are readily observed and may be quickly analyzed,
thereby allowing fast changes to the subject model so that the
event may be quickly studied again.
[0184] In some aspects, the invention may operate in a mode
analogous to a theater. Within this theater are collected a number
of "stages." The stage is representative of a collection (suite) of
both known and unknown actors who perform the behaviors to be
studied. The "unknown" actor may be a special construct that
performs a place-keeping role, and may have its own graphical icon
as discussed above. It may be embedded within a stage, and may have
a data collection pipe to other algorithms and/or processes. It can
be used to represent a phenomenon that is not well understood, and
provides a place where neighboring, understood phenomena direct
their outputs, giving the researcher a method of collecting
stimulus information that may be later used to "flesh out" the
incompletely understood phenomenon. The Code Database 109 may also
store code segments defining the manner in which the unknown actor
may react to this data (e.g., defining the frequency of data
sampling, providing an output and/or event trigger signal upon
receiving a certain data value, etc.), and these code segments may
be used to instantiate the appropriate circuitry for reacting to
the data provided by the rest of the stage.
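The place-keeping role of the "unknown actor" can be sketched in ordinary software terms. The following Python analogue is illustrative only; the class name, sampling policy, and trigger mechanism are assumptions standing in for the circuitry that the Code Database 109 would instantiate:

```python
class UnknownActor:
    """Place-keeping construct that records stimuli from neighboring,
    well-understood stages so the unknown phenomenon can later be
    'fleshed out' from the collected data."""

    def __init__(self, sample_every=1, trigger_value=None, on_trigger=None):
        self.sample_every = sample_every    # frequency of data sampling
        self.trigger_value = trigger_value  # value that fires an event trigger
        self.on_trigger = on_trigger        # callback analogous to a cue output
        self.samples = []
        self._count = 0

    def receive(self, value):
        """Accept one datum arriving on the stage's data-collection pipe."""
        self._count += 1
        if self._count % self.sample_every == 0:
            self.samples.append(value)
        if self.trigger_value is not None and value == self.trigger_value:
            if self.on_trigger:
                self.on_trigger(value)

fired = []
actor = UnknownActor(sample_every=2, trigger_value=7, on_trigger=fired.append)
for v in [1, 7, 3, 4]:
    actor.receive(v)
# every 2nd input is sampled, and the trigger fires once when 7 arrives
```

In hardware, the sampling counter and comparator shown here would correspond to the instantiated circuitry reacting to data from the rest of the stage.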
[0185] The actors' behaviors are controlled by "directions"
provided by the author via the director, and as represented in the
netlist. These directions control the interactions between the
actors and the time necessary for each actor to respond to the
events or "cues" that trigger their individual behaviors.
[0186] Associated with the stages and actors may be resources
provided in the form of "props." The props are analogues of
real-world quanta, be they energy, matter, or vector or scalar
properties. Actors manipulate the props upon the stage in which
they are set. Props may be of any reasonable level of abstraction,
from simple, single data types to highly complex structures or
collections of data.
[0187] Cues are the triggers that start the performance on any
particular stage. Cues may be data events or may be the
introduction of a prop onto a stage. Cues may be supplied by
outside stimulus or may be generated by the performance on another
stage. Cues may also interrupt or modify the behavior acted out on
any stage. In some embodiments, these cues take the form of the
various event trigger signals described above.
[0188] The stage is the variable level of abstraction. Upon the
stage the actors, props and cues perform any given behavior. The
stage may be organized as highly specialized or simplistic,
performing a single behavior by a single troupe of actors. Or, the
stage may be generalized, sweeping several smaller stages into a
single macro-behavior.
[0189] The author (researcher) may collect and/or create known
stages (behaviors) and use them to assemble a test library. Most of
these would be previously proven valid stages. They may be left
intact where all internal interactions are executed or may be
graduated to higher levels of abstraction where they are dealt with
only as high-level behaviors, thereby causing them to use fewer
resources and shortening the execution time. By surrounding an
"unknown actor" with well-understood stages, the researcher may
provide sufficient data during hardware-accelerated simulation to
create an effective behavioral model of the unknown phenomenon.
This behavioral model may then be used, later, to derive the
algorithmic behavior of the phenomenon under study.
[0190] The author may define a new stage for the phenomenon to be
studied by collecting actors and props onto the stage using PPML.
The author may then define the timing and behavior of the stages'
contents through the use of cues and directions. Once all the stages
are created or collected, the author gathers them into a theater
and forms their interrelations using cues.
[0191] The theater may be passed to the distiller where the PPML is
redefined as HDL constructs suitable for the target platform. The
HDL may then be transferred to the target platform's host computer
for synthesis into target primitives and execution.
[0192] It may be helpful to address how this embodiment interacts
with a target platform. The target platform, regardless of its
manufacturer, may essentially be viewed as a collection (albeit a
very large collection) of individually reconfigurable electronic
devices, such as field-programmable gate arrays (FPGAs) that are
preconfigured into an array or "fabric." Some switching and/or
multiplexing of the I/Os of these devices allow for the dynamic
reconfiguration that makes some aspects of the invention possible
and attractive. The mechanism for switching and/or multiplexing is
generally proprietary to the individual manufacturer and is,
essentially, irrelevant to the performance of many aspects of the
invention. FIGS. 12a and 12b show block diagram examples of how
some embodiments of the present invention may interface with target
hardware.
[0193] The individual stages (behaviors) composed by the author
using PPML may be distilled into HDL and then stored for later
injection into a theater. Since all the PPML constructs distill
into HDL, regardless of their mathematical complexity, they will
ultimately synthesize into gates or target primitives. Some
commercial logic emulation systems do not provide traditional
"gates." Instead, their designs implement a number of standard
"primitives" that have predefined structure and work from a
parameter list. The Mentor Graphics VStation emulator is an example
of this method. At high levels of abstraction the resultant use of
target primitives is minimized because behaviors need not be
calculated in execution, they may simply be triggered outputs of
tables. As the abstraction of the problem drops, additional target
resources may be required to support processing with combinational
logic or iteration rather than table lookup.
[0194] The dynamic interconnections, or cues, may then be
synthesized and the result is a theater, or monolithic block of HDL
that may be passed to the target platform for final compilation and
ultimate execution in hardware. The cues may be dynamic in that
they contain an op-code (operation defining code) that controls the
behavior of the event processor on the target stage. Thus, the
results of a computation may alter the behavior of another stage by
providing flexible cues to downstream stages.
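A software analogue of such a dynamic cue might look like the following sketch, where the op-code names and the state fields are invented for illustration and do not come from the disclosure:

```python
# Hypothetical sketch of a "dynamic cue": the cue carries an op-code that
# the receiving stage's event processor dispatches on, so the results of
# one stage's computation can alter the behavior of a downstream stage.

def event_processor(cue, state):
    """Dispatch on the cue's op-code and update the stage's state."""
    op, payload = cue
    if op == "SET_PARAM":   # an upstream result re-parameterizes this stage
        state["param"] = payload
    elif op == "TRIGGER":   # start this stage's performance
        state["running"] = True
    elif op == "HALT":      # interrupt the behavior acted out on this stage
        state["running"] = False
    return state

stage = {"param": 0, "running": False}
for cue in [("SET_PARAM", 42), ("TRIGGER", None)]:
    stage = event_processor(cue, stage)
# the stage is now re-parameterized and running
```

In the synthesized theater, this dispatch would be realized as logic on the target stage rather than as a software loop.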
[0195] Since all the stages may remain independent, though
communicative, elements, execution of parallel performances within
the theater may actually be synthesized as parallel blocks of logic
and therefore execute very quickly.
[0196] Further enhancing performance, since the vast majority of
stages in any theater will be previously proven behaviors, they may
be precompiled and stored. As changes are made to the behavior
(phenomenon) under study only those things that change need be
distilled again. This dramatically reduces the time necessary to
incorporate change, making it predominantly dependent upon the
target platform's recompilation time.
[0197] Since the individual stages and theaters may be asynchronous
behaviors that interact only upon demand, it is possible, indeed
practical, to construct extremely large behavioral models of
physical phenomena that exceed the capacity of a single target by
simply using multiple targets (theaters). Since the individual
theaters need not be synchronized by anything other than
transmitted cues or props, the difficulties normally associated
with "multi-box" solutions are eliminated. Since props and cues are
comparatively small data elements, they may be quickly and easily
transmitted between theaters either by direct connection of the
target hardware's I/O facilities or over a communication network,
such as a Local Area Network (LAN) or Wide Area Network (WAN).
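Because props and cues are comparatively small, a compact, self-describing encoding suffices for trans-theater transmission. The disclosure does not specify a wire format; the JSON encoding below is purely an illustrative assumption:

```python
import json

# Illustrative sketch of serializing a cue for transmission between
# theaters over a LAN/WAN. The message shape ("op"/"payload") is invented.
def encode_cue(op, payload):
    return json.dumps({"op": op, "payload": payload}).encode("utf-8")

def decode_cue(raw):
    msg = json.loads(raw.decode("utf-8"))
    return msg["op"], msg["payload"]

raw = encode_cue("TRIGGER", 3.14)  # a few dozen bytes, cheap to transmit
op, payload = decode_cue(raw)
```

The small message size is the point: synchronizing theaters only through such cues and props is what avoids the difficulties of conventional "multi-box" solutions.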
[0198] FIG. 13 illustrates an example model for the distribution of
a theater. First, a generic theater may be developed. Then the
generic theater is debugged and, optionally, one or more target
specific theaters may be generated. The theater(s) may be
distributed to one or more remote targets via a network, such as
the Internet or RF transmission networks. By making the theaters
"target specific," any theater intercepted during transmission
cannot be reverse-engineered or executed without the target
platform; thus, a high level of security can be provided when
desired. Plural generic (or "target specific") theaters may be
selectively distributed to remote target platforms for operating in
conjunction with, or independently of, the other distributed
theaters. The distributed theaters may be identical to, or
different from, each other, depending upon the distributed modeling
requirements. Each of the remote targets may include replicate
hardware as is commercially available, such as those from Mentor
Graphics Corporation. The replicate hardware is less costly than a
primary theater development system (development platform) and is
more secure because replicates do not require debugging
facilities.
[0199] Physical phenomena modeling may involve research such as the
following hypothetical example. Assume a
theoretical researcher at "National Laboratory A" has developed
equations which promise a mechanism for reducing decay rates in
doping materials used for semiconductor fabrication. The
implications if the theory can be proved correct would be that new
devices could be fabricated which require substantially lower
activation energies and therefore lower power consumption. However,
the laboratory has no facilities for experimentally proving or
disproving the simplified theory and certainly no resources for
demonstrating it in far more complex environments.
[0200] Using an embodiment of the present invention, integrated
with a Mentor Graphics Corporation V-Station/30M logic emulation
system, the researcher constructs a theater where one of its
internal stages is the new decay model he has devised. Since the
stage emulates the behavior of the new phenomenon in massively
parallel hardware, the researcher is able to use machine generated
test vectors to test the theory with several million vectors which
represent the probable range of external stimulus that the theory
would be experiencing in a real-world application. The time
necessary for these millions of vectors is only a few minutes. As
unexpected perturbations appear in the theater's results, the
researcher is able to quickly modify the model until flaws in the
theory are corrected and the model appears consistent and
accurate.
[0201] Now the researcher modifies the theater to include a number
of additional stages having well known and proven behaviors that
must be able to properly interact with the new decay theory if it
is to have any commercial value. Again, the speed of the overall
theater allows many millions of test cycles in a very short period
of time (several hours). Again, unexpected variations in the
results indicate that some environmental issues may be injecting
unacceptable levels of chaos into the model. Unfortunately the
laboratory does not have sufficient numbers of the V-Station target
hardware to adequately test the theory against stages representing
all interactions that may be causing the problems.
[0202] However, the researcher has collaborators at National
Laboratories B and C with similar V-Station equipment. A new
composite theater may be created that purposely exceeds the
capacity of any one target hardware system but partitions the
theater across three remote machines. FIG. 14 illustrates a block
diagram example of such a collaborative distribution of theaters.
The researcher's collaborators are each provided with a fractional
theater where trans-theater pathways and triggers are transmitted
via each target's host workstation. These three host workstations,
separated, e.g., by hundreds of miles, interact via high-speed
internet connections allowing the three dispersed systems to
intimately collaborate and complete several million test cycles in
just one day. The common circuitry 917 of each theater may also
include circuitry to allow the various theaters to communicate with
one another and share information. This circuitry may be as simple
as Internet communication hardware, telephone line modem hardware,
etc., and may allow multiple researchers to jointly execute
experimental software algorithms.
[0203] Assume that the theater emulated test results support the
validity of the new theory. Armed with verification of the
integrity of the new theory, the researcher secures funding for an
experimental production batch of integrated circuits which, upon
physical fabrication and testing, provide final validation of the
theory. The several flaws in the initial theory that were
eliminated through machine accelerated testing would have required
several attempts at the experimental device fabrication process
before finally yielding the desired results. Not only would an
iterative physical fabrication process have been very time
consuming, the cost would have been significant. By using aspects
of the invention, coupled with distributed machine collaboration,
all involved laboratories are able to constrain costs and provide
tangible value for the research investment in a dramatically
shorter period of time.
[0204] Aspects of the present invention may also simplify the task
of the theorist when a new process is needed. Once the initial
algorithms have been created, and their icons are available, the
theorist may easily modify the overall process by rearranging
and/or modifying the existing algorithms. The user may open the
process in Abstraction Window 305, and may insert/delete/rearrange
the icons to modify the process, and may then simply request that
the system recompile the process to provide a new downloadable file
for the target hardware. If minimal data dependencies are present,
the Spatial Architect may instantiate the circuitry in the target
hardware as a massively parallel circuit to provide the fastest
operation possible.
[0205] Accordingly, in using aspects of the present invention to
model the behavior of physical phenomena, the following example
aspects become apparent. First, reconfigurable platforms have been
traditionally marketed and supported exclusively as EDA tools,
specifically tools for the verification of custom integrated
circuit designs. This embodiment introduces a novel application for
this technology: physical sciences research.
[0206] Second, the embodiment introduces the concept of a
non-procedural language specifically engineered for the study of
massively-parallel physical phenomena.
[0207] Third, the embodiment introduces the concept of arbitrary
range and precision floating-point data representations in
hardware.
[0208] Fourth, an aspect of this embodiment is that, since the
system may generate code for a variety of platforms, it supports
portability under the OpenMP suite of standards.
[0209] Fifth, the embodiment introduces the concept of distillation
of content across high-level languages, thereby increasing economy
by eliminating the need for mission-specific or platform-specific
compilers. This makes it possible to use embodiments of the
invention on nearly any suitable target platform without any need
to alter the target or its supporting software.
[0210] Sixth, the embodiment breaks down the barrier between the
theorist and the experimentalist by providing a tool that allows
the theorist to prove and adjust theoretical predictions in a
complex environment prior to passing it off to an experimentalist
for testing.
[0211] Seventh, the embodiment is applicable to modeling of any
physical phenomenon. This allows marketing of the target platforms
into applications previously closed to the EDA industry, e.g.
chemical manufacture, aerospace, and geophysical exploration
industries. Utilizing a plurality of distributed (networked)
reconfigurable target platforms, each forming a fractional theater,
a researcher in one location can create a very large composite
modeling theater exceeding the capacity of any one target platform.
Alternatively, centrally-developed theaters, which may be the same
as, or different from, each other, can be distributed to plural
researchers in different locations, for carrying out modeling of
related phenomena, e.g., location specific phenomena such as
weather or geophysical phenomena, or entirely different
phenomena.
EXAMPLE EMBODIMENT--ABSTRACTION AND BEHAVIORAL MODELS
[0212] FIG. 15 illustrates a flow diagram for another example
embodiment and use of the present invention. The power and
ease-of-use offered by various embodiments described above enable
the simulation and modeling of various computational problems. For
example, in step 1501, a user may define a computational model and
its boundaries. Computational models are common throughout the
research community, and are used to define a near-infinite variety
of behaviors such as planetary orbits, gene sequencing, thermal
conductivity, etc. For ease of explanation, the present discussion
will use the following simplified computational model (although it
will be understood that the teachings described herein may be
applied to any computational model): .intg..sub.0.sup..pi. sin(x+41)dx
[0213] The boundaries for a computational model represent the outer
limits for the variables appearing in the computational model. In
the illustrated example, the model is bounded by defining the value
x to vary between 0 and .pi..
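For reference, this model, the integral of sin(x+41) for x between 0 and .pi., can be evaluated with a conventional sequential numerical method of the kind the hardware approach is meant to accelerate. The following Python sketch is illustrative and not part of the disclosure:

```python
import math

# Trapezoidal-rule evaluation of the example model, the integral of
# sin(x+41) over [0, pi]. The closed form is 2*cos(41), which the
# numerical estimate should approach as n grows.
def integrate(f, lo, hi, n=100_000):
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    total += sum(f(lo + i * h) for i in range(1, n))
    return total * h

approx = integrate(lambda x: math.sin(x + 41), 0.0, math.pi)
exact = 2 * math.cos(41)
```

Each of the n function evaluations here is a sequential step; the same sum instantiated as parallel hardware is what the emulation platform provides.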
[0214] Once the computational model has been defined, the process
may move to step 1502, in which the user may create an abstraction
flow for the computational model. The abstraction flow may simply
be a series of icons and interconnections as described above to
represent the computational algorithm. In creating this
abstraction, the user may rely on previously-defined algorithms.
For example, the user may already possess in the library an icon
corresponding to an algorithm for calculating "sin(x+41)," where x
is a single input to the algorithm. The user may then use this
icon, together with an integration symbol, to define an abstraction
of the computational model that will compute the sum defined by the
integral, and may provide as input to the algorithm the various
boundaries of the model. The user may also define additional
circuitry for capturing data samples during execution, and may
define a data structure that will retain the output generated by
each corresponding input.
[0215] In step 1503, the abstracted computational model may then be
converted into code that may be used to configure hardware to
perform the computational model. This conversion may use the
Spatial Architect, architect, and/or Distiller/Behavior Generator
described above.
[0216] Then, in step 1504, the code for performing the
computational model may be used to configure a hardware platform,
and the platform may begin its execution of the computational
model. The calculations may be performed in hardware, and the
circuit may capture the voluminous amount of input/output data
values obtained during the process.
[0217] In step 1505, the output of the hardware's calculations may
be reviewed. The data structure holding the various input and
output combinations may be examined to discern patterns in the
data. For example, the user may identify a step value in the input
values in which an output value's change is insignificant (e.g.,
the outputs for an input of 0.001 and 0.002 are so close that they
can be treated the same). The data structure may also be used to
define a lookup table identifying the corresponding output for each
given input. This lookup table may then serve as a behavioral model
of the computational model, and may produce equivalent results in a
fraction of the time since a look up process can be handled in
hardware much faster than a computational process. The tradeoff, of
course, is that the lookup table may require significantly more
memory/circuit real estate to implement.
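The lookup-table behavioral model of step 1505 can be illustrated in software. The step size (0.001, echoing the example above) and the nearest-sample rounding scheme are illustrative assumptions; in the disclosure the table itself would be instantiated in hardware:

```python
import math

# Sketch of step 1505: captured input/output pairs become a lookup table
# that serves as a behavioral model of the computation sin(x+41) over
# [0, pi]. STEP is the input granularity below which output changes were
# judged insignificant (an assumed value for illustration).
STEP = 0.001
table = {round(i * STEP, 3): math.sin(i * STEP + 41)
         for i in range(int(math.pi / STEP) + 2)}

def behavioral_model(x):
    """Return the tabulated output for the nearest sampled input."""
    key = round(round(x / STEP) * STEP, 3)
    return table[key]

# The table lookup reproduces the computation to within the step tolerance.
err = abs(behavioral_model(1.2345) - math.sin(1.2345 + 41))
```

The memory cost of the table (here a few thousand entries) against the speed of a single lookup is exactly the tradeoff the text describes.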
[0218] In step 1506, the user may once again define an abstraction
flow, although this time for the behavioral model developed in step
1505. The behavioral model abstraction flow may include a variety
of look up operations using data structures, and may include
additional logic to simplify the lookup process.
[0219] In step 1507, the user's abstraction flow may be processed
(again, this may be done by the Spatial Architect, architect and/or
DBG discussed above) as described above to produce computer code
for implementing the new behavioral model. In step 1508, this code
may then be added to the Code Database 109, and a new hardware
primitive may be defined for the behavioral model. The primitive
may include a new icon with handles, such as icon 401.
[0220] In step 1509, the user may determine whether the particular
algorithm that was abstracted may be used in a larger process
occurring at a higher level of abstraction. For example, the
integral function described above may in fact be just a small piece
of a larger process or behavior. If a higher level of abstraction
exists, then the process may move to step 1510, in which the user
may define the computational model for the higher level process or
behavior, as well as the boundaries applicable to that higher
level, and the process may then return to step 1502 to allow the
user to define an abstraction flow for the higher level of
abstraction. In this recursive manner, scientists and researchers
may begin with a lower level, simplified, computational model,
instantiate it in hardware to obtain results for creating a
behavioral model, replace the computational model with the
behavioral model, and repeat this process for a more complicated
(e.g., higher level of abstraction) process. As this process is
repeated, more and more complex computational models may be
replaced by behavioral models that can be instantiated in hardware,
which may execute much faster than the computational models
could.
[0221] The discussion above introduces a number of concepts,
aspects and features that may play a role in various embodiments of
the present invention. FIG. 16 shows a high-level, overall diagram
illustrating how many of these features may fit together in one or
more embodiments. As shown in FIG. 16, the various elements in the
upper portion 1601 may have an interface to the Authoring Utility
or the Solutions Editor, and additionally, the front-end of the
Distiller-Behavior Generator may be coupled to this section. The
lower portion 1602 may be coupled to the back-end of the
Distiller-Behavior Generator and the Spatial Architect.
[0222] The discussion above presents a number of embodiments,
aspects and features that may be used in the present invention.
However, it will be understood that the particular embodiments
disclosed are example embodiments, and that the various features
described herein may readily be interchanged and/or rearranged to
produce combinations and subcombinations, all of which are
encompassed within the scope of the present disclosure. The true
scope of the inventions covered herein should be limited only to
the claims that are made against this disclosure--claims that
include the ones appearing below.
* * * * *