U.S. patent application number 12/373481 was filed with the patent office on 2009-11-26 for solver for hardware based computing.
This patent application is currently assigned to DREXEL UNIVERSITY. Invention is credited to Anthony Deese, Jeremy Johnson, Prawat Nagvajara, Chika Nwankpa, Aaron St. Leger, Petya Vachranukunkiet, Jeffrey Yakaski.
Application Number | 20090292520 12/373481 |
Document ID | / |
Family ID | 38982399 |
Filed Date | 2009-11-26 |
United States Patent
Application |
20090292520 |
Kind Code |
A1 |
Nwankpa; Chika ; et
al. |
November 26, 2009 |
SOLVER FOR HARDWARE BASED COMPUTING
Abstract
Full-AC load flow constitutes a core computation in power system
analysis. The present invention provides a performance gain with a
hardware implementation of a sparse-linear solver using a Field
Programmable Gate Array (FPGA). The invention also relates to the
design, simulation, and hardware verification of a static
transmission line model for analog power flow computation.
Operational transconductance amplifiers are employed in the model
based on a previously proposed DC emulation technique of power flow
computation, and provide reconfigurability of transmission line
parameters via transconductance gain. The invention also uses
Analog Behavioral Models (ABMs) in an efficient strategy for
designing analog emulation engines for large-scale power system
computation. Results of PSpice simulations of these emulation
circuits are compared with industrial grade numerical simulations
for validation. The application is also concerned with the
development of a generator model using analog circuits for load
flow emulation for power system analysis to reduce computation
time. The generator model includes reconfigurable parameters using
operational transconductance amplifiers (OTAs). The circuit module
is used with other reconfigurable circuits, i.e., transmission
lines and loads.
Inventors: |
Nwankpa; Chika;
(Philadelphia, PA) ; Deese; Anthony; (Richmond,
VA) ; St. Leger; Aaron; (Warminster, PA) ;
Yakaski; Jeffrey; (Sharon Hill, PA) ; Johnson;
Jeremy; (Berwyn, PA) ; Nagvajara; Prawat;
(Narberth, PA) ; Vachranukunkiet; Petya;
(Warminster, PA) |
Correspondence
Address: |
KNOBLE, YOSHIDA & DUNLEAVY
EIGHT PENN CENTER, SUITE 1350, 1628 JOHN F KENNEDY BLVD
PHILADELPHIA
PA
19103
US
|
Assignee: |
DREXEL UNIVERSITY
PHILADELPHIA
PA
|
Family ID: |
38982399 |
Appl. No.: |
12/373481 |
Filed: |
July 27, 2007 |
PCT Filed: |
July 27, 2007 |
PCT NO: |
PCT/US07/74630 |
371 Date: |
January 12, 2009 |
Related U.S. Patent Documents
|
|
|
|
|
|
Application
Number |
Filing Date |
Patent Number |
|
|
60820549 |
Jul 27, 2006 |
|
|
|
Current U.S.
Class: |
703/18 ;
708/607 |
Current CPC
Class: |
G06F 17/16 20130101;
G06F 30/331 20200101 |
Class at
Publication: |
703/18 ;
708/607 |
International
Class: |
G06F 9/455 20060101
G06F009/455; G06F 9/00 20060101 G06F009/00; G06F 7/32 20060101
G06F007/32 |
Claims
1. An apparatus for carrying out complex computations which
comprises: a field programmable gate array, said field programmable
gate array programmed for implementation of Gaussian elimination by
decomposition of a matrix into lower and upper triangulation
factors and forward and backward elimination; memory operably
connected to said field programmable gate array, and a device for
maintaining pivoting information operably connected to said field
programmable gate array.
2. Apparatus as claimed in claim 1, wherein said device for
maintaining pivoting information comprises a lookup table and
memory pointers.
3. Apparatus as claimed in claim 2, wherein said field programmable
gate array implements the following Gaussian elimination for each
column in the matrix loop: (a) search the column for a pivot
element; (b) pivot if necessary; (c) normalize a pivot column; (d)
update a remaining portion of the matrix; (e) multiply to form a
sub-matrix update; and (f) update a value or create a fill-in.
4. Apparatus as claimed in claim 3, wherein said apparatus is
configured to simultaneously perform memory reads and memory writes
between said field programmable gate array and said memory.
5. Apparatus as claimed in claim 4, further comprising a random
access memory for storing an array of the indices of non-zero
elements in the matrix, said random access memory being operably
connected to said field programmable gate array.
6. Apparatus as claimed in claim 5, further comprising index
rejection logic to select appropriate indices from said stored
array of indices.
7. Apparatus as claimed in claim 6, wherein said memory comprises a
row cache of memory of sufficient size to store a largest
sub-matrix update for said computation.
8. Apparatus as claimed in claim 7, wherein further comprising a
table to keep track of elimination progress.
9. Apparatus as claimed in claim 1, wherein said complex
computation involves solution of at least one sparse linear
system.
10. Apparatus as claimed in claim 1, configured to emulate power
flow in a power transmission system.
11. A method for carrying out complex computations which comprises
the steps of: (a) implementing Gaussian elimination by
decomposition of a matrix into lower and upper triangulation
factors and forward and backward elimination; (b) storing and
retrieving matrix values in a memory; and (c) maintaining pivoting
information.
12. A method as claimed in claim 11, wherein said step of
implementing Gaussian elimination comprises the steps of: Gaussian
elimination for each column in the matrix loop: (a) searching a
column of the matrix for a pivot element; (b) pivoting, if
necessary; (c) normalizing a pivot column; (d) updating a remaining
portion of the matrix; (e) multiplying to form a sub-matrix update;
and (f) updating a value or create a fill-in.
13. A method as claimed in claim 12, wherein said step of storing
and retrieving matrix values in memory comprises simultaneously
storing and retrieving matrix values.
14. A method as claimed in claim 13, further comprising the step of
tracking elimination progress to track elements that have already
been eliminated.
15. A method as claimed in claim 14, further comprising the step of
selecting indices from said stored array to avoid selection of
matrix elements that have already been eliminated.
16. A system for emulating power flow in a power transmission
system, which comprises: a power supply; a variable resistor
including at least one reconfigurable operational transconductance
amplifier operably connected to said power supply; and a device for
measuring a current or voltage output of said system.
17. A system as claimed in claim 16, comprising at least two
reconfigurable operational transconductance amplifiers for modeling
power transmission in at least two directions.
18. A method for emulating a power transmission system comprising
the steps of: (a) calculating a complex voltage from at least one
generator using a dynamic swing equation; (b) calculating a complex
current flowing in a branch of said power transmission system; and
(c) calculating a real power of said at least one generator; and
(d) updating the dynamic swing equation based on a result of step
(c).
19. A method as claimed in claim 18, further comprising the step
of: (e) solving for a steady-state solution.
20. A method as claimed in claim 19, further comprising the step
of: (f) correcting for offset of said steady-state solution by
calculating an offset solution for zero mechanical input power, and
subtracting the offset solution from said steady-state solution.
Description
BACKGROUND OF THE INVENTION
[0001] 1. Field of the Invention
[0002] The current invention is related to a programmable linear
algebraic solver for large sparse matrices based on a lower-upper
triangular decomposition method to be used in a hardware computing
environment. Furthermore, the present invention also relates to
models for power system components and power systems.
[0003] 2. Brief Description of the Prior Art
[0004] Currently, power flow computation for large power systems is
time intensive. The calculations are non-linear in nature and
lengthy iteration schemes are the currently preferred solution.
This presents a problem as many assumptions and simplifications are
required to solve the equations in a timely manner. In addition,
expansion of the power grid, increasing necessity and complexity of
contingency studies and introduction of economic analysis are
demanding further computational burden. Traditional digital methods
are too slow to solve the aforementioned demands quickly. This
affects the security, reliability and market operation of power
systems. Ideally a real-time computation tool is preferable,
specifically in market activities and operation.
[0005] In power flow computations, sparse linear solvers are used
to calculate the update vector for the Jacobian matrix at each
iterative step during Newton-Raphson iteration for load flow
solution. The bottleneck in Newton power flow iteration is the
solution of a sparse system of linear equations, which is required
for each iteration, and typically consumes 85% of the execution
time in large-scale power systems [1]. In practice, many related
load flow computations are performed as part of contingency
analysis. These multiple load flow computations can easily be
performed in parallel. It remains desirable to speed up the
computation of an individual load flow computation. Typical power
flow computation is performed using general-purpose personal
computing platforms based on processors such as the Intel Pentium
IV.RTM.. High performance uniprocessor sparse linear solvers on
these platforms utilize only roughly 1-4% of the floating-point
throughput. Attempts have also been made to increase performance
through the use of cluster computing [1], however, due to the
irregular data flow and small granularity of the problem these
methods do not scale very well.
[0006] Analog computation provides a viable alternative to
traditional approaches. Among the advantages over traditional
digital methods are physically realizable solutions and faster
computation times. In order to make the analog method a viable tool
in power system analysis, accurate models of power system
components are required. A reconfigurable transmission line model
is considered for a specific analog computation method. Traditional
digital methods are expensive and slow in comparison to analog
computers. The power flow solution is obtained almost
instantaneously regardless of the number of components in the
network with analog circuits. Essentially computation time is
independent of network size. Effectively the solution is obtained
as quickly as the system stabilizes. Experimentation has shown the
ability to calculate solutions even faster than real time. In prior
research, simulation time for a two-machine system was typically
104 times shorter than the real time simulated phenomena [101].
This is achieved by modeling generator dynamics for the purpose of
transient stability evaluation.
[0007] The modeling and design of analog components is one obstacle
that must be overcome. The solutions will only be as accurate as
the models and measurements will allow. Without clearly defined
valid models for power system components, the analog computational
method will never be realized. In addition, these models also must
cater towards computational speed. Specifically, in power system
operation, multiple runs and contingencies are required to be
executed extremely quickly. A priority for these analog models is
to yield valid solutions while allowing fast reconfigurability.
[0008] An efficient strategy for designing analog emulation engines
for large scale power system computation may be provided through
the use of Analog Behavioral Models (ABMs) of PSpice [201].
Emulation can be described as an act of a physical system imitating
a real system. Emulation in this patent application, therefore, is
the representation of physical characteristics of a real life
object (power system) using an electric circuit equivalent. The
representational relationship could be mathematical, scaled, or
both. The circuit equivalent representation has within it the model
of a real system as well as a method of its solution. The speed of
computation is as quick as the response of the circuit itself,
which could be real time, faster or slower than real time depending
on the parameter settings. The solution is continuous in time and
amplitude.
[0009] Simulation on the other hand is an attempt to
predict/replicate aspects of the behavior of a real system by
creating an approximate (mathematical) model of it. Computer
modeling, for example by providing a special-purpose computer
program, may do this. The program is composed of equations that
describe the functional relationships within the real system. When
the program is run, the resulting mathematical dynamics form an
analog of the behavior of the real system with the results
presented in the form of data.
[0010] The need to emulate/simulate large and complex mixed-signal
systems has prompted the development of high-level circuit
representations for analog components; ABMs serve this purpose. The
interior of a behavioral model, however, is different in that it is
implemented in terms of algebraic or differential equations father
than physical analog components. In order words, in a behavioral
model, the focus is on the input/output relationship of the block.
The fundamental advantage of the behavioral modeling technique in
top-down designs is that the simulation can provide fast prediction
of system performance. The approach helps to select proper
architectures for circuit implementation and analyze tradeoffs at
the early design stages. The transistor level simulation (bottom-up
design), comparatively, can be very tedious and cumbersome
especially for mixed-signal chips containing a large number of
analog components. Under such circumstances, behavioral models
enable designers to verify the complex system efficiently and
result in fast system evaluation prior to embarking on full
structural design and implementation.
[0011] Static load flow analysis, which is based on the power flow
equations, is one method currently used by the industry. The
problem is that such analyses solve using a sequential method,
which makes the simulation process very slow in complex networks.
The speed of the digital computer to solve for static load flow is
greatly dependent on the size of the power system network. An
alternative method to solve static load flow is using an analog
computer to emulate the behavior of the power system, instead of
using model equations to simulate the network.
[0012] In the past two decades, silicon-on-insulator (SOI)
complementary metal oxide semiconductor (CMOS) technology has
become a major technology for integrating VLSI systems using a
low-supply voltage [301]. Along with the progress in the CMOS
technology, CMOS devices have been scaled down continuously and the
corresponding power consumption has also been decreased, which have
triggered advances in the circuit design techniques for various
designs and applications. The association between CMOS circuits and
VLSI chips is becoming increasingly easier to implement.
[0013] With these advances, Fried et al. [302], proposed an
approach using an analog VLSI chip for simulation of the behavior
of power systems. Using an analog VLSI chip can be limited if the
fabricated chip is only useful for one power system configuration.
Gu et al. [303], presented a concept to add reconfigurable
parameters using switches and analog voltages to change the system
configuration and parameters.
SUMMARY OF THE INVENTION
[0014] According to one aspect of the current invention, the system
provides a substantially improved computing platform for solving
computation intensive problems. One implementation includes Field
Programmable Gate Arrays (FPGA) for the solution of sparse linear
systems.
[0015] According to a second aspect of the current invention, the
system is implemented by a combination of downloadable core
software and a dedicated hardware such as FPGA. The core software
customizes the application of the FPGA for various computational
analyses.
[0016] According to a third aspect of the current invention, a
static transmission line is modeled for analog power flow
computation. Operational transconductance amplifiers in the model
provide reconfigurability of transmission line parameters via
transconductance gain.
[0017] According to a fourth aspect of the current invention,
Analog Behavioral Models (ABMs) are implemented for use in analog
emulation engines for large-scale power system computation. ABMs
are applied for model verification and validation prior to full
structural design and implementation.
[0018] According to a fifth aspect of the current invention, a
generator model using analog circuits is developed for load flow
emulation for power system analysis to reduce computation time. The
generator model may include reconfigurable parameters using
operational transconductance amplifiers (OTAs). The circuit module
is used with other reconfigurable circuits, i.e., transmission
lines and loads.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] FIG. 1 is a diagram illustrating SYMAMD elimination row
histogram.
[0020] FIG. 2 is a diagram illustrating the architecture of one
preferred embodiment according to the current invention.
[0021] FIG. 3 is a diagram illustrating the pivot search logic
block of one preferred embodiment according to the current
invention.
[0022] FIG. 4 is a diagram illustrating the update pivot column/row
read logic block of one preferred embodiment according to the
current invention.
[0023] FIG. 5 is a diagram illustrating the update unit of one
preferred embodiment according to the current invention.
[0024] FIG. 6 is a graph illustrating the relation between the
increased speed and the number of update units according to the
current invention.
[0025] FIG. 7 is a graph illustrating the relative solve times for
a single update unit according to the current invention.
[0026] FIG. 8 is a graph illustrating the relative solve times for
sixteen update units according to the current invention.
[0027] FIG. 9 is a graph illustrating the LU hardware performance
relative to a Pentium IV.RTM. processor.
[0028] FIG. 10 is a circuit diagram illustrating a DC network
emulation for a three-bus power system.
[0029] FIG. 11 is a diagram illustrating a lossy transmission line
model.
[0030] FIG. 12 is a diagram illustrating a lossless transmission
line model.
[0031] FIG. 13 is a diagram illustrating an ideal operational
transconductance amplifier (OTA).
[0032] FIG. 14 is a diagram illustrating a double ended OTA
transmission line model.
[0033] FIG. 15 is a diagram illustrating a potentiometer
relationship to OTA model.
[0034] FIG. 16 is a diagram illustrating PSpice schematic for bias
current control.
[0035] FIG. 17 is a graph plotting bias current vs. bias
voltage.
[0036] FIG. 18 is a graph plotting theoretical and simulated
resistance vs. bias voltage.
[0037] FIG. 19 is a graph plotting line current vs. line
voltage.
[0038] FIG. 20 is a graph plotting line current vs. line voltage
linear region.
[0039] FIG. 21 is a graph illustrating effective resistance of line
model.
[0040] FIG. 22 is a graph plotting effective resistance of line
model vs. line voltage.
[0041] FIG. 23 is a graph plotting hardware and spice effective
resistance vs. bias voltage.
[0042] FIG. 24 is a graph plotting HW line current vs. line voltage
for various bias currents.
[0043] FIG. 25 is a graph illustrating line current linearity.
[0044] FIG. 26 is a graph plotting HW resistance vs. line
voltage.
[0045] FIG. 27 is a diagram illustrating a three-bus power system
topology.
[0046] FIG. 28 is a diagram illustrating a three-bus prototype
board.
[0047] FIG. 29 is a diagram illustrating an emulation circuit
setup.
[0048] FIG. 30 is a diagram illustrating a power world power flow
solution.
[0049] FIG. 31 is a diagram illustrating analog and ABM sine
shaper.
[0050] FIG. 32 is a time line of hardware technology.
[0051] FIG. 33 illustrates complex computation methodology.
[0052] FIG. 34 is a diagram illustrating a complex computation
implementation.
[0053] FIG. 35 is a diagram illustrating generator real power
computation methodology.
[0054] FIG. 36 is a diagram illustrating computationally feasible
bus voltage.
[0055] FIG. 37 illustrates variable representation in analog
emulator.
[0056] FIG. 38 is a diagram illustrating a power system emulator
hardware setup.
[0057] FIG. 39 is a diagram illustrating a classical generator
model.
[0058] FIG. 40 is a block diagram illustrating a swing equation
functional block.
[0059] FIG. 41 is a block diagram illustrating generator model
functions.
[0060] FIG. 42 is a diagram illustrating a single-machine CMON
analog emulator.
[0061] FIGS. 43a-b show diagrams illustrating results of power
system and analog emulator.
[0062] FIG. 44 is a diagram illustrating results of analog emulator
with offset corrections.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0063] In the current invention, application-specific hardware is
employed to reduce the computation time for the solution of the
sparse linear systems arising in complex computations. The use of
special-purpose hardware reduces the overhead costs, better
utilizes floating-point hardware, and provides fine-grained
parallelism. The use of Field Programmable Gate Arrays (FPGA)
provides a convenient platform to design and implement such
hardware. By building hardware that is specifically designed to
solve the sparse matrices found in complex calculations, such as
power system calculations, rather than utilizing general-purpose
processors and parallel processing, the present invention
significantly improves the efficiency of the linear solver and
hence reduces the computing time compared to traditional
platforms.
[0064] Thus, one aspect of the present invention relates to
hardware for the direct solution of sparse linear systems. The
hardware design takes advantage of properties of the matrices
arising in the specific computation obtained from a detailed
analysis of actual systems.
Overview of FPGA Technology
[0065] Field programmable gate arrays (FPGA) are reconfigurable
logic devices that allow users to configure device operation
through software. Arrays of programmable logic blocks, combined
with programmable I/O blocks, and programmable interconnects form
the underlying reconfigurable fabric for FPGAs. Software code,
typically written in a hardware description language (HDL) such as
VHDL or Verilog, is mapped to these devices allowing the user to
specify the functionality of the FPGA. FPGAs can be programmed
multiple times, unlike other programmable logic devices (PLD),
which can only be programmed once. In addition, unlike application
specific integrated circuits (ASIC), there is not a long lead time
in between design and testing since the designer does not have to
wait for masks and then for manufacturing to return a part for
test. The user simply compiles the design for the desired FPGA and
then programs the device.
[0066] FPGA device densities are in the millions for logic gates
with clock rates of 500 MHz for synthesized logic (see
www.xilinx.com). Beyond increases in logic density, the addition of
high performance embedded arithmetic units, large amounts of
embedded memory, high speed embedded processor cores, and high
speed I/O has allowed FPGAs to grow beyond just simple prototyping
devices. FPGAs can outperform high-end personal computers [3].
Based on synthesis results, a high-end FPGA such as a Xilinx
Virtex-4.TM. contains enough embedded multiplier and logic
resources to support over 40 high-speed single precision
floating-point multipliers operating at 300 MHz. Unlike personal
computer processors, FPGAs have the advantage of being able to be
mapped to the desired application, rather than being designed to
perform well across a wide range of applications. This allows the
greatest amount of utilization of the available resources on the
FPGA.
Overview of Power Flow by Newton's Method
[0067] The present invention will be exemplified based on power
flow computation. However, the present invention is applicable to
other types of complex computations as well.
[0068] Power flow solution via Newton's method [4] involves
iterating the following equation:
-J.DELTA.x=f(x) (1)
[0069] Until f(x)=0 is satisfied. The Jacobian, J, of the power
system, is a large highly sparse matrix (very few non-zero
entries), which while not symmetric has a symmetric pattern of
non-zero elements. .DELTA.x is a vector of the change in the
voltage magnitude and phase angle for the current iteration. And
f(x) is a vector representing the real and imaginary power
mismatch. The above set of equations are of the form Ax=B, which
can be solved using a direct linear solver. Direct linear solvers
perform Gaussian Elimination, which is performed by decomposing
tithe matrix A into Lower (L) and Upper (U) triangular factors,
followed by forward and backward elimination to solve for the
unknowns.
[0070] Dense matrix linear solvers are not as efficient as
specialized solvers designed for sparse matrices. These solvers do
not compute the non-zero entries. However, they suffer from a
phenomenon known as fill-in. Fill-in of a sparse matrix during LU
decomposition arises when a previously zero entry in the matrix
becomes non-zero during the elimination process. Large amounts of
fill-in will degrade the performance of sparse solvers by
increasing the number of mathematical operations required for the
LU decomposition. By using an appropriate elimination order, the
amount of fill-in can be dramatically reduced [5]. Selecting the
optimal ordering is an NP complete problem [6]. However, effective
heuristics are known that work quite well for the matrices in load
flow computation. In addition, sparse matrix solvers do not share
the regular computational pattern of dense matrix solvers and as a
consequence there is significant overhead in accessing and
maintaining data structures that utilize the sparse data.
Benchmark System and Properties
[0071] An in-depth analysis of the elimination process was
performed using four power systems of various sizes. Two of the
systems, the 1648 Bus and 7917 Bus systems, were obtained from the
PSS/E data collection, and the other two systems, the 10279 Bus and
26829 Bus systems, were obtained from power systems used by PJM
Interconnect. The purpose of the analysis was to obtain features of
the systems arising in typical power system computation that could
be utilized in the design of special-purpose hardware. In addition,
the systems were also used to benchmark the proposed hardware and
compare it against a state-of-the art software solution. Table 1
summarizes the number of branches and number of non-zeros (NNZ) in
the Ybus matrices and the size and number of non-zeros of typical
Jacobian matrices for the benchmark systems.
TABLE-US-00001 TABLE 1 Summary of Power System Matrices System
Branches NNZ YBUS NNZ Jacobian Jacobian Size 1648 Bus 2,602 6,680
21,196 2,982 7917 Bus 13,014 32,211 105,522 14,508 10279 Bus 14,571
37,755 134,621 19,285 26829 Bus 38,238 99,225 351,200 50,092
Operation Count (Opcounts) and Fill-in Study
[0072] The number of floating-point subtractions (FSUB),
multiplications (FMUL), divisions (FDIV), and the growth of the
matrices as a result of fill-in were collected for the Jacobians
and are summarized in Tables 2 and 3, This information was used to
project performance and estimate storage requirements for the
resulting L and U factors. The MATLAB functions COLAMD and SYMAMD
were used to determine elimination ordering. The number of fill-ins
is equal to the number of non-zeros in the L and U factors compared
to the Jacobian. SYMAMD utilizes the symmetric structure of the
non-zeros in the Jacobian and provides a substantial reduction in
fill-in and arithmetic operations.
TABLE-US-00002 TABLE 2 COLAMD NNZ and MFLOP Count System NNZ LU #
FDIV # FMUL # FSUB 1648 Bus 82,378 0.039 1.088 1.027 7917 Bus
468,757 0.225 9.405 9.041 10279 Bus 450,941 0.206 6.267 5.951 26829
Bus 1,424,023 0.661 40.110 39.037
TABLE-US-00003 TABLE 3 SYMAMD NNZ and MFLOP Count System NNZ LU #
FDIV # FMUL # FSUB 1648 Bus 45,595 0.021 0.268 0.243 7917 Bus
236,705 0.110 1.737 1.606 10279 Bus 293,688 0.130 1.976 1.817 26829
Bus 868,883 0.392 11.223 10.705
[0073] Given a floating point throughput limited calculation, the
performance is proportional to the number of floating point
operations (FLOPs) divided by the throughput of the floating-point
hardware. Based on the flop counts and a throughput of 200 million
floating point operations per second (MFLOPs), the projected solve
time for the largest system is approximately 0.11 seconds, solved
using the SYMAMD pre-permutation.
[0074] The storage requirement for the L and U factors is
proportional to the number of non-zero elements for a non-restoring
design, where the original inputs are overwritten by the resulting
solution. A sparse matrix storage scheme is much more space
efficient than dense storage when dealing with matrices of large
size, but very few non-zero values. Row compressed storage is a
common method used by sparse solvers [7]. This scheme stores the
non-zero values of a sparse matrix by a set of row vectors, one
vector containing non-zero values and the other containing the
corresponding column indices. Assuming 32-bit indices and 32-bit
values for the matrix entries, the amount of storage required to
store the L and U factors is approximately 7 MB for the largest
test system, solved using the SYMAMD pre-permutation. This amount
of storage is well within the space available on off the shelf
memory chips (www.micron.com).
Elimination Details
[0075] The pattern of non-zero elements that arise during the
elimination process was analyzed in order to estimate the available
parallelism, intermediate storage requirements, and potential for
data reuse.
[0076] Table 4 shows the average number of non-zero elements in the
rows and columns of the matrix as the elimination process proceeds.
It is used as an estimate for the amount of row or column
parallelism (separate rows can be updated in parallel) that can be
achieved by utilizing multiple FPUs. Observe that the reduced
fill-in obtained by SYMAMD leads to reduced parallelism. The amount
of resources on an FPGA can support the number of floating-point
units required to achieve maximal parallelism.
[0077] FIG. 1 shows the distribution of the number of non-zero
entries in the rows that occur during the LU factorization for the
1648 Bus system. The x-axis is the number of non-zeros and the
vertical bars indicate the number of rows with that many non-zero
entries. The bulk of the elimination rows are at or below the
average. This histogram is representative of all of the row and
column results for both COLAMD and SYMAMD across all four benchmark
systems.
[0078] The maximum number of non-zero elements in a row determines
the size of buffers that store rows. Table 5 shows the maximum NNZ
in a row for the entire row as well as the portion of the row used
during elimination (Table 5). Using 32-bit indices and 32-bit
floating point values, 19K bits of storage is required for the
largest elimination row of the 26829 Bus system. Today's FPGAs
contain thousands of Kbits of high speed embedded memory, which is
enough to buffer hundreds of rows or columns.
TABLE-US-00004 TABLE 4 Mean elimination row and column size COLAMD
SYMAMD System Row Col Row Col 1648 Bus 13.4 14.2 7.2 8.1 7917 Bus
15.8 16.5 7.8 8.6 10279 Bus 11.7 11.7 7.5 7.80 26829 Bus 14.2 14.2
8.5 8.9
TABLE-US-00005 TABLE 5 Maximum elimination row and matrix row size
COLAMD SYMAMD System Elim. Matrix Elim. Matrix 1648 Bus 90 231 66
218 7917 Bus 147 477 125 275 10279 Bus 116 416 186 519 26829 Bus
258 886 293 865
[0079] The amount of reuse between subsequent elimination rows
impacts the potential efficiency of a memory hierarchy. Rows that
are reused from one iteration to the next can be cached and a row
not used in the previous iteration can be prefetched. Rows reused
in subsequent eliminations do not allow pivot search to occur in
parallel with computation during elimination since some values
being calculated are necessary for the next pivot search. Table 6
shows that slightly more than half of the rows are reused between
subsequent iterations.
TABLE-US-00006 TABLE 6 Mean Percentage of Non-Zero Row Elements
Reused between Successive Elimination Rows System COLAMD SYMAMD
1648 Bus 64% 53% 7917 Bus 64% 54% 10279 Bus 60% 54% 26829 Bus 60%
54%
Software Performance
[0080] The comparison system used in the benchmarks is a 2.60 GHz
Pentium IV.RTM. computer running Mandrake Linux 9.2.TM.. Software
was compiled using gcc v3.3.1 (flags-O3-fpIC) and utilizing ATLAS
CBLAS libraries. Three high-end sparse solvers were tested, UMFPACK
(Un-Symmetric Multi-Frontal Package) [8], SuperLU
(http://www.cs.berkeley.edu/.about.demmel/SuperLU.html), and WSMP
(Watson Sparse Matrix Package:
http://www-users.cs.umn.edu/.about.agupta/wsmp.html). Of the three
solvers, UMFPACK gave the best results on all four systems. The
numerical solve time, floating point operations per second, and
floating point efficiency of the comparison system were gathered
using the power system matrices. Despite the 2.6 GHz clock speed
and impressive computational power of the Pentium IV.RTM. benchmark
system, even the highest performance software was only able to
utilize a fraction of the total available floating-point
resources.
TABLE-US-00007 TABLE 7 Comparison System Benchmarks Mflops re-
Efficiency % Data UMFPACK Numeric ported by (obtained flop * Name
Factor time (s) UMFPACK 100/peak flop) 1648Bus 0.03 27.42 1.05
7917Bus 0.13 34.69 1.33 10278Bus 0.16 24.9 0.96 26829Bus 0.49 89.74
3.45
Hardware Design
[0081] The proposed hardware implements the Gaussian elimination
with partial pivoting LU factorization algorithm shown below:
TABLE-US-00008 For each column in the matrix loop Search column for
pivot element Pivot if necessary Normalize pivot column Update
remaining portion of matrix: Multiply to form sub-matrix update
Update value or create fill-in End loop
[0082] In order to maximize performance, the design of the LU
hardware focused on maintaining regular computation and memory
access patterns that are parallelized wherever possible. FIG. 2
shows a block diagram of the overall design of one preferred
embodiment according to the current invention. The Pivot Search
Logic, Update Pivot Column/Row Read Logic, and Update Unit Logic
blocks each perform a portion of the overall LU algorithm, with
control and memory access handled by the other two blocks on the
FPGA. The memory hierarchy to which the processing logic connects,
consists of a write-back row-cache implemented in a fast
random-access memory such as SRAM, as well as a larger block of
SDRAM which is used to store the matrix rows which do not fit into
the row cache. Our design makes full use of Dual-Ported SRAM
memories which allow memory reads and writes to occur
simultaneously. An additional bank of random-access memory is also
required to store the colmap. The colmap is a redundant column-wise
array of the indices of non-zero elements in the matrix. The
purpose of the colmap is to increase the efficiency of the pivot
search.
[0083] The minimum size of the row cache is equal to the amount of
memory required to store the largest sub-matrix update, which is
proportional to the product of the maximum number of non-zero
elements in an elimination row and the maximum number of non-zero
elements in an elimination column. A cache of this size insures
that all row writes for a given sub-matrix update will be cache
hits and will occur at cache speed, since all row misses will only
occur during pivot search insuring row availability during
sub-matrix update. The parameters to calculate the amount of cache
required was extracted as previously described under Elimination
Details.
[0084] For the initial design, the cache row size, SDRAM row size,
and colmap row size exceed the maximum counts, which were extracted
during the initial matrix profiling. This insures that the memories
will have enough space to store the worst case, however, this also
means that there will be inefficiencies in the storage for the
typical case.
[0085] Not shown in the hardware diagrams are the facilities
required to maintain row pivoting information, which is implemented
by lookup tables and memory pointers, which keep track of the row
to memory mappings for the compressed row storage format. An
additional table is used to keep track of elimination progress to
avoid reading row values that are not part of the sub-matrix
update, i.e., elements that have already been eliminated. The size
of these lookup tables scales linearly with matrix size. Depending
on the size of the matrices to be solved and the available on-chip
resources, these tables can be stored in embedded memory on-chip or
if they are larger, in off-chip memory.
Pivot Search Logic
[0086] Now referring to FIG. 3, the pivot search logic block
performs the pivot search prior to each sub-matrix elimination
step. A memory read of the column map followed by indexing logic
creates the indices for the pivot column. The column map stores the
physical address of rows in a particular column. The physical
address must then be translated to the corresponding matrix row
number. Since the colmap may contain rows which have already been
eliminated, index rejection logic is required to select the
appropriate indices. The physical addresses corresponding to these
indices are then used to access the values from memory. The values
are compared sequentially as they arrive to obtain the absolute
maximum value and index. This value is stored in a register as the
pivot element. The physical indices, virtual indices, and
floating-point values for the pivot column are also stored in
registers to be used by the pivot column update logic. The minimum
amount of memory required for these registers is equal to the
product of the largest elimination column in the matrix and the
amount of storage for two indices and one value. After pivoting is
complete, a background update is sent to lookup tables (not shown)
to maintain the correct row and address mappings for future
translations.
Update Pivot Column/Row Read Logic
[0087] Now referring to FIG. 4, the update pivot column/row read
logic block performs normalization prior to elimination in parallel
with the pivot row requested from memory. The pivot index, pivot
value, and pivot column are from the pivot search logic (FIG. 3).
After the pivot row is completely read from memory into buffers,
and at least one of the update column entries has been created the
next part of the elimination logic can proceed. Since the design
utilizes a floating-point divider with a pipeline rate of 1 per
clock cycle, all of the remaining divides will complete before they
are requested by the update unit logic. The update row indices and
values, and the normalized update column indices (physical and
virtual) and values are stored in registers. The minimum size of
this register space is equal to the product of the maximum
elimination row in the matrix and the amount of space required to
store three indices and two values.
Update Unit Logic
[0088] Now referring to FIG. 5, the update unit logic performs the
remaining computations required during the elimination step. In
this design, the processing logic can support multiple update units
operating in parallel. Each update unit is tasked with completing
the elimination for a single row of the sub-matrix to be updated.
Each update unit contains a floating-point multiplier and adder to
perform the required arithmetic as well as other logic that
operates on the column indices. The computational logic operates in
parallel with the memory and indexing logic to maximize utilization
of all of the logic units. The number of update units that can fit
on the FPGA is limited by the amount of on-chip embedded memory
available, the number of multipliers available, and also the
available logic resources. A high end FPGA has enough resources to
support over 16 of these units.
[0089] The merge logic maintains the row order and also determines
if the multiplier output is a fill-in, an update, and also
determines if the sub-matrix row element should be copied into the
new row. If an entry is a fill-in, the colmap table is updated in
the background to reflect the newly added index. Updates that
result in a zero value are not pruned from the colmap and they are
also stored in the row result. All logic in the update unit is
pipelined for maximum throughput. Once the front end of the update
unit has completed processing, it can request the next available
row to be operated on in order to eliminate idle logic cycles.
After all of the pending row eliminations for the current
sub-matrix update have been completed, the next pivot search can
begin.
Hardware Performance Model
[0090] A program was written in C, to project the performance of
the hardware design. This program performs the elimination in the
same fashion as the hardware would, and operates on an input file
that contains the power system Jacobian matrix. The number of clock
cycles the hardware requires to obtain the L and U factors is
accumulated based on the latency and throughput of the processing
logic and memory hierarchy. This cycle count allows us to compute
the computation time based on clock speed.
TABLE-US-00009 TABLE 8 Solve time for LU Hardware Update Units
Solve Time (s) System 1 2 4 8 16 1648 Bus 0.0041 0.0028 0.0021
0.0018 0.0017 7917 Bus 0.0243 0.0157 0.0116 0.0097 0.0090 10279 Bus
0.0311 0.0202 0.0149 0.0126 0.0117 26829 Bus 0.1378 0.0817 0.0543
0.0415 0.0359
[0091] The memory model used for accessing SDRAM utilized three
parameters to model the average behavior of a SDRAM memory: verall
latency, burst read length, and latency between bursts. For the
SRAM and embedded memories the model parameter was latency from
request to data output. The only parameter for the floating-point
hardware is the pipeline length, which affects the latency of
results. All units were assumed to have a pipeline rate of 1 per
clock cycle, even the floating-point division hardware. Extending
the model to multiple instances of update unit logic was done by
simulating round-robin service for each update unit by the memory
hierarchy assuming a proportional increase in bandwidth. The clock
rate for memory and logic was assumed to be synchronous at 200 MHz.
This system clock rate was chosen because this was the maximum
clock rate reported by VHDL synthesis for our floating-point
units.
[0092] Table 8 shows the projected performance of the proposed LU
hardware architecture based on the four test power systems using
the SYMAMD pre-permutation at a clock rate of 200 MHz. The
simulated hardware performance achieves approximately 80% of the
performance predicted by opcount. The projected performance using
COLAMD (not shown) is less than SYMAMD, regardless of the number of
update units used. FIG. 6 shows the relative speedup of the LU
hardware by increasing the number of update units. Scaling to even
four update units does not net an equivalent speedup in overall
solve time. The reason for this is apparent in FIGS. 7 and 8.
[0093] FIGS. 7 and 8 show the relative time spent in the three
portions of the LU hardware solution. For the single update unit
configuration (FIG. 7), the majority of the time is spent
performing the matrix update. Roughly a quarter of the time is
spent searching for the pivot value. A FIG. 8 show that as the
number of update units increases, the amount of time spent during
pivot searching begins to dominate the overall solve time.
[0094] FIG. 9 shows the predicted performance ratio of performance
of the LU hardware at 200 MHz to the Pentium IV.RTM. benchmark
system. This data predicts a speedup over the Pentium IV.RTM. of a
factor of 10 with only 4 update units and approximately 15 with 16
units.
[0095] These results show that special-purpose hardware implemented
using existing FPGA provides a 10.times. or higher performance
increase, compared to software solutions using state-of-the-art
sparse linear solvers, for the LU decompositions that arise in
typical power flow computations. The speedup is obtained by
utilizing special-purpose hardware to reduce the overhead to access
and maintain data structures for sparse matrices and the use of
fine-grained parallelism which performs multiple row updates in
parallel. The reduction in overhead dramatically improves the
utilization of the floating-point units as compared to software
solutions on general-purpose processors.
[0096] A limiting factor is the time for pivot search. Eliminating
or reducing the pivot search time will result in additional gains
in performance. Such improvement may be obtained by overlapping the
next pivot search with the current update, however, this is limited
by the amount of reuse in rows from one elimination step to the
next. The use of other approaches such as the Bordered Diagonal
Block (BDB) technique may provide additional parallelism.
Analog Power Flow Method
[0097] A DC emulation power flow method has been proposed in [102]
and is reviewed here for an understanding of the application of the
line model in this invention. This approach utilizes multiple DC
networks as shown in FIG. 10. The power system is broken up into
three main components: generators, transmission lines, and loads.
The generators are shown as voltage sources. This form of DC
emulation is possible due to the fact that the emulation is
executed in rectangular coordinates. The generators excite the
network with real (Re{E.sub.g}) and imaginary (Im{E.sub.g}) voltage
components and the states (voltages and currents) of these networks
provide the steady-state power flow solution. The transmission line
components in this emulation scheme are purely resistive and the
imaginary resistor values are dependant upon operating frequency.
By modeling as fixed resistors the frequency is assumed to be
constant.
[0098] The current flowing out of a given generator i is calculated
using the following equation [103]:
I Gi = j = 1 n Re { Y ij } Re { E j } network 1 - j = 1 n Im { Y ij
} Im { E j } network 2 + j j = 1 n Im { Y ij } Re { E j } network 3
+ j j = 1 n Re { Y ij } Im { E j } network 4 ( 2 ) ##EQU00001##
[0099] This approach accurately models and calculates power flow
for a lossy transmission network with the assumption that there is
no frequency deviation from nominal. A lossy transmission line
model is emulated composed of both reactive and resistive elements
as shown in FIG. 11.
[0100] Relating the emulation circuit to the real world power
system determines the real and imaginary resistor values in the DC
networks. The following governs this [104]:
R Re ( ij ) = 1 Re { Y ij } = R ij 2 + X Lij 2 R ij ( 3 ) R Im ( ij
) = 1 Im { Y ij } = R ij 2 + X Lij 2 X Lij ( 4 ) ##EQU00002##
where R.sub.ij and X.sub.ij are the resistance and reactance of a
line between buses i and j. The same approach is used for lossless
transmission lines. The transmission line resistances are neglected
as shown in FIG. 12. The topology is similar to FIG. 10 with the
exclusion of networks 1 and 4. Similarly, it is solved by omitting
the excluded networks.
[0101] The general layout for the DC emulation approach has been
provided along with a relationship between AC lumped line models
and the DC networks. With this information, an operational
transconductance amplifier (OTA) based transmission line model with
reconfigurability was developed explicitly for the DC emulation
power flow method.
Transmission Line Model
[0102] The transmission lines were modeled as resistors in the DC
emulation shown previously for both lossy and lossless lines.
Actual use of resistors would require manual intervention to
configure and alter the analog system. In order to overcome this
problem, a remotely reconfigurable OTA based variable resistor was
developed to model the lines.
[0103] The transconductance gain (g.sub.m) of an OTA is
controllable through an external source over a wide range, allowing
remote reconfigurability. In addition, the basic OTA only consists
of a few current mirrors resulting in a small and relatively
low-cost device. They are also readily used in VLSI design. Due to
its versatility, the OTA is one of the most important building
blocks of analog VLSI circuits [105] and can be highly accurate if
used properly. Some of the pitfalls of OTAs are their sensitivity
to temperature and saturation effects but there are methods to
correct these [106, 107]. The principle operation of an OTA is a
voltage controlled current source (VCCS). FIG. 13 shows an ideal
OTA.
[0104] The amplifier has a differential voltage input and a current
output. The current output is related to the differential input
through a transconductance gain (g.sub.m) controllable through an
external source. Ideally the output would be:
i.sub.out=g.sub.mv.sub.in (5)
where v.sub.in is the differential input voltage.
[0105] With VCCS operation, the OTA naturally lends itself to an
application of a variable resistor. The proposed OTA based variable
resistor is shown in FIG. 14. It consists of two OTAs to address
bi-directional current flow. When current is flowing from left to
right, the OTA on the right is supplying the current, conversely if
current is flowing in the opposite direction. The inputs are
floating with respect to the power supply and feedback is
incorporated to extend the linear operating range of the
device.
[0106] FIG. 14 illustrates a double-ended OTA transmission line
model [108]. This line model has three terminals similar to a
variable resistor. The two inputs (V.sub.1, V.sub.2) mimic
terminals of a resistor and the i.sub.abc is the biasing current
used to control the resistance of the model similar to the wiper on
a potentiometer. The terminal relationship to a potentiometer is
shown in FIG. 15.
[0107] The effective resistance, Reff, of the OTA model is defined
as the resistance seen between terminals V.sub.1 and V.sub.2. Reff
is governed directly by Ohm's law through the line voltage
(V.sub.line) and the line current (I.sub.line).
R eff = V 1 - V 2 I = V line I line ( 6 ) ##EQU00003##
[0108] This effective resistance is controlled by the bias current
and approximated by the following equation:
R eff .apprxeq. 2 R g m R A ( 7 ) ##EQU00004##
[0109] The sizing of resistors R and R.sub.A along with the range
of transconductance gain will determine the behavior of the
circuit. With appropriate resistors and wide control over gain a
very large range of effective resistance is obtainable. The
limitations of the circuit are related to properties of OTAs. The
limitations, operation and controllability were analyzed through
PSPice simulations and hardware testing.
Simulation
[0110] A LM13700 dual OTA was used for the simulations in PSPice.
Various simulations were run to validate and characterize the line
model. Controllability, resistance variance and circuit limitations
were analyzed. The first simulation dealt with controlling the
circuit through the bias current. Applying a current source is not
as simple and easily controllable in comparison to a voltage
source. The simulation using the circuit in FIG. 16 uses a voltage
source (V.sub.abc) behind impedance (R.sub.abc) to drive the bias
current.
[0111] A voltage sweep was performed on V.sub.abc and the
corresponding bias current produced was measured. The relationship
between bias voltage and i.sub.abc is differentiable until
i.sub.abc approached zero as shown in FIG. 17.
[0112] From these results, the following relationship between
V.sub.abc, R.sub.abc and i.sub.abc was formed:
i abc = 1 2 ( V abc + 13.56 R abc ) ( 8 ) ##EQU00005##
[0113] This approximation is close to the simulation results
deviating only as the bias current becomes very close to zero. This
is due to a non-linear roll-off of the bias current when compared
to the bias voltage applied. Reasonable accuracy is obtained from a
.+-.10 Volt bias voltage.
[0114] The transconductance gain of a basic bipolar OTA is directly
proportional to the bias current and is quantified by [109]:
g m = i abc 2 V T sec h 2 ( V i n 2 V T ) ( 9 ) ##EQU00006##
[0115] The gain in (9) is not linear, typical applications
linearize g.sub.m around an operating point. The differential input
of the OTA is limited within very small range to ensure accuracy of
this linear approximation. The LM13700 OTA gain is also a
hyperbolic function although it improves the linearity through the
use of linearizing diodes. These diodes allow a larger swing of
input voltage while maintaining linear behavior. The gain for this
chip, factoring in the diodes, is approximated by (10) for a
differential input voltage in the range of .+-.50 mV. This gain can
directly control the effective resistance in (7).
g m .apprxeq. i abc 2 V T sec h 2 ( V i n D 2 V T ) ( 10 )
##EQU00007##
where V.sub.T=26 mV and D is a diode linearization constant.
[0116] A simulation was conducted where V.sub.abc was varied to
change the effective resistance while holding the line voltage
constant. FIG. 18 shows simulation and theoretical results from
(7). The effective resistance was computed using (6). These results
show the controllability of resistance over a wide.
[0117] More simulations were run to analyze the consistency of the
resistance. While maintaining a constant bias current, the circuit
was subjected to varying line voltages. FIG. 19 shows multiple
sweeps of the line voltage with different bias currents. The plot
shows line current vs. line voltage. For an ideal resistor, the
slope of the resulting line should be constant.
[0118] As seen in FIG. 19, when the line voltage is between .+-.4
volts, the slope, or resistance, remains constant. Beyond that
threshold the OTA begins to saturate. FIG. 20 shows the same plot
zoomed in on the linear range.
[0119] The saturation of the circuit is dependant on two factors:
the bias current and the throughput current. The effective
resistance has a variance of .+-.2% with throughput current 40% or
less of the bias current. Complete saturation occurs when the
throughput current is equal to the bias current. Further
examination of FIG. 20 reveals an offset in the circuit behavior.
The output of the circuit is nonzero when no voltage is applied to
the input (one of the line terminals). This is problematic as the
effective resistance of the circuit governed by Ohm's law is
nonlinear. The computation of this resistance is shown in FIG. 21
and exhibits anomalous behavior.
[0120] The offset current present at the output of the OTA is
almost entirely caused by the internal offset voltage of the OTA.
This can be modeled in a similar manner as offset voltages of
traditional op-amps. Specifically, a voltage source (v.sub.off) at
the input of the device is amplified to the output producing the
offset (i.sub.offset). For an OTA, this is quantified by (11).
i offset = v off sec h 2 ( v i n 2 V T ) ( 11 ) ##EQU00008##
[0121] A method for eliminating an OTA input offset voltage has
been proposed in [110]. This method is suitable for fabrication and
quantifies and eliminates the offset for any gain configuration. An
offset rejection of 40 dB was achieved in simulation [110]. With
the recognition of sufficient offset cancellation, the data was
corrected in a manner representing the elimination of the offset
quantified by (11) to analyze the results with offset compensation.
The corresponding results are much better. FIG. 22 shows effective
resistance versus line voltage for various bias current setting
corrected for the offset. The asymptotic behavior is eliminated and
the resistance exhibits a variance of approximately 1% with a line
voltage magnitude of 2.5 volts or less.
Hardware
[0122] The OTA based transmission line model was constructed in
hardware using the LM13700 dual OTA and tested. Tests similar to
the PSpice simulations were conducted. The effective resistance of
the circuit was calculated from measurements while varying the bias
voltage. The hardware exhibits similar controllable resistance as
the software simulations with only slight deviation from simulation
results as shown in FIG. 23. Analysis shows the offset voltage
present in hardware is slightly higher, and subsequently closer to
the data sheet specifications, than the PSpice model.
[0123] Multiple hardware tests were conducted to analyze the
saturation effects of the circuit. The circuit was configured with
a constant bias current and subjected to a varying line voltage.
FIG. 24 shows results of these tests for eight different resistance
configurations. The slope of plots is indicative of the effective
resistance and is constant for a limited line voltage range before
saturation sets in similar to the simulation results. Approximately
the same variance of .+-.2% for line currents at 40% or less of
bias current was exhibited in hardware. FIG. 25 shows line current
and the linear characteristic of this line current plotted against
line voltage. The response is very linear and deviates only
slightly in this range. This linearity is exhibited in the
effective resistance of the circuit.
[0124] The effective resistance is plotted against line voltage in
FIG. 26. With the line voltage range between .+-.2.5 volts, the
variance of the effective resistance was less than 1%. This is an
accurate range for operation of this model. The line model becomes
less accurate, saturation effects become more severe, as the line
current magnitude approaches the bias current magnitude. The linear
range for this model, defined by <1% variance, is quantified by
the following relationship:
i.sub.line.ltoreq.0.3i.sub.abc (12)
[0125] Within device limitations, the line voltage can be any
magnitude as long as the relationship in (12) is maintained. This
is advantageous as the linear operating voltage range has been
significantly increased from a basic open loop OTA with the use of
feedback. This will allow easy measurement of the bus voltages in
an emulation network.
[0126] The results from both simulation and hardware were very
similar and verify the operation, control and linear operating
conditions of the proposed OTA based transmission line model. A
three-bus prototype was constructed and tested based on these
results.
Three-Bus Prototype
[0127] A network module prototype of a three-bus power system was
built based on the proposed transmission line model. This design
was incorporated on a single circuit board containing the
transmission line components of the four networks as shown in FIG.
10. It is suitable for both lossy and lossless models shown in
FIGS. 11 and 12, respectively. The main purpose of this prototype
was to validate the transmission line model application in the
previously proposed DC emulation power flow technique [102]. The
configuration of the power system is shown in FIG. 27.
[0128] The power system consists of one load, two generators and
three transmission lines (A, B, C). Four independent networks
containing three OTA based variable resistors were used in this
emulation. The combination of these networks constitutes the
complete transmission line model. The circuit board contains only
the line models with terminals to attach generator and load models.
The complete circuit board is shown in FIG. 28. The lines (A, B, C)
are highlighted in the picture to show the relationship to the
power system along with the labeling of the four networks in
relation to FIG. 10. The input terminals for generators, loads, and
power supply are also shown. For simplistic operation with fewer
power supplies, all the bias currents are driven by the +15 volt
supply. Configuration of the lines was accomplished by the
respective values of R.sub.abc. In addition, there are jumpers on
the board that connect the transmission lines to the buses. Lines
can be removed for contingency analysis or the network can be
reconfigured with these jumpers. In addition, this design is
scalable and subsequently larger power systems can be emulated with
interconnection of multiple boards.
[0129] The network module was configured for power flow emulation.
DC voltage sources were used as generators (values were obtained
from a computed steady-state power flow solution) and the load was
an OTA based constant current source. This is shown in FIG. 29.
While this setup requires the prior knowledge of the generator bus
voltages, the appropriate voltage on the load bus corresponding to
the correct load currents is sufficient to verify the functionality
of the network module. The results were compared to results from
the PowerWorld.RTM. software package [111].
[0130] The network module was configured per the lossless system in
PowerWorld.RTM. shown in FIG. 30. Table 9 shows the results, load
bus voltage in per unit, from both the hardware (offset corrected)
and the software computation.
TABLE-US-00010 TABLE 9 Power World and HW Emulation Results Load
Bus Voltage Rectangular Polar PowerWorld 0.78 + i0.25 0.816
17.48.degree. HW Emulation 0.78 + i0.24 0.817 17.35.degree.
indicates data missing or illegible when filed
[0131] The results in this case are very accurate. From multiple
runs and different load configurations the voltage magnitude error
between PowerWorld.RTM. and HW was approximately 1.5% and the phase
angle error was about 5%. The errors are caused mostly by
deviations from OTA to OTA and resistor tolerances.
[0132] The transmission line model presented is sufficient for
lossy and lossless line models for use in the DC emulation static
power flow technique mentioned. The overall model consists of
multiple OTA based variable resistor circuits. The shortcomings of
the hardware prototype, such as linear range, accuracy and device
offsets, are mostly due to the current OTA technology. This can be
overcome with further development of better performing OTAs, which
have been shown in prior research, in a similar manner in which
traditional op-amps have been developed. This problem can also be
tackled through a custom VLSI design.
[0133] An accurate, low-cost and remotely reconfigurable OTA based
analog transmission line model has been developed. Simulation and
hardware results verified the design, clearly defined linear
operating ranges and examined the deficiencies of current
off-the-shelf OTAs while pointing out methods of compensation. A
three-bus power system was constructed based on the proposed model
and the resultant power flow results compared favorably to those
from commercially available software. This expands on the work
presented in [101-103] by the introduction of a hardware based line
model with fast, remote reconfigurability suitable for the
presented power flow emulation scheme.
[0134] In another preferred embodiment, simulation and/or emulation
are used when dealing with large-scale power systems. They make it
possible to do essential assessment in power system dispatch,
operation, security and stability. Different simulation/emulation
tools today are built for different applications like transient
stability analysis, fault analysis, power flow studies, operational
planning, etc.
[0135] ABMs are used to make flexible descriptions of electronic
components or complex building block in terms of transfer function
or lookup tables. In order words, a mathematical relationship is
used to model a circuit segment so one will not need to design the
segment component by component. FIG. 31 shows a sine shaper which
is a device that outputs sine value of its input signal. The left
side of FIG. 31 is a modified Barry Gilbert sine shaper [203] built
component by component using analog devices, and the right side of
FIG. 31 is an ABM sine shaper [201]. The only similarity between
the two is that they both take the same input signal and produce
the same output.
[0136] One aspect of the invention employs transient stability
analysis with the advantages of shorter computation time than
current digital simulations and smaller size and cost than discrete
analog emulators. One of the main disadvantages of this method is
the limited accuracy due to the implementation of the analog VLSI
chip.
Methodology of Core Computations Using ABMs
[0137] Analog Behavioral Models (ABMs) of PSpice are used to
substitute and implement mathematical equations and/or scaled
relationship analog circuit representations, which describes the
states of power system and emulate its behaviors. This gives a
clearer picture of the end goal, design layouts and intricacies of
circuit connections.
[0138] Power system core computation methodology implemented in the
design of analog emulation goes through three major steps. These
steps are illustrated in FIG. 32. The assumptions made and detailed
processes are discussed in [211] and [202], respectively.
[0139] Step 1: Calculating Complex Voltage Out of Generator
[0140] The generator block solves the swing equation (dynamic) with
resulting power angle, .delta.. The power angle combines with
desired voltage magnitude to give a complex voltage out of a
generator.
[0141] Step II: Calculating Currents in the Network
[0142] The nodal and generator induced voltages interact with
transmission line impedance and load (if not lumped with line
impedance) to produce complex current flowing in any branch. Using
admittance for simplicity and expressing in rectangular form,
current is computed as in FIG. 32. As depicted in FIG. 33, it will
require four separate networks in order to emulate complex current
flowing on any branch. The approach of DC-resistive network as
proposed by Fried et al. [211] is used. Transformation of the
transmission line complex impedance to its resistive value for real
and imaginary part is through its complex conjugate as shown in
equation 13:
Y = 1 Z = 1 R + j X = R - jX R 2 + X 2 Y Re = R R 2 + X 2 = Y r Y
Im = ( - ) X R 2 + X 2 = Y i } ( 13 ) ##EQU00009##
Combining equation 13 with FIG. 33, implementation of complex
current computation as described in FIG. 34 is shown. This means
that the real part of complex current flowing in any branch is
given by the sum of component currents in networks 1 and 2, and the
imaginary part as the difference between component currents in
networks 4 and 3, as described in equation 14.
I = I r + j I i I r = I I + I II I i + I IV - I III } ( 14 )
##EQU00010##
where subscripts r and i represent real and imaginary part,
respectively. I, II, III, IV represent the four DC networks.
[0143] Step III: Computation of Generator Real Power and Swing
Equation Update
[0144] The real power out of each generator in the system is
calculated as the real part of the product of generator terminal
voltage and complex conjugate of the generator current:
(15)
[0145] The swing equation describes the dynamics of a generator.
Its input is a mechanical power from a prime mover and which is
assumed constant. The output of a generator is an electrical power,
which changes in accordance with the state of the network. The
swing equation is a second order differential equation and its
solution is the power angle of the generator. The power angle
combines with generator internal voltage to constantly update the
generator terminal voltage and hence the generator current. FIG. 35
describes the continuous cycle of power system core
computation.
[0146] FIG. 36 describes the DC-network equivalent of a real power
system bus and conditions for its implementation. Analyzing a
sample power system and considering an arbitrary bus k with its
complex voltage values, there are 4 networks with 4 buses K.sup.I,
K.sup.II, K.sup.III, and K.sup.IV as equivalents. For feasible bus
voltages, the following conditions must be satisfied:
V k I = V k III = V k r V k II = V k IV = V k i } ( 16 )
##EQU00011##
[0147] The feasible bus voltages condition is realizable with a
constant PQ-load model [202]. Among other attributes of an analog
emulator that contribute greatly to its versatility are the scale
factors. Scale factors are constants that relate the values of the
variables on the emulator to the values of the variables in the
real system under study.
Case Studies and Results
[0148] Cases studied, which include 3-bus, 6-bus and 14-bus lossy
and lossless power systems, were implemented using ABMs of PSpice.
ABMs substitute real analog circuit implementation of the
mathematical equations and scaled relationship that describe the
states of the power system cases and emulate their behaviors. Power
flow tests ere conducted on each of the cases. Similar cases were
tested on industrial grade numerical simulation software,
PowerWorld.RTM. v9.1 [212] and Power System Simulation for
Engineering.RTM. (PSS/E) v.28.1 [213], for benchmarking. For the
validation process, the following parameters were measured and/or
calculated and compared with the benchmarks.
[0149] i. .delta.j--Voltage angle difference between generator
buses j and 1 (j=2, 3, . . . );
[0150] ii. Load bus complex voltage;
[0151] iii. Complex current flowing in each branch.
Simulation results obtained from the analog emulation engines
implemented on ABMs of PSpice and that of the benchmarks are
presented in Tables 10 through 13. The results compare favorably.
This confirms the validity of the methodology as well as the
technique of implementation.
TABLE-US-00011 TABLE 10 Summary Results for 3-Bus Power System
(Lossless System) PowerWorld PSS/E Analog Emulator Relative Power
Angle 0.94.degree. 0.9.degree. 0.94.degree. (.delta..sub.11) in
degree Load Bus (V3) Voltage 0.988 -3.17.degree. 0.988 -3.2.degree.
0.9880 -3.12.degree. (V.sub.in) Current in Branch 2_1 343.1848 343
343.2 0.47.degree. (I, A) Current in Branch 1_3 784.49 784 784.5
-13.91.degree. (I, A) Current in Branch 2_3 503.87 504 503.9
-10.70.degree. (I, A) indicates data missing or illegible when
filed
TABLE-US-00012 TABLE 11 Summary Results for 3-Bus Power System
(Lossy System) PowerWorld PSS/E Analog Emulator Relative Power
Angle 0.65.degree. 0.6.degree. 0.65.degree. (.delta..sub.21) in
degree Load Bus (V3) Voltage 0.9675.sub.in -1.03.degree.
0.968.sub.in -1.degree. 0.9675.sub.in -1.03.degree. (V.sub.in)
Current in Branch 2_1 334.2612 334 334.6 45.30.degree. (I, A)
Current in Branch 1_3 691.3922 691 691.6 1.36.degree. (I, A)
Current in Branch 2_3 654.2371 654 654.2 -23.1.degree. (I, A)
indicates data missing or illegible when filed
TABLE-US-00013 TABLE 12 Summary Results for 6-Bus Power System
(Lossy System) PowerWorld PSS/E Analog Emulator Relative Power
Angle -2.10.degree. -2.1.degree. -2.10.degree. (.delta..sub.21) in
degree Relative Power Angle -2.71.degree. -2.7.degree.
-2.71.degree. (.delta..sub.31) in degree Load Bus (V4) Voltage
0.974 -7.35.degree. 0.974 -7.3.degree. 0.974 -7.35.degree.
(V.sub.in) Load Bus (V5) Voltage 0.988 -4.69.degree. 0.988
-4.7.degree. 0.988 -4.69.degree. (V.sub.in) Load Bus (V6) Voltage
0.983 -6.36.degree. 0.983 -6.4.degree. 0.9829 -6.36.degree.
(V.sub.in) Current in Branch 1_2 344.86 345 344.86 -29.76.degree.
(I, A) Current in Branch 1_5 758.72 759 758.72 -20.98.degree. (I,
A) Current in Branch 2_5 388.84 389 388.84 -15.95.degree. (I, A)
Current in Branch 2_6 515.047 515 515.04 -11.7.degree. (I, A)
Current in Branch 3_6 376.53 377 376.54 -32.34.degree. (I, A)
indicates data missing or illegible when filed
TABLE-US-00014 TABLE 13 Summary Results for 14-Bus Power System
(Lossless) PowerWorld PSS/E Analog Emulator Relative Power Angle
-1.08.degree. -1.1.degree. -1.08.degree. (.delta..sub.21) in degree
Relative Power Angle -0.75.degree. -0.8.degree. -0.75.degree.
(.delta..sub.31) in degree Relative Power Angle -8.09.degree.
-8.1.degree. -8.09.degree. (.delta..sub.41) in degree Relative
Power Angle -7.30.degree. -7.3.degree. -7.30.degree.
(.delta..sub.51) in degree Load Bus (V4) Voltage 0.989
-5.51.degree. 0.989 -5.5.degree. 0.989 -5.51.degree. (V.sub.in)
Load Bus (V5) Voltage 0.991 -4.75.degree. 0.991 -4.8.degree. 0.991
-4.75.degree. (V.sub.in) Load Bus (V7) Voltage 0.984 -9.22.degree.
0.984 -9.2.degree. 0.984 -9.22.degree. (V.sub.in) Load Bus (V9)
Voltage 0.979 -11.15.degree. 0.979 -11.2.degree. 0.979
-11.15.degree. (V.sub.in) Load Bus (V10) Voltage 0.980
-11.68.degree. 0.980 -11.7.degree. 0.980 -11.68.degree. (V.sub.in)
Load Bus (V11) Voltage 0.987 -10.7.degree. 0.987 -10.7.degree.
0.987 -10.70.degree. (V.sub.in) Load Bus (V12) Voltage 0.982
-12.42.degree. 0.982 -12.4.degree. 0.982 -12.88.degree. (V.sub.in)
Load Bus (V13) Voltage 0.980 -12.88.degree. 0.980 -12.9.degree.
0.980 -12.88.degree. (V.sub.in) Load Bus (V14) Voltage 0.979
-11.2.degree. 0.979 -11.2.degree. 0.979 -11.20.degree. (V.sub.in)
Current in Branch 1_5 311.63 312 311.63 8.52.degree. (I, A) Current
in Branch 2_5 309.85 310 309.85 10.86.degree. (I, A) Current in
Branch 5_4 262.50 263 252.50 12.84.degree. (I, A) Current in Branch
4_9 151.21 151 151.21 14.42.degree. (I, A) indicates data missing
or illegible when filed
[0152] Simulation outputs of analog emulator were measured in
rectangular form and/or radians. Conversions to polar
representation and/or degree were made for easy comparisons with
the benchmarks. Also, for convenience and efficient analog
emulation implementation, certain power system variables and
parameters were represented by other circuit variables FIG. 37
summarizes variable representations used in this example. Since
conversion ratio used between current-voltage or power-current is
1:1, obtained values remain representative in quantity.
Generator Model
[0153] The proposed circuit is based largely upon the classical
generator model. This section presents a short review of the basics
behind the generator model developed and scaling of the model to an
appropriate analog circuit. The classical generator model that will
be used for this example consists of a voltage source behind an
impedance, where V.angle.0 represents the generator terminal
voltage, E.angle..delta. denotes the internal generator voltage and
angle, and Z representing the transient reactance of the generator
'dX. The generator consisting of these elements can be seen in FIG.
39.
[0154] With this model, the electrical angle, .delta., is unknown,
and therefore, needs to be computed. In order to determine the
electrical angle, the dynamics of the generator were captured. The
simplified mechanical equation that governs the motion of the
generator rotor, describes the swing between the mechanical and
electrical angle, otherwise known as the swing equation,
M{dot over ({dot over (.delta.)})}+D{dot over
(.delta.)}+P.sub.e(.delta.)=P.sub.m (17)
[0155] where M is the generator inertia coefficient, D is the
damping coefficient, P.sub.e is the electrical power output, and
P.sub.m is the mechanical input power. The solution we seek, the
load flow, corresponds to the steady state solution of this
equation.
[0156] In order to simplify the calculation required in the analog
circuit, the electrical power output of the generator will be
solved in rectangular form. For converting the polar notation to
rectangular form, only the real and imaginary components will be
necessary and the calculation for the electrical output power of
the generator model can be rewritten as the following equation:
Re{S}=Re{VI*}
P.sub.e=E.sub.RealI.sub.Real+E.sub.ImagI.sub.Imag
P.sub.e=|E|cos .delta.Re{I}+|E|sin .delta.IM{I} (18)
[0157] The representation of the electrical output power will need
to be applied directly into the swing equation for the calculation
of the electrical angle. The functionality of the swing equation as
seen in the block diagram in FIG. 40, will provide us with a layout
for the generator model that will be presented in the following
section.
[0158] With many different models representing the generator
available, we will use the classical model while neglecting
damping. This will simplify the non-linear dynamics of the
classical generator without significantly reducing the precision of
the model [305]. With these assumptions, the swing equation (17)
will reduce to only the difference between the mechanical input
power and the electrical output power. After factoring in the
generator inertia coefficient and taking two integration steps of
the angular acceleration, the solution of the electrical power
angle can be calculated in the following form:
.delta. = 1 M .intg. .intg. ( P m - P e ( .delta. ) ) t t ( 19 )
##EQU00012##
Scaling Parameters
[0159] In order to realize a power system using analog emulation,
two different scaling factors need to transpire. The time scaling
parameter is for the computation speedup of the emulator while the
scaling of parameter space is for the electrical angle to be
represented as voltage in the circuit. Next is the introduction of
the time scale factor, .tau., a constant which will represent a
computational speedup of .tau. times faster than the real-time
system. This factor is directly proportional to the real time of
the power system, t, for a given simulation time of the analog
emulator, T. With this advantage, we can include the time scale
factor in the solution of the electrical angle (19), capable of
scaling the computation time of emulator in the following
manner:
.tau. = t T .delta. = .tau. 2 M .intg. .intg. ( P m - P e ( .delta.
) ) t t ( 20 ) ##EQU00013##
[0160] The time scale factor needs to be squared because of the
second order nature involved in the dynamics of the classical
generator model. A collaborating factor that may influence the
choice of the time scale is that the error in the integration may
be accentuated by long emulation times.
[0161] Another scaling procedure is actual parameter scaling of the
electrical angle to voltage. Introducing the desired voltage,
.upsilon. to be represented as a voltage level in the circuit that
is equivalent to .pi. radians, (20) can be rewritten as the
following equation:
.delta. = .tau. 2 M .upsilon. .pi. .intg. .intg. ( P m - P e (
.delta. ) ) t t ( 21 ) ##EQU00014##
[0162] With the generator model and scaling parameters defined, we
can proceed to develop this model using CMOS analog devices. Before
selecting the devices we need to identify the circuit parameters
needed to scale the power system to an analog circuit and the
parameters needed to achieve reconfigurability within the
model.
Reconfigurable Generator Model
[0163] The next step of modeling addresses the concept of
reconfigurability in the generator model. The actual analog circuit
representing the generator needs to incorporate a programmable
analog device into model to obtain controllable behaviors of the
power system.
[0164] Instead of the typical operational amplifiers to realize
some of the components needed, the operational transconductance
amplifier (OTA) will be used because of its ability to control its
gain by an external current (or voltage).
[0165] The OTA has the basic properties of a voltage-controlled
current source (VCCS) and the output current is obtained by:
I.sub.o=g.sub.m(.DELTA.V.sub.in) (22)
a product of the transconductance parameter and the differential
input voltage. The transconductance, gm, can be increased or
decreased as a function of the input voltage by varying the
amplifier bias current, I.sub.abc, according to this equation:
g.sub.m=.rho.I.sub.abc (23)
as a gain and is dependent on a non-linear device constant, .rho.
and the amplifier bias current as seen in (23) [306]. Ideally, the
desirable gain of an OTA would always be constant, with a linear
input and output relationship. Since this is not the case, measures
will need to be considered because of the errors that will result
from the non-linear transconductance of the OTA.
Circuit Parameter Transformation
[0166] To achieve the difference between the mechanical input power
and the electrical output power on a circuit, power is quantified
as current. This introduces a circuit scaling parameter, Kp, a
constant ratio between the circuit current and model power quantity
as seen below.
K p = I max P max ( 24 ) ##EQU00015##
[0167] The analog circuit developed to solve the swing equation of
the generator will consist of two first-order integrators built
using OTAs [306]. The output of the double integration process is
voltage representing the electrical angle and is calculated in the
following manner analogous to (21).
.delta. .ident. V = g m 1 g m 2 C 1 C 2 .intg. .intg. ( I m - I e (
V ) ) t t ( 25 ) ##EQU00016##
[0168] In this equation, the I.sub.e(V) represents electrical power
which is dependent on the electrical angle. The components C.sub.1
and C.sub.2 are the values of the integration capacitor needed in
the first and second integrators, respectively. The
transconductance parameters, g.sub.m1 and g.sub.m2, are the gains
of the first and second OTAs in the double integration process.
These values can be reconfigured using the amplifier bias currents
to correspond to various generator characteristics and a given
range of speedup times.
Analog Generator Model
[0169] In order to provide the analog implementation of the
algorithm described in the previous section, an approach to
simulate the block diagram of the swing equation (19) needs to be
realized. A functional block diagram of the generator behavioral
model is made up of voltage-controlled current sources (VCCS),
current-controlled voltage source (CCVS), amplifiers, integrators,
and various other components that determine the electrical output
power of the generator. FIG. 41 represents the block diagram of the
analog behavioral model of the generator.
[0170] Similar to I.sub.e, the current, I.sub.m, is used to
quantify mechanical power. The difference between this current and
I.sub.e is used as the input into the double integrator scaled as
voltage. The voltage output of the double integrator is equivalent
to the electrical angle given by (21). By definition, electrical
power output is dependent on generator terminal voltage and
current. Upon calculation of the electrical power,
P.sub.e(.delta.), it will be scaled to a corresponding current
value, I.sub.e(V), and the process updates itself until a possible
stable steady-state solution is attained.
[0171] To achieve this operation on a circuit, appropriate CMOS
devices must be selected. Considerations used for selection include
minimal device power consumption, suitable device operating ranges,
and transconductance. The devices selected for this model will use
commercial off-the-shelf (COTS) CMOS components including the
National Semiconductor LM13700 OTA [307] and the Analog Devices
AD639 Trigonometric Converter [308]. Having the devices selected,
an example of the analog emulator will be discussed, along with
emulation results.
Power System Example
[0172] The model in FIG. 41 was used in a simple power system
problem [39]. For this example, the generator delivered power to an
infinite bus through a transmission line and the solution of the
electrical angle can be seen as described in equation (19).
[0173] The calculation of the steady-state solution of the
electrical angle of the generator can be found by solving the
instantaneous generator power for the system,
P m = E V ( X d ' + X L ) sin .delta. .smallcircle. P m = P e , max
sin .delta. .smallcircle. ( 26 ) ##EQU00017##
where |E| is the internal generator voltage, |V| is the bus
voltage, X'.sub.d is the transient reactance of the generator, and
X.sub.L is the line reactance. For a given mechanical input power,
the steady-state solution of the electrical angle can be found with
given system parameters.
[0174] The analog circuit for the single-machine power system
consisted of four OTAs, a trigonometric converter, and resistors
and capacitors as seen in FIG. 42. The controllable gains were
calculated in order to fit the circuit to the power system, which
was dependent on the selection of the resistor and capacitors in
the circuit. The selection of the resistor values were needed to
scale the parameters to proper voltage levels for the OTAs to
function within their linear operating range. The capacitor
selection determined the integration time of the circuit.
[0175] The gain representing the maximum electrical power output,
P.sub.e,max, is transformed into the maximum current gain parameter
A, which is equivalent to I.sub.e,max. The maximum allowable
current parameter will be controlled through the gain of an OTA,
g.sub.mA. This can be changed for different power system parameters
in the following manner from equations (24) and (26).
A = K p P e , max ; g m 1 = A R 1 + R 2 R 2 g m 1 = K p E V ( R 1 +
R 2 ) ( X d ' + X L ) R 2 ( 27 ) ##EQU00018##
[0176] The two gains of the OTAs within the double integrator are
considered identical to one another. Using equations (21), (24),
and (25), the gains can be characterized in terms of the
following:
M = H .pi. f 0 g m 1 = g m 2 = P m C 1 C 2 .tau. 2 .upsilon. K p M
R .pi. ( 28 ) ##EQU00019##
[0177] For equation (27), the power system parameters are directly
associated with a controllable gain parameter. In equation (28),
the controllable gain is directly associated with the generator
inertia constant and the scaling parameters defined in equations
(20) and (21). This allows for a completely reconfigurable
classical generator model that provides computational speedup for
conventional digital methods.
[0178] The reconfigurable classical generator model has been tested
in a single-machine power system. The results of the simulation
shown are from the circuit emulator of FIG. 42 using PSpice and
will be compared to simulation results obtained using PSpice Analog
Behavior Models (ABM). This comparison is validated from previous
work [310] conducted using Analog Behavior Models of sample power
systems, where the results obtained compared rather favorably with
traditional load flow solvers such as PSS/E and
PowerWorld.RTM..
[0179] The power system parameters and the associated circuit
parameters based on the previous equations for the simulation
example are listed in Table 14. For this example, the mechanical
input power is fixed constant at 0.5 per unit (p.u.) while the
emulator solves for a steady-state solution.
[0180] The results from the power system can be seen in FIG. 43a
and the results of the analog emulator can be seen in FIG. 43b.
With both converging to approximately the same steady-state
solution, the analog emulator solved from the electrical angle of
the swing equation much faster than the actual power system. This
clearly demonstrates the capability of the computational speedup
the analog emulator possesses with respect to real-time.
TABLE-US-00015 TABLE 14 System and Circuit Parameters Power System
Circuit Emulator |E| 1.8 C.sub.1 (.eta.F) 100 |V| 1.0 C.sub.2
(.eta.F) 100 X.sub.d (p.u.) 1.0 g.sub.m1 (mho) 0.0182 X.sub.L
(p.u.) 0.4 g.sub.m2 (mho) 0.0182 H (sec) 5.0 K.sub.D 1.0 10.sup.-4
P.sub.m (p.u.) 0.5 .nu. (V) 3.6 f.sub.o (Hz) 60 .tau. 5.0 10.sup.3
A (.mu.A) 129 I.sub.m (.mu.A) 50 g.sub.mA (mho) 0.0130
[0181] Upon closer inspection, the results of the actual angular
solution are different with a large degree of error. The ABM power
system model solved for the angular solution of 22.89.degree.,
while the OTA analog emulator model solved for the angular solution
of 27.79.degree.. The difference between the two results has an
error of approximately 21-22%. Since the steady-state calculation
of the actual electrical angle using equation (26) results in an
angle of 22.89.degree., the analog emulator seems to produce the
error.
[0182] Since the present invention focuses on the study of load
flow analysis of power systems, and not the intricate design of
electronic devices, the offset correction method that will be
applied to this example will be the post-processing of the data to
correct for the offset of the devices.
[0183] To correct for the offset, the circuit was set with zero
mechanical input power, Pm=0. The circuit emulator provided an
offset solution because the gains amplify the offsets within the
devices. This angular offset solution was subtracted from the
PSpice emulator simulation for the given mechanical power. FIG. 44
shows the results of the offset correction compared to the original
results of the analog emulator with PSpice. With the offset
eliminated from the PSpice solution, the corrected angular solution
is now 22.74.degree., which is much closer to the desired solution.
The results from this example and the errors calculated are
tabulated in Table 15.
TABLE-US-00016 TABLE 15 Result Comparison and Errors Method Delta
(deg) Error (%) Calculation 22.185 -- PSpice ASM 22.886 0.00 PSpice
OTA 27.787 21.4 PSpice OTA 22.740 0.63 Offset Correction
[0184] This single-machine infinite-bus power system example shows
that the analog emulator achieves computational speedups over the
real power system. The disadvantage of the analog circuit is that
the accuracy of the circuit may not be as good as current digital
computer methods for solving load flow. The results show with
offset correction methods, an advantageous tradeoff is made between
accuracy and computational speedup.
[0185] The example provided results of the advantages of the analog
emulator including the ability to reconfigure system parameters and
the computational speedups achieved in the circuit. The present
invention is applicable to many different areas. One method would
be the introduction of initial conditions on the integrator to
start at a value that may be closer to a desired stable solution.
Also, more complicated models could be developed to expand this
model to include damping and higher order generator models.
[0186] As has been demonstrated above, the present invention is
useful for modeling of power systems and power system components.
This can be used for fail-safe planning, as well as for operations
planning. Thus, the present system is flexible enough to allow
long-term planning over the life of the system, as well as to
facilitate short-term operations planning for operating the system
as little as 24 hours to 7 days ahead of time. This is possible
since the present invention uses the hardware up to about 10 times
more efficiently than all-purpose computing platforms.
[0187] The hardware of the present invention, including the FPGA
and related components, are flexible enough to be customized for a
variety of different calculations. Thus, although the present
invention has been exemplified for power system planning, it is
generally applicable to other systems requiring complex
computations as well.
[0188] It is to be understood, however, that even though numerous
characteristics and advantages of the present invention have been
set forth in the foregoing description, together with details of
the structure and function of the invention, the disclosure is
illustrative only, and changes may be made in detail, especially in
matters of shape, size and arrangement of parts within the
principles of the invention to the full extent indicated by the
broad general meaning of the terms.
REFERENCES
[0189] [1] Feng Tu and A. J. Flueck, A message-passing
distributed-memory parallel power flow algorithm, Power Engineering
Society Winter Meeting, 2002. IEEE, Volume 1 (2002), pp 211-216.
[0190] [2] X. Wang and S. G. Ziavras, Parallel Direct Solution of
Linear Equations on FPGA-Based Machines, Workshop on Parallel and
Distributed Real-Time Systems (17th Annual IEEE International
Parallel and Distributed Processing Symposium), Nice, France, April
22-23, 2003. [0191] [3] Keith Underwood, FPGAs vs. CPUs: Trends in
peak floating point performance, ACM/SIGDA Twelfth ACM
International Symposium on Field-Programmable Gate Arrays,
(Monterrey, Calif.), February 2004. [0192] [4] A. R. Bergen, V.
Vittal, Power Systems Analysis, 2nd Edition, Prentice Hall, 2000.
[0193] [5] P. Amestoy, T. A. Davis, and I. S. Duff, An approximate
minimum degree ordering algorithm, SIAM J. Matrix Anal. Applic., 17
(1996), pp. 886-905. [0194] [6] M. Yannakakis. Computing the
minimum fill-in is NP-Complete, SIAM J. Alg. Disc. Meth., 2 (1981),
pp. 77-79. [0195] [7] W. H. Press, S. A. Teukolsky, W. T. Vettering
and Brian P. Flannery, Numerical Recipes in C: the Art of
Scientific Computing, 2nd Edition, New York: Cambridge University
Press, 1992. [0196] [8] T. A. Davis and I. S. Duff, An
unsymmetric-pattern multifrontal method for sparse lu
factorization, SIAM J. Matrix Anal. Applic., 18 (1997), pp.
140-158. [0197] [101] R. Fried, R. S. Cherkaoui, and C. C. Enz,
"Low-Power CMOS, Analog Transient-Stability-Simulator for a
Two-Machine Power System," presented at ISCAS, Hong-Kong, 1997.
[0198] [102] R. Fried, R. S. Cherkaoui, C. C. Enz, A. Ger-mond, and
E. A. Vittoz, "Approaches for analog VLSI simulation of the
transient stability of large power networks," Ieee Transactions on
Circuits and Systems I-Fundamental Theory and Applications, vol.
46, pp. 1249-1263, 1999. [0199] [103] R. Fried, R. S. Cheikaoui, C.
C. Enz, A. Germond, and E. A. Vittoz, "Analog VLSI Solution to the
Stability Study of Power Networks," presented at ICECS, 1996.
[0200] [104] S. P. Carullo, M. Olaleye, and C. O. Nwankpa, "VLSI
Based Analog Power System Emulator for Fast Contingency Analysis,"
presented at Hawaii International Conference on System Science,
Hawaii, 2004. [0201] [105] J. Ramirez-Angulo and I. Grau, "Wide gm
Adjustment Range, Highly Linear OTA with Linear Programmable
Current Mirrors," presented at ISCAS, 1992. [0202] [106] Z. Wang,
"Making CMOS OTA a Linear Transconductor," Electronics Letters,
vol. 26, pp. 1448-1449, 1990. [0203] [107] C. A. Karybakas, C.
Kosmatopoulos, and T. Laopoulos, "Improved Temperature Compensation
of Otas," Electronics Letters, vol. 28, pp. 763-764, 1992. [0204]
[108] N. Semiconductor, "LM13700 Dual Operational Transconductance
Amplifiers with Linearizing Diodes and Buffers," 2000. [0205] [109]
A. Gratz, "Operational Transconductance Amplifiers," Unpublished.
[0206] [110] R. Wunderlich, J. Oehm, A. Dollberg, and K.
Schumacher, "A Linear Operational Transconductance Amplifier with
Automatic Offset Cancellation and Transconductance Calibration,"
presented at International Conference on Electronics, Circuits and
Systems, 1999. [0207] [111] "PowerWorld Simulator 10.0," PowerWorld
Corporation, 2004. [0208] [201] Capture CIS version 9.2.3, Cadence
Design Systems, Inc. [0209] [202] M. B. Olaleye and C. O. Nwankpa,
"Analog Behavioral Models for the Purpose of Analog Emulation of
Large Scale Power Systems," The Proceedings of the 36th North
American Power Symposium (NAPS), pp. 97-104, August 2004. [0210]
[203] B. Gilbert, "A monolithic Microsystem for Analog Synthesis of
Trigonometric Functions and Their Inverses", IEEE Journal of Solid
State Circuits, Vol. SC-17, No. 6, pp. 1179-1191, December 1982.
[0211] [204] P. C. Krause, T. A. Lipo and D. P. Carroll,
"Applications of Analog and Hybrid Computation in Electric Power
System Analysis," Proceedings of the IEEE, Vol. 62, No. 7, July
1974. [0212] [205] P. G. McLaren, P. Forsyth, A. Perks and P. R.
Bishop, "New Simulation Tools for Power Systems", Transmission and
Distribution Conference and Exposition, 2001. IEEE/PES, vol. 1, 28,
pp. 91-96, October-2 Nov. 2001. [0213] [206] A. Greenwood,
Electrical Transients in Power Systems. 2nd Edition, John Wiley
& Sons, Inc., New York, 1991 [0214] [207] G. A. Korn and T. M.
Korn, Electronic Analogue Computers (D-C Analogue Computers),
McGraw-Hill, New York, 2nd edition, 1956. [0215] [208] M. Hirakami
and W. Neugebauer, "Transient Network Analyzer Operation with
Digital Computer Control and Analysis," IEEE Transactions on Power
Apparatus and Systems, Vol. PAS-100, No. 4, April 1981. [0216]
[209] M. Crow, Computational Methods for Electric Power Systems.
CRC Press, 2003. [0217] [210] R. W. Newcomb and J. D. Lohn, "Analog
VLSI for Neural Networks" in Handbook of Brain Theory and Neural
Networks, M. Arbib Ed., Bradford Books, MIT Press, 1995. [0218]
[211] R. Fried, R. S. Cherkaoui, C. C. Enz, A. Germond, and E. A.
Vittoz, "Approaches for Analog VLSI Simulation of the Transient
Stability of Large Power Networks," IEEE Transactions on Circuits
and Systems-I: Fundamental Theory and Applications, Vol. 46, No.
10, pp. 1249-1263, October 1999. [0219] [212] PowerWorld Simulator
v.9.1, PowerWorld Corporation, Champaign Ill. [0220] [213] Power
System Simulator for Engineering (PSS/E)-28.1.4, Power
Technologies, Inc. [0221] [301] J. B. Ku and S. C. Lin, Low-Voltage
SOI CMOS VLSI Devices and Circuits. New York: John Wiley & Sons
Inc, 2001. [0222] [302] R. Fried, R. Cherkaoui, C. Enz, A. Germond,
and E. Vittoz, "Approaches for analog VLSI simulation of the
transient stability of large power networks," IEEE Transactions on
Circuits and Systems I-Fundamental Theory and Applications, vol.
46, pp. 1249-1263, 1999. [0223] [303] J. Gu, G. G. Karady, and R.
Farmer, "Real-time analysis of transient stability using
reconfigurable analog VLSI," IEEE Transactions on Power Systems,
vol. 18, pp. 1207-1209, 2003. [0224] [304] J. Yakaski and C.
Nwankpa, "Insight into Reconfigurable Analog Emulation of the
Classical Generator Model," presented at The 36th Annual North
American Power Symposium, University of Idaho, Moscow, Id., 2004.
[0225] [305] H. E. Brown, Solution of Large Networks by Matrix
Methods. New York: Wiley, 1985. [0226] [306] R. L. Geiger and E.
Sanchez-Sinencio, "Active Filter Design Using Operational
Transconductance Amplifiers: A Tutorial," in IEEE Circuits and
De-vices Magazine, vol. 1, 1985, pp. 20-32. [0227] [307] National
Semiconductor, LM13700 Dual Operational Transconductance Amplifiers
with Linearizing Diodes and Buffers, August 2000. [0228] [308]
Analog Devices, AD639 Universal Trigonometric Function Converter,
October 1994. [0229] [309] A. Bergen and V. Vittal, Power System
Analysis, 2nd ed: Prentice Hall, 2000. [0230] [310] M. Olaleye and
C. Nwankpa, "Analog Behavioral Models for the Purpose of Analog
Emulation of Large Scale Power Systems," presented at The 36th
Annual North American Power Symposium, University of Idaho, Moscow,
Id., 2004. [0231] [311] R. Wunderlich, J. Oehm, A. Dollberg, and K.
Schumacher, "A Linear Operational Transconductance Amplifier with
Automatic Offset Cancellation and Transconductance Calibration,"
presented at The 6th IEEE International Conference on Electronics,
Circuits and Systems, University of Patras, Greece, 1999.
* * * * *
References