U.S. patent number 3,766,370 [Application Number 05/143,578] was granted by the patent office on 1973-10-16 for elementary floating point cordic function processor and shifter.
This patent grant is currently assigned to Hewlett-Packard Company. Invention is credited to John S. Walther.
United States Patent |
3,766,370 |
Walther |
October 16, 1973 |
ELEMENTARY FLOATING POINT CORDIC FUNCTION PROCESSOR AND SHIFTER
Abstract
Three arithmetic units including three shifters are operated in
parallel and controlled by a microprogram stored in a read-only
memory to provide an improved elementary function floating-point
processor. The microprogram includes a set of routines for
calculating 20 elementary functions including arithmetic,
exponential, hyperbolic, logarithmic, square root, and
trigonometric functions. Each shifter is capable of reading a fixed
plural number of consecutive bits, beginning with any bit position,
from an associated data storage register.
Inventors: |
Walther; John S. (Sunnyvale,
CA) |
Assignee: |
Hewlett-Packard Company (Palo
Alto, CA)
|
Family
ID: |
22504668 |
Appl.
No.: |
05/143,578 |
Filed: |
May 14, 1971 |
Current U.S.
Class: |
708/494; 708/230;
708/274; 708/277; 708/276 |
Current CPC
Class: |
G06F
17/10 (20130101); G06F 7/5446 (20130101) |
Current International
Class: |
G06F
7/544 (20060101); G06F 7/48 (20060101); G06F
17/10 (20060101); G06f 007/00 (); G06f
007/38 () |
Field of
Search: |
;235/156,159,160,164,197
;444/1 ;340/172.5 |
References Cited
U.S. Patent Documents
Other References
J. Volder, "The CORDIC Trigonometric Computing Technique," IRE
Trans. on Electronic Computers, Sept. 1959, pp. 330-334.
|
Primary Examiner: Atkinson; Charles E.
Assistant Examiner: Malzahn; David H.
Claims
I claim:
1. A floating point CORDIC processor for calculating trigonometric,
hyperbolic, and linear elementary functions, said floating point
CORDIC processor comprising:
input means for receiving input information and input control
signals;
output means for providing output information and output control
signals;
first, second, and third arithmetic units coupled in parallel for
performing floating point CORDIC calculations, each of said first,
second, and third arithmetic units including an adder-subtractor, a
data register, and a fixed plural-bit shifting unit;
coupling means for selectively intercoupling the adder-subtractors,
the data registers, and the fixed plural-bit shifting units of the
first, second, and third arithmetic units;
storage means for storing a plurality of floating point CORDIC
routines and a plurality of tables of uniquely determined floating
point CORDIC constants; and
control means coupled to the first, second, and third arithmetic
units, to the storage means, and to the coupling means, said
control means being responsive to the input control signals and to
the input information for selecting different ones of the floating
point CORDIC routines and associated floating point CORDIC
constants stored in the storage means and for selectively enabling
different portions of the coupling means.
2. A floating point CORDIC processor as in claim 1 wherein:
said tables of uniquely determined floating point CORDIC constants
stored in the storage means include tables of plural-bit rotation
and distortion constants for use in performing trigonometric and
hyperbolic floating point CORDIC calculations;
said control means includes first logic means coupled to the first,
second, and third arithmetic units for automatically reselecting a
plural-bit rotation or distortion constant when, within the
accuracy of the floating point CORDIC processor, the bits of that
constant are identical to the bits of the next plural-bit rotation
or distortion constant to be selected; and
said control means includes second logic means coupled to the
first, second, and third arithmetic units for automatically
reselecting a prescribed set of plural-bit distortion constants for
converging hyperbolic floating point CORDIC routines.
3. A floating point CORDIC processor as in claim 1 wherein said
coupling means comprises:
first coupling means for intercoupling the adder-subtractor of the
first arithmetic unit with the data register and the fixed
plural-bit shifting unit of the first arithmetic unit, with the
fixed plural-bit shifting unit of the second arithmetic unit, with
the adder-subtractor and the fixed plural-bit shifting unit of the
third arithmetic unit for transmitting information and control
signals therebetween;
second coupling means for intercoupling the adder-subtractor of the
second arithmetic unit with the adder-subtractor and the fixed
plural-bit shifting unit of the first arithmetic unit, with the
data register and the fixed plural-bit shifting unit of the second
arithmetic means, and with the adder-subtractor of the third
arithmetic unit for transmitting information and control signals
therebetween; and
third coupling means for intercoupling the adder-subtractor of the
third arithmetic unit with the data register and the fixed
plural-bit shifting unit of the third arithmetic unit for
transmitting information and control signals therebetween.
4. Apparatus for shifting in parallel a fixed plural number of
consecutive bits beginning with any bit position from an ordered
set of bits, said apparatus comprising:
first logic means for defining a first plurality of overlapping
groups of consecutive bits from the ordered set of bits, each of
these groups comprising a predetermined number of consecutive bits
beginning with a different bit position in the ordered set of bits,
the number of bits in each of these groups being less than the
number of bits in the ordered set of bits and being greater than
the fixed plural number of consecutive bits to be shifted in
parallel from the ordered set of bits;
first decoding means coupled to the first logic means for selecting
any one of the first plurality of overlapping groups of consecutive
bits from the ordered set of bits in response to an input control
signal;
second logic means coupled to the first logic means for defining a
second plurality of overlapping groups of consecutive bits from the
group of consecutive bits selected by the first decoding means,
each of these groups comprising a predetermined number of
consecutive bits beginning with a different bit position in the
group of consecutive bits selected by the first decoding means, the
number of bits in each of these groups being less than the number
of bits in the group of consecutive bits selected by the first
decoding means and being equal to the fixed plural number of
consecutive bits to be shifted in parallel from the ordered set of
bits;
second decoding means coupled to the second logic means for
selecting any one of the second plurality of overlapping groups of
consecutive bits in response to an input control signal; and
output means coupled to the second logic means for outputting in
parallel the group of consecutive bits selected by the second
decoding means.
5. Apparatus for shifting in parallel a fixed plural number of
consecutive bits beginning with any bit position from an ordered
set of bits, said apparatus comprising:
logic means for defining a plurality of overlapping groups of
consecutive bits from the ordered set of bits, each of these groups
comprising a predetermined number of consecutive bits beginning
with a different bit position in the ordered set of bits, the
number of bits in each of these groups being less than the number
of bits in the ordered set of bits and being equal to the fixed
plural number of consecutive bits to be shifted in parallel from
the ordered set of bits;
decoding means coupled to the logic means for selecting any one of
the plurality of overlapping groups of consecutive bits from the
ordered set of bits in response to an input control signal; and
output means coupled to the logic means for outputting in parallel
the group of consecutive bits selected by the decoding means.
Description
BACKGROUND OF THE INVENTION
This invention relates to a coordinate rotational digital computer
(CORDIC) floating-point processor for computing elementary
functions and to a data shifter for use therein.
Conventional minicomputers without floating-point hardware, such as
the Hewlett-Packard 2116B, typically have a floating-point software
precision of about 24 bits of mantissa and 8 bits of exponent
(i.e., double precision) and a processing time of about 500
microseconds for addition, subtraction, multiplication, and
division and about 10,000 microseconds for trigonometric and
hyperbolic functions. Some floating-point processors are available
for increasing the precision, decreasing the processing time, and
extending the capability of such minicomputers. However, these
floating-point processors can only perform a few elementary
functions, typically limited to addition, subtraction,
multiplication, division, and, in some cases, square root. A CORDIC
processor is available that can perform many additional elementary
functions including trigonometric functions. However, this CORDIC
processor is unable to handle floating-point arguments and is
inaccurate for small arguments. In addition, it employs bit-serial
shifters and adder-subtractors which makes the processing time
slower than desired. Parallel shifters and adder-subtractors could
be employed to increase its speed. However, to do so would
substantially increase the cost of the CORDIC processor since
parallel shifters and adder-subtractors are much more expensive
than bit-serial shifters and adder-subtractors.
SUMMARY OF THE INVENTION
An object of this invention is to provide an improved processor for
extending the hardware computing capability of a minicomputer.
Another object of this invention is to provide a faster and more
accurate processor for performing an increased number of elementary
functions economically.
Another object of this invention is to provide a CORDIC
floating-point processor.
Another object of this invention is to provide a CORDIC
floating-point processor for handling floating-point arguments and
maintaining full accuracy for small arguments.
Still another object of this invention is to provide a shifter that
is faster than conventional bit-serial shifters and less expensive
than conventional parallel shifters.
Other and incidental objects of this invention will become apparent
from a reading of this specification and an inspection of the
accompanying drawings.
These objects are accomplished according to the preferred
embodiment of this invention by employing three arithmetic units
and three shifters operated in parallel, a read-only memory, a data
storage register for storing constants read from the read-only
memory, and control logic to provide a CORDIC floating-point
processor that may be controlled by a microprogram stored in the
read-only memory or, alternatively, by a tester. The microprogram
includes a set of routines for performing the elementary functions
of addition, subtraction, multiplication, division, absolute value,
entier, complement, cosine, sine, tangent, arctangent, hyperbolic
cosine, hyperbolic sine, hyperbolic tangent, archyperbolic tangent,
exponential, natural logarithm, square root, round to nearest
integer, and round to twenty-four bits. The routines for performing
multiplication, division, the aforementioned trigonometric and
hyperbolic functions, natural logarithm, exponential, and square
root are based on a unified algorithm for performing all of these
last-mentioned functions in order to simplify the control and
hardware of the CORDIC floating-point processor. Each routine is
optimized to perform one or more elementary functions. The control
logic allows two levels of microprogrammed subroutines and permits
conditional and imperative microinstructions to be executed
simultaneously. Each arithmetic unit includes an adder-subtractor
and an associated data storage register. The contents of these data
storage registers are prescaled before performing selected
elementary functions to provide an argument having a scale factor
and a mantissa that falls within the domain of convergence of the
unified algorithm. If the mantissa is small, it may also be
prenormalized and maintained in a normalized form while the
selected elementary function is performed. Certain steps of the
unified algorithm for performing hyperbolic functions are repeated
to maintain the convergence requirement while decreasing the
processing time. Each shifter is capable of reading a fixed plural
number of consecutive bits, beginning with any bit position, from
an associated one of the data storage registers.
DESCRIPTION OF THE DRAWINGS
FIG. 1 represents the angle and radius of a vector in a coordinate
system parameterized by m.
FIG. 2 illustrates the input-output functions for CORDIC modes.
FIG. 3 is a simplified schematic diagram of a floating point CORDIC
processor according to the preferred embodiment of this
invention.
FIG. 4 is a simplified flow chart of the microprogram control for
the floating point CORDIC processor of FIG. 3.
FIG. 5 represents different data formats including the extended
floating point data format employed by the floating point CORDIC
processor of FIG. 3.
FIG. 6 illustrates the ranges of extended floating point
numbers.
FIG. 7 is a simplified schematic diagram illustrating the unary
function routines.
FIG. 8 is a simplified schematic diagram illustrating the binary
function routines.
FIGS. 9A-D are detailed block diagrams of a floating point CORDIC
processor according to the preferred embodiment of this
invention.
FIG. 10 is a composite figure map of FIGS. 9A-D.
FIGS. 11A-E are logic diagrams of the read-only memory portions of
FIGS. 9A-D.
FIG. 12 is a composite figure map of FIGS. 11A-E.
FIG. 13 is a wiring diagram of the read-only memory of FIGS.
11A-E.
FIGS. 14A-R are logic diagrams of the read-only memory addressing
portions of FIGS. 9A-D.
FIG. 15 is a composite figure map of FIGS. 14A-R.
FIGS. 16A-N are logic diagrams of the A-adder of FIGS. 9A-D.
FIG. 17 is a composite figure map of FIGS. 16A-N.
FIGS. 18A-H are logic diagrams of the A-shifter of FIGS. 9A-D.
FIG. 19 is a composite figure map of FIGS. 18A-H.
FIGS. 20A-C are logic diagrams of the transfer gates for the
A-adder of FIGS. 9A-D.
FIG. 21 is a composite figure map of FIGS. 20A-C.
FIGS. 22A-N are logic diagrams of the B-adder of FIGS. 9A-D.
FIG. 23 is a composite figure map of FIGS. 22A-N.
FIGS. 24A-H are logic diagrams of the B-shifter of FIGS. 9A-D.
FIG. 25 is a composite figure map of FIGS. 24A-H.
FIGS. 26A-C are logic diagrams of the transfer gates for the
B-adder of FIGS. 9A-D.
FIG. 27 is a composite figure map of FIGS. 26A-C.
FIGS. 28A-N are logic diagrams of the C-adder of FIGS. 9A-D.
FIG. 29 is a composite figure map of FIGS. 28A-N.
FIGS. 30A-G are logic diagrams of the D-register of FIGS. 9A-D.
FIG. 31 is a composite figure map of FIGS. 30A-G.
FIGS. 32A-H are logic diagrams of the D-shifter of FIGS. 9A-D.
FIG. 33 is a composite figure map of FIGS. 32A-H.
FIGS. 34A-C are logic diagrams of the transfer gates for the
C-adder of FIGS. 9A-D.
FIG. 35 is a composite figure map of FIGS. 34A-C.
FIGS. 36A-G are logic diagrams of an interface card for the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 37 is a composite figure map of FIGS. 36A-G.
FIGS. 38A-I are logic diagrams of a tester for use with the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 39 is a composite figure map of FIGS. 38A-I.
FIGS. 40A-J are schematic diagrams of the power supply for the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 41 is a composite figure map of FIGS. 40A-J.
FIGS. 42A-F are back plane wiring diagrams for the floating point
CORDIC processor of FIGS. 9A-D.
FIG. 43 is a composite figure map of FIGS. 42A-F.
FIGS. 44A-D are flow charts for the entry routine for the floating
point CORDIC processor of FIGS. 9A-D.
FIG. 45 is a composite figure map of FIGS. 44A-D.
FIGS. 46A-C are flow charts for the entier, fix, load, load I, and
round routines for the floating point CORDIC processor of FIGS.
9A-D.
FIG. 47 is a composite figure map of FIGS. 46A-C.
FIGS. 48A-D are flow charts for the addition, subtraction,
absolute, and negative routines for the floating point CORDIC
processor of FIGS. 9A-D.
FIG. 49 is a composite figure map of FIGS. 48A-D.
FIGS. 50A-C are flow charts for the overflow and normalize routines
for the floating point CORDIC processor of FIGS. 9A-D.
FIG. 51 is a composite figure map of FIGS. 50A-C.
FIGS. 52A-B are flow charts for the multiply and initialize
routines for the floating point CORDIC processor of FIGS. 9A-D.
FIG. 53 is a composite figure map of FIGS. 52A-B.
FIGS. 54A-B are flow charts for the division routine for the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 55 is a composite figure map of FIGS. 54A-B.
FIGS. 56A-D are flow charts for the sine, cosine, tangent, and
hyperbolic prescale routines for the floating point CORDIC
processor of FIGS. 9A-D.
FIG. 57 is a composite figure map of FIGS. 56A-D.
FIGS. 58A-C are flow charts for the sine resolver routine for the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 59 is a composite figure map of FIGS. 58A-C.
FIGS. 60A-C are flow charts for the arctangent routine for the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 61 is a composite figure map of FIGS. 60A-C.
FIGS. 62A-D are flow charts for the hyperbolic prescale routine for
the floating point CORDIC processor of FIGS. 9A-D.
FIG. 63 is a composite figure map of FIGS. 62A-D.
FIGS. 64A-D are flow charts for the hyperbolic sine and hyperbolic
cosine resolver routines.
FIG. 65 is a composite figure map of FIGS. 64A-D.
FIGS. 66A-D are flow charts for the natural log, hyperbolic
arctangent and square root prescale routine for the floating point
CORDIC processor of FIGS. 9A-D.
FIG. 67 is a composite figure map of FIGS. 66A-D.
FIGS. 68A-C are flow charts for the arc hyper resolve routine for
the floating point CORDIC processor of FIGS. 9A-D.
FIG. 69 is a composite figure map of FIGS. 68A-C.
FIGS. 70A-B are flow charts for the diagnostic routine for the
floating point CORDIC processor of FIGS. 9A-D.
FIG. 71 is a composite figure map of FIGS. 70A-B.
FIG. 72 illustrates a clock timing diagram for the floating point
CORDIC processor of FIGS. 9A-D.
FIG. 73 illustrates the instruction execution timing diagram for
the floating point CORDIC processor of FIGS. 9A-D.
FIG. 74 represents the instruction coding of the read-only memory
of the floating point CORDIC processor of FIGS. 9A-D.
FIG. 75 illustrates the addressable reading of a data storage
register by an associated one of the shifters of the floating
point CORDIC processor of FIGS. 9A-D.
FIG. 76 illustrates the A-shifter selection process of the floating
point CORDIC processor of FIGS. 9A-D.
FIG. 77 is a block diagram of a power supply for the floating point
CORDIC processor of FIGS. 9A-D.
FIG. 78 illustrates the voltage limit ranges of the power supply of
FIG. 77.
FIG. 79 shows a partial filter power supply waveform used to detect
power failure.
DESCRIPTION OF THE PREFERRED EMBODIMENT
INTRODUCTION
A floating-point processor (FPP) for extending the hardware
computing capability of a minicomputer (COMP), such as the
Hewlett-Packard 2116B, to include high-speed, extended-precision
mathematical functions is described herein. The processing time of
the FPP is in the range of 20 to 75 microseconds for floating-point
addition, subtraction, multiplication, and division and in the
range of 20 to 200 microseconds for the remaining mathematical
functions. The precision of the FPP is 40 bits of mantissa and 8
bits of exponent (i.e., triple precision), which is equivalent to
an accuracy of 12 decimal digits, and the accuracy is limited only
by the truncation of input arguments. Triple-store and triple-load
instructions are provided to transfer the triple-length quantities
between an extended floating-point accumulator and three
consecutive memory locations of the COMP. To provide compatibility
with double-precision data, instructions are also provided for
performing the necessary format conversions. Both integer and
floating-point double-precision data may be used.
Functionally, the FPP can be regarded as a calculator under the
control of the COMP. In the same way that a human operator manually
uses a free-standing calculator, the COMP enters an argument into
the FPP and then issues a command to calculate a selected function
of the argument. The answer appears in much less time than the COMP
would have taken to calculate it. Thus, the benefits derived from
the FPP by the COMP are much the same as the benefits derived from
the calculator by the human operator, namely, speed and
efficiency.
A UNIFIED ALGORITHM FOR ELEMENTARY FUNCTIONS
The FPP includes a set of routines based on a single unified
algorithm for the calculation of elementary functions including
multiplication, division, sine, cosine, tangent, arctangent,
hyperbolic sine, hyperbolic cosine, hyperbolic tangent,
archyperbolic tangent, natural logarithm, exponential, and square
root. The basis for the unified algorithm is coordinate rotation in
a linear, circular, or hyperbolic coordinate system depending on
which function is to be calculated. The only operations required
are shifting, adding, subtracting, and the recall of prestored
constants.
Referring to FIG. 1, the radius R and the angle A of the vector P =
(x,y) are defined as follows in a coordinate system parameterized
by m:
$$R = \left[x^{2} + m\, y^{2}\right]^{1/2} \qquad (1)$$
$$A = m^{-1/2} \tan^{-1}\!\left[m^{1/2}\, y/x\right]. \qquad (2)$$
It can be shown that R is the distance from the origin to the
intersection of the curve of constant radius with the x axis, while
A is twice the area enclosed by the vector, the x axis, and the
curve of constant radius, divided by the radius squared. The curves
of constant radius for the circular (m=1), linear (m=0), and
hyperbolic (m=-1) coordinate systems are shown in FIG. 1.
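As a minimal sketch of equations (1) and (2), the radius and angle can be evaluated for the three coordinate systems as follows. The function name `radius_and_angle` is hypothetical, and the sketch assumes x > 0 (and |y| < x in the hyperbolic case); the three branches are the real-valued forms that the general expression for A takes for m = 1, 0, and -1.

```python
import math

def radius_and_angle(x: float, y: float, m: int) -> tuple[float, float]:
    """Radius R and angle A of P = (x, y) per equations (1) and (2), for x > 0."""
    r = math.sqrt(x * x + m * y * y)
    if m == 1:
        a = math.atan(y / x)    # circular system
    elif m == 0:
        a = y / x               # linear system (limit of (2) as m -> 0)
    elif m == -1:
        a = math.atanh(y / x)   # hyperbolic system, needs |y| < x
    else:
        raise ValueError("m must be 1, 0, or -1")
    return r, a
```

For instance, `radius_and_angle(3.0, 4.0, 1)` gives the familiar circular radius of 5, while `radius_and_angle(5.0, 3.0, -1)` gives the hyperbolic radius of 4.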
Let a new vector $P_{i+1} = (x_{i+1}, y_{i+1})$ be obtained from
$P_i = (x_i, y_i)$ according to
$$x_{i+1} = x_i + m\, y_i\, \delta_i \qquad (3)$$
$$y_{i+1} = y_i - x_i\, \delta_i \qquad (4)$$
where m is the parameter for the coordinate system, and $\delta_i$
is an arbitrary value. The angle and radius of the new vector in
terms of the old are given by
$$A_{i+1} = A_i - \alpha_i \qquad (5)$$
$$R_{i+1} = R_i\, K_i, \qquad (6)$$
where
$$\alpha_i = m^{-1/2} \tan^{-1}\!\left[m^{1/2}\, \delta_i\right] \qquad (7)$$
$$K_i = \left[1 + m\, \delta_i^{2}\right]^{1/2}. \qquad (8)$$
The angle and radius are modified by quantities which are
independent of the coordinate values. Table 1 gives the equations
for $\alpha_i$ and $K_i$ after applying identities A2 and A5 in
Table 2.
TABLE 1
Angles and Radius Factors
Coordinate system m    Angle $\alpha_i$         Radius factor $K_i$
 1                     $\tan^{-1} \delta_i$     $(1 + \delta_i^{2})^{1/2}$
 0                     $\delta_i$               $1$
-1                     $\tanh^{-1} \delta_i$    $(1 - \delta_i^{2})^{1/2}$
##SPC1##
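The claim that one iteration changes the angle by $\alpha_i$ and the radius by the factor $K_i$, independent of the coordinates, can be checked numerically. The following is only an illustrative sketch: the helper names (`step`, `alpha_and_k`, `check_step`) are hypothetical, and x > 0 is assumed so that the angle is well defined.

```python
import math

def angle(x, y, m):
    t = y / x
    return math.atan(t) if m == 1 else t if m == 0 else math.atanh(t)

def radius(x, y, m):
    return math.sqrt(x * x + m * y * y)

def step(x, y, m, delta):
    """One iteration of equations (3) and (4)."""
    return x + m * y * delta, y - x * delta

def alpha_and_k(m, delta):
    """Angle decrement and radius factor as listed in Table 1."""
    if m == 1:
        return math.atan(delta), math.sqrt(1 + delta * delta)
    if m == 0:
        return delta, 1.0
    return math.atanh(delta), math.sqrt(1 - delta * delta)

def check_step(x, y, m, delta):
    """Return the mismatch between a real step and equations (5) and (6)."""
    x1, y1 = step(x, y, m, delta)
    alpha, k = alpha_and_k(m, delta)
    angle_drop = angle(x, y, m) - angle(x1, y1, m)
    radius_ratio = radius(x1, y1, m) / radius(x, y, m)
    return abs(angle_drop - alpha), abs(radius_ratio - k)
```

Calling `check_step(2.0, 0.5, m, 0.125)` for each of m = 1, 0, -1 should return two values that are zero to within rounding error.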
For n iterations we find:
$$A_n = A_0 - \alpha \qquad (9)$$
$$R_n = R_0\, K, \qquad (10)$$
where ##SPC2## ##SPC3##
The total change in angle is just the sum of the incremental
changes while the total change in radius is the product of the
incremental changes.
If a third variable z is provided for the accumulation of the angle
variations,
$$z_{i+1} = z_i + \alpha_i \qquad (13)$$
and the set of difference equations (3), (4), and (13) is solved
for n iterations, we find:
$$x_n = K\left\{x_0 \cos(\alpha \sqrt{m}) + y_0\, m^{1/2} \sin(\alpha \sqrt{m})\right\} \qquad (14)$$
$$y_n = K\left\{y_0 \cos(\alpha \sqrt{m}) - x_0\, m^{-1/2} \sin(\alpha \sqrt{m})\right\} \qquad (15)$$
$$z_n = z_0 + \alpha, \qquad (16)$$
where $\alpha$ and K are as in equations (11) and (12). These
relations are summarized in FIG. 2 for m=1, m=0 and m=-1 for the
following special cases:
1. A is forced to zero: $y_n = 0$;
2. z is forced to zero: $z_n = 0$.
The initial values $x_0$, $y_0$, $z_0$ are shown on the left
of each block in FIG. 2 while the final values $x_n$, $y_n$,
$z_n$ are shown on the right. The identities given in Table 2
were used to simplify these results.
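The closed-form solution (14) through (16) can be compared against a direct iteration of the difference equations. This is a sketch under stated assumptions: the names `iterate` and `closed_form` are hypothetical, the total angle and radius factor are accumulated as the sum and product of the per-step values (as the text describes), and the complex-looking terms in (14) and (15) are written out in their real forms for each m.

```python
import math

def iterate(x, y, z, m, deltas):
    """Apply equations (3), (4), and (13) once per delta.

    Returns the final (x, y, z) plus the accumulated angle and radius factor.
    """
    alpha_total, k_total = 0.0, 1.0
    for d in deltas:
        if m == 1:
            alpha = math.atan(d)
        elif m == 0:
            alpha = d
        else:
            alpha = math.atanh(d)
        x, y = x + m * y * d, y - x * d   # both updates use the old x and y
        z += alpha
        alpha_total += alpha
        k_total *= math.sqrt(1 + m * d * d)
    return x, y, z, alpha_total, k_total

def closed_form(x0, y0, z0, m, alpha, k):
    """Equations (14)-(16) in real arithmetic for m = 1, 0, -1."""
    if m == 1:
        c, s = math.cos(alpha), math.sin(alpha)
        return k * (x0 * c + y0 * s), k * (y0 * c - x0 * s), z0 + alpha
    if m == 0:
        return x0, y0 - x0 * alpha, z0 + alpha
    c, s = math.cosh(alpha), math.sinh(alpha)
    return k * (x0 * c - y0 * s), k * (y0 * c - x0 * s), z0 + alpha
```

With an arbitrary list such as `[0.5, -0.25, 0.125, -0.0625]` for `deltas`, the iterated and closed-form results should agree to within rounding error in all three coordinate systems.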
By the proper choice of the initial values the functions $x \cdot z$,
$y/x$, $\sin z$, $\cos z$, $\tan^{-1} z$, $\sinh z$, $\cosh z$, and
$\tanh^{-1} z$ may be obtained. In addition the following
functions may be generated:
$$\tan z = \sin z / \cos z \qquad (17)$$
$$\tanh z = \sinh z / \cosh z \qquad (18)$$
$$\exp z = \sinh z + \cosh z \qquad (19)$$
$$\ln w = 2 \tanh^{-1}[y/x], \text{ where } x = w + 1 \text{ and } y = w - 1 \qquad (20)$$
$$\sqrt{w} = \sqrt{x^{2} - y^{2}}, \text{ where } x = w + \tfrac{1}{4} \text{ and } y = w - \tfrac{1}{4}. \qquad (21)$$
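The derived-function identities (19) through (21) are easy to confirm numerically. In this sketch the Python `math` library stands in for the CORDIC core, so it illustrates only the identities, not the processor's iteration; the function names are hypothetical.

```python
import math

def exp_via_hyperbolics(z):
    """Equation (19): exp z = sinh z + cosh z."""
    return math.sinh(z) + math.cosh(z)

def ln_via_atanh(w):
    """Equation (20): ln w = 2 atanh(y / x) with x = w + 1, y = w - 1."""
    x, y = w + 1.0, w - 1.0
    return 2.0 * math.atanh(y / x)

def sqrt_via_radius(w):
    """Equation (21): sqrt(w) is the hyperbolic radius of (w + 1/4, w - 1/4)."""
    x, y = w + 0.25, w - 0.25
    return math.sqrt(x * x - y * y)
```

Equation (21) works because $(w + \tfrac{1}{4})^{2} - (w - \tfrac{1}{4})^{2} = w$, so the hyperbolic radius of the chosen vector is exactly $\sqrt{w}$.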
The angle A of the vector P may be forced to zero by a converging
sequence of rotations $\alpha_i$ which at each step brings the
vector closer to the positive x axis. The magnitude of each element
of the sequence may be predetermined, but the direction of rotation
must be determined at each step such that
$$|A_{i+1}| = \bigl|\, |A_i| - \alpha_i \,\bigr|. \qquad (22)$$
The sum of the remaining rotations must at each step be sufficient
to bring the angle to at least within $\alpha_{n-1}$ of zero,
even in the extreme case where $A_i = 0$,
$|A_{i+1}| = \alpha_i$. Thus,
##SPC4##
The domain of convergence is limited by the sum of the rotations:
##SPC5## ##SPC6##
To show that A converges to within $\alpha_{n-1}$ of zero
within n steps we first prove the following theorem: ##SPC7##
which holds for $i \geq 0$.
We proceed by induction on i. The hypothesis (26) holds for i=0 by
(24). We now show that if the hypothesis is true for i then it is
also true for i+1. Subtracting $\alpha_i$ from (26) and applying
(23) at the left side yields ##SPC8##
Application of (22) then yields ##SPC9##
as was to be shown. Therefore, by induction, the hypothesis holds
for all $i \geq 0$.
In particular, the theorem is true for i=n so that
$$|A_n| < \alpha_{n-1}. \qquad (29)$$
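The bound (29) can be observed empirically for the circular system. The sketch below assumes the binary shift sequence $\delta_i = 2^{-i}$, i = 0, 1, ..., n-1 (the m = 1 row of the shift-sequence table given later), chooses the direction of each rotation from the sign of y, and assumes x > 0 initially; the function name is hypothetical.

```python
import math

def force_angle_to_zero(x, y, n):
    """Circular case (m = 1): rotate toward the positive x axis for n steps."""
    for i in range(n):
        d = 2.0 ** -i
        s = 1.0 if y >= 0 else -1.0   # positive delta lowers the angle, per (5)
        x, y = x + s * y * d, y - s * x * d
    return x, y

def final_angle(a0, n):
    """Angle remaining after n steps when starting from a unit vector at angle a0."""
    x, y = force_angle_to_zero(math.cos(a0), math.sin(a0), n)
    return math.atan2(y, x)
```

For starting angles inside the domain of convergence, `abs(final_angle(a0, n))` should not exceed `math.atan(2.0 ** -(n - 1))`, which is $\alpha_{n-1}$ for this sequence.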
The same scheme may be used to force the angle in z to zero. The
proof of convergence proceeds exactly as before except that A is
replaced by z in equations (22) through (29). By equation (25) z
has the same domain of convergence as A,
$$\max |z_0| = \max |A_0|. \qquad (30)$$
Note that since K is a function of $\delta_i^{2}$, where
$\delta_i = m^{-1/2} \tan[m^{1/2} \alpha_i]$, K is
independent of the sequence of signs chosen for the $\alpha_i$.
Thus, for a fixed sequence of $\alpha_i$ magnitudes the constant
1/K may be used as an initial value to counteract the factor K
present in the final values.
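Because K does not depend on the signs chosen, it can be computed once and its reciprocal loaded as the initial x. The following sketch computes sine and cosine by forcing z to zero in the circular system. It assumes the binary shifts $\delta_i = \pm 2^{-i}$ and 40 iterations; `N`, `ALPHAS`, `K`, and `sin_cos` are hypothetical names, and floating-point arithmetic stands in for the hardware's shift-and-add registers.

```python
import math

N = 40                                                   # number of iterations
ALPHAS = [math.atan(2.0 ** -i) for i in range(N)]        # prestored angle constants
K = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N))  # sign-independent radius factor

def sin_cos(z):
    """Return (sin z, cos z) for |z| within the domain of convergence (about 1.74)."""
    x, y = 1.0 / K, 0.0               # start at 1/K to cancel the final factor K
    for i, alpha in enumerate(ALPHAS):
        s = -1.0 if z >= 0 else 1.0   # sign of delta_i that moves z toward zero
        d = s * 2.0 ** -i
        x, y = x + y * d, y - x * d   # equations (3), (4) with m = 1
        z += s * alpha                # equation (13)
    return y, x
```

For example, `sin_cos(0.6)` should agree with `math.sin(0.6)` and `math.cos(0.6)` to roughly the size of the last rotation angle.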
The practical use of the algorithm is based on the use of shifters
to effect the multiplication by $\delta_i$. If $\rho$ is the
radix of the number system and $F_i$ is an array of integers,
where $i \geq 0$, then a multiplication of x by
$$\delta_i = \rho^{-F_i} \qquad (31)$$
is simply a shift of x by $F_i$ places to the right. The integers
$F_i$ must be chosen such that the angles
$$\alpha_{m,F} = m^{-1/2} \tan^{-1}\!\left(m^{1/2} \rho^{-F}\right) \qquad (32)$$
satisfy the convergence criterion (23). The domain of convergence
is then given by (25).
Table 3 shows some F sequences, convergence ranges, and radius
factors for a binary code.
TABLE 3
Shift Sequences for a Binary Code
radix    coordinate    shift sequence            domain of convergence    radius factor
$\rho$   system m      $F_{m,i}$, $i \geq 0$     $\max |A_0|$             K
2         1            0, 1, 2, 3, 4, i, ...     ~1.74                    ~1.65
2         0            1, 2, 3, 4, 5, i+1, ...   1.0                      1.0
2        -1            1, 2, 3, 4, 4, 5, ...*    ~1.13                    ~0.80
*for m = -1 the following integers are repeated: {4, 13, 40, 121,
..., k, 3k + 1, ...}
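The entries of Table 3 can be approximated by summing the rotation angles and multiplying the radius factors over each shift sequence. This is a sketch with hypothetical names (`shifts`, `alpha`, `summary`); the sequences are truncated at a shift of 45, beyond which the remaining terms are negligible in double precision. It yields roughly 1.74 and 1.65 for m = 1, 1.0 and 1.0 for m = 0, and roughly 1.12 and 0.83 for m = -1, in line with the rounded table entries.

```python
import math

def shifts(m, max_shift):
    """Binary shift sequence for coordinate system m, up to max_shift."""
    if m == 1:
        return list(range(0, max_shift + 1))
    if m == 0:
        return list(range(1, max_shift + 1))
    seq, repeat = [], 4
    for f in range(1, max_shift + 1):
        seq.append(f)
        if f == repeat:              # repeat 4, 13, 40, ... (k -> 3k + 1)
            seq.append(f)
            repeat = 3 * repeat + 1
    return seq

def alpha(m, f):
    """Rotation angle for shift f, equation (32) with radix 2."""
    d = 2.0 ** -f
    return math.atan(d) if m == 1 else d if m == 0 else math.atanh(d)

def summary(m, max_shift=45):
    """Sum of rotation angles and product of radius factors for system m."""
    seq = shifts(m, max_shift)
    total = sum(alpha(m, f) for f in seq)
    k = math.prod(math.sqrt(1 + m * 4.0 ** -f) for f in seq)
    return total, k
```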
The hyperbolic mode (m = -1) is somewhat complicated by the fact
that for $\alpha_i = \tanh^{-1}(2^{-i})$ the
convergence criterion (23) is not satisfied. However, it can be
shown that ##SPC10##
and that, therefore, if the integers {4, 13, 40, 121, ..., k, 3k
+ 1, ...} in the $F_i$ sequence are repeated then (23) becomes
true.
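A hyperbolic rotation using the repeated shifts can be sketched as follows. The names `hyperbolic_shifts` and `sinh_cosh` are hypothetical, the sequence is truncated at a shift of 40, and floating-point arithmetic again stands in for shift-and-add hardware. The argument must lie within the hyperbolic domain of convergence (about 1.1).

```python
import math

def hyperbolic_shifts(max_shift):
    """1, 2, 3, 4, 4, 5, ..., 13, 13, ...: the shifts 4, 13, 40, ... occur twice."""
    seq, repeat = [], 4
    for f in range(1, max_shift + 1):
        seq.append(f)
        if f == repeat:
            seq.append(f)
            repeat = 3 * repeat + 1   # next repeated shift: k -> 3k + 1
    return seq

def sinh_cosh(z, max_shift=40):
    """Return (sinh z, cosh z) by forcing z to zero with m = -1."""
    seq = hyperbolic_shifts(max_shift)
    k = math.prod(math.sqrt(1 - 4.0 ** -f) for f in seq)
    x, y = 1.0 / k, 0.0               # 1/K preset cancels the radius factor
    for f in seq:
        s = -1.0 if z >= 0 else 1.0   # sign of delta that moves z toward zero
        d = s * 2.0 ** -f
        x, y = x - y * d, y - x * d   # equations (3), (4) with m = -1
        z += s * math.atanh(2.0 ** -f)  # equation (13)
    return y, x
```

By equation (19), the exponential then follows as the sum of the two returned values.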
The limited domain imposed by the convergence criterion (25) may be
extended by means of the prescaling identities shown in Table 4.
For example, to calculate the sine of a large argument, we first
divide the argument by $\pi/2$ obtaining a quotient Q and a
remainder D where $|D| < \pi/2$. The table
shows that only sin D or cos D need be calculated and that $\pi/2$ is
within the domain of convergence. Note that the sine and cosine can
be generated simultaneously by the CORDIC algorithm and that the
answer may then be chosen as plus or minus one of these according
to Q mod 4. As a second example, to calculate the logarithm of a
large argument we first shift the argument's binary point E places
until it is just to the left of the most significant non-zero bit.
The fraction M then satisfies $0.5 \leq M < 1.0$ and as shown
in the table therefore falls within the domain of convergence. The
answer is calculated as $\log_e M + E \cdot \log_e 2$.
##SPC11##
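The two prescaling examples can be sketched as below. Library calls stand in for the CORDIC core so that only the argument reduction is illustrated; the function names are hypothetical, and rounding to the nearest quotient is one possible choice that keeps |D| below $\pi/2$.

```python
import math

def sin_prescaled(x):
    """sin x via a quotient Q and remainder D with respect to pi/2."""
    q = round(x / (math.pi / 2))
    d = x - q * (math.pi / 2)            # remainder, |D| < pi/2
    s, c = math.sin(d), math.cos(d)      # stand-in for the CORDIC sine/cosine pair
    return (s, c, -s, -c)[q % 4]         # select +/- sin D or +/- cos D by Q mod 4

def ln_prescaled(w):
    """ln w via a fraction M in [0.5, 1) and a binary exponent E."""
    m, e = math.frexp(w)                 # w = m * 2**e with 0.5 <= m < 1
    return math.log(m) + e * math.log(2.0)
```

For example, `sin_prescaled(10.0)` reduces to Q = 6 and D of about 0.575, so the result is taken as -sin D.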
The accuracy at the $n^{\text{th}}$ step is determined in theory by the
size of the last of the converging sequence of rotations
$\alpha_i$, and for large n is approximately equal in digits to
$F_{n-1}$. The accuracy in digits may conveniently be made
equal to L, the length of storage used for each variable, by
choosing n such that $F_{n-1} = L$.
In practice the accuracy is limited by the finite length of
storage. The truncation of input arguments performed to make them
fit within the storage length gives rise to unavoidable error, the
size of which depends on the sensitivity of the calculated function
to small changes in the input argument. In a binary code, the
truncation of intermediate results after each of L iterations gives
rise to a total of at most $\log_2 L$ bits of error. This error
can be rendered harmless by using $L + \log_2 L$ bits for the
storage of intermediate results.
In a normalized floating point number system it is desirable that
all L bits of the result be accurate, independent of the absolute
size of the argument. To accomplish this for very small arguments
it is necessary to keep each storage register in a normalized form;
i.e., in a form where there are no leading zeros. It is possible to
do this by transforming the iteration equations (3), (4), (13) to a
normalized form according to the following substitutions:
$$x \text{ becomes } x' \qquad (34)$$
$$y \text{ becomes } y' \cdot 2^{-E} \qquad (35)$$
$$z \text{ becomes } z' \cdot 2^{-E} \qquad (36)$$
$$\alpha_F \text{ becomes } \alpha_F' \cdot 2^{-F} \qquad (37)$$
where E, a positive integer, is chosen such that the initial
argument, placed into either the y or z register, is
normalized.
The result of the substitutions is:
$$x' \rightarrow x' + m\, y'\, 2^{-(F+E)} \qquad (38)$$
$$y' \rightarrow y' - x'\, 2^{-(F-E)} \qquad (39)$$
$$z' \rightarrow z' + \alpha_F'\, 2^{-(F-E)}. \qquad (40)$$
For simplicity the subscripts i and i+1 have been dropped. Instead,
$\alpha$ has been expressed as a function of F as in equation (32),
and the replacement operator ($\rightarrow$) has been used. The value of
i may be initialized such that $F_i = E$:
$$i_{\text{initial}} \rightarrow \{\, i \mid F_i = E \,\}. \qquad (41)$$
The value of n may be chosen such that L significant bits are
obtained:
$$n \rightarrow \{\, n \mid F_{n-1} - E = L \,\}. \qquad (42)$$
Note that $n - i_{\text{initial}} \approx L$ and that therefore
providing $L + \log_2 L$ bits for the storage of intermediate
results is still adequate.
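The normalized iteration can be sketched for the sine of a small argument. This is an illustration only: `sin_small` is a hypothetical name, floating-point numbers stand in for fixed-length registers (so the sketch shows the algebra of (38) through (42), not the precision benefit), and the radius factor is recomputed for the shortened shift sequence rather than read from a table.

```python
import math

def sin_small(z, L=40):
    """sin z for |z| < 1 using the normalized iteration (38)-(40) with m = 1."""
    _, e = math.frexp(z)               # z = mantissa * 2**e, 0.5 <= |mantissa| < 1
    E = max(0, -e)                     # E = 0 unless the argument is small
    zn = z * 2.0 ** E                  # normalized z', per (36)
    shifts = range(E, E + L + 1)       # start at F = E (41), stop at F - E = L (42)
    k = math.prod(math.sqrt(1 + 4.0 ** -F) for F in shifts)
    xn, yn = 1.0 / k, 0.0              # x' preset to the reciprocal radius factor
    for F in shifts:
        s = -1.0 if zn >= 0 else 1.0   # sign of delta that moves z' toward zero
        a_norm = math.atan(2.0 ** -F) * 2.0 ** F       # alpha'_F, per (37)
        xn, yn = (xn + s * yn * 2.0 ** -(F + E),       # (38) with m = 1
                  yn - s * xn * 2.0 ** -(F - E))       # (39)
        zn += s * a_norm * 2.0 ** -(F - E)             # (40)
    return yn * 2.0 ** -E              # undo the scaling of y, per (35)
```

For an argument such as 3e-7 the loop starts at F = 21 and still runs for about L iterations, so the relative error of the result stays near $2^{-L}$ rather than growing as the argument shrinks.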
The radius factor K is now a function of i = i.sub.initial as well
as m, ##SPC12##
Fortunately, not all the reciprocal constants 1/K.sub.m,i need to
be stored since for large values of i
(1/K.sub.m,i) .apprxeq. 1 - m (2/3) 2.sup.-.sup.2i (44)
and therefore all the constants having i > L/2 are identical to
within L significant bits. Therefore, only L/2 constants need to be
stored for m = +1 and also for m = -1. For m=0 no constants need to
be stored since K.sub.0,i = 1 for i .gtoreq. 1.
A similar saving in storage can be made for the angle constants
$\alpha_{m,F}$, since for large values of F
$\alpha_{m,F}' \equiv \alpha_{m,F} \cdot 2^{F} \approx 1 - m\,(1/3)\,2^{-2F}$ (45)
and, thus, as for the K constants, only L/2 constants need to be
stored for m = +1 and also for m = -1. For m = 0 no constants need to
be stored since $\alpha_{0,F}' = 1$ for $F \geq 1$.
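The approximation in equation (45) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the closed forms for the angle constants (arctangent of 2^-F for m = +1, hyperbolic arctangent for m = -1, and 2^-F itself for m = 0) are assumptions not stated in this passage.

```python
import math

def alpha_prime(m, F):
    """Normalized angle constant alpha' = alpha * 2**F (assumed closed forms)."""
    d = 2.0 ** -F
    if m == 1:
        a = math.atan(d)
    elif m == -1:
        a = math.atanh(d)
    else:
        a = d  # m == 0, linear case
    return a * 2.0 ** F

def alpha_prime_approx(m, F):
    """Right-hand side of equation (45)."""
    return 1.0 - m * (1.0 / 3.0) * 2.0 ** (-2 * F)

# Print the normalized constant, its approximation, and the difference.
for F in (1, 4, 8):
    for m in (1, 0, -1):
        exact = alpha_prime(m, F)
        print(F, m, exact, alpha_prime_approx(m, F), exact - alpha_prime_approx(m, F))
```

Because the correction term is (1/3) 2^(-2F), it falls below 2^-L once 2F reaches L, which is consistent with storing only about L/2 constants per coordinate system.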
GENERAL DESCRIPTION
As shown in FIG. 3, the FPP includes three identical arithmetic
units 10, 12, and 14 operated in parallel. Each arithmetic unit
contains a 64-bit register 16, 18, or 20, an 8-bit parallel
adder/subtractor 22, 24, or 26, and an 8-out-of-48 multiplex
shifter 28, 30, or 32. The assembly of arithmetic units is
controlled by a microprogram stored in a read-only memory (ROM) 34,
which also contains the angle and radius-correction constants. The
ROM contains 512 words of 48 bits each and operates on a cycle time
of 200 nanoseconds.
The essential aspects of the microprogram used to execute the
unified CORDIC algorithm are shown in FIG. 4. The initial argument
and correction constants are loaded into the three registers 16,
18, and 20, and m is set to one of the three values 1, 0, or -1. If
the initial argument is small, it is normalized and E is set to
minus the binary exponent of the result; otherwise, E is set to
zero. Next, i is initialized to a value such that $F_{m,i} = E$. A
loop is then entered and is repeated until $F_{m,i} - E = L$. In
this loop the direction of rotation necessary to force either of
the angles A or z to zero is chosen; the binary variable $\sigma$,
used to control the three adder/subtractors 22, 24, and 26, is set
to either +1 or -1; and the iteration equations in block 36 or 38
are executed.
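The loop described above can be sketched for one case. The following Python sketch is a simplified illustration, not the FPP microprogram: it covers only the circular coordinate system (m = +1) with z forced to zero and E = 0, uses floating-point arithmetic instead of the byte-serial adders, and assumes the shift sequence F_i = i. The name cordic_sin_cos is hypothetical.

```python
import math

def cordic_sin_cos(angle, L=40):
    """Circular (m = +1) CORDIC that forces z to zero, with E = 0."""
    m = 1
    shifts = range(L)  # assumed shift sequence F_i = i, for L iterations
    # Radius factor accumulated over all iterations.
    K = math.prod(math.sqrt(1 + m * 4.0 ** -F) for F in shifts)
    x, y, z = 1.0 / K, 0.0, angle  # pre-scale x by the radius-correction constant
    for F in shifts:
        sigma = -1 if z >= 0 else 1  # direction that drives z toward zero
        delta = sigma * 2.0 ** -F
        x, y, z = (x + m * y * delta,
                   y - x * delta,
                   z + sigma * math.atan(2.0 ** -F))
    return x, y  # approximately (cos(angle), sin(angle))

c, s = cordic_sin_cos(0.5)
print(c, s)
```

With this shift sequence the sketch converges only for arguments of magnitude below roughly 1.74 radians; range reduction for larger arguments is outside its scope.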
Table 5 gives a breakdown of the maximum execution times for the
routines performed by the FPP. The figures in the column marked
"data transfers from computer" are the times for operand and
operation code transfers between the FPP and the COMP and vary
depending upon the COMP used and how it is interfaced with the FPP.
The FPP retains the result of each executed function. Thus, the
binary functions add, subtract, multiply and divide require only
one additional operand to be supplied, and the unary functions do
not require any operand transfers. The first operand is loaded via
the LOAD instruction, and the final result is retrieved via the
STORE instruction. ##SPC13##
PROGRAMMING INFORMATION
Table 6 lists the operation codes for each of the 20 FPP
routines.
TABLE 6
CORDIC Floating-Point Operation Codes

                      Mnemonic   7 6 5 4 3 2 1 0
TRIPLE LOAD AND       LDX        0 0 1 0 1 0 0 1
STORE                 STX        0 0 1 0 0 0 0 1
TRIPLE PRECISION      ADX        0 0 0 0 0 0 0 1
FLOATING POINT        SBX        0 0 0 0 0 0 1 1
ARITHMETIC            MPX        0 0 0 0 0 1 0 1
                      DVX        0 0 0 0 0 1 1 1
                      ABX        0 0 1 0 0 0 1 1
                      ENX        0 0 1 0 0 0 0 1
                      CMX        0 0 0 0 1 0 0 1
                      CSX        0 0 0 1 0 0 1 1
                      SNX        0 0 0 1 0 0 0 1
                      TNX        0 0 0 1 0 1 0 1
                      ATX        0 0 0 0 1 0 1 1
FUNCTIONS             HCX        0 0 0 1 1 0 1 1
                      HSX        0 0 0 1 1 0 0 1
                      HTX        0 0 0 1 1 1 0 1
                      AHT        0 0 0 0 1 1 0 1
                      EXX        0 0 0 1 1 1 1 1
                      LNX        0 0 0 0 1 1 1 1
                      SRX        0 0 0 1 0 1 1 1
                      FIX        0 0 1 0 0 1 1 1
                      RNX        0 0 1 0 0 1 0 1
The FPP receives the data from the COMP in a normalized or
un-normalized extended floating-point format and returns data in a
normalized extended floating-point format to the COMP. As shown in
the upper box of FIG. 5, a typical traditional floating-point
format employs 23 bits for the mantissa, 1 bit for the mantissa
sign, 7 bits for the exponent, and 1 bit for the exponent sign.
This requires two 16-bit words. Note that the second word is split,
such that bits 8 through 15 are the eight least significant bits of
the mantissa, and the remaining eight bits are the exponent and
exponent sign. As shown in the middle box of FIG. 5, the only
difference between the traditional double-word floating point
format and the extended floating-point format employed by the FPP
is the addition of one 16-bit word to the length of the
mantissa.
As shown in the lower box of FIG. 5, the conversion from the
traditional double-word floating-point format to the extended
triple-word floating-point format is accomplished by splitting the
second word of the double-word format between bits 8 and 7, and
inserting 16 zeros as the least significant bits of the mantissa in
the triple-word format. Note that 8 of these zeros are present in
the second word of the triple-word format, and the remaining 8 are
in the third word. The reverse conversion, from triple- to
double-length format, consists simply of truncating the 16 least
significant bits of the mantissa. This means removing bits 7
through 0 of the second word, and bits 15 through 8 of the third
word.
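The two word-level conversions just described amount to simple masking. The Python sketch below is illustrative only; the function names are hypothetical, and each word is modeled as an unsigned 16-bit integer with bit 15 as the most significant bit.

```python
def double_to_triple(w1, w2):
    """Expand the double-word format to the triple-word format.

    The second word is split between bits 8 and 7: its upper byte
    (mantissa bits) stays in word 2, its lower byte (exponent and
    exponent sign) moves to word 3, and 16 zero mantissa bits fill
    the gap.
    """
    return w1 & 0xFFFF, w2 & 0xFF00, w2 & 0x00FF

def triple_to_double(w1, w2, w3):
    """Truncate the 16 least significant mantissa bits.

    Bits 7 through 0 of the second word and bits 15 through 8 of the
    third word are discarded.
    """
    return w1 & 0xFFFF, (w2 & 0xFF00) | (w3 & 0x00FF)

print([hex(w) for w in double_to_triple(0x4000, 0x12FD)])
print([hex(w) for w in triple_to_double(0x4000, 0x12AB, 0xCDFD)])
```

Note that the reverse conversion truncates rather than rounds, so a round trip from triple-word to double-word and back clears the 16 added mantissa bits.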
The conversions between double-word integer and extended floating
point formats (not illustrated) are more complex. Briefly, the
process is as follows. For conversion to double-word integer, the
mantissa is arithmetically shifted right while the exponent value
is correspondingly increased (one increment per shift) until the
exponent equals +31. If bit 15 is a 1 (implying 1/2), the quantity
in bits 16 through 47 is incremented by 1; this rounds the integer
to the nearest whole number. Bits 8 through 15 are set to 0, and
bits 16 through 47 comprise the integer value. The reverse
conversion, double-word integer to extended floating point,
consists of filling in zeros for bits 8 through 15 of the least
significant word, setting the exponent to +31, and then normalizing
the result.
From the standpoint of the COMP programmer, the FPP effectively
adds an extended floating-point accumulator, designated the X
register. This register can be loaded from the COMP memory, its
contents manipulated, and its contents stored in the COMP memory.
Each extended floating-point operand may be contained in three
consecutive COMP memory locations, as assumed for the following
definitions of the twenty FPP routines:
LDX (LOAD EXTENDED FLOATING POINT). Load the extended floating
point number from addressed memory location m (and m + 1 and m + 2)
into the X-register.
STX (STORE EXTENDED FLOATING POINT). Store the extended floating
point number in the X-register in addressed memory location m (and
m + 1 and m + 2).
ADX (ADD EXTENDED FLOATING POINT). Add the extended floating point
number from addressed memory location m (and m + 1 and m + 2) to
the current value in the X-register. The extended floating point
result of the addition occupies the X-register on completion of the
instruction.
SBX (SUBTRACT EXTENDED FLOATING POINT). Subtract the extended
floating point number in addressed memory location m (and m + 1 and
m + 2) from the current value in the X-register. The extended
floating point result of the subtraction occupies the X-register on
completion of the instruction.
MPX (MULTIPLY EXTENDED FLOATING POINT). Multiply the extended
floating point number in the X-register by the extended floating
point number in addressed memory location m (and m + 1 and m + 2).
The extended floating point result of the multiplication occupies
the X-register on completion of the instruction.
DVX (DIVIDE EXTENDED FLOATING POINT). Divide the extended floating
point number in the X-register by the extended floating point
number in addressed memory location m (and m + 1 and m + 2). The
extended floating point result of the division occupies the
X-register on completion of the instruction.
ABX (ABSOLUTE VALUE). Calculate the absolute value of the extended
floating point value in the X-register; i.e., if the content of the
X-register is negative, convert to positive.
ENX (ENTIER). Calculate the entier of the extended floating point
value in the X-register. The calculated result replaces the
original contents of the X-register.
CMX (COMPLEMENT). Convert the extended floating point value in the
X-register to an extended floating point value with unchanged
magnitude but opposite sign.
CSX (COSINE). Calculate the cosine of the value in the X-register,
where the value is expressed in radians as an extended floating
point number. The result of the cosine calculation replaces the
original value in the X-register.
SNX (SINE). Calculate the sine of the value in the X-register,
where the value is expressed in radians as an extended floating
point number. The result of the sine calculation replaces the
original value in the X-register.
TNX (TANGENT). Calculate the tangent of the value in the
X-register, where the value is expressed in radians as an extended
floating point number. The result of the tangent calculation
replaces the original value in the X-register.
ATX (ARCTANGENT). Calculate the arctangent of the value in the
X-register, where the result is expressed in radians as an extended
floating point number in the X-register.
HCX (HYPERBOLIC COSINE). Calculate the hyperbolic cosine of the
extended floating point number in the X-register. The result of the
hyperbolic cosine calculation replaces the original value in the
X-register.
HSX (HYPERBOLIC SINE). Calculate the hyperbolic sine of the extended
floating point number in the X-register. The result of the
hyperbolic sine calculation replaces the original value in the
X-register.
HTX (HYPERBOLIC TANGENT). Calculate the hyperbolic tangent of the
extended floating point number in the X-register. The result of the
hyperbolic tangent calculation replaces the original value in the
X-register.
AHT (ARCHYPERBOLIC TANGENT). Calculate the archyperbolic tangent of
the extended floating point number in the X-register. The result of
the archyperbolic tangent calculation replaces the original value
in the X-register.
EXX (EXPONENTIAL). Calculate the exponential of the extended
floating point number in the X-register. The result of the
exponential calculation replaces the original value in the
X-register.
LNX (NATURAL LOGARITHM). Calculate the natural logarithm of the
extended floating point number in the X-register. The result of the
logarithm calculation replaces the original value in the
X-register.
SRX (SQUARE ROOT). Calculate the square root of the extended
floating point number in the X-register. The result of the square
root calculation replaces the original value in the X-register.
FIX (ROUND TO NEAREST INTEGER). Round-off the extended floating
point number in the X-register to the nearest integer value. The
result remains an extended floating point number and replaces the
original value in the X-register. The number of bits affected
depends on the exponent value.
RNX (ROUND TO 24 BITS). Round-off the extended floating point
number in the X-register to 24 bits of precision.
Internally, the FPP processes all data as normalized extended
floating point numbers and can convert the input data if the data
is not already in this form. As shown in Table 7, normalization is
accomplished by shifting the mantissa left to eliminate any zeros
between the binary point and the first non-zero bit, while the
exponent is correspondingly reduced by subtracting 1 for each
shift. In the upper example of Table 7, there are two zeros between
the binary point and the 1 bit. Therefore two left shifts are
necessary, and the exponent is reduced from -121 to -123. In the
lower example, the number is too small to be normalized, resulting
in an underflow condition. The mantissa is shown shifted left seven
positions, which reduces the exponent to its smallest possible
value, -128. No further left shifts can therefore be made, and
there is still one remaining zero between the binary point and the
1 bit. If a number is too large to be represented in the normalized
extended floating point format, an overflow condition results.
TABLE 7

NORMALIZATION
Mantissa                      Binary Exponent
(Binary representation)       (Decimal representation)
+ .0010000000 . . . 000       -121
+ .1000000000 . . . 000       -123

UNDERFLOW (Cannot be normalized)
Mantissa                      Binary Exponent
(Binary representation)       (Decimal representation)
+ .0000000010 . . . 000       -121
+ .0100000000 . . . 000       -128
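The normalization procedure and its underflow case can be sketched as follows. This Python sketch is illustrative only: it handles positive mantissas, models the mantissa as an integer whose top bit is the first bit after the binary point, and assumes a mantissa width of 39 bits (23 bits plus the 16 added by the extended format). The names normalize, WIDTH, and EXP_MIN are hypothetical.

```python
EXP_MIN = -128  # smallest representable exponent
WIDTH = 39      # assumed mantissa width in bits (positive mantissas only)

def normalize(mantissa, exponent):
    """Shift left until the leading bit is 1 or the exponent bottoms out."""
    top = 1 << (WIDTH - 1)
    while mantissa != 0 and not (mantissa & top) and exponent > EXP_MIN:
        mantissa <<= 1   # one left shift of the mantissa ...
        exponent -= 1    # ... costs one unit of exponent
    # Underflow: a leading zero remains but the exponent cannot go lower.
    underflow = mantissa != 0 and not (mantissa & top)
    return mantissa, exponent, underflow

# Upper example of Table 7: .001... with exponent -121.
print(normalize(1 << (WIDTH - 3), -121))
# Lower example of Table 7: .000000001... with exponent -121.
print(normalize(1 << (WIDTH - 9), -121))
```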
FIG. 6 defines the ranges of valid binary numbers that can be
processed by the FPP. The shaded areas define overflow and
underflow ranges. In the VALUE column, the mantissa is enclosed in
parentheses, followed by the exponent outside of the parentheses.
Note that 0 (middle line of the figure) is represented by all zeros
in both the mantissa and the exponent. Positive numbers are shown
above this line, and negative numbers are shown below this line. In
the far right column, the exponent values are shown increasing in
both directions from the zero line, from the smallest representable
value (-128) to the largest (+127). Nonrepresentable exponent
values, smaller than -128, are also underflow conditions, but this
situation is not considered in FIG. 6.
For each finite value of exponent, the mantissa is assumed to go
through its complete cycle of valid values. In approximate terms,
the positive mantissa cycles from +1/2 to +1, and the negative
mantissa cycles from -1/2 to -1. More precisely, the positive range
is from +1/2 to the largest possible fractional number under the
value of 1. The negative range is from the largest possible
fractional number below (more negative than) -1/2 to -1.
The significance of +1/2 and -1/2 in determining mantissa ranges is
a result of the normalization requirement, which dictates that
there will be a significant digit immediately to the right of the
binary point. This automatically eliminates all numbers between
+1/2 and -1/2, including the exact value of -1/2, but excluding 0
and +1/2.
If an error condition results during the execution of an FPP
routine, the FPP provides an error code and an error control
signal. Table 8 lists the possible types of error that can occur
for each of the 20 routines of the FPP. ##SPC14##
1. x = Contents of X-register before execution. z = Contents of
X-register after execution.
The FPP add, subtract, multiply, and divide routines will indicate
an underflow error condition if the result cannot be normalized,
i.e., the result falls in one of the shaded areas immediately
adjacent to zero in FIG. 6. If the input data is not normalized,
the FPP will normalize the operands before beginning the
computation; if the numbers are too small to be successfully
normalized, the FPP will attempt normalization as far as possible
and then proceed with the computation. The answer will be correct
except that the sign bit of the exponent will be incorrect
(complemented). The FPP will indicate an underflow error condition
for CMX if the pre-execution contents of the X-register (i.e., x)
equal $(1/2)\,2^{-128}$, for HCX or HSX if x is less than -88,
and for EXX if x is less than -87.3. The SBX routine checks if the
subtrahend from memory has the minimum allowed positive value
(i.e., $(1/2)\,2^{-128}$ in FIG. 6); this would produce an
underflow when converted to negative form during execution.
However, as in all underflow conditions, the computation will
proceed, and only the sign bit of the exponent will be
incorrect.
The FPP add, subtract, multiply, and divide routines will indicate
an overflow error condition if the result exceeds the largest
positive or negative number which can be represented; i.e., the
result falls in either the top or bottom shaded areas in FIG. 6.
Answers will be correct except for the sign bit of the exponent,
which will be complemented. However, the divide routine DVX can
produce one exception to this general rule. If the dividend
exponent is +126 or +127 and the divisor exponent is -127 or -128,
the resultant exponent will be incorrect. In this case, the
exponent value will equal the original value of the dividend
exponent, plus two. This produces a rollover to an apparent
negative exponent of -128 (if the original was +126) or -127 (if
the original was +127). The SBX routine also checks the subtrahend
from memory before execution begins. If the subtrahend is at the
maximum allowed negative value [i.e., $(-1)\,2^{127}$ in FIG. 6], an
overflow will result when the number is converted to positive form
during execution. Similarly, the CMX and ABX instructions check the
pre-execution contents of the X-register for the maximum allowed
negative value; overflow will result when the number is converted.
However, in the SBX, CMX, and ABX routines execution will proceed,
and only the sign bit of the exponent will be incorrect
(complemented). The RNX routine will indicate an overflow condition
if the number is at the maximum allowed positive value and then is
rounded upward. This will cause rollover to the maximum negative
number. Overflows resulting from the FIX, HCX, HSX, and EXX
routines produce results which are generally unpredictable, making
it difficult or impossible to reconstruct correct answers.
The CSX, SNX, and TNX routines allow the arguments to include
multiple rotations of the angle represented by the argument. These
rotations use up part of the available floating point bits, leaving
fewer bits to express the fractional part of the rotation for the
computation. When the number of rotations reaches about
$2^{38}/2\pi$, there is insufficient resolution to express angles in
increments smaller than 90 degrees. The no-resolution error code 3
indicates this condition.
If the divisor for the DVX routine is zero, the division will not
be attempted, and the error code 4 will indicate this condition. In
the TNX routine, odd numbers of quarter rotations (1/4, 3/4, etc.)
result in a divide-by-zero condition (sin = 1, cos = 0). This is
also indicated by the error code 4. The value 1 will remain in the
X-register on exit from this routine.
Attempts to calculate the square root (SRX) of negative numbers, or
the natural logarithm (LNX) of zero or negative numbers, will not
be executed, and will leave the X-register unchanged. Attempts to
calculate the archyperbolic tangent (AHT) of numbers which are
.+-.1 or greater in magnitude also will not be executed and will
leave the X-register unchanged with the exception that the attempt
to calculate the archyperbolic tangent of -1 will result in
clearing the X-register to zero.
Undefined operation codes given to the FPP will indicate error code
6 and will otherwise be ignored.
The ENX, RNX, and FIX routines are explained in the following
paragraphs.
Entier (ENX) is simple truncation of the fractional part of a
number. This is accomplished by noting the value of the exponent
and saving that number of places in the most significant part of
the mantissa. The remaining bits of the mantissa are cleared (to
zeros). The effect for positive numbers is to reduce the X-register
contents (if there is a fraction) to the nearest integer value and
for negative numbers is to increase the X-register contents (even
if there is no fraction) to the nearest negative integer value.
Rounding an extended floating point number to 24 bits of
significance (RNX) implies that a specific number of bits (16) will
be cleared. Before these bits are cleared, however, the most
significant bit of those to be cleared is examined. If this bit is
a 1, indicating for positive numbers that the part of the number to
be cleared represents a value of 1/2 (or more) of the least
significant bit of the saved part, the saved part is incremented by
+1. Thus, if the cleared part of a positive number is 1/2 or
greater, the value is rounded upward (not necessarily to an
integer); otherwise, the value is rounded downward. If the cleared
part of a negative number is -1/2 or more negative, the value is
rounded in the negative direction; otherwise, the value is rounded
in the positive direction. Note that if all 23 significant data
bits were 1's before execution, rounding upward (incrementing by
+1) would cause overflow; unless an overflow condition exists, the
instruction will automatically renormalize the number.
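The RNX rounding rule for positive numbers can be sketched as follows. This Python sketch is a simplified illustration under stated assumptions: it treats only positive mantissas, models the mantissa magnitude as a 39-bit integer (23 saved bits followed by the 16 bits to be cleared), and merely flags overflow without modeling the rollover behavior. The name rnx_positive is hypothetical.

```python
EXP_MAX = 127  # largest representable exponent

def rnx_positive(mantissa, exponent):
    """Round a positive 39-bit mantissa magnitude to its top 23 bits."""
    kept = mantissa >> 16            # the 23 bits that are saved
    if (mantissa >> 15) & 1:         # most significant bit of the cleared part
        kept += 1                    # cleared part is 1/2 LSB or more: round up
    overflow = False
    if kept >> 23:                   # carry out of the saved bits (all were 1's)
        if exponent == EXP_MAX:
            overflow = True          # cannot renormalize; flagged only
        else:
            kept >>= 1               # renormalize the mantissa to 1/2
            exponent += 1
    return kept << 16, exponent, overflow

# All 39 bits set: rounding carries out, and the number is renormalized.
print(rnx_positive((1 << 39) - 1, 5))
```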
Rounding an extended floating point number to the nearest integer
(FIX) involves two steps: conversion to integer, and rounding.
First, the mantissa is shifted right while the exponent value is
incremented, once per shift, until the exponent equals +31. Then
the most significant bit of the eight bits to be cleared is
examined. If this bit is a 1, the saved part is incremented by +1.
For positive numbers, the 1 bit indicates a fraction of 1/2 or
greater; for negative numbers, it indicates a fraction smaller than
1/2. After rounding, bits 8 through 15 are cleared. If the cleared
part of a positive number is 1/2 or greater, the saved part is
rounded to the next higher integer; otherwise the number is reduced
to the integer-only value. If the cleared part of a negative number
is -1/2 or more negative, the value is rounded to the next more
negative integer; otherwise the number becomes the integer-only
value.
THEORY OF OPERATION
The floating point processor has two basic modes of operation, one
for binary function routines, and one for unary function routines.
FIGS. 7 and 8 illustrate these two basic modes of operation.
In FIGS. 7 and 8, the three registers, A, B, C, together comprise
what was earlier called, for simplicity, the X-register. Each of
the three registers has 48 bits for data (mantissa), and also has 8
bits for an exponent byte (E) and 8 bits for a shift control byte
(S). Since the C-register has no facilities for shifting, the C
shift control byte is used to receive the operation code from the
COMP. The shaded areas indicate the location of operands at the
start of the operations.
Referring now to FIG. 7, an operand quantity x, previously loaded
into the FPP, is assumed to exist in the FPP B-register. The COMP
places an operation code on interface data lines 40 and issues an
OPC command to the FPP via line 42. (An OPC command initiates the
reception of an operation code through the 16 bit input port of the
FPP and initiates the execution of that operation.) The FPP, which
operates under control of firmware programs in the ROM, cycles in a
wait mode as long as it is in the ready state, looking for an OPC
command. When the ROM program detects the presence of OPC, it loads
the operation code data into the S byte of the FPP C-register. Then
the microprogram identifies the type of operation by decoding the
operation code bits, and branches to the appropriate function
routine. The operation code is now no longer needed.
As the microprogram proceeds, the value in the FPP B-register is
manipulated according to the algorithm for the particular function,
using all three registers. At the end of the routine, the final
answer resides in the FPP B-register, and a FLG (Flag) signal is
sent back to the COMP. When it follows an OPC command, the FLG
signal indicates that the FPP has completed the last issued command
and is ready for further commands.
If an error occurs during the calculation, or if the operation code
is improper, an ERR (Error) signal is sent back to the COMP with
the FLG signal. It is the user's option to decide what to do about
an error condition. In general, it may be said that the FPP will
attempt the calculation, rather than abort, even if input values
will result in an error. The FPP will provide the best answer it
can, along with the error indication. This allows the programmer
some flexibility to reconstruct correct answers from results which
normally could not be represented. The exception to this
generalization is that divisions by zero will not be attempted.
Referring now to FIG. 8, the binary function routines take two
operands, one that was previously loaded into the FPP, and one that
exists in the COMP memory, and operate on these operands. The
operations include addition, subtraction, multiplication, and
division.
Initially the quantity x is assumed to exist in the FPP B-register.
It may have been left there as the result of a previous
instruction, or it may have been loaded by a load instruction LDX.
The COMP first fetches one word of a three-word operand (y) from
memory. It then puts this data word on the interface data lines and
issues an ENC command to the FPP unit. An ENC command initiates the
reception of an operand word through the 16 bit input port and the
transmission of a result word through the 16 bit output port.
As mentioned previously, the FPP unit, under control of the
microprogram, continuously searches for ENC or OPC commands as long
as it is in the ready state. When the microprogram detects the
presence of ENC, it loads the data word (in two 8-bit bytes) into
the high order third of the FPP C-register.
After the second byte has been loaded, the FPP unit sends a FLG
signal back to the COMP, indicating readiness for the next word of
y. The COMP fetches this next word from its memory and repeats the
process: the word is placed on the interface data lines 40, an ENC
command is given to the FPP to load these two bytes, and another
FLG signal is returned to repeat the process for the third and
final time.
At the end of the three-word transfer, the quantity x is in the FPP
B-register and the quantity y is in the FPP C-register. The FPP
unit now needs to be told what to do with these numbers. The entire
process described above under function routines is now added on. In
brief, the procedure is:
a. The COMP issues an operation code and OPC control signal.
b. The FPP unit loads the operation code into the FPP
C-register.
c. The ROM program interprets the operation code and branches to
the appropriate function routine (add, subtract, etc.).
d. The function routine calculates the answer, and leaves it in the
FPP B-register.
e. A final FLG returned to the COMP tells it that the FPP unit is
ready for further commands.
f. In case of error, an ERR signal indicates that an error code is
present on the data lines to the COMP.
The load instruction LDX operates similarly to the above procedure,
except that the operation code simply causes the loaded FPP
C-register contents to move up into the FPP B-register.
The OPC command, or ENC command, should only be given (set to 1) to
the FPP if FLG from FPP is 1. The ENC command should not go to 0
until FLG from FPP goes to 0. The ENC or OPC command should not go
to 1 again until FLG goes to 1.
The following paragraphs describe in detail the procedures outlined
above, together with the FPP power supply. All logic is
positive-true. The high (or true) state ranges from +1.25 to +2.5
volts; the low (or false) state ranges from -0.5 to +0.5 volts.
The block diagrams of FIGS. 9A-D are referenced throughout this
description. Tables 9 and 10 provide supporting information: the
detailed coding of the instruction register (IR), definitions of
the ROM instructions, and a list of tests used for branching
decisions. These tables and FIGS. 9A-D should be referred to
frequently, since definitions will not be given within the
descriptive text. The logic diagrams for the FPP are given in FIGS.
11-40, a wiring diagram is given in FIGS. 42A-F, and flow charts
for the FPP routines are given in FIGS. 44-71. Specific signal
names are given on the block diagram, facilitating direct reference
from a function on the block diagram to the comparable function on
the logic diagrams, wiring lists, signal indexes, and signal
transfers from board to board.
The clock generator for the floating point processor unit operates
at a rate of 5 MHz (200-nanosecond period) supplying a
100-nanosecond clock signal, and its complement, to the FPP logic.
The clock generator is located on the FPP D-register card. The
complemented clock signal ($\overline{\text{Clock}}$) allows the clock cycle to be
split, so that an active bit may be loaded into a register in the
first 50 nanoseconds and perform its function in the second 50
nanoseconds. This high-speed feature employs a master/slave pair of
flip-flops for each bit. ##SPC15##
As shown in FIG. 72, Clock latches the master flip-flop, and Clock
latches the slave flip-flop. The master flip-flop is loaded with
data (Data 1) by Clock, and about 45 nanoseconds later Clock
transfers the bit to the slave flip-flop. (There is a slight offset
between Clock and $\overline{\text{Clock}}$.) The output of the slave flip-flop can
then be used in the logic without affecting the inputs (such as
Data 2) that determined the setting of the master flip-flop.
The FPP ROM has nine input lines (RA0 through RA8), thus allowing
512 addresses (2.sup.9), and 48 output lines (R0 through R47),
giving a word length of 48 bits.
The ROM constantly reads out whatever contents are enabled by the
nine address lines. The ROM output lines are clocked either into
the instruction register or, if the preceding instruction contained
a constant call, into the D-register. All words in the ROM are
either instructions or constants. For start-up purposes (power-on),
ROM is forced to start at address 0.
The FPP ROM contents are listed in Table 11 in the form of
mnemonics and constants. Physically, the ROM is contained in 24
microcircuit packages on a single printed circuit card. Each
package accepts 8 of the 9 address lines and has 4 of the 48 output
lines. (See logic diagram, FIGS. 11A-E). The ninth address bit
(RA8) selects either the high half or low half of ROM. The lower
rank of 12 packages is enabled when RA8 is 0 and is therefore
active for addresses 0 through 255 (decimal). The upper rank of 12
packages is enabled when RA8 is 1 and is therefore active for
addresses 256 through 511. The outputs of the two ranks are "or"-tied
together. ##SPC16## ##SPC17## ##SPC18## ##SPC19## ##SPC20##
##SPC21## ##SPC22## ##SPC23## ##SPC24## ##SPC25##
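The rank selection described above can be sketched as a simple address decode. This Python sketch is illustrative only; the function name is hypothetical, and it assumes that the eight address lines applied to every package are RA0 through RA7.

```python
PACKAGES_PER_RANK = 12
BITS_PER_PACKAGE = 4
WORD_BITS = PACKAGES_PER_RANK * BITS_PER_PACKAGE  # 48-bit ROM word

def decode_rom_address(address):
    """Split a 9-bit ROM address into (rank, address within package)."""
    if not 0 <= address < 512:
        raise ValueError("ROM address must fit in nine bits")
    rank = address >> 8               # RA8: 0 selects the lower rank, 1 the upper
    package_address = address & 0xFF  # assumed to be RA0 through RA7
    return rank, package_address

print(WORD_BITS, decode_rom_address(255), decode_rom_address(256))
```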
The instruction register is clocked to load the ROM output every
200 nanoseconds. The data word is loaded into the master latch of
the instruction register by Clock and into the slave latch by
Clock. At Clock the contents of the instruction register are used
to control the logic (see block diagram). Loading the instruction
register occupies one clock cycle (time intervals 1 and 2), as
shown in FIG. 73.
As soon as the instruction is in the slave latch of the instruction
register, execution begins. A typical execution might read a pair
of byte operands, add them during Clock (2) and store the result
during the next Clock (3) into the master latch of the specified
register. The next Clock (4) would transfer the stored result from
master to slave, where it may be used (read) by the next
instruction. Notice that there is a time overlap, and the second
instruction has already been loaded from ROM (3) into the master
latch of the instruction register and execution has begun (4).
The detailed coding of the instruction register, plus definitions
of the instruction fields, are given in FIG. 74. Physically, the
register is split up and located on three separate cards: ROM
address card, D-register card, and FPP interface card.
The floating point algorithms require considerable flexibility for
branching from one area of ROM to another. The following paragraphs
describe the various modes employed to specify the next address
within a current instruction. The addressing modes are shown in
Table 12.
Each microinstruction specifies where the next microinstruction is
to be obtained by selecting one of two addresses, dependent on a
test condition. One of 16 conditions may be selected for the test
by the TS field of the instruction. These conditions are numbered
T0 through T15, as shown in Table 10.
TABLE 12. ROM ADDRESSING MODES

NEXT ADDRESS MODE                   PAGE                          LINE
Unconditional Branching (TS = 0)    BP                            BL
Conditional Branching
  TS True                           MP (Current Page)             BL
  TS False                          MP (Current Page)             BP
Indirect and Constants
  IND                               CS (4-7), Bit 8 = 0           CS (0-3)
  CON                               CS (4-7), Bit 8 = AD9         CS (0-3)
JSB and Return (JSB and RTN complement JAR after execution)
  JSB                               BP                            BL
                                    (If JAR = 0, save MP in FP)   (If JAR = 0, save TS in FL)
                                    (If JAR = 1, save MP in GP)   (If JAR = 1, save TS in GL)
  RTN                               If JAR = 0: GP                If JAR = 0: GL
                                    If JAR = 1: FP                If JAR = 1: FL
FIGS. 9A-D show the sources of many of the test inputs; for
example, OPC and ENC from the computer, selected outputs of the A,
B, C adders, and certain count values of the byte counter YC. These
signals are applied as one input to a three-input gate in the
conditional branching logic block, as shown in FIGS. 9A-D. (This
gate represents a series of gates performing this function.) The
second input to the gate is the decoded test number, and the third
is the BRN signal (also decoded from the instruction register)
which must be true for all branching instructions. If the test is
true, bits 0 through 3 of the ROM address are taken from the BL
field; if the test is false, these bits are taken from the BP
field. Bits 4-8 of the ROM address are taken from the MP register,
which contains the page number of the current microinstruction.
Thus, conditional branches may be made only to microinstructions on
the current page.
Unconditional branch microinstructions give the next ROM address,
regardless of test conditions. The page and line of the next
microinstruction are given by the BP and BL fields, respectively.
The transfer is accomplished by coding BRN and TRU in the
microinstruction. This enables the first of the three gates in the
unconditional branching logic block, which in turn enables BP and
BL onto the page and line address lines.
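The conditional and unconditional next-address rules described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware logic: the function name and argument names are hypothetical, and it assumes that on a false test only the low four bits of the BP field are used as the line number.

```python
def next_rom_address(mp, bp, bl, test_true, unconditional=False):
    """Return a 9-bit ROM address: page in bits 4-8, line in bits 0-3.

    mp is the current page (MP register); bp and bl are the BP and BL
    fields of the microinstruction. All names are illustrative.
    """
    if unconditional:
        # Unconditional branch: BP and BL give the page and the line.
        page, line = bp & 0x1F, bl & 0xF
    else:
        # Conditional branch: the page always comes from MP, so the
        # target stays on the current page. The line comes from BL if
        # the test is true and from BP if it is false (assumed here to
        # mean the low four bits of BP).
        page = mp & 0x1F
        line = (bl if test_true else bp) & 0xF
    return (page << 4) | line
```

With the current page equal to 3, BP = 9, and BL = 5, a true test yields line 5 on page 3, a false test yields line 9 on page 3, and an unconditional branch yields line 5 on page 9.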
The jump to subroutine instruction JSB is an unconditional branch
with two special provisions: a return address is stored, so that the
subroutine can return to this address on completion, and the return
registers are switched, allowing one level of JSB nesting.
The decoded JSB signal gates BP and BL onto the page and line
address lines in the unconditional branching logic block. The JSB
signal also stores the current page value into either the GP or FP
register, depending on the current state of the JAR flip-flop, and
loads a return address line value from the TS field into either the
GL or FL register, depending on the JAR flip-flop state. The TS
field can be used to specify the return address line since
conditional branches are not allowed with a JSB microinstruction.
Thus, a return address is stored in either GP and GL or FP and FL.
Since the stored page value is always the current page value,
subroutine returns must be to the same page that contained the JSB
call. Furthermore, since only four of the five page bits are
stored, both the call and return addresses must be in the lower
half of ROM (addresses 0 through 255). The subroutine itself,
however, may be located on any page, specified in the BP field.
After the return address has been stored, the JAR flip-flop toggles
to its complementary state. The initial state of JAR does not
matter, as it is immaterial whether the G or F pair of registers is
the first selected. The toggling of JAR after each occurrence of
JSB insures that the remaining one of the F or G registers will be
used to store a second return address. The return from subroutine
instruction RTN retrieves a return address from either G or F
depending on JAR and branches to that address. RTN also toggles the
JAR flip-flop, insuring that a second return address will be
retrieved from the remaining one of the F or G registers.
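The interplay of JSB, RTN, and the JAR flip-flop can be illustrated with a small Python model. It is a sketch that follows Table 12 as written (JSB saves into F when JAR = 0 and into G when JAR = 1; RTN reads G when JAR = 0 and F when JAR = 1; both complement JAR). The class and method names are hypothetical.

```python
class ReturnRegisters:
    """Illustrative model of the F/G return registers and the JAR flip-flop."""

    def __init__(self, jar=0):
        self.jar = jar                      # initial state does not matter
        self.fp = self.fl = self.gp = self.gl = 0

    def jsb(self, mp, ts):
        """Store the return address (current page, line from TS), then toggle JAR."""
        page = mp & 0xF                     # only four of the five page bits are stored
        line = ts & 0xF
        if self.jar == 0:
            self.fp, self.fl = page, line
        else:
            self.gp, self.gl = page, line
        self.jar ^= 1

    def rtn(self):
        """Return the stored 9-bit return address, then toggle JAR."""
        if self.jar == 0:
            page, line = self.gp, self.gl
        else:
            page, line = self.fp, self.fl
        self.jar ^= 1
        return (page << 4) | line
```

Two nested calls followed by two returns come back in last-in, first-out order, whichever state JAR starts in, which is why the initial state is immaterial.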
Each word in ROM is either an instruction or constant. Instructions
are loaded into the instruction register, and constants are loaded
into the D-register (48 bits). To obtain a constant, the ROM
program must first load the address of the desired constant into
the CS byte of the C-register, and then issue an instruction
containing CON in the BC field. The ninth bit of CS is forced to a
1; thus, constants must be in the upper half of ROM (addresses 256
through 511).
The CON signal reads the CS byte onto the ROM address lines,
disables Clock for one cycle, and enables a special clock TS that
loads the addressed ROM contents into the D-register. See the
D-register logic diagram. When the CON bit (R21) is detected, it is
loaded by TS into the master CON latch. At Clock, the CON bit is
transferred to the slave latch and inverted to disable Clock. The
low input to pin 6 (input control code = 01) causes the master
latch to clear, so that at the next Clock CON will go false. This,
inverted to true by U92B, reenables Clock. The net result is that
one Clock pulse has been inhibited, temporarily halting program
execution while the constant is being read.
Note, however, that the TS clock has continued to run, and this
clock, enabled during the interval that CON is high, loads ROM bits
R0 through R47 (the constant) into the D-register.
If a series of constants is called, the CIC bit (R38) may be used
to increment the constant address in CS. The purpose of the CKC
flip-flop (which, like the CON flip-flop, is clocked by TS) is to
assume the function of the CIC flip-flop (disabled in the absence
of Clock) during the CON cycle. Note that this feature applies only
for CS addresses below 24 (decimal). These 24 locations provide a
table that results in a numerical convergence after 25 steps. The
S24 signal inhibits CIC for addresses 24 or higher in order to
conserve ROM space. The program may continue to call for constants
and keep issuing the CIC command, but in effect the contents of
location 24 will continue to be read on each further call.
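The effect of the S24 signal on a run of constant calls can be sketched as a saturating increment. The names below are hypothetical, and the model covers only the address sequence, not the clocking.

```python
S24_LIMIT = 24  # CIC is inhibited for CS addresses 24 or higher


def next_constant_address(cs, cic=True):
    """Return the CS value after one constant call with optional CIC."""
    if cic and cs < S24_LIMIT:
        return cs + 1
    return cs


def constant_addresses(start, calls):
    """List the CS addresses read by a run of CON calls that all issue CIC."""
    addresses = []
    cs = start
    for _ in range(calls):
        addresses.append(cs)
        cs = next_constant_address(cs)
    return addresses
```

Starting at address 22, five successive calls read addresses 22, 23, 24, 24, and 24.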
IND reads the 8-bit CS byte onto the ROM address lines. For IND
alone, only the low half of ROM is addressable (0 to 255) since the
ninth address bit (RA8) cannot be controlled by the 8-bit CS
register. For CON, however, U61A forces the ninth bit to a 1 (since
AD9 is normally 0), so that constants will always be read from the
high half of ROM (256 to 511). For certain purposes (such as the
ROM dump routine), AD9 can be made true, so that CON can also read
the lower half of ROM.
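The difference between IND and CON addressing can be sketched as follows. The function names are hypothetical, and the sketch assumes that the ninth address bit for CON is the complement of AD9, which is inferred from the statement that the bit is forced to 1 because AD9 is normally 0.

```python
def ind_address(cs):
    """IND: the 8-bit CS byte is the address; the ninth bit stays 0."""
    return cs & 0xFF


def con_address(cs, ad9=0):
    """CON: the CS byte plus a ninth bit assumed to be the complement of AD9."""
    ninth_bit = (ad9 & 1) ^ 1
    return (ninth_bit << 8) | (cs & 0xFF)
```

With AD9 at its normal value of 0, a CS value of 0x12 addresses location 0x112 in the upper half of ROM; with AD9 true, the same CS value addresses location 0x12 in the lower half.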
The floating point processor contains three nearly identical
arithmetic sections, each typically consisting of a register, a
shifter, and an adder. The A-register/shifter/adder will be
discussed in detail first; differences for the B and C sections
will be described later.
The FPP A-register accommodates 48 bits of data in six separately
controllable bytes (A0 through A5). In addition, there is an
exponent byte (AE) and a shift byte (AS).
The A-shifter provides a means of selecting any eight consecutive
data bits from the 48 bits stored in the FPP A-register starting at
any bit position.
The A-adder adds an 8-bit byte, selected from the A-register, to
another 8-bit byte, selected from the C register, the B-shifter,
the D-shifter, or the A-shifter.
With reference to the block diagram of FIGS. 9A-D, the logic will
be discussed left to right across the A-register/shifter/adder
block. On the left side of the block is a series of four transfer
mode gates (each representing eight separate gates for the complete
byte). If transfer mode 1 is selected in the ROM instruction, one
of the C-register bytes (on the C50 through C57 lines) will be
transferred as an input to the A-adder. (The C byte number selected
will be the same as the A byte number selected.) Similarly, if
transfer mode 2, 3, or 4 is selected, the gates will transfer a
shifted B byte (on the B70 through B77 lines), a shifted D byte (on
the D70 through D77 lines) or a shifted A byte (on the A70 through
A77 lines).
The output of the Transfer Mode gates is applied to a
true/complement network consisting of an "and"/"nor" pair of gates
for each bit. If the CPA instruction bit is true, each bit is
complemented before being routed to the A-adder on the A90 through
A97 lines.
The other input to the A-adder (lines A60 through A67) is enabled
if the RRA bit of the instruction register is true. The input
consists of one of the FPP A-register bytes on the A50 through A57
lines (byte selection described below).
The result of the addition appears on the A00 through A07 lines,
with a possible carry saved in the CY bit register. The carry may
be used (propagated) by a PCY instruction in the next cycle. It is
also possible to inject a carry (actually an increment by one) by
means of a CIA signal. PCY and CIA are functions of the Special
field of the instruction word, as is BI8, which can force a one on
the eighth bit (A97) of the transferred input to the adder.
Tests which can be made on the A00 through A07 output are: eighth
bit true or false, eighth bit of A does or does not equal eighth
bit of B, and adder output is zero or non-zero.
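A behavioral sketch of the byte-wide adder may help tie together the RRA, CPA, CIA, PCY, and BI8 controls. It is a simplified model with hypothetical function names; in particular, applying BI8 after the complementer is an assumption based on A97 being an adder input line.

```python
def a_adder(a_byte, t_byte, rra=True, cpa=False, cia=False,
            pcy=False, cy=0, bi8=False):
    """Return (sum, carry_out) for one 8-bit add.

    a_byte: register byte (A50-A57), used only if rra is true.
    t_byte: byte from the transfer mode gates, complemented if cpa.
    cia injects a one; pcy propagates the saved carry cy.
    """
    x = (a_byte & 0xFF) if rra else 0
    y = t_byte & 0xFF
    if cpa:
        y ^= 0xFF              # true/complement network
    if bi8:
        y |= 0x80              # force a one on the eighth bit (assumed order)
    carry_in = 1 if cia else (cy if pcy else 0)
    total = x + y + carry_in
    return total & 0xFF, total >> 8


def add48(a, b):
    """Add two 48-bit values one byte per cycle, propagating the carry."""
    result, cy = 0, 0
    for i in range(6):
        s, cy = a_adder((a >> (8 * i)) & 0xFF, (b >> (8 * i)) & 0xFF,
                        pcy=(i > 0), cy=cy)
        result |= s << (8 * i)
    return result
```

In this model, complementing the transferred byte and injecting a carry (CPA with CIA) gives a two's-complement subtraction of one byte, and six successive cycles with PCY give a full 48-bit addition.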
The adder output is applied to all eight byte positions of the
A-register (the data is stored in a master register at Clock time).
At Clock time, the data will be transferred into a byte position
(slave register) which is selected by one of eight enabling
signals: AY0 through AY5, AYE, or AYS. The enabling signal is
derived from the SR, SY, and YC fields of the instruction register.
The SR field specifies the A-register (SRA), and SY either
specifies byte AS, AE, A5 or else enables the YC field (byte
counter) to select one of the six data bytes, A0 through A5. The
byte counter produces an octal output on the ROM address card,
consisting of signals Y0, Y1, Y2, which is decoded on the FPP
interface card to produce the SY0 through SY5 signals. The decoder
is not enabled if the SY field specifies SY5 (store in A5 byte),
SYS (store in AS byte) or SYE (store in AE byte).
The bytes stored in the FPP A-register can be selectively read out
by read signals derived from the RY and YC fields of the
instruction register. The RY field either specifies byte AS, AE, or
A5 (by RYS, RYE, or RY5 signals), or else enables the decoded byte
count from the YC field to select one of the six data bytes (by the
RY0 through RY5 signals). The selected byte is routed via the A50
through A57 lines to the A-adder.
In addition to the selective byte reading described in the
preceding paragraph, provision is also made to read any adjacent
eight bits in the data portion of the A-register. Byte boundaries
are ignored, and the register is looked at as a 48-bit data
register. Selection is accomplished by the A-shifter, under control
of the shift byte (AS) in the A-register.
The shifter may be viewed as an addressable reader. (See FIG. 75.)
The numerical value of the shift byte (decimal) points to the least
significant bit of the desired 8-bit series. This bit and the next
higher seven bits are read out to the transfer mode gates.
As shown in FIG. 75, special cases occur when the shift byte points
to bit positions higher than 40. (Seven of the eight bits of AS are
used for addressable reading, so AS can point to values as high as
127.) When the AS value is between 41 and 47 (inclusive), one or
more bits selected at the high end of the series of eight will be
nonexistent. These non-existent bits are referred to as phantom
bits (P); provision is made to fill these bit positions on the
output lines (A70 through A77) with either zeros or copies of the
sign bit (bit 47). The desired choice is made by controlling bit 7
of the shift byte (AS7): if 0, signs will be copied (arithmetic
shifting); if 1, zeros will be filled in (logical shifting). Note
that when AS is 48 or higher, all of the selected bits will be
phantom bits.
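The addressable-reader view of the shifter, including phantom bits, can be expressed as a short Python function. This is a behavioral sketch with a hypothetical name; it treats the register as a 48-bit integer with bit 47 as the sign.

```python
def shifter_read(reg48, as_byte):
    """Read eight consecutive bits of a 48-bit register.

    The low seven bits of as_byte point to the least significant bit
    of the series. Positions 48 and above are phantom bits: copies of
    the sign (bit 47) if AS7 = 0, zeros if AS7 = 1.
    """
    reg48 &= (1 << 48) - 1
    pointer = as_byte & 0x7F
    fill = 0 if (as_byte & 0x80) else (reg48 >> 47) & 1
    out = 0
    for i in range(8):
        pos = pointer + i
        bit = ((reg48 >> pos) & 1) if pos < 48 else fill
        out |= bit << i
    return out
```

For a register holding 0x8000000000FF, a pointer of 44 reads four real bits and four phantom bits: the result is 0xF8 with sign copying (AS7 = 0) and 0x08 with zero fill (AS7 = 1).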
Details of the selection process are shown in FIG. 76. The six
least significant bits of AS are decoded octally into two sets of
selection signals, designated SW0 through SW7 and SV0 through SV7.
(AS6, if true, would result in the all-phantom condition, so it is
not decoded but is simply "or"-tied with SW6 and SW7; see next
paragraph.) The SW0 through SW5 signals accomplish a preselection
of 15 out of the 48 register bits, and the SV0 through SV5 signals
select 8 out of the 15 preselected bits. These final eight bits are
routed out on the A70 through A77 lines.
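The two-stage selection can be modeled as a coarse pick of a 15-bit window followed by a fine pick of eight bits within it. The sketch below assumes that the SW signals are decoded from AS bits 3 through 5 and the SV signals from AS bits 0 through 2; the text does not state this assignment explicitly, and the function name is hypothetical.

```python
def two_stage_select(reg48, as_byte):
    """Select eight bits via a 15-bit preselection (SW) and a final pick (SV)."""
    reg48 &= (1 << 48) - 1
    sign = (reg48 >> 47) & 1
    fill = 0 if (as_byte & 0x80) else sign      # AS7 chooses zeros or signs
    sw = (as_byte >> 3) & 0x7                   # assumed: coarse select
    sv = as_byte & 0x7                          # assumed: fine select
    if (as_byte & 0x40) or sw >= 6:
        # All-phantom condition (AS6, SW6, or SW7): eight fill bits.
        return 0xFF if fill else 0x00
    base = 8 * sw
    # Preselect 15 bits starting at the byte boundary; bits past 47 are phantom.
    window = [((reg48 >> (base + i)) & 1) if base + i < 48 else fill
              for i in range(15)]
    # Final selection of 8 out of the 15 preselected bits.
    return sum(window[sv + i] << i for i in range(8))
```

A 15-bit window is the smallest that works here, because a fine offset of up to 7 plus an 8-bit series spans at most 15 positions.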
Refer to the A-shifter logic diagram for details on the generation
of phantom bits. Note that U50C gates sign bit 47 to the higher
order SW5 positions if AS7 is 0. If AS7 is a 1, the output of U50C
is 0. (Final selection of one or more of these bits is made by the
SV0 through SV7 signals.) For the all-phantom condition, the
shifter network is ignored completely (all zeros on the A70-77
lines); instead, a true or false TSA signal is sent to the
complementing networks. Gate U50D is enabled by SW6, SW7 or AS6,
and will provide a true output if phantom signs are desired (AS7 =
0) and the sign bit happens to be a 1. Otherwise, if the sign bit
is 0 or if phantom zeros are desired (AS7 = 1), TSA will be false.
Depending on the transfer mode selected, TSA will affect one of the
three complementers (A, B, or C) by inverting the existing all-zero
output to all ones (TSA true) or will leave the data as all zeros
(TSA false). The result is eight copies of the sign (1 or 0), or
eight zeros.
For microprogramming purposes, it is advantageous to have the
pointer in AS keep in step with the byte counter. This means that
whenever the byte counter is incremented or decremented to enable
the next higher or lower byte position, the shift pointer should
also change value to enable the next higher or lower series of
eight bits. The AY8 adder performs this function.
In order for the AS value to point to a new series of eight bits,
its value must increase by 8 when YP1 increments the byte counter
and must decrease by 8 when YM1 decrements the byte counter.
Furthermore, when the byte counter rolls over from 5 to 0
(incrementing, modulo 6) or from 0 to 5 (decrementing), the AS
value must change correspondingly: return to its original value or
go to the original value plus 40 (i.e., 5 .times. 8),
respectively.
Referring to the ROM address card logic diagram, FIGS. 14A-R, note
that when YP1 increments the byte counter (via U35E), it also
increments the AY8 adder. Since the AY8 adder operates on bits 3
through 6 of the AS register (rather than 0 through 3), each
increment adds 8 to the contents of AS, via the AP3 through AP6
lines. Similarly, when YM1 decrements the byte counter (by adding
all 1's via U35D/C, U33B, and U32A), it also decrements the AY8
adder decrementing AS by 8 via the AP3-6 lines.
Gates U35E, U33A, and U32D cause the byte counter (and AY8 and BY8
adders) to act as modulo 6 counters when incrementing. When the
count of 5 is detected by U24A and U15D, the next YP1 will inject a
quantity which, when added to 5, will produce zero. For the 3-bit
byte counter this quantity is 3 (via U35E and U33A). For the 4-bit
AY8 and BY8 adders this quantity is 11 (all three gates).
To achieve modulo 6 when decrementing, gates U33B and U32A are
disabled at the count of zero, and allow U35D and U35C to inject a
quantity of 5. This reverts the byte counter to the count of 5 and
adds 40 to the AS register via the AP3 through AP6 lines.
(Incidentally, the AP3-6 lines are disabled when AS is originally
loaded, by the SYSA signal.)
The method by which the byte counter is forced to zero (YFO) is to
add the current value of Y to its complement (U35C, U33C, U16B) and
inject a carry (U35A). For the 4-bit adders, U32B injects the
necessary one-bit for the most significant bit position.
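The injected quantities described above (3 and 11 when incrementing from 5, 5 when decrementing from 0) can be checked with a small arithmetic model. The function names are hypothetical; the model covers the 3-bit byte counter and the 4-bit adder acting on AS bits 3 through 6, and leaves AS bits 0 through 2 and AS7 untouched.

```python
def _apply(y, as_byte, inject3, inject4):
    """Add inject3 to the 3-bit counter and inject4 to AS bits 3-6."""
    y = (y + inject3) & 0x7
    field = (((as_byte >> 3) & 0xF) + inject4) & 0xF
    return y, (as_byte & 0x87) | (field << 3)


def increment(y, as_byte):
    """YP1: count up modulo 6; AS rises by 8 or returns to its original value."""
    if y == 5:
        return _apply(y, as_byte, 3, 11)    # 5 + 3 = 8 and 5 + 11 = 16 both wrap to 0
    return _apply(y, as_byte, 1, 1)


def decrement(y, as_byte):
    """YM1: count down modulo 6; AS falls by 8 or rises by 40."""
    if y == 0:
        return _apply(y, as_byte, 5, 5)     # inject 5: counter goes to 5, AS gains 40
    return _apply(y, as_byte, 7, 15)        # adding all ones subtracts one


def force_zero(y):
    """YFO: add the count to its complement and inject a carry."""
    return (y + (~y & 0x7) + 1) & 0x7
```

Starting from a count of 0 with AS = 3, five increments give a count of 5 with AS = 43, and the sixth returns both to their starting values.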
The FPP B-register/shifter/adder is identical to the A section
described above, with only signal nomenclature changes and a
different assignment of inputs for the transfer modes.
The C-register/adder does not have an associated shifter. Instead,
the third shifter is assigned to the D-register. The D-shifter is
controlled in parallel with the A-shifter by the AS shift byte.
Since the CS byte is used for indirect addressing of ROM, the CS
output is routed to the ROM address card. Also the S24 signal is
made available to the conditional branching test logic.
On the C-adder card, the CSX line is open (0), whereas on the A- and
B-adder cards ASX and BSX are enabled by tying to +4.75 volts. This
disables the CP input lines to the shift byte (CS), since these
lines are not used in the C arithmetic section.
As mentioned above, data is transferred into or out of the FPP in
three successive 16-bit words. It was also stated that the COMP
sends 16 bits of data with every ENC, and the FPP returns 16 bits
of data with every FLG, whether or not data is actually used at
either end. Data to the FPP is loaded into the C-register (and
transferred to the B-register if a load opcode follows), and is
sent from the B-register. Referring to Table 13, the process is as
follows:
TABLE 13. X REGISTER TRANSFER SEQUENCE

ENC     INPUT                                   OUTPUT
No. 1   IM0-7 -> C5                             B5 -> A4
        IL0-7 -> C4                             A4 (FLG)
No. 2   IM0-7 -> C3                             B3 -> A2
        IL0-7 -> C2                             B2 (FLG)
No. 3   IM0-7 -> C1                             B1 -> A0
        IL0-7 -> Convert to FPP format -> C0    B0 Convert to FPP format (FLG)

On the first ENC, the entry routine first loads the high order
eight bits (IM0 through IM7) into C5 while, simultaneously, B5 is
transferred to A4 and read out to the output lines (A50 through
A57). Then the byte counter is decremented (pointing to byte 4).
The low order input bits (IL0-7) are loaded into C4, and B4 is read
out to the B50 through B57 lines. A FLG signal is issued to the
COMP, telling it that it can store the 16 bits from A4 and B4.
On the second ENC, the byte counter decrements to 3, and IM0-7 is
loaded into C3, while B3 is transferred to A2. Decrementing to
count 2 allows C2 to be loaded, and A2 and B2 to be read out (with
FLG).
On the third ENC, the byte counter decrements to 1, and IM0-7 is
loaded into C1. Then, when the byte counter decrements to 0, a
format conversion occurs which moves the exponent sign bit to the
proper position. (Internally in the FPP, this bit is in the most
significant bit position; externally in the computer, it is in the
least significant bit position.) Byte B1 is now transferred to A0,
and A0 and B0 are read out (with FLG).
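The input half of this sequence can be sketched as a routine that distributes three 16-bit words over the six C-register bytes. The names are hypothetical. The exponent conversion is modeled as a one-bit rotation that moves the sign from the least significant bit (external) to the most significant bit (internal); that is an assumption consistent with the description, since the text does not give the exact bit mapping.

```python
def exponent_to_internal(ext):
    """Assumed conversion: rotate right by one so the sign bit becomes bit 7."""
    ext &= 0xFF
    return ((ext & 1) << 7) | (ext >> 1)


def exponent_to_external(internal):
    """Inverse of the assumed conversion: rotate left by one."""
    internal &= 0xFF
    return ((internal << 1) & 0xFF) | (internal >> 7)


def load_operand(words):
    """Return C-register bytes [C0, ..., C5] after three ENC transfers."""
    c = [0] * 6
    for n, word in enumerate(words):        # n = 0, 1, 2 for ENC No. 1, 2, 3
        im, il = (word >> 8) & 0xFF, word & 0xFF
        c[5 - 2 * n] = im                   # IM0-7 goes to C5, C3, C1
        # IL0-7 goes to C4, C2, and (after conversion) C0.
        c[4 - 2 * n] = exponent_to_internal(il) if n == 2 else il
    return c
```

For the words 0x1234, 0x5678, and 0x9A03, this model leaves 0x12 in C5 down to 0x9A in C1, and the converted byte 0x81 in C0.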
POWER SUPPLY
The power supply of the floating point processor generates two
regulated dc supply voltages for all logic circuits in the unit:
+4.75 volts and -2 volts. (A third dc voltage, +10 volts, is also
generated, but this supply is used only within the power supply
itself.)
FIG. 77 illustrates the power supply circuits in simplified form.
The 115- or 230-volt ac input is stepped down to a nominal 12 volts
ac and rectified by a pair of silicon-controlled rectifiers (SCR).
The inductor/capacitor filtered output is 6.75 volts dc, referenced
to ground such that the positive line is +4.75 volts and the
negative line is -2 volts with respect to ground.
The full 35-ampere current capacity is available to the +4.75-volt
load, and up to 35 amperes is available to the -2-volt load. Since
the -2-volt load is less than the +4.75-volt load, the difference
current is diverted through the -2-volt shunt regulator. This
regulator acts in the same way as would a Zener diode. A variable
amount of current is drawn through the shunt in order to maintain a
constant -2-volt level.
The level of the +4.75 voltage is maintained constant by
controlling the conduction time of the SCR. The +4.75-volt level is
detected by a differential amplifier, which compares the voltage
with a Zener diode reference. The difference output is used to
control the slope of a ramp voltage, which is synchronized to the
120 Hz rectified line frequency. When the ramp reaches the trigger
level of a unijunction transistor in the ramp generator, the ramp
terminates, generating a positive pulse of about 10 volts amplitude
and 20 microseconds duration. This pulse triggers the SCR's, which
will then continue to conduct for the remainder of the half cycle.
As shown in FIG. 77 (note examples of ramp slope and rectified sine
wave), a variance of ramp slope has the net result of altering the
conduction time (shaded area). Consequently, the energy delivered
to the LC filter will be increased or reduced proportionately, thus
providing the means of controlling the output dc level.
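The relation between trigger timing and dc output can be illustrated with the textbook formula for an idealized full-wave phase-controlled rectifier with a freewheeling diode, V_avg = (V_peak / pi) * (1 + cos(alpha)), where alpha is the firing angle within each half cycle. This formula is not taken from the patent, and it ignores SCR drops, transformer regulation, and filter losses; the function names are hypothetical.

```python
import math


def average_output(v_peak, firing_angle):
    """Ideal average dc output for a firing angle in radians (0 to pi)."""
    return v_peak / math.pi * (1.0 + math.cos(firing_angle))


def firing_angle_for(v_peak, v_target):
    """Firing angle (radians) that gives v_target in the ideal model."""
    return math.acos(v_target * math.pi / v_peak - 1.0)


if __name__ == "__main__":
    v_peak = 12.0 * math.sqrt(2.0)          # nominal 12 volts ac secondary
    alpha = firing_angle_for(v_peak, 6.75)  # 6.75 volts dc filtered output
    print(f"ideal firing angle: {math.degrees(alpha):.1f} degrees")
```

In this idealized model an earlier trigger (smaller firing angle, that is, a steeper ramp) lengthens the conduction time and raises the output, and a later trigger lowers it.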
Referring now to FIGS. 40 A-J, input ac power is applied to power
line assembly A1. This snap-in module contains the ac line
connector, line fuse, rf interference filter, 115/230V line voltage
switch, and terminals for connection of the front panel POWER
switch, power-on indicator lamp (DS1), and power transformer. Relay
K1 is inserted in series with the transformer primary, so that
power will be turned off if either the computer loses power (-23.8V
drops) or the ambient temperature in the FPP unit rises too
high.
Sensing of the +4.75-volt level is made from a point on the
backplane bus. Due to the high currents involved, bus resistance
itself will drop the dc level slightly; power is therefore applied
to the bus at two points, and the sense line is connected to a
point that represents an average value.
The sensed +4.75 voltage is applied to a differential amplifier at
Q1/Q2, which compares a divided sample (R20/R21) to a presettable
reference level from resistor R25 (+4.75V ADJ). Any voltage
difference between the bases of Q1 and Q2 is amplified and applied
to Q3, altering the charging rate of ramp capacitor C30. When Q3
has charged C30 to the triggering level of unijunction transistor
Q4, Q4 discharges C30 to the -2-volt clamping level. The sharp
negative transition at the base of Q5 turns on Q5 for about 20
microseconds, dependent on circuit constants, and the resultant
positive pulse is applied through emitter follower Q6 to the SCR
trigger inputs (CR5, CR6). Diode CR22 limits the pulse amplitude to
+10 volts and protects Q6; CR8 protects the SCR's (which are
non-conducting before the pulse arrives).
The positive pulse turns on CR5 or CR6 (depending on the ac cycle
polarity), charging filter capacitors C11, C12, and C13 through
inductor L4. At the end of the half cycle, ac polarity reverses and
the SCR ceases conduction. Since the other SCR will not begin its
conduction until triggered, neither SCR is conducting at this time.
The inductive field of L4 begins to collapse, building up a reverse
voltage which could be destructive if protection were not provided.
Diode CR7 provides this protection by coming into conduction when
the reverse voltage exceeds the -2-volt level, and provides a
current path back to the filter capacitors. Thus, even when both
SCR's are off, the inductor still delivers current to the load.
When the next SCR is triggered, it abruptly puts out a positive
voltage to the inductor, and thus reverse biases CR7. In summary:
CR7 conducts when the SCR's are not conducting.
As explained above, the timing of the SCR trigger accomplishes the
voltage regulating function.
The primary purpose of Q7 is to synchronize the unijunction
oscillator to twice the line frequency. A secondary function is to
inhibit the triggering of unijunction transistor Q4 when the
crowbar is on, thus reducing current delivered to the crowbar,
CR80. When the input voltage (pulsating dc from the input to L4) is
in excess of +9 volts, Q7 is saturated (on), providing a low
impedance path for the Q3 collector current, diverting it from C30.
Thus the unijunction oscillator is held in the off state. (Note
that a positive input from the crowbar, via Q29, could permanently
hold the oscillator in this off state.) Then, when the pulsating
voltage drops below +8 volts, Q7 is cut off, and the current from
the Q3 collector is shunted to ramp capacitor C30. This results in
a voltage ramp on the emitter of Q4, the slope of which (as
discussed earlier) is determined by the collector current of Q3.
The start of the ramp is therefore determined by the on-to-off
transition of Q7, which occurs twice for each cycle of the
line.
The -2-volt sense voltage referred to above is applied through a
presettable divider to the base of Q18. The bottom end of the
divider is held constant by a Zener diode reference. The -2-volt
adjustment resistor is set so that the Q18 base is at zero volts
when the -2-volt output is at its nominal value. This
zero-volt-level is compared with the zero-volt ground at the
emitter of Q19. Any difference is amplified by Q19, Q20, and Q21,
altering the flow of shunt current through Q22. The direction of
change (more or less current) is such as to maintain a fixed
voltage value on the -2-volt sense line. As mentioned earlier
(paragraph 4-94), the circuit acts like a Zener diode in
maintaining a fixed voltage output. About 5 amperes is passed
through Q22.
Transistors Q8 and Q9 are normally conducting. When an unusual
current drain increases the dc voltage drop across inductor L4 to a
specific level (determined by the selected values of R40 through
R44), Q8 and Q9 will be biased off. Under this condition, CR35
clamps the unijunction input to a level that is below the trigger
point. No pulses are therefore applied to the SCR's, and no further
conduction occurs. Both +4.75-volt and -2-volt outputs are thus cut
off.
There are three separate circuits involved in detecting and acting
on out of limit dc voltage conditions. These three circuits (-2V
limit sense, +4.75V limit sense, and crowbar) are discussed
together under the current heading.
FIG. 78 illustrates the actions that occur when either the +4.75 or
-2 voltages go out of limits. When the +4.75 voltage (applied to
Q23/Q24 bases) rises too high, to a level set by R91, Q24 will
conduct and activate the power fail circuit (discussed later under
paragraph 4-110). Similarly, if the +4.75 voltage drops below a
negative limit set by R92, Q23 will conduct and activate the power
fail circuit. In the -2V limit sense circuit, if the -2 voltage
(applied to the top of the divider), becomes too positive, to a
level set by R61, Q14 will conduct and activate the power fail
circuit. The negative limit sensing circuit uses a normally
conducting emitter follower (Q13). When the -2 voltage becomes too
negative, Q15 will conduct and activate the power fail circuit.
If the +4.75 voltage becomes excessively positive (above about +6
volts), or if the -2 voltage becomes excessively negative (more
than about -3 volts), the crowbar circuit triggers and cuts off
both supplies.
The crowbar circuit uses an SCR diode (CR80). When the -2-volt
level goes more negative than the breakdown level of CR82 (normally
an effective open circuit), CR82 causes Q30 (and Q31) to conduct.
Or, if the +4.75-volt level goes more positive than the breakdown
level of CR81, Q31 will again be caused to conduct. This is because
both emitter and base voltages increase together as the +4.75
voltage rises; then CR81 breaks down and holds the base low. When,
from either cause, Q31 conducts, SCR diode CR80 is triggered,
effectively short-circuiting the +4.75V and -2V supplies together.
This protects logic circuits from overvoltage damage. To prevent
the rectifiers from delivering any more current to this short
circuit, Q29 (which goes into conduction when the SCR triggers)
inhibits the sync amplifier. Transistor Q7 is driven into
saturation, thus preventing further trigger pulses to SCR
rectifiers CR5 and CR6.
Diodes CR60 and CR61 rectify a sample of the transformer secondary
output, and the resulting pulsating direct voltage is applied to
two RC filters. One filter (R70, C50) has a short time constant,
and the other (R71, R72, C51) has a long time constant. The filters
are isolated from each other by CR63. As a result (see FIG. 79), a
dc voltage representing the peak value of the rectified ac is
present at the emitter of Q16, and a partially filtered waveform is
present at the base. Normally (see half-cycle number 1), the
exponential decay is not sufficient to cause conduction of Q16
before the next half-cycle restores the C50 charge. If, however, at
least two half-cycles are missed (assume ac power is lost at the
end of half-cycle number 2), the base voltage will drop to the
point where conduction of Q16 will occur. With Q16 conducting, Q17
will also be turned on, thus activating the power fail circuit.
When any of the previously discussed voltage sensing circuits
indicate a failure, Q26 is caused to conduct. (Note that four of
the sources, Q14, Q15, Q17, and Q23, require inversion by Q25,
whereas the Q24 source does not.) The conduction of Q26 in turn
causes the other four transistors in the power fail circuit to
conduct. The EPF signal (normally low) goes high to initiate a
power fail interrupt in the computer. A few milliseconds later, EP0
(normally high) goes low; when power is restored, EP0 will go high
again, initiating a restart sequence in computers which have the
restart option installed and enabled.
Several circuits in the power supply require a +10-volt operating
voltage. To supply this, the transformer secondary is rectified by
CR40 and CR41, filtered by C45, and regulated by Q10. The control
for Q10 is the differential amplifier consisting of Q11 and Q12.
The reference voltage provided by CR42 is compared with a divided
sample of the +10V output, and any difference is applied as a
correction signal to the base of Q10.
INTERFACE INFORMATION
ENC: Initiates the reception of an operand word through the 16 bit
input port and the transmission of a result word through the 16 bit
output port.
FIRST OPERAND WORD: FPP receives through the 16 bit input port the
16 most significant magnitude bits (including sign) of a 48 bit
floating point operand (reception initiated by the 1st, 4th, 7th,
etc. ENC command after the last preceding OPC command).
SECOND OPERAND WORD: FPP receives through the 16 bit input port the
16 middle significant magnitude bits of a 48 bit floating point
operand (reception initiated by the 2nd, 5th, 8th, etc. ENC command
after the last preceding OPC command).
THIRD OPERAND WORD: FPP receives through the 16 bit input port the
8 least significant magnitude bits and the 8 exponent bits of a 48
bit floating point operand (reception initiated by the 3rd, 6th,
9th, etc., ENC command after the last preceding OPC command).
OPC: Initiates the reception of an operation code through the 16
bit input port and initiates the execution of that operation.
OPERATION CODE: FPP receives through the 16 bit input port an 8 bit
operation code contained in bits 0-7 (reception initiated by the
OPC command).
FLG: Indicates if following an ENC command that the reception of
an operand word is complete and that the transmission of a result
word has begun, or if following an OPC command that the execution
of an operation is complete.
FIRST RESULT WORD: FPP transmits through the 16 bit output port the
16 most significant magnitude bits (including sign) of a 48 bit
floating point result (transmission initiated by the 1st, 4th, 7th,
etc. ENC command after the last preceding OPC command).
SECOND RESULT WORD: FPP transmits through the 16 bit output port
the 16 middle significant magnitude bits of a 48 bit floating point
result (transmission initiated by the 2nd, 5th, 8th, etc. ENC
command after the last preceding OPC command).
THIRD RESULT WORD: FPP transmits through the 16 bit output port the
8 least significant magnitude bits and the 8 exponent bits of a 48
bit floating point result (transmission initiated by the 3rd, 6th,
9th, etc. ENC command after the last preceding OPC command).
ERR: Indicates that an error or special condition has been
encountered during the execution of an operation and that the
transmission of an error code has begun.
ERROR CODE: FPP transmits through the 16 bit output port an 8 bit
error code contained in bits 8-15 (transmission initiated by an
error or special condition encountered during the execution of an
operation).
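The word ordering defined above can be summarized in a short sketch. The names are hypothetical, and placing the eight least significant magnitude bits in the upper half of the third word (with the exponent in the lower half) is an assumption; the definitions above do not state which half carries which.

```python
WORD_ROLES = ("first", "second", "third")


def word_role(enc_count):
    """Role of the word moved by the n-th ENC after the last OPC (n starts at 1)."""
    return WORD_ROLES[(enc_count - 1) % 3]


def pack_words(magnitude40, exponent8):
    """Split a 40-bit magnitude and an 8-bit exponent into three 16-bit words."""
    m = magnitude40 & ((1 << 40) - 1)
    return [
        (m >> 24) & 0xFFFF,                       # 16 most significant bits, with sign
        (m >> 8) & 0xFFFF,                        # 16 middle bits
        ((m & 0xFF) << 8) | (exponent8 & 0xFF),   # 8 low bits, then exponent (assumed order)
    ]
```

Under these assumptions, the 1st, 4th, and 7th ENC commands each carry a first word, and a magnitude of 0x123456789A with exponent byte 0x03 is sent as 0x1234, 0x5678, 0x9A03.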
* * * * *