U.S. patent application number 12/105188 was filed with the patent office on 2008-04-17 and published on 2008-08-21 for data processor.
This patent application is currently assigned to Renesas Technology Corp. Invention is credited to Naoki Mitsuishi, Takahiro Okubo, Shinichi Shibahara.
Application Number | 20080201564 12/105188 |
Document ID | / |
Family ID | 29774398 |
Filed Date | 2008-04-17 |
United States Patent Application | 20080201564 |
Kind Code | A1 |
Mitsuishi; Naoki; et al. | August 21, 2008 |
DATA PROCESSOR
Abstract
An object of the present invention is to achieve fast data
processing. A unit (FF) is included for selecting whether a central
processing unit (CPU) performs instruction reading in units of 16
bits (a first word length) or in units of 32 bits (a second word
length). Depending on whether instruction reading is performed in
units of 16 bits or 32 bits, increment values (+2 and +4) by which
a program counter (PC) is incremented are switched. Data reading or
writing is performed in units of a given data length irrespective
of the selecting unit. When the CPU issues a request for
instruction reading in units of 16 bits or 32 bits or for data
reading or writing, a bus control unit performs reading or writing
a predetermined number of times according to a bus width designated
for a resource located at an address specified in the request. The
bus control unit causes the CPU to wait until an instruction of 16
or 32 bits long (read data) requested by the CPU gets ready.
Inventors: | Mitsuishi; Naoki; (Kodaira, JP) ; Shibahara; Shinichi; (Kodaira, JP) ; Okubo; Takahiro; (Hanno, JP) |
Correspondence Address: | MILES & STOCKBRIDGE PC, 1751 PINNACLE DRIVE, SUITE 500, MCLEAN, VA 22102-3833, US |
Assignee: | Renesas Technology Corp. |
Family ID: | 29774398 |
Appl. No.: | 12/105188 |
Filed: | April 17, 2008 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
10457920 | Jun 11, 2003 | 7376819 |
12105188 | | |
Current U.S. Class: | 712/238; 712/233; 712/E9.016; 712/E9.025; 712/E9.028; 712/E9.029; 712/E9.034; 712/E9.05; 712/E9.055 |
Current CPC Class: | G06F 9/30149 20130101; G06F 9/30036 20130101; G06F 9/3842 20130101; G06F 9/3814 20130101; G06F 9/30145 20130101; G06F 9/30167 20130101; G06F 9/3802 20130101; G06F 9/30032 20130101 |
Class at Publication: | 712/238; 712/233; 712/E09.016 |
International Class: | G06F 9/30 20060101 G06F009/30 |
Foreign Application Data
Date | Code | Application Number |
Jun 28, 2002 | JP | 2002-191339 |
Claims
1-11. (canceled)
12. A data processor comprising: an instruction executing unit
which reads and decodes an instruction, and executes the
instruction according to the result of the decoding, wherein said
instruction executing unit includes a first storing unit and a
second storing unit that store an instruction code, wherein said
second storing unit stores a branch target instruction read in
response to a delay branch instruction, and wherein said first
storing unit stores a previous branch target instruction, and when
a delay slot instruction is terminated, said first storing unit is
cleared.
13. A data processor according to claim 12, said instruction
executing unit further comprising: a prefetch counter; and a
temporary register unit used to temporarily store information,
wherein the temporary register unit stores a branch target address
or a value produced by incrementing the branch target address in
response to a delay branch instruction, wherein said prefetch
counter stores a previous branch target address, and wherein said
prefetch counter is usable during execution of a delay slot
instruction, and when the delay slot instruction is terminated, the
contents of said temporary register unit or the value produced by
incrementing the branch target address is pre-set in said prefetch
counter.
14. A data processor according to claim 12, wherein when said
instruction executing unit controls execution of a delay slot
instruction, said instruction executing unit disables instruction
reading in an execution state of the delay slot instruction in
which an instruction code thereof is read and an execution state
thereof in which an effective address thereof is calculated.
15. A data processor according to claim 12, wherein said
instruction executing unit is able to execute a branch instruction
which is the same addressing mode as the delay branch instruction
and which does not cause a delay branch.
16-21. (canceled)
Description
BACKGROUND OF THE INVENTION
[0001] The present invention relates to a data processor and, more
particularly, to a technology that is effective when applied to,
for example, a central processing unit (CPU), or a microcomputer or
data processor including the CPU.
[0002] A microcomputer realized with semiconductor integrated
circuits has undergone extension of an address space, expansion of
an instruction set, or an increase in a processing speed. For
example, Japanese Unexamined Patent Publication No. Hei5
(1993)-241826 or No. Hei6 (1994)-51981 describes an example of a
microcomputer that has the address space thereof extended or the
instruction set thereof expanded while maintaining the
interchangeability on an object level.
[0003] Moreover, Japanese Unexamined Patent Publication No. Hei8
(1996)-263290 describes an example of a microcomputer whose CPU is
interchangeable with a CPU that executes basic instructions in two
execution states, and whose processing is so fast as to execute the
basic instructions while bringing them into one state. Furthermore,
the microcomputer enjoys really fast processing owing to
incorporation of a multiplier independent of the CPU.
[0004] Owing to the realization of fast processing, equipment to be
controlled by a microcomputer can be designed to operate fast or
can be sophisticated. Otherwise, a microcomputer that
conventionally consists of a plurality of semiconductor integrated
circuits can be designed compactly by integrating the semiconductor
integrated circuits with one another.
[0005] The present inventor has made the proposal described below
in the previous application (Japanese Unexamined Patent Publication
No. 2000-357089). Namely, the width of an internal data bus is made
larger than at least a basic unit of an instruction (for example, a
word). An instruction register is included for holding a plurality
of units of a read instruction. A means is included for monitoring
the number of bits representing an instruction present in the
instruction register. Based on a basic unit execution time (during
which one execution state persists), an instruction is executed so
that the instruction will be read in one execution state and a
program counter (PC) will be incremented at the same time, and the
instruction will have an effective address thereof calculated or
undergo a data operation in another execution state. An execution
state in which the instruction is merely read can therefore be
omitted. In response to a direction given by the monitoring means,
the execution state in which the instruction is merely read is
omitted (skipped) depending on the number of bits representing the
instruction present in the instruction register. Consequently, the
number of bits to be read as an instruction during instruction
execution is increased or decreased based on the instruction
length. Thus, it is intended to reduce the number of execution
states and achieve fast processing.
SUMMARY OF THE INVENTION
[0006] The present inventor et al. have discussed hardware needed
to achieve fast processing in a microcomputer or the like. In the
course of the discussion, the points described below were taken
into consideration.
[0007] For the control of a microcomputer realized with a single
chip or equipment, it is important to reduce an interrupt response
time. A CPU recognizes various events, which occur in an object of
control or within a microcomputer, as interrupts, and runs a
predetermined program according to an interrupt. In order to
realize desired control using various resources of a microcomputer,
the capability to extend desired control in real-time, that is,
within a predetermined time responsively to occurrence of a
predetermined event must be improved.
[0008] The present inventor et al. have discussed the point that a
logical or physical scale should desirably be minimized. When a
microcomputer is realized with a single chip containing a read only
memory (ROM) as a program memory, since the storage capacity of the
built-in ROM is smaller than that of an externally connected
memory, it is desired to reduce a program size.
[0009] In recent years, it has become almost a matter of common
practice to adopt a flash memory that can be electrically
programmable and erasable or an EEPROM (electrically erasable and
programmable ROM) as a ROM to be incorporated in a single-chip
microcomputer. The electrically erasable and programmable memory is
realized with a large memory device. Moreover, a high-voltage
generation circuit for programming and erasing is required. The
electrically erasable and programmable memory is therefore large in
physical scale. It is therefore greatly desired to reduce a program
size. The same applies to a built-in RAM (which stands for a random
access memory). Namely, a data length to be employed should be as
small as possible.
[0010] More and more programs to be installed in microcomputers are
described using a high-level language such as C language. When a
program is described using a high-level language, compared with
when a program is described using an assembly language, a program
size and a data length employed are likely to increase. From this
viewpoint, the present inventor has noticed that it is more
important for fast processing to realize the points described below
than the hardware-related point.
[0011] (1) [Bus Mode]
[0012] A means for achieving fast processing by realizing a 32-bit
internal bus would prove effective when a bus width can be expanded
like the one of a bus on which a built-in ROM is connected. The
range of applications of microcomputers is wide and programs are
often saved in an external memory. When the external memory is
employed, the 32-bit bus imposes many restrictions in the aspects
of an area of wiring, the number of pins included in a
microcomputer, and the price of a memory employed. It is possible
for a CPU to request instruction reading in units of 32 bits. The
32 bits can then be read as two 16-bit reads. However, the CPU has
to wait until all 32 bits it has requested are read. When a branch
instruction or the like is issued, excessively read bits of an
instruction are wasted (unused). Meanwhile, the CPU remains in the
wait state. Both the read bits and the time during which the CPU is
in the wait state are wasted. Not even an interrupt is accepted
during the wait, whereby a so-called interrupt response time
increases. Japanese Unexamined Patent Publication No. Sho56
(1981)-132624 describes a case where a bus width designation signal
is received through an input/output terminal and actions are
switched accordingly. However, no consideration is given there to
instruction reading.
[0013] (2) [Delay Branch]
[0014] In a microcomputer having a fixed pipeline, such as a
reduced instruction set computer (RISC) type microcomputer, a
delay branch may be caused in order to process a branch instruction
fast and simplify an internal logical structure. A branch
instruction and a preceding instruction are switched and then
executed. The switched branch instruction shall be called a delay
branch instruction and the switched preceding instruction shall be
called a delay slot instruction. Japanese Unexamined Patent
Publication No. Hei10(1998)-21074 describes a case where a delay
branch is applied to a variable-length instruction code. This case
is described on the assumption that a Harvard bus is employed and
different processing stages of pipelining are uniform. Moreover, if
the only instructions available for a given branch are delay branch
instructions, a no-operation (NOP) instruction must be allocated to
the delay slot period whenever there is no instruction capable of
being allocated to it. This invites an increase in the number of
undesired instructions or an increase in a program size.
[0015] (3) [A Plurality of Bit Shifts]
[0016] When bits are shifted a plurality of times, it is known that
a so-called barrel shifter is employed. For the barrel shifter,
refer to, for example, Japanese Unexamined Patent Publication Hei
10 (1998)-198550.
[0017] On the other hand, a CPU performs a plurality of operations,
and computing units for performing the respective operations must
be included. The operations include, for example, in addition to a
shift operation, arithmetic operations and logical operations.
These operations do not require series arrangement of bit positions
(shifting of the result of addition or the like). The bit positions
are therefore arranged in parallel. In order to raise the operating
frequency of the CPU, a computing unit capable of producing the
largest delay must be designed to process data fast. In other
words, the delay produced by the computing unit that produces the
largest delay among the computing units is tolerable by the other
computing units. In general, an arithmetic unit is likely to
produce a large delay because of a carry time.
[0018] When a plurality of bit shifts is performed, if a barrel
shifter is unemployed, a required number of shift instructions each
directing a shift of one bit position is described, or a program
loop is described. In order to shift five bit positions, the former
description may be structured as follows:
[0019] SHLL.L #1, ER1
[0020] SHLL.L #1, ER1
[0021] SHLL.L #1, ER1
[0022] SHLL.L #1, ER1
[0023] SHLL.L #1, ER1
[0024] The latter description may be structured as follows:
[0025] MOV.B #5, R0L
L1 SHLL.L #1, ER1
[0026] DEC.B #1, R0L
[0027] BNE L1
Although the frequency of occurrence is not high, an increase in a
program size is invited.
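As an illustration only, the two descriptions can be modeled in Python to confirm that they leave the same value in the register. The function names and the 32-bit mask are assumptions of this sketch and do not appear in the specification; ER1 is simply modeled as an unsigned 32-bit value.

```python
MASK32 = 0xFFFFFFFF  # ER1 is modeled as an unsigned 32-bit register


def shll_l_1(value):
    """Model of a one-bit logical left shift on a 32-bit register."""
    return (value << 1) & MASK32


def shift5_unrolled(er1):
    """Former description: five consecutive one-bit shift instructions."""
    er1 = shll_l_1(er1)
    er1 = shll_l_1(er1)
    er1 = shll_l_1(er1)
    er1 = shll_l_1(er1)
    er1 = shll_l_1(er1)
    return er1


def shift5_loop(er1):
    """Latter description: a counted loop around a single shift instruction."""
    counter = 5                # load the loop counter with 5
    while True:
        er1 = shll_l_1(er1)    # L1: shift by one bit position
        counter -= 1           # decrement the loop counter
        if counter == 0:       # the conditional branch falls through at zero
            break
    return er1
```

Under these assumptions the unrolled form costs five instructions and the loop form four instructions plus the loop overhead at run time, which is the program-size and speed trade-off the paragraph describes.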
[0028] (4) [Multiplier]
[0029] When a multiplier must be incorporated, a multiplier that
treats a multiplier, a multiplicand, and a product which have the
same data length, such as a multiplier that provides a product of
16 bits long by multiplying 16 bits by 16 bits, is often adopted
according to a data type, which is adopted for a source program
written in C language, other than a multiplier that provides a
product of 16 bits long by multiplying 8 bits by 8 bits. Moreover,
a programming technique that employs multiplication instead of
constant division is known through Japanese Unexamined Patent
Publication No. 2000-357096.
[0030] An object of the present invention is to achieve fast data
processing in terms of hardware, or more particularly, to achieve
fast data processing with the operating frequency of hardware held
unchanged. In short, an object of the present invention is to
decrease the number of execution states needed to perform
predetermined processing.
[0031] Another object of the present invention is to provide a data
processor capable of improving the use efficiency of resources, to
provide a data processor capable of improving user-friendliness, or
to provide a data processor capable of minimizing a program
size.
[0032] The above and other objects of the present invention and the
novel features thereof will be apparent from the contents of this
specification and the appended drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] FIG. 1 is a block diagram showing the details of a CPU that
is an example of a data processor in accordance with the present
invention;
[0034] FIG. 2 is a block diagram showing a single-chip
microcomputer to which a data processor in accordance with the
present invention is adapted;
[0035] FIG. 3 is an explanatory diagram presenting the
configurations of general-purpose registers and control registers
which are incorporated in the CPU;
[0036] FIG. 4 is an explanatory diagram presenting the address
space of the CPU;
[0037] FIG. 5 is an explanatory diagram showing an example of an
instruction format for a machine language unique to the CPU;
[0038] FIG. 6 is an explanatory diagram presenting concrete formats
for several typical instructions relating to the present
invention;
[0039] FIG. 7 is an explanatory diagram presenting a logical
description concerning instruction reading to be performed by a bus
access control unit BCNT;
[0040] FIG. 8 is an explanatory diagram presenting a logical
description concerning the contents of control to be extended to a
prefetch counter pfc by an incrementer INC;
[0041] FIG. 9 is an explanatory diagram presenting a logical
description containing a control instruction that directs selecting
of +1, +2, or +4 according to a data length so as to increase the
number of entry conditions in the logical description shown in FIG.
7 or FIG. 8;
[0042] FIG. 10 is an explanatory diagram presenting a logical
description concerning the contents of control to be extended by a
first shifter;
[0043] FIG. 11 is an explanatory diagram presenting a logical
description concerning the contents of control to be extended by a
second shifter;
[0044] FIG. 12 is an explanatory diagram presenting a logical
description concerning the ability of a control signal sftcnt;
[0045] FIG. 13 is a block diagram presenting the details of an
instruction buffer;
[0046] FIG. 14 is a block diagram presenting the details of an
arithmetic and logic unit;
[0047] FIG. 15 is an explanatory diagram presenting a logical
description of a control instruction that controls the actions of a
first shifter;
[0048] FIG. 16 is an explanatory diagram presenting a logical
description of a control instruction that controls the actions of a
second shifter;
[0049] FIG. 17 is an explanatory diagram presenting other
descriptions concerning first and second cases of the logical
description shown in FIG. 16;
[0050] FIG. 18 is a block diagram presenting the details of a
multiplier;
[0051] FIG. 19 shows a state transition made by the multiplier
shown in FIG. 18 with multiplication controlled;
[0052] FIG. 20 is a flowchart showing an execution flow for a
transfer instruction (MOV.W @aa:16,Rd) in relation to the state of
an instruction buffer;
[0053] FIG. 21 is a flowchart showing an execution flow for a
branch instruction (JMP @aa:32) in relation to the state of the
instruction buffer;
[0054] FIG. 22 is a flowchart showing an execution flow for a
plurality-of-bit shifts instruction (SHLL #xx,Rd) in relation to
the state of the instruction buffer;
[0055] FIG. 23 is a timing chart showing the timings of executing
instructions in a built-in ROM;
[0056] FIG. 24 is a timing chart showing the action timings of
executing instructions fetched from an external memory in 16-bit
bus mode;
[0057] FIG. 25 is a timing chart showing the action timings of
executing instructions fetched from the external memory in 32-bit
bus mode;
[0058] FIG. 26 is a timing chart showing a first example of the
timing of executing a normal branch instruction;
[0059] FIG. 27 is a timing chart showing a second example of the
timing of executing the normal branch instruction;
[0060] FIG. 28 is a timing chart showing a first example of the
timing of executing a delay branch instruction;
[0061] FIG. 29 is a timing chart showing a second example of the
timing of executing the delay branch instruction;
[0062] FIG. 30 is a timing chart showing a first example of the
timing of executing a shift instruction;
[0063] FIG. 31 is a timing chart showing a second example of the
timing of executing the shift instruction;
[0064] FIG. 32 is a timing chart showing a first example of the
timing of executing a multiplication instruction;
[0065] FIG. 33 is a timing chart showing a second example of the
timing of executing the multiplication instruction;
[0066] FIG. 34 is a block diagram presenting a configuration for
producing a control bit bmod;
[0067] FIG. 35 is a block diagram presenting another configuration
for producing the control bit bmod; and
[0068] FIG. 36 is an explanatory diagram schematically showing a
development environment for programs to be installed in a CPU.
[0069] The constituent features of the present invention to be
disclosed in this application will be briefed below.
[0070] (1) [Bus Mode]
[0071] A means is included for selecting whether a CPU performs
instruction reading in units of, for example, 16 bits (a first word
length) or 32 bits (a second word length). Depending on whether
instruction reading is performed in units of 16 bits or 32 bits,
increment values (+2 and +4) by which a program counter is
incremented are switched. Data reading or writing is performed in
units of a given data length irrespective of the selecting means.
Specifically, a data processor in accordance with the aspect of the
present invention concerning a bus mode includes an instruction
executing means (CPU) that reads and interprets an instruction, and
executes the instruction according to the result of the
interpretation. The instruction executing means includes: a
register in which data or an address can be saved; a selecting
means that selects a word length in units of which instruction
reading is performed; and a control means (DEC) that produces a bus
control signal (BCMD) which permits instruction reading in units of
the word length selected by the selecting means, and switches
increment values (+2 and +4), by which an instruction prefetch
counter (pfc) is incremented, according to the selected word
length. Data reading or writing is performed in units of a given
word length irrespective of the selecting means. For example, the
CPU issues a request for instruction reading in units of 16 bits or
32 bits or for data reading or writing. A bus controller performs
reading or writing a predetermined number of times according to a
bus width designated for a resource located at an address specified
in the request. The bus controller requests the CPU to wait until
16 bits or 32 bits (read data) of an instruction is prepared in
compliance with the request issued from the CPU.
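The relationship between the selected word length, the increment value, and the number of bus accesses can be sketched as follows. This is a minimal model under stated assumptions: the function names are hypothetical, wait states inserted by individual resources are ignored, and the number of accesses is taken to be the request size divided by the bus width, rounded up.

```python
def bus_accesses(request_bits, bus_width_bits):
    """Bus accesses needed for one CPU request (ceiling division)."""
    return -(-request_bits // bus_width_bits)


def pfc_increment(fetch_bits):
    """Increment value for the prefetch counter: +2 for 16 bits, +4 for 32 bits."""
    if fetch_bits not in (16, 32):
        raise ValueError("instruction reading is in units of 16 or 32 bits")
    return fetch_bits // 8


def fetch(pfc, fetch_bits, bus_width_bits):
    """Return (next prefetch counter value, bus accesses) for one instruction read."""
    next_pfc = pfc + pfc_increment(fetch_bits)
    return next_pfc, bus_accesses(fetch_bits, bus_width_bits)
```

For instance, in this model a 32-bit instruction read from a resource on a 16-bit bus advances the counter by 4 and takes two accesses, during which the CPU would be asked to wait, whereas the same read on a 32-bit bus takes one access.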
[0072] When a required number of bits to be read as an instruction
is small, for example, when execution of a branch instruction is
started, instruction reading is forcibly performed in units of 16
bits (a small word length). When a branch instruction or an
interrupt exception is treated, the CPU starts decoding a leading
instruction located at a branch target at the time of completion of
pre-fetching of one word at minimum, and then executes the decoded
instruction.
[0073] As for data, a data reading or writing request is issued in
order to read or write a required number of bits. Therefore, no
data is wasted.
[0074] When the control means decodes an instruction code
representing a branch instruction so as to execute the instruction,
the control means reads an instruction from a branch target. The
instant the control means completes receiving the instruction read
from the branch target, the control means interrupts the
instruction read from the branch target.
[0075] The data processor may further include a built-in ROM and an
external bus means over which an externally coupled memory can be
accessed. At this time, it can be selected whether the bus width of
the external bus means is made smaller than the bus width of a bus
on which the built-in ROM is coupled. Otherwise, the bus width of
the bus on which the built-in ROM is coupled is larger than the bus
width of the external bus means.
[0076] In a preferred form of the present invention, control
information, based on which the selecting means selects a word
length, can be selected by executing an instruction. An initial
value of the control information represents a relatively larger
word length.
[0077] (2) [Delay Branch]
[0078] A branch target instruction read in response to a delay
branch instruction is held in a dedicated register (IR) serving as
a second holding means. When a delay slot instruction is
terminated, an instruction buffer (FB) serving as a first holding
means is cleared. Specifically, a data processor includes an
instruction executing means that reads and interprets an
instruction, and executes the instruction according to the results
of the interpretation. The instruction executing means includes the
first holding means (FB) that holds an instruction code, and the
second holding means (IR). A branch target instruction read in
response to a delay branch instruction is held in the second
holding means (IR). The first holding means (FB) holds the
preceding instruction. When a delay slot instruction is terminated,
the first holding means (FB) is cleared.
[0079] A count value of a prefetch counter (pfc) representing a
branch address is held in a temporary register means (TR). Even when
the length of an instruction code representing a delay slot
instruction is long, instruction reading can be continued based on
the count value of the prefetch counter. The delay slot instruction
may be able to be executed in a plurality of execution states.
Consequently, the number of bits constituting an instruction to be
allocated to a delay slot period can be increased, a delay branch
can be utilized effectively, user-friendliness can be improved, the
use efficiency of resources can be improved, and fast processing
can be realized. Specifically, the instruction executing means
further includes the prefetch counter (pfc), and the temporary
register means (TR) used to temporarily hold information. A branch
target address or a value produced by incrementing the branch
target address is held in the temporary register means (TR) in
response to a delay branch instruction. The prefetch counter
indicates an address succeeding an address from which an
instruction is read before reading of a branch target instruction.
When a delay slot instruction is executed, the prefetch counter is
usable. When the delay slot instruction is terminated, the contents
of the temporary register means (TR) or a value produced by
incrementing the contents of the temporary register means is
pre-set in the prefetch counter. In order to conceal a latency
required for branch instruction reading when a branch takes place
rather than to suppress a disorder in pipelining, an instruction to
be executed in a plurality of execution states is allocated to a
delay slot period. Thus, the number of execution states can be
reduced.
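The interplay of the instruction buffer (FB), the dedicated register (IR), the prefetch counter (pfc), and the temporary register (TR) described above may be pictured with the following simplified model. It is a sketch only: the class, its method names, the dictionary used as memory, and the fixed +2 increment are assumptions made for illustration, and real timing (execution states, bus waits) is not modeled.

```python
class FetchUnit:
    """Toy model of FB, IR, pfc, and TR around a delay branch."""

    def __init__(self, memory, pfc, step=2):
        self.memory = memory  # hypothetical memory: address -> instruction word
        self.pfc = pfc        # prefetch counter
        self.step = step      # assumed increment of +2 (16-bit reads)
        self.fb = []          # instruction buffer (first holding means)
        self.ir = None        # dedicated register (second holding means)
        self.tr = None        # temporary register

    def prefetch(self):
        """Sequential read through pfc; it stays usable during the delay slot."""
        self.fb.append(self.memory[self.pfc])
        self.pfc += self.step

    def delay_branch(self, target):
        """Hold the branch target instruction in IR and the incremented address in TR."""
        self.ir = self.memory[target]
        self.tr = target + self.step

    def end_delay_slot(self):
        """Clear FB and pre-set pfc from TR when the delay slot instruction terminates."""
        self.fb.clear()
        self.pfc = self.tr
```

In this model a delay slot instruction that occupies several words can still be read through `prefetch()` after `delay_branch()` has been called, because the branch target address is parked in TR rather than written to pfc immediately.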
[0080] When an instruction allocated to a delay slot period enters
an execution state in which data of an instruction code
representing the instruction is read or an execution state in which
an effective address thereof is calculated, instruction reading is
disabled. Consequently, reading (pre-fetching) of an instruction
that is not utilized can be suppressed. An increase in the number
of execution states caused by insertion of a wait state can be
avoided.
[0081] The instruction executing means can execute both the delay
branch instruction and a branch instruction which does not include
a delay branch and which is addressed in the same addressing mode
as the delay branch instruction. In short, the instruction
executing means addresses the branch instruction and delay branch
instruction in the same addressing mode. Even if no instruction can
be allocated to a delay slot, a no-operation (NOP) instruction that
is not desired will not be allocated to the delay slot.
[0082] (3) [A Plurality of Bit Shifts]
[0083] Shifters (shifting circuits) that treat a fixed number of
bits equivalent to the power of 2 are connected in series with each
other in two stages. The first and second shifters connected in
series with each other are connected in parallel with an arithmetic
and logic unit. For example, when a 32-bit operation is enabled,
the first shifter performs a right or left shift of a first or
second number of bit positions (1 or 2), and the second shifter
performs a right or left shift of a third, fourth, or fifth number
of bit positions (4, 8, or 16). Moreover, a no-shift operation can
be selected.
[0084] A control unit controls the two stages of shifters. During
instruction execution, the control unit verifies the number of bit
positions to be shifted. The control unit directs a shift in a
first direction or a shift in a second direction, and directs
either or both of the first and second shifters to perform a shift
operation. During instruction execution, if one shift performed by
the first or second shifter is insufficient, the control unit
reports the number of remaining bit positions to be shifted
(sftcnt). When the instruction enters its next execution state, a
shift is continuously performed. Meanwhile, an interrupt is
inhibited. For example, the number of bit positions to be shifted
(1, 2, 4, 8, or 16 bit positions) of which frequency of occurrence
is high can be shifted as directed in one execution state of the
instruction. When a 32-bit operation is performed, up to the first,
second, third, fourth, and fifth numbers of bits are shifted once
as operations directed in three different execution states of the
instruction. Consequently, any bit positions can be shifted.
[0085] A barrel shifter includes right and left selectors that
select any of 64 options. In a plurality-of-bit shifts circuit
relevant to the foregoing means, a first shifter should merely
include a selector that selects any of five options, and a second
shifter should merely include a selector that selects any of seven
options. Thus, any number of bit positions to be shifted can be
selected as well as any shifting direction can be designated, and a
no-shift operation can also be designated. This leads to a
reduction in a logical scale. Moreover, the first and second
shifters are connected in parallel with the arithmetic and logic
unit. A delay time given by the plurality-of-bit shifts circuit
relevant to the aforesaid means can be made equal to the delay time
given by the arithmetic and logic unit, whereby an increase in the
delay time can be deterred. The employment of the plurality-of-bit
shifts circuit relevant to the aforesaid means can minimize a
program size more successfully than when a required number of shift
instructions each directing a shift of one bit position is
described or when a program loop is described.
[0086] (4) [Multiplier]
[0087] A data processor in accordance with the present invention
includes an instruction executing means. The instruction executing
means includes a multiplying means that produces a product, which
is composed of a second number of bits, by multiplying a
multiplicand, which is composed of a first number of bits, by a
multiplier composed of the first number of bits. The multiplying
means is used to execute a first multiplication instruction
(MUL.W), a second multiplication instruction (MUL.L), and a third
multiplication instruction (MUL/UP). Herein, the first
multiplication instruction directs acquisition of the low-order
bits of a product, which is composed of the first number of bits,
of the multiplicand composed of the first number of bits by the
multiplier composed of the first number of bits. The second
multiplication instruction directs acquisition of the low-order
bits of a product, which is composed of the second number of bits,
of a multiplicand composed of the second number of bits by a
multiplier composed of the second number of bits. The third
multiplication instruction directs acquisition of the higher-order
bits of a product, which is composed of the second number of bits,
of the multiplicand composed of the second number of bits by the
multiplier composed of the second number of bits. For example, a
multiplier adopts the multiplication logic that a multiplication
instruction directing calculation of a 32-bit product of 16 bits by
16 bits is executed in one execution state. The multiplier is used
to execute the multiplication instruction, which directs
multiplication of 16 bits by 16 bits, in one execution state. A
multiplication instruction directing acquisition of the 32
low-order bits of a product of 32 bits by 32 bits is executed in
four execution states. A multiplication instruction directing
acquisition of the 32 high-order bits of the product of 32 bits by
32 bits is executed in five execution states.
[0088] The multiplier acts in response to a start signal (start), a
control signal (mul_long) indicating the number of bits to be
multiplied, and a control signal (upper) representing the
high-order bits calculated by the multiplier. The multiplier
changes the states thereof. When the multiplication instruction
directing acquisition of the 32 high-order bits of a product of 32
bits by 32 bits is employed in division, the multiplication
instruction will contribute to improvement of the efficiency in
division.
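As an illustration of how the 32 high-order bits of a product of 32 bits by 32 bits can serve division, the following Python sketch models the three multiplication results and divides by the constant 10 through multiplication by a scaled reciprocal. The function names, the constant 0xCCCCCCCD, and the final shift of three bit positions are assumptions chosen for this example (a widely used reciprocal for unsigned 32-bit division by 10); they are not taken from the embodiment.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mulu_w(rs, rd):
    """16 low-order bits of an unsigned product of 16 bits by 16 bits."""
    return ((rs & MASK16) * (rd & MASK16)) & MASK16

def mulu_l(rs, rd):
    """32 low-order bits of an unsigned product of 32 bits by 32 bits."""
    return ((rs & MASK32) * (rd & MASK32)) & MASK32

def mulu_up_l(rs, rd):
    """32 high-order bits of an unsigned product of 32 bits by 32 bits."""
    return (((rs & MASK32) * (rd & MASK32)) >> 32) & MASK32

def div10(n):
    """Hypothetical helper: unsigned 32-bit n divided by 10 without a divider.

    0xCCCCCCCD is ceil(2**35 / 10); taking the high-order half of the
    product and shifting right by 3 yields floor(n / 10).
    """
    return mulu_up_l(n, 0xCCCCCCCD) >> 3
```

A call such as div10(12345) goes through one high-order multiplication and one shift, which is the kind of sequence in which an instruction returning the 32 high-order bits is useful.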
DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0089] FIG. 2 is a block diagram showing a single-chip
microcomputer to which a data processor in accordance with the
present invention is adapted.
[0090] A single-chip microcomputer 1 shown in FIG. 2 comprises
functional blocks or modules serving as: a CPU 2 that is
responsible for the control of the whole; a bus controller (bus
state controller, BSC) 3; an interrupt controller (INT) 4; a ROM 5
that is a memory in which processing programs to be run by the CPU
2 are stored; a RAM 6 that is a memory which is used as a work area
by the CPU 2 and in which data is temporarily stored; a timer 7; a
serial communication interface (SCI) 8; an A/D converter
(analog/digital converter) 9; first to ninth input/output ports
(IOP1 to IOP9) 10A to 10I; and a clock oscillator (clock pulse
generator, CPG) 11. A known semiconductor integrated circuit
manufacturing technology is used to form the microcomputer on one
semiconductor substrate (semiconductor chip).
[0091] The single-chip microcomputer 1 has power terminals to
which a ground-level voltage (Vss), a supply voltage (Vcc), an
analog ground-level voltage (AVss), and an analog supply voltage
(AVcc) are applied. Furthermore, a reset terminal (RES), a standby
terminal (STBY), mode control terminals (MD0 and MD1), and clock
input terminals (EXTAL and XTAL) are included as dedicated control
terminals.
[0092] The input/output ports 10A to 10I are used as the
input/output terminals through which a signal on an address bus, a
signal on a data bus, and a bus control signal are received or
transmitted, or used as the input/output terminals of the timer 7,
SCI 8, and A/D converter 9.
[0093] A crystal oscillator is connected to the terminals EXTAL and
XTAL of the CPG 11, or an external clock is applied to the EXTAL
terminal. Synchronously with a reference clock (system clock)
produced based on the external clock, the single-chip microcomputer
1 performs actions. One cycle of the reference clock shall be
called a state period.
[0094] The functional blocks of the single-chip microcomputer 1 are
interconnected over an internal bus 12. The internal bus 12
consists of an address bus, a data bus, and a control bus. Bus
commands produced by encoding a Read signal, a Write signal, and a
bus width signal are transmitted over the control bus.
[0095] The functional blocks or modules are read or written by the
CPU 2 over the internal bus 12. The width of the internal bus 12
is, for example, 32 bits. The built-in ROM 5 and RAM 6 can be read
or written by taking one state period. Control registers included
in the bus controller 3, interrupt controller 4, timer 7, SCI 8,
A/D converter 9, input/output ports 10A to 10I, and CPG 11 shall be
generically called internal I/O registers.
[0096] The bus controller (BSC) 3 receives a bus command BCMD from
the CPU 2 and receives an address signal over the address bus IAB,
and controls the internal bus in a given manner. In association
with an address designated over the address bus IAB, an inherent or
selected bus width and an inherent or selected number of access
state periods are determined. Accordingly, instruction reading or
data reading or writing requested by the CPU 2 is achieved. At this
time, a Ready signal ready is returned as a response signal to the
request (bus command) for instruction reading or for data reading
or writing issued from the CPU 2. When the Ready
signal ready is inactive, the CPU 2 is brought to a wait state.
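The relation between a request from the CPU 2, the bus width of the addressed resource, and the number of access state periods can be sketched as follows. This is a minimal Python model under stated assumptions: the function name access_plan is hypothetical, and the total number of state periods is assumed to be simply the number of bus accesses multiplied by the access state periods of the area, which the text does not state explicitly.

```python
def access_plan(request_bits, bus_width_bits, states_per_access):
    """Return (bus accesses, total state periods) for one CPU request.

    Assumption: a request wider than the bus is split into
    ceil(request / bus width) accesses of equal cost.
    """
    accesses = -(-request_bits // bus_width_bits)  # ceiling division
    return accesses, accesses * states_per_access

# Built-in ROM: 32-bit bus, one state period per access.
internal_long = access_plan(32, 32, 1)
# External memory: 16-bit bus, two state periods per access.
external_long = access_plan(32, 16, 2)
external_word = access_plan(16, 16, 2)
```

Under these assumptions a long word read from a 16-bit, two-state area occupies four state periods, during which the Ready signal would stay inactive until the requested data is complete.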
[0097] The bus controller 3 has a control bit bmode16 held in the
internal I/O register thereof. When the control bit bmode16 is a 0,
a 32-bit instruction fetch mode is designated. When the bit bmode16
is a 1, a 16-bit instruction fetch mode is designated, and a
control signal bmode16 is transmitted to the CPU 2. The control bit
bmode16 is reset or cleared to 0. In the 32-bit instruction fetch
mode, even when a memory connected over the 16-bit bus is read,
unless the processing sequence of a program is changed by inserting
a branch instruction or an interrupt, a waste will not occur.
Normally, after the control bit bmode16 is reset, an interrupt is
masked or inhibited. Prior to initialization, the control bit
bmode16 should be set to 1.
[0098] When a Reset signal RES is transmitted to the single-chip
microcomputer 1, the single-chip microcomputer 1 including the CPU
2 is reset. When the reset state is lifted, the CPU 2 handles a
reset exception, that is, reads a start address from a
predetermined address (reset vector) and starts reading an
instruction from the start address. Thereafter, the CPU 2
sequentially reads an instruction from the ROM 5 or the like, and
interprets it. Based on the contents of the interpreted
instruction, the CPU 2 treats data or transfers data to or from the
RAM 6, timer 7, SCI 8, or input/output ports 10A to 10I.
[0099] The state of the timer 7, SCI 8, or external signal may be
transmitted as an interrupt signal to the CPU 2. Specifically, an
interrupt signal 13 is transmitted from a predetermined circuit
such as the A/D converter 9, timer 7, SCI 8, or input/output ports
10A to 10I, and received by the interrupt controller 4. Based on a
designation held in a predetermined register, an interrupt request
signal intf and a vector vec corresponding to a handled interrupt
are transmitted to the CPU 2. If a factor of an interrupt occurs, an
interrupt request to the CPU 2 is generated. The CPU 2 suspends processing
that is under way, handles an exception, reads a branch target
address from an address associated with the vector vec, causes a
branch to a predetermined routine, performs desired processing, and
thus clears the factor of the interrupt. At the end of the
predetermined routine, a return instruction is normally
described. By executing the instruction, the suspended processing
is restarted.
[0100] FIG. 3 shows an example (programming model) of the
structures of general-purpose registers and control registers
incorporated in the CPU 2.
[0101] The CPU 2 has eight general-purpose registers ER0 to ER7
each of which holds data of 32 bits long. The general-purpose
registers ER0 to ER7 share the same capability and may be used as
address registers or data registers.
[0102] When the general-purpose registers ER0 to ER7 are used as
data registers, they can serve as 32-bit, 16-bit or 8-bit
registers. When the general-purpose registers ER0 to ER7 must be
used as address registers or 32-bit registers, they are regarded
generically as general-purpose registers ER (ER0 to ER7). When the
general-purpose registers ER0 to ER7 must be used as 16-bit
registers, the general-purpose registers ER are divided into
general-purpose registers E (E0 to E7) and general-purpose
registers R (R0 to R7). The general-purpose registers E and R share
the same capability. Up to sixteen 16-bit registers can be used.
When the general-purpose registers ER0 to ER7 must be used as 8-bit
registers, the general-purpose registers R are divided into
general-purpose registers RH (R0H to R7H) and general-purpose
registers RL (R0L to R7L). The general-purpose registers RH and RL
share the same capability. Up to sixteen 8-bit registers can be
used. A using method can be selected for each register.
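The way one 32-bit general-purpose register can be viewed as 16-bit and 8-bit registers may be sketched in Python as below. The class name and the bit assignment are assumptions made for illustration: E is taken to be the 16 high-order bits of ER, R the 16 low-order bits, and RH and RL the high-order and low-order bytes of R. The text only states that ER is divided into E and R and that R is divided into RH and RL.

```python
class GeneralRegister:
    """One 32-bit register ERn with assumed 16-bit and 8-bit views."""

    def __init__(self, value=0):
        self.er = value & 0xFFFFFFFF

    @property
    def e(self):  # assumed: 16 high-order bits of ER
        return (self.er >> 16) & 0xFFFF

    @e.setter
    def e(self, value):
        self.er = (self.er & 0x0000FFFF) | ((value & 0xFFFF) << 16)

    @property
    def r(self):  # assumed: 16 low-order bits of ER
        return self.er & 0xFFFF

    @r.setter
    def r(self, value):
        self.er = (self.er & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def rh(self):  # assumed: high-order byte of R
        return (self.er >> 8) & 0xFF

    @rh.setter
    def rh(self, value):
        self.er = (self.er & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def rl(self):  # assumed: low-order byte of R
        return self.er & 0xFF

    @rl.setter
    def rl(self, value):
        self.er = (self.er & 0xFFFFFF00) | (value & 0xFF)
```

With eight such registers, the E and R views give the sixteen 16-bit registers and the RH and RL views give the sixteen 8-bit registers mentioned above; writing one view leaves the other bits of ER unchanged in this model.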
[0103] The general-purpose register ER7 has, in addition to the
capability of a general-purpose register, the capability of a
register holding a stack pointer (SP). The general-purpose register
ER7 is implicitly used to handle an exception or a branch from a
subroutine. Exception handling as referred to herein includes
interrupt handling.
[0104] A program counter PC is a 32-bit counter and indicates the
address of an instruction being executed by the CPU 2. Instructions
to be executed by the CPU 2 are all treated in units of 2 bytes (a
word), though they are not limited to this length. The least
significant bit of the program counter PC
is invalid. During instruction reading, the least significant bit
is regarded as a 0.
[0105] A condition code register CCR is an 8-bit register and holds
data representing the internal state of the CPU 2. The data is 8
bits long and contains an interrupt mask bit (I), a half carry flag
(H), a negative flag (N), a zero flag (Z), an overflow flag (V),
and a carry flag (C).
[0106] A vector register VBR is a 32-bit register and holds data
whose 12 low-order bits are 0s and 20 high-order bits serve as an
exception handling vector.
[0107] FIG. 4 presents the address space of the CPU. The address
space of the CPU 2 is 4 gigabytes wide and divided into eight
areas. In the areas, a bus width and the number of access state
periods can be specified based on the settings of the bus
controller 3.
[0108] On the other hand, the ROM 5, RAM 6, and internal I/O
registers incorporated in the microcomputer 1 operate according to
an inherent bus width and an inherent number of access state
periods irrespective of the settings of the bus controller 3. As
mentioned above, the built-in ROM 5 and RAM 6 are interconnected
over the 32-bit bus, and read or written during one state period.
The settings of the address space and areas are described in
Japanese Unexamined Patent Publication No. Hei5 (1993)-307519 or
the like.
[0109] FIG. 4 also shows a preferred example of switching bus
modes. FIG. 4 is concerned with a case where an external ROM that
is connected on the 16-bit bus and accessible over two state
periods is connected to area 1. When the built-in ROM 5 is used to
run a program, since the built-in ROM 5 is connected on the 32-bit
bus, the CPU should act in 32-bit mode. The control bit bmode16 is
reset to 0, and instruction reading is performed in units of a long
word. Consequently, a long word (two words) of an instruction can
be read during a unit reading cycle. Even if a subsequent
instruction is not executed, no waste occurs. In response to a
branch instruction contained in the program present in the built-in
ROM 5, a branch is made to the external ROM. A program in the
external ROM is then run. Consequently, the control bit bmode16 is
set to 1. Hereinafter, instruction reading is performed in units of
a word.
[0110] Even if the control bit bmode16 is a 0, unless the
processing sequence of a program is changed by inserting a branch
instruction or an interrupt, no waste is produced. The control bit
bmode16 may be changed from a 1 to a 0 or vice versa after a branch
is made to the external ROM. When there is a possibility that an
interrupt may occur, if an increase in an interrupt response time
cannot be permitted, the control bit bmode16 should be set to 1
immediately before a branch to the external ROM takes place.
[0111] If a branch is made from the external ROM to the built-in
ROM, the foregoing actions are performed in reverse order.
[0112] Referring to FIG. 4, after the program residing in the
built-in ROM 5 is run, a branch to the external ROM takes place.
After the program residing in the external ROM is run, a branch to
the built-in ROM 5 takes place. However, a branch may take place
any time. A main program may be stored in the external ROM, and a
subroutine program (functions) that must be run fast may be stored
in the built-in ROM 5. Between the external ROM and built-in ROM 5,
a branch to the subroutine or a return from the subroutine is
allowed to take place any time. Moreover, the subroutine program
(functions) that must be run fast may be stored in the built-in RAM
6. The same applies to the pair of built-in RAM and external
RAM.
[0113] FIG. 5 shows an example of an instruction format for an
instruction written in a machine language which is employed in the
CPU 2. Instructions to be executed by the CPU 2 are treated in
units of 2 bytes (word). Each instruction includes an operation
field (op), a register field (r), an EA extension field (EA), and a
condition field (cc).
[0114] The operation field (op) is used to indicate the effect of
an instruction and to designate the contents of an operand to be
addressed in a designated addressing mode. The four leading bits of
an instruction are always contained in the operation field. An
instruction format may include two operation fields.
[0115] The register field (r) is used to designate a
general-purpose register. When the register field (r) is used to
designate an address register, 3 bits are contained. When the
register field (r) is used to designate a data register, 3 bits
(32-bit register) or 4 bits (8 or 16-bit register) are contained.
An instruction format may have two register fields or no register
field.
[0116] The EA extension field (EA) is used to designate immediate
data, an absolute address, or a displacement. 8, 16, or 32 bits are
contained. The condition field (cc) is used to designate a branch
condition for a conditional branch instruction (Bcc).
[0117] FIG. 6 shows examples of instruction formats (for
instructions to be described later) employed in the present
invention.
[0118] [Instruction MOV]
[0119] MOV.W @rs,rd is an instruction of one word long. Bit
positions 6 to 4 are used to designate an address register (rs),
and bit positions 3 to 0 are used to designate a data register
(rd). MOV.W @aa:16,rd is an instruction of two words long. Bit
positions 3 to 0 in a first word are used to designate a data
register (rd), and a second word represents an absolute address
(aa).
[0120] [Instruction JMP]
[0121] JMP @aa:32 is an instruction of three words long. Second and
third words represent an absolute address (aa).
[0122] [Normal Branch Instruction or Delay Branch Instruction]
[0123] A normal branch instruction BRA d:8 and a delay branch
instruction BRA/S d:8 are instructions of one word long. Bit
positions 6 to 1 are used to designate a displacement (d). The
displacement (d) is obtained by setting the least significant
bit to 0 and sign-extending the high-order bits. When bit position
0 in an instruction code is reset to 0, the instruction code is
interpreted as a normal branch instruction. When the bit position 0
in an instruction code is set to 1, the instruction code is
interpreted as a delay branch instruction.
[0124] [Shift Instruction]
[0125] SHLL.L #xx,rd is an instruction of one word long. Bit
positions 8 to 4 are used to designate immediate data (xx) that
represents the number of bit positions to be shifted, and bit
positions 3 to 0 are used to designate a destination register
(rd).
[0126] [Multiplication Instruction]
[0127] Instruction MULU.W rs,rd directs acquisition of the 16
low-order bits of a product of 16 bits by 16 bits. Instruction
MULU.L rs,rd directs acquisition of the 32 low-order bits of a
product of 32 bits by 32 bits. Instruction MULU/UP.L rs,rd directs
acquisition of the 32 high-order bits of the product of 32 bits by
32 bits. These instructions are of one word long. Bit positions 7
to 4 are used to designate a source register (rs), and bit
positions 3 to 0 are used to designate a destination register (rd).
Likewise, MULU.W #x:4,rd, MULU.L #x:4,rd, and MULU/UP.L #x:4,rd are
instructions of one word long whose four bits of immediate data are
not used to designate the source register but used to designate a
multiplier. Namely, bit positions 7 to 4 are used to designate
immediate data (xx), and bit positions 3 to 0 are used to designate
a destination register (rd).
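The bit-position assignments given above for the one-word instruction formats can be checked with a small field extractor. The following Python sketch is illustrative only: the helper names are hypothetical, and the operation-field bits are left at 0 in the sample words because their actual values are not given in the text.

```python
def bits(word, hi, lo):
    """Extract bit positions hi..lo (inclusive) of a 16-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_mov_w_indirect(word):
    """MOV.W @rs,rd: address register in bits 6 to 4, data register in 3 to 0."""
    return {"rs": bits(word, 6, 4), "rd": bits(word, 3, 0)}

def decode_shll_l(word):
    """SHLL.L #xx,rd: shift count in bits 8 to 4, destination in 3 to 0."""
    return {"count": bits(word, 8, 4), "rd": bits(word, 3, 0)}

def decode_mulu(word):
    """MULU forms: source register or 4-bit immediate in bits 7 to 4."""
    return {"rs_or_imm": bits(word, 7, 4), "rd": bits(word, 3, 0)}

def is_delay_branch(word):
    """Bit position 0 distinguishes BRA (0) from BRA/S (1)."""
    return bits(word, 0, 0) == 1

# Sample word with a shift count of 21 and destination register 3;
# the operation field is zero here, which is not a real opcode.
sample = (21 << 4) | 3
fields = decode_shll_l(sample)
```

The five-bit count field of the shift instruction covers shifts of up to 31 bit positions in a single one-word instruction, and the four-bit immediate of the multiplication forms covers multipliers up to 15.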
[0128] FIG. 1 shows a detailed example of the CPU 2. The CPU 2
includes a control block (CNT) 2A and an execution block (EXEC) 2B.
The control block 2A includes an instruction buffer IBUF, an
instruction changer CHG, an instruction decoder DEC, a register
selector RESL, and an interrupt control section INTC.
[0129] The Ready signal ready is a response signal returned from
the bus controller 3 in response to a request for instruction
reading or for data reading or writing (bus command) issued from
the CPU 2. The Ready signal is an enabling signal with which the
CPU 2 is enabled to make a state transition.
[0130] The instruction buffer IBUF is a circuit in which a read
instruction is temporarily stored. The instruction buffer IBUF is
composed of a FIFO buffer FB capable of accommodating 8 words and
an instruction register IR capable of holding two words. The
instruction register IR is used to issue a branch instruction, and
receives an initialization signal clrFIFO sent from the instruction
decoder DEC.
[0131] An instruction to be executed is transmitted from the
instruction buffer IBUF to the instruction decoder DEC. Immediately
after instruction reading is completed, the FIFO buffer FB in the
instruction buffer IBUF is unused, and an input received directly
over the data bus IDB is transmitted to the instruction decoder
DEC.
[0132] The instruction changer CHG is activated only when an
instruction code other than that of a read instruction is
transmitted to the instruction decoder DEC. In other cases, the
instruction changer CHG transmits the contents of the instruction
buffer IBUF to the instruction decoder DEC. An instruction code
other than that of a read instruction is used to handle an
exception such as an interrupt in response to a direction given by
the interrupt control section INTC.
[0133] The interrupt control section INTC treats an interrupt
request signal intf sent from the interrupt controller 4 shown in
FIG. 2. Moreover, the interrupt control section INTC references an
interrupt mask signal (not shown) sent from the instruction decoder
DEC. If an interrupt is unmasked, the interrupt control section
INTC transmits a control signal intrq so as to direct the
instruction changer CHG to generate an interrupt. An interrupt is
inhibited during the execution of a plurality-of-bit shifts
instruction and between a delay branch instruction and
a delay slot instruction, which will be described later. The logical
description of a signal chg_int which initiates execution of
interrupt exception handling is "assign chg_int = intrq &
~(|sftcnt) & ~mod_db." Herein, |sftcnt signifies
the OR of all the bits of a signal sftcnt, & is an AND sign,
and ~ is an inversion (NOT) sign. The logical descriptions of the
control signal sftcnt and modification signal mod_db will be
described later.
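The interrupt acceptance condition can be modeled in software as below. This Python sketch assumes, in line with the inhibition described in this paragraph, that interrupt exception handling starts only when an interrupt is requested, no shift count remains in sftcnt, and the delay slot modification signal mod_db is inactive. The function is a behavioral model, not the logical description itself.

```python
def chg_int(intrq, sftcnt, mod_db):
    """Return True when interrupt exception handling may be initiated.

    intrq  : interrupt request from the interrupt control section INTC
    sftcnt : remaining shift count; nonzero while a multi-bit shift runs
    mod_db : True while a delay slot instruction is still pending
    """
    shift_in_progress = sftcnt != 0  # reduction OR of all bits of sftcnt
    return bool(intrq) and not shift_in_progress and not bool(mod_db)
```

In this model a pending request is simply held off, for example chg_int(1, 0b00100, 0) is False while the shift is still in progress and becomes True once sftcnt reaches 0.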
[0134] The instruction decoder DEC is realized with, for example, a
micro-ROM, a programmable logic array (PLA), or wired logic, and
transmits control signals. The control signals are broadly
classified into three kinds, first to third. The first to
third kinds of control signals differ from one another in
the timing at which they are valid. The first control signal (control
signal A) is valid during a current state period. The second
control signal (control signal B) is valid during the next state
period. The third control signal (control signal C) is valid during
the state period succeeding the next state period.
[0135] Part of an output of the instruction decoder DEC (second
control signal) is fed back to the instruction decoder DEC. The
part of the output contains a stage code (tmg) used to change
stages of execution of each instruction code, a plurality-of-bit
shifts control signal sftcnt, and a modification signal mod_db
indicating a delay slot.
[0136] A signal bmode16 indicating a bus mode is transmitted to the
CPU 2. A flip-flop FF receives the signal bmode16. An output signal
mode16 of the flip-flop FF is transmitted to the instruction
decoder DEC.
[0137] The instruction decoder DEC includes an execution sequence
control section SCNT, a bus access control section BCNT (bus
command BCMD: first control signal), and an execution control
section ECNT. The logical description of instruction reading
according to which the bus access control section BCNT extends
control will be created as briefed below. Namely, when the
execution sequence control section SCNT activates a control signal
if or ifw that initiates instruction fetching, a bus command is
issued according to the logical description (case statement) in
FIG. 7. The logical description shown in FIG. 7 is called a
register transfer level (RTL) description or a hardware description
language (HDL) description, and can be logically developed in a
logic circuit using a known logical combination tool. The HDL rules
are standardized as the IEEE1364 standard. A syntax applied to the
logical description is conformable to the case statement. The
contents of the case statement are such that when a value or signal
defined in parentheses ( ) following always @ changes, processing
described in the subsequent lines is carried out. 4'b1000 signifies
a binary code 1000 of 4 bits long. The control signal if initiates
instruction reading in units of a long word or word. The control
signal ifw initiates forcible instruction reading in units of a
word. 'longread or 'wordread specifies a constant defined
separately. iab1 signifies bit position 1 in the address bus IAB.
An instruction is treated in units of a word (16 bits long), and
allocated to an even-numbered address without fail. When iab1=0,
the address is a multiple of 4, and a long word can be accessed
with one attempt. Specifically, assuming that instruction fetch is
directed, an address is a multiple of 4, and the 16-bit bus mode is
not adopted, long word reading 'longread is specified in a bus
command BCMD. Assuming that instruction fetch is directed and either
the 16-bit bus mode is adopted or an address is not a multiple of 4,
word reading 'wordread is specified in the bus command BCMD.
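The selection of the bus command for instruction reading can be sketched as a small decision function. FIG. 7 is not reproduced here, so the following Python model is only one reading of the paragraph above: it assumes that ifw always forces word reading, and that long word reading is chosen only when if is active, iab1 is 0, and mode16 is 0. The constant names and the priority of ifw over if are assumptions.

```python
LONGREAD = "longread"  # stands in for the separately defined constant
WORDREAD = "wordread"

def bus_command(if_, ifw, iab1, mode16):
    """Choose the bus command BCMD for an instruction fetch, or None.

    if_    : instruction reading requested (long word or word)
    ifw    : forcible instruction reading in units of a word
    iab1   : bit position 1 of the address bus IAB
    mode16 : 1 in the 16-bit instruction fetch mode
    """
    if ifw:
        return WORDREAD
    if if_:
        if iab1 == 0 and not mode16:
            return LONGREAD  # address is a multiple of 4, 32-bit mode
        return WORDREAD
    return None  # no instruction fetch is directed
```

The single long word entry corresponds to the 4-bit pattern 1000 when the inputs are taken in the order if, ifw, iab1, mode16, which is an assumed ordering.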
[0138] FIG. 8 shows the contents of control, which is extended to
the prefetch counter pfc by an incrementer INC that will be
described later, in the form of a logical description. 'inc4 and
'inc2 specify separately defined constants.
[0139] An entry condition is identical to that for the bus command
BCMD. When instruction reading is performed in units of a long
word, a control signal c_inc is set to a state 'inc4 representing
+4. When instruction reading is performed in units of a word, the
control signal c_inc is set to a state 'inc2 representing +2.
[0140] When the control signal ifw is transmitted, the prefetch
counter pfc is not incremented on the assumption that a branch will
take place. However, the present invention is not limited to this
mode. Herein, the number of entries (entry conditions) in the
logical description (case statement) shown in FIG. 7 or FIG. 8 may
be increased properly. For example, the entries can be shared with
data access to be performed in addressing mode in which addressing
is achieved indirectly via a post-increment register. A control
instruction for controlling selection of +1, +2, or +4 according to
a data length is additionally described. For example, a control
signal pstinc initiating execution of indirect addressing via the
post-increment register, and a control signal size (byte: 01, word:
10, long word: 11) indicating a data length are used to create a
logical description as shown in FIG. 9.
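The increment selection described for FIG. 8 and FIG. 9 can be summarized in a short Python model. The figures themselves are not reproduced, so the function names and the exact entry conditions are assumptions: the fetch case reuses the long word condition (if active, iab1 equal to 0, 16-bit mode not adopted), ifw yields no increment, and the post-increment case maps the size codes 01, 10, and 11 to +1, +2, and +4.

```python
def increment_for_fetch(if_, ifw, iab1, mode16):
    """Increment applied to the prefetch counter pfc for one fetch."""
    if ifw:
        return 0  # pfc is left unchanged; a branch is assumed to follow
    if if_:
        return 4 if (iab1 == 0 and not mode16) else 2
    return 0

# Size codes as given in the text: byte 01, word 10, long word 11.
SIZE_TO_INCREMENT = {0b01: 1, 0b10: 2, 0b11: 4}

def increment_for_postinc(pstinc, size):
    """Increment for register indirect addressing with post-increment."""
    if not pstinc:
        return 0
    return SIZE_TO_INCREMENT[size]
```

Sharing one incrementer between the two cases only requires that the control signal c_inc be able to express +1 in addition to +2 and +4.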
[0141] The control of a shifter includes the control of a first
shifter, the control of a second shifter, and transmission of a
control signal sftcnt. A plurality-of-bit shifts instruction is
executed in up to three execution states (S1 to S3). In the first
execution state (S1), a shift bit field in an instruction code is
referenced in order to perform a shift. In the second and third
execution states (S2 and S3), the control signal sftcnt that is a
control signal B is referenced.
[0142] The control of the first shifter is realized according to a
logical description presented in FIG. 10. Referring to FIG. 10,
opcode specifies an instruction code (16 bits long) received by the
instruction decoder DEC. In FIG. 10, 'S1 and 'S2 specify constants
or control signals defined separately in order to indicate the
first and second execution states of the plurality-of-bit shifts
instruction. Specifically, the control of the first shifter is such
that when the plurality-of-bit shifts instruction enters the first
execution state (S1), whether a shift is performed one bit position
or two bit positions, or no shift is performed is verified in that
order in order to produce a control signal sft1 or sft2. When a
shift of one bit position is performed in the first execution state
of the plurality-of-bit shifts instruction, if a shift of two bit
positions is needed in the second execution state (S2) thereof, a
two-bit control signal sft2 is produced.
[0143] Likewise, the control of the second shifter is realized
according to a logical description presented in FIG. 11. The effect
of the control signal sftcnt can be specified in a logical
description presented in FIG. 12 with the control signal divided
into three high-order bits and two low-order bits. When all the
bits of the signal sftcnt are 0s, it is verified that the control
procedure is terminated.
[0144] A register selector RESL produces register selection signals
rdgb[n] and wbrd[n] on the basis of a direction given by the
instruction decoder DEC and information contained in a register
field in an instruction code. The register selector RESL controls
the inputs and outputs of general-purpose registers. Herein, [n]
specifies a number that ranges from 0 to 7 and that is assigned to
each general-purpose register.
[0145] The execution block 2B includes general-purpose registers
ER0 to ER7, a program counter PC, a prefetch counter pfc, a
condition code register CCR, a temporary register TR, an arithmetic
and logic unit ALU, an incrementer INC, a read data buffer DBR, a
written data buffer DBW, an address buffer AB, and a multiplier
MULT. These blocks are interconnected over internal buses ab, gb,
db, wb1, and wb2.
[0146] The arithmetic and logic unit ALU is used to perform various
operations designated by instructions or to calculate an effective
address. The incrementer INC is used to give an increment to a
count value of the program counter PC. As mentioned above, the
direction indicating +1, +2, or +4 by which the count value of the
prefetch counter pfc should be incremented is given to the
incrementer INC with the control signal c_inc.
[0147] Any data other than those held in the general-purpose
registers ER0 to ER7, program counter PC, and condition code
register CCR which are listed in FIG. 3 cannot be referenced in the
course of programming, but used for the actions of the components
of the microcomputer 1. Namely, the read data buffer DBR, written
data buffer DBW, and address buffer AB are used to temporarily
latch or buffer data for the purpose of interfacing with the
internal address bus IAB or internal data bus IDB. The temporary
register TR is used for the actions of the components of the
microcomputer 1. The internal address bus IAB and internal data bus
IDB are contained in the internal bus 12.
[0148] The read data buffer DBR temporarily holds data read from
the ROM 5, RAM 6, internal I/O registers, or an external memory
that is not shown. The written data buffer DBW buffers data to be
written in the ROM 5, RAM 6, internal I/O registers, or external
memory. The address buffer AB temporarily holds an address from or
to which the CPU 2 reads or writes data, and produces a vector or
branch table reading address. The address buffer AB receives a
vector from the interrupt controller or receives information over
the internal bus ab, and places an address on the address bus IAB.
Normally, the contents of the internal bus ab are selected. The
multiplier MULT receives a control signal from the instruction
decoder DEC or receives data over the internal bus gb or db, and
places the result of multiplication on the internal bus wb2.
[0149] FIG. 13 presents the details of the instruction buffer IBUF.
The instruction buffer IBUF includes a FIFO buffer FB serving as a
first holding means that holds an instruction code, and an
instruction register IR serving as a second holding means.
[0150] The instruction register IR is composed of a low-order
instruction register IRL connected on an internal data bus IDBL and
a high-order instruction register IRH connected on an internal data
bus IDBH. A word enabling register IR/W indicates that the contents
of the high-order instruction register IRH are valid. A long word
enabling register IR/L indicates that the contents of both the
high-order instruction register IRH and low-order instruction
register IRL are valid.
[0151] The FIFO buffer FB includes eight stages of storage areas
BUF0 to BUF7 in each of which 16-bit data is stored, though the
FIFO buffer is not limited to this structure. In the storage areas
BUF0 to BUF7, an instruction placed on an internal data bus IDB or
an instruction read from the instruction register IR is saved. The
internal data bus IDBH is used to transmit 16 high-order bits,
while the internal data bus IDBL is used to transmit 16 low-order
bits. TLP denotes a trailing pointer (read data pointer) of two
bits long indicating a trailing storage area in which data is
stored. HDP denotes a leading pointer (written data pointer) of two
bits long indicating a leading storage area in which data is
stored. NBP denotes a number-of-data items pointer indicating the
number of data items stored in the FIFO buffer FB. A receiving
route along which an instruction code is received and stored in a
storage area is designated with a signal IRHenb, IRLenb, Henb, or
Lenb. The signal IRHenb directs that an instruction code should be
received from the high-order instruction register IRH. The signal
IRLenb directs that an instruction code should be received from the
low-order instruction register IRL. The signal Henb directs that an
instruction code should be received over the internal data bus
IDBH. The signal Lenb directs that an instruction code should be
received over the internal data bus IDBL. A multiplexer MUX selects
any of the outputs of the storage areas BUF0 to BUF7. The
multiplexer MUX transmits data in the storage area indicated with
the trailing pointer TLP through an output terminal out0, and
transmits data in the storage area next to the storage area
indicated with the trailing pointer TLP through an output terminal
out1. The multiplexer MUX transmits data in the third storage area,
which is a storage area adjoining the storage area next to the
storage area indicated with the trailing pointer TLP, through an
output terminal out2.
[0152] A selector ISEL selects any one of the output received
through the output terminal out0, the inputs received over the
internal data buses IDBL and IDBH, and the outputs received from the
instruction registers IRH and IRL. The selected output is
transmitted as an output rslt0. The output rslt0 is transmitted to
the instruction decoder DEC via the instruction changer CHG. The
outputs transmitted
through the output terminals out1 and out2 are used as outputs
rslt1 and rslt2 to calculate an effective address or the like.
[0153] A buffer control circuit FCNT controls the entire
instruction buffer IBUF. By the way, clk denotes a clock and rst
denotes a Reset signal. LIR/W denotes a word length reading
directing signal, LIR/L denotes a long word length reading
directing signal, LID denotes an instruction execution completing
signal, and LID_NUM denotes data representing the word length of an
instruction whose execution is completed. Moreover, stkIR denotes a
signal directing saving of an instruction on the internal data bus
IDB into the instruction register IR. pfb (equivalent to clrFIFO)
denotes a signal directing a branch. The signal stkIR is activated
to logical 1 when a branch target instruction is placed on the
internal data bus IDB during execution of a delay branch
instruction. When the signal stkIR is activated, the buffer control
circuit FCNT saves the instruction on the internal data bus IDB
into the instruction register IR. Thereafter, when the signal pfb
is activated, the FIFO buffer FB is cleared. The FIFO buffer FB is
cleared by initializing the pointers TLP and HDP. The signals
LIR/W, LIR/L, LID, LID_NUM, stkIR, and pfb are transmitted from the
instruction decoder DEC.
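The pointer handling described for the FIFO buffer FB can be sketched as a small Python model. This is only an illustration under stated assumptions: the class name, the method names, the word-wide slots, and the wrap-around arithmetic over the eight storage areas BUF0 to BUF7 are hypothetical and are not taken from the drawings.

```python
class FifoBufferModel:
    """Hypothetical model of the FIFO buffer FB with eight word slots."""

    SIZE = 8  # storage areas BUF0 to BUF7

    def __init__(self):
        self.buf = [0] * self.SIZE
        self.hdp = 0    # head pointer: next slot to be written (assumed role)
        self.tlp = 0    # trailing pointer: oldest unread slot
        self.count = 0  # number of valid words currently held

    def push(self, word):
        """Store one prefetched 16-bit word at the head pointer."""
        if self.count == self.SIZE:
            raise OverflowError("FIFO buffer is full")
        self.buf[self.hdp] = word & 0xFFFF
        self.hdp = (self.hdp + 1) % self.SIZE
        self.count += 1

    def outputs(self):
        """Return (out0, out1, out2): the slot at TLP and the next two slots."""
        return tuple(self.buf[(self.tlp + i) % self.SIZE] for i in range(3))

    def retire(self, words):
        """Advance TLP past a completed instruction that is `words` words long."""
        if words > self.count:
            raise ValueError("not enough words in the buffer")
        self.tlp = (self.tlp + words) % self.SIZE
        self.count -= words

    def clear(self):
        """Model of the pfb signal: initialize both pointers."""
        self.hdp = 0
        self.tlp = 0
        self.count = 0
```

In this sketch, out0 would feed the selector ISEL, while out1 and out2 correspond to the words used for effective address calculation.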
[0154] In the instruction buffer IBUF, a branch target instruction
read in response to a delay branch instruction is temporarily held
in the register IR. After a delay slot instruction is terminated,
the FIFO buffer FB is cleared. At this time, a branch address is
held in a tentative register means, for example, a temporary
register TR, and a count value of the prefetch counter pfc is
preserved. When the length of an instruction code representing a
delay slot instruction is large, instruction reading can be
continued based on the count value of the prefetch counter pfc. The
delay slot instruction may be executed in a plurality of execution
states. Consequently, the number of instructions to be allocated to
a delay slot increases, a delay branch is utilized effectively,
user-friendliness improves, the use efficiency of resources
improves, and fast processing is realized.
[0155] In an effort to conceal a latency derived from reading of a
branch instruction during handling of a branch but not to suppress
a disorder in pipelining, an instruction being executed in a
plurality of execution states is allocated to a delay slot. This
leads to a reduced number of execution states.
[0156] When an instruction allocated to a delay slot enters an
execution state in which an instruction code representing the
instruction is read or an execution state in which an effective
address thereof is calculated, instruction reading is disabled.
Thus, reading (pre-fetching) of an instruction that is not utilized
is suppressed. Consequently, an increase in the number of execution
states derived from insertion of a wait state can be avoided.
[0157] FIG. 14 presents the details of the arithmetic and logic
unit ALU. FIG. 14 shows the arithmetic and logic unit 20, shifter
21, and output selector 22. A logic operation facility and an
additional facility for bit handling instructions or the like,
neither of which is shown, may also be included.
[0158] The arithmetic and logic unit 20 has input terminals thereof
connected on the internal buses gb and db. The shifter 21 has an
input terminal thereof connected on the internal bus gb. The output
selector 22 selects the result of an operation performed by the
arithmetic and logic unit 20 or the result of an operation
performed by the shifter 21. The output selector 22 places a
selected result on an internal bus wb2, or if necessary, on an
internal bus ab.
[0159] The arithmetic and logic unit 20 consists of an input
selector 25 and a full adder 26. The input selector 25 inverts an
input placed on the internal bus db and a carry. As is well
known, this is intended to perform subtraction as addition of a
two's complement.
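The well-known identity behind this arrangement, a - b = a + (~b) + 1 modulo 2^32, can be illustrated as follows. The function names and the 32-bit width are assumptions made for this sketch and do not describe the actual circuit.

```python
MASK32 = 0xFFFFFFFF

def full_adder32(a, b, carry_in):
    """Add two 32-bit values and a carry; return (sum, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def alu_sub(a, b):
    """Compute a - b as a + ~b + 1, that is, addition of the two's complement."""
    inverted_b = ~b & MASK32  # the input selector inverts the db-side input
    return full_adder32(a, inverted_b, 1)  # the carry input supplies the +1
```

A carry-out of 1 indicates that no borrow occurred, as in alu_sub(7, 5).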
[0160] The shifter 21 comprises the first shifter 27 and second
shifter 28, and receives a signal on the internal bus gb as well as
control signals sftl, sftr, sft1, sft2, sft4, sft8, and sft16. The
shifter 21 places the result of shifts on the internal bus wb2 via
the output selector 22. The signal sftl is a control signal
directing a left shift, and the signal sftr is a control signal
directing a right shift. The signal sft1 is a control signal
directing a shift of one bit position, the signal sft2 is a control
signal directing a shift of two bit positions, the signal sft4 is a
control signal directing a shift of four bit positions, the signal
sft8 is a control signal directing a shift of eight bit positions,
and the signal sft16 is a control signal directing a shift of
sixteen bit positions.
[0161] FIG. 15 presents the logical description of a control
instruction that controls the first shifter 27. According to the
logical description, when both the control signals sft1 and sft2
are inactive, no shift takes place. In other cases, a right or left
shift of 1 or 2 bit positions is performed based on a combination
of the control signals sftl, sftr, sft1, and sft2.
[0162] Likewise, FIG. 16 presents a logical description of a
control instruction that controls the second shifter 28.
[0163] In the logical descriptions of the control instructions,
which control the shifter 21, shown in FIG. 15 and FIG. 16, for
example, 1 is specified for the control signal sftl, 1 is specified
for sft1, 1 is specified for sft4, 0 is specified for sftr, 0 is
specified for sft2, 0 is specified for sft8, and 0 is specified
for sft16. In this case, the first shifter 27 and second shifter 28
perform a left shift of five bit positions.
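The cascade of the two shifters can be sketched in Python as follows. This is a minimal illustration, not the logical description of FIG. 15 or FIG. 16: the left-shift and right-shift controls are named sftl and sftr here, a logical (zero-fill) shift on 32-bit data is assumed, and the priority applied when several amount signals are active at once is an arbitrary choice of this sketch.

```python
MASK32 = 0xFFFFFFFF

def shift_stage(value, left, right, amount):
    """Shift by `amount` in the selected direction; no shift if none selected."""
    value &= MASK32
    if amount == 0 or not (left or right):
        return value
    if left:
        return (value << amount) & MASK32
    return value >> amount

def shifter21(value, sftl=0, sftr=0, sft1=0, sft2=0, sft4=0, sft8=0, sft16=0):
    """First shifter (1 or 2 positions) followed by second shifter (4, 8, or 16)."""
    first_amount = 1 if sft1 else 2 if sft2 else 0
    second_amount = 4 if sft4 else 8 if sft8 else 16 if sft16 else 0
    stage1 = shift_stage(value, sftl, sftr, first_amount)
    return shift_stage(stage1, sftl, sftr, second_amount)
```

With the left-shift control and the one-position and four-position controls active, shifter21(1, sftl=1, sft1=1, sft4=1) shifts left by five positions in total.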
[0164] The foregoing logical descriptions define the actions of the
control block so that the control block will not direct the first
shifter 27 and second shifter 28 to perform a shift of a plurality
of bit positions. Otherwise, priorities may be assigned to the
logical descriptions of the control instructions that control the
first shifter 27 and second shifter 28 respectively, and the logic
of the control block may be defined based on the priorities.
[0165] Moreover, the logical descriptions of the control
instructions that control a shifter may be defined without use of
an operator << or the like. For example, a first case and a
second case concerning the second shifter 28 may be described as
presented in FIG. 17. The logical description may be modified in
order to contain the description of an arithmetic shift or an
operation of rotating bits right or left, or to change a byte count
or a word length.
[0166] When either the control signal sftl or control signal sftr
is active, the output selector 22 selects an output of the
shifter.
[0167] As far as the arithmetic and logic unit 20 is concerned, a
carry is generated and one of two inputs that will undergo
subtraction must be reversed. This leads to an increase in the
number of logical operations. Even when the first shifter 27 and
second shifter 28 are connected in series with each other, a delay
caused by the shift operations will not get larger than a delay
caused by arithmetic operations. An increase in a maximum operating
frequency is hardly restricted. A barrel shifter requires selector
circuits that select 64 right and left bit positions. In contrast,
a selector circuit that selects any of five options may be adopted
for the first shifter 27 included in the shifter 21, and a selector
circuit that selects any of seven options may be adopted for the
second shifter 28 therein. Herein, even a no-shift operation can be
selected. Consequently, a logical scale can be drastically
reduced.
[0168] FIG. 18 presents the details of the multiplier MULT. MUL
denotes a parallel multiplication circuit that performs an
operation of multiplying 16 bits by 16 bits. reg1 and reg2 denote
data registers in which a partial product is temporarily stored.
ADD denotes an addition circuit. MAC (MACH or MACL) denotes a
product summation circuit in which the results of multiplication
are accumulated. regc denotes an output data register. MUX denotes
a multiplexer. Data a is placed on an internal bus gb, data b is
placed on an internal bus db, and data c is placed on an internal
bus wb2. The multiplier MULT has its action of performing an
operation controlled through state transition control. The state
transition control logic whose illustration is omitted is
implemented in the multiplier MULT.
[0169] FIG. 19 presents a state transition to be made during the
control of multiplication performed by the multiplier MULT shown in
FIG. 18. States are specified as tmg=0 to tmg=8.
[0170] The state tmg=0 is a standby state (no-operation). Unless
an operation request is issued from the control unit 2A, the
multiplier MULT does not start an operation. The operation request
is issued with the control signal start set to 1. As soon as the
control signal start is set to 1, a signal that selects a kind of
operation is activated. The state value is updated to a value
associated with the kind of operation. The operation is then
started. When the control signal mul_long is set to 1 indicating
multiplication of 32 bits, the state tmg=0 is changed to the state
tmg=1. Otherwise, the state tmg=0 is changed to the state
tmg=8.
[0171] In the state tmg=1, an operation expression
a[15:0]×b[15:0] is solved. The resultant data is stored in
the register reg2. When the operation expression is signed, a
control signal sign that is not shown is set to 1. When the
operation expression is unsigned, the control signal sign is reset
to 0. a[15:0] signifies 16 low-order bits of data a.
[0172] In the state tmg=2, an operation expression
a[31:16]×b[15:0] is solved. The resultant data is stored in
the register reg1. When the operation expression is signed, the
control signal sign is set to 1. When the operation expression is
unsigned, the control signal sign is reset to 0. When the 32
high-order bits of a product of 32 bits by 32 bits are needed, a
control signal upper that is not shown is set to 1. When the 32
low-order bits of the product of 32 bits by 32 bits are needed, the
control signal upper is reset to 0. If the control signal upper is
set to 1, the next state is the state tmg=3. If the control signal
upper is reset to 0, the next state is the state tmg=6.
[0173] In the state tmg=3, an operation expression
{16'h0000,reg2[31:16]}+reg1 is solved. The resultant data is stored
in the register reg2. An operation expression
a[15:0]×b[31:16] is solved. The resultant data is stored in
the register reg1. When the operation expression is signed, the
control signal sign is set to 1. When the operation expression is
unsigned, the control signal sign is reset to 0. The next state is
the state tmg=4. {16'h0000,reg2[31:16]} signifies 32-bit data whose
16 high-order bits are 0000 and whose 16 low-order bits are
reg2[31:16].
[0174] In the state tmg=4, an operation expression reg2+reg1 is
solved. The resultant data is stored in the register reg2. An
operation expression a[31:16]×b[31:16] is then solved, and
the resultant data is stored in the register reg1. The next state
is the state tmg=5. When the operation expression is signed, the
control signal sign is set to 1. When the operation expression is
unsigned, the control signal sign is reset to 0.
[0175] In the state tmg=5, when the control signal sign is set to
1, an operation expression {16{reg2[15]},reg2[15:0]}+reg1 is
solved, and the resultant data is stored in the register regc. When
the control signal sign is reset to 0, an operation expression
{16'h0000,reg2[15:0]}+reg1 is solved, and the resultant data is
stored in the register regc. When the operation expression is
signed, the control signal sign is set to 1. When the operation
expression is unsigned, the control signal sign is reset to 0. When
the control signal start is set to 1, the state value tmg is
updated based on the state transition chart having the state tmg=0
as a reference state. The next operation is then executed. In the
next state, the contents of the register regc are transmitted as
the result of the operation through an output terminal c, and
stored in a destination register. 16{reg2[15]} signifies data
acquired by repeating reg2[15] 16 times for 16 bits.
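The concatenation and replication notation used in these expressions can be restated in Python. The helper names below are hypothetical and serve only to show which bits each expression selects.

```python
def zero_extend_low16(reg):
    """{16'h0000, reg[15:0]}: low 16 bits with 16 zero bits above them."""
    return reg & 0xFFFF

def sign_extend_low16(reg):
    """{16{reg[15]}, reg[15:0]}: low 16 bits with bit 15 repeated 16 times above."""
    low = reg & 0xFFFF
    sign = (low >> 15) & 1
    high = 0xFFFF if sign else 0x0000
    return (high << 16) | low

def upper_half_zero_extended(reg):
    """{16'h0000, reg[31:16]}: high 16 bits moved down, zeros above them."""
    return (reg >> 16) & 0xFFFF
```

For a register value of 0x12348001, for example, the sign-extended form is 0xFFFF8001, because bit 15 of the low half is 1.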
[0176] In the state tmg=6, an operation expression
reg2+{reg1[15:0],16'h0000} is solved, and the resultant data is
stored in the register reg2. An operation expression
a[15:0]×b[31:16] is then solved, and the resultant data is
stored in the register reg1. When the operation expression is
signed, the control signal sign is set to 1. When the operation
expression is unsigned, the control signal sign is reset to 0. The
next state is the state tmg=7.
[0177] In the state tmg=7, an operation expression
reg2+{reg1[15:0],16'h0000} is solved, and the resultant data is
stored in the register regc. When the control signal start is reset
to 0, the next state is the state tmg=0. When the control signal
start is set to 1, the state value tmg is updated based on the
state transition chart having the state tmg=0 as a reference state.
The next operation is then executed. In the next state, the result
of the operation is transmitted through the output terminal c.
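The sequence of the states tmg=1, tmg=2, tmg=6, and tmg=7 can be traced in Python for the unsigned case. This is a sketch under assumptions: the function names are hypothetical, the registers are modeled as 32-bit values, and the addition performed in each state is assumed to use the register contents held before that state's multiplication result is written.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul16(x, y):
    """16-bit by 16-bit unsigned multiplication, as performed by MUL."""
    return (x & MASK16) * (y & MASK16)

def mul_long_lower(a, b):
    """Low-order 32 bits of an unsigned 32-bit by 32-bit product."""
    # tmg=1: a[15:0] x b[15:0] -> reg2
    reg2 = mul16(a, b)
    # tmg=2: a[31:16] x b[15:0] -> reg1
    reg1 = mul16(a >> 16, b)
    # tmg=6: reg2 + {reg1[15:0],16'h0000} -> reg2, then a[15:0] x b[31:16] -> reg1
    reg2 = (reg2 + ((reg1 & MASK16) << 16)) & MASK32
    reg1 = mul16(a, b >> 16)
    # tmg=7: reg2 + {reg1[15:0],16'h0000} -> regc
    regc = (reg2 + ((reg1 & MASK16) << 16)) & MASK32
    return regc
```

The product a[31:16]×b[31:16] does not appear on this path because it contributes only to bit 32 and above.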
[0178] In the state tmg=8, an operation expression
a[15:0]×b[15:0] is solved, and the resultant data is stored
in the register regc. Moreover, an operation of working out 16 bits
by multiplying 8 bits by 8 bits, an operation of working out 16
bits by multiplying 16 bits by 16 bits, and an operation of working
out 32 bits by multiplying 16 bits by 16 bits are carried out. When
the operation expression is signed, the control signal sign is set
to 1. When the operation expression is unsigned, the control signal
sign is reset to 0. When multiplication of 8 bits is performed, a
control signal leg is set to 1. When multiplication of 16 bits is
performed, the control signal leg is reset to 0. When the control
signal start is reset to 0, the next state is the state tmg=0. When
the control signal start is set to 1, the state value tmg is
updated based on the state transition chart having the state tmg=0
as a reference state. An operation is then executed. In the next
state, the result of the operation is stored in the register
regc.
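The state transitions described for the states tmg=0 to tmg=8 can be summarized as a next-state function. This is a hedged reading of the individual state descriptions, not a transcription of FIG. 19: the function names are hypothetical, the transition from tmg=1 to tmg=2 is assumed, and the states tmg=5, tmg=7, and tmg=8 are all assumed to return to tmg=0 when the control signal start is 0.

```python
def next_tmg(tmg, start=0, mul_long=0, upper=0):
    """Return the next state value for the multiplier control (assumed reading)."""

    def from_reference_state():
        # Transition taken with tmg=0 as the reference state.
        if not start:
            return 0
        return 1 if mul_long else 8

    if tmg == 0:
        return from_reference_state()
    if tmg == 1:
        return 2
    if tmg == 2:
        return 3 if upper else 6
    if tmg in (3, 4, 6):
        return tmg + 1
    if tmg in (5, 7, 8):
        return from_reference_state()
    raise ValueError("undefined state")

def trace(mul_long, upper):
    """List the states visited by one operation when no new request follows."""
    states = []
    tmg = next_tmg(0, start=1, mul_long=mul_long)
    while tmg != 0:
        states.append(tmg)
        tmg = next_tmg(tmg, start=0, upper=upper)
    return states
```

Under these assumptions, a request with mul_long set to 1 and upper set to 0 visits the states 1, 2, 6, and 7 before returning to standby.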
[0179] The multiplier MULT has its multiplying action controlled
through state transition control described in conjunction with FIG.
19. The 32 high-order bits of the result of multiplication of 32
bits by 32 bits can be obtained by making a state transition in the
order of the states tmg=1, tmg=2, tmg=3, tmg=4, tmg=5, and tmg=8.
Moreover, the 32 low-order bits of the result of multiplication of
32 bits by 32 bits can be obtained by making the state transition
in the order of the states tmg=1, tmg=2, tmg=6, tmg=7, and tmg=8.
For example, the former is initiated with a multiplication
instruction MULU.L, and the latter is initiated with a
multiplication instruction MULU/UP.L. When the latter
multiplication is used to work out the high-order bits of the
result of multiplication of an inverse number, it substantially
provides a result of division. Thus, fast division can be
realized.
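The use of the high-order bits of a product with an inverse number in place of division can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, the inverse number is taken to be the 32-bit constant ceil(2^32/d), and no rounding correction is applied, so the result is exact only for a limited range of dividends (for a divisor of 10, dividends below 2^30).

```python
def high32_of_product(a, b):
    """High-order 32 bits of an unsigned 32-bit by 32-bit product."""
    return ((a & 0xFFFFFFFF) * (b & 0xFFFFFFFF)) >> 32

def reciprocal_constant(divisor):
    """ceil(2**32 / divisor); divisors below 2 do not fit in 32 bits."""
    if divisor < 2:
        raise ValueError("divisor must be at least 2")
    return -(-(1 << 32) // divisor)

def divide_by_reciprocal(x, divisor):
    """Approximate x // divisor by one multiplication with the inverse number."""
    return high32_of_product(x, reciprocal_constant(divisor))
```

In practice the inverse number would be prepared in advance, so that a single multiplication takes the place of a division at run time.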
[0180] Incidentally, the states to be changed may include other
states needed to realize other operations (summation of products or
the like).
[0181] FIG. 20 to FIG. 22 are flowcharts showing instruction
execution in relation to the state of the instruction buffer IBUF.
The contents of control presented in the flowcharts may be
understood to be equivalent to the contents of control extended by
the instruction decoder DEC.
[0182] Referring to FIG. 20, the instruction execution sequence
will be described by taking a transfer instruction (MOV.W
@aa:16,Rd) for instance. In the flowchart, whether an instruction
is skipped is verified by verifying whether the FIFO buffer FB is
full. "Yes" signifies that the result of verification demonstrates
that the FIFO buffer FB is full. Verification of whether the
modification signal mod_db is received corresponds to verification
of whether it is now a delay slot period. "Yes" signifies that the
result of the verification demonstrates that it is now the delay
slot. When the FIFO buffer is full, if it is now the delay slot,
instruction fetch is not performed.
[0183] When the transfer instruction enters a first state (S1),
reading of the next instruction (if) and increment of the prefetch
counter pfc are performed in response to a control signal A. In
response to a control signal B, the read instruction is stored in
the instruction buffer IFIFO.
[0184] If the read instruction is executed as a delay slot
instruction (mod_db=yes), no operation is performed in the first
state thereof (S1') of the transfer instruction. This is because
even if instruction prefetch is executed, the resultant instruction
is not utilized.
[0185] When the transfer instruction enters a second state (S2),
the EA extension field (aa) of the transfer instruction is
interpreted according to a different code through code extension on
the basis of an output rslt1 of the instruction buffer IBUF in
response to the control signal A. The interpreted data is placed on
the internal address bus IAB by way of the internal bus ab and
address buffer AB. At the same time, a bus command is issued in
order to read data. In response to the control signal B, the data
read in the second state of the transfer instruction is stored in
the read data buffer DBR over the internal data bus IDB in a third
state of the transfer instruction. Furthermore, with a control
signal C, the data is transferred from the read data buffer DBR to
the general-purpose registers over the internal bus wb2 in the next
state of the transfer instruction. Moreover, data is tested, and
the result of the test is held in the flag register CCR.
[0186] When the transfer instruction enters the third state (S3),
reading (if) of an instruction succeeding the next instruction and
increment of the prefetch counter pfc are performed in response to
the control signal A. In response to the control signal B, the read
instruction is stored in the instruction buffer IBUF.
[0187] When the instruction is executed as a delay slot instruction
(mod_db=yes), the contents of the temporary register TR are read
instead of the contents of the prefetch counter pfc in the third
state (S3') of the transfer instruction. Moreover, an instruction
reading request (ifa) is issued irrespective of the number of bits
remaining in the FIFO buffer FB.
[0188] The first state and third state of the transfer instruction
are omitted (skipped) depending on the number of read bits of the
transfer instruction (saved in the FIFO buffer). If the number of
read bits is small, the actions defined to be performed in the
first and third states are carried out. An instruction longer than
the transfer instruction (two words) is then read. If the number of
read bits is appropriate, the action defined to be performed in one
of the first and third states is carried out. An instruction having
the same length as the transfer instruction (of two words long) is
then read. If the number of read bits is large, the actions defined
to be performed in the first and third states are not carried out.
No instruction is therefore read.
[0189] To be more specific, if the number of bits remaining in the
instruction buffer at the start of execution of the transfer
instruction, delta, has a relationship of delta>num relative to
the data length num of an instruction code representing the
transfer instruction, the first state of the transfer instruction
is omitted. This is because the transfer instruction including its
EA extension field has already been read and is immediately
executable. Besides, the next instruction is made available in
consideration of the latency derived from instruction reading.
Moreover, if delta-num>2 is established, the third state of the
transfer instruction is omitted. This is because the first word of
the next instruction has already been read, and execution of the
instruction can be started. Moreover, even if the next instruction
is terminated in the first state, another word is made available in
consideration of a latency derived from instruction reading.
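The two conditions can be written as a short Python function. This is only a sketch: the function name is hypothetical, and delta and num are assumed to be expressed in the same unit as the constant 2 in the comparison (taken here to be bytes, so that the two-word transfer instruction has num equal to 4).

```python
def transfer_states_to_skip(delta, num):
    """Return (skip_first_state, skip_third_state) for the transfer instruction."""
    skip_first = delta > num        # instruction and part of the next already read
    skip_third = (delta - num) > 2  # more than one word of the next already read
    return skip_first, skip_third
```

For example, with delta equal to 6 and num equal to 4, only the first state is omitted under these assumptions.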
[0190] As mentioned above, in the first state S1 of the transfer
instruction, reading of the transfer instruction (if) and increment
of the prefetch counter pfc are disabled for fear the instruction
buffer may overflow. This action is shared with the previous
instructions, and the definition of the action is omitted from the
description of each instruction. Instruction reading (ifa) does not
depend on the number of bits remaining in the FIFO buffer FB and is
therefore not disabled.
[0191] FIG. 21 presents an execution sequence of a branch
instruction (JMP @aa:32). When the branch instruction enters a
first state (S1), reading of the next instruction (ifw) and
increment of the prefetch counter pfc are performed in response to
a control signal A. In response to a control signal B, the read
instruction is stored in the instruction buffer IBUF.
[0192] When the branch instruction enters a second state (S2), the
CPU waits for the read instruction to be stored in the instruction
buffer IBUF.
[0193] When the branch instruction enters a third state, the EA
extension field (aa) of the branch instruction is placed on the
internal address bus IAB over the internal bus ab via the address
buffer AB on the basis of the outputs rslt1 and rslt2 of the
instruction buffer IBUF in response to the control signal A. The
contents of the EA extension field are incremented by the
incrementer INC and saved in the prefetch counter pfc. At the same
time, a bus command is issued in order to execute instruction
reading. In response to the control signal B, the read instruction
is stored in the instruction buffer IBUF.
[0194] When the branch instruction enters a fourth state, reading
of an instruction succeeding the next instruction (if) and
increment of the prefetch counter pfc are performed in response to
the control signal A. In response to the control signal B, the read
instruction is stored in the instruction buffer IBUF.
[0195] The first and second states of the branch instruction are
omitted (skipped) depending on the number of read bits of the
branch instruction (saved in the FIFO buffer FB).
[0196] If the number of bits remaining in the instruction buffer at
the start of execution of the branch instruction, delta, has a
relationship of delta≥num relative to the number of bits num
representing the branch instruction, the first and second states of
the branch instruction are omitted. This is because the branch
instruction including the EA extension field thereof has already
been read and can be immediately executed. Moreover, when the
number of bits constituting an instruction code representing an
instruction being read, ifnum, is taken into consideration, if
delta+ifnum≥num is established, the first state of the
branch instruction is omitted. This is because the branch
instruction has already been read, and can be executed one state
later. It is therefore unnecessary to additionally perform
instruction reading.
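The conditions for omitting the first and second states of the branch instruction can be sketched in the same way. The function name and the use of a set of state labels are hypothetical, and delta, num, and ifnum are assumed to share one unit (bytes in the example, with the three-word branch instruction having num equal to 6).

```python
def branch_states_to_skip(delta, num, ifnum=0):
    """Return the set of states of the branch instruction that may be omitted."""
    if delta >= num:
        # The whole instruction, including its EA extension, is already buffered.
        return {"S1", "S2"}
    if delta + ifnum >= num:
        # The missing part is already being read; only the wait state remains.
        return {"S1"}
    return set()
```

With delta equal to 4, num equal to 6, and one word (ifnum equal to 2) being read, only the first state is omitted in this sketch.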
[0197] When the action supposed to be performed in the first
execution state of the branch instruction is carried out, since at
least the first and second words of the branch instruction have
been read according to the previous instruction, only one word is
short. When a program is transferred over a 16-bit bus, an
instruction should be read in units of a word. If a request for
instruction reading in units of 32 bits is issued, the CPU also
waits for an unnecessary word. Thus, a wait state is inserted. Therefore, the
control signal ifw is activated in order to forcibly perform
instruction reading in units of a word.
[0198] FIG. 22 presents an execution sequence of a plurality-of-bit
shifts instruction (SHLL #xx,Rd). In the first to third execution
states of the plurality-of-bit shifts instruction, reading of an
instruction (if) and increment of the prefetch counter pfc are
performed in response to a control signal A. In response to a
control signal B, the read instruction is stored in the instruction
buffer IBUF. Further, in response to a control signal B, the
shifter 21 performs an operation. This action is specified as
do_shift. More particularly, the contents of the designated
destination register Rd are read and placed on the internal bus gb
in response to a control signal rdgb[n], and transmitted to the
shifter 21. The control signals sftl, sftr, sft1, sft2, sft4, sft8,
and sft16 are issued in order to initiate shift operations. In
response to a control signal wbrd[n], the results of the shift
operations are stored in the destination register Rd. A
predetermined flag is set in the condition code register CCR. A
control signal sftcnt is then produced.
[0199] As mentioned above, when all the bits of the control signal
sftcnt are 0s, the plurality-of-bit shifts instruction is verified
to be terminated. When all the bits of the control signal sftcnt
are not 0s, insertion of an interrupt is disabled (an interrupt
directing signal intrq produced by the interrupt control section
INTC shown in FIG. 1 is not treated). This action is defined to be
performed in up to three states of the plurality-of-bit shifts
instruction. Thus, a shift of any number of bit positions up to 31
can be performed.
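One way in which any shift count up to 31 fits into three execution states can be sketched as follows. This is an illustrative decomposition only: the function name is hypothetical, the encoding of the control signal sftcnt is not modeled, and it is assumed that each state applies at most one amount in the first shifter (1 or 2 positions) and at most one amount in the second shifter (4, 8, or 16 positions).

```python
def plan_shift_states(count):
    """Split a shift count (0 to 31) into per-state (first, second) amounts."""
    if not 0 <= count <= 31:
        raise ValueError("shift count must be between 0 and 31")
    # Amounts for the second shifter come from bits 4, 3, and 2 of the count.
    second = [amount for amount in (16, 8, 4) if count & amount]
    # Amounts for the first shifter come from the two low-order bits.
    first = {0: [], 1: [1], 2: [2], 3: [2, 1]}[count & 3]
    states = max(len(first), len(second))
    first = first + [0] * (states - len(first))
    second = second + [0] * (states - len(second))
    return list(zip(first, second))
```

A count of 31, for example, is split into 2+16, 1+8, and 0+4 positions over three states, while a count of 5 needs a single state.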
[0200] As mentioned above, instruction reading (if) and increment
of the prefetch counter pfc are disabled appropriately for fear the
instruction buffer may overflow. This action is shared with the
preceding instructions. Therefore, the description of the action is
omitted from the description of each instruction.
[0201] FIG. 23 to FIG. 25 present the instruction execution timings
in different bus modes. The timing charts are concerned with a case
where a program briefed below is executed for a branch target
designated in a branch instruction.
TABLE-US-00001
pc0  BTST #6,R0L
     BEQ  t1
     ......
t1   BMI  t2
     ......
[0202] The branch conditions inherent to respective conditional
branch instructions BEQ and BMI shall be satisfied. An external
memory is connected on a 16-bit bus, and accessible over two state
periods. Incidentally, the Ready signal ready is a response signal
returned from the bus controller relative to a request for
instruction reading or data reading or writing (bus command) issued
from the CPU. The Ready signal serves as an enabling signal that
enables the CPU to make a state transition.
[0203] FIG. 23 shows execution timings in the built-in ROM. This is
concerned with a case where instructions BTST (bit handling
instruction), BEQ (conditional branch instruction), and BMI
(conditional branch instruction) are fetched from the built-in ROM
and executed.
[0204] At the timing T0, an immediately preceding branch
instruction is executed, and an address is therefore read from the
address buffer AB incorporated in the CPU 2 and placed on the
internal address bus IAB. Moreover, the instruction decoder DEC
issues a bus command BCMD, which directs instruction fetch
(if-long) to be performed on an instruction of a long word length.
if-long corresponds to 'longread.
[0205] At the timing T1, reading is executed based on the address
signal placed on the bus IAB at the timing T0 and the bus command
BCMD issued at the timing T0. Consequently, read data of a long
word length is placed on the internal data bus IDB. At the timing T2,
the instruction decoder DEC receives a preceding word (instruction
BTST) of the read data and interprets the instruction. A succeeding
word (instruction BEQ) is stored in the FIFO buffer FB included in
the instruction buffer IBUF.
[0206] Based on the result of the interpretation, a given control
signal is produced and the instruction BTST is executed. The
details will be omitted. At the timing T2, an instruction of a long
word length is fetched.
[0207] On the other hand, at the timing T2, reading is executed
based on the address signal placed on the bus IAB at the timing T1
and the bus command BCMD issued at the timing T1. Read data of a
long word length is placed on the internal data bus IDB. At the
timing T3, the instruction decoder DEC receives the next
instruction (instruction BEQ) from the instruction buffer IBUF, and
interprets the instruction.
[0208] Based on the result of the interpretation, an effective
address is calculated. At the timings T3 and T4, an instruction of
a long word length is fetched from a branch target (t1, t1+4).
[0209] Likewise, at the timing T5, the instruction decoder DEC
receives data (instruction BMI) read based on the address signal
placed on the bus IAB at the timing T3 and the bus command BCMD
issued at the timing T3, and interprets the instruction. Based on
the result of the interpretation, an effective address is
calculated. At the timings T5 and T6, an instruction of a long word
length is fetched from a branch target (t2, t2+4).
[0210] FIG. 24 presents actions of executing instructions fetched
from an external memory in 16-bit bus mode. Over the timings T0 to
T1, an immediately preceding branch instruction is executed.
Consequently, an address signal is read from the address buffer AB
included in the CPU 2 and placed on the internal address bus IAB.
Moreover, the instruction decoder DEC issues a bus command BCMD
that directs instruction fetch of a word length (if-word). if-word
corresponds to 'wordread.
[0211] Over the timings T2 to T3, reading is performed on the
external buses (EXA and EXD) according to the address signal placed
on the bus IAB over the timings T0 to T1 and the bus command BCMD
issued over the timings T0 to T1. Read data is placed on the
internal data bus IDB. At the timing T4, the instruction decoder
DEC receives the read data (instruction BTST) and interprets the
instruction.
[0212] Based on the result of the interpretation, a given control
signal is produced, and the instruction BTST is executed. The
details will be omitted. Over the timings T4 to T5, an instruction
of a word length is fetched.
[0213] On the other hand, over the timings T4 to T5, reading is
performed on the external buses (EXA and EXD) according to the
address signal placed on the bus IAB over the timings T2 to T3 and
a bus command BCMD issued over the timings T2 to T3. Read data is
placed on the internal bus IDB. At the timing T6, the instruction
decoder DEC receives the read data (instruction BEQ) and interprets
the instruction.
[0214] Based on the result of the interpretation, an effective
address is calculated. Over the timings T6 to T7 and the timings T8
to T9, an instruction of a word length is fetched from a branch
target (t1, t1+2). Likewise, over the timings T8 to T9, the
instruction decoder DEC receives data (instruction BMI) read based
on the address signal placed on the bus IAB over the timings T6 to
T7 and the bus command BCMD issued over the timings T6 to T7. The
instruction is interpreted. Based on the result of the
interpretation, an effective address is calculated. Over the
timings T10 to T11 and the timings T12 to T13, an instruction of
a word length is fetched from a branch target (t2, t2+2).
[0215] As apparent from FIG. 24, although a clock frequency is
lower than the one attained in the built-in ROM as shown in FIG.
23, since the 16-bit bus mode is adopted, the number of bus cycles
is the same as that shown in FIG. 23.
[0216] FIG. 25 presents actions of executing instructions fetched
from an external memory in 32-bit bus mode. Over the timings T0 to
T3, an immediately preceding branch instruction is executed, and an
address read from the address buffer AB incorporated in the CPU 2
is placed on the address bus IAB. Moreover, the instruction decoder
DEC issues a bus command (BCMD) that directs fetching (if-long) of
an instruction of a long word length.
[0217] Over the timings T4 to T7, reading based on the address
signal placed on the bus IAB over the timings T0 to T3 and the bus
command BCMD issued over the timings T0 to T3 is performed on the
external buses (EXA and EXD) as two readings of an instruction
word. Over the timings T6 to T7, the read data of a long word
length is placed on the internal data bus IDB. At
the timing T8, the instruction decoder DEC receives a preceding
word (instruction BTST) of the read data, and interprets the
instruction. A succeeding word (instruction BEQ) is stored in the
FIFO buffer FB included in the instruction buffer IBUF. Based on
the result of the interpretation, a given control signal is
produced and the instruction BTST is executed. The details will be
omitted. Over the timings T8 to T11, a long-word instruction
is fetched.
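The splitting of one long-word fetch into two word reads on a 16-bit external bus can be sketched as follows. This is only an illustrative model: the function names, the dictionary used as memory, and the addresses are hypothetical, and mnemonics stand in for instruction codes because the source gives no encodings.

```python
from collections import deque

def bus_reads(address, request_bytes, bus_bytes):
    """Addresses of the external reads needed to satisfy one CPU request."""
    return [address + offset for offset in range(0, request_bytes, bus_bytes)]

def fetch_long_over_16bit_bus(memory, address, fifo):
    """Model a long-word fetch (if-long) carried out as two word reads.

    The preceding word goes to the instruction decoder and the
    succeeding word is stored in the FIFO buffer, as in paragraph [0217].
    """
    addresses = bus_reads(address, request_bytes=4, bus_bytes=2)
    preceding, succeeding = (memory[a] for a in addresses)
    fifo.append(succeeding)          # kept for a later decode
    return preceding, len(addresses)

# Hypothetical memory image: word-aligned address -> instruction mnemonic.
memory = {0x1000: "BTST", 0x1002: "BEQ"}
fifo = deque()
decoded, external_reads = fetch_long_over_16bit_bus(memory, 0x1000, fifo)
```

With this image, the decoder receives BTST, BEQ waits in the FIFO buffer, and two external reads are counted for the single CPU request.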
[0218] On the other hand, over the timings T8 to T11, reading based
on the address signal placed on the internal address bus IAB over
the timings T4 to T7 and the bus command BCMD issued over the
timings T4 to T7 is performed on the external buses (EXA and EXD)
as two readings of words. Over the timings T10 to T11, read data of
a long word long is placed on the internal data bus IDB. At the
timing T12, the instruction decoder DEC receives the next
instruction (instruction BEQ) read from the instruction buffer IBUF
and interprets the instruction. Based on the result of the
interpretation, an effective address is calculated. Over the
timings T12 to T15 and the timings T16 to T19, an instruction of a
long word long is fetched from a branch target (t1, t1+4).
[0219] Likewise, over the timings T18 to T19, reading is performed
based on the address signal placed on the bus IAB over the timings
T12 to T15 and the bus command BCMD issued over the timings T12 to
T15. At the timing T20, the instruction decoder DEC receives the
read data (instruction BMI) and interprets the instruction. Based
on the result of the interpretation, an effective address is
calculated. Over the timings T20 to T23 and the timings T24 to T27,
an instruction of a long word long is fetched from a branch target
(t2, t2+4).
[0220] In 32-bit bus mode, unlike 16-bit bus mode, the instructions
fetched by the five word reads performed on the external
buses over the timings T10 to T15 or the timings T20 to T23 are
unused and wasted. Consequently, the number of execution states
increases. If execution of the program is not preceded by execution
of a branch instruction, instruction reading may be disabled
depending on the number of bits remaining in the instruction
buffer. Therefore, the bus command issued over the timings T4 to T5
may be disabled (NOP: no-operation), and word reading may not be
performed on the external buses over the timings T12 to T15.
Consequently, the instructions may be completed in one execution
state. Moreover, when a branch condition for the instruction BEQ or
BMI is not satisfied, a branch does not take place. In this case,
if an interrupt is not requested, the instruction code that has been
read is used rather than wasted.
[0221] FIG. 26 shows a first example of execution timings for a
normal branch instruction and others. Herein, the normal branch
instruction is a branch instruction that is not a delay branch
instruction. A program taken as an example will be briefed
below.
MOV.W @ER1,R0
BRA target
......
target
[0222] As mentioned above, at the timing T2, the instruction
decoder DEC receives an instruction MOV and interprets the
instruction. Based on the result of the interpretation, the
contents of the general-purpose register ER1 are placed as an
effective address on the internal address bus IAB. A bus command
(data-read) that directs word reading is issued. At the same time,
a data reading control signal containing a register selection
signal is produced. Actions started at the timings T3 and T4 are
controlled sequentially. Specifically, fetching into the read data
buffer DBR is directed at the timing T3. Transfer of the contents
of the read data buffer DBR to the destination register R0 is
directed at the timing T5.
[0223] Read data is placed on the internal data bus IDB at the
timing T3. As mentioned above, the read data is fetched into the
read data buffer DBR at the timing T4, and transferred to the
general-purpose register R0 at the timing T5. At the same time, a
predetermined flag in the condition code register CCR is updated
based on the transferred data. Moreover, instruction reading is
performed at the timing T4. The count value of the prefetch counter
pfc is placed on the internal address bus IAB, and incremented by
the incrementer INC.
[0224] At the timing T4, the instruction decoder DEC receives an
instruction BRA and interprets the instruction. Based on the result
of the interpretation, an effective address is calculated. At the
timing T4, an instruction of a long word long (a word when a value
specified in target is not a multiple of 4) is fetched from a
branch target (target). The branch target address is incremented by
the incrementer INC, and saved in the prefetch counter pfc. At the
timing T5, a control signal clrFIFO (equivalent to the control
signal pfb) is activated and the FIFO buffer FB is initialized.
Moreover, an instruction at a branch target is transmitted to the
instruction decoder DEC.
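The choice of fetch size at a branch target can be illustrated with a small sketch. It assumes that in 16-bit mode a word is always fetched, and that the prefetch counter receives the target address plus the number of bytes just fetched; the function name and the flag argument are hypothetical.

```python
def branch_fetch(target, bmode16):
    """Return (bytes fetched at the branch target, new prefetch counter value).

    Assumption: a long word is fetched only in 32-bit mode and only when
    the target address is a multiple of 4; otherwise a word is fetched.
    """
    if bmode16 or target % 4 != 0:
        size = 2                      # word fetch
    else:
        size = 4                      # long-word fetch
    return size, target + size        # incrementer output saved in pfc

aligned = branch_fetch(0x0100, bmode16=False)
unaligned = branch_fetch(0x0102, bmode16=False)
```

Under these assumptions an aligned target yields a long-word fetch, while a target at an address such as 0x0102 yields a word fetch, after which prefetching continues from a long-word boundary.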
[0225] FIG. 27 shows a second example of execution timings for a
normal branch instruction and others. A program taken as an example
will be briefed below.
MOV.W @aa:16,R0
BRA target
......
target
[0226] As mentioned above, at the timing T2, the instruction
decoder DEC receives an instruction MOV and interprets the
instruction. Based on the result of the interpretation, instruction
reading is executed in the first state of the instruction MOV (S1
in FIG. 20) that starts at the timing T2, and reception of the EA
extension field of the instruction MOV is awaited. In the second
state of the instruction MOV that starts at the timing T3, the EA
extension field (aa) of the instruction MOV is transmitted from the
instruction buffer IBUF through an output terminal out1, and
sign-extended.
The resultant data is transmitted as an effective address from the
address buffer AB onto the internal address bus IAB. Moreover, a
bus command (data-read) that directs word reading is issued. At the
same time, a data reading control signal containing a register
selection signal is produced, and actions to be performed over the
timings T4 and T5 are controlled sequentially. Specifically,
fetching into the read data buffer DBR is directed at the timing
T4. Transfer of the contents of the read data buffer DBR into the
destination register R0 is directed at the timing T5.
[0227] Read data is placed on the internal data bus IDB at the
timing T4, and, as mentioned above, fetched into the read data
buffer DBR at the timing T5. The data is then transferred to the
register R0 at the timing T6. At the same time, a predetermined
flag in the condition code register CCR is updated based on the
transferred data. Moreover, instruction reading is performed at the
timing T4.
[0228] The instruction decoder DEC receives an instruction BRA at
the timing T5, and interprets the instruction. Based on the result
of the interpretation, an instruction of a long word long is
fetched from a branch target (target) at the timing T5. The branch
target address is incremented by the incrementer INC and saved in
the prefetch counter pfc. At the timing T6, a control signal
clrFIFO (equivalent to the signal pfb) is activated and the FIFO
buffer FB is initialized. Moreover, an instruction read from a
branch target and saved in the instruction register IR is
transmitted to the instruction decoder DEC.
[0229] FIG. 28 shows a first example of execution timings for a
delay branch instruction and others. A program taken as an example
will be briefed below.
[0230] BRA/S target
[0231] MOV.W @ER1,R0
[0232] . . . target
Processing is substantially identical to the
one achieved by the program described in FIG. 26. BRA/S denotes a
branch instruction accompanied by a delay slot instruction, that
is, a delay branch instruction.
[0233] At the timing T2, the instruction decoder DEC receives a
delay branch instruction (BRA/S) and interprets the instruction.
Based on the result of the interpretation, an effective address is
calculated. At the timing T3, an instruction of a long word long (a
word when the address of the branch target is not a multiple of 4)
is fetched from a branch target (target). The branch target address
is incremented by the incrementer INC, and saved in the temporary
register TR. Moreover, a control signal mod_db (indicating that the
instruction is a delay slot instruction) is transmitted in order to
modify a delay slot instruction. At the timing T3, the instruction
decoder DEC receives the delay slot instruction (in this case,
instruction MOV) and interprets the instruction. Based on the
result of the interpretation, the contents of the general-purpose
register ER1 are placed as an effective address on the internal
address bus IAB. A bus command (data-read) that directs word
reading is issued. At the same time, a data reading control signal
containing a register selection signal is produced. Based on the
control signal, actions to be performed at the timings T4 and T5
are controlled. Specifically, fetching data into the read data
buffer DBR is directed at the timing T4. At the timing T5, transfer
of the contents of the read data buffer DBR into the destination
register R0 is directed.
[0234] Read data is placed on the internal data bus IDB at the
timing T4, fetched into the read data buffer DBR at the timing T5,
and transferred to the register R0 at the timing T6. At the same
time, a predetermined flag in the condition code register CCR is
updated based on the transferred data.
[0235] Moreover, instruction reading is executed at the timing T4.
At this time, the delay slot instruction is modified based on the
control signal mod_db. Thus, the contents of the temporary register
TR are used instead of the contents of the prefetch counter pfc.
The address read from the temporary register TR is incremented by
the incrementer INC and saved in the prefetch counter pfc.
Moreover, a control signal stkIR is activated at the timing T3, and
an instruction fetched from a branch target is stored in the
register IR included in the instruction buffer IBUF. At the timing
T4, a control signal clrFIFO (equivalent to a signal pfb) is
activated and the FIFO buffer FB is initialized. The instruction
fetched from the branch target and stored in the instruction
register IR is transmitted to the instruction decoder DEC.
[0236] By combining a delay branch instruction and a delay slot
instruction, a latency derived from instruction reading from a
branch target is concealed and a delay branch instruction is
executed in one execution state. Actions supposed to be performed
in the first states of the instruction BRA and instruction BRA/S
respectively are basically identical to each other. A difference
between the actions lies in whether an output of the incrementer
INC is saved in the prefetch counter pfc or in the temporary
register TR.
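The difference between the normal branch and the delay branch can be modeled in a few lines. This is a hedged sketch, not the circuit itself: the class, its method names, and the fixed fetch size are hypothetical, while pfc, TR, and mod_db follow the names used in the description.

```python
class PrefetchModel:
    """Toy model of the prefetch counter pfc and temporary register TR."""

    def __init__(self, pfc):
        self.pfc = pfc
        self.tr = None
        self.mod_db = False           # set while a delay slot instruction is pending

    def branch(self, target, size, delayed):
        """Fetch from the branch target and save the incremented address."""
        incremented = target + size   # output of the incrementer INC
        if delayed:
            self.tr = incremented     # BRA/S: keep pfc intact for the delay slot
            self.mod_db = True
        else:
            self.pfc = incremented    # BRA: pfc follows the branch at once
        return target                 # address placed on the bus IAB

    def instruction_read(self, size):
        """Next instruction read; TR replaces pfc once after a delay branch."""
        base = self.tr if self.mod_db else self.pfc
        self.mod_db = False
        self.pfc = base + size
        return base

normal = PrefetchModel(pfc=0x0010)
normal.branch(0x0200, size=4, delayed=False)

delayed = PrefetchModel(pfc=0x0010)
delayed.branch(0x0200, size=4, delayed=True)
pfc_during_slot = delayed.pfc
next_read = delayed.instruction_read(size=4)
```

In the delayed case the prefetch counter still points into the original stream while the delay slot instruction runs, so any remaining words of that instruction remain reachable, and the first read afterward continues from the address held in TR.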
[0237] FIG. 29 shows a second example of execution timings for a
delay branch instruction and others. A program to be presented as
an example will be briefed below.
BRA/S target
MOV.W @aa:16,R0
......
target
Processing performed by the program is substantially identical to
that performed by the program described in FIG. 27.
[0238] At the timing T2, the instruction decoder DEC receives a
delay branch instruction (instruction BRA/S) and interprets the
instruction. Based on the result of the interpretation, an
instruction of a long word long is fetched from a branch target
(target) at the timing T3. The branch target address is incremented
by the incrementer INC, and saved in the temporary register TR.
Moreover, a control signal mod_db is transmitted in order to modify
a delay slot instruction. The instruction decoder DEC receives the
delay slot instruction (instruction MOV) and interprets the
instruction. The delay slot instruction is modified based on the
result of the interpretation and the control signal mod_db. In the
first state of the delay slot instruction (S1 in FIG. 20) that
starts at the timing T3, instruction reading is not performed, but
reception of the EA extension field of the delay slot instruction
is awaited. In short, the EA extension field whose reading has
already started is transferred to the FIFO buffer FB. In the second
state of the delay slot instruction that starts at the timing T4,
the EA extension field (aa) of the delay slot instruction is
transmitted from the instruction buffer IBUF through an output
terminal out1. The EA extension field is sign-extended
and transmitted as an
effective address from the address buffer AB over the internal
address bus IAB. Furthermore, a bus command BCMD (data-read) that
directs word reading is issued. At the same time, a data reading
control signal containing a register selection signal is produced,
and actions supposed to be performed at the timings T5 and T6 are
controlled based on the control signal. Specifically, fetching into
the read data buffer DBR is directed at the timing T5, and transfer
of the contents of the read data buffer DBR to the destination
register R0 is directed at the timing T6.
[0239] Read data is placed on the internal data bus IDB at the
timing T5, and, as mentioned above, fetched into the read data
buffer DBR at the timing T6. At the timing T7, the data is
transferred to the register R0. At the same time, a predetermined
flag in the condition code register CCR is updated based on the
transferred data.
[0240] Moreover, instruction reading is executed at the timing T5.
At this time, when the delay slot instruction is modified based on
the control signal mod_db, the contents of the temporary register
TR are used instead of the count value of the prefetch counter pfc.
The address represented by the contents of the temporary register
is incremented by the incrementer INC and saved in the prefetch
counter pfc.
[0241] Moreover, a control signal stkIR is activated at the timing
T3, and an instruction at a branch target is stored in the
instruction register IR included in the instruction buffer IBUF. At
the timing T5, a control signal clrFIFO (equivalent to a signal
pfb) is activated and the FIFO buffer FB is initialized. The
instruction fetched from the branch target and stored in the
instruction register IR is transmitted to the instruction decoder
DEC. The FIFO buffer FB is initialized at the timing T6. Therefore,
during execution of the delay slot instruction, for example, the EA
extension field (ea) may be transmitted from the FIFO buffer FB at
the timing T4 and put to use.
[0242] Instruction reading supposed to be performed in the first
state (S1) of the instruction MOV is disabled, and wasting
instruction reading is avoided.
[0243] FIG. 30 shows a first example of execution timings for a
shift instruction. Herein, the execution timings for executing
instruction SHLL #4, ER3 will be illustrated. The instruction
directs a left shift of 4 bit positions.
[0244] At the timing T2, the instruction decoder DEC receives an
instruction code (shll4) and interprets the instruction. Based on
the result of the interpretation, the contents of the destination
register ER3 are placed on the internal bus gb at the timing T3,
and transmitted to the arithmetic and logic unit ALU. Control
signals sftl and sft4 are activated. The first shifter 27 does not
perform a shift but the second shifter 28 performs a shift
operation of four bit positions. The result of the shift operation
is placed on the internal bus wb2. At the same time, the contents
of the internal bus wb2 are stored in the general-purpose register
ER3 that is a destination register. Moreover, the result of the
shift is verified in order to update a predetermined bit in the
condition code register CCR. For example, if the result of the
operation is such that all bits are zeros, a Z flag is set to 1.
Moreover, the data placed on the internal bus gb[28] and subjected
to shift-out is saved as a C flag. The details of instruction
reading will be omitted.
[0245] FIG. 31 shows a second example of execution timings for a
shift instruction. Herein, the execution timings for instruction
SHLL #31, ER3 are illustrated. The instruction directs a left shift
of 31 bit positions.
[0246] At the timing T2, the instruction decoder DEC receives an
instruction code (shll31) and interprets the instruction. Based on
the result of the interpretation, the contents of the destination
register ER3 are placed on the internal bus gb and transferred to
the arithmetic and logic unit ALU over the timings T3 to T5. At the
timing T3, control signals sftl (directing a left shift), sft1
(directing a shift of one bit position), and sft4 (directing a shift
of four bit positions) are activated. The first shifter 27 performs a
shift operation of one bit position, and the second shifter 28
performs a shift operation of four bit positions (a left shift of
five bit positions in total). The results of the shift operations
(intermediate results result') are placed on the internal bus wb2.
At the same time, the data on the internal bus wb2 is stored in the
general-purpose register ER3 that is a destination register.
Moreover, the results of the shifts are verified, and a
predetermined bit in the condition code register CCR is
updated.
[0247] At the timing T4, the control signals sftl (left shift),
sft2 (shift of two bit positions), and sft8 (shift of eight bit
positions) are activated. The first shifter 27 performs a shift
operation of two bit positions, and the second shifter 28 performs
a shift operation of eight bit positions (a left shift of 15 bit
positions in total over the timings T3 and T4). The results of the
shift operations (intermediate result result'') are placed on the
internal bus wb2. At the same time, the data on the internal bus
wb2 is stored in the general-purpose register ER3 that is the
destination register. Moreover, the results of the shifts are
verified, and a predetermined bit in the condition code register
CCR is updated.
[0248] At the timing T5, the control signals sftl (left shift) and
sft16 (shift of 16 bit positions) are activated. The first shifter
27 does not perform a shift, but the second shifter 28 performs a
shift operation of 16 bit positions (a left shift of 31 bit
positions in total over the timings T3 to T5). The results of the
shift operations (result) are placed on the internal bus wb2. At
the same time, the data on the internal bus wb2 is stored in the
general-purpose register ER3 that is the destination register.
Moreover, the final results of the shift operations are verified,
and a predetermined bit in the condition code register CCR is
updated. Instruction reading supposed to be performed over the
timings T3 and T4 is illustrated to be disabled because of the
number of bits remaining in the FIFO buffer FB.
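The way a shift count is spread over execution states can be sketched as below. The sketch assumes that the first shifter handles 1 or 2 bit positions and the second shifter handles 4, 8, or 16, and that each state pairs at most one amount from each shifter; the function names are hypothetical, and only the final C and Z flag values are returned even though the description updates the condition code register in every state.

```python
from itertools import zip_longest

MASK32 = 0xFFFFFFFF

def shift_steps(count):
    """Split a shift count (1..31) into per-state (first, second) shifter amounts."""
    first_amounts = [bit for bit in (1, 2) if count & bit]
    second_amounts = [bit for bit in (4, 8, 16) if count & bit]
    return list(zip_longest(first_amounts, second_amounts, fillvalue=0))

def shll(value, count):
    """Left shift of a 32-bit value; returns (result, C flag, Z flag)."""
    carry = 0
    for first, second in shift_steps(count):
        amount = first + second
        carry = (value >> (32 - amount)) & 1   # last bit shifted out in this state
        value = (value << amount) & MASK32     # intermediate result written back
    return value, carry, int(value == 0)

steps_for_31 = shift_steps(31)
steps_for_4 = shift_steps(4)
```

For a count of 31 this yields three states of 5, 10, and 16 bit positions, matching the running totals of 5, 15, and 31 given above, while a count of 4 completes in a single state that uses only the second shifter.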
[0249] FIG. 32 shows a first example of execution timings for a
multiplication instruction. Herein, the execution timings for
instruction MULU.L ER1, ER2 are illustrated. The instruction
directs the processing of acquiring 32 low-order bits of a product
of 32 bits by 32 bits.
[0250] At the timing T2, the instruction decoder DEC receives an
instruction code (mulu), and interprets the instruction. Based on
the result of the interpretation, a Start signal start is
transmitted to the multiplier MULT at the timing T2. The multiplier
MULT starts changing states. At the timing T3, the contents of the
source register ER1 are placed on the internal bus db, and the
contents of the destination register ER2 are placed on the internal
bus gb. The contents are transferred to the multiplier MULT. Over
the timings T3 to T5, a control signal mul_long is activated in
order to signify multiplication of 32 bits by 32 bits. The
multiplier MULT is controlled based on the control signal mul_long.
At the timing T7, the result of multiplication is placed on the
internal bus wb2 according to a control signal mul_rd. Moreover, at
the timing T8, the data on the internal bus wb2 is stored in the
general-purpose register ER2 that is the destination register.
Moreover, the result of the multiplication is verified in order to
update a predetermined bit in the condition code register CCR.
[0251] FIG. 33 shows a second example of execution timings for a
multiplication instruction. Herein, the execution timings for
instruction MULU/UP.L #12, ER2 are illustrated. The instruction
directs the processing of acquiring the 32 high-order bits of a
product of 32 bits by 32 bits.
[0252] At the timing T2, the instruction decoder DEC receives an
instruction code (mulu), and interprets the instruction. Based on
the result of the interpretation, a Start signal start is
transmitted to the multiplier MULT at the timing T2. The
multiplier MULT starts changing states. At the timing T3,
immediate data (12) is extended with zeros and placed on the
internal bus db. The contents of the destination register ER2 are
placed on the internal bus gb, and transferred to the multiplier
MULT. Over the timings T3 to T8, a control signal mul_long is
activated in order to signify multiplication of 32 bits by 32 bits.
A control signal upper that signifies acquisition of high-order
bits is activated. The multiplier MULT is thus controlled.
[0253] At the timing T8, the result of the multiplication is placed
on the internal bus wb2 according to a control signal mul_rd. At
the timing T9, the data on the internal bus wb2 is stored in the
general-purpose register ER2 that is a destination register.
Moreover, the result of the multiplication is verified in order to
update a predetermined bit in the condition code register CCR.
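The two multiplication results can be stated arithmetically. The following sketch treats both operands as unsigned 32-bit values, which is an assumption based on the zero extension of the immediate data; the function names are hypothetical and the multi-state operation of the multiplier is not modeled.

```python
MASK32 = 0xFFFFFFFF

def mulu_l(source, destination):
    """Low-order 32 bits of the 64-bit product (as for MULU.L)."""
    return (source * destination) & MASK32

def mulu_up_l(source, destination):
    """High-order 32 bits of the 64-bit product (as for MULU/UP.L)."""
    return ((source * destination) >> 32) & MASK32

low = mulu_l(0xFFFFFFFF, 2)
high = mulu_up_l(0xFFFFFFFF, 2)
```

Together the two results cover the full 64-bit product of the same pair of operands.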
[0254] FIG. 34 presents a configuration for producing the control
bit bmode16. The control bit bmode16 is produced by a flip-flop
FFbc, and initialized in a reset state (attained responsively to a
signal rstb=0 whose trailing b signifies that the signal is the
inverted form of a reset signal rst). The AND (iowr&bs_rsel)
of a Write signal iowr that directs writing of an internal I/O
register and a selection signal bs_rsel that designates an internal
I/O register containing the control bit bmode16 is adopted as an
enabling signal. A bit at a predetermined bit position on the data
bus IDB, for example, bit at bit position 0 is received. The
flip-flop FFbc is incorporated in the bus controller 3. The
flip-flop FF shown in FIG. 1 receives an output of the flip-flop
FFbc.
[0255] FIG. 35 presents another configuration of the flip-flop FFbc
for producing the control bit bmode16. An enabling signal shall be
provided as the OR ((iowr&bs_rsel)|clrFIFO) of the above AND
and a control signal clrFIFO (equivalent to pfb) that directs
execution of a branch. When the control signal clrFIFO is active,
input data (d) is a signal sel_b16 indicating that an address the
bus controller has detected by referencing the internal address bus
IAB lies in a 16-bit bus area. Consequently, every time a branch takes place,
the bus width of a branch target can be verified and bus modes can
be switched automatically. Incidentally, a signal signifying a
branch may be the signal clrFIFO or a dedicated signal that is
newly produced.
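The enabling and data-selection logic of the flip-flop FFbc in FIG. 35 can be sketched as a clocked model. The class and method names are hypothetical, the argument sel_16bit_bus stands for the bus-width signal supplied by the bus controller, and the reset value of 0 (32-bit instruction reading) is an assumption.

```python
class BusModeFlipFlop:
    """Toy model of the flip-flop FFbc that produces the control bit bmode16."""

    def __init__(self):
        self.bmode16 = 0              # assumed reset value: 32-bit instruction reading

    def reset(self):
        self.bmode16 = 0

    def clock(self, iowr, bs_rsel, idb_bit0, clr_fifo=0, sel_16bit_bus=0):
        """One clock edge: enable = (iowr & bs_rsel) | clrFIFO."""
        enable = (iowr & bs_rsel) | clr_fifo
        if enable:
            # On a branch the detected bus width is latched;
            # otherwise bit 0 of the data bus IDB is written.
            self.bmode16 = sel_16bit_bus if clr_fifo else idb_bit0
        return self.bmode16

ff = BusModeFlipFlop()
after_register_write = ff.clock(iowr=1, bs_rsel=1, idb_bit0=1)
after_unselected_write = ff.clock(iowr=1, bs_rsel=0, idb_bit0=0)
after_branch_to_32bit_area = ff.clock(0, 0, 0, clr_fifo=1, sel_16bit_bus=0)
```

A write to an unselected register leaves the bit unchanged, whereas a branch overwrites it with the bus width of the branch target regardless of the register write signals.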
[0256] FIG. 36 schematically shows a development environment for
development of a program or the like to be run in the CPU 2. A
developing engineer uses various editors to produce a program
written in C or an assembly language. The program is normally
produced by assigning jobs to a plurality of modules. A C compiler
30 receives source programs that are written in C and produced by
the developing engineer, and transmits a source program written in
the assembly language and an object module. An assembler 31
receives the source program written in the assembly language, and
transmits an object module. A linkage editor 32 receives the
plurality of object modules produced by the C compiler and
assembler, resolves the external references to the modules and the
relative addresses thereof, and integrates the modules into one
program. The linkage editor 32 then transmits a load module. The
load module is transferred to a simulator/debugger 33.
Consequently, the actions of a CPU are simulated in a system
development device such as a personal computer, and the results of
the simulation are displayed in order to analyze or evaluate a
program. Moreover, the load module may be transferred to an
emulator 34, and so-called in-circuit emulation is performed in
order to realize the actions of a microcomputer to be performed in
an actual applied system or the like. Thus, the actual actions of
the microcomputer can be analyzed or evaluated. Furthermore, the
load module may be transferred to a PROM writer 35 so that a
produced program can be stored in a flash memory adopted as the
built-in ROM of the microcomputer or in an external flash memory.
If necessary, the program may be converted into a desired format
using an object converter. Alternatively, general-purpose subroutines
may be provided as a library by means of a librarian.
[0257] A delay branch instruction may be utilized as part of the
optimization performed by the C compiler 30 or linkage editor 32,
or as an option. A program is analyzed in order to find an
instruction that can be exchanged in position with the branch
instruction. A multi-word instruction or an instruction requiring a
plurality of execution states can also be utilized effectively as
long as it can be allocated to a delay slot period.
[0258] According to the aforesaid embodiments, the operation and
advantages described below are provided.
[0259] (1) [Bus Mode (FIG. 4)]
[0260] Instruction reading in units of 16 bits and instruction
reading in units of 32 bits can be switched depending on the
settings of internal I/O registers. When the width of a data bus
contained in an external bus is smaller than the width of a bus on
which a built-in ROM is connected, a throughput can be improved
according to a way of using a microcomputer, that is, whether a
program is stored in a built-in ROM alone or an external ROM alone
or whether the built-in ROM or RAM and external ROM are changed for
use. The initial value of the setting designates instruction reading
in units of 32 bits. Consequently, after the microcomputer is reset,
whether a program is run over a 32-bit bus or a 16-bit bus, the
processing speed attained with the initial value will not be lowered.
[0261] Data reading or writing is performed in order to read or
write a given number of bits of data. This obviates the necessity
of logic for selection. When programs are stored in an external ROM
and data is stored in a built-in RAM, a high throughput is
achieved.
[0262] (2) [Bus Mode (FIG. 1)]
[0263] When an inter-module control signal bmode16 is transferred
from the bus controller 3 to the CPU 2, the control signal is
routed to a flip-flop. Thus, a delay a bus command BCMD undergoes
can be inhibited.
[0264] The increment value by which the incrementer INC increments
data is controlled based on a bus mode. Therefore,
an addressing mode in which an instruction is addressed indirectly
via a post-increment register can be used in common. Consequently,
resources can be utilized effectively and an increase in a logical
scale can be suppressed.
[0265] A branch instruction may direct reading of part of an
instruction code representing the branch instruction itself. In
this case, when instruction reading is performed in units of 16
bits, reading (pre-fetching) of an instruction that is not utilized
can be suppressed. Moreover, an increase in the number of execution
states deriving from insertion of a wait state can be avoided.
[0266] (3) [Delay Branch (FIG. 28 and FIG. 29)]
[0267] When a delay branch takes place, after a branch address is
read, a value produced by incrementing the branch address is held
in the temporary register TR. Thus, even when an instruction code
representing a delay slot instruction is long, instruction reading
can be performed continuously. Moreover, when the delay slot
instruction enters an execution state in which the instruction code
representing the delay slot instruction is read or an execution
state in which an effective address thereof is calculated,
instruction reading is disabled. Thus, reading (pre-fetching) of an
instruction that is not utilized can be suppressed. Moreover, an
increase in the number of execution states deriving from insertion
of a wait state can be avoided. The FIFO buffer FB is not cleared
until the EA extension field of the delay slot instruction or an
output of an instruction buffer is no longer needed. Consequently,
execution of the delay slot instruction can be achieved in a
plurality of execution states. Even without a Harvard bus or the like,
the present invention can be applied to various instruction sets or
hardware configurations.
[0268] (4) [Delay Branch (FIG. 6, FIG. 26 to FIG. 29)]
[0269] Instruction codes representing a delay branch and a normal
branch are switched based on the least-significant bit of an
instruction code. Thus, an effective address can be calculated in a
common manner.
[0270] (5) [A Plurality of Bit Shifts (FIG. 14)]
[0271] Owing to the employment of the shifter 21 capable of
performing shifts of 1, 2, 4, 8, and 16 bit positions, a shift
whose occurrence frequency is high can be performed in one
execution state. When the shifter 21 is used in combination with
another (connected in series with another or used repeatedly), a
shift of any bit positions can be achieved. Moreover, an increase
in a logical scale can be suppressed.
[0272] The selectors included in the shifters select any of five or
seven options including no shift and a right or left shift of a
certain number of bit positions. The selectors can be realized
easily through logical combination. A selector that selects any of
multiple options often includes many stages of logical gates due to
the logical combination. Therefore, even when the first shifter 27
and second shifter 28 are connected in series with each other, no
critical drawback will occur.
[0273] (6) [A Plurality of Bit Shifts (FIG. 22)]
[0274] A transition to the next instruction and control of the next
bit position to be shifted are performed based on a control signal
sfcnt. This leads to a simplified logical description and improved
use efficiency of resources.
[0275] (7) [Multiplier (FIG. 18)]
[0276] A multiplication instruction that directs acquisition of the
32 high-order bits of a product of 32 bits by 32 bits is used for
division. This contributes to improvement of efficiency in
division.
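One common way in which a high-order product serves division is multiplication by a precomputed reciprocal. The sketch below divides an unsigned 32-bit value by 3 using the well-known constant 0xAAAAAAAB followed by a one-bit right shift; the choice of divisor, the constant, and the function names are illustrative assumptions and are not taken from the description.

```python
MASK32 = 0xFFFFFFFF
RECIPROCAL_OF_3 = 0xAAAAAAAB          # (2**33 + 1) // 3

def high_product(a, b):
    """High-order 32 bits of an unsigned 32-bit by 32-bit product."""
    return ((a * b) >> 32) & MASK32

def divide_by_3(value):
    """Unsigned division by 3 using one high-order multiply and one shift."""
    return high_product(value, RECIPROCAL_OF_3) >> 1

quotient = divide_by_3(100)
```

The multiply replaces an iterative division sequence, which is where an efficiency gain of the kind mentioned above would come from.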
[0277] The invention of the present inventor is not limited to the
aforesaid embodiment but can be modified in various manners without
a departure from the gist of the invention.
[0278] For brevity's sake, a word is adopted as the length of an
instruction code representing an instruction that directs a
plurality of bit shifts or a multiplication instruction that
directs acquisition of high-order bits. Alternatively, an
instruction may have a length of a plurality of words. A sequence
of actions or the timings thereof should merely be modified
accordingly. In short, a prefix code should merely be appended to
the start of a word, but the sequence of actions or the timings
thereof may not be changed. An instruction coding form, instruction
set, and a combination thereof, that is, a kind of instructions, an
addressing mode, and a combination thereof may be designated
arbitrarily. An instruction format can be changed arbitrarily.
[0279] The number of bits to be set in a program counter, the size
of an address space, the number of areas into which the address
space is divided, and any other structure of a control register can
be determined arbitrarily.
[0280] Instruction reading in units of 16 bits and instruction
reading in units of 32 bits are switched based on the settings of
internal I/O registers. Alternatively, mode control signals (MD0
and MD1) that are mode signals may be used to designate either of
the instruction reading in units of 16 bits and instruction reading
in units of 32 bits. An initial value indicating either of the
instruction reading in units of 16 bits and instruction reading in
units of 32 bits may be determined arbitrarily.
[0281] The storage capacity of the FIFO buffer serving as the
instruction buffer is not limited to eight words and may be any
capacity of two words or more. If the storage capacity is large,
then even when one instruction has no omissible execution state, the
omissible execution state of a subsequent instruction whose
instruction code contains a large number of bits can still be
omitted. The total number of instruction bits handled is thereby
balanced. Even if the storage capacity is increased, however,
instructions that have already been read may be wasted when a branch
instruction is executed. Normally or in a steady state, therefore,
the number of instruction bits held in the FIFO buffer should not be
very large.
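The trade-off between buffer capacity and words wasted on a branch can be illustrated with a minimal sketch. The class InstructionFifo and its method names are hypothetical, the buffer is modeled in words only, and the model assumes that everything still held in the buffer is discarded when a branch is taken.

```python
from collections import deque


class InstructionFifo:
    """Hypothetical word-granular model of the instruction FIFO."""

    def __init__(self, capacity_words=8):
        if capacity_words < 2:
            raise ValueError("capacity must be two words or more")
        self.capacity = capacity_words
        self.words = deque()
        self.wasted = 0  # words read ahead but never executed

    def prefetch(self, word):
        """Store one prefetched word; return False when the buffer is full."""
        if len(self.words) >= self.capacity:
            return False
        self.words.append(word)
        return True

    def consume(self):
        """Hand the oldest word to the decoder."""
        return self.words.popleft()

    def branch(self):
        """Discard all buffered words and count them as wasted reads."""
        discarded = len(self.words)
        self.wasted += discarded
        self.words.clear()
        return discarded
```

In this model a larger capacity lets more words be read ahead, but the same capacity bounds how many reads a single branch can waste.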
[0282] The shifter 21 is not limited to the configuration having
two stages of shifters but may include three or more stages of
shifters. The number of bit positions to be shifted by each shifter
can be set to any value. The number of bit positions to be shifted
may be specified in immediate data contained in an instruction code
or may be designated based on the contents of a general-purpose
register.
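A shifter built from several stages can be sketched as below. This is only an illustration: the function name staged_shift_left, the default stage units of 4 and 1 bit positions, and the restriction to 32-bit left shifts are assumptions, not the configuration of the shifter 21.

```python
MASK32 = 0xFFFFFFFF  # result is truncated to 32 bits


def staged_shift_left(value, amount, stage_units=(4, 1)):
    """Shift a 32-bit value left by `amount` using cascaded stages.

    Each stage shifts by a multiple of its unit. The last unit is
    required to be 1 so that every amount can be composed.
    """
    if not 0 <= amount < 32:
        raise ValueError("amount must be in the range 0..31")
    if stage_units[-1] != 1:
        raise ValueError("last stage unit must be 1")
    result = value & MASK32
    remaining = amount
    for unit in stage_units:
        # This stage takes as many whole units as fit in the remainder.
        steps, remaining = divmod(remaining, unit)
        result = (result << (steps * unit)) & MASK32
    return result
```

Passing a three-element stage_units tuple, such as (8, 2, 1), models a configuration with three stages; the amount itself could come from immediate data or from a register value.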
[0283] The detailed configuration of a multiplier, the way of
receiving control signals, and the order of changing states may be
modified. Moreover, the width of an external bus may be fixed to a
value smaller than the width of an internal bus, or may be selected
from a range of widths smaller than the width of the internal bus.
In the latter case, the selection is controlled by settings that the
CPU 2 makes in the bus controller 3.
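When the external bus is narrower than the requested unit, the bus controller has to access the bus more than once per request. A minimal sketch of that count follows; the function name bus_cycles and the permitted widths of 8, 16, and 32 bits are assumptions made for the illustration.

```python
def bus_cycles(request_bits, bus_width_bits):
    """Return how many bus accesses are needed to transfer one request
    of `request_bits` over a bus `bus_width_bits` wide."""
    if bus_width_bits not in (8, 16, 32):
        raise ValueError("unsupported bus width in this sketch")
    # Ceiling division: a partial final access still costs one access.
    return -(-request_bits // bus_width_bits)
```

For example, under these assumptions a 32-bit instruction read over an 8-bit external bus takes four accesses, during which the CPU would wait for the complete instruction.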
[0284] The invention made by the present inventors has been
described on the assumption that it is implemented in a single-chip
microcomputer, which belongs to the field of utilization forming the
background of the invention. The present invention is not limited to
the single-chip microcomputer but may be implemented in various
other devices referred to as data processors, microprocessors,
system LSIs, or the like. The present invention can be implemented
at least in a system that interprets an instruction, processes the
instruction, and performs operations.
[0285] The advantages provided by the typical constituent features
of the present invention disclosed in the present application will
be described below.
[0286] Namely, from the viewpoint of hardware, fast processing can
be achieved with the operating frequency unchanged. In other words,
the number of execution states required to perform predetermined
processing can be reduced, and fast data processing can be
achieved. This contributes to improved use efficiency of resources
in a data processor or in a data processing system to which the data
processor is applied. Consequently, the user-friendliness of the
data processor can be improved, and the program size can be
reduced.
* * * * *