U.S. patent application number 11/461340 was filed with the patent office on 2006-07-31 and published on 2007-11-29 for instruction folding for a stack-based machine.
This patent application is currently assigned to SUN MICROSYSTEMS, INC. Invention is credited to James Michael O'Connor, Marc Tremblay.
Application Number | 20070277021 11/461340 |
Family ID | 27359254 |
Filed Date | 2006-07-31 |
United States Patent Application | 20070277021 |
Kind Code | A1 |
O'Connor; James Michael; et al. | November 29, 2007 |
INSTRUCTION FOLDING FOR A STACK-BASED MACHINE
Abstract
An instruction decoder allows the folding away of JAVA virtual
machine instructions pushing an operand onto the top of a stack
merely as a precursor to a second JAVA virtual machine instruction
which operates on the top of stack operand. Such an instruction
decoder identifies foldable instruction sequences and supplies an
execution unit with a single equivalent folded operation thereby
reducing processing cycles otherwise required for execution of
multiple operations corresponding to the multiple instructions of
the folded instruction sequence. Instruction decoder embodiments
described herein provide for folding of two, three, four, or more
instructions. For example, in one instruction decoder
embodiment described herein, two load instructions and a store
instruction can be folded into execution of an operation corresponding
to an instruction appearing therebetween in the instruction
sequence.
Inventors: | O'Connor; James Michael (Union City, CA); Tremblay; Marc (Menlo Park, CA) |
Correspondence
Address: |
GUNNISON MCKAY & HODGSON, LLP
1900 GARDEN ROAD
SUITE 220
MONTEREY
CA
93940
US
|
Assignee: |
SUN MICROSYSTEMS, INC.
4150 Network Circle
Santa Clara
CA
|
Family ID: |
27359254 |
Appl. No.: |
11/461340 |
Filed: |
July 31, 2006 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11096183 | Mar 30, 2005 | |
11461340 | Jul 31, 2006 | |
10346886 | Jan 17, 2003 | 6950923 |
11096183 | Mar 30, 2005 | |
08787617 | Jan 23, 1997 | 6532531 |
10346886 | Jan 17, 2003 | |
08642253 | May 2, 1996 | |
08787617 | Jan 23, 1997 | |
08647103 | May 7, 1996 | |
08787617 | Jan 23, 1997 | |
60010527 | Jan 24, 1996 | |
Current U.S. Class: | 712/208; 712/E9.023; 712/E9.028; 712/E9.037; 712/E9.055 |
Current CPC Class: | G06F 9/30174 20130101; G06F 9/30021 20130101; G06F 9/30134 20130101; G06F 2212/451 20130101; G06F 9/30145 20130101; G06F 12/0875 20130101; G06F 15/7846 20130101; G06F 9/264 20130101; G06F 9/345 20130101; G06F 9/45504 20130101; G06F 9/30167 20130101; G06F 9/30181 20130101; G06F 9/449 20180201; G06F 9/3802 20130101; G06F 9/3017 20130101; G06F 9/45516 20130101 |
Class at Publication: | 712/208 |
International Class: | G06F 9/30 20060101 G06F009/30 |
Claims
1-56. (canceled)
57. An apparatus comprising: an execution unit operable to
manipulate data residing in register storage; and an instruction
decoder operable to decode plural successive stack-based
instructions and to cause the execution unit to perform a single,
register-based operation that corresponds to the plural decoded
stack-based instructions.
58. The apparatus of claim 57, wherein the single, register-based
operation explicitly identifies a result location in the register
storage and at least two source locations in the register
storage.
59. The apparatus of claim 57, wherein the instruction decoder is
further operable to select plural entries of the register storage,
the selected register storage entries corresponding to a subset of
stack and local variable storage locations explicitly and
implicitly targeted by the plural decoded stack-based
instructions.
60. The apparatus of claim 57, further comprising: stack and local
variable portions of the register storage, wherein at least one of
the plural decoded stack-based instructions defines an information
transfer from a local variable to a top position of the stack.
61. The apparatus of claim 57, wherein the instruction decoder is
part of an instruction folding just-in-time (JIT) compiler.
62. The apparatus of claim 57, wherein the instruction decoder is
part of an instruction folding bytecode interpreter that implements
a virtual machine.
63. The apparatus of claim 57, wherein the instruction decoder
includes instruction folding hardware.
64. The apparatus of claim 57, further comprising: a just-in-time
(JIT) compiler executable to produce object code native to the
execution unit.
65. A method comprising: fetching a sequence of stack-based
instructions; decoding plural successive stack-based instructions;
and causing an execution unit to perform a single, register-based
operation that corresponds to the plural decoded stack-based
instructions.
66. The method of claim 65, wherein the decoding is performed in a
just-in-time (JIT) compiler to produce object code native to a
processor that includes the execution unit.
67. The method of claim 65, wherein the decoding is performed using
an instruction folding just-in-time (JIT) compiler.
68. The method of claim 65, wherein the decoding is performed
using an instruction folding bytecode interpreter.
69. The method of claim 65, wherein the decoding is performed using
instruction folding hardware.
70. The method of claim 65, further comprising: selecting plural
entries of register storage, the selected register storage entries
corresponding to a subset of stack and local variable storage
locations explicitly and implicitly targeted by the plural decoded
stack-based instructions.
71. A computational system comprising: an instruction source;
register storage; a processor; and means for receiving stack-based
instructions from the instruction source and for causing the
processor to operate upon data residing in the register storage
using explicit identifiers for source and result locations in the
register storage, the means translating plural successive ones of
the stack-based instructions into a single corresponding
register-based operation.
72. The computational system of claim 71, wherein the means for
translating includes means for compiling the stack-based
instructions to produce object code native to the processor.
73. The computational system of claim 71, wherein the means for
translating includes means for folding instructions using a
just-in-time (JIT) compiler.
74. The computational system of claim 71, wherein the means for
translating includes means for folding instructions using a
bytecode interpreter.
75. The computational system of claim 71, wherein the means for
translating includes means for folding instructions using a
hardware decoder.
Description
RELATED APPLICATIONS
[0001] This application is a continuation of application Ser. No.
11/096,183, filed Mar. 30, 2005, which is itself a continuation of
application Ser. No. 10/346,886, filed Jan. 17, 2003, now U.S. Pat.
No. 6,950,923, which is in turn a continuation of application Ser.
No. 08/787,617, filed Jan. 23, 1997, now U.S. Pat. No. 6,532,531.
Application Ser. No. 08/787,617 is itself a continuation-in-part of
application Ser. No. 08/647,103, filed May 7, 1996 and a
continuation-in-part of application Ser. No. 08/642,253, filed May
2, 1996, both now abandoned. Application Ser. Nos. 08/787,617,
08/647,103 and 08/642,253 each claim benefit of U.S. Provisional
Application No. 60/010,527, filed Jan. 24, 1996.
[0002] The present application, as well as each of successive
priority application Ser. Nos. 11/096,183, 10/346,886 and
08/787,617, incorporate by reference the entirety of application
Ser. No. 08/786,351, filed Jan. 23, 1997, now U.S. Pat. No.
6,026,485.
REFERENCE SECTION I
[0003] A portion of the disclosure of this patent document
including Section I, The JAVA Virtual Machine Specification and
Section A thereto, contains material which is subject to copyright
protection. The copyright owner has no objection to the facsimile
reproduction by anyone of the patent document or the patent
disclosure, as it appears in the U.S. Patent and Trademark Office
patent files or records, but otherwise reserves all copyright
rights whatsoever.
BACKGROUND OF THE INVENTION
[0004] 1. Field of the Invention
[0005] The present invention relates to instruction decoders for a
stack machine, and in particular, to methods and apparatus for
folding a sequence of multiple instructions into a single folded
operation.
[0006] 2. Discussion of Related Art
[0007] Many individuals and organizations in the computer and
communications industries tout the Internet as the fastest growing
market on the planet. In the 1990s, the number of users of the
Internet appears to be growing exponentially with no end in sight.
In June of 1995, an estimated 6,642,000 hosts were connected to the
Internet; this represented an increase from an estimated 4,852,000
hosts in January, 1995. The number of hosts appears to be growing
at around 75% per year. Among the hosts, there were approximately
120,000 networks and over 27,000 web servers. The number of web
servers appears to be approximately doubling every 53 days.
[0008] In July 1995, with over 1,000,000 active Internet users,
over 12,505 usenet news groups, and over 10,000,000 usenet readers,
the Internet appears to be destined to explode into a very large
market for a wide variety of information and multimedia
services.
[0009] In addition to the public carrier network or Internet, many
corporations and other businesses are shifting their internal
information systems onto an intranet as a way of more effectively
sharing information within a corporate or private network. The
basic infrastructure for an intranet is an internal network
connecting servers and desktops, which may or may not be connected
to the Internet through a firewall. These intranets provide
services to desktops via standard open network protocols which are
well established in the industry. Intranets provide many benefits
to the enterprises which employ them, such as simplified internal
information management and improved internal communication using
the browser paradigm. Integrating Internet technologies with a
company's enterprise infrastructure and legacy systems also
leverages existing technology investment for the party employing an
intranet. As discussed above, intranets and the Internet are
closely related, with intranets being used for internal and secure
communications within the business and the Internet being used for
external transactions between the business and the outside world.
For the purposes of this document, the term "networks" includes
both the Internet and intranets. However, the distinction between
the Internet and an intranet should be borne in mind where
applicable.
[0010] In 1990, programmers at Sun Microsystems wrote a universal
programming language. This language was eventually named the JAVA
programming language. (JAVA is a trademark of Sun Microsystems of
Mountain View, CA.) The JAVA programming language resulted from
programming efforts which initially were intended to be coded in
the C++ programming language; therefore, the JAVA programming
language has many commonalities with the C++ programming language.
However, the JAVA programming language is a simple,
object-oriented, distributed, interpreted yet high performance,
robust yet safe, secure, dynamic, architecture neutral, portable,
and multi-threaded language.
[0011] The JAVA programming language has emerged as the programming
language of choice for the Internet as many large hardware and
software companies have licensed it from Sun Microsystems. The JAVA
programming language and environment is designed to solve a number
of problems in modern programming practice. The JAVA programming
language omits many rarely used, poorly understood, and confusing
features of the C++ programming language. These omitted features
primarily consist of operator overloading, multiple inheritance,
and extensive automatic coercions. The JAVA programming language
includes automatic garbage collection that simplifies the task of
programming because it is no longer necessary to allocate and free
memory as in the C programming language. The JAVA programming
language restricts the use of pointers as defined in the C
programming language, and instead has true arrays in which array
bounds are explicitly checked, thereby eliminating vulnerability to
many viruses and nasty bugs. The JAVA programming language includes
Objective-C interfaces and specific exception handlers.
[0012] The JAVA programming language has an extensive library of
routines for coping easily with TCP/IP protocol (Transmission
Control Protocol based on Internet protocol), HTTP (Hypertext
Transfer Protocol) and FTP (File Transfer Protocol). The JAVA
programming language is intended to be used in
networked/distributed environments. The JAVA programming language
enables the construction of virus-free, tamper-free systems. The
authentication techniques are based on public-key encryption.
[0013] Many computing systems, including those implementing the
JAVA virtual machine, can execute multiple methods each of which
has a method frame. Typically, method invocation significantly
impacts the performance of the computing system due to the
excessive number of memory accesses method invocation requires.
Therefore, a method and memory architecture targeted to reduce the
latency caused by method invocation is desirable.
SUMMARY OF THE INVENTION
[0014] A JAVA virtual machine is a stack-oriented abstract
computing machine, which like a physical computing machine has an
instruction set and uses various storage areas. A JAVA virtual
machine need not understand the JAVA programming language; instead
it understands a class file format. A class file includes JAVA
virtual machine instructions (or bytecodes) and a symbol table, as
well as other ancillary information. Programs written in the JAVA
programming language (or in other languages) may be compiled to
produce a sequence of JAVA virtual machine instructions.
[0015] In a stack-oriented machine, instructions
typically operate on data at the top of an operand stack. One or
more first instructions, such as a load from local variable
instruction, are executed to push operand data onto the operand
stack as a precursor to execution of an instruction which
immediately follows such instruction(s). The instruction which
follows, e.g., an add operation, pops operand data from the top of
the stack, operates on the operand data, and pushes a result onto
the operand stack, replacing the operand data at the top of the
operand stack.
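The unfolded behavior described in paragraph [0015] can be sketched as a small interpreter loop. This is only an illustrative sketch: the function name run_unfolded, the (opcode, argument) tuple encoding, and the restriction to three opcodes are assumptions made for the example and do not come from the specification.

```python
# Illustrative sketch of unfolded stack execution (all names are hypothetical).
def run_unfolded(program, local_vars):
    """Execute (opcode, arg) pairs one at a time.

    Returns the number of operations executed and the final operand stack.
    """
    stack = []
    executed = 0
    for opcode, arg in program:
        if opcode == "iload":       # push a local variable onto the operand stack
            stack.append(local_vars[arg])
        elif opcode == "iadd":      # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "istore":    # pop the top of stack into a local variable
            local_vars[arg] = stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        executed += 1
    return executed, stack


local_vars = [3, 4, 0]
count, stack = run_unfolded([("iload", 0), ("iload", 1), ("iadd", None)], local_vars)
print(count, stack)
```

In this sketch each load is a separate operation whose only purpose is to stage an operand for the add that follows.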
[0016] A suitably configured instruction decoder allows the folding
away of instructions pushing an operand onto the top of a stack
merely as a precursor to a second instruction which operates on the
top of stack operand. The instruction decoder identifies foldable
instruction sequences (typically 2, 3, or 4 instructions) and
supplies an execution unit with an equivalent folded operation
(typically a single operation) thereby reducing processing cycles
otherwise required for execution of multiple operations
corresponding to the multiple instructions of the folded
instruction sequence. Using an instruction decoder in accordance
with the present invention, multiple load instructions and a store
instruction can be folded into execution of an instruction
appearing therebetween in the instruction sequence. For example, an
instruction sequence including a pair of load instructions (for
loading integer operands from local variables to the top of stack),
an add instruction (for popping the integer operands off the stack,
adding them, and placing the result at the top of stack), and a
store instruction (for popping the result from the stack and
storing the result in a local variable) can be folded into a single
equivalent operation specifying source and destination addresses in
stack and local variable storage which are randomly accessible.
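The load/load/add/store example of paragraph [0016] might be modeled as follows. The FoldedOp record, the function names, and the tuple encoding of instructions are hypothetical conveniences for this sketch; the specification describes a decoder, not this code.

```python
# Illustrative sketch: fold iload, iload, iadd, istore into one operation
# that names its sources and destination explicitly (names are hypothetical).
from typing import NamedTuple


class FoldedOp(NamedTuple):
    op: str     # operation to perform
    src1: int   # local variable index of the first source operand
    src2: int   # local variable index of the second source operand
    dest: int   # local variable index of the result


def fold_load_load_add_store(seq):
    """Return a FoldedOp if seq is iload, iload, iadd, istore; otherwise None."""
    if len(seq) != 4:
        return None
    (o1, a1), (o2, a2), (o3, _), (o4, a4) = seq
    if (o1, o2, o3, o4) != ("iload", "iload", "iadd", "istore"):
        return None
    return FoldedOp("add", a1, a2, a4)


def execute_folded(folded, local_vars):
    """Perform the folded operation directly on randomly accessible storage."""
    if folded.op != "add":
        raise ValueError(f"unsupported folded operation: {folded.op}")
    local_vars[folded.dest] = local_vars[folded.src1] + local_vars[folded.src2]


local_vars = [3, 4, 0]
folded = fold_load_load_add_store(
    [("iload", 0), ("iload", 1), ("iadd", None), ("istore", 2)]
)
execute_folded(folded, local_vars)
print(folded, local_vars)
```

The single folded operation never touches an operand stack in this sketch, which is the point of the random-access requirement stated above.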
[0017] In accordance with an embodiment of the present invention,
an apparatus includes an instruction store, an operand stack, a
data store, an execution unit, and an instruction decoder. The
instruction decoder is coupled to the instruction store to identify
a foldable sequence of instructions represented therein. The
foldable sequence includes first and second instructions, in which
the first instruction is for pushing a first operand value onto the
operand stack from the data store merely as a first source operand
for a second instruction. The instruction decoder is coupled to supply
the execution unit with a single folded operation equivalent to the
foldable sequence and including a first operand address identifier
selective for the first operand value in the data store, thereby
obviating an explicit operation corresponding to the first
instruction.
[0018] In a further embodiment, if the sequence of instructions
represented in the instruction buffer is not a foldable sequence,
the instruction decoder supplies the execution unit with an
operation identifier and operand address identifier corresponding
to the first instruction only.
[0019] In another further embodiment, the instruction decoder
further identifies a third instruction in the foldable sequence.
This third instruction is for pushing a second operand value onto
the operand stack from the data store merely as a second source
operand for the second instruction. The single folded operation is
equivalent to the foldable sequence and includes a second operand
address identifier selective for the second operand value in the
data store, thereby obviating an explicit operation corresponding
to the third instruction.
[0020] In yet another further embodiment, the instruction decoder
further identifies a fourth instruction in the foldable sequence.
This fourth instruction is for popping a result of the second
instruction from the operand stack and storing the result in a
result location of the data store. The single folded operation is
equivalent to the foldable sequence and includes a destination
address identifier selective for the result location in the data
store, thereby obviating an explicit operation corresponding to the
fourth instruction.
[0021] In still yet another further embodiment, the instruction
decoder includes normal and folded decode paths and switching
means. The switching means are responsive to the folded decode path
for selecting operation, operand, and destination identifiers from
the folded decode path in response to a fold indication therefrom,
and for otherwise selecting operation, operand, and destination
identifiers from the normal decode path.
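The selection between the normal and folded decode paths in paragraph [0021] can be pictured with a short software analogy. The function names, the dictionary layout, and the two-instruction fold pattern used here are assumptions for illustration only; the described embodiment is decode logic, not Python.

```python
# Illustrative sketch of normal/folded decode paths and the selection between
# them (all names and the instruction encoding are hypothetical).
LOADS = {"iload"}
ALU_OPS = {"iadd": "add", "isub": "sub"}


def normal_decode(window):
    """Normal path: decode only the first instruction in the window."""
    opcode, arg = window[0]
    return {"op": opcode, "operand": arg, "consumed": 1}


def fold_decode(window):
    """Folded path: recognize a load immediately followed by an ALU instruction.

    Returns (fold_indication, decoded), where decoded is None when no fold applies.
    """
    if len(window) >= 2 and window[0][0] in LOADS and window[1][0] in ALU_OPS:
        decoded = {
            "op": ALU_OPS[window[1][0]],
            "operand": window[0][1],
            "consumed": 2,
        }
        return True, decoded
    return False, None


def decode(window):
    """Switching step: take the folded result when the fold indication is set."""
    fold_indication, folded = fold_decode(window)
    return folded if fold_indication else normal_decode(window)


print(decode([("iload", 3), ("iadd", None)]))
print(decode([("iload", 3), ("istore", 4)]))
```

The fold indication alone drives the selection, so the normal path needs no knowledge of which patterns are foldable.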
[0022] In various further alternative embodiments, the apparatus is
for a virtual machine instruction processor wherein instructions
generally source operands from, and target a result to, uppermost
entries of an operand stack. In one such alternative embodiment,
the virtual machine instruction processor is a hardware virtual
machine instruction processor and the instruction decoder includes
decode logic. In another, the virtual machine instruction processor
includes a just-in-time compiler implementation and the instruction
decoder includes software executable on a hardware processor. The
hardware processor includes the execution unit. In yet another, the
virtual machine instruction processor includes a bytecode
interpreter implementation and the instruction decoder includes
software executable on a hardware processor. The hardware processor
includes the execution unit.
[0023] In accordance with another embodiment of the present
invention, a method includes (a) determining if a first instruction
of a virtual machine instruction sequence is an instruction for
pushing a first operand value onto the operand stack from a data
store merely as a first source operand for a second instruction;
and if the result of the (a) determining is affirmative, supplying
an execution unit with a single folded operation equivalent to a
foldable sequence comprising the first and second instructions. The
single folded operation includes a first operand identifier
selective for the first operand value, thereby obviating an
explicit operation corresponding to the first instruction.
[0024] In a further embodiment, the method includes supplying, if
the result of the (a) determining is negative, the execution unit
with an operation equivalent to the first instruction in the
virtual machine instruction sequence.
[0025] In another further embodiment, the method includes (b)
determining if a third instruction of the virtual machine
instruction sequence is an instruction for popping a result value
of the second instruction from the operand stack and storing the
result value in a result location of the data store and, if the
result of the (b) determining is affirmative, further including a
result identifier selective for the result location with the
equivalent single folded operation, thereby further obviating an
explicit operation corresponding to the third instruction. In a
further embodiment, if the result of the (b) determining is
negative, the method instead includes, with the equivalent single
folded operation, a result identifier selective for a top location
of the operand stack. In certain embodiments, the (a) determining and
the (b) determining are performed substantially in parallel.
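The method of paragraphs [0023] to [0025] can be sketched as two independent checks followed by selection of a result identifier. The names decode_sequence and TOP_OF_STACK, the dictionary layout, and the use of iload/iadd/istore as the only recognized pattern are assumptions of this sketch rather than details from the specification.

```python
# Illustrative sketch of the (a) and (b) determinations (names are hypothetical).
TOP_OF_STACK = "top_of_stack"


def decode_sequence(seq):
    """Return one operation for the front of seq, folded where possible."""
    # (a) and (b) inspect different instructions, so neither check depends on
    # the other's result.
    a = len(seq) > 1 and seq[0][0] == "iload" and seq[1][0] == "iadd"
    b = len(seq) > 2 and seq[2][0] == "istore"

    if not a:
        # (a) negative: supply an operation equivalent to the first instruction.
        opcode, arg = seq[0]
        return {"op": opcode, "arg": arg, "consumed": 1}

    # (a) affirmative: the result goes to a local variable when (b) holds,
    # and to the top of the operand stack otherwise.
    dest = ("local", seq[2][1]) if b else TOP_OF_STACK
    return {
        "op": "add",
        "src": ("local", seq[0][1]),
        "dest": dest,
        "consumed": 3 if b else 2,
    }


print(decode_sequence([("iload", 5), ("iadd", None), ("istore", 7)]))
print(decode_sequence([("iload", 5), ("iadd", None), ("iload", 1)]))
```

In this sketch the second source operand of the add is implicitly the existing top-of-stack entry, so only the first source and the result need explicit identifiers.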
[0026] In accordance with yet another embodiment of the present
invention, a stack-based virtual machine implementation includes a
randomly-accessible operand stack representation, a
randomly-accessible local variable storage representation, and a
virtual machine instruction decoder for selectively decoding
virtual machine instructions and folding together a selected
sequence thereof to eliminate unnecessary temporary storage of
operands on the operand stack.
[0027] In various alternative embodiments, the stack-based virtual
machine implementation (1) is a hardware virtual machine
instruction processor including a hardware stack cache, a hardware
instruction decoder, and an execution unit or (2) includes software
encoded in a computer readable medium and executable on a hardware
processor. In the hardware virtual machine instruction processor
embodiment, (a) the randomly-accessible operand stack and local
variable storage representations at least partially reside in the
hardware stack cache, and (b) the virtual machine instruction
decoder includes the hardware instruction decoder coupled to
provide the execution unit with opcode, operand, and result
identifiers respectively selective for a hardware virtual machine
instruction processor operation and for locations in the hardware
stack cache as a single hardware virtual machine instruction
processor operation equivalent to the selected sequence of virtual
machine instructions. In the software embodiment, (a) the
randomly-accessible operand stack and local variable storage
representations at least partially reside in registers of the
hardware processor, (b) the virtual machine instruction decoder is
at least partially implemented in the software, and (c) the virtual
machine instruction decoder is coupled to provide opcode, operand,
and result identifiers respectively selective for a hardware
processor operation and for locations in the registers as a single
hardware processor operation equivalent to the selected sequence of
virtual machine instructions.
[0028] In accordance with still yet another embodiment of the
present invention, a hardware virtual machine instruction decoder
includes a normal decode path, a fold decode path, and switching
means. The fold decode path is for decoding a sequence of virtual
machine instructions and, if the sequence is foldable, supplying
(a) a single operation identifier, (b) one or more operand
identifiers, and (c) a destination identifier, which are together
equivalent to the sequence of virtual machine instructions. The
switching means is responsive to the folded decode path for
selecting operation, operand, and destination identifiers from the
folded decode path in response to a fold indication therefrom, and
otherwise selecting operation, operand, and destination identifiers
from the normal decode path.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] FIG. 1 (depicted as FIGS. 1A and 1B) is a block diagram of
one embodiment of a virtual machine hardware processor that
includes an instruction decoder for providing instruction folding
in accordance with some embodiments of the present invention.
[0030] FIG. 2 is a process flow diagram for generation of virtual
machine instructions that are used in one embodiment of this
invention.
[0031] FIG. 3 illustrates an instruction pipeline implemented in
the hardware processor of FIG. 1.
[0032] FIG. 4A is an illustration of one embodiment of the
logical organization of a stack structure where each method frame
includes a local variable storage area, an environment storage
area, and an operand stack utilized by the hardware processor of
FIG. 1.
[0033] FIG. 4B is an illustration of an alternative embodiment of
the logical organization of a stack structure where each method
frame includes a local variable storage area and an operand stack
on the stack, and an environment storage area for the method frame
is included on a separate execution environment stack.
[0034] FIG. 4C is an illustration of an alternative embodiment of
the stack management unit for the stack and execution environment
stack of FIG. 4B.
[0035] FIG. 4D is an illustration of one embodiment of the local
variables look-aside cache in the stack management unit of FIG.
1.
[0036] FIG. 5 illustrates several possible add-ons to the hardware
processor of FIG. 1.
[0037] FIG. 6 illustrates a block diagram of one embodiment of a
stack cache management unit in accordance with this invention.
[0038] FIG. 7 illustrates the memory architecture of one embodiment
of a stack cache in accordance with this invention.
[0039] FIG. 8 illustrates the contents of a register or memory
location of one embodiment of a stack cache in accordance with this
invention.
[0040] FIG. 9 illustrates a block diagram of one embodiment of a
dribble manager unit in accordance with this invention.
[0041] FIG. 10A illustrates a block diagram of another embodiment
of a dribble manager unit in accordance with this invention.
[0042] FIG. 10B illustrates a block diagram of another embodiment
of a dribble manager unit in accordance with this invention.
[0043] FIG. 11 illustrates a block diagram of a portion of an
embodiment of a dribble manager unit in accordance with this
invention.
[0044] FIG. 12 illustrates a pointer generation circuit for one
embodiment of a stack cache in accordance with this invention.
[0045] FIG. 13 is an illustration, in the context of a stack data
structure, of data flows associated with a pair of stack
instructions, wherein the first stack instruction pushes a data
item onto the top of the stack only to be consumed by the second
stack instruction which pops the top two stack entries off the
stack and pushes their sum onto the top of the stack.
[0046] FIG. 14 is a contrasting illustration of folded execution of
first and second stack instructions such as those depicted in FIG.
13, wherein the first (push data item onto the top of the stack)
operation is obviated in accordance with an exemplary embodiment of
the present invention.
[0047] FIG. 15 is a block diagram depicting relationships between
operand stack, local variable storage, and constant pool portions
of memory storage together with register variables for access
thereof in accordance with an exemplary embodiment of the present
invention.
[0048] FIGS. 16A-D illustrate an iload (integer load)/iadd (integer
add)/istore (integer store) instruction sequence operating on an
operand stack and local variable storage. FIGS. 16A, 16B, 16C, and
16D, respectively depict operand stack contents before iload
instructions, after iload instructions but before an iadd
instruction, after the iadd instruction but before an istore
instruction, and after the istore instruction. Intermediate stages
depicted in FIGS. 16B and 16C are eliminated by instruction folding
in accordance with an exemplary embodiment of the present
invention.
[0049] FIGS. 17A, 17B and 17C illustrate an aload (object reference
load)/arraylength (get array length) instruction sequence operating on
the operand stack and local variable storage. FIGS. 17A, 17B, and
17C, respectively depict operand stack contents before the aload
instruction, after the aload instruction but before an arraylength
instruction (without instruction folding), and after the
arraylength instruction. The intermediate stage depicted in FIG.
17B is eliminated by instruction folding in accordance with an
exemplary embodiment of the present invention.
[0050] FIG. 18 is a functional block diagram of a stack based
processor including an instruction decoder providing instruction
folding in accordance with an exemplary embodiment of the present
invention.
[0051] FIG. 19 is a functional block diagram depicting an
instruction decoder in accordance with an exemplary embodiment of
the present invention and coupled to supply an execution unit with
a folded operation, with operand addresses into an operand stack,
local variable storage or a constant pool, and with a destination
address into the operand stack or local variable storage, wherein
the single operation and addresses supplied are equivalent to a
sequence of unfolded instructions.
[0052] FIG. 20 is a functional block diagram of an instruction
decoder supporting instruction folding in accordance with an
exemplary embodiment of the present invention.
[0053] FIG. 21 is a functional block diagram of a fold decode
portion of an instruction decoder supporting instruction folding in
accordance with an exemplary embodiment of the present
invention.
[0054] FIG. 22 is a flow chart depicting an exemplary sequence of
operations for identifying a foldable instruction sequence in
accordance with an exemplary embodiment of the present
invention.
[0055] FIG. 23 is a functional block diagram depicting component
operand and destination address generators of a fold address
generator in accordance with an exemplary embodiment of the present
invention.
[0056] FIG. 24 is a functional block diagram depicting an exemplary
structure for an operand address generator in accordance with an
exemplary embodiment of the present invention.
[0057] These and other features and advantages of the present
invention will be apparent from the Figures, as explained in the
Detailed Description of the Invention. Like or similar features are
designated by the same reference numeral(s) throughout the drawings
and the Detailed Description of the Invention.
DETAILED DESCRIPTION OF THE INVENTION
[0058] FIG. 1 illustrates one embodiment of a virtual machine
instruction hardware processor 100, hereinafter hardware processor
100, that includes an instruction decoder 135 for folding a
sequence of multiple instructions into a single folded operation in
accordance with some embodiments of the present invention, and that
directly executes virtual machine instructions that are processor
architecture independent. In executing JAVA virtual machine
instructions, hardware processor 100 performs much better than
high-end CPUs, such as the Intel PENTIUM microprocessor or the Sun
Microsystems ULTRASPARC processor (ULTRASPARC is a trademark of
Sun Microsystems of Mountain View, Calif., and PENTIUM is a
trademark of Intel Corp. of Sunnyvale, Calif.), that interpret the
same virtual machine instructions with a software JAVA interpreter
or with a JAVA just-in-time compiler. Hardware processor 100 is
also low cost and exhibits low
power consumption. As a result, hardware processor 100 is well
suited for portable applications. Hardware processor 100 provides
similar advantages for other virtual machine stack-based
architectures as well as for virtual machines utilizing features
such as garbage collection, thread synchronization, etc.
[0059] In view of these characteristics, a system based on hardware
processor 100 presents attractive price for performance
characteristics, if not the best overall performance, as compared
with alternative virtual machine execution environments including
software interpreters and just-in-time compilers. Nonetheless, the
present invention is not limited to virtual machine hardware
processor embodiments, and encompasses any suitable stack-based, or
non-stack-based machine implementations, including implementations
emulating the JAVA virtual machine as a software interpreter,
compiling JAVA virtual machine instructions (either in batch or
just-in-time) to machine instructions native to a particular
hardware processor, or providing hardware implementing the JAVA
virtual machine in microcode, directly in silicon, or in some
combination thereof.
[0060] Regarding price for performance characteristics, hardware
processor 100 has the advantage that the 250 Kilobytes to 500
Kilobytes (Kbytes) of memory storage, e.g., read-only memory or
random access memory, typically required by a software interpreter,
is eliminated.
[0061] A simulation of hardware processor 100 showed that hardware
processor 100 executes virtual machine instructions twenty times
faster than a software interpreter running on a variety of
applications on a PENTIUM processor clocked at the same clock rate
as hardware processor 100, and executing the same virtual machine
instructions. Another simulation of hardware processor 100 showed
that hardware processor 100 executes virtual machine instructions
five times faster than a just-in-time compiler running on a PENTIUM
processor running at the same clock rate as hardware processor 100,
and executing the same virtual machine instructions.
[0062] In environments in which the expense of the memory required
for a software virtual machine instruction interpreter is
prohibitive, hardware processor 100 is advantageous. These
applications include, for example, an Internet chip for network
appliances, a cellular telephone processor, other
telecommunications integrated circuits, or other low-power,
low-cost applications such as embedded processors, and portable
devices.
[0063] The present invention increases the speed of method
invocation by using an execution environment memory 440 in
conjunction with stack 400B. The execution environments of various
method calls are stored in execution environment memory 440 while
the operands, variables and parameters of the method calls are
stored in stack 400B. Both execution environment memory 440 and
stack 400B can include a stack management unit 150 that utilizes a
stack cache 155 to accelerate data transfers for execution unit
140. Although stack management unit 150 can be an integral part of
hardware processor 100 as shown in FIG. 1, many embodiments of
stack management unit 150 are not integrated with a hardware
processor since stack management in accordance with the present
invention can be adapted for use with any stack-based computing
system. In one embodiment, stack management unit 150 includes a
stack cache 155, a dribble manager unit 151, and a stack control
unit 152. When hardware processor 100 is pushing data onto stack
400 (FIG. 4A) and stack cache 155 is almost full, dribble manager
unit 151 transfers data from the bottom of stack cache 155 to stack
400 through data cache unit 160, so that the top portion of stack
400 remains in stack cache 155. When hardware processor 100 is
popping data off of stack 400 and stack cache 155 is almost empty,
dribble manager unit 151 transfers data from stack 400 to the
bottom of stack cache 155 so that the top portion of stack 400 is
maintained in stack cache 155.
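The spill and fill behavior described above can be illustrated with a minimal sketch. The class name, the capacity, and the high and low water marks below are assumptions chosen for illustration only, and the transfers are performed synchronously here, whereas a dribble manager would typically perform them in the background.

```python
class StackCacheSketch:
    """Toy model: the top portion of the stack stays in the cache."""

    def __init__(self, capacity=8, high_water=6, low_water=2):
        # capacity, high_water and low_water are hypothetical values.
        self.capacity = capacity
        self.high_water = high_water
        self.low_water = low_water
        self.cache = []   # cached top portion; last element is top of stack
        self.memory = []  # spilled lower portion, standing in for stack 400

    def _dribble(self):
        # Spill: cache almost full, move entries from the bottom of the cache.
        while len(self.cache) > self.high_water:
            self.memory.append(self.cache.pop(0))
        # Fill: cache almost empty, move entries back to the bottom of the cache.
        while len(self.cache) < self.low_water and self.memory:
            self.cache.insert(0, self.memory.pop())

    def push(self, value):
        self.cache.append(value)
        self._dribble()

    def pop(self):
        value = self.cache.pop()
        self._dribble()
        return value
```

Pushing ten values into this sketch leaves the four oldest values in `memory` and the six newest in `cache`; popping them all returns the values in last-in, first-out order.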
[0064] Instruction decoder 135, as described herein, allows the
folding away of JAVA virtual machine instructions pushing an
operand onto the top of a stack merely as a precursor to a second
JAVA virtual machine instruction which operates on the top of stack
operand. Such an instruction decoder identifies foldable
instruction sequences and supplies an execution unit with a single
equivalent folded operation thereby reducing processing cycles
otherwise required for execution of multiple operations
corresponding to the multiple instructions of the folded
instruction sequence. Instruction decoder embodiments described
herein provide for folding of two, three, four, or more
instructions. For example, in one instruction decoder embodiment
described herein, two load instructions and a store instruction can
be folded into execution of an operation corresponding to an
instruction appearing therebetween in the instruction sequence.
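As a rough software analogy of the four-instruction case just described, the following sketch scans a symbolic instruction list and replaces a load, load, operation, store group with one folded operation. The tuple encoding, the `fold` function, and the `"folded"` tag are hypothetical and are not taken from the disclosed decoder, which performs the equivalent recognition in hardware.

```python
# Hypothetical symbolic encoding: ("iload", n), ("iadd",), ("istore", n).
BINARY_OPS = {"iadd", "isub", "imul"}

def fold(instrs):
    """Fold load, load, op, store groups into single folded operations."""
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 4]
        if (len(window) == 4
                and window[0][0] == "iload"
                and window[1][0] == "iload"
                and window[2][0] in BINARY_OPS
                and window[3][0] == "istore"):
            # One operation: dst = src1 op src2, with no stack traffic.
            out.append(("folded", window[2][0],
                        window[0][1], window[1][1], window[3][1]))
            i += 4
        else:
            out.append(instrs[i])
            i += 1
    return out
```

For example, `fold([("iload", 1), ("iload", 2), ("iadd",), ("istore", 3)])` yields the single entry `("folded", "iadd", 1, 2, 3)`, while sequences that do not match the pattern pass through unchanged.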
[0065] As used herein, a virtual machine is an abstract computing
machine that, like a real computing machine, has an instruction set
and uses various memory areas. A virtual machine specification
defines a set of processor architecture independent virtual machine
instructions that are executed by a virtual machine implementation,
e.g., hardware processor 100. Each virtual machine instruction
defines a specific operation that is to be performed. The virtual
computing machine need not understand the computer language that is
used to generate virtual machine instructions or the underlying
implementation of the virtual machine. Only a particular file
format for virtual machine instructions needs to be understood.
[0066] In an exemplary embodiment, the virtual machine instructions
are JAVA virtual machine instructions. Each JAVA virtual machine
instruction includes one or more bytes that encode instruction
identifying information, operands, and any other required
information. Section I, which is incorporated herein by reference
in its entirety, includes an illustrative set of the JAVA virtual
machine instructions. The particular set of virtual machine
instructions utilized is not an essential aspect of this invention.
In view of the virtual machine instructions in Section I and this
disclosure, those of skill in the art can modify the invention for
a particular set of virtual machine instructions, or for changes to
the JAVA virtual machine specification.
[0067] A JAVA compiler JAVAC (FIG. 2), executing on a
computer platform, converts an application 201 written in the JAVA
computer language to an architecture neutral object file format
encoding a compiled instruction sequence 203, according to the JAVA
Virtual Machine Specification, that includes a compiled instruction
set. However, for this invention, only a source of virtual machine
instructions and related information is needed. The method or
technique used to generate the source of virtual machine
instructions and related information is not essential to this
invention.
[0068] Compiled instruction sequence 203 is executable on hardware
processor 100 as well as on any computer platform that implements
the JAVA virtual machine using, for example, a software interpreter
or just-in-time compiler. However, as described above, hardware
processor 100 provides significant performance advantages over the
software implementations.
[0069] In this embodiment, hardware processor 100 (FIG. 1)
processes the JAVA virtual machine instructions, which include
bytecodes. Hardware processor 100, as explained more completely
below, executes directly most of the bytecodes. However, execution
of some of the bytecodes is implemented via microcode.
[0070] One strategy for selecting virtual machine instructions that
are executed directly by hardware processor 100 is described herein
by way of an example. Thirty percent of the JAVA virtual machine
instructions are pure hardware translations; instructions
implemented in this manner include constant loading and simple
stack operations. The next 50% of the virtual machine instructions
are implemented mostly, but not entirely, in hardware and require
some firmware assistance; these include stack based operations and
array instructions. The next 10% of the JAVA virtual machine
instructions are implemented in hardware, but require significant
firmware support as well; these include function invocation and
function return. The remaining 10% of the JAVA virtual machine
instructions are not supported in hardware, but rather are
supported by a firmware trap and/or microcode; these include
functions such as exception handlers. Herein, firmware means
microcode stored in ROM that when executed controls the operations
of hardware processor 100.
[0071] In one embodiment, hardware processor 100 includes an I/O
bus and memory interface unit 110, an instruction cache unit 120
including instruction cache 125, an instruction decode unit 130, a
unified execution unit 140, a stack management unit 150 including
stack cache 155, a data cache unit 160 including a data cache 165,
and program counter and trap control logic 170. Each of these units
is described more completely below.
[0072] Also, as illustrated in FIG. 1, each unit includes several
elements. For clarity and to avoid distracting from the invention,
the interconnections between elements within a unit are not shown
in FIG. 1. However, in view of the following description, those of
skill in the art will understand the interconnections and
cooperation between the elements in a unit and between the various
units.
[0073] The pipeline stages implemented using the units illustrated
in FIG. 1 include fetch, decode, execute, and write-back stages. If
desired, extra stages for memory access or exception resolution are
provided in hardware processor 100.
[0074] FIG. 3 is an illustration of a four stage pipeline for
execution of instructions in the exemplary embodiment of processor
100. In fetch stage 301, a virtual machine instruction is fetched
and placed in instruction buffer 124 (FIG. 1). The virtual machine
instruction is fetched from one of (i) a fixed size cache line from
instruction cache 125 or (ii) microcode ROM 141 in execution unit
140.
[0075] With regard to fetching, aside from instructions tableswitch
and lookupswitch (see Section I), each virtual machine instruction
is between one and five bytes long. Thus, to keep things simple, at
least forty bits are required to guarantee that all of a given
instruction is contained in the fetch.
[0076] Another alternative is to always fetch a predetermined
number of bytes, for example, four bytes, starting with the opcode.
This is sufficient for 95% of JAVA virtual machine instructions
(See Section I). For an instruction requiring more than three bytes
of operands, another cycle in the front end must be tolerated if
four bytes are fetched. In this case, the instruction execution can
be started with the first operands fetched even if the full set of
operands is not yet available.
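The arithmetic behind the two fetch alternatives can be written out as a short sketch. The function names are hypothetical, and the model assumes, as a simplification, that the front end needs one cycle per fixed-width fetch.

```python
def fetch_bits_for_longest(max_len_bytes=5):
    """Bits needed to hold the longest instruction in a single fetch."""
    return max_len_bytes * 8

def front_end_cycles(instr_len_bytes, fetch_width_bytes=4):
    """Front-end cycles to fetch a whole instruction (ceiling division)."""
    return -(-instr_len_bytes // fetch_width_bytes)
```

With a five-byte maximum length the first function gives forty bits, and with a four-byte fetch the second gives one cycle for instructions of up to four bytes (an opcode plus at most three operand bytes) and two cycles for a five-byte instruction.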
[0077] In decode stage 302 (FIG. 3), the virtual machine
instruction at the front of instruction buffer 124 (FIG. 1) is
decoded and instruction folding is performed if possible. Stack
cache 155 is accessed only if needed by the virtual machine
instruction. Register OPTOP, that contains a pointer OPTOP to a top
of a stack 400 (FIG. 4), is also updated in decode stage 302 (FIG.
3).
[0078] Herein, for convenience, the value in a register and the
register are assigned the same reference numeral. Further, in the
following discussion, use of a register to store a pointer is
illustrative only of one embodiment. Depending on the specific
implementation of the invention, the pointer may be implemented
using a hardware register, a hardware counter, a software counter, a
software pointer, or other equivalent embodiments known to those of
skill in the art. The particular implementation selected is not
essential to the invention, and typically is made based on a price
to performance trade-off.
[0079] In execute stage 303, the virtual machine instruction is
executed for one or more cycles. Typically, in execute stage 303,
an ALU in integer unit 142 (FIG. 1) is used either to do an
arithmetic computation or to calculate the address of a load or
store from data cache unit (DCU) 160. If necessary, traps are
prioritized and taken at the end of execute stage 303 (FIG. 3). For
control flow instructions, the branch address is calculated in
execute stage 303, as well as the condition upon which the branch
is dependent.
[0080] Cache stage 304 is a non-pipelined stage. Data cache 165
(FIG. 1) is accessed if needed during execution stage 303 (FIG. 3).
Stage 304 is non-pipelined because hardware
processor 100 is a stack-based machine. Thus, the instruction
following a load is almost always dependent on the value returned
by the load. Consequently, in this embodiment, the pipeline is held
for one cycle for a data cache access. This reduces the pipeline
stages, and the die area taken by the pipeline for the extra
registers and bypasses.
[0081] Write-back stage 305 is the last stage in the pipeline. In
stage 305, the calculated data is written back to stack cache
155.
[0082] Hardware processor 100, in this embodiment, directly
implements a stack 400 (FIG. 4A) that supports the JAVA virtual
machine stack-based architecture (See Section I). Sixty-four
entries on stack 400 are contained on stack cache 155 in stack
management unit 150. Some entries in stack 400 may be duplicated on
stack cache 155. Operations on data are performed through stack
cache 155.
[0083] Stack 400 of hardware processor 100 is primarily used as a
repository of information for methods. At any point in time,
hardware processor 100 is executing a single method. Each method
has memory space, i.e., a method frame on stack 400, allocated for
a set of local variables, an operand stack, and an execution
environment structure.
[0084] A new method frame, e.g., method frame two 410, is allocated
by hardware processor 100 upon a method invocation in execution
stage 303 (FIG. 3) and becomes the current frame, i.e., the frame
of the current method. Current frame 410 (FIG. 4A), as well as the
other method frames, may contain a part of or all of the following
six entities, depending on various method invoking situations:
[0085] Object reference;
[0086] Incoming arguments;
[0087] Local variables;
[0088] Invoker's method context;
[0089] Operand stack; and
[0090] Return value from method.
[0091] In FIG. 4A, object reference, incoming arguments, and local
variables are included in arguments and local variables area 421.
The invoker's method context is included in execution environment
422, sometimes called frame state, that in turn includes: a return
program counter value 431 that is the address of the virtual
machine instruction, e.g., JAVA opcode, next to the method invoke
instruction; a return frame 432 that is the location of the calling
method's frame; a return constant pool pointer 433 that is a
pointer to the calling method's constant pool table; a current
method vector 434 that is the base address of the current method's
vector table; and a current monitor address 435 that is the address
of the current method's monitor.
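The five fields of execution environment 422 listed above can be pictured as a simple record. The following sketch is illustrative only; the class name, field names, field types, and the `restore_invoker` helper are assumptions, and the reference numerals appear in comments solely to tie each field back to the description.

```python
from dataclasses import dataclass

@dataclass
class ExecutionEnvironment:
    return_pc: int                # 431: instruction next to the method invoke
    return_frame: int             # 432: location of the calling method's frame
    return_const_pool: int        # 433: calling method's constant pool table
    current_method_vector: int    # 434: base address of current method's vector table
    current_monitor_address: int  # 435: address of current method's monitor

def restore_invoker(env):
    """Return the values needed to resume the invoker on method return."""
    return env.return_pc, env.return_frame, env.return_const_pool
```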
[0092] The object reference is an indirect pointer to an
object-storage representing the object being targeted for the
method invocation. JAVA compiler JAVAC (See FIG. 2.) generates an
instruction to push this pointer onto operand stack 423 prior to
generating an invoke instruction. This object reference is
accessible as local variable zero during the execution of the
method. This indirect pointer is not available for a static method
invocation as there is no target-object defined for a static method
invocation.
[0093] The list of incoming arguments transfers information from
the calling method to the invoked method. Like the object
reference, the incoming arguments are pushed onto stack 400 by JAVA
compiler generated instructions and may be accessed as local
variables. JAVA compiler JAVAC (See FIG. 2.) statically generates a
list of arguments for current method 410 (FIG. 4A), and hardware
processor 100 determines the number of arguments from the list.
When the object reference is present in the frame for a non-static
method invocation, the first argument is accessible as local
variable one. For a static method invocation, the first argument
becomes local variable zero.
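The mapping from incoming arguments to local variable numbers can be sketched as follows. The function name is hypothetical, and the sketch assumes for simplicity that every argument occupies a single stack entry.

```python
def local_index_of_argument(arg_position, is_static):
    """Local variable number of the argument at zero-based arg_position.

    For a non-static invocation the object reference occupies local
    variable zero, so the arguments are shifted up by one.
    """
    return arg_position if is_static else arg_position + 1
```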
[0094] For 64-bit arguments, as well as 64-bit entities in general,
the upper 32-bits, i.e., the 32 most significant bits, of a 64-bit
entity are placed on the upper location of stack 400, i.e., pushed
on the stack last. For example, when a 64-bit entity is on the top
of stack 400, the upper 32-bit portion of the 64-bit entity is on
the top of the stack, and the lower 32-bit portion of the 64-bit
entity is in the storage location immediately adjacent to the top
of stack 400.
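The ordering of the two 32-bit halves can be illustrated with a short sketch in which a list models the stack and the last element is the top of the stack. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def push64(stack, value):
    """Push a 64-bit entity as two 32-bit entries, upper half last."""
    lower = value & MASK32
    upper = (value >> 32) & MASK32
    stack.append(lower)  # pushed first, ends adjacent to the top
    stack.append(upper)  # pushed last, ends on the top of the stack

def pop64(stack):
    """Pop two 32-bit entries and reassemble the 64-bit entity."""
    upper = stack.pop()
    lower = stack.pop()
    return (upper << 32) | lower
```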
[0095] The local variable area on stack 400 (FIG. 4A) for current
method 410 represents temporary variable storage space which is
allocated and remains effective during invocation of method 410.
JAVA compiler JAVAC (FIG. 2) statically determines the required
number of local variables and hardware processor 100 allocates
temporary variable storage space accordingly.
[0096] When a method is executing on hardware processor 100, the
local variables typically reside in stack cache 155 and are
addressed as offsets from pointer VARS (FIGS. 1 and 4A), which
points to the position of the local variable zero. Instructions are
provided to load the values of local variables onto operand stack
423 and store values from operand stack into local variables area
421.
[0097] The information in execution environment 422 includes the
invoker's method context. When a new frame is built for the current
method, hardware processor 100 pushes the invoker's method context
onto newly allocated frame 410, and later utilizes the information
to restore the invoker's method context before returning. Pointer
FRAME (FIGS. 1 and 4A) is a pointer to the execution environment of
the current method. In the exemplary embodiment, each register in
register set 144 (FIG. 1) is 32-bits wide.
[0098] Operand stack 423 is allocated to support the execution of
the virtual machine instructions within the current method. Program
counter register PC (FIG. 1) contains the address of the next
instruction, e.g., opcode, to be executed. Locations on operand
stack 423 (FIG. 4A) are used to store the operands of virtual
machine instructions, providing both source and target storage
locations for instruction execution. The size of operand stack 423
is statically determined by JAVA compiler JAVAC (FIG. 2) and
hardware processor 100 allocates space for operand stack 423
accordingly. Register OPTOP (FIGS. 1 and 4A) holds a pointer to a
top of operand stack 423.
[0099] The invoked method may return its execution result onto the
invoker's top of stack, so that the invoker can access the return
value with operand stack references. The return value is placed on
the area where an object reference or an argument is pushed before
a method invocation.
[0100] Simulation results on the JAVA virtual machine indicate that
method invocation consumes a significant portion of the execution
time (20-40%). Given this attractive target for accelerating
execution of virtual machine instructions, hardware support for
method invocation is included in hardware processor 100, as
described more completely below.
[0101] The beginning of the stack frame of a newly invoked method,
i.e., the object reference and the arguments passed by the caller,
is already stored on stack 400 since the object reference and the
incoming arguments come from the top of the stack of the caller. As
explained above, following these items on stack 400, the local
variables are loaded and then the execution environment is
loaded.
[0102] One way to speed up this process is for hardware processor
100 to load the execution environment in the background and
indicate what has been loaded so far, e.g., simple one bit
scoreboarding. Hardware processor 100 tries to execute the
bytecodes of the called method as soon as possible, even though
stack 400 is not completely loaded. If accesses are made to
variables already loaded, overlapping of execution with loading of
stack 400 is achieved, otherwise a hardware interlock occurs and
hardware processor 100 just waits for the variable or variables in
the execution to be loaded.
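One-bit scoreboarding of this kind can be sketched as one flag per entry that is set when the background load of that entry completes. The class and method names below are hypothetical, and a failed read stands in for the hardware interlock.

```python
class FrameLoadScoreboard:
    """One 'loaded' bit per entry being loaded in the background."""

    def __init__(self, num_entries):
        self.loaded = [False] * num_entries
        self.values = [None] * num_entries

    def background_load(self, index, value):
        # Called as each entry arrives; sets the scoreboard bit.
        self.values[index] = value
        self.loaded[index] = True

    def try_read(self, index):
        """Return (ready, value); ready is False while the entry is pending."""
        if not self.loaded[index]:
            return (False, None)  # models the interlock: the reader must wait
        return (True, self.values[index])
```

A read of an entry that has already been loaded proceeds immediately, so execution overlaps with loading; a read of a pending entry reports that the caller has to wait.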
[0103] FIG. 4B illustrates another way to accelerate method
invocation. Instead of storing the entire method frame in stack
400, the execution environment of each method frame is stored
separately from the local variable area and the operand stack of
the method frame. Thus, in this embodiment, stack 400B contains
modified method frames, e.g. modified method frame 410B having only
local variable area 421 and operand stack 423. Execution
environment 422 of the method frame is stored in execution
environment memory 440. Storing the execution environment in
execution environment memory 440 reduces the amount of data in
stack cache 155. Therefore, the size of stack cache 155 can be
reduced. Furthermore, execution environment memory 440 and stack
cache 155 can be accessed simultaneously. Thus, method invocation
can be accelerated by loading or storing the execution environment
in parallel with loading or storing data onto stack 400B.
[0104] In one embodiment of stack management unit 150, the memory
architecture of execution environment memory 440 is also a stack.
As modified method frames are pushed onto stack 400B through stack
cache 155, corresponding execution environments are pushed onto
execution environment memory 440. For example, since modified
method frames 0 to 2, as shown in FIG. 4B, are in stack 400B,
execution environments (EE) 0 to 2, respectively, are stored in
execution environment memory circuit 440.
[0105] To further enhance method invocation, an execution
environment cache can be added to improve the speed of saving and
retrieving the execution environment during method invocation. The
architecture described more completely below for stack cache 155,
dribble manager unit 151, and stack control unit 152 for caching
stack 400, can also be applied to caching execution environment
memory 440.
[0106] FIG. 4C illustrates an embodiment of stack management unit
150 modified to support both stack 400B and execution environment
memory 440. Specifically, the embodiment of stack management unit
150 in FIG. 4C adds an execution environment stack cache 450, an
execution environment dribble manager unit 460, and an execution
environment stack control unit 470. Typically, execution environment
dribble manager unit 460 transfers an entire execution environment
between execution environment stack cache 450 and execution
environment memory
440 during a spill operation or a fill operation.
I/O Bus and Memory Interface Unit
[0107] I/O bus and memory interface unit 110 (FIG. 1), sometimes
called interface unit 110, implements an interface between hardware
processor 100 and a memory hierarchy which in an exemplary
embodiment includes external memory and may optionally include
memory storage and/or interfaces on the same die as hardware
processor 100. In this embodiment, I/O controller 111 interfaces
with external I/O devices and memory controller 112 interfaces with
external memory. Herein, external memory means memory external to
hardware processor 100. However, external memory either may be
included on the same die as hardware processor 100, may be external
to the die containing hardware processor 100, or may include both
on- and off-die portions.
[0108] In another embodiment, requests to I/O devices go through
memory controller 112 which maintains an address map of the entire
system including hardware processor 100. On the memory bus of this
embodiment, hardware processor 100 is the only master and does not
have to arbitrate to use the memory bus.
[0109] Hence, alternatives for the input/output bus that interfaces
with I/O bus and memory interface unit 110 include supporting
memory-mapped schemes, providing direct support for PCI, PCMCIA, or
other standard busses. Fast graphics (w/VIS or other technology)
may optionally be included on the die with hardware processor
100.
[0110] I/O bus and memory interface unit 110 generates read and
write requests to external memory. Specifically, interface unit 110
provides an interface for instruction cache and data cache
controllers 121 and 161 to the external memory. Interface unit 110
includes arbitration logic for internal requests from instruction
cache controller 121 and data cache controller 161 to access
external memory and in response to a request initiates either a
read or a write request on the memory bus to the external memory. A
request from data cache controller 161 is always treated as higher
priority relative to a request from instruction cache controller
121.
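The fixed-priority arbitration just described reduces to a very small decision function. The following sketch is illustrative; the function name and the string labels are assumptions.

```python
def arbitrate(icache_request, dcache_request):
    """Pick which pending request is granted the memory bus this cycle."""
    if dcache_request:
        return "dcache"  # data cache requests always have higher priority
    if icache_request:
        return "icache"
    return None          # no request pending
```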
[0111] Interface unit 110 provides an acknowledgment signal to the
requesting instruction cache controller 121, or data cache
controller 161 on read cycles so that the requesting controller can
latch the data. On write cycles, the acknowledgment signal from
interface unit 110 is used for flow control so that the requesting
instruction cache controller 121 or data cache controller 161 does
not generate a new request when there is one pending. Interface
unit 110 also handles errors generated on the memory bus to the
external memory.
Instruction Cache Unit
[0112] Instruction cache unit (ICU) 120 (FIG. 1) fetches virtual
machine instructions from instruction cache 125 and provides the
instructions to instruction decode unit 130. In this embodiment,
upon an instruction cache hit, instruction cache controller 121, in
one cycle, transfers an instruction from instruction cache 125 to
instruction buffer 124 where the instruction is held until integer
execution unit IEU, that is described more completely below, is
ready to process the instruction. This separates the rest of
pipeline 300 (FIG. 3) in hardware processor 100 from fetch stage
301. If it is undesirable to incur the complexity of supporting an
instruction-buffer type of arrangement, a temporary one instruction
register is sufficient for most purposes. However, instruction
fetching, caching, and buffering should provide sufficient
instruction bandwidth to support instruction folding as described
below.
[0113] The front end of hardware processor 100 is largely separate
from the rest of hardware processor 100. Ideally, one instruction
per cycle is delivered to the execution pipeline.
[0114] The instructions are aligned on an arbitrary eight-bit
boundary by byte aligner circuit 122 in response to a signal from
instruction decode unit 130. Thus, the front end of hardware
processor 100 efficiently deals with fetching from any byte
position. Also, hardware processor 100 deals with the problems of
instructions that span multiple cache lines of cache 125. In this
case, since the opcode is the first byte, the design is able to
tolerate an extra cycle of fetch latency for the operands. Thus, a
very simple de-coupling between the fetching and execution of the
bytecodes is possible.
[0115] In case of an instruction cache miss, instruction cache
controller 121 generates an external memory request for the missed
instruction to I/O bus and memory interface unit 110. If
instruction buffer 124 is empty, or nearly empty, when there is an
instruction cache miss, instruction decode unit 130 is stalled,
i.e., pipeline 300 is stalled. Specifically, instruction cache
controller 121 generates a stall signal upon a cache miss which is
used along with an instruction buffer empty signal to determine
whether to stall pipeline 300. Instruction cache 125 can be
invalidated to accommodate self-modifying code, e.g., instruction
cache controller 121 can invalidate a particular line in
instruction cache 125.
[0116] Thus, instruction cache controller 121 determines the next
instruction to be fetched, i.e., which instruction in instruction
cache 125 needs to be accessed, and generates address, data and
control signals for data and tag RAMs in instruction cache 125. On
a cache hit, four bytes of data are fetched from instruction cache
125 in a single cycle, and a maximum of four bytes can be written
into instruction buffer 124.
[0117] Byte aligner circuit 122 aligns the data out of the
instruction cache RAM and feeds the aligned data to instruction
buffer 124. As explained more completely below, the first two bytes
in instruction buffer 124 are decoded to determine the length of
the virtual machine instruction. Instruction buffer 124 tracks the
valid instructions in the queue and updates the entries, as
explained more completely below.
[0118] Instruction cache controller 121 also provides the data path
and control for handling instruction cache misses. On an
instruction cache miss, instruction cache controller 121 generates
a cache fill request to I/O bus and memory interface unit 110.
[0119] On receiving data from external memory, instruction cache
controller 121 writes the data into instruction cache 125 and the
data are also bypassed into instruction buffer 124. Data are
bypassed to instruction buffer 124 as soon as the data are
available from external memory, and before the completion of the
cache fill.
[0120] Instruction cache controller 121 continues fetching
sequential data until instruction buffer 124 is full or a branch or
trap has taken place. In one embodiment, instruction buffer 124 is
considered full if there are more than eight bytes of valid entries
in buffer 124. Thus, typically, eight bytes of data are written
into instruction cache 125 from external memory in response to the
cache fill request sent to interface unit 110 by instruction cache
unit 120. If there is a branch or trap taken while processing an
instruction cache miss, only after the completion of the miss
processing is the trap or branch executed.
[0121] When an error is generated during an instruction cache fill
transaction, a fault indication is generated and stored into
instruction buffer 124 along with the virtual machine instruction,
i.e., a fault bit is set. The line is not written into instruction
cache 125. Thus, the erroneous cache fill transaction acts like a
non-cacheable transaction except that a fault bit is set. When the
instruction is decoded, a trap is taken.
[0122] Instruction cache controller 121 also services non-cacheable
instruction reads. An instruction cache enable (ICE) bit, in a
processor status register in register set 144, is used to define
whether a load can be cached. If the instruction cache enable bit
is cleared, instruction cache unit 120 treats all loads as
non-cacheable loads. Instruction cache controller 121 issues a
non-cacheable request to interface unit 110 for non-cacheable
instructions. When the data are available on a cache fill bus for
the non-cacheable instruction, the data are bypassed into
instruction buffer 124 and are not written into instruction cache
125.
[0123] In this embodiment, instruction cache 125 is a
direct-mapped, eight-byte line size cache. Instruction cache 125
has a single cycle latency. The cache size is configurable to 0K,
1K, 2K, 4K, 8K and 16K byte sizes where K means kilo. The default
size is 4K bytes. Each line has a cache tag entry associated with
the line. Each cache tag contains a twenty bit address tag field
and one valid bit for the default 4K byte size.
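The twenty-bit tag for the default size follows from how a direct-mapped cache divides an address. The sketch below assumes 32-bit addresses, which is consistent with the stated tag width but is not stated explicitly for the cache; the function names are hypothetical, and the cache size is assumed to be a nonzero power of two.

```python
LINE_SIZE = 8       # bytes per line, from the description
ADDRESS_BITS = 32   # assumption: 32-bit addresses

def split_address(addr, cache_size=4096):
    """Split an address into (tag, line index, byte offset)."""
    offset_bits = LINE_SIZE.bit_length() - 1   # 3 bits for an 8-byte line
    num_lines = cache_size // LINE_SIZE        # 512 lines at 4K bytes
    index_bits = num_lines.bit_length() - 1    # 9 bits at 4K bytes
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

def tag_bits(cache_size=4096):
    """Width of the address tag for a given cache size."""
    return ADDRESS_BITS - (cache_size.bit_length() - 1)
```

For the 4K byte default, twelve address bits select the line and the byte within it, leaving 32 - 12 = 20 bits for the tag.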
[0124] Instruction buffer 124, which, in an exemplary embodiment,
is a twelve-byte deep first-in, first-out (FIFO) buffer, de-links
fetch stage 301 (FIG. 3) from the rest of pipeline 300 for
performance reasons. Each instruction in buffer 124 (FIG. 1) has an
associated valid bit and an error bit. When the valid bit is set,
the instruction associated with that valid bit is a valid
instruction. When the error bit is set, the fetch of the
instruction associated with that error bit was an erroneous
transaction. Instruction buffer 124 includes an instruction buffer
control circuit (not shown) that generates signals to pass data to
and from instruction buffer 124 and that keeps track of the valid
entries in instruction buffer 124, i.e., those with valid bits
set.
[0125] In an exemplary embodiment, four bytes can be received into
instruction buffer 124 in a given cycle. Up to five bytes,
representing up to two virtual machine instructions, can be read
out of instruction buffer 124 in a given cycle. Alternative
embodiments, particularly those providing folding of multi-byte
virtual machine instructions and/or those providing folding of more
than two virtual machine instructions, provide higher input and
output bandwidth. Persons of ordinary skill in the art will
recognize a variety of suitable instruction buffer designs
including, for example, alignment logic, circular buffer design,
etc. When a branch or trap is taken, all the entries in instruction
buffer 124 are nullified and the branch/trap data moves to the top
of instruction buffer 124.
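The buffer behavior described above may be modeled, as a minimal sketch, by a bounded FIFO. The depth and per-cycle limits come from the text; the class and method names are hypothetical, and instruction alignment and the buffer control circuit are not modeled.

```python
from collections import deque


class InstructionBuffer:
    """Illustrative model of a twelve-byte FIFO with per-cycle limits."""

    DEPTH = 12    # twelve-byte deep FIFO
    MAX_IN = 4    # bytes that can be received in a given cycle
    MAX_OUT = 5   # bytes that can be read out in a given cycle

    def __init__(self):
        self.entries = deque()  # each entry is (byte, valid bit, error bit)

    def fill(self, data, error=False):
        """Accept up to MAX_IN bytes, limited by free space; return the count."""
        room = self.DEPTH - len(self.entries)
        accepted = data[:min(self.MAX_IN, room)]
        for byte in accepted:
            self.entries.append((byte, True, error))
        return len(accepted)

    def read(self, count):
        """Remove and return up to MAX_OUT bytes from the head of the FIFO."""
        count = min(count, self.MAX_OUT, len(self.entries))
        return [self.entries.popleft()[0] for _ in range(count)]

    def flush(self, target_bytes):
        """On a taken branch or trap, nullify all entries and restart at the target."""
        self.entries.clear()
        self.fill(target_bytes)
```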
[0126] In the embodiment of FIG. 1, a unified execution unit 140 is
shown. However, in another embodiment, instruction decode unit 130,
integer unit 142, and stack management unit 150 are considered a
single integer execution unit, and floating point execution unit
143 is a separate optional unit. In still other embodiments, the
various elements in the execution unit may be implemented using the
execution unit of another processor. In general, the various
elements included in the various units of FIG. 1 are exemplary only
of one embodiment. Each unit could be implemented with all or some
of the elements shown. Again, the decision is largely dependent
upon a price vs. performance trade-off.
Instruction Decode Unit
[0127] As explained above, virtual machine instructions are decoded
in decode stage 302 (FIG. 3) of pipeline 300. In an exemplary
embodiment, two bytes, that can correspond to two virtual machine
instructions, are fetched from instruction buffer 124 (FIG. 1). The
two bytes are decoded in parallel to determine if the two bytes
correspond to two virtual machine instructions, e.g., a first load
top of stack instruction and a second add top two stack entries
instruction, that can be folded into a single equivalent operation.
Folding refers to supplying a single equivalent operation
corresponding to two or more virtual machine instructions.
[0128] In an exemplary hardware processor 100 embodiment, a
single-byte first instruction can be folded with a second
instruction. However, alternative embodiments provide folding of
more than two virtual machine instructions, e.g., two to four
virtual machine instructions, and of multi-byte virtual machine
instructions, though at the cost of instruction decoder complexity
and increased instruction bandwidth. See U.S. patent application
Ser. No. 08/786,351, entitled "INSTRUCTION FOLDING FOR A
STACK-BASED MACHINE" naming Marc Tremblay and James Michael
O'Connor as inventors, assigned to the assignee of this
application, and filed on Jan. 23, 1997 with Attorney Docket No.
SP2036, now U.S. Pat. No. 6,026,485, which is incorporated herein
by reference in its entirety. In the exemplary processor 100
embodiment, if the first byte, which corresponds to the first
virtual machine instruction, is a multi-byte instruction, the first
and second instructions are not folded.
[0129] An optional current object loader folder 132 exploits
instruction folding, such as that described above, and in greater
detail in U.S. patent application Ser. No. 08/786,351, entitled
"INSTRUCTION FOLDING FOR A STACK-BASED MACHINE" naming Marc
Tremblay and James Michael O'Connor as inventors, assigned to the
assignee of this application, and filed on Jan. 23, 1997, now U.S.
Pat. No. 6,026,485, which is incorporated herein by reference in
its entirety, in virtual machine instruction sequences which
simulation results have shown to be particularly frequent and
therefore a desirable target for optimization. In particular, a
method invocation typically loads an object reference for the
corresponding object onto the operand stack and fetches a field
from the object. Instruction folding allows this extremely common
virtual machine instruction sequence to be executed using an
equivalent folded operation.
[0130] Quick variants are not part of the virtual machine
instruction set (See Chapter 3 of Section I), and are invisible
outside of a JAVA virtual machine implementation. However, inside a
virtual machine implementation, quick variants have proven to be an
effective optimization. (See Section A in Section I, which is an
integral part of this specification.) Supporting writes for updates
of various instructions to quick variants in a non-quick to quick
translator cache 131 changes the normal virtual machine instruction
to a quick virtual machine instruction to take advantage of the
large benefits brought by the quick variants. In particular, as
described in more detail in U.S. patent application Ser. No.
08/788,805, entitled "NON-QUICK INSTRUCTION ACCELERATOR INCLUDING
INSTRUCTION IDENTIFIER AND DATA SET STORAGE AND METHOD OF
IMPLEMENTING SAME" naming Marc Tremblay and James Michael O'Connor
as inventors, assigned to the assignee of this application, and
filed on Jan. 23, 1997 with Attorney Docket No. SP2039, now U.S.
Pat. No. 6,065,108, which is incorporated herein by reference in
its entirety, when the information required to initiate execution
of an instruction has been assembled for the first time, the
information is stored in a cache along with the value of program
counter PC as a tag in non-quick to quick translator cache 131 and
the instruction is identified as a quick-variant. In one
embodiment, this is done with self-modifying-code.
[0131] Upon a subsequent call of that instruction, instruction
decode unit 130 detects that the instruction is identified as a
quick-variant and simply retrieves the information needed to
initiate execution of the instruction from non-quick to quick
translator cache 131. Non-quick to quick translator cache is an
optional feature of hardware processor 100.
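The lookup just described can be sketched, under simplifying assumptions, as a table tagged by the value of program counter PC. The names below are hypothetical, replacement policy and capacity are not modeled, and the resolve callable stands in for whatever work assembles the information the first time.

```python
class TranslatorCache:
    """Illustrative PC-tagged cache of information assembled on first execution."""

    def __init__(self):
        self.entries = {}           # PC tag -> information needed to start execution
        self.slow_resolutions = 0   # how many times the full resolution was needed

    def execute(self, pc, resolve):
        """Return the stored information for pc, resolving it only on first use."""
        if pc in self.entries:
            # Subsequent call: the instruction is treated as a quick variant.
            return self.entries[pc]
        # First call: assemble the information and store it with PC as the tag.
        self.slow_resolutions += 1
        info = resolve()
        self.entries[pc] = info
        return info
```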
[0132] With regard to branching, a very short pipe with quick
branch resolution is sufficient for most implementations. However,
an appropriate simple branch prediction mechanism can alternatively
be introduced, e.g., branch predictor circuit 133. Implementations
for branch predictor circuit 133 include branching based on opcode,
branching based on offset, or branching based on a two-bit counter
mechanism.
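As one illustration of the two-bit counter alternative, a conventional saturating-counter predictor is sketched below. The table size, the indexing by program counter, and the initial counter value are assumptions chosen for the example and are not taken from the text.

```python
class TwoBitPredictor:
    """Illustrative table of two-bit saturating counters indexed by PC."""

    def __init__(self, entries=16):
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        self.counters = [1] * entries

    def _slot(self, pc):
        return pc % len(self.counters)

    def predict(self, pc):
        """Return True when the branch at pc is predicted taken."""
        return self.counters[self._slot(pc)] >= 2

    def update(self, pc, taken):
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._slot(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With this scheme a single mispredicted iteration, such as a loop exit, does not immediately reverse a strongly established prediction.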
[0133] The JAVA virtual machine specification defines an
instruction invokenonvirtual, opcode 183, which, upon execution,
invokes methods. The opcode is followed by an index byte one and an
index byte two. (See Section I.) Operand stack 423 contains a
reference to an object and some number of arguments when this
instruction is executed.
[0134] Index bytes one and two are used to generate an index into
the constant pool of the current class. The item in the constant
pool at that index points to a complete method signature and class.
Signatures are defined in Section I and that description is
incorporated herein by reference.
[0135] The method signature, a short, unique identifier for each
method, is looked up in a method table of the class indicated. The
result of the lookup is a method block that indicates the type of
method and the number of arguments for the method. The object
reference and arguments are popped off this method's stack and
become initial values of the local variables of the new method. The
execution then resumes with the first instruction of the new
method. Upon execution, instructions invokevirtual, opcode 182, and
invokestatic, opcode 184, invoke processes similar to that just
described. In each case, a pointer is used to lookup a method
block.
[0136] A method argument cache 134, that also is an optional
feature of hardware processor 100, is used, in a first embodiment,
to store the method block of a method for use after the first call
to the method, along with the pointer to the method block as a tag.
Instruction decode unit 130 uses index bytes one and two to
generate the pointer and then uses the pointer to retrieve the
method block for that pointer in cache 134. This permits building
the stack frame for the newly invoked method more rapidly in the
background in subsequent invocations of the method. Alternative
embodiments may use a program counter or method identifier as a
reference into cache 134. If there is a cache miss, the instruction
is executed in the normal fashion and cache 134 is updated
accordingly. The particular process used to determine which cache
entry is overwritten is not an essential aspect of this invention.
A least-recently used criterion could be implemented, for
example.
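A minimal sketch of the first embodiment, using the least-recently used criterion mentioned above, follows. The capacity, the names, and the slow_lookup callable (standing in for normal execution on a miss) are hypothetical.

```python
from collections import OrderedDict


class MethodArgumentCache:
    """Illustrative cache of method blocks tagged by method-block pointer."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = OrderedDict()  # pointer tag -> method block, oldest first

    def lookup(self, pointer, slow_lookup):
        """Return (method_block, hit); on a miss, fill the cache via slow_lookup."""
        if pointer in self.blocks:
            self.blocks.move_to_end(pointer)   # mark as most recently used
            return self.blocks[pointer], True
        block = slow_lookup(pointer)           # miss: execute in the normal fashion
        self.blocks[pointer] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used entry
        return block, False
```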
[0137] In an alternative embodiment, method argument cache 134 is
used to store the pointer to the method block, for use after the
first call to the method, along with the value of program counter
PC of the method as a tag. Instruction decode unit 130 uses the
value of program counter PC to access cache 134. If the value of
program counter PC is equal to one of the tags in cache 134, cache
134 supplies the pointer stored with that tag to instruction decode
unit 130. Instruction decode unit 130 uses the supplied pointer to
retrieve the method block for the method. In view of these two
embodiments, other alternative embodiments will be apparent to
those of skill in the art.
[0138] Wide index forwarder 136, which is an optional element of
hardware processor 100, is a specific embodiment of instruction
folding for instruction wide. Wide index forwarder 136 handles an
opcode encoding an extension of an index operand for an immediately
subsequent virtual machine instruction. In this way, wide index
forwarder 136 allows instruction decode unit 130 to provide indices
into local variable storage 421 when the number of local variables
exceeds that addressable with a single byte index without incurring
a separate execution cycle for instruction wide.
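The effect of forwarding the wide index can be sketched as a single decode step. The sketch assumes an encoding in which instruction wide carries one operand byte that supplies the high eight bits of the index of the immediately following instruction; that layout, and the function name, are assumptions for illustration. The opcode values 196 (wide) and 21 (iload) follow the published JAVA virtual machine instruction set.

```python
WIDE = 196   # opcode of instruction wide
ILOAD = 21   # opcode of instruction iload


def decode_local_index(code, pc):
    """Return (opcode, local variable index, bytes consumed) starting at pc.

    Assumed layouts: [opcode, index] or [WIDE, high byte, opcode, low byte].
    """
    if code[pc] == WIDE:
        high = code[pc + 1]
        opcode = code[pc + 2]
        low = code[pc + 3]
        # The extension is merged during decode, so no separate operation
        # is issued for instruction wide.
        return opcode, (high << 8) | low, 4
    return code[pc], code[pc + 1], 2
```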
[0139] Aspects of instruction decoder 135, particularly instruction
folding, non-quick to quick translator cache 131, current object
loader folder 132, branch predictor 133, method argument cache 134,
and wide index forwarder 136 are also useful in implementations
that utilize a software interpreter or just-in-time compiler, since
these elements can be used to accelerate the operation of the
software interpreter or just-in-time compiler. In such an
implementation, typically, the virtual machine instructions are
translated to an instruction for the processor executing the
interpreter or compiler, e.g., any one of a Sun processor, a DEC
processor, an Intel processor, or a Motorola processor, for
example, and the operation of the elements is modified to support
execution on that processor. The translation from the virtual
machine instruction to the other processor instruction can be done
either with a translator in a ROM or a simple software translator.
For additional examples of dual instruction set processors, see
U.S. patent application Ser. No. 08/787,618, entitled "A PROCESSOR
FOR EXECUTING INSTRUCTION SETS RECEIVED FROM A NETWORK OR FROM A
LOCAL MEMORY" naming Marc Tremblay and James Michael O'Connor as
inventors, now U.S. Pat. No. 5,925,123, assigned to the assignee of
this application, and filed on Jan. 23, 1997 with Attorney Docket
No. SP2042, which is incorporated herein by reference in its
entirety.
[0140] As explained above, one embodiment of processor 100
implements instruction folding to enhance the performance of
processor 100. In general, instruction folding in accordance with
the present invention can be used in any stack-based virtual
machine implementation, including, e.g., in a hardware processor
implementation, in a software interpreter implementation, in a
just-in-time compiler implementation, etc. Thus, while various
embodiments of instruction folding are described in the following
more detailed description in terms of a hardware processor, those
of skill in the art will appreciate, in view of this description,
suitable extensions of instruction folding to other stack-based
virtual machine implementations.
[0141] FIG. 14 illustrates folded execution of first and second
stack instructions, according to the principles of this invention.
In this embodiment, a first operand for an addition instruction
resides in top-of-stack (TOS) entry 1411a of stack 1410. A second
operand resides in entry 1412 of stack 1410. Notice that entry 1412
is not physically adjacent to top-of-stack entry 1411a and in
fact, is in the interior of stack 1410. An instruction stream
includes a load top-of-stack instruction for pushing the second
operand onto the top of stack (see description of instruction iload
in Section I) and an addition instruction for operating on the
first and second operands residing in the top two entries of stack
1410 (see description of instruction iadd in Section I). However,
to speed execution of the instruction stream, the load top-of-stack
and addition instructions are folded into a single operation
whereby the explicit sequential execution of the load top-of-stack
instruction and the associated execution cycle are eliminated.
Instead, a folded operation corresponding to the addition
instruction operates on the first and second operands, which reside
in TOS entry 1411a and entry 1412 of stack 1410. The result of the
folded operation is pushed onto stack 1410 at TOS entry 1411b.
Thus, folding according to the principles of this invention
enhances performance compared to an unfolded method for executing
the same sequence of instructions.
[0142] Without instruction folding, a first operand for an addition
instruction resides in top-of-stack (TOS) entry 1311a of stack 1310
(see FIG. 13). A second operand resides in entry 1312 of stack
1310. A load to top-of-stack instruction pushes the second operand
onto the top of stack 1310 and typically requires an execution
cycle. The push results in the second and first operands residing
in TOS entry 1311b and (TOS-1) entry 1313, respectively.
Thereafter, the addition instruction operates, in another execution
cycle, on the first and second operands which properly reside in
the top two entries, i.e., TOS entry 1311b and (TOS-1) entry 1313,
of stack 1310 in accordance with the semantics of a stack
architecture. The result of the addition instruction is pushed onto
stack 1310 at TOS entry 1311c and after the addition instruction is
completed, it is as if the first and second operand data were never
pushed onto stack 1310. As described above, folding reduces the
execution cycles required to complete the addition and so enhances
the speed of execution of the instruction stream. More complex
folding, e.g., folding including store instructions and folding
including larger numbers of instructions, is described in greater
detail below.
[0143] In general, instruction decoder unit 130 (FIG. 1) examines
instructions in a stream of instructions. Instruction decoder unit
130 folds first and second adjacent instructions together and
provides a single equivalent operation for execution by execution
unit 140 when instruction decoder unit 130 detects that the first
and second instructions have neither structural nor resource
dependencies and the second instruction operates on data provided
by the first instruction. Execution of the single operation
obtains the same result as execution of an operation corresponding
to the first instruction followed by execution of an operation
corresponding to the second instruction, except that an execution
cycle has been eliminated.
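The equivalence between the unfolded and folded cases may be illustrated with a small sketch in which the stack is a randomly accessible list. The function names and the cycle counts returned are illustrative assumptions; one cycle per operation is assumed.

```python
def run_unfolded(stack, src):
    """Load stack[src] to the top of stack, then add the top two entries."""
    stack = list(stack)
    stack.append(stack[src])           # cycle 1: load to top of stack
    operand2 = stack.pop()
    operand1 = stack.pop()
    stack.append(operand1 + operand2)  # cycle 2: add
    return stack, 2


def run_folded(stack, src):
    """Single folded operation: add the interior entry directly to the top entry."""
    stack = list(stack)
    stack[-1] = stack[-1] + stack[src]  # one cycle, no intermediate push
    return stack, 1
```

Both functions leave the stack in the same final state; only the number of cycles differs.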
[0144] As described above, the JAVA virtual machine is
stack-oriented and specifies an instruction set, a register set, an
operand stack, and an execution environment. Although the present
invention is described in relation to the JAVA Virtual Machine,
those of skill in the art will appreciate that the invention is not
limited to embodiments implementing or related to the JAVA virtual
machine and, instead, encompasses systems, articles, methods, and
apparati for a wide variety of stack machine environments, both
virtual and physical.
[0145] As illustrated in FIG. 4A, according to the JAVA Virtual
Machine Specification, each method has storage allocated for an
operand stack and a set of local variables. Similarly, in the
embodiment of FIG. 15 (see also FIG. 4A), a series of method frames,
e.g., method frame 1501 and method frame 1502 on stack 1503, each
include an operand stack instance, local variable storage instance,
and frame state information instance for respective methods invoked
along the execution path of a JAVA program. A new frame is created
and becomes current each time a method is invoked and is destroyed
after the method completes execution. A frame ceases to be current
if its method invokes another method. On method return, the current
frame passes back the result of its method invocation, if any, to
the previous frame via stack 1503. The current frame is then
discarded and the previous frame becomes current. Folding in
accordance with the present invention, as described more completely
below, is not dependent upon a particular process used to allocate
or define memory space for a method, such as a frame, and can, in
general, be used in any stack based architecture.
[0146] This series of method frames may be implemented in any of a
variety of suitable memory hierarchies, including for example
register/cache/memory hierarchies. However, irrespective of the
memory hierarchy chosen, an operand stack instance 1512 (FIG. 15)
is implemented in randomly-accessible storage 1510, i.e., at least
some of the entries in operand stack instance 1512 can be accessed
from locations other than the top most locations of operand stack
instance 1512 in contrast with a conventional stack implementation
in which only the top entry or topmost entries of the stack can be
accessed. As described above, register OPTOP stores a pointer that
identifies the top of operand stack instance 1512 associated with
the current method. The value stored in register OPTOP is
maintained to identify the top entry of an operand stack instance
corresponding to the current method.
[0147] In addition, local variables for the current method are
represented in randomly-accessible storage 1510. A pointer stored
in register VARS identifies the starting address of local variable
storage instance 1513 associated with the current method. The value
in register VARS is maintained to identify a base address of the
local variable storage instance corresponding to the current
method.
[0148] Entries in operand stack instance 1512 and local variable
storage instance 1513 are referenced by indexing off of values
represented in registers OPTOP and VARS, respectively, that in the
embodiment of FIG. 1 are included in register set 144, and in the
embodiment of FIG. 15 are included in pointer registers 1522.
Pointer registers 1522 may be represented in physical registers of
a processor implementing the JAVA Virtual Machine, or optionally,
in randomly-accessible storage 1510. In an exemplary embodiment,
commonly used offsets OPTOP-1, OPTOP-2, VARS+1, VARS+2, and VARS+3
are derived from the values in registers OPTOP and VARS,
respectively. Alternatively, the additional offsets could be stored
in registers of pointer registers 1522.
[0149] Operand stack instance 1512 and local variable storage
instance 1513 associated with the current method are preferably
represented in a flat 64-entry cache, e.g., stack cache 155 (see
FIG. 1) whose contents are kept updated so that a working set of
operand stack and local variable storage entries are cached.
However, depending on the size of the current frame, the current
frame including operand stack instance 1512 and local variable
storage instance 1513 may be fully or partially represented in the
cache. Operand stack and local variable storage entries for frames
other than the current frame may also be represented in the cache
if space allows. A suitable representation of a cache suitable for
use with the folding of this invention is described in greater
detail in U.S. patent application Ser. No. 08/787,736, entitled
"METHODS AND APPARATI FOR STACK CACHING" naming Marc Tremblay and
James Michael O'Connor as inventors, assigned to the assignee of
this application, and filed on Jan. 23, 1997, the detailed
description of which is incorporated herein by reference, and in
U.S. patent application Ser. No. 08/787,617, entitled "METHOD FRAME
STORAGE USING MULTIPLE MEMORY CIRCUITS" naming Marc Tremblay and
James Michael O'Connor as inventors, assigned to the assignee of
this application, and filed on Jan. 23, 1997, the detailed
description of which also is incorporated herein by reference.
However, other representations, including separate and/or uncached
operand stack and local variable storage areas, are also
suitable.
[0150] In addition to method frames and their associated operand
stack and local variable storage instances, a constant area 1514 is
provided in the address space of a processor implementing the JAVA
virtual machine for commonly-used constants, e.g., constants
specified by JAVA virtual machine instructions such as instruction
iconst. In some cases, an operand source is represented as an index
into constant area 1514. In the embodiment of FIG. 15, constant
area 1514 is represented in randomly-accessible storage 1510.
Optionally, entries of constant area 1514 could also be cached,
e.g., in stack cache 155.
[0151] Although those of skill in the art will recognize the
advantages of maintaining an operand stack and local variable
storage instance for each method, as well as the opportunities for
passing parameters and results created by maintaining the various
instances of operand stack and local variable storage in a
stack-oriented structure, in the interest of clarity, the
description which follows focuses on the particular instances
(operand stack instance 1512 and local variable storage instance
1513) of each associated with the current method. Hereafter, these
particular instances of an operand stack and local variable storage
are referred to simply as operand stack 1512 and local variable
storage 1513. Despite this simplification for purposes of
illustration, those of skill in the art will appreciate that
operand stack 1512 and local variable storage 1513 refer to any
instances of an operand stack and variable storage associated with
the current method, including representations which maintain
separate instances for each method and representations which
combine instances into a composite representation.
[0152] Operand sources and result targets for JAVA Virtual Machine
instructions typically identify entries of operand stack instance
1512 or local variable storage instance 1513, i.e., identify
entries of the operand stack and local variable storage for the
current method. By way of example, and not limitation,
representative JAVA virtual machine instructions are described in
Chapter 3 of The JAVA Virtual Machine Specification which is
included at Section I.
[0153] JAVA virtual machine instructions rarely explicitly
designate both the source of the operand, or operands, and the
result destination. Instead, either the source or the destination
is implicitly the top of operand stack 1512. Some JAVA bytecodes
explicitly designate neither a source nor a destination. For
example, instruction iconst_0 pushes a constant integer zero
onto operand stack 1512. The constant zero is implicit in the
instruction, although the instruction may actually be implemented
by a particular JAVA virtual machine implementation using a
representation of the value zero from a pool of constants, such as
constant area 1514, as the source for the zero operand. An
instruction decoder for a JAVA virtual machine implementation that
implements instruction iconst_0 in this way could generate,
as the source address, the index of the entry in constant area 1514
where the constant zero is represented.
[0154] Prior to considering the various embodiments of folding in
accordance with the present invention, it is informative to
consider execution of JAVA virtual machine instructions, such as
the iadd instruction and the arraylength instruction, without the
folding process. After the operations associated with typical
execution of JAVA virtual machine instructions are understood, the
advantages of this invention will be more apparent. Further, this
understanding will assist those of skill in the art in extending
the invention to other stack-based architectures that do not rely
upon the JAVA virtual machine instructions.
[0155] Focusing illustratively on operand stack and local variable
storage structures associated with the current method and referring
now to FIGS. 16A-D, the JAVA virtual machine integer add
instruction, iadd, generates the sum of first and second integer
operands, referred to as operand1 and operand2, respectively, that
are at the top two locations of operand stack 1512. The top two
locations are identified, at the time of instruction iadd
execution, by pointer OPTOP in register OPTOP and by pointer
OPTOP-1. The result of the execution of instruction iadd, i.e., the
sum of first and second integer operands, is pushed onto operand
stack 1512.
[0156] FIG. 16A shows the state of operand stack 1512 and local
variable storage 1513 that includes first and second values,
referred to as value1 and value2, before execution of a pair of
JAVA virtual machine integer load instructions iload. In FIG. 16A,
pointer OPTOP has the value AAC0h.
[0157] FIG. 16B shows operand stack 1512 after execution of the
pair of instructions iload that load integer values from local
variable storage 1513 onto operand stack 1512, pushing (i.e.,
copying) values value1 and value2 from locations identified by
pointer VARS in register VARS and by pointer VARS+2 onto operand
stack 1512 as operand1 at location AAC4h and operand2 at location
AAC8h, and updating pointer OPTOP in the process to value AAC8h.
FIG. 16C shows operand stack 1512 after instruction iadd has been
executed. Execution of instruction iadd pops operands operand1 and
operand2 off operand stack 1512, calculates the sum of operands
operand1 and operand2, and pushes that sum onto operand stack 1512
at location AAC4h. After execution of instruction iadd, pointer
OPTOP has the value AAC4h and points to the operand stack 1512
entry storing the sum.
[0158] FIG. 16D shows operand stack 1512 after an instruction
istore has been executed. Execution of instruction istore pops the
sum off operand stack 1512 and stores the sum in the local variable
storage 1513 entry at the location identified by pointer
VARS+2.
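The unfolded sequence of FIGS. 16A-D can be traced with the following minimal sketch. The four-byte entry spacing is inferred from the addresses AAC0h, AAC4h, and AAC8h given above; the base address used for local variable storage (AA80h), the sample values, and the class name are hypothetical.

```python
ENTRY = 4  # bytes per entry, inferred from addresses AAC0h, AAC4h, AAC8h


class Frame:
    """Illustrative operand stack and local variable storage for one method."""

    def __init__(self, optop, vars_base, local_values):
        self.optop = optop     # address of the top operand stack entry
        self.vars = vars_base  # base address of local variable storage
        self.mem = {vars_base + i * ENTRY: v for i, v in enumerate(local_values)}

    def push(self, value):
        self.optop += ENTRY
        self.mem[self.optop] = value

    def pop(self):
        value = self.mem.pop(self.optop)
        self.optop -= ENTRY
        return value

    def iload(self, n):
        """Copy local variable n onto the operand stack."""
        self.push(self.mem[self.vars + n * ENTRY])

    def iadd(self):
        """Pop two operands and push their sum (32-bit wraparound not modeled)."""
        operand2 = self.pop()
        operand1 = self.pop()
        self.push(operand1 + operand2)

    def istore(self, n):
        """Pop the top of stack into local variable n."""
        self.mem[self.vars + n * ENTRY] = self.pop()
```

Starting from OPTOP at AAC0h, the two loads move OPTOP to AAC8h, instruction iadd leaves the sum at AAC4h, and instruction istore returns OPTOP to AAC0h with the sum in the entry at VARS+2.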
[0159] Variations for other instructions which push operands onto
operand stack 1512 and which operate on values residing at the top
of operand stack 1512 will be apparent to those of skill in the
art. For example, variations for alternate operations and for data
types requiring multiple operand stack 1512 entries, e.g., long
integer values, double-precision floating point values, etc., will
be apparent to those of skill in the art in view of this
disclosure.
[0160] The folding example of FIGS. 17A-C is analogous to that
illustrated with reference to FIGS. 16A-D, though with only load
folding illustrated. Execution of JAVA virtual machine length of
array instruction arraylength determines the length of an array
whose object reference pointer objectref is at the top of operand
stack 1512, and pushes the length onto operand stack 1512. FIG. 17A
shows the state of operand stack 1512 and local variable storage
1513 before execution of JAVA virtual machine reference load
instruction aload that is used to load an object reference from
local variable storage 1513 onto the top of operand stack 1512. In
FIG. 17A, pointer OPTOP has the value AAC0h.
[0161] FIG. 17B shows operand stack 1512 after execution of
instruction aload pushes, i.e., copies, object reference pointer
objectref onto the top of operand stack 1512 and updates pointer
OPTOP to AAC4h in the process.
[0162] FIG. 17C shows operand stack 1512 after instruction
arraylength has been executed. Execution of instruction arraylength
pops object reference pointer objectref off operand stack 1512,
calculates the length of the array referenced thereby, and pushes
that length onto operand stack 1512. Suitable implementations of
the instruction arraylength may supply object reference pointer
objectref to an execution unit, e.g., execution unit 140, which
subsequently overwrites the object reference pointer objectref with
the value length. Whether the object reference pointer objectref is
popped from operand stack 1512 or simply overwritten, after
execution of instruction arraylength, pointer OPTOP has the value
AAC4h and points to the operand stack 1512 entry storing the value
length.
[0163] FIG. 18 illustrates a processor 1800 wherein loads, such as
those illustrated in FIGS. 16A and 16B and in FIGS. 17A and 17B,
are folded into execution of subsequent instructions, e.g., into
execution of subsequent instruction iadd, or instruction
arraylength. In this way, intermediate execution cycles associated
with loading operands operand1 and operand2 for instruction iadd, or
with loading pointer objectref for instruction arraylength onto the
top of operand stack 1512 can be eliminated. As a result, single
cycle execution of groups of JAVA virtual machine instructions,
e.g., the group of instructions iload, iload, iadd, and istore, or
the group of instructions aload and arraylength, is provided by
processor 1800. One embodiment of processor 1800 is presented in
FIG. 1 as hardware processor 100. However, hardware processor 1800
includes other embodiments that do not include the various
optimizations of hardware processor 100. Further, the folding
processes described below could be implemented in a software
interpreter or included within a just-in-time compiler. In the
processor 1800 embodiment of FIG. 18, stores such as that
illustrated in FIG. 16D, are folded into execution of prior
instructions, e.g., in FIG. 16D, into execution of the immediately
prior instruction iadd.
[0164] The instruction folding is provided primarily by instruction
decoder 1818. Instruction decoder 1818 retrieves fetched
instructions from instruction buffer 1816 and depending upon the
nature of instructions in the fetched instruction sequence,
supplies execution unit 1820 with decoded operation and operand
addressing information implementing the instruction sequence as a
single folded operation. Unlike instructions of the JAVA virtual
machine instruction set to which the instruction sequence from
instruction buffer 1816 conforms, decoded operations supplied to
execution unit 1820 by instruction decoder 1818 operate on operand
values represented in entries of local variable storage 1513,
operand stack 1512, and constant area 1514.
[0165] In the exemplary embodiment of FIG. 18, valid operand
sources include local variable storage 1513 entries identified by
pointers VARS, VARS+1, VARS+2, and VARS+3, as well as operand stack
1512 entries identified by pointers OPTOP, OPTOP-1 and OPTOP-2.
Similarly, valid result targets include local variable storage 1513
entries identified by operands VARS, VARS+1, VARS+2, and VARS+3.
Embodiments in accordance with FIG. 18 may also provide for
constant area 1514 entries as valid operand sources as well as
other locations in operand stack 1512 and local variable storage
1513.
[0166] Referring now to FIGS. 18 and 19, a sequence of JAVA virtual
machine instructions is fetched from memory and loaded into
instruction buffer 1816. Conceptually, instruction buffer 1816 is
organized as a shift register for JAVA bytecodes. One or more
bytecodes are decoded by instruction decoder 1818 during each cycle
and operations are supplied to execution unit 1820 in the form of a
decoded operation on instruction decode bus instr_dec and
associated operand source and result destination addressing
information on instruction address bus instr_addr. Instruction
decoder 1818 also provides an instruction valid signal instr_valid
to execution unit 1820. When asserted, signal instr_valid indicates
that the information on instruction decode bus instr_dec specifies
a valid operation.
[0167] One or more bytecodes are shifted out of instruction buffer
1816 to instruction decode unit 1818 each cycle in correspondence
with the supply of decoded operations and operand addressing
information to execution unit 1820, and subsequent undecoded
bytecodes are shifted into instruction buffer 1816. For normal
decode operations, a single instruction is shifted out of
instruction buffer 1816 and decoded by instruction decode unit
1818, and a single corresponding operation is executed by execution
unit 1820 during each instruction cycle.
[0168] In contrast, for folded decode operations, multiple
instructions, e.g., a group of instructions, are shifted out of
instruction buffer 1816 to instruction decode unit 1818. In
response to the multiple instructions, instruction decode unit 1818
generates a single equivalent folded operation that in turn is
executed by execution unit 1820 during each instruction cycle.
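By way of illustration only, the shift-register behavior of instruction buffer 1816 can be modeled in software. The following Python sketch is not part of the described hardware; the class name, the buffer depth, and the treatment of each list element as one whole instruction are assumptions made for the example.

```python
from collections import deque


class InstructionBuffer:
    """Software model of a bytecode shift register (illustrative only)."""

    def __init__(self, fetched, capacity=4):
        self.pending = deque(fetched)  # fetched but not yet buffered
        self.slots = deque()           # instructions visible to the decoder
        self.capacity = capacity       # hypothetical buffer depth
        self._refill()

    def _refill(self):
        # Shift subsequent undecoded instructions into the freed positions.
        while self.pending and len(self.slots) < self.capacity:
            self.slots.append(self.pending.popleft())

    def shift_out(self, count):
        # Shift `count` instructions out to the decoder, then refill.
        count = min(count, len(self.slots))
        group = [self.slots.popleft() for _ in range(count)]
        self._refill()
        return group
```

In this model a normal decode corresponds to `shift_out(1)`, while a folded decode shifts out the whole foldable group at once, e.g., `shift_out(4)`.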
[0169] Referring illustratively to the instruction sequence
described above with reference to FIGS. 16A-16D, instruction
decoder 1818 selectively decodes bytecodes associated with four
JAVA virtual machine instructions:
[0170] 1. iload value1;
[0171] 2. iload value2;
[0172] 3. iadd; and
[0173] 4. istore,
[0174] that were described in the above description of FIGS. 16A-D.
As now described, both instructions iload and the instruction
istore are folded by instruction decoder 1818 into an add operation
corresponding to instruction iadd. Although operation of
instruction decoder 1818 is illustrated using a foldable sequence
of four instructions, those of skill in the art will appreciate
that the invention is not limited to four instructions. Foldable
sequences of two, three, four, five, or more instructions are
envisioned. For example, more than one instruction analogous to the
instruction istore and more than two instructions analogous to the
instructions iload may be included in foldable sequences.
[0175] Instruction decoder 1818 supplies decoded operation
information over bus instr_dec and associated operand source and
result destination addressing information over bus instr_addr
specifying that execution unit 1820 is to add the contents of local
variable storage 1513 location 0, that is identified by pointer
VARS, and local variable storage 1513 location 2, that is
identified by pointer VARS+2, and store the result in local
variable storage 1513 location 2, that is identified by pointer
VARS+2. In this way, the two load instructions are folded into
execution of an operation corresponding to instruction iadd. Two
instruction cycles and the intermediate data state illustrated in
FIG. 16B are eliminated. In addition, instruction istore is also
folded into execution of the operation corresponding to instruction
iadd, eliminating another instruction cycle, for a total of three,
and the intermediate data state illustrated in FIG. 16C. In various
alternative embodiments, instruction folding in accordance with the
present invention may eliminate loads, stores, or both loads and
stores.
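The equivalence of the four-instruction sequence and the single folded operation can be checked with a small software model. The following Python sketch is illustrative only; the function names, the list-based stack, and the one-cycle-per-operation count are assumptions, and 32-bit wraparound of the integer add is omitted for brevity.

```python
def run_unfolded(local_vars):
    """Execute iload 0; iload 2; iadd; istore 2 one instruction at a time."""
    lv = list(local_vars)
    stack = []
    cycles = 0
    stack.append(lv[0]); cycles += 1   # iload value1 from local variable 0
    stack.append(lv[2]); cycles += 1   # iload value2 from local variable 2
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b); cycles += 1   # iadd
    lv[2] = stack.pop(); cycles += 1   # istore sum to local variable 2
    return lv, stack, cycles


def run_folded(local_vars):
    """Execute the single equivalent folded add: VARS+2 <- VARS + VARS+2."""
    lv = list(local_vars)
    lv[2] = lv[0] + lv[2]              # operand stack is never touched
    return lv, [], 1
```

Both functions leave the local variables and the operand stack in the same final state; the folded form takes one modeled cycle instead of four, i.e., three cycles are eliminated.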
[0176] FIG. 20 depicts an exemplary embodiment of an instruction
decoder 1818 providing both folded and unfolded decoding of
bytecodes. Selection of a folded or unfolded operating mode for
instruction decoder 1818 is based on the particular sequence of
bytecodes fetched into instruction buffer 1816 and subsequently
accessed by instruction decoder 1818. A normal decode portion 2002
and a fold decode portion 2004 of instruction decoder 1818 are
configured in parallel to provide support for unfolded and folded
execution, respectively.
[0177] In the embodiment of FIG. 20, fold decode portion 2004
detects opportunities for folding execution of bytecodes in the
bytecode sequence fetched into instruction buffer 1816. A detection
of such a foldable sequence triggers selection of the output of
fold decode portion 2004, rather than normal decode portion 2002,
for provision to execution unit 1820. Advantageously, selection of
folded or unfolded decoding is transparent to execution unit 1820,
which simply receives operation information over bus instr_dec and
associated operand source and result destination addressing
information over bus instr_addr, and which need not know whether
the information corresponds to a single instruction or a folded
instruction sequence.
[0178] Normal decode portion 2002 functions to inspect a single
bytecode from instruction buffer 1816 during each instruction
cycle, and generates the following indications in response
thereto:
[0179] 1. a normal instruction decode signal n_instr_dec, which
specifies an operation, e.g., integer addition, corresponding to
the decoded instruction, is provided to a first set of input
terminals of switch 2006;
[0180] 2. a normal address signal n_adr, which makes explicit the
source and destination addresses, e.g., first operand
address=OPTOP, second operand address=OPTOP-1, and destination
address=OPTOP-1 for an instruction iadd, for the decoded
instruction, is provided to a first bus input of switch 2010;
[0181] 3. a net change in pointer OPTOP signal n_delta_optop, e.g.,
for the instruction iadd, net change=-1, which in the embodiment of
FIG. 20 is encoded as a component of normal address signal n_adr;
and
[0182] 4. an instruction valid signal instr_valid, which indicates
whether normal instruction decode signal n_instr_dec specifies a
valid operation, is provided to a first input terminal of switch
2008.
[0183] In contrast with normal decode portion 2002, and as
discussed in greater detail below, fold decode portion 2004 of
instruction decoder 1818 inspects sequences of bytecodes from the
instruction buffer 1816 and determines whether operations
corresponding to these sequences (e.g., the sequence iload value1
from local variable 0, iload value2 from local variable 2, iadd,
and istore sum to local variable 2) can be folded together to
eliminate unnecessary temporary storage of instruction operands
and/or results on the operand stack. When fold decode portion 2004
determines that a sequence of bytecodes in instruction buffer 1816
can be folded together, fold decode portion 2004 generates the
following indications:
[0184] 1. a folded instruction decode signal f_instr_dec, which
specifies an equivalent operation, e.g., integer addition
corresponding to the folded instruction sequence, is provided to a
second set of input terminals of switch 2006;
[0185] 2. a folded address signal f_adr, which specifies source and
destination addresses for the equivalent operation, e.g., first
operand address=VARS, second operand address=VARS+2, and
destination address=VARS+2, is provided to a second bus input of
switch 2010;
[0186] 3. a net change in pointer OPTOP signal f_delta_optop, e.g.,
for the above sequence net change=0, which in the embodiment of
FIG. 20 is encoded as a component of folded address signal f_adr;
and
[0187] 4. a folded instruction valid signal f_valid, which
indicates whether folded instruction decode signal f_instr_dec
specifies a valid operation, is provided to a second input terminal
of switch 2008.
[0188] Fold decode portion 2004 also generates a signal on fold
line f/nf which indicates whether a sequence of bytecodes in
instruction buffer 1816 can be folded together. The signal on fold
line f/nf is provided to control inputs of switches 2006, 2010 and
2008. If a sequence of bytecodes in instruction buffer 1816 can be
folded together, the signal on fold line f/nf causes switches 2006,
2010 and 2008 to select respective second inputs for provision to
execution unit 1820, i.e., to source folded instruction decode
signal f_instr_dec, folded address signal f_adr, and folded
instruction valid signal f_valid from fold decode portion 2004. If
a sequence of bytecodes in instruction buffer 1816 cannot be folded
together, the signal on fold line f/nf causes switches 2006, 2010
and 2008 to select respective first inputs for provision to
execution unit 1820, i.e., to source normal instruction decode
signal n_instr_dec, normal address signal n_adr, and normal
instruction valid signal n_valid from normal decode portion 2002.
[0189] In some embodiments in accordance with the present
invention, the operation of fold decode portion 2004 is suppressed
in response to an active suppress folding signal suppress_fold
supplied from outside instruction decoder 1818. In response to an
asserted suppress folding signal suppress_fold (see FIG. 21), the
signal on fold line f/nf remains in a state selective for
respective first inputs of switches 2006, 2010 and 2008 even if the
particular bytecode sequence presented by instruction buffer 1816
would otherwise trigger folding. For example, in one such
embodiment, suppress folding signal suppress_fold is asserted when
the local variable storage 1513 entry identified by pointer VARS is
not cached, e.g., when entries in operand stack 1512 have displaced
local variable storage 1513 from a stack cache 155. In accordance
with the exemplary embodiment described therein, a stack cache and
cache control mechanism representing at least a portion of operand
stack 1512 and local variable storage 1513 may advantageously
assert suppress folding signal suppress_fold if fold-relevant
entries of local variable storage 1513 or operand stack 1512 are
not represented in stack cache 155.
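The selection performed by switches 2006, 2010 and 2008, including the effect of suppress folding signal suppress_fold, can be sketched as a simple multiplexer. The following Python model is illustrative only; the tuple type and function name are assumptions and do not correspond to any element of the figures.

```python
from collections import namedtuple

# Bundle of the three indications produced by either decode portion.
DecodeOutput = namedtuple("DecodeOutput", ["instr_dec", "adr", "valid"])


def select_decode(normal, folded, foldable, suppress_fold=False):
    """Model of the f/nf-controlled selection between decode portions.

    `foldable` stands for the fold determination result; an asserted
    `suppress_fold` keeps the switches on their first (normal) inputs
    even when the bytecode sequence would otherwise trigger folding.
    """
    f_nf = foldable and not suppress_fold
    return folded if f_nf else normal
```

The execution unit in this model sees only the returned `DecodeOutput` and cannot tell which decode portion produced it.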
[0190] FIG. 21 illustrates fold decode portion 2004 of instruction
decoder 1818 in greater detail. A fold determination portion 2104
selectively inspects the sequence of bytecodes in instruction
buffer 1816. If the next bytecode and one or more subsequent
bytecodes represent a foldable sequence of operations (as discussed
below with respect to FIG. 22), then fold determination portion
2104 supplies a fold-indicating signal on fold line f/nf and a
folded instruction decode signal f_instr_dec that specifies an
equivalent folded operation. Folded instruction decode signal
f_instr_dec is supplied to execution unit 1820 as the decoded
instruction instr_dec. In an exemplary embodiment, a foldable
sequence of operations includes those associated with 2, 3, or 4
bytecodes from instruction buffer 1816 (up to 2 bytecodes loading
operands onto operand stack 1512, a bytecode popping the
operand(s), operating thereupon, and pushing a result onto operand
stack 1512, and a bytecode popping the result from operand stack
1512 and storing the result). The equivalent folded operation, which
is encoded by the folded instruction decode signal f_instr_dec,
specifies an operation, that when combined with folded execution
addressing information obviates the loads to, and stores from,
operand stack 1512.
[0191] Alternative embodiments may fold only two instructions,
e.g., an instruction iload into an instruction iadd or an
instruction istore back into an immediately prior instruction iadd.
Other alternative embodiments may fold only instructions that push
operands onto the operand stack, e.g., one or more instructions
iload folded into an instruction iadd, or only instructions that
pop results from the operand stack, e.g., an instruction istore
back into an immediately prior instruction iadd. Further
alternative embodiments may fold larger numbers of instructions
that push operands onto the operand stack and/or instructions that
pop results from the operand stack, in accordance with
instructions of a particular virtual machine instruction set. In
such alternative embodiments, the above described advantages over
normal decoding and execution of instruction sequences are still
obtained.
[0192] Fold determination portion 2104 generates a fold
address index composite signal f_adr_ind including component first
operand index signal first_adr_ind, second operand index signal
second_adr_ind, and destination index signal dest_adr_ind, which
are respectively selective for a first operand address, a second
operand address, and a destination address for the equivalent
folded operation. Fold determination portion 2104 provides the
composite signal f_adr_ind to fold address generator 2102 for use
in supplying operand and destination addresses for the equivalent
folded operation. Fold determination portion 2104 asserts a
fold-indicating signal on fold line f/nf to control the switches
2006, 2010 and 2008 (see FIG. 20) to provide the signals
f_instr_dec, f_adr, and f_valid, as signals instr_dec, instr_addr,
and instr_valid, respectively. Otherwise respective signals are
provided to execution unit 1820 from normal decode portion
2002.
[0193] The operation of fold determination portion 2104 is now
described with reference to the flowchart of FIG. 22. At start
2201, fold determination portion 2104 begins an instruction decode
cycle and transfers processing to initialize index 2202. In
initialize index 2202, an instruction index instr_index into
instruction buffer 1816 is initialized to identify the next
bytecode of a bytecode sequence in instruction buffer 1816. In an
exemplary embodiment, instruction index instr_index is initialized to
one (1) and the next bytecode is the first bytecode in instruction
buffer 1816 since prior bytecodes have already been shifted out of
instruction buffer 1816, although a variety of other indexing and
instruction buffer management schemes would also be suitable. Upon
completion, initialize index 2202 transfers processing to first
instruction check 2204.
[0194] In first instruction check 2204, fold determination portion
2104 determines whether the instruction identified by index
instr_index, i.e., the first bytecode, corresponds to an operation
that pushes a value, e.g., an integer value, a floating point
value, a reference value, etc., onto operand stack 1512. Referring
illustratively to a JAVA virtual machine embodiment, first
instruction check 2204 determines whether the instruction
identified by index instr_index is one that the JAVA virtual
machine specification (see Section I) defines as for pushing a
first data item onto the operand stack. If so, first operand index
signal first_adr_ind is asserted (at first operand address setting
2206) to identify the source of the first operand value. In an
exemplary embodiment, first operand index signal first_adr_ind is
selective for one of OPTOP, OPTOP-1, OPTOP-2, VARS, VARS+1, VARS+2,
and VARS+3, although alternative embodiments may encode larger,
smaller, or different sets of source addresses, including for
example, source addresses in constant area 1514. Depending on the
bytecodes which follow, this first bytecode may correspond to an
operation which can be folded into the execution of a subsequent
operation. However, if the first bytecode does not meet the
criteria of first instruction check 2204, folding is not
appropriate and fold determination portion 2104 supplies a
nonfold-indicating signal on fold line f/nf, whereupon indications
from normal decode portion 2002 provide the decoding.
[0195] Assuming the first bytecode meets the criteria of first
instruction check 2204, index instr_index is incremented (at
incrementing 2208) to point to the next bytecode in instruction
buffer 1816. Then, at second instruction check 2210, fold
determination portion 2104 determines whether the instruction
identified by index instr_index, i.e., the second bytecode,
corresponds to an operation that pushes a value, e.g., an integer
value, a floating point value, a reference value, etc., onto
operand stack 1512. Referring illustratively to a JAVA virtual
machine embodiment, second instruction check 2210 determines
whether the instruction identified by index instr_index is one that
the JAVA virtual machine specification (see Section I) defines as
for pushing a first data item onto the operand stack. If so, second
operand index signal second_adr_ind is asserted (at second operand
address setting 2212) to indicate the source of the second operand
value and index instr_index is incremented (at incrementing 2214)
to point to the next bytecode in instruction buffer 1816. As
before, second operand index signal second_adr_ind is selective
for one of OPTOP, OPTOP-1, OPTOP-2, VARS, VARS+1, VARS+2, and
VARS+3, although alternative embodiments are also suitable. Fold
determination portion 2104 continues at third instruction check
2216 with index instr_index pointing to either the second or third
bytecode in instruction buffer 1816.
[0196] At third instruction check 2216, fold determination portion
2104 determines whether the instruction identified by index
instr_index, i.e., either the second or third bytecode, corresponds
to an operation that operates on an operand value or values, e.g.,
integer value(s), floating point value(s), reference value(s),
etc., from the uppermost entries of operand stack 1512, effectively
popping such operand values from operand stack 1512 and pushing a
result value onto operand stack 1512. Popping of operand values may
be explicit or merely a net effect of writing the result value to
an upper entry of operand stack 1512 and updating pointer OPTOP to
identify that entry as the top of operand stack 1512. Referring
illustratively to a JAVA virtual machine embodiment, third
instruction check 2216 determines whether the instruction
identified by index instr_index corresponds to an operation that
the JAVA virtual machine specification (see Section I) defines as
for popping a data item (or items) from the operand stack, for
operating on the popped data item(s), and for pushing a result of
the operation onto the operand stack. If so, index instr_index is
incremented (at incrementing 2218) to point to the next bytecode in
instruction buffer 1816. If not, folding is not appropriate and
fold determination portion 2104 supplies a nonfold-indicating
signal on fold line f/nf, whereupon normal decode portion 2002
provides decoding.
[0197] At fourth instruction check 2220, fold determination portion
2104 determines whether the instruction identified by index
instr_index, i.e., either the third or fourth bytecode, corresponds
to an operation that pops a value from operand stack 1512 and
stores the value in a data store such as local variable storage
1513. Referring illustratively to a JAVA virtual machine
embodiment, fourth instruction check 2220 determines whether the
instruction identified by index instr_index corresponds to an
operation that the JAVA virtual machine specification (see Section
I) defines as for popping the result data item from the operand
stack. If so, index signal dest_adr_ind is asserted (at
destination address setting 2222) to identify the destination of
the result value of the equivalent folded operation. Otherwise, if
the bytecode at instruction buffer 1816 location identified by
index instr_index does not match the criterion of fourth
instruction check 2220, index signal dest_adr_ind is asserted (at
destination address setting 1824) to identify the top of operand
stack 1512. Referring illustratively to a JAVA virtual machine
embodiment, if the instruction identified by index instr_index does
not match the criterion of fourth instruction check 2220, index
signal dest_adr_ind is asserted (at destination address setting
1824) to identify the pointer OPTOP. Whether the top of operand
stack 1512 or a store operation destination is selected, the folded
instruction valid signal f_valid is asserted (at valid fold
asserting 1826) and a fold-indicating signal on line f/nf is
supplied to select fold decode inputs of switches 2006, 2008, and
2010 for supply to execution unit 1820. Fold determination portion
2104 ends an instruction decode cycle at finish 2250.
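The four checks described above with reference to FIG. 22 can be summarized in a short software sketch. The following Python function is illustrative only and rests on several assumptions: instructions are modeled as tuples such as ("iload", 0); the sets of push, operate, and store mnemonics are hypothetical samples; the index is zero-based rather than initialized to one; and loads or stores outside local variables 0 through 3 are simply treated as not matching. Where only one load precedes the operation, the second operand index is left unset because the text above does not specify it.

```python
PUSH_OPS = {"iload"}           # assumed sample of push-type instructions
ALU_OPS = {"iadd", "isub"}     # assumed sample of pop-operate-push instructions
STORE_OPS = {"istore"}         # assumed sample of pop-and-store instructions


def var_slot(instr):
    """Return 'VARS' .. 'VARS+3' for local variable 0..3, else None."""
    if len(instr) == 2 and 0 <= instr[1] <= 3:
        return "VARS" if instr[1] == 0 else "VARS+%d" % instr[1]
    return None


def fold_determination(buffer):
    """Walk the checks of FIG. 22 over a list of instruction tuples."""
    def peek(i):
        return buffer[i] if i < len(buffer) else ("",)

    idx = 0                                        # initialize index
    first = peek(idx)                              # first instruction check
    if first[0] not in PUSH_OPS or var_slot(first) is None:
        return {"fold": False}                     # normal decode is used
    first_adr_ind = var_slot(first)
    idx += 1

    second_adr_ind = None                          # second instruction check
    second = peek(idx)
    if second[0] in PUSH_OPS and var_slot(second) is not None:
        second_adr_ind = var_slot(second)
        idx += 1

    op = peek(idx)                                 # third instruction check
    if op[0] not in ALU_OPS:
        return {"fold": False}
    idx += 1

    dest_adr_ind = "OPTOP"                         # fourth instruction check
    last = peek(idx)
    if last[0] in STORE_OPS and var_slot(last) is not None:
        dest_adr_ind = var_slot(last)
        idx += 1

    return {"fold": True, "f_instr_dec": op[0],
            "first_adr_ind": first_adr_ind,
            "second_adr_ind": second_adr_ind,
            "dest_adr_ind": dest_adr_ind,
            "consumed": idx}
```

For the sequence iload 0, iload 2, iadd, istore 2, this model reports a fold with operand indices VARS and VARS+2, destination index VARS+2, and four instructions consumed.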
[0198] As a simplification, an instruction decoder for hardware
processor 100, e.g., instruction decoder 135, may limit fold
decoding to instruction sequences of two instructions and/or to
sequences of single bytecode instructions. Those of skill in the
art will appreciate suitable simplifications to fold decode portion
2004 of instruction decoder 1818.
[0199] FIG. 23 shows fold address generator 2102 including three
component address generators, first operand address generator 2302,
second operand address generator 2304, and destination address
generator 2306, respectively supplying a corresponding first
operand, second operand, and destination address based on indices
supplied thereto and pointer VARS and pointer OPTOP values from
pointer registers 1522. In an exemplary embodiment, first operand
address generator 2302, second operand address generator 2304, and
destination address generator 2306 supply addresses in
randomly-accessible storage 1510 corresponding to a subset of
operand stack 1512 and local variable storage 1513 entries.
Alternative embodiments may supply identifiers selective for
storage other than random access memory, e.g., physical registers,
which in a particular JAVA virtual machine implementation provide
underlying operand stack and local variable storage.
[0200] First operand address generator 2302 receives first operand
index signal first_adr_ind from fold determination portion 2104
and, using pointer VARS and pointer OPTOP values from pointer
registers 1522, generates a first operand address signal
first_op_adr for a first operand for the equivalent folded
operation. The operation of second operand address generator 2304
and destination address generator 2306 is analogous. Second operand
address generator 2304 receives second operand index signal
second_adr_ind and generates a second operand address signal
second_op_adr for a second operand (if any) for the equivalent
folded operation. Destination address generator 2306 receives the
destination index signal dest_adr_ind and generates the destination
address signal dest_adr for the result of the equivalent folded
operation. In the embodiment of FIGS. 20, 21, and 23, first operand
address signal first_op_adr, second operand address signal
second_op_adr, and destination address signal dest_adr are
collectively supplied to switch 2010 as fold address signal f_adr
for supply to execution unit 1820 as the first operand, second
operand, and destination addresses for the equivalent folded
operation.
[0201] FIG. 24 illustrates an exemplary embodiment of first operand
address generator 2302. Second operand address generator 2304 and
destination address generator 2306 are analogous. In the exemplary
embodiment of FIG. 24, first operand address signal first_op_adr is
selected from a subset of locations in local variable storage 1513
and operand stack 1512. Alternative embodiments may generate
operand and destination addresses from a larger, smaller, or
different subset of operand stack 1512 and local variable storage
1513 locations or from a wider range of locations in
randomly-accessible storage 1510. For example, alternative
embodiments may generate addresses selective for locations in
constant area 1514. Suitable modifications to the exemplary
embodiment of FIG. 24 will be apparent to those of skill in the
art. First operand address generator 2302, second operand address
generator 2304, and destination address generator 2306 may
advantageously define differing sets of locations. For example,
whereas locations in constant area 1514 and in the interior of
operand stack 1512 are valid as operand sources, they are not
typically appropriate result targets. For this reason, the set of
locations provided by an exemplary embodiment of destination
address generator 2306 is restricted to local variable storage 1513
entries and uppermost entries of operand stack 1512, although
alternative sets are also possible. Referring to FIG. 24, pointer
OPTOP is supplied to register 2402, which latches the value and
provides the latched value to a first input of a data selector
2450. Similarly, pointer OPTOP is supplied to registers 2404 and
2406, which latch the value minus one and minus two, respectively,
and provide the latched values to second and third inputs of data
selector 2450. In this way, addresses identified by values OPTOP,
OPTOP-1, and OPTOP-2 are available for selection by data selector
2450. Similarly, pointer VARS is supplied to a series of registers
2408, 2410, 2412 and 2414, which respectively latch the values
VARS, VARS+1, VARS+2, and VARS+3 for provision to the fourth,
fifth, sixth, and seventh inputs of data selector 2450. In this
way, addresses identified by values VARS, VARS+1, VARS+2, and
VARS+3 are available for selection by data selector 2450. In the
exemplary embodiment described herein, offsets from pointer VARS
are positive because local variable storage 1513 is addressed from
its base (identified by pointer VARS) while offsets to pointer
OPTOP are negative because operand stack 1512 is addressed from its
top (identified by pointer OPTOP).
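The address generation of FIG. 24 amounts to selecting one of seven precomputed values. The following Python sketch is illustrative only; the function name, the string-valued index, and the use of entry-granular addresses are assumptions made for the example.

```python
def fold_address(index, optop, vars_ptr):
    """Select one latched address, as data selector 2450 does in FIG. 24."""
    latched = {
        "OPTOP": optop,            # register 2402
        "OPTOP-1": optop - 1,      # register 2404
        "OPTOP-2": optop - 2,      # register 2406
        "VARS": vars_ptr,          # register 2408
        "VARS+1": vars_ptr + 1,    # register 2410
        "VARS+2": vars_ptr + 2,    # register 2412
        "VARS+3": vars_ptr + 3,    # register 2414
    }
    return latched[index]
```

With a hypothetical OPTOP of 100 and VARS of 40, an index of VARS+2 selects address 42 and an index of OPTOP-1 selects address 99.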
[0202] Data selector 2450 selects from among the latched addresses
available at its inputs. In an embodiment of fold determination
portion 2104 in accordance with the FIG. 24 embodiment of first
operand address generator 2302, load source addresses in local
variable storage 1513 other than those addressed by values VARS,
VARS+1, VARS+2, and VARS+3 are handled as unfoldable and decoded
via normal decode portion 2002. However, suitable modifications for
expanding the set of load addresses supported will be apparent to
those of skill in the art. Second operand address generator 2304
and destination address generator 2306 are of analogous design,
although destination address generator 2306 does not provide
support for addressing into constant area 1514. In one embodiment
in accordance with the present invention, signal RS1_D is supplied
to the zeroth input of data selector 2450. In this embodiment,
additional decode logic (not shown) allows for direct supply of
register identifier information to support an alternate instruction
set. Additional decode logic support for such an alternate
instruction set is described in greater detail in a U.S. Pat. No.
5,925,123 entitled "A PROCESSOR FOR EXECUTING INSTRUCTION SETS
RECEIVED FROM A NETWORK OR FROM A LOCAL MEMORY" naming Marc
Tremblay and James Michael O'Connor as inventors, assigned to the
assignee of this application, and filed on Jan. 23, 1997, the
detailed description of which is incorporated herein by
reference.
[0203] Referring back to FIG. 20, when fold determination portion
2104 of fold decode portion 2004 identifies a foldable bytecode
sequence, fold determination portion 2104 asserts a fold-indicating
signal on line f/nf, supplies an equivalent folded operation as
folded instruction decode signal f_instr_dec, and supplies, based
on load and store instructions from the foldable bytecode sequence,
indices into latched addresses maintained by first operand address
generator 2302, second operand address generator 2304, and
destination address generator 2306. Fold decode portion 2004
supplies the addresses so indexed as folded address signal f_adr.
Responsive to the signal on line f/nf, switches 2006, 2008, 2010
supply decode information for the equivalent folded operation to
execution unit 1820.
[0204] Although fold decode portion 2004 has been described above
in the context of an exemplary four instruction foldable sequence,
it is not limited thereto. Based on the description herein, those
of skill in the art will appreciate suitable extensions to support
folding of additional instructions and longer foldable instruction
sequences, e.g., sequences of five or more instructions. By way of
example and not of limitation, support for additional operand
address signals, e.g., a third operand address signal, and/or for
additional destination address signals, e.g., a second destination
address signal, could be provided.
Integer Execution Unit
[0205] Integer execution unit IEU, which includes instruction decode
unit 130, integer unit 142, and stack management unit 150, is
responsible for the execution of all the virtual machine
instructions except the floating point related instructions. The
floating point related instructions are executed in floating point
unit 143.
[0206] Integer execution unit IEU interacts at the front end with
instruction cache unit 120 to fetch instructions, with floating
point unit (FPU) 143 to execute floating point instructions, and
finally with data cache unit (DCU) 160 to execute load and store
related instructions. Integer execution unit IEU also contains
microcode ROM 141 which contains instructions to execute certain
virtual machine instructions associated with integer
operations.
[0207] Integer execution unit IEU includes a cached portion of
stack 400, i.e., stack cache 155. Stack cache 155 provides fast
storage for operand stack and local variable entries associated
with a current method, e.g., operand stack 423 and local variable
storage 421 entries. Although stack cache 155 may provide
sufficient storage for all operand stack and local variable entries
associated with a current method, depending on the number of
operand stack and local variable entries, less than all of local
variable entries or less than all of both local variable entries
and operand stack entries may be represented in stack cache 155.
Similarly, additional entries, e.g., operand stack and/or local
variable entries for a calling method, may be represented in stack
cache 155 if space allows.
[0208] Stack cache 155 is a sixty-four entry thirty-two-bit wide
array of registers that is physically implemented as a register
file in one embodiment. Stack cache 155 has three read ports, two
of which are dedicated to integer execution unit IEU and one to
dribble manager unit 151. Stack cache 155 also has two write ports,
one dedicated to integer execution unit IEU and one to dribble
manager unit 151.
[0209] Integer unit 142 maintains the various pointers, which are
used to access variables, such as local variables, and operand
stack values, in stack cache 155. Integer unit 142 also maintains
pointers to detect whether a stack cache hit has taken place.
Runtime exceptions are caught and dealt with by exception handlers
that are implemented using information in microcode ROM 141 and
circuit 170.
[0210] Integer unit 142 contains a 32-bit ALU to support arithmetic
operations. The operations supported by the ALU include: add,
subtract, shift, and, or, exclusive or, compare, greater than, less
than, and bypass. The ALU is also used to determine the address of
conditional branches while a separate comparator determines the
outcome of the branch instruction.
[0211] The most common set of instructions which executes cleanly
through the pipeline is the group of ALU instructions. The ALU
instructions read the operands from the top of stack 400 in decode
stage 302 and use the ALU in execution stage 303 to compute the
result. The result is written back to stack 400 in write-back stage
305. There are two levels of bypass which may be needed if
consecutive ALU operations are accessing stack cache 155.
[0212] Since the stack cache ports are 32-bits wide in this
embodiment, double precision and long data operations take two
cycles. A shifter is also present as part of the ALU. If the
operands are not available for the instruction in decode stage 302,
or at a maximum at the beginning of execution stage 303, an
interlock holds the pipeline stages before execution stage 303.
[0213] The instruction cache unit interface of integer execution
unit IEU is a valid/accept interface, where instruction cache unit
120 delivers instructions to instruction decode unit 130 in fixed
fields along with valid bits. Instruction decoder 135 responds by
signaling how much byte aligner circuit 122 needs to shift, or how
many bytes instruction decode unit 130 could consume in decode
stage 302. The instruction cache unit interface also signals to
instruction cache unit 120 the branch mis-predict condition, and
the branch address in execution stage 303. Traps, when taken, are
also similarly indicated to instruction cache unit 120. Instruction
cache unit 120 can hold integer unit 142 by not asserting any of
the valid bits to instruction decode unit 130. Instruction decode
unit 130 can hold instruction cache unit 120 by not asserting the
shift signal to byte aligner circuit 122.
[0214] The data cache interface of integer execution unit IEU also
is a valid/accept interface, where integer unit 142 signals, in
execution stage 303, a load or store operation along with its
attributes, e.g., non-cached, special stores etc., to data cache
controller 161 in data cache unit 160. Data cache unit 160 can
return the data on a load, and control integer unit 142 using a
data control unit hold signal. On a data cache hit, data cache unit
160 returns the requested data, and then releases the pipeline.
[0215] On store operations, integer unit 142 also supplies the data
along with the address in execution stage 303. Data cache unit 160
can hold the pipeline in cache stage 304 if data cache unit 160 is
busy, e.g., doing a line fill etc.
[0216] Floating point operations are dealt with specially by
integer execution unit IEU. Instruction decoder 135 fetches and
decodes floating point unit 143 related instructions. Instruction
decoder 135 sends the floating point operation operands for
execution to floating point unit 143 in decode stage 302. While
floating point unit 143 is busy executing the floating point
operation, integer unit 142 halts the pipeline and waits until
floating point unit 143 signals to integer unit 142 that the result
is available.
[0217] A floating point ready signal from floating point unit 143
indicates that execution stage 303 of the floating point operation
has concluded. In response to the floating point ready signal, the
result is written back into stack cache 155 by integer unit 142.
Floating point loads and stores are entirely handled by integer
execution unit IEU, since the operands for both floating point unit
143 and integer unit 142 are found in stack cache 155.
Stack Management Unit
[0218] A stack management unit 150 stores information, and provides
operands to execution unit 140. Stack management unit 150 also
takes care of overflow and underflow conditions of stack cache
155.
[0219] In one embodiment, stack management unit 150 includes stack
cache 155 that, as described above, is a three read port, two write
port register file in one embodiment; a stack control unit 152
which provides the necessary control signals for two read ports and
one write port that are used to retrieve operands for execution
unit 140 and for storing data back from a write-back register or
data cache 165 into stack cache 155; and a dribble manager 151
which speculatively dribbles data in and out of stack cache 155
into memory whenever there is an overflow or underflow in stack
cache 155. In the exemplary embodiment of FIG. 1, memory includes
data cache 165 and any memory storage interfaced by memory
interface unit 110. In general, memory includes any suitable memory
hierarchy including caches, addressable read/write memory storage,
secondary storage, etc. Dribble manager 151 also provides the
necessary control signals for a single read port and a single write
port of stack cache 155 which are used exclusively for background
dribbling purposes.
[0220] In one embodiment, stack cache 155 is managed as a circular
buffer which ensures that the stack grows and shrinks in a
predictable manner to avoid overflows or overwrites. The saving and
restoring of values to and from data cache 165 is controlled by
dribble manager 151 using high- and low-water marks, in one
embodiment.
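By way of illustration only, the high- and low-water mark policy
can be sketched as follows. The function name, the threshold values,
and the three action labels are hypothetical and are not taken from
the embodiments described herein.

```python
def dribble_action(used_entries, low_water, high_water):
    """Return the background action a dribble manager might take.

    used_entries: number of valid entries currently in the stack cache.
    low_water, high_water: hypothetical thresholds, in entries.
    """
    if used_entries > high_water:
        return "spill"   # cache is filling up: save entries to memory
    if used_entries < low_water:
        return "fill"    # cache is draining: restore entries from memory
    return "idle"


# Example with assumed marks of 16 and 48 entries.
actions = [dribble_action(n, 16, 48) for n in (8, 32, 60)]
```

Keeping the two marks apart leaves a band in which no background
traffic is generated, so small pushes and pops near a threshold do
not cause repeated spills and fills.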
[0221] Stack management unit 150 provides execution unit 140 with
two 32-bit operands in a given cycle. Stack management unit 150 can
store a single 32-bit result in a given cycle.
[0222] Dribble manager 151 handles spills and fills of stack cache
155 by speculatively dribbling the data in and out of stack cache
155 from and to data cache 165. Dribble manager 151 generates a
pipeline stall signal to stall the pipeline when a stack overflow
or underflow condition is detected. Dribble manager 151 also keeps
track of requests sent to data cache unit 160. A single request to
data cache unit 160 is a 32-bit consecutive load or store
request.
[0223] The hardware organization of stack cache 155 is such that,
except for long operands (long integers and double precision
floating-point numbers), implicit operand fetches for opcodes do
not add latency to the execution of the opcodes. The number of
entries in operand stack 423 (FIG. 4A) and local variable storage
421 that are maintained in stack cache 155 represents a
hardware/performance tradeoff. At least a few operand stack 423 and
local variable storage 421 entries are required to get good
performance. In the exemplary embodiment of FIG. 1, at least the
top three entries of operand stack 423 and the first four local
variable storage 421 entries are preferably represented in stack
cache 155.
[0224] One key function provided by stack cache 155 (FIG. 1) is to
emulate a register file where access to the top two registers is
always possible without extra cycles. A small hardware stack is
sufficient if the proper intelligence is provided to load/store
values from/to memory in the background, therefore preparing stack
cache 155 for incoming virtual machine instructions.
[0225] As indicated above, all items on stack 400 (regardless of
size) are placed into a 32-bit word. This tends to waste space if
many small data items are used, but it also keeps things relatively
simple and free of lots of tagging or muxing. An entry in stack 400
thus represents a value and not a number of bytes. Long integer and
double precision floating-point numbers require two entries. To
keep the number of read and write ports low, two cycles to read two
long integers or two double precision floating point numbers are
required.
[0226] The mechanism for filling and spilling the operand stack
from stack cache 155 out to memory by dribble manager 151 can
assume one of several alternative forms. One register at a time can
be filled or spilled, or a block of several registers filled or
spilled at once. A simple scoreboarded method is appropriate for
stack management. In its simplest form, a single bit indicates if
the register in stack cache 155 is currently valid. In addition,
some embodiments of stack cache 155 use a single bit to indicate
whether the data content of the register is saved to stack 400,
i.e., whether the register is dirty. In one embodiment, a
high-water mark/low-water mark heuristic determines when entries
are saved to and restored from stack 400, respectively (FIG. 4A).
Alternatively, when the top-of-the-stack becomes close to bottom
401 of stack cache 155 by a fixed, or alternatively, a programmable
number of entries, the hardware starts loading registers from stack
400 into stack cache 155. Detailed embodiments of stack management
unit 150 and dribble manager unit 151 are described below and in
U.S. patent application Ser. No. 08/787,736, entitled "STACK
MANAGEMENT UNIT AND METHOD FOR A PROCESSOR HAVING A STACK" naming
Marc Tremblay and James Michael O'Connor as inventors, assigned to
the assignee of this application, and filed on Jan. 23, 1997 with
Attorney Docket No. SP2037, now U.S. Pat. No. 6,038,643, which is
incorporated herein by reference in its entirety.
[0227] In one embodiment, stack management unit 150 also includes
an optional local variable look-aside cache 153. Cache 153 is most
important in applications where both the local variables and
operand stack 423 (FIG. 4A) for a method are not located on stack
cache 155. In such instances when cache 153 is not included in
hardware processor 100, there is a miss on stack cache 155 when a
local variable is accessed, and execution unit 140 accesses data
cache unit 160, which in turn slows down execution. In contrast,
with cache 153, the local variable is retrieved from cache 153 and
there is no delay in execution.
[0228] One embodiment of local variable look-aside cache 153 is
illustrated in FIG. 4D for methods 0 to 2 on stack 400. Local
variables zero to M, where M is an integer, for method 0 are stored
in plane 421A_0 of cache 153 and plane 421A_0 is accessed when
method number 402 is zero. Local variables zero to N, where N is an
integer, for method 1 are stored in plane 421A_1 of cache 153 and
plane 421A_1 is accessed when method number 402 is one. Local
variables zero to P, where P is an integer, for method 2 are stored
in plane 421A_2 of cache 153 and plane 421A_2 is accessed when
method number 402 is two. Notice that the various planes of cache
153 may be different sizes, but typically each plane of the cache
has a fixed size that is empirically determined.
[0229] When a new method is invoked, e.g., method 2, a new plane
421A_2 in cache 153 is loaded with the local variables for that
method, and method number register 402, which in one embodiment is
a counter, is changed, e.g., incremented, to point to the plane of
cache 153 containing the local variables for the new method. Notice
that the local variables are ordered within a plane of cache 153 so
that cache 153 is effectively a direct-mapped cache. Thus, when a
local variable is needed for the current method, the variable is
accessed directly from the most recent plane in cache 153, i.e.,
the plane identified by method number 402. When the current method
returns, e.g., method 2, method number register 402 is changed,
e.g., decremented, to point at previous plane 421A_1 of cache 153.
Cache 153 can be made as wide and as deep as necessary.
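The plane selection described above can be modeled in software as
shown in the following sketch. The class and method names are
hypothetical, and discarding a plane on return is an assumption made
for the sake of a compact model.

```python
class LookAsideCache:
    """Toy model: one plane of local variables per active method."""

    def __init__(self):
        self.planes = []          # planes[k] holds the locals of method k
        self.method_number = -1   # models method number register 402

    def invoke(self, local_vars):
        # Load a new plane and increment the method number.
        self.planes.append(list(local_vars))
        self.method_number += 1

    def ret(self):
        # Assumed behavior: drop the plane, point at the previous one.
        self.planes.pop()
        self.method_number -= 1

    def read_local(self, index):
        # Direct-mapped access: the method number selects the plane and
        # the local variable index selects the slot within the plane.
        return self.planes[self.method_number][index]


cache = LookAsideCache()
cache.invoke([10, 11])        # method 0
cache.invoke([20, 21, 22])    # method 1
inner = cache.read_local(2)   # read from the most recent plane
cache.ret()
outer = cache.read_local(1)   # read from the previous plane
```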
Data Cache Unit
[0230] Data cache unit 160 (DCU) manages all requests for data in
data cache 165. Data cache requests can come from dribble manager
151 or execution unit 140. Data cache controller 161 arbitrates
between these requests giving priority to the execution unit
requests. In response to a request, data cache controller 161
generates address, data and control signals for the data and tag
RAMs in data cache 165. For a data cache hit, data cache controller
161 reorders the data RAM output to provide the right data.
[0231] Data cache controller 161 also generates requests to I/O bus
and memory interface unit 110 in case of data cache misses, and in
case of non-cacheable loads and stores. Data cache controller 161
provides the data path and control logic for processing
non-cacheable requests, and the data path and data path control
functions for handling cache misses.
[0232] For data cache hits, data cache unit 160 returns data to
execution unit 140 in one cycle for loads. Data cache unit 160 also
takes one cycle for write hits. In case of a cache miss, data cache
unit 160 stalls the pipeline until the requested data is available
from the external memory. For both non-cacheable loads and stores,
data cache 165 is bypassed and requests are sent to I/O bus and
memory interface unit 110. Non-aligned loads and stores to data
cache 165 trap in software.
[0233] Data cache 165 is a two-way set associative, write back,
write allocate, 16-byte line cache. The cache size is configurable
to 0, 1, 2, 4, 8, 16 Kbyte sizes. The default size is 6 Kbytes.
Each line has a cache tag store entry associated with the line. On
a cache miss, 16 bytes of data are written into cache 165 from
external memory.
[0234] Each data cache tag contains a 20-bit address tag field, one
valid bit, and one dirty bit. Each cache tag is also associated
with a least recently used bit that is used for replacement policy.
To support multiple cache sizes, the width of the tag fields also
can be varied. If a cache enable bit in processor service register
is not set, loads and stores are treated like non-cacheable
instructions by data cache controller 161.
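As a hedged illustration of why the tag width varies with the cache
size, the following sketch splits a byte address into tag, set
index, and line offset for a two-way set associative cache with
16-byte lines. The 32-bit address width and all names are
assumptions, and a cache size of zero is not handled.

```python
LINE_BYTES = 16   # line size stated in the text
WAYS = 2          # two-way set associative, as stated in the text
ADDR_BITS = 32    # assumption: 32-bit byte addresses


def field_widths(cache_bytes):
    """Return (tag_bits, index_bits, offset_bits) for a nonzero size."""
    sets = cache_bytes // (LINE_BYTES * WAYS)
    offset_bits = LINE_BYTES.bit_length() - 1   # log2(16) = 4
    index_bits = sets.bit_length() - 1          # log2(sets), power of two
    return ADDR_BITS - offset_bits - index_bits, index_bits, offset_bits


def split_address(addr, cache_bytes):
    """Split a byte address into (tag, set_index, line_offset)."""
    _, index_bits, offset_bits = field_widths(cache_bytes)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

Under these assumptions an 8 Kbyte configuration gives a 20-bit tag
and a 16 Kbyte configuration gives a 19-bit tag. This follows from
the arithmetic alone and is not a statement about which size any
particular embodiment uses.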
[0235] A single sixteen-byte write back buffer is provided for
writing back dirty cache lines which need to be replaced. Data
cache unit 160 can provide a maximum of four bytes on a read and a
maximum of four bytes of data can be written into cache 165 in a
single cycle. Diagnostic reads and writes can be done on the
caches.
Memory Allocation Accelerator
[0236] In one embodiment, data cache unit 160 includes a memory
allocation accelerator 166. Typically, when a new object is
created, fields for the object are fetched from external memory,
stored in data cache 165, and then the fields are cleared to zero.
This is a time consuming process that is eliminated by memory
allocation accelerator 166. When a new object is created, no fields
are retrieved from external memory. Rather, memory allocation
accelerator 166 simply stores a line of zeros in data cache 165 and
marks that line of data cache 165 as dirty. Memory allocation
accelerator 166 is particularly advantageous with a write-back
cache. Since memory allocation accelerator 166 eliminates the
external memory access each time a new object is created, the
performance of hardware processor 100 is enhanced.
Floating Point Unit
[0237] Floating point unit (FPU) 143 includes a microcode
sequencer, input/output section with input/output registers, a
floating point adder, i.e., an ALU, and a floating point
multiply/divide unit. The microcode sequencer controls the
microcode flow and microcode branches. The input/output section
provides the control for input/output data transactions, and
provides the input data loading and output data unloading
registers. These registers also provide intermediate result
storage.
[0238] The floating point adder ALU includes the combinatorial
logic used to perform the floating point adds, floating point
subtracts, and conversion operations. The floating point
multiply/divide unit contains the hardware for performing
multiply/divide and remainder.
[0239] Floating point unit 143 is organized as a microcoded engine
with a 32-bit data path. This data path is often reused many times
during the computation of the result. Double precision operations
require approximately two to four times the number of cycles as
single precision operations. The floating point ready signal is
asserted one cycle prior to the completion of a given floating
point operation. This allows integer unit 142 to read the floating
point unit output registers without any wasted interface cycles.
Thus, output data is available for reading one cycle after the
floating point ready signal is asserted.
Execution Unit Accelerators
[0240] Since the JAVA Virtual Machine Specification of Section I is
hardware independent, the virtual machine instructions are not
optimized for a particular general type of processor, e.g., a
complex instruction-set computer (CISC) processor, or a reduced
instruction set computer (RISC) processor. In fact, some virtual
machine instructions have a CISC nature and others a RISC nature.
This dual nature complicates the operation and optimization of
hardware processor 100.
[0241] For example, the JAVA virtual machine specification defines
opcode 171 for an instruction lookupswitch, which is a traditional
switch statement. The datastream to instruction cache unit 120
includes an opcode 171, identifying the N-way switch statement,
that is followed by zero to three bytes of padding. The number of
bytes of padding is selected so that the first operand byte begins at
an address that is a multiple of four. Herein, datastream is used
generically to indicate information that is provided to a
particular element, block, component, or unit.
[0242] Following the padding bytes in the datastream are a series
of pairs of signed four-byte quantities. The first pair is special.
A first operand in the first pair is the default offset for the
switch statement that is used when the argument, referred to as an
integer key, or alternatively, a current match value, of the switch
statement is not equal to any of the values of the matches in the
switch statement. The second operand in the first pair defines the
number of pairs that follow in the datastream.
[0243] Each subsequent operand pair in the datastream has a first
operand that is a match value, and a second operand that is an
offset. If the integer key is equal to one of the match values, the
offset in the pair is added to the address of the switch statement
to define the address to which execution branches. Conversely, if
the integer key is unequal to any of the match values, the default
offset in the first pair is added to the address of the switch
statement to define the address to which execution branches. Direct
execution of this virtual machine instruction requires many
cycles.
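The datastream layout described above can be decoded in software as
in the following sketch, which also shows why direct execution is
slow: each pair is compared in turn. The function name is
hypothetical. The sketch assumes big-endian operands and that
addresses are measured from offset zero of the byte string.

```python
import struct


def lookupswitch_target(code, pc, key):
    """Return the branch target for the lookupswitch at code[pc].

    code: bytes of the method; pc: offset of opcode 171; key: the
    integer key. Assumes big-endian operands, padding from offset 0.
    """
    if code[pc] != 171:
        raise ValueError("not a lookupswitch opcode")
    operands = (pc + 4) & ~3   # first multiple of four after the opcode
    default_offset, npairs = struct.unpack_from(">ii", code, operands)
    for i in range(npairs):
        pair_at = operands + 8 + 8 * i
        match, offset = struct.unpack_from(">ii", code, pair_at)
        if match == key:
            return pc + offset   # offset is relative to the switch
    return pc + default_offset


# One filler byte, the opcode at address 1, two padding bytes, then
# default offset 40, two pairs: (5 -> 16) and (9 -> 24).
example = bytes([0, 171, 0, 0]) + struct.pack(">iiiiii", 40, 2, 5, 16, 9, 24)
```

With this example, a key of 9 branches to address 25, a key of 5
branches to address 17, and any other key takes the default offset
to address 41.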
[0244] To enhance the performance of hardware processor 100, a
lookup switch accelerator 145 is included in hardware processor
100. Lookup switch accelerator 145 includes an associative memory
which stores information associated with one or more lookup switch
statements. For each lookup switch statement, i.e., each
instruction lookupswitch, this information includes a lookup switch
identifier value, i.e., the program counter value associated with
the lookup switch statement, a plurality of match values and a
corresponding plurality of jump offset values.
[0245] Lookup switch accelerator 145 determines whether a current
instruction received by hardware processor 100 corresponds to a
lookup switch statement stored in the associative memory. Lookup
switch accelerator 145 further determines whether a current match
value associated with the current instruction corresponds with one
of the match values stored in the associative memory. Lookup switch
accelerator 145 accesses a jump offset value from the associative
memory when the current instruction corresponds to a lookup switch
statement stored in the memory and the current match value
corresponds with one of the match values stored in the memory
wherein the accessed jump offset value corresponds with the current
match value.
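A minimal software analogy of the two determinations described
above is given below. A dictionary stands in for the associative
memory, and the class and method names are hypothetical.

```python
class LookupSwitchAccelerator:
    """Toy associative memory: switch PC -> {match value: jump offset}."""

    def __init__(self):
        self.entries = {}

    def store(self, switch_pc, pairs):
        # Record the match/offset pairs for the switch at switch_pc.
        self.entries[switch_pc] = dict(pairs)

    def jump_offset(self, switch_pc, key):
        # Hit only if the switch is stored and the key equals one of
        # its stored match values; otherwise report a miss with None.
        table = self.entries.get(switch_pc)
        if table is None or key not in table:
            return None
        return table[key]


switch_accel = LookupSwitchAccelerator()
switch_accel.store(100, [(5, 16), (9, 24)])
```

In this model a miss is left to the caller, which would fall back to
decoding the instruction operands directly.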
[0246] Lookup switch accelerator 145 further includes circuitry for
retrieving match and jump offset values associated with a current
lookup switch statement when the associative memory does not
already contain the match and jump offset values associated with
the current lookup switch statement. Lookup switch accelerator 145
is described in more detail in U.S. patent application Ser. No.
08/788,811, entitled "LOOK-UP SWITCH ACCELERATOR AND METHOD OF
OPERATING SAME" naming Marc Tremblay and James Michael O'Connor as
inventors, assigned to the assignee of this application, and filed
on Jan. 23, 1997 with Attorney Docket No. SP2040, now U.S. Pat. No.
6,076,141, which is incorporated herein by reference in its
entirety.
[0247] In the process of initiating execution of a method of an
object, execution unit 140 accesses a method vector to retrieve one
of the method pointers in the method vector, i.e., one level of
indirection. Execution unit 140 then uses the accessed method
pointer to access a corresponding method, i.e., a second level of
indirection.
[0248] To reduce the levels of indirection within execution unit
140, each object is provided with a dedicated copy of each of the
methods to be accessed by the object. Execution unit 140 then
accesses the methods using a single level of indirection. That is,
each method is directly accessed by a pointer which is derived from
the object. This eliminates a level of indirection, which was
previously introduced by the method pointers. By reducing the
levels of indirection, the operation of execution unit 140 can be
accelerated. The acceleration of execution unit 140 by reducing the
levels of indirection experienced by execution unit 140 is
described in more detail in U.S. patent application Ser. No.
08/787,846, entitled "REPLICATING CODE TO ELIMINATE A LEVEL OF
INDIRECTION DURING EXECUTION OF AN OBJECT ORIENTED COMPUTER
PROGRAM" naming Marc Tremblay and James Michael O'Connor as
inventors, assigned to the assignee of this application, and filed
on Jan. 23, 1997 with Attorney Docket No. SP2043, now U.S. Pat. No.
5,970,242, which is incorporated herein by reference in its
entirety.
Getfield-Putfield Accelerator
[0249] Other specific functional units and various translation
lookaside buffer (TLB) types of structures may optionally be
included in hardware processor 100 to accelerate accesses to the
constant pool. For example, the JAVA virtual machine specification
defines an instruction putfield, opcode 181, that upon execution
sets a field in an object and an instruction getfield, opcode 180,
that upon execution fetches a field from an object. In both of
these instructions, the opcode is followed by an index byte one and
an index byte two. Operand stack 423 contains a reference to an
object followed by a value for instruction putfield, but only a
reference to an object for instruction getfield.
[0250] Index bytes one and two are used to generate an index into
the constant pool of the current class. The item in the constant
pool at that index is a field reference to a class name and a field
name. The item is resolved to a field block pointer which has both
the field width, in bytes, and the field offset, in bytes.
[0251] An optional getfield-putfield accelerator 146 in execution
unit 140 stores the field block pointer for instruction getfield or
instruction putfield in a cache, for use after the first invocation
of the instruction, along with the index used to identify the item
in the constant pool that was resolved into the field block pointer
as a tag. Subsequently, execution unit 140 uses index bytes one and
two to generate the index and supplies the index to
getfield-putfield accelerator 146. If the index matches one of the
indexes stored as a tag, i.e., there is a hit, the field block
pointer associated with that tag is retrieved and used by execution
unit 140. Conversely, if a match is not found, execution unit 140
performs the operations described above. Getfield-putfield
accelerator 146 is implemented without using self-modifying code
that was used in one embodiment of the quick instruction
translation described above.
[0252] In one embodiment, getfield-putfield accelerator 146
includes an associative memory that has a first section that holds
the indices that function as tags, and a second section that holds
the field block pointers. When an index is applied through an input
section to the first section of the associative memory, and there
is a match with one of the stored indices, the field block pointer
associated with the stored index that matched the input index is
output from the second section of the associative memory.
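The tag lookup described above can be sketched as follows. The
class name, the resolver callback, and the (width, offset) tuple
used for the field block pointer are hypothetical. Combining the
two index bytes as (byte one << 8) | byte two follows the JAVA
virtual machine specification.

```python
class FieldAccelerator:
    """Toy cache: constant pool index (tag) -> field block pointer."""

    def __init__(self, resolve):
        self.resolve = resolve   # hypothetical slow-path resolver
        self.cache = {}          # first section: tags; second: pointers
        self.hits = 0

    def field_block(self, index_byte1, index_byte2):
        index = (index_byte1 << 8) | index_byte2
        if index in self.cache:          # tag match: a hit
            self.hits += 1
            return self.cache[index]
        pointer = self.resolve(index)    # miss: resolve the item
        self.cache[index] = pointer      # keep it for later invocations
        return pointer


resolved = []


def slow_resolve(index):
    # Stand-in for constant pool resolution; returns (width, offset).
    resolved.append(index)
    return (4, 8 * index)


field_accel = FieldAccelerator(slow_resolve)
first = field_accel.field_block(0x01, 0x02)
second = field_accel.field_block(0x01, 0x02)
```

Only the first call reaches the resolver; the second is served from
the stored tag.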
Bounds Check Unit
[0253] Bounds check unit 147 (FIG. 1) in execution unit 140 is an
optional hardware circuit that checks each access to an element of
an array to determine whether the access is to a location within
the array. When the access is to a location outside the array,
bounds check unit 147 issues an active array bound exception signal
to execution unit 140. In response to the active array bound
exception signal, execution unit 140 initiates execution of an
exception handler stored in microcode ROM 141 that handles the
out of bounds array access.
[0254] In one embodiment, bounds check unit 147 includes an
associative memory element in which is stored an array identifier
for an array, e.g., a program counter value, and a maximum value
and a minimum value for the array. When an array is accessed, i.e.,
the array identifier for that array is applied to the associative
memory element, and assuming the array is represented in the
associative memory element, the stored minimum value is a first
input signal to a first comparator element, sometimes called a
comparison element, and the stored maximum value is a first input
signal to a second comparator element, sometimes also called a
comparison element. A second input signal to the first and second
comparator elements is the value associated with the access of the
array's element.
[0255] If the value associated with the access of the array's
element is less than or equal to the stored maximum value and
greater than or equal to the stored minimum value, neither
comparator element generates an output signal. However, if either
of these conditions is false, the appropriate comparator element
generates the active array bound exception signal. A more detailed
description of one embodiment of bounds check unit 147 is provided
in U.S. patent application Ser. No. 08/786,352, entitled "PROCESSOR
WITH ACCELERATED ARRAY ACCESS BOUNDS CHECKING" naming Marc
Tremblay, James Michael O'Connor, and William N. Joy as inventors,
assigned to the assignee of this application, and filed on Jan. 23,
1997 with Attorney Docket No. SP2041, now U.S. Pat. No. 6,014,723,
which is incorporated herein by reference in its entirety.
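The two-comparator check described above reduces to the following
sketch. The function name is hypothetical, and the Boolean return
value stands in for the active array bound exception signal.

```python
def array_bound_exception(value, minimum, maximum):
    """Return True when the array bound exception would be signaled."""
    below = value < minimum   # first comparator, against stored minimum
    above = value > maximum   # second comparator, against stored maximum
    return below or above
```

For an array with a stored minimum of 0 and a stored maximum of 9,
accesses at 0 and 9 pass, while accesses at -1 and 10 raise the
signal.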
[0256] The JAVA Virtual Machine Specification defines that certain
instructions can cause certain exceptions. The checks for these
exception conditions are implemented, and a hardware/software
mechanism for dealing with them is provided in hardware processor
100 by information in microcode ROM 141 and program counter and
trap control logic 170. The alternatives include having a trap
vector style or a single trap target and pushing the trap type on
the stack so that the dedicated trap handler routine determines the
appropriate action.
[0257] No external cache is required for the architecture of
hardware processor 100. No translation lookaside buffers need be
supported.
[0258] FIG. 5 illustrates several possible add-ons to hardware
processor 100 to create a unique system. Circuits supporting any of
the eight functions shown, i.e., NTSC encoder 501, MPEG 502,
Ethernet controller 503, VIS 504, ISDN 505, I/O controller 506, ATM
assembly/reassembly 507, and radio link 508 can be integrated into
the same chip as hardware processor 100 of this invention.
[0259] FIG. 6 is a block diagram of one embodiment of a stack
management unit 150. Stack management unit 150 serves as a high
speed buffer between stack 400 and hardware processor 100. Hardware
processor 100 accesses stack management unit 150 as if stack
management unit 150 were stack 400. Stack management unit 150
automatically transfers data between stack management unit 150 and
stack 400 as necessary to improve the throughput of data
between stack 400 and hardware processor 100. In the embodiment of
FIG. 1, if hardware processor 100 requires a data word which is not
cached in stack management unit 150, data cache unit 160 retrieves
the requested data word and places the requested data word at the
top of stack cache 155.
[0260] Stack management unit 150 contains a stack cache memory
circuit 610. Stack cache memory circuit 610 is typically a fast
memory device such as a register file or SRAM; however, slower
memory devices such as DRAM can also be used. In the embodiment of
FIG. 6, access to stack cache memory circuit 610 is controlled by
stack control unit 152. A write port 630 allows hardware processor
100 to write data on data lines 635 to stack cache memory circuit
610. Read port 640 and read port 650 allow hardware processor 100
to read data from stack cache memory circuit 610 on data lines 645
and 655, respectively. Two read ports are provided to increase
throughput since many operations of stack-based computing systems
require two operands from stack 400. Other embodiments of stack
cache 155 may provide more or fewer read and write ports.
[0261] As explained above, dribble manager unit 151 controls the
transfer of data between stack 400 (FIG. 4A) and stack cache
memory circuit 610. In the embodiment shown in FIG. 1, the transfer
of data between stack 400 and stack cache memory circuit 610 goes
through data cache unit 160. Dribble manager unit 151 includes a
fill control unit 694 and a spill control unit 698. In some
embodiments of dribble manager unit 151, fill control unit 694 and
spill control unit 698 function independently. Fill control unit
694 determines if a fill condition exists. If the fill condition
exists, fill control unit 694 transfers data words from stack 400
to stack cache memory circuit 610 on data lines 675 through a write
port 670. Spill control unit 698 determines if a spill condition
exists. If the spill condition exists, spill control unit 698
transfers data words from stack cache memory circuit 610 to stack
400 through read port 680 on data lines 685. Write port 670 and
read port 680 allow transfers between stack 400 and stack cache
memory circuit 610 to occur simultaneously with reads and writes
controlled by stack control unit 152. If contention for read and
write ports of stack cache memory circuit 610 is not important,
dribble manager unit 151 can share read and write ports with stack
control unit 152.
[0262] Although stack management unit 150 is described in the
context of buffering stack 400 for hardware processor 100, stack
management unit 150 can perform caching for any stack-based
computing system. The details of hardware processor 100 are
provided only as an example of one possible stack-based computing
system for use with the present invention. Thus, one skilled in the
art can use the principles described herein to design a stack
management unit in accordance with the present invention for any
stack-based computing system.
[0263] FIG. 7 shows a conceptual model of the memory architecture
of stack cache memory circuit 610 for one embodiment of stack cache
155. Specifically, in the embodiment of FIG. 7, stack cache memory
circuit 610 is a register file organized in a circular buffer
memory architecture capable of holding 64 data words. Other
embodiments may contain a different number of data words. The
circular memory architecture causes data words in excess of the
capacity of stack cache memory circuit 610 to be written to
previously used registers. If stack cache memory circuit 610 uses a
different memory device, such as an SRAM, different registers would
correspond to different memory locations. One technique to address
registers in a circular buffer is to use pointers containing modulo
stack cache size (modulo-SCS) addresses to the various registers of
stack cache memory circuit 610. As used herein, modulo-N operations
have the results of the standard operation mapped to a number
between 0 and N-1 using a standard MOD N function. Some common
modulo operations are defined as follows:
[0264] Modulo-N addition of X and Y = (X+Y) MOD N,
[0265] Modulo-N subtraction of X and Y = (X-Y) MOD N,
[0266] Modulo-N increment of X by Y = (X+Y) MOD N,
[0267] Modulo-N decrement of X by Y = (X-Y) MOD N.
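The modulo operations defined above map directly onto an integer
remainder, as the following sketch shows for a stack cache size of
64. The function names are hypothetical. The sketch relies on the
Python remainder operator returning a non-negative result for a
positive modulus; a language whose remainder can be negative would
need an explicit correction after subtraction.

```python
SCS = 64   # stack cache size of the FIG. 7 embodiment


def mod_add(x, y, n=SCS):
    """Modulo-N addition (also modulo-N increment of x by y)."""
    return (x + y) % n


def mod_sub(x, y, n=SCS):
    """Modulo-N subtraction (also modulo-N decrement of x by y)."""
    return (x - y) % n
```

For example, incrementing register address 63 by one wraps to 0,
and decrementing register address 0 by one wraps to 63.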
[0268] One embodiment of the pointer addresses of the registers of
stack cache memory circuit 610 is shown in FIG. 7 as numbered 0-63
along the outer edge of stack cache memory circuit 610. Thus for
the embodiment of FIG. 7, if 70 data words (numbered 1 to 70) are
written to stack cache memory circuit 610 when stack cache memory
circuit 610 is empty, data words 1 to 64 are written to registers 0
to 63, respectively and data words 65 to 70 are written
subsequently to registers 0 to 5. Prior to writing data words 65 to
70, dribble manager unit 151, as described below, transfers data
words 1 to 6 which were in registers 0 to 5 to stack 400.
Similarly, as data words 70 to 65 are read out of stack cache
memory circuit 610, data words 1 to 6 can be retrieved from stack
400 and placed in memory locations 0 to 5.
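The 70-word example above can be reproduced with the following
sketch. The function name is hypothetical, and the model is
simplified in that a displaced word is saved at the moment its
register is reused, whereas the dribble manager described herein
transfers such words in the background beforehand.

```python
SCS = 64   # registers in the FIG. 7 embodiment


def write_words(count):
    """Write data words 1..count into an empty circular stack cache.

    Returns (registers, spilled), where spilled lists the data words
    that had to be transferred to the stack in memory, in order.
    """
    registers = [None] * SCS
    spilled = []
    for word in range(1, count + 1):
        reg = (word - 1) % SCS          # modulo-SCS register address
        if registers[reg] is not None:
            spilled.append(registers[reg])   # save before reuse
        registers[reg] = word
    return registers, spilled


registers, spilled = write_words(70)
```

After the call, registers 0 to 5 hold data words 65 to 70, registers
6 to 63 still hold data words 7 to 64, and the spilled list holds
data words 1 to 6.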
[0269] Since most reads and writes on a stack are from the top of
the stack, a pointer OPTOP contains the location of the top of
stack 400, i.e. the top memory location. In some embodiments of
stack management unit 150, pointer OPTOP is a programmable register
in execution unit 140. However other embodiments of stack
management unit 150 maintain pointer OPTOP in stack control unit
152. Since pointer OPTOP is often increased by one, decreased by
one, or changed by a specific amount, pointer OPTOP, in one
embodiment, is a programmable up/down counter.
[0270] Since stack management unit 150 contains the top portion of
stack 400, pointer OPTOP indicates the register of stack cache
memory circuit 610 containing the most recently written data word
in stack cache memory circuit 610, i.e. pointer OPTOP points to the
register containing the most recently written data word also called
the top register. Some embodiments of stack management unit 150
also contain a pointer OPTOP1 (not shown) which points to the
register preceding the register pointed to by pointer OPTOP.
Pointer OPTOP1 can improve the performance of stack management unit
150 since many operations in hardware processor 100 require two
data words from stack management unit 150.
[0271] Pointer OPTOP and pointer OPTOP1 are incremented whenever a
new data word is written to stack cache 155. Pointer OPTOP and
pointer OPTOP1 are decremented whenever a stacked data word, i.e. a
data word already in stack 400, is popped off of stack cache 155.
Since some embodiments of hardware processor 100 may add or remove
multiple data words simultaneously, pointers OPTOP and OPTOP1 are
implemented, in one embodiment, as programmable registers so that
new values can be written into the registers rather than requiring
multiple increment or decrement cycles.
[0272] If stack cache 155 is organized using sequential addressing,
pointer OPTOP1 may also be implemented using a modulo-SCS
subtractor, where SCS is the size of stack cache 155, which
modulo-SCS subtracts one from pointer OPTOP. Some embodiments of
stack cache 155 may also include pointers OPTOP2 or OPTOP3.
[0273] Since data words are stored in stack cache memory circuit
610 circularly, the bottom of stack cache memory circuit 610 can
fluctuate. Therefore, most embodiments of stack cache memory
circuit 610 include a pointer CACHE_BOTTOM to indicate the bottom
memory location of stack cache memory circuit 610. Pointer
CACHE_BOTTOM is typically maintained by dribble manager unit 151.
The process to increment or decrement pointer CACHE_BOTTOM varies
with the specific embodiment of stack management unit 150. Pointer
CACHE_BOTTOM is typically implemented as a programmable up/down
counter.
[0274] Some embodiments of stack management unit 150 also include
other pointers, such as pointer VARS, which points to a memory
location of a data word that is often accessed. For example, if
hardware processor 100 is implementing the JAVA Virtual Machine,
entire method frames may be placed in stack management unit 150.
The method frames often contain local variables that are accessed
frequently. Therefore, having pointer VARS point to the first
local variable of the active method decreases the access time
necessary to read the local variable. Other pointers such as a
pointer VARS1 (not shown) and a pointer VARS2 (not shown) may point
to other often used memory locations such as the next two local
variables of the active method in a JAVA Virtual Machine. In some
embodiments of stack management unit 150, these pointers are
maintained in stack control unit 152. In embodiments adapted for
use with hardware processor 100, pointer VARS is stored in a
programmable register in execution unit 140. If stack cache 155 is
organized using sequential addressing, pointer VARS1 may also be
implemented using a modulo-SCS adder which modulo-SCS adds one to
pointer VARS.
[0275] To determine which data words to transfer between stack
cache memory circuit 610 and stack 400, stack management unit 150
typically tags, i.e. tracks, the valid data words and the data
words which are stored in both stack cache memory circuit 610 and
stack 400. FIG. 8 illustrates one tagging scheme used in some
embodiments of stack management unit 150. Specifically, FIG. 8
shows a register 810 from stack cache memory circuit 610. The
actual data word is stored in data section 812. A valid bit 814 and
a saved bit 816 are used to track the status of register 810. If
valid bit 814 is at a valid logic state, typically logic high, data
section 812 contains a valid data word. If valid bit 814 is at an
invalid logic state, typically logic low, data section 812 does not
contain a valid data word. If saved bit 816 is at a saved logic
state, typically logic high, the data word contained in data
section 812 is also stored in stack 400. However, if saved bit 816
is at an unsaved logic state, typically logic low, the data word
contained in data section 812 is not stored in stack 400.
Typically, when stack management unit 150 is powered up or reset,
valid bit 814 of each register is set to the invalid logic state
and saved bit 816 of each register is set to the unsaved logic
state.
[0276] For the embodiment illustrated in FIG. 6 using the tagging
method of FIG. 8, when stack control unit 152 writes a data word to
a register in stack cache memory circuit 610 through write port
630, the valid bit of that register is set to the valid logic state
and the saved bit of that register is set to the unsaved logic
state. When dribble manager unit 151 transfers a data word to a
register of stack cache memory circuit 610 through write port 670,
the valid bit of that register is set to the valid logic state and
the saved bit of that register is set to the saved logic state,
since the data word is currently saved in stack 400.
[0277] When hardware processor 100 reads a stacked data word using
a stack popping operation from a register of stack cache memory
circuit 610 through either read port 640 or read port 650, the
valid bit of that register is set to the invalid logic state and
the saved bit of that register is set to the unsaved logic state.
Typically, stack popping operations use the register indicated by
pointer OPTOP or pointer OPTOP1.
[0278] When hardware processor 100 reads a data word with a
non-stack popping operation from a register of stack cache memory
circuit 610 through either read port 640 or read port 650, the
valid bit and saved bit of the register are not changed. For example, if
hardware processor 100 is implementing the JAVA Virtual Machine, a
local variable stored in stack cache memory circuit 610 in the
register indicated by pointer VARS may be used repeatedly and
should not be removed from stack cache 155. When dribble manager
unit 151 copies a data word from a register of stack cache memory
circuit 610 to stack 400 through read port 680, the valid bit of
that register remains in the valid logic state since the saved data
word is still contained in that register and the saved bit of that
register is set to the saved logic state.
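The valid-bit and saved-bit transitions described for the tagging scheme of FIG. 8 can be summarized as a small state model. The following Python sketch is an illustration under stated assumptions: the class `Register` and the function names are hypothetical, and logic high is modeled as `True`.

```python
from dataclasses import dataclass

@dataclass
class Register:
    data: int = 0
    valid: bool = False  # power-up or reset state: invalid
    saved: bool = False  # power-up or reset state: unsaved

def processor_write(reg, word):
    """Write through write port 630: valid, not yet saved in stack 400."""
    reg.data, reg.valid, reg.saved = word, True, False

def dribble_write(reg, word):
    """Write through write port 670: valid and already saved in stack 400."""
    reg.data, reg.valid, reg.saved = word, True, True

def popping_read(reg):
    """Stack popping read through read port 640 or 650: both bits cleared."""
    reg.valid, reg.saved = False, False
    return reg.data

def non_popping_read(reg):
    """Non-popping read, e.g. of a local variable: bits unchanged."""
    return reg.data

def dribble_copy_out(reg):
    """Copy to stack 400 through read port 680: stays valid, becomes saved."""
    reg.saved = True
    return reg.data
```

Only the popping read invalidates a register; a copy to stack 400 leaves the word in place, which is what later allows such a register to be counted as free.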
[0279] Since stack cache 155 is generally much smaller than the
memory address space of hardware processor 100, the pointers used
to access stack cache memory circuit 610 are generally much smaller
than general memory addresses. The specific technique used to map
stack cache 155 into the memory space of hardware processor 100 can
vary. In one embodiment of hardware processor 100, the pointers
used to access stack cache memory circuit 610 are only the lower
bits of general memory pointers, i.e., the least significant bits.
For example, if stack cache memory circuit 610 comprises 64
registers, pointers OPTOP, VARS, and CACHE_BOTTOM need only be six
bits long. If hardware processor 100 has a 12-bit address space,
pointers OPTOP, VARS, and CACHE_BOTTOM could be the lower six bits
of a general memory pointer. Thus stack cache memory circuit 610 is
mapped to a specific segment of the address space having a unique
upper six-bit combination.
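The 12-bit example can be made concrete with a short Python sketch. The function names and the sample address are assumptions chosen for illustration only.

```python
POINTER_BITS = 6                        # 64 registers need six-bit pointers
POINTER_MASK = (1 << POINTER_BITS) - 1  # 0b111111

def cache_pointer(address):
    """Lower six bits of a 12-bit address select the stack cache register."""
    return address & POINTER_MASK

def segment(address):
    """Upper six bits of a 12-bit address identify the mapped segment."""
    return (address >> POINTER_BITS) & POINTER_MASK

# Hypothetical 12-bit address 0xA85 = 101010 000101 in binary.
example_address = 0xA85
```

For this sample address, the upper six bits (101010) name the segment and the lower six bits (000101) select register 5.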
[0280] Some embodiments of stack management unit 150 may be used
with a purely stack-based computing system, so that there is no
memory address space for the system. In this situation, the
pointers for accessing stack cache 155 are only internal to stack
management unit 150.
[0281] As explained above, hardware processor 100 primarily
accesses data near the top of the stack. Therefore, stack
management unit 150 can improve data accesses of hardware processor
100 while only caching the top portion of stack 400. When hardware
processor 100 pushes more data words to stack management unit 150
than stack cache memory circuit 610 is able to store, the data
words near the bottom of stack cache memory circuit 610 are
transferred to stack 400. When hardware processor 100 pops data
words out of stack cache 155, data words from stack 400 are copied
under the bottom of stack cache memory circuit 610, and pointer
CACHE_BOTTOM is decremented to point to the new bottom of stack
cache memory circuit 610.
[0282] Determination of when to transfer data words between stack
400 and stack cache memory circuit 610 as well as how many data
words to transfer can vary. In general, dribble manager unit 151
should transfer data from stack cache memory circuit 610 to stack
400, i.e. a spill operation, as hardware processor 100 fills stack
cache memory circuit 610. Conversely, dribble manager unit 151
should copy data from stack 400 to stack cache memory circuit 610,
i.e. a fill operation, as hardware processor 100 empties stack
cache memory circuit 610.
[0283] FIG. 9 shows one embodiment of dribble manager unit 151 in
which decisions on transferring data from stack cache memory
circuit 610 to stack 400, i.e. spilling data, are based on the
number of free registers in stack cache memory circuit 610. Free
registers include registers without valid data as well as
registers containing data already stored in stack 400, i.e.
registers with saved bit 816 set to the saved logic state.
Decisions on transferring data from stack 400 to stack cache memory
circuit 610, i.e. filling data, are based on the number of used
registers. A used register contains a valid but unsaved data word
in stack cache memory circuit 610.
[0284] Specifically in the embodiment of FIG. 9, dribble manager
unit 151 further includes a stack cache status circuit 910 and a
cache bottom register 920, which can be a programmable up/down
counter. Stack cache status circuit 910 receives pointer
CACHE_BOTTOM from cache bottom register 920 and pointer OPTOP to
determine the number of free registers FREE and the number of used
registers USED.
[0285] For a circular buffer using sequential modulo-SCS
addressing, as in FIG. 7, the number of free registers FREE is
defined as
[0286] FREE=SCS-((OPTOP-CACHE_BOTTOM+1) MOD SCS),
where SCS is the size of stack cache 155. Thus, for the specific
pointer values shown in FIG. 7, the number of free registers FREE
is 34, as calculated by: FREE=64-((27-62+1) MOD 64)=34.
[0287] Similarly, for a circular buffer using sequential modulo-SCS
addressing, the number of used registers USED is defined as
USED=(OPTOP-CACHE_BOTTOM+1) MOD SCS.
[0288] Thus, for the specific pointer values shown in FIG. 7, the
number of used registers USED is 30, as calculated by:
USED=(27-62+1) MOD 64=30.
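The two definitions can be checked numerically. The Python sketch below is illustrative; the function names are assumptions, and it relies on the fact that Python's `%` operator returns a non-negative result for a positive modulus, which matches the MOD operation used in the text.

```python
SCS = 64  # size of stack cache 155 in the FIG. 7 embodiment

def used_registers(optop, cache_bottom, scs=SCS):
    """USED = (OPTOP - CACHE_BOTTOM + 1) MOD SCS."""
    return (optop - cache_bottom + 1) % scs

def free_registers(optop, cache_bottom, scs=SCS):
    """FREE = SCS - ((OPTOP - CACHE_BOTTOM + 1) MOD SCS)."""
    return scs - used_registers(optop, cache_bottom, scs)

# Pointer values of FIG. 7: OPTOP = 27, CACHE_BOTTOM = 62.
fig7_used = used_registers(27, 62)
fig7_free = free_registers(27, 62)
```

With the FIG. 7 pointer values, the intermediate difference is -34, which MOD 64 gives 30 used registers and therefore 34 free registers.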
[0289] Thus, stack cache status circuit 910 can be implemented with
a modulo-SCS adder/subtractor. The number of used registers USED
and the number of free registers FREE can also be generated using
programmable up/down counters. For example, a counter holding the
value USED (the used counter) can be incremented whenever a data
word is added to stack cache 155 and decremented whenever a data
word is removed from stack cache 155. Specifically, if pointer
OPTOP is modulo-SCS incremented by some amount, the used counter is
incremented by the same amount. If pointer OPTOP is modulo-SCS
decremented by some amount, the used counter is decremented by the
same amount. However, if pointer CACHE_BOTTOM is modulo-SCS
incremented by some amount, the used counter is decremented by the
same amount. If pointer CACHE_BOTTOM is modulo-SCS decremented by
some amount, the used counter is incremented by the same amount.
The number of free registers FREE can be generated by subtracting
the number of used registers USED from the total number of
registers.
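The counter-based alternative can be modeled as follows. This is a minimal sketch, assuming a hypothetical class `UsedCounter` whose methods are called with the signed amount by which each pointer changes.

```python
class UsedCounter:
    """Tracks USED with an up/down counter instead of recomputing it."""

    def __init__(self, scs=64, used=0):
        self.scs = scs    # total number of registers
        self.used = used  # current value of USED

    def optop_changed(self, delta):
        # OPTOP incremented (delta > 0): words added, USED goes up.
        # OPTOP decremented (delta < 0): words removed, USED goes down.
        self.used += delta

    def cache_bottom_changed(self, delta):
        # CACHE_BOTTOM incremented (delta > 0): USED goes down.
        # CACHE_BOTTOM decremented (delta < 0): USED goes up.
        self.used -= delta

    @property
    def free(self):
        # FREE is the total number of registers minus USED.
        return self.scs - self.used
```

Starting from the FIG. 7 state (USED of 30), pushing two words raises the count to 32, and incrementing CACHE_BOTTOM by one lowers it to 31.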
[0290] Spill control unit 694 (FIGS. 6 and 9) includes a cache high
threshold register 930 and a comparator 940. Comparator 940
compares the value in cache high threshold register 930 to the
number of free registers FREE. If the number of free registers FREE
is less than the value in cache high threshold register 930,
comparator 940 drives a spill signal SPILL to a spill logic level,
typically logic high, to indicate that the spill condition exists
and one or more data words should be transferred from stack cache
memory circuit 610 to stack 400, i.e. a spill operation should be
performed. The spill operation is described in more detail below.
Typically, cache high threshold register 930 is programmable by
hardware processor 100.
[0291] Fill control unit 698 (FIGS. 6 and 9) includes a cache low
threshold register 950 and a comparator 960. Comparator 960
compares the value in cache low threshold register 950 to the
number of used registers USED. If the number of used registers is
less than the value in cache low threshold register 950, comparator
960 drives a fill signal FILL to a fill logic level, typically
logic high, to indicate that the fill condition exists and one or
more data words should be transferred from stack 400 to stack cache
memory circuit 610, i.e. a fill operation should be performed. The
fill operation is described in more detail below. Typically, cache
low threshold register 950 is programmable by hardware processor
100.
[0292] If the values in cache high threshold register 930 and cache
low threshold register 950 are always the same, a single cache
threshold register can be used. Fill control unit 698 can be
modified to use the number of free registers FREE to drive signal
FILL to the fill logic level if the number of free registers is
greater than the value in cache low threshold register 950, with a
proper modification of the value in cache low threshold register
950. Alternatively, spill control unit 694 can be modified to use
the number of used registers.
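The comparisons performed in spill control unit 694 and fill control unit 698 of FIG. 9 reduce to two inequalities. In the Python sketch below, the function names are assumptions, and the threshold value used in the usage lines is hypothetical rather than taken from the description.

```python
def spill_signal(free, cache_high_threshold):
    """Comparator 940: SPILL is asserted when FREE is below the high threshold."""
    return free < cache_high_threshold

def fill_signal(used, cache_low_threshold):
    """Comparator 960: FILL is asserted when USED is below the low threshold."""
    return used < cache_low_threshold

# Hypothetical threshold of 8 applied to the FIG. 7 state (FREE=34, USED=30).
spill_now = spill_signal(34, 8)
fill_now = fill_signal(30, 8)
```

With these sample values neither signal is asserted, since the cache is neither close to full nor close to empty.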
[0293] FIG. 10A shows another embodiment of dribble manager unit
151, which uses a high water mark/low water mark heuristic to
determine when a spill condition or a fill condition exists. Spill
control unit 694 includes a high water mark register 1010
implemented as a programmable up/down counter. A comparator 1020 in
spill control unit 694 compares the value in high water mark
register 1010, i.e. the high water mark, with pointer OPTOP. If
pointer OPTOP is greater than the high water mark, comparator 1020
drives spill signal SPILL to the spill logic level to indicate a
spill operation should be performed. Since the high water mark is
relative to pointer CACHE_BOTTOM, the high water mark is modulo-SCS
incremented and modulo-SCS decremented whenever pointer
CACHE_BOTTOM is modulo-SCS incremented or modulo-SCS decremented,
respectively.
[0294] Fill control unit 698 includes a low water mark register
1030 implemented as a programmable up/down counter. A comparator
1040 in fill control unit 698 compares the value in low water mark
register 1030, i.e. the low water mark, with pointer OPTOP. If
pointer OPTOP is less than the low water mark, comparator 1040
drives fill signal FILL to the fill logic level to indicate a fill
operation should be performed. Since the low water mark is relative
to pointer CACHE_BOTTOM, the low water mark register is modulo-SCS
incremented and modulo-SCS decremented whenever pointer
CACHE_BOTTOM is modulo-SCS incremented or modulo-SCS decremented,
respectively.
[0295] FIG. 10B shows an alternative circuit to generate the high
water mark and low water mark. Cache high threshold register 930,
typically implemented as a programmable register, contains the
number of free registers which should be maintained in stack cache
memory circuit 610. The high water mark is then calculated by
modulo-SCS subtractor 1050 by modulo-SCS subtracting the value in
cache high threshold register 930 from pointer CACHE_BOTTOM stored
in cache bottom register 920.
[0296] The low water mark is calculated by doing a modulo-SCS
addition. Specifically, cache low threshold register 950 is
programmed to contain the minimum number of used data registers
desired to be maintained in stack cache memory circuit 610. The low
water mark is then calculated by modulo-SCS adder 1060 by
modulo-SCS adding the value in cache low threshold register 950
with pointer CACHE_BOTTOM stored in cache bottom register 920.
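The arithmetic of the FIG. 10B circuit can be sketched as two modulo-SCS operations. The function names and the sample threshold values below are assumptions for illustration; only the subtraction and addition themselves come from the description.

```python
SCS = 64  # size of stack cache 155

def high_water_mark(cache_bottom, cache_high_threshold, scs=SCS):
    """Modulo-SCS subtractor 1050: CACHE_BOTTOM minus the high threshold."""
    return (cache_bottom - cache_high_threshold) % scs

def low_water_mark(cache_bottom, cache_low_threshold, scs=SCS):
    """Modulo-SCS adder 1060: CACHE_BOTTOM plus the low threshold."""
    return (cache_bottom + cache_low_threshold) % scs

# CACHE_BOTTOM = 62 as in FIG. 7; thresholds of 8 and 4 are hypothetical.
example_high = high_water_mark(62, 8)
example_low = low_water_mark(62, 4)
```

Because both marks are derived from CACHE_BOTTOM on every evaluation, they follow the bottom pointer automatically and need no separate increment or decrement step.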
[0297] As described above, a spill operation is the transfer of one
or more data words from stack cache memory circuit 610 to stack
400. In the embodiment of FIG. 1, the transfer occurs through data
cache unit 160. The specific interface between stack management
unit 150 and data cache unit 160 can vary. Typically, stack
management unit 150, and more specifically dribble manager unit
151, sends the data word located at the bottom of stack cache 155,
as indicated by pointer CACHE_BOTTOM, from read port 680 to data
cache unit 160. The value of pointer CACHE_BOTTOM is also provided
to data cache unit 160 so that data cache unit 160 can address the
data word appropriately. The saved bit of the register indicated by
pointer CACHE_BOTTOM is set to the saved logic level. In addition,
pointer CACHE_BOTTOM is modulo-SCS incremented by one. Other
registers as described above may also be modulo-SCS incremented by
one. For example, high water mark register 1010 (FIG. 10A) and low
water mark register 1030 would be modulo-SCS incremented by one.
Some embodiments of dribble manager unit 151 transfer multiple
words for each spill operation. For these embodiments, pointer
CACHE_BOTTOM is modulo-SCS incremented by the number of words
transferred to stack 400.
[0298] In embodiments using a saved bit and valid bit, as shown in
FIG. 8, some optimization is possible. Specifically, if the saved
bit of the data register pointed to by pointer CACHE_BOTTOM is at
the saved logic level, the data word in that data register is
already stored in stack 400. Therefore, the data word in that data
register does not need to be copied to stack 400. However, pointer
CACHE_BOTTOM is still modulo-SCS incremented by one.
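A single-word spill, including the saved-bit optimization, can be sketched as follows. This is a simplified model, not the disclosed circuit: the function name is hypothetical, the cache is represented by plain lists, and a dictionary keyed by the register pointer stands in for stack 400 and data cache unit 160.

```python
SCS = 64  # size of stack cache 155

def spill_one(data, saved, stack_memory, cache_bottom, scs=SCS):
    """Spill the register at CACHE_BOTTOM and return the new CACHE_BOTTOM.

    data         -- list of data words, one per register
    saved        -- list of saved bits, one per register
    stack_memory -- dict standing in for stack 400 (an assumption)
    """
    if not saved[cache_bottom]:
        # Word is not yet in stack 400: copy it out and mark it saved.
        stack_memory[cache_bottom] = data[cache_bottom]
        saved[cache_bottom] = True
    # The pointer advances whether or not a copy was needed.
    return (cache_bottom + 1) % scs
```

When the saved bit is already set, the function performs no copy and only advances the pointer, mirroring the optimization described above.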
[0299] A fill operation transfers data words from stack 400 to
stack cache memory circuit 610. In the embodiment of FIG. 1, the
transfer occurs through data cache unit 160. The specific interface
between stack management unit 150 and data cache unit 160 can vary.
Typically, stack management unit 150, and more specifically dribble
manager unit 151, determines whether the data register preceding
the data register pointed to by pointer CACHE_BOTTOM is free, i.e.
either the saved bit is in the saved logic state or the valid bit
is in the invalid logic state. If the data register preceding the
data register pointed to by pointer CACHE_BOTTOM is free, dribble
manager unit 151 requests a data word from stack 400 by sending a
request with the value of pointer CACHE_BOTTOM modulo-SCS minus
one. When the data word is received from data cache unit 160,
pointer CACHE_BOTTOM is modulo-SCS decremented by one and the
received data word is written to the data register pointed to by
pointer CACHE_BOTTOM through write port 670. Other registers as
described above may also be modulo-SCS decremented. The saved bit
and valid bit of the register pointed to by pointer CACHE_BOTTOM
are set to the saved logic state and valid logic state,
respectively. Some embodiments of dribble manager unit 151 transfer
multiple words for each fill operation. For these embodiments,
pointer CACHE_BOTTOM is modulo-SCS decremented by the number of
words transferred from stack 400.
[0300] In embodiments using a saved bit and valid bit, as shown in
FIG. 8, some optimization is possible. Specifically, if the saved
bit and valid bit of the data register preceding the data register
pointed to by pointer CACHE_BOTTOM are at the saved logic level and
the valid logic level, respectively, then the data word in that
data register was never overwritten. Therefore, the data word in
that data register does not need to be copied from stack 400.
However, pointer CACHE_BOTTOM is still modulo-SCS decremented by
one.
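A single-word fill, including the check for a free register and the optimization for a word that was never overwritten, can be sketched in the same simplified style. The function name, the list-based cache, and the dictionary standing in for stack 400 are all assumptions made for illustration.

```python
SCS = 64  # size of stack cache 155

def fill_one(data, valid, saved, stack_memory, cache_bottom, scs=SCS):
    """Try to fill the register preceding CACHE_BOTTOM.

    Returns the new CACHE_BOTTOM, or the old one if no fill is possible.
    """
    target = (cache_bottom - 1) % scs
    is_free = saved[target] or not valid[target]
    if not is_free:
        # Register holds a valid, unsaved word and must not be overwritten.
        return cache_bottom
    if not (saved[target] and valid[target]):
        # Word is no longer in the cache: fetch it from stack 400.
        data[target] = stack_memory[target]
    valid[target], saved[target] = True, True
    return target
```

A register that is both saved and valid still holds the word that was spilled earlier, so the function skips the fetch and only moves the pointer.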
[0301] As stated above, in one embodiment of stack cache 155,
hardware processor 100 accesses stack cache memory circuit 610
(FIG. 6) through write port 630, read port 640 and read port 650.
Stack control unit 152 generates pointers for write port 630, read
port 640, and read port 650 based on the requests of hardware
processor 100. FIG. 11 shows a circuit to generate pointers for a
typical operation which reads two data words from stack cache 155
and writes one data word to stack cache 155. The most common stack
manipulation for a stack-based computing system is to pop the top
two data words off of the stack and to push a data word onto the
top of the stack. Therefore, the circuit of FIG. 11 is configured
to be able to provide read pointers to the value of pointer OPTOP
and the value of pointer OPTOP modulo-SCS minus one, and a write
pointer to the current value of OPTOP modulo-SCS minus one.
[0302] Multiplexer (MUX) 1110 drives a read pointer RP1 for read
port 640. A select line RS1 controlled by hardware processor 100
determines whether multiplexer 1110 drives the same value as
pointer OPTOP or a read address R_ADDR1 as provided by hardware
processor 100.
[0303] Multiplexer 1120 provides a read pointer RP2 for read port
650. Modulo adder 1140 modulo-SCS adds negative one to the value of
pointer OPTOP and drives the resulting sum to multiplexer 1120. A
select line RS2 controlled by hardware processor 100 determines
whether multiplexer 1120 drives the value from modulo adder 1140 or
a read address R_ADDR2 as provided by hardware processor 100.
[0304] Multiplexer 1130 provides a write pointer WP for write port
630. A modulo adder 1150 modulo-SCS adds one to the value of
pointer OPTOP and drives the resulting sum to multiplexer 1130.
Select lines WS controlled by hardware processor 100 determine
whether multiplexer 1130 drives the value from modulo-SCS adder
1140, the value from modulo-SCS adder 1150, or a write address
W_ADDR as provided by hardware processor 100.
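The pointer values that the circuit of FIG. 11 can supply from pointer OPTOP can be sketched as below. The function names are assumptions; the externally supplied addresses R_ADDR1, R_ADDR2, and W_ADDR and the select lines are omitted for brevity.

```python
SCS = 64  # size of stack cache 155

def pop_two_push_one_pointers(optop, scs=SCS):
    """Pointers for the common 'pop two words, push one word' operation."""
    rp1 = optop              # read port 640: top of stack
    rp2 = (optop - 1) % scs  # read port 650: value from modulo adder 1140
    wp = (optop - 1) % scs   # write port 630: also from modulo adder 1140
    return rp1, rp2, wp

def push_pointer(optop, scs=SCS):
    """Write pointer for a plain push: value from modulo adder 1150."""
    return (optop + 1) % scs
```

In the pop-two, push-one case the result overwrites the lower of the two operands, which is why the write pointer equals the second read pointer.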
[0305] FIG. 12 shows a circuit that generates a read pointer R for
read port 640 or read port 650 in embodiments that allow access to
stack cache memory circuit 610 using pointer VARS. Multiplexer 1260
drives read pointer R to one of several input values received on
input ports 1261-1267 as determined by selection signals RS.
Selection signals RS are controlled by hardware processor 100. The
value of pointer OPTOP is driven to input port 1261. Modulo-SCS
adder 1210 drives the modulo-SCS sum of the value of pointer OPTOP
with negative one to input port 1262. Modulo-SCS adder 1220 drives
the modulo-SCS sum of the value of pointer OPTOP with negative two
to input port 1263. The value of pointer VARS is driven to input
port 1264. Modulo-SCS adder 1230 drives the modulo-SCS sum of the
value of pointer VARS with one to input port 1265. Modulo-SCS adder
1240 drives the modulo-SCS sum of the value of pointer VARS with
two to input port 1266. Modulo-SCS adder 1250 drives the modulo-SCS
sum of the value of pointer VARS with three to input port 1267.
Other embodiments may provide other values to the input ports of
multiplexer 1260.
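The selection performed by multiplexer 1260 can be modeled as a lookup keyed by input port number. This sketch assumes that the seven values arrive on input ports 1261 through 1267 in the order listed, with pointer VARS plus three on port 1267; the function names are hypothetical.

```python
SCS = 64  # size of stack cache 155

def read_pointer_inputs(optop, vars_ptr, scs=SCS):
    """Candidate values presented to multiplexer 1260, keyed by input port."""
    return {
        1261: optop,
        1262: (optop - 1) % scs,
        1263: (optop - 2) % scs,
        1264: vars_ptr,
        1265: (vars_ptr + 1) % scs,
        1266: (vars_ptr + 2) % scs,
        1267: (vars_ptr + 3) % scs,  # assumed port for VARS plus three
    }

def read_pointer(optop, vars_ptr, select, scs=SCS):
    """Read pointer R for the input port chosen by selection signals RS."""
    return read_pointer_inputs(optop, vars_ptr, scs)[select]
```

All offsets wrap modulo SCS, so a local variable near register 63 and its successors near register 0 are reached with the same circuit.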
[0306] Thus by using the stack cache according to the principles of
the invention, a dribbling management unit can efficiently control
transfers between the stack cache and the stack. Specifically, the
dribbling management unit is able to transfer data out of the stack
cache to make room for additional data as necessary and transfer
data into the stack cache as room becomes available transparently
to the stack-based computing system using the stack management
unit.
[0307] The various embodiments of the structure and method of this
invention that are described above are illustrative only of the
principles of this invention and are not intended to limit the
scope of the invention to the particular embodiments described. In
view of this disclosure, those skilled in the art can define other
memory circuits, registers, counters, stack-based computing
systems, dribble management units, fill control units, spill
control units, read ports, and write ports, and use these
alternative features to create a method or system of stack caching
according to the principles of this invention.
SECTION I
The JAVA Virtual Machine Specification
[0308] © 1993, 1994, 1995 Sun Microsystems, Inc.
[0309] 2550 Garcia Avenue, Mountain View, Calif.
[0310] 94043-1100 U.S.A.
[0311] All rights reserved. This BETA quality release and related
documentation are protected by copyright and distributed under
licenses restricting its use, copying, distribution, and
decompilation. No part of this release or related documentation may
be reproduced in any form by any means without prior written
authorization of Sun and its licensors, if any.
[0312] Portions of this product may be derived from the UNIX®
and Berkeley 4.3 BSD systems, licensed from UNIX System
Laboratories, Inc. and the University of California, respectively.
Third-party font software in this release is protected by copyright
and licensed from Sun's Font Suppliers.
[0313] RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by
the United States Government is subject to the restrictions set
forth in DFARS 252.227-7013 (c) (1) (ii) and FAR 52.227-19.
[0314] The release described in this manual may be protected by one
or more U.S. patents, foreign patents, or pending applications.
Trademarks
[0315] Sun, Sun Microsystems, Sun Microsystems Computer
Corporation, the Sun logo, the Sun Microsystems Computer
Corporation logo, WebRunner, JAVA, FirstPerson and the FirstPerson
logo and agent are trademarks or registered trademarks of Sun
Microsystems, Inc. The "Duke" character is a trademark of Sun
Microsystems, Inc. and Copyright (c) 1992-1995 Sun Microsystems,
Inc. All Rights Reserved. UNIX® is a registered trademark in
the United States and other countries, exclusively licensed through
X/Open Company, Ltd. OPEN LOOK is a registered trademark of Novell,
Inc. All other product names mentioned herein are the trademarks of
their respective owners.
[0316] All SPARC trademarks, including the SCD Compliant Logo, are
trademarks or registered trademarks of SPARC International, Inc.
SPARCstation, SPARCserver, SPARCengine, SPARCworks, and
SPARCompiler are licensed exclusively to Sun Microsystems, Inc.
Products bearing SPARC trademarks are based upon an architecture
developed by Sun Microsystems, Inc.
[0317] The OPEN LOOK® and Sun™ Graphical User Interfaces
were developed by Sun Microsystems, Inc. for its users and
licensees. Sun acknowledges the pioneering efforts of Xerox in
researching and developing the concept of visual or graphical user
interfaces for the computer industry. Sun holds a non-exclusive
license from Xerox to the Xerox Graphical User Interface, which
license also covers Sun's licensees who implement OPEN LOOK GUIs
and otherwise comply with Sun's written license agreements.
[0318] X Window System is a trademark and product of the
Massachusetts Institute of Technology.
[0319] THIS PUBLICATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT.
[0320] THIS PUBLICATION COULD INCLUDE TECHNICAL INACCURACIES OR
TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE
INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW
EDITIONS OF THE PUBLICATION. SUN MICROSYSTEMS, INC. MAY MAKE
IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S)
DESCRIBED IN THIS PUBLICATION AT ANY TIME.
Preface
[0321] This document describes version 1.0 of the JAVA Virtual
Machine and its instruction set. We have written this document to
act as a specification both for compiler writers who wish to
target the machine and for others who may wish to implement a
compliant JAVA Virtual Machine.
[0322] The JAVA Virtual Machine is an imaginary machine that is
implemented by emulating it in software on a real machine. Code for
the JAVA Virtual Machine is stored in class files, each of which
contains the code for at most one public class.
[0323] Simple and efficient emulations of the JAVA Virtual Machine
are possible because the machine's code format consists of compact
and efficient bytecodes. Implementations whose native code speed approximates
that of compiled C are also possible, by translating the bytecodes
to machine code, although Sun has not released such implementations
at this time.
[0324] The rest of this document is structured as follows:
[0325] Chapter 1 describes the architecture of the JAVA Virtual
Machine;
[0326] Chapter 2 describes the class file format;
[0327] Chapter 3 describes the bytecodes; and
[0328] Appendix A contains some instructions generated internally
by Sun's implementation of the JAVA Virtual Machine. While not
strictly part of the specification, we describe these here so that
this specification can serve as a reference for our implementation.
As more implementations of the JAVA Virtual Machine become
available, we may remove Appendix A from future releases.
[0329] Sun will license the JAVA Virtual Machine trademark and logo
for use with compliant implementations of this specification. If
you are considering constructing your own implementation of the
JAVA Virtual Machine, please contact us at the email address below
so that we can work together to ensure 100% compatibility of your
implementation.
[0330] Send comments on this specification or questions about
implementing the JAVA Virtual Machine to our electronic mail
address: JAVA@JAVA.sun.com.
1. JAVA Virtual Machine Architecture
1.1 Supported Data Types
[0331] The virtual machine data types include the basic data types
of the JAVA language:
[0332] byte //1-byte signed 2's complement integer
[0333] short //2-byte signed 2's complement integer
[0334] int //4-byte signed 2's complement integer
[0335] long //8-byte signed 2's complement integer
[0336] float //4-byte IEEE 754 single-precision float
[0337] double //8-byte IEEE 754 double-precision float
[0338] char //2-byte unsigned Unicode character
[0339] Nearly all JAVA type checking is done at compile time. Data
of the primitive types shown above need not be tagged by the
hardware to allow execution of JAVA. Instead, the bytecodes that
operate on primitive values indicate the types of the operands so
that, for example, the iadd, ladd, fadd, and dadd instructions each
add two numbers, whose types are int, long, float, and double,
respectively.
[0340] The virtual machine doesn't have separate instructions for
boolean types. Instead, integer instructions, including integer
returns, are used to operate on boolean values; byte arrays are
used for arrays of boolean.
[0341] The virtual machine specifies that floating point be done in
IEEE 754 format, with support for gradual underflow. Older computer
architectures that do not have support for IEEE format may run JAVA
numeric programs very slowly.
[0342] Other virtual machine data types include:
[0343] object //4-byte reference to a JAVA object
[0344] returnAddress //4 bytes, used with jsr/ret/jsr_w/ret_w
instructions
Note: JAVA arrays are treated as objects.
[0345] This specification does not require any particular internal
structure for objects. In our implementation an object reference is
to a handle, which is a pair of pointers: one to a method table for
the object, and the other to the data allocated for the object.
Other implementations may use inline caching, rather than method
table dispatch; such methods are likely to be faster on hardware
that is emerging between now and the year 2000.
[0346] Programs represented by JAVA Virtual Machine bytecodes are
expected to maintain proper type discipline and an implementation
may refuse to execute a bytecode program that appears to violate
such type discipline.
[0347] While the JAVA Virtual Machine would appear to be limited
by the bytecode definition to running on a 32-bit address space
machine, it is possible to build a version of the JAVA Virtual
Machine that automatically translates the bytecodes into a 64-bit
form. A description of this transformation is beyond the scope of
the JAVA Virtual Machine Specification.
1.2 Registers
[0348] At any point the virtual machine is executing the code of a
single method, and the pc register contains the address of the next
bytecode to be executed.
[0349] Each method has memory space allocated for it to hold:
[0350] a set of local variables, referenced by a vars register;
[0351] an operand stack, referenced by an optop register; and
[0352] an execution environment structure, referenced by a frame
register.
[0353] All of this space can be allocated at once, since the sizes
of the local variables and operand stack are known at compile time,
and the size of the execution environment structure is well-known
to the interpreter.
[0354] All of these registers are 32 bits wide.
1.3 Local Variables
[0355] Each JAVA method uses a fixed-sized set of local variables.
They are addressed as word offsets from the vars register. Local
variables are all 32 bits wide.
[0356] Long integers and double precision floats are considered to
take up two local variables but are addressed by the index of the
first local variable. (For example, a local variable with index n
containing a double precision float actually occupies storage at
indices n and n+1.) The virtual machine specification does not
require 64-bit values in local variables to be 64-bit aligned.
Implementors are free to decide the appropriate way to divide long
integers and double precision floats into two words.
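The two-slot rule for long and double values can be illustrated with a short Python sketch. The helper name assign_local_indices and the type-name strings are hypothetical; the text only states that a 64-bit value at index n also occupies index n+1.

```python
# Hypothetical helper: give every local variable an index, reserving two
# consecutive 32-bit slots (n and n+1) for long and double values.
TWO_WORD_TYPES = {"long", "double"}


def assign_local_indices(types):
    """Return (index of each variable, total number of 32-bit words used)."""
    indices = []
    next_index = 0
    for type_name in types:
        indices.append(next_index)
        next_index += 2 if type_name in TWO_WORD_TYPES else 1
    return indices, next_index


indices, words_used = assign_local_indices(["int", "double", "long", "int"])
print(indices, words_used)
```

Each variable is addressed by the index of its first word only, which is why the double at index 1 pushes the following long to index 3.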
[0357] Instructions are provided to load the values of local
variables onto the operand stack and store values from the operand
stack into local variables.
1.4 The Operand Stack
[0358] The machine instructions all take operands from an operand
stack, operate on them, and return results to the stack. We chose a
stack organization so that it would be easy to emulate the machine
efficiently on machines with few or irregular registers such as the
Intel 486 microprocessor.
[0359] The operand stack is 32 bits wide. It is used to pass
parameters to methods and receive method results, as well as to
supply parameters for operations and save operation results.
[0360] For example, execution of instruction iadd adds two integers
together. It expects that the two integers are the top two words on
the operand stack, and were pushed there by previous instructions.
Both integers are popped from the stack, added, and their sum
pushed back onto the operand stack. Subcomputations may be nested
on the operand stack, and result in a single operand that can be
used by the nesting computation.
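As a rough illustration of the iadd behavior described above, the following Python sketch models the operand stack as a list whose last element is the top of stack. The function names are illustrative, and wrapping the sum to a signed 32-bit value is an assumption based on the stated 32-bit stack width.

```python
def to_int32(value):
    """Wrap a Python integer to a signed 32-bit value (assumed behavior)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def iadd(stack):
    """Pop the top two words, add them, and push the sum back."""
    value2 = stack.pop()
    value1 = stack.pop()
    stack.append(to_int32(value1 + value2))


stack = [7, 2, 3]  # 7 is an unrelated word lower on the stack
iadd(stack)
print(stack)
```

The word below the two operands is left untouched, which is what allows subcomputations to nest on the stack.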
[0361] Each primitive data type has specialized instructions that
know how to operate on operands of that type. Each operand requires
a single location on the stack, except for long and double
operands, which require two locations.
[0362] Operands must be operated on by operators appropriate to
their type. It is illegal, for example, to push two integers and
then treat them as a long. This restriction is enforced, in the Sun
implementation, by the bytecode verifier. However, a small number
of operations (the dup opcodes and swap) operate on runtime data
areas as raw values of a given width without regard to type.
[0363] In our description of the virtual machine instructions
below, the effect of an instruction's execution on the operand
stack is represented textually, with the stack growing from left to
right, and each 32-bit word separately represented. Thus: [0364]
Stack: . . . , value1, value2 => . . . , value3 shows an operation
that begins by having value2 on top of the stack with value1 just
beneath it. As a result of the execution of the instruction, value1
and value2 are popped from the stack and replaced by value3, which
has been calculated by the instruction. The remainder of the stack,
represented by an ellipsis, is unaffected by the instruction's
execution.
[0365] The types long and double take two 32-bit words on the
operand stack: [0366] Stack: . . . => . . . , value-word1,
value-word2
[0367] This specification does not say how the two words are
selected from the 64-bit long or double value; it is only necessary
that a particular implementation be internally consistent.
1.5 Execution Environment
[0368] The information contained in the execution environment is
used to do dynamic linking, normal method returns, and exception
propagation.
1.5.1 Dynamic Linking
[0369] The execution environment contains references to the
interpreter symbol table for the current method and current class,
in support of dynamic linking of the method code. The class file
code for a method refers to methods to be called and variables to
be accessed symbolically. Dynamic linking translates these symbolic
method calls into actual method calls, loading classes as necessary
to resolve as-yet-undefined symbols, and translates variable
accesses into appropriate offsets in storage structures associated
with the runtime location of these variables.
[0370] This late binding of the methods and variables makes changes
in other classes that a method uses less likely to break this
code.
1.5.2 Normal Method Returns
[0371] If execution of the current method completes normally, then
a value is returned to the calling method. This occurs when the
called method executes a return instruction appropriate to the
return type.
[0372] The execution environment is used in this case to restore
the registers of the caller, with the program counter of the caller
appropriately incremented to skip the method call instruction.
Execution then continues in the calling method's execution
environment.
1.5.3 Exception and Error Propagation
[0373] An exceptional condition, known in JAVA as an Error or
Exception, which are subclasses of Throwable, may arise in a
program because of:
[0374] a dynamic linkage failure, such as a failure to find a needed
class file;
[0375] a run-time error, such as a reference through a null pointer;
[0376] an asynchronous event, such as is thrown by Thread.stop, from
another thread; and
[0377] the program using a throw statement.
When an exception occurs:
[0378] A list of catch clauses associated with the current method is
examined. Each catch clause describes the instruction range for
which it is active, describes the type of exception that it is to
handle, and has the address of the code to handle it.
[0379] An exception matches a catch clause if the instruction that
caused the exception is in the appropriate instruction range, and
the exception type is a subtype of the type of exception that the
catch clause handles. If a matching catch clause is found, the
system branches to the specified handler. If no handler is found,
the process is repeated until all the nested catch clauses of the
current method have been exhausted.
[0380] The order of the catch clauses in the list is important. The
virtual machine execution continues at the first matching catch
clause. Because JAVA code is structured, it is always possible to
sort all the exception handlers for one method into a single list
that, for any possible program counter value, can be searched in
linear order to find the proper (innermost containing applicable)
exception handler for an exception occurring at that program
counter value.
[0381] If there is no matching catch clause then the current method
is said to have as its outcome the uncaught exception. The
execution state of the method that called this method is restored
from the execution environment, and the propagation of the
exception continues, as though the exception had just occurred in
this caller.
1.5.4 Additional Information
[0382] The execution environment may be extended with additional
implementation-specified information, such as debugging
information.
1.6 Garbage Collected Heap
[0383] The JAVA heap is the runtime data area from which class
instances (objects) are allocated. The JAVA language is designed to
be garbage collected--it does not give the programmer the ability
to deallocate objects explicitly. The JAVA language does not
presuppose any particular kind of garbage collection; various
algorithms may be used depending on system requirements.
1.7 Method Area
[0384] The method area is analogous to the store for compiled code
in conventional languages or the text segment in a UNIX process. It
stores method code (compiled JAVA code) and symbol tables. In the
current JAVA implementation, method code is not part of the
garbage-collected heap, although this is planned for a future
release.
1.8 The JAVA Instruction Set
[0385] An instruction in the JAVA instruction set consists of a
one-byte opcode specifying the operation to be performed, and zero
or more operands supplying parameters or data that will be used by
the operation. Many instructions have no operands and consist only
of an opcode.
[0386] The inner loop of the virtual machine execution is
effectively:
do {
    fetch an opcode byte
    execute an action depending on the value of the opcode
} while (there is more to do);
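The loop above can be sketched as a tiny Python interpreter. This is only an illustration under stated assumptions: it handles three instructions, and the opcode numbers (bipush, iadd, return) follow the commonly documented JAVA bytecode encodings rather than anything given in this section.

```python
# Opcode values are assumptions taken from the usual JAVA bytecode tables.
BIPUSH = 0x10   # push a signed one-byte operand
IADD = 0x60     # add the top two integers
RETURN = 0xB1   # stop executing the method


def run(code):
    """Fetch an opcode byte, act on it, and repeat until RETURN."""
    stack = []
    pc = 0
    while True:
        opcode = code[pc]          # fetch an opcode byte
        pc += 1
        if opcode == BIPUSH:       # one operand byte follows the opcode
            operand = code[pc]
            pc += 1
            stack.append(operand - 256 if operand > 127 else operand)
        elif opcode == IADD:       # no operands, works on the stack only
            value2 = stack.pop()
            value1 = stack.pop()
            stack.append(value1 + value2)
        elif opcode == RETURN:
            return stack
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02X}")


print(run(bytes([BIPUSH, 2, BIPUSH, 3, IADD, RETURN])))
```

The sketch also shows why the number of operand bytes must be determined by the opcode: the loop has no other way to find the start of the next instruction.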
[0387] The number and size of the additional operands is determined
by the opcode. If an additional operand is more than one byte in
size, then it is stored in big-endian order--high order byte first.
For example, a 16-bit parameter is stored as two bytes whose value
is: first_byte*256+second_byte
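A minimal Python sketch of this big-endian rule follows; the function name read_u2_operand is illustrative only.

```python
def read_u2_operand(code, offset):
    """Combine two operand bytes, high-order byte first."""
    first_byte = code[offset]
    second_byte = code[offset + 1]
    return first_byte * 256 + second_byte


code = bytes([0x12, 0x34])
print(hex(read_u2_operand(code, 0)))
```

The same value is produced by Python's built-in int.from_bytes(code[offset:offset + 2], "big").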
[0388] The bytecode instruction stream is only byte-aligned, with
the exception being the tableswitch and lookupswitch instructions,
which force alignment to a 4-byte boundary within their
instructions.
[0389] These decisions keep the virtual machine code for a compiled
JAVA program compact and reflect a conscious bias in favor of
compactness at some possible cost in performance.
1.9 Limitations
[0390] The per-class constant pool has a maximum of 65535 entries.
This acts as an internal limit on the total complexity of a single
class.
[0391] The amount of code per method is limited to 65535 bytes by
the sizes of the indices in the code in the exception table, the
line number table, and the local variable table.
[0392] Besides this limit, the only other limitation of note is
that the number of words of arguments in a method call is limited
to 255.
2. Class File Format
[0393] This chapter documents the JAVA class (.class) file
format.
[0394] Each class file contains the compiled version of either a
JAVA class or a JAVA interface. Compliant JAVA interpreters must be
capable of dealing with all class files that conform to the
following specification.
[0395] A JAVA class file consists of a stream of 8-bit bytes. All
16-bit and 32-bit quantities are constructed by reading in two or
four 8-bit bytes, respectively. The bytes are joined together in
network (big-endian) order, where the high bytes come first. This
format is supported by the JAVA java.io.DataInput and
java.io.DataOutput interfaces, and classes such as
java.io.DataInputStream and java.io.DataOutputStream.
[0396] The class file format is described here using a structure
notation. Successive fields in the structure appear in the external
representation without padding or alignment. Variable size arrays,
often of variable sized elements, are called tables and are
commonplace in these structures.
[0397] The types u1, u2, and u4 mean an unsigned one-, two-, or
four-byte quantity, respectively, which are read by methods such as
readUnsignedByte, readUnsignedShort and readInt of the
java.io.DataInput interface.
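For readers working outside JAVA, the u1, u2, and u4 types can be read with a small Python helper. The class name ClassReader and its method names are hypothetical; the sketch relies only on the big-endian, unpadded layout described above.

```python
import io
import struct


class ClassReader:
    """Reads unsigned big-endian quantities from a byte string."""

    def __init__(self, data):
        self.stream = io.BytesIO(data)

    def _read(self, fmt, size):
        chunk = self.stream.read(size)
        if len(chunk) != size:
            raise EOFError("unexpected end of class file data")
        return struct.unpack(fmt, chunk)[0]

    def u1(self):
        return self._read(">B", 1)

    def u2(self):
        return self._read(">H", 2)

    def u4(self):
        return self._read(">I", 4)


reader = ClassReader(bytes([0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 0x03]))
print(reader.u1(), reader.u2(), reader.u4())
```

Because successive fields appear without padding or alignment, reading them in declaration order is enough to walk a structure.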
2.1 Format
[0398] The following pseudo-structure gives a top-level description
of the format of a class file:
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count - 1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
magic
[0399] This field must have the value 0xCAFEBABE.
minor_version, major_version
[0400] These fields contain the version number of the JAVA
compiler that produced this class file. An implementation of the
virtual machine will normally support some range of minor version
numbers 0-n of a particular major version number. If the minor
version number is incremented the new code won't run on the old
virtual machines, but it is possible to make a new virtual machine
which can run versions up to n+1.
[0401] A change of the major version number indicates a major
incompatible change, one that requires a different virtual machine
that may not support the old major version in any way.
[0402] The current major version number is 45; the current minor
version number is 3.
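A hedged Python sketch of checking the first eight bytes of a class file against these rules is shown below. The function name check_header and the policy of accepting minor versions 0 through 3 of major version 45 are assumptions made for illustration.

```python
import struct

SUPPORTED_MAJOR = 45       # current major version per the text
MAX_SUPPORTED_MINOR = 3    # assumed upper bound n for this sketch


def check_header(data):
    """Return (major, minor, supported) for the start of a class file."""
    magic, minor, major = struct.unpack_from(">IHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a class file: bad magic number")
    supported = major == SUPPORTED_MAJOR and minor <= MAX_SUPPORTED_MINOR
    return major, minor, supported


header = bytes.fromhex("CAFEBABE0003002D")
print(check_header(header))
```

Note that minor_version precedes major_version in the file, so the two fields are unpacked in that order.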
constant_pool_count
[0403] This field indicates the number of entries in the constant
pool in the class file.
constant_pool
[0404] The constant pool is a table of values. These values are the
various string constants, class names, field names, and others that
are referred to by the class structure or by the code.
[0405] constant_pool[0] is always unused by the compiler, and may
be used by an implementation for any purpose.
[0406] Each of the constant_pool entries 1 through
constant_pool_count-1 is a variable-length entry, whose format is
given by the first "tag" byte, as described in section 2.3.
access_flags
[0407] This field contains a mask of up to sixteen modifiers used
with class, method, and field declarations. The same encoding is
used on similar fields in field_info and method_info as described
below. Here is the encoding:
Flag Name         Value   Meaning                                                                 Used By
ACC_PUBLIC        0x0001  Visible to everyone                                                     Class, Method, Variable
ACC_PRIVATE       0x0002  Visible only to the defining class                                      Method, Variable
ACC_PROTECTED     0x0004  Visible to subclasses                                                   Method, Variable
ACC_STATIC        0x0008  Variable or method is static                                            Method, Variable
ACC_FINAL         0x0010  No further subclassing, overriding, or assignment after initialization  Class, Method, Variable
ACC_SYNCHRONIZED  0x0020  Wrap use in monitor lock                                                Method
ACC_VOLATILE      0x0040  Can't cache                                                             Variable
ACC_TRANSIENT     0x0080  Not to be written or read by a persistent object manager                Variable
ACC_NATIVE        0x0100  Implemented in a language other than JAVA                               Method
ACC_INTERFACE     0x0200  Is an interface                                                         Class
ACC_ABSTRACT      0x0400  No body provided                                                        Class, Method
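Since access_flags is a bit mask, decoding it amounts to testing each flag bit in turn. The Python sketch below uses the flag values from the table above; the function name decode_access_flags is illustrative.

```python
ACCESS_FLAGS = [
    (0x0001, "ACC_PUBLIC"),
    (0x0002, "ACC_PRIVATE"),
    (0x0004, "ACC_PROTECTED"),
    (0x0008, "ACC_STATIC"),
    (0x0010, "ACC_FINAL"),
    (0x0020, "ACC_SYNCHRONIZED"),
    (0x0040, "ACC_VOLATILE"),
    (0x0080, "ACC_TRANSIENT"),
    (0x0100, "ACC_NATIVE"),
    (0x0200, "ACC_INTERFACE"),
    (0x0400, "ACC_ABSTRACT"),
]


def decode_access_flags(mask):
    """Return the names of all flags set in a 16-bit access_flags mask."""
    return [name for bit, name in ACCESS_FLAGS if mask & bit]


print(decode_access_flags(0x0009))
```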
this_class
[0408] This field is an index into the constant pool;
constant_pool[this_class] must be a CONSTANT_Class.
super_class
[0409] This field is an index into the constant pool. If the value
of super_class is nonzero, then constant_pool [super_class] must be
a class, and gives the index of this class's superclass in the
constant pool.
[0410] If the value of super_class is zero, then the class being
defined must be java.lang.Object, and it has no superclass.
interfaces_count
[0411] This field gives the number of interfaces that this class
implements.
interfaces
[0412] Each value in this table is an index into the constant pool.
If a table value is nonzero (interfaces[i] != 0, where 0 <= i
< interfaces_count), then constant_pool[interfaces[i]] must be
an interface that this class implements.
fields_count
[0413] This field gives the number of instance variables, both
static and dynamic, defined by this class. The fields table
includes only those variables that are defined explicitly by this
class. It does not include those instance variables that are
accessible from this class but are inherited from superclasses.
fields
[0414] Each value in this table is a more complete description of a
field in the class. See section 2.4 for more information on the
field_info structures.
methods_count
[0415] This field indicates the number of methods, both static and
dynamic, defined by this class. This table only includes those
methods that are explicitly defined by this class. It does not
include inherited methods.
methods
[0416] Each value in this table is a more complete description of a
method in the class. See section 2.5 for more information on the
method_info structure.
attributes_count
[0417] This field indicates the number of additional attributes
about this class.
attributes
[0418] A class can have any number of optional attributes
associated with it. Currently, the only class attribute recognized
is the "SourceFile" attribute, which indicates the name of the
source file from which this class file was compiled. See section
2.6 for more information on the attribute_info structure.
2.2 Signatures
[0419] A signature is a string representing a type of a method,
field or array.
[0420] The field signature represents the value of an argument to a
function or the value of a variable. It is a series of bytes
generated by the following grammar:
<field_signature> ::= <field_type>
<field_type> ::= <base_type> | <object_type> | <array_type>
<base_type> ::= B|C|D|F|I|J|S|Z
<object_type> ::= L<fullclassname>;
<array_type> ::= [<optional_size><field_type>
<optional_size> ::= [0-9]
[0421] The meaning of the base types is as follows:
B                   byte      signed byte
C                   char      character
D                   double    double precision IEEE float
F                   float     single precision IEEE float
I                   int       integer
J                   long      long integer
L<fullclassname>;   . . .     an object of the given class
S                   short     signed short
Z                   boolean   true or false
[<field sig>        . . .     array
[0422] A return-type signature represents the return value from a
method. It is a series of bytes in the following grammar:
<return_signature>::=<field_type>|V
[0423] The character V indicates that the method returns no value.
Otherwise, the signature indicates the type of the return
value.
[0424] An argument signature represents an argument passed to a
method: <argument_signature>::=<field_type>
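The field and return signature grammars lend themselves to a small recursive parser. The Python sketch below is illustrative only: the function names are hypothetical, the optional array size is skipped rather than recorded, and class names are returned exactly as they appear between the L and the semicolon.

```python
BASE_TYPES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean",
}


def parse_field_type(signature, pos=0):
    """Parse one field type at pos; return (description, next position)."""
    ch = signature[pos]
    if ch in BASE_TYPES:
        return BASE_TYPES[ch], pos + 1
    if ch == "L":                       # object type: L<fullclassname>;
        end = signature.index(";", pos)
        return signature[pos + 1:end], end + 1
    if ch == "[":                       # array type: [<optional_size><field_type>
        pos += 1
        while signature[pos].isdigit():  # skip the optional size digits
            pos += 1
        element, pos = parse_field_type(signature, pos)
        return element + "[]", pos
    raise ValueError(f"bad signature character {ch!r} at {pos}")


def parse_return_signature(signature):
    """A return signature is either V (no value) or a field type."""
    if signature == "V":
        return "void"
    description, pos = parse_field_type(signature)
    if pos != len(signature):
        raise ValueError("trailing characters in signature")
    return description


print(parse_return_signature("[[I"))
```

Returning the next position from parse_field_type lets a caller parse several field types back to back, which is how an argument list would be consumed.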
[0425] A method signature represents the arguments that the method
expects, and the value that it returns.
<method_signature> ::= (<arguments_signature>) <return_signature>
<arguments_signature> ::= <argument_signature>*
2.3 Constant Pool
[0426] Each item in the constant pool begins with a 1-byte tag.
The table below lists the valid tags and their values.
Constant Type                 Value
CONSTANT_Class                7
CONSTANT_Fieldref             9
CONSTANT_Methodref            10
CONSTANT_InterfaceMethodref   11
CONSTANT_String               8
CONSTANT_Integer              3
CONSTANT_Float                4
CONSTANT_Long                 5
CONSTANT_Double               6
CONSTANT_NameAndType          12
CONSTANT_Utf8                 1
CONSTANT_Unicode              2
[0427] Each tag byte is then followed by one or more bytes giving
more information about the specific constant.
2.3.1 CONSTANT_Class
[0428] CONSTANT_Class is used to represent a class or an interface.
CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}
tag
[0429] The tag will have the value CONSTANT_Class.
name_index
[0430] constant_pool[name_index] is a CONSTANT_Utf8 giving the
string name of the class.
[0431] Because arrays are objects, the opcodes anewarray and
multianewarray can reference array "classes" via CONSTANT_Class
items in the constant pool. In this case, the name of the class is
its signature. For example, the class name for
[0432] int[][]
is
[0433] [[I
The class name for
[0434] Thread[]
is
[0435] "[Ljava.lang.Thread;"
2.3.2 CONSTANT_{Fieldref,Methodref,InterfaceMethodref}
[0436] Fields, methods, and interface methods are represented by
similar structures.
CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
CONSTANT_InterfaceMethodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
tag
[0437] The tag will have the value CONSTANT_Fieldref,
CONSTANT_Methodref, or CONSTANT_InterfaceMethodref.
class_index
[0438] constant_pool [class_index] will be an entry of type
CONSTANT_Class giving the name of the class or interface containing
the field or method.
[0439] For CONSTANT_Fieldref and CONSTANT_Methodref, the
CONSTANT_Class item must be an actual class. For
CONSTANT_InterfaceMethodref, the item must be an interface which
purports to implement the given method.
name_and_type_index
[0440] constant_pool [name_and_type_index] will be an entry of type
CONSTANT_NameAndType. This constant pool entry indicates the name
and signature of the field or method.
2.3.3 CONSTANT_String
[0441] CONSTANT_String is used to represent constant objects of the
built-in type String.
CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}
tag
[0442] The tag will have the value CONSTANT_String.
string_index
[0443] constant_pool[string_index] is a CONSTANT_Utf8 string
giving the value to which the String object is initialized.
2.3.4 CONSTANT_Integer and CONSTANT_Float
[0444] CONSTANT_Integer and CONSTANT_Float represent four-byte
constants.
CONSTANT_Integer_info {
    u1 tag;
    u4 bytes;
}
CONSTANT_Float_info {
    u1 tag;
    u4 bytes;
}
tag
[0445] The tag will have the value CONSTANT_Integer or
CONSTANT_Float.
bytes
[0446] For integers, the four bytes are the integer value. For
floats, they are the IEEE 754 standard representation of the
floating point value. These bytes are in network (high byte first)
order.
2.3.5 CONSTANT_Long and CONSTANT_Double
[0447] CONSTANT_Long and CONSTANT_Double represent eight-byte
constants.
CONSTANT_Long_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
}
CONSTANT_Double_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
}
[0448] All eight-byte constants take up two spots in the constant
pool. If this is the nth item in the constant pool, then the
next item will be numbered n+2.
tag
[0449] The tag will have the value CONSTANT_Long or
CONSTANT_Double.
high_bytes, low_bytes
[0450] For CONSTANT_Long, the 64-bit value is
(high_bytes<<32)+low_bytes.
[0451] For CONSTANT_Double, high_bytes and low_bytes together
represent the standard IEEE 754 representation of the 64-bit
double-precision floating point number.
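The handling of eight-byte constants can be sketched in Python as follows. The function names are hypothetical, treating the long as a signed two's-complement value is an assumption, and next_pool_index simply applies the rule that a CONSTANT_Long or CONSTANT_Double at index n is followed by the item numbered n+2.

```python
import struct

CONSTANT_LONG = 5
CONSTANT_DOUBLE = 6


def decode_long(high_bytes, low_bytes):
    """(high_bytes << 32) + low_bytes, read as a signed 64-bit value."""
    value = (high_bytes << 32) + low_bytes
    return value - (1 << 64) if value & (1 << 63) else value


def decode_double(high_bytes, low_bytes):
    """Reinterpret the two 32-bit words as a big-endian IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">II", high_bytes, low_bytes))[0]


def next_pool_index(index, tag):
    """Eight-byte constants take up two spots in the constant pool."""
    return index + (2 if tag in (CONSTANT_LONG, CONSTANT_DOUBLE) else 1)


print(decode_long(0x00000001, 0x00000000))
print(decode_double(0x3FF00000, 0x00000000))
print(next_pool_index(4, CONSTANT_DOUBLE))
```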
2.3.6 CONSTANT_NameAndType
[0452] CONSTANT_NameAndType is used to represent a field or method,
without indicating which class it belongs to.
CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 signature_index;
}
tag
[0453] The tag will have the value CONSTANT_NameAndType.
name_index
[0454] constant_pool [name_index] is a CONSTANT_Utf8 string giving
the name of the field or method.
signature_index
[0455] constant_pool [signature_index] is a CONSTANT_Utf8 string
giving the signature of the field or method.
2.3.7 CONSTANT_Utf8 and CONSTANT_Unicode
[0456] CONSTANT_Utf8 and CONSTANT_Unicode are used to represent
constant string values.
[0457] CONSTANT_Utf8 strings are "encoded" so that strings
containing only non-null ASCII characters can be represented using
only one byte per character, but characters of up to 16 bits can be
represented:
[0458] All characters in the range 0x0001 to 0x007F are
represented by a single byte: TABLE-US-00014 ##STR1##
[0459] The null character (0x0000) and characters in the
range 0x0080 to 0x07FF are represented by a pair of two
bytes: TABLE-US-00015 ##STR2##
[0460] Characters in the range 0x0800 to 0xFFFF are
represented by three bytes: TABLE-US-00016 ##STR3##
[0461] There are two differences between this format and the
"standard" UTF-8 format. First, the null byte (0x00) is
encoded in two-byte format rather than one-byte, so that our
strings never have embedded nulls. Second, only the one-byte,
two-byte, and three-byte formats are used. We do not recognize the
longer formats.
CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}
CONSTANT_Unicode_info {
    u1 tag;
    u2 length;
    u2 bytes[length];
}
tag
[0462] The tag will have the value CONSTANT_Utf8 or
CONSTANT_Unicode.
length
[0463] The number of bytes in the string. These strings are not
null terminated.
bytes
[0464] The actual bytes of the string.
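The CONSTANT_Utf8 encoding rules can be sketched in Python as shown below. The function name is hypothetical, and the bit layouts of the one-, two-, and three-byte forms are assumed to match the corresponding standard UTF-8 forms, since the layout figures are not reproduced in this text.

```python
def encode_utf8_constant(text):
    """Encode 16-bit characters using the one-, two-, and three-byte forms."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("only characters of up to 16 bits are supported")
        if 0x0001 <= code <= 0x007F:
            out.append(code)                       # single byte
        elif code <= 0x07FF:                       # includes the null character
            out.append(0xC0 | (code >> 6))
            out.append(0x80 | (code & 0x3F))
        else:                                      # 0x0800 to 0xFFFF
            out.append(0xE0 | (code >> 12))
            out.append(0x80 | ((code >> 6) & 0x3F))
            out.append(0x80 | (code & 0x3F))
    return bytes(out)


print(encode_utf8_constant("A\x00").hex())
```

Encoding the null character in the two-byte form is what guarantees that the resulting byte string never contains an embedded zero byte.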
2.4 Fields
[0466] The information for each field immediately follows the
fields_count field in the class file. Each field is described by a
variable length field_info structure. The format of this structure
is as follows:
field_info {
    u2 access_flags;
    u2 name_index;
    u2 signature_index;
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
access_flags
[0467] This is a set of sixteen flags used by classes, methods, and
fields to describe various properties and how they may be accessed
by methods in other classes. See the table "Access Flags" which
indicates the meaning of the bits in this field.
[0468] The possible flags that can be set for a field are
ACC_PUBLIC, ACC_PRIVATE, ACC_PROTECTED, ACC_STATIC, ACC_FINAL,
ACC_VOLATILE, and ACC_TRANSIENT.
[0469] At most one of ACC_PUBLIC, ACC_PROTECTED, and ACC_PRIVATE
can be set for any field.
name_index
[0470] constant_pool [name_index] is a CONSTANT_Utf8 string which
is the name of the field.
signature_index
[0471] constant_pool [signature_index] is a CONSTANT_Utf8 string
which is the signature of the field. See the section "Signatures"
for more information on signatures.
attributes_count
[0472] This value indicates the number of additional attributes
about this field.
attributes
[0473] A field can have any number of optional attributes
associated with it. Currently, the only field attribute recognized
is the "ConstantValue" attribute, which indicates that this field
is a static numeric constant, and indicates the constant value of
that field.
[0474] Any other attributes are skipped.
2.5 Methods
[0475] The information for each method immediately follows the
methods_count field in the class file. Each method is described by a
variable length method_info structure. The structure has the
following format:
method_info {
    u2 access_flags;
    u2 name_index;
    u2 signature_index;
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
access_flags
[0476] This is a set of sixteen flags used by classes, methods, and
fields to describe various properties and how they may be accessed
by methods in other classes. See the table "Access Flags" which
gives the various bits in this field.
[0477] The possible flags that can be set for a method are
ACC_PUBLIC, ACC_PRIVATE, ACC_PROTECTED, ACC_STATIC, ACC_FINAL,
ACC_SYNCHRONIZED, ACC_NATIVE, and ACC_ABSTRACT.
[0478] At most one of ACC_PUBLIC, ACC_PROTECTED, and ACC_PRIVATE
can be set for any method.
name_index
[0479] constant_pool[name_index] is a CONSTANT_Utf8 string giving
the name of the method.
signature_index
[0480] constant_pool [signature_index] is a CONSTANT_Utf8 string
giving the signature of the method. See the section "Signatures" for
more information on signatures.
attributes_count
[0481] This value indicates the number of additional attributes
about this method.
attributes
[0482] A method can have any number of optional attributes
associated with it. Each attribute has a name, and other additional
information. Currently, the only method attributes recognized are
the "Code" and "Exceptions" attributes, which describe the
bytecodes that are executed to perform this method, and the JAVA
Exceptions which are declared to result from the execution of the
method, respectively.
[0483] Any other attributes are skipped.
2.6 Attributes
[0485] Attributes are used at several different places in the class
format. All attributes have the following format:
GenericAttribute_info {
    u2 attribute_name;
    u4 attribute_length;
    u1 info[attribute_length];
}
[0486] The attribute_name is a 16-bit index into the class's
constant pool; the value of constant_pool [attribute_name] is a
CONSTANT_Utf8 string giving the name of the attribute. The field
attribute_length indicates the length of the subsequent information
in bytes. This length does not include the six bytes of the
attribute_name and attribute_length.
[0487] In the following text, whenever we allow attributes, we give
the name of the attributes that are currently understood. In the
future, more attributes will be added. Class file readers are
expected to skip over and ignore the information in any attribute
they do not understand.
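Because every attribute carries its own length, a reader can skip attributes it does not understand without knowing their layout. The Python sketch below illustrates this; the function name, the names dictionary standing in for constant pool lookups, and the understood set are all hypothetical.

```python
import struct


def read_attributes(data, offset, count, names, understood):
    """Collect understood attributes and skip the rest.

    names maps a constant pool index to the attribute name string
    (a stand-in for a real constant pool lookup).
    Returns (found attributes, offset just past the last attribute).
    """
    found = {}
    for _ in range(count):
        name_index, length = struct.unpack_from(">HI", data, offset)
        offset += 6                      # u2 attribute_name + u4 attribute_length
        name = names[name_index]
        if name in understood:
            found[name] = data[offset:offset + length]
        offset += length                 # skip info[] whether or not it was kept
    return found, offset


data = (struct.pack(">HI", 1, 2) + b"\x00\x05"
        + struct.pack(">HI", 2, 3) + b"abc")
names = {1: "SourceFile", 2: "Mystery"}
print(read_attributes(data, 0, 2, names, {"SourceFile"}))
```

The six-byte header is consumed separately because attribute_length counts only the bytes of info that follow it.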
2.6.1 SourceFile
[0488] The "SourceFile" attribute has the following format:
SourceFile_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 sourcefile_index;
}
attribute_name_index
[0489] constant_pool [attribute_name_index] is the CONSTANT_Utf8
string "SourceFile".
attribute_length
[0490] The length of a SourceFile_attribute must be 2.
sourcefile_index
[0491] constant_pool [sourcefile_index] is a CONSTANT_Utf8 string
giving the source file from which this class file was compiled.
2.6.2 ConstantValue
[0492] The "ConstantValue" attribute has the following format:
ConstantValue_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 constantvalue_index;
}
attribute_name_index
[0493] constant_pool [attribute_name_index] is the CONSTANT_Utf8
string "ConstantValue".
attribute_length
[0494] The length of a ConstantValue_attribute must be 2.
constantvalue_index
[0495] constant_pool [constantvalue_index] gives the constant value
for this field.
[0496] The constant pool entry must be of a type appropriate to the
field, as shown by the following table:
long                                CONSTANT_Long
float                               CONSTANT_Float
double                              CONSTANT_Double
int, short, char, byte, boolean     CONSTANT_Integer
2.6.3 Code
[0497] The "Code" attribute has the following format:
Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
attribute_name_index
[0498] constant_pool [attribute_name_index] is the CONSTANT_Utf8
string "Code".
attribute_length
[0499] This field indicates the total length of the "Code"
attribute, excluding the initial six bytes.
max_stack
[0500] Maximum number of entries on the operand stack that will be
used during execution of this method. See the other chapters in
this spec for more information on the operand stack.
max_locals
[0501] Number of local variable slots used by this method. See the
other chapters in this spec for more information on the local
variables.
code_length
[0502] The number of bytes in the virtual machine code for this
method.
code
[0503] These are the actual bytes of the virtual machine code that
implement the method. When read into memory, if the first byte of
code is aligned onto a multiple-of-four boundary, the tableswitch
and lookupswitch opcode entries will be aligned; see their
description for more information on alignment requirements.
exception_table_length
[0504] The number of entries in the following exception table.
exception_table
[0505] Each entry in the exception table describes one exception
handler in the code.
start_pc, end_pc
[0506] The two fields start_pc and end_pc indicate the ranges in the
code at which the exception handler is active. The values of both
fields are offsets from the start of the code. start_pc is
inclusive. end_pc is exclusive.
handler_pc
[0507] This field indicates the starting address of the exception
handler. The value of the field is an offset from the start of the
code.
catch_type
[0508] If catch_type is nonzero, then constant_pool [catch_type]
will be the class of exceptions that this exception handler is
designated to catch. This exception handler should only be called
if the thrown exception is an instance of the given class.
[0509] If catch_type is zero, this exception handler should be
called for all exceptions.
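The exception table fields described above suggest a simple linear search for a handler. The Python sketch below is illustrative: the function name, the tuple layout of each entry, and the is_instance callback (standing in for a real class hierarchy check against constant_pool[catch_type]) are assumptions.

```python
def find_handler(exception_table, pc, thrown_class, is_instance):
    """Return handler_pc of the first matching entry, or None.

    Each entry is (start_pc, end_pc, handler_pc, catch_type), where
    start_pc is inclusive and end_pc is exclusive.
    """
    for start_pc, end_pc, handler_pc, catch_type in exception_table:
        if start_pc <= pc < end_pc:
            # catch_type zero means the handler is called for all exceptions
            if catch_type == 0 or is_instance(thrown_class, catch_type):
                return handler_pc
    return None


table = [(0, 10, 100, 5), (0, 20, 200, 0)]
same_class = lambda thrown, catch_type: thrown == catch_type
print(find_handler(table, 4, 5, same_class))
```

Searching in table order matters, since execution continues at the first matching entry.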
attributes_count
[0510] This field indicates the number of additional attributes
about code. The "Code" attribute can itself have attributes.
attributes
[0511] A "Code" attribute can have any number of optional
attributes associated with it. Each attribute has a name, and other
additional information. Currently, the only code attributes defined
are the "LineNumberTable" and "LocalVariableTable," both of which
contain debugging information.
2.6.4 Exceptions Table
[0512] This table is used by compilers to indicate which
Exceptions a method is declared to throw:
Exceptions_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 number_of_exceptions;
    u2 exception_index_table[number_of_exceptions];
}
attribute_name_index
[0513] constant_pool [attribute_name_index] will be the
CONSTANT_Utf8 string "Exceptions".
attribute_length
[0514] This field indicates the total length of the
Exceptions_attribute, excluding the initial six bytes.
number_of_exceptions
[0515] This field indicates the number of entries in the following
exception index table.
exception_index_table
[0516] Each value in this table is an index into the constant pool.
For each table element (exception_index_table[i] != 0, where 0 <= i
< number_of_exceptions), constant_pool[exception_index_table[i]]
is an Exception that this method is declared to throw.
2.6.5 LineNumberTable
[0517] This attribute is used by debuggers and the exception
handler to determine which part of the virtual machine code
corresponds to a given location in the source. The
LineNumberTable_attribute has the following format: TABLE-US-00026
LineNumberTable_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 line_number_table_length;
    {   u2 start_pc;
        u2 line_number;
    } line_number_table[line_number_table_length];
}
attribute_name_index
[0518] constant_pool [attribute_name_index] will be the
CONSTANT_Utf8 string "LineNumberTable".
attribute_length
[0519] This field indicates the total length of the
LineNumberTable_attribute, excluding the initial six bytes.
line_number_table_length
[0520] This field indicates the number of entries in the following
line number table.
line_number_table
[0521] Each entry in the line number table indicates that the line
number in the source file changes at a given point in the code.
start_pc
[0522] This field indicates the place in the code at which the code
for a new line in the source begins. start_pc is an offset from the
beginning of the code.
line_number
[0523] The line number that begins at the given location in the
file.
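A debugger-style lookup over the line number table can be sketched as below. The sketch assumes, beyond what the text states, that the line for a given code offset is taken from the entry with the greatest start_pc not exceeding that offset; the function name line_for_pc and the representation of entries as (start_pc, line_number) pairs are hypothetical.

```python
def line_for_pc(line_number_table, pc):
    """Return the source line active at code offset pc, or None if unknown.

    line_number_table is a list of (start_pc, line_number) pairs. Entries are
    not assumed to be sorted, so the whole table is scanned.
    """
    best = None
    for start_pc, line_number in line_number_table:
        if start_pc <= pc and (best is None or start_pc >= best[0]):
            best = (start_pc, line_number)
    return None if best is None else best[1]
```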
2.6.6 LocalVariableTable
[0524] This attribute is used by debuggers to determine the value
of a given local variable during the dynamic execution of a method.
The format of the LocalVariableTable_attribute is as follows:
TABLE-US-00027
LocalVariableTable_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 local_variable_table_length;
    {   u2 start_pc;
        u2 length;
        u2 name_index;
        u2 signature_index;
        u2 slot;
    } local_variable_table[local_variable_table_length];
}
attribute_name_index
[0525] constant_pool [attribute_name_index] will be the
CONSTANT_Utf8 string "LocalVariableTable".
attribute_length
[0526] This field indicates the total length of the
LocalVariableTable_attribute, excluding the initial six bytes.
local_variable_table_length
[0527] This field indicates the number of entries in the following
local variable table.
local_variable_table
[0528] Each entry in the local variable table indicates a code
range during which a local variable has a value. It also indicates
where on the stack the value of that variable can be found.
start_pc, length
[0529] The given local variable will have a value at the code
between start_pc and start_pc+length. The two values are both
offsets from the beginning of the code.
name_index, signature_index
[0530] constant_pool [name_index] and constant_pool
[signature_index] are CONSTANT_Utf8 strings giving the name and
signature of the local variable.
slot
[0531] The given variable will be the slot-th local variable in
the method's frame.
3. The Virtual Machine Instruction Set
3.1 Format for the Instructions
[0532] JAVA Virtual Machine instructions are represented in this
document by an entry of the following form.
instruction name
[0533] Short description of the instruction
[0534] Syntax: TABLE-US-00028 opcode=number operand1 operand2
...
[0535] Stack: . . . , value1, value2 => . . . , value3
[0536] A longer description that explains the functions of the
instruction and indicates any exceptions that might be thrown
during execution.
[0537] Each line in the syntax table represents a single 8-bit
byte.
[0538] Operations of the JAVA Virtual Machine most often take their
operands from the stack and put their results back on the stack. As
a convention, the descriptions do not usually mention when the
stack is the source or destination of an operation, but will always
mention when it is not. For instance, instruction iload has the
short description "Load integer from local variable." Implicitly,
the integer is loaded onto the stack. Instruction iadd is described
as "Integer add"; both its source and destination are the
stack.
[0539] Instructions that do not affect the control flow of a
computation may be assumed to always advance the virtual machine
program counter to the opcode of the following instruction. Only
instructions that do affect control flow will explicitly mention
the effect they have on the program counter.
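The stack convention and the program counter rule described above can be sketched as a small byte-level interpreter loop. This is an illustration only, not the decoder or execution unit of the described embodiments. The function name run and the use of Python lists for the operand stack and local variables are assumptions, and 32-bit wraparound is deliberately omitted. The opcode numbers are the ones this document gives for bipush, iload, istore and iadd.

```python
BIPUSH, ILOAD, ISTORE, IADD = 16, 21, 54, 96


def run(code: bytes, local_vars: list) -> list:
    """Execute a straight-line byte sequence and return the final operand stack."""
    stack = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode == BIPUSH:
            byte1 = code[pc + 1]
            # byte1 is a signed 8-bit value expanded to an integer
            stack.append(byte1 - 256 if byte1 >= 128 else byte1)
            pc += 2
        elif opcode == ILOAD:
            stack.append(local_vars[code[pc + 1]])  # destination is the stack
            pc += 2
        elif opcode == ISTORE:
            local_vars[code[pc + 1]] = stack.pop()  # source is the stack
            pc += 2
        elif opcode == IADD:
            value2 = stack.pop()
            value1 = stack.pop()
            stack.append(value1 + value2)
            pc += 1
        else:
            raise ValueError(f"opcode {opcode} not handled in this sketch")
    return stack
```

None of these four instructions affects control flow, so each one simply advances pc past its own opcode and operand bytes.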
3.2 Pushing Constants onto the Stack
bipush
[0540] Push one-byte signed integer
[0541] Syntax: TABLE-US-00029 bipush=16 byte1
[0542] Stack: . . . => . . . , value
[0543] byte1 is interpreted as a signed 8-bit value. This value is
expanded to an integer and pushed onto the operand stack.
sipush
[0544] Push two-byte signed integer
[0545] Syntax: TABLE-US-00030 sipush=17 byte1 byte2
[0546] Stack: . . . => . . . , item
[0547] byte1 and byte2 are assembled into a signed 16-bit value.
This value is expanded to an integer and pushed onto the operand
stack.
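The sign expansion performed by bipush and sipush can be sketched as follows. The helper names are hypothetical, and treating byte1 as the high-order byte of the 16-bit value is an assumption, since the text above only says that the two bytes are "assembled".

```python
def bipush_value(byte1: int) -> int:
    """Interpret one byte (0..255) as a signed 8-bit value."""
    return byte1 - 0x100 if byte1 & 0x80 else byte1


def sipush_value(byte1: int, byte2: int) -> int:
    """Assemble two bytes into a signed 16-bit value (byte1 assumed high-order)."""
    raw = (byte1 << 8) | byte2
    return raw - 0x10000 if raw & 0x8000 else raw
```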
ldc1
[0548] Push item from constant pool
[0549] Syntax: TABLE-US-00031 ldc1=18 indexbyte1
[0550] Stack: . . . => . . . , item
[0551] indexbyte1 is used as an unsigned 8-bit index into the
constant pool of the current class. The item at that index is
resolved and pushed onto the stack. If a String is being pushed and
there isn't enough memory to allocate space for it then an
OutOfMemoryError is thrown.
[0552] Note: A String push results in a reference to an object.
ldc2
[0553] Push item from constant pool
[0554] Syntax: TABLE-US-00032 ldc2=19 indexbyte1 indexbyte2
[0555] Stack: . . . => . . . , item
[0556] indexbyte1 and indexbyte2 are used to construct an unsigned
16-bit index into the constant pool of the current class. The item
at that index is resolved and pushed onto the stack. If a String is
being pushed and there isn't enough memory to allocate space for it
then an OutOfMemoryError is thrown.
[0557] Note: A String push results in a reference to an object.
ldc2w
[0558] Push long or double from constant pool
[0559] Syntax: TABLE-US-00033 ldc2w=20 indexbyte1 indexbyte2
[0560] Stack: . . . => . . . , constant-word1, constant-word2
indexbyte1 and indexbyte2 are used to construct an unsigned 16-bit
index into the constant pool of the current class. The two-word
constant at that index is resolved and pushed onto the stack.
aconst_null
[0561] Push null object reference
[0562] Syntax: aconst_null=1
[0563] Stack: . . . => . . . , null
[0564] Push the null object reference onto the stack.
iconst_m1
[0565] Push integer constant -1
[0566] Syntax: iconst_m1=2
[0567] Stack: . . . => . . . , -1
[0568] Push the integer -1 onto the stack.
iconst_<n>
[0569] Push integer constant
[0570] Syntax: iconst_<n>
[0571] Stack: . . . => . . . , <n>
[0572] Forms: iconst_0=3, iconst_1=4, iconst_2=5,
iconst_3=6, iconst_4=7, iconst_5=8
[0573] Push the integer <n> onto the stack.
lconst_<l>
[0574] Push long integer constant
[0575] Syntax: lconst_<l>
[0576] Stack: . . . => . . . , <l>-word1,
<l>-word2
[0577] Forms: lconst_0=9, lconst_1=10
[0578] Push the long integer <l> onto the stack.
fconst_<f>
[0579] Push single float
[0580] Syntax: fconst_<f>
[0581] Stack: . . . => . . . , <f>
[0582] Forms: fconst_0=11, fconst_1=12,
fconst_2=13
[0583] Push the single-precision floating point number <f>
onto the stack.
dconst_<d>
[0584] Push double float
[0585] Syntax: dconst_<d>
[0586] Stack: . . . => . . . , <d>-word1,
<d>-word2
[0587] Forms: dconst_0=14, dconst_1=15
[0588] Push the double-precision floating point number <d>
onto the stack.
3.3 Loading Local Variables Onto the Stack
iload
[0589] Load integer from local variable
[0590] Syntax: TABLE-US-00034 iload=21 vindex
[0591] Stack: . . . => . . . , value
[0592] The value of the local variable at vindex in the current
JAVA frame is pushed onto the operand stack.
iload_<n>
[0593] Load integer from local variable
[0594] Syntax: iload_<n>
[0595] Stack: . . . => . . . , value
[0596] Forms: iload_0=26, iload_1=27, iload_2=28,
iload_3=29
[0597] The value of the local variable at <n> in the current
JAVA frame is pushed onto the operand stack.
[0598] This instruction is the same as iload with a vindex of
<n>, except that the operand <n> is implicit.
lload
[0599] Load long integer from local variable
[0600] Syntax: TABLE-US-00035 lload = 22 vindex
[0601] Stack: . . . => . . . , value-word1, value-word2
[0602] The value of the local variables at vindex and vindex+1 in
the current JAVA frame is pushed onto the operand stack.
lload_<n>
[0603] Load long integer from local variable
[0604] Syntax: lload_<n>
[0605] Stack: . . . => . . . , value-word1, value-word2
[0606] Forms: lload_0=30, lload_1=31, lload_2=32,
lload_3=33
[0607] The value of the local variables at <n> and
<n>+1 in the current JAVA frame is pushed onto the operand
stack.
[0608] This instruction is the same as lload with a vindex of
<n>, except that the operand <n> is implicit.
fload
[0609] Load single float from local variable
[0610] Syntax: TABLE-US-00036 fload = 23 vindex
[0611] Stack: . . . => . . . , value
[0612] The value of the local variable at vindex in the current
JAVA frame is pushed onto the operand stack.
fload_<n>
[0613] Load single float from local variable
[0614] Syntax: fload_<n>
[0615] Stack: . . . => . . . , value
[0616] Forms: fload_0=34, fload_1=35, fload_2=36,
fload_3=37
[0617] The value of the local variable at <n> in the current
JAVA frame is pushed onto the operand stack.
[0618] This instruction is the same as fload with a vindex of
<n>, except that the operand <n> is implicit.
dload
[0619] Load double float from local variable
[0620] Syntax: TABLE-US-00037 dload = 24 vindex
[0621] Stack: . . . => . . . , value-word1, value-word2
[0622] The value of the local variables at vindex and vindex+1 in
the current JAVA frame is pushed onto the operand stack.
dload_<n>
[0623] Load double float from local variable
[0624] Syntax: dload_<n>
[0625] Stack: . . . => . . . , value-word1, value-word2
[0626] Forms: dload_0=38, dload_1=39, dload_2=40,
dload_3=41
[0627] The value of the local variables at <n> and
<n>+1 in the current JAVA frame is pushed onto the operand
stack.
[0628] This instruction is the same as dload with a vindex of
<n>, except that the operand <n> is implicit.
aload
[0629] Load object reference from local variable
[0630] Syntax: TABLE-US-00038 aload = 25 vindex
[0631] Stack: . . . => . . . , value
[0632] The value of the local variable at vindex in the current
JAVA frame is pushed onto the operand stack.
aload_<n>
[0633] Load object reference from local variable
[0634] Syntax: aload_<n>
[0635] Stack: . . . => . . . , value
[0636] Forms: aload_0=42, aload_1=43, aload_2=44,
aload_3=45
[0637] The value of the local variable at <n> in the current
JAVA frame is pushed onto the operand stack.
[0638] This instruction is the same as aload with a vindex of
<n>, except that the operand <n> is implicit.
3.4 Storing Stack Values into Local Variables
istore
[0639] Store integer into local variable
[0640] Syntax: TABLE-US-00039 istore = 54 vindex
[0641] Stack: . . . , value => . . .
[0642] value must be an integer. Local variable vindex in the
current JAVA frame is set to value.
istore_<n>
[0643] Store integer into local variable
[0644] Syntax: istore_<n>
[0645] Stack: . . . , value => . . .
[0646] Forms: istore_0=59, istore_1=60,
istore_2=61, istore_3=62
[0647] value must be an integer. Local variable <n> in the
current JAVA frame is set to value.
[0648] This instruction is the same as istore with a vindex of
<n>, except that the operand <n> is implicit.
lstore
[0649] Store long integer into local variable
[0650] Syntax: TABLE-US-00040 lstore = 55 vindex
[0651] Stack: . . . , value-word1, value-word2=> . . .
[0652] value must be a long integer. Local variables vindex and
vindex+1 in the current JAVA frame are set to value.
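The two-word handling of long integers by lload and lstore can be sketched as below. This is a hedged illustration: the text does not say which word lives in which local variable, so placing value-word1 at vindex and value-word2 at vindex+1 is an assumption, as are the function names and the use of Python lists for the stack and the local variables.

```python
def lload(stack: list, local_vars: list, vindex: int) -> None:
    """Push the two words of a long held in local variables vindex and vindex+1."""
    stack.append(local_vars[vindex])      # value-word1 (assumed placement)
    stack.append(local_vars[vindex + 1])  # value-word2 (assumed placement)


def lstore(stack: list, local_vars: list, vindex: int) -> None:
    """Pop a two-word long and store it into local variables vindex and vindex+1."""
    word2 = stack.pop()  # value-word2 is on top of the stack
    word1 = stack.pop()
    local_vars[vindex] = word1
    local_vars[vindex + 1] = word2
```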
lstore_<n>
[0653] Store long integer into local variable
[0654] Syntax: lstore_<n>
[0655] Stack: . . . , value-word1, value-word2 => . . .
[0656] Forms: lstore_0=63, lstore_1=64,
lstore_2=65, lstore_3=66
[0657] value must be a long integer. Local variables <n> and
<n>+1 in the current JAVA frame are set to value.
[0658] This instruction is the same as lstore with a vindex of
<n>, except that the operand <n> is implicit.
fstore
[0659] Store single float into local variable
[0660] Syntax: TABLE-US-00041 fstore = 56 vindex
[0661] Stack: . . . , value => . . .
[0662] value must be a single-precision floating point number.
Local variable vindex in the current JAVA frame is set to
value.
fstore_<n>
[0663] Store single float into local variable
[0664] Syntax: fstore_<n>
[0665] Stack: . . . , value => . . .
[0666] Forms: fstore_0=67, fstore_1=68,
fstore_2=69, fstore_3=70
[0667] value must be a single-precision floating point number.
Local variable <n> in the current JAVA frame is set to
value.
[0668] This instruction is the same as fstore with a vindex of
<n>, except that the operand <n> is implicit.
dstore
[0669] Store double float into local variable
[0670] Syntax: TABLE-US-00042 dstore = 57 vindex
[0671] Stack: . . . , value-word1, value-word2=> . . .
[0672] value must be a double-precision floating point number.
Local variables vindex and vindex+1 in the current JAVA frame are
set to value.
dstore_<n>
[0673] Store double float into local variable
[0674] Syntax: dstore_<n>
[0675] Stack: . . . , value-word1, value-word2=> . . .
[0676] Forms: dstore_0=71, dstore_1=72,
dstore_2=73, dstore_3=74
[0677] value must be a double-precision floating point number.
Local variables <n> and <n>+1 in the current JAVA frame
are set to value.
[0678] This instruction is the same as dstore with a vindex of
<n>, except that the operand <n> is implicit.
astore
[0679] Store object reference into local variable
[0680] Syntax: astore=58 vindex
[0681] Stack: . . . , value=> . . .
[0682] value must be a return address or a reference to an object.
Local variable vindex in the current JAVA frame is set to
value.
astore_<n>
[0683] Store object reference into local variable
[0684] Syntax: astore_<n>
[0685] Stack: . . . , value=> . . .
[0686] Forms: astore_0=75, astore_1=76,
astore_2=77, astore_3=78
[0687] value must be a return address or a reference to an object.
Local variable <n> in the current JAVA frame is set to
value.
[0688] This instruction is the same as astore with a vindex of
<n>, except that the operand <n> is implicit.
iinc
[0689] Increment local variable by constant
[0690] Syntax: TABLE-US-00043 iinc = 132 vindex const
[0691] Stack: no change
[0692] Local variable vindex in the current JAVA frame must contain
an integer. Its value is incremented by the value const, where
const is treated as a signed 8-bit quantity.
3.5 Wider Index for Loading, Storing and Incrementing
wide
[0693] Wider index for accessing local variables in load, store and
increment.
[0694] Syntax: TABLE-US-00044 wide = 196 vindex2
[0695] Stack: no change
[0696] This bytecode must precede one of the following bytecodes:
iload, lload, fload, dload, aload, istore, lstore, fstore, dstore,
astore, iinc. The vindex of the following bytecode and vindex2 from
this bytecode are assembled into an unsigned 16-bit index to a
local variable in the current JAVA frame. The following bytecode
operates as normal except for the use of this wider index.
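The assembly of the wider index can be sketched as a single expression. The text does not say which byte supplies the high-order bits, so using vindex2 as the high byte is an assumption, and the function name wide_index is hypothetical.

```python
def wide_index(vindex2: int, vindex: int) -> int:
    """Combine vindex2 (from wide) and vindex (from the following bytecode).

    Assumes vindex2 holds the high-order 8 bits of the unsigned 16-bit index.
    """
    return ((vindex2 & 0xFF) << 8) | (vindex & 0xFF)
```

With vindex2 equal to zero the result is the same index the following bytecode would have used on its own.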
3.6 Managing Arrays
newarray
[0697] Allocate new array
[0698] Syntax: TABLE-US-00045 newarray = 188 atype
[0699] Stack: . . . , size=>result
[0700] size must be an integer. It represents the number of
elements in the new array.
[0701] atype is an internal code that indicates the type of array
to allocate. Possible values for atype are as follows:
TABLE-US-00046
T_BOOLEAN   4
T_CHAR      5
T_FLOAT     6
T_DOUBLE    7
T_BYTE      8
T_SHORT     9
T_INT      10
T_LONG     11
[0702] A new array of atype, capable of holding size elements, is
allocated, and result is a reference to this new object. Allocation
of an array large enough to contain size items of atype is
attempted. All elements of the array are initialized to zero.
[0703] If size is less than zero, a NegativeArraySizeException is
thrown. If there is not enough memory to allocate the array,
an OutOfMemoryError is thrown.
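The behaviour of newarray can be sketched as follows, using a Python list as a stand-in for the allocated array object. The atype codes come from the table above. The exception class, the function name, and the choice of 0.0 as the zero value for T_FLOAT and T_DOUBLE are assumptions made for illustration, and memory exhaustion is not modelled.

```python
class NegativeArraySizeException(Exception):
    """Stand-in for the exception named in the text."""


ATYPE_NAMES = {
    4: "T_BOOLEAN", 5: "T_CHAR", 6: "T_FLOAT", 7: "T_DOUBLE",
    8: "T_BYTE", 9: "T_SHORT", 10: "T_INT", 11: "T_LONG",
}


def newarray(stack: list, atype: int) -> None:
    """Pop size, allocate a zero-initialized array, push a reference to it."""
    if atype not in ATYPE_NAMES:
        raise ValueError(f"unknown atype {atype}")
    size = stack.pop()
    if size < 0:
        raise NegativeArraySizeException(size)
    zero = 0.0 if atype in (6, 7) else 0
    stack.append([zero] * size)
```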
anewarray
[0704] Allocate new array of references to objects
[0705] Syntax: TABLE-US-00047 anewarray = 189 indexbyte1
indexbyte2
[0706] Stack: . . . , size=>result
[0707] size must be an integer. It represents the number of
elements in the new array. indexbyte1 and indexbyte2 are used to
construct an index into the constant pool of the current class. The
item at that index is resolved. The resulting entry must be a
class.
[0708] A new array of the indicated class type and capable of
holding size elements is allocated, and result is a reference to
this new object. Allocation of an array large enough to contain
size items of the given class type is attempted. All elements of
the array are initialized to null.
[0709] If size is less than zero, a NegativeArraySizeException is
thrown. If there is not enough memory to allocate the array, an
OutOfMemoryError is thrown.
[0710] anewarray is used to create a single dimension of an array
of object references. For example, to create
[0711] new Thread[7]
[0712] the following code is used:
[0713] bipush 7
[0714] anewarray <Class "java.lang.Thread">
[0715] anewarray can also be used to create the first dimension of
a multi-dimensional array. For example, the following array
declaration:
[0716] new int[6][]
[0717] is created with the following code:
[0718] bipush 6
[0719] anewarray <Class "[I">
[0720] See CONSTANT_Class in the "Class File Format" chapter for
information on array class names.
multianewarray
[0721] Allocate new multi-dimensional array
[0722] Syntax: multianewarray=197 indexbyte1 indexbyte2
dimensions
[0723] Stack: . . . , size1, size2, . . . , sizen => result
[0724] Each size must be an integer. Each represents the number of
elements in a dimension of the array.
[0725] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The item at that index
is resolved. The resulting entry must be an array class of one or
more dimensions.
[0726] dimensions has the following aspects:
[0727] It must be an integer >= 1.
[0728] It represents the number of dimensions being created. It
must be <= the number of dimensions of the array class.
[0729] It represents the number of elements that are popped off the
stack. All must be integers greater than or equal to zero. These
are used as the sizes of the dimensions.
For example, to create
[0730] new int[6][3][]
[0731] the following code is used:
[0732] bipush 6
[0733] bipush 3
[0734] multianewarray <Class "[[[I"> 2
[0735] If any of the size arguments on the stack is less than zero,
a NegativeArraySizeException is thrown. If there is not enough
memory to allocate the array, an OutOfMemoryError is thrown.
[0736] The result is a reference to the new array object.
[0737] Note: It is more efficient to use newarray or anewarray when
creating a single dimension.
[0738] See CONSTANT_Class in the "Class File Format" chapter for
information on array class names.
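The allocation performed by multianewarray can be sketched with nested Python lists. This is only an illustration under stated assumptions: class_dimensions stands for the number of dimensions of the resolved array class and is passed in directly rather than resolved from the constant pool, dimensions left uncreated are represented by None, elements are assumed to be integers so the innermost zero value is 0, and all names are hypothetical.

```python
class NegativeArraySizeException(Exception):
    """Stand-in for the exception named in the text."""


def multianewarray(stack: list, class_dimensions: int, dimensions: int) -> None:
    """Pop `dimensions` sizes and push a nested array with that many levels created."""
    if not 1 <= dimensions <= class_dimensions:
        raise ValueError("dimensions must be >= 1 and <= the class dimensions")
    sizes = [stack.pop() for _ in range(dimensions)]
    sizes.reverse()  # restore the order size1, size2, ..., sizen
    if any(size < 0 for size in sizes):
        raise NegativeArraySizeException(sizes)
    # Innermost created level holds zeros only if every dimension is created.
    leaf = 0 if dimensions == class_dimensions else None

    def build(level: int) -> list:
        if level == dimensions - 1:
            return [leaf] * sizes[level]
        return [build(level + 1) for _ in range(sizes[level])]

    stack.append(build(0))
```

For the example new int[6][3][], the stack holds 6 and 3, class_dimensions is 3 and dimensions is 2, which yields six rows of three unallocated (None) entries each.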
arraylength
[0739] Get length of array
[0740] Syntax: arraylength=190
[0741] Stack: . . . , objectref=> . . . , length
[0742] objectref must be a reference to an array object. The length
of the array is determined and replaces objectref on the top of the
stack.
[0743] If the objectref is null, a NullPointerException is
thrown.
iaload
[0744] Load integer from array
[0745] Syntax: iaload=46
[0746] Stack: . . . , arrayref, index=> . . . , value
[0747] arrayref must be a reference to an array of integers. index
must be an integer. The integer value at position number index in
the array is retrieved and pushed onto the top of the stack.
[0748] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
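The checks performed by iaload can be sketched as below, with a Python list standing in for the array object and None standing in for the null reference. The exception classes and the function name are hypothetical stand-ins, and testing for null before testing the index is an assumption about ordering.

```python
class NullPointerException(Exception):
    """Stand-in for the exception named in the text."""


class ArrayIndexOutOfBoundsException(Exception):
    """Stand-in for the exception named in the text."""


def iaload(stack: list) -> None:
    """Pop index and arrayref, then push the integer stored at that position."""
    index = stack.pop()      # index is on top of the stack
    arrayref = stack.pop()
    if arrayref is None:
        raise NullPointerException()
    if not 0 <= index < len(arrayref):
        raise ArrayIndexOutOfBoundsException(index)
    stack.append(arrayref[index])
```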
laload
[0749] Load long integer from array
[0750] Syntax: laload=47
[0751] Stack: . . . , arrayref, index=> . . . , value-word1,
value-word2
[0752] arrayref must be a reference to an array of long integers.
index must be an integer. The long integer value at position number
index in the array is retrieved and pushed onto the top of the
stack.
[0753] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
faload
[0754] Load single float from array
[0755] Syntax: faload=48
[0756] Stack: . . . , arrayref, index=> . . . , value
[0757] arrayref must be a reference to an array of single-precision
floating point numbers. index must be an integer. The
single-precision floating point number value at position number
index in the array is retrieved and pushed onto the top of the
stack.
[0758] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
daload
[0759] Load double float from array
[0760] Syntax: daload=49
[0761] Stack: . . . , arrayref, index=> . . . , value-word1,
value-word2
[0762] arrayref must be a reference to an array of double-precision
floating point numbers. index must be an integer. The
double-precision floating point number value at position number
index in the array is retrieved and pushed onto the top of the
stack.
[0763] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
aaload
[0764] Load object reference from array
[0765] Syntax: aaload=50
[0766] Stack: . . . , arrayref, index=> . . . , value
[0767] arrayref must be a reference to an array of references to
objects. index must be an integer. The object reference at position
number index in the array is retrieved and pushed onto the top of
the stack.
[0768] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
baload
[0769] Load signed byte from array.
[0770] Syntax: baload=51
[0771] Stack: . . . , arrayref, index=> . . . , value
[0772] arrayref must be a reference to an array of signed bytes.
index must be an integer. The signed byte value at position number
index in the array is retrieved, expanded to an integer, and pushed
onto the top of the stack.
[0773] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
caload
[0774] Load character from array
[0775] Syntax: caload=52
[0776] Stack: . . . , arrayref, index=> . . . , value
[0777] arrayref must be a reference to an array of characters.
index must be an integer. The character value at position number
index in the array is retrieved, zero-extended to an integer, and
pushed onto the top of the stack.
[0778] If arrayref is null a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
saload
[0779] Load short from array
[0780] Syntax: saload=53
[0781] Stack: . . . , arrayref, index=> . . . , value
[0782] arrayref must be a reference to an array of short integers.
index must be an integer. The signed short integer value at
position number index in the array is retrieved, expanded to an
integer, and pushed onto the top of the stack.
[0783] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
iastore
[0784] Store into integer array
[0785] Syntax: iastore=79
[0786] Stack: . . . , arrayref, index, value=> . . .
[0787] arrayref must be a reference to an array of integers, index
must be an integer, and value an integer. The integer value is
stored at position index in the array.
[0788] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
lastore
[0789] Store into long integer array
[0790] Syntax: lastore=80
[0791] Stack: . . . , arrayref, index, value-word1,
value-word2=> . . .
[0792] arrayref must be a reference to an array of long integers,
index must be an integer, and value a long integer. The long
integer value is stored at position index in the array.
[0793] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array, an
ArrayIndexOutOfBoundsException is thrown.
fastore
[0794] Store into single float array
[0795] Syntax: fastore=81
[0796] Stack: . . . , arrayref, index, value=> . . .
[0797] arrayref must be an array of single-precision floating point
numbers, index must be an integer, and value a single-precision
floating point number. The single float value is stored at position
index in the array.
[0798] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
dastore
[0799] Store into double float array
[0800] Syntax: dastore=82
[0801] Stack: . . . , arrayref, index, value-word1,
value-word2=> . . .
[0802] arrayref must be a reference to an array of double-precision
floating point numbers, index must be an integer, and value a
double-precision floating point number. The double float value is
stored at position index in the array.
[0803] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
aastore
[0804] Store into object reference array
[0805] Syntax: aastore=83
[0806] Stack: . . . , arrayref, index, value=> . . .
[0807] arrayref must be a reference to an array of references to
objects, index must be an integer, and value a reference to an
object. The object reference value is stored at position index in
the array.
[0808] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array, an
ArrayIndexOutOfBoundsException is thrown.
[0809] The actual type of value must be conformable with the actual
type of the elements of the array. For example, it is legal to
store an instance of class Thread in an array of class Object, but
not vice versa. An ArrayStoreException is thrown if an attempt is
made to store an incompatible object reference.
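The conformability check made by aastore can be sketched as follows. Python classes stand in for the classes named in the text, and isinstance stands in for "conformable with". The RefArray type, the exception classes, the order of the checks, and the decision to let a null (None) value be stored without a type check are all assumptions for illustration.

```python
class NullPointerException(Exception):
    """Stand-in for the exception named in the text."""


class ArrayIndexOutOfBoundsException(Exception):
    """Stand-in for the exception named in the text."""


class ArrayStoreException(Exception):
    """Stand-in for the exception named in the text."""


class RefArray:
    """Hypothetical array of object references with a declared element type."""

    def __init__(self, element_type: type, size: int):
        self.element_type = element_type
        self.items = [None] * size  # references start out null


def aastore(stack: list) -> None:
    """Pop value, index and arrayref, then store value if its type conforms."""
    value = stack.pop()
    index = stack.pop()
    arrayref = stack.pop()
    if arrayref is None:
        raise NullPointerException()
    if not 0 <= index < len(arrayref.items):
        raise ArrayIndexOutOfBoundsException(index)
    if value is not None and not isinstance(value, arrayref.element_type):
        raise ArrayStoreException(type(value).__name__)
    arrayref.items[index] = value
```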
bastore
[0810] Store into signed byte array
[0811] Syntax: bastore=84
[0812] Stack: . . . , arrayref, index, value=> . . .
[0813] arrayref must be a reference to an array of signed bytes,
index must be an integer, and value an integer. The integer value
is stored at position index in the array. If value is too large to
be a signed byte, it is truncated.
[0814] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
castore
[0815] Store into character array
[0816] Syntax: castore=85
[0817] Stack: . . . , arrayref, index, value=> . . .
[0818] arrayref must be an array of characters, index must be an
integer, and value an integer. The integer value is stored at
position index in the array. If value is too large to be a
character, it is truncated.
[0819] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
sastore
[0820] Store into short array
[0821] Syntax: sastore=86
[0822] Stack: . . . , arrayref, index, value=> . . .
[0823] arrayref must be an array of shorts, index must be an
integer, and value an integer. The integer value is stored at
position index in the array. If value is too large to be a short,
it is truncated.
[0824] If arrayref is null, a NullPointerException is thrown. If
index is not within the bounds of the array an
ArrayIndexOutOfBoundsException is thrown.
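The truncation applied by bastore, castore and sastore can be sketched as below. The text only says that an oversized value "is truncated". Keeping the low-order bits, and treating characters as unsigned 16-bit quantities, are assumptions made here, and the helper names are hypothetical.

```python
def truncate_to_signed_byte(value: int) -> int:
    """Keep the low 8 bits and reinterpret them as a signed byte."""
    low = value & 0xFF
    return low - 0x100 if low & 0x80 else low


def truncate_to_char(value: int) -> int:
    """Keep the low 16 bits as an unsigned character code."""
    return value & 0xFFFF


def truncate_to_short(value: int) -> int:
    """Keep the low 16 bits and reinterpret them as a signed short."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low
```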
3.7 Stack Instructions
nop
[0825] Do nothing
[0826] Syntax: nop=0
[0827] Stack: no change
[0828] Do nothing.
pop
[0829] Pop top stack word
[0830] Syntax: pop=87
[0831] Stack: . . . , any=> . . .
[0832] Pop the top word from the stack.
pop2
[0833] Pop top two stack words
[0834] Syntax: pop2=88
[0835] Stack: . . . , any2, any1=> . . .
[0836] Pop the top two words from the stack.
dup
[0837] Duplicate top stack word
[0838] Syntax: dup=89
[0839] Stack: . . . , any=> . . . , any, any
[0840] Duplicate the top word on the stack.
dup2
[0841] Duplicate top two stack words
[0842] Syntax: dup2=92
[0843] Stack: . . . , any2, any1=> . . . , any2, any1, any2,
any1
[0844] Duplicate the top two words on the stack.
dup_x1
[0845] Duplicate top stack word and put two down
[0846] Syntax: dup_x1=90
[0847] Stack: . . . , any2, any1=> . . . , any1, any2, any1
[0848] Duplicate the top word on the stack and insert the copy two
words down in the stack.
dup2_x1
[0849] Duplicate top two stack words and put two down
[0850] Syntax: dup2_x1=93
[0851] Stack: . . . , any3, any2, any1=> . . . , any2, any1,
any3, any2, any1
[0852] Duplicate the top two words on the stack and insert the
copies two words down in the stack.
dup_x2
[0853] Duplicate top stack word and put three down
[0854] Syntax: dup_x2=91
[0855] Stack: . . . , any3, any2, any1=> . . . , any1, any3,
any2, any1
[0856] Duplicate the top word on the stack and insert the copy
three words down in the stack.
dup2_x2
[0857] Duplicate top two stack words and put three down
[0858] Syntax: dup2_x2=94
[0859] Stack: . . . , any4, any3, any2, any1=> . . . , any2,
any1, any4, any3, any2, any1
[0860] Duplicate the top two words on the stack and insert the
copies three words down in the stack.
swap
[0861] Swap top two stack words
[0862] Syntax: swap=95
[0863] Stack: . . . , any2, any1=> . . . , any1, any2
[0864] Swap the top two elements on the stack.
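The stack manipulation instructions of this section can be sketched on a Python list whose last element is the top of the stack. The function names mirror the mnemonics, the list representation is an assumption, and swap follows its prose description of exchanging the top two elements.

```python
def dup(s: list) -> None:
    s.append(s[-1])                  # ..., any => ..., any, any


def dup2(s: list) -> None:
    s.extend(s[-2:])                 # ..., any2, any1 => ..., any2, any1, any2, any1


def dup_x1(s: list) -> None:
    s.insert(-2, s[-1])              # copy of the top word goes two words down


def dup_x2(s: list) -> None:
    s.insert(-3, s[-1])              # copy of the top word goes three words down


def dup2_x1(s: list) -> None:
    s[-3:-3] = s[-2:]                # copies of the top two words go below any3


def dup2_x2(s: list) -> None:
    s[-4:-4] = s[-2:]                # copies of the top two words go below any4


def swap(s: list) -> None:
    s[-1], s[-2] = s[-2], s[-1]      # exchange the top two words
```

In each call the values being copied are read before the list is modified, so the operations are safe to apply in place.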
3.8 Arithmetic Instructions
iadd
[0865] Integer add
[0866] Syntax: iadd=96
[0867] Stack: . . . , value1, value2=> . . . , result
[0868] value1 and value2 must be integers. The values are added
and are replaced on the stack by their integer sum.
ladd
[0869] Long integer add
[0870] Syntax: ladd=97
[0871] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0872] value1 and value2 must be long integers. The values are
added and are replaced on the stack by their long integer sum.
fadd
[0873] Single floats add
[0874] Syntax: fadd=98
[0875] Stack: . . . , value1, value2=> . . . , result
[0876] value1 and value2 must be single-precision floating point
numbers. The values are added and are replaced on the stack by
their single-precision floating point sum.
dadd
[0877] Double floats add
[0878] Syntax: dadd=99
[0879] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0880] value1 and value2 must be double-precision floating point
numbers. The values are added and are replaced on the stack by
their double-precision floating point sum.
isub
[0881] Integer subtract
[0882] Syntax: isub=100
[0883] Stack: . . . , value1, value2=> . . . , result
[0884] value1 and value2 must be integers. value2 is subtracted
from value1, and both values are replaced on the stack by their
integer difference.
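The operand order for a non-commutative operation such as isub can be sketched as below. value2 is the top of the stack, so it is popped first. The function name and the list representation of the stack are assumptions, and 32-bit overflow is not modelled.

```python
def isub(stack: list) -> None:
    """Replace value1 and value2 on the stack with value1 - value2."""
    value2 = stack.pop()  # top of stack
    value1 = stack.pop()
    stack.append(value1 - value2)
```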
lsub
[0885] Long integer subtract
[0886] Syntax: lsub=101
[0887] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0888] value1 and value2 must be long integers. value2 is
subtracted from value1, and both values are replaced on the stack
by their long integer difference.
fsub
[0889] Single float subtract
[0890] Syntax: fsub=102
[0891] Stack: . . . , value1, value2=> . . . , result
[0892] value1 and value2 must be single-precision floating point
numbers. value2 is subtracted from value1, and both values are
replaced on the stack by their single-precision floating point
difference.
dsub
[0893] Double float subtract
[0894] Syntax: dsub=103
[0895] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0896] value1 and value2 must be double-precision floating point
numbers. value2 is subtracted from value1, and both values are
replaced on the stack by their double-precision floating point
difference.
imul
[0897] Integer multiply
[0898] Syntax: imul=104
[0899] Stack: . . . , value1, value2=> . . . , result
[0900] value1 and value2 must be integers. Both values are
replaced on the stack by their integer product.
lmul
[0901] Long integer multiply
[0902] Syntax: lmul=105
[0903] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0904] value1 and value2 must be long integers. Both values are
replaced on the stack by their long integer product.
fmul
[0905] Single float multiply
[0906] Syntax: fmul=106
[0907] Stack: . . . , value1, value2=> . . . , result
[0908] value1 and value2 must be single-precision floating point
numbers. Both values are replaced on the stack by their
single-precision floating point product.
dmul
[0909] Double float multiply
[0910] Syntax: dmul=107
[0911] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0912] value1 and value2 must be double-precision floating point
numbers. Both values are replaced on the stack by their
double-precision floating point product.
idiv
[0913] Integer divide
[0914] Syntax: idiv=108
[0915] Stack: . . . , value1, value2=> . . . , result
[0916] value1 and value2 must be integers. value1 is divided by
value2, and both values are replaced on the stack by their integer
quotient.
[0917] The result is truncated to the nearest integer that is
between it and 0. An attempt to divide by zero results in a "/ by
zero" ArithmeticException being thrown.
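The rounding rule and the divide-by-zero behavior can be illustrated with a minimal Java sketch. It assumes that the Java language's `/` operator on int operands has the idiv semantics described above; the class and method names are hypothetical and are not part of the described instruction set.

```java
// Hypothetical demo class illustrating the idiv description.
final class IdivSketch {
    // The quotient is truncated toward zero, as described for idiv.
    static int idiv(int value1, int value2) {
        return value1 / value2; // throws ArithmeticException when value2 == 0
    }

    // Returns true when dividing value1 by zero raises ArithmeticException.
    static boolean divideByZeroThrows(int value1) {
        try {
            idiv(value1, 0);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(idiv(7, 2));
        System.out.println(idiv(-7, 2)); // truncates toward zero rather than rounding down
        System.out.println(divideByZeroThrows(1));
    }
}
```

Truncation toward zero means that a negative quotient such as -3.5 becomes -3 rather than -4.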
ldiv
[0918] Long integer divide
[0919] Syntax: ldiv=109
[0920] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0921] value1 and value2 must be long integers. value1 is divided
by value2, and both values are replaced on the stack by their long
integer quotient.
[0922] The result is truncated to the nearest integer that is
between it and 0. An attempt to divide by zero results in a "/ by
zero" ArithmeticException being thrown.
fdiv
[0923] Single float divide
[0924] Syntax: fdiv=110
[0925] Stack: . . . , value1, value2=> . . . , result
[0926] value1 and value2 must be single-precision floating point
numbers. value1 is divided by value2, and both values are replaced
on the stack by their single-precision floating point quotient.
[0927] Divide by zero results in the quotient being NaN.
ddiv
[0928] Double float divide
[0929] Syntax: ddiv=111
[0930] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0931] value1 and value2 must be double-precision floating point
numbers. value1 is divided by value2, and both values are replaced
on the stack by their double-precision floating point quotient.
[0932] Divide by zero results in the quotient being NaN.
irem
[0933] Integer remainder
[0934] Syntax: irem=112
[0935] Stack: . . . , value1, value2=> . . . , result
[0936] value1 and value2 must both be integers. value1 is divided
by value2, and both values are replaced on the stack by their
integer remainder.
[0937] An attempt to divide by zero results in a "/ by zero"
ArithmeticException being thrown.
lrem
[0938] Long integer remainder
[0939] Syntax: lrem=113
[0940] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0941] value1 and value2 must both be long integers. value1 is
divided by value2, and both values are replaced on the stack by
their long integer remainder.
[0942] An attempt to divide by zero results in a "/ by zero"
ArithmeticException being thrown.
frem
[0943] Single float remainder
[0944] Syntax: frem=114
[0945] Stack: . . . , value1, value2=> . . . , result
[0946] value1 and value2 must both be single-precision floating
point numbers. value1 is divided by value2, and the quotient is
truncated to an integer, and then multiplied by value2. The product
is subtracted from value1. The result, as a single-precision
floating point number, replaces both values on the stack.
result=value1-(integral_part(value1/value2)*value2), where
integral_part( ) rounds to the nearest integer, with a tie going to
the even number.
[0947] An attempt to divide by zero results in NaN.
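The remainder formula can be evaluated under either reading of integral_part( ) given above: truncation of the quotient, or rounding to the nearest integer with ties going to the even number. The following Java sketch computes both so the difference is visible. It assumes that Java's `%` operator on floats is the truncating form and uses the standard-library `Math.IEEEremainder` for the round-to-nearest form; the class and method names are hypothetical.

```java
// Hypothetical demo class; both methods evaluate
// result = value1 - (integral_part(value1/value2) * value2).
final class FremSketch {
    // integral_part( ) truncates the quotient toward zero.
    static float truncRem(float value1, float value2) {
        return value1 % value2;
    }

    // integral_part( ) rounds the quotient to the nearest integer, ties to even.
    static float nearestRem(float value1, float value2) {
        return (float) Math.IEEEremainder(value1, value2);
    }

    public static void main(String[] args) {
        // 5.5 / 2.0 = 2.75: truncation gives 2, rounding to nearest gives 3.
        System.out.println(truncRem(5.5f, 2.0f));
        System.out.println(nearestRem(5.5f, 2.0f));
        // A zero divisor yields NaN under either reading.
        System.out.println(truncRem(1.0f, 0.0f));
    }
}
```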
drem
[0948] Double float remainder
[0949] Syntax: drem=115
[0950] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[0951] value1 and value2 must both be double-precision floating
point numbers. value1 is divided by value2, and the quotient is
truncated to an integer, and then multiplied by value2. The product
is subtracted from value1. The result, as a double-precision
floating point number, replaces both values on the stack.
result=value1-(integral_part(value1/value2)*value2), where
integral_part( ) rounds to the nearest integer, with a tie going to
the even number.
[0952] An attempt to divide by zero results in NaN.
ineg
[0953] Integer negate
[0954] Syntax: ineg=116
[0955] Stack: . . . , value=> . . . , result
[0956] value must be an integer. It is replaced on the stack by its
arithmetic negation.
lneg
[0957] Long integer negate
[0958] Syntax: lneg=117
[0959] Stack: . . . , value-word1, value-word2=> . . . ,
result-word1, result-word2
[0960] value must be a long integer. It is replaced on the stack by
its arithmetic negation.
fneg
[0961] Single float negate
[0962] Syntax: fneg=118
[0963] Stack: . . . , value=> . . . , result
[0964] value must be a single-precision floating point number. It
is replaced on the stack by its arithmetic negation.
dneg
[0965] Double float negate
[0966] Syntax: dneg=119
[0967] Stack: . . . , value-word1, value-word2=> . . . ,
result-word1, result-word2
[0968] value must be a double-precision floating point number. It
is replaced on the stack by its arithmetic negation.
3.9 Logical Instructions
ishl
[0969] Integer shift left
[0970] Syntax: ishl=120
[0971] Stack: . . . , value1, value2=> . . . , result
[0972] value1 and value2 must be integers. value1 is shifted left
by the amount indicated by the low five bits of value2. The integer
result replaces both values on the stack.
ishr
[0973] Integer arithmetic shift right
[0974] Syntax: ishr=122
[0975] Stack: . . . , value1, value2=> . . . , result
[0976] value1 and value2 must be integers. value1 is shifted right
arithmetically (with sign extension) by the amount indicated by the
low five bits of value2. The integer result replaces both values on
the stack.
iushr
[0977] Integer logical shift right
[0978] Syntax: iushr=124
[0979] Stack: . . . , value1, value2=> . . . , result
[0980] value1 and value2 must be integers. value1 is shifted right
logically (with no sign extension) by the amount indicated by the
low five bits of value2. The integer result replaces both values on
the stack.
lshl
[0981] Long integer shift left
[0982] Syntax: lshl=121
[0983] Stack: . . . , value1-word1, value1-word2, value2=> . . .
, result-word1, result-word2
[0984] value1 must be a long integer and value2 must be an
integer. value1 is shifted left by the amount indicated by the low
six bits of value2. The long integer result replaces both values on
the stack.
lshr
[0985] Long integer arithmetic shift right
[0986] Syntax: lshr=123
[0987] Stack: . . . , value1-word1, value1-word2, value2=> . . .
, result-word1, result-word2
[0988] value1 must be a long integer and value2 must be an
integer. value1 is shifted right arithmetically (with sign
extension) by the amount indicated by the low six bits of value2.
The long integer result replaces both values on the stack.
lushr
[0989] Long integer logical shift right
[0990] Syntax: lushr=125
[0991] Stack: . . . , value1-word1, value1-word2, value2=> . . .
, result-word1, result-word2
[0992] value1 must be a long integer and value2 must be an
integer. value1 is shifted right logically (with no sign extension)
by the amount indicated by the low six bits of value2. The long
integer result replaces both values on the stack.
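The masking of the shift amount described for the six shift instructions can be sketched in Java as follows. The explicit masks `0x1F` (low five bits) and `0x3F` (low six bits) restate the text above; the class and method names are hypothetical and chosen only to mirror the instruction mnemonics.

```java
// Hypothetical demo class mirroring the shift instruction descriptions.
final class ShiftSketch {
    // Integer shifts use only the low five bits of value2.
    static int ishl(int value1, int value2)  { return value1 << (value2 & 0x1F); }
    static int ishr(int value1, int value2)  { return value1 >> (value2 & 0x1F); }  // sign extension
    static int iushr(int value1, int value2) { return value1 >>> (value2 & 0x1F); } // no sign extension

    // Long shifts use only the low six bits of value2, which is an integer.
    static long lshl(long value1, int value2)  { return value1 << (value2 & 0x3F); }
    static long lshr(long value1, int value2)  { return value1 >> (value2 & 0x3F); }
    static long lushr(long value1, int value2) { return value1 >>> (value2 & 0x3F); }

    public static void main(String[] args) {
        System.out.println(ishl(1, 33));   // shift amount 33 is reduced to 1
        System.out.println(ishr(-8, 1));   // arithmetic shift keeps the sign
        System.out.println(iushr(-8, 28)); // logical shift fills with zeros
        System.out.println(lshl(1L, 65));  // shift amount 65 is reduced to 1
    }
}
```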
iand
[0993] Integer boolean AND
[0994] Syntax: iand=126
[0995] Stack: . . . , value1, value2=> . . . , result
[0996] value1 and value2 must both be integers. They are replaced
on the stack by their bitwise logical and (conjunction).
land
[0997] Long integer boolean AND
[0998] Syntax: land=127
[0999] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[1000] value1 and value2 must both be long integers. They are
replaced on the stack by their bitwise logical and
(conjunction).
ior
[1001] Integer boolean OR
[1002] Syntax: ior=128
[1003] Stack: . . . , value1, value2=> . . . , result
[1004] value1 and value2 must both be integers. They are replaced
on the stack by their bitwise logical or (disjunction).
lor
[1005] Long integer boolean OR
[1006] Syntax: lor=129
[1007] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[1008] value1 and value2 must both be long integers. They are
replaced on the stack by their bitwise logical or
(disjunction).
ixor
[1009] Integer boolean XOR
[1010] Syntax: ixor=130
[1011] Stack: . . . , value1, value2=> . . . , result
[1012] value1 and value2 must both be integers. They are replaced
on the stack by their bitwise exclusive or (exclusive
disjunction).
lxor
[1013] Long integer boolean XOR
[1014] Syntax: lxor=131
[1015] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result-word1, result-word2
[1016] value1 and value2 must both be long integers. They are
replaced on the stack by their bitwise exclusive or (exclusive
disjunction).
3.10 Conversion Operations
i2l
[1017] Integer to long integer conversion
[1018] Syntax: i2l=133
[1019] Stack: . . . , value=> . . . , result-word1,
result-word2
[1020] value must be an integer. It is converted to a long integer.
The result replaces value on the stack.
i2f
[1021] Integer to single float
[1022] Syntax: i2f=134
[1023] Stack: . . . , value=> . . . , result
[1024] value must be an integer. It is converted to a
single-precision floating point number. The result replaces value
on the stack.
i2d
[1025] Integer to double float
[1026] Syntax: i2d=135
[1027] Stack: . . . , value=> . . . , result-word1,
result-word2
[1028] value must be an integer. It is converted to a
double-precision floating point number. The result replaces value
on the stack.
l2i
[1029] Long integer to integer
[1030] Syntax: l2i=136
[1031] Stack: . . . , value-word1, value-word2=> . . . ,
result
[1032] value must be a long integer. It is converted to an integer
by taking the low-order 32 bits. The result replaces value on the
stack.
l2f
[1033] Long integer to single float
[1034] Syntax: l2f=137
[1035] Stack: . . . , value-word1, value-word2=> . . . ,
result
[1036] value must be a long integer. It is converted to a
single-precision floating point number. The result replaces value
on the stack.
l2d
[1037] Long integer to double float
[1038] Syntax: l2d=138
[1039] Stack: . . . , value-word1, value-word2=> . . . ,
result-word1, result-word2
[1040] value must be a long integer. It is converted to a
double-precision floating point number. The result replaces value
on the stack.
f2i
[1041] Single float to integer
[1042] Syntax: f2i=139
[1043] Stack: . . . , value=> . . . , result
[1044] value must be a single-precision floating point number. It
is converted to an integer. The result replaces value on the
stack.
f2l
[1045] Single float to long integer
[1046] Syntax: f2l=140
[1047] Stack: . . . , value=> . . . , result-word1,
result-word2
[1048] value must be a single-precision floating point number. It
is converted to a long integer. The result replaces value on the
stack.
f2d
[1049] Single float to double float
[1050] Syntax: f2d=141
[1051] Stack: . . . , value=> . . . , result-word1,
result-word2
[1052] value must be a single-precision floating point number. It
is converted to a double-precision floating point number. The
result replaces value on the stack.
d2i
[1053] Double float to integer
[1054] Syntax: d2i=142
[1055] Stack: . . . , value-word1, value-word2=> . . . ,
result
[1056] value must be a double-precision floating point number. It
is converted to an integer. The result replaces value on the
stack.
d2l
[1057] Double float to long integer
[1058] Syntax: d2l=143
[1059] Stack: . . . , value-word1, value-word2=> . . . ,
result-word1, result-word2
[1060] value must be a double-precision floating point number. It
is converted to a long integer. The result replaces value on the
stack.
d2f
[1061] Double float to single float
[1062] Syntax: d2f=144
[1063] Stack: . . . , value-word1, value-word2=> . . . ,
result
[1064] value must be a double-precision floating point number. It
is converted to a single-precision floating point number. If
overflow occurs, the result must be infinity with the same sign as
value. The result replaces value on the stack.
int2byte
[1065] Integer to signed byte
[1066] Syntax: int2byte=145
[1067] Stack: . . . , value=> . . . , result
[1068] value must be an integer. It is truncated to a signed 8-bit
result, then sign extended to an integer. The result replaces value
on the stack.
int2char
[1069] Integer to char
[1070] Syntax: int2char=146
[1071] Stack: . . . , value=> . . . , result
[1072] value must be an integer. It is truncated to an unsigned
16-bit result, then zero extended to an integer. The result
replaces value on the stack.
int2short
[1073] Integer to short
[1074] Syntax: int2short=147
[1075] Stack: . . . , value=> . . . , result
[1076] value must be an integer. It is truncated to a signed 16-bit
result, then sign extended to an integer. The result replaces value
on the stack.
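The narrowing conversions in this section (l2i, d2f, int2byte, int2char, and int2short) can be sketched in Java with explicit shifts and masks, so that the truncate-then-extend steps in the text are visible. This is an illustration only; the class and method names are hypothetical, and the d2f line assumes that Java's `(float)` cast has the overflow behavior described above.

```java
// Hypothetical demo class mirroring the narrowing conversion descriptions.
final class NarrowingSketch {
    // l2i: keep the low-order 32 bits.
    static int l2i(long value) { return (int) value; }

    // int2byte: truncate to 8 bits, then sign extend to an integer.
    static int int2byte(int value) { return (value << 24) >> 24; }

    // int2char: truncate to 16 bits, then zero extend to an integer.
    static int int2char(int value) { return value & 0xFFFF; }

    // int2short: truncate to 16 bits, then sign extend to an integer.
    static int int2short(int value) { return (value << 16) >> 16; }

    // d2f: on overflow the result is infinity with the same sign as value.
    static float d2f(double value) { return (float) value; }

    public static void main(String[] args) {
        System.out.println(l2i(0x100000001L));
        System.out.println(int2byte(0x1FF));
        System.out.println(int2char(-1));
        System.out.println(int2short(0x18000));
        System.out.println(d2f(1e300));
    }
}
```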
3.11 Control Transfer Instructions
ifeq
[1077] Branch if equal to 0
[1078] Syntax: TABLE-US-00048 ifeq = 153 branchbyte1
branchbyte2
[1079] Stack: . . . , value=> . . .
[1080] value must be an integer. It is popped from the stack. If
value is zero, branchbyte1 and branchbyte2 are used to construct a
signed 16-bit offset. Execution proceeds at that offset from the
address of this instruction. Otherwise execution proceeds at the
instruction following the ifeq.
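The construction of the signed 16-bit offset, which recurs in every conditional branch below, can be sketched in Java. The sketch assumes that branchbyte1 is the high-order byte, which the text does not state explicitly, and that a conditional branch occupies three bytes (the opcode plus two branch bytes). The class and method names are hypothetical.

```java
// Hypothetical demo class for the branch offset construction.
final class BranchSketch {
    // Assumption: branchbyte1 is the high-order byte of the signed 16-bit offset.
    static int offset16(int branchbyte1, int branchbyte2) {
        return (short) (((branchbyte1 & 0xFF) << 8) | (branchbyte2 & 0xFF));
    }

    // Returns the address at which execution proceeds after an ifeq located at pc.
    static int ifeq(int pc, int value, int branchbyte1, int branchbyte2) {
        if (value == 0) {
            return pc + offset16(branchbyte1, branchbyte2); // offset from this instruction
        }
        return pc + 3; // fall through to the instruction following the ifeq
    }

    public static void main(String[] args) {
        System.out.println(offset16(0xFF, 0xFE));       // a negative (backward) offset
        System.out.println(ifeq(100, 0, 0xFF, 0xFE));   // branch taken
        System.out.println(ifeq(100, 5, 0xFF, 0xFE));   // branch not taken
    }
}
```

The cast to `short` is what makes the offset signed, so backward branches fall out of the same arithmetic as forward ones.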
ifnull
[1081] Branch if null
[1082] Syntax: TABLE-US-00049 ifnull = 198 branchbyte1
branchbyte2
[1083] Stack: . . . , value=> . . .
[1084] value must be a reference to an object. It is popped from
the stack. If value is null, branchbyte1 and branchbyte2 are used
to construct a signed 16-bit offset. Execution proceeds at that
offset from the address of this instruction. Otherwise execution
proceeds at the instruction following the ifnull.
iflt
[1085] Branch if less than 0
[1086] Syntax: TABLE-US-00050 iflt = 155 branchbyte1
branchbyte2
[1087] Stack: . . . , value=> . . .
[1088] value must be an integer. It is popped from the stack. If
value is less than zero, branchbyte1 and branchbyte2 are used to
construct a signed 16-bit offset. Execution proceeds at that offset
from the address of this instruction. Otherwise execution proceeds
at the instruction following the iflt.
ifle
[1089] Branch if less than or equal to 0
[1090] Syntax: TABLE-US-00051 ifle=158 branchbyte1 branchbyte2
[1091] Stack: . . . , value=> . . .
[1092] value must be an integer. It is popped from the stack. If
value is less than or equal to zero, branchbyte1 and branchbyte2
are used to construct a signed 16-bit offset. Execution proceeds at
that offset from the address of this instruction. Otherwise
execution proceeds at the instruction following the ifle.
ifne
[1093] Branch if not equal to 0
[1094] Syntax: TABLE-US-00052 ifne=154 branchbyte1 branchbyte2
[1095] Stack: . . . , value=> . . .
[1096] value must be an integer. It is popped from the stack. If
value is not equal to zero, branchbyte1 and branchbyte2 are used to
construct a signed 16-bit offset. Execution proceeds at that offset
from the address of this instruction. Otherwise execution proceeds
at the instruction following the ifne.
ifnonnull
[1097] Branch if not null
[1098] Syntax: TABLE-US-00053 ifnonnull=199 branchbyte1
branchbyte2
[1099] Stack: . . . , value=> . . .
[1100] value must be a reference to an object. It is popped from
the stack. If value is not null, branchbyte1 and branchbyte2 are
used to construct a signed 16-bit offset. Execution proceeds at
that offset from the address of this instruction. Otherwise
execution proceeds at the instruction following the ifnonnull.
ifgt
[1101] Branch if greater than 0
[1102] Syntax: TABLE-US-00054 ifgt=157 branchbyte1 branchbyte2
[1103] Stack: . . . , value=> . . .
[1104] value must be an integer. It is popped from the stack. If
value is greater than zero, branchbyte1 and branchbyte2 are used to
construct a signed 16-bit offset. Execution proceeds at that offset
from the address of this instruction. Otherwise execution proceeds
at the instruction following the ifgt.
ifge
[1105] Branch if greater than or equal to 0
[1106] Syntax: TABLE-US-00055 ifge=156 branchbyte1 branchbyte2
[1107] Stack: . . . , value=> . . .
[1108] value must be an integer. It is popped from the stack. If
value is greater than or equal to zero, branchbyte1 and branchbyte2
are used to construct a signed 16-bit offset. Execution proceeds at
that offset from the address of this instruction. Otherwise
execution proceeds at the instruction following instruction
ifge.
if_icmpeq
[1109] Branch if integers equal
[1110] Syntax: TABLE-US-00056 if_icmpeq=159 branchbyte1
branchbyte2
[1111] Stack: . . . , value1, value2=> . . .
[1112] value1 and value2 must be integers. They are both popped
from the stack. If value1 is equal to value2, branchbyte1 and
branchbyte2 are used to construct a signed 16-bit offset. Execution
proceeds at that offset from the address of this instruction.
Otherwise execution proceeds at the instruction following
instruction if_icmpeq.
if_icmpne
[1113] Branch if integers not equal
[1114] Syntax: TABLE-US-00057 if_icmpne=160 branchbyte1
branchbyte2
[1115] Stack: . . . , value1, value2=> . . .
[1116] value1 and value2 must be integers. They are both popped
from the stack. If value1 is not equal to value2, branchbyte1 and
branchbyte2 are used to construct a signed 16-bit offset. Execution
proceeds at that offset from the address of this instruction.
Otherwise execution proceeds at the instruction following
instruction if_icmpne.
if_icmplt
[1117] Branch if integer less than
[1118] Syntax: TABLE-US-00058 if_icmplt=161 branchbyte1
branchbyte2
[1119] Stack: . . . , value1, value2=> . . .
[1120] value1 and value2 must be integers. They are both popped
from the stack. If value1 is less than value2, branchbyte1 and
branchbyte2 are used to construct a signed 16-bit offset. Execution
proceeds at that offset from the address of this instruction.
Otherwise execution proceeds at the instruction following
instruction if_icmplt.
if_icmpgt
[1121] Branch if integer greater than
[1122] Syntax: TABLE-US-00059 if_icmpgt=163 branchbyte1
branchbyte2
[1123] Stack: . . . , value1, value2=> . . .
[1124] value1 and value2 must be integers. They are both popped
from the stack. If value1 is greater than value2, branchbyte1 and
branchbyte2 are used to construct a signed 16-bit offset. Execution
proceeds at that offset from the address of this instruction.
Otherwise execution proceeds at the instruction following
instruction if_icmpgt.
if_icmple
[1125] Branch if integer less than or equal to
[1126] Syntax: TABLE-US-00060 if_icmple=164 branchbyte1
branchbyte2
[1127] Stack: . . . , value1, value2=> . . .
[1128] value1 and value2 must be integers. They are both popped
from the stack. If value1 is less than or equal to value2,
branchbyte1 and branchbyte2 are used to construct a signed 16-bit
offset. Execution proceeds at that offset from the address of this
instruction. Otherwise execution proceeds at the instruction
following instruction if_icmple.
if_icmpge
[1129] Branch if integer greater than or equal to
[1130] Syntax: TABLE-US-00061 if_icmpge=162 branchbyte1
branchbyte2
[1131] Stack: . . . , value1, value2=> . . .
[1132] value1 and value2 must be integers. They are both popped
from the stack. If value1 is greater than or equal to value2,
branchbyte1 and branchbyte2 are used to construct a signed 16-bit
offset. Execution proceeds at that offset from the address of this
instruction. Otherwise execution proceeds at the instruction
following instruction if_icmpge.
lcmp
[1133] Long integer compare
[1134] Syntax:
[1135] Stack: . . . , value1-word1,
[1136] value1-word2, value2-word1, value2-word2=> . . . ,
result
[1137] value1 and value2 must be long integers. They are both
popped from the stack and compared. If value1 is greater than
value2, the integer value 1 is pushed onto the stack. If value1 is
equal to value2, the value 0 is pushed onto the stack. If value1 is
less than value2, the value -1 is pushed onto the stack.
fcmpl
[1138] Single float compare (-1 on NaN)
[1139] Syntax: fcmpl=149
[1140] Stack: . . . , value1, value2=> . . . , result
[1141] value1 and value2 must be single-precision floating point
numbers. They are both popped from the stack and compared. If
value1 is greater than value2, the integer value 1 is pushed onto
the stack. If value1 is equal to value2, the value 0 is pushed onto
the stack. If value1 is less than value2, the value -1 is pushed
onto the stack.
[1142] If either value1 or value2 is NaN, the value -1 is pushed
onto the stack.
fcmpg
[1143] Single float compare (1 on NaN)
[1144] Syntax: fcmpg=150
[1145] Stack: . . . , value1, value2=> . . . , result
[1146] value1 and value2 must be single-precision floating point
numbers. They are both popped from the stack and compared. If
value1 is greater than value2, the integer value 1 is pushed onto
the stack. If value1 is equal to value2, the value 0 is pushed onto
the stack. If value1 is less than value2, the value -1 is pushed
onto the stack.
[1147] If either value1 or value2 is NaN, the value 1 is pushed
onto the stack.
dcmpl
[1148] Double float compare (-1 on NaN)
[1149] Syntax: dcmpl=151
[1150] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result
[1151] value1 and value2 must be double-precision floating point
numbers. They are both popped from the stack and compared. If
value1 is greater than value2, the integer value 1 is pushed onto
the stack. If value1 is equal to value2, the value 0 is pushed onto
the stack. If value1 is less than value2, the value -1 is pushed
onto the stack.
[1152] If either value1 or value2 is NaN, the value -1 is pushed
onto the stack.
dcmpg
[1153] Double float compare (1 on NaN)
[1154] Syntax: dcmpg=152
[1155] Stack: . . . , value1-word1, value1-word2, value2-word1,
value2-word2=> . . . , result
[1156] value1 and value2 must be double-precision floating point
numbers. They are both popped from the stack and compared. If
value1 is greater than value2, the integer value 1 is pushed onto
the stack. If value1 is equal to value2, the value 0 is pushed onto
the stack. If value1 is less than value2, the value -1 is pushed
onto the stack.
[1157] If either value1 or value2 is NaN, the value 1 is pushed
onto the stack.
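The three-way compare instructions differ only in operand type and in the value pushed when an operand is NaN. A minimal Java sketch of lcmp, fcmpl, and fcmpg follows; the class and method names are hypothetical and simply mirror the mnemonics, and each method returns the value that the instruction would push onto the stack.

```java
// Hypothetical demo class mirroring the compare instruction descriptions.
final class CompareSketch {
    // lcmp: 1 if value1 > value2, 0 if equal, -1 if value1 < value2.
    static int lcmp(long value1, long value2) {
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    // fcmpl: same ordering as lcmp, but -1 when either operand is NaN.
    static int fcmpl(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) return -1;
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    // fcmpg: same ordering as lcmp, but 1 when either operand is NaN.
    static int fcmpg(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) return 1;
        return value1 > value2 ? 1 : (value1 == value2 ? 0 : -1);
    }

    public static void main(String[] args) {
        System.out.println(lcmp(5L, 3L));
        System.out.println(fcmpl(Float.NaN, 1.0f));
        System.out.println(fcmpg(Float.NaN, 1.0f));
    }
}
```

Having both NaN variants lets a compiler pick whichever one makes a following conditional branch fail when an operand is NaN.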
if_acmpeq
[1158] Branch if object references are equal
[1159] Syntax: TABLE-US-00062 if_acmpeq=165 branchbyte1
branchbyte2
[1160] Stack: . . . , value1, value2=> . . .
[1161] value1 and value2 must be references to objects. They are
both popped from the stack. If the objects referenced are the
same, branchbyte1 and branchbyte2 are used to construct a signed
16-bit offset.
[1162] Execution proceeds at that offset from the address of this
instruction. Otherwise execution proceeds at the instruction
following the if_acmpeq.
if_acmpne
[1163] Branch if object references not equal
[1164] Syntax: TABLE-US-00063 if_acmpne=166 branchbyte1
branchbyte2
[1165] Stack: . . . , value1, value2=> . . .
[1166] value1 and value2 must be references to objects. They are
both popped from the stack. If the objects referenced are not the
same, branchbyte1 and branchbyte2 are used to construct a signed
16-bit offset.
[1167] Execution proceeds at that offset from the address of this
instruction. Otherwise execution proceeds at the instruction
following instruction if_acmpne.
goto
[1168] Branch always
[1169] Syntax: TABLE-US-00064 goto=167 branchbyte1 branchbyte2
[1170] Stack: no change
[1171] branchbyte1 and branchbyte2 are used to construct a signed
16-bit offset. Execution proceeds at that offset from the address
of this instruction.
goto_w
[1172] Branch always (wide index)
[1173] Syntax: TABLE-US-00065 goto_w=200 branchbyte1 branchbyte2
branchbyte3 branchbyte4
[1174] Stack: no change
[1175] branchbyte1, branchbyte2, branchbyte3, and branchbyte4 are
used to construct a signed 32-bit offset.
[1176] Execution proceeds at that offset from the address of this
instruction.
jsr
[1177] Jump subroutine
[1178] Syntax: TABLE-US-00066 jsr=168 branchbyte1 branchbyte2
[1179] Stack: . . . => . . . , return-address
[1180] branchbyte1 and branchbyte2 are used to construct a signed
16-bit offset. The address of the instruction immediately following
the jsr is pushed onto the stack. Execution proceeds at the offset
from the address of this instruction.
jsr_w
[1181] Jump subroutine (wide index)
[1182] Syntax: TABLE-US-00067 jsr_w=201 branchbyte1 branchbyte2
branchbyte3 branchbyte4
[1183] Stack: . . . => . . . , return-address
[1184] branchbyte1, branchbyte2, branchbyte3, and branchbyte4 are
used to construct a signed 32-bit offset. The address of the
instruction immediately following the jsr_w is pushed onto the
stack. Execution proceeds at the offset from the address of this
instruction.
ret
[1185] Return from subroutine
[1186] Syntax: TABLE-US-00068 ret=169 vindex
[1187] Stack: no change
[1188] Local variable vindex in the current JAVA frame must contain
a return address. The contents of the local variable are written
into the pc.
[1189] Note that jsr pushes the address onto the stack, and ret
gets it out of a local variable. This asymmetry is intentional.
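The asymmetry between jsr and ret can be sketched with a toy machine state in Java. The sketch assumes that jsr occupies three bytes (the opcode plus two branch bytes) and that the subroutine begins by popping the return address into a local variable; that store step is not described in this section and is included only to connect the two instructions. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy machine state illustrating jsr and ret.
final class JsrRetSketch {
    final Deque<Integer> stack = new ArrayDeque<>(); // operand stack
    final int[] locals = new int[4];                 // local variables of the current frame
    int pc;

    // jsr: push the address of the following instruction, then branch by offset.
    void jsr(int offset) {
        stack.push(pc + 3); // assumption: jsr is 3 bytes long
        pc = pc + offset;
    }

    // Assumed subroutine prologue: move the return address from the stack
    // into local variable vindex (pc bookkeeping omitted for brevity).
    void storeReturnAddress(int vindex) {
        locals[vindex] = stack.pop();
    }

    // ret: the contents of local variable vindex are written into the pc.
    void ret(int vindex) {
        pc = locals[vindex];
    }

    public static void main(String[] args) {
        JsrRetSketch m = new JsrRetSketch();
        m.pc = 10;
        m.jsr(20);
        m.storeReturnAddress(2);
        m.ret(2);
        System.out.println(m.pc); // back at the instruction after the jsr
    }
}
```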
ret_w
[1190] Return from subroutine (wide index)
[1191] Syntax: TABLE-US-00069 ret_w=209 vindexbyte1 vindexbyte2
[1192] Stack: no change
[1193] vindexbyte1 and vindexbyte2 are assembled into an unsigned
16-bit index to a local variable in the current JAVA frame. That
local variable must contain a return address. The contents of the
local variable are written into the pc. See the ret instruction for
more information.
3.12 Function Return
ireturn
[1194] Return integer from function
[1195] Syntax: ireturn=172
[1196] Stack: . . . , value=> [empty]
[1197] value must be an integer. The value value is pushed onto the
stack of the previous execution environment. Any other values on
the operand stack are discarded. The interpreter then returns
control to its caller.
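The effect of ireturn on the two operand stacks can be sketched in Java by modeling each execution environment's stack as a `Deque`. This is a simplified illustration; the class name, method name, and the representation of an execution environment as a bare operand stack are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class illustrating the ireturn description.
final class ReturnSketch {
    // Pops value from the callee's stack, discards whatever else the callee
    // left there, and pushes value onto the caller's (previous) stack.
    static void ireturn(Deque<Integer> callee, Deque<Integer> caller) {
        int value = callee.pop();
        callee.clear();
        caller.push(value);
    }

    public static void main(String[] args) {
        Deque<Integer> caller = new ArrayDeque<>();
        Deque<Integer> callee = new ArrayDeque<>();
        caller.push(1);
        callee.push(8);
        callee.push(9); // top of the callee's stack: the value being returned
        ireturn(callee, caller);
        System.out.println(caller.peek());
    }
}
```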
lreturn
[1198] Return long integer from function
[1199] Syntax: lreturn=173
[1200] Stack: . . . , value-word1, value-word2=> [empty]
[1201] value must be a long integer. The value value is pushed onto
the stack of the previous execution environment. Any other values
on the operand stack are discarded. The interpreter then returns
control to its caller.
freturn
[1202] Return single float from function
[1203] Syntax: freturn=174
[1204] Stack: . . . , value=> [empty]
[1205] value must be a single-precision floating point number. The
value value is pushed onto the stack of the previous execution
environment. Any other values on the operand stack are discarded.
The interpreter then returns control to its caller.
dreturn
[1206] Return double float from function
[1207] Syntax: dreturn=175
[1208] Stack: . . . , value-word1, value-word2=> [empty]
[1209] value must be a double-precision floating point number. The
value value is pushed onto the stack of the previous execution
environment. Any other values on the operand stack are discarded.
The interpreter then returns control to its caller.
areturn
[1210] Return object reference from function
[1211] Syntax: areturn=176
[1212] Stack: . . . , value=> [empty]
[1213] value must be a reference to an object. The value value is
pushed onto the stack of the previous execution environment. Any
other values on the operand stack are discarded. The interpreter
then returns control to its caller.
return
[1214] Return (void) from procedure
[1215] Syntax: return=177
[1216] Stack: . . . => [empty]
[1217] All values on the operand stack are discarded. The
interpreter then returns control to its caller.
breakpoint
[1218] Stop and pass control to breakpoint handler
[1219] Syntax: breakpoint=202
[1220] Stack: no change
3.13 Table Jumping
tableswitch
[1221] Access jump table by index and jump
[1222] Syntax: TABLE-US-00070 tableswitch=170 ...0-3 byte pad...
default-offset1 default-offset2 default-offset3 default-offset4
low1 low2 low3 low4 high1 high2 high3 high4 ...jump offsets...
[1223] Stack: . . . , index=> . . .
[1224] tableswitch is a variable length instruction. Immediately
after the tableswitch opcode, between zero and three 0's are
inserted as padding so that the next byte begins at an address that
is a multiple of four. After the padding follow a series of signed
4-byte quantities: default-offset, low, high, and then high-low+1
further signed 4-byte offsets. The high-low+1 signed 4-byte offsets
are treated as a 0-based jump table.
[1225] The index must be an integer. If index is less than low or
index is greater than high, then default-offset is added to the
address of this instruction. Otherwise, low is subtracted from
index, and the index-low'th element of the jump table is extracted,
and added to the address of this instruction.
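The padding rule and table lookup for tableswitch can be sketched in Java. The sketch assumes that the 4-byte quantities are stored high-order byte first (suggested by the default-offset1 through default-offset4 naming, but not stated), and that addresses are indices into a code array starting at zero, so that alignment is relative to the start of that array. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class illustrating the tableswitch description.
final class TableSwitchSketch {
    // Returns the address at which execution proceeds for a tableswitch
    // whose opcode sits at code[pc], given the popped index.
    static int target(byte[] code, int pc, int index) {
        int p = (pc + 4) & ~3;                  // first multiple of four after the opcode
        ByteBuffer buf = ByteBuffer.wrap(code); // big-endian by default (assumption above)
        int defaultOffset = buf.getInt(p);
        int low = buf.getInt(p + 4);
        int high = buf.getInt(p + 8);
        if (index < low || index > high) {
            return pc + defaultOffset;
        }
        // The (index - low)'th entry of the 0-based jump table.
        return pc + buf.getInt(p + 12 + 4 * (index - low));
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(28);
        b.put(1, (byte) 170);                   // opcode at pc = 1, then two pad bytes
        b.putInt(4, 100).putInt(8, 10).putInt(12, 12); // default-offset, low, high
        b.putInt(16, 20).putInt(20, 30).putInt(24, 40); // high - low + 1 = 3 offsets
        System.out.println(target(b.array(), 1, 11));
    }
}
```

The expression `(pc + 4) & ~3` covers all four padding cases (zero to three pad bytes) without a branch.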
lookupswitch
[1226] Access jump table by key match and jump
[1227] Syntax: TABLE-US-00071 lookupswitch=171 ...0-3 byte pad...
default-offset1 default-offset2 default-offset3 default-offset4
npairs1 npairs2 npairs3 npairs4 ...match-offset pairs...
[1228] Stack: . . . , key=> . . .
[1229] lookupswitch is a variable length instruction. Immediately
after the lookupswitch opcode, between zero and three 0's are
inserted as padding so that the next byte begins at an address that
is a multiple of four.
[1230] Immediately after the padding are a series of pairs of
signed 4-byte quantities. The first pair is special. The first item
of that pair is the default offset, and the second item of that
pair gives the number of pairs that follow. Each subsequent pair
consists of a match and an offset.
[1231] The key must be an integer. The integer key on the stack is
compared against each of the matches. If it is equal to one of
them, the offset is added to the address of this instruction. If
the key does not match any of the matches, the default offset is
added to the address of this instruction.
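The key matching performed by lookupswitch can be sketched in Java in the same style. As before, the sketch assumes that the 4-byte quantities are stored high-order byte first and that addresses are indices into a code array starting at zero; a simple linear scan over the pairs is used, which is one possible search strategy and is not prescribed by the text. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class illustrating the lookupswitch description.
final class LookupSwitchSketch {
    // Returns the address at which execution proceeds for a lookupswitch
    // whose opcode sits at code[pc], given the popped key.
    static int target(byte[] code, int pc, int key) {
        int p = (pc + 4) & ~3;                  // skip the 0-3 pad bytes
        ByteBuffer buf = ByteBuffer.wrap(code); // big-endian by default (assumption above)
        int defaultOffset = buf.getInt(p);      // first item of the special first pair
        int npairs = buf.getInt(p + 4);         // second item: number of pairs that follow
        for (int i = 0; i < npairs; i++) {
            int match = buf.getInt(p + 8 + 8 * i);
            if (key == match) {
                return pc + buf.getInt(p + 12 + 8 * i);
            }
        }
        return pc + defaultOffset;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(28);
        b.put(0, (byte) 171);                   // opcode at pc = 0, then three pad bytes
        b.putInt(4, 50).putInt(8, 2);           // default offset, npairs
        b.putInt(12, 7).putInt(16, 16);         // match 7 -> offset 16
        b.putInt(20, -3).putInt(24, 24);        // match -3 -> offset 24
        System.out.println(target(b.array(), 0, 7));
    }
}
```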
3.14 Manipulating Object Fields
putfield
[1232] Set field in object
[1233] Syntax: TABLE-US-00072 putfield=181 indexbyte1
indexbyte2
[1234] Stack: . . . , objectref, value=> . . . [1235] OR Stack:
. . . , objectref, value-word1, value-word2=> . . .
[1236] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a class name and a field name. The
item is resolved to a field block pointer which has both the field
width (in bytes) and the field offset (in bytes).
[1237] The field at that offset from the start of the object
referenced by objectref will be set to the value on the top of the
stack.
[1238] This instruction deals with both 32-bit and 64-bit wide
fields.
[1239] If objectref is null, a NullPointerException is
generated.
[1240] If the specified field is a static field,
an IncompatibleClassChangeError is thrown.
getfield
[1241] Fetch field from object
[1242] Syntax: TABLE-US-00073 getfield=180 indexbyte1
indexbyte2
[1243] Stack: . . . , objectref=> . . . , value [1244] OR Stack:
. . . , objectref=> . . . , value-word1, value-word2
[1245] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a class name and a field name. The
item is resolved to a field block pointer which has both the field
width (in bytes) and the field offset (in bytes).
[1246] objectref must be a reference to an object. The value at
offset into the object referenced by objectref replaces objectref
on the top of the stack.
[1247] This instruction deals with both 32-bit and 64-bit wide
fields.
[1248] If objectref is null, a NullPointerException is
generated.
[1249] If the specified field is a static field, an
IncompatibleClassChangeError is thrown.
putstatic
[1250] Set static field in class
[1251] Syntax: TABLE-US-00074 putstatic=179 indexbyte1
indexbyte2
Stack: . . . , value=> . . . [1252] OR
[1253] Stack: . . . , value-word1, value-word2=> . . .
[1254] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a static field of a class. That field
will be set to have the value on the top of the stack.
[1255] This instruction works for both 32-bit and 64-bit wide
fields.
[1256] If the specified field is a dynamic field, an
IncompatibleClassChangeError is thrown.
getstatic
[1257] Get static field from class
[1258] Syntax: TABLE-US-00075 getstatic=178 indexbyte1
indexbyte2
Stack: . . . , => . . . , value [1259] OR Stack: . . . , =>
. . . , value-word1, value-word2
[1260] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a static field of a class.
[1261] This instruction deals with both 32-bit and 64-bit wide
fields.
[1262] If the specified field is a dynamic field, an
IncompatibleClassChangeError is generated.
3.15 Method Invocation
[1263] There are four instructions that implement method
invocation.
[1264] invokevirtual: Invoke an instance method of an object,
dispatching based on the runtime (virtual) type of the object. This
is the normal method dispatch in JAVA.
[1265] invokenonvirtual: Invoke an instance method of an object,
dispatching based on the compile-time (non-virtual) type of the
object. This is used, for example, when the keyword super or the
name of a superclass is used as a method qualifier.
[1266] invokestatic: Invoke a class (static) method in a named
class.
[1267] invokeinterface: Invoke a method which is implemented by an
interface, searching the methods implemented by the particular
run-time object to find the appropriate method.
invokevirtual
[1268] Invoke instance method, dispatch based on run-time type
[1269] Syntax: TABLE-US-00076 invokevirtual=182 indexbyte1
indexbyte2
[1270] Stack: . . . objectref, [arg1, [arg2 . . . ]], . . . => .
. .
[1271] The operand stack must contain a reference to an object and
some number of arguments. indexbyte1 and indexbyte2 are used to
construct an index into the constant pool of the current class. The
item at that index in the constant pool contains the complete
method signature. A pointer to the object's method table is
retrieved from the object reference. The method signature is looked
up in the method table. The method signature is guaranteed to
exactly match one of the method signatures in the table.
[1272] The result of the lookup is an index into the method table
of the named class, which is used with the object's dynamic type to
look in the method table of that type, where a pointer to the
method block for the matched method is found. The method block
indicates the type of method (native, synchronized, and so on) and
the number of arguments expected on the operand stack.
[1273] If the method is marked synchronized the monitor associated
with objectref is entered.
[1274] The objectref and arguments are popped off this method's
stack and become the initial values of the local variables of the
new method. Execution continues with the first instruction of the
new method.
[1275] If the object reference on the operand stack is null, a
NullPointerException is thrown. If during the method invocation a
stack overflow is detected, a StackOverflowError is thrown.
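The two-step lookup described above (signature to slot in the named class, then slot to method block in the object's dynamic type) can be sketched in Java. The names VirtualDispatchSketch, ClassInfo, define, and invokeVirtual are hypothetical, and a method block is reduced to a descriptive string.

```java
import java.util.ArrayList;
import java.util.List;

final class VirtualDispatchSketch {
    /** Hypothetical class descriptor holding a method table. */
    static final class ClassInfo {
        final String name;
        final List<String> signatures = new ArrayList<>();    // slot -> method signature
        final List<String> methodBlocks = new ArrayList<>();  // slot -> method block (a label here)

        ClassInfo(String name, ClassInfo superclass) {
            this.name = name;
            if (superclass != null) {
                // Inherit the superclass slots so that slot indices stay aligned.
                signatures.addAll(superclass.signatures);
                methodBlocks.addAll(superclass.methodBlocks);
            }
        }

        void define(String signature) {
            int slot = signatures.indexOf(signature);
            String block = name + "." + signature;
            if (slot >= 0) {
                methodBlocks.set(slot, block);   // override keeps the inherited slot
            } else {
                signatures.add(signature);
                methodBlocks.add(block);
            }
        }
    }

    /** A null dynamicType models a null objectref. */
    static String invokeVirtual(ClassInfo namedClass, ClassInfo dynamicType, String signature) {
        if (dynamicType == null) throw new NullPointerException("objectref is null");
        int slot = namedClass.signatures.indexOf(signature);  // index into the named class's table
        return dynamicType.methodBlocks.get(slot);            // same slot in the dynamic type's table
    }
}
```

Because an overriding method reuses the inherited slot, the index found in the named class selects the right method block in any subclass.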
invokenonvirtual
[1276] Invoke instance method, dispatching based on compile-time
type
[1277] Syntax: TABLE-US-00077 invokenonvirtual = 183 indexbyte1
indexbyte2
[1278] Stack: . . . , objectref, [arg1, [arg2 . . . ]], . . . =>
. . .
[1279] The operand stack must contain a reference to an object and
some number of arguments. indexbyte1 and indexbyte2 are used to
construct an index into the constant pool of the current class. The
item at that index in the constant pool contains a complete method
signature and class. The method signature is looked up in the
method table of the class indicated. The method signature is
guaranteed to exactly match one of the method signatures in the
table.
[1280] The result of the lookup is a method block. The method block
indicates the type of method (native, synchronized, and so on) and
the number of arguments (nargs) expected on the operand stack.
[1281] If the method is marked synchronized the monitor associated
with objectref is entered.
[1282] The objectref and arguments are popped off this method's
stack and become the initial values of the local variables of the
new method. Execution continues with the first instruction of the
new method.
[1283] If the object reference on the operand stack is null, a
NullPointerException is thrown. If during the method invocation a
stack overflow is detected, a StackOverflowError is thrown.
invokestatic
[1284] Invoke a class (static) method
[1285] Syntax: TABLE-US-00078 invokestatic = 184 indexbyte1
indexbyte2
[1286] Stack: . . . , [arg1, [arg2 . . .]]. . . => . . .
[1287] The operand stack must contain some number of
arguments. indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The item at that index
in the constant pool contains the complete method signature and
class. The method signature is looked up in the method table of the
class indicated. The method signature is guaranteed to exactly
match one of the method signatures in the class's method table.
[1288] The result of the lookup is a method block. The method block
indicates the type of method (native, synchronized, and so on) and
the number of arguments (nargs) expected on the operand stack.
[1289] If the method is marked synchronized the monitor associated
with the class is entered.
[1290] The arguments are popped off this method's stack and become
the initial values of the local variables of the new method.
Execution continues with the first instruction of the new
method.
[1291] If during the method invocation a stack overflow is
detected, a StackOverflowError is thrown.
invokeinterface
[1292] Invoke interface method
[1293] Syntax: TABLE-US-00079 invokeinterface = 185 indexbyte1
indexbyte2 nargs reserved
Stack: . . . , objectref, [arg1, [arg2 . . . ]], . . . => . .
.
[1294] The operand stack must contain a reference to an object and
nargs-1 arguments. indexbyte1 and indexbyte2 are used to construct
an index into the constant pool of the current class. The item at
that index in the constant pool contains the complete method
signature. A pointer to the object's method table is retrieved from
the object reference. The method signature is looked up in the
method table. The method signature is guaranteed to exactly match
one of the method signatures in the table.
[1295] The result of the lookup is a method block. The method block
indicates the type of method (native, synchronized, and so on) but
unlike invokevirtual and invokenonvirtual, the number of available
arguments (nargs) is taken from the bytecode.
[1296] If the method is marked synchronized the monitor associated
with objectref is entered.
[1297] The objectref and arguments are popped off this method's
stack and become the initial values of the local variables of the
new method. Execution continues with the first instruction of the
new method.
[1298] If the objectref on the operand stack is null, a
NullPointerException is thrown. If during the method invocation a
stack overflow is detected, a StackOverflowError is thrown.
3.16 Exception Handling
athrow
[1299] Throw exception or error
[1300] Syntax: athrow=191
[1301] Stack: . . . , objectref=> [undefined]
[1302] objectref must be a reference to an object which is a
subclass of Throwable, which is thrown. The current JAVA stack
frame is searched for the most recent catch clause that catches
this class or a superclass of this class. If a matching catch list
entry is found, the pc is reset to the address indicated by the
catch-list entry, and execution continues there.
[1303] If no appropriate catch clause is found in the current stack
frame, that frame is popped and objectref is rethrown. If a
matching clause is found, it contains the location of the code that
handles this exception; the pc is reset to that location and
execution continues.
[1304] If objectref is null, then a NullPointerException is thrown
instead.
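The search-and-pop behavior of athrow can be sketched in Java as follows. This is a simplification under stated assumptions: each frame carries only a list of (caught class, handler pc) entries, the pc ranges that a real catch list would also check are omitted, and the names AthrowSketch, Frame, CatchEntry, and dispatch are hypothetical.

```java
import java.util.Deque;
import java.util.List;

final class AthrowSketch {
    static final class CatchEntry {
        final Class<?> caught;
        final int handlerPc;
        CatchEntry(Class<?> caught, int handlerPc) {
            this.caught = caught;
            this.handlerPc = handlerPc;
        }
    }

    static final class Frame {
        final String method;
        final List<CatchEntry> catchList;
        Frame(String method, List<CatchEntry> catchList) {
            this.method = method;
            this.catchList = catchList;
        }
    }

    /** Returns "method@pc" for the handler that is found, or null if every frame was popped. */
    static String dispatch(Deque<Frame> stack, Throwable objectref) {
        if (objectref == null) {
            objectref = new NullPointerException();  // a null objectref throws this instead
        }
        while (!stack.isEmpty()) {
            Frame frame = stack.peek();
            for (CatchEntry entry : frame.catchList) {
                // An entry matches this class or a superclass of this class.
                if (entry.caught.isInstance(objectref)) {
                    return frame.method + "@" + entry.handlerPc;
                }
            }
            stack.pop();  // no handler here: pop the frame and rethrow in the next one
        }
        return null;
    }
}
```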
3.17 Miscellaneous Object Operations
new
[1305] Create new object
[1306] Syntax: TABLE-US-00080 new = 187 indexbyte1 indexbyte2
[1307] Stack: . . . => . . . , objectref
[1308] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The item at that index
must be a class name that can be resolved to a class pointer,
class. A new instance of that class is then created and a reference
to the object is pushed on the stack.
checkcast
[1309] Make sure object is of given type
[1310] Syntax: TABLE-US-00081 checkcast = 192 indexbyte1
indexbyte2
[1311] Stack: . . . , objectref=> . . . , objectref
[1312] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The string at that
index of the constant pool is presumed to be a class name which can
be resolved to a class pointer, class. objectref must be a
reference to an object.
[1313] checkcast determines whether objectref can be cast to be a
reference to an object of class class. A null objectref can be cast
to any class. Otherwise the referenced object must be an instance
of class or of one of its subclasses. If objectref can be cast to
class, execution proceeds at the next instruction, and the objectref
remains on the stack.
[1314] If objectref cannot be cast to class, a ClassCastException
is thrown.
instanceof
[1315] Determine if an object is of given type
[1316] Syntax: TABLE-US-00082 instanceof = 193 indexbyte1
indexbyte2
[1317] Stack: . . . , objectref=> . . . , result
[1318] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The string at that
index of the constant pool is presumed to be a class name which can
be resolved to a class pointer, class. objectref must be a
reference to an object.
[1319] instanceof determines whether objectref can be cast to be a
reference to an object of the class class. This instruction will
overwrite objectref with 1 if objectref is an instance of class or
of one of its subclasses. Otherwise, objectref is overwritten by 0.
If objectref is null, it is overwritten by 0.
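One way to read the checkcast and instanceof checks is as a walk up the superclass chain of objectref's type, looking for class. The Java sketch below follows that reading and differs between the two instructions only in how null and failure are handled. Interfaces and arrays are ignored, and the names TypeCheckSketch, ClassInfo, and ObjectRef are hypothetical.

```java
final class TypeCheckSketch {
    static final class ClassInfo {
        final String name;
        final ClassInfo superclass;
        ClassInfo(String name, ClassInfo superclass) {
            this.name = name;
            this.superclass = superclass;
        }
    }

    static final class ObjectRef {
        final ClassInfo type;
        ObjectRef(ClassInfo type) { this.type = type; }
    }

    /** True if target is type itself or appears among the superclasses of type. */
    static boolean isSameOrSuperclass(ClassInfo type, ClassInfo target) {
        for (ClassInfo c = type; c != null; c = c.superclass) {
            if (c == target) return true;
        }
        return false;
    }

    /** instanceof: result is 1 or 0; a null objectref gives 0. */
    static int instanceOf(ObjectRef objectref, ClassInfo target) {
        return (objectref != null && isSameOrSuperclass(objectref.type, target)) ? 1 : 0;
    }

    /** checkcast: objectref stays on the stack, or a ClassCastException is thrown. */
    static ObjectRef checkCast(ObjectRef objectref, ClassInfo target) {
        if (objectref == null || isSameOrSuperclass(objectref.type, target)) {
            return objectref;  // a null objectref can be cast to any class
        }
        throw new ClassCastException(objectref.type.name + " cannot be cast to " + target.name);
    }
}
```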
3.18 Monitors
monitorenter
[1320] Enter monitored region of code
[1321] Syntax: monitorenter=194
[1322] Stack: . . . , objectref=> . . .
[1323] objectref must be a reference to an object.
[1324] The interpreter attempts to obtain exclusive access via a
lock mechanism to objectref. If another thread already has
objectref locked, then the current thread waits until the object is
unlocked. If the current thread already has the object locked,
execution continues. If the object is not locked, an exclusive lock
is obtained.
[1325] If objectref is null, then a NullPointerException is thrown
instead.
monitorexit
[1326] Exit monitored region of code
[1327] Syntax: monitorexit=195
[1328] Stack: . . . , objectref=> . . .
[1329] objectref must be a reference to an object. The lock on the
object is released. If this is the last lock that this thread has on
that object (one thread is allowed to have multiple locks on a
single object), then other threads that are waiting for the object
to be available are allowed to proceed.
[1330] If objectref is null, then a NullPointerException is thrown
instead.
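The monitor rules above (wait if another thread holds the lock, continue if the current thread already holds it, release to waiters only on the last exit) amount to a reentrant lock with a hold count. A minimal Java sketch, in which the class name MonitorSketch and its methods are hypothetical and the per-object monitor is modeled as a standalone object:

```java
final class MonitorSketch {
    private Thread owner;  // thread that currently holds the lock, or null
    private int count;     // number of locks the owner holds on this object

    /** monitorenter: obtain the lock, waiting while another thread holds it. */
    synchronized void enter() {
        Thread current = Thread.currentThread();
        while (owner != null && owner != current) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for monitor", e);
            }
        }
        owner = current;
        count++;  // re-entry by the owning thread only increases the count
    }

    /** monitorexit: release one lock; waiters proceed once the last lock is released. */
    synchronized void exit() {
        if (owner != Thread.currentThread()) {
            throw new IllegalMonitorStateException("current thread does not hold the monitor");
        }
        if (--count == 0) {
            owner = null;
            notifyAll();
        }
    }

    synchronized int holdCount() {
        return count;
    }
}
```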
Appendix A: An Optimization
[1331] The following set of pseudo-instructions suffixed by _quick
are variants of JAVA virtual machine instructions. They are used to
improve the speed of interpreting bytecodes. They are not part of
the virtual machine specification or instruction set, and are
invisible outside of a JAVA virtual machine implementation.
However, inside a virtual machine implementation they have proven
to be an effective optimization.
[1332] A compiler from JAVA source code to the JAVA virtual machine
instruction set emits only non-_quick instructions. If the _quick
pseudo-instructions are used, each instance of a non-_quick
instruction with a _quick variant is overwritten on execution by
its _quick variant. Subsequent execution of that instruction
instance will be of the _quick variant.
[1333] In all cases, if an instruction has an alternative version
with the suffix _quick, the instruction references the constant
pool. If the _quick optimization is used, each non-_quick
instruction with a _quick variant performs the following:
[1334] Resolves the specified item in the constant pool;
[1335] Signals an error if the item in the constant pool could not
be resolved for some reason;
[1336] Turns itself into the _quick version of the instruction. The
instructions putstatic, getstatic, putfield, and getfield each have
two _quick versions; and
[1337] Performs its intended operation.
[1338] This is identical to the action of the instruction without
the _quick optimization, except for the additional step in which
the instruction overwrites itself with its _quick variant.
[1339] The _quick variant of an instruction assumes that the item
in the constant pool has already been resolved, and that this
resolution did not generate any errors. It simply performs the
intended operation on the resolved item.
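The self-overwriting step can be illustrated with a small Java sketch for the new instruction. The opcode value used for new_quick is hypothetical, since this Appendix does not give opcode values for the pseudo-instructions. The big-endian combination of the index bytes, the counters, and the class name QuickRewriteSketch are likewise assumptions made for the example.

```java
final class QuickRewriteSketch {
    static final int NEW = 187;        // opcode given in the text for new
    static final int NEW_QUICK = 221;  // hypothetical value, not taken from the text

    int resolutions;     // how many times constant pool resolution ran
    int objectsCreated;  // how many times the intended operation ran

    void execute(int[] code, int pc) {
        switch (code[pc]) {
            case NEW:
                resolve((code[pc + 1] << 8) | code[pc + 2]);  // resolve; may signal an error
                code[pc] = NEW_QUICK;                         // turn into the _quick version
                objectsCreated++;                             // perform the intended operation
                break;
            case NEW_QUICK:
                objectsCreated++;  // item already resolved: only the operation remains
                break;
            default:
                throw new IllegalArgumentException("unsupported opcode " + code[pc]);
        }
    }

    private void resolve(int index) {
        if (index <= 0) {
            throw new IllegalStateException("cannot resolve constant pool item " + index);
        }
        resolutions++;
    }
}
```

Executing the same instruction instance twice resolves the constant pool item once and creates two objects, which is the saving the optimization aims for.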
[1340] Note: some of the invoke methods only support a single-byte
offset into the method table of the object; for objects with 256 or
more methods some invocations cannot be "quicked" with only these
bytecodes.
[1341] This Appendix doesn't give the opcode values of the
pseudo-instructions, since they are invisible and subject to
change.
A.1 Constant Pool Resolution
[1342] When the class is read in, an array constant_pool[ ] of
size nconstants is created and assigned to a field in the class.
constant_pool[0] is set to point to a dynamically allocated array
which indicates which fields in the constant_pool have already been
resolved. constant_pool[1] through constant_pool[nconstants-1] are
set to point at the "type" field that corresponds to this constant
item.
[1343] When an instruction is executed that references the constant
pool, an index is generated, and constant_pool[0] is checked to see
if the index has already been resolved. If so, the value of
constant_pool [index] is returned. If not, the value of
constant_pool [index] is resolved to be the actual pointer or data,
and overwrites whatever value was already in constant_pool
[index].
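A minimal Java sketch of this resolve-on-first-use scheme follows. The class name ConstantPoolSketch, the use of strings as unresolved entries, and the resolveCalls counter are assumptions for illustration; the boolean array stored in slot 0 stands in for the dynamically allocated array that records which entries have been resolved.

```java
final class ConstantPoolSketch {
    private final Object[] constantPool;
    private final boolean[] resolved;  // the array that constant_pool[0] points to
    int resolveCalls;                  // counts how often resolution actually ran

    ConstantPoolSketch(String[] symbolicEntries) {
        constantPool = new Object[symbolicEntries.length + 1];
        resolved = new boolean[constantPool.length];
        constantPool[0] = resolved;
        // Entries 1 .. nconstants-1 start out in unresolved (symbolic) form.
        System.arraycopy(symbolicEntries, 0, constantPool, 1, symbolicEntries.length);
    }

    Object get(int index) {
        if (index < 1 || index >= constantPool.length) {
            throw new IndexOutOfBoundsException("constant pool index " + index);
        }
        if (!resolved[index]) {
            // Overwrite the symbolic entry with the resolved pointer or data.
            constantPool[index] = resolve((String) constantPool[index]);
            resolved[index] = true;
        }
        return constantPool[index];
    }

    private Object resolve(String symbolic) {
        resolveCalls++;
        return "resolved:" + symbolic;  // placeholder for the actual pointer or data
    }
}
```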
A.2 Pushing Constants onto the Stack (_quick variants)
ldc1_quick
[1344] Push item from constant pool onto stack
[1345] Syntax: TABLE-US-00083 ldc1_quick indexbyte1
[1346] Stack: . . . => . . . ,item
[1347] indexbyte1 is used as an unsigned 8-bit index into the
constant pool of the current class. The item at that index is
pushed onto the stack.
ldc2_quick
[1348] Push item from constant pool onto stack
[1349] Syntax: TABLE-US-00084 ldc2_quick indexbyte1 indexbyte2
[1350] Stack: . . . => . . . , item
[1351] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant at that
index is resolved and the item at that index is pushed onto the
stack.
ldc2w_quick
[1352] Push long integer or double float from constant pool onto
stack
[1353] Syntax: TABLE-US-00085 ldc2w_quick indexbyte1 indexbyte2
[1354] Stack: . . . => . . . ,constant-word1,constant-word2
[1355] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant at that
index is pushed onto the stack.
A.3 Managing Arrays (_quick variants)
anewarray_quick
[1356] Allocate new array of references to objects
[1357] Syntax: TABLE-US-00086 anewarray_quick indexbyte1
indexbyte2
[1358] Stack: . . . ,size=>result
[1359] size must be an integer. It represents the number of
elements in the new array.
[1360] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The entry must be a
class.
[1361] A new array of the indicated class type and capable of
holding size elements is allocated, and result is a reference to
this new array. Allocation of an array large enough to contain size
items of the given class type is attempted. All elements of the
array are initialized to zero.
[1362] If size is less than zero, a NegativeArraySizeException is
thrown. If there is not enough memory to allocate the array, an
OutOfMemoryError is thrown.
multianewarray_quick
[1363] Allocate new multi-dimensional array
[1364] Syntax: TABLE-US-00087 multianewarray_quick indexbyte1
indexbyte2 dimensions
[1365] Stack: . . . ,size1,size2, . . . sizen=>result
[1366] Each size must be an integer. Each represents the number of
elements in a dimension of the array.
[1367] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The resulting entry
must be a class.
[1368] dimensions has the following aspects:
[1369] It must be an integer greater than or equal to 1.
[1370] It represents the number of dimensions being created. It
must be less than or equal to the number of dimensions of the array
class.
[1371] It represents the number of elements that are popped off the
stack. All must be integers greater than or equal to zero. These
are used as the sizes of the dimensions.
[1372] If any of the size arguments on the stack is less than zero,
a NegativeArraySizeException is thrown. If there is not enough
memory to allocate the array, an OutOfMemoryError is thrown.
[1373] The result is a reference to the new array object.
A.4 Manipulating Object Fields (_quick variants)
putfield_quick
[1374] Set field in object
[1375] Syntax: TABLE-US-00088 putfield_quick offset unused
[1376] Stack: . . . ,objectref,value=> . . .
[1377] objectref must be a reference to an object. value must be a
value of a type appropriate for the specified field. offset is the
offset for the field in that object. value is written at offset
into the object. Both objectref and value are popped from the
stack.
[1378] If objectref is null, a NullPointerException is
generated.
putfield2_quick
[1379] Set long integer or double float field in object
[1380] Syntax: TABLE-US-00089 putfield2_quick offset unused
[1381] Stack: . . . ,objectref,value-word1,value-word2=> . .
.
[1382] objectref must be a reference to an object. value must be a
value of a type appropriate for the specified field. offset is the
offset for the field in that object. value is written at offset
into the object. Both objectref and value are popped from the
stack.
[1383] If objectref is null, a NullPointerException is
generated.
getfield_quick
[1384] Fetch field from object
[1385] Syntax: TABLE-US-00090 getfield_quick offset unused
[1386] Stack: . . . ,objectref=> . . . ,value
[1387] objectref must be a handle to an object. The value at offset
into the object referenced by objectref replaces objectref on the
top of the stack.
[1388] If objectref is null, a NullPointerException is
generated.
getfield2_quick
[1389] Fetch field from object
[1390] Syntax: TABLE-US-00091 getfield2_quick offset unused
[1391] Stack: . . . ,objectref=> . . .
,value-word1,value-word2
[1392] objectref must be a handle to an object. The value at offset
into the object referenced by objectref replaces objectref on the
top of the stack.
[1393] If objectref is null, a NullPointerException is
generated.
putstatic_quick
[1394] Set static field in class
[1395] Syntax: TABLE-US-00092 putstatic_quick indexbyte1
indexbyte2
[1396] Stack: . . . ,value=> . . .
[1397] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a static field of a class. value must
be the type appropriate to that field. That field will be set to
have the value value.
putstatic2_quick
[1398] Set static field in class
[1399] Syntax: TABLE-US-00093 putstatic2_quick indexbyte1
indexbyte2
[1400] Stack: . . . ,value-word1,value-word2=> . . .
[1401] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a static field of a class. That field
must either be a long integer or a double precision floating point
number. value must be the type appropriate to that field. That
field will be set to have the value value.
getstatic_quick
[1402] Get static field from class
[1403] Syntax: TABLE-US-00094 getstatic_quick indexbyte1
indexbyte2
[1404] Stack: . . . ,=> . . . ,value
[1405] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a static field of a class. The value
of that field is pushed onto the stack.
getstatic2_quick
[1406] Get static field from class
[1407] Syntax: TABLE-US-00095 getstatic2_quick indexbyte1
indexbyte2
[1408] Stack: . . . ,=> . . . ,value-word1,value-word2
[1409] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The constant pool item
will be a field reference to a static field of a class. The field
must be a long integer or a double precision floating point number.
The value of that field is pushed onto the stack.
A.5 Method Invocation (_quick variants)
invokevirtual_quick
[1410] Invoke instance method, dispatching based on run-time
type
[1411] Syntax: TABLE-US-00096 invokevirtual_quick offset nargs
[1412] Stack: . . . ,objectref,[arg1,[arg2 . . . ]]=> . . .
[1413] The operand stack must contain objectref, a reference to an
object and nargs-1 arguments. The method block at offset in the
object's method table, as determined by the object's dynamic type,
is retrieved. The method block indicates the type of method
(native, synchronized, etc.).
[1414] If the method is marked synchronized the monitor associated
with the object is entered.
[1415] The base of the local variables array for the new JAVA stack
frame is set to point to objectref on the stack, making objectref
and the supplied arguments (arg1,arg2, . . . ) the first nargs
local variables of the new frame. The total number of local
variables used by the method is determined, and the execution
environment of the new frame is pushed after leaving sufficient
room for the locals. The base of the operand stack for this method
invocation is set to the first word after the execution
environment. Finally, execution continues with the first
instruction of the matched method.
[1416] If objectref is null, a NullPointerException is thrown. If
during the method invocation a stack overflow is detected, a
StackOverflowError is thrown.
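The frame arrangement described for invokevirtual_quick can be sketched as index arithmetic in Java. The sketch assumes that stack words are numbered upward, that optop is the index of the first free word above the arguments, and that the execution environment occupies a fixed number of words. ENV_WORDS, the class name FrameSetupSketch, and the method layout are hypothetical.

```java
final class FrameSetupSketch {
    static final int ENV_WORDS = 3;  // hypothetical size of the execution environment

    /**
     * Returns {varsBase, envBase, operandStackBase} as word indices.
     * nargs counts objectref plus the supplied arguments; totalLocals is the
     * total number of local variables used by the method (at least nargs).
     */
    static int[] layout(int optop, int nargs, int totalLocals) {
        if (totalLocals < nargs) {
            throw new IllegalArgumentException("locals must include the arguments");
        }
        int varsBase = optop - nargs;                // points at objectref on the stack
        int envBase = varsBase + totalLocals;        // leaves sufficient room for the locals
        int operandStackBase = envBase + ENV_WORDS;  // first word after the execution environment
        return new int[] {varsBase, envBase, operandStackBase};
    }
}
```

With optop at word 10, three incoming words (objectref and two arguments), and five locals in total, the locals start at word 7, the execution environment at word 12, and the new operand stack at word 15. No copying is needed because the arguments already sit where the first local variables belong.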
invokevirtualobject_quick
[1417] Invoke instance method of class java.lang.Object,
specifically for benefit of arrays
[1418] Syntax: TABLE-US-00097 invokevirtualobject_quick offset
nargs
[1419] Stack: . . . ,objectref, [arg1, [arg2 . . . ]]=> . .
.
[1420] The operand stack must contain objectref, a reference to an
object or to an array and nargs-1 arguments. The method block at
offset in java.lang.Object's method table is retrieved. The method
block indicates the type of method (native, synchronized,
etc.).
[1421] If the method is marked synchronized the monitor associated
with handle is entered.
[1422] The base of the local variables array for the new JAVA stack
frame is set to point to objectref on the stack, making objectref
and the supplied arguments (arg1,arg2, . . . ) the first nargs
local variables of the new frame. The total number of local
variables used by the method is determined, and the execution
environment of the new frame is pushed after leaving sufficient
room for the locals. The base of the operand stack for this method
invocation is set to the first word after the execution
environment. Finally, execution continues with the first
instruction of the matched method.
[1423] If objectref is null, a NullPointerException is thrown. If
during the method invocation a stack overflow is detected, a
StackOverflowError is thrown.
invokenonvirtual_quick
[1424] Invoke instance method, dispatching based on compile-time
type
[1425] Syntax: TABLE-US-00098 invokenonvirtual_quick indexbyte1
indexbyte2
[1426] Stack: . . . ,objectref,[arg1, [arg2 . . . ]]=> . . .
[1427] The operand stack must contain objectref, a reference to an
object and some number of arguments. indexbyte1 and indexbyte2 are
used to construct an index into the constant pool of the current
class. The item at that index in the constant pool contains a
method slot index and a pointer to a class. The method block at the
method slot index in the indicated class is retrieved. The method
block indicates the type of method (native, synchronized, etc.) and
the number of arguments (nargs) expected on the operand stack.
[1428] If the method is marked synchronized the monitor associated
with the object is entered.
[1429] The base of the local variables array for the new JAVA stack
frame is set to point to objectref on the stack, making objectref
and the supplied arguments (arg1, arg2, . . . ) the first nargs
local variables of the new frame. The total number of local
variables used by the method is determined, and the execution
environment of the new frame is pushed after leaving sufficient
room for the locals. The base of the operand stack for this method
invocation is set to the first word after the execution
environment. Finally, execution continues with the first
instruction of the matched method.
[1430] If objectref is null, a NullPointerException is thrown. If
during the method invocation a stack overflow is detected, a
StackOverflowError is thrown.
invokestatic_quick
[1431] Invoke a class (static) method
[1432] Syntax: TABLE-US-00099 invokestatic_quick indexbyte1
indexbyte2
Stack: . . . , [arg1, [arg2 . . . ]]=> . . .
[1433] The operand stack must contain some number of arguments.
indexbyte1 and indexbyte2 are used to construct an index into the
constant pool of the current class. The item at that index in the
constant pool contains a method slot index and a pointer to a
class. The method block at the method slot index in the indicated
class is retrieved. The method block indicates the type of method
(native, synchronized, etc.) and the number of arguments (nargs)
expected on the operand stack.
[1434] If the method is marked synchronized the monitor associated
with the method's class is entered.
[1435] The base of the local variables array for the new JAVA stack
frame is set to point to the first argument on the stack, making
the supplied arguments (arg1,arg2, . . . ) the first nargs local
variables of the new frame. The total number of local variables
used by the method is determined, and the execution environment of
the new frame is pushed after leaving sufficient room for the
locals. The base of the operand stack for this method invocation is
set to the first word after the execution environment. Finally,
execution continues with the first instruction of the matched
method.
[1436] If the object handle on the operand stack is null, a
NullPointerException is thrown. If during the method invocation a
stack overflow is detected, a StackOverflowError is thrown.
invokeinterface_quick
[1437] Invoke interface method
[1438] Syntax: TABLE-US-00100 invokeinterface_quick idbyte1 idbyte2
nargs guess
[1439] Stack: . . . ,objectref,[arg1,[arg2 . . . ]]=> . . .
[1440] The operand stack must contain objectref, a reference to an
object, and nargs-1 arguments. idbyte1 and idbyte2 are used to
construct an index into the constant pool of the current class. The
item at that index in the constant pool contains the complete
method signature. A pointer to the object's method table is
retrieved from the object handle.
[1441] The method signature is searched for in the object's method
table. As a short-cut, the method signature at slot guess is
searched first. If that fails, a complete search of the method
table is performed. The method signature is guaranteed to exactly
match one of the method signatures in the table.
[1442] The result of the lookup is a method block. The method block
indicates the type of method (native, synchronized, etc.) but the
number of available arguments (nargs) is taken from the
bytecode.
[1443] If the method is marked synchronized the monitor associated
with handle is entered.
[1444] The base of the local variables array for the new JAVA stack
frame is set to point to handle on the stack, making handle and the
supplied arguments (arg1,arg2, . . . ) the first nargs local
variables of the new frame. The total number of local variables
used by the method is determined, and the execution environment of
the new frame is pushed after leaving sufficient room for the
locals. The base of the operand stack for this method invocation is
set to the first word after the execution environment. Finally,
execution continues with the first instruction of the matched
method.
[1445] If objectref is null, a NullPointerException is thrown. If
during the method invocation a stack overflow is detected, a
StackOverflowError is thrown.
[1446] guess is the last guess. Each time through, guess is set to
the method offset that was used.
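The guess mechanism is a one-entry cache in front of the full method table search. A Java sketch under stated assumptions: the method table is an array of signature strings, the guess is held in a one-element int array so that it can be updated in place (in the instruction it lives in the bytecode operand), and the names InterfaceGuessSketch, lookup, and fullSearches are hypothetical.

```java
final class InterfaceGuessSketch {
    int fullSearches;  // counts how often the complete search was needed

    /** Returns the slot of signature in methodTable and records it in guess[0]. */
    int lookup(String[] methodTable, String signature, int[] guess) {
        int g = guess[0];
        // Short-cut: try the slot that was used last time.
        if (g >= 0 && g < methodTable.length && methodTable[g].equals(signature)) {
            return g;
        }
        // The guess failed: perform a complete search of the method table.
        fullSearches++;
        for (int slot = 0; slot < methodTable.length; slot++) {
            if (methodTable[slot].equals(signature)) {
                guess[0] = slot;  // remember the method offset that was used
                return slot;
            }
        }
        throw new IllegalStateException("signature not found in method table");
    }
}
```

Repeated calls on objects of the same class hit the guessed slot directly, and a call on an object of a different class falls back to the full search and then updates the guess.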
A.6 Miscellaneous Object Operations (_quick variants)
new_quick
[1447] Create new object
[1448] Syntax: TABLE-US-00101 new_quick indexbyte1 indexbyte2
[1449] Stack: . . . => . . . ,objectref
[1450] indexbyte1 and indexbyte2 are used to construct an index
into the constant pool of the current class. The item at that index
must be a class. A new instance of that class is then created and
objectref, a reference to that object is pushed on the stack.
checkcast_quick
[1451] Make sure object is of given type
[1452] Syntax: TABLE-US-00102 checkcast_quick indexbyte1
indexbyte2
[1453] Stack: . . . ,objectref=> . . . ,objectref
[1454] objectref must be a reference to an object. indexbyte1 and
indexbyte2 are used to construct an index into the constant pool of
the current class. The object at that index of the constant pool
must have already been resolved.
[1455] checkcast then determines whether objectref can be cast to a
reference to an object of class class. A null reference can be cast
to any class, and otherwise the superclasses of objectref's type
are searched for class. If class is determined to be a superclass
of objectref's type, or if objectref is null, it can be cast to
class. If objectref cannot be cast to class, a ClassCastException
is thrown.
instanceof_quick
[1456] Determine if object is of given type
[1457] Syntax: TABLE-US-00103 instanceof_quick indexbyte1
indexbyte2
[1458] Stack: . . . ,objectref=> . . . ,result
[1459] objectref must be a reference to an object. indexbyte1 and
indexbyte2 are used to construct an index into the constant pool of
the current class. The item of class class at that index of the
constant pool must have already been resolved.
[1460] instanceof determines whether objectref can be cast to an
object of the class class. A null objectref can be cast to any
class and otherwise the superclasses of objectref's type are
searched for class. If class is determined to be a superclass of
objectref's type, result is 1 (true). Otherwise, result is 0
(false). If objectref is null, result is 0 (false).
* * * * *