U.S. patent application number 15/084884 was filed with the patent office on 2016-03-30 and published on 2017-01-05 for pattern based branch prediction.
The applicant listed for this patent is International Business Machines Corporation. Invention is credited to Narasimha R. Adiga, Jatin Bhartia, Akash V. Giri, Matthias Heizmann.
Application Number | 15/084884 |
Publication Number | 20170003968 |
Family ID | 57683013 |
Publication Date | 2017-01-05 |
United States Patent Application | 20170003968 |
Kind Code | A1 |
Adiga; Narasimha R. ; et al. | January 5, 2017 |
PATTERN BASED BRANCH PREDICTION
Abstract
A method comprises identifying a number of branches (N.sub.b)
and a number of iterations (N.sub.i) in a loop in an instruction
stream, generating forward branches until the number of
forward branches equals N.sub.b, generating a non-branch
instruction between the forward branch instructions, recording in
a memory the generated instruction stream and a history of each
branch, an associated target address of each branch, and whether
the branch is a taken branch or a not taken branch, determining
whether a loop iterator number (i) is less than N.sub.i-1,
generating a backward branch with a target address that is greater
than or equal to the start address and less than the current
address responsive to determining that (i) is less than N.sub.i-1,
and recording in the memory a branch instruction of the generated
backward branch and the associated target address of the backward
branch.
Inventors: |
Adiga; Narasimha R. (Bangalore, IN);
Bhartia; Jatin (Uttar Pradesh, IN);
Giri; Akash V. (Austin, TX);
Heizmann; Matthias (Poughkeepsie, NY) |
Applicant: |
Name | City | State | Country | Type |
International Business Machines Corporation | Armonk | NY | US | |
Family ID: | 57683013 |
Appl. No.: | 15/084884 |
Filed: | March 30, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14789065 | Jul 1, 2015 | |
15084884 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | G06F 9/3844 20130101; G06F 9/3806 20130101; G06F 9/30058 20130101; G06F 9/325 20130101 |
International Class: | G06F 9/38 20060101 G06F009/38; G06F 9/30 20060101 G06F009/30 |
Claims
1. A method comprising: identifying a number of branches (N.sub.b)
and a number of iterations (N.sub.i) in a loop in an instruction
stream of a processor; determining whether the next instruction to
be generated by the processor is a branch instruction; inserting a
random number of random non-branch instructions and incrementing a
program counter (PC) responsive to determining that the next
instruction to be generated by the processor is not a branch
instruction and determining whether b is less than N.sub.b after
inserting the random number of random non-branch instructions and
incrementing the program counter; determining whether b is less
than N.sub.b responsive to determining that the next instruction to
be generated by the processor is a branch instruction; generating a
backward branch such that a branch target address (TA) is greater
than or equal to a start address (SA) and the TA is less than the
PC responsive to determining that b is not less than N.sub.b and
recording the TA into a target address record and recording taken
into a direction record where a loop end address equals the PC and
the PC equals TA; generating a forward branch and incrementing b
responsive to determining that b is less than N.sub.b; determining
whether the forward branch is taken; incrementing PC and recording
that the forward branch is not taken into a direction record
responsive to determining that the forward branch is not taken; and
generating a branch target address that is greater than the PC,
recording the generated branch target address into a branch target
address record, and recording that the forward branch is taken into
the direction record responsive to determining that the forward
branch is taken.
Description
PRIORITY
[0001] This application is a continuation of and claims priority
from U.S. patent application Ser. No. 14/789,065, filed on Jul. 1,
2015, entitled "PATTERN BASED BRANCH PREDICTION", the entire
contents of which are incorporated herein by reference.
BACKGROUND
[0002] The present invention relates to processors, and more
specifically, to branch prediction.
[0003] Many processors use algorithms to predict branches in the
instruction stream. Branch prediction predicts if branches will
deviate from the sequential stream, and if so also predicts where
the new non-sequential instruction stream begins. The predictions
improve the processor performance by allowing for speculative
execution, which avoids stalling execution for branch resolution
and target address determination.
[0004] Bimodal predictors use branch history tables (BHT) to
predict branches. However, bimodal predictors lose accuracy when
they attempt to predict branches that exhibit changing behavior,
such as branches with differing resolutions or target
addresses.
[0005] Pattern history tables (PHT) have been used to detect
patterns in the instruction stream and use those detected patterns
to more efficiently predict branches that exhibit changing
behavior. PHTs ideally track all unique paths of instructions that
lead to the branch to achieve accurate branch prediction.
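The contrast drawn in paragraphs [0004] and [0005] can be illustrated with a minimal Python sketch. It is a textbook-style model, not the predictor of any particular processor: the 2-bit saturating counters, the 2-bit history length, the initial counter value, and the function names are all assumptions made for this example, and the path history is approximated by the branch's own recent outcomes.

```python
def bimodal_accuracy(outcomes, start=1):
    """Single 2-bit saturating counter: 0-1 predicts not taken, 2-3 predicts taken."""
    counter, hits = start, 0
    for taken in outcomes:
        hits += (counter >= 2) == taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits / len(outcomes)


def pattern_accuracy(outcomes, history_bits=2, start=1):
    """One 2-bit counter per recent-outcome pattern (assumed history length)."""
    table = [start] * (1 << history_bits)
    mask = (1 << history_bits) - 1
    history, hits = 0, 0
    for taken in outcomes:
        hits += (table[history] >= 2) == taken
        if taken:
            table[history] = min(table[history] + 1, 3)
        else:
            table[history] = max(table[history] - 1, 0)
        # Shift the newest outcome into the history register.
        history = ((history << 1) | int(taken)) & mask
    return hits / len(outcomes)


# A branch that alternates taken / not taken on every encounter.
alternating = [True, False] * 50
print(bimodal_accuracy(alternating), pattern_accuracy(alternating))
```

Under these assumptions the single counter mispredicts the alternating branch on every encounter, while the pattern-indexed table mispredicts only while it warms up.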
SUMMARY
[0006] A method comprises identifying a number of branches
(N.sub.b) and a number of iterations (N.sub.i) in a loop in an
instruction stream of a processor, generating forward
branches until the number of forward branches equals N.sub.b,
generating a non-branch instruction between the forward branch
instructions, recording in a memory the generated instruction stream
and a history of each branch, an associated target address of each
branch, and whether the branch is a taken branch or a not taken
branch, determining whether a loop iterator number (i) is less than
N.sub.i-1, generating a backward branch with an associated target
address that is greater than or equal to the start address and
less than the current address responsive to determining that (i)
is less than N.sub.i-1, generating a forward branch with an
associated target address that is greater than the current address
responsive to determining that the loop iterator number (i) is equal
to N.sub.i-1, and recording in the memory a branch instruction of the
generated backward branch and the associated target address of the
backward branch.
[0007] A system comprises a memory and a processor operative to
identify a number of branches (N.sub.b) and a number of iterations
(N.sub.i) in a loop in an instruction stream of a processor,
generate forward branches until the number of forward
branches equals N.sub.b, record in the memory a history of each
branch, an associated target address of each branch, and whether
the branch is a taken branch or a not taken branch, determine
whether a loop iterator number (i) is less than N.sub.i, generate a
backward branch with an associated target address that is greater
than or equal to the start address and less than the current
address responsive to determining that (i) is less than N.sub.i, and
record in the memory a branch instruction of the generated
backward branch and the associated target address of the backward
branch.
[0008] A computer program product comprises a computer readable
storage medium having program instructions embodied therewith, the
program instructions executable by a processor to cause the
processor to perform a method comprising identifying a number of
branches (N.sub.b) and a number of iterations (N.sub.i) in a loop
in an instruction stream of a processor, generating
forward branches until the number of forward branches equals
N.sub.b, generating a non-branch instruction between the forward
branch instructions, recording in a memory the generated instruction
stream and a history of each branch, an associated target
address of each branch, and whether the branch is a taken branch or
a not taken branch, determining whether a loop iterator number (i)
is less than N.sub.i-1, generating a backward branch with an
associated target address that is greater than or equal to the
start address and less than the current address responsive to
determining that (i) is less than N.sub.i-1, generating a forward
branch with an associated target address that is greater than the
current address responsive to determining that the loop iterator
number (i) is equal to N.sub.i-1, and recording in the memory a branch
instruction of the generated backward branch and the associated
target address of the backward branch.
[0009] Additional features and advantages are realized through the
techniques of the present invention. Other embodiments and aspects
of the invention are described in detail herein and are considered
a part of the claimed invention. For a better understanding of the
invention with the advantages and the features, refer to the
description and to the drawings.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
[0010] The subject matter which is regarded as the invention is
particularly pointed out and distinctly claimed in the claims at
the conclusion of the specification. The foregoing and other
features and advantages of the invention are apparent from the
following detailed description taken in conjunction with the
accompanying drawings in which:
[0011] FIG. 1 illustrates an exemplary embodiment of a processing
system;
[0012] FIG. 2 illustrates a block diagram of an exemplary method of
branch prediction;
[0013] FIG. 3 illustrates a block diagram of an exemplary method of
branch prediction;
[0014] FIG. 4 illustrates a block diagram of an exemplary method of
branch prediction; and
[0015] FIG. 5 illustrates a block diagram of an exemplary method of
branch prediction.
DETAILED DESCRIPTION
[0016] Using pattern history tables (PHT) to track the path of
instructions that lead to a branch can increase the accuracy of the
branch prediction over using branch history tables. The path which
encounters a given branch affects how the branch will behave. Since
there can be multiple paths that lead to the same branch
instruction, the PHT should ideally track all the unique paths in
order to obtain accurate prediction. It is desirable to provide
quality stimulus to verify PHT based branch prediction logic
(BPL).
[0017] An instruction stream generator may be used to verify PHT
logic in a random verification environment. It is desirable for an
instruction stream generator to produce sequences of instructions
to meet these criteria: at least one branch instruction should
exhibit different behavior depending on the instruction path
leading to the branch instruction; the instruction stream should be
biased so that a changing behavior branch is encountered by
multiple paths; a consistent correlation should exist between a
unique path to a given branch and the concomitant behavior of the
given branch; deviation should be controlled via a stream-generator
configuration; there should be multiple branches and patterns to
efficiently explore the state space.
[0018] FIG. 1 illustrates an exemplary embodiment of a processing
system 100. The system 100 includes a processor 102 that is
communicatively connected to a memory 104, a display 106, and an
input device 108. The illustrated embodiment described below may be
performed, for example, by the processor 102 and results in
efficient stimulus generation that can be further used by the
processor 102 to test the microprocessor design and provide various
outputs to the user on the display 106.
[0019] In an exemplary embodiment of the method and system
described herein, a loop is generated in a random instruction
stream containing a number of branch instructions that change
behavior each time a branch instruction is encountered with the
assumption that a given branch is encountered via different
instruction paths.
[0020] In this regard an exemplary method includes the processor
102 (of FIG. 1) determining the number of branches (N.sub.b) and
the number of iterations (N.sub.i) of the loop. The processor 102
continues to generate forward branches (conditional or
non-conditional resolution, direct or indirect target address
determination) until the number of branches generated reaches
N.sub.b. The processor maintains a history of each branch and the
associated target address of each branch and the direction of the
branch at resolution (i.e., taken/not taken) and stores the history
in the memory 104. The processor also generates non-branch
instruction padding to fill the loop. If the processor 102
determines that i<N.sub.i, the processor 102 increments i and
generates a backward branch (conditional/indirect) to one of the
addresses for which an instruction was generated (one of the
generated forward branches or non-branch pad instructions). The
processor records the branch instruction and the associated target
address of the branch instruction in the memory 104. The processor
102 begins to reevaluate the instruction stream if the instruction
is a conditional branch instruction or an indirect branch
instruction. If the instruction is a non-branch instruction, no
reevaluation is performed.
[0021] If the instruction is a conditional branch instruction, the
processor 102 selects the direction of the branch (taken/not taken)
using a pseudo random process such that whether the direction of
the branch is taken or not taken is determined substantially
randomly. In some embodiments, the selection of whether a branch is
taken or not taken may be biased by user selected parameters or
specifications. If the branch is selected to be taken, then the
processor jumps to the target address. If the branch is selected as
not taken, then the processor 102 falls through to the next
sequential instruction. The processor 102 maintains a record in the memory 104
of the directions selected for the conditional branch instruction
per iteration of the loop. If the instruction is an indirect branch
instruction, then the processor 102 generates a forward target
address. The target address should be less than the instruction
address of the backward branch described above. The processor 102
maintains a record of the target address selected for the indirect
branch instruction per iteration of the loop.
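The selection steps of paragraph [0021] can be sketched as follows. This is only an illustration under stated assumptions: the `taken_bias` parameter stands in for the user selected bias, the fixed 4-byte instruction step is assumed, and the function names are hypothetical rather than part of the described method.

```python
import random


def choose_direction(rng, taken_bias=0.5):
    """Pseudo-randomly select taken (True) or not taken (False).

    taken_bias is a hypothetical user parameter; 0.5 means unbiased.
    """
    return rng.random() < taken_bias


def choose_indirect_target(rng, pc, backward_branch_addr, step=4):
    """Select a forward target after pc and before the backward branch."""
    candidates = range(pc + step, backward_branch_addr, step)
    if not candidates:
        raise ValueError("no room for a forward target before the backward branch")
    return rng.choice(candidates)


rng = random.Random(0)
directions = []   # per-iteration record for one conditional branch
targets = []      # per-iteration record for one indirect branch
for _ in range(3):
    directions.append(choose_direction(rng))
    targets.append(choose_indirect_target(rng, pc=0x100, backward_branch_addr=0x140))
```

Keeping one list entry per loop iteration mirrors the per-iteration records of directions and target addresses that the paragraph describes.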
[0022] The incrementing of i, generation of a backward branch, and
reevaluation of the instruction stream continues until i=N.sub.i.
The processor proceeds with random instruction generation.
[0023] FIGS. 2-5 illustrate a block diagram of an exemplary method
of branch prediction that may be performed by the system 100 (of
FIG. 1). Referring to FIG. 2, in block 202 the processor 102 (of
FIG. 1) starts the method and the program counter (PC) equals the
start address (SA). In block 204, the processor 102 determines
whether i<N.sub.i and i=0, where i is the loop iterator and
N.sub.i is the number of iterations to be run for the loop,
i.e., whether the current instruction pointer is in the initial loop
iteration. If yes, in block 206, the processor 102 determines
whether the next instruction to be generated is a branch. If no, a
random number of random non-branch instructions are inserted into
the loop and the program counter (PC) (i.e., instruction address)
is incremented in block 208. If yes, in block 210, the processor
102 determines whether b<N.sub.b, where b is the branch
instruction count for the first iteration and N.sub.b is the number
of branch instructions there will be in the loop. If yes, in block
212, a forward branch is generated and b is incremented by 1. The
processor 102 determines in block 214 whether the generated branch
is taken. If no, in block 216, the processor 102 increments the PC
and records Not Taken into the branch direction record per branch
instruction (DC[i]). The DC[i] may be stored in, for example the
memory 104 (of FIG. 1). If yes, in block 218, the processor 102
generates a branch target address (TA) such that TA>PC, i.e. a
forward branch is generated. The processor 102 records the TA into
the branch target address record per branch instruction (TC[i]),
which may be stored, for example, in the memory 104. The processor
102 records Taken into the DC[i], and changes the program counter
to the target address (PC=TA). In block 210, if no, the processor
102 generates a backward branch such that the instruction stream
loops back to a known instruction within the loop, i.e., the
branch target address satisfies TA≥SA and TA<PC, in block 220. In
block 222 the processor records TA into the TC[i] and records taken
into the DC[i], increments i, stores the current PC as the Loop End
Address (LA=PC) and changes the program counter to the target
address of the instruction (PC=TA).
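The first-iteration flow of FIG. 2 can be sketched in Python as follows. This is a simplified illustration and not the generator itself: the fixed 4-byte instruction length, the probabilities `p_branch` and `p_taken`, the padding and skip distances, and the dictionary-based records are assumptions made for the example. The backward target is drawn from addresses that already hold an instruction, which satisfies the condition that TA is at least SA and less than PC.

```python
import random

STEP = 4  # assumed fixed instruction length in bytes


def generate_first_iteration(rng, start_addr, n_branches, p_branch=0.5, p_taken=0.5):
    """Return (stream, DC[0], TC[0], loop end address LA, backward target TA)."""
    if n_branches < 1:
        raise ValueError("the loop needs at least one forward branch")
    stream = {}        # address -> "op", "fwd_branch" or "back_branch"
    dc0, tc0 = {}, {}  # direction and target records keyed by branch address
    pc, b = start_addr, 0
    while True:
        if rng.random() >= p_branch:              # block 206: not a branch
            for _ in range(rng.randint(1, 3)):    # block 208: random padding
                stream[pc] = "op"
                pc += STEP
            continue
        if b < n_branches:                        # block 210
            stream[pc] = "fwd_branch"             # block 212
            b += 1
            if rng.random() < p_taken:            # block 214
                ta = pc + STEP * rng.randint(2, 4)  # block 218: TA > PC
                dc0[pc], tc0[pc] = True, ta
                pc = ta
            else:                                 # block 216
                dc0[pc] = False
                pc += STEP
        else:                                     # blocks 220 and 222
            ta = rng.choice(sorted(stream))       # SA <= TA < PC
            stream[pc] = "back_branch"
            dc0[pc], tc0[pc] = True, ta
            return stream, dc0, tc0, pc, ta       # LA = PC; caller sets PC = TA
```

A call such as `generate_first_iteration(random.Random(1), 0x1000, n_branches=3)` yields a loop body with three forward branches, random padding, and one closing backward branch.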
[0024] Referring to FIG. 3, following "A", in block 302, the
processor 102 (of FIG. 1) determines whether i<N.sub.i and i>0 and
PC<LA, where LA is the loop end address (backward branch
address), i.e., it determines whether the current instruction pointer
is in an intermediate loop iteration and the current instruction is
not the last instruction of the loop. If no, the processor 102 follows "B"
described in FIG. 4 below. If yes, in block 304 the processor 102
determines whether a branch instruction exists. If no, in block 306
the processor determines if an instruction exists. If no, in block
308 a random non-branch instruction is inserted into the loop. In
block 310, the PC is incremented to the next instruction. If a
branch instruction exists in block 304, the processor 102
determines whether the branch is conditional in block 312. If yes,
the processor 102 determines a new direction, and records the
determined direction (taken/not taken) into the DC[i] in block 314.
If no, in block 316, the processor 102 records taken into the
DC[i]. The processor 102 determines in block 318 whether the branch
is taken. If no, a random target address is recorded into the TC[i]
and the PC is incremented to the next instruction in block 320. If
yes, in block 322, the processor 102 determines whether the branch
is indirect. If yes, in block 324, the processor 102 determines
(generates) a new target address such that TA<LA and TA>PC,
records the new TA into the TC[i], and sets PC=TA. If no, then in
block 326, the processor 102 records the default target address
(generated when i=0, i.e., TC[0]) into TC[i] (TC[i]=TA) and sets PC=TA.
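The re-evaluation pass of FIG. 3 can be sketched as a walk from the current PC to the loop end address. The sketch below is illustrative only. It assumes a fixed 4-byte instruction length, a dictionary describing each instruction, and that TC[0] holds a default target for every direct branch, including branches that were not taken in the first iteration. The random target that block 320 records for a not taken branch is a don't-care value and is omitted here.

```python
import random

STEP = 4  # assumed fixed instruction length in bytes


def replay_iteration(rng, stream, tc0, pc, la, p_taken=0.5):
    """Walk from pc to la for one intermediate iteration; return (DC[i], TC[i]).

    stream maps address -> {"branch": bool, "conditional": bool, "indirect": bool}
    (non-branch entries only need the "branch" key).
    """
    dc, tc = {}, {}
    while pc < la:                                    # block 302
        insn = stream.get(pc)
        if insn is None:                              # blocks 306 and 308
            insn = stream[pc] = {"branch": False}     # fill the gap with a non-branch
        if not insn["branch"]:                        # block 310
            pc += STEP
            continue
        # Blocks 312 to 316: conditional branches get a new direction,
        # unconditional branches are always taken.
        taken = rng.random() < p_taken if insn["conditional"] else True
        dc[pc] = taken
        if not taken:                                 # block 320
            pc += STEP
        elif insn["indirect"] and pc + STEP < la:     # block 324: PC < TA < LA
            tc[pc] = rng.randrange(pc + STEP, la, STEP)
            pc = tc[pc]
        else:                                         # block 326: default target
            tc[pc] = tc0[pc]
            pc = tc[pc]
    return dc, tc
```

Because every branch target in this sketch lies ahead of the branch, the walk always advances and ends once PC reaches LA.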
[0025] Following "C" to FIG. 4, in block 402 the processor 102 (of
FIG. 1) determines whether the instruction stream is in the middle of
the loop, i.e., PC<LA. If yes, the method moves to "D" of FIG. 3.
If no, in block 404, the processor 102 determines whether the
method is still in an intermediate loop iteration (i<N.sub.i-1). If no,
the method follows "E" to FIG. 5 described below to change the
behavior of the last instruction of the loop such that the instruction
stream exits the loop. If yes, the processor 102 determines whether
the branch is indirect in block 406. If yes, in block 408, the
processor 102 selects any TA between SA and LA. If no in block 406,
then in block 410, processor 102 determines the target address for
the current branch instruction as the default target address i.e.
TA=TC[0]. In block 412, the TA is recorded into the TC[i], taken is
recorded into DC[i], PC=TA, and i is incremented.
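The loop-end handling of FIG. 4 can be sketched as a single function. It is a simplified illustration: the function name, the returned dictionary, and the 4-byte step are assumptions, and "between SA and LA" is read here as SA <= TA < LA, consistent with block 220.

```python
import random

STEP = 4  # assumed fixed instruction length in bytes


def close_iteration(rng, i, n_iters, sa, la, default_ta, indirect):
    """Handle the backward branch at PC == LA for an intermediate iteration.

    Returns None when i is the last iteration (the FIG. 5 exit path applies).
    """
    if i >= n_iters - 1:                       # block 404: "no"
        return None
    if indirect:                               # blocks 406 and 408
        ta = rng.randrange(sa, la, STEP)
    else:                                      # block 410: TA = TC[0]
        ta = default_ta
    # Block 412: record TA and taken, set PC = TA, increment i.
    return {"dc": True, "tc": ta, "pc": ta, "i": i + 1}
```

For a direct backward branch every intermediate iteration re-enters the loop at the same address, while an indirect one can re-enter at a different address each time, which varies the path that leads to each branch in the loop.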
[0026] Following "E" to FIG. 5, the processor 102 determines
whether the branch is conditional in block 502. If yes, in block
504, the processor 102 determines whether the branch is indirect.
If no, the branch is forced to be not taken in block 506. In block
508, the PC is incremented to the next instruction and not taken is
recorded into DC[i]; a random target address can be recorded into
TC[i]. Referring to block 504, if yes, the processor determines in
block 510 if the branch is taken. If yes, the TA is forced forward,
i.e., a forward address is randomly selected such that the method exits
the loop (i.e., TA>PC), taken is recorded in the DC[i], the TA is
recorded into the TC[i], and PC=TA.
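The two exit outcomes of FIG. 5 can be sketched as follows. The dispatch on branch type is one reading of the paragraph above and is an assumption of this sketch, as are the function name, the 4-byte step, and the `max_skip` bound. A conditional direct branch is forced not taken, an indirect branch is treated as taken with its target forced forward, and an unconditional direct branch is rejected because it offers no way out of the loop.

```python
import random

STEP = 4  # assumed fixed instruction length in bytes


def exit_loop(rng, pc, conditional, indirect, max_skip=8):
    """Change the last backward branch so the instruction stream leaves the loop."""
    if conditional and not indirect:
        # Blocks 506 and 508: force not taken and fall through.
        # The source allows a random don't-care target here; None is used instead.
        return {"dc": False, "tc": None, "pc": pc + STEP}
    if indirect:
        # Block 510 onward: taken, with a randomly selected forward target (TA > PC).
        ta = pc + STEP * rng.randint(1, max_skip)
        return {"dc": True, "tc": ta, "pc": ta}
    raise ValueError("an unconditional direct backward branch cannot leave the loop")
```

In both outcomes the next PC lies beyond the loop end address, so random instruction generation can resume after the loop.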
[0027] The exemplary methods and systems described above can be
used by an instruction stream generator that lacks the functionality
to routinely generate nested finite loops with branch instructions
that behave differently. These exemplary methods and systems can
increase the effectiveness of verifying such pattern based branch
prediction methods.
[0028] The methods and systems described above fit into a
constrained random environment and provide quality stimulus to
verify pattern history table based branch prediction logic (BPL).
This is achieved by generating a loop in a random instruction
stream that contains a number of branch instructions that change
behavior each time the branch instructions are encountered, with
the underlying assumption that a given branch is encountered via
different instruction paths.
[0029] The terminology used herein is for the purpose of describing
particular embodiments only and is not intended to be limiting of
the invention. As used herein, the singular forms "a", "an" and
"the" are intended to include the plural forms as well, unless the
context clearly indicates otherwise. It will be further understood
that the terms "comprises" and/or "comprising," when used in this
specification, specify the presence of stated features, integers,
steps, operations, elements, and/or components, but do not preclude
the presence or addition of one or more other features, integers,
steps, operations, elements, components, and/or groups thereof.
[0030] The corresponding structures, materials, acts, and
equivalents of all means or step plus function elements in the
claims below are intended to include any structure, material, or
act for performing the function in combination with other claimed
elements as specifically claimed. The description of the present
invention has been presented for purposes of illustration and
description, but is not intended to be exhaustive or limited to the
invention in the form disclosed. Many modifications and variations
will be apparent to those of ordinary skill in the art without
departing from the scope and spirit of the invention. The
embodiments were chosen and described in order to best explain the
principles of the invention and the practical application, and to
enable others of ordinary skill in the art to understand the
invention for various embodiments with various modifications as are
suited to the particular use contemplated.
[0031] The present invention may be a system, a method, and/or a
computer program product. The computer program product may include
a computer readable storage medium (or media) having computer
readable program instructions thereon for causing a processor to
carry out aspects of the present invention.
[0032] The computer readable storage medium can be a tangible
device that can retain and store instructions for use by an
instruction execution device. The computer readable storage medium
may be, for example, but is not limited to, an electronic storage
device, a magnetic storage device, an optical storage device, an
electromagnetic storage device, a semiconductor storage device, or
any suitable combination of the foregoing. A non-exhaustive list of
more specific examples of the computer readable storage medium
includes the following: a portable computer diskette, a hard disk,
a random access memory (RAM), a read-only memory (ROM), an erasable
programmable read-only memory (EPROM or Flash memory), a static
random access memory (SRAM), a portable compact disc read-only
memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a
floppy disk, a mechanically encoded device such as punch-cards or
raised structures in a groove having instructions recorded thereon,
and any suitable combination of the foregoing. A computer readable
storage medium, as used herein, is not to be construed as being
transitory signals per se, such as radio waves or other freely
propagating electromagnetic waves, electromagnetic waves
propagating through a waveguide or other transmission media (e.g.,
light pulses passing through a fiber-optic cable), or electrical
signals transmitted through a wire.
[0033] Computer readable program instructions described herein can
be downloaded to respective computing/processing devices from a
computer readable storage medium or to an external computer or
external storage device via a network, for example, the Internet, a
local area network, a wide area network and/or a wireless network.
The network may comprise copper transmission cables, optical
transmission fibers, wireless transmission, routers, firewalls,
switches, gateway computers and/or edge servers. A network adapter
card or network interface in each computing/processing device
receives computer readable program instructions from the network
and forwards the computer readable program instructions for storage
in a computer readable storage medium within the respective
computing/processing device.
[0034] Computer readable program instructions for carrying out
operations of the present invention may be assembler instructions,
instruction-set-architecture (ISA) instructions, machine
instructions, machine dependent instructions, microcode, firmware
instructions, state-setting data, or either source code or object
code written in any combination of one or more programming
languages, including an object oriented programming language such
as Java, Smalltalk, C++ or the like, and conventional procedural
programming languages, such as the "C" programming language or
similar programming languages. The computer readable program
instructions may execute entirely on the user's computer, partly on
the user's computer, as a stand-alone software package, partly on
the user's computer and partly on a remote computer or entirely on
the remote computer or server. In the latter scenario, the remote
computer may be connected to the user's computer through any type
of network, including a local area network (LAN) or a wide area
network (WAN), or the connection may be made to an external
computer (for example, through the Internet using an Internet
Service Provider). In some embodiments, electronic circuitry
including, for example, programmable logic circuitry,
field-programmable gate arrays (FPGA), or programmable logic arrays
(PLA) may execute the computer readable program instructions by
utilizing state information of the computer readable program
instructions to personalize the electronic circuitry, in order to
perform aspects of the present invention.
[0035] Aspects of the present invention are described herein with
reference to flowchart illustrations and/or block diagrams of
methods, apparatus (systems), and computer program products
according to embodiments of the invention. It will be understood
that each block of the flowchart illustrations and/or block
diagrams, and combinations of blocks in the flowchart illustrations
and/or block diagrams, can be implemented by computer readable
program instructions.
[0036] These computer readable program instructions may be provided
to a processor of a general purpose computer, special purpose
computer, or other programmable data processing apparatus to
produce a machine, such that the instructions, which execute via
the processor of the computer or other programmable data processing
apparatus, create means for implementing the functions/acts
specified in the flowchart and/or block diagram block or blocks.
These computer readable program instructions may also be stored in
a computer readable storage medium that can direct a computer, a
programmable data processing apparatus, and/or other devices to
function in a particular manner, such that the computer readable
storage medium having instructions stored therein comprises an
article of manufacture including instructions which implement
aspects of the function/act specified in the flowchart and/or block
diagram block or blocks.
[0037] The computer readable program instructions may also be
loaded onto a computer, other programmable data processing
apparatus, or other device to cause a series of operational steps
to be performed on the computer, other programmable apparatus or
other device to produce a computer implemented process, such that
the instructions which execute on the computer, other programmable
apparatus, or other device implement the functions/acts specified
in the flowchart and/or block diagram block or blocks.
[0038] The flowchart and block diagrams in the Figures illustrate
the architecture, functionality, and operation of possible
implementations of systems, methods, and computer program products
according to various embodiments of the present invention. In this
regard, each block in the flowchart or block diagrams may represent
a module, segment, or portion of instructions, which comprises one
or more executable instructions for implementing the specified
logical function(s). In some alternative implementations, the
functions noted in the block may occur out of the order noted in
the figures. For example, two blocks shown in succession may, in
fact, be executed substantially concurrently, or the blocks may
sometimes be executed in the reverse order, depending upon the
functionality involved. It will also be noted that each block of
the block diagrams and/or flowchart illustration, and combinations
of blocks in the block diagrams and/or flowchart illustration, can
be implemented by special purpose hardware-based systems that
perform the specified functions or acts or carry out combinations
of special purpose hardware and computer instructions.
[0039] The descriptions of the various embodiments of the present
invention have been presented for purposes of illustration, but are
not intended to be exhaustive or limited to the embodiments
disclosed. Many modifications and variations will be apparent to
those of ordinary skill in the art without departing from the scope
and spirit of the described embodiments. The terminology used
herein was chosen to best explain the principles of the
embodiments, the practical application or technical improvement
over technologies found in the marketplace, or to enable others of
ordinary skill in the art to understand the embodiments disclosed
herein.
* * * * *