U.S. patent application number 11/132423 was filed with the patent office on 2005-05-19 and published on 2005-12-15 for systems and methods of dynamic branch prediction in a microprocessor.
Invention is credited to Aris Aristodemou, Rich Fuhler, and Kar-Lik Wong.
Application Number | 20050278513 11/132423 |
Family ID | 35429033 |
Filed Date | 2005-05-19 |
United States Patent Application | 20050278513 |
Kind Code | A1 |
Aristodemou, Aris; et al. | December 15, 2005 |
Systems and methods of dynamic branch prediction in a microprocessor
Abstract
A hybrid branch prediction scheme for a multi-stage pipelined
microprocessor that combines features of static and dynamic branch
prediction to reduce complexity and enhance performance over
conventional branch prediction techniques. Prior to microprocessor
deployment, a branch prediction table is populated using static
branch prediction techniques by executing instructions analogous to
those to be executed during microprocessor deployment. The branch
prediction table is stored, and then loaded into the BPU during
deployment, for example, at the time of microprocessor power on.
Dynamic branch prediction is then performed using the pre-loaded
data, thereby enabling dynamic branch prediction without a required
"warm-up" period. After resolving each branch in the selection
stage of the microprocessor instruction pipeline, the BPU is
updated with the address of the next instruction that resulted from
that branch to enhance performance.
Inventors: | Aristodemou, Aris; (London, GB); Fuhler, Rich; (Santa Cruz, CA); Wong, Kar-Lik; (Wokingham, GB) |
Correspondence Address: |
HUNTON & WILLIAMS LLP
INTELLECTUAL PROPERTY DEPARTMENT
1900 K STREET, N.W., SUITE 1200
WASHINGTON, DC 20006-1109
US |
Family ID: | 35429033 |
Appl. No.: | 11/132423 |
Filed: | May 19, 2005 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60572238 | May 19, 2004 | |
Current U.S. Class: | 712/228; 712/240; 712/E9.051; 712/E9.052 |
Current CPC Class: | G06F 9/325 20130101; Y02D 10/12 20180101; G06F 9/30145 20130101; G06F 9/3861 20130101; G06F 9/30181 20130101; G06F 9/30036 20130101; G06F 9/30032 20130101; G06F 9/3806 20130101; G06F 9/30149 20130101; G06F 9/3897 20130101; G06F 9/3802 20130101; G06F 9/32 20130101; G06F 9/3846 20130101; G06F 15/7867 20130101; G06F 9/3816 20130101; G06F 9/3885 20130101; G06F 11/3648 20130101; G06F 12/0802 20130101; G06F 9/3844 20130101; Y02D 10/13 20180101; Y02D 10/00 20180101; G06F 5/01 20130101 |
Class at Publication: | 712/228; 712/240 |
International Class: | G06F 009/00 |
Claims
1. A method of performing branch prediction in a microprocessor
having a multistage instruction pipeline, the method comprising:
building a branch prediction history table of branch prediction
data through static branch prediction prior to microprocessor
deployment; storing the branch prediction data in a memory; loading
the branch prediction data into a branch prediction unit (BPU) of
the microprocessor upon power on; and performing dynamic branch
prediction with the BPU based on the preloaded branch prediction
data.
2. The method according to claim 1, further comprising updating the
branch prediction data in the BPU if, during instruction
processing, prediction data changes.
3. The method according to claim 2 wherein updating comprises after
resolving a branch in a select stage of the instruction pipeline,
updating the BPU with the address of a next instruction that
resulted from that branch.
4. The method according to claim 1, wherein building a branch
prediction history table comprises simulating instructions that
will be executed by the processor during deployment and populating
a table of branch history with information indicating whether
conditional branches were taken or not.
5. The method according to claim 4, wherein building comprises
using at least one of a simulator and a compiler to generate branch
history.
6. The method according to claim 1, wherein performing dynamic
branch prediction with the branch prediction unit based on the
preloaded branch prediction data comprises parsing a branch history
table in the BPU that indexes non-sequential instructions by their
addresses in association with the next instruction taken.
7. The method according to claim 1, wherein the microprocessor is
an embedded microprocessor.
8. The method according to claim 1, further comprising after
performing dynamic branch prediction, storing branch history data
in the branch prediction unit in a non-volatile memory for preload
upon subsequent microprocessor use.
9. In a multistage pipeline microprocessor employing dynamic branch
prediction, the method of enhancing branch prediction performance
comprising: performing static branch prediction to build a branch
prediction history table of branch prediction data prior to
microprocessor deployment; storing the branch prediction history
table in a memory; loading the branch prediction history table into
a branch prediction unit (BPU) of the microprocessor; and
performing dynamic branch prediction with the BPU based on the
preloaded branch prediction data.
10. The method according to claim 9, wherein static branch
prediction is performed prior to microprocessor deployment.
11. The method according to claim 9, wherein loading the branch
prediction table is performed subsequent to microprocessor power
on.
12. The method according to claim 9, further comprising updating
the branch prediction data in the BPU if, during instruction
processing, prediction data changes.
13. The method according to claim 12, wherein the microprocessor
includes an instruction pipeline having a select stage, and
updating comprises after resolving a branch in the select stage,
updating the BPU with the address of the next instruction resulting
from that branch.
14. The method according to claim 9, wherein building a branch
prediction history table comprises simulating instructions that
will be executed by the processor during deployment and populating
a table of branch history with information indicating whether
conditional branches were taken or not.
15. The method according to claim 14, wherein building comprises
using at least one of a simulator and a compiler to generate branch
history.
16. The method according to claim 9, wherein performing dynamic
branch prediction with the branch prediction unit based on the
preloaded branch prediction data comprises parsing a branch history
table in the BPU that indexes non-sequential instructions by their
addresses in association with the next instruction taken.
17. The method according to claim 9, wherein the microprocessor is
an embedded microprocessor.
18. The method according to claim 9, further comprising after
performing dynamic branch prediction, storing branch history data
in the branch prediction unit in a non-volatile memory for preload
upon subsequent microprocessor use.
19. An embedded microprocessor comprising: a multistage instruction
pipeline; and a BPU adapted to perform dynamic branch prediction,
wherein the BPU is preloaded with branch history table created
through static branch prediction, and subsequently updated to
contain the actual address of the next instruction that resulted
from that branch during dynamic branch prediction.
20. The microprocessor according to claim 19, wherein the branch
history table contains data generated prior to microprocessor
deployment and the BPU is preloaded at power on of the
microprocessor.
21. The microprocessor according to claim 19, wherein after
resolving a branch in a select stage of the instruction pipeline,
the BPU is updated to contain the address of the next instruction
that resulted from that branch.
22. The microprocessor according to claim 19, wherein the BPU is
preloaded with a branch history table created through static branch
prediction during a simulation processing that simulated
instructions that will be executed by the microprocessor during
deployment and wherein the BPU comprises a branch history table
that indexes non-sequential instructions by their addresses in
association with the next instruction taken.
Description
CROSS REFERENCE TO RELATED APPLICATION(S)
[0001] This application claims priority to provisional application
No. 60/572,238 filed May 19, 2004, entitled "Microprocessor
Architecture" hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] This invention relates generally to microprocessor
architecture and more specifically to improved systems and methods
for performing branch prediction in a multi-stage pipelined
microprocessor.
BACKGROUND OF THE INVENTION
[0003] Multistage pipeline microprocessor architecture is known in
the art. A typical microprocessor pipeline consists of several
stages of instruction handling hardware, wherein each rising pulse
of a clock signal propagates instructions one stage further in the
pipeline. Although the clock speed dictates the number of clock
signals and therefore pipeline propagations per second, the
effective operational speed of the processor is dependent partially
upon the rate that instructions and operands are transferred
between memory and the processor.
[0004] One method of increasing processor performance is branch
prediction. Branch prediction uses instruction history to predict
whether a branch or non-sequential instruction will be taken.
Branch or non-sequential instructions are processor instructions
that require a jump to a non-sequential memory address if a
condition is satisfied. When an instruction is retrieved or
fetched, if the instruction is a conditional branch, the result of
the conditional branch, that is, the address of the next
instruction to be executed following the conditional branch, is
speculatively predicted based on past branch history. This
predictive or speculative result is injected into the pipeline by
referencing a branch history table. Whether or not the prediction
is correct will not be known until a later stage of the pipeline.
However, if the prediction is correct, several clock cycles will be
saved by not having to go back to get the next non-sequential
instruction address.
[0005] If the prediction is incorrect, the current pipeline behind
the stage in which the prediction is determined to be incorrect
must be flushed and the correct branch inserted back in the first
stage. This may seem like a severe penalty in the event of an
incorrect prediction because it results in the same number of clock
cycles as if no branch prediction were used. However, in
applications where small loops are repeated many times, such as
applications typically implemented with embedded processors, branch
prediction has a sufficiently high success rate that the benefits
associated with correct predictions outweigh the cost of occasional
incorrect predictions, i.e., pipeline flushes. In these types of
embedded applications, branch prediction can be correct more than
ninety percent of the time. Thus, the risk of predicting an
incorrect branch resulting in a pipeline flush is outweighed by the
benefit of saved clock cycles.
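By way of illustration only, the trade-off described above may be expressed as a short calculation. The following Python sketch is not part of the disclosure: the function name and the 4-cycle flush penalty are assumptions chosen for the example, and the no-prediction case is modeled, consistent with the statement above, as paying the full penalty on every branch.

```python
def average_branch_penalty(accuracy: float, flush_penalty: int) -> float:
    """Expected cycles lost per branch.

    accuracy      -- fraction of branches predicted correctly (0.0 to 1.0)
    flush_penalty -- cycles lost when a prediction is wrong (assumed value)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    return (1.0 - accuracy) * flush_penalty


# Assumed figure for illustration only: a 4-cycle flush penalty.
FLUSH_PENALTY = 4
without_prediction = average_branch_penalty(0.0, FLUSH_PENALTY)
with_prediction = average_branch_penalty(0.9, FLUSH_PENALTY)
```

Under these assumed figures, ninety percent accuracy reduces the average loss per branch to one tenth of the loss incurred when every branch pays the full penalty.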
[0006] There are essentially two techniques for implementing branch
prediction. The first, dynamic branch prediction, records runtime
program flow behavior in order to establish a history that can be
used at the front of the pipeline to predict future non-sequential
program flow. When a branch instruction comes in, the look up table
is referenced for the address of the next instruction which is then
predictively injected into the pipeline. Once the look up table is
populated with a sufficient amount of data, dynamic branch
prediction significantly increases performance. However, this
technique is initially ineffective, and can even reduce system
performance until a sufficient number of instructions have been
processed to fill the branch history tables. Because of the
required "warm-up" period for this technique to become effective,
runtime behavior of critical code could become unpredictable, making
it unacceptable for certain embedded applications. Moreover, as
noted above, mistaken branch predictions result in a flush of the
entire pipeline wasting clock cycles and retarding performance.
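The "warm-up" behavior described above may be illustrated with a minimal Python sketch. It is offered under stated assumptions rather than as the implementation of any particular processor: the look-up table is modeled as a dictionary, instructions are assumed to be 4 bytes wide, a branch with no table entry is assumed to be predicted as falling through to the sequential address, and the trace addresses are hypothetical.

```python
INSTRUCTION_SIZE = 4  # assumed fixed instruction width in bytes


def replay(trace, table):
    """Replay (branch_address, actual_next_address) pairs against `table`.

    A branch with no entry is predicted to fall through to the sequential
    address. The table is corrected after every misprediction. Returns the
    number of mispredictions seen during the replay.
    """
    misses = 0
    for branch_address, actual_next in trace:
        predicted = table.get(branch_address, branch_address + INSTRUCTION_SIZE)
        if predicted != actual_next:
            misses += 1
            table[branch_address] = actual_next
    return misses


# Hypothetical trace: three different branches, each taken once per pass.
trace = [(0x40, 0x10), (0x80, 0x20), (0xC0, 0x30)]
table = {}                          # cold table at power-on
first_pass = replay(trace, table)   # every branch is seen for the first time
second_pass = replay(trace, table)  # the table has now been populated
```

In this sketch every branch is mispredicted on the first pass and none on the second, which is the initial ineffectiveness that the warm-up period refers to.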
[0007] The other primary branch prediction technique is static
branch prediction. Static branch prediction uses profiling
techniques to guide the compiler to generate special branch
instructions. These special branch instructions typically include
hints to guide the processor to perform speculative branch
prediction earlier in the pipeline when not all information
required for branch resolution is yet available. However, a
disadvantage of static branch prediction techniques is that they
typically complicate the processor pipeline design because
speculative as well as actual branch resolution has to be performed
in several pipeline stages. Complication of design translates to
increased silicon footprint and higher cost. Static branch
prediction techniques can yield accurate results but they cannot
cope with variation of run-time conditions. Therefore, static
branch prediction also suffers from limitations which reduce its
appeal for critical embedded applications.
[0008] Thus, it would be desirable to have a branch prediction
technique that ameliorates and ideally eliminates one or more of
the above-noted deficiencies of conventional branch prediction
techniques. However, it should be appreciated that the description
herein of various advantages and disadvantages associated with
known apparatus, methods, and materials is not intended to limit
the scope of the invention to their exclusion. Indeed, various
embodiments of the invention may include one or more of the known
apparatus, methods, and materials without suffering from their
disadvantages.
[0009] As background to the techniques discussed herein, the
following references are incorporated herein by reference: U.S.
Pat. No. 6,862,563 issued Mar. 1, 2005 entitled "Method And
Apparatus For Managing The Configuration And Functionality Of A
Semiconductor Design" (Hakewill et al.); U.S. Ser. No. 10/423,745
filed Apr. 25, 2003, entitled "Apparatus and Method for Managing
Integrated Circuit Designs"; and U.S. Ser. No. 10/651,560 filed
Aug. 29, 2003, entitled "Improved Computerized Extension Apparatus
and Methods", all assigned to the assignee of the present
invention.
SUMMARY OF THE INVENTION
[0010] Various embodiments of the invention may ameliorate or
overcome one or more of the shortcomings of conventional branch
prediction techniques through a hybrid branch prediction technique
that takes advantage of features of both static and dynamic branch
prediction.
[0011] At least one exemplary embodiment of the invention may
provide a method of performing branch prediction in a
microprocessor having a multi-stage instruction pipeline. The
method of performing branch prediction according to this embodiment
comprises building a branch prediction history table of branch
prediction data through static branch prediction prior to
microprocessor deployment, storing the branch prediction data in a
memory in the microprocessor, loading the branch prediction data
into a branch prediction unit (BPU) of the microprocessor upon
powering on, and performing dynamic branch prediction with the BPU
based on the preloaded branch prediction data.
[0012] At least one additional exemplary embodiment of the
invention may provide a method of enhancing branch prediction
performance of a multi-stage pipelined microprocessor employing
dynamic branch prediction. The method of enhancing branch
prediction performance according to this embodiment comprises
performing static branch prediction to build a branch prediction
history table of branch prediction data prior to microprocessor
deployment, storing the branch prediction history table in a memory
in the microprocessor, loading the branch prediction history table
into a branch prediction unit (BPU) of the microprocessor, and
performing dynamic branch prediction with the BPU based on the
preloaded branch prediction data.
[0013] Yet an additional exemplary embodiment of the invention may
provide an embedded microprocessor architecture. The embedded
microprocessor architecture according to this embodiment comprises
a multi-stage instruction pipeline, and a BPU adapted to perform
dynamic branch prediction, wherein the BPU is preloaded with a branch
history table created through static branch prediction, and
subsequently updated to contain the actual address of the next
instruction that resulted from that branch during dynamic branch
prediction.
[0014] Other aspects and advantages of the invention will become
apparent from the following detailed description, taken in
conjunction with the accompanying drawings, illustrating by way of
example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is a block diagram illustrating a multistage
instruction pipeline of a conventional microprocessor core;
[0016] FIG. 2 is a flow chart illustrating the steps of a method
for performing dynamic branch prediction based on preloaded static
branch prediction data in accordance with at least one exemplary
embodiment of the invention; and
[0017] FIG. 3 is a block diagram illustrating the flow of data into
and out of a branch prediction unit in accordance with at least one
exemplary embodiment of the invention.
DETAILED DESCRIPTION OF THE DISCLOSURE
[0018] The following description is intended to convey a thorough
understanding of the invention by providing specific embodiments
and details involving various aspects of a new and useful
microprocessor architecture. It is understood, however, that the
invention is not limited to these specific embodiments and details,
which are exemplary only. It further is understood that one
possessing ordinary skill in the art, in light of known systems and
methods, would appreciate the use of the invention for its intended
purposes and benefits in any number of alternative embodiments,
depending upon specific design and other needs.
[0019] FIG. 1 illustrates a typical microprocessor core 100 with a
multistage instruction pipeline. The first stage of the
microprocessor core 100 is the instruction fetch stage (FET) 110.
In the instruction fetch stage 110, instructions are retrieved or
fetched from instruction RAM 170 based on their N-bit instruction
address. During instruction fetches, a copy of the instruction,
indexed by its address, will be stored in the instruction cache
112. As a result, future calls to the same instruction may be
retrieved from the instruction cache 112, rather than the
relatively slower instruction RAM 170.
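The fetch path just described may be sketched, for purposes of explanation only, as follows. The class name, the dictionary-based instruction RAM, and the read counter are assumptions made for this example. Unlike real hardware, the sketch uses an unbounded cache with no eviction policy.

```python
class InstructionFetcher:
    """Fetches instructions by address, caching each one after first use."""

    def __init__(self, instruction_ram):
        self.instruction_ram = instruction_ram  # address -> instruction word
        self.cache = {}                         # copies indexed by address
        self.ram_reads = 0                      # counts slow-path accesses

    def fetch(self, address):
        if address in self.cache:
            return self.cache[address]          # fast path: cache hit
        self.ram_reads += 1                     # slow path: read instruction RAM
        instruction = self.instruction_ram[address]
        self.cache[address] = instruction
        return instruction


# Hypothetical two-instruction program.
fetcher = InstructionFetcher({0x00: "add", 0x04: "bne"})
first = fetcher.fetch(0x04)
again = fetcher.fetch(0x04)
```

The second fetch of the same address is served from the cache, so the instruction RAM is read only once.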
[0020] Another typical component of the fetch stage 110 of a
multi-stage pipelined microprocessor is the branch prediction unit
(BPU) 114. The branch prediction unit 114 increases processing
speed by predicting whether a branch to a non-sequential
instruction will be taken based upon past instruction processing
history. The BPU 114 contains a branch look-up or prediction table
that stores the address of branch instructions and an indication as
to whether the branch was taken. Thus, when a branch instruction is
fetched, the look-up table is referenced to make a prediction as to
the address of the next instruction. As discussed herein, whether
or not the prediction is correct will not be known until a later
stage of the pipeline. In the example shown in FIG. 1, it will not
be known until the sixth stage of the pipeline.
[0021] With continued reference to FIG. 1, the next stage of the
typical microprocessor core instruction pipeline is the instruction
decode stage (DEC) 120, where the actual instruction is decoded
into machine language for the processor to interpret. If the
instruction involves a branch or a jump, the target address is
generated. Next, in stage (REG) 130, any required operands are read
from the register file. Then, in stage (EXEC) 140, the particular
instruction is executed by the appropriate unit. Typical execute
stage units include a floating point unit 143, a multiplier unit
144, an arithmetic unit 145, a shifter 146, a logical unit 147 and
an adder unit 148. The result of the execute stage 140 is selected
in the select stage (SEL) 150 and finally, this data is written
back to the register file by the write back stage (WB) 160. The
instruction pipeline increments with each clock cycle.
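The stage-by-stage progression of FIG. 1 may be modeled with the following Python sketch. It assumes an idealized pipeline in which one instruction is fetched per clock cycle and no stalls or flushes occur. The function names are hypothetical, while the stage abbreviations are those given above.

```python
STAGES = ["FET", "DEC", "REG", "EXEC", "SEL", "WB"]


def stage_at(issue_cycle: int, current_cycle: int):
    """Return the stage occupied by an instruction fetched at `issue_cycle`,
    or None if it has not entered the pipeline yet or has already left it."""
    position = current_cycle - issue_cycle
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def snapshot(issue_cycles, current_cycle):
    """Map each occupied stage name to the index of the instruction in it."""
    occupancy = {}
    for index, issue_cycle in enumerate(issue_cycles):
        stage = stage_at(issue_cycle, current_cycle)
        if stage is not None:
            occupancy[stage] = index
    return occupancy


# Three instructions fetched on consecutive cycles 0, 1 and 2.
pipeline_at_cycle_4 = snapshot([0, 1, 2], 4)
```

At cycle 4 in this model the first instruction has reached the select stage while the two instructions fetched after it occupy the execute and register stages.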
[0022] Referring now to FIG. 2, a flow chart illustrating the steps
of a method for performing dynamic branch prediction based on
preloaded static branch prediction data in accordance with at least
one exemplary embodiment of this invention is shown. As
discussed above, dynamic branch prediction is a technique often
employed to increase pipeline performance when software
instructions lead to a non-sequential program flow. The problem
arises because instructions are sequentially fed into the pipeline,
but are not executed until later stages of the pipeline. Thus, the
decision as to whether a non-sequential program flow (hereinafter
also referred to as a branch) is to be taken or not, is not
resolved until the end of the pipeline, but the related decision of
which address to use to fetch the next instruction is required at
the front of the pipeline. In the absence of branch prediction, the
fetch stage would then have to fetch the next instruction after the
branch is resolved leaving all stages of the pipeline between the
resolution stage and the fetch stage unused. This is an undesired
hindrance to performance. As a result, the choice as to which
instruction to fetch next is made speculatively or predictively
based on historical performance. A branch history table is used in
the branch prediction unit (BPU) which indexes non-sequential
instructions by their addresses in association with the next
instruction taken. After resolving a branch in the select stage of
the pipeline, the BPU is updated with the address of the next
instruction that resulted from that branch.
[0023] To alleviate the limitations of both dynamic and static
branch prediction techniques, the present invention discloses a
hybrid branch prediction technique that combines the benefits of
both dynamic and static branch prediction. With continued reference
to FIG. 2, the technique begins in step 200 and advances to step
205 where static branch prediction is performed offline before
final deployment of the processor, but based on applications which
will be executed by the microprocessor after deployment. In various
exemplary embodiments, this static branch prediction may be
performed with the assistance of a compiler or simulator. For
example, if the processor is to be deployed in a particular
embedded application, such as an electronic device, the simulator
can simulate various instructions for the discrete instruction set
to be executed by the processor prior to the processor being
deployed. By performing static branch prediction a table of branch
history can be fully populated with the actual addresses of the
next instruction after a branch instruction is executed.
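One possible way to populate such a table from simulator output is sketched below in Python. The disclosure does not specify how conflicting outcomes for the same branch are reconciled. The rule used here, keeping the next address observed most often, is an assumption made for this example, as are the function name and the trace values.

```python
from collections import Counter, defaultdict


def build_branch_table(simulated_trace):
    """Build {branch_address: next_address} from a simulated run.

    `simulated_trace` is an iterable of (branch_address, next_address)
    pairs. Each branch is mapped to the next address it produced most often.
    """
    outcomes = defaultdict(Counter)
    for branch_address, next_address in simulated_trace:
        outcomes[branch_address][next_address] += 1
    return {
        branch_address: counts.most_common(1)[0][0]
        for branch_address, counts in outcomes.items()
    }


# Hypothetical simulator output: a loop branch at 0x40 jumps back to 0x10
# three times and then falls through to 0x44.
simulated_trace = [(0x40, 0x10), (0x40, 0x10), (0x40, 0x10), (0x40, 0x44)]
profile_table = build_branch_table(simulated_trace)
```

For the loop in this example the resulting table predicts the backward jump, which is the outcome on three of the four iterations.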
[0024] After developing a table of branch prediction data during
static branch prediction, operation of the method continues to step
210 where the branch prediction table is stored in memory. In
various exemplary embodiments, this step will involve storing the
branch prediction table in a non-volatile memory that will be
available for future use by the processor. Then, in step 215, when
the processor is deployed in the desired embedded application, the
static branch prediction data is preloaded into the branch history
table in the BPU. In various exemplary embodiments, the branch
prediction data is preloaded at power-up of the microprocessor,
such as, for example, at power-up of the particular product
containing the processor.
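Steps 210 and 215 may be illustrated by the following Python sketch, which packs a table into a byte image suitable for non-volatile storage and rebuilds it at power-up. The entry layout (a little-endian 32-bit branch address followed by a 32-bit next address) and the function names are assumptions made for this example. The disclosure does not prescribe a storage format.

```python
import struct

# Assumed layout: 32-bit branch address, 32-bit next address, little-endian.
ENTRY = struct.Struct("<II")


def serialize_table(table):
    """Pack {branch_address: next_address} into bytes for storage."""
    return b"".join(
        ENTRY.pack(branch_address, next_address)
        for branch_address, next_address in sorted(table.items())
    )


def preload_table(image):
    """Rebuild the table from a stored image, as would happen at power-up."""
    if len(image) % ENTRY.size:
        raise ValueError("image length is not a whole number of entries")
    return {
        branch_address: next_address
        for branch_address, next_address in ENTRY.iter_unpack(image)
    }


stored_image = serialize_table({0x40: 0x10, 0x80: 0x20})
bpu_table = preload_table(stored_image)
```

Each entry occupies eight bytes in this layout, so the two-entry table above produces a 16-byte image that round-trips to the same table.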
[0025] Operation of the method then advances to step 220 where,
during ordinary operation, dynamic branch prediction is performed
based on the preloaded branch prediction data without requiring a
warm-up period and without unstable results. Then, in step 225,
after resolving each branch in the selection stage of the
multistage processor pipeline, the branch prediction table in the
BPU is updated with the results to improve accuracy of the
prediction information as necessary. Operation of the method
terminates in step 230. It should be appreciated that in various
exemplary embodiments, each time the processor is powered down,
the "current" branch prediction table may be stored in
non-volatile memory so that each time the processor is powered up,
the most recent branch prediction data is loaded into the BPU.
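Steps 215 through 225, together with the optional power-down save, may be sketched in Python as follows. The class name, method names, and addresses are hypothetical, and the fall-through prediction for a branch with no table entry is an assumption of this example rather than a requirement of the method.

```python
class HybridBPU:
    """Dynamic predictor whose table starts from preloaded data."""

    def __init__(self, preloaded):
        self.table = dict(preloaded)   # step 215: preload at power-up
        self.mispredictions = 0

    def predict(self, branch_address, fall_through):
        # Step 220: predict from the table; unknown branches fall through.
        return self.table.get(branch_address, fall_through)

    def resolve(self, branch_address, fall_through, actual_next):
        # Step 225: after resolution, correct the table if it was wrong.
        if self.predict(branch_address, fall_through) != actual_next:
            self.mispredictions += 1
            self.table[branch_address] = actual_next

    def power_down(self):
        # Snapshot of the "current" table for the next power-up.
        return dict(self.table)


# Hypothetical session: one branch matches the profile, one has changed.
bpu = HybridBPU({0x40: 0x10, 0x80: 0x20})
bpu.resolve(0x40, 0x44, 0x10)   # profile was right
bpu.resolve(0x80, 0x84, 0x28)   # run-time behavior differs from the profile
saved = bpu.power_down()
next_boot = HybridBPU(saved)
next_boot.resolve(0x80, 0x84, 0x28)
```

In this session the one stale entry costs a single misprediction, is corrected in the table, and is predicted correctly after the next power-up because the saved table carries the correction forward.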
[0026] Referring now to FIG. 3, a block diagram illustrating the
flow of data into and out of a branch prediction unit 314 in
accordance with at least one exemplary embodiment of the invention
is shown. In the fetch stage 310 of the instruction pipeline,
the BPU 314 maintains a branch prediction look-up table 316 that
stores the address of the next instruction indexed by the address
of the branch instruction. Thus, when the branch instruction enters
the pipeline, the look-up table 316 is referenced by the
instruction's address. The address of the next instruction is taken
from the table 316 and injected in the pipeline directly following
the branch instruction. Therefore, if the branch is taken then the
next instruction address is available at the next clock signal. If
the branch is not taken, the pipeline must be flushed and the
correct instruction address injected back at the fetch stage 310.
In the event that a pipeline flush is required, the look-up table
316 is updated with the actual address of the next instruction so
that it will be available for the next instance of that branch
instruction.
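The predictive injection described above may be illustrated by listing the addresses that the fetch stage would issue. The following Python sketch covers only the injection of the predicted address and does not model resolution or flushing. The function name, the 4-byte instruction size, and the example addresses are assumptions.

```python
def fetch_addresses(start, count, lookup_table, instruction_size=4):
    """Return the first `count` addresses the fetch stage would issue,
    following the look-up table for any address that has an entry."""
    addresses = []
    pc = start
    for _ in range(count):
        addresses.append(pc)
        # A table hit injects the stored next address directly after the
        # branch; otherwise fetching continues sequentially.
        pc = lookup_table.get(pc, pc + instruction_size)
    return addresses


# Hypothetical loop: the branch at 0x08 is predicted to jump back to 0x00.
stream = fetch_addresses(0x00, 6, {0x08: 0x00})
```

The address following the branch at 0x08 in the resulting stream is the stored target 0x00 rather than the sequential address 0x0C, so no fetch slot is left idle while the branch awaits resolution.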
[0027] While the foregoing description includes many details and
specificities, it is to be understood that these have been included
for purposes of explanation only. The embodiments of the present
invention are not to be limited in scope by the specific
embodiments described herein. For example, although many of the
embodiments disclosed herein have been described with reference to
branch prediction in embedded RISC-type microprocessors, the
principles herein are equally applicable to branch prediction in
microprocessors in general. Indeed, various modifications of the
embodiments of the present inventions, in addition to those
described herein, will be apparent to those of ordinary skill in
the art from the foregoing description and accompanying drawings.
Thus, such modifications are intended to fall within the scope of
the following appended claims. Further, although the embodiments of
the present inventions have been described herein in the context of
a particular implementation in a particular environment for a
particular purpose, those of ordinary skill in the art will
recognize that its usefulness is not limited thereto and that the
embodiments of the present inventions can be beneficially
implemented in any number of environments for any number of
purposes. Accordingly, the claims set forth below should be
construed in view of the full breadth and spirit of the embodiments
of the present inventions as disclosed herein.
* * * * *