U.S. patent application number 11/445100, filed on May 31, 2006, was published on 2007-12-06 for a multi-threaded processor with deferred thread output control.
Invention is credited to Yun Du, Guofang Jiao, Chun Yu.
Publication Number: 20070283356
Application Number: 11/445100
Family ID: 38626192
Publication Date: 2007-12-06
United States Patent Application 20070283356
Kind Code: A1
Du; Yun; et al.
December 6, 2007
Multi-threaded processor with deferred thread output control
Abstract
A multi-threaded processor is provided that internally reorders
output threads thereby avoiding the need for an external output
reorder buffer. The multi-threaded processor writes its thread
results back to an internal memory buffer to guarantee that thread
results are outputted in the same order in which the threads are
received. A thread scheduler within the multi-threaded processor
manages thread ordering control to avoid the need for an external
reorder buffer. A compiler for the multi-threaded processor
converts instructions that would normally send processed results
directly to an external reorder buffer so that the processed thread
results are instead sent to the internal memory buffer of the
multi-threaded processor.
Inventors: Du; Yun (San Diego, CA); Jiao; Guofang (San Diego, CA); Yu; Chun (San Diego, CA)
Correspondence Address: QUALCOMM INCORPORATED, 5775 MOREHOUSE DR., SAN DIEGO, CA 92121, US
Appl. No.: 11/445100
Filed: May 31, 2006
Current U.S. Class: 718/102; 712/E9.027; 712/E9.046; 712/E9.049; 712/E9.053
Current CPC Class: Y02D 10/24 20180101; G06F 9/3851 20130101; G06F 9/3855 20130101; G06F 9/3857 20130101; Y02D 10/00 20180101; G06F 9/30123 20130101; G06F 9/4881 20130101; G06F 9/3836 20130101
Class at Publication: 718/102
International Class: G06F 9/46 20060101; G06F009/46
Claims
1. A multi-threaded processor comprising: a thread scheduler to
track a sequence in which a plurality of threads are received from
an application; an internal memory buffer to temporarily store the
plurality of received threads; and a processing unit coupled to the
thread scheduler and internal memory buffer, the processing unit
configured to process the plurality of threads to obtain a
plurality of corresponding results, and store the plurality of
results in the internal memory buffer, wherein the thread scheduler
causes the plurality of stored results to be outputted from the
internal memory buffer according to the sequence in which the
corresponding threads were received from the application.
2. The multi-threaded processor of claim 1 wherein the plurality of
threads are processed by the processing unit according to the order
defined by flow control instructions associated with the plurality
of threads.
3. The multi-threaded processor of claim 2 wherein the flow control
instructions cause the plurality of threads to be processed in a
different sequence than they were received.
4. The multi-threaded processor of claim 1 wherein the memory buffer
further includes a plurality of input registers to store the
plurality of received threads prior to processing and a plurality
of output registers to store the plurality of results prior to
being outputted.
5. The multi-threaded processor of claim 4 wherein the processing unit is
further configured to retrieve a thread from one of the plurality
of input registers in the memory buffer.
6. The multi-threaded processor of claim 1 further comprising: an input
interface coupled to the thread scheduler to receive the plurality
of threads; and an output interface coupled to the memory buffer
from which the plurality of stored results are outputted.
7. The multi-threaded processor of claim 1 further comprising: a load
controller coupled to the thread scheduler and configured to store
the plurality of threads in a plurality of input registers in the
internal memory buffer under the direction of the thread
scheduler.
8. The multi-threaded processor of claim 7 wherein the load controller
outputs the results from the internal memory buffer under the
direction of the thread scheduler.
9. The multi-threaded processor of claim 1 wherein the received threads
include pixel data.
10. A multi-threaded processor comprising: means for tracking a
sequence in which a plurality of threads are received; means for
processing the plurality of threads to obtain a plurality of
corresponding results; means for storing the plurality of results
in an internal memory buffer; and means for causing the plurality
of stored results to be outputted from the internal memory buffer
according to the sequence in which the corresponding threads were
received.
11. The multi-threaded processor of claim 10 wherein the plurality of
threads are processed according to the order defined by flow
control instructions associated with the plurality of threads.
12. The multi-threaded processor of claim 10 further comprising: means
for storing the plurality of threads in the internal memory buffer
prior to processing.
13. A method for reordering the sequence of a plurality of thread
results within a multi-threaded processor comprising: tracking a
sequence in which a plurality of threads are received; processing
the plurality of threads to obtain a plurality of corresponding
results; storing the plurality of results in an internal memory
buffer; and sending out the plurality of stored results from the
memory buffer according to the sequence in which the corresponding
threads were received.
14. The method of claim 13 wherein the plurality of threads are processed
according to the order defined by flow control instructions
associated with the plurality of threads.
15. The method of claim 13 further comprising: receiving a plurality of
threads for a particular process at a multi-threaded processor; and
storing the plurality of threads in the memory buffer prior to
processing.
16. A graphics processor comprising: a multi-threaded processor
configured to track a sequence in which a plurality of threads
including pixel data are received from a first application; store
the plurality of received threads in an internal memory buffer;
process the plurality of threads according to an order defined by
flow control instructions associated with the plurality of threads
to obtain a plurality of corresponding results; store the plurality
of results in the internal memory buffer; and output the plurality of
results to the first application from the internal memory buffer
according to the sequence in which the corresponding threads were
received from the first application.
17. The graphics processor of claim 16 wherein the flow control
instructions cause the plurality of threads to be processed in a
different sequence than they were received.
18. A method operational on a multi-threaded processor compiler,
comprising: receiving a plurality of instructions to be compiled
for operation on a multi-threaded processor; identifying output
instructions in the plurality of instructions that direct output
results to an external register; and converting the identified
output instructions to direct the output results to an internal
register.
19. The method of claim 18 further comprising: compiling the plurality of
instructions for processing by the multi-threaded processor.
20. The method of claim 18 wherein the multi-threaded processor
supports flow control instructions that cause threads to be processed in
a different order than they are received.
21. A machine-readable medium having one or more instructions for
compiling instructions for a multi-threaded processor, which when
executed by a processor cause the processor to: receive a
plurality of instructions to be compiled for operation on a
multi-threaded processor; identify output instructions in the
plurality of instructions that direct output results to an external
register; and convert the identified output instructions to direct
the output results to an internal register.
22. The machine-readable medium of claim 21 further having one or more
instructions which when executed by a processor cause the
processor to: compile the plurality of instructions for processing
by the multi-threaded processor with flow control support.
Description
REFERENCE TO CO-PENDING APPLICATIONS FOR PATENT
[0001] The present application is related to the following
co-assigned U.S. Patent Applications, which are expressly
incorporated by reference herein:
[0002] U.S. application Ser. No. 11/412,678, entitled "GRAPHICS
SYSTEM WITH CONFIGURABLE CACHES" (docket no. 060787), filed on Apr.
26, 2006;
[0003] U.S. application Ser. No. ______, entitled "GRAPHICS
PROCESSOR WITH ARITHMETIC AND ELEMENTARY FUNCTION UNITS" (docket
no. 060802) filed on May 25, 2006.
BACKGROUND
[0004] 1. Field
[0005] Various embodiments of the invention pertain to processor
operation and architectures, and particularly to a multi-threaded
processor that internally reorders output threads thereby avoiding
the need for an external reorder buffer.
[0006] 2. Background
[0007] Multi-threaded processors are designed to improve processing
performance by efficiently executing multiple streams of encoded
data (i.e., threads) at once within a single processor. Multiple
storage registers are typically used to maintain the state of
multiple threads at the same time. Multi-threaded architectures
often provide more efficient utilization of various processor
resources, and particularly the execution logic or arithmetic logic
unit (ALU) within the processor. By feeding multiple threads to the
ALU, clock cycles that would otherwise have been idle due to a
stall or other delays in the processing of a particular thread may
be utilized to service a different thread.
[0008] A conventional multi-threaded processor may receive multiple
threads and process each thread so as to maintain the same input
thread order at the output stage. This means that the first thread
received from a program is the first thread outputted to the
program.
[0009] Programmable multi-threaded processors often include flow
control capabilities. This permits programs to include flow control
instructions sent to the programmable multi-threaded processor that
may cause threads to be processed out of order. For example, a
first input thread may not finish execution first; in some cases,
it may finish execution last. However, programs expect to receive
outputted threads in the order in which they were sent to the
processor.
[0010] One approach to maintaining the order of a sequence of
threads for a particular program or application is to add a large
buffer to reorder the threads. This buffer is typically external to
the multi-threaded processor core and requires additional logic to
implement. Adding a large external buffer increases the cost of
implementing a multi-threaded processor and also takes up much
needed space.
[0011] Thus, a way is needed to reorder a sequence of threads for a
particular program so that they are outputted by a multi-threaded
processor in the same order as they are received without the need
for an additional reorder buffer.
SUMMARY
[0012] A multi-threaded processor is provided having (a) a thread
scheduler to track a sequence in which a plurality of threads are
received from an application, (b) an internal memory buffer to
temporarily store the plurality of received threads, and (c) a
processing unit coupled to the thread scheduler and internal memory
buffer. The processing unit is configured to (1) process the
plurality of threads to obtain a plurality of corresponding
results, and (2) store the plurality of results in the internal
memory buffer. The plurality of threads are processed by the
processing unit according to the order defined by flow control
instructions associated with the plurality of threads. The flow
control instructions may cause the plurality of threads to be
processed in a different sequence than they were received. The
thread scheduler causes the plurality of stored results to be
outputted from the internal memory buffer according to the sequence
in which the corresponding threads were received from the
application. The memory buffer may include a plurality of input
registers to store the plurality of received threads prior to
processing and a plurality of output registers to store the
plurality of results prior to being outputted. A load controller
may be coupled to the thread scheduler and configured to store the
plurality of threads in a plurality of input registers in the
internal memory buffer under the direction of the thread scheduler.
The load controller may also output the results from the internal
memory buffer under the direction of the thread scheduler.
[0013] A method operational on a multi-threaded processor compiler
provides for (a) receiving a plurality of instructions to be
compiled for operation on a multi-threaded processor; (b)
identifying output instructions in the plurality of instructions
that direct output results to an external register; (c) converting
the identified output instructions to direct the output results to
an internal register; and/or (d) compiling the plurality of
instructions for processing by the multi-threaded processor. The
multi-threaded processor may support flow control instructions that
cause threads to be processed in a different order than they are
received.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] FIG. 1 is a block diagram illustrating a programmable
multi-threaded processor that supports flow control instructions
and is configured to output threads for a particular process in the
same order in which they are received according to one
embodiment.
[0015] FIG. 2 is a block diagram illustrating how a sequence of
threads may be buffered in internal temporary registers of a
multi-threaded processor to guarantee that thread results are
outputted in the same order the threads are received.
[0016] FIG. 3 is a flow diagram illustrating a method operational
on a multi-threaded processor to guarantee that threads for a
particular process are outputted in the same order in which they
were received according to one implementation.
[0017] FIG. 4 is a block diagram of a graphics processor that
includes a multi-threaded processor according to one embodiment of
the invention.
[0018] FIG. 5 is a block diagram illustrating a mobile device
having a graphics processor with a multi-threaded processor
configured to operate according to one implementation of the
present invention.
[0019] FIG. 6 illustrates a method operational in a code compiler
for a multi-threaded processor having flow control instructions
according to one embodiment.
DETAILED DESCRIPTION
[0020] In the following description, specific details are given to
provide a thorough understanding of the embodiments. However, it
will be understood by one of ordinary skill in the art that the
embodiments may be practiced without these specific details. For
example, circuits may not be shown in block diagrams in order not
to obscure the embodiments in unnecessary detail.
[0021] Also, it is noted that the embodiments may be described as a
process that is depicted as a flowchart, a flow diagram, a
structure diagram, or a block diagram. Although a flowchart may
describe the operations as a sequential process, many of the
operations can be performed in parallel or concurrently. In
addition, the order of the operations may be re-arranged. A process
is terminated when its operations are completed. A process may
correspond to a method, a function, a procedure, a subroutine, a
subprogram, etc. When a process corresponds to a function, its
termination corresponds to a return of the function to the calling
function or the main function.
[0022] Moreover, a storage medium may represent one or more devices
for storing data, including read-only memory (ROM), random access
memory (RAM), magnetic disk storage mediums, optical storage
mediums, flash memory devices, and/or other machine readable
mediums for storing information. The term "machine readable medium"
includes, but is not limited to, portable or fixed storage devices,
optical storage devices, wireless channels, and various other
mediums capable of storing, containing, or carrying instruction(s)
and/or data.
[0023] Furthermore, embodiments may be implemented by hardware,
software, firmware, middleware, microcode, or a combination
thereof. When implemented in software, firmware, middleware, or
microcode, the program code or code segments to perform the
necessary tasks may be stored in a machine-readable medium such as
a storage medium or other storage means. A processor may perform
the necessary tasks. A code segment may represent a procedure, a
function, a subprogram, a program, a routine, a subroutine, a
module, a software package, a class, or a combination of
instructions, data structures, or program statements. A code
segment may be coupled to another code segment or a hardware
circuit by passing and/or receiving information, data, arguments,
parameters, or memory contents. Information, arguments, parameters,
data, and the like, may be passed, forwarded, or transmitted via a
suitable means including memory sharing, message passing, token
passing, and network transmission, among others.
[0024] One feature provides a multi-threaded processor configured
to internally reorder output threads, thereby avoiding the need for
an external output reorder buffer. The multi-threaded processor
writes its thread results to a register bank in an internal memory
buffer to guarantee that thread results are outputted in the same
order in which the threads are received. A thread scheduler within
the multi-threaded processor manages resource allocation, thread
arbitration, and thread ordering to avoid the need for an external
reorder buffer.
[0025] Another aspect of the invention provides a compiler for the
multi-threaded processor that converts instructions that would
normally send thread results directly to an external reorder buffer
so that the thread results are instead sent to internal temporary
registers in an internal memory buffer of the multi-threaded
processor.
[0026] FIG. 1 is a block diagram illustrating a programmable
multi-threaded processor 102 that supports flow control
instructions and is configured to output threads (or segments or
portions of thread results) for a particular process in the same
order in which they are received according to one embodiment. The
terms "core", "engine", "processor" and "processing unit" are used
interchangeably herein. In one implementation, multi-threaded
processor 102 may be a shader core that performs certain graphics
operations such as shading and may compute transcendental
elementary functions.
[0027] A plurality of threads 104 from one or more processes are
received at an input interface (e.g., multiplexer 106) that
multiplexes the threads 104 into a thread stream 105. Input threads
104 may include graphics data, such as pixel information, sent by
one or more applications or processes. Such pixel information may
include position coordinates, color attributes, and/or texture
attributes for one or more pixels. Each application or process may
have more than one thread. In some implementations, the threads for
a particular application or process may have associated flow
control instructions that cause the threads 104 to be processed in
a different order than they were received from the application or
process.
[0028] Thread scheduler 108 receives the thread stream 105 and
performs various functions to schedule and manage execution of
threads 104. For example, thread scheduler 108 may schedule
processing of threads 104, determine whether resources needed by a
particular thread are available, and move the thread to a memory
buffer 118 (e.g., arranged as register file banks) via a load
controller 112. Thread scheduler 108 interfaces with load
controller 112 in order to synchronize the resources for received
threads 104. Thread scheduler 108 may also monitor the order in
which threads are received for a particular application or process
and cause those threads to be outputted in the same order or
sequence as they were received.
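As a rough illustration of the order tracking described above, the Python sketch below models a scheduler that stamps each incoming thread with a sequence number kept separately for each process, so that per-process order survives multiplexing into one stream. It is an assumption-laden model, not the disclosed hardware: `Thread`, `ThreadScheduler`, `receive`, and `demultiplex` are hypothetical names, processes are assumed to carry integer IDs, and the final sort stands in for the streaming read-out the scheduler would actually direct.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Thread:
    process_id: int  # hypothetical: which application/process sent the thread
    payload: str     # stand-in for pixel data
    seq: int = -1    # filled in by the scheduler on arrival


class ThreadScheduler:
    """Hypothetical model: records arrival order separately for each process."""

    def __init__(self) -> None:
        self._next_seq = defaultdict(int)  # process_id -> next sequence number

    def receive(self, thread: Thread) -> Thread:
        # Stamp the thread with its position in its own process's input order.
        thread.seq = self._next_seq[thread.process_id]
        self._next_seq[thread.process_id] += 1
        return thread


def demultiplex(finished: list[Thread]) -> dict[int, list[str]]:
    """Group finished threads by process, each list in original arrival order."""
    per_process = defaultdict(list)
    for t in sorted(finished, key=lambda t: (t.process_id, t.seq)):
        per_process[t.process_id].append(t.payload)
    return dict(per_process)


# Threads from two processes interleaved by the input multiplexer.
stream = [Thread(0, "a0"), Thread(1, "b0"), Thread(0, "a1"), Thread(1, "b1"), Thread(0, "a2")]
scheduler = ThreadScheduler()
tagged = [scheduler.receive(t) for t in stream]
finished = list(reversed(tagged))  # pretend execution finished in reverse order
print(demultiplex(finished))  # {0: ['a0', 'a1', 'a2'], 1: ['b0', 'b1']}
```

Keeping a counter per process, rather than one global counter, lets the output side restore order for each application independently, which matches the per-application ordering the paragraph describes.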
[0029] Thread scheduler 108 selects active threads for execution,
checks for read/write port conflicts among the selected threads
and, if there are no conflicts, sends instruction(s) for one thread
into an ALU 110 and sends instruction(s) for another thread to load
controller 112. At the request of thread scheduler 108, load
controller 112 may also be configured to obtain data associated
with a thread (from texture engine 126) and instructions associated
with the thread from an external source (e.g., global data cache
124 and/or an external memory device, etc.). In addition to issuing
fetch requests for missing instructions, load controller 112 loads
thread data into memory buffer 118 and associated instructions into
instruction cache 114. Thread scheduler 108 also removes threads
that have been processed by ALU 110.
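The paragraph above does not say how read/write port conflicts are detected. A minimal sketch, assuming a register file split into `NUM_BANKS` banks with one port each and register r living in bank r % NUM_BANKS (both hypothetical), pairs one ALU-ready thread with one load-ready thread only when their register accesses touch disjoint banks. `banks_used` and `pick_dual_issue` are likewise illustrative names.

```python
from typing import Optional

NUM_BANKS = 4  # hypothetical number of register file banks


def banks_used(registers: set[int]) -> set[int]:
    # Assumption: register r lives in bank r % NUM_BANKS, one port per bank.
    return {r % NUM_BANKS for r in registers}


def pick_dual_issue(alu_ready: dict[str, set[int]],
                    load_ready: dict[str, set[int]]) -> Optional[tuple[str, str]]:
    """Return (ALU thread, load-controller thread) with no shared bank, or None."""
    for alu_tid, alu_regs in alu_ready.items():
        for load_tid, load_regs in load_ready.items():
            if not (banks_used(alu_regs) & banks_used(load_regs)):
                return alu_tid, load_tid
    return None  # every pairing conflicts; issue to one unit only this cycle


print(pick_dual_issue({"T1": {0, 5}, "T2": {2, 6}}, {"T3": {4, 9}, "T4": {3, 7}}))
# ('T1', 'T4')
```

Here T1 and T3 both need banks 0 and 1, so T1 is paired with T4 instead. A real scheduler would likely weigh thread age or priority when choosing among conflict-free pairs; this sketch simply takes the first one it finds.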
[0030] ALU 110 may be a single quad ALU or four scalar ALUs. In one
implementation, ALU 110 may perform pixel-parallel processing on
one component of an attribute for up to four pixels. Alternatively,
ALU 110 may perform component-parallel processing on up to four
components of an attribute for a single pixel. ALU 110 fetches data
from memory buffer 118 and receives constants from constant RAM
116. Ideally, ALU 110 processes data at every clock cycle so that
it is not idle, thereby increasing processing efficiency. ALU 110
may include multiple read and write ports on a bus to memory buffer
118 so that it is able to write out thread results while new thread
data is fetched/read on each clock cycle.
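The two ALU modes differ only in how the four lanes are filled. In the hypothetical Python sketch below, `op` stands in for a single shader instruction and a pixel is a four-component attribute such as an (r, g, b, a) color; pixel-parallel mode packs one component from up to four pixels per issue, while component-parallel mode packs the four components of one pixel.

```python
from typing import Callable

LANES = 4  # a quad ALU, or four scalar ALUs, handles four values per issue

Pixel = tuple[float, float, float, float]  # e.g. an (r, g, b, a) attribute


def pixel_parallel(pixels: list[Pixel], component: int,
                   op: Callable[[float], float]) -> tuple[list[float], int]:
    """Apply op to one component of up to LANES pixels per issue."""
    out, issues = [], 0
    for start in range(0, len(pixels), LANES):
        lanes = [p[component] for p in pixels[start:start + LANES]]
        out.extend(op(v) for v in lanes)
        issues += 1  # one ALU issue per group of up to four pixels
    return out, issues


def component_parallel(pixel: Pixel, op: Callable[[float], float]) -> list[float]:
    """Apply op to all four components of a single pixel in one issue."""
    return [op(v) for v in pixel[:LANES]]


pixels = [(float(i), 0.0, 0.0, 1.0) for i in range(6)]
print(pixel_parallel(pixels, 0, lambda v: v / 2))  # ([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], 2)
print(component_parallel((1.0, 0.5, 0.0, 1.0), lambda v: v / 2))  # [0.5, 0.25, 0.0, 0.5]
```

Six pixels need two pixel-parallel issues, the second using only two of the four lanes, which hints at why keeping the ALU busy every cycle depends on having enough threads ready.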
[0031] Multi-threaded processor 102 may be a programmable processor
configured to efficiently process particular types of data (e.g.,
graphics data). For example, multi-threaded processor 102 may
include constant data for efficiently processing multi-media data
streams (e.g., video, audio, etc.). For this purpose, a constant
RAM 116 may be included in the multi-threaded processor 102 to
enable load controller 112, under the direction of thread scheduler
108, to load application-specific constant data to efficiently
process particular types of instructions. Similarly, an
instruction cache 114 stores instructions for the threads and
provides them to thread scheduler 108. Under the control of
thread scheduler 108, load controller 112 loads instruction cache
114 with instructions from global data cache 124 and loads constant
RAM 116 and memory buffer 118 with data from global data cache 124
and/or texture engine 126. The instructions indicate specific
operations to be performed for each thread. Each operation may be
an arithmetic operation, an elementary function, a memory access
operation, etc.
[0032] Rather than writing out results to an external reorder
buffer, ALU 110 uses memory buffer 118 to buffer its results before
they are outputted by multi-threaded processor 102. To facilitate
this dual use of memory buffer 118, the compiler for multi-threaded
processor 102 may be configured to convert direct output register
instructions into temporary register instructions and to use a
global register to define which internal register in memory buffer 118 should be
used for writing results. That is, the compiler converts
instructions that would normally send output from the ALU 110 to an
external output register (i.e., an external reorder buffer) so that
the outputs are instead sent to temporary registers (i.e., memory
buffer 118). The compiler may accomplish this by either replacing
direct output register instructions or by redirecting such output
to temporary registers (i.e., in memory buffer 118). Global
registers are used to indicate to ALU 110 which temporary registers
(in memory buffer 118) are to be used to output results. In various
implementations, the global registers that define the internal
registers in memory buffer 118 (to be used to store outputs from
ALU 110) may be either internal or external to multi-threaded
processor 102.
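One way to picture the global-register indirection is that a write to logical output slot k lands in temporary register base + k of memory buffer 118, where base is read from a global register associated with the thread. This is a sketch under that assumption only; `MemoryBuffer`, `GlobalRegisters`, `out_base`, and `write_output` are hypothetical names, and a flat Python list stands in for the register file banks.

```python
class MemoryBuffer:
    """Hypothetical flat model of the internal memory buffer's registers."""

    def __init__(self, size: int) -> None:
        self.regs: list = [None] * size


class GlobalRegisters:
    """Per-thread base index of the temporary registers that receive outputs."""

    def __init__(self) -> None:
        self.out_base: dict[int, int] = {}


def write_output(buf: MemoryBuffer, globals_: GlobalRegisters,
                 thread_id: int, slot: int, value) -> int:
    # Instead of sending value to an external output register, resolve the
    # destination through the thread's global register and store it internally.
    index = globals_.out_base[thread_id] + slot
    buf.regs[index] = value
    return index


buf = MemoryBuffer(16)
g = GlobalRegisters()
g.out_base.update({7: 8, 3: 12})  # two threads with separate output areas
write_output(buf, g, 7, 0, "t7.color")
write_output(buf, g, 3, 0, "t3.color")
print(buf.regs[8], buf.regs[12])  # t7.color t3.color
```

Because each thread's outputs are addressed relative to its own base, results from threads that finish in any order do not overwrite one another while they wait to be read out.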
[0033] Once results from ALU 110 are buffered in specified
temporary registers in memory buffer 118, thread scheduler 108
directs their output sequence. That is, since thread scheduler 108
knows the order or sequence in which threads for a particular
process were received, it directs load controller 112 to send out
thread results in a specified sequence (i.e., the order in which
the threads were originally received by thread scheduler 108).
Since thread scheduler 108 knows which thread is being processed by
ALU 110 at each clock cycle, it knows which registers in memory
buffer 118 are used to store each ALU result. Thread scheduler 108
then directs load controller 112 to read-out buffered results from
memory buffer 118 to an output interface (e.g., demultiplexer 120)
so that the thread results 122 are sent to processes in the order
or sequence in which the corresponding threads were received.
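The read-out rule above reduces to draining results from the head of the arrival order: a finished result stays in its register until every earlier thread of the same process has also finished. A minimal single-process sketch, assuming the scheduler records which register holds each thread's result (`OrderedReadout`, `complete`, and `drain` are hypothetical names):

```python
class OrderedReadout:
    """Hypothetical model of scheduler-directed, in-order read-out for one process."""

    def __init__(self) -> None:
        self.next_to_send = 0                 # sequence number of oldest unsent thread
        self.result_reg: dict[int, int] = {}  # seq -> register holding its result

    def complete(self, seq: int, register: int) -> None:
        # The scheduler knows which thread the ALU just finished and which
        # register in the memory buffer received that thread's result.
        self.result_reg[seq] = register

    def drain(self, regs: list) -> list:
        """Read out every result that is now at the head of the arrival order."""
        sent = []
        while self.next_to_send in self.result_reg:
            reg = self.result_reg.pop(self.next_to_send)
            sent.append(regs[reg])  # load controller reads it to the output interface
            self.next_to_send += 1
        return sent


regs = [None] * 8
readout = OrderedReadout()
regs[5] = "result1"
readout.complete(1, 5)
print(readout.drain(regs))  # []  (thread 0 has not finished yet)
regs[2] = "result0"
readout.complete(0, 2)
print(readout.drain(regs))  # ['result0', 'result1']
```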
[0034] FIG. 2 is a block diagram illustrating how a sequence of
threads may be buffered in internal temporary registers to
guarantee that their results are outputted in the same order in which
they are received by a multi-threaded processor 204 according to one
implementation. A plurality of threads 202 from a process or
application are received by the multi-threaded processor 204 in the
input order from Thread 1, Thread 2, Thread 3, etc., to Thread N-1
and Thread N. Multi-threaded processor 204 may be configured to
support flow control instructions that may cause the received
threads 202 to be processed out of sequence by processing circuits
206 within multi-threaded processor 204.
[0035] Rather than outputting processing results to an external
reorder buffer, the present invention sends or redirects the thread
results (Results 1 through N) to temporary registers in an internal
memory buffer 207. The memory buffer 207 may store threads that are
fetched by the processing circuits as well as the thread results of
the processed threads. Memory buffer 207 may also include a
plurality of register file banks 208 in which thread results are
stored. Multi-threaded processor 204 may map virtual registers to
available registers 208 in internal memory buffer 207 so that
thread results can be stored in contiguous and/or non-contiguous
memory addresses. Dynamic sizing of the internal registers allows
flexible allocation of internal memory buffer 207 depending on the
type and size of data in a thread.
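Mapping virtual registers onto whichever physical registers are free can be modeled with a simple free list. The sketch below is an assumption, not the disclosed allocator (hardware might, for instance, allocate in fixed-size chunks per bank), and `RegisterMap` and its methods are hypothetical. It shows how a thread can end up with non-contiguous physical registers, and how a thread must wait when too few registers are free.

```python
class RegisterMap:
    """Hypothetical free-list mapping of per-thread virtual registers to physical ones."""

    def __init__(self, num_physical: int) -> None:
        self.free = list(range(num_physical))  # physical registers not in use
        self.table: dict[int, list[int]] = {}  # thread_id -> physical reg per virtual reg

    def allocate(self, thread_id: int, count: int) -> list[int]:
        if count > len(self.free):
            raise RuntimeError("not enough free registers; thread must wait")
        phys = [self.free.pop(0) for _ in range(count)]
        self.table[thread_id] = phys
        return phys

    def physical(self, thread_id: int, virtual: int) -> int:
        return self.table[thread_id][virtual]

    def release(self, thread_id: int) -> None:
        # Called once the thread's results have been read out in order.
        self.free.extend(self.table.pop(thread_id))


rm = RegisterMap(8)
rm.allocate(1, 3)         # [0, 1, 2]
rm.allocate(2, 2)         # [3, 4]
rm.release(1)
print(rm.allocate(3, 5))  # [5, 6, 7, 0, 1]  (non-contiguous)
```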
[0036] Redirecting thread results to temporary registers in
internal memory buffer 207 may be accomplished by having the
multi-threaded processor compiler convert instructions that would
normally send output results to an external output register (i.e.,
an external reorder buffer) so that the results from processing
circuits 206 are instead sent to temporary registers in internal
register file banks 208. By using internal register file banks 208
for buffering and reordering output results, an external reorder
buffer is not needed, thus saving cost and power.
[0037] As a result of flow control instructions, Threads 1 through
N may be processed out of order, generating the processing output
sequence of Result 2 (corresponding to Thread 2), Result 3
(corresponding to Thread 3), etc., Result N (corresponding to
Thread N), Result 1 (corresponding to Thread 1), Result N-1
(corresponding to Thread N-1), for example. These results are
held/stored in temporary registers in register file banks 208 in
memory buffer 207. The processing results are buffered until
Results 1 through N 210 can be outputted in the order in which
their corresponding threads 202 were received.
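Taking N = 6 makes the example sequence concrete: Result 2, 3, 4, 6, 1, 5. The short simulation below, using a hypothetical `simulate` function, records which results are sent out after each completion and the peak number held in the register file banks. For this particular order the buffer must briefly hold N - 1 = 5 results, since nothing can leave until Result 1 arrives.

```python
def simulate(completion_order: list[int]) -> tuple[list[list[int]], int]:
    """Return the results sent after each completion and the peak buffer occupancy."""
    held: set[int] = set()
    next_out = 1  # threads are numbered 1..N in arrival order
    sent_per_step = []
    peak = 0
    for result in completion_order:
        held.add(result)  # result is stored in a temporary register
        peak = max(peak, len(held))
        sent = []
        while next_out in held:  # send out only in arrival order
            held.remove(next_out)
            sent.append(next_out)
            next_out += 1
        sent_per_step.append(sent)
    return sent_per_step, peak


steps, peak = simulate([2, 3, 4, 6, 1, 5])
print(steps)  # [[], [], [], [], [1, 2, 3, 4], [5, 6]]
print(peak)   # 5
```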
[0038] FIG. 3 is a flow diagram illustrating a method operational
on a multi-threaded processor to guarantee that threads are
outputted in the same order in which they were received by the
multi-threaded processor according to one implementation. A
plurality of threads are received at a multi-threaded processor 302
from a particular process. A thread scheduler tracks the sequence
in which the plurality of threads is received 304. The threads to
be processed are stored in an internal memory buffer for fetching
306. The threads are processed according to an order defined by
flow control instructions associated with the plurality of threads
to obtain thread results 308. The thread results are stored in the
internal memory buffer 310. The thread results are sent out from
the internal memory buffer according to the sequence in which the
plurality of corresponding threads were received 312. In one
implementation, the thread order control described by this method
may be performed or managed by a thread scheduler in the
multi-threaded processor. Because the input and output stages of
the multi-threaded processor are decoupled, it is relatively simple
to implement this type of thread ordering control.
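The method of FIG. 3 can be strung together in a compact Python sketch, with step numbers 302 to 312 as comments. Flow control is modeled, as an assumption, by a caller-supplied `execution_order` that must be a permutation of thread indices, and `shade` is a hypothetical stand-in for the per-thread program.

```python
from typing import Callable, Sequence


def run_threads(threads: Sequence[str], execution_order: Sequence[int],
                shade: Callable[[str], str]) -> list[str]:
    # 302/304: receive the threads and track the sequence in which they arrive.
    arrival = list(enumerate(threads))  # (sequence number, thread data)
    # 306: store the threads in the internal memory buffer for fetching.
    input_regs = dict(arrival)
    output_regs: dict[int, str] = {}
    # 308: process in the order chosen by flow control, not arrival order.
    for seq in execution_order:
        result = shade(input_regs[seq])
        # 310: store each result in the internal memory buffer.
        output_regs[seq] = result
    # 312: send results out in the sequence the threads were received.
    return [output_regs[seq] for seq, _ in arrival]


print(run_threads(["p0", "p1", "p2"], [2, 0, 1], str.upper))  # ['P0', 'P1', 'P2']
```

The input and output dictionaries are deliberately separate to mirror the decoupled input and output stages; whatever order step 308 uses, step 312 depends only on the arrival order recorded in step 304.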
[0039] FIG. 4 is a block diagram of a graphics processor 402 that
includes a multi-threaded processor according to one embodiment of
the invention. Graphics processor 402 includes a multi-threaded
processor 404, such as a shader core, that receives a plurality of
threads 406 from one or more graphics applications 408 as inputs,
either serially or in parallel, processes graphics data in the
threads 406 (e.g., pixel coordinates, colors, texture, etc.), and
provides the thread results as outputs to the graphics
applications 408. Graphics applications 408 may include video
games, graphic displays, etc., and may run concurrently. Each
graphics application 408 may generate data threads to achieve its
desired results. Each thread may have associated instructions that
indicate a specific task to be performed on one or more pixels in
the thread.
[0040] In one implementation, graphics processor 402 also includes
supporting components, such as a texture engine 410 that performs
specific graphic operations such as texture mapping, and a cache
memory 412 that is a fast memory that can store data and
instructions for multi-threaded processor 404 and texture engine
410. Cache memory 412 may be coupled to an external main memory 414
through which it can receive data and/or instructions for
particular threads.
[0041] Multi-threaded processor 404 may include an internal memory
buffer which is used for temporarily storing threads 406 and/or
thread results. For a given process or application, a thread
scheduler in the multi-threaded processor causes the thread results
to be output in the same order or sequence in which the
corresponding threads were originally received.
[0042] Graphics processor 402 and/or multi-threaded processor 404
(e.g., shader core) may be implemented in various hardware units,
such as application-specific integrated circuits (ASICs), digital
signal processors (DSPs), digital signal processing devices (DSPDs),
programmable logic devices (PLDs), field programmable gate arrays
(FPGAs), processors, controllers, micro-controllers,
microprocessors, and other electronic units.
[0043] Certain portions of graphics processor 402 and/or
multi-threaded processor 404 may be implemented in firmware and/or
software. For example, a thread scheduler and/or a load control
unit (e.g., in multi-threaded processor 404) may be implemented
with firmware and/or software code (e.g., procedures, functions,
and so on) that perform the functions described herein. The
firmware and/or software codes may be stored in a memory (e.g.,
cache memory 412 or main memory 414) and executed by multi-threaded
processor 404.
[0044] FIG. 5 is a block diagram illustrating a mobile device 502
having a graphics processor 512 with a multi-threaded processor
configured to operate according to one implementation of the
present invention. Mobile device 502 may be a mobile telephone,
personal digital assistant, mobile video terminal, etc. A
processing unit 504 is communicatively coupled to a main memory 510
and a display 506 that provides graphics, video, and other
information to a user. A communication interface 508 serves to
communicatively couple mobile device 502 to other communication
devices via a wireless or wired medium. A graphics processor 512
may be used by processing unit 504 to process graphics data prior
to sending it to the display 506. Graphics processor 512 includes a
multi-threaded processor configured to operate as illustrated in
FIGS. 1, 2, 3 and/or 4. For instance, graphics processor 512 may
include a multi-threaded processor having an internal memory buffer
(e.g., register file banks) which temporarily stores thread
results. For a given process or application, a thread scheduler in
the multi-threaded processor causes the thread results to be output
in the same order or sequence in which the corresponding threads
were originally received.
[0045] FIG. 6 illustrates a method operational in a code compiler
for a multi-threaded processor having flow control instructions
according to one embodiment. The code compiler may be a low-level
compiler that compiles instructions to be executed specifically by
the multi-threaded processor. A plurality of instructions to be
compiled for operation on the multi-threaded processor that
supports flow control are received 602. Output instructions that
direct output results to an external register are identified from
among the plurality of instructions 604. The identified output
instructions are converted to direct the output results to an
internal register 606. This may be accomplished by replacing or
converting the output instructions to instructions that redirect
output to the internal register. The plurality of instructions is
compiled for processing by the multi-threaded processor 608.
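A pass along the lines of FIG. 6 can be sketched over a toy instruction list. Everything about the encoding is assumed for illustration, since none is specified here: the `Instr` tuple, `o`-prefixed names for external output registers (assumed to be write-only), `r`-prefixed names for internal temporary registers, and the choice to place converted outputs just above the highest temporary register already in use. The returned base is the kind of value a global register would hold so the processor knows where outputs were placed.

```python
import re
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dst: str
    srcs: tuple[str, ...]


OUTPUT_REG = re.compile(r"o(\d+)$")  # hypothetical external output register, e.g. "o0"
TEMP_REG = re.compile(r"r(\d+)$")    # hypothetical internal temporary register, e.g. "r3"


def redirect_outputs(program: list[Instr]) -> tuple[list[Instr], int]:
    """Rewrite writes to external output registers into internal temp registers."""
    # 602: the instructions to compile are received as `program`.
    used = []
    for ins in program:
        for reg in (ins.dst, *ins.srcs):
            m = TEMP_REG.match(reg)
            if m:
                used.append(int(m.group(1)))
    base = max(used, default=-1) + 1  # first temp register not already in use
    rewritten = []
    for ins in program:
        # 604: identify instructions that direct results to an external register.
        m = OUTPUT_REG.match(ins.dst)
        if m:
            # 606: convert them to target an internal temporary register instead.
            ins = ins._replace(dst=f"r{base + int(m.group(1))}")
        rewritten.append(ins)
    # 608: a real compiler would now encode `rewritten` for the processor.
    return rewritten, base


prog = [
    Instr("mul", "r0", ("r1", "c0")),
    Instr("add", "o0", ("r0", "r1")),
    Instr("mov", "o1", ("r2",)),
]
out, base = redirect_outputs(prog)
print([i.dst for i in out], base)  # ['r0', 'r3', 'r4'] 3
```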
[0046] One or more of the components, steps, and/or functions
illustrated in FIGS. 1, 2, 3, 4, 5 and/or 6 may be rearranged
and/or combined into a single component, step, or function or
embodied in several components, steps, or functions without
departing from the invention. Additional elements, components,
steps, and/or functions may also be added without departing from
the invention. The apparatus, devices, and/or components
illustrated in FIGS. 1, 2, 4 and/or 5 may be configured to perform
one or more of the methods, features, or steps described in FIGS. 3
and/or 6.
[0047] Those of skill would further appreciate that the various
illustrative logical blocks, modules, circuits, and algorithm steps
described in connection with the embodiments disclosed herein may
be implemented as electronic hardware, computer software, or
combinations of both. To clearly illustrate this interchangeability
of hardware and software, various illustrative components, blocks,
modules, circuits, and steps have been described above generally in
terms of their functionality. Whether such functionality is
implemented as hardware or software depends upon the particular
application and design constraints imposed on the overall
system.
[0048] It should be noted that the foregoing embodiments are merely
examples and are not to be construed as limiting the invention. The
description of the embodiments is intended to be illustrative, and
not to limit the scope of the claims. As such, the present
teachings can be readily applied to other types of apparatuses and
many alternatives, modifications, and variations will be apparent
to those skilled in the art.
* * * * *