U.S. patent application number 11/488977, filed July 19, 2006 and published on 2008-01-24, covers quality of service scheduling for simultaneous multi-threaded processors.
This patent application is currently assigned to International Business Machines Corporation. The invention is credited to Orran Y. Krieger, Bryan S. Rosenburg, Robert B. Tremaine, and Robert W. Wisniewski.
Application Number: 20080022283 (11/488977)
Family ID: 38972856
Publication Date: 2008-01-24

United States Patent Application 20080022283
Kind Code: A1
Krieger; Orran Y.; et al.
January 24, 2008
Quality of service scheduling for simultaneous multi-threaded processors
Abstract
A method and system for providing quality of service guarantees
for simultaneous multithreaded processors are disclosed. Hardware
and operating system communicate with one another providing
information relating to thread attributes for threads executing on
processing elements. The operating system controls scheduling of
the threads based at least partly on the information communicated
and provides quality of service guarantees.
Inventors: Krieger; Orran Y.; (Newton, MA); Rosenburg; Bryan S.; (Cortlandt Manor, NY); Tremaine; Robert B.; (Stormville, NY); Wisniewski; Robert W.; (Ossining, NY)
Correspondence Address: SCULLY SCOTT MURPHY & PRESSER, PC, 400 GARDEN CITY PLAZA, SUITE 300, GARDEN CITY, NY 11530, US
Assignee: International Business Machines Corporation, Armonk, NY
Family ID: 38972856
Appl. No.: 11/488977
Filed: July 19, 2006
Current U.S. Class: 718/104
Current CPC Class: G06F 9/3851 20130101; G06F 2209/5019 20130101; G06F 2209/508 20130101; G06F 9/505 20130101; G06F 2209/5018 20130101
Class at Publication: 718/104
International Class: G06F 9/46 20060101 G06F009/46
Government Interests
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0001] This invention was made with Government support under Contract No. NBCH020056 awarded by the Defense Advanced Research Projects Agency (DARPA). The Government has certain rights in this invention.
Claims
1. A method of providing quality of service scheduling in
multithreaded processing, comprising: identifying one or more
hardware resources utilized by a thread in simultaneous
multithreaded processing; communicating the identified one or more
hardware resource utilization; and allowing reservation of the one
or more hardware resources utilized by a thread in the simultaneous
multithreaded processing.
2. The method of claim 1, wherein the step of identifying includes
identifying one or more hardware resources utilized by a set of
threads in simultaneous multithreaded processing.
3. The method of claim 1, wherein the step of identifying is
performed by hardware.
4. The method of claim 1, wherein the step of communicating
includes storing information pertaining to the identified one or
more hardware resource utilization.
5. The method of claim 1, wherein the one or more hardware
resources include one or more processing elements, functional units
or cache memory or combination thereof.
6. The method of claim 1, wherein the one or more hardware
resources include at least a floating point unit, an integer unit,
an arithmetic logic unit, a shifter, a register, a load-store unit,
cache memory or combination thereof.
7. The method of claim 1, further including: scheduling one or more
threads based on information associated with the identified one or
more hardware resource utilization.
8. The method of claim 7, wherein the step of scheduling is
performed by an operating system.
9. The method of claim 1, further including: reserving one or more
hardware resources for a thread based on information associated
with the identified one or more hardware resource utilization.
10. The method of claim 9, wherein the step of reserving is
performed by an operating system.
11. The method of claim 9, wherein the step of reserving includes
storing one or more data bits in a register accessible by hardware,
the data bits identifying which one or more hardware resources to
reserve for a thread.
12. The method of claim 1, wherein the step of communicating
includes storing in a register accessible by an operating system,
one or more data bits that identify one or more hardware resource
utilization by a thread.
13. The method of claim 12, wherein the step of storing is
performed by hardware.
14. The method of claim 1, further including: analyzing information
associated with the identified one or more hardware resource
utilization by a thread.
15. The method of claim 1, further including: communicating between
software and hardware information associated with one or more
threads by using software thread identifier to hardware thread
identifier mapping.
16. The method of claim 1, further including: restricting one or
more hardware resources from a thread based on information
associated with the identified one or more hardware resource
utilization.
17. A method of providing quality of service scheduling in
multithreaded processing, comprising: accessing information
associated with hardware resource utilization per thread in
simultaneous multithreaded processing; and scheduling one or more
threads based on the information.
18. The method of claim 17, further including: analyzing the
information to determine at least one of memory utilization
pattern, thread affinity concern, thread interference issue, and
other thread resource behavior affecting utilization, and the step
of scheduling includes scheduling one or more threads based on at
least one of memory utilization pattern, thread affinity concern,
thread interference issue, and other thread resource behavior
affecting utilization.
19. A method of providing quality of service scheduling in
multithreaded processing, comprising: accessing information
associated with a thread's use of one or more hardware resources on
a core in simultaneous multithreaded processing; and reserving one
or more hardware resources based on the accessed information.
20. A system for providing quality of service scheduling in
multithreaded processing, comprising: a hardware controller on a
processor operable to track a thread's use of one or more hardware
resources in simultaneous multithreaded processing, the hardware
controller further operable to communicate information associated
with the use of one or more hardware resources per thread; and an
operating system operable to access the information and schedule
one or more threads based on the information.
Description
FIELD OF THE INVENTION
[0002] The present disclosure generally relates to computer
processing and particularly to multithreaded processing.
BACKGROUND OF THE INVENTION
[0003] As the number of available transistors has increased,
processor-chip architects have turned to multithreaded processors
such as simultaneous multithreaded (SMT) processors as a way to
continue to increase performance. Generally, SMT processors permit
multiple threads to execute instructions using the same set of
functional units within a given core. However, this means that the
different hardware threads then compete for use of those functional
units. One class of shared resources includes the execution units
or functional units such as the integer units, floating-point
units, load-store units, and the like. It is predicted that SMT
processors will become a commonplace platform for the next
generation of processor chips. However, because it allows sharing
of processor resources, the SMT technique introduces a new degree
of complexity in scheduling.
[0004] Real-time concerns have long been researched and implemented
in operating systems. However, with the advent of multimedia
applications such as mpeg players, Quality of Service (QOS)
concerns have been addressed more seriously by a much wider range
of operating systems. Now, most operating systems provide some
notion of QOS to the applications.
[0005] However, when it comes to multithreaded processing, the
current operating systems' quality of service schedulers cannot
adequately handle threads executing on an SMT processor. This is
because the threads interfere with each other, for example, by more
than one thread trying to use greater than 1/2 of the available
floating point units, or by colliding in their use of the L1 cache.
Because this happens dynamically, it is difficult to predict the
performance degradation the interference causes, which in turn
precludes making quality of service guarantees. In addition,
conventional SMT processor hardware does not provide the operating
system with a capability to understand the crucial attributes of a
thread on the SMT processor.
[0006] Without significantly underutilizing an SMT processor, the
operating system cannot provide QOS guarantees. Without knowledge
of the characteristics of the threads running on an SMT processor,
an operating system would not be able to provide QOS guarantees if
it schedules more than one thread on a given SMT core. There is no
mechanism currently available for providing information about the
functional unit utilization per thread. What is needed is a method
and system for the hardware and the operating system on
multithreaded processors such as SMT processors to communicate
information about the threads on the processors, so that for
example, an operating system may provide QOS guarantees.
BRIEF SUMMARY OF THE INVENTION
[0007] A method and system for providing quality of service
scheduling in multithreaded processing are disclosed. The method in
one aspect includes identifying one or more hardware resources
utilized by a thread in simultaneous multithreaded processing, and
communicating the identified one or more hardware resources used
the thread. The resource usage may be recorded for an individual
thread or a set of threads. Thus, in another aspect, the step of
identifying may include identifying one or more hardware resources
utilized by a set of threads in simultaneous multithreaded
processing. In one aspect, hardware identifies the thread's use of
resources.
[0008] The step of communicating may include storing information
pertaining to the identified one or more hardware resource
utilization. Hardware, for instance, may store the information in a
register accessible by an operating system. The one or more
hardware resources for example may include but are not limited to
one or more processing elements, functional units, or cache memory,
or combination thereof. Examples of processing elements and
functional units may include but are not limited to a floating
point unit, an integer unit, an arithmetic logic unit, a shifter, a
register, a load-store unit, or combination thereof. Examples of
cache memory may include but are not limited to cache line and
cache sub-levels.
[0009] The method in another aspect may include scheduling one or
more threads based on information associated with the identified
one or more hardware resource utilization. In one aspect, the
software or operating system performs the scheduling. The method in
yet another aspect may include reserving one or more hardware
resources for a thread based on information associated with the
identified one or more hardware resource utilization. The step of
reserving may be performed by the software or operating system. In
one aspect, the step of reserving may include storing one or more
data bits in a register accessible by hardware, the data bits
identifying which one or more hardware resources to reserve for a
thread. In another aspect, the method may further include analyzing
information associated with the identified one or more hardware
resource utilization by a thread. Still yet in another aspect, the
method may include restricting one or more hardware resources from
a thread based on information associated with the identified one or
more hardware resource utilization.
[0010] A system for providing quality of service scheduling in
multithreaded processing in one aspect may include a hardware
controller on a processor operable to track a thread's use of one
or more hardware resources in simultaneous multithreaded
processing. The hardware controller may be further operable to
communicate information associated with the use of one or more
hardware resources per thread. Software or an operating system is
operable to access the information and schedule one or more threads
based on the information. In one aspect, the communication between
the software and the hardware about information associated with one
or more threads may be performed using software thread identifier
to hardware thread identifier mapping.
[0011] Further features as well as the structure and operation of
various embodiments are described in detail below with reference to
the accompanying drawings. In the drawings, like reference numbers
indicate identical or functionally similar elements.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] FIG. 1 is a flow diagram illustrating a method of the
present disclosure in one embodiment.
[0013] FIG. 2 shows an example of a register storing software to
hardware thread mapping.
[0014] FIG. 3 illustrates an example of a utilization table that is
populated as the hardware executes a thread.
[0015] FIG. 4 is a flow diagram illustrating a method for providing
quality of service guarantees in an exemplary embodiment of the
present disclosure.
[0016] FIG. 5 illustrates a register memory storing operating
system's reservation information for the resources on a core.
[0017] FIG. 6 is a diagram illustrating an example of processor
components in one embodiment of the present disclosure.
DETAILED DESCRIPTION
[0018] In an exemplary embodiment of the present disclosure, the
hardware provides information as to which hardware threads
executing on a core are using or have used which processing
elements or functional units or the like on the core. The hardware
may also provide information pertaining to memory utilization of a
hardware thread, for instance, the hardware thread's use of L1
cache on the core. Additional characteristics or attributes of the
hardware threads may be provided. The operating system uses this
information to predict resource availability for scheduled
applications, to reserve a particular processing element for a
given thread, and to otherwise guarantee quality of service to
applications. In another embodiment, the information may be
provided for a given set of hardware threads, and the operating
system or the like may use the information to predict resource
availability and reserve processing elements for a given set of
hardware threads.
[0019] FIG. 1 is a flow diagram illustrating a method of the
present disclosure in one embodiment. At 102, an operating system
or the like has a software thread, for example, a software
application or entity, to schedule. At 104, the operating system
schedules the thread onto a given core. A core on a processor for
example includes a plurality of processing elements and can handle
an execution of a thread. A core for example can have one or more
floating point units, integer units, arithmetic logic units (ALUs),
shifters, and the like. A processor may include one or more cores.
In SMT processing, instructions from multiple threads of execution
share the functional units of a single core.
[0020] In one embodiment, the mapping between the threads that the
operating system schedules and the hardware threads that the
hardware receives and executes, is kept, for example, in a series
of registers associated with the hardware threads. FIG. 2 shows an
example of a register table having the software to hardware thread
mapping. In one embodiment, when the operating system schedules the
thread, the operating system records the software thread identifier
(id) 202. When the hardware takes that thread to execute, the
hardware tags the thread with a hardware thread id 204 and enters
the corresponding hardware thread id in the mapping table 200.
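The mapping table of FIG. 2 can be sketched in C as follows. This is a minimal illustration only: the per-core hardware thread count, the sentinel value for an unclaimed slot, and the function names are assumptions for the sketch, not the actual register interface described in the disclosure.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_HW_THREADS 8           /* assumed hardware threads per core */
#define SW_TID_UNUSED  UINT32_MAX  /* sentinel: slot not yet claimed */

/* One slot per hardware thread, holding the software thread id the
 * operating system recorded when it scheduled that thread (FIG. 2). */
typedef struct {
    uint32_t sw_tid[MAX_HW_THREADS];
} thread_map_t;

void thread_map_init(thread_map_t *m) {
    for (size_t i = 0; i < MAX_HW_THREADS; i++)
        m->sw_tid[i] = SW_TID_UNUSED;
}

/* OS side: record a software thread id.  In this sketch the slot
 * index the id lands in plays the role of the hardware thread id
 * that the hardware enters into the table.  Returns the hardware
 * thread id, or -1 if every slot is occupied. */
int thread_map_bind(thread_map_t *m, uint32_t sw_tid) {
    for (size_t i = 0; i < MAX_HW_THREADS; i++) {
        if (m->sw_tid[i] == SW_TID_UNUSED) {
            m->sw_tid[i] = sw_tid;
            return (int)i;
        }
    }
    return -1;
}

/* OS side: translate a software thread id into the mapped hardware
 * thread id for later communication about that thread. */
int thread_map_lookup(const thread_map_t *m, uint32_t sw_tid) {
    for (size_t i = 0; i < MAX_HW_THREADS; i++)
        if (m->sw_tid[i] == sw_tid)
            return (int)i;
    return -1;
}
```

With such a table in place, the operating system can address the hardware about any of its threads using the returned hardware thread id, as the paragraph above describes.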
[0021] Referring to FIG. 1, the hardware enters the hardware thread
id that maps to the software thread id at 106, for example, into a
register table such as the one shown in FIG. 2. The operating
system now has a mapping of its threads to hardware threads.
Whenever the operating system needs to communicate with the
hardware about its thread, the operating system can use the mapped
hardware thread id. At 108, the thread runs. At 110, for example,
as the thread executes, the hardware records that thread's use of
various processing elements such as the floating point units,
integer units, ALUs, shifters, registers, decoder, and the like.
The hardware may also log the thread's memory utilization patterns
such as the thread's use of L1 cache, the amount of use, etc. Other
characterization or attributes of the running hardware thread may
be provided similarly. In one embodiment of the present disclosure,
this information may be stored or logged in a utilization table
such as the one shown in FIG. 3. The steps shown in FIG. 1 are
repeated for additional threads or applications or the like that
the operating system has to schedule, and the utilization table for
each thread is populated with respective usage or characterization
information according to the method shown in FIG. 1 in an exemplary
embodiment.
[0022] FIG. 3 illustrates an example of a utilization table in one
embodiment, which for example is populated when the hardware
executes a thread and is accessible by an operating system or the
like. For a given hardware thread, for example, identified by a
thread id 302, 304, 306, the hardware may record the usage
information for the functional units (e.g., 308, 310, 312, 314,
316) and the cache (318) or the like on a given core or any other
characterization or attributes associated with the thread. In one
embodiment, the information may be simply a bit setting to show
whether a hardware thread used that particular functional unit or
not. For instance, a value of "1" at 320 shows that the hardware
thread identified by thread id1 302 utilized floating point unit 1
(FPU1) 308. Similarly, the table shows that this thread used ALU
314 and decoder 316. Likewise, the hardware thread having hardware
thread id2 used FPU2 at 310 and decoder at 316. In another
embodiment, the amount of use for the particular functional unit
may be recorded by keeping a counter and incrementing the counter
for each cycle of use. In one embodiment, the hardware thread's
cache utilization 318, for instance, L1 cache may be logged by
entering the number of bytes that thread has in the cache as shown
in 322. Information about additional processing elements,
functional units, caches, and other resources may be stored. The
table shown in FIG. 3 is for illustrative purposes only. Any other
known or will-be-known method or scheme for logging or recording
information or otherwise communicating information may be used. For
instance, a separate memory bank or register may be used per thread
for keeping track of that thread's attributes, characterization or
usage of the processing elements and resources on the core.
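The bit-per-unit and counter schemes described for FIG. 3 can be sketched as below. The bit positions, field names, and the choice of which units feed the cycle counter are illustrative assumptions; a real encoding would be hardware-specific.

```c
#include <stdint.h>

/* Assumed bit positions for the per-thread usage word (cf. FIG. 3). */
enum {
    UNIT_FPU1    = 1u << 0,
    UNIT_FPU2    = 1u << 1,
    UNIT_ALU     = 1u << 2,
    UNIT_DECODER = 1u << 3,
    UNIT_SHIFTER = 1u << 4,
};

/* One utilization-table entry per hardware thread. */
typedef struct {
    uint32_t units_used;  /* one bit per functional unit touched */
    uint32_t fpu_cycles;  /* optional counter: cycles of FPU use */
    uint32_t l1_bytes;    /* bytes this thread holds in L1 cache */
} util_entry_t;

/* Hardware side: mark a unit as used; for FPUs, also count cycles,
 * illustrating the counter variant mentioned in the text. */
void util_record(util_entry_t *e, uint32_t unit, uint32_t cycles) {
    e->units_used |= unit;
    if (unit & (UNIT_FPU1 | UNIT_FPU2))
        e->fpu_cycles += cycles;
}

/* OS side: did this thread touch the given functional unit at all? */
int util_used(const util_entry_t *e, uint32_t unit) {
    return (e->units_used & unit) != 0;
}
```

A simple bit setting answers the yes/no usage question cheaply, while the counter variant gives the operating system a measure of how heavily a unit was used, at the cost of wider storage per entry.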
[0023] In an exemplary embodiment, the operating system or the like
uses the logged information to determine and predict resource
availability for a given thread, control to an extent what
resources can be allocated to what threads, and otherwise provide
reasonable quality of service guarantees to applications or the
like. FIG. 4 is a flow diagram illustrating a method for providing
quality of service guarantees in an exemplary embodiment of the
present disclosure. At 402, an operating system receives
specifications to deliver quality of service from applications,
users, or from the system, or from any other entity that can
request such service. A specification for example may require that
the application be run and completed in a predetermined amount of
time. Video or audio streaming applications for instance may
require that there be no noticeable delay during the streaming
process. At 404, the operating system or the like performs analysis
based on the information logged about hardware thread's usage of
various processing elements and resources and its characterization
and attributes, for example, as recorded in a table such as the
utilization table shown in FIG. 3. The analysis may determine, for
example, memory utilization patterns, thread affinity concerns,
thread interference issues, and other thread behavior affecting
resource utilization. Such information may be used, for instance, in
scheduling and/or to reserve or restrict one or more hardware
resources.
[0024] At 406, based on the analysis, the operating system
communicates to the hardware to reserve certain resources for a
given thread, to restrict other resources for another thread, etc.
For instance, the logged information may provide that this
particular type of application requires certain functional units
and processing elements to execute. In turn, the operating system
may decide that it needs to reserve those functional units and
processing elements for one or more threads associated with that
particular application in order to meet the guaranteed quality of
service. The operating system in one embodiment may communicate
such reservation requests for functional units, processing elements
or caches to the hardware, for example, by using another register.
The operating system, for example, may fill in a table such as the
one shown in FIG. 5.
[0025] FIG. 5 illustrates bits in a register storing operating
system's reservation information for the resources on a core.
Various processing elements, functional units, caches, and the like
(as shown at 506, 508, 510, 512, 514, 516) may be reserved for
different threads 502, 504. For example, the table shows that the
operating system is reserving floating point unit 1 (FPU1) 506 and
floating point unit 2 (FPU2) 508 for the hardware thread identified
as HWT0 502. Similarly, for the hardware thread with thread id HWT1
504 the operating system is reserving floating point unit 3 (FPU3)
510. The reservation may be indicated as a bit value as shown in
the entries. For example, a value of 1 may indicate to reserve
while a value of 0 may indicate to restrict the particular resource
from being used by that thread. Another value in the entry, for
example, may indicate that it is left to the hardware to decide
which thread may use that resource. In one embodiment, reservation
for cache 516 may be indicated by the number of bytes or partitions
to reserve.
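The per-resource reservation values of FIG. 5 can be sketched as a packed register word. The two-bit field width and the numeric encoding of reserve/restrict/hardware-decides are assumptions made for this sketch; the disclosure only says that distinct values carry these meanings.

```c
#include <stdint.h>

/* Assumed two-bit policy encoding per resource (cf. FIG. 5):
 * restrict the resource, reserve it, or let hardware decide. */
enum { RSV_RESTRICT = 0, RSV_RESERVE = 1, RSV_HW_DECIDES = 2 };

/* Pack a policy for resource index `idx` into a 64-bit reservation
 * word (up to 32 resources at two bits each).  Note that with this
 * encoding an untouched field defaults to RSV_RESTRICT. */
uint64_t rsv_set(uint64_t word, unsigned idx, unsigned policy) {
    word &= ~(3ull << (2 * idx));               /* clear the field */
    word |= ((uint64_t)(policy & 3)) << (2 * idx);
    return word;
}

/* Read back the policy for resource index `idx`. */
unsigned rsv_get(uint64_t word, unsigned idx) {
    return (unsigned)((word >> (2 * idx)) & 3);
}
```

In this scheme the operating system would write the packed word into the reservation register, and the hardware would consult the relevant field before dispatching a thread's instructions to a functional unit.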
[0026] FIG. 6 is a diagram illustrating an example of processor
components in one embodiment of the present disclosure. A processor
600, for instance, may include one or more cores 602, 604. The
example shown in FIG. 6 illustrates a dual-core processor. Each
core 602, 604 may include a set of processing elements 606 or
functional units and cache memory 608 on which the threads of SMT
are multiplexed. Processing elements 606, for instance, may include
one or more floating point units (FPU), integer units, arithmetic
logic units (ALU), registers, decoders, shifters, load-store units,
etc., enabled to process thread executions. In one embodiment of
the present disclosure, the core may also include registers 610,
612, 614 for storing information associated with various
characteristics of a thread as described above. The register at
610, for example, may store mappings of software thread
identifiers to their corresponding hardware thread identifiers. In
one embodiment, an operating system accesses this register to log
its software thread ids, and the hardware inserts the corresponding
hardware thread ids. Thereafter, communications regarding the
threads between the operating system and the hardware may be
performed using the thread id mappings.
[0027] In one embodiment, the register at 612 may store information
regarding various characterization or attributes of a thread. For
instance, it stores the usage information such as whether a
hardware thread used one or more of the processing elements, the
amount of usage of various resources on the core, the amount of
cache usage, etc. The operating system in one embodiment accesses
the information, performs analysis based on the information and
makes scheduling decisions that would fulfill quality of service
guarantees. The register at 614 may store information pertaining to
requests from the operating system as to how the processing
elements or other resources on the core should be allocated to the
running threads. For instance, the operating system may request to
reserve one or more functional units for a given thread. Similarly,
the operating system may request to restrict one or more functional
units for a given thread. Still yet, the operating system may
request that a number of cache bytes or partitions be reserved for
a given thread. The operating system may request such reservations
or restrictions based on the analysis and scheduling decisions it
has made using the information stored in the utilization register
612.
[0028] The operating system may reserve a particular processing
element for a given thread or given set of threads, may reserve
functional units for a given thread or given set of threads, and
may reserve cache lines and sub-levels for data. Similarly, the
operating system may restrict a given thread from using a
particular processing element, functional unit, or cache sub-level.
By reserving the needed resources or otherwise controlling the use
of the resources on a given core, the operating system is able to
meet the quality of service requirements.
[0029] In addition, by using the logged information characterizing
a given thread's attributes and resource usage, the operating
system is able to make decisions as to which threads should or
should not be scheduled together or near each other. For example,
the operating system may determine how much each thread makes use
of the different processing elements on the core, evaluate the
threads the operating system has to schedule, decide whether
scheduling certain threads together would meet the promised quality
of service, and schedule the threads accordingly.
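The co-scheduling decision above can be reduced, in its simplest form, to an overlap test on the recorded usage bitmasks. Treating any shared functional unit as disqualifying is a deliberately coarse policy assumed for this sketch; a real scheduler would weigh the amount of use as well.

```c
#include <stdint.h>

/* Coarse compatibility test for co-scheduling two threads on one SMT
 * core: threads whose recorded functional-unit bitmasks overlap are
 * assumed likely to interfere, so they are kept apart. */
int can_coschedule(uint32_t units_a, uint32_t units_b) {
    return (units_a & units_b) == 0;  /* no shared units: compatible */
}
```

For example, a thread that used only FPU1 and a thread that used only the ALU would pass this test, while two FPU-heavy threads would not, reflecting the floating-point contention scenario described in the background section.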
[0030] In an exemplary embodiment of the present disclosure, the
characterization and usage information about different threads
executing on a given core is gathered during the real-time
processing of the hardware threads. In another embodiment, the
execution environment may be modeled and simulated to obtain the
information. Similarly, the operating system's reserving and
restricting may be also modeled and simulated, and the performance
results from such simulation may be used, for example, for
benchmarking.
[0031] The embodiments described above are illustrative examples
and it should not be construed that the present invention is
limited to these particular embodiments. Thus, various changes and
modifications may be effected by one skilled in the art without
departing from the spirit or scope of the invention as defined in
the appended claims.
* * * * *