U.S. patent application number 15/139455 was filed with the patent office on 2016-04-27 and published on 2016-10-27 for utilization of processor capacity at low operating frequencies.
The applicant listed for this patent is Intel Corporation. Invention is credited to Alexander Gendler, Ruchira Sasanka, Udi Sherel.
Application Number | 20160313786 15/139455 |
Document ID | / |
Family ID | 51627206 |
Filed Date | 2016-04-27 |
United States Patent
Application |
20160313786 |
Kind Code |
A1 |
Sasanka; Ruchira; et al. |
October 27, 2016 |
Utilization of Processor Capacity at Low Operating Frequencies
Abstract
In an embodiment, a processor includes one or more cores
including a first core operable at an operating voltage between a
minimum operating voltage and a maximum operating voltage. The
processor also includes a power control unit including first logic
to enable coupling of ancillary logic to the first core responsive
to the operating voltage being less than or equal to a threshold
voltage, and to disable the coupling of the ancillary logic to the
first core responsive to the operating voltage being greater than
the threshold voltage. Other embodiments are described and
claimed.
Inventors: |
Sasanka; Ruchira (Hillsboro, OR); Gendler; Alexander (Kiriat Motzkin, IL); Sherel; Udi (Natanya, IL) |
Applicant: |
Name | City | State | Country | Type |
Intel Corporation | Santa Clara | CA | US | |
Family ID: |
51627206 |
Appl. No.: |
15/139455 |
Filed: |
April 27, 2016 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
14933378 | Nov 5, 2015 | 9361234 |
15139455 | | |
14039368 | Sep 27, 2013 | 9256276 |
14933378 | | |
Current U.S. Class: | 1/1 |
Current CPC Class: | Y02D 10/00 20180101; Y02D 50/20 20180101;
Y02D 10/126 20180101; G06F 12/084 20130101; Y02D 10/172 20180101;
G06F 1/3296 20130101; G06F 2212/62 20130101; G06F 12/0223 20130101;
G06F 1/266 20130101; Y02D 30/50 20200801; G06F 1/324 20130101 |
International Class: | G06F 1/32 20060101 G06F001/32;
G06F 1/26 20060101 G06F001/26; G06F 12/084 20060101 G06F012/084 |
Claims
1. A processor comprising: a plurality of cores including a first
core to operate at an operating voltage and an operating frequency,
the operating voltage between a minimum operating voltage and a
maximum operating voltage; and a first memory coupled to the first
core; a second memory; a shared cache memory to be shared by at
least some of the plurality of cores; and a power control unit
including first logic to enable the second memory to be coupled to
the first core when the operating frequency is less than a maximum
operating frequency at the operating voltage, and to disable the
second memory from being coupled to the first core when the
operating voltage exceeds a threshold voltage.
Description
[0001] This application is a continuation of U.S. patent
application Ser. No. 14/933,378, filed Nov. 5, 2015, which is a
continuation of U.S. patent application Ser. No. 14/039,368, filed
Sep. 27, 2013, now U.S. Pat. No. 9,256,276, issued Feb. 9, 2016,
the content of which is hereby incorporated by reference.
TECHNICAL FIELD
[0002] Embodiments relate to processor capacity utilization at low
operating frequencies.
BACKGROUND
[0003] Thermal/power limits (Thermal Design Power (TDP)) may be a
factor in design and operation of a processor. Thermal/power limits
may be obeyed by reduction in operating voltage of the processor.
Additionally, in order to comply with the TDP, the processor
including core, uncore, and graphics portion (GT), may be operated
at a lower frequency than the processor's maximum frequency of
operation, even when the processor is being heavily utilized. For
instance, in a server, when all cores/threads are being actively
utilized, the frequency of each core (or the uncore) may need to be
reduced to meet thermal constraints. However, reduction of
operating frequency typically lowers computing throughput of the
processor.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] FIG. 1 is a block diagram of an apparatus that includes a
reconfigurable structure, according to an embodiment of the present
invention.
[0005] FIG. 2 is a block diagram of a reconfigurable structure,
according to an embodiment of the present invention.
[0006] FIG. 3 is a graph of frequency of operation versus operating
voltage Vcc of a core, according to embodiments of the present
invention.
[0007] FIG. 4 is a block diagram of an apparatus, according to an
embodiment of the present invention.
[0008] FIG. 5 is a block diagram of a portion of frequency
dependent control logic, according to an embodiment of the present
invention.
[0009] FIG. 6 is a flow diagram of a method of increasing processor
performance at low operating frequencies, according to embodiments
of the present invention.
[0010] FIG. 7 is a block diagram of a processor in accordance with
an embodiment of the present invention.
[0011] FIG. 8 is a block diagram of a multi-domain processor in
accordance with another embodiment of the present invention.
[0012] FIG. 9 is a block diagram of a system in accordance with an
embodiment of the present invention.
[0013] FIG. 10 is a block diagram of a system on a chip (SOC) in
accordance with embodiments of the present invention.
DETAILED DESCRIPTION
[0014] Processors typically have a threshold voltage of operation
(minVcc) because a transistor typically does not operate reliably
below the supply voltage minVcc. At minVcc, a core of the processor
may be able to operate in a range of frequencies f_1 ≤ f ≤ f_2.
Power (P) consumed by a core may be expressed as
P = P_leakage + C_dynamic × f × Vcc^2, where P_leakage is power
consumed due to leakage effects and C_dynamic represents an
effective capacitance of the core. In order to meet the TDP, the
operating voltage of the core may be lowered to minVcc, and
additional incremental power savings may be achieved by lowering
the frequency of operation f within the range f_1 ≤ f ≤ f_2.
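As a rough numerical illustration of the power expression in paragraph [0014], the following Python sketch evaluates P at two frequencies with Vcc held at minVcc. The leakage power, effective capacitance, voltage, and frequency values are hypothetical and chosen only to show that, at a fixed voltage, the dynamic term scales linearly with f.

```python
def core_power(p_leakage_w, c_dynamic_f, freq_hz, vcc_v):
    """Evaluate P = P_leakage + C_dynamic * f * Vcc^2, in watts."""
    return p_leakage_w + c_dynamic_f * freq_hz * vcc_v ** 2


# Hypothetical values, not taken from the application.
P_LEAKAGE_W = 2.0      # leakage power
C_DYNAMIC_F = 1.0e-9   # effective switched capacitance
MIN_VCC_V = 0.7        # assumed minVcc

# Same voltage (minVcc), two frequencies standing in for f_2 and f_1.
p_at_f2 = core_power(P_LEAKAGE_W, C_DYNAMIC_F, 1.2e9, MIN_VCC_V)
p_at_f1 = core_power(P_LEAKAGE_W, C_DYNAMIC_F, 0.6e9, MIN_VCC_V)

print(f"P at f_2: {p_at_f2:.3f} W, P at f_1: {p_at_f1:.3f} W")
```

Halving the frequency halves only the dynamic term; the leakage term is unchanged, which is why the savings at minVcc are described as incremental.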
[0015] Typically structures within the processor, e.g., memory
structures such as Static Random Access Memory (SRAM) arrays that
may be located in the core, uncore, or GT (integrated graphics),
are accessed in one (or a few) cycles at a maximum operating
frequency f_max of the processor for computational efficiency
reasons. As an example, to access a translation lookaside buffer
(TLB) in one cycle at 2 GHz, a size of the TLB may be limited to 16
entries. At an operating frequency of 2 GHz, a TLB that has 24
entries would not permit access to every entry within one cycle,
e.g., in order to cover a large distance within a 24 entry TLB
during one cycle, more time would need to be allotted for retrieval
of data than is possible at 2 GHz.
[0016] Although a large structure may not be feasible to access at
a high operational frequency, if the frequency is lowered the large
structure may be able to be traversed, e.g., within the range
f_1 ≤ f ≤ f_2. By lowering the frequency of
operation of the processor, power/thermal savings may be realized,
and the lower frequency may permit a larger structure, e.g., larger
TLB, to be accessed. Each frequency of operation may permit a
different size of structure to be accessed.
[0017] For example, assume that a structure S is to be accessed
within one cycle, and that S can support 16 entries at frequency
f_2 at voltage minVcc. The same structure S may be able to
support more than 16 entries (e.g., 24 entries) at the frequency
f_1 at the same voltage of minVcc. In other words, at the same
voltage (minVcc), operation at low frequencies allows access to a
larger structure (e.g., with more entries) during the same number
of cycles. Consequently, when frequency is reduced while keeping
the supply voltage constant at minVcc, a structure can support more
entries (e.g., with longer wires) because, at a given voltage, a
larger capacitance (longer wires) can be charged within a longer
cycle time (i.e., at lower frequency).
[0018] Structures may be designed to be re-configurable. If a large
structure was utilized initially, the longer wires would inhibit
high frequency operation (e.g., at f_max). A reconfigurable
structure may include one or more partitions so as to support more
entries (longer wires) at lower frequencies and fewer entries
(shorter wires) at higher frequencies. Although increasing the size
of a few structures can increase the power consumption, this
increase in power consumption may be smaller than an increase of
power consumption of an entire core when operated at a higher
frequency.
[0019] Alternatively or in addition, reduction in the frequency of
operation f of the core at minVcc may permit enablement of decision
logic that takes advantage of additional timing margins as a result
of the reduced frequency. In an embodiment, the decision logic may
gate power to a portion of the core, e.g., the gating based on an
operation to be executed. For example, frequency reduction can
result in increased timing margin, which may permit the decision
logic to determine that first data from a first source is not
needed in order to execute a next operation in an instruction
queue. The decision logic can gate the power to be provided (e.g.,
power down) to the portion of the core that would otherwise read
the first data, thus saving power that would otherwise be used in
the core.
[0020] FIG. 1 is a block diagram of an apparatus 100 that includes
a reconfigurable structure, according to an embodiment of the
present invention. The apparatus 100 includes a processor 102 and a
system memory 160, such as a dynamic random access memory (DRAM).
The processor 102 may include one or more cores 104_0,
104_1, . . . , 104_n. The processor 102 may also include a
memory 106 coupled to the core 104_0, an auxiliary memory 108,
a switch 110, and an uncore 120 that may include a power control unit
(PCU) 130, and that may include a shared cache 140, and one or more
interfaces 150_0, 150_1, . . . , 150_n to interface with,
e.g., input/output (I/O) devices (not shown). The PCU 130 may
include frequency dependent control logic 132.
[0021] In operation, the core 104_0 may operate at an operating
voltage Vcc that may be between a minimum operating voltage minVcc
and a maximum operating voltage maxVcc, and at a frequency of
operation f between a first frequency f_1 (minimum frequency)
and f_max (maximum frequency). In an example, the core operates
at frequency f_max when the operating voltage is maxVcc. At the
operating voltage Vcc = minVcc, the core may operate at a frequency
f between the minimum frequency of operation f_1 and a second
frequency f_2 that is less than f_max.
[0022] Operation parameters (e.g., Vcc, f) of the core 104_0
may be controlled by the PCU 130, which may determine the Vcc and f
for the core 104_0 based at least in part on a thermal budget
(e.g., thermal design point (TDP)) associated with the core
104_0. The operating parameter values may also be selected by
the PCU 130 based in part on an anticipated load, e.g., number and
size of instructions to be executed by the core 104_0 during a
particular time period, and/or based on other factors.
[0023] In order to reduce power consumed by the core 104_0, Vcc
may be reduced to Vcc = minVcc. The frequency of operation at Vcc may
be selected to be within the range f_1 ≤ f ≤ f_2.
If f is selected to be in the range f_1 ≤ f < f_2,
the core 104_0 is operating at a frequency less than f_2,
and so there is more time available for access of additional memory
locations within the same number of cycles (e.g., a given number of
cycles occurs over a greater time period at f = f_1 than at a
higher frequency such as f = f_2).
[0024] Frequency dependent control logic 132 may determine that,
due to the lower frequency of operation at minVcc, the core
104_0 may access auxiliary memory 108, and the frequency
dependent control logic 132 may activate switch 110 to couple the
auxiliary memory 108 with the memory 106. Thus, as a result of
lowering the frequency of operation f (e.g., to comply with TDP, or
due to a reduced load of the core 104_0) to a value less than
the highest frequency f_2 permitted at minVcc, access by the
core 104_0 to auxiliary memory 108 may be enabled.
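The enabling condition described in paragraph [0024] can be sketched as a simple predicate. This is a minimal illustration and not the control logic of any embodiment; the function name, parameter names, and numeric values are assumptions.

```python
def aux_memory_enabled(vcc_v, freq_hz, min_vcc_v, f2_hz):
    """Return True when the switch should couple the auxiliary memory.

    Assumed condition: the core is at minVcc and runs below f_2, the
    highest frequency permitted at minVcc.
    """
    return vcc_v <= min_vcc_v and freq_hz < f2_hz


# Hypothetical operating points.
MIN_VCC_V = 0.7
F2_HZ = 1_200_000_000

low_freq = aux_memory_enabled(0.7, 600_000_000, MIN_VCC_V, F2_HZ)
at_f2 = aux_memory_enabled(0.7, 1_200_000_000, MIN_VCC_V, F2_HZ)
high_vcc = aux_memory_enabled(1.0, 2_000_000_000, MIN_VCC_V, F2_HZ)
```

Only the first operating point (minVcc, below f_2) satisfies the condition; at f_2 itself there is no extra cycle time to spend on the longer access.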
[0025] There may be other possible ways to produce a
re-configurable array structure (not shown in FIG. 1), including:
1) Provide two arrays, one short and one long. The shorter array
may be coupled to the core at high frequencies of operation of the
core and the longer array may be coupled to the core at low
frequencies of operation of the core. This configuration may employ
multiplexers to select an input/output from the two arrays. 2)
Utilize multiple partitions and combine the partitions using, e.g.,
multiplexers (not shown). Use of multiple partitions may permit
more than two sizes of memory, each size to be selected based on a
frequency of operation of the core that is to access the
memory.
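The multiple-partition option in paragraph [0025] amounts to choosing, for a given core frequency, the largest configuration that can still be accessed in time. The sketch below assumes a hypothetical table of (highest supported frequency, total entries) pairs; neither the table nor the function name comes from the application.

```python
# Hypothetical configurations: (highest supported frequency in Hz, entries).
PARTITION_CONFIGS = [
    (1_200_000_000, 16),  # base partition only
    (800_000_000, 24),    # base plus one extra partition
    (600_000_000, 32),    # base plus two extra partitions
]


def entries_for_frequency(freq_hz, configs=PARTITION_CONFIGS):
    """Return the largest entry count usable at freq_hz."""
    usable = [entries for max_freq_hz, entries in configs
              if freq_hz <= max_freq_hz]
    if not usable:
        raise ValueError("no configuration supports this frequency")
    return max(usable)
```

In hardware the selection would be made by multiplexers under control of the PCU; the lookup above only illustrates the mapping from frequency to size.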
[0026] There are many structures that can benefit from higher
capacity (more entries), e.g., when the structures are heavily
utilized by multiple threads/processes running in the system. Some
exemplary structures are presented below.
[0027] 1) Translation Lookaside Buffer (TLB). A TLB is generally
capacity constrained, e.g., in server systems running multiple
threads. One or more TLBs may be in a critical path of memory
accesses, and hence a size of each TLB may be limited in order to
permit fast access. Increasing a number of entries of one or more
TLBs at low frequencies can boost performance.
[0028] 2) Cache fill-buffers. Fill-buffers may be part of a
critical path for memory access and hence may not support a large
number of entries. Increasing a number of entries at low
frequencies, especially when multiple simultaneous multithreading
(SMT) threads are running, can boost performance.
[0029] 3) Shared queues (e.g., super-queue) in the uncore. When the
system is heavily utilized by multiple threads/processes, shared
queues in the uncore can become a bottleneck.
[0030] 4) Core structures, such as reservation stations, re-order
buffer entries, branch tables, and physical register files.
[0031] 5) Caches and victim buffers. When multiple
threads/processes are sharing a cache, the cache capacity per
thread gets reduced. Note that L1 caches can be shared by multiple
SMT threads and L2 or L3 caches can be shared by multiple cores.
The number of entries (sets) in a cache may be increased at lower
frequencies to compensate for the loss of cache capacity.
Similarly, the size of victim buffers can be increased.
[0032] 6) Buffers in memory/DRAM Controllers. When the system is
under heavy load, performance may be increased by increasing shared
buffers in a DRAM controller.
[0033] 7) Checkpoint buffers: In structures that support
checkpoints (e.g., register alias table (RAT) checkpoints), a total
number of checkpoints at lower frequencies may be increased.
[0034] 8) Simultaneous multithreading (SMT) shared structures. When
multiple SMT threads are running on a core, some structures are
shared because larger structures cannot be supported at maximum
frequency. However, at lower frequencies, the size of these shared
structures can be increased, which may result in an increase in SMT
effectiveness.
[0035] In processors, there can be a large frequency range over
which to operate at minVcc. For instance, a core can be running
from 1.2 GHz to 600 MHz (or even 400 MHz) at the same voltage of
minVcc. Note that the cycle time is 2 times (2×) at 600 MHz,
and 3× at 400 MHz, as compared with 1.2 GHz. Consequently,
the buffer sizes can be made as large as 2× and 3× at
lower frequencies. Similarly, integrated graphics (GT) may run at
frequencies from 400 MHz to 100 MHz at minVcc, which may permit up
to a 4× increase in buffer entries.
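The arithmetic in paragraph [0035] can be checked with a short Python sketch. It assumes, as the paragraph does, that the upper bound on structure size scales linearly with cycle time; real limits depend on wire delay and circuit details, so the result is a ceiling rather than a prediction. The function names and the 16-entry base size are illustrative.

```python
def cycle_time_ratio(f_ref_hz, f_hz):
    """How many times longer one cycle is at f_hz than at f_ref_hz."""
    return f_ref_hz / f_hz


def max_entries(base_entries, f_ref_hz, f_hz):
    """Upper bound on entries, assuming size scales with cycle time."""
    return base_entries * f_ref_hz // f_hz


core_ratio_600 = cycle_time_ratio(1_200_000_000, 600_000_000)  # 2.0
core_ratio_400 = cycle_time_ratio(1_200_000_000, 400_000_000)  # 3.0
gt_ratio_100 = cycle_time_ratio(400_000_000, 100_000_000)      # 4.0

# A hypothetical 16-entry structure sized for 1.2 GHz.
entries_600 = max_entries(16, 1_200_000_000, 600_000_000)      # 32
entries_400 = max_entries(16, 1_200_000_000, 400_000_000)      # 48
```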
[0036] An example of an algorithm that can be used to
increase/decrease size of a reconfigurable structure is as follows:
[0037] If the frequency of a unit (core/uncore/GT/etc.) is to be
reduced due to thermal limits, and if the load (utilization) of the
unit is high, and if structure S utilization is above a
"water-mark" (high threshold), increase the size of the structure
S.
[0038] Else if the frequency of a unit is to be increased due to
relaxation of thermal limits, and if the frequency is to be
increased beyond a frequency that a current size of S can support,
decrease the size of S.
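The two rules in paragraphs [0037] and [0038] can be written as a small decision function. This is a sketch under stated assumptions: the string-valued `freq_change` argument, the parameter names, and the "grow"/"shrink"/"keep" return values are hypothetical conveniences, not terms from the application.

```python
def resize_decision(freq_change, unit_load_high, s_utilization,
                    high_watermark, target_freq_hz,
                    max_freq_for_current_size_hz):
    """Decide whether structure S should grow, shrink, or stay as is.

    freq_change is "decrease" when the unit's frequency is being lowered
    for thermal reasons and "increase" when thermal limits have relaxed.
    """
    if (freq_change == "decrease" and unit_load_high
            and s_utilization > high_watermark):
        return "grow"
    if (freq_change == "increase"
            and target_freq_hz > max_freq_for_current_size_hz):
        return "shrink"
    return "keep"
```

For example, a heavily loaded unit whose frequency is being lowered while S is 90% full (watermark 80%) yields "grow", and a later increase beyond the frequency the enlarged S supports yields "shrink".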
[0039] When decreasing the size of the structure S, it may be
possible to discard contents of additional entries, e.g., in cases
where the contents are read-only (e.g., in a TLB or I-Cache).
However, in some cases, it may be necessary to write back the data
in the additional buffers before they are disabled (e.g., in a
D-cache victim buffer).
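The distinction drawn in paragraph [0039] between discarding and writing back the additional entries can be sketched as follows. The per-entry `dirty` flag, the dictionary layout, and the `write_back` callback are assumptions introduced for illustration.

```python
def shrink_structure(entries, new_size, read_only, write_back):
    """Return the entries kept after shrinking to new_size.

    Entries beyond new_size are discarded. For a structure that is not
    read-only, dirty entries are first passed to write_back.
    """
    kept, dropped = entries[:new_size], entries[new_size:]
    if not read_only:
        for entry in dropped:
            if entry["dirty"]:
                write_back(entry)
    return kept


# Hypothetical victim-buffer contents.
written = []
victims = [
    {"tag": 1, "dirty": False},
    {"tag": 2, "dirty": True},
    {"tag": 3, "dirty": True},
]
kept = shrink_structure(victims, 1, read_only=False,
                        write_back=written.append)
```

With `read_only=True` (the TLB or I-cache case) the same call drops the extra entries without invoking the callback.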
[0040] FIG. 2 is a block diagram of a reconfigurable structure 200,
according to an embodiment of the present invention. A memory
structure 202 may include a plurality of SRAM cells that are
accessible via corresponding bit lines and word lines. Auxiliary
memory 204 may be reversibly coupled to the memory structure 202 by
activation of an enable switch 206. In an embodiment, the enable
switch 206 may include tri-state buffers to reversibly couple bit
lines of the auxiliary memory 204 to corresponding bit lines of the
memory structure 202. The enable switch 206 may be activated by,
e.g., application of an enable signal to an enable line 208 by
frequency dependent control logic, such as the frequency dependent
control logic 132 of FIG. 1. The enable switch 206 may be activated
responsive to a reduction in frequency of operation of a core that
accesses the memory structure. For example, a power control unit
such as the PCU 130 of FIG. 1, may determine that the core is to
operate at minVcc and at a frequency f that is less than a
frequency f_2 (the maximum frequency at minVcc), and the
frequency dependent control logic may enable access by the core to
the auxiliary memory 204 via the enable switch 206. In other
embodiments, word lines of the auxiliary memory 204 may also be
switchable, e.g., by other tri-state buffers (not shown).
[0041] FIG. 3 is a graph 300 of frequency of operation f versus
operating voltage Vcc of a core, according to embodiments of the
present invention. In a first region 302 there is a direct
relationship between Vcc and f. If Vcc is reduced to minVcc (e.g.,
in order to comply with TDP requirements), in a second region 304
there is a range of frequencies f_1 ≤ f ≤ f_2 for
which the core is operable. The core may not be operable below
Vcc = minVcc, or at a frequency less than f_1. Within the
frequency range f_1 ≤ f ≤ f_2, operation of the
core may support additional storage (more entries in structures)
and/or additional decision logic (e.g., power gating logic) as
compared with operation in the first region 302.
[0042] FIG. 4 is a block diagram of an apparatus 400, according to
an embodiment of the present invention. The apparatus 400 may
include a processor 402 and a system memory 460, such as a dynamic
random access memory (DRAM). The processor 402 may include one or
more cores 404_0, 404_1, . . . , 404_n. The core
404_0 may include decision logic 406 and gating logic 408. The
processor 402 may include an uncore 420 that may include a power
control unit (PCU) 430. The uncore 420 may also include a shared
cache 440 and one or more interfaces 450_0, 450_1, . . . ,
450_n to interface with, e.g., input/output (I/O) devices (not
shown).
[0043] In operation, the core 404_0 may operate at an operating
voltage Vcc that may be between a minimum operating voltage minVcc
and a maximum operating voltage maxVcc, and at an operating
frequency f between a first frequency f_1 (minimum frequency)
and a maximum frequency f_max. In an
example, the core operates at frequency f_max when the
operating voltage is maxVcc. At the operating voltage minVcc, the
core can operate within a frequency range between the minimum
frequency of operation f_1 and a second frequency f_2 that
is less than f_max.
[0044] Operation parameters (e.g., Vcc and f) of the core 404_0
may be controlled by the PCU 430, which may determine the Vcc and f
for the core 404_0 based at least in part on a thermal budget
(e.g., thermal design point (TDP)) associated with the core
404_0. The PCU 430 may also determine Vcc and f based in part
on an anticipated load, e.g., number and size of instructions to be
executed by the core 404_0 during a particular time period.
[0045] In order to reduce power consumed by the core 404_0, Vcc
may be reduced to a threshold voltage, such as minVcc. The
frequency of operation at Vcc may be selected to be within
f_1 ≤ f ≤ f_2. At a frequency that is less than
f_max, an ample timing margin may permit decision logic 406 to
determine whether a particular logic portion of the core will be
used in execution of a given operation, and may output a disable
signal upon a determination that the particular logic portion will
not be used in execution of the given operation. The decision logic
406 may provide input, e.g., a gating signal, to the gating logic
408. Another input 410 (V_thresh indicator) may be provided by
the PCU 430 to enable/disable the gating logic 408. For example, if
V_thresh is approximately equal to minVcc and if Vcc = minVcc,
the gating logic 408 may be enabled to gate power to a portion of
the core 404_0 based on the input from the decision logic
406.
[0046] FIG. 5 is a block diagram of a gating logic 500, according
to an embodiment of the present invention. The gating logic 500 may
be situated within a core of a processor, such as the gating logic
408 in the core 404_0 of processor 402 of FIG. 4. The gating
logic 500 may include OR gate 506 and AND gate 508. In operation,
an indication 502 of whether the core is operating in a low
frequency range may be received from, e.g., a PCU of the processor.
For example, the core may be placed at an operating voltage Vcc that
is less than or equal to a threshold voltage V_thresh (e.g.,
Vcc = minVcc) by the PCU, and may therefore be operating at a
frequency f that is less than f_max. When Vcc > V_thresh, the input
502 has a value of 1, which indicates that the core is not in the low
frequency range f_1 ≤ f ≤ f_2. Therefore, an
enable signal 504 (from, e.g., decision logic) will not gate
inclock 512, e.g., the clock speed of the core, and outclock 510
will be the same as inclock 512. Outclock 510 can be input to one
or more particular portions of the core, causing the particular
portions to be operable (e.g., powered up) while a next operation
(e.g., in an instruction queue) is executed. When
Vcc ≤ V_thresh (e.g., Vcc = minVcc), the core operates in
the low frequency range (e.g., f_1 ≤ f ≤ f_2) and
the input 502 (~low freq. range = NOT low frequency range) has
a value of 0. The output 510 will be gated by the value of the
input 504 that is determined by decision logic, such as the
decision logic 406 of FIG. 4. The decision logic 406 may determine
that particular portions of the core may be powered down during
execution of a particular operation, and the enable signal 504 may
gate the outclock signal 510 causing selected logic portions of the
core to power down. For example, as a result of the lower operating
frequency, input 502 is 0 and the enable signal 504 gates the
inclock 512. Depending on the value of the enable signal 504, the
outclock 510 may power down selected portion(s) of the core. The
enable signal 504 may be based on which operation is being
executed. In an embodiment, the decision logic 406 may power-down a
particular logic path of the core during execution of a first
operation by the core, and may power-up the particular logic path
during execution of a second operation by the core.
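One reading of the gating logic of paragraph [0046] that is consistent with the described behavior is outclock = inclock AND (input 502 OR enable 504). The Python sketch below models that reading with single-bit integers. The wiring of OR gate 506 and AND gate 508, and the polarity of the enable signal (1 passes the clock, 0 gates it), are assumptions, since the figure itself is not reproduced in the text.

```python
def outclock(inclock, not_low_freq_range, enable):
    """Model outclock 510 from inclock 512, input 502, and enable 504.

    All arguments are 0 or 1. When input 502 is 1 (the core is not in
    the low frequency range), the clock always passes. When input 502 is
    0, the clock passes only if enable 504 is 1.
    """
    gate_open = not_low_freq_range | enable  # assumed role of OR gate 506
    return inclock & gate_open               # assumed role of AND gate 508


# Truth table for a high clock phase (inclock = 1).
truth_table = {
    (n, e): outclock(1, n, e) for n in (0, 1) for e in (0, 1)
}
```

The only combination that blocks the clock is input 502 = 0 with enable 504 = 0, i.e., the core is in the low frequency range and the decision logic has determined that the portion is not needed.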
[0047] FIG. 6 is a flow diagram of a method 600 of increasing
processor performance at low operating frequencies, according to
embodiments of the present invention. The method begins at block
602. Proceeding to block 604, operating voltage Vcc of a core is
reduced to a threshold voltage V_thresh. Advancing to block
606, frequency of operation of the core is reduced. For example,
for Vcc = minVcc, the operating frequency may be a value between a
minimum frequency f_1 and a highest allowed frequency f_2
at minVcc, e.g., f_1 ≤ f ≤ f_2, to comply with a
thermal budget (TDP) allotted to the core. Moving to decision block
608, if the thermal budget for the core is exceeded, returning to
block 606 the operating frequency f of the core is reduced.
Continuing to block 610, ancillary logic (e.g., gating logic or
auxiliary storage) is coupled to the core. The gating logic, upon
being coupled to the core, may gate power to one or more portions
of the core. The auxiliary storage, upon being coupled to the core,
is accessible to the core. The method ends at 612.
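The flow of method 600 can be sketched in Python as follows. This is an illustrative model only: the fixed frequency step, the `power_w` callback standing in for a thermal/power estimate, and all numeric values are assumptions, and the loop simply stops at f_1 if the budget cannot be met.

```python
def method_600(vcc_v, v_thresh_v, freq_hz, f1_hz, f2_hz, step_hz,
               power_w, budget_w):
    """Return (vcc, frequency, ancillary_coupled) after the flow of FIG. 6."""
    # Block 604: reduce Vcc to the threshold voltage.
    vcc_v = min(vcc_v, v_thresh_v)
    # Block 606: reduce the frequency into the range allowed at that voltage.
    freq_hz = min(freq_hz, f2_hz)
    # Block 608: while the thermal budget is exceeded, reduce f again,
    # without going below the minimum frequency f_1.
    while power_w(vcc_v, freq_hz) > budget_w and freq_hz - step_hz >= f1_hz:
        freq_hz -= step_hz
    # Block 610: couple ancillary logic (gating logic or auxiliary storage).
    ancillary_coupled = True
    return vcc_v, freq_hz, ancillary_coupled


# Hypothetical power model: 2 W leakage plus a dynamic term.
def example_power_w(vcc_v, freq_hz):
    return 2.0 + 1.0e-9 * freq_hz * vcc_v * vcc_v


result = method_600(1.0, 0.7, 2_000_000_000, 400_000_000, 1_200_000_000,
                    200_000_000, example_power_w, 2.4)
```

With these made-up numbers the frequency settles at 800 MHz, the first step at which the modeled power falls within the 2.4 W budget.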
[0048] Referring now to FIG. 7, shown is a block diagram of a
processor in accordance with an embodiment of the present
invention. As shown in FIG. 7, processor 700 may be a multicore
processor including a plurality of cores 710a-710n. In
one embodiment, each such core may be of an independent power
domain and can be configured to operate at an independent voltage
and/or frequency, and to enter turbo mode when available headroom
exists. The various cores may be coupled via an interconnect 715 to
a system agent or uncore 720 that may include various components.
As seen, the uncore 720 may include a shared cache 730, which may
be a last level cache. In addition, the uncore 720 may include an
integrated memory controller 740, various interfaces 750, and a
power control unit (PCU) 755. In the embodiment of FIG. 7, the
power control unit 755 may be configured to determine a frequency
and an operating voltage for a particular core of the cores
710a-710n. The PCU 755 may include frequency dependent control
logic 758 to couple ancillary logic (e.g., enable access to
auxiliary storage or to enable gating logic to gate power to
portions of the core) responsive to the frequency at which a
particular core is operating being in a range that is less than or
equal to a frequency f_2 (e.g., highest operating frequency at
minVcc), according to embodiments of the present invention. Also
shown in FIG. 7 are voltage regulators 770a-770n to regulate
power supplied to the cores 710a-710n based on input received from
the Power Control Unit (PCU) 755. Also shown in FIG. 7 are clock
control units 780a-780n to provide the respective clock frequency
to the respective core 710a-710n.
[0049] With further reference to FIG. 7, processor 700 may
communicate with a system memory 760, e.g., via a memory bus. In
addition, by interfaces 750, connection can be made to various
off-chip components such as peripheral devices, mass storage and so
forth. While shown with this particular implementation in the
embodiment of FIG. 7, the scope of the present invention is not
limited in this regard.
[0050] Referring now to FIG. 8, shown is a block diagram of a
multi-domain processor in accordance with another embodiment of the
present invention. As shown in the embodiment of FIG. 8, processor
800 includes multiple domains. Specifically, a core domain 810 can
include a plurality of cores 810_0-810_n, a graphics domain
820 can include one or more graphics engines, and a system agent
domain 850 may further be present. Note that additional domains can
be present in other embodiments. For example, multiple core domains
may be present each including at least one core.
[0051] In general, each core 810 may further include low level
caches in addition to various execution units and additional
processing elements. The various cores may be coupled to each other
and to a shared cache memory formed of a plurality of units of a
last level cache (LLC) 840_0-840_n. In various
embodiments, LLC 840_0-840_n may be shared amongst the
cores and the graphics engine, as well as various media processing
circuitry. As seen, a ring interconnect 830 couples the cores
together, and provides interconnection between the cores 810,
graphics domain 820, and system agent circuitry 850.
[0052] As further seen, system agent domain 850 may include a power
control unit (PCU) 856 to perform power management operations for
the processor. In the embodiment of FIG. 8, the power control unit
856 can include frequency dependent control logic 857. Responsive
to a Vcc of a core being set to a value less than a threshold value
V_thresh (e.g., the core is to operate at a frequency f less
than f_max), PCU 856 may enable access by the core to auxiliary
storage, or may enable gating of power to portions of the core, in
accordance with embodiments of the present invention.
[0053] As further seen in FIG. 8, processor 800 can further include
an integrated memory controller (IMC) 870 that can provide for an
interface to a system memory, such as a dynamic random access
memory (DRAM). Multiple interfaces 880_0-880_n may be
present to enable interconnection between the processor and other
circuitry. For example, in one embodiment at least one direct media
interface (DMI) interface may be provided as well as one or more
Peripheral Component Interconnect Express (PCI Express™
(PCIe™)) interfaces. Still further, to provide for
communications between other agents such as additional processors
or other circuitry, one or more interfaces in accordance with the
QPI™ protocol may also be provided. Although shown at this
high level in the embodiment of FIG. 8, understand the scope of the
present invention is not limited in this regard.
[0054] Embodiments may be implemented in many different system
types. Referring now to FIG. 9, shown is a block diagram of a
system in accordance with an embodiment of the present invention.
As shown in FIG. 9, multiprocessor system 900 is a point-to-point
interconnect system, and includes a first processor 970 and a
second processor 980 coupled via a point-to-point interconnect 920.
As shown in FIG. 9, each of processors 970 and 980 may be multicore
processors, including first and second processor cores (e.g.,
processor cores 974a and 974b, and processor cores 984a and 984b),
although potentially many more cores may be present in the
processors. Each of the processors 970 and 980 may include a PCU
(940 and 950, respectively). Each of the PCUs 940 and 950 may
include frequency dependent control logic 942 and 952 respectively,
in accordance with embodiments of the present invention. Each PCU
940 and 950 may provide, responsive to a frequency reduction in a
particular core of the respective processor, access to auxiliary
storage or gating of power to portions of the core by decision
logic, in accordance with embodiments of the present invention.
[0055] Still referring to FIG. 9, first processor 970 further
includes a memory controller hub (MCH) 972 and point-to-point (P-P)
interfaces 976 and 978. Similarly, second processor 980 includes a
MCH 982 and P-P interfaces 986 and 988. As shown in FIG. 9, MCHs
972 and 982 couple the processors to respective memories, namely a
memory 932 and a memory 934, which may be portions of system memory
(e.g., DRAM) locally attached to respective processors. First
processor 970 and second processor 980 may be coupled to a chipset
990 via P-P interconnects 962 and 984, respectively. As shown in
FIG. 9, chipset 990 includes P-P interfaces 994 and 998.
[0056] Furthermore, chipset 990 includes an interface 992 to couple
chipset 990 with a high performance graphics engine 938 by a P-P
interconnect 939. In turn, chipset 990 may be coupled to a first
bus 916 via an interface 996. As shown in FIG. 9, various
input/output (I/O) devices 914 may be coupled to first bus 916,
along with a bus bridge 918 which couples first bus 916 to a second
bus 920. Various devices may be coupled to the second bus 920
including, for example, a keyboard/mouse 922, communication devices
926 and a data storage unit 928 such as a disk drive or other mass
storage device, in one embodiment. Further, an audio I/O 924 may be
coupled to second bus 920. Embodiments can be incorporated into
other types of systems including mobile devices such as a smart
cellular telephone, tablet computer, netbook, Ultrabook™, or so
forth.
[0057] FIG. 10 is a block diagram of a system on a chip (SOC) in
accordance with embodiments of the present invention. The SOC 1000
includes a multicore subsystem 1010, a modem subsystem 1020, a
multimedia subsystem 1030, system fabric 1040, a power control unit
1050, and interfaces 1060 to interface with one or more external
devices. The SOC 1000 may perform multiple tasks concurrently,
e.g., modem tasks, multimedia tasks, and other processing
tasks.
[0058] The multicore subsystem 1010 includes multicore processors
1012 and 1014, L1 caches 1016 and 1018, and L2 cache 1042. Each of
the multicore processors 1012 and 1014 may include a corresponding
PCU (1013 and 1015, respectively) that may include frequency dependent
control logic (not shown). The PCUs 1013 and 1015 may, responsive to a
reduction in frequency of a core, enable access by the core to
ancillary logic such as auxiliary storage, or may enable decision
logic to gate power to portions of a core, according to embodiments
of the present invention.
[0059] The modem subsystem 1020 may include a Long Term Evolution
(LTE) modem 1022 for wireless communication of high speed data. The
modem subsystem 1020 may also include a global positioning system
(GPS) 1024, and two (or more) digital signal processor (DSP) cores
1026 and 1028.
[0060] The multimedia subsystem 1030 may include a graphics
processing unit (GPU) 1032, audio/video hardware accelerators 1034,
a digital signal processing core 1036, and an MMX processor 1038,
which may be capable of processing, e.g., single instruction
multiple data (SIMD) instructions.
[0061] The following examples pertain to further embodiments.
[0062] In a first example, a processor may include one or more
cores including a first core to operate at an operating voltage
between a minimum operating voltage and a maximum operating
voltage. The processor may also include a power control unit
including first logic to enable coupling of ancillary logic to the
first core responsive to the operating voltage having a value less
than or equal to a threshold voltage (V_thresh), and to disable
the coupling of the ancillary logic to the first core responsive to
the operating voltage being greater than V_thresh. In an
embodiment, V_thresh is approximately equal to the minimum
operating voltage. In another embodiment, the ancillary logic
includes an auxiliary memory. In another embodiment, the first
logic includes at least one tri-state buffer switch. In another
embodiment, the processor further includes a first memory coupled
to the first core, and the at least one tri-state buffer switch is
operable to couple the auxiliary memory to the first core by
coupling the auxiliary memory to the first memory. In another
embodiment, at least one tri-state buffer switch is operative to
reversibly couple a first bit line of the first memory to a first
bit line of the auxiliary memory. In another embodiment, at least
one tri-state buffer switch is operative to reversibly enable
access by the first core to a word line of the auxiliary memory. In
another embodiment, the ancillary logic includes decision logic to
gate power to a portion of the first core. In an embodiment, the
decision logic is to determine whether to power down the portion of
the first core during execution of a particular operation based on
whether the portion of the first core is to be used during
execution of the particular operation. In another embodiment, the
power control unit further includes second logic to determine the
operating voltage and a frequency f at which to operate the first
core based at least in part on a thermal budget associated with the
first core.
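The coupling behavior of the first example can be illustrated with a minimal software sketch. This is a behavioral model only, not the claimed circuit: the class names TriStateBuffer and PowerControlUnit, the use of None to stand for a high-impedance output, and the 0.7 V threshold are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriStateBuffer:
    """Hypothetical switch model: passes its input when enabled,
    otherwise presents high impedance (modeled here as None)."""
    enabled: bool = False

    def drive(self, value: int) -> Optional[int]:
        return value if self.enabled else None


@dataclass
class PowerControlUnit:
    """Hypothetical stand-in for the first logic of the power control unit."""
    v_thresh: float  # threshold voltage in volts (illustrative)

    def update(self, switch: TriStateBuffer, operating_voltage: float) -> None:
        # Couple at or below the threshold, uncouple above it.
        switch.enabled = operating_voltage <= self.v_thresh


pcu = PowerControlUnit(v_thresh=0.7)
switch = TriStateBuffer()

pcu.update(switch, operating_voltage=0.7)   # at the threshold: coupled
coupled_output = switch.drive(1)

pcu.update(switch, operating_voltage=0.9)   # above the threshold: uncoupled
uncoupled_output = switch.drive(1)
```

In this sketch the switch stands in for a tri-state buffer between, e.g., a bit line of the first memory and a bit line of the auxiliary memory; the comparison is inclusive so that a voltage exactly equal to the threshold enables the coupling.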
[0063] In a second example, a system includes a processor that
includes at least one core including a first core to operate at an
operating voltage between a minimum operating voltage and a maximum
operating voltage, and switching logic to engage ancillary logic
responsive to the operating voltage having a value less than or
equal to a threshold voltage (V_thresh). The system also
includes a dynamic random access memory (DRAM) coupled to the
processor. In an embodiment, the ancillary logic includes an
auxiliary memory that is engaged by coupling the first core to the
auxiliary memory. In another embodiment, a first memory is coupled
to the processor and the switching logic is to reversibly couple
the first core to the auxiliary memory by coupling the auxiliary
memory to the first memory. In an embodiment, the switching logic
includes at least one tri-state buffer switch. In an embodiment,
V_thresh is approximately equal to the minimum operating
voltage. In another embodiment, the switching logic is to engage
the ancillary logic responsive to the operating voltage being
approximately equal to the minimum operating voltage and responsive
to an operating frequency f of the core having a value
f_1 ≤ f < f_2, where f_1 is a minimum operating
frequency of the core at the minimum operating voltage and f_2
is a maximum operating frequency of the core at the minimum
operating voltage. Upon engagement, the ancillary logic is to gate
power to a portion of the core during execution of an operation
based on the operation being executed.
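The combined voltage and frequency condition of the second example may be sketched as a single predicate. The function name should_engage, the relative tolerance used to model "approximately equal," and the numeric values in the usage lines are assumptions for illustration; the description does not quantify the tolerance.

```python
import math


def should_engage(v: float, f: float, v_min: float,
                  f1: float, f2: float, rel_tol: float = 0.02) -> bool:
    """Return True when the ancillary logic would be engaged.

    v, v_min : operating voltage and minimum operating voltage (volts)
    f        : operating frequency of the core (Hz)
    f1, f2   : minimum and maximum operating frequency at v_min (Hz)
    rel_tol  : assumed tolerance for "approximately equal" (hypothetical)
    """
    at_v_min = math.isclose(v, v_min, rel_tol=rel_tol)
    # Lower bound inclusive, upper bound exclusive: f1 <= f < f2.
    return at_v_min and f1 <= f < f2


inside_window = should_engage(0.65, 0.8e9, v_min=0.65, f1=0.4e9, f2=1.2e9)
at_upper_bound = should_engage(0.65, 1.2e9, v_min=0.65, f1=0.4e9, f2=1.2e9)
above_v_min = should_engage(0.90, 0.8e9, v_min=0.65, f1=0.4e9, f2=1.2e9)
```

The upper bound is exclusive in this sketch, so a core running at exactly f2 does not satisfy the condition.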
[0064] In a third example, a machine readable medium stores
instructions that, when executed by a processor, cause the
processor to determine whether a core is operating at an operating
voltage that is less than or equal to V_thresh, where
V_thresh is a threshold voltage. Responsive to the operating
voltage being less than or equal to V_thresh, the instructions
cause the processor to couple ancillary logic to the core, and
responsive to the operating voltage being greater than
V_thresh, the instructions cause the processor to uncouple the
ancillary logic from the core. In an embodiment, the ancillary
logic includes an auxiliary memory. In another embodiment, upon
engagement the ancillary logic is to gate power to a portion of the
core during execution of an operation based on the operation being
executed. In an embodiment, V_thresh is approximately equal to
a minimum operating voltage of the core. In another embodiment, the
ancillary logic is to gate the power to the portion of the core
responsive to an operating frequency f of the core having a value
f_1 ≤ f < f_2, where f_1 is a minimum operating
frequency of the core at the minimum operating voltage of the core
and f_2 is a maximum operating frequency of the core at the
minimum operating voltage.
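The per-operation power gating performed by the ancillary logic once engaged may be sketched as a set difference between all gateable portions of the core and the portions a given operation uses. The portion names, the operation names, and the PORTIONS_USED table below are hypothetical; the description does not enumerate them.

```python
# Hypothetical set of gateable portions of a core.
CORE_PORTIONS = frozenset({"integer_alu", "fp_unit", "simd_unit", "load_store"})

# Hypothetical table: which portions each operation needs.
PORTIONS_USED = {
    "add": {"integer_alu"},
    "fmul": {"fp_unit"},
    "load": {"load_store", "integer_alu"},
}


def portions_to_gate(operation: str, ancillary_engaged: bool) -> frozenset:
    """Return the portions that may be powered down while `operation` executes."""
    if not ancillary_engaged:
        # Ancillary logic is uncoupled: nothing is gated.
        return frozenset()
    # Unknown operations are assumed to need every portion, so nothing is gated.
    used = PORTIONS_USED.get(operation, CORE_PORTIONS)
    return CORE_PORTIONS - used


gated_for_add = portions_to_gate("add", ancillary_engaged=True)
gated_when_uncoupled = portions_to_gate("add", ancillary_engaged=False)
```

Treating an unrecognized operation as using every portion is a conservative design choice in this sketch: it avoids powering down a portion that the operation might need.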
[0065] In a fourth example, a method includes determining whether a
core is operating at an operating voltage that is less than or
equal to V_thresh, where V_thresh is a threshold voltage.
Responsive to the operating voltage being less than or equal to
V_thresh, the method includes coupling ancillary logic to the
core, and responsive to the operating voltage being greater than
V_thresh, uncoupling the ancillary logic from the core. In an
embodiment, the ancillary logic includes an auxiliary memory. In
another embodiment, upon coupling the ancillary logic to the core,
the ancillary logic is to gate power to a portion of the core based
on an operation being executed by the core. In an embodiment,
V_thresh is approximately equal to a minimum operating voltage
of the core. In another embodiment, the power is gated to the
portion of the core by the gating logic responsive to an operating
frequency f of the core having a value f_1 ≤ f < f_2,
where f_1 is a minimum operating frequency of the core at a
minimum operating voltage and f_2 is a maximum operating
frequency of the core at the minimum operating voltage.
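The method of the fourth example can be traced over a sequence of operating-voltage samples. The sketch below is illustrative only: the function name coupling_trace, the "couple"/"uncouple"/"hold" labels, and the sample voltages are assumptions, and the ancillary logic is assumed to start uncoupled.

```python
from typing import Iterable, List


def coupling_trace(voltages: Iterable[float], v_thresh: float) -> List[str]:
    """For each voltage sample, report the action taken by the method."""
    actions: List[str] = []
    coupled = False  # assumed initial state: ancillary logic uncoupled
    for v in voltages:
        want_coupled = v <= v_thresh  # determining step of the method
        if want_coupled and not coupled:
            actions.append("couple")
        elif coupled and not want_coupled:
            actions.append("uncouple")
        else:
            actions.append("hold")  # state already matches the voltage
        coupled = want_coupled
    return actions


trace = coupling_trace([0.9, 0.7, 0.65, 0.8], v_thresh=0.7)
```

The sketch applies the threshold comparison directly on every sample; any filtering or hysteresis around V_thresh would be an additional design choice not described here.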
[0066] Embodiments may be used in many different types of systems.
For example, in one embodiment a communication device can be
arranged to perform the various methods and techniques described
herein. Of course, the scope of the present invention is not
limited to a communication device, and instead other embodiments
can be directed to other types of apparatus for processing
instructions, or one or more machine readable media including
instructions that in response to being executed on a computing
device, cause the device to carry out one or more of the methods
and techniques described herein.
[0067] Embodiments may be implemented in code and may be stored on
a non-transitory storage medium having stored thereon instructions
which can be used to program a system to perform the instructions.
The storage medium may include, but is not limited to, any type of
disk including floppy disks, optical disks, solid state drives
(SSDs), compact disk read-only memories (CD-ROMs), compact disk
rewritables (CD-RWs), and magneto-optical disks, semiconductor
devices such as read-only memories (ROMs), random access memories
(RAMs) such as dynamic random access memories (DRAMs), static
random access memories (SRAMs), erasable programmable read-only
memories (EPROMs), flash memories, electrically erasable
programmable read-only memories (EEPROMs), magnetic or optical
cards, or any other type of media suitable for storing electronic
instructions.
[0068] While the present invention has been described with respect
to a limited number of embodiments, those skilled in the art will
appreciate numerous modifications and variations therefrom. It is
intended that the appended claims cover all such modifications and
variations as fall within the true spirit and scope of the present
invention.
* * * * *