U.S. patent application number 11/071944 was filed with the patent office on March 4, 2005, and published on October 6, 2005, for performance data access.
The invention is credited to Friedenbach, John W.; Jardine, Robert L.; Smullen, James R.; and Stott, Graham B.
Publication Number: 20050223275
Application Number: 11/071944
Family ID: 35346428
Publication Date: 2005-10-06
United States Patent Application 20050223275
Kind Code: A1
Jardine, Robert L.; et al.
October 6, 2005
Performance data access
Abstract
Performance data access is described. In an embodiment, events
are processed with non-synchronized processor elements of a logical
processor in a redundant processor system. Performance data
associated with execution of the processor events is stored in one
or more accumulators corresponding to a respective processor
element. The performance data from each of the non-synchronized
processor elements is exchanged via a logical synchronization unit
such that each processor element includes the performance data from
each of the processor elements. Each processor element then
conforms the performance data to generate synchronized performance
data which is then communicated to a performance monitoring
application that requests the performance data from the logical
processor.
Inventors: Jardine, Robert L. (Cupertino, CA); Smullen, James R. (Carmel, CA); Stott, Graham B. (Dublin, CA); Friedenbach, John W. (Santa Clara, CA)
Correspondence Address:
HEWLETT PACKARD COMPANY
P O BOX 272400, 3404 E. HARMONY ROAD
INTELLECTUAL PROPERTY ADMINISTRATION
FORT COLLINS, CO 80527-2400
US
Filed: March 4, 2005
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number
60557812 | Mar 30, 2004 |
Current U.S. Class: 714/11; 714/E11.195; 714/E11.202; 714/E11.204
Current CPC Class: G06F 11/1687 20130101; G06F 11/1641 20130101; G06F 11/3495 20130101; G06F 11/165 20130101; G06F 11/185 20130101; G06F 11/1645 20130101; G06F 11/1683 20130101; G06F 11/3476 20130101; G06F 11/366 20130101; G06F 11/3636 20130101; G06F 2201/88 20130101; G06F 11/184 20130101; G06F 11/3404 20130101; G06F 11/1658 20130101; G06F 9/52 20130101
Class at Publication: 714/011
International Class: G06F 011/00
Claims
1. A redundant processor system, comprising: non-synchronized
processor elements of a logical processor, each processor element
configured to process events and store performance data associated
with execution of the processor events, each processor element
including one or more accumulators configured to maintain the
performance data corresponding to a respective processor element;
performance monitoring logic configured to request the performance
data from the logical processor; and a logical synchronization unit
configured to exchange the performance data from each of the
non-synchronized processor elements and return synchronized
performance data to the performance monitoring logic, the
synchronized performance data being generated by the processor
elements.
2. A redundant processor system as recited in claim 1, wherein each
of the processor elements is further configured to conform the
performance data exchanged from each of the processor elements to
generate the synchronized performance data.
3. A redundant processor system as recited in claim 2, wherein each
of the processor elements is further configured to average the
performance data exchanged from each of the processor elements to
generate the synchronized performance data.
4. A redundant processor system as recited in claim 2, wherein each
of the processor elements is further configured to conform the
performance data exchanged from each of the processor elements
based on a deterministic algorithm to generate the synchronized
performance data.
5. A redundant processor system as recited in claim 2, wherein each
of the processor elements is further configured to select the
performance data from a particular processor element to generate
the synchronized performance data, the selected performance data
being at least one of a minimum, a middle, or a maximum of the
performance data exchanged from each of the processor elements.
6. A redundant processor system as recited in claim 1, wherein: a
first time duration of a processor event is stored as performance
data in a first accumulator of a first processor element; a second
time duration of the processor event is stored as performance data
in a second accumulator of a second processor element; a third time
duration of the processor event is stored as performance data in a
third accumulator of a third processor element; and the logical
synchronization unit is further configured to receive the first
time duration, the second time duration, and the third time
duration, and exchange the time durations with each of the
processor elements.
7. Non-synchronized processors of a multiple redundant processor
system each configured to maintain and update performance data
associated with executing processor events, the performance data
stored in one or more accumulators of a respective non-synchronized
processor, and the performance data from each of the
non-synchronized processors being conformed as synchronized
performance data after being exchanged via a logical
synchronization unit in response to a request for the performance
data from a performance monitoring application.
8. Non-synchronized processors as recited in claim 7, wherein each
of the non-synchronized processors is further configured to
conform the performance data from each of the non-synchronized
processors after the performance data is exchanged via the logical
synchronization unit.
9. Non-synchronized processors as recited in claim 7, wherein each
of the non-synchronized processors is further configured to
average the performance data from each of the non-synchronized
processors after the performance data is exchanged via the logical
synchronization unit.
10. Non-synchronized processors as recited in claim 7, wherein a
time duration of a processor event is stored as the performance
data in an accumulator of the respective non-synchronized
processor.
11. Non-synchronized processors as recited in claim 7, wherein
counts for multiple executions of a repeated processor event are
stored as the performance data in an accumulator of the respective
non-synchronized processor.
12. Non-synchronized processors as recited in claim 7, wherein time
durations for multiple executions of a repeated processor event are
stored and updated as the performance data in an accumulator of the
respective non-synchronized processor.
13. Non-synchronized processors as recited in claim 7, wherein each
non-synchronized processor includes a clock, and wherein: a first
time is obtained from the clock at a beginning of a processor
event, and the first time is subtracted from an initial time stored
in an accumulator of the respective non-synchronized processor; a
second time is obtained from the clock after the processor event
has been executed by the non-synchronized processor; and the second
time is added to the accumulator such that a time difference
between the first time and the second time is a time duration of
the processor event that is maintained as the performance data in
the accumulator of the respective non-synchronized processor.
14. A method, comprising: processing events with non-synchronized
processor elements of a logical processor in a redundant processor
system; storing performance data associated with execution of the
processor events in one or more accumulators corresponding to a
respective processor element; exchanging the performance data such
that each of the processor elements includes the performance data
from each of the other non-synchronized processor elements;
conforming the performance data from each of the non-synchronized
processor elements to generate synchronized performance data; and
communicating the synchronized performance data to a performance
monitoring application that requests the performance data from the
logical processor.
15. A method as recited in claim 14, wherein each of the processor
elements conforms the performance data exchanged from each of the
processor elements to generate the synchronized performance
data.
16. A method as recited in claim 14, wherein conforming the
performance data includes each of the processor elements averaging
the performance data to generate the synchronized performance
data.
17. A method as recited in claim 14, wherein conforming the
performance data includes each of the processor elements using a
deterministic algorithm to conform the performance data to generate
the synchronized performance data.
18. A method as recited in claim 14, wherein conforming the
performance data includes each of the processor elements selecting
the performance data from a particular processor element, the
selected performance data being at least one of a minimum, a
middle, or a maximum of the performance data exchanged from each of
the processor elements.
19. A method as recited in claim 14, further comprising determining
a time duration of a processor event, and wherein storing the
performance data includes storing the time duration of the
processor event as the performance data.
20. A method as recited in claim 14, further comprising
accumulating counts for multiple executions of a repeated processor
event, and wherein storing the performance data includes storing
the counts of the repeated processor event as the performance
data.
21. A method as recited in claim 14, wherein storing the
performance data includes: storing a first time duration of a
processor event in a first accumulator of a first processor
element; storing a second time duration of the processor event in a
second accumulator of a second processor element; storing a third
time duration of the processor event in a third accumulator of a
third processor element; and wherein conforming the performance
data includes conforming the first time duration, the second time
duration, and the third time duration to generate the synchronized
performance data.
22. A method as recited in claim 14, wherein communicating the
synchronized performance data includes communicating the
synchronized performance data to the performance monitoring
application in a remote computing device configured for
communication with the redundant processor system.
23. One or more computer readable media comprising computer
executable instructions that, when executed, direct a performance
data access system to: process events with non-synchronized
processor elements of a logical processor in a redundant processor
system; store performance data associated with execution of the
processor events in one or more accumulators corresponding to a
respective processor element; conform the performance data from
each of the non-synchronized processor elements to generate
synchronized performance data; and communicate the synchronized
performance data to a performance monitoring application that
requests the performance data from the logical processor.
24. One or more computer readable media as recited in claim 23,
further comprising computer executable instructions that, when
executed, direct the performance data access system to exchange the
performance data such that each of the non-synchronized processor
elements includes the performance data from each of the other
non-synchronized processor elements.
25. One or more computer-readable media as recited in claim 23,
further comprising computer executable instructions that, when
executed, direct the performance data access system to: store a
first time duration of a processor event in a first accumulator of
a first processor element; store a second time duration of the
processor event in a second accumulator of a second processor
element; store a third time duration of the processor event in a
third accumulator of a third processor element; and conform the
first time duration, the second time duration, and the third time
duration to generate the synchronized performance data.
Description
RELATED APPLICATION
[0001] This application claims priority to U.S. Provisional
Application Ser. No. 60/557,812 filed Mar. 30, 2004, entitled
"Nonstop Advanced Architecture", the disclosure of which is
incorporated by reference herein.
TECHNICAL FIELD
[0002] This invention relates to performance data access.
BACKGROUND
[0003] Multiple redundant processor systems are implemented as
fault-tolerant systems to prevent downtime and system outages and to
avoid data corruption. A multiple redundant processor system
provides continuous application availability and maintains data
integrity for applications such as stock exchange systems, credit and
debit card systems, electronic funds transfer systems, travel
reservation systems, and the like. In these systems, data processing
computations can be performed on multiple, independent processing
elements of a processor system.
[0004] Processors in a multiple redundant processor system can be
loosely synchronized in a loose lock-step implementation such that
processor instructions are executed at slightly different times.
This loosely synchronized implementation allows each of the
processors to execute the same instruction set faster than in a
typical tight lock-step configuration because the processors are
not restricted to synchronized code execution. The performance of a
multiple redundant processor system can be monitored to determine
optimizations for software processing and for hardware
configurations, such as cache management and cache configuration to
optimize cache hit rates.
[0005] When performance data is requested, such as the processing
time for a processor event, the loosely synchronized processor
elements all execute the same instruction set in response to the
request, but each may return a different performance response
because the performance data is likely asymmetric (i.e., different
in each of the multiple processor elements). The differing data
responses will appear as an error to the performance monitoring
application that requested the data.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The same numbers are used throughout the drawings to
reference like features and components:
[0007] FIG. 1 illustrates an exemplary redundant processor system
in which an embodiment of performance data access can be
implemented.
[0008] FIG. 2 further illustrates various components of the
exemplary redundant processor system shown in FIG. 1.
[0009] FIG. 3 illustrates various components of an exemplary
redundant processor system in which an embodiment of performance
data access can be implemented.
[0010] FIG. 4 illustrates various components of an exemplary
redundant processor system in which an embodiment of performance
data access can be implemented.
[0011] FIG. 5 is a flow diagram that illustrates an embodiment of a
method for performance data access.
DETAILED DESCRIPTION
[0012] The following describes embodiments of performance data
access. Performance monitoring is implemented to obtain system
performance data from loosely-synchronized processor elements.
Examples of performance data for a redundant processor system
include time intervals for performing instruction sequences and
counts of various processor events.
[0013] Although embodiments of performance data access may be
implemented in various redundant processor systems, performance
data access is described with reference to the following processing
environment.
[0014] FIG. 1 illustrates an example of a redundant processor
system 100 in which embodiment(s) of performance data access can be
implemented. The redundant processor system 100 includes a
processor complex 102 which has processor groups 104(1-3). Each
processor group 104 includes any number of processor elements which
are each a microprocessor that executes, or processes, computer
executable instructions. For example, processor group 104(1)
includes processor elements 106(1-N), processor group 104(2)
includes processor elements 108(1-N), and processor group 104(3)
includes processor elements 110(1-N).
[0015] Processor elements, one each from the processor groups
104(1-3), are implemented together as a logical processor 112(1-N).
For example, a first logical processor 112(1) includes processor
element 106(1) from processor group 104(1), processor element
108(1) from processor group 104(2), and processor element 110(1)
from processor group 104(3). Similarly, logical processor 112(2)
includes processor elements 106(2), 108(2), and 110(2), while
logical processor 112(3) includes processor elements 106(3),
108(3), and 110(3). In an alternate embodiment, a logical processor
112 may be implemented to include only two processor elements.
For example, a processor complex may be implemented with two
processor groups such that each logical processor includes two
processor elements, one from each of the two processor groups.
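The grouping described in paragraph [0015] amounts to taking the i-th processor element from every processor group to form logical processor i. The sketch below models that with plain string labels; the function name `build_logical_processors` and the dictionary layout are hypothetical illustrations, not structures defined in this application. Passing two groups instead of three gives the dual modular redundancy variant.

```python
from typing import Dict, List

def build_logical_processors(groups: Dict[str, List[str]]) -> List[List[str]]:
    """Form logical processor i from the i-th element of every group.

    Three groups give triple modular redundancy; two groups give dual
    modular redundancy. Group order follows dict insertion order.
    """
    sizes = {len(elements) for elements in groups.values()}
    if len(sizes) != 1:
        raise ValueError("every processor group must have the same number of elements")
    n = sizes.pop()
    # Transpose: one element from each group per logical processor.
    return [[elements[i] for elements in groups.values()] for i in range(n)]

groups = {
    "104(1)": ["106(1)", "106(2)", "106(3)"],
    "104(2)": ["108(1)", "108(2)", "108(3)"],
    "104(3)": ["110(1)", "110(2)", "110(3)"],
}
logical = build_logical_processors(groups)
print(logical[0])  # ['106(1)', '108(1)', '110(1)']
```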
[0016] In the example shown in FIG. 1, the three processor elements
combine to implement a logical processor 112 and cooperate to
perform the computations of the logical processor 112. Logical
computations for an input/output operation or an interprocessor
communication are executed separately three times in a logical
processor 112, once each in the three processor elements of the
logical processor 112. Additionally, the three processor elements
in a logical processor 112 can coordinate and synchronize with each
other to exchange data, replicate input data, and vote on
input/output operations and communication outputs.
[0017] Each processor group 104(1-3) has an associated memory
component 114(1-3), respectively. A memory component 114 can be
implemented as any one or more memory components, examples of which
include random access memory (RAM), DRAM, SRAM, a disk drive, and
the like. Although the memory components 114(1-3) are illustrated
as independent components, each processor group 104 can include a
respective memory component 114 as an integrated component in an
alternate embodiment.
[0018] In this example, processor complex 102 is a triplex
redundant processor system having triple modular redundancy in that
each logical processor 112 includes three redundant processor
elements. To maintain data integrity, a faulty processor element
can be replaced and reintegrated into the system while the
redundant processor system 100 remains on-line without a loss of
processing capability. Similarly, in an alternate embodiment, a
duplex redundant processor system has dual modular redundancy in
that each logical processor includes two redundant processor
elements.
[0019] The processor elements of a logical processor 112 are
loosely synchronized in a loose lock-step implementation such that
instructions may be executed, or processed, in each of the
processor elements at a slightly different time. This
implementation allows the logical processors to execute
instructions faster than in a typical tight lock-step configuration
because the processor elements and logical processors 112 are not
restricted to synchronized code execution. This implementation also
provides for non-deterministic execution among the processor
elements in a logical processor, such as non-deterministic branch
prediction, cache replacement algorithms, and the like. The
individual processor elements can also perform independent error
recovery without losing synchronization with the other processor
elements.
[0020] FIG. 2 further illustrates various components 200 of the
redundant processor system 100 shown in FIG. 1. The processor
elements 106(1-N) of processor group 104(1) are shown, one each of
a respective logical processor 112(1-N). Each processor element
106(1-N) is associated with a respective memory region 202(1-N) of
the memory component 114(1) for data storage. The memory component
114(1) associated with processor group 104(1) is partitioned among
the processor elements 106(1-N) of the processor group 104(1). In
an alternate embodiment, each memory region 202(1-N) can be
implemented as an independent, separate memory for data storage.
Although not shown, the processor elements 108(1-N) of processor
group 104(2) are each associated with a respective partitioned
memory region of the memory component 114(2). Similarly, the
processor elements 110(1-N) of processor group 104(3) are each
associated with a respective partitioned memory region of the
memory component 114(3).
[0021] Each of the logical processors 112(1-N) corresponds to one or
more respective logical synchronization units 204(1-N). A logical
synchronization unit 204 performs various rendezvous operations for
an associated logical processor 112 to achieve agreements on data
synchronization between the processor elements that cooperate to
form a logical processor 112. For example, input/output operations
and/or interprocessor communications can be communicated from each
processor element of a logical processor 112 to an associated
logical synchronization unit 204 to compare and vote on the
input/output operations and/or interprocessor communications
generated by the processor elements. Logical synchronization units
and rendezvous operations are described in greater detail in U.S.
patent application Ser. No. ______, which is Attorney Docket No.
200316143-1 entitled "Method and System of Executing User Programs
on Non-Deterministic Processors" filed Jan. 25, 2005, to Bernick et
al., the disclosure of which is incorporated by reference herein
for the purpose of implementing performance data access.
[0022] A rendezvous operation may further be implemented by a
logical synchronization unit 204 to exchange state information
and/or data among the processor elements of a logical processor 112
to synchronize operations and responses of the processor elements.
For example, a rendezvous operation may be used to make the
processor elements respond deterministically to incoming
asynchronous interrupts, to accommodate varying processing rates of
the processor elements, to exchange software state information when
performing operations that are distributed across the processor
elements, and the like.
[0023] FIG. 3 illustrates various components of an exemplary
redundant processor system 300 in which an embodiment of
performance data access can be implemented. The redundant processor
system 300 includes multiple logical processors and associated
logical synchronization units as described with reference to the
redundant processor system 100 shown in FIGS. 1 and 2. For
illustration, however, only one logical processor 302 and one
associated logical synchronization unit 304 are shown in FIG. 3. The
logical synchronization unit 304 may be implemented as described
with reference to the logical synchronization units 204 shown in
FIG. 2.
[0024] In this example, logical processor 302 includes processor
elements 306(1-3) which are each a microprocessor that executes, or
processes, computer executable instructions. The redundant
processor system 300 includes the memory components 114(1-3) that
are each associated with a respective processor group 104(1-3) as
shown in FIG. 1. Each of the processor elements 306(1-3) is one of
the processor elements in a respective processor group, and each
processor element 306 is associated with a partitioned memory
region 308 in a respective memory component 114(1-3). For example,
processor element 306(1) corresponds to memory region 308(1) in
memory component 114(1), processor element 306(2) corresponds to
memory region 308(2) in memory component 114(2), and processor
element 306(3) corresponds to memory region 308(3) in memory
component 114(3).
[0025] The memory regions 308(1-3) form a logical memory 310 that
corresponds to logical processor 302. The processor elements
306(1-3) of the logical processor 302 each correspond to a
respective partitioned memory region 308(1-3) of the logical memory
310. In practice, a logical processor 302 can communicate with a
corresponding logical memory 310 via an input/output bridge memory
controller (not shown).
[0026] The memory components 114(1-3) each include an instantiation
of performance monitoring logic 312(1-3) that corresponds to a
respective processor element 306(1-3) of the logical processor 302.
Each of the processor elements 306(1-3) can execute the performance
monitoring logic 312 to implement performance data access. In this
example, the performance monitoring logic 312(1-3) is maintained by
the memory components 114(1-3) as a software application.
[0027] As used herein, the term "logic" (e.g., the performance
monitoring logic 312) can also refer to hardware, firmware,
software, or any combination thereof that may be implemented to
perform the logical operations associated with performance data
access. Logic may also include any supporting circuitry utilized to
complete a given task including supportive non-logical operations.
For example, logic may also include analog circuitry, memory
components, input/output (I/O) circuitry, interface circuitry,
power providing/regulating circuitry, and the like.
[0028] Each of the processor elements 306(1-3) of logical processor
302 includes a high-frequency clock 314, a cache memory 316, and one
or more accumulators 318. For illustration, only the
clock 314, cache memory 316, and accumulator(s) 318 for processor
element 306(1) are shown. The description of the processor element
components, however, applies to each processor element 306(1-3).
The one or more accumulators 318 of a processor element 306 can be
implemented as memory to store, update, and/or maintain performance
data corresponding to a respective processor element 306.
[0029] The performance monitoring logic 312(1-3) implements
performance data access such that system performance data can be
obtained from the non-synchronized processor elements 306(1-3) of
the logical processor 302. The performance of the processor
elements 306(1-3) can be monitored for time durations to execute
processor events, such as a procedure, and for any number of other
operational features, such as cache hit rates, interrupt handling,
and the like. While the non-synchronized processor elements
306(1-3) all execute the same instruction set (e.g., a processor
event or procedure), each may return a different performance
response, and the corresponding performance data is likely
asymmetric (i.e., different in each of the multiple processor
elements 306).
[0030] The different performance data responses from each of the
processor elements 306(1-3) may appear as an error when the data is
compared by the logical synchronization unit 304, such as when an
output operation of the performance data response is performed. The
different performance data responses may also appear as an error if
the performance monitoring logic 312 makes a decision based on that
data and branches in two (or three) different directions, causing
different action sequences that can be detected by the logical
synchronization unit 304.
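To see why asymmetric performance data looks like a fault, consider a simple majority comparison of the kind a logical synchronization unit might apply to outputs. The `vote` function below is a hypothetical sketch, not the voting logic of the logical synchronization unit 304 or of the referenced Bernick et al. application; it only shows that healthy elements reporting raw, unconformed timings fail a comparison that identical outputs pass.

```python
from collections import Counter
from typing import Optional, Sequence

def vote(outputs: Sequence[object]) -> Optional[object]:
    """Return the value produced by a strict majority of processor
    elements, or None when no majority exists (treated as a miscompare)."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None

# Deterministic work produces identical outputs, which agree.
print(vote(["write 0x10", "write 0x10", "write 0x10"]))  # write 0x10
# Raw timings differ per element, so no majority exists and the
# comparison reports an error even though every element is healthy.
print(vote([6.3, 6.4, 5.9]))  # None
```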
[0031] In an embodiment of performance data access, the performance
data requested by the performance monitoring logic 312 can be
exchanged via a rendezvous operation with the logical
synchronization unit 304 such that the performance monitoring logic
312 receives consistent data from the processor elements 306(1-3).
For example, a procedure may take 6.3 microseconds for processor
element 306(1) to execute, 6.4 microseconds for processor element
306(2) to execute, and 5.9 microseconds for processor element
306(3) to execute. The time duration for each processor element 306
to execute the procedure can be stored in an accumulator 318 for
each respective processor element 306(1-3).
[0032] When the performance data for each of the processor elements
306(1-3) is requested by the performance monitoring logic 312, the
logical synchronization unit 304 exchanges the performance data of
each of the processor elements such that each processor element has
a copy of all three processor elements' individual performance
measurement. For example, processor element 306(1) will have the
6.3 microseconds to execute the procedure, the 6.4 microseconds for
processor element 306(2) to execute the procedure, and the 5.9
microseconds for processor element 306(3) to execute the
procedure.
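The exchange in paragraph [0032] can be modeled as an all-to-all copy: each processor element contributes its own measurement and receives the full set. The `exchange` function is a hypothetical in-process stand-in for the rendezvous performed by the logical synchronization unit 304. Ordering the values by element index is an assumption made here so that every element later runs its conforming step on exactly the same input.

```python
from typing import Dict, List

def exchange(local_values: Dict[int, float]) -> Dict[int, List[float]]:
    """Simulate a rendezvous: each processor element, identified by index,
    receives every element's value, ordered by element index."""
    ordered = [local_values[pe] for pe in sorted(local_values)]
    # Each element gets its own copy of the same ordered list.
    return {pe: list(ordered) for pe in local_values}

# Microseconds measured for one procedure on elements 306(1), 306(2), 306(3).
after = exchange({1: 6.3, 2: 6.4, 3: 5.9})
print(after[1])  # [6.3, 6.4, 5.9]
```

This also matches the replication variant of paragraph [0035]: after the exchange, every element holds values A, B, and C.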
[0033] Each of the processor elements 306(1-3) then conforms, or
synchronize, the performance data. In this example, the 6.3
microseconds, 6.4 microseconds, and 5.9 microseconds can be
averaged as 6.2 microseconds to execute the procedure. The
averaging operation is deterministic, and all three processor
elements 306(1-3) will arrive at the same answer of 6.2
microseconds. The average 6.2 microseconds is then returned to the
performance monitoring logic 312 as the synchronized performance
data.
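A minimal sketch of the averaging step follows. It assumes measurements are held as integer nanoseconds, an assumption not stated in the application, so that every element computes a bit-identical result with integer division. With floating-point values, the same guarantee holds only if every element adds the values in the same order using the same arithmetic.

```python
from typing import List

def conform_average(values_ns: List[int]) -> int:
    """Conform exchanged measurements by integer averaging.

    Every processor element calls this on the same ordered list, so all
    of them return the same number without further communication.
    """
    return sum(values_ns) // len(values_ns)

exchanged = [6300, 6400, 5900]  # 6.3 us, 6.4 us, 5.9 us in nanoseconds
results = [conform_average(exchanged) for _ in range(3)]  # one call per element
print(results)  # [6200, 6200, 6200]
```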
[0034] Other conforming operations or algorithms can be implemented
to synchronize the performance data from the multiple processor
elements 306(1-3). For example, the processor elements 306(1-3) can
select a performance measurement from any one of the processor
elements 306(1-3), such as the minimum performance measurement, the
middle performance measurement, or the maximum performance
measurement corresponding to a particular processor element 306.
Alternatively, the processor elements 306(1-3) can discard the
performance data value that is the farthest from the other two, and
then average the two remaining performance data values (e.g., for a
system with triple modular redundancy), or any other form of a
deterministic algorithm can be implemented.
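The alternative conforming operations above (selecting the minimum, middle, or maximum value, or discarding the value farthest from the other two and averaging the rest) can be sketched as follows. The function names and the tie-breaking rule (drop the maximum when both ends are equally far) are assumptions for illustration; any rule works as long as every element applies the same one.

```python
from typing import List

def conform_select(values: List[int], which: str) -> int:
    """Select the minimum, middle, or maximum exchanged measurement.
    For an even count, "middle" picks the upper of the two middle values."""
    ordered = sorted(values)
    if which == "min":
        return ordered[0]
    if which == "middle":
        return ordered[len(ordered) // 2]
    if which == "max":
        return ordered[-1]
    raise ValueError(f"unknown selection: {which}")

def conform_drop_outlier(values: List[int]) -> int:
    """For three values, discard the one farthest from the other two and
    average the remaining pair. The middle value can never be farthest;
    a tie drops the maximum so every element makes the same choice."""
    if len(values) != 3:
        raise ValueError("this variant assumes triple modular redundancy")
    low, mid, high = sorted(values)
    if mid - low > high - mid:
        pair = (mid, high)  # low is the outlier
    else:
        pair = (low, mid)   # high is the outlier, or a tie
    return (pair[0] + pair[1]) // 2

exchanged = [6300, 6400, 5900]  # nanoseconds
print(conform_select(exchanged, "min"), conform_select(exchanged, "middle"),
      conform_select(exchanged, "max"))  # 5900 6300 6400
print(conform_drop_outlier(exchanged))   # 6350
```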
[0035] Alternatively, each processor element 306(1-3) can replicate
the performance measurements from the other processor elements 306.
For example, before the data exchange through the logical
synchronization unit 304, processor element 306(1) will have value A,
processor element 306(2) will have value B, and processor element
306(3) will have value C. After the data exchange, each processor
element 306(1-3) will have all three values A, B, and C, which are
replicated as if each processor element generated the performance
data three times rather than just once.
[0036] In an implementation, the time duration of a processor event
can be determined by obtaining a first time from a clock 314 of the
respective processor element 306 at the beginning of a processor
event, and subtracting the first time from an accumulator 318 of
the processor element 306. A second time can be obtained from the
clock 314 after the processor event has been executed by the
processor element. The second time is then added to the accumulator
318 such that a time difference between the first time and the
second time is the time duration of the processor event. The time
duration is maintained in the accumulator 318 as the performance
data.
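The subtract-then-add technique described above lets an accumulator build a running total of durations without storing the start time separately. In this sketch, Python's monotonic `time.perf_counter_ns()` stands in for the high-frequency clock 314; the `Accumulator` class and its `begin`/`end` methods are hypothetical names.

```python
import time

class Accumulator:
    """Hypothetical per-element accumulator for event durations (clock ticks)."""

    def __init__(self) -> None:
        self.value = 0  # accumulated duration
        self.count = 0  # number of completed events

    def begin(self, now: int) -> None:
        # Subtract the first clock reading at the start of the event.
        self.value -= now

    def end(self, now: int) -> None:
        # Add the second reading; the net change is (end - start).
        self.value += now
        self.count += 1

acc = Accumulator()
acc.begin(1_000)
acc.end(7_300)      # an event lasting 6,300 ticks
print(acc.value)    # 6300

# The same pattern with a real monotonic clock as the tick source.
acc.begin(time.perf_counter_ns())
sum(range(10_000))  # the "processor event" being timed
acc.end(time.perf_counter_ns())
```

Between `begin` and `end` the accumulator temporarily holds a large negative number, so a reader should only sample it when no event is in progress. The clock must also be monotonic for the difference to be a valid duration.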
[0037] For multiple performance data requests, alternate
embodiments of performance data access can be implemented if it is
not practicable to conform each individual performance data
measurement of the processor elements 306(1-3). For example, the
processor time required to accomplish each individual exchange and
conforming operation may not be available within the implementation
constraints of a redundant processor system.
[0038] In another embodiment of performance data access, the
performance data is accumulated, or aggregated, in the accumulators
318 for the respective processor elements 306(1-3). For example,
time durations for multiple executions of a repeated processor
event can be stored and updated as the performance data in the
accumulators 318 of each respective processor element 306(1-3). A
procedure may be executed as a processor event multiple times by
each of the processor elements 306(1-3). For a procedure that is
executed ten thousand times and that takes on average 3
microseconds to execute, the accumulated time duration would be
approximately 30 milliseconds. An accumulator 318 for processor
element 306(1) can have stored performance data of 31.5
milliseconds, an accumulator 318 for processor element 306(2) can
have stored performance data of 32.3 milliseconds, and an
accumulator 318 for processor element 306(3) can have stored
performance data of 29.7 milliseconds.
[0039] When the performance data for each of the processor elements
306(1-3) is requested by the performance monitoring logic 312, the
logical synchronization unit 304 exchanges the data, and the
performance data of the processor elements 306(1-3) is synchronized
by averaging it (or by another conforming operation). In this
example, dividing each accumulated time by the ten-thousand
executions gives an average of 3.15 microseconds for processor
element 306(1), 3.23 microseconds for processor element 306(2), and
2.97 microseconds for processor element 306(3). These averages can
be averaged, or conformed, to approximately 3.12 microseconds per
execution of the procedure. The approximate 3.12 microseconds is
then returned to the performance monitoring logic 312 as the
synchronized performance data.
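The arithmetic of paragraphs [0038] and [0039] can be checked with a short sketch. It assumes, as in the example, that every processor element executed the procedure ten-thousand times and that the conforming operation is an arithmetic mean. The function name `conform_average_us` is illustrative only.

```python
from statistics import mean
from typing import Dict, Tuple


def conform_average_us(accumulated_ms: Dict[str, float], calls: int) -> Tuple[Dict[str, float], float]:
    """Return each element's microseconds per call and their conformed (mean) value."""
    # Convert each accumulated total from milliseconds to microseconds per execution.
    per_call_us = {pe: total_ms * 1000.0 / calls for pe, total_ms in accumulated_ms.items()}
    # Conforming operation assumed here: mean of the per-element averages, to 0.01 us.
    conformed = round(mean(per_call_us.values()), 2)
    return per_call_us, conformed


# Accumulated durations (ms) from the example in [0038], one per processor element.
per_call, conformed = conform_average_us(
    {"306(1)": 31.5, "306(2)": 32.3, "306(3)": 29.7}, calls=10_000
)
print(per_call)
print(conformed)
```

Dividing the sum of the three accumulators (93.5 milliseconds) by the thirty-thousand total executions gives the same result of about 3.12 microseconds, because every element executed the procedure the same number of times.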
[0040] This embodiment of performance data access avoids the
extensive processing overhead of exchanging and conforming the
performance data for each individual measurement, and provides
performance data obtained for multiple processor events over a
duration of time. The asymmetric performance data is maintained by
the accumulators 318 in each respective processor element 306 such
that the performance monitoring logic 312 cannot directly access
the performance data. Rather, the performance monitoring logic
interfaces with the processor elements 306(1-3) of the logical
processor 302 via application program interfaces (APIs) for
performance data access.
[0041] In an implementation of performance data access, code (e.g.,
software) executing in each of the processor elements 306(1-3)
interfaces with an array of the accumulators 318. The performance
monitoring logic 312 calls the code via APIs to register and have
accumulator(s) allocated, and to request the performance data
stored in the accumulator(s). The code communicates the requested
performance data to the logical synchronization unit 304, and the
performance data is conformed, or synchronized. In an embodiment,
the code can be implemented as millicode, which runs as the
lowest-level software in the operating system.
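Paragraph [0041] describes the interface only in terms of registering, allocating accumulators, and requesting their contents. The following is a hypothetical sketch of such an interface for one processor element. The class and method names are assumptions, not an API defined by the source, and the exchange through the logical synchronization unit 304 is represented by an injected `conform` function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AccumulatorService:
    """Hypothetical per-element code (e.g., millicode) exposing accumulators 318."""

    # Stands in for the exchange through LSU 304 followed by the conforming operation.
    conform: Callable[[List[int]], List[int]]
    _accumulators: List[int] = field(default_factory=list, init=False)
    _owners: Dict[int, List[int]] = field(default_factory=dict, init=False)
    _next_client: int = field(default=0, init=False)

    def register(self, count: int) -> int:
        """Register a monitoring client and allocate `count` zeroed accumulators for it."""
        client = self._next_client
        self._next_client += 1
        start = len(self._accumulators)
        self._accumulators.extend([0] * count)
        self._owners[client] = list(range(start, start + count))
        return client

    def add(self, client: int, slot: int, amount: int) -> None:
        """Update one of the client's accumulators, e.g. from the timing code."""
        self._accumulators[self._owners[client][slot]] += amount

    def request(self, client: int) -> List[int]:
        """Return the client's accumulator values after they have been conformed."""
        local = [self._accumulators[i] for i in self._owners[client]]
        return self.conform(local)


# Single-element demonstration with an identity conform step.
svc = AccumulatorService(conform=lambda values: values)
client = svc.register(2)
svc.add(client, 0, 250)
print(svc.request(client))
```

Allocating accumulators per client means the monitoring logic never handles raw indices into the accumulator array. This is consistent with [0040], where the monitoring logic reaches the performance data only through APIs.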
[0042] FIG. 4 illustrates various components of an exemplary
redundant processor system 400 in which an alternate embodiment of
performance data access can be implemented. As described above with
reference to the exemplary redundant processor system 300 shown in
FIG. 3, logical processor 302 includes processor elements 306(1-3)
which are each a microprocessor that executes processor events as
computer executable instructions. The redundant processor system
400 includes the memory components 114(1-3) that are each
associated with a respective processor group 104(1-3) as shown in
FIG. 1. Further, each processor element 306 is associated with a
partitioned memory region 308 in a respective memory component
114(1-3).
[0043] Each of the processor elements 306(1-3) of logical processor
302 includes a high-frequency clock 314, a cache memory 316, and one
or more accumulators 318. For illustration, only the
clock 314, cache memory 316, and accumulator(s) 318 for processor
element 306(1) are shown. The description of the processor element
components, however, applies to each processor element 306(1-3).
The one or more accumulators 318 of a processor element 306 can be
implemented as memory to store, update, and/or maintain performance
data corresponding to the respective processor element 306.
[0044] The exemplary redundant processor system 400 includes a
remote computing device 402 configured for communication with
components of the redundant processor system via a communication
network 404. The remote computing device 402 includes a performance
monitoring application 406 which implements performance data access
as described above with reference to FIG. 3. Performance data can
be requested by the performance monitoring application 406 and
obtained from the non-synchronized processor elements 306(1-3) of
the logical processor 302.
[0045] The performance of the processor elements 306(1-3) can be
monitored for time durations to execute processor events, such as a
procedure, and for any number of other operational features, such
as cache hit rates, interrupt handling, and the like. While the
non-synchronized processor elements 306(1-3) all execute the same
instruction set (e.g., a processor event), each may return a
different performance response and the corresponding performance
data is likely asymmetric (e.g., different in each of the multiple
processor elements 306). The different performance data responses
from each of the processor elements 306(1-3) may appear as an error
to the performance monitoring application 406 when the performance
data responses are compared (or "voted") by the logical
synchronization unit 304.
[0046] The performance data requested by the performance monitoring
application 406 can be exchanged via a rendezvous operation with
the logical synchronization unit 304 and synchronized in each of
the processor elements 306(1-3) such that the performance
monitoring application 406 receives consistent data from each of
the processor elements 306(1-3). The performance monitoring
application 406 calls code (e.g., software) executed by each of the
processor elements 306(1-3) via APIs to register and have
accumulator(s) allocated, and to request the performance data
stored in the accumulator(s). The code communicates the
requested performance data to the logical synchronization unit 304
which exchanges the performance data. The performance data is
conformed, or synchronized, in the processor elements 306(1-3)
before being returned to the remote computing device 402 and to the
performance monitoring application 406 via the communication
network 404.
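The voting problem described in [0045] and the rendezvous described in [0046] can be illustrated with a small simulation. It is a sketch under stated assumptions. `exchange` models the logical synchronization unit 304 as simply giving each element a copy of every element's value, `conform` uses an integer average as the conforming operation, and `vote` is a simplified all-equal comparison. None of these names come from the source.

```python
from typing import List, Sequence


def exchange(local_values: Sequence[int]) -> List[List[int]]:
    """Give each processor element a copy of the values from all elements (rendezvous model)."""
    return [list(local_values) for _ in local_values]


def conform(all_values: Sequence[int]) -> int:
    """Deterministic conforming step: integer (floor) average of the exchanged values."""
    return sum(all_values) // len(all_values)


def vote(responses: Sequence[int]) -> bool:
    """Simplified vote: the responses agree only if every element returned the same value."""
    return len(set(responses)) == 1


# Accumulated microseconds from [0038], one value per processor element 306(1-3).
raw = [31_500, 32_300, 29_700]
print(vote(raw))  # asymmetric data, so the vote reports a mismatch
conformed = [conform(values) for values in exchange(raw)]
print(conformed, vote(conformed))  # every element now returns the same value
```

The essential property is that every element applies the same deterministic function to the same exchanged inputs, so the conformed results are identical and pass the vote.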
[0047] Methods for performance data access, such as exemplary
method 500 described with reference to FIG. 5, may be described in
the general context of computer executable instructions. Generally,
computer executable instructions include routines, programs,
objects, components, data structures, procedures, modules,
functions, and the like that perform particular functions or
implement particular abstract data types. The methods may also be
practiced in a distributed computing environment where functions
are performed by remote processing devices that are linked through
a communications network. In a distributed computing environment,
computer executable instructions may be located in both local and
remote computer storage media, including memory storage
devices.
[0048] FIG. 5 illustrates an embodiment of a method 500 for
performance data access. The order in which the method is described
is not intended to be construed as a limitation, and any number of
the described method blocks may be combined in any order to
implement the method. Furthermore, the method can be implemented in
any suitable hardware, software, firmware, or combination
thereof.
[0049] At block 502, processor events are processed with
non-synchronized processor elements of a logical processor in a
redundant processor system. For example, each processor element
306(1-3) of logical processor 302 (FIG. 3) executes the same set of
computer executable instructions, such as for a procedure or
processor event. At block 504, time duration(s) of processor events
are determined. For example, time durations for multiple executions
of a repeated processor event can be determined.
[0050] In an embodiment of performance data access, to determine a
time duration of a processor event, a first time is obtained from a
clock of a processor element at block 504(A). For example, a time
is obtained from clock 314 of processor element 306(1) at the
beginning of a processor event. At block 504(B), the first time is
subtracted from a time stored in an accumulator of the processor
element. For example, the time obtained from clock 314 is
subtracted from accumulator 318 for the respective processor
element 306(1).
[0051] If the time stored in the accumulator is initially zero,
then the time obtained from clock 314 will be subtracted from zero
and the accumulator will initially have a negative time. At block
504(C), a second time is obtained from the clock of the processor
element after the processor event has been executed. At block
504(D), the second time is added to the accumulator such that a
time difference between the first time and the second time is the
time duration of the processor event. To accumulate multiple time
durations for multiple executions of a repeated processor event or
procedure, the method blocks 504(A-D) can be repeated to accumulate
the performance data of processor elements 306(1-3). Each beginning
time of a processor event is subtracted from the accumulator at
block 504(B) and each time after the processor event has executed
is added to the accumulator at block 504(D) such that a sum of all
the time differences is accumulated.
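The repetition of blocks 504(A) through 504(D) described in [0051] can be traced with the following sketch. It replays hypothetical (first time, second time) clock readings and records the accumulator after each step. In this trace the value after each subtraction is negative, while the final value is the sum of the individual durations. The function name `accumulate` is illustrative only.

```python
from typing import Iterable, List, Tuple


def accumulate(readings: Iterable[Tuple[int, int]]) -> Tuple[int, List[int]]:
    """Replay blocks 504(A)-504(D) for each (first, second) clock pair."""
    acc = 0  # accumulator initially holds zero
    trace: List[int] = []
    for first, second in readings:
        acc -= first   # 504(A)-(B): subtract the time read at the start of the event
        trace.append(acc)
        acc += second  # 504(C)-(D): add the time read after the event
        trace.append(acc)
    return acc, trace


total, trace = accumulate([(100, 103), (200, 204), (300, 302)])
print(trace)  # [-100, 3, -197, 7, -293, 9]
print(total)  # 9, i.e. 3 + 4 + 2 clock ticks
```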
[0052] At block 506, performance data associated with execution of
the processor event(s) is stored in one or more accumulators
corresponding to a respective processor element. For example, each
processor element 306(1-3) includes one or more accumulators 318 to
store, update, and maintain performance data associated with a
respective processor element 306. Storing the performance data
includes storing time duration(s) of a processor event as the
performance data. For example, processor element 306(1) stores a
first time duration of a processor event in an accumulator 318 of
the processor element 306(1), processor element 306(2) stores a
second time duration of the processor event in an accumulator 318
of the processor element 306(2), and processor element 306(3)
stores a third time duration of the processor event in an
accumulator 318 of the processor element 306(3). Performance data
may also include counts of a repeated processor event, such as
cache hits or misses, for example.
[0053] At block 508, the performance data from each of the
non-synchronized processor elements is conformed as synchronized
performance data. Conforming the performance data includes
averaging the time durations from each of the non-synchronized
processor elements to generate the synchronized performance data.
The logical synchronization unit 304 exchanges the performance data
from each of the processor elements 306(1-3), and each of the
processor elements conforms the performance data to generate the
synchronized performance data.
[0054] At block 510, the synchronized performance data is
communicated to a performance monitoring application or logic that
requests the performance data from the logical processor (e.g., the
performance data stored in the one or more accumulators of the
non-synchronized processor elements). For example, the logical
synchronization unit 304 communicates the synchronized performance
data to the performance monitoring logic 312 (FIG. 3) and/or to a
performance monitoring application 406 in a remote computing device
402 (FIG. 4).
[0055] Although embodiments of performance data access have been
described in language specific to structural features and/or
methods, it is to be understood that the subject of the appended
claims is not necessarily limited to the specific features or
methods described. Rather, the specific features and methods are
disclosed as exemplary implementations of performance data
access.