U.S. patent application number 13/586174 was filed with the patent office on 2012-08-15 and published on 2012-12-06 for performance visualization including hierarchical display of performance data.
This patent application is currently assigned to Cray Inc. Invention is credited to Luiz DeRose and Dean T. Johnson.
Publication Number | 20120311537 |
Application Number | 13/586174 |
Family ID | 39304492 |
Filed Date | 2012-08-15 |
United States Patent Application | 20120311537 |
Kind Code | A1 |
DeRose; Luiz ; et al. | December 6, 2012 |
PERFORMANCE VISUALIZATION INCLUDING HIERARCHICAL DISPLAY OF
PERFORMANCE DATA
Abstract
Systems and methods provide a display indicating performance
characteristics of a computer application. The display may include
a call graph having nodes that represent subunits of the
application. A first set of statistics for the subunit may be
represented in the size or dimensions of the node. A second set of
statistics may be displayed in the interior of the node. A third
set of statistics may be displayed in response to selecting the
node.
Inventors: | DeRose; Luiz (Mendota Heights, MN); Johnson; Dean T. (Mendota Heights, MN) |
Assignee: | Cray Inc., Seattle, WA |
Family ID: | 39304492 |
Appl. No.: | 13/586174 |
Filed: | August 15, 2012 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
11874017 | Oct 17, 2007 | 8286135 |
13586174 | | |
60829823 | Oct 17, 2006 | |
Current U.S. Class: | 717/125 |
Current CPC Class: | G06F 11/3466 20130101; G06F 11/3452 20130101; G06F 2201/88 20130101; G06F 2201/885 20130101; G06F 2201/865 20130101; G06F 11/3409 20130101 |
Class at Publication: | 717/125 |
International Class: | G06F 9/44 20060101 G06F009/44 |
Claims
1. A tangible computer-readable medium having computer executable
instructions that when executed perform a method, the method
comprising: receiving performance data for a computer application
having subunits, the performance data having a first set of
performance statistics with a performance statistic for at least
some of the subunits; and displaying a call graph having nodes that
each correspond to a subunit, the call graph being displayed as a
hierarchical representation of calls between the subunits, the size
of the displayed nodes indicating the performance statistic from
the first set of performance statistics for the subunit that the
node represents, such that sizes of the displayed nodes vary based
on the performance statistic for each node.
2. The tangible computer-readable medium of claim 1 wherein the
performance data further has a second set of performance
statistics with a performance statistic for at least some of the
subunits and wherein the interiors of the displayed nodes include
an indication of the performance statistic from the second set of
performance statistics for the subunit that the node represents,
such that each node indicates two performance statistics.
3. The tangible computer-readable medium of claim 2 wherein the
indication of the performance statistic from the second set of
performance statistics is in a representation that is selected from
the group consisting of a bar graph, a pie chart, and text.
4. The tangible computer-readable medium of claim 2 wherein the
performance statistic from the second set of performance statistics
is a statistic that is aggregated from execution of a subunit on
multiple processors of a multi-processor system.
5. The tangible computer-readable medium of claim 2 wherein the
performance data further has a third set of performance
statistics with a performance statistic for at least some of the
subunits and further comprising simultaneously displaying, for a
displayed node, a graphic representation for that displayed node
indicating a performance statistic of the third set of performance
statistics for the displayed node.
6. The tangible computer-readable medium of claim 5 wherein the
graphic representation is displayed hierarchically below the
displayed node whose performance statistic the displayed graphic
representation indicates.
7. The tangible computer-readable medium of claim 5 wherein the
graphic representation is displayed when a cursor hovers over a
displayed node.
8. The tangible computer-readable medium of claim 1 wherein a
displayed node is highlighted to indicate a performance
characteristic.
9. The tangible computer-readable medium of claim 1 wherein the
performance characteristic is selected from the group consisting of
a hot spot and a bottleneck.
10. A method performed by a computer system for providing a
visualization of performance data of a computer program having
subunits, the method comprising: accessing performance data for the
computer program, the performance data having a first performance
statistic for at least some of the subunits; and displaying a call
graph having nodes that each correspond to a subunit, the call
graph being displayed as a hierarchical representation of calls
between the subunits, the size of the displayed nodes indicating
the first performance statistic for the subunit that the node
represents.
11. The method of claim 10 wherein the performance data further
has a second performance statistic for at least some of the
subunits and wherein the interiors of the displayed nodes include
an indication of the second performance statistic for the subunit
that the node represents.
12. The method of claim 11 wherein the indication of the second
performance statistic is selected from the group consisting of a
bar graph, a pie chart, and text.
13. The method of claim 11 wherein the second performance statistic
for a subunit is aggregated from execution of the subunit on
multiple processors of a multi-processor system.
14. The method of claim 11 wherein the performance data further
has a third performance statistic for at least some of the
subunits and further comprising simultaneously displaying, for a
displayed node, a graphic representation for that displayed node
indicating the third performance statistic for the displayed
node.
15. The method of claim 14 wherein the graphic representation is
displayed hierarchically below the displayed node whose performance
statistic the displayed graphic representation indicates.
16. The method of claim 15 wherein the graphic representation
is displayed when a cursor hovers over a displayed node.
17. A computing system for providing a visualization of performance
data of a computer program having subunits, comprising: a memory
storing computer-executable instructions of: a component that
accesses performance data for the computer program, the performance
data having a first performance statistic for at least some of the
subunits; and a component that displays a call graph having nodes
that each correspond to a subunit, the call graph being displayed
as a hierarchical representation of calls between the subunits, the
size of the displayed nodes indicating the first performance
statistic for the subunit that the node represents; and a processor
that executes the computer-executable instructions stored in the
memory.
18. The computing system of claim 17 wherein the performance data
further has a second performance statistic for at least some of
the subunits and wherein the interiors of the displayed nodes
include an indication of the second performance statistic for the
subunit that the node represents.
19. The computing system of claim 18 wherein the performance data
further has a third performance statistic for at least some of
the subunits and further comprising simultaneously displaying, for
a displayed node, a graphic representation for that displayed node
indicating the third performance statistic for the displayed
node.
20. The computing system of claim 19 wherein the graphic
representation is displayed when a cursor hovers over a displayed
node.
Description
CROSS-REFERENCE TO RELATED APPLICATION(S)
[0001] This application is a continuation of U.S. patent
application Ser. No. 11/874,017, filed Oct. 17, 2007, entitled
PERFORMANCE VISUALIZATION INCLUDING HIERARCHICAL DISPLAY OF
PERFORMANCE DATA, which claims the benefit of U.S. Provisional
Patent Application No. 60/829,823, filed Oct. 17, 2006, each of
which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] The embodiments of the present invention relate to the
display of performance data for a software application. More
specifically, the embodiments relate to a hierarchical display of
performance data.
LIMITED COPYRIGHT WAIVER
[0003] A portion of the disclosure of this patent document contains
material to which the claim of copyright protection is made. The
copyright owner has no objection to the facsimile reproduction by
any person of the patent document or the patent disclosure, as it
appears in the U.S. Patent and Trademark Office file or records,
but reserves all other rights whatsoever. Copyright © 2005,
2006 Cray Inc.
BACKGROUND
[0004] Computer software applications and programs may be very
complex, and may be run in complex hardware environments such as
multiprocessor environments. Due to the complexity of the software
or the runtime environment, it can be difficult to determine
performance issues such as hotspots or bottlenecks in computer
programs and applications. Previous systems have attempted to solve
the problem by providing call graphs that represent certain aspects
of the execution of an application. However, the call graphs of
previous systems have been limited in the number and type of
statistics represented in the call graph.
SUMMARY
[0005] Systems and methods provide a display indicating performance
characteristics of a computer application. The display may include
a call graph having nodes that represent subunits of the
application. A first set of statistics for the subunit may be
represented in the size or dimensions of the node. A second set of
statistics may be displayed in the interior of the node.
[0006] A further aspect of the systems and methods includes
displaying a third set of statistics in response to selecting the
node.
[0007] A still further aspect of the systems and methods includes
highlighting a node to indicate a performance characteristic.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIG. 1 illustrates example components for building a
software application according to embodiments of the invention.
[0009] FIG. 2 illustrates components used to visualize application
performance data according to embodiments of the invention.
[0010] FIG. 3 is a flowchart illustrating an exemplary method for
displaying application performance data according to example
embodiments of the invention.
[0011] FIG. 4 is an example user interface screen according to
embodiments of the invention.
DETAILED DESCRIPTION
[0012] In the following detailed description, reference is made to
the accompanying drawings that form a part hereof, and in which is
shown by way of illustration, specific embodiments in which the
inventive subject matter may be practiced. These embodiments are
described in sufficient detail to enable those skilled in the art
to practice them, and it is to be understood that other embodiments
may be utilized and that structural, logical, and electrical
changes may be made without departing from the scope of the
inventive subject matter. Such embodiments of the inventive subject
matter may be referred to, individually and/or collectively, herein
by the term "invention" merely for convenience and without
intending to voluntarily limit the scope of this application to any
single invention or inventive concept if more than one is in fact
disclosed.
[0013] The following description is, therefore, not to be taken in
a limited sense, and the scope of the inventive subject matter is
defined by the appended claims.
[0014] In the Figures, the same reference number is used throughout
to refer to an identical component which appears in multiple
Figures. Signals and connections may be referred to by the same
reference number or label, and the actual meaning will be clear
from its use in the context of the description.
[0015] The functions or algorithms described herein are implemented
in hardware, and/or software in embodiments. The software comprises
computer executable instructions stored on computer-readable media
such as memory or other types of storage devices. The term
"computer-readable media" is also used to represent
software-transmitted carrier waves. Further, such functions
correspond to modules, which are software, hardware, firmware, or
any combination thereof. Multiple functions are performed in one or
more modules as desired, and the embodiments described are merely
examples. The software is executed by a digital signal processor,
ASIC, microprocessor, or any other type of processor operating on a
system such as a personal computer, a server, a router, or any other
device capable of processing data, including network
interconnection devices.
[0016] Some embodiments implement the functions in two or more
specific interconnected hardware modules or devices with related
control and data signals communicated between and through the
modules, or as portions of an application-specific integrated
circuit. Thus, the example process flow is applicable to software,
firmware, and hardware implementations.
[0017] FIG. 1 illustrates example components of a system 100 for
building a software application according to embodiments of the
invention. In some embodiments, the system includes a compiler 104
and an application instrumenter 110. Compiler 104 reads one or more
application source code modules 102 as input and produces
application 108 as output. Application 108 may include object code
modules 106 that correspond to source code modules 102. Application
108 may include library modules or references to dynamically loaded
library modules that are loaded when the application is executed.
Compiler 104 may include or invoke a linker or loader to complete
the building of an application 108. Source modules 102 may be in
any programming language, including C, C++, FORTRAN, etc.
[0018] In some embodiments, application instrumenter 110 reads an
application 108 as input and inserts instrumentation code 112 into
the application to produce an instrumented application 114. In
particular embodiments the application instrumenter 110 is the
"pat_build" program available from Cray, Inc. Instrumentation code
112 may include code that produces application performance
information for an application. Such performance information may
include time values (e.g. entry time, exit time, and/or total time)
and hardware counters associated with function entry points.
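By way of illustration only, the idea of instrumentation code that records entry and exit times for a function can be sketched in Python as follows. This is not the "pat_build" program, which operates on a built application; the decorator `instrument`, the store `TIMINGS`, and the helper `total_time` are hypothetical names introduced for this sketch.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: function name -> list of (entry, exit) times.
TIMINGS = defaultdict(list)


def instrument(func):
    """Wrap func so that each call records its entry time and exit time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so every entry has a matching exit.
            TIMINGS[func.__name__].append((entry, time.perf_counter()))
    return wrapper


def total_time(name):
    """Total time: sum of (exit - entry) over all recorded calls of a function."""
    return sum(exit_ - entry for entry, exit_ in TIMINGS[name])


@instrument
def work(n):
    """Example subunit whose calls are timed."""
    return sum(range(n))
```

Calling `work(10)` appends one (entry, exit) pair under the key "work", and `total_time("work")` then gives the accumulated time for that function.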
[0019] In alternative embodiments, the compiler may be directed to
include instrumentation code 112 in an application 108. In further
alternative embodiments, a software developer may insert
instrumentation code or calls to instrumentation code directly into
the source code modules 102.
[0020] While FIG. 1 illustrates a system for creating an
application containing instrumentation code, FIG. 2 illustrates a
system to visualize the performance of an instrumented
application.
[0021] FIG. 2 illustrates components of a system 200 used to
visualize application performance data according to embodiments of
the invention. In some embodiments, the system includes an
instrumented application 114, a hardware execution environment 220,
and a performance visualization tool 206. Instrumented application
114 comprises an application created as described above in FIG.
1.
[0022] In some embodiments, hardware execution environment 220 is a
multiple processor hardware environment. There may be four, tens,
hundreds, or even thousands of processors 202 in the hardware
execution environment 220. Processors 202 may be grouped into nodes
having multiple processors, for example, four processors.
Alternatively, processors 202 may be distributed across a number of
different systems and communicably coupled via a network. In
particular embodiments, hardware execution environment 220 may be a
Cray XT3 from Cray, Inc. Details on a hardware execution
environment 220 used in further particular embodiments may be found
in the document entitled "The BlackWidow High-Radix Clos Network"
which is attached as Appendix A to U.S. Provisional Patent
Application Ser. No. 60/829,823, filed Oct. 17, 2006 and entitled
"PERFORMANCE VISUALIZATION INCLUDING HIERARCHICAL DISPLAY OF
PERFORMANCE DATA", which has been previously incorporated by
reference.
[0023] In general, a processor 202 may be any type of processor,
including scalar processors, vector processors, central processing
units or any other logic circuit that can execute instructions and
manipulate data.
[0024] Application 114 may be run on one or more of the processors
in hardware execution environment 220. Application 114 may be
composed of one or more processes, threads or other execution units
that are distributed across one or more processors 202 for
execution. Further, the modules or functions of application 114 may
be executed across one or more processors 202 in hardware execution
environment 220.
[0025] As application 114 is being executed, application
performance data 204 is created. In some embodiments, application
performance data 204 comprises time data, hardware counters and/or
other performance metrics associated with one or more function
entry and/or exit points for application 114. The time data may
include entry time, exit time or execution time for a function.
Other performance metrics include cache misses, number of calls,
TLB (Translation Lookaside Buffer) misses, I/O counters, message
counters, message sizes and/or bandwidth metrics. The application
performance data 204 may be saved in a file for later analysis. In
some embodiments, application performance data 204 may be formatted
as an XML file.
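As a minimal sketch of reading performance data saved as an XML file, the following Python example parses per-function, per-processor records. The element name `function` and the attributes `name`, `processor`, `calls`, `time`, and `cache_misses` are assumptions made for illustration; the description does not specify the XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the actual file layout is not given in the description.
SAMPLE = """
<performance>
  <function name="main" processor="0" calls="1" time="2.0" cache_misses="10"/>
  <function name="main" processor="1" calls="1" time="4.0" cache_misses="30"/>
  <function name="solve" processor="0" calls="5" time="1.5" cache_misses="7"/>
</performance>
"""


def load_records(xml_text):
    """Return one dict per <function> element, with numeric fields converted."""
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter("function"):
        records.append({
            "name": elem.get("name"),
            "processor": int(elem.get("processor")),
            "calls": int(elem.get("calls")),
            "time": float(elem.get("time")),
            "cache_misses": int(elem.get("cache_misses")),
        })
    return records
```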
[0026] After application 114 has finished execution, a performance
visualization tool 206 may read the application performance data
204 to determine various performance metrics and statistics 210
regarding application 114. In particular embodiments, performance
visualization tool 206 comprises the Cray Apprentice2 performance
visualization tool available from Cray, Inc. In some embodiments,
performance visualization tool 206 provides a call graph display of
functions and/or modules that are executed during the run-time of
application 114. In some embodiments, the call graph includes a set
of nodes that may be presented in a hierarchy in the call graph.
The call graph may be expressed as a directed graph, which
represents the path the application program 114 took during
execution. The nodes of the call graph may be represented by
rectangles. The dimensions of each node, which represent a
subroutine or a code region in the application, may represent a
first set of metrics or statistics for a subroutine, function, or
code region, for example, execution time. At a second level of the
hierarchy, each node in the call graph display may display a second
set of statistics or metrics related to the subroutine, function or
region represented by the node. Further details on a call graph
provided by example embodiments of the invention are provided below
with reference to FIG. 4, while further details on the operation of
performance visualization tool 206 are provided below with
reference to FIG. 3.
[0027] FIG. 3 is a flowchart illustrating example methods for
displaying application performance data according to example
embodiments of the invention. The methods to be performed by the
operating environment constitute computer programs made up of
computer-executable instructions. Describing the methods by
reference to a flowchart enables one skilled in the art to develop
such programs including such instructions to carry out the methods
on suitable computers (the processor or processors of the computer
executing the instructions from computer-readable media such as
ROMs, RAMs, hard drives, CD-ROM, DVD-ROM, flash memory, etc.). The
methods illustrated in FIG. 3 are inclusive of acts that may be
taken by an operating environment executing an example embodiment
of the invention.
[0028] The method begins at block 302 by receiving performance data
for an application. As noted above, the performance data may
include start times, end times, total times, or other time related
data for a function, subroutine, or code region of an application.
Further, the performance data may include hardware counters and/or
other performance metrics associated with one or more function
entry and/or exit points for an application such as cache misses,
number of calls, TLB misses, I/O counters, message counters,
message sizes and/or bandwidth metrics.
[0029] At block 304, a performance analysis tool analyzes the
performance data, and determines at least a first set of statistics
and a second set of statistics for one or more application
subunits, such as functions, subroutines, or code regions within
the application.
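One way the analysis at block 304 might derive statistics for each subunit is sketched below in Python, assuming the performance data has already been reduced to (subunit, processor, time) samples. The function name `aggregate` and the sample layout are hypothetical; the average, maximum, and minimum across processors are the statistics named in the description.

```python
from collections import defaultdict


def aggregate(samples):
    """Aggregate per-processor times into avg/max/min for each subunit.

    samples: iterable of (subunit_name, processor_id, time) tuples.
    """
    by_subunit = defaultdict(list)
    for subunit, _processor, elapsed in samples:
        by_subunit[subunit].append(elapsed)
    return {
        name: {
            "avg": sum(times) / len(times),
            "max": max(times),
            "min": min(times),
        }
        for name, times in by_subunit.items()
    }
```

For example, two samples of 2.0 and 4.0 for the same subunit on different processors yield an average of 3.0, a maximum of 4.0, and a minimum of 2.0.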
[0030] At block 306, the performance analysis tool generates a
graph representing the execution path of the application, with
nodes representing the various subunits.
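The graph generation at block 306 can be illustrated, under the assumption that the performance data yields (caller, callee) pairs, by the following Python sketch. The names `build_call_graph` and `levels` are hypothetical, and the breadth-first depth assignment is only one possible way to arrange the nodes hierarchically.

```python
from collections import defaultdict, deque


def build_call_graph(calls):
    """Build a directed graph: each caller maps to a sorted list of callees."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph.setdefault(callee, set())  # leaf subunits also appear as nodes
    return {node: sorted(children) for node, children in graph.items()}


def levels(graph, root):
    """Assign each node reachable from root its breadth-first depth."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in depth:  # first visit gives the shallowest depth
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth
```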
[0031] At block 308, the performance analysis tool displays the
call graph, where the node size represents a first set of
statistics and wherein a second set of statistics is displayed in
the interior portion of the node. For example, in some embodiments,
the node height may represent a first time statistic such as an
execution time of a code region while a node width may represent a
second time statistic, such as an execution time that also includes
child functions or subroutines executed. The statistics displayed
in the interior of the node may include an average time, maximum
time, minimum time across all processors or other statistics
(hardware counters, cache misses, number of calls, TLB misses, I/O
counters, message counters, message sizes and/or bandwidth metrics)
regarding the execution of the subunit represented by the node. The
interior portion may be displayed as a bar graph, a pie chart, as
text or any other manner that may be used to present statistics. In
addition, other statistics may be presented besides time based
statistics. For example, I/O (input/output), memory usage, or other
execution statistics that may be present in the performance data
may be displayed.
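The mapping from time statistics to node dimensions described above can be sketched as follows, with height taken from the execution time of the code region and width taken from the execution time that also includes child functions. The linear scaling and the pixel bounds `max_px` and `min_px` are assumptions of this sketch, not values from the description.

```python
def node_size(exclusive, inclusive, max_exclusive, max_inclusive,
              max_px=200, min_px=20):
    """Return (width, height) in pixels for one call graph node.

    Height scales with the subunit's own execution time and width with the
    time that also includes its children. Each dimension is scaled against
    the largest value in the graph and clamped so small nodes stay visible.
    """
    def scale(value, largest):
        if largest <= 0:
            return min_px
        return max(min_px, round(max_px * value / largest))

    return scale(inclusive, max_inclusive), scale(exclusive, max_exclusive)
```

With these defaults, a node with half the largest exclusive time and the largest inclusive time is drawn 200 pixels wide and 100 pixels high.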
[0032] At block 310, a performance analysis tool may receive, via a
user interface, an indication that a particular node has been
selected. The indication may be a point and click operation, or it
may be an indication that a pointer cursor is "hovering" over the
node. In some embodiments, the selection of a node may cause a
third set of statistics to be displayed. For example a "tooltip"
box may be displayed that provides detailed statistics regarding
the execution of the subunit represented by the node.
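As an illustration of the third set of statistics shown on selection, the following Python sketch formats the text of a "tooltip" box for a node. The function name `tooltip_text` and the layout of one statistic per line are hypothetical choices; a real user interface toolkit would be responsible for displaying the text on a hover or click event.

```python
def tooltip_text(name, stats):
    """Format a subunit name and its detailed statistics, one per line."""
    lines = [name]
    for key in sorted(stats):  # stable ordering regardless of input order
        lines.append(f"  {key}: {stats[key]}")
    return "\n".join(lines)
```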
[0033] Further, in some embodiments, at block 312 a node may be
highlighted to indicate a characteristic of the node. For example,
a node may be highlighted to indicate that the node is a hot spot,
a bottleneck, or that the node has some other execution
characteristic. The highlighting may include highlighting the
node in a different boundary or interior color, providing a
blinking boundary or interior, providing a boundary having a
different thickness, or any other mechanism known in the art for
highlighting information.
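The description does not state how a hot spot is identified. Purely as an assumed example, the Python sketch below flags subunits whose share of total exclusive time meets a threshold; both the criterion and the default `threshold` of 0.25 are hypothetical.

```python
def hot_spots(exclusive_times, threshold=0.25):
    """Return the names of subunits to highlight as hot spots.

    exclusive_times: dict mapping subunit name -> exclusive execution time.
    A subunit is flagged when its fraction of the total time is at least
    the threshold (an assumed criterion, not taken from the description).
    """
    total = sum(exclusive_times.values())
    if total <= 0:
        return set()
    return {name for name, t in exclusive_times.items()
            if t / total >= threshold}
```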
[0034] FIG. 4 is an example user interface screen 402 according to
embodiments of the invention. The example interface screen includes
a call graph 403 including a plurality of nodes 404, 406, 408 that
represent subunits of a computer application. The display is
hierarchical. The first level of the hierarchy comprises the call
graph 403. The call graph 403 may be expressed as a directed graph,
which represents the path the program takes during execution. The
nodes 404, 406 and 408 of the call graph are represented by
rectangles. The dimensions of each node 404, 406 or 408, which
represent a subunit such as a function, subroutine or a code region
in the application, are a function of a particular metric,
exemplified here with execution time. The height of the node
represents the execution time of the subunit, not counting its
children (exclusive time), and the width of the node represents the
total execution time of the children (or children time). At the
second level of the hierarchy, nodes 408 in the call graph may
display a bar graph having one or more bars 410. In the example
shown, the bars 410 are scaled according to the vertical height of
the node, which at this level of the hierarchy indicates the
maximum value for the same metric. The left bar corresponds to the
average value across all processors, and the right bar corresponds
to the minimum value from all processors. The border of the nodes
can be highlighted as exemplified by border 412 to indicate hot
spots based on another metric, providing a third level of the
hierarchy. Finally, at the next level of the hierarchy, the
performance visualization tool displays a full set of metrics on a
"tooltip" (not shown), when the user places the mouse on top of a
node or when the user selects a node.
[0035] As can be seen from the discussion above, various
embodiments provide a call graph view, but with different
interpretations depending on the level of the hierarchy that the
user is considering. The hierarchical view provided in various
embodiments may expose deeper previously hidden information in a
way that allows the user to intuitively and quickly locate
performance bottlenecks.
[0036] The Abstract is provided to comply with 37 C.F.R.
§ 1.72(b) to allow the reader to quickly ascertain the nature
and gist of the technical disclosure. The Abstract is submitted
with the understanding that it will not be used to limit the scope
or meaning of the claims.
[0037] In the foregoing Detailed Description, various features are
grouped together in a single embodiment for the purpose of
streamlining the disclosure. This method of disclosure is not to be
interpreted as reflecting an intention that the claimed embodiments
have more features than are expressly recited in each claim. Thus
the following claims are hereby incorporated into the Detailed
Description, with each claim standing on its own as a separate
embodiment.
[0038] The foregoing descriptions of specific embodiments of the
present invention have been presented for purposes of illustration
and description. The embodiments presented are not intended to be
exhaustive or to limit the invention to the particular forms
disclosed. It should be understood that one of ordinary skill in
the art can recognize that the teachings of the detailed
description allow for a variety of modifications and variations
that are not disclosed herein but are nevertheless within the scope
of the present invention. Accordingly, it is intended that the
scope of the present invention be defined by the appended claims
and their equivalents, rather than by the description of the
embodiments.
* * * * *