U.S. patent application number 10/443316 for optimized program analysis was filed with the patent office on 2003-05-21 and published on 2003-12-04.
This patent application is currently assigned to ORACLE INTERNATIONAL CORPORATION. Invention is credited to Bhattiprolu, Ramesh; Bradley, Kirk; Hosmath, Mahantesh; Kumar, Sunil; Motlani, Ritesh; Pullokkaran, John; Ramesh, Gopalaswamy; Sethi, Ajay; Shisodia, Sameer.
Application Number | 20030226135 10/443316 |
Family ID | 29587069 |
Filed Date | 2003-05-21 |
United States Patent Application | 20030226135 |
Kind Code | A1 |
Sethi, Ajay; et al. | December 4, 2003 |
Optimized program analysis
Abstract
The present invention generally relates to computer software,
and more specifically, to a computerized utility for analysis of
optimized program files. A method and apparatus for optimized
program analysis is disclosed.
Inventors: |
Sethi, Ajay (Bangalore, IN);
Shisodia, Sameer (Bangalore, IN);
Hosmath, Mahantesh (Dharwad, IN);
Motlani, Ritesh (Madhya Pradesh, IN);
Bhattiprolu, Ramesh (Bangalore, IN);
Bradley, Kirk (San Francisco, CA);
Pullokkaran, John (Foster City, CA);
Kumar, Sunil (Foster City, CA);
Ramesh, Gopalaswamy (Chennai, IN) |
Correspondence Address: |
HICKMAN PALERMO TRUONG & BECKER, LLP
1600 WILLOW STREET
SAN JOSE, CA 95125
US |
Assignee: |
ORACLE INTERNATIONAL CORPORATION
Redwood Shores, CA |
Family ID: | 29587069 |
Appl. No.: | 10/443316 |
Filed: | May 21, 2003 |
Related U.S. Patent Documents
Application Number | Filing Date | Patent Number |
60384206 | May 29, 2002 | |
Current U.S. Class: | 717/154 |
Current CPC Class: | G06F 11/0778 20130101; G06F 11/0769 20130101; G06F 11/366 20130101 |
Class at Publication: | 717/154 |
International Class: | G06F 009/45 |
Claims
What is claimed is:
1. A method for analysis of optimized executables, the method
comprising the computer-implemented steps of: generating a types
table, generating a symbol table; combining said types table, and
said symbols table with existing global symbol table data available
from an executable to generate an optimized program file, parsing
said optimized program file with an offline program analyzer,
analyzing said optimized program file with an offline program
analyzer.
2. The method of claim 1, the method further comprising the
computer-implemented steps of: employing a converter component to
read input from an executable and a core file, employing a
converter component to extract said symbol and type information
from said source files, establishing linkages between said two
input files within a generic core file; wherein, said converter
component establishes said linkages.
3. The method of claim 2, wherein storage of said linkages is done
persistently.
4. The method of claim 2, wherein said method of analysis further
comprises the computer-implemented step of: parsing and analyzing
said linkages wherein said linkages may be parsed and analyzed
transparently on a plurality of platforms.
5. The method of claim 2, wherein the method further comprises the
computer-implemented step of: analyzing said linkages within an
analyzer component.
6. The method of claim 1, the method further comprising the
computer-implemented steps of: reconstructing information about the
types of the symbols found in the executable; adding entries into
said types table; wherein, symbols that were obtained from an
operating system core file include corresponding type details.
7. A computer-readable medium carrying one or more sequences of
instructions for analysis of optimized executables, wherein
execution of the one or more sequences of instructions by one or
more processors causes the one or more processors to perform the
steps of: generating a types table, generating a symbol table;
combining types table, and said symbols table with existing global
symbol table data available from an executable to generate an
optimized program file, parsing said optimized program file with an
offline program analyzer, analyzing said optimized program file
with an offline program analyzer.
8. A computer apparatus comprising: a processor; and a memory
coupled to the processor, the memory containing one or more
sequences of instructions for optimized program analysis, wherein
execution of the one or more sequences of instructions by the
processor causes the processor to perform the steps of: generating
a types table, generating a symbol table; combining types table,
and said symbols table with existing global symbol table data
available from an executable to generate an optimized program file,
parsing said optimized program file with an offline program
analyzer, analyzing said optimized program file with an offline
program analyzer.
Description
PRIORITY CLAIM AND RELATED APPLICATION
[0001] This application claims domestic priority from prior U.S.
provisional application Ser. No. 60/384,206, entitled "Platform
Independent Core Dump Analysis," filed May 29, 2002, naming as
inventor Ajay Sethi, the entire disclosure of which is hereby
incorporated by reference for all purposes as if fully set forth
herein. This application is related to U.S. non-provisional
application Ser. No. 10/XXX,XXX (Attorney Docket No. 50277-2028),
entitled "Representation of Core Files in a Generic Format," filed
on the same day herewith, naming as inventors Ajay Sethi, Sameer
Shisodia, Mahantesh Hosmath, Ritesh Motlani, Ramesh Bhattiprolu,
Kirk Bradley, John Pullokkaran, Sunil Kumar, and Gopalaswamy
Ramesh, the entire disclosure of which is hereby incorporated by
reference for all purposes as if fully set forth herein.
FIELD OF THE INVENTION
[0002] The present invention generally relates to computer
software, and more specifically, to a computerized utility for
debugging software.
BACKGROUND OF THE INVENTION
[0003] Unless otherwise indicated, the approaches described in this
section are not prior art to the claims in this application and are
not admitted to be prior art by inclusion in this section.
[0004] When developing program code for multiple computer operating
systems, the program code is generic except for specific layers
performing platform-dependent tasks. Generic program code should
compile and run on all platforms. A core file is typically
generated by the operating system when a process fails because of
an irrecoverable error. Information obtained from this core file
serves as a starting point for determining and analyzing what
contributed to the failure.
[0005] Commercially available software programs are often shipped
in an optimized format, without symbol and type information. In
conventional debugging and analysis techniques, lack of this
information can necessitate running a process multiple times.
Rebuilding unoptimized code is extremely inefficient for software
programs with a large source base. When it is not readily apparent
how much of the code needs rebuilding, it is impractical to rebuild
the code in its entirety because of the size of the resulting
binary.
[0006] To circumvent this limitation, engineers manually inspect
source code while running optimized executables, trying to pinpoint
areas that could have contributed to the error. To debug code,
engineers typically rebuild the suspect portion of the code
unoptimized, run it in a debugging environment on the same platform
where the error occurred, and attempt to replicate the error. Time
and inaccuracy are major drawbacks to this conventional debugging
and analysis technique. In addition, the unoptimized code may not
behave consistently with the optimized code because the behavior of
the executable may be different, and therefore, the error may not
be reproducible.
[0007] Support and development teams typically perform debugging in
tandem. Platforms at client, development, and support sites may
well vary, and core file formats vary from platform to platform.
Additionally, byte ordering of data differs depending on machine
architecture. There are many limitations to conventional debugging
and analysis techniques.
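By way of illustration only, the byte-ordering point above can be shown with a minimal sketch that interprets the same four raw bytes as a 32-bit integer under both byte orders. The sketch uses Python's standard struct module; the byte values are made up for the example and do not come from any actual core file.

```python
import struct

# Four raw bytes as they might appear in a core file data section
# (illustrative values only).
raw = bytes([0x00, 0x00, 0x01, 0x02])

# "<i" reads a little-endian 32-bit signed integer,
# ">i" reads a big-endian 32-bit signed integer.
little = struct.unpack("<i", raw)[0]
big = struct.unpack(">i", raw)[0]

print(little)  # 33619968 (0x02010000)
print(big)     # 258      (0x00000102)
```

The same bytes yield two different values, which is why an analyzer working on a core file from another architecture must know the byte order of the originating machine.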
[0008] For example, in most collaborative support and development
environments, support teams are the first to receive and analyze
core files generated by a software crash at a client site.
Generally, development and support work together to troubleshoot
and resolve code errors. One benefit of collaborative environments
is that individuals are able to contribute to areas of the code in
which they have expertise. However, a drawback to traditional
techniques is that collaborative environments often include
multiple platforms, operating system versions, and environments.
Traditional techniques can require support and development
personnel to repeat steps in their separate environments. Both time
and effort would be saved if developers and support analysts were
able to contribute to editing and building code without duplicating
effort. Incremental and persistent capture and storage of analysis
and debugging data would save additional time and effort.
[0009] In addition, when platforms at client and development sites
are different, replicating bugs may be difficult or even
impossible. Conventional debuggers require a compiled binary for
each platform. A drawback to traditional techniques is that even
with platform-specific layers, there may be bugs on a specific
platform that will not replicate on another platform. Traditional
techniques require the developer to replicate, change, test, and
debug the code in both deployment and development
environments. This approach requires that the developer be familiar
with tools, debuggers and other support software on both platforms.
If the developer could analyze the code in a generic format on any
platform, time and effort would be saved.
[0010] Based on the foregoing, it is desirable to provide
techniques for analysis of optimized code in a generic format,
wherein the optimized code can be analyzed to help determine which
errors occurred in the code that the operating system was unable to
handle. Additionally, it is desirable to analyze code using
existing core dumps from existing optimized binaries.
SUMMARY OF THE INVENTION
[0011] When developing code for multiple platforms, traditionally,
a majority of the code is generic except for the platform specific
layers. Ideally, the generic code should compile and run as is on
all platforms. The platform-specific layers are added for tasks
which need platform specific implementations, and work differently
across platforms. Because of the variations in core file formats,
traditional techniques make it impossible to transparently analyze
core files from one platform on another.
[0012] Techniques are provided for analysis of optimized program
files. According to one aspect, a mechanism for analysis of core
files in a platform independent manner is provided. According to
another aspect, a method of analysis and interpretation of
converted data from existing optimized executables, source files,
and core files is provided. The converted data can be created in a
generic format, such as the generic core format (GCORE) according
to the techniques as described in non-provisional patent
application Ser. No. 10/XXX,XXX "Representation of Core Files in a
Generic Format" filed on the same day herewith. Converted data,
henceforth referenced as a GCORE, can be analyzed and interpreted
according to the techniques described herein.
[0013] Other objects and advantages will become readily apparent
from the following detailed description. The invention can be
embodied in different ways, and its details varied without
departing from the invention. Accordingly, the drawings and
descriptions are to be regarded as illustrative in nature, and not
as restrictive.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The present invention is depicted by way of example, and not
by way of limitation, in the figures of the accompanying drawings
and in which like reference numerals refer to similar elements and
in which:
[0015] FIG. 1 is a block diagram that depicts a high level overview
of a system for optimized program analysis;
[0016] FIG. 2 is a block diagram that depicts an example of a
system for optimized program analysis;
[0017] FIG. 3 is a block diagram that depicts an example of a
generated symbol and type table for analysis of optimized code in a
generic format;
[0018] FIG. 4 is a block diagram that depicts a computer system
upon which embodiments of the invention may be implemented.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
[0019] A method and apparatus for analysis of optimized program
files is herein described. Specific details are set forth to
provide a thorough understanding of the present invention. It will
be apparent, however, that the present invention may be practiced
without these details. In other instances, well-known structures
and devices are depicted in block diagram format to avoid
unnecessarily obscuring the present invention.
GCORE: The Generic Core
[0020] A generic representation of core files and executables, or
GCORE as it is henceforth referenced, contains information about
core files and executables. According to an embodiment, GCORE
includes a superset of binary formats used within UNIX. Examples
include: the Executable and Linking Format (ELF), the Common Object
File Format (COFF), the Programmable Instruction Set Computers
Format (PRISC), and the Mobilization Stationing, Planning, and
Execution System Format (MSPES). This superset of binary formats
can be extended to support a multitude of binary formats. Since
GCORE captures different segments across a multitude of binary
formats, GCORE overcomes the debugging requirement of having a
compiled binary for each platform. The code base for GCORE is
generic; therefore, analysis can be performed on any platform.
According to an embodiment, analysis of the GCORE can be done
according to the techniques described herein.
Optimized Program Analysis
[0021] In the analysis of a core file, it is often difficult to
ascertain what caused an executable to fail. Most data required for
meaningful analysis of the core file exist in the core file's data
sections. This data exists in raw binary format. Interpreting this
data as such is not possible because symbol information is not
available. In optimized executables, symbol information is stripped
and therefore is not available. Debugging core dumps produced by
executables on many operating systems involves determining the
state of a process at the time of core dump. The state of a process
at the time of a core dump comprises information such as the
following:
[0022] The function call stack and parameters of the called
function.
[0023] The values of local and global variables in the
executable.
[0024] Contents of registers.
[0025] Signal state at point of failure.
[0026] Of the above, in optimized executables, it is often not
possible to get the parameters of the function calls and the values
of the variables, whether local or global. This necessitates
recompiling the code unoptimized and reproducing the problem to
produce a core dump. However, in the real world, this can cause a
few problems:
[0027] Unoptimized executables do not always behave exactly like
optimized ones.
[0028] The problem may be difficult to reproduce consistently.
[0029] For larger executables, it may be difficult to isolate
errors because it may not be feasible to recompile large portions
of code as unoptimized.
[0030] According to an embodiment, analyzing the core dump of an
optimized executable file is accomplished by reconstructing the
information about symbol types found in the executable. Type
information describes the entire declaration of a symbol. For
example, for a declaration like "int *a[10]", the expressions
"*a", "*a[5]", or just "a" itself can all produce meaningful data.
Reconstruction of this
information is possible by parsing declarations in the original
source code. After parsing, symbols extracted from the core are
matched with their corresponding type details. For each symbol, an
entry is added to a types table. The type information is combined
with the starting address for each symbol in the core and the
type's size to extract the values of program variables when the
execution of the program was halted. According to an embodiment, an
analyzer examines the declaration of the structure and the type
information, referring to the header file where the structure is
defined. Based on this information, type information may be
determined from the sizes of intrinsic data types, for example, the
number of bytes for an integer and for a character.
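By way of illustration only, the combination of a symbol's starting address, its type, and the type's size can be sketched as follows. The function name read_value, the INTRINSIC table, the segment layout, and the assumed 32-bit little-endian platform are all hypothetical and are not taken from the disclosure.

```python
import struct

# Hypothetical format codes and sizes for intrinsic types on an assumed
# 32-bit little-endian platform: type name -> (struct code, size in bytes).
INTRINSIC = {"int": ("i", 4), "short": ("h", 2), "char": ("c", 1)}

def read_value(segment, segment_base, symbol_addr, type_name,
               count=1, byte_order="<"):
    """Decode `count` elements of an intrinsic type starting at symbol_addr.

    segment      -- raw bytes of one data segment taken from the core
    segment_base -- virtual address at which that segment was mapped
    """
    code, size = INTRINSIC[type_name]
    offset = symbol_addr - segment_base          # address -> offset in segment
    chunk = segment[offset:offset + size * count]
    return list(struct.unpack(f"{byte_order}{count}{code}", chunk))

# Illustrative data segment: three ints followed by two chars.
segment = struct.pack("<3i", 7, -1, 42) + b"ok"

print(read_value(segment, 0x1000, 0x1004, "int", 2))   # [-1, 42]
print(read_value(segment, 0x1000, 0x100C, "char", 2))  # [b'o', b'k']
```

Without the type entry, the bytes at address 0x1004 are only raw binary data; with the type and size known, they can be read back as the values the variables held when execution halted.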
[0031] After an executable has been compiled and optimized, symbol
type information is stripped from the executable. An entry in an
optimized executable has an address which points to a data segment
within the core file. From just the operating system core file and
the optimized executable, it is impossible to gather enough
information to reconstruct what caused the failure. After
compilation, some information exists about global symbols, such as
the symbol name, the address of the data, and its value. However,
no information about symbol type and size exists. According to an
embodiment, analyzing the core dump of an optimized executable is
done by reconstructing information about the types of the symbols
found in the optimized executable.
[0032] FIG. 1 is a block diagram that depicts a high level overview
of a system for analysis of optimized executables. According to an
embodiment, a system for analysis of a generic representation of an
optimized executable core file, such as a GCORE file, is
provided.
[0033] To create a generic core file for analysis, a converter
component 110 is employed to convert data from optimized executable
102 and operating system core file 104. The converter component 110
reads both input files from the executable 102 and operating system
core file 104, combines them into a generic format, and establishes
initial linkages between these two input files within the GCORE
106. Symbol information 118 and type information 120 extracted from
source files 130 is added to GCORE 106. The GCORE 106 is processed
by an offline analyzer 200, which provides access to program
structures and values that existed at the point of failure. The
program structures and values are used in analysis and debugging of
this failure.
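By way of illustration only, the role of the converter component can be sketched as follows: global symbol addresses from the executable and data segments from the core file are combined into one structure, and each symbol is linked to the segment that contains its address. The class names Segment and GenericCore, the function convert, and the field names are hypothetical; the sketch does not represent the actual GCORE format.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    base: int     # virtual address at which the segment starts
    data: bytes   # raw bytes captured in the operating system core file

@dataclass
class GenericCore:
    symbols: dict = field(default_factory=dict)   # name -> address (executable)
    segments: list = field(default_factory=list)  # segments (core file)
    links: dict = field(default_factory=dict)     # name -> index of its segment

def convert(exe_symbols, core_segments):
    """Combine both inputs and establish the initial symbol-to-segment links."""
    gcore = GenericCore(symbols=dict(exe_symbols), segments=list(core_segments))
    for name, addr in gcore.symbols.items():
        for index, seg in enumerate(gcore.segments):
            if seg.base <= addr < seg.base + len(seg.data):
                gcore.links[name] = index
                break
    return gcore

gcore = convert(
    {"counter": 0x2004, "unmapped": 0x9000},
    [Segment(0x1000, bytes(16)), Segment(0x2000, bytes(16))],
)
print(gcore.links)  # {'counter': 1}
```

A symbol whose address falls outside every captured segment, such as "unmapped" here, simply receives no link in this sketch.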
[0034] According to an embodiment, FIG. 2 depicts details of
offline analyzer 200. A parser and analyzer 202 processes
information from executable 102, such as global, local, and
structure/union members, and information about function parameters.
The parser and analyzer 202 processes information from the
operating system core file 104, such as virtual addresses and
offsets. The parser and analyzer 202 also processes user commands
208, which contain user-defined type definitions which share
namespace with global symbols extracted by parsing code
declarations for various types and functions. Parser and analyzer
202 then interprets the processed information and generates an
external reconstructed symbol table
204 and a types table 206. The reconstructed symbol and type
information can now be made available to third party applications
such as a debugger or some other tool 212.
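By way of illustration only, parsing declarations to recover type details can be sketched with a deliberately simplified pattern. The pattern DECL and the function parse_declarations are hypothetical. They handle only simple one-per-line declarations of the form "type name;", "type *name;", or "type name[N];", and would also accept some non-declarations such as "return x;", so an actual analyzer would need a full C declaration parser.

```python
import re

# Simplified pattern: base type words, optional '*' characters, a name,
# an optional constant array length, and a terminating semicolon.
DECL = re.compile(
    r"^\s*(?P<base>\w+(?:\s+\w+)*?)\s*(?P<ptr>\*+)?\s*"
    r"(?P<name>\w+)\s*(?:\[(?P<len>\d+)\])?\s*;"
)

def parse_declarations(source):
    """Return a mapping of symbol name -> reconstructed type details."""
    table = {}
    for line in source.splitlines():
        match = DECL.match(line)
        if match:
            table[match.group("name")] = {
                "base": match.group("base"),
                "pointer_depth": len(match.group("ptr") or ""),
                "array_len": int(match.group("len")) if match.group("len") else None,
            }
    return table

source = "int *a[10];\nchar name[16];\nunsigned int count;\n"
table = parse_declarations(source)
print(table["a"])  # {'base': 'int', 'pointer_depth': 1, 'array_len': 10}
```

Each resulting entry carries what a stripped executable no longer has: the base type, the pointer depth, and the array length needed to compute a symbol's size.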
Layout of the Symbol and Type Tables
[0035] According to an embodiment, reconstruction of symbol and
type information is performed by parsing declarations in source
code. Symbols obtained from the operating system core file have
corresponding type details. Therefore, for each type there exists
an entry in a types table. A starting address for each symbol is
available in the basic symbol table available in the executable.
From this information, type and size information can be gleaned as
well.
[0036] As depicted in FIG. 2, symbol table 204 and types table 206
are generated by the parser and analyzer 202 with entries
corresponding to each symbol in the executable 102. FIG. 3 depicts
details of symbol table 204 and types table 206 according to an
embodiment of the present invention.
[0037] According to an embodiment, reconstruction of symbol and
type information is depicted through four closely interlinked lists
300. The four closely interlinked lists 300 represent value and
parameter details for reconstructing symbol and type
information.
Symbol Table
[0038] According to an embodiment, symbol table 204 is represented
by two distinct lists, symbol list 310 and symbol info list 308.
Each entry in symbol list 310 points to an entry in symbol info
list 308, which lists symbol type details. Entries in symbol info
list 308 each have a pointer which corresponds to an entry in types
table 206.
Type Table
[0039] The types table 206 is represented by two distinct lists,
types list 306 and type offset list 314. There is an entry in types
list 306 corresponding to every type in the executable 102. Complex
types, such as structures and functions, have an additional pointer
to type offset list 314, which lists related elements or
parameters. Symbol table 204 has an entry for each type, listed in
the type offset list 314. Entries in this type offset list 314
refer to an entry in the symbol table 204 that identifies its
parent. An identifier, such as a flag, may be used to distinguish a
parent symbol from a child symbol. The four closely interlinked
lists 300 represent details for reconstructed symbol and type
tables as depicted in 204 and 206 respectively.
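By way of illustration only, the four interlinked lists can be sketched as plain Python lists whose entries refer to one another by index. All class and field names are hypothetical, and the example structure "struct point" with its member offsets is invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Symbol:            # entry in the symbol list
    name: str
    address: Optional[int]   # None for members that have no address of their own
    info_index: int          # index into the symbol info list

@dataclass
class SymbolInfo:        # entry in the symbol info list
    type_index: int          # index into the types list
    is_parent: bool          # flag distinguishing parent from child symbols

@dataclass
class TypeEntry:         # entry in the types list
    name: str
    size: int
    members: List[int] = field(default_factory=list)  # indices into type offset list

@dataclass
class TypeOffsetEntry:   # entry in the type offset list
    member_symbol: int       # index of the member in the symbol list
    offset: int              # byte offset of the member within its parent
    parent_symbol: int       # index of the parent in the symbol list

# Hypothetical contents for: struct point { int x; int y; } origin;
types = [TypeEntry("int", 4), TypeEntry("struct point", 8, members=[0, 1])]
infos = [SymbolInfo(1, True), SymbolInfo(0, False), SymbolInfo(0, False)]
symbols = [Symbol("origin", 0x2000, 0), Symbol("x", None, 1), Symbol("y", None, 2)]
offsets = [TypeOffsetEntry(1, 0, 0), TypeOffsetEntry(2, 4, 0)]

def member_addresses(symbol_index):
    """Follow symbol -> symbol info -> type -> type offsets to locate members."""
    sym = symbols[symbol_index]
    type_entry = types[infos[sym.info_index].type_index]
    return {
        symbols[offsets[i].member_symbol].name: sym.address + offsets[i].offset
        for i in type_entry.members
    }

print(member_addresses(0))  # {'x': 8192, 'y': 8196}
```

Following the links from the parent symbol yields the address of each member, which can then be decoded using the size recorded for the member's own type.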
[0040] External creation of the symbol and types information can be
an effective solution to the problems that arise because of the
optimization of executables after compilation of program source
code, such as C programs. The invention eliminates the need for
recompiling optimized executables, which reduces overhead (due to
recompilation time) and enables analysts to determine causes of
core dumps. Some issues, such as those related to memory
corruption, disappear once executables are recompiled with the
debug option. The invention is therefore applicable to any
executable. To
ensure that the released code performs well, executables are built
with the maximum optimization level. Therefore, the invention
simplifies analysis of errors encountered in any optimized
executable.
Hardware Overview
[0041] The approach for analysis of optimized executables described
herein may be implemented in a variety of ways and the invention is
not limited to any particular implementation. The approach may be
implemented as a stand-alone mechanism. Furthermore, the approach
may be implemented in computer software, hardware, or a combination
thereof.
[0042] FIG. 4 is a block diagram that depicts a computer system 400
upon which an embodiment of the invention may be implemented.
Computer system 400 includes a bus 402 or other communication
mechanism for communicating information, and a processor 404
coupled with bus 402 for processing information. Computer system
400 also includes a main memory 406, such as a random access memory
(RAM) or other dynamic storage device, coupled to bus 402 for
storing information and instructions to be executed by processor
404. Main memory 406 also may be used for storing temporary
variables or other intermediate information during execution of
instructions to be executed by processor 404. Computer system 400
further includes a read only memory (ROM) 408 or other static
storage device coupled to bus 402 for storing static information
and instructions for processor 404. A storage device 410, such as a
magnetic disk or optical disk, is provided and coupled to bus 402
for storing information and instructions.
[0043] Computer system 400 may be coupled via bus 402 to a display
412, such as a cathode ray tube (CRT), for displaying information
to a computer user. An input device 414, including alphanumeric and
other keys, is coupled to bus 402 for communicating information and
command selections to processor 404. Another type of user input
device is cursor control 416, such as a mouse, a trackball, or
cursor direction keys for communicating direction information and
command selections to processor 404 and for controlling cursor
movement on display 412. This input device typically has two
degrees of freedom in two axes, a first axis (e.g., x) and a second
axis (e.g., y), that allows the device to specify positions in a
plane.
[0044] The invention is related to the use of computer system 400
for implementing the techniques described herein. According to one
embodiment of the invention, those techniques are performed by
computer system 400 in response to processor 404 executing one or
more sequences of one or more instructions contained in main memory
406. Such instructions may be read into main memory 406 from
another computer-readable medium, such as storage device 410.
Execution of the sequences of instructions contained in main memory
406 causes processor 404 to perform the process steps described
herein. In alternative embodiments, hard-wired circuitry may be
used in place of or in combination with software instructions to
implement the invention. Thus, embodiments of the invention are not
limited to any specific combination of hardware circuitry and
software.
[0045] The term "computer-readable medium" as used herein refers to
any medium that participates in providing instructions to processor
404 for execution. Such a medium may take many forms, including but
not limited to, non-volatile media, volatile media, and
transmission media. Non-volatile media includes, for example,
optical or magnetic disks, such as storage device 410. Volatile
media includes dynamic memory, such as main memory 406.
Transmission media includes coaxial cables, copper wire and fiber
optics, including the wires that comprise bus 402. Transmission
media can also take the form of acoustic or light waves, such as
those generated during radio wave and infrared data
communications.
[0046] Common forms of computer-readable media include, for
example, a floppy disk, a flexible disk, hard disk, magnetic tape,
or any other magnetic medium, a CD-ROM, any other optical medium,
punch cards, paper tape, any other physical medium with patterns of
holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory
chip or cartridge, a carrier wave as described hereinafter, or any
other medium from which a computer can read.
[0047] Various forms of computer readable media may be involved in
carrying one or more sequences of one or more instructions to
processor 404 for execution. For example, the instructions may
initially be carried on a magnetic disk of a remote computer. The
remote computer can load the instructions into its dynamic memory
and send the instructions over a telephone line using a modem. A
modem local to computer system 400 can receive the data on the
telephone line and use an infrared transmitter to convert the data
to an infrared signal. An infrared detector can receive the data
carried in the infrared signal and appropriate circuitry can place
the data on bus 402. Bus 402 carries the data to main memory 406,
from which processor 404 retrieves and executes the instructions.
The instructions received by main memory 406 may optionally be
stored on storage device 410 either before or after execution by
processor 404.
[0048] Computer system 400 also includes a communication interface
418 coupled to bus 402. Communication interface 418 provides a
two-way data communication coupling to a network link 420 that is
connected to a local network 422. For example, communication
interface 418 may be an integrated services digital network (ISDN)
card or a modem to provide a data communication connection to a
corresponding type of telephone line. As another example,
communication interface 418 may be a local area network (LAN) card
to provide a data communication connection to a compatible LAN.
Wireless links may also be implemented. In any such implementation,
communication interface 418 sends and receives electrical,
electromagnetic or optical signals that carry digital data streams
representing various types of information.
[0049] Network link 420 typically provides data communication
through one or more networks to other data devices. For example,
network link 420 may provide a connection through local network 422
to a host computer 424 or to data equipment operated by an Internet
Service Provider (ISP) 426. ISP 426 in turn provides data
communication services through the worldwide packet data
communication network now commonly referred to as the "Internet"
428. Local network 422 and Internet 428 both use electrical,
electromagnetic or optical signals that carry digital data streams.
The signals through the various networks and the signals on network
link 420 and through communication interface 418, which carry the
digital data to and from computer system 400, are exemplary forms
of carrier waves transporting the information.
[0050] Computer system 400 can send messages and receive data,
including program code, through the network(s), network link 420
and communication interface 418. In the Internet example, a server
430 might transmit a requested code for an application program
through Internet 428, ISP 426, local network 422 and communication
interface 418.
[0051] Processor 404 may execute the received code as it is
received, and/or stored in storage device 410, or other
non-volatile storage for later execution. In this manner, computer
system 400 may obtain application code in the form of a carrier
wave.
Extensions and Alternatives
[0052] In the foregoing specification, the invention has been
described with reference to specific embodiments thereof. It will,
however, be evident that various modifications and changes may be
made thereto without departing from the broader spirit and scope of
the invention. Thus, the specification and drawings are,
accordingly, to be regarded in an illustrative rather than a
restrictive sense. The invention includes other contexts and
applications in which the mechanisms and processes described herein
are available to other mechanisms, methods, programs, and
processes.
[0053] In addition, in this disclosure, certain process steps are
set forth in a particular order, and alphabetic and alphanumeric
labels are used to identify certain steps. Unless specifically
stated in the disclosure, embodiments of the invention are not
limited to any particular order of carrying out such steps. In
particular, the labels are used merely for convenient
identification of steps, and are not intended to imply, specify or
require a particular order of carrying out such steps. Furthermore,
other embodiments may use more or fewer steps than those discussed
herein.
* * * * *