Updated 2005/11/01
Sun[tm] Studio 11: Performance Analyzer Readme
Contents
- Introduction
- Starting the Performance Analyzer from the IDE
- About the Performance Analyzer
- New and Changed Features
- Software Corrections
- Problems and Workarounds
- Limitations and Incompatibilities
- Documentation Errata
- Required Patches
A. Introduction
This document contains information about the Performance Analyzer and its accompanying analysis tools.
Product Documentation
- Release Notes for Solaris Platforms: Available on the Sun Developer Network (SDN) Sun Studio portal at http://developers.sun.com/sunstudio/documentation/ss11/release_notes.html. Information in the release notes updates and extends information in all readme files.
- Release Notes for Linux Platforms: Available on the SDN Sun Studio portal at http://developers.sun.com/sunstudio/documentation/ss11/Linux_release_notes.html. Information in the release notes updates and extends information in all readme files.
- Sun Studio 11 Documentation: Product man pages, HTML versions of readmes, and manuals can be accessed from /installation_directory/docs/index.html. The default installation directory on Solaris platforms is /opt/SUNWspro. The default installation directory on Linux platforms is /opt/sun/sunstudio11.
- IDE Documentation: Online help for all components of the Sun Studio IDE can be accessed from the Help menu and Help buttons in the IDE.
- Developer Resources Portal: For technical articles, code samples, documentation, and a knowledge base, see the developers portal at http://developers.sun.com/prodtech/cc.
Note - If your Sun Studio compilers and tools have not been installed in the default /opt directory, ask your system administrator for the equivalent path on your system.
B. Starting the Performance Analyzer from the IDE
To start the Performance Analyzer from the IDE, do one of the following:
- Click the Analyzing tab, and then load an experiment into the Analyzer window, either by choosing Analyze > File > Open Experiment in the main window or by clicking the Open Experiment toolbar button in the Analyzer window.
- Double-click an experiment in the Explorer window of the IDE.
- Right-click an experiment in the Explorer window of the IDE, and then choose Open Experiment or Add Experiment from the contextual menu.
To open the Collector window from the IDE, do one of the following:
- Click the Analyzing tab, then click the Collect Experiment toolbar button in the Performance Analyzer window.
- Right-click an executable in the Explorer and choose Performance Tools > Collect from the contextual menu.
C. About the Performance Analyzer
This release of the Performance Analyzer is available on the following platforms:
- Solaris[tm] Operating System (Solaris OS), versions 8, 9, and 10
- The following versions of the Linux Operating System:
  - SuSE Linux Enterprise Server 9 (en locale only, not supported in ja and zh locales)
  - RedHat Enterprise Linux 4
The Performance Analyzer and analysis tools include commands for the collection and manipulation of program performance data, a graphical user interface, and a command-line interface, er_print, for the display of performance data. The term Collector is used in this document for the tools that collect performance data and their underlying libraries. These tools are the collect command, the dbx collector subcommands, and the performance data collection features in the IDE.
The program performance analysis tools collect statistical profiles of a program's performance and trace calls to critical library routines, and display the data in tabular and graphical form. The collected data is converted into performance metrics. Metrics can be viewed in tabular form at the level of the load-object, function, source line or instruction. The tools provide a means of navigating program structure that is useful for identifying functions and paths within the code that are responsible for resource usage, inefficiencies, or time delays. The Performance Analyzer GUI can also display the performance data in a timeline display.
This release of the Performance Analyzer collector allows profiling of applications written in the Java[tm] programming language, using underlying support in the Java[tm] Virtual Machine (JVM) software. JVM software version 1.4.2_02 and later 1.4.2 updates, and version 5.0 Update 3 and later 5.0 updates, contain this support. The support may change in future JVM releases. If you use this release of the Performance Analyzer collector with a later JVM release, the collector may not be able to collect profiling information for applications written in the Java programming language. Sun Microsystems expects that future releases of the Performance Analyzer collector will be available to support Java profiling under the changing JVM profiling technology.
The terms "Java Virtual Machine" or "JVM" mean a virtual machine for the Java platform.
D. New and Changed Features
This section describes the new and changed features for the Performance Analyzer.
- Improved Control of Tabs Displayed
- Timeline will now Respect Filtering
- Improved Handling of Descendant Processes
- Improved Behavior of New Windows in Analyzer
- New Memory Objects Tab
- Hardware Counter Profiling on Linux
- Improved Handling of MPI Profiling
- Java Mode has been Replaced by View Mode
- Other er_print Command Changes
- Improved Filtering
- Improved Control over Tabs Displayed
The Analyzer's tab mechanism has been redesigned for greater flexibility. Only those tabs applicable to at least one loaded experiment are available, and a default set of tabs is shown, rather than all tabs, especially for large experiments. You can set default tabs in a .er.rc file, with the tabs directive. You can add or remove displayed tabs using the Select Tabs dialog box.
- Timeline Tab Now Respects Filtering
The Timeline Tab shows only events that pass the current filter settings.
- Improved Handling of Descendant Processes
The Analyzer and the er_print utility process an en_desc on|off directive in a .er.rc file. If the directive specifies on, all descendant experiments are read immediately; if the directive specifies off, only the founder experiment is read.
- Improved Behavior of New Windows in Analyzer
Additional Analyzer windows, opened by clicking the New Window toolbar button or choosing File -> Create New Window, are now more cleanly separated from each other. They share the loaded experiments, but you can set filtering, metrics, sorting, and so on independently in each window.
- New Memory Objects Tab and Report
New tabs are available in the Analyzer to show performance data for cache-lines, pages, etc. Several new er_print commands are available for memory objects.
- Hardware Counter Profiling on Linux
Hardware counter overflow profiling is available on supported Linux systems. This support requires that you install the Perfctr patch. For more information, see Hardware Counter Overflow Profiling in the Limitations section.
- Improved Handling of MPI Profiling
Additional variables specifying process rank for LAM and MPICH versions of MPI are recognized.
- Java Mode Has Been Replaced by View Mode
Java mode has been replaced by View mode. The View mode settings user, expert, and machine correspond to the Java mode settings on, expert, and off, respectively. View mode also applies to programming models other than Java programs, OpenMP in particular. The javamode command is still accepted, with a warning.
- er_print Command Changes
Various commands to the er_print utility have been changed. Commands affecting data objects have been renamed, and the handling of the commands concerning metrics has been made more consistent. The new procstat command prints information concerning the processing of the data.
- Improved Filtering
In addition to selecting experiments and filtering on the samples, threads, LWPs, and CPUs for which you want to display metrics, you can now specify a filter expression that evaluates to true for any data record you want to include in the display.
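The idea of a filter expression can be sketched in a few lines of Python. This is only an illustration of the concept, not the syntax accepted by the Analyzer or er_print: the Record type, its field names, and apply_filter are hypothetical stand-ins for a data record and its sample, thread, LWP, and CPU properties.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for one profile data record."""
    sample: int
    thread: int
    lwp: int
    cpu: int


def apply_filter(records: Iterable[Record],
                 expr: Callable[[Record], bool]) -> List[Record]:
    """Keep only the records for which the filter expression is true."""
    return [r for r in records if expr(r)]


records = [
    Record(sample=1, thread=1, lwp=1, cpu=0),
    Record(sample=1, thread=2, lwp=2, cpu=1),
    Record(sample=2, thread=1, lwp=1, cpu=1),
]

# One expression can combine several properties of a record in a single test.
selected = apply_filter(records, lambda r: r.thread == 1 and r.cpu == 1)
print(selected)
```

Selecting by separate lists of samples, threads, LWPs, and CPUs cannot express a condition such as "thread 1, but only while it ran on CPU 1" as directly as a single expression can. See the er_print(1) man page for the actual expression grammar.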
E. Software Corrections
This section describes problems that were fixed in this release.
- Studio 10 collector for MPI profiling does not create experiment according to MPI rank
- Leaklist tab should be shown if and only if one or more expts have leak data
- er_print should warn about overflow counts too high or too low
- Improve help when user mistypes metric or *sort command
- Print Current Sort whenever we print Current Metrics
F. Problems and Workarounds
This section discusses known software problems and possible workarounds for those problems.
Some problems are the result of problems in the Solaris OS and can be fixed by installing the appropriate patches. For further information, see the Required Patches section in this readme. Some problems that appear to be in the Performance Analyzer might actually be Collector problems. Problems in the compilers and dbx can also affect the Performance Analyzer. Some issues that are not due to problems in the software are also described in this section.
- Problems in the Performance Tools
- Linux-Specific Performance Tools Problems
- Problems That Can Be Fixed With Solaris OS Patches
- Other Problems
Problems in the Performance Tools
- Java Synchronization Tracing May Have Performance Problems.
Java synchronization tracing may have performance problems, especially in processing large experiments.
- Java Profiling Under dbx is Not Supported.
Java profiling using the collector command in dbx or the Collector dialog in the Debugger in the Sun Studio IDE is not supported, because the JVM software cannot support both a debugging agent and a profiling agent. (4771337)
- Cannot Print Summary Tab or Event Tab Data.
The Performance Analyzer cannot print the data in the Summary tab or the Event tab. To print summary data for a function or a load object, you can use the er_print command. (4286674)
- Double Counting of Metrics on Parallel Directive Lines
Metrics that are reported on parallelization compiler directive lines in annotated source code are double-counted. The metrics on the source lines in the parallel do, for, or section blocks of code are correct. There are also some double counting errors at the function level. (4656193)
- Legend Panel Not Always Updated When Colors are Changed
The colors in the Legend panel are not always updated when you change colors in the color chooser for the Timeline tab. (4948522)
- Libraries with Different Architectures May Not Be Properly Handled when Archived
Libraries with the same name but different architecture are not copied correctly when using the collect -A copy command. (4970739)
- Cannot Run More than One Experiment from a Single dbx session.
Attempting to run multiple experiments from one dbx session fails. (4999242)
- Stack Unwind for Optimized x86 or x64 Code May Fail.
The stack unwind for optimized code on x86 systems or x64 systems may fail on both the Solaris OS and Linux OS. (5084134)
Workaround:
For the Sun Studio compilers, compile with the following option:
-xregs=no%frameptr
For GNU compilers, compile with the following option to get the best stack unwind results:
-fno-omit-frame-pointer
Linux-Specific Performance Tools Problems
- Profiling under dbx is not supported
Profiling while running a program in the dbx debugger is not supported under the Linux OS. This limitation applies to all versions of the Linux OS.
- MP Profiling Undercounts Data from Parallel Threads
The problem occurs on some versions of RedHat Linux and SUSE Linux. (5020387)
- Java Profiling Does Not Show Demangled Names from the JVM
The problem occurs with JVM 1.4.2_02 and later 1.4.x versions of the JVM, and JVM 1.5 that are built with GNU 2.x, which is not a supported compiler. The problem is due to a bug in the demangler that will not be fixed. (no bug number)
- Sample Data for MP Programs May Be Incorrect
CPU times may be inconsistent with sample times. (5025963)
Problems That Can Be Fixed With Solaris OS Patches
The following problems can be fixed by installing the appropriate patches to the Solaris OS. See the Required Patches section in this Readme for more information.
- Application Crash during Hardware Counter Profiling
Under some circumstances hardware counter profiling interrupts trigger an OS bug on UltraSPARC-III processors that can cause the %y register to be corrupted. If the register is live at the time, the application may crash. This is fixed in Solaris 8 OS, HW2 update, and in Solaris 9 OS, update 4. The frequency of the problem is reduced by lower-resolution profiling, by the use of only one counter, or both. (4793905)
- Application deadlock with multiple libmtsk.a in multiple shared objects
Applications with multiple shared objects that have copies of libmtsk.a linked into them may deadlock under collect. The workaround is to set the environment variable LD_BIND_NOW before invoking the collect command. The Sun Studio 11 libmtsk.so is a shared object, so newly compiled and linked source codes will not have this problem. (4881093; compiler bug 4877490)
Other Problems
- Lost Clock-Based Profiling Data for LWPs
Under some circumstances profiling interrupts (SIGPROF) for one or more LWPs might be lost when running under the Solaris 8 OS. The workaround is to use the alternate threads library in /usr/lib/lwp. (4298226)
- Lost Hardware Counter Profiling Interrupts
During hardware counter profiling of a multithreaded application that uses libthread threads, the interrupt from a hardware counter overflow (SIGEMT) is occasionally lost and cannot be recovered. The problem occurs under the Solaris 8 OS, and the workaround is to use the alternate threads library in /usr/lib/lwp. (4352643)
- Clock-Based Profiling Inaccuracies on Loaded Systems
Profiling an application when there is a load on the system can result in significant undercount of User CPU time, up to 20%. The missing User CPU time shows up as either System CPU time or as Wait-CPU time. The problem occurs only for x86 targets on both the Solaris 8 OS and Solaris 9 OS. (4509116)
- Altered Behavior With Applications That Install Signal Handlers
Collecting performance data on an application that installs a signal handler can cause altered behavior of the collector or of the application. When such behavior is detected, the collector library records a warning message in the experiment.
When the collector library is preloaded, the collector's signal handler always re-installs itself as the primary handler, and the signal handler passes on signals that it does not use to any other handler. However, because the collector's signal handler does not interrupt system calls, an application that installs a signal handler that does interrupt system calls can show changed behavior. In the case of the asynchronous I/O library, libaio.so, which uses SIGPROF for asynchronous cancel operations, asynchronous cancel requests arrive late. (4397578)
If you attach dbx to the application without preloading the collector library, the collector installs its signal handler as the primary handler when collection is enabled. However, any signal handler installed subsequently takes precedence over the collector's signal handler. If this signal handler does not pass on SIGPROF and SIGEMT signals to the collector's signal handler, profiling data is lost.
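The pass-on behavior described above can be sketched as follows. This is an illustrative Python model of the pattern, not the collector's implementation; make_chaining_handler, install_chained, first_handler, and application_work are hypothetical names, and a native application would normally implement the same idea with sigaction(2) in its own language.

```python
import signal


def make_chaining_handler(own_work, previous):
    """Build a handler that does its own work, then passes the signal on."""
    def handler(signum, frame):
        own_work(signum)
        # Forward only to real callables; SIG_DFL, SIG_IGN, and None are not callable.
        if callable(previous):
            previous(signum, frame)
    return handler


def install_chained(signum, own_work):
    """Install a chaining handler for signum without cutting off the earlier one."""
    previous = signal.getsignal(signum)
    handler = make_chaining_handler(own_work, previous)
    signal.signal(signum, handler)
    return handler


calls = []


def first_handler(signum, frame):
    # Stands in for the handler that was installed first (for example, a profiler's).
    calls.append(("first", signum))


def application_work(signum):
    # Stands in for what the application itself wants to do on the signal.
    calls.append(("app", signum))


chained = make_chaining_handler(application_work, first_handler)
# Call the handler directly so the demonstration does not depend on signal delivery.
chained(signal.SIGPROF, None)
print(calls)
```

A handler installed later that omits the forwarding step is the situation in which SIGPROF and SIGEMT never reach the earlier handler and profiling data is lost.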
- Data Collection Problems When dbx is Attached to a Process
If you attach dbx to a running process without preloading the collector library, libcollector.so, there are a number of errors that can occur.
You might not be able to collect any data for synchronization wait tracing, heap tracing, or MPI tracing. Tracing data is collected by interposing on various libraries, and if libcollector.so is not preloaded, the interposition cannot be done.
If the program installs a signal handler after dbx is attached to the process, and the signal handler does not pass on the SIGPROF and SIGEMT signals, profiling data and sampling data are lost. (4397578)
If the program uses the asynchronous I/O library, libaio.so, clock-based profiling data and sampling data are lost, because libaio.so uses SIGPROF for asynchronous cancel operations.
If the program uses the hardware counter library, libcpc.so, hardware counter overflow profiling experiments are corrupted because both the collector and the program are using the library. If the hardware counter library is loaded after dbx is attached to the process, the hardware counter experiment can succeed provided references to the libcpc library functions are resolved by a general search rather than a search in libcpc.so.
If the program calls setitimer(2), clock-based profiling experiments can be corrupted because both the collector and the program are using the timer.
- Incorrect Values for Wait CPU Metric in Statistics Display and Samples
Incorrect values for the Wait CPU metric are sometimes recorded in the sample packets and the global statistics. These values appear in the Statistics tab of the Performance Analyzer and affect the display of samples in the Timeline tab. (4615617)
- Lost Clock Profiling Data Over a Small Time Period
Clock profiling data can appear to be lost over a period of several seconds when the system clock is being synchronized with an external source. During this time, the system clock is incremented until it is synchronized. Profile signals are delivered at the set interval, but the time stamp recorded in the profile packets includes any increment that is made between signal deliveries.
- Data Collection Aborts With a Stack Overflow.
Sometimes the Collector can fail with a stack overflow error. This happens because the Collector uses the application's stack and the stack size for the application is too small to support use by the Collector. The workaround is to increase the stack size by at least 8 Kbytes. See the limit(1) man page for details. For parallel applications that use the multitasking library, the stack size for each thread must also be set using the STACKSIZE environment variable.
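As a rough sketch of this workaround, the following Python fragment computes a soft stack limit that is 8 Kbytes larger than the current one and builds an environment with STACKSIZE set for the threads. The helper names are hypothetical, and the assumption that STACKSIZE is interpreted in Kbytes should be checked against your compiler documentation. Most users would simply use limit or ulimit in the shell before running collect.

```python
import os
import resource

EXTRA_BYTES = 8 * 1024  # the readme suggests at least 8 Kbytes of extra stack


def raised_stack_limit(soft, hard, extra=EXTRA_BYTES):
    """Return a soft stack limit that is `extra` bytes larger, capped at hard."""
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    wanted = soft + extra
    if hard != resource.RLIM_INFINITY and wanted > hard:
        return hard
    return wanted


def profiling_environment(thread_stack_kbytes):
    """Copy the environment and set STACKSIZE for multitasking-library threads.

    Assumption: the value is interpreted as Kbytes.
    """
    env = dict(os.environ)
    env["STACKSIZE"] = str(thread_stack_kbytes)
    return env


soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
new_soft = raised_stack_limit(soft, hard)
print("current soft limit:", soft, "suggested soft limit:", new_soft)
# To apply it to this process and its children (uncomment to use):
# resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
```

The limit is inherited by child processes, so a launcher that raises it and then starts the profiled program with the prepared environment covers both the main stack and the per-thread stacks.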
- Incomplete Experiment When Program Calls exec.
When the program on which performance data is being collected successfully calls exec(2) or any of its variants, the experiment is terminated abnormally. Although the experiment can still be read by the Performance Analyzer or er_print, you should run er_archive(1) for the experiment on the computer on which the data was collected, to ensure that the load objects used by the program were archived correctly.
- False Recursion Shown With Tail Call Optimization.
For some functions that make tail-call optimized calls from a shared object (PIC code) and require the determination of the global offset table address in order to reference a global variable, the optimized code is incorrectly reported as recursive, and the real caller is lost. (4656890)
- False Recursion Shown With Outline Function Optimization.
For functions that are optimized by producing an outline function for seldom-executed code, false recursion might be shown due to the inability of the tools to determine the outline function's return address. (4800953)
Check the support page on the SDN Sun Studio portal, http://developers.sun.com/sunstudio/support/ for the latest information.
G. Limitations and Incompatibilities
This section discusses limitations and incompatibilities with systems or other software.
- Performance Analyzer Requirements
The Performance Analyzer requires the Java 2 Software Development Kit, Standard Edition (J2SE), in a version no earlier than 1.4.2_02. If you use an earlier version, the Performance Analyzer runs, but could fail, not function correctly, or perform poorly. The Analyzer does not run with the 64-bit J2SE 5.0 Update 3 technology. If you are using the J2SE 5.0 technology, you must have the 32-bit version available to run the Analyzer.
- Profiling Multithreaded Applications
When collecting experiments from applications that are implicitly or explicitly multithreaded (using libthread.so on the Solaris 8 OS), it is recommended that you collect with the alternate libthread library (/usr/lib/lwp/libthread.so or /usr/lib/lwp/64/libthread.so). If you use the Solaris 8 OS with the default libthread.so library, a warning is issued to notify you of possible data distortion that might result.
- Profiling Applications Written in the Java Programming Language
To collect Java-mode or machine-mode profiling data on an application written in the Java programming language you must use a version of the Java[tm] 2 Software Development Kit, Standard Edition, no earlier than 1.4.2_02. There are bugs in the JVM software that may cause program failure with any version earlier than 1.4.2_02.
For best results for Java-mode profiling, you should use the version of the Java 2 Software Development Kit, Standard Edition, available as an install option with this Sun Studio release.
Java profiling is not supported under dbx.
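The minimum-version requirement above can be checked mechanically. The following is a minimal sketch, assuming version strings of the form reported by the JDK, such as 1.4.2_02 or 1.5.0_03 (the latter being how J2SE 5.0 Update 3 identifies itself). The function names are hypothetical, and the check covers only the stated minimum, not compatibility with later JVM releases.

```python
import re

MINIMUM = (1, 4, 2, 2)  # 1.4.2_02, the minimum stated in this readme


def parse_java_version(text):
    """Turn a version string such as '1.4.2_02' into a tuple of ints."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:_(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, micro, update = match.groups()
    return (int(major), int(minor), int(micro), int(update or 0))


def meets_minimum(text, minimum=MINIMUM):
    """True when the version is no earlier than the minimum."""
    return parse_java_version(text) >= minimum


for candidate in ("1.4.1_07", "1.4.2_02", "1.5.0_03"):
    print(candidate, meets_minimum(candidate))
```

Comparing tuples of integers avoids the pitfalls of comparing the strings directly, where "1.4.10" would sort before "1.4.2".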
- Hardware Counter Overflow Profiling
Hardware counter overflow profiling is not supported on UltraSPARC® processors earlier than the UltraSPARC® III series.
The Collector cannot collect hardware counter overflow data if cputrack is running on the system because cputrack takes control of the hardware counters.
Hardware counter overflow profiling on Linux requires that you install the latest Perfctr patch, which you can download at http://user.it.uu.se/~mikpe/linux/perfctr/2.6/perfctr-2.6.15.tar.gz. Instructions for installing the patch are included in the tar file.
- Library Interposition
The Collector interposes on various system functions, including signal handling, fork and exec calls, the hardware counter library and some timing functions, to ensure that it can collect valid data. If a program uses any of these functions, its behavior can change. In particular, the profiling timer and the hardware counters are not available to a program when profiling is enabled, and system calls are not interrupted to deliver signals. This behavior affects the use of the asynchronous I/O library, libaio.so, which does interrupt system calls to deliver signals. These interpositions do not take place if you attach dbx to a running process without preloading the collector library, libcollector.so, and then enable data collection.
- Finding Source and Object Files
The executable name that is generated when the debugger is attached to a process can be a relative path, not an absolute path, or the path, even though absolute, might not be accessible to the Performance Analyzer. Similar problems can arise with object files loaded from an archive (.a).
The Performance Analyzer extracts the basename (the name following the last "/") from the recorded path in the executable or object file, and searches for the files as follows:
- It searches for a file with the basename under directories given by addpath or setpath commands, or as set from the Search Path tab in the Set Data Presentation dialog box, in the order given. The default setpath is:
  - The experiment archive directories
  - The current working directory, that is, as ./<basename>
- It searches for the file using the original recorded path.
If the Analyzer does not find the file, it generates an error or warning, showing the path as it originally appeared in the experiment.
To enable the Performance Analyzer to find the source file, you can add the directory containing the file to the search path, or you can set up a symbolic link from the current directory that points to the actual location of the file, or you can copy the file into the experiment.
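The search order described above can be sketched as follows. This is an illustration of the documented steps, not the Analyzer's code; find_file and its search_dirs argument (standing in for the directories given by addpath or setpath) are hypothetical names, and the paths in the demonstration are made up.

```python
import os
import tempfile


def find_file(recorded_path, search_dirs):
    """Return the first existing candidate for recorded_path, or None."""
    basename = recorded_path.rsplit("/", 1)[-1]  # the name following the last "/"
    # Step 1: look for the basename under each search directory, in order.
    for directory in search_dirs:
        candidate = os.path.join(directory, basename)
        if os.path.isfile(candidate):
            return candidate
    # Step 2: fall back to the path exactly as it was recorded.
    if os.path.isfile(recorded_path):
        return recorded_path
    return None


# Small demonstration with temporary directories.
with tempfile.TemporaryDirectory() as archive_dir, \
        tempfile.TemporaryDirectory() as work_dir:
    with open(os.path.join(work_dir, "main.c"), "w") as handle:
        handle.write("int main(void) { return 0; }\n")
    found = find_file("/build/host/src/main.c", [archive_dir, work_dir])
    missing = find_file("/build/host/src/other.c", [archive_dir, work_dir])
    found_in_work_dir = found == os.path.join(work_dir, "main.c")
    print(found_in_work_dir, missing)
```

Because only the basename is matched in the first step, adding the directory that holds the file to the search path, or linking the file into the current directory, is enough for it to be found even when the recorded path no longer exists.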
- Experiment Incompatibility
The Performance Analyzer cannot load experiments created with versions of the Collector prior to the Forte[tm] Developer 7 software release.
- Use of setuid
If the process calls setuid or executes setuid files, the Collector can fail to create an experiment due to permission problems.
See the collect(1) man page for more information about restrictions on data collection.
Hardware Counters using Pentium Processors
Pentium IV processors with HyperThreading Technology have only one set of hardware counters per physical processor. To use hardware counters on a system with Pentium IV HT processors, a system administrator must first take the processors in the system off-line until each physical processor has only one hardware thread online. See the -v and -p options to psrinfo(1M) and the -f option to psradm(1M) for more information.
When using multiple hardware counters on a Pentium IV processor, some combinations of counters cannot be bound due to resource constraints. For example, the branch_retired metric cannot be measured on registers 12 and 13 simultaneously because both counters require the Pentium IV CRU_ESCR2 ESCR to measure this event. See the processor documentation for more details.
H. Documentation Errata
There is no new information at this time. Additional information might be made available at http://developers.sun.com/sunstudio/
I. Required Patches
Some of the problems with the performance analysis tools originate in bugs in the Solaris OS. To fix these problems, you should install the relevant patches. To obtain a list of required patches, you can type the collect command at the command prompt with no arguments. The patches can be downloaded from http://sunsolve.sun.com. If you are using the Solaris 8 OS, you should install an update that is no earlier than update 5 before installing patches.
The following problems can be encountered by the Collector and Performance Analyzer when the patches are not installed:
- Programs that use libaio and invoke aio_cancel() abort during data collection with a variety of error messages, including the following:
  dbx: Cannot read status for 1@1--No such file or directory
  dbx: Warning: proc state race condition encountered!
- Multithreaded executables cause a SEGV during data collection. Sometimes the core dump occurs in the thread library code, and sometimes it occurs in sigacthandler() for the SIGPROF signal.
- Multithreaded executables can fail during collection with various dbx error messages, including those listed under the first bullet and messages reporting the following:
  generic libthread_db.so error
- Multithreaded executables can fail during collection with a libthread panic relating to a signal fault in a critical section.
- Data for multithreaded executables can be missing, because at some point the threads library masks the profiling signal and all subsequent data is lost.
- When a multiprocessor application is running with unbound threads, the interrupt from a hardware counter overflow (SIGEMT) is occasionally lost and cannot be recovered.
- Under some circumstances profiling interrupts (SIGPROF) for one or more LWPs can be lost. When this happens, data displayed does not include thread profile metrics for threads run on those LWPs. This happens most often with unbound threads in the Solaris 8 OS.
- An application that uses more than 32 CPUs or threads can run much slower when performance data is being collected.
- Under some circumstances hardware counter overflow profiling interrupts trigger an OS bug that can cause the %y register to be corrupted. If the register is live at the time, the application may crash.
Copyright © 2005 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms.