Sun Studio Express - Performance Analyzer Readme

 

Sun Studio Express February 2007 Early Access Release

Here is what is new and updated in the Sun Studio performance tools in this release:


Updated man pages:


IDE Integration

The Analyzer and Collector GUIs have been integrated into the new IDE, based on NetBeans 5.5.1. Data collection from the IDE will use the collect technology, not the dbx collector.

Experiment Format

The experiment format has been changed to accommodate data-race detection data, to record callstacks more efficiently, and to better handle writing of arbitrary integers. The version number has been changed to 10.

Data Collection

The collect command is changed in this release as follows:
  • The collect command accepts a new argument, -r <option>, in support of the Thread Analyzer. (See the Thread Analyzer Readme for more information.) The available options are datarace and deadlock, which specify collection of data-race-detection and deadlock-detection data, respectively.

    Data-race data will produce a list of data-races, which may be written by the er_print races command, and which will appear in the new Races tab in the Analyzer. It will also produce a "Race Accesses" metric for functions, callers and callees, source and disassembly.

    Deadlock data will produce a list of deadlocks which will appear in the new Deadlocks Tab in Analyzer, and be written by the er_print deadlocks command.

  • The collect command accepts a new argument, -t <duration>, which specifies a time range for data collection. The <duration> may be a single number, with an optional m or s suffix, giving the time in minutes or seconds (default) at which the experiment should be terminated. It may also be given as two such numbers separated by a hyphen, in which case data collection will be paused until the first time is reached, then resumed, and terminated when the second time is reached. If the second number is given as zero, data will be collected from the first time to the end of the run. Even if the experiment is terminated, the target process is allowed to run to completion.

  • The collect command accepts a new option, -c on (SPARC-Solaris only), to collect function and instruction count data. The option is implemented using bit, developed by the Global Optimization team. In addition to supporting metrics for Bit-Func-Count, Bit-Inst-Exec, and Bit-Inst-Annul, it supports a new Instruction-Frequency Tab in Analyzer, and an er_print ifreq command for instruction-frequency summaries. Data is recorded for the executable and for any shared objects that it statically links with, and requires that those executables and shared objects be compiled with the -xbinopt=prepare flag. Any other shared objects that are statically linked but not compiled with the -xbinopt=prepare flag will not be included in the data. Likewise, any shared objects that are dynamically opened will not be included in the data.

  • The collect command also accepts a second option, -c static (SPARC-Solaris only), to generate an experiment that assumes every instruction in the target executable, and in any shared objects that it statically links with and that were compiled with the -xbinopt=prepare flag, was executed exactly once. (This functionality was present in Venus, but not documented then.)

  • The collect command accepts a new argument, -P <pid> (Solaris only), which specifies attaching to the process with the given PID, and collecting data from it. The other options to collect are translated into a script for dbx, which is then invoked to collect the data. Only clock- and HWC-profile data may be collected; tracing data is not supported. (This functionality was present in Venus, but not documented then.)

  • Data may be selectively collected on descendant processes, by using a new argument to the -F flag, =<regex>. If either the process lineage, or the base name of the executable (a.out) matches the expression, data will be collected; otherwise it will not. (This functionality was present in Venus for Solaris, but not documented then.) Descendant processes are supported on Solaris and Linux.

  • The clock profiling argument, -p <interval>, may be prepended with a + sign (SPARC-Solaris only). When specified, it will record additional data corresponding to dataspace profiling. It will generate data if the instruction immediately preceding the interrupt PC is a memory operation, and the data will be translated into a metric, "Max. Mem. Stall Time". The data may be very misleading, since a high metric value may not mean a high actual memory stall time. (This functionality was present in Venus, but not documented then.)

  • The collect command on Linux supports a -s <option> specifying synchronization tracing. It also supports a -m [on|off] option specifying MPI tracing.

  • Data collection will only interpose on synchronization, memory-allocation, and MPI entry points if the corresponding data is requested.
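
The -t <duration> forms described in the list above can be illustrated with a small Python sketch. It is not the collect implementation; the function name parse_duration, the restriction to whole numbers, and the (start, end) return convention are assumptions made for illustration only. Validation of the ordering of the two times is not modeled.

```python
import re
from typing import Optional, Tuple

# One time value: digits with an optional m (minutes) or s (seconds) suffix.
# Assumption: only whole numbers are accepted; the readme does not say.
_TIME = re.compile(r"^(\d+)([ms]?)$")


def _to_seconds(token: str) -> int:
    """Convert a single time value to seconds (seconds is the default unit)."""
    match = _TIME.match(token)
    if match is None:
        raise ValueError(f"bad time value: {token!r}")
    value, suffix = int(match.group(1)), match.group(2)
    return value * 60 if suffix == "m" else value


def parse_duration(spec: str) -> Tuple[int, Optional[int]]:
    """Return (start_seconds, end_seconds) for a -t style duration string.

    end_seconds is None when collection runs from start to the end of the run.
    """
    if "-" in spec:
        first, second = spec.split("-", 1)
        start, end = _to_seconds(first), _to_seconds(second)
        # A second value of zero means "collect until the end of the run".
        return (start, None) if end == 0 else (start, end)
    # A single value is the time at which the experiment is terminated.
    return (0, _to_seconds(spec))
```

Under these assumptions, parse_duration("2m") describes collection from the start of the run until 120 seconds, and parse_duration("1m-0") describes collection that is paused for the first 60 seconds and then continues to the end of the run.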

The dbx collector is changed as follows:

  • The dbx collector has a new tha [on|off|all|<options>] command, to support the Thread Analyzer. The <options> values correspond to those for the -r flag on the collect command.

The libcollector API is changed in this release as follows:

  • The prototypes for thread-related collector API commands have been changed to use a pthread_t instead of an int.

  • The linking commands needed are made more explicit on the man page.

The er_kernel command for profiling a Solaris kernel is changed in this release as follows:

  • The er_kernel -t argument has been extended to match the option to collect as described above.

Analyzer and er_print

The Performance Analyzer and er_print in this release are improved versions from the previous release.

The Analyzer is changed in this release as follows:

  • The en_desc command in a .er.rc file has been extended to accept an argument in the form =<regex>. Any descendant process whose lineage or target name matches the regular expression will be automatically loaded; any descendant for which neither the lineage nor the target name matches will not be automatically loaded. (This option was available, but undocumented, in Venus.)

  • A new Races tab is available to show detected data-races. A new right-hand Race Detail tab is available to show detailed information for the data-race selected in the Races tab. A new Race Source Tab has been added, showing two source contexts corresponding to the two accesses in the selected race shown in the Race Detail Tab. A new metric, Race Accesses, is available for experiments with data-race-detection data, for functions, callers and callees, source and disassembly. Thresholding for highlighting lines in source and disassembly for this metric is set to any non-zero value, independent of the threshold for other metrics. All of the datarace-related functionality is visible only if data-race-detection data is present.

  • A new Deadlocks tab and a right-hand Deadlock Detail Tab are available to show detected deadlocks. They also use the Race Source Tab to show the two contexts involved in a deadlock, and there is a Deadlocks metric for functions, callers and callees, source and disassembly. All of the deadlock-related functionality is visible only if deadlock-detection data is present.

  • A new type of tab, IndexObject, is available. IndexObject tabs are similar to the MemoryObject tabs, but process all data, and show only Exclusive metrics. Custom IndexObject tabs may be defined either in the GUI, or with directives in a .er.rc file. IndexObject tabs for Threads, CPUs, Samples, and Seconds are predefined.

  • The management of the right-hand set of tabs has been changed so that only those tabs corresponding to enabled left-hand tabs will be shown. The Race Detail tab will be shown only if the Races tab is enabled; the Deadlock Detail tab will be shown only if the Deadlocks tab is enabled; the Leak tab will be shown only if the LeakList tab is enabled; and the Event tab will be shown only if the Timeline is enabled. Selection in a left-hand tab will raise the corresponding right-hand tab, except that selection in the Source or Disassembly tab will not raise the Summary tab.

  • A new tab, Source/Disassembly, has been added; it shows the source for the selected function in the upper panel, and the disassembly for that function in the lower panel. It is not visible by default, but can be made visible by selection from the GUI, or by adding srcdis to the list of tabs in the tabs or rtabs command in a .er.rc file.

  • A new command, tha, has been implemented to bring up the Analyzer with a simplified set of Tabs, tailored for the Thread Analyzer. The set of tabs shown is governed by a new rtabs directive from the user or system .er.rc files.

  • The Legend Tab has been removed, and the color legend for the Timeline moved to the Timeline color chooser dialog.

  • The Timeline controls have been moved from the main menu bar, to a toolbar on the Event right-hand tab. Other such toolbars are on the Race Detail, Deadlock Detail, and Leak right-hand tabs.

  • The new pathmap directive for finding source and disassembly files is available by putting it in a .er.rc file. (See the er_print discussion of pathmap, below.) GUI support for managing pathmapping is planned for later in the release.
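
The =<regex> selection rule for en_desc described in the list above (load a descendant if either its lineage or its target name matches) can be sketched in Python as follows. This is an illustration only: the function name should_load_descendant is hypothetical, Python's re module stands in for whatever regular-expression dialect the tools actually use, and treating a match anywhere in the string as a match is an assumption.

```python
import posixpath
import re


def should_load_descendant(regex: str, lineage: str, target_path: str) -> bool:
    """Return True if a descendant experiment would be selected.

    regex       -- the expression given after '=' (Python syntax assumed)
    lineage     -- the descendant's process lineage string
    target_path -- path of the executable the descendant ran
    """
    pattern = re.compile(regex)
    # Only the base name of the executable is compared, not its full path.
    target_name = posixpath.basename(target_path)
    # Assumption: an unanchored search counts as a match.
    lineage_matches = pattern.search(lineage) is not None
    name_matches = pattern.search(target_name) is not None
    # Either match is enough; the descendant is skipped only if both fail.
    return lineage_matches or name_matches
```

The lineage strings passed to this function are placeholders; the exact lineage format is not described in this readme.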

The er_print command is changed in this release as follows:

  • The en_desc command in a .er.rc file has been extended to accept an argument in the form =<regex>. Any descendant process whose lineage or target name matches the regular expression will be automatically loaded; any descendant for which neither the lineage nor the target name matches will not be automatically loaded. (This option was available, but undocumented, in Venus.)

  • A new command, ifreq, prints a summary of instruction frequencies from a count-data experiment. (This command was available, but undocumented, in Venus.)

  • A new command, races, prints a list of the detected data-races from a data-race-detection experiment. A new command, rsummary, takes an argument of either a race ID or the word "all", and prints detailed information for the given race. A new metric, Race Accesses, is available for experiments with race-detection data, for functions, callers and callees, source and disassembly.

  • A new command, deadlocks, prints a list of the detected deadlocks from a deadlock-detection experiment. A new command, dsummary, takes an argument of either a deadlock ID or the word "all", and prints detailed information for the given deadlock. A new metric, Deadlocks, is available for experiments with deadlock-detection data, for functions, callers and callees, source and disassembly.

  • New commands to support IndexObject reports are available: indxobj <type>, indxobj_list, indxobj_sort, and indxobj_metrics <metric_spec>, as well as the commands for the predefined threads, cpus, seconds, and samples. A new command, indxobj_define, can be used to predefine custom IndexObject tabs, either in a .er.rc file, or from the command line. The implementation of MemoryObject and IndexObject tabs, and the commands for them, may change later in the release.

  • A new command in a .er.rc file, rtabs, can be used to specify which tabs are visible in the Analyzer, when invoked with the tha command. The tabs are specified as they are in the tabs directive.

  • A new command, pathmap, can be used for finding source and disassembly files. It may be put in a .er.rc file, or entered as a command for er_print. Each pathmap command takes two arguments, old_prefix and new_prefix. If a file cannot be found with the search path as currently set, and its full path begins with old_prefix, the old_prefix will be replaced by new_prefix to form a new full path, and the file will be looked for there. Multiple pathmap directives may be used.
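
The pathmap prefix substitution described in the list above can be sketched in Python. This models only the fallback step, after the normal search path has already failed. The function name resolve_with_pathmap, the injected exists callback, and the policy of trying directives in the order given and returning the first hit are assumptions for illustration; the readme does not state how multiple directives are prioritized.

```python
import os
from typing import Callable, List, Optional, Tuple


def resolve_with_pathmap(
    full_path: str,
    pathmaps: List[Tuple[str, str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Try each (old_prefix, new_prefix) pair against a file's full path.

    Returns the first remapped path that exists, or None if no directive
    applies or none of the remapped paths can be found.
    """
    for old_prefix, new_prefix in pathmaps:
        # A directive applies only if the full path begins with old_prefix.
        if full_path.startswith(old_prefix):
            # Replace the old prefix with the new one to form a new full path.
            candidate = new_prefix + full_path[len(old_prefix):]
            if exists(candidate):
                return candidate
    return None
```

Passing exists as a parameter keeps the sketch testable without touching the file system; by default it checks the real file system with os.path.exists.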

Linux Functionality

A subset of the functionality available on Solaris is provided for Linux. HW counter profiling is now available for Linux on AMD Opteron-based machines on which the PerfCtr patch has been installed. Synchronization and MPI tracing are now supported on Linux, and support for amd64-Linux has been added. Descendant process following is also supported on Linux.

(Last updated February 15, 2007)