Sun Studio Express February 2007 Early Access Release
Here is what is new and updated in the Sun Studio performance tools in this release:
Updated man pages:
IDE Integration
The Analyzer and Collector GUIs have been integrated into the new
IDE, based on NetBeans 5.5.1. Data collection from the IDE will
use the collect technology, not the dbx collector.
Experiment Format
The experiment format has been changed to accommodate data-race detection data,
to record callstacks more efficiently, and to better handle writing of
arbitrary integers. The version number has been changed to 10.
Data Collection
The collect command is changed in this release as follows:
- The collect command accepts a new argument, -r <option>
in support of the Thread Analyzer.
(See the Thread Analyzer Readme for more information.)
The available options are datarace and deadlock, which specify
collection of data-race-detection and deadlock-detection data, respectively.
Data-race data will produce
a list of data-races, which may be written by the er_print races
command, and which will appear in the new Races tab in the Analyzer.
It will also produce a "Race Accesses" metric for functions, callers and callees,
source and disassembly.
Deadlock data will produce a list of deadlocks which will appear in the new
Deadlocks Tab in Analyzer, and be written by the er_print deadlocks command.
- The collect command accepts a new argument, -t <duration>,
which specifies a time range for data collection. The <duration> may
be a single number, with an optional m or s suffix, giving the
time in minutes or seconds (default) at which the experiment should be terminated.
It may also be given as two such numbers separated by a hyphen, in which case
data collection will be paused until the first time is reached, then resumed,
and terminated when the second time is reached.
If the second number is given as zero, data will be collected from the first
time to the end of the run. Even if the experiment is terminated,
the target process is allowed to run to completion.
- The collect command accepts a new option, -c on (SPARC-Solaris only),
to collect function and instruction count data.
The option is implemented using bit developed by the Global Optimization team.
In addition to supporting metrics for Bit-Func-Count, Bit-Inst-Exec,
and Bit-Inst-Annul, it supports a new Instruction-Frequency Tab in Analyzer,
and an er_print ifreq command for instruction-frequency summaries.
Data is recorded for the executable and for any shared objects that it
statically links with, provided that those executables and shared objects
were compiled with the -xbinopt=prepare flag. Any other shared objects
that are statically linked but not compiled with the -xbinopt=prepare flag
will not be included in the data. Likewise, any shared objects that are dynamically
opened will not be included in the data.
- The collect command also accepts a second option, -c static (SPARC-Solaris only),
to generate an experiment assuming every instruction in the target executable
and any shared objects that it statically links with and which were compiled
with the -xbinopt=prepare flag, was executed exactly once.
(This functionality was present in Venus, but not documented then.)
- The collect command accepts a new argument, -P <pid> (Solaris only),
which specifies attaching to the process with the given PID, and collecting
data from it. The other options to collect are translated into a
script for dbx, which is then invoked to collect the data.
Only clock- and HWC-profile data may be collected; tracing data is not
supported.
(This functionality was present in Venus, but not documented then.)
- Data may be selectively collected on descendant processes, by
using a new argument to the -F flag, =<regex>.
If either the process lineage, or the base name of the executable (a.out)
matches the expression, data will be collected; otherwise it will not.
(This functionality was present in Venus for Solaris, but not documented then.)
Descendant processes are supported on Solaris and Linux.
- The clock profiling argument, -p <interval>
may be prepended with a + sign (SPARC-Solaris only).
When specified, it will record additional data corresponding to dataspace profiling.
It will generate data if the instruction immediately preceding
the interrupt PC is a memory operation, and the data will be
translated into a metric, "Max. Mem. Stall Time". The data
may be very misleading, since a high metric value does not necessarily mean a high
actual memory stall time.
(This functionality was present in Venus, but not documented then.)
- The collect command on Linux supports a -s <option>
specifying synchronization tracing. It also supports a -m [on|off]
option specifying MPI tracing.
- Data collection will only interpose on synchronization, memory-allocation,
and MPI entry points if the corresponding data is requested.
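The <duration> forms accepted by the -t argument described above can be illustrated with a small parser. This is a minimal sketch, not the collect implementation: the names parse_duration and CollectionWindow are hypothetical, and the sketch assumes that times are non-negative whole numbers.

```python
import re
from typing import NamedTuple, Optional


class CollectionWindow(NamedTuple):
    """Hypothetical container for the window described by a -t <duration>."""
    start_s: int            # seconds after launch at which recording begins
    stop_s: Optional[int]   # seconds after launch at which recording ends;
                            # None means "until the end of the run"


# One time value: digits with an optional m (minutes) or s (seconds) suffix.
# Assumption: only whole numbers are accepted in this sketch.
_TIME = re.compile(r"(\d+)([ms]?)")


def _to_seconds(token: str) -> int:
    match = _TIME.fullmatch(token)
    if match is None:
        raise ValueError(f"bad time value: {token!r}")
    value, suffix = int(match.group(1)), match.group(2)
    # Seconds are the default unit when no suffix is given.
    return value * 60 if suffix == "m" else value


def parse_duration(spec: str) -> CollectionWindow:
    """Turn a <duration> string into a start/stop window in seconds."""
    if "-" in spec:
        # Two times: collection is paused until the first, ends at the second.
        first, second = spec.split("-", 1)
        start, stop = _to_seconds(first), _to_seconds(second)
        # A second value of zero means "collect until the end of the run".
        return CollectionWindow(start, None if stop == 0 else stop)
    # A single time: collect from launch until that time.
    return CollectionWindow(0, _to_seconds(spec))
```

Under these assumptions, parse_duration("10s-1m") describes a window from 10 to 60 seconds, and parse_duration("5-0") describes a window that starts at 5 seconds and runs to the end of the run.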
The dbx collector is changed as follows:
- The dbx collector accepts a new tha [on|off|all|<options>]
command, to support the Thread Analyzer.
The <options> values correspond to those for the
-r flag on the collect command.
The libcollector API is changed in this release as follows:
- The prototypes for thread-related collector API commands have been changed
to use a pthread_t instead of an int.
- The linking commands needed are made more explicit on the man page.
The er_kernel command for profiling a Solaris kernel is changed in this release as follows:
- The er_kernel -t argument has been extended to match
the -t option to collect, as described above.
Analyzer and er_print
The Performance Analyzer and er_print in this release are
improved versions from the previous release.
The Analyzer is changed in this release as follows:
- The en_desc command in a .er.rc file has been extended
to accept an argument in the form =<regex>.
Any descendant process whose lineage or target name matches the
regular expression will be automatically loaded;
any descendant for which neither the lineage nor the target name matches
will not be automatically loaded.
(This option was available, but undocumented, in Venus.)
- A new Races tab is available to show detected data-races.
A new right-hand Race Detail tab is available to show detailed
information for the data-race selected in the Races tab.
A new Race Source Tab has been added, showing two
source contexts corresponding to the two accesses in the
selected race shown in the Race Detail Tab.
A new metric, Race Accesses, is available for experiments with data-race-detection
data, for functions, callers and callees, source and disassembly.
Thresholding for highlighting lines in source and disassembly for
this metric is set to any non-zero value, independent of the threshold
for other metrics.
All of the datarace-related functionality is visible only if data-race-detection data is present.
- A new Deadlocks tab, and right-hand Deadlock Detail Tab
are available to show detected deadlocks.
The Deadlocks tab also uses the Race Source Tab to show
the two contexts involved in a deadlock, and has a deadlocks metric
for functions, callers and callees, source and disassembly.
All of the deadlock-related functionality is visible only if deadlock-detection data is present.
- A new type of tab, IndexObject, is available. IndexObject tabs are similar to the
MemoryObject tabs, but process all data, and show only Exclusive metrics.
Custom IndexObject tabs may be defined either in the GUI, or with directives
in a .er.rc file. IndexObject tabs for Threads, CPUs, Samples, and Seconds
are predefined.
- The management of the right-hand set of tabs has been changed so that only those
tabs corresponding to enabled left-hand tabs will be shown.
The Race Detail tab will be shown only if the Races tab is enabled;
the Deadlock Detail tab will be shown only if the Deadlocks tab is enabled;
the Leak tab will be shown only if the LeakList tab is enabled;
and the Event tab will be shown only if the Timeline is enabled.
Selection in a left-hand tab will raise the corresponding
right-hand tab, except that selection in the Source or Disassembly tab will not
raise the Summary tab.
- A new tab, Source/Disassembly, has been added; it shows the source for the
selected function in the upper panel, and the disassembly for that
function in the lower panel. It is not visible by default, but can
be made visible by selection from the GUI, or by adding srcdis to the
list of tabs in the tabs or rtabs command in a .er.rc file.
- A new command, tha, has been implemented to bring up the Analyzer
with a simplified set of Tabs, tailored for the Thread Analyzer.
The set of tabs shown is governed by a new rtabs directive from
the user or system .er.rc files.
- The Legend Tab has been removed, and the color legend for the Timeline
moved to the Timeline color chooser dialog.
- The Timeline controls have been moved from the main menu bar, to a toolbar
on the Event right-hand tab. Other such toolbars are on the Race Detail,
Deadlock Detail, and Leak right-hand tabs.
- The new pathmap directive for finding source and disassembly files
is available by putting it in a .er.rc file.
(See the er_print discussion of pathmap, below.)
GUI support for managing pathmapping is planned for later in the release.
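The =<regex> selection rule described above for en_desc (and for the -F flag of collect) can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name descendant_selected is hypothetical, the lineage strings used with it are made-up examples, and the sketch uses Python's re module with an unanchored search, which may not be the regular-expression dialect or anchoring that the tools use.

```python
import os
import re


def descendant_selected(pattern: str, lineage: str, executable_path: str) -> bool:
    """Return True if a descendant would be selected by =<regex>.

    A descendant is selected when EITHER its lineage OR the base name
    of its executable matches the expression.
    """
    regex = re.compile(pattern)
    # Only the base name of the executable is compared, not the full path.
    target_name = os.path.basename(executable_path)
    # Assumption: an unanchored search counts as a match in this sketch.
    return bool(regex.search(lineage) or regex.search(target_name))
```

For example, with the hypothetical lineage "_f2" and the executable /bin/sh, the pattern "worker" selects nothing, while the same pattern selects a descendant whose executable is /opt/app/bin/worker regardless of its lineage.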
The er_print command is changed in this release as follows:
- The en_desc command in a .er.rc file has been extended
to accept an argument in the form =<regex>.
Any descendant process whose lineage or target name matches the
regular expression will be automatically loaded;
any descendant for which neither the lineage nor the target name matches
will not be automatically loaded.
(This option was available, but undocumented, in Venus.)
- A new command, ifreq, prints a summary of instruction-frequencies
from a count-data experiment.
(This command was available, but undocumented, in Venus.)
- A new command, races, prints a list of the detected data-races
from a data-race-detection experiment.
A new command, rsummary, takes an argument of either a race ID
or the word "all", and prints detailed information for the given race.
A new metric, Race Accesses, is available for experiments with race-detection
data, for functions, callers and callees, source and disassembly.
- A new command, deadlocks, prints a list of the detected deadlocks
from a deadlock-detection experiment.
A new command, dsummary, takes an argument of either a deadlock ID
or the word "all", and prints detailed information for the given deadlock.
A new metric, Deadlocks, is available for experiments with deadlock-detection
data, for functions, callers and callees, source and disassembly.
- New commands to support IndexObject reports are available:
indxobj <type>,
indxobj_list, indxobj_sort,
and indxobj_metrics <metric_spec>, as well as the
commands for the predefined threads, cpus, seconds,
and samples.
A new command, indxobj_define, can be
used to predefine custom IndexObject tabs, either
in a .er.rc file, or from the command line.
The implementation of MemoryObject and IndexObject tabs, and the commands
for them, may change later in the release.
- A new command in a .er.rc file, rtabs, can be used to
specify which tabs are visible in the Analyzer, when invoked with the tha
command. The tabs are specified as they are in the tabs directive.
- A new command, pathmap, can be used for finding source and disassembly files.
It may be put in a .er.rc file, or entered as a command for er_print.
Each pathmap command takes two arguments, old_prefix and new_prefix.
If a file cannot be found with the search path as currently set, and its full path
begins with old_prefix, the old_prefix will be replaced by
new_prefix to form a new full path, and the file will be looked for there.
Multiple pathmap directives may be used.
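The prefix substitution performed by pathmap can be sketched as follows. This is a simplified model, not the er_print implementation: resolve_with_pathmap is a hypothetical name, the initial lookup along the search path is reduced to a single existence check, and the sketch assumes that directives are tried in the order given and that the first remapped path that exists is used.

```python
import os
from typing import Callable, Iterable, Optional, Tuple


def resolve_with_pathmap(
    full_path: str,
    pathmaps: Iterable[Tuple[str, str]],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Locate a source or object file, applying (old_prefix, new_prefix) pairs."""
    # Stand-in for "found with the search path as currently set".
    if exists(full_path):
        return full_path
    for old_prefix, new_prefix in pathmaps:
        # A mapping applies only when the full path begins with old_prefix.
        if full_path.startswith(old_prefix):
            candidate = new_prefix + full_path[len(old_prefix):]
            if exists(candidate):
                return candidate
    # No mapping produced an existing file.
    return None
```

With a single mapping from /export/ws to /net/build, a file recorded as /export/ws/src/main.c that no longer exists at that location would be looked for at /net/build/src/main.c. The exists parameter is there only so the sketch can be exercised without touching the file system.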
Linux Functionality
A subset of the functionality available on Solaris is provided for Linux.
HW counter profiling is now available for Linux on AMD Opteron-based machines
on which the PerfCtr patch has been installed. Synchronization and MPI tracing are now
supported on Linux, and support for amd64-Linux has been added. Descendant process
following is also supported on Linux.
(Last updated February 15, 2007)