Man Page collect(1)




NAME

     collect - command used to collect program performance data


SYNOPSIS

     collect collect-arguments target target-arguments
     collect
     collect -V
     collect -R


DESCRIPTION

     The collect command runs the target process and records per-
     formance  data and	global data for	the process.  Performance
     data is collected using  profiling	 or  tracing  techniques.
     The  data can be examined with a GUI program (analyzer) or	a
     command-line  program  (er_print).	  The	data   collection
     software  run  by the collect command is referred to here as
     the Collector.

     The data from a single run	of the collect command is  called
     an	 experiment.   The  experiment is represented in the file
     system as a directory, with various files inside that direc-
     tory.

     The target	is the path name of the	executable, Java(TM) .jar
     file, or Java .class file for which you want to collect per-
     formance data.  (For more information about Java  profiling,
     see  JAVA	PROFILING,  below.)  Executables that are targets
     for the collect command can be compiled with  any	level  of
     optimization, but must use	dynamic	linking.  If a program is
     statically	linked,	the collect command prints an error  mes-
     sage.   In	 order	to see annotated source	using analyzer or
     er_print, targets should be compiled with the -g  flag,  and
     should not	be stripped.

     In	order to enable	dataspace profiling, executables must  be
     compiled  with  the  -xhwcprof -xdebugformat=dwarf	-g flags.
     These flags are valid for the C, C++ and Fortran  compilers,
     but  only on SPARC[R] platforms.  See the section "DATASPACE
     PROFILING", below.

     The collect command uses the following strategy to	find  its
     target:

     - If there	is a file with the name	of  the	 target	 that  is
       marked  executable, the file is verified	as an ELF execut-
       able that can run on the	target machine.	If  the	 file  is
       not  such  a  valid  ELF	 executable,  the collect command
       fails.

     - If there	is a file with the name	of the	target,	 and  the
       file is not executable, collect checks whether the file is
       a Java[TM] jar file or class file. If the file is  a  Java
       jar file	or class file, the Java[TM] virtual machine (JVM)
       software	is inserted as the  target,  with  any	necessary
       flags,  and  data  is collected on that JVM machine.  (The
       terms "Java virtual machine"  and  "JVM"	 mean  a  virtual
       machine	for  the  Java[TM] platform.)  See the section on
       "JAVA PROFILING", below.

     - If there	is no file with	 the  name  of	the  target,  the
       user's  path is searched	to find	an executable; if an exe-
       cutable is found, it is verified	as described above.

     - If no file of the current name is found,	the command looks
       for  a file with	that name and the string .class	appended;
       if a file is  found,  the  target  of  a	 JVM  machine  is
       inserted, with the appropriate flags, as	above.

     - If none of these	procedures can find the	target,	the  com-
       mand fails.
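
     The lookup order above can be sketched as follows.  This is an
     illustration of the documented strategy, not the code that
     collect runs.  It makes several assumptions: a file is treated
     as ELF when it starts with the 4-byte ELF magic number (no check
     that it can run on this machine), Java files are recognized by
     their .jar or .class suffix, and a non-executable file that is
     not a Java file is treated as a failure, which the text above
     does not state.  The names resolve_target and is_elf are
     hypothetical.

```python
import os
import shutil

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(path):
    """Return True if the file starts with the ELF magic number."""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def resolve_target(name, search_path=None):
    """Return ("elf", path) or ("java", path); raise LookupError otherwise."""
    if os.path.isfile(name):
        if os.access(name, os.X_OK):
            # Executable file: it must be a valid ELF executable.
            if not is_elf(name):
                raise LookupError(f"{name} is not a valid ELF executable")
            return "elf", name
        # Non-executable file: accept only Java jar or class files.
        if name.endswith((".jar", ".class")):
            return "java", name
        # Assumption: the man page does not say what happens here.
        raise LookupError(f"{name} is neither executable nor a Java file")
    # No file of that name: search the user's path for an executable.
    found = shutil.which(name, path=search_path)
    if found is not None:
        if not is_elf(found):
            raise LookupError(f"{found} is not a valid ELF executable")
        return "elf", found
    # Last resort: try the name with .class appended.
    if os.path.isfile(name + ".class"):
        return "java", name + ".class"
    raise LookupError(f"cannot find target {name}")
```

     A "java" result corresponds to the cases in which the JVM is
     inserted as the actual target.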



OPTIONS

     If	invoked	with no	arguments, print a usage summary, includ-
     ing the default configuration of the experiment. If the pro-
     cessor supports hardware counter overflow	profiling,  print
     two  lists	 containing  information about hardware	counters.
     The first list contains "well known" hardware counters;  the
     second   list  contains  raw  hardware  counters.	For  more
     details, see the "Hardware	Counter	Overflow Profiling"  sec-
     tion below.

  Data Specifications
     -p	option
	  Collect clock-based profiling	data.  The allowed values
	  of option are:

	  Value	    Meaning

	  off	    Turn off clock-based profiling

	  on	    Turn  on  clock-based  profiling   with   the
		    default  profiling	interval of approximately
		    10 milliseconds.

	  lo[w]	    Turn on clock-based	profiling with	the  low-
		    resolution	profiling  interval  of	 approxi-
		    mately 100 milliseconds.

	  hi[gh]    Turn on clock-based	profiling with the  high-
		    resolution	profiling  interval  of	 approxi-
		    mately 1 millisecond.

	  n	    Turn  on   clock-based   profiling	 with	a
		    profiling  interval	of n.  The value n can be
		    an integer or a floating-point number, with	a
		    suffix  of u for values in microseconds, or	m
		    for	values in milliseconds.	 If no suffix  is
		    used, assume the value to be in milliseconds.

		    If the value is smaller than the  clock  pro-
		    filing  minimum, set it to the minimum; if it
		    is not a  multiple	of  the	 clock	profiling
		    resolution,	 round down to the nearest multi-
		    ple	of the clock resolution.  If  it  exceeds
		    the	clock profiling	maximum, report	an error.
		    If it is negative or zero, report  an  error.
		    If	invoked	 with  no  arguments,  report the
		    clock-profiling intervals.
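
          The rules for a numeric interval can be sketched as
          follows.  The minimum, maximum, and resolution used here
          are placeholder values chosen for illustration; the real
          limits are system-dependent.  The function name
          normalize_interval is hypothetical, and the sketch does
          not model the named values (on, lo, hi) or the optional +
          prefix.

```python
def normalize_interval(spec, min_us=500, max_us=1_000_000, res_us=100):
    """Return the profiling interval in microseconds for a numeric -p value.

    min_us, max_us and res_us are illustrative placeholders, not the
    limits of any particular system.
    """
    text = spec.strip()
    if text.endswith("u"):
        value_us = float(text[:-1])            # microseconds
    elif text.endswith("m"):
        value_us = float(text[:-1]) * 1000.0   # milliseconds
    else:
        value_us = float(text) * 1000.0        # no suffix: milliseconds
    if value_us <= 0:
        raise ValueError("interval must be positive")
    if value_us > max_us:
        raise ValueError("interval exceeds the clock-profiling maximum")
    if value_us < min_us:
        return min_us                          # raise to the minimum
    # Round down to the nearest multiple of the clock resolution.
    return int(value_us // res_us) * res_us
```

          Under these placeholder limits, a value of 1.25m is
          rounded down to 1200 microseconds, and 250u is raised to
          the 500-microsecond minimum.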

          An optional + may be prepended to the clock-profiling
          interval, specifying that collect capture dataspace
          data.  It does so by backtracking one instruction; if
          that instruction is a memory instruction, it assumes
          that the delay is attributable to that instruction and
          records the event, including the virtual and physical
          addresses of the memory reference.

          Caution must be used in interpreting clock-based
          dataspace data; the delay may be completely unrelated
          to the memory instruction that happened to precede the
          instruction with the clock-profile hit.  For example, if
          a memory instruction hits in the cache, but is in a
          loop executed many times, high counts on that
          instruction may appear to indicate memory stall delays
          when they do not.  This situation may be disambiguated
          by examining the disassembly around the instruction
          indicating the stall.  If the surrounding instructions
          also have high clock-profiling metrics, the memory
          delay is likely to be spurious.

          Clock-based dataspace profiling should be used only on
          machines that do not support hardware counter profiling
          on memory-based counters.

          See the section "DATASPACE PROFILING", below.

	  If no	 explicit  -p  off  argument  is  given,  and  no
	  hardware  counter overflow profiling is specified, turn
	  on clock-based profiling.

     -h	ctr_def...[,ctr_n_def]
          Collect hardware counter overflow profiles.  The number
          of counter definitions (ctr_def through ctr_n_def) is
          processor-dependent.  For example, on an UltraSPARC III
          system, up to two counters may be programmed; on an
          Intel Pentium IV with Hyperthreading, up to 18 counters
          are available.  The user can ascertain the maximum
          number of hardware counter definitions for profiling
          on a target system, and the full list of available
          hardware counters, by running the collect command
          without any arguments.

	  This option is now available	on  systems  running  the
	  Linux	 OS.  The  user	is responsible for installing the
	  required perfctr patch on the	system;	that patch can be
	  downloaded from:
	  http://user.it.uu.se/~mikpe/linux/perfctr/2.6/perfctr-2.6.15.tar.gz
	  Instructions for installation	are contained within that
	  tar file.

	  Each counter definition  takes  one  of  the	following
	  forms,  depending  on	 whether  attributes for hardware
	  counters are supported on the	processor:

	  1. [+]ctr[/reg#][,interval]

	  2. [+]ctr[~attr=val]...[~attrN=valN][/reg#][,interval]

	  The meanings of the counter definition options  are  as
	  follows:

	  Value	    Meaning

	  +	    Optional parameter that  can  be  applied  to
		    memory-related  counters.  Causes  collect to
		    collect dataspace  data  by	 backtracking  to
		    find the instruction that triggered	the over-
		    flow, and to find the  virtual  and	 physical
		    addresses of the memory reference. Backtrack-
		    ing	works on SPARC processors, and only  with
		    counters  of type load, store, or load-store,
		    as displayed in the	counter	list obtained  by
		    running   the  collect  command  without  any
		    command-line  arguments.   See  the	  section
		    "DATASPACE PROFILING", below.

	  ctr	    Processor-specific counter name. The user can
		    ascertain  the  list of counter names by run-
		    ning  the  collect	 command    without   any
		    command-line arguments.

          attr=val  On some processors, attribute options can be
                    associated with a hardware counter.  If the
                    processor supports attribute options, then
                    running collect without any command-line
                    arguments shows the counter definition,
                    ctr_def, in the second form listed above, and
                    provides a list of attribute names to use for
                    attr.  Value val can be in decimal or
                    hexadecimal format.  Hexadecimal numbers are
                    in C program format, where the number is
                    prefixed by a zero and a lower-case x
                    (0xhex_number).

	  reg#	    Hardware register to use for the counter.  If
		    not	 specified, collect attempts to	place the
		    counter into the first available register and
		    as	a result, might	be unable to place subse-
		    quent counters due to register conflicts.  If
		    the	user specifies more than one counter, the
		    counters must use different	 registers.   The
		    list  of  allowable	 register  numbers can be
		    ascertained	by running  the	 collect  command
		    without any	command-line arguments.

	  interval  Sampling  frequency,  set  by  defining   the
		    counter  overflow value.  Valid values are as
		    follows:

		    Value     Meaning

		    on	      Select the default rate, which  can
			      be  determined  by running the col-
			      lect command without  any	 command-
			      line   arguments.	  Note	that  the
			      default value for	all raw	 counters
			      is  the  same, and might not be the
			      most suitable value for a	 specific
			      counter.

		    hi	      Set interval  to	approximately  10
			      times shorter than on.

		    lo	      Set interval  to	approximately  10
			      times longer than	on.

		    value     Set interval to a	 specific  value,
			      specified	in decimal or hexadecimal
			      format.
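
          The two counter-definition forms can be read with a
          single pattern, since the first form is the second with
          no attributes.  The following sketch parses one
          definition.  It is an illustration of the syntax, not
          collect's own parser.  The function name
          parse_counter_def, the counter names, and the attribute
          name used in the usage note are placeholders; real names
          are processor-specific.

```python
import re

# [+]ctr[~attr=val]...[/reg#][,interval]
CTR_DEF = re.compile(
    r"^(?P<plus>\+)?"
    r"(?P<ctr>[^~/,+]+)"
    r"(?P<attrs>(?:~[^~/,=]+=[^~/,]+)*)"
    r"(?:/(?P<reg>\d+))?"
    r"(?:,(?P<interval>[^,]+))?$"
)


def parse_counter_def(text):
    """Split one counter definition into its documented parts."""
    m = CTR_DEF.match(text)
    if m is None:
        raise ValueError(f"not a counter definition: {text!r}")
    attrs = {}
    for item in filter(None, m.group("attrs").split("~")):
        name, value = item.split("=", 1)
        attrs[name] = int(value, 0)        # decimal or 0x-prefixed hex
    interval = m.group("interval")
    if interval is not None and interval not in ("on", "hi", "lo"):
        interval = int(interval, 0)        # explicit overflow value
    reg = m.group("reg")
    return {
        "dataspace": m.group("plus") is not None,
        "counter": m.group("ctr"),
        "attributes": attrs,
        "register": int(reg) if reg is not None else None,
        "interval": interval,
    }
```

          For example, parse_counter_def("+ctrA/1,hi") reports a
          dataspace request for the placeholder counter ctrA in
          register 1 at the hi rate, and
          parse_counter_def("ctrB~attrX=0x10,1000003") reports one
          attribute with the value 16 and an overflow value of
          1000003.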

	  An experiment	can specify both hardware  counter  over-
	  flow	profiling and clock-based profiling.  If hardware
	  counter overflow profiling  is  specified,  but  clock-
	  based	 profiling  is not explicitly specified, turn off
	  clock-based profiling.

	  For more information	on  hardware  counters,	 see  the
	  "Hardware Counter Overflow Profiling"	section	below.

     -s	option
	  Collect synchronization tracing data.

	  The minimum delay threshold for tracing events  is  set
	  using	option.	 The allowed values of option are:

	  Value	    Meaning

	  on	    Turn on synchronization delay tracing and set
		    the	threshold value	by calibration at runtime

	  calibrate Same as on

	  off	    Turn off synchronization delay tracing

	  n	    Turn on synchronization delay tracing with	a
		    threshold  value  of  n microseconds; if n is
		    zero, trace	all events

	  all	    Turn on  synchronization  delay  tracing  and
		    trace all synchronization events

	  By default, turn off synchronization delay tracing.

	  Record synchronization events	for  Java  monitors,  but
	  not for native synchronization within	the JVM	machine.

     -H	option
	  Collect heap trace data. The allowed values  of  option
	  are:

	  Value	    Meaning

	  on	    Turn on tracing of memory allocation requests

	  off	    Turn  off  tracing	 of   memory   allocation
		    requests

	  By default, turn off heap tracing.

	  Record heap-tracing events for any native calls.  Treat
	  calls	to mmap	as memory allocations.

	  Heap profiling is  not  supported  for  Java	programs.
	  Specifying it	is treated as an error.

	  Note that heap tracing may produce a very large experi-
	  ment.	  Such	experiments  are  very	slow  to load and
	  browse.


     -m	option

	  Collect MPI tracing data.

	  The allowed values of	option are:

	  Value	    Meaning

	  on	    Turn on tracing of MPI calls

	  off	    Turn off tracing of	MPI calls

	  By default, turn off MPI tracing.


     -c option
          Collect count data, using bit(1) instrumentation.  This
          option is available only on SPARC systems.

          The allowed values of option are:

          Value     Meaning

          on        Turn on count data

          static    Turn on simulated count data, based on the
                    assumption that every instruction was
                    executed exactly once.

          off       Turn off count data

          By default, turn off count data.  Count data may not be
          collected with any other type of data.  For count data
          or simulated count data, the executable and any
          shared objects that are instrumented and statically
          linked are counted; dynamically loaded shared objects
          are not instrumented or counted.

	  In order to collect count data, the executable must  be
	  compiled with	the -xbinopt=prepare flag.


     -r option
          Collect thread-analyzer data.

          The allowed values of option are:

          Value     Meaning

          on        Turn on thread analyzer data-race-detection
                    data

          all       Turn on all thread analyzer data

          off       Turn off thread analyzer data

          dt1,...,dtN
                    Turn on specific thread analyzer data types,
                    as named by the dt* parameters.

          The specific types of thread analyzer data that may be
          requested are:

          Value     Meaning

          race      Collect data-race data

          deadlock  Collect deadlock and potential-deadlock data

          By default, turn off all thread-analyzer data.  Thread
          Analyzer data may not be collected with any tracing
          data, but may be collected in conjunction with clock-
          or HWC-profile data.  Thread Analyzer data will
          significantly slow down the execution of the target,
          and profiles may not be meaningful as applied to the
          user code.

          Thread Analyzer experiments can be examined with either
          analyzer or with tha.  The latter will present a
          simplified list of default tabs, but is otherwise
          identical.

          In order to enable data-race detection, executables
          must be instrumented, either at compile time, or by
          invoking a postprocessor.  If the target is not
          instrumented, and none of the shared objects on its
          library list is instrumented, a warning will be
          displayed, but the experiment will be run.  Other
          Thread Analyzer data do not require instrumentation.

          For OpenMP race-detection, a new version of libmtsk.so
          is needed; at FCS time, a patch for that version will
          be issued, but prior to that a copy of that library is
          installed with the bits, and will be automatically
          picked up by collect.

          See the tha(1) man page for more detail.

     -S interval
          Collect periodic samples at the interval specified (in
          seconds).  Record data samples from the process, and
          include a timestamp and execution statistics from the
          kernel, among other things.  The allowed values of
          interval are:

          Value     Meaning

          off       Turn off periodic sampling

          on        Turn on periodic sampling with the default
                    sampling interval (1 second)

          n         Turn on periodic sampling with a sampling
                    interval of n in seconds; n must be positive.

	  By default, turn on periodic sampling.

	  If no	data specification arguments are  supplied,  col-
	  lect	clock-based  profiling	data,  using  the default
	  resolution.

	  If clock-based profiling is  explicitly  disabled,  and
	  neither  hardware-counter  overflow  profiling  nor any
	  kind of tracing is enabled, display a	warning	 that  no
	  function-level  data	is  being collected, then execute
	  the target and record	global data.

  Experiment Controls
     -L	size
	  Limit	the amount of profiling	and tracing data recorded
	  to size megabytes.  The limit	applies	to the sum of all
	  profiling data and tracing  data,  but  not  to  sample
	  points.  The	limit  is  only	 approximate,  and can be
	  exceeded.  When the limit is	reached,  stop	profiling
	  and  tracing	data,  but  keep  the experiment open and
	  record samples until	the  target  process  terminates.
	  The allowed values of	size are:

	  Value	    Meaning

	  unlimited or none
		    Do not impose a size limit on the experiment

          n         Impose a limit of n MB; n must be greater
                    than zero.

	  The default limit on the amount  of  data  recorded  is
	  2000 Mbytes.

     -F	option
	  Control whether or not descendant processes should have
	  their	data recorded.	The allowed values of option are:

	  Value	    Meaning

	  on	    Record experiments	on  descendant	processes
		    from fork and exec

	  all	    Record   experiments   on	all    descendant
		    processes

	  off	    Do	not  record  experiments  on   descendant
		    processes

          =<regex>  Record experiments on all descendant
                    processes whose executable name (a.out name)
                    or lineage matches the regular expression.

	  By default, do not record  descendant	 processes.   For
	  more	details, users should read the section "FOLLOWING
	  DESCENDANT PROCESSES", below.

     -A	option
	  Control whether or not load objects used by the  target
	  process  should be archived or copied	into the recorded
	  experiment.  The allowed values of option are:

	  Value	    Meaning

	  on	    Archive load objects into the experiment.

	  off	    Do not archive load	objects	into the  experi-
		    ment.

	  copy	    Copy and archive load objects into the exper-
		    iment.

	  A  user  that	 copies	 experiments  onto  a	different
	  machine,  or	reads  the  experiments	 from a	different
	  machine, should specify -A copy.  Note  that	doing  so
	  does	not  copy  any sources or object files;	it is the
	  user's responsibility	to ensure that	those  files  are
	  accessible on	the machine where the experiment is being
	  run.

	  The default setting for -A is	on.

     -j	option
	  Control  Java	 profiling  when  the  target  is  a  JVM
	  machine. The allowed values of option	are:

	  Value	    Meaning

	  on	    Record profiling data for  the  JVM	 machine,
		    and	 recognize  methods  compiled by the Java
		    HotSpot[TM]	virtual	machine, and also  record
		    Java callstacks.

	  off	    Do not record Java profiling data.

	  <path>    Record profiling data for the  JVM,	 and  use
		    the	JVM as installed in <path>.

	  See the section "JAVA	PROFILING", below.

	  The user must	use -j on to obtain profiling data if the
	  target  is  a	 JVM  machine.	 The  -j on option is not
	  needed if the	target is a class or jar file.	Users  on
	  a  64-bit  JVM machine must specify its path explicitly
	  as the target; do not	use the	-d64 option for	a  32-bit
	  JVM machine.	If the -j on option is specified, but the
	  target is not	a JVM machine, an invalid  argument might
	  be passed to the target, and no data would be	recorded.
	  The collect command validates	the version  of	 the  JVM
	  machine specified for	Java profiling.

     -J java_arg
          Specify a single argument to be passed to the JVM used
          for profiling.  If -J is specified, but Java profiling
          is not specified, an error is generated, and no
          experiment is run.  The argument is passed as a single
          argument to the JVM.  If multiple arguments are needed,
          do not use -J, but rather specify the path to the JVM
          explicitly, use -j on, and add the arguments for the
          JVM after the path to the JVM.

     -l	signal
	  Record a sample point	 whenever  the	given  signal  is
	  delivered to the process.

     -y	signal[,r]
	  Control recording of data with  signal.   Whenever  the
	  given	 signal	 is  delivered	to  the	 process,  switch
	  between paused (no data is recorded) and resumed  (data
	  is  recorded)	states.	Start in the resumed state if the
	  optional ,r flag  is	given,	otherwise  start  in  the
	  paused  state.  This option does not affect the record-
	  ing of sample	points.

  Output Controls
     -o	experiment_name
	  Use experiment_name as the name of the experiment to be
	  recorded.   The  experiment_name must	end in the string
	  .er; if not, print an	error message and do not run  the
	  experiment.

	  If -o	is not specified, give the experiment a	 name  of
	  the  form stem.n.er, where stem is a string, and n is	a
	  number.  If a	group name has been  specified	with  -g,
	  set  stem to the group name without the .erg suffix. If
	  no group name	has  been  specified,  set  stem  to  the
	  string "test".

	  If invoked from one of the commands  used  to	 run  MPI
	  jobs	and -o is not specified, take the value	of n used
	  in the name  from  the  environment  variable	 used  to
	  define  the  MPI rank	of that	process. Otherwise, set	n
	  to one greater than the highest  integer  currently  in
	  use.

	  If the name is not specified in the form stem.n.er, and
	  the given name is in use, print an error message and do
	  not run the experiment.  If the name	is  of	the  form
	  stem.n.er  and  the name supplied is in use, record the
	  experiment under a name corresponding	 to  one  greater
	  than	the  highest value of n	that is	currently in use.
	  Print	a warning if the name is changed.
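
          The naming rules above can be sketched as a small
          function over the set of names already in use.  This is
          an illustration only.  The function name
          choose_experiment_name is hypothetical, the MPI rank
          case is not modelled, and numbering from 1 when no
          matching experiment exists is an assumption.

```python
import re


def choose_experiment_name(existing, requested=None, group=None):
    """Pick an experiment name given the names already in use."""

    def next_free(stem):
        # One greater than the highest n among existing stem.n.er names.
        pattern = re.compile(re.escape(stem) + r"\.(\d+)\.er$")
        used = [int(m.group(1)) for m in map(pattern.match, existing) if m]
        return f"{stem}.{max(used, default=0) + 1}.er"

    if requested is None:
        # No -o: the stem is the group name without .erg, else "test".
        stem = group[: -len(".erg")] if group else "test"
        return next_free(stem)
    if not requested.endswith(".er"):
        raise ValueError("experiment name must end in .er")
    if requested not in existing:
        return requested
    m = re.match(r"(.+)\.\d+\.er$", requested)
    if m is None:
        # In use and not of the form stem.n.er: error, do not run.
        raise ValueError("experiment name is already in use")
    # In use and of the form stem.n.er: renumber (collect warns here).
    return next_free(m.group(1))
```

          With test.1.er and test.4.er present and no -o, the
          sketch returns test.5.er; a requested run.2.er that is
          already in use becomes run.3.er.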

     -d	directory_name
	  Place	the experiment in directory  directory_name.   If
	  no  directory	 is  given,  place  the	experiment in the
	  current working directory.  If  a  group  is	specified
	  (see	-g, below), the	group file is also written to the
	  directory named by -d.

     -g	group_name
	  Add the experiment to	the experiment group  group_name.
	  The  group_name  string must end in the string .erg; if
	  not, report an error and do not run the experiment.
          The first line of a group file must contain the string
               #analyzer experiment group
          and each subsequent line is the name of an experiment.
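
          A group file in this format can also be prepared by
          hand or by a script.  The following sketch writes one;
          the function name write_group_file and the experiment
          names in the usage note are placeholders.

```python
def write_group_file(path, experiments):
    """Write an experiment group file listing the given experiments."""
    if not path.endswith(".erg"):
        raise ValueError("group name must end in .erg")
    with open(path, "w") as f:
        # The first line must be this exact header string.
        f.write("#analyzer experiment group\n")
        # Each subsequent line names one experiment.
        for name in experiments:
            f.write(name + "\n")
```

          For example, write_group_file("nightly.erg",
          ["test.1.er", "test.2.er"]) produces a three-line file:
          the header followed by the two experiment names.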

     -O	file
	  Append all output from  collect  itself  to  the  named
	  file,	 but  do not redirect the output from the spawned
	  target.  If file is set to /dev/null suppress	all  out-
	  put from collect, including any error	messages.

     -t duration
          Collect data for the specified duration.  duration may
          be a single number, followed by either m, specifying
          minutes, or s, specifying seconds (default), or two
          such numbers separated by a - sign.  If one number is
          given, data will be collected from the start of the run
          until the given time; if two numbers are given, data
          will be collected from the first time to the second.
          If the second time is zero, data will be collected
          until the end of the run.  If two non-zero numbers are
          given, the first must be less than the second.
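
          The duration syntax can be sketched as follows.  The
          function name parse_duration is hypothetical, and the
          sketch assumes whole numbers, which the description
          above does not state.

```python
def parse_duration(spec):
    """Return (start_seconds, end_seconds) for a -t value.

    end_seconds is None when data is collected until the run ends.
    """

    def seconds(token):
        if token.endswith("m"):
            return int(token[:-1]) * 60    # minutes
        if token.endswith("s"):
            return int(token[:-1])         # explicit seconds
        return int(token)                  # default unit: seconds

    if "-" in spec:
        first, second = (seconds(t) for t in spec.split("-", 1))
        if second == 0:
            return first, None             # until the end of the run
        if first >= second:
            raise ValueError("start time must be less than end time")
        return first, second
    # A single number: from the start of the run until that time.
    return 0, seconds(spec)
```

          Under these assumptions, 30s-5m yields the window from
          30 to 300 seconds, and 1m-0 starts at 60 seconds and
          continues until the target exits.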

  Other	Arguments
     -P <pid>
          Write a script for dbx to attach to the process with
          the given PID, and collect data from it.  Only
          profiling data, not tracing data, may be specified, and
          timed runs (-t) are not supported.

     -C	comment
	  Put the comment into the notes file for the experiment.
	  Up to	ten -C arguments may be	supplied.

     -n	  Dry run: do not run  the  target,  but  print	 all  the
	  details  of  the experiment that would be run.  Turn on
	  -v.

     -R	  Display the  text  version  of  the  performance  tools
	  README  in  the  terminal  window. If	the README is not
	  found, print a warning.  Do not examine  further  argu-
	  ments	and do no further processing.

     -V	  Print	the current  version.	Do  not	 examine  further
	  arguments and	do no further processing.

     -v	  Print	the current version and	further	detailed informa-
	  tion about the experiment being run.

     -x   Leave the target process stopped on the exit from the
          exec system call, in order to allow a debugger to
          attach to it.  The collect command will print a message
          with the process PID.

          To attach a debugger to the target once it is stopped
          by collect, the user must follow the procedure below.

          - Obtain the PID of the process from the message
            printed by the collect -x command

          - Start the debugger

          - Configure the debugger to ignore SIGPROF and, if you
            chose to collect hardware counter data, SIGEMT on
            Solaris or SIGIO on Linux

	  - Attach to the process using	the PID.

	  As the process runs under the	control	of the	debugger,
	  the Collector	records	an experiment.


FOLLOWING DESCENDANT PROCESSES

     Processes can create descendant processes by calling a  sys-
     tem  library  function.  The  Collector can collect data for
     descendant	 processes  initiated  by   calls   to	 fork(2),
     fork1(2),	fork(3F), vfork(2), and	exec(2)	and its	variants.
     The call to vfork is replaced internally by a call	to fork1.
     The  Collector  ignores  calls  to	 system(3C),  system(3F),
     sh(3F), popen(3C),	and similar functions, and their  associ-
     ated  descendant  processes.  If the -F on	argument is used,
     the Collector opens a new	experiment  for	 each  descendant
     process  inside the parent	experiment. These new experiments
     are named with their lineage as follows:

     - An underscore is	 appended  to  the  creator's  experiment
       name.

     - A code letter is	added: either "f" for a	fork, or "x"  for
       an exec.

     - A number	is added after the  code  letter,  which  is  the
       index  of  the fork or exec. The	assignment of this number
       is applied whether the process was started successfully or
       not.

     For example, if the experiment name for the initial  process
     is	 "test.1.er",  the  experiment for the descendant process
     created by	its third fork	is  "test.1.er/_f3.er".	 If  that
     descendant	 process  execs	 a  new	 image,	the corresponding
     experiment	name is	"test.1.er/_f3_x1.er".

     If	the -F all argument is used,  all  descendants	are  fol-
     lowed,  including those from system(3C), system(3F), sh(3F),
     popen(3C),	and similar functions.	 Those	descendants  that
     are  processed by -F all but not by -F on are named with the
     code letter "c".
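
     The naming scheme can be expressed as a short function that
     builds a descendant experiment path from its lineage.  This is
     a sketch of the documented convention; the function name
     descendant_experiment and the (code, index) representation of
     a lineage are hypothetical.

```python
def descendant_experiment(founder, lineage):
    """Build the path of a descendant experiment inside the founder.

    lineage is a sequence of (code, index) pairs, oldest first, where
    code is "f" (fork), "x" (exec) or "c" (other descendants followed
    only by -F all).
    """
    for code, _ in lineage:
        if code not in ("f", "x", "c"):
            raise ValueError(f"unknown lineage code: {code!r}")
    # Each generation appends an underscore, a code letter and an index.
    suffix = "".join(f"_{code}{index}" for code, index in lineage)
    return f"{founder}/{suffix}.er"
```

     For the example above, descendant_experiment("test.1.er",
     [("f", 3), ("x", 1)]) returns "test.1.er/_f3_x1.er".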

     If the -F =<regex> argument is used, all descendants whose
     name or lineage matches the regular expression are followed.


     The Analyzer and er_print automatically read experiments for
     descendant	 processes  when  the founder experiment is read,
     but the experiments for the  descendant  processes	 are  not
     selected for data display.

     To	select the  data  for  display	from  the  command  line,
     specify  the  path	 name  explicitly  to  either er_print or
     Analyzer. The specified path must include the founder exper-
     iment  name, and the descendant experiment's name inside the
     founder directory.

     For example, to see the data  for	the  third  fork  of  the
     test.1.er experiment:
	       er_print	test.1.er/_f3.er
	       analyzer	test.1.er/_f3.er

     You can prepare an	experiment group file with  the	 explicit
     names of descendant experiments of	interest.

     To	examine	descendant processes in	the  Analyzer,	load  the
     founder  experiment  and  select "Filter data" from the View
     menu. The analyzer	will display a list of	experiments  with
     only  the	founder	 experiment  checked. Uncheck the founder
     experiment	and check the descendant experiment of interest.


JAVA PROFILING

     Java profiling consists of	collecting a performance  experi-
     ment on the JVM machine as	it runs	the user's .class or .jar
     files.  If	possible, callstacks are collected  in	both  the
     Java model	and in the machine model.

     Data can be shown with view mode set  to  User,  Expert,  or
     Machine.  User mode shows each method by name, with data for
     interpreted   and	 HotSpot-compiled   methods    aggregated
     together; it also suppresses data for non-user-Java threads.
     Expert mode separates HotSpot-compiled methods  from  inter-
     preted methods, and does not suppress non-user Java threads.
     Machine mode shows	data for interpreted Java methods against
     the  JVM machine as it does the interpreting, while data for
     methods compiled with the Java HotSpot  virtual  machine  is
     reported  for named methods.  All threads are shown.  In all
     three modes, data is reported in the usual	way for	any  non-
     OpenMP  C,	 C++,  or  Fortran  code called	by a Java target.
     Such code corresponds to Java native methods.  The	 Analyzer
     and  the  er_print	 utility can switch between the	view mode
     User, view	mode Expert, and view  mode  Machine,  with  User
     being the default.

     Clock-based profiling and hardware	counter	overflow  profil-
     ing  are  supported.   Synchronization tracing collects data
     only on the Java monitor calls,  and  synchronization  calls
     from  native  code;  it does not collect data about internal
     synchronization calls within the JVM.

     Heap tracing is not supported for	Java,  and  generates  an
     error if specified.

     When collect inserts a target name	of java	into the argument
     list,  it	examines  environment variables	for a path to the
     java target, in the order JDK_HOME, and then JAVA_PATH.  For
     the  first	 of  these environment variables that is set, the
     resultant target is verified as an	ELF executable.	If it  is
     not,  collect  fails with an error	indicating which environ-
     ment variable was used, and the  full  path  name	that  was
     tried.

     If	none of	those environment variables is set,  the  collect
     command uses the default path where the Java[TM] 2	Platform,
     Standard Edition technology was installed with the	 release,
     if	 any,  and  if it was not installed, as	set by the user's
     PATH.

     Java profiling requires the Java[TM] 2 SDK, version 1.5.0_03
     or later; some earlier versions (but no earlier than the
     Java[TM] 2 SDK, version 1.4.2_02) may work, but are not
     supported.


OPENMP PROFILING

     Data collection for OpenMP	programs collects data	that  can
     be	 displayed  in	any  of	the three view modes, just as for
     Java programs.  The presentation is identical for user  mode
     and  expert  mode.	  Slave	threads	are shown as if	they were
     really forked from	the master thread, and have  call  stacks
     matching  the master thread. Frames in the	call stack coming
     from the OpenMP runtime code  (libmtsk.so)	 are  suppressed.
     For machine mode, the actual native stacks	are shown.

     In	user mode, various artificial functions	are introduced as
     the  leaf	function  of  a	 callstack  whenever  the runtime
     library is	in one of several states.   These  functions  are
     <OMP-overhead>,	 <OMP-idle>,	<OMP-reduction>,    <OMP-
     implicit_barrier>,	<OMP-explicit_barrier>,	 <OMP-lock_wait>,
     <OMP-critical_section_wait>, and <OMP-ordered_section_wait>.

     Two additional clock-profiling metrics are added to the data
     for clock-profiling experiments: OMP Work and OMP Wait.
     OMP Work is counted when the OpenMP runtime thinks the code
     is doing work.  It includes time when the process is
     consuming User-CPU time, but it may also include time when
     the process is consuming System-CPU time, waiting for page
     faults, waiting for the CPU, and so on.  Hence, OMP Work may
     exceed User-CPU time.  OMP Wait is accumulated when the
     OpenMP runtime thinks the process is waiting.  It may
     include User-CPU time for busy-waits (spin-waits), but it
     also includes Other-Wait time for sleep-waits.

     The inclusive metrics are visible by default; the exclusive
     metrics are not.  Together, the sum of those two metrics
     equals the Total LWP Time metric.  These metrics are added
     for all clock- and hardware-counter experiments.


DATASPACE PROFILING

     A dataspace profile is a data collection  in  which  memory-
     related  events,  such as cache misses, are reported against
     the data object references	that cause the events rather than
     just the instructions where the memory-related events occur.
     Dataspace profiling is not	available on systems running  the
     Linux OS, nor on x86 based	systems	running	the Solaris OS.

     To allow dataspace profiling, the target may be written in
     C, C++ or Fortran, and must be compiled for SPARC
     architecture, with the -xhwcprof -xdebugformat=dwarf -g
     flags, as described above.  Furthermore, the data collected must be
     hardware  counter	profiles  and  the  optional  +	 must  be
     prepended	to  the	 counter  name.	  If  the  optional  + is
     prepended to one memory-related counter, but  not	all,  the
     counters  without	the  + will report dataspace data against
     the <Unknown> data	object,	with subtype (Dataspace	data  not
     requested during data collection).

     With the data collected, the er_print utility  allows  three
     additional	  commands:    data_objects,   data_single,   and
     data_layout, as well as various commands relating to  Memory
     Objects.  See the er_print(1) man page for	more information.

     In	addition, the Analyzer now includes two	tabs  related  to
     dataspace	profiling, labeled DataObjects and DataLayout, as
     well as a set of tabs relating to Memory Objects.	 See  the
     analyzer(1) man page for more information.

     Clock-based dataspace profiling should only be used on
     machines that do not support HW counter profiling with
     memory-based counters.  It requires the same compilation
     flags as for HW counter profiling.  Data should be
     interpreted with care, as explained above.


USING COLLECT WITH MPI

     The collect command can be	used with MPI by simply	prefacing
     the  target  and  its arguments with the collect command and
     its arguments in the command line that starts the	MPI  job.
     For example, on an	SMP machine,
	  % mprun -np 16 a.out 3 5
     can be replaced by
	  % mprun -np 16 collect -m  on	 -d  /tmp/mydirectory  -g
	  run1.erg a.out 3 5
     This command runs an MPI tracing experiment on each  of  the
     16	 MPI  processes, collecting them all in	a specific direc-
     tory, and collecting them as a group.  The	individual exper-
     iments  are  named	by the MPI rank, as described above under
     the -o option.  The experiments, as specified above, contain
     clock-based  profiling  data, which is turned on by default,
     and MPI tracing data.

     On	a cluster, local file systems like /tmp	may be private to
     a	node.	If experiments are collected on	node-private file
     systems, you should gather	those experiments to  a	 globally
     visible  file  system  after the experiments have completed,
     and edit any group	file to	reflect	the new	location of those
     experiments.
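
     Editing a group file after gathering the experiments can be
     sketched as a simple path rewrite.  The sketch assumes that a
     group file is a text file with one experiment path per line;
     that layout, the function name, and the directory names in
     the usage note are assumptions for illustration.  Lines that
     do not start with the old directory (such as a header line)
     are passed through unchanged.

```python
def relocate_group(group_text, old_dir, new_dir):
    """Rewrite experiment paths in a group file from old_dir to new_dir."""
    old_prefix = old_dir.rstrip("/") + "/"
    new_prefix = new_dir.rstrip("/") + "/"
    out = []
    for line in group_text.splitlines():
        if line.startswith(old_prefix):
            # Keep the experiment name, swap only the directory part.
            line = new_prefix + line[len(old_prefix):]
        out.append(line)
    return "\n".join(out) + "\n"
```

     For example, relocate_group(text, "/tmp/mydirectory",
     "/export/shared/exp") would point every entry that was
     recorded under /tmp/mydirectory at the hypothetical shared
     directory /export/shared/exp.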


USING COLLECT WITH PPGSZ

     The collect command can be	used with ppgsz	 by  running  the
     collect  command on the ppgsz command, and	specifying the -F
     on	flag.  The founder experiment is on the	ppgsz  executable
     and is uninteresting.  If your path finds the 32-bit version
     of	ppgsz, and the experiment is being run on a  system  that
     supports  64-bit processes, the first thing the collect com-
     mand does is execute an exec function on its 64-bit version,
     creating _x1.er.  That executable forks, creating _x1_f1.er.
     The descendant process attempts to	execute	an exec	 function
     on	 the  named  target, in	the first directory on your path,
     then in the second, and so	forth,	until  one  of	the  exec
     functions	succeeds.   If,	 for  example,	the third attempt
     succeeds, the first two  descendant  experiments  are  named
     _x1_f1_x1.er  and	_x1_f1_x2.er,  and  both  are  completely
     empty.  The experiment on the target is  the  one	from  the
     successful	 exec, the third one in	the example, and is named
     _x1_f1_x3.er, stored under	the founder experiment.	  It  can
     be	 processed  directly  by  invoking  the	 Analyzer  or the
     er_print utility on test.1.er/_x1_f1_x3.er.

     If	the 64-bit ppgsz is the	initial	process	run,  or  if  the
     32-bit ppgsz is invoked on	a 32-bit kernel, the fork descen-
     dant that executes	exec on	the real target	has its	 data  in
     _f1.er,  and  the	real target's experiment is in _f1_x3.er,
     assuming the same path properties as in the example above.
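
     The naming pattern in the two cases above can be written
     down as a small helper.  The function and parameter names
     are hypothetical; the name components (_x1 for the re-exec
     of the 64-bit ppgsz, _f1 for the fork, _xN for the Nth exec
     attempt) are taken from the description above.

```python
def _lineage_prefix(reexec_64bit):
    """Prefix shared by all exec attempts made by the forked ppgsz."""
    # "_x1" appears only when the 32-bit ppgsz first execs its 64-bit version.
    return ("_x1" if reexec_64bit else "") + "_f1"


def target_experiment(successful_attempt, reexec_64bit):
    """Name of the descendant experiment that holds the real target's data."""
    if successful_attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return f"{_lineage_prefix(reexec_64bit)}_x{successful_attempt}.er"


def empty_experiments(successful_attempt, reexec_64bit):
    """Names of the empty experiments left by the failed exec attempts."""
    prefix = _lineage_prefix(reexec_64bit)
    return [f"{prefix}_x{n}.er" for n in range(1, successful_attempt)]
```

     With the third directory on the path succeeding,
     target_experiment(3, True) gives _x1_f1_x3.er and
     target_experiment(3, False) gives _f1_x3.er.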

     See the section  "FOLLOWING  DESCENDANT  PROCESSES",  above.
     For more information on hardware counters,	see the	"Hardware
     Counter Overflow Profiling" section below.

     The collect command operates by inserting a shared	 library,
     libcollector.so,	 into	 the   target's	  address   space
     (LD_PRELOAD),  and	 by  using  a  second	shared	 library,
     collaudit.so,  to	record shared-object use with the runtime
     linker's  audit  interface	 (LD_AUDIT).   Those  two  shared
     libraries write the files that constitute the experiment.

     Several problems may arise if collect is invoked on
     executables that call setuid or setgid, or that create descendant
     processes that call setuid	or setgid.  If the  user  running
     the  experiment  is  not  root, collection	fails because the
     shared libraries are not installed	in a  trusted  directory.
     The workaround is to run the experiments as root.

     In	addition, the umask for	the user running the collect com-
     mand  must	 be  set to allow write	permission for that user,
     and  for  any  users  or  groups  that  are   set	 by   the
     setuid/setgid  attributes	of a program being exec'd and for
     any user or group to which	that program sets itself.  If the
     mask  is  not set properly, some files may	not be written to
     the experiment, and processing of the experiment may not  be
     possible.	If the log file	can be written,	an error is shown
     when the user attempts to process the experiment.
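
     A quick way to see which permission classes a given umask
     prevents from writing is sketched below.  The function name
     is hypothetical.  Which classes actually matter depends on
     the users and groups that the setuid/setgid programs in the
     experiment run as, which this sketch does not try to
     determine.

```python
import stat


def write_bits_blocked(mask):
    """List the permission classes whose write bit the umask clears."""
    classes = (
        ("user", stat.S_IWUSR),
        ("group", stat.S_IWGRP),
        ("other", stat.S_IWOTH),
    )
    return [name for name, bit in classes if mask & bit]
```

     A common umask of 022 blocks write permission for group and
     other, so files created by a process running as a different
     user may not be writable in the experiment directory.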

     Other problems can arise if the target itself makes any of
     the system calls to set UID or GID, or if it changes its
     umask and then forks or runs exec on some other process, or
     if crle was used to configure how the runtime linker searches
     for shared objects.



DATA COLLECTED

     Three types of data are collected:	profiling  data,  tracing
     data and sampling data. The data packets recorded in profil-
     ing and tracing include the callstack of each LWP,	the  LWP,
     thread  and  CPU IDs, and some event-specific data. The data
     packets recorded in sampling contain  global  data	 such  as
     execution	statistics,  but  no  program-specific	or event-
     specific data. All	data packets include a timestamp.

     Clock-based Profiling
	  The event-specific data recorded in clock-based profil-
	  ing  is  an  array of	counts for each	accounting micro-
	  state. The microstate	array is incremented by	the  sys-
	  tem  at  a prescribed	frequency, and is recorded by the
	  Collector when a profiling signal is processed.

          Clock-based profiling can run at a range of frequencies
          which must be multiples of the clock resolution used
          for the profiling timer.  If you try to do
          high-resolution profiling on a machine with an operating
	  system that does not support it, the command	prints	a
	  warning  message  and	 uses the highest resolution sup-
	  ported. Similarly, a custom setting that is not a  mul-
	  tiple	 of  the  resolution  supported	 by the	system is
	  rounded down to the nearest non-zero multiple	 of  that
	  resolution, and a warning message is printed.
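
          The rounding rule can be illustrated with a short
          calculation.  The function name and the use of
          microseconds as the unit are assumptions made for the
          example; the rule itself (round down to the nearest
          non-zero multiple of the resolution, and warn when the
          value changes) is the one stated above.

```python
def effective_interval(requested_us, resolution_us):
    """Return (interval actually used, whether a warning would be printed)."""
    if requested_us <= 0 or resolution_us <= 0:
        raise ValueError("intervals must be positive")
    # Round down to a whole number of ticks, but never below one tick.
    ticks = max(1, requested_us // resolution_us)
    used = ticks * resolution_us
    return used, used != requested_us
```

          With a hypothetical resolution of 1000 microseconds, a
          request for 10500 is rounded down to 10000, and a
          request for 500 is raised to the single-tick minimum of
          1000; both cases would produce a warning.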

	  Clock-based  profiling  data	is  converted  into   the
	  following metrics:

	       User CPU	Time
	       Wall Time
	       Total LWP Time
	       System CPU Time
	       Wait CPU	Time
	       User Lock Time
	       Text Page Fault Time
	       Data Page Fault Time
	       Other Wait Time

	  For experiments on multithreaded applications,  all  of
	  the  times, other than Wall Time, are	summed across all
	  LWPs in the process;	Wall Time is the  time	spent  in
	  all  states  for LWP 1 only.	Total LWP Time adds up to
	  the real elapsed time, multiplied by the average number
	  of LWPs in the process.
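
          The relationship between the summed metrics and Wall
          Time can be checked with a small calculation.  The data
          layout (a dictionary of per-LWP state times in seconds)
          and the function name are assumptions for illustration;
          the numbers in the usage note are made up.

```python
def summarize(lwp_times):
    """lwp_times maps an LWP id to {metric name: seconds in that state}."""
    totals = {}
    for states in lwp_times.values():
        for state, seconds in states.items():
            # Every metric except Wall Time is summed across all LWPs.
            totals[state] = totals.get(state, 0.0) + seconds
    total_lwp_time = sum(totals.values())
    # Wall Time is the time spent in all states for LWP 1 only.
    wall_time = sum(lwp_times[1].values())
    return totals, total_lwp_time, wall_time
```

          For two LWPs that both live for the whole of a 4-second
          run, Total LWP Time comes out as 8 seconds, that is,
          the elapsed time multiplied by an average of two LWPs.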

          If clock-based dataspace profiling is specified, an
          additional metric is provided:

               Max. Mem Stalls

     Hardware Counter Overflow Profiling
	  Hardware counter overflow profiling records the  number
	  of  events  counted by the hardware counter at the time
	  the overflow signal was processed. This type of profil-
	  ing  is  now available on systems running the	Linux OS,
	  provided that	they have the Perfctr patch installed.

	  Hardware counter overflow profiling can be done on sys-
	  tems	that  support overflow profiling and that include
          the hardware counter shared library, libcpc.so(3).  You
          must use a version of the Solaris OS no earlier than
          the Solaris 8 OS.  On UltraSPARC[R] computers, you must
	  use  a  version  of  the  hardware  no earlier than the
	  UltraSPARC III hardware.  On computers that do not sup-
	  port	overflow profiling, an attempt to select hardware
	  counter overflow profiling generates an error.

	  The counters available depend	on the specific	CPU  pro-
	  cessor  and  operating system. Running the collect com-
	  mand with no arguments prints	out a usage message  that
	  contains  the	names of the counters.	The counters that
	  are considered well-known are	displayed  first  in  the
	  list,	followed by a list of the raw hardware counters.

	  The lines of output are formatted similar to	the  fol-
	  lowing:

	    Well known HW counters available for profiling:
	      cycles[/{0|1}],9999991 ('CPU Cycles', alias for Cycle_cnt; CPU-cycles)
	      insts[/{0|1}],9999991 ('Instructions Executed', alias for	Instr_cnt; events)
	      dcrm[/1],100003 ('D$ Read	Misses', alias for DC_rd_miss; load events)
	      ...
	    Raw	HW counters available for profiling:
	      Cycle_cnt[/{0|1}],1000003	(CPU-cycles)
	      Instr_cnt[/{0|1}],1000003	(events)
	      DC_rd[/0],1000003	(load events)
	      SI_snoop[/0],1000003 (not-program-related	events)
	      ...

	  In the first line of the well-known counter output, the
	  first	 field,	 "cycles",  gives  the well-known counter
	  name that can	be used	in the -h counter... argument. It
	  is  followed	by a specification of which registers can
	  be used for that counter.  The next  field,  "9999991",
	  is  the  default  overflow value for that counter.  The
	  following field in parentheses, "CPU	Cycles",  is  the
	  metric name, followed	by the raw hardware counter name.
          The last field, "CPU-cycles", specifies the type of
	  units	 being counted.	 There can be up to two	words for
	  the type of information.  The	second or  only	 word  of
	  the  type  information  may  be  either "CPU-cycles" or
	  "events".  If	the counter can	 be  used  to  provide	a
	  time-based  metric,  the value is CPU-cycles;	otherwise
	  it is	events.

	  The second output line of the	well-known counter output
	  above	 has  "events" instead of "CPU-cycles" at the end
          of the line, indicating that it counts events, and
          cannot be converted to a time.

	  The third output line	 above	has  two  words	 of  type
	  information, "load events", at the end of the	line. The
	  first	word of	type information  may have the	value  of
	  "load",   "store",   "load-store",   or   "not-program-
	  related". The	first three of these type values indicate
	  that the counter is memory-related and the counter name
	  can be preceded by the "+" sign when used in the   col-
	  lect	-h   command.  The "+" sign indicates the request
	  for data collection to  attempt  to  find  the  precise
	  instruction  and  virtual address that caused	the event
	  on the counter that overflowed.

	  The  "not-program-related"  value  indicates	that  the
	  counter  captures  events  initiated by some other pro-
	  gram,	 such  as  CPU-to-CPU  cache  snoops.  Using  the
	  counter for profiling	generates a warning and	profiling
	  does not record a call stack.	It  does,  however,  show
	  the  time  being spent in an artificial function called
	  "collector_not_program_related". Thread IDs and LWP IDs
	  are recorded,	but are	meaningless.

	  The information included in the  raw	hardware  counter
	  list	is  a subset of	the well-known counter list. Each
	  line includes	the internal counter name as used by cpu-
	  track(1),  the register number(s) on which that counter
	  can be  used,	 the  default  overflow	 value,	 and  the
          counter units, which are either CPU-cycles or events.
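
          The field layout described above can be made concrete
          with a small parser for one line of the usage output.
          This is a sketch based only on the sample lines shown
          earlier; the function name and the returned dictionary
          keys are hypothetical, and real output may contain
          forms that the pattern does not cover.

```python
import re

LINE_RE = re.compile(
    r"^\s*(?P<name>[^\[,\s]+)"       # counter name
    r"(?:\[/(?P<regs>[^\]]+)\])?"    # optional register list, e.g. [/{0|1}]
    r",(?P<overflow>\d+)"            # default overflow value
    r"\s*\((?P<info>.*)\)\s*$"       # parenthesized description
)
ALIAS_RE = re.compile(r"'(?P<metric>[^']*)',\s*alias for\s+(?P<raw>\S+)")
MEMORY_TYPES = {"load", "store", "load-store"}


def parse_counter_line(line):
    """Split one counter line into its documented fields."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognized counter line: {line!r}")
    regs = m.group("regs")
    registers = [int(r) for r in regs.strip("{}").split("|")] if regs else []
    info = m.group("info")
    metric = raw = None
    if ";" in info:
        # Well-known counters: 'Metric Name', alias for Raw_name; units
        desc, info = info.rsplit(";", 1)
        a = ALIAS_RE.search(desc)
        if a:
            metric, raw = a.group("metric"), a.group("raw")
    words = info.split()
    return {
        "name": m.group("name"),
        "registers": registers,
        "overflow": int(m.group("overflow")),
        "metric": metric,
        "alias_for": raw,
        "units": words[-1],
        # Only load, store, and load-store counters accept the "+" prefix.
        "memory_related": len(words) == 2 and words[0] in MEMORY_TYPES,
    }
```

          Applied to the sample line for dcrm, the parser reports
          register 1, a default overflow value of 100003, and a
          memory-related counter with units of events.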

	  EXAMPLES:

	  Example 1: Using  the	 well-known  counter  information
	  listed  in  the above	sample output, the following com-
	  mand:

	       collect -h cycles/0,hi,+dcrm,9999

	  enables the CPU Cycle	profiling on register 0. The "hi"
	  value	 enables  a  sample rate that is approximately 10
	  times	faster than the	 default  rate	of  9999991.  The
	  "dcrm"  value	 enables  the  D$  Read	Miss profiling on
	  register 1 and the preceding "+" enables Dataspace pro-
	  filing for the dcrm. The "9999" value	sets the sampling
	  to be	done every  9999  read	misses,	 instead  of  the
	  default value	of every 100003	read misses.

	  Example 2:

	  Running the collect command with no arguments	on an AMD
	  Opteron  machine  would  produce a raw hardware counter
          output similar to the following:

		FP_dispatched_fpu_ops[/{0|1|2|3}],1000003 (events)
		FP_cycles_no_fpu_ops_retired[/{0|1|2|3}],1000003 (CPU-cycles)
		...

	  Using	the above raw hardware counter output,	the  fol-
	  lowing command:

	    collect -h FP_dispatched_fpu_ops~umask=0x3/2,10007

	  enables the Floating Point Add and Multiply  operations
	  to  be  tracked  at  the  rate of 1 capture every 10007
          events.  (For more details on valid attribute values,
          refer to the processor documentation.)  The "/2" value
          specifies that the data is to be captured using register
          2 of the hardware.
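
          The two -h arguments used in the examples can be taken
          apart as follows.  This is a sketch that handles only
          the forms shown in Examples 1 and 2; it assumes that
          the argument is a comma-separated list of counter and
          interval pairs, which may not cover every form that
          collect accepts.  The function name and dictionary keys
          are hypothetical.

```python
import re

CTR_RE = re.compile(
    r"^(?P<plus>\+)?(?P<name>[^~/]+)"      # optional "+", then counter name
    r"(?P<attrs>(?:~[^~/=]+=[^~/]+)*)"     # zero or more ~attr=value
    r"(?:/(?P<reg>\d+))?$"                 # optional /register
)


def parse_h_argument(arg):
    """Split a -h argument into one dictionary per counter."""
    fields = arg.split(",")
    if len(fields) % 2:
        raise ValueError("expected counter,interval pairs")
    counters = []
    for ctr, interval in zip(fields[0::2], fields[1::2]):
        m = CTR_RE.match(ctr)
        if m is None:
            raise ValueError(f"unrecognized counter: {ctr!r}")
        attrs = dict(a.split("=", 1) for a in m.group("attrs").split("~") if a)
        reg = m.group("reg")
        counters.append({
            "name": m.group("name"),
            "dataspace": m.group("plus") is not None,
            "attrs": attrs,
            "register": int(reg) if reg is not None else None,
            "interval": interval,  # a keyword such as "hi", or a number
        })
    return counters
```

          For "cycles/0,hi,+dcrm,9999" this yields two entries:
          cycles on register 0 at the "hi" rate, and dcrm with
          dataspace profiling requested and an interval of 9999.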

     Synchronization Delay Tracing
	  Synchronization delay	tracing	records	all calls to  the
	  various   thread  synchronization  routines  where  the
	  real-time  delay  in	the  call  exceeds  a	specified
	  threshold.  The  data	 packet	 contains  timestamps for
	  entry	and exit to  the  synchronization  routines,  the
	  thread  ID,  and  the	LWP ID at the time the request is
	  initiated. (Synchronization requests from a thread  can
	  be initiated on one LWP, but complete	on another.)

	  Synchronization delay	tracing	data  is  converted  into
	  the following	metrics:

	       Synchronization Delay Events
	       Synchronization Wait Time

     Heap Tracing
	  Heap tracing records all calls to malloc,  free,  real-
	  loc,	memalign,  and	valloc with the	size of	the block
	  requested, its address, and for realloc,  the	 previous
	  address.

	  Heap tracing	data  is  converted  into  the	following
	  metrics:

	       Leaks
	       Bytes Leaked
	       Allocations
	       Bytes Allocated

	  Leaks	are defined as allocations that	 are  not  freed.
	  If  a	 zero-length  block is allocated, it counts as an
	  allocation with zero bytes allocated.	If a  zero-length
	  block	is not freed, it counts	as a leak with zero bytes
	  leaked.
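
          The four metrics can be derived from a sequence of
          traced calls as sketched below.  The event tuples and
          the function name are hypothetical, and the treatment
          of realloc (as a free of the previous address followed
          by a new allocation) is an assumption; the text above
          defines only what counts as a leak.

```python
def heap_metrics(events):
    """Compute heap tracing metrics from a list of traced calls.

    Each event is ("malloc", addr, size), ("free", addr), or
    ("realloc", new_addr, size, old_addr).
    """
    live = {}  # address -> size of blocks not yet freed
    allocations = 0
    bytes_allocated = 0
    for ev in events:
        kind = ev[0]
        if kind == "free":
            live.pop(ev[1], None)
            continue
        if kind == "realloc":
            _, addr, size, old_addr = ev
            live.pop(old_addr, None)  # assumed: old block is released
        else:
            _, addr, size = ev
        live[addr] = size
        allocations += 1
        bytes_allocated += size
    return {
        "Leaks": len(live),
        "Bytes Leaked": sum(live.values()),
        "Allocations": allocations,
        "Bytes Allocated": bytes_allocated,
    }
```

          A zero-length block that is never freed shows up here
          as one leak with zero bytes leaked, matching the
          definition above.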

	  For applications written in  the  Java[TM]  programming
	  language,  leaks  are	 defined as allocations	that have
	  not been garbage-collected.  Heap  profiling	for  such
	  applications	is  obsolescent	and will not be	supported
	  in future releases.

	  Heap tracing experiments can be very large, and may  be
	  slow to process.

     MPI Tracing
	  MPI tracing records calls to the MPI library for  func-
	  tions	 that  can  take  a significant	amount of time to
	  complete.  MPI tracing is not	available on systems run-
	  ning the Linux OS.

	  The  following  functions  from  the	MPI  library  are
	  traced:

	       MPI_Allgather
	       MPI_Allgatherv
	       MPI_Allreduce
	       MPI_Alltoall
	       MPI_Alltoallv
	       MPI_Barrier
	       MPI_Bcast
	       MPI_Bsend
	       MPI_Gather
	       MPI_Gatherv
	       MPI_Irecv
	       MPI_Isend
	       MPI_Recv
	       MPI_Reduce
	       MPI_Reduce_scatter
	       MPI_Rsend
	       MPI_Scan
	       MPI_Scatter
	       MPI_Scatterv
	       MPI_Send
	       MPI_Sendrecv
	       MPI_Sendrecv_replace
	       MPI_Ssend
	       MPI_Wait
	       MPI_Waitall
	       MPI_Waitany
	       MPI_Waitsome
	       MPI_Win_fence
	       MPI_Win_lock

	  MPI  tracing	data  is  converted  into  the	following
	  metrics:

	       MPI Time
	       MPI Sends
	       MPI Bytes Sent
	       MPI Receives
	       MPI Bytes Received
	       Other MPI Calls

	  MPI Time is the total	LWP time spent in the  MPI  func-
	  tion.

	  The MPI Bytes	Received metric	uses the actual	number of
	  bytes	 for blocking calls, but uses the buffer size for
	  non-blocking calls. Metrics that are computed	for  col-
	  lective  operations such as gather, scatter, and reduce
	  have the maximum possible values for these  operations.
	  No  reduction	in the values is made due to optimization
	  of the collective operations.

	  Note that the	MPI Bytes Received as reported may  seri-
	  ously	 overestimate  the  actual transmissions whenever
	  the  buffer  used  for  a   non-blocking   receive   is
	  significantly	 larger	 than  the  size  needed  for the
	  receive.
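
          The size of the overestimate can be seen in a short
          calculation.  The record layout and function name are
          hypothetical; the rule (actual bytes for blocking
          calls, buffer size for non-blocking calls) is the one
          stated above.

```python
def mpi_bytes_received(calls):
    """Sum received bytes the way the metric is described above."""
    total = 0
    for call in calls:
        if call["blocking"]:
            total += call["actual_bytes"]
        else:
            # Non-blocking receives are charged the full buffer size.
            total += call["buffer_bytes"]
    return total
```

          A non-blocking receive that posts a 65536-byte buffer
          for a 100-byte message contributes 65536 bytes to the
          metric, where a blocking receive of the same message
          would contribute 100.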


     Count Data
          Count data is recorded by instrumenting the executable,
          and counting the number of times each instruction was
          executed.  It also counts the number of times the first
          instruction in a function is executed, and calls that
          the function execution count.

          Count data is converted into the following metrics:

               Bit Func Count
               Bit Inst Exec
               Bit Inst Annul

     Data-race Detection Data
          Data-race detection data consists of pairs of
          race-access events that constitute a race.  The events
          are combined into a race, and races for which the
          callstacks for the two accesses are identical are merged
          into a race group.

          Data-race detection data is converted into the
          following metric:

               Race Accesses

     Deadlock Detection Data
          Deadlock detection data consists of pairs of threads
          with conflicting locks.

          Deadlock detection data is converted into the following
          metric:

               Deadlocks


     Sampling and Global Data
	  Sampling refers to the process  of  generating  markers
	  along	the time line of execution. At each sample point,
	  execution statistics are  recorded.  All  of	the  data
	  recorded at sample points is global to the program, and
	  does not map to function-level metrics.

	  Samples are always taken at the start	of  the	 process,
	  and  at its termination. By default or if a non-zero -S
	  argument is specified, samples are  taken  periodically
	  at the specified interval.  In addition, samples can be
	  taken	by using the libcollector(3) API.

	  The data recorded at	each  sample  point  consists  of
	  microstate  accounting  information  from  the  kernel,
	  along	with various other statistics  maintained  within
	  the kernel.


RESTRICTIONS

     The Collector interposes on some signal-handling routines to
     ensure that its use of SIGPROF signals for clock-based
     profiling and SIGEMT (Solaris) or SIGIO (Linux) for hardware
     counter overflow profiling is not disrupted by the target
     program.  The Collector library re-installs its own signal
     handler if the target program installs a signal handler.  The
     Collector's signal	handler	sets a	flag  that  ensures  that
     system  calls  are	 not interrupted to deliver signals. This
     setting could change the behavior of the target program.

     The Collector interposes on setitimer(2) to ensure	that  the
     profiling	timer  is  not available to the	target program if
     clock-based profiling is enabled.

     The  Collector  interposes	 on  functions	in  the	 hardware
     counter  library,	libcpc.so,  so that an application cannot
     use hardware counters while the Collector is collecting per-
     formance  data.  The  interposed functions	return a value of
     -1.

     Dataspace profiling is not available on systems running the
     Linux OS.

     For this release, the data	from collecting	periodic  samples
     is	not reliable on	systems	running	the Linux OS.

     For this release, wide data discrepancies are observed  when
     profiling	multithreaded applications on systems running the
     RedHat Enterprise Linux OS.

     Hardware counter overflow profiling cannot	be run on a  sys-
     tem  where	cpustat	is running, because cpustat takes control
     of	the counters, and does not let a user process use them.

     Java Profiling requires the Java[TM] 2 SDK, version 1.5.0_03
     or later updates of 1.5.0, or Java[TM] 2 SDK, version 1.6.0
     or later updates of 1.6.0.

     Data is not  collected  on	 descendant  processes	that  are
     created  to  use the setuid attribute, nor	on any descendant
     processes created with an exec function run on an executable
     that  is  not  dynamically	 linked.  Furthermore, subsequent
     descendant	processes may  produce	corrupted  or  unreadable
     experiments.  The workaround is to	ensure that all	processes
     spawned are dynamically-linked and	do not	have  the  setuid
     attribute.

     Applications that call vfork(2) have these calls replaced by
     a call to fork1(2).


SEE ALSO

     analyzer(1), collector(1),	dbx(1),	er_archive(1),	er_cp(1),
     er_export(1),  er_mv(1), er_print(1), er_rm(1), tha(1), lib-
     collector(3), and the Performance Analyzer	manual.