By Nasser Nouri, Sun Microsystems, Inc, August 2007
One of the most useful debugging features in the Sun Studio dbx debugger is
enabling watchpoints during the execution of programs. A watchpoint,
which is also called a data change breakpoint, can be used in dbx to stop
a program when the value of a variable or expression has changed.
A watchpoint is similar to a breakpoint, except that a
watchpoint stops execution when an address location is read or
modified, whereas a breakpoint stops execution when an instruction is
executed at a specified location.
This article explains how to use the watchpoint facility
in the Sun Studio dbx debugger, which can
be used for both source-level and instruction-level debugging.
It also uses the Solaris Dynamic Tracing (DTrace) facility to show how
the watchpoint traps generated by the Solaris kernel can be traced with a simple D script.
Under the Hood
Historically, watchpoints were implemented in software, which
slowed down program execution to some extent. Newer
microprocessors are equipped with debug registers that
enable modern software debuggers such as dbx to create
hardware watchpoints. Hardware watchpoints are extremely fast and
do not slow down the execution of programs.
For example, Intel and AMD architectures have eight debug
registers, DR0 through DR7. The DR0 through DR3 registers are used for
creating address breakpoints. Software can load a virtual (linear) address
into any of these four registers and enable a breakpoint to occur when
the address matches an instruction or data reference.
The debug control register (DR7) is used to establish the
breakpoint conditions for the
address breakpoint registers (DR0 through DR3) and to enable debug exceptions
for each address breakpoint register individually.
DR6 is the debug status register. The microprocessor loads the
debug status into DR6 when an enabled debug condition is encountered
that causes a debug exception. This register is never cleared by the
processor and must be cleared by software after the contents have been
read.
The DR4 and DR5 registers are reserved and cannot be used by
software.
Fortunately, the Solaris Operating System provides a well-defined
interface called /proc that shields developers from the complexity of different
microprocessor architectures.
Using the /proc interface makes applications such as the dbx
debugger extremely portable across Solaris platforms. The Solaris OS
runs on SPARC, x86,
and x64 architectures.
The /proc interface is a file system that provides access to the state of each
process and lightweight process (LWP) in the system. Watchpoints are
set and cleared through the /proc file system interface by opening the
control file for a process and then sending a PCWATCH command (see the
proc(4) man page for more details).
The PCWATCH command is accompanied by a prwatch data
structure, which
contains the address, the length of the area to be affected, and the
type of access to be watched for: read, write, execute, and stop before
or after the access.
A watchpoint is triggered when an LWP in the traced process makes a
memory reference that covers at least one byte of a watched area and
the memory reference matches the access mode specified by the
PCWATCH command.
When an LWP triggers a watchpoint, it incurs a watchpoint trap
(FLTWATCH), which is generated by the Solaris kernel. If FLTWATCH is being
traced, the LWP stops; otherwise, it is sent a SIGTRAP signal. If
SIGTRAP is being traced and is not blocked, then the LWP stops.
At this point the dbx debugger takes control and you can issue
other dbx commands to examine the states of the traced process.
Setting Watchpoints in the dbx Debugger
To stop execution when a memory address has been accessed:
(dbx) stop access mode address-expression [, byte-size-expression ]
mode specifies how the memory was accessed. It can be composed
of one or more of the following letters:
| r | The memory at the specified address has been read |
| w | The memory has been written to |
| x | The memory has been executed |
mode can also contain either
of the following:
| a | Stops the process after the access (default) |
| b | Stops the process before the access |
address-expression is any
expression that can be evaluated to produce an address. If you give a
symbolic expression, the size of the region to be watched is
automatically deduced; you can override it by specifying
byte-size-expression. You can also
use nonsymbolic, typeless address expressions, in which case the size
is mandatory.
If you typed the following command, execution would stop after any of the
four bytes starting at memory address 0xfffffd7fffdff7a8 had been read:
(dbx) stop access r 0xfffffd7fffdff7a8, 4
If you typed the following command, execution would stop before the
variable local had been written to:
(dbx) stop access wb &local
Keep these points in mind when using the stop access
command:
- The event occurs when a variable is written to, even if it has the
same value.
- By default, the event occurs after execution of the instruction
that wrote to the variable. You can indicate that you want the event to
occur before the instruction is executed by specifying the mode as b.
The older stop modify command is still accepted for backward
compatibility and maps to the appropriate stop access command:
(dbx) stop modify address-expression [, byte-size-expression ]
In the following a.cc example, we would like to stop the process
whenever the local variable is accessed for a write operation.
The a.cc test case
#include <stdio.h>

int global = 0;
static int stat = 0;

void poker(int *ip)
{
    *ip = 5;
}

int main()
{
    static int flocal;
    int local;

    global = 0;
    stat = 0;
    flocal = 0;
    local = 0;

    poker(&global);
    poker(&stat);
    poker(&flocal);
    poker(&local);
}
The a.cc test case is compiled as follows:
CC -g -m64 a.cc
By default, the C++ compiler generates the a.out executable.
Now let's run the dbx debugger on the a.out executable and set
a data change breakpoint (watchpoint) for the local variable.
Below is the output of the dbx debugger.
% dbx a.out
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.6' in your .dbxrc
Reading a.out
Reading ld.so.1
Reading libCstd.so.1
Reading libCrun.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop in main
(2) stop in main
(dbx) run
Running: a.out
(process id 10452)
stopped in main at line 16 in file "a.cc"
16 global = 0;
(dbx) stop access w &local
(3) stop access wa &local, 4
(dbx) cont
watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 19 in file "a.cc"
19 local = 0;
(dbx) cont
watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 8 in file "a.cc"
8 *ip = 5;
(dbx)
As shown, the stop access command with write access mode is
used to set a watchpoint for the local variable. The &local
syntax stands for the address of the local variable. The local
variable is defined as a four-byte integer; hence, the size of the
region to be watched is automatically deduced and appended to the
command syntax.
The watchpoint trap is triggered twice for the local variable. The
first time is when the local variable is assigned a value of zero in
the main function. The second time is when the local variable is assigned a
value of five through the ip pointer in the poker function.
Monitoring Watchpoint Traps with DTrace
This section describes how the watchpoint trap can be traced in the Solaris
kernel using the DTrace facility. It is assumed that you are already
familiar with D script syntax, probes, and constructs. Otherwise, the
following article is recommended reading before you continue with
this section: Using DTrace with Sun Studio Tools to Understand, Analyze,
Debug, and Enhance Complex Applications.
The /usr/include/sys/fault.h header file contains the names of all
hardware faults that can be traced in a process. However, for
this particular subject, we need to pay attention only to
the FLTWATCH fault. FLTWATCH, which has the
number 12, is the watchpoint trap.
The following D script shows how to use the fault probe in the
proc provider to monitor the hardware faults.
The fault.d D script
#pragma D option quiet

dtrace:::BEGIN
{
    printf("Tracing hardware faults. Enter <control-c> to end.\n");
}

proc:::fault
{
    @[execname, args[0], args[1]->__data.__fault.__addr,
        args[1]->__data.__fault.__pc] = count();
}

END
{
    printf("%10s %10s %16s %16s %10s\n",
        "EXECUTABLE", "FAULT", "ADDRESS", "PC", "COUNT");
    printa("%10s %10d %16p %16p %8@d\n", @);
}
The fault probe fires when a thread experiences a machine fault.
The fault probe has two arguments: the fault code is in args[0], and the
kernel siginfo structure corresponding to the fault is pointed to by
args[1].
The kernel siginfo_t structure is defined in
the /usr/include/sys/siginfo.h header file. The siginfo_t structure
consists of a union of several structures. However, for this particular
example, we are only interested in tracing the __addr and __pc fields
of the __fault structure.
As shown in the fault.d D script, an aggregation is used in the
proc:::fault
clause to collect data based on the following expressions:
| execname | The executable name |
| args[0] | The fault code |
| args[1]->__data.__fault.__addr | args[1] is a pointer to the siginfo_t structure. __addr is the address of a watched area in memory. |
| args[1]->__data.__fault.__pc | args[1] is a pointer to the siginfo_t structure. __pc is the address of the instruction that accesses the watched area in memory. |
The count() function shows the number of times each fault is
triggered in a process.
The fault.d script needs to be run in a separate terminal window.
The following dtrace
command enables the fault probe in the Solaris kernel:
dtrace -s fault.d
At this point, the fault probe in the proc provider is enabled and
waiting to collect data.
Now, in a separate terminal window, let's run dbx on the a.out executable and enter the
same sequence of commands shown in the
previous section to set (and trigger) a data change breakpoint for the
local variable.
As the message in the terminal window running the dtrace command
indicates, pressing <control-c> ends the execution of the fault.d
script. DTrace generates the following output:
% dtrace -s fault.d
Tracing hardware faults. Enter <control-c> to end.
^C
EXECUTABLE      FAULT          ADDRESS               PC      COUNT
     a.out          3           401048                0        1
     a.out          3 fffffd7fff3ce570                0        1
     a.out          4           401053                0        1
     a.out          4 fffffd7fff3ce571                0        1
     a.out         12 fffffd7fffdff7c8           401030        1
     a.out         12 fffffd7fffdff7c8           401069        1
     a.out          3 fffffd7fff3ce540                0        2
The FAULT column lists all hardware faults that are traced in the
a.out process. FLTBPT (number 3) is the breakpoint trap, and
FLTTRACE (number 4) is the trace trap (single-step).
However, as mentioned before, we only need to pay attention to the
FLTWATCH fault (number 12).
Based on the output of the fault.d script, the
a.out process incurred
the watchpoint trap twice for the 0xfffffd7fffdff7c8 address. As you
may have already guessed, 0xfffffd7fffdff7c8 is the address of the
local variable in memory (see the output of dbx in
the previous section).
Two instruction addresses, 0x401030 and 0x401069,
are listed in the PC (Program Counter) column. These two
instructions contain a memory reference to the watched area
(0xfffffd7fffdff7c8). Hence, the
watchpoint trap is triggered for these instructions.
The next step is to figure out what these two instructions are.
You can use dbx to disassemble the code and inspect the
assembly code at the 0x401030 and 0x401069 instruction addresses.
It is assumed that you are already familiar with dbx
instruction-level debugging commands. Otherwise, the following article
is recommended reading before proceeding with the rest of this
section: AMD64 Instruction-Level Debugging with Sun Studio dbx.
Below is the output of dbx. The dis command is used to disassemble
the portions of code that correspond to the 0x401030 and
0x401069 instruction addresses. The regs command is
used to display the contents of the general-purpose registers.
(dbx) cont
watchpoint wa &local (0xfffffd7fffdff7c8[4]) at line 8 in file "a.cc"
8 *ip = 5;
(dbx) dis main
0x0000000000401040: main pushq %rbp
0x0000000000401041: main+0x0001: movq %rsp,%rbp
0x0000000000401044: main+0x0004: subq $0x0000000000000010,%rsp
0x0000000000401048: main+0x0008: movl $0x0000000000000000,global
0x0000000000401053: main+0x0013: movl $0x0000000000000000,stat
0x000000000040105e: main+0x001e: movl $0x0000000000000000,__1fEmain1AGflocal_
0x0000000000401069: main+0x0029: movl $0x0000000000000000,0xfffffffffffffff8(%rbp)
0x0000000000401070: main+0x0030: movq $global,%rdi
0x0000000000401077: main+0x0037: movl $0x0000000000000000,%eax
0x000000000040107c: main+0x003c: call poker [ 0x401020, .-0x5c ]
(dbx) dis poker
0x0000000000401020: poker : pushq %rbp
0x0000000000401021: poker+0x0001: movq %rsp,%rbp
0x0000000000401024: poker+0x0004: subq $0x0000000000000010,%rsp
0x0000000000401028: poker+0x0008: movq %rdi,0xfffffffffffffff8(%rbp)
0x000000000040102c: poker+0x000c: movq 0xfffffffffffffff8(%rbp),%r8
0x0000000000401030: poker+0x0010: movl $0x0000000000000005,0x0000000000000000(%r8)
0x0000000000401038: poker+0x0018: leave
0x0000000000401039: poker+0x0019: ret
0x000000000040103a: poker+0x001a: nop
0x000000000040103c: _ex_deregister+0x01f4: nop
(dbx) regs
current frame: [1]
r15 0x0000000000000000
r14 0x0000000000000000
r13 0x0000000000000000
r12 0x0000000000000000
r11 0xfffffffffbc01ec8
r10 0x0000000048fe9d0a
r9 0x00000000000015da
r8 0xfffffd7fffdff7c8
rdi 0xfffffd7fffdff7c8
rsi 0xfffffd7fffdff7f8
rbp 0xfffffd7fffdff7b0
rbx 0xfffffd7fff3fac40
rdx 0xfffffd7fffdff808
rcx 0x0000000000093182
rax 0x0000000000000000
trapno 0x0000000000000001
err 0x0000000000000000
rip 0x0000000000401030:poker+0x10 movl $0x0000000000000005,0x0000000000000000(%r8)
cs 0x0000000000000053
eflags 0x0000000000000286
rsp 0xfffffd7fffdff7a0
ss 0x000000000000004b
fs 0x0000000000000000
gs 0x0000000000000000
es 0x000000000000004b
ds 0x000000000000004b
fsbase 0xfffffd7fff382000
gsbase 0x0000000000000000
(dbx)
As shown above, the watchpoint is triggered when the number 5 is
stored through the ip formal parameter (*ip = 5) inside the poker function at
line 8 of the a.cc program.
The same assignment can be observed at the assembly level. The movl
instruction at the 0x401030 address
uses the content of the %r8 register as an address and stores 5 in the
variable at that address, 0xfffffd7fffdff7c8 (the local
variable).
Conclusion
The hardware-assisted watchpoints in dbx are fast
and very useful for debugging extremely difficult software defects. A
watchpoint, also known as a data change breakpoint, can be used
in dbx to stop
a program when the value of a variable or expression has changed.
The DTrace facility enables you to monitor the internal
states of the Solaris kernel in ways that were not possible
before. A simple D script, as shown in this article, can reveal how
the Solaris kernel interacts with applications during execution.
Finally, using dbx and DTrace together creates a powerful debugging
environment for unraveling the most obscure software defects in your
applications and even in the Solaris kernel itself.
Nasser Nouri is a staff software engineer currently working in the dbx debugger engineering group. For the last 10 years at Sun, Nasser has worked on a wide spectrum of projects, such as the Massively Parallel Hardware Verilog Simulation system, the Distributed Verilog Simulation over the Internet using Load Balancing software and Java Servlet technology, and Java Graphical User Interfaces for CAD tools. Before joining Sun, he worked on Logic, Fault, and VHDL hardware simulation systems.