Abstract
This article describes my experience with Dmalloc, a useful open source debugging package,
on the Solaris Operating System with Sun Studio compilers.
Introduction
Locating application bugs related to memory access (for example,
memory overwrites) is one of the most difficult and
labor-intensive parts of C/C++ programming. C and C++ give us a
lot of power and performance, but at the price of having to deal
with every detail of memory allocation (among other things).
In addition to causing crashes and other obvious problems,
memory-related bugs may lie dormant and then cause serious
problems in the field at unpredictable times. They can also
create security vulnerabilities, for example, through buffer
overflows in the heap.
Dmalloc is an open source Debug Malloc package. All of its
source code and documentation are available at the Dmalloc web site (see reference [1]).
Many similar, relatively low-tech debugging tools are
available, but I like Dmalloc for its effectiveness, relative
simplicity, and the availability of all its implementation
details, including the source code, which is easy enough to adjust
when necessary.
Over the years, I've used Dmalloc a number of times. It has
helped me find some hard-to-catch memory bugs, both in applications
and in system code, such as X/Motif libraries and OpenGL graphics
pipelines.
Dmalloc was created and is still maintained by Gray Watson. It
has been around for quite a long time, apparently since 1992. I
first used it in 1997.
Not surprisingly, Dmalloc has grown over the years. However,
it's still relatively simple, easy to use, and most importantly
(at least in my opinion), it's an effective debugging tool,
especially for large applications where the other tools may not
be available or may not work properly due to scalability and
other problems.
The main reason for this Solaris OS-specific article on Dmalloc
is that historically Dmalloc has been geared mostly toward platforms and tools other
than the Solaris OS, so at the moment, it requires a
few special tweaks and configurations to work well with the Solaris OS. I
would also like to share some experience I've had using Dmalloc
on Solaris systems.
I've used Dmalloc with the Solaris 10 OS for both SPARC based and x64 systems
(it works very similarly on both) with
Sun Studio 11 compilers.
Why Do We Need Dmalloc? Don't We Have Professional Tools for This?
Indeed, many tools are available to help us find memory-related errors at runtime. However, none of them is perfect, so adding
Dmalloc to your debugging toolbox may well be worth the effort.
One at a time, I'll consider some of the alternative tools available for UNIX
systems, and briefly discuss how useful they have
been in the Solaris/Sun Studio environment (in my personal
experience). I'll start with the Sun tools, and then consider a
few others.
All these tools are well known and can be easily found on the Web
and elsewhere, so I won't provide any references for them here.
dbx Run-Time Checking (RTC)
dbx RTC is a part of the dbx debugger, which is a part of the Sun Studio compiler/tool suite. dbx RTC has many advantages compared to other tools and many useful features. When and where it
works, it works quite well. I've used it many times to debug
small- and medium-size programs, with considerable success.
At the moment, dbx RTC memory access checking is available only
on systems running the Solaris OS for SPARC platforms (not x86/x64 platforms), although this
should change in the next version of Sun Studio software.
dbx RTC does have one disadvantage: It may not scale for very
large applications. This is because it instruments every
application and system binary involved (executables and shared
libraries) in memory on the fly in dbx, before starting to run
the application. For large applications, it may run out of
memory, or it may take too long to initialize.
The libumem Package
libumem(3LIB) is an optional malloc replacement package, which is
a part of the Solaris OS (starting with the Solaris 9 OS). It also has a
memory debugging mode. See umem_debug(3MALLOC). The debugging
capabilities of libumem include "redzone" sections to check for
memory overwrites and filling the allocated and freed memory
segments with special patterns to help detect the use of
uninitialized data.
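To make the pattern-filling idea concrete, here is a minimal sketch of a wrapper that fills newly allocated memory with one recognizable 32-bit value and freed memory with another. The names patterned_malloc() and patterned_free() are hypothetical, the two pattern values are illustrative (see umem_debug(3MALLOC) for the ones libumem actually uses), and libumem's real implementation is considerably more involved.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative patterns for "allocated but never written" and
   "already freed" memory. */
#define PATTERN_UNINIT 0xbaddcafeU
#define PATTERN_FREED  0xdeadbeefU

/* Repeat a 32-bit pattern over size bytes, most significant byte first. */
static void fill_pattern(void *p, size_t size, uint32_t pattern)
{
    unsigned char *c = p;
    for (size_t i = 0; i < size; i++)
        c[i] = (unsigned char)(pattern >> (8 * (3 - i % 4)));
}

/* Hypothetical wrapper: reading before writing now yields an obvious value. */
void *patterned_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        fill_pattern(p, size, PATTERN_UNINIT);
    return p;
}

/* Hypothetical wrapper: scribble over the buffer before releasing it,
   so a stale pointer reads a recognizable pattern. The caller must
   pass the size because plain malloc() does not record it for us. */
void patterned_free(void *p, size_t size)
{
    if (p == NULL)
        return;
    fill_pattern(p, size, PATTERN_FREED);
    free(p);
}
```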
However, a few of my very unscientific experiments with
umem_debug(3MALLOC) have not shown much success in finding memory
access bugs with the applications I've been working with, while
Dmalloc has found such bugs.
Also, using umem_debug(3MALLOC) requires the use of mdb(1), a
"modular debugger" that is mostly intended to be used by kernel
programmers. This makes using umem_debug(3MALLOC) less convenient
for application programmers who are more used to symbolic debuggers, such as dbx.
Purify
Purify is a commercial tool from IBM (formerly from Rational Software,
before that, from Atria Software, and before that, from Pure Software). It is one of the oldest and most powerful runtime memory debugging tools.
However, I found that it has many disadvantages when it comes to debugging
large Solaris applications:
- This tool requires a special Purify version for each Solaris
platform, for each Solaris kernel update, and for each new compiler
version.
- Purify requires relinking the application binary to instrument
the code for Purify use.
- The resulting specially instrumented binaries can be
run only at the location where they were built.
- Running the instrumented binaries can be extremely slow.
I've seen some application runs that normally take a few minutes
take all day to run when instrumented for Purify.
- This tool produces many false positives, making it hard to separate
the real memory bugs from the rest.
- The tool does not work with VIS instructions (on SPARC platforms), making it
impossible to use with applications that call
mediaLib functions,
for example.
- The tool is commercial (and quite expensive). Not only does it
cost money (which is a barrier in itself in this age of
open source and other "free" software), but also having to deal
with Purify licensing adds significant complexities to the
debugging process.
Valgrind
Valgrind is a very powerful open source tool for runtime memory
checking. The tool is not perfect, but it's very useful (and fast) where
it is available (primarily on Linux/x86 systems).
Unfortunately, Valgrind can't be used with the Solaris OS, even on x86/x64 platforms (not to mention SPARC platforms). Valgrind has many
details of the x86 architecture (it emulates each x86
instruction), the Linux kernel, GLIBC, and the GNU C compiler
hardwired into it, so porting it to the Solaris OS is very difficult.
Others
There are many more tools available for runtime memory debugging,
both commercial and open source: Electric Fence, GNU Checker,
Insure++, Mpatrol, Etnus MemoryScape, and more. I have not used
any of these other tools myself, so I can't comment on them.
One list of such tools is available in the Related Software
section of the Mpatrol web site.
Dmalloc Limitations
Dmalloc is a malloc-replacement package. It has limitations,
as all such packages do. In particular:
- Dmalloc can detect only memory-related problems in the heap, not
in the stack, and not in static memory.
- It can detect bugs only when the memory is allocated with
malloc(), not by other means, such as sbrk() or mmap().
- It cannot detect the following problems, which can be detected by more
complicated tools, such as dbx RTC, Purify, and Valgrind, that do much more than replace
malloc, realloc, and free:
- Checking stack memory
- Reading from or writing to unallocated memory
- Reading from allocated but uninitialized memory
- Writing to read-only memory
Building Dmalloc for the Solaris OS
Depending on your requirements, Dmalloc can be used in many ways. It can be used as a static
library or as a shared library. You can optionally include the dmalloc.h header in your code
and then use Dmalloc's additional features. You may or may not need the "multithreaded version,"
the "C++ version," and so on. Also, Dmalloc provides many configuration options
that can be specified only in the C header files, thus requiring
a rebuild.
Therefore, providing the binaries for all cases is not practical,
and it's necessary to build Dmalloc in place to satisfy your
specific needs.
Although the Solaris OS is listed on Dmalloc's web page among the platforms
on which Dmalloc has been built and run successfully, in reality,
building Dmalloc under the Solaris OS while using the Sun Studio
compiler is far from trivial.
For one thing, Dmalloc uses the ./configure facility,
which is a part of the GNU Autoconf suite frequently used
to build open source programs on various platforms. The problem
is that GNU Autoconf was designed for GNU compilers, tools,
and operating systems, such as Linux or FreeBSD, certainly not for
the Solaris OS. It has many hardwired dependencies on those operating
system features and compilers.
Fixing GNU Autoconf and its ./configure scripts for
the Solaris OS and Sun Studio compilers, and then fixing the way these
tools are used by Dmalloc, are large tasks beyond the scope of
this article. I had neither the time nor patience to do this the
hard way. Therefore, I'll describe an ad hoc "solution" I used
recently, while working on a memory-related bug in a very large application on a SPARC system running the Solaris 10 OS.
Here are the steps I used in this particular case.
1. Download and install the Dmalloc package.
I used the latest version, 5.5.0, which was released in February
2007. To unpack the package, use commands such as these:
% cd where_you_want_to_install_it
% gunzip -c /tmp/dmalloc-5.5.0.tgz | tar xvf -
This will create the subdirectory
where_you_want_to_install_it/dmalloc-5.5.0 and place
all Dmalloc files there.
For installation, I generally followed the instructions in the
How to Install the Library section of the Dmalloc web site.
2. Modify file settings.dist.
For my case, I changed the following:
ALLOW_FREE_NULL_MESSAGE=0
FENCE_TOP_SIZE=16
FENCE_BOTTOM_SIZE=ALLOCATION_ALIGNMENT*2
LOG_REOPEN=0
The details of these settings are described in the Dmalloc docs,
but the most important ones are FENCE_TOP_SIZE=16 and
FENCE_BOTTOM_SIZE=ALLOCATION_ALIGNMENT*2, which set the
"picket-fence" (also known as "redzone") areas to 16 bytes on
each side of each malloc block allocation. During the run,
Dmalloc will check the integrity of these areas and detect
whether they have been overwritten.
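The following is a minimal sketch of the picket-fence idea, assuming a simple wrapper around the system malloc(). The names fenced_malloc(), fenced_check(), and fenced_free() are hypothetical and are not part of Dmalloc's API; Dmalloc's actual chunk layout is more elaborate. The sketch does use the same 0xFACADE69 fill pattern that Dmalloc defines in chunk_loc.h, and 16-byte fences as configured above.

```c
#include <stdlib.h>
#include <string.h>

/* Fence sizes matching the settings.dist values above (16 bytes each). */
#define FENCE_BOTTOM 16
#define FENCE_TOP    16

/* Dmalloc's fence pattern, 0xFACADE69, repeated over each fence. */
static const unsigned char fence_magic[4] = { 0xFA, 0xCA, 0xDE, 0x69 };

static void fill_fence(unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        p[i] = fence_magic[i % 4];
}

static int fence_intact(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != fence_magic[i % 4])
            return 0;
    return 1;
}

/* Layout: [user_size][bottom fence][user data][top fence].
   The stored size lets the check locate the top fence later. */
void *fenced_malloc(size_t user_size)
{
    unsigned char *base =
        malloc(sizeof(size_t) + FENCE_BOTTOM + user_size + FENCE_TOP);
    if (base == NULL)
        return NULL;
    memcpy(base, &user_size, sizeof user_size);
    unsigned char *user = base + sizeof(size_t) + FENCE_BOTTOM;
    fill_fence(user - FENCE_BOTTOM, FENCE_BOTTOM);
    fill_fence(user + user_size, FENCE_TOP);
    return user;
}

/* Returns 1 if both fences are intact, 0 if either was overwritten. */
int fenced_check(const void *ptr)
{
    const unsigned char *user = ptr;
    size_t user_size;
    memcpy(&user_size, user - FENCE_BOTTOM - sizeof(size_t), sizeof user_size);
    return fence_intact(user - FENCE_BOTTOM, FENCE_BOTTOM)
        && fence_intact(user + user_size, FENCE_TOP);
}

void fenced_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - FENCE_BOTTOM - sizeof(size_t));
}
```

Writing even one byte past the end of the user area lands in the first top-fence byte, so the next fenced_check() call reports the damage.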
3. Run configure.
In this case, I didn't need any special C++ facilities. I did
need a "threaded" version (meaning Dmalloc should use mutex locking in
its special malloc, realloc, and free routines), and I
needed a shared-library version so that I could use LD_PRELOAD_64.
I also needed a 64-bit version of the library, so
I could use it with 64-bit applications.
By default, configure assumes GCC as the compiler. To use
the Sun Studio compiler, I set the CC environment variable, and I
used the following commands.
% setenv CC "/opt/SUNWspro/bin/cc -KPIC -errfmt=error -Xc -xarch=v9"
% ./configure --disable-cxx --enable-threads --enable-shlib
It took some trial and error to get the compiler flags
right. In particular, -Xc turned out to be necessary because
Dmalloc does certain things correctly only if the compiler is
in strict ANSI C mode, which the -Xc flag selects with
the Sun Studio compilers. The -KPIC flag generates
position-independent code, as required for a shared library.
The -xarch=v9 setting is for 64-bit Solaris applications for SPARC systems.
For 64-bit Solaris x64 applications, I used -xtarget=opteron -xarch=amd64 instead.
Unfortunately, configure ran into a number of problems.
Since it hasn't been taught how to handle the Sun Studio
compiler, a number of its tests failed, and program conftest
(both 32-bit and 64-bit versions of it) crashed, as I learned
soon enough because all my machines have system-wide AppCrash
(see references [2] and [3]) installed. So, I got emails with AppCrash
output similar to this:
Output from runme_on_app_crash
Program: conftest
Process ID: 8389
Received signal: 6
Application Debugging Data
--------------------------
>> /bin/pstack 8389
8389: ./conftest
ffffffff7f2ce8c4 _lwp_kill (6, 0, ffffffffffffffff, \
ffffffff7f3e6000, 0, 0) + 8
ffffffff7f248bcc abort (1, 1b8, ffffffff7f2ba83c, \
19d540, 0, 0) + 118
0000000100000de0 main (1, ffffffff7ffff1f8, ffffffff7ffff208, \
ffffffff7f249408, ffffffff7f1000c0, ffffffff7f100100) + 28
0000000100000a9c _start (0, 0, 0, 0, 0, 0) + 17c
...
>> /bin/ptree 8389
...
801 /bin/csh
7086 /bin/bash ./configure --disable-cxx --enable-threads --enab
8388 /bin/bash ./configure --disable-cxx --enable-threads --en
8389 ./conftest
...
On the other hand, it appears that the crash in
abort() may be intentional, a part of the tests
performed by the configure script. I may have
noticed this crash only because AppCrash detected it.
In any case, configure produced the required files:
Makefile, conf.h, and settings.h. I examined them
for sanity, as recommended in the Dmalloc documentation.
4. Run make.
This produced a few compiler warnings (seemingly harmless)
and eventually the shared library I needed, called
libdmallocth.so. However, that library turned out
to be wrong. It was 32-bit (not 64-bit as I needed):
% file libdmallocth.so
libdmallocth.so: ELF 32-bit MSB dynamic lib SPARC Version 1,
dynamically linked, not stripped, no debugging
information available
Also, it had no function definitions at all:
% nm libdmallocth.so | grep FUNC
%
An examination of the make output showed the
way the library was built:
ar cr libdmallocth.a arg_check.o compat.o dmalloc_rand.o \
dmalloc_tab.o env.o heap.o chunk_th.o error_th.o malloc_th.o
ranlib libdmallocth.a
rm -f libdmallocth.so libdmallocth.so.t
ld -G -o libdmallocth.so.t libdmallocth.a # arg_check.o \
compat.o dmalloc_rand.o dmalloc_tab.o env.o heap.o chunk_th.o \
error_th.o malloc_th.o
mv libdmallocth.so.t libdmallocth.so
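This is all wrong. For one thing, the ranlib(1)
command is a holdover from the ancient SunOS 4.x OS, and there is no need to
create an archive library first in this case at all. Worse, everything after
the # on the ld line is treated by the shell as a comment, so ld receives
only the archive. Because nothing on the command line references any of
the archive's symbols, ld extracts none of its members, which is why the
resulting library contains no functions.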
To correct it, I simply ran the cc command to do
what I wanted:
% /opt/SUNWspro/bin/cc -xarch=v9 -G -o libdmallocth.so \
arg_check.o compat.o dmalloc_rand.o dmalloc_tab.o env.o heap.o \
chunk_th.o error_th.o malloc_th.o
% file libdmallocth.so
libdmallocth.so: ELF 64-bit MSB dynamic lib SPARCV9 Version 1, \
dynamically linked, not stripped
% nm libdmallocth.so | grep FUNC
[747] | 61928| 140|FUNC |GLOB |0 |7 |_dmalloc_address_
break
[751] | 45936| 184|FUNC |GLOB |0 |7 |_dmalloc_atoi
...
Later, to make this easier, I created a script I called
rebuild containing the make command
followed by the cc command above, and ran
rebuild whenever a Dmalloc rebuild was necessary.
5. Run a test program.
The test described in the Dmalloc documentation ("make light") didn't
work for me.
On the Solaris 10 OS for SPARC platforms, the resulting dmalloc_t program
attempted to consume all the memory available on the system and
then hung. I had to kill the dmalloc_t process.
On the Solaris 10 OS for x64 platforms, the test program dmalloc_t produced the following error messages:
% ./dmalloc_t -s -t 10000
ERROR: Running special tests failed. Last dmalloc error: no
error (err 1)
Random seed is 1173381022. Final dmalloc error: no error (err 1)
Running dmalloc_t without the -s flag
(in non-silent mode) produced the following error
messages:
ERROR: index overload failed
ERROR: index overload failed
I'm not sure what all those error messages and conditions mean.
Most likely, they indicate problems with the test program
dmalloc_t.c. I've decided not to debug it any
further.
Using the Dmalloc Shared Library
To use the Dmalloc shared library, I performed the following
steps.
1. Set the DMALLOC_OPTIONS environment variable.
Dmalloc has a lot of options. Including all of them in one
command in a legible form would be impossible. Instead, Dmalloc has a
utility program called dmalloc. There are many ways
to use dmalloc. See the detailed information in the Description of the Debugging Tokens
section of the Dmalloc web site.
For example, here's how I used it in this case.
In the dmalloc-5.5.0 directory, I created a file called
.dmallocrc and copied the sample file dmallocrc into it. Then I
modified .dmallocrc and created a section called greg that
contained the Dmalloc options I wanted:
greg log-bad-space, check-fence, check-heap, \
check-funcs, print-messages, error-dump, \
realloc-copy, check-blank
Then I ran dmalloc greg and got the following output (ignoring
the various "feature has been disabled" warnings that don't seem
to make much sense):
setenv DMALLOC_OPTIONS debug=0x42106d00
In addition to the hexadecimal debug value, I also needed
a few more options, so my final setting of DMALLOC_OPTIONS
became:
setenv DMALLOC_OPTIONS debug=0x42106d00,inter=1000,log=logfile.%p
In this command, inter=1000 means that I want
the integrity of the heap checked every 1000th call to
malloc, realloc, or free, as opposed to the default value
of inter=1, meaning check the entire heap each time. The
log=logfile.%p setting means create a log file called
logfile.pid, where pid is replaced with
the process ID.
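The sketch below shows one way a program could interpret such a DMALLOC_OPTIONS value: parse the comma-separated key=value pairs, expand %p in the log name to the process ID, and use inter to decide when to run a full heap check. It is a hypothetical illustration, not Dmalloc's own parser; parse_dm_opts(), expand_log_name(), and should_check_heap() are made-up names, and only the three options used above are handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the three options used above. */
struct dm_opts {
    unsigned long debug;   /* debug=0x...: bitmask of debug tokens */
    unsigned long inter;   /* inter=N: full heap check every Nth call */
    char log[256];         /* log=...: log file name template */
};

/* Parse e.g. "debug=0x42106d00,inter=1000,log=logfile.%p".
   Tokens without '=' and unknown keys are skipped.
   Returns 0 on success, -1 if the string is too long. */
int parse_dm_opts(const char *s, struct dm_opts *o)
{
    char copy[512];
    if (strlen(s) >= sizeof copy)
        return -1;
    strcpy(copy, s);
    o->debug = 0;
    o->inter = 1;          /* default: check the heap on every call */
    o->log[0] = '\0';
    /* strtok() is not thread-safe; a real library would use strtok_r(). */
    for (char *tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            continue;
        *eq = '\0';
        const char *val = eq + 1;
        if (strcmp(tok, "debug") == 0)
            o->debug = strtoul(val, NULL, 0);   /* base 0 accepts 0x... */
        else if (strcmp(tok, "inter") == 0)
            o->inter = strtoul(val, NULL, 10);
        else if (strcmp(tok, "log") == 0)
            snprintf(o->log, sizeof o->log, "%s", val);
    }
    return 0;
}

/* Copy tmpl into out, replacing each "%p" with pid. */
void expand_log_name(const char *tmpl, long pid, char *out, size_t outsz)
{
    size_t n = 0;
    if (outsz == 0)
        return;
    for (const char *t = tmpl; *t != '\0' && n + 1 < outsz; t++) {
        if (t[0] == '%' && t[1] == 'p') {
            int w = snprintf(out + n, outsz - n, "%ld", pid);
            if (w < 0)
                break;
            n += (size_t)w;
            if (n >= outsz) {      /* output was truncated */
                n = outsz - 1;
                break;
            }
            t++;                   /* skip the 'p' */
        } else {
            out[n++] = *t;
        }
    }
    out[n] = '\0';
}

/* With inter=1000, only every 1000th malloc/realloc/free call checks the heap. */
int should_check_heap(unsigned long call_no, unsigned long inter)
{
    return inter != 0 && call_no % inter == 0;
}
```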
2. Set the LD_PRELOAD_64 environment variable to the Dmalloc shared library, for example:
setenv LD_PRELOAD_64 /export/home/dmalloc-5.5.0/libdmallocth.so
3. Run your application.
Make sure the application is using a dynamically linked malloc package, such as the standard libc malloc(3C). Preloading will not work if malloc, realloc, and free are linked in
statically.
Fixing a Bug Using Dmalloc
Let us consider a simple example. In this case, Dmalloc has
detected two problems in the system libraries, specifically, in
the OpenGL pipeline for the XVR-2500 graphics card installed in
an Ultra 45 workstation. The problems themselves are almost
trivial, but they are quite typical.
1. Memory overwrite:
1168279430: 27537: pointer '0x10d0bf910' from 'unknown' \
prev access 'unknown'
1168279430: 27537: dump of proper fence-top bytes: \
'\372\312\336i\372\312\336i\372\312\336i\372\312\336i'
1168279430: 27537: dump of '0x10d0bf910'+3: \
'v/fb\000\312\336i\372\312\336i\372\312\336i\372\312\336i'
1168279430: 27537: next pointer '0x10d0bf940' (size 8) may \
have run under from 'unknown'
1168279430: 27538: ERROR: _dmalloc_chunk_heap_check: failed \
OVER picket-fence magic-number check (err 27)
The stack trace (as reported by AppCrash) was:
...
ffffffff7ee17470 dmalloc_error (ffffffff7ee1b3f0, \
ffffffff7fffac48, 0, ffffffff76bf2044, 0, 0) + 140
ffffffff7ee11a7c log_error_info (0, 0, 0, 10d0b0a98, \
ffffffff7ee1b3d8, ffffffff7ee1b3f0) + 3c
ffffffff7ee13ff0 _dmalloc_chunk_heap_check (0, ffffffff7f72fd5c, \
ff000000, ff000000, 11e6f0, ffffffff7f72c448) + 5f8
ffffffff7ee1806c dmalloc_in (0, 0, 1, ffffffff7f61f068, 11e6a0, \
ffffffff71602000) + 3ec
ffffffff7ee18370 dmalloc_malloc (0, 0, 20, a, 0, 0) + 40
ffffffff7ee18cd0 malloc (20, ffffffff6bacc238, 0, \
ffffffff6b609538, 11e6f0, ffffffff7f72c448) + 28
ffffffff78b055d0 XextAddDisplay (ffffffff6c685190, 10d864010, \
ffffffff6c67d078, ffffffff6c648728, 0, 0) + 20
ffffffff6b609538 ogl_kfb_XF86DRI_glx_QueryDirectRenderingCapable \
(10d864010, 10d090e10, ffffffff7fffb854, 2238, ffffffff6c648718, \
ffffffff7f61f068) + 78
ffffffff6b607964 ogl_kfb_XF86DRIQueryDirectRenderingCapable \
(10d864010, ffffffff6b6094c0, ffffffff7fffb854, ffffffff6b5f6910, \
4b93dc, 0) + 24
ffffffff6b5eed20 __driCreateScreen (10d864010, 0, 10d090e68, \
b, 10d88f010, 0) + 20
ffffffff6b5f6960 ogl_kfb_create_screen (10ce5c810, 8000, \
10d0bf3d0, 10daa7900, 1, 10d88f010) + 758
ffffffff79ab7080 __glxcLoadInitModule (10d091210, 0, 0, \
230, 0, ffffffff79d19a40) + 640
ffffffff79ab71d4 cglxdCreateContext (10d864010, 10daa7690, 0, \
10daaa010, 8014, 10ce5c810) + 94
ffffffff79af6e2c __glXCreateNewContext (10d864010, 10daa7690, \
8014, 1, 0, 0) + 5ec
ffffffff7f1532b8 _glXCreateNewContext (10d864010, 10daa7690, 8014, \
0, 1, ffffffff7f30b7b8) + 254
ffffffff7f131c9c glXCreateContext (10d873010, 10daa9810, 0, 1, \
1, ffffffff7fffc190) + d94
...
According to the Dmalloc documentation:
27 (ERROR_OVER_FENCE)
This indicates that a pointer had its upper bound
picket-fence magic space overwritten. If the 'check-fence'
token is enabled, the library writes magic values above and
below allocations to protect against overflow. ...
I'll describe more details about this error below.
2. Double free():
1168468189: 27858: ERROR: free: cannot locate pointer in \
heap (err 22)
1168468194: 27858: error details: finding address in heap
1168468194: 27858: pointer '0x10d0c6c10' from 'unknown' prev \
access 'unknown'
To get more information, I changed the code that prints these messages (the dmalloc_error() routine in
the error.c file) to invoke pstack(1) instead of
fork(). This modification is explained in more
detail below.
The following stack trace was produced:
8977: /<path_to_executable>
----------------- lwp# 1 / thread# 1 --------------------
ffffffff76ccebb8 waitid (0, 2316, ffffffff7fffae90, 3)
ffffffff76cc0cec waitpid (2316, ffffffff7fffb110, 0, 0, \
ffffffff6f713a40, 0) + 64
ffffffff76cb44ac system (ffffffff7fffb2b8, 1988, 1800, 0, \
ffffffff76de6000, ffffffff7fffb178) + 394
ffffffff7ef172d8 dmalloc_error (ffffffff7ef1b3c8, \
ffffffff76df0fc0, 0, 1000, 0, 0) + 140
ffffffff7ef11a04 log_error_info (0, 0, 10d0c6c10, 0, \
ffffffff7ef1b3b0, ffffffff7ef1b3c8) + 3c
ffffffff7ef14ca4 _dmalloc_chunk_free (0, 0, 10d0c6c10, 11, \
0, 0) + 1dc
ffffffff7ef18910 dmalloc_free (0, 0, 10d0c6c10, 11, 4cd000, \
30000) + 108
ffffffff7ef18ed0 free (10d0c6c10, 10d0c6b10, 1, 10d0c6b10, \
0, 10d0c6c10) + 20
ffffffff6b5f4854 ogl_kfb_destroy_drawable (10db79890, \
10d080e10, 10d0c6c10, 7800, 2, ffffffff6b5ee008) + 58
ffffffff79ca84c4 cglxdDestroyGlxDrawable (10ce5c810, 0, 0, \
10d0c6d10, ffffffff6b5f48f0, 0) + 324
ffffffff79cafec0 __glXDestroyPbuffer (10d8c3010, b00010, \
400, ffffffff7f214b54, 22f100, 420) + 40
ffffffff7f214b54 glXDestroyPbuffer (10d8c3010, b00010, 0, \
ffffffff7f3e5de0, ffffffff7f3e9ad3, ffffffff7f3e9a70) + 7c4
ffffffff7f272a78 __1cHpbuffer2T5B6M_v_ (10d961780, 0, \
ffffffff7f3e9a70, ffffffff7f3e9ace, ffffffff7f3e9ba8, \
ffffffff7f3e5de0) + 480
ffffffff7f2735bc __1cFpbwin2T5B6M_v_ (10d774af0, \
10d774af8, 10d961780, ffffffff7f3e5de0, 0, 0) + 2c
ffffffff7f26b4cc __1cHwinhashGdetach6MpnP__winhashstruct__v_ \
(10d774af0, 10cf1e7e8, 0, 1000, 0, 6) + 34
ffffffff7f264654 __1cI_winhashGremove6MpcLb_v_ (10ce744c0, \
10d0af2d0, 5400002, 0, ffffffff7f3e5de0, 0) + 264
ffffffff7f22d588 XDestroyWindow (10d8c2010, 5400002, \
10ce744c0, ffffffff7f3eac68, ffffffff7f3e5de0, 0) + 7d8
...
It turned out there was a duplicate free() memory
error in this code.
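Conceptually, this kind of error is detected by keeping track of every live allocation and refusing to free a pointer that isn't in the table. Here is a deliberately tiny, hypothetical sketch of that idea (tracked_malloc() and tracked_free() are made-up names); Dmalloc's own bookkeeping is far more complete.

```c
#include <stddef.h>
#include <stdlib.h>

/* A deliberately tiny table of live allocations; a real debug
   malloc would use a hash table and record much more per entry. */
#define MAX_LIVE 64

static void *live[MAX_LIVE];

void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i] == NULL) {
            live[i] = p;
            return p;
        }
    }
    free(p);    /* table full: refuse rather than lose track */
    return NULL;
}

/* Returns 0 on success, -1 if ptr is not a live allocation,
   for example on a second free() of the same pointer. */
int tracked_free(void *ptr)
{
    if (ptr == NULL)
        return 0;           /* free(NULL) is a no-op */
    for (size_t i = 0; i < MAX_LIVE; i++) {
        if (live[i] == ptr) {
            live[i] = NULL;
            free(ptr);
            return 0;
        }
    }
    return -1;              /* the "cannot locate pointer in heap" case */
}
```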
Such problems can be further debugged using a debugger such as
Sun Studio dbx or its IDE. You can set a breakpoint in
dmalloc_error() and examine the function arguments,
the contents of the heap, and so on. Of course, this will be much
easier if the application is compiled debuggable and its source
code is available.
Both of these bugs were fixed in the next release of the
Sun OpenGL patch.
How Dmalloc Can Be Improved
Here are a few Dmalloc improvements I can think of.
Make the Dmalloc Implementation for the Solaris OS Recognize Sun Studio Compilers
This should include Sun Studio compiler flags for 64-bit (for both SPARC
and x86 platforms) and ANSI-C. Replace the ld command for linking with cc or CC, and so on.
The problems with ./configure described above don't seem to be
related to the tools autoconf, automake, or libtool themselves so much, but rather to the way those tools are being used in Dmalloc.
Since Dmalloc needs to create a shared library (at least as an
option), it could have been using libtool. However, the existing
configure.ac file doesn't invoke the AC_PROG_LIBTOOL
macro as it could. Instead, it seems to have its own macros for
defining how to build shared libraries, and these are wrong for
the Solaris OS and Sun Studio compilers. With the Sun Studio compilers,
it should use the cc (or CC for C++) compiler driver to link the shared library, so that it is consistent about generating 64-bit
or 32-bit libraries. Instead it is invoking ld directly, and it isn't setting -64 when a 64-bit library is desired.
There is no file Makefile.am either, so apparently it's not using
automake. Instead, it is providing a manually written
Makefile.in file.
It would be desirable for the project to move to using libtool.
At least on the Solaris OS, libtool-1.5.22 or later should work
correctly when building shared libraries with Sun compilers.
Add an Option to Build a 64-bit Version of the Dmalloc Shared Library
Modern applications are as likely to be 64-bit as 32-bit. Users need to have an easy way to choose between the two.
Fix mmap() Problems
I've run into a situation where Dmalloc started to use mmap(2)
instead of malloc for the entire application. To work around
this problem, I disabled the #if HAVE_MMAP &&
USE_MMAP section in the Dmalloc source file
heap.c by adding && 0 to the condition:
% diff heap.c.orig heap.c
97c97
< #if HAVE_MMAP && USE_MMAP
---
> #if HAVE_MMAP && USE_MMAP && 0
Improve Readability of the Dump of Overwritten Memory, and Improve the Documentation Explaining What That Dump Contains
Currently, the dumped data is mostly written in octal format.
I think printing it in hexadecimal format would make it
easier for the user to interpret.
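As a rough illustration of this proposal, a hex dump takes only a few lines of C. The function name hex_dump() is hypothetical; this is not existing Dmalloc code.

```c
#include <stdio.h>
#include <string.h>

/* Format len bytes as two-digit lowercase hex values separated by
   spaces. out must hold at least 3 * len bytes (1 byte if len is 0). */
void hex_dump(const unsigned char *buf, size_t len, char *out)
{
    char *o = out;
    for (size_t i = 0; i < len; i++)
        o += sprintf(o, i ? " %02x" : "%02x", buf[i]);
    *o = '\0';
}
```

Applied to the 20 bytes dumped in the example that follows, it would print 76 2f 66 62 00 ca de 69 fa ca de 69 fa ca de 69 fa ca de 69, where the 00 in place of the expected fa stands out immediately.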
Using the first example described above, the memory overwrite
error message was shown as:
1168279430: 27537: pointer '0x10d0bf910' from 'unknown' \
prev access 'unknown'
1168279430: 27537: dump of proper fence-top bytes: \
'\372\312\336i\372\312\336i\372\312\336i\372\312\336i'
1168279430: 27537: dump of '0x10d0bf910'+3: \
'v/fb\000\312\336i\372\312\336i\372\312\336i\372\312\336i'
1168279430: 27537: next pointer '0x10d0bf940' (size 8) \
may have run under from 'unknown'
1168279430: 27538: ERROR: _dmalloc_chunk_heap_check: failed \
OVER picket-fence magic-number check (err 27)
The "proper fence-top bytes" are initialized to 0xFACADE69 (as
defined in Dmalloc file chunk_loc.h), four times in
this case, since I configured Dmalloc to have 16-byte fences
(also known as redzones), both for "bottom" and "top." Using
the octal representation for characters that can't be displayed
as ASCII, this indeed translates to:
\372\312\336i\372\312\336i\372\312\336i\372\312\336i
The overwritten buffer is:
\000\312\336i\372\312\336i\372\312\336i\372\312\336i
However, this is not obvious from the Dmalloc message. It took
me a while to realize it, and I had to look into what Dmalloc
does exactly. From file chunk.c:
/*
* The size includes the bottom fence post area. We want it to
* align with the start of the top fence post area.
*/
if (DUMP_SPACE > user_size + FENCE_OVERHEAD_SIZE) {
dump_size = user_size + FENCE_OVERHEAD_SIZE;
offset = -FENCE_BOTTOM_SIZE;
}
else {
dump_size = DUMP_SPACE;
/* we will go backwards possibly up to FENCE_BOTTOM_SIZE offset */
offset = user_size + FENCE_TOP_SIZE - DUMP_SPACE;
}
...
dump_pnt = (char *)start_user + offset;
if (IS_IN_HEAP(dump_pnt)) {
out_len = expand_chars(dump_pnt, dump_size, out, sizeof(out));
dmalloc_message(" dump of '%#lx'%+d: '%.*s'",
(unsigned long)start_user, offset, out_len, out);
}
In this case, the values are as follows.
DUMP_SPACE = 20
user_size = 7 (length of "/dev/fb")
FENCE_OVERHEAD_SIZE = 16
offset = user_size + FENCE_TOP_SIZE - DUMP_SPACE = 3
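The same arithmetic can be checked with a small function that restates the offset logic from chunk.c, with the compile-time constants turned into parameters. dump_window() is a hypothetical name, and the sketch covers only the offset and size calculation, not the dump itself.

```c
/* Restates the chunk.c offset logic with the constants as parameters.
   Returns the dump offset relative to the start of the user area and
   stores the number of bytes dumped in *dump_size. */
long dump_window(long user_size, long fence_overhead, long fence_bottom,
                 long fence_top, long dump_space, long *dump_size)
{
    if (dump_space > user_size + fence_overhead) {
        /* Small allocation: dump everything, starting at the bottom fence. */
        *dump_size = user_size + fence_overhead;
        return -fence_bottom;
    }
    /* Otherwise dump DUMP_SPACE bytes ending at the end of the top fence. */
    *dump_size = dump_space;
    return user_size + fence_top - dump_space;
}
```

With the values above, it yields an offset of 3 and a 20-byte dump: the last 4 bytes of the user data (v/fb) followed by the 16-byte top fence.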
So, this malloc() was called requesting 7 bytes, but then 8 bytes
were written into that buffer, including the trailing zero:
/dev/fb\000. In this example, it was the result of
OpenGL code like this (where devPath is a character
string containing /dev/fb):
char *ptr;
...
ptr = malloc(strlen(devPath));
strcpy(ptr, devPath);
The author of this code forgot that strcpy() also copies the terminating null byte of the string. This is a rather
common error in C/C++. The correct way to call
malloc() in the situation above is:
ptr = malloc(strlen(devPath)+1);
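A complete version of the corrected allocation might look like this; copy_path() is a hypothetical name. On the Solaris OS and other POSIX systems, strdup() does the same job in one call.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate a C string, allocating room for the terminating '\0'. */
char *copy_path(const char *devPath)
{
    size_t len = strlen(devPath);
    char *ptr = malloc(len + 1);      /* +1 for the terminating null byte */
    if (ptr == NULL)
        return NULL;
    memcpy(ptr, devPath, len + 1);    /* copies the '\0' as well */
    return ptr;
}
```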
To make it easier for the user to deal with errors like this, I
think it's important to print more debugging information, such as
the size of the current allocation (user_size) and the
DUMP_SPACE value. Perhaps printing the damaged "fence" alone, in
addition to what's printed now, would also help.
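For example, a report along the following lines would point directly at the damaged byte. This is only a sketch of the suggestion: first_bad_fence_byte() and format_fence_report() are hypothetical names, and the output format is just one possibility, not Dmalloc's.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Index of the first fence byte that differs from the repeating
   4-byte magic pattern, or -1 if the fence is intact. */
long first_bad_fence_byte(const unsigned char *fence, size_t len,
                          const unsigned char magic[4])
{
    for (size_t i = 0; i < len; i++)
        if (fence[i] != magic[i % 4])
            return (long)i;
    return -1;
}

/* One-line report that includes user_size and DUMP_SPACE along with
   the first damaged top-fence byte. Returns snprintf()'s result. */
int format_fence_report(char *out, size_t outsz, size_t user_size,
                        size_t dump_space, const unsigned char *fence,
                        size_t len, const unsigned char magic[4])
{
    long bad = first_bad_fence_byte(fence, len, magic);
    if (bad < 0)
        return snprintf(out, outsz,
                        "user_size=%zu DUMP_SPACE=%zu top fence intact",
                        user_size, dump_space);
    return snprintf(out, outsz,
                    "user_size=%zu DUMP_SPACE=%zu top fence byte %ld: "
                    "expected 0x%02x, found 0x%02x",
                    user_size, dump_space, bad,
                    (unsigned)magic[bad % 4], (unsigned)fence[bad]);
}
```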
Add Checking for Calls to memcpy() With Overlapping Memory Regions
The latest version of Dmalloc (5.5.0) is supposed to have this
check added already, but I haven't been able to make it work at
all, at least on a Solaris machine with the Sun Studio compilers,
even when I included dmalloc.h in my test program.
This issue is not directly related to malloc, but
the additional check is useful and it's easy to do. A while ago,
someone at the Dmalloc forum suggested implementing it. Valgrind
performs this check.
The problem is that memcpy() is sometimes used when
the source and destination memory buffers overlap. This is a
bug: the C standard leaves the behavior of memcpy() undefined for
overlapping buffers, and in practice (including on the Solaris OS) it can
damage the memory buffers. It is safe to use
memmove() in that case, usually at the price of lower efficiency.
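The overlap condition itself is simple: two n-byte regions overlap when their start addresses are less than n bytes apart. Below is a minimal sketch; regions_overlap() and copy_bytes() are hypothetical names. Comparing addresses as integers, as done here, is not strictly portable C, but it works on flat-address platforms such as SPARC and x64.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if [a, a+n) and [b, b+n) share at least one byte. */
int regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t dist = pa > pb ? pa - pb : pb - pa;
    return n > 0 && dist < n;
}

/* Copy n bytes, falling back to memmove() when the regions overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    return regions_overlap(dst, src, n) ? memmove(dst, src, n)
                                        : memcpy(dst, src, n);
}
```

Note that a copy from an address to itself, as in the GetResources() case shown later, counts as overlapping under this definition.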
For now, I've created a special library interposer to perform
this check separately from Dmalloc. See reference [4] for more information about library
interposers. Here is this library interposer source code:
% cat memcpy_interp.c
/*
 * Interpose on memcpy() and check for overlapping memory
 * segments, like Valgrind.
 * By Greg Nakhimovsky, Sun Microsystems.
 * January 2007.
 *
 * Build and use this interposer as follows
 * (assuming a 64-bit application on Solaris/SPARC):
 *   cc -g -errfmt=error -xarch=v9 -G -Kpic \
 *      -o memcpy_interp.so memcpy_interp.c
 *   setenv LD_PRELOAD_64 /path/memcpy_interp.so
 *   run the app
 *   unsetenv LD_PRELOAD_64
 */
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <dlfcn.h>
#include <unistd.h>
#include <string.h>

void *memcpy(void *restrict s1, const void *restrict s2, size_t n)
{
    /* The next memcpy() in the link chain (normally the one in libc). */
    static void *(*func)(void *restrict, const void *restrict, size_t);
    /* pstack command for this process; the empty LD_PRELOAD_64=
       keeps pstack itself from loading this interposer. */
    static char buffer[64];
    /* Distance between the two start addresses, computed without
       truncating it to int. */
    uintptr_t a = (uintptr_t)s1;
    uintptr_t b = (uintptr_t)s2;
    uintptr_t dist = (a > b) ? a - b : b - a;

    if (func == NULL) {
        func = (void *(*)(void *restrict, const void *restrict, size_t))
            dlsym(RTLD_NEXT, "memcpy");
        snprintf(buffer, sizeof buffer,
            "LD_PRELOAD_64= /bin/pstack %ld\n", (long)getpid());
    }
    /* The segments overlap if they start less than n bytes apart. */
    if (n > 0 && dist < n) {
        printf("\nmemcpy() called with overlapping segments:\n"
               "s1=0x%p s2=0x%p n=%lu\n", s1, s2, (unsigned long)n);
        system(buffer);
    }
    return func(s1, s2, n);
}
%
Interestingly, when I ran the application in question with this
library interposer (but without Dmalloc), it detected an
inefficiency in X/Motif routine GetResources(). I
got a lot of output similar to the following.
memcpy() called with overlapping segments:
s1=0x1253f0788 s2=0x1253f0788 n=8
19436: /path/app.exe
----------------- lwp# 1 / thread# 1 --------------------
ffffffff76dcebb8 waitid (0, 4f3e, ffffffff7fff7d30, 3)
ffffffff76dc0cec waitpid (4f3e, ffffffff7fff7fb0, 0, 0, \
ffffffff6f818480, 0) + 64
ffffffff76db44ac system (ffffffff7f300960, 1988, 1800, 0, \
ffffffff76ee6000, ffffffff7fff8018) + 394
ffffffff7f200580 memcpy (1253f0788, 1253f0788, 8, ffffff67, \
5, 1253f0788) + f0
ffffffff7a11f220 GetResources (1253f06f0, 1253f06f0, \
ffffffffffffff67, 0, ffffffff7a264db0, 28) + e4c
ffffffff7a11df38 _XtGetResources (146f8c, ffffffff7fffa650, \
4, 0, ffffffff7fffa39c, ffffffff7fff9d50) + 120
ffffffff7a11c644 xtCreate (1253f06f0, 0, 10ae0a468, \
1254c4780, 10cf04f20, ffffffff7fffa650) + 154
ffffffff7a125bc0 _XtCreateWidget (10db11a78, 10ae0a468, \
1254c4780, ffffffff7fffa650, 4, 0) + 278
ffffffff7a125918 XtCreateWidget (10db11a78, 10ae0a468, \
1254c4780, ffffffff7fffa650, 4, 1) + d0
...
Note that GetResources() is calling
memcpy() to copy 8 bytes from a given address to
itself!
I've reported this inefficiency to Sun's X/Motif developers.
Change error-dump Functionality From fork(2) to pstack(1), at Least for the Solaris OS
Currently, error-dump results in Dmalloc calling
fork(), attempting to dump core, and continuing. In my tests,
this has caused recursive behavior, leading to a bad crash.
Also, since I have AppCrash installed on all my Solaris 10 and
later machines, this generated endless AppCrash reports.
Instead, generating a stack trace telling us where in the program
the error occurred would be much more useful. For this purpose,
I replaced the fork() code in the Dmalloc
dmalloc_error() routine (error.c file) with
this:
char buf[128];
sprintf(buf, "LD_PRELOAD_64= /bin/pstack %d", (int)getpid());
system(buf);
The empty "LD_PRELOAD_64= " assignment clears that variable for the
pstack(1) command only, which prevents the Dmalloc library from being
recursively preloaded into pstack(1).
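If you make this change yourself, a slightly more defensive variant builds the command with snprintf() so that the buffer can't overflow. The helper names build_pstack_cmd() and dump_own_stack() are hypothetical, and /bin/pstack is assumed to be at its usual Solaris location.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Build "LD_PRELOAD_64= /bin/pstack <pid>" into buf.
   Returns 0 on success, -1 if buf is too small. */
int build_pstack_cmd(char *buf, size_t bufsz, long pid)
{
    int n = snprintf(buf, bufsz, "LD_PRELOAD_64= /bin/pstack %ld", pid);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}

/* Print a stack trace of the calling process; returns system()'s status,
   or -1 if the command could not be built. */
int dump_own_stack(void)
{
    char buf[128];
    if (build_pstack_cmd(buf, sizeof buf, (long)getpid()) != 0)
        return -1;
    return system(buf);
}
```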
Dmalloc attempts to determine the "return address" of the caller
of malloc() and other functions. See the
GET_RET_ADDR() macro in the return.h file.
However, that code is obsolete and ineffective (at least for the
Solaris and Sun Studio platforms). This is why Dmalloc prints
unknown in error messages such as this:
pointer '0x10d0bf910' from 'unknown' prev access 'unknown'
The pstack(1) technology is much more reliable than
the assembly-level hacks in return.h.
If you include dmalloc.h and recompile your application,
Dmalloc may be able to obtain the caller's address from the
Dmalloc functions. This wasn't practical in my case, so I didn't
test this feature.
Conclusion
Dmalloc is a valuable debugging tool for C, C++, and Fortran
developers, supplementing other available debugging
technologies. I've found it especially useful for large
applications that the more powerful tools can't handle well.
With a few relatively minor adjustments, Dmalloc can become even
more useful, particularly for the developers of Solaris
applications using Sun Studio compilers and tools.
References
Acknowledgements
I'd like to thank Gray Watson for creating Dmalloc and for improving and
maintaining it all these years. Also, thanks to my Sun colleague
Richard Smith for his comments and advice regarding the use of
the GNU Autoconf tools.
About the Author
Greg Nakhimovsky is a Sun engineer working with application
software vendors to make sure their products run well on Sun
systems.