dbx and System Libraries: Why Can't dbx Read My Process or Core File?

 
By Chris Quenelle and Ann Rice, March 28, 2006 (Revised)  

The problems described in this article predominantly affect users of Solaris Operating System releases older than the Solaris 8 OS, and dbx versions older than 6.2. The simplest way to work around these issues is to use the latest release of the Solaris OS, or the latest version of dbx. In what follows, behavior attributed to the Solaris 9 OS also applies to the Solaris 10 OS. Behavior attributed to dbx 7.0 also applies to releases of dbx through 7.5.

For users of older versions of the Solaris OS and dbx, this article explains the long-standing dependencies that dbx has on various system libraries, how those dependencies can cause problems, and how to fix those problems.

Technical Note: The Solaris 10 OS moved the threads functions from libthread into libc. This move means that libthread_db.so.1 is now called libc_db.so.1.

Technical Note: When the new 1-to-1 threads model is being used, the lwp commands in dbx work just as well as the threads commands for inspecting a core file or process. This model is used in the Solaris 8 OS when the alternate threads library is in use. The new model is used for all processes in the Solaris 9 OS and Solaris 10 OS.

Introduction to the _db Libraries

Two common system libraries that provide services to user programs are the threads library and the runtime linker. To debug a program, dbx needs various internal information from these libraries. For example, when you are debugging a multithreaded program, dbx reads the number of threads in the program, where these threads' stacks are in memory, and other information about the threads.

From the dynamic linker (ld.so), dbx learns when libraries are loaded and unloaded by the program. dbx needs to be able to stop the program after the program has been loaded, but before any of the initialization code has been run.

If dbx directly accessed private data structures from these system libraries, a special version of dbx would be needed for each version of the Solaris Operating Environment (and perhaps each patch level). So these two system facilities provide dbx (and other debuggers) with a "debugging interface" for accessing information.

Every Solaris system that has a libthread.so (the threads library) also has a library called libthread_db.so.1 that can be used directly by the debugger to access internal thread information in a safe way, using a stable interface. So even if the internal data structures change in libthread.so, the library calls to read information in those data structures can be kept the same. The same scheme is used for ld.so (the runtime linker) and librtld_db.so (the runtime linker's debugging library).

These system debugging libraries are provided in the Solaris Operating Environment in /usr/lib. Each debugging library knows about the internal private data structures of the corresponding system library. If you update your system with a new version of libthread.so or ld.so and you do not install the matching new version of the debugging library, then dbx won't function correctly.

Patches that deliver the system libraries include the matching debugging libraries. But sometimes one of the system libraries is copied or moved or updated, and the person doing so does not also update the corresponding debugging library. When this happens, dbx experiences some strange failures. These dbx problems differ depending on the release of dbx, so the following sections use examples for several different versions of dbx.

Versions 7, 8, and 9 of the Solaris Operating Environment support 64-bit SPARC[tm] V9 programs, and include two versions of each system library (32-bit and 64-bit) and two versions of the matching debugging libraries. The 64-bit debugging library can also support 32-bit programs. For example, for a 64-bit debugger debugging a 32-bit multithreaded program, the program uses /usr/lib/libthread.so, and the debugger uses /usr/lib/sparcv9/libthread_db.so.

Problems Debugging Core Files

When you load a core file into dbx, dbx needs information about the shared libraries that were loaded when the program dumped core. To get this information, dbx uses librtld_db.so.

Truncated Core Files

If you have problems debugging a core file with dbx, check to see if the file is truncated. When a core file is incomplete, often the dynamic linker information is missing. In this case, dbx cannot load information about shared libraries, even if the proper version of librtld_db.so is installed on the system.

dbx 7.0 issues a descriptive error message when it detects a truncated core file. If you receive this message, try to get a complete core file to examine.

Often, you can determine if your core file is truncated by checking the core limit set in the shell that was running when the core file was produced. If you are using the C shell, use the limit command. If you are using the Bourne shell or Korn shell, use the ulimit command. For information on both of these commands, see the limit(1) man page.

If you are not running dbx 7.0, and you cannot determine what core limit was in effect when the core file was created, check the program headers in the core file to see if all the segments were written. For example:

% elfdump -p core | tail -5
Program Header[13]:
    p_vaddr:      0xffbf8000      p_flags:    [ PF_X  PF_W  PF_R ]
    p_paddr:      0               p_type:     [ PT_LOAD ]
    p_filesz:     0x8000          p_memsz:    0x8000
    p_offset:     0x71a24         p_align:    0
% ls -la core
-rw-------   1 joeuser staff     498212 Jun  4 14:04 core

# Add the p_filesz field and the p_offset field of the
# last program header to find the total expected
# size of the core file. See if that number matches
# the size shown by the ls command.

% dbx -c "print 0x8000 + 0x71A24 ; quit"
dbx: warning: unknown language, 'ansic' assumed
0x8000+0x71a24 = 498212

# The result matches, so this is a complete core file.
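If dbx is not at hand, the same arithmetic can be done in the shell. The following is a minimal sketch, not part of dbx or the Solaris OS: the function names are hypothetical, and it assumes that the last program header printed by elfdump is also the one that ends the file.

```shell
# Print the expected size of a core file in bytes.
# $1 = p_offset and $2 = p_filesz of the last program header,
# as printed by elfdump -p (0x-prefixed hexadecimal is accepted
# by shell arithmetic).
expected_core_size() {
    echo $(( $1 + $2 ))
}

# Succeed if the expected size equals the actual size.
# $3 = actual size in decimal bytes, as shown by ls -l.
core_is_complete() {
    [ "$(expected_core_size "$1" "$2")" -eq "$3" ]
}

# Example with the values from the listing above:
#   expected_core_size 0x71a24 0x8000
#   core_is_complete 0x71a24 0x8000 498212 && echo "complete core file"
```

With the values shown in the listing, expected_core_size prints 498212, which matches the size reported by ls.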

Mismatched Core Files

Once you have determined that you have a complete core file, check for a mismatched core file. If the core file was created on a system running a different version of the Solaris Operating Environment than the system on which you are debugging it, then the data structures inside the core file (for libthread.so and ld.so) do not match the debugging libraries on the system running dbx.

Problems might also occur when patching a system, and updating shared system libraries. If you acquire a core file, install a patch on the system, and then run dbx on the core file, dbx might have trouble matching up the core file with the new system libraries.

In some cases, a newer librtld_db.so can read the dynamic linker data structures from an older ld.so, but in many cases, it cannot. So moving a core file to another machine and then using dbx to debug it requires special care.

dbx 6.2 and dbx 7.0 provide a feature that explicitly supports debugging mismatched core files. You can copy a set of libraries from the system on which the core file was generated, and tell dbx to use those libraries instead of the libraries in /usr/lib. For more information about this feature and how to use it, see "Debugging a Mismatched Core File" in Chapter 2 of the Debugging a Program With dbx manual, or type help core mismatch on the dbx command line.
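As a rough illustration of deciding which libraries to copy, the following shell sketch lists the shared libraries that an executable names at link time by parsing ldd output. This approach is an assumption, not the procedure from the manual, and the function name and file names are hypothetical. It misses libraries opened later with dlopen() and does not list the runtime linker itself, so treat the result as a starting point only.

```shell
# Hypothetical helper: read ldd output on stdin and print the resolved
# path of each shared library, one per line.
# Lines whose third field is not an absolute path (for example,
# libraries that could not be found) are skipped.
paths_from_ldd() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Usage on the system where the core file was generated
# (myprog and libs.txt are placeholders):
#   ldd ./myprog | paths_from_ldd > libs.txt
```

The resulting list can then be passed to whatever tool you use to copy files between systems.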

In the Solaris 9 Operating Environment (and the Solaris 8 Operating Environment beginning with the 10/00 update release), the linker automatically stores a checksum in every library and executable, and the checksums of the libraries used in the program are stored in the core file. dbx 6.2 and dbx 7.0 can use this checksum to give you a specific warning if dbx is using symbols from a different library from the one used by the program when the core file was created. To see the checksum for a library or executable, use the dump command:

% dump -Lv /usr/lib/libc.so.1 | grep CHECKSUM
[11]      CHECKSUM        0x80b8

There is no utility for dumping this information from a core file.
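For libraries on disk, though, the checksums of two copies can be compared directly. The following is a small sketch under the assumption that dump -Lv prints the checksum in the three-column form shown above. The function names and the /tmp/corelibs path are hypothetical.

```shell
# Read dump -Lv output on stdin and print the CHECKSUM value.
checksum_from_dump() {
    awk '$2 == "CHECKSUM" { print $3 }'
}

# Succeed only if both checksum strings are non-empty and equal.
same_checksum() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage on a Solaris system (the second path is a placeholder for a
# copy of the library taken from the system that produced the core):
#   a=$(dump -Lv /usr/lib/libc.so.1 | checksum_from_dump)
#   b=$(dump -Lv /tmp/corelibs/libc.so.1 | checksum_from_dump)
#   same_checksum "$a" "$b" || echo "libc.so.1 differs"
```

An empty checksum is treated as a mismatch, because libraries built before the linker stored checksums cannot be compared this way.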

In dbx 5.0, dbx 6.0, or dbx 6.1, you can work around a mismatched librtld_db.so by telling dbx not to load the symbols for ld.so. This causes dbx to operate in a limited mode without accessing the shared libraries in the core file, but you can still examine the contents of the main program. In these versions of dbx you can learn how to use this workaround by typing help loadobject exclusion on the command line and looking for the information on the dbx environment variable allow_critical_exclusion. Versions 6.2 and 7.0 of dbx should revert to this limited mode automatically in cases of librtld_db.so failure.

You might receive errors like the following from dbx 4.0:

dbx: core file read error: address 0xff3dd164 not available
dbx: panic: Proc::get_rtld_stuff(): could not initialize rtld_db

From dbx 5.0, dbx 6.1, and dbx 6.2, you might receive errors like:

dbx: core file read error: address 0xff3dd164 not available
dbx: warning: could not initialize librtld_db.so.1 -- trying libDP_rtld_db.so
Make sure this is the same version of Solaris where the core dump originated.
See also `help core mismatch'.

Newer versions of librtld_db.so understand memory images better than older versions, and some of this improved functionality is delivered by linker patches for the Solaris Operating Environment. If you can't install the latest version of the Solaris Operating Environment, some of these problems can be alleviated by installing the latest linker patch for the version of the Solaris Operating Environment that you are using.

Runtime Linker Mismatch

If the librtld_db.so on your system doesn't match the runtime linker version currently installed, then you'll see some strange errors when dbx tries to debug a program.

You can check your system for this mismatch outside of dbx by using the pmap utility. In the Solaris 2.6 Operating Environment and the Solaris 7 Operating Environment, use /usr/proc/bin/pmap. In the Solaris 8 Operating Environment and the Solaris 9 Operating Environment, this program is in /usr/bin. For information on using pmap, see the proc(1) man page.

Normally, the pmap utility uses librtld_db.so to list the shared libraries loaded into a running process. For example:

% pmap $$
...
FF272000      8K read/write/exec   /usr/lib/libcurses.so.1
FF280000    560K read/exec         /usr/lib/libnsl.so.1
...

If the pmap utility fails to initialize librtld_db.so, path names are missing from the listing and only the raw memory map is shown:

% pmap $$
...
FF3C0000    128K read/exec         dev:118,0 ino:68345
FF3E0000      8K read/write/exec   dev:118,0 ino:68345
...

Like dbx, pmap runs in 64-bit mode in a Solaris Operating Environment that supports 64-bit programs, so it depends on the 64-bit version of the debugging library.

These errors occur for the same reason as the mismatched core file errors above: dbx can't find the list of dynamically linked libraries.
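The pmap check can be scripted. The sketch below counts the mappings that pmap reports by device and inode number only, assuming the output format shown in the examples above. The function name is hypothetical. A count above zero is only a hint, so confirm it by checking whether the shared libraries themselves appear without path names.

```shell
# Read pmap output on stdin and print the number of mappings that are
# identified only by "dev:MAJOR,MINOR ino:INODE" instead of a path name.
count_unnamed_mappings() {
    awk '/dev:[0-9]+,[0-9]+ ino:[0-9]+/ { n++ } END { print n + 0 }'
}

# Usage:
#   pmap $$ | count_unnamed_mappings
```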

Different Time Stamps

If you see unexpected dbx error messages about rtld, look at the time stamps of the following runtime linker and linker debug library files to see if they are similar.

/usr/lib/ld.so.1
/usr/lib/librtld_db.so.1
/usr/lib/sparcv9/ld.so.1
/usr/lib/sparcv9/librtld_db.so.1

On a system without 64-bit support installed, the SPARC V9 libraries might be missing. If the Solaris Operating Environment is started with 64-bit support, then dbx automatically runs in 64-bit mode, even if it is debugging a 32-bit program. The program or core file being debugged uses /usr/lib/ld.so.1, but dbx uses /usr/lib/sparcv9/librtld_db.so.1.

If these libraries are installed from the same package or patch, the time stamps are likely to be within a few hours of each other. If this is not the case on your system, and you are receiving rtld error messages from dbx, install the latest linker patch for your version of the Solaris Operating Environment and see if the dbx problems go away.
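One way to view the time stamps side by side is a short shell function like the following sketch. The function name is hypothetical. It prints a long listing for each file that exists and flags any that are missing, which also shows whether the SPARC V9 libraries are installed.

```shell
# Print an ls -l line for each path that exists and a "missing:" line
# for each path that does not.
show_timestamps() {
    for f in "$@"; do
        if [ -e "$f" ]; then
            # -L follows symbolic links so that the time stamp shown is
            # that of the file itself; -d avoids listing a directory.
            ls -ldL "$f"
        else
            echo "missing: $f"
        fi
    done
}

# Usage with the four files listed above:
#   show_timestamps /usr/lib/ld.so.1 /usr/lib/librtld_db.so.1 \
#       /usr/lib/sparcv9/ld.so.1 /usr/lib/sparcv9/librtld_db.so.1
```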

To find and download the latest linker patch for your version of the Solaris Operating Environment:

  1. Direct your web browser to http://access1.sun.com/patch.public
  2. Under "Search the recommended patches sorted by operating system:", select your version of the Solaris Operating Environment from the list box and click Search.
  3. Search for "Linker" on the page that is displayed.

Possible Error Messages

How dbx responds to an incorrect version of librtld_db.so depends on the version of dbx you are using and the version of the Solaris Operating System you are running.

On dbx 5.0, 6.0, 6.1, 6.2, and 7.0, you might see some, but not all, of the following error messages.

dbx: rtld debug library can't be loaded.
dbx: could not initialize rtld_db.so
dbx: warning: could not enable secondary rtld synch event
dbx: warning: could not enable primary rtld synch event
dbx: warning: rtld information hasn't been loaded by child process
dbx: warning: could not initialize librtld_db.so.1 -- trying libDP_rtld_db.so
dbx: internal warning: do_run(): rtld hand not shaken

On dbx 4.0, you are more likely to receive a panic message from dbx:

dbx: panic: Proc::get_rtld_stuff(): could not initialize rtld_db

Patch for Solaris 8 Operating Environment

Versions 109147-02 through 109147-04 of the linker patch for the Solaris 8 Operating Environment included a new ld.so but not a new librtld_db.so. Later revisions of this patch include compatible versions of both libraries.

Threads Library Mismatch

If you examine a core file using a different version of the Solaris Operating Environment than the one that generated the file, dbx may not understand the runtime linker data structures. In this case, dbx cannot locate libthread.so inside the program, and cannot use libthread_db.so to understand the threads data structures in your program.

If libthread_db.so is out of date, dbx cannot get information about threads in your program. If dbx finds libthread.so in your program, but cannot load or use libthread_db.so, it issues this message:

dbx: warning: thread related commands will not be available

In dbx 6.0, 6.1, 6.2, and 7.0, you can use the lwp and lwps commands to examine threads that are active on LWPs at the time you examine the program or core file. These versions of dbx also issue this additional message:

dbx: warning: see `help lwp', `help lwps' and `help where'

If you receive these error messages, check the time stamps of all versions of libthread.so and libthread_db.so on your system.

Library Path Name                          Solaris Operating Environment Version
/usr/lib/libthread.so.1                    All versions
/usr/lib/libthread_db.so.1                 All versions
/usr/lib/sparcv9/libthread.so.1            7, 8, and 9
/usr/lib/sparcv9/libthread_db.so.1         7, 8, and 9
/usr/lib/lwp/libthread.so.1                8 only
/usr/lib/lwp/libthread_db.so.1             8 only
/usr/lib/lwp/sparcv9/libthread.so.1        8 only
/usr/lib/lwp/sparcv9/libthread_db.so.1     8 only

You might see some of the following error messages if libthread_db.so does not match the corresponding libthread.so.

In dbx 4.0:

detected a multithreaded program
dbx: store at 0 failed -- I/O error
dbx: warning: program is not linked with libthread
dbx: warning: could not initialize libthread_db.so -- ()
dbx: warning: cannot get thread count
dbx: warning: thread related commands will not be available

In dbx 5.0:

detected a multithreaded program
dbx: warning: could not initialize libthread_db.so -- ()
dbx: warning: thread related commands will not be available

In dbx 6.0, 6.1, 6.2, and 7.0:

detected a multithreaded program
dbx: warning: could not initialize libthread_db.so -- ()
dbx: warning: thread related commands will not be available
dbx: warning: see `help lwp', `help lwps' and `help where'


About the Authors

Chris Quenelle is a software engineer currently working in the Sun ONE Studio dbx engineering group. Chris has worked on performance and debugging tools at Sun for the last 9 years. Before coming to Sun, he worked on runtime support libraries and development tools at Supercomputer Systems Incorporated and Pyramid Technologies.

Ann Rice is a staff writer in the Sun ONE Studio Technical Documentation group, and is responsible for dbx and the dbx Debugger GUI. Ann began her career as a programmer, developing Fortran and COBOL applications for 12 years before making a transition to technical publications in 1979. She has documented network technology and development tools, as well as serving as a technical publications manager for several Silicon Valley companies, including 3Com Corporation and Sybase, Inc. Ann is a past president of the Silicon Valley chapter of the Society for Technical Communication.