Introduction
For a variety of reasons, nonreentrant library calls are often made from multithreaded programs. The potential for problems is often not caught until the code is placed under heavy load on a multiprocessor machine. To make matters worse, calls such as
Problem Explanation
In the Solaris Operating Environment (OE), the only way to be sure you are not using a nonreentrant library call is to check the man page for every library call your code makes. If a reentrant
This kind of insidious bug can creep into your MT code in many other ways as well. One example is an MT server that lets you run code written by others (such as a plug-in) in an MT environment. Another is code ported from a platform that protects you from this type of bug (at a small cost to performance). A third is code that was written in a single-threaded fashion and later ported to an MT environment.
In extreme cases, you could achieve 100 percent code coverage during QA and still not run into one of these bugs until production. To illustrate, consider the following trivial MT program:
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <thread.h>
#include <time.h>

#define MAXTHREADS 10000

static int Threads;
static volatile int StartYet;
static volatile int GlobCount = 0;

#define THR_MSLEEP(millisecs) poll(NULL, 0L, (int)(millisecs))

void *test_thread(void *nthr) {
    time_t t;
    char *ct;

    /* wait until main releases all threads at once */
    while (!StartYet) THR_MSLEEP(500);
    while (1) {
        t = time(NULL);
        ct = ctime(&t); /* the nonreentrant call under test */
        GlobCount++; /* this too is unsafe, but we don't care too much! */
    }
}

int main(int argc, char **argv) {
    int nthr;
    thread_t threads[MAXTHREADS];

    if (argc < 2) {
        printf("usage : %s <numthreads>\n", argv[0]);
        exit(1);
    }
    Threads = nthr = atoi(argv[1]);
    StartYet = 0;
    while (nthr--) {
        THR_MSLEEP(10); /* stagger creation */
        /* stack_base NULL and stack_size 0 request a default stack */
        thr_create(NULL, 0, test_thread, (void *) nthr,
                   THR_NEW_LWP, &threads[nthr]);
    }
    while (1) {
        poll(NULL, 0L, 300);
        printf("main looping...total count so far %d\n", GlobCount);
        StartYet = 1;
        if (GlobCount > 100000) {
            exit(0);
        }
    }
}
This program has been run with up to 1600 threads on a single-processor machine with no problems. It was run with 100 threads on a 4-CPU machine and also ran fine. Only when run with 200 threads on a 4-CPU machine did the nonreentrant call to
Here's a log of execution on a 4-CPU machine:

    gyruss 81 =>cc -mt mtunsafe.c
    gyruss 82 =>a.out 100
    main looping...total count so far 0
    main looping...total count so far 8711
    main looping...total count so far 15906
    main looping...total count so far 24577
    main looping...total count so far 31760
    main looping...total count so far 40420
    main looping...total count so far 48998
    main looping...total count so far 57670
    main looping...total count so far 66342
    main looping...total count so far 75018
    main looping...total count so far 83688
    main looping...total count so far 92340
    main looping...total count so far 101021
    gyruss 83 =>a.out 200
    main looping...total count so far 0
    main looping...total count so far 3348
    main looping...total count so far 11882
    main looping...total count so far 18846
    main looping...total count so far 32364
    main looping...total count so far 40087
    main looping...total count so far 49300
    Segmentation fault (core dumped)
    gyruss 84 =>dbx - core
    dbx: Using "/tmp/a.out"
    Reading a.out
    core file header read successfully
    Reading ld.so.1
    Reading libthread.so.1
    Reading libc.so.1
    Reading libdl.so.1
    Reading libc_psr.so.1
    detected a multithreaded program
    t@135 (l@135) terminated by signal SEGV (no mapping at the fault address)
    0xff2c116c: _smalloc+0x008c: ld [%o1 + 0x8], %o0
    (/net/woornack/files2/forte6u2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where
    current thread: t@135
    =>[1] _smalloc(0x10, 0xff33e728, 0x4, 0x10, 0x0, 0x0), at 0xff2c116c
      [2] malloc(0xb, 0xfffffff9, 0xffffffff, 0xff2d192c, 0x81010100, 0xff00), at 0xff2c11ac
      [3] tzcpy(0x25a40, 0xff33e8ac, 0x0, 0xa, 0xff338000, 0xffbefd53), at 0xff2d1948
      [4] getzname(0xffbefd5d, 0xff33b524, 0x0, 0xff33b524, 0xffbefd53, 0x0), at 0xff2d1890
      [5] _ltzset_u(0x3c5f0a23, 0xff338000, 0x0, 0x0, 0x0, 0x1), at 0xff2d1394
      [6] localtime_u(0xf2b05d10, 0xff33e8b4, 0x0, 0x0, 0xff338000, 0xff2b731c), at 0xff2d055c
      [7] ctime(0xf2b05d10, 0xff33e8b4, 0x0, 0x0, 0x80ccc, 0x107e4), at 0xff2b731c
      [8] test_thread(0x44, 0xfd9d3d18, 0x1, 0xff39ae04, 0x0, 0xfe400000), at 0x107e4
    (/net/woornack/files2/forte6u2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) quit

Different Approaches
So how do you eliminate this type of problem? The tedious way is to analyze the source code, checking each library call against its Solaris OE man page. Another approach is to use Solaris software tools to look at all the libraries a binary uses. You can do this statically or against a running process. Because many binaries load libraries dynamically, checking a running process is more likely to give a complete picture. In the early stages of code development and testing, these problems may not yet have manifested themselves. Before deployment, you can do some basic checking, even on the simple example presented above:

    rx7 143 =>ldd a.out
    libthread.so.1 => /usr/lib/libthread.so.1
    libc.so.1 => /usr/lib/libc.so.1
    libdl.so.1 => /usr/lib/libdl.so.1
    /usr/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1

Note: since all of the above are libraries for the Solaris OE, you don't need to search each of them in turn.
Alternatively, you can view the libraries of a running process:

    rx7 144 =>a.out 100 &
    [2] 6221
    rx7 145 =>pldd 6221
    6221: a.out 200
    /usr/lib/libthread.so.1
    /usr/lib/libc.so.1
    /usr/lib/libdl.so.1
    /usr/platform/sun4u-us3/lib/libc_psr.so.1
    rx7 146 =>nm a.out | grep UNDEF
    [56] | 0| 0|NOTY |WEAK |0 |UNDEF |__1cG__CrunMdo_exit_code6F_v_
    [50] | 133676| 0|FUNC |GLOB |0 |UNDEF |_exit
    [70] | 133760| 0|FUNC |WEAK |0 |UNDEF |_get_exit_frame_monitor
    [65] | 133652| 0|FUNC |GLOB |0 |UNDEF |atexit
    [53] | 133736| 0|FUNC |GLOB |0 |UNDEF |atoi
    [74] | 133712| 0|FUNC |GLOB |0 |UNDEF |ctime
    [73] | 133664| 0|FUNC |GLOB |0 |UNDEF |exit
    [58] | 133688| 0|FUNC |GLOB |0 |UNDEF |poll
    [49] | 133724| 0|FUNC |GLOB |0 |UNDEF |printf
    [45] | 133748| 0|FUNC |GLOB |0 |UNDEF |thr_create
    [68] | 133700| 0|FUNC |GLOB |0 |UNDEF |time

Of course, the example here is a very simple case; usually many libraries have to be searched for all the different nonreentrant calls, and libraries may be loaded dynamically while the code is put through its paces.
An Efficient Solution Using a Tool
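Scanning `nm` output by hand does not scale, so part of that manual check can be automated before reaching for a runtime tool. The following is a minimal sketch in C, not part of the original article: it takes one line of `nm` output in the format shown in the session above and reports whether it is an undefined reference to a listed call. The function name `flag_nm_line` and the table `unsafe_calls` are hypothetical, and the table holds only the three calls that the tool output in this article flags (`ctime`, `localtime`, `asctime`); a real table would be built from the man pages for your Solaris release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: only the calls flagged in the article's tool output.
 * Extend it from the man pages for the release you target. */
static const char *const unsafe_calls[] = { "asctime", "ctime", "localtime" };

/* Given one line of `nm` output, return the matching table entry if the
 * line is an UNDEF reference to a listed call, otherwise NULL. */
const char *flag_nm_line(const char *line)
{
    const char *name;
    size_t i, len;

    assert(line != NULL);
    if (strstr(line, "|UNDEF") == NULL)
        return NULL;                      /* not an undefined symbol */

    name = strrchr(line, '|') + 1;        /* symbol name is the last field */
    name += strspn(name, " \t");          /* skip any padding */
    len = strcspn(name, " \t\r\n");       /* stop at trailing whitespace */

    for (i = 0; i < sizeof unsafe_calls / sizeof unsafe_calls[0]; i++) {
        if (strlen(unsafe_calls[i]) == len &&
            strncmp(name, unsafe_calls[i], len) == 0)
            return unsafe_calls[i];
    }
    return NULL;
}
```

A driver would read lines from `nm a.out` with `fgets` and print every non-NULL result. Note the exact-length comparison: it keeps `time` from matching `ctime`. A static scan like this still misses calls made from dynamically loaded libraries.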
Here is the source code for a simple library,

    rx7 368 =>cc -mt -o nonreentrant.so.1 -G -K pic multithreaded_nonreentrant.c
    rx7 369 =>setenv LD_PRELOAD ./nonreentrant.so.1
    rx7 370 =>setenv PrintInfo 1
    rx7 371 =>./a.out 10
    INFO : Interposed thr_create looking up real function ptr
    main looping...total count so far 0
    **ERROR** - thread 4 calling MT unsafe ctime(); threads: 14
    **ERROR** - thread 4 calling MT unsafe localtime(); threads: 14
    **ERROR** - thread 4 calling MT unsafe asctime(); threads: 14
    main looping...total count so far 5851
    main looping...total count so far 25225
    main looping...total count so far 44296
    main looping...total count so far 62846
    main looping...total count so far 82171
    main looping...total count so far 100201
To pinpoint where the offending calls are made, set the

    rx7 372 =>setenv PrintStack 1
    rx7 373 =>./a.out 10
    INFO : Interposed thr_create looking up real function ptr
    main looping...total count so far 0
    **ERROR** - thread 4 calling MT unsafe ctime(); threads: 14
    unknown lib:??+0x11f437e
    ./nonreentrant.so.1:print_stack+0x38
    ./nonreentrant.so.1:ctime+0x108
    a.out:test_thread+0x54
    /usr/lib/libthread.so.1:_getfp+0x124
    a.out:test_thread+0x0
    **ERROR** - thread 4 calling MT unsafe localtime(); threads: 14
    unknown lib:??+0x11f444e
    ./nonreentrant.so.1:print_stack+0x38
    ./nonreentrant.so.1:localtime+0x108
    /usr/lib/libc.so.1:ctime+0x4
    ./nonreentrant.so.1:ctime+0x194
    a.out:test_thread+0x54
    /usr/lib/libthread.so.1:_getfp+0x124
    a.out:test_thread+0x0
    **ERROR** - thread 4 calling MT unsafe asctime(); threads: 14
    unknown lib:??+0x11f43ee
    ./nonreentrant.so.1:print_stack+0x38
    ./nonreentrant.so.1:asctime+0x108
    ./nonreentrant.so.1:ctime+0x194
    a.out:test_thread+0x54
    /usr/lib/libthread.so.1:_getfp+0x124
    a.out:test_thread+0x0
    main looping...total count so far 6660
    main looping...total count so far 27012
    main looping...total count so far 47074
    main looping...total count so far 65870
    main looping...total count so far 84243
    main looping...total count so far 103323
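The library's source is not reproduced in this section. As a rough illustration of the interposition technique that `LD_PRELOAD` relies on, the C sketch below defines its own `ctime`, logs a warning, and forwards to the real function located with `dlsym(RTLD_NEXT, ...)`. It is not the author's implementation. The counter `unsafe_ctime_calls` is a hypothetical name, the message only imitates the log format shown above, and the sketch does not count threads or print stacks as the real tool does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc needs this to expose RTLD_NEXT */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

typedef char *(*ctime_fn)(const time_t *);

static _Atomic(ctime_fn) real_ctime;    /* cached pointer to libc's ctime */
static atomic_int unsafe_ctime_calls;   /* hypothetical counter of intercepted calls */

/* Same name and signature as the libc function, so callers bind to this
 * definition first when it appears earlier in the symbol search order. */
char *ctime(const time_t *clock)
{
    ctime_fn fn = atomic_load(&real_ctime);

    if (fn == NULL) {
        /* RTLD_NEXT: the next definition of "ctime" after this object */
        fn = (ctime_fn)dlsym(RTLD_NEXT, "ctime");
        assert(fn != NULL);
        atomic_store(&real_ctime, fn);
    }
    atomic_fetch_add(&unsafe_ctime_calls, 1);
    fprintf(stderr, "**ERROR** - calling MT unsafe ctime()\n");
    return fn(clock);   /* behaviour is unchanged, only reported */
}
```

Built as a shared object and named in `LD_PRELOAD`, as in the session above, a wrapper like this reports each call without any change to the program under test. Some systems may also need the program linked against `libdl` for `dlsym`.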
In this case, the bug is fixed by changing two lines in

    char *ct;
    => char *ct, ctimebuf[60];

and

    ct = ctime(&t);
    => ct = ctime_r(&t,ctimebuf,sizeof(ctimebuf));
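The `ctime_r` call in the fix above takes three arguments, which is the Solaris default interface rather than the POSIX one. Where portability matters, much the same effect can be had from `localtime_r` plus `strftime`, both of which write only into caller-supplied storage. The sketch below is one such alternative, not the author's fix: the function name `format_time_mt_safe` is hypothetical, and the format string is an assumption chosen to resemble `ctime` output without the trailing newline.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* assumption: needed for localtime_r on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Format t as text in the caller's buffer. Returns buf on success, or NULL
 * if the conversion fails or the buffer is too small. No static storage is
 * touched, so concurrent callers with separate buffers do not interfere. */
char *format_time_mt_safe(time_t t, char *buf, size_t buflen)
{
    struct tm tmbuf;   /* per-call storage instead of a shared static */

    assert(buf != NULL);
    if (localtime_r(&t, &tmbuf) == NULL)
        return NULL;
    if (strftime(buf, buflen, "%a %b %e %H:%M:%S %Y", &tmbuf) == 0)
        return NULL;   /* result did not fit in buflen bytes */
    return buf;
}
```

In `test_thread`, each thread would declare its own buffer, for example `char tbuf[32];`, and call `format_time_mt_safe(t, tbuf, sizeof tbuf)` in place of `ctime(&t)`.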
Not all programs work well with
Please Note: While this tool is recommended for development and QA purposes, it should not be part of any actual deployment and will not be supported.
The following is a list of nonreentrant standard library calls for the Solaris 8 OE that you should never make in a multithreaded program:
Conclusion
Oft-overlooked, nonreentrant calls in MT code tend to bite developers late in the development cycle. The author has provided some tips and tools to catch many such problems before deployment. Now, get out there and eliminate these nasty lurking bugs!
About the Author
Bruce Chapman is a staff engineer who has been with Sun Microsystems for seven years.