The use of dynamic libraries and various features of the runtime linker provides a number of ways to improve the performance and the usability of applications running under the Solaris operating environment:
- Dynamic libraries that can be upgraded without rebuilding the application
- Transparent use of platform-specific versions of the libraries
- Features that facilitate debugging and profiling of an application
This paper describes some of the features of the runtime linker that can be used in the development of computationally intensive applications for Solaris. For complete documentation of the linker features, see the Linker and Libraries Guide at http://docs.sun.com.
Using Platform-Specific Libraries
Taking advantage of hardware properties specific to certain platforms can significantly improve the performance of an application. Numerically intensive kernels can exploit the parameters of a particular SPARC architecture, such as memory latency or the number of registers available for computation. They can also use CPU instructions or software techniques, such as data prefetching, that benefit some architectures but are unavailable on others or degrade their performance. In most cases it is not feasible for an independent software vendor (ISV) to develop and distribute separate versions of an application tuned for each platform a hardware vendor provides. However, features of the Solaris runtime linker allow a single version of the application to use different optimized libraries on different architectures, transparently to the user.
One way to build platform-specific libraries is to use the $PLATFORM linker token (introduced in Solaris 2.6). The token can be specified as part of the runtime library path with the -R linker option, and at runtime it expands to the output of the uname -i command. This technique is used for the Solaris libc_psr.so library:
$ dump -Lv /usr/lib/libc.so | grep PLATFORM
[5] AUXILIARY /usr/platform/$PLATFORM/lib/libc_psr.so.1
As an example, consider two implementations of the same library: one optimized for the Ultra Enterprise architecture and a default version for all other Sun machines:
$ find /opt/ISV -name libtmp.so
/opt/ISV/lib/SUNW,Ultra-Enterprise/libtmp.so
/opt/ISV/lib/other/libtmp.so
Linking these libraries with a $PLATFORM token results in an executable that resolves the correct library, depending on the architecture detected at runtime:
$ setenv LD_OPTIONS '-R /opt/ISV/lib/$PLATFORM:/opt/ISV/lib/other'
$ f77 test.f /opt/ISV/lib/other/libtmp.so -o test.out
test.f:
MAIN test:
On an Ultra Enterprise machine:
Note: The /opt/ISV/lib/other directory, explicitly included in the runtime library path, ensures that the default version of the library, /opt/ISV/lib/other/libtmp.so, is used on all other platforms. It is also used on machines running Solaris releases earlier than 2.6, which do not support the $PLATFORM token.
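The search described above can be modeled in a few lines of C. This is a simplified sketch, not the runtime linker's actual code: expand_token(), build_candidates(), and first_existing() are hypothetical helpers, the platform name is passed in rather than obtained from uname -i, and the existence of each candidate is approximated with access(2).

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_PATH 1024

/* Copy `in` to `out` (capacity `cap`), replacing every occurrence of the
 * non-empty string `token` with `value`. Returns 0, or -1 on overflow. */
int expand_token(const char *in, const char *token, const char *value,
                 char *out, size_t cap)
{
    size_t tlen = strlen(token), vlen = strlen(value), used = 0;

    if (cap == 0 || tlen == 0)
        return -1;
    while (*in != '\0') {
        if (strncmp(in, token, tlen) == 0) {
            if (used + vlen >= cap)
                return -1;
            memcpy(out + used, value, vlen);
            used += vlen;
            in += tlen;
        } else {
            if (used + 1 >= cap)
                return -1;
            out[used++] = *in++;
        }
    }
    out[used] = '\0';
    return 0;
}

/* Split a colon-separated runpath, expand $PLATFORM in each directory and
 * append the library name. Returns the number of candidates, or -1. */
int build_candidates(const char *runpath, const char *platform,
                     const char *libname, char cand[][MAX_PATH], int max)
{
    char dir[MAX_PATH], expanded[MAX_PATH];
    int n = 0;

    while (*runpath != '\0' && n < max) {
        size_t len = strcspn(runpath, ":");
        if (len >= sizeof dir)
            return -1;
        memcpy(dir, runpath, len);
        dir[len] = '\0';
        if (expand_token(dir, "$PLATFORM", platform, expanded,
                         sizeof expanded) != 0)
            return -1;
        if (snprintf(cand[n], MAX_PATH, "%s/%s", expanded, libname) >= MAX_PATH)
            return -1;
        n++;
        runpath += len;
        if (*runpath == ':')
            runpath++;
    }
    return n;
}

/* Index of the first candidate that exists, or -1 if none does. */
int first_existing(char cand[][MAX_PATH], int n)
{
    for (int i = 0; i < n; i++)
        if (access(cand[i], F_OK) == 0)
            return i;
    return -1;
}
```

With the -R path from the example and the platform name SUNW,Ultra-Enterprise, build_candidates() yields /opt/ISV/lib/SUNW,Ultra-Enterprise/libtmp.so followed by /opt/ISV/lib/other/libtmp.so. On any other platform the first candidate does not exist, so first_existing() selects the default library.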
In addition to $PLATFORM, Solaris 8 introduced the tokens $OSNAME and $OSREL, which expand to the output of uname -s and uname -r, respectively. The use of these tokens is described in the "System Specific Shared Objects" section of the Linker and Libraries Guide. The Solaris 8 linker also lets you specify instruction-set-specific shared libraries with the $ISALIST token. At runtime, this token is replaced in turn by each of the native instruction sets of the platform, as reported by the isalist command, until the resulting path points to an available implementation of the library. A detailed description of $ISALIST is given in the "Instruction Set Specific Shared Objects" section of the Linker and Libraries Guide.
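The $ISALIST expansion can be sketched in the same spirit: one template path becomes a list of candidates, one per instruction set, in the order isalist prints them (most preferred first). The function name isalist_candidates() and the ISA names in the usage note are illustrative assumptions; on a real system the list comes from running isalist on the target machine, and the first candidate that exists is used.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH 1024

/* Expand the first "$ISALIST" in `templ` once for each instruction set
 * in `isas`, keeping their order (most preferred first, as isalist
 * prints them). Returns the number of candidates written, or -1. */
int isalist_candidates(const char *templ, const char *const isas[], int nisas,
                       char out[][MAX_PATH], int max)
{
    const char *tok = strstr(templ, "$ISALIST");
    const char *suffix;
    int prefix, n = 0;

    if (tok == NULL) {
        /* No token: the template itself is the only candidate. */
        if (max < 1 || snprintf(out[0], MAX_PATH, "%s", templ) >= MAX_PATH)
            return -1;
        return 1;
    }
    prefix = (int)(tok - templ);          /* text before the token */
    suffix = tok + strlen("$ISALIST");    /* text after the token */
    for (int i = 0; i < nisas && n < max; i++) {
        int w = snprintf(out[n], MAX_PATH, "%.*s%s%s",
                         prefix, templ, isas[i], suffix);
        if (w < 0 || w >= MAX_PATH)
            return -1;
        n++;
    }
    return n;
}
```

For example, with the list {"sparcv9+vis", "sparcv9", "sparcv8plus", "sparc"}, the template /opt/ISV/lib/$ISALIST/libtmp.so produces four candidates, starting with /opt/ISV/lib/sparcv9+vis/libtmp.so and ending with /opt/ISV/lib/sparc/libtmp.so.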
Both instruction-set-specific and system-specific shared objects are often used in combination with filters, a special form of shared object that provides indirection. A filter supplies a symbol table at link time, while the implementations of its symbols are taken at runtime from another shared object, the filtee. A standard filter takes all of its implementations from the filtee; an auxiliary filter uses the filtee's implementation when one is available and otherwise falls back on its own. This technique is described in the "Shared Objects as Filters" section of the Linker and Libraries Guide.
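The difference between the two kinds of filter can be captured in a toy model. This is not how ld.so.1 is implemented: bind_through_filter() and the two kernel functions are hypothetical names, and a NULL filtee_impl stands for a filtee that could not be loaded or that does not define the symbol.

```c
#include <stddef.h>

typedef double (*kernel_fn)(double);

enum filter_kind { STANDARD_FILTER, AUXILIARY_FILTER };

/* Toy model of binding a reference to a symbol defined in a filter.
 * filtee_impl is NULL when the filtee could not be loaded or does not
 * define the symbol. */
kernel_fn bind_through_filter(enum filter_kind kind, kernel_fn own_impl,
                              kernel_fn filtee_impl)
{
    if (filtee_impl != NULL)
        return filtee_impl;   /* the filtee's implementation wins */
    if (kind == AUXILIARY_FILTER)
        return own_impl;      /* auxiliary filter falls back on itself */
    return NULL;              /* standard filter: reference stays unbound */
}

/* Hypothetical generic and platform-tuned versions of one kernel. */
double generic_twice(double x) { return 2.0 * x; }
double tuned_twice(double x) { return x + x; }
```

In this model, a platform-tuned library plays the role of the filtee, so machines without it silently keep the generic code only when the filter is auxiliary, which is how libc.so uses libc_psr.so in the dump output above.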
Using platform-specific or instruction-set-specific versions of numerically intensive kernels can have a dramatic impact on the overall performance of an application. For example, the following table compares the Megaflop rates for square matrix-matrix multiplication using the DGEMM routine from the V7 and V8PLUS versions of the Sun Performance Library. The runs were performed on a Sun Ultra 2 with 300 MHz CPUs and a theoretical peak performance of 600 Megaflops.
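As a reminder of how such rates are usually computed: multiplying an m-by-k matrix by a k-by-n matrix takes 2mnk floating-point operations by the conventional count (one multiply and one add per inner-product term), so a square multiplication of order n takes 2n^3. The 600 Megaflop peak quoted above corresponds to two floating-point operations per cycle at 300 MHz. The helper names below are illustrative.

```c
/* Conventional operation count for DGEMM (C = alpha*A*B + beta*C with A
 * m-by-k and B k-by-n): one multiply and one add per inner-product term,
 * lower-order m*n terms ignored. */
double dgemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Achieved rate in Megaflops for `flops` operations in `seconds`. */
double mflops(double flops, double seconds)
{
    return flops / seconds / 1.0e6;
}

/* Theoretical peak in Megaflops: clock rate in MHz times operations
 * completed per cycle. */
double peak_mflops(double mhz, double flops_per_cycle)
{
    return mhz * flops_per_cycle;
}
```

For instance, a 500 by 500 square multiplication is 2.5e8 operations; finishing it in 0.5 seconds is a rate of 500 Megaflops, compared with peak_mflops(300.0, 2.0), which is 600.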
$ORIGIN Token
One potential problem with dynamically linked executables is that a runtime library path set with -R constrains where the application can live: an absolute path fixes the location in which the application must be installed, while a relative path is interpreted relative to the current working directory, so the application must be run from a fixed directory relative to its binaries. One workaround is to start the application from a shell wrapper that appends a path relative to the application's home directory to the LD_LIBRARY_PATH environment variable. A more elegant solution, which lets the application locate its own dependencies, is the runtime linker token $ORIGIN (introduced in Solaris 7), which represents the directory in which an object originated.
$ setenv LD_OPTIONS '-R $ORIGIN/lib'
$ f77 test.f lib/libtmp.so -o test.out
test.f:
MAIN test:
$ ldd test.out | grep libtmp.so
libtmp.so => /opt/ISV/test/lib/libtmp.so
The runtime linker resolves the library path relative to the directory containing the object that uses the library. This makes the application and its shared objects relocatable as long as the directory structure is preserved:
$ cp -r * ../test2
$ ldd ../test2/test.out | grep libtmp.so
libtmp.so => /opt/ISV/test2/lib/libtmp.so
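The expansion itself amounts to taking the directory part of the object's path. The sketch below is a simplified model, assuming that object_path is the full path under which the object was loaded and that $ORIGIN appears only at the start of a runpath entry; expand_origin() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATH 1024

/* Expand a leading "$ORIGIN" in one runpath entry to the directory that
 * contains object_path. Entries without the token are copied unchanged.
 * Returns 0, or -1 if out (capacity cap) is too small. */
int expand_origin(const char *object_path, const char *entry,
                  char *out, size_t cap)
{
    static const char token[] = "$ORIGIN";
    const size_t tlen = sizeof token - 1;
    const char *slash;
    int w;

    if (strncmp(entry, token, tlen) != 0) {
        w = snprintf(out, cap, "%s", entry);
    } else if ((slash = strrchr(object_path, '/')) == NULL) {
        /* An object named without a directory originates in ".". */
        w = snprintf(out, cap, ".%s", entry + tlen);
    } else {
        /* Keep everything before the last '/', then the rest of entry. */
        w = snprintf(out, cap, "%.*s%s", (int)(slash - object_path),
                     object_path, entry + tlen);
    }
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}
```

Applied to the entry $ORIGIN/lib, it yields /opt/ISV/test/lib for /opt/ISV/test/test.out and /opt/ISV/test2/lib for the copy in ../test2, matching the ldd output above.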
Combining $ORIGIN with the tokens described in the previous section lets you create relocatable libraries optimized for different hardware platforms:
$ setenv LD_OPTIONS '-R $ORIGIN/lib/$PLATFORM'
Libraries built with this approach let an application exploit platform-specific optimizations transparently to the user. A detailed description of the $ORIGIN token is provided in the "Locating Associated Dependencies" section of the Linker and Libraries Guide.
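When several tokens appear in one runpath entry, as in $ORIGIN/lib/$PLATFORM, each is replaced by its value. The sketch below expands tokens from a caller-supplied table; expand_tokens() and the table contents in the usage note are assumptions for illustration, and $ISALIST is deliberately left out because it expands to several candidate paths rather than one string.

```c
#include <string.h>

/* One token and the value it expands to. */
struct token {
    const char *name;
    const char *value;
};

/* Copy `in` to `out`, replacing every token from `table`. Token names must
 * be non-empty and none may be a prefix of another. Returns 0, or -1 on
 * overflow. */
int expand_tokens(const char *in, const struct token *table, int ntokens,
                  char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    while (*in != '\0') {
        int matched = 0;
        for (int i = 0; i < ntokens && !matched; i++) {
            size_t tlen = strlen(table[i].name);
            if (tlen > 0 && strncmp(in, table[i].name, tlen) == 0) {
                size_t vlen = strlen(table[i].value);
                if (used + vlen >= cap)
                    return -1;
                memcpy(out + used, table[i].value, vlen);
                used += vlen;
                in += tlen;
                matched = 1;
            }
        }
        if (!matched) {              /* ordinary character: copy it */
            if (used + 1 >= cap)
                return -1;
            out[used++] = *in++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With $ORIGIN set to /opt/ISV/test and $PLATFORM to SUNW,Ultra-Enterprise, the entry $ORIGIN/lib/$PLATFORM expands to /opt/ISV/test/lib/SUNW,Ultra-Enterprise.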
Interposing Libraries for Profiling and Debugging
Specially crafted dynamic libraries can also help with debugging or profiling an application. The dynamic libraries of an application can be updated or replaced (for example, with libraries compiled with -g, -pg, or -xprofile=tcov) as long as the symbols used by the original versions are still defined. Moreover, the LD_PRELOAD variable lets you interpose on any call resolved through shared objects (including system calls made through libc) without replacing the dynamic libraries. The interposed symbols can be redefined to collect calling and timing data without recompiling or relinking the application. This is particularly useful when not all of the code needed for relinking is available.
The following example uses dgemmtest.c to interpose on the DGEMM routine of the Sun Performance Library (libsunperf.so) and collect its calling data. The interposing function is compiled into a shared library, which is preloaded with the LD_PRELOAD variable.
$ cat dgemmtest.c
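#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <dlfcn.h>
#include <sys/time.h>

/* State shared by the wrapper and the fini routine. */
static int dgemm_counter = 0;
static hrtime_t t1, t2, timetotal = 0;
static FILE *fdgemm;
static char filename[32];

/* Address of the real dgemm_, i.e. the next definition after this object. */
static void (*dgemm_handle)(char *, char *, int *, int *, int *, double *,
                            double *, int *, double *, int *,
                            double *, double *, int *) = NULL;

void initptrs(void);
void dumpstats(void);

#pragma init (initptrs)
void initptrs(void)
{
    /* RTLD_NEXT skips this wrapper and finds dgemm_ in libsunperf.so. */
    dgemm_handle = (void (*)(char *, char *, int *, int *, int *, double *,
                             double *, int *, double *, int *, double *,
                             double *, int *))dlsym(RTLD_NEXT, "dgemm_");
    /* One log file per process, named after its process ID. */
    (void) snprintf(filename, sizeof (filename), "dgemm.calls_%d",
                    (int)getpid());
    if ((fdgemm = fopen(filename, "w")) != NULL) {
        fprintf(fdgemm, "\ndgemm calls: \n\n");
        fclose(fdgemm);
    }
}

void dgemm_(char *transa, char *transb, int *m, int *n, int *k,
            double *dalpha, double *da, int *lda, double *db, int *ldb,
            double *dbeta, double *dc, int *ldc)
{
    dgemm_counter++;
    /* Log the arguments. Fortran CHARACTER arguments are not
       NUL-terminated, so print only their first character. */
    if ((fdgemm = fopen(filename, "a")) != NULL) {
        fprintf(fdgemm,
                "dgemm(%c, %c, %d, %d, %d, %f, A, %d, B, %d, %f, C, %d)\n",
                *transa, *transb, *m, *n, *k, *dalpha, *lda, *ldb,
                *dbeta, *ldc);
        fclose(fdgemm);
    }
    if (dgemm_handle == NULL) {
        fprintf(stderr, "dgemmtest.so: cannot find the real dgemm_\n");
        abort();
    }
    /* Time only the real computation, not the logging. */
    t1 = gethrtime();
    dgemm_handle(transa, transb, m, n, k, dalpha, da, lda, db, ldb,
                 dbeta, dc, ldc);
    t2 = gethrtime();
    timetotal += t2 - t1;
}

#pragma fini (dumpstats)
void dumpstats(void)
{
    if ((fdgemm = fopen(filename, "a")) == NULL)
        return;
    /* gethrtime() counts nanoseconds. */
    fprintf(fdgemm, "\n%d dgemm calls are made: total %f sec.\n\n",
            dgemm_counter, (double)timetotal / 1.0e9);
    fclose(fdgemm);
}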
$ cc -Kpic -c dgemmtest.c; cc -G -o dgemmtest.so dgemmtest.o -ldl
$ setenv LD_PRELOAD ./dgemmtest.so
For each process, this shared object produces a file named dgemm.calls_<pid> that records every DGEMM call with its arguments, together with the total time spent in the calls. The init and fini routines run at the start and at the end of every process into which the object is preloaded. Because LD_PRELOAD affects every command started from the shell while it is set, it should be unset after the run to avoid generating extra files. Note also the dlsym(RTLD_NEXT, "dgemm_") call: it returns the address of the next definition of dgemm_ after the interposing library, which lets the wrapper make the actual computational DGEMM call. A test program, testprod, that calls DGEMM twice produces the following output:
$ ./testprod
$ unsetenv LD_PRELOAD
$ ls | grep dgemm.calls
dgemm.calls_3416
$ cat dgemm.calls_3416
dgemm calls:
dgemm(T, N, 500, 500, 2000, 1.000000, A, 2000, B, 2000, 1.000000, C, 500)
dgemm(T, N, 500, 500, 2000, 1.000000, A, 2000, B, 2000, 1.000000, C, 500)
2 dgemm calls are made: total 5.941649 sec.
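The records in dgemm.calls_<pid> are regular enough to post-process. The sketch below, whose names are hypothetical, parses each dgemm(...) line in the format written by the wrapper and sums the conventional operation count of 2mnk per call; lines that do not match, such as the header and the summary, are skipped.

```c
#include <stdio.h>

/* One record as written by the dgemm_ wrapper in dgemmtest.so. */
struct dgemm_call {
    char transa, transb;
    int m, n, k, lda, ldb, ldc;
    double alpha, beta;
};

/* Parse one "dgemm(...)" line; returns 1 on success, 0 otherwise. */
int parse_dgemm_line(const char *line, struct dgemm_call *c)
{
    return sscanf(line,
                  "dgemm(%c, %c, %d, %d, %d, %lf, A, %d, B, %d, %lf, C, %d)",
                  &c->transa, &c->transb, &c->m, &c->n, &c->k, &c->alpha,
                  &c->lda, &c->ldb, &c->beta, &c->ldc) == 10;
}

/* Conventional DGEMM operation count: 2*m*n*k. */
double dgemm_call_flops(const struct dgemm_call *c)
{
    return 2.0 * c->m * c->n * c->k;
}

/* Sum the operation counts of all records in an open log file. */
double total_flops(FILE *fp)
{
    char line[256];
    struct dgemm_call c;
    double total = 0.0;

    while (fgets(line, sizeof line, fp) != NULL)
        if (parse_dgemm_line(line, &c))
            total += dgemm_call_flops(&c);
    return total;
}
```

For the file above, each call is 2 × 500 × 500 × 2000 = 1.0e9 operations, so the two calls total 2.0e9 operations; divided by the reported 5.941649 seconds, that is roughly 336.6 Megaflops.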
Preloading libraries for profiling or debugging has its limitations, for example when the application calls fork or exec: a child created by fork inherits the state of the interposing library, including the log file name chosen at startup, and every program started with exec while LD_PRELOAD is set loads the interposing library again. When more flexibility is needed, the LD_AUDIT facility of the runtime linker can be used as an alternative.
Resources
Complete documentation for the features of the Solaris linker can be found in the Linker and Libraries Guide.