By Tim Jacobson, March 2006
Abstract
Programs sometimes use hardware-specific instructions. Executing these instructions on hardware that does not support them can cause a program to fail or produce incorrect results. To protect against this, Sun Studio compilers set bits in the object file of the program. These bits are called HardWare CAPabilities (HWCAP). The Sun Studio linker checks these bits against the hardware of the machine. If the hardware does not match the HWCAP bits, the linker halts and the program cannot be executed. There are times when a programmer needs to override this protection. This paper describes how to check the HWCAP bits of an object file and offers several methods to modify those bits.
Overview
HWCAP is a set of bits attached to the object file of a program. These
bits describe the hardware capabilities of the program. This is a new
feature introduced in Sun Studio 10 software and is supported in the Sun Studio 11
release and beyond. All examples in this paper use Fortran code compiled for AMD or Intel 64-bit architecture. The HWCAP feature can also be used with C/C++ code, and it supports all 32/64-bit SPARC as well as AMD or Intel processors found in Sun hardware.
Hardware-specific instructions contained in the program
correspond to bits set in HWCAP. An example would be a program that contains a
prefetch instruction. Several different prefetch instructions exist, and
not every processor supports them. When the compiler finds a
hardware-specific instruction, such as prefetch, bits are set in the
HWCAP stored as part of the object file of the program. Later, in the
linker phase, a check of the HWCAP bits will be made against the
processor of the machine. If the processor does not support the prefetch instruction indicated by the bits in HWCAP, the linker halts.
This protection is to prevent the execution of unsupported instructions.
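The check described above can be modeled as a subset test on bit masks. The following is a minimal sketch in Python, not the linker's actual implementation. The bit values are those defined in /usr/include/sys/auxv_386.h, while the two processor masks and the function name hwcap_satisfied are assumptions made for illustration.

```python
# Bit values as defined in /usr/include/sys/auxv_386.h.
AV_386_AMD_3DNow = 0x00100
AV_386_SSE = 0x00800
AV_386_SSE2 = 0x01000


def hwcap_satisfied(required: int, available: int) -> bool:
    """Return True when every required bit is also set in `available`."""
    return (required & ~available) == 0


# Hypothetical processor masks, chosen only for illustration.
amd_like = AV_386_SSE | AV_386_SSE2 | AV_386_AMD_3DNow
intel_like = AV_386_SSE | AV_386_SSE2

# An object file whose code uses SSE2 and 3DNow instructions.
required = AV_386_SSE2 | AV_386_AMD_3DNow

print(hwcap_satisfied(required, amd_like))    # True
print(hwcap_satisfied(required, intel_like))  # False: the 3DNow bit is missing
```

A single missing bit is enough for the check to fail, even if the instruction that set the bit would never be reached at runtime.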
However, in some cases a programmer wants to override the protection of HWCAP. An example is a custom library intended for use on several different types of processors. If the library contains instructions specific to these different processors, the programmer has two options.
The first option is to create multiple copies of each program in the library specific to each processor. The library must be stored as a shared object (.so file). When linking the shared object, the right version of a program can be called. This might be the right thing to do if the majority of the code is different. The downside of this is that multiple copies of each program are cumbersome to maintain and take up more space in memory.
The other option is to embed processor-specific instructions separated
by conditional expressions. At runtime, the appropriate code will be executed. This requires the program to do its own check for the hardware capabilities of the processor in order to use the appropriate code instructions. This would be more advantageous if the majority of the code does not change, or space is a factor. It is this second option that requires modifying the HWCAP bits. Even though a program does its own hardware checking, the linker may still halt. To get around this, the linker must be fooled into thinking all the code is safe. It should be pointed out that the programmer is taking on the responsibility of properly checking the hardware capabilities and executing the appropriate code instructions. Failure to do so leads to unpredictable behavior. The program may core dump, halt unexpectedly, or produce incorrect results.
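The second option can be sketched as a small dispatch table. This is only an illustration of the control flow, written in Python for brevity: the three path functions are hypothetical stand-ins for routines that would contain the processor-specific instructions, and the capability mask is passed in as a plain integer. On Solaris, getisax(3C) is one way a program might obtain such a mask, but that call is not used here.

```python
# Bit values as defined in /usr/include/sys/auxv_386.h.
AV_386_AMD_3DNow = 0x00100
AV_386_SSE2 = 0x01000


def sum_3dnow_path(values):
    # Stand-in for code that would contain 3DNow and SSE2 instructions.
    return sum(values)


def sum_sse2_path(values):
    # Stand-in for code that would contain SSE2 instructions only.
    return sum(values)


def sum_generic_path(values):
    # Stand-in for code with no hardware-specific instructions.
    return sum(values)


# Most specific path first; the last entry requires nothing and always matches.
CANDIDATES = [
    (AV_386_SSE2 | AV_386_AMD_3DNow, sum_3dnow_path),
    (AV_386_SSE2, sum_sse2_path),
    (0, sum_generic_path),
]


def select_path(available: int):
    """Return the first path whose required bits are all in `available`."""
    for required, func in CANDIDATES:
        if (required & ~available) == 0:
            return func
    raise RuntimeError("no usable code path")  # unreachable with a 0 entry


# Hypothetical mask for a processor with SSE2 but without 3DNow.
path = select_path(AV_386_SSE2)
print(path.__name__, path([1, 2, 3]))  # sum_sse2_path 6
```

The important property is that a path is selected only when every capability it needs is present, which is the responsibility the programmer takes on when the HWCAP bits are overridden.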
HWCAP bits can be represented in two ways, either by name or a hexadecimal value. The bits can be viewed using either the command-line utility file (see Example 1) or elfdump (see Example 2).
Example 1: Using the file Command
> file dtest.o
dtest.o: ELF 64-bit LSB relocatable AMD64 Version 1 [SSE2 AMD_3DNow]
Examples 1 and 2 both show that the object file dtest.o contains the HWCAP names SSE2 and AMD_3DNow. SSE2 is a capability found on most newer AMD or Intel x86 chips. 3DNow is a capability found predominantly on AMD x86 chips.
Example 2: Using the elfdump Command
> elfdump -H dtest.o
Hardware/Software Capabilities Section: .SUNW_cap
index tag value
[0] CA_SUNW_HW_1 0x1100 [ SSE2 AMD_3DNow ]
Example 2, using the elfdump command, provides further information by giving the hexadecimal value (0x1100). This value is the combination of 3DNow (0x00100) OR'ed with SSE2 (0x01000). Although these examples focus on x86 processors, the same can be done with SPARC processors. A complete list of the names and values for x86 platforms is found in the file /usr/include/sys/auxv_386.h (see List 1). For SPARC platforms, see the file sys/auxv_SPARC.h.
List 1: HWCAP Bits for x86 Platforms
#define AV_386_FPU 0x00001 /* x87-style floating point */
#define AV_386_TSC 0x00002 /* rdtsc insn */
#define AV_386_CX8 0x00004 /* cmpxchg8b insn */
#define AV_386_SEP 0x00008 /* sysenter and sysexit */
#define AV_386_AMD_SYSC 0x00010 /* AMD's syscall and sysret */
#define AV_386_CMOV 0x00020 /* conditional move insns */
#define AV_386_MMX 0x00040 /* MMX insns */
#define AV_386_AMD_MMX 0x00080 /* AMD's MMX insns */
#define AV_386_AMD_3DNow 0x00100 /* AMD's 3Dnow! insns */
#define AV_386_AMD_3DNowx 0x00200 /* AMD's 3Dnow! extended insns */
#define AV_386_FXSR 0x00400 /* fxsave and fxrstor */
#define AV_386_SSE 0x00800 /* SSE insns and regs */
#define AV_386_SSE2 0x01000 /* SSE2 insns and regs */
#define AV_386_PAUSE 0x02000 /* use pause insn (in spin loops) */
#define AV_386_SSE3 0x04000 /* SSE3 insns and regs */
#define AV_386_MON 0x08000 /* monitor/mwait insns */
#define AV_386_CX16 0x10000 /* cmpxchg16b insn */
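To make the relationship between the hexadecimal value and the names concrete, here is a short Python sketch that decodes a HWCAP value using the bit assignments from List 1. The table and the function name decode_hwcap are illustrative only. The names are listed from the lowest bit upward, which differs from the order elfdump prints them in.

```python
# Token names (the List 1 names without the AV_386_ prefix) keyed by bit value.
HWCAP_X86 = {
    0x00001: "FPU",
    0x00002: "TSC",
    0x00004: "CX8",
    0x00008: "SEP",
    0x00010: "AMD_SYSC",
    0x00020: "CMOV",
    0x00040: "MMX",
    0x00080: "AMD_MMX",
    0x00100: "AMD_3DNow",
    0x00200: "AMD_3DNowx",
    0x00400: "FXSR",
    0x00800: "SSE",
    0x01000: "SSE2",
    0x02000: "PAUSE",
    0x04000: "SSE3",
    0x08000: "MON",
    0x10000: "CX16",
}


def decode_hwcap(value: int) -> list:
    """Return the token names for every bit set in `value`, lowest bit first."""
    return [name for bit, name in sorted(HWCAP_X86.items()) if value & bit]


combined = 0x00100 | 0x01000  # AMD_3DNow OR'ed with SSE2
print(hex(combined), decode_hwcap(combined))  # 0x1100 ['AMD_3DNow', 'SSE2']
```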
The object file dtest.o must be executed on a machine that has both 3DNow and SSE2 capabilities, and the linker checks that this is the case. There are two ways to keep the linker from rejecting the object on the basis of its HWCAP values. The first is to remove all the bits, which tells the linker that the object requires no specific capabilities. The second is to remove only specific bits while preserving the others. Remember that modifying HWCAP bits does not remove the offending instructions from the program. It only removes the bits that describe them, so the linker has nothing to compare against the hardware of the machine. Most of the time, programs should keep all the HWCAP bits to protect against executing unsupported instructions. Override HWCAP with care, and only when necessary.
This example uses the routine dtest.f, which contains both 3DNow and SSE2 instructions. Assume that dtest.f checks which kind of processor it is running on and jumps to the appropriate code for AMD or Intel platforms. The AMD code contains the 3DNow instructions. However, if this program is compiled on certain Intel processor-based machines, the linker will halt because 3DNow instructions are not supported. The linker does not know which parts of the code are for AMD and which are for Intel. It just compares the HWCAP bits against the processor, and in this case it halts. Since we know that the code does its own processor checking, we can trick the linker by overriding HWCAP. Again, this should only be done with knowledge of how the processor will behave with the code. It would be a bad idea to allow the unsupported Intel processor to execute the 3DNow instructions.
Method 1
To remove all the HWCAP bits, use the -nH option of the fbe (Fortran Back End) assembler. Before this can be done, the routine must be converted to assembly (see Example 3). The -S option in the first command creates an assembly version of the routine called dtest.s. The second command assembles dtest.s with fbe, and the -nH option removes the HWCAP completely. This is verified by the empty output from elfdump and by the absence of capability names in the output of the file command. The -nH option has two drawbacks. The first is that every program must be converted to assembly and then assembled with fbe, and a two-step process is undesirable. This may change with time, so that -nH can be passed from the Fortran compiler directly to the assembler. The second is that -nH removes everything from HWCAP.
Example 3: Using -nH With the fbe Assembler
> f90 -xarch=generic64 -S dtest.f
> fbe -xarch=generic64 -nH dtest.s
> elfdump -H dtest.o
> file dtest.o
dtest.o: ELF 64-bit LSB relocatable AMD64 Version 1
Another way to remove all the HWCAP values is to use a mapfile. A mapfile that removes all HWCAP bits is seen in Example 4. Use of this mapfile is shown in Example 5.
Example 4: Mapfile to Remove All HWCAP Bits
> cat map.remove_all
hwcap_1 = OVERRIDE;
Mapfiles are explained in the description of Method 2. It would be more elegant to remove only the 3DNow bits and leave the SSE2 bits. Mapfiles have the versatility to do just this.
Example 5: Linker Using a Mapfile
> f90 -xarch=generic64 -o temp.o -c dtest.f
> ld -r -o dtest.o -Mmap.remove_all temp.o
Method 2
To remove only the 3DNow bits, a mapfile can be used. A mapfile contains directives that the linker reads. Mapfiles are commonly named in the form map.my_name to help distinguish them from other files. Examples of mapfiles can be found in the directory /usr/lib/ld. The mapfile syntax for HWCAP is described in Example 6. In reference material about the linker, a HWCAP name is called a TOKEN. Common tokens are the names AMD_3DNow, SSE, and SSE2. These names are the same as those found in List 1 without the AV_386_ prefix. Alternatively, the value (Vval) can be used: V0x00100, V0x00800, and V0x01000 (see List 1). By default, the tokens in the mapfile are combined with the existing HWCAP bits using a logical OR, which allows bits to be added. To remove bits, the word OVERRIDE must be used. With OVERRIDE, the tokens or values in the mapfile are OR'ed with zero instead of with the existing bits, so they replace the HWCAP value of the object.
Example 6: Mapfile Syntax for HWCAP
hwcap_1 = TOKEN | Vval [OVERRIDE];
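The effect of a hwcap_1 directive on an object's bits can be modeled with a few lines of Python. This is a sketch of the behavior described above, not of the linker's code. The function name apply_mapfile is an assumption, and the starting value 0x1100 is the one reported for dtest.o in Example 2.

```python
def apply_mapfile(object_bits: int, mapfile_bits: int, override: bool = False) -> int:
    """Model a hwcap_1 directive: OR the bits in, or replace them with OVERRIDE."""
    if override:
        # OVERRIDE: the mapfile bits are OR'ed with zero, discarding object_bits.
        return 0 | mapfile_bits
    # Default: the mapfile bits are added to whatever the object already has.
    return object_bits | mapfile_bits


dtest = 0x1100  # SSE2 | AMD_3DNow

# hwcap_1 = SSE;                      adds the SSE bit
print(hex(apply_mapfile(dtest, 0x00800)))                           # 0x1900

# hwcap_1 = SSE SSE2 OVERRIDE;        keeps only SSE and SSE2
print(hex(apply_mapfile(dtest, 0x00800 | 0x01000, override=True)))  # 0x1800

# OVERRIDE with no tokens             clears every bit
print(hex(apply_mapfile(dtest, 0, override=True)))                  # 0x0
```

Without OVERRIDE a mapfile can only add bits, because OR'ing never clears a bit that is already set.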
Mapfiles are more powerful in that they allow both the removal and adding of bits to the HWCAP. The token SSE can be added by creating a mapfile shown in Example 7. Commented out in map.sse is an alternative form with the Vval usage. Either the token name or the hexadecimal Vval can be used.
Example 7: Mapfile for Adding SSE Bits to HWCAP
> cat map.sse
hwcap_1 = SSE;
#hwcap_1 = V0x00800; // alternative form
To use a mapfile, pass it to the linker with the -M option. Example 8 shows how map.sse is used to add SSE capabilities to the HWCAP values. Notice that this is a two-phase process as well. A temporary object temp.o is first created and then passed to the linker to create dtest.o. However, if a shared object library (.so file) is being created, the linker phase with the mapfile is only needed once, after all the routines have been placed into the shared object file.
Example 8: Adding SSE Bits to HWCAP
> f90 -xarch=generic64 -o temp.o -c dtest.f
> ld -r -o dtest.o -Mmap.sse temp.o
> file dtest.o
dtest.o: ELF 64-bit LSB relocatable AMD64 Version 1 [SSE2 SSE AMD_3DNow]
> elfdump -H dtest.o
Hardware/Software Capabilities Section: .SUNW_cap
index tag value
[0] CA_SUNW_HW_1 0x1900 [ SSE2 SSE AMD_3DNow ]
Example 9 shows a mapfile to remove the AMD_3DNow value and leave only SSE and SSE2 bits set. Notice that the hexadecimal Vval is used, and the token form is commented out. Also, the OVERRIDE option is used to remove all other bits except SSE and SSE2. In this way, 3DNow is removed.
Example 9: Mapfile to Set Only SSE and SSE2 Bits
> cat map.sse_sse2
#hwcap_1 = SSE SSE2 OVERRIDE;
hwcap_1 = V0x00800 V0x01000 OVERRIDE;
Example 10 removes the AMD_3DNow bits from HWCAP at the linker stage. Also, notice that two values, SSE and SSE2, were used in one mapfile. Any number of HWCAP values or tokens can be combined in this way, making Method 2 more flexible.
Example 10: Removing 3DNow Bits From HWCAP
> f90 -xarch=generic64 -o temp.o -c dtest.f
> ld -r -o dtest.o -Mmap.sse_sse2 temp.o
> file dtest.o
dtest.o: ELF 64-bit LSB relocatable AMD64 Version 1 [SSE2 SSE]
> elfdump -H dtest.o
Hardware/Software Capabilities Section: .SUNW_cap
index tag value
[0] CA_SUNW_HW_1 0x1800 [ SSE2 SSE ]
In conclusion, several methods are available to modify HWCAP bits. The quick and dirty options remove all bits. Method 2 is more precise. By using mapfiles, one can add and remove multiple bits from HWCAP at the same time. Although these examples all show Fortran programs, the same procedure can be used for C and C++ programs with Sun Studio compilers. Much more can be done with mapfiles, but that is beyond the scope of this article. Modifying HWCAP bits can be dangerous if the code is then executed on machines that do not support the instructions it contains.
Resources
For further information about the linker see http://docs.sun.com. Here you will find links to the Solaris 10 Software Developers Collection and the Linker and Libraries Guide. Of particular help are Chapters 2, 9, and Appendix C.
Another resource is Alfred Huang's blog on this topic.
Sun Developer Network (SDN) is also a great resource, offering articles on the latest Sun compilers as well as forums where Sun engineers can answer your questions directly. We are more than happy to help you. Visit the Sun Studio Compilers and Tools section of SDN.
About the Author
Tim Jacobson is a member of the High Performance Library Group at Sun
Microsystems. He develops performance mathematical software and
specializes in AMD64 assembly programming.