Abstract: This article introduces the concept of linkage and shows how a simple C++ program fails without language linkage, but can succeed with proper linkage.
It is common practice to call functions of a C library from a C++ program. This works well as long as developers restrict themselves to the standard headers and libraries supplied with the operating system. But novice programmers may run into link-time errors as soon as they try to call functions of their own C library from a C++ program. A likely reason for the failure is unfamiliarity with linkage specifications and with how C and C++ compilers name symbols during compilation.
This article briefly introduces the concept of linkage and shows how a simple C++ program fails without language linkage, and succeeds with proper linkage. Mixing code written in C++ with code written in C is relatively straightforward, as C++ is mostly a superset of C. Mixing C++ with languages other than C is also allowed but is more complicated, so this article restricts the discussion to C and C++.
The C++ standard provides a mechanism called linkage specification for combining, in a single program, code that was written in different programming languages and compiled by the respective compilers. A linkage specification defines the protocol for linking functions or procedures written in different languages. Linkage itself is the term the C++ standard uses to describe whether a name can refer to the same object or function from another file, or from another scope within the same file. Three types of linkage exist:
- No linkage
- Internal linkage
- External linkage
Names declared inside a function, such as its parameters and local variables, have no linkage (unless a local declaration is explicitly marked extern) and can be accessed only within the scope in which they are declared.
Sometimes it is necessary to declare functions and other objects within a single file in a way that allows them to reference each other but keeps them inaccessible from outside that file. This is done through internal linkage: a name with internal linkage refers to the same object only within a single source file (translation unit). Prefixing a file-scope declaration with the keyword static changes its linkage from external to internal.
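A minimal single-file sketch of the three kinds of linkage, written in C++. The names call_count, scale, record, and calls are hypothetical and chosen only for illustration; the unnamed namespace shown alongside static is the usual C++ way to give a name internal linkage.

```cpp
// Internal linkage, C++ style: names in an unnamed namespace are
// visible only inside this translation unit.
namespace {
int call_count = 0;
}

// Internal linkage, C style: static keeps scale() private to this
// file, so another source file may define its own scale() without
// a clash at link time.
static int scale(int x) {
    return x * 10;
}

// External linkage: the default for a function at namespace scope.
// Other source files can call record() after declaring it.
int record(int x) {
    int scaled = scale(x);  // 'x' and 'scaled' have no linkage
    ++call_count;
    return scaled;
}

// Also external linkage; reports the internal counter's value.
int calls() {
    return call_count;
}
```

Another file could declare `int record(int);` and call it, but it has no way to name scale() or call_count.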
Objects that have external linkage are all considered to be located at the outermost level of the program. This is the default linkage for functions and for variables declared outside of any function (unless they are declared static or, in C++, const). All instances of a particular name with external linkage refer to the same object in the program. If two or more declarations of the same name have external linkage but incompatible types (for example, a declaration that does not match the definition), the behavior is undefined: the program may crash or behave abnormally. The rest of the article discusses one of the issues with mixed code and provides a recommended solution based on external linkage.
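The rule that all instances of a name with external linkage refer to the same object can be observed even within one file. In this C++ sketch (total, add_to_total, and current_total are hypothetical names), neither extern declaration creates a new variable; both name the single total defined at file scope. In a real program the extern declaration would usually live in a header included by other source files.

```cpp
// Definition of an object with external linkage: namespace scope,
// neither static nor const.
int total = 0;

// A redeclaration of the same name. It does not create a new object;
// it refers to the 'total' defined above.
extern int total;

void add_to_total(int n) {
    // Block-scope extern declaration: still the one file-scope
    // 'total', because that name has external linkage.
    extern int total;
    total += n;
}

int current_total() {
    return total;
}
```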
In the real world, it is very common to use code written in one programming language from code written in another. A trivial example is a C++ program that calls qsort() from the standard C library (libc) to sort an array of integers. This works because the implementation's headers already declare the C library functions with the appropriate language linkage. But we need to take additional care when we call our own C libraries from a C++ program; otherwise the build may fail at link time with unresolved symbols. Consider the following example:
Assume that we are writing C++ code and wish to call a C function from it. Here is the code for the callee, a C routine:
%cat greet.h
extern char *greet();
%cat greet.c
#include "greet.h"
char *greet() {
    return ((char *) "Hello!");
}
%cc -G -o libgreet.so greet.c
Note:
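The extern keyword declares a variable or function and specifies that it has external linkage, that is, its name is visible from files other than the one in which it is defined. Because functions have external linkage by default, extern is optional on a function declaration such as the one in greet.h.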
Let's try to call the C function greet() from a C++ program.
%cat mixedcode.cpp
#include <iostream>
#include "greet.h"
int main() {
    char *greeting = greet();
    std::cout << greeting << "\n";
    return 0;
}
%CC -lgreet mixedcode.cpp
Undefined                       first referenced
 symbol                             in file
char*greet()                        mixedcode.o
ld: fatal: Symbol referencing errors. No output written to a.out
Although the C++ code was linked with libgreet.so, the dynamic library that holds the implementation of greet(), linking failed with an undefined symbol error. What went wrong?
The link error occurs because a typical C++ compiler mangles (encodes) function names to support function overloading: the name greet is transformed into a different symbol, according to the encoding scheme implemented by the compiler. As a result, mixedcode.o refers to the mangled symbol rather than to greet, and the linker cannot find a definition for it. The symbol tables confirm this. Let's have a look at the symbol tables of both libgreet.so and mixedcode.o:
%elfdump -s libgreet.so
Symbol Table Section: .symtab
     index  value      size        type bind  oth ver shndx          name
       ...
       [1]  0x00000000 0x00000000  FILE LOCL  D   0   ABS            libgreet.so
       ...
      [37]  0x00000268 0x00000004  OBJT GLOB  D   0   .rodata        _lib_version
      [38]  0x000102f3 0x00000000  OBJT GLOB  D   0   .data1         _edata
      [39]  0x00000228 0x00000028  FUNC GLOB  D   0   .text          greet
      [40]  0x0001026c 0x00000000  OBJT GLOB  D   0   .dynamic       _DYNAMIC
%elfdump -s mixedcode.o
Symbol Table Section: .symtab
     index  value      size        type bind  oth ver shndx          name
       [0]  0x00000000 0x00000000  NOTY LOCL  D   0   UNDEF
       [1]  0x00000000 0x00000000  FILE LOCL  D   0   ABS            mixedcode.cpp
       [2]  0x00000000 0x00000000  SECT LOCL  D   0   .rodata
       [3]  0x00000000 0x00000000  FUNC GLOB  D   0   UNDEF          __1cDstd2l6Frn0ANbasic_ostream4Ccn0ALchar_traits4Cc____pkc_2_
       [4]  0x00000000 0x00000000  FUNC GLOB  D   0   UNDEF          __1cFgreet6F_pc_
       [5]  0x00000000 0x00000000  NOTY GLOB  D   0   UNDEF          __1cDstdEcout_
       [6]  0x00000010 0x00000050  FUNC GLOB  D   0   .text          main
       [7]  0x00000000 0x00000000  NOTY GLOB  D   0   ABS            __fsr_init_value
%dem __1cFgreet6F_pc_
__1cFgreet6F_pc_ == char*greet()
char*greet() has been mangled to __1cFgreet6F_pc_ by the Sun Studio 9 C++ compiler. That is why the static linker (ld) could not match the reference in mixedcode.o with the symbol greet defined in libgreet.so.
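Why C++ needs this encoding at all can be seen with overloading. In the following C++ sketch (hypothetical functions that return std::string rather than the article's char *), two functions share the name greet. Since a linker sees only symbol names, the compiler encodes each parameter list into the emitted symbol, much as Sun Studio produced __1cFgreet6F_pc_ for char*greet(). A C compiler has no such encoding and would reject the second definition. The exact mangled names are compiler-specific and are not shown here.

```cpp
#include <string>

// Both overloads are emitted under different, compiler-specific
// mangled symbol names, so they can coexist in one program.
std::string greet() {
    return "Hello!";
}

std::string greet(const std::string &name) {
    return "Hello, " + name + "!";
}
```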
Note that a C compiler that complies with the C99 standard may also mangle some names. For example, on systems whose linkers cannot accept extended characters, a C compiler may encode universal character names when forming valid external identifiers.
The C++ standard provides a mechanism called linkage specification to enable smooth compilation of mixed code. Linkage between C++ and non-C++ code fragments is called language linkage. All function types, function names, and variable names have C++ language linkage by default. A different language linkage is requested with a linkage specification, which takes one of the following two forms.
Linkage specification:
extern string-literal {
    function-declaration
    function-declaration
}
extern string-literal function-declaration;
The string-literal names the language linkage of the enclosed declarations, for example "C" or "C++". Every C++ implementation provides linkage to functions written in C ("C") and linkage to C++ ("C++").
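A short C++ sketch of both forms of the linkage specification. The functions add, subtract, and multiply are hypothetical; they are defined in C++ but given C linkage, so a C program could call them as well. On ELF platforms such as Solaris, the compiler typically emits them under their plain, unmangled names.

```cpp
// Single-declaration form: one function with C language linkage.
extern "C" int add(int a, int b);

// Block form: every declaration inside the braces has C linkage.
extern "C" {
int subtract(int a, int b);
int multiply(int a, int b);
}

// Definitions. Each function keeps the C linkage given by its earlier
// declaration, so no mangled C++ names are generated for them.
int add(int a, int b) { return a + b; }
int subtract(int a, int b) { return a - b; }
int multiply(int a, int b) { return a * b; }
```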
The solution to the problem under discussion is to ask the C++ compiler to give the external C functions C language linkage, so that it refers to them by their unmangled C names and we can call them from C++ code without any issues. The following declaration of greet() in greet.h should resolve the problem:
extern "C" char *greet();
Because we were calling C code from a C++ program, C linkage was used for the routine greet(). The linkage directive extern "C" tells the compiler to emit the function's name as a C compiler would (here, simply greet) instead of a mangled C++ name, and to use the C calling convention for it. In other words, the C linkage specification forces the C++ compiler to adopt C conventions, which are not necessarily the same as C++ conventions.
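Because C linkage drops the mangled encoding, the linker could not tell two C-linkage functions of the same name apart, so C++ allows at most one function with a given name to have C linkage. Overloads with C++ linkage can still coexist with it, as this sketch shows (half is a hypothetical name).

```cpp
// The one function named half() with C language linkage; it is
// emitted under the unmangled name.
extern "C" int half(int x) {
    return x / 2;
}

// An overload with C++ linkage is still allowed: it gets a mangled
// name, so it does not collide with the C symbol.
double half(double x) {
    return x / 2.0;
}

// A second C-linkage overload would be rejected by the compiler:
//     extern "C" double half(double);
```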
So let's modify the header greet.h and recompile:
%cat greet.h
#if defined __cplusplus
extern "C" {
#endif
char *greet();
#if defined __cplusplus
}
#endif
%cc -G -o libgreet.so greet.c
%CC -lgreet mixedcode.cpp
%./a.out
Hello!
It works! Because the header greet.h is included by both C and C++ source files, extern "C" must be guarded with __cplusplus, a macro that C++ compilers predefine. Without the guard, the C compiler would report a syntax error, because extern "C" is not valid C.
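The guard works because only a C++ compiler predefines __cplusplus. Here is a minimal sketch that compiles as either C or C++; language_version is a hypothetical name, and the comments use /* */ so that the same file is also valid C89.

```cpp
/* Returns the value of __cplusplus when compiled as C++, or 0 when
 * compiled by a C compiler, which does not define the macro. The same
 * #if test is what hides extern "C" from the C compiler in greet.h. */
long language_version(void) {
#if defined __cplusplus
    return __cplusplus; /* for example, 199711L (C++98) or 201103L (C++11) */
#else
    return 0L;          /* compiled as C */
#endif
}
```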
Let's have a look at the symbol table of mixedcode.o one more time.
%CC -c -lgreet mixedcode.cpp
%elfdump -s mixedcode.o
Symbol Table Section: .symtab
     index  value      size        type bind  oth ver shndx          name
       [0]  0x00000000 0x00000000  NOTY LOCL  D   0   UNDEF
       [1]  0x00000000 0x00000000  FILE LOCL  D   0   ABS            mixedcode.cpp
       [2]  0x00000000 0x00000000  SECT LOCL  D   0   .rodata
       [3]  0x00000000 0x00000000  FUNC GLOB  D   0   UNDEF          __1cDstd2l6Frn0ANbasic_ostream4Ccn0ALchar_traits4Cc____pkc_2_
       [4]  0x00000000 0x00000000  FUNC GLOB  D   0   UNDEF          greet
       [5]  0x00000000 0x00000000  NOTY GLOB  D   0   UNDEF          __1cDstdEcout_
       [6]  0x00000010 0x00000050  FUNC GLOB  D   0   .text          main
       [7]  0x00000000 0x00000000  NOTY GLOB  D   0   ABS            __fsr_init_value
This time the C++ compiler did not mangle the name greet, so the linker could match the reference in mixedcode.o with the definition in libgreet.so and build the executable.
Let's conclude this article with some general information on mixed-code programming, along with some tips and warnings:
- If you are mixing C and C++ code, use compatible compilers. For example, they must define basic types such as int, float, and pointers in the same way. Make sure that the data types in the different languages correspond.
- While mixing code, avoid mismatched data types for parameters and return values. Including the same header in both the C and the C++ sources avoids this problem.
- Don't worry about language linkage when using standard header files: most C/C++ compiler vendors place the linkage specifications inside their header files so that the headers work with both C and C++. This is why the standard C library functions can be called from C++ without an explicit C linkage specification.
- Pay attention to case-sensitivity conventions for function names in the different languages.
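- Functions with C linkage cannot be overloaded among themselves: at most one function with a given name can be declared extern "C". Overloads with C++ linkage are still allowed.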
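- An extern "C" linkage specification can appear only at namespace scope, typically around global functions; it cannot appear inside a function or class body, and it has no effect on the linkage of class member functions.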
- It is good practice to place extern "C" declarations after all #include directives rather than around them.
- Be aware that an extern "C" declaration does not spell out what must actually be done to mix C and C++ code: the C++ standard leaves the details of language linkage to the implementation.
- It is possible to apply a linkage directive to a whole group of declarations at once, such as all the functions declared in a header. This is useful when we wish to use the functions of a C library in a C++ program. For example:
extern "C" {
    #include "mylibrary.h"
}
Note that wrapping an extern "C" around an #include may lead to some problems with nested includes. It is a good practice to fix the header, instead of wrapping an extern "C" around an #include. The wrapping should be used only when we cannot change the header, and the supplier of the header will not fix it for us.
- When writing header files that are to be used by both C and C++ programs, use the following convention with predefined macros. The system header files under the /usr/include directory also provide examples of correct usage of the predefined macros.
#if defined __cplusplus
/* If the functions in this header have C linkage, this
 * will specify linkage for all C++ language compilers */
extern "C" {
#endif
... /* body of header */
#if defined __cplusplus
} /* matches the linkage specification at the beginning. */
#endif
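The tip about standard headers can be illustrated with the qsort() case mentioned earlier. In this C++ sketch (compare_ints and sort_ints are hypothetical names), nothing special is needed to call qsort() because <cstdlib> already declares it appropriately. The comparator is given C language linkage to match the C routine that calls it back; the C++ standard library also declares std::qsort for comparators with C++ linkage, so extern "C" is a choice here rather than a strict requirement.

```cpp
#include <cstddef>
#include <cstdlib>

// Comparator with C language linkage, called back by the C library's
// qsort(). Returns negative, zero, or positive, like strcmp().
extern "C" int compare_ints(const void *a, const void *b) {
    int x = *static_cast<const int *>(a);
    int y = *static_cast<const int *>(b);
    return (x > y) - (x < y);  // avoids the overflow risk of x - y
}

// Sorts n integers in ascending order. No extern "C" is needed around
// the qsort() call itself: <cstdlib> already declares it correctly.
void sort_ints(int *values, std::size_t n) {
    std::qsort(values, n, sizeof *values, compare_ints);
}
```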
Please note that the list is not complete. Also, name mangling, which we highlighted in the example, is commonly part of the problem to be solved, but only a part. Mixing fragments of code written in different programming languages raises other issues that need additional steps to resolve. For example, differences in how C++ and C functions pass arguments may cause problems if a caller passes arguments that are wider than the callee expects. Unfortunately, a discussion of those issues is beyond the scope of this brief article.
References:
- Programming Languages - C++, ISO/IEC 14882 International Standard
- "The C++ Programming Language" by Bjarne Stroustrup
- C++ name mangling
Notes:
- elfdump: the elfdump utility dumps selected parts of an object file, such as the symbol table, ELF header, and global offset table.
- dem: the dem utility prints a demangled C++ name that closely resembles the name that was originally declared. Sun distributes this utility as part of the Sun Studio compiler collection suite.
- __cplusplus is a macro predefined by the C++ compiler. To see all the predefined macros of the C/C++ compilers in Sun Studio 8 or later, compile a simple program with the -xdumpmacros flag. To learn more about -xdumpmacros, see the CC man page.
Giri Mandalika has been working at Sun Microsystems for two and a half years as an engineering consultant. His primary responsibility is to work with independent software vendors (ISVs) to make sure their products run well on Sun systems.