Stability of the C++ ABI: Evolution of a Programming Language

 
By Stephen Clamage, Sun ONE Studio Solaris Tools Development Engineering  

As C++ evolved over the years, the Application Binary Interface (ABI) used by a compiler often needed to change in order to support new or changing language features. Programmers expected to recompile all their binaries with every compiler release. An unstable ABI is incompatible with the Solaris philosophy of shared libraries, and is a nightmare for library and middleware vendors. With the advent of the C++ Standard in 1998, there is new hope for a stable C++ ABI on Solaris platforms. This paper addresses these issues in Sun C++, and what you can expect when you use C++ as an implementation language on Solaris.

Introduction

The Application Binary Interface (ABI) of a programming-language implementation is a specification of all the low-level details that allow separately-compiled modules to work together. Without a stable ABI, all parts of a program must be compiled with the same version of the same compiler. That situation creates a maintenance nightmare for distributed projects, and particularly for suppliers of binary libraries. The early rapid evolution of the C++ programming language precluded a stable ABI. The advent of the C++ international standard in 1998 [ISO/IEC 14882:1998 Programming Languages - C++] provides a base for a stable C++ ABI, at least for a given C++ implementation. In this paper we explore the stability question for Sun C++ compilers for the Solaris operating environment.

The C ABI

The Solaris ABI is also the C ABI, because C is the standard Unix implementation language. Among other things, the C ABI specifies:

  • size and layout of predefined types (char, int, float, etc.)
  • layout of compound types (arrays and structs)
  • external (linker-visible) spelling of programmer-defined names
  • machine-code function-calling sequence
  • stack layout
  • register usage.
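
Some of these properties can be observed directly from source code. The following is a minimal sketch, not part of the Solaris ABI documents: the Record type and the function names are hypothetical, and the actual numbers depend on the platform ABI in use.

```cpp
#include <cassert>   // assert, used only in the usage check
#include <cstddef>   // offsetof, std::size_t

// Hypothetical record type, used only for illustration.
struct Record {
    char  tag;
    int   count;
    float weight;
};

// For a given platform, the ABI fixes every one of these values.
// Code compiled separately must agree on them to share a Record.
std::size_t record_size()   { return sizeof(Record); }
std::size_t count_offset()  { return offsetof(Record, count); }
std::size_t weight_offset() { return offsetof(Record, weight); }

// Usage check: members appear in declaration order and do not overlap.
void check_record_layout() {
    assert(count_offset() >= sizeof(char));
    assert(weight_offset() >= count_offset() + sizeof(int));
    assert(record_size() >= weight_offset() + sizeof(float));
}
```

The exact offsets, including any padding after tag, are chosen by the ABI rather than by the language, which is why two compilers must follow the same ABI to exchange such a struct.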

The C++ ABI

The C++ ABI includes the C ABI. In addition, it involves the following features:

  • layout of hierarchical class objects, i.e., base classes, virtual base classes
  • layout of pointer-to-member
  • passing of hidden function parameters (e.g. this)
  • how to call a virtual function
    • vtable contents and layout
    • location in objects of pointers to vtables
    • finding adjustment for the this pointer
  • finding base-class offsets
  • calling a function via pointer-to-member
  • managing template instances
  • external spelling of names ("name mangling")
  • construction and destruction of static objects
  • throwing and catching exceptions
  • some details of the standard library
    • implementation-defined details
    • typeinfo and run-time type information
    • inline function access to members

Name Mangling

C++ allows different functions to have the same name, and allows an unbounded number of scopes where different global entities with the same name can be declared. Example:

int     f(int);
float   f(float);
class T {
        int f(int);
        int f(char*);
        class U {
                 int f(int);
        };
};
namespace N {
        class T {
                 int f(int);
        };
}

This example has two classes named T, and six functions named f, some of which are in the same scope. All the functions have external linkage. To differentiate entities with the same name, the C++ implementation must make references to these functions unique. To ensure that references to the same entity from different modules can be resolved correctly, the method of making references unique must be predictable.

The usual scheme involves decorating the name of the entity with encodings of the scope names, along with the parameter types and return type if it is a function. The resulting names appear to be scrambled, or "mangled". For example, the names of the six functions above would be encoded by the Sun C++ compiler as follows:

Examples of Mangled Function Names:

Function              Mangled Name
float f(float)        __1cBf6Ff_f_
int f(int)            __1cBf6Fi_i_
int T::f(int)         __1cBTBf6Mi_i_
int T::f(char*)       __1cBTBf6Mpc_i_
int T::U::f(int)      __1cBTBUBf6Mi_i_
int N::T::f(int)      __1cBNBTBf6Mi_i_

C++ also provides a way to specify that a name is accessible from C code, and therefore should not be mangled.
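
The mechanism for this is a linkage specification, extern "C". The sketch below uses hypothetical function names; it only illustrates the difference between a C-linkage name and ordinary overloaded C++ names.

```cpp
#include <cassert>   // assert, used only in the usage check

// C linkage: the external name follows the C ABI, so it is not mangled
// and C code can call the function. (Hypothetical name.)
extern "C" int c_visible_add(int a, int b) { return a + b; }

// C++ linkage: both functions share the source name "add", so the
// compiler must encode the parameter types into the external names.
int    add(int a, int b)       { return a + b; }
double add(double a, double b) { return a + b; }

// Usage check: overload resolution picks the right function by type.
void check_linkage_example() {
    assert(c_visible_add(2, 3) == 5);
    assert(add(2, 3) == 5);
    assert(add(1.5, 2.0) == 3.5);
}
```

Because an unmangled name carries no type information, at most one function with a given name can have C linkage; overloading is available only for functions with C++ linkage.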

Name Mangling and the ABI

The name-mangling algorithm is part of the ABI, because it defines how a compiler must generate external references and definitions for program entities. If two compilers or compiler versions do not mangle equivalent declarations the same way, a program composed of parts compiled from the two compilers will not link correctly.

Hierarchical Layout

Like Smalltalk and Java, C++ allows user definition of hierarchies of class types, wherein a "derived" class implicitly includes all the data and functions of the classes from which it inherits. An ordinary base class is laid out in an object similarly to a member of class type, at a fixed offset from the start of the complete object. Example:

class Base {
      ...
};
class Derived : public Base {
        int i, j;
};
class Composed {
        Base b;
        int i, j;
};

In many C++ implementations, the layout of classes Derived and Composed will be the same.

A pointer to a complete object can be converted to a pointer to one of its base classes, but the address the pointer represents must be adjusted by the offset of the base class within the complete object.

Unlike Smalltalk and Java, a C++ class can have more than one immediate base class, a feature called multiple inheritance.
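
With more than one base class, at most one base can sit at the start of the complete object, so the pointer adjustment becomes visible. The following is a minimal sketch; the class and function names are hypothetical, and the particular offsets are whatever the ABI in use prescribes.

```cpp
#include <cassert>   // assert, used only in the usage check
#include <cstddef>   // std::ptrdiff_t

// Hypothetical hierarchy, used only for illustration.
struct Left  { int a; };
struct Right { int b; };
struct Both : Left, Right { int c; };

// Byte distance from the start of the complete object to each base.
std::ptrdiff_t offset_of_left(const Both& obj) {
    const Left* base = &obj;   // derived-to-base conversion
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&obj);
}

std::ptrdiff_t offset_of_right(const Both& obj) {
    const Right* base = &obj;  // the compiler adds the base offset here
    return reinterpret_cast<const char*>(base) -
           reinterpret_cast<const char*>(&obj);
}

// Converting back to the derived type undoes the adjustment.
bool round_trip(Both& obj) {
    Right* base = &obj;
    return static_cast<Both*>(base) == &obj;
}

// Usage check: the two bases cannot occupy the same address.
void check_base_offsets() {
    Both obj;
    assert(offset_of_left(obj) != offset_of_right(obj));
    assert(round_trip(obj));
}
```

Both conversions are resolved at compile time because the offsets are fixed for the type Both. Code compiled against a different layout would apply the wrong adjustment.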

If classes A and B each have a base class Z, a class C derived from both A and B could have two copies of Z. Sometimes it is appropriate for C to have two independent copies of Z. Other times, Z represents a resource of which there must be only one copy.

To specify that there is to be only one copy of a base class in a hierarchical object, that base class can be declared "virtual."

The offset of a virtual base class relative to an intermediate class depends on the entire hierarchy. Example:

class Z {
        ...
};
class A : virtual public Z {
        ...
};
class B : virtual public Z {
        ...
};
class C : public A, public B {
        ...
};

Suppose in an A object, the Z portion is at offset OA, and in a B object it is at offset OB. There is only one copy of Z in a C object. It cannot simultaneously be at offset OA from the A portion and at offset OB from the B portion. At least one of these offsets must be different when the entire object is of type C.

Given a pointer to A, the location of the Z sub-object therefore cannot be determined at compile time, because the A object might in turn be a sub-object of some more complex type, like C. The runtime system must allow for the dynamic determination of the type of the complete object, so that the offsets of other objects can be determined.

C++ implementations typically store the offset information for each object type in an auxiliary table, often called a vtable. There is one vtable for each type that needs one, shared by all objects of that type. An object needing a vtable then contains a pointer to the vtable. The vtable also contains addresses of virtual functions, to allow dynamic function dispatch based on the actual object type referred to by a pointer or reference.
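
The effect of the run-time lookup can be observed without knowing how a particular compiler lays out its vtables. The sketch below fills in the Z, A, B, C hierarchy from the example with hypothetical members; the helper function names are also hypothetical.

```cpp
#include <cassert>   // assert, used only in the usage check
#include <cstddef>   // std::ptrdiff_t

// The hierarchy from the text, with illustrative data members.
struct Z { int resource; };
struct A : virtual Z { int a; };
struct B : virtual Z { int b; };
struct C : A, B { int c; };

// Converting to a virtual base requires a run-time lookup, because the
// function cannot know whether p points into an A or into a C.
Z* z_via_a(A* p) { return p; }
Z* z_via_b(B* p) { return p; }

// Offset of the Z sub-object relative to the A sub-object. The value
// may differ between a standalone A and the A portion of a C.
std::ptrdiff_t z_offset_from_a(A* p) {
    return reinterpret_cast<char*>(z_via_a(p)) -
           reinterpret_cast<char*>(p);
}

// Usage check: both paths reach the single shared Z in a C object.
void check_virtual_base() {
    C obj;
    assert(z_via_a(&obj) == z_via_b(&obj));
    z_via_a(&obj)->resource = 7;
    assert(z_via_b(&obj)->resource == 7);
}
```

Comparing z_offset_from_a for a standalone A object and for the A portion of a C object typically shows two different values, which is the situation the vtable offset information exists to handle.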

The C++ Standard Library

The C++ standard defines the names and properties of types and functions in the library, as well as the programming interface to the library. Source code written to the specification is therefore portable among conforming implementations. The binary interface is a different story, however.

The C++ standard allows considerable variation in implementation details, as long as the programming interface is not affected. Many of those implementation details therefore become part of the ABI - particularly the size of class objects.

Many parts of the standard library are best implemented with inline functions to enhance performance. Somewhat like a macro in C, a call to an inline function is replaced by the code of the function. If the function accesses members of a class defined in the standard library, the location of the class member becomes built into the code of application programs that use the inline function. Anything referred to by an inline function is therefore part of the C++ ABI.

Even if an enhancement or bug fix to the standard library does not affect the programming interface, the change would affect the ABI if it altered the size or layout of classes defined in the library.
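
A minimal sketch of this effect follows. The two namespaces stand for two releases of a hypothetical library class, Buffer, with the same programming interface; none of these names come from the actual standard library implementation.

```cpp
#include <cassert>   // assert, used only in the usage check
#include <cstddef>   // offsetof, std::size_t

namespace lib_v1 {   // hypothetical first release
    struct Buffer {
        char*       data;
        std::size_t length;
        // Inline: callers compile the offset of 'length' into their code.
        std::size_t size() const { return length; }
    };
}

namespace lib_v2 {   // hypothetical later release, same interface
    struct Buffer {
        std::size_t capacity;   // member added by an internal fix
        char*       data;
        std::size_t length;
        std::size_t size() const { return length; }
    };
}

// Source code using size() is unchanged, but the layout is not.
bool same_object_size() {
    return sizeof(lib_v1::Buffer) == sizeof(lib_v2::Buffer);
}

bool same_length_offset() {
    return offsetof(lib_v1::Buffer, length) ==
           offsetof(lib_v2::Buffer, length);
}

// Usage check: the two releases disagree on size and member offset.
void check_library_layout() {
    assert(!same_object_size());
    assert(!same_length_offset());
}
```

An application compiled against the first release has the old offset of length built into every inlined call to size(). Linking it with the second release would read the wrong member, even though no source code changed.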

The Sources of ABI Instability

A new or changed language feature can require a change, not just an extension, to an ABI. Two examples:

  • The C++ standard allows an overriding virtual function to have a return type different from the function it overrides. The return type must be a pointer or reference type, and the return type of the function in the derived class must refer to a type derived from the type referred to by the function it overrides. Example:

    class Base {
            virtual Base* clone();
    };
    class Derived : public Base {
            virtual Derived* clone();
    };
    void f(Base* p)
    {
            Base* copy = p->clone();
    }

    The compiler cannot know whether the call to clone will return a Base* or a pointer to a derived type. The ABI must provide a way to accomplish the correct pointer adjustment no matter what type is returned. The required mechanism would not have been available in the old ABI.

  • Consider a template function specialization and a non-template function with the same name and type.

            template<class T> T min(T, T) { ... }
            int min<int>(int, int); // old specialization syntax
            int min(int, int); // non-template
    

    Under old language rules, a non-template function with the same name and type as a template function was considered to be a specialization of the template. Such a function and the corresponding specialization must therefore have the same mangled name.

    Under the rules in the C++ standard, they are distinct functions, and must have different mangled names. The external name of at least one of the functions must change under the new rules.
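
Both language rules can be sketched in standard C++. In the sketch below, the function name which and its integer return codes are hypothetical stand-ins for min, chosen so that the selected function can be identified; the virtual functions are made public so they can be called from outside the classes.

```cpp
#include <cassert>   // assert, used only in the usage check
#include <string>

// Covariant return type: Derived::clone overrides Base::clone even
// though it returns Derived* rather than Base*.
class Base {
public:
    virtual ~Base() {}
    virtual Base* clone() const { return new Base(*this); }
    virtual std::string name() const { return "Base"; }
};

class Derived : public Base {
public:
    virtual Derived* clone() const { return new Derived(*this); }
    virtual std::string name() const { return "Derived"; }
};

// The caller sees only Base*; any pointer adjustment is up to the ABI.
std::string clone_and_name(const Base* p) {
    Base* copy = p->clone();
    std::string result = copy->name();
    delete copy;
    return result;
}

// Template specialization versus non-template function: under the
// standard these are distinct functions and need distinct external names.
template<class T> int which(T) { return 1; }   // primary template
template<> int which<int>(int) { return 2; }   // explicit specialization
int which(int) { return 3; }                   // non-template function

// Usage check: each call form selects a different function.
void check_language_rules() {
    Derived d;
    assert(clone_and_name(&d) == "Derived");
    assert(which(0) == 3);        // non-template preferred
    assert(which<int>(0) == 2);   // specialization named explicitly
    assert(which('x') == 1);      // primary template, T = char
}
```

Since which(0) and which<int>(0) reach different functions with the same name and parameter types, a mangling scheme that gave them a single external name could not link this program correctly.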

Fixing some bugs requires an ABI change. Two examples:

  • Early C++ compilers typically could not support calls to functions in a virtual base class from the constructor or destructor of a derived class under some circumstances. Eventually a reasonable-cost solution to this problem was invented, but it required a different vtable organization, and a different way of calling constructors and destructors.

  • Our shipping compiler generates different mangled names for some function declarations that are supposed to be equivalent. Fixing the bug would mean that some existing functions get a different mangled name, an ABI change.

The Consequences of ABI Instability

Any difference in the ABI can mean that object files from different compilers will not link, or if they do link, will not run correctly. (To help prevent code generated for different ABIs from accidentally linking, different compiler implementations typically use different name mangling schemes.)

In the early days of C++, when the language was evolving rapidly, ABIs changed frequently. C++ programmers were accustomed to recompiling everything whenever they updated a compiler.

Suppose an application uses an ORB library from vendor A and a database library from vendor B. The vendors do not wish to distribute source code, and so provide binary libraries. The application code and the libraries from both vendors must all use the same ABI.

If every compiler release has a different ABI, application programmers will not want to upgrade compilers frequently. It would mean coordinating the upgrade among all developers on the project, and recompiling everything on the official upgrade installation date.

If vendor A and vendor B must support many clients, each of whom is using a different compiler release, they must release and support a library for each compiler version. This situation is very expensive in resources, and typically is not feasible. Under this scenario, vendors might release source code, and clients would build the libraries themselves. That in turn creates new support problems, since different clients will use different tool sets, and the build scripts must be configured to conform to local practices.

The Solaris vision of shared libraries is not well-supported by the above scenario. A different version of a C++ shared library must be generated for every supported ABI variation. Even when a compiler is no longer supported, programs may exist in the field that depend on using the shared library. The obsolete library versions must continue to be shipped for a long time.

Successful distribution of libraries as products - particularly shared libraries - depends on having a stable ABI.

A History Of Sun C++ ABIs

Major releases of Sun's C++ compilers have always used incompatible ABIs, in accordance with the engineering taxonomy of release numbers: A new major version number signifies an incompatible release.

Beginning with C++ 3.0, Sun attempted to inject some stability into the C++ ABI. The C++ runtime support library became a shared library shipped along with Solaris: libC.so.3.

But it was also recognized that C++ was still evolving, and work was in progress on a C++ standard that would doubtless change some important details. Accordingly, Sun policy was that no Sun software product could export a C++ interface. With no exported C++ interface, it might be reasonable to "recompile everything" when the ABI changed.

C++ 4.0, released in 1993, introduced a new, incompatible ABI. This ABI was intended to be stable. The C++ development team had solicited input from major Sun clients and even competitors on the ABI design, and had accepted some suggestions from outside. The ABI was published as a public document. The new C++ support library, libC.so.5, was added to Solaris shipments.

Over time, some bugs were found that required small changes in name mangling. These bugs were corrected, and users were provided with ways to restore the previous behavior when it was necessary to link with older code. C++ 4.2, released in 1996, represented the final version of this ABI.

This ABI contained known bugs, such as the virtual base-class problem described in the section "The Sources of ABI Instability". In addition, work on the C++ standard was nearing completion, and it was known to contain features that would require a different ABI. The change in template semantics described in that same section is one of several changes that affected the ABI.

To avoid the scenario of a constantly-changing ABI that would result from trying to track the evolving C++ standard, the C++ development team adopted the strategy of implementing in C++ 4.2 only those features that were assumed to remain stable, and that did not require an ABI change. At the cost of lagging behind in C++ features, Sun provided a stable ABI for its customers.

The Runtime Libraries

The earlier C++ runtime library consisted of an I/O library known as "iostreams" and the runtime "helper" functions for the compiler, including support for heap memory allocation, exception handling, and dynamic type information.

The library specified in the C++ standard includes an extensive set of template classes and functions, including strings, iostreams, numerics, and the "STL".

Due to time pressure, version 5.0 of the C++ compiler did not implement all the features of the C++ standard. For example, parts of the standard library definition involve templates as members of classes, a feature not supported by C++ 5.0. In the library to be delivered with the compiler, those parts were missing, or were implemented slightly differently.

For those reasons, the library implementation was split into two parts:

  • libCrun, consisting of the compiler helper functions, including support for heap memory allocation, exception handling, and dynamic type information.
  • libCstd, consisting of the remainder of the C++ standard library.

Here an ABI, There an ABI ...

Sun policy deems continued compatibility more important than conformance to the C++ standard, and more important even than correctness. Even though libCstd was sub-standard and some bugs were found in name mangling, those deficiencies would remain in the following releases.

Because libCstd would remain binary compatible, it could be shipped as a shared library. C++ 5.2 shipped with libCstd.so.1 as an optional library, and C++ 5.3 provided libCstd.so.1 as part of a Solaris package. To support customers who need more of the features of the C++ standard library but do not need binary compatibility, C++ 5.4 also shipped with the open-source STLport [http://www.stlport.org] implementation of the C++ standard library.

Current ABI Status (May 2002)

The current shipping compiler is C++ 5.4, a component of Sun ONE Studio 7, Compiler Collection (formerly Forte Developer 7).

All of the C++ 5.x compilers provide a mode compatible with C++ 4.2, and a default mode that supports the C++ standard. The two modes represent two different ABIs. For a given mode, all the 5.x compilers generate binary-compatible code.

Despite Sun's compatibility policy, some customers have demanded fixes for ABI bugs that prevent their programs from working. To satisfy these customers, the compiler has an undocumented option that generates a correct, but incompatible, ABI. Customers who do not depend on third-party binary libraries can use the corrected ABI, provided they are careful to compile all their code with the option. The libraries that ship with the C++ compiler do not trigger any of the ABI bugs, so no separate version of those libraries is required.

The Future

There is no question that the two C++ ABIs now in use must continue to be supported for some years. That means shipping compilers that generate both ABIs, along with compatible versions of the libraries.

But customers are demanding better standards conformance as well. That would mean supporting a third ABI. How stable would that ABI be? What if more ABI bugs were discovered? What happens when the C++ standard changes again?

That last question is not just theoretical. The C++ Committee is about to release a Technical Corrigendum to the standard that includes binary-incompatible changes to the standard library. The Sun representatives on the Committee could not get agreement from other members that binary compatibility was a requirement.

It is thus nearly certain that a third ABI would not be more stable than the second ABI. Yet a proliferation of supported ABIs is not a viable option - not for Sun, not for Sun's customers.

The C++ development team is exploring the idea of providing an "experimental" ABI as an option with a future compiler. The ABI would be unstable from release to release, but would be as correct as possible. Customers who need correctness but not compatibility could use it if they chose. Use by such customers would help validate the ABI for use as a possible future supported ABI.

For More Information
 
C++ User's Guide: Detailed information on the current Sun ONE Studio C++ compiler, including command-line options.