By Morgan Herrington, July 16, 2004
Overview
Most C/C++ development environments depend on some version of the
make utility to manage the build process, yet many engineers
don't take advantage of its parallel capabilities to reduce compile
times. This article explains how to use this feature and describes
the most common pitfalls along with their solutions.
Introduction
Unlike the procedural programming languages we typically use (for example, C++,
Perl, and Java), the dataflow language of make doesn't specify a particular order for operations.
Instead, each step is taken when its dependencies have been
completed. This property permits parallel execution of all
non-dependent operations; however, you must attend to a few details in
order to avoid inconsistent or even incorrect results.
In more sophisticated development environments, parallel builds
can be distributed across multiple machines or even multiple networks
(such builds are sometimes called distributed rather than
parallel builds). This may require non-trivial
preparation (for example, setting up a grid), which is
normally handled by an administrator or a build master so that
individual engineers don't need to be aware of the fine details of the
implementation.
Here we'll address the simpler case where there is an existing serial
build environment on a single machine, and you just want to decrease
your build times. Note, however, that most of the issues discussed
for this simpler situation must also be addressed in the more general
environment.
Rationale
On a multiprocessor machine, the advantage of parallel make
seems obvious since the additional CPUs can perform multiple compilations
faster than a single CPU. Somewhat less obviously, a parallel
make can be faster even on a uniprocessor because the CPU and I/O
demands of multiple compiles can be overlapped (improving the overall
throughput). For the simplest situation where a single source file is
changed, using a parallel make won't help. However, whenever
multiple files (or headers used by multiple sources) have changed,
a parallel build should decrease the total build time.
Putting It into Practice
While most make variants support a parallel build capability,
this article will discuss GNU make
because it is generally available and widely used. Other make utilities
provide special syntactic shortcuts for some of the suggestions illustrated
below; however, using those features will render your Makefiles
less portable.
A parallel build with gmake can be invoked using the
-j flag followed by the maximum number of jobs to run at once, that is, gmake -j N.
The invocation syntax to specify the number of jobs may differ
for other parallel make utilities like dmake, pmake, qmake, or mwmake, but the behavior is similar.
Your results will vary based on the particular compiler, options, and language
being compiled, as well as whether the sources are local or remote. A common rule of thumb is to request
approximately 1.5 times as many parallel jobs as there are available CPUs on the
machine.
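As a rough illustration of that rule of thumb, the following sketch derives a job count from the CPU count. The helper name jobs_for_cpus is made up for this example, and getconf _NPROCESSORS_ONLN is assumed to be available (it is on many systems, but not all). The script only prints a suggested command; it does not start a build.

```shell
#!/bin/sh
# Hypothetical helper: suggest a -j value of about 1.5 times the CPU count.
jobs_for_cpus() {
    n=$1
    # Integer arithmetic: n * 3 / 2 is 1.5 * n rounded down.
    j=$(( n * 3 / 2 ))
    # Never suggest fewer than one job.
    if [ "$j" -lt 1 ]; then
        j=1
    fi
    echo "$j"
}

# Assumption: getconf _NPROCESSORS_ONLN reports the number of online CPUs.
# Fall back to 1 if it is missing or prints something non-numeric.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case $cpus in
    ''|*[!0-9]*) cpus=1 ;;
esac

echo "suggested invocation: gmake -j $(jobs_for_cpus "$cpus")"
```

Treat the printed value as a starting point and adjust it after timing a few real builds.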
If this initial invocation works (and usually, it will), then you
should start seeing reduced build times on uniprocessor machines and even more of a reduction on multiprocessor machines. However, if it
does not work, then the following sections may help explain some
problems that are specific to parallelization, how to recognize them,
and how to fix them.
Problem 1: Implicit Dependencies
Possibly the most common failure is caused by an unstated dependency that
has inadvertently been introduced in the Makefile (that is, a Makefile bug).
These can go unnoticed for serial builds, but will cause a parallel build
to fail.
For example, consider an application where a module implementing the menus
for a user interface is automatically generated by scanning the other object
files, looking for functions matching a particular naming convention.
The offending make rules might look something like:
OBJECTS=app.o helper.o utility.o ...
application: ${OBJECTS} menu.o
	cc -o application ${OBJECTS} menu.o
# Automatically generate menu.c from the other modules
menu.o:
	nm ${OBJECTS} | pattern_match_and_gen_code > menu.c
	cc -c menu.c
The subtlety here is that for a serial build, make will
work from left to right on the dependency list for application.
First it will create all of the objects in ${OBJECTS}, and then
it will apply the rules to build menu.o. There is no problem
in the serial build because by the time menu.o is processed, all
of the necessary objects in ${OBJECTS} will already be in place.
For a parallel build, however, make is free to process
all of the dependencies in parallel, so the creation and compilation
of menu.o could be started before all of the objects
are ready.
To make matters worse, the failure may be intermittent. Because this
is a parallel race condition, sometimes it will work correctly
and sometimes it will fail. A naive user might just learn to distrust
parallel make without realizing that there is a bug in the Makefile.
You can recognize this situation because a serial build will succeed,
but a parallel build will fail (with the specific failure being that
some dependencies are not created). If you retry the build, it will
then succeed.
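Because the failure is intermittent, a single successful parallel build proves little. One way to gain some confidence, sketched here, is to repeat a clean parallel build several times and stop at the first failure. The helper repeat_until_failure is hypothetical, and the usage line assumes the Makefile has a clean target, which the rules shown in this article do not include.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to COUNT times, stopping at the
# first failure so the failing attempt can be inspected.
repeat_until_failure() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            echo "failed on attempt $i"
            return 1
        fi
        i=$(( i + 1 ))
    done
    echo "all $count attempts succeeded"
}

# Example use (assumes a 'clean' target exists in the Makefile):
#   repeat_until_failure 10 sh -c 'gmake clean && gmake -j 4'
```

A run with no failures does not prove that the Makefile is free of races, but a failure that appears only in some attempts is a strong hint of a missing dependency.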
Out of frustration, some engineers resort to using the
compound command "gmake;gmake" to force the parallel build to complete.
However, a more appropriate fix is to make the dependencies
explicit, allowing the built-in dataflow analysis of make to
process all targets in the correct order. In the following
example, notice that ${OBJECTS} was added to the dependency
list for menu.o:
application: ${OBJECTS} menu.o
	cc -o application ${OBJECTS} menu.o
menu.o: ${OBJECTS}
	nm ${OBJECTS} | pattern_match_and_gen_code > menu.c
	cc -c menu.c
Problem 2: Reuse of Temporary Files
Another common problem can arise from the reuse of intermediate
files. For example, consider an application which uses yacc
to generate two distinct parsers. Without considering the
issue of parallel builds, the Makefile author might inadvertently
allow yacc to use the same (default) intermediate source
file for both targets:
application: application.o parser1.o parser2.o
	cc -o $@ $^
parser1.o: parser1.y
	yacc parser1.y
	cc -o $@ -c y.tab.c
parser2.o: parser2.y
	yacc parser2.y
	cc -o $@ -c y.tab.c
This problem becomes even more subtle when using the default
make rules. In this case, rather than being explicitly
visible (like the previous example), this intermediate
file would only be referenced from a system default file (for
example, on the Solaris Operating System, /usr/share/lib/make/make.rules) which has a generic build rule like the following:
.y.o:
	$(YACC.y) $<
	$(COMPILE.c) -o $@ y.tab.c
	$(RM) y.tab.c
In either case, this works without any problem for a serial build
because the intermediate file, y.tab.c, is used and then
discarded by each individual build rule. However, when executed in parallel, the
two rules can conflict: if both write y.tab.c at the same time, one
compile may read a file that the other rule produced, is still writing,
or has already removed.
This particular example is easy to fix because yacc provides an
option, -b, that sets the prefix of the intermediate file name so that it will be
unique between invocations. Similarly, if the intermediate file is
being generated by a shell script, the script can usually be changed to use a unique file name. In
the worst-case scenario, one workaround is to build one of the targets
in a unique (possibly temporary) subdirectory:
parser2.o: parser2.y
	mkdir _temp && cd _temp && yacc ../parser2.y
	cc -o $@ -c _temp/y.tab.c
	$(RM) -rf _temp
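For the -b approach, the commands behind one parser rule might look like the following sketch. The helpers run and build_parser are hypothetical, and the sketch assumes a yacc that supports -b and a cc on the PATH. With DRY_RUN set, the commands are only printed, which makes it easy to see that each parser gets its own intermediate file.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: build NAME.o from NAME.y using a per-parser prefix.
# With -b NAME, yacc writes NAME.tab.c instead of the shared y.tab.c.
build_parser() {
    name=$1
    run yacc -b "$name" "$name.y" &&
    run cc -o "$name.o" -c "$name.tab.c" &&
    run rm -f "$name.tab.c"
}

# Example use:
#   DRY_RUN=1; build_parser parser1; build_parser parser2
```

The same three commands, with the parser name written out, can serve as the recipe lines of the parser1.o and parser2.o rules.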
Problem 3: Resource Exhaustion
If your machine is under-configured to handle the amount of parallelism
that you've requested, you may run out of either virtual or physical
memory.
In the first case, you'll see the message:
Fatal error: fork failed: Not enough space
And in the second, you will see a significant slowdown in compile speed
(because the compiler is being forced to page to disk).
In both cases, the solution is to reduce the amount of parallelism (or
add memory).
A slightly different, but related, situation is when your parallel build
consumes more than your fair share of a multi-user machine. In that case,
gmake provides an option, -l, to limit parallelism
based on a load average upper limit.
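As a sketch, the job limit and the load limit can be combined on one command line. The values below (8 jobs, load average 4.0) are purely illustrative, and the command is only assembled and printed here, not run.

```shell
#!/bin/sh
# Illustrative values only; tune them for the machine at hand.
max_jobs=8      # upper bound on simultaneous jobs (-j)
max_load=4.0    # with -l, gmake starts no new job while other jobs are
                # running and the load average is at or above this value

build_cmd="gmake -j $max_jobs -l $max_load"
echo "$build_cmd"
```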
Problem 4: Serial Tools
In some cases, the compiler itself can inhibit parallel builds. In
particular, older versions of the Sun C++ compiler serialize when
accessing the template cache.
The fix for this problem is to avoid using the template cache by:
- Upgrading to Sun Studio 8 software, which uses a different template mechanism
- Working with options to the older compiler (for example, -instances=static), which can be used to bypass the template cache. (See the C++ User's Guide for instructions and restrictions.)
Problem 5: Old Versions of gmake and NFS
Very old versions of gmake will sometimes fail for parallel
builds (but succeed for serial builds) when invoked in an NFS-mounted
directory. The failure will manifest itself as a report that either of these has occurred:
- One of the sources cannot be created:
  gmake: *** No way to make target 'some_source.c'. Stop.
  gmake: *** Waiting for unfinished jobs....
- stat was interrupted:
  gmake: stat: some_source.c: Interrupted system call
However, if you manually inspect the directory using ls, you
will see that the source file does exist. If gmake is
reinvoked, more of the build will succeed, but it might require
multiple invocations to complete the build.
To fix this problem, install a more recent version of gmake,
for example version 3.79 (or newer).
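To check which version is installed, a small script along these lines may help. The helper version_at_least is hypothetical and compares only the major and minor numbers. The script also assumes that the first line of gmake --version has the form "GNU Make 3.81" and that GNU make is installed under the name gmake.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version HAVE is at least version WANT,
# comparing only the MAJOR.MINOR parts (for example, 3.81 against 3.79).
version_at_least() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    if [ "$have_major" -ne "$want_major" ]; then
        [ "$have_major" -gt "$want_major" ]
    else
        [ "$have_minor" -ge "$want_minor" ]
    fi
}

# Only query gmake if it is actually installed under that name.
if command -v gmake >/dev/null 2>&1; then
    installed=$(gmake --version | sed -n '1s/^GNU Make \([0-9][0-9.]*\).*/\1/p')
    if [ -n "$installed" ] && version_at_least "$installed" 3.79; then
        echo "gmake $installed is new enough"
    else
        echo "gmake ${installed:-(unknown version)} may be too old"
    fi
fi
```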
Summary
This optimization is usually easy to implement and should noticeably
reduce build times on a uniprocessor and dramatically reduce them on a
multiprocessor. The cleanup usually does not complicate the Makefile
and is a necessary first step for more sophisticated distributed
builds.