This
article discusses general and data race condition issues that arise
in parallel programming. Data race condition problems are common,
while rarer, but harder, general race problems can also occur. Some
data race conditions are not harmful and can be permitted in
parallel software for performance reasons. Furthermore a race
condition might be the symptom of a deeper design problem in the
program code. A simple and actual parallel partitioning program is
used to illustrate these various race condition issues and how to
avoid them.
IntroductionParallel programming[1] is never an easy task for software developers. Most software developers are trained to write their code as a single flow of sequential operations. In parallel programming, developers need to take a new perspective and design their code as multiple flows of concurrent operations. The most challenging issue for developers new to the parallel software paradigm is a fundamental problem in parallel programming: syndromes known as race conditions.A race condition is a programming fault which produces unpredictable program state and behavior due to un-synchronized concurrent executions. Race conditions are hard to avoid and also hard to find with the conventional debugging methods and tools. To make things worse, there are many subtle aspects surrounding race conditions. It is important for programmers to be aware of these land mines before they step into the dangerous parallel programming zone. The most often encountered race conditions are data race conditions. A data race condition is caused by unordered concurrent accesses of the same memory location from multiple threads. Less frequent but harder to find are general race conditions. A subtle general race condition often occurs at a transitory state due to the undetermined program execution order of multiple concurrent events that have data conflicts. However not every race condition is a programming bug, and some data race conditions are not harmful. There are some situations where race conditions in a parallel program for performance reason. Last but not least, the race condition might be only the surface symptoms of a deeper design problem in the program code such as a bad logic or bad coding style. It is easier to discuss these subtle software design issues with a real program. In this article, a simple and popular parallel partitioning program example is used to illustrate the above various race condition issues. Partitioning Program Example Partitioning is one of the common tasks in high performance computing application domain dealing with a huge number of objects. In general, a partitioning task processes a number of input objects and then sorts and collects the objects into group containers according to their attributes. In the partitioning example shown below, the main program processes the input source data into an object array and then creates N working threads to sort and collect the objects in parallel. Each working thread will operate with the collect routine to walk through the object array to select the objects with the right attribute and store them into the group container. Each working thread also counts the objects stored in a group container and adds the group count to the global total count. In the end, it verifies that the total count is identical to the input object count to see if any objects were not collected in the partitioning.partition_main.cpp
Data Race Condition ProblemData race condition problems often occur in shared memory parallel programming models such as Pthreads[2] and OpenMP[3]. A data race condition occurs when multiple threads access a shared memory location with an undetermined accessing order and at least one access is to write a new data into the shared memory location.In the partition program example, a data race problem occurs at line 32:
total_count += group_count in the collect routine. total_count is a
global static variable shared by all the collect working threads.
When a thread reads the value of total count and adds its group
count value to the total count, another thread may intervene in the
middle of operation and read the old value of total count
simultaneously. Right after the first thread writes a new value to
the total count, the second thread may overwrite it with its own new
value and wipe out the computing effort from the first thread.
The effect of such a data race problem is unpredictable and may occur only once during hundreds of runs. The data race condition problem is often harder to debug than the problem caused by a corrupted pointer. A corrupted pointer often points to an illegal region and just causes a segmentation fault. But a data race condition tends to produce wrong values of the critical variable in an unpredictable way and makes the programmer confused about the program behavior. (Data race conditions do not occur in distributed memory parallel programs such as MPI[4,5] applications.) It is very important to restrict the usage and access of global variables in multi-thread programming. Benign Data Race ConditionNot every data race condition is a harmful problem. In the above partitioning example there is another data race that occurs in the collect routine. The second data race occurs at lines 25 and 27 of thecollect routine.
At line 25, the working thread checks the current object's collect_flag and sees
if it needs to continue operating on the current object. At line 27,
the working thread will turn on the current object's collect flag
when its attribute matches the group attribute and will be stored
into the group container. The data race condition occurs when
multiple threads process the same current object and read/write the
same object's collect_flag field in an undetermined order.
If every object's attribute will match only one group attribute in this partitioning example, this data race condition is benign and will not cause any harmful result. In fact it will make the parallel program run faster, because the flag checking will eliminate the unnecessary match operation when the current object is owned by the other group. The second data race condition only affects the efficiency of eliminating the redundant match operation. However if an object can belong to multiple groups, this data race condition is a real problem and will cause a harmful result. When a thread decides the current object belongs to its group and turns on its collect_flag, another thread may check the same object's collect_flag, simultaneously and find the object still not owned by
any group. Therefore the object may be stored in one group or
multiple groups depending on the accessing order of its collect_flag,
In shared memory parallel programming model, the threads must communicate critical data among one another. In principle all the threads need to read this critical data to be aware of the current overall program state and decide whether to proceed further. At least one thread needs to generate a new value and update the critical data. This kind of checking and update pattern is quite common in parallel programming, but it causes a data race condition in nature. If every update action needs to halt and synchronize all the participating threads, it will reduce the parallel computing efficiency significantly. Therefore a data race is a necessary compromise in many cases. The boundary between being a benign and harmful data race condition is quite subtle and depends on various things such as the input set and program running conditions. Only the programmers who really understand their programs can decide if a data race condition is benign or not. General Race ConditionA general race condition is caused by an undetermined sequence of executions that violates the program state integrity and cannot be attributed to a single memory location access. (Robert H.B. Netzer and Barton P. Miller from University of Wisconsin have a formal definition of a general race condition[6]) Data race conditions are a simple form of general race conditions. The term general race condition throughout this article means a race condition which is not data race type. A general race condition is a much harder problem to detect than a data race condition. The challenging nature of general race condition prompts many computer science researchers to study a transactional memory approach[7,8]. The basic idea of transactional memory is to apply the database transaction concept to solve the general problem of parallel programming by a simple programming model and still deliver the parallel performance on multi-processor machines.Here we extend the partition program example to demonstrate a general race problem. Let's say after the first phase of collection, the partitioning program needs to fine-tune and shuffle some objects from a group container to another group container as shown in the code below.
The above program looks clean and simple. Since both container remove_array
and add_array methods are protected by a lock, a data race condition
will not occur. However a general race may occur at the transitory
state, line 8 in the code, between the remove and add actions of the
two group containers. If there is another thread operating on the
objects in parallel, it may miss the target objects because they are
homeless at the critical time. Here is a simple example to explain
this situation. Let's say the object shuffling is to be done based
on 5 independent rules. It will be a natural thing to create 5
threads and let each thread do the shuffling based on one rule
simultaneously. A general race condition occurs when thread 2
misses the target objects which are at a transitory state due to the
shuffling action from thread 1.
The intuitive solution is to encapsulate the entire shuffle_objects method with an expensive atomic pragma[9] and disallow another thread or process to interfere in the middle of operations. However this fix may not be a complete solution to meet some application partitioning requirements. For example, if the partitioning program deals with the electronic component objects and their child pin objects, the grouping of the electronic components and their pins must be atomic to keep the parent component and child pin objects in a consistent state. Again the program code below shows a transitory state at line 25 which may produce general race condition fault. partition_shuffle.cpp (continued)
The solution of this new problem is to encapsulate the entire shuffle_components method into an atomic transaction. Therefore the
atomic transaction type requirement may be imposed on a long
sequence of operations. But when the atomic transaction of
operations becomes long and complex, it deviates from the original
purpose of parallel programming. In general the general race is
subtle to avoid in the first place and also hard to fix even when
you are lucky to discover the problem. Furthermore general race
condition can occur in distributed memory parallel models such as
MPI as well as shared memory.
The big hazard of general race condition demonstrates an important design lesson for parallel programming. Developers need to pay attention to program states and state transitions as well as data dependency. It always helps to implement some auxiliary methods to verify the internal program states during each operation stage. Understanding The Real Cause Of A Race Condition ProblemLet's come back to the collect routine in the partitioning example program. There is a third data race condition problem in the collect routine. This data race problem is quite subtle and hard to understand without a serious investigation. In the collect routine at line 28, the statementgroup_container.add(current_object) has its own data
race condition problem. It is hard to see that there is a data race
problem here by just looking at the code. As a matter of fact, this
data race problem is caused by another data race problem, the fourth
data race problem in such a simple program!
The fourth data race problem in this partitioning program is easier to explain. At line 18, the group ID in each working thread is coming from the routine argument arg which is a pointer passed from the address of loop index i in the main program. The main thread program will advance loop index i at line 44 and write a new value to memory location pointed by arg. Therefore there is a data race condition with the reading of this memory location and converting it to a group ID by a working thread (line 18) and writing a new loop index value by the main thread (line 44). When this data race condition occurs, two different working threads may get assigned with the same group ID and causing the data race condition at line 28. But the real cause of the these two data race conditions is the bad coding style at line 45 and line 18. Another good design lesson can be learned here. Always pass a data value to another thread or process instead of passing a pointer to the data. Had the partitioning program been implemented using OpenMP, the third and fourth data race conditions would not occur. Higher level programming models like OpenMP tend to reduce the risk of such tedious implementation bugs. If your multi-threaded program can be implemented in OpenMP, you should consider it. It would have been safer and more efficient to have implemented this partitioning program in OpenMP. Below is the partitioning program implemented using OpenMP. Because the collect routine input argument type is no longer restricted to the void* pointer type required for Pthreads, its input argument is
declared as integer type and takes the loop index value from the
main program as the group ID. The collect routine return type is not
restricted either. The return value is declared as integer type and
the collected group count is returned to the main program. The
OpenMP program is safer, easier to implement and also reads better
than the Pthread program.partition_omp.cpp
SummaryParallel programming is a new world for most software developers. The current multi-core processor technology trend of throughput computing[10] compels many software developers to explore parallel programming. Unfortunately, the difficulty of developing parallel software[11] is several orders harder than developing sequential software. There are various other design issues in parallel programming besides these race condition problems to consider. Nonetheless, race conditions are the most basic design problems developers will encounter. In this article, we discussed the various race condition issues that developers need to learn about and avoid when designing parallel software.Software development requires using the right set of tools, and this is especially true for parallel software. From the analysis of the parallel partitioning program shown in this article, it is easy to see how a tool to detect race conditions is sorely needed. There is no developer tool existing today to deal with the general race condition and it is doubtful if such a tool could ever exist given the intricate problem nature. Therefore many computer science researchers advocate using the transactional memory approach. However there are several tools available today to handle the data race condition problem. These include the Data Race Detection Tool[12] in the recent Sun Studio Express release from Sun, Thread Checker from Intel[13] and Helgrind from open source Valgrind[14]. Designing good quality software ultimately depends on the programmer's knowledge and experience, even with the right tools. There are several good lessons that can be learned to avoid race condition problems, and these are summarized here:
References
AcknowledgementsThe author would like to thank his Sun colleagues on the Data Race Detection Tool project who inspired this article. Some of the data race conditions discussed in this article were taken from real problems encountered in DRDT testing. The author also wants to thank his son David Chen for comments on the initial draft, his Sun colleagues Yuan Lin and Vijay Tatkar for their helpful reviews, and Richard Friedman for the final editing. |
| ||||||||||||
|
| ||||||||||||