Sun Java Solaris Communities My SDN Account Join SDN
 
Downloads

Sun Studio Express - Using The Thread Analyzer For Data Race Detection

 

Contents

  1. Introduction
  2. What is a Data Race?
  3. Finding Data Races Using the Thread Analyzer
  4. Understanding the Experiment Result
  5. What to Do after a Data Race is Found?
  6. False Positive and Benign Data Races
  7. User APIs

1. Introduction

This document presents a tutorial on using the Sun Studio Thread Analyzer for detecting data races. The tutorial will help you get started using the Thread Analyzer and understanding data races.

If you want a quick overview of the steps involved in using the Thread Analyzer, refer to the Thread Analyzer Getting Started Guide.

Throughout this document, we will use two multi-threaded programs to illustrate data races and the use of the Sun Studio Thread Analyzer.

The first program (Program 1) is a C program that finds prime numbers, parallelized using OpenMP. The source file is called omp_prime.c. There are data races in the program.

The second program (Program 2) is also a C program that finds prime numbers, but is parallelized using POSIX threads instead of OpenMP. The source file is called pthr_prime.c. There are data races in the program.

Here is a listing of Program 1 (omp_prime.c):

Program 1:


[Figure 1]


OpenMP Program for Finding Prime Numbers (omp_prime.c)

     1  #include <stdio.h>
     2  #include <math.h>
     3  #include <omp.h>
     4
     5  #define THREADS 4
     6  #define N 3000
     7
     8  int primes[N];
     9  int pflag[N];
    10
    11  int is_prime(int v)
    12  {
    13      int i;
    14      int bound = floor(sqrt ((double)v)) + 1;
    15      
    16      for (i = 2; i < bound; i++) {
    17          /* No need to check against known composites */ 
    18          if (!pflag[i]) 
    19              continue;
    20          if (v % i == 0) { 
    21              pflag[v] = 0;
    22              return 0;
    23          }
    24      }
    25      return (v > 1); 
    26  }
    27
    28  int main(int argn, char **argv)
    29  {
    30      int i;
    31      int total = 0;
    32
    33  #ifdef _OPENMP
    34      omp_set_num_threads(THREADS);
    35      omp_set_dynamic(0);
    36  #endif
    37
    38      for (i = 0; i < N; i++) {
    39          pflag[i] = 1; 
    40      }
    41      
    42      #pragma omp parallel for
    43      for (i = 2; i < N; i++) {
    44          if ( is_prime(i) ) {
    45              primes[total] = i;
    46              total++;
    47          }
    48      }
    49      printf("Number of prime numbers between 2 and %d: %d\n",
    50             N, total);
    51      for (i = 0; i < total; i++) {
    52          printf("%d\n", primes[i]);
    53      }
    54
    55      return 0;
    56  } 


Here is a listing of Program 2 (pthr_prime.c):

Program 2:


[Figure 2]


POSIX Thread Program for Finding Prime Numbers (pthr_prime.c)

     1  #include <stdio.h>
     2  #include <math.h>
     3  #include <pthread.h>
     4
     5  #define THREADS 4
     6  #define N 3000
     7
     8  int primes[N];
     9  int pflag[N];
    10  int total = 0;
    11
    12  int is_prime(int v)
    13  {
    14      int i;
    15      int bound = floor(sqrt ((double)v)) + 1;
    16
    17      for (i = 2; i < bound; i++) {
    18          /* No need to check against known composites */ 
    19          if (!pflag[i])
    20              continue;
    21          if (v % i == 0) {
    22              pflag[v] = 0;
    23              return 0;
    24          }
    25      }
    26      return (v > 1); 
    27  }
    28
    29  void *work(void *arg)
    30  {
    31      int start;
    32      int end;
    33      int i;
    34
    35      start = (N/THREADS) * (*(int *)arg) ;
    36      end = start + N/THREADS;
    37      for (i = start; i < end; i++) {
    38          if ( is_prime(i) ) {
    39              primes[total] = i;
    40              total++;        
    41          }
    42      }
    43      return NULL;
    44  }
    45
    46  int main(int argn, char **argv)
    47  {
    48      int i;
    49      pthread_t tids[THREADS-1];
    50
    51      for (i = 0; i < N; i++) {
    52          pflag[i] = 1; 
    53      }
    54
    55      for (i = 0; i < THREADS-1; i++) {
    56          pthread_create(&tids[i], NULL, work, (void *)&i);
    57      }
    58
    59      i = THREADS-1;
    60      work((void *)&i);
    61      
    62	    for (i = 0; i < THREADS-1; i++) {
    63	        pthread_join(tids[i], NULL);
    64	    }
    65	
    66	    printf("Number of prime numbers between 2 and %d: %d\n",
    67	           N, total);
    68	    for (i = 0; i < total; i++) {
    69	        printf("%d\n", primes[i]);
    70	    }
    71
    72      return 0;
    73	}

2. What is a Data Race?

A data race occurs when all of the following conditions hold:

  • Two or more threads in a single process access the same memory location concurrently, and
  • At least one of the accesses is for writing, and
  • The threads are not using any exclusive locks to control their accesses to that memory

When the above three conditions hold, the order of accesses is non-deterministic, and the computation may give different results from one run to another, depending on that order. Some data races may be benign (for example, when the memory access is used for a busy-wait), but many data races are either bugs or caused by bugs in the program..

For example, as a result of data races in Program 1 (omp_prime.c), different runs of the program may produce incorrect and inconsistent results as shown below.

Running Program 1:


[Figure 3]


% cc omp_prime.c -xopenmp=noopt -lm   
% a.out | sort -n
0
0
0
0
0
0
0
Number of prime numbers between 2 and 3000: 336
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
...
2971
2999

% a.out | sort -n
0
0
0
0
0
0
0
0
0
Number of prime numbers between 2 and 3000: 325
3
5
7
13
17
19
23
29
31
41
43
47
61
67
71
73
79
83
89
101
...
2971
2999


Similarly, as a result of data races in Program 2 (pthr_prime.c), different runs of the program may produce incorrect and inconsistent results as shown below.

Running Program 2:


[Figure 4]


% cc pthr_prime.c -lm -mt                            
% a.out | sort -n
Number of prime numbers between 2 and 3000: 304
751
757
761
769
773
787
797
809
811
821
823
827
829
839
853
857
859
863
877
881
...
2999
2999

% a.out | sort -n
Number of prime numbers between 2 and 3000: 314
751
757
761
769
773
787
797
809
811
821
823
827
839
853
859
877
881
883
907
911
...
2999
2999

3. Finding Data Races Using the Thread Analyzer

The Thread Analyzer follows the same "collect-analyze" model that the Sun Studio Performance Analyzer uses.

There are three steps involved in using the Thread Analyzer:

  • 3.1 Instrument the Source Code
  • 3.2 Create an Experiment
  • 3.3 Examine the Experiment

These three steps are described below.

3.1 Instrument the Source Code

In order to enable data race detection in a program, the program must first be instrumented. You can instrument a program by compiling the source file(s) with a special compiler option. This special option for each of C, C++, and F90 is: -xinstrument=datarace

Add the -xinstrument=datarace compiler option to the existing set of options you use to compile your program. You can apply the option to only the source files that you suspect to have data races.

Note: It is recommended that you use -g and no optimization level when compiling your program for error checking by the thread analyzer. Compile an OpenMP program with -g -xopenmp=noopt, and compile a POSIX threads program with just -g. The information reported, such as line numbers and callstacks, may be incorrect when a high optimization level is used.

3.2 Create an Experiment

Use the collect command with the -r race option to run the program and create a data-race-detection experiment during the execution of the process. If you are running an OpenMP program, make sure that the number of threads used is larger than 1.

Note: To increase the likelihood of detecting data races, it is recommended that you create several data-race-detection experiments using collect with the -r race option. Use a different number of threads and different input data in the different experiments.

3.3 Examine the Experiment

A data-race-detection experiment can be examined with the tha command, the analyzer command, or the er_print utility.

Both the tha and the analyzer commands present a GUI interface; the former presents a simplified set of default tabs, but is otherwise identical to the analyzer.

The tha GUI has a menu bar, a tool bar, and a split pane that contains tabs for the various displays. On the left-hand pane, you will notice that the following three tabs are shown by default:

  • The Races tab. Shows a list of data races detected in the program. This tab is selected by default.
  • The Dual Source tab. Shows the two source locations corresponding to the two accesses of a selected data race. The source line where a data race access occurred will appear highlighted.
  • The Experiments tab. Shows the load objects in the experiment, and lists error and warning messages.

On the right-hand pane of the tha display, the following two tabs are shown:

  • The Summary tab. Shows summary information about a data race access selected from the Races tab.
  • The Race Details tab. Shows detailed information about a data race trace selected from the Races tab.

The er_print utility, on the other hand, presents a command-line interface. The subcommands that are generally useful for examining races with er_print are:

races Report any data races detected in the experiment.

rdetail race_id

Display detailed information about the data race with the specified race_id. If the specified race_id is all, then detailed information about all data races will be displayed.

header

Display descriptive information about the experiment, and report any errors and warnings.

When displaying the data races detected, you will see the following information about each data race:

  • A unique id that identifies the data race.
  • The virtual address (Vaddr) associated with the data race. If there is more than one virtual address, then (Multiple Addresses) will appear.
  • The two accesses by two different threads that constitute the data race. For each access, the type of the access (read or write) is shown, as well as the function, offset, and line number in the source code where the access occurred.
  • The total number of traces associated with the data race. Each trace refers to the pair of thread callstacks at the time the two data race accesses occurred. If using the tha GUI, the two callstacks will be displayed in in the Race Details tab when an individual trace is selected. If using the er_print utility, the two callstacks will be displayed by the rdetail command.

Refer to the collect.1, collector.1, tha.1, analyzer.1, and er_print.1 man pages for more information.

Note:

The data races detected in a program depend on the order of memory accesses by the different threads, the number of threads used, and the input data used, among other things. Therefore, different runs of the program may reveal different data races than the ones shown below.

4. Understanding the Experiment Result

The following examples show the steps involved in instrumenting Program 1 and Program 2, creating a data-race-detection experiment, and using er_print or the tha GUI to display the data races detected.

4.1 Using the Thread Analyzer with Program 1


[Figure 5]


% cc omp_prime.c -xopenmp=noopt -lm -xinstrument=datarace -g             

% collect -r race a.out | sort -n 
0
0
0
0
0
0
0
0
0
0
...
0
0
Creating experiment database test.1.er ...
Number of prime numbers between 2 and 3000: 429
2
3
5
7
11
13
17
19
23
29
31
37
41
47
53
59
61
67
71
73
...
2971
2999

% er_print test.1.er 
(er_print) races

Total Races:  4 Experiment:  test.1.er

Race #1, Vaddr: 0xffbfeec4
      Access 1: Read,  main -- MP doall from line 42 [_$d1A42.main] + 0x00000060,
                       line 45 in "omp_prime.c"
      Access 2: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000008C,
                       line 46 in "omp_prime.c"
  Total Traces: 2

Race #2, Vaddr: 0xffbfeec4
      Access 1: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000008C,
                       line 46 in "omp_prime.c"
      Access 2: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000008C,
                       line 46 in "omp_prime.c"
  Total Traces: 1

Race #3, Vaddr: (Multiple Addresses)
      Access 1: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000007C,
                       line 45 in "omp_prime.c"
      Access 2: Write, main -- MP doall from line 42 [_$d1A42.main] + 0x0000007C,
                       line 45 in "omp_prime.c"
  Total Traces: 1

Race #4, Vaddr: 0x21418
      Access 1: Read,  is_prime + 0x00000074,
                       line 18 in "omp_prime.c"
      Access 2: Write, is_prime + 0x00000114,
                       line 21 in "omp_prime.c"
  Total Traces: 1
(er_print)
 


The following screen-shot shows the races detected in omp_primes.c displayed through the tha GUI, using the command:
% tha test.1.er

Thread Analyzer window showing
data races in test.1.er


Looking at the data races reported for Program 1, we find there are four data races in omp_prime.c:

  1. Race #1: A data race between a read from total on line 45 and a write to total on line 46
  2. Race #2: A data race between a write to total on line 46 and another write to total on the same line
  3. Race #3: A data race between a write to primes[] on line 45 and another write to primes[] on the same line
  4. Race #4: A data race between a read from pflag[] on line 18 and a write to pflag[] on line 21

4.2 Using the Thread Analyzer with Program 2


[Figure 6]


% cc pthr_prime.c -lm -mt -xinstrument=datarace -g 
% collect -r race a.out | sort -n 
Creating experiment database test.2.er ...
of type "nfs", which may distort the measured performance.
0
0
0
0
0
0
0
0
0
0
...
0
0
Creating experiment database test.2.er ...
Number of prime numbers between 2 and 3000: 328
751
757
761
773
797
809
811
821
823
827
829
839
853
857
859
877
881
883
887
907
...
2999
2999

% er_print test.2.er 
(er_print) races

Total Races:  6 Experiment:  test.2.er

Race #1, Vaddr: 0x218d0
      Access 1: Write, work + 0x00000154,
                       line 40 in "pthr_prime.c"
      Access 2: Write, work + 0x00000154,
                       line 40 in "pthr_prime.c"
  Total Traces: 3

Race #2, Vaddr: 0x218d0
      Access 1: Read,  work + 0x000000CC,
                       line 39 in "pthr_prime.c"
      Access 2: Write, work + 0x00000154,
                       line 40 in "pthr_prime.c"
  Total Traces: 3

Race #3, Vaddr: 0xffbfeec4
      Access 1: Write, main + 0x00000204,
                       line 55 in "pthr_prime.c"
      Access 2: Read,  work + 0x00000024,
                       line 35 in "pthr_prime.c"
  Total Traces: 2

Race #4, Vaddr: (Multiple Addresses)
      Access 1: Write, work + 0x00000108,
                       line 39 in "pthr_prime.c"
      Access 2: Write, work + 0x00000108,
                       line 39 in "pthr_prime.c"
  Total Traces: 1

Race #5, Vaddr: 0x23bfc
      Access 1: Write, is_prime + 0x00000210,
                       line 22 in "pthr_prime.c"
      Access 2: Write, is_prime + 0x00000210,
                       line 22 in "pthr_prime.c"
  Total Traces: 1

Race #6, Vaddr: 0x247bc
      Access 1: Write, work + 0x00000108,
                       line 39 in "pthr_prime.c"
      Access 2: Read,  main + 0x00000394,
                       line 65 in "pthr_prime.c"
  Total Traces: 1
(er_print) 
 


The following screen-shot shows the races detected in pthr_prime.c displayed through the tha GUI, using the command:
% tha test.2.er

Thread Analyzer window showing
data races in test.2.er


Looking at the data races reported for Program 2, we find there are six data races in pthr_prime.c:

  1. Race #1: A data race between a write to total on line 40 and another write to total on the same line
  2. Race #2: A data race between a read from total on line 39 and a write to total on line 40
  3. Race #3: A data race between a write to i on line 55 and a read from i on line 35
  4. Race #4: A data race between a write to primes[] on line 39 and another write to primes[] the same line
  5. Race #5: A data race between a write to pflag[] on line 22 and another write to pflag[] on the same line
  6. Race #6: A data race between a write to primes[] on line 39 and a read from primes[] on line 65

One advantage of the tha GUI is that it allows you to see, side by side, the two source locations associated with a data race. For example, if you first select Race #6 reported for Program 2 in the Races tab and then click on the Dual Source tab, you will see the following:

Thread
Analyzer window showing Dual Source for Race #6 in test.3.er


In the above screen-shot, the first access for Race #6 (line 39) is shown in the the top source display, while the second access for that data race is shown in the bottom source display. The source lines, 39 and 65, where the data race accesses occurred appear highlighted. To the left of each source line, the default metric, exclusive number of data race accesses, is shown. The number shown is a lower bound of the number of data race accesses that occurred at that line

5. What to Do after a Data Race is Found?

5.1 Check Whether the Data Race is a False Positive

A false positive data race is a data race that is reported by the Thread Analyzer, but has actually not occurred. The Thread Analyzer tries to reduce the number of false positives reported. However, there are cases where the tool is not able to do a precise job and may report false positive data races.

Since a false positive data race is simply a false alarm, and is not really a data race, it does not affect the behavior of the program and can therefore be ignored.

See Section 6.1 (False Positive Data Races) for some examples of false positive data races. Also see Section 7 (User APIs), for information on how to remove false positive data races from the report.

5.2 Check Whether the Data Race is Benign

A benign data race is an intentional data race whose existence does not affect the correctness of the program.

Some multi-threaded applications intentionally use code that may cause data races. Since the data races are there by design, no fix is required. In many cases, however, it is quite tricky to get such codes to run correctly. These data races should be reviewed carefully.

See Section 6.2 (Benign Data Races) for more detailed information about benign races.

5.3 Fix the Bug, Not the Data Race

The Thread Analyzer can help find data races in the program, but it cannot automatically find bugs in the program. While a data race may be the root cause of a bug, it may also have been introduced by a bug. It is important to find and fix the bug. Merely removing the data race is not the right approach, and could make further debugging even more difficult. Fix the bug, not the data race.

Fixing Bugs in Program 1:

To remove the data race in omp_prime.c between the read from total on line 45 and the write to total on line 46, we could put each of lines 45 and 46 in a critical section, as follows:


[Figure 7]


 Incorrect fix 

    42      #pragma omp parallel for      
    43      for (i = 2; i < N; i++) {
    44          if ( is_prime(i) ) {
                     #pragma omp critical
                     {          
    45                 primes[total] = i;
                     }
                     #pragma omp critical
                     {

    46                 total++;
                     }
    47          }
    48      }

The critical sections around each of lines 45 and 46 get rid of the data race between lines 46 and 45, because the third condition for a data race, namely that the threads are not using any exclusive locks to control their accesses to total, is not satisfied -- see Section 2 (What is a Data Race).

The critical section around line 46 ensures that the computed value of total is correct. However, the program is still incorrect. Two threads may update the same element of primes[] using the same total value. Moreover, some elements in primes[] may not be assigned a value at all.

One correct fix is to put both lines 45 and 46 in a single critical section that protects the two lines and prevents the data race, as follows:


[Figure 8]


 A correct fix 

    42      #pragma omp parallel for        
    43      for (i = 2; i < N; i++) {
    44          if ( is_prime(i) ) {
                     #pragma omp critical

                     {          
    45                 primes[total] = i;
    46                 total++;
                     }
    47          }
    48      }

Note that the single critical section shown above also fixes two other data races in omp_prime.c: The data race on prime[] at line 45, as well as the data race on total at line 46.

The fourth data race reported, between a read from pflag[] on line 18 and a write to pflag[] on line 21, is actually a benign race as it does not lead to incorrect results. Therefore, it is not essential to fix this data race. Refer to Section 6.2.1 (First Example of a Benign Data Races) for more information.

Fixing Bugs in Program 2:

To remove the data race in pthr_prime.c between the read from total on line 39 and the write to total on line 40, a single mutex can be used to protect these two lines. This also fixes two other data races in pthr_prime.c: The data race on prime[] at line 39, as well as the data race on total at line 40.

The two data races, (a) data race between the write to i on line 55 and the read from i on line 35, and (b) data race on pflag[] on line 22, reveal a problem in the shared-access to variable i by different threads. The initial thread in pthr_prime.c creates the child threads in a loop (source lines 55-57), and dispatches them to work on the function work(). The loop index i is passed to work() by address. Since all threads access the same memory location for i, the value of i for each thread will not remain unique, but will change as the initial thread increments the loop index. As different threads use the same value of i, the aforementioned data races occur. One way to fix the problem is to pass i to work() by value. This ensures that each thread will have its own private copy of i with a unique value.

To remove the data race on primes[] between the write access on line 39 and the read access on line 65, we can protect line 65 with the same mutex lock as the one used above for lines 39 and 40. However, this is not the right fix. The real problem is that the main thread may report the result (lines 50 through 53) while the child threads are still updating total and primes[] in function work(). Using mutex locks does not provide the proper ordering synchronization between the threads. One correct fix is to let the main thread wait for all child threads to join it before printing out the results.

The following is a fixed version of Program 2 (pthr_prime.c).


[Figure 9]


A fixed version of pthr_prime.c

     1	#include <stdio.h>
     2	#include <math.h>
     3	#include <pthread.h>
     4	
     5	#define THREADS 4
     6	#define N 3000
     7	
     8	int primes[N];
     9	int pflag[N];
    10	int total = 0;
    11	pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    12	
    13	int is_prime(int v)
    14	{
    15	    int i;
    16	    int bound = floor(sqrt(v)) + 1;
    17	
    18	    for (i = 2; i < bound; i++) {
    19	        /* no need to check against known composites */ 
    20	        if (!pflag[i])
    21	            continue;
    22	        if (v % i == 0) {
    23	            pflag[v] = 0;
    24	            return 0;
    25	        }
    26	    }
    27	    return (v > 1); 
    28	}
    29	
    30	void *work(void *arg)
    31	{
    32	    int start;
    33	    int end;
    34	    int i;
    35	    
    36	    start = (N/THREADS) * ((int)arg) ;
    37	    end = start + N/THREADS;
    38	    for (i = start; i < end; i++) {
    39	        if ( is_prime(i) ) {
    40	            pthread_mutex_lock(&mutex);
    41	            primes[total] = i;
    42	            total++;        
    43	            pthread_mutex_unlock(&mutex);
    44	        }
    45	    }
    46	    return NULL;
    47	}
    48	
    49	int main(int argn, char **argv)
    50	{
    51	    int i;
    52	    pthread_t tids[THREADS-1];
    53	
    54	    for (i = 0; i < N; i++) {
    55	        pflag[i] = 1; 
    56	    }
    57	
    58	    for (i = 0; i < THREADS-1; i++) {
    59	        pthread_create(&tids[i], NULL, work, (void *)i);
    60	    }
    61	
    62	    i = THREADS-1;
    63	    work((void *)i);
    64	    
    65	    for (i = 0; i < THREADS-1; i++) {
    66	        pthread_join(tids[i], NULL);
    67	    }
    68	
    69	    printf("Number of prime numbers between 2 and %d: %d\n",
    70	           N, total);
    71	    for (i = 0; i < total; i++) {
    72	        printf("%d\n", primes[i]);
    73	    }
    74    
    75      return 0;
    76
    77	}

5.4 Run the Thread Analyzer Again

After fixing the bugs in the program revealed by the detected data races, the updated program should be tested again using the Thread Analyzer. The bug fixes may change the behavior of the program. The changes may expose data races that were previously hidden, or may introduce new data races. Therefore, the updated program should be re-tested.

The following diagram shows the recommended flow of using the Thread Analyzer.


[Figure 10]


                START
                  |	
                  V
 --> Instrument the program for data race detection 
 |                |                                                                              
 |                |                                                                             
 |                V                                                                            
 |   L1: Perform a data-race-detection experiment: Repeat the same setup, or 
 |                |                                Use different input data, or                  
 |                |                                Use a different number of threads, or           
 |                |                                Use different loop schedules, or             
 |                |                                Use a different machine                         
 |                |                                                                   
 |                V                                                                    
 |   New data races reported? -- NO --> Confident about the result? --NO--> GOTO L1
 |                |                                                  |     
 |               YES                                                YES 
 |                |                                                  |           
 |                |                                                  |----> DONE            
 |                V                                                  
 |   Is the data race a false positive? <---------------------                    
 |                |             |                             ^
 |                NO            YES                           |
 |                |             |                             |
 |                |             |----> Ignore it, or try to   |
 |                |                    remove it using the    |
 |                |                    user APIs. GOTO L3.    |
 |                V                                           |
 |   L2: Is the data race benign?                             |
 |                |             |                             |
 |                NO            YES                           |
 |                |             |                             |
 |                |             |                             |
 |                V             |----> Ignore it. GOTO L3     |
 |           Fix the bug                                      |
 |                |                                           |
 |                |                                           |                                        
 |                V                                           |
 |   L3: Any more races to examine? --- YES ------------------>
 |                |
 |                NO
 |                |
 |<---------------V

6. False Positive and Benign Data Races

6.1 False Positive Data Races

Occasionally, the Thread Analyzer may report data races that are false and that actually have not occurred in the program. These are called false positives.

In most cases, false positives are caused by one of the following two reasons:

6.1.1 Use of User-Defined Synchronizations

The Thread Analyzer can recognize most standard synchronization APIs and constructs provided by OpenMP, POSIX threads, and Solaris threads. However, the tool cannot recognize roll-your-own style synchronizations, and may report false data races if such synchronizations are used. For example, the tool cannot recognize implementation of locks using CAS instructions, post and wait operations using busy-waits, etc.

Here is a typical example of a class of false positives where the program employs a common idiom of using POSIX thread condition variables.


[Figure 11]


        /* Initially ready_flag is 0 */
 
        /* Thread 1: Producer */
  100   data = ...
  101   pthread_mutex_lock (&mutex);  
  102   ready_flag = 1;
  103   pthread_cond_signal (&cond);
  103   pthread_mutex_unlock (&mutex);
  ...
 
        /* Thread 2: Consumer */
  200   pthread_mutex_lock (&mutex);
  201   while (!ready_flag) {
  202       pthread_cond_wait (&cond, &mutex);   
  203   }
  204   pthread_mutex_unlock (&mutex);
  205   ... = data;

To protect against program errors and spurious wake-ups, the pthread_cond_wait() call is usually made within a loop that tests the predicate. The test and set of the predicate is often protected by a mutex lock. In the above code, thread 1 produces the value for variable data at line 100, sets the ready_flag to 1 at line 102 to indicate that the data has been produced, and then calls pthread_cond_signal() to wake up the consumer thread 2. Thread 2 tests the predicate (!ready_flag) in a loop. When it finds that the flag is set, it consumes the data at line 205.

The write of ready_flag at line 102 and read of ready_flag at line 201 are protected by the same mutex lock, so there is no data race between the two accesses and the tool recognizes that correctly.

The write of data at line 100 and the read of data at line 205 are not protected by mutex locks. However, in the program logic, the read at line 205 always happens after the write at line 100 because of the flag variable ready_flag. So there is no data race between these two accesses to data. However, the tool will report that there is a data race between the two accesses if the call to pthread_cond_wait() (line 202) is actually not called at run time. If line 102 is executed before line 201 is ever executed, then when line 201 is executed, the loop entry test fails and line 202 is skipped. The tool monitors pthread_cond_signal() calls and pthread_cond_wait() calls and can pair them to derive synchronization. When the pthread_cond_wait() at line 202 is not called, the tool does not know that the write at line 100 is always executed before the read at line 205. Therefore, it considers them as executed concurrently and reports a data race between them.

In order to avoid reporting this kind of false positive data race, the Thread Analyzer provides a set of APIs that can be used to notify the tool when user-defined synchronizations are performed. See Section 7 (User APIs) for details.

6.1.2 Memory Recycled by Different Threads

Some memory management routines will recycle memory freed by one thread for the use by another thread. The Thread Analyzer is sometimes not able to recognize that the life-times of the same memory location used by different threads do not overlap. When this happens, the tool may report a false positive data race. The following example illustrates this kind of false positive.


[Figure 12]

    
    /*----------*/                         /*----------*/
    /* Thread 1 */                         /* Thread 2 */
    /*----------*/                         /*----------*/
    ptr1 = mymalloc(sizeof(data_t));
    ptr1->data = ...
    ...
    myfree(ptr1);

                                           ptr2 = mymalloc(sizeof(data_t));   
                                           ptr2->data = ...
                                           ...
                                           myfree(ptr2);

Thread 1 and thread 2 execute concurrently. Each thread allocates a chunk of memory that is used as its private memory. The routine mymalloc() may supply the memory freed by a previous myfree() call. If thread 2 calls mymalloc() before thread 1 calls myfree(), then ptr1 and ptr2 will get different values and there is no data race between the two threads. However, if thread 2 calls mymalloc() after thread 1 calls myfree(), then ptr1 and ptr2 may have the same value. There is no data race because thread 1 will no longer access that memory. However, if the tool does not know mymalloc() is recycling memory, it will report a data race between the write of ptr1->data and the write of ptr2->data.

This kind of false positive often happens in C++ applications when the C++ runtime library recycles memory for temporary variables. It also often happens in user applications that implement their own memory management routines.

Currently, the Thread Analyzer is able to recognize memory allocation and free operations performed via the standard malloc(), calloc(), and realloc() interfaces.

6.2 Benign Data Races

Some multi-threaded applications intentionally allow data races in order to get better performance. A benign data race is an intentional data race whose existence does not affect the correctness of the program. Three examples are given below.

6.2.1 First Example of a Benign Data Race

In Program 1 (omp_prime.c), the threads check whether an integer is prime by executing the function is_prime():


[Figure 13]


    11  int is_prime(int v)
    12  {
    13      int i;
    14      int bound = floor(sqrt ((double)v)) + 1;
    15      
    16      for (i = 2; i < bound; i++) {
    17          /* No need to check against known composites */ 
    18          if (!pflag[i]) 
    19              continue;
    20          if (v % i == 0) { 
    21              pflag[v] = 0;
    22              return 0;
    23          }
    24      }
    25      return (v > 1); 
    26  }

The Thread Analyzer reports that there is a data race between the write to pflag[] on line 21 and the read of pflag[] on line 18. However, this data race is benign as it does not affect the correctness of the final result. At line 18, a thread checks whether pflag[i], for a given value of i is equal to 0. If pflag[i] is equal to 0, that means that i is a known composite number (i.e., i is known to be non-prime). So there is no need to check whether v is divisible by i; we really only need to check whether v is divisible by some prime number. Therefore, if pflag[i] is equal to 0, then the thread continues on to the next value of i. If pflag[i] is not equal to 0 and v is divisible by i, then the thread assigns 0 to pflag[v] to indicate that v is not a prime number. It does not matter, from a correctness point of view, if multiple threads check the same pflag[] element and write to it concurrently. The initial value of a pflag[] element is 1. When the threads update that element, they assign 0 to it. That is, the threads store 0 in the same bit in the same byte of memory for that element. On current architectures, it is safe to assume that those stores are atomic. Therefore, when that element is read by a thread, the value read will either be 1 or 0. If a thread checks a given pflag[] element (line 18) before it has been assigned the value 0, then it will execute lines 20-23. If in the meantime, another thread has assigned 0 to that same pflag[] element (line 21), that does not change the final result. It just means that the first thread executed lines 20-23 when it really did not need to.

6.2.2 Second Example of a Benign Data Race

A group of threads call check_bad_array() concurrently to check whether any element of array data_array is "bad". Each thread checks a different section of the array. If a thread finds that an element is "bad", it sets the value of a global shared variable is_bad to true.


[Figure 14]


     20  volatile int is_bad = 0;
    ...

    100  /* 
    102   * Each thread checks its assigned portion of data_array, and sets 
    102   * the global flag is_bad to 1 once it finds a bad data element.
    103   */
    104  void check_bad_array(volatile data_t *data_array, unsigned int thread_id)    
    105  {
    106     int i;
    107     for (i=my_start(thread_id); i<my_end(thread_id); i++) {
    108          if (is_bad) 
    109              return;
    110          else {
    111              if (is_bad_element(data_array[i])) { 
    112                  is_bad = 1;
    113                  return;
    114              }
    115          }
    116     }
    117  }

There is a data race between the read of is_bad on line 108 and the write to is_bad on line 112. However, the data race does not affect the correctness of the final result. The initial value of is_bad is 0. When the threads update is_bad, they assign 1 to it. That is, the threads store 1 in the same bit in the same byte of memory for is_bad. On current architectures, it is safe to assume that those stores are atomic. Therefore, when is_bad is read by a thread, the value read will either be 0 or 1. If a thread checks is_bad (line 108) before it has been assigned the value 1, then it will continue executing the for loop. If in the meantime, another thread has assigned 1 to is_bad (line 112), that does not change the final result. It just means that the thread executed the for loop longer than it really needed to.

6.2.3 Third Example of a Benign Data Race

Double-checked-locking is a program idiom that provides an efficient way to initialize a singleton in multi-threaded applications. The following code illustrates such an implementation.


[Figure 15]


 100 class Singleton {
 101     public:
 102     static Singleton* instance();
 103     ...
 104     private:
 105     static Singleton* ptr_instance;
 106 };
 ...

 200 Singleton* Singleton::ptr_instance = 0;
 ...

 300 Singleton* Singleton::instance() {
 301     Singleton *tmp = ptr_instance;
 302     memory_barrier();
 303     if (tmp == NULL) {
 304         Lock();
 305         if (ptr_instance == NULL) {
 306             tmp = new Singleton;
 307             memory_barrier();
 308             ptr_instance = tmp;
 309         }
 310         Unlock();
 311     }
 312     return tmp;
 313 }

The read of ptr_instance (line 301) is intentionally not protected by a lock to make the checking of whether the singleton has already been instantiated efficient in a multi-threaded environment. Notice that there is a data race on variable ptr_instance between the read on line 301 and the write on line 308, but the program works correctly.

Writing a correct program that allows data races is often a tricky task. For example, in the above double-checked-locking example, the calls to memory_barrier() at lines 302 and 307 are used to ensure that the singleton and ptr_instance is set and read in the proper order, so all threads read them consistently. This program idiom is broken if the memory barriers are not used.

Finally, in addition to the above examples of benign data races, a large class of applications that allow data races are those that use lock-free/wait-free algorithms which are difficult to design correctly. The Thread Analyzer can help pin-point the locations of data races in these applications.

7. User APIs

The Thread Analyzer can recognize most standard synchronization APIs and constructs provided by OpenMP, POSIX threads, and Solaris threads. However, the tool cannot recognize roll-your-own style synchronizations, and may report false data races if such synchronizations are used. For example, the tool cannot recognize spin locking implemented using hand-coded assembly.

If the program includes roll-your-own style synchronizations, then a set of user API calls can be inserted in the program to inform the Thread Analyzer of those synchronizations. This allows the tool to recognize the synchronizations, thus reducing the number of false positives reported.

tha_notify_acquire_lock(id) Insert immediately before the program tries to acquire a user-defined lock
tha_notify_lock_acquired(id) Insert immediately after a user-defined lock has been acquired successfully
tha_notify_writelock_acquired(id) Insert immediately after a user-defined read-write lock has been acquired successfully in write mode
tha_notify_readlock_acquired(id) Insert immediately after a user-defined read-write lock has been acquired successfully in read mode
tha_notify_lock_released(id) Insert immediately after a user-defined lock (including a read-write lock) has been released successfully
tha_notify_sync_post_begin(id) Insert immediately before a user-defined post synchronization is performed
tha_notify_sync_post_end(id) Insert immediately after a user-defined post synchronization has been performed
tha_notify_sync_wait_begin(id) Insert immediately before a user-defined wait synchronization is performed
tha_notify_sync_wait_end(id) Insert immediately after a user-defined wait synchronization has been performed

A C/C++ version and a Fortran version of the APIs are provided. Each API call takes a single argument id, whose value should uniquely identify the synchronization object.

In the C/C++ version of the APIs, the type of the argument is uintptr, which is 4 bytes long in 32-bit mode and 8 bytes long in 64-bit mode. You need to add #include <tha_interface.h> to your C/C++ source file when calling any of the APIs.

In the Fortran version of the APIs, the type of the argument is integer of kind tha_sobj_kind which is 8-byte long in both 32-bit mode and 64-bit mode. You need to add include "tha_finterface.h" to your Fortran source file when calling any of the APIs.

To uniquely identify a synchronization object, the argument id should have a different value for each different synchronization object. One way to do this is to let the value of id be the address of the synchronization object.

The following code shows how to use the API to remove the false positive in Figure 11.


[Figure 16]


        # include <tha_interface.h>
        ...

        /* Initially ready_flag is 0 */
        ...
          
        /* Thread 1: Producer */
  100   data = ...
  101   pthread_mutex_lock (&mutex);
        tha_notify_sync_post_begin ((uintptr) &ready_flag);
  102   ready_flag = 1;
        tha_notify_sync_post_end ((uintptr) &ready_flag);

  103   pthread_cond_signal (&cond);
  103   pthread_mutex_unlock (&mutex);
 
 
        /* Thread 2: Consumer */
  200   pthread_mutex_lock (&mutex);
        tha_notify_sync_wait_begin ((uintptr) &ready_flag);
  201   while (!ready_flag) {
  202       pthread_cond_wait (&cond, &mutex);   
  203   }
        tha_notify_sync_wait_end ((uintptr) &ready_flag);
  204   pthread_mutex_unlock (&mutex);
  205   ... = data;

Refer to the libtha.3 man page for more information about using the User APIs.


(Page last modified February 12, 2007)