
OpenMP Support in Sun Studio Compilers and Tools

 
By Nawal Copty, Scalable Systems Group, Sun Microsystems, December 13, 2005  
OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to express multi-threaded shared-memory parallelism in C, C++, and Fortran programs. OpenMP is fast becoming the standard paradigm for parallelizing applications. With a relatively small amount of coding effort, programmers can obtain scalable performance for their applications on a shared-memory multi-processor system.

This paper presents an overview of the OpenMP model of computation, and describes OpenMP support in the Sun Studio compilers and tools. In addition, the paper reports on the performance of the SPEC OMP2001 benchmarks and outlines directions for future work.

What Is OpenMP


OpenMP is an Application Programming Interface (API) that can be used to explicitly specify multi-threaded shared-memory parallelism in C, C++, and Fortran programs.  The OpenMP API is composed of three components:
  • Compiler directives,
  • Runtime library routines, and
  • Environment variables
In C/C++, OpenMP directives are specified using the #pragma mechanism.  In Fortran, OpenMP directives are specified using special comments that are identified by unique sentinels (in fixed form source files, the sentinels !$omp, c$omp, and *$omp are recognized).

The OpenMP Specification is the definitive reference on OpenMP and can be found at [1].  The latest Specification is Version 2.5, which is a combined Specification for C, C++, and Fortran.  The Specification is owned and managed by the OpenMP Architecture Review Board (ARB) [2], a non-profit organization established in 1997.

Membership in the ARB is open to corporations, research organizations, and academic institutions.  Sun Microsystems is a member of the OpenMP ARB and plays a prominent role in shaping the future of the API.  Sun Microsystems takes an active part in the weekly meetings of the language committee of the ARB, where the Specification is discussed and updated.  In addition, a Sun Microsystems Distinguished Engineer is on the Board of Directors of the ARB, and a Sun Microsystems engineer serves as the Secretary of the ARB.

The main motivations for using OpenMP are performance, scalability, portability, and standardization.  As an application's requirements and data set become larger, more computing power is needed.  Higher performance can be achieved by utilizing many processors together to execute a single application.  OpenMP provides a widely supported API for programming shared-memory machines.  With a relatively small amount of coding effort, users can obtain scalable performance for their applications on these machines.

The Sun Studio application development suite [3] is a comprehensive, integrated set of compilers and tools for the development and deployment of applications on Sun platforms.  The Sun Studio compilers support the OpenMP Specification Version 2.5.  In addition, the Sun Studio tools support writing, debugging, and analyzing the performance of OpenMP applications.

The OpenMP Execution Model


The underlying machine model for OpenMP is a shared-memory machine, where all the processors access one global memory. Examples of shared memory machines include the Sun Fire V40z server with up to four dual-core AMD Opteron processors, the Sun Fire V890 server with up to eight dual-core UltraSPARC IV+ processors, and the Sun Fire Enterprise E25K with up to 72 dual-core UltraSPARC IV+ processors. Figure 1 gives a simplified view of a shared-memory machine with n processors.

Figure 1: Simplified view of a shared-memory machine with n processors



OpenMP uses the fork-join model of parallel execution.  When a thread encounters a parallel construct, the thread creates a team of threads composed of itself and some additional (possibly zero) number of threads.  The thread that encounters the parallel construct is called the master thread of the team.  The other threads are called slave threads of the team.

All team members execute the code inside the parallel construct.  When a thread finishes its work within the parallel construct, it waits at an implicit barrier at the end of the construct.  When all team members have arrived at the barrier, the master thread alone continues execution of user code beyond the end of the parallel construct.  Any number of parallel constructs can be specified in a single program.

OpenMP Directives


OpenMP has a rich set of directives that the user can use to specify parallelism in a program.  In this section, we give examples of three OpenMP directives, namely the PARALLEL directive, the DO/for directive, and the SECTIONS directive.  More detailed information about these and other OpenMP directives can be found in [1] and [4].

Since OpenMP is based on the shared-memory programming model, variables are shared by default.  OpenMP data scope attribute clauses can be used to explicitly define the scope of variables. These clauses include the SHARED, PRIVATE, FIRSTPRIVATE, LASTPRIVATE, and REDUCTION clauses.

The number of threads used to execute a parallel construct can be specified by setting the OMP_NUM_THREADS environment variable, by calling the omp_set_num_threads runtime library routine, or by using the NUM_THREADS clause.

Parallel

The PARALLEL directive defines a region of code to be executed in parallel by multiple threads.  All the threads participating in the execution of the PARALLEL construct will execute the same region of code.  In effect, the region of code is replicated across the threads.

When a thread reaches a PARALLEL construct, it creates a team of threads and becomes the master of the team.  The master has thread number 0 within the team.  The other threads are numbered 1, 2, ..., n-1, where n is the total number of threads in the team.  There is an implied barrier at the end of the PARALLEL construct.  Only the master thread continues execution past this point.

See Appendix A.1 for an example of the PARALLEL construct.

DO/for

The DO/for directive is a work-sharing directive that applies to a DO loop (Fortran) or a for loop (C/C++).  The DO/for construct divides the iterations of the DO/for loop among the team of  threads that encounters the construct.  There is an implied barrier at the end of  the DO/for construct, unless a  NOWAIT clause is specified.

It is the programmer's responsibility to ensure that the iterations of a loop with the DO/for directive have no dependencies.  That is, the result of one loop iteration does not depend on the result of any other loop iteration.  If this condition holds, then two different loop iterations can be executed in parallel by two different threads.

See Appendix A.2 for an example of the DO/for directive.

Sections

The SECTIONS directive is a work-sharing directive that applies to a set of structured blocks of code (each structured block is called a SECTION).  The SECTIONS construct divides the blocks of code among the team of threads that encounters the construct.  Each block  is executed once by a thread in the team.  There is an implied barrier at the end of the SECTIONS construct, unless a  NOWAIT clause is specified.

It is the programmer's responsibility to ensure that the structured blocks of code in the SECTIONS construct are independent of each other and can be executed in parallel by different threads.

See Appendix A.3 for an example of the SECTIONS directive.

OpenMP Support in Sun Studio Software


The -xopenmp compiler option instructs the Sun Studio compilers to recognize OpenMP directives in a program.

OpenMP support in the compilers consists of two parts.  First, the compiler processes OpenMP directives and transforms the code so it can be executed by multiple threads.  Second, a runtime library provides support for thread management, synchronization, and scheduling of work.

Compiler Support

Figure 2 shows the various stages and components of a compiler.  These consist of  a language-specific Front-End, an Optimizer, and a machine-dependent Code Generator.  The Front-End component of the compiler recognizes OpenMP directives, processes the information associated with them, and then passes that information to the Optimizer.  The Optimizer processes the information passed by the Front-End and transforms the code so it can be executed by multiple threads.  In transforming the code, the Optimizer inserts calls to the OpenMP runtime support library, libmtsk.  Finally, the Code Generator generates target machine code.

Figure 2: Stages and components of the compiler

When the Optimizer processes a PARALLEL construct in the program, it does the following:

First, the Optimizer analyzes the scopes of variables in the PARALLEL construct.  That is, it determines whether a variable accessed in the body of the PARALLEL construct is SHARED, PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION, etc.

Second, the Optimizer extracts the body of the PARALLEL construct and places it in a separate routine, called an outlined routine.  Variables that are SHARED are passed as arguments to the outlined routine so they can be accessed by multiple threads.  Variables that are PRIVATE are declared to be local in the outlined routine, so separate copies of these variables are allocated on different thread stacks. Additional code is added to the outlined routine to initialize FIRSTPRIVATE variables, update LASTPRIVATE variables, combine reduction results, etc.

Third, the Optimizer replaces the original PARALLEL construct by a call to the libmtsk library routine, __mt_MasterFunction_.  The address of the outlined routine is passed as an argument to __mt_MasterFunction_.  When  __mt_MasterFunction_  is executed at runtime, it dispatches a team of threads to execute the outlined routine.

The outlining transformation described above has two advantages.  First, an outlined routine defines a context for parallel execution that can be easily executed by multiple threads.  Second, outlining simplifies storage management, since variables that are local to the outlined routine are automatically allocated on different thread stacks, which makes them local to each thread.

Figures 3(a) and 3(b) illustrate the outlining transformation.  The body of the PARALLEL  construct in Figure 3(a) is extracted and placed in the outlined routine __mf_par_001.   Since variable n is SHARED in the PARALLEL construct, its address is passed as an argument to __mf_par_001, so all threads executing __mf_par_001 would access the same copy.  On the other hand, since variable id is PRIVATE, it is declared local to __mf_par_001, so every thread executing __mf_par_001 would have its own copy of the variable on its stack.

Figure 3(b) shows how the PARALLEL construct is replaced by a call to __mt_MasterFunction_, and the address of __mf_par_001 is passed as an argument to __mt_MasterFunction_.

Other constructs, such as the work-sharing DO/for and SECTIONS, are processed by the Optimizer in a similar fashion.  The Optimizer, however, replaces a work-sharing construct by a call to the libmtsk library routine __mt_WorkSharing_.

Figure 3: The outlining transformation


Automatic Parallelization

Besides OpenMP directive-based parallelization, the Optimizer can also automatically parallelize loops in a program.  When a program is compiled with the -xautopar option, the Optimizer examines all loops in the program and uses data-flow analysis to determine which loops have iterations that can be executed independently of each other.  The Optimizer then transforms these loops in a fashion similar to that described above for OpenMP.

Runtime Library Support

The OpenMP runtime support library, libmtsk, provides support for thread management, synchronization, and scheduling of work.  The library is implemented on top of the POSIX threads library (libpthread).

As described above, the Optimizer replaces the code for a PARALLEL construct by a call to __mt_MasterFunction_. When a thread calls __mt_MasterFunction_, it creates a team of threads to execute the PARALLEL construct and it becomes the master thread of the team.  Then the master  thread dispatches the slave  threads to work on the outlined routine.  The master  thread itself also takes part in executing the outlined routine.  When finished, the master  thread synchronizes with other threads in the team via a call to the barrier routine __mt_EndOfTask_Barrier_.   The general logic of __mt_MasterFunction_ is shown in Figure 4. 

The runtime library, libmtsk, maintains a pool of threads that can be used as slave threads for PARALLEL constructs.  The threads in the pool are created via calls to the POSIX threads library routine pthread_create.  When a master thread needs to create a team of more than one thread, the master thread checks the pool and grabs idle threads from the pool, making them slave threads of the team.  When the team finishes executing the PARALLEL region, the slave threads are returned to the pool.

Throughout its lifetime, a slave thread executes the runtime routine slave_startup_function, where it alternates between waiting for the next PARALLEL task and executing a PARALLEL task.  While waiting for a PARALLEL task, a slave thread may be spinning or sleeping. This behavior can be controlled by setting the environment variable SUNW_MP_THR_IDLE.  When a thread finishes working on a task, it synchronizes with the master thread and other threads in the team via a call to the barrier routine __mt_EndOfTask_Barrier_.  The general logic of slave_startup_function  is shown in Figure 5.

OpenMP allows PARALLEL regions to be nested inside each other.  The runtime library, libmtsk, supports nested parallelism.  If nested parallelism is enabled by setting the environment variable OMP_NESTED or by calling omp_set_nested, then a nested PARALLEL region can be executed by a team that consists of more than one thread.

In addition, the runtime library, libmtsk, supports multiple user threads.  If a user program is threaded via explicit calls to the POSIX threads library (libpthread) or the Solaris OS threads library (libthread), then libmtsk will treat each of the user program threads as a master thread, and provide it with its own team of slave threads.

Figure 4: General logic of __mt_MasterFunction_


Tools Support

The Sun Studio application development suite provides a variety of tools that facilitate and support OpenMP programming.  These include tools that aid the programmer in parallelizing a program using OpenMP, as well as tools for checking, debugging, and analyzing the performance of OpenMP programs.  Some of these tools are described below.

Automatic Scoping of Variables

The process of manually specifying scopes of variables when writing an OpenMP program is both tedious and error-prone.  To improve productivity, an autoscoping feature was implemented in the Sun Studio compilers, as a Sun-specific extension to OpenMP.  The Sun Studio compilers are currently the only commercially available compilers that support this feature.

The autoscoping feature leverages the analysis capability of the Optimizer to determine the appropriate scopes of variables.  The programmer specifies which variables in a given PARALLEL construct should be scoped automatically by the Optimizer.  The Optimizer determines the appropriate scopes of  these variables by analyzing the program and applying a set of autoscoping rules.  The scoping results are displayed in an annotated source code listing as compiler commentary.  This automatic scoping feature offers a very attractive compromise between automatic and manual parallelization.

For  additional information on autoscoping, refer to [7].

Static Error Checking

Under the control of the compiler option -vpara  (Fortran) or -xvpara (C),  the compiler can check a program for a variety of static errors.   These include invalid nesting of OpenMP constructs, invalid scoping of variables, data races, etc.

In addition, under the control of the compiler option -XlistMP, the Fortran compiler can perform global (inter-procedural) analysis of the program and report inconsistencies and possible runtime problems in the code.  Problems reported include invalid use of OpenMP directives, errors in alignment, disagreement in the number or type of procedure arguments, etc.

Runtime Error Checking

If the environment variable SUNW_MP_WARN is set to TRUE, then the runtime library libmtsk checks the program for a variety of runtime errors.  Problems reported include semantic errors that violate the OpenMP Specification, invalid nesting of OpenMP constructs, inconsistencies in the use of OpenMP directives, invalid chunk sizes, deadlock at barriers, etc.

OpenMP Debugging

The dbx tool in the Sun Studio software can be used to debug C, C++, and Fortran OpenMP programs. An OpenMP program should first be prepared for debugging with dbx by compiling it with the options -xopenmp=noopt -g.

All of the dbx commands that operate on threads can be used for OpenMP debugging. dbx allows the user to single-step into a PARALLEL region, set breakpoints in the body of an OpenMP construct, as well as print the values of SHARED, PRIVATE, THREADPRIVATE, etc., variables for a given thread.

Performance Analysis

The Collector and Performance Analyzer [8] are a pair of tools in Sun Studio that can be used to collect and analyze performance data for an application.  Both tools can be used from the command line or from a graphical user interface.

The Collector tool collects performance data using a statistical method called profiling and by tracing function calls. The data can include call-stacks, microstate accounting information, thread synchronization delay data, hardware counter overflow data, memory allocation data, and summary information for the operating system and the process.

The Performance Analyzer processes the data recorded by the Collector, and displays various metrics of performance at program, function, caller-callee, source-line, and assembly instruction levels. The Performance Analyzer can also display the raw data in a graphical format as a function of time.

The Performance Analyzer can present the performance of an OpenMP program in either of two modes: User mode and Machine mode.

In User mode, the Performance Analyzer presents profile data in a manner that matches the user's intuitive understanding of the program.  In this mode, the master thread and slave thread call-stacks are reconciled, and artificial functions are added to the call-stack when the OpenMP runtime library is performing certain operations.

In Machine mode, the Performance Analyzer presents the call-stacks as measured, with no transformations done and no artificial functions constructed, thus exposing the implementation details of the runtime library, libmtsk.

Compiler Commentary

Compiler commentary in annotated source code listings informs the user about the various optimizations and transformations that have been applied to the source code by the compiler.  To generate compiler commentary, the program should be compiled with -g.  The compiler commentary can be viewed in an annotated source code listing by using the Performance Analyzer or by running the command-line utility er_src.

OpenMP Performance


SPEC OMP2001 is a software benchmark produced by the High-Performance Group (HPG) of the Standard Performance Evaluation Corporation (SPEC).  The benchmark is designed to evaluate the performance of real scientific and engineering applications parallelized using OpenMP, and is representative of high performance technical computing applications from the areas of chemistry, mechanical engineering, climate modeling, and physics (see Table 1).

The SPEC OMP2001 benchmark contains two suites. The first suite, SPEC OMPM2001, uses a medium-sized data set and is designed to measure the performance of shared-memory systems with between four and 32 processors.  The second suite, SPEC OMPL2001, uses a large-sized data set and is designed to measure the performance of systems with a larger number of processors.

Name          Application Area                 Language
311.wupwise   Quantum chromodynamics           Fortran
313.swim      Shallow water modeling           Fortran
315.mgrid     Multi-grid solver                Fortran
317.applu     Partial differential equations   Fortran
321.equake    Earthquake modeling              C
325.apsi      Air pollutants                   Fortran
327.gafort    Genetic algorithm                Fortran
329.fma3d     Crash simulation                 Fortran
331.art       Neural network simulation        C
318.galgel    Fluid dynamics analysis          Fortran
332.ammp      Computational chemistry          C

Table 1: Applications in the SPEC OMP2001 Benchmark


The SPEC OMP2001 benchmark exhibits superior scaling and performance on Sun systems.  Figure 6 shows the scaling of the SPEC OMPL2001 suite (base performance) on a Sun Fire 6800 configured with 1.2 GHz UltraSPARC III Cu processors.  Figure 7 shows the scaling of the SPEC OMPL2001 suite (base performance) on a Sun Fire 15K also configured with 1.2 GHz UltraSPARC III Cu processors.

Sun Microsystems has announced several world-record performance results for the SPEC OMP2001 benchmark:
  • In June 2003, Sun Microsystems announced a world record for the SPEC OMPL2001 suite (peak performance) on a Sun Fire 15K server configured with 72 UltraSPARC III Cu processors.  The Sun Fire 15K server was the first server to break the 200,000 mark, with a score of 213,466.
  • In February 2004, Sun Microsystems established a new world record on the SPEC OMPL2001 suite (peak performance) on a Sun Fire E25K server configured with 72 UltraSPARC IV processors.  The Sun Fire E25K system set a record SPEC OMPL2001 peak performance score of 316,182.
  • In March 2005, Sun Microsystems announced new world-record SPEC OMPM2001 results in the two- and four-thread categories.  The peak result of 12,434 on the Sun Fire V40z server in a four-CPU configuration outperformed the scores reported by other commercially available compilers by up to 43 percent.
Detailed SPEC OMP2001 benchmark results can be found at [10].

Figure 6: Scaling of OMPL2001 (Base) on Sun Fire 6800

Figure 7: Scaling of OMPL2001 (Base) on Sun Fire 15K

Future Directions


Sun Microsystems continues to invest in delivering world-class, high quality OpenMP support in its compilers and tools.  Current and future projects include the following:
  • New OpenMP features.  Sun continues to track changes in the OpenMP Specification and play an active part in its evolution.
  • OpenMP-specific optimizations for improved performance. These include compiler optimizations, such as removal of redundant barriers, as well as optimizations in the runtime library for reducing the overhead of parallelization.
  • Enhanced tools support.  Sun continues to invest in tools that aid the programmer in writing, debugging, and analyzing the performance of OpenMP programs.  These tools include enhanced automatic scoping for OpenMP programs, data race detection, and interactive tools to assist programmers in parallelizing their applications.
  • Architectural support.  Sun continues to enhance the performance of its OpenMP implementation.  Work in this area includes improving performance on Non-Uniform Memory Access (NUMA) machines and on Chip Multi-threading (CMT) architectures.

References


1. OpenMP Specification,  http://www.openmp.org/drupal/node/view/8
2. OpenMP Architecture Review Board, http://www.openmp.org
3. Sun Studio Software, http://www.sun.com/software/products/studio/index.html
4. Sun Studio 11 OpenMP API User's Guide, http://docs.sun.com/doc/819-3694
5. Sun Fire E25K server, http://www.sun.com/servers/highend/sunfire_e25k/index.xml
6. The SPEC OMP benchmark suite,  http://www.spec.org/omp
7. Yuan Lin, Christian Terboven, Dieter an Mey, and Nawal Copty, “Automatic Scoping of Variables in Parallel Regions of an OpenMP Program”, WOMPAT 2004.
8. Sun Studio 11: Performance Analyzer, http://docs.sun.com/app/docs/doc/819-3687
9. Myungho Lee, Brian Whitney, and Nawal Copty, “Performance and Scalability of OpenMP Programs on the Sun Fire E25K Throughput Computing Server”, WOMPAT 2004.
10. SPEC OMP2001 benchmarks results, http://www.spec.org/omp/results

APPENDIX A.1: Parallel Directive Example


The following is a simple “Hello World” program with a PARALLEL directive.  The number of threads to be used is specified via a call to the OpenMP runtime library routine omp_set_num_threads.  Dynamic adjustment of the number of threads is disabled by calling the OpenMP runtime library routine omp_set_dynamic.

The initial thread of the program executes sequentially until it reaches the PARALLEL construct.  At that point, the initial thread creates a team of 10 threads.  The team is composed of the initial thread itself (master of the team) and 9 other threads (slaves of the team).  All the threads in the team execute the code enclosed in the PARALLEL construct concurrently.  When a thread reaches the end of the PARALLEL construct, it waits at the implicit barrier at the end of the construct.  When all the threads have reached the barrier, only the master thread continues executing the code following the PARALLEL construct.

Fortran – PARALLEL Directive Example:
      PROGRAM HELLO
      USE OMP_LIB
      INTEGER TID
      CALL OMP_SET_DYNAMIC (.FALSE.)
      CALL OMP_SET_NUM_THREADS (10)
!$OMP PARALLEL PRIVATE (TID)
!     Obtain thread ID.
      TID = OMP_GET_THREAD_NUM()
!     Print thread ID.
      PRINT *, 'Hello World from thread = ', TID
!$OMP END PARALLEL
      END

C/C++ – PARALLEL Directive Example:
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int tid;

    omp_set_dynamic(0);
    omp_set_num_threads(10);
    #pragma omp parallel private(tid)
    {
        /* Obtain thread ID. */
        tid = omp_get_thread_num();
        /* Print thread ID. */
        printf ("Hello World from thread = %d\n", tid);
    }
    return 0;
}

APPENDIX A.2: DO/for Directive Example


The following is an example program with a DO/for directive.  The initial thread of the program executes sequentially until it reaches the PARALLEL construct.  At that point, the initial thread creates a team of 20 threads.  The team is composed of the initial thread itself (master of the team) and 19 other threads (slaves of the team).  All the threads in the team execute the code enclosed in the PARALLEL construct concurrently.

When the threads in the team encounter the DO/for construct, the 100 iterations of the loop are divided among the 20 threads.  When the iterations are distributed evenly, as with a static schedule, each thread executes 5 iterations of the loop.  The threads execute their iterations concurrently.  When a thread completes its work, it waits at the implicit barrier at the end of the DO/for loop.  When all threads have reached the barrier, the threads continue executing the PARALLEL region code.

Fortran – DO Directive Example:
      PROGRAM VECTOR_ADD
      USE OMP_LIB
      PARAMETER (N=100)
      INTEGER N, I
      REAL A(N), B(N), C(N)
      CALL OMP_SET_DYNAMIC (.FALSE.)
      CALL OMP_SET_NUM_THREADS (20)
!     Initialize arrays A and B.
      DO I = 1, N
         A(I) = I * 1.0
         B(I) = I * 2.0
      ENDDO
!     Compute values of array C in parallel.
!$OMP PARALLEL SHARED(A, B, C), PRIVATE(I)
!$OMP DO
      DO I = 1, N
         C(I) = A(I) + B(I)
      ENDDO
!$OMP END PARALLEL
      PRINT *, C(10)
      END

C/C++ – For Directive Example:
#include <stdio.h>
#include <omp.h>
#define N 100

int main(void)
{
    float a[N], b[N], c[N];
    int i;

    omp_set_dynamic(0);
    omp_set_num_threads(20);
    /* Initialize arrays a and b. */
    for (i = 0; i < N; i++)
    {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }
    /* Compute values of array c in parallel. */
    #pragma omp parallel shared(a, b, c) private(i)
    {
        #pragma omp for
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }
    printf ("%f\n", c[10]);
    return 0;
}

APPENDIX A.3: Parallel Sections Example


The following is an example program with a SECTIONS directive applied to three sections of code. The initial thread of the program executes sequentially until it reaches the PARALLEL construct. At that point, the initial thread creates a team of 3 threads. The team is composed of the initial thread itself (master of the team) and 2 other threads (slaves of the team). All the threads in the team execute the code enclosed in the PARALLEL construct concurrently.

When the threads in the team encounter the SECTIONS construct, the 3 sections are divided among the 3 threads in the team. Each section is executed only once by a thread in the team. When a thread completes its work, it waits at the implicit barrier at the end of the SECTIONS construct. When all threads have reached the barrier, the threads continue executing the PARALLEL region code.

Fortran – SECTIONS Directive Example:
      PROGRAM SECTIONS
      USE OMP_LIB
      INTEGER SQUARE
      INTEGER X, Y, Z, XS, YS, ZS
      CALL OMP_SET_DYNAMIC (.FALSE.)
      CALL OMP_SET_NUM_THREADS (3)
      X = 2
      Y = 3
      Z = 5
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
      XS = SQUARE(X)
      PRINT *, "ID = ", OMP_GET_THREAD_NUM(), "XS =", XS
!$OMP SECTION
      YS = SQUARE(Y)
      PRINT *, "ID = ", OMP_GET_THREAD_NUM(), "YS =", YS
!$OMP SECTION
      ZS = SQUARE(Z)
      PRINT *, "ID = ", OMP_GET_THREAD_NUM(), "ZS =", ZS
!$OMP END SECTIONS
!$OMP END PARALLEL
      END

      INTEGER FUNCTION SQUARE(N)
      INTEGER N
      SQUARE = N*N
      END

C/C++ – SECTIONS Directive Example:
#include <stdio.h>
#include <omp.h>

int square(int n);

int main(void)
{
    int x, y, z, xs, ys, zs;

    omp_set_dynamic(0);
    omp_set_num_threads(3);
    x = 2;
    y = 3;
    z = 5;
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                xs = square(x);
                printf ("id = %d, xs = %d\n", omp_get_thread_num(), xs);
            }
            #pragma omp section
            {
                ys = square(y);
                printf ("id = %d, ys = %d\n", omp_get_thread_num(), ys);
            }
            #pragma omp section
            {
                zs = square(z);
                printf ("id = %d, zs = %d\n", omp_get_thread_num(), zs);
            }
        }
    }
    return 0;
}

int square(int n)
{
    return n*n;
}

About the Author
Nawal Copty is a staff engineer in the Scalable Systems Group, and OpenMP project lead.
 