My First Thread
Overview
Teaching: 20 min
Exercises: 10 min
Questions
How to use OpenMP in C and Fortran?
OpenMP parallel and loop constructs
Objectives
Identify libraries that enable OpenMP functions
Identify compiler flags to enable OpenMP support
Become familiar with the main OpenMP constructs
Setting up
In principle, the exercises in this training lesson can be done on any computer with a compiler that supports OpenMP, but they have been tested on Cardiff University's Linux-based supercomputer “Hawk” using Intel compilers 2017.
Access to the system is necessary to undertake this course. It is assumed that attendees have a user account or have received a guest training account.
Please follow the instructions below to obtain some example scripts:
- Download arc_openmp.zip to a location on Hawk.
- Extract the zip file and check the extracted directory.
For example:
$ wget https://supercomputingwales.github.io/Introduction-to-Parallel-Programming-using-OpenMP/data/arc_openmp.zip
$ unzip arc_openmp.zip
$ ls arc_openmp
A node reservation has been created in partition c_compute_mdi1. To access it, users need to specify the following in their job scripts:
#SBATCH --reservation=training
#SBATCH --account=scw1148
Using OpenMP
To use OpenMP functions, their prototypes and types need to be included in our source code by importing the header file omp.h (for C and C++), the module omp_lib (for Fortran 90) or the file omp_lib.h (for Fortran 77).
Include OpenMP in your program
When using OpenMP in Fortran 77:
INCLUDE "omp_lib.h"
Declare functions at start of code e.g.
INTEGER OMP_GET_NUM_THREADS
When using Fortran 90:
USE omp_lib
When using C or C++:
#include<omp.h>
To enable the creation of multi-threaded code based on OpenMP directives, we need to pass a compilation flag to our compiler:
Compiling OpenMP programs
For Fortran codes:
~/openmp/Intro_to_OpenMP.2020/fortran$ ifort -qopenmp -o first first.f90
For C codes:
~/openmp/Intro_to_OpenMP.2020/c$ icc -qopenmp -o first first.c
My first thread
In this first example we will take a look at how to use OpenMP directives to parallelize sections of our code. But before being able to compile, we need a compiler with OpenMP support. Hawk provides several options, but for this training course we will use Intel compilers 2017:
~$ module load compiler/intel/2017/7
Our first example looks like this (there is an equivalent Fortran code too available to you):
#include <stdio.h>

int main()
{
    const int N = 10;
    int i;

    /* Split the loop iterations across a team of threads */
    #pragma omp parallel for
    for(i = 0; i < N; i++)
    {
        printf("I am counter %d\n", i);
    }

    return 0;
}
#pragma omp (!$OMP in the Fortran version) introduces an OpenMP directive; here it tells the compiler that the following section (a for loop in this case) should be parallelized. In C and C++ the parallel section is delimited by the loop’s scope, while in Fortran its end is marked explicitly with an !$OMP END directive.
Random threads
- Try running the program above. What do you notice?
- Run it a number of times, what happens?
- What happens if you compile it without the -qopenmp argument?
The PARALLEL construct
This is the fundamental OpenMP construct for threading operations that defines a parallel region. When a thread encounters a parallel construct, a team of threads is created to execute the parallel region. The thread that encountered the parallel construct becomes the master thread of the new team, with a thread number of zero for the duration of the new parallel region. All threads in the new team, including the master thread, execute the region. The syntax of the parallel construct is as follows:
Fortran:
!$OMP PARALLEL [clause,[clause...]]
block
!$OMP END PARALLEL
C, C++:
#pragma omp parallel [clause,[clause...]]
{
block
}
Clauses
OpenMP is a shared memory programming model where most variables are visible to all threads by default. However, private variables are sometimes necessary to avoid race conditions, and values often need to be passed between the sequential part and the parallel region. Clauses are data-sharing attributes that manage the data environment; they are appended to OpenMP directives.
For example, a private clause declares variables to be private to each thread in a team. The private copies are not initialized from the original object on entering the parallel region; the firstprivate clause is the variant that does so. A shared clause explicitly shares variables among all the threads in a team; this is the default behaviour. A full list of clauses can be found in the OpenMP documentation.
Loop constructs
The DO (Fortran) directive splits the following do loop across multiple threads.
!$OMP DO [clause,[clause...]]
do_loop
!$OMP END DO
Similarly, the “for” (C) directive splits the following for loop across multiple threads. Notice that no curly brackets are needed in this case.
#pragma omp for [clause,[clause...]]
for_loop
OpenMP clauses can also define how the loop iterations run across threads. They include:
SCHEDULE: How the loop iterations are divided into chunks and allocated to threads.
Possible options are:
- schedule(static, chunk-size): Gives threads chunks of size chunk-size in circular order around thread id. chunk-size is optional; the default is to divide up the work to give one chunk to each thread.
- schedule(dynamic, chunk-size): Gives threads chunks of size chunk-size and, when a thread completes its chunk, gives it another until all iterations are done. chunk-size is optional; the default is 1.
- schedule(guided, chunk-size): The minimum chunk size is given by chunk-size, but the initial chunk size is given by the number of unassigned iterations divided by the number of threads.
- schedule(auto): The decision is left to the compiler or runtime.
For loops declared with schedule(runtime), the auto schedule can also be selected with the OMP_SCHEDULE environment variable or by calling omp_set_schedule in the code; both take effect at runtime.
If no SCHEDULE is given then a compiler-dependent default is used.
ORDERED: Parts of the loop marked as ordered are executed as they would be in serial, i.e. in iteration order.
These clauses are useful when trying to fine-tune the behaviour of our code, but caution should be observed since they can introduce unwanted communication overheads.
Working with private variables and loop constructs
Consider the previous example. What happens if you remove the for (DO) keyword from our OpenMP directive? Is this what you expected? What happens if you add the private(i) (PRIVATE(i) in Fortran) clause? How does the output change? Why?
WORKSHARE
The WORKSHARE construct is a Fortran feature that consists of a region with a single structured block (section of code). Statements in the workshare region are divided into units of work and executed (once) by threads of the team. A good example of such a block would be array assignment statements (i.e. no DO loops).
!$OMP WORKSHARE
block
!$OMP END WORKSHARE
Thread creation
Creating OpenMP threads adds an overhead to the program’s overall runtime, and for small loops this can be expensive enough that it doesn’t make sense to parallelize that section of the code. If there are several sections of code that require threading, it is better to parallelize the entire program and specify where the workload should be distributed among the team of threads. For example, the following opens a new parallel region for each loop:
#pragma omp parallel for ...
for (int i=0; i<k; i++)
nwork1...
#pragma omp parallel for ...
for (int i=0; i<k; i++)
nwork2...
It is better to open a single parallel region and distribute each loop inside it:
#pragma omp parallel ...
{
#pragma omp for ...
for (int i=0; i<k; i++)
nwork1...
#pragma omp for ...
for (int i=0; i<k; i++)
nwork2...
}
Key Points
Importing omp.h in C and C++, omp_lib in Fortran 90 and omp_lib.h in Fortran 77 allows OpenMP functions to be used in your code
The OpenMP construct parallel in C and C++ (PARALLEL in Fortran) instructs the compiler to create a team of threads to execute the region of code enclosed by the construct