This lesson is in the early stages of development (Alpha version)

Introduction to Parallel Programming using OpenMP: Glossary

Key Points

Introduction
  • A core is a physically independent execution unit capable of running one program thread.

  • A node is another term for a computer that forms part of a network.

  • In recent years, computers with several cores per CPU have become the norm and are likely to remain so in the future.

  • Learning parallelization techniques lets you exploit multi-core systems more effectively

Single vs Parallel computers
  • Clock speed and the number of cores are two key factors that affect a computer’s performance

  • A single computer is typically understood as one with a single-core CPU

  • A parallel computer is typically understood as one with a multi-core CPU

  • Multiple cores in a parallel computer are able to share a CPU’s memory

Shared vs Distributed Memory
  • Shared memory is the physical memory shared by all CPUs in a multi-processor computer

  • Distributed memory is the system created by linking the shared memories of different computers

  • It is important to distribute the workload as evenly as possible among processors to increase performance

Using shared memory
  • OpenMP is an API that defines directives to parallelize programs written in Fortran, C and C++

  • OpenMP relies on compiler directives, called pragmas in C and C++, to define sections of code that should run in parallel by distributing the work across threads, as sketched below.
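
  As a minimal sketch of what such a directive looks like in C (the loop and
  array here are purely illustrative, and the code assumes an OpenMP-enabled
  compiler, e.g. gcc -fopenmp), a single pragma placed before a loop asks the
  compiler to distribute its iterations across a team of threads:

    #include <stdio.h>

    int main(void) {
        int i;
        double a[1000];

        /* the pragma marks the loop as work to be shared among threads */
        #pragma omp parallel for
        for (i = 0; i < 1000; i++) {
            a[i] = 2.0 * i;
        }

        printf("a[999] = %f\n", a[999]);
        return 0;
    }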

My First Thread
  • Including omp.h in C/C++ (or using omp_lib in Fortran 90 and omp_lib.h in Fortran 77) allows OpenMP functions to be used in your code

  • The OpenMP parallel construct in C/C++ (PARALLEL in Fortran) instructs the compiler to create a team of threads to execute the region of code enclosed by the construct, as shown in the example below
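
  A minimal C sketch combining the two points above (the printed message is
  only illustrative; compile with an OpenMP-enabled compiler such as
  gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>   /* declarations of the OpenMP runtime functions */

    int main(void) {
        /* the parallel construct creates a team of threads; every
           thread in the team executes the enclosed block */
        #pragma omp parallel
        {
            printf("Hello, World!\n");
        }
        return 0;
    }

  When run, the greeting is printed once per thread in the team.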

Basic thread control
  • OpenMP threads can be identified by querying their ID using OpenMP functions

  • The OpenMP barrier construct allows us to define a point that all threads must reach before program execution continues

  • The OpenMP master construct allows us to define regions of code that should be executed only by the master thread; the sketch below combines thread IDs, a barrier and a master region
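
  A short C sketch combining the three points above; the messages are
  placeholders:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            int id = omp_get_thread_num();   /* query this thread's ID */
            printf("Thread %d is working\n", id);

            /* every thread waits here until the whole team has arrived */
            #pragma omp barrier

            /* only the master thread (ID 0) executes this region */
            #pragma omp master
            printf("Master thread %d continues after the barrier\n", id);
        }
        return 0;
    }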

Setting number of threads
  • We can use the environment variable OMP_NUM_THREADS to control how many threads are created by OpenMP by default

  • The num_threads clause of the parallel construct (or the omp_set_num_threads() function) has a similar effect and can be used within your code, as in the sketch below
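
  A minimal C sketch of both mechanisms; the team sizes used here
  (OMP_NUM_THREADS=4 and num_threads(2)) are arbitrary choices for
  illustration:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* team size taken from the default, e.g. OMP_NUM_THREADS=4 */
        #pragma omp parallel
        {
            #pragma omp master
            printf("Default team size: %d\n", omp_get_num_threads());
        }

        /* the num_threads clause overrides the default for this region only */
        #pragma omp parallel num_threads(2)
        {
            #pragma omp master
            printf("Requested team size: %d\n", omp_get_num_threads());
        }
        return 0;
    }

  With OMP_NUM_THREADS=4 set in the shell, the first region would typically
  report 4 threads and the second 2.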

Data sharing
  • The OpenMP sections construct allows us to divide code into regions to be executed by individual threads

  • The OpenMP single construct specifies that a code region should only be executed by one thread

  • Many OpenMP constructs apply an implicit barrier at the end of their region. This can be overridden with the nowait clause; however, this should be done carefully to avoid data conflicts and race conditions (see the sketch below)
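
  A C sketch showing sections, single and nowait together; the messages are
  placeholders:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            /* each section is executed by one thread of the team */
            #pragma omp sections
            {
                #pragma omp section
                printf("Section A run by thread %d\n", omp_get_thread_num());

                #pragma omp section
                printf("Section B run by thread %d\n", omp_get_thread_num());
            }   /* implicit barrier at the end of sections */

            /* one thread only; nowait removes the implicit barrier so
               the other threads do not wait for it to finish */
            #pragma omp single nowait
            printf("Single region run by thread %d\n", omp_get_thread_num());
        }
        return 0;
    }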

Synchronization
  • The OpenMP master, barrier and critical constructs are useful for defining sections and points in our code where threads should synchronize with each other, as in the example below
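
  A minimal C sketch in which critical protects updates to a shared counter,
  barrier makes sure all updates have finished, and master prints the result
  (the counter itself is only illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int count = 0;   /* shared by all threads */

        #pragma omp parallel
        {
            /* critical: one thread at a time updates the shared counter */
            #pragma omp critical
            count++;

            /* barrier: wait until every thread has done its update */
            #pragma omp barrier

            /* master: only the master thread prints the final value */
            #pragma omp master
            printf("count = %d (one increment per thread)\n", count);
        }
        return 0;
    }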

Reduction operation
  • The OpenMP copyprivate and reduction clauses are useful for passing values obtained by one or more threads to the other threads in the team, and for performing recurrence calculations in parallel; a reduction example is sketched below
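
  A minimal C sketch of a reduction; summing the integers 1 to 100 is only an
  illustrative calculation:

    #include <stdio.h>

    int main(void) {
        int i;
        int sum = 0;

        /* each thread accumulates a private partial sum; OpenMP combines
           the partial sums into the shared variable when the loop ends */
        #pragma omp parallel for reduction(+:sum)
        for (i = 1; i <= 100; i++) {
            sum += i;
        }

        printf("sum = %d (expected 5050)\n", sum);
        return 0;
    }

  Without the reduction clause, the threads would all update sum at the same
  time and the result would be unpredictable.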

OpenMP and Slurm
  • Use the Slurm --cpus-per-task option to request the number of threads

  • Set OMP_NUM_THREADS equal to the number of CPUs per task requested

  • Be careful not to exceed the number of cores available per node

Advanced features
  • OpenMP is still an evolving interface for parallel programming.

Glossary

FIXME