VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING

OpenMP Runtime Error Detection with ARCHER
At the 27th VI-HPS Tuning Workshop
Joachim Protze, Simone Atzeni
RWTH Aachen University, University of Utah
April 2018

Data race example in OpenMP

    static double farg1,farg2;
    #define FMAX(a,b) (farg1=(a),farg2=(b),farg1>farg2?farg1:farg2)

To avoid side effects, the arguments are copied to temporary storage.

    1619: #pragma omp parallel for shared(bar, foo, THRESH)
    1620: for (x=0; x<1000; x++)
    1621:   T = FMAX(0.1111*foo*bar[x],THRESH);

Double-checked the scoping of variables: everything seems to be fine. What could possibly go wrong?

The tool flags a write-write race in line 1621. The temporaries farg1 and farg2 are static and therefore shared, so every thread writes to them in each iteration.

JOACHIM PROTZE - RWTH AACHEN UNIVERSITY 04/27/2018

Threaded Applications (OpenMP)
Threaded Defects

▪ Data Races
▪ Deadlocks

Threaded Applications (OpenMP)
Threaded Defects – Deadlock

A circular wait condition exists in the system that causes two or more parallel units to wait indefinitely.

    #pragma omp parallel sections
    {
      #pragma omp section
      {
        omp_set_lock(&lock_a);
        omp_set_lock(&lock_b);
        omp_unset_lock(&lock_b);
        omp_unset_lock(&lock_a);
      }
      #pragma omp section
      {
        omp_set_lock(&lock_b);
        omp_set_lock(&lock_a);
        omp_unset_lock(&lock_a);
        omp_unset_lock(&lock_b);
      }
    }

Deadlocking execution order (time runs left to right):

    Thread 1:  set(lock_a)  set(lock_b)
    Thread 2:  set(lock_b)  set(lock_a)

• Thread 1 waits for lock_b, owned by Thread 2.
• Thread 2 waits for lock_a, owned by Thread 1.
• Neither thread can free a lock and both threads wait indefinitely.
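One common way to break the circular wait is to have every section acquire the locks in the same order. The following is a minimal sketch, not code from the slides: the function name update_with_ordered_locks, the shared counter, and the serial stand-ins for the lock routines (used only when the compiler has no OpenMP support) are assumptions made for illustration.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch still builds without OpenMP support. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
#endif

/* Hypothetical helper: each section adds `step` to a shared counter while
   holding both locks. Both sections take lock_a first, then lock_b, so no
   section can hold one lock while waiting for the other in reverse order. */
int update_with_ordered_locks(int step)
{
    omp_lock_t lock_a, lock_b;
    int counter = 0;

    omp_init_lock(&lock_a);
    omp_init_lock(&lock_b);

    #pragma omp parallel sections shared(counter, lock_a, lock_b)
    {
        #pragma omp section
        {
            omp_set_lock(&lock_a);
            omp_set_lock(&lock_b);
            counter += step;
            omp_unset_lock(&lock_b);
            omp_unset_lock(&lock_a);
        }
        #pragma omp section
        {
            omp_set_lock(&lock_a);   /* same order as the first section */
            omp_set_lock(&lock_b);
            counter += step;
            omp_unset_lock(&lock_b);
            omp_unset_lock(&lock_a);
        }
    }

    omp_destroy_lock(&lock_b);
    omp_destroy_lock(&lock_a);

    /* Both sections ran to completion, so the counter was updated twice. */
    assert(counter == 2 * step);
    return counter;
}
```

Because lock_a is always taken before lock_b, whichever section gets lock_a first can also get lock_b, finish, and release both; the other section then proceeds.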

Threaded Applications (OpenMP)
Threaded Defects – Data Race

Program behavior depends on the execution order of threads/processes.

Example 1:

    int x,y;
    #pragma omp parallel
    {
      x = omp_get_thread_num();
      #pragma omp barrier
      #pragma omp master
      printf("Master is:%d", x);
    }

A write-write race on x.

Example 2:

    int x,y;
    #pragma omp parallel
    {
      #pragma omp master
      sleep(5);
      x = omp_get_thread_num();
      #pragma omp barrier
      #pragma omp master
      printf("Master is:%d", x);
    }

If the master thread is intended to write x last, it will usually do so because of the sleep, but sometimes it may not.

Threaded Applications (OpenMP)
Definitions

Data race
▪ Two threads access the same shared variable
▪ At least one thread modifies the variable
▪ The accesses are concurrent, i.e. unsynchronized
▪ Leads to non-deterministic behavior
▪ Hard to find with traditional debugging tools

Deadlock
▪ Two or more threads wait for each other to release locks, each holding the lock the other is waiting for
▪ Program hangs
▪ May be non-deterministic

Data race detection tools

Helgrind / Intel Inspector (XE?)
▪ valgrind --tool=helgrind
▪ They rename the tool every other year ☺
▪ Fewer false alerts
▪ Many false alerts
▪ Especially for newer OpenMP clauses/constructs
▪ Misses synchronization information
▪ Binary instrumentation during execution
▪ High runtime overhead for detailed analysis

Data race detection tools

Archer
▪ Error checking tool for
  ▪ Memory errors
  ▪ Threading errors (OpenMP, Pthreads)
▪ Based on ThreadSanitizer (runtime check)
▪ Available for Linux, Windows and Mac
▪ Supports C, C++ (Fortran in progress)
▪ Modified OpenMP runtime, improved for data race detection
▪ More info: https://github.com/PRUNERS/archer

Archer – Background

▪ Static analysis
  ▪ Only for OpenMP programs
  ▪ Excludes race-free regions and sequential code from runtime analysis to reduce overhead
▪ Runtime check
  ▪ Error detection only in software branches that are executed
  ▪ Low runtime overhead
    ▪ Roughly 2x - 20x
  ▪ Detects races in large OpenMP applications
  ▪ No false positives
▪ Compiler instrumentation
  ▪ Slower compilation process (applies several passes to the source code to identify race-free regions and instruments only the rest)

Archer – Usage

▪ Compile the program with the -g and -fsanitize=thread flags
  ▪ clang-archer myprog.c -o myprog
▪ Run the program under control of the ARCHER runtime
  ▪ export OMP_NUM_THREADS=...
  ▪ ./myprog
  ▪ Detects problems only in software branches that are executed
▪ Understand and correct the threading errors detected
▪ Edit the source code
▪ Repeat until no errors are reported
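As an illustration of the "edit the source code" step, the race from the FMAX example can be removed by replacing the macro and its static temporaries with an inline function, whose parameters are private to each call. This is a sketch under assumptions: the names fmax_local, clamp_scaled and out are hypothetical, and the result is written to an output array only so that the effect can be observed.

```c
#include <assert.h>

/* Replacement for the FMAX macro: each argument is evaluated once and held
   in the function's own parameters, so no shared temporaries are written. */
static inline double fmax_local(double a, double b)
{
    return a > b ? a : b;
}

/* Hypothetical kernel modeled on the loop from the data race example.
   Every iteration writes only its own element out[x]. */
void clamp_scaled(const double *bar, double *out, int n,
                  double foo, double thresh)
{
    assert(n >= 0);

    #pragma omp parallel for shared(bar, out, foo, thresh)
    for (int x = 0; x < n; x++) {
        out[x] = fmax_local(0.1111 * foo * bar[x], thresh);
    }
}
```

Rebuilding with the flags from the usage slide and rerunning, assuming OpenMP is enabled in the build, is the "repeat" step: the report on the shared temporaries should no longer appear, since they no longer exist.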