
Step 3 - archer PDF

55 pages · 2014 · 0.86 MB · English

Preview Step 3 - archer

Implicit Vectorisation
Stephen Blair-Chappell, Intel Compiler Labs

This training relies on you owning a copy of the following:
Parallel Programming with Parallel Studio XE
Stephen Blair-Chappell & Andrew Stokes
Wiley, ISBN: 9780470891650

Part I: Introduction
1: Parallelism Today
2: An Overview of Parallel Studio XE
3: Parallel Studio XE for the Impatient

Part II: Using Parallel Studio XE
4: Producing Optimized Code
5: Writing Secure Code
6: Where to Parallelize
7: Implementing Parallelism
8: Checking for Errors
9: Tuning Parallelism
10: Advisor-Driven Design
11: Debugging Parallel Applications
12: Event-Based Analysis with VTune Amplifier XE

Part III: Case Studies
13: The World's First Sudoku 'Thirty-Niner'
14: Nine Tips to Parallel Heaven
15: Parallel Track-Fitting in the CERN Collider
16: Parallelizing Legacy Code

8/2/2012 Copyright© 2012, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

What's in this section?
• A seven-step optimization process
• Using different compiler options to optimize your code
• Using auto-vectorization to tune your application to different CPUs

The Sample Application
• Initialises two matrices with a numeric sequence
• Does a matrix multiplication

The main loop (without timing & printf)

    // repeat experiment six times
    for (l = 0; l < 6; l++) {
        // initialize matrix a
        sum = Work(&total, a);
        // initialize matrix b
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                for (k = 0; k < DENOM_LOOP; k++) {
                    sum += m / denominator;
                }
                b[N*i + j] = sum;
            }
        }
        // do the matrix manipulation
        MatrixMul((double (*)[N])a, (double (*)[N])b, (double (*)[N])c);
    }
The Matrix Multiply

    void MatrixMul(double a[N][N], double b[N][N], double c[N][N])
    {
        int i, j, k;
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                for (k = 0; k < N; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
    }

The Seven Optimisation Steps (options given as Windows (Linux))

Step 1: Build with optimization disabled — /Od (-O0)
Step 2: Use general optimizations — /O1, /O2, /O3 (-O1, -O2, -O3)
Step 3: Use processor-specific options — /QxSSE4.2 (-xsse4.2), /QxHOST (-xhost)
Step 4: Add interprocedural optimization — /Qipo (-ipo)
Step 5: Use profile-guided optimization — /Qprof-gen (-prof-gen), /Qprof-use (-prof-use)
Step 6: Tune automatic vectorization — /Qguide (-guide)
Step 7: Implement parallelism — use the Intel family of parallel models, or automatic parallelism with /Qparallel (-parallel)
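On Linux, the seven steps above might be driven with invocations like these. This is a sketch: the compiler driver name `icc` and the source file name `matmul.c` are assumptions, and in the training each binary is timed and compared against the previous step.

```shell
icc -O0 matmul.c -o step1                  # Step 1: optimization disabled (baseline)
icc -O2 matmul.c -o step2                  # Step 2: general optimizations
icc -O2 -xsse4.2 matmul.c -o step3         # Step 3: processor-specific code (or -xhost)
icc -O2 -xsse4.2 -ipo matmul.c -o step4    # Step 4: add interprocedural optimization
icc -O2 -xsse4.2 -ipo -prof-gen matmul.c -o step5   # Step 5: instrumented build...
./step5                                    # ...run it to collect a profile...
icc -O2 -xsse4.2 -ipo -prof-use matmul.c -o step5   # ...then rebuild using the profile
icc -O2 -xsse4.2 -guide matmul.c           # Step 6: ask for vectorization guidance
icc -O2 -xsse4.2 -ipo -parallel matmul.c -o step7   # Step 7: automatic parallelism
```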
Intel® Compiler Architecture

C++ Front End · FORTRAN Front End · Profiler
• Disambiguation: types, array, pointer, structure, directives
• Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to
• Loop optimizations: data deps, prefetch, vectorizer, unroll/interchange/fusion/dist, auto-parallel/OpenMP
• Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim
• Code generation: vectorization, software pipelining, global scheduling, register allocation

Getting Visibility: Compiler Optimization Report

Compiler switch (Linux): -opt-report-phase[=phase]

phase can be:
• ipo – Interprocedural Optimization
• ilo – Intermediate Language Scalar Optimization
• hpo – High Performance Optimization
• hlo – High-level Optimization
• all – All optimizations (not recommended, output too verbose)

Control the level of detail in the report:
(Windows) /Qopt-report[0|1|2|3]
(Linux, Mac OS X) -opt-report[0|1|2|3]
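Combining the switches above, a report on the high-level optimizer at full detail could be requested like this on Linux (a sketch; the source file name `matmul.c` is an assumption):

```shell
# Ask the high-level optimization (hlo) phase for a report,
# with -opt-report3 selecting the most verbose detail level (0..3).
icc -O3 -opt-report-phase=hlo -opt-report3 -c matmul.c
```

The report messages name each loop and say whether it was vectorized, which is the visibility the following steps rely on.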

Description:
Part II: Using Parallel Studio XE; Part III: Case Studies; 1: Parallelism Today. Step 3 covers the SIMD instruction enhancements from SSE (70 instructions, 1999) through SSE2 (2000), SSE3 (2004), SSSE3 (2006), and SSE4.1 (2007) to the advanced vector instructions.

