Example: Intel compiler optimization report for benchmark stream
- Code fragments of benchmark stream
void inline tuned_STREAM_Scale(STREAM_TYPE scalar) {              // L.527
    #pragma omp parallel shared(scalar)                           // L.528
    {                                                             // L.529
#ifdef __INTEL_COMPILER                                           // L.530
        // Instructs the compiler to use non-temporal (that is, streaming) stores // L.531
        #pragma vector nontemporal                                // L.532
#endif                                                            // L.533
        #pragma omp simd aligned (b, c : alignment_bytes)         // L.534
        for (long int j = 0; j < STREAM_ARRAY_SIZE_thread; j++)   // L.535
            b[j] = scalar*c[j];                                   // L.536
    }                                                             // L.537
}                                                                 // L.538
                                                                  // L.539
void inline tuned_STREAM_Add() {                                  // L.540
    #pragma omp parallel                                          // L.541
    {                                                             // L.542
#ifdef __INTEL_COMPILER                                           // L.543
        // Instructs the compiler to use non-temporal (that is, streaming) stores // L.544
        #pragma vector nontemporal                                // L.545
#endif                                                            // L.546
        #pragma omp simd aligned (a, b, c : alignment_bytes)      // L.547
        for (long int j = 0; j < STREAM_ARRAY_SIZE_thread; j++)   // L.548
            c[j] = a[j] + b[j];                                   // L.549
    }                                                             // L.550
}                                                                 // L.551
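The fragments above use the arrays a, b, c and the constant alignment_bytes, which are defined elsewhere in stream.c and are not shown here. The aligned clause is a promise to the compiler, not a check, so the arrays have to be allocated with that alignment. The following is only a minimal sketch of how this could look, not the benchmark's actual setup: the names ALIGNMENT_BYTES, alloc_aligned_doubles and scale_aligned are hypothetical, the 64-byte value is an assumption, and C11 aligned_alloc is one possible allocator among several.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte alignment; stream.c may use a different value. */
#define ALIGNMENT_BYTES 64

/* Hypothetical helper: allocate n doubles aligned to ALIGNMENT_BYTES.
   The size is rounded up because C11 aligned_alloc expects the size
   to be a multiple of the alignment. */
static double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    if (rounded == 0)
        rounded = ALIGNMENT_BYTES;   /* avoid a zero-sized request */
    return aligned_alloc(ALIGNMENT_BYTES, rounded);
}

/* Hypothetical scale kernel: b[j] = scalar * c[j]. */
static void scale_aligned(double *restrict b, const double *restrict c,
                          double scalar, size_t n)
{
    /* The aligned clause below is not verified by the compiler,
       so check the promise explicitly in debug builds. */
    assert((uintptr_t)b % ALIGNMENT_BYTES == 0);
    assert((uintptr_t)c % ALIGNMENT_BYTES == 0);

    #pragma omp simd aligned(b, c : ALIGNMENT_BYTES)
    for (size_t j = 0; j < n; j++)
        b[j] = scalar * c[j];
}
```

When the code is built without OpenMP support the pragma is ignored and the loop still gives the same result; the alignment only matters for the kind of vector loads and stores the compiler may emit.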
- Compile benchmark with optimization report enabled
module add compiler/intel
icc -std=c11 -Ofast -xHost -ipo -qopenmp \
    -qopt-report=5 -qopt-report-stdout \
    stream.c
- Output
...
LOOP BEGIN at stream.c(535,9) inlined into stream.c(349,9)
   remark #15388: vectorization support: reference *b[j] has aligned access   [ stream.c(536,13) ]
   remark #15388: vectorization support: reference *c[j] has aligned access   [ stream.c(536,27) ]
   remark #15412: vectorization support: streaming store was generated for b   [ stream.c(536,13) ]
   remark #15305: vectorization support: vector length 4
   remark #15309: vectorization support: normalized vectorization overhead 0.200
   remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
   remark #15448: unmasked aligned unit stride loads: 1
   remark #15449: unmasked aligned unit stride stores: 1
   remark #15467: unmasked aligned streaming stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 7
   remark #15477: vector cost: 1.250
   remark #15478: estimated potential speedup: 5.580
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at stream.c(535,9) inlined into stream.c(349,9)
<Remainder loop for vectorization>
LOOP END
...
LOOP BEGIN at stream.c(548,9) inlined into stream.c(353,9)
   remark #15388: vectorization support: reference *c[j] has aligned access   [ stream.c(549,13) ]
   remark #15388: vectorization support: reference *a[j] has aligned access   [ stream.c(549,20) ]
   remark #15388: vectorization support: reference *b[j] has aligned access   [ stream.c(549,27) ]
   remark #15412: vectorization support: streaming store was generated for c   [ stream.c(549,13) ]
   remark #15305: vectorization support: vector length 4
   remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
   remark #15448: unmasked aligned unit stride loads: 2
   remark #15449: unmasked aligned unit stride stores: 1
   remark #15467: unmasked aligned streaming stores: 1
   remark #15475: --- begin vector cost summary ---
   remark #15476: scalar cost: 8
   remark #15477: vector cost: 1.250
   remark #15478: estimated potential speedup: 6.400
   remark #15488: --- end vector cost summary ---
LOOP END

LOOP BEGIN at stream.c(548,9) inlined into stream.c(353,9)
<Remainder loop for vectorization>
LOOP END
- Report on data alignment (remark #15388)
- Report on loads, stores and streaming stores (remarks #15412, #15448, #15449, #15467)
- Report on successful vectorization (remark #15301)
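Both reports list a vector length of 4 and a separate "Remainder loop for vectorization". Conceptually, the vectorized loop handles the iterations in groups of 4, and the remainder loop handles whatever is left when the trip count is not a multiple of 4. The sketch below only illustrates that split in plain C for the Add kernel; it is not the code icc generates, and the names VLEN, main_loop_end and add_split are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: vector length 4, as printed in remark #15305 above. */
#define VLEN 4

/* Number of iterations covered by the main (vectorized) loop. */
static size_t main_loop_end(size_t n)
{
    size_t end = n - n % VLEN;
    assert(end % VLEN == 0);   /* the main loop covers whole vectors only */
    return end;
}

/* Hypothetical Add kernel written as main loop plus remainder loop. */
static void add_split(double *c, const double *a, const double *b, size_t n)
{
    size_t end = main_loop_end(n);

    /* Main loop: each pass stands for one vector operation on VLEN elements. */
    for (size_t j = 0; j < end; j += VLEN)
        for (size_t k = 0; k < VLEN; k++)
            c[j + k] = a[j + k] + b[j + k];

    /* Remainder loop: the last n % VLEN elements, one at a time. */
    for (size_t j = end; j < n; j++)
        c[j] = a[j] + b[j];
}
```

For example, with n = 10 the main loop covers elements 0 to 7 and the remainder loop covers elements 8 and 9.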
Last modified on Apr 9, 2018, 4:12:24 PM