Example: LLVM compiler loop vectorizer diagnostics for the STREAM benchmark
Source code snippet (stream.OpenMP.c):

/* ---
   Tuned vector scale: b[] = scalar * c[]
   In:  STREAM_ARRAY_SIZE_thread, scalar, c[]
   Out: b[]
   --- */
void static inline tuned_STREAM_Scale(const STREAM_TYPE scalar)
{
#pragma omp parallel default(none) shared(scalar, STREAM_ARRAY_SIZE_thread)
  {
#ifdef __INTEL_COMPILER
    // Instructs the compiler to use non-temporal (that is, streaming) stores
#pragma vector nontemporal
#endif
#pragma omp simd aligned(b, c : alignment_bytes)
    for (long int j = 0; j < STREAM_ARRAY_SIZE_thread; j++) {
      b[j] = scalar * c[j]; // Line: 349
    }
  }
}

/* ---
   Tuned vector add: c[] = a[] + b[]
   In:  STREAM_ARRAY_SIZE_thread, a[], b[]
   Out: c[]
   --- */
void static inline tuned_STREAM_Add()
{
#pragma omp parallel default(none) shared(STREAM_ARRAY_SIZE_thread)
  {
#ifdef __INTEL_COMPILER
    // Instructs the compiler to use non-temporal (that is, streaming) stores
#pragma vector nontemporal
#endif
#pragma omp simd aligned(a, b, c : alignment_bytes)
    for (long int j = 0; j < STREAM_ARRAY_SIZE_thread; j++) {
      c[j] = a[j] + b[j]; // Line: 369
    }
  }
}
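The snippet relies on declarations that are not shown here (the arrays a, b, c, STREAM_TYPE, alignment_bytes and STREAM_ARRAY_SIZE_thread). To try the vectorizer remarks on the two kernels in isolation, the following is a minimal self-contained sketch. It assumes STREAM_TYPE is double and a 64-byte alignment, passes the arrays as parameters instead of using the benchmark's arrays, and leaves out the omp parallel region and the Intel-only nontemporal pragma. The names stream_type, ALIGNMENT_BYTES, alloc_aligned, scale and add are hypothetical and not taken from the benchmark. The aligned clause is only a promise to the compiler, so the arrays have to be allocated with that alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumptions for this sketch: STREAM_TYPE is double, arrays are 64-byte aligned. */
typedef double stream_type;
enum { ALIGNMENT_BYTES = 64 };

/* Hypothetical helper: allocate n (> 0) elements aligned to ALIGNMENT_BYTES.
   The byte count is rounded up to a multiple of the alignment, as C11
   aligned_alloc requires. */
static stream_type *alloc_aligned(size_t n)
{
    size_t bytes = n * sizeof(stream_type);
    bytes = (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    stream_type *p = aligned_alloc(ALIGNMENT_BYTES, bytes);
    assert(p != NULL);
    return p;
}

/* Scale kernel: b[] = scalar * c[].
   aligned() tells the compiler that b and c start on ALIGNMENT_BYTES boundaries. */
static void scale(stream_type *restrict b, const stream_type *restrict c,
                  stream_type scalar, long n)
{
#pragma omp simd aligned(b, c : ALIGNMENT_BYTES)
    for (long j = 0; j < n; j++)
        b[j] = scalar * c[j];
}

/* Add kernel: c[] = a[] + b[] */
static void add(stream_type *restrict c, const stream_type *restrict a,
                const stream_type *restrict b, long n)
{
#pragma omp simd aligned(a, b, c : ALIGNMENT_BYTES)
    for (long j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}
```

Compiled with the same -fopenmp and -Rpass=loop-vectorize options as below, both loops are expected to be reported as vectorized. The vectorization width and interleave count depend on the target selected by -march.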
Prepare environment
module purge
module add compiler/llvm
Compile benchmark with optimization report enabled
# -Rpass=loop-vectorize
#   -> identifies loops that were successfully vectorized
# -Rpass-missed=loop-vectorize
#   -> identifies loops that failed vectorization and indicates if vectorization was specified.
# -Rpass-analysis=loop-vectorize
#   -> identifies the statements that caused vectorization to fail.
#      If in addition -fsave-optimization-record is provided, multiple causes of vectorization failure may be listed.
clang -std=c11 -Ofast -march=native -flto -fopenmp \
  -Rpass=loop-vectorize \
  -Rpass-missed=loop-vectorize \
  -Rpass-analysis=loop-vectorize \
  -o stream stream.OpenMP.c
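STREAM's kernels only produce the "successfully vectorized" kind of remark. To see what the -Rpass-missed and -Rpass-analysis options are for, it can help to compile a loop that cannot be vectorized directly. The sketch below is a hypothetical test file, not part of STREAM. It contrasts a loop with independent iterations with a prefix sum, where each iteration needs the result of the previous one (a loop-carried dependence). The first loop is expected to appear under -Rpass=loop-vectorize and the second under -Rpass-missed / -Rpass-analysis. The exact remark text depends on the Clang version, so none is quoted here.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: out[j] depends only on in[j], so several j can be
   processed at once. Expected to be reported by -Rpass=loop-vectorize. */
void scale_independent(double *restrict out, const double *restrict in,
                       double scalar, size_t n)
{
    assert(out != NULL && in != NULL);
    for (size_t j = 0; j < n; j++)
        out[j] = scalar * in[j];
}

/* Loop-carried dependence: out[j] needs out[j - 1] from the previous
   iteration, so the iterations cannot run side by side. Expected to be
   reported by -Rpass-missed / -Rpass-analysis instead. */
void prefix_sum(double *restrict out, const double *restrict in, size_t n)
{
    assert(out != NULL && in != NULL);
    if (n == 0)
        return;
    out[0] = in[0];
    for (size_t j = 1; j < n; j++)
        out[j] = out[j - 1] + in[j];
}
```

It can be compiled on its own with, for example, clang -std=c11 -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize -c, with the file name chosen freely.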
Output
stream.OpenMP.c:347:9: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
#pragma omp simd aligned(b, c : alignment_bytes)
        ^
stream.OpenMP.c:367:9: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
#pragma omp simd aligned(a, b, c : alignment_bytes)
        ^
...
LLVM gold plugin: stream.OpenMP.c:347:9: loop not vectorized: vectorization and interleaving are explicitly disabled, or the loop has already been vectorized
LLVM gold plugin: stream.OpenMP.c:347:9: loop not vectorized: vectorization and interleaving are explicitly disabled, or the loop has already been vectorized
LLVM gold plugin: stream.OpenMP.c:367:9: loop not vectorized: vectorization and interleaving are explicitly disabled, or the loop has already been vectorized
LLVM gold plugin: stream.OpenMP.c:367:9: loop not vectorized: vectorization and interleaving are explicitly disabled, or the loop has already been vectorized
...
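- Report on successful vectorization: the "remark: vectorized loop" lines show that the loops at stream.OpenMP.c:347 (tuned_STREAM_Scale) and stream.OpenMP.c:367 (tuned_STREAM_Add) were vectorized.
- Report on vector length: "vectorization width: 4" means each vector operation processes 4 array elements. "interleaved count: 4" means 4 such vector operations are interleaved (unrolled) in each iteration of the vectorized loop.
- The "loop not vectorized" remarks from the LLVM gold plugin are emitted during link-time optimization (-flto). They refer to the same source locations that were already reported as vectorized at compile time, so the second alternative in the message applies and they do not indicate a vectorization failure.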
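A vectorization width of 4 combined with an interleave count of 4 means each trip through the vector loop body handles 4 × 4 = 16 elements. The elements left over when the array length is not a multiple of 16 are handled in a scalar remainder loop. If STREAM_TYPE is double (an assumption here), a width of 4 corresponds to 256-bit vectors. The sketch below is a hand-written model of that loop structure for the Scale kernel, not the code LLVM actually generates. It uses the GCC/Clang vector_size extension, and the names v4d, VF, IC, STEP and scale_vf4_ic4 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A 4-wide vector of doubles (GCC/Clang vector_size extension). */
typedef double v4d __attribute__((vector_size(4 * sizeof(double))));

/* Values taken from the remark: width (VF) 4, interleave count (IC) 4. */
enum { VF = 4, IC = 4, STEP = VF * IC };

/* Model of b[j] = scalar * c[j] laid out like a VF = 4, IC = 4 vector loop.
   Returns how many vector-body iterations ran. */
static size_t scale_vf4_ic4(double *b, const double *c, double scalar, size_t n)
{
    assert(b != NULL && c != NULL);
    const v4d s = {scalar, scalar, scalar, scalar}; /* broadcast the scalar */
    size_t j = 0, vector_iterations = 0;

    /* Vector body: IC independent 4-wide multiplies, STEP = 16 elements per trip. */
    for (; j + STEP <= n; j += STEP, vector_iterations++) {
        for (size_t k = 0; k < IC; k++) {
            v4d v;
            memcpy(&v, c + j + k * VF, sizeof v); /* load 4 elements */
            v = v * s;
            memcpy(b + j + k * VF, &v, sizeof v); /* store 4 elements */
        }
    }
    /* Scalar remainder loop for the n % STEP leftover elements. */
    for (; j < n; j++)
        b[j] = scalar * c[j];
    return vector_iterations;
}
```

For example, with n = 37 the vector body runs twice (32 elements) and the remainder loop handles the last 5 elements.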