Example: Intel compiler optimization report for the STREAM benchmark

  • Code fragments from the STREAM benchmark (stream.c, lines 527-551)
    void inline tuned_STREAM_Scale(STREAM_TYPE scalar) {                                   // L.527
        #pragma omp parallel shared(scalar)                                                // L.528
        {                                                                                  // L.529
            #ifdef __INTEL_COMPILER                                                        // L.530
                // Instructs the compiler to use non-temporal (that is, streaming) stores  // L.531
                #pragma vector nontemporal                                                 // L.532
            #endif                                                                         // L.533
            #pragma omp simd aligned (b, c : alignment_bytes)                              // L.534
            for (long int j = 0; j < STREAM_ARRAY_SIZE_thread; j++)                        // L.535
                b[j] = scalar*c[j];                                                        // L.536
        }                                                                                  // L.537
    }                                                                                      // L.538
                                                                                           // L.539
    void inline tuned_STREAM_Add() {                                                       // L.540
        #pragma omp parallel                                                               // L.541
        {                                                                                  // L.542
            #ifdef __INTEL_COMPILER                                                        // L.543
                // Instructs the compiler to use non-temporal (that is, streaming) stores  // L.544
                #pragma vector nontemporal                                                 // L.545
            #endif                                                                         // L.546
            #pragma omp simd aligned (a, b, c : alignment_bytes)                           // L.547
            for (long int j = 0; j < STREAM_ARRAY_SIZE_thread; j++)                        // L.548
                c[j] = a[j] + b[j];                                                        // L.549
        }                                                                                  // L.550
    }                                                                                      // L.551
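
The `aligned (b, c : alignment_bytes)` clause in the fragments above is a promise to the compiler, so the arrays must actually start on that boundary. The wiki does not show how stream.c allocates them. The following is a minimal sketch, assuming C11 `aligned_alloc`, a 64-byte boundary, and `double` as the element type; `ALIGNMENT_BYTES`, `alloc_aligned_doubles`, `is_aligned` and `scale_kernel` are hypothetical names that do not come from stream.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte alignment. The value of alignment_bytes in stream.c is not shown. */
#define ALIGNMENT_BYTES 64

/* Hypothetical helper: allocate n doubles on an ALIGNMENT_BYTES boundary. */
double *alloc_aligned_doubles(size_t n)
{
    assert(n > 0);
    size_t bytes = n * sizeof(double);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    return aligned_alloc(ALIGNMENT_BYTES, rounded);
}

/* Hypothetical helper: check that a pointer sits on an ALIGNMENT_BYTES boundary. */
int is_aligned(const void *p)
{
    return (uintptr_t)p % ALIGNMENT_BYTES == 0;
}

/* Same loop shape as tuned_STREAM_Scale, without the OpenMP parallel region. */
void scale_kernel(double *restrict b, const double *restrict c,
                  double scalar, long n)
{
    #pragma omp simd aligned(b, c : ALIGNMENT_BYTES)
    for (long j = 0; j < n; j++)
        b[j] = scalar * c[j];
}
```

A caller would allocate `b` and `c` with `alloc_aligned_doubles`, run `scale_kernel`, and release both with `free`. If an array is not aligned as declared, the behavior of the `aligned` clause is undefined, so a check such as `is_aligned` may be worth keeping in debug builds.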
    
  • Compile the benchmark with the optimization report enabled
    module add compiler/intel
    icc -std=c11 -Ofast -xHost -ipo -qopenmp \
        -qopt-report=5 -qopt-report-stdout \
        stream.c
    
  • Output
    ...
    LOOP BEGIN at stream.c(535,9) inlined into stream.c(349,9)
       remark #15388: vectorization support: reference *b[j] has aligned access   [ stream.c(536,13) ]
       remark #15388: vectorization support: reference *c[j] has aligned access   [ stream.c(536,27) ]
       remark #15412: vectorization support: streaming store was generated for b   [ stream.c(536,13) ]
       remark #15305: vectorization support: vector length 4
       remark #15309: vectorization support: normalized vectorization overhead 0.200
       remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
       remark #15448: unmasked aligned unit stride loads: 1 
       remark #15449: unmasked aligned unit stride stores: 1 
       remark #15467: unmasked aligned streaming stores: 1 
       remark #15475: --- begin vector cost summary ---
       remark #15476: scalar cost: 7 
       remark #15477: vector cost: 1.250 
       remark #15478: estimated potential speedup: 5.580 
       remark #15488: --- end vector cost summary ---
    LOOP END
    
    LOOP BEGIN at stream.c(535,9) inlined into stream.c(349,9)
    <Remainder loop for vectorization>
    LOOP END
    
    ...
    LOOP BEGIN at stream.c(548,9) inlined into stream.c(353,9)
       remark #15388: vectorization support: reference *c[j] has aligned access   [ stream.c(549,13) ]
       remark #15388: vectorization support: reference *a[j] has aligned access   [ stream.c(549,20) ]
       remark #15388: vectorization support: reference *b[j] has aligned access   [ stream.c(549,27) ]
       remark #15412: vectorization support: streaming store was generated for c   [ stream.c(549,13) ]
       remark #15305: vectorization support: vector length 4
       remark #15301: OpenMP SIMD LOOP WAS VECTORIZED
       remark #15448: unmasked aligned unit stride loads: 2 
       remark #15449: unmasked aligned unit stride stores: 1 
       remark #15467: unmasked aligned streaming stores: 1 
       remark #15475: --- begin vector cost summary ---
       remark #15476: scalar cost: 8 
       remark #15477: vector cost: 1.250 
       remark #15478: estimated potential speedup: 6.400 
       remark #15488: --- end vector cost summary ---
    LOOP END
    
    LOOP BEGIN at stream.c(548,9) inlined into stream.c(353,9)
    <Remainder loop for vectorization>
    LOOP END
    
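    • Data alignment: remark #15388 reports aligned access for every array reference
    • Loads and stores: remarks #15448, #15449 and #15467 count the aligned unit-stride loads, stores and streaming stores; remark #15412 confirms that a streaming store was generated, as requested by #pragma vector nontemporal
    • Vectorization: remark #15301 confirms that both OpenMP SIMD loops were vectorized, with vector length 4 (remark #15305)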
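
The vector cost summary in the output above can be cross-checked by hand. The sketch below simply divides the reported scalar cost by the reported vector cost; `cost_ratio` is a hypothetical helper, and the plain ratio is an observation about these two loops, not the compiler's documented formula.

```c
#include <assert.h>

/* Hypothetical helper: ratio of the two cost figures from the vector cost summary. */
double cost_ratio(double scalar_cost, double vector_cost)
{
    assert(vector_cost > 0.0);
    return scalar_cost / vector_cost;
}
```

For the Add loop, `cost_ratio(8, 1.250)` gives 6.4, which matches the reported speedup of 6.400. For the Scale loop, `cost_ratio(7, 1.250)` gives 5.6, slightly above the reported 5.580, so the compiler's estimate apparently includes more than this ratio.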
Last modified on Apr 9, 2018, 4:12:24 PM