
Example likwid-perfctr performance group MEM on benchmark stream

  • Build stream benchmark with GNU compiler
    module purge
    module add compiler/gnu/7
    gcc -std=c11 -Ofast -march=native -flto -fopenmp \
         stream.c -o stream
    
  • Set up OpenMP environment
    export OMP_NUM_THREADS=20
    
  • List available performance groups
    likwid-perfctr -a
    
    ...
            MEM     Main memory bandwidth in MBytes/s
         MEM_DP     Overview of arithmetic and main memory performance
         MEM_SP     Overview of arithmetic and main memory performance
           NUMA     Local and remote data transfers
    ...
    
  • Get detailed information on performance groups
    likwid-perfctr -H --group MEM
    
    Group MEM:
    Formulas:
    Memory read bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0))*64.0/runtime
    Memory read data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0))*64.0
    Memory write bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC1))*64.0/runtime
    Memory write data volume [GBytes] = 1.0E-09*(SUM(MBOXxC1))*64.0
    Memory bandwidth [MBytes/s] = 1.0E-06*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0/runtime
    Memory data volume [GBytes] = 1.0E-09*(SUM(MBOXxC0)+SUM(MBOXxC1))*64.0
    -
    Profiling group to measure memory bandwidth drawn by all cores of a socket.
    Since this group is based on Uncore events it is only possible to measure on a
    per socket base. Some of the counters may not be available on your system.
    Also outputs total data volume transferred from main memory.
    The same metrics are provided by the HA group.
    
  • Measure performance group MEM for benchmark stream on CPUs 0 to 19
    likwid-perfctr \
        --group MEM \
        -C 0-19  \
        ./stream -n 100000000
    
    --------------------------------------------------------------------------------
    CPU name:       Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
    CPU type:       Intel Xeon Haswell EN/EP/EX processor
    CPU clock:      2.30 GHz
    --------------------------------------------------------------------------------
    
    -------------------------------------------------------------
    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 100000000 (elements) (elements)
    Memory per array = 762.9 MiB (= 0.7 GiB).
    Total memory required = 2288.8 MiB (= 2.2 GiB).
    Each kernel will be executed 10 times.
     The *best* time for each kernel (excluding the first iteration)
     will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Number of Threads requested = 20
    Number of Threads counted = 20
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 15955 microseconds.
       (= 15955 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:          102387.3     0.016643     0.015627     0.020727
    Scale:          72944.7     0.022326     0.021934     0.024904
    Add:            81663.2     0.029859     0.029389     0.032404
    Triad:          81578.8     0.029487     0.029419     0.029520
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
    -------------------------------------------------------------
    
    --------------------------------------------------------------------------------
    Group 1: MEM
    +-----------------------+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    |         Event         | Counter |   Core 0   |   Core 1   |   Core 2   |   Core 3   |   Core 4   |   Core 5   |   Core 6   |   Core 7   |   Core 8   |   Core 9   |   Core 10  |   Core 11  |   Core 12  |   Core 13  |   Core 14  |   Core 15  |   Core 16  |   Core 17  |   Core 18  |   Core 19  |
    +-----------------------+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    |   INSTR_RETIRED_ANY   |  FIXC0  |  938295039 |  860131666 |  844442515 |  862490745 |  855282747 |  863090338 |  869367586 |  862849791 |  851971039 |  866514428 |  851549344 |  851361100 |  844113575 |  864512780 |  851730705 |  859556446 |  843886084 |  860724500 |  872205627 |  872122155 |
    | CPU_CLK_UNHALTED_CORE |  FIXC1  | 2618345331 | 2536800836 | 2551076323 | 2534122499 | 2547936397 | 2548491438 | 2539820912 | 2532567385 | 2542849578 | 2546347750 | 2551713389 | 2551333284 | 2537147110 | 2529932248 | 2551593906 | 2542084452 | 2541590035 | 2546862115 | 2644565960 | 2550826845 |
    |  CPU_CLK_UNHALTED_REF |  FIXC2  | 2308912178 | 2244121316 | 2256404949 | 2240746434 | 2253571004 | 2254085192 | 2246398592 | 2239977222 | 2248706734 | 2251983751 | 2257285021 | 2256941125 | 2243232274 | 2237576712 | 2256949681 | 2247969906 | 2246809234 | 2252129709 | 2334964324 | 2255982393 |
    |      CAS_COUNT_RD     | MBOX0C0 |  149185857 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |  147906384 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX0C1 |   69185518 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |   69109259 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX1C0 |  152486380 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |  147870827 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX1C1 |   69626367 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |   69075586 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX2C0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX2C1 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX3C0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX3C1 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX4C0 |  149851128 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |  147883074 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX4C1 |   69262252 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |   69133632 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX5C0 |  149850079 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |  147845659 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX5C1 |   69420339 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |   69101844 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX6C0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX6C1 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_RD     | MBOX7C0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    |      CAS_COUNT_WR     | MBOX7C1 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |          0 |
    +-----------------------+---------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
    
    +----------------------------+---------+-------------+------------+------------+--------------+
    |            Event           | Counter |     Sum     |     Min    |     Max    |      Avg     |
    +----------------------------+---------+-------------+------------+------------+--------------+
    |   INSTR_RETIRED_ANY STAT   |  FIXC0  | 17246198210 |  843886084 |  938295039 | 8.623099e+08 |
    | CPU_CLK_UNHALTED_CORE STAT |  FIXC1  | 51046007793 | 2529932248 | 2644565960 | 2.552300e+09 |
    |  CPU_CLK_UNHALTED_REF STAT |  FIXC2  | 45134747751 | 2237576712 | 2334964324 | 2.256737e+09 |
    |      CAS_COUNT_RD STAT     | MBOX0C0 |   297092241 |          0 |  149185857 | 1.485461e+07 |
    |      CAS_COUNT_WR STAT     | MBOX0C1 |   138294777 |          0 |   69185518 | 6.914739e+06 |
    |      CAS_COUNT_RD STAT     | MBOX1C0 |   300357207 |          0 |  152486380 | 1.501786e+07 |
    |      CAS_COUNT_WR STAT     | MBOX1C1 |   138701953 |          0 |   69626367 | 6.935098e+06 |
    |      CAS_COUNT_RD STAT     | MBOX2C0 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_WR STAT     | MBOX2C1 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_RD STAT     | MBOX3C0 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_WR STAT     | MBOX3C1 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_RD STAT     | MBOX4C0 |   297734202 |          0 |  149851128 | 1.488671e+07 |
    |      CAS_COUNT_WR STAT     | MBOX4C1 |   138395884 |          0 |   69262252 | 6.919794e+06 |
    |      CAS_COUNT_RD STAT     | MBOX5C0 |   297695738 |          0 |  149850079 | 1.488479e+07 |
    |      CAS_COUNT_WR STAT     | MBOX5C1 |   138522183 |          0 |   69420339 | 6.926109e+06 |
    |      CAS_COUNT_RD STAT     | MBOX6C0 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_WR STAT     | MBOX6C1 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_RD STAT     | MBOX7C0 |           0 |          0 |          0 |            0 |
    |      CAS_COUNT_WR STAT     | MBOX7C1 |           0 |          0 |          0 |            0 |
    +----------------------------+---------+-------------+------------+------------+--------------+
    
    +-----------------------------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
    |               Metric              |   Core 0   |   Core 1  |   Core 2  |   Core 3  |   Core 4  |   Core 5  |   Core 6  |   Core 7  |   Core 8  |   Core 9  |   Core 10  |  Core 11  |  Core 12  |  Core 13  |  Core 14  |  Core 15  |  Core 16  |  Core 17  |  Core 18  |  Core 19  |
    +-----------------------------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
    |        Runtime (RDTSC) [s]        |     1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |     1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |    1.4661 |
    |        Runtime unhalted [s]       |     1.1384 |    1.1030 |    1.1092 |    1.1018 |    1.1078 |    1.1081 |    1.1043 |    1.1011 |    1.1056 |    1.1071 |     1.1095 |    1.1093 |    1.1031 |    1.1000 |    1.1094 |    1.1053 |    1.1051 |    1.1073 |    1.1498 |    1.1091 |
    |            Clock [MHz]            |  2608.1950 | 2599.9236 | 2600.3210 | 2601.0904 | 2600.3864 | 2600.3596 | 2600.3801 | 2600.3868 | 2600.8087 | 2600.5967 |  2599.9563 | 2599.9651 | 2601.3091 | 2600.4680 | 2600.2208 | 2600.8783 | 2601.7158 | 2600.9535 | 2604.9219 | 2600.5537 |
    |                CPI                |     2.7905 |    2.9493 |    3.0210 |    2.9381 |    2.9791 |    2.9528 |    2.9215 |    2.9351 |    2.9847 |    2.9386 |     2.9966 |    2.9968 |    3.0057 |    2.9264 |    2.9958 |    2.9574 |    3.0118 |    2.9590 |    3.0320 |    2.9249 |
    |  Memory read bandwidth [MBytes/s] | 26251.9756 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 | 25821.2260 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
    |  Memory read data volume [GBytes] |    38.4879 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    37.8564 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
    | Memory write bandwidth [MBytes/s] | 12113.5682 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 | 12066.6777 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
    | Memory write data volume [GBytes] |    17.7596 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    17.6909 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
    |    Memory bandwidth [MBytes/s]    | 38365.5438 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 | 37887.9037 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
    |    Memory data volume [GBytes]    |    56.2475 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |    55.5473 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |         0 |
    +-----------------------------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
    
    +----------------------------------------+------------+-----------+------------+-----------+
    |                 Metric                 |     Sum    |    Min    |     Max    |    Avg    |
    +----------------------------------------+------------+-----------+------------+-----------+
    |        Runtime (RDTSC) [s] STAT        |    29.3220 |    1.4661 |     1.4661 |    1.4661 |
    |        Runtime unhalted [s] STAT       |    22.1943 |    1.1000 |     1.1498 |    1.1097 |
    |            Clock [MHz] STAT            | 52023.3908 | 2599.9236 |  2608.1950 | 2601.1695 |
    |                CPI STAT                |    59.2171 |    2.7905 |     3.0320 |    2.9609 |
    |  Memory read bandwidth [MBytes/s] STAT | 52073.2016 |         0 | 26251.9756 | 2603.6601 |
    |  Memory read data volume [GBytes] STAT |    76.3443 |         0 |    38.4879 |    3.8172 |
    | Memory write bandwidth [MBytes/s] STAT | 24180.2459 |         0 | 12113.5682 | 1209.0123 |
    | Memory write data volume [GBytes] STAT |    35.4505 |         0 |    17.7596 |    1.7725 |
    |    Memory bandwidth [MBytes/s] STAT    | 76253.4475 |         0 | 38365.5438 | 3812.6724 |
    |    Memory data volume [GBytes] STAT    |   111.7948 |         0 |    56.2475 |    5.5897 |
    +----------------------------------------+------------+-----------+------------+-----------+
    
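    • The memory counters (MBOX*) are Uncore counters that exist once per socket, so likwid reports them only on the first CPU core of each socket (here Core 0 and Core 10); all other cores show 0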
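
As a cross-check of the MEM formulas, the socket 0 metrics can be recomputed by hand from the raw counters in the "Core 0" column of the event table above. The following is only a sketch: the variable and function names are made up for illustration, and the runtime is the rounded value printed by likwid, so the bandwidths match the table only approximately.

```python
# Raw Uncore counters of socket 0, copied from the "Core 0" column above.
# Only the non-zero memory channels (MBOX0, MBOX1, MBOX4, MBOX5) are listed.
CAS_COUNT_RD = [149185857, 152486380, 149851128, 149850079]  # MBOXxC0
CAS_COUNT_WR = [69185518, 69626367, 69262252, 69420339]      # MBOXxC1

CACHE_LINE_BYTES = 64.0  # each CAS event transfers one 64-byte cache line
RUNTIME_S = 1.4661       # "Runtime (RDTSC) [s]", rounded as printed


def data_volume_gbytes(counts):
    """Data volume [GBytes] = 1.0E-09 * SUM(counters) * 64.0"""
    return 1.0e-09 * sum(counts) * CACHE_LINE_BYTES


def bandwidth_mbytes_per_s(counts, runtime_s):
    """Bandwidth [MBytes/s] = 1.0E-06 * SUM(counters) * 64.0 / runtime"""
    return 1.0e-06 * sum(counts) * CACHE_LINE_BYTES / runtime_s


read_gb = data_volume_gbytes(CAS_COUNT_RD)
write_gb = data_volume_gbytes(CAS_COUNT_WR)
read_bw = bandwidth_mbytes_per_s(CAS_COUNT_RD, RUNTIME_S)
write_bw = bandwidth_mbytes_per_s(CAS_COUNT_WR, RUNTIME_S)

print(f"Memory read data volume  [GBytes]: {read_gb:.4f}")
print(f"Memory write data volume [GBytes]: {write_gb:.4f}")
print(f"Memory data volume       [GBytes]: {read_gb + write_gb:.4f}")
print(f"Memory read bandwidth  [MBytes/s]: {read_bw:.1f}")
print(f"Memory write bandwidth [MBytes/s]: {write_bw:.1f}")
print(f"Memory bandwidth       [MBytes/s]: {read_bw + write_bw:.1f}")
```

The data volumes reproduce the table values for Core 0 (38.4879, 17.7596 and 56.2475 GBytes). The bandwidths differ from the table in the first decimal place because likwid divides by the unrounded runtime.
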
  • Validity check
    Socket 0 (Core 0), values in MBytes/s:
    Memory  read bandwidth: 26251.9756
    Memory write bandwidth: 12113.5682
    +                       ----------
                            38365.5438
    Memory bandwidth:       38365.5438
    
    Memory write data volume socket 0:      17.7596 GB
    Memory write data volume socket 1:      17.6909 GB
    +                                       ----------
                                            35.4505 GB
    Memory write data volume [GBytes] STAT: 35.4505 GB
    
    Memory read data volume socket 0:      38.4879 GB
    Memory read data volume socket 1:      37.8564 GB
    +                                      ----------
                                           76.3443 GB
    Memory read data volume [GBytes] STAT: 76.3443 GB
    
    #Elements/vec   = 100.000.000
    #Bytes/Element  = 8
    #Bytes/vec      = 100.000.000 * 8 = 800.000.000
    #Num repetition = 10
    
    Copy:  1 Vec. read, 1 Vec. write
    Scale: 1 Vec. read, 1 Vec. write
    Add:   2 Vec. read, 1 Vec. write
    Triad: 2 Vec. read, 1 Vec. write
    
    Expected write volume:
    4 Vec. write * 10 repetitions * 800.000.000 Bytes/vec = 32 GB
    ~ 35.4505 Memory write data volume [GBytes] STAT
    
    Expected read volume, counting only the reads of the kernels:
    6 Vec. read * 10 repetitions * 800.000.000 Bytes/vec = 48 GB
    !~ 76.3443 Memory read data volume [GBytes] STAT
    
    Expected read volume if every written vector is also read once:
    (6 Vec. read + 4 Vec. write) * 10 repetitions * 800.000.000 Bytes/vec = 80 GB
    ~ 76.3443 Memory read data volume [GBytes] STAT
    
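    • Each store to memory triggers an extra read from memory (write-allocate): the cache line is loaded before it is overwritten. => The binary built with the GNU compiler does not use non-temporal stores, which write directly to memory without this extra read.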
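
The arithmetic of the validity check above can be written down in a few lines. This is a minimal sketch: the names are hypothetical, the vector counts per kernel are the ones listed above, and the "measured" values are the STAT sums from the likwid output. The measurement covers the whole program run, not only the timed kernels, so only rough agreement should be expected.

```python
ELEMENTS_PER_VEC = 100_000_000
BYTES_PER_ELEMENT = 8
REPETITIONS = 10

# (vectors read, vectors written) per kernel, as listed above
KERNELS = {"Copy": (1, 1), "Scale": (1, 1), "Add": (2, 1), "Triad": (2, 1)}

bytes_per_vec = ELEMENTS_PER_VEC * BYTES_PER_ELEMENT
vec_reads = sum(reads for reads, _ in KERNELS.values())
vec_writes = sum(writes for _, writes in KERNELS.values())

# Expected data volumes in GB (1 GB = 1e9 bytes)
expected_write_gb = vec_writes * REPETITIONS * bytes_per_vec / 1e9
expected_read_gb = vec_reads * REPETITIONS * bytes_per_vec / 1e9
# With write-allocate, every written vector is additionally read once.
expected_read_wa_gb = (vec_reads + vec_writes) * REPETITIONS * bytes_per_vec / 1e9

# Measured STAT sums over both sockets, copied from the likwid output
measured_write_gb = 35.4505
measured_read_gb = 76.3443

rows = [
    ("write", expected_write_gb, measured_write_gb),
    ("read, no write-allocate", expected_read_gb, measured_read_gb),
    ("read, with write-allocate", expected_read_wa_gb, measured_read_gb),
]
for label, expected, measured in rows:
    deviation = (measured / expected - 1.0) * 100.0
    print(f"{label:26s} expected {expected:5.1f} GB, "
          f"measured {measured:8.4f} GB ({deviation:+.1f} %)")
```

Only the write-allocate model brings the expected read volume close to the measured one, which supports the conclusion drawn above.
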

Example likwid-perfctr performance group NUMA on benchmark stream

  • Build stream benchmark with Intel compiler
    module purge
    module add compiler/intel/18.0
    icc -std=c11 -Ofast -xHost -ipo -qopenmp \
         stream.c -o stream
    
  • Set up OpenMP environment
    export OMP_NUM_THREADS=20
    
  • List available performance groups
    likwid-perfctr -a
    
    ...
            MEM     Main memory bandwidth in MBytes/s
         MEM_DP     Overview of arithmetic and main memory performance
         MEM_SP     Overview of arithmetic and main memory performance
           NUMA     Local and remote data transfers
    ...
    
  • Get detailed information on performance groups
    likwid-perfctr -H --group NUMA
    
    Group NUMA:
    Formula:
    CPI = CPU_CLK_UNHALTED_CORE/INSTR_RETIRED_ANY
    Local bandwidth [MByte/s] = 1.E-06*((SUM(REQUESTS_READS_LOCAL)+SUM(REQUESTS_WRITES_LOCAL))*64)/time
    Local data volume [GByte] = 1.E-09*(SUM(REQUESTS_READS_LOCAL)+SUM(REQUESTS_WRITES_LOCAL))*64
    Remote bandwidth [MByte/s] = 1.E-06*((SUM(REQUESTS_READS_REMOTE)+SUM(REQUESTS_WRITES_REMOTE))*64)/time
    Remote data volume [GByte] = 1.E-09*(SUM(REQUESTS_READS_REMOTE)+SUM(REQUESTS_WRITES_REMOTE))*64
    Total bandwidth [MByte/s] = 1.E-06*((SUM(REQUESTS_READS_LOCAL)+SUM(REQUESTS_WRITES_LOCAL)+SUM(REQUESTS_READS_REMOTE)+SUM(REQUESTS_WRITES_REMOTE))*64)/time
    Total data volume [GByte] = 1.E-09*(SUM(REQUESTS_READS_LOCAL)+SUM(REQUESTS_WRITES_LOCAL)+SUM(REQUESTS_READS_REMOTE)+SUM(REQUESTS_WRITES_REMOTE))*64
    --
    This performance group measures the data traffic of CPU sockets to local and remote
    CPU sockets. It uses the Home Agent for calculation. This may include also data from
    other sources than the memory controllers.
    
  • Measure performance group NUMA for benchmark stream on CPUs 0 to 19 with locally allocated memory
    likwid-perfctr --group NUMA -C 0-19 \
        numactl --localalloc \
           ./stream -n 100000000
    
    ...
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:          104573.5     0.015537     0.015300     0.015842
    Scale:         105859.6     0.015214     0.015114     0.015308
    Add:           108120.1     0.022280     0.022198     0.022395
    Triad:         109300.7     0.021987     0.021958     0.022040
    -------------------------------------------------------------
    
    ...
    +--------------------------------------+------------+--------------+-----------+-----------+
    |                Metric                |     Sum    |      Min     |    Max    |    Avg    |
    +--------------------------------------+------------+--------------+-----------+-----------+
    |       Runtime (RDTSC) [s] STAT       |    37.9020 |       1.8951 |    1.8951 |    1.8951 |
    |       Runtime unhalted [s] STAT      |    18.2252 |       0.8769 |    1.3908 |    0.9113 |
    |           Clock [MHz] STAT           | 58088.7700 |    2899.9413 | 2989.1844 | 2904.4385 |
    |               CPI STAT               |   128.2410 |       1.0635 |    7.0634 |    6.4120 |
    |  Local DRAM data volume [GByte] STAT |    13.3097 |       0.6477 |    0.6756 |    0.6655 |
    |  Local DRAM bandwidth [MByte/s] STAT |  7023.3007 |     341.7686 |  356.5150 |  351.1650 |
    | Remote DRAM data volume [GByte] STAT |     0.0063 | 2.496000e-05 |    0.0008 |    0.0003 |
    | Remote DRAM bandwidth [MByte/s] STAT |     3.2454 |       0.0132 |    0.4004 |    0.1623 |
    |    Memory data volume [GByte] STAT   |    13.3158 |       0.6481 |    0.6758 |    0.6658 |
    |    Memory bandwidth [MByte/s] STAT   |  7026.5460 |     342.0066 |  356.5833 |  351.3273 |
    +--------------------------------------+------------+--------------+-----------+-----------+
    
    • Remote DRAM data volume and Remote DRAM bandwidth are negligible (0.0063 GByte remote vs. 13.3097 GByte local)
  • Measure performance group NUMA for benchmark stream on CPUs 0 to 19 with all memory allocated in NUMA domain 0
    likwid-perfctr --group NUMA -C 0-19 \
        numactl --membind=0 \
           ./stream -n 100000000
    
    ...
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:           50143.6     0.031936     0.031908     0.031993
    Scale:          49960.4     0.032053     0.032025     0.032086
    Add:            56319.0     0.042653     0.042614     0.042680
    Triad:          56425.9     0.042577     0.042534     0.042612
    -------------------------------------------------------------
    
    ...
    +--------------------------------------+------------+--------------+-----------+-----------+
    |                Metric                |     Sum    |      Min     |    Max    |    Avg    |
    +--------------------------------------+------------+--------------+-----------+-----------+
    |       Runtime (RDTSC) [s] STAT       |    53.4480 |       2.6724 |    2.6724 |    2.6724 |
    |       Runtime unhalted [s] STAT      |    34.9158 |       1.6857 |    2.1850 |    1.7458 |
    |           Clock [MHz] STAT           | 58063.6648 |    2899.9915 | 2963.3627 | 2903.1832 |
    |               CPI STAT               |   167.8990 |       1.3344 |   14.4638 |    8.3950 |
    |  Local DRAM data volume [GByte] STAT |     6.5933 | 7.744000e-06 |    0.6628 |    0.3297 |
    |  Local DRAM bandwidth [MByte/s] STAT |  2467.1862 |       0.0029 |  248.0175 |  123.3593 |
    | Remote DRAM data volume [GByte] STAT |     6.6188 |            0 |    0.6689 |    0.3309 |
    | Remote DRAM bandwidth [MByte/s] STAT |  2476.7374 |            0 |  250.3028 |  123.8369 |
    |    Memory data volume [GByte] STAT   |    13.2118 |       0.6343 |    0.6689 |    0.6606 |
    |    Memory bandwidth [MByte/s] STAT   |  4943.9239 |     237.3654 |  250.3130 |  247.1962 |
    +--------------------------------------+------------+--------------+-----------+-----------+
    
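    • Remote DRAM data volume and Remote DRAM bandwidth are very high: about half of the data volume (6.6188 of 13.2118 GByte) is transferred from the remote socket
    • The bandwidth reported by stream is roughly halved (e.g. Copy: 104573.5 -> 50143.6 MB/s)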
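
To put numbers on the two observations, the remote share of the data volume and the drop in stream bandwidth can be computed from the outputs above. This is only an illustrative sketch; the dictionary layout and function name are assumptions, and the values are copied from the two NUMA runs.

```python
# STAT sums and stream Copy rates copied from the two runs above
RUNS = {
    "numactl --localalloc": {
        "local_gb": 13.3097,    # Local DRAM data volume [GByte] STAT, Sum
        "remote_gb": 0.0063,    # Remote DRAM data volume [GByte] STAT, Sum
        "copy_mb_s": 104573.5,  # stream Copy, Best Rate MB/s
    },
    "numactl --membind=0": {
        "local_gb": 6.5933,
        "remote_gb": 6.6188,
        "copy_mb_s": 50143.6,
    },
}


def remote_share(run):
    """Fraction of the DRAM data volume that came from the remote socket."""
    return run["remote_gb"] / (run["local_gb"] + run["remote_gb"])


for name, run in RUNS.items():
    print(f"{name:22s} remote share: {remote_share(run) * 100.0:6.2f} %")

copy_ratio = (RUNS["numactl --membind=0"]["copy_mb_s"]
              / RUNS["numactl --localalloc"]["copy_mb_s"])
print(f"stream Copy rate, membind=0 relative to localalloc: {copy_ratio:.2f}")
```

With `--localalloc` almost no traffic is remote, while with `--membind=0` roughly half of it is, and the stream Copy rate falls to a little under half of the local value.
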
Last modified on Apr 5, 2018, 7:24:07 PM