Omniperf version: 2.0.0-RC1
Profiler choice: rocprofv1
Path: /home1/josantos/omniperf/tests/workloads/ipblocks_SPI/MI100
Target: MI100
Command: ./tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
IP Blocks: ['spi']

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_0.txt
   |-> [rocprof] RPL: on '240321_155516' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_0.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155516_1285210'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155516_1285210/input0_results_240321_155516'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155516_1285210/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 7 metrics
   |-> [rocprof] SQ_CYCLES, SQ_BUSY_CYCLES, SQ_WAVES, GRBM_COUNT, GRBM_GUI_ACTIVE, SPI_CSN_WINDOW_VALID, SPI_CSN_BUSY
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155516_1285210/input0_results_240321_155516
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_0.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_1.txt
   |-> [rocprof] RPL: on '240321_155516' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_1.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155516_1285393'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155516_1285393/input0_results_240321_155516'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155516_1285393/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 3 metrics
   |-> [rocprof] GRBM_SPI_BUSY, SPI_CSN_NUM_THREADGROUPS, SPI_CSN_WAVE
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155516_1285393/input0_results_240321_155516
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_1.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_2.txt
   |-> [rocprof] RPL: on '240321_155517' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_2.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155517_1285576'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155517_1285576/input0_results_240321_155517'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155517_1285576/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_RA_REQ_NO_ALLOC, SPI_RA_REQ_NO_ALLOC_CSN
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155517_1285576/input0_results_240321_155517
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_2.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_3.txt
   |-> [rocprof] RPL: on '240321_155517' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_3.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155517_1285763'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155517_1285763/input0_results_240321_155517'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155517_1285763/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_RA_RES_STALL_CSN, SPI_RA_TMP_STALL_CSN
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155517_1285763/input0_results_240321_155517
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_3.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_4.txt
   |-> [rocprof] RPL: on '240321_155518' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_4.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155518_1285952'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155518_1285952/input0_results_240321_155518'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155518_1285952/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_RA_WAVE_SIMD_FULL_CSN, SPI_RA_VGPR_SIMD_FULL_CSN
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155518_1285952/input0_results_240321_155518
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_4.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_5.txt
   |-> [rocprof] RPL: on '240321_155518' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_5.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155518_1286137'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155518_1286137/input0_results_240321_155518'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155518_1286137/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_RA_SGPR_SIMD_FULL_CSN, SPI_RA_LDS_CU_FULL_CSN
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155518_1286137/input0_results_240321_155518
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_5.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_6.txt
   |-> [rocprof] RPL: on '240321_155518' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_6.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155518_1286320'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155518_1286320/input0_results_240321_155518'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155518_1286320/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_RA_BAR_CU_FULL_CSN, SPI_RA_TGLIM_CU_FULL_CSN
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155518_1286320/input0_results_240321_155518
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_6.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_7.txt
   |-> [rocprof] RPL: on '240321_155519' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_7.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155519_1286503'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155519_1286503/input0_results_240321_155519'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155519_1286503/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_RA_WVLIM_STALL_CSN, SPI_SWC_CSC_WR
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155519_1286503/input0_results_240321_155519
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_7.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_8.txt
   |-> [rocprof] RPL: on '240321_155519' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/pmc_perf_8.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155519_1286686'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155519_1286686/input0_results_240321_155519'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155519_1286686/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] SPI_VWC_CSC_WR, SPI_RA_BULKY_CU_FULL_CSN
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155519_1286686/input0_results_240321_155519
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/pmc_perf_8.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_SPI/MI100/perfmon/timestamps.txt
   |-> [rocprof] RPL: on '240321_155520' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_SPI/MI100/perfmon/timestamps.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_155520_1286870'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_155520_1286870/input0_results_240321_155520'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_155520_1286870/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 0 metrics
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_155520_1286870/input0_results_240321_155520
   |-> [rocprof] File 'tests/workloads/ipblocks_SPI/MI100/timestamps.csv' is generating
   |-> [rocprof]
