Omniperf version: 2.0.0-RC1
Profiler choice: rocprofv1
Path: /home1/josantos/omniperf/tests/workloads/ipblocks_TA/MI200
Target: MI200
Command: ./tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
IP Blocks: ['ta']

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_0.txt
   |-> [rocprof] RPL: on '240321_164135' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_0.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164135_4168185'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164135_4168185/input0_results_240321_164135'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164135_4168185/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 9 metrics
   |-> [rocprof] SQ_CYCLES, SQ_BUSY_CYCLES, SQ_BUSY_CU_CYCLES, SQ_WAVES, SQ_WAVE_CYCLES, GRBM_COUNT, GRBM_GUI_ACTIVE, TA_TA_BUSY_sum, TA_BUFFER_WAVEFRONTS_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164135_4168185/input0_results_240321_164135
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_0.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_1.txt
   |-> [rocprof] RPL: on '240321_164135' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_1.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164135_4168386'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164135_4168386/input0_results_240321_164135'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164135_4168386/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_BUFFER_READ_WAVEFRONTS_sum, TA_BUFFER_WRITE_WAVEFRONTS_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164135_4168386/input0_results_240321_164135
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_1.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_2.txt
   |-> [rocprof] RPL: on '240321_164136' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_2.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164136_4168578'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164136_4168578/input0_results_240321_164136'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164136_4168578/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_BUFFER_ATOMIC_WAVEFRONTS_sum, TA_BUFFER_TOTAL_CYCLES_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164136_4168578/input0_results_240321_164136
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_2.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_3.txt
   |-> [rocprof] RPL: on '240321_164136' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_3.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164136_4168765'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164136_4168765/input0_results_240321_164136'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164136_4168765/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_BUFFER_COALESCED_READ_CYCLES_sum, TA_BUFFER_COALESCED_WRITE_CYCLES_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164136_4168765/input0_results_240321_164136
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_3.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_4.txt
   |-> [rocprof] RPL: on '240321_164137' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_4.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164137_4168966'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164137_4168966/input0_results_240321_164137'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164137_4168966/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_ADDR_STALLED_BY_TC_CYCLES_sum, TA_TOTAL_WAVEFRONTS_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164137_4168966/input0_results_240321_164137
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_4.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_5.txt
   |-> [rocprof] RPL: on '240321_164137' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_5.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164137_4169167'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164137_4169167/input0_results_240321_164137'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164137_4169167/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_ADDR_STALLED_BY_TD_CYCLES_sum, TA_DATA_STALLED_BY_TC_CYCLES_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164137_4169167/input0_results_240321_164137
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_5.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_6.txt
   |-> [rocprof] RPL: on '240321_164138' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_6.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164138_4169355'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164138_4169355/input0_results_240321_164138'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164138_4169355/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_FLAT_WAVEFRONTS_sum, TA_FLAT_READ_WAVEFRONTS_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164138_4169355/input0_results_240321_164138
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_6.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_7.txt
   |-> [rocprof] RPL: on '240321_164138' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/pmc_perf_7.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164138_4169558'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164138_4169558/input0_results_240321_164138'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164138_4169558/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 2 metrics
   |-> [rocprof] TA_FLAT_WRITE_WAVEFRONTS_sum, TA_FLAT_ATOMIC_WAVEFRONTS_sum
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164138_4169558/input0_results_240321_164138
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/pmc_perf_7.csv' is generating
   |-> [rocprof]
[profiling] Current input file: tests/workloads/ipblocks_TA/MI200/perfmon/timestamps.txt
   |-> [rocprof] RPL: on '240321_164139' from '/opt/rocm-6.0.2' in '/home1/josantos/omniperf'
   |-> [rocprof] RPL: profiling '""./tests/vcopy -n 1048576 -b 256 -i 3""'
   |-> [rocprof] RPL: input file 'tests/workloads/ipblocks_TA/MI200/perfmon/timestamps.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240321_164139_4169744'
   |-> [rocprof] RPL: result dir '/tmp/rpl_data_240321_164139_4169744/input0_results_240321_164139'
   |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_240321_164139_4169744/input0.xml"
   |-> [rocprof] gpu_index =
   |-> [rocprof] kernel =
   |-> [rocprof] range =
   |-> [rocprof] 0 metrics
   |-> [rocprof] vcopy testing on GCD 0
   |-> [rocprof] Finished allocating vectors on the CPU
   |-> [rocprof] Finished allocating vectors on the GPU
   |-> [rocprof] Finished copying vectors to the GPU
   |-> [rocprof] sw thinks it moved 1.000000 KB per wave
   |-> [rocprof] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprof] Launching the  kernel on the GPU
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished executing kernel
   |-> [rocprof] Finished copying the output vector from the GPU to the CPU
   |-> [rocprof] Releasing GPU memory
   |-> [rocprof] Releasing CPU memory
   |-> [rocprof]
   |-> [rocprof] ROCPRofiler: 3 contexts collected, output directory /tmp/rpl_data_240321_164139_4169744/input0_results_240321_164139
   |-> [rocprof] File 'tests/workloads/ipblocks_TA/MI200/timestamps.csv' is generating
   |-> [rocprof]
[roofline] Checking for roofline.csv in tests/workloads/ipblocks_TA/MI200
[roofline] No roofline data found. Generating...
