| component | wall_time_sum (minutes) | wall_time_span (minutes) | wall_time_est_elapsed (minutes) | avg_concurrency | peak_concurrency | cpu_sum (minutes) | user_sum (minutes) | sys_sum (minutes) | max_rss_mb | max_rss_gb |
|---|---|---|---|---|---|---|---|---|---|---|
| rocprofiler-sdk | 231.01 | 173.83 | 65.20 | 1.33 | 67 | 67.70 | 49.90 | 17.80 | 1953.93 | 1.9081 |
| host-blas | 146.85 | 4.90 | 41.45 | 29.98 | 132 | 31.17 | 17.00 | 14.17 | 23.29 | 0.0227 |
| amd-dbgapi | 92.99 | 9.94 | 26.25 | 9.36 | 337 | 21.53 | 13.18 | 8.36 | 235.55 | 0.2300 |
| amd-llvm | 87.18 | 22.16 | 24.61 | 3.93 | 79 | 21.36 | 12.32 | 9.05 | 409.04 | 0.3995 |
| iree-compiler | 34.92 | 16.83 | 9.86 | 2.08 | 43 | 8.67 | 5.04 | 3.63 | 225.16 | 0.2199 |
| sysdeps | 19.75 | 108.75 | 5.58 | 0.18 | 62 | 6.59 | 4.23 | 2.36 | 455.30 | 0.4446 |
| unknown | 5.40 | 184.16 | 1.53 | 0.03 | 87 | 2.26 | 1.29 | 0.96 | 120.30 | 0.1175 |
| fftw3 | 17.21 | 3.18 | 4.86 | 5.41 | 132 | 1.34 | 0.63 | 0.71 | 24.11 | 0.0235 |
| rocprofiler-systems | 4.90 | 51.87 | 1.38 | 0.09 | 34 | 1.27 | 0.70 | 0.57 | 37.49 | 0.0366 |
| nlohmann-json | 3.39 | 99.50 | 0.96 | 0.03 | 66 | 0.69 | 0.33 | 0.36 | 22.21 | 0.0217 |
| miopen | 4.56 | 0.84 | 1.29 | 5.40 | 61 | 0.64 | 0.35 | 0.29 | 75.95 | 0.0742 |
| fusilliprovider | 1.38 | 82.61 | 0.39 | 0.02 | 18 | 0.59 | 0.37 | 0.22 | 99.29 | 0.0970 |
| support | 0.79 | 35.06 | 0.22 | 0.02 | 10 | 0.35 | 0.27 | 0.08 | 143.61 | 0.1402 |
| rocshmem | 0.81 | 3.73 | 0.23 | 0.22 | 35 | 0.35 | 0.24 | 0.10 | 117.13 | 0.1144 |
| base | 0.39 | 5.10 | 0.11 | 0.08 | 9 | 0.16 | 0.08 | 0.08 | 25.01 | 0.0244 |
| rccl | 0.33 | 164.37 | 0.09 | 0.00 | 66 | 0.15 | 0.07 | 0.08 | 77.61 | 0.0758 |
| sparse | 0.45 | 126.05 | 0.13 | 0.00 | 55 | 0.13 | 0.07 | 0.06 | 78.80 | 0.0770 |
| aqlprofile | 1.17 | 0.12 | 0.33 | 10.04 | 65 | 0.12 | 0.05 | 0.06 | 76.26 | 0.0745 |
| rand | 0.31 | 10.93 | 0.09 | 0.03 | 28 | 0.10 | 0.05 | 0.04 | 76.94 | 0.0751 |
| flatbuffers | 0.12 | 0.31 | 0.03 | 0.38 | 42 | 0.05 | 0.03 | 0.03 | 24.87 | 0.0243 |
| blas | 0.16 | 102.62 | 0.05 | 0.00 | 1 | 0.05 | 0.03 | 0.02 | 79.97 | 0.0781 |
| hipblasltprovider | 0.12 | 0.16 | 0.03 | 0.72 | 14 | 0.05 | 0.03 | 0.02 | 77.66 | 0.0758 |
| rocprofiler-compute | 0.11 | 1.58 | 0.03 | 0.07 | 2 | 0.03 | 0.02 | 0.01 | 76.50 | 0.0747 |
| rdc | 0.13 | 0.02 | 0.04 | 8.06 | 14 | 0.03 | 0.02 | 0.01 | 14.80 | 0.0145 |
| fft | 0.07 | 20.87 | 0.02 | 0.00 | 1 | 0.03 | 0.02 | 0.01 | 80.11 | 0.0782 |
| hipdnn-integration-tests | 0.07 | 0.01 | 0.02 | 5.86 | 10 | 0.03 | 0.01 | 0.02 | 21.77 | 0.0213 |
| rocdecode | 0.06 | 0.06 | 0.02 | 1.03 | 13 | 0.03 | 0.01 | 0.02 | 86.63 | 0.0846 |
| solver | 0.07 | 123.29 | 0.02 | 0.00 | 1 | 0.02 | 0.01 | 0.01 | 77.54 | 0.0757 |
| prim | 0.04 | 15.43 | 0.01 | 0.00 | 4 | 0.02 | 0.01 | 0.01 | 75.31 | 0.0735 |
| rocjpeg | 0.03 | 0.04 | 0.01 | 0.68 | 7 | 0.02 | 0.01 | 0.01 | 77.78 | 0.0760 |
| rocwmma | 0.03 | 0.10 | 0.01 | 0.25 | 1 | 0.01 | 0.01 | 0.01 | 79.13 | 0.0773 |
| hipdnn | 0.03 | 115.37 | 0.01 | 0.00 | 1 | 0.01 | 0.01 | 0.01 | 75.50 | 0.0737 |
| rocr-debug-agent | 0.03 | 11.15 | 0.01 | 0.00 | 3 | 0.01 | 0.01 | 0.00 | 77.64 | 0.0758 |
| rocr-debug-agent-tests | 0.03 | 0.06 | 0.01 | 0.51 | 1 | 0.01 | 0.01 | 0.00 | 77.73 | 0.0759 |
| libhipcxx | 0.01 | 0.02 | 0.00 | 0.63 | 1 | 0.01 | 0.00 | 0.00 | 78.00 | 0.0762 |
| fmt | 0.01 | 14.43 | 0.00 | 0.00 | 2 | 0.01 | 0.00 | 0.00 | 14.78 | 0.0144 |
| spdlog | 0.01 | 0.00 | 0.00 | 4.56 | 6 | 0.00 | 0.00 | 0.00 | 14.79 | 0.0144 |
| miopenprovider | 0.01 | 0.01 | 0.00 | 0.75 | 1 | 0.00 | 0.00 | 0.00 | 75.03 | 0.0733 |
| hipify | 0.01 | 0.01 | 0.00 | 0.77 | 1 | 0.00 | 0.00 | 0.00 | 22.57 | 0.0220 |
| composable-kernel | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-amdsmi | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-hip | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-hipinfo | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-hiptests | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-kpack | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-ocl | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-ocl-icd | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| core-runtime | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| elfio | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| hipdnn-samples | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| hipkernelprovider | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| host-suite-sparse | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| openmpi | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| rocgdb | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| rocrtst | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-amd-mesa | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-expat | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-gmp | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-hwloc | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-libmnl | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-libnl | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-libpciaccess | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-mpfr | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
| sysdeps-ncurses | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0000 |
max_rss_mb / max_rss_gb is the peak resident set size (RSS): the largest amount of physical RAM held by any single compiler or linker process of that component at any point (max_rss_gb = max_rss_mb / 1024). High RSS means memory-heavy compilation or linking. If RSS is high across many parallel jobs, CI can hit swapping, cache thrashing, or OOM kills, so high RSS often limits how much parallelism (-j) you can safely use.
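A rough way to turn max_rss into a -j ceiling, sketched under two simplifying assumptions of ours: every concurrently running job may reach the component's max_rss at the same moment, and a fixed fraction of RAM (the hypothetical `reserve_fraction`) stays free for the OS and page cache. `safe_jobs` is a hypothetical helper, not part of any build tool.

```python
import math


def safe_jobs(ram_gb: float, max_rss_gb: float, reserve_fraction: float = 0.2) -> int:
    """Upper bound on parallel jobs before peak RSS could exceed usable RAM.

    Worst-case assumption: every job hits max_rss_gb simultaneously.
    reserve_fraction of RAM is kept free for the OS and page cache.
    """
    if max_rss_gb <= 0:
        raise ValueError("max_rss_gb must be positive")
    usable_gb = ram_gb * (1.0 - reserve_fraction)
    # Always allow at least one job, even if a single job would not fit.
    return max(1, math.floor(usable_gb / max_rss_gb))


# max_rss_gb = max_rss_mb / 1024, e.g. rocprofiler-sdk: 1953.93 / 1024 ≈ 1.9081
print(safe_jobs(64, 1953.93 / 1024))  # → 26
```

For rocprofiler-sdk on a hypothetical 64 GB machine this gives 26 jobs; for amd-llvm (409.04 MB) it gives 128. Real jobs rarely peak together, so this is a conservative bound.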
wall_time_sum is the sum of wall times across all commands for that component. In parallel builds this can be far larger than the total build duration because many commands run concurrently.
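wall_time_span is the component’s elapsed span:
max(end_time) - min(start_time) across all commands belonging to that component. For a component whose commands run in one burst, this is close to “how long the build spent working on that component”, though it can still overlap with other components (parallelism). If the component’s commands are scattered across the build, the span also includes the idle gaps between them: nlohmann-json has only 3.39 minutes of wall_time_sum but a 99.50-minute span.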
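To make the difference between the two columns concrete, here is a small sketch that computes both from per-command (start, end) timestamps in seconds. The interval representation, the helper names, and the sample commands are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one command, in seconds


def wall_time_sum(cmds: Iterable[Interval]) -> float:
    """Sum of each command's own wall time; overlapping time is counted once per command."""
    return sum(end - start for start, end in cmds)


def wall_time_span(cmds: Iterable[Interval]) -> float:
    """max(end_time) - min(start_time): first start to last finish, gaps included."""
    cmds_list: List[Interval] = list(cmds)
    return max(end for _, end in cmds_list) - min(start for start, _ in cmds_list)


# Hypothetical component: two overlapping commands, then one after a long gap.
cmds = [(0.0, 60.0), (10.0, 70.0), (200.0, 230.0)]
print(wall_time_sum(cmds))   # 150.0
print(wall_time_span(cmds))  # 230.0
```

The overlap between the first two commands pushes their contribution to wall_time_sum (120 s) above their 70-second burst, while the gap before the third command pushes wall_time_span above wall_time_sum.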
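wall_time_est_elapsed is an estimated elapsed-time contribution computed from the average concurrency of the whole build (distinct from the per-component avg_concurrency column). Using timestamps we compute, over all commands in the build:
avg_concurrency_build = sum(real_s) / build_span_s
and then estimate:
wall_time_est_elapsed ≈ wall_time_sum / avg_concurrency_build.
Because the same divisor applies to every row, wall_time_sum / wall_time_est_elapsed is constant in the table: 231.01 / 65.20 ≈ 146.85 / 41.45 ≈ 3.54.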
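A minimal sketch of the estimate, assuming per-component wall_time_sum values in seconds and a known build span. The helper names and the three-component build are hypothetical; passing per-component sums instead of per-command real_s gives the same total.

```python
from typing import Dict, List


def avg_concurrency_build(real_s: List[float], build_span_s: float) -> float:
    """sum(real_s) / build_span_s over *all* commands of the build."""
    return sum(real_s) / build_span_s


def wall_time_est_elapsed(wall_time_sum_by_component: Dict[str, float],
                          concurrency: float) -> Dict[str, float]:
    """wall_time_est_elapsed ≈ wall_time_sum / avg_concurrency_build, per component."""
    return {c: s / concurrency for c, s in wall_time_sum_by_component.items()}


# Hypothetical build: per-component wall_time_sum (seconds), 200 s elapsed span.
sums = {"a": 450.0, "b": 90.0, "c": 60.0}
conc = avg_concurrency_build(list(sums.values()), build_span_s=200.0)
est = wall_time_est_elapsed(sums, conc)
print(conc)  # 3.0
print(est)   # {'a': 150.0, 'b': 30.0, 'c': 20.0}
```

Since every component is divided by the same constant, the estimates add up to the build span (150 + 30 + 20 = 200 s), which is what makes them comparable as shares of elapsed time.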
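avg_threads (not a column in the table above) is the ratio of CPU time to wall time, cpu_sum / wall_time_sum, i.e. how many CPU cores the processes kept busy on average. For rocprofiler-sdk it is 67.70 / 231.01 ≈ 0.29. A low value doesn’t by itself mean the build is slower; it means processes are spending time waiting (I/O, scheduling, throttling, contention), so CPU time isn’t keeping up with wall time.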
Case 1> avg_threads ≈ 1.0
Meaning: The process used about one CPU core for most of its runtime.
Case 2> avg_threads < 1.0
Meaning: The process spent a lot of time waiting, not computing (I/O, contention, throttling).
Case 3> avg_threads > 1.0
Meaning: The process used multiple CPU cores simultaneously (e.g. LTO backends, LLVM worker threads).
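The three cases can be checked mechanically. This sketch assumes avg_threads = cpu_sum / wall_time_sum (the table has no avg_threads column, so that definition is our assumption), and the ±0.1 band that counts as "≈ 1.0" is a hypothetical tolerance.

```python
def avg_threads(cpu_sum: float, wall_time_sum: float) -> float:
    """Average CPU cores kept busy: cpu_sum / wall_time_sum (assumed definition)."""
    return cpu_sum / wall_time_sum if wall_time_sum > 0 else 0.0


def classify(ratio: float, tol: float = 0.1) -> str:
    """Map avg_threads onto the three cases; `tol` is a hypothetical band around 1.0."""
    if abs(ratio - 1.0) <= tol:
        return "~1 core busy (Case 1)"
    if ratio < 1.0:
        return "waiting on I/O, contention, throttling (Case 2)"
    return "multiple cores busy (Case 3)"


# (cpu_sum, wall_time_sum) in minutes, copied from the table above
rows = {"rocprofiler-sdk": (67.70, 231.01), "host-blas": (31.17, 146.85)}
for name, (cpu, wall) in rows.items():
    r = avg_threads(cpu, wall)
    print(f"{name}: avg_threads={r:.2f} -> {classify(r)}")
```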
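avg_concurrency measures the average parallelism across the component's tool processes:
avg_concurrency ≈ wall_time_sum / wall_time_span.
It is fractional because concurrency ramps up and down during the build. Values well below 1.0 mean the component's commands were spread over a long span with idle gaps between them (e.g. rccl: 0.33 minutes of work across a 164.37-minute span gives 0.00).
peak_concurrency is the maximum overlap, i.e. the largest number of the component's tool processes running at the same time.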
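Both concurrency columns can be derived from the same (start, end) intervals. The sketch below uses a sweep line for peak_concurrency; the helper names mirror the column names but are hypothetical, and treating "one command ends exactly when another starts" as non-overlapping is our assumption about how the column was computed.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one command


def peak_concurrency(cmds: List[Interval]) -> int:
    """Maximum number of commands running at the same instant (sweep line)."""
    events: List[Tuple[float, int]] = []
    for start, end in cmds:
        events.append((start, 1))   # a command starts
        events.append((end, -1))    # a command ends
    # At equal timestamps, -1 sorts before +1, so back-to-back commands
    # are not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def avg_concurrency(cmds: List[Interval]) -> float:
    """wall_time_sum / wall_time_span for one component."""
    total = sum(end - start for start, end in cmds)
    span = max(end for _, end in cmds) - min(start for start, _ in cmds)
    return total / span if span > 0 else 0.0


cmds = [(0.0, 60.0), (10.0, 70.0), (20.0, 30.0), (70.0, 100.0)]
print(peak_concurrency(cmds))  # 3
print(avg_concurrency(cmds))   # 1.6
```

The sort is O(n log n) in the number of commands, which is cheap even for builds with tens of thousands of tool invocations.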
user_sum → CPU time spent in user mode, i.e. executing the tools' own code (compiler, linker, optimizer logic)
If this is high:
a) the build is compute-heavy
b) faster CPUs, fewer template instantiations, or fewer translation units (TUs) help
c) more parallelism may help if avg_threads > 1
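sys_sum → CPU time spent in kernel mode on behalf of the processes (system calls such as file opens, reads/writes, stats, process creation). Time spent blocked waiting for the disk is not counted here; it only shows up as wall time.
If this is high:
a) you’re often I/O- or syscall-bound
b) disk speed, filesystem, caching, or build directory layout matters
c) adding more CPUs will not help much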
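One way to spot syscall-heavy components is the kernel share of CPU time, sys_sum / (user_sum + sys_sum). The 0.3 cut-off below is a hypothetical threshold, not a standard value, and the helper names are illustrative.

```python
SYS_SHARE_THRESHOLD = 0.3  # hypothetical cut-off; tune for your machines


def sys_share(user_sum: float, sys_sum: float) -> float:
    """Fraction of CPU time spent in kernel mode: sys / (user + sys)."""
    cpu = user_sum + sys_sum
    return sys_sum / cpu if cpu > 0 else 0.0


def looks_syscall_heavy(user_sum: float, sys_sum: float) -> bool:
    """Heuristic flag: a large kernel share suggests filesystem/process overhead."""
    return sys_share(user_sum, sys_sum) >= SYS_SHARE_THRESHOLD


# (user_sum, sys_sum) in minutes, copied from the table above
rows = {"rocprofiler-sdk": (49.90, 17.80),
        "host-blas": (17.00, 14.17),
        "amd-llvm": (12.32, 9.05)}
for name, (user, sys_) in rows.items():
    print(f"{name}: sys share {sys_share(user, sys_):.0%}, "
          f"syscall-heavy={looks_syscall_heavy(user, sys_)}")
```

With these numbers, host-blas spends roughly 45% of its CPU time in the kernel versus about 26% for rocprofiler-sdk.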
cpu_sum → total CPU time = user_sum + sys_sum
It answers: how much CPU did this component cost overall?
It’s the best metric for:
a) capacity planning
b) CI cost estimation
c) "what’s expensive" comparisons between components
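A sketch of using cpu_sum for "what’s expensive" comparisons and a rough CI cost. The cpu_sum values come straight from the table (recomputing user_sum + sys_sum can differ in the last digit because of rounding); the per-core-hour price is a hypothetical placeholder, and the share is relative to the five listed components only.

```python
# cpu_sum (minutes) for the five most expensive components in the table above
CPU_SUM_MIN = {
    "rocprofiler-sdk": 67.70,
    "host-blas": 31.17,
    "amd-dbgapi": 21.53,
    "amd-llvm": 21.36,
    "iree-compiler": 8.67,
}
PRICE_PER_CORE_HOUR = 0.05  # hypothetical CI price (USD per core-hour)

# Rank components by CPU cost, most expensive first.
ranked = sorted(CPU_SUM_MIN.items(), key=lambda kv: kv[1], reverse=True)
total = sum(CPU_SUM_MIN.values())
for name, cpu_min in ranked:
    share = cpu_min / total                         # share among the listed components
    cost = cpu_min / 60.0 * PRICE_PER_CORE_HOUR     # minutes -> core-hours -> USD
    print(f"{name:16s} {cpu_min:7.2f} min  {share:6.1%}  ${cost:.4f}")
```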