TheRock Build Resource Observability Report

Build Resource Utilization Summary

component wall_time_sum_min wall_time_span_min wall_time_est_elapsed_min avg_concurrency peak_concurrency cpu_sum_min user_sum_min sys_sum_min max_rss_mb max_rss_gb
rocprofiler-sdk 231.01 173.83 65.20 1.33 67 67.70 49.90 17.80 1953.93 1.9081
host-blas 146.85 4.90 41.45 29.98 132 31.17 17.00 14.17 23.29 0.0227
amd-dbgapi 92.99 9.94 26.25 9.36 337 21.53 13.18 8.36 235.55 0.2300
amd-llvm 87.18 22.16 24.61 3.93 79 21.36 12.32 9.05 409.04 0.3995
iree-compiler 34.92 16.83 9.86 2.08 43 8.67 5.04 3.63 225.16 0.2199
sysdeps 19.75 108.75 5.58 0.18 62 6.59 4.23 2.36 455.30 0.4446
unknown 5.40 184.16 1.53 0.03 87 2.26 1.29 0.96 120.30 0.1175
fftw3 17.21 3.18 4.86 5.41 132 1.34 0.63 0.71 24.11 0.0235
rocprofiler-systems 4.90 51.87 1.38 0.09 34 1.27 0.70 0.57 37.49 0.0366
nlohmann-json 3.39 99.50 0.96 0.03 66 0.69 0.33 0.36 22.21 0.0217
miopen 4.56 0.84 1.29 5.40 61 0.64 0.35 0.29 75.95 0.0742
fusilliprovider 1.38 82.61 0.39 0.02 18 0.59 0.37 0.22 99.29 0.0970
support 0.79 35.06 0.22 0.02 10 0.35 0.27 0.08 143.61 0.1402
rocshmem 0.81 3.73 0.23 0.22 35 0.35 0.24 0.10 117.13 0.1144
base 0.39 5.10 0.11 0.08 9 0.16 0.08 0.08 25.01 0.0244
rccl 0.33 164.37 0.09 0.00 66 0.15 0.07 0.08 77.61 0.0758
sparse 0.45 126.05 0.13 0.00 55 0.13 0.07 0.06 78.80 0.0770
aqlprofile 1.17 0.12 0.33 10.04 65 0.12 0.05 0.06 76.26 0.0745
rand 0.31 10.93 0.09 0.03 28 0.10 0.05 0.04 76.94 0.0751
flatbuffers 0.12 0.31 0.03 0.38 42 0.05 0.03 0.03 24.87 0.0243
blas 0.16 102.62 0.05 0.00 1 0.05 0.03 0.02 79.97 0.0781
hipblasltprovider 0.12 0.16 0.03 0.72 14 0.05 0.03 0.02 77.66 0.0758
rocprofiler-compute 0.11 1.58 0.03 0.07 2 0.03 0.02 0.01 76.50 0.0747
rdc 0.13 0.02 0.04 8.06 14 0.03 0.02 0.01 14.80 0.0145
fft 0.07 20.87 0.02 0.00 1 0.03 0.02 0.01 80.11 0.0782
hipdnn-integration-tests 0.07 0.01 0.02 5.86 10 0.03 0.01 0.02 21.77 0.0213
rocdecode 0.06 0.06 0.02 1.03 13 0.03 0.01 0.02 86.63 0.0846
solver 0.07 123.29 0.02 0.00 1 0.02 0.01 0.01 77.54 0.0757
prim 0.04 15.43 0.01 0.00 4 0.02 0.01 0.01 75.31 0.0735
rocjpeg 0.03 0.04 0.01 0.68 7 0.02 0.01 0.01 77.78 0.0760
rocwmma 0.03 0.10 0.01 0.25 1 0.01 0.01 0.01 79.13 0.0773
hipdnn 0.03 115.37 0.01 0.00 1 0.01 0.01 0.01 75.50 0.0737
rocr-debug-agent 0.03 11.15 0.01 0.00 3 0.01 0.01 0.00 77.64 0.0758
rocr-debug-agent-tests 0.03 0.06 0.01 0.51 1 0.01 0.01 0.00 77.73 0.0759
libhipcxx 0.01 0.02 0.00 0.63 1 0.01 0.00 0.00 78.00 0.0762
fmt 0.01 14.43 0.00 0.00 2 0.01 0.00 0.00 14.78 0.0144
spdlog 0.01 0.00 0.00 4.56 6 0.00 0.00 0.00 14.79 0.0144
miopenprovider 0.01 0.01 0.00 0.75 1 0.00 0.00 0.00 75.03 0.0733
hipify 0.01 0.01 0.00 0.77 1 0.00 0.00 0.00 22.57 0.0220
composable-kernel 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-amdsmi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hip 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hipinfo 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hiptests 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-kpack 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl-icd 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-runtime 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
elfio 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipdnn-samples 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipkernelprovider 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
host-suite-sparse 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
openmpi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocgdb 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocrtst 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-amd-mesa 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-expat 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-gmp 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-hwloc 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libmnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libpciaccess 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-mpfr 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-ncurses 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000

General FAQ

  1. rss = Resident Set Size

    RSS is the amount of physical RAM a process actually holds in memory; this report records the peak value per process. High RSS indicates memory-heavy compilation or linking. If RSS is high across many parallel jobs, the build can hit swapping, cache thrashing, or OOM kills in CI, so high RSS often limits how much parallelism (-j) you can safely use.

  2. max_rss_mb

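    The peak resident set size, in MB, that any single compiler or linker process in the component reached at any point. max_rss_gb is the same value expressed in GB (max_rss_mb / 1024).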
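    As a rough illustration of how peak RSS bounds parallelism, the sketch below converts max_rss_mb to GB (the table values are consistent with dividing by 1024) and derives a worst-case, memory-safe job count. The function names, the RAM budget, and the headroom factor are hypothetical and are not part of the report tooling.

```python
def rss_mb_to_gb(max_rss_mb: float) -> float:
    """Convert MB to GB using 1024 MB per GB, as the table does."""
    return max_rss_mb / 1024.0


def memory_safe_jobs(ram_budget_gb: float, max_rss_mb: float,
                     headroom: float = 0.8) -> int:
    """Worst-case estimate of a safe -j value.

    Assumes every parallel job reaches the component's peak RSS at the
    same time, and that only `headroom` of the RAM budget is usable.
    Both are simplifying assumptions for illustration.
    """
    if max_rss_mb <= 0:
        raise ValueError("max_rss_mb must be positive")
    usable_gb = ram_budget_gb * headroom
    jobs = int(usable_gb // rss_mb_to_gb(max_rss_mb))
    return max(1, jobs)  # always allow at least one job


# rocprofiler-sdk row: max_rss_mb = 1953.93
print(round(rss_mb_to_gb(1953.93), 4))
print(memory_safe_jobs(ram_budget_gb=64, max_rss_mb=1953.93))
```

    Because most processes stay well below the peak, this estimate is deliberately pessimistic; it is a lower bound for experimentation rather than a tuned value.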

  3. Wall time sum vs component span vs estimated elapsed

    wall_time_sum is the sum of wall times across all commands for that component. In parallel builds this can be far larger than the total build duration because many commands run concurrently.

    wall_time_span_min is the component’s elapsed span: max(end_time) - min(start_time) across all commands belonging to that component. This is usually much closer to “how long the build spent working on that component”, though it can still overlap with other components (parallelism).

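    wall_time_est_elapsed_min is an estimated elapsed contribution computed from the average concurrency of the whole build: we compute avg_concurrency_build = sum(real_s) / build_span_s using timestamps, and then estimate wall_time_est_elapsed ≈ wall_time_sum / avg_concurrency_build. Note that avg_concurrency_build is a build-wide figure and is distinct from the per-component avg_concurrency column.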

    When a component’s wall_time_sum is much larger than its cpu_sum, that doesn’t mean the build is slower; it means processes are spending time waiting (I/O, scheduling, throttling, contention), so CPU time isn’t keeping up with wall time.
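    The three wall-time figures can be sketched from per-command timestamps as follows. The Command record and its field names are hypothetical stand-ins for whatever the build log actually provides; the build-wide average concurrency is called avg_concurrency_build in the sketch.

```python
from dataclasses import dataclass


@dataclass
class Command:
    component: str
    start_s: float  # start timestamp in seconds
    end_s: float    # end timestamp in seconds

    @property
    def real_s(self) -> float:
        """Wall time of this single command."""
        return self.end_s - self.start_s


def wall_time_metrics(commands: list[Command], component: str) -> dict[str, float]:
    """Return the three wall-time figures for one component, in minutes."""
    mine = [c for c in commands if c.component == component]
    zero = {"wall_time_sum_min": 0.0, "wall_time_span_min": 0.0,
            "wall_time_est_elapsed_min": 0.0}
    if not mine:
        return zero

    # Build-wide span and average concurrency, over all components.
    build_span_s = max(c.end_s for c in commands) - min(c.start_s for c in commands)
    if build_span_s <= 0:
        return zero
    avg_concurrency_build = sum(c.real_s for c in commands) / build_span_s

    wall_sum_s = sum(c.real_s for c in mine)
    span_s = max(c.end_s for c in mine) - min(c.start_s for c in mine)
    return {
        "wall_time_sum_min": wall_sum_s / 60.0,
        "wall_time_span_min": span_s / 60.0,
        "wall_time_est_elapsed_min": wall_sum_s / avg_concurrency_build / 60.0,
    }


cmds = [Command("a", 0, 60), Command("a", 0, 60), Command("b", 60, 120)]
print(wall_time_metrics(cmds, "a"))
```

    With this definition the per-component estimates add up to the build span, which is what makes wall_time_est_elapsed useful as a share of total build time.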

  4. What does Avg Threads (avg_threads) actually mean?

    Case 1: avg_threads ≈ 1.0

    The process used about one CPU core for most of its runtime.

    Case 2: avg_threads < 1.0

    The process spent much of its time waiting rather than computing (I/O, contention, throttling).

    Case 3: avg_threads > 1.0

    The process used multiple CPU cores simultaneously (e.g. LTO backends, LLVM worker threads).
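    The three cases can be illustrated with a small sketch. It assumes avg_threads is computed per process as CPU time divided by wall time, (user + sys) / real; the report does not state the formula, so treat that definition, the function names, and the 10% tolerance band as assumptions.

```python
def avg_threads(user_s: float, sys_s: float, real_s: float) -> float:
    """Assumed definition: total CPU seconds divided by wall seconds."""
    if real_s <= 0:
        return 0.0
    return (user_s + sys_s) / real_s


def classify(value: float, tol: float = 0.1) -> str:
    """Map an avg_threads value onto the three cases described above."""
    if value > 1.0 + tol:
        return "multi-core"   # Case 3: several cores used at once
    if value < 1.0 - tol:
        return "waiting"      # Case 2: mostly blocked, not computing
    return "single-core"      # Case 1: roughly one busy core


# A process with 3 s user + 1 s sys CPU time over 2 s of wall time
print(classify(avg_threads(user_s=3.0, sys_s=1.0, real_s=2.0)))
```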

  5. avg_concurrency and peak_concurrency

    avg_concurrency measures average parallelism across tool processes for the component: avg_concurrency ≈ wall_time_sum / wall_time_span. It can be a decimal because concurrency ramps up/down during the build.

    peak_concurrency is the maximum overlap (most tool processes running at the same time) for that component.
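    One way to derive both figures from a component's command intervals is a sweep over start and end events, sketched below. The input format, a list of (start_s, end_s) tuples, and the function name are assumptions for illustration; the report's own implementation may differ.

```python
def concurrency_stats(intervals: list[tuple[float, float]]) -> tuple[float, int]:
    """Return (avg_concurrency, peak_concurrency) for one component.

    intervals: (start_s, end_s) pairs, one per tool process.
    """
    if not intervals:
        return 0.0, 0

    # avg_concurrency ≈ wall_time_sum / wall_time_span
    wall_sum = sum(end - start for start, end in intervals)
    span = max(end for _, end in intervals) - min(start for start, _ in intervals)
    avg = wall_sum / span if span > 0 else 0.0

    # Sweep: +1 at each start, -1 at each end. At equal timestamps the
    # -1 sorts first, so back-to-back commands do not count as overlapping.
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort(key=lambda e: (e[0], e[1]))

    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return avg, peak


print(concurrency_stats([(0, 10), (0, 10), (10, 20)]))
```

    In the example, two commands overlap for the first 10 seconds and a third runs alone afterwards, so the peak is 2 while the average over the 20-second span is 1.5.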

  6. What do user_sum, sys_sum, and cpu_sum mean? (Note: all times are in minutes)

    user_sum → time spent executing user-space code (compiler, linker, optimizer logic)

    If this is high:

    a) the build is compute-heavy

    b) faster CPUs, fewer templates, or fewer translation units (TUs) help

    c) more parallelism may help if avg_threads > 1

    sys_sum → time spent inside the operating system kernel

    If this is high:

    a) you’re often I/O-bound

    b) disk speed, filesystem, caching, or build directory layout matters

    c) adding more CPUs will not help much

    cpu_sum → total CPU time = user_sum_min + sys_sum_min

    It answers the question: how much CPU did this component cost overall? It’s the best metric for:

    a) capacity planning

    b) CI cost estimation

    c) "what’s expensive" comparisons between components
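    To make the relationship concrete, the sketch below recomputes cpu_sum from the user and sys columns and reports the share of CPU time spent in the kernel. The function name and the idea of using the kernel share as a rough I/O indicator are illustrative assumptions; the input values are taken from the table.

```python
def cpu_breakdown(user_sum_min: float, sys_sum_min: float) -> tuple[float, float]:
    """Return (cpu_sum_min, sys_share), where sys_share is sys / cpu."""
    cpu_sum_min = user_sum_min + sys_sum_min
    sys_share = sys_sum_min / cpu_sum_min if cpu_sum_min > 0 else 0.0
    return cpu_sum_min, sys_share


# (user_sum, sys_sum) in minutes, from the table above
rows = {
    "rocprofiler-sdk": (49.90, 17.80),
    "host-blas": (17.00, 14.17),
}
for name, (user_min, sys_min) in rows.items():
    cpu_min, share = cpu_breakdown(user_min, sys_min)
    print(f"{name}: cpu_sum={cpu_min:.2f} min, sys share={share:.1%}")
```

    For these two rows the kernel share comes out at roughly 26% for rocprofiler-sdk and 45% for host-blas, which, by the guidance above, would make host-blas the more likely candidate for I/O or filesystem tuning.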

  • Key mental model