TheRock Build Resource Observability Report

Build Resource Utilization Summary

component wall_time_sum_min wall_time_span_min wall_time_est_elapsed_min avg_concurrency peak_concurrency cpu_sum_min user_sum_min sys_sum_min max_rss_mb max_rss_gb
host-blas 304.99 5.40 71.23 56.50 132 35.80 16.13 19.67 45.19 0.0441
amd-llvm 150.54 20.98 35.16 7.17 67 21.84 11.57 10.27 408.46 0.3989
rocprofiler-sdk 42.94 135.57 10.03 0.32 51 20.88 16.19 4.70 1956.36 1.9105
amd-dbgapi 29.52 9.39 6.89 3.14 125 9.54 5.30 4.23 240.38 0.2347
iree-compiler 22.27 13.73 5.20 1.62 38 8.02 5.00 3.02 225.50 0.2202
sysdeps 13.34 23.84 3.11 0.56 63 4.68 2.72 1.97 132.89 0.1298
fftw3 45.82 3.35 10.70 13.66 66 2.27 0.54 1.74 24.07 0.0235
unknown 5.52 146.59 1.29 0.04 63 2.08 1.25 0.83 120.31 0.1175
rocprofiler-systems 3.25 36.02 0.76 0.09 23 1.16 0.70 0.46 37.61 0.0367
nlohmann-json 3.81 14.58 0.89 0.26 66 0.64 0.33 0.32 22.17 0.0217
miopen 1.60 0.74 0.37 2.16 22 0.61 0.36 0.25 75.89 0.0741
fusilliprovider 1.33 0.81 0.31 1.64 18 0.53 0.36 0.17 99.22 0.0969
rocshmem 0.73 2.49 0.17 0.29 26 0.33 0.25 0.08 117.04 0.1143
support 1.04 20.26 0.24 0.05 10 0.28 0.20 0.08 117.33 0.1146
base 0.33 5.63 0.08 0.06 10 0.14 0.08 0.06 25.12 0.0245
rccl 0.41 127.09 0.10 0.00 66 0.14 0.07 0.07 78.11 0.0763
aqlprofile 1.57 0.12 0.37 13.28 66 0.14 0.05 0.08 76.11 0.0743
sparse 0.40 50.97 0.09 0.01 55 0.12 0.07 0.05 79.94 0.0781
rand 0.33 10.36 0.08 0.03 28 0.10 0.05 0.04 77.27 0.0755
flatbuffers 0.08 0.29 0.02 0.29 42 0.05 0.03 0.02 24.87 0.0243
blas 0.12 29.30 0.03 0.00 1 0.05 0.03 0.02 79.83 0.0780
hipblasltprovider 0.10 0.08 0.02 1.31 16 0.04 0.03 0.01 77.47 0.0757
rocprofiler-compute 0.08 0.22 0.02 0.38 2 0.03 0.02 0.01 76.93 0.0751
hipdnn-integration-tests 0.08 0.01 0.02 7.07 12 0.03 0.01 0.01 22.28 0.0218
fft 0.08 2.05 0.02 0.04 1 0.03 0.02 0.01 79.82 0.0780
rdc 0.09 0.01 0.02 6.76 13 0.03 0.02 0.01 14.79 0.0144
rocdecode 0.05 0.05 0.01 0.91 13 0.03 0.01 0.01 87.44 0.0854
solver 0.08 49.66 0.02 0.00 2 0.02 0.01 0.01 77.42 0.0756
prim 0.04 14.33 0.01 0.00 4 0.02 0.01 0.01 75.10 0.0733
rocjpeg 0.03 0.04 0.01 0.67 7 0.01 0.01 0.01 77.39 0.0756
hipdnn 0.03 41.62 0.01 0.00 1 0.01 0.01 0.00 75.12 0.0734
rocwmma 0.03 0.11 0.01 0.24 1 0.01 0.01 0.00 79.45 0.0776
rocr-debug-agent 0.03 10.52 0.01 0.00 3 0.01 0.01 0.00 77.45 0.0756
rocr-debug-agent-tests 0.03 0.05 0.01 0.52 1 0.01 0.01 0.00 77.79 0.0760
libhipcxx 0.01 0.02 0.00 0.63 1 0.01 0.00 0.00 78.14 0.0763
fmt 0.01 14.53 0.00 0.00 2 0.01 0.00 0.00 14.70 0.0144
spdlog 0.01 0.00 0.00 4.80 6 0.00 0.00 0.00 14.72 0.0144
miopenprovider 0.01 0.02 0.00 0.59 1 0.00 0.00 0.00 75.11 0.0734
hipify 0.01 0.01 0.00 0.77 1 0.00 0.00 0.00 22.38 0.0219
composable-kernel 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-amdsmi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hip 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hipinfo 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hiptests 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-kpack 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl-icd 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-runtime 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
elfio 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipdnn-samples 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipkernelprovider 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
host-suite-sparse 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
openmpi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocgdb 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocrtst 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-amd-mesa 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-expat 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-gmp 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-hwloc 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libmnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libpciaccess 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-mpfr 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-ncurses 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000

General FAQ

  1. rss = Resident Set Size

    Resident set size is the amount of physical RAM a process actually holds in memory; the values in this report are each process's peak. High rss means memory-heavy compilation or linking. If rss is high across many parallel jobs, CI machines can start swapping, thrashing caches, or OOM-killing jobs. High rss therefore often limits how much parallelism (-j) you can safely use.
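    The -j limit mentioned above can be turned into a rough upper bound. This is a sketch under stated assumptions: the worst case where every parallel job reaches its peak rss at the same moment, a hypothetical 80% headroom factor, and a hypothetical helper name (max_safe_jobs); none of these come from the report itself.

```python
import math


def max_safe_jobs(available_ram_mb: float, peak_rss_mb: float, headroom: float = 0.8) -> int:
    """Hypothetical helper: largest -j value that keeps the worst case
    (every job at its peak rss at the same moment) within `headroom` of RAM."""
    if peak_rss_mb <= 0:
        raise ValueError("peak_rss_mb must be positive")
    # Leave (1 - headroom) of RAM for the OS, page cache and other processes.
    budget_mb = available_ram_mb * headroom
    # Always allow at least one job, even if a single job exceeds the budget.
    return max(1, math.floor(budget_mb / peak_rss_mb))


# rocprofiler-sdk's peak from the table, on a hypothetical 64 GiB (65536 MB) runner.
print(max_safe_jobs(65536, 1956.36))  # → 26
```

    Real peaks rarely align perfectly, so this bound is conservative; it is most useful for the few components whose max_rss_mb is far above the rest.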

  2. max_rss_mb

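    The largest peak rss, in megabytes, of any single compiler, linker, or other tool process in the component, i.e. the most physical RAM that process held resident at any point. max_rss_gb is the same value divided by 1024 (e.g. 1956.36 MB → 1.9105 GB for rocprofiler-sdk).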
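    A minimal sketch of how per-process peaks could be rolled up into the two max_rss columns. It assumes the raw per-process values are in KiB, which is how Linux reports ru_maxrss; the report does not say how it collects rss, and the helper name is hypothetical.

```python
from collections import defaultdict


def max_rss_by_component(records):
    """records: iterable of (component, peak_rss_kb) pairs, one per tool process.

    Assumes per-process peaks are in KiB, as Linux reports ru_maxrss.
    Returns {component: (max_rss_mb, max_rss_gb)}.
    """
    peak_kb = defaultdict(int)
    for component, rss_kb in records:
        # Keep only the single largest process peak per component.
        peak_kb[component] = max(peak_kb[component], rss_kb)
    return {
        component: (round(kb / 1024, 2), round(kb / 1024 / 1024, 4))
        for component, kb in peak_kb.items()
    }


print(max_rss_by_component([("rocprofiler-sdk", 1_000_000), ("rocprofiler-sdk", 2003313)]))
```

    Taking the max (not the sum) is what makes the column a per-process ceiling; total memory pressure also depends on how many of those processes run at once (peak_concurrency).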

  3. Wall time sum vs component span vs estimated elapsed

    wall_time_sum is the sum of wall times across all commands for that component. In parallel builds this can be far larger than the total build duration because many commands run concurrently.

    wall_time_span_min is the component’s elapsed span: max(end_time) - min(start_time) across all commands belonging to that component. This is usually much closer to “how long the build spent working on that component”, though it can still overlap with other components (parallelism).

    wall_time_est_elapsed_min is an estimated elapsed contribution computed from build-wide average concurrency: using the command timestamps, we compute avg_concurrency_build = sum(real_s) / build_span_s over all commands in the build, and then estimate wall_time_est_elapsed ≈ wall_time_sum / avg_concurrency_build.

    When wall time is much larger than CPU time (compare wall_time_sum with cpu_sum), that doesn’t necessarily mean the build is slower; it means processes are spending time waiting (I/O, scheduling, throttling, contention), so CPU time isn’t keeping up with wall time.
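    The three definitions above can be sketched as follows. The Command record and component_times helper are hypothetical, timestamps are assumed to be in seconds, and real_s is derived from end minus start as a simplification; results are converted to minutes to match the table.

```python
from dataclasses import dataclass


@dataclass
class Command:
    component: str
    start_s: float  # wall-clock start timestamp, in seconds
    end_s: float    # wall-clock end timestamp, in seconds

    @property
    def real_s(self) -> float:
        # Simplification: real time derived from the timestamps.
        return self.end_s - self.start_s


def component_times(commands, component):
    """Return (wall_time_sum, wall_time_span, wall_time_est_elapsed) in minutes."""
    mine = [c for c in commands if c.component == component]
    if not mine:
        raise ValueError(f"no commands for component {component!r}")
    wall_time_sum = sum(c.real_s for c in mine)
    wall_time_span = max(c.end_s for c in mine) - min(c.start_s for c in mine)
    # Build-wide concurrency uses every command, not just this component's.
    build_span_s = max(c.end_s for c in commands) - min(c.start_s for c in commands)
    avg_concurrency_build = sum(c.real_s for c in commands) / build_span_s
    wall_time_est_elapsed = wall_time_sum / avg_concurrency_build
    return wall_time_sum / 60, wall_time_span / 60, wall_time_est_elapsed / 60


cmds = [Command("a", 0, 60), Command("a", 0, 60), Command("b", 60, 120)]
print(component_times(cmds, "a"))
```

    Because every row is divided by the same build-wide factor, wall_time_sum / wall_time_est_elapsed is constant across the table: about 4.28 for this run (304.99 / 71.23 ≈ 150.54 / 35.16 ≈ 4.28).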

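  4. What does avg_threads (average threads) actually mean?

    avg_threads compares a process's CPU time with its wall time, roughly (user + sys) / real, so it indicates how many cores the process kept busy on average.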

    Case 1: avg_threads ≈ 1.0

    Meaning: the process used about one CPU core for most of its runtime.

    Case 2: avg_threads < 1.0

    Meaning: the process spent a lot of time waiting rather than computing (I/O, contention, throttling).

    Case 3: avg_threads > 1.0

    Meaning: the process used multiple CPU cores simultaneously (e.g. LTO backends, LLVM worker threads).
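    A small sketch of the three cases. It assumes avg_threads is CPU time divided by wall time for a single process, and the 0.1 tolerance for "≈ 1.0" is an arbitrary illustrative choice; both helpers are hypothetical.

```python
def avg_threads(user_s: float, sys_s: float, real_s: float) -> float:
    """Assumed definition: CPU time (user + sys) divided by wall time."""
    if real_s <= 0:
        raise ValueError("real_s must be positive")
    return (user_s + sys_s) / real_s


def classify(threads: float, tol: float = 0.1) -> str:
    """Map a value onto the three cases above; `tol` is an arbitrary choice."""
    if abs(threads - 1.0) <= tol:
        return "case 1: about one core"
    if threads < 1.0:
        return "case 2: mostly waiting"
    return "case 3: multiple cores"


print(classify(avg_threads(user_s=9.0, sys_s=0.5, real_s=10.0)))
```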

  5. avg_concurrency and peak_concurrency

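    avg_concurrency measures the average parallelism across tool processes for the component: avg_concurrency ≈ wall_time_sum / wall_time_span (for example, host-blas: 304.99 / 5.40 ≈ 56.5). It is usually fractional because concurrency ramps up and down during the build. This is a per-component value, distinct from the build-wide average concurrency used to compute wall_time_est_elapsed.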

    peak_concurrency is the maximum overlap (most tool processes running at the same time) for that component.
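    Both metrics can be computed from one component's (start, end) intervals. The sketch below uses a standard sweep line for the peak; the function name and input shape are assumptions, not the report's actual implementation.

```python
def concurrency(intervals):
    """intervals: list of (start_s, end_s) pairs for one component's tool processes.

    Returns (avg_concurrency, peak_concurrency).
    """
    if not intervals:
        return 0.0, 0
    wall_time_sum = sum(end - start for start, end in intervals)
    span = max(end for _, end in intervals) - min(start for start, _ in intervals)
    avg = wall_time_sum / span if span > 0 else 0.0
    # Sweep line: +1 at each start, -1 at each end.
    events = [(start, 1) for start, _ in intervals] + [(end, -1) for _, end in intervals]
    # At equal timestamps, -1 sorts before +1, so back-to-back processes
    # (one ends exactly when the next starts) are not counted as overlapping.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return avg, peak


print(concurrency([(0, 10), (0, 10), (5, 15), (10, 20)]))  # → (2.0, 3)
```

    The gap between the two numbers is informative: host-blas averages about 56.5 but peaks at 132, so its processes arrive in bursts rather than at a steady rate.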

  6. What do user_sum, sys_sum, and cpu_sum mean? (Note: all times are in minutes.)

    user_sum → CPU time spent in user mode, i.e. executing the tools’ own code (compiler, linker, optimizer logic)

    If this is high:

    a) the build is compute-heavy

    b) faster CPUs, fewer template instantiations, or fewer translation units (TUs) help

    c) more parallelism may help if avg_threads > 1

    sys_sum → CPU time spent inside the operating system kernel on the process’s behalf (system calls for file I/O, memory mapping, process creation)

    If this is high:

    a) you’re often I/O-bound

    b) disk speed, filesystem, caching, or build directory layout matters

    c) adding more CPUs will not help much

    cpu_sum → total CPU time = user_sum_min + sys_sum_min

    It answers “How much CPU did this component cost overall?” and is the best metric for:

    a) capacity planning

    b) CI cost estimation

    c) "what’s expensive" comparisons between components
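    As a worked check of these definitions, the sketch below recomputes cpu_sum from the user_sum and sys_sum columns for four rows of the summary table and adds a sys share (sys_sum / cpu_sum) as a rough I/O-pressure signal. The sys-share metric and the summarize helper are illustrative additions, not part of the report.

```python
# user_sum and sys_sum (minutes) for four rows of the summary table above.
ROWS = {
    "host-blas": (16.13, 19.67),
    "amd-llvm": (11.57, 10.27),
    "rocprofiler-sdk": (16.19, 4.70),
    "fftw3": (0.54, 1.74),
}


def summarize(rows):
    """Return (component, cpu_sum, sys_share) tuples sorted by cpu_sum, largest first."""
    out = []
    for name, (user, sys_) in rows.items():
        cpu = user + sys_  # cpu_sum = user_sum + sys_sum
        share = sys_ / cpu if cpu else 0.0  # fraction of CPU time spent in the kernel
        out.append((name, round(cpu, 2), round(share, 2)))
    out.sort(key=lambda row: row[1], reverse=True)
    return out


for row in summarize(ROWS):
    print(row)
```

    rocprofiler-sdk comes out at 20.89 rather than the table’s 20.88, presumably because the report sums unrounded values. fftw3 (0.76) and host-blas (0.55) have the highest sys share of the four, which by the guidance above points more toward I/O and filesystem cost than raw compute.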


  • Key mental model