TheRock Build Resource Observability Report

Build Resource Utilization Summary

All time columns are in minutes.

component wall_time_sum wall_time_span wall_time_est_elapsed avg_concurrency peak_concurrency cpu_sum user_sum sys_sum max_rss_mb max_rss_gb
rocprofiler-sdk 680.44 143.24 96.62 4.75 68 312.73 273.96 38.78 1953.85 1.9081
sysdeps 159.24 58.87 22.61 2.71 67 41.75 32.35 9.40 385.93 0.3769
host-blas 69.61 3.77 9.88 18.45 66 26.37 16.66 9.71 23.26 0.0227
amd-llvm 54.47 19.75 7.73 2.76 66 17.57 11.25 6.32 407.85 0.3983
rocshmem 28.19 3.03 4.00 9.31 52 11.37 8.89 2.49 366.24 0.3577
amd-dbgapi 22.26 9.00 3.16 2.47 57 8.61 4.50 4.11 234.76 0.2293
miopen 21.71 1.54 3.08 14.06 66 8.49 7.25 1.24 757.05 0.7393
iree-compiler 23.68 14.16 3.36 1.67 33 7.32 4.66 2.66 225.13 0.2199
unknown 3.55 152.65 0.50 0.02 62 1.85 1.24 0.62 120.32 0.1175
rocjpeg 1.29 0.30 0.18 4.25 7 1.21 1.09 0.12 179.77 0.1756
fftw3 4.77 2.29 0.68 2.08 62 1.09 0.66 0.42 24.09 0.0235
rocprofiler-systems 3.45 36.28 0.49 0.09 19 1.08 0.66 0.42 37.67 0.0368
nlohmann-json 2.69 51.23 0.38 0.05 62 0.72 0.48 0.23 223.16 0.2179
fusilliprovider 1.44 36.95 0.20 0.04 18 0.49 0.33 0.16 99.05 0.0967
support 0.49 43.37 0.07 0.01 10 0.25 0.20 0.06 117.15 0.1144
base 0.29 3.99 0.04 0.07 9 0.13 0.08 0.05 24.91 0.0243
rccl 0.27 132.90 0.04 0.00 66 0.12 0.07 0.05 77.77 0.0759
sparse 0.25 71.74 0.04 0.00 26 0.11 0.07 0.04 79.40 0.0775
rand 0.19 8.57 0.03 0.02 17 0.08 0.06 0.03 77.30 0.0755
aqlprofile 0.18 0.28 0.03 0.64 32 0.08 0.05 0.03 76.19 0.0744
flatbuffers 0.07 0.30 0.01 0.24 42 0.04 0.02 0.02 24.77 0.0242
blas 0.16 39.68 0.02 0.00 1 0.04 0.02 0.02 80.11 0.0782
hipblasltprovider 0.11 16.66 0.02 0.01 13 0.04 0.02 0.01 78.11 0.0763
rocprofiler-compute 0.10 1.15 0.01 0.09 2 0.03 0.02 0.01 76.61 0.0748
hipdnn-integration-tests 0.08 0.01 0.01 6.38 14 0.03 0.01 0.01 21.85 0.0213
rdc 0.10 0.01 0.01 6.46 15 0.03 0.02 0.01 14.79 0.0144
fft 0.07 33.03 0.01 0.00 1 0.03 0.02 0.01 80.10 0.0782
rocdecode 0.04 0.05 0.01 0.86 12 0.02 0.01 0.01 87.45 0.0854
solver 0.04 68.68 0.01 0.00 1 0.02 0.01 0.01 77.53 0.0757
prim 0.03 13.86 0.00 0.00 4 0.01 0.01 0.01 75.43 0.0737
hipdnn 0.03 63.15 0.00 0.00 1 0.01 0.01 0.00 75.30 0.0735
rocwmma 0.02 0.09 0.00 0.26 1 0.01 0.01 0.00 79.94 0.0781
rocr-debug-agent 0.02 10.09 0.00 0.00 3 0.01 0.01 0.00 77.37 0.0756
rocr-debug-agent-tests 0.02 0.05 0.00 0.50 1 0.01 0.01 0.00 77.79 0.0760
libhipcxx 0.01 0.02 0.00 0.61 1 0.01 0.00 0.00 78.14 0.0763
fmt 0.01 14.09 0.00 0.00 2 0.01 0.00 0.00 14.78 0.0144
spdlog 0.01 0.00 0.00 5.27 6 0.00 0.00 0.00 14.67 0.0143
miopenprovider 0.01 0.02 0.00 0.67 1 0.00 0.00 0.00 75.04 0.0733
hipify 0.01 0.01 0.00 0.80 1 0.00 0.00 0.00 22.48 0.0220
composable-kernel 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-amdsmi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hip 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hipinfo 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hiptests 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-kpack 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl-icd 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-runtime 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
elfio 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipdnn-samples 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipkernelprovider 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
host-suite-sparse 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
openmpi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocgdb 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocrtst 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-amd-mesa 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-expat 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-gmp 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-hwloc 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libmnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libpciaccess 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-mpfr 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-ncurses 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000

General FAQ

  1. rss = Resident Set Size

    rss is the amount of physical RAM a process actually occupies; this report records the peak value each process reached. High rss means memory-heavy compilation or linking. If rss is high across many parallel jobs, the build can hit swapping, cache thrashing, or OOM kills in CI, so high rss often limits how much parallelism (-j) you can safely use.

  2. max_rss_mb

    The peak rss, in MB, that a compiler or linker process of the component reached at any point during its run. max_rss_gb is the same value divided by 1024.
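
    To make the link between rss and -j concrete, here is a minimal sketch that bounds the job count by memory. The function name, the 20% headroom, and the 64 GiB runner size are illustrative assumptions and not part of the report; only the 1953.85 MB peak is taken from the table above (rocprofiler-sdk).

```python
import math


def max_safe_jobs(available_mb: float, peak_rss_mb: float, headroom: float = 0.2) -> int:
    """Upper bound on parallel jobs if every job hit peak_rss_mb at the same time.

    headroom is the fraction of RAM kept free for the OS and page cache
    (an assumed safety margin, not something the report measures).
    """
    if peak_rss_mb <= 0:
        raise ValueError("peak_rss_mb must be positive")
    usable_mb = available_mb * (1.0 - headroom)
    return max(1, math.floor(usable_mb / peak_rss_mb))


# Hypothetical 64 GiB runner, worst-case process from the table (rocprofiler-sdk).
jobs = max_safe_jobs(available_mb=64 * 1024, peak_rss_mb=1953.85)
print(f"memory-bound -j limit: {jobs}")
```

    This bound is deliberately pessimistic: most processes stay well below the component's peak, so treat the result as a floor for safe parallelism rather than a tuned value.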

  3. Wall time sum vs component span vs estimated elapsed

    wall_time_sum is the sum of wall times across all commands for that component. In parallel builds this can be far larger than the total build duration because many commands run concurrently.

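    wall_time_span is the component’s elapsed span: max(end_time) - min(start_time) across all commands belonging to that component. This is usually much closer to “how long the build spent working on that component”, but it has two caveats: spans of different components overlap because components build in parallel, and idle gaps between a component’s commands inflate its span (rccl above has 0.27 minutes of work spread over a 132.90-minute span).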

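    wall_time_est_elapsed is an estimated elapsed contribution computed from the average concurrency of the whole build, not from the per-component avg_concurrency column. We compute avg_concurrency_build = sum(real_s) / build_span_s using timestamps, and then estimate: wall_time_est_elapsed ≈ wall_time_sum / avg_concurrency_build. In this report the divisor is about 7.04 (rocprofiler-sdk: 680.44 / 96.62).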

    A component’s wall_time_sum can also be much larger than its cpu_sum. That doesn’t mean the build is slower; it means processes are spending time waiting (I/O, scheduling, throttling, contention), so CPU time isn’t keeping up with wall time.
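
    The three wall-time metrics can be sketched from raw command records as follows. The Command record and its field names are hypothetical; the report does not show the tool's actual data model, and avg_concurrency_build here denotes the build-wide average concurrency described above.

```python
from dataclasses import dataclass


@dataclass
class Command:
    component: str
    start_s: float  # wall-clock start, seconds
    end_s: float    # wall-clock end, seconds


def wall_time_metrics(commands: list[Command]) -> dict[str, dict[str, float]]:
    """Per-component wall_time_sum, wall_time_span and wall_time_est_elapsed (seconds)."""
    build_span_s = max(c.end_s for c in commands) - min(c.start_s for c in commands)
    if build_span_s <= 0:
        raise ValueError("build span must be positive")
    total_real_s = sum(c.end_s - c.start_s for c in commands)
    avg_concurrency_build = total_real_s / build_span_s

    by_component: dict[str, list[Command]] = {}
    for c in commands:
        by_component.setdefault(c.component, []).append(c)

    result: dict[str, dict[str, float]] = {}
    for name, cmds in by_component.items():
        wall_sum = sum(c.end_s - c.start_s for c in cmds)
        span = max(c.end_s for c in cmds) - min(c.start_s for c in cmds)
        result[name] = {
            "wall_time_sum": wall_sum,
            "wall_time_span": span,
            "wall_time_est_elapsed": wall_sum / avg_concurrency_build,
        }
    return result


# Made-up sample: two overlapping commands in "a", one longer command in "b".
sample = [
    Command("a", 0.0, 10.0),
    Command("a", 0.0, 10.0),
    Command("b", 5.0, 20.0),
]
metrics = wall_time_metrics(sample)
for name, m in metrics.items():
    print(name, {k: round(v, 2) for k, v in m.items()})
```

    A useful property of this estimate is that the per-component wall_time_est_elapsed values add up to the build span, which makes them easier to compare than raw wall_time_sum values.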

  4. What does Avg Threads (avg_threads) actually mean?

    Case 1: avg_threads ≈ 1.0

    Meaning: The process used about one CPU core for most of its runtime.

    Case 2: avg_threads < 1.0

    Meaning: The process spent much of its time waiting rather than computing (I/O, contention, throttling).

    Case 3: avg_threads > 1.0

    Meaning: The process used multiple CPU cores simultaneously (e.g. LTO backends, LLVM worker threads).
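
    The three cases can be illustrated with a short sketch. It assumes that avg_threads is a process's CPU time (user + sys) divided by its wall time; the report does not state the formula, so treat that definition, the 0.1 tolerance, and the sample timings as assumptions.

```python
def avg_threads(user_s: float, sys_s: float, real_s: float) -> float:
    """ASSUMED definition: CPU time divided by wall time for one process."""
    if real_s <= 0:
        raise ValueError("real_s must be positive")
    return (user_s + sys_s) / real_s


def classify(ratio: float, tolerance: float = 0.1) -> str:
    """Map the ratio onto the three cases; tolerance is an arbitrary cutoff."""
    if ratio < 1.0 - tolerance:
        return "mostly waiting"
    if ratio > 1.0 + tolerance:
        return "multi-core"
    return "about one core"


# Hypothetical timings in seconds: (user, sys, real).
examples = {
    "link step": (2.0, 1.0, 10.0),
    "single TU compile": (9.5, 0.4, 10.0),
    "LTO backend": (35.0, 1.0, 10.0),
}
for name, (user_s, sys_s, real_s) in examples.items():
    ratio = avg_threads(user_s, sys_s, real_s)
    print(f"{name}: avg_threads={ratio:.2f} ({classify(ratio)})")
```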

  5. avg_concurrency and peak_concurrency

    avg_concurrency measures average parallelism across tool processes for the component: avg_concurrency ≈ wall_time_sum / wall_time_span. It can be a decimal because concurrency ramps up/down during the build.

    peak_concurrency is the maximum overlap (most tool processes running at the same time) for that component.
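
    Both values can be derived from command start and end timestamps. The sketch below uses a standard sweep over start/end events to find the peak; the function name and the sample intervals are made up for illustration, and the report's own implementation may differ.

```python
def concurrency_stats(intervals: list[tuple[float, float]]) -> tuple[float, int]:
    """Return (avg_concurrency, peak_concurrency) for (start_s, end_s) intervals."""
    if not intervals:
        return 0.0, 0
    wall_time_sum = sum(end - start for start, end in intervals)
    span = max(end for _, end in intervals) - min(start for start, _ in intervals)
    avg = wall_time_sum / span if span > 0 else 0.0

    # Sweep line: +1 at each start, -1 at each end. Sorting places (t, -1)
    # before (t, +1), so a process that ends exactly when another starts
    # is not counted as overlapping with it.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return avg, peak


# Four made-up processes; three of them overlap between t=2 and t=3.
avg, peak = concurrency_stats([(0, 4), (1, 3), (2, 6), (6, 8)])
print(f"avg_concurrency={avg:.2f}, peak_concurrency={peak}")
```

    The gap between the two numbers is informative: a high peak with a low average, as in several rows of the table, suggests short bursts of parallel work separated by long idle stretches.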

  6. What do user_sum, sys_sum, and cpu_sum mean? (Note: all times are in minutes)

    user_sum → CPU time spent in user mode, running the tools’ own code (compiler, linker, optimizer logic)

    If this is high:

    a) the build is compute-heavy

    b) faster CPUs, fewer templates, or fewer TUs help

    c) more parallelism may help if avg_threads > 1

    sys_sum → time spent inside the operating system kernel

    If this is high:

    a) you’re often I/O-bound

    b) disk speed, filesystem, caching, or build directory layout matters

    c) adding more CPUs will not help much

    cpu_sum → total CPU time = user_sum + sys_sum

    It answers: how much CPU did this component cost overall?

    It’s the best metric for:

    a) capacity planning

    b) CI cost estimation

    c) "what’s expensive" comparisons between components
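
    As a worked example, the sketch below recomputes cpu_sum for three rows of the table and adds the kernel share of CPU time. The sys share figure and the helper name are additions for illustration; the report itself only lists the three sums.

```python
# (user_sum, sys_sum) in minutes, copied from the table above.
rows = {
    "rocprofiler-sdk": (273.96, 38.78),
    "host-blas": (16.66, 9.71),
    "amd-dbgapi": (4.50, 4.11),
}


def cpu_breakdown(user_min: float, sys_min: float) -> tuple[float, float]:
    """Return (cpu_sum in minutes, fraction of CPU time spent in the kernel)."""
    cpu_sum = user_min + sys_min
    sys_share = sys_min / cpu_sum if cpu_sum > 0 else 0.0
    return cpu_sum, sys_share


for name, (user_min, sys_min) in rows.items():
    cpu_sum, sys_share = cpu_breakdown(user_min, sys_min)
    print(f"{name}: cpu_sum={cpu_sum:.2f} min, sys share={sys_share:.0%}")
```

    Recomputed sums can differ from the table in the last digit because the table values are rounded. rocprofiler-sdk spends roughly an eighth of its CPU time in the kernel, while amd-dbgapi spends close to half, which points at I/O or process overhead rather than compilation as its main cost.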

  • Key mental model