TheRock Build Resource Observability Report

Build Resource Utilization Summary

component wall_time_sum (minutes) wall_time_span (minutes) wall_time_est_elapsed (minutes) avg_concurrency peak_concurrency cpu_sum (minutes) user_sum (minutes) sys_sum (minutes) max_rss_mb max_rss_gb
rocprofiler-sdk 238.64 181.76 88.25 1.31 86 177.97 152.72 25.25 440.84 0.4305
host-blas 67.25 3.65 24.87 18.44 66 25.78 16.44 9.33 23.22 0.0227
sysdeps 44.40 24.66 16.42 1.80 67 19.23 14.50 4.73 386.68 0.3776
amd-llvm 52.28 24.14 19.34 2.17 80 17.83 11.68 6.16 408.41 0.3988
rocshmem 29.79 2.80 11.02 10.64 54 11.90 9.30 2.60 365.95 0.3574
miopen 12.51 0.64 4.63 19.61 66 10.84 8.71 2.14 323.88 0.3163
amd-dbgapi 25.88 8.98 9.57 2.88 87 8.24 4.35 3.89 236.95 0.2314
iree-compiler 24.65 14.95 9.11 1.65 29 7.16 4.60 2.55 225.07 0.2198
nlohmann-json 4.93 16.93 1.82 0.29 65 2.90 2.25 0.66 223.31 0.2181
fusilliprovider 3.44 2.91 1.27 1.18 24 2.11 1.92 0.19 804.33 0.7855
unknown 3.76 191.36 1.39 0.02 57 1.91 1.30 0.61 120.28 0.1175
fftw3 4.60 2.23 1.70 2.06 62 1.09 0.65 0.44 23.95 0.0234
rocprofiler-systems 2.92 28.58 1.08 0.10 20 1.03 0.63 0.40 37.52 0.0366
rocjpeg 0.55 0.16 0.20 3.36 7 0.48 0.43 0.05 179.77 0.1756
support 0.50 23.94 0.19 0.02 10 0.24 0.19 0.05 117.08 0.1143
hipdnn-integration-tests 0.35 0.03 0.13 11.04 15 0.16 0.12 0.04 77.88 0.0761
base 0.31 3.88 0.12 0.08 11 0.12 0.08 0.05 25.08 0.0245
rccl 0.26 164.56 0.10 0.00 66 0.12 0.07 0.05 77.71 0.0759
sparse 0.31 189.22 0.11 0.00 55 0.10 0.06 0.04 79.33 0.0775
aqlprofile 0.18 0.17 0.07 1.08 41 0.08 0.05 0.03 76.18 0.0744
rand 0.31 8.47 0.12 0.04 28 0.08 0.05 0.03 76.39 0.0746
rocdecode 0.09 0.05 0.03 1.77 13 0.07 0.05 0.02 87.22 0.0852
flatbuffers 0.07 0.30 0.03 0.23 42 0.04 0.03 0.02 24.81 0.0242
blas 0.08 162.20 0.03 0.00 1 0.03 0.02 0.01 79.46 0.0776
rocprofiler-compute 0.06 0.55 0.02 0.11 2 0.03 0.02 0.01 76.58 0.0748
hipblasltprovider 0.05 0.03 0.02 1.94 24 0.03 0.01 0.01 78.20 0.0764
rdc 0.07 0.01 0.03 6.25 14 0.02 0.02 0.01 14.80 0.0145
fft 0.08 13.86 0.03 0.01 1 0.02 0.01 0.01 80.40 0.0785
prim 0.04 14.82 0.01 0.00 2 0.01 0.01 0.01 75.00 0.0732
solver 0.03 188.25 0.01 0.00 1 0.01 0.01 0.00 78.12 0.0763
hipdnn 0.02 180.20 0.01 0.00 1 0.01 0.01 0.00 75.50 0.0737
rocr-debug-agent 0.02 10.84 0.01 0.00 3 0.01 0.01 0.00 77.64 0.0758
rocr-debug-agent-tests 0.03 0.05 0.01 0.50 1 0.01 0.00 0.00 77.57 0.0757
libhipcxx 0.01 0.02 0.01 0.60 1 0.01 0.00 0.00 77.46 0.0756
rocwmma 0.01 0.03 0.00 0.30 1 0.01 0.00 0.00 79.06 0.0772
fmt 0.01 19.39 0.01 0.00 2 0.01 0.00 0.00 14.79 0.0144
spdlog 0.01 0.00 0.00 3.82 6 0.00 0.00 0.00 14.79 0.0144
miopenprovider 0.01 0.01 0.00 0.76 1 0.00 0.00 0.00 75.14 0.0734
hipify 0.01 0.01 0.00 0.79 1 0.00 0.00 0.00 22.48 0.0220
composable-kernel 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-amdsmi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hip 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hipinfo 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-hiptests 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-kpack 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-ocl-icd 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
core-runtime 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
elfio 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipdnn-samples 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
hipkernelprovider 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
host-suite-sparse 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
openmpi 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocgdb 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
rocrtst 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-amd-mesa 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-expat 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-gmp 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-hwloc 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libmnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libnl 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-libpciaccess 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-mpfr 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000
sysdeps-ncurses 0.00 0.00 0.00 0.00 0 0.00 0.00 0.00 0.00 0.0000

General FAQ

  1. rss = Resident Set Size

    RSS is the amount of physical RAM a process has resident in memory; the values reported here are per-process peaks. High rss indicates memory-heavy compilation or linking. If rss is high across many parallel jobs, the build can hit swapping, cache thrashing, or OOM kills in CI, so high rss often limits how much parallelism (-j) you can safely use.

  2. max_rss_mb

    The maximum amount of RAM, in MB, that a compiler or linker process had resident at any point. max_rss_gb is the same value expressed in GB (max_rss_mb / 1024).
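
    As a rough illustration of how peak rss bounds parallelism, the sketch below converts max_rss_mb to GB and estimates a memory-safe job count. The function names, the 80% headroom factor, and the 16 GB machine are assumptions made for this example; they are not part of the report tooling.

```python
def mb_to_gb(max_rss_mb: float) -> float:
    """Convert a peak RSS value from MB to GB (1 GB = 1024 MB)."""
    return max_rss_mb / 1024.0


def memory_safe_jobs(available_mb: float, peak_rss_mb: float,
                     cpu_count: int, headroom: float = 0.8) -> int:
    """Estimate a -j value that keeps (jobs * peak RSS) within usable memory.

    headroom is a hypothetical safety factor: only that fraction of
    available memory is budgeted for build processes.
    """
    if peak_rss_mb <= 0:
        return cpu_count
    by_memory = int((available_mb * headroom) // peak_rss_mb)
    # Never exceed the core count, and always allow at least one job.
    return max(1, min(cpu_count, by_memory))


# rocprofiler-sdk row from the table: 440.84 MB peak RSS.
print(round(mb_to_gb(440.84), 4))

# Hypothetical 16 GB, 64-core machine and the largest peak in the table
# (fusilliprovider, 804.33 MB).
print(memory_safe_jobs(available_mb=16 * 1024, peak_rss_mb=804.33, cpu_count=64))
```

    This treats every job as if it reached the worst-case peak at the same time, so it is a conservative bound rather than a prediction.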

  3. Wall time sum vs component span vs estimated elapsed

    wall_time_sum is the sum of wall times across all commands for that component. In parallel builds this can be far larger than the total build duration because many commands run concurrently.

    wall_time_span_min is the component’s elapsed span: max(end_time) - min(start_time) across all commands belonging to that component. This is usually much closer to “how long the build spent working on that component”, though it can still overlap with other components (parallelism).

    wall_time_est_elapsed_min is an estimated elapsed contribution computed from the average concurrency of the whole build (not the per-component avg_concurrency column): we compute avg_concurrency_build = sum(real_s) / build_span_s using timestamps, and then estimate: wall_time_est_elapsed ≈ wall_time_sum / avg_concurrency_build.

    A wall_time_sum that is much larger than cpu_sum doesn’t mean the build is slower; it means processes are spending time waiting (I/O, scheduling, throttling, contention), so CPU time isn’t keeping up with wall time.
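
    The three wall-time figures can be sketched as follows. This is a minimal example that assumes each command is recorded with a component name and start/end timestamps in seconds; the Command record and the function name are hypothetical. The build-wide average concurrency is called avg_concurrency_build here.

```python
from dataclasses import dataclass


@dataclass
class Command:
    component: str
    start_s: float  # start timestamp, seconds
    end_s: float    # end timestamp, seconds


def wall_time_metrics(commands: list[Command], component: str) -> dict[str, float]:
    """Compute wall_time_sum, wall_time_span, and wall_time_est_elapsed in minutes."""
    # Build-wide average concurrency: total command wall time / build span.
    build_span_s = max(c.end_s for c in commands) - min(c.start_s for c in commands)
    total_real_s = sum(c.end_s - c.start_s for c in commands)
    avg_concurrency_build = total_real_s / build_span_s if build_span_s > 0 else 1.0

    own = [c for c in commands if c.component == component]
    wall_time_sum_s = sum(c.end_s - c.start_s for c in own)
    wall_time_span_s = max(c.end_s for c in own) - min(c.start_s for c in own)
    est_elapsed_s = wall_time_sum_s / avg_concurrency_build

    return {
        "wall_time_sum_min": wall_time_sum_s / 60,
        "wall_time_span_min": wall_time_span_s / 60,
        "wall_time_est_elapsed_min": est_elapsed_s / 60,
    }


# Two overlapping commands for component "a", then one for component "b".
commands = [
    Command("a", 0, 60),
    Command("a", 0, 60),
    Command("b", 60, 120),
]
metrics = wall_time_metrics(commands, "a")
print(metrics)
```

    In this toy build, component "a" has a wall_time_sum of 2 minutes but a span of only 1 minute, because its two commands ran concurrently.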

  4. What does avg_threads actually mean?

    Case 1: avg_threads ≈ 1.0

    The process used about one CPU core for most of its runtime.

    Case 2: avg_threads < 1.0

    The process spent much of its time waiting rather than computing (I/O, contention, throttling).

    Case 3: avg_threads > 1.0

    The process used multiple CPU cores simultaneously (e.g. LTO backends, LLVM worker threads).
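
    The three cases can be illustrated with a small sketch. It assumes avg_threads is CPU time (user + sys) divided by wall time for a process; the report does not state the formula, so treat that definition, the function names, and the 0.1 tolerance as assumptions.

```python
def avg_threads(user_s: float, sys_s: float, real_s: float) -> float:
    """Assumed definition: CPU seconds consumed per second of wall time."""
    if real_s <= 0:
        return 0.0
    return (user_s + sys_s) / real_s


def classify(value: float, tolerance: float = 0.1) -> str:
    """Map an avg_threads value onto the three cases described above."""
    if value > 1.0 + tolerance:
        return "multi-core"    # Case 3: several cores in use at once
    if value < 1.0 - tolerance:
        return "waiting"       # Case 2: I/O, contention, throttling
    return "single-core"       # Case 1: about one busy core


# Hypothetical processes, each with 10 s of wall time.
for user_s, sys_s in [(9.0, 0.5), (3.0, 1.0), (30.0, 2.0)]:
    value = avg_threads(user_s, sys_s, real_s=10.0)
    print(f"{value:.2f} -> {classify(value)}")
```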

  5. avg_concurrency and peak_concurrency

    avg_concurrency measures average parallelism across tool processes for the component: avg_concurrency ≈ wall_time_sum / wall_time_span. It can be a decimal because concurrency ramps up/down during the build.

    peak_concurrency is the maximum overlap (most tool processes running at the same time) for that component.
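
    Both values can be derived from per-command start and end timestamps. The sketch below uses a sweep over sorted start/end events to find the peak; the function name and the sample intervals are made up for illustration, and the report's own implementation may differ.

```python
def concurrency(intervals: list[tuple[float, float]]) -> tuple[float, int]:
    """Return (avg_concurrency, peak_concurrency) for (start, end) intervals."""
    if not intervals:
        return 0.0, 0

    # +1 when a process starts, -1 when it ends. Sorting the tuples puts an
    # end (-1) before a start (+1) at the same timestamp, so back-to-back
    # processes are not counted as overlapping.
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()

    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)

    wall_time_sum = sum(end - start for start, end in intervals)
    wall_time_span = max(e for _, e in intervals) - min(s for s, _ in intervals)
    avg = wall_time_sum / wall_time_span if wall_time_span > 0 else 0.0
    return avg, peak


avg, peak = concurrency([(0, 10), (0, 10), (5, 15), (15, 20)])
print(avg, peak)
```

    A component whose commands are spread thinly over a long span can show a low avg_concurrency alongside a high peak_concurrency, which matches several rows in the table.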

  6. What do user_sum, sys_sum, and cpu_sum mean? (Note: all times are in minutes)

    user_sum → time spent executing user-space code (compiler, linker, optimizer logic)

    If this is high:

    a) the build is compute-heavy

    b) faster CPUs, fewer templates, or fewer TUs help

    c) more parallelism may help if avg_threads > 1

    sys_sum → time spent inside the operating system kernel

    If this is high:

    a) you’re often I/O-bound

    b) disk speed, filesystem, caching, or build directory layout matters

    c) adding more CPUs will not help much

    cpu_sum → total CPU time = user_sum + sys_sum

    It answers: how much CPU did this component cost overall?

    It’s the best metric for:

    a) capacity planning

    b) CI cost estimation

    c) "what’s expensive" comparisons between components
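
    The relationship between the three columns can be checked against rows of the table. The sketch below recomputes cpu_sum and the share of CPU time spent in the kernel; the function name is hypothetical, and the share is only a rough hint at kernel-heavy (often I/O-bound) components, not a threshold defined by the report.

```python
def cpu_summary(user_sum_min: float, sys_sum_min: float) -> tuple[float, float]:
    """Return (cpu_sum in minutes, fraction of CPU time spent in the kernel)."""
    cpu_sum_min = user_sum_min + sys_sum_min
    sys_share = sys_sum_min / cpu_sum_min if cpu_sum_min > 0 else 0.0
    return cpu_sum_min, sys_share


# (user_sum, sys_sum) in minutes, copied from the table above.
rows = {
    "rocprofiler-sdk": (152.72, 25.25),
    "host-blas": (16.44, 9.33),
    "amd-dbgapi": (4.35, 3.89),
}

for name, (user_min, sys_min) in rows.items():
    cpu_min, share = cpu_summary(user_min, sys_min)
    print(f"{name}: cpu_sum={cpu_min:.2f} min, sys share={share:.0%}")
```

    In these rows, amd-dbgapi spends nearly half of its CPU time in the kernel, while rocprofiler-sdk is dominated by user time.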


  • Key mental model