Hotspots now handles monorepos — what changed from v1.11 to v1.15.1

This post was originally published on hotspots.dev.

In March I shipped hotspots diff — a command that compares any two git refs rather than just a commit to its parent. The post included a FAQ answer that acknowledged a gap:

The --auto-analyze flag (coming in a future release) will handle this by spinning up temporary git worktrees automatically.

That future is v1.15.1, released today. But getting here took more than one flag. Between v1.11.0 and v1.15.1, the most significant work was making hotspots run on large monorepos at all — repos with tens of thousands of functions where the analysis process was simply being killed by the operating system.

This post covers what changed and why.

--auto-analyze closes the cold-start gap

The diff command requires snapshots at both refs before it can run. Without --auto-analyze, a missing snapshot exits with code 3:

error: no snapshot found for ref 'main' (3a8f12c)
  → run: git checkout main && hotspots analyze --mode snapshot

With --auto-analyze, that case is handled inline:

hotspots diff main HEAD --auto-analyze

If either snapshot is missing, hotspots checks out the ref into a temporary git worktree (leaving your working tree untouched), runs a full snapshot analysis there, persists the result to .hotspots/snapshots/, and continues with the diff. The worktree is removed when done.

The tradeoff: a cold start now runs two analyses instead of one. But it eliminates the separate “create snapshot on push to main” CI step and the cache layer needed to pass it to the PR job. For teams starting out, a single self-contained diff command is usually the right starting point.

--auto-analyze shipped in v1.12.0. A follow-up in the same release made the temporary worktree paths unique, which prevents collisions when multiple diff jobs run concurrently.
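To make the collision fix concrete, here is a minimal sketch of one way to build a unique worktree path and the matching git arguments. It is not the actual hotspots code: the `hotspots-wt-` prefix, the pid/timestamp/counter naming scheme, and the function names are all assumptions made for illustration. The only real interfaces it relies on are `git worktree add --detach` and `git worktree remove --force`.

```rust
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

// Process-wide counter so two calls inside one process never collide.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Builds a worktree path intended to be unique across concurrent jobs.
/// The naming scheme (prefix, pid, timestamp, counter) is hypothetical.
fn unique_worktree_path(base: &Path, short_sha: &str) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let seq = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    base.join(format!(
        "hotspots-wt-{}-{}-{}-{}",
        short_sha,
        process::id(),
        nanos,
        seq
    ))
}

/// Arguments for `git worktree add --detach <path> <ref>`.
/// `--detach` checks out the ref without creating or moving a branch.
fn worktree_add_args(path: &Path, git_ref: &str) -> Vec<String> {
    vec![
        "worktree".to_string(),
        "add".to_string(),
        "--detach".to_string(),
        path.to_string_lossy().into_owned(),
        git_ref.to_string(),
    ]
}

/// Arguments for `git worktree remove --force <path>`, used for cleanup.
fn worktree_remove_args(path: &Path) -> Vec<String> {
    vec![
        "worktree".to_string(),
        "remove".to_string(),
        "--force".to_string(),
        path.to_string_lossy().into_owned(),
    ]
}
```

Each argument vector can be passed to `std::process::Command::new("git").args(...)`. Including the process id and a counter in the path means two CI jobs on the same runner, or two threads in the same process, do not end up pointing at the same directory.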

The OOM problem

Running hotspots analyze . --mode snapshot on a large monorepo (expo/expo, kubernetes, and similar) would fail with an out-of-memory kill. The analysis never completed.

There were two separate root causes, fixed in separate releases.

Root cause 1: pipeline memory overlap (v1.13.0 → v1.15.0)

The original analysis pipeline built each data structure in sequence and held them all in memory simultaneously during the overlap periods:

Phase	What was in RAM
After analysis	`Vec<FunctionRiskReport>` (~23 MB)
Building call graph	`FunctionRiskReport` + `CallGraph` (~48 MB)
After `Snapshot::new`	`CallGraph` + `Vec<FunctionSnapshot>` (~75 MB)
JSON output (pre-streaming)	`FunctionSnapshot` + JSON string (~200 MB)

On expo/expo with a depth-25 clone, this peaked around 250 MB before streaming JSON output was added in v1.13.0. Even after streaming, the CallGraph and FunctionSnapshot overlap kept peak RSS in the 150–200 MB range.

v1.13.0 addressed this with streaming JSON output and a --jobs flag for parallelism tuning. This got large repos to complete in many cases, but the peak was still too high for memory-constrained CI runners.
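The idea behind streaming output can be sketched as follows. This is a simplified illustration, not the hotspots serializer: `FunctionRecord` is a hypothetical two-field stand-in for the real snapshot type, and the JSON escaping is hand-rolled to keep the example dependency-free. The point is that each record is written to the output as it is produced, so the full JSON document never exists as one string in memory.

```rust
use std::io::{self, Write};

/// Hypothetical, trimmed-down stand-in for a per-function snapshot record.
struct FunctionRecord {
    name: String,
    complexity: u32,
}

/// Escapes the characters JSON requires inside a string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Writes records as a JSON array one element at a time. Only the record
/// currently being written is serialized, never the whole array.
fn write_records<W: Write>(
    mut out: W,
    records: impl IntoIterator<Item = FunctionRecord>,
) -> io::Result<()> {
    out.write_all(b"[")?;
    for (i, record) in records.into_iter().enumerate() {
        if i > 0 {
            out.write_all(b",")?;
        }
        write!(
            out,
            "{{\"name\":\"{}\",\"complexity\":{}}}",
            escape_json(&record.name),
            record.complexity
        )?;
    }
    out.write_all(b"]")?;
    out.flush()
}
```

In practice the writer would be a buffered handle to stdout or a file, for example `io::BufWriter::new(io::stdout().lock())`, so memory use stays flat regardless of how many functions the snapshot contains.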

v1.15.0 went further. The CallGraph was rewritten to use integer indices instead of string keys, eliminating per-BFS HashMap allocations. Metric types were converted from heap-allocated strings to enums. These changes reduced the CallGraph’s footprint substantially.
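A rough sketch of what an integer-indexed call graph looks like is below. The struct layout and method names are assumptions for illustration, not the real hotspots `CallGraph`. Function names are interned once into dense ids, edges are stored as id lists, and a traversal tracks visited nodes in a flat `Vec<bool>` rather than a string-keyed map.

```rust
use std::collections::{HashMap, VecDeque};

/// Sketch of a call graph keyed by dense integer ids instead of strings.
#[derive(Default)]
struct CallGraph {
    ids: HashMap<String, usize>, // name -> id, consulted only while building
    names: Vec<String>,          // id -> name
    callees: Vec<Vec<usize>>,    // id -> ids of functions it calls
}

impl CallGraph {
    /// Returns the id for `name`, assigning the next free id on first sight.
    fn intern(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len();
        self.ids.insert(name.to_string(), id);
        self.names.push(name.to_string());
        self.callees.push(Vec::new());
        id
    }

    /// Records a caller -> callee edge using interned ids.
    fn add_call(&mut self, caller: &str, callee: &str) {
        let from = self.intern(caller);
        let to = self.intern(callee);
        self.callees[from].push(to);
    }

    /// Counts functions reachable from `start`, excluding `start` itself.
    /// The visited set is a flat Vec<bool> indexed by id, so the traversal
    /// does no string hashing and clones no strings.
    fn reachable_count(&self, start: usize) -> usize {
        let mut seen = vec![false; self.names.len()];
        let mut queue = VecDeque::from([start]);
        seen[start] = true;
        let mut count = 0;
        while let Some(node) = queue.pop_front() {
            for &next in &self.callees[node] {
                if !seen[next] {
                    seen[next] = true;
                    count += 1;
                    queue.push_back(next);
                }
            }
        }
        count
    }
}
```

With string keys, every edge and every visited-set entry carries its own heap-allocated name. With ids, each name is stored once in `names` and everything else is a machine word.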

v1.15.0 also introduced hybrid touch mode (--hybrid-touches N): instead of running git log -L on every function to get precise per-function touch counts, it runs file-level commit counts first (cheap), then upgrades to per-function tracking only for files that have been touched N or more times in the last 30 days. This bounds the number of expensive git log -L subprocess calls without sacrificing accuracy for the functions that matter most.
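The selection step can be sketched like this. It assumes the file-level pass has already produced a map from file path to commit count within the recent window; the type and function names are hypothetical, and the real implementation may structure this differently.

```rust
use std::collections::{BTreeMap, HashMap};

/// How touch counts are obtained for the functions in one file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TouchStrategy {
    /// Reuse the cheap file-level commit count for every function in the file.
    FileLevel,
    /// Pay for precise per-function tracking via `git log -L`.
    PerFunction,
}

/// Chooses a strategy per file: files with `threshold` or more recent
/// commits are upgraded to per-function tracking, the rest stay file-level.
fn plan_touches(
    recent_commits_per_file: &HashMap<String, u32>,
    threshold: u32,
) -> BTreeMap<String, TouchStrategy> {
    recent_commits_per_file
        .iter()
        .map(|(file, &count)| {
            let strategy = if count >= threshold {
                TouchStrategy::PerFunction
            } else {
                TouchStrategy::FileLevel
            };
            (file.clone(), strategy)
        })
        .collect()
}

/// Number of files that will trigger the expensive per-function pass.
fn expensive_file_count(plan: &BTreeMap<String, TouchStrategy>) -> usize {
    plan.values()
        .filter(|&&s| s == TouchStrategy::PerFunction)
        .count()
}
```

Raising the threshold shrinks the set of files that get the expensive pass, which is the knob `--hybrid-touches N` exposes.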

Root cause 2: co-change pair explosion (v1.15.1)

Even after the pipeline improvements, expo/expo would still occasionally OOM. The culprit was co-change analysis.

Co-change pairs are extracted from git history: for each commit, every pair of files modified together is recorded. On a normal commit touching 5–10 files, that’s 10–45 pairs, which is fine. But the count grows quadratically: a commit that touches 200 files (a version bump, mass rename, or dependency update) generates 19,900 pairs, and larger commits generate far more. At scale, this was exhausting memory before analysis even started.

v1.15.1 caps this by skipping the pair-counting loop for commits that touch more than 200 files. Those commits add noise rather than signal to co-change analysis — a version bump commit doesn’t tell you that two source files are meaningfully coupled — so skipping them improves both memory and accuracy.
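The arithmetic and the cap can be sketched together. A commit touching n files yields n(n-1)/2 unordered pairs, so 5 files give 10 pairs, 10 give 45, and 200 give 19,900. The code below is an illustrative sketch, not the hotspots implementation: the constant name, the function names, and the input shape (one list of file paths per commit) are assumptions.

```rust
use std::collections::HashMap;

/// Commits touching more files than this are skipped entirely.
/// The value 200 comes from the post; the constant name is hypothetical.
const MAX_FILES_PER_COMMIT: usize = 200;

/// Number of unordered file pairs in a commit touching `n` files: n(n-1)/2.
fn pair_count(n: usize) -> usize {
    n * n.saturating_sub(1) / 2
}

/// Counts how often each pair of files changed in the same commit,
/// skipping oversized commits before the quadratic loop starts.
fn count_co_changes(commits: &[Vec<String>]) -> HashMap<(String, String), u32> {
    let mut pairs: HashMap<(String, String), u32> = HashMap::new();
    for files in commits {
        if files.len() > MAX_FILES_PER_COMMIT {
            continue; // bulk commit: mostly noise, and very expensive
        }
        // Sort and dedup so each pair has one canonical (smaller, larger) key.
        let mut sorted: Vec<&String> = files.iter().collect();
        sorted.sort();
        sorted.dedup();
        for i in 0..sorted.len() {
            for j in (i + 1)..sorted.len() {
                let key = (sorted[i].clone(), sorted[j].clone());
                *pairs.entry(key).or_insert(0) += 1;
            }
        }
    }
    pairs
}
```

Checking the file count before entering the nested loop is what keeps the worst case bounded: no single commit can contribute more than `pair_count(200)` entries to the map.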

The same release also added a SQLite-based pipeline buffer (TempDb): an in-memory SQLite database that acts as the single store between pipeline phases. Each phase writes its results to SQLite and drops its in-memory structure before the next phase starts. Only one large structure lives in RAM at a time.

Combined result on expo/expo (28k functions, 5.5k files, depth-25 clone): 107 MB peak RSS and a validated exit code of 0 under a 7 GB memory cap.

UX: knowing what’s happening during a long run

On a large repo, the touch metrics phase — where hotspots queries git history for each function — can take several minutes. Before v1.14.0, this phase was silent.

v1.14.0 added two things:

Cold cache miss warning: on first run, the touch cache is empty. Hotspots now prints a message explaining this and estimates how long the cold run will take.
Per-item progress: during touch metric collection, the output shows N/M functions so you can see it’s making progress rather than stalled.

A silent process that takes three minutes looks like a hang.
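As a rough illustration of per-item progress, the sketch below formats an N/M line and throttles how often it is emitted. The message wording, the reporting interval, and the function names are all assumptions; the actual hotspots output may look different.

```rust
/// Formats a progress line such as "touch metrics: 1200/28000 functions (4%)".
/// The exact wording is hypothetical.
fn progress_line(done: usize, total: usize) -> String {
    // Guard against division by zero when there is nothing to process.
    let percent = if total == 0 { 100 } else { done * 100 / total };
    format!("touch metrics: {}/{} functions ({}%)", done, total, percent)
}

/// Reports every `every` items and always on the final item, so a large
/// repo does not print one line per function.
fn should_report(done: usize, total: usize, every: usize) -> bool {
    done == total || (every > 0 && done % every == 0)
}
```

A collection loop would call `should_report` after each function and print `progress_line` to stderr when it returns true, which keeps stdout clean for the JSON snapshot.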

Using it today

For typical repos, nothing changes. Run hotspots analyze . --mode snapshot as before.

For large repos where you’ve hit memory pressure, use hybrid touch mode:

hotspots analyze . --mode snapshot --hybrid-touches 5

This keeps per-function touch precision for the actively-changing files, and uses cheaper file-level counts for the rest.

For zero-ceremony diffs that handle their own snapshots:

hotspots diff main HEAD --auto-analyze --policy

To upgrade to v1.15.1:

curl -fsSL https://raw.githubusercontent.com/Stephen-Collins-tech/hotspots/main/install.sh | sh

Check your current version with hotspots --version.