Running Hotspots on expo/expo Without Running Out of Memory

Stephen Collins · Apr 12, 2026
6 min read

Key Points

Why use SQLite as an intermediate store instead of just streaming JSON directly during analysis?

Streaming JSON output requires the full set of function metrics to be available at write time — but some fields (percentile thresholds, callgraph-derived scores like betweenness and PageRank) can only be computed after the full dataset is known. SQLite lets each phase write its outputs as it goes, then subsequent phases query over those rows. The result is bounded peak memory without requiring a two-pass design or holding intermediate structures in RAM. A flat streaming approach would either require two full passes over the source files, or defer percentile-dependent fields entirely.

When should I use --skip-touch-metrics vs --no-per-function-touches?

--skip-touch-metrics is the hard exit — all touch counts are 0, no git log I/O at all. Use it for a fast first-pass on a very large repo or in CI where you don't have a persistent cache and touch accuracy doesn't matter. --no-per-function-touches keeps touch metrics but switches to file-level batching — one git log call per file instead of one per function. This is a good default for large repos where you want real touch data but per-function git log -L would be too slow. The warm-run cache (populated on first run) makes per-function mode fast on subsequent runs, so the tradeoff mainly applies to cold starts.

What does --callgraph-skip-above actually skip, and what still runs?

When the edge count exceeds the threshold, betweenness centrality computation (and the derived PageRank ranking) is skipped. Fan-in and fan-out are still computed — those are O(N+E) passes over the edge list, not the O(N×(N+E)) betweenness algorithm. The practical effect is that the callgraph still provides structural metrics (how many functions call this function, how many does it call), but the "hub score" ranking that identifies functions sitting on many shortest paths is omitted. For repos where the callgraph is very dense, betweenness is already approximate anyway — skipping it entirely loses a display signal, not a risk-scoring input.

What is convergence PageRank and why does the fixed-iteration approach waste time?

PageRank is an iterative algorithm — each iteration refines rank values based on the previous iteration's output. The fixed-iteration approach runs a preset number of iterations (e.g., 100) regardless of whether values have stabilized. Convergence PageRank exits early when the maximum rank delta across all nodes drops below a threshold (1e-6). For most callgraphs this happens in 20–40 iterations. The savings are proportional to how quickly the graph converges, which depends on its structure — sparse graphs with clear hubs converge faster than dense, uniform ones. The fixed-iteration count was essentially a guess at "good enough"; convergence detection makes it exact.

Why are Go vendor/generated files now excluded by default rather than requiring a config?

The default excludes list was already opinionated about test files and node_modules. Go vendor directories and generated files (*.pb.go, zz_generated*.go) have the same property — they're not authored code, they don't benefit from complexity analysis, and they pollute top-risk rankings with noise. Requiring every Go project to configure this explicitly was unnecessary friction. The Kubernetes benchmark made this concrete — without the excludes, generated protobuf files and vendor code dominated the output.

This post was originally published on hotspots.dev.


After the VS Code stress test last month, I knew Hotspots had an O(N³) callgraph bug. I fixed it. Approximate betweenness held up at 102k functions. Things looked reasonable.

Then I tried expo/expo.

expo is a React Native platform with a massive monorepo — the kind of codebase that tests every assumption you made about “large.” The analysis didn’t produce a slow result. It produced an OOM and a crash.

That was the start of a focused push to make Hotspots genuinely viable on large repos, not just fast-on-demos viable.

What Was Causing the OOM

The first thing I did was look at where memory actually accumulates during analysis.

The snapshot pipeline in Hotspots works roughly like this: parse files, compute metrics, run the callgraph, collect touch data from git, assemble FunctionRiskReport structs, and then serialize them. The problem was that all of those structs were accumulated in a Vec in memory before any serialization happened. For a repo with 50k+ functions, each with callgraph metrics, touch counts, driver labels, and risk factor breakdowns, that Vec becomes very large before anything is written to disk.

The fix was to move to a SQLite-backed snapshot store (T4 in the internal ticket numbering). Instead of accumulating function reports in RAM, each function is written to SQLite as it’s produced. The serialization pipeline — whether to JSON, JSONL, or HTML — reads from SQLite row by row. Peak memory is now bounded by the size of a single row plus the SQLite page cache, not the full dataset.

The schema is straightforward: a commits table keyed by SHA, a functions table keyed by (commit_sha, file, name, start_line), and a percentile_thresholds table computed once per commit using NTILE(100) over the function scores. The TempDb type (in-memory SQLite) handles analysis mode; SnapshotDb (file-backed) handles the persistent snapshot store.
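For intuition, here is what the NTILE(100) pass over the function scores produces, reimplemented as a small in-memory Rust sketch. The real store computes this in SQL once per commit; the function name here is mine.

```rust
/// Illustrative only: compute per-percentile score thresholds,
/// mimicking NTILE(100) over `ORDER BY score` in SQLite.
fn percentile_thresholds(mut scores: Vec<f64>) -> Vec<f64> {
    scores.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = scores.len();
    // Threshold for percentile p = score at the upper edge of bucket p.
    (1..=100)
        .map(|p| {
            let idx = (n * p / 100).saturating_sub(1);
            scores[idx]
        })
        .collect()
}

fn main() {
    let scores: Vec<f64> = (1..=200).map(|i| i as f64).collect();
    let thresholds = percentile_thresholds(scores);
    // With 200 uniform scores, the 50th-percentile threshold is 100.
    assert_eq!(thresholds[49], 100.0);
    println!("p50 = {}, p99 = {}", thresholds[49], thresholds[98]);
}
```

Computing this once per commit, after all function rows are written, is what lets the per-function write path stay single-pass.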

// Before: accumulate everything, then serialize
let mut reports: Vec<FunctionRiskReport> = Vec::new();
// ... populate ...
let json = serde_json::to_string(&reports)?;  // full dataset in RAM

// After: write as you go, stream on read
let db = TempDb::new()?;
for report in reports_iter {
    db.insert_function(&commit_sha, &report)?;
}
// Read back row-by-row when serializing

806 lines of new code in hotspots-core/src/db/mod.rs, replacing what had been an implicit accumulation pattern spread across the pipeline.

The Touch Metrics Problem

Memory wasn’t the only bottleneck. Touch metrics — how many times a function was changed in the last 30 days — required git log -L calls. Per-function, that’s one subprocess per function. On a repo with 50k+ functions, this adds up quickly.

Parallel touch metrics (--jobs) was the first fix. The touch computation was moved to a channel-based parallel pipeline: a configurable number of worker threads each drain a shared queue of git log -L requests. The --jobs flag controls parallelism. For local development on a laptop, --jobs 4 is reasonable; the benchmark infrastructure runs at --jobs 2 for reproducibility.
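The shape of that pipeline, sketched with std::thread and an mpsc channel. Type and function names here are illustrative, and a fake touch count of 0 stands in for the actual git subprocess.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical request type standing in for one `git log -L` invocation.
#[derive(Clone)]
struct TouchRequest {
    file: String,
}

/// Sketch of the --jobs pipeline: N worker threads drain a shared queue
/// and send results down a channel. The real workers shell out to git;
/// here each just reports a fake touch count.
fn drain_parallel(jobs: usize, requests: Vec<TouchRequest>) -> Vec<(String, u32)> {
    let queue = Arc::new(Mutex::new(requests));
    let (tx, rx) = mpsc::channel();

    let mut handles = Vec::new();
    for _ in 0..jobs {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        handles.push(thread::spawn(move || loop {
            let req = queue.lock().unwrap().pop();
            match req {
                Some(r) => tx.send((r.file, 0u32)).unwrap(),
                None => break, // queue drained; worker exits
            }
        }));
    }
    drop(tx); // channel closes once every worker's clone is dropped

    let results = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    results
}

fn main() {
    let reqs: Vec<TouchRequest> = (0..16)
        .map(|i| TouchRequest { file: format!("src/f{i}.rs") })
        .collect();
    let results = drain_parallel(4, reqs);
    assert_eq!(results.len(), 16); // one result per request
    println!("collected {} touch counts", results.len());
}
```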

Batched stale-file detection was a related fix. A separate git call was checking last-modified dates for “stale” files (not recently changed). This was running one git log call per file instead of batching all files into a single invocation. On a repo with thousands of source files, this produced a visible CPU spike during analysis — many short-lived subprocesses instead of one longer one.
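The batching fix boils down to building one subprocess instead of N. A sketch (the helper name is hypothetical; the git flags themselves are standard):

```rust
use std::process::Command;

/// Hypothetical helper: build ONE `git log` invocation covering many
/// files, instead of spawning one subprocess per file.
fn batched_log_command(files: &[&str]) -> Command {
    let mut cmd = Command::new("git");
    cmd.args(["log", "--since=30.days", "--name-only", "--pretty=format:%H"]);
    cmd.arg("--"); // everything after this is a pathspec
    for f in files {
        cmd.arg(f);
    }
    cmd
}

fn main() {
    let cmd = batched_log_command(&["src/a.go", "src/b.go", "src/c.go"]);
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    // One subprocess regardless of file count: 4 log flags + "--" + 3 paths.
    assert_eq!(args.len(), 8);
    println!("{:?}", args);
}
```

Parsing the `--name-only` output then yields per-file last-touch dates from a single process spawn, which is where the CPU spike went away.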

These two changes together make touch metrics practical at scale. But for very large repos, even a parallel, batched implementation pays a real cost. The --skip-touch-metrics flag provides a hard exit:

hotspots analyze . --skip-touch-metrics

All touch counts are reported as 0. Fan-out, fan-in, cyclomatic complexity, nesting depth — everything except the git-log-derived fields — still runs. For a first-pass analysis on a 50k+ function codebase, skipping touch metrics reduces cold-start time significantly (measured at ~66 seconds saved on expo/expo).

There’s also --no-per-function-touches, a softer version: use file-level touch batching (one git log per file, shared across all functions in that file) instead of the more accurate per-function git log -L. Less git I/O, slightly lower accuracy.

The choice between the three modes:

Mode                                              Accuracy    Cold-start cost
Per-function (--per-function-touches)             Highest     ~66s (expo/expo)
File-level batching (--no-per-function-touches)   Good        ~15s (expo/expo)
Skip entirely (--skip-touch-metrics)              Touch = 0   Negligible

The CPU Problem: Callgraph at Scale

The betweenness approximation from the VS Code work handles large callgraphs correctly — but it’s still doing real work. For repos where the callgraph has an extremely high edge count, even the approximate algorithm is expensive.

--callgraph-skip-above N is an escape hatch: if the callgraph has more than N edges, skip betweenness centrality computation entirely. Fan-in and fan-out are still computed (those are O(N+E) passes); only the O(N×(N+E)) betweenness (and the derived PageRank) is skipped. The flag accepts a threshold in edges:

hotspots analyze . --callgraph-skip-above 100000
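Fan-in and fan-out survive the skip because they fall out of a single pass over the edge list. A minimal sketch of that pass, with node IDs simplified to integers:

```rust
use std::collections::HashMap;

/// One O(N+E) pass over the edge list of (caller, callee) pairs.
/// Returns (fan_in, fan_out) keyed by node id.
fn degrees(edges: &[(u32, u32)]) -> (HashMap<u32, u32>, HashMap<u32, u32>) {
    let mut fan_in: HashMap<u32, u32> = HashMap::new();
    let mut fan_out: HashMap<u32, u32> = HashMap::new();
    for &(caller, callee) in edges {
        *fan_out.entry(caller).or_insert(0) += 1;
        *fan_in.entry(callee).or_insert(0) += 1;
    }
    (fan_in, fan_out)
}

fn main() {
    // 0 -> 1, 0 -> 2, 1 -> 2
    let edges = [(0, 1), (0, 2), (1, 2)];
    let (fan_in, fan_out) = degrees(&edges);
    assert_eq!(fan_out[&0], 2); // node 0 calls two functions
    assert_eq!(fan_in[&2], 2);  // node 2 is called twice
    println!("fan_in(2) = {}", fan_in[&2]);
}
```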

Alongside this, convergence PageRank replaced the fixed-iteration approach. The previous PageRank ran a fixed number of iterations regardless of whether the values had stabilized. The new implementation exits early when the maximum rank delta across all nodes drops below 1e-6. For most callgraphs this converges in 20–40 iterations; the old fixed-iteration default was 100. On a large repo where PageRank is actually running, this is a real reduction in wall time.
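A minimal sketch of convergence-based PageRank. This is illustrative, not the hotspots-core implementation; the damping factor and tolerance are parameters, with the tolerance playing the role of the 1e-6 threshold above.

```rust
/// Iterate PageRank until the largest per-node rank delta drops
/// below `tol`, instead of running a fixed iteration count.
/// Returns the ranks and the number of iterations actually run.
fn pagerank(adj: &[Vec<usize>], damping: f64, tol: f64, max_iter: usize) -> (Vec<f64>, usize) {
    let n = adj.len();
    let mut rank = vec![1.0 / n as f64; n];
    for iter in 1..=max_iter {
        let mut next = vec![(1.0 - damping) / n as f64; n];
        for (u, outs) in adj.iter().enumerate() {
            if outs.is_empty() {
                // Dangling node: spread its rank uniformly.
                for v in 0..n {
                    next[v] += damping * rank[u] / n as f64;
                }
            } else {
                let share = damping * rank[u] / outs.len() as f64;
                for &v in outs {
                    next[v] += share;
                }
            }
        }
        // Convergence check: exit early once values have stabilized.
        let delta = rank
            .iter()
            .zip(&next)
            .map(|(a, b)| (a - b).abs())
            .fold(0.0f64, f64::max);
        rank = next;
        if delta < tol {
            return (rank, iter);
        }
    }
    (rank, max_iter)
}

fn main() {
    // 0 -> 1, 1 -> 2, 2 -> 0: a cycle, so ranks settle to uniform.
    let adj = vec![vec![1], vec![2], vec![0]];
    let (rank, iters) = pagerank(&adj, 0.85, 1e-6, 100);
    assert!(iters < 100); // converged well before the old fixed cap
    assert!((rank[0] - rank[1]).abs() < 1e-6);
    println!("converged in {iters} iterations");
}
```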

Eliminating Duplicate Allocations in the Callgraph (T3.1)

While profiling expo/expo, I found that function IDs were being allocated twice per function during callgraph construction: once when building the node index, and again when resolving call edges. This was a linear-cost redundancy — not a bottleneck on small repos, but visible on a 50k-function graph.

The fix was a single-pass construction that reuses the node ID from the initial insertion during edge resolution. 60 lines changed in lib.rs. No behavior change, ~2× fewer allocations during callgraph build.
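The pattern, sketched with names of my own invention rather than the actual lib.rs types: intern each function identifier once, then hand back the existing id during edge resolution.

```rust
use std::collections::HashMap;

/// Sketch of the single-pass fix: allocate the id string only when a
/// function is first seen; edge resolution reuses the existing id.
struct NodeIndex {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl NodeIndex {
    fn new() -> Self {
        NodeIndex { ids: HashMap::new(), names: Vec::new() }
    }

    fn get_or_insert(&mut self, name: &str) -> usize {
        if let Some(&id) = self.ids.get(name) {
            return id; // edge-resolution path: no new allocation
        }
        let id = self.names.len();
        self.ids.insert(name.to_string(), id);
        self.names.push(name.to_string());
        id
    }
}

fn main() {
    let mut idx = NodeIndex::new();
    let a = idx.get_or_insert("pkg::parse");
    let b = idx.get_or_insert("pkg::lex");
    // Resolving an edge back to an existing node reuses the id.
    assert_eq!(idx.get_or_insert("pkg::parse"), a);
    assert_eq!(idx.names.len(), 2);
    println!("nodes: {a} {b}");
}
```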

Go Support

While working on the Kubernetes benchmark (a comparably large codebase), vendored and generated files dominated the top-risk list — same problem as the minified JS issue in VS Code, but in Go form. vendor/**, *.pb.go, and zz_generated*.go are now part of the default excludes:

// Added to DEFAULT_EXCLUDES in config.rs
"**/vendor/**",
"**/*.pb.go",
"**/zz_generated*.go",

No config file needed. Go projects get clean analysis output by default.

What This Looks Like in Practice

If you’re running Hotspots on a large repo and hitting performance issues, here are the practical recommendations:

Small-to-medium repos (< 5k functions): No changes needed. Default settings work.

Large repos (5k–50k functions): Add --no-per-function-touches to use file-level touch batching. Consider --callgraph-skip-above 50000 if callgraph analysis is slow.

Very large repos (50k+ functions): Start with --skip-touch-metrics for a fast first pass. If you need touch data, run without the flag and let the cache warm — subsequent runs are fast.

# First pass on a large repo
hotspots analyze . --skip-touch-metrics --callgraph-skip-above 100000

# Once you have a sense of the codebase, warm the touch cache
hotspots analyze . --callgraph-skip-above 100000

The configuration file equivalent for file-level batching:

{
  "per_function_touches": false
}

The --skip-touch-metrics flag doesn’t have a config file equivalent by design — it’s meant as a conscious override, not a default you set and forget.

The SQLite Architecture Going Forward

The SQLite store is T4 in a longer planned refactor. The goal is to pipeline every phase of analysis through SQLite rather than holding intermediate data in memory. Phase outputs feed directly into the next phase as SQL queries rather than in-memory data structures.

At the extremes — repos with 200k+ functions — the current approach still materializes some intermediate state. The next phase of this work would replace those remaining in-memory collections with SQL views and streaming cursors. The infrastructure for that is now in place; T4 was the foundation.

For now, the OOM is gone, expo/expo runs to completion, and the escape-hatch flags make the tradeoffs explicit.