Performance and tuning

How to measure, gate, and tune Buntime performance. The local harness (bun run perf in apps/runtime/) covers the worker pool and routing; results from Rancher/k3s environments live in the dated reports referenced below. For the pool model itself, see worker-pool.

The local harness runs inside apps/runtime itself. It uses temporary apps generated in .perf-fixtures/ and the real routing/pool path, which makes it a useful baseline before external tests.

```sh
cd apps/runtime
bun run perf        # full run (all scenarios, default gates)
bun run perf:smoke  # short direct-mode, no thresholds
bun run perf:ci     # short direct-mode with perf/thresholds.json
bun run perf:gate   # full run with perf/thresholds.json
```

PERF_MODE=http (default) starts Bun.serve on a local port and drives requests via fetch, so the measurement includes HTTP parsing, Hono routing, pool dispatch, body cloning, and response transfer. PERF_MODE=direct calls app.fetch in memory, which removes socket and HTTP client overhead from the measurement and leaves only the runtime path.

| Scenario | Measures | Good for detecting |
|---|---|---|
| warm-noop | Latency/throughput of an already-warm persistent worker | Regression in routing, pool hit, plugin hooks |
| echo-1kb | 1 KiB POST body: clone, transfer to worker, response | Serialization/IPC costs in the pool |
| slow-50ms | Concurrency while worker processes 50 ms | Backpressure, pool fairness |
| ephemeral-noop | Cold start with ttl: 0 (new worker per request) | Spawn cost; most sensitive to CPU |

| Var | Default | Description |
|---|---|---|
| PERF_MODE | http | http (Bun.serve) or direct (app.fetch) |
| PERF_DURATION_MS | 10000 | Measured duration per scenario |
| PERF_WARMUP_MS | 2000 | Warmup per scenario |
| PERF_CONCURRENCY | 50 | Concurrent loops |
| PERF_POOL_SIZE | 100 | LRU pool size |
| PERF_CLIENT_TIMEOUT_MS | 10000 | Client timeout |
| PERF_PORT | random | Fixed port in PERF_MODE=http |
| PERF_SCENARIOS | all | CSV list, e.g. warm-noop,echo-1kb |
| PERF_JSON | unset | 1 for machine-readable JSON |
| PERF_OUTPUT_FILE | unset | JSON report output path |
| PERF_GATE_FILE | unset | JSON with thresholds (maxErrors, minRps, maxP95Ms, maxP99Ms, maxAvgMs) |
| PERF_KEEP_FIXTURES | unset | 1 keeps generated apps in .perf-fixtures/ |

Examples:

```sh
PERF_DURATION_MS=30000 PERF_CONCURRENCY=200 PERF_POOL_SIZE=500 bun run perf
PERF_MODE=direct PERF_SCENARIOS=warm-noop PERF_DURATION_MS=15000 bun run perf
PERF_SCENARIOS=ephemeral-noop PERF_CONCURRENCY=5 bun run perf
PERF_GATE_FILE=perf/thresholds.json PERF_OUTPUT_FILE=perf-results.json bun run perf
```
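To make the defaults in the table above concrete, here is a minimal sketch of how such variables could be read into a typed config. The `PerfConfig` shape and the `readPerfConfig` and `intVar` helpers are hypothetical names for illustration; the harness's actual parsing code is not shown in this document and may differ.

```typescript
// Hypothetical config shape: field names are illustrative, not the harness's own types.
interface PerfConfig {
  mode: "http" | "direct";
  durationMs: number;
  warmupMs: number;
  concurrency: number;
  poolSize: number;
  clientTimeoutMs: number;
  scenarios: string[] | "all";
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the documented default when unset.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function readPerfConfig(env: Env): PerfConfig {
  const rawMode = env["PERF_MODE"] ?? "http";
  if (rawMode !== "http" && rawMode !== "direct") {
    throw new Error(`PERF_MODE must be "http" or "direct", got "${rawMode}"`);
  }
  const mode: PerfConfig["mode"] = rawMode === "direct" ? "direct" : "http";

  // PERF_SCENARIOS is a CSV list; unset means "run every scenario".
  const rawScenarios = env["PERF_SCENARIOS"];
  const scenarios: string[] | "all" = rawScenarios
    ? rawScenarios.split(",").map((s) => s.trim()).filter((s) => s.length > 0)
    : "all";

  return {
    mode,
    durationMs: intVar(env, "PERF_DURATION_MS", 10000),
    warmupMs: intVar(env, "PERF_WARMUP_MS", 2000),
    concurrency: intVar(env, "PERF_CONCURRENCY", 50),
    poolSize: intVar(env, "PERF_POOL_SIZE", 100),
    clientTimeoutMs: intVar(env, "PERF_CLIENT_TIMEOUT_MS", 10000),
    scenarios,
  };
}
```

Passing the environment as a plain object (rather than reading a global) keeps the sketch easy to test with different combinations of variables.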

The following runtime environment variables directly affect both the numbers the harness measures and production behavior.

| Var | Default | Effect |
|---|---|---|
| RUNTIME_EPHEMERAL_CONCURRENCY | 2 | Maximum parallel ttl: 0 requests. Excess goes into the queue |
| RUNTIME_EPHEMERAL_QUEUE_LIMIT | 100 | Queue depth before returning 503 |
| RUNTIME_WORKER_CONFIG_CACHE_TTL_MS | 1000 | Cache TTL for worker manifest/config. 0 disables |
| RUNTIME_WORKER_RESOLVER_CACHE_TTL_MS | 1000 | Cache TTL for app directory lookup. 0 disables |

Recommendations:

  • ttl: 0 apps (functions/serverless): keep boot cheap. Increase RUNTIME_EPHEMERAL_CONCURRENCY only with spare CPU — each request pays a full spawn.
  • ttl > 0 apps (HTTP services): TTL is sliding, so each request renews it. When the pool is full, the LRU policy evicts the least recently used worker.
  • Cache TTL = 0 only in dev when app files change constantly; in production, 1000 ms is safe.
  • See also storage about the in-memory caches.
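The interaction between RUNTIME_EPHEMERAL_CONCURRENCY and RUNTIME_EPHEMERAL_QUEUE_LIMIT can be pictured with a small limiter: up to N tasks run in parallel, further tasks wait in a bounded queue, and anything beyond the queue limit is rejected with 503. This is a sketch under those assumptions only; `EphemeralLimiter` is a hypothetical name, and the runtime's real queue (see worker-pool) may be implemented differently.

```typescript
// Illustrative only: class and method names are assumptions, not Buntime APIs.
class EphemeralLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly concurrency: number; // cf. RUNTIME_EPHEMERAL_CONCURRENCY
  private readonly queueLimit: number; // cf. RUNTIME_EPHEMERAL_QUEUE_LIMIT

  constructor(concurrency: number, queueLimit: number) {
    this.concurrency = concurrency;
    this.queueLimit = queueLimit;
  }

  // Runs `task` once a slot is free and resolves to its status code.
  // Resolves to 503 immediately when all slots are busy and the queue is full.
  async run(task: () => Promise<number>): Promise<number> {
    if (this.active >= this.concurrency) {
      if (this.waiters.length >= this.queueLimit) return 503;
      // Wait in the queue; the finishing task hands its slot over directly.
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // slot is transferred, so `active` stays unchanged
      else this.active--; // nobody waiting: release the slot
    }
  }
}
```

With the defaults from the table (2 and 100), the third concurrent ttl: 0 request would wait rather than spawn, which is why raising the concurrency only helps when there is spare CPU for the extra spawns.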

perf/thresholds.json is intentionally conservative — it works as a smoke gate on a small runner. It checks:

  • maxErrors (default 0 even without a gate file)
  • minRps
  • maxP95Ms, maxP99Ms
  • maxAvgMs (optional)

Tighten gradually after collecting repeated samples on the same runner or Rancher/k3s environment. Do not aim for “production capacity” in the local harness — it measures regression, not capacity.
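As a rough illustration of what a gate check involves, the sketch below evaluates one scenario's result against the threshold keys listed above. The threshold key names come from this page; the `ScenarioResult` shape and the `checkGate` function are hypothetical and only stand in for whatever the harness does internally.

```typescript
// Threshold keys follow the list above. All are optional in this sketch.
interface Thresholds {
  maxErrors?: number;
  minRps?: number;
  maxP95Ms?: number;
  maxP99Ms?: number;
  maxAvgMs?: number;
}

// Hypothetical stand-in for one scenario's entry in the report.
interface ScenarioResult {
  errors: number;
  rps: number;
  p95Ms: number;
  p99Ms: number;
  avgMs: number;
}

// Returns a list of human-readable violations; an empty list means the gate passes.
function checkGate(result: ScenarioResult, gate: Thresholds = {}): string[] {
  const violations: string[] = [];
  const maxErrors = gate.maxErrors ?? 0; // errors are gated even without a gate file
  if (result.errors > maxErrors) {
    violations.push(`errors ${result.errors} > maxErrors ${maxErrors}`);
  }
  if (gate.minRps !== undefined && result.rps < gate.minRps) {
    violations.push(`rps ${result.rps} < minRps ${gate.minRps}`);
  }
  if (gate.maxP95Ms !== undefined && result.p95Ms > gate.maxP95Ms) {
    violations.push(`p95 ${result.p95Ms} ms > maxP95Ms ${gate.maxP95Ms}`);
  }
  if (gate.maxP99Ms !== undefined && result.p99Ms > gate.maxP99Ms) {
    violations.push(`p99 ${result.p99Ms} ms > maxP99Ms ${gate.maxP99Ms}`);
  }
  if (gate.maxAvgMs !== undefined && result.avgMs > gate.maxAvgMs) {
    violations.push(`avg ${result.avgMs} ms > maxAvgMs ${gate.maxAvgMs}`);
  }
  return violations;
}
```

Collecting every violation instead of stopping at the first one makes a failing CI log easier to read when several metrics regress together.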

The report table shows:

| Metric | Notes |
|---|---|
| Requests, RPS | Throughput per scenario |
| Errors | Timeouts and non-2xx/3xx responses; treated as gate failures |
| p50, p95, p99 (ms) | Latency. P99 tends to regress first |
| Pool active workers | Should converge to PERF_POOL_SIZE in warm scenarios |
| Pool hit rate | Expected >90% in warm-noop; ~0% in ephemeral-noop |
| Worker creations / failures | Spike in ephemeral-noop; should be stable in warm |
| Heap usage | For detecting leaks between runs |

Exit code 1 on any error or gate violation — intentionally “fail loud” in CI.
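To show why p99 tends to move before p50 or p95, here is a small percentile sketch. It uses the nearest-rank method, which is an assumption: this page does not say which percentile definition the harness uses, and `percentile` and `summarize` are hypothetical helper names.

```typescript
// Nearest-rank percentile over an ascending-sorted array of latencies in ms.
// `p` is a percentage in (0, 100].
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) return NaN;
  // Multiply before dividing so integer inputs stay exact.
  const rank = Math.ceil((p * sortedMs.length) / 100);
  const index = Math.min(sortedMs.length - 1, Math.max(0, rank - 1));
  return sortedMs[index] as number;
}

function summarize(latenciesMs: number[]): { p50: number; p95: number; p99: number } {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// 98 fast requests plus 2 slow outliers: only the tail percentile reacts.
const sample = [...Array(98).fill(1), 100, 100];
const summary = summarize(sample);
console.log(summary);
```

In this sample p50 and p95 stay at 1 ms while p99 jumps to 100 ms, which is the pattern to look for when a change introduces occasional slow requests.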

Dated reports (these are snapshots: read them as supplementary material and do not copy their content):

| Report | Focus |
|---|---|
| 2026-05-01-performance-rancher-pod-load.md | k6 against GET /_/api/health on the pod (Ingress + TLS + Traefik); pod impact (CPU/mem) |
| 2026-05-01-performance-rancher-worker-routes.md | k6 against temporary worker routes (perf-noop, perf-echo, perf-slow, perf-ephemeral) installed on Rancher |

Both run against https://buntime.home, namespace zomme, with lab TLS (--insecure-skip-tls-verify). The second covers warm + 1 KiB POST + slow 50 ms + cold churn (ttl: 0) with the gateway rate limit (100 req/min) active.

  • bun test and bun run lint:types pass on both sides of the comparison.
  • Comparing http and direct separates runtime overhead from socket/client overhead.
  • Watch warm-noop p95/p99 + pool hit rate after any change to routing, config loading, worker resolution, or plugin hooks.
  • Track ephemeral-noop separately — it is cold-start churn, not steady-state.
  • Use the same runner (laptop, CI runner, Rancher pod) between runs: comparing runs across different machines is not apples-to-apples.
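The http versus direct comparison mentioned in the list above can be reduced to a simple subtraction. The sketch below assumes a hypothetical `ModeSummary` shape with a p95 value per mode; it is a rough attribution only, since percentiles are not strictly additive and both runs must come from the same runner to be comparable.

```typescript
// Hypothetical summary of one scenario in one mode; not the harness's report format.
interface ModeSummary {
  p95Ms: number;
}

interface OverheadSplit {
  runtimeP95Ms: number; // what direct mode measures: routing, pool dispatch, worker
  socketP95Ms: number; // the remainder attributed to socket and HTTP client
  socketShare: number; // fraction of the http p95 attributed to socket/client
}

// Approximate split of the http-mode p95 into runtime and socket/client parts.
function splitOverhead(http: ModeSummary, direct: ModeSummary): OverheadSplit {
  const socketP95Ms = http.p95Ms - direct.p95Ms;
  return {
    runtimeP95Ms: direct.p95Ms,
    socketP95Ms,
    socketShare: http.p95Ms > 0 ? socketP95Ms / http.p95Ms : 0,
  };
}
```

For example, an http p95 of 4 ms against a direct p95 of 1 ms would attribute about 3 ms (75%) to the socket and client side, suggesting that a runtime change is unlikely to be the main lever for that scenario.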

Mapping of “change type → sensitive scenario”:

| Changed… | Run… | Check first |
|---|---|---|
| Routing / Hono middleware | warm-noop | p95/p99, RPS, pool hit rate |
| Plugin onRequest hook | warm-noop | p95/p99 (additive overhead per hook) |
| Worker pool LRU / lifecycle | warm-noop + ephemeral-noop | Worker creations, pool size, hit rate |
| IPC / wrapper.ts (worker) | echo-1kb | p95/p99 and throughput on large request body |
| Config loading / app.yaml parsing | ephemeral-noop | Cold-start RPS |
| Worker resolver (app lookup) | ephemeral-noop | Cold-start RPS, failures |

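The mapping above can also be expressed as a lookup that produces a PERF_SCENARIOS value for a set of changes. The change-type keys below (`routing`, `plugin-hook`, and so on) are hypothetical labels invented for this sketch; only the scenario names come from the table.

```typescript
// Change-type keys are illustrative labels; scenario names match the table above.
const SCENARIOS_BY_CHANGE = {
  routing: ["warm-noop"],
  "plugin-hook": ["warm-noop"],
  "pool-lifecycle": ["warm-noop", "ephemeral-noop"],
  ipc: ["echo-1kb"],
  "config-loading": ["ephemeral-noop"],
  "worker-resolver": ["ephemeral-noop"],
} as const;

type ChangeType = keyof typeof SCENARIOS_BY_CHANGE;

// Builds a de-duplicated CSV suitable for PERF_SCENARIOS, in first-seen order.
function scenariosFor(changes: ChangeType[]): string {
  const picked = new Set<string>();
  for (const change of changes) {
    for (const scenario of SCENARIOS_BY_CHANGE[change]) {
      picked.add(scenario);
    }
  }
  return Array.from(picked).join(",");
}

console.log(scenariosFor(["routing", "pool-lifecycle"]));
```

The resulting string can be passed as PERF_SCENARIOS so that a focused run covers only the scenarios sensitive to the change.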
  • .perf-fixtures/ is deleted by default between runs; use PERF_KEEP_FIXTURES=1 to inspect generated manifests.
  • The harness does not exercise Ingress, TLS, Traefik, NetworkPolicy, or K8s scheduling — for that, run k6 on Rancher (links above).
  • On laptops with thermal throttling, long runs (PERF_DURATION_MS > 60000) tend to degrade — use short runs for regression and long runs for capacity.
  • PERF_MODE=direct deliberately ignores HTTP server overhead. Do not use it alone as a baseline — always compare with http.
  • worker-pool — LRU model, sliding TTL, ephemeral queue.
  • storage — tunable in-memory caches, file stores.
  • All runtime env vars in one place — see the deployment configuration reference.