- capture_timings.py: fail when captured/total ratio < 50% (--min-capture-ratio). Prevents silent pass on unreachable Prometheus.
- run-full-validation.sh: set REGRESSION_EXIT=2 on capture failure so the final exit code reflects it. Update exit code docs in header.
- compare_to_baseline.py: extract _skip_delta helper to bring compute_delta under 80 lines. Fix 0.0-as-falsy bug in abs_bound resolution (use explicit None check instead of `or`). Remove dead variable override_prefix_key.
- prom_queries.py: extract _build_simple_entries and _build_job_entries to bring build_query_plan under 80 lines. Fix module docstring return type example. Use aiohttp.ClientTimeout instead of bare int.
- telemetry-validation.yml: add set -euo pipefail to regression summary step; guard jq calls with -e flag and fallback; fail on missing baseline file; emit ::warning annotation when timings.json missing.
- baselines/README.md: document the placeholder field.
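The 0.0-as-falsy fix in compare_to_baseline.py is easiest to see in isolation. The sketch below is not the actual code from that file: the function names and DEFAULT_ABS_BOUND are hypothetical, chosen only to show why `or` is the wrong tool when 0.0 is a legitimate override.

```python
from typing import Optional

# Hypothetical default, for illustration only; the real value lives in
# ../regression-thresholds.json.
DEFAULT_ABS_BOUND = 5.0


def resolve_abs_bound_buggy(override: Optional[float]) -> float:
    # `or` treats 0.0 as falsy, so an explicit override of 0.0 is
    # silently replaced by the default.
    return override or DEFAULT_ABS_BOUND


def resolve_abs_bound(override: Optional[float]) -> float:
    # Explicit None check: only a missing override falls back to the default.
    return DEFAULT_ABS_BOUND if override is None else override
```

With an override of 0.0, the first version returns the default while the second returns 0.0 as configured.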
Performance Baselines
This directory holds the committed baseline file used by the OTel-driven regression gate.
How the gate works
After the validation suite runs, capture_timings.py queries Prometheus for the timings
declared in ../regression-metrics.json and writes a
timings.json. Then compare_to_baseline.py reads baseline-timings.json,
../regression-thresholds.json, and the captured
timings.json. The comparator picks one of two modes automatically:
- Placeholder baseline ("placeholder": true or empty metrics): the comparator prints the captured timings JSON in exactly the format expected for this file, then exits 0 without gating. This is how we bootstrap the baseline.
- Populated baseline: the comparator diffs per-metric, enforces the thresholds (regression = current exceeds baseline on BOTH the percentage AND absolute bound), and exits non-zero on any regression.
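The two-sided rule in populated mode can be sketched as follows. This is a minimal illustration, not the comparator's code: the function name and parameter names are assumptions, and both values are assumed to be in the same unit.

```python
def is_regression(
    baseline: float,
    current: float,
    pct_threshold: float,
    abs_bound: float,
) -> bool:
    """Return True only when BOTH the percentage and absolute bounds are exceeded."""
    delta = current - baseline
    if delta <= 0:
        # Equal or faster than baseline is never a regression.
        return False
    # A zero baseline makes any slowdown an infinite percentage increase.
    pct_increase = (delta / baseline) * 100.0 if baseline > 0 else float("inf")
    return pct_increase > pct_threshold and delta > abs_bound
```

Requiring both bounds keeps the gate quiet in two noisy cases: a tiny metric that jumps by a large percentage but a negligible absolute amount, and a large metric that moves by a noticeable absolute amount but a small percentage.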
The regression gate runs against whatever workload profile run-full-validation.sh
was invoked with. Capture and comparison are profile-agnostic — they only read
Prometheus — so all existing profiles (full-validation, quick-smoke, stress)
continue to work unchanged.
Bootstrapping the baseline
- Merge a CI run with a "placeholder": true baseline. The telemetry-validation workflow runs, fails no gate, and prints the captured timings block to the workflow Step Summary under the heading "### Paste into baselines/baseline-timings.json".
- Open a new PR. Copy the full JSON block from the Step Summary (or download the timings.json artifact) into this file, replacing the placeholder contents. The JSON is emitted in the exact byte-for-byte format this file expects: sorted keys, 2-space indent, trailing newline.
- The committed baseline PR needs reviewer approval just like any other code change. This is the primary audit point for "who moved the performance bar."
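If you need to reproduce that format locally (for example, to check that a pasted file matches what CI emitted), the standard library can produce sorted keys, 2-space indent, and a trailing newline. The helper name below is hypothetical, and the comparator's own serialization may differ in details such as non-ASCII escaping.

```python
import json


def format_baseline(timings: dict) -> str:
    # sort_keys gives a stable key order, indent=2 gives the 2-space
    # layout, and the explicit "\n" adds the trailing newline.
    return json.dumps(timings, sort_keys=True, indent=2) + "\n"
```

Writing the returned string to baseline-timings.json unchanged keeps diffs between baseline refreshes limited to the values that actually moved.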
Refreshing the baseline
Refresh when a legitimate performance change lands on develop (for example, a
deliberate rewrite that changes a span's structure). The process is identical to
bootstrapping: run CI with the current baseline, inspect the delta, and if the
new numbers should become the norm, open a PR pasting the fresh timings into
baseline-timings.json. The reviewer decides whether the new baseline is acceptable.
Do not edit baseline-timings.json by hand outside of this process — every entry
should trace back to a real CI run so variance characteristics are preserved.
Schema
{
  "schema_version": 1,
  "captured_at": "2026-04-24T17:30:00Z",
  "window": "3m",
  "git_sha": "<SHA of the commit that produced these numbers>",
  "profile": "<workload profile used>",
  "metrics": {
    "span.tx.process.p99": { "value": 12.4, "unit": "ms" },
    "rpc.server_info.p95": { "value": 850.0, "unit": "us" },
    "job.transaction.queued.p95": { "value": 1500.0, "unit": "us" }
  }
}
Placeholder baselines additionally include "placeholder": true. The comparator
detects this field (or an empty metrics object) to switch into "populate" mode
instead of enforcing thresholds. Remove the placeholder key when pasting real
captured timings.
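The mode switch described above can be sketched in a few lines. The function name is hypothetical, and treating a missing metrics key the same as an empty object is an assumption, not documented behavior.

```python
def is_placeholder(baseline: dict) -> bool:
    # Populate mode if the placeholder flag is literally true...
    if baseline.get("placeholder") is True:
        return True
    # ...or if metrics is empty. A missing "metrics" key is treated as
    # empty here, which is an assumption for this sketch.
    return not baseline.get("metrics")
```

A baseline that still carries "placeholder": true stays in populate mode even when metrics is filled in, which is why the key has to be removed when pasting real timings.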
Missing metrics (value null) in a captured run do not count as regressions — they
are reported separately in regression-report.json under missing_in_current.
This keeps the gate robust when a profile doesn't exercise every span on every run.
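One way to separate missing metrics from comparable ones is sketched below. The function name and return shape are hypothetical. It also treats a metric that is absent from the captured file like one whose value is null, which is an assumption beyond what this README states.

```python
from typing import Dict, List, Tuple


def split_missing(
    baseline_metrics: Dict[str, dict],
    current_metrics: Dict[str, dict],
) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Return (comparable, missing_in_current) for the baseline's metrics."""
    comparable: Dict[str, Tuple[float, float]] = {}
    missing: List[str] = []
    for name, base in baseline_metrics.items():
        cur = current_metrics.get(name)
        # Use `is None`, not truthiness, so a real measurement of 0.0
        # is still compared instead of being reported as missing.
        if cur is None or cur.get("value") is None:
            missing.append(name)
        else:
            comparable[name] = (base["value"], cur["value"])
    return comparable, sorted(missing)
```

Only the comparable pairs would then go through the threshold check; the missing names would be written to the report without affecting the exit code.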