Files
rippled/docker/telemetry/workload/baselines/README.md
Pratik Mankawde 577d1f8a21 fix: address review findings in regression gate
- capture_timings.py: fail when captured/total ratio < 50%
  (--min-capture-ratio). Prevents silent pass on unreachable Prometheus.
- run-full-validation.sh: set REGRESSION_EXIT=2 on capture failure so
  the final exit code reflects it. Update exit code docs in header.
- compare_to_baseline.py: extract _skip_delta helper to bring
  compute_delta under 80 lines. Fix 0.0-as-falsy bug in abs_bound
  resolution (use explicit None check instead of `or`). Remove dead
  variable override_prefix_key.
- prom_queries.py: extract _build_simple_entries and _build_job_entries
  to bring build_query_plan under 80 lines. Fix module docstring return
  type example. Use aiohttp.ClientTimeout instead of bare int.
- telemetry-validation.yml: add set -euo pipefail to regression summary
  step; guard jq calls with -e flag and fallback; fail on missing
  baseline file; emit ::warning annotation when timings.json missing.
- baselines/README.md: document the placeholder field.
2026-04-24 19:36:15 +01:00

73 lines
3.5 KiB
Markdown

# Performance Baselines
This directory holds the committed baseline file used by the OTel-driven regression gate.
## How the gate works
After the validation suite runs, `capture_timings.py` queries Prometheus for the timings
declared in [`../regression-metrics.json`](../regression-metrics.json) and writes a
`timings.json`. Then `compare_to_baseline.py` reads [`baseline-timings.json`](./baseline-timings.json),
[`../regression-thresholds.json`](../regression-thresholds.json), and the captured
`timings.json`. The comparator picks one of two modes automatically:
- **Placeholder baseline** (`"placeholder": true` or empty `metrics`): the comparator
prints the captured timings JSON in exactly the format expected for this file, then
exits 0 without gating. This is how we bootstrap the baseline.
- **Populated baseline**: the comparator diffs per-metric, enforces the thresholds
(regression = current exceeds baseline on BOTH the percentage AND absolute bound),
and exits non-zero on any regression.
The regression gate runs against whatever workload profile `run-full-validation.sh`
was invoked with. Capture and comparison are profile-agnostic — they only read
Prometheus — so all existing profiles (`full-validation`, `quick-smoke`, `stress`)
continue to work unchanged.
## Bootstrapping the baseline
1. Merge a CI run with a `"placeholder": true` baseline. The telemetry-validation
workflow runs, fails no gate, and prints the captured timings block to the workflow
Step Summary under the heading `### Paste into baselines/baseline-timings.json`.
2. Open a new PR. Copy the full JSON block from the Step Summary (or download the
`timings.json` artifact) into this file, replacing the placeholder contents. The
JSON is emitted in the exact byte-for-byte format this file expects — sorted keys,
2-space indent, trailing newline.
3. The committed baseline PR needs reviewer approval just like any other code change.
This is the primary audit point for "who moved the performance bar."
## Refreshing the baseline
Refresh when a legitimate performance change lands on `develop` (for example, a
deliberate rewrite that changes a span's structure). The process is identical to
bootstrapping: run CI with the current baseline, inspect the delta, and if the
new numbers should become the norm, open a PR pasting the fresh timings into
`baseline-timings.json`. The reviewer decides whether the new baseline is acceptable.
Do **not** edit `baseline-timings.json` by hand outside of this process — every entry
should trace back to a real CI run so variance characteristics are preserved.
## Schema
```json
{
"schema_version": 1,
"captured_at": "2026-04-24T17:30:00Z",
"window": "3m",
"git_sha": "<SHA of the commit that produced these numbers>",
"profile": "<workload profile used>",
"metrics": {
"span.tx.process.p99": { "value": 12.4, "unit": "ms" },
"rpc.server_info.p95": { "value": 850.0, "unit": "us" },
"job.transaction.queued.p95": { "value": 1500.0, "unit": "us" }
}
}
```
Placeholder baselines additionally include `"placeholder": true`. The comparator
detects this field (or an empty `metrics` object) to switch into "populate" mode
instead of enforcing thresholds. Remove the `placeholder` key when pasting real
captured timings.
Missing metrics (value `null`) in a captured run do not count as regressions — they
are reported separately in `regression-report.json` under `missing_in_current`.
This keeps the gate robust when a profile doesn't exercise every span on every run.