Commit Graph

2 Commits

Author SHA1 Message Date
Pratik Mankawde
577d1f8a21 fix: address review findings in regression gate
- capture_timings.py: fail when captured/total ratio < 50%
  (--min-capture-ratio). Prevents silent pass on unreachable Prometheus.
- run-full-validation.sh: set REGRESSION_EXIT=2 on capture failure so
  the final exit code reflects it. Update exit code docs in header.
- compare_to_baseline.py: extract _skip_delta helper to bring
  compute_delta under 80 lines. Fix 0.0-as-falsy bug in abs_bound
  resolution (use explicit None check instead of `or`). Remove dead
  variable override_prefix_key.
- prom_queries.py: extract _build_simple_entries and _build_job_entries
  to bring build_query_plan under 80 lines. Fix module docstring return
  type example. Use aiohttp.ClientTimeout instead of bare int.
- telemetry-validation.yml: add set -euo pipefail to regression summary
  step; guard jq calls with -e flag and fallback; fail on missing
  baseline file; emit ::warning annotation when timings.json missing.
- baselines/README.md: document the placeholder field.
2026-04-24 19:36:15 +01:00
Pratik Mankawde
df79d5e74b feat: add OTel-driven regression gate for Phase 10 telemetry validation
Captures per-span / per-RPC / per-job timings from Prometheus after the
workload run and diffs them against a committed baseline. Regression
requires breaching both a percentage and an absolute bound, tolerating
small-value noise. When the baseline is a placeholder, the comparator
emits the captured JSON in the exact schema for one-time paste into
baselines/baseline-timings.json, and the CI Step Summary surfaces that
block for the reviewer.

Scope: gate only — automated baseline persistence, benchmark.sh
PromQL migration, and the historical trend dashboard remain follow-ups.
2026-04-24 18:53:44 +01:00