Files
rippled/docker/telemetry/workload/baselines
Pratik Mankawde cb9fce6890 fix(telemetry): align Phase 10 workload harness with current OTel recording surface + fix CI
The Phase 10 validation harness had drifted from the code's recording surface
and the telemetry-validation CI job was failing before it could build.

CI fix (telemetry-validation.yml):
- Replace nonexistent local action ./.github/actions/print-env with the remote
  XRPLF/actions/print-build-env (the build-xrpld job failed in 56s on this).
- Sync prepare-runner and upload-artifact action SHAs to the canonical workflow.

Recording-surface reconciliation (docker/telemetry/workload/):
- Migrate span attributes from dotted xrpl.<domain>.<field> to the bare/underscore
  form introduced by the 2026-05-13 span-attr naming redesign (tx_hash, peer_id,
  ledger_seq, consensus_mode, consensus_round, full_validation, quorum, ...).
  Dotted xrpl.ledger.hash is retained only on peer.validation.receive (shared
  constant), while consensus.validation.send uses bare ledger_hash.
- Fix attribute placement: tx.apply carries tx_count/tx_failed (not ledger_seq);
  ledger.build carries ledger_seq/close_* (not tx_count/tx_failed).
- Replace the phantom rpc.request span with the real WS root rpc.ws_message; drop
  the never-emitted duration_ms; rebuild the parent-child map accordingly.
- Add the new spans the code emits: apply-pipeline stage spans
  (tx.preflight/preclaim/transactor with stage/tx_type/ter_result), txq.*,
  consensus sub-spans (round/establish/update_positions/check/phase.open),
  ledger.acquire, grpc.*, pathfind.*. Conditional spans are marked optional so
  they are skipped (not failed) when the workload does not exercise them.
- validate_telemetry.py: service.name and Loki job label rippled -> xrpld; fix
  PARITY_SPAN_ATTRS (rename the 4 real attrs, drop the 3 that are metrics not span
  attrs); add optional-span handling that skips missing optional spans while still
  validating attributes when present.
- expected_metrics.json: rippled_ -> xrpld_ on all beast::insight/overlay metrics,
  xrpld_job_count, the 15 on-disk xrpld-* dashboard UIDs, and the real bare
  spanmetrics dimension labels.
- regression-metrics.json + baseline-timings.json: rpc.request -> rpc.ws_message.

Metrics pipeline fix:
- Switch node [insight] config from server=statsd/prefix=rippled to server=otel +
  /v1/metrics endpoint + prefix=xrpld across run-full-validation.sh,
  xrpld-validator.cfg.template, benchmark.sh and the workload compose. The
  collector has no StatsD receiver, so system metrics only reach Prometheus over
  OTLP.

Synthetic load for new spans:
- Add ripple_path_find to the RPC load generator (drives pathfind.* spans).
- Add a high-TPS txq-burst workload phase to force fee escalation (drives txq.*).

All facts verified against the *SpanNames.h headers and a live xrpld node +
collector (Tempo service.name=xrpld, tx.preflight attrs [stage,ter_result,tx_type],
279 xrpld_ Prometheus metrics and zero rippled_).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 17:08:58 +01:00
..

Performance Baselines

This directory holds the committed baseline file used by the OTel-driven regression gate.

How the gate works

After the validation suite runs, capture_timings.py queries Prometheus for the timings declared in ../regression-metrics.json and writes a timings.json. Then compare_to_baseline.py reads baseline-timings.json, ../regression-thresholds.json, and the captured timings.json. The comparator picks one of two modes automatically:

  • Placeholder baseline ("placeholder": true or empty metrics): the comparator prints the captured timings JSON in exactly the format expected for this file, then exits 0 without gating. This is how we bootstrap the baseline.
  • Populated baseline: the comparator diffs per-metric, enforces the thresholds (regression = current exceeds baseline on BOTH the percentage AND absolute bound), and exits non-zero on any regression.

The regression gate runs against whatever workload profile run-full-validation.sh was invoked with. Capture and comparison are profile-agnostic — they only read Prometheus — so all existing profiles (full-validation, quick-smoke, stress) continue to work unchanged.

Bootstrapping the baseline

  1. Merge a CI run with a "placeholder": true baseline. The telemetry-validation workflow runs, fails no gate, and prints the captured timings block to the workflow Step Summary under the heading ### Paste into baselines/baseline-timings.json.
  2. Open a new PR. Copy the full JSON block from the Step Summary (or download the timings.json artifact) into this file, replacing the placeholder contents. The JSON is emitted in the exact byte-for-byte format this file expects — sorted keys, 2-space indent, trailing newline.
  3. The committed baseline PR needs reviewer approval just like any other code change. This is the primary audit point for "who moved the performance bar."

Refreshing the baseline

Refresh when a legitimate performance change lands on develop (for example, a deliberate rewrite that changes a span's structure). The process is identical to bootstrapping: run CI with the current baseline, inspect the delta, and if the new numbers should become the norm, open a PR pasting the fresh timings into baseline-timings.json. The reviewer decides whether the new baseline is acceptable.

Do not edit baseline-timings.json by hand outside of this process — every entry should trace back to a real CI run so variance characteristics are preserved.

Schema

{
  "schema_version": 1,
  "captured_at": "2026-04-24T17:30:00Z",
  "window": "3m",
  "git_sha": "<SHA of the commit that produced these numbers>",
  "profile": "<workload profile used>",
  "metrics": {
    "span.tx.process.p99": { "value": 12.4, "unit": "ms" },
    "rpc.server_info.p95": { "value": 850.0, "unit": "us" },
    "job.transaction.queued.p95": { "value": 1500.0, "unit": "us" }
  }
}

Placeholder baselines additionally include "placeholder": true. The comparator detects this field (or an empty metrics object) to switch into "populate" mode instead of enforcing thresholds. Remove the placeholder key when pasting real captured timings.

Missing metrics (value null) in a captured run do not count as regressions — they are reported separately in regression-report.json under missing_in_current. This keeps the gate robust when a profile doesn't exercise every span on every run.