The Phase 10 validation harness had drifted from the code's recording surface
and the telemetry-validation CI job was failing before it could build.
CI fix (telemetry-validation.yml):
- Replace nonexistent local action ./.github/actions/print-env with the remote
XRPLF/actions/print-build-env (the build-xrpld job failed in 56s on this).
- Sync prepare-runner and upload-artifact action SHAs to the canonical workflow.
Recording-surface reconciliation (docker/telemetry/workload/):
- Migrate span attributes from dotted xrpl.<domain>.<field> to the bare/underscore
form introduced by the 2026-05-13 span-attr naming redesign (tx_hash, peer_id,
ledger_seq, consensus_mode, consensus_round, full_validation, quorum, ...).
Dotted xrpl.ledger.hash is retained only on peer.validation.receive (shared
constant), while consensus.validation.send uses bare ledger_hash.
- Fix attribute placement: tx.apply carries tx_count/tx_failed (not ledger_seq);
ledger.build carries ledger_seq/close_* (not tx_count/tx_failed).
- Replace the phantom rpc.request span with the real WS root rpc.ws_message; drop
the never-emitted duration_ms; rebuild the parent-child map accordingly.
- Add the new spans the code emits: apply-pipeline stage spans
(tx.preflight/preclaim/transactor with stage/tx_type/ter_result), txq.*,
consensus sub-spans (round/establish/update_positions/check/phase.open),
ledger.acquire, grpc.*, pathfind.*. Conditional spans are marked optional so
they are skipped (not failed) when the workload does not exercise them.
- validate_telemetry.py: service.name and Loki job label rippled -> xrpld; fix
PARITY_SPAN_ATTRS (rename the 4 real attrs, drop the 3 that are metrics not span
attrs); add optional-span handling that skips missing optional spans while still
validating attributes when present.
- expected_metrics.json: rippled_ -> xrpld_ on all beast::insight/overlay metrics,
xrpld_job_count, the 15 on-disk xrpld-* dashboard UIDs, and the real bare
spanmetrics dimension labels.
- regression-metrics.json + baseline-timings.json: rpc.request -> rpc.ws_message.
Metrics pipeline fix:
- Switch node [insight] config from server=statsd/prefix=rippled to server=otel +
/v1/metrics endpoint + prefix=xrpld across run-full-validation.sh,
xrpld-validator.cfg.template, benchmark.sh and the workload compose. The
collector has no StatsD receiver, so system metrics only reach Prometheus over
OTLP.
Synthetic load for new spans:
- Add ripple_path_find to the RPC load generator (drives pathfind.* spans).
- Add a high-TPS txq-burst workload phase to force fee escalation (drives txq.*).
All facts verified against the *SpanNames.h headers and a live xrpld node +
collector (Tempo service.name=xrpld, tx.preflight attrs [stage,ter_result,tx_type],
279 xrpld_ Prometheus metrics and zero rippled_).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The tx.transactor span covered only the apply stage; preflight and
preclaim had no telemetry, so a transaction that hard-failed those
stages produced no apply-pipeline span and per-stage latency/failure
was invisible.
Add tx.preflight and tx.preclaim spans in applySteps.cpp via a
makeStageSpan() helper using SpanGuard::hashSpan, so all three stages
share a deterministic trace_id derived from txID[0:16] even though they
run sequentially and often cross-thread. Each span carries stage,
tx_type, and ter_result; exceptions are recorded as tefEXCEPTION before
the public wrappers map them. The type lookup is guarded behind the
span-active check so it costs nothing when tracing is off.
Add a stage="apply" attribute to the tx.transactor span and move its
three hardcoded attribute strings to a new library-safe header
include/xrpl/tx/detail/TxApplySpanNames.h, which mirrors the daemon-side
TxSpanNames.h strings so the collector spanmetrics connector aggregates
both span sets under one dimension set.
A constants-contract test pins the span-name, attribute-key, and
stage-value strings; span content stays covered by the docker
integration test, as the rest of the telemetry suite is.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: semgrep-companion-app[bot] <218312740+semgrep-companion-app[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
- `src/test/telemetry/MetricsRegistry_test.cpp` (Beast `unit_test::suite`
format under `src/test/`) duplicates the GTest version already
maintained at `src/tests/libxrpl/telemetry/MetricsRegistry.cpp`.
Project rule (`tasks/lessons.md` §Test Format): all new tests use
GTest under `src/tests/libxrpl/`. The GTest version exercises the
same four cases (disabled construction, start/stop lifecycle, recording
no-op, destructor-calls-stop). Deleting the Beast duplicate eliminates
drift and keeps the test authoritative in one place.
- Drop the matching `test.telemetry > xrpl.basics/xrpl.core/xrpld.telemetry`
entries from `.github/scripts/levelization/results/ordering.txt`
because `xrpl.test.telemetry` (the GTest binary) retains its own
entries; the removed ones belonged to the deleted Beast suite.
- `.claude/instructions.md` was committed as a symlink to an
author-local absolute path (`/home/pratik/sourceCode/personal/Rippled/
instructions.md`) that does not exist for any other contributor or in
CI. Remove the symlink from git tracking and add `.claude/` to
`.gitignore` so future agent commits do not re-add per-developer
settings.