- capture_timings.py: fail when captured/total ratio < 50%
(--min-capture-ratio). Prevents silent pass on unreachable Prometheus.
- run-full-validation.sh: set REGRESSION_EXIT=2 on capture failure so
the final exit code reflects it. Update exit code docs in header.
- compare_to_baseline.py: extract _skip_delta helper to bring
compute_delta under 80 lines. Fix 0.0-as-falsy bug in abs_bound
resolution (use explicit None check instead of `or`). Remove dead
variable override_prefix_key.
- prom_queries.py: extract _build_simple_entries and _build_job_entries
to bring build_query_plan under 80 lines. Fix module docstring return
type example. Use aiohttp.ClientTimeout instead of bare int.
- telemetry-validation.yml: add set -euo pipefail to regression summary
step; guard jq calls with -e flag and fallback; fail on missing
baseline file; emit ::warning annotation when timings.json missing.
- baselines/README.md: document the placeholder field.
Captures per-span / per-RPC / per-job timings from Prometheus after the
workload run and diffs them against a committed baseline. Regression
requires breaching both a percentage and an absolute bound, tolerating
small-value noise. When the baseline is a placeholder, the comparator
emits the captured JSON in the exact schema for one-time paste into
baselines/baseline-timings.json, and the CI Step Summary surfaces that
block for the reviewer.
Scope: gate only — automated baseline persistence, benchmark.sh
PromQL migration, and the historical trend dashboard remain follow-ups.
The Jaeger-removal rebase used --ours conflict resolution which
dropped content added by intermediate phases (6-8): StatsD receiver,
filelog receiver, Loki service/exporter, health_check extension,
and OTLP metrics pipeline. Restore from pre-rebase origin minus
Jaeger references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix remaining Jaeger references that accumulated across intermediate
branches in the stacked PR chain. These were in files modified by
multiple phases where the per-branch fixes didn't cover all additions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate validate_telemetry.py to Tempo TraceQL search API, remove
Jaeger service from workload docker-compose, update readiness checks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-include boost/asio/detail/socket_types.hpp on Windows before OTel
SDK headers to ensure WinSock2.h is included before WinSock.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ValidationTracker.cpp to xrpl.test.telemetry target sources
(implementation lives in src/xrpld/ but has no OTel SDK dependency)
- Change BEAST_DEFINE_TESTSUITE namespace from ripple to xrpl
- Replace recursive *.md glob with non-recursive GLOB in XrplDocs.cmake
to avoid picking up .claude/instructions.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update MockServiceRegistry to match current ServiceRegistry interface
(17 method renames: get* prefix, PathRequests→PathRequestManager)
- Make throwUnimplemented() static to satisfy clang-tidy
- Regenerate levelization ordering.txt and loops.txt
- Remove 'rippled' prefix from 3 StatsD dashboard titles
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix validations_checked_total recording site (NetworkOPs, not LedgerMaster)
- Add Slide 11 to presentation: External Dashboard Parity overview with
Mermaid diagrams for new metric categories, ValidationTracker sequence,
and new dashboard summary
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove duplicate 'system-node-health' UID from expected_metrics.json
(already covered by 'rippled-system-node-health')
- Add parity span attributes to expected_spans.json: node health on
rpc.command.*, validation hash/full on consensus.validation.send,
quorum/proposers on consensus.accept, validation hash/full on
peer.validation.receive
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add agreement_pct_7d, agreements_7d, missed_7d labels to the
rippled_validation_agreement observable gauge, matching the external
xrpl-validator-dashboard's 7-day tracking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ValidationTracker member to MetricsRegistry with a public accessor,
register a rippled_validation_agreement observable gauge that calls
reconcile() and reports 1h/24h agreement percentages and counts, and
hook recordOurValidation/recordNetworkValidation into RCLConsensus
validate() and LedgerMaster setValidLedger() respectively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire incrementValidationsChecked() into NetworkOPs::recvValidation() so
each received network validation increments the counter.
Note: incrementJqTransOverflow() hook is deferred — JobQueue has no
explicit overflow event path; the counter is reserved for future use.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add rippled_validation_agreements_total and rippled_validation_missed_total
counter declarations and creation (wiring to ValidationTracker pending rebase)
- Fix peer-quality dashboard: query rippled_server_info{metric="peer_disconnects_resources"}
instead of non-existent rippled_Overlay_Peer_Disconnects_Charges
- Remove dead getCountsJson() call in storageDetail callback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename fee labels to match spec: base_fee_drops -> base_fee_xrp,
reserve_base_drops -> reserve_base_xrp, reserve_inc_drops -> reserve_inc_xrp
- Add peers_insane_count (stub with TODO for PeerImp::tracking_ exposure)
- Add transaction_rate to ledger economy gauge
- Replace node_store_writes/node_written_bytes with nudb_bytes per spec
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add validator health, peer quality, ledger economy, state tracking, and
storage detail observable gauges plus 5 synchronous counters with recording
hooks for ledger close, validation send, state change, and overflow events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add window7d_ deque, agreementPct7d(), agreements7d(), missed7d() to
match the external xrpl-validator-dashboard's 7-day agreement tracking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover normal agreement, missed validation, late repair, empty window,
grace period boundary, max pending trimming, mixed results, duplicate
recording, and only-we-validated scenarios.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use >= instead of > for grace period comparison to reconcile at exactly
8 seconds rather than skipping the boundary
- Two-pass hard trim: first remove entries past late-repair window, then
any reconciled entry, to avoid sabotaging late repairs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone class that tracks whether this validator's validations agree
with network consensus, maintaining rolling 1h/24h windows and lifetime
totals with a late-repair mechanism for out-of-order arrivals.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds ValidationTracker (agreement computation with 8s grace period),
validator health, peer quality, ledger economy, state tracking,
storage detail gauges, 7 synchronous counters, and agreement gauge.
29 new metrics covering validation agreement, peer quality, UNL health,
ledger economy, state tracking, and upgrade awareness.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bridge the existing beast::insight gauge for resource-limit peer
disconnects (peerDisconnectsCharges_) into the StatsD metric inventory.
Part of the external dashboard parity initiative.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The xrpld/app/misc/CanonicalTXSet.h header doesn't exist — it was
incorrectly added during a rebase conflict resolution. The correct
include xrpl/ledger/CanonicalTXSet.h is already present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ledger hash and full-validation flag to peer.validation.receive
spans for trace-level agreement analysis across validators.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>