Add the new synchronous counters (ledger_history_mismatch_total{reason},
txq_expired_total, txq_dropped_total{reason}) and the reduce-relay observable
gauge to the ASCII ownership diagram in the MetricsRegistry header so the
documented instrument inventory matches the code.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The transaction reduce-relay subsystem (selected vs suppressed peers,
feature-disabled peers, missing-tx frequency) was computed in OverlayImpl's
TxMetrics but only surfaced via the get_counts JSON RPC — invisible to
Prometheus/Grafana, despite being the central efficiency KPI for the feature.
Add an observable gauge xrpld_reduce_relay_metrics{metric} that reads
Overlay::txMetrics() and parses its rolling-average fields:
- selected_peers (txr_selected_cnt)
- suppressed_peers (txr_suppressed_cnt)
- not_enabled_peers (txr_not_enabled_cnt)
- missing_tx_freq (txr_missing_tx_freq)
The JSON values are decimal strings (std::to_string), parsed via std::stoll —
the same JSON-reading pattern as registerNodeStoreGauge. No new Overlay
accessor or core-interface change required.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
InboundLedger drives ledger back-fill and fork recovery with timeout/retry
logic (kLedgerTimeoutRetriesMax = 6), but emitted only a global ledger_fetches
counter — sync/recovery cost was a telemetry blind spot.
Add a ledger.acquire span that wraps the acquisition lifecycle:
- Started in InboundLedger::init() with ledger_seq and acquire_reason
(history / consensus / generic, mirroring InboundLedger::Reason).
- Finalized in InboundLedger::done() with outcome (complete / failed),
timeouts, and peer_count, then reset so the span duration is exported.
Held as a std::optional<SpanGuard> member (same pattern as RCLConsensus
roundSpan_). New op/attr/val constants added to LedgerSpanNames.h. Compiles to
a no-op when telemetry is disabled via the SpanGuard fallback.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The transaction queue had no metric for demand that leaves or never enters the
queue, so fee-underpayment abandonment and admission-control rejection were
invisible (distinct from jq_trans_overflow, which is the job queue).
Add two synchronous counters via MetricsRegistry:
- xrpld_txq_expired_total — incremented in TxQ::processClosedLedger() for each
queued transaction removed because its LastLedgerSequence passed (submitters
who under-bid the escalating fee and were never included).
- xrpld_txq_dropped_total{reason} — incremented in TxQ::apply() at the
queue-full admission-control returns (reason="queue_full").
Both reach MetricsRegistry via the Application& parameter already passed to
these methods; calls are null-guarded so they no-op when telemetry is disabled.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
LedgerHistory::handleMismatch() already classifies a built-vs-validated ledger
mismatch (prior ledger, close time, consensus tx set, same/different tx set),
but only bumped a single untyped beast::insight counter — the reason was
dropped. Fork diagnosis was therefore a log-grep exercise.
Add a labeled OTel counter so the mismatch reason is a queryable time series:
- MetricsRegistry: new ledgerHistoryMismatchCounter_ + incrementLedgerHistoryMismatch(reason)
- LedgerHistory: record one reason per classification branch (unknown,
prior_ledger, close_time, consensus_txset, same_txset_diff_result,
different_txset). Reaches MetricsRegistry via the existing app_ reference.
The existing beast::insight mismatchCounter_ is left intact.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The xrpld_peer_quality{metric="peers_insane_count"} gauge was hardcoded to 0.0
with a TODO, leaving the "Insane/Diverged Peers" panel permanently empty.
PeerImp::json() already exposes the peer's tracking state via the "track"
field (set to "diverged" when tracking_ == Tracking::Diverged). The peer-quality
callback already iterates peer->json() for latency and version, so count peers
whose "track" field equals "diverged" in the same loop — no change to the
abstract Peer interface required.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The OTel C++ SDK's SetAttribute appends rather than overwrites on
in-flight spans. Setting suppressed=false as a default then overriding
to true resulted in both values appearing in the exported span.
Fix: remove the default-false set, place suppressed=false once after
the HashRouter check passes (non-suppressed path), and suppressed=true
remains only in the suppressed path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.
Adds libxrpl.tx > xrpl.telemetry levelization dependency.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve runbook conflict: keep both phase 6 ledger/peer span tables
AND new insights/sample queries section from the enrichment work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds workflow-critical attributes to consensus spans:
- consensus.proposal.send: is_bow_out (identifies resignation proposals)
- consensus.accept: consensus_state (yes/moved_on/expired), disputes_count
- consensus.validation.send: ledger_hash (correlates validation to ledger)
Enables answering: "Did we reach consensus or time out?", "How many
disputes existed at acceptance?", "Which ledger did we validate?"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The apply() function doesn't have a `using namespace telemetry` directive
(unlike processTransaction), so tx_span attrs need explicit qualification.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enriches the tx.process span with final outcome after batch application:
- ter_result: the TER code string (e.g., "tesSUCCESS", "tecPATH_DRY")
- applied: boolean whether the transaction was included in the ledger
These attributes complete the tx.process span lifecycle — it now captures
identity (tx_type, tx_hash), intent (fee, sequence), and outcome
(ter_result, applied) for full workflow traceability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire up span attributes that enable filtering/grouping traces by request
characteristics: batch detection, payload size, resource cost category,
command name on WS spans, and pathfinding search parameters (destination
amount/currency, source asset count).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The project-wide rename check changed the factory function name but
missed this call site in the consensus simulation test framework.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ServiceRegistry gained the pure virtual getMetricsRegistry() in phase 7
but TestServiceRegistry was never updated. Returns nullptr since tests
don't need a real MetricsRegistry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>