The spanmetrics connector dimension was `xrpl.consensus.mode`, but the code
emits the span attribute under the bare key `consensus_mode` (matching every
other dimension after the Phase 6 rename). The mismatch left the
`xrpl_consensus_mode` Prometheus label empty, so the Consensus Health
"Consensus Mode Over Time" panel and the `$consensus_mode` template variable
(which filters every panel) matched no live series.
- otel-collector-config.yaml: dimension `xrpl.consensus.mode` -> `consensus_mode`
- consensus-health.json: 11 label refs `xrpl_consensus_mode` -> `consensus_mode`
(the `$consensus_mode` Grafana variable name is unchanged)
- telemetry-runbook.md: refresh the stale spanmetrics label table to the bare
names actually emitted (command/rpc_status/consensus_mode/local/
proposal_trusted/validation_trusted), fix dotted->bare attribute names in
span tables and TraceQL examples (tx_hash, ledger_seq, consensus_round_id,
consensus_ledger_id, consensus_round, tx_id event attr), correct the
consensus_round_id query to int (not quoted string), and fix the
load_type value query ("exception_rpc" -> "exceptioned RPC").
Verified against the live stack: Tempo span tags confirm bare attribute keys
(consensus_mode, ledger_seq, tx_hash, ...); the populated xrpl_consensus_mode
series in Prometheus is stale retained data from an older build.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolve runbook conflict: keep both the Phase 6 ledger/peer span tables
and the new insights/sample queries section from the enrichment work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add an "Insights and Sample Queries" section showing operators which
questions the newly added span attributes can answer:
- Transaction workflow analysis (filter by tx_type, fee, ter_result)
- TxQ health (txq_status, ledger_changed)
- RPC debugging (is_batch, request_payload_size, load_type)
- PathFinding performance (dest_currency, num_source_assets)
- Consensus health (consensus_state, is_bow_out, disputes_count)
- Cross-subsystem correlation examples
Also updates all span reference tables with the new attributes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- RPC Spans table: `rpc.request` was documented, but the code actually emits
`rpc.http_request`. List the names actually emitted
(`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`)
and their parent/child relationship.
- Drop `:<line>` suffixes from Source File columns in both RPC and
Transaction span tables. Line numbers drift with every refactor; the
filename is enough for operators to grep.
- Summary table: replace the never-emitted `rpc.request` row with the real
entry points so `span_name=` filters in PromQL / TraceQL match.
Update 31 attribute references in telemetry-runbook.md to match the
simplified naming: drop xrpl.<domain>. prefix on per-span attrs, use
domain-qualified names for collisions (rpc_status, consensus_state,
etc.), and unify cross-domain refs (xrpl.ledger.seq, xrpl.tx.hash).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix use-after-free: extract gauge callback to static function and call
RemoveCallback in ~OTelGaugeImpl() before unregistering from collector
- Use memory_order_acq_rel on callHooks() debounce CAS for proper
happens-before relationship between hook invocations
- Add explicit 2s timeout to ForceFlush() in destructor to prevent
blocking indefinitely when OTLP endpoint is unreachable at shutdown
- Add OTLP receiver to metrics pipeline so native OTel metrics from
xrpld are actually received by the collector
- Remove stale health check port from docker-compose (extension was
removed from collector config)
- Clarify fallback docs: StatsD path requires re-enabling receiver/port
- Fix comments: Counter uses uint64_t not int64_t, gauge clamps to
[0, INT64_MAX] not [0, UINT64_MAX]
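
The debounce CAS mentioned above can be sketched roughly as follows. This is a minimal illustration, not the xrpld code: `DebounceGate`, `tryAcquire`, and the nanosecond timestamps are hypothetical names chosen for the example, and timestamps are assumed to be non-negative.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical debounce gate (illustrative only, not taken from xrpld).
// At most one caller per interval wins the right to run the hooks.
class DebounceGate
{
public:
    explicit DebounceGate(std::int64_t minIntervalNs)
        : minIntervalNs_(minIntervalNs), lastRunNs_(-minIntervalNs)
    {
    }

    // Returns true if the caller may run the hooks at time nowNs (>= 0).
    bool
    tryAcquire(std::int64_t nowNs)
    {
        std::int64_t last = lastRunNs_.load(std::memory_order_acquire);
        while (nowNs - last >= minIntervalNs_)
        {
            // On success, acq_rel both acquires the previous winner's
            // writes and releases this thread's prior writes, giving a
            // happens-before edge between successive hook invocations.
            if (lastRunNs_.compare_exchange_weak(
                    last,
                    nowNs,
                    std::memory_order_acq_rel,
                    std::memory_order_acquire))
                return true;
            // On failure, `last` holds the current value; re-check the interval.
        }
        return false;
    }

private:
    std::int64_t const minIntervalNs_;
    std::atomic<std::int64_t> lastRunNs_;
};
```

With a 100 ns interval, calls at 0 and 100 succeed while calls at 50 and 199 are debounced.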
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture
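
One common way to resolve this kind of move-only capture error is sketched below. It assumes the lambda ends up in a `std::function`, which requires a copyable callable. The `SpanGuard` here is a hypothetical stand-in, not the real class, and `makeJob` is an invented helper.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a move-only RAII span handle.
class SpanGuard
{
public:
    explicit SpanGuard(std::string name) : name_(std::move(name))
    {
    }
    SpanGuard(SpanGuard&&) = default;
    SpanGuard& operator=(SpanGuard&&) = default;
    SpanGuard(SpanGuard const&) = delete;
    SpanGuard& operator=(SpanGuard const&) = delete;

    std::string const&
    name() const
    {
        return name_;
    }

private:
    std::string name_;
};

// Capturing the guard by value would make the lambda move-only, which
// std::function rejects. Moving it into a shared_ptr keeps the lambda
// copyable; members are then reached with `->` on the shared_ptr.
std::function<std::string()>
makeJob(SpanGuard guard)
{
    auto consSpan = std::make_shared<SpanGuard>(std::move(guard));
    return [consSpan]() { return consSpan->name(); };
}
```

The span lives until the last copy of the lambda is destroyed, which is usually the intent when the work is handed to another thread.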
Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp
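
For reference, the nested-namespace concatenation that clang-tidy asks for looks roughly like this. The namespace path, the constant, and `spanName` are placeholders invented for the example, not the contents of LedgerSpanNames.h or PeerSpanNames.h.

```cpp
#include <string>
#include <string_view>

// Before (pre-C++17 style):
//   namespace example { namespace telemetry { namespace seg { ... } } }
// After (C++17 nested namespace definition):
namespace example::telemetry::seg {

// Placeholder segment name, for illustration only.
inline constexpr std::string_view ledger = "ledger";

// Joins a domain segment and an operation into a dotted span name.
inline std::string
spanName(std::string_view domain, std::string_view op)
{
    std::string out(domain);
    out += '.';
    out += op;
    return out;
}

}  // namespace example::telemetry::seg
```

A file that uses such symbols directly should include the header that declares them rather than rely on a transitive include, which is what the missing-include fixes address.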
Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
09-data-collection-reference.md and telemetry-runbook.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prettier formatting for markdown docs and OTelCollector header
- docs.sh rippled→xrpld renames in OTelCollector.cpp comments/strings
- Updated levelization ordering with new dependency edges
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The StatsD receiver config was lost during a branch rebase (--ours
conflict resolution dropped it). Re-add the statsd receiver to the
OTel Collector config and wire it into the metrics pipeline so
beast::insight UDP metrics flow to Prometheus.
Also fixes:
- Metric prefix mismatch: docs used xrpld_ but dashboards/tests use
rippled_; align all documentation to match the runnable stack
- Remove phantom Peer_Disconnects_Charges from docs (plain atomic,
not a beast::insight gauge)
- Remove premature .codecov.yml exclusions for Phase 7 OTelCollector
files that don't exist on this branch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing xrpl.consensus.quorum attribute to consensus.accept in runbook
- Fix dashboard legend formats: add exported_instance, use Title Case
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>