Dashboard template variables reference datasource uid "prometheus" but
the provisioning config had no uid set, causing Grafana to auto-assign
a random one and break dashboard panel queries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add resource_metrics_key_attributes to spanmetrics connector so
service.instance.id becomes a Prometheus label for per-node filtering
- Add 'node' dropdown (service_instance_id) to all 3 dashboards
- Add 'command' dropdown (xrpl_rpc_command) to RPC Performance
- Add 'tx_origin' dropdown (xrpl_tx_local) to Transaction Overview
- Add 'consensus_mode' dropdown (xrpl_consensus_mode) to Consensus Health
- Update all panel PromQL queries to include $node filter
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Ledger Apply Duration and Close Time Agreement panels to
consensus-health dashboard
- Add consensus.accept.apply to telemetry runbook with TraceQL
queries for close time disagreements and consensus failures
- Add span to TESTING.md expected span catalog and verification loop
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full consensus tracing with deterministic trace ID correlation
and establish-phase instrumentation:
- Deterministic trace_id from previousLedger.id() for cross-node
correlation (switchable via consensus_trace_strategy config)
- Round-to-round span links (follows-from) for causal chaining
- Establish phase spans with convergence tracking, dispute resolution
events, and threshold escalation attributes
- Validation spans with links to round spans (thread-safe via
roundSpanContext_ snapshot for jtACCEPT cross-thread access)
- Mode change spans for proposing/observing transitions
- New startSpan overload with span links in Telemetry interface
- XRPL_TRACE_ADD_EVENT macro with do-while(0) safety wrapper
- Config validation for consensus_trace_strategy
- Test adaptor (csf::Peer) updated with getTelemetry() stub
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a new trace span in doAccept() capturing ledger close time details:
- xrpl.consensus.close_time: agreed-upon close time (epoch seconds)
- xrpl.consensus.close_time_correct: whether validators converged
(per avCT_CONSENSUS_PCT = 75% threshold)
- xrpl.consensus.close_resolution_ms: time rounding granularity
- xrpl.consensus.state: "finished" or "moved_on" (consensus failure)
- xrpl.consensus.proposing: whether this node was proposing
Update Tempo datasource with close time filters, plan docs with
new span inventory, and add test coverage for the attribute pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add xrpl.tx.hash (static), xrpl.tx.local and xrpl.tx.status
(dynamic) search filters for Phase 3 transaction span attributes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add xrpl.rpc.command (static), xrpl.rpc.status and xrpl.rpc.role
(dynamic) search filters for Phase 2 RPC span attributes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tracing instrumentation to the RPC request handling layer:
TracingInstrumentation.h:
- Convenience macros (XRPL_TRACE_RPC, XRPL_TRACE_TX, etc.) that
create RAII SpanGuard objects when telemetry is enabled
- XRPL_TRACE_SET_ATTR / XRPL_TRACE_EXCEPTION for span enrichment
- Zero-overhead no-ops when XRPL_ENABLE_TELEMETRY is not defined
RPCHandler.cpp:
- Trace each RPC command with span name "rpc.command.<method>"
- Record command name, API version, role, and status as attributes
- Capture exceptions on the span for error visibility
ServerHandler.cpp:
- Trace HTTP requests ("rpc.request"), WebSocket messages
("rpc.ws_message"), and processRequest ("rpc.process")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only override serviceInstanceId with the node public key when the user
hasn't explicitly set service_instance_id in the [telemetry] section.
This allows operators to assign human-friendly names (e.g. "validator-1")
while still defaulting to the node's base58 public key.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Telemetry object is constructed in ApplicationImp's member initializer
list where nodeIdentity_ is not yet available, resulting in an empty
service.instance.id resource attribute. Add setServiceInstanceId() virtual
method that Application::setup() calls after nodeIdentity_ is known but
before telemetry_->start() creates the OTel resource.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add resource-level search filters for node identity:
- service.instance.id (node public key) — unique node identifier
- service.version (rippled build version)
- xrpl.network.id (numeric network ID)
- xrpl.network.type (mainnet/testnet/devnet/standalone)
These enable filtering traces by specific nodes in multi-node
deployments and by network in mixed environments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Configure Grafana Tempo datasource with pre-built search filters
(service.name, span name, status, duration) for the Explore UI.
Enable Tempo metrics_generator with service-graphs and span-metrics
processors to power Grafana's service map visualization.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Tempo 2.7.2 service to docker-compose with local storage
- Add otlp/tempo exporter to OTel Collector traces pipeline
- Add Tempo Grafana datasource provisioning with node graph
- Update 05-configuration-reference.md examples with Tempo
- OTel Collector fans traces to both Jaeger and Tempo simultaneously
Jaeger provides a standalone UI at :16686 for quick lookups.
Tempo is queryable via Grafana Explore using TraceQL and is the
recommended backend for production (supports S3/GCS storage).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The refs as previously used pointed to the source branch, not the target branch. However, determining the target branch is different depending on the GitHub event. The pull request logic was incorrect and needed to be fixed, and the logic inside the workflow could be simplified. Both modifications have been made in this commit.
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>