rippled Telemetry Operator Runbook
Overview
rippled supports OpenTelemetry distributed tracing to provide visibility into RPC requests, transaction processing, and consensus rounds.
Quick Start
1. Start the observability stack
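A typical invocation, assuming the stack's compose file lives under docker/telemetry/ (the same directory that holds the provisioned Grafana dashboards; the exact filename is an assumption):

```shell
# Start the collector and backends in the background
docker compose -f docker/telemetry/docker-compose.yml up -d
```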
This starts the components referenced throughout this runbook: the OTel Collector (trace pipeline plus the spanmetrics and statsd receivers), Jaeger, Tempo, Prometheus, and Grafana.
2. Enable telemetry in rippled
Add to your xrpld.cfg:
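A minimal [telemetry] stanza, using options from the Configuration Reference below (all values shown are the documented defaults except enabled):

```ini
[telemetry]
enabled=1
endpoint=http://localhost:4318/v1/traces
exporter=otlp_http
sampling_ratio=1.0
```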
3. Build with telemetry support
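The exact build flag is not captured in this copy of the runbook; as a sketch only, assuming a hypothetical CMake option named telemetry:

```shell
# Hypothetical option name; substitute the project's actual telemetry build flag
cmake -B build -Dtelemetry=ON
cmake --build build --target rippled
```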
Configuration Reference
| Option | Default | Description |
|--------|---------|-------------|
| enabled | 0 | Master switch for telemetry |
| endpoint | http://localhost:4318/v1/traces | OTLP/HTTP endpoint |
| exporter | otlp_http | Exporter type |
| sampling_ratio | 1.0 | Head-based sampling ratio (0.0–1.0) |
| trace_rpc | 1 | Enable RPC request tracing |
| trace_transactions | 1 | Enable transaction tracing |
| trace_consensus | 1 | Enable consensus tracing |
| trace_peer | 0 | Enable peer message tracing (high volume) |
| trace_ledger | 1 | Enable ledger tracing |
| batch_size | 512 | Max spans per batch export |
| batch_delay_ms | 5000 | Delay between batch exports (ms) |
| max_queue_size | 2048 | Max spans queued before dropping |
| use_tls | 0 | Use TLS for the exporter connection |
| tls_ca_cert | (empty) | Path to CA certificate bundle |
Span Reference
All spans instrumented in rippled, grouped by subsystem:
RPC Spans (Phase 2)
| Span Name | Source File | Attributes | Description |
|-----------|-------------|------------|-------------|
| rpc.request | ServerHandler.cpp:271 | — | Top-level HTTP RPC request |
| rpc.process | ServerHandler.cpp:573 | — | RPC processing (child of rpc.request) |
| rpc.ws_message | ServerHandler.cpp:384 | — | WebSocket RPC message |
| rpc.command.<name> | RPCHandler.cpp:161 | xrpl.rpc.command, xrpl.rpc.version, xrpl.rpc.role, xrpl.rpc.status, xrpl.rpc.duration_ms, xrpl.rpc.error_message | Per-command span (e.g., rpc.command.server_info) |
Transaction Spans (Phase 3)
| Span Name | Source File | Attributes | Description |
|-----------|-------------|------------|-------------|
| tx.process | NetworkOPs.cpp:1227 | xrpl.tx.hash, xrpl.tx.local, xrpl.tx.path | Transaction submission and processing |
| tx.receive | PeerImp.cpp:1273 | xrpl.peer.id, xrpl.tx.hash, xrpl.tx.suppressed, xrpl.tx.status | Transaction received from peer relay |
| tx.apply | BuildLedger.cpp:88 | xrpl.ledger.seq, xrpl.ledger.tx_count, xrpl.ledger.tx_failed | Transaction set applied per ledger |
Consensus Spans (Phase 4)
| Span Name | Source File | Attributes | Description |
|-----------|-------------|------------|-------------|
| consensus.proposal.send | RCLConsensus.cpp:177 | xrpl.consensus.round | Consensus proposal broadcast |
| consensus.ledger_close | RCLConsensus.cpp:282 | xrpl.consensus.ledger.seq, xrpl.consensus.mode | Ledger close event |
| consensus.accept | RCLConsensus.cpp:395 | xrpl.consensus.proposers, xrpl.consensus.round_time_ms | Ledger accepted by consensus |
| consensus.validation.send | RCLConsensus.cpp:753 | xrpl.consensus.ledger.seq, xrpl.consensus.proposing | Validation sent after accept |
| consensus.accept.apply | RCLConsensus.cpp:453 | xrpl.consensus.{close_time, close_time_correct, close_resolution_ms, state, proposing, round_time_ms, ledger.seq} | Ledger application with close time details |
Close Time Queries (Tempo TraceQL)
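For example, the close-time attributes on consensus.accept.apply can be queried directly in Tempo. A sketch, assuming the attribute names carry the xrpl.consensus. prefix shown in the table above:

```traceql
{ name = "consensus.accept.apply" && span.xrpl.consensus.close_time_correct = false }
```

A related query surfaces unusually slow rounds by span duration, e.g. `{ name = "consensus.accept" && duration > 5s }` (threshold illustrative).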
Ledger Spans (Phase 5)
| Span Name | Source File | Attributes | Description |
|-----------|-------------|------------|-------------|
| ledger.build | BuildLedger.cpp:31 | xrpl.ledger.seq, xrpl.ledger.tx_count, xrpl.ledger.tx_failed | Ledger build during consensus |
| ledger.validate | LedgerMaster.cpp:915 | xrpl.ledger.seq, xrpl.ledger.validations | Ledger promoted to validated |
| ledger.store | LedgerMaster.cpp:409 | xrpl.ledger.seq | Ledger stored in history |
Peer Spans (Phase 5)
| Span Name | Source File | Attributes | Description |
|-----------|-------------|------------|-------------|
| peer.proposal.receive | PeerImp.cpp:1667 | xrpl.peer.id, xrpl.peer.proposal.trusted | Proposal received from peer |
| peer.validation.receive | PeerImp.cpp:2264 | xrpl.peer.id, xrpl.peer.validation.trusted | Validation received from peer |
Prometheus Metrics (Spanmetrics)
The OTel Collector's spanmetrics connector automatically derives RED (Rate, Errors, Duration) metrics from every span. No custom metrics code is needed in rippled.
Generated Metric Names
| Prometheus Metric | Type | Description |
|-------------------|------|-------------|
| traces_span_metrics_calls_total | Counter | Total span invocations |
| traces_span_metrics_duration_milliseconds_bucket | Histogram | Latency distribution buckets |
| traces_span_metrics_duration_milliseconds_count | Histogram | Latency observation count |
| traces_span_metrics_duration_milliseconds_sum | Histogram | Cumulative latency |
Metric Labels
Every metric carries these standard labels:
| Label | Source | Example |
|-------|--------|---------|
| span_name | Span name | rpc.command.server_info |
| status_code | Span status | STATUS_CODE_UNSET, STATUS_CODE_ERROR |
| service_name | Resource attribute | rippled |
| span_kind | Span kind | SPAN_KIND_INTERNAL |
Additionally, span attributes configured as dimensions in the collector become metric labels (dots → underscores):
| Span Attribute | Metric Label | Applies To |
|----------------|--------------|------------|
| xrpl.rpc.command | xrpl_rpc_command | rpc.command.* spans |
| xrpl.rpc.status | xrpl_rpc_status | rpc.command.* spans |
| xrpl.consensus.mode | xrpl_consensus_mode | consensus.ledger_close spans |
| xrpl.tx.local | xrpl_tx_local | tx.process spans |
| xrpl.peer.proposal.trusted | xrpl_peer_proposal_trusted | peer.proposal.receive spans |
| xrpl.peer.validation.trusted | xrpl_peer_validation_trusted | peer.validation.receive spans |
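These labels make the "error spans / total spans" computation used by the dashboards expressible directly in PromQL; a sketch of the per-command error percentage:

```promql
100 *
  sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{span_name=~"rpc.command.*", status_code="STATUS_CODE_ERROR"}[5m]))
/
  sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{span_name=~"rpc.command.*"}[5m]))
```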
Histogram Buckets
Configured in otel-collector-config.yaml:
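A sketch of the relevant connector block, following the OTel Collector spanmetrics configuration schema; the bucket boundaries and dimension list here are illustrative, and the actual values ship with the stack's otel-collector-config.yaml:

```yaml
connectors:
  spanmetrics:
    histogram:
      explicit:
        # Illustrative boundaries; tune to expected span latencies
        buckets: [2ms, 6ms, 10ms, 50ms, 100ms, 250ms, 500ms, 1s, 5s]
    # Span attributes promoted to metric labels (dots become underscores)
    dimensions:
      - name: xrpl.rpc.command
      - name: xrpl.rpc.status
      - name: xrpl.consensus.mode
```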
StatsD Metrics (beast::insight)
rippled has a built-in metrics framework (beast::insight) that emits StatsD-format metrics over UDP. These complement the span-derived RED metrics by providing system-level gauges, counters, and timers that don't map to individual trace spans.
Configuration
Add to xrpld.cfg:
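beast::insight is configured through the [insight] stanza; a sketch pointed at the collector's statsd receiver on UDP 8125, with the rippled prefix that produces the rippled_-prefixed metric names below:

```ini
[insight]
server=statsd
address=127.0.0.1:8125
prefix=rippled
```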
The OTel Collector receives these via a statsd receiver on UDP port 8125 and exports them to Prometheus alongside spanmetrics.
Metric Reference
Gauges
| Prometheus Metric | Source | Description |
|-------------------|--------|-------------|
| rippled_LedgerMaster_Validated_Ledger_Age | LedgerMaster.h:373 | Age of validated ledger (seconds) |
| rippled_LedgerMaster_Published_Ledger_Age | LedgerMaster.h:374 | Age of published ledger (seconds) |
| rippled_State_Accounting_{Mode}_duration | NetworkOPs.cpp:774 | Time in each operating mode (Disconnected/Connected/Syncing/Tracking/Full) |
| rippled_State_Accounting_{Mode}_transitions | NetworkOPs.cpp:780 | Transition count per mode |
| rippled_Peer_Finder_Active_Inbound_Peers | PeerfinderManager.cpp:214 | Active inbound peer connections |
| rippled_Peer_Finder_Active_Outbound_Peers | PeerfinderManager.cpp:215 | Active outbound peer connections |
| rippled_Overlay_Peer_Disconnects | OverlayImpl.h:557 | Peer disconnect count |
| rippled_job_count | JobQueue.cpp:26 | Current job queue depth |
| rippled_{category}_Bytes_In/Out | OverlayImpl.h:535 | Overlay traffic bytes per category (57 categories) |
| rippled_{category}_Messages_In/Out | OverlayImpl.h:535 | Overlay traffic messages per category |
Counters
| Prometheus Metric | Source | Description |
|-------------------|--------|-------------|
| rippled_rpc_requests | ServerHandler.cpp:108 | Total RPC request count |
| rippled_ledger_fetches | InboundLedgers.cpp:44 | Ledger fetch request count |
| rippled_ledger_history_mismatch | LedgerHistory.cpp:16 | Ledger hash mismatch count |
| rippled_warn | Logic.h:33 | Resource manager warning count |
| rippled_drop | Logic.h:34 | Resource manager drop count |
Histograms (from StatsD timers)
| Prometheus Metric | Source | Description |
|-------------------|--------|-------------|
| rippled_rpc_time | ServerHandler.cpp:110 | RPC response time (ms) |
| rippled_rpc_size | ServerHandler.cpp:109 | RPC response size (bytes) |
| rippled_ios_latency | Application.cpp:438 | I/O service loop latency (ms) |
| rippled_pathfind_fast | PathRequests.h:23 | Fast pathfinding duration (ms) |
| rippled_pathfind_full | PathRequests.h:24 | Full pathfinding duration (ms) |
Grafana Dashboards
Eight dashboards are pre-provisioned in docker/telemetry/grafana/dashboards/:
RPC Performance (rippled-rpc-perf)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| RPC Request Rate by Command | timeseries | sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{span_name=~"rpc.command.*"}[5m])) | xrpl_rpc_command |
| RPC Latency p95 by Command | timeseries | histogram_quantile(0.95, sum by (le, xrpl_rpc_command) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name=~"rpc.command.*"}[5m]))) | xrpl_rpc_command |
| RPC Error Rate | bargauge | Error spans / total spans × 100, grouped by xrpl_rpc_command | xrpl_rpc_command, status_code |
| RPC Latency Heatmap | heatmap | sum(increase(traces_span_metrics_duration_milliseconds_bucket{span_name=~"rpc.command.*"}[5m])) by (le) | le (bucket boundaries) |
| Overall RPC Throughput | timeseries | rpc.request + rpc.process rate | — |
| RPC Success vs Error | timeseries | by status_code (UNSET vs ERROR) | status_code |
| Top Commands by Volume | bargauge | topk(10, ...) by xrpl_rpc_command | xrpl_rpc_command |
| WebSocket Message Rate | stat | rpc.ws_message rate | — |
Transaction Overview (rippled-transactions)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| Transaction Processing Rate | timeseries | rate(traces_span_metrics_calls_total{span_name="tx.process"}[5m]) and tx.receive | span_name |
| Transaction Processing Latency | timeseries | histogram_quantile(0.95 / 0.50, ... {span_name="tx.process"}) | — |
| Transaction Path Distribution | piechart | sum by (xrpl_tx_local) (rate(traces_span_metrics_calls_total{span_name="tx.process"}[5m])) | xrpl_tx_local |
| Transaction Receive vs Suppressed | timeseries | rate(traces_span_metrics_calls_total{span_name="tx.receive"}[5m]) | — |
| TX Processing Duration Heatmap | heatmap | tx.process histogram buckets | le |
| TX Apply Duration per Ledger | timeseries | p95/p50 of tx.apply | — |
| Peer TX Receive Rate | timeseries | tx.receive rate | — |
| TX Apply Failed Rate | stat | tx.apply with STATUS_CODE_ERROR | status_code |
Consensus Health (rippled-consensus)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| Consensus Round Duration | timeseries | histogram_quantile(0.95 / 0.50, ... {span_name="consensus.accept"}) | — |
| Consensus Proposals Sent Rate | timeseries | rate(traces_span_metrics_calls_total{span_name="consensus.proposal.send"}[5m]) | — |
| Ledger Close Duration | timeseries | histogram_quantile(0.95, ... {span_name="consensus.ledger_close"}) | — |
| Validation Send Rate | stat | rate(traces_span_metrics_calls_total{span_name="consensus.validation.send"}[5m]) | — |
| Ledger Apply Duration | timeseries | histogram_quantile(0.95 / 0.50, ... {span_name="consensus.accept.apply"}) | — |
| Close Time Agreement | timeseries | rate(traces_span_metrics_calls_total{span_name="consensus.accept.apply"}[5m]) | — |
| Consensus Mode Over Time | timeseries | consensus.ledger_close by xrpl_consensus_mode | xrpl_consensus_mode |
| Accept vs Close Rate | timeseries | consensus.accept vs consensus.ledger_close rate | — |
| Validation vs Close Rate | timeseries | consensus.validation.send vs consensus.ledger_close | — |
| Accept Duration Heatmap | heatmap | consensus.accept histogram buckets | le |
Ledger Operations (rippled-ledger-ops)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| Ledger Build Rate | stat | ledger.build call rate | — |
| Ledger Build Duration | timeseries | p95/p50 of ledger.build | — |
| Ledger Validation Rate | stat | ledger.validate call rate | — |
| Build Duration Heatmap | heatmap | ledger.build histogram buckets | le |
| TX Apply Duration | timeseries | p95/p50 of tx.apply | — |
| TX Apply Rate | timeseries | tx.apply call rate | — |
| Ledger Store Rate | stat | ledger.store call rate | — |
| Build vs Close Duration | timeseries | p95 ledger.build vs consensus.ledger_close | — |
Peer Network (rippled-peer-net)
Requires trace_peer=1 in the [telemetry] config section.
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| Proposal Receive Rate | timeseries | peer.proposal.receive rate | — |
| Validation Receive Rate | timeseries | peer.validation.receive rate | — |
| Proposals Trusted vs Untrusted | piechart | by xrpl_peer_proposal_trusted | xrpl_peer_proposal_trusted |
| Validations Trusted vs Untrusted | piechart | by xrpl_peer_validation_trusted | xrpl_peer_validation_trusted |
Node Health — StatsD (rippled-statsd-node-health)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| Validated Ledger Age | stat | rippled_LedgerMaster_Validated_Ledger_Age | — |
| Published Ledger Age | stat | rippled_LedgerMaster_Published_Ledger_Age | — |
| Operating Mode Duration | timeseries | rippled_State_Accounting_*_duration | — |
| Operating Mode Transitions | timeseries | rippled_State_Accounting_*_transitions | — |
| I/O Latency | timeseries | histogram_quantile(0.95, rippled_ios_latency_bucket) | — |
| Job Queue Depth | timeseries | rippled_job_count | — |
| Ledger Fetch Rate | stat | rate(rippled_ledger_fetches[5m]) | — |
| Ledger History Mismatches | stat | rate(rippled_ledger_history_mismatch[5m]) | — |
Network Traffic — StatsD (rippled-statsd-network)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| Active Peers | timeseries | rippled_Peer_Finder_Active_*_Peers | — |
| Peer Disconnects | timeseries | rippled_Overlay_Peer_Disconnects | — |
| Total Network Bytes | timeseries | rippled_total_Bytes_In/Out | — |
| Total Network Messages | timeseries | rippled_total_Messages_In/Out | — |
| Transaction Traffic | timeseries | rippled_transactions_Messages_In/Out | — |
| Proposal Traffic | timeseries | rippled_proposals_Messages_In/Out | — |
| Validation Traffic | timeseries | rippled_validations_Messages_In/Out | — |
| Traffic by Category | bargauge | topk(10, rippled_*_Bytes_In) | — |
RPC & Pathfinding — StatsD (rippled-statsd-rpc)
| Panel | Type | PromQL | Labels Used |
|-------|------|--------|-------------|
| RPC Request Rate | stat | rate(rippled_rpc_requests[5m]) | — |
| RPC Response Time | timeseries | histogram_quantile(0.95, rippled_rpc_time_bucket) | — |
| RPC Response Size | timeseries | histogram_quantile(0.95, rippled_rpc_size_bucket) | — |
| RPC Response Time Heatmap | heatmap | rippled_rpc_time_bucket | — |
| Pathfinding Fast Duration | timeseries | histogram_quantile(0.95, rippled_pathfind_fast_bucket) | — |
| Pathfinding Full Duration | timeseries | histogram_quantile(0.95, rippled_pathfind_full_bucket) | — |
| Resource Warnings Rate | stat | rate(rippled_warn[5m]) | — |
| Resource Drops Rate | stat | rate(rippled_drop[5m]) | — |
Span → Metric → Dashboard Summary
| Span Name | Prometheus Metric Filter | Grafana Dashboard |
|-----------|--------------------------|-------------------|
| rpc.request | {span_name="rpc.request"} | RPC Performance (Overall Throughput) |
| rpc.process | {span_name="rpc.process"} | RPC Performance (Overall Throughput) |
| rpc.ws_message | {span_name="rpc.ws_message"} | RPC Performance (WebSocket Rate) |
| rpc.command.* | {span_name=~"rpc.command.*"} | RPC Performance (Rate, Latency, Error, Top) |
| tx.process | {span_name="tx.process"} | Transaction Overview (Rate, Latency, Heatmap) |
| tx.receive | {span_name="tx.receive"} | Transaction Overview (Rate, Receive) |
| tx.apply | {span_name="tx.apply"} | Transaction Overview + Ledger Ops (Apply) |
| consensus.accept | {span_name="consensus.accept"} | Consensus Health (Duration, Rate, Heatmap) |
| consensus.proposal.send | {span_name="consensus.proposal.send"} | Consensus Health (Proposals Rate) |
| consensus.ledger_close | {span_name="consensus.ledger_close"} | Consensus Health (Close, Mode) |
| consensus.validation.send | {span_name="consensus.validation.send"} | Consensus Health (Validation Rate) |
| consensus.accept.apply | {span_name="consensus.accept.apply"} | Consensus Health (Apply Duration, Close Time) |
| ledger.build | {span_name="ledger.build"} | Ledger Ops (Build Rate, Duration, Heatmap) |
| ledger.validate | {span_name="ledger.validate"} | Ledger Ops (Validation Rate) |
| ledger.store | {span_name="ledger.store"} | Ledger Ops (Store Rate) |
| peer.proposal.receive | {span_name="peer.proposal.receive"} | Peer Network (Rate, Trusted/Untrusted) |
| peer.validation.receive | {span_name="peer.validation.receive"} | Peer Network (Rate, Trusted/Untrusted) |
Troubleshooting
No traces appearing in Jaeger
- Check rippled logs for the "Telemetry starting" message
- Verify enabled=1 in the [telemetry] config section
- Test collector connectivity: curl -v http://localhost:4318/v1/traces
- Check collector logs: docker compose logs otel-collector
High memory usage
- Reduce sampling_ratio (e.g., 0.1 for 10% sampling)
- Reduce max_queue_size and batch_size
- Disable high-volume trace categories: trace_peer=0
Collector connection failures
- Verify the endpoint URL matches the collector address
- Check firewall rules for ports 4317/4318
- If using TLS, verify the certificate path set in tls_ca_cert
Performance Tuning
| Scenario | Recommendation |
|----------|----------------|
| Production mainnet | sampling_ratio=0.01, trace_peer=0 |
| Testnet/devnet | sampling_ratio=1.0 (full tracing) |
| Debugging specific issue | sampling_ratio=1.0 temporarily |
| High-throughput node | Increase batch_size=1024, max_queue_size=4096 |
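As a config sketch, the production-mainnet row translates to the following [telemetry] stanza, with other keys left at their documented defaults:

```ini
[telemetry]
enabled=1
sampling_ratio=0.01
trace_peer=0
```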
Disabling Telemetry
Set enabled=0 in config (runtime disable), or build without the telemetry flag.
When telemetry is compiled out, all trace macros expand to no-ops with zero overhead.