Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Pratik Mankawde
2026-04-29 20:24:52 +01:00
1526 changed files with 75760 additions and 25422 deletions

View File

@@ -1,6 +1,6 @@
# Observability Data Collection Reference
> **Audience**: Developers and operators. This is the single source of truth for all telemetry data collected by rippled's observability stack.
> **Audience**: Developers and operators. This is the single source of truth for all telemetry data collected by xrpld's observability stack.
>
> **Related docs**: [docs/telemetry-runbook.md](../docs/telemetry-runbook.md) (operator runbook with alerting and troubleshooting) | [03-implementation-strategy.md](./03-implementation-strategy.md) (code structure and performance optimization) | [04-code-samples.md](./04-code-samples.md) (C++ instrumentation examples)
@@ -8,7 +8,7 @@
```mermaid
graph LR
subgraph rippledNode["rippled Node"]
subgraph xrpldNode["xrpld Node"]
A["Trace Macros<br/>XRPL_TRACE_SPAN<br/>(OTLP/HTTP exporter)"]
B["beast::insight<br/>OTel native metrics<br/>(OTLP/HTTP exporter)"]
C["MetricsRegistry<br/>OTel SDK metrics<br/>(OTLP/HTTP exporter)"]
@@ -43,7 +43,7 @@ graph LR
BP -->|"OTLP/gRPC :4317"| D
SM -->|"span_calls_total<br/>span_duration_ms<br/>(6 dimension labels)"| E
R1 -->|"rippled_* gauges<br/>rippled_* counters<br/>rippled_* histograms"| E
R1 -->|"xrpld_* gauges<br/>xrpld_* counters<br/>xrpld_* histograms"| E
E -->|"Prometheus<br/>data source"| F
D -->|"Tempo<br/>data source"| F
@@ -56,7 +56,7 @@ graph LR
style D fill:#f0ad4e,color:#000,stroke:#c78c2e
style E fill:#f0ad4e,color:#000,stroke:#c78c2e
style F fill:#5bc0de,color:#000,stroke:#3aa8c1
style rippledNode fill:#1a2633,color:#ccc,stroke:#4a90d9
style xrpldNode fill:#1a2633,color:#ccc,stroke:#4a90d9
style collector fill:#1a3320,color:#ccc,stroke:#5cb85c
style backends fill:#332a1a,color:#ccc,stroke:#f0ad4e
style metrics fill:#332a1a,color:#ccc,stroke:#f0ad4e
@@ -93,9 +93,9 @@ Controlled by `trace_rpc=1` in `[telemetry]` config.
| `rpc.ws_message` | — | ServerHandler.cpp | WebSocket message handling |
| `rpc.command.<name>` | `rpc.process` | RPCHandler.cpp | Per-command span (e.g., `rpc.command.server_info`, `rpc.command.ledger`) |
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"rpc.request|rpc.command.*"}`
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"rpc.request|rpc.command.*"}`
**Grafana dashboard**: _RPC Performance_ (`rippled-rpc-perf`)
**Grafana dashboard**: _RPC Performance_ (`xrpld-rpc-perf`)
#### Transaction Spans
@@ -107,9 +107,9 @@ Controlled by `trace_transactions=1` in `[telemetry]` config.
| `tx.receive` | — | PeerImp.cpp | Raw transaction received from peer overlay (before deduplication) |
| `tx.apply` | `ledger.build` | BuildLedger.cpp | Transaction set applied to new ledger during consensus |
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"tx.process|tx.receive"}`
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"tx.process|tx.receive"}`
**Grafana dashboard**: _Transaction Overview_ (`rippled-transactions`)
**Grafana dashboard**: _Transaction Overview_ (`xrpld-transactions`)
#### Consensus Spans
@@ -123,9 +123,9 @@ Controlled by `trace_consensus=1` in `[telemetry]` config.
| `consensus.validation.send` | — | RCLConsensus.cpp | Validation message sent after ledger accepted |
| `consensus.accept.apply` | — | RCLConsensus.cpp | Ledger application with close time details |
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"consensus.*"}`
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"consensus.*"}`
**Grafana dashboard**: _Consensus Health_ (`rippled-consensus`)
**Grafana dashboard**: _Consensus Health_ (`xrpld-consensus`)
#### Ledger Spans
@@ -137,9 +137,9 @@ Controlled by `trace_ledger=1` in `[telemetry]` config.
| `ledger.validate` | — | LedgerMaster.cpp | Ledger promoted to validated status |
| `ledger.store` | — | LedgerMaster.cpp | Ledger stored to database/history |
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"ledger.*"}`
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"ledger.*"}`
**Grafana dashboard**: _Ledger Operations_ (`rippled-ledger-ops`)
**Grafana dashboard**: _Ledger Operations_ (`xrpld-ledger-ops`)
#### Peer Spans
@@ -150,9 +150,9 @@ Controlled by `trace_peer=1` in `[telemetry]` config. **Disabled by default** (h
| `peer.proposal.receive` | — | PeerImp.cpp | Consensus proposal received from peer |
| `peer.validation.receive` | — | PeerImp.cpp | Validation message received from peer |
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"peer.*"}`
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"peer.*"}`
**Grafana dashboard**: _Peer Network_ (`rippled-peer-net`)
**Grafana dashboard**: _Peer Network_ (`xrpld-peer-net`)
---
@@ -237,7 +237,7 @@ Every span can carry key-value attributes that provide context for filtering and
> **See also**: [01-architecture-analysis.md](./01-architecture-analysis.md) §1.8.2 for how span-derived metrics map to operational insights.
The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Errors, Duration) metrics from every span. No custom metrics code in rippled is needed.
The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Errors, Duration) metrics from every span. No custom metrics code in xrpld is needed.
| Prometheus Metric | Type | Description |
| -------------------------------------------------- | --------- | ------------------------------------------------------------------------------ |
@@ -269,7 +269,7 @@ The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Er
>
> **Migration complete**: Phase 7 replaced the StatsD UDP transport with native OTel Metrics SDK export via OTLP/HTTP. The `beast::insight::Collector` interface and all metric names are preserved — only the wire protocol changed. `[insight] server=statsd` remains as a fallback.
These are system-level metrics emitted by rippled's `beast::insight` framework via OTel OTLP/HTTP. They cover operational data that doesn't map to individual trace spans.
These are system-level metrics emitted by xrpld's `beast::insight` framework via OTel OTLP/HTTP. They cover operational data that doesn't map to individual trace spans.
### Configuration
@@ -278,7 +278,7 @@ These are system-level metrics emitted by rippled's `beast::insight` framework v
[insight]
server=otel
endpoint=http://localhost:4318/v1/metrics
prefix=rippled
prefix=xrpld
```
Fallback (StatsD):
@@ -287,56 +287,56 @@ Fallback (StatsD):
[insight]
server=statsd
address=127.0.0.1:8125
prefix=rippled
prefix=xrpld
```
### 2.1 Gauges
| Prometheus Metric | Source File | Description | Typical Range |
| --------------------------------------------------- | --------------------- | ----------------------------------------- | ------------------------------- |
| `rippled_LedgerMaster_Validated_Ledger_Age` | LedgerMaster.h | Seconds since last validated ledger | 010 (healthy), >30 (stale) |
| `rippled_LedgerMaster_Published_Ledger_Age` | LedgerMaster.h | Seconds since last published ledger | 010 (healthy) |
| `rippled_State_Accounting_Disconnected_duration` | NetworkOPs.cpp | Cumulative seconds in Disconnected state | Monotonic |
| `rippled_State_Accounting_Connected_duration` | NetworkOPs.cpp | Cumulative seconds in Connected state | Monotonic |
| `rippled_State_Accounting_Syncing_duration` | NetworkOPs.cpp | Cumulative seconds in Syncing state | Monotonic |
| `rippled_State_Accounting_Tracking_duration` | NetworkOPs.cpp | Cumulative seconds in Tracking state | Monotonic |
| `rippled_State_Accounting_Full_duration` | NetworkOPs.cpp | Cumulative seconds in Full state | Monotonic (should dominate) |
| `rippled_State_Accounting_Disconnected_transitions` | NetworkOPs.cpp | Count of transitions to Disconnected | Low |
| `rippled_State_Accounting_Connected_transitions` | NetworkOPs.cpp | Count of transitions to Connected | Low |
| `rippled_State_Accounting_Syncing_transitions` | NetworkOPs.cpp | Count of transitions to Syncing | Low |
| `rippled_State_Accounting_Tracking_transitions` | NetworkOPs.cpp | Count of transitions to Tracking | Low |
| `rippled_State_Accounting_Full_transitions` | NetworkOPs.cpp | Count of transitions to Full | Low (should be 1 after startup) |
| `rippled_Peer_Finder_Active_Inbound_Peers` | PeerfinderManager.cpp | Active inbound peer connections | 085 |
| `rippled_Peer_Finder_Active_Outbound_Peers` | PeerfinderManager.cpp | Active outbound peer connections | 1021 |
| `rippled_Overlay_Peer_Disconnects` | OverlayImpl.cpp | Cumulative peer disconnection count | Low growth |
| `rippled_Overlay_Peer_Disconnects_Charges` | OverlayImpl.cpp | Disconnects due to resource limit charges | Low growth (subset of above) |
| `rippled_job_count` | JobQueue.cpp | Current job queue depth | 0100 (healthy) |
| Prometheus Metric | Source File | Description | Typical Range |
| ------------------------------------------------- | --------------------- | ----------------------------------------- | ------------------------------- |
| `xrpld_LedgerMaster_Validated_Ledger_Age` | LedgerMaster.h | Seconds since last validated ledger | 010 (healthy), >30 (stale) |
| `xrpld_LedgerMaster_Published_Ledger_Age` | LedgerMaster.h | Seconds since last published ledger | 010 (healthy) |
| `xrpld_State_Accounting_Disconnected_duration` | NetworkOPs.cpp | Cumulative seconds in Disconnected state | Monotonic |
| `xrpld_State_Accounting_Connected_duration` | NetworkOPs.cpp | Cumulative seconds in Connected state | Monotonic |
| `xrpld_State_Accounting_Syncing_duration` | NetworkOPs.cpp | Cumulative seconds in Syncing state | Monotonic |
| `xrpld_State_Accounting_Tracking_duration` | NetworkOPs.cpp | Cumulative seconds in Tracking state | Monotonic |
| `xrpld_State_Accounting_Full_duration` | NetworkOPs.cpp | Cumulative seconds in Full state | Monotonic (should dominate) |
| `xrpld_State_Accounting_Disconnected_transitions` | NetworkOPs.cpp | Count of transitions to Disconnected | Low |
| `xrpld_State_Accounting_Connected_transitions` | NetworkOPs.cpp | Count of transitions to Connected | Low |
| `xrpld_State_Accounting_Syncing_transitions` | NetworkOPs.cpp | Count of transitions to Syncing | Low |
| `xrpld_State_Accounting_Tracking_transitions` | NetworkOPs.cpp | Count of transitions to Tracking | Low |
| `xrpld_State_Accounting_Full_transitions` | NetworkOPs.cpp | Count of transitions to Full | Low (should be 1 after startup) |
| `xrpld_Peer_Finder_Active_Inbound_Peers` | PeerfinderManager.cpp | Active inbound peer connections | 085 |
| `xrpld_Peer_Finder_Active_Outbound_Peers` | PeerfinderManager.cpp | Active outbound peer connections | 1021 |
| `xrpld_Overlay_Peer_Disconnects` | OverlayImpl.cpp | Cumulative peer disconnection count | Low growth |
| `xrpld_Overlay_Peer_Disconnects_Charges` | OverlayImpl.cpp | Disconnects due to resource limit charges | Low growth (subset of above) |
| `xrpld_job_count` | JobQueue.cpp | Current job queue depth | 0100 (healthy) |
**Grafana dashboard**: _Node Health (System Metrics)_ (`rippled-system-node-health`)
**Grafana dashboard**: _Node Health (System Metrics)_ (`xrpld-system-node-health`)
### 2.2 Counters
| Prometheus Metric | Source File | Description |
| --------------------------------- | ------------------ | --------------------------------------------- |
| `rippled_rpc_requests` | ServerHandler.cpp | Total RPC requests received |
| `rippled_ledger_fetches` | InboundLedgers.cpp | Inbound ledger fetch attempts |
| `rippled_ledger_history_mismatch` | LedgerHistory.cpp | Ledger hash mismatches detected |
| `rippled_warn` | Logic.h | Resource manager warnings issued |
| `rippled_drop` | Logic.h | Resource manager drops (connections rejected) |
| Prometheus Metric | Source File | Description |
| ------------------------------- | ------------------ | --------------------------------------------- |
| `xrpld_rpc_requests` | ServerHandler.cpp | Total RPC requests received |
| `xrpld_ledger_fetches` | InboundLedgers.cpp | Inbound ledger fetch attempts |
| `xrpld_ledger_history_mismatch` | LedgerHistory.cpp | Ledger hash mismatches detected |
| `xrpld_warn` | Logic.h | Resource manager warnings issued |
| `xrpld_drop` | Logic.h | Resource manager drops (connections rejected) |
**Note**: With `server=otel`, `rippled_warn` and `rippled_drop` are properly exported as OTel Counter instruments. The previous StatsD `|m` type limitation no longer applies.
**Note**: With `server=otel`, `xrpld_warn` and `xrpld_drop` are properly exported as OTel Counter instruments. The previous StatsD `|m` type limitation no longer applies.
**Grafana dashboard**: _RPC & Pathfinding (System Metrics)_ (`rippled-system-rpc`)
**Grafana dashboard**: _RPC & Pathfinding (System Metrics)_ (`xrpld-system-rpc`)
### 2.3 Histograms (Event timers)
| Prometheus Metric | Source File | Unit | Description |
| ----------------------- | ----------------- | ----- | ------------------------------ |
| `rippled_rpc_time` | ServerHandler.cpp | ms | RPC response time distribution |
| `rippled_rpc_size` | ServerHandler.cpp | bytes | RPC response size distribution |
| `rippled_ios_latency` | Application.cpp | ms | I/O service loop latency |
| `rippled_pathfind_fast` | PathRequests.h | ms | Fast pathfinding duration |
| `rippled_pathfind_full` | PathRequests.h | ms | Full pathfinding duration |
| Prometheus Metric | Source File | Unit | Description |
| --------------------- | ----------------- | ----- | ------------------------------ |
| `xrpld_rpc_time` | ServerHandler.cpp | ms | RPC response time distribution |
| `xrpld_rpc_size` | ServerHandler.cpp | bytes | RPC response size distribution |
| `xrpld_ios_latency` | Application.cpp | ms | I/O service loop latency |
| `xrpld_pathfind_fast` | PathRequests.h | ms | Fast pathfinding duration |
| `xrpld_pathfind_full` | PathRequests.h | ms | Full pathfinding duration |
Quantiles collected: 0th, 50th, 90th, 95th, 99th, 100th percentile.
@@ -346,10 +346,10 @@ Quantiles collected: 0th, 50th, 90th, 95th, 99th, 100th percentile.
For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), four gauges are emitted:
- `rippled_{category}_Bytes_In`
- `rippled_{category}_Bytes_Out`
- `rippled_{category}_Messages_In`
- `rippled_{category}_Messages_Out`
- `xrpld_{category}_Bytes_In`
- `xrpld_{category}_Bytes_Out`
- `xrpld_{category}_Messages_In`
- `xrpld_{category}_Messages_Out`
**Key categories**:
@@ -368,7 +368,7 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
| `ping` / `status` | Keepalive and status |
| `set_get` | Set requests |
**Grafana dashboards**: _Network Traffic_ (`rippled-system-network`), _Overlay Traffic Detail_ (`rippled-system-overlay-detail`), _Ledger Data & Sync_ (`rippled-system-ledger-sync`)
**Grafana dashboards**: _Network Traffic_ (`xrpld-system-network`), _Overlay Traffic Detail_ (`xrpld-system-overlay-detail`), _Ledger Data & Sync_ (`xrpld-system-ledger-sync`)
---
@@ -378,28 +378,28 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
### 3.1 Span-Derived Dashboards (5)
| Dashboard | UID | Data Source | Key Panels |
| -------------------- | ---------------------- | ------------------------ | ---------------------------------------------------------------------------------- |
| RPC Performance | `rippled-rpc-perf` | Prometheus (SpanMetrics) | Request rate by command, p95 latency by command, error rate, heatmap, top commands |
| Transaction Overview | `rippled-transactions` | Prometheus (SpanMetrics) | Processing rate, latency p95/p50, local vs relay split, apply duration, heatmap |
| Consensus Health | `rippled-consensus` | Prometheus (SpanMetrics) | Round duration p95/p50, proposals rate, close duration, mode timeline, heatmap |
| Ledger Operations | `rippled-ledger-ops` | Prometheus (SpanMetrics) | Build rate, build duration, validation rate, store rate, build vs close comparison |
| Peer Network | `rippled-peer-net` | Prometheus (SpanMetrics) | Proposal receive rate, validation receive rate, trusted vs untrusted breakdown |
| Dashboard | UID | Data Source | Key Panels |
| -------------------- | -------------------- | ------------------------ | ---------------------------------------------------------------------------------- |
| RPC Performance | `xrpld-rpc-perf` | Prometheus (SpanMetrics) | Request rate by command, p95 latency by command, error rate, heatmap, top commands |
| Transaction Overview | `xrpld-transactions` | Prometheus (SpanMetrics) | Processing rate, latency p95/p50, local vs relay split, apply duration, heatmap |
| Consensus Health | `xrpld-consensus` | Prometheus (SpanMetrics) | Round duration p95/p50, proposals rate, close duration, mode timeline, heatmap |
| Ledger Operations | `xrpld-ledger-ops` | Prometheus (SpanMetrics) | Build rate, build duration, validation rate, store rate, build vs close comparison |
| Peer Network | `xrpld-peer-net` | Prometheus (SpanMetrics) | Proposal receive rate, validation receive rate, trusted vs untrusted breakdown |
### 3.2 System Metrics Dashboards (5)
| Dashboard | UID | Data Source | Key Panels |
| ---------------------- | ------------------------------- | ----------------- | --------------------------------------------------------------------------------- |
| Node Health | `rippled-system-node-health` | Prometheus (OTLP) | Ledger age, operating mode, I/O latency, job queue, fetch rate |
| Network Traffic | `rippled-system-network` | Prometheus (OTLP) | Active peers, disconnects, bytes in/out, messages in/out, traffic by category |
| RPC & Pathfinding | `rippled-system-rpc` | Prometheus (OTLP) | RPC rate, response time/size, pathfinding duration, resource warnings/drops |
| Overlay Traffic Detail | `rippled-system-overlay-detail` | Prometheus (OTLP) | Squelch, overhead, validator lists, set get/share, have/requested tx, proof paths |
| Ledger Data & Sync | `rippled-system-ledger-sync` | Prometheus (OTLP) | Ledger data exchange, legacy ledger share/get, getobject by type, traffic heatmap |
| Dashboard | UID | Data Source | Key Panels |
| ---------------------- | ----------------------------- | ----------------- | --------------------------------------------------------------------------------- |
| Node Health | `xrpld-system-node-health` | Prometheus (OTLP) | Ledger age, operating mode, I/O latency, job queue, fetch rate |
| Network Traffic | `xrpld-system-network` | Prometheus (OTLP) | Active peers, disconnects, bytes in/out, messages in/out, traffic by category |
| RPC & Pathfinding | `xrpld-system-rpc` | Prometheus (OTLP) | RPC rate, response time/size, pathfinding duration, resource warnings/drops |
| Overlay Traffic Detail | `xrpld-system-overlay-detail` | Prometheus (OTLP) | Squelch, overhead, validator lists, set get/share, have/requested tx, proof paths |
| Ledger Data & Sync | `xrpld-system-ledger-sync` | Prometheus (OTLP) | Ledger data exchange, legacy ledger share/get, getobject by type, traffic heatmap |
### 3.3 Accessing the Dashboards
1. Open Grafana at **http://localhost:3000**
2. Navigate to **Dashboards → rippled** folder
2. Navigate to **Dashboards → xrpld** folder
3. All 10 dashboards are auto-provisioned from `docker/telemetry/grafana/dashboards/`
---
@@ -410,18 +410,18 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
### Finding Traces by Type
| What to Find | Tempo TraceQL Query |
| ------------------------ | -------------------------------------------------------------------------------- |
| All RPC calls | `{resource.service.name="rippled" && name="rpc.request"}` |
| Specific RPC command | `{resource.service.name="rippled" && name="rpc.command.server_info"}` |
| Slow RPC calls | `{resource.service.name="rippled" && name=~"rpc.command.*"} \| duration > 100ms` |
| Failed RPC calls | `{span.xrpl.rpc.status="error"}` |
| Specific transaction | `{span.xrpl.tx.hash="<hex_hash>"}` |
| Local transactions only | `{span.xrpl.tx.local=true}` |
| Consensus rounds | `{resource.service.name="rippled" && name="consensus.accept"}` |
| Rounds by mode | `{span.xrpl.consensus.mode="proposing"}` |
| Specific ledger | `{span.xrpl.ledger.seq=12345}` |
| Peer proposals (trusted) | `{span.xrpl.peer.proposal.trusted=true}` |
| What to Find | Tempo TraceQL Query |
| ------------------------ | ------------------------------------------------------------------------------ |
| All RPC calls | `{resource.service.name="xrpld" && name="rpc.request"}` |
| Specific RPC command | `{resource.service.name="xrpld" && name="rpc.command.server_info"}` |
| Slow RPC calls | `{resource.service.name="xrpld" && name=~"rpc.command.*"} \| duration > 100ms` |
| Failed RPC calls | `{span.xrpl.rpc.status="error"}` |
| Specific transaction | `{span.xrpl.tx.hash="<hex_hash>"}` |
| Local transactions only | `{span.xrpl.tx.local=true}` |
| Consensus rounds | `{resource.service.name="xrpld" && name="consensus.accept"}` |
| Rounds by mode | `{span.xrpl.consensus.mode="proposing"}` |
| Specific ledger | `{span.xrpl.ledger.seq=12345}` |
| Peer proposals (trusted) | `{span.xrpl.peer.proposal.trusted=true}` |
### Trace Structure
@@ -475,19 +475,19 @@ sum by (xrpl_peer_proposal_trusted) (rate(traces_span_metrics_calls_total{span_n
```promql
# Validated ledger age (should be < 10s)
rippled_LedgerMaster_Validated_Ledger_Age
xrpld_LedgerMaster_Validated_Ledger_Age
# Active peer count
rippled_Peer_Finder_Active_Inbound_Peers + rippled_Peer_Finder_Active_Outbound_Peers
xrpld_Peer_Finder_Active_Inbound_Peers + xrpld_Peer_Finder_Active_Outbound_Peers
# RPC response time p95
histogram_quantile(0.95, rippled_rpc_time_bucket)
histogram_quantile(0.95, xrpld_rpc_time_bucket)
# Total network bytes in (rate)
rate(rippled_total_Bytes_In[5m])
rate(xrpld_total_Bytes_In[5m])
# Operating mode (should be "Full" after startup)
rippled_State_Accounting_Full_duration
xrpld_State_Accounting_Full_duration
```
---
@@ -497,7 +497,7 @@ rippled_State_Accounting_Full_duration
> **Plan details**: [06-implementation-phases.md §6.8.1](./06-implementation-phases.md) — motivation, architecture, Mermaid diagrams
> **Task breakdown**: [Phase8_taskList.md](./Phase8_taskList.md) — per-task implementation details
Phase 8 injects OTel trace context into rippled's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field:
Phase 8 injects OTel trace context into xrpld's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field:
### Log Format
@@ -522,7 +522,7 @@ The trace context injection is implemented in `Logs::format()` (`src/libxrpl/bas
### Log Ingestion Pipeline
```
rippled debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki
xrpld debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki
```
The OTel Collector's `filelog` receiver tails `debug.log` files and uses a `regex_parser` operator to extract structured fields:
@@ -551,16 +551,16 @@ Grafana Loki (v2.9.0) serves as the log storage backend. It receives log entries
```logql
# Find all logs for a specific trace
{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01"
{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01"
# Error logs with trace context
{job="rippled"} |= "ERR" |= "trace_id="
{job="xrpld"} |= "ERR" |= "trace_id="
# Logs from a specific partition with trace context
{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
# Count traced log lines over time
count_over_time({job="rippled"} |= "trace_id=" [5m])
count_over_time({job="xrpld"} |= "trace_id=" [5m])
```
---
@@ -571,141 +571,141 @@ count_over_time({job="rippled"} |= "trace_id=" [5m])
> **Plan details**: [06-implementation-phases.md §6.8.2](./06-implementation-phases.md) — motivation, architecture, third-party context
> **Task breakdown**: [Phase9_taskList.md](./Phase9_taskList.md) — per-task implementation details
Phase 9 fills ~50+ metrics that exist inside rippled but currently lack time-series export. Uses a hybrid approach: `beast::insight` extensions for NodeStore I/O, OTel `ObservableGauge` async callbacks for new categories.
Phase 9 fills ~50+ metrics that exist inside xrpld but currently lack time-series export. Uses a hybrid approach: `beast::insight` extensions for NodeStore I/O, OTel `ObservableGauge` async callbacks for new categories.
### New Metric Categories
#### NodeStore I/O (via beast::insight)
| Prometheus Metric | Type | Description |
| ------------------------------------ | ----- | ----------------------------------- |
| `rippled_nodestore_reads_total` | Gauge | Cumulative read operations |
| `rippled_nodestore_reads_hit` | Gauge | Cache-served reads |
| `rippled_nodestore_writes` | Gauge | Cumulative write operations |
| `rippled_nodestore_written_bytes` | Gauge | Cumulative bytes written |
| `rippled_nodestore_read_bytes` | Gauge | Cumulative bytes read |
| `rippled_nodestore_read_duration_us` | Gauge | Cumulative read time (microseconds) |
| `rippled_nodestore_write_load` | Gauge | Current write load score |
| `rippled_nodestore_read_queue` | Gauge | Items in read queue |
| Prometheus Metric | Type | Description |
| ---------------------------------- | ----- | ----------------------------------- |
| `xrpld_nodestore_reads_total` | Gauge | Cumulative read operations |
| `xrpld_nodestore_reads_hit` | Gauge | Cache-served reads |
| `xrpld_nodestore_writes` | Gauge | Cumulative write operations |
| `xrpld_nodestore_written_bytes` | Gauge | Cumulative bytes written |
| `xrpld_nodestore_read_bytes` | Gauge | Cumulative bytes read |
| `xrpld_nodestore_read_duration_us` | Gauge | Cumulative read time (microseconds) |
| `xrpld_nodestore_write_load` | Gauge | Current write load score |
| `xrpld_nodestore_read_queue` | Gauge | Items in read queue |
#### Cache Hit Rates (via OTel MetricsRegistry)
| Prometheus Metric | Type | Description |
| ------------------------------- | ----- | ------------------------------------ |
| `rippled_cache_SLE_hit_rate` | Gauge | SLE cache hit rate (0.0-1.0) |
| `rippled_cache_ledger_hit_rate` | Gauge | Ledger object cache hit rate |
| `rippled_cache_AL_hit_rate` | Gauge | AcceptedLedger cache hit rate |
| `rippled_cache_treenode_size` | Gauge | SHAMap TreeNode cache size (entries) |
| `rippled_cache_fullbelow_size` | Gauge | FullBelow cache size |
| Prometheus Metric | Type | Description |
| ----------------------------- | ----- | ------------------------------------ |
| `xrpld_cache_SLE_hit_rate` | Gauge | SLE cache hit rate (0.0-1.0) |
| `xrpld_cache_ledger_hit_rate` | Gauge | Ledger object cache hit rate |
| `xrpld_cache_AL_hit_rate` | Gauge | AcceptedLedger cache hit rate |
| `xrpld_cache_treenode_size` | Gauge | SHAMap TreeNode cache size (entries) |
| `xrpld_cache_fullbelow_size` | Gauge | FullBelow cache size |
#### Transaction Queue (via OTel MetricsRegistry)
| Prometheus Metric | Type | Description |
| -------------------------------------- | ----- | -------------------------------- |
| `rippled_txq_count` | Gauge | Current transactions in queue |
| `rippled_txq_max_size` | Gauge | Maximum queue capacity |
| `rippled_txq_in_ledger` | Gauge | Transactions in open ledger |
| `rippled_txq_per_ledger` | Gauge | Expected transactions per ledger |
| `rippled_txq_open_ledger_fee_level` | Gauge | Open ledger fee escalation level |
| `rippled_txq_med_fee_level` | Gauge | Median fee level in queue |
| `rippled_txq_reference_fee_level` | Gauge | Reference fee level |
| `rippled_txq_min_processing_fee_level` | Gauge | Minimum fee to get processed |
| Prometheus Metric | Type | Description |
| ------------------------------------ | ----- | -------------------------------- |
| `xrpld_txq_count` | Gauge | Current transactions in queue |
| `xrpld_txq_max_size` | Gauge | Maximum queue capacity |
| `xrpld_txq_in_ledger` | Gauge | Transactions in open ledger |
| `xrpld_txq_per_ledger` | Gauge | Expected transactions per ledger |
| `xrpld_txq_open_ledger_fee_level` | Gauge | Open ledger fee escalation level |
| `xrpld_txq_med_fee_level` | Gauge | Median fee level in queue |
| `xrpld_txq_reference_fee_level` | Gauge | Reference fee level |
| `xrpld_txq_min_processing_fee_level` | Gauge | Minimum fee to get processed |
#### PerfLog Per-RPC Method (via OTel Metrics SDK)
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------- | --------- | ----------------- | --------------------------- |
| `rippled_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `rippled_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed |
| `rippled_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls errored |
| `rippled_rpc_method_duration_us_bucket` | Histogram | `method="<name>"` | Execution time distribution |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------- | --------- | ----------------- | --------------------------- |
| `xrpld_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `xrpld_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed |
| `xrpld_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls errored |
| `xrpld_rpc_method_duration_us_bucket` | Histogram | `method="<name>"` | Execution time distribution |
#### PerfLog Per-Job Type (via OTel Metrics SDK)
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------- | --------- | ------------------- | --------------- |
| `rippled_job_queued_total` | Counter | `job_type="<name>"` | Jobs queued |
| `rippled_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `rippled_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `rippled_job_queued_duration_us_bucket` | Histogram | `job_type="<name>"` | Queue wait time |
| `rippled_job_running_duration_us_bucket` | Histogram | `job_type="<name>"` | Execution time |
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------- | --------- | ------------------- | --------------- |
| `xrpld_job_queued_total` | Counter | `job_type="<name>"` | Jobs queued |
| `xrpld_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `xrpld_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `xrpld_job_queued_duration_us_bucket` | Histogram | `job_type="<name>"` | Queue wait time |
| `xrpld_job_running_duration_us_bucket` | Histogram | `job_type="<name>"` | Execution time |
#### Counted Object Instances (via OTel MetricsRegistry)
| Prometheus Metric | Type | Labels | Description |
| ---------------------- | ----- | --------------- | ------------------------------- |
| `rippled_object_count` | Gauge | `type="<name>"` | Live instances of internal type |
| Prometheus Metric | Type | Labels | Description |
| -------------------- | ----- | --------------- | ------------------------------- |
| `xrpld_object_count` | Gauge | `type="<name>"` | Live instances of internal type |
Tracked types: `Transaction`, `Ledger`, `NodeObject`, `STTx`, `STLedgerEntry`, `InboundLedger`, `Pathfinder`, `PathRequest`, `HashRouterEntry`
#### Fee Escalation & Load Factors (via OTel MetricsRegistry)
| Prometheus Metric | Type | Description |
| ------------------------------------ | ----- | ------------------------------------ |
| `rippled_load_factor` | Gauge | Combined transaction cost multiplier |
| `rippled_load_factor_server` | Gauge | Server + cluster + network load |
| `rippled_load_factor_local` | Gauge | Local server load only |
| `rippled_load_factor_net` | Gauge | Network-wide load estimate |
| `rippled_load_factor_cluster` | Gauge | Cluster peer load |
| `rippled_load_factor_fee_escalation` | Gauge | Open ledger fee escalation |
| `rippled_load_factor_fee_queue` | Gauge | Queue entry fee level |
| Prometheus Metric | Type | Description |
| ---------------------------------- | ----- | ------------------------------------ |
| `xrpld_load_factor` | Gauge | Combined transaction cost multiplier |
| `xrpld_load_factor_server` | Gauge | Server + cluster + network load |
| `xrpld_load_factor_local` | Gauge | Local server load only |
| `xrpld_load_factor_net` | Gauge | Network-wide load estimate |
| `xrpld_load_factor_cluster` | Gauge | Cluster peer load |
| `xrpld_load_factor_fee_escalation` | Gauge | Open ledger fee escalation |
| `xrpld_load_factor_fee_queue` | Gauge | Queue entry fee level |
#### Server Info (via OTel MetricsRegistry)
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------------- | ----- | -------- | -------------------------------------------- |
| `rippled_server_info{metric="server_state"}` | Gauge | `metric` | Operating mode (0=DISCONNECTED .. 4=FULL) |
| `rippled_server_info{metric="uptime"}` | Gauge | `metric` | Seconds since server start |
| `rippled_server_info{metric="peers"}` | Gauge | `metric` | Total connected peers |
| `rippled_server_info{metric="validated_ledger_seq"}` | Gauge | `metric` | Validated ledger sequence number |
| `rippled_server_info{metric="ledger_current_index"}` | Gauge | `metric` | Current open ledger sequence |
| `rippled_server_info{metric="peer_disconnects_resources"}` | Gauge | `metric` | Cumulative resource-related peer disconnects |
| `rippled_server_info{metric="last_close_proposers"}` | Gauge | `metric` | Proposers in last closed round |
| `rippled_server_info{metric="last_close_converge_time_ms"}` | Gauge | `metric` | Last close convergence time (milliseconds) |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------------- | ----- | -------- | -------------------------------------------- |
| `xrpld_server_info{metric="server_state"}` | Gauge | `metric` | Operating mode (0=DISCONNECTED .. 4=FULL) |
| `xrpld_server_info{metric="uptime"}` | Gauge | `metric` | Seconds since server start |
| `xrpld_server_info{metric="peers"}` | Gauge | `metric` | Total connected peers |
| `xrpld_server_info{metric="validated_ledger_seq"}` | Gauge | `metric` | Validated ledger sequence number |
| `xrpld_server_info{metric="ledger_current_index"}` | Gauge | `metric` | Current open ledger sequence |
| `xrpld_server_info{metric="peer_disconnects_resources"}` | Gauge | `metric` | Cumulative resource-related peer disconnects |
| `xrpld_server_info{metric="last_close_proposers"}` | Gauge | `metric` | Proposers in last closed round |
| `xrpld_server_info{metric="last_close_converge_time_ms"}` | Gauge | `metric` | Last close convergence time (milliseconds) |
#### Build Info (via OTel MetricsRegistry)
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------- | ----- | --------- | --------------------------------- |
| `rippled_build_info{version="<ver>"}` | Gauge | `version` | Info-style metric, always value 1 |
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------- | ----- | --------- | --------------------------------- |
| `xrpld_build_info{version="<ver>"}` | Gauge | `version` | Info-style metric, always value 1 |
#### Complete Ledger Ranges (via OTel MetricsRegistry)
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------- | ----- | --------------- | --------------------------- |
| `rippled_complete_ledgers{bound="start",index="<N>"}` | Gauge | `bound`,`index` | Start of contiguous range N |
| `rippled_complete_ledgers{bound="end",index="<N>"}` | Gauge | `bound`,`index` | End of contiguous range N |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ----- | --------------- | --------------------------- |
| `xrpld_complete_ledgers{bound="start",index="<N>"}` | Gauge | `bound`,`index` | Start of contiguous range N |
| `xrpld_complete_ledgers{bound="end",index="<N>"}` | Gauge | `bound`,`index` | End of contiguous range N |
#### Database Metrics (via OTel MetricsRegistry)
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ----- | -------- | --------------------------------- |
| `rippled_db_metrics{metric="db_kb_total"}` | Gauge | `metric` | Total database size (KB) |
| `rippled_db_metrics{metric="db_kb_ledger"}` | Gauge | `metric` | Ledger database size (KB) |
| `rippled_db_metrics{metric="db_kb_transaction"}` | Gauge | `metric` | Transaction database size (KB) |
| `rippled_db_metrics{metric="historical_perminute"}` | Gauge | `metric` | Historical ledger fetches per min |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------- | ----- | -------- | --------------------------------- |
| `xrpld_db_metrics{metric="db_kb_total"}` | Gauge | `metric` | Total database size (KB) |
| `xrpld_db_metrics{metric="db_kb_ledger"}` | Gauge | `metric` | Ledger database size (KB) |
| `xrpld_db_metrics{metric="db_kb_transaction"}` | Gauge | `metric` | Transaction database size (KB) |
| `xrpld_db_metrics{metric="historical_perminute"}` | Gauge | `metric` | Historical ledger fetches per min |
#### Extended Cache Metrics (additions to existing rippled_cache_metrics)
#### Extended Cache Metrics (additions to existing xrpld_cache_metrics)
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------- | ----- | -------- | ------------------------- |
| `rippled_cache_metrics{metric="AL_size"}` | Gauge | `metric` | AcceptedLedger cache size |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------- | ----- | -------- | ------------------------- |
| `xrpld_cache_metrics{metric="AL_size"}` | Gauge | `metric` | AcceptedLedger cache size |
#### Extended NodeStore Metrics (additions to existing rippled_nodestore_state)
#### Extended NodeStore Metrics (additions to existing xrpld_nodestore_state)
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------- | ----- | -------- | ----------------------------------- |
| `rippled_nodestore_state{metric="node_reads_duration_us"}` | Gauge | `metric` | Cumulative read time (microseconds) |
| `rippled_nodestore_state{metric="read_request_bundle"}` | Gauge | `metric` | Read request bundle count |
| `rippled_nodestore_state{metric="read_threads_running"}` | Gauge | `metric` | Active read threads |
| `rippled_nodestore_state{metric="read_threads_total"}` | Gauge | `metric` | Total read threads configured |
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------------------- | ----- | -------- | ----------------------------------- |
| `xrpld_nodestore_state{metric="node_reads_duration_us"}` | Gauge | `metric` | Cumulative read time (microseconds) |
| `xrpld_nodestore_state{metric="read_request_bundle"}` | Gauge | `metric` | Read request bundle count |
| `xrpld_nodestore_state{metric="read_threads_running"}` | Gauge | `metric` | Active read threads |
| `xrpld_nodestore_state{metric="read_threads_total"}` | Gauge | `metric` | Total read threads configured |
### New Grafana Dashboards (Phase 9)
| Dashboard | UID | Data Source | Key Panels |
| ------------------ | -------------------- | ----------- | ----------------------------------------------------------------- |
| Fee Market & TxQ | `rippled-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown, escalation |
| Job Queue Analysis | `rippled-job-queue` | Prometheus | Per-job rates, queue wait times, execution times, queue depth |
| Dashboard | UID | Data Source | Key Panels |
| ------------------ | ------------------ | ----------- | ----------------------------------------------------------------- |
| Fee Market & TxQ | `xrpld-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown, escalation |
| Job Queue Analysis | `xrpld-job-queue` | Prometheus | Per-job rates, queue wait times, execution times, queue depth |
---
@@ -737,7 +737,7 @@ Phase 10 builds a 5-node validator docker-compose harness with RPC load generato
> **Plan details**: [06-implementation-phases.md §6.8.4](./06-implementation-phases.md) — motivation, architecture, consumer gap analysis
> **Task breakdown**: [Phase11_taskList.md](./Phase11_taskList.md) — per-task implementation details
Phase 11 builds a custom OTel Collector receiver (Go) that polls rippled's admin RPCs and exports `xrpl_*` metrics for external consumers. No rippled code changes.
Phase 11 builds a custom OTel Collector receiver (Go) that polls xrpld's admin RPCs and exports `xrpl_*` metrics for external consumers. No xrpld code changes.
### Exported Metrics (via Custom OTel Collector Receiver)
@@ -807,102 +807,102 @@ via OTLP/HTTP to the OTel Collector and scraped by Prometheus.
#### NodeStore I/O (Observable Gauge — `nodestore_state`)
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------ | ----- | -------- | ------------------------------------ |
| `rippled_nodestore_state{metric="node_reads_total"}` | Gauge | `metric` | Cumulative NodeStore read operations |
| `rippled_nodestore_state{metric="node_reads_hit"}` | Gauge | `metric` | Reads served from cache |
| `rippled_nodestore_state{metric="node_writes"}` | Gauge | `metric` | Cumulative write operations |
| `rippled_nodestore_state{metric="node_written_bytes"}` | Gauge | `metric` | Cumulative bytes written |
| `rippled_nodestore_state{metric="node_read_bytes"}` | Gauge | `metric` | Cumulative bytes read |
| `rippled_nodestore_state{metric="write_load"}` | Gauge | `metric` | Current write load score |
| `rippled_nodestore_state{metric="read_queue"}` | Gauge | `metric` | Items in read prefetch queue |
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------- | ----- | -------- | ------------------------------------ |
| `xrpld_nodestore_state{metric="node_reads_total"}` | Gauge | `metric` | Cumulative NodeStore read operations |
| `xrpld_nodestore_state{metric="node_reads_hit"}` | Gauge | `metric` | Reads served from cache |
| `xrpld_nodestore_state{metric="node_writes"}` | Gauge | `metric` | Cumulative write operations |
| `xrpld_nodestore_state{metric="node_written_bytes"}` | Gauge | `metric` | Cumulative bytes written |
| `xrpld_nodestore_state{metric="node_read_bytes"}` | Gauge | `metric` | Cumulative bytes read |
| `xrpld_nodestore_state{metric="write_load"}` | Gauge | `metric` | Current write load score |
| `xrpld_nodestore_state{metric="read_queue"}` | Gauge | `metric` | Items in read prefetch queue |
#### Cache Hit Rates & Sizes (Observable Gauge — `cache_metrics`)
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------- | ----- | -------- | ----------------------------- |
| `rippled_cache_metrics{metric="SLE_hit_rate"}` | Gauge | `metric` | SLE cache hit rate (0.0-1.0) |
| `rippled_cache_metrics{metric="ledger_hit_rate"}` | Gauge | `metric` | Ledger cache hit rate |
| `rippled_cache_metrics{metric="AL_hit_rate"}` | Gauge | `metric` | AcceptedLedger cache hit rate |
| `rippled_cache_metrics{metric="treenode_cache_size"}` | Gauge | `metric` | SHAMap TreeNode cache entries |
| `rippled_cache_metrics{metric="treenode_track_size"}` | Gauge | `metric` | Tracked tree nodes |
| `rippled_cache_metrics{metric="fullbelow_size"}` | Gauge | `metric` | FullBelow cache entries |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ----- | -------- | ----------------------------- |
| `xrpld_cache_metrics{metric="SLE_hit_rate"}` | Gauge | `metric` | SLE cache hit rate (0.0-1.0) |
| `xrpld_cache_metrics{metric="ledger_hit_rate"}` | Gauge | `metric` | Ledger cache hit rate |
| `xrpld_cache_metrics{metric="AL_hit_rate"}` | Gauge | `metric` | AcceptedLedger cache hit rate |
| `xrpld_cache_metrics{metric="treenode_cache_size"}` | Gauge | `metric` | SHAMap TreeNode cache entries |
| `xrpld_cache_metrics{metric="treenode_track_size"}` | Gauge | `metric` | Tracked tree nodes |
| `xrpld_cache_metrics{metric="fullbelow_size"}` | Gauge | `metric` | FullBelow cache entries |
#### Transaction Queue (Observable Gauge — `txq_metrics`)
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------------ | ----- | -------- | -------------------------------- |
| `rippled_txq_metrics{metric="txq_count"}` | Gauge | `metric` | Transactions currently in queue |
| `rippled_txq_metrics{metric="txq_max_size"}` | Gauge | `metric` | Maximum queue capacity |
| `rippled_txq_metrics{metric="txq_in_ledger"}` | Gauge | `metric` | Transactions in open ledger |
| `rippled_txq_metrics{metric="txq_per_ledger"}` | Gauge | `metric` | Expected transactions per ledger |
| `rippled_txq_metrics{metric="txq_reference_fee_level"}` | Gauge | `metric` | Reference fee level |
| `rippled_txq_metrics{metric="txq_min_processing_fee_level"}` | Gauge | `metric` | Minimum fee to get processed |
| `rippled_txq_metrics{metric="txq_med_fee_level"}` | Gauge | `metric` | Median fee level in queue |
| `rippled_txq_metrics{metric="txq_open_ledger_fee_level"}` | Gauge | `metric` | Open ledger fee escalation level |
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------- | ----- | -------- | -------------------------------- |
| `xrpld_txq_metrics{metric="txq_count"}` | Gauge | `metric` | Transactions currently in queue |
| `xrpld_txq_metrics{metric="txq_max_size"}` | Gauge | `metric` | Maximum queue capacity |
| `xrpld_txq_metrics{metric="txq_in_ledger"}` | Gauge | `metric` | Transactions in open ledger |
| `xrpld_txq_metrics{metric="txq_per_ledger"}` | Gauge | `metric` | Expected transactions per ledger |
| `xrpld_txq_metrics{metric="txq_reference_fee_level"}` | Gauge | `metric` | Reference fee level |
| `xrpld_txq_metrics{metric="txq_min_processing_fee_level"}` | Gauge | `metric` | Minimum fee to get processed |
| `xrpld_txq_metrics{metric="txq_med_fee_level"}` | Gauge | `metric` | Median fee level in queue |
| `xrpld_txq_metrics{metric="txq_open_ledger_fee_level"}` | Gauge | `metric` | Open ledger fee escalation level |
#### Per-RPC Method Metrics (Synchronous Counters/Histogram)
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------- | --------- | ----------------- | -------------------------------- |
| `rippled_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `rippled_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed successfully |
| `rippled_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls that errored |
| `rippled_rpc_method_duration_us` | Histogram | `method="<name>"` | Execution time distribution (us) |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------- | --------- | ----------------- | -------------------------------- |
| `xrpld_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `xrpld_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed successfully |
| `xrpld_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls that errored |
| `xrpld_rpc_method_duration_us` | Histogram | `method="<name>"` | Execution time distribution (us) |
#### Per-Job-Type Metrics (Synchronous Counters/Histogram)
| Prometheus Metric | Type | Labels | Description |
| --------------------------------- | --------- | ------------------- | --------------------------------- |
| `rippled_job_queued_total` | Counter | `job_type="<name>"` | Jobs enqueued |
| `rippled_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `rippled_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `rippled_job_queued_duration_us` | Histogram | `job_type="<name>"` | Queue wait time distribution (us) |
| `rippled_job_running_duration_us` | Histogram | `job_type="<name>"` | Execution time distribution (us) |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------- | --------- | ------------------- | --------------------------------- |
| `xrpld_job_queued_total` | Counter | `job_type="<name>"` | Jobs enqueued |
| `xrpld_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `xrpld_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `xrpld_job_queued_duration_us` | Histogram | `job_type="<name>"` | Queue wait time distribution (us) |
| `xrpld_job_running_duration_us` | Histogram | `job_type="<name>"` | Execution time distribution (us) |
#### Counted Object Instances (Observable Gauge — `object_count`)
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------- | ----- | --------------- | ------------------------------ |
| `rippled_object_count{type="Transaction"}` | Gauge | `type="<name>"` | Live Transaction objects |
| `rippled_object_count{type="Ledger"}` | Gauge | `type="<name>"` | Live Ledger objects |
| `rippled_object_count{type="NodeObject"}` | Gauge | `type="<name>"` | Live NodeObject instances |
| `rippled_object_count{type="STTx"}` | Gauge | `type="<name>"` | Serialized transaction objects |
| `rippled_object_count{type="STLedgerEntry"}` | Gauge | `type="<name>"` | Serialized ledger entries |
| `rippled_object_count{type="InboundLedger"}` | Gauge | `type="<name>"` | Ledgers being fetched |
| `rippled_object_count{type="Pathfinder"}` | Gauge | `type="<name>"` | Active pathfinding operations |
| `rippled_object_count{type="PathRequest"}` | Gauge | `type="<name>"` | Active path requests |
| `rippled_object_count{type="HashRouterEntry"}` | Gauge | `type="<name>"` | Hash router entries |
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------- | ----- | --------------- | ------------------------------ |
| `xrpld_object_count{type="Transaction"}` | Gauge | `type="<name>"` | Live Transaction objects |
| `xrpld_object_count{type="Ledger"}` | Gauge | `type="<name>"` | Live Ledger objects |
| `xrpld_object_count{type="NodeObject"}` | Gauge | `type="<name>"` | Live NodeObject instances |
| `xrpld_object_count{type="STTx"}` | Gauge | `type="<name>"` | Serialized transaction objects |
| `xrpld_object_count{type="STLedgerEntry"}` | Gauge | `type="<name>"` | Serialized ledger entries |
| `xrpld_object_count{type="InboundLedger"}` | Gauge | `type="<name>"` | Ledgers being fetched |
| `xrpld_object_count{type="Pathfinder"}` | Gauge | `type="<name>"` | Active pathfinding operations |
| `xrpld_object_count{type="PathRequest"}` | Gauge | `type="<name>"` | Active path requests |
| `xrpld_object_count{type="HashRouterEntry"}` | Gauge | `type="<name>"` | Hash router entries |
#### Load Factor Breakdown (Observable Gauge — `load_factor_metrics`)
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------------------ | ----- | -------- | --------------------------------------- |
| `rippled_load_factor_metrics{metric="load_factor"}` | Gauge | `metric` | Combined transaction cost multiplier |
| `rippled_load_factor_metrics{metric="load_factor_server"}` | Gauge | `metric` | Server + cluster + network contribution |
| `rippled_load_factor_metrics{metric="load_factor_local"}` | Gauge | `metric` | Local server load only |
| `rippled_load_factor_metrics{metric="load_factor_net"}` | Gauge | `metric` | Network-wide load estimate |
| `rippled_load_factor_metrics{metric="load_factor_cluster"}` | Gauge | `metric` | Cluster peer load |
| `rippled_load_factor_metrics{metric="load_factor_fee_escalation"}` | Gauge | `metric` | Open ledger fee escalation |
| `rippled_load_factor_metrics{metric="load_factor_fee_queue"}` | Gauge | `metric` | Queue entry fee level |
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------------- | ----- | -------- | --------------------------------------- |
| `xrpld_load_factor_metrics{metric="load_factor"}` | Gauge | `metric` | Combined transaction cost multiplier |
| `xrpld_load_factor_metrics{metric="load_factor_server"}` | Gauge | `metric` | Server + cluster + network contribution |
| `xrpld_load_factor_metrics{metric="load_factor_local"}` | Gauge | `metric` | Local server load only |
| `xrpld_load_factor_metrics{metric="load_factor_net"}` | Gauge | `metric` | Network-wide load estimate |
| `xrpld_load_factor_metrics{metric="load_factor_cluster"}` | Gauge | `metric` | Cluster peer load |
| `xrpld_load_factor_metrics{metric="load_factor_fee_escalation"}` | Gauge | `metric` | Open ledger fee escalation |
| `xrpld_load_factor_metrics{metric="load_factor_fee_queue"}` | Gauge | `metric` | Queue entry fee level |
#### Prometheus Query Examples (Phase 9)
```promql
# NodeStore cache hit ratio
rippled_nodestore_state{metric="node_reads_hit"} / rippled_nodestore_state{metric="node_reads_total"}
xrpld_nodestore_state{metric="node_reads_hit"} / xrpld_nodestore_state{metric="node_reads_total"}
# RPC error rate for server_info
rate(rippled_rpc_method_errored_total{method="server_info"}[5m])
rate(xrpld_rpc_method_errored_total{method="server_info"}[5m])
# Job queue wait time p95
histogram_quantile(0.95, sum by (le) (rate(rippled_job_queued_duration_us_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(xrpld_job_queued_duration_us_bucket[5m])))
# TxQ utilization percentage
rippled_txq_metrics{metric="txq_count"} / rippled_txq_metrics{metric="txq_max_size"}
xrpld_txq_metrics{metric="txq_count"} / xrpld_txq_metrics{metric="txq_max_size"}
# High load factor alert candidate
rippled_load_factor_metrics{metric="load_factor"} > 5
xrpld_load_factor_metrics{metric="load_factor"} > 5
```
### Phase 7+: External Dashboard Parity Metrics
@@ -911,75 +911,75 @@ rippled_load_factor_metrics{metric="load_factor"} > 5
>
> **Task breakdown**: Phase 7 Tasks 7.9-7.16 (implementation), Phase 9 Tasks 9.11-9.13 (dashboards)
These metrics fill gaps identified by comparing rippled's internal observability with the community external dashboard's 86-metric coverage. All are exported via the OTel Metrics SDK (same `PeriodicMetricReader` as Phase 9 metrics).
These metrics fill gaps identified by comparing xrpld's internal observability with the community external dashboard's 86-metric coverage. All are exported via the OTel Metrics SDK (same `PeriodicMetricReader` as Phase 9 metrics).
#### Validation Agreement (Observable Gauge — `validation_agreement`)
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------- | ------ | -------- | --------------------------------------- |
| `rippled_validation_agreement{metric="agreement_pct_1h"}` | Double | `metric` | Rolling 1h agreement percentage (0-100) |
| `rippled_validation_agreement{metric="agreement_pct_24h"}` | Double | `metric` | Rolling 24h agreement percentage |
| `rippled_validation_agreement{metric="agreements_1h"}` | Int64 | `metric` | Agreed validations in 1h window |
| `rippled_validation_agreement{metric="missed_1h"}` | Int64 | `metric` | Missed validations in 1h window |
| `rippled_validation_agreement{metric="agreements_24h"}` | Int64 | `metric` | Agreed validations in 24h window |
| `rippled_validation_agreement{metric="missed_24h"}` | Int64 | `metric` | Missed validations in 24h window |
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------------------- | ------ | -------- | --------------------------------------- |
| `xrpld_validation_agreement{metric="agreement_pct_1h"}` | Double | `metric` | Rolling 1h agreement percentage (0-100) |
| `xrpld_validation_agreement{metric="agreement_pct_24h"}` | Double | `metric` | Rolling 24h agreement percentage |
| `xrpld_validation_agreement{metric="agreements_1h"}` | Int64 | `metric` | Agreed validations in 1h window |
| `xrpld_validation_agreement{metric="missed_1h"}` | Int64 | `metric` | Missed validations in 1h window |
| `xrpld_validation_agreement{metric="agreements_24h"}` | Int64 | `metric` | Agreed validations in 24h window |
| `xrpld_validation_agreement{metric="missed_24h"}` | Int64 | `metric` | Missed validations in 24h window |
Data source: `ValidationTracker` class with 8s grace period and 5m late repair window.
#### Validator Health (Observable Gauge — `validator_health`)
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------ | ------ | -------- | ------------------------------ |
| `rippled_validator_health{metric="amendment_blocked"}` | Int64 | `metric` | 1 if amendment-blocked, else 0 |
| `rippled_validator_health{metric="unl_blocked"}` | Int64 | `metric` | 1 if UNL-blocked, else 0 |
| `rippled_validator_health{metric="unl_expiry_days"}` | Double | `metric` | Days until UNL list expires |
| `rippled_validator_health{metric="validation_quorum"}` | Int64 | `metric` | Validation quorum threshold |
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------- | ------ | -------- | ------------------------------ |
| `xrpld_validator_health{metric="amendment_blocked"}` | Int64 | `metric` | 1 if amendment-blocked, else 0 |
| `xrpld_validator_health{metric="unl_blocked"}` | Int64 | `metric` | 1 if UNL-blocked, else 0 |
| `xrpld_validator_health{metric="unl_expiry_days"}` | Double | `metric` | Days until UNL list expires |
| `xrpld_validator_health{metric="validation_quorum"}` | Int64 | `metric` | Validation quorum threshold |
#### Peer Quality (Observable Gauge — `peer_quality`)
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------------- | ------ | -------- | ------------------------------------ |
| `rippled_peer_quality{metric="peer_latency_p90_ms"}` | Double | `metric` | P90 peer latency in milliseconds |
| `rippled_peer_quality{metric="peers_insane_count"}` | Int64 | `metric` | Peers with diverged tracking status |
| `rippled_peer_quality{metric="peers_higher_version_pct"}` | Double | `metric` | % of peers on newer rippled version |
| `rippled_peer_quality{metric="upgrade_recommended"}` | Int64 | `metric` | 1 if >60% of peers are newer version |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------- | ------ | -------- | ------------------------------------ |
| `xrpld_peer_quality{metric="peer_latency_p90_ms"}` | Double | `metric` | P90 peer latency in milliseconds |
| `xrpld_peer_quality{metric="peers_insane_count"}` | Int64 | `metric` | Peers with diverged tracking status |
| `xrpld_peer_quality{metric="peers_higher_version_pct"}` | Double | `metric` | % of peers on newer xrpld version |
| `xrpld_peer_quality{metric="upgrade_recommended"}` | Int64 | `metric` | 1 if >60% of peers are newer version |
#### Ledger Economy (Observable Gauge — `ledger_economy`)
| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------- | ------ | -------- | ---------------------------------- |
| `rippled_ledger_economy{metric="base_fee_xrp"}` | Double | `metric` | Base transaction fee in drops |
| `rippled_ledger_economy{metric="reserve_base_xrp"}` | Double | `metric` | Account reserve in drops |
| `rippled_ledger_economy{metric="reserve_inc_xrp"}` | Double | `metric` | Owner reserve increment in drops |
| `rippled_ledger_economy{metric="ledger_age_seconds"}` | Double | `metric` | Seconds since last validated close |
| `rippled_ledger_economy{metric="transaction_rate"}` | Double | `metric` | Smoothed transaction rate (tx/s) |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ------ | -------- | ---------------------------------- |
| `xrpld_ledger_economy{metric="base_fee_xrp"}` | Double | `metric` | Base transaction fee in drops |
| `xrpld_ledger_economy{metric="reserve_base_xrp"}` | Double | `metric` | Account reserve in drops |
| `xrpld_ledger_economy{metric="reserve_inc_xrp"}` | Double | `metric` | Owner reserve increment in drops |
| `xrpld_ledger_economy{metric="ledger_age_seconds"}` | Double | `metric` | Seconds since last validated close |
| `xrpld_ledger_economy{metric="transaction_rate"}` | Double | `metric` | Smoothed transaction rate (tx/s) |
#### State Tracking (Observable Gauge — `state_tracking`)
| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------------- | ------ | -------- | -------------------------------------- |
| `rippled_state_tracking{metric="state_value"}` | Int64 | `metric` | Numeric state 0-6 (see encoding below) |
| `rippled_state_tracking{metric="time_in_current_state_seconds"}` | Double | `metric` | Duration in current state |
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------------------------- | ------ | -------- | -------------------------------------- |
| `xrpld_state_tracking{metric="state_value"}` | Int64 | `metric` | Numeric state 0-6 (see encoding below) |
| `xrpld_state_tracking{metric="time_in_current_state_seconds"}` | Double | `metric` | Duration in current state |
State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full, 5=validating (FULL + validating), 6=proposing (FULL + proposing).
#### Storage Detail (Observable Gauge — `storage_detail`)
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------- | ----- | -------- | ---------------------- |
| `rippled_storage_detail{metric="nudb_bytes"}` | Int64 | `metric` | NuDB backend file size |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------- | ----- | -------- | ---------------------- |
| `xrpld_storage_detail{metric="nudb_bytes"}` | Int64 | `metric` | NuDB backend file size |
#### Synchronous Counters (Phase 7+)
| Prometheus Metric | Type | Description | Increment Site |
| ------------------------------------- | ------- | -------------------------------- | --------------------- |
| `rippled_ledgers_closed_total` | Counter | Ledgers closed by consensus | RCLConsensus.cpp |
| `rippled_validations_sent_total` | Counter | Validations sent | RCLConsensus.cpp |
| `rippled_validations_checked_total` | Counter | Network validations observed | LedgerMaster.cpp |
| `rippled_validation_agreements_total` | Counter | Cumulative validation agreements | ValidationTracker.cpp |
| `rippled_validation_missed_total` | Counter | Cumulative validation misses | ValidationTracker.cpp |
| `rippled_state_changes_total` | Counter | Operating mode transitions | NetworkOPs.cpp |
| `rippled_jq_trans_overflow_total` | Counter | Job queue transaction overflows | JobQueue.cpp |
| Prometheus Metric | Type | Description | Increment Site |
| ----------------------------------- | ------- | -------------------------------- | --------------------- |
| `xrpld_ledgers_closed_total` | Counter | Ledgers closed by consensus | RCLConsensus.cpp |
| `xrpld_validations_sent_total` | Counter | Validations sent | RCLConsensus.cpp |
| `xrpld_validations_checked_total` | Counter | Network validations observed | LedgerMaster.cpp |
| `xrpld_validation_agreements_total` | Counter | Cumulative validation agreements | ValidationTracker.cpp |
| `xrpld_validation_missed_total` | Counter | Cumulative validation misses | ValidationTracker.cpp |
| `xrpld_state_changes_total` | Counter | Operating mode transitions | NetworkOPs.cpp |
| `xrpld_jq_trans_overflow_total` | Counter | Job queue transaction overflows | JobQueue.cpp |
#### Span Attribute Enrichments (Phases 2-4)
@@ -997,29 +997,29 @@ State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full
### New Grafana Dashboards (Phase 9)
| Dashboard | UID | Data Source | Key Panels |
| ---------------------- | -------------------------- | ----------- | --------------------------------------------------------- |
| Fee Market & TxQ | `rippled-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown |
| Job Queue Analysis | `rippled-job-queue` | Prometheus | Per-job rates, queue wait times, execution times |
| RPC Performance (OTel) | `rippled-rpc-perf` | Prometheus | Per-method call rates, error rates, latency distributions |
| Validator Health | `rippled-validator-health` | Prometheus | Agreement %, validation rate, amendment/UNL, state |
| Peer Quality | `rippled-peer-quality` | Prometheus | P90 latency, insane peers, version awareness, disconnects |
| Dashboard | UID | Data Source | Key Panels |
| ---------------------- | ------------------------ | ----------- | --------------------------------------------------------- |
| Fee Market & TxQ | `xrpld-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown |
| Job Queue Analysis | `xrpld-job-queue` | Prometheus | Per-job rates, queue wait times, execution times |
| RPC Performance (OTel) | `xrpld-rpc-perf` | Prometheus | Per-method call rates, error rates, latency distributions |
| Validator Health | `xrpld-validator-health` | Prometheus | Agreement %, validation rate, amendment/UNL, state |
| Peer Quality | `xrpld-peer-quality` | Prometheus | P90 latency, insane peers, version awareness, disconnects |
### Updated Grafana Dashboards (Phase 9)
| Dashboard | UID | New Panels Added |
| -------------------- | ---------------------------- | -------------------------------------------------------------------- |
| Node Health (StatsD) | `rippled-statsd-node-health` | NodeStore I/O, cache hit rates, object instance counts |
| System Node Health | `rippled-system-node-health` | Ledger economy row: base fee, reserves, ledger age, transaction rate |
| Dashboard | UID | New Panels Added |
| -------------------- | -------------------------- | -------------------------------------------------------------------- |
| Node Health (StatsD) | `xrpld-statsd-node-health` | NodeStore I/O, cache hit rates, object instance counts |
| System Node Health | `xrpld-system-node-health` | Ledger economy row: base fee, reserves, ledger age, transaction rate |
### New Grafana Dashboards (Phase 11)
| Dashboard | UID | Data Source | Key Panels |
| ------------------ | ----------------------------- | ----------- | ---------------------------------------------------------------------- |
| Validator Health | `rippled-validator-health` | Prometheus | Server state timeline, proposer count, converge time, amendment voting |
| Network Topology | `rippled-network-topology` | Prometheus | Peer count, version distribution, latency distribution, diverged peers |
| Fee Market (Ext) | `rippled-fee-market-external` | Prometheus | Fee levels, queue depth, load factor breakdown, escalation timeline |
| DEX & AMM Overview | `rippled-dex-amm` | Prometheus | AMM TVL, order book depth, spread trends, trading fee revenue |
| Dashboard | UID | Data Source | Key Panels |
| ------------------ | --------------------------- | ----------- | ---------------------------------------------------------------------- |
| Validator Health | `xrpld-validator-health` | Prometheus | Server state timeline, proposer count, converge time, amendment voting |
| Network Topology | `xrpld-network-topology` | Prometheus | Peer count, version distribution, latency distribution, diverged peers |
| Fee Market (Ext) | `xrpld-fee-market-external` | Prometheus | Fee levels, queue depth, load factor breakdown, escalation timeline |
| DEX & AMM Overview | `xrpld-dex-amm` | Prometheus | AMM TVL, order book depth, spread trends, trading fee revenue |
### Prometheus Alerting Rules (Phase 11)
@@ -1044,8 +1044,8 @@ State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full
| Issue | Impact | Status |
| ------------------------------------------------------------------ | ------------------------------------------------ | -------------------------------------------------------------------- |
| `warn` and `drop` metrics use non-standard StatsD `\|m` meter type | Metrics silently dropped by OTel StatsD receiver | Phase 6 Task 6.1 — needs `\|m``\|c` change in StatsDCollector.cpp |
| `rippled_job_count` may not emit in standalone mode | Missing from Prometheus in some test configs | Requires active job queue activity |
| `rippled_rpc_requests` depends on `[insight]` config | Zero series if StatsD not configured | Requires `[insight] server=statsd` in xrpld.cfg |
| `xrpld_job_count` may not emit in standalone mode | Missing from Prometheus in some test configs | Requires active job queue activity |
| `xrpld_rpc_requests` depends on `[insight]` config | Zero series if StatsD not configured | Requires `[insight] server=statsd` in xrpld.cfg |
| Peer tracing disabled by default | No `peer.*` spans unless `trace_peer=1` | Intentional — high volume on mainnet |
---
@@ -1077,7 +1077,7 @@ enabled=1
[insight]
server=statsd
address=127.0.0.1:8125
prefix=rippled
prefix=xrpld
```
### Production Setup
@@ -1094,7 +1094,7 @@ max_queue_size=4096
[insight]
server=statsd
address=otel-collector:8125
prefix=rippled
prefix=xrpld
```
### Trace Category Toggle