mirror of
https://github.com/XRPLF/rippled.git
synced 2026-06-04 17:27:00 +00:00
Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1,6 +1,6 @@
|
||||
# Observability Data Collection Reference
|
||||
|
||||
> **Audience**: Developers and operators. This is the single source of truth for all telemetry data collected by rippled's observability stack.
|
||||
> **Audience**: Developers and operators. This is the single source of truth for all telemetry data collected by xrpld's observability stack.
|
||||
>
|
||||
> **Related docs**: [docs/telemetry-runbook.md](../docs/telemetry-runbook.md) (operator runbook with alerting and troubleshooting) | [03-implementation-strategy.md](./03-implementation-strategy.md) (code structure and performance optimization) | [04-code-samples.md](./04-code-samples.md) (C++ instrumentation examples)
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph rippledNode["rippled Node"]
|
||||
subgraph xrpldNode["xrpld Node"]
|
||||
A["Trace Macros<br/>XRPL_TRACE_SPAN<br/>(OTLP/HTTP exporter)"]
|
||||
B["beast::insight<br/>OTel native metrics<br/>(OTLP/HTTP exporter)"]
|
||||
C["MetricsRegistry<br/>OTel SDK metrics<br/>(OTLP/HTTP exporter)"]
|
||||
@@ -43,7 +43,7 @@ graph LR
|
||||
BP -->|"OTLP/gRPC :4317"| D
|
||||
|
||||
SM -->|"span_calls_total<br/>span_duration_ms<br/>(6 dimension labels)"| E
|
||||
R1 -->|"rippled_* gauges<br/>rippled_* counters<br/>rippled_* histograms"| E
|
||||
R1 -->|"xrpld_* gauges<br/>xrpld_* counters<br/>xrpld_* histograms"| E
|
||||
|
||||
E -->|"Prometheus<br/>data source"| F
|
||||
D -->|"Tempo<br/>data source"| F
|
||||
@@ -56,7 +56,7 @@ graph LR
|
||||
style D fill:#f0ad4e,color:#000,stroke:#c78c2e
|
||||
style E fill:#f0ad4e,color:#000,stroke:#c78c2e
|
||||
style F fill:#5bc0de,color:#000,stroke:#3aa8c1
|
||||
style rippledNode fill:#1a2633,color:#ccc,stroke:#4a90d9
|
||||
style xrpldNode fill:#1a2633,color:#ccc,stroke:#4a90d9
|
||||
style collector fill:#1a3320,color:#ccc,stroke:#5cb85c
|
||||
style backends fill:#332a1a,color:#ccc,stroke:#f0ad4e
|
||||
style metrics fill:#332a1a,color:#ccc,stroke:#f0ad4e
|
||||
@@ -93,9 +93,9 @@ Controlled by `trace_rpc=1` in `[telemetry]` config.
|
||||
| `rpc.ws_message` | — | ServerHandler.cpp | WebSocket message handling |
|
||||
| `rpc.command.<name>` | `rpc.process` | RPCHandler.cpp | Per-command span (e.g., `rpc.command.server_info`, `rpc.command.ledger`) |
|
||||
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"rpc.request|rpc.command.*"}`
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"rpc.request|rpc.command.*"}`
|
||||
|
||||
**Grafana dashboard**: _RPC Performance_ (`rippled-rpc-perf`)
|
||||
**Grafana dashboard**: _RPC Performance_ (`xrpld-rpc-perf`)
|
||||
|
||||
#### Transaction Spans
|
||||
|
||||
@@ -107,9 +107,9 @@ Controlled by `trace_transactions=1` in `[telemetry]` config.
|
||||
| `tx.receive` | — | PeerImp.cpp | Raw transaction received from peer overlay (before deduplication) |
|
||||
| `tx.apply` | `ledger.build` | BuildLedger.cpp | Transaction set applied to new ledger during consensus |
|
||||
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"tx.process|tx.receive"}`
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"tx.process|tx.receive"}`
|
||||
|
||||
**Grafana dashboard**: _Transaction Overview_ (`rippled-transactions`)
|
||||
**Grafana dashboard**: _Transaction Overview_ (`xrpld-transactions`)
|
||||
|
||||
#### Consensus Spans
|
||||
|
||||
@@ -123,9 +123,9 @@ Controlled by `trace_consensus=1` in `[telemetry]` config.
|
||||
| `consensus.validation.send` | — | RCLConsensus.cpp | Validation message sent after ledger accepted |
|
||||
| `consensus.accept.apply` | — | RCLConsensus.cpp | Ledger application with close time details |
|
||||
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"consensus.*"}`
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"consensus.*"}`
|
||||
|
||||
**Grafana dashboard**: _Consensus Health_ (`rippled-consensus`)
|
||||
**Grafana dashboard**: _Consensus Health_ (`xrpld-consensus`)
|
||||
|
||||
#### Ledger Spans
|
||||
|
||||
@@ -137,9 +137,9 @@ Controlled by `trace_ledger=1` in `[telemetry]` config.
|
||||
| `ledger.validate` | — | LedgerMaster.cpp | Ledger promoted to validated status |
|
||||
| `ledger.store` | — | LedgerMaster.cpp | Ledger stored to database/history |
|
||||
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"ledger.*"}`
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"ledger.*"}`
|
||||
|
||||
**Grafana dashboard**: _Ledger Operations_ (`rippled-ledger-ops`)
|
||||
**Grafana dashboard**: _Ledger Operations_ (`xrpld-ledger-ops`)
|
||||
|
||||
#### Peer Spans
|
||||
|
||||
@@ -150,9 +150,9 @@ Controlled by `trace_peer=1` in `[telemetry]` config. **Disabled by default** (h
|
||||
| `peer.proposal.receive` | — | PeerImp.cpp | Consensus proposal received from peer |
|
||||
| `peer.validation.receive` | — | PeerImp.cpp | Validation message received from peer |
|
||||
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"peer.*"}`
|
||||
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"peer.*"}`
|
||||
|
||||
**Grafana dashboard**: _Peer Network_ (`rippled-peer-net`)
|
||||
**Grafana dashboard**: _Peer Network_ (`xrpld-peer-net`)
|
||||
|
||||
---
|
||||
|
||||
@@ -237,7 +237,7 @@ Every span can carry key-value attributes that provide context for filtering and
|
||||
|
||||
> **See also**: [01-architecture-analysis.md](./01-architecture-analysis.md) §1.8.2 for how span-derived metrics map to operational insights.
|
||||
|
||||
The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Errors, Duration) metrics from every span. No custom metrics code in rippled is needed.
|
||||
The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Errors, Duration) metrics from every span. No custom metrics code in xrpld is needed.
|
||||
|
||||
| Prometheus Metric | Type | Description |
|
||||
| -------------------------------------------------- | --------- | ------------------------------------------------------------------------------ |
|
||||
@@ -269,7 +269,7 @@ The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Er
|
||||
>
|
||||
> **Migration complete**: Phase 7 replaced the StatsD UDP transport with native OTel Metrics SDK export via OTLP/HTTP. The `beast::insight::Collector` interface and all metric names are preserved — only the wire protocol changed. `[insight] server=statsd` remains as a fallback.
|
||||
|
||||
These are system-level metrics emitted by rippled's `beast::insight` framework via OTel OTLP/HTTP. They cover operational data that doesn't map to individual trace spans.
|
||||
These are system-level metrics emitted by xrpld's `beast::insight` framework via OTel OTLP/HTTP. They cover operational data that doesn't map to individual trace spans.
|
||||
|
||||
### Configuration
|
||||
|
||||
@@ -278,7 +278,7 @@ These are system-level metrics emitted by rippled's `beast::insight` framework v
|
||||
[insight]
|
||||
server=otel
|
||||
endpoint=http://localhost:4318/v1/metrics
|
||||
prefix=rippled
|
||||
prefix=xrpld
|
||||
```
|
||||
|
||||
Fallback (StatsD):
|
||||
@@ -287,56 +287,56 @@ Fallback (StatsD):
|
||||
[insight]
|
||||
server=statsd
|
||||
address=127.0.0.1:8125
|
||||
prefix=rippled
|
||||
prefix=xrpld
|
||||
```
|
||||
|
||||
 ### 2.1 Gauges

-| Prometheus Metric                                   | Source File           | Description                               | Typical Range                   |
-| --------------------------------------------------- | --------------------- | ----------------------------------------- | ------------------------------- |
-| `rippled_LedgerMaster_Validated_Ledger_Age`         | LedgerMaster.h        | Seconds since last validated ledger       | 0–10 (healthy), >30 (stale)     |
-| `rippled_LedgerMaster_Published_Ledger_Age`         | LedgerMaster.h        | Seconds since last published ledger       | 0–10 (healthy)                  |
-| `rippled_State_Accounting_Disconnected_duration`    | NetworkOPs.cpp        | Cumulative seconds in Disconnected state  | Monotonic                       |
-| `rippled_State_Accounting_Connected_duration`       | NetworkOPs.cpp        | Cumulative seconds in Connected state     | Monotonic                       |
-| `rippled_State_Accounting_Syncing_duration`         | NetworkOPs.cpp        | Cumulative seconds in Syncing state       | Monotonic                       |
-| `rippled_State_Accounting_Tracking_duration`        | NetworkOPs.cpp        | Cumulative seconds in Tracking state      | Monotonic                       |
-| `rippled_State_Accounting_Full_duration`            | NetworkOPs.cpp        | Cumulative seconds in Full state          | Monotonic (should dominate)     |
-| `rippled_State_Accounting_Disconnected_transitions` | NetworkOPs.cpp        | Count of transitions to Disconnected      | Low                             |
-| `rippled_State_Accounting_Connected_transitions`    | NetworkOPs.cpp        | Count of transitions to Connected         | Low                             |
-| `rippled_State_Accounting_Syncing_transitions`      | NetworkOPs.cpp        | Count of transitions to Syncing           | Low                             |
-| `rippled_State_Accounting_Tracking_transitions`     | NetworkOPs.cpp        | Count of transitions to Tracking          | Low                             |
-| `rippled_State_Accounting_Full_transitions`         | NetworkOPs.cpp        | Count of transitions to Full              | Low (should be 1 after startup) |
-| `rippled_Peer_Finder_Active_Inbound_Peers`          | PeerfinderManager.cpp | Active inbound peer connections           | 0–85                            |
-| `rippled_Peer_Finder_Active_Outbound_Peers`         | PeerfinderManager.cpp | Active outbound peer connections          | 10–21                           |
-| `rippled_Overlay_Peer_Disconnects`                  | OverlayImpl.cpp       | Cumulative peer disconnection count       | Low growth                      |
-| `rippled_Overlay_Peer_Disconnects_Charges`          | OverlayImpl.cpp       | Disconnects due to resource limit charges | Low growth (subset of above)    |
-| `rippled_job_count`                                 | JobQueue.cpp          | Current job queue depth                   | 0–100 (healthy)                 |
+| Prometheus Metric                                 | Source File           | Description                               | Typical Range                   |
+| ------------------------------------------------- | --------------------- | ----------------------------------------- | ------------------------------- |
+| `xrpld_LedgerMaster_Validated_Ledger_Age`         | LedgerMaster.h        | Seconds since last validated ledger       | 0–10 (healthy), >30 (stale)     |
+| `xrpld_LedgerMaster_Published_Ledger_Age`         | LedgerMaster.h        | Seconds since last published ledger       | 0–10 (healthy)                  |
+| `xrpld_State_Accounting_Disconnected_duration`    | NetworkOPs.cpp        | Cumulative seconds in Disconnected state  | Monotonic                       |
+| `xrpld_State_Accounting_Connected_duration`       | NetworkOPs.cpp        | Cumulative seconds in Connected state     | Monotonic                       |
+| `xrpld_State_Accounting_Syncing_duration`         | NetworkOPs.cpp        | Cumulative seconds in Syncing state       | Monotonic                       |
+| `xrpld_State_Accounting_Tracking_duration`        | NetworkOPs.cpp        | Cumulative seconds in Tracking state      | Monotonic                       |
+| `xrpld_State_Accounting_Full_duration`            | NetworkOPs.cpp        | Cumulative seconds in Full state          | Monotonic (should dominate)     |
+| `xrpld_State_Accounting_Disconnected_transitions` | NetworkOPs.cpp        | Count of transitions to Disconnected      | Low                             |
+| `xrpld_State_Accounting_Connected_transitions`    | NetworkOPs.cpp        | Count of transitions to Connected         | Low                             |
+| `xrpld_State_Accounting_Syncing_transitions`      | NetworkOPs.cpp        | Count of transitions to Syncing           | Low                             |
+| `xrpld_State_Accounting_Tracking_transitions`     | NetworkOPs.cpp        | Count of transitions to Tracking          | Low                             |
+| `xrpld_State_Accounting_Full_transitions`         | NetworkOPs.cpp        | Count of transitions to Full              | Low (should be 1 after startup) |
+| `xrpld_Peer_Finder_Active_Inbound_Peers`          | PeerfinderManager.cpp | Active inbound peer connections           | 0–85                            |
+| `xrpld_Peer_Finder_Active_Outbound_Peers`         | PeerfinderManager.cpp | Active outbound peer connections          | 10–21                           |
+| `xrpld_Overlay_Peer_Disconnects`                  | OverlayImpl.cpp       | Cumulative peer disconnection count       | Low growth                      |
+| `xrpld_Overlay_Peer_Disconnects_Charges`          | OverlayImpl.cpp       | Disconnects due to resource limit charges | Low growth (subset of above)    |
+| `xrpld_job_count`                                 | JobQueue.cpp          | Current job queue depth                   | 0–100 (healthy)                 |

-**Grafana dashboard**: _Node Health (System Metrics)_ (`rippled-system-node-health`)
+**Grafana dashboard**: _Node Health (System Metrics)_ (`xrpld-system-node-health`)
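
The `State_Accounting_*_duration` gauges above are cumulative seconds per operating state, so one way to check that Full "should dominate" is to turn a snapshot into per-state shares. The following is a minimal sketch, not part of the commit: the helper name `state_time_share` and the sample values are hypothetical, and it assumes the five gauges have already been scraped into a plain dictionary.

```python
# Operating states, as named in the State_Accounting gauge table.
STATES = ("Disconnected", "Connected", "Syncing", "Tracking", "Full")


def state_time_share(samples, prefix="xrpld"):
    """Return each state's share of the total accounted time (0.0-1.0).

    `samples` maps metric names to cumulative seconds. Missing gauges
    count as zero. This helper is illustrative only.
    """
    durations = {
        state: samples.get(f"{prefix}_State_Accounting_{state}_duration", 0.0)
        for state in STATES
    }
    total = sum(durations.values())
    if total == 0:
        # No time accounted yet (e.g. right after startup).
        return {state: 0.0 for state in STATES}
    return {state: seconds / total for state, seconds in durations.items()}


# Made-up snapshot values, in seconds.
snapshot = {
    "xrpld_State_Accounting_Disconnected_duration": 2.0,
    "xrpld_State_Accounting_Connected_duration": 8.0,
    "xrpld_State_Accounting_Syncing_duration": 30.0,
    "xrpld_State_Accounting_Tracking_duration": 10.0,
    "xrpld_State_Accounting_Full_duration": 950.0,
}
shares = state_time_share(snapshot)
```

Passing `prefix="rippled"` would read the pre-rename metric names, which may help while dashboards are being migrated.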
|
||||
|
||||
 ### 2.2 Counters

-| Prometheus Metric                 | Source File        | Description                                   |
-| --------------------------------- | ------------------ | --------------------------------------------- |
-| `rippled_rpc_requests`            | ServerHandler.cpp  | Total RPC requests received                   |
-| `rippled_ledger_fetches`          | InboundLedgers.cpp | Inbound ledger fetch attempts                 |
-| `rippled_ledger_history_mismatch` | LedgerHistory.cpp  | Ledger hash mismatches detected               |
-| `rippled_warn`                    | Logic.h            | Resource manager warnings issued              |
-| `rippled_drop`                    | Logic.h            | Resource manager drops (connections rejected) |
+| Prometheus Metric               | Source File        | Description                                   |
+| ------------------------------- | ------------------ | --------------------------------------------- |
+| `xrpld_rpc_requests`            | ServerHandler.cpp  | Total RPC requests received                   |
+| `xrpld_ledger_fetches`          | InboundLedgers.cpp | Inbound ledger fetch attempts                 |
+| `xrpld_ledger_history_mismatch` | LedgerHistory.cpp  | Ledger hash mismatches detected               |
+| `xrpld_warn`                    | Logic.h            | Resource manager warnings issued              |
+| `xrpld_drop`                    | Logic.h            | Resource manager drops (connections rejected) |

-**Note**: With `server=otel`, `rippled_warn` and `rippled_drop` are properly exported as OTel Counter instruments. The previous StatsD `|m` type limitation no longer applies.
+**Note**: With `server=otel`, `xrpld_warn` and `xrpld_drop` are properly exported as OTel Counter instruments. The previous StatsD `|m` type limitation no longer applies.

-**Grafana dashboard**: _RPC & Pathfinding (System Metrics)_ (`rippled-system-rpc`)
+**Grafana dashboard**: _RPC & Pathfinding (System Metrics)_ (`xrpld-system-rpc`)
|
||||
|
||||
 ### 2.3 Histograms (Event timers)

-| Prometheus Metric       | Source File       | Unit  | Description                    |
-| ----------------------- | ----------------- | ----- | ------------------------------ |
-| `rippled_rpc_time`      | ServerHandler.cpp | ms    | RPC response time distribution |
-| `rippled_rpc_size`      | ServerHandler.cpp | bytes | RPC response size distribution |
-| `rippled_ios_latency`   | Application.cpp   | ms    | I/O service loop latency       |
-| `rippled_pathfind_fast` | PathRequests.h    | ms    | Fast pathfinding duration      |
-| `rippled_pathfind_full` | PathRequests.h    | ms    | Full pathfinding duration      |
+| Prometheus Metric     | Source File       | Unit  | Description                    |
+| --------------------- | ----------------- | ----- | ------------------------------ |
+| `xrpld_rpc_time`      | ServerHandler.cpp | ms    | RPC response time distribution |
+| `xrpld_rpc_size`      | ServerHandler.cpp | bytes | RPC response size distribution |
+| `xrpld_ios_latency`   | Application.cpp   | ms    | I/O service loop latency       |
+| `xrpld_pathfind_fast` | PathRequests.h    | ms    | Fast pathfinding duration      |
+| `xrpld_pathfind_full` | PathRequests.h    | ms    | Full pathfinding duration      |

 Quantiles collected: 0th, 50th, 90th, 95th, 99th, 100th percentile.
|
||||
|
||||
@@ -346,10 +346,10 @@ Quantiles collected: 0th, 50th, 90th, 95th, 99th, 100th percentile.
|
||||
|
||||
 For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), four gauges are emitted:

-- `rippled_{category}_Bytes_In`
-- `rippled_{category}_Bytes_Out`
-- `rippled_{category}_Messages_In`
-- `rippled_{category}_Messages_Out`
+- `xrpld_{category}_Bytes_In`
+- `xrpld_{category}_Bytes_Out`
+- `xrpld_{category}_Messages_In`
+- `xrpld_{category}_Messages_Out`
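
The naming pattern above is regular enough to generate. As a small sketch (the helper `traffic_gauge_names` is hypothetical and not part of xrpld or this commit), the four gauge names for a category can be built from the prefix and category name, which may be handy when writing alert rules or checking a dashboard against the new prefix:

```python
# The four per-category suffixes listed above.
SUFFIXES = ("Bytes_In", "Bytes_Out", "Messages_In", "Messages_Out")


def traffic_gauge_names(category, prefix="xrpld"):
    """Return the four overlay traffic gauge names for one category."""
    return [f"{prefix}_{category}_{suffix}" for suffix in SUFFIXES]


# `ping` is one of the key categories named in this document.
names = traffic_gauge_names("ping")
```

The `prefix` argument mirrors the `[insight] prefix=` setting, so the pre-rename names can be produced with `prefix="rippled"`.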
|
||||
|
||||
**Key categories**:
|
||||
|
||||
@@ -368,7 +368,7 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
|
||||
| `ping` / `status` | Keepalive and status |
|
||||
| `set_get` | Set requests |
|
||||
|
||||
**Grafana dashboards**: _Network Traffic_ (`rippled-system-network`), _Overlay Traffic Detail_ (`rippled-system-overlay-detail`), _Ledger Data & Sync_ (`rippled-system-ledger-sync`)
|
||||
**Grafana dashboards**: _Network Traffic_ (`xrpld-system-network`), _Overlay Traffic Detail_ (`xrpld-system-overlay-detail`), _Ledger Data & Sync_ (`xrpld-system-ledger-sync`)
|
||||
|
||||
---
|
||||
|
||||
@@ -378,28 +378,28 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
|
||||
|
||||
 ### 3.1 Span-Derived Dashboards (5)

-| Dashboard            | UID                    | Data Source              | Key Panels                                                                         |
-| -------------------- | ---------------------- | ------------------------ | ---------------------------------------------------------------------------------- |
-| RPC Performance      | `rippled-rpc-perf`     | Prometheus (SpanMetrics) | Request rate by command, p95 latency by command, error rate, heatmap, top commands |
-| Transaction Overview | `rippled-transactions` | Prometheus (SpanMetrics) | Processing rate, latency p95/p50, local vs relay split, apply duration, heatmap    |
-| Consensus Health     | `rippled-consensus`    | Prometheus (SpanMetrics) | Round duration p95/p50, proposals rate, close duration, mode timeline, heatmap     |
-| Ledger Operations    | `rippled-ledger-ops`   | Prometheus (SpanMetrics) | Build rate, build duration, validation rate, store rate, build vs close comparison |
-| Peer Network         | `rippled-peer-net`     | Prometheus (SpanMetrics) | Proposal receive rate, validation receive rate, trusted vs untrusted breakdown     |
+| Dashboard            | UID                  | Data Source              | Key Panels                                                                         |
+| -------------------- | -------------------- | ------------------------ | ---------------------------------------------------------------------------------- |
+| RPC Performance      | `xrpld-rpc-perf`     | Prometheus (SpanMetrics) | Request rate by command, p95 latency by command, error rate, heatmap, top commands |
+| Transaction Overview | `xrpld-transactions` | Prometheus (SpanMetrics) | Processing rate, latency p95/p50, local vs relay split, apply duration, heatmap    |
+| Consensus Health     | `xrpld-consensus`    | Prometheus (SpanMetrics) | Round duration p95/p50, proposals rate, close duration, mode timeline, heatmap     |
+| Ledger Operations    | `xrpld-ledger-ops`   | Prometheus (SpanMetrics) | Build rate, build duration, validation rate, store rate, build vs close comparison |
+| Peer Network         | `xrpld-peer-net`     | Prometheus (SpanMetrics) | Proposal receive rate, validation receive rate, trusted vs untrusted breakdown     |
|
||||
|
||||
 ### 3.2 System Metrics Dashboards (5)

-| Dashboard              | UID                             | Data Source       | Key Panels                                                                        |
-| ---------------------- | ------------------------------- | ----------------- | --------------------------------------------------------------------------------- |
-| Node Health            | `rippled-system-node-health`    | Prometheus (OTLP) | Ledger age, operating mode, I/O latency, job queue, fetch rate                    |
-| Network Traffic        | `rippled-system-network`        | Prometheus (OTLP) | Active peers, disconnects, bytes in/out, messages in/out, traffic by category     |
-| RPC & Pathfinding      | `rippled-system-rpc`            | Prometheus (OTLP) | RPC rate, response time/size, pathfinding duration, resource warnings/drops       |
-| Overlay Traffic Detail | `rippled-system-overlay-detail` | Prometheus (OTLP) | Squelch, overhead, validator lists, set get/share, have/requested tx, proof paths |
-| Ledger Data & Sync     | `rippled-system-ledger-sync`    | Prometheus (OTLP) | Ledger data exchange, legacy ledger share/get, getobject by type, traffic heatmap |
+| Dashboard              | UID                           | Data Source       | Key Panels                                                                        |
+| ---------------------- | ----------------------------- | ----------------- | --------------------------------------------------------------------------------- |
+| Node Health            | `xrpld-system-node-health`    | Prometheus (OTLP) | Ledger age, operating mode, I/O latency, job queue, fetch rate                    |
+| Network Traffic        | `xrpld-system-network`        | Prometheus (OTLP) | Active peers, disconnects, bytes in/out, messages in/out, traffic by category     |
+| RPC & Pathfinding      | `xrpld-system-rpc`            | Prometheus (OTLP) | RPC rate, response time/size, pathfinding duration, resource warnings/drops       |
+| Overlay Traffic Detail | `xrpld-system-overlay-detail` | Prometheus (OTLP) | Squelch, overhead, validator lists, set get/share, have/requested tx, proof paths |
+| Ledger Data & Sync     | `xrpld-system-ledger-sync`    | Prometheus (OTLP) | Ledger data exchange, legacy ledger share/get, getobject by type, traffic heatmap |

 ### 3.3 Accessing the Dashboards

 1. Open Grafana at **http://localhost:3000**
-2. Navigate to **Dashboards → rippled** folder
+2. Navigate to **Dashboards → xrpld** folder
 3. All 10 dashboards are auto-provisioned from `docker/telemetry/grafana/dashboards/`
|
||||
|
||||
---
|
||||
@@ -410,18 +410,18 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
|
||||
|
||||
 ### Finding Traces by Type

-| What to Find             | Tempo TraceQL Query                                                              |
-| ------------------------ | -------------------------------------------------------------------------------- |
-| All RPC calls            | `{resource.service.name="rippled" && name="rpc.request"}`                        |
-| Specific RPC command     | `{resource.service.name="rippled" && name="rpc.command.server_info"}`            |
-| Slow RPC calls           | `{resource.service.name="rippled" && name=~"rpc.command.*"} \| duration > 100ms` |
-| Failed RPC calls         | `{span.xrpl.rpc.status="error"}`                                                 |
-| Specific transaction     | `{span.xrpl.tx.hash="<hex_hash>"}`                                               |
-| Local transactions only  | `{span.xrpl.tx.local=true}`                                                      |
-| Consensus rounds         | `{resource.service.name="rippled" && name="consensus.accept"}`                   |
-| Rounds by mode           | `{span.xrpl.consensus.mode="proposing"}`                                         |
-| Specific ledger          | `{span.xrpl.ledger.seq=12345}`                                                   |
-| Peer proposals (trusted) | `{span.xrpl.peer.proposal.trusted=true}`                                         |
+| What to Find             | Tempo TraceQL Query                                                            |
+| ------------------------ | ------------------------------------------------------------------------------ |
+| All RPC calls            | `{resource.service.name="xrpld" && name="rpc.request"}`                        |
+| Specific RPC command     | `{resource.service.name="xrpld" && name="rpc.command.server_info"}`            |
+| Slow RPC calls           | `{resource.service.name="xrpld" && name=~"rpc.command.*"} \| duration > 100ms` |
+| Failed RPC calls         | `{span.xrpl.rpc.status="error"}`                                               |
+| Specific transaction     | `{span.xrpl.tx.hash="<hex_hash>"}`                                             |
+| Local transactions only  | `{span.xrpl.tx.local=true}`                                                    |
+| Consensus rounds         | `{resource.service.name="xrpld" && name="consensus.accept"}`                   |
+| Rounds by mode           | `{span.xrpl.consensus.mode="proposing"}`                                       |
+| Specific ledger          | `{span.xrpl.ledger.seq=12345}`                                                 |
+| Peer proposals (trusted) | `{span.xrpl.peer.proposal.trusted=true}`                                       |
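
Because the only thing that changes in the service-scoped queries above is the service name, scripts that build TraceQL strings may want to take it as a parameter. The sketch below is illustrative only: the `traceql` helper is hypothetical, it only composes the query text (it does not call Tempo), and it does no escaping of quotes in its arguments.

```python
from typing import Optional


def traceql(
    service: str = "xrpld",
    name: Optional[str] = None,
    name_regex: Optional[str] = None,
    min_duration: Optional[str] = None,
) -> str:
    """Compose a TraceQL query in the shape used by the table above."""
    conditions = [f'resource.service.name="{service}"']
    if name is not None:
        conditions.append(f'name="{name}"')          # exact span name
    if name_regex is not None:
        conditions.append(f'name=~"{name_regex}"')   # regex span name
    query = "{" + " && ".join(conditions) + "}"
    if min_duration is not None:
        # Pipeline stage that keeps only spans slower than the threshold.
        query += f" | duration > {min_duration}"
    return query


all_rpc = traceql(name="rpc.request")
slow_rpc = traceql(name_regex="rpc.command.*", min_duration="100ms")
```

Calling `traceql(service="rippled", ...)` yields the pre-rename form, which may be needed for traces stored before the service name changed.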
|
||||
|
||||
### Trace Structure
|
||||
|
||||
@@ -475,19 +475,19 @@ sum by (xrpl_peer_proposal_trusted) (rate(traces_span_metrics_calls_total{span_n
|
||||
|
||||
```promql
|
||||
# Validated ledger age (should be < 10s)
|
||||
rippled_LedgerMaster_Validated_Ledger_Age
|
||||
xrpld_LedgerMaster_Validated_Ledger_Age
|
||||
|
||||
# Active peer count
|
||||
rippled_Peer_Finder_Active_Inbound_Peers + rippled_Peer_Finder_Active_Outbound_Peers
|
||||
xrpld_Peer_Finder_Active_Inbound_Peers + xrpld_Peer_Finder_Active_Outbound_Peers
|
||||
|
||||
# RPC response time p95
|
||||
histogram_quantile(0.95, rippled_rpc_time_bucket)
|
||||
histogram_quantile(0.95, xrpld_rpc_time_bucket)
|
||||
|
||||
# Total network bytes in (rate)
|
||||
rate(rippled_total_Bytes_In[5m])
|
||||
rate(xrpld_total_Bytes_In[5m])
|
||||
|
||||
# Operating mode (should be "Full" after startup)
|
||||
rippled_State_Accounting_Full_duration
|
||||
xrpld_State_Accounting_Full_duration
|
||||
```
|
||||
|
||||
---
|
||||
@@ -497,7 +497,7 @@ rippled_State_Accounting_Full_duration
|
||||
> **Plan details**: [06-implementation-phases.md §6.8.1](./06-implementation-phases.md) — motivation, architecture, Mermaid diagrams
|
||||
> **Task breakdown**: [Phase8_taskList.md](./Phase8_taskList.md) — per-task implementation details
|
||||
|
||||
Phase 8 injects OTel trace context into rippled's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field:
|
||||
Phase 8 injects OTel trace context into xrpld's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field:
|
||||
|
||||
### Log Format
|
||||
|
||||
@@ -522,7 +522,7 @@ The trace context injection is implemented in `Logs::format()` (`src/libxrpl/bas
|
||||
### Log Ingestion Pipeline
|
||||
|
||||
```
|
||||
rippled debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki
|
||||
xrpld debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki
|
||||
```
|
||||
|
||||
The OTel Collector's `filelog` receiver tails `debug.log` files and uses a `regex_parser` operator to extract structured fields:
|
||||
@@ -551,16 +551,16 @@ Grafana Loki (v2.9.0) serves as the log storage backend. It receives log entries
|
||||
|
||||
```logql
|
||||
# Find all logs for a specific trace
|
||||
{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01"
|
||||
{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01"
|
||||
|
||||
# Error logs with trace context
|
||||
{job="rippled"} |= "ERR" |= "trace_id="
|
||||
{job="xrpld"} |= "ERR" |= "trace_id="
|
||||
|
||||
# Logs from a specific partition with trace context
|
||||
{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
|
||||
{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
|
||||
|
||||
# Count traced log lines over time
|
||||
count_over_time({job="rippled"} |= "trace_id=" [5m])
|
||||
count_over_time({job="xrpld"} |= "trace_id=" [5m])
|
||||
```
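
The same `trace_id=` pattern used in the LogQL `regexp` stage can be applied offline, for example to group lines from a local `debug.log` by trace. This is a minimal sketch under stated assumptions: the sample log lines are invented for illustration (the exact field layout of xrpld's log output is not reproduced here), and `group_by_trace` is a hypothetical helper.

```python
import re

# Same capture as the LogQL regexp stage: lowercase hex after "trace_id=".
TRACE_ID = re.compile(r"trace_id=(?P<trace_id>[a-f0-9]+)")


def group_by_trace(lines):
    """Group log lines by the trace_id they carry; untraced lines are skipped."""
    groups = {}
    for line in lines:
        match = TRACE_ID.search(line)
        if match:
            groups.setdefault(match.group("trace_id"), []).append(line)
    return groups


# Invented sample lines; only the trace_id=<hex> token matters to the regex.
sample = [
    "LedgerMaster:ERR trace_id=abc123def456789012345678abcdef01 example message A",
    "NetworkOPs:WRN example message without trace context",
    "LedgerMaster:NFO trace_id=abc123def456789012345678abcdef01 example message B",
]
groups = group_by_trace(sample)
```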
|
||||
|
||||
---
|
||||
@@ -571,141 +571,141 @@ count_over_time({job="rippled"} |= "trace_id=" [5m])
|
||||
> **Plan details**: [06-implementation-phases.md §6.8.2](./06-implementation-phases.md) — motivation, architecture, third-party context
|
||||
> **Task breakdown**: [Phase9_taskList.md](./Phase9_taskList.md) — per-task implementation details
|
||||
|
||||
Phase 9 fills ~50+ metrics that exist inside rippled but currently lack time-series export. Uses a hybrid approach: `beast::insight` extensions for NodeStore I/O, OTel `ObservableGauge` async callbacks for new categories.
|
||||
Phase 9 fills ~50+ metrics that exist inside xrpld but currently lack time-series export. Uses a hybrid approach: `beast::insight` extensions for NodeStore I/O, OTel `ObservableGauge` async callbacks for new categories.
|
||||
|
||||
 ### New Metric Categories

 #### NodeStore I/O (via beast::insight)

-| Prometheus Metric                    | Type  | Description                         |
-| ------------------------------------ | ----- | ----------------------------------- |
-| `rippled_nodestore_reads_total`      | Gauge | Cumulative read operations          |
-| `rippled_nodestore_reads_hit`        | Gauge | Cache-served reads                  |
-| `rippled_nodestore_writes`           | Gauge | Cumulative write operations         |
-| `rippled_nodestore_written_bytes`    | Gauge | Cumulative bytes written            |
-| `rippled_nodestore_read_bytes`       | Gauge | Cumulative bytes read               |
-| `rippled_nodestore_read_duration_us` | Gauge | Cumulative read time (microseconds) |
-| `rippled_nodestore_write_load`       | Gauge | Current write load score            |
-| `rippled_nodestore_read_queue`       | Gauge | Items in read queue                 |
+| Prometheus Metric                  | Type  | Description                         |
+| ---------------------------------- | ----- | ----------------------------------- |
+| `xrpld_nodestore_reads_total`      | Gauge | Cumulative read operations          |
+| `xrpld_nodestore_reads_hit`        | Gauge | Cache-served reads                  |
+| `xrpld_nodestore_writes`           | Gauge | Cumulative write operations         |
+| `xrpld_nodestore_written_bytes`    | Gauge | Cumulative bytes written            |
+| `xrpld_nodestore_read_bytes`       | Gauge | Cumulative bytes read               |
+| `xrpld_nodestore_read_duration_us` | Gauge | Cumulative read time (microseconds) |
+| `xrpld_nodestore_write_load`       | Gauge | Current write load score            |
+| `xrpld_nodestore_read_queue`       | Gauge | Items in read queue                 |
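
Since `nodestore_reads_total` and `nodestore_reads_hit` are exported as cumulative values, a hit ratio for a time window comes from the difference between two samples rather than from the raw values. The following is a rough sketch, assuming `reads_hit` counts a subset of `reads_total`; the helper name and the sample numbers are hypothetical.

```python
def nodestore_hit_ratio(prev, curr, prefix="xrpld"):
    """Return the cache hit ratio between two samples, or None if undefined.

    `prev` and `curr` map metric names to cumulative values taken at two
    points in time. A non-positive read delta (idle interval, or a restart
    that reset the cumulative values) yields None.
    """
    total_key = f"{prefix}_nodestore_reads_total"
    hit_key = f"{prefix}_nodestore_reads_hit"
    reads = curr[total_key] - prev[total_key]
    hits = curr[hit_key] - prev[hit_key]
    if reads <= 0:
        return None
    return hits / reads


# Made-up samples taken some interval apart.
earlier = {"xrpld_nodestore_reads_total": 1000, "xrpld_nodestore_reads_hit": 800}
later = {"xrpld_nodestore_reads_total": 1500, "xrpld_nodestore_reads_hit": 1250}
ratio = nodestore_hit_ratio(earlier, later)
```

Returning `None` instead of `0.0` for an empty interval keeps "no reads" distinguishable from "all reads missed".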
|
||||
|
||||
 #### Cache Hit Rates (via OTel MetricsRegistry)

-| Prometheus Metric               | Type  | Description                          |
-| ------------------------------- | ----- | ------------------------------------ |
-| `rippled_cache_SLE_hit_rate`    | Gauge | SLE cache hit rate (0.0-1.0)         |
-| `rippled_cache_ledger_hit_rate` | Gauge | Ledger object cache hit rate         |
-| `rippled_cache_AL_hit_rate`     | Gauge | AcceptedLedger cache hit rate        |
-| `rippled_cache_treenode_size`   | Gauge | SHAMap TreeNode cache size (entries) |
-| `rippled_cache_fullbelow_size`  | Gauge | FullBelow cache size                 |
+| Prometheus Metric             | Type  | Description                          |
+| ----------------------------- | ----- | ------------------------------------ |
+| `xrpld_cache_SLE_hit_rate`    | Gauge | SLE cache hit rate (0.0-1.0)         |
+| `xrpld_cache_ledger_hit_rate` | Gauge | Ledger object cache hit rate         |
+| `xrpld_cache_AL_hit_rate`     | Gauge | AcceptedLedger cache hit rate        |
+| `xrpld_cache_treenode_size`   | Gauge | SHAMap TreeNode cache size (entries) |
+| `xrpld_cache_fullbelow_size`  | Gauge | FullBelow cache size                 |
|
||||
|
||||
 #### Transaction Queue (via OTel MetricsRegistry)

-| Prometheus Metric                      | Type  | Description                      |
-| -------------------------------------- | ----- | -------------------------------- |
-| `rippled_txq_count`                    | Gauge | Current transactions in queue    |
-| `rippled_txq_max_size`                 | Gauge | Maximum queue capacity           |
-| `rippled_txq_in_ledger`                | Gauge | Transactions in open ledger      |
-| `rippled_txq_per_ledger`               | Gauge | Expected transactions per ledger |
-| `rippled_txq_open_ledger_fee_level`    | Gauge | Open ledger fee escalation level |
-| `rippled_txq_med_fee_level`            | Gauge | Median fee level in queue        |
-| `rippled_txq_reference_fee_level`      | Gauge | Reference fee level              |
-| `rippled_txq_min_processing_fee_level` | Gauge | Minimum fee to get processed     |
+| Prometheus Metric                    | Type  | Description                      |
+| ------------------------------------ | ----- | -------------------------------- |
+| `xrpld_txq_count`                    | Gauge | Current transactions in queue    |
+| `xrpld_txq_max_size`                 | Gauge | Maximum queue capacity           |
+| `xrpld_txq_in_ledger`                | Gauge | Transactions in open ledger      |
+| `xrpld_txq_per_ledger`               | Gauge | Expected transactions per ledger |
+| `xrpld_txq_open_ledger_fee_level`    | Gauge | Open ledger fee escalation level |
+| `xrpld_txq_med_fee_level`            | Gauge | Median fee level in queue        |
+| `xrpld_txq_reference_fee_level`      | Gauge | Reference fee level              |
+| `xrpld_txq_min_processing_fee_level` | Gauge | Minimum fee to get processed     |
|
||||
|
||||
#### PerfLog Per-RPC Method (via OTel Metrics SDK)

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------- | --------- | ----------------- | --------------------------- |
| `rippled_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `rippled_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed |
| `rippled_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls errored |
| `rippled_rpc_method_duration_us_bucket` | Histogram | `method="<name>"` | Execution time distribution |

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------- | --------- | ----------------- | --------------------------- |
| `xrpld_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `xrpld_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed |
| `xrpld_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls errored |
| `xrpld_rpc_method_duration_us_bucket` | Histogram | `method="<name>"` | Execution time distribution |

#### PerfLog Per-Job Type (via OTel Metrics SDK)

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------- | --------- | ------------------- | --------------- |
| `rippled_job_queued_total` | Counter | `job_type="<name>"` | Jobs queued |
| `rippled_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `rippled_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `rippled_job_queued_duration_us_bucket` | Histogram | `job_type="<name>"` | Queue wait time |
| `rippled_job_running_duration_us_bucket` | Histogram | `job_type="<name>"` | Execution time |

| Prometheus Metric | Type | Labels | Description |
| -------------------------------------- | --------- | ------------------- | --------------- |
| `xrpld_job_queued_total` | Counter | `job_type="<name>"` | Jobs queued |
| `xrpld_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `xrpld_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `xrpld_job_queued_duration_us_bucket` | Histogram | `job_type="<name>"` | Queue wait time |
| `xrpld_job_running_duration_us_bucket` | Histogram | `job_type="<name>"` | Execution time |

#### Counted Object Instances (via OTel MetricsRegistry)

| Prometheus Metric | Type | Labels | Description |
| ---------------------- | ----- | --------------- | ------------------------------- |
| `rippled_object_count` | Gauge | `type="<name>"` | Live instances of internal type |

| Prometheus Metric | Type | Labels | Description |
| -------------------- | ----- | --------------- | ------------------------------- |
| `xrpld_object_count` | Gauge | `type="<name>"` | Live instances of internal type |

Tracked types: `Transaction`, `Ledger`, `NodeObject`, `STTx`, `STLedgerEntry`, `InboundLedger`, `Pathfinder`, `PathRequest`, `HashRouterEntry`

#### Fee Escalation & Load Factors (via OTel MetricsRegistry)

| Prometheus Metric | Type | Description |
| ------------------------------------ | ----- | ------------------------------------ |
| `rippled_load_factor` | Gauge | Combined transaction cost multiplier |
| `rippled_load_factor_server` | Gauge | Server + cluster + network load |
| `rippled_load_factor_local` | Gauge | Local server load only |
| `rippled_load_factor_net` | Gauge | Network-wide load estimate |
| `rippled_load_factor_cluster` | Gauge | Cluster peer load |
| `rippled_load_factor_fee_escalation` | Gauge | Open ledger fee escalation |
| `rippled_load_factor_fee_queue` | Gauge | Queue entry fee level |

| Prometheus Metric | Type | Description |
| ---------------------------------- | ----- | ------------------------------------ |
| `xrpld_load_factor` | Gauge | Combined transaction cost multiplier |
| `xrpld_load_factor_server` | Gauge | Server + cluster + network load |
| `xrpld_load_factor_local` | Gauge | Local server load only |
| `xrpld_load_factor_net` | Gauge | Network-wide load estimate |
| `xrpld_load_factor_cluster` | Gauge | Cluster peer load |
| `xrpld_load_factor_fee_escalation` | Gauge | Open ledger fee escalation |
| `xrpld_load_factor_fee_queue` | Gauge | Queue entry fee level |

#### Server Info (via OTel MetricsRegistry)

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------------- | ----- | -------- | -------------------------------------------- |
| `rippled_server_info{metric="server_state"}` | Gauge | `metric` | Operating mode (0=DISCONNECTED .. 4=FULL) |
| `rippled_server_info{metric="uptime"}` | Gauge | `metric` | Seconds since server start |
| `rippled_server_info{metric="peers"}` | Gauge | `metric` | Total connected peers |
| `rippled_server_info{metric="validated_ledger_seq"}` | Gauge | `metric` | Validated ledger sequence number |
| `rippled_server_info{metric="ledger_current_index"}` | Gauge | `metric` | Current open ledger sequence |
| `rippled_server_info{metric="peer_disconnects_resources"}` | Gauge | `metric` | Cumulative resource-related peer disconnects |
| `rippled_server_info{metric="last_close_proposers"}` | Gauge | `metric` | Proposers in last closed round |
| `rippled_server_info{metric="last_close_converge_time_ms"}` | Gauge | `metric` | Last close convergence time (milliseconds) |

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------------- | ----- | -------- | -------------------------------------------- |
| `xrpld_server_info{metric="server_state"}` | Gauge | `metric` | Operating mode (0=DISCONNECTED .. 4=FULL) |
| `xrpld_server_info{metric="uptime"}` | Gauge | `metric` | Seconds since server start |
| `xrpld_server_info{metric="peers"}` | Gauge | `metric` | Total connected peers |
| `xrpld_server_info{metric="validated_ledger_seq"}` | Gauge | `metric` | Validated ledger sequence number |
| `xrpld_server_info{metric="ledger_current_index"}` | Gauge | `metric` | Current open ledger sequence |
| `xrpld_server_info{metric="peer_disconnects_resources"}` | Gauge | `metric` | Cumulative resource-related peer disconnects |
| `xrpld_server_info{metric="last_close_proposers"}` | Gauge | `metric` | Proposers in last closed round |
| `xrpld_server_info{metric="last_close_converge_time_ms"}` | Gauge | `metric` | Last close convergence time (milliseconds) |

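The `server_state` sample is a bare integer, so a dashboard or alert handler needs a lookup to show a readable mode. The following is a minimal Python sketch, not the exporter's code. The table only states the endpoints (0=DISCONNECTED, 4=FULL); the intermediate names are assumed to follow the server's operating-mode ordering, and `describe_server_state` is a hypothetical helper name.

```python
from enum import IntEnum


class OperatingMode(IntEnum):
    """Assumed meaning of the `server_state` gauge value (0..4)."""
    DISCONNECTED = 0
    CONNECTED = 1
    SYNCING = 2
    TRACKING = 3
    FULL = 4


def describe_server_state(value: float) -> str:
    """Return the mode name for a gauge sample, or UNKNOWN(<n>) if out of range."""
    code = int(value)
    try:
        return OperatingMode(code).name
    except ValueError:
        # Values outside 0..4 are reported rather than raising.
        return f"UNKNOWN({code})"
```

Keeping the mapping in one place lets an alert rule compare against `OperatingMode.FULL` instead of a magic number.
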
#### Build Info (via OTel MetricsRegistry)

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------- | ----- | --------- | --------------------------------- |
| `rippled_build_info{version="<ver>"}` | Gauge | `version` | Info-style metric, always value 1 |

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------- | ----- | --------- | --------------------------------- |
| `xrpld_build_info{version="<ver>"}` | Gauge | `version` | Info-style metric, always value 1 |

#### Complete Ledger Ranges (via OTel MetricsRegistry)

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------- | ----- | --------------- | --------------------------- |
| `rippled_complete_ledgers{bound="start",index="<N>"}` | Gauge | `bound`,`index` | Start of contiguous range N |
| `rippled_complete_ledgers{bound="end",index="<N>"}` | Gauge | `bound`,`index` | End of contiguous range N |

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ----- | --------------- | --------------------------- |
| `xrpld_complete_ledgers{bound="start",index="<N>"}` | Gauge | `bound`,`index` | Start of contiguous range N |
| `xrpld_complete_ledgers{bound="end",index="<N>"}` | Gauge | `bound`,`index` | End of contiguous range N |

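Each contiguous range produces two samples, one per `bound`, and `index` numbers the ranges. The Python sketch below illustrates that expansion. It assumes the input is a range string in the style of the `complete_ledgers` field from `server_info` (for example `100-200,300-400`), that a lone number denotes a one-ledger range, and that `index` starts at 0. None of these details are confirmed by the table, and `complete_ledger_samples` is a hypothetical name.

```python
def complete_ledger_samples(complete_ledgers: str):
    """Expand a range string into (labels, value) pairs, two per range."""
    samples = []
    text = complete_ledgers.strip()
    if not text or text == "empty":
        return samples  # no complete ledgers yet
    for index, part in enumerate(text.split(",")):
        start, _, end = part.strip().partition("-")
        end = end or start  # a single ledger "N" is treated as N-N
        samples.append(({"bound": "start", "index": str(index)}, int(start)))
        samples.append(({"bound": "end", "index": str(index)}, int(end)))
    return samples
```

A gap in history then shows up as more than one `index` value, which is easy to alert on with `count by (bound)`.
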
#### Database Metrics (via OTel MetricsRegistry)

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ----- | -------- | --------------------------------- |
| `rippled_db_metrics{metric="db_kb_total"}` | Gauge | `metric` | Total database size (KB) |
| `rippled_db_metrics{metric="db_kb_ledger"}` | Gauge | `metric` | Ledger database size (KB) |
| `rippled_db_metrics{metric="db_kb_transaction"}` | Gauge | `metric` | Transaction database size (KB) |
| `rippled_db_metrics{metric="historical_perminute"}` | Gauge | `metric` | Historical ledger fetches per min |

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------- | ----- | -------- | --------------------------------- |
| `xrpld_db_metrics{metric="db_kb_total"}` | Gauge | `metric` | Total database size (KB) |
| `xrpld_db_metrics{metric="db_kb_ledger"}` | Gauge | `metric` | Ledger database size (KB) |
| `xrpld_db_metrics{metric="db_kb_transaction"}` | Gauge | `metric` | Transaction database size (KB) |
| `xrpld_db_metrics{metric="historical_perminute"}` | Gauge | `metric` | Historical ledger fetches per min |

#### Extended Cache Metrics (additions to existing rippled_cache_metrics)
#### Extended Cache Metrics (additions to existing xrpld_cache_metrics)

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------- | ----- | -------- | ------------------------- |
| `rippled_cache_metrics{metric="AL_size"}` | Gauge | `metric` | AcceptedLedger cache size |

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------- | ----- | -------- | ------------------------- |
| `xrpld_cache_metrics{metric="AL_size"}` | Gauge | `metric` | AcceptedLedger cache size |

#### Extended NodeStore Metrics (additions to existing rippled_nodestore_state)
#### Extended NodeStore Metrics (additions to existing xrpld_nodestore_state)

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------- | ----- | -------- | ----------------------------------- |
| `rippled_nodestore_state{metric="node_reads_duration_us"}` | Gauge | `metric` | Cumulative read time (microseconds) |
| `rippled_nodestore_state{metric="read_request_bundle"}` | Gauge | `metric` | Read request bundle count |
| `rippled_nodestore_state{metric="read_threads_running"}` | Gauge | `metric` | Active read threads |
| `rippled_nodestore_state{metric="read_threads_total"}` | Gauge | `metric` | Total read threads configured |

| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------------------- | ----- | -------- | ----------------------------------- |
| `xrpld_nodestore_state{metric="node_reads_duration_us"}` | Gauge | `metric` | Cumulative read time (microseconds) |
| `xrpld_nodestore_state{metric="read_request_bundle"}` | Gauge | `metric` | Read request bundle count |
| `xrpld_nodestore_state{metric="read_threads_running"}` | Gauge | `metric` | Active read threads |
| `xrpld_nodestore_state{metric="read_threads_total"}` | Gauge | `metric` | Total read threads configured |

### New Grafana Dashboards (Phase 9)

| Dashboard | UID | Data Source | Key Panels |
| ------------------ | -------------------- | ----------- | ----------------------------------------------------------------- |
| Fee Market & TxQ | `rippled-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown, escalation |
| Job Queue Analysis | `rippled-job-queue` | Prometheus | Per-job rates, queue wait times, execution times, queue depth |

| Dashboard | UID | Data Source | Key Panels |
| ------------------ | ------------------ | ----------- | ----------------------------------------------------------------- |
| Fee Market & TxQ | `xrpld-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown, escalation |
| Job Queue Analysis | `xrpld-job-queue` | Prometheus | Per-job rates, queue wait times, execution times, queue depth |

---

@@ -737,7 +737,7 @@ Phase 10 builds a 5-node validator docker-compose harness with RPC load generato
> **Plan details**: [06-implementation-phases.md §6.8.4](./06-implementation-phases.md) — motivation, architecture, consumer gap analysis
> **Task breakdown**: [Phase11_taskList.md](./Phase11_taskList.md) — per-task implementation details

Phase 11 builds a custom OTel Collector receiver (Go) that polls rippled's admin RPCs and exports `xrpl_*` metrics for external consumers. No rippled code changes.

Phase 11 builds a custom OTel Collector receiver (Go) that polls xrpld's admin RPCs and exports `xrpl_*` metrics for external consumers. No xrpld code changes.

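The receiver itself is written in Go. The Python sketch below only illustrates the poll-and-map step it performs: take a parsed `server_info` admin RPC response and flatten selected fields into `xrpl_*` samples. The metric names (`xrpl_peers`, `xrpl_uptime_seconds`, `xrpl_validated_ledger_seq`) and the function name `map_server_info` are hypothetical; the actual exported set is defined by the receiver, not by this example.

```python
def map_server_info(response: dict) -> dict:
    """Flatten a parsed `server_info` JSON-RPC response into metric samples."""
    info = response.get("result", {}).get("info", {})
    samples = {}
    if "peers" in info:
        samples["xrpl_peers"] = float(info["peers"])
    if "uptime" in info:
        samples["xrpl_uptime_seconds"] = float(info["uptime"])
    # validated_ledger is absent until the server has a validated ledger.
    seq = info.get("validated_ledger", {}).get("seq")
    if seq is not None:
        samples["xrpl_validated_ledger_seq"] = float(seq)
    return samples
```

Missing fields are skipped instead of being emitted as zero, so a server that is still syncing does not report a misleading ledger sequence.
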
### Exported Metrics (via Custom OTel Collector Receiver)

@@ -807,102 +807,102 @@ via OTLP/HTTP to the OTel Collector and scraped by Prometheus.

#### NodeStore I/O (Observable Gauge — `nodestore_state`)

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------ | ----- | -------- | ------------------------------------ |
| `rippled_nodestore_state{metric="node_reads_total"}` | Gauge | `metric` | Cumulative NodeStore read operations |
| `rippled_nodestore_state{metric="node_reads_hit"}` | Gauge | `metric` | Reads served from cache |
| `rippled_nodestore_state{metric="node_writes"}` | Gauge | `metric` | Cumulative write operations |
| `rippled_nodestore_state{metric="node_written_bytes"}` | Gauge | `metric` | Cumulative bytes written |
| `rippled_nodestore_state{metric="node_read_bytes"}` | Gauge | `metric` | Cumulative bytes read |
| `rippled_nodestore_state{metric="write_load"}` | Gauge | `metric` | Current write load score |
| `rippled_nodestore_state{metric="read_queue"}` | Gauge | `metric` | Items in read prefetch queue |

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------- | ----- | -------- | ------------------------------------ |
| `xrpld_nodestore_state{metric="node_reads_total"}` | Gauge | `metric` | Cumulative NodeStore read operations |
| `xrpld_nodestore_state{metric="node_reads_hit"}` | Gauge | `metric` | Reads served from cache |
| `xrpld_nodestore_state{metric="node_writes"}` | Gauge | `metric` | Cumulative write operations |
| `xrpld_nodestore_state{metric="node_written_bytes"}` | Gauge | `metric` | Cumulative bytes written |
| `xrpld_nodestore_state{metric="node_read_bytes"}` | Gauge | `metric` | Cumulative bytes read |
| `xrpld_nodestore_state{metric="write_load"}` | Gauge | `metric` | Current write load score |
| `xrpld_nodestore_state{metric="read_queue"}` | Gauge | `metric` | Items in read prefetch queue |

#### Cache Hit Rates & Sizes (Observable Gauge — `cache_metrics`)

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------- | ----- | -------- | ----------------------------- |
| `rippled_cache_metrics{metric="SLE_hit_rate"}` | Gauge | `metric` | SLE cache hit rate (0.0-1.0) |
| `rippled_cache_metrics{metric="ledger_hit_rate"}` | Gauge | `metric` | Ledger cache hit rate |
| `rippled_cache_metrics{metric="AL_hit_rate"}` | Gauge | `metric` | AcceptedLedger cache hit rate |
| `rippled_cache_metrics{metric="treenode_cache_size"}` | Gauge | `metric` | SHAMap TreeNode cache entries |
| `rippled_cache_metrics{metric="treenode_track_size"}` | Gauge | `metric` | Tracked tree nodes |
| `rippled_cache_metrics{metric="fullbelow_size"}` | Gauge | `metric` | FullBelow cache entries |

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ----- | -------- | ----------------------------- |
| `xrpld_cache_metrics{metric="SLE_hit_rate"}` | Gauge | `metric` | SLE cache hit rate (0.0-1.0) |
| `xrpld_cache_metrics{metric="ledger_hit_rate"}` | Gauge | `metric` | Ledger cache hit rate |
| `xrpld_cache_metrics{metric="AL_hit_rate"}` | Gauge | `metric` | AcceptedLedger cache hit rate |
| `xrpld_cache_metrics{metric="treenode_cache_size"}` | Gauge | `metric` | SHAMap TreeNode cache entries |
| `xrpld_cache_metrics{metric="treenode_track_size"}` | Gauge | `metric` | Tracked tree nodes |
| `xrpld_cache_metrics{metric="fullbelow_size"}` | Gauge | `metric` | FullBelow cache entries |

#### Transaction Queue (Observable Gauge — `txq_metrics`)

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------------ | ----- | -------- | -------------------------------- |
| `rippled_txq_metrics{metric="txq_count"}` | Gauge | `metric` | Transactions currently in queue |
| `rippled_txq_metrics{metric="txq_max_size"}` | Gauge | `metric` | Maximum queue capacity |
| `rippled_txq_metrics{metric="txq_in_ledger"}` | Gauge | `metric` | Transactions in open ledger |
| `rippled_txq_metrics{metric="txq_per_ledger"}` | Gauge | `metric` | Expected transactions per ledger |
| `rippled_txq_metrics{metric="txq_reference_fee_level"}` | Gauge | `metric` | Reference fee level |
| `rippled_txq_metrics{metric="txq_min_processing_fee_level"}` | Gauge | `metric` | Minimum fee to get processed |
| `rippled_txq_metrics{metric="txq_med_fee_level"}` | Gauge | `metric` | Median fee level in queue |
| `rippled_txq_metrics{metric="txq_open_ledger_fee_level"}` | Gauge | `metric` | Open ledger fee escalation level |

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------- | ----- | -------- | -------------------------------- |
| `xrpld_txq_metrics{metric="txq_count"}` | Gauge | `metric` | Transactions currently in queue |
| `xrpld_txq_metrics{metric="txq_max_size"}` | Gauge | `metric` | Maximum queue capacity |
| `xrpld_txq_metrics{metric="txq_in_ledger"}` | Gauge | `metric` | Transactions in open ledger |
| `xrpld_txq_metrics{metric="txq_per_ledger"}` | Gauge | `metric` | Expected transactions per ledger |
| `xrpld_txq_metrics{metric="txq_reference_fee_level"}` | Gauge | `metric` | Reference fee level |
| `xrpld_txq_metrics{metric="txq_min_processing_fee_level"}` | Gauge | `metric` | Minimum fee to get processed |
| `xrpld_txq_metrics{metric="txq_med_fee_level"}` | Gauge | `metric` | Median fee level in queue |
| `xrpld_txq_metrics{metric="txq_open_ledger_fee_level"}` | Gauge | `metric` | Open ledger fee escalation level |

#### Per-RPC Method Metrics (Synchronous Counters/Histogram)

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------- | --------- | ----------------- | -------------------------------- |
| `rippled_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `rippled_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed successfully |
| `rippled_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls that errored |
| `rippled_rpc_method_duration_us` | Histogram | `method="<name>"` | Execution time distribution (us) |

| Prometheus Metric | Type | Labels | Description |
| --------------------------------- | --------- | ----------------- | -------------------------------- |
| `xrpld_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `xrpld_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed successfully |
| `xrpld_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls that errored |
| `xrpld_rpc_method_duration_us` | Histogram | `method="<name>"` | Execution time distribution (us) |

#### Per-Job-Type Metrics (Synchronous Counters/Histogram)

| Prometheus Metric | Type | Labels | Description |
| --------------------------------- | --------- | ------------------- | --------------------------------- |
| `rippled_job_queued_total` | Counter | `job_type="<name>"` | Jobs enqueued |
| `rippled_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `rippled_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `rippled_job_queued_duration_us` | Histogram | `job_type="<name>"` | Queue wait time distribution (us) |
| `rippled_job_running_duration_us` | Histogram | `job_type="<name>"` | Execution time distribution (us) |

| Prometheus Metric | Type | Labels | Description |
| ------------------------------- | --------- | ------------------- | --------------------------------- |
| `xrpld_job_queued_total` | Counter | `job_type="<name>"` | Jobs enqueued |
| `xrpld_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `xrpld_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `xrpld_job_queued_duration_us` | Histogram | `job_type="<name>"` | Queue wait time distribution (us) |
| `xrpld_job_running_duration_us` | Histogram | `job_type="<name>"` | Execution time distribution (us) |

#### Counted Object Instances (Observable Gauge — `object_count`)

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------- | ----- | --------------- | ------------------------------ |
| `rippled_object_count{type="Transaction"}` | Gauge | `type="<name>"` | Live Transaction objects |
| `rippled_object_count{type="Ledger"}` | Gauge | `type="<name>"` | Live Ledger objects |
| `rippled_object_count{type="NodeObject"}` | Gauge | `type="<name>"` | Live NodeObject instances |
| `rippled_object_count{type="STTx"}` | Gauge | `type="<name>"` | Serialized transaction objects |
| `rippled_object_count{type="STLedgerEntry"}` | Gauge | `type="<name>"` | Serialized ledger entries |
| `rippled_object_count{type="InboundLedger"}` | Gauge | `type="<name>"` | Ledgers being fetched |
| `rippled_object_count{type="Pathfinder"}` | Gauge | `type="<name>"` | Active pathfinding operations |
| `rippled_object_count{type="PathRequest"}` | Gauge | `type="<name>"` | Active path requests |
| `rippled_object_count{type="HashRouterEntry"}` | Gauge | `type="<name>"` | Hash router entries |

| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------- | ----- | --------------- | ------------------------------ |
| `xrpld_object_count{type="Transaction"}` | Gauge | `type="<name>"` | Live Transaction objects |
| `xrpld_object_count{type="Ledger"}` | Gauge | `type="<name>"` | Live Ledger objects |
| `xrpld_object_count{type="NodeObject"}` | Gauge | `type="<name>"` | Live NodeObject instances |
| `xrpld_object_count{type="STTx"}` | Gauge | `type="<name>"` | Serialized transaction objects |
| `xrpld_object_count{type="STLedgerEntry"}` | Gauge | `type="<name>"` | Serialized ledger entries |
| `xrpld_object_count{type="InboundLedger"}` | Gauge | `type="<name>"` | Ledgers being fetched |
| `xrpld_object_count{type="Pathfinder"}` | Gauge | `type="<name>"` | Active pathfinding operations |
| `xrpld_object_count{type="PathRequest"}` | Gauge | `type="<name>"` | Active path requests |
| `xrpld_object_count{type="HashRouterEntry"}` | Gauge | `type="<name>"` | Hash router entries |

#### Load Factor Breakdown (Observable Gauge — `load_factor_metrics`)

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------------------ | ----- | -------- | --------------------------------------- |
| `rippled_load_factor_metrics{metric="load_factor"}` | Gauge | `metric` | Combined transaction cost multiplier |
| `rippled_load_factor_metrics{metric="load_factor_server"}` | Gauge | `metric` | Server + cluster + network contribution |
| `rippled_load_factor_metrics{metric="load_factor_local"}` | Gauge | `metric` | Local server load only |
| `rippled_load_factor_metrics{metric="load_factor_net"}` | Gauge | `metric` | Network-wide load estimate |
| `rippled_load_factor_metrics{metric="load_factor_cluster"}` | Gauge | `metric` | Cluster peer load |
| `rippled_load_factor_metrics{metric="load_factor_fee_escalation"}` | Gauge | `metric` | Open ledger fee escalation |
| `rippled_load_factor_metrics{metric="load_factor_fee_queue"}` | Gauge | `metric` | Queue entry fee level |

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------------- | ----- | -------- | --------------------------------------- |
| `xrpld_load_factor_metrics{metric="load_factor"}` | Gauge | `metric` | Combined transaction cost multiplier |
| `xrpld_load_factor_metrics{metric="load_factor_server"}` | Gauge | `metric` | Server + cluster + network contribution |
| `xrpld_load_factor_metrics{metric="load_factor_local"}` | Gauge | `metric` | Local server load only |
| `xrpld_load_factor_metrics{metric="load_factor_net"}` | Gauge | `metric` | Network-wide load estimate |
| `xrpld_load_factor_metrics{metric="load_factor_cluster"}` | Gauge | `metric` | Cluster peer load |
| `xrpld_load_factor_metrics{metric="load_factor_fee_escalation"}` | Gauge | `metric` | Open ledger fee escalation |
| `xrpld_load_factor_metrics{metric="load_factor_fee_queue"}` | Gauge | `metric` | Queue entry fee level |

#### Prometheus Query Examples (Phase 9)

```promql
# NodeStore cache hit ratio
rippled_nodestore_state{metric="node_reads_hit"} / rippled_nodestore_state{metric="node_reads_total"}
xrpld_nodestore_state{metric="node_reads_hit"} / xrpld_nodestore_state{metric="node_reads_total"}

# RPC error rate for server_info
rate(rippled_rpc_method_errored_total{method="server_info"}[5m])
rate(xrpld_rpc_method_errored_total{method="server_info"}[5m])

# Job queue wait time p95
histogram_quantile(0.95, sum by (le) (rate(rippled_job_queued_duration_us_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(xrpld_job_queued_duration_us_bucket[5m])))

# TxQ utilization ratio (0.0-1.0)
rippled_txq_metrics{metric="txq_count"} / rippled_txq_metrics{metric="txq_max_size"}
xrpld_txq_metrics{metric="txq_count"} / xrpld_txq_metrics{metric="txq_max_size"}

# High load factor alert candidate
rippled_load_factor_metrics{metric="load_factor"} > 5
xrpld_load_factor_metrics{metric="load_factor"} > 5
```

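The p95 query relies on `histogram_quantile`, which estimates a quantile from cumulative `_bucket` counts by linear interpolation inside the bucket that contains the target rank. The Python sketch below is a simplified model of that estimate for classic histograms; it does not cover native histograms or every edge case of the real function, and the bucket boundaries in the usage note are invented for illustration.

```python
import math


def histogram_quantile(q: float, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound  # empty bucket, nothing to interpolate
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

With buckets `[(100, 50), (500, 90), (1000, 100), (math.inf, 100)]` (bounds in microseconds), the 0.75 quantile lands in the 100-500 bucket and interpolates to 350. The estimate is only as precise as the bucket layout, which is worth keeping in mind when alerting on `*_duration_us` histograms.
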
### Phase 7+: External Dashboard Parity Metrics

@@ -911,75 +911,75 @@ rippled_load_factor_metrics{metric="load_factor"} > 5
>
> **Task breakdown**: Phase 7 Tasks 7.9-7.16 (implementation), Phase 9 Tasks 9.11-9.13 (dashboards)

These metrics fill gaps identified by comparing rippled's internal observability with the community external dashboard's 86-metric coverage. All are exported via the OTel Metrics SDK (same `PeriodicMetricReader` as Phase 9 metrics).

These metrics fill gaps identified by comparing xrpld's internal observability with the community external dashboard's 86-metric coverage. All are exported via the OTel Metrics SDK (same `PeriodicMetricReader` as Phase 9 metrics).

#### Validation Agreement (Observable Gauge — `validation_agreement`)

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------- | ------ | -------- | --------------------------------------- |
| `rippled_validation_agreement{metric="agreement_pct_1h"}` | Double | `metric` | Rolling 1h agreement percentage (0-100) |
| `rippled_validation_agreement{metric="agreement_pct_24h"}` | Double | `metric` | Rolling 24h agreement percentage |
| `rippled_validation_agreement{metric="agreements_1h"}` | Int64 | `metric` | Agreed validations in 1h window |
| `rippled_validation_agreement{metric="missed_1h"}` | Int64 | `metric` | Missed validations in 1h window |
| `rippled_validation_agreement{metric="agreements_24h"}` | Int64 | `metric` | Agreed validations in 24h window |
| `rippled_validation_agreement{metric="missed_24h"}` | Int64 | `metric` | Missed validations in 24h window |

| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------------------- | ------ | -------- | --------------------------------------- |
| `xrpld_validation_agreement{metric="agreement_pct_1h"}` | Double | `metric` | Rolling 1h agreement percentage (0-100) |
| `xrpld_validation_agreement{metric="agreement_pct_24h"}` | Double | `metric` | Rolling 24h agreement percentage |
| `xrpld_validation_agreement{metric="agreements_1h"}` | Int64 | `metric` | Agreed validations in 1h window |
| `xrpld_validation_agreement{metric="missed_1h"}` | Int64 | `metric` | Missed validations in 1h window |
| `xrpld_validation_agreement{metric="agreements_24h"}` | Int64 | `metric` | Agreed validations in 24h window |
| `xrpld_validation_agreement{metric="missed_24h"}` | Int64 | `metric` | Missed validations in 24h window |

Data source: `ValidationTracker` class with 8s grace period and 5m late repair window.

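The percentage and count samples are related by simple window arithmetic. The Python sketch below shows one plausible way to derive them, assuming `agreement_pct = 100 * agreements / (agreements + missed)` over a sliding time window. It is not the `ValidationTracker` implementation: the 8s grace period and 5m late repair window are deliberately not modeled, and `RollingAgreement` is a hypothetical name.

```python
from collections import deque


class RollingAgreement:
    """Rolling agreed/missed counts over a fixed time window (seconds)."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # (timestamp, agreed) in arrival order

    def record(self, timestamp: float, agreed: bool) -> None:
        self.events.append((timestamp, agreed))

    def snapshot(self, now: float) -> dict:
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        agreements = sum(1 for _, ok in self.events if ok)
        missed = len(self.events) - agreements
        total = agreements + missed
        pct = 100.0 * agreements / total if total else 0.0
        return {"agreements": agreements, "missed": missed, "agreement_pct": pct}
```

One instance with a 3600-second window and another with an 86400-second window would correspond to the `_1h` and `_24h` samples respectively.
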
#### Validator Health (Observable Gauge — `validator_health`)

| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------ | ------ | -------- | ------------------------------ |
| `rippled_validator_health{metric="amendment_blocked"}` | Int64 | `metric` | 1 if amendment-blocked, else 0 |
| `rippled_validator_health{metric="unl_blocked"}` | Int64 | `metric` | 1 if UNL-blocked, else 0 |
| `rippled_validator_health{metric="unl_expiry_days"}` | Double | `metric` | Days until UNL list expires |
| `rippled_validator_health{metric="validation_quorum"}` | Int64 | `metric` | Validation quorum threshold |

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------- | ------ | -------- | ------------------------------ |
| `xrpld_validator_health{metric="amendment_blocked"}` | Int64 | `metric` | 1 if amendment-blocked, else 0 |
| `xrpld_validator_health{metric="unl_blocked"}` | Int64 | `metric` | 1 if UNL-blocked, else 0 |
| `xrpld_validator_health{metric="unl_expiry_days"}` | Double | `metric` | Days until UNL list expires |
| `xrpld_validator_health{metric="validation_quorum"}` | Int64 | `metric` | Validation quorum threshold |

#### Peer Quality (Observable Gauge — `peer_quality`)
|
||||
|
||||
| Prometheus Metric | Type | Labels | Description |
|
||||
| --------------------------------------------------------- | ------ | -------- | ------------------------------------ |
|
||||
| `rippled_peer_quality{metric="peer_latency_p90_ms"}` | Double | `metric` | P90 peer latency in milliseconds |
|
||||
| `rippled_peer_quality{metric="peers_insane_count"}` | Int64 | `metric` | Peers with diverged tracking status |
|
||||
| `rippled_peer_quality{metric="peers_higher_version_pct"}` | Double | `metric` | % of peers on newer rippled version |
|
||||
| `rippled_peer_quality{metric="upgrade_recommended"}` | Int64 | `metric` | 1 if >60% of peers are newer version |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------------------- | ------ | -------- | ------------------------------------ |
| `xrpld_peer_quality{metric="peer_latency_p90_ms"}` | Double | `metric` | P90 peer latency in milliseconds |
| `xrpld_peer_quality{metric="peers_insane_count"}` | Int64 | `metric` | Peers with diverged tracking status |
| `xrpld_peer_quality{metric="peers_higher_version_pct"}` | Double | `metric` | % of peers on newer xrpld version |
| `xrpld_peer_quality{metric="upgrade_recommended"}` | Int64 | `metric` | 1 if >60% of peers are newer version |
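
The relationship between `peers_higher_version_pct` and `upgrade_recommended` can be sketched as follows. This is only an illustration of the ">60%" rule stated in the table: the function name, the tuple representation of versions, and the handling of an empty peer list are assumptions, not the node's actual implementation.

```python
def peer_version_summary(own_version, peer_versions, threshold_pct=60.0):
    """Summarise how many peers run a newer version than this node.

    Hypothetical sketch: versions are compared as tuples such as (2, 4, 0).
    The 60% threshold comes from the table above; everything else is assumed.
    """
    if not peer_versions:
        # No peers: report 0% and no recommendation (arbitrary choice).
        return {"peers_higher_version_pct": 0.0, "upgrade_recommended": 0}
    higher = sum(1 for v in peer_versions if v > own_version)
    pct = 100.0 * higher / len(peer_versions)
    # Strictly greater than the threshold, matching ">60%".
    return {"peers_higher_version_pct": pct, "upgrade_recommended": int(pct > threshold_pct)}


# Made-up example: 7 of 10 peers are on a newer version.
peers = [(2, 4, 0)] * 7 + [(2, 3, 0)] * 3
print(peer_version_summary((2, 3, 0), peers))
```

Note that exactly 60% does not trigger the recommendation in this sketch, because the comparison is strict.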

#### Ledger Economy (Observable Gauge — `ledger_economy`)

| Prometheus Metric | Type | Labels | Description |
| ----------------------------------------------------- | ------ | -------- | ---------------------------------- |
| `rippled_ledger_economy{metric="base_fee_xrp"}` | Double | `metric` | Base transaction fee in drops |
| `rippled_ledger_economy{metric="reserve_base_xrp"}` | Double | `metric` | Account reserve in drops |
| `rippled_ledger_economy{metric="reserve_inc_xrp"}` | Double | `metric` | Owner reserve increment in drops |
| `rippled_ledger_economy{metric="ledger_age_seconds"}` | Double | `metric` | Seconds since last validated close |
| `rippled_ledger_economy{metric="transaction_rate"}` | Double | `metric` | Smoothed transaction rate (tx/s) |
| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------------- | ------ | -------- | ---------------------------------- |
| `xrpld_ledger_economy{metric="base_fee_xrp"}` | Double | `metric` | Base transaction fee in drops |
| `xrpld_ledger_economy{metric="reserve_base_xrp"}` | Double | `metric` | Account reserve in drops |
| `xrpld_ledger_economy{metric="reserve_inc_xrp"}` | Double | `metric` | Owner reserve increment in drops |
| `xrpld_ledger_economy{metric="ledger_age_seconds"}` | Double | `metric` | Seconds since last validated close |
| `xrpld_ledger_economy{metric="transaction_rate"}` | Double | `metric` | Smoothed transaction rate (tx/s) |
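
Because the fee and reserve metric names end in `_xrp` while their descriptions state drops, a dashboard or script may need to convert between the two units. A minimal sketch, using the fixed ratio of 1,000,000 drops per XRP; the function names are illustrative and the sample values are made up.

```python
DROPS_PER_XRP = 1_000_000  # fixed ratio on the XRP Ledger


def drops_to_xrp(drops: int) -> float:
    """Convert an integer number of drops to XRP."""
    return drops / DROPS_PER_XRP


def xrp_to_drops(xrp: float) -> int:
    """Convert XRP to drops, rounding to the nearest whole drop."""
    return round(xrp * DROPS_PER_XRP)


# Made-up sample values, not readings from a real node.
print(drops_to_xrp(1_000_000))
print(xrp_to_drops(0.00001))
```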

#### State Tracking (Observable Gauge — `state_tracking`)

| Prometheus Metric | Type | Labels | Description |
| ---------------------------------------------------------------- | ------ | -------- | -------------------------------------- |
| `rippled_state_tracking{metric="state_value"}` | Int64 | `metric` | Numeric state 0-6 (see encoding below) |
| `rippled_state_tracking{metric="time_in_current_state_seconds"}` | Double | `metric` | Duration in current state |
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------------------------------- | ------ | -------- | -------------------------------------- |
| `xrpld_state_tracking{metric="state_value"}` | Int64 | `metric` | Numeric state 0-6 (see encoding below) |
| `xrpld_state_tracking{metric="time_in_current_state_seconds"}` | Double | `metric` | Duration in current state |

State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full, 5=validating (FULL + validating), 6=proposing (FULL + proposing).
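
The encoding above maps directly onto a lookup table. The sketch below decodes a `state_value` sample into its name and checks whether the node is at least FULL; the helper names are hypothetical, and treating values 4 through 6 as "fully synced" follows from the note that 5 and 6 are FULL plus an extra role.

```python
# Mapping taken from the state value encoding listed above.
STATE_NAMES = {
    0: "disconnected",
    1: "connected",
    2: "syncing",
    3: "tracking",
    4: "full",
    5: "validating",  # FULL + validating
    6: "proposing",   # FULL + proposing
}


def decode_state(value: int) -> str:
    """Return the state name, or a placeholder for values outside 0-6."""
    return STATE_NAMES.get(value, f"unknown({value})")


def is_fully_synced(value: int) -> bool:
    """True for full, validating, and proposing (values 4 through 6)."""
    return 4 <= value <= 6


print(decode_state(5), is_fully_synced(5))
```

A lookup with an explicit fallback keeps alert annotations readable even if a future release adds a state this table does not know about.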

#### Storage Detail (Observable Gauge — `storage_detail`)

| Prometheus Metric | Type | Labels | Description |
| --------------------------------------------- | ----- | -------- | ---------------------- |
| `rippled_storage_detail{metric="nudb_bytes"}` | Int64 | `metric` | NuDB backend file size |
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------------- | ----- | -------- | ---------------------- |
| `xrpld_storage_detail{metric="nudb_bytes"}` | Int64 | `metric` | NuDB backend file size |

#### Synchronous Counters (Phase 7+)

| Prometheus Metric | Type | Description | Increment Site |
| ------------------------------------- | ------- | -------------------------------- | --------------------- |
| `rippled_ledgers_closed_total` | Counter | Ledgers closed by consensus | RCLConsensus.cpp |
| `rippled_validations_sent_total` | Counter | Validations sent | RCLConsensus.cpp |
| `rippled_validations_checked_total` | Counter | Network validations observed | LedgerMaster.cpp |
| `rippled_validation_agreements_total` | Counter | Cumulative validation agreements | ValidationTracker.cpp |
| `rippled_validation_missed_total` | Counter | Cumulative validation misses | ValidationTracker.cpp |
| `rippled_state_changes_total` | Counter | Operating mode transitions | NetworkOPs.cpp |
| `rippled_jq_trans_overflow_total` | Counter | Job queue transaction overflows | JobQueue.cpp |
| Prometheus Metric | Type | Description | Increment Site |
| ----------------------------------- | ------- | -------------------------------- | --------------------- |
| `xrpld_ledgers_closed_total` | Counter | Ledgers closed by consensus | RCLConsensus.cpp |
| `xrpld_validations_sent_total` | Counter | Validations sent | RCLConsensus.cpp |
| `xrpld_validations_checked_total` | Counter | Network validations observed | LedgerMaster.cpp |
| `xrpld_validation_agreements_total` | Counter | Cumulative validation agreements | ValidationTracker.cpp |
| `xrpld_validation_missed_total` | Counter | Cumulative validation misses | ValidationTracker.cpp |
| `xrpld_state_changes_total` | Counter | Operating mode transitions | NetworkOPs.cpp |
| `xrpld_jq_trans_overflow_total` | Counter | Job queue transaction overflows | JobQueue.cpp |

#### Span Attribute Enrichments (Phases 2-4)

@@ -997,29 +997,29 @@ State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full

### New Grafana Dashboards (Phase 9)

| Dashboard | UID | Data Source | Key Panels |
| ---------------------- | -------------------------- | ----------- | --------------------------------------------------------- |
| Fee Market & TxQ | `rippled-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown |
| Job Queue Analysis | `rippled-job-queue` | Prometheus | Per-job rates, queue wait times, execution times |
| RPC Performance (OTel) | `rippled-rpc-perf` | Prometheus | Per-method call rates, error rates, latency distributions |
| Validator Health | `rippled-validator-health` | Prometheus | Agreement %, validation rate, amendment/UNL, state |
| Peer Quality | `rippled-peer-quality` | Prometheus | P90 latency, insane peers, version awareness, disconnects |
| Dashboard | UID | Data Source | Key Panels |
| ---------------------- | ------------------------ | ----------- | --------------------------------------------------------- |
| Fee Market & TxQ | `xrpld-fee-market` | Prometheus | TxQ depth/capacity, fee levels, load factor breakdown |
| Job Queue Analysis | `xrpld-job-queue` | Prometheus | Per-job rates, queue wait times, execution times |
| RPC Performance (OTel) | `xrpld-rpc-perf` | Prometheus | Per-method call rates, error rates, latency distributions |
| Validator Health | `xrpld-validator-health` | Prometheus | Agreement %, validation rate, amendment/UNL, state |
| Peer Quality | `xrpld-peer-quality` | Prometheus | P90 latency, insane peers, version awareness, disconnects |

### Updated Grafana Dashboards (Phase 9)

| Dashboard | UID | New Panels Added |
| -------------------- | ---------------------------- | -------------------------------------------------------------------- |
| Node Health (StatsD) | `rippled-statsd-node-health` | NodeStore I/O, cache hit rates, object instance counts |
| System Node Health | `rippled-system-node-health` | Ledger economy row: base fee, reserves, ledger age, transaction rate |
| Dashboard | UID | New Panels Added |
| -------------------- | -------------------------- | -------------------------------------------------------------------- |
| Node Health (StatsD) | `xrpld-statsd-node-health` | NodeStore I/O, cache hit rates, object instance counts |
| System Node Health | `xrpld-system-node-health` | Ledger economy row: base fee, reserves, ledger age, transaction rate |

### New Grafana Dashboards (Phase 11)

| Dashboard | UID | Data Source | Key Panels |
| ------------------ | ----------------------------- | ----------- | ---------------------------------------------------------------------- |
| Validator Health | `rippled-validator-health` | Prometheus | Server state timeline, proposer count, converge time, amendment voting |
| Network Topology | `rippled-network-topology` | Prometheus | Peer count, version distribution, latency distribution, diverged peers |
| Fee Market (Ext) | `rippled-fee-market-external` | Prometheus | Fee levels, queue depth, load factor breakdown, escalation timeline |
| DEX & AMM Overview | `rippled-dex-amm` | Prometheus | AMM TVL, order book depth, spread trends, trading fee revenue |
| Dashboard | UID | Data Source | Key Panels |
| ------------------ | --------------------------- | ----------- | ---------------------------------------------------------------------- |
| Validator Health | `xrpld-validator-health` | Prometheus | Server state timeline, proposer count, converge time, amendment voting |
| Network Topology | `xrpld-network-topology` | Prometheus | Peer count, version distribution, latency distribution, diverged peers |
| Fee Market (Ext) | `xrpld-fee-market-external` | Prometheus | Fee levels, queue depth, load factor breakdown, escalation timeline |
| DEX & AMM Overview | `xrpld-dex-amm` | Prometheus | AMM TVL, order book depth, spread trends, trading fee revenue |

### Prometheus Alerting Rules (Phase 11)

@@ -1044,8 +1044,8 @@ State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full
| Issue | Impact | Status |
| ------------------------------------------------------------------ | ------------------------------------------------ | -------------------------------------------------------------------- |
| `warn` and `drop` metrics use non-standard StatsD `\|m` meter type | Metrics silently dropped by OTel StatsD receiver | Phase 6 Task 6.1 — needs `\|m` → `\|c` change in StatsDCollector.cpp |
| `rippled_job_count` may not emit in standalone mode | Missing from Prometheus in some test configs | Requires active job queue activity |
| `rippled_rpc_requests` depends on `[insight]` config | Zero series if StatsD not configured | Requires `[insight] server=statsd` in xrpld.cfg |
| `xrpld_job_count` may not emit in standalone mode | Missing from Prometheus in some test configs | Requires active job queue activity |
| `xrpld_rpc_requests` depends on `[insight]` config | Zero series if StatsD not configured | Requires `[insight] server=statsd` in xrpld.cfg |
| Peer tracing disabled by default | No `peer.*` spans unless `trace_peer=1` | Intentional — high volume on mainnet |
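
To make the first row concrete: a StatsD line has the shape `name:value|type`, optionally followed by further `|`-separated fields such as a sample rate. The sketch below shows the effect of changing the meter type `m` to the counter type `c` on such a line. It is an illustration only; the metric names are made up, and the actual fix described in the table belongs in StatsDCollector.cpp, not in a post-processing script.

```python
def meter_to_counter(line: str) -> str:
    """Rewrite a StatsD line of type 'm' (meter) to type 'c' (counter).

    Lines of any other type, or lines without a type field, are returned unchanged.
    """
    name_value, sep, rest = line.partition("|")
    if not sep:
        return line  # no type field present
    fields = rest.split("|")
    if fields[0] == "m":
        fields[0] = "c"
    return name_value + "|" + "|".join(fields)


# Hypothetical metric names, used only for illustration.
print(meter_to_counter("xrpld.warn:1|m"))
print(meter_to_counter("xrpld.job_count:5|g"))
```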

---
@@ -1077,7 +1077,7 @@ enabled=1
[insight]
server=statsd
address=127.0.0.1:8125
prefix=rippled
prefix=xrpld
```

### Production Setup
@@ -1094,7 +1094,7 @@ max_queue_size=4096
[insight]
server=statsd
address=otel-collector:8125
prefix=rippled
prefix=xrpld
```

### Trace Category Toggle