@@ -78,22 +78,45 @@ There are two independent telemetry pipelines entering a single **OTel Collector
## 1. OpenTelemetry Spans
### 1.1 Complete Span Inventory (16 spans)
### 1.1 Complete Span Inventory (~37 spans)
> **See also**: [02-design-decisions.md §2.3](./02-design-decisions.md#23-span-naming-conventions) for naming conventions and the full span catalog with rationale. [04-code-samples.md §4.6](./04-code-samples.md#46-span-flow-visualization) for span flow diagrams.
> **Span names vs. attribute keys**: span names use dotted `subsystem.operation`
> form (e.g. `rpc.http_request`). Span _attribute_ keys use the bare/underscore
> form from the 2026-05-13 naming redesign (e.g. `tx_hash`, not `xrpl.tx.hash`).
> The dotted `xrpl.*` form is reserved for OTel **resource** attributes set once
> at startup. See §1.2 for the full attribute inventory.
#### RPC Spans
Controlled by `trace_rpc=1` in `[telemetry]` config.
| Span Name | Parent | Source File | Description |
| -------------------- | ------------- | ----------------- | ------------------------------------------------------------------------ |
| `rpc.request` | — | ServerHandler.cpp | Top-level HTTP RPC request entry point |
| `rpc.process` | `rpc.request` | ServerHandler.cpp | RPC processing pipeline |
| `rpc.ws_message` | — | ServerHandler.cpp | WebSocket message handling |
| `rpc.command.<name>` | `rpc.process` | RPCHandler.cpp | Per-command span (e.g., `rpc.command.server_info`, `rpc.command.ledger`) |
| Span Name | Parent | Source File | Description |
| -------------------- | ------------------ | ----------------- | ------------------------------------------------------------------------ |
| `rpc.http_request` | — | ServerHandler.cpp | Top-level HTTP JSON-RPC request entry point |
| `rpc.ws_message` | — | ServerHandler.cpp | WebSocket message handling (one per inbound frame) |
| `rpc.ws_upgrade` | — | ServerHandler.cpp | WebSocket upgrade handshake (records handshake failures) |
| `rpc.process` | `rpc.http_request` | ServerHandler.cpp | RPC processing pipeline (single or batch request) |
| `rpc.command.<name>` | `rpc.process` | RPCHandler.cpp | Per-command span (e.g., `rpc.command.server_info`, `rpc.command.ledger`) |
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"rpc.request|rpc.command.*"}`
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"rpc.http_request|rpc.command.*"}`
**Grafana dashboard**: _RPC Performance_ (`xrpld-rpc-perf`)
#### gRPC Spans
Controlled by `trace_rpc=1` in `[telemetry]` config.
| Span Name | Parent | Source File | Description |
| ------------------- | ------ | -------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `grpc.<MethodName>` | — | GRPCServer.cpp | One flat span per gRPC method (e.g., `grpc.GetLedger`, `grpc.GetLedgerData`, `grpc.GetLedgerDiff`, `grpc.GetLedgerEntry`) |
The method name is embedded in the span name (formed at the call site as
`grpc.<MethodName>`), so dashboards break out per-method latency and error
rates without TraceQL attribute filters.
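As a rough illustration of the naming rule above, the following Python sketch forms a `grpc.<MethodName>` span name and recovers the method from it, which is what lets a dashboard group by span name alone. The helper names (`grpc_span_name`, `method_from_span_name`) are hypothetical and are not part of xrpld; the real span name is formed at the call site in `GRPCServer.cpp`.

```python
from typing import Optional

GRPC_PREFIX = "grpc."


def grpc_span_name(method: str) -> str:
    """Build the flat span name for one gRPC method, e.g. 'grpc.GetLedger'."""
    # Assumption: method names are bare identifiers without dots.
    if not method or "." in method:
        raise ValueError(f"unexpected gRPC method name: {method!r}")
    return GRPC_PREFIX + method


def method_from_span_name(span_name: str) -> Optional[str]:
    """Return the method embedded in a gRPC span name, or None for other spans."""
    if not span_name.startswith(GRPC_PREFIX):
        return None
    return span_name[len(GRPC_PREFIX):]
```

Because the method is recoverable from the name itself, a per-method breakdown only needs the `span_name` label and no extra attribute filter.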
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"grpc.*"}`
**Grafana dashboard**: _RPC Performance_ (`xrpld-rpc-perf`)
@@ -121,17 +144,46 @@ or, for the apply pipeline: `{resource.service.name="xrpld" && name=~"tx.preflig
**Grafana dashboard**: _Transaction Overview_ (`xrpld-transactions`)
#### Transaction Queue (TxQ) Spans
Controlled by `trace_transactions=1` in `[telemetry]` config.
| Span Name | Parent | Source File | Description |
| ------------------ | ------------- | ----------- | --------------------------------------------------- |
| `txq.enqueue` | `tx.process` | TxQ.cpp | Enqueue decision when a tx is submitted |
| `txq.apply_direct` | `txq.enqueue` | TxQ.cpp | Direct apply attempt that bypasses the queue |
| `txq.batch_clear` | `txq.enqueue` | TxQ.cpp | Batch clear of an account's queued txs |
| `txq.accept` | — | TxQ.cpp | Ledger-close accept loop (drains the queue) |
| `txq.accept.tx` | `txq.accept` | TxQ.cpp | Per-queued-transaction apply inside the accept loop |
| `txq.cleanup` | — | TxQ.cpp | Post-close cleanup of expired queue entries |
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"txq.*"}`
**Grafana dashboard**: _Transaction Overview_ (`xrpld-transactions`)
#### Consensus Spans
Controlled by `trace_consensus=1` in `[telemetry]` config.
| Span Name | Parent | Source File | Description |
| --------------------------- | ------ | ---------------- | --------------------------------------------- |
| `consensus.proposal.send` | — | RCLConsensus.cpp | Node broadcasts its transaction set proposal |
| `consensus.ledger_close` | — | RCLConsensus.cpp | Ledger close event triggered by consensus |
| `consensus.accept` | — | RCLConsensus.cpp | Consensus accepts a ledger (round complete) |
| `consensus.validation.send` | — | RCLConsensus.cpp | Validation message sent after ledger accepted |
| `consensus.accept.apply` | — | RCLConsensus.cpp | Ledger application with close time details |
| Span Name | Parent | Source File | Description |
| ------------------------------ | ------------------ | ---------------- | ------------------------------------------------------------------- |
| `consensus.round` | — (root) | RCLConsensus.cpp | Root span for one consensus round (deterministic trace per round) |
| `consensus.phase.open` | `consensus.round` | Consensus.h | Open phase — collecting transactions before close |
| `consensus.proposal.send` | `consensus.round` | RCLConsensus.cpp | Node broadcasts its transaction set proposal |
| `consensus.ledger_close` | `consensus.round` | RCLConsensus.cpp | Ledger close event triggered by consensus |
| `consensus.establish` | `consensus.round` | Consensus.h | Establish phase — converging on the transaction set |
| `consensus.update_positions` | `consensus.round` | Consensus.h | Position update with per-dispute vote details |
| `consensus.check` | `consensus.round` | Consensus.h | Consensus threshold check (agree/disagree tally) |
| `consensus.accept` | `consensus.round` | RCLConsensus.cpp | Consensus accepts a ledger (round complete) |
| `consensus.accept.apply` | `consensus.accept` | RCLConsensus.cpp | Ledger application with close-time details (jtACCEPT thread) |
| `consensus.validation.send` | `consensus.round` | RCLConsensus.cpp | Validation message sent after ledger accepted (follows-from link) |
| `consensus.mode_change` | `consensus.round` | RCLConsensus.cpp | Operating-mode transition during the round |
| `consensus.proposal.receive` | (context) | PeerImp.cpp | Proposal received from a peer (context-propagated into the round) |
| `consensus.validation.receive` | (context) | PeerImp.cpp | Validation received from a peer (context-propagated into the round) |
The `.receive` spans are created per-message in the overlay and joined to the
round trace via context propagation rather than direct parenting. The
`consensus.validation.send` span uses a follows-from link off the round.
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"consensus.*"}`
@@ -141,11 +193,12 @@ Controlled by `trace_consensus=1` in `[telemetry]` config.
Controlled by `trace_ledger=1` in `[telemetry]` config.
| Span Name | Parent | Source File | Description |
| ----------------- | ------ | ---------------- | ---------------------------------------------- |
| `ledger.build` | — | BuildLedger.cpp | Build new ledger from accepted transaction set |
| `ledger.validate` | — | LedgerMaster.cpp | Ledger promoted to validated status |
| `ledger.store` | — | LedgerMaster.cpp | Ledger stored to database/history |
| Span Name | Parent | Source File | Description |
| ----------------- | ------ | ----------------- | ---------------------------------------------- |
| `ledger.build` | — | BuildLedger.cpp | Build new ledger from accepted transaction set |
| `ledger.validate` | — | LedgerMaster.cpp | Ledger promoted to validated status |
| `ledger.store` | — | LedgerMaster.cpp | Ledger stored to database/history |
| `ledger.acquire` | — | InboundLedger.cpp | Fetch a missing ledger from peers |
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"ledger.*"}`
@@ -164,88 +217,206 @@ Controlled by `trace_peer=1` in `[telemetry]` config. **Disabled by default** (h
**Grafana dashboard**: _Peer Network_ (`xrpld-peer-net`)
#### PathFind Spans
Controlled by `trace_rpc=1` in `[telemetry]` config.
| Span Name | Parent | Source File | Description |
| --------------------- | ------------------ | --------------- | ---------------------------------------------------------- |
| `pathfind.request` | `rpc.command.*` | PathRequest.cpp | `path_find` / `ripple_path_find` RPC entry |
| `pathfind.compute` | `pathfind.request` | PathRequest.cpp | Path computation for one request (`PathRequest::doUpdate`) |
| `pathfind.discover` | `pathfind.compute` | Pathfinder.cpp | Graph exploration (one per RPC call) |
| `pathfind.update_all` | — | PathRequest.cpp | Async recomputation of all active requests at ledger close |
**Where to find**: Tempo → TraceQL: `{resource.service.name="xrpld" && name=~"pathfind.*"}`
---
### 1.2 Complete Attribute Inventory (22 attributes)
### 1.2 Complete Attribute Inventory (bare/underscore keys)
> **See also**: [02-design-decisions.md §2.4.2](./02-design-decisions.md#242-span-attributes-by-category) for attribute design rationale and privacy considerations.
Every span can carry key-value attributes that provide context for filtering and aggregation.
Every span can carry key-value attributes that provide context for filtering and
aggregation. Per the 2026-05-13 naming redesign, span-attribute keys use the
**bare** field name (the span name already carries the domain), or the
`<domain>_<field>` underscore form where a bare name would collide (e.g.
`rpc_status`, `grpc_status`, `tx_status`, `txq_status`).
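The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set of colliding field names is hypothetical (the text names only `status` as an example), the helper `attribute_key` does not exist in xrpld, and the real keys are chosen by hand as listed in the tables below.

```python
# Hypothetical set of field names shared by several domains. The text only
# gives `status` as an example (rpc_status, grpc_status, tx_status, txq_status).
COLLIDING_FIELDS = {"status"}


def attribute_key(span_name: str, field: str) -> str:
    """Return the span-attribute key for `field` on a span named `span_name`."""
    # The domain is the first dotted segment: "rpc.command.ledger" -> "rpc".
    domain = span_name.split(".", 1)[0]
    if field in COLLIDING_FIELDS:
        # A bare name would be ambiguous across domains, so prefix it.
        return f"{domain}_{field}"
    # Otherwise the span name already carries the domain; keep the bare field.
    return field
```

For example, `attribute_key("txq.enqueue", "status")` yields `txq_status`, while `attribute_key("tx.process", "local")` stays `local`.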
> **Dotted exceptions** (do not confuse with span attributes):
>
> - `xrpl.ledger.hash` is the **only** dotted span attribute. It is a shared
> constant set on `peer.validation.receive`. Note that `consensus.validation.send`
> uses the **bare** `ledger_hash` instead.
> - `xrpl.network.id` and `xrpl.network.type` are **resource** attributes set
> once at startup on the OTel resource — not span attributes. They appear on
> every span's resource scope, queried as `{resource.xrpl.network.id=...}`.
#### RPC Attributes
| Attribute | Type | Set On | Description |
| --------------- | ------ | --------------- | ------------------------------------------------ |
| `command` | string | `rpc.command.*` | RPC command name (e.g., `server_info`, `ledger`) |
| `version` | int64 | `rpc.command.*` | API version number |
| `rpc_role` | string | `rpc.command.*` | Caller role: `"admin"` or `"user"` |
| `rpc_status` | string | `rpc.command.*` | Result: `"success"` or `"error"` |
| `duration_ms` | int64 | `rpc.command.*` | Command execution time in milliseconds |
| `error_message` | string | `rpc.command.*` | Error details (only set on failure) |
| Attribute | Type | Set On | Description |
| ---------------------- | ------- | --------------------------------- | ------------------------------------------------ |
| `command` | string | `rpc.command.*`, `rpc.ws_message` | RPC command name (e.g., `server_info`, `ledger`) |
| `version` | int64 | `rpc.command.*` | API version number |
| `rpc_role` | string | `rpc.command.*` | Caller role: `"admin"` or `"user"` |
| `rpc_status` | string | `rpc.command.*` | Result: `"success"` or `"error"` |
| `request_payload_size` | int64 | `rpc.http_request` | Bytes of inbound request payload |
| `is_batch` | boolean | `rpc.process` | `true` if the request is a JSON-RPC batch |
| `batch_size` | int64 | `rpc.process` | Number of sub-requests in a batch |
| `load_type` | string | `rpc.command.*` | Resource cost category after execution |
**Tempo query**: `{span.command="server_info"}` to find all `server_info` calls.
**Prometheus label**: `xrpl_rpc_command` (dots converted to underscores by SpanMetrics).
**Prometheus label**: `command` (used as a SpanMetrics dimension).
#### gRPC Attributes
| Attribute | Type | Set On | Description |
| ------------- | ------ | ------------------- | ------------------------------------ |
| `method` | string | `grpc.<MethodName>` | gRPC method name (e.g., `GetLedger`) |
| `grpc_role` | string | `grpc.<MethodName>` | Caller role: `"admin"` or `"user"` |
| `grpc_status` | string | `grpc.<MethodName>` | Result: `"success"` or `"error"` |
**Tempo query**: `{span.method="GetLedger"}` or `{name="grpc.GetLedger"}`.
**Prometheus labels**: `method`, `grpc_role`, `grpc_status` (SpanMetrics dimensions).
#### Transaction Attributes
| Attribute | Type | Set On | Description |
| ------------------- | ------- | ---------------------------------------------- | --------------------------------------------------------------------- |
| `xrpl.tx.hash` | string | `tx.process`, `tx.receive` | Transaction hash (hex-encoded) |
| `local` | boolean | `tx.process` | `true` if locally submitted, `false` if peer-relayed |
| `path` | string | `tx.process` | Submission path: `"sync"` or `"async"` |
| `suppressed` | boolean | `tx.receive` | `true` if transaction was suppressed (duplicate) |
| `tx_status` | string | `tx.receive` | Transaction status (e.g., `"known_bad"`) |
| `xrpl.peer.id` | int64 | `tx.receive` | Peer identifier (also set on peer spans) |
| `xrpl.peer.version` | string | `tx.receive` | Peer protocol version string |
| `stage` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` | Apply-pipeline stage: `preflight`, `preclaim`, or `apply` |
| `tx_type` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` | Transaction type name (e.g., `Payment`) |
| `ter_result` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` | Engine result token for that stage (e.g., `tesSUCCESS`, `terPRE_SEQ`) |
| `applied` | boolean | `tx.transactor` | `true` if the transaction was applied to the ledger |
| Attribute | Type | Set On | Description |
| -------------- | ------- | ------------------------------------------------------------ | --------------------------------------------------------------------- |
| `tx_hash` | string | `tx.process`, `tx.receive` | Transaction hash (hex-encoded) |
| `local` | boolean | `tx.process` | `true` if locally submitted, `false` if peer-relayed |
| `path` | string | `tx.process` | Submission path: `"sync"` or `"async"` |
| `tx_type` | string | `tx.process`, `tx.preflight`, `tx.preclaim`, `tx.transactor` | Transaction type name (e.g., `Payment`) |
| `fee` | int64 | `tx.process` | Transaction fee in drops |
| `sequence` | int64 | `tx.process` | Transaction sequence number |
| `suppressed` | boolean | `tx.receive` | `true` if transaction was suppressed (duplicate) |
| `tx_status` | string | `tx.receive` | Transaction status (e.g., `"known_bad"`) |
| `peer_id` | int64 | `tx.receive` | Peer identifier (also set on peer spans) |
| `peer_version` | string | `tx.receive` | Peer protocol version string |
| `stage` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` | Apply-pipeline stage: `preflight`, `preclaim`, or `apply` |
| `ter_result` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` | Engine result token for that stage (e.g., `tesSUCCESS`, `terPRE_SEQ`) |
| `applied` | boolean | `tx.transactor` | `true` if the transaction was applied to the ledger |
**Tempo query**: `{span.xrpl.tx.hash="<hash>"}` to trace a specific transaction across nodes.
**Tempo query**: `{span.tx_hash="<hash>"}` to trace a specific transaction across nodes.
**Prometheus label**: `xrpl_tx_local` (used as SpanMetrics dimension).
**Prometheus labels**: `local`, `suppressed`, `tx_type`, `ter_result`, `stage` (SpanMetrics dimensions).
#### Transaction Queue (TxQ) Attributes
| Attribute | Type | Set On | Description |
| -------------------- | ------- | ------------------------------ | ----------------------------------------------------------- |
| `tx_hash` | string | `txq.enqueue`, `txq.accept.tx` | Transaction hash |
| `tx_type` | string | `txq.enqueue` | Transaction type name |
| `txq_status` | string | `txq.enqueue`, `txq.accept.tx` | Queue outcome (e.g. `queued`, `applied_direct`, `rejected`) |
| `fee_level_paid` | int64 | `txq.enqueue` | Fee level paid by the queued tx |
| `required_fee_level` | int64 | `txq.enqueue` | Minimum fee level for inclusion |
| `num_cleared` | int64 | `txq.batch_clear` | Entries cleared in a batch |
| `queue_size` | int64 | `txq.accept` | Current TxQ depth |
| `ledger_changed` | boolean | `txq.accept` | Whether the ledger changed since last attempt |
| `ter_code` | int64 | `txq.accept.tx` | Transaction engine result code |
| `retries_remaining` | int64 | `txq.accept.tx` | Retries left before discard |
| `ledger_seq` | int64 | `txq.cleanup` | Ledger sequence number |
| `expired_count` | int64 | `txq.cleanup` | Number of expired entries cleared |
**Prometheus label**: `txq_status` (SpanMetrics dimension).
#### Consensus Attributes
| Attribute | Type | Set On | Description |
| ------------------------------------ | ------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------- |
| `xrpl.consensus.round` | int64 | `consensus.proposal.send` | Consensus round number |
| `xrpl.consensus.mode` | string | `consensus.proposal.send`, `consensus.ledger_close` | Node mode: `"syncing"`, `"tracking"`, `"full"`, `"proposing"` |
| `xrpl.consensus.proposers` | int64 | `consensus.proposal.send`, `consensus.accept` | Number of proposers in the round |
| `xrpl.consensus.proposing` | boolean | `consensus.validation.send` | Whether this node was a proposer |
| `xrpl.consensus.ledger.seq` | int64 | `consensus.ledger_close`, `consensus.accept`, `consensus.validation.send`, `consensus.accept.apply` | Ledger sequence number |
| `xrpl.consensus.close_time` | int64 | `consensus.accept.apply` | Agreed-upon ledger close time (epoch seconds) |
| `xrpl.consensus.close_time_correct` | boolean | `consensus.accept.apply` | Whether validators reached agreement on close time |
| `xrpl.consensus.close_resolution_ms` | int64 | `consensus.accept.apply` | Close time rounding granularity in milliseconds |
| `xrpl.consensus.state` | string | `consensus.accept.apply` | Consensus outcome: `"finished"` or `"moved_on"` |
| `xrpl.consensus.round_time_ms` | int64 | `consensus.accept.apply` | Total consensus round duration in milliseconds |
| Attribute | Type | Set On | Description |
| -------------------------- | ------- | -------------------------------------------------------------------------------------------------- | -------------------------------------------------------- |
| `consensus_ledger_id` | string | `consensus.round` | Previous-ledger id anchoring the round |
| `ledger_seq` | int64 | `consensus.round`, `consensus.ledger_close`, `consensus.accept.apply`, `consensus.validation.send` | Ledger sequence number |
| `consensus_mode` | string | `consensus.round`, `consensus.ledger_close` | Node mode: `"Proposing"`, `"Observing"`, `"Wrong"`, etc. |
| `consensus_round_id` | int64 | `consensus.round` | Round identifier |
| `consensus_phase` | string | `consensus.round` | Current phase name (updated on each transition) |
| `trace_strategy` | string | `consensus.round` | Trace-id strategy (`deterministic` / `random`) |
| `previous_ledger_seq` | int64 | `consensus.round` | Sequence of the previous ledger |
| `previous_proposers` | int64 | `consensus.round` | Proposer count in the previous round |
| `previous_round_time_ms` | int64 | `consensus.round` | Duration of the previous round |
| `consensus_round` | int64 | `consensus.proposal.send` | Proposal sequence number for the broadcast proposal |
| `is_bow_out` | boolean | `consensus.proposal.send` | Whether the proposal is a bow-out (resigning the round) |
| `tx_count_open` | int64 | `consensus.ledger_close` | Transactions in the open ledger at close |
| `close_time_resolution_ms` | int64 | `consensus.ledger_close` | Close-time rounding granularity |
| `converge_percent` | int64 | `consensus.establish`, `consensus.update_positions` | Convergence percentage |
| `establish_count` | int64 | `consensus.establish` | Establish-phase iteration count |
| `proposers` | int64 | `consensus.establish`, `consensus.update_positions`, `consensus.accept` | Number of proposers |
| `disputes_count` | int64 | `consensus.establish`, `consensus.update_positions` | Number of disputed transactions |
| `tx_id` | string | `consensus.update_positions` | Disputed transaction id (per-dispute event) |
| `dispute_our_vote` | boolean | `consensus.update_positions` | Our vote on the disputed tx |
| `dispute_yays` | int64 | `consensus.update_positions` | Yes votes on the disputed tx |
| `dispute_nays` | int64 | `consensus.update_positions` | No votes on the disputed tx |
| `agree_count` | int64 | `consensus.check` | Agreeing proposer count |
| `disagree_count` | int64 | `consensus.check` | Disagreeing proposer count |
| `threshold_percent` | int64 | `consensus.check` | Agreement threshold percentage |
| `consensus_result` | string | `consensus.check` | Check outcome |
| `quorum` | int64 | `consensus.check`, `consensus.accept` | Quorum required |
| `round_time_ms` | int64 | `consensus.accept`, `consensus.accept.apply` | Total consensus round duration in milliseconds |
| `consensus_state` | string | `consensus.accept.apply` | Consensus outcome: `"finished"` or `"moved_on"` |
| `close_time` | int64 | `consensus.accept.apply` | Agreed-upon ledger close time (epoch seconds) |
| `close_time_correct` | boolean | `consensus.accept.apply` | Whether validators agreed on close time |
| `close_resolution_ms` | int64 | `consensus.accept.apply` | Close-time rounding granularity in milliseconds |
| `proposing` | boolean | `consensus.accept.apply`, `consensus.validation.send` | Whether this node was a proposer |
| `parent_close_time` | int64 | `consensus.accept.apply` | Parent ledger close time |
| `close_time_self` | int64 | `consensus.accept.apply` | This node's close-time vote |
| `close_time_vote_bins` | string | `consensus.accept.apply` | Distribution of close-time votes |
| `resolution_direction` | string | `consensus.accept.apply` | Whether close resolution increased/decreased/unchanged |
| `tx_count` | int64 | `consensus.accept.apply` | Transactions in the accepted set |
| `ledger_hash` | string | `consensus.validation.send` | Full hash of the validated ledger (**bare**, not dotted) |
| `full_validation` | boolean | `consensus.validation.send` | Whether this is a full validation |
| `validation_sign_time` | int64 | `consensus.validation.send` | Validation signing time |
| `mode_old` | string | `consensus.mode_change` | Operating mode before the transition |
| `mode_new` | string | `consensus.mode_change` | Operating mode after the transition |
**Tempo query**: `{span.xrpl.consensus.mode="proposing"}` to find rounds where node was proposing.
**Tempo query**: `{span.consensus_mode="Proposing"}` to find rounds where the node was proposing.
**Prometheus label**: `xrpl_consensus_mode` (used as SpanMetrics dimension).
**Prometheus labels**: `consensus_mode`, `consensus_state`, `consensus_phase`, `consensus_result`, `consensus_stalled`, `mode_new`, `close_time_correct` (SpanMetrics dimensions).
#### Ledger Attributes
| Attribute | Type | Set On | Description |
| ------------------------- | ----- | ------------------------------------------------------------- | ---------------------------------------------- |
| `xrpl.ledger.seq` | int64 | `ledger.build`, `ledger.validate`, `ledger.store`, `tx.apply` | Ledger sequence number |
| `xrpl.ledger.validations` | int64 | `ledger.validate` | Number of validations received for this ledger |
| `xrpl.ledger.tx_count` | int64 | `ledger.build`, `tx.apply` | Transactions in the ledger |
| `xrpl.ledger.tx_failed` | int64 | `ledger.build`, `tx.apply` | Failed transactions in the ledger |
| Attribute | Type | Set On | Description |
| --------------------- | ------- | ------------------------------------------------- | ------------------------------------------------ |
| `ledger_seq` | int64 | `ledger.build`, `ledger.validate`, `ledger.store` | Ledger sequence number |
| `close_time` | int64 | `ledger.build` | Ledger close time (epoch seconds) |
| `close_time_correct` | boolean | `ledger.build` | Whether close time was agreed upon by validators |
| `close_resolution_ms` | int64 | `ledger.build` | Close time rounding granularity in milliseconds |
| `tx_count` | int64 | `tx.apply` | Transactions applied to the ledger |
| `tx_failed` | int64 | `tx.apply` | Failed transactions in the apply set |
| `validations` | int64 | `ledger.validate` | Number of validations received for this ledger |
| `acquire_reason` | string | `ledger.acquire` | Why the ledger fetch was triggered |
| `timeouts` | int64 | `ledger.acquire` | Number of fetch timeouts |
| `peer_count` | int64 | `ledger.acquire` | Peers queried during the fetch |
| `outcome` | string | `ledger.acquire` | Fetch outcome |
**Tempo query**: `{span.xrpl.ledger.seq=12345}` to find all spans for a specific ledger.
The apply-step span `tx.apply` (child of `ledger.build`) carries `tx_count`/`tx_failed`;
the parent `ledger.build` carries `ledger_seq` and the close-time attributes.
`ledger.acquire` (InboundLedger) also sets `ledger_seq`.
**Tempo query**: `{span.ledger_seq=12345}` to find all spans for a specific ledger.
#### Peer Attributes
| Attribute | Type | Set On | Description |
| ------------------------------ | ------- | ---------------------------------------------------------------- | ---------------------------------------------------- |
| `xrpl.peer.id` | int64 | `tx.receive`, `peer.proposal.receive`, `peer.validation.receive` | Peer identifier |
| `xrpl.peer.proposal.trusted` | boolean | `peer.proposal.receive` | Whether the proposal came from a trusted validator |
| `xrpl.peer.validation.trusted` | boolean | `peer.validation.receive` | Whether the validation came from a trusted validator |
| Attribute | Type | Set On | Description |
| -------------------- | ------- | ---------------------------------------------------------------- | ---------------------------------------------------- |
| `peer_id` | int64 | `tx.receive`, `peer.proposal.receive`, `peer.validation.receive` | Peer identifier |
| `proposal_trusted` | boolean | `peer.proposal.receive` | Whether the proposal came from a trusted validator |
| `validation_trusted` | boolean | `peer.validation.receive` | Whether the validation came from a trusted validator |
| `validation_full` | boolean | `peer.validation.receive` | Whether the validation is a full validation |
| `xrpl.ledger.hash` | string | `peer.validation.receive` | Validated ledger hash (**dotted** — shared constant) |
**Prometheus labels**: `xrpl_peer_proposal_trusted`, `xrpl_peer_validation_trusted` (SpanMetrics dimensions).
**Prometheus labels**: `proposal_trusted`, `validation_trusted` (SpanMetrics dimensions).
#### PathFind Attributes
| Attribute | Type | Set On | Description |
| ------------------------- | ------- | --------------------- | ---------------------------------------- |
| `pathfind_source_account` | string | `pathfind.request` | Originating account for the path search |
| `pathfind_dest_account` | string | `pathfind.request` | Destination account |
| `pathfind_fast` | boolean | `pathfind.compute` | Whether fast pathfinding mode is enabled |
| `pathfind_search_level` | int64 | `pathfind.discover` | Depth of graph exploration |
| `pathfind_num_paths` | int64 | `pathfind.discover` | Total paths produced |
| `pathfind_ledger_index` | int64 | `pathfind.update_all` | Target ledger index |
| `pathfind_num_requests` | int64 | `pathfind.update_all` | Active requests recomputed |
---
@@ -264,17 +435,34 @@ The OTel Collector's SpanMetrics connector automatically generates RED (Rate, Er
**Standard labels on every metric**: `span_name`, `status_code`, `service_name`, `span_kind`
**Additional dimension labels** (configured in `otel-collector-config.yaml`):
**Additional dimension labels** (configured in `otel-collector-config.yaml`).
The Prometheus label is the **bare span-attribute key verbatim** — the
SpanMetrics connector does not rewrite or prefix it:
| Span Attribute | Prometheus Label | Applies To |
| --------------------- | ------------------------------ | ---------------------------------------------- |
| `command` | `xrpl_rpc_command` | `rpc.command.*` |
| `rpc_status` | `xrpl_rpc_status` | `rpc.command.*` |
| `xrpl.consensus.mode` | `xrpl_consensus_mode` | `consensus.ledger_close` |
| `local` | `xrpl_tx_local` | `tx.process` |
| `proposal_trusted` | `xrpl_peer_proposal_trusted` | `peer.proposal.receive` |
| `validation_trusted` | `xrpl_peer_validation_trusted` | `peer.validation.receive` |
| `stage` | `stage` | `tx.preflight`, `tx.preclaim`, `tx.transactor` |
| Prometheus Label / Span Attribute | Type | Applies To |
| --------------------------------- | ------- | ---------------------------------------------- |
| `command` | string | `rpc.command.*` |
| `rpc_status` | string | `rpc.command.*` |
| `consensus_mode` | string | `consensus.round`, `consensus.ledger_close` |
| `close_time_correct` | boolean | `consensus.accept.apply` |
| `local` | boolean | `tx.process` |
| `suppressed` | boolean | `tx.receive` |
| `proposal_trusted` | boolean | `peer.proposal.receive` |
| `validation_trusted` | boolean | `peer.validation.receive` |
| `tx_type` | string | `tx.*`, `txq.enqueue` |
| `ter_result` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` |
| `stage` | string | `tx.preflight`, `tx.preclaim`, `tx.transactor` |
| `txq_status` | string | `txq.enqueue`, `txq.accept.tx` |
| `consensus_state` | string | `consensus.accept.apply` |
| `load_type` | string | `rpc.command.*` |
| `is_batch` | boolean | `rpc.process` |
| `mode_new` | string | `consensus.mode_change` |
| `consensus_stalled` | boolean | `consensus.check` |
| `consensus_phase` | string | `consensus.round` |
| `consensus_result` | string | `consensus.check` |
| `method` | string | `grpc.<MethodName>` |
| `grpc_role` | string | `grpc.<MethodName>` |
| `grpc_status` | string | `grpc.<MethodName>` |
The `stage` dimension (3 values: `preflight`, `preclaim`, `apply`) turns the
apply-pipeline spans into per-stage RED metrics with no native instruments — the
@@ -337,7 +525,7 @@ prefix=xrpld
| `xrpld_Peer_Finder_Active_Outbound_Peers` | PeerfinderManager.cpp | Active outbound peer connections | 10–21 |
| `xrpld_Overlay_Peer_Disconnects` | OverlayImpl.cpp | Cumulative peer disconnection count | Low growth |
| `xrpld_Overlay_Peer_Disconnects_Charges` | OverlayImpl.cpp | Disconnects due to resource limit charges | Low growth (subset of above) |
| `xrpld_job_count` | JobQueue.cpp | Current job queue depth | 0–100 (healthy) |
| `xrpld_jobq_job_count` | JobQueue.cpp | Current job queue depth (group `jobq`) | 0–100 (healthy) |
**Grafana dashboard**: _Node Health (System Metrics)_ (`xrpld-system-node-health`)
@@ -439,38 +627,47 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
| What to Find | Tempo TraceQL Query |
| ------------------------ | ------------------------------------------------------------------------------ |
| All RPC calls | `{resource.service.name="xrpld" && name="rpc.request"}` |
| All RPC calls | `{resource.service.name="xrpld" && name="rpc.http_request"}` |
| Specific RPC command | `{resource.service.name="xrpld" && name="rpc.command.server_info"}` |
| Slow RPC calls | `{resource.service.name="xrpld" && name=~"rpc.command.*"} \| duration > 100ms` |
| Failed RPC calls | `{span.rpc_status="error"}` |
| Specific transaction | `{span.xrpl.tx.hash="<hex_hash>"}` |
| Local transactions only | `{span.xrpl.tx.local=true}` |
| Consensus rounds | `{resource.service.name="xrpld" && name="consensus.accept"}` |
| Rounds by mode | `{span.xrpl.consensus.mode="proposing"}` |
| Specific ledger | `{span.xrpl.ledger.seq=12345}` |
| Peer proposals (trusted) | `{span.xrpl.peer.proposal.trusted=true}` |
| gRPC method calls | `{resource.service.name="xrpld" && name="grpc.GetLedger"}` |
| Specific transaction | `{span.tx_hash="<hex_hash>"}` |
| Local transactions only | `{span.local=true}` |
| Consensus rounds | `{resource.service.name="xrpld" && name="consensus.round"}` |
| Rounds by mode | `{span.consensus_mode="Proposing"}` |
| Specific ledger | `{span.ledger_seq=12345}` |
| Peer proposals (trusted) | `{span.proposal_trusted=true}` |
### Trace Structure
A typical RPC trace shows the span hierarchy:
```
rpc.request (ServerHandler)
rpc.http_request (ServerHandler)
└── rpc.process (ServerHandler)
└── rpc.command.server_info (RPCHandler)
```
A consensus round produces independent spans (not parent-child):
A consensus round groups its lifecycle spans under a single root
(`consensus.round`); the build/ledger spans run as their own trees:
```
consensus.ledger_close (close event)
consensus.proposal.send (broadcast proposal)
ledger.build (build new ledger)
└── tx.apply (apply transaction set)
consensus.accept (accept result)
consensus.validation.send (send validation)
ledger.validate (promote to validated)
ledger.store (persist to DB)
consensus.round (root — one per round)
├── consensus.phase.open (open phase)
├── consensus.proposal.send (broadcast proposal)
├── consensus.ledger_close (close event)
├── consensus.establish (establish phase )
├── consensus.update_positions (position updates )
├── consensus.check (threshold check )
├── consensus.accept (accept result )
│ └── consensus.accept.apply (apply, jtACCEPT thread)
└── consensus.validation.send (send validation, follows-from link)
ledger.build (build new ledger)
└── tx.apply (apply transaction set)
ledger.validate (promote to validated)
ledger.store (persist to DB)
```
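The hierarchies above can be reproduced from a flat span list. The following is a minimal Python sketch, assuming each span is reduced to a `(name, parent_name)` pair; real exports key spans by `span_id`/`parent_span_id`, so the tuple layout and the helper names `build_children` and `render` are illustrative only.

```python
from collections import defaultdict

# (span name, parent span name or None for a root). The names come from the
# RPC trace shown above; the flat tuple layout is an assumption.
SPANS = [
    ("rpc.http_request", None),
    ("rpc.process", "rpc.http_request"),
    ("rpc.command.server_info", "rpc.process"),
]


def build_children(spans):
    """Index spans by parent so the tree can be walked top-down."""
    children = defaultdict(list)
    for name, parent in spans:
        children[parent].append(name)
    return children


def render(spans, indent="    "):
    """Return one indented line per span, roots first, depth-first."""
    children = build_children(spans)
    lines = []

    def walk(name, depth):
        lines.append(indent * depth + name)
        for child in children.get(name, []):
            walk(child, depth + 1)

    for root in children.get(None, []):
        walk(root, 0)
    return lines


if __name__ == "__main__":
    print("\n".join(render(SPANS)))
```

Spans whose parent is `None` are roots, so feeding the consensus spans through the same helper would yield `consensus.round`, `ledger.build`, `ledger.validate`, and `ledger.store` as separate trees.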
---
@@ -483,19 +680,19 @@ ledger.store (persist to DB)
```promql
# RPC request rate by command (last 5 minutes)
sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{span_name=~"rpc.command.*"}[5m]))
sum by (command) (rate(traces_span_metrics_calls_total{span_name=~"rpc.command.*"}[5m]))
# RPC p95 latency by command
histogram_quantile(0.95, sum by (le, xrpl_rpc_command) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name=~"rpc.command.*"}[5m])))
histogram_quantile(0.95, sum by (le, command) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name=~"rpc.command.*"}[5m])))
# Consensus round duration p95
histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name="consensus.accept"}[5m])))
histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name="consensus.round"}[5m])))
# Transaction processing rate (local vs relay)
sum by (xrpl_tx_local) (rate(traces_span_metrics_calls_total{span_name="tx.process"}[5m]))
sum by (local) (rate(traces_span_metrics_calls_total{span_name="tx.process"}[5m]))
# Trusted vs untrusted proposal rate
sum by (xrpl_peer_proposal_trusted) (rate(traces_span_metrics_calls_total{span_name="peer.proposal.receive"}[5m]))
sum by (proposal_trusted) (rate(traces_span_metrics_calls_total{span_name="peer.proposal.receive"}[5m]))
```
### StatsD Metrics
@@ -592,90 +789,22 @@ count_over_time({job="xrpld"} |= "trace_id=" [5m])
---
## 5b. Future: Internal Metric Gap Fill (Phase 9)
## 5b. Internal Metric Gap Fill (Phase 9)
> **Status**: Planned, not yet implemented.
> **Status**: Implemented.
> **Plan details**: [06-implementation-phases.md §6.8.2](./06-implementation-phases.md) — motivation, architecture, third-party context
> **Task breakdown**: [Phase9_taskList.md](./Phase9_taskList.md) — per-task implementation details
Phase 9 fills ~50+ metrics that exist inside xrpld but currently lack time-series export. Uses a hybrid approach: `beast::insight` extensions for NodeStore I/O, OTel `ObservableGauge` async callbacks for new categories.
Phase 9 fills the metrics that exist inside xrpld but previously lacked time-series export. It
uses a hybrid approach: `beast::insight` extensions for NodeStore I/O plus OTel `ObservableGauge`
async callbacks for new categories.
### New Metric Categories
#### NodeStore I/O (via beast::insight)
| Prometheus Metric | Type | Description |
| ---------------------------------- | ----- | ----------------------------------- |
| `xrpld_nodestore_reads_total` | Gauge | Cumulative read operations |
| `xrpld_nodestore_reads_hit` | Gauge | Cache-served reads |
| `xrpld_nodestore_writes` | Gauge | Cumulative write operations |
| `xrpld_nodestore_written_bytes` | Gauge | Cumulative bytes written |
| `xrpld_nodestore_read_bytes` | Gauge | Cumulative bytes read |
| `xrpld_nodestore_read_duration_us` | Gauge | Cumulative read time (microseconds) |
| `xrpld_nodestore_write_load` | Gauge | Current write load score |
| `xrpld_nodestore_read_queue` | Gauge | Items in read queue |
#### Cache Hit Rates (via OTel MetricsRegistry)
| Prometheus Metric | Type | Description |
| ----------------------------- | ----- | ------------------------------------ |
| `xrpld_cache_SLE_hit_rate` | Gauge | SLE cache hit rate (0.0-1.0) |
| `xrpld_cache_ledger_hit_rate` | Gauge | Ledger object cache hit rate |
| `xrpld_cache_AL_hit_rate` | Gauge | AcceptedLedger cache hit rate |
| `xrpld_cache_treenode_size` | Gauge | SHAMap TreeNode cache size (entries) |
| `xrpld_cache_fullbelow_size` | Gauge | FullBelow cache size |
#### Transaction Queue (via OTel MetricsRegistry)
| Prometheus Metric | Type | Description |
| ------------------------------------ | ----- | -------------------------------- |
| `xrpld_txq_count` | Gauge | Current transactions in queue |
| `xrpld_txq_max_size` | Gauge | Maximum queue capacity |
| `xrpld_txq_in_ledger` | Gauge | Transactions in open ledger |
| `xrpld_txq_per_ledger` | Gauge | Expected transactions per ledger |
| `xrpld_txq_open_ledger_fee_level` | Gauge | Open ledger fee escalation level |
| `xrpld_txq_med_fee_level` | Gauge | Median fee level in queue |
| `xrpld_txq_reference_fee_level` | Gauge | Reference fee level |
| `xrpld_txq_min_processing_fee_level` | Gauge | Minimum fee to get processed |
#### PerfLog Per-RPC Method (via OTel Metrics SDK)
| Prometheus Metric | Type | Labels | Description |
| ------------------------------------- | --------- | ----------------- | --------------------------- |
| `xrpld_rpc_method_started_total` | Counter | `method="<name>"` | RPC calls started |
| `xrpld_rpc_method_finished_total` | Counter | `method="<name>"` | RPC calls completed |
| `xrpld_rpc_method_errored_total` | Counter | `method="<name>"` | RPC calls errored |
| `xrpld_rpc_method_duration_us_bucket` | Histogram | `method="<name>"` | Execution time distribution |
#### PerfLog Per-Job Type (via OTel Metrics SDK)
| Prometheus Metric | Type | Labels | Description |
| -------------------------------------- | --------- | ------------------- | --------------- |
| `xrpld_job_queued_total` | Counter | `job_type="<name>"` | Jobs queued |
| `xrpld_job_started_total` | Counter | `job_type="<name>"` | Jobs started |
| `xrpld_job_finished_total` | Counter | `job_type="<name>"` | Jobs completed |
| `xrpld_job_queued_duration_us_bucket` | Histogram | `job_type="<name>"` | Queue wait time |
| `xrpld_job_running_duration_us_bucket` | Histogram | `job_type="<name>"` | Execution time |
#### Counted Object Instances (via OTel MetricsRegistry)
| Prometheus Metric | Type | Labels | Description |
| -------------------- | ----- | --------------- | ------------------------------- |
| `xrpld_object_count` | Gauge | `type="<name>"` | Live instances of internal type |
Tracked types: `Transaction`, `Ledger`, `NodeObject`, `STTx`, `STLedgerEntry`, `InboundLedger`, `Pathfinder`, `PathRequest`, `HashRouterEntry`
#### Fee Escalation & Load Factors (via OTel MetricsRegistry)
| Prometheus Metric | Type | Description |
| ---------------------------------- | ----- | ------------------------------------ |
| `xrpld_load_factor` | Gauge | Combined transaction cost multiplier |
| `xrpld_load_factor_server` | Gauge | Server + cluster + network load |
| `xrpld_load_factor_local` | Gauge | Local server load only |
| `xrpld_load_factor_net` | Gauge | Network-wide load estimate |
| `xrpld_load_factor_cluster` | Gauge | Cluster peer load |
| `xrpld_load_factor_fee_escalation` | Gauge | Open ledger fee escalation |
| `xrpld_load_factor_fee_queue` | Gauge | Queue entry fee level |
> **Authoritative metric names live in [§ Phase 9: OTel SDK-Exported Metrics](#phase-9-otel-sdk-exported-metrics-metricsregistry) below.**
> Most internal metrics are emitted as **labeled** gauges — one instrument carrying many logical
> values via a `metric` label (e.g. `xrpld_cache_metrics{metric="SLE_hit_rate"}`,
> `xrpld_txq_metrics{metric="txq_count"}`, `xrpld_load_factor_metrics{metric="load_factor"}`,
> `xrpld_nodestore_state{metric="node_reads_total"}`) — not the flat per-name form. Query the
> labeled names; the flat names (`xrpld_cache_SLE_hit_rate`, `xrpld_txq_count`, …) are **not** emitted.
#### Server Info (via OTel MetricsRegistry)
@@ -759,15 +888,23 @@ docker/telemetry/workload/benchmark.sh --xrpld .build/xrpld --duration 300
### Validated Telemetry Inventory
| Category | Expected Count | Validation Method | Config File |
| ------------------ | -------------- | -------------------------------- | ----------------------- |
| Trace spans | 17 | Tempo API query | `expected_spans.json` |
| Span attributes | 22 | Per-span attribute assertion | `expected_spans.json` |
| StatsD metrics | 255+ | Prometheus query | `expected_metrics.json` |
| Phase 9 metrics | 68+ | Prometheus query | `expected_metrics.json` |
| SpanMetrics RED | 4 per span | Prometheus query | `expected_metrics.json` |
| Grafana dashboards | 10 | Dashboard API "no data" check | `expected_metrics.json` |
| Log-trace links | Present | Loki query + Tempo reverse check | — |
> **Counting note — families vs series.** A _metric family_ is one distinct Prometheus `__name__`
> (histogram `_bucket`/`_count`/`_sum` collapsed to one). A _series_ is a family × its label
> combinations. The legacy overlay-traffic block is the bulk of the count: ~56 message categories ×
> 4 (`_Bytes_In/_Out`, `_Messages_In/_Out`) ≈ 224 families on its own. The labeled gauges
> (`xrpld_cache_metrics{metric}`, …) are few families but many series. Validate against the figures
> below as **families currently emitting** (idle nodes under-report — workload-gated metrics such as
> per-RPC/error counters appear only once exercised, which is Phase 10's purpose).
| Category | Expected Count | Validation Method | Config File |
| ------------------------- | ------------------------- | -------------------------------- | ----------------------- |
| Trace spans | ~37 (required + optional) | Tempo API query | `expected_spans.json` |
| Span attributes | per-span assertion | Per-span attribute assertion | `expected_spans.json` |
| Legacy `xrpld_*` families | ~270 (≈224 traffic) | Prometheus `__name__` query | `expected_metrics.json` |
| Native MetricsRegistry | 35 instruments | Prometheus query | `expected_metrics.json` |
| SpanMetrics RED | 4 per span | Prometheus query | `expected_metrics.json` |
| Grafana dashboards | 15 | Dashboard API "no data" check | `expected_metrics.json` |
| Log-trace links | Present | Loki query + Tempo reverse check | — |
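The family-versus-series distinction from the counting note can be checked mechanically. The sketch below is an illustration, not the validation suite's code: it assumes the scraped data is already available as `(name, labels)` pairs, the helper name `count_families_and_series` is hypothetical, and the `hypothetical_other` label value is made up for the example. Histogram suffixes are collapsed only when a matching `_bucket` sibling exists, so a plain gauge such as `xrpld_jobq_job_count` is not mistaken for a histogram `_count`.

```python
HISTOGRAM_SUFFIXES = ("_bucket", "_count", "_sum")


def count_families_and_series(series):
    """Return (families, series_count) for an iterable of (name, labels) pairs.

    A family is one distinct metric name, with histogram
    _bucket/_count/_sum names collapsed to their shared base.
    """
    series = list(series)
    names = {name for name, _labels in series}
    # Only names with a _bucket sibling are treated as histograms.
    histogram_bases = {
        n[: -len("_bucket")] for n in names if n.endswith("_bucket")
    }
    families = set()
    for name in names:
        for suffix in HISTOGRAM_SUFFIXES:
            base = name[: -len(suffix)]
            if name.endswith(suffix) and base in histogram_bases:
                families.add(base)
                break
        else:
            families.add(name)
    return families, len(series)


SAMPLE = [
    ("xrpld_cache_metrics", {"metric": "SLE_hit_rate"}),
    ("xrpld_cache_metrics", {"metric": "hypothetical_other"}),  # made-up label value
    ("xrpld_jobq_job_count", {}),  # gauge, not a histogram _count
    ("traces_span_metrics_duration_milliseconds_bucket", {"le": "100"}),
    ("traces_span_metrics_duration_milliseconds_bucket", {"le": "+Inf"}),
    ("traces_span_metrics_duration_milliseconds_count", {}),
    ("traces_span_metrics_duration_milliseconds_sum", {}),
]

if __name__ == "__main__":
    families, n_series = count_families_and_series(SAMPLE)
    print(sorted(families), n_series)
```

In this sample the labeled gauge contributes one family but two series, which is the pattern the note describes for `xrpld_cache_metrics{metric}` and its siblings.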
### Performance Overhead Targets
@@ -1021,15 +1158,27 @@ State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full
#### Synchronous Counters (Phase 7+)
| Prometheus Metric | Type | Description | Increment Site |
| ----------------------------------- | ------- | -------------------------------- | --------------------- |
| `xrpld_ledgers_closed_total` | Counter | Ledgers closed by consensus | RCLConsensus.cpp |
| `xrpld_validations_sent_total` | Counter | Validations sent | RCLConsensus.cpp |
| `xrpld_validations_checked_total` | Counter | Network validations observed | LedgerMaster.cpp |
| `xrpld_validation_agreements_total` | Counter | Cumulative validation agreements | ValidationTracker.cpp |
| `xrpld_validation_missed_total` | Counter | Cumulative validation misses | ValidationTracker.cpp |
| `xrpld_state_changes_total` | Counter | Operating mode transitions | NetworkOPs.cpp |
| `xrpld_jq_trans_overflow_total` | Counter | Job queue transaction overflows | JobQueue.cpp |
| Prometheus Metric | Type | Description | Increment Site |
| --------------------------------- | ------- | ------------------------------- | ---------------- |
| `xrpld_ledgers_closed_total` | Counter | Ledgers closed by consensus | RCLConsensus.cpp |
| `xrpld_validations_sent_total` | Counter | Validations sent | RCLConsensus.cpp |
| `xrpld_validations_checked_total` | Counter | Network validations observed | LedgerMaster.cpp |
| `xrpld_state_changes_total` | Counter | Operating mode transitions | NetworkOPs.cpp |
| `xrpld_jq_trans_overflow_total` | Counter | Job queue transaction overflows | JobQueue.cpp |
Lifetime validation agreement/miss tallies are exported as monotonic **ObservableCounters**
(not synchronous counters) observed from `ValidationTracker`'s gross lifetime totals:
| Prometheus Metric | Type | Description | Source |
| ----------------------------------- | ----------------- | ------------------------------------------ | --------------------- |
| `xrpld_validation_agreements_total` | ObservableCounter | Lifetime validations that initially agreed | ValidationTracker.cpp |
| `xrpld_validation_missed_total` | ObservableCounter | Lifetime validations that initially missed | ValidationTracker.cpp |
> **Counting semantics (initial-classification only):** each reconciled ledger increments exactly
> one of these two counters, at first classification. A later late-repair (miss → agreement) does
> **not** move either counter — keeping both strictly monotonic (a Prometheus `_total` must never
> decrease) and additive (`agreements_total + missed_total` = ledgers reconciled). The
> repair-aware, windowed view remains on `xrpld_validation_agreement{metric="…"}`.
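The initial-classification rule can be modelled in a few lines. This is a hypothetical Python model of the counting semantics described above, not the `ValidationTracker` implementation: the class name `ValidationTallies` and its methods are invented for illustration, and the repair-aware view here is a simple current-state count rather than the windowed gauge.

```python
class ValidationTallies:
    """Hypothetical model of the two lifetime counters (illustration only)."""

    def __init__(self):
        self.agreements_total = 0  # monotonic, never decremented
        self.missed_total = 0      # monotonic, never decremented
        self._state = {}           # ledger seq -> latest classification

    def classify(self, ledger_seq, agreed):
        """Record a classification; only the first one moves a counter."""
        first_time = ledger_seq not in self._state
        self._state[ledger_seq] = agreed
        if not first_time:
            # Late repair (miss -> agreement): lifetime counters stay put.
            return
        if agreed:
            self.agreements_total += 1
        else:
            self.missed_total += 1

    def current_agreements(self):
        """Repair-aware count, which may differ from agreements_total."""
        return sum(1 for agreed in self._state.values() if agreed)


if __name__ == "__main__":
    tallies = ValidationTallies()
    tallies.classify(100, True)
    tallies.classify(101, False)
    tallies.classify(101, True)  # late repair of ledger 101
    print(tallies.agreements_total, tallies.missed_total,
          tallies.current_agreements())
```

After the late repair both lifetime counters are unchanged and still sum to the number of reconciled ledgers, while the repair-aware count reflects the corrected state.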
#### Span Attribute Enrichments (Phases 2-4)
@@ -1094,7 +1243,7 @@ State value encoding: 0=disconnected, 1=connected, 2=syncing, 3=tracking, 4=full
| Issue | Impact | Status |
| ------------------------------------------------------------------ | ------------------------------------------------ | -------------------------------------------------------------------- |
| `warn` and `drop` metrics use non-standard StatsD `\|m` meter type | Metrics silently dropped by OTel StatsD receiver | Phase 6 Task 6.1 — needs `\|m` → `\|c` change in StatsDCollector.cpp |
| `xrpld_job_count` may not emit in standalone mode | Missing from Prometheus in some test configs | Requires active job queue activity |
| `xrpld_jobq_job_count` may not emit in standalone mode | Missing from Prometheus in some test configs | Requires active job queue activity |
| `xrpld_rpc_requests` depends on `[insight]` config | Zero series if StatsD not configured | Requires `[insight] server=statsd` in xrpld.cfg |
| Peer tracing disabled by default | No `peer.*` spans unless `trace_peer=1` | Intentional — high volume on mainnet |