mirror of
https://github.com/XRPLF/rippled.git
synced 2026-04-29 15:37:57 +00:00
docs(telemetry): update data collection reference with complete span/attribute inventory
Update 09-data-collection-reference.md to reflect the full implementation across all phases:

- Expand span inventory from 16 to 35 spans across 8 categories (RPC, PathFind, TX, TxQ, Consensus, Ledger, Peer, gRPC)
- Add complete attribute inventory (81 attributes)
- Add TxQ spans (6), PathFind spans (5), and all 10 consensus spans
- Document LedgerSpanNames.h and PeerSpanNames.h in file inventory
- Add close time analysis dashboard panels to dashboard reference
- Add $close_time_correct and $resolution_direction template variables
- Document toDisplayString(ConsensusMode) utility
- Fix section numbering (duplicate section 8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -78,7 +78,7 @@ There are two independent telemetry pipelines entering a single **OTel Collector

 ## 1. OpenTelemetry Spans

-### 1.1 Complete Span Inventory (16 spans)
+### 1.1 Complete Span Inventory (35 spans)

 > **See also**: [02-design-decisions.md §2.3](./02-design-decisions.md#23-span-naming-conventions) for naming conventions and the full span catalog with rationale. [04-code-samples.md §4.6](./04-code-samples.md#46-span-flow-visualization) for span flow diagrams.

@@ -86,14 +86,15 @@ There are two independent telemetry pipelines entering a single **OTel Collector

 Controlled by `trace_rpc=1` in `[telemetry]` config.

-| Span Name | Parent | Source File | Description |
-| -------------------- | ------------- | ----------------- | ------------------------------------------------------------------------ |
-| `rpc.request` | — | ServerHandler.cpp | Top-level HTTP RPC request entry point |
-| `rpc.process` | `rpc.request` | ServerHandler.cpp | RPC processing pipeline |
-| `rpc.ws_message` | — | ServerHandler.cpp | WebSocket message handling |
-| `rpc.command.<name>` | `rpc.process` | RPCHandler.cpp | Per-command span (e.g., `rpc.command.server_info`, `rpc.command.ledger`) |
+| Span Name | Parent | Source File | Description |
+| -------------------- | ------------------ | ----------------- | ------------------------------------------------------------------------ |
+| `rpc.http_request` | — | ServerHandler.cpp | Top-level HTTP RPC request entry point |
+| `rpc.process` | `rpc.http_request` | ServerHandler.cpp | RPC processing pipeline |
+| `rpc.ws_message` | — | ServerHandler.cpp | WebSocket message handling |
+| `rpc.ws_upgrade` | — | ServerHandler.cpp | WebSocket upgrade handshake (error path) |
+| `rpc.command.<name>` | `rpc.process` | RPCHandler.cpp | Per-command span (e.g., `rpc.command.server_info`, `rpc.command.ledger`) |

-**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"rpc.request|rpc.command.*"}`
+**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"rpc.http_request|rpc.command.*"}`

 **Grafana dashboard**: _RPC Performance_ (`rippled-rpc-perf`)

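The "Where to find" selectors in this hunk all follow one pattern: a `resource.service.name` match combined with a regex over span names. A minimal sketch of assembling such a selector is shown below; the helper name `traceql_by_name` is hypothetical and not part of rippled or Tempo.

```python
def traceql_by_name(patterns, service="rippled"):
    """Build a TraceQL selector matching any of the given span-name regexes.

    Hypothetical helper for illustration; it only formats a query string.
    """
    if not patterns:
        raise ValueError("at least one span-name pattern is required")
    # TraceQL's =~ operator takes a regex, so alternatives are joined with "|".
    regex = "|".join(patterns)
    return f'{{resource.service.name="{service}" && name=~"{regex}"}}'


query = traceql_by_name(["rpc.http_request", "rpc.command.*"])
print(query)
```

With the renamed span, this produces the same selector as the updated "Where to find" line above.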
@@ -111,17 +112,67 @@ Controlled by `trace_transactions=1` in `[telemetry]` config.

 **Grafana dashboard**: _Transaction Overview_ (`rippled-transactions`)

+#### PathFind Spans
+
+Controlled by `trace_rpc=1` in `[telemetry]` config (pathfinding spans fire within RPC request handling).
+
+| Span Name | Parent | Source File | Description |
+| --------------------- | ------------------ | ---------------- | -------------------------------------------------------- |
+| `pathfind.request` | `rpc.command.*` | PathRequests.cpp | RPC entry for path_find / ripple_path_find |
+| `pathfind.compute` | `pathfind.request` | PathRequest.cpp | Single path computation (doUpdate) |
+| `pathfind.update_all` | — | PathRequests.cpp | Async recomputation of all active path requests on close |
+| `pathfind.discover` | `pathfind.compute` | Pathfinder.cpp | Graph exploration phase (Pathfinder::find) |
+| `pathfind.rank` | `pathfind.compute` | Pathfinder.cpp | Path ranking and selection phase |
+
+**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"pathfind.*"}`
+
+**Grafana dashboard**: _RPC & Pathfinding (StatsD)_ (`rippled-statsd-rpc`) for StatsD timers; span-derived metrics via _RPC Performance_ (`rippled-rpc-perf`)
+
+#### TxQ Spans
+
+Controlled by `trace_transactions=1` in `[telemetry]` config.
+
+| Span Name | Parent | Source File | Description |
+| ------------------ | ------------- | ----------- | ---------------------------------------------------- |
+| `txq.enqueue` | `tx.process` | TxQ.cpp | Queue admission decision (apply/queue/reject) |
+| `txq.apply_direct` | `txq.enqueue` | TxQ.cpp | Direct application attempt (bypassing queue) |
+| `txq.batch_clear` | `txq.enqueue` | TxQ.cpp | Batch clear of account's queued transactions |
+| `txq.accept` | — | TxQ.cpp | Ledger-close accept loop (drain queued transactions) |
+| `txq.accept.tx` | `txq.accept` | TxQ.cpp | Per-transaction apply within accept loop |
+| `txq.cleanup` | — | TxQ.cpp | Post-close cleanup (expire old transactions) |
+
+**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"txq.*"}`
+
+**Grafana dashboard**: _Transaction Overview_ (`rippled-transactions`)
+
+#### gRPC Spans
+
+Controlled by `trace_rpc=1` in `[telemetry]` config.
+
+| Span Name | Parent | Source File | Description |
+| -------------- | ------ | -------------- | ----------------------------------------------------------------------------- |
+| `grpc.request` | — | GRPCServer.cpp | Single gRPC request (GetLedger, GetLedgerData, GetLedgerDiff, GetLedgerEntry) |
+
+**Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name="grpc.request"}`
+
 #### Consensus Spans

 Controlled by `trace_consensus=1` in `[telemetry]` config.

-| Span Name | Parent | Source File | Description |
-| --------------------------- | ------ | ---------------- | --------------------------------------------- |
-| `consensus.proposal.send` | — | RCLConsensus.cpp | Node broadcasts its transaction set proposal |
-| `consensus.ledger_close` | — | RCLConsensus.cpp | Ledger close event triggered by consensus |
-| `consensus.accept` | — | RCLConsensus.cpp | Consensus accepts a ledger (round complete) |
-| `consensus.validation.send` | — | RCLConsensus.cpp | Validation message sent after ledger accepted |
-| `consensus.accept.apply` | — | RCLConsensus.cpp | Ledger application with close time details |
+| Span Name | Parent | Source File | Description |
+| ---------------------------- | ----------------- | ---------------- | ----------------------------------------------------- |
+| `consensus.round` | — | RCLConsensus.cpp | Top-level round span (deterministic trace ID) |
+| `consensus.proposal.send` | `consensus.round` | RCLConsensus.cpp | Node broadcasts its transaction set proposal |
+| `consensus.ledger_close` | `consensus.round` | RCLConsensus.cpp | Ledger close event triggered by consensus |
+| `consensus.establish` | `consensus.round` | Consensus.h | Establish phase — convergence loop |
+| `consensus.update_positions` | `consensus.round` | Consensus.h | Update positions during establish phase |
+| `consensus.check` | `consensus.round` | Consensus.h | Check for consensus agreement |
+| `consensus.accept` | `consensus.round` | RCLConsensus.cpp | Consensus accepts a ledger (round complete) |
+| `consensus.accept.apply` | `consensus.round` | RCLConsensus.cpp | Ledger application with close time details |
+| `consensus.validation.send` | `consensus.round` | RCLConsensus.cpp | Validation message sent after ledger accepted |
+| `consensus.mode_change` | `consensus.round` | RCLConsensus.cpp | Consensus mode transition (e.g., tracking->proposing) |

+> **Note**: `toDisplayString(ConsensusMode)` (in `ConsensusTypes.h`) provides Title Case display names for mode attribute values: `"Proposing"`, `"Observing"`, `"Wrong Ledger"`, `"Switched Ledger"`. This is separate from `to_string()` which returns stable log-format strings.
+
 **Where to find**: Tempo → TraceQL: `{resource.service.name="rippled" && name=~"consensus.*"}`

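Each span family added in this hunk names the `[telemetry]` flag that controls it. As a quick lookup aid, the mapping stated in the tables can be sketched as follows; the dictionary and the function `controlling_flag` are hypothetical names for illustration and cover only the span families whose flag is stated in this hunk.

```python
# Span-name prefix -> [telemetry] flag, taken from the "Controlled by" lines
# in the RPC, PathFind, TxQ, gRPC and Consensus sections of the document.
SPAN_PREFIX_TO_FLAG = {
    "rpc.": "trace_rpc",
    "pathfind.": "trace_rpc",  # pathfinding spans fire within RPC handling
    "grpc.": "trace_rpc",
    "txq.": "trace_transactions",
    "consensus.": "trace_consensus",
}


def controlling_flag(span_name):
    """Return the config flag that enables a span, or None if not listed here."""
    for prefix, flag in SPAN_PREFIX_TO_FLAG.items():
        if span_name.startswith(prefix):
            return flag
    return None


print(controlling_flag("pathfind.compute"))
print(controlling_flag("txq.accept.tx"))
```

A span missing from a trace is often explained by its controlling flag being off, so checking this mapping first can save a debugging round trip.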
@@ -156,7 +207,7 @@ Controlled by `trace_peer=1` in `[telemetry]` config. **Disabled by default** (h

 ---

-### 1.2 Complete Attribute Inventory (22 attributes)
+### 1.2 Complete Attribute Inventory (81 attributes)

 > **See also**: [02-design-decisions.md §2.4.2](./02-design-decisions.md#242-span-attributes-by-category) for attribute design rationale and privacy considerations.

@@ -164,14 +215,13 @@ Every span can carry key-value attributes that provide context for filtering and

 #### RPC Attributes

-| Attribute | Type | Set On | Description |
-| ------------------------ | ------ | --------------- | ------------------------------------------------ |
-| `xrpl.rpc.command` | string | `rpc.command.*` | RPC command name (e.g., `server_info`, `ledger`) |
-| `xrpl.rpc.version` | int64 | `rpc.command.*` | API version number |
-| `xrpl.rpc.role` | string | `rpc.command.*` | Caller role: `"admin"` or `"user"` |
-| `xrpl.rpc.status` | string | `rpc.command.*` | Result: `"success"` or `"error"` |
-| `xrpl.rpc.duration_ms` | int64 | `rpc.command.*` | Command execution time in milliseconds |
-| `xrpl.rpc.error_message` | string | `rpc.command.*` | Error details (only set on failure) |
+| Attribute | Type | Set On | Description |
+| ----------------------- | ------ | --------------- | ------------------------------------------------ |
+| `xrpl.rpc.command` | string | `rpc.command.*` | RPC command name (e.g., `server_info`, `ledger`) |
+| `xrpl.rpc.version` | int64 | `rpc.command.*` | API version number |
+| `xrpl.rpc.role` | string | `rpc.command.*` | Caller role: `"admin"` or `"user"` |
+| `xrpl.rpc.status` | string | `rpc.command.*` | Result: `"success"` or `"error"` |
+| `xrpl.rpc.payload_size` | int64 | `rpc.command.*` | Request payload size in bytes |

 **Tempo query**: `{span.xrpl.rpc.command="server_info"}` to find all `server_info` calls.

@@ -186,48 +236,123 @@ Every span can carry key-value attributes that provide context for filtering and
 | `xrpl.tx.path` | string | `tx.process` | Submission path: `"sync"` or `"async"` |
 | `xrpl.tx.suppressed` | boolean | `tx.receive` | `true` if transaction was suppressed (duplicate) |
 | `xrpl.tx.status` | string | `tx.receive` | Transaction status (e.g., `"known_bad"`) |
+| `xrpl.peer.id` | int64 | `tx.receive` | Peer identifier (also set on peer spans) |
+| `xrpl.peer.version` | string | `tx.receive` | Peer protocol version string |

 **Tempo query**: `{span.xrpl.tx.hash="<hash>"}` to trace a specific transaction across nodes.

 **Prometheus label**: `xrpl_tx_local` (used as SpanMetrics dimension).

+#### PathFind Attributes
+
+| Attribute | Type | Set On | Description |
+| ---------------------------------- | ------- | --------------------- | ----------------------------------------------- |
+| `xrpl.pathfind.source_account` | string | `pathfind.request` | Source account address |
+| `xrpl.pathfind.dest_account` | string | `pathfind.request` | Destination account address |
+| `xrpl.pathfind.fast` | boolean | `pathfind.compute` | Whether this is a fast (non-full) pathfind |
+| `xrpl.pathfind.search_level` | int64 | `pathfind.compute` | Search depth level |
+| `xrpl.pathfind.num_complete_paths` | int64 | `pathfind.compute` | Number of complete paths found |
+| `xrpl.pathfind.num_paths` | int64 | `pathfind.compute` | Total number of paths explored |
+| `xrpl.pathfind.num_requests` | int64 | `pathfind.update_all` | Number of active path requests being recomputed |
+| `xrpl.pathfind.ledger_index` | int64 | `pathfind.update_all` | Ledger index used for recomputation |
+
+**Tempo query**: `{span.xrpl.pathfind.source_account="rHb9..."}` to find pathfind requests from a specific account.
+
+#### TxQ Attributes
+
+| Attribute | Type | Set On | Description |
+| ----------------------------- | ------- | ------------------------------ | ---------------------------------------------------------- |
+| `xrpl.txq.tx_hash` | string | `txq.enqueue`, `txq.accept.tx` | Transaction hash in the queue |
+| `xrpl.txq.status` | string | `txq.enqueue` | Queue result: `"queued"`, `"applied_direct"`, `"rejected"` |
+| `xrpl.txq.fee_level_paid` | int64 | `txq.enqueue` | Fee level paid by the transaction |
+| `xrpl.txq.required_fee_level` | int64 | `txq.enqueue` | Minimum fee level required for queue admission |
+| `xrpl.txq.queue_size` | int64 | `txq.accept` | Queue depth at start of accept |
+| `xrpl.txq.ledger_changed` | boolean | `txq.accept` | Whether the open ledger changed since last accept |
+| `xrpl.txq.ledger_seq` | int64 | `txq.cleanup` | Ledger sequence for cleanup |
+| `xrpl.txq.expired_count` | int64 | `txq.cleanup` | Number of expired transactions removed |
+| `xrpl.txq.ter_code` | string | `txq.accept.tx` | Transaction engine result code |
+| `xrpl.txq.retries_remaining` | int64 | `txq.accept.tx` | Remaining retry attempts for this transaction |
+| `xrpl.txq.num_cleared` | int64 | `txq.batch_clear` | Number of transactions cleared in batch |
+
+**Tempo query**: `{span.xrpl.txq.status="rejected"}` to find rejected queue attempts.
+
+#### gRPC Attributes
+
+| Attribute | Type | Set On | Description |
+| ------------------ | ------ | -------------- | ------------------------------------------------------------ |
+| `xrpl.grpc.method` | string | `grpc.request` | gRPC method name (e.g., `GetLedger`, `GetLedgerData`) |
+| `xrpl.grpc.role` | string | `grpc.request` | Caller role: `"admin"` or `"user"` |
+| `xrpl.grpc.status` | string | `grpc.request` | Result: `"success"`, `"error"`, `"resource_exhausted"`, etc. |
+
+**Tempo query**: `{span.xrpl.grpc.method="GetLedger"}` to find gRPC ledger requests.
+
 #### Consensus Attributes

-| Attribute | Type | Set On | Description |
-| ------------------------------------ | ------- | --------------------------------------------------------------------------------------------------- | ------------------------------------------------------------- |
-| `xrpl.consensus.round` | int64 | `consensus.proposal.send` | Consensus round number |
-| `xrpl.consensus.mode` | string | `consensus.proposal.send`, `consensus.ledger_close` | Node mode: `"syncing"`, `"tracking"`, `"full"`, `"proposing"` |
-| `xrpl.consensus.proposers` | int64 | `consensus.proposal.send`, `consensus.accept` | Number of proposers in the round |
-| `xrpl.consensus.proposing` | boolean | `consensus.validation.send` | Whether this node was a proposer |
-| `xrpl.consensus.ledger.seq` | int64 | `consensus.ledger_close`, `consensus.accept`, `consensus.validation.send`, `consensus.accept.apply` | Ledger sequence number |
-| `xrpl.consensus.close_time` | int64 | `consensus.accept.apply` | Agreed-upon ledger close time (epoch seconds) |
-| `xrpl.consensus.close_time_correct` | boolean | `consensus.accept.apply` | Whether validators reached agreement on close time |
-| `xrpl.consensus.close_resolution_ms` | int64 | `consensus.accept.apply` | Close time rounding granularity in milliseconds |
-| `xrpl.consensus.state` | string | `consensus.accept.apply` | Consensus outcome: `"finished"` or `"moved_on"` |
-| `xrpl.consensus.round_time_ms` | int64 | `consensus.accept.apply` | Total consensus round duration in milliseconds |
+| Attribute | Type | Set On | Description |
+| ------------------------------------------ | ------- | ---------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
+| `xrpl.consensus.ledger_id` | string | `consensus.round` | Previous ledger hash (used for deterministic trace ID) |
+| `xrpl.consensus.ledger.seq` | int64 | `consensus.round`, `consensus.ledger_close`, `consensus.accept`, `consensus.validation.send`, `consensus.accept.apply` | Ledger sequence number |
+| `xrpl.consensus.mode` | string | `consensus.round`, `consensus.proposal.send`, `consensus.ledger_close` | Node mode via `toDisplayString()`: `"Proposing"`, `"Observing"`, etc. |
+| `xrpl.consensus.round` | int64 | `consensus.proposal.send` | Consensus round number |
+| `xrpl.consensus.proposers` | int64 | `consensus.proposal.send`, `consensus.accept` | Number of proposers in the round |
+| `xrpl.consensus.round_time_ms` | int64 | `consensus.accept`, `consensus.accept.apply` | Total consensus round duration in milliseconds |
+| `xrpl.consensus.proposing` | boolean | `consensus.validation.send` | Whether this node was a proposer |
+| `xrpl.consensus.state` | string | `consensus.accept.apply` | Consensus outcome: `"finished"` or `"moved_on"` |
+| `xrpl.consensus.close_time` | int64 | `consensus.accept.apply` | Agreed-upon ledger close time (epoch seconds) |
+| `xrpl.consensus.close_time_correct` | boolean | `consensus.accept.apply` | Whether validators reached agreement on close time |
+| `xrpl.consensus.close_resolution_ms` | int64 | `consensus.accept.apply` | Close time rounding granularity in milliseconds |
+| `xrpl.consensus.parent_close_time` | int64 | `consensus.accept.apply` | Parent ledger's close time (epoch seconds) |
+| `xrpl.consensus.close_time_self` | int64 | `consensus.accept.apply` | This node's proposed close time |
+| `xrpl.consensus.close_time_vote_bins` | string | `consensus.accept.apply` | Histogram of close time votes from validators |
+| `xrpl.consensus.resolution_direction` | string | `consensus.accept.apply` | Resolution change: `"increased"`, `"decreased"`, or `"unchanged"` |
+| `xrpl.consensus.converge_percent` | int64 | `consensus.establish` | Convergence percentage threshold |
+| `xrpl.consensus.establish_count` | int64 | `consensus.establish` | Number of establish iterations completed |
+| `xrpl.consensus.proposers_agreed` | int64 | `consensus.establish` | Number of proposers that agreed on this round |
+| `xrpl.consensus.avalanche_threshold` | int64 | `consensus.update_positions` | Avalanche threshold for dispute resolution |
+| `xrpl.consensus.close_time_threshold` | int64 | `consensus.update_positions` | Close time agreement threshold |
+| `xrpl.consensus.have_close_time_consensus` | boolean | `consensus.update_positions` | Whether close time consensus has been reached |
+| `xrpl.consensus.agree_count` | int64 | `consensus.check` | Number of proposers that agree with our position |
+| `xrpl.consensus.disagree_count` | int64 | `consensus.check` | Number of proposers that disagree with our position |
+| `xrpl.consensus.threshold_percent` | int64 | `consensus.check` | Required agreement threshold percentage |
+| `xrpl.consensus.result` | string | `consensus.check` | Check result: `"yes"`, `"no"`, or `"expired"` |
+| `xrpl.consensus.quorum` | int64 | `consensus.check` | Required quorum for validation |
+| `xrpl.consensus.validation_count` | int64 | `consensus.check` | Number of validations received |
+| `xrpl.consensus.trace_strategy` | string | `consensus.round` | Trace sampling strategy used for this round |
+| `xrpl.consensus.round_id` | string | `consensus.round` | Deterministic round identifier |
+| `xrpl.consensus.mode.old` | string | `consensus.mode_change` | Previous consensus mode |
+| `xrpl.consensus.mode.new` | string | `consensus.mode_change` | New consensus mode |
+| `xrpl.tx.id` | string | `consensus.update_positions` | Disputed transaction ID |
+| `xrpl.dispute.our_vote` | boolean | `consensus.update_positions` | Our vote on the disputed transaction |
+| `xrpl.dispute.yays` | int64 | `consensus.update_positions` | Number of proposers voting to include |
+| `xrpl.dispute.nays` | int64 | `consensus.update_positions` | Number of proposers voting to exclude |

-**Tempo query**: `{span.xrpl.consensus.mode="proposing"}` to find rounds where node was proposing.
+**Tempo query**: `{span.xrpl.consensus.mode="Proposing"}` to find rounds where node was proposing.

 **Prometheus label**: `xrpl_consensus_mode` (used as SpanMetrics dimension).

 #### Ledger Attributes

-| Attribute | Type | Set On | Description |
-| ------------------------- | ----- | ------------------------------------------------------------- | ---------------------------------------------- |
-| `xrpl.ledger.seq` | int64 | `ledger.build`, `ledger.validate`, `ledger.store`, `tx.apply` | Ledger sequence number |
-| `xrpl.ledger.validations` | int64 | `ledger.validate` | Number of validations received for this ledger |
-| `xrpl.ledger.tx_count` | int64 | `ledger.build`, `tx.apply` | Transactions in the ledger |
-| `xrpl.ledger.tx_failed` | int64 | `ledger.build`, `tx.apply` | Failed transactions in the ledger |
+| Attribute | Type | Set On | Description |
+| --------------------------------- | ------- | ------------------------------------------------------------- | ------------------------------------------------ |
+| `xrpl.ledger.seq` | int64 | `ledger.build`, `ledger.validate`, `ledger.store`, `tx.apply` | Ledger sequence number |
+| `xrpl.ledger.close_time` | int64 | `ledger.build` | Ledger close time (epoch seconds) |
+| `xrpl.ledger.close_time_correct` | boolean | `ledger.build` | Whether close time was agreed upon by validators |
+| `xrpl.ledger.close_resolution_ms` | int64 | `ledger.build` | Close time rounding granularity in milliseconds |
+| `xrpl.ledger.tx_count` | int64 | `ledger.build`, `tx.apply` | Transactions in the ledger |
+| `xrpl.ledger.tx_failed` | int64 | `ledger.build`, `tx.apply` | Failed transactions in the ledger |
+| `xrpl.ledger.validations` | int64 | `ledger.validate` | Number of validations received for this ledger |

 **Tempo query**: `{span.xrpl.ledger.seq=12345}` to find all spans for a specific ledger.

 #### Peer Attributes

-| Attribute | Type | Set On | Description |
-| ------------------------------ | ------- | ---------------------------------------------------------------- | ---------------------------------------------------- |
-| `xrpl.peer.id` | int64 | `tx.receive`, `peer.proposal.receive`, `peer.validation.receive` | Peer identifier |
-| `xrpl.peer.proposal.trusted` | boolean | `peer.proposal.receive` | Whether the proposal came from a trusted validator |
-| `xrpl.peer.validation.trusted` | boolean | `peer.validation.receive` | Whether the validation came from a trusted validator |
+| Attribute | Type | Set On | Description |
+| ---------------------------------- | ------- | ---------------------------------------------------------------- | ---------------------------------------------------- |
+| `xrpl.peer.id` | int64 | `tx.receive`, `peer.proposal.receive`, `peer.validation.receive` | Peer identifier |
+| `xrpl.peer.proposal.trusted` | boolean | `peer.proposal.receive` | Whether the proposal came from a trusted validator |
+| `xrpl.peer.validation.ledger_hash` | string | `peer.validation.receive` | Ledger hash the validation refers to |
+| `xrpl.peer.validation.full` | boolean | `peer.validation.receive` | Whether this is a full (not partial) validation |
+| `xrpl.peer.validation.trusted` | boolean | `peer.validation.receive` | Whether the validation came from a trusted validator |

 **Prometheus labels**: `xrpl_peer_proposal_trusted`, `xrpl_peer_validation_trusted` (SpanMetrics dimensions).

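The Prometheus labels quoted in this hunk (`xrpl_tx_local`, `xrpl_consensus_mode`, `xrpl_peer_proposal_trusted`) are the span attribute names with dots turned into underscores. A minimal sketch of that renaming follows, assuming the pipeline replaces every character that is not valid in a Prometheus label name with `_`; the function name is hypothetical and the exact sanitization rules of the deployed collector are not confirmed by this document.

```python
import re


def to_prometheus_label(attribute):
    """Map a span attribute name to the label form shown in the document.

    Assumption: characters outside [a-zA-Z0-9_] are replaced with "_".
    """
    return re.sub(r"[^a-zA-Z0-9_]", "_", attribute)


for name in ("xrpl.tx.local", "xrpl.consensus.mode", "xrpl.peer.validation.trusted"):
    print(name, "->", to_prometheus_label(name))
```

This is useful when moving from a Tempo query on `span.xrpl.*` attributes to the matching label in a SpanMetrics-backed Grafana panel.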
@@ -366,13 +491,13 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo

 ### 3.1 Span-Derived Dashboards (5)

-| Dashboard | UID | Data Source | Key Panels |
-| -------------------- | ---------------------- | ------------------------ | ---------------------------------------------------------------------------------- |
-| RPC Performance | `rippled-rpc-perf` | Prometheus (SpanMetrics) | Request rate by command, p95 latency by command, error rate, heatmap, top commands |
-| Transaction Overview | `rippled-transactions` | Prometheus (SpanMetrics) | Processing rate, latency p95/p50, local vs relay split, apply duration, heatmap |
-| Consensus Health | `rippled-consensus` | Prometheus (SpanMetrics) | Round duration p95/p50, proposals rate, close duration, mode timeline, heatmap |
-| Ledger Operations | `rippled-ledger-ops` | Prometheus (SpanMetrics) | Build rate, build duration, validation rate, store rate, build vs close comparison |
-| Peer Network | `rippled-peer-net` | Prometheus (SpanMetrics) | Proposal receive rate, validation receive rate, trusted vs untrusted breakdown |
+| Dashboard | UID | Data Source | Key Panels |
+| -------------------- | ---------------------- | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| RPC Performance | `rippled-rpc-perf` | Prometheus (SpanMetrics) | Request rate by command, p95 latency by command, error rate, heatmap, top commands |
+| Transaction Overview | `rippled-transactions` | Prometheus (SpanMetrics) | Processing rate, latency p95/p50, local vs relay split, apply duration, heatmap |
+| Consensus Health | `rippled-consensus` | Prometheus (SpanMetrics) | Round duration p95/p50, proposals rate, close duration, mode timeline, heatmap, close time correctness, resolution direction, close time drift, resolution change timeline, close time vote distribution |
+| Ledger Operations | `rippled-ledger-ops` | Prometheus (SpanMetrics) | Build rate, build duration, validation rate, store rate, build vs close comparison |
+| Peer Network | `rippled-peer-net` | Prometheus (SpanMetrics) | Proposal receive rate, validation receive rate, trusted vs untrusted breakdown |

 ### 3.2 StatsD Dashboards (5)

@@ -384,7 +509,27 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
 | Overlay Traffic Detail | `rippled-statsd-overlay-detail` | Prometheus (StatsD) | Squelch, overhead, validator lists, set get/share, have/requested tx, proof paths |
 | Ledger Data & Sync | `rippled-statsd-ledger-sync` | Prometheus (StatsD) | Ledger data exchange, legacy ledger share/get, getobject by type, traffic heatmap |

-### 3.3 Accessing the Dashboards
+### 3.3 Consensus Close-Time Panels
+
+The Consensus Health dashboard includes 5 close-time panels added in Phase 4:
+
+| Panel | Metric / Attribute | Description |
+| ---------------------------- | --------------------------------------------------------------- | ------------------------------------------------------------------------ |
+| Close Time Correctness | `xrpl.consensus.close_time_correct` | Percentage of rounds with agreed-upon close time |
+| Resolution Direction | `xrpl.consensus.resolution_direction` | Rate of resolution increases, decreases, and unchanged per time interval |
+| Close Time Drift | `xrpl.consensus.close_time` vs `xrpl.consensus.close_time_self` | Difference between agreed close time and node's own proposed close time |
+| Resolution Change Timeline | `xrpl.consensus.close_resolution_ms` | Close time resolution granularity over time |
+| Close Time Vote Distribution | `xrpl.consensus.close_time_vote_bins` | Histogram of validator close time votes per round |
+
+**Template variables** (Consensus Health dashboard):
+
+| Variable | Source Attribute | Description |
+| ----------------------- | ------------------------------------- | ------------------------------------------------------------------------ |
+| `$node` | `exported_instance` | Filter by rippled node instance |
+| `$close_time_correct` | `xrpl_consensus_close_time_correct` | Filter by close time correctness (`true` / `false`) |
+| `$resolution_direction` | `xrpl_consensus_resolution_direction` | Filter by resolution direction (`increased` / `decreased` / `unchanged`) |
+
+### 3.4 Accessing the Dashboards

 1. Open Grafana at **http://localhost:3000**
 2. Navigate to **Dashboards → rippled** folder
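Two of the close-time panels described in this hunk reduce to simple arithmetic on `consensus.accept.apply` attributes: drift is the agreed close time minus the node's own proposal, and correctness is the share of rounds where agreement was reached. The sketch below illustrates that arithmetic only; the sample values are invented, the function names are hypothetical, and the real panels are presumably driven by Prometheus queries that this document does not show.

```python
def close_time_drift(attrs):
    """Agreed close time minus this node's proposed close time, in seconds."""
    return attrs["xrpl.consensus.close_time"] - attrs["xrpl.consensus.close_time_self"]


def correctness_percent(rounds):
    """Percentage of rounds whose close time was agreed upon by validators."""
    if not rounds:
        return 0.0
    agreed = sum(1 for r in rounds if r["xrpl.consensus.close_time_correct"])
    return 100.0 * agreed / len(rounds)


# Invented sample attributes for four consensus.accept.apply spans.
sample_rounds = [
    {"xrpl.consensus.close_time": 1000, "xrpl.consensus.close_time_self": 1000,
     "xrpl.consensus.close_time_correct": True},
    {"xrpl.consensus.close_time": 1010, "xrpl.consensus.close_time_self": 1009,
     "xrpl.consensus.close_time_correct": True},
    {"xrpl.consensus.close_time": 1020, "xrpl.consensus.close_time_self": 1022,
     "xrpl.consensus.close_time_correct": False},
    {"xrpl.consensus.close_time": 1030, "xrpl.consensus.close_time_self": 1030,
     "xrpl.consensus.close_time_correct": True},
]

drifts = [close_time_drift(r) for r in sample_rounds]
percent = correctness_percent(sample_rounds)
print(drifts, percent)
```

A positive drift means the network agreed on a later close time than this node proposed; a negative drift means the node's clock or proposal ran ahead.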
@@ -400,7 +545,7 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo

 | What to Find | Tempo TraceQL Query |
 | ------------------------ | -------------------------------------------------------------------------------- |
-| All RPC calls | `{resource.service.name="rippled" && name="rpc.request"}` |
+| All RPC calls | `{resource.service.name="rippled" && name="rpc.http_request"}` |
 | Specific RPC command | `{resource.service.name="rippled" && name="rpc.command.server_info"}` |
 | Slow RPC calls | `{resource.service.name="rippled" && name=~"rpc.command.*"} \| duration > 100ms` |
 | Failed RPC calls | `{span.xrpl.rpc.status="error"}` |
@@ -416,20 +561,26 @@ For each of the 45+ overlay traffic categories (defined in `TrafficCount.h`), fo
 A typical RPC trace shows the span hierarchy:

 ```
-rpc.request (ServerHandler)
+rpc.http_request (ServerHandler)
   └── rpc.process (ServerHandler)
       └── rpc.command.server_info (RPCHandler)
 ```

-A consensus round produces independent spans (not parent-child):
+A consensus round groups child spans under a deterministic trace ID:

 ```
-consensus.ledger_close (close event)
-consensus.proposal.send (broadcast proposal)
+consensus.round (top-level, deterministic trace ID from ledger hash)
+├── consensus.ledger_close (close event)
+├── consensus.proposal.send (broadcast proposal)
+├── consensus.establish (convergence loop)
+│   ├── consensus.update_positions (update disputes)
+│   └── consensus.check (check agreement)
+├── consensus.accept (accept result)
+├── consensus.accept.apply (apply with close time details)
+├── consensus.validation.send (send validation)
+└── consensus.mode_change (mode transition, if any)
 ledger.build (build new ledger)
   └── tx.apply (apply transaction set)
-consensus.accept (accept result)
-consensus.validation.send (send validation)
 ledger.validate (promote to validated)
 ledger.store (persist to DB)
 ```
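The updated hierarchy relies on `consensus.round` having a deterministic trace ID derived from the previous ledger hash, so every node working on the same round lands in the same trace. The document does not show how rippled derives the ID; the sketch below is one possible scheme (SHA-256 of the hash bytes, truncated to the 16 bytes of an OpenTelemetry trace ID) and is purely an illustration. The function name and the example hash are hypothetical.

```python
import hashlib


def deterministic_trace_id(prev_ledger_hash_hex):
    """Derive a 16-byte trace ID (32 hex chars) from a ledger hash.

    Assumption: hash-then-truncate is used here only as an example scheme;
    the actual derivation in rippled may differ.
    """
    digest = hashlib.sha256(bytes.fromhex(prev_ledger_hash_hex)).digest()
    return digest[:16].hex()


# Hypothetical 256-bit ledger hash, written as 64 hex characters.
EXAMPLE_PREV_LEDGER_HASH = "AB" * 32

trace_id = deterministic_trace_id(EXAMPLE_PREV_LEDGER_HASH)
print(trace_id)
```

Because the ID depends only on the ledger hash, two validators that never exchange trace context still produce spans that a backend such as Tempo can group into a single trace.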
@@ -480,7 +631,26 @@ rippled_State_Accounting_Full_duration

 ---

-## 6. Known Issues
+## 6. SpanNames Header File Inventory
+
+All span names and attributes are defined as compile-time constants in colocated `SpanNames.h` headers. Each header lives next to its subsystem's implementation.
+
+| Header File | Subsystem | Span Count | Attribute Count | Notes |
+| ----------------------------------------------- | ------------- | ---------- | --------------- | ------------------------------------------- |
+| `src/xrpld/rpc/detail/RpcSpanNames.h` | RPC (HTTP/WS) | 5 | 5 | Includes `rpc.ws_upgrade` error path |
+| `src/xrpld/rpc/detail/PathFindSpanNames.h` | PathFind | 5 | 8 | Covers one-shot and subscription paths |
+| `src/xrpld/app/main/GrpcSpanNames.h` | gRPC | 1 | 3 | Flat single-span structure per request |
+| `src/xrpld/app/misc/TxSpanNames.h` | Transaction | 2 | 7 | Includes peer context attributes |
+| `src/xrpld/app/misc/detail/TxQSpanNames.h` | TxQ | 6 | 11 | Queue lifecycle: enqueue through cleanup |
+| `src/xrpld/app/consensus/ConsensusSpanNames.h` | Consensus | 10 | 35 | Deterministic trace IDs, close-time details |
+| `src/xrpld/app/ledger/detail/LedgerSpanNames.h` | Ledger | 4 | 7 | Build, store, validate, tx.apply |
+| `src/xrpld/overlay/detail/PeerSpanNames.h` | Peer Overlay | 2 | 5 | Proposal and validation receive |
+
+> **Design convention**: SpanNames headers are colocated with their subsystem classes rather than centralized in `telemetry/`. See [memory/feedback_span-names-colocation.md](../.claude/memory/feedback_span-names-colocation.md) for rationale.
+
+---
+
+## 7. Known Issues

 | Issue | Impact | Status |
 | ------------------------------------------------------------------ | ------------------------------------------------ | -------------------------------------------------------------------- |
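The per-header counts in the inventory table can be cross-checked against the headline totals in sections 1.1 and 1.2 (35 spans, 81 attributes). A short sketch of that check, with the counts copied from the table and the variable names chosen for illustration:

```python
# (span count, attribute count) per header, copied from the inventory table.
HEADER_COUNTS = {
    "RpcSpanNames.h": (5, 5),
    "PathFindSpanNames.h": (5, 8),
    "GrpcSpanNames.h": (1, 3),
    "TxSpanNames.h": (2, 7),
    "TxQSpanNames.h": (6, 11),
    "ConsensusSpanNames.h": (10, 35),
    "LedgerSpanNames.h": (4, 7),
    "PeerSpanNames.h": (2, 5),
}

total_spans = sum(spans for spans, _ in HEADER_COUNTS.values())
total_attributes = sum(attrs for _, attrs in HEADER_COUNTS.values())
print(total_spans, total_attributes)  # 35 81
```

Both sums match the section headings, so the table and the headline counts are consistent with each other.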
@@ -491,7 +661,7 @@ rippled_State_Accounting_Full_duration

 ---

-## 7. Privacy and Data Collection
+## 8. Privacy and Data Collection

 The telemetry system is designed with privacy in mind:

@@ -505,7 +675,7 @@ The telemetry system is designed with privacy in mind:

 ---

-## 8. Configuration Quick Reference
+## 9. Configuration Quick Reference

 > **Full reference**: [05-configuration-reference.md](./05-configuration-reference.md) §5.1 for all `[telemetry]` options with defaults, the config parser implementation, and collector YAML configurations (dev and production).
