# Phase 11: Third-Party Data Collection Pipelines - Task List

> **Status**: Future Enhancement
>
> **Goal**: Build a custom OTel Collector receiver that periodically polls rippled's admin RPCs and exports structured metrics for external consumers, making all XRPL health, validator, peer, fee, and DEX data available as Prometheus/OTLP metrics without rippled code changes.
>
> **Scope**: Go-based OTel Collector receiver plugin + Grafana dashboards + Prometheus alerting rules.
>
> **Branch**: `pratik/otel-phase11-third-party-collection` (from `pratik/otel-phase10-workload-validation`)
>
> **Depends on**: Phase 10 (validation harness for testing the new receiver)

### Related Plan Documents

| Document                                                             | Relevance                                                       |
| -------------------------------------------------------------------- | --------------------------------------------------------------- |
| [06-implementation-phases.md](./06-implementation-phases.md)         | Phase 11 plan: motivation, architecture, exit criteria (§6.8.4) |
| [09-data-collection-reference.md](./09-data-collection-reference.md) | Defines full metric inventory including third-party metrics     |
| [Phase10_taskList.md](./Phase10_taskList.md)                         | Prerequisite: validation harness for testing                    |

### Third-Party Consumer Gap Analysis

This phase addresses the cross-cutting gap identified during research: **rippled has no native Prometheus/OTLP metrics export for data accessible only via RPC**. Every consumer (exchanges, payment processors, analytics providers, validators, researchers, compliance firms, custodians) must build custom JSON-RPC polling and conversion. This receiver centralizes that work.

| Consumer Category          | Data Unlocked by This Phase                                        |
| -------------------------- | ------------------------------------------------------------------ |
| **Exchanges**              | Real-time fee estimates, TxQ capacity, server health scores        |
| **Payment Processors**     | Settlement latency percentiles, corridor health, path availability |
| **Analytics Providers**    | Validator metrics, network topology, amendment voting status       |
| **DeFi / AMM**             | AMM pool TVL, DEX order book depth, trade volumes                  |
| **Validators / Operators** | Per-peer latency, version distribution, UNL health, alerting       |
| **Compliance**             | Transaction volume trends, network growth metrics                  |
| **Academic Researchers**   | Consensus performance time-series, decentralization metrics        |
| **CBDC / Tokenization**    | Token supply tracking, trust line adoption, freeze status          |
| **Institutional Custody**  | Multi-sig status, escrow tracking, reserve calculations            |
| **Wallet Providers**       | Server health for node selection, fee prediction data              |

---

## Task 11.1: OTel Collector Receiver Scaffold

**Objective**: Create the Go project structure for a custom OTel Collector receiver that polls rippled JSON-RPC.

**What to do**:

- Create `docker/telemetry/otel-rippled-receiver/`:
  - `receiver.go`: implements `receiver.Metrics` interface
  - `config.go`: configuration struct (endpoint, poll interval, enabled RPCs)
  - `factory.go`: receiver factory registration
  - `go.mod` / `go.sum`: Go module with OTel Collector SDK dependency

- Configuration model:

  ```yaml
  rippled_receiver:
    endpoint: "http://localhost:5005" # rippled admin RPC
    poll_interval: 30s # how often to poll
    enabled_collectors:
      - server_info
      - get_counts
      - fee
      - peers
      - validators
      - feature
      - server_state
    amm_pools: [] # optional: AMM pool IDs to track
    book_offers_pairs: [] # optional: currency pairs for DEX depth
  ```

- Build a custom OTel Collector binary that includes this receiver alongside the standard receivers.
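
The configuration model above could map onto a Go struct along the following lines. This is a minimal sketch using only the standard library, not the final `config.go`. The names `Config`, `defaultConfig`, and `knownCollectors` are hypothetical. It assumes the `mapstructure` tags match the YAML keys and that the collectors listed in the YAML are the defaults. A real receiver would register the struct with the Collector factory rather than construct it in `main`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"time"
)

// Config mirrors the YAML configuration model. Field and type names are
// hypothetical; the struct tags assume the keys shown in the YAML example.
type Config struct {
	Endpoint          string        `mapstructure:"endpoint"`
	PollInterval      time.Duration `mapstructure:"poll_interval"`
	EnabledCollectors []string      `mapstructure:"enabled_collectors"`
	AMMPools          []string      `mapstructure:"amm_pools"`
	BookOffersPairs   []string      `mapstructure:"book_offers_pairs"`
}

// knownCollectors lists the collector names from the configuration model.
var knownCollectors = map[string]bool{
	"server_info": true, "get_counts": true, "fee": true, "peers": true,
	"validators": true, "feature": true, "server_state": true,
}

// defaultConfig returns the values shown in the YAML example. Treating that
// collector list as the default set is an assumption of this sketch.
func defaultConfig() *Config {
	return &Config{
		Endpoint:     "http://localhost:5005",
		PollInterval: 30 * time.Second,
		EnabledCollectors: []string{
			"server_info", "get_counts", "fee", "peers",
			"validators", "feature", "server_state",
		},
	}
}

// Validate rejects configurations the receiver could not poll with.
func (c *Config) Validate() error {
	u, err := url.Parse(c.Endpoint)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return fmt.Errorf("endpoint %q must be an absolute URL", c.Endpoint)
	}
	if c.PollInterval <= 0 {
		return errors.New("poll_interval must be positive")
	}
	for _, name := range c.EnabledCollectors {
		if !knownCollectors[name] {
			return fmt.Errorf("unknown collector %q", name)
		}
	}
	return nil
}

func main() {
	cfg := defaultConfig()
	fmt.Println("default config valid:", cfg.Validate() == nil)

	cfg.EnabledCollectors = append(cfg.EnabledCollectors, "ledger_dump")
	fmt.Println("validation error:", cfg.Validate())
}
```

Rejecting unknown collector names at startup keeps a typo in `enabled_collectors` from silently disabling a set of metrics.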

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/receiver.go`
- New: `docker/telemetry/otel-rippled-receiver/config.go`
- New: `docker/telemetry/otel-rippled-receiver/factory.go`
- New: `docker/telemetry/otel-rippled-receiver/go.mod`
- New: `docker/telemetry/otel-rippled-receiver/Dockerfile`

---

## Task 11.2: server_info / server_state Collector

**Objective**: Poll `server_info` and `server_state` and export all fields as OTel metrics.

**What to do**:

- Implement `serverInfoCollector` that calls `server_info` (admin) and extracts:

  **Node Health Gauges:**
  - `xrpl_server_state` (enum → int: disconnected=0, connected=1, syncing=2, tracking=3, full=4, proposing=5)
  - `xrpl_server_state_duration_seconds`
  - `xrpl_uptime_seconds`
  - `xrpl_io_latency_ms`
  - `xrpl_amendment_blocked` (0 or 1)
  - `xrpl_peers_count`
  - `xrpl_peer_disconnects_total`
  - `xrpl_peer_disconnects_resources_total`
  - `xrpl_jq_trans_overflow_total`

  **Consensus Gauges:**
  - `xrpl_last_close_proposers`
  - `xrpl_last_close_converge_time_seconds`
  - `xrpl_validation_quorum`

  **Ledger Gauges:**
  - `xrpl_validated_ledger_seq`
  - `xrpl_validated_ledger_age_seconds`
  - `xrpl_validated_ledger_base_fee_drops`
  - `xrpl_validated_ledger_reserve_base_drops`
  - `xrpl_validated_ledger_reserve_inc_drops`
  - `xrpl_close_time_offset_seconds` (0 when absent)

  **Load Factor Gauges:**
  - `xrpl_load_factor`
  - `xrpl_load_factor_server`
  - `xrpl_load_factor_fee_escalation`
  - `xrpl_load_factor_fee_queue`
  - `xrpl_load_factor_local`
  - `xrpl_load_factor_net`
  - `xrpl_load_factor_cluster`

  **State Accounting Gauges** (per state: disconnected, connected, syncing, tracking, full):
  - `xrpl_state_duration_seconds{state=""}`
  - `xrpl_state_transitions_total{state=""}`

  **Validator Info** (when node is a validator):
  - `xrpl_validator_list_count`
  - `xrpl_validator_list_expiration_seconds` (epoch)
  - `xrpl_validator_list_active` (0 or 1)

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/collectors/server_info.go`

---

## Task 11.3: get_counts Collector

**Objective**: Poll `get_counts` and export internal object counts and NodeStore stats.

**What to do**:

- Implement `getCountsCollector`:

  **Database Gauges:**
  - `xrpl_db_size_kb{db="total"}`, `xrpl_db_size_kb{db="ledger"}`, `xrpl_db_size_kb{db="transaction"}`

  **NodeStore Gauges:**
  - `xrpl_nodestore_reads_total`, `xrpl_nodestore_reads_hit`, `xrpl_nodestore_writes_total`
  - `xrpl_nodestore_read_bytes`, `xrpl_nodestore_written_bytes`
  - `xrpl_nodestore_read_duration_us`, `xrpl_nodestore_write_load`
  - `xrpl_nodestore_read_queue`, `xrpl_nodestore_read_threads_running`

  **Cache Gauges:**
  - `xrpl_cache_hit_rate{cache="SLE"}`, `xrpl_cache_hit_rate{cache="ledger"}`, `xrpl_cache_hit_rate{cache="accepted_ledger"}`
  - `xrpl_cache_size{cache="treenode"}`, `xrpl_cache_size{cache="fullbelow"}`, `xrpl_cache_size{cache="accepted_ledger"}`

  **Object Count Gauges:**
  - `xrpl_object_count{type=""}` for each counted object type (Transaction, Ledger, NodeObject, STTx, STLedgerEntry, InboundLedger, Pathfinder, etc.)

  **Rates:**
  - `xrpl_historical_fetch_per_minute`
  - `xrpl_local_txs`

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/collectors/get_counts.go`

---

## Task 11.4: Peer Topology Collector

**Objective**: Poll `peers` and export per-peer and aggregate network metrics.

**What to do**:

- Implement `peersCollector`:

  **Aggregate Gauges:**
  - `xrpl_peers_inbound_count`
  - `xrpl_peers_outbound_count`
  - `xrpl_peers_cluster_count`

  **Per-Peer Gauges** (with labels `peer_key` truncated to 8 chars for cardinality control):
  - `xrpl_peer_latency_ms{peer="", version="", inbound=""}`
  - `xrpl_peer_uptime_seconds{peer=""}`
  - `xrpl_peer_load{peer=""}`

  **Distribution Gauges** (aggregated across all peers):
  - `xrpl_peer_latency_p50_ms`, `xrpl_peer_latency_p95_ms`, `xrpl_peer_latency_p99_ms`
  - `xrpl_peer_version_count{version=""}`: count of peers per software version

  **Tracking Status:**
  - `xrpl_peer_diverged_count`: peers with `track=diverged`
  - `xrpl_peer_unknown_count`: peers with `track=unknown`

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/collectors/peers.go`

**Cardinality note**: Per-peer metrics use truncated keys. For large peer sets (50+), the aggregate distribution gauges are preferred over per-peer labels.

---

## Task 11.5: Validator & Amendment Collector

**Objective**: Poll `validators` and `feature` to export validator health and amendment voting status.

**What to do**:

- Implement `validatorCollector`:

  **From `validators` RPC:**
  - `xrpl_trusted_validators_count`
  - `xrpl_validator_signing` (0 or 1: whether the local validator is signing)

  **From `feature` RPC:**
  - `xrpl_amendment_enabled_count`: total enabled amendments
  - `xrpl_amendment_majority_count`: amendments with majority but not yet enabled
  - `xrpl_amendment_vetoed_count`: locally vetoed amendments
  - `xrpl_amendment_unsupported_majority` (0 or 1): any unsupported amendment has majority (critical alert)

  **Per-amendment with majority** (limited cardinality: only amendments with `majority` set):
  - `xrpl_amendment_majority_time{name=""}`: epoch time when majority was gained
  - `xrpl_amendment_votes{name=""}`: current vote count
  - `xrpl_amendment_threshold{name=""}`: votes needed

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/collectors/validators.go`

---

## Task 11.6: Fee & TxQ Collector

**Objective**: Poll `fee` RPC and export real-time fee market data.

**What to do**:

- Implement `feeCollector` that calls the public `fee` RPC:

  **Fee Level Gauges:**
  - `xrpl_fee_current_ledger_size`: transactions in current open ledger
  - `xrpl_fee_expected_ledger_size`: expected transactions at close
  - `xrpl_fee_max_queue_size`: maximum transaction queue size
  - `xrpl_fee_open_ledger_fee_drops`: minimum fee for open ledger inclusion
  - `xrpl_fee_median_fee_drops`: median fee level
  - `xrpl_fee_minimum_fee_drops`: base reference fee
  - `xrpl_fee_queue_size`: current queue depth

- This overlaps with Phase 9's internal TxQ metrics but provides an external-only collection path that doesn't require rippled code changes.

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/collectors/fee.go`

---

## Task 11.7: DEX & AMM Collector (Optional)

**Objective**: Periodically poll configured AMM pools and order book pairs for DeFi metrics.

**What to do**:

- Implement `dexCollector` (enabled only when `amm_pools` or `book_offers_pairs` are configured):

  **AMM Pool Gauges** (per configured pool):
  - `xrpl_amm_reserve{pool="", asset=""}`: pool reserve amount
  - `xrpl_amm_lp_token_supply{pool=""}`: outstanding LP tokens
  - `xrpl_amm_trading_fee{pool=""}`: pool trading fee (basis points)
  - `xrpl_amm_tvl_drops{pool=""}`: total value locked (XRP-denominated)

  **Order Book Gauges** (per configured pair):
  - `xrpl_orderbook_bid_depth{pair="/"}`: total bid volume
  - `xrpl_orderbook_ask_depth{pair="/"}`: total ask volume
  - `xrpl_orderbook_spread{pair="/"}`: best bid-ask spread
  - `xrpl_orderbook_offer_count{pair="/", side="bid|ask"}`: number of offers

**Key files**:

- New: `docker/telemetry/otel-rippled-receiver/collectors/dex.go`

**Note**: This is optional because it requires explicit configuration of which pools/pairs to track. Default configuration tracks no DEX data.

---

## Task 11.8: Prometheus Alerting Rules

**Objective**: Create production-ready alerting rules for the metrics exported by this receiver.

**What to do**:

- Create `docker/telemetry/prometheus/rippled-alerts.yml`:

  **Tier 1 - Critical (page immediately):**

  ```yaml
  - alert: XRPLServerNotFull
    expr: xrpl_server_state < 4
    for: 15m
  - alert: XRPLAmendmentBlocked
    expr: xrpl_amendment_blocked == 1
    for: 1m
  - alert: XRPLNoPeers
    expr: xrpl_peers_count == 0
    for: 5m
  - alert: XRPLLedgerStale
    expr: xrpl_validated_ledger_age_seconds > 120
    for: 2m
  - alert: XRPLHighIOLatency
    expr: xrpl_io_latency_ms > 100
    for: 5m
  - alert: XRPLUnsupportedAmendmentMajority
    expr: xrpl_amendment_unsupported_majority == 1
    for: 1m
  ```

  **Tier 2 - Warning (investigate within hours):**

  ```yaml
  - alert: XRPLLowPeerCount
    expr: xrpl_peers_count < 10
    for: 15m
  - alert: XRPLHighLoadFactor
    expr: xrpl_load_factor > 10
    for: 10m
  - alert: XRPLSlowConsensus
    expr: xrpl_last_close_converge_time_seconds > 6
    for: 5m
  - alert: XRPLValidatorListExpiring
    expr: (xrpl_validator_list_expiration_seconds - time()) < 86400
    for: 1h
  - alert: XRPLClockDrift
    expr: xrpl_close_time_offset_seconds > 0
    for: 5m
  - alert: XRPLStateFlapping
    # increase() counts transitions over the window; rate() would be per second
    expr: increase(xrpl_state_transitions_total{state="full"}[1h]) > 2
    for: 30m
  ```

**Key files**:

- New: `docker/telemetry/prometheus/rippled-alerts.yml`
- Update: `docker/telemetry/prometheus/prometheus.yml` (add rule_files reference)

---

## Task 11.9: New Grafana Dashboards

**Objective**: Create 4 new dashboards for the data exported by the receiver.

**What to do**:

- **Validator Health** (`rippled-validator-health`):
  - Server state timeline, state duration breakdown
  - Proposer count trend, converge time trend, validation quorum
  - Validator list expiration countdown
  - Amendment voting status (majority/enabled/vetoed)

- **Network Topology** (`rippled-network-topology`):
  - Peer count (inbound/outbound/cluster), peer version distribution
  - Peer latency distribution (p50/p95/p99), diverged peer count
  - Geographic distribution (if enriched with GeoIP)
  - Peer uptime distribution

- **Fee Market** (`rippled-fee-market-external`):
  - Current fee levels (open ledger, median, minimum), fee escalation timeline
  - Queue depth vs. capacity, transactions per ledger
  - Load factor breakdown (server/network/cluster/escalation)

- **DEX & AMM Overview** (`rippled-dex-amm`) (only populated when DEX collectors are configured):
  - AMM pool TVL, reserve ratios, LP token supply
  - Order book depth per pair, spread trends
  - Trading fee revenue estimates

**Key files**:

- New: `docker/telemetry/grafana/dashboards/rippled-validator-health.json`
- New: `docker/telemetry/grafana/dashboards/rippled-network-topology.json`
- New: `docker/telemetry/grafana/dashboards/rippled-fee-market-external.json`
- New: `docker/telemetry/grafana/dashboards/rippled-dex-amm.json`

---

## Task 11.10: Integration with Phase 10 Validation

**Objective**: Extend the Phase 10 validation suite to verify this receiver's metrics.

**What to do**:

- Update `docker/telemetry/workload/validate_telemetry.py`:
  - Add assertions for all `xrpl_*` metrics produced by the receiver
  - Verify metric labels have expected values
  - Verify alerting rules fire correctly (inject a "bad" state and check alert)

- Update `docker/telemetry/docker-compose.workload.yaml`:
  - Add the custom OTel Collector build with the rippled receiver
  - Configure the receiver to poll one of the test nodes

**Key files**:

- Update: `docker/telemetry/workload/validate_telemetry.py`
- Update: `docker/telemetry/docker-compose.workload.yaml`
- Update: `docker/telemetry/workload/expected_metrics.json`

---

## Task 11.11: Documentation

**Objective**: Document the receiver, its metrics, deployment, and alerting.

**What to do**:

- Create `docker/telemetry/otel-rippled-receiver/README.md`:
  - Architecture overview (how the receiver fits into the OTel Collector)
  - Configuration reference (all config options with defaults)
  - Metric reference table (all exported metrics with types and labels)
  - Deployment guide (building custom collector binary, docker-compose integration)

- Update `OpenTelemetryPlan/09-data-collection-reference.md`:
  - Add "Third-Party Metrics (OTel Collector Receiver)" section
  - Add new Grafana dashboard reference (4 dashboards)
  - Add alerting rules reference

- Update `docs/telemetry-runbook.md`:
  - Add "Third-Party Metrics Receiver" troubleshooting section
  - Add alerting playbook (what to do for each Tier 1/Tier 2 alert)

---

## Exit Criteria

- [ ] Custom OTel Collector receiver builds and starts without errors
- [ ] All `xrpl_*` metrics from server_info, get_counts, peers, validators, fee appear in Prometheus
- [ ] Metrics update at configured poll interval (default 30s)
- [ ] 4 new Grafana dashboards operational with data
- [ ] Prometheus alerting rules fire correctly for simulated failure conditions
- [ ] DEX/AMM collector works when configured (optional, not required for base exit criteria)
- [ ] Phase 10 validation suite passes with receiver metrics included
- [ ] Receiver handles rippled restart/unavailability gracefully (no crash, logs warning, retries)
- [ ] Documentation complete: receiver README, metric reference, alerting playbook
- [ ] Go receiver has unit tests with >80% coverage
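
The exit criterion on graceful handling of rippled restarts could be approached with a poll loop that logs a warning and waits for the next tick instead of exiting. This is a stdlib-only sketch, not the receiver's actual implementation. The names `parseServerInfo`, `pollServerInfo`, and `run` are hypothetical. It reads only `server_state` and `peers` from the `server_info` result, and reporting any state outside the Task 11.2 mapping as `-1` is an assumption made here.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// stateValues follows the enum mapping listed in Task 11.2.
var stateValues = map[string]int64{
	"disconnected": 0, "connected": 1, "syncing": 2,
	"tracking": 3, "full": 4, "proposing": 5,
}

// serverInfoResponse models only the fields this sketch reads.
type serverInfoResponse struct {
	Result struct {
		Status string `json:"status"`
		Info   struct {
			ServerState string `json:"server_state"`
			Peers       int64  `json:"peers"`
		} `json:"info"`
	} `json:"result"`
}

// parseServerInfo converts a raw server_info reply into metric values.
func parseServerInfo(raw []byte) (map[string]int64, error) {
	var resp serverInfoResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("decode server_info: %w", err)
	}
	if resp.Result.Status != "success" {
		return nil, fmt.Errorf("server_info status %q", resp.Result.Status)
	}
	state, ok := stateValues[resp.Result.Info.ServerState]
	if !ok {
		state = -1 // assumption: states outside the Task 11.2 mapping
	}
	return map[string]int64{
		"xrpl_server_state": state,
		"xrpl_peers_count":  resp.Result.Info.Peers,
	}, nil
}

// pollServerInfo issues one JSON-RPC call and returns an error on any failure.
func pollServerInfo(ctx context.Context, client *http.Client, endpoint string) (map[string]int64, error) {
	payload := []byte(`{"method":"server_info","params":[{}]}`)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("rippled unreachable: %w", err)
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("read response: %w", err)
	}
	return parseServerInfo(raw)
}

// run polls until ctx is cancelled; a failed poll is logged and retried on the next tick.
func run(ctx context.Context, endpoint string, interval time.Duration, emit func(map[string]int64)) {
	client := &http.Client{Timeout: 5 * time.Second}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if metrics, err := pollServerInfo(ctx, client, endpoint); err != nil {
			log.Printf("warning: poll failed, will retry: %v", err)
		} else {
			emit(metrics)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	run(ctx, "http://localhost:5005", 500*time.Millisecond, func(m map[string]int64) {
		fmt.Println(m)
	})
}
```

Keeping parsing separate from the HTTP call lets unit tests cover the metric mapping without a running rippled node, which may help with the coverage target.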