Phase 11: Third-Party Data Collection Pipelines — Task List
Status: Future Enhancement
Goal: Build a custom OTel Collector receiver that periodically polls rippled's admin RPCs and exports structured metrics for external consumers — making all XRPL health, validator, peer, fee, and DEX data available as Prometheus/OTLP metrics without rippled code changes.
Scope: Go-based OTel Collector receiver plugin + Grafana dashboards + Prometheus alerting rules.
Branch: pratik/otel-phase11-third-party-collection (from pratik/otel-phase10-workload-validation)
Depends on: Phase 10 (validation harness for testing the new receiver)
Related Plan Documents
| Document | Relevance |
|---|---|
| 06-implementation-phases.md | Phase 11 plan: motivation, architecture, exit criteria (§6.8.4) |
| 09-data-collection-reference.md | Defines full metric inventory including third-party metrics |
| Phase10_taskList.md | Prerequisite — validation harness for testing |
Third-Party Consumer Gap Analysis
This phase addresses the cross-cutting gap identified during research: rippled has no native Prometheus/OTLP metrics export for data accessible only via RPC. Every consumer (exchanges, payment processors, analytics providers, validators, researchers, compliance firms, custodians) must build custom JSON-RPC polling and conversion. This receiver centralizes that work.
| Consumer Category | Data Unlocked by This Phase |
|---|---|
| Exchanges | Real-time fee estimates, TxQ capacity, server health scores |
| Payment Processors | Settlement latency percentiles, corridor health, path availability |
| Analytics Providers | Validator metrics, network topology, amendment voting status |
| DeFi / AMM | AMM pool TVL, DEX order book depth, trade volumes |
| Validators / Operators | Per-peer latency, version distribution, UNL health, alerting |
| Compliance | Transaction volume trends, network growth metrics |
| Academic Researchers | Consensus performance time-series, decentralization metrics |
| CBDC / Tokenization | Token supply tracking, trust line adoption, freeze status |
| Institutional Custody | Multi-sig status, escrow tracking, reserve calculations |
| Wallet Providers | Server health for node selection, fee prediction data |
Task 11.1: OTel Collector Receiver Scaffold
Objective: Create the Go project structure for a custom OTel Collector receiver that polls rippled JSON-RPC.
What to do:
- Create docker/telemetry/otel-rippled-receiver/:
  - receiver.go: implements receiver.Metrics interface
  - config.go: configuration struct (endpoint, poll interval, enabled RPCs)
  - factory.go: receiver factory registration
  - go.mod / go.sum: Go module with OTel Collector SDK dependency
- Configuration model:

      rippled_receiver:
        endpoint: "http://localhost:5005"   # rippled admin RPC
        poll_interval: 30s                  # how often to poll
        enabled_collectors:
          - server_info
          - get_counts
          - fee
          - peers
          - validators
          - feature
          - server_state
        amm_pools: []                       # optional: AMM pool IDs to track
        book_offers_pairs: []               # optional: currency pairs for DEX depth

- Build a custom OTel Collector binary that includes this receiver alongside the standard receivers.
Key files:
- New: docker/telemetry/otel-rippled-receiver/receiver.go
- New: docker/telemetry/otel-rippled-receiver/config.go
- New: docker/telemetry/otel-rippled-receiver/factory.go
- New: docker/telemetry/otel-rippled-receiver/go.mod
- New: docker/telemetry/otel-rippled-receiver/Dockerfile
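The configuration model above could map onto a Go struct along the following lines. This is a minimal sketch using only the standard library, not the receiver's actual config.go: the names Config, DefaultConfig, Validate and knownCollectors are hypothetical, and the mapstructure tags only mirror the tagging convention that OTel Collector components commonly use for config decoding.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"time"
)

// Config mirrors the YAML keys of the configuration model.
// Hypothetical struct; the tags are plain strings and need no extra dependency.
type Config struct {
	Endpoint          string        `mapstructure:"endpoint"`
	PollInterval      time.Duration `mapstructure:"poll_interval"`
	EnabledCollectors []string      `mapstructure:"enabled_collectors"`
	AMMPools          []string      `mapstructure:"amm_pools"`
	BookOffersPairs   []string      `mapstructure:"book_offers_pairs"`
}

// knownCollectors lists the collector names from the configuration model.
var knownCollectors = map[string]bool{
	"server_info": true, "get_counts": true, "fee": true, "peers": true,
	"validators": true, "feature": true, "server_state": true,
}

// DefaultConfig returns the values shown in the plan: local admin RPC, 30s poll,
// all collectors enabled, no AMM pools or order book pairs.
func DefaultConfig() Config {
	return Config{
		Endpoint:     "http://localhost:5005",
		PollInterval: 30 * time.Second,
		EnabledCollectors: []string{
			"server_info", "get_counts", "fee", "peers",
			"validators", "feature", "server_state",
		},
	}
}

// Validate rejects configurations the receiver could not poll with.
func (c Config) Validate() error {
	u, err := url.Parse(c.Endpoint)
	if err != nil || u.Scheme == "" || u.Host == "" {
		return errors.New("endpoint must be an absolute URL such as http://localhost:5005")
	}
	if c.PollInterval <= 0 {
		return errors.New("poll_interval must be positive")
	}
	for _, name := range c.EnabledCollectors {
		if !knownCollectors[name] {
			return fmt.Errorf("unknown collector %q", name)
		}
	}
	return nil
}

func main() {
	cfg := DefaultConfig()
	fmt.Println("default config valid:", cfg.Validate() == nil)

	cfg.EnabledCollectors = append(cfg.EnabledCollectors, "not_a_collector")
	fmt.Println("error:", cfg.Validate())
}
```

Rejecting unknown collector names at startup is one possible design choice; it surfaces typos in enabled_collectors before the first poll rather than as silently missing metrics.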
Task 11.2: server_info / server_state Collector
Objective: Poll server_info and server_state and export all fields as OTel metrics.
What to do:
- Implement serverInfoCollector that calls server_info (admin) and extracts:

  Node Health Gauges:
  - xrpl_server_state (enum → int: disconnected=0, connected=1, syncing=2, tracking=3, full=4, proposing=5)
  - xrpl_server_state_duration_seconds
  - xrpl_uptime_seconds
  - xrpl_io_latency_ms
  - xrpl_amendment_blocked (0 or 1)
  - xrpl_peers_count
  - xrpl_peer_disconnects_total
  - xrpl_peer_disconnects_resources_total
  - xrpl_jq_trans_overflow_total

  Consensus Gauges:
  - xrpl_last_close_proposers
  - xrpl_last_close_converge_time_seconds
  - xrpl_validation_quorum

  Ledger Gauges:
  - xrpl_validated_ledger_seq
  - xrpl_validated_ledger_age_seconds
  - xrpl_validated_ledger_base_fee_drops
  - xrpl_validated_ledger_reserve_base_drops
  - xrpl_validated_ledger_reserve_inc_drops
  - xrpl_close_time_offset_seconds (0 when absent)

  Load Factor Gauges:
  - xrpl_load_factor
  - xrpl_load_factor_server
  - xrpl_load_factor_fee_escalation
  - xrpl_load_factor_fee_queue
  - xrpl_load_factor_local
  - xrpl_load_factor_net
  - xrpl_load_factor_cluster

  State Accounting Gauges (per state: disconnected, connected, syncing, tracking, full):
  - xrpl_state_duration_seconds{state="<name>"}
  - xrpl_state_transitions_total{state="<name>"}

  Validator Info (when node is a validator):
  - xrpl_validator_list_count
  - xrpl_validator_list_expiration_seconds (epoch)
  - xrpl_validator_list_active (0 or 1)
Key files:
- New: docker/telemetry/otel-rippled-receiver/collectors/server_info.go
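The enum-to-int encoding for xrpl_server_state can be sketched as a small lookup. The function name serverStateValue and the -1 sentinel for unlisted states are assumptions made for illustration; the task description only defines the six values in the table.

```go
package main

import "fmt"

// serverStateValues encodes the enum from the task description.
var serverStateValues = map[string]int64{
	"disconnected": 0,
	"connected":    1,
	"syncing":      2,
	"tracking":     3,
	"full":         4,
	"proposing":    5,
}

// serverStateValue returns the gauge value for a server_state string.
// States outside the table yield -1 and ok=false (an assumed convention),
// so the caller can decide whether to skip the data point or export the sentinel.
func serverStateValue(state string) (value int64, ok bool) {
	v, found := serverStateValues[state]
	if !found {
		return -1, false
	}
	return v, true
}

func main() {
	// "validating" stands in here for any state string the table does not list.
	for _, s := range []string{"full", "syncing", "validating"} {
		v, ok := serverStateValue(s)
		fmt.Printf("%-11s value=%d known=%t\n", s, v, ok)
	}
}
```

Returning an explicit ok flag keeps an unrecognized state from being read as disconnected (0), which would otherwise trip the XRPLServerNotFull alert for the wrong reason.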
Task 11.3: get_counts Collector
Objective: Poll get_counts and export internal object counts and NodeStore stats.
What to do:
- Implement getCountsCollector:

  Database Gauges:
  - xrpl_db_size_kb{db="total"}, xrpl_db_size_kb{db="ledger"}, xrpl_db_size_kb{db="transaction"}

  NodeStore Gauges:
  - xrpl_nodestore_reads_total, xrpl_nodestore_reads_hit, xrpl_nodestore_writes_total
  - xrpl_nodestore_read_bytes, xrpl_nodestore_written_bytes
  - xrpl_nodestore_read_duration_us, xrpl_nodestore_write_load
  - xrpl_nodestore_read_queue, xrpl_nodestore_read_threads_running

  Cache Gauges:
  - xrpl_cache_hit_rate{cache="SLE"}, xrpl_cache_hit_rate{cache="ledger"}, xrpl_cache_hit_rate{cache="accepted_ledger"}
  - xrpl_cache_size{cache="treenode"}, xrpl_cache_size{cache="fullbelow"}, xrpl_cache_size{cache="accepted_ledger"}

  Object Count Gauges:
  - xrpl_object_count{type="<name>"} for each counted object type (Transaction, Ledger, NodeObject, STTx, STLedgerEntry, InboundLedger, Pathfinder, etc.)

  Rates:
  - xrpl_historical_fetch_per_minute
  - xrpl_local_txs
Key files:
- New: docker/telemetry/otel-rippled-receiver/collectors/get_counts.go
Task 11.4: Peer Topology Collector
Objective: Poll peers and export per-peer and aggregate network metrics.
What to do:
- Implement peersCollector:

  Aggregate Gauges:
  - xrpl_peers_inbound_count
  - xrpl_peers_outbound_count
  - xrpl_peers_cluster_count

  Per-Peer Gauges (with labels, peer_key truncated to 8 chars for cardinality control):
  - xrpl_peer_latency_ms{peer="<key>", version="<ver>", inbound="<bool>"}
  - xrpl_peer_uptime_seconds{peer="<key>"}
  - xrpl_peer_load{peer="<key>"}

  Distribution Gauges (aggregated across all peers):
  - xrpl_peer_latency_p50_ms, xrpl_peer_latency_p95_ms, xrpl_peer_latency_p99_ms
  - xrpl_peer_version_count{version="<semver>"}: count of peers per software version

  Tracking Status:
  - xrpl_peer_diverged_count: peers with track=diverged
  - xrpl_peer_unknown_count: peers with track=unknown
Key files:
- New: docker/telemetry/otel-rippled-receiver/collectors/peers.go
Cardinality note: Per-peer metrics use truncated keys. For large peer sets (50+), the aggregate distribution gauges are preferred over per-peer labels.
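The two cardinality controls described above, key truncation and aggregate latency percentiles, can be sketched as follows. The helper names truncateKey and percentile are hypothetical, the sample key is made up, and the nearest-rank percentile method is one reasonable choice rather than something the plan prescribes.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// truncateKey keeps the first n bytes of a peer public key for use as a label.
// Peer keys are base58 text, so byte slicing does not split a character.
func truncateKey(key string, n int) string {
	if n < 0 || len(key) <= n {
		return key
	}
	return key[:n]
}

// percentile returns the nearest-rank percentile p (0 < p <= 100) of the
// given latencies in milliseconds. It returns 0 for an empty input.
func percentile(latencies []float64, p float64) float64 {
	if len(latencies) == 0 {
		return 0
	}
	sorted := append([]float64(nil), latencies...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	latencies := []float64{10, 20, 30, 40, 50, 60, 70, 80, 90, 100}
	fmt.Println("p50:", percentile(latencies, 50))
	fmt.Println("p95:", percentile(latencies, 95))
	fmt.Println("p99:", percentile(latencies, 99))

	// Illustrative key only, not a real peer.
	fmt.Println("label:", truncateKey("n9KorY8QtTdRx7TVDpwnG9NvyxsDwHUKUEeDLY3AkiGncVaSXZi5", 8))
}
```

An 8-character prefix is not guaranteed to be unique, so two peers could in principle share a label value; a collector built this way would need to decide how to handle that collision.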
Task 11.5: Validator & Amendment Collector
Objective: Poll validators and feature to export validator health and amendment voting status.
What to do:
- Implement validatorCollector:

  From validators RPC:
  - xrpl_trusted_validators_count
  - xrpl_validator_signing (0 or 1: whether the local validator is signing)

  From feature RPC:
  - xrpl_amendment_enabled_count: total enabled amendments
  - xrpl_amendment_majority_count: amendments with majority but not yet enabled
  - xrpl_amendment_vetoed_count: locally vetoed amendments
  - xrpl_amendment_unsupported_majority (0 or 1): any unsupported amendment has majority (critical alert)

  Per-amendment with majority (limited cardinality, only amendments with majority set):
  - xrpl_amendment_majority_time{name="<amendment>"}: epoch time when majority was gained
  - xrpl_amendment_votes{name="<amendment>"}: current vote count
  - xrpl_amendment_threshold{name="<amendment>"}: votes needed
Key files:
- New: docker/telemetry/otel-rippled-receiver/collectors/validators.go
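The aggregate amendment gauges reduce to a single pass over the amendment list. The sketch below works on an already-decoded, simplified record: the amendment and amendmentSummary types, the summarize function, and the sample names are all hypothetical, and decoding the feature RPC response into this shape is left out.

```go
package main

import "fmt"

// amendment is a simplified, hypothetical view of one feature RPC entry.
type amendment struct {
	Name        string
	Enabled     bool
	Supported   bool
	Vetoed      bool
	HasMajority bool // true when the entry carries a majority timestamp
}

// amendmentSummary holds the values behind the aggregate gauges.
type amendmentSummary struct {
	Enabled             int  // xrpl_amendment_enabled_count
	Majority            int  // xrpl_amendment_majority_count
	Vetoed              int  // xrpl_amendment_vetoed_count
	UnsupportedMajority bool // xrpl_amendment_unsupported_majority
}

// summarize counts amendments per category. An amendment counts toward
// Majority only while it has majority and is not yet enabled.
func summarize(list []amendment) amendmentSummary {
	var s amendmentSummary
	for _, a := range list {
		if a.Enabled {
			s.Enabled++
		}
		if a.Vetoed {
			s.Vetoed++
		}
		if a.HasMajority && !a.Enabled {
			s.Majority++
			if !a.Supported {
				s.UnsupportedMajority = true
			}
		}
	}
	return s
}

// sampleAmendments returns made-up data for the demonstration.
func sampleAmendments() []amendment {
	return []amendment{
		{Name: "ExampleA", Enabled: true, Supported: true},
		{Name: "ExampleB", Enabled: true, Supported: true},
		{Name: "ExampleC", Supported: true, HasMajority: true},
		{Name: "ExampleD", Supported: false, HasMajority: true},
		{Name: "ExampleE", Supported: true, Vetoed: true},
	}
}

func main() {
	fmt.Printf("%+v\n", summarize(sampleAmendments()))
}
```

Because only amendments with majority set get per-amendment series, the same loop is a natural place to emit xrpl_amendment_majority_time and its sibling gauges.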
Task 11.6: Fee & TxQ Collector
Objective: Poll fee RPC and export real-time fee market data.
What to do:
- Implement feeCollector that calls the public fee RPC:

  Fee Level Gauges:
  - xrpl_fee_current_ledger_size: transactions in current open ledger
  - xrpl_fee_expected_ledger_size: expected transactions at close
  - xrpl_fee_max_queue_size: maximum transaction queue size
  - xrpl_fee_open_ledger_fee_drops: minimum fee for open ledger inclusion
  - xrpl_fee_median_fee_drops: median fee level
  - xrpl_fee_minimum_fee_drops: base reference fee
  - xrpl_fee_queue_size: current queue depth
- This overlaps with Phase 9's internal TxQ metrics but provides an external-only collection path that doesn't require rippled code changes.
Key files:
- New: docker/telemetry/otel-rippled-receiver/collectors/fee.go
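Turning a fee response into the gauges listed above is mostly a matter of decoding string-encoded numbers. The following sketch assumes the response shape documented for the fee method (numeric fields delivered as JSON strings, fee amounts nested under drops); the names feeResult, feeGauges and sampleFeeResponse are hypothetical, the sample values are illustrative, and the mapping of xrpl_fee_minimum_fee_drops to the minimum_fee field is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// sampleFeeResponse is an illustrative response body, not captured output.
const sampleFeeResponse = `{"result":{"current_ledger_size":"14","current_queue_size":"0",
"drops":{"base_fee":"10","median_fee":"11000","minimum_fee":"10","open_ledger_fee":"10"},
"expected_ledger_size":"24","max_queue_size":"480","status":"success"}}`

// feeResult holds the subset of the fee result used by the gauges.
type feeResult struct {
	CurrentLedgerSize  string `json:"current_ledger_size"`
	CurrentQueueSize   string `json:"current_queue_size"`
	ExpectedLedgerSize string `json:"expected_ledger_size"`
	MaxQueueSize       string `json:"max_queue_size"`
	Drops              struct {
		MedianFee     string `json:"median_fee"`
		MinimumFee    string `json:"minimum_fee"`
		OpenLedgerFee string `json:"open_ledger_fee"`
	} `json:"drops"`
}

// feeGauges converts a fee response body into a metric name -> value map.
// A missing or non-numeric field is reported as an error instead of exported as 0.
func feeGauges(body []byte) (map[string]float64, error) {
	var resp struct {
		Result feeResult `json:"result"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode fee response: %w", err)
	}
	r := resp.Result
	fields := map[string]string{
		"xrpl_fee_current_ledger_size":   r.CurrentLedgerSize,
		"xrpl_fee_expected_ledger_size":  r.ExpectedLedgerSize,
		"xrpl_fee_max_queue_size":        r.MaxQueueSize,
		"xrpl_fee_queue_size":            r.CurrentQueueSize,
		"xrpl_fee_open_ledger_fee_drops": r.Drops.OpenLedgerFee,
		"xrpl_fee_median_fee_drops":      r.Drops.MedianFee,
		"xrpl_fee_minimum_fee_drops":     r.Drops.MinimumFee,
	}
	out := make(map[string]float64, len(fields))
	for name, raw := range fields {
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	gauges, err := feeGauges([]byte(sampleFeeResponse))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("median fee (drops):", gauges["xrpl_fee_median_fee_drops"])
	fmt.Println("queue size / max:", gauges["xrpl_fee_queue_size"], "/", gauges["xrpl_fee_max_queue_size"])
}
```

Failing the whole poll on a missing field is a deliberately strict choice for a sketch; a production collector might prefer to skip the single gauge and log a warning.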
Task 11.7: DEX & AMM Collector (Optional)
Objective: Periodically poll configured AMM pools and order book pairs for DeFi metrics.
What to do:
- Implement dexCollector (enabled only when amm_pools or book_offers_pairs are configured):

  AMM Pool Gauges (per configured pool):
  - xrpl_amm_reserve{pool="<id>", asset="<currency>"}: pool reserve amount
  - xrpl_amm_lp_token_supply{pool="<id>"}: outstanding LP tokens
  - xrpl_amm_trading_fee{pool="<id>"}: pool trading fee (basis points)
  - xrpl_amm_tvl_drops{pool="<id>"}: total value locked (XRP-denominated)

  Order Book Gauges (per configured pair):
  - xrpl_orderbook_bid_depth{pair="<base>/<quote>"}: total bid volume
  - xrpl_orderbook_ask_depth{pair="<base>/<quote>"}: total ask volume
  - xrpl_orderbook_spread{pair="<base>/<quote>"}: best bid-ask spread
  - xrpl_orderbook_offer_count{pair="<base>/<quote>", side="bid|ask"}: number of offers
Key files:
- New: docker/telemetry/otel-rippled-receiver/collectors/dex.go
Note: This is optional because it requires explicit configuration of which pools/pairs to track. Default configuration tracks no DEX data.
Task 11.8: Prometheus Alerting Rules
Objective: Create production-ready alerting rules for the metrics exported by this receiver.
What to do:
- Create docker/telemetry/prometheus/rippled-alerts.yml:

  Tier 1 (Critical, page immediately):

      - alert: XRPLServerNotFull
        expr: xrpl_server_state < 4
        for: 15m
      - alert: XRPLAmendmentBlocked
        expr: xrpl_amendment_blocked == 1
        for: 1m
      - alert: XRPLNoPeers
        expr: xrpl_peers_count == 0
        for: 5m
      - alert: XRPLLedgerStale
        expr: xrpl_validated_ledger_age_seconds > 120
        for: 2m
      - alert: XRPLHighIOLatency
        expr: xrpl_io_latency_ms > 100
        for: 5m
      - alert: XRPLUnsupportedAmendmentMajority
        expr: xrpl_amendment_unsupported_majority == 1
        for: 1m

  Tier 2 (Warning, investigate within hours):

      - alert: XRPLLowPeerCount
        expr: xrpl_peers_count < 10
        for: 15m
      - alert: XRPLHighLoadFactor
        expr: xrpl_load_factor > 10
        for: 10m
      - alert: XRPLSlowConsensus
        expr: xrpl_last_close_converge_time_seconds > 6
        for: 5m
      - alert: XRPLValidatorListExpiring
        expr: (xrpl_validator_list_expiration_seconds - time()) < 86400
        for: 1h
      - alert: XRPLClockDrift
        expr: xrpl_close_time_offset_seconds > 0
        for: 5m
      # increase() counts transitions over the window; rate() would be per second
      - alert: XRPLStateFlapping
        expr: increase(xrpl_state_transitions_total{state="full"}[1h]) > 2
        for: 30m
Key files:
- New: docker/telemetry/prometheus/rippled-alerts.yml
- Update: docker/telemetry/prometheus/prometheus.yml (add rule_files reference)
Task 11.9: New Grafana Dashboards
Objective: Create 4 new dashboards for the data exported by the receiver.
What to do:
- Validator Health (rippled-validator-health):
  - Server state timeline, state duration breakdown
  - Proposer count trend, converge time trend, validation quorum
  - Validator list expiration countdown
  - Amendment voting status (majority/enabled/vetoed)
- Network Topology (rippled-network-topology):
  - Peer count (inbound/outbound/cluster), peer version distribution
  - Peer latency distribution (p50/p95/p99), diverged peer count
  - Geographic distribution (if enriched with GeoIP)
  - Peer uptime distribution
- Fee Market (rippled-fee-market-external):
  - Current fee levels (open ledger, median, minimum), fee escalation timeline
  - Queue depth vs. capacity, transactions per ledger
  - Load factor breakdown (server/network/cluster/escalation)
- DEX & AMM Overview (rippled-dex-amm) (only populated when DEX collectors are configured):
  - AMM pool TVL, reserve ratios, LP token supply
  - Order book depth per pair, spread trends
  - Trading fee revenue estimates
Key files:
- New: docker/telemetry/grafana/dashboards/rippled-validator-health.json
- New: docker/telemetry/grafana/dashboards/rippled-network-topology.json
- New: docker/telemetry/grafana/dashboards/rippled-fee-market-external.json
- New: docker/telemetry/grafana/dashboards/rippled-dex-amm.json
Task 11.10: Integration with Phase 10 Validation
Objective: Extend the Phase 10 validation suite to verify this receiver's metrics.
What to do:
- Update docker/telemetry/workload/validate_telemetry.py:
  - Add assertions for all xrpl_* metrics produced by the receiver
  - Verify metric labels have expected values
  - Verify alerting rules fire correctly (inject a "bad" state and check alert)
- Update docker/telemetry/docker-compose.workload.yaml:
  - Add the custom OTel Collector build with the rippled receiver
  - Configure the receiver to poll one of the test nodes
Key files:
- Update: docker/telemetry/workload/validate_telemetry.py
- Update: docker/telemetry/docker-compose.workload.yaml
- Update: docker/telemetry/workload/expected_metrics.json
Task 11.11: Documentation
Objective: Document the receiver, its metrics, deployment, and alerting.
What to do:
- Create docker/telemetry/otel-rippled-receiver/README.md:
  - Architecture overview (how the receiver fits into the OTel Collector)
  - Configuration reference (all config options with defaults)
  - Metric reference table (all exported metrics with types and labels)
  - Deployment guide (building custom collector binary, docker-compose integration)
- Update OpenTelemetryPlan/09-data-collection-reference.md:
  - Add "Third-Party Metrics (OTel Collector Receiver)" section
  - Add new Grafana dashboard reference (4 dashboards)
  - Add alerting rules reference
- Update docs/telemetry-runbook.md:
  - Add "Third-Party Metrics Receiver" troubleshooting section
  - Add alerting playbook (what to do for each Tier 1/Tier 2 alert)
Effort Summary
| Task | Description | Effort | Risk |
|---|---|---|---|
| 11.1 | OTel Collector receiver scaffold | 1.5d | Medium |
| 11.2 | server_info / server_state collector | 2d | Low |
| 11.3 | get_counts collector | 1.5d | Low |
| 11.4 | Peer topology collector | 1.5d | Medium |
| 11.5 | Validator & amendment collector | 1d | Low |
| 11.6 | Fee & TxQ collector | 0.5d | Low |
| 11.7 | DEX & AMM collector (optional) | 1.5d | Medium |
| 11.8 | Prometheus alerting rules | 1d | Low |
| 11.9 | New Grafana dashboards (4) | 2d | Low |
| 11.10 | Integration with Phase 10 validation | 1d | Low |
| 11.11 | Documentation | 1d | Low |
Total Effort: 15 days (the per-task estimates above sum to 14.5d)
Exit Criteria
- Custom OTel Collector receiver builds and starts without errors
- All xrpl_* metrics from server_info, get_counts, peers, validators, fee appear in Prometheus
- Metrics update at configured poll interval (default 30s)
- 4 new Grafana dashboards operational with data
- Prometheus alerting rules fire correctly for simulated failure conditions
- DEX/AMM collector works when configured (optional — not required for base exit criteria)
- Phase 10 validation suite passes with receiver metrics included
- Receiver handles rippled restart/unavailability gracefully (no crash, logs warning, retries)
- Documentation complete: receiver README, metric reference, alerting playbook
- Go receiver has unit tests with >80% coverage
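The graceful-unavailability criterion (no crash, log a warning, retry) can be illustrated with a ticker-driven poll loop. This is a standard-library sketch under stated assumptions: pollFunc, runPoller and demoFailures are hypothetical names, the failure is simulated, and a real receiver would hook into the Collector's start and shutdown lifecycle rather than a bare context.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

// pollFunc performs one polling round against rippled.
type pollFunc func(ctx context.Context) error

// runPoller calls poll on every tick until ctx is cancelled. A failed poll is
// logged as a warning and retried on the next tick instead of ending the loop.
// It returns the number of failed polls.
func runPoller(ctx context.Context, interval time.Duration, poll pollFunc) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	failures := 0
	for {
		select {
		case <-ctx.Done():
			return failures
		case <-ticker.C:
			if err := poll(ctx); err != nil {
				failures++
				log.Printf("warning: poll failed, retrying in %s: %v", interval, err)
			}
		}
	}
}

// demoFailures simulates rippled being unreachable for the first two polls,
// then recovering; the demo stops itself on the fourth poll.
func demoFailures() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	calls := 0
	return runPoller(ctx, time.Millisecond, func(context.Context) error {
		calls++
		switch {
		case calls <= 2:
			return errors.New("connection refused")
		case calls == 4:
			cancel() // rippled has recovered; end the demonstration
		}
		return nil
	})
}

func main() {
	fmt.Println("failed polls:", demoFailures())
}
```

Retrying on the next tick, rather than in a tight loop, keeps a restarting rippled node from being hammered with requests; whether to add backoff on repeated failures is left open here.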