mirror of
https://github.com/XRPLF/rippled.git
synced 2026-04-29 15:37:57 +00:00
feat(telemetry): add Phase 5 documentation, deployment configs, and integration tests
Add the observability stack deployment infrastructure and integration test framework for verifying end-to-end trace export. - Add Grafana dashboards: RPC performance, transaction overview, consensus health (pre-provisioned via dashboards.yaml) - Add Prometheus config for spanmetrics collection from OTel Collector - Update OTel Collector config with spanmetrics connector and prometheus exporter for RED metrics - Add docker-compose services: prometheus, dashboard provisioning - Add integration-test.sh with Tempo API-based span verification (replaces previous Jaeger-based approach) - Add TESTING.md with step-by-step deployment and verification guide - Add telemetry-runbook.md for production operations reference - Add xrpld-telemetry.cfg sample configuration - Add toDisplayString() for ConsensusMode (human-readable span values) - Update Phase 2/3 task lists with known issues sections - Add Phase 5 integration test task list - Add TraceContext protobuf fields for future relay propagation - Wire telemetry lifecycle (setServiceInstanceId/start/stop) in Application.cpp Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2
docker/telemetry/.gitignore
vendored
Normal file
2
docker/telemetry/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
|
||||
# Runtime data generated by xrpld and telemetry stack
|
||||
data/
|
||||
512
docker/telemetry/TESTING.md
Normal file
512
docker/telemetry/TESTING.md
Normal file
@@ -0,0 +1,512 @@
|
||||
# OpenTelemetry Integration Testing Guide
|
||||
|
||||
This document describes how to verify the rippled OpenTelemetry telemetry
|
||||
pipeline end-to-end, from span generation through the observability stack
|
||||
(otel-collector, Tempo, Prometheus, Grafana).
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
### Build xrpld with telemetry
|
||||
|
||||
```bash
|
||||
conan install . --build=missing -o telemetry=True
|
||||
cmake --preset default -Dtelemetry=ON
|
||||
cmake --build --preset default --target xrpld
|
||||
```
|
||||
|
||||
The binary is at `.build/xrpld`.
|
||||
|
||||
### Required tools
|
||||
|
||||
- **Docker** with `docker compose` (v2)
|
||||
- **curl**
|
||||
- **jq** (JSON processor)
|
||||
|
||||
### Verify binary
|
||||
|
||||
```bash
|
||||
.build/xrpld --version
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Test 1: Single-Node Standalone (Quick Verification)
|
||||
|
||||
This test verifies RPC and transaction spans in standalone mode. Consensus
|
||||
spans will not fire because standalone mode does not run consensus.
|
||||
|
||||
### Step 1: Start the observability stack
|
||||
|
||||
```bash
|
||||
docker compose -f docker/telemetry/docker-compose.yml up -d
|
||||
```
|
||||
|
||||
Wait for services to be ready:
|
||||
|
||||
```bash
|
||||
# otel-collector health
|
||||
curl -sf http://localhost:13133/ && echo "collector ready"
|
||||
|
||||
# Tempo readiness
|
||||
curl -sf http://localhost:3200/ready > /dev/null && echo "tempo ready"
|
||||
```
|
||||
|
||||
### Step 2: Start xrpld in standalone mode
|
||||
|
||||
```bash
|
||||
.build/xrpld --conf docker/telemetry/xrpld-telemetry.cfg -a --start
|
||||
```
|
||||
|
||||
Wait a few seconds for the node to initialize.
|
||||
|
||||
### Step 3: Exercise RPC spans
|
||||
|
||||
```bash
|
||||
# server_info
|
||||
curl -s http://localhost:5005 \
|
||||
-d '{"method":"server_info"}' | jq .result.info.server_state
|
||||
|
||||
# server_state
|
||||
curl -s http://localhost:5005 \
|
||||
-d '{"method":"server_state"}' | jq .result.state.server_state
|
||||
|
||||
# ledger
|
||||
curl -s http://localhost:5005 \
|
||||
-d '{"method":"ledger","params":[{"ledger_index":"current"}]}' \
|
||||
| jq .result.ledger_current_index
|
||||
```
|
||||
|
||||
### Step 4: Submit a transaction
|
||||
|
||||
Close the ledger first (required in standalone mode):
|
||||
|
||||
```bash
|
||||
curl -s http://localhost:5005 -d '{"method":"ledger_accept"}'
|
||||
```
|
||||
|
||||
Submit a Payment from the genesis account:
|
||||
|
||||
```bash
|
||||
curl -s http://localhost:5005 -d '{
|
||||
"method": "submit",
|
||||
"params": [{
|
||||
"secret": "snoPBrXtMeMyMHUVTgbuqAfg1SUTb",
|
||||
"tx_json": {
|
||||
"TransactionType": "Payment",
|
||||
"Account": "rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh",
|
||||
"Destination": "rPMh7Pi9ct699iZUTWzJaUMR1o42VEfGqF",
|
||||
"Amount": "10000000"
|
||||
}
|
||||
}]
|
||||
}' | jq .result.engine_result
|
||||
```
|
||||
|
||||
Expected result: `"tesSUCCESS"`.
|
||||
|
||||
Close the ledger again to finalize:
|
||||
|
||||
```bash
|
||||
curl -s http://localhost:5005 -d '{"method":"ledger_accept"}'
|
||||
```
|
||||
|
||||
### Step 5: Verify traces in Tempo
|
||||
|
||||
Wait 5 seconds for the batch export, then:
|
||||
|
||||
```bash
|
||||
TEMPO="http://localhost:3200"
|
||||
|
||||
# Check rippled service is registered
|
||||
curl -s "$TEMPO/api/v2/search/tag/resource.service.name/values" | jq '.tagValues[].value'
|
||||
|
||||
# Check RPC spans
|
||||
curl -s "$TEMPO/api/search" \
|
||||
--data-urlencode 'q={resource.service.name="rippled" && name="rpc.request"}' \
|
||||
--data-urlencode 'limit=5' | jq '.traces | length'
|
||||
|
||||
curl -s "$TEMPO/api/search" \
|
||||
--data-urlencode 'q={resource.service.name="rippled" && name="rpc.process"}' \
|
||||
--data-urlencode 'limit=5' | jq '.traces | length'
|
||||
|
||||
curl -s "$TEMPO/api/search" \
|
||||
--data-urlencode 'q={resource.service.name="rippled" && name="rpc.command.server_info"}' \
|
||||
--data-urlencode 'limit=5' | jq '.traces | length'
|
||||
|
||||
# Check transaction spans
|
||||
curl -s "$TEMPO/api/search" \
|
||||
--data-urlencode 'q={resource.service.name="rippled" && name="tx.process"}' \
|
||||
--data-urlencode 'limit=5' | jq '.traces | length'
|
||||
```
|
||||
|
||||
Or open Grafana Explore with Tempo datasource: http://localhost:3000
|
||||
|
||||
### Step 6: Teardown
|
||||
|
||||
```bash
|
||||
# Kill xrpld (Ctrl+C or)
|
||||
kill $(pgrep -f 'xrpld.*xrpld-telemetry')
|
||||
|
||||
# Stop observability stack
|
||||
docker compose -f docker/telemetry/docker-compose.yml down
|
||||
|
||||
# Clean xrpld data
|
||||
rm -rf data/
|
||||
```
|
||||
|
||||
### Expected spans (standalone mode)
|
||||
|
||||
| Span Name | Expected | Notes |
|
||||
| --------------------------- | -------- | ----------------------------- |
|
||||
| `rpc.request` | Yes | Every HTTP RPC call |
|
||||
| `rpc.process` | Yes | Every RPC processing |
|
||||
| `rpc.command.server_info` | Yes | server_info RPC |
|
||||
| `rpc.command.server_state` | Yes | server_state RPC |
|
||||
| `rpc.command.ledger` | Yes | ledger RPC |
|
||||
| `rpc.command.submit` | Yes | submit RPC |
|
||||
| `rpc.command.ledger_accept` | Yes | ledger_accept RPC |
|
||||
| `tx.process` | Yes | Transaction submission |
|
||||
| `tx.receive` | No | No peers in standalone |
|
||||
| `consensus.*` | No | Consensus disabled standalone |
|
||||
|
||||
---
|
||||
|
||||
## Test 2: 6-Node Consensus Network (Full Verification)
|
||||
|
||||
This test verifies ALL span categories including consensus and peer
|
||||
transaction relay, using a 6-node validator network.
|
||||
|
||||
### Automated
|
||||
|
||||
Run the integration test script:
|
||||
|
||||
```bash
|
||||
bash docker/telemetry/integration-test.sh
|
||||
```
|
||||
|
||||
The script will:
|
||||
|
||||
1. Start the observability stack
|
||||
2. Generate 6 validator key pairs
|
||||
3. Create config files for each node
|
||||
4. Start all 6 nodes
|
||||
5. Wait for consensus ("proposing" state)
|
||||
6. Exercise RPC, submit transactions
|
||||
7. Verify all span categories in Tempo
|
||||
8. Verify spanmetrics in Prometheus
|
||||
9. Print results and leave the stack running
|
||||
|
||||
### Manual
|
||||
|
||||
If you prefer to run the steps manually:
|
||||
|
||||
#### Step 1: Start observability stack
|
||||
|
||||
```bash
|
||||
docker compose -f docker/telemetry/docker-compose.yml up -d
|
||||
```
|
||||
|
||||
#### Step 2: Generate validator keys
|
||||
|
||||
Start a temporary standalone xrpld:
|
||||
|
||||
```bash
|
||||
.build/xrpld --conf docker/telemetry/xrpld-telemetry.cfg -a --start &
|
||||
TEMP_PID=$!
|
||||
sleep 5
|
||||
```
|
||||
|
||||
Generate 6 key pairs:
|
||||
|
||||
```bash
|
||||
for i in $(seq 1 6); do
|
||||
curl -s http://localhost:5005 \
|
||||
-d '{"method":"validation_create"}' | jq '.result'
|
||||
done
|
||||
```
|
||||
|
||||
Record the `validation_seed` and `validation_public_key` for each.
|
||||
Kill the temporary node:
|
||||
|
||||
```bash
|
||||
kill $TEMP_PID
|
||||
rm -rf data/
|
||||
```
|
||||
|
||||
#### Step 3: Create node configs
|
||||
|
||||
For each node (1-6), create a config file. Template:
|
||||
|
||||
```ini
|
||||
[server]
|
||||
port_rpc
|
||||
port_peer
|
||||
|
||||
[port_rpc]
|
||||
port = {5004 + node_number}
|
||||
ip = 127.0.0.1
|
||||
admin = 127.0.0.1
|
||||
protocol = http
|
||||
|
||||
[port_peer]
|
||||
port = {51234 + node_number}
|
||||
ip = 0.0.0.0
|
||||
protocol = peer
|
||||
|
||||
[node_db]
|
||||
type=NuDB
|
||||
path=/tmp/xrpld-integration/node{N}/nudb
|
||||
online_delete=256
|
||||
|
||||
[database_path]
|
||||
/tmp/xrpld-integration/node{N}/db
|
||||
|
||||
[debug_logfile]
|
||||
/tmp/xrpld-integration/node{N}/debug.log
|
||||
|
||||
[validation_seed]
|
||||
{seed from step 2}
|
||||
|
||||
[validators_file]
|
||||
/tmp/xrpld-integration/validators.txt
|
||||
|
||||
[ips_fixed]
|
||||
127.0.0.1 51235
|
||||
127.0.0.1 51236
|
||||
127.0.0.1 51237
|
||||
127.0.0.1 51238
|
||||
127.0.0.1 51239
|
||||
127.0.0.1 51240
|
||||
|
||||
[peer_private]
|
||||
1
|
||||
|
||||
[telemetry]
|
||||
enabled=1
|
||||
endpoint=http://localhost:4318/v1/traces
|
||||
exporter=otlp_http
|
||||
sampling_ratio=1.0
|
||||
batch_size=512
|
||||
batch_delay_ms=2000
|
||||
max_queue_size=2048
|
||||
trace_rpc=1
|
||||
trace_transactions=1
|
||||
trace_consensus=1
|
||||
trace_peer=0
|
||||
trace_ledger=1
|
||||
|
||||
[rpc_startup]
|
||||
{ "command": "log_level", "severity": "warning" }
|
||||
|
||||
[ssl_verify]
|
||||
0
|
||||
```
|
||||
|
||||
#### Step 4: Create validators.txt
|
||||
|
||||
```ini
|
||||
[validators]
|
||||
{public_key_1}
|
||||
{public_key_2}
|
||||
{public_key_3}
|
||||
{public_key_4}
|
||||
{public_key_5}
|
||||
{public_key_6}
|
||||
```
|
||||
|
||||
#### Step 5: Start all 6 nodes
|
||||
|
||||
```bash
|
||||
for i in $(seq 1 6); do
|
||||
.build/xrpld --conf /tmp/xrpld-integration/node$i/xrpld.cfg --start &
|
||||
echo $! > /tmp/xrpld-integration/node$i/xrpld.pid
|
||||
done
|
||||
```
|
||||
|
||||
#### Step 6: Wait for consensus
|
||||
|
||||
Poll each node until `server_state` = `"proposing"`:
|
||||
|
||||
```bash
|
||||
for port in 5005 5006 5007 5008 5009 5010; do
|
||||
while true; do
|
||||
state=$(curl -s http://localhost:$port \
|
||||
-d '{"method":"server_info"}' \
|
||||
| jq -r '.result.info.server_state')
|
||||
echo "Port $port: $state"
|
||||
[ "$state" = "proposing" ] && break
|
||||
sleep 5
|
||||
done
|
||||
done
|
||||
```
|
||||
|
||||
#### Step 7: Exercise RPC and submit transaction
|
||||
|
||||
```bash
|
||||
# RPC calls
|
||||
curl -s http://localhost:5005 -d '{"method":"server_info"}'
|
||||
curl -s http://localhost:5005 -d '{"method":"server_state"}'
|
||||
curl -s http://localhost:5005 -d '{"method":"ledger","params":[{"ledger_index":"current"}]}'
|
||||
|
||||
# Submit transaction
|
||||
curl -s http://localhost:5005 -d '{
|
||||
"method": "submit",
|
||||
"params": [{
|
||||
"secret": "snoPBrXtMeMyMHUVTgbuqAfg1SUTb",
|
||||
"tx_json": {
|
||||
"TransactionType": "Payment",
|
||||
"Account": "rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh",
|
||||
"Destination": "rPMh7Pi9ct699iZUTWzJaUMR1o42VEfGqF",
|
||||
"Amount": "10000000"
|
||||
}
|
||||
}]
|
||||
}'
|
||||
```
|
||||
|
||||
Wait 15 seconds for consensus and batch export.
|
||||
|
||||
#### Step 8: Verify in Tempo
|
||||
|
||||
See the "Verification Queries" section below.
|
||||
|
||||
---
|
||||
|
||||
## Expected Span Catalog
|
||||
|
||||
All 12 production span names instrumented across Phases 2-4:
|
||||
|
||||
| Span Name | Source File | Phase | Key Attributes | How to Trigger |
|
||||
| --------------------------- | --------------------- | ----- | --------------------------------------------------------------------------------- | ------------------------- |
|
||||
| `rpc.request` | ServerHandler.cpp:271 | 2 | -- | Any HTTP RPC call |
|
||||
| `rpc.process` | ServerHandler.cpp:573 | 2 | -- | Any HTTP RPC call |
|
||||
| `rpc.ws_message` | ServerHandler.cpp:384 | 2 | -- | WebSocket RPC message |
|
||||
| `rpc.command.<name>` | RPCHandler.cpp:161 | 2 | `xrpl.rpc.command`, `xrpl.rpc.version`, `xrpl.rpc.role` | Any RPC command |
|
||||
| `tx.process` | NetworkOPs.cpp:1227 | 3 | `xrpl.tx.hash`, `xrpl.tx.local`, `xrpl.tx.path` | Submit transaction |
|
||||
| `tx.receive` | PeerImp.cpp:1273 | 3 | `xrpl.peer.id` | Peer relays transaction |
|
||||
| `consensus.proposal.send` | RCLConsensus.cpp:177 | 4 | `xrpl.consensus.round` | Consensus proposing phase |
|
||||
| `consensus.ledger_close` | RCLConsensus.cpp:282 | 4 | `xrpl.consensus.ledger.seq`, `xrpl.consensus.mode` | Ledger close event |
|
||||
| `consensus.accept` | RCLConsensus.cpp:395 | 4 | `xrpl.consensus.proposers`, `xrpl.consensus.round_time_ms` | Ledger accepted |
|
||||
| `consensus.validation.send` | RCLConsensus.cpp:753 | 4 | `xrpl.consensus.ledger.seq`, `xrpl.consensus.proposing` | Validation sent |
|
||||
| `consensus.accept.apply` | RCLConsensus.cpp:453 | 4 | `xrpl.consensus.close_time`, `close_time_correct`, `close_resolution_ms`, `state` | Ledger apply + close time |
|
||||
|
||||
---
|
||||
|
||||
## Verification Queries
|
||||
|
||||
### Tempo API
|
||||
|
||||
Base URL: `http://localhost:3200`
|
||||
|
||||
```bash
|
||||
TEMPO="http://localhost:3200"
|
||||
|
||||
# List all services
|
||||
curl -s "$TEMPO/api/v2/search/tag/resource.service.name/values" | jq '.tagValues[].value'
|
||||
|
||||
# Query traces by operation
|
||||
for op in "rpc.request" "rpc.process" \
|
||||
"rpc.command.server_info" "rpc.command.server_state" "rpc.command.ledger" \
|
||||
"tx.process" "tx.receive" \
|
||||
"consensus.proposal.send" "consensus.ledger_close" \
|
||||
"consensus.accept" "consensus.accept.apply" \
|
||||
"consensus.validation.send"; do
|
||||
count=$(curl -s "$TEMPO/api/search" \
|
||||
--data-urlencode "q={resource.service.name=\"rippled\" && name=\"$op\"}" \
|
||||
--data-urlencode "limit=5" \
|
||||
| jq '.traces | length')
|
||||
printf "%-35s %s traces\n" "$op" "$count"
|
||||
done
|
||||
```
|
||||
|
||||
### Prometheus API
|
||||
|
||||
Base URL: `http://localhost:9090`
|
||||
|
||||
```bash
|
||||
PROM="http://localhost:9090"
|
||||
|
||||
# Span call counts (from spanmetrics connector)
|
||||
curl -s "$PROM/api/v1/query?query=traces_span_metrics_calls_total" \
|
||||
| jq '.data.result[] | {span: .metric.span_name, count: .value[1]}'
|
||||
|
||||
# Latency histogram
|
||||
curl -s "$PROM/api/v1/query?query=traces_span_metrics_duration_milliseconds_count" \
|
||||
| jq '.data.result[] | {span: .metric.span_name, count: .value[1]}'
|
||||
|
||||
# RPC calls by command
|
||||
curl -s "$PROM/api/v1/query?query=traces_span_metrics_calls_total{span_name=~\"rpc.command.*\"}" \
|
||||
| jq '.data.result[] | {command: .metric["xrpl.rpc.command"], count: .value[1]}'
|
||||
```
|
||||
|
||||
### Grafana
|
||||
|
||||
Open http://localhost:3000 (anonymous admin access enabled).
|
||||
|
||||
Pre-configured dashboards:
|
||||
|
||||
- **RPC Performance**: Request rates, latency percentiles by command
|
||||
- **Transaction Overview**: Transaction processing rates and paths
|
||||
- **Consensus Health**: Consensus round duration and proposer counts
|
||||
|
||||
Pre-configured datasources:
|
||||
|
||||
- **Tempo**: Trace data at `http://tempo:3200`
|
||||
- **Prometheus**: Metrics at `http://prometheus:9090`
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### No traces in Tempo
|
||||
|
||||
1. Check otel-collector logs:
|
||||
```bash
|
||||
docker compose -f docker/telemetry/docker-compose.yml logs otel-collector
|
||||
```
|
||||
2. Verify xrpld telemetry config has `enabled=1` and correct endpoint
|
||||
3. Check that otel-collector port 4318 is accessible:
|
||||
```bash
|
||||
curl -sf http://localhost:4318 && echo "reachable"
|
||||
```
|
||||
4. Increase `batch_delay_ms` or decrease `batch_size` in xrpld config
|
||||
|
||||
### Nodes not reaching "proposing" state
|
||||
|
||||
1. Check that all peer ports (51235-51240) are not in use:
|
||||
```bash
|
||||
for p in 51235 51236 51237 51238 51239 51240; do
|
||||
ss -tlnp | grep ":$p " && echo "port $p in use"
|
||||
done
|
||||
```
|
||||
2. Verify `[ips_fixed]` lists all 6 peer ports
|
||||
3. Verify `validators.txt` has all 6 public keys
|
||||
4. Check node debug logs: `tail -50 /tmp/xrpld-integration/node1/debug.log`
|
||||
5. Ensure `[peer_private]` is set to `1` (prevents reaching out to public network)
|
||||
|
||||
### Transaction not processing
|
||||
|
||||
1. Verify genesis account exists:
|
||||
```bash
|
||||
curl -s http://localhost:5005 \
|
||||
-d '{"method":"account_info","params":[{"account":"rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh"}]}' \
|
||||
| jq .result.account_data.Balance
|
||||
```
|
||||
2. Check submit response for error codes
|
||||
3. In standalone mode, remember to call `ledger_accept` after submitting
|
||||
|
||||
### Spanmetrics not appearing in Prometheus
|
||||
|
||||
1. Verify otel-collector config has `spanmetrics` connector
|
||||
2. Check that the metrics pipeline is configured:
|
||||
```yaml
|
||||
service:
|
||||
pipelines:
|
||||
metrics:
|
||||
receivers: [spanmetrics]
|
||||
exporters: [prometheus]
|
||||
```
|
||||
3. Verify Prometheus can reach collector:
|
||||
```bash
|
||||
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets'
|
||||
```
|
||||
@@ -7,7 +7,7 @@
|
||||
# - tempo: Grafana Tempo tracing backend, queryable via Grafana Explore
|
||||
# on port 3000. Recommended for production (S3/GCS storage, TraceQL).
|
||||
# - grafana: dashboards on port 3000, pre-configured with Tempo
|
||||
# datasource.
|
||||
# and Prometheus datasources.
|
||||
#
|
||||
# Usage:
|
||||
# docker compose -f docker/telemetry/docker-compose.yml up -d
|
||||
@@ -26,6 +26,7 @@ services:
|
||||
ports:
|
||||
- "4317:4317" # OTLP gRPC receiver
|
||||
- "4318:4318" # OTLP HTTP receiver (xrpld sends traces here)
|
||||
- "8889:8889" # Prometheus metrics (spanmetrics)
|
||||
- "13133:13133" # Health check endpoint
|
||||
volumes:
|
||||
# Mount collector pipeline config (receivers → processors → exporters)
|
||||
@@ -50,6 +51,17 @@ services:
|
||||
networks:
|
||||
- xrpld-telemetry
|
||||
|
||||
prometheus:
|
||||
image: prom/prometheus:latest
|
||||
ports:
|
||||
- "9090:9090"
|
||||
volumes:
|
||||
- ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
|
||||
depends_on:
|
||||
- otel-collector
|
||||
networks:
|
||||
- xrpld-telemetry
|
||||
|
||||
# Grafana: visualization UI with Tempo pre-configured as a datasource.
|
||||
# Anonymous admin access enabled for local development convenience.
|
||||
grafana:
|
||||
@@ -62,8 +74,10 @@ services:
|
||||
volumes:
|
||||
# Auto-provision Tempo datasource and search filters on startup
|
||||
- ./grafana/provisioning:/etc/grafana/provisioning:ro
|
||||
- ./grafana/dashboards:/var/lib/grafana/dashboards:ro
|
||||
depends_on:
|
||||
- tempo
|
||||
- prometheus
|
||||
networks:
|
||||
- xrpld-telemetry
|
||||
|
||||
|
||||
244
docker/telemetry/grafana/dashboards/consensus-health.json
Normal file
244
docker/telemetry/grafana/dashboards/consensus-health.json
Normal file
@@ -0,0 +1,244 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 1,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"panels": [
|
||||
{
|
||||
"title": "Consensus Round Duration",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"consensus.accept\"}[5m])))",
|
||||
"legendFormat": "P95 Round Duration"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.50, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"consensus.accept\"}[5m])))",
|
||||
"legendFormat": "P50 Round Duration"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ms"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Consensus Proposals Sent Rate",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", span_name=\"consensus.proposal.send\"}[5m]))",
|
||||
"legendFormat": "Proposals / Sec"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Ledger Close Duration",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"consensus.ledger_close\"}[5m])))",
|
||||
"legendFormat": "P95 Close Duration"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ms"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Validation Send Rate",
|
||||
"type": "stat",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", span_name=\"consensus.validation.send\"}[5m]))",
|
||||
"legendFormat": "Validations / Sec"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Ledger Apply Duration (doAccept)",
|
||||
"description": "Time spent applying the consensus result to build a new ledger. Measured by the consensus.accept.apply span in doAccept().",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 16
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"consensus.accept.apply\"}[5m])))",
|
||||
"legendFormat": "P95 Apply Duration"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.50, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"consensus.accept.apply\"}[5m])))",
|
||||
"legendFormat": "P50 Apply Duration"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ms",
|
||||
"custom": {
|
||||
"axisLabel": "Duration (ms)",
|
||||
"spanNulls": true,
|
||||
"insertNulls": false,
|
||||
"showPoints": "auto",
|
||||
"pointSize": 3
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Close Time Agreement",
|
||||
"description": "Rate of close time agreement vs disagreement across consensus rounds. Based on xrpl.consensus.close_time_correct attribute (true = validators agreed, false = agreed to disagree per avCT_CONSENSUS_PCT).",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 16
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", span_name=\"consensus.accept.apply\"}[5m]))",
|
||||
"legendFormat": "Total Rounds / Sec"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ops",
|
||||
"custom": {
|
||||
"axisLabel": "Rounds / Sec",
|
||||
"spanNulls": true,
|
||||
"insertNulls": false,
|
||||
"showPoints": "auto",
|
||||
"pointSize": 3
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
}
|
||||
],
|
||||
"schemaVersion": 39,
|
||||
"tags": ["rippled", "consensus", "telemetry"],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"name": "node",
|
||||
"label": "Node",
|
||||
"description": "Filter by rippled node (service.instance.id \u2014 e.g. Node-1)",
|
||||
"type": "query",
|
||||
"query": "label_values(traces_span_metrics_calls_total, exported_instance)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"includeAll": true,
|
||||
"allValue": ".*",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"multi": true,
|
||||
"refresh": 2,
|
||||
"sort": 1
|
||||
},
|
||||
{
|
||||
"name": "consensus_mode",
|
||||
"label": "Consensus Mode",
|
||||
"description": "Filter by consensus mode (Proposing, Observing, Wrong Ledger, Switched Ledger)",
|
||||
"type": "query",
|
||||
"query": "label_values(traces_span_metrics_calls_total{span_name=\"consensus.ledger_close\"}, xrpl_consensus_mode)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"includeAll": true,
|
||||
"allValue": ".*",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"multi": true,
|
||||
"refresh": 2,
|
||||
"sort": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"title": "rippled Consensus Health",
|
||||
"uid": "rippled-consensus"
|
||||
}
|
||||
189
docker/telemetry/grafana/dashboards/rpc-performance.json
Normal file
189
docker/telemetry/grafana/dashboards/rpc-performance.json
Normal file
@@ -0,0 +1,189 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 1,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"panels": [
|
||||
{
|
||||
"title": "RPC Request Rate by Command",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{xrpl_rpc_command=~\"$command\", exported_instance=~\"$node\", span_name=~\"rpc.command.*\"}[5m]))",
|
||||
"legendFormat": "{{xrpl_rpc_command}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "reqps",
|
||||
"custom": {
|
||||
"axisLabel": "Requests / Sec",
|
||||
"spanNulls": true,
|
||||
"insertNulls": false,
|
||||
"showPoints": "auto",
|
||||
"pointSize": 3
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "RPC Latency P95 by Command",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, sum by (le, xrpl_rpc_command) (rate(traces_span_metrics_duration_milliseconds_bucket{xrpl_rpc_command=~\"$command\", exported_instance=~\"$node\", span_name=~\"rpc.command.*\"}[5m])))",
|
||||
"legendFormat": "P95 {{xrpl_rpc_command}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ms",
|
||||
"custom": {
|
||||
"axisLabel": "Latency (ms)",
|
||||
"spanNulls": true,
|
||||
"insertNulls": false,
|
||||
"showPoints": "auto",
|
||||
"pointSize": 3
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "RPC Error Rate",
|
||||
"type": "bargauge",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{xrpl_rpc_command=~\"$command\", exported_instance=~\"$node\", span_name=~\"rpc.command.*\", status_code=\"STATUS_CODE_ERROR\"}[5m])) / sum by (xrpl_rpc_command) (rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", xrpl_rpc_command=~\"$command\", span_name=~\"rpc.command.*\"}[5m])) * 100",
|
||||
"legendFormat": "{{xrpl_rpc_command}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent",
|
||||
"thresholds": {
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 1
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 5
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "RPC Latency Heatmap",
|
||||
"type": "heatmap",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(increase(traces_span_metrics_duration_milliseconds_bucket{xrpl_rpc_command=~\"$command\", exported_instance=~\"$node\", span_name=~\"rpc.command.*\"}[5m])) by (le)",
|
||||
"legendFormat": "{{le}}",
|
||||
"format": "heatmap"
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"schemaVersion": 39,
|
||||
"tags": ["rippled", "rpc", "telemetry"],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"name": "node",
|
||||
"label": "Node",
|
||||
"description": "Filter by rippled node (service.instance.id \u2014 e.g. Node-1)",
|
||||
"type": "query",
|
||||
"query": "label_values(traces_span_metrics_calls_total, exported_instance)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"includeAll": true,
|
||||
"allValue": ".*",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"multi": true,
|
||||
"refresh": 2,
|
||||
"sort": 1
|
||||
},
|
||||
{
|
||||
"name": "command",
|
||||
"label": "RPC Command",
|
||||
"description": "Filter by RPC command name (e.g., server_info, submit)",
|
||||
"type": "query",
|
||||
"query": "label_values(traces_span_metrics_calls_total{span_name=~\"rpc.command.*\"}, xrpl_rpc_command)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"includeAll": true,
|
||||
"allValue": ".*",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"multi": true,
|
||||
"refresh": 2,
|
||||
"sort": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"title": "rippled RPC Performance",
|
||||
"uid": "rippled-rpc-perf"
|
||||
}
|
||||
172
docker/telemetry/grafana/dashboards/transaction-overview.json
Normal file
172
docker/telemetry/grafana/dashboards/transaction-overview.json
Normal file
@@ -0,0 +1,172 @@
|
||||
{
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 1,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"panels": [
|
||||
{
|
||||
"title": "Transaction Processing Rate",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", span_name=\"tx.process\"}[5m]))",
|
||||
"legendFormat": "tx.process/sec"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", span_name=\"tx.receive\"}[5m]))",
|
||||
"legendFormat": "tx.receive/sec"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Transaction Processing Latency",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.95, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"tx.process\"}[5m])))",
|
||||
"legendFormat": "p95"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "histogram_quantile(0.50, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{exported_instance=~\"$node\", span_name=\"tx.process\"}[5m])))",
|
||||
"legendFormat": "p50"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ms"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
},
|
||||
{
|
||||
"title": "Transaction Path Distribution",
|
||||
"type": "piechart",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum by (xrpl_tx_local) (rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", xrpl_tx_local=~\"$tx_origin\", span_name=\"tx.process\"}[5m]))",
|
||||
"legendFormat": "local={{xrpl_tx_local}}"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"title": "Transaction Receive vs Suppressed",
|
||||
"type": "timeseries",
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus"
|
||||
},
|
||||
"expr": "sum(rate(traces_span_metrics_calls_total{exported_instance=~\"$node\", span_name=\"tx.receive\"}[5m]))",
|
||||
"legendFormat": "total received"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "ops"
|
||||
},
|
||||
"overrides": []
|
||||
}
|
||||
}
|
||||
],
|
||||
"schemaVersion": 39,
|
||||
"tags": ["rippled", "transactions", "telemetry"],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"name": "node",
|
||||
"label": "Node",
|
||||
"description": "Filter by rippled node (service.instance.id \u2014 e.g. Node-1)",
|
||||
"type": "query",
|
||||
"query": "label_values(traces_span_metrics_calls_total, exported_instance)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"includeAll": true,
|
||||
"allValue": ".*",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"multi": true,
|
||||
"refresh": 2,
|
||||
"sort": 1
|
||||
},
|
||||
{
|
||||
"name": "tx_origin",
|
||||
"label": "TX Origin",
|
||||
"description": "Filter by transaction origin (true = local submit, false = peer relay)",
|
||||
"type": "query",
|
||||
"query": "label_values(traces_span_metrics_calls_total{span_name=\"tx.process\"}, xrpl_tx_local)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "prometheus"
|
||||
},
|
||||
"includeAll": true,
|
||||
"allValue": ".*",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"multi": true,
|
||||
"refresh": 2,
|
||||
"sort": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"title": "rippled Transaction Overview",
|
||||
"uid": "rippled-transactions"
|
||||
}
|
||||
@@ -0,0 +1,12 @@
|
||||
apiVersion: 1
|
||||
|
||||
providers:
|
||||
- name: rippled-telemetry
|
||||
orgId: 1
|
||||
folder: rippled
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards
|
||||
foldersFromFilesStructure: false
|
||||
@@ -0,0 +1,10 @@
|
||||
apiVersion: 1
|
||||
|
||||
datasources:
|
||||
- name: Prometheus
|
||||
type: prometheus
|
||||
uid: prometheus
|
||||
access: proxy
|
||||
url: http://prometheus:9090
|
||||
isDefault: true
|
||||
editable: true
|
||||
@@ -40,9 +40,9 @@ datasources:
|
||||
operator: "="
|
||||
scope: resource
|
||||
type: static
|
||||
# service.instance.id: unique node identifier — defaults to the
|
||||
# node's public key (e.g., nHB1X37...). Distinguishes individual
|
||||
# nodes in a multi-node cluster or network.
|
||||
# service.instance.id: unique node identifier — configurable via
|
||||
# the service_instance_id setting in [telemetry], defaults to the
|
||||
# node's public key. E.g. "Node-1" or "nHB1X37...".
|
||||
- id: node-id
|
||||
tag: service.instance.id
|
||||
operator: "="
|
||||
|
||||
558
docker/telemetry/integration-test.sh
Executable file
558
docker/telemetry/integration-test.sh
Executable file
@@ -0,0 +1,558 @@
|
||||
#!/usr/bin/env bash
|
||||
# Integration test for rippled OpenTelemetry instrumentation.
|
||||
#
|
||||
# Launches a 6-node xrpld consensus network with telemetry enabled,
|
||||
# exercises RPC / transaction / consensus code paths, then verifies
|
||||
# that the expected spans and metrics appear in Tempo and Prometheus.
|
||||
#
|
||||
# Usage:
|
||||
# bash docker/telemetry/integration-test.sh
|
||||
#
|
||||
# Prerequisites:
|
||||
# - .build/xrpld built with telemetry=ON
|
||||
# - docker compose (v2)
|
||||
# - curl, jq
|
||||
#
|
||||
# The script leaves the observability stack and xrpld nodes running
|
||||
# so you can manually inspect Tempo (localhost:3200) and Grafana
|
||||
# (localhost:3000). Run with --cleanup to tear down instead.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Configuration
|
||||
# ---------------------------------------------------------------------------
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
|
||||
XRPLD="$REPO_ROOT/.build/xrpld"
|
||||
COMPOSE_FILE="$SCRIPT_DIR/docker-compose.yml"
|
||||
STANDALONE_CFG="$SCRIPT_DIR/xrpld-telemetry.cfg"
|
||||
WORKDIR="/tmp/xrpld-integration"
|
||||
NUM_NODES=6
|
||||
PEER_PORT_BASE=51235
|
||||
RPC_PORT_BASE=5005
|
||||
CONSENSUS_TIMEOUT=120
|
||||
GENESIS_ACCOUNT="rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh"
|
||||
GENESIS_SEED="snoPBrXtMeMyMHUVTgbuqAfg1SUTb"
|
||||
DEST_ACCOUNT="" # Generated dynamically via wallet_propose
|
||||
TEMPO="http://localhost:3200"
|
||||
PROM="http://localhost:9090"
|
||||
|
||||
# Counters for pass/fail
|
||||
PASS=0
|
||||
FAIL=0
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
log() { printf "\033[1;34m[INFO]\033[0m %s\n" "$*"; }
|
||||
ok() { printf "\033[1;32m[PASS]\033[0m %s\n" "$*"; PASS=$((PASS + 1)); }
|
||||
fail() { printf "\033[1;31m[FAIL]\033[0m %s\n" "$*"; FAIL=$((FAIL + 1)); }
|
||||
die() { printf "\033[1;31m[ERROR]\033[0m %s\n" "$*" >&2; exit 1; }
|
||||
|
||||
check_span() {
|
||||
local op="$1"
|
||||
local count
|
||||
count=$(curl -sf "$TEMPO/api/search" \
|
||||
--data-urlencode "q={resource.service.name=\"rippled\" && name=\"$op\"}" \
|
||||
--data-urlencode "limit=5" \
|
||||
| jq '.traces | length' 2>/dev/null || echo 0)
|
||||
if [ "$count" -gt 0 ]; then
|
||||
ok "$op ($count traces)"
|
||||
else
|
||||
fail "$op (0 traces)"
|
||||
fi
|
||||
}
|
||||
|
||||
cleanup() {
|
||||
log "Cleaning up..."
|
||||
# Kill xrpld nodes
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
local pidfile="$WORKDIR/node$i/xrpld.pid"
|
||||
if [ -f "$pidfile" ]; then
|
||||
kill "$(cat "$pidfile")" 2>/dev/null || true
|
||||
rm -f "$pidfile"
|
||||
fi
|
||||
done
|
||||
# Also kill any straggling xrpld processes from our workdir
|
||||
pkill -f "$WORKDIR" 2>/dev/null || true
|
||||
# Stop docker stack
|
||||
docker compose -f "$COMPOSE_FILE" down 2>/dev/null || true
|
||||
# Remove workdir
|
||||
rm -rf "$WORKDIR"
|
||||
log "Cleanup complete."
|
||||
}
|
||||
|
||||
# Handle --cleanup flag
|
||||
if [ "${1:-}" = "--cleanup" ]; then
|
||||
cleanup
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 0: Prerequisites
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Checking prerequisites..."
|
||||
|
||||
command -v docker >/dev/null 2>&1 || die "docker not found"
|
||||
docker compose version >/dev/null 2>&1 || die "docker compose (v2) not found"
|
||||
command -v curl >/dev/null 2>&1 || die "curl not found"
|
||||
command -v jq >/dev/null 2>&1 || die "jq not found"
|
||||
[ -x "$XRPLD" ] || die "xrpld binary not found at $XRPLD (build with telemetry=ON)"
|
||||
[ -f "$COMPOSE_FILE" ] || die "docker-compose.yml not found at $COMPOSE_FILE"
|
||||
[ -f "$STANDALONE_CFG" ] || die "xrpld-telemetry.cfg not found at $STANDALONE_CFG"
|
||||
|
||||
log "All prerequisites met."
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 1: Clean previous run
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Cleaning previous run data..."
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
pidfile="$WORKDIR/node$i/xrpld.pid"
|
||||
if [ -f "$pidfile" ]; then
|
||||
kill "$(cat "$pidfile")" 2>/dev/null || true
|
||||
fi
|
||||
done
|
||||
pkill -f "$WORKDIR" 2>/dev/null || true
|
||||
# Kill any xrpld using the standalone config (from key generation)
|
||||
pkill -f "xrpld-telemetry.cfg" 2>/dev/null || true
|
||||
sleep 2
|
||||
rm -rf "$WORKDIR"
|
||||
mkdir -p "$WORKDIR"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 2: Start observability stack
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Starting observability stack..."
|
||||
docker compose -f "$COMPOSE_FILE" up -d
|
||||
|
||||
log "Waiting for otel-collector to be ready..."
|
||||
for attempt in $(seq 1 30); do
|
||||
# The OTLP HTTP endpoint returns 405 for GET (expects POST), which
|
||||
# means it is listening. curl -sf would fail on 405, so we check
|
||||
# the HTTP status code explicitly.
|
||||
status=$(curl -so /dev/null -w '%{http_code}' http://localhost:4318/ 2>/dev/null || echo 000)
|
||||
if [ "$status" != "000" ]; then
|
||||
log "otel-collector ready (attempt $attempt, HTTP $status)."
|
||||
break
|
||||
fi
|
||||
if [ "$attempt" -eq 30 ]; then
|
||||
die "otel-collector not ready after 30s"
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
log "Waiting for Tempo to be ready..."
|
||||
for attempt in $(seq 1 30); do
|
||||
if curl -sf "$TEMPO/ready" >/dev/null 2>&1; then
|
||||
log "Tempo ready (attempt $attempt)."
|
||||
break
|
||||
fi
|
||||
if [ "$attempt" -eq 30 ]; then
|
||||
die "Tempo not ready after 30s"
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 3: Generate validator keys
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Generating $NUM_NODES validator key pairs..."
|
||||
|
||||
# Start a temporary standalone xrpld for key generation
|
||||
TEMP_DATA="$WORKDIR/temp-keygen"
|
||||
mkdir -p "$TEMP_DATA"
|
||||
|
||||
# Create a minimal temp config for key generation
|
||||
TEMP_CFG="$TEMP_DATA/xrpld.cfg"
|
||||
cat > "$TEMP_CFG" <<EOCFG
|
||||
[server]
|
||||
port_rpc_temp
|
||||
|
||||
[port_rpc_temp]
|
||||
port = 5099
|
||||
ip = 127.0.0.1
|
||||
admin = 127.0.0.1
|
||||
protocol = http
|
||||
|
||||
[node_db]
|
||||
type=NuDB
|
||||
path=$TEMP_DATA/nudb
|
||||
online_delete=256
|
||||
|
||||
[database_path]
|
||||
$TEMP_DATA/db
|
||||
|
||||
[debug_logfile]
|
||||
$TEMP_DATA/debug.log
|
||||
|
||||
[ssl_verify]
|
||||
0
|
||||
EOCFG
|
||||
|
||||
"$XRPLD" --conf "$TEMP_CFG" -a --start > "$TEMP_DATA/stdout.log" 2>&1 &
|
||||
TEMP_PID=$!
|
||||
log "Temporary xrpld started (PID $TEMP_PID), waiting for RPC..."
|
||||
|
||||
for attempt in $(seq 1 30); do
|
||||
if curl -sf http://localhost:5099 -d '{"method":"server_info"}' >/dev/null 2>&1; then
|
||||
log "Temporary xrpld RPC ready (attempt $attempt)."
|
||||
break
|
||||
fi
|
||||
if [ "$attempt" -eq 30 ]; then
|
||||
kill "$TEMP_PID" 2>/dev/null || true
|
||||
die "Temporary xrpld RPC not ready after 30s"
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
declare -a SEEDS
|
||||
declare -a PUBKEYS
|
||||
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
result=$(curl -sf http://localhost:5099 -d '{"method":"validation_create"}')
|
||||
seed=$(echo "$result" | jq -r '.result.validation_seed')
|
||||
pubkey=$(echo "$result" | jq -r '.result.validation_public_key')
|
||||
if [ -z "$seed" ] || [ "$seed" = "null" ]; then
|
||||
kill "$TEMP_PID" 2>/dev/null || true
|
||||
die "Failed to generate key pair $i"
|
||||
fi
|
||||
SEEDS+=("$seed")
|
||||
PUBKEYS+=("$pubkey")
|
||||
log " Node $i: $pubkey"
|
||||
done
|
||||
|
||||
kill "$TEMP_PID" 2>/dev/null || true
|
||||
wait "$TEMP_PID" 2>/dev/null || true
|
||||
rm -rf "$TEMP_DATA"
|
||||
log "Key generation complete."
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 4: Generate node configs and validators.txt
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Generating node configs..."
|
||||
|
||||
# Create shared validators.txt
|
||||
VALIDATORS_FILE="$WORKDIR/validators.txt"
|
||||
{
|
||||
echo "[validators]"
|
||||
for i in $(seq 0 $((NUM_NODES - 1))); do
|
||||
echo "${PUBKEYS[$i]}"
|
||||
done
|
||||
} > "$VALIDATORS_FILE"
|
||||
|
||||
# Create per-node configs
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
NODE_DIR="$WORKDIR/node$i"
|
||||
mkdir -p "$NODE_DIR/nudb" "$NODE_DIR/db"
|
||||
|
||||
RPC_PORT=$((RPC_PORT_BASE + i - 1))
|
||||
PEER_PORT=$((PEER_PORT_BASE + i - 1))
|
||||
SEED="${SEEDS[$((i - 1))]}"
|
||||
|
||||
# Build ips_fixed list (all peers except self)
|
||||
IPS_FIXED=""
|
||||
for j in $(seq 1 "$NUM_NODES"); do
|
||||
if [ "$j" -ne "$i" ]; then
|
||||
IPS_FIXED="${IPS_FIXED}127.0.0.1 $((PEER_PORT_BASE + j - 1))
|
||||
"
|
||||
fi
|
||||
done
|
||||
|
||||
cat > "$NODE_DIR/xrpld.cfg" <<EOCFG
|
||||
[server]
|
||||
port_rpc
|
||||
port_peer
|
||||
|
||||
[port_rpc]
|
||||
port = $RPC_PORT
|
||||
ip = 127.0.0.1
|
||||
admin = 127.0.0.1
|
||||
protocol = http
|
||||
|
||||
[port_peer]
|
||||
port = $PEER_PORT
|
||||
ip = 0.0.0.0
|
||||
protocol = peer
|
||||
|
||||
[node_db]
|
||||
type=NuDB
|
||||
path=$NODE_DIR/nudb
|
||||
online_delete=256
|
||||
|
||||
[database_path]
|
||||
$NODE_DIR/db
|
||||
|
||||
[debug_logfile]
|
||||
$NODE_DIR/debug.log
|
||||
|
||||
[validation_seed]
|
||||
$SEED
|
||||
|
||||
[validators_file]
|
||||
$VALIDATORS_FILE
|
||||
|
||||
[ips_fixed]
|
||||
${IPS_FIXED}
|
||||
[peer_private]
|
||||
1
|
||||
|
||||
[telemetry]
|
||||
enabled=1
|
||||
service_instance_id=Node-${i}
|
||||
endpoint=http://localhost:4318/v1/traces
|
||||
exporter=otlp_http
|
||||
sampling_ratio=1.0
|
||||
batch_size=512
|
||||
batch_delay_ms=2000
|
||||
max_queue_size=2048
|
||||
trace_rpc=1
|
||||
trace_transactions=1
|
||||
trace_consensus=1
|
||||
trace_peer=0
|
||||
trace_ledger=1
|
||||
|
||||
[rpc_startup]
|
||||
{ "command": "log_level", "severity": "warning" }
|
||||
|
||||
[ssl_verify]
|
||||
0
|
||||
EOCFG
|
||||
|
||||
log " Node $i config: RPC=$RPC_PORT, Peer=$PEER_PORT"
|
||||
done
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 5: Start all 6 nodes
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Starting $NUM_NODES xrpld nodes..."
|
||||
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
NODE_DIR="$WORKDIR/node$i"
|
||||
"$XRPLD" --conf "$NODE_DIR/xrpld.cfg" --start > "$NODE_DIR/stdout.log" 2>&1 &
|
||||
echo $! > "$NODE_DIR/xrpld.pid"
|
||||
log " Node $i started (PID $(cat "$NODE_DIR/xrpld.pid"))"
|
||||
done
|
||||
|
||||
# Give nodes a moment to initialize
|
||||
sleep 5
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 6: Wait for consensus
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Waiting for nodes to reach 'proposing' state (timeout: ${CONSENSUS_TIMEOUT}s)..."
|
||||
|
||||
start_time=$(date +%s)
|
||||
nodes_ready=0
|
||||
|
||||
while [ "$nodes_ready" -lt "$NUM_NODES" ]; do
|
||||
elapsed=$(( $(date +%s) - start_time ))
|
||||
if [ "$elapsed" -ge "$CONSENSUS_TIMEOUT" ]; then
|
||||
fail "Consensus timeout after ${CONSENSUS_TIMEOUT}s ($nodes_ready/$NUM_NODES nodes ready)"
|
||||
log "Continuing with partial consensus..."
|
||||
break
|
||||
fi
|
||||
|
||||
nodes_ready=0
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
RPC_PORT=$((RPC_PORT_BASE + i - 1))
|
||||
state=$(curl -sf "http://localhost:$RPC_PORT" \
|
||||
-d '{"method":"server_info"}' 2>/dev/null \
|
||||
| jq -r '.result.info.server_state' 2>/dev/null || echo "unreachable")
|
||||
if [ "$state" = "proposing" ]; then
|
||||
nodes_ready=$((nodes_ready + 1))
|
||||
fi
|
||||
done
|
||||
printf "\r %d/%d nodes proposing (%ds elapsed)..." "$nodes_ready" "$NUM_NODES" "$elapsed"
|
||||
if [ "$nodes_ready" -lt "$NUM_NODES" ]; then
|
||||
sleep 3
|
||||
fi
|
||||
done
|
||||
echo ""
|
||||
|
||||
if [ "$nodes_ready" -eq "$NUM_NODES" ]; then
|
||||
ok "All $NUM_NODES nodes reached 'proposing' state"
|
||||
else
|
||||
fail "Only $nodes_ready/$NUM_NODES nodes reached 'proposing' state"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 6b: Wait for validated ledger
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Waiting for first validated ledger..."
|
||||
for attempt in $(seq 1 60); do
|
||||
val_seq=$(curl -sf "http://localhost:$RPC_PORT_BASE" \
|
||||
-d '{"method":"server_info"}' 2>/dev/null \
|
||||
| jq -r '.result.info.validated_ledger.seq // 0' 2>/dev/null || echo 0)
|
||||
if [ "$val_seq" -gt 2 ] 2>/dev/null; then
|
||||
ok "First validated ledger: seq $val_seq"
|
||||
break
|
||||
fi
|
||||
if [ "$attempt" -eq 60 ]; then
|
||||
fail "No validated ledger after 60s"
|
||||
fi
|
||||
sleep 1
|
||||
done
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 7: Exercise RPC spans (Phase 2)
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Exercising RPC spans..."
|
||||
|
||||
curl -sf "http://localhost:$RPC_PORT_BASE" \
|
||||
-d '{"method":"server_info"}' > /dev/null
|
||||
curl -sf "http://localhost:$RPC_PORT_BASE" \
|
||||
-d '{"method":"server_state"}' > /dev/null
|
||||
curl -sf "http://localhost:$RPC_PORT_BASE" \
|
||||
-d '{"method":"ledger","params":[{"ledger_index":"current"}]}' > /dev/null
|
||||
|
||||
log "RPC commands sent. Waiting 5s for batch export..."
|
||||
sleep 5
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 8: Submit transaction (Phase 3)
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Submitting Payment transaction..."
|
||||
|
||||
# Generate a destination wallet
|
||||
log " Generating destination wallet..."
|
||||
wallet_result=$(curl -sf "http://localhost:$RPC_PORT_BASE" \
|
||||
-d '{"method":"wallet_propose"}')
|
||||
DEST_ACCOUNT=$(echo "$wallet_result" | jq -r '.result.account_id' 2>/dev/null)
|
||||
if [ -z "$DEST_ACCOUNT" ] || [ "$DEST_ACCOUNT" = "null" ]; then
|
||||
fail "Could not generate destination wallet"
|
||||
DEST_ACCOUNT="rrrrrrrrrrrrrrrrrrrrrhoLvTp" # ACCOUNT_ZERO fallback
|
||||
fi
|
||||
log " Destination: $DEST_ACCOUNT"
|
||||
|
||||
# Get genesis account info
|
||||
acct_result=$(curl -sf "http://localhost:$RPC_PORT_BASE" \
|
||||
-d "{\"method\":\"account_info\",\"params\":[{\"account\":\"$GENESIS_ACCOUNT\"}]}")
|
||||
seq_num=$(echo "$acct_result" | jq -r '.result.account_data.Sequence' 2>/dev/null || echo "unknown")
|
||||
log " Genesis account sequence: $seq_num"
|
||||
|
||||
# Submit payment
|
||||
submit_result=$(curl -sf "http://localhost:$RPC_PORT_BASE" -d "{
|
||||
\"method\": \"submit\",
|
||||
\"params\": [{
|
||||
\"secret\": \"$GENESIS_SEED\",
|
||||
\"tx_json\": {
|
||||
\"TransactionType\": \"Payment\",
|
||||
\"Account\": \"$GENESIS_ACCOUNT\",
|
||||
\"Destination\": \"$DEST_ACCOUNT\",
|
||||
\"Amount\": \"10000000\"
|
||||
}
|
||||
}]
|
||||
}")
|
||||
|
||||
engine_result=$(echo "$submit_result" | jq -r '.result.engine_result' 2>/dev/null || echo "unknown")
|
||||
tx_hash=$(echo "$submit_result" | jq -r '.result.tx_json.hash' 2>/dev/null || echo "unknown")
|
||||
|
||||
if [ "$engine_result" = "tesSUCCESS" ] || [ "$engine_result" = "terQUEUED" ]; then
|
||||
ok "Transaction submitted: $engine_result (hash: ${tx_hash:0:16}...)"
|
||||
else
|
||||
fail "Transaction submission: $engine_result"
|
||||
log " Full response: $(echo "$submit_result" | jq -c .result 2>/dev/null)"
|
||||
fi
|
||||
|
||||
log "Waiting 15s for consensus round + batch export..."
|
||||
sleep 15
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 9: Verify Tempo traces
|
||||
# ---------------------------------------------------------------------------
|
||||
log "Verifying spans in Tempo..."
|
||||
|
||||
# Check service registration
|
||||
services=$(curl -sf "$TEMPO/api/v2/search/tag/resource.service.name/values" \
|
||||
| jq -r '.tagValues[].value' 2>/dev/null || echo "")
|
||||
if echo "$services" | grep -q "rippled"; then
|
||||
ok "Service 'rippled' registered in Tempo"
|
||||
else
|
||||
fail "Service 'rippled' NOT found in Tempo (found: $services)"
|
||||
fi
|
||||
|
||||
log ""
|
||||
log "--- Phase 2: RPC Spans ---"
|
||||
check_span "rpc.request"
|
||||
check_span "rpc.process"
|
||||
check_span "rpc.command.server_info"
|
||||
check_span "rpc.command.server_state"
|
||||
check_span "rpc.command.ledger"
|
||||
|
||||
log ""
|
||||
log "--- Phase 3: Transaction Spans ---"
|
||||
check_span "tx.process"
|
||||
check_span "tx.receive"
|
||||
|
||||
log ""
|
||||
log "--- Phase 4: Consensus Spans ---"
|
||||
check_span "consensus.proposal.send"
|
||||
check_span "consensus.ledger_close"
|
||||
check_span "consensus.accept"
|
||||
check_span "consensus.validation.send"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 10: Verify Prometheus spanmetrics
|
||||
# ---------------------------------------------------------------------------
|
||||
log ""
|
||||
log "--- Phase 5: Spanmetrics ---"
|
||||
log "Waiting 20s for Prometheus scrape cycle..."
|
||||
sleep 20
|
||||
|
||||
calls_count=$(curl -sf "$PROM/api/v1/query?query=traces_span_metrics_calls_total" \
|
||||
| jq '.data.result | length' 2>/dev/null || echo 0)
|
||||
if [ "$calls_count" -gt 0 ]; then
|
||||
ok "Prometheus: traces_span_metrics_calls_total ($calls_count series)"
|
||||
else
|
||||
fail "Prometheus: traces_span_metrics_calls_total (0 series)"
|
||||
fi
|
||||
|
||||
duration_count=$(curl -sf "$PROM/api/v1/query?query=traces_span_metrics_duration_milliseconds_count" \
|
||||
| jq '.data.result | length' 2>/dev/null || echo 0)
|
||||
if [ "$duration_count" -gt 0 ]; then
|
||||
ok "Prometheus: duration histogram ($duration_count series)"
|
||||
else
|
||||
fail "Prometheus: duration histogram (0 series)"
|
||||
fi
|
||||
|
||||
# Check Grafana
|
||||
if curl -sf http://localhost:3000/api/health > /dev/null 2>&1; then
|
||||
ok "Grafana: healthy at localhost:3000"
|
||||
else
|
||||
fail "Grafana: not reachable at localhost:3000"
|
||||
fi
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Step 11: Summary
|
||||
# ---------------------------------------------------------------------------
|
||||
echo ""
|
||||
echo "==========================================================="
|
||||
echo " INTEGRATION TEST RESULTS"
|
||||
echo "==========================================================="
|
||||
printf " \033[1;32mPASSED: %d\033[0m\n" "$PASS"
|
||||
printf " \033[1;31mFAILED: %d\033[0m\n" "$FAIL"
|
||||
echo "==========================================================="
|
||||
echo ""
|
||||
echo " Observability stack is running:"
|
||||
echo ""
|
||||
echo " Tempo: http://localhost:3200"
|
||||
echo " Grafana: http://localhost:3000"
|
||||
echo " Prometheus: http://localhost:9090"
|
||||
echo ""
|
||||
echo " xrpld nodes (6) are running:"
|
||||
for i in $(seq 1 "$NUM_NODES"); do
|
||||
RPC_PORT=$((RPC_PORT_BASE + i - 1))
|
||||
PEER_PORT=$((PEER_PORT_BASE + i - 1))
|
||||
echo " Node $i: RPC=localhost:$RPC_PORT Peer=:$PEER_PORT PID=$(cat "$WORKDIR/node$i/xrpld.pid" 2>/dev/null || echo 'unknown')"
|
||||
done
|
||||
echo ""
|
||||
echo " To tear down:"
|
||||
echo " bash docker/telemetry/integration-test.sh --cleanup"
|
||||
echo ""
|
||||
echo "==========================================================="
|
||||
|
||||
if [ "$FAIL" -gt 0 ]; then
|
||||
exit 1
|
||||
fi
|
||||
@@ -1,9 +1,12 @@
|
||||
# OpenTelemetry Collector configuration for xrpld development.
|
||||
#
|
||||
# Pipeline: OTLP receiver -> batch processor -> debug + Tempo.
|
||||
# Pipelines:
|
||||
# traces: OTLP receiver -> batch processor -> debug + Tempo + spanmetrics
|
||||
# metrics: spanmetrics connector -> Prometheus exporter
|
||||
#
|
||||
# xrpld sends traces via OTLP/HTTP to port 4318. The collector batches
|
||||
# them and forwards to Tempo via OTLP/gRPC on the Docker network. Tempo
|
||||
# is queryable via Grafana Explore using TraceQL.
|
||||
# them, forwards to Tempo, and derives RED metrics via the spanmetrics
|
||||
# connector, which Prometheus scrapes on port 8889.
|
||||
|
||||
receivers:
|
||||
otlp:
|
||||
@@ -18,6 +21,21 @@ processors:
|
||||
timeout: 1s
|
||||
send_batch_size: 100
|
||||
|
||||
connectors:
|
||||
spanmetrics:
|
||||
# Expose service.instance.id (node public key) as a Prometheus label so
|
||||
# Grafana dashboards can filter metrics by individual node.
|
||||
resource_metrics_key_attributes:
|
||||
- service.instance.id
|
||||
histogram:
|
||||
explicit:
|
||||
buckets: [1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 5s]
|
||||
dimensions:
|
||||
- name: xrpl.rpc.command
|
||||
- name: xrpl.rpc.status
|
||||
- name: xrpl.consensus.mode
|
||||
- name: xrpl.tx.local
|
||||
|
||||
exporters:
|
||||
debug:
|
||||
verbosity: detailed
|
||||
@@ -25,6 +43,8 @@ exporters:
|
||||
endpoint: tempo:4317
|
||||
tls:
|
||||
insecure: true
|
||||
prometheus:
|
||||
endpoint: 0.0.0.0:8889
|
||||
|
||||
extensions:
|
||||
health_check:
|
||||
@@ -36,4 +56,7 @@ service:
|
||||
traces:
|
||||
receivers: [otlp]
|
||||
processors: [batch]
|
||||
exporters: [debug, otlp/tempo]
|
||||
exporters: [debug, otlp/tempo, spanmetrics]
|
||||
metrics:
|
||||
receivers: [spanmetrics]
|
||||
exporters: [prometheus]
|
||||
|
||||
9
docker/telemetry/prometheus.yml
Normal file
9
docker/telemetry/prometheus.yml
Normal file
@@ -0,0 +1,9 @@
|
||||
# Prometheus configuration for scraping spanmetrics from OTel Collector.
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
evaluation_interval: 15s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: otel-collector
|
||||
static_configs:
|
||||
- targets: ["otel-collector:8889"]
|
||||
60
docker/telemetry/xrpld-telemetry.cfg
Normal file
60
docker/telemetry/xrpld-telemetry.cfg
Normal file
@@ -0,0 +1,60 @@
|
||||
# Standalone xrpld configuration with OpenTelemetry enabled.
|
||||
#
|
||||
# Usage:
|
||||
# 1. Start the observability stack:
|
||||
# docker compose -f docker/telemetry/docker-compose.yml up -d
|
||||
# 2. Run xrpld in standalone mode:
|
||||
# ./xrpld --conf docker/telemetry/xrpld-telemetry.cfg -a --start
|
||||
# 3. Send RPC commands to exercise tracing:
|
||||
# curl -s http://localhost:5005 -d '{"method":"server_info"}'
|
||||
# 4. View traces in Jaeger UI: http://localhost:16686
|
||||
|
||||
[server]
|
||||
port_rpc_admin_local
|
||||
port_ws_admin_local
|
||||
|
||||
[port_rpc_admin_local]
|
||||
port = 5005
|
||||
ip = 127.0.0.1
|
||||
admin = 127.0.0.1
|
||||
protocol = http
|
||||
|
||||
[port_ws_admin_local]
|
||||
port = 6006
|
||||
ip = 127.0.0.1
|
||||
admin = 127.0.0.1
|
||||
protocol = ws
|
||||
|
||||
[node_db]
|
||||
type=NuDB
|
||||
path=docker/telemetry/data/nudb
|
||||
online_delete=256
|
||||
advisory_delete=0
|
||||
|
||||
[database_path]
|
||||
docker/telemetry/data
|
||||
|
||||
[debug_logfile]
|
||||
docker/telemetry/data/debug.log
|
||||
|
||||
[rpc_startup]
|
||||
{ "command": "log_level", "severity": "debug" }
|
||||
|
||||
[ssl_verify]
|
||||
0
|
||||
|
||||
# --- OpenTelemetry tracing ---
|
||||
[telemetry]
|
||||
enabled=1
|
||||
service_instance_id=rippled-standalone
|
||||
endpoint=http://localhost:4318/v1/traces
|
||||
exporter=otlp_http
|
||||
sampling_ratio=1.0
|
||||
batch_size=512
|
||||
batch_delay_ms=5000
|
||||
max_queue_size=2048
|
||||
trace_rpc=1
|
||||
trace_transactions=1
|
||||
trace_consensus=1
|
||||
trace_peer=0
|
||||
trace_ledger=1
|
||||
Reference in New Issue
Block a user