mirror of
https://github.com/XRPLF/rippled.git
synced 2026-04-29 15:37:57 +00:00
Add comprehensive workload harness for end-to-end validation of the
Phases 1-9 telemetry stack:
Task 10.1 — Multi-node test harness:
- docker-compose.workload.yaml with full OTel stack (Collector, Jaeger,
Tempo, Prometheus, Loki, Grafana)
- generate-validator-keys.sh for automated key generation
- xrpld-validator.cfg.template for node configuration
Task 10.2 — RPC load generator:
- rpc_load_generator.py with WebSocket client, configurable rates,
realistic command distribution (40% health, 30% wallet, 15% explorer,
10% tx lookups, 5% DEX), W3C traceparent injection
Task 10.3 — Transaction submitter:
- tx_submitter.py with 10 transaction types (Payment, OfferCreate,
OfferCancel, TrustSet, NFTokenMint, NFTokenCreateOffer, EscrowCreate,
EscrowFinish, AMMCreate, AMMDeposit), auto-funded test accounts
Task 10.4 — Telemetry validation suite:
- validate_telemetry.py checking spans (Jaeger), metrics (Prometheus),
log-trace correlation (Loki), dashboards (Grafana)
- expected_spans.json (17 span types, 22 attributes, 3 hierarchies)
- expected_metrics.json (SpanMetrics, StatsD, Phase 9, dashboards)
Task 10.5 — Performance benchmark suite:
- benchmark.sh for baseline vs telemetry comparison
- collect_system_metrics.sh for CPU/memory/latency sampling
- Thresholds: <3% CPU, <5MB memory, <2ms RPC p99, <5% TPS, <1% consensus
Task 10.6 — CI integration:
- telemetry-validation.yml GitHub Actions workflow
- run-full-validation.sh orchestrator script
- Manual trigger + telemetry branch auto-trigger
Task 10.7 — Documentation:
- workload/README.md with quick start and tool reference
- Updated telemetry-runbook.md with validation and benchmark sections
- Updated 09-data-collection-reference.md with validation inventory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
198 lines
5.9 KiB
Markdown
# Telemetry Workload Tools

Synthetic workload generation and validation tools for rippled's OpenTelemetry telemetry stack. These tools validate that all spans, metrics, dashboards, and log-trace correlation work end-to-end under controlled load.

## Quick Start

```bash
# Build rippled with telemetry enabled
conan install . --build=missing -o telemetry=True
cmake --preset default -Dtelemetry=ON
cmake --build --preset default

# Run full validation (starts everything, runs load, validates)
docker/telemetry/workload/run-full-validation.sh --xrpld .build/xrpld

# Cleanup when done
docker/telemetry/workload/run-full-validation.sh --cleanup
```

## Architecture

```
run-full-validation.sh (orchestrator)
  |
  |-- docker-compose.workload.yaml
  |     |-- otel-collector (traces + StatsD)
  |     |-- jaeger (trace search)
  |     |-- tempo (trace storage)
  |     |-- prometheus (metrics)
  |     |-- loki (log aggregation)
  |     |-- grafana (dashboards)
  |
  |-- generate-validator-keys.sh
  |     -> validator-keys.json, validators.txt
  |
  |-- 5x xrpld nodes (local processes, full telemetry)
  |
  |-- rpc_load_generator.py (WebSocket RPC traffic)
  |-- tx_submitter.py (transaction diversity)
  |
  |-- validate_telemetry.py (pass/fail checks)
  |     -> validation-report.json
  |
  |-- benchmark.sh (baseline vs telemetry comparison)
        -> benchmark-report-*.md
```

## Tools Reference

### run-full-validation.sh

Orchestrates the complete validation pipeline: starts the telemetry stack, launches a multi-node rippled cluster, generates load, and validates the results.

```bash
# Full validation with defaults
./run-full-validation.sh --xrpld /path/to/xrpld

# Custom load parameters
./run-full-validation.sh --xrpld /path/to/xrpld \
  --rpc-rate 100 --rpc-duration 300 \
  --tx-tps 10 --tx-duration 300

# Include performance benchmarks
./run-full-validation.sh --xrpld /path/to/xrpld --with-benchmark

# Skip Loki checks (if Phase 8 is not deployed)
./run-full-validation.sh --xrpld /path/to/xrpld --skip-loki
```

### rpc_load_generator.py

Generates RPC traffic that follows a realistic production command distribution:

- 40% health checks (server_info, fee)
- 30% wallet queries (account_info, account_lines, account_objects)
- 15% explorer queries (ledger, ledger_data)
- 10% transaction lookups (tx, account_tx)
- 5% DEX queries (book_offers, amm_info)

```bash
# Basic usage
python3 rpc_load_generator.py --endpoints ws://localhost:6006 --rate 50 --duration 120

# Multiple endpoints (round-robin)
python3 rpc_load_generator.py \
  --endpoints ws://localhost:6006 ws://localhost:6007 \
  --rate 100 --duration 300

# Custom weights
python3 rpc_load_generator.py --endpoints ws://localhost:6006 \
  --weights '{"server_info": 80, "account_info": 20}'
```

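To make the distribution concrete, here is a minimal sketch of weighted command selection. It is not taken from `rpc_load_generator.py`: the `CATEGORIES` layout and the `pick_command` and `sample_categories` helpers are hypothetical names, and picking uniformly among the commands inside a category is an assumption.

```python
import random
from collections import Counter

# Category weights and commands as listed above. The dict layout is an
# assumption for illustration, not the generator's internal structure.
CATEGORIES = {
    "health": (40, ["server_info", "fee"]),
    "wallet": (30, ["account_info", "account_lines", "account_objects"]),
    "explorer": (15, ["ledger", "ledger_data"]),
    "tx_lookup": (10, ["tx", "account_tx"]),
    "dex": (5, ["book_offers", "amm_info"]),
}


def pick_command(rng):
    """Pick a category by weight, then a command uniformly within it."""
    names = list(CATEGORIES)
    weights = [CATEGORIES[name][0] for name in names]
    category = rng.choices(names, weights=weights, k=1)[0]
    command = rng.choice(CATEGORIES[category][1])
    return category, command


def sample_categories(n, seed=0):
    """Draw n commands and count how often each category was chosen."""
    rng = random.Random(seed)
    return Counter(pick_command(rng)[0] for _ in range(n))
```

Over a few thousand draws, the category shares returned by `sample_categories` should land close to the percentages in the list.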
### tx_submitter.py

Submits diverse transaction types to exercise the full span and metric surface:

- Payment (XRP transfers)
- OfferCreate / OfferCancel (DEX activity)
- TrustSet (trust line creation)
- NFTokenMint / NFTokenCreateOffer (NFT activity)
- EscrowCreate / EscrowFinish (escrow lifecycle)
- AMMCreate / AMMDeposit (AMM pool operations)

```bash
# Basic usage
python3 tx_submitter.py --endpoint ws://localhost:6006 --tps 5 --duration 120

# Custom mix
python3 tx_submitter.py --endpoint ws://localhost:6006 \
  --weights '{"Payment": 60, "OfferCreate": 20, "TrustSet": 20}'
```

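One way to read `--tps` and `--duration` together is as an evenly spaced submission schedule. The sketch below illustrates that reading only; `send_times` and `run_paced` are hypothetical helpers, and the real submitter may pace transactions differently.

```python
import time
from typing import Callable, List


def send_times(tps: float, duration: float) -> List[float]:
    """Offsets in seconds from the start at which to submit, evenly spaced."""
    if tps <= 0 or duration < 0:
        raise ValueError("tps must be positive and duration non-negative")
    interval = 1.0 / tps
    return [i * interval for i in range(int(tps * duration))]


def run_paced(
    submit: Callable[[], None],
    tps: float,
    duration: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call submit() on the schedule from send_times(); return the call count."""
    start = clock()
    sent = 0
    for offset in send_times(tps, duration):
        # Sleep until the absolute target time so that slow submissions
        # do not push every later submission back.
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        submit()
        sent += 1
    return sent
```

Scheduling against absolute offsets rather than sleeping a fixed interval after each call keeps the average rate near the target even when individual submissions are slow.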
### validate_telemetry.py

Validates that all expected telemetry data exists:

- **Span validation**: All 16+ span types with required attributes
- **Metric validation**: SpanMetrics, StatsD, Phase 9 metrics
- **Log-trace correlation**: trace_id/span_id in Loki logs
- **Dashboard validation**: All 10 Grafana dashboards accessible

```bash
# Run all validations
python3 validate_telemetry.py --report /tmp/report.json

# Skip Loki checks
python3 validate_telemetry.py --skip-loki --report /tmp/report.json
```

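The log-trace correlation check can be pictured as scanning log lines for well-formed IDs. The sketch below assumes a `trace_id=<hex> span_id=<hex>` key-value layout, which is a guess at the log format and not confirmed by this README; `extract_ids` and `correlation_ratio` are hypothetical helpers. The ID shapes (32 and 16 lowercase hex characters, all-zero values invalid) follow the W3C Trace Context format.

```python
import re

# Assumed log layout: "... trace_id=<32 hex> span_id=<16 hex> ...".
TRACE_RE = re.compile(r"\btrace_id=([0-9a-f]{32})\b")
SPAN_RE = re.compile(r"\bspan_id=([0-9a-f]{16})\b")


def extract_ids(line):
    """Return (trace_id, span_id) if the line carries valid IDs, else None."""
    trace = TRACE_RE.search(line)
    span = SPAN_RE.search(line)
    if trace is None or span is None:
        return None
    trace_id, span_id = trace.group(1), span.group(1)
    # All-zero IDs are invalid under W3C Trace Context.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return trace_id, span_id


def correlation_ratio(lines):
    """Fraction of log lines that carry a valid trace_id/span_id pair."""
    lines = list(lines)
    if not lines:
        return 0.0
    return sum(extract_ids(line) is not None for line in lines) / len(lines)
```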
### benchmark.sh

Compares baseline (no telemetry) vs telemetry-enabled performance:

```bash
./benchmark.sh --xrpld /path/to/xrpld --duration 300
```

Thresholds (configurable via environment variables):

| Metric            | Threshold | Env Variable                |
| ----------------- | --------- | --------------------------- |
| CPU overhead      | < 3%      | BENCH_CPU_OVERHEAD_PCT      |
| Memory overhead   | < 5 MB    | BENCH_MEM_OVERHEAD_MB       |
| RPC p99 latency   | < 2 ms    | BENCH_RPC_LATENCY_IMPACT_MS |
| Throughput impact | < 5%      | BENCH_TPS_IMPACT_PCT        |
| Consensus impact  | < 1%      | BENCH_CONSENSUS_IMPACT_PCT  |

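The threshold logic can be sketched in a few lines. `benchmark.sh` is a shell script, so this Python version is an illustration only: the environment variable names come from the table, while treating the table values as defaults, the `load_thresholds` and `violations` helpers, and the shape of the `measured` dict are assumptions.

```python
import os

# Defaults mirror the table above (assumed to be the script's defaults).
DEFAULTS = {
    "BENCH_CPU_OVERHEAD_PCT": 3.0,
    "BENCH_MEM_OVERHEAD_MB": 5.0,
    "BENCH_RPC_LATENCY_IMPACT_MS": 2.0,
    "BENCH_TPS_IMPACT_PCT": 5.0,
    "BENCH_CONSENSUS_IMPACT_PCT": 1.0,
}


def load_thresholds(env=None):
    """Read each threshold from the environment, falling back to the default."""
    env = os.environ if env is None else env
    return {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}


def violations(measured, thresholds):
    """Return the threshold names whose measured overhead is not below the limit."""
    # The table states strict "<" limits, so a value equal to the limit fails.
    return [name for name, limit in thresholds.items() if measured[name] >= limit]
```

For example, exporting `BENCH_CPU_OVERHEAD_PCT=4` before a run would, under these assumptions, relax only the CPU limit and leave the other four at their table values.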
## Reading Validation Reports

The validation report (`validation-report.json`) is structured as:

```json
{
  "summary": {
    "total": 45,
    "passed": 42,
    "failed": 3,
    "all_passed": false
  },
  "checks": [
    {
      "name": "span.rpc.request",
      "category": "span",
      "passed": true,
      "message": "rpc.request: 15 traces found",
      "details": { "trace_count": 15 }
    }
  ]
}
```

Categories:

- **span**: Span type existence and attribute validation
- **metric**: Prometheus metric existence
- **log**: Log-trace correlation checks
- **dashboard**: Grafana dashboard accessibility

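A report in this shape is easy to post-process. The following sketch groups checks by category and collects failure messages; it relies only on the fields shown in the example report, and `summarize` is a hypothetical helper, not part of the tool set.

```python
from collections import defaultdict


def summarize(report):
    """Count passed/failed checks per category and list failure messages."""
    by_category = defaultdict(lambda: {"passed": 0, "failed": 0})
    failures = []
    for check in report["checks"]:
        outcome = "passed" if check["passed"] else "failed"
        by_category[check["category"]][outcome] += 1
        if not check["passed"]:
            failures.append(f'{check["name"]}: {check["message"]}')
    return dict(by_category), failures
```

To use it, load the file with `json.load(open("validation-report.json"))` and pass the resulting dict to `summarize`.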
## CI Integration

The validation runs as a GitHub Actions workflow (`.github/workflows/telemetry-validation.yml`):

- Triggered manually or on pushes to telemetry branches
- Builds rippled, starts the full stack, runs load, validates
- Uploads reports as artifacts
- Posts a summary to the PR

## Configuration Files

| File                           | Purpose                                         |
| ------------------------------ | ----------------------------------------------- |
| `expected_spans.json`          | Span inventory (names, attributes, hierarchies) |
| `expected_metrics.json`        | Metric inventory (SpanMetrics, StatsD, Phase 9) |
| `test_accounts.json`           | Test account roles (keys generated at runtime)  |
| `xrpld-validator.cfg.template` | Node config template with placeholders          |
| `requirements.txt`             | Python dependencies                             |