mirror of
https://github.com/XRPLF/rippled.git
synced 2026-06-06 18:26:51 +00:00
The Phase 10 validation harness had drifted from the code's recording surface and the telemetry-validation CI job was failing before it could build.

CI fix (telemetry-validation.yml):
- Replace nonexistent local action ./.github/actions/print-env with the remote XRPLF/actions/print-build-env (the build-xrpld job failed in 56s on this).
- Sync prepare-runner and upload-artifact action SHAs to the canonical workflow.

Recording-surface reconciliation (docker/telemetry/workload/):
- Migrate span attributes from dotted xrpl.<domain>.<field> to the bare/underscore form introduced by the 2026-05-13 span-attr naming redesign (tx_hash, peer_id, ledger_seq, consensus_mode, consensus_round, full_validation, quorum, ...). Dotted xrpl.ledger.hash is retained only on peer.validation.receive (shared constant), while consensus.validation.send uses bare ledger_hash.
- Fix attribute placement: tx.apply carries tx_count/tx_failed (not ledger_seq); ledger.build carries ledger_seq/close_* (not tx_count/tx_failed).
- Replace the phantom rpc.request span with the real WS root rpc.ws_message; drop the never-emitted duration_ms; rebuild the parent-child map accordingly.
- Add the new spans the code emits: apply-pipeline stage spans (tx.preflight/preclaim/transactor with stage/tx_type/ter_result), txq.*, consensus sub-spans (round/establish/update_positions/check/phase.open), ledger.acquire, grpc.*, pathfind.*. Conditional spans are marked optional so they are skipped (not failed) when the workload does not exercise them.
- validate_telemetry.py: service.name and Loki job label rippled -> xrpld; fix PARITY_SPAN_ATTRS (rename the 4 real attrs, drop the 3 that are metrics not span attrs); add optional-span handling that skips missing optional spans while still validating attributes when present.
- expected_metrics.json: rippled_ -> xrpld_ on all beast::insight/overlay metrics, xrpld_job_count, the 15 on-disk xrpld-* dashboard UIDs, and the real bare spanmetrics dimension labels.
- regression-metrics.json + baseline-timings.json: rpc.request -> rpc.ws_message.

Metrics pipeline fix:
- Switch node [insight] config from server=statsd/prefix=rippled to server=otel + /v1/metrics endpoint + prefix=xrpld across run-full-validation.sh, xrpld-validator.cfg.template, benchmark.sh and the workload compose. The collector has no StatsD receiver, so system metrics only reach Prometheus over OTLP.

Synthetic load for new spans:
- Add ripple_path_find to the RPC load generator (drives pathfind.* spans).
- Add a high-TPS txq-burst workload phase to force fee escalation (drives txq.*).

All facts verified against the *SpanNames.h headers and a live xrpld node + collector (Tempo service.name=xrpld, tx.preflight attrs [stage,ter_result,tx_type], 279 xrpld_ Prometheus metrics and zero rippled_).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
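The optional-span rule described in the commit message (a missing optional span is skipped, but its attributes are still checked when it does appear) can be sketched roughly as follows. This is a minimal illustration, not the code in validate_telemetry.py: the `ExpectedSpan` class, the `check_spans` function, and the choice of which spans are optional are all hypothetical. Only the span and attribute names are taken from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedSpan:
    """Hypothetical description of one span the validator looks for."""
    name: str
    required_attrs: frozenset
    optional: bool = False  # assumption: conditional spans carry this flag


def check_spans(expected, observed):
    """Compare expected spans against observed ones.

    observed maps a span name to the set of attribute keys seen on it.
    Returns a dict of span name -> status string.
    """
    results = {}
    for span in expected:
        attrs = observed.get(span.name)
        if attrs is None:
            # A missing optional span is skipped; a missing required one fails.
            results[span.name] = "skipped" if span.optional else "failed: span missing"
            continue
        # Attributes are validated whenever the span is present,
        # whether or not the span is optional.
        missing = span.required_attrs - set(attrs)
        if missing:
            results[span.name] = f"failed: missing attrs {sorted(missing)}"
        else:
            results[span.name] = "passed"
    return results


expected = [
    ExpectedSpan("tx.preflight", frozenset({"stage", "ter_result", "tx_type"})),
    ExpectedSpan("tx.apply", frozenset({"tx_count", "tx_failed"})),
    ExpectedSpan("ledger.build", frozenset({"ledger_seq"})),
    # Assumption for illustration: txq.enqueue is treated as optional.
    ExpectedSpan("txq.enqueue", frozenset(), optional=True),
]

observed = {
    "tx.preflight": {"stage", "ter_result", "tx_type"},
    "tx.apply": {"tx_count"},  # tx_failed deliberately absent
}

results = check_spans(expected, observed)
for name, status in results.items():
    print(f"{name}: {status}")
```

Keeping "skipped" distinct from "passed" means a run that never triggered fee escalation is reported as untested for txq.* rather than as a success.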
109 lines
3.3 KiB
JSON
{
  "profiles": {
    "full-validation": {
      "description": "Full 18-dashboard coverage with burst/idle/plateau patterns",
      "phases": [
        {
          "name": "warmup",
          "description": "Low load to populate baseline gauges and node health metrics",
          "duration_sec": 30,
          "rpc": {
            "rate": 5,
            "weights": { "server_info": 50, "fee": 30, "ledger": 20 }
          },
          "tx": null
        },
        {
          "name": "steady-state",
          "description": "Medium sustained load — plateau data for all dashboards",
          "duration_sec": 60,
          "rpc": { "rate": 30 },
          "tx": { "tps": 3 }
        },
        {
          "name": "rpc-burst",
          "description": "Heavy RPC to saturate job queue and spike latency",
          "duration_sec": 30,
          "rpc": { "rate": 100 },
          "tx": null
        },
        {
          "name": "tx-flood",
          "description": "High TX rate for fee escalation and TxQ pressure",
          "duration_sec": 30,
          "rpc": { "rate": 5, "weights": { "server_info": 50, "fee": 50 } },
          "tx": {
            "tps": 20,
            "weights": { "Payment": 70, "OfferCreate": 20, "TrustSet": 10 }
          }
        },
        {
          "name": "txq-burst",
          "description": "Single-type Payment burst at high TPS to force open-ledger fee escalation and TxQ queueing, exercising the txq.* spans (txq.enqueue / txq.accept / txq.accept.tx / txq.cleanup)",
          "duration_sec": 30,
          "rpc": { "rate": 5, "weights": { "fee": 100 } },
          "tx": {
            "tps": 60,
            "weights": { "Payment": 100 }
          }
        },
        {
          "name": "mixed-peak",
          "description": "Realistic peak load — consensus and ledger ops under stress",
          "duration_sec": 60,
          "rpc": { "rate": 50 },
          "tx": { "tps": 10 }
        },
        {
          "name": "cooldown",
          "description": "Low load for recovery metrics and state transition data",
          "duration_sec": 30,
          "rpc": { "rate": 5, "weights": { "server_info": 80, "fee": 20 } },
          "tx": null
        }
      ],
      "propagation_wait_sec": 60
    },
    "quick-smoke": {
      "description": "Fast smoke test — minimal data for CI quick checks",
      "phases": [
        {
          "name": "smoke",
          "description": "Single phase covering all generator types",
          "duration_sec": 30,
          "rpc": { "rate": 20 },
          "tx": { "tps": 3 }
        }
      ],
      "propagation_wait_sec": 30
    },
    "stress": {
      "description": "Heavy sustained load for performance benchmarking",
      "phases": [
        {
          "name": "ramp-up",
          "description": "Gradually increasing load",
          "duration_sec": 30,
          "rpc": { "rate": 20 },
          "tx": { "tps": 5 }
        },
        {
          "name": "peak",
          "description": "Maximum sustained load",
          "duration_sec": 120,
          "rpc": { "rate": 150 },
          "tx": { "tps": 25 }
        },
        {
          "name": "sustain",
          "description": "Continued high load for stability check",
          "duration_sec": 60,
          "rpc": { "rate": 100 },
          "tx": { "tps": 15 }
        }
      ],
      "propagation_wait_sec": 60
    }
  }
}
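To get a feel for what a profile asks of the node, the phases can be reduced to a few totals. The sketch below is a rough illustration only. It assumes that `rate` means RPC requests per second, that `tps` means transactions per second, and that `propagation_wait_sec` is a single wait added after the last phase; the file itself does not state any of this. The `summarize` function is hypothetical, and the embedded JSON is a trimmed copy of the quick-smoke and stress profiles with descriptions removed.

```python
import json

# Trimmed copy of two profiles from the file above (descriptions omitted).
PROFILES_JSON = """
{
  "profiles": {
    "quick-smoke": {
      "phases": [
        {"name": "smoke", "duration_sec": 30, "rpc": {"rate": 20}, "tx": {"tps": 3}}
      ],
      "propagation_wait_sec": 30
    },
    "stress": {
      "phases": [
        {"name": "ramp-up", "duration_sec": 30, "rpc": {"rate": 20}, "tx": {"tps": 5}},
        {"name": "peak", "duration_sec": 120, "rpc": {"rate": 150}, "tx": {"tps": 25}},
        {"name": "sustain", "duration_sec": 60, "rpc": {"rate": 100}, "tx": {"tps": 15}}
      ],
      "propagation_wait_sec": 60
    }
  }
}
"""


def summarize(profile):
    """Return nominal totals for one profile (hypothetical helper)."""
    load_sec = 0
    rpc_calls = 0
    txs = 0
    for phase in profile["phases"]:
        duration = phase["duration_sec"]
        load_sec += duration
        # "rpc" or "tx" may be null in the file, which json loads as None.
        rpc = phase.get("rpc") or {}
        tx = phase.get("tx") or {}
        rpc_calls += rpc.get("rate", 0) * duration
        txs += tx.get("tps", 0) * duration
    return {
        "load_sec": load_sec,
        # Assumption: the propagation wait is added once, after all phases.
        "wall_sec": load_sec + profile.get("propagation_wait_sec", 0),
        "rpc_calls": rpc_calls,
        "txs": txs,
    }


profiles = json.loads(PROFILES_JSON)["profiles"]
for name, profile in profiles.items():
    print(name, summarize(profile))
```

Under these assumptions the totals are nominal targets, not measurements: a generator that cannot sustain the configured rate would produce fewer requests than the figure computed here.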