rippled/docker/telemetry/workload/regression-metrics.json
Pratik Mankawde dc13e9d680 fix: populate baseline from CI run, remove dead rpc_methods metrics
Populate baselines/baseline-timings.json from the green CI run
(24906110133, commit f11ebc1253). 25 of the 31 metrics have non-null
values; the six span.rpc.* entries (rpc.request and rpc.process at
p50/p95/p99) are null because the 3m window held too little data.

Remove the rpc_methods section and its thresholds from
regression-metrics.json. rippled_rpc_method_duration_us_bucket is never
populated because PerfLogImp::rpcEnd never calls
MetricsRegistry::recordRpcFinished; only recordRpcStarted is wired up
(Phase 9 instrumentation gap). The span-based rpc.request/rpc.process
metrics from spanmetrics already cover RPC latency.
2026-04-24 20:08:52 +01:00
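
The commit message notes that some baseline entries are null. As a rough illustration of how a comparator might treat them, here is a minimal Python sketch. It assumes that baseline-timings.json is a flat mapping from metric key to a number or null, and that a regression is a current/baseline ratio above some threshold. The function name `compare`, the `max_ratio` value, and the sample numbers are all hypothetical and not taken from the repository.

```python
def compare(baseline, current, max_ratio=1.25):
    """Return (regressions, skipped) for two key -> value mappings.

    Assumption: a null (None) value on either side means "no data", so the
    key is skipped rather than failed. max_ratio is a made-up threshold.
    """
    regressions, skipped = [], []
    for key, base in baseline.items():
        cur = current.get(key)
        if base is None or cur is None:
            skipped.append(key)  # sparse data: nothing to compare against
            continue
        if base > 0 and cur / base > max_ratio:
            regressions.append(key)
    return regressions, skipped


# Illustrative values only; these are not real baseline timings.
baseline = {
    "span.tx.process.p99": 10.0,
    "span.rpc.request.p95": None,
    "job.transaction.queued.p95": 200.0,
}
current = {
    "span.tx.process.p99": 14.0,
    "span.rpc.request.p95": 3.0,
    "job.transaction.queued.p95": 210.0,
}
regressions, skipped = compare(baseline, current)
```

With these sample values, only the span key that grew by 40% is flagged. The null baseline entry is skipped rather than treated as a failure.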


{
  "_description": "Metric surface for the OTel-driven regression gate. Each entry names a metric, the quantiles to capture, and how to query Prometheus. The comparator compares the current run against baseline-timings.json under these exact keys.",
  "_key_format": "{category}.{name}.p{quantile} (e.g. span.tx.process.p99, job.transaction.queued.p95)",
  "spans": {
    "_query_template": "histogram_quantile({quantile}, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name=\"{name}\"}[{window}])))",
    "_unit": "ms",
    "_quantiles": [0.5, 0.95, 0.99],
    "names": [
      "rpc.request",
      "rpc.process",
      "tx.process",
      "tx.apply",
      "ledger.build",
      "ledger.validate",
      "ledger.store",
      "consensus.ledger_close",
      "consensus.accept"
    ]
  },
  "job_queue": {
    "_queued_template": "histogram_quantile({quantile}, sum by (le) (rate(rippled_job_queued_duration_us_bucket{job_type=\"{name}\"}[{window}])))",
    "_running_template": "histogram_quantile({quantile}, sum by (le) (rate(rippled_job_running_duration_us_bucket{job_type=\"{name}\"}[{window}])))",
    "_unit": "us",
    "_quantiles": [0.95],
    "_phases": ["queued", "running"],
    "names": ["transaction", "acceptLedger"]
  }
}
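
The templates above can be expanded into concrete keys and PromQL strings. The following Python sketch shows one way to do that; it is not the gate's actual implementation. The helper names (`fill`, `quantile_label`, `expand`) are hypothetical, the `span` and `job` key prefixes are inferred from the `_key_format` examples, and the default `3m` window is taken from the commit message.

```python
CONFIG = {
    "spans": {
        "_query_template": 'histogram_quantile({quantile}, sum by (le) (rate(traces_span_metrics_duration_milliseconds_bucket{span_name="{name}"}[{window}])))',
        "_quantiles": [0.5, 0.95, 0.99],
        "names": [
            "rpc.request", "rpc.process", "tx.process", "tx.apply",
            "ledger.build", "ledger.validate", "ledger.store",
            "consensus.ledger_close", "consensus.accept",
        ],
    },
    "job_queue": {
        "_queued_template": 'histogram_quantile({quantile}, sum by (le) (rate(rippled_job_queued_duration_us_bucket{job_type="{name}"}[{window}])))',
        "_running_template": 'histogram_quantile({quantile}, sum by (le) (rate(rippled_job_running_duration_us_bucket{job_type="{name}"}[{window}])))',
        "_quantiles": [0.95],
        "_phases": ["queued", "running"],
        "names": ["transaction", "acceptLedger"],
    },
}


def fill(template, **values):
    # Plain substring replacement: str.format would trip over the literal
    # braces of the PromQL label matcher ({span_name="..."}).
    for key, value in values.items():
        template = template.replace("{" + key + "}", str(value))
    return template


def quantile_label(q):
    # 0.5 -> "p50", 0.95 -> "p95", 0.99 -> "p99"
    return "p" + format(q * 100, "g")


def expand(config, window="3m"):
    """Map each metric key to its PromQL query (assumed key layout)."""
    queries = {}
    spans = config["spans"]
    for name in spans["names"]:
        for q in spans["_quantiles"]:
            key = f"span.{name}.{quantile_label(q)}"
            queries[key] = fill(spans["_query_template"],
                                quantile=q, name=name, window=window)
    jobs = config["job_queue"]
    for name in jobs["names"]:
        for phase in jobs["_phases"]:
            template = jobs[f"_{phase}_template"]
            for q in jobs["_quantiles"]:
                key = f"job.{name}.{phase}.{quantile_label(q)}"
                queries[key] = fill(template,
                                    quantile=q, name=name, window=window)
    return queries


queries = expand(CONFIG)
```

Under these assumptions the expansion yields 31 keys: 9 spans at 3 quantiles, plus 2 job types at 2 phases and 1 quantile. Six of them start with `span.rpc.`. Both counts match the 31 metrics and six null span.rpc.* entries in the commit message.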