Commit Graph

5 Commits

Author SHA1 Message Date
Pratik Mankawde
592e546f82 fix(telemetry): align Phase 10 workload configs with xrpld_ metric prefix
Phase 10's workload validation configs (expected_metrics.json,
regression-metrics.json, validate_telemetry.py) queried the
MetricsRegistry metrics under the rippled_ prefix, but MetricsRegistry
emits them as xrpld_ (see MetricsRegistry.cpp). On a live run the
workload validator reported every MetricsRegistry metric as missing,
masking genuine regressions.

Rename the following to xrpld_ across the workload validator,
expected-metrics manifest, and regression-metrics template:

- nodestore_state, cache_metrics, txq_metrics, load_factor_metrics,
  object_count
- rpc_method_started_total / _finished_total / _errored_total /
  _duration_us
- job_queued_total / _started_total / _finished_total /
  _queued_duration_us_bucket / _running_duration_us_bucket
- peer_quality, server_info, validator_health, ledger_economy,
  db_metrics, complete_ledgers, build_info, state_tracking,
  storage_detail
- ledgers_closed_total, validations_sent_total,
  validations_checked_total, state_changes_total
- validation_agreement, validation_agreements_total,
  validation_missed_total

Mirrors the phase-9 fix in commit 5601615952.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 15:01:13 +01:00
Pratik Mankawde
a142a700e8 refactor(telemetry): migrate Phase 10 validation from Jaeger to Tempo native API
Migrate validate_telemetry.py to Tempo TraceQL search API, remove
Jaeger service from workload docker-compose, update readiness checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:32:02 +01:00
Pratik Mankawde
ff1502f939 feat(telemetry): add workload orchestrator with phased load profiles
Add a profile-driven workload orchestrator that executes sequential load
phases with configurable RPC rates and TX throughput. Three profiles:
full-validation (6 phases covering all 18 dashboards), quick-smoke (CI),
and stress (benchmarking). Fix 10 validation failures: correct Phase 9
metric prefixes, relax peer latency bounds for localhost clusters, and
allow sub-microsecond span durations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:32:02 +01:00
Pratik Mankawde
711ae43174 feat(telemetry): add external dashboard parity validation checks (Task 10.8)
Add ~28 validation checks for external dashboard parity:
- 8 span attribute checks (server_info, tx.receive, consensus, peer spans)
- 13 metric existence checks (validation agreement, validator health,
  peer quality, ledger economy, state tracking, counters, storage)
- 3 dashboard load checks (validator-health, peer-quality, system-node-health)
- 4 value sanity checks (agreement %, UNL expiry, latency, state value)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:32:02 +01:00
Pratik Mankawde
5de8c520d1 Phase 10: Workload validation - synthetic load generation and telemetry checks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:32:02 +01:00