rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-04-29 15:37:57 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	787b496484	Phase 10: Synthetic workload generation and telemetry validation tools Add comprehensive workload harness for end-to-end validation of the Phases 1-9 telemetry stack: Task 10.1 — Multi-node test harness: - docker-compose.workload.yaml with full OTel stack (Collector, Jaeger, Tempo, Prometheus, Loki, Grafana) - generate-validator-keys.sh for automated key generation - xrpld-validator.cfg.template for node configuration Task 10.2 — RPC load generator: - rpc_load_generator.py with WebSocket client, configurable rates, realistic command distribution (40% health, 30% wallet, 15% explorer, 10% tx lookups, 5% DEX), W3C traceparent injection Task 10.3 — Transaction submitter: - tx_submitter.py with 10 transaction types (Payment, OfferCreate, OfferCancel, TrustSet, NFTokenMint, NFTokenCreateOffer, EscrowCreate, EscrowFinish, AMMCreate, AMMDeposit), auto-funded test accounts Task 10.4 — Telemetry validation suite: - validate_telemetry.py checking spans (Jaeger), metrics (Prometheus), log-trace correlation (Loki), dashboards (Grafana) - expected_spans.json (17 span types, 22 attributes, 3 hierarchies) - expected_metrics.json (SpanMetrics, StatsD, Phase 9, dashboards) Task 10.5 — Performance benchmark suite: - benchmark.sh for baseline vs telemetry comparison - collect_system_metrics.sh for CPU/memory/latency sampling - Thresholds: <3% CPU, <5MB memory, <2ms RPC p99, <5% TPS, <1% consensus Task 10.6 — CI integration: - telemetry-validation.yml GitHub Actions workflow - run-full-validation.sh orchestrator script - Manual trigger + telemetry branch auto-trigger Task 10.7 — Documentation: - workload/README.md with quick start and tool reference - Updated telemetry-runbook.md with validation and benchmark sections - Updated 09-data-collection-reference.md with validation inventory Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 10:59:16 +00:00
Pratik Mankawde	9289cb671d	Phase 9: Internal Metric Instrumentation Gap Fill (Tasks 9.1-9.10) Implement ~50 OTel metrics covering NodeStore I/O, cache hit rates, TxQ state, PerfLog per-RPC/per-job counters, CountedObject instances, and load factor breakdown via MetricsRegistry. Core implementation: - MetricsRegistry class with synchronous instruments (Counter, Histogram) for RPC and Job metrics, and ObservableGauge callbacks for cache, TxQ, CountedObject, LoadFactor, and NodeStore state polling. - ServiceRegistry extended with getMetricsRegistry() virtual method. - Application wires MetricsRegistry lifecycle (create/start/stop). - PerfLogImp instrumented to emit OTel metrics on RPC and Job events. Dashboards & observability: - 3 new Grafana dashboards: RPC Performance, Job Queue, Fee Market/TxQ. - Extended statsd-node-health dashboard with NodeStore, Cache, and CountedObject panels. - 10 alerting rules added to telemetry-runbook.md. - Integration test extended with 12 OTel metric validation checks. Documentation: - 09-data-collection-reference.md updated with Phase 9 metric tables. - Unit tests for MetricsRegistry disabled-path (no-op) behavior. All OTel SDK code guarded with #ifdef XRPL_ENABLE_TELEMETRY. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 10:56:00 +00:00
Pratik Mankawde	b73592f934	Phase 9-11: Future enhancement plans for metric gap fill, workload validation, and third-party pipelines - Phase 9: Internal Metric Instrumentation Gap Fill (10 tasks, 12d) - MetricsRegistry class, NodeStore I/O, cache, TxQ, PerfLog, CountedObjects, load factors - Phase 10: Synthetic Workload Generation & Telemetry Validation (7 tasks, 10d) - Multi-node harness, RPC/tx generators, validation suite, benchmarks, CI - Phase 11: Third-Party Data Collection Pipelines (11 tasks, 15d) - Custom OTel Collector receiver (Go), 30 external metrics, alerting rules, 4 dashboards - Updated 06-implementation-phases.md with plan sections §6.8.2-§6.8.4, gantt, effort summary - Updated 09-data-collection-reference.md with §5b-§5d future metric definitions - Updated 08-appendix.md with Phase 9-11 glossary, task list entries, cross-reference guide, effort summary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 10:56:00 +00:00
Pratik Mankawde	2573e956f1	Phase 8: Update documentation for log-trace correlation Task 8.6: Add Log-Trace Correlation section to telemetry-runbook.md with LogQL examples, verification steps, and troubleshooting guidance. Update 09-data-collection-reference.md section 5a from "Future" to actual implementation docs covering log format, ingestion pipeline, Grafana correlation config, and Loki backend. Add Phase 8 log correlation test section and troubleshooting to TESTING.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 10:55:54 +00:00
Pratik Mankawde	503d3f7d48	Phase 8: Log-trace correlation plan docs and task list - Add §6.8.1 to 06-implementation-phases.md with full Phase 8 plan (motivation, architecture, Mermaid diagrams, tasks table, exit criteria) - Add Phase8_taskList.md with per-task breakdown (8.1-8.6) - Add §5a log-trace correlation section to 09-data-collection-reference.md - Add Phase 8 row to OpenTelemetryPlan.md, update totals to 13 weeks / 8 phases - Add Phases 6-8 to Gantt chart in 06-implementation-phases.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 10:55:54 +00:00
Pratik Mankawde	7d51436d26	Phase 7: Native OTel metrics migration (Tasks 7.1-7.7) Replace StatsD UDP metric transport with native OpenTelemetry Metrics SDK export via OTLP/HTTP behind the existing beast::insight::Collector interface. - Task 7.1: Link opentelemetry-cpp to beast module in CMake when telemetry=ON - Task 7.2: New OTelCollector class mapping beast::insight instruments to OTel SDK (Counter, ObservableGauge, Histogram, Counter<uint64>) with OTLP/HTTP export via PeriodicMetricReader at 1s intervals - Task 7.3: Add server=otel branch to CollectorManager with endpoint config - Task 7.4: Update otel-collector-config.yaml to use OTLP receiver for metrics pipeline (StatsD receiver commented out for backward compat) - Task 7.5: Metric names preserved via dot-to-underscore formatting matching StatsD->Prometheus conventions - Task 7.6: Rename Grafana dashboards from statsd-* to system-*, update titles and UIDs from "StatsD" to "System Metrics" - Task 7.7: Update integration test to use server=otel, verify OTLP metrics - Task 7.8: Update runbook, TESTING.md, config reference, and data collection reference docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	702cf63c62	Separate plan from tasks: move Phase 7 plan into 06-implementation-phases.md, remove Phase 8 content - Move Phase 7 motivation (gains/losses/decision) and architecture (class hierarchy, data flow diagram, config) from Phase7_taskList.md into 06-implementation-phases.md §6.8 - Strip Phase7_taskList.md to tasks only (7.1-7.8 + summary table) - Remove Phase8_taskList.md — belongs on Phase 8 branch - Remove §6.8.1 (Phase 8) from 06-implementation-phases.md - Remove §5a (Phase 8 log correlation) from 09-data-collection-reference.md - Remove Phase 8 row from OpenTelemetryPlan.md phase table Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	85a2220312	Phase 7-8: Plan docs for native OTel metrics migration and log-trace correlation Phase 7 (native metrics): Replace StatsDCollector with OTelCollectorImpl behind the existing beast::insight::Collector interface. Maps Counter, Gauge, Meter, Event to OTel SDK instruments. Exports via OTLP/HTTP to same collector endpoint as traces. Eliminates StatsD UDP dependency. Resolves deferred Phase 6 Task 6.1 (|m wire format). Phase 8 (log correlation): Inject trace_id/span_id into JLOG output via Logs::format() thread-local span context read. Add Grafana Loki with OTel Collector filelog receiver for centralized log ingestion. Enable bidirectional Tempo-Loki correlation in Grafana. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	a8c2f94e8a	Remove 'rippled' prefix from dashboard titles, add new dashboards to doc Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	f1025d4f71	Fix markdown formatting in data collection reference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	64d8369dbc	Add consensus.accept.apply span to data collection reference Add the close time span and its 6 attributes to the Phase 4 consensus span table and attribute table in 09-data-collection-reference.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	4dcd65968f	document updates Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-03-16 16:46:36 +00:00
Pratik Mankawde	2bea046dab	Phase 6: Integrate beast::insight StatsD metrics into telemetry pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 16:46:36 +00:00

13 Commits
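
The Phase 10 commit (787b496484) describes rpc_load_generator.py as drawing commands from a fixed distribution (40% health, 30% wallet, 15% explorer, 10% tx lookups, 5% DEX) and injecting a W3C traceparent. The sketch below is a minimal illustration of those two ideas, not the code from that commit. The names `COMMAND_WEIGHTS`, `pick_category`, and `make_traceparent` are hypothetical, and how the value is attached to a WebSocket request is not stated in the log, so it is not shown here.

```python
import random
import secrets

# Category shares quoted in the Phase 10 commit message. The dictionary
# keys are illustrative labels, not identifiers from the repository.
COMMAND_WEIGHTS = {
    "health": 40,
    "wallet": 30,
    "explorer": 15,
    "tx_lookup": 10,
    "dex": 5,
}


def pick_category(rng: random.Random) -> str:
    """Draw one command category with probability proportional to its weight."""
    categories = list(COMMAND_WEIGHTS)
    weights = list(COMMAND_WEIGHTS.values())
    return rng.choices(categories, weights=weights, k=1)[0]


def _nonzero_hex(num_bytes: int) -> str:
    """Return random lowercase hex; all-zero IDs are invalid in Trace Context."""
    while True:
        value = secrets.token_hex(num_bytes)
        if int(value, 16) != 0:
            return value


def make_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context (version 00) traceparent value.

    Layout: version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags (2 hex).
    """
    trace_id = _nonzero_hex(16)
    span_id = _nonzero_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so repeated runs draw the same sequence
    for _ in range(5):
        print(pick_category(rng), make_traceparent())
```

Seeding the generator keeps a load run reproducible, while the trace and span IDs stay random so each request starts its own trace.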