Pratik Mankawde
9289cb671d
Phase 9: Internal Metric Instrumentation Gap Fill (Tasks 9.1-9.10)
...
Implement ~50 OTel metrics covering NodeStore I/O, cache hit rates,
TxQ state, PerfLog per-RPC/per-job counters, CountedObject instances,
and load factor breakdown via MetricsRegistry.
Core implementation:
- MetricsRegistry class with synchronous instruments (Counter, Histogram)
for RPC and Job metrics, and ObservableGauge callbacks for cache, TxQ,
CountedObject, LoadFactor, and NodeStore state polling.
- ServiceRegistry extended with getMetricsRegistry() virtual method.
- Application wires MetricsRegistry lifecycle (create/start/stop).
- PerfLogImp instrumented to emit OTel metrics on RPC and Job events.
Dashboards & observability:
- 3 new Grafana dashboards: RPC Performance, Job Queue, Fee Market/TxQ.
- Extended statsd-node-health dashboard with NodeStore, Cache, and
CountedObject panels.
- 10 alerting rules added to telemetry-runbook.md.
- Integration test extended with 12 OTel metric validation checks.
Documentation:
- 09-data-collection-reference.md updated with Phase 9 metric tables.
- Unit tests for MetricsRegistry disabled-path (no-op) behavior.
All OTel SDK code guarded with #ifdef XRPL_ENABLE_TELEMETRY.
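Illustrative note: the split between synchronous instruments and ObservableGauge callbacks described above can be sketched in plain C++. This is a minimal sketch only; `MiniRegistry`, `add`, `observe`, and `collect` are hypothetical names invented for illustration and are not the MetricsRegistry or OTel SDK API. Counters are updated at the call site, while gauges are sampled through a stored callback each time a collection pass runs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical miniature registry (assumption, not the real MetricsRegistry).
class MiniRegistry
{
public:
    // Synchronous instrument: the caller pushes a delta when the event happens.
    void
    add(std::string const& name, std::uint64_t delta = 1)
    {
        counters_[name] += delta;
    }

    // Observable gauge: store a callback that is polled on each collect().
    void
    observe(std::string name, std::function<double()> callback)
    {
        gauges_[std::move(name)] = std::move(callback);
    }

    // One collection pass: copy the counters, then sample every gauge callback.
    std::map<std::string, double>
    collect() const
    {
        std::map<std::string, double> out;
        for (auto const& kv : counters_)
            out[kv.first] = static_cast<double>(kv.second);
        for (auto const& kv : gauges_)
            out[kv.first] = kv.second();
        return out;
    }

private:
    std::map<std::string, std::uint64_t> counters_;
    std::map<std::string, std::function<double()>> gauges_;
};
```

Polling suits state such as cache or queue sizes because the owning component does not need to notify the registry on every change; the value is read only when an export happens.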
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 10:56:00 +00:00
Pratik Mankawde
b73592f934
Phase 9-11: Future enhancement plans for metric gap fill, workload validation, and third-party pipelines
...
- Phase 9: Internal Metric Instrumentation Gap Fill (10 tasks, 12d)
- MetricsRegistry class, NodeStore I/O, cache, TxQ, PerfLog, CountedObjects, load factors
- Phase 10: Synthetic Workload Generation & Telemetry Validation (7 tasks, 10d)
- Multi-node harness, RPC/tx generators, validation suite, benchmarks, CI
- Phase 11: Third-Party Data Collection Pipelines (11 tasks, 15d)
- Custom OTel Collector receiver (Go), 30 external metrics, alerting rules, 4 dashboards
- Updated 06-implementation-phases.md with plan sections §6.8.2-§6.8.4, gantt, effort summary
- Updated 09-data-collection-reference.md with §5b-§5d future metric definitions
- Updated 08-appendix.md with Phase 9-11 glossary, task list entries, cross-reference guide, effort summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 10:56:00 +00:00
Pratik Mankawde
2573e956f1
Phase 8: Update documentation for log-trace correlation
...
Task 8.6: Add Log-Trace Correlation section to telemetry-runbook.md
with LogQL examples, verification steps, and troubleshooting guidance.
Update 09-data-collection-reference.md section 5a from "Future" to
actual implementation docs covering log format, ingestion pipeline,
Grafana correlation config, and Loki backend. Add Phase 8 log
correlation test section and troubleshooting to TESTING.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 10:55:54 +00:00
Pratik Mankawde
503d3f7d48
Phase 8: Log-trace correlation plan docs and task list
...
- Add §6.8.1 to 06-implementation-phases.md with full Phase 8 plan
(motivation, architecture, Mermaid diagrams, tasks table, exit criteria)
- Add Phase8_taskList.md with per-task breakdown (8.1-8.6)
- Add §5a log-trace correlation section to 09-data-collection-reference.md
- Add Phase 8 row to OpenTelemetryPlan.md, update totals to 13 weeks / 8 phases
- Add Phases 6-8 to Gantt chart in 06-implementation-phases.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 10:55:54 +00:00
Pratik Mankawde
7d51436d26
Phase 7: Native OTel metrics migration (Tasks 7.1-7.8)
...
Replace StatsD UDP metric transport with native OpenTelemetry Metrics SDK
export via OTLP/HTTP behind the existing beast::insight::Collector interface.
- Task 7.1: Link opentelemetry-cpp to beast module in CMake when telemetry=ON
- Task 7.2: New OTelCollector class mapping beast::insight instruments to OTel
SDK (Counter, ObservableGauge, Histogram, Counter<uint64>) with OTLP/HTTP
export via PeriodicMetricReader at 1s intervals
- Task 7.3: Add server=otel branch to CollectorManager with endpoint config
- Task 7.4: Update otel-collector-config.yaml to use OTLP receiver for metrics
pipeline (StatsD receiver commented out for backward compat)
- Task 7.5: Metric names preserved via dot-to-underscore formatting matching
StatsD->Prometheus conventions
- Task 7.6: Rename Grafana dashboards from statsd-* to system-*, update titles
and UIDs from "StatsD" to "System Metrics"
- Task 7.7: Update integration test to use server=otel, verify OTLP metrics
- Task 7.8: Update runbook, TESTING.md, config reference, and data collection
reference docs
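Illustrative note: the dot-to-underscore formatting mentioned in Task 7.5 can be sketched as a one-line transformation. This is a minimal sketch under assumptions; `toUnderscoreName` is a hypothetical helper and the metric name in the self-check is made up, neither is taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper (assumption): rewrite a dotted StatsD-style metric name
// into the underscore-separated form used on the Prometheus side.
inline std::string
toUnderscoreName(std::string name)
{
    std::replace(name.begin(), name.end(), '.', '_');
    return name;
}

// Small self-check; the metric name is invented for illustration.
inline void
checkNaming()
{
    assert(toUnderscoreName("example.jobq.count") == "example_jobq_count");
}
```

Keeping the output names identical to what the StatsD-to-Prometheus path produced is what lets existing dashboard queries keep working after the transport changes.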
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
702cf63c62
Separate plan from tasks: move Phase 7 plan into 06-implementation-phases.md, remove Phase 8 content
...
- Move Phase 7 motivation (gains/losses/decision) and architecture (class
hierarchy, data flow diagram, config) from Phase7_taskList.md into
06-implementation-phases.md §6.8
- Strip Phase7_taskList.md to tasks only (7.1-7.8 + summary table)
- Remove Phase8_taskList.md — belongs on Phase 8 branch
- Remove §6.8.1 (Phase 8) from 06-implementation-phases.md
- Remove §5a (Phase 8 log correlation) from 09-data-collection-reference.md
- Remove Phase 8 row from OpenTelemetryPlan.md phase table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
85a2220312
Phase 7-8: Plan docs for native OTel metrics migration and log-trace correlation
...
Phase 7 (native metrics): Replace StatsDCollector with OTelCollectorImpl
behind the existing beast::insight::Collector interface. Maps Counter,
Gauge, Meter, Event to OTel SDK instruments. Exports via OTLP/HTTP to
same collector endpoint as traces. Eliminates StatsD UDP dependency.
Resolves deferred Phase 6 Task 6.1 (|m wire format).
Phase 8 (log correlation): Inject trace_id/span_id into JLOG output
via Logs::format() thread-local span context read. Add Grafana Loki
with OTel Collector filelog receiver for centralized log ingestion.
Enable bidirectional Tempo-Loki correlation in Grafana.
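Illustrative note: the Phase 8 idea of reading a thread-local span context while formatting a log line can be sketched as follows. This is a minimal sketch under assumptions; `SpanIds`, `currentSpan`, and `formatWithTrace` are hypothetical names, and the `trace_id=... span_id=...` suffix layout is an assumed format, not the one defined by Logs::format().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the identifiers of the active span (assumption).
struct SpanIds
{
    std::string traceId;  // empty when no span is active on this thread
    std::string spanId;
};

// One slot per thread, so concurrent jobs never see each other's span.
inline SpanIds&
currentSpan()
{
    thread_local SpanIds ids;
    return ids;
}

// Append the correlation fields only when a span is active on this thread.
inline std::string
formatWithTrace(std::string const& message)
{
    SpanIds const& ids = currentSpan();
    if (ids.traceId.empty())
        return message;
    return message + " trace_id=" + ids.traceId + " span_id=" + ids.spanId;
}
```

Because the lookup is thread-local, log call sites need no extra arguments; lines written outside any span are left unchanged.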
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
a8c2f94e8a
Remove 'rippled' prefix from dashboard titles, add new dashboards to doc
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
f1025d4f71
Fix markdown formatting in data collection reference
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
64d8369dbc
Add consensus.accept.apply span to data collection reference
...
Add the close time span and its 6 attributes to the Phase 4 consensus
span table and attribute table in 09-data-collection-reference.md.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
4dcd65968f
document updates
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-03-16 16:46:36 +00:00
Pratik Mankawde
2bea046dab
Phase 6: Integrate beast::insight StatsD metrics into telemetry pipeline
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 16:46:36 +00:00