Commit Graph

23 Commits

Author SHA1 Message Date
Pratik Mankawde
92d109ce16 docs: add external dashboard parity tasks and metric reference for Phase 9
Add Tasks 9.11-9.13 (Validator Health, Peer Quality, Ledger Economy dashboards),
new metric tables in data-collection-reference, and monitoring sections in runbook
covering validation agreement, validator health, peer quality, and state tracking.

Source: external dashboard parity design spec (2026-03-30).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:40 +01:00
Pratik Mankawde
6738f8b9ab docs: update Phase 9 docs and dashboard for push_metrics.py parity gauges
- Add Task 9.7a to Phase9_taskList.md documenting new gauges
- Add metric tables to 09-data-collection-reference.md (server_info,
  build_info, complete_ledgers, db_metrics, extended cache/nodestore)
- Update metric counts from ~50 to ~68 in 06-implementation-phases.md
- Add OTel MetricsRegistry gauge reference to telemetry-runbook.md
- Add 11 new panels to system-node-health.json Grafana dashboard
  (server state, uptime, peers, validated seq, last close info,
  build version, complete ledgers, db sizes, historical fetch rate,
  peer disconnects)
- Fix leftover merge conflict marker in 08-appendix.md
- Add ripplex/mseconds to cspell dictionary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:40 +01:00
Pratik Mankawde
43d36ff4f0 Phase 9: Metric gap fill - nodestore, cache, TxQ, load factor dashboards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:40 +01:00
Pratik Mankawde
6916734eae Phase 8: Log-trace correlation with Loki and filelog receiver
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:40 +01:00
Pratik Mankawde
fe10835a7c docs: add Tasks 7.9-7.16 for external dashboard parity metrics
Adds ValidationTracker (agreement computation with 8s grace period),
validator health, peer quality, ledger economy, state tracking,
storage detail gauges, 7 synchronous counters, and agreement gauge.

29 new metrics covering validation agreement, peer quality, UNL health,
ledger economy, state tracking, and upgrade awareness.

Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:39 +01:00
Pratik Mankawde
4137495282 Phase 7: Native OTel metrics migration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:39 +01:00
Pratik Mankawde
7e808806a9 docs: add peerDisconnectsCharges metric to data collection reference
Bridge the existing beast::insight gauge for resource-limit peer
disconnects (peerDisconnectsCharges_) into the StatsD metric inventory.

Part of the external dashboard parity initiative.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:39 +01:00
Pratik Mankawde
7ad43d4c21 Phase 6: StatsD metrics integration into telemetry pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:39 +01:00
Pratik Mankawde
c07dd573fe Phase 5: Documentation, deployment configs, integration test infrastructure
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 13:50:38 +01:00
Pratik Mankawde
14770376c3 docs: add Task 4.8 consensus validation span enrichment for external dashboard parity
Adds ledger_hash and validation.full to validation send/receive spans,
and validation_quorum and proposers_validated to consensus.accept spans.
Foundation for Phase 7 ValidationTracker agreement computation.

Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 13:09:49 +01:00
Pratik Mankawde
69d4b77abf Phase 4: Consensus tracing - round lifecycle, proposals, validations, close time
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 13:09:49 +01:00
Pratik Mankawde
8dcb03d73b docs: add Task 3.8 TX span peer version attribute for external dashboard parity
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.

Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 13:09:40 +01:00
Pratik Mankawde
439264dc79 docs: add Task 2.8 RPC span attribute enrichment for external dashboard parity
Adds node health context (amendment_blocked, server_state) to rpc.command.*
spans, inspired by the community xrpl-validator-dashboard.

Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 16:07:10 +01:00
Pratik Mankawde
ab6946319c Phase 2: RPC tracing - span macros, attributes, WebSocket, command spans
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 16:07:10 +01:00
Pratik Mankawde
a00c59eb7e Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 16:04:31 +01:00
Pratik Mankawde
405886719c Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 16:01:24 +01:00
Pratik Mankawde
0f0c188111 Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:58:38 +01:00
Pratik Mankawde
f135842071 docs: correct OTel overhead estimates against SDK benchmarks
Verified CPU, memory, and network overhead calculations against
official OTel C++ SDK benchmarks (969 CI runs) and source code
analysis. Key corrections:

- Span creation: 200-500ns → 500-1000ns (SDK BM_SpanCreation median
  ~1000ns; original estimate matched API no-op, not SDK path)
- Per-TX overhead: 2.4μs → 4.0μs (1.2% → 2.0%; still within the 1-3% range)
- Active span memory: ~200 bytes → ~500-800 bytes (Span wrapper +
  SpanData + std::map attribute storage)
- Static memory: ~456KB → ~8.3MB (BatchSpanProcessor worker thread
  stack ~8MB was omitted)
- Total memory ceiling: ~2.3MB → ~10MB
- Memory success metric target: <5MB → <10MB
- AddEvent: 50-80ns → 100-200ns
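
As a sanity check on the per-TX figures in the list above, the short sketch below back-computes the baseline transaction processing time that each estimate implies. It assumes the percentage is simply tracing overhead divided by baseline per-TX time; that definition and the helper name `implied_baseline_us` are illustrative assumptions, not taken from the plan documents.

```python
# Back-of-the-envelope check of the per-TX overhead figures quoted above.
# Assumption (not stated in the source): percentage = overhead / baseline TX time.


def implied_baseline_us(overhead_us: float, overhead_pct: float) -> float:
    """Return the baseline per-TX time (microseconds) implied by an overhead figure."""
    return overhead_us / (overhead_pct / 100.0)


# Original estimate: 2.4 us of tracing overhead reported as 1.2%.
original = implied_baseline_us(2.4, 1.2)
# Corrected estimate: 4.0 us of tracing overhead reported as 2.0%.
corrected = implied_baseline_us(4.0, 2.0)

print(f"original estimate implies  ~{original:.0f} us per TX")
print(f"corrected estimate implies ~{corrected:.0f} us per TX")
```

Under that assumption both estimates imply the same baseline, which suggests the percentage change comes from the revised span costs rather than from a different baseline.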

Added Section 3.5.4 with links to all benchmark sources.
Updated presentation.md with matching corrections.
High-level conclusions unchanged (1-3% CPU, negligible consensus impact).

Also includes: review fixes, cross-document consistency improvements,
additional component tracing docs (PathFinding, TxQ, Validator, etc.),
context size corrections (32 → 25 bytes).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
a9bc525f22 Move presentation.md file
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
5c9102bd9a Remove effort estimates from implementation phases document
Strip effort/risk columns from task tables and remove the §6.9 Effort
Summary section with its pie chart and resource requirements table.
Renumber §6.10 Quick Wins → §6.9.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
c556f3471b Add Phase 4a implementation status to plan docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
2fb6124412 Appendix: add 00-tracing-fundamentals.md and POC_taskList.md to document index
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
e482b56f58 Phase 1a: OpenTelemetry plan documentation
Add comprehensive planning documentation for the OpenTelemetry
distributed tracing integration:

- Tracing fundamentals and concepts
- Architecture analysis of rippled's tracing surface area
- Design decisions and trade-offs
- Implementation strategy and code samples
- Configuration reference
- Implementation phases roadmap
- Observability backend comparison
- POC task list and presentation materials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00