Pratik Mankawde
81298ceb9f
docs: add external dashboard parity tasks and metric reference for Phase 9
...
Add Tasks 9.11-9.13 (Validator Health, Peer Quality, Ledger Economy dashboards),
new metric tables in data-collection-reference, and monitoring sections in runbook
covering validation agreement, validator health, peer quality, and state tracking.
Source: external dashboard parity design spec (2026-03-30).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
936c73982d
docs: update Phase 9 docs and dashboard for push_metrics.py parity gauges
...
- Add Task 9.7a to Phase9_taskList.md documenting new gauges
- Add metric tables to 09-data-collection-reference.md (server_info,
build_info, complete_ledgers, db_metrics, extended cache/nodestore)
- Update metric counts from ~50 to ~68 in 06-implementation-phases.md
- Add OTel MetricsRegistry gauge reference to telemetry-runbook.md
- Add 11 new panels to system-node-health.json Grafana dashboard
(server state, uptime, peers, validated seq, last close info,
build version, complete ledgers, db sizes, historical fetch rate,
peer disconnects)
- Fix leftover merge conflict marker in 08-appendix.md
- Add ripplex/mseconds to cspell dictionary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
892fee638a
Phase 9: Metric gap fill - nodestore, cache, TxQ, load factor dashboards
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
30c430aec8
docs(telemetry): replace Jaeger references in Phase 8 docs and runbook
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:37 +01:00
Pratik Mankawde
fdec3ce5c4
Phase 8: Log-trace correlation with Loki and filelog receiver
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:37 +01:00
Pratik Mankawde
391b8f91ce
docs: add Tasks 7.9-7.16 for external dashboard parity metrics
...
Adds ValidationTracker (agreement computation with 8s grace period),
validator health, peer quality, ledger economy, state tracking,
storage detail gauges, 7 synchronous counters, and agreement gauge.
29 new metrics covering validation agreement, peer quality, UNL health,
ledger economy, state tracking, and upgrade awareness.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:24 +01:00
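Illustration for commit 391b8f91ce: the message names a ValidationTracker that computes agreement with an 8s grace period but does not show the logic. The Python sketch below is one way such a tracker could work. It assumes the grace period means a network-validated ledger is only scored once 8 seconds have passed since it was observed, so that a late local validation still counts. That interpretation, the class layout, and every method name are hypothetical and not taken from rippled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

GRACE_SECONDS = 8.0  # grace period named in the commit message


@dataclass
class ValidationTracker:
    """Hypothetical sketch of agreement tracking; not the rippled class."""

    # seq -> (network-validated hash, time the ledger was observed)
    _pending: Dict[int, Tuple[str, float]] = field(default_factory=dict)
    # seq -> hash that our own validator signed
    _ours: Dict[int, str] = field(default_factory=dict)
    agreed: int = 0
    missed: int = 0

    def on_ledger_validated(self, seq: int, ledger_hash: str, now: float) -> None:
        """Record the hash the network validated for ledger `seq`."""
        self._pending[seq] = (ledger_hash, now)

    def on_our_validation(self, seq: int, ledger_hash: str) -> None:
        """Record the hash our validator signed for ledger `seq`."""
        self._ours[seq] = ledger_hash

    def reconcile(self, now: float) -> None:
        """Score every pending ledger whose grace period has elapsed."""
        for seq in sorted(self._pending):
            network_hash, seen_at = self._pending[seq]
            if now - seen_at < GRACE_SECONDS:
                continue  # still inside the grace window; judge later
            if self._ours.pop(seq, None) == network_hash:
                self.agreed += 1
            else:
                self.missed += 1  # no validation, or a different hash
            del self._pending[seq]

    @property
    def agreement(self) -> Optional[float]:
        """Fraction of scored ledgers we agreed on, or None if none scored."""
        total = self.agreed + self.missed
        return self.agreed / total if total else None
```

In this sketch an agreement gauge would be fed from `agreement` after each `reconcile` call; ledgers still inside the grace window are left out of both the numerator and the denominator.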
Pratik Mankawde
2f7064ace6
Phase 7: Native OTel metrics migration
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:24 +01:00
Pratik Mankawde
1ef234de9d
docs(telemetry): replace Jaeger with Tempo in data collection reference
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:07 +01:00
Pratik Mankawde
a37cf74868
docs: add peerDisconnectsCharges metric to data collection reference
...
Bridge the existing beast::insight gauge for resource-limit peer
disconnects (peerDisconnectsCharges_) into the StatsD metric inventory.
Part of the external dashboard parity initiative.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:07 +01:00
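Illustration for commit a37cf74868: a gauge that reaches a StatsD collector travels as a plain-text datagram of the form `name:value|g`. The Python sketch below formats and sends one such datagram. The `rippled.` prefix on the metric name, the helper names, and the collector address are assumptions for illustration; the commit does not state them.

```python
import socket

# Hypothetical metric name; the commit names only the gauge itself.
METRIC = "rippled.peerDisconnectsCharges"


def format_gauge(name: str, value: int) -> bytes:
    """Encode a gauge sample in the StatsD line format: <name>:<value>|g."""
    return f"{name}:{value}|g".encode("ascii")


def send_gauge(name: str, value: int,
               host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one gauge sample over UDP (8125 is the conventional StatsD port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_gauge(name, value), (host, port))
```

Calling `send_gauge(METRIC, 3)` would report three resource-limit disconnects to a collector listening on the local host, assuming one is running there.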
Pratik Mankawde
21192e9b3f
Phase 6: StatsD metrics integration into telemetry pipeline
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:07 +01:00
Pratik Mankawde
87ed778efe
refactor(telemetry): migrate integration test and docs from Jaeger to Tempo API
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:29:30 +01:00
Pratik Mankawde
f940290866
Phase 5: Documentation, deployment configs, integration test infrastructure
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:29:30 +01:00
Pratik Mankawde
95f0c8bf51
docs: add Task 4.8 consensus validation span enrichment for external dashboard parity
...
Adds ledger_hash, validation.full to validation send/receive spans,
and validation_quorum, proposers_validated to consensus.accept spans.
Foundation for Phase 7 ValidationTracker agreement computation.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:33 +01:00
Pratik Mankawde
a127711b86
Phase 4: Consensus tracing - round lifecycle, proposals, validations, close time
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:33 +01:00
Pratik Mankawde
e6508a5bbc
docs: add Task 3.8 TX span peer version attribute for external dashboard parity
...
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:27 +01:00
Pratik Mankawde
9ab8570153
docs(telemetry): replace Jaeger references with Tempo in Phase 2-5 task lists
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:28:22 +01:00
Pratik Mankawde
befffc573c
docs: add Task 2.8 RPC span attribute enrichment for external dashboard parity
...
Adds node health context (amendment_blocked, server_state) to rpc.command.*
spans, inspired by the community xrpl-validator-dashboard.
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:22 +01:00
Pratik Mankawde
945faac770
Phase 2: RPC tracing - span macros, attributes, WebSocket, command spans
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:22 +01:00
Pratik Mankawde
ba92ccad14
Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:22 +01:00
Pratik Mankawde
79b95c8cc6
Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:17 +01:00
Pratik Mankawde
a7470615be
Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:28:12 +01:00
Pratik Mankawde
33b09d29e1
docs(telemetry): replace Jaeger with Tempo in architecture diagram
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:22:34 +01:00
Pratik Mankawde
f135842071
docs: correct OTel overhead estimates against SDK benchmarks
...
Verified CPU, memory, and network overhead calculations against
official OTel C++ SDK benchmarks (969 CI runs) and source code
analysis. Key corrections:
- Span creation: 200-500ns → 500-1000ns (SDK BM_SpanCreation median
~1000ns; original estimate matched API no-op, not SDK path)
- Per-TX overhead: 2.4μs → 4.0μs (2.0%, up from 1.2%; still within the 1-3% range)
- Active span memory: ~200 bytes → ~500-800 bytes (Span wrapper +
SpanData + std::map attribute storage)
- Static memory: ~456KB → ~8.3MB (BatchSpanProcessor worker thread
stack ~8MB was omitted)
- Total memory ceiling: ~2.3MB → ~10MB
- Memory success metric target: <5MB → <10MB
- AddEvent: 50-80ns → 100-200ns
Added Section 3.5.4 with links to all benchmark sources.
Updated presentation.md with matching corrections.
High-level conclusions unchanged (1-3% CPU, negligible consensus).
Also includes: review fixes, cross-document consistency improvements,
additional component tracing docs (PathFinding, TxQ, Validator, etc.),
context size corrections (32 → 25 bytes).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
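Illustration for commit f135842071: the per-TX figures pair an absolute overhead with a percentage, which implies a baseline transaction-processing time that the message does not state. The short Python check below derives that baseline from both the old and the corrected figures to confirm they are mutually consistent. The function name is hypothetical, and the baseline is inferred from the commit's numbers rather than quoted from the plan documents.

```python
import math

# Figures taken from the commit message (per-TX overhead and its share of CPU).
OLD_OVERHEAD_US, OLD_SHARE_PCT = 2.4, 1.2
NEW_OVERHEAD_US, NEW_SHARE_PCT = 4.0, 2.0


def implied_baseline_us(overhead_us: float, share_pct: float) -> float:
    """Baseline per-TX time implied by overhead = share * baseline."""
    return overhead_us / (share_pct / 100.0)


old_baseline = implied_baseline_us(OLD_OVERHEAD_US, OLD_SHARE_PCT)
new_baseline = implied_baseline_us(NEW_OVERHEAD_US, NEW_SHARE_PCT)

# Both estimates should rest on the same baseline if only the overhead changed.
consistent = math.isclose(old_baseline, new_baseline, rel_tol=1e-9)

# The corrected share should still fall inside the 1-3% range the commit cites.
within_range = 1.0 <= NEW_SHARE_PCT <= 3.0
```

Both pairs of figures imply a baseline of roughly 200 μs per transaction, so the correction changes only the overhead term and leaves the underlying processing-time assumption untouched.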
Pratik Mankawde
a9bc525f22
moved presentation.md file
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
5c9102bd9a
Remove effort estimates from implementation phases document
...
Strip effort/risk columns from task tables and remove the §6.9 Effort
Summary section with its pie chart and resource requirements table.
Renumber §6.10 Quick Wins → §6.9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
c556f3471b
Add Phase 4a implementation status to plan docs
...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
2fb6124412
Appendix: add 00-tracing-fundamentals.md and POC_taskList.md to document index
...
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00
Pratik Mankawde
e482b56f58
Phase 1a: OpenTelemetry plan documentation
...
Add comprehensive planning documentation for the OpenTelemetry
distributed tracing integration:
- Tracing fundamentals and concepts
- Architecture analysis of rippled's tracing surface area
- Design decisions and trade-offs
- Implementation strategy and code samples
- Configuration reference
- Implementation phases roadmap
- Observability backend comparison
- POC task list and presentation materials
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 15:55:26 +01:00