rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-07-28 17:40:25 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	178bc916a8	docs(telemetry): add Task 3.8 TX span peer version attribute spec Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch correlation during network upgrades. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:28:31 +01:00
Pratik Mankawde	ed8164d502	docs(telemetry): add Task 2.9 PathFind instrumentation to Phase 2 task list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:28:07 +01:00
Pratik Mankawde	eb51457e69	fix(telemetry): address Phase 2 code review findings - Move node health attribute strings to compile-time constants in SpanNames.h (attr::nodeAmendmentBlocked, attr::nodeServerState) - Add Tempo search filters for node health attributes - Remove unnecessary .c_str() on strOperatingMode() return - Add samplingRatio clamping test (values > 1.0 and < 0.0) - Fix Task 2.3 status: delivered in Phase 1c, not Phase 2 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:28:07 +01:00
Pratik Mankawde	9bc8cc6b4e	docs(telemetry): update Phase2 task list to reflect actual implementation Mark deferred tasks (2.1→Phase 3, 2.5→low priority) with rationale. Mark superseded tasks (2.2→Phase 1c SpanGuard factory). Add Task 2.7 for Grafana search filters. Update summary table with status column. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:28:07 +01:00
Pratik Mankawde	a9ee819ea1	docs(telemetry): add Phase 2-5 task lists and appendix update Introduces task list documents for Phases 2 through 5, with Tempo references (replacing Jaeger) and Task 2.8 dashboard parity spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:28:07 +01:00
Pratik Mankawde	5e8277f36a	docs(telemetry): fix doc references to match pimpl architecture Replace references to non-existent TracingInstrumentation.h with SpanGuard.cpp pimpl implementation that actually exists on this branch. Update conditional compilation section to describe the pimpl approach. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:26:05 +01:00
Pratik Mankawde	e9c5c3520e	fix(telemetry): address Phase 1b code review findings Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types from public headers. Add global Telemetry accessor so SpanGuard factory methods work without explicit Telemetry references. Add child/linked span creation and cross-thread context propagation. Update plan docs to reflect macro removal in favor of SpanGuard factory pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 14:26:05 +01:00
Pratik Mankawde	26947267b1	docs(telemetry): update plan docs for FilteringSpanProcessor and discard() Add DiscardFlag.h and FilteringSpanProcessor references to the file tree, key files table, and implementation summary in OpenTelemetryPlan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 14:25:31 +01:00
Pratik Mankawde	ea921d3a02	docs(telemetry): remove remaining Jaeger references from config reference Remove duplicate otlp/tempo exporter block, duplicate tempo service definition, and jaeger dependency from docker-compose example. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 14:25:31 +01:00
Pratik Mankawde	88686af850	Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 14:25:31 +01:00
Pratik Mankawde	1fd971b78b	fix(docs): apply rename scripts to OpenTelemetry plan docs Run .github/scripts/rename/docs.sh to replace rippled → xrpld references in all plan documentation files, fixing the check-rename CI failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 13:57:38 +01:00
Pratik Mankawde	747247153b	docs(telemetry): add per-validator participation metric to Phase 7 plan Add Sub-task 7.10a: Per-Validator Validation Count (Flag Ledger Window) to the Phase 7 task list. This metric tracks how many of the last 256 ledgers each UNL validator has validated — the key participation metric for UNL health monitoring. Implementation plan: - Observable gauge rippled_validator_participation with validator label - Data from RCLValidations::getTrustedForLedger() over 256-ledger window - Emitted at flag ledger boundaries (~15 min interval) - Grafana table panel with threshold coloring (green/yellow/red) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 13:32:09 +01:00
Pratik Mankawde	193f5b39cb	docs(telemetry): update plan docs for ServiceRegistry migration Plan documents referenced Application.h and app_ for getTelemetry() but the codebase now uses ServiceRegistry as the interface. Updated: - 05-configuration-reference.md: getTelemetry() on ServiceRegistry, deferred serviceInstanceId pattern in ApplicationImp - POC_taskList.md Task 4: target ServiceRegistry.h not Application.h, correct config file path and constructor pattern - 04-code-samples.md: fix overlay() -> getOverlay(), rewrite JobQueue sample to reflect actual architecture (no app_ member) - 03-implementation-strategy.md: fix file impact table path Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:37:13 +01:00
Pratik Mankawde	db8111ef7c	docs(telemetry): replace Jaeger with Tempo in architecture diagram Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:00:48 +01:00
Pratik Mankawde	913a4b794c	docs: correct OTel overhead estimates against SDK benchmarks Verified CPU, memory, and network overhead calculations against official OTel C++ SDK benchmarks (969 CI runs) and source code analysis. Key corrections: - Span creation: 200-500ns → 500-1000ns (SDK BM_SpanCreation median ~1000ns; original estimate matched API no-op, not SDK path) - Per-TX overhead: 2.4μs → 4.0μs (2.0% vs 1.2%; still within 1-3%) - Active span memory: ~200 bytes → ~500-800 bytes (Span wrapper + SpanData + std::map attribute storage) - Static memory: ~456KB → ~8.3MB (BatchSpanProcessor worker thread stack ~8MB was omitted) - Total memory ceiling: ~2.3MB → ~10MB - Memory success metric target: <5MB → <10MB - AddEvent: 50-80ns → 100-200ns Added Section 3.5.4 with links to all benchmark sources. Updated presentation.md with matching corrections. High-level conclusions unchanged (1-3% CPU, negligible consensus). Also includes: review fixes, cross-document consistency improvements, additional component tracing docs (PathFinding, TxQ, Validator, etc.), context size corrections (32 → 25 bytes). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	accea17e9d	moved presentation.md file Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	c6fa00fbe3	Remove effort estimates from implementation phases document Strip effort/risk columns from task tables and remove the §6.9 Effort Summary section with its pie chart and resource requirements table. Renumber §6.10 Quick Wins → §6.9. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	bfb8f4f01a	Add Phase 4a implementation status to plan docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	4b745a86b7	Appendix: add 00-tracing-fundamentals.md and POC_taskList.md to document index Split document index into Plan Documents and Task Lists sections. These files were introduced in this branch but missing from the index. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	ddf894dcb0	Phase 1a: OpenTelemetry plan documentation Add comprehensive planning documentation for the OpenTelemetry distributed tracing integration: - Tracing fundamentals and concepts - Architecture analysis of rippled's tracing surface area - Design decisions and trade-offs - Implementation strategy and code samples - Configuration reference - Implementation phases roadmap - Observability backend comparison - POC task list and presentation materials Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	391b8f91ce	docs: add Tasks 7.9-7.16 for external dashboard parity metrics Adds ValidationTracker (agreement computation with 8s grace period), validator health, peer quality, ledger economy, state tracking, storage detail gauges, 7 synchronous counters, and agreement gauge. 29 new metrics covering validation agreement, peer quality, UNL health, ledger economy, state tracking, and upgrade awareness. Part of the external dashboard parity initiative across phases 2-11. See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:24 +01:00
Pratik Mankawde	2f7064ace6	Phase 7: Native OTel metrics migration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:24 +01:00
Pratik Mankawde	1ef234de9d	docs(telemetry): replace Jaeger with Tempo in data collection reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 22:31:07 +01:00
Pratik Mankawde	a37cf74868	docs: add peerDisconnectsCharges metric to data collection reference Bridge the existing beast::insight gauge for resource-limit peer disconnects (peerDisconnectsCharges_) into the StatsD metric inventory. Part of the external dashboard parity initiative. See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:07 +01:00
Pratik Mankawde	21192e9b3f	Phase 6: StatsD metrics integration into telemetry pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:07 +01:00
Pratik Mankawde	87ed778efe	refactor(telemetry): migrate integration test and docs from Jaeger to Tempo API Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 22:29:30 +01:00
Pratik Mankawde	f940290866	Phase 5: Documentation, deployment configs, integration test infrastructure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:29:30 +01:00
Pratik Mankawde	95f0c8bf51	docs: add Task 4.8 consensus validation span enrichment for external dashboard parity Adds ledger_hash, validation.full to validation send/receive spans, and validation_quorum, proposers_validated to consensus.accept spans. Foundation for Phase 7 ValidationTracker agreement computation. Part of the external dashboard parity initiative across phases 2-11. See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:33 +01:00
Pratik Mankawde	a127711b86	Phase 4: Consensus tracing - round lifecycle, proposals, validations, close time Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:33 +01:00
Pratik Mankawde	e6508a5bbc	docs: add Task 3.8 TX span peer version attribute for external dashboard parity Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch correlation during network upgrades. Part of the external dashboard parity initiative across phases 2-11. See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:27 +01:00
Pratik Mankawde	9ab8570153	docs(telemetry): replace Jaeger references with Tempo in Phase 2-5 task lists Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 22:28:22 +01:00
Pratik Mankawde	befffc573c	docs: add Task 2.8 RPC span attribute enrichment for external dashboard parity Adds node health context (amendment_blocked, server_state) to rpc.command.* spans, inspired by the community xrpl-validator-dashboard. Part of the external dashboard parity initiative across phases 2-11. See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:22 +01:00
Pratik Mankawde	945faac770	Phase 2: RPC tracing - span macros, attributes, WebSocket, command spans Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:22 +01:00
Pratik Mankawde	ba92ccad14	Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:22 +01:00
Pratik Mankawde	79b95c8cc6	Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:17 +01:00
Pratik Mankawde	a7470615be	Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:12 +01:00
Pratik Mankawde	33b09d29e1	docs(telemetry): replace Jaeger with Tempo in architecture diagram Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 22:22:34 +01:00
Pratik Mankawde	f135842071	docs: correct OTel overhead estimates against SDK benchmarks Verified CPU, memory, and network overhead calculations against official OTel C++ SDK benchmarks (969 CI runs) and source code analysis. Key corrections: - Span creation: 200-500ns → 500-1000ns (SDK BM_SpanCreation median ~1000ns; original estimate matched API no-op, not SDK path) - Per-TX overhead: 2.4μs → 4.0μs (2.0% vs 1.2%; still within 1-3%) - Active span memory: ~200 bytes → ~500-800 bytes (Span wrapper + SpanData + std::map attribute storage) - Static memory: ~456KB → ~8.3MB (BatchSpanProcessor worker thread stack ~8MB was omitted) - Total memory ceiling: ~2.3MB → ~10MB - Memory success metric target: <5MB → <10MB - AddEvent: 50-80ns → 100-200ns Added Section 3.5.4 with links to all benchmark sources. Updated presentation.md with matching corrections. High-level conclusions unchanged (1-3% CPU, negligible consensus). Also includes: review fixes, cross-document consistency improvements, additional component tracing docs (PathFinding, TxQ, Validator, etc.), context size corrections (32 → 25 bytes). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:55:26 +01:00
Pratik Mankawde	a9bc525f22	moved presentation.md file Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-03-30 15:55:26 +01:00
Pratik Mankawde	5c9102bd9a	Remove effort estimates from implementation phases document Strip effort/risk columns from task tables and remove the §6.9 Effort Summary section with its pie chart and resource requirements table. Renumber §6.10 Quick Wins → §6.9. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:55:26 +01:00
Pratik Mankawde	c556f3471b	Add Phase 4a implementation status to plan docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:55:26 +01:00
Pratik Mankawde	2fb6124412	Appendix: add 00-tracing-fundamentals.md and POC_taskList.md to document index Split document index into Plan Documents and Task Lists sections. These files were introduced in this branch but missing from the index. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:55:26 +01:00
Pratik Mankawde	e482b56f58	Phase 1a: OpenTelemetry plan documentation Add comprehensive planning documentation for the OpenTelemetry distributed tracing integration: - Tracing fundamentals and concepts - Architecture analysis of rippled's tracing surface area - Design decisions and trade-offs - Implementation strategy and code samples - Configuration reference - Implementation phases roadmap - Observability backend comparison - POC task list and presentation materials Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:55:26 +01:00

143 Commits