# Implementation Phases

> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Configuration Reference](./05-configuration-reference.md) | [Observability Backends](./07-observability-backends.md)

---

## 6.1 Phase Overview

> **TxQ** = Transaction Queue

```mermaid
gantt
    title OpenTelemetry Implementation Timeline
    dateFormat  YYYY-MM-DD
    axisFormat  Week %W

    section Phase 1
    Core Infrastructure            :p1, 2024-01-01, 2w
    SDK Integration                :p1a, 2024-01-01, 4d
    Telemetry Interface            :p1b, after p1a, 3d
    Configuration & CMake          :p1c, after p1b, 3d
    Unit Tests                     :p1d, after p1c, 2d
    Buffer & Integration           :p1e, after p1d, 2d

    section Phase 2
    RPC Tracing                    :p2, after p1, 2w
    HTTP Context Extraction        :p2a, after p1, 2d
    RPC Handler Instrumentation    :p2b, after p2a, 4d
    PathFinding Instrumentation    :p2f, after p2b, 2d
    TxQ Instrumentation            :p2g, after p2f, 2d
    WebSocket Support              :p2c, after p2g, 2d
    Integration Tests              :p2d, after p2c, 2d
    Buffer & Review                :p2e, after p2d, 4d

    section Phase 3
    Transaction Tracing            :p3, after p2, 2w
    Protocol Buffer Extension      :p3a, after p2, 2d
    PeerImp Instrumentation        :p3b, after p3a, 3d
    Fee Escalation Instrumentation :p3f, after p3b, 2d
    Relay Context Propagation      :p3c, after p3f, 3d
    Multi-node Tests               :p3d, after p3c, 2d
    Buffer & Review                :p3e, after p3d, 4d

    section Phase 4
    Consensus Tracing              :p4, after p3, 2w
    Consensus Round Spans          :p4a, after p3, 3d
    Proposal Handling              :p4b, after p4a, 3d
    Establish Phase (4a)           :p4f, after p4b, 3d
    Validation Tests               :p4c, after p4f, 4d
    Buffer & Review                :p4e, after p4c, 4d

    section Phase 5
    Documentation & Deploy         :p5, after p4, 1w
```

---

## 6.2 Phase 1: Core Infrastructure (Weeks 1-2)

**Objective**: Establish foundational telemetry infrastructure

### Tasks

| Task | Description                                           |
| ---- | ----------------------------------------------------- |
| 1.1  | Add OpenTelemetry C++ SDK to Conan/CMake              |
| 1.2  | Implement `Telemetry` interface and factory           |
| 1.3  | Implement `SpanGuard` RAII wrapper                    |
| 1.4  | Implement configuration parser                        |
| 1.5  | Integrate into `ApplicationImp`                       |
| 1.6  | Add conditional compilation (`XRPL_ENABLE_TELEMETRY`) |
| 1.7  | Create `NullTelemetry` no-op implementation           |
| 1.8  | Unit tests for core infrastructure                    |

### Exit Criteria

- [ ] OpenTelemetry SDK compiles and links
- [ ] Telemetry can be enabled/disabled via config
- [ ] Basic span creation works
- [ ] No performance regression when disabled
- [ ] Unit tests passing

---

## 6.3 Phase 2: RPC Tracing (Weeks 3-4)

> **TxQ** = Transaction Queue

**Objective**: Complete tracing for all RPC operations

### Tasks

| Task | Description                                                                |
| ---- | -------------------------------------------------------------------------- |
| 2.1  | Implement W3C Trace Context HTTP header extraction                         |
| 2.2  | Instrument `ServerHandler::onRequest()`                                    |
| 2.3  | Instrument `RPCHandler::doCommand()`                                       |
| 2.4  | Add RPC-specific attributes                                                |
| 2.5  | Instrument WebSocket handler                                               |
| 2.6  | PathFinding instrumentation (`pathfind.request`, `pathfind.compute` spans) |
| 2.7  | TxQ instrumentation (`txq.enqueue`, `txq.apply` spans)                     |
| 2.8  | Integration tests for RPC tracing                                          |
| 2.9  | Performance benchmarks                                                     |
| 2.10 | Documentation                                                              |

### Exit Criteria

- [ ] All RPC commands traced
- [ ] Trace context propagates from HTTP headers
- [ ] WebSocket and HTTP both instrumented
- [ ] <1ms overhead per RPC call
- [ ] Integration tests passing

---

## 6.4 Phase 3: Transaction Tracing (Weeks 5-6)

**Objective**: Trace transaction lifecycle across network with deterministic cross-node correlation

### Tasks

| Task | Description                                                    |
| ---- | -------------------------------------------------------------- |
| 3.1  | Define `TraceContext` Protocol Buffer message                  |
| 3.2  | Implement protobuf context serialization                       |
| 3.3  | Instrument `PeerImp::handleTransaction()`                      |
| 3.4  | Instrument `NetworkOPs::submitTransaction()`                   |
| 3.5  | Instrument HashRouter integration                              |
| 3.6  | Fee escalation instrumentation (`fee.escalate` span)           |
| 3.7  | Implement relay context propagation                            |
| 3.8  | Integration tests (multi-node)                                 |
| 3.9  | Deterministic transaction trace ID (`trace_id = txHash[0:16]`) |
| 3.10 | Performance benchmarks                                         |

### Deterministic Trace ID (Task 3.9)

Transaction spans use **deterministic trace IDs** derived from the transaction hash: `trace_id = txHash[0:16]`. All nodes handling the same transaction independently produce spans under the same trace_id. Protobuf `span_id` propagation (Task 3.7) additionally provides parent-child relay ordering when available.

See [02-design-decisions.md §2.5.0](./02-design-decisions.md) for the design rationale and [Phase3_taskList.md Task 3.9](./Phase3_taskList.md) for the full implementation spec.

### Exit Criteria

- [ ] Transaction traces span across nodes
- [ ] Trace context in Protocol Buffer messages
- [ ] HashRouter deduplication visible in traces
- [ ] Multi-node integration tests passing
- [ ] <5% overhead on transaction throughput
- [ ] Deterministic trace_id: all nodes produce same trace_id for same transaction
- [ ] Protobuf span_id propagation preserves parent-child ordering when available

---

## 6.5 Phase 4: Consensus Tracing (Weeks 7-8)

**Objective**: Full observability into consensus rounds

### Tasks

| Task | Description                                    | Status             |
| ---- | ---------------------------------------------- | ------------------ |
| 4.1  | Instrument `RCLConsensusAdaptor::startRound()` | ✅ Done (via 4a.2) |
| 4.2  | Instrument phase transitions                   | ✅ Done            |
| 4.3  | Instrument proposal handling                   | ✅ Done            |
| 4.4  | Instrument validation handling                 | ✅ Done            |
| 4.5  | Add consensus-specific attributes              | ✅ Done            |
| 4.6  | Correlate with transaction traces              | ✅ Done            |
| 4.7  | Build verification and testing                 | ✅ Done            |
| 4.8  | Validation span enrichment (ext. dashboard)    | ❌ Not done        |

**Note**: The original plan doc listed tasks 4.7-4.11 as "Validator list tracing", "Amendment voting tracing", "SHAMap sync tracing", "Multi-validator integration tests", and "Performance validation". These were descoped and replaced by the tasklist's 4.7 (build verification) and 4.8 (validation span enrichment). Validator, amendment, and SHAMap tracing are not implemented.

### Spans Produced

| Span Name | Location | Attributes |
| --------------------------- | ---------------------- | ---------- |
| `consensus.phase.open` | `Consensus.h:707` | _(none)_ |
| `consensus.proposal.send` | `RCLConsensus.cpp:232` | `xrpl.consensus.round` |
| `consensus.ledger_close` | `RCLConsensus.cpp:341` | `xrpl.consensus.ledger.seq`, `xrpl.consensus.mode` |
| `consensus.accept` | `RCLConsensus.cpp:492` | `xrpl.consensus.proposers`, `xrpl.consensus.round_time_ms`, `xrpl.consensus.quorum` |
| `consensus.accept.apply` | `RCLConsensus.cpp:541` | `xrpl.consensus.close_time`, `close_time_correct`, `close_resolution_ms`, `state`, `proposing`, `round_time_ms`, `ledger.seq`, `parent_close_time`, `close_time_self`, `close_time_vote_bins`, `resolution_direction` |
| `consensus.validation.send` | `RCLConsensus.cpp:900` | `xrpl.consensus.ledger.seq`, `xrpl.consensus.proposing` |

### Exit Criteria

- [x] Complete consensus round traces
- [x] Phase transitions visible (open, establish, close, accept)
- [x] Proposals and validations traced — send and receive; relay deferred to Phase 4b
- [x] Close time agreement tracked (per `avCT_CONSENSUS_PCT`)
- [x] No impact on consensus timing
- [ ] Multi-validator test network validated
- [x] Transaction-consensus correlation (Task 4.6) — `tx.included` events in doAccept
- [ ] Validation span enrichment (Task 4.8) — not implemented

### Implementation Status — Phase 4a Complete

Phase 4a (establish-phase gap fill & cross-node correlation) adds:

- **Deterministic trace ID** derived from `previousLedger.id()` so all validators in the same round share the same `trace_id` (switchable via `consensus_trace_strategy` config: `"deterministic"` or `"attribute"`). See [Configuration Reference](./05-configuration-reference.md) for full configuration options. The `consensus_trace_strategy` option will be documented in the configuration reference as part of Phase 4a implementation.
- **Round lifecycle spans**: `consensus.round` with round-to-round span links.
- **Establish phase**: `consensus.establish`, `consensus.update_positions` (with `dispute.resolve` events), `consensus.check` (with threshold tracking).
- **Mode changes**: `consensus.mode_change` spans.
- **Validation**: `consensus.validation.send` with span link to round span (thread-safe cross-thread access via `roundSpanContext_` snapshot).
- **Separation of concerns**: telemetry extracted to private helpers (`startRoundTracing`, `createValidationSpan`, `startEstablishTracing`, `updateEstablishTracing`, `endEstablishTracing`).

See [Phase4_taskList.md](./Phase4_taskList.md) for the full spec and implementation notes.

---

## 6.5a Phase 4a: Establish-Phase Gap Fill & Cross-Node Correlation

**Objective**: Fill tracing gaps in the establish phase and establish cross-node correlation using deterministic trace IDs derived from `previousLedger.id()`.

**Approach**: Direct instrumentation in `Consensus.h` and `RCLConsensus.cpp`. All spans use `SpanGuard` factory methods (`span()`, `hashSpan()`, `linkedSpan()`) with `TraceCategory::Consensus` gating. No macros used — all tracing via direct `SpanGuard` API calls.
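
The deterministic trace ID named in the objective can be illustrated with a small standalone sketch. It assumes the consensus trace ID is taken as the first 16 bytes of the 32-byte previous-ledger hash, mirroring the transaction rule `trace_id = txHash[0:16]` from Phase 3; the actual derivation is specified in Phase4_taskList.md and may differ. `Hash256`, `TraceId`, `deriveTraceId`, and `toHex` are hypothetical names for illustration only and are not part of the `SpanGuard` or `Telemetry` APIs.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins: a 32-byte ledger hash and a 16-byte trace ID.
using Hash256 = std::array<std::uint8_t, 32>;
using TraceId = std::array<std::uint8_t, 16>;

// Assumption: the trace ID is the first 16 bytes of the hash, mirroring
// `trace_id = txHash[0:16]`. Every node that sees the same hash derives
// the same ID without exchanging any context.
TraceId deriveTraceId(Hash256 const& hash)
{
    TraceId id{};
    std::copy_n(hash.begin(), id.size(), id.begin());
    return id;
}

// Lowercase hex encoding, the form a trace ID takes in a W3C
// `traceparent` header (32 hex characters for 16 bytes).
std::string toHex(TraceId const& id)
{
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(id.size() * 2);
    for (std::uint8_t b : id)
    {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

Two validators that call `deriveTraceId` on the same previous-ledger hash obtain identical IDs, which is what lets a backend group their round spans under one trace. A production version would also need to handle an all-zero result, since W3C Trace Context treats an all-zero trace ID as invalid.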

### Tasks

| Task | Description                                      | Effort | Risk   | Status                   |
| ---- | ------------------------------------------------ | ------ | ------ | ------------------------ |
| 4a.0 | Prerequisites: extend SpanGuard & Telemetry APIs | 1d     | Medium | ✅ Done (no macros)      |
| 4a.1 | Adaptor `getTelemetry()` method                  | 0.5d   | Low    | ⏭️ Skipped (not needed)  |
| 4a.2 | Switchable round span with deterministic traceID | 2d     | High   | ✅ Done                  |
| 4a.3 | Span members in `Consensus.h`                    | 0.5d   | Medium | ✅ Done (with deviation) |
| 4a.4 | Instrument `phaseEstablish()`                    | 1d     | Medium | ✅ Done                  |
| 4a.5 | Instrument `updateOurPositions()`                | 1d     | Medium | ✅ Done                  |
| 4a.6 | Instrument `haveConsensus()` (thresholds)        | 1d     | Medium | ✅ Done                  |
| 4a.7 | Instrument mode changes                          | 0.5d   | Low    | ✅ Done                  |
| 4a.8 | Reparent existing spans under round              | 0.5d   | Low    | ✅ Done                  |
| 4a.9 | Build verification and testing                   | 1d     | Low    | ✅ Done                  |

**Total Effort**: 9 days

### Spans Produced

| Span Name | Location | Key Attributes (actually set) |
| ---------------------------- | ------------------ | ----------------------------- |
| `consensus.round` | `RCLConsensus.cpp` | `round_id`, `ledger_id`, `ledger.seq`, `mode`, `trace_strategy` |
| `consensus.establish` | `Consensus.h` | `converge_percent`, `establish_count`, `proposers` |
| `consensus.update_positions` | `Consensus.h` | `converge_percent`, `proposers`, `have_close_time_consensus`, `close_time_threshold`, `disputes_count`, `avalanche_threshold` |
| `consensus.check` | `Consensus.h` | `agree/disagree_count`, `converge_percent`, `have_close_time_consensus`, `threshold_percent`, `result` |
| `consensus.mode_change` | `RCLConsensus.cpp` | `mode.old`, `mode.new` |

### Exit Criteria

- [x] Establish phase internals traced (establish, update_positions, check spans)
- [x] Establish phase fully traced — `disputes_count`, `avalanche_threshold`, dispute `yays`/`nays` all implemented
- [x] Cross-node correlation works via deterministic trace_id
- [x] Strategy switchable via config (`deterministic` / `attribute`)
- [x] Consecutive rounds linked via follows-from spans
- [x] Build passes with telemetry ON and OFF
- [x] No impact on consensus timing

See [Phase4_taskList.md](./Phase4_taskList.md) for full task details.

---

## 6.5b Phase 4b: Cross-Node Propagation (Future)

**Objective**: Wire `TraceContextPropagator` for P2P messages (proposals, validations) to enable true distributed tracing between nodes.

**Status**: Design documented, NOT implemented. Protobuf fields (field 1001) and `TraceContextPropagator` free functions exist. Wiring deferred until Phase 4a is validated in a multi-node environment.

**Prerequisites**: Phase 4a complete and validated.

See [Phase4_taskList.md § Phase 4b](./Phase4_taskList.md) for full design.

---

## 6.6 Phase 5: Documentation & Deployment (Week 9)

**Objective**: Production readiness

### Tasks

| Task | Description                   |
| ---- | ----------------------------- |
| 5.1  | Operator runbook              |
| 5.2  | Grafana dashboards            |
| 5.3  | Alert definitions             |
| 5.4  | Collector deployment examples |
| 5.5  | Developer documentation       |
| 5.6  | Training materials            |
| 5.7  | Final integration testing     |

---

## 6.7 Risk Assessment

```mermaid
quadrantChart
    title Risk Assessment Matrix
    x-axis Low Impact --> High Impact
    y-axis Low Likelihood --> High Likelihood
    quadrant-1 Mitigate Immediately
    quadrant-2 Plan Mitigation
    quadrant-3 Accept Risk
    quadrant-4 Monitor Closely
    SDK Compat: [0.2, 0.18]
    Protocol Chg: [0.75, 0.72]
    Perf Overhead: [0.58, 0.42]
    Context Prop: [0.4, 0.55]
    Memory Leaks: [0.85, 0.25]
```

### Risk Details

| Risk                                 | Likelihood | Impact | Mitigation                              |
| ------------------------------------ | ---------- | ------ | --------------------------------------- |
| Protocol changes break compatibility | Medium     | High   | Use high field numbers, optional fields |
| Performance overhead unacceptable    | Medium     | Medium | Sampling, conditional compilation       |
| Context propagation complexity       | Medium     | Medium | Phased rollout, extensive testing       |
| SDK compatibility issues             | Low        | Medium | Pin SDK version, fallback to no-op      |
| Memory leaks in long-running nodes   | Low        | High   | Memory profiling, bounded queues        |

---

## 6.8 Success Metrics

| Metric                   | Target                                                         | Measurement           |
| ------------------------ | -------------------------------------------------------------- | --------------------- |
| Trace coverage           | >95% of transaction code paths (independent of sampling ratio) | Sampling verification |
| CPU overhead             | <3%                                                            | Benchmark tests       |
| Memory overhead          | <10 MB                                                         | Memory profiling      |
| Latency impact (p99)     | <2%                                                            | Performance tests     |
| Trace completeness       | >99% spans with required attrs                                 | Validation script     |
| Cross-node trace linkage | >90% of multi-hop transactions                                 | Integration tests     |

---

## 6.9 Quick Wins and Crawl-Walk-Run Strategy

> **TxQ** = Transaction Queue

This section outlines a prioritized approach to maximize ROI with minimal initial investment.

### 6.9.1 Crawl-Walk-Run Overview