updated docs

Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
updated few things
2026-02-18 21:02:29 +00:00 · 2026-02-18 14:13:15 +00:00 · 2026-02-17 16:30:00 +00:00 · 2026-02-17 12:11:15 +00:00
11 changed files with 5130 additions and 0 deletions
--- a/OpenTelemetryPlan/00-tracing-fundamentals.md
+++ b/OpenTelemetryPlan/00-tracing-fundamentals.md
@@ -0,0 +1,239 @@
+# Distributed Tracing Fundamentals
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Next**: [Architecture Analysis](./01-architecture-analysis.md)
+
+---
+
+## What is Distributed Tracing?
+
+Distributed tracing is a method for following a single request as it moves through a distributed system. In a network like the XRP Ledger, one transaction touches multiple independent nodes that share neither memory nor logs. Distributed tracing links the work done on each node into one record.
+
+**Without tracing:** You see isolated logs on each node with no way to correlate them.
+
+**With tracing:** You see the complete journey of a transaction or an event across all nodes it touched.
+
+---
+
+## Core Concepts
+
+### 1. Trace
+
+A **trace** represents the entire journey of a request through the system. It has a unique `trace_id` that stays constant across all nodes.
+
+```
+Trace ID: abc123
+├── Node A: received transaction
+├── Node B: relayed transaction
+├── Node C: included in consensus
+└── Node D: applied to ledger
+```
+
+### 2. Span
+
+A **span** represents a single unit of work within a trace. Each span has:
+
+| Attribute        | Description           | Example                    |
+| ---------------- | --------------------- | -------------------------- |
+| `trace_id`       | Links to parent trace | `abc123`                   |
+| `span_id`        | Unique identifier     | `span456`                  |
+| `parent_span_id` | Parent span (if any)  | `p_span123`                |
+| `name`           | Operation name        | `rpc.submit`               |
+| `start_time`     | When work began       | `2024-01-15T10:30:00Z`     |
+| `end_time`       | When work completed   | `2024-01-15T10:30:00.050Z` |
+| `attributes`     | Key-value metadata    | `tx.hash=ABC...`           |
+| `status`         | UNSET, OK, or ERROR   | `OK`                       |
+
+### 3. Trace Context
+
+**Trace context** is the data that propagates between services to link spans together. It contains:
+
+- `trace_id` - The trace this span belongs to
+- `span_id` - The current span (becomes parent for child spans)
+- `trace_flags` - Sampling decisions
+
+---
+
+## How Spans Form a Trace
+
+Spans have parent-child relationships forming a tree structure:
+
+```mermaid
+flowchart TB
+    subgraph trace["Trace: abc123"]
+        A["tx.submit<br/>span_id: 001<br/>50ms"] --> B["tx.validate<br/>span_id: 002<br/>5ms"]
+        A --> C["tx.relay<br/>span_id: 003<br/>10ms"]
+        A --> D["tx.apply<br/>span_id: 004<br/>30ms"]
+        D --> E["ledger.update<br/>span_id: 005<br/>20ms"]
+    end
+
+    style A fill:#0d47a1,stroke:#082f6a,color:#ffffff
+    style B fill:#1b5e20,stroke:#0d3d14,color:#ffffff
+    style C fill:#1b5e20,stroke:#0d3d14,color:#ffffff
+    style D fill:#1b5e20,stroke:#0d3d14,color:#ffffff
+    style E fill:#bf360c,stroke:#8c2809,color:#ffffff
+```
+
+The same trace visualized as a **timeline (Gantt chart)**:
+
+```
+Time →   0ms    10ms    20ms    30ms    40ms    50ms
+         ├───────────────────────────────────────────┤
+tx.submit│▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
+         ├─────┤
+tx.valid │▓▓▓▓▓│
+         │     ├──────────┤
+tx.relay │     │▓▓▓▓▓▓▓▓▓▓│
+         │               ├────────────────────────────┤
+tx.apply │               │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
+         │                         ├──────────────────┤
+ledger   │                         │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
+```
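+
+The parent-child links are all a backend needs to rebuild the tree above from a flat list of spans. The following is a minimal sketch of that reconstruction, not any particular backend's implementation; `SpanRecord`, `spanDepths`, and `exampleTrace` are hypothetical names introduced here for illustration.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <map>
+#include <string>
+#include <vector>
+
+// Hypothetical flat span record, as a backend might receive it.
+struct SpanRecord
+{
+    std::string span_id;
+    std::string parent_span_id;  // empty for the root span
+    std::string name;
+};
+
+// Returns the nesting depth of each span (root = 0), keyed by span_id.
+// Assumes every non-empty parent_span_id refers to a span in the input
+// and that the parent links contain no cycles.
+std::map<std::string, std::size_t>
+spanDepths(std::vector<SpanRecord> const& spans)
+{
+    std::map<std::string, std::string> parentOf;
+    for (auto const& s : spans)
+        parentOf[s.span_id] = s.parent_span_id;
+
+    std::map<std::string, std::size_t> depth;
+    for (auto const& s : spans)
+    {
+        std::size_t d = 0;
+        // Walk up the parent chain until the root is reached.
+        for (std::string p = s.parent_span_id; !p.empty(); p = parentOf.at(p))
+        {
+            assert(parentOf.count(p) == 1 && "parent span missing from input");
+            ++d;
+        }
+        depth[s.span_id] = d;
+    }
+    return depth;
+}
+
+// The example trace from the diagram in this section.
+std::vector<SpanRecord> exampleTrace()
+{
+    return {
+        {"001", "", "tx.submit"},
+        {"002", "001", "tx.validate"},
+        {"003", "001", "tx.relay"},
+        {"004", "001", "tx.apply"},
+        {"005", "004", "ledger.update"},
+    };
+}
+```
+
+For the example trace, `tx.submit` sits at depth 0, its three direct children at depth 1, and `ledger.update` at depth 2.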
+
+---
+
+## Distributed Traces Across Nodes
+
+In a distributed system such as a network of rippled nodes, traces span **multiple independent nodes**. The trace context must be propagated in network messages:
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant NodeA as Node A
+    participant NodeB as Node B
+    participant NodeC as Node C
+
+    Client->>NodeA: Submit TX<br/>(no trace context)
+
+    Note over NodeA: Creates new trace<br/>trace_id: abc123<br/>span: tx.receive
+
+    NodeA->>NodeB: Relay TX<br/>(trace_id: abc123, parent: 001)
+
+    Note over NodeB: Creates child span<br/>span: tx.relay<br/>parent_span_id: 001
+
+    NodeA->>NodeC: Relay TX<br/>(trace_id: abc123, parent: 001)
+
+    Note over NodeC: Creates child span<br/>span: tx.relay<br/>parent_span_id: 001
+
+    Note over NodeA,NodeC: All spans share trace_id: abc123<br/>enabling correlation across nodes
+```
+
+---
+
+## Context Propagation
+
+For traces to work across nodes, **trace context must be propagated** in messages.
+
+### What's in the Context (32 bytes)
+
+| Field         | Size       | Description                                             |
+| ------------- | ---------- | ------------------------------------------------------- |
+| `trace_id`    | 16 bytes   | Identifies the entire trace (constant across all nodes) |
+| `span_id`     | 8 bytes    | The sender's current span (becomes parent on receiver)  |
+| `trace_flags` | 4 bytes    | Sampling decision flags                                 |
+| `trace_state` | ~0-4 bytes | Optional vendor-specific data                           |
+
+### How span_id Changes at Each Hop
+
+Only **one** `span_id` travels in the context - the sender's current span. Each node:
+1. Extracts the received `span_id` and uses it as the `parent_span_id`
+2. Creates a **new** `span_id` for its own span
+3. Sends its own `span_id` as the parent when forwarding
+
+```
+Node A                      Node B                      Node C
+──────                      ──────                      ──────
+
+Span AAA                    Span BBB                    Span CCC
+   │                           │                           │
+   ▼                           ▼                           ▼
+Context out:                Context out:                Context out:
+├─ trace_id: abc123         ├─ trace_id: abc123         ├─ trace_id: abc123
+├─ span_id: AAA ──────────► ├─ span_id: BBB ──────────► ├─ span_id: CCC ──────►
+└─ flags: 01                └─ flags: 01                └─ flags: 01
+                               │                           │
+                          parent = AAA               parent = BBB
+```
+
+The `trace_id` stays constant, but `span_id` **changes at every hop** to maintain the parent-child chain.
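+
+The three steps can be sketched in a few lines of C++. This is an illustrative model only, not rippled or OpenTelemetry SDK code: `TraceContext`, `Span`, `receive`, and `forward` are hypothetical names, and the new `span_id` is passed in by the caller where a real tracer would generate it randomly.
+
+```cpp
+#include <cassert>
+#include <string>
+
+// Hypothetical in-memory form of the propagated context.
+struct TraceContext
+{
+    std::string trace_id;  // constant for the whole trace
+    std::string span_id;   // the sender's current span
+    unsigned char flags;   // 0x01 = sampled
+};
+
+// Hypothetical span created on the receiving node.
+struct Span
+{
+    std::string trace_id;
+    std::string span_id;
+    std::string parent_span_id;
+};
+
+// Steps 1 and 2: adopt the incoming span_id as the parent and
+// attach a new span_id for the local span.
+Span receive(TraceContext const& incoming, std::string const& newSpanId)
+{
+    assert(newSpanId != incoming.span_id);
+    return Span{incoming.trace_id, newSpanId, incoming.span_id};
+}
+
+// Step 3: the outgoing context carries this node's own span_id.
+TraceContext forward(Span const& local, unsigned char flags)
+{
+    return TraceContext{local.trace_id, local.span_id, flags};
+}
+```
+
+Chaining `receive` and `forward` across Node B and Node C reproduces the diagram: the span on B has parent `AAA`, the span on C has parent `BBB`, and both keep `trace_id` `abc123`.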
+
+### Propagation Formats
+
+Trace context travels in one of two formats, depending on the transport:
+
+#### HTTP/RPC Headers (W3C Trace Context)
+
+```
+traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
+             │  │                                │                │
+             │  │                                │                └── Flags (01 = sampled)
+             │  │                                └── Parent span ID (16 hex chars)
+             │  └── Trace ID (32 hex chars)
+             └── Version
+```
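+
+A real `traceparent` value uses fixed-width lowercase hex fields (2, 32, 16, and 2 characters), so parsing is mostly fixed-offset slicing. The following is a minimal sketch, assuming only version `00` headers need to be accepted; `TraceParent`, `parseTraceParent`, and `isSampled` are hypothetical helper names, not part of rippled or the OpenTelemetry SDK.
+
+```cpp
+#include <cassert>
+#include <cctype>
+#include <optional>
+#include <string>
+
+struct TraceParent
+{
+    std::string version;    // 2 hex chars
+    std::string trace_id;   // 32 hex chars
+    std::string parent_id;  // 16 hex chars
+    std::string flags;      // 2 hex chars
+};
+
+// True if s consists only of lowercase hex digits.
+static bool isLowerHex(std::string const& s)
+{
+    for (unsigned char c : s)
+        if (!(std::isdigit(c) || (c >= 'a' && c <= 'f')))
+            return false;
+    return true;
+}
+
+// Parses a version-00 traceparent value: "00-<32 hex>-<16 hex>-<2 hex>".
+// Returns std::nullopt on any malformed input.
+std::optional<TraceParent> parseTraceParent(std::string const& header)
+{
+    // 2 + 1 + 32 + 1 + 16 + 1 + 2 = 55 characters.
+    if (header.size() != 55 || header[2] != '-' || header[35] != '-' ||
+        header[52] != '-')
+        return std::nullopt;
+
+    TraceParent tp{header.substr(0, 2), header.substr(3, 32),
+                   header.substr(36, 16), header.substr(53, 2)};
+
+    if (tp.version != "00" || !isLowerHex(tp.trace_id) ||
+        !isLowerHex(tp.parent_id) || !isLowerHex(tp.flags))
+        return std::nullopt;
+
+    // All-zero IDs are invalid per the W3C specification.
+    if (tp.trace_id == std::string(32, '0') ||
+        tp.parent_id == std::string(16, '0'))
+        return std::nullopt;
+
+    return tp;
+}
+
+// True if the "sampled" bit (least significant bit of flags) is set.
+bool isSampled(TraceParent const& tp)
+{
+    assert(tp.flags.size() == 2);
+    return (std::stoi(tp.flags, nullptr, 16) & 0x01) != 0;
+}
+```
+
+Shorthand IDs such as `abc123`, used in the diagrams in this document, would be rejected by this parser; they stand in for full-length values. In production code, the SDK's own W3C propagator would normally be preferred over hand-written parsing.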
+
+#### Protocol Buffers (rippled P2P messages)
+
+```protobuf
+message TMTransaction {
+    bytes rawTransaction = 1;
+    // ... existing fields ...
+
+    // Trace context extension
+    bytes trace_parent = 100;  // W3C traceparent
+    bytes trace_state = 101;   // W3C tracestate
+}
+```
+
+---
+
+## Sampling
+
+Not every trace needs to be recorded. **Sampling** reduces overhead:
+
+### Head Sampling (at trace start)
+```
+Request arrives → Random 10% chance → Record or skip entire trace
+```
+- ✅ Low overhead
+- ❌ May miss interesting traces
+
+### Tail Sampling (after trace completes)
+```
+Trace completes → Collector evaluates:
+                  - Error? → KEEP
+                  - Slow? → KEEP
+                  - Normal? → Sample 10%
+```
+- ✅ Retains error and slow traces regardless of the sample rate
+- ❌ Higher memory usage at collector
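+
+Both strategies can be sketched as plain predicates. This is an illustrative sketch under stated assumptions, not the SDK's or the Collector's sampler: `headSample`, `tailSample`, and `CompletedTrace` are hypothetical names. The head decision here is derived from the `trace_id` instead of a fresh random number, so that every node reaches the same verdict for the same trace.
+
+```cpp
+#include <cassert>
+#include <cstdint>
+#include <string>
+
+// Head sampling: decide once, at trace start, from the trace_id alone.
+// Assumes traceIdHex holds at least 16 hex characters.
+bool headSample(std::string const& traceIdHex, double ratio)
+{
+    assert(traceIdHex.size() >= 16);
+    // Interpret the first 16 hex characters (8 bytes) as an integer.
+    std::uint64_t const v =
+        std::stoull(traceIdHex.substr(0, 16), nullptr, 16);
+    // Keep the top 53 bits so the value maps exactly onto a double in [0, 1).
+    double const unit =
+        static_cast<double>(v >> 11) / 9007199254740992.0;  // 2^53
+    return unit < ratio;
+}
+
+// Hypothetical summary of a finished trace, as a collector might hold it.
+struct CompletedTrace
+{
+    std::string trace_id;
+    bool has_error;
+    double duration_ms;
+};
+
+// Tail sampling: always keep errors and slow traces,
+// then sample the remaining "normal" traces at baselineRatio.
+bool tailSample(
+    CompletedTrace const& t,
+    double slowThresholdMs,
+    double baselineRatio)
+{
+    if (t.has_error)
+        return true;
+    if (t.duration_ms >= slowThresholdMs)
+        return true;
+    return headSample(t.trace_id, baselineRatio);
+}
+```
+
+The tail predicate needs the whole trace before it can run, which is why the collector must buffer spans until the trace completes.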
+
+---
+
+## Key Benefits for rippled
+
+| Challenge                          | How Tracing Helps                        |
+| ---------------------------------- | ---------------------------------------- |
+| "Where is my transaction?"         | Follow trace across all nodes it touched |
+| "Why was consensus slow?"          | See timing breakdown of each phase       |
+| "Which node is the bottleneck?"    | Compare span durations across nodes      |
+| "What happened during the outage?" | Correlate errors across the network      |
+
+---
+
+## Glossary
+
+| Term                | Definition                                                      |
+| ------------------- | --------------------------------------------------------------- |
+| **Trace**           | Complete journey of a request, identified by `trace_id`         |
+| **Span**            | Single operation within a trace                                 |
+| **Context**         | Data propagated between services (`trace_id`, `span_id`, flags) |
+| **Instrumentation** | Code that creates spans and propagates context                  |
+| **Collector**       | Service that receives, processes, and exports traces            |
+| **Backend**         | Storage/visualization system (Jaeger, Tempo, etc.)              |
+| **Head Sampling**   | Sampling decision at trace start                                |
+| **Tail Sampling**   | Sampling decision after trace completes                         |
+
+---
+
+*Next: [Architecture Analysis](./01-architecture-analysis.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/01-architecture-analysis.md
+++ b/OpenTelemetryPlan/01-architecture-analysis.md
@@ -0,0 +1,328 @@
+# Architecture Analysis
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Design Decisions](./02-design-decisions.md) | [Implementation Strategy](./03-implementation-strategy.md)
+
+---
+
+## 1.1 Current rippled Architecture Overview
+
+The rippled node software consists of several interconnected components that need instrumentation for distributed tracing:
+
+```mermaid
+flowchart TB
+    subgraph rippled["rippled Node"]
+        subgraph services["Core Services"]
+            RPC["RPC Server<br/>(HTTP/WS/gRPC)"]
+            Overlay["Overlay<br/>(P2P Network)"]
+            Consensus["Consensus<br/>(RCLConsensus)"]
+        end
+
+        JobQueue["JobQueue<br/>(Thread Pool)"]
+
+        subgraph processing["Processing Layer"]
+            NetworkOPs["NetworkOPs<br/>(Tx Processing)"]
+            LedgerMaster["LedgerMaster<br/>(Ledger Mgmt)"]
+            NodeStore["NodeStore<br/>(Database)"]
+        end
+
+        subgraph observability["Existing Observability"]
+            PerfLog["PerfLog<br/>(JSON)"]
+            Insight["Insight<br/>(StatsD)"]
+            Logging["Logging<br/>(Journal)"]
+        end
+
+        services --> JobQueue
+        JobQueue --> processing
+    end
+
+    style rippled fill:#424242,stroke:#212121,color:#ffffff
+    style services fill:#1565c0,stroke:#0d47a1,color:#ffffff
+    style processing fill:#2e7d32,stroke:#1b5e20,color:#ffffff
+    style observability fill:#e65100,stroke:#bf360c,color:#ffffff
+```
+
+---
+
+## 1.2 Key Components for Instrumentation
+
+| Component         | Location                                   | Purpose                  | Trace Value                  |
+| ----------------- | ------------------------------------------ | ------------------------ | ---------------------------- |
+| **Overlay**       | `src/xrpld/overlay/`                       | P2P communication        | Message propagation timing   |
+| **PeerImp**       | `src/xrpld/overlay/detail/PeerImp.cpp`     | Individual peer handling | Per-peer latency             |
+| **RCLConsensus**  | `src/xrpld/app/consensus/RCLConsensus.cpp` | Consensus algorithm      | Round timing, phase analysis |
+| **NetworkOPs**    | `src/xrpld/app/misc/NetworkOPs.cpp`        | Transaction processing   | Tx lifecycle tracking        |
+| **ServerHandler** | `src/xrpld/rpc/detail/ServerHandler.cpp`   | RPC entry point          | Request latency              |
+| **RPCHandler**    | `src/xrpld/rpc/detail/RPCHandler.cpp`      | Command execution        | Per-command timing           |
+| **JobQueue**      | `src/xrpl/core/JobQueue.h`                 | Async task execution     | Queue wait times             |
+
+---
+
+## 1.3 Transaction Flow Diagram
+
+Transaction flow spans multiple nodes in the network. Each node creates linked spans to form a distributed trace:
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant PeerA as Peer A (Receive)
+    participant PeerB as Peer B (Relay)
+    participant PeerC as Peer C (Validate)
+
+    Client->>PeerA: 1. Submit TX
+
+    rect rgb(230, 245, 255)
+        Note over PeerA: tx.receive SPAN START
+        PeerA->>PeerA: HashRouter Deduplication
+        PeerA->>PeerA: tx.validate (child span)
+    end
+
+    PeerA->>PeerB: 2. Relay TX (with trace ctx)
+
+    rect rgb(230, 245, 255)
+        Note over PeerB: tx.receive (linked span)
+    end
+
+    PeerB->>PeerC: 3. Relay TX
+
+    rect rgb(230, 245, 255)
+        Note over PeerC: tx.receive (linked span)
+        PeerC->>PeerC: tx.process
+    end
+
+    Note over Client,PeerC: DISTRIBUTED TRACE (same trace_id: abc123)
+```
+
+### Trace Structure
+
+```
+trace_id: abc123
+├── span: tx.receive (Peer A)
+│   ├── span: tx.validate
+│   └── span: tx.relay
+├── span: tx.receive (Peer B) [parent: Peer A]
+│   └── span: tx.relay
+└── span: tx.receive (Peer C) [parent: Peer B]
+    └── span: tx.process
+```
+
+---
+
+## 1.4 Consensus Round Flow
+
+Consensus rounds are multi-phase operations that benefit significantly from tracing:
+
+```mermaid
+flowchart TB
+    subgraph round["consensus.round (root span)"]
+        attrs["Attributes:<br/>xrpl.consensus.ledger.seq = 12345678<br/>xrpl.consensus.mode = proposing<br/>xrpl.consensus.proposers = 35"]
+
+        subgraph open["consensus.phase.open"]
+            open_desc["Duration: ~3s<br/>Waiting for transactions"]
+        end
+
+        subgraph establish["consensus.phase.establish"]
+            est_attrs["proposals_received = 28<br/>disputes_resolved = 3"]
+            est_children["├── consensus.proposal.receive (×28)<br/>├── consensus.proposal.send (×1)<br/>└── consensus.dispute.resolve (×3)"]
+        end
+
+        subgraph accept["consensus.phase.accept"]
+            acc_attrs["transactions_applied = 150<br/>ledger.hash = DEF456..."]
+            acc_children["├── ledger.build<br/>└── ledger.validate"]
+        end
+
+        attrs --> open
+        open --> establish
+        establish --> accept
+    end
+
+    style round fill:#f57f17,stroke:#e65100,color:#ffffff
+    style open fill:#1565c0,stroke:#0d47a1,color:#ffffff
+    style establish fill:#2e7d32,stroke:#1b5e20,color:#ffffff
+    style accept fill:#c2185b,stroke:#880e4f,color:#ffffff
+```
+
+---
+
+## 1.5 RPC Request Flow
+
+RPC requests support W3C Trace Context headers for distributed tracing across services:
+
+```mermaid
+flowchart TB
+    subgraph request["rpc.request (root span)"]
+        http["HTTP Request<br/>POST /<br/>traceparent: 00-abc123...-def456...-01"]
+
+        attrs["Attributes:<br/>http.method = POST<br/>net.peer.ip = 192.168.1.100<br/>xrpl.rpc.command = submit"]
+
+        subgraph enqueue["job.enqueue"]
+            job_attr["xrpl.job.type = jtCLIENT_RPC"]
+        end
+
+        subgraph command["rpc.command.submit"]
+            cmd_attrs["xrpl.rpc.version = 2<br/>xrpl.rpc.role = user"]
+            cmd_children["├── tx.deserialize<br/>├── tx.validate_local<br/>└── tx.submit_to_network"]
+        end
+
+        response["Response: 200 OK<br/>Duration: 45ms"]
+
+        http --> attrs
+        attrs --> enqueue
+        enqueue --> command
+        command --> response
+    end
+
+    style request fill:#2e7d32,stroke:#1b5e20,color:#ffffff
+    style enqueue fill:#1565c0,stroke:#0d47a1,color:#ffffff
+    style command fill:#e65100,stroke:#bf360c,color:#ffffff
+```
+
+---
+
+## 1.6 Key Trace Points
+
+The following table identifies priority instrumentation points across the codebase:
+
+| Category        | Span Name              | File                 | Method                 | Priority |
+| --------------- | ---------------------- | -------------------- | ---------------------- | -------- |
+| **Transaction** | `tx.receive`           | `PeerImp.cpp`        | `handleTransaction()`  | High     |
+| **Transaction** | `tx.validate`          | `NetworkOPs.cpp`     | `processTransaction()` | High     |
+| **Transaction** | `tx.process`           | `NetworkOPs.cpp`     | `doTransactionSync()`  | High     |
+| **Transaction** | `tx.relay`             | `OverlayImpl.cpp`    | `relay()`              | Medium   |
+| **Consensus**   | `consensus.round`      | `RCLConsensus.cpp`   | `startRound()`         | High     |
+| **Consensus**   | `consensus.phase.*`    | `Consensus.h`        | `timerEntry()`         | High     |
+| **Consensus**   | `consensus.proposal.*` | `RCLConsensus.cpp`   | `peerProposal()`       | Medium   |
+| **RPC**         | `rpc.request`          | `ServerHandler.cpp`  | `onRequest()`          | High     |
+| **RPC**         | `rpc.command.*`        | `RPCHandler.cpp`     | `doCommand()`          | High     |
+| **Peer**        | `peer.connect`         | `OverlayImpl.cpp`    | `onHandoff()`          | Low      |
+| **Peer**        | `peer.message.*`       | `PeerImp.cpp`        | `onMessage()`          | Low      |
+| **Ledger**      | `ledger.acquire`       | `InboundLedgers.cpp` | `acquire()`            | Medium   |
+| **Ledger**      | `ledger.build`         | `RCLConsensus.cpp`   | `buildLCL()`           | High     |
+
+---
+
+## 1.7 Instrumentation Priority
+
+```mermaid
+quadrantChart
+    title Instrumentation Priority Matrix
+    x-axis Low Complexity --> High Complexity
+    y-axis Low Value --> High Value
+    quadrant-1 Plan Carefully
+    quadrant-2 Implement First
+    quadrant-3 Quick Wins
+    quadrant-4 Consider Later
+
+    RPC Tracing: [0.3, 0.85]
+    Transaction Tracing: [0.65, 0.92]
+    Consensus Tracing: [0.75, 0.87]
+    Peer Message Tracing: [0.4, 0.3]
+    Ledger Acquisition: [0.5, 0.6]
+    JobQueue Tracing: [0.35, 0.5]
+```
+
+---
+
+## 1.8 Observable Outcomes
+
+After implementing OpenTelemetry, operators and developers will gain visibility into the following:
+
+### 1.8.1 What You Will See: Traces
+
+| Trace Type                 | Description                                                                                 | Example TraceQL Query (Grafana Tempo)                                |
+| -------------------------- | ------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- |
+| **Transaction Lifecycle**  | Full journey from RPC submission through validation, relay, consensus, and ledger inclusion | `{resource.service.name="rippled" && span.xrpl.tx.hash="ABC123..."}` |
+| **Cross-Node Propagation** | Transaction path across multiple rippled nodes with timing                                  | `{span.xrpl.tx.relay_count > 0}`                                     |
+| **Consensus Rounds**       | Complete round with all phases (open, establish, accept)                                    | `{name=~"consensus.round.*"}`                                        |
+| **RPC Request Processing** | Individual command execution with timing breakdown                                          | `{span.xrpl.rpc.command="account_info"}`                             |
+| **Ledger Acquisition**     | Peer-to-peer ledger data requests and responses                                             | `{name="ledger.acquire"}`                                            |
+
+### 1.8.2 What You Will See: Metrics (Derived from Traces)
+
+| Metric                        | Description                            | Dashboard Panel             |
+| ----------------------------- | -------------------------------------- | --------------------------- |
+| **RPC Latency (p50/p95/p99)** | Response time distribution per command | Heatmap by command          |
+| **Transaction Throughput**    | Transactions processed per second      | Time series graph           |
+| **Consensus Round Duration**  | Time to complete consensus phases      | Histogram                   |
+| **Cross-Node Latency**        | Time for transaction to reach N nodes  | Line chart with percentiles |
+| **Error Rate**                | Failed transactions/RPC calls by type  | Stacked bar chart           |
+
+### 1.8.3 Concrete Dashboard Examples
+
+**Transaction Trace View (Jaeger/Tempo):**
+```
+┌────────────────────────────────────────────────────────────────────────────────┐
+│ Trace: abc123... (Transaction Submission)                    Duration: 847ms   │
+├────────────────────────────────────────────────────────────────────────────────┤
+│ ├── rpc.request [ServerHandler]                              ████░░░░░░  45ms  │
+│ │   └── rpc.command.submit [RPCHandler]                      ████░░░░░░  42ms  │
+│ │       └── tx.receive [NetworkOPs]                          ███░░░░░░░  35ms  │
+│ │           ├── tx.validate [TxQ]                            █░░░░░░░░░   8ms  │
+│ │           └── tx.relay [Overlay]                           ██░░░░░░░░  15ms  │
+│ │               ├── tx.receive [Node-B]                      █████░░░░░  52ms  │
+│ │               │   └── tx.relay [Node-B]                    ██░░░░░░░░  18ms  │
+│ │               └── tx.receive [Node-C]                      ██████░░░░  65ms  │
+│ └── consensus.round [RCLConsensus]                           ████████░░ 720ms  │
+│     ├── consensus.phase.open                                 ██░░░░░░░░ 180ms  │
+│     ├── consensus.phase.establish                            █████░░░░░ 480ms  │
+│     └── consensus.phase.accept                               █░░░░░░░░░  60ms  │
+└────────────────────────────────────────────────────────────────────────────────┘
+```
+
+**RPC Performance Dashboard Panel:**
+```
+┌─────────────────────────────────────────────────────────────┐
+│ RPC Command Latency (Last 1 Hour)                           │
+├─────────────────────────────────────────────────────────────┤
+│ Command          │ p50    │ p95    │ p99    │ Errors │ Rate │
+│──────────────────┼────────┼────────┼────────┼────────┼──────│
+│ account_info     │  12ms  │  45ms  │  89ms  │  0.1%  │ 150/s│
+│ submit           │  35ms  │ 120ms  │ 250ms  │  2.3%  │  45/s│
+│ ledger           │   8ms  │  25ms  │  55ms  │  0.0%  │  80/s│
+│ tx               │  15ms  │  50ms  │ 100ms  │  0.5%  │  60/s│
+│ server_info      │   5ms  │  12ms  │  20ms  │  0.0%  │ 200/s│
+└─────────────────────────────────────────────────────────────┘
+```
+
+**Consensus Health Dashboard Panel:**
+
+```mermaid
+---
+config:
+    xyChart:
+        width: 1200
+        height: 400
+        plotReservedSpacePercent: 50
+        chartOrientation: vertical
+    themeVariables:
+        xyChart:
+            plotColorPalette: "#3498db"
+---
+xychart-beta
+    title "Consensus Round Duration (Last 24 Hours)"
+    x-axis "Time of Day (Hours)" 0 --> 24
+    y-axis "Duration (seconds)" 1 --> 5
+    line [2.1, 2.3, 2.5, 2.4, 2.8, 1.6, 3.2, 3.0, 3.5, 1.3, 3.8, 3.6, 4.0, 3.2, 4.3, 4.1, 4.5, 4.3, 4.2, 2.4, 4.8, 4.6, 4.9, 4.7, 5.0, 4.9, 4.8, 2.6, 4.7, 4.5, 4.2, 4.0, 2.5, 3.7, 3.2, 3.4, 2.9, 3.1, 2.6, 2.8, 2.3, 1.5, 2.7, 2.4, 2.5, 2.3, 2.2, 2.1, 2.0]
+```
+
+### 1.8.4 Operator Actionable Insights
+
+| Scenario              | What You'll See                                                              | Action                           |
+| --------------------- | ---------------------------------------------------------------------------- | -------------------------------- |
+| **Slow RPC**          | Span showing which phase is slow (parsing, execution, serialization)         | Optimize specific code path      |
+| **Transaction Stuck** | Trace stops at validation; error attribute shows reason                      | Fix transaction parameters       |
+| **Consensus Delay**   | `consensus.phase.establish` is slow; `proposers` shows missing validators    | Investigate network connectivity |
+| **Memory Spike**      | Large batch of spans correlating with memory increase                        | Tune batch_size or sampling      |
+| **Network Partition** | Traces missing cross-node links for specific peer                            | Check peer connectivity          |
+
+### 1.8.5 Developer Debugging Workflow
+
+1. **Find Transaction**: Query by `xrpl.tx.hash` to get full trace
+2. **Identify Bottleneck**: Look at span durations to find slowest component
+3. **Check Attributes**: Review `xrpl.tx.validity`, `xrpl.rpc.status` for errors
+4. **Correlate Logs**: Use `trace_id` to find related PerfLog entries
+5. **Compare Nodes**: Filter by `service.instance.id` to compare behavior across nodes
+
+---
+
+*Next: [Design Decisions](./02-design-decisions.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/02-design-decisions.md
+++ b/OpenTelemetryPlan/02-design-decisions.md
@@ -0,0 +1,485 @@
+# Design Decisions
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Architecture Analysis](./01-architecture-analysis.md) | [Code Samples](./04-code-samples.md)
+
+---
+
+## 2.1 OpenTelemetry Components
+
+### 2.1.1 SDK Selection
+
+**Primary Choice**: OpenTelemetry C++ SDK (`opentelemetry-cpp`)
+
+| Component                               | Purpose                | Required    |
+| --------------------------------------- | ---------------------- | ----------- |
+| `opentelemetry-cpp::api`                | Tracing API headers    | Yes         |
+| `opentelemetry-cpp::sdk`                | SDK implementation     | Yes         |
+| `opentelemetry-cpp::ext`                | Extensions (exporters) | Yes         |
+| `opentelemetry-cpp::otlp_grpc_exporter` | OTLP/gRPC export       | Recommended |
+| `opentelemetry-cpp::otlp_http_exporter` | OTLP/HTTP export       | Alternative |
+
+### 2.1.2 Instrumentation Strategy
+
+**Manual Instrumentation** (recommended):
+
+| Approach   | Pros                                                              | Cons                                                    |
+| ---------- | ----------------------------------------------------------------- | ------------------------------------------------------- |
+| **Manual** | Precise control, optimized placement, rippled-specific attributes | More development effort                                 |
+| **Auto**   | Less code, automatic coverage                                     | Less control, potential overhead, limited customization |
+
+---
+
+## 2.2 Exporter Configuration
+
+```mermaid
+flowchart TB
+    subgraph nodes["rippled Nodes"]
+        node1["rippled<br/>Node 1"]
+        node2["rippled<br/>Node 2"]
+        node3["rippled<br/>Node 3"]
+    end
+
+    collector["OpenTelemetry<br/>Collector<br/>(sidecar or standalone)"]
+
+    subgraph backends["Observability Backends"]
+        jaeger["Jaeger<br/>(Dev)"]
+        tempo["Tempo<br/>(Prod)"]
+        elastic["Elastic<br/>APM"]
+    end
+
+    node1 -->|"OTLP/gRPC<br/>:4317"| collector
+    node2 -->|"OTLP/gRPC<br/>:4317"| collector
+    node3 -->|"OTLP/gRPC<br/>:4317"| collector
+
+    collector --> jaeger
+    collector --> tempo
+    collector --> elastic
+
+    style nodes fill:#0d47a1,stroke:#082f6a,color:#ffffff
+    style backends fill:#1b5e20,stroke:#0d3d14,color:#ffffff
+    style collector fill:#bf360c,stroke:#8c2809,color:#ffffff
+```
+
+### 2.2.1 OTLP/gRPC (Recommended)
+
+```cpp
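+// Configuration for OTLP over gRPC
+#include "opentelemetry/exporters/otlp/otlp_grpc_exporter_options.h"
+
+namespace otlp = opentelemetry::exporter::otlp;
+
+otlp::OtlpGrpcExporterOptions opts;
+opts.endpoint = "localhost:4317";                      // Collector's OTLP/gRPC port
+opts.use_ssl_credentials = true;                       // Use TLS for the connection
+opts.ssl_credentials_cacert_path = "/path/to/ca.crt";  // CA certificate used to verify the collector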
+```
+
+### 2.2.2 OTLP/HTTP (Alternative)
+
+```cpp
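+// Configuration for OTLP over HTTP
+#include "opentelemetry/exporters/otlp/otlp_http_exporter_options.h"
+
+namespace otlp = opentelemetry::exporter::otlp;
+
+otlp::OtlpHttpExporterOptions opts;
+opts.url = "http://localhost:4318/v1/traces";             // Collector's OTLP/HTTP traces endpoint
+opts.content_type = otlp::HttpRequestContentType::kJson;  // or kBinary (protobuf encoding)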
+```
+
+---
+
+## 2.3 Span Naming Conventions
+
+### 2.3.1 Naming Schema
+
+```
+<component>.<operation>[.<sub-operation>]
+```
+
+**Examples**:
+- `tx.receive` - Transaction received from peer
+- `consensus.phase.establish` - Consensus establish phase
+- `rpc.command.server_info` - server_info RPC command
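+
+A naming schema is easier to keep consistent when it can be checked mechanically. The sketch below validates a span name against the schema above; `splitSegments` and `isValidSpanName` are hypothetical helpers, the component list is taken from the span catalog in this document, and the restriction to lowercase letters, digits, and underscores is an assumption inferred from the examples.
+
+```cpp
+#include <array>
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Splits on '.', keeping empty segments so that names such as
+// "tx..relay" or "tx.relay." can be rejected by the caller.
+std::vector<std::string> splitSegments(std::string const& name)
+{
+    std::vector<std::string> parts;
+    std::size_t start = 0;
+    for (;;)
+    {
+        std::size_t const dot = name.find('.', start);
+        if (dot == std::string::npos)
+        {
+            parts.push_back(name.substr(start));
+            return parts;
+        }
+        parts.push_back(name.substr(start, dot - start));
+        start = dot + 1;
+    }
+}
+
+// Checks <component>.<operation>[.<sub-operation>].
+bool isValidSpanName(std::string const& name)
+{
+    // Top-level components from the span catalog.
+    static std::array<char const*, 6> const components = {
+        "tx", "consensus", "rpc", "peer", "ledger", "job"};
+
+    auto const parts = splitSegments(name);
+    if (parts.size() < 2 || parts.size() > 3)
+        return false;
+
+    bool known = false;
+    for (auto const* c : components)
+        known = known || parts[0] == c;
+    if (!known)
+        return false;
+
+    for (auto const& part : parts)
+    {
+        if (part.empty())
+            return false;
+        // Assumed character set: lowercase letters, digits, underscore.
+        for (char ch : part)
+            if (!((ch >= 'a' && ch <= 'z') || (ch >= '0' && ch <= '9') ||
+                  ch == '_'))
+                return false;
+    }
+    return true;
+}
+```
+
+A check like this could run in a unit test over every span name used in the codebase, catching typos and unknown components before they reach a dashboard query.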
+
+### 2.3.2 Complete Span Catalog
+
+```yaml
+# Transaction Spans
+tx:
+  receive:     "Transaction received from network"
+  validate:    "Transaction signature/format validation"
+  process:     "Full transaction processing"
+  relay:       "Transaction relay to peers"
+  apply:       "Apply transaction to ledger"
+
+# Consensus Spans
+consensus:
+  round:       "Complete consensus round"
+  phase:
+    open:      "Open phase - collecting transactions"
+    establish: "Establish phase - reaching agreement"
+    accept:    "Accept phase - applying consensus"
+  proposal:
+    receive:   "Receive peer proposal"
+    send:      "Send our proposal"
+  validation:
+    receive:   "Receive peer validation"
+    send:      "Send our validation"
+
+# RPC Spans
+rpc:
+  request:     "HTTP/WebSocket request handling"
+  command:
+    "*":       "Specific RPC command (dynamic)"
+
+# Peer Spans
+peer:
+  connect:     "Peer connection establishment"
+  disconnect:  "Peer disconnection"
+  message:
+    send:      "Send protocol message"
+    receive:   "Receive protocol message"
+
+# Ledger Spans
+ledger:
+  acquire:     "Ledger acquisition from network"
+  build:       "Build new ledger"
+  validate:    "Ledger validation"
+  close:       "Close ledger"
+
+# Job Spans
+job:
+  enqueue:     "Job added to queue"
+  execute:     "Job execution"
+```
+
+---
+
+## 2.4 Attribute Schema
+
+### 2.4.1 Resource Attributes (Set Once at Startup)
+
+```cpp
+// Standard OpenTelemetry semantic conventions
+resource::SemanticConventions::SERVICE_NAME        = "rippled"
+resource::SemanticConventions::SERVICE_VERSION     = BuildInfo::getVersionString()
+resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
+
+// Custom rippled attributes
+"xrpl.network.id"      = <network_id>           // e.g., 0 for mainnet
+"xrpl.network.type"    = "mainnet" | "testnet" | "devnet" | "standalone"
+"xrpl.node.type"       = "validator" | "stock" | "reporting"
+"xrpl.node.cluster"    = <cluster_name>         // If clustered
+```
+
+### 2.4.2 Span Attributes by Category
+
+#### Transaction Attributes
+```cpp
+"xrpl.tx.hash"         = string   // Transaction hash (hex)
+"xrpl.tx.type"         = string   // "Payment", "OfferCreate", etc.
+"xrpl.tx.account"      = string   // Source account (redacted in prod)
+"xrpl.tx.sequence"     = int64    // Account sequence number
+"xrpl.tx.fee"          = int64    // Fee in drops
+"xrpl.tx.result"       = string   // "tesSUCCESS", "tecPATH_DRY", etc.
+"xrpl.tx.ledger_index" = int64    // Ledger containing transaction
+```
+
+#### Consensus Attributes
+```cpp
+"xrpl.consensus.round"          = int64    // Round number
+"xrpl.consensus.phase"          = string   // "open", "establish", "accept"
+"xrpl.consensus.mode"           = string   // "proposing", "observing", etc.
+"xrpl.consensus.proposers"      = int64    // Number of proposers
+"xrpl.consensus.ledger.prev"    = string   // Previous ledger hash
+"xrpl.consensus.ledger.seq"     = int64    // Ledger sequence
+"xrpl.consensus.tx_count"       = int64    // Transactions in consensus set
+"xrpl.consensus.duration_ms"    = float64  // Round duration
+```
+
+#### RPC Attributes
+```cpp
+"xrpl.rpc.command"     = string   // Command name
+"xrpl.rpc.version"     = int64    // API version
+"xrpl.rpc.role"        = string   // "admin" or "user"
+"xrpl.rpc.params"      = string   // Sanitized parameters (optional)
+```
+
+#### Peer & Message Attributes
+```cpp
+"xrpl.peer.id"            = string   // Peer public key (base58)
+"xrpl.peer.address"       = string   // IP:port
+"xrpl.peer.latency_ms"    = float64  // Measured latency
+"xrpl.peer.cluster"       = string   // Cluster name if clustered
+"xrpl.message.type"       = string   // Protocol message type name
+"xrpl.message.size_bytes" = int64    // Message size
+"xrpl.message.compressed" = bool     // Whether compressed
+```
+
+#### Ledger & Job Attributes
+```cpp
+"xrpl.ledger.hash"       = string   // Ledger hash
+"xrpl.ledger.index"      = int64    // Ledger sequence/index
+"xrpl.ledger.close_time" = int64    // Close time (epoch)
+"xrpl.ledger.tx_count"   = int64    // Transaction count
+"xrpl.job.type"          = string   // Job type name
+"xrpl.job.queue_ms"      = float64  // Time spent in queue
+"xrpl.job.worker"        = int64    // Worker thread ID
+```
+
+### 2.4.3 Data Collection Summary
+
+The following table summarizes what data is collected by category:
+
+| Category        | Attributes Collected                                                       | Purpose                     |
+| --------------- | -------------------------------------------------------------------------- | --------------------------- |
+| **Transaction** | `tx.hash`, `tx.type`, `tx.result`, `tx.fee`, `ledger_index`               | Trace transaction lifecycle |
+| **Consensus**   | `round`, `phase`, `mode`, `proposers` (count), `duration_ms`              | Analyze consensus timing    |
+| **RPC**         | `command`, `version`, `status`, `duration_ms`                             | Monitor RPC performance     |
+| **Peer**        | `peer.id` (public key), `latency_ms`, `message.type`, `message.size_bytes` | Network topology analysis   |
+| **Ledger**      | `ledger.hash`, `ledger.index`, `close_time`, `tx_count`                   | Ledger progression tracking |
+| **Job**         | `job.type`, `queue_ms`, `worker`                                           | JobQueue performance        |
+
+### 2.4.4 Privacy & Sensitive Data Policy
+
+OpenTelemetry instrumentation is designed to collect **operational metadata only**, never sensitive content.
+
+#### Data NOT Collected
+
+The following data is explicitly **excluded** from telemetry collection:
+
+| Excluded Data           | Reason                                    |
+| ----------------------- | ----------------------------------------- |
+| **Private Keys**        | Never exposed; not relevant to tracing    |
+| **Account Balances**    | Financial data; privacy sensitive         |
+| **Transaction Amounts** | Financial data; privacy sensitive         |
+| **Raw TX Payloads**     | May contain sensitive memo/data fields    |
+| **Personal Data**       | No PII collected                          |
+| **IP Addresses**        | Configurable; excluded by default in prod |
+
+#### Privacy Protection Mechanisms
+
+| Mechanism                     | Description                                                               |
+| ----------------------------- | ------------------------------------------------------------------------- |
+| **Account Hashing**           | `xrpl.tx.account` is hashed at collector level before storage             |
+| **Configurable Redaction**    | Sensitive fields can be excluded via `[telemetry]` config section         |
+| **Sampling**                  | Only 10% of traces recorded by default, reducing data exposure            |
+| **Local Control**             | Node operators have full control over what gets exported                  |
+| **No Raw Payloads**           | Transaction content is never recorded, only metadata (hash, type, result) |
+| **Collector-Level Filtering** | Additional redaction/hashing can be configured at OTel Collector          |
+
+#### Collector-Level Data Protection
+
+The OpenTelemetry Collector can be configured to hash or redact sensitive attributes before export:
+
+```yaml
+processors:
+  attributes:
+    actions:
+      # Hash account addresses before storage
+      - key: xrpl.tx.account
+        action: hash
+      # Remove IP addresses entirely
+      - key: xrpl.peer.address
+        action: delete
+      # Redact specific fields
+      - key: xrpl.rpc.params
+        action: delete
+```
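+
+A processor only takes effect once a pipeline references it. A minimal sketch of that wiring, assuming an OTLP receiver and an OTLP exporter; the exporter endpoint is a placeholder, not a value from this plan:
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+
+exporters:
+  otlp:
+    endpoint: tracing-backend.example:4317  # placeholder backend address
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [attributes]  # the attributes processor defined above
+      exporters: [otlp]
+```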
+
+#### Configuration Options for Privacy
+
+In `rippled.cfg`, operators can control data collection granularity:
+
+```ini
+[telemetry]
+enabled=1
+
+# Enable (1) or disable (0) tracing per component
+trace_transactions=1
+trace_consensus=1
+trace_rpc=1
+trace_peer=0          # Disable peer tracing (high volume, includes addresses)
+
+# Redact specific attributes
+redact_account=1      # Hash account addresses before export
+redact_peer_address=1 # Remove peer IP addresses
+```
+
+> **Key Principle**: Telemetry collects **operational metadata** (timing, counts, hashes) — never **sensitive content** (keys, balances, amounts, raw payloads).
+
+---
+
+## 2.5 Context Propagation Design
+
+### 2.5.1 Propagation Boundaries
+
+```mermaid
+flowchart TB
+    subgraph http["HTTP/WebSocket (RPC)"]
+        w3c["W3C Trace Context Headers:<br/>traceparent: 00-{trace_id}-{span_id}-{flags}<br/>tracestate: rippled={state}"]
+    end
+
+    subgraph protobuf["Protocol Buffers (P2P)"]
+        proto["message TraceContext {<br/>  bytes trace_id = 1;  // 16 bytes<br/>  bytes span_id = 2;   // 8 bytes<br/>  uint32 trace_flags = 3;<br/>  string trace_state = 4;<br/>}"]
+    end
+
+    subgraph jobqueue["JobQueue (Internal Async)"]
+        job["Context captured at job creation,<br/>restored at execution<br/><br/>class Job {<br/>  opentelemetry::context::Context traceContext_;<br/>};"]
+    end
+
+    style http fill:#0d47a1,stroke:#082f6a,color:#ffffff
+    style protobuf fill:#1b5e20,stroke:#0d3d14,color:#ffffff
+    style jobqueue fill:#bf360c,stroke:#8c2809,color:#ffffff
+```
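+
+To make the `traceparent` layout concrete, here is a minimal stand-alone sketch that formats and parses the header. `TraceParent`, `formatTraceParent`, and `parseTraceParent` are hypothetical names used only for illustration; a real integration would more likely rely on the OpenTelemetry SDK's W3C propagator. The sketch accepts only version `00`.
+
+```cpp
+#include <cctype>
+#include <optional>
+#include <string>
+
+// Hypothetical helper type for this sketch; not part of rippled or OpenTelemetry.
+struct TraceParent
+{
+    std::string traceId;  // 32 lowercase hex characters (16 bytes)
+    std::string spanId;   // 16 lowercase hex characters (8 bytes)
+    std::string flags;    // 2 hex characters; "01" means sampled
+};
+
+// True if s is non-empty and contains only lowercase hex digits.
+inline bool isLowerHex(std::string const& s)
+{
+    if (s.empty())
+        return false;
+    for (unsigned char c : s)
+        if (!std::isdigit(c) && !(c >= 'a' && c <= 'f'))
+            return false;
+    return true;
+}
+
+// Builds "00-{trace_id}-{span_id}-{flags}".
+inline std::string formatTraceParent(TraceParent const& tp)
+{
+    return "00-" + tp.traceId + "-" + tp.spanId + "-" + tp.flags;
+}
+
+// Parses a version-00 header; returns std::nullopt on any malformed input.
+inline std::optional<TraceParent> parseTraceParent(std::string const& header)
+{
+    // version(2) '-' trace_id(32) '-' span_id(16) '-' flags(2) = 55 characters
+    if (header.size() != 55 || header[2] != '-' || header[35] != '-' ||
+        header[52] != '-')
+        return std::nullopt;
+    if (header.compare(0, 2, "00") != 0)
+        return std::nullopt;
+
+    TraceParent tp{
+        header.substr(3, 32), header.substr(36, 16), header.substr(53, 2)};
+    if (!isLowerHex(tp.traceId) || !isLowerHex(tp.spanId) ||
+        !isLowerHex(tp.flags))
+        return std::nullopt;
+
+    // All-zero trace or span IDs are invalid in W3C Trace Context.
+    if (tp.traceId.find_first_not_of('0') == std::string::npos ||
+        tp.spanId.find_first_not_of('0') == std::string::npos)
+        return std::nullopt;
+    return tp;
+}
+```
+
+The same three values (trace ID, span ID, flags) are what the protobuf `TraceContext` message carries in binary form on the P2P path.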
+
+---
+
+## 2.6 Integration with Existing Observability
+
+### 2.6.1 Existing Frameworks Comparison
+
+rippled already has two observability mechanisms. OpenTelemetry complements them rather than replacing them:
+
+| Aspect                | PerfLog                       | Beast Insight (StatsD)       | OpenTelemetry             |
+| --------------------- | ----------------------------- | ---------------------------- | ------------------------- |
+| **Type**              | Logging                       | Metrics                      | Distributed Tracing       |
+| **Data**              | JSON log entries              | Counters, gauges, histograms | Spans with context        |
+| **Scope**             | Single node                   | Single node                  | **Cross-node**            |
+| **Output**            | `perf.log` file               | StatsD server                | OTLP Collector            |
+| **Question answered** | "What happened on this node?" | "How many? How fast?"        | "What was the journey?"   |
+| **Correlation**       | By timestamp                  | By metric name               | By `trace_id`             |
+| **Overhead**          | Low (file I/O)                | Low (UDP packets)            | Low-Medium (configurable) |
+
+### 2.6.2 What Each Framework Does Best
+
+#### PerfLog
+- **Purpose**: Detailed local event logging for RPC and job execution
+- **Strengths**:
+  - Rich JSON output with timing data
+  - Already integrated in RPC handlers
+  - File-based, no external dependencies
+- **Limitations**:
+  - Single-node only (no cross-node correlation)
+  - No parent-child relationships between events
+  - Manual log parsing required
+
+Example PerfLog entry:
+
+```json
+{
+  "time": "2024-01-15T10:30:00.123Z",
+  "method": "submit",
+  "duration_us": 1523,
+  "result": "tesSUCCESS"
+}
+```
+
+#### Beast Insight (StatsD)
+- **Purpose**: Real-time metrics for monitoring dashboards
+- **Strengths**:
+  - Aggregated metrics (counters, gauges, histograms)
+  - Low overhead (UDP, fire-and-forget)
+  - Good for alerting thresholds
+- **Limitations**:
+  - No request-level detail
+  - No causal relationships
+  - Single-node perspective
+
+```cpp
+// Example StatsD usage in rippled
+insight.increment("rpc.submit.count");
+insight.gauge("ledger.age", age);
+insight.timing("consensus.round", duration);
+```
+
+#### OpenTelemetry (NEW)
+- **Purpose**: Distributed request tracing across nodes
+- **Strengths**:
+  - **Cross-node correlation** via `trace_id`
+  - Parent-child span relationships
+  - Rich attributes per span
+  - Industry standard (CNCF)
+- **Limitations**:
+  - Requires collector infrastructure
+  - Higher complexity than logging
+
+```cpp
+// Example OpenTelemetry span
+auto span = telemetry.startSpan("tx.relay");
+span->SetAttribute("xrpl.tx.hash", hash);
+span->SetAttribute("xrpl.peer.id", peerId);
+// Span automatically linked to parent via context
+```
+
+### 2.6.3 When to Use Each
+
+| Scenario                                | PerfLog   | StatsD | OpenTelemetry |
+| --------------------------------------- | --------- | ------ | ------------- |
+| "How many TXs per second?"              | ❌         | ✅      | ❌             |
+| "What's the p99 RPC latency?"           | ❌         | ✅      | ✅             |
+| "Why was this specific TX slow?"        | ⚠️ partial | ❌      | ✅             |
+| "Which node delayed consensus?"         | ❌         | ❌      | ✅             |
+| "What happened on node X at time T?"    | ✅         | ❌      | ✅             |
+| "Show me the TX journey across 5 nodes" | ❌         | ❌      | ✅             |
+
+### 2.6.4 Coexistence Strategy
+
+```mermaid
+flowchart TB
+    subgraph rippled["rippled Process"]
+        perflog["PerfLog<br/>(JSON to file)"]
+        insight["Beast Insight<br/>(StatsD)"]
+        otel["OpenTelemetry<br/>(Tracing)"]
+    end
+
+    perflog --> perffile["perf.log"]
+    insight --> statsd["StatsD Server"]
+    otel --> collector["OTLP Collector"]
+
+    perffile --> grafana["Grafana<br/>(Unified UI)"]
+    statsd --> grafana
+    collector --> grafana
+
+    style rippled fill:#212121,stroke:#0a0a0a,color:#ffffff
+    style grafana fill:#bf360c,stroke:#8c2809,color:#ffffff
+```
+
+### 2.6.5 Correlation with PerfLog
+
+Trace IDs can be correlated with existing PerfLog entries for comprehensive debugging:
+
+```cpp
+// In RPCHandler.cpp - correlate trace with PerfLog
+Status doCommand(RPC::JsonContext& context, Json::Value& result)
+{
+    // Start OpenTelemetry span
+    auto span = context.app.getTelemetry().startSpan(
+        "rpc.command." + context.method);
+
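+    // Get trace ID for correlation (32 lowercase hex characters)
+    std::string traceId;
+    if (auto const spanContext = span->GetContext(); spanContext.IsValid())
+    {
+        char buf[2 * opentelemetry::trace::TraceId::kSize];
+        spanContext.trace_id().ToLowerBase16(buf);
+        traceId.assign(buf, sizeof(buf));
+    }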
+
+    // Use existing PerfLog with trace correlation
+    auto const curId = context.app.getPerfLog().currentId();
+    context.app.getPerfLog().rpcStart(context.method, curId);
+
+    // Future: Add trace ID to PerfLog entry
+    // context.app.getPerfLog().setTraceId(curId, traceId);
+
+    try {
+        auto ret = handler(context, result);
+        context.app.getPerfLog().rpcFinish(context.method, curId);
+        span->SetStatus(opentelemetry::trace::StatusCode::kOk);
+        return ret;
+    } catch (std::exception const& e) {
+        context.app.getPerfLog().rpcError(context.method, curId);
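+        // opentelemetry-cpp spans have no RecordException(); add an "exception" event instead
+        span->AddEvent("exception", {{"exception.message", e.what()}});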
+        span->SetStatus(opentelemetry::trace::StatusCode::kError, e.what());
+        throw;
+    }
+}
+```
+
+---
+
+*Previous: [Architecture Analysis](./01-architecture-analysis.md)* | *Next: [Implementation Strategy](./03-implementation-strategy.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/03-implementation-strategy.md
+++ b/OpenTelemetryPlan/03-implementation-strategy.md
@@ -0,0 +1,448 @@
+# Implementation Strategy
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Code Samples](./04-code-samples.md) | [Configuration Reference](./05-configuration-reference.md)
+
+---
+
+## 3.1 Directory Structure
+
+The telemetry implementation follows rippled's existing code organization pattern:
+
+```
+include/xrpl/
+├── telemetry/
+│   ├── Telemetry.h              # Main telemetry interface
+│   ├── TelemetryConfig.h        # Configuration structures
+│   ├── TraceContext.h           # Context propagation utilities
+│   ├── SpanGuard.h              # RAII span management
+│   └── SpanAttributes.h         # Attribute helper functions
+
+src/libxrpl/
+├── telemetry/
+│   ├── Telemetry.cpp            # Implementation
+│   ├── TelemetryConfig.cpp      # Config parsing
+│   ├── TraceContext.cpp         # Context serialization
+│   └── NullTelemetry.cpp        # No-op implementation
+
+src/xrpld/
+├── telemetry/
+│   ├── TracingInstrumentation.h # Instrumentation macros
+│   └── TracingInstrumentation.cpp
+```
+
+---
+
+## 3.2 Implementation Approach
+
+<div align="center">
+
+```mermaid
+%%{init: {'flowchart': {'nodeSpacing': 20, 'rankSpacing': 30}}}%%
+flowchart TB
+    subgraph phase1["Phase 1: Core"]
+        direction LR
+        sdk["SDK Integration"] ~~~ interface["Telemetry Interface"] ~~~ config["Configuration"]
+    end
+
+    subgraph phase2["Phase 2: RPC"]
+        direction LR
+        http["HTTP Context"] ~~~ rpc["RPC Handlers"]
+    end
+
+    subgraph phase3["Phase 3: P2P"]
+        direction LR
+        proto["Protobuf Context"] ~~~ tx["Transaction Relay"]
+    end
+
+    subgraph phase4["Phase 4: Consensus"]
+        direction LR
+        consensus["Consensus Rounds"] ~~~ proposals["Proposals"]
+    end
+
+    phase1 --> phase2 --> phase3 --> phase4
+
+    style phase1 fill:#1565c0,stroke:#0d47a1,color:#ffffff
+    style phase2 fill:#2e7d32,stroke:#1b5e20,color:#ffffff
+    style phase3 fill:#e65100,stroke:#bf360c,color:#ffffff
+    style phase4 fill:#c2185b,stroke:#880e4f,color:#ffffff
+```
+
+</div>
+
+### Key Principles
+
+1. **Minimal Intrusion**: Instrumentation should not alter existing control flow
+2. **Zero-Cost When Disabled**: Use compile-time flags and no-op implementations
+3. **Backward Compatibility**: Protocol Buffer extensions use high field numbers
+4. **Graceful Degradation**: Tracing failures must not affect node operation
+
+---
+
+## 3.3 Performance Overhead Summary
+
+| Metric        | Overhead   | Notes                               |
+| ------------- | ---------- | ----------------------------------- |
+| CPU           | 1-3%       | Span creation and attribute setting |
+| Memory        | 2-5 MB     | Batch buffer for pending spans      |
+| Network       | 10-50 KB/s | Compressed OTLP export to collector |
+| Latency (p99) | <2%        | With proper sampling configuration  |
+
+---
+
+## 3.4 Detailed CPU Overhead Analysis
+
+### 3.4.1 Per-Operation Costs
+
+| Operation             | Time (ns) | Frequency              | Impact     |
+| --------------------- | --------- | ---------------------- | ---------- |
+| Span creation         | 200-500   | Every traced operation | Low        |
+| Span end              | 100-200   | Every traced operation | Low        |
+| SetAttribute (string) | 80-120    | 3-5 per span           | Low        |
+| SetAttribute (int)    | 40-60     | 2-3 per span           | Negligible |
+| AddEvent              | 50-80     | 0-2 per span           | Negligible |
+| Context injection     | 150-250   | Per outgoing message   | Low        |
+| Context extraction    | 100-180   | Per incoming message   | Low        |
+| GetCurrent context    | 10-20     | Thread-local access    | Negligible |
+
+### 3.4.2 Transaction Processing Overhead
+
+<div align="center">
+
+```mermaid
+%%{init: {'pie': {'textPosition': 0.75}}}%%
+pie showData
+    "tx.receive (800ns)" : 800
+    "tx.validate (500ns)" : 500
+    "tx.relay (500ns)" : 500
+    "Context inject (600ns)" : 600
+```
+
+**Transaction Tracing Overhead (~2.4 μs total)**
+
+</div>
+
+**Overhead percentage**: 2.4 μs / 200 μs (avg tx processing) = **~1.2%**
+
+### 3.4.3 Consensus Round Overhead
+
+| Operation              | Count | Cost (ns) | Total      |
+| ---------------------- | ----- | --------- | ---------- |
+| consensus.round span   | 1     | ~1000     | ~1 μs      |
+| consensus.phase spans  | 3     | ~700      | ~2.1 μs    |
+| proposal.receive spans | ~20   | ~600      | ~12 μs     |
+| proposal.send spans    | ~3    | ~600      | ~1.8 μs    |
+| Context operations     | ~30   | ~200      | ~6 μs      |
+| **TOTAL**              |       |           | **~23 μs** |
+
+**Overhead percentage**: 23 μs / 3s (typical round) = **~0.0008%** (negligible)
+
+### 3.4.4 RPC Request Overhead
+
+| Operation        | Cost (ns)    |
+| ---------------- | ------------ |
+| rpc.request span | ~700         |
+| rpc.command span | ~600         |
+| Context extract  | ~250         |
+| Context inject   | ~200         |
+| **TOTAL**        | **~1.75 μs** |
+
+- Fast RPC (1ms): 1.75 μs / 1ms = **~0.175%**
+- Slow RPC (100ms): 1.75 μs / 100ms = **~0.002%**
+
+---
+
+## 3.5 Memory Overhead Analysis
+
+### 3.5.1 Static Memory
+
+| Component                | Size        | Allocated  |
+| ------------------------ | ----------- | ---------- |
+| TracerProvider singleton | ~64 KB      | At startup |
+| BatchSpanProcessor       | ~128 KB     | At startup |
+| OTLP exporter            | ~256 KB     | At startup |
+| Propagator registry      | ~8 KB       | At startup |
+| **Total static**         | **~456 KB** |            |
+
+### 3.5.2 Dynamic Memory
+
+| Component            | Size per unit | Max units  | Peak        |
+| -------------------- | ------------- | ---------- | ----------- |
+| Active span          | ~200 bytes    | 1000       | ~200 KB     |
+| Queued span (export) | ~500 bytes    | 2048       | ~1 MB       |
+| Attribute storage    | ~50 bytes     | 5 per span | Included    |
+| Context storage      | ~64 bytes     | Per thread | ~6.4 KB     |
+| **Total dynamic**    |               |            | **~1.2 MB** |
+
+### 3.5.3 Memory Growth Characteristics
+
+```mermaid
+---
+config:
+    xyChart:
+        width: 700
+        height: 400
+---
+xychart-beta
+    title "Memory Usage vs Span Rate"
+    x-axis "Spans/second" [0, 200, 400, 600, 800, 1000]
+    y-axis "Memory (MB)" 0 --> 6
+    line [1, 1.8, 2.6, 3.4, 4.2, 5]
+```
+
+**Notes**:
+- Memory increases linearly with span rate
+- Batch export prevents unbounded growth
+- Queue size is configurable (default 2048 spans)
+- At the queue limit, spans are dropped rather than blocking the caller
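+
+The bounded, non-blocking behaviour described in these notes can be sketched as follows. This is an illustration of the idea, not the OpenTelemetry SDK's batch processor: `FinishedSpan` and `BoundedSpanQueue` are hypothetical names, and this sketch rejects the incoming span when the queue is full, which is one possible drop policy.
+
+```cpp
+#include <cstddef>
+#include <deque>
+#include <mutex>
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical stand-in for a finished span; the real SDK type is richer.
+struct FinishedSpan
+{
+    std::string name;
+};
+
+class BoundedSpanQueue
+{
+public:
+    explicit BoundedSpanQueue(std::size_t maxSize) : maxSize_(maxSize)
+    {
+    }
+
+    // Never blocks: returns false and counts a drop when the queue is full.
+    bool tryPush(FinishedSpan span)
+    {
+        std::lock_guard<std::mutex> lock(mutex_);
+        if (queue_.size() >= maxSize_)
+        {
+            ++dropped_;
+            return false;
+        }
+        queue_.push_back(std::move(span));
+        return true;
+    }
+
+    // Removes up to batchSize spans, oldest first, for export.
+    std::vector<FinishedSpan> drain(std::size_t batchSize)
+    {
+        std::lock_guard<std::mutex> lock(mutex_);
+        std::vector<FinishedSpan> batch;
+        while (!queue_.empty() && batch.size() < batchSize)
+        {
+            batch.push_back(std::move(queue_.front()));
+            queue_.pop_front();
+        }
+        return batch;
+    }
+
+    // Number of spans rejected because the queue was full.
+    std::size_t dropped() const
+    {
+        std::lock_guard<std::mutex> lock(mutex_);
+        return dropped_;
+    }
+
+private:
+    std::size_t const maxSize_;
+    mutable std::mutex mutex_;
+    std::deque<FinishedSpan> queue_;
+    std::size_t dropped_ = 0;
+};
+```
+
+Because `tryPush` returns immediately, a slow or unreachable collector costs dropped spans rather than stalled node threads, and memory stays capped at roughly `maxSize` times the span size.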
+
+---
+
+## 3.6 Network Overhead Analysis
+
+### 3.6.1 Export Bandwidth
+
+| Sampling Rate | Spans/sec | Bandwidth | Notes            |
+| ------------- | --------- | --------- | ---------------- |
+| 100%          | ~500      | ~250 KB/s | Development only |
+| 10%           | ~50       | ~25 KB/s  | Staging          |
+| 1%            | ~5        | ~2.5 KB/s | Production       |
+| Error-only    | ~1        | ~0.5 KB/s | Minimal overhead |
+
+### 3.6.2 Trace Context Propagation
+
+| Message Type           | Context Size | Messages/sec | Overhead    |
+| ---------------------- | ------------ | ------------ | ----------- |
+| TMTransaction          | 32 bytes     | ~100         | ~3.2 KB/s   |
+| TMProposeSet           | 32 bytes     | ~10          | ~320 B/s    |
+| TMValidation           | 32 bytes     | ~50          | ~1.6 KB/s   |
+| **Total P2P overhead** |              |              | **~5 KB/s** |
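+
+The 32-byte context size can be sanity-checked against the protobuf wire format. The sketch below assumes the `TraceContext` message from section 2.5.1, field numbers below 16 (one-byte tags), a set `trace_flags`, and an empty `trace_state`; the helper names are hypothetical.
+
+```cpp
+#include <cstddef>
+#include <cstdint>
+
+// Bytes needed to encode v as a protobuf varint (7 payload bits per byte).
+constexpr std::size_t varintSize(std::uint64_t v)
+{
+    std::size_t n = 1;
+    while (v >= 0x80)
+    {
+        v >>= 7;
+        ++n;
+    }
+    return n;
+}
+
+// Length-delimited field (bytes or string) with a one-byte tag:
+// tag + length varint + payload.
+constexpr std::size_t bytesFieldSize(std::size_t payload)
+{
+    return 1 + varintSize(payload) + payload;
+}
+
+// Varint field (such as uint32) with a one-byte tag.
+constexpr std::size_t varintFieldSize(std::uint64_t value)
+{
+    return 1 + varintSize(value);
+}
+
+// trace_id (16 bytes) + span_id (8 bytes) + trace_flags, no trace_state.
+constexpr std::size_t traceContextSize(std::uint32_t flags)
+{
+    return bytesFieldSize(16) + bytesFieldSize(8) + varintFieldSize(flags);
+}
+
+static_assert(traceContextSize(1) == 30, "18 + 10 + 2 bytes");
+```
+
+Under these assumptions the message itself is 30 bytes; embedding it in a parent message adds a tag and a length prefix, which lands close to the 32 bytes used in the table. A non-empty `trace_state` would add to that.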
+
+---
+
+## 3.7 Optimization Strategies
+
+### 3.7.1 Sampling Strategies
+
+```mermaid
+flowchart TD
+    trace["New Trace"]
+
+    trace --> errors{"Is Error?"}
+    errors -->|Yes| sample["SAMPLE"]
+    errors -->|No| consensus{"Is Consensus?"}
+
+    consensus -->|Yes| sample
+    consensus -->|No| slow{"Is Slow?"}
+
+    slow -->|Yes| sample
+    slow -->|No| prob{"Random < 10%?"}
+
+    prob -->|Yes| sample
+    prob -->|No| drop["DROP"]
+
+    style sample fill:#4caf50,stroke:#388e3c,color:#fff
+    style drop fill:#f44336,stroke:#c62828,color:#fff
+```
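+
+The decision order in the flowchart can be sketched as a pure function. Error status and duration are only known once a trace has finished, so this models a decision made on a completed trace (tail-based) rather than at span creation. `TraceSummary`, `SamplingPolicy`, and the 500 ms threshold are assumptions for illustration only.
+
+```cpp
+// Hypothetical summary of a finished trace; field names are illustrative.
+struct TraceSummary
+{
+    bool hasError = false;
+    bool isConsensus = false;
+    double durationMs = 0.0;
+};
+
+// Assumed thresholds; neither value is mandated by the design.
+struct SamplingPolicy
+{
+    double slowThresholdMs = 500.0;  // what counts as "slow"
+    double baselineRatio = 0.10;     // the "Random < 10%" branch
+};
+
+// uniformDraw must lie in [0, 1). The caller supplies it so that the
+// decision is deterministic and easy to test.
+inline bool shouldSample(
+    TraceSummary const& trace,
+    SamplingPolicy const& policy,
+    double uniformDraw)
+{
+    if (trace.hasError)
+        return true;  // always keep errors
+    if (trace.isConsensus)
+        return true;  // always keep consensus traces
+    if (trace.durationMs >= policy.slowThresholdMs)
+        return true;  // always keep slow traces
+    return uniformDraw < policy.baselineRatio;  // probabilistic baseline
+}
+```
+
+Passing the random draw in as a parameter keeps the function free of hidden state; production code would feed it from a random number generator or derive it from the trace ID so that all nodes reach the same decision.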
+
+### 3.7.2 Batch Tuning Recommendations
+
+| Environment        | Batch Size | Batch Delay | Max Queue |
+| ------------------ | ---------- | ----------- | --------- |
+| Low-latency        | 128        | 1000ms      | 512       |
+| High-throughput    | 1024       | 10000ms     | 8192      |
+| Memory-constrained | 256        | 2000ms      | 512       |
+
+### 3.7.3 Conditional Instrumentation
+
+```cpp
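+// Compile-time feature flag: when telemetry is compiled out,
+// the macro expands to nothing and costs nothing at runtime
+#ifndef XRPL_ENABLE_TELEMETRY
+#define XRPL_TRACE_SPAN(t, n) ((void)0)
+#endif
+
+// Runtime component filtering
+if (telemetry.shouldTracePeer())
+{
+    // Assuming the macro declares an RAII guard (see SpanGuard.h), the span
+    // ends at the closing brace, so the traced work must happen inside this block
+    XRPL_TRACE_SPAN(telemetry, "peer.message.receive");
+    // ... instrumented work
+}
+// When peer tracing is disabled, the only cost is the check above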
+```
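+
+The scoping behaviour of an RAII span guard can be shown without the SDK. In this stand-alone sketch a callback stands in for ending a real span; `ScopedSpan` is a hypothetical name and not the planned `SpanGuard` class.
+
+```cpp
+#include <functional>
+#include <string>
+#include <utility>
+
+// Hypothetical RAII guard: invokes onEnd(name) exactly once, when the
+// guard leaves scope. A real guard would end an OpenTelemetry span instead.
+class ScopedSpan
+{
+public:
+    ScopedSpan(std::string name, std::function<void(std::string const&)> onEnd)
+        : name_(std::move(name)), onEnd_(std::move(onEnd))
+    {
+    }
+
+    // Non-copyable so the span cannot be ended twice.
+    ScopedSpan(ScopedSpan const&) = delete;
+    ScopedSpan& operator=(ScopedSpan const&) = delete;
+
+    ~ScopedSpan()
+    {
+        if (onEnd_)
+            onEnd_(name_);
+    }
+
+private:
+    std::string name_;
+    std::function<void(std::string const&)> onEnd_;
+};
+```
+
+A guard declared inside an `if` block ends at that block's closing brace, on normal exit and on exceptions alike, which is why the instrumented work has to sit in the same scope as the guard.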
+
+---
+
+## 3.8 Links to Detailed Documentation
+
+- **[Code Samples](./04-code-samples.md)**: Complete implementation code for all components
+- **[Configuration Reference](./05-configuration-reference.md)**: Configuration options and collector setup
+- **[Implementation Phases](./06-implementation-phases.md)**: Detailed timeline and milestones
+
+---
+
+## 3.9 Code Intrusiveness Assessment
+
+This section assesses how intrusive the OpenTelemetry integration is to the existing rippled codebase.
+
+### 3.9.1 Files Modified Summary
+
+| Component             | Files Modified | Lines Added | Lines Changed | Architectural Impact |
+| --------------------- | -------------- | ----------- | ------------- | -------------------- |
+| **Core Telemetry**    | 5 new files    | ~800        | 0             | None (new module)    |
+| **Application Init**  | 2 files        | ~30         | ~5            | Minimal              |
+| **RPC Layer**         | 3 files        | ~80         | ~20           | Minimal              |
+| **Transaction Relay** | 4 files        | ~120        | ~40           | Low                  |
+| **Consensus**         | 3 files        | ~100        | ~30           | Low-Medium           |
+| **Protocol Buffers**  | 1 file         | ~25         | 0             | Low                  |
+| **CMake/Build**       | 3 files        | ~50         | ~10           | Minimal              |
+| **Total**             | **~21 files**  | **~1,205**  | **~105**      | **Low**              |
+
+### 3.9.2 Detailed File Impact
+
+```mermaid
+pie title Code Changes by Component
+    "New Telemetry Module" : 800
+    "Transaction Relay" : 160
+    "Consensus" : 130
+    "RPC Layer" : 100
+    "Application Init" : 35
+    "Protocol Buffers" : 25
+    "Build System" : 60
+```
+
+#### New Files (No Impact on Existing Code)
+
+| File                                           | Lines | Purpose              |
+| ---------------------------------------------- | ----- | -------------------- |
+| `include/xrpl/telemetry/Telemetry.h`           | ~160  | Main interface       |
+| `include/xrpl/telemetry/SpanGuard.h`           | ~120  | RAII wrapper         |
+| `include/xrpl/telemetry/TraceContext.h`        | ~80   | Context propagation  |
+| `src/xrpld/telemetry/TracingInstrumentation.h` | ~60   | Macros               |
+| `src/libxrpl/telemetry/Telemetry.cpp`          | ~200  | Implementation       |
+| `src/libxrpl/telemetry/TelemetryConfig.cpp`    | ~60   | Config parsing       |
+| `src/libxrpl/telemetry/NullTelemetry.cpp`      | ~40   | No-op implementation |
+
+#### Modified Files (Existing Rippled Code)
+
+| File                                              | Lines Added | Lines Changed | Risk Level |
+| ------------------------------------------------- | ----------- | ------------- | ---------- |
+| `src/xrpld/app/main/Application.cpp`              | ~15         | ~3            | Low        |
+| `src/xrpld/app/main/Application.h`                | ~5          | ~2            | Low        |
+| `src/xrpld/rpc/detail/ServerHandler.cpp`          | ~40         | ~10           | Low        |
+| `src/xrpld/rpc/handlers/*.cpp`                    | ~30         | ~8            | Low        |
+| `src/xrpld/overlay/detail/PeerImp.cpp`            | ~60         | ~15           | Medium     |
+| `src/xrpld/overlay/detail/OverlayImpl.cpp`        | ~30         | ~10           | Medium     |
+| `src/xrpld/app/consensus/RCLConsensus.cpp`        | ~50         | ~15           | Medium     |
+| `src/xrpld/app/consensus/RCLConsensusAdaptor.cpp` | ~40         | ~12           | Medium     |
+| `src/xrpld/core/JobQueue.cpp`                     | ~20         | ~5            | Low        |
+| `src/xrpld/overlay/detail/ripple.proto`           | ~25         | 0             | Low        |
+| `CMakeLists.txt`                                  | ~40         | ~8            | Low        |
+| `cmake/FindOpenTelemetry.cmake`                   | ~50         | 0             | None (new) |
+
+### 3.9.3 Risk Assessment by Component
+
+<div align="center">
+
+**Do First** ↖ ↗ **Plan Carefully**
+
+```mermaid
+quadrantChart
+    title Code Intrusiveness Risk Matrix
+    x-axis Low Risk --> High Risk
+    y-axis Low Value --> High Value
+
+    RPC Tracing: [0.2, 0.8]
+    Transaction Relay: [0.5, 0.9]
+    Consensus Tracing: [0.7, 0.95]
+    Peer Message Tracing: [0.8, 0.4]
+    JobQueue Context: [0.4, 0.5]
+    Ledger Acquisition: [0.5, 0.6]
+```
+
+**Optional** ↙ ↘ **Avoid**
+
+</div>
+
+#### Risk Level Definitions
+
+| Risk Level | Definition                                                       | Mitigation                         |
+| ---------- | ---------------------------------------------------------------- | ---------------------------------- |
+| **Low**    | Additive changes only; no modification to existing logic         | Standard code review               |
+| **Medium** | Minor modifications to existing functions; clear boundaries      | Comprehensive unit tests           |
+| **High**   | Changes to core logic or data structures; potential side effects | Integration tests + staged rollout |
+
+### 3.9.4 Architectural Impact Assessment
+
+| Aspect               | Impact  | Justification                                                         |
+| -------------------- | ------- | --------------------------------------------------------------------- |
+| **Data Flow**        | None    | Tracing is purely observational; no business logic changes            |
+| **Threading Model**  | Minimal | Context propagation uses thread-local storage (standard OTel pattern) |
+| **Memory Model**     | Low     | Bounded queues prevent unbounded growth; RAII ensures cleanup         |
+| **Network Protocol** | Low     | Optional fields in protobuf (high field numbers); backward compatible |
+| **Configuration**    | None    | New config section; existing configs unaffected                       |
+| **Build System**     | Low     | Optional CMake flag; builds work without OpenTelemetry                |
+| **Dependencies**     | Low     | OpenTelemetry SDK is optional; null implementation when disabled      |
+
+### 3.9.5 Backward Compatibility
+
+| Compatibility   | Status | Notes                                                 |
+| --------------- | ------ | ----------------------------------------------------- |
+| **Config File** | ✅ Full | New `[telemetry]` section is optional                 |
+| **Protocol**    | ✅ Full | Optional protobuf fields with high field numbers      |
+| **Build**       | ✅ Full | `XRPL_ENABLE_TELEMETRY=OFF` compiles telemetry out    |
+| **Runtime**     | ✅ Full | `enabled=0` selects the no-op implementation          |
+| **API**         | ✅ Full | No changes to public RPC or P2P APIs                  |
+
+### 3.9.6 Rollback Strategy
+
+If issues are discovered after deployment:
+
+1. **Immediate**: Set `enabled=0` in config and restart (zero code change)
+2. **Quick**: Rebuild with `XRPL_ENABLE_TELEMETRY=OFF`
+3. **Complete**: Revert telemetry commits (clean separation makes this easy)
+
+### 3.9.7 Code Change Examples
+
+**Minimal RPC Instrumentation (Low Intrusiveness):**
+```cpp
+// Before
+void ServerHandler::onRequest(...) {
+    auto result = processRequest(req);
+    send(result);
+}
+
+// After (3 lines added)
+void ServerHandler::onRequest(...) {
+    XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.request");  // +1 line
+    XRPL_TRACE_SET_ATTR("xrpl.rpc.command", command);     // +1 line
+
+    auto result = processRequest(req);
+
+    XRPL_TRACE_SET_ATTR("xrpl.rpc.status", status);       // +1 line
+    send(result);
+}
+```
+
+**Consensus Instrumentation (Medium Intrusiveness):**
+```cpp
+// Before
+void RCLConsensusAdaptor::startRound(...) {
+    // ... existing logic
+}
+
+// After (context storage required)
+void RCLConsensusAdaptor::startRound(...) {
+    XRPL_TRACE_CONSENSUS(app_.getTelemetry(), "consensus.round");
+    XRPL_TRACE_SET_ATTR("xrpl.consensus.ledger.seq", seq);
+
+    // Store context for child spans in phase transitions
+    currentRoundContext_ = _xrpl_guard_->context();  // New member variable
+
+    // ... existing logic unchanged
+}
+```
+
+---
+
+*Previous: [Design Decisions](./02-design-decisions.md)* | *Next: [Code Samples](./04-code-samples.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/04-code-samples.md
+++ b/OpenTelemetryPlan/04-code-samples.md
@@ -0,0 +1,982 @@
+# Code Samples
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Implementation Strategy](./03-implementation-strategy.md) | [Configuration Reference](./05-configuration-reference.md)
+
+---
+
+## 4.1 Core Interfaces
+
+### 4.1.1 Main Telemetry Interface
+
+```cpp
+// include/xrpl/telemetry/Telemetry.h
+#pragma once
+
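+#include <xrpl/basics/BasicConfig.h>     // Section
+#include <xrpl/beast/utility/Journal.h>  // beast::Journal
+#include <xrpl/telemetry/TelemetryConfig.h>
+#include <opentelemetry/context/context.h>
+#include <opentelemetry/nostd/shared_ptr.h>
+#include <opentelemetry/trace/span.h>
+#include <opentelemetry/trace/tracer.h>
+
+#include <chrono>
+#include <cstdint>
+#include <memory>
+#include <string>
+#include <string_view>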
+
+namespace xrpl {
+namespace telemetry {
+
+/**
+ * Main telemetry interface for OpenTelemetry integration.
+ *
+ * This class provides the primary API for distributed tracing in rippled.
+ * It manages the OpenTelemetry SDK lifecycle and provides convenience
+ * methods for creating spans and propagating context.
+ */
+class Telemetry
+{
+public:
+    /**
+     * Configuration for the telemetry system.
+     */
+    struct Setup
+    {
+        bool enabled = false;
+        std::string serviceName = "rippled";
+        std::string serviceVersion;
+        std::string serviceInstanceId;  // Node public key
+
+        // Exporter configuration
+        std::string exporterType = "otlp_grpc";  // "otlp_grpc", "otlp_http", "none"
+        std::string exporterEndpoint = "localhost:4317";
+        bool useTls = false;
+        std::string tlsCertPath;
+
+        // Sampling configuration
+        double samplingRatio = 1.0;  // 1.0 = 100% sampling
+
+        // Batch processor settings
+        std::uint32_t batchSize = 512;
+        std::chrono::milliseconds batchDelay{5000};
+        std::uint32_t maxQueueSize = 2048;
+
+        // Network attributes
+        std::uint32_t networkId = 0;
+        std::string networkType = "mainnet";
+
+        // Component filtering
+        bool traceTransactions = true;
+        bool traceConsensus = true;
+        bool traceRpc = true;
+        bool tracePeer = false;  // High volume, disabled by default
+        bool traceLedger = true;
+    };
+
+    virtual ~Telemetry() = default;
+
+    // ═══════════════════════════════════════════════════════════════════════
+    // LIFECYCLE
+    // ═══════════════════════════════════════════════════════════════════════
+
+    /** Start the telemetry system (call after configuration) */
+    virtual void start() = 0;
+
+    /** Stop the telemetry system (flushes pending spans) */
+    virtual void stop() = 0;
+
+    /** Check if telemetry is enabled */
+    virtual bool isEnabled() const = 0;
+
+    // ═══════════════════════════════════════════════════════════════════════
+    // TRACER ACCESS
+    // ═══════════════════════════════════════════════════════════════════════
+
+    /** Get the tracer for creating spans */
+    virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Tracer>
+    getTracer(std::string_view name = "rippled") = 0;
+
+    // ═══════════════════════════════════════════════════════════════════════
+    // SPAN CREATION (Convenience Methods)
+    // ═══════════════════════════════════════════════════════════════════════
+
+    /** Start a new span with default options */
+    virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span>
+    startSpan(
+        std::string_view name,
+        opentelemetry::trace::SpanKind kind =
+            opentelemetry::trace::SpanKind::kInternal) = 0;
+
+    /** Start a span as child of given context */
+    virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span>
+    startSpan(
+        std::string_view name,
+        opentelemetry::context::Context const& parentContext,
+        opentelemetry::trace::SpanKind kind =
+            opentelemetry::trace::SpanKind::kInternal) = 0;
+
+    // ═══════════════════════════════════════════════════════════════════════
+    // CONTEXT PROPAGATION
+    // ═══════════════════════════════════════════════════════════════════════
+
+    /** Serialize context for network transmission */
+    virtual std::string serializeContext(
+        opentelemetry::context::Context const& ctx) = 0;
+
+    /** Deserialize context from network data */
+    virtual opentelemetry::context::Context deserializeContext(
+        std::string const& serialized) = 0;
+
+    // ═══════════════════════════════════════════════════════════════════════
+    // COMPONENT FILTERING
+    // ═══════════════════════════════════════════════════════════════════════
+
+    /** Check if transaction tracing is enabled */
+    virtual bool shouldTraceTransactions() const = 0;
+
+    /** Check if consensus tracing is enabled */
+    virtual bool shouldTraceConsensus() const = 0;
+
+    /** Check if RPC tracing is enabled */
+    virtual bool shouldTraceRpc() const = 0;
+
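+    /** Check if peer message tracing is enabled */
+    virtual bool shouldTracePeer() const = 0;
+
+    /** Check if ledger tracing is enabled (mirrors Setup::traceLedger) */
+    virtual bool shouldTraceLedger() const = 0;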
+};
+
+// Factory functions
+std::unique_ptr<Telemetry>
+make_Telemetry(
+    Telemetry::Setup const& setup,
+    beast::Journal journal);
+
+Telemetry::Setup
+setup_Telemetry(
+    Section const& section,
+    std::string const& nodePublicKey,
+    std::string const& version);
+
+} // namespace telemetry
+} // namespace xrpl
+```
+
+---
+
+## 4.2 RAII Span Guard
+
+```cpp
+// include/xrpl/telemetry/SpanGuard.h
+#pragma once
+
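+#include <opentelemetry/context/runtime_context.h>
+#include <opentelemetry/trace/scope.h>
+#include <opentelemetry/trace/span.h>
+#include <opentelemetry/trace/span_metadata.h>  // StatusCode
+
+#include <exception>
+#include <string>
+#include <string_view>
+#include <utility>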
+
+namespace xrpl {
+namespace telemetry {
+
+/**
+ * RAII guard for OpenTelemetry spans.
+ *
+ * Automatically ends the span on destruction and makes it the current
+ * span in the thread-local context.
+ */
+class SpanGuard
+{
+    opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span_;
+    opentelemetry::trace::Scope scope_;
+
+public:
+    /**
+     * Construct guard with span.
+     * The span becomes the current span in thread-local context.
+     */
+    explicit SpanGuard(
+        opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span)
+        : span_(std::move(span))
+        , scope_(span_)
+    {
+    }
+
+    // Non-copyable, non-movable
+    SpanGuard(SpanGuard const&) = delete;
+    SpanGuard& operator=(SpanGuard const&) = delete;
+    SpanGuard(SpanGuard&&) = delete;
+    SpanGuard& operator=(SpanGuard&&) = delete;
+
+    ~SpanGuard()
+    {
+        if (span_)
+            span_->End();
+    }
+
+    /** Access the underlying span */
+    opentelemetry::trace::Span& span() { return *span_; }
+    opentelemetry::trace::Span const& span() const { return *span_; }
+
+    /** Set span status to OK */
+    void setOk()
+    {
+        span_->SetStatus(opentelemetry::trace::StatusCode::kOk);
+    }
+
+    /** Set span status with code and description */
+    void setStatus(
+        opentelemetry::trace::StatusCode code,
+        std::string_view description = "")
+    {
+        span_->SetStatus(code, std::string(description));
+    }
+
+    /** Set an attribute on the span */
+    template<typename T>
+    void setAttribute(std::string_view key, T&& value)
+    {
+        span_->SetAttribute(
+            opentelemetry::nostd::string_view(key.data(), key.size()),
+            std::forward<T>(value));
+    }
+
+    /** Add an event to the span */
+    void addEvent(std::string_view name)
+    {
+        span_->AddEvent(std::string(name));
+    }
+
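+    /** Record an exception on the span */
+    void recordException(std::exception const& e)
+    {
+        // The C++ API's Span has no RecordException(); emit an "exception"
+        // event carrying the semantic-convention attribute instead.
+        span_->AddEvent("exception", {{"exception.message", e.what()}});
+        span_->SetStatus(
+            opentelemetry::trace::StatusCode::kError,
+            e.what());
+    }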
+
+    /** Get the current trace context */
+    opentelemetry::context::Context context() const
+    {
+        return opentelemetry::context::RuntimeContext::GetCurrent();
+    }
+};
+
+/**
+ * No-op span guard for when tracing is disabled.
+ * Provides the same interface but does nothing.
+ */
+class NullSpanGuard
+{
+public:
+    NullSpanGuard() = default;
+
+    void setOk() {}
+    void setStatus(opentelemetry::trace::StatusCode, std::string_view = "") {}
+
+    template<typename T>
+    void setAttribute(std::string_view, T&&) {}
+
+    void addEvent(std::string_view) {}
+    void recordException(std::exception const&) {}
+};
+
+} // namespace telemetry
+} // namespace xrpl
+```
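The guarantee `SpanGuard` relies on is ordinary RAII: the destructor runs on every exit path, so the span is ended exactly once whether the scope finishes normally, returns early, or unwinds from an exception. A minimal sketch of that behavior, assuming a hypothetical `FakeSpan` stand-in (not an OpenTelemetry type) that records lifecycle calls so the ordering can be checked without the SDK:

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a tracing span; it only records lifecycle calls.
struct FakeSpan
{
    std::string name;
    std::vector<std::string>& log;
    bool ended = false;

    void End()
    {
        assert(!ended);  // a span must be ended exactly once
        ended = true;
        log.push_back("end:" + name);
    }

    void SetError(std::string const& what)
    {
        log.push_back("error:" + name + ":" + what);
    }
};

// Same shape as SpanGuard: non-copyable, ends the span on scope exit.
class FakeSpanGuard
{
    FakeSpan& span_;

public:
    explicit FakeSpanGuard(FakeSpan& span) : span_(span) {}
    FakeSpanGuard(FakeSpanGuard const&) = delete;
    FakeSpanGuard& operator=(FakeSpanGuard const&) = delete;
    ~FakeSpanGuard() { span_.End(); }

    void recordException(std::exception const& e) { span_.SetError(e.what()); }
};

// Runs one traced operation and appends its lifecycle to `log`.
// `fail` makes the work throw; `earlyReturn` leaves before the work is done.
inline void
runTraced(std::vector<std::string>& log, bool fail, bool earlyReturn)
{
    FakeSpan span{"op", log};
    FakeSpanGuard guard(span);  // destroyed before `span`, on every path
    try
    {
        if (fail)
            throw std::runtime_error("boom");
        if (earlyReturn)
            return;
        log.push_back("work:op");
    }
    catch (std::exception const& e)
    {
        guard.recordException(e);
    }
}
```

In all three paths the last log entry is `end:op`, which is the property the instrumentation in section 4.5 depends on when it returns early from a handler.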
+
+---
+
+## 4.3 Instrumentation Macros
+
+```cpp
+// src/xrpld/telemetry/TracingInstrumentation.h
+#pragma once
+
+#include <xrpl/telemetry/Telemetry.h>
+#include <xrpl/telemetry/SpanGuard.h>
+
+#include <optional>
+
+namespace xrpl {
+namespace telemetry {
+
+// ═══════════════════════════════════════════════════════════════════════════
+// INSTRUMENTATION MACROS
+// ═══════════════════════════════════════════════════════════════════════════
+
+#ifdef XRPL_ENABLE_TELEMETRY
+
+// Start a span that is automatically ended when the guard goes out of scope.
+// Every span macro declares `_xrpl_guard_` as std::optional<SpanGuard>, so
+// XRPL_TRACE_SET_ATTR and XRPL_TRACE_EXCEPTION work after any of them.
+// Use at most one span macro per scope.
+#define XRPL_TRACE_SPAN(telemetry, name) \
+    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_( \
+        std::in_place, (telemetry).startSpan(name))
+
+// Start a span with specific kind
+#define XRPL_TRACE_SPAN_KIND(telemetry, name, kind) \
+    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_( \
+        std::in_place, (telemetry).startSpan(name, kind))
+
+// Conditional span based on component
+#define XRPL_TRACE_TX(telemetry, name) \
+    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
+    if ((telemetry).shouldTraceTransactions()) { \
+        _xrpl_guard_.emplace((telemetry).startSpan(name)); \
+    }
+
+#define XRPL_TRACE_CONSENSUS(telemetry, name) \
+    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
+    if ((telemetry).shouldTraceConsensus()) { \
+        _xrpl_guard_.emplace((telemetry).startSpan(name)); \
+    }
+
+#define XRPL_TRACE_RPC(telemetry, name) \
+    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
+    if ((telemetry).shouldTraceRpc()) { \
+        _xrpl_guard_.emplace((telemetry).startSpan(name)); \
+    }
+
+// Set attribute on current span (if exists).
+// Wrapped in do/while so the macro behaves as a single statement.
+#define XRPL_TRACE_SET_ATTR(key, value) \
+    do { \
+        if (_xrpl_guard_.has_value()) \
+            _xrpl_guard_->setAttribute(key, value); \
+    } while (0)
+
+// Record exception on current span
+#define XRPL_TRACE_EXCEPTION(e) \
+    do { \
+        if (_xrpl_guard_.has_value()) \
+            _xrpl_guard_->recordException(e); \
+    } while (0)
+
+#else  // XRPL_ENABLE_TELEMETRY not defined
+
+#define XRPL_TRACE_SPAN(telemetry, name) ((void)0)
+#define XRPL_TRACE_SPAN_KIND(telemetry, name, kind) ((void)0)
+#define XRPL_TRACE_TX(telemetry, name) ((void)0)
+#define XRPL_TRACE_CONSENSUS(telemetry, name) ((void)0)
+#define XRPL_TRACE_RPC(telemetry, name) ((void)0)
+#define XRPL_TRACE_SET_ATTR(key, value) ((void)0)
+#define XRPL_TRACE_EXCEPTION(e) ((void)0)
+
+#endif  // XRPL_ENABLE_TELEMETRY
+
+} // namespace telemetry
+} // namespace xrpl
+```
+
+---
+
+## 4.4 Protocol Buffer Extensions
+
+### 4.4.1 TraceContext Message Definition
+
+Add to `src/xrpld/overlay/detail/ripple.proto`:
+
+```protobuf
+// Trace context for distributed tracing across nodes
+// Uses W3C Trace Context format internally
+message TraceContext {
+    // 16-byte trace identifier (required for valid context)
+    bytes trace_id = 1;
+
+    // 8-byte span identifier of parent span
+    bytes span_id = 2;
+
+    // Trace flags (bit 0 = sampled)
+    uint32 trace_flags = 3;
+
+    // W3C tracestate header value for vendor-specific data
+    string trace_state = 4;
+}
+
+// Extend existing messages with optional trace context
+// High field numbers (1000+) to avoid conflicts
+
+message TMTransaction {
+    // ... existing fields ...
+
+    // Optional trace context for distributed tracing
+    optional TraceContext trace_context = 1001;
+}
+
+message TMProposeSet {
+    // ... existing fields ...
+    optional TraceContext trace_context = 1001;
+}
+
+message TMValidation {
+    // ... existing fields ...
+    optional TraceContext trace_context = 1001;
+}
+
+message TMGetLedger {
+    // ... existing fields ...
+    optional TraceContext trace_context = 1001;
+}
+
+message TMLedgerData {
+    // ... existing fields ...
+    optional TraceContext trace_context = 1001;
+}
+```
+
+### 4.4.2 Context Serialization/Deserialization
+
+```cpp
+// include/xrpl/telemetry/TraceContext.h
+#pragma once
+
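+#include <opentelemetry/context/context.h>
+#include <opentelemetry/nostd/span.h>
+#include <opentelemetry/trace/context.h>       // GetSpan
+#include <opentelemetry/trace/default_span.h>  // DefaultSpan
+#include <opentelemetry/trace/span_context.h>
+#include <protocol/messages.h>  // Generated protobuf
+
+#include <cstdint>
+#include <functional>
+#include <optional>
+#include <string>
+#include <string_view>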
+
+namespace xrpl {
+namespace telemetry {
+
+/**
+ * Utilities for trace context serialization and propagation.
+ */
+class TraceContextPropagator
+{
+public:
+    /**
+     * Extract trace context from Protocol Buffer message.
+     * Returns empty context if no trace info present.
+     */
+    static opentelemetry::context::Context
+    extract(protocol::TraceContext const& proto);
+
+    /**
+     * Inject current trace context into Protocol Buffer message.
+     */
+    static void
+    inject(
+        opentelemetry::context::Context const& ctx,
+        protocol::TraceContext& proto);
+
+    /**
+     * Extract trace context from HTTP headers (for RPC).
+     * Supports W3C Trace Context (traceparent, tracestate).
+     */
+    static opentelemetry::context::Context
+    extractFromHeaders(
+        std::function<std::optional<std::string>(std::string_view)> headerGetter);
+
+    /**
+     * Inject trace context into HTTP headers (for RPC responses).
+     */
+    static void
+    injectToHeaders(
+        opentelemetry::context::Context const& ctx,
+        std::function<void(std::string_view, std::string_view)> headerSetter);
+};
+
+// ═══════════════════════════════════════════════════════════════════════════
+// IMPLEMENTATION
+// ═══════════════════════════════════════════════════════════════════════════
+
+inline opentelemetry::context::Context
+TraceContextPropagator::extract(protocol::TraceContext const& proto)
+{
+    using namespace opentelemetry::trace;
+
+    if (proto.trace_id().size() != 16 || proto.span_id().size() != 8)
+        return opentelemetry::context::Context{};  // Invalid, return empty
+
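+    // Construct TraceId and SpanId from bytes. Both constructors take a
+    // fixed-size nostd::span rather than a raw pointer.
+    TraceId traceId(opentelemetry::nostd::span<uint8_t const, TraceId::kSize>(
+        reinterpret_cast<uint8_t const*>(proto.trace_id().data()),
+        TraceId::kSize));
+    SpanId spanId(opentelemetry::nostd::span<uint8_t const, SpanId::kSize>(
+        reinterpret_cast<uint8_t const*>(proto.span_id().data()),
+        SpanId::kSize));
+    TraceFlags flags(static_cast<uint8_t>(proto.trace_flags()));
+
+    // Create SpanContext from extracted data; all-zero IDs are invalid
+    SpanContext spanContext(traceId, spanId, flags, /* remote = */ true);
+    if (!spanContext.IsValid())
+        return opentelemetry::context::Context{};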
+
+    // Create context with extracted span as parent
+    return opentelemetry::context::Context{}.SetValue(
+        opentelemetry::trace::kSpanKey,
+        opentelemetry::nostd::shared_ptr<Span>(
+            new DefaultSpan(spanContext)));
+}
+
+inline void
+TraceContextPropagator::inject(
+    opentelemetry::context::Context const& ctx,
+    protocol::TraceContext& proto)
+{
+    using namespace opentelemetry::trace;
+
+    // Get current span from context
+    auto span = GetSpan(ctx);
+    if (!span)
+        return;
+
+    auto const& spanCtx = span->GetContext();
+    if (!spanCtx.IsValid())
+        return;
+
+    // Serialize trace_id (16 bytes)
+    auto const& traceId = spanCtx.trace_id();
+    proto.set_trace_id(traceId.Id().data(), TraceId::kSize);
+
+    // Serialize span_id (8 bytes)
+    auto const& spanId = spanCtx.span_id();
+    proto.set_span_id(spanId.Id().data(), SpanId::kSize);
+
+    // Serialize flags
+    proto.set_trace_flags(spanCtx.trace_flags().flags());
+
+    // Note: tracestate not implemented yet
+}
+
+} // namespace telemetry
+} // namespace xrpl
+```
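`extractFromHeaders` and `injectToHeaders` are declared above but not shown. For the RPC path they would read and write the W3C `traceparent` header, whose version-00 layout is `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`. The following is a minimal sketch of that encoding only, assuming a hypothetical `RawTraceContext` struct that mirrors the protobuf fields; it does not use the OpenTelemetry propagator classes and ignores `tracestate` and future header versions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical plain-data mirror of the TraceContext protobuf fields.
struct RawTraceContext
{
    std::array<std::uint8_t, 16> traceId{};
    std::array<std::uint8_t, 8> spanId{};
    std::uint8_t flags = 0;  // bit 0 = sampled
};

inline void
appendHex(std::string& out, std::uint8_t const* data, std::size_t n)
{
    static constexpr char digits[] = "0123456789abcdef";
    for (std::size_t i = 0; i < n; ++i)
    {
        out.push_back(digits[data[i] >> 4]);
        out.push_back(digits[data[i] & 0x0f]);
    }
}

inline int
hexValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;  // the header format requires lowercase hex
}

// Decodes exactly 2*n lowercase hex characters into `out`.
inline bool
parseHex(std::string_view in, std::uint8_t* out, std::size_t n)
{
    if (in.size() != 2 * n)
        return false;
    for (std::size_t i = 0; i < n; ++i)
    {
        int const hi = hexValue(in[2 * i]);
        int const lo = hexValue(in[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return false;
        out[i] = static_cast<std::uint8_t>((hi << 4) | lo);
    }
    return true;
}

template <std::size_t N>
inline bool
allZero(std::array<std::uint8_t, N> const& bytes)
{
    for (auto b : bytes)
        if (b != 0)
            return false;
    return true;
}

// Builds a version-00 traceparent value from a valid context.
inline std::string
formatTraceparent(RawTraceContext const& ctx)
{
    assert(!allZero(ctx.traceId) && !allZero(ctx.spanId));
    std::string out = "00-";
    appendHex(out, ctx.traceId.data(), ctx.traceId.size());
    out.push_back('-');
    appendHex(out, ctx.spanId.data(), ctx.spanId.size());
    out.push_back('-');
    appendHex(out, &ctx.flags, 1);
    return out;
}

// Parses a version-00 traceparent value; nullopt on any malformed input.
inline std::optional<RawTraceContext>
parseTraceparent(std::string_view header)
{
    if (header.size() != 55 || header.substr(0, 3) != "00-" ||
        header[35] != '-' || header[52] != '-')
        return std::nullopt;

    RawTraceContext ctx;
    if (!parseHex(header.substr(3, 32), ctx.traceId.data(), 16) ||
        !parseHex(header.substr(36, 16), ctx.spanId.data(), 8) ||
        !parseHex(header.substr(53, 2), &ctx.flags, 1))
        return std::nullopt;

    // All-zero identifiers are invalid, matching the size checks in extract()
    if (allZero(ctx.traceId) || allZero(ctx.spanId))
        return std::nullopt;
    return ctx;
}
```

A parsed `RawTraceContext` carries the same three values that `extract()` reads from the protobuf message, so both entry points can feed the same `SpanContext` construction.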
+
+---
+
+## 4.5 Module-Specific Instrumentation
+
+### 4.5.1 Transaction Relay Instrumentation
+
+```cpp
+// src/xrpld/overlay/detail/PeerImp.cpp (modified)
+
+#include <xrpl/telemetry/TracingInstrumentation.h>
+
+void
+PeerImp::handleTransaction(
+    std::shared_ptr<protocol::TMTransaction> const& m)
+{
+    // Extract trace context from incoming message
+    opentelemetry::context::Context parentCtx;
+    if (m->has_trace_context())
+    {
+        parentCtx = telemetry::TraceContextPropagator::extract(
+            m->trace_context());
+    }
+
+    // Start span as child of remote span (cross-node link)
+    auto span = app_.getTelemetry().startSpan(
+        "tx.receive",
+        parentCtx,
+        opentelemetry::trace::SpanKind::kServer);
+    telemetry::SpanGuard guard(span);
+
+    try
+    {
+        // Parse and validate transaction
+        SerialIter sit(makeSlice(m->rawtransaction()));
+        auto stx = std::make_shared<STTx const>(sit);
+
+        // Add transaction attributes
+        guard.setAttribute("xrpl.tx.hash", to_string(stx->getTransactionID()));
+        // TxType is an enum; attributes accept integers, not enum types
+        guard.setAttribute(
+            "xrpl.tx.type", static_cast<int64_t>(stx->getTxnType()));
+        guard.setAttribute("xrpl.peer.id", remote_address_.to_string());
+
+        // Check if we've seen this transaction (HashRouter)
+        auto const [flags, suppressed] =
+            app_.getHashRouter().addSuppressionPeer(
+                stx->getTransactionID(),
+                id_);
+
+        if (suppressed)
+        {
+            guard.setAttribute("xrpl.tx.suppressed", true);
+            guard.addEvent("tx.duplicate");
+            return;  // Already processing this transaction
+        }
+
+        // Create child span for validation
+        {
+            auto validateSpan = app_.getTelemetry().startSpan("tx.validate");
+            telemetry::SpanGuard validateGuard(validateSpan);
+
+            auto [validity, reason] = checkTransaction(stx);
+            validateGuard.setAttribute("xrpl.tx.validity",
+                validity == Validity::Valid ? "valid" : "invalid");
+
+            if (validity != Validity::Valid)
+            {
+                validateGuard.setStatus(
+                    opentelemetry::trace::StatusCode::kError,
+                    reason);
+                return;
+            }
+        }
+
+        // Relay to other peers (capture context for propagation)
+        auto ctx = guard.context();
+
+        // Create child span for relay
+        auto relaySpan = app_.getTelemetry().startSpan(
+            "tx.relay",
+            ctx,
+            opentelemetry::trace::SpanKind::kClient);
+        {
+            telemetry::SpanGuard relayGuard(relaySpan);
+
+            // Inject context into outgoing message
+            protocol::TraceContext protoCtx;
+            telemetry::TraceContextPropagator::inject(
+                relayGuard.context(), protoCtx);
+
+            // Relay to other peers
+            app_.overlay().relay(
+                stx->getTransactionID(),
+                *m,
+                protoCtx,  // Pass trace context
+                exclusions);
+
+            relayGuard.setAttribute("xrpl.tx.relay_count",
+                static_cast<int64_t>(relayCount));
+        }
+
+        guard.setOk();
+    }
+    catch (std::exception const& e)
+    {
+        guard.recordException(e);
+        JLOG(journal_.warn()) << "Transaction handling failed: " << e.what();
+    }
+}
+```
+
+### 4.5.2 Consensus Instrumentation
+
+```cpp
+// src/xrpld/app/consensus/RCLConsensus.cpp (modified)
+
+#include <xrpl/telemetry/TracingInstrumentation.h>
+
+void
+RCLConsensusAdaptor::startRound(
+    NetClock::time_point const& now,
+    RCLCxLedger::ID const& prevLedgerHash,
+    RCLCxLedger const& prevLedger,
+    hash_set<NodeID> const& peers,
+    bool proposing)
+{
+    XRPL_TRACE_CONSENSUS(app_.getTelemetry(), "consensus.round");
+
+    XRPL_TRACE_SET_ATTR("xrpl.consensus.ledger.prev", to_string(prevLedgerHash));
+    XRPL_TRACE_SET_ATTR("xrpl.consensus.ledger.seq",
+        static_cast<int64_t>(prevLedger.seq() + 1));
+    XRPL_TRACE_SET_ATTR("xrpl.consensus.proposers",
+        static_cast<int64_t>(peers.size()));
+    XRPL_TRACE_SET_ATTR("xrpl.consensus.mode",
+        proposing ? "proposing" : "observing");
+
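+    // Store trace context for use in phase transitions.
+    // `_xrpl_guard_` exists only when XRPL_ENABLE_TELEMETRY is defined.
+#ifdef XRPL_ENABLE_TELEMETRY
+    currentRoundContext_ = _xrpl_guard_.has_value()
+        ? _xrpl_guard_->context()
+        : opentelemetry::context::Context{};
+#endif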
+
+    // ... existing implementation ...
+}
+
+ConsensusPhase
+RCLConsensusAdaptor::phaseTransition(ConsensusPhase newPhase)
+{
+    // Create span for phase transition
+    auto span = app_.getTelemetry().startSpan(
+        "consensus.phase." + to_string(newPhase),
+        currentRoundContext_);
+    telemetry::SpanGuard guard(span);
+
+    guard.setAttribute("xrpl.consensus.phase", to_string(newPhase));
+    guard.addEvent("phase.enter");
+
+    auto const startTime = std::chrono::steady_clock::now();
+
+    try
+    {
+        auto result = doPhaseTransition(newPhase);
+
+        auto const duration = std::chrono::steady_clock::now() - startTime;
+        guard.setAttribute("xrpl.consensus.phase_duration_ms",
+            std::chrono::duration<double, std::milli>(duration).count());
+
+        guard.setOk();
+        return result;
+    }
+    catch (std::exception const& e)
+    {
+        guard.recordException(e);
+        throw;
+    }
+}
+
+void
+RCLConsensusAdaptor::peerProposal(
+    NetClock::time_point const& now,
+    RCLCxPeerPos const& proposal)
+{
+    // Extract trace context from proposal message
+    opentelemetry::context::Context parentCtx;
+    if (proposal.hasTraceContext())
+    {
+        parentCtx = telemetry::TraceContextPropagator::extract(
+            proposal.traceContext());
+    }
+
+    auto span = app_.getTelemetry().startSpan(
+        "consensus.proposal.receive",
+        parentCtx,
+        opentelemetry::trace::SpanKind::kServer);
+    telemetry::SpanGuard guard(span);
+
+    guard.setAttribute("xrpl.consensus.proposer",
+        toBase58(TokenType::NodePublic, proposal.nodeId()));
+    guard.setAttribute("xrpl.consensus.round",
+        static_cast<int64_t>(proposal.proposal().proposeSeq()));
+
+    // ... existing implementation ...
+
+    guard.setOk();
+}
+```
+
+### 4.5.3 RPC Handler Instrumentation
+
+```cpp
+// src/xrpld/rpc/detail/ServerHandler.cpp (modified)
+
+#include <xrpl/telemetry/TracingInstrumentation.h>
+
+void
+ServerHandler::onRequest(
+    http_request_type&& req,
+    std::function<void(http_response_type&&)>&& send)
+{
+    // Extract trace context from HTTP headers (W3C Trace Context)
+    auto parentCtx = telemetry::TraceContextPropagator::extractFromHeaders(
+        [&req](std::string_view name) -> std::optional<std::string> {
+            // Look the header up by name; http::field is an enum and
+            // cannot be constructed from a string.
+            auto it = req.find(
+                boost::beast::string_view(name.data(), name.size()));
+            if (it != req.end())
+                return std::string(it->value());
+            return std::nullopt;
+        });
+
+    // Start request span
+    auto span = app_.getTelemetry().startSpan(
+        "rpc.request",
+        parentCtx,
+        opentelemetry::trace::SpanKind::kServer);
+    telemetry::SpanGuard guard(span);
+
+    // Add HTTP attributes
+    guard.setAttribute("http.method", std::string(req.method_string()));
+    guard.setAttribute("http.target", std::string(req.target()));
+    guard.setAttribute("http.user_agent",
+        std::string(req[boost::beast::http::field::user_agent]));
+
+    auto const startTime = std::chrono::steady_clock::now();
+
+    try
+    {
+        // Parse and process request
+        auto const& body = req.body();
+        Json::Value jv;
+        Json::Reader reader;
+
+        if (!reader.parse(body, jv))
+        {
+            guard.setStatus(
+                opentelemetry::trace::StatusCode::kError,
+                "Invalid JSON");
+            sendError(send, "Invalid JSON");
+            return;
+        }
+
+        // Extract command name
+        std::string command = jv.isMember("command")
+            ? jv["command"].asString()
+            : jv.isMember("method")
+                ? jv["method"].asString()
+                : "unknown";
+
+        guard.setAttribute("xrpl.rpc.command", command);
+
+        // Create child span for command execution
+        auto cmdSpan = app_.getTelemetry().startSpan(
+            "rpc.command." + command);
+        {
+            telemetry::SpanGuard cmdGuard(cmdSpan);
+
+            // Execute RPC command
+            auto result = processRequest(jv);
+
+            // Record result attributes; read "status" once, only if present
+            std::string const status = result.isMember("status")
+                ? result["status"].asString()
+                : std::string{};
+            if (!status.empty())
+                cmdGuard.setAttribute("xrpl.rpc.status", status);
+
+            if (status == "error")
+            {
+                cmdGuard.setStatus(
+                    opentelemetry::trace::StatusCode::kError,
+                    result.isMember("error_message")
+                        ? result["error_message"].asString()
+                        : "RPC error");
+            }
+            else
+            {
+                cmdGuard.setOk();
+            }
+        }
+
+        auto const duration = std::chrono::steady_clock::now() - startTime;
+        guard.setAttribute("http.duration_ms",
+            std::chrono::duration<double, std::milli>(duration).count());
+
+        // Inject trace context into response headers
+        http_response_type resp;
+        telemetry::TraceContextPropagator::injectToHeaders(
+            guard.context(),
+            [&resp](std::string_view name, std::string_view value) {
+                resp.set(std::string(name), std::string(value));
+            });
+
+        guard.setOk();
+        send(std::move(resp));
+    }
+    catch (std::exception const& e)
+    {
+        guard.recordException(e);
+        JLOG(journal_.error()) << "RPC request failed: " << e.what();
+        sendError(send, e.what());
+    }
+}
+```
+
+### 4.5.4 JobQueue Context Propagation
+
+```cpp
+// src/xrpld/core/JobQueue.h (modified)
+
+#include <opentelemetry/context/context.h>
+
+class Job
+{
+    // ... existing members ...
+
+    // Captured trace context at job creation
+    opentelemetry::context::Context traceContext_;
+
+public:
+    // Constructor captures current trace context
+    Job(JobType type, std::function<void()> func, ...)
+        : type_(type)
+        , func_(std::move(func))
+        , traceContext_(opentelemetry::context::RuntimeContext::GetCurrent())
+        // ... other initializations ...
+    {
+    }
+
+    // Get trace context for restoration during execution
+    opentelemetry::context::Context const&
+    traceContext() const { return traceContext_; }
+};
+
+// src/xrpld/core/JobQueue.cpp (modified)
+
+void
+Worker::run()
+{
+    while (auto job = getJob())
+    {
+        // Restore trace context from job creation
+        auto token = opentelemetry::context::RuntimeContext::Attach(
+            job->traceContext());
+
+        // Start execution span
+        auto span = app_.getTelemetry().startSpan("job.execute");
+        telemetry::SpanGuard guard(span);
+
+        guard.setAttribute("xrpl.job.type", to_string(job->type()));
+        guard.setAttribute("xrpl.job.queue_ms", job->queueTimeMs());
+        guard.setAttribute("xrpl.job.worker", workerId_);
+
+        try
+        {
+            job->execute();
+            guard.setOk();
+        }
+        catch (std::exception const& e)
+        {
+            guard.recordException(e);
+            JLOG(journal_.error()) << "Job execution failed: " << e.what();
+        }
+    }
+}
+```
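The pattern in 4.5.4 has two halves: capture the submitter's context when the job is created, then install it on the worker for the duration of `execute()` and restore whatever was there before. A minimal sketch of that mechanism, assuming a hypothetical thread-local string as the "current context" in place of `RuntimeContext`; `ContextToken` and `TracedJob` are illustrative names, not rippled or OpenTelemetry types:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical per-thread "current context"; here it is just a trace ID.
inline std::string&
currentTraceId()
{
    thread_local std::string id;
    return id;
}

// RAII token: installs a context, restores the previous one on destruction.
class ContextToken
{
    std::string previous_;

public:
    explicit ContextToken(std::string ctx)
        : previous_(std::exchange(currentTraceId(), std::move(ctx)))
    {
    }
    ContextToken(ContextToken const&) = delete;
    ContextToken& operator=(ContextToken const&) = delete;
    ~ContextToken() { currentTraceId() = std::move(previous_); }
};

// Captures the creator's context and reinstates it while the job runs.
class TracedJob
{
    std::function<void()> func_;
    std::string capturedTraceId_;

public:
    explicit TracedJob(std::function<void()> func)
        : func_(std::move(func)), capturedTraceId_(currentTraceId())
    {
    }

    void execute() const
    {
        assert(func_ != nullptr);
        ContextToken token(capturedTraceId_);  // restored when token dies
        func_();
    }
};

// Creates a job while `traceId` is current, runs it after that scope has
// ended, and returns the trace ID the job observed while executing.
inline std::string
runDeferred(std::string const& traceId)
{
    std::string seenByJob;
    std::optional<TracedJob> job;
    {
        ContextToken submitScope(traceId);
        job.emplace([&seenByJob] { seenByJob = currentTraceId(); });
    }
    // The submitter's scope is gone; only the captured copy remains.
    job->execute();
    return seenByJob;
}
```

The job sees the submitter's trace ID even though it runs after the submitting scope has ended, and the caller's context is unchanged afterwards. The same holds when `execute()` runs on a different thread, since the captured value travels with the job rather than with the thread.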
+
+---
+
+## 4.6 Span Flow Visualization
+
+<div align="center">
+
+```mermaid
+flowchart TB
+    subgraph Client["External Client"]
+        submit["Submit TX"]
+    end
+
+    subgraph NodeA["rippled Node A"]
+        rpcA["rpc.request"]
+        cmdA["rpc.command.submit"]
+        txRecvA["tx.receive"]
+        txValA["tx.validate"]
+        txRelayA["tx.relay"]
+    end
+
+    subgraph NodeB["rippled Node B"]
+        txRecvB["tx.receive"]
+        txValB["tx.validate"]
+        txRelayB["tx.relay"]
+    end
+
+    subgraph NodeC["rippled Node C"]
+        txRecvC["tx.receive"]
+        consensusC["consensus.round"]
+        phaseC["consensus.phase.establish"]
+    end
+
+    submit --> rpcA
+    rpcA --> cmdA
+    cmdA --> txRecvA
+    txRecvA --> txValA
+    txValA --> txRelayA
+    txRelayA -.->|"TraceContext"| txRecvB
+    txRecvB --> txValB
+    txValB --> txRelayB
+    txRelayB -.->|"TraceContext"| txRecvC
+    txRecvC --> consensusC
+    consensusC --> phaseC
+
+    style Client fill:#334155,stroke:#1e293b,color:#fff
+    style NodeA fill:#1e3a8a,stroke:#172554,color:#fff
+    style NodeB fill:#064e3b,stroke:#022c22,color:#fff
+    style NodeC fill:#78350f,stroke:#451a03,color:#fff
+    style submit fill:#e2e8f0,stroke:#cbd5e1,color:#1e293b
+    style rpcA fill:#1d4ed8,stroke:#1e40af,color:#fff
+    style cmdA fill:#1d4ed8,stroke:#1e40af,color:#fff
+    style txRecvA fill:#047857,stroke:#064e3b,color:#fff
+    style txValA fill:#047857,stroke:#064e3b,color:#fff
+    style txRelayA fill:#047857,stroke:#064e3b,color:#fff
+    style txRecvB fill:#047857,stroke:#064e3b,color:#fff
+    style txValB fill:#047857,stroke:#064e3b,color:#fff
+    style txRelayB fill:#047857,stroke:#064e3b,color:#fff
+    style txRecvC fill:#047857,stroke:#064e3b,color:#fff
+    style consensusC fill:#fef3c7,stroke:#fde68a,color:#1e293b
+    style phaseC fill:#fef3c7,stroke:#fde68a,color:#1e293b
+```
+
+</div>
+
+---
+
+*Previous: [Implementation Strategy](./03-implementation-strategy.md)* | *Next: [Configuration Reference](./05-configuration-reference.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/05-configuration-reference.md
+++ b/OpenTelemetryPlan/05-configuration-reference.md
@@ -0,0 +1,936 @@
+# Configuration Reference
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Code Samples](./04-code-samples.md) | [Implementation Phases](./06-implementation-phases.md)
+
+---
+
+## 5.1 rippled Configuration
+
+### 5.1.1 Configuration File Section
+
+Add to `cfg/xrpld-example.cfg`:
+
+```ini
+# ═══════════════════════════════════════════════════════════════════════════════
+# TELEMETRY (OpenTelemetry Distributed Tracing)
+# ═══════════════════════════════════════════════════════════════════════════════
+#
+# Enables distributed tracing for transaction flow, consensus, and RPC calls.
+# Traces are exported to an OpenTelemetry Collector using OTLP protocol.
+#
+# [telemetry]
+#
+# # Enable/disable telemetry (default: 0 = disabled)
+# enabled=1
+#
+# # Exporter type: "otlp_grpc" (default), "otlp_http", or "none"
+# exporter=otlp_grpc
+#
+# # OTLP endpoint (default: localhost:4317 for gRPC, localhost:4318 for HTTP)
+# endpoint=localhost:4317
+#
+# # Use TLS for exporter connection (default: 0)
+# use_tls=0
+#
+# # Path to CA certificate for TLS (optional)
+# # tls_ca_cert=/path/to/ca.crt
+#
+# # Sampling ratio: 0.0-1.0 (default: 1.0 = 100% sampling)
+# # Use lower values in production to reduce overhead
+# sampling_ratio=0.1
+#
+# # Batch processor settings
+# batch_size=512           # Spans per batch (default: 512)
+# batch_delay_ms=5000      # Max delay before sending batch (default: 5000)
+# max_queue_size=2048      # Max queued spans (default: 2048)
+#
+# # Component-specific tracing (default: all enabled except peer)
+# trace_transactions=1     # Transaction relay and processing
+# trace_consensus=1        # Consensus rounds and proposals
+# trace_rpc=1              # RPC request handling
+# trace_peer=0             # Peer messages (high volume, disabled by default)
+# trace_ledger=1           # Ledger acquisition and building
+#
+# # Service identification (automatically detected if not specified)
+# # service_name=rippled
+# # service_instance_id=<node_public_key>
+
+[telemetry]
+enabled=0
+```
+
+### 5.1.2 Configuration Options Summary
+
+| Option                | Type   | Default          | Description                               |
+| --------------------- | ------ | ---------------- | ----------------------------------------- |
+| `enabled`             | bool   | `false`          | Enable/disable telemetry                  |
+| `exporter`            | string | `"otlp_grpc"`    | Exporter type: otlp_grpc, otlp_http, none |
+| `endpoint`            | string | `localhost:4317` | OTLP endpoint (HTTP default: port 4318)   |
+| `use_tls`             | bool   | `false`          | Enable TLS for exporter connection        |
+| `tls_ca_cert`         | string | `""`             | Path to CA certificate file               |
+| `sampling_ratio`      | float  | `1.0`            | Sampling ratio (0.0-1.0)                  |
+| `batch_size`          | uint   | `512`            | Spans per export batch                    |
+| `batch_delay_ms`      | uint   | `5000`           | Max delay before sending batch (ms)       |
+| `max_queue_size`      | uint   | `2048`           | Maximum queued spans                      |
+| `trace_transactions`  | bool   | `true`           | Enable transaction tracing                |
+| `trace_consensus`     | bool   | `true`           | Enable consensus tracing                  |
+| `trace_rpc`           | bool   | `true`           | Enable RPC tracing                        |
+| `trace_peer`          | bool   | `false`          | Enable peer message tracing (high volume) |
+| `trace_ledger`        | bool   | `true`           | Enable ledger tracing                     |
+| `service_name`        | string | `"rippled"`      | Service name for traces                   |
+| `service_instance_id` | string | `<node_pubkey>`  | Instance identifier                       |
+
+---
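For `sampling_ratio` to be useful across nodes, every node should reach the same keep-or-drop decision for a given trace, which means deciding from the trace ID rather than from a local random number. The sketch below illustrates that idea under stated assumptions: it compares the first 8 bytes of the trace ID against a threshold derived from the ratio. It is written in the spirit of a trace-ID-ratio sampler and is not the OpenTelemetry SDK's exact algorithm; `samplingThreshold` and `shouldSample` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Maps a ratio in [0, 1] onto the 64-bit range of trace ID prefixes.
inline std::uint64_t
samplingThreshold(double ratio)
{
    assert(ratio >= 0.0 && ratio <= 1.0);
    constexpr auto max = std::numeric_limits<std::uint64_t>::max();
    if (ratio >= 1.0)
        return max;
    return static_cast<std::uint64_t>(ratio * static_cast<double>(max));
}

// Deterministic decision: identical trace IDs give identical answers on
// every node that uses the same ratio.
inline bool
shouldSample(std::array<std::uint8_t, 16> const& traceId, double ratio)
{
    if (ratio >= 1.0)
        return true;

    // Interpret the first 8 bytes as a big-endian unsigned integer
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i)
        value = (value << 8) | traceId[i];

    return value < samplingThreshold(ratio);
}
```

With `sampling_ratio=0.1`, a trace whose ID prefix falls in the lowest tenth of the 64-bit range is kept and all others are dropped; `0.0` drops everything and `1.0` keeps everything.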
+
+## 5.2 Configuration Parser
+
+```cpp
+// src/libxrpl/telemetry/TelemetryConfig.cpp
+
+#include <xrpl/telemetry/Telemetry.h>
+#include <xrpl/basics/Log.h>
+
+namespace xrpl {
+namespace telemetry {
+
+Telemetry::Setup
+setup_Telemetry(
+    Section const& section,
+    std::string const& nodePublicKey,
+    std::string const& version)
+{
+    Telemetry::Setup setup;
+
+    // Basic settings. String defaults name the template argument explicitly
+    // so it is not deduced as a char array from the literal.
+    setup.enabled = section.value_or("enabled", false);
+    setup.serviceName =
+        section.value_or<std::string>("service_name", "rippled");
+    setup.serviceVersion = version;
+    setup.serviceInstanceId = section.value_or(
+        "service_instance_id", nodePublicKey);
+
+    // Exporter settings
+    setup.exporterType =
+        section.value_or<std::string>("exporter", "otlp_grpc");
+
+    if (setup.exporterType == "otlp_grpc")
+        setup.exporterEndpoint =
+            section.value_or<std::string>("endpoint", "localhost:4317");
+    else if (setup.exporterType == "otlp_http")
+        setup.exporterEndpoint =
+            section.value_or<std::string>("endpoint", "localhost:4318");
+    else if (setup.exporterType != "none")
+        Throw<std::runtime_error>(
+            "telemetry.exporter must be otlp_grpc, otlp_http, or none");
+
+    setup.useTls = section.value_or("use_tls", false);
+    setup.tlsCertPath = section.value_or<std::string>("tls_ca_cert", "");
+
+    // Sampling
+    setup.samplingRatio = section.value_or("sampling_ratio", 1.0);
+    if (setup.samplingRatio < 0.0 || setup.samplingRatio > 1.0)
+    {
+        Throw<std::runtime_error>(
+            "telemetry.sampling_ratio must be between 0.0 and 1.0");
+    }
+
+    // Batch processor
+    setup.batchSize = section.value_or("batch_size", 512u);
+    setup.batchDelay = std::chrono::milliseconds{
+        section.value_or("batch_delay_ms", 5000u)};
+    setup.maxQueueSize = section.value_or("max_queue_size", 2048u);
+
+    // Component filtering
+    setup.traceTransactions = section.value_or("trace_transactions", true);
+    setup.traceConsensus = section.value_or("trace_consensus", true);
+    setup.traceRpc = section.value_or("trace_rpc", true);
+    setup.tracePeer = section.value_or("trace_peer", false);
+    setup.traceLedger = section.value_or("trace_ledger", true);
+
+    return setup;
+}
+
+} // namespace telemetry
+} // namespace xrpl
+```
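+
+The defaulting and validation rules in `setup_Telemetry()` can be exercised without the rest of rippled. The following is a minimal sketch, assuming a plain `std::map` in place of `Section`; `KeyValues`, `resolveEndpoint`, and `parseSamplingRatio` are hypothetical names used only for illustration, and rejecting an unknown exporter type is an assumption rather than behavior shown above.
+
+```cpp
+#include <map>
+#include <stdexcept>
+#include <string>
+
+// Hypothetical stand-in for xrpl::Section: a flat key/value map.
+using KeyValues = std::map<std::string, std::string>;
+
+// Return the configured endpoint, or the conventional OTLP default
+// (port 4317 for gRPC, port 4318 for HTTP) for the chosen exporter.
+inline std::string
+resolveEndpoint(KeyValues const& kv, std::string const& exporter)
+{
+    auto const it = kv.find("endpoint");
+    if (it != kv.end())
+        return it->second;
+    if (exporter == "otlp_grpc")
+        return "localhost:4317";
+    if (exporter == "otlp_http")
+        return "localhost:4318";
+    // Assumption: unknown exporter types are treated as a config error.
+    throw std::runtime_error("telemetry.exporter: unknown type " + exporter);
+}
+
+// Parse sampling_ratio, defaulting to 1.0 and rejecting values
+// outside the closed interval [0.0, 1.0].
+inline double
+parseSamplingRatio(KeyValues const& kv)
+{
+    auto const it = kv.find("sampling_ratio");
+    double const ratio = (it == kv.end()) ? 1.0 : std::stod(it->second);
+    if (ratio < 0.0 || ratio > 1.0)
+        throw std::runtime_error(
+            "telemetry.sampling_ratio must be between 0.0 and 1.0");
+    return ratio;
+}
+```
+
+With an empty map, `resolveEndpoint(kv, "otlp_http")` yields `localhost:4318`, while a `sampling_ratio` of `1.5` throws.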
+
+---
+
+## 5.3 Application Integration
+
+### 5.3.1 ApplicationImp Changes
+
+```cpp
+// src/xrpld/app/main/Application.cpp (modified)
+
+#include <xrpl/telemetry/Telemetry.h>
+
+class ApplicationImp : public Application
+{
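+    // ... existing members that telemetry depends on (config_, logs_) ...
+
+    // Telemetry. Members are destroyed in reverse declaration order, so
+    // declare this before any member that may emit spans during teardown.
+    std::unique_ptr<telemetry::Telemetry> telemetry_;
+
+    // ... remaining existing members ...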
+
+public:
+    ApplicationImp(...)
+    {
+        // Initialize telemetry early (before other components)
+        auto telemetrySection = config_->section("telemetry");
+        auto telemetrySetup = telemetry::setup_Telemetry(
+            telemetrySection,
+            toBase58(TokenType::NodePublic, nodeIdentity_.publicKey()),
+            BuildInfo::getVersionString());
+
+        // Set network attributes
+        telemetrySetup.networkId = config_->NETWORK_ID;
+        telemetrySetup.networkType = [&]() {
+            if (config_->NETWORK_ID == 0) return "mainnet";
+            if (config_->NETWORK_ID == 1) return "testnet";
+            if (config_->NETWORK_ID == 2) return "devnet";
+            return "custom";
+        }();
+
+        telemetry_ = telemetry::make_Telemetry(
+            telemetrySetup,
+            logs_->journal("Telemetry"));
+
+        // ... rest of initialization ...
+    }
+
+    void start() override
+    {
+        // Start telemetry first
+        if (telemetry_)
+            telemetry_->start();
+
+        // ... existing start code ...
+    }
+
+    void stop() override
+    {
+        // ... existing stop code ...
+
+        // Stop telemetry last (to capture shutdown spans)
+        if (telemetry_)
+            telemetry_->stop();
+    }
+
+    telemetry::Telemetry& getTelemetry() override
+    {
+        assert(telemetry_);
+        return *telemetry_;
+    }
+};
+```
+
+### 5.3.2 Application Interface Addition
+
+```cpp
+// include/xrpl/app/main/Application.h (modified)
+
+namespace telemetry { class Telemetry; }
+
+class Application
+{
+public:
+    // ... existing virtual methods ...
+
+    /** Get the telemetry system for distributed tracing */
+    virtual telemetry::Telemetry& getTelemetry() = 0;
+};
+```
+
+---
+
+## 5.4 CMake Integration
+
+### 5.4.1 Find OpenTelemetry Module
+
+```cmake
+# cmake/FindOpenTelemetry.cmake
+
+# Find OpenTelemetry C++ SDK
+#
+# This module defines:
+#   OpenTelemetry_FOUND - System has OpenTelemetry
+#   OpenTelemetry::api - API library target
+#   OpenTelemetry::sdk - SDK library target
+#   OpenTelemetry::otlp_grpc_exporter - OTLP gRPC exporter target
+#   OpenTelemetry::otlp_http_exporter - OTLP HTTP exporter target
+
+find_package(opentelemetry-cpp CONFIG QUIET)
+
+if(opentelemetry-cpp_FOUND)
+    set(OpenTelemetry_FOUND TRUE)
+
+    # Create imported targets if not already created by config
+    if(NOT TARGET OpenTelemetry::api)
+        add_library(OpenTelemetry::api ALIAS opentelemetry-cpp::api)
+    endif()
+    if(NOT TARGET OpenTelemetry::sdk)
+        add_library(OpenTelemetry::sdk ALIAS opentelemetry-cpp::sdk)
+    endif()
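+    if(NOT TARGET OpenTelemetry::otlp_grpc_exporter)
+        add_library(OpenTelemetry::otlp_grpc_exporter ALIAS
+            opentelemetry-cpp::otlp_grpc_exporter)
+    endif()
+    # The HTTP exporter is optional in opentelemetry-cpp builds
+    if(NOT TARGET OpenTelemetry::otlp_http_exporter
+            AND TARGET opentelemetry-cpp::otlp_http_exporter)
+        add_library(OpenTelemetry::otlp_http_exporter ALIAS
+            opentelemetry-cpp::otlp_http_exporter)
+    endif()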
+else()
+    # Try pkg-config fallback
+    find_package(PkgConfig QUIET)
+    if(PKG_CONFIG_FOUND)
+        pkg_check_modules(OTEL opentelemetry-cpp QUIET)
+        if(OTEL_FOUND)
+            set(OpenTelemetry_FOUND TRUE)
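+            # Create imported targets from pkg-config.
+            # Only the API target is created on this path; the SDK and
+            # exporter targets require the CONFIG package above.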
+            add_library(OpenTelemetry::api INTERFACE IMPORTED)
+            target_include_directories(OpenTelemetry::api INTERFACE
+                ${OTEL_INCLUDE_DIRS})
+        endif()
+    endif()
+endif()
+
+include(FindPackageHandleStandardArgs)
+find_package_handle_standard_args(OpenTelemetry
+    REQUIRED_VARS OpenTelemetry_FOUND)
+```
+
+### 5.4.2 CMakeLists.txt Changes
+
+```cmake
+# CMakeLists.txt (additions)
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# TELEMETRY OPTIONS
+# ═══════════════════════════════════════════════════════════════════════════════
+
+option(XRPL_ENABLE_TELEMETRY
+    "Enable OpenTelemetry distributed tracing support" OFF)
+
+if(XRPL_ENABLE_TELEMETRY)
+    find_package(OpenTelemetry REQUIRED)
+
+    # Define compile-time flag
+    add_compile_definitions(XRPL_ENABLE_TELEMETRY)
+
+    message(STATUS "OpenTelemetry tracing: ENABLED")
+else()
+    message(STATUS "OpenTelemetry tracing: DISABLED")
+endif()
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# TELEMETRY LIBRARY
+# ═══════════════════════════════════════════════════════════════════════════════
+
+if(XRPL_ENABLE_TELEMETRY)
+    add_library(xrpl_telemetry
+        src/libxrpl/telemetry/Telemetry.cpp
+        src/libxrpl/telemetry/TelemetryConfig.cpp
+        src/libxrpl/telemetry/TraceContext.cpp
+    )
+
+    target_include_directories(xrpl_telemetry
+        PUBLIC
+            ${CMAKE_CURRENT_SOURCE_DIR}/include
+    )
+
+    target_link_libraries(xrpl_telemetry
+        PUBLIC
+            OpenTelemetry::api
+            OpenTelemetry::sdk
+            OpenTelemetry::otlp_grpc_exporter
+        PRIVATE
+            xrpl_basics
+    )
+
+else()
+    # Create null implementation library
+    add_library(xrpl_telemetry
+        src/libxrpl/telemetry/NullTelemetry.cpp
+    )
+    target_include_directories(xrpl_telemetry
+        PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include
+    )
+endif()
+
+# Link in both configurations so the no-op implementation is used
+# when telemetry is disabled
+target_link_libraries(xrpld PRIVATE xrpl_telemetry)
+```
+
+---
+
+## 5.5 OpenTelemetry Collector Configuration
+
+### 5.5.1 Development Configuration
+
+```yaml
+# otel-collector-dev.yaml
+# Minimal configuration for local development
+
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+      http:
+        endpoint: 0.0.0.0:4318
+
+processors:
+  batch:
+    timeout: 1s
+    send_batch_size: 100
+
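+exporters:
+  # Console output for debugging
+  debug:
+    verbosity: detailed
+    sampling_initial: 5
+    sampling_thereafter: 200
+
+  # Jaeger for trace visualization. Jaeger accepts OTLP natively; the
+  # dedicated jaeger exporter is no longer shipped in recent Collector
+  # releases, including the image used in section 5.6.
+  otlp/jaeger:
+    endpoint: jaeger:4317
+    tls:
+      insecure: true
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [debug, otlp/jaeger]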
+```
+
+### 5.5.2 Production Configuration
+
+```yaml
+# otel-collector-prod.yaml
+# Production configuration with filtering, sampling, and multiple backends
+
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317
+        tls:
+          cert_file: /etc/otel/server.crt
+          key_file: /etc/otel/server.key
+          ca_file: /etc/otel/ca.crt
+
+processors:
+  # Memory limiter to prevent OOM
+  memory_limiter:
+    check_interval: 1s
+    limit_mib: 1000
+    spike_limit_mib: 200
+
+  # Batch processing for efficiency
+  batch:
+    timeout: 5s
+    send_batch_size: 512
+    send_batch_max_size: 1024
+
+  # Tail-based sampling (keep errors and slow traces)
+  tail_sampling:
+    decision_wait: 10s
+    num_traces: 100000
+    expected_new_traces_per_sec: 1000
+    policies:
+      # Always keep error traces
+      - name: errors
+        type: status_code
+        status_code:
+          status_codes: [ERROR]
+      # Keep slow consensus rounds (>5s)
+      - name: slow-consensus
+        type: latency
+        latency:
+          threshold_ms: 5000
+      # Keep slow RPC requests (>1s)
+      - name: slow-rpc
+        type: and
+        and:
+          and_sub_policy:
+            - name: rpc-spans
+              type: string_attribute
+              string_attribute:
+                key: xrpl.rpc.command
+                values: [".*"]
+                enabled_regex_matching: true
+            - name: latency
+              type: latency
+              latency:
+                threshold_ms: 1000
+      # Probabilistic sampling for the rest
+      - name: probabilistic
+        type: probabilistic
+        probabilistic:
+          sampling_percentage: 10
+
+  # Attribute processing
+  attributes:
+    actions:
+      # Hash sensitive data
+      - key: xrpl.tx.account
+        action: hash
+      # Add deployment info
+      - key: deployment.environment
+        value: production
+        action: upsert
+
+exporters:
+  # Grafana Tempo for long-term storage
+  otlp/tempo:
+    endpoint: tempo.monitoring:4317
+    tls:
+      insecure: false
+      ca_file: /etc/otel/tempo-ca.crt
+
+  # Elastic APM for correlation with logs
+  otlp/elastic:
+    endpoint: apm.elastic:8200
+    headers:
+      Authorization: "Bearer ${env:ELASTIC_APM_TOKEN}"
+
+extensions:
+  health_check:
+    endpoint: 0.0.0.0:13133
+  zpages:
+    endpoint: 0.0.0.0:55679
+
+service:
+  extensions: [health_check, zpages]
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [memory_limiter, tail_sampling, attributes, batch]
+      exporters: [otlp/tempo, otlp/elastic]
+```
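+
+The `tail_sampling` policies above combine as a logical OR: a trace is kept as soon as any one policy matches. The sketch below illustrates that decision in isolation. It is a simplification under stated assumptions: `TraceSummary` and `shouldKeep` are hypothetical names, and the modulo on the trace ID only approximates the Collector's hash-based probabilistic policy.
+
+```cpp
+#include <cstdint>
+
+// Hypothetical summary of a completed trace, as a tail sampler sees it.
+struct TraceSummary
+{
+    bool hasError;             // any span ended with status ERROR
+    std::uint64_t durationMs;  // end-to-end trace duration
+    bool hasRpcCommand;        // any span carries xrpl.rpc.command
+    std::uint64_t traceIdLow;  // low 64 bits of the trace ID
+};
+
+// Keep a trace if any policy matches, mirroring the policy list above.
+inline bool
+shouldKeep(TraceSummary const& t, unsigned samplingPercentage = 10)
+{
+    if (t.hasError)
+        return true;  // "errors" policy
+    if (t.durationMs > 5000)
+        return true;  // "slow-consensus" policy (any trace over 5s)
+    if (t.hasRpcCommand && t.durationMs > 1000)
+        return true;  // "slow-rpc" policy (RPC attribute AND over 1s)
+    // "probabilistic" policy, approximated: deterministic per trace ID
+    // so repeated evaluations of the same trace agree.
+    return (t.traceIdLow % 100) < samplingPercentage;
+}
+```
+
+A trace with no error, no RPC attribute, and a 1.5 s duration matches none of the first three policies and falls through to the probabilistic one.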
+
+---
+
+## 5.6 Docker Compose Development Environment
+
+```yaml
+# docker-compose-telemetry.yaml
+version: '3.8'
+
+services:
+  # OpenTelemetry Collector
+  otel-collector:
+    image: otel/opentelemetry-collector-contrib:0.92.0
+    container_name: otel-collector
+    command: ["--config=/etc/otel-collector-config.yaml"]
+    volumes:
+      - ./otel-collector-dev.yaml:/etc/otel-collector-config.yaml:ro
+    ports:
+      - "4317:4317"   # OTLP gRPC
+      - "4318:4318"   # OTLP HTTP
+      - "13133:13133" # Health check
+    depends_on:
+      - jaeger
+
+  # Jaeger for trace visualization
+  jaeger:
+    image: jaegertracing/all-in-one:1.53
+    container_name: jaeger
+    environment:
+      - COLLECTOR_OTLP_ENABLED=true
+    ports:
+      - "16686:16686" # UI
+      - "14250:14250" # gRPC
+
+  # Grafana for dashboards
+  grafana:
+    image: grafana/grafana:10.2.3
+    container_name: grafana
+    environment:
+      - GF_AUTH_ANONYMOUS_ENABLED=true
+      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
+    volumes:
+      - ./grafana/provisioning:/etc/grafana/provisioning:ro
+      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
+    ports:
+      - "3000:3000"
+    depends_on:
+      - jaeger
+
+  # Prometheus for metrics (optional, for correlation)
+  prometheus:
+    image: prom/prometheus:v2.48.1
+    container_name: prometheus
+    volumes:
+      - ./prometheus.yaml:/etc/prometheus/prometheus.yml:ro
+    ports:
+      - "9090:9090"
+
+networks:
+  default:
+    name: rippled-telemetry
+```
+
+---
+
+## 5.7 Configuration Architecture
+
+```mermaid
+flowchart TB
+    subgraph config["Configuration Sources"]
+        cfgFile["xrpld.cfg<br/>[telemetry] section"]
+        cmake["CMake<br/>XRPL_ENABLE_TELEMETRY"]
+    end
+
+    subgraph init["Initialization"]
+        parse["setup_Telemetry()"]
+        factory["make_Telemetry()"]
+    end
+
+    subgraph runtime["Runtime Components"]
+        tracer["TracerProvider"]
+        exporter["OTLP Exporter"]
+        processor["BatchProcessor"]
+    end
+
+    subgraph collector["Collector Pipeline"]
+        recv["Receivers"]
+        proc["Processors"]
+        exp["Exporters"]
+    end
+
+    cfgFile --> parse
+    cmake -->|"compile flag"| parse
+    parse --> factory
+    factory --> tracer
+    tracer --> processor
+    processor --> exporter
+    exporter -->|"OTLP"| recv
+    recv --> proc
+    proc --> exp
+
+    style config fill:#e3f2fd,stroke:#1976d2
+    style runtime fill:#e8f5e9,stroke:#388e3c
+    style collector fill:#fff3e0,stroke:#ff9800
+```
+
+---
+
+## 5.8 Grafana Integration
+
+Step-by-step instructions for integrating rippled traces with Grafana.
+
+### 5.8.1 Data Source Configuration
+
+#### Tempo (Recommended)
+
+```yaml
+# grafana/provisioning/datasources/tempo.yaml
+apiVersion: 1
+
+datasources:
+  - name: Tempo
+    type: tempo
+    access: proxy
+    url: http://tempo:3200
+    jsonData:
+      httpMethod: GET
+      tracesToLogs:
+        datasourceUid: loki
+        tags: ['service.name', 'xrpl.tx.hash']
+        mappedTags: [{ key: 'trace_id', value: 'traceID' }]
+        mapTagNamesEnabled: true
+        filterByTraceID: true
+      serviceMap:
+        datasourceUid: prometheus
+      nodeGraph:
+        enabled: true
+      search:
+        hide: false
+      lokiSearch:
+        datasourceUid: loki
+```
+
+#### Jaeger
+
+```yaml
+# grafana/provisioning/datasources/jaeger.yaml
+apiVersion: 1
+
+datasources:
+  - name: Jaeger
+    type: jaeger
+    access: proxy
+    url: http://jaeger:16686
+    jsonData:
+      tracesToLogs:
+        datasourceUid: loki
+        tags: ['service.name']
+```
+
+#### Elastic APM
+
+```yaml
+# grafana/provisioning/datasources/elastic-apm.yaml
+apiVersion: 1
+
+datasources:
+  - name: Elasticsearch-APM
+    type: elasticsearch
+    access: proxy
+    url: http://elasticsearch:9200
+    database: "apm-*"
+    jsonData:
+      esVersion: "8.0.0"
+      timeField: "@timestamp"
+      logMessageField: message
+      logLevelField: log.level
+```
+
+### 5.8.2 Dashboard Provisioning
+
+```yaml
+# grafana/provisioning/dashboards/dashboards.yaml
+apiVersion: 1
+
+providers:
+  - name: 'rippled-dashboards'
+    orgId: 1
+    folder: 'rippled'
+    folderUid: 'rippled'
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 30
+    options:
+      path: /var/lib/grafana/dashboards/rippled
+```
+
+### 5.8.3 Example Dashboard: RPC Performance
+
+```json
+{
+  "title": "rippled RPC Performance",
+  "uid": "rippled-rpc-performance",
+  "panels": [
+    {
+      "title": "RPC Latency by Command",
+      "type": "heatmap",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && span.xrpl.rpc.command != \"\"} | histogram_over_time(duration) by (span.xrpl.rpc.command)"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
+    },
+    {
+      "title": "RPC Error Rate",
+      "type": "timeseries",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && status=error} | rate() by (span.xrpl.rpc.command)"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 }
+    },
+    {
+      "title": "Top 10 Slowest RPC Commands",
+      "type": "table",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && span.xrpl.rpc.command != \"\"} | avg_over_time(duration) by (span.xrpl.rpc.command) | topk(10)"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 8 }
+    },
+    {
+      "title": "Recent Traces",
+      "type": "table",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\"}"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 16 }
+    }
+  ]
+}
+```
+
+### 5.8.4 Example Dashboard: Transaction Tracing
+
+```json
+{
+  "title": "rippled Transaction Tracing",
+  "uid": "rippled-tx-tracing",
+  "panels": [
+    {
+      "title": "Transaction Throughput",
+      "type": "stat",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"tx.receive\"} | rate()"
+        }
+      ],
+      "gridPos": { "h": 4, "w": 6, "x": 0, "y": 0 }
+    },
+    {
+      "title": "Cross-Node Relay Count",
+      "type": "timeseries",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"tx.relay\"} | avg_over_time(span.xrpl.tx.relay_count)"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 }
+    },
+    {
+      "title": "Transaction Validation Errors",
+      "type": "table",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"tx.validate\" && status=error}"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 }
+    }
+  ]
+}
+```
+
+### 5.8.5 TraceQL Query Examples
+
+Common queries for rippled traces:
+
+```
+# Find all traces for a specific transaction hash
+{resource.service.name="rippled" && span.xrpl.tx.hash="ABC123..."}
+
+# Find slow RPC commands (>100ms)
+{resource.service.name="rippled" && name=~"rpc.command.*" && duration > 100ms}
+
+# Find consensus rounds taking >5 seconds
+{resource.service.name="rippled" && name="consensus.round" && duration > 5s}
+
+# Find failed transaction validations
+{resource.service.name="rippled" && name="tx.validate" && status=error}
+
+# Find transactions relayed to many peers
+{resource.service.name="rippled" && name="tx.relay" && span.xrpl.tx.relay_count > 10}
+
+# Compare latency across nodes
+{resource.service.name="rippled" && name="rpc.command.account_info"} | avg_over_time(duration) by (resource.service.instance.id)
+```
+
+### 5.8.6 Correlation with PerfLog
+
+To correlate OpenTelemetry traces with existing PerfLog data:
+
+**Step 1: Configure Loki to ingest PerfLog**
+
+```yaml
+# promtail-config.yaml
+scrape_configs:
+  - job_name: rippled-perflog
+    static_configs:
+      - targets:
+          - localhost
+        labels:
+          job: rippled
+          __path__: /var/log/rippled/perf*.log
+    pipeline_stages:
+      - json:
+          expressions:
+            trace_id: trace_id
+            ledger_seq: ledger_seq
+            tx_hash: tx_hash
+      - labels:
+          trace_id:
+          ledger_seq:
+          tx_hash:
+```
+
+**Step 2: Add trace_id to PerfLog entries**
+
+Modify PerfLog to include trace_id when available:
+
+```cpp
+// In PerfLog output, add trace_id from current span context
+void logPerf(Json::Value& entry) {
+    auto span = opentelemetry::trace::GetSpan(
+        opentelemetry::context::RuntimeContext::GetCurrent());
+    if (span && span->GetContext().IsValid()) {
+        // ToLowerBase16 expects a buffer of exactly 32 characters and
+        // writes no null terminator
+        char traceIdHex[32];
+        span->GetContext().trace_id().ToLowerBase16(traceIdHex);
+        entry["trace_id"] = std::string(traceIdHex, 32);
+    }
+    // ... existing logging
+}
+```
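+
+The `trace_id` written to PerfLog is the 16-byte trace ID rendered as 32 lowercase hexadecimal characters, which is the form Loki labels and Grafana links match against. A dependency-free sketch of that encoding follows; `toLowerBase16` here is a hypothetical stand-in for the SDK call, not the SDK's own implementation.
+
+```cpp
+#include <array>
+#include <cstdint>
+#include <string>
+
+// Hypothetical stand-in for the SDK's trace ID hex encoding: render a
+// 16-byte trace ID as 32 lowercase hex characters.
+inline std::string
+toLowerBase16(std::array<std::uint8_t, 16> const& id)
+{
+    static constexpr char digits[] = "0123456789abcdef";
+    std::string out;
+    out.reserve(32);
+    for (std::uint8_t byte : id)
+    {
+        out.push_back(digits[byte >> 4]);    // high nibble first
+        out.push_back(digits[byte & 0x0F]);  // then low nibble
+    }
+    return out;
+}
+```
+
+Keeping the output lowercase and unpadded matters because the log label and the trace ID in the tracing backend are compared as plain strings.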
+
+**Step 3: Configure Grafana trace-to-logs link**
+
+In the Tempo data source configuration, enable the trace-to-logs link so that Grafana queries Loki by `trace_id`:
+
+```yaml
+jsonData:
+  tracesToLogs:
+    datasourceUid: loki
+    tags: ['trace_id', 'xrpl.tx.hash']
+    filterByTraceID: true
+    filterBySpanID: false
+```
+
+### 5.8.7 Correlation with Insight/StatsD Metrics
+
+To correlate traces with existing Beast Insight metrics:
+
+**Step 1: Export Insight metrics to Prometheus**
+
+```yaml
+# prometheus.yaml
+scrape_configs:
+  - job_name: 'rippled-statsd'
+    static_configs:
+      - targets: ['statsd-exporter:9102']
+```
+
+**Step 2: Add exemplars to metrics**
+
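+Exemplars attach a trace ID to individual metric samples, which links metric spikes to specific traces. StatsD has no notion of exemplars, so metrics that reach Prometheus through the StatsD exporter do not carry them; this step applies only to metrics emitted through an exemplar-capable path (for example, OpenTelemetry metrics exposed in OpenMetrics format). Prometheus must also be started with `--enable-feature=exemplar-storage`.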
+
+**Step 3: Configure Grafana metric-to-trace link**
+
+```yaml
+# In Prometheus data source
+jsonData:
+  exemplarTraceIdDestinations:
+    - name: trace_id
+      datasourceUid: tempo
+```
+
+**Step 4: Dashboard panel with exemplars**
+
+```json
+{
+  "title": "RPC Latency with Trace Links",
+  "type": "timeseries",
+  "datasource": "Prometheus",
+  "targets": [
+    {
+      "expr": "histogram_quantile(0.99, rate(rippled_rpc_duration_seconds_bucket[5m]))",
+      "exemplar": true
+    }
+  ]
+}
+```
+
+This allows clicking on metric data points to jump directly to the related trace.
+
+---
+
+*Previous: [Code Samples](./04-code-samples.md)* | *Next: [Implementation Phases](./06-implementation-phases.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/06-implementation-phases.md
+++ b/OpenTelemetryPlan/06-implementation-phases.md
@@ -0,0 +1,537 @@
+# Implementation Phases
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Configuration Reference](./05-configuration-reference.md) | [Observability Backends](./07-observability-backends.md)
+
+---
+
+## 6.1 Phase Overview
+
+```mermaid
+gantt
+    title OpenTelemetry Implementation Timeline
+    dateFormat  YYYY-MM-DD
+    axisFormat  Week %W
+
+    section Phase 1
+    Core Infrastructure        :p1, 2024-01-01, 2w
+    SDK Integration           :p1a, 2024-01-01, 4d
+    Telemetry Interface       :p1b, after p1a, 3d
+    Configuration & CMake     :p1c, after p1b, 3d
+    Unit Tests                :p1d, after p1c, 2d
+
+    section Phase 2
+    RPC Tracing               :p2, after p1, 2w
+    HTTP Context Extraction   :p2a, after p1, 2d
+    RPC Handler Instrumentation :p2b, after p2a, 4d
+    WebSocket Support         :p2c, after p2b, 2d
+    Integration Tests         :p2d, after p2c, 2d
+
+    section Phase 3
+    Transaction Tracing       :p3, after p2, 2w
+    Protocol Buffer Extension :p3a, after p2, 2d
+    PeerImp Instrumentation   :p3b, after p3a, 3d
+    Relay Context Propagation :p3c, after p3b, 3d
+    Multi-node Tests          :p3d, after p3c, 2d
+
+    section Phase 4
+    Consensus Tracing         :p4, after p3, 2w
+    Consensus Round Spans     :p4a, after p3, 3d
+    Proposal Handling         :p4b, after p4a, 3d
+    Validation Tests          :p4c, after p4b, 4d
+
+    section Phase 5
+    Documentation & Deploy    :p5, after p4, 1w
+```
+
+---
+
+## 6.2 Phase 1: Core Infrastructure (Weeks 1-2)
+
+**Objective**: Establish foundational telemetry infrastructure
+
+### Tasks
+
+| Task | Description                                           | Effort | Risk   |
+| ---- | ----------------------------------------------------- | ------ | ------ |
+| 1.1  | Add OpenTelemetry C++ SDK to Conan/CMake              | 2d     | Low    |
+| 1.2  | Implement `Telemetry` interface and factory           | 2d     | Low    |
+| 1.3  | Implement `SpanGuard` RAII wrapper                    | 1d     | Low    |
+| 1.4  | Implement configuration parser                        | 1d     | Low    |
+| 1.5  | Integrate into `ApplicationImp`                       | 1d     | Medium |
+| 1.6  | Add conditional compilation (`XRPL_ENABLE_TELEMETRY`) | 1d     | Low    |
+| 1.7  | Create `NullTelemetry` no-op implementation           | 0.5d   | Low    |
+| 1.8  | Unit tests for core infrastructure                    | 1.5d   | Low    |
+
+**Total Effort**: 10 days (2 developers)
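+
+Task 1.3 relies on RAII so that a span is ended exactly once, even on early return or exception. The following is a minimal sketch of that idea, assuming a stand-in `FakeSpan` type that only records events; the real wrapper would hold an OpenTelemetry span, and the class shown here is illustrative rather than the planned implementation.
+
+```cpp
+#include <string>
+#include <utility>
+#include <vector>
+
+// Hypothetical stand-in for a tracer span; records lifecycle events.
+struct FakeSpan
+{
+    std::vector<std::string>& log;
+    std::string name;
+
+    void
+    end()
+    {
+        log.push_back("end:" + name);
+    }
+};
+
+// RAII guard: ends the span exactly once when the guard leaves scope.
+class SpanGuard
+{
+    FakeSpan span_;
+    bool active_ = true;
+
+public:
+    explicit SpanGuard(FakeSpan span) : span_(std::move(span))
+    {
+    }
+
+    SpanGuard(SpanGuard const&) = delete;
+    SpanGuard&
+    operator=(SpanGuard const&) = delete;
+
+    // Moving transfers responsibility for ending the span.
+    SpanGuard(SpanGuard&& other) noexcept
+        : span_(std::move(other.span_)), active_(other.active_)
+    {
+        other.active_ = false;
+    }
+
+    ~SpanGuard()
+    {
+        if (active_)
+            span_.end();
+    }
+};
+```
+
+Deleting the copy operations and deactivating the moved-from guard prevents a span from being ended twice.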
+
+### Exit Criteria
+
+- [ ] OpenTelemetry SDK compiles and links
+- [ ] Telemetry can be enabled/disabled via config
+- [ ] Basic span creation works
+- [ ] No performance regression when disabled
+- [ ] Unit tests passing
+
+---
+
+## 6.3 Phase 2: RPC Tracing (Weeks 3-4)
+
+**Objective**: Complete tracing for all RPC operations
+
+### Tasks
+
+| Task | Description                                        | Effort | Risk   |
+| ---- | -------------------------------------------------- | ------ | ------ |
+| 2.1  | Implement W3C Trace Context HTTP header extraction | 1d     | Low    |
+| 2.2  | Instrument `ServerHandler::onRequest()`            | 1d     | Low    |
+| 2.3  | Instrument `RPCHandler::doCommand()`               | 2d     | Medium |
+| 2.4  | Add RPC-specific attributes                        | 1d     | Low    |
+| 2.5  | Instrument WebSocket handler                       | 1d     | Medium |
+| 2.6  | Integration tests for RPC tracing                  | 2d     | Low    |
+| 2.7  | Performance benchmarks                             | 1d     | Low    |
+| 2.8  | Documentation                                      | 1d     | Low    |
+
+**Total Effort**: 10 days
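+
+Task 2.1 extracts the W3C `traceparent` header, which for version `00` has the fixed form `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`. The sketch below parses that form under stated assumptions: `TraceParent` and `parseTraceParent` are hypothetical names, only version `00` is accepted, and `tracestate` is ignored.
+
+```cpp
+#include <string>
+
+// Parsed fields of a W3C traceparent header (version 00).
+struct TraceParent
+{
+    std::string traceId;   // 32 lowercase hex characters
+    std::string parentId;  // 16 lowercase hex characters
+    bool sampled = false;  // bit 0 of trace-flags
+};
+
+inline bool
+isLowerHex(std::string const& s)
+{
+    for (char c : s)
+        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
+            return false;
+    return true;
+}
+
+// Returns true and fills `out` only for a well-formed version-00 header,
+// which is exactly 55 characters long.
+inline bool
+parseTraceParent(std::string const& header, TraceParent& out)
+{
+    if (header.size() != 55 || header[2] != '-' || header[35] != '-' ||
+        header[52] != '-')
+        return false;
+
+    std::string const version = header.substr(0, 2);
+    std::string const traceId = header.substr(3, 32);
+    std::string const parentId = header.substr(36, 16);
+    std::string const flags = header.substr(53, 2);
+
+    if (version != "00" || !isLowerHex(traceId) || !isLowerHex(parentId) ||
+        !isLowerHex(flags))
+        return false;
+
+    // All-zero IDs are invalid per the specification.
+    if (traceId == std::string(32, '0') || parentId == std::string(16, '0'))
+        return false;
+
+    out.traceId = traceId;
+    out.parentId = parentId;
+    out.sampled = (std::stoi(flags, nullptr, 16) & 0x01) != 0;
+    return true;
+}
+```
+
+A request without a valid header simply starts a new root trace, so a `false` return is not an error condition for the RPC itself.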
+
+### Exit Criteria
+
+- [ ] All RPC commands traced
+- [ ] Trace context propagates from HTTP headers
+- [ ] WebSocket and HTTP both instrumented
+- [ ] <1ms overhead per RPC call
+- [ ] Integration tests passing
+
+---
+
+## 6.4 Phase 3: Transaction Tracing (Weeks 5-6)
+
+**Objective**: Trace transaction lifecycle across network
+
+### Tasks
+
+| Task | Description                                   | Effort | Risk   |
+| ---- | --------------------------------------------- | ------ | ------ |
+| 3.1  | Define `TraceContext` Protocol Buffer message | 1d     | Low    |
+| 3.2  | Implement protobuf context serialization      | 1d     | Low    |
+| 3.3  | Instrument `PeerImp::handleTransaction()`     | 2d     | Medium |
+| 3.4  | Instrument `NetworkOPs::submitTransaction()`  | 1d     | Medium |
+| 3.5  | Instrument HashRouter integration             | 1d     | Medium |
+| 3.6  | Implement relay context propagation           | 2d     | High   |
+| 3.7  | Integration tests (multi-node)                | 2d     | Medium |
+| 3.8  | Performance benchmarks                        | 1d     | Low    |
+
+**Total Effort**: 11 days
+
+### Exit Criteria
+
+- [ ] Transaction traces span across nodes
+- [ ] Trace context in Protocol Buffer messages
+- [ ] HashRouter deduplication visible in traces
+- [ ] Multi-node integration tests passing
+- [ ] <5% overhead on transaction throughput
+
+---
+
+## 6.5 Phase 4: Consensus Tracing (Weeks 7-8)
+
+**Objective**: Full observability into consensus rounds
+
+### Tasks
+
+| Task | Description                                    | Effort | Risk   |
+| ---- | ---------------------------------------------- | ------ | ------ |
+| 4.1  | Instrument `RCLConsensusAdaptor::startRound()` | 1d     | Medium |
+| 4.2  | Instrument phase transitions                   | 2d     | Medium |
+| 4.3  | Instrument proposal handling                   | 2d     | High   |
+| 4.4  | Instrument validation handling                 | 1d     | Medium |
+| 4.5  | Add consensus-specific attributes              | 1d     | Low    |
+| 4.6  | Correlate with transaction traces              | 1d     | Medium |
+| 4.7  | Multi-validator integration tests              | 2d     | High   |
+| 4.8  | Performance validation                         | 1d     | Medium |
+
+**Total Effort**: 11 days
+
+### Exit Criteria
+
+- [ ] Complete consensus round traces
+- [ ] Phase transitions visible
+- [ ] Proposals and validations traced
+- [ ] No impact on consensus timing
+- [ ] Multi-validator test network validated
+
+---
+
+## 6.6 Phase 5: Documentation & Deployment (Week 9)
+
+**Objective**: Production readiness
+
+### Tasks
+
+| Task | Description                   | Effort | Risk |
+| ---- | ----------------------------- | ------ | ---- |
+| 5.1  | Operator runbook              | 1d     | Low  |
+| 5.2  | Grafana dashboards            | 1d     | Low  |
+| 5.3  | Alert definitions             | 0.5d   | Low  |
+| 5.4  | Collector deployment examples | 0.5d   | Low  |
+| 5.5  | Developer documentation       | 1d     | Low  |
+| 5.6  | Training materials            | 0.5d   | Low  |
+| 5.7  | Final integration testing     | 0.5d   | Low  |
+
+**Total Effort**: 5 days
+
+---
+
+## 6.7 Risk Assessment
+
+```mermaid
+quadrantChart
+    title Risk Assessment Matrix
+    x-axis Low Impact --> High Impact
+    y-axis Low Likelihood --> High Likelihood
+    quadrant-1 Mitigate Immediately
+    quadrant-2 Monitor Closely
+    quadrant-3 Accept Risk
+    quadrant-4 Plan Mitigation
+
+    SDK Compatibility: [0.25, 0.2]
+    Protocol Changes: [0.75, 0.65]
+    Performance Overhead: [0.65, 0.45]
+    Context Propagation: [0.5, 0.5]
+    Memory Leaks: [0.8, 0.2]
+```
+
+### Risk Details
+
+| Risk                                 | Likelihood | Impact | Mitigation                              |
+| ------------------------------------ | ---------- | ------ | --------------------------------------- |
+| Protocol changes break compatibility | Medium     | High   | Use high field numbers, optional fields |
+| Performance overhead unacceptable    | Medium     | Medium | Sampling, conditional compilation       |
+| Context propagation complexity       | Medium     | Medium | Phased rollout, extensive testing       |
+| SDK compatibility issues             | Low        | Medium | Pin SDK version, fallback to no-op      |
+| Memory leaks in long-running nodes   | Low        | High   | Memory profiling, bounded queues        |
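+
+The "bounded queues" mitigation means that pending spans can never grow without limit: once the queue is full, new spans are dropped and counted instead of being buffered. A minimal sketch of that behavior follows, assuming a hypothetical `BoundedQueue` class; it is single-threaded for brevity and is not the SDK's batch processor.
+
+```cpp
+#include <cstddef>
+#include <deque>
+#include <string>
+#include <utility>
+
+// Hypothetical bounded span queue: at capacity, new items are dropped
+// (and counted) rather than allowed to grow memory use.
+class BoundedQueue
+{
+    std::deque<std::string> items_;
+    std::size_t capacity_;
+    std::size_t dropped_ = 0;
+
+public:
+    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity)
+    {
+    }
+
+    // Returns false when the item was dropped because the queue is full.
+    bool
+    push(std::string item)
+    {
+        if (items_.size() >= capacity_)
+        {
+            ++dropped_;
+            return false;
+        }
+        items_.push_back(std::move(item));
+        return true;
+    }
+
+    std::size_t
+    size() const
+    {
+        return items_.size();
+    }
+
+    std::size_t
+    dropped() const
+    {
+        return dropped_;
+    }
+};
+```
+
+Exposing the drop count gives operators a signal that `max_queue_size` is too small or that the collector is unreachable.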
+
+---
+
+## 6.8 Success Metrics
+
+| Metric                   | Target                         | Measurement           |
+| ------------------------ | ------------------------------ | --------------------- |
+| Trace coverage           | >95% of transactions           | Sampling verification |
+| CPU overhead             | <3%                            | Benchmark tests       |
+| Memory overhead          | <5 MB                          | Memory profiling      |
+| Latency impact (p99)     | <2%                            | Performance tests     |
+| Trace completeness       | >99% spans with required attrs | Validation script     |
+| Cross-node trace linkage | >90% of multi-hop transactions | Integration tests     |
+
+---
+
+## 6.9 Effort Summary
+
+<div align="center">
+
+```mermaid
+%%{init: {'pie': {'textPosition': 0.75}}}%%
+pie showData
+    "Phase 1: Core Infrastructure" : 10
+    "Phase 2: RPC Tracing" : 10
+    "Phase 3: Transaction Tracing" : 11
+    "Phase 4: Consensus Tracing" : 11
+    "Phase 5: Documentation" : 5
+```
+
+**Total Effort Distribution (47 developer-days)**
+
+</div>
+
+### Resource Requirements
+
+| Phase     | Developers | Duration    | Total Effort |
+| --------- | ---------- | ----------- | ------------ |
+| 1         | 2          | 2 weeks     | 10 days      |
+| 2         | 1-2        | 2 weeks     | 10 days      |
+| 3         | 2          | 2 weeks     | 11 days      |
+| 4         | 2          | 2 weeks     | 11 days      |
+| 5         | 1          | 1 week      | 5 days       |
+| **Total** | **2**      | **9 weeks** | **47 days**  |
+
+---
+
+## 6.10 Quick Wins and Crawl-Walk-Run Strategy
+
+This section outlines a prioritized approach to maximize ROI with minimal initial investment.
+
+### 6.10.1 Crawl-Walk-Run Overview
+
+<div align="center">
+
+```mermaid
+flowchart TB
+    subgraph crawl["🐢 CRAWL (Week 1-2)"]
+        direction LR
+        c1[Core SDK Setup] ~~~ c2[RPC Tracing Only] ~~~ c3[Single Node]
+    end
+
+    subgraph walk["🚶 WALK (Week 3-5)"]
+        direction LR
+        w1[Transaction Tracing] ~~~ w2[Cross-Node Context] ~~~ w3[Basic Dashboards]
+    end
+
+    subgraph run["🏃 RUN (Week 6-9)"]
+        direction LR
+        r1[Consensus Tracing] ~~~ r2[Full Correlation] ~~~ r3[Production Deploy]
+    end
+
+    crawl --> walk --> run
+
+    style crawl fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style walk fill:#bf360c,stroke:#8c2809,color:#fff
+    style run fill:#0d47a1,stroke:#082f6a,color:#fff
+    style c1 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style c2 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style c3 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style w1 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style w2 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style w3 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style r1 fill:#0d47a1,stroke:#082f6a,color:#fff
+    style r2 fill:#0d47a1,stroke:#082f6a,color:#fff
+    style r3 fill:#0d47a1,stroke:#082f6a,color:#fff
+```
+
+</div>
+
+### 6.10.2 Quick Wins (Immediate Value)
+
+| Quick Win                      | Effort   | Value  | When to Deploy |
+| ------------------------------ | -------- | ------ | -------------- |
+| **RPC Command Tracing**        | 2 days   | High   | Week 2         |
+| **RPC Latency Histograms**     | 0.5 days | High   | Week 2         |
+| **Error Rate Dashboard**       | 0.5 days | Medium | Week 2         |
+| **Transaction Submit Tracing** | 1 day    | High   | Week 3         |
+| **Consensus Round Duration**   | 1 day    | Medium | Week 6         |
+
+### 6.10.3 CRAWL Phase (Weeks 1-2)
+
+**Goal**: Get basic tracing working with minimal code changes.
+
+**What You Get**:
+- RPC request/response traces for all commands
+- Latency breakdown per RPC command
+- Error visibility via span status and error attributes
+- Basic Grafana dashboard
+
+**Code Changes**: ~15 lines in `ServerHandler.cpp`, ~40 lines in new telemetry module
+
+**Why Start Here**:
+- RPC is the lowest-risk, highest-visibility component
+- Immediate value for debugging client issues
+- No cross-node complexity
+- Single file modification to existing code
+
+### 6.10.4 WALK Phase (Weeks 3-5)
+
+**Goal**: Add transaction lifecycle tracing across nodes.
+
+**What You Get**:
+- End-to-end transaction traces from submit to relay
+- Cross-node correlation (see transaction path)
+- HashRouter deduplication visibility
+- Relay latency metrics
+
+**Code Changes**: ~120 lines across 4 files, plus protobuf extension
+
+**Why Do This Second**:
+- Builds on RPC tracing (transactions submitted via RPC)
+- Moderate complexity (requires context propagation)
+- High value for debugging transaction issues
+
+### 6.10.5 RUN Phase (Weeks 6-9)
+
+**Goal**: Full observability including consensus.
+
+**What You Get**:
+- Complete consensus round visibility
+- Phase transition timing
+- Validator proposal tracking
+- Full end-to-end traces (client → RPC → TX → consensus → ledger)
+
+**Code Changes**: ~100 lines across 3 consensus files
+
+**Why Do This Last**:
+- Highest complexity (consensus is critical path)
+- Requires thorough testing
+- Lower relative value (consensus issues are rarer)
+
+### 6.10.6 ROI Prioritization Matrix
+
+```mermaid
+quadrantChart
+    title Implementation ROI Matrix
+    x-axis Low Effort --> High Effort
+    y-axis Low Value --> High Value
+    quadrant-1 Major Projects - Plan Carefully
+    quadrant-2 Quick Wins - Do First
+    quadrant-3 Nice to Have - Optional
+    quadrant-4 Time Sinks - Avoid
+
+    RPC Tracing: [0.15, 0.9]
+    TX Submit Trace: [0.25, 0.85]
+    TX Relay Trace: [0.5, 0.8]
+    Consensus Trace: [0.7, 0.75]
+    Peer Message Trace: [0.85, 0.3]
+    Ledger Acquire: [0.55, 0.5]
+```
+
+---
+
+## 6.11 Definition of Done
+
+Clear, measurable criteria for each phase.
+
+### 6.11.1 Phase 1: Core Infrastructure
+
+| Criterion       | Measurement                                                | Target                       |
+| --------------- | ---------------------------------------------------------- | ---------------------------- |
+| SDK Integration | `cmake --build` succeeds with `-DXRPL_ENABLE_TELEMETRY=ON` | ✅ Compiles                   |
+| Runtime Toggle  | `enabled=0` adds negligible overhead                       | <0.1% CPU difference         |
+| Span Creation   | Unit test creates and exports span                         | Span appears in Jaeger       |
+| Configuration   | All config options parsed correctly                        | Config validation tests pass |
+| Documentation   | Developer guide exists                                     | PR approved                  |
+
+**Definition of Done**: All criteria met, PR merged, no regressions in CI.
+
+### 6.11.2 Phase 2: RPC Tracing
+
+| Criterion          | Measurement                        | Target                     |
+| ------------------ | ---------------------------------- | -------------------------- |
+| Coverage           | All RPC commands instrumented      | 100% of commands           |
+| Context Extraction | `traceparent` header propagates    | Integration test passes    |
+| Attributes         | Command, status, duration recorded | Validation script confirms |
+| Performance        | RPC latency overhead               | <1ms p99                   |
+| Dashboard          | Grafana dashboard deployed         | Screenshot in docs         |
+
+**Definition of Done**: RPC traces visible in Jaeger/Tempo for all commands, dashboard shows latency distribution.
+
+### 6.11.3 Phase 3: Transaction Tracing
+
+| Criterion        | Measurement                     | Target                             |
+| ---------------- | ------------------------------- | ---------------------------------- |
+| Local Trace      | Submit → validate → TxQ traced  | Single-node test passes            |
+| Cross-Node       | Context propagates via protobuf | Multi-node test passes             |
+| Relay Visibility | `relay_count` attribute correct | Spot check 100 txs                 |
+| HashRouter       | Deduplication visible in trace  | Duplicate txs show suppressed=true |
+| Performance      | TX throughput overhead          | <5% degradation                    |
+
+**Definition of Done**: Transaction traces span 3+ nodes in test network, performance within bounds.
+
+### 6.11.4 Phase 4: Consensus Tracing
+
+| Criterion            | Measurement                   | Target                    |
+| -------------------- | ----------------------------- | ------------------------- |
+| Round Tracing        | startRound creates root span  | Unit test passes          |
+| Phase Visibility     | All phases have child spans   | Integration test confirms |
+| Proposer Attribution | Proposer ID in attributes     | Spot check 50 rounds      |
+| Timing Accuracy      | Phase durations match PerfLog | <5% variance              |
+| No Consensus Impact  | Round timing unchanged        | Performance test passes   |
+
+**Definition of Done**: Consensus rounds fully traceable, no impact on consensus timing.
+
+### 6.11.5 Phase 5: Production Deployment
+
+| Criterion    | Measurement                  | Target                     |
+| ------------ | ---------------------------- | -------------------------- |
+| Collector HA | Multiple collectors deployed | No single point of failure |
+| Sampling     | Tail sampling configured     | 10% base + errors + slow   |
+| Retention    | Data retained per policy     | 7 days hot, 30 days warm   |
+| Alerting     | Alerts configured            | Error spike, high latency  |
+| Runbook      | Operator documentation       | Approved by ops team       |
+| Training     | Team trained                 | Session completed          |
+
+**Definition of Done**: Telemetry running in production, operators trained, alerts active.
+
+### 6.11.6 Success Metrics Summary
+
+| Phase   | Primary Metric         | Secondary Metric            | Deadline      |
+| ------- | ---------------------- | --------------------------- | ------------- |
+| Phase 1 | SDK compiles and runs  | <0.1% CPU when disabled     | End of Week 2 |
+| Phase 2 | 100% RPC coverage      | <1ms latency overhead       | End of Week 4 |
+| Phase 3 | Cross-node traces work | <5% throughput impact       | End of Week 6 |
+| Phase 4 | Consensus fully traced | No consensus timing impact  | End of Week 8 |
+| Phase 5 | Production deployment  | Operators trained           | End of Week 9 |
+
+---
+
+## 6.12 Recommended Implementation Order
+
+Based on ROI analysis, implement in this exact order:
+
+```mermaid
+flowchart TB
+    subgraph week1["Week 1"]
+        t1[1. OpenTelemetry SDK<br/>Conan/CMake integration]
+        t2[2. Telemetry interface<br/>SpanGuard, config]
+    end
+
+    subgraph week2["Week 2"]
+        t3[3. RPC ServerHandler<br/>instrumentation]
+        t4[4. Basic Jaeger setup<br/>for testing]
+    end
+
+    subgraph week3["Week 3"]
+        t5[5. Transaction submit<br/>tracing]
+        t6[6. Grafana dashboard<br/>v1]
+    end
+
+    subgraph week4["Week 4"]
+        t7[7. Protobuf context<br/>extension]
+        t8[8. PeerImp tx.relay<br/>instrumentation]
+    end
+
+    subgraph week5["Week 5"]
+        t9[9. Multi-node<br/>integration tests]
+        t10[10. Performance<br/>benchmarks]
+    end
+
+    subgraph week6_8["Weeks 6-8"]
+        t11[11. Consensus<br/>instrumentation]
+        t12[12. Full integration<br/>testing]
+    end
+
+    subgraph week9["Week 9"]
+        t13[13. Production<br/>deployment]
+        t14[14. Documentation<br/>& training]
+    end
+
+    t1 --> t2 --> t3 --> t4
+    t4 --> t5 --> t6
+    t6 --> t7 --> t8
+    t8 --> t9 --> t10
+    t10 --> t11 --> t12
+    t12 --> t13 --> t14
+
+    style week1 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style week2 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style week3 fill:#bf360c,stroke:#8c2809,color:#fff
+    style week4 fill:#bf360c,stroke:#8c2809,color:#fff
+    style week5 fill:#bf360c,stroke:#8c2809,color:#fff
+    style week6_8 fill:#0d47a1,stroke:#082f6a,color:#fff
+    style week9 fill:#4a148c,stroke:#2e0d57,color:#fff
+    style t1 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style t2 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style t3 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style t4 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style t5 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style t6 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style t7 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style t8 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style t9 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style t10 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
+    style t11 fill:#0d47a1,stroke:#082f6a,color:#fff
+    style t12 fill:#0d47a1,stroke:#082f6a,color:#fff
+    style t13 fill:#4a148c,stroke:#2e0d57,color:#fff
+    style t14 fill:#4a148c,stroke:#2e0d57,color:#fff
+```
+
+---
+
+*Previous: [Configuration Reference](./05-configuration-reference.md)* | *Next: [Observability Backends](./07-observability-backends.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/07-observability-backends.md
+++ b/OpenTelemetryPlan/07-observability-backends.md
@@ -0,0 +1,590 @@
+# Observability Backend Recommendations
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Implementation Phases](./06-implementation-phases.md) | [Appendix](./08-appendix.md)
+
+---
+
+## 7.1 Development/Testing Backends
+
+| Backend    | Pros                | Cons              | Use Case          |
+| ---------- | ------------------- | ----------------- | ----------------- |
+| **Jaeger** | Easy setup, good UI | Limited retention | Local dev, CI     |
+| **Zipkin** | Simple, lightweight | Basic features    | Quick prototyping |
+
+### Quick Start with Jaeger
+
+```bash
+# Start Jaeger all-in-one with OTLP ingestion enabled.
+# Ports: 16686 = Jaeger UI, 4317 = OTLP gRPC, 4318 = OTLP HTTP
+docker run -d --name jaeger \
+  -e COLLECTOR_OTLP_ENABLED=true \
+  -p 16686:16686 \
+  -p 4317:4317 \
+  -p 4318:4318 \
+  jaegertracing/all-in-one:latest
+```
+
+---
+
+## 7.2 Production Backends
+
+| Backend           | Pros                                      | Cons               | Use Case                    |
+| ----------------- | ----------------------------------------- | ------------------ | --------------------------- |
+| **Grafana Tempo** | Cost-effective, Grafana integration       | Newer project      | Most production deployments |
+| **Elastic APM**   | Full observability stack, log correlation | Resource intensive | Existing Elastic users      |
+| **Honeycomb**     | Excellent querying, high-cardinality data | SaaS cost          | Deep debugging needs        |
+| **Datadog APM**   | Full platform, easy setup                 | SaaS cost          | Enterprise with budget      |
+
+### Backend Selection Flowchart
+
+```mermaid
+flowchart TD
+    start[Select Backend] --> budget{Budget<br/>Constraints?}
+
+    budget -->|Yes| oss[Open Source]
+    budget -->|No| saas{Prefer<br/>SaaS?}
+
+    oss --> existing{Existing<br/>Stack?}
+    existing -->|Grafana| tempo[Grafana Tempo]
+    existing -->|Elastic| elastic[Elastic APM]
+    existing -->|None| tempo
+
+    saas -->|Yes| enterprise{Enterprise<br/>Support?}
+    saas -->|No| oss
+
+    enterprise -->|Yes| datadog[Datadog APM]
+    enterprise -->|No| honeycomb[Honeycomb]
+
+    tempo --> final[Configure Collector]
+    elastic --> final
+    honeycomb --> final
+    datadog --> final
+
+    style start fill:#0f172a,stroke:#020617,color:#fff
+    style budget fill:#334155,stroke:#1e293b,color:#fff
+    style oss fill:#1e293b,stroke:#0f172a,color:#fff
+    style existing fill:#334155,stroke:#1e293b,color:#fff
+    style saas fill:#334155,stroke:#1e293b,color:#fff
+    style enterprise fill:#334155,stroke:#1e293b,color:#fff
+    style final fill:#0f172a,stroke:#020617,color:#fff
+    style tempo fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style elastic fill:#bf360c,stroke:#8c2809,color:#fff
+    style honeycomb fill:#0d47a1,stroke:#082f6a,color:#fff
+    style datadog fill:#4a148c,stroke:#2e0d57,color:#fff
+```
+
+---
+
+## 7.3 Recommended Production Architecture
+
+```mermaid
+flowchart TB
+    subgraph validators["Validator Nodes"]
+        v1[rippled<br/>Validator 1]
+        v2[rippled<br/>Validator 2]
+    end
+
+    subgraph stock["Stock Nodes"]
+        s1[rippled<br/>Stock 1]
+        s2[rippled<br/>Stock 2]
+    end
+
+    subgraph collector["OTel Collector Cluster"]
+        c1[Collector<br/>DC1]
+        c2[Collector<br/>DC2]
+    end
+
+    subgraph backends["Storage Backends"]
+        tempo[(Grafana<br/>Tempo)]
+        elastic[(Elastic<br/>APM)]
+        archive[(S3/GCS<br/>Archive)]
+    end
+
+    subgraph ui["Visualization"]
+        grafana[Grafana<br/>Dashboards]
+    end
+
+    v1 -->|OTLP| c1
+    v2 -->|OTLP| c1
+    s1 -->|OTLP| c2
+    s2 -->|OTLP| c2
+
+    c1 --> tempo
+    c1 --> elastic
+    c2 --> tempo
+    c2 --> archive
+
+    tempo --> grafana
+    elastic --> grafana
+
+    style validators fill:#b71c1c,stroke:#7f1d1d,color:#ffffff
+    style stock fill:#0d47a1,stroke:#082f6a,color:#ffffff
+    style collector fill:#bf360c,stroke:#8c2809,color:#ffffff
+    style backends fill:#1b5e20,stroke:#0d3d14,color:#ffffff
+    style ui fill:#4a148c,stroke:#2e0d57,color:#ffffff
+```
+
+---
+
+## 7.4 Architecture Considerations
+
+### 7.4.1 Collector Placement
+
+| Strategy      | Description          | Pros                     | Cons                    |
+| ------------- | -------------------- | ------------------------ | ----------------------- |
+| **Sidecar**   | Collector per node   | Isolation, simple config | Resource overhead       |
+| **DaemonSet** | Collector per host   | Shared resources         | Complexity              |
+| **Gateway**   | Central collector(s) | Centralized processing   | Single point of failure |
+
+**Recommendation**: Use the **Gateway** pattern with regional collectors for rippled networks:
+- One collector cluster per datacenter/region
+- Tail-based sampling at collector level
+- Multiple export destinations for redundancy
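+
+As a rough illustration of the gateway pattern, a regional collector might be configured along the following lines. This is a minimal sketch, not a tested deployment: the file name and hostnames (`tempo.dc1.example.internal`, `apm.dc1.example.internal`) are placeholders, TLS is disabled only for brevity, and the choice of Tempo and Elastic APM as destinations simply mirrors the architecture diagram in 7.3.
+
+```yaml
+# otel-collector-dc1.yaml (hypothetical file name)
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317   # rippled nodes in this region export here
+      http:
+        endpoint: 0.0.0.0:4318
+
+processors:
+  batch: {}                      # batch spans before export to reduce request volume
+
+exporters:
+  otlp/tempo:
+    endpoint: tempo.dc1.example.internal:4317   # placeholder host
+    tls:
+      insecure: true             # assumption: plaintext inside a trusted network
+  otlp/elastic:
+    endpoint: apm.dc1.example.internal:8200     # placeholder host
+    tls:
+      insecure: true
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [otlp/tempo, otlp/elastic]     # fan out to both destinations
+```
+
+Every exporter listed in the pipeline receives a copy of the trace data, which is what provides the redundancy across destinations. Running more than one collector instance per region addresses the single point of failure noted in the table.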
+
+### 7.4.2 Sampling Strategy
+
+```mermaid
+flowchart LR
+    subgraph head["Head Sampling (Node)"]
+        hs[10% probabilistic]
+    end
+
+    subgraph tail["Tail Sampling (Collector)"]
+        ts1[Keep all errors]
+        ts2[Keep slow >5s]
+        ts3[Keep 10% rest]
+    end
+
+    head --> tail
+
+    ts1 --> final[Final Traces]
+    ts2 --> final
+    ts3 --> final
+
+    style head fill:#0d47a1,stroke:#082f6a,color:#fff
+    style tail fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style hs fill:#0d47a1,stroke:#082f6a,color:#fff
+    style ts1 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style ts2 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style ts3 fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style final fill:#bf360c,stroke:#8c2809,color:#fff
+```
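+
+The tail-sampling rules in the diagram map fairly directly onto the `tail_sampling` processor from the OpenTelemetry Collector contrib distribution. The fragment below is a sketch under that assumption; the `decision_wait` value and the policy names are illustrative choices, not values taken from this plan.
+
+```yaml
+processors:
+  tail_sampling:
+    decision_wait: 10s             # illustrative: how long to buffer a trace before deciding
+    policies:
+      - name: keep-errors          # keep every trace that contains an error span
+        type: status_code
+        status_code:
+          status_codes: [ERROR]
+      - name: keep-slow            # keep traces slower than 5 s
+        type: latency
+        latency:
+          threshold_ms: 5000
+      - name: sample-rest          # keep 10% of everything else
+        type: probabilistic
+        probabilistic:
+          sampling_percentage: 10
+```
+
+A trace is kept if any one of the policies matches. Two caveats apply. First, tail sampling only sees traces that survived head sampling, so with a 10% head sampler on the node the collector can keep errors only among that 10%. Second, all spans of a trace have to reach the same collector instance for the decision to cover the whole trace, which matters for traces that cross nodes and regions.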
+
+### 7.4.3 Data Retention
+
+| Environment | Hot Storage | Warm Storage | Cold Archive |
+| ----------- | ----------- | ------------ | ------------ |
+| Development | 24 hours    | N/A          | N/A          |
+| Staging     | 7 days      | N/A          | N/A          |
+| Production  | 7 days      | 30 days      | many years   |
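+
+If Tempo is the hot store, the 7-day production window could be expressed through the compactor's block retention setting. This is a sketch assuming a Tempo deployment. The setting defines a single retention window, so the 30-day warm tier and the cold archive would have to come from something else, such as object-storage lifecycle rules or a separate archive export, neither of which is shown here.
+
+```yaml
+# tempo.yaml (fragment)
+compactor:
+  compaction:
+    block_retention: 168h   # 7 days, matching the production hot-storage row
+```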
+
+---
+
+## 7.5 Integration Checklist
+
+- [ ] Choose primary backend (Tempo recommended for cost/features)
+- [ ] Deploy collector cluster with high availability
+- [ ] Configure tail-based sampling for error/latency traces
+- [ ] Set up Grafana dashboards for trace visualization
+- [ ] Configure alerts for trace anomalies
+- [ ] Establish data retention policies
+- [ ] Test trace correlation with logs and metrics
+
+---
+
+## 7.6 Grafana Dashboard Examples
+
+Example dashboard definitions for rippled observability.
+
+### 7.6.1 Consensus Health Dashboard
+
+```json
+{
+  "title": "rippled Consensus Health",
+  "uid": "rippled-consensus-health",
+  "tags": ["rippled", "consensus", "tracing"],
+  "panels": [
+    {
+      "title": "Consensus Round Duration",
+      "type": "timeseries",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"consensus.round\"} | avg_over_time(duration) by (resource.service.instance.id)"
+        }
+      ],
+      "fieldConfig": {
+        "defaults": {
+          "unit": "ms",
+          "thresholds": {
+            "steps": [
+              { "color": "green", "value": null },
+              { "color": "yellow", "value": 4000 },
+              { "color": "red", "value": 5000 }
+            ]
+          }
+        }
+      },
+      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
+    },
+    {
+      "title": "Phase Duration Breakdown",
+      "type": "barchart",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=~\"consensus.phase.*\"} | avg_over_time(duration) by (name)"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 }
+    },
+    {
+      "title": "Proposers per Round",
+      "type": "stat",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"consensus.round\"} | avg_over_time(span.xrpl.consensus.proposers)"
+        }
+      ],
+      "gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 }
+    },
+    {
+      "title": "Recent Slow Rounds (>5s)",
+      "type": "table",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"consensus.round\" && duration > 5s}"
+        }
+      ],
+      "gridPos": { "h": 8, "w": 24, "x": 0, "y": 12 }
+    }
+  ]
+}
+```
+
+### 7.6.2 Node Overview Dashboard
+
+```json
+{
+  "title": "rippled Node Overview",
+  "uid": "rippled-node-overview",
+  "panels": [
+    {
+      "title": "Active Nodes",
+      "type": "stat",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\"} | count_over_time() by (resource.service.instance.id) | count()"
+        }
+      ],
+      "gridPos": { "h": 4, "w": 4, "x": 0, "y": 0 }
+    },
+    {
+      "title": "Total Transactions (1h)",
+      "type": "stat",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"tx.receive\"} | count_over_time()"
+        }
+      ],
+      "gridPos": { "h": 4, "w": 4, "x": 4, "y": 0 }
+    },
+    {
+      "title": "Error Rate",
+      "type": "gauge",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && status=error} | rate() / {resource.service.name=\"rippled\"} | rate() * 100"
+        }
+      ],
+      "fieldConfig": {
+        "defaults": {
+          "unit": "percent",
+          "max": 10,
+          "thresholds": {
+            "steps": [
+              { "color": "green", "value": null },
+              { "color": "yellow", "value": 1 },
+              { "color": "red", "value": 5 }
+            ]
+          }
+        }
+      },
+      "gridPos": { "h": 4, "w": 4, "x": 8, "y": 0 }
+    },
+    {
+      "title": "Service Map",
+      "type": "nodeGraph",
+      "datasource": "Tempo",
+      "gridPos": { "h": 12, "w": 12, "x": 12, "y": 0 }
+    }
+  ]
+}
+```
+
+### 7.6.3 Alert Rules
+
+```yaml
+# grafana/provisioning/alerting/rippled-alerts.yaml
+apiVersion: 1
+
+groups:
+  - name: rippled-tracing-alerts
+    folder: rippled
+    interval: 1m
+    rules:
+      - uid: consensus-slow
+        title: Consensus Round Slow
+        condition: A
+        data:
+          - refId: A
+            datasourceUid: tempo
+            model:
+              queryType: traceql
+              query: '{resource.service.name="rippled" && name="consensus.round"} | avg(duration) > 5s'
+        for: 5m
+        annotations:
+          summary: Consensus rounds taking >5 seconds
+          description: "Consensus duration: {{ $value }}ms"
+        labels:
+          severity: warning
+
+      - uid: rpc-error-spike
+        title: RPC Error Rate Spike
+        condition: B
+        data:
+          - refId: B
+            datasourceUid: tempo
+            model:
+              queryType: traceql
+              query: '{resource.service.name="rippled" && name=~"rpc.command.*" && status=error} | rate() > 0.05'
+        for: 2m
+        annotations:
+          summary: RPC error rate >5%
+        labels:
+          severity: critical
+
+      - uid: tx-throughput-drop
+        title: Transaction Throughput Drop
+        condition: C
+        data:
+          - refId: C
+            datasourceUid: tempo
+            model:
+              queryType: traceql
+              query: '{resource.service.name="rippled" && name="tx.receive"} | rate() < 10'
+        for: 10m
+        annotations:
+          summary: Transaction throughput below threshold
+        labels:
+          severity: warning
+```
+
+---
+
+## 7.7 PerfLog and Insight Correlation
+
+How to correlate OpenTelemetry traces with existing rippled observability.
+
+### 7.7.1 Correlation Architecture
+
+```mermaid
+flowchart TB
+    subgraph rippled["rippled Node"]
+        otel[OpenTelemetry<br/>Spans]
+        perflog[PerfLog<br/>JSON Logs]
+        insight[Beast Insight<br/>StatsD Metrics]
+    end
+
+    subgraph collectors["Data Collection"]
+        otelc[OTel Collector]
+        promtail[Promtail/Fluentd]
+        statsd[StatsD Exporter]
+    end
+
+    subgraph storage["Storage"]
+        tempo[(Tempo)]
+        loki[(Loki)]
+        prom[(Prometheus)]
+    end
+
+    subgraph grafana["Grafana"]
+        traces[Trace View]
+        logs[Log View]
+        metrics[Metrics View]
+        corr[Correlation<br/>Panel]
+    end
+
+    otel -->|OTLP| otelc --> tempo
+    perflog -->|JSON| promtail --> loki
+    insight -->|StatsD| statsd --> prom
+
+    tempo --> traces
+    loki --> logs
+    prom --> metrics
+
+    traces --> corr
+    logs --> corr
+    metrics --> corr
+
+    style rippled fill:#0d47a1,stroke:#082f6a,color:#fff
+    style collectors fill:#bf360c,stroke:#8c2809,color:#fff
+    style storage fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style grafana fill:#4a148c,stroke:#2e0d57,color:#fff
+    style otel fill:#0d47a1,stroke:#082f6a,color:#fff
+    style perflog fill:#0d47a1,stroke:#082f6a,color:#fff
+    style insight fill:#0d47a1,stroke:#082f6a,color:#fff
+    style otelc fill:#bf360c,stroke:#8c2809,color:#fff
+    style promtail fill:#bf360c,stroke:#8c2809,color:#fff
+    style statsd fill:#bf360c,stroke:#8c2809,color:#fff
+    style tempo fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style loki fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style prom fill:#1b5e20,stroke:#0d3d14,color:#fff
+    style traces fill:#4a148c,stroke:#2e0d57,color:#fff
+    style logs fill:#4a148c,stroke:#2e0d57,color:#fff
+    style metrics fill:#4a148c,stroke:#2e0d57,color:#fff
+    style corr fill:#4a148c,stroke:#2e0d57,color:#fff
+```
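+
+For the log path in the diagram, a Promtail scrape configuration along these lines would ship PerfLog output to Loki under the `job="rippled"` label used by the log queries in this section. It is a sketch only: the PerfLog file path and the Loki URL are assumptions and need to match the node's PerfLog setting and the actual Loki deployment.
+
+```yaml
+# promtail-config.yaml (sketch)
+server:
+  http_listen_port: 9080
+  grpc_listen_port: 0
+
+positions:
+  filename: /tmp/positions.yaml               # where Promtail records read offsets
+
+clients:
+  - url: http://loki:3100/loki/api/v1/push    # assumed Loki endpoint
+
+scrape_configs:
+  - job_name: rippled-perflog
+    static_configs:
+      - targets: [localhost]
+        labels:
+          job: rippled                         # matches {job="rippled"} in LogQL
+          __path__: /var/log/rippled/perf.log  # assumed PerfLog location
+```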
+
+### 7.7.2 Correlation Fields
+
+| Source      | Field                       | Link To       | Purpose                    |
+| ----------- | --------------------------- | ------------- | -------------------------- |
+| **Trace**   | `trace_id`                  | Logs          | Find log entries for trace |
+| **Trace**   | `xrpl.tx.hash`              | Logs, Metrics | Find TX-related data       |
+| **Trace**   | `xrpl.consensus.ledger.seq` | Logs          | Find ledger-related logs   |
+| **PerfLog** | `trace_id` (new)            | Traces        | Jump to trace from log     |
+| **PerfLog** | `ledger_seq`                | Traces        | Find consensus trace       |
+| **Insight** | `exemplar.trace_id`         | Traces        | Jump from metric spike     |
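+
+The trace-to-log link in the first row can be wired up when provisioning the Tempo data source in Grafana. The sketch below assumes a Loki data source with UID `loki` and that logs carry a `job` label whose value equals the span's `service.name`. The file path, URL, and UIDs are placeholders.
+
+```yaml
+# grafana/provisioning/datasources/tempo.yaml (hypothetical path)
+apiVersion: 1
+
+datasources:
+  - name: Tempo
+    type: tempo
+    uid: tempo
+    access: proxy
+    url: http://tempo:3200               # placeholder Tempo query endpoint
+    jsonData:
+      tracesToLogsV2:
+        datasourceUid: loki              # must match the Loki data source UID
+        spanStartTimeShift: '-5m'        # widen the log search window around the span
+        spanEndTimeShift: '5m'
+        tags:
+          - key: service.name            # span/resource attribute
+            value: job                   # Loki label it maps to
+        filterByTraceID: true            # adds a trace ID line filter to the log query
+```
+
+With `filterByTraceID` enabled, the generated Loki query filters on the trace ID, so it only returns entries once PerfLog actually emits the `trace_id` field marked as new in the table.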
+
+### 7.7.3 Example: Debugging a Slow Transaction
+
+**Step 1: Find the trace**
+```
+# In Grafana Explore with Tempo
+{resource.service.name="rippled" && span.xrpl.tx.hash="ABC123..."}
+```
+
+**Step 2: Get the trace_id from the trace view**
+```
+Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
+```
+
+**Step 3: Find related PerfLog entries**
+```
+# In Grafana Explore with Loki
+{job="rippled"} |= "4bf92f3577b34da6a3ce929d0e0e4736"
+```
+
+**Step 4: Check Insight metrics for the time window**
+```
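+# In Grafana with Prometheus. The @ modifier pins evaluation to a Unix timestamp;
+# replace <trace_start_unix> with the start time of the trace from Step 2.
+rate(rippled_tx_applied_total[1m] @ <trace_start_unix>)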
+```
+
+### 7.7.4 Unified Dashboard Example
+
+```json
+{
+  "title": "rippled Unified Observability",
+  "uid": "rippled-unified",
+  "panels": [
+    {
+      "title": "Transaction Latency (Traces)",
+      "type": "timeseries",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\" && name=\"tx.receive\"} | histogram_over_time(duration)"
+        }
+      ],
+      "gridPos": { "h": 6, "w": 8, "x": 0, "y": 0 }
+    },
+    {
+      "title": "Transaction Rate (Metrics)",
+      "type": "timeseries",
+      "datasource": "Prometheus",
+      "targets": [
+        {
+          "expr": "rate(rippled_tx_received_total[5m])",
+          "legendFormat": "{{ instance }}"
+        }
+      ],
+      "fieldConfig": {
+        "defaults": {
+          "links": [
+            {
+              "title": "View traces",
+              "url": "/explore?left={\"datasource\":\"Tempo\",\"query\":\"{resource.service.name=\\\"rippled\\\" && name=\\\"tx.receive\\\"}\"}"
+            }
+          ]
+        }
+      },
+      "gridPos": { "h": 6, "w": 8, "x": 8, "y": 0 }
+    },
+    {
+      "title": "Recent Logs",
+      "type": "logs",
+      "datasource": "Loki",
+      "targets": [
+        {
+          "expr": "{job=\"rippled\"} | json"
+        }
+      ],
+      "gridPos": { "h": 6, "w": 8, "x": 16, "y": 0 }
+    },
+    {
+      "title": "Trace Search",
+      "type": "table",
+      "datasource": "Tempo",
+      "targets": [
+        {
+          "queryType": "traceql",
+          "query": "{resource.service.name=\"rippled\"}"
+        }
+      ],
+      "fieldConfig": {
+        "overrides": [
+          {
+            "matcher": { "id": "byName", "options": "traceID" },
+            "properties": [
+              {
+                "id": "links",
+                "value": [
+                  {
+                    "title": "View trace",
+                    "url": "/explore?left={\"datasource\":\"Tempo\",\"query\":\"${__value.raw}\"}"
+                  },
+                  {
+                    "title": "View logs",
+                    "url": "/explore?left={\"datasource\":\"Loki\",\"query\":\"{job=\\\"rippled\\\"} |= \\\"${__value.raw}\\\"\"}"
+                  }
+                ]
+              }
+            ]
+          }
+        ]
+      },
+      "gridPos": { "h": 12, "w": 24, "x": 0, "y": 6 }
+    }
+  ]
+}
+```
+
+---
+
+*Previous: [Implementation Phases](./06-implementation-phases.md)* | *Next: [Appendix](./08-appendix.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/08-appendix.md
+++ b/OpenTelemetryPlan/08-appendix.md
@@ -0,0 +1,133 @@
+# Appendix
+
+> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
+> **Related**: [Observability Backends](./07-observability-backends.md)
+
+---
+
+## 8.1 Glossary
+
+| Term                  | Definition                                                 |
+| --------------------- | ---------------------------------------------------------- |
+| **Span**              | A unit of work with start/end time, name, and attributes   |
+| **Trace**             | A collection of spans representing a complete request flow |
+| **Trace ID**          | 128-bit unique identifier for a trace                      |
+| **Span ID**           | 64-bit unique identifier for a span within a trace         |
+| **Context**           | Carrier for trace/span IDs across boundaries               |
+| **Propagator**        | Component that injects/extracts context                    |
+| **Sampler**           | Decides which traces to record                             |
+| **Exporter**          | Sends spans to backend                                     |
+| **Collector**         | Receives, processes, and forwards telemetry                |
+| **OTLP**              | OpenTelemetry Protocol (wire format)                       |
+| **W3C Trace Context** | Standard HTTP headers for trace propagation                |
+| **Baggage**           | Key-value pairs propagated across service boundaries       |
+| **Resource**          | Entity producing telemetry (service, host, etc.)           |
+| **Instrumentation**   | Code that creates telemetry data                           |
+
+### rippled-Specific Terms
+
+| Term              | Definition                                         |
+| ----------------- | -------------------------------------------------- |
+| **Overlay**       | P2P network layer managing peer connections        |
+| **Consensus**     | XRP Ledger consensus algorithm (RCL)               |
+| **Proposal**      | Validator's suggested transaction set for a ledger |
+| **Validation**    | Validator's signature on a closed ledger           |
+| **HashRouter**    | Component for transaction deduplication            |
+| **JobQueue**      | Thread pool for asynchronous task execution        |
+| **PerfLog**       | Existing performance logging system in rippled     |
+| **Beast Insight** | Existing metrics framework in rippled              |
+
+---
+
+## 8.2 Span Hierarchy Visualization
+
+```mermaid
+flowchart TB
+    subgraph trace["Trace: Transaction Lifecycle"]
+        rpc["rpc.submit<br/>(entry point)"]
+        validate["tx.validate"]
+        relay["tx.relay<br/>(parent span)"]
+
+        subgraph peers["Peer Spans"]
+            p1["peer.send<br/>Peer A"]
+            p2["peer.send<br/>Peer B"]
+            p3["peer.send<br/>Peer C"]
+        end
+
+        consensus["consensus.round"]
+        apply["tx.apply"]
+    end
+
+    rpc --> validate
+    validate --> relay
+    relay --> p1
+    relay --> p2
+    relay --> p3
+    p1 -.->|"context propagation"| consensus
+    consensus --> apply
+
+    style trace fill:#0f172a,stroke:#020617,color:#fff
+    style peers fill:#1e3a8a,stroke:#172554,color:#fff
+    style rpc fill:#1d4ed8,stroke:#1e40af,color:#fff
+    style validate fill:#047857,stroke:#064e3b,color:#fff
+    style relay fill:#047857,stroke:#064e3b,color:#fff
+    style p1 fill:#0e7490,stroke:#155e75,color:#fff
+    style p2 fill:#0e7490,stroke:#155e75,color:#fff
+    style p3 fill:#0e7490,stroke:#155e75,color:#fff
+    style consensus fill:#fef3c7,stroke:#fde68a,color:#1e293b
+    style apply fill:#047857,stroke:#064e3b,color:#fff
+```
+
+---
+
+## 8.3 References
+
+### OpenTelemetry Resources
+
+1. [OpenTelemetry C++ SDK](https://github.com/open-telemetry/opentelemetry-cpp)
+2. [OpenTelemetry Specification](https://opentelemetry.io/docs/specs/otel/)
+3. [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/)
+4. [OTLP Protocol Specification](https://opentelemetry.io/docs/specs/otlp/)
+
+### Standards
+
+5. [W3C Trace Context](https://www.w3.org/TR/trace-context/)
+6. [W3C Baggage](https://www.w3.org/TR/baggage/)
+7. [Protocol Buffers](https://protobuf.dev/)
+
+### rippled Resources
+
+8. [rippled Source Code](https://github.com/XRPLF/rippled)
+9. [XRP Ledger Documentation](https://xrpl.org/docs/)
+10. [rippled Overlay README](https://github.com/XRPLF/rippled/blob/develop/src/xrpld/overlay/README.md)
+11. [rippled RPC README](https://github.com/XRPLF/rippled/blob/develop/src/xrpld/rpc/README.md)
+12. [rippled Consensus README](https://github.com/XRPLF/rippled/blob/develop/src/xrpld/app/consensus/README.md)
+
+---
+
+## 8.4 Version History
+
+| Version | Date       | Author | Changes                           |
+| ------- | ---------- | ------ | --------------------------------- |
+| 1.0     | 2026-02-12 | -      | Initial implementation plan       |
+| 1.1     | 2026-02-13 | -      | Refactored into modular documents |
+
+---
+
+## 8.5 Document Index
+
+| Document                                                         | Description                                |
+| ---------------------------------------------------------------- | ------------------------------------------ |
+| [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)                   | Master overview and executive summary      |
+| [01-architecture-analysis.md](./01-architecture-analysis.md)     | rippled architecture and trace points      |
+| [02-design-decisions.md](./02-design-decisions.md)               | SDK selection, exporters, span conventions |
+| [03-implementation-strategy.md](./03-implementation-strategy.md) | Directory structure, performance analysis  |
+| [04-code-samples.md](./04-code-samples.md)                       | C++ code examples for all components       |
+| [05-configuration-reference.md](./05-configuration-reference.md) | rippled config, CMake, Collector configs   |
+| [06-implementation-phases.md](./06-implementation-phases.md)     | Timeline, tasks, risks, success metrics    |
+| [07-observability-backends.md](./07-observability-backends.md)   | Backend selection and architecture         |
+| [08-appendix.md](./08-appendix.md)                               | Glossary, references, version history      |
+
+---
+
+*Previous: [Observability Backends](./07-observability-backends.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
--- a/OpenTelemetryPlan/OpenTelemetryPlan.md
+++ b/OpenTelemetryPlan/OpenTelemetryPlan.md
@@ -0,0 +1,189 @@
+# [OpenTelemetry](00-tracing-fundamentals.md) Distributed Tracing Implementation Plan for rippled (xrpld)
+
+## Executive Summary
+
+This document provides a comprehensive implementation plan for integrating OpenTelemetry distributed tracing into the rippled XRP Ledger node software. The plan addresses the unique challenges of a decentralized peer-to-peer system where trace context must propagate across network boundaries between independent nodes.
+
+### Key Benefits
+
+- **End-to-end transaction visibility**: Track transactions from submission through consensus to ledger inclusion
+- **Consensus round analysis**: Understand timing and behavior of consensus phases across validators
+- **RPC performance insights**: Identify slow handlers and optimize response times
+- **Network topology understanding**: Visualize message propagation patterns between peers
+- **Incident debugging**: Correlate events across distributed nodes during issues
+
+### Estimated Performance Overhead
+
+| Metric        | Overhead   | Notes                               |
+| ------------- | ---------- | ----------------------------------- |
+| CPU           | 1-3%       | Span creation and attribute setting |
+| Memory        | 2-5 MB     | Batch buffer for pending spans      |
+| Network       | 10-50 KB/s | Compressed OTLP export to collector |
+| Latency (p99) | <2%        | With proper sampling configuration  |
+
+---
+
+## Document Structure
+
+This implementation plan is organized into modular documents for easier navigation:
+
+<div align="center">
+
+```mermaid
+flowchart TB
+    overview["📋 OpenTelemetryPlan.md<br/>(This Document)"]
+
+    subgraph analysis["Analysis & Design"]
+        arch["01-architecture-analysis.md"]
+        design["02-design-decisions.md"]
+    end
+
+    subgraph impl["Implementation"]
+        strategy["03-implementation-strategy.md"]
+        code["04-code-samples.md"]
+        config["05-configuration-reference.md"]
+    end
+
+    subgraph deploy["Deployment & Planning"]
+        phases["06-implementation-phases.md"]
+        backends["07-observability-backends.md"]
+        appendix["08-appendix.md"]
+    end
+
+    overview --> analysis
+    overview --> impl
+    overview --> deploy
+
+    arch --> design
+    design --> strategy
+    strategy --> code
+    code --> config
+    config --> phases
+    phases --> backends
+    backends --> appendix
+
+    style overview fill:#1b5e20,stroke:#0d3d14,color:#fff,stroke-width:2px
+    style analysis fill:#0d47a1,stroke:#082f6a,color:#fff
+    style impl fill:#bf360c,stroke:#8c2809,color:#fff
+    style deploy fill:#4a148c,stroke:#2e0d57,color:#fff
+    style arch fill:#0d47a1,stroke:#082f6a,color:#fff
+    style design fill:#0d47a1,stroke:#082f6a,color:#fff
+    style strategy fill:#bf360c,stroke:#8c2809,color:#fff
+    style code fill:#bf360c,stroke:#8c2809,color:#fff
+    style config fill:#bf360c,stroke:#8c2809,color:#fff
+    style phases fill:#4a148c,stroke:#2e0d57,color:#fff
+    style backends fill:#4a148c,stroke:#2e0d57,color:#fff
+    style appendix fill:#4a148c,stroke:#2e0d57,color:#fff
+```
+
+</div>
+
+---
+
+## Table of Contents
+
+| Section | Document                                                   | Description                                                            |
+| ------- | ---------------------------------------------------------- | ---------------------------------------------------------------------- |
+| **1**   | [Architecture Analysis](./01-architecture-analysis.md)     | rippled component analysis, trace points, instrumentation priorities   |
+| **2**   | [Design Decisions](./02-design-decisions.md)               | SDK selection, exporters, span naming, attributes, context propagation |
+| **3**   | [Implementation Strategy](./03-implementation-strategy.md) | Directory structure, key principles, performance optimization          |
+| **4**   | [Code Samples](./04-code-samples.md)                       | Complete C++ implementation examples for all components                |
+| **5**   | [Configuration Reference](./05-configuration-reference.md) | rippled config, CMake integration, Collector configurations            |
+| **6**   | [Implementation Phases](./06-implementation-phases.md)     | 5-phase timeline, tasks, risks, success metrics                        |
+| **7**   | [Observability Backends](./07-observability-backends.md)   | Backend selection guide and production architecture                    |
+| **8**   | [Appendix](./08-appendix.md)                               | Glossary, references, version history                                  |
+
+---
+
+## 1. Architecture Analysis
+
+The rippled node consists of several key components that require instrumentation for comprehensive distributed tracing. The main areas include the RPC server (HTTP/WebSocket), Overlay P2P network, Consensus mechanism (RCLConsensus), JobQueue for async task execution, and existing observability infrastructure (PerfLog, Insight/StatsD, Journal logging).
+
+Key trace points cover transaction submission via RPC, peer-to-peer message propagation, consensus round execution, and ledger building. The implementation prioritizes high-value, low-risk components first: RPC handlers provide immediate value with minimal risk, while consensus tracing requires careful implementation to avoid timing impacts.
+
+➡️ **[Read full Architecture Analysis](./01-architecture-analysis.md)**
+
+---
+
+## 2. Design Decisions
+
+The OpenTelemetry C++ SDK is selected for its CNCF backing, active development, and native performance characteristics. Traces are exported via OTLP/gRPC (primary) or OTLP/HTTP (fallback) to an OpenTelemetry Collector, which provides flexible routing and sampling.
+
+Span naming follows a hierarchical `<component>.<operation>` convention (e.g., `rpc.submit`, `tx.relay`, `consensus.round`). Context propagation uses W3C Trace Context headers for HTTP and embedded Protocol Buffer fields for P2P messages. The implementation coexists with existing PerfLog and Insight observability systems through correlation IDs.
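
For the HTTP side, the W3C Trace Context `traceparent` header has the fixed shape `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`. A minimal sketch of parsing and re-emitting that header with only the standard library might look like the following. `TraceParent`, `parseTraceParent`, and `formatTraceParent` are hypothetical names chosen for illustration; a real integration would more likely use the propagators shipped with the OpenTelemetry C++ SDK.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical holder for the fields of a version-00 `traceparent` header.
struct TraceParent
{
    std::string traceId;   // 32 lowercase hex characters (16 bytes)
    std::string parentId;  // 16 lowercase hex characters (8 bytes)
    bool sampled = false;  // bit 0 of the trace-flags field
};

// True if every character is a lowercase hex digit.
inline bool isLowerHex(std::string_view s)
{
    for (char c : s)
        if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
            return false;
    return true;
}

// True if the string consists only of '0' characters.
inline bool isAllZero(std::string_view s)
{
    return s.find_first_not_of('0') == std::string_view::npos;
}

// Parses "00-<trace-id>-<parent-id>-<flags>"; returns nullopt on any violation.
inline std::optional<TraceParent> parseTraceParent(std::string_view h)
{
    if (h.size() != 55 || h[2] != '-' || h[35] != '-' || h[52] != '-')
        return std::nullopt;
    auto const version = h.substr(0, 2);
    auto const traceId = h.substr(3, 32);
    auto const parentId = h.substr(36, 16);
    auto const flags = h.substr(53, 2);
    if (version != "00" || !isLowerHex(traceId) || !isLowerHex(parentId) ||
        !isLowerHex(flags))
        return std::nullopt;
    // All-zero IDs are invalid per the W3C specification.
    if (isAllZero(traceId) || isAllZero(parentId))
        return std::nullopt;
    int const lowNibble = (flags[1] <= '9') ? flags[1] - '0' : flags[1] - 'a' + 10;
    return TraceParent{std::string(traceId), std::string(parentId), (lowNibble & 1) != 0};
}

// Re-emits the header; only the sampled bit is preserved in this sketch.
inline std::string formatTraceParent(TraceParent const& tp)
{
    return "00-" + tp.traceId + "-" + tp.parentId + (tp.sampled ? "-01" : "-00");
}
```

The same three fields (trace ID, parent span ID, flags) are what the Protocol Buffer fields for P2P messages would need to carry.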
+
+**Data Collection & Privacy**: Telemetry collects only operational metadata (timing, counts, hashes), never sensitive content (private keys, balances, amounts, raw payloads). Privacy protection includes account hashing, configurable redaction, sampling, and collector-level filtering. Node operators retain full control over what data is exported; the specific controls are not yet documented here.
+
+➡️ **[Read full Design Decisions](./02-design-decisions.md)**
+
+---
+
+## 3. Implementation Strategy
+
+The telemetry code is organized under `include/xrpl/telemetry/` for headers and `src/libxrpl/telemetry/` for implementation. Key principles include RAII-based span management via `SpanGuard`, conditional compilation with `XRPL_ENABLE_TELEMETRY`, and minimal runtime overhead through batch processing and efficient sampling.
+
+Performance optimization strategies include probabilistic head sampling (10% default), tail-based sampling at the collector for errors and slow traces, batch export to reduce network overhead, and conditional instrumentation that compiles to no-ops when disabled.
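
Head sampling can be made consistent across nodes by deriving the decision from the trace ID itself, so every node that sees the same `trace_id` reaches the same verdict. The following is a simplified sketch in the spirit of ratio-based samplers, not the OpenTelemetry SDK's actual algorithm; `TraceId` and `shouldSample` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

using TraceId = std::array<std::uint8_t, 16>;

// Keeps a trace when the first 8 bytes of its ID, read as a big-endian
// integer, fall below ratio * 2^64. Assumes trace IDs are uniformly random.
inline bool shouldSample(TraceId const& id, double ratio)
{
    if (ratio <= 0.0)
        return false;
    if (ratio >= 1.0)
        return true;
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i)
        value = (value << 8) | id[i];
    // 2^64 is exactly representable as a double; the product stays below
    // 2^64 because ratio < 1, so the conversion cannot overflow.
    auto const threshold =
        static_cast<std::uint64_t>(ratio * 18446744073709551616.0);
    return value < threshold;
}
```

With the 10% default, `shouldSample(id, 0.10)` keeps roughly one trace in ten, and no coordination between nodes is needed because the input is the shared trace ID.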
+
+➡️ **[Read full Implementation Strategy](./03-implementation-strategy.md)**
+
+---
+
+## 4. Code Samples
+
+Complete C++ implementation examples are provided for all telemetry components:
+- `Telemetry.h` - Core interface for tracer access and span creation
+- `SpanGuard.h` - RAII wrapper for automatic span lifecycle management
+- `TracingInstrumentation.h` - Macros for conditional instrumentation
+- Protocol Buffer extensions for trace context propagation
+- Module-specific instrumentation (RPC, Consensus, P2P, JobQueue)
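
To make the RAII and conditional-compilation ideas concrete, here is a minimal stand-in that does not depend on the OpenTelemetry SDK. It is not the `SpanGuard.h` from the code samples document: `FinishedSpan`, the `Sink` callback, and the `XRPL_TRACE_SPAN` macro are hypothetical, and only `SpanGuard` and `XRPL_ENABLE_TELEMETRY` are names taken from this plan.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Stand-in for a finished span; a real build would hand this to the SDK.
struct FinishedSpan
{
    std::string name;
    std::chrono::nanoseconds duration;
};

// Starts timing on construction and reports the span on destruction,
// so early returns and exceptions still end the span.
class SpanGuard
{
public:
    using Sink = std::function<void(FinishedSpan const&)>;

    SpanGuard(std::string name, Sink sink)
        : name_(std::move(name))
        , sink_(std::move(sink))
        , start_(std::chrono::steady_clock::now())
    {
    }

    SpanGuard(SpanGuard const&) = delete;
    SpanGuard& operator=(SpanGuard const&) = delete;

    ~SpanGuard()
    {
        auto const elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_);
        if (sink_)
            sink_(FinishedSpan{name_, elapsed});
    }

private:
    std::string name_;
    Sink sink_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical macro: expands to a guard when telemetry is compiled in,
// and to a no-op expression otherwise.
#ifdef XRPL_ENABLE_TELEMETRY
#define XRPL_TRACE_SPAN(name, sink) SpanGuard xrplSpanGuard_{name, sink}
#else
#define XRPL_TRACE_SPAN(name, sink) ((void)0)
#endif
```

Because the macro arguments are not evaluated in the disabled branch, call sites cost nothing when `XRPL_ENABLE_TELEMETRY` is off.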
+
+➡️ **[View all Code Samples](./04-code-samples.md)**
+
+---
+
+## 5. Configuration Reference
+
+Configuration is handled through the `[telemetry]` section in `xrpld.cfg` with options for enabling/disabling, exporter selection, endpoint configuration, sampling ratios, and component-level filtering. CMake integration includes an `XRPL_ENABLE_TELEMETRY` option for compile-time control.
+
+OpenTelemetry Collector configurations are provided for development (with Jaeger) and production (with tail-based sampling, Tempo, and Elastic APM). Docker Compose examples enable quick local development environment setup.
+
+➡️ **[View full Configuration Reference](./05-configuration-reference.md)**
+
+---
+
+## 6. Implementation Phases
+
+The implementation spans 9 weeks across 5 phases:
+
+| Phase | Duration  | Focus               | Key Deliverables                                    |
+| ----- | --------- | ------------------- | --------------------------------------------------- |
+| 1     | Weeks 1-2 | Core Infrastructure | SDK integration, Telemetry interface, Configuration |
+| 2     | Weeks 3-4 | RPC Tracing         | HTTP context extraction, Handler instrumentation    |
+| 3     | Weeks 5-6 | Transaction Tracing | Protocol Buffer context, Relay propagation          |
+| 4     | Weeks 7-8 | Consensus Tracing   | Round spans, Proposal/validation tracing            |
+| 5     | Week 9    | Documentation       | Runbook, Dashboards, Training                       |
+
+**Total Effort**: 47 developer-days with 2 developers
+
+➡️ **[View full Implementation Phases](./06-implementation-phases.md)**
+
+---
+
+## 7. Observability Backends
+
+For development and testing, Jaeger provides easy setup with a good UI. For production deployments, Grafana Tempo is recommended for its cost-effectiveness and Grafana integration, while Elastic APM is ideal for organizations with existing Elastic infrastructure.
+
+The recommended production architecture uses a gateway collector pattern with regional collectors performing tail-based sampling, routing traces to multiple backends (Tempo for primary storage, Elastic for log correlation, S3/GCS for long-term archive).
+
+➡️ **[View Observability Backend Recommendations](./07-observability-backends.md)**
+
+---
+
+## 8. Appendix
+
+The appendix contains a glossary of OpenTelemetry and rippled-specific terms, references to external documentation and specifications, version history for this implementation plan, and a complete document index.
+
+➡️ **[View Appendix](./08-appendix.md)**
+
+---
+
+*For detailed information on any section, follow the links to the corresponding sub-documents.*
--- a/presentation.md
+++ b/presentation.md
@@ -0,0 +1,263 @@
+# OpenTelemetry Distributed Tracing for rippled
+
+---
+
+## Slide 1: Introduction
+
+### What is OpenTelemetry?
+
+OpenTelemetry is an open-source, CNCF-backed observability framework for distributed tracing, metrics, and logs.
+
+### Why OpenTelemetry for rippled?
+
+- **End-to-End Transaction Visibility**: Track transactions from submission → consensus → ledger inclusion
+- **Cross-Node Correlation**: Follow requests across multiple independent nodes using a unique `trace_id`
+- **Consensus Round Analysis**: Understand timing and behavior across validators
+- **Incident Debugging**: Correlate events across distributed nodes during issues
+
+```mermaid
+flowchart LR
+    A["Node A<br/>tx.receive<br/>trace_id: abc123"] --> B["Node B<br/>tx.relay<br/>trace_id: abc123"] --> C["Node C<br/>tx.validate<br/>trace_id: abc123"] --> D["Node D<br/>ledger.apply<br/>trace_id: abc123"]
+
+    style A fill:#1565c0,stroke:#0d47a1,color:#fff
+    style B fill:#2e7d32,stroke:#1b5e20,color:#fff
+    style C fill:#2e7d32,stroke:#1b5e20,color:#fff
+    style D fill:#e65100,stroke:#bf360c,color:#fff
+```
+
+> **Trace ID: abc123** — All nodes share the same trace, enabling cross-node correlation.
+
+---
+
+## Slide 2: Comparison with Existing Solutions
+
+### Current Observability Stack
+
+| Aspect                | PerfLog (JSON)        | StatsD (Metrics)      | OpenTelemetry (NEW)         |
+| --------------------- | --------------------- | --------------------- | --------------------------- |
+| **Type**              | Logging               | Metrics               | Distributed Tracing         |
+| **Scope**             | Single node           | Single node           | **Cross-node**              |
+| **Data**              | JSON log entries      | Counters, gauges      | Spans with context          |
+| **Correlation**       | By timestamp          | By metric name        | By `trace_id`               |
+| **Overhead**          | Low (file I/O)        | Low (UDP)             | Low-Medium (configurable)   |
+| **Question Answered** | "What happened here?" | "How many? How fast?" | **"What was the journey?"** |
+
+### Use Case Matrix
+
+| Scenario                         | PerfLog | StatsD | OpenTelemetry |
+| -------------------------------- | ------- | ------ | ------------- |
+| "How many TXs per second?"       | ❌       | ✅      | ❌             |
+| "Why was this specific TX slow?" | ⚠️       | ❌      | ✅             |
+| "Which node delayed consensus?"  | ❌       | ❌      | ✅             |
+| "Show TX journey across 5 nodes" | ❌       | ❌      | ✅             |
+
+> **Key Insight**: OpenTelemetry **complements** (not replaces) existing systems.
+
+---
+
+## Slide 3: Architecture
+
+### High-Level Integration Architecture
+
+```mermaid
+flowchart TB
+    subgraph rippled["rippled Node"]
+        subgraph services["Core Services"]
+            direction LR
+            RPC["RPC Server<br/>(HTTP/WS)"] ~~~ Overlay["Overlay<br/>(P2P Network)"] ~~~ Consensus["Consensus<br/>(RCLConsensus)"]
+        end
+
+        Telemetry["Telemetry Module<br/>(OpenTelemetry SDK)"]
+
+        services --> Telemetry
+    end
+
+    Telemetry -->|OTLP/gRPC| Collector["OTel Collector"]
+
+    Collector --> Tempo["Grafana Tempo"]
+    Collector --> Jaeger["Jaeger"]
+    Collector --> Elastic["Elastic APM"]
+
+    style rippled fill:#424242,stroke:#212121,color:#fff
+    style services fill:#1565c0,stroke:#0d47a1,color:#fff
+    style Telemetry fill:#2e7d32,stroke:#1b5e20,color:#fff
+    style Collector fill:#e65100,stroke:#bf360c,color:#fff
+```
+
+### Context Propagation
+
+```mermaid
+sequenceDiagram
+    participant Client
+    participant NodeA as Node A
+    participant NodeB as Node B
+
+    Client->>NodeA: Submit TX (no context)
+    Note over NodeA: Creates trace_id: abc123<br/>span: tx.receive
+    NodeA->>NodeB: Relay TX<br/>(traceparent: abc123)
+    Note over NodeB: Links to trace_id: abc123<br/>span: tx.relay
+```
+
+- **HTTP/RPC**: W3C Trace Context headers (`traceparent`)
+- **P2P Messages**: Protocol Buffer extension fields
+
+---
+
+## Slide 4: Implementation Plan
+
+### 5-Phase Rollout (9 Weeks)
+
+```mermaid
+gantt
+    title Implementation Timeline
+    dateFormat  YYYY-MM-DD
+    axisFormat  Week %W
+
+    section Phase 1
+    Core Infrastructure    :p1, 2024-01-01, 2w
+
+    section Phase 2
+    RPC Tracing           :p2, after p1, 2w
+
+    section Phase 3
+    Transaction Tracing   :p3, after p2, 2w
+
+    section Phase 4
+    Consensus Tracing     :p4, after p3, 2w
+
+    section Phase 5
+    Documentation         :p5, after p4, 1w
+```
+
+### Phase Details
+
+| Phase | Focus               | Key Deliverables                             | Effort  |
+| ----- | ------------------- | -------------------------------------------- | ------- |
+| 1     | Core Infrastructure | SDK integration, Telemetry interface, Config | 10 days |
+| 2     | RPC Tracing         | HTTP context extraction, Handler spans       | 10 days |
+| 3     | Transaction Tracing | Protobuf context, P2P relay propagation      | 10 days |
+| 4     | Consensus Tracing   | Round spans, Proposal/validation tracing     | 10 days |
+| 5     | Documentation       | Runbook, Dashboards, Training                | 7 days  |
+
+**Total Effort**: ~47 developer-days (2 developers)
+
+---
+
+## Slide 5: Performance Overhead
+
+### Estimated System Impact
+
+| Metric            | Overhead   | Notes                               |
+| ----------------- | ---------- | ----------------------------------- |
+| **CPU**           | 1-3%       | Span creation and attribute setting |
+| **Memory**        | 2-5 MB     | Batch buffer for pending spans      |
+| **Network**       | 10-50 KB/s | Compressed OTLP export to collector |
+| **Latency (p99)** | <2%        | With proper sampling configuration  |
+
+### Per-Message Overhead (Context Propagation)
+
+Each P2P message carries trace context with the following overhead:
+
+| Field         | Size          | Description                               |
+| ------------- | ------------- | ----------------------------------------- |
+| `trace_id`    | 16 bytes      | Unique identifier for the entire trace    |
+| `span_id`     | 8 bytes       | Current span (becomes parent on receiver) |
+| `trace_flags` | 4 bytes       | Sampling decision flags                   |
+| `trace_state` | 0-4 bytes     | Optional vendor-specific data             |
+| **Total**     | **~32 bytes** | **Added per traced P2P message**          |
+
+```mermaid
+flowchart LR
+    subgraph msg["P2P Message with Trace Context"]
+        A["Original Message<br/>(variable size)"] --> B["+ TraceContext<br/>(~32 bytes)"]
+    end
+
+    subgraph breakdown["Context Breakdown"]
+        C["trace_id<br/>16 bytes"]
+        D["span_id<br/>8 bytes"]
+        E["flags<br/>4 bytes"]
+        F["state<br/>0-4 bytes"]
+    end
+
+    B --> breakdown
+
+    style A fill:#424242,stroke:#212121,color:#fff
+    style B fill:#2e7d32,stroke:#1b5e20,color:#fff
+    style C fill:#1565c0,stroke:#0d47a1,color:#fff
+    style D fill:#1565c0,stroke:#0d47a1,color:#fff
+    style E fill:#e65100,stroke:#bf360c,color:#fff
+    style F fill:#4a148c,stroke:#2e0d57,color:#fff
+```
+
+> **Note**: ~32 bytes is negligible compared to typical transaction messages (hundreds to thousands of bytes).
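
The rows of the table add up to 28 bytes without `trace_state` and 32 bytes with the full 4 bytes of it. A small sketch of that arithmetic follows. The `TraceContext` struct and both helper functions are hypothetical, and the count covers field payloads only, not Protocol Buffer tags or length prefixes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical in-memory form of the per-message trace context.
struct TraceContext
{
    std::array<std::uint8_t, 16> traceId{};  // 16 bytes
    std::array<std::uint8_t, 8> spanId{};    // 8 bytes
    std::uint32_t traceFlags = 0;            // 4 bytes, as budgeted in the table
    std::string traceState;                  // optional, 0-4 bytes in the table
};

// Payload bytes only; wire-format framing is not counted.
inline std::size_t payloadBytes(TraceContext const& ctx)
{
    return ctx.traceId.size() + ctx.spanId.size() + sizeof(ctx.traceFlags) +
        ctx.traceState.size();
}

// Relative overhead of attaching the context to a message of `messageBytes`.
inline double overheadPercent(TraceContext const& ctx, std::size_t messageBytes)
{
    return 100.0 * static_cast<double>(payloadBytes(ctx)) /
        static_cast<double>(messageBytes);
}
```

For an 800-byte message, which is an assumed size within the "hundreds to thousands of bytes" range, a full 32-byte context works out to 4% extra payload.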
+
+### Mitigation Strategies
+
+```mermaid
+flowchart LR
+    A["Head Sampling<br/>10% default"] --> B["Tail Sampling<br/>Keep errors/slow"] --> C["Batch Export<br/>Reduce I/O"] --> D["Conditional Compile<br/>XRPL_ENABLE_TELEMETRY"]
+
+    style A fill:#1565c0,stroke:#0d47a1,color:#fff
+    style B fill:#2e7d32,stroke:#1b5e20,color:#fff
+    style C fill:#e65100,stroke:#bf360c,color:#fff
+    style D fill:#4a148c,stroke:#2e0d57,color:#fff
+```
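
The "keep errors/slow" rule for tail sampling reduces to a simple predicate over the spans of a completed trace. The plan places this decision in the collector, so the sketch below only illustrates the rule. `SpanSummary`, `keepTrace`, and the slow threshold are hypothetical.

```cpp
#include <chrono>
#include <vector>

// Hypothetical per-span summary available once a trace has completed.
struct SpanSummary
{
    bool error;
    std::chrono::milliseconds duration;
};

// Keeps the whole trace if any span failed or ran at least `slowThreshold`.
inline bool keepTrace(
    std::vector<SpanSummary> const& spans,
    std::chrono::milliseconds slowThreshold)
{
    for (auto const& s : spans)
        if (s.error || s.duration >= slowThreshold)
            return true;
    return false;
}
```

Unlike head sampling, this decision needs every span of the trace, which is why it runs after export and not inside the node.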
+
+### Kill Switches (Rollback Options)
+
+1. **Config Disable**: Set `enabled=0` in config → instant disable, no restart needed for sampling
+2. **Rebuild**: Compile with `XRPL_ENABLE_TELEMETRY=OFF` → zero overhead (no-op)
+3. **Full Revert**: Clean separation allows easy commit reversion
+
+---
+
+## Slide 6: Data Collection & Privacy
+
+### What Data is Collected
+
+| Category        | Attributes Collected                                                                | Purpose                     |
+| --------------- | ----------------------------------------------------------------------------------- | --------------------------- |
+| **Transaction** | `tx.hash`, `tx.type`, `tx.result`, `tx.fee`, `ledger_index`                         | Trace transaction lifecycle |
+| **Consensus**   | `round`, `phase`, `mode`, `proposers` (public key or public node ID), `duration_ms` | Analyze consensus timing    |
+| **RPC**         | `command`, `version`, `status`, `duration_ms`                                       | Monitor RPC performance     |
+| **Peer**        | `peer.id` (public key), `latency_ms`, `message.type`, `message.size`                | Network topology analysis   |
+| **Ledger**      | `ledger.hash`, `ledger.index`, `close_time`, `tx_count`                             | Ledger progression tracking |
+| **Job**         | `job.type`, `queue_ms`, `worker`                                                    | JobQueue performance        |
+
+### What is NOT Collected (Privacy Guarantees)
+
+```mermaid
+flowchart LR
+    subgraph notCollected["❌ NOT Collected"]
+        direction LR
+        A["Private Keys"] ~~~ B["Account Balances"] ~~~ C["Transaction Amounts"]
+    end
+
+    subgraph alsoNot["❌ Also Excluded"]
+        direction LR
+        D["IP Addresses<br/>(configurable)"] ~~~ E["Personal Data"] ~~~ F["Raw TX Payloads"]
+    end
+
+    style A fill:#c62828,stroke:#8c2809,color:#fff
+    style B fill:#c62828,stroke:#8c2809,color:#fff
+    style C fill:#c62828,stroke:#8c2809,color:#fff
+    style D fill:#c62828,stroke:#8c2809,color:#fff
+    style E fill:#c62828,stroke:#8c2809,color:#fff
+    style F fill:#c62828,stroke:#8c2809,color:#fff
+```
+
+### Privacy Protection Mechanisms
+
+| Mechanism                  | Description                                                   |
+| -------------------------- | ------------------------------------------------------------- |
+| **Account Hashing**        | `xrpl.tx.account` is hashed at collector level before storage |
+| **Configurable Redaction** | Sensitive fields can be excluded via config                   |
+| **Sampling**               | Only 10% of traces recorded by default (reduces exposure)     |
+| **Local Control**          | Node operators control what gets exported                     |
+| **No Raw Payloads**        | Transaction content is never recorded, only metadata          |
+
+> **Key Principle**: Telemetry collects **operational metadata** (timing, counts, hashes) — never **sensitive content** (keys, balances, amounts).
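
Configurable redaction can be pictured as a deny-list applied to a span's attributes before export. This is a minimal sketch under that assumption: `Attributes` and `redact` are hypothetical names, and the hashing of `xrpl.tx.account` described in the table happens at the collector and is not shown here.

```cpp
#include <map>
#include <set>
#include <string>

using Attributes = std::map<std::string, std::string>;

// Returns a copy of `attrs` with every key listed in `redacted` removed.
inline Attributes redact(Attributes const& attrs, std::set<std::string> const& redacted)
{
    Attributes out;
    for (auto const& [key, value] : attrs)
        if (redacted.count(key) == 0)
            out.emplace(key, value);
    return out;
}
```

An operator who does not want account identifiers to leave the node at all could list `xrpl.tx.account` in the deny-list, leaving attributes such as `tx.hash` and `tx.type` intact.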
+
+---
+
+*End of Presentation*