Compare commits


1 Commit

Author SHA1 Message Date
Pratik Mankawde
a6a6a7c0fd Phase 1a: OpenTelemetry plan documentation
Add comprehensive planning documentation for the OpenTelemetry
distributed tracing integration:

- Tracing fundamentals and concepts
- Architecture analysis of rippled's tracing surface area
- Design decisions and trade-offs
- Implementation strategy and code samples
- Configuration reference
- Implementation phases roadmap
- Observability backend comparison
- POC task list and presentation materials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 17:50:27 +00:00
30 changed files with 5986 additions and 4067 deletions

File diff suppressed because it is too large.


@@ -0,0 +1,244 @@
# Distributed Tracing Fundamentals
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Next**: [Architecture Analysis](./01-architecture-analysis.md)
---
## What is Distributed Tracing?
Distributed tracing is a method for tracking requests as they flow through distributed systems. In a network like the XRP Ledger, a single transaction touches multiple independent nodes, each with its own logs and no shared state. Distributed tracing connects these dots.
**Without tracing:** You see isolated logs on each node with no way to correlate them.
**With tracing:** You see the complete journey of a transaction or an event across all nodes it touched.
---
## Core Concepts
### 1. Trace
A **trace** represents the entire journey of a request through the system. It has a unique `trace_id` that stays constant across all nodes.
```
Trace ID: abc123
├── Node A: received transaction
├── Node B: relayed transaction
├── Node C: included in consensus
└── Node D: applied to ledger
```
### 2. Span
A **span** represents a single unit of work within a trace. Each span has:
| Attribute | Description | Example |
| ---------------- | --------------------- | -------------------------- |
| `trace_id` | Links to parent trace | `abc123` |
| `span_id` | Unique identifier | `span456` |
| `parent_span_id` | Parent span (if any) | `p_span123` |
| `name` | Operation name | `rpc.submit` |
| `start_time` | When work began | `2024-01-15T10:30:00Z` |
| `end_time` | When work completed | `2024-01-15T10:30:00.050Z` |
| `attributes` | Key-value metadata | `tx.hash=ABC...` |
| `status` | OK, ERROR (with message), or UNSET | `OK` |
### 3. Trace Context
**Trace context** is the data that propagates between services to link spans together. It contains:
- `trace_id` - The trace this span belongs to
- `span_id` - The current span (becomes parent for child spans)
- `trace_flags` - Sampling decisions
---
## How Spans Form a Trace
Spans have parent-child relationships forming a tree structure:
```mermaid
flowchart TB
subgraph trace["Trace: abc123"]
A["tx.submit<br/>span_id: 001<br/>50ms"] --> B["tx.validate<br/>span_id: 002<br/>5ms"]
A --> C["tx.relay<br/>span_id: 003<br/>10ms"]
A --> D["tx.apply<br/>span_id: 004<br/>30ms"]
D --> E["ledger.update<br/>span_id: 005<br/>20ms"]
end
style A fill:#0d47a1,stroke:#082f6a,color:#ffffff
style B fill:#1b5e20,stroke:#0d3d14,color:#ffffff
style C fill:#1b5e20,stroke:#0d3d14,color:#ffffff
style D fill:#1b5e20,stroke:#0d3d14,color:#ffffff
style E fill:#bf360c,stroke:#8c2809,color:#ffffff
```
The same trace visualized as a **timeline (Gantt chart)**:
```
Time → 0ms 10ms 20ms 30ms 40ms 50ms
├───────────────────────────────────────────┤
tx.submit│▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
├─────┤
tx.valid │▓▓▓▓▓│
│ ├──────────┤
tx.relay │ │▓▓▓▓▓▓▓▓▓▓│
│ ├────────────────────────────┤
tx.apply │ │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
│ ├──────────────────┤
ledger │ │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
```
---
## Distributed Traces Across Nodes
In distributed systems like rippled, traces span **multiple independent nodes**. The trace context must be propagated in network messages:
```mermaid
sequenceDiagram
participant Client
participant NodeA as Node A
participant NodeB as Node B
participant NodeC as Node C
Client->>NodeA: Submit TX<br/>(no trace context)
Note over NodeA: Creates new trace<br/>trace_id: abc123<br/>span: tx.receive
NodeA->>NodeB: Relay TX<br/>(trace_id: abc123, parent: 001)
Note over NodeB: Creates child span<br/>span: tx.relay<br/>parent_span_id: 001
NodeA->>NodeC: Relay TX<br/>(trace_id: abc123, parent: 001)
Note over NodeC: Creates child span<br/>span: tx.relay<br/>parent_span_id: 001
Note over NodeA,NodeC: All spans share trace_id: abc123<br/>enabling correlation across nodes
```
---
## Context Propagation
For traces to work across nodes, **trace context must be propagated** in messages.
### What's in the Context (~32 bytes)
| Field | Size | Description |
| ------------- | ---------- | ------------------------------------------------------- |
| `trace_id` | 16 bytes | Identifies the entire trace (constant across all nodes) |
| `span_id` | 8 bytes | The sender's current span (becomes parent on receiver) |
| `trace_flags` | 4 bytes | Sampling decision flags |
| `trace_state` | ~0-4 bytes | Optional vendor-specific data |
### How span_id Changes at Each Hop
Only **one** `span_id` travels in the context - the sender's current span. Each node:
1. Extracts the received `span_id` and uses it as the `parent_span_id`
2. Creates a **new** `span_id` for its own span
3. Sends its own `span_id` as the parent when forwarding
```
Node A Node B Node C
────── ────── ──────
Span AAA Span BBB Span CCC
│ │ │
▼ ▼ ▼
Context out: Context out: Context out:
├─ trace_id: abc123 ├─ trace_id: abc123 ├─ trace_id: abc123
├─ span_id: AAA ──────────► ├─ span_id: BBB ──────────► ├─ span_id: CCC ──────►
└─ flags: 01 └─ flags: 01 └─ flags: 01
│ │
parent = AAA parent = BBB
```
The `trace_id` stays constant, but `span_id` **changes at every hop** to maintain the parent-child chain.
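A minimal C++ sketch of this per-hop rule. The types and helpers here (`TraceContext`, `newSpanId`, `onMessage`) are illustrative stand-ins, not SDK or rippled API:
```cpp
#include <cstdint>
#include <random>
#include <string>

// Hypothetical wire representation of the propagated context.
struct TraceContext {
    std::string trace_id;    // constant for the whole trace
    std::string span_id;     // sender's current span; parent on the receiver
    std::uint8_t flags = 1;  // sampling decision
};

std::string newSpanId() {
    // Illustrative only: real span IDs are 8 random bytes, hex-encoded.
    static std::mt19937_64 rng{std::random_device{}()};
    return std::to_string(rng());
}

// What each node does when a message arrives and is forwarded.
TraceContext onMessage(TraceContext const& incoming) {
    std::string parentSpanId = incoming.span_id;  // 1. received span_id becomes parent
    TraceContext outgoing = incoming;             // trace_id and flags are preserved
    outgoing.span_id = newSpanId();               // 2-3. our new span_id travels onward
    return outgoing;
}
```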
### Propagation Formats
There are two patterns:
#### HTTP/RPC Headers (W3C Trace Context)
```
traceparent: 00-abc123def456-span789-01
│ │ │ │
│ │ │ └── Flags (sampled)
│ │ └── Parent span ID
│ └── Trace ID
└── Version
```
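For illustration, a small helper that formats a `traceparent` header from already-hex-encoded IDs. (Per the W3C spec, real trace IDs are 32 hex characters and span IDs 16; the values in the diagram above are abbreviated.)
```cpp
#include <cstdio>
#include <string>

// Build "version-traceid-spanid-flags" per the W3C Trace Context spec.
std::string makeTraceparent(
    std::string const& traceIdHex,  // 32 hex chars
    std::string const& spanIdHex,   // 16 hex chars
    bool sampled) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "00-%s-%s-%02x",
                  traceIdHex.c_str(), spanIdHex.c_str(), sampled ? 1 : 0);
    return buf;
}

// makeTraceparent("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331", true)
// -> "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
```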
#### Protocol Buffers (rippled P2P messages)
```protobuf
message TMTransaction {
bytes rawTransaction = 1;
// ... existing fields ...
// Trace context extension
bytes trace_parent = 100; // W3C traceparent
bytes trace_state = 101; // W3C tracestate
}
```
---
## Sampling
Not every trace needs to be recorded. **Sampling** reduces overhead:
### Head Sampling (at trace start)
```
Request arrives → Random 10% chance → Record or skip entire trace
```
- ✅ Low overhead
- ❌ May miss interesting traces
### Tail Sampling (after trace completes)
```
Trace completes → Collector evaluates:
- Error? → KEEP
- Slow? → KEEP
- Normal? → Sample 10%
```
- ✅ Never loses important traces
- ❌ Higher memory usage at collector
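In the C++ SDK, the head-sampling decision is made by a sampler installed on the tracer provider, while tail sampling is a collector-side feature. A minimal sketch of a 10% head sampler, assuming a recent `opentelemetry-cpp` release:
```cpp
#include <opentelemetry/sdk/trace/samplers/parent.h>
#include <opentelemetry/sdk/trace/samplers/trace_id_ratio.h>

#include <memory>

// 10% head sampling. Wrapping the ratio sampler in ParentBasedSampler
// honors the sampled flag propagated from upstream nodes, so a trace
// sampled at the first node stays sampled across the network.
std::unique_ptr<opentelemetry::sdk::trace::Sampler> makeSampler() {
    auto ratio = std::make_shared<
        opentelemetry::sdk::trace::TraceIdRatioBasedSampler>(0.10);
    return std::make_unique<
        opentelemetry::sdk::trace::ParentBasedSampler>(ratio);
}
// The returned sampler is handed to the tracer provider at construction.
```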
---
## Key Benefits for rippled
| Challenge | How Tracing Helps |
| ---------------------------------- | ---------------------------------------- |
| "Where is my transaction?" | Follow trace across all nodes it touched |
| "Why was consensus slow?" | See timing breakdown of each phase |
| "Which node is the bottleneck?" | Compare span durations across nodes |
| "What happened during the outage?" | Correlate errors across the network |
---
## Glossary
| Term | Definition |
| ------------------- | --------------------------------------------------------------- |
| **Trace** | Complete journey of a request, identified by `trace_id` |
| **Span** | Single operation within a trace |
| **Context** | Data propagated between services (`trace_id`, `span_id`, flags) |
| **Instrumentation** | Code that creates spans and propagates context |
| **Collector** | Service that receives, processes, and exports traces |
| **Backend** | Storage/visualization system (Jaeger, Tempo, etc.) |
| **Head Sampling** | Sampling decision at trace start |
| **Tail Sampling** | Sampling decision after trace completes |
---
_Next: [Architecture Analysis](./01-architecture-analysis.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,330 @@
# Architecture Analysis
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Design Decisions](./02-design-decisions.md) | [Implementation Strategy](./03-implementation-strategy.md)
---
## 1.1 Current rippled Architecture Overview
The rippled node software consists of several interconnected components that need instrumentation for distributed tracing:
```mermaid
flowchart TB
subgraph rippled["rippled Node"]
subgraph services["Core Services"]
RPC["RPC Server<br/>(HTTP/WS/gRPC)"]
Overlay["Overlay<br/>(P2P Network)"]
Consensus["Consensus<br/>(RCLConsensus)"]
end
JobQueue["JobQueue<br/>(Thread Pool)"]
subgraph processing["Processing Layer"]
NetworkOPs["NetworkOPs<br/>(Tx Processing)"]
LedgerMaster["LedgerMaster<br/>(Ledger Mgmt)"]
NodeStore["NodeStore<br/>(Database)"]
end
subgraph observability["Existing Observability"]
PerfLog["PerfLog<br/>(JSON)"]
Insight["Insight<br/>(StatsD)"]
Logging["Logging<br/>(Journal)"]
end
services --> JobQueue
JobQueue --> processing
end
style rippled fill:#424242,stroke:#212121,color:#ffffff
style services fill:#1565c0,stroke:#0d47a1,color:#ffffff
style processing fill:#2e7d32,stroke:#1b5e20,color:#ffffff
style observability fill:#e65100,stroke:#bf360c,color:#ffffff
```
---
## 1.2 Key Components for Instrumentation
| Component | Location | Purpose | Trace Value |
| ----------------- | ------------------------------------------ | ------------------------ | ---------------------------- |
| **Overlay** | `src/xrpld/overlay/` | P2P communication | Message propagation timing |
| **PeerImp** | `src/xrpld/overlay/detail/PeerImp.cpp` | Individual peer handling | Per-peer latency |
| **RCLConsensus** | `src/xrpld/app/consensus/RCLConsensus.cpp` | Consensus algorithm | Round timing, phase analysis |
| **NetworkOPs** | `src/xrpld/app/misc/NetworkOPs.cpp` | Transaction processing | Tx lifecycle tracking |
| **ServerHandler** | `src/xrpld/rpc/detail/ServerHandler.cpp` | RPC entry point | Request latency |
| **RPCHandler** | `src/xrpld/rpc/detail/RPCHandler.cpp` | Command execution | Per-command timing |
| **JobQueue** | `src/xrpld/core/JobQueue.h` | Async task execution | Queue wait times |
---
## 1.3 Transaction Flow Diagram
Transaction flow spans multiple nodes in the network. Each node creates linked spans to form a distributed trace:
```mermaid
sequenceDiagram
participant Client
participant PeerA as Peer A (Receive)
participant PeerB as Peer B (Relay)
participant PeerC as Peer C (Validate)
Client->>PeerA: 1. Submit TX
rect rgb(230, 245, 255)
Note over PeerA: tx.receive SPAN START
PeerA->>PeerA: HashRouter Deduplication
PeerA->>PeerA: tx.validate (child span)
end
PeerA->>PeerB: 2. Relay TX (with trace ctx)
rect rgb(230, 245, 255)
Note over PeerB: tx.receive (linked span)
end
PeerB->>PeerC: 3. Relay TX
rect rgb(230, 245, 255)
Note over PeerC: tx.receive (linked span)
PeerC->>PeerC: tx.process
end
Note over Client,PeerC: DISTRIBUTED TRACE (same trace_id: abc123)
```
### Trace Structure
```
trace_id: abc123
├── span: tx.receive (Peer A)
│ ├── span: tx.validate
│ └── span: tx.relay
├── span: tx.receive (Peer B) [parent: Peer A]
│ └── span: tx.relay
└── span: tx.receive (Peer C) [parent: Peer B]
└── span: tx.process
```
---
## 1.4 Consensus Round Flow
Consensus rounds are multi-phase operations that benefit significantly from tracing:
```mermaid
flowchart TB
subgraph round["consensus.round (root span)"]
attrs["Attributes:<br/>xrpl.consensus.ledger.seq = 12345678<br/>xrpl.consensus.mode = proposing<br/>xrpl.consensus.proposers = 35"]
subgraph open["consensus.phase.open"]
open_desc["Duration: ~3s<br/>Waiting for transactions"]
end
subgraph establish["consensus.phase.establish"]
est_attrs["proposals_received = 28<br/>disputes_resolved = 3"]
est_children["├── consensus.proposal.receive (×28)<br/>├── consensus.proposal.send (×1)<br/>└── consensus.dispute.resolve (×3)"]
end
subgraph accept["consensus.phase.accept"]
acc_attrs["transactions_applied = 150<br/>ledger.hash = DEF456..."]
acc_children["├── ledger.build<br/>└── ledger.validate"]
end
attrs --> open
open --> establish
establish --> accept
end
style round fill:#f57f17,stroke:#e65100,color:#ffffff
style open fill:#1565c0,stroke:#0d47a1,color:#ffffff
style establish fill:#2e7d32,stroke:#1b5e20,color:#ffffff
style accept fill:#c2185b,stroke:#880e4f,color:#ffffff
```
---
## 1.5 RPC Request Flow
RPC requests support W3C Trace Context headers for distributed tracing across services:
```mermaid
flowchart TB
subgraph request["rpc.request (root span)"]
http["HTTP Request<br/>POST /<br/>traceparent: 00-abc123...-def456...-01"]
attrs["Attributes:<br/>http.method = POST<br/>net.peer.ip = 192.168.1.100<br/>xrpl.rpc.command = submit"]
subgraph enqueue["jobqueue.enqueue"]
job_attr["xrpl.job.type = jtCLIENT_RPC"]
end
subgraph command["rpc.command.submit"]
cmd_attrs["xrpl.rpc.version = 2<br/>xrpl.rpc.role = user"]
cmd_children["├── tx.deserialize<br/>├── tx.validate_local<br/>└── tx.submit_to_network"]
end
response["Response: 200 OK<br/>Duration: 45ms"]
http --> attrs
attrs --> enqueue
enqueue --> command
command --> response
end
style request fill:#2e7d32,stroke:#1b5e20,color:#ffffff
style enqueue fill:#1565c0,stroke:#0d47a1,color:#ffffff
style command fill:#e65100,stroke:#bf360c,color:#ffffff
```
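A sketch of how the `traceparent` header above would be extracted with the `opentelemetry-cpp` propagator API. The `HeaderCarrier` adapter and `extractRpcContext` helper are illustrative, and exact signatures assume a recent SDK release:
```cpp
#include <opentelemetry/context/propagation/text_map_propagator.h>
#include <opentelemetry/context/runtime_context.h>
#include <opentelemetry/nostd/string_view.h>
#include <opentelemetry/trace/propagation/http_trace_context.h>

#include <map>
#include <string>

// Adapts a plain header map to the propagator's carrier interface.
class HeaderCarrier : public opentelemetry::context::propagation::TextMapCarrier {
public:
    explicit HeaderCarrier(std::map<std::string, std::string>& headers)
        : headers_(headers) {}

    opentelemetry::nostd::string_view Get(
        opentelemetry::nostd::string_view key) const noexcept override {
        auto it = headers_.find(std::string(key.data(), key.size()));
        if (it == headers_.end())
            return "";
        return it->second;  // lifetime owned by headers_
    }

    void Set(opentelemetry::nostd::string_view key,
             opentelemetry::nostd::string_view value) noexcept override {
        headers_[std::string(key.data(), key.size())] =
            std::string(value.data(), value.size());
    }

private:
    std::map<std::string, std::string>& headers_;
};

// Spans started under the returned context become children of the caller's span.
opentelemetry::context::Context extractRpcContext(
    std::map<std::string, std::string>& headers) {
    HeaderCarrier carrier{headers};
    opentelemetry::trace::propagation::HttpTraceContext propagator;
    auto current = opentelemetry::context::RuntimeContext::GetCurrent();
    return propagator.Extract(carrier, current);
}
```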
---
## 1.6 Key Trace Points
The following table identifies priority instrumentation points across the codebase:
| Category | Span Name | File | Method | Priority |
| --------------- | ---------------------- | -------------------- | ---------------------- | -------- |
| **Transaction** | `tx.receive` | `PeerImp.cpp` | `handleTransaction()` | High |
| **Transaction** | `tx.validate` | `NetworkOPs.cpp` | `processTransaction()` | High |
| **Transaction** | `tx.process` | `NetworkOPs.cpp` | `doTransactionSync()` | High |
| **Transaction** | `tx.relay` | `OverlayImpl.cpp` | `relay()` | Medium |
| **Consensus** | `consensus.round` | `RCLConsensus.cpp` | `startRound()` | High |
| **Consensus** | `consensus.phase.*` | `Consensus.h` | `timerEntry()` | High |
| **Consensus** | `consensus.proposal.*` | `RCLConsensus.cpp` | `peerProposal()` | Medium |
| **RPC** | `rpc.request` | `ServerHandler.cpp` | `onRequest()` | High |
| **RPC** | `rpc.command.*` | `RPCHandler.cpp` | `doCommand()` | High |
| **Peer** | `peer.connect` | `OverlayImpl.cpp` | `onHandoff()` | Low |
| **Peer** | `peer.message.*` | `PeerImp.cpp` | `onMessage()` | Low |
| **Ledger** | `ledger.acquire` | `InboundLedgers.cpp` | `acquire()` | Medium |
| **Ledger** | `ledger.build` | `RCLConsensus.cpp` | `buildLCL()` | High |
---
## 1.7 Instrumentation Priority
```mermaid
quadrantChart
title Instrumentation Priority Matrix
x-axis Low Complexity --> High Complexity
y-axis Low Value --> High Value
    quadrant-1 Plan Carefully
    quadrant-2 Implement First
quadrant-3 Quick Wins
quadrant-4 Consider Later
RPC Tracing: [0.3, 0.85]
Transaction Tracing: [0.65, 0.92]
Consensus Tracing: [0.75, 0.87]
Peer Message Tracing: [0.4, 0.3]
Ledger Acquisition: [0.5, 0.6]
JobQueue Tracing: [0.35, 0.5]
```
---
## 1.8 Observable Outcomes
After implementing OpenTelemetry, operators and developers will gain visibility into the following:
### 1.8.1 What You Will See: Traces
| Trace Type | Description | Example Query in Grafana/Tempo |
| -------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------ |
| **Transaction Lifecycle** | Full journey from RPC submission through validation, relay, consensus, and ledger inclusion | `{service.name="rippled" && xrpl.tx.hash="ABC123..."}` |
| **Cross-Node Propagation** | Transaction path across multiple rippled nodes with timing | `{xrpl.tx.relay_count > 0}` |
| **Consensus Rounds** | Complete round with all phases (open, establish, accept) | `{span.name=~"consensus.round.*"}` |
| **RPC Request Processing** | Individual command execution with timing breakdown | `{xrpl.rpc.command="account_info"}` |
| **Ledger Acquisition** | Peer-to-peer ledger data requests and responses | `{span.name="ledger.acquire"}` |
### 1.8.2 What You Will See: Metrics (Derived from Traces)
| Metric | Description | Dashboard Panel |
| ----------------------------- | -------------------------------------- | --------------------------- |
| **RPC Latency (p50/p95/p99)** | Response time distribution per command | Heatmap by command |
| **Transaction Throughput** | Transactions processed per second | Time series graph |
| **Consensus Round Duration** | Time to complete consensus phases | Histogram |
| **Cross-Node Latency** | Time for transaction to reach N nodes | Line chart with percentiles |
| **Error Rate** | Failed transactions/RPC calls by type | Stacked bar chart |
### 1.8.3 Concrete Dashboard Examples
**Transaction Trace View (Jaeger/Tempo):**
```
┌────────────────────────────────────────────────────────────────────────────────┐
│ Trace: abc123... (Transaction Submission) Duration: 847ms │
├────────────────────────────────────────────────────────────────────────────────┤
│ ├── rpc.request [ServerHandler] ████░░░░░░ 45ms │
│ │ └── rpc.command.submit [RPCHandler] ████░░░░░░ 42ms │
│ │ └── tx.receive [NetworkOPs] ███░░░░░░░ 35ms │
│ │ ├── tx.validate [TxQ] █░░░░░░░░░ 8ms │
│ │ └── tx.relay [Overlay] ██░░░░░░░░ 15ms │
│ │ ├── tx.receive [Node-B] █████░░░░░ 52ms │
│ │ │ └── tx.relay [Node-B] ██░░░░░░░░ 18ms │
│ │ └── tx.receive [Node-C] ██████░░░░ 65ms │
│ └── consensus.round [RCLConsensus] ████████░░ 720ms │
│ ├── consensus.phase.open ██░░░░░░░░ 180ms │
│ ├── consensus.phase.establish █████░░░░░ 480ms │
│ └── consensus.phase.accept █░░░░░░░░░ 60ms │
└────────────────────────────────────────────────────────────────────────────────┘
```
**RPC Performance Dashboard Panel:**
```
┌─────────────────────────────────────────────────────────────┐
│ RPC Command Latency (Last 1 Hour) │
├─────────────────────────────────────────────────────────────┤
│ Command │ p50 │ p95 │ p99 │ Errors │ Rate │
│──────────────────┼────────┼────────┼────────┼────────┼──────│
│ account_info │ 12ms │ 45ms │ 89ms │ 0.1% │ 150/s│
│ submit │ 35ms │ 120ms │ 250ms │ 2.3% │ 45/s│
│ ledger │ 8ms │ 25ms │ 55ms │ 0.0% │ 80/s│
│ tx │ 15ms │ 50ms │ 100ms │ 0.5% │ 60/s│
│ server_info │ 5ms │ 12ms │ 20ms │ 0.0% │ 200/s│
└─────────────────────────────────────────────────────────────┘
```
**Consensus Health Dashboard Panel:**
```mermaid
---
config:
xyChart:
width: 1200
height: 400
plotReservedSpacePercent: 50
chartOrientation: vertical
themeVariables:
xyChart:
plotColorPalette: "#3498db"
---
xychart-beta
title "Consensus Round Duration (Last 24 Hours)"
x-axis "Time of Day (Hours)" [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
y-axis "Duration (seconds)" 1 --> 5
    line [2.1, 2.4, 2.8, 3.2, 3.6, 4.0, 4.5, 4.8, 5.0, 4.6, 3.8, 3.0, 2.4]
```
### 1.8.4 Operator Actionable Insights
| Scenario | What You'll See | Action |
| --------------------- | ---------------------------------------------------------------------------- | -------------------------------- |
| **Slow RPC** | Span showing which phase is slow (parsing, execution, serialization) | Optimize specific code path |
| **Transaction Stuck** | Trace stops at validation; error attribute shows reason | Fix transaction parameters |
| **Consensus Delay** | Phase.establish taking too long; proposer attribute shows missing validators | Investigate network connectivity |
| **Memory Spike** | Large batch of spans correlating with memory increase | Tune batch_size or sampling |
| **Network Partition** | Traces missing cross-node links for specific peer | Check peer connectivity |
### 1.8.5 Developer Debugging Workflow
1. **Find Transaction**: Query by `xrpl.tx.hash` to get full trace
2. **Identify Bottleneck**: Look at span durations to find slowest component
3. **Check Attributes**: Review `xrpl.tx.validity`, `xrpl.rpc.status` for errors
4. **Correlate Logs**: Use `trace_id` to find related PerfLog entries
5. **Compare Nodes**: Filter by `service.instance.id` to compare behavior across nodes
---
_Next: [Design Decisions](./02-design-decisions.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,494 @@
# Design Decisions
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Architecture Analysis](./01-architecture-analysis.md) | [Code Samples](./04-code-samples.md)
---
## 2.1 OpenTelemetry Components
### 2.1.1 SDK Selection
**Primary Choice**: OpenTelemetry C++ SDK (`opentelemetry-cpp`)
| Component | Purpose | Required |
| --------------------------------------- | ---------------------- | ----------- |
| `opentelemetry-cpp::api` | Tracing API headers | Yes |
| `opentelemetry-cpp::sdk` | SDK implementation | Yes |
| `opentelemetry-cpp::ext` | Extensions (exporters) | Yes |
| `opentelemetry-cpp::otlp_grpc_exporter` | OTLP/gRPC export | Recommended |
| `opentelemetry-cpp::otlp_http_exporter` | OTLP/HTTP export | Alternative |
### 2.1.2 Instrumentation Strategy
**Manual Instrumentation** (recommended):
| Approach | Pros | Cons |
| ---------- | ----------------------------------------------------------------- | ------------------------------------------------------- |
| **Manual** | Precise control, optimized placement, rippled-specific attributes | More development effort |
| **Auto** | Less code, automatic coverage | Less control, potential overhead, limited customization |
---
## 2.2 Exporter Configuration
```mermaid
flowchart TB
subgraph nodes["rippled Nodes"]
node1["rippled<br/>Node 1"]
node2["rippled<br/>Node 2"]
node3["rippled<br/>Node 3"]
end
collector["OpenTelemetry<br/>Collector<br/>(sidecar or standalone)"]
subgraph backends["Observability Backends"]
jaeger["Jaeger<br/>(Dev)"]
tempo["Tempo<br/>(Prod)"]
elastic["Elastic<br/>APM"]
end
node1 -->|"OTLP/gRPC<br/>:4317"| collector
node2 -->|"OTLP/gRPC<br/>:4317"| collector
node3 -->|"OTLP/gRPC<br/>:4317"| collector
collector --> jaeger
collector --> tempo
collector --> elastic
style nodes fill:#0d47a1,stroke:#082f6a,color:#ffffff
style backends fill:#1b5e20,stroke:#0d3d14,color:#ffffff
style collector fill:#bf360c,stroke:#8c2809,color:#ffffff
```
### 2.2.1 OTLP/gRPC (Recommended)
```cpp
// Configuration for OTLP over gRPC
namespace otlp = opentelemetry::exporter::otlp;
otlp::OtlpGrpcExporterOptions opts;
opts.endpoint = "localhost:4317";
opts.use_ssl_credentials = true;
opts.ssl_credentials_cacert_path = "/path/to/ca.crt";
```
### 2.2.2 OTLP/HTTP (Alternative)
```cpp
// Configuration for OTLP over HTTP
namespace otlp = opentelemetry::exporter::otlp;
otlp::OtlpHttpExporterOptions opts;
opts.url = "http://localhost:4318/v1/traces";
opts.content_type = otlp::HttpRequestContentType::kJson; // or kBinary
```
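Either exporter plugs into the SDK the same way. A sketch of the wiring, assuming the factory helpers available in recent `opentelemetry-cpp` releases:
```cpp
#include <opentelemetry/exporters/otlp/otlp_grpc_exporter_factory.h>
#include <opentelemetry/sdk/trace/batch_span_processor_factory.h>
#include <opentelemetry/sdk/trace/tracer_provider_factory.h>
#include <opentelemetry/trace/provider.h>

#include <memory>
#include <utility>

void initTracing(
    opentelemetry::exporter::otlp::OtlpGrpcExporterOptions const& opts) {
    namespace sdktrace = opentelemetry::sdk::trace;
    auto exporter =
        opentelemetry::exporter::otlp::OtlpGrpcExporterFactory::Create(opts);
    sdktrace::BatchSpanProcessorOptions bpo;  // batch tuning knobs (defaults shown later)
    auto processor =
        sdktrace::BatchSpanProcessorFactory::Create(std::move(exporter), bpo);
    std::shared_ptr<opentelemetry::trace::TracerProvider> provider =
        sdktrace::TracerProviderFactory::Create(std::move(processor));
    opentelemetry::trace::Provider::SetTracerProvider(provider);
}
```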
---
## 2.3 Span Naming Conventions
### 2.3.1 Naming Schema
```
<component>.<operation>[.<sub-operation>]
```
**Examples**:
- `tx.receive` - Transaction received from peer
- `consensus.phase.establish` - Consensus establish phase
- `rpc.command.server_info` - server_info RPC command
### 2.3.2 Complete Span Catalog
```yaml
# Transaction Spans
tx:
receive: "Transaction received from network"
validate: "Transaction signature/format validation"
process: "Full transaction processing"
relay: "Transaction relay to peers"
apply: "Apply transaction to ledger"
# Consensus Spans
consensus:
round: "Complete consensus round"
phase:
open: "Open phase - collecting transactions"
establish: "Establish phase - reaching agreement"
accept: "Accept phase - applying consensus"
proposal:
receive: "Receive peer proposal"
send: "Send our proposal"
validation:
receive: "Receive peer validation"
send: "Send our validation"
# RPC Spans
rpc:
request: "HTTP/WebSocket request handling"
command:
"*": "Specific RPC command (dynamic)"
# Peer Spans
peer:
connect: "Peer connection establishment"
disconnect: "Peer disconnection"
message:
send: "Send protocol message"
receive: "Receive protocol message"
# Ledger Spans
ledger:
acquire: "Ledger acquisition from network"
build: "Build new ledger"
validate: "Ledger validation"
close: "Close ledger"
# Job Spans
job:
enqueue: "Job added to queue"
execute: "Job execution"
```
---
## 2.4 Attribute Schema
### 2.4.1 Resource Attributes (Set Once at Startup)
```cpp
// Standard OpenTelemetry semantic conventions plus custom rippled
// attributes, set once on the SDK Resource at startup (sketch; the
// networkType/nodeType/clusterName variables stand in for config values).
namespace resource = opentelemetry::sdk::resource;
auto res = resource::Resource::Create({
    {"service.name", "rippled"},
    {"service.version", BuildInfo::getVersionString()},
    {"service.instance.id", nodePublicKeyBase58},  // node public key (base58)
    // Custom rippled attributes
    {"xrpl.network.id", int64_t(networkId)},  // e.g., 0 for mainnet
    {"xrpl.network.type", networkType},       // "mainnet" | "testnet" | "devnet" | "standalone"
    {"xrpl.node.type", nodeType},             // "validator" | "stock" | "reporting"
    {"xrpl.node.cluster", clusterName},       // if clustered
});
```
### 2.4.2 Span Attributes by Category
#### Transaction Attributes
```cpp
"xrpl.tx.hash" = string // Transaction hash (hex)
"xrpl.tx.type" = string // "Payment", "OfferCreate", etc.
"xrpl.tx.account" = string // Source account (redacted in prod)
"xrpl.tx.sequence" = int64 // Account sequence number
"xrpl.tx.fee" = int64 // Fee in drops
"xrpl.tx.result" = string // "tesSUCCESS", "tecPATH_DRY", etc.
"xrpl.tx.ledger_index" = int64 // Ledger containing transaction
```
#### Consensus Attributes
```cpp
"xrpl.consensus.round" = int64 // Round number
"xrpl.consensus.phase" = string // "open", "establish", "accept"
"xrpl.consensus.mode" = string // "proposing", "observing", etc.
"xrpl.consensus.proposers" = int64 // Number of proposers
"xrpl.consensus.ledger.prev" = string // Previous ledger hash
"xrpl.consensus.ledger.seq" = int64 // Ledger sequence
"xrpl.consensus.tx_count" = int64 // Transactions in consensus set
"xrpl.consensus.duration_ms" = float64 // Round duration
```
#### RPC Attributes
```cpp
"xrpl.rpc.command" = string // Command name
"xrpl.rpc.version" = int64 // API version
"xrpl.rpc.role" = string // "admin" or "user"
"xrpl.rpc.params" = string // Sanitized parameters (optional)
```
#### Peer & Message Attributes
```cpp
"xrpl.peer.id" = string // Peer public key (base58)
"xrpl.peer.address" = string // IP:port
"xrpl.peer.latency_ms" = float64 // Measured latency
"xrpl.peer.cluster" = string // Cluster name if clustered
"xrpl.message.type" = string // Protocol message type name
"xrpl.message.size_bytes" = int64 // Message size
"xrpl.message.compressed" = bool // Whether compressed
```
#### Ledger & Job Attributes
```cpp
"xrpl.ledger.hash" = string // Ledger hash
"xrpl.ledger.index" = int64 // Ledger sequence/index
"xrpl.ledger.close_time" = int64 // Close time (epoch)
"xrpl.ledger.tx_count" = int64 // Transaction count
"xrpl.job.type" = string // Job type name
"xrpl.job.queue_ms" = float64 // Time spent in queue
"xrpl.job.worker" = int64 // Worker thread ID
```
### 2.4.3 Data Collection Summary
The following table summarizes what data is collected by category:
| Category | Attributes Collected | Purpose |
| --------------- | -------------------------------------------------------------------- | --------------------------- |
| **Transaction** | `tx.hash`, `tx.type`, `tx.result`, `tx.fee`, `ledger_index` | Trace transaction lifecycle |
| **Consensus** | `round`, `phase`, `mode`, `proposers` (public keys), `duration_ms` | Analyze consensus timing |
| **RPC** | `command`, `version`, `status`, `duration_ms` | Monitor RPC performance |
| **Peer** | `peer.id` (public key), `latency_ms`, `message.type`, `message.size` | Network topology analysis |
| **Ledger** | `ledger.hash`, `ledger.index`, `close_time`, `tx_count` | Ledger progression tracking |
| **Job** | `job.type`, `queue_ms`, `worker` | JobQueue performance |
### 2.4.4 Privacy & Sensitive Data Policy
OpenTelemetry instrumentation is designed to collect **operational metadata only**, never sensitive content.
#### Data NOT Collected
The following data is explicitly **excluded** from telemetry collection:
| Excluded Data | Reason |
| ----------------------- | ----------------------------------------- |
| **Private Keys** | Never exposed; not relevant to tracing |
| **Account Balances** | Financial data; privacy sensitive |
| **Transaction Amounts** | Financial data; privacy sensitive |
| **Raw TX Payloads** | May contain sensitive memo/data fields |
| **Personal Data** | No PII collected |
| **IP Addresses** | Configurable; excluded by default in prod |
#### Privacy Protection Mechanisms
| Mechanism | Description |
| ----------------------------- | ------------------------------------------------------------------------- |
| **Account Hashing** | `xrpl.tx.account` is hashed at collector level before storage |
| **Configurable Redaction** | Sensitive fields can be excluded via `[telemetry]` config section |
| **Sampling** | Only 10% of traces recorded by default, reducing data exposure |
| **Local Control** | Node operators have full control over what gets exported |
| **No Raw Payloads** | Transaction content is never recorded, only metadata (hash, type, result) |
| **Collector-Level Filtering** | Additional redaction/hashing can be configured at OTel Collector |
#### Collector-Level Data Protection
The OpenTelemetry Collector can be configured to hash or redact sensitive attributes before export:
```yaml
processors:
attributes:
actions:
# Hash account addresses before storage
- key: xrpl.tx.account
action: hash
# Remove IP addresses entirely
- key: xrpl.peer.address
action: delete
# Redact specific fields
- key: xrpl.rpc.params
action: delete
```
#### Configuration Options for Privacy
In `rippled.cfg`, operators can control data collection granularity:
```ini
[telemetry]
enabled=1
# Disable collection of specific components
trace_transactions=1
trace_consensus=1
trace_rpc=1
trace_peer=0 # Disable peer tracing (high volume, includes addresses)
# Redact specific attributes
redact_account=1 # Hash account addresses before export
redact_peer_address=1 # Remove peer IP addresses
```
> **Key Principle**: Telemetry collects **operational metadata** (timing, counts, hashes) — never **sensitive content** (keys, balances, amounts, raw payloads).
---
## 2.5 Context Propagation Design
### 2.5.1 Propagation Boundaries
```mermaid
flowchart TB
subgraph http["HTTP/WebSocket (RPC)"]
w3c["W3C Trace Context Headers:<br/>traceparent: 00-{trace_id}-{span_id}-{flags}<br/>tracestate: rippled=<state>"]
end
subgraph protobuf["Protocol Buffers (P2P)"]
proto["message TraceContext {<br/> bytes trace_id = 1; // 16 bytes<br/> bytes span_id = 2; // 8 bytes<br/> uint32 trace_flags = 3;<br/> string trace_state = 4;<br/>}"]
end
subgraph jobqueue["JobQueue (Internal Async)"]
job["Context captured at job creation,<br/>restored at execution<br/><br/>class Job {<br/> opentelemetry::context::Context traceContext_;<br/>};"]
end
style http fill:#0d47a1,stroke:#082f6a,color:#ffffff
style protobuf fill:#1b5e20,stroke:#0d3d14,color:#ffffff
style jobqueue fill:#bf360c,stroke:#8c2809,color:#ffffff
```
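For the JobQueue boundary the pattern is capture at enqueue, attach at execute. A standalone sketch (the `withCurrentContext` wrapper is illustrative; in rippled the `Job` object would carry the context as a member, as shown above):
```cpp
#include <opentelemetry/context/runtime_context.h>

#include <functional>
#include <utility>

// Wrap a job so the submitter's trace context is restored on the worker thread.
std::function<void()> withCurrentContext(std::function<void()> job) {
    auto captured =
        opentelemetry::context::RuntimeContext::GetCurrent();  // at enqueue time
    return [captured, job = std::move(job)] {
        // Attach returns a token; its destruction detaches the context.
        auto token = opentelemetry::context::RuntimeContext::Attach(captured);
        job();  // spans created here parent correctly across the queue hop
    };
}
```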
---
## 2.6 Integration with Existing Observability
### 2.6.1 Existing Frameworks Comparison
rippled already has two observability mechanisms. OpenTelemetry complements (not replaces) them:
| Aspect | PerfLog | Beast Insight (StatsD) | OpenTelemetry |
| --------------------- | ----------------------------- | ---------------------------- | ------------------------- |
| **Type** | Logging | Metrics | Distributed Tracing |
| **Data** | JSON log entries | Counters, gauges, histograms | Spans with context |
| **Scope** | Single node | Single node | **Cross-node** |
| **Output** | `perf.log` file | StatsD server | OTLP Collector |
| **Question answered** | "What happened on this node?" | "How many? How fast?" | "What was the journey?" |
| **Correlation** | By timestamp | By metric name | By `trace_id` |
| **Overhead** | Low (file I/O) | Low (UDP packets) | Low-Medium (configurable) |
### 2.6.2 What Each Framework Does Best
#### PerfLog
- **Purpose**: Detailed local event logging for RPC and job execution
- **Strengths**:
- Rich JSON output with timing data
- Already integrated in RPC handlers
- File-based, no external dependencies
- **Limitations**:
- Single-node only (no cross-node correlation)
- No parent-child relationships between events
- Manual log parsing required
```json
// Example PerfLog entry
{
"time": "2024-01-15T10:30:00.123Z",
"method": "submit",
"duration_us": 1523,
"result": "tesSUCCESS"
}
```
#### Beast Insight (StatsD)
- **Purpose**: Real-time metrics for monitoring dashboards
- **Strengths**:
- Aggregated metrics (counters, gauges, histograms)
- Low overhead (UDP, fire-and-forget)
- Good for alerting thresholds
- **Limitations**:
- No request-level detail
- No causal relationships
- Single-node perspective
```cpp
// Example StatsD usage in rippled
insight.increment("rpc.submit.count");
insight.gauge("ledger.age", age);
insight.timing("consensus.round", duration);
```
#### OpenTelemetry (NEW)
- **Purpose**: Distributed request tracing across nodes
- **Strengths**:
- **Cross-node correlation** via `trace_id`
- Parent-child span relationships
- Rich attributes per span
- Industry standard (CNCF)
- **Limitations**:
- Requires collector infrastructure
- Higher complexity than logging
```cpp
// Example OpenTelemetry span
auto span = telemetry.startSpan("tx.relay");
span->SetAttribute("tx.hash", hash);
span->SetAttribute("peer.id", peerId);
// Span automatically linked to parent via context
```
### 2.6.3 When to Use Each
| Scenario | PerfLog | StatsD | OpenTelemetry |
| --------------------------------------- | ---------- | ------ | ------------- |
| "How many TXs per second?" | ❌ | ✅ | ❌ |
| "What's the p99 RPC latency?" | ❌ | ✅ | ✅ |
| "Why was this specific TX slow?" | ⚠️ partial | ❌ | ✅ |
| "Which node delayed consensus?" | ❌ | ❌ | ✅ |
| "What happened on node X at time T?" | ✅ | ❌ | ✅ |
| "Show me the TX journey across 5 nodes" | ❌ | ❌ | ✅ |
### 2.6.4 Coexistence Strategy
```mermaid
flowchart TB
subgraph rippled["rippled Process"]
perflog["PerfLog<br/>(JSON to file)"]
insight["Beast Insight<br/>(StatsD)"]
otel["OpenTelemetry<br/>(Tracing)"]
end
perflog --> perffile["perf.log"]
insight --> statsd["StatsD Server"]
otel --> collector["OTLP Collector"]
perffile --> grafana["Grafana<br/>(Unified UI)"]
statsd --> grafana
collector --> grafana
style rippled fill:#212121,stroke:#0a0a0a,color:#ffffff
style grafana fill:#bf360c,stroke:#8c2809,color:#ffffff
```
### 2.6.5 Correlation with PerfLog
Trace IDs can be correlated with existing PerfLog entries for comprehensive debugging:
```cpp
// In RPCHandler.cpp - correlate trace with PerfLog
Status doCommand(RPC::JsonContext& context, Json::Value& result)
{
// Start OpenTelemetry span
auto span = context.app.getTelemetry().startSpan(
"rpc.command." + context.method);
// Get trace ID for correlation
auto traceId = span->GetContext().trace_id().IsValid()
? toHex(span->GetContext().trace_id())
: "";
// Use existing PerfLog with trace correlation
auto const curId = context.app.getPerfLog().currentId();
context.app.getPerfLog().rpcStart(context.method, curId);
// Future: Add trace ID to PerfLog entry
// context.app.getPerfLog().setTraceId(curId, traceId);
try {
auto ret = handler(context, result);
context.app.getPerfLog().rpcFinish(context.method, curId);
span->SetStatus(opentelemetry::trace::StatusCode::kOk);
return ret;
} catch (std::exception const& e) {
context.app.getPerfLog().rpcError(context.method, curId);
span->RecordException(e);
span->SetStatus(opentelemetry::trace::StatusCode::kError, e.what());
throw;
}
}
```
---
_Previous: [Architecture Analysis](./01-architecture-analysis.md)_ | _Next: [Implementation Strategy](./03-implementation-strategy.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,451 @@
# Implementation Strategy
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Code Samples](./04-code-samples.md) | [Configuration Reference](./05-configuration-reference.md)
---
## 3.1 Directory Structure
The telemetry implementation follows rippled's existing code organization pattern:
```
include/xrpl/
├── telemetry/
│ ├── Telemetry.h # Main telemetry interface
│ ├── TelemetryConfig.h # Configuration structures
│ ├── TraceContext.h # Context propagation utilities
│ ├── SpanGuard.h # RAII span management
│ └── SpanAttributes.h # Attribute helper functions
src/libxrpl/
├── telemetry/
│ ├── Telemetry.cpp # Implementation
│ ├── TelemetryConfig.cpp # Config parsing
│ ├── TraceContext.cpp # Context serialization
│ └── NullTelemetry.cpp # No-op implementation
src/xrpld/
├── telemetry/
│ ├── TracingInstrumentation.h # Instrumentation macros
│ └── TracingInstrumentation.cpp
```
---
## 3.2 Implementation Approach
<div align="center">
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 20, 'rankSpacing': 30}}}%%
flowchart TB
subgraph phase1["Phase 1: Core"]
direction LR
sdk["SDK Integration"] ~~~ interface["Telemetry Interface"] ~~~ config["Configuration"]
end
subgraph phase2["Phase 2: RPC"]
direction LR
http["HTTP Context"] ~~~ rpc["RPC Handlers"]
end
subgraph phase3["Phase 3: P2P"]
direction LR
proto["Protobuf Context"] ~~~ tx["Transaction Relay"]
end
subgraph phase4["Phase 4: Consensus"]
direction LR
consensus["Consensus Rounds"] ~~~ proposals["Proposals"]
end
phase1 --> phase2 --> phase3 --> phase4
style phase1 fill:#1565c0,stroke:#0d47a1,color:#ffffff
style phase2 fill:#2e7d32,stroke:#1b5e20,color:#ffffff
style phase3 fill:#e65100,stroke:#bf360c,color:#ffffff
style phase4 fill:#c2185b,stroke:#880e4f,color:#ffffff
```
</div>
### Key Principles
1. **Minimal Intrusion**: Instrumentation should not alter existing control flow
2. **Zero-Cost When Disabled**: Use compile-time flags and no-op implementations
3. **Backward Compatibility**: Protocol Buffer extensions use high field numbers
4. **Graceful Degradation**: Tracing failures must not affect node operation
---
## 3.3 Performance Overhead Summary
| Metric | Overhead | Notes |
| ------------- | ---------- | ----------------------------------- |
| CPU | 1-3% | Span creation and attribute setting |
| Memory | 2-5 MB | Batch buffer for pending spans |
| Network | 10-50 KB/s | Compressed OTLP export to collector |
| Latency (p99) | <2% | With proper sampling configuration |
---
## 3.4 Detailed CPU Overhead Analysis
### 3.4.1 Per-Operation Costs
| Operation | Time (ns) | Frequency | Impact |
| --------------------- | --------- | ---------------------- | ---------- |
| Span creation | 200-500 | Every traced operation | Low |
| Span end | 100-200 | Every traced operation | Low |
| SetAttribute (string) | 80-120 | 3-5 per span | Low |
| SetAttribute (int) | 40-60 | 2-3 per span | Negligible |
| AddEvent | 50-80 | 0-2 per span | Negligible |
| Context injection | 150-250 | Per outgoing message | Low |
| Context extraction | 100-180 | Per incoming message | Low |
| GetCurrent context | 10-20 | Thread-local access | Negligible |
### 3.4.2 Transaction Processing Overhead
<div align="center">
```mermaid
%%{init: {'pie': {'textPosition': 0.75}}}%%
pie showData
"tx.receive (800ns)" : 800
"tx.validate (500ns)" : 500
"tx.relay (500ns)" : 500
"Context inject (600ns)" : 600
```
**Transaction Tracing Overhead (~2.4μs total)**
</div>
**Overhead percentage**: 2.4 μs / 200 μs (avg tx processing) = **~1.2%**
### 3.4.3 Consensus Round Overhead
| Operation | Count | Cost (ns) | Total |
| ---------------------- | ----- | --------- | ---------- |
| consensus.round span | 1 | ~1000 | ~1 μs |
| consensus.phase spans | 3 | ~700 | ~2.1 μs |
| proposal.receive spans | ~20 | ~600 | ~12 μs |
| proposal.send spans | ~3 | ~600 | ~1.8 μs |
| Context operations | ~30 | ~200 | ~6 μs |
| **TOTAL** | | | **~23 μs** |
**Overhead percentage**: 23 μs / 3s (typical round) = **~0.0008%** (negligible)
### 3.4.4 RPC Request Overhead
| Operation | Cost (ns) |
| ---------------- | ------------ |
| rpc.request span | ~700 |
| rpc.command span | ~600 |
| Context extract | ~250 |
| Context inject | ~200 |
| **TOTAL** | **~1.75 μs** |
- Fast RPC (1ms): 1.75 μs / 1ms = **~0.175%**
- Slow RPC (100ms): 1.75 μs / 100ms = **~0.002%**
---
## 3.5 Memory Overhead Analysis
### 3.5.1 Static Memory
| Component | Size | Allocated |
| ------------------------ | ----------- | ---------- |
| TracerProvider singleton | ~64 KB | At startup |
| BatchSpanProcessor | ~128 KB | At startup |
| OTLP exporter | ~256 KB | At startup |
| Propagator registry | ~8 KB | At startup |
| **Total static** | **~456 KB** | |
### 3.5.2 Dynamic Memory
| Component | Size per unit | Max units | Peak |
| -------------------- | ------------- | ---------- | ----------- |
| Active span | ~200 bytes | 1000 | ~200 KB |
| Queued span (export) | ~500 bytes | 2048 | ~1 MB |
| Attribute storage | ~50 bytes | 5 per span | Included |
| Context storage | ~64 bytes | Per thread | ~6.4 KB |
| **Total dynamic** | | | **~1.2 MB** |
### 3.5.3 Memory Growth Characteristics
```mermaid
---
config:
xyChart:
width: 700
height: 400
---
xychart-beta
title "Memory Usage vs Span Rate"
x-axis "Spans/second" [0, 200, 400, 600, 800, 1000]
y-axis "Memory (MB)" 0 --> 6
line [1, 1.8, 2.6, 3.4, 4.2, 5]
```
**Notes**:
- Memory increases linearly with span rate
- Batch export prevents unbounded growth
- Queue size is configurable (default 2048 spans)
- At queue limit, oldest spans are dropped (not blocked)
---
## 3.6 Network Overhead Analysis
### 3.6.1 Export Bandwidth
| Sampling Rate | Spans/sec | Bandwidth | Notes |
| ------------- | --------- | --------- | ---------------- |
| 100% | ~500 | ~250 KB/s | Development only |
| 10% | ~50 | ~25 KB/s | Staging |
| 1% | ~5 | ~2.5 KB/s | Production |
| Error-only | ~1 | ~0.5 KB/s | Minimal overhead |
### 3.6.2 Trace Context Propagation
| Message Type | Context Size | Messages/sec | Overhead |
| ---------------------- | ------------ | ------------ | ----------- |
| TMTransaction | 32 bytes | ~100 | ~3.2 KB/s |
| TMProposeSet | 32 bytes | ~10 | ~320 B/s |
| TMValidation | 32 bytes | ~50 | ~1.6 KB/s |
| **Total P2P overhead** | | | **~5 KB/s** |
---
## 3.7 Optimization Strategies
### 3.7.1 Sampling Strategies
```mermaid
flowchart TD
trace["New Trace"]
trace --> errors{"Is Error?"}
errors -->|Yes| sample["SAMPLE"]
errors -->|No| consensus{"Is Consensus?"}
consensus -->|Yes| sample
consensus -->|No| slow{"Is Slow?"}
slow -->|Yes| sample
slow -->|No| prob{"Random < 10%?"}
prob -->|Yes| sample
prob -->|No| drop["DROP"]
style sample fill:#4caf50,stroke:#388e3c,color:#fff
style drop fill:#f44336,stroke:#c62828,color:#fff
```
### 3.7.2 Batch Tuning Recommendations
| Environment | Batch Size | Batch Delay | Max Queue |
| ------------------ | ---------- | ----------- | --------- |
| Low-latency | 128 | 1000ms | 512 |
| High-throughput | 1024 | 10000ms | 8192 |
| Memory-constrained | 256 | 2000ms | 512 |
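These rows map directly onto the SDK's batch processor options. A sketch of the high-throughput profile (field names as in recent `opentelemetry-cpp` releases):
```cpp
#include <opentelemetry/sdk/trace/batch_span_processor_options.h>

#include <chrono>

// "High-throughput" row from the table above.
opentelemetry::sdk::trace::BatchSpanProcessorOptions highThroughput() {
    opentelemetry::sdk::trace::BatchSpanProcessorOptions o;
    o.max_export_batch_size = 1024;                              // batch size
    o.schedule_delay_millis = std::chrono::milliseconds(10000);  // batch delay
    o.max_queue_size = 8192;                                     // max queue
    return o;
}
```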
### 3.7.3 Conditional Instrumentation
```cpp
// Compile-time feature flag
#ifndef XRPL_ENABLE_TELEMETRY
// Zero-cost when disabled
#define XRPL_TRACE_SPAN(t, n) ((void)0)
#endif
// Runtime component filtering
if (telemetry.shouldTracePeer())
{
XRPL_TRACE_SPAN(telemetry, "peer.message.receive");
// ... instrumentation
}
// No overhead when component tracing disabled
```
---
## 3.8 Links to Detailed Documentation
- **[Code Samples](./04-code-samples.md)**: Complete implementation code for all components
- **[Configuration Reference](./05-configuration-reference.md)**: Configuration options and collector setup
- **[Implementation Phases](./06-implementation-phases.md)**: Detailed timeline and milestones
---
## 3.9 Code Intrusiveness Assessment
This section provides a detailed assessment of how intrusive the OpenTelemetry integration is to the existing rippled codebase.
### 3.9.1 Files Modified Summary
| Component | Files Modified | Lines Added | Lines Changed | Architectural Impact |
| --------------------- | -------------- | ----------- | ------------- | -------------------- |
| **Core Telemetry** | 5 new files | ~800 | 0 | None (new module) |
| **Application Init** | 2 files | ~30 | ~5 | Minimal |
| **RPC Layer** | 3 files | ~80 | ~20 | Minimal |
| **Transaction Relay** | 4 files | ~120 | ~40 | Low |
| **Consensus** | 3 files | ~100 | ~30 | Low-Medium |
| **Protocol Buffers** | 1 file | ~25 | 0 | Low |
| **CMake/Build** | 3 files | ~50 | ~10 | Minimal |
| **Total** | **~21 files** | **~1,205** | **~105** | **Low** |
### 3.9.2 Detailed File Impact
```mermaid
pie title Code Changes by Component
"New Telemetry Module" : 800
"Transaction Relay" : 160
"Consensus" : 130
"RPC Layer" : 100
"Application Init" : 35
"Protocol Buffers" : 25
"Build System" : 60
```
#### New Files (No Impact on Existing Code)
| File | Lines | Purpose |
| ---------------------------------------------- | ----- | -------------------- |
| `include/xrpl/telemetry/Telemetry.h` | ~160 | Main interface |
| `include/xrpl/telemetry/SpanGuard.h` | ~120 | RAII wrapper |
| `include/xrpl/telemetry/TraceContext.h` | ~80 | Context propagation |
| `src/xrpld/telemetry/TracingInstrumentation.h` | ~60 | Macros |
| `src/libxrpl/telemetry/Telemetry.cpp` | ~200 | Implementation |
| `src/libxrpl/telemetry/TelemetryConfig.cpp` | ~60 | Config parsing |
| `src/libxrpl/telemetry/NullTelemetry.cpp` | ~40 | No-op implementation |
#### Modified Files (Existing Rippled Code)
| File | Lines Added | Lines Changed | Risk Level |
| ------------------------------------------------- | ----------- | ------------- | ---------- |
| `src/xrpld/app/main/Application.cpp` | ~15 | ~3 | Low |
| `include/xrpl/app/main/Application.h` | ~5 | ~2 | Low |
| `src/xrpld/rpc/detail/ServerHandler.cpp` | ~40 | ~10 | Low |
| `src/xrpld/rpc/handlers/*.cpp` | ~30 | ~8 | Low |
| `src/xrpld/overlay/detail/PeerImp.cpp` | ~60 | ~15 | Medium |
| `src/xrpld/overlay/detail/OverlayImpl.cpp` | ~30 | ~10 | Medium |
| `src/xrpld/app/consensus/RCLConsensus.cpp` | ~50 | ~15 | Medium |
| `src/xrpld/app/consensus/RCLConsensusAdaptor.cpp` | ~40 | ~12 | Medium |
| `src/xrpld/core/JobQueue.cpp` | ~20 | ~5 | Low |
| `src/xrpld/overlay/detail/ripple.proto` | ~25 | 0 | Low |
| `CMakeLists.txt` | ~40 | ~8 | Low |
| `cmake/FindOpenTelemetry.cmake` | ~50 | 0 | None (new) |
### 3.9.3 Risk Assessment by Component
<div align="center">
**Do First** ↖ ↗ **Plan Carefully**
```mermaid
quadrantChart
title Code Intrusiveness Risk Matrix
x-axis Low Risk --> High Risk
y-axis Low Value --> High Value
RPC Tracing: [0.2, 0.8]
Transaction Relay: [0.5, 0.9]
Consensus Tracing: [0.7, 0.95]
Peer Message Tracing: [0.8, 0.4]
JobQueue Context: [0.4, 0.5]
Ledger Acquisition: [0.5, 0.6]
```
**Optional** ↙ ↘ **Avoid**
</div>
#### Risk Level Definitions
| Risk Level | Definition | Mitigation |
| ---------- | ---------------------------------------------------------------- | ---------------------------------- |
| **Low** | Additive changes only; no modification to existing logic | Standard code review |
| **Medium** | Minor modifications to existing functions; clear boundaries | Comprehensive unit tests |
| **High** | Changes to core logic or data structures; potential side effects | Integration tests + staged rollout |
### 3.9.4 Architectural Impact Assessment
| Aspect | Impact | Justification |
| -------------------- | ------- | --------------------------------------------------------------------- |
| **Data Flow** | None | Tracing is purely observational; no business logic changes |
| **Threading Model** | Minimal | Context propagation uses thread-local storage (standard OTel pattern) |
| **Memory Model** | Low | Bounded queues prevent unbounded growth; RAII ensures cleanup |
| **Network Protocol** | Low | Optional fields in protobuf (high field numbers); backward compatible |
| **Configuration** | None | New config section; existing configs unaffected |
| **Build System** | Low | Optional CMake flag; builds work without OpenTelemetry |
| **Dependencies** | Low | OpenTelemetry SDK is optional; null implementation when disabled |
### 3.9.5 Backward Compatibility
| Compatibility | Status | Notes |
| --------------- | ------- | ----------------------------------------------------- |
| **Config File** | ✅ Full | New `[telemetry]` section is optional |
| **Protocol** | ✅ Full | Optional protobuf fields with high field numbers |
| **Build** | ✅ Full | `XRPL_ENABLE_TELEMETRY=OFF` produces identical binary |
| **Runtime** | ✅ Full | `enabled=0` produces zero overhead |
| **API** | ✅ Full | No changes to public RPC or P2P APIs |
### 3.9.6 Rollback Strategy
If issues are discovered after deployment:
1. **Immediate**: Set `enabled=0` in config and restart (zero code change)
2. **Quick**: Rebuild with `XRPL_ENABLE_TELEMETRY=OFF`
3. **Complete**: Revert telemetry commits (clean separation makes this easy)
### 3.9.7 Code Change Examples
**Minimal RPC Instrumentation (Low Intrusiveness):**
```cpp
// Before
void ServerHandler::onRequest(...) {
auto result = processRequest(req);
send(result);
}
// After (only ~3 lines added)
void ServerHandler::onRequest(...) {
XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.request"); // +1 line
XRPL_TRACE_SET_ATTR("xrpl.rpc.command", command); // +1 line
auto result = processRequest(req);
XRPL_TRACE_SET_ATTR("xrpl.rpc.status", status); // +1 line
send(result);
}
```
**Consensus Instrumentation (Medium Intrusiveness):**
```cpp
// Before
void RCLConsensusAdaptor::startRound(...) {
// ... existing logic
}
// After (context storage required)
void RCLConsensusAdaptor::startRound(...) {
XRPL_TRACE_CONSENSUS(app_.getTelemetry(), "consensus.round");
XRPL_TRACE_SET_ATTR("xrpl.consensus.ledger.seq", seq);
// Store context for child spans in phase transitions
currentRoundContext_ = _xrpl_guard_->context(); // New member variable
// ... existing logic unchanged
}
```
---
_Previous: [Design Decisions](./02-design-decisions.md)_ | _Next: [Code Samples](./04-code-samples.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,982 @@
# Code Samples
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Implementation Strategy](./03-implementation-strategy.md) | [Configuration Reference](./05-configuration-reference.md)
---
## 4.1 Core Interfaces
### 4.1.1 Main Telemetry Interface
```cpp
// include/xrpl/telemetry/Telemetry.h
#pragma once
#include <xrpl/telemetry/TelemetryConfig.h>
#include <opentelemetry/trace/tracer.h>
#include <opentelemetry/trace/span.h>
#include <opentelemetry/context/context.h>
#include <chrono>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
namespace xrpl {
namespace telemetry {
/**
* Main telemetry interface for OpenTelemetry integration.
*
* This class provides the primary API for distributed tracing in rippled.
* It manages the OpenTelemetry SDK lifecycle and provides convenience
* methods for creating spans and propagating context.
*/
class Telemetry
{
public:
/**
* Configuration for the telemetry system.
*/
struct Setup
{
bool enabled = false;
std::string serviceName = "rippled";
std::string serviceVersion;
std::string serviceInstanceId; // Node public key
// Exporter configuration
std::string exporterType = "otlp_grpc"; // "otlp_grpc", "otlp_http", "none"
std::string exporterEndpoint = "localhost:4317";
bool useTls = false;
std::string tlsCertPath;
// Sampling configuration
double samplingRatio = 1.0; // 1.0 = 100% sampling
// Batch processor settings
std::uint32_t batchSize = 512;
std::chrono::milliseconds batchDelay{5000};
std::uint32_t maxQueueSize = 2048;
// Network attributes
std::uint32_t networkId = 0;
std::string networkType = "mainnet";
// Component filtering
bool traceTransactions = true;
bool traceConsensus = true;
bool traceRpc = true;
bool tracePeer = false; // High volume, disabled by default
bool traceLedger = true;
};
virtual ~Telemetry() = default;
// ═══════════════════════════════════════════════════════════════════════
// LIFECYCLE
// ═══════════════════════════════════════════════════════════════════════
/** Start the telemetry system (call after configuration) */
virtual void start() = 0;
/** Stop the telemetry system (flushes pending spans) */
virtual void stop() = 0;
/** Check if telemetry is enabled */
virtual bool isEnabled() const = 0;
// ═══════════════════════════════════════════════════════════════════════
// TRACER ACCESS
// ═══════════════════════════════════════════════════════════════════════
/** Get the tracer for creating spans */
virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Tracer>
getTracer(std::string_view name = "rippled") = 0;
// ═══════════════════════════════════════════════════════════════════════
// SPAN CREATION (Convenience Methods)
// ═══════════════════════════════════════════════════════════════════════
/** Start a new span with default options */
virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span>
startSpan(
std::string_view name,
opentelemetry::trace::SpanKind kind =
opentelemetry::trace::SpanKind::kInternal) = 0;
/** Start a span as child of given context */
virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span>
startSpan(
std::string_view name,
opentelemetry::context::Context const& parentContext,
opentelemetry::trace::SpanKind kind =
opentelemetry::trace::SpanKind::kInternal) = 0;
// ═══════════════════════════════════════════════════════════════════════
// CONTEXT PROPAGATION
// ═══════════════════════════════════════════════════════════════════════
/** Serialize context for network transmission */
virtual std::string serializeContext(
opentelemetry::context::Context const& ctx) = 0;
/** Deserialize context from network data */
virtual opentelemetry::context::Context deserializeContext(
std::string const& serialized) = 0;
// ═══════════════════════════════════════════════════════════════════════
// COMPONENT FILTERING
// ═══════════════════════════════════════════════════════════════════════
/** Check if transaction tracing is enabled */
virtual bool shouldTraceTransactions() const = 0;
/** Check if consensus tracing is enabled */
virtual bool shouldTraceConsensus() const = 0;
/** Check if RPC tracing is enabled */
virtual bool shouldTraceRpc() const = 0;
/** Check if peer message tracing is enabled */
virtual bool shouldTracePeer() const = 0;
};
// Factory functions
std::unique_ptr<Telemetry>
make_Telemetry(
Telemetry::Setup const& setup,
beast::Journal journal);
Telemetry::Setup
setup_Telemetry(
Section const& section,
std::string const& nodePublicKey,
std::string const& version);
} // namespace telemetry
} // namespace xrpl
```
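A hypothetical usage sketch of the factories above during application startup. The config accessor, journal, and surrounding names are illustrative stand-ins for the real rippled wiring:
```cpp
// Assumed: config, journal, and nodePublicKeyBase58 come from the application.
auto setup = xrpl::telemetry::setup_Telemetry(
    config.section("telemetry"),     // the [telemetry] stanza
    nodePublicKeyBase58,             // becomes service.instance.id
    BuildInfo::getVersionString());  // becomes service.version
auto telemetry = xrpl::telemetry::make_Telemetry(setup, journal);
telemetry->start();
// ... node runs ...
telemetry->stop();  // flushes pending spans on shutdown
```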
---
## 4.2 RAII Span Guard
```cpp
// include/xrpl/telemetry/SpanGuard.h
#pragma once
#include <opentelemetry/context/runtime_context.h>
#include <opentelemetry/trace/scope.h>
#include <opentelemetry/trace/span.h>
#include <opentelemetry/trace/span_metadata.h>  // StatusCode
#include <exception>
#include <string>
#include <string_view>
#include <utility>
namespace xrpl {
namespace telemetry {
/**
* RAII guard for OpenTelemetry spans.
*
* Automatically ends the span on destruction and makes it the current
* span in the thread-local context.
*/
class SpanGuard
{
opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span_;
opentelemetry::trace::Scope scope_;
public:
/**
* Construct guard with span.
* The span becomes the current span in thread-local context.
*/
explicit SpanGuard(
opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span)
: span_(std::move(span))
, scope_(span_)
{
}
// Non-copyable, non-movable
SpanGuard(SpanGuard const&) = delete;
SpanGuard& operator=(SpanGuard const&) = delete;
SpanGuard(SpanGuard&&) = delete;
SpanGuard& operator=(SpanGuard&&) = delete;
~SpanGuard()
{
if (span_)
span_->End();
}
/** Access the underlying span */
opentelemetry::trace::Span& span() { return *span_; }
opentelemetry::trace::Span const& span() const { return *span_; }
/** Set span status to OK */
void setOk()
{
span_->SetStatus(opentelemetry::trace::StatusCode::kOk);
}
/** Set span status with code and description */
void setStatus(
opentelemetry::trace::StatusCode code,
std::string_view description = "")
{
span_->SetStatus(code, std::string(description));
}
/** Set an attribute on the span */
template<typename T>
void setAttribute(std::string_view key, T&& value)
{
span_->SetAttribute(
opentelemetry::nostd::string_view(key.data(), key.size()),
std::forward<T>(value));
}
/** Add an event to the span */
void addEvent(std::string_view name)
{
span_->AddEvent(std::string(name));
}
/** Record an exception on the span */
void recordException(std::exception const& e)
{
span_->RecordException(e);
span_->SetStatus(
opentelemetry::trace::StatusCode::kError,
e.what());
}
/** Get the current trace context */
opentelemetry::context::Context context() const
{
return opentelemetry::context::RuntimeContext::GetCurrent();
}
};
/**
* No-op span guard for when tracing is disabled.
* Provides the same interface but does nothing.
*/
class NullSpanGuard
{
public:
NullSpanGuard() = default;
void setOk() {}
void setStatus(opentelemetry::trace::StatusCode, std::string_view = "") {}
template<typename T>
void setAttribute(std::string_view, T&&) {}
void addEvent(std::string_view) {}
void recordException(std::exception const&) {}
};
} // namespace telemetry
} // namespace xrpl
```
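A short usage sketch, assuming a `Telemetry&` whose `startSpan()` returns the SDK span type as declared in §4.1; the span ends automatically when `guard` leaves scope:

```cpp
#include <xrpl/telemetry/SpanGuard.h>
#include <xrpl/telemetry/Telemetry.h>

void buildLedger(xrpl::telemetry::Telemetry& telemetry)
{
    xrpl::telemetry::SpanGuard guard(telemetry.startSpan("ledger.build"));
    guard.setAttribute("xrpl.ledger.seq", int64_t{92345678});

    try
    {
        // ... do the actual work while the span is current ...
        guard.setOk();
    }
    catch (std::exception const& e)
    {
        guard.recordException(e);  // records the event, sets kError status
        throw;  // the span still ends via the destructor
    }
}
```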
---
## 4.3 Instrumentation Macros
```cpp
// src/xrpld/telemetry/TracingInstrumentation.h
#pragma once
#include <xrpl/telemetry/Telemetry.h>
#include <xrpl/telemetry/SpanGuard.h>
#include <optional>
namespace xrpl {
namespace telemetry {
// ═══════════════════════════════════════════════════════════════════════════
// INSTRUMENTATION MACROS
// ═══════════════════════════════════════════════════════════════════════════
#ifdef XRPL_ENABLE_TELEMETRY
// Start a span that is automatically ended when the guard goes out of
// scope. Declared as std::optional so that XRPL_TRACE_SET_ATTR and
// XRPL_TRACE_EXCEPTION below work uniformly with the conditional macros.
#define XRPL_TRACE_SPAN(telemetry, name) \
    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
    _xrpl_guard_.emplace((telemetry).startSpan(name))
// Start a span with specific kind
#define XRPL_TRACE_SPAN_KIND(telemetry, name, kind) \
    std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
    _xrpl_guard_.emplace((telemetry).startSpan(name, kind))
// Conditional span based on component
#define XRPL_TRACE_TX(telemetry, name) \
std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
if ((telemetry).shouldTraceTransactions()) { \
_xrpl_guard_.emplace((telemetry).startSpan(name)); \
}
#define XRPL_TRACE_CONSENSUS(telemetry, name) \
std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
if ((telemetry).shouldTraceConsensus()) { \
_xrpl_guard_.emplace((telemetry).startSpan(name)); \
}
#define XRPL_TRACE_RPC(telemetry, name) \
std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
if ((telemetry).shouldTraceRpc()) { \
_xrpl_guard_.emplace((telemetry).startSpan(name)); \
}
// Set attribute on current span (if exists)
#define XRPL_TRACE_SET_ATTR(key, value) \
if (_xrpl_guard_.has_value()) { \
_xrpl_guard_->setAttribute(key, value); \
}
// Record exception on current span
#define XRPL_TRACE_EXCEPTION(e) \
if (_xrpl_guard_.has_value()) { \
_xrpl_guard_->recordException(e); \
}
#else // XRPL_ENABLE_TELEMETRY not defined
#define XRPL_TRACE_SPAN(telemetry, name) ((void)0)
#define XRPL_TRACE_SPAN_KIND(telemetry, name, kind) ((void)0)
#define XRPL_TRACE_TX(telemetry, name) ((void)0)
#define XRPL_TRACE_CONSENSUS(telemetry, name) ((void)0)
#define XRPL_TRACE_RPC(telemetry, name) ((void)0)
#define XRPL_TRACE_SET_ATTR(key, value) ((void)0)
#define XRPL_TRACE_EXCEPTION(e) ((void)0)
#endif // XRPL_ENABLE_TELEMETRY
} // namespace telemetry
} // namespace xrpl
```
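A sketch of how the macros read at a call site. With `XRPL_ENABLE_TELEMETRY` off, the same function compiles down to the bare transaction logic; the `Application`/`getTelemetry()` names follow the plan's proposed interfaces rather than existing rippled code:

```cpp
#include <xrpl/telemetry/TracingInstrumentation.h>

void processTransaction(Application& app, STTx const& tx)
{
    // Creates _xrpl_guard_ only when transaction tracing is enabled
    XRPL_TRACE_TX(app.getTelemetry(), "tx.process");
    XRPL_TRACE_SET_ATTR("xrpl.tx.hash", to_string(tx.getTransactionID()));

    try
    {
        // ... existing transaction handling ...
    }
    catch (std::exception const& e)
    {
        XRPL_TRACE_EXCEPTION(e);  // no-op if the guard was never created
        throw;
    }
}
```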
---
## 4.4 Protocol Buffer Extensions
### 4.4.1 TraceContext Message Definition
Add to `src/xrpld/overlay/detail/ripple.proto`:
```protobuf
// Trace context for distributed tracing across nodes
// Uses W3C Trace Context format internally
message TraceContext {
// 16-byte trace identifier (required for valid context)
bytes trace_id = 1;
// 8-byte span identifier of parent span
bytes span_id = 2;
// Trace flags (bit 0 = sampled)
uint32 trace_flags = 3;
// W3C tracestate header value for vendor-specific data
string trace_state = 4;
}
// Extend existing messages with optional trace context
// High field numbers (1000+) to avoid conflicts
message TMTransaction {
// ... existing fields ...
// Optional trace context for distributed tracing
optional TraceContext trace_context = 1001;
}
message TMProposeSet {
// ... existing fields ...
optional TraceContext trace_context = 1001;
}
message TMValidation {
// ... existing fields ...
optional TraceContext trace_context = 1001;
}
message TMGetLedger {
// ... existing fields ...
optional TraceContext trace_context = 1001;
}
message TMLedgerData {
// ... existing fields ...
optional TraceContext trace_context = 1001;
}
```
### 4.4.2 Context Serialization/Deserialization
```cpp
// include/xrpl/telemetry/TraceContext.h
#pragma once
#include <opentelemetry/context/context.h>
#include <opentelemetry/nostd/span.h>
#include <opentelemetry/trace/context.h>        // GetSpan, kSpanKey
#include <opentelemetry/trace/default_span.h>
#include <opentelemetry/trace/span_context.h>
#include <protocol/messages.h> // Generated protobuf
#include <functional>
#include <optional>
#include <string>
#include <string_view>
namespace xrpl {
namespace telemetry {
/**
* Utilities for trace context serialization and propagation.
*/
class TraceContextPropagator
{
public:
/**
* Extract trace context from Protocol Buffer message.
* Returns empty context if no trace info present.
*/
static opentelemetry::context::Context
extract(protocol::TraceContext const& proto);
/**
* Inject current trace context into Protocol Buffer message.
*/
static void
inject(
opentelemetry::context::Context const& ctx,
protocol::TraceContext& proto);
/**
* Extract trace context from HTTP headers (for RPC).
* Supports W3C Trace Context (traceparent, tracestate).
*/
static opentelemetry::context::Context
extractFromHeaders(
std::function<std::optional<std::string>(std::string_view)> headerGetter);
/**
* Inject trace context into HTTP headers (for RPC responses).
*/
static void
injectToHeaders(
opentelemetry::context::Context const& ctx,
std::function<void(std::string_view, std::string_view)> headerSetter);
};
// ═══════════════════════════════════════════════════════════════════════════
// IMPLEMENTATION
// ═══════════════════════════════════════════════════════════════════════════
inline opentelemetry::context::Context
TraceContextPropagator::extract(protocol::TraceContext const& proto)
{
using namespace opentelemetry::trace;
if (proto.trace_id().size() != 16 || proto.span_id().size() != 8)
return opentelemetry::context::Context{}; // Invalid, return empty
    // Construct TraceId and SpanId from raw bytes (fixed-size spans)
    TraceId traceId{opentelemetry::nostd::span<uint8_t const, TraceId::kSize>(
        reinterpret_cast<uint8_t const*>(proto.trace_id().data()),
        TraceId::kSize)};
    SpanId spanId{opentelemetry::nostd::span<uint8_t const, SpanId::kSize>(
        reinterpret_cast<uint8_t const*>(proto.span_id().data()),
        SpanId::kSize)};
    TraceFlags flags(static_cast<uint8_t>(proto.trace_flags()));
// Create SpanContext from extracted data
SpanContext spanContext(traceId, spanId, flags, /* remote = */ true);
// Create context with extracted span as parent
return opentelemetry::context::Context{}.SetValue(
opentelemetry::trace::kSpanKey,
opentelemetry::nostd::shared_ptr<Span>(
new DefaultSpan(spanContext)));
}
inline void
TraceContextPropagator::inject(
opentelemetry::context::Context const& ctx,
protocol::TraceContext& proto)
{
using namespace opentelemetry::trace;
// Get current span from context
auto span = GetSpan(ctx);
if (!span)
return;
auto const& spanCtx = span->GetContext();
if (!spanCtx.IsValid())
return;
// Serialize trace_id (16 bytes)
auto const& traceId = spanCtx.trace_id();
proto.set_trace_id(traceId.Id().data(), TraceId::kSize);
// Serialize span_id (8 bytes)
auto const& spanId = spanCtx.span_id();
proto.set_span_id(spanId.Id().data(), SpanId::kSize);
// Serialize flags
proto.set_trace_flags(spanCtx.trace_flags().flags());
// Note: tracestate not implemented yet
}
} // namespace telemetry
} // namespace xrpl
```
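One way to sanity-check the propagator is a unit-test-style round trip: inject the current context into a protobuf message, extract it back, and confirm the trace identity survives. A sketch under the definitions above, assuming `ctx` carries a valid span:

```cpp
#include <xrpl/telemetry/TraceContext.h>
#include <opentelemetry/trace/context.h>
#include <cassert>

void roundTripCheck(opentelemetry::context::Context const& ctx)
{
    using xrpl::telemetry::TraceContextPropagator;

    protocol::TraceContext proto;
    TraceContextPropagator::inject(ctx, proto);

    auto restored = TraceContextPropagator::extract(proto);

    auto original = opentelemetry::trace::GetSpan(ctx)->GetContext();
    auto remote = opentelemetry::trace::GetSpan(restored)->GetContext();

    // Same trace and parent span, but flagged as remote on the receiver
    assert(remote.trace_id() == original.trace_id());
    assert(remote.span_id() == original.span_id());
    assert(remote.IsRemote());
}
```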
---
## 4.5 Module-Specific Instrumentation
### 4.5.1 Transaction Relay Instrumentation
```cpp
// src/xrpld/overlay/detail/PeerImp.cpp (modified)
#include <xrpl/telemetry/TracingInstrumentation.h>
void
PeerImp::handleTransaction(
std::shared_ptr<protocol::TMTransaction> const& m)
{
// Extract trace context from incoming message
opentelemetry::context::Context parentCtx;
if (m->has_trace_context())
{
parentCtx = telemetry::TraceContextPropagator::extract(
m->trace_context());
}
// Start span as child of remote span (cross-node link)
auto span = app_.getTelemetry().startSpan(
"tx.receive",
parentCtx,
opentelemetry::trace::SpanKind::kServer);
telemetry::SpanGuard guard(span);
try
{
// Parse and validate transaction
SerialIter sit(makeSlice(m->rawtransaction()));
auto stx = std::make_shared<STTx const>(sit);
// Add transaction attributes
guard.setAttribute("xrpl.tx.hash", to_string(stx->getTransactionID()));
guard.setAttribute("xrpl.tx.type", stx->getTxnType());
guard.setAttribute("xrpl.peer.id", remote_address_.to_string());
// Check if we've seen this transaction (HashRouter)
auto const [flags, suppressed] =
app_.getHashRouter().addSuppressionPeer(
stx->getTransactionID(),
id_);
if (suppressed)
{
guard.setAttribute("xrpl.tx.suppressed", true);
guard.addEvent("tx.duplicate");
return; // Already processing this transaction
}
// Create child span for validation
{
auto validateSpan = app_.getTelemetry().startSpan("tx.validate");
telemetry::SpanGuard validateGuard(validateSpan);
auto [validity, reason] = checkTransaction(stx);
validateGuard.setAttribute("xrpl.tx.validity",
validity == Validity::Valid ? "valid" : "invalid");
if (validity != Validity::Valid)
{
validateGuard.setStatus(
opentelemetry::trace::StatusCode::kError,
reason);
return;
}
}
// Relay to other peers (capture context for propagation)
auto ctx = guard.context();
// Create child span for relay
auto relaySpan = app_.getTelemetry().startSpan(
"tx.relay",
ctx,
opentelemetry::trace::SpanKind::kClient);
{
telemetry::SpanGuard relayGuard(relaySpan);
// Inject context into outgoing message
protocol::TraceContext protoCtx;
telemetry::TraceContextPropagator::inject(
relayGuard.context(), protoCtx);
            // Relay to other peers; `exclusions` and `relayCount` stand in
            // for values computed by the surrounding (existing) relay code
            app_.overlay().relay(
                stx->getTransactionID(),
                *m,
                protoCtx,  // Pass trace context (proposed overload)
                exclusions);
            relayGuard.setAttribute("xrpl.tx.relay_count",
                static_cast<int64_t>(relayCount));
}
guard.setOk();
}
catch (std::exception const& e)
{
guard.recordException(e);
JLOG(journal_.warn()) << "Transaction handling failed: " << e.what();
}
}
```
### 4.5.2 Consensus Instrumentation
```cpp
// src/xrpld/app/consensus/RCLConsensus.cpp (modified)
#include <xrpl/telemetry/TracingInstrumentation.h>
void
RCLConsensusAdaptor::startRound(
NetClock::time_point const& now,
RCLCxLedger::ID const& prevLedgerHash,
RCLCxLedger const& prevLedger,
hash_set<NodeID> const& peers,
bool proposing)
{
XRPL_TRACE_CONSENSUS(app_.getTelemetry(), "consensus.round");
XRPL_TRACE_SET_ATTR("xrpl.consensus.ledger.prev", to_string(prevLedgerHash));
XRPL_TRACE_SET_ATTR("xrpl.consensus.ledger.seq",
static_cast<int64_t>(prevLedger.seq() + 1));
XRPL_TRACE_SET_ATTR("xrpl.consensus.proposers",
static_cast<int64_t>(peers.size()));
XRPL_TRACE_SET_ATTR("xrpl.consensus.mode",
proposing ? "proposing" : "observing");
// Store trace context for use in phase transitions
currentRoundContext_ = _xrpl_guard_.has_value()
? _xrpl_guard_->context()
: opentelemetry::context::Context{};
// ... existing implementation ...
}
ConsensusPhase
RCLConsensusAdaptor::phaseTransition(ConsensusPhase newPhase)
{
// Create span for phase transition
auto span = app_.getTelemetry().startSpan(
"consensus.phase." + to_string(newPhase),
currentRoundContext_);
telemetry::SpanGuard guard(span);
guard.setAttribute("xrpl.consensus.phase", to_string(newPhase));
guard.addEvent("phase.enter");
auto const startTime = std::chrono::steady_clock::now();
try
{
auto result = doPhaseTransition(newPhase);
auto const duration = std::chrono::steady_clock::now() - startTime;
guard.setAttribute("xrpl.consensus.phase_duration_ms",
std::chrono::duration<double, std::milli>(duration).count());
guard.setOk();
return result;
}
catch (std::exception const& e)
{
guard.recordException(e);
throw;
}
}
void
RCLConsensusAdaptor::peerProposal(
NetClock::time_point const& now,
RCLCxPeerPos const& proposal)
{
// Extract trace context from proposal message
opentelemetry::context::Context parentCtx;
if (proposal.hasTraceContext())
{
parentCtx = telemetry::TraceContextPropagator::extract(
proposal.traceContext());
}
auto span = app_.getTelemetry().startSpan(
"consensus.proposal.receive",
parentCtx,
opentelemetry::trace::SpanKind::kServer);
telemetry::SpanGuard guard(span);
guard.setAttribute("xrpl.consensus.proposer",
toBase58(TokenType::NodePublic, proposal.nodeId()));
guard.setAttribute("xrpl.consensus.round",
static_cast<int64_t>(proposal.proposal().proposeSeq()));
// ... existing implementation ...
guard.setOk();
}
```
### 4.5.3 RPC Handler Instrumentation
```cpp
// src/xrpld/rpc/detail/ServerHandler.cpp (modified)
#include <xrpl/telemetry/TracingInstrumentation.h>
void
ServerHandler::onRequest(
http_request_type&& req,
std::function<void(http_response_type&&)>&& send)
{
// Extract trace context from HTTP headers (W3C Trace Context)
auto parentCtx = telemetry::TraceContextPropagator::extractFromHeaders(
[&req](std::string_view name) -> std::optional<std::string> {
            // basic_fields::find has a string_view overload for arbitrary
            // header names
            auto it = req.find(
                boost::beast::string_view(name.data(), name.size()));
if (it != req.end())
return std::string(it->value());
return std::nullopt;
});
// Start request span
auto span = app_.getTelemetry().startSpan(
"rpc.request",
parentCtx,
opentelemetry::trace::SpanKind::kServer);
telemetry::SpanGuard guard(span);
// Add HTTP attributes
guard.setAttribute("http.method", std::string(req.method_string()));
guard.setAttribute("http.target", std::string(req.target()));
guard.setAttribute("http.user_agent",
std::string(req[boost::beast::http::field::user_agent]));
auto const startTime = std::chrono::steady_clock::now();
try
{
// Parse and process request
auto const& body = req.body();
Json::Value jv;
Json::Reader reader;
if (!reader.parse(body, jv))
{
guard.setStatus(
opentelemetry::trace::StatusCode::kError,
"Invalid JSON");
sendError(send, "Invalid JSON");
return;
}
// Extract command name
std::string command = jv.isMember("command")
? jv["command"].asString()
: jv.isMember("method")
? jv["method"].asString()
: "unknown";
guard.setAttribute("xrpl.rpc.command", command);
// Create child span for command execution
auto cmdSpan = app_.getTelemetry().startSpan(
"rpc.command." + command);
{
telemetry::SpanGuard cmdGuard(cmdSpan);
// Execute RPC command
auto result = processRequest(jv);
// Record result attributes
if (result.isMember("status"))
{
cmdGuard.setAttribute("xrpl.rpc.status",
result["status"].asString());
}
if (result["status"].asString() == "error")
{
cmdGuard.setStatus(
opentelemetry::trace::StatusCode::kError,
result.isMember("error_message")
? result["error_message"].asString()
: "RPC error");
}
else
{
cmdGuard.setOk();
}
}
auto const duration = std::chrono::steady_clock::now() - startTime;
guard.setAttribute("http.duration_ms",
std::chrono::duration<double, std::milli>(duration).count());
        // Inject trace context into response headers (status and body are
        // populated by the existing response-building code)
        http_response_type resp;
telemetry::TraceContextPropagator::injectToHeaders(
guard.context(),
[&resp](std::string_view name, std::string_view value) {
resp.set(std::string(name), std::string(value));
});
guard.setOk();
send(std::move(resp));
}
catch (std::exception const& e)
{
guard.recordException(e);
JLOG(journal_.error()) << "RPC request failed: " << e.what();
sendError(send, e.what());
}
}
```
### 4.5.4 JobQueue Context Propagation
```cpp
// src/xrpld/core/JobQueue.h (modified)
#include <opentelemetry/context/context.h>
class Job
{
// ... existing members ...
// Captured trace context at job creation
opentelemetry::context::Context traceContext_;
public:
// Constructor captures current trace context
Job(JobType type, std::function<void()> func, ...)
: type_(type)
, func_(std::move(func))
, traceContext_(opentelemetry::context::RuntimeContext::GetCurrent())
// ... other initializations ...
{
}
// Get trace context for restoration during execution
opentelemetry::context::Context const&
traceContext() const { return traceContext_; }
};
// src/xrpld/core/JobQueue.cpp (modified)
void
Worker::run()
{
while (auto job = getJob())
{
// Restore trace context from job creation
auto token = opentelemetry::context::RuntimeContext::Attach(
job->traceContext());
// Start execution span
auto span = app_.getTelemetry().startSpan("job.execute");
telemetry::SpanGuard guard(span);
guard.setAttribute("xrpl.job.type", to_string(job->type()));
guard.setAttribute("xrpl.job.queue_ms", job->queueTimeMs());
guard.setAttribute("xrpl.job.worker", workerId_);
try
{
job->execute();
guard.setOk();
}
catch (std::exception const& e)
{
guard.recordException(e);
JLOG(journal_.error()) << "Job execution failed: " << e.what();
}
}
}
```
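With this in place, a span that is current when a job is enqueued becomes the parent of the worker's `job.execute` span, even though the work runs on a different thread. A sketch (the `addJob` call shown here is illustrative):

```cpp
void enqueueTransactionWork(Application& app, JobQueue& jobQueue)
{
    auto span = app.getTelemetry().startSpan("tx.enqueue");
    telemetry::SpanGuard guard(span);  // span is now current on this thread

    // The Job constructor snapshots the current runtime context, so the
    // worker's "job.execute" span links back to "tx.enqueue".
    jobQueue.addJob(jtTRANSACTION, "processTransaction", []() {
        // ... runs on a worker thread with the context re-attached ...
    });
}
```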
---
## 4.6 Span Flow Visualization
<div align="center">
```mermaid
flowchart TB
subgraph Client["External Client"]
submit["Submit TX"]
end
subgraph NodeA["rippled Node A"]
rpcA["rpc.request"]
cmdA["rpc.command.submit"]
txRecvA["tx.receive"]
txValA["tx.validate"]
txRelayA["tx.relay"]
end
subgraph NodeB["rippled Node B"]
txRecvB["tx.receive"]
txValB["tx.validate"]
txRelayB["tx.relay"]
end
subgraph NodeC["rippled Node C"]
txRecvC["tx.receive"]
consensusC["consensus.round"]
phaseC["consensus.phase.establish"]
end
submit --> rpcA
rpcA --> cmdA
cmdA --> txRecvA
txRecvA --> txValA
txValA --> txRelayA
txRelayA -.->|"TraceContext"| txRecvB
txRecvB --> txValB
txValB --> txRelayB
txRelayB -.->|"TraceContext"| txRecvC
txRecvC --> consensusC
consensusC --> phaseC
style Client fill:#334155,stroke:#1e293b,color:#fff
style NodeA fill:#1e3a8a,stroke:#172554,color:#fff
style NodeB fill:#064e3b,stroke:#022c22,color:#fff
style NodeC fill:#78350f,stroke:#451a03,color:#fff
style submit fill:#e2e8f0,stroke:#cbd5e1,color:#1e293b
style rpcA fill:#1d4ed8,stroke:#1e40af,color:#fff
style cmdA fill:#1d4ed8,stroke:#1e40af,color:#fff
style txRecvA fill:#047857,stroke:#064e3b,color:#fff
style txValA fill:#047857,stroke:#064e3b,color:#fff
style txRelayA fill:#047857,stroke:#064e3b,color:#fff
style txRecvB fill:#047857,stroke:#064e3b,color:#fff
style txValB fill:#047857,stroke:#064e3b,color:#fff
style txRelayB fill:#047857,stroke:#064e3b,color:#fff
style txRecvC fill:#047857,stroke:#064e3b,color:#fff
style consensusC fill:#fef3c7,stroke:#fde68a,color:#1e293b
style phaseC fill:#fef3c7,stroke:#fde68a,color:#1e293b
```
</div>
---
_Previous: [Implementation Strategy](./03-implementation-strategy.md)_ | _Next: [Configuration Reference](./05-configuration-reference.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,936 @@
# Configuration Reference
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Code Samples](./04-code-samples.md) | [Implementation Phases](./06-implementation-phases.md)
---
## 5.1 rippled Configuration
### 5.1.1 Configuration File Section
Add to `cfg/xrpld-example.cfg`:
```ini
# ═══════════════════════════════════════════════════════════════════════════════
# TELEMETRY (OpenTelemetry Distributed Tracing)
# ═══════════════════════════════════════════════════════════════════════════════
#
# Enables distributed tracing for transaction flow, consensus, and RPC calls.
# Traces are exported to an OpenTelemetry Collector using OTLP protocol.
#
# [telemetry]
#
# # Enable/disable telemetry (default: 0 = disabled)
# enabled=1
#
# # Exporter type: "otlp_grpc" (default), "otlp_http", or "none"
# exporter=otlp_grpc
#
# # OTLP endpoint (default: localhost:4317 for gRPC, localhost:4318 for HTTP)
# endpoint=localhost:4317
#
# # Use TLS for exporter connection (default: 0)
# use_tls=0
#
# # Path to CA certificate for TLS (optional)
# # tls_ca_cert=/path/to/ca.crt
#
# # Sampling ratio: 0.0-1.0 (default: 1.0 = 100% sampling)
# # Use lower values in production to reduce overhead
# sampling_ratio=0.1
#
# # Batch processor settings
# batch_size=512 # Spans per batch (default: 512)
# batch_delay_ms=5000 # Max delay before sending batch (default: 5000)
# max_queue_size=2048 # Max queued spans (default: 2048)
#
# # Component-specific tracing (default: all enabled except peer)
# trace_transactions=1 # Transaction relay and processing
# trace_consensus=1 # Consensus rounds and proposals
# trace_rpc=1 # RPC request handling
# trace_peer=0 # Peer messages (high volume, disabled by default)
# trace_ledger=1 # Ledger acquisition and building
#
# # Service identification (automatically detected if not specified)
# # service_name=rippled
# # service_instance_id=<node_public_key>
[telemetry]
enabled=0
```
### 5.1.2 Configuration Options Summary
| Option | Type | Default | Description |
| --------------------- | ------ | ---------------- | ----------------------------------------- |
| `enabled` | bool | `false` | Enable/disable telemetry |
| `exporter` | string | `"otlp_grpc"` | Exporter type: otlp_grpc, otlp_http, none |
| `endpoint` | string | `localhost:4317` | OTLP collector endpoint |
| `use_tls` | bool | `false` | Enable TLS for exporter connection |
| `tls_ca_cert` | string | `""` | Path to CA certificate file |
| `sampling_ratio` | float | `1.0` | Sampling ratio (0.0-1.0) |
| `batch_size` | uint | `512` | Spans per export batch |
| `batch_delay_ms` | uint | `5000` | Max delay before sending batch (ms) |
| `max_queue_size` | uint | `2048` | Maximum queued spans |
| `trace_transactions` | bool | `true` | Enable transaction tracing |
| `trace_consensus` | bool | `true` | Enable consensus tracing |
| `trace_rpc` | bool | `true` | Enable RPC tracing |
| `trace_peer` | bool | `false` | Enable peer message tracing (high volume) |
| `trace_ledger` | bool | `true` | Enable ledger tracing |
| `service_name` | string | `"rippled"` | Service name for traces |
| `service_instance_id` | string | `<node_pubkey>` | Instance identifier |
---
## 5.2 Configuration Parser
```cpp
// src/libxrpl/telemetry/TelemetryConfig.cpp
#include <xrpl/telemetry/Telemetry.h>
#include <xrpl/basics/Log.h>
namespace xrpl {
namespace telemetry {
Telemetry::Setup
setup_Telemetry(
Section const& section,
std::string const& nodePublicKey,
std::string const& version)
{
Telemetry::Setup setup;
// Basic settings
setup.enabled = section.value_or("enabled", false);
setup.serviceName = section.value_or("service_name", "rippled");
setup.serviceVersion = version;
setup.serviceInstanceId = section.value_or(
"service_instance_id", nodePublicKey);
// Exporter settings
setup.exporterType = section.value_or("exporter", "otlp_grpc");
if (setup.exporterType == "otlp_grpc")
setup.exporterEndpoint = section.value_or("endpoint", "localhost:4317");
else if (setup.exporterType == "otlp_http")
setup.exporterEndpoint = section.value_or("endpoint", "localhost:4318");
setup.useTls = section.value_or("use_tls", false);
setup.tlsCertPath = section.value_or("tls_ca_cert", "");
// Sampling
setup.samplingRatio = section.value_or("sampling_ratio", 1.0);
if (setup.samplingRatio < 0.0 || setup.samplingRatio > 1.0)
{
Throw<std::runtime_error>(
"telemetry.sampling_ratio must be between 0.0 and 1.0");
}
// Batch processor
setup.batchSize = section.value_or("batch_size", 512u);
setup.batchDelay = std::chrono::milliseconds{
section.value_or("batch_delay_ms", 5000u)};
setup.maxQueueSize = section.value_or("max_queue_size", 2048u);
// Component filtering
setup.traceTransactions = section.value_or("trace_transactions", true);
setup.traceConsensus = section.value_or("trace_consensus", true);
setup.traceRpc = section.value_or("trace_rpc", true);
setup.tracePeer = section.value_or("trace_peer", false);
setup.traceLedger = section.value_or("trace_ledger", true);
return setup;
}
} // namespace telemetry
} // namespace xrpl
```
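A unit-test-style sketch of the parser's range validation; the `Section` mutation calls and the placeholder key/version strings are schematic (rippled builds sections from the parsed config file):

```cpp
#include <xrpl/telemetry/Telemetry.h>
#include <cassert>
#include <stdexcept>

void testSamplingRatioValidation()
{
    Section section("telemetry");
    section.set("enabled", "1");
    section.set("sampling_ratio", "1.5");  // deliberately out of range

    bool threw = false;
    try
    {
        auto setup = xrpl::telemetry::setup_Telemetry(
            section, /* nodePublicKey */ "n9K...", /* version */ "2.0.0");
        (void)setup;
    }
    catch (std::runtime_error const&)
    {
        threw = true;  // expected: ratio must lie within [0.0, 1.0]
    }
    assert(threw);
}
```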
---
## 5.3 Application Integration
### 5.3.1 ApplicationImp Changes
```cpp
// src/xrpld/app/main/Application.cpp (modified)
#include <xrpl/telemetry/Telemetry.h>
class ApplicationImp : public Application
{
// ... existing members ...
// Telemetry (must be constructed early, destroyed late)
std::unique_ptr<telemetry::Telemetry> telemetry_;
public:
ApplicationImp(...)
{
// Initialize telemetry early (before other components)
auto telemetrySection = config_->section("telemetry");
auto telemetrySetup = telemetry::setup_Telemetry(
telemetrySection,
toBase58(TokenType::NodePublic, nodeIdentity_.publicKey()),
BuildInfo::getVersionString());
// Set network attributes
telemetrySetup.networkId = config_->NETWORK_ID;
telemetrySetup.networkType = [&]() {
if (config_->NETWORK_ID == 0) return "mainnet";
if (config_->NETWORK_ID == 1) return "testnet";
if (config_->NETWORK_ID == 2) return "devnet";
return "custom";
}();
telemetry_ = telemetry::make_Telemetry(
telemetrySetup,
logs_->journal("Telemetry"));
// ... rest of initialization ...
}
void start() override
{
// Start telemetry first
if (telemetry_)
telemetry_->start();
// ... existing start code ...
}
void stop() override
{
// ... existing stop code ...
// Stop telemetry last (to capture shutdown spans)
if (telemetry_)
telemetry_->stop();
}
telemetry::Telemetry& getTelemetry() override
{
assert(telemetry_);
return *telemetry_;
}
};
```
### 5.3.2 Application Interface Addition
```cpp
// include/xrpl/app/main/Application.h (modified)
namespace telemetry { class Telemetry; }
class Application
{
public:
// ... existing virtual methods ...
/** Get the telemetry system for distributed tracing */
virtual telemetry::Telemetry& getTelemetry() = 0;
};
```
---
## 5.4 CMake Integration
### 5.4.1 Find OpenTelemetry Module
```cmake
# cmake/FindOpenTelemetry.cmake
# Find OpenTelemetry C++ SDK
#
# This module defines:
# OpenTelemetry_FOUND - System has OpenTelemetry
# OpenTelemetry::api - API library target
# OpenTelemetry::sdk - SDK library target
# OpenTelemetry::otlp_grpc_exporter - OTLP gRPC exporter target
# OpenTelemetry::otlp_http_exporter - OTLP HTTP exporter target
find_package(opentelemetry-cpp CONFIG QUIET)
if(opentelemetry-cpp_FOUND)
set(OpenTelemetry_FOUND TRUE)
# Create imported targets if not already created by config
if(NOT TARGET OpenTelemetry::api)
add_library(OpenTelemetry::api ALIAS opentelemetry-cpp::api)
endif()
if(NOT TARGET OpenTelemetry::sdk)
add_library(OpenTelemetry::sdk ALIAS opentelemetry-cpp::sdk)
endif()
  if(NOT TARGET OpenTelemetry::otlp_grpc_exporter)
    add_library(OpenTelemetry::otlp_grpc_exporter ALIAS
      opentelemetry-cpp::otlp_grpc_exporter)
  endif()
  if(NOT TARGET OpenTelemetry::otlp_http_exporter)
    add_library(OpenTelemetry::otlp_http_exporter ALIAS
      opentelemetry-cpp::otlp_http_exporter)
  endif()
else()
# Try pkg-config fallback
find_package(PkgConfig QUIET)
if(PKG_CONFIG_FOUND)
pkg_check_modules(OTEL opentelemetry-cpp QUIET)
if(OTEL_FOUND)
set(OpenTelemetry_FOUND TRUE)
# Create imported targets from pkg-config
add_library(OpenTelemetry::api INTERFACE IMPORTED)
target_include_directories(OpenTelemetry::api INTERFACE
${OTEL_INCLUDE_DIRS})
endif()
endif()
endif()
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(OpenTelemetry
REQUIRED_VARS OpenTelemetry_FOUND)
```
### 5.4.2 CMakeLists.txt Changes
```cmake
# CMakeLists.txt (additions)
# ═══════════════════════════════════════════════════════════════════════════════
# TELEMETRY OPTIONS
# ═══════════════════════════════════════════════════════════════════════════════
option(XRPL_ENABLE_TELEMETRY
"Enable OpenTelemetry distributed tracing support" OFF)
if(XRPL_ENABLE_TELEMETRY)
find_package(OpenTelemetry REQUIRED)
# Define compile-time flag
add_compile_definitions(XRPL_ENABLE_TELEMETRY)
message(STATUS "OpenTelemetry tracing: ENABLED")
else()
message(STATUS "OpenTelemetry tracing: DISABLED")
endif()
# ═══════════════════════════════════════════════════════════════════════════════
# TELEMETRY LIBRARY
# ═══════════════════════════════════════════════════════════════════════════════
if(XRPL_ENABLE_TELEMETRY)
add_library(xrpl_telemetry
src/libxrpl/telemetry/Telemetry.cpp
src/libxrpl/telemetry/TelemetryConfig.cpp
src/libxrpl/telemetry/TraceContext.cpp
)
target_include_directories(xrpl_telemetry
PUBLIC
${CMAKE_CURRENT_SOURCE_DIR}/include
)
target_link_libraries(xrpl_telemetry
PUBLIC
OpenTelemetry::api
OpenTelemetry::sdk
OpenTelemetry::otlp_grpc_exporter
PRIVATE
xrpl_basics
)
# Add to main library dependencies
target_link_libraries(xrpld PRIVATE xrpl_telemetry)
else()
# Create null implementation library
add_library(xrpl_telemetry
src/libxrpl/telemetry/NullTelemetry.cpp
)
target_include_directories(xrpl_telemetry
PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include
)
endif()
```
---
## 5.5 OpenTelemetry Collector Configuration
### 5.5.1 Development Configuration
```yaml
# otel-collector-dev.yaml
# Minimal configuration for local development
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 1s
send_batch_size: 100
exporters:
# Console output for debugging
logging:
verbosity: detailed
sampling_initial: 5
sampling_thereafter: 200
  # Jaeger for trace visualization (Jaeger ingests OTLP natively; the
  # dedicated jaeger exporter was removed from newer collector releases)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
      exporters: [logging, otlp/jaeger]
```
### 5.5.2 Production Configuration
```yaml
# otel-collector-prod.yaml
# Production configuration with filtering, sampling, and multiple backends
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
tls:
cert_file: /etc/otel/server.crt
key_file: /etc/otel/server.key
ca_file: /etc/otel/ca.crt
processors:
# Memory limiter to prevent OOM
memory_limiter:
check_interval: 1s
limit_mib: 1000
spike_limit_mib: 200
# Batch processing for efficiency
batch:
timeout: 5s
send_batch_size: 512
send_batch_max_size: 1024
# Tail-based sampling (keep errors and slow traces)
tail_sampling:
decision_wait: 10s
num_traces: 100000
expected_new_traces_per_sec: 1000
policies:
# Always keep error traces
- name: errors
type: status_code
status_code:
status_codes: [ERROR]
# Keep slow consensus rounds (>5s)
- name: slow-consensus
type: latency
latency:
threshold_ms: 5000
# Keep slow RPC requests (>1s)
- name: slow-rpc
type: and
and:
and_sub_policy:
- name: rpc-spans
type: string_attribute
string_attribute:
key: xrpl.rpc.command
values: [".*"]
enabled_regex_matching: true
- name: latency
type: latency
latency:
threshold_ms: 1000
# Probabilistic sampling for the rest
- name: probabilistic
type: probabilistic
probabilistic:
sampling_percentage: 10
# Attribute processing
attributes:
actions:
# Hash sensitive data
- key: xrpl.tx.account
action: hash
# Add deployment info
- key: deployment.environment
value: production
action: upsert
exporters:
# Grafana Tempo for long-term storage
otlp/tempo:
endpoint: tempo.monitoring:4317
tls:
insecure: false
ca_file: /etc/otel/tempo-ca.crt
# Elastic APM for correlation with logs
otlp/elastic:
endpoint: apm.elastic:8200
headers:
Authorization: "Bearer ${ELASTIC_APM_TOKEN}"
extensions:
health_check:
endpoint: 0.0.0.0:13133
zpages:
endpoint: 0.0.0.0:55679
service:
extensions: [health_check, zpages]
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, tail_sampling, attributes, batch]
exporters: [otlp/tempo, otlp/elastic]
```
---
## 5.6 Docker Compose Development Environment
```yaml
# docker-compose-telemetry.yaml
version: "3.8"
services:
# OpenTelemetry Collector
otel-collector:
image: otel/opentelemetry-collector-contrib:0.92.0
container_name: otel-collector
command: ["--config=/etc/otel-collector-config.yaml"]
volumes:
- ./otel-collector-dev.yaml:/etc/otel-collector-config.yaml:ro
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "13133:13133" # Health check
depends_on:
- jaeger
# Jaeger for trace visualization
jaeger:
image: jaegertracing/all-in-one:1.53
container_name: jaeger
environment:
- COLLECTOR_OTLP_ENABLED=true
ports:
- "16686:16686" # UI
- "14250:14250" # gRPC
# Grafana for dashboards
grafana:
image: grafana/grafana:10.2.3
container_name: grafana
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
volumes:
- ./grafana/provisioning:/etc/grafana/provisioning:ro
- ./grafana/dashboards:/var/lib/grafana/dashboards:ro
ports:
- "3000:3000"
depends_on:
- jaeger
# Prometheus for metrics (optional, for correlation)
prometheus:
image: prom/prometheus:v2.48.1
container_name: prometheus
volumes:
- ./prometheus.yaml:/etc/prometheus/prometheus.yml:ro
ports:
- "9090:9090"
networks:
default:
name: rippled-telemetry
```
---
## 5.7 Configuration Architecture
```mermaid
flowchart TB
subgraph config["Configuration Sources"]
cfgFile["xrpld.cfg<br/>[telemetry] section"]
cmake["CMake<br/>XRPL_ENABLE_TELEMETRY"]
end
subgraph init["Initialization"]
parse["setup_Telemetry()"]
factory["make_Telemetry()"]
end
subgraph runtime["Runtime Components"]
tracer["TracerProvider"]
exporter["OTLP Exporter"]
processor["BatchProcessor"]
end
subgraph collector["Collector Pipeline"]
recv["Receivers"]
proc["Processors"]
exp["Exporters"]
end
cfgFile --> parse
cmake -->|"compile flag"| parse
parse --> factory
factory --> tracer
tracer --> processor
processor --> exporter
exporter -->|"OTLP"| recv
recv --> proc
proc --> exp
style config fill:#e3f2fd,stroke:#1976d2
style runtime fill:#e8f5e9,stroke:#388e3c
style collector fill:#fff3e0,stroke:#ff9800
```
---
## 5.8 Grafana Integration
Step-by-step instructions for integrating rippled traces with Grafana.
### 5.8.1 Data Source Configuration
#### Tempo (Recommended)
```yaml
# grafana/provisioning/datasources/tempo.yaml
apiVersion: 1
datasources:
- name: Tempo
type: tempo
access: proxy
url: http://tempo:3200
jsonData:
httpMethod: GET
tracesToLogs:
datasourceUid: loki
tags: ["service.name", "xrpl.tx.hash"]
mappedTags: [{ key: "trace_id", value: "traceID" }]
mapTagNamesEnabled: true
filterByTraceID: true
serviceMap:
datasourceUid: prometheus
nodeGraph:
enabled: true
search:
hide: false
lokiSearch:
datasourceUid: loki
```
#### Jaeger
```yaml
# grafana/provisioning/datasources/jaeger.yaml
apiVersion: 1
datasources:
- name: Jaeger
type: jaeger
access: proxy
url: http://jaeger:16686
jsonData:
tracesToLogs:
datasourceUid: loki
tags: ["service.name"]
```
#### Elastic APM
```yaml
# grafana/provisioning/datasources/elastic-apm.yaml
apiVersion: 1
datasources:
- name: Elasticsearch-APM
type: elasticsearch
access: proxy
url: http://elasticsearch:9200
database: "apm-*"
jsonData:
esVersion: "8.0.0"
timeField: "@timestamp"
logMessageField: message
logLevelField: log.level
```
### 5.8.2 Dashboard Provisioning
```yaml
# grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
- name: "rippled-dashboards"
orgId: 1
folder: "rippled"
folderUid: "rippled"
type: file
disableDeletion: false
updateIntervalSeconds: 30
options:
path: /var/lib/grafana/dashboards/rippled
```
### 5.8.3 Example Dashboard: RPC Performance
```json
{
"title": "rippled RPC Performance",
"uid": "rippled-rpc-performance",
"panels": [
{
"title": "RPC Latency by Command",
"type": "heatmap",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && span.xrpl.rpc.command != \"\"} | histogram_over_time(duration) by (span.xrpl.rpc.command)"
}
],
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
},
{
"title": "RPC Error Rate",
"type": "timeseries",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && status.code=error} | rate() by (span.xrpl.rpc.command)"
}
],
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 }
},
{
"title": "Top 10 Slowest RPC Commands",
"type": "table",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && span.xrpl.rpc.command != \"\"} | avg(duration) by (span.xrpl.rpc.command) | topk(10)"
}
],
"gridPos": { "h": 8, "w": 24, "x": 0, "y": 8 }
},
{
"title": "Recent Traces",
"type": "table",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\"}"
}
],
"gridPos": { "h": 8, "w": 24, "x": 0, "y": 16 }
}
]
}
```
### 5.8.4 Example Dashboard: Transaction Tracing
```json
{
"title": "rippled Transaction Tracing",
"uid": "rippled-tx-tracing",
"panels": [
{
"title": "Transaction Throughput",
"type": "stat",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"tx.receive\"} | rate()"
}
],
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 0 }
},
{
"title": "Cross-Node Relay Count",
"type": "timeseries",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"tx.relay\"} | avg(span.xrpl.tx.relay_count)"
}
],
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 }
},
{
"title": "Transaction Validation Errors",
"type": "table",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"tx.validate\" && status.code=error}"
}
],
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 }
}
]
}
```
### 5.8.5 TraceQL Query Examples
Common queries for rippled traces:
```
# Find all traces for a specific transaction hash
{resource.service.name="rippled" && span.xrpl.tx.hash="ABC123..."}
# Find slow RPC commands (>100ms)
{resource.service.name="rippled" && name=~"rpc.command.*" && duration > 100ms}
# Find consensus rounds taking >5 seconds
{resource.service.name="rippled" && name="consensus.round" && duration > 5s}
# Find failed transactions with error details
{resource.service.name="rippled" && name="tx.validate" && status=error}
# Find transactions relayed to many peers
{resource.service.name="rippled" && name="tx.relay" && span.xrpl.tx.relay_count > 10}
# Compare latency across nodes
{resource.service.name="rippled" && name="rpc.command.account_info"} | avg(duration) by (resource.service.instance.id)
```
### 5.8.6 Correlation with PerfLog
To correlate OpenTelemetry traces with existing PerfLog data:
**Step 1: Configure Loki to ingest PerfLog**
```yaml
# promtail-config.yaml
scrape_configs:
- job_name: rippled-perflog
static_configs:
- targets:
- localhost
labels:
job: rippled
__path__: /var/log/rippled/perf*.log
pipeline_stages:
- json:
expressions:
trace_id: trace_id
ledger_seq: ledger_seq
tx_hash: tx_hash
- labels:
trace_id:
ledger_seq:
tx_hash:
```
**Step 2: Add trace_id to PerfLog entries**
Modify PerfLog to include trace_id when available:
```cpp
// In PerfLog output, add trace_id from current span context
void logPerf(Json::Value& entry) {
auto span = opentelemetry::trace::GetSpan(
opentelemetry::context::RuntimeContext::GetCurrent());
    if (span && span->GetContext().IsValid()) {
        // ToLowerBase16 expects a buffer of exactly 2 * kSize characters
        char traceIdHex[2 * opentelemetry::trace::TraceId::kSize];
        span->GetContext().trace_id().ToLowerBase16(traceIdHex);
        entry["trace_id"] = std::string(traceIdHex, sizeof(traceIdHex));
    }
// ... existing logging
}
```
**Step 3: Configure Grafana trace-to-logs link**
In Tempo data source configuration, set up the derived field:
```yaml
jsonData:
tracesToLogs:
datasourceUid: loki
tags: ["trace_id", "xrpl.tx.hash"]
filterByTraceID: true
filterBySpanID: false
```
### 5.8.7 Correlation with Insight/StatsD Metrics
To correlate traces with existing Beast Insight metrics:
**Step 1: Export Insight metrics to Prometheus**
```yaml
# prometheus.yaml
scrape_configs:
- job_name: "rippled-statsd"
static_configs:
- targets: ["statsd-exporter:9102"]
```
**Step 2: Add exemplars to metrics**
The OpenTelemetry SDK automatically adds exemplars (trace IDs) to metrics when using the Prometheus exporter. This links metric spikes to specific traces.
**Step 3: Configure Grafana metric-to-trace link**
```yaml
# In Prometheus data source
jsonData:
exemplarTraceIdDestinations:
- name: trace_id
datasourceUid: tempo
```
**Step 4: Dashboard panel with exemplars**
```json
{
"title": "RPC Latency with Trace Links",
"type": "timeseries",
"datasource": "Prometheus",
"targets": [
{
"expr": "histogram_quantile(0.99, rate(rippled_rpc_duration_seconds_bucket[5m]))",
"exemplar": true
}
]
}
```
This allows clicking on metric data points to jump directly to the related trace.
---
_Previous: [Code Samples](./04-code-samples.md)_ | _Next: [Implementation Phases](./06-implementation-phases.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,543 @@
# Implementation Phases
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Configuration Reference](./05-configuration-reference.md) | [Observability Backends](./07-observability-backends.md)
---
## 6.1 Phase Overview
```mermaid
gantt
title OpenTelemetry Implementation Timeline
dateFormat YYYY-MM-DD
axisFormat Week %W
section Phase 1
Core Infrastructure :p1, 2024-01-01, 2w
SDK Integration :p1a, 2024-01-01, 4d
Telemetry Interface :p1b, after p1a, 3d
Configuration & CMake :p1c, after p1b, 3d
Unit Tests :p1d, after p1c, 2d
section Phase 2
RPC Tracing :p2, after p1, 2w
HTTP Context Extraction :p2a, after p1, 2d
RPC Handler Instrumentation :p2b, after p2a, 4d
WebSocket Support :p2c, after p2b, 2d
Integration Tests :p2d, after p2c, 2d
section Phase 3
Transaction Tracing :p3, after p2, 2w
Protocol Buffer Extension :p3a, after p2, 2d
PeerImp Instrumentation :p3b, after p3a, 3d
Relay Context Propagation :p3c, after p3b, 3d
Multi-node Tests :p3d, after p3c, 2d
section Phase 4
Consensus Tracing :p4, after p3, 2w
Consensus Round Spans :p4a, after p3, 3d
Proposal Handling :p4b, after p4a, 3d
Validation Tests :p4c, after p4b, 4d
section Phase 5
Documentation & Deploy :p5, after p4, 1w
```
---
## 6.2 Phase 1: Core Infrastructure (Weeks 1-2)
**Objective**: Establish foundational telemetry infrastructure
### Tasks
| Task | Description | Effort | Risk |
| ---- | ----------------------------------------------------- | ------ | ------ |
| 1.1 | Add OpenTelemetry C++ SDK to Conan/CMake | 2d | Low |
| 1.2 | Implement `Telemetry` interface and factory | 2d | Low |
| 1.3 | Implement `SpanGuard` RAII wrapper | 1d | Low |
| 1.4 | Implement configuration parser | 1d | Low |
| 1.5 | Integrate into `ApplicationImp` | 1d | Medium |
| 1.6 | Add conditional compilation (`XRPL_ENABLE_TELEMETRY`) | 1d | Low |
| 1.7 | Create `NullTelemetry` no-op implementation | 0.5d | Low |
| 1.8 | Unit tests for core infrastructure | 1.5d | Low |
**Total Effort**: 10 days (2 developers)
### Exit Criteria
- [ ] OpenTelemetry SDK compiles and links
- [ ] Telemetry can be enabled/disabled via config
- [ ] Basic span creation works
- [ ] No performance regression when disabled
- [ ] Unit tests passing
---
## 6.3 Phase 2: RPC Tracing (Weeks 3-4)
**Objective**: Complete tracing for all RPC operations
### Tasks
| Task | Description | Effort | Risk |
| ---- | -------------------------------------------------- | ------ | ------ |
| 2.1 | Implement W3C Trace Context HTTP header extraction | 1d | Low |
| 2.2 | Instrument `ServerHandler::onRequest()` | 1d | Low |
| 2.3 | Instrument `RPCHandler::doCommand()` | 2d | Medium |
| 2.4 | Add RPC-specific attributes | 1d | Low |
| 2.5 | Instrument WebSocket handler | 1d | Medium |
| 2.6 | Integration tests for RPC tracing | 2d | Low |
| 2.7 | Performance benchmarks | 1d | Low |
| 2.8 | Documentation | 1d | Low |
**Total Effort**: 10 days
### Exit Criteria
- [ ] All RPC commands traced
- [ ] Trace context propagates from HTTP headers
- [ ] WebSocket and HTTP both instrumented
- [ ] <1ms overhead per RPC call
- [ ] Integration tests passing
---
## 6.4 Phase 3: Transaction Tracing (Weeks 5-6)
**Objective**: Trace transaction lifecycle across network
### Tasks
| Task | Description | Effort | Risk |
| ---- | --------------------------------------------- | ------ | ------ |
| 3.1 | Define `TraceContext` Protocol Buffer message | 1d | Low |
| 3.2 | Implement protobuf context serialization | 1d | Low |
| 3.3 | Instrument `PeerImp::handleTransaction()` | 2d | Medium |
| 3.4 | Instrument `NetworkOPs::submitTransaction()` | 1d | Medium |
| 3.5 | Instrument HashRouter integration | 1d | Medium |
| 3.6 | Implement relay context propagation | 2d | High |
| 3.7 | Integration tests (multi-node) | 2d | Medium |
| 3.8 | Performance benchmarks | 1d | Low |
**Total Effort**: 11 days
### Exit Criteria
- [ ] Transaction traces span across nodes
- [ ] Trace context in Protocol Buffer messages
- [ ] HashRouter deduplication visible in traces
- [ ] Multi-node integration tests passing
- [ ] <5% overhead on transaction throughput
---
## 6.5 Phase 4: Consensus Tracing (Weeks 7-8)
**Objective**: Full observability into consensus rounds
### Tasks
| Task | Description | Effort | Risk |
| ---- | ---------------------------------------------- | ------ | ------ |
| 4.1 | Instrument `RCLConsensusAdaptor::startRound()` | 1d | Medium |
| 4.2 | Instrument phase transitions | 2d | Medium |
| 4.3 | Instrument proposal handling | 2d | High |
| 4.4 | Instrument validation handling | 1d | Medium |
| 4.5 | Add consensus-specific attributes | 1d | Low |
| 4.6 | Correlate with transaction traces | 1d | Medium |
| 4.7 | Multi-validator integration tests | 2d | High |
| 4.8 | Performance validation | 1d | Medium |
**Total Effort**: 11 days
### Exit Criteria
- [ ] Complete consensus round traces
- [ ] Phase transitions visible
- [ ] Proposals and validations traced
- [ ] No impact on consensus timing
- [ ] Multi-validator test network validated
---
## 6.6 Phase 5: Documentation & Deployment (Week 9)
**Objective**: Production readiness
### Tasks
| Task | Description | Effort | Risk |
| ---- | ----------------------------- | ------ | ---- |
| 5.1 | Operator runbook | 1d | Low |
| 5.2 | Grafana dashboards | 1d | Low |
| 5.3 | Alert definitions | 0.5d | Low |
| 5.4 | Collector deployment examples | 0.5d | Low |
| 5.5 | Developer documentation | 1d | Low |
| 5.6 | Training materials | 0.5d | Low |
| 5.7 | Final integration testing | 0.5d | Low |
**Total Effort**: 5 days
---
## 6.7 Risk Assessment
```mermaid
quadrantChart
title Risk Assessment Matrix
x-axis Low Impact --> High Impact
y-axis Low Likelihood --> High Likelihood
quadrant-1 Monitor Closely
quadrant-2 Mitigate Immediately
quadrant-3 Accept Risk
quadrant-4 Plan Mitigation
SDK Compatibility: [0.25, 0.2]
Protocol Changes: [0.75, 0.65]
Performance Overhead: [0.65, 0.45]
Context Propagation: [0.5, 0.5]
Memory Leaks: [0.8, 0.2]
```
### Risk Details
| Risk | Likelihood | Impact | Mitigation |
| ------------------------------------ | ---------- | ------ | --------------------------------------- |
| Protocol changes break compatibility | Medium | High | Use high field numbers, optional fields |
| Performance overhead unacceptable | Medium | Medium | Sampling, conditional compilation |
| Context propagation complexity | Medium | Medium | Phased rollout, extensive testing |
| SDK compatibility issues | Low | Medium | Pin SDK version, fallback to no-op |
| Memory leaks in long-running nodes | Low | High | Memory profiling, bounded queues |
---
## 6.8 Success Metrics
| Metric | Target | Measurement |
| ------------------------ | ------------------------------ | --------------------- |
| Trace coverage | >95% of transactions | Sampling verification |
| CPU overhead | <3% | Benchmark tests |
| Memory overhead | <5 MB | Memory profiling |
| Latency impact (p99) | <2% | Performance tests |
| Trace completeness | >99% spans with required attrs | Validation script |
| Cross-node trace linkage | >90% of multi-hop transactions | Integration tests |
---
## 6.9 Effort Summary
<div align="center">
```mermaid
%%{init: {'pie': {'textPosition': 0.75}}}%%
pie showData
"Phase 1: Core Infrastructure" : 10
"Phase 2: RPC Tracing" : 10
"Phase 3: Transaction Tracing" : 11
"Phase 4: Consensus Tracing" : 11
"Phase 5: Documentation" : 5
```
**Total Effort Distribution (47 developer-days)**
</div>
### Resource Requirements
| Phase | Developers | Duration | Total Effort |
| --------- | ---------- | ----------- | ------------ |
| 1 | 2 | 2 weeks | 10 days |
| 2 | 1-2 | 2 weeks | 10 days |
| 3 | 2 | 2 weeks | 11 days |
| 4 | 2 | 2 weeks | 11 days |
| 5 | 1 | 1 week | 5 days |
| **Total** | **2** | **9 weeks** | **47 days** |
---
## 6.10 Quick Wins and Crawl-Walk-Run Strategy
This section outlines a prioritized approach to maximize ROI with minimal initial investment.
### 6.10.1 Crawl-Walk-Run Overview
<div align="center">
```mermaid
flowchart TB
subgraph crawl["🐢 CRAWL (Week 1-2)"]
direction LR
c1[Core SDK Setup] ~~~ c2[RPC Tracing Only] ~~~ c3[Single Node]
end
subgraph walk["🚶 WALK (Week 3-5)"]
direction LR
w1[Transaction Tracing] ~~~ w2[Cross-Node Context] ~~~ w3[Basic Dashboards]
end
subgraph run["🏃 RUN (Week 6-9)"]
direction LR
r1[Consensus Tracing] ~~~ r2[Full Correlation] ~~~ r3[Production Deploy]
end
crawl --> walk --> run
style crawl fill:#1b5e20,stroke:#0d3d14,color:#fff
style walk fill:#bf360c,stroke:#8c2809,color:#fff
style run fill:#0d47a1,stroke:#082f6a,color:#fff
style c1 fill:#1b5e20,stroke:#0d3d14,color:#fff
style c2 fill:#1b5e20,stroke:#0d3d14,color:#fff
style c3 fill:#1b5e20,stroke:#0d3d14,color:#fff
style w1 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style w2 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style w3 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style r1 fill:#0d47a1,stroke:#082f6a,color:#fff
style r2 fill:#0d47a1,stroke:#082f6a,color:#fff
style r3 fill:#0d47a1,stroke:#082f6a,color:#fff
```
</div>
### 6.10.2 Quick Wins (Immediate Value)
| Quick Win | Effort | Value | When to Deploy |
| ------------------------------ | -------- | ------ | -------------- |
| **RPC Command Tracing** | 2 days | High | Week 2 |
| **RPC Latency Histograms** | 0.5 days | High | Week 2 |
| **Error Rate Dashboard** | 0.5 days | Medium | Week 2 |
| **Transaction Submit Tracing** | 1 day | High | Week 3 |
| **Consensus Round Duration** | 1 day | Medium | Week 6 |
### 6.10.3 CRAWL Phase (Weeks 1-2)
**Goal**: Get basic tracing working with minimal code changes.
**What You Get**:
- RPC request/response traces for all commands
- Latency breakdown per RPC command
- Error visibility with stack traces
- Basic Grafana dashboard
**Code Changes**: ~15 lines in `ServerHandler.cpp`, ~40 lines in new telemetry module
**Why Start Here**:
- RPC is the lowest-risk, highest-visibility component
- Immediate value for debugging client issues
- No cross-node complexity
- Single file modification to existing code
### 6.10.4 WALK Phase (Weeks 3-5)
**Goal**: Add transaction lifecycle tracing across nodes.
**What You Get**:
- End-to-end transaction traces from submit to relay
- Cross-node correlation (see transaction path)
- HashRouter deduplication visibility
- Relay latency metrics
**Code Changes**: ~120 lines across 4 files, plus protobuf extension
**Why Do This Second**:
- Builds on RPC tracing (transactions submitted via RPC)
- Moderate complexity (requires context propagation)
- High value for debugging transaction issues
### 6.10.5 RUN Phase (Weeks 6-9)
**Goal**: Full observability including consensus.
**What You Get**:
- Complete consensus round visibility
- Phase transition timing
- Validator proposal tracking
- Full end-to-end traces (client → RPC → TX → consensus → ledger)
**Code Changes**: ~100 lines across 3 consensus files
**Why Do This Last**:
- Highest complexity (consensus is critical path)
- Requires thorough testing
- Lower relative value (consensus issues are rarer)
### 6.10.6 ROI Prioritization Matrix
```mermaid
quadrantChart
title Implementation ROI Matrix
x-axis Low Effort --> High Effort
y-axis Low Value --> High Value
quadrant-1 Quick Wins - Do First
quadrant-2 Major Projects - Plan Carefully
quadrant-3 Nice to Have - Optional
quadrant-4 Time Sinks - Avoid
RPC Tracing: [0.15, 0.9]
TX Submit Trace: [0.25, 0.85]
TX Relay Trace: [0.5, 0.8]
Consensus Trace: [0.7, 0.75]
Peer Message Trace: [0.85, 0.3]
Ledger Acquire: [0.55, 0.5]
```
---
## 6.11 Definition of Done
Clear, measurable criteria for each phase.
### 6.11.1 Phase 1: Core Infrastructure
| Criterion | Measurement | Target |
| --------------- | ---------------------------------------------------------- | ---------------------------- |
| SDK Integration | `cmake --build` succeeds with `-DXRPL_ENABLE_TELEMETRY=ON` | ✅ Compiles |
| Runtime Toggle | `enabled=0` produces zero overhead | <0.1% CPU difference |
| Span Creation | Unit test creates and exports span | Span appears in Jaeger |
| Configuration | All config options parsed correctly | Config validation tests pass |
| Documentation | Developer guide exists | PR approved |
**Definition of Done**: All criteria met, PR merged, no regressions in CI.
### 6.11.2 Phase 2: RPC Tracing
| Criterion | Measurement | Target |
| ------------------ | ---------------------------------- | -------------------------- |
| Coverage | All RPC commands instrumented | 100% of commands |
| Context Extraction | traceparent header propagates | Integration test passes |
| Attributes | Command, status, duration recorded | Validation script confirms |
| Performance | RPC latency overhead | <1ms p99 |
| Dashboard | Grafana dashboard deployed | Screenshot in docs |
**Definition of Done**: RPC traces visible in Jaeger/Tempo for all commands, dashboard shows latency distribution.
### 6.11.3 Phase 3: Transaction Tracing
| Criterion | Measurement | Target |
| ---------------- | ------------------------------- | ---------------------------------- |
| Local Trace      | Submit → validate → TxQ traced  | Single-node test passes            |
| Cross-Node | Context propagates via protobuf | Multi-node test passes |
| Relay Visibility | relay_count attribute correct | Spot check 100 txs |
| HashRouter | Deduplication visible in trace | Duplicate txs show suppressed=true |
| Performance | TX throughput overhead | <5% degradation |
**Definition of Done**: Transaction traces span 3+ nodes in test network, performance within bounds.
### 6.11.4 Phase 4: Consensus Tracing
| Criterion | Measurement | Target |
| -------------------- | ----------------------------- | ------------------------- |
| Round Tracing | startRound creates root span | Unit test passes |
| Phase Visibility | All phases have child spans | Integration test confirms |
| Proposer Attribution | Proposer ID in attributes | Spot check 50 rounds |
| Timing Accuracy | Phase durations match PerfLog | <5% variance |
| No Consensus Impact | Round timing unchanged | Performance test passes |
**Definition of Done**: Consensus rounds fully traceable, no impact on consensus timing.
### 6.11.5 Phase 5: Production Deployment
| Criterion | Measurement | Target |
| ------------ | ---------------------------- | -------------------------- |
| Collector HA | Multiple collectors deployed | No single point of failure |
| Sampling | Tail sampling configured | 10% base + errors + slow |
| Retention | Data retained per policy | 7 days hot, 30 days warm |
| Alerting | Alerts configured | Error spike, high latency |
| Runbook | Operator documentation | Approved by ops team |
| Training | Team trained | Session completed |
**Definition of Done**: Telemetry running in production, operators trained, alerts active.
### 6.11.6 Success Metrics Summary
| Phase | Primary Metric | Secondary Metric | Deadline |
| ------- | ---------------------- | --------------------------- | ------------- |
| Phase 1 | SDK compiles and runs | Zero overhead when disabled | End of Week 2 |
| Phase 2 | 100% RPC coverage | <1ms latency overhead | End of Week 4 |
| Phase 3 | Cross-node traces work | <5% throughput impact | End of Week 6 |
| Phase 4 | Consensus fully traced | No consensus timing impact | End of Week 8 |
| Phase 5 | Production deployment | Operators trained | End of Week 9 |
---
## 6.12 Recommended Implementation Order
Based on ROI analysis, implement in this exact order:
```mermaid
flowchart TB
subgraph week1["Week 1"]
t1[1. OpenTelemetry SDK<br/>Conan/CMake integration]
t2[2. Telemetry interface<br/>SpanGuard, config]
end
subgraph week2["Week 2"]
t3[3. RPC ServerHandler<br/>instrumentation]
t4[4. Basic Jaeger setup<br/>for testing]
end
subgraph week3["Week 3"]
t5[5. Transaction submit<br/>tracing]
t6[6. Grafana dashboard<br/>v1]
end
subgraph week4["Week 4"]
t7[7. Protobuf context<br/>extension]
t8[8. PeerImp tx.relay<br/>instrumentation]
end
subgraph week5["Week 5"]
t9[9. Multi-node<br/>integration tests]
t10[10. Performance<br/>benchmarks]
end
subgraph week6_8["Weeks 6-8"]
t11[11. Consensus<br/>instrumentation]
t12[12. Full integration<br/>testing]
end
subgraph week9["Week 9"]
t13[13. Production<br/>deployment]
t14[14. Documentation<br/>& training]
end
t1 --> t2 --> t3 --> t4
t4 --> t5 --> t6
t6 --> t7 --> t8
t8 --> t9 --> t10
t10 --> t11 --> t12
t12 --> t13 --> t14
style week1 fill:#1b5e20,stroke:#0d3d14,color:#fff
style week2 fill:#1b5e20,stroke:#0d3d14,color:#fff
style week3 fill:#bf360c,stroke:#8c2809,color:#fff
style week4 fill:#bf360c,stroke:#8c2809,color:#fff
style week5 fill:#bf360c,stroke:#8c2809,color:#fff
style week6_8 fill:#0d47a1,stroke:#082f6a,color:#fff
style week9 fill:#4a148c,stroke:#2e0d57,color:#fff
style t1 fill:#1b5e20,stroke:#0d3d14,color:#fff
style t2 fill:#1b5e20,stroke:#0d3d14,color:#fff
style t3 fill:#1b5e20,stroke:#0d3d14,color:#fff
style t4 fill:#1b5e20,stroke:#0d3d14,color:#fff
style t5 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style t6 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style t7 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style t8 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style t9 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style t10 fill:#ffe0b2,stroke:#ffcc80,color:#1e293b
style t11 fill:#0d47a1,stroke:#082f6a,color:#fff
style t12 fill:#0d47a1,stroke:#082f6a,color:#fff
style t13 fill:#4a148c,stroke:#2e0d57,color:#fff
style t14 fill:#4a148c,stroke:#2e0d57,color:#fff
```
---
_Previous: [Configuration Reference](./05-configuration-reference.md)_ | _Next: [Observability Backends](./07-observability-backends.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_


@@ -0,0 +1,595 @@
# Observability Backend Recommendations
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Implementation Phases](./06-implementation-phases.md) | [Appendix](./08-appendix.md)
---
## 7.1 Development/Testing Backends
| Backend | Pros | Cons | Use Case |
| ---------- | ------------------- | ----------------- | ----------------- |
| **Jaeger** | Easy setup, good UI | Limited retention | Local dev, CI |
| **Zipkin** | Simple, lightweight | Basic features | Quick prototyping |
### Quick Start with Jaeger
```bash
# Start Jaeger with OTLP support
docker run -d --name jaeger \
-e COLLECTOR_OTLP_ENABLED=true \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
```
---
## 7.2 Production Backends
| Backend | Pros | Cons | Use Case |
| ----------------- | ----------------------------------------- | ------------------ | --------------------------- |
| **Grafana Tempo** | Cost-effective, Grafana integration | Newer project | Most production deployments |
| **Elastic APM** | Full observability stack, log correlation | Resource intensive | Existing Elastic users |
| **Honeycomb** | Excellent query, high cardinality | SaaS cost | Deep debugging needs |
| **Datadog APM** | Full platform, easy setup | SaaS cost | Enterprise with budget |
### Backend Selection Flowchart
```mermaid
flowchart TD
start[Select Backend] --> budget{Budget<br/>Constraints?}
budget -->|Yes| oss[Open Source]
budget -->|No| saas{Prefer<br/>SaaS?}
oss --> existing{Existing<br/>Stack?}
existing -->|Grafana| tempo[Grafana Tempo]
existing -->|Elastic| elastic[Elastic APM]
existing -->|None| tempo
saas -->|Yes| enterprise{Enterprise<br/>Support?}
saas -->|No| oss
enterprise -->|Yes| datadog[Datadog APM]
enterprise -->|No| honeycomb[Honeycomb]
tempo --> final[Configure Collector]
elastic --> final
honeycomb --> final
datadog --> final
style start fill:#0f172a,stroke:#020617,color:#fff
style budget fill:#334155,stroke:#1e293b,color:#fff
style oss fill:#1e293b,stroke:#0f172a,color:#fff
style existing fill:#334155,stroke:#1e293b,color:#fff
style saas fill:#334155,stroke:#1e293b,color:#fff
style enterprise fill:#334155,stroke:#1e293b,color:#fff
style final fill:#0f172a,stroke:#020617,color:#fff
style tempo fill:#1b5e20,stroke:#0d3d14,color:#fff
style elastic fill:#bf360c,stroke:#8c2809,color:#fff
style honeycomb fill:#0d47a1,stroke:#082f6a,color:#fff
style datadog fill:#4a148c,stroke:#2e0d57,color:#fff
```
---
## 7.3 Recommended Production Architecture
```mermaid
flowchart TB
subgraph validators["Validator Nodes"]
v1[rippled<br/>Validator 1]
v2[rippled<br/>Validator 2]
end
subgraph stock["Stock Nodes"]
s1[rippled<br/>Stock 1]
s2[rippled<br/>Stock 2]
end
subgraph collector["OTel Collector Cluster"]
c1[Collector<br/>DC1]
c2[Collector<br/>DC2]
end
subgraph backends["Storage Backends"]
tempo[(Grafana<br/>Tempo)]
elastic[(Elastic<br/>APM)]
archive[(S3/GCS<br/>Archive)]
end
subgraph ui["Visualization"]
grafana[Grafana<br/>Dashboards]
end
v1 -->|OTLP| c1
v2 -->|OTLP| c1
s1 -->|OTLP| c2
s2 -->|OTLP| c2
c1 --> tempo
c1 --> elastic
c2 --> tempo
c2 --> archive
tempo --> grafana
elastic --> grafana
style validators fill:#b71c1c,stroke:#7f1d1d,color:#ffffff
style stock fill:#0d47a1,stroke:#082f6a,color:#ffffff
style collector fill:#bf360c,stroke:#8c2809,color:#ffffff
style backends fill:#1b5e20,stroke:#0d3d14,color:#ffffff
style ui fill:#4a148c,stroke:#2e0d57,color:#ffffff
```
---
## 7.4 Architecture Considerations
### 7.4.1 Collector Placement
| Strategy | Description | Pros | Cons |
| ------------- | -------------------- | ------------------------ | ----------------------- |
| **Sidecar** | Collector per node | Isolation, simple config | Resource overhead |
| **DaemonSet** | Collector per host | Shared resources | Complexity |
| **Gateway** | Central collector(s) | Centralized processing | Single point of failure |
**Recommendation**: Use the **Gateway** pattern with regional collectors for rippled networks:
- One collector cluster per datacenter/region
- Tail-based sampling at collector level
- Multiple export destinations for redundancy
### 7.4.2 Sampling Strategy
```mermaid
flowchart LR
subgraph head["Head Sampling (Node)"]
hs[10% probabilistic]
end
subgraph tail["Tail Sampling (Collector)"]
ts1[Keep all errors]
ts2[Keep slow >5s]
ts3[Keep 10% rest]
end
head --> tail
ts1 --> final[Final Traces]
ts2 --> final
ts3 --> final
style head fill:#0d47a1,stroke:#082f6a,color:#fff
style tail fill:#1b5e20,stroke:#0d3d14,color:#fff
style hs fill:#0d47a1,stroke:#082f6a,color:#fff
style ts1 fill:#1b5e20,stroke:#0d3d14,color:#fff
style ts2 fill:#1b5e20,stroke:#0d3d14,color:#fff
style ts3 fill:#1b5e20,stroke:#0d3d14,color:#fff
style final fill:#bf360c,stroke:#8c2809,color:#fff
```
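On the node side, the 10% head sample corresponds to the sampler installed in the tracer provider. A minimal sketch, assuming the OpenTelemetry C++ SDK; the parent-based wrapper keeps a downstream node's decision consistent with whichever node started the trace:

```cpp
#include <opentelemetry/sdk/trace/sampler.h>
#include <opentelemetry/sdk/trace/samplers/parent.h>
#include <opentelemetry/sdk/trace/samplers/trace_id_ratio.h>

#include <memory>

// Head sampling: record ~10% of locally started traces, but always
// follow the sampling decision carried in a remote parent context.
std::unique_ptr<opentelemetry::sdk::trace::Sampler>
makeHeadSampler(double ratio = 0.10)
{
    using namespace opentelemetry::sdk::trace;
    return std::make_unique<ParentBasedSampler>(
        std::make_shared<TraceIdRatioBasedSampler>(ratio));
}
```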
### 7.4.3 Data Retention
| Environment | Hot Storage | Warm Storage | Cold Archive |
| ----------- | ----------- | ------------ | ------------ |
| Development | 24 hours | N/A | N/A |
| Staging | 7 days | N/A | N/A |
| Production | 7 days | 30 days | 1+ years |
---
## 7.5 Integration Checklist
- [ ] Choose primary backend (Tempo recommended for cost/features)
- [ ] Deploy collector cluster with high availability
- [ ] Configure tail-based sampling for error/latency traces
- [ ] Set up Grafana dashboards for trace visualization
- [ ] Configure alerts for trace anomalies
- [ ] Establish data retention policies
- [ ] Test trace correlation with logs and metrics
---
## 7.6 Grafana Dashboard Examples
Pre-built dashboards for rippled observability.
### 7.6.1 Consensus Health Dashboard
```json
{
"title": "rippled Consensus Health",
"uid": "rippled-consensus-health",
"tags": ["rippled", "consensus", "tracing"],
"panels": [
{
"title": "Consensus Round Duration",
"type": "timeseries",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"consensus.round\"} | avg(duration) by (resource.service.instance.id)"
}
],
"fieldConfig": {
"defaults": {
"unit": "ms",
"thresholds": {
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 4000 },
{ "color": "red", "value": 5000 }
]
}
}
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 }
},
{
"title": "Phase Duration Breakdown",
"type": "barchart",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=~\"consensus.phase.*\"} | avg(duration) by (name)"
}
],
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 }
},
{
"title": "Proposers per Round",
"type": "stat",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"consensus.round\"} | avg(span.xrpl.consensus.proposers)"
}
],
"gridPos": { "h": 4, "w": 6, "x": 0, "y": 8 }
},
{
"title": "Recent Slow Rounds (>5s)",
"type": "table",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"consensus.round\"} | duration > 5s"
}
],
"gridPos": { "h": 8, "w": 24, "x": 0, "y": 12 }
}
]
}
```
### 7.6.2 Node Overview Dashboard
```json
{
"title": "rippled Node Overview",
"uid": "rippled-node-overview",
"panels": [
{
"title": "Active Nodes",
"type": "stat",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\"} | count_over_time() by (resource.service.instance.id) | count()"
}
],
"gridPos": { "h": 4, "w": 4, "x": 0, "y": 0 }
},
{
"title": "Total Transactions (1h)",
"type": "stat",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"tx.receive\"} | count()"
}
],
"gridPos": { "h": 4, "w": 4, "x": 4, "y": 0 }
},
{
"title": "Error Rate",
"type": "gauge",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && status.code=error} | rate() / {resource.service.name=\"rippled\"} | rate() * 100"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"max": 10,
"thresholds": {
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 5 }
]
}
}
},
"gridPos": { "h": 4, "w": 4, "x": 8, "y": 0 }
},
{
"title": "Service Map",
"type": "nodeGraph",
"datasource": "Tempo",
"gridPos": { "h": 12, "w": 12, "x": 12, "y": 0 }
}
]
}
```
### 7.6.3 Alert Rules
```yaml
# grafana/provisioning/alerting/rippled-alerts.yaml
apiVersion: 1
groups:
- name: rippled-tracing-alerts
folder: rippled
interval: 1m
rules:
- uid: consensus-slow
title: Consensus Round Slow
condition: A
data:
- refId: A
datasourceUid: tempo
model:
queryType: traceql
query: '{resource.service.name="rippled" && name="consensus.round"} | avg(duration) > 5s'
for: 5m
annotations:
summary: Consensus rounds taking >5 seconds
description: "Consensus duration: {{ $value }}ms"
labels:
severity: warning
- uid: rpc-error-spike
title: RPC Error Rate Spike
condition: B
data:
- refId: B
datasourceUid: tempo
model:
queryType: traceql
query: '{resource.service.name="rippled" && name=~"rpc.command.*" && status.code=error} | rate() > 0.05'
for: 2m
annotations:
summary: RPC error rate >5%
labels:
severity: critical
- uid: tx-throughput-drop
title: Transaction Throughput Drop
condition: C
data:
- refId: C
datasourceUid: tempo
model:
queryType: traceql
query: '{resource.service.name="rippled" && name="tx.receive"} | rate() < 10'
for: 10m
annotations:
summary: Transaction throughput below threshold
labels:
severity: warning
```
---
## 7.7 PerfLog and Insight Correlation
How to correlate OpenTelemetry traces with existing rippled observability.
### 7.7.1 Correlation Architecture
```mermaid
flowchart TB
subgraph rippled["rippled Node"]
otel[OpenTelemetry<br/>Spans]
perflog[PerfLog<br/>JSON Logs]
insight[Beast Insight<br/>StatsD Metrics]
end
subgraph collectors["Data Collection"]
otelc[OTel Collector]
promtail[Promtail/Fluentd]
statsd[StatsD Exporter]
end
subgraph storage["Storage"]
tempo[(Tempo)]
loki[(Loki)]
prom[(Prometheus)]
end
subgraph grafana["Grafana"]
traces[Trace View]
logs[Log View]
metrics[Metrics View]
corr[Correlation<br/>Panel]
end
otel -->|OTLP| otelc --> tempo
perflog -->|JSON| promtail --> loki
insight -->|StatsD| statsd --> prom
tempo --> traces
loki --> logs
prom --> metrics
traces --> corr
logs --> corr
metrics --> corr
style rippled fill:#0d47a1,stroke:#082f6a,color:#fff
style collectors fill:#bf360c,stroke:#8c2809,color:#fff
style storage fill:#1b5e20,stroke:#0d3d14,color:#fff
style grafana fill:#4a148c,stroke:#2e0d57,color:#fff
style otel fill:#0d47a1,stroke:#082f6a,color:#fff
style perflog fill:#0d47a1,stroke:#082f6a,color:#fff
style insight fill:#0d47a1,stroke:#082f6a,color:#fff
style otelc fill:#bf360c,stroke:#8c2809,color:#fff
style promtail fill:#bf360c,stroke:#8c2809,color:#fff
style statsd fill:#bf360c,stroke:#8c2809,color:#fff
style tempo fill:#1b5e20,stroke:#0d3d14,color:#fff
style loki fill:#1b5e20,stroke:#0d3d14,color:#fff
style prom fill:#1b5e20,stroke:#0d3d14,color:#fff
style traces fill:#4a148c,stroke:#2e0d57,color:#fff
style logs fill:#4a148c,stroke:#2e0d57,color:#fff
style metrics fill:#4a148c,stroke:#2e0d57,color:#fff
style corr fill:#4a148c,stroke:#2e0d57,color:#fff
```
### 7.7.2 Correlation Fields
| Source | Field | Link To | Purpose |
| ----------- | --------------------------- | ------------- | -------------------------- |
| **Trace** | `trace_id` | Logs | Find log entries for trace |
| **Trace** | `xrpl.tx.hash` | Logs, Metrics | Find TX-related data |
| **Trace** | `xrpl.consensus.ledger.seq` | Logs | Find ledger-related logs |
| **PerfLog** | `trace_id` (new) | Traces | Jump to trace from log |
| **PerfLog** | `ledger_seq` | Traces | Find consensus trace |
| **Insight** | `exemplar.trace_id` | Traces | Jump from metric spike |
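The `trace_id` row marked *(new)* implies a small rippled-side change. A hypothetical helper for rendering the active span's trace id (the PerfLog write itself is not shown, and `currentTraceIdHex` is an illustrative name, not an existing rippled or PerfLog API):

```cpp
#include <opentelemetry/trace/span.h>
#include <opentelemetry/trace/span_context.h>

#include <string>

// Hypothetical helper: render the active span's trace_id as the
// 32-hex-char string a PerfLog JSON entry would carry, so Loki
// queries like {job="rippled"} |= "<trace_id>" can find it.
std::string
currentTraceIdHex(opentelemetry::trace::Span const& span)
{
    char buf[32];
    span.GetContext().trace_id().ToLowerBase16(buf);
    return std::string(buf, sizeof(buf));
}
```

PerfLog entries written while the span is active could then carry this string, which is exactly what the Loki `|=` query in §7.7.3 below matches on.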
### 7.7.3 Example: Debugging a Slow Transaction
**Step 1: Find the trace**
```
# In Grafana Explore with Tempo
{resource.service.name="rippled" && span.xrpl.tx.hash="ABC123..."}
```
**Step 2: Get the trace_id from the trace view**
```
Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
```
**Step 3: Find related PerfLog entries**
```
# In Grafana Explore with Loki
{job="rippled"} |= "4bf92f3577b34da6a3ce929d0e0e4736"
```
**Step 4: Check Insight metrics for the time window**
```
# In Grafana with Prometheus; the @ modifier pins the query to the
# trace's start time (substitute the unix timestamp from the trace view)
rate(rippled_tx_applied_total[1m]) @ <unix-timestamp-from-trace>
```
### 7.7.4 Unified Dashboard Example
```json
{
"title": "rippled Unified Observability",
"uid": "rippled-unified",
"panels": [
{
"title": "Transaction Latency (Traces)",
"type": "timeseries",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\" && name=\"tx.receive\"} | histogram_over_time(duration)"
}
],
"gridPos": { "h": 6, "w": 8, "x": 0, "y": 0 }
},
{
"title": "Transaction Rate (Metrics)",
"type": "timeseries",
"datasource": "Prometheus",
"targets": [
{
"expr": "rate(rippled_tx_received_total[5m])",
"legendFormat": "{{ instance }}"
}
],
"fieldConfig": {
"defaults": {
"links": [
{
"title": "View traces",
"url": "/explore?left={\"datasource\":\"Tempo\",\"query\":\"{resource.service.name=\\\"rippled\\\" && name=\\\"tx.receive\\\"}\"}"
}
]
}
},
"gridPos": { "h": 6, "w": 8, "x": 8, "y": 0 }
},
{
"title": "Recent Logs",
"type": "logs",
"datasource": "Loki",
"targets": [
{
"expr": "{job=\"rippled\"} | json"
}
],
"gridPos": { "h": 6, "w": 8, "x": 16, "y": 0 }
},
{
"title": "Trace Search",
"type": "table",
"datasource": "Tempo",
"targets": [
{
"queryType": "traceql",
"query": "{resource.service.name=\"rippled\"}"
}
],
"fieldConfig": {
"overrides": [
{
"matcher": { "id": "byName", "options": "traceID" },
"properties": [
{
"id": "links",
"value": [
{
"title": "View trace",
"url": "/explore?left={\"datasource\":\"Tempo\",\"query\":\"${__value.raw}\"}"
},
{
"title": "View logs",
"url": "/explore?left={\"datasource\":\"Loki\",\"query\":\"{job=\\\"rippled\\\"} |= \\\"${__value.raw}\\\"\"}"
}
]
}
]
}
]
},
"gridPos": { "h": 12, "w": 24, "x": 0, "y": 6 }
}
]
}
```
---
_Previous: [Implementation Phases](./06-implementation-phases.md)_ | _Next: [Appendix](./08-appendix.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_

View File

@@ -0,0 +1,133 @@
# Appendix
> **Parent Document**: [OpenTelemetryPlan.md](./OpenTelemetryPlan.md)
> **Related**: [Observability Backends](./07-observability-backends.md)
---
## 8.1 Glossary
| Term | Definition |
| --------------------- | ---------------------------------------------------------- |
| **Span** | A unit of work with start/end time, name, and attributes |
| **Trace** | A collection of spans representing a complete request flow |
| **Trace ID** | 128-bit unique identifier for a trace |
| **Span ID** | 64-bit unique identifier for a span within a trace |
| **Context** | Carrier for trace/span IDs across boundaries |
| **Propagator** | Component that injects/extracts context |
| **Sampler** | Decides which traces to record |
| **Exporter** | Sends spans to backend |
| **Collector** | Receives, processes, and forwards telemetry |
| **OTLP** | OpenTelemetry Protocol (wire format) |
| **W3C Trace Context** | Standard HTTP headers for trace propagation |
| **Baggage** | Key-value pairs propagated across service boundaries |
| **Resource** | Entity producing telemetry (service, host, etc.) |
| **Instrumentation** | Code that creates telemetry data |
### rippled-Specific Terms
| Term | Definition |
| ----------------- | -------------------------------------------------- |
| **Overlay** | P2P network layer managing peer connections |
| **Consensus** | XRP Ledger consensus algorithm (RCL) |
| **Proposal** | Validator's suggested transaction set for a ledger |
| **Validation** | Validator's signature on a closed ledger |
| **HashRouter** | Component for transaction deduplication |
| **JobQueue** | Thread pool for asynchronous task execution |
| **PerfLog** | Existing performance logging system in rippled |
| **Beast Insight** | Existing metrics framework in rippled |
---
## 8.2 Span Hierarchy Visualization
```mermaid
flowchart TB
subgraph trace["Trace: Transaction Lifecycle"]
rpc["rpc.submit<br/>(entry point)"]
validate["tx.validate"]
relay["tx.relay<br/>(parent span)"]
subgraph peers["Peer Spans"]
p1["peer.send<br/>Peer A"]
p2["peer.send<br/>Peer B"]
p3["peer.send<br/>Peer C"]
end
consensus["consensus.round"]
apply["tx.apply"]
end
rpc --> validate
validate --> relay
relay --> p1
relay --> p2
relay --> p3
p1 -.->|"context propagation"| consensus
consensus --> apply
style trace fill:#0f172a,stroke:#020617,color:#fff
style peers fill:#1e3a8a,stroke:#172554,color:#fff
style rpc fill:#1d4ed8,stroke:#1e40af,color:#fff
style validate fill:#047857,stroke:#064e3b,color:#fff
style relay fill:#047857,stroke:#064e3b,color:#fff
style p1 fill:#0e7490,stroke:#155e75,color:#fff
style p2 fill:#0e7490,stroke:#155e75,color:#fff
style p3 fill:#0e7490,stroke:#155e75,color:#fff
style consensus fill:#fef3c7,stroke:#fde68a,color:#1e293b
style apply fill:#047857,stroke:#064e3b,color:#fff
```
---
## 8.3 References
### OpenTelemetry Resources
1. [OpenTelemetry C++ SDK](https://github.com/open-telemetry/opentelemetry-cpp)
2. [OpenTelemetry Specification](https://opentelemetry.io/docs/specs/otel/)
3. [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/)
4. [OTLP Protocol Specification](https://opentelemetry.io/docs/specs/otlp/)
### Standards
5. [W3C Trace Context](https://www.w3.org/TR/trace-context/)
6. [W3C Baggage](https://www.w3.org/TR/baggage/)
7. [Protocol Buffers](https://protobuf.dev/)
### rippled Resources
8. [rippled Source Code](https://github.com/XRPLF/rippled)
9. [XRP Ledger Documentation](https://xrpl.org/docs/)
10. [rippled Overlay README](https://github.com/XRPLF/rippled/blob/develop/src/xrpld/overlay/README.md)
11. [rippled RPC README](https://github.com/XRPLF/rippled/blob/develop/src/xrpld/rpc/README.md)
12. [rippled Consensus README](https://github.com/XRPLF/rippled/blob/develop/src/xrpld/app/consensus/README.md)
---
## 8.4 Version History
| Version | Date | Author | Changes |
| ------- | ---------- | ------ | --------------------------------- |
| 1.0 | 2026-02-12 | - | Initial implementation plan |
| 1.1 | 2026-02-13 | - | Refactored into modular documents |
---
## 8.5 Document Index
| Document | Description |
| ---------------------------------------------------------------- | ------------------------------------------ |
| [OpenTelemetryPlan.md](./OpenTelemetryPlan.md) | Master overview and executive summary |
| [01-architecture-analysis.md](./01-architecture-analysis.md) | rippled architecture and trace points |
| [02-design-decisions.md](./02-design-decisions.md) | SDK selection, exporters, span conventions |
| [03-implementation-strategy.md](./03-implementation-strategy.md) | Directory structure, performance analysis |
| [04-code-samples.md](./04-code-samples.md) | C++ code examples for all components |
| [05-configuration-reference.md](./05-configuration-reference.md) | rippled config, CMake, Collector configs |
| [06-implementation-phases.md](./06-implementation-phases.md) | Timeline, tasks, risks, success metrics |
| [07-observability-backends.md](./07-observability-backends.md) | Backend selection and architecture |
| [08-appendix.md](./08-appendix.md) | Glossary, references, version history |
---
_Previous: [Observability Backends](./07-observability-backends.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_

View File

@@ -0,0 +1,190 @@
# [OpenTelemetry](00-tracing-fundamentals.md) Distributed Tracing Implementation Plan for rippled (xrpld)
## Executive Summary
This document provides a comprehensive implementation plan for integrating OpenTelemetry distributed tracing into the rippled XRP Ledger node software. The plan addresses the unique challenges of a decentralized peer-to-peer system where trace context must propagate across network boundaries between independent nodes.
### Key Benefits
- **End-to-end transaction visibility**: Track transactions from submission through consensus to ledger inclusion
- **Consensus round analysis**: Understand timing and behavior of consensus phases across validators
- **RPC performance insights**: Identify slow handlers and optimize response times
- **Network topology understanding**: Visualize message propagation patterns between peers
- **Incident debugging**: Correlate events across distributed nodes during issues
### Estimated Performance Overhead
| Metric | Overhead | Notes |
| ------------- | ---------- | ----------------------------------- |
| CPU | 1-3% | Span creation and attribute setting |
| Memory | 2-5 MB | Batch buffer for pending spans |
| Network | 10-50 KB/s | Compressed OTLP export to collector |
| Latency (p99) | <2% | With proper sampling configuration |
---
## Document Structure
This implementation plan is organized into modular documents for easier navigation:
<div align="center">
```mermaid
flowchart TB
overview["📋 OpenTelemetryPlan.md<br/>(This Document)"]
subgraph analysis["Analysis & Design"]
arch["01-architecture-analysis.md"]
design["02-design-decisions.md"]
end
subgraph impl["Implementation"]
strategy["03-implementation-strategy.md"]
code["04-code-samples.md"]
config["05-configuration-reference.md"]
end
subgraph deploy["Deployment & Planning"]
phases["06-implementation-phases.md"]
backends["07-observability-backends.md"]
appendix["08-appendix.md"]
end
overview --> analysis
overview --> impl
overview --> deploy
arch --> design
design --> strategy
strategy --> code
code --> config
config --> phases
phases --> backends
backends --> appendix
style overview fill:#1b5e20,stroke:#0d3d14,color:#fff,stroke-width:2px
style analysis fill:#0d47a1,stroke:#082f6a,color:#fff
style impl fill:#bf360c,stroke:#8c2809,color:#fff
style deploy fill:#4a148c,stroke:#2e0d57,color:#fff
style arch fill:#0d47a1,stroke:#082f6a,color:#fff
style design fill:#0d47a1,stroke:#082f6a,color:#fff
style strategy fill:#bf360c,stroke:#8c2809,color:#fff
style code fill:#bf360c,stroke:#8c2809,color:#fff
style config fill:#bf360c,stroke:#8c2809,color:#fff
style phases fill:#4a148c,stroke:#2e0d57,color:#fff
style backends fill:#4a148c,stroke:#2e0d57,color:#fff
style appendix fill:#4a148c,stroke:#2e0d57,color:#fff
```
</div>
---
## Table of Contents
| Section | Document | Description |
| ------- | ---------------------------------------------------------- | ---------------------------------------------------------------------- |
| **1** | [Architecture Analysis](./01-architecture-analysis.md) | rippled component analysis, trace points, instrumentation priorities |
| **2** | [Design Decisions](./02-design-decisions.md) | SDK selection, exporters, span naming, attributes, context propagation |
| **3** | [Implementation Strategy](./03-implementation-strategy.md) | Directory structure, key principles, performance optimization |
| **4** | [Code Samples](./04-code-samples.md) | Complete C++ implementation examples for all components |
| **5** | [Configuration Reference](./05-configuration-reference.md) | rippled config, CMake integration, Collector configurations |
| **6** | [Implementation Phases](./06-implementation-phases.md) | 5-phase timeline, tasks, risks, success metrics |
| **7** | [Observability Backends](./07-observability-backends.md) | Backend selection guide and production architecture |
| **8** | [Appendix](./08-appendix.md) | Glossary, references, version history |
---
## 1. Architecture Analysis
The rippled node consists of several key components that require instrumentation for comprehensive distributed tracing. The main areas include the RPC server (HTTP/WebSocket), Overlay P2P network, Consensus mechanism (RCLConsensus), JobQueue for async task execution, and existing observability infrastructure (PerfLog, Insight/StatsD, Journal logging).
Key trace points span across transaction submission via RPC, peer-to-peer message propagation, consensus round execution, and ledger building. The implementation prioritizes high-value, low-risk components first: RPC handlers provide immediate value with minimal risk, while consensus tracing requires careful implementation to avoid timing impacts.
➡️ **[Read full Architecture Analysis](./01-architecture-analysis.md)**
---
## 2. Design Decisions
The OpenTelemetry C++ SDK is selected for its CNCF backing, active development, and native performance characteristics. Traces are exported via OTLP/gRPC (primary) or OTLP/HTTP (fallback) to an OpenTelemetry Collector, which provides flexible routing and sampling.
Span naming follows a hierarchical `<component>.<operation>` convention (e.g., `rpc.submit`, `tx.relay`, `consensus.round`). Context propagation uses W3C Trace Context headers for HTTP and embedded Protocol Buffer fields for P2P messages. The implementation coexists with existing PerfLog and Insight observability systems through correlation IDs.
**Data Collection & Privacy**: Telemetry collects only operational metadata (timing, counts, hashes) — never sensitive content (private keys, balances, amounts, raw payloads). Privacy protection includes account hashing, configurable redaction, sampling, and collector-level filtering. Node operators retain full control over what data is exported, although the operator-facing controls are not yet documented in this plan.
➡️ **[Read full Design Decisions](./02-design-decisions.md)**
---
## 3. Implementation Strategy
The telemetry code is organized under `include/xrpl/telemetry/` for headers and `src/libxrpl/telemetry/` for implementation. Key principles include RAII-based span management via `SpanGuard`, conditional compilation with `XRPL_ENABLE_TELEMETRY`, and minimal runtime overhead through batch processing and efficient sampling.
Performance optimization strategies include probabilistic head sampling (10% default), tail-based sampling at the collector for errors and slow traces, batch export to reduce network overhead, and conditional instrumentation that compiles to no-ops when disabled.
➡️ **[Read full Implementation Strategy](./03-implementation-strategy.md)**
---
## 4. Code Samples
Complete C++ implementation examples are provided for all telemetry components:
- `Telemetry.h` - Core interface for tracer access and span creation
- `SpanGuard.h` - RAII wrapper for automatic span lifecycle management
- `TracingInstrumentation.h` - Macros for conditional instrumentation
- Protocol Buffer extensions for trace context propagation
- Module-specific instrumentation (RPC, Consensus, P2P, JobQueue)
➡️ **[View all Code Samples](./04-code-samples.md)**
---
## 5. Configuration Reference
Configuration is handled through the `[telemetry]` section in `xrpld.cfg` with options for enabling/disabling, exporter selection, endpoint configuration, sampling ratios, and component-level filtering. CMake integration includes a `XRPL_ENABLE_TELEMETRY` option for compile-time control.
OpenTelemetry Collector configurations are provided for development (with Jaeger) and production (with tail-based sampling, Tempo, and Elastic APM). Docker Compose examples enable quick local development environment setup.
➡️ **[View full Configuration Reference](./05-configuration-reference.md)**
---
## 6. Implementation Phases
The implementation spans 9 weeks across 5 phases:
| Phase | Duration | Focus | Key Deliverables |
| ----- | --------- | ------------------- | --------------------------------------------------- |
| 1 | Weeks 1-2 | Core Infrastructure | SDK integration, Telemetry interface, Configuration |
| 2 | Weeks 3-4 | RPC Tracing | HTTP context extraction, Handler instrumentation |
| 3 | Weeks 5-6 | Transaction Tracing | Protocol Buffer context, Relay propagation |
| 4 | Weeks 7-8 | Consensus Tracing | Round spans, Proposal/validation tracing |
| 5 | Week 9 | Documentation | Runbook, Dashboards, Training |
**Total Effort**: 47 developer-days with 2 developers
➡️ **[View full Implementation Phases](./06-implementation-phases.md)**
---
## 7. Observability Backends
For development and testing, Jaeger provides easy setup with a good UI. For production deployments, Grafana Tempo is recommended for its cost-effectiveness and Grafana integration, while Elastic APM is ideal for organizations with existing Elastic infrastructure.
The recommended production architecture uses a gateway collector pattern with regional collectors performing tail-based sampling, routing traces to multiple backends (Tempo for primary storage, Elastic for log correlation, S3/GCS for long-term archive).
➡️ **[View Observability Backend Recommendations](./07-observability-backends.md)**
---
## 8. Appendix
The appendix contains a glossary of OpenTelemetry and rippled-specific terms, references to external documentation and specifications, version history for this implementation plan, and a complete document index.
➡️ **[View Appendix](./08-appendix.md)**
---
_This document provides a comprehensive implementation plan for integrating OpenTelemetry distributed tracing into the rippled XRP Ledger node software. For detailed information on any section, follow the links to the corresponding sub-documents._

View File

@@ -0,0 +1,610 @@
# OpenTelemetry POC Task List
> **Goal**: Build a minimal end-to-end proof of concept that demonstrates distributed tracing in rippled. A successful POC will show RPC request traces flowing from rippled through an OTel Collector into Jaeger, viewable in a browser UI.
>
> **Scope**: RPC tracing only (highest value, lowest risk per the [CRAWL phase](./06-implementation-phases.md#6102-quick-wins-immediate-value) in the implementation phases). No cross-node P2P context propagation or consensus tracing in the POC.
### Related Plan Documents
| Document | Relevance to POC |
| ---------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [00-tracing-fundamentals.md](./00-tracing-fundamentals.md) | Core concepts: traces, spans, context propagation, sampling |
| [01-architecture-analysis.md](./01-architecture-analysis.md) | RPC request flow (§1.5), key trace points (§1.6), instrumentation priority (§1.7) |
| [02-design-decisions.md](./02-design-decisions.md) | SDK selection (§2.1), exporter config (§2.2), span naming (§2.3), attribute schema (§2.4), coexistence with PerfLog/Insight (§2.6) |
| [03-implementation-strategy.md](./03-implementation-strategy.md) | Directory structure (§3.1), key principles (§3.2), performance overhead (§3.3-3.6), conditional compilation (§3.7.3), code intrusiveness (§3.9) |
| [04-code-samples.md](./04-code-samples.md) | Telemetry interface (§4.1), SpanGuard (§4.2), macros (§4.3), RPC instrumentation (§4.5.3) |
| [05-configuration-reference.md](./05-configuration-reference.md) | rippled config (§5.1), config parser (§5.2), Application integration (§5.3), CMake (§5.4), Collector config (§5.5), Docker Compose (§5.6), Grafana (§5.8) |
| [06-implementation-phases.md](./06-implementation-phases.md) | Phase 1 core tasks (§6.2), Phase 2 RPC tasks (§6.3), quick wins (§6.10), definition of done (§6.11) |
| [07-observability-backends.md](./07-observability-backends.md) | Jaeger dev setup (§7.1), Grafana dashboards (§7.6), alert rules (§7.6.3) |
---
## Task 0: Docker Observability Stack Setup
**Objective**: Stand up the backend infrastructure to receive, store, and display traces.
**What to do**:
- Create `docker/telemetry/docker-compose.yml` in the repo with three services:
1. **OpenTelemetry Collector** (`otel/opentelemetry-collector-contrib:latest`)
- Expose ports `4317` (OTLP gRPC) and `4318` (OTLP HTTP)
- Expose port `13133` (health check)
- Mount a config file `docker/telemetry/otel-collector-config.yaml`
2. **Jaeger** (`jaegertracing/all-in-one:latest`)
- Expose port `16686` (UI) and `14250` (gRPC collector)
- Set env `COLLECTOR_OTLP_ENABLED=true`
3. **Grafana** (`grafana/grafana:latest`) — optional but useful
- Expose port `3000`
- Enable anonymous admin access for local dev (`GF_AUTH_ANONYMOUS_ENABLED=true`, `GF_AUTH_ANONYMOUS_ORG_ROLE=Admin`)
- Provision Jaeger as a data source via `docker/telemetry/grafana/provisioning/datasources/jaeger.yaml`
- Create `docker/telemetry/otel-collector-config.yaml`:
```yaml
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 1s
send_batch_size: 100
exporters:
  debug:
    verbosity: detailed
otlp/jaeger:
endpoint: jaeger:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
      exporters: [debug, otlp/jaeger]
```
- Create Grafana Jaeger datasource provisioning file at `docker/telemetry/grafana/provisioning/datasources/jaeger.yaml`:
```yaml
apiVersion: 1
datasources:
- name: Jaeger
type: jaeger
access: proxy
url: http://jaeger:16686
```
**Verification**: Run `docker compose -f docker/telemetry/docker-compose.yml up -d`, then:
- `curl http://localhost:13133` returns healthy (Collector)
- `http://localhost:16686` opens Jaeger UI (no traces yet)
- `http://localhost:3000` opens Grafana (optional)
**Reference**:
- [05-configuration-reference.md §5.5](./05-configuration-reference.md) — Collector config (dev YAML with Jaeger exporter)
- [05-configuration-reference.md §5.6](./05-configuration-reference.md) — Docker Compose development environment
- [07-observability-backends.md §7.1](./07-observability-backends.md) — Jaeger quick start and backend selection
- [05-configuration-reference.md §5.8](./05-configuration-reference.md) — Grafana datasource provisioning and dashboards
---
## Task 1: Add OpenTelemetry C++ SDK Dependency
**Objective**: Make `opentelemetry-cpp` available to the build system.
**What to do**:
- Edit `conanfile.py` to add `opentelemetry-cpp` as an **optional** dependency. The gRPC otel plugin flag (`"grpc/*:otel_plugin": False`) in the existing conanfile may need to remain false — we pull the OTel SDK separately.
- Add a Conan option: `with_telemetry = [True, False]` defaulting to `False`
- When `with_telemetry` is `True`, add `opentelemetry-cpp` to `self.requires()`
- Required OTel Conan components: `opentelemetry-cpp` (which bundles api, sdk, and exporters). If the package isn't in Conan Center, consider using `FetchContent` in CMake or building from source as a fallback.
- Edit `CMakeLists.txt`:
- Add option: `option(XRPL_ENABLE_TELEMETRY "Enable OpenTelemetry tracing" OFF)`
- When ON, `find_package(opentelemetry-cpp CONFIG REQUIRED)` and add compile definition `XRPL_ENABLE_TELEMETRY`
- When OFF, do nothing (zero build impact)
- Verify the build succeeds with `-DXRPL_ENABLE_TELEMETRY=OFF` (no regressions) and with `-DXRPL_ENABLE_TELEMETRY=ON` (SDK links successfully).
**Key files**:
- `conanfile.py`
- `CMakeLists.txt`
**Reference**:
- [05-configuration-reference.md §5.4](./05-configuration-reference.md) — CMake integration, `FindOpenTelemetry.cmake`, `XRPL_ENABLE_TELEMETRY` option
- [03-implementation-strategy.md §3.2](./03-implementation-strategy.md) — Key principle: zero-cost when disabled via compile-time flags
- [02-design-decisions.md §2.1](./02-design-decisions.md) — SDK selection rationale and required OTel components
---
## Task 2: Create Core Telemetry Interface and NullTelemetry
**Objective**: Define the `Telemetry` abstract interface and a no-op implementation so the rest of the codebase can reference telemetry without hard-depending on the OTel SDK.
**What to do**:
- Create `include/xrpl/telemetry/Telemetry.h`:
- Define `namespace xrpl::telemetry`
- Define `struct Telemetry::Setup` holding: `enabled`, `exporterEndpoint`, `samplingRatio`, `serviceName`, `serviceVersion`, `serviceInstanceId`, `traceRpc`, `traceTransactions`, `traceConsensus`, `tracePeer`
- Define abstract `class Telemetry` with:
- `virtual void start() = 0;`
- `virtual void stop() = 0;`
- `virtual bool isEnabled() const = 0;`
- `virtual nostd::shared_ptr<Tracer> getTracer(string_view name = "rippled") = 0;`
- `virtual nostd::shared_ptr<Span> startSpan(string_view name, SpanKind kind = kInternal) = 0;`
- `virtual nostd::shared_ptr<Span> startSpan(string_view name, Context const& parentContext, SpanKind kind = kInternal) = 0;`
- `virtual bool shouldTraceRpc() const = 0;`
- `virtual bool shouldTraceTransactions() const = 0;`
- `virtual bool shouldTraceConsensus() const = 0;`
- Factory: `std::unique_ptr<Telemetry> make_Telemetry(Setup const&, beast::Journal);`
- Config parser: `Telemetry::Setup setup_Telemetry(Section const&, std::string const& nodePublicKey, std::string const& version);`
- Create `include/xrpl/telemetry/SpanGuard.h`:
  - RAII guard that takes an `nostd::shared_ptr<Span>`, creates a `Scope`, and calls `span->End()` in destructor (a minimal sketch follows this list).
- Convenience: `setAttribute()`, `setOk()`, `setStatus()`, `addEvent()`, `recordException()`, `context()`
- See [04-code-samples.md](./04-code-samples.md) §4.2 for the full implementation.
- Create `src/libxrpl/telemetry/NullTelemetry.cpp`:
- Implements `Telemetry` with all no-ops.
  - `isEnabled()` returns `false`, `startSpan()` returns a no-op span.
- This is used when `XRPL_ENABLE_TELEMETRY` is OFF or `enabled=0` in config.
- Guard all OTel SDK headers behind `#ifdef XRPL_ENABLE_TELEMETRY`. The `NullTelemetry` implementation should compile without the OTel SDK present.
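A minimal sketch of the `SpanGuard` shape described above, condensed from §4.2 and assuming the OTel API headers are available under `XRPL_ENABLE_TELEMETRY`:

```cpp
#pragma once

#include <opentelemetry/nostd/shared_ptr.h>
#include <opentelemetry/nostd/string_view.h>
#include <opentelemetry/trace/scope.h>
#include <opentelemetry/trace/span.h>

#include <utility>

namespace xrpl::telemetry {

// RAII span wrapper: activates the span for the current thread on
// construction and ends it on destruction, so early returns and
// exceptions still close the span. Condensed: the full version in
// §4.2 also offers setOk(), setStatus(), addEvent(), and
// recordException().
class SpanGuard
{
public:
    explicit SpanGuard(
        opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span)
        : span_(std::move(span)), scope_(span_)
    {
    }

    ~SpanGuard()
    {
        if (span_)
            span_->End();
    }

    SpanGuard(SpanGuard const&) = delete;
    SpanGuard&
    operator=(SpanGuard const&) = delete;

    template <typename T>
    void
    setAttribute(opentelemetry::nostd::string_view key, T&& value)
    {
        if (span_)
            span_->SetAttribute(key, std::forward<T>(value));
    }

private:
    opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span_;
    opentelemetry::trace::Scope scope_;  // thread-local activation
};

}  // namespace xrpl::telemetry
```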
**Key new files**:
- `include/xrpl/telemetry/Telemetry.h`
- `include/xrpl/telemetry/SpanGuard.h`
- `src/libxrpl/telemetry/NullTelemetry.cpp`
**Reference**:
- [04-code-samples.md §4.1](./04-code-samples.md) — Full `Telemetry` interface with `Setup` struct, lifecycle, tracer access, span creation, and component filtering methods
- [04-code-samples.md §4.2](./04-code-samples.md) — Full `SpanGuard` RAII implementation and `NullSpanGuard` no-op class
- [03-implementation-strategy.md §3.1](./03-implementation-strategy.md) — Directory structure: `include/xrpl/telemetry/` for headers, `src/libxrpl/telemetry/` for implementation
- [03-implementation-strategy.md §3.7.3](./03-implementation-strategy.md) — Conditional instrumentation and zero-cost compile-time disabled pattern
---
## Task 3: Implement OTel-Backed Telemetry
**Objective**: Implement the real `Telemetry` class that initializes the OTel SDK, configures the OTLP exporter and batch processor, and creates tracers/spans.
**What to do**:
- Create `src/libxrpl/telemetry/Telemetry.cpp` (compiled only when `XRPL_ENABLE_TELEMETRY=ON`; a condensed sketch follows this list):
- `class TelemetryImpl : public Telemetry` that:
- In `start()`: creates a `TracerProvider` with:
- Resource attributes: `service.name`, `service.version`, `service.instance.id`
- An `OtlpGrpcExporter` pointed at `setup.exporterEndpoint` (default `localhost:4317`)
- A `BatchSpanProcessor` with configurable batch size and delay
- A `TraceIdRatioBasedSampler` using `setup.samplingRatio`
- Sets the global `TracerProvider`
- In `stop()`: calls `ForceFlush()` then shuts down the provider
- In `startSpan()`: delegates to `getTracer()->StartSpan(name, ...)`
- `shouldTraceRpc()` etc. read from `Setup` fields
- Create `src/libxrpl/telemetry/TelemetryConfig.cpp`:
- `setup_Telemetry()` parses the `[telemetry]` config section from `xrpld.cfg`
- Maps config keys: `enabled`, `exporter`, `endpoint`, `sampling_ratio`, `trace_rpc`, `trace_transactions`, `trace_consensus`, `trace_peer`
- Wire `make_Telemetry()` factory:
- If `setup.enabled` is true AND `XRPL_ENABLE_TELEMETRY` is defined: return `TelemetryImpl`
- Otherwise: return `NullTelemetry`
- Add telemetry source files to CMake. When `XRPL_ENABLE_TELEMETRY=ON`, compile `Telemetry.cpp` and `TelemetryConfig.cpp` and link against `opentelemetry-cpp::api`, `opentelemetry-cpp::sdk`, `opentelemetry-cpp::otlp_grpc_exporter`. When OFF, compile only `NullTelemetry.cpp`.
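A condensed sketch of the `start()` sequence (the free function `startTracing` stands in for `TelemetryImpl::start()`; exact factory signatures vary between opentelemetry-cpp releases, and the POC lessons-learned table at the end of this document notes the Conan package may only ship the HTTP exporter, in which case `OtlpHttpExporterFactory` is the substitute):

```cpp
#include <opentelemetry/exporters/otlp/otlp_grpc_exporter_factory.h>
#include <opentelemetry/nostd/shared_ptr.h>
#include <opentelemetry/sdk/resource/resource.h>
#include <opentelemetry/sdk/trace/batch_span_processor_factory.h>
#include <opentelemetry/sdk/trace/batch_span_processor_options.h>
#include <opentelemetry/sdk/trace/tracer_provider_factory.h>
#include <opentelemetry/trace/provider.h>

#include <memory>
#include <string>
#include <utility>

void
startTracing(std::string const& endpoint, std::string const& instanceId)
{
    namespace otlp = opentelemetry::exporter::otlp;
    namespace sdktrace = opentelemetry::sdk::trace;

    otlp::OtlpGrpcExporterOptions exporterOpts;
    exporterOpts.endpoint = endpoint;  // e.g. "localhost:4317"
    auto exporter = otlp::OtlpGrpcExporterFactory::Create(exporterOpts);

    sdktrace::BatchSpanProcessorOptions batchOpts;  // defaults are sane
    auto processor = sdktrace::BatchSpanProcessorFactory::Create(
        std::move(exporter), batchOpts);

    auto resource = opentelemetry::sdk::resource::Resource::Create(
        {{"service.name", "rippled"},
         {"service.instance.id", instanceId}});

    // The factory returns unique_ptr<sdk::TracerProvider>; wrap it so
    // it matches the nostd::shared_ptr the global provider API expects
    // (see the POC lessons-learned table at the end of this document).
    // A sampler can be passed via the Create overload taking a third
    // argument; omitted here for brevity.
    std::shared_ptr<opentelemetry::trace::TracerProvider> provider =
        sdktrace::TracerProviderFactory::Create(
            std::move(processor), resource);
    opentelemetry::trace::Provider::SetTracerProvider(
        opentelemetry::nostd::shared_ptr<
            opentelemetry::trace::TracerProvider>(provider));
}
```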
**Key new files**:
- `src/libxrpl/telemetry/Telemetry.cpp`
- `src/libxrpl/telemetry/TelemetryConfig.cpp`
**Key modified files**:
- `CMakeLists.txt` (add telemetry library target)
**Reference**:
- [04-code-samples.md §4.1](./04-code-samples.md) — `Telemetry` interface that `TelemetryImpl` must implement
- [05-configuration-reference.md §5.2](./05-configuration-reference.md) — `setup_Telemetry()` config parser implementation
- [02-design-decisions.md §2.2](./02-design-decisions.md) — OTLP/gRPC exporter config (endpoint, TLS options)
- [02-design-decisions.md §2.4.1](./02-design-decisions.md) — Resource attributes: `service.name`, `service.version`, `service.instance.id`, `xrpl.network.id`
- [03-implementation-strategy.md §3.4](./03-implementation-strategy.md) — Per-operation CPU costs and overhead budget for span creation
- [03-implementation-strategy.md §3.5](./03-implementation-strategy.md) — Memory overhead: static (~456 KB) and dynamic (~1.2 MB) budgets
---
## Task 4: Integrate Telemetry into Application Lifecycle
**Objective**: Wire the `Telemetry` object into `Application` so all components can access it.
**What to do**:
- Edit `src/xrpld/app/main/Application.h`:
- Forward-declare `namespace xrpl::telemetry { class Telemetry; }`
- Add pure virtual method: `virtual telemetry::Telemetry& getTelemetry() = 0;`
- Edit `src/xrpld/app/main/Application.cpp` (the `ApplicationImp` class):
- Add member: `std::unique_ptr<telemetry::Telemetry> telemetry_;`
- In the constructor, after config is loaded and node identity is known:
```cpp
auto const telemetrySection = config_->section("telemetry");
auto telemetrySetup = telemetry::setup_Telemetry(
telemetrySection,
toBase58(TokenType::NodePublic, nodeIdentity_.publicKey()),
BuildInfo::getVersionString());
telemetry_ = telemetry::make_Telemetry(telemetrySetup, logs_->journal("Telemetry"));
```
- In `start()`: call `telemetry_->start()` early
- In `stop()` or destructor: call `telemetry_->stop()` late (to flush pending spans)
- Implement `getTelemetry()` override: return `*telemetry_`
- Add `[telemetry]` section to the example config `cfg/rippled-example.cfg`:
```ini
# [telemetry]
# enabled=1
# endpoint=localhost:4317
# sampling_ratio=1.0
# trace_rpc=1
```
**Key modified files**:
- `src/xrpld/app/main/Application.h`
- `src/xrpld/app/main/Application.cpp`
- `cfg/rippled-example.cfg` (or equivalent example config)
**Reference**:
- [05-configuration-reference.md §5.3](./05-configuration-reference.md) — `ApplicationImp` changes: member declaration, constructor init, `start()`/`stop()` wiring, `getTelemetry()` override
- [05-configuration-reference.md §5.1](./05-configuration-reference.md) — `[telemetry]` config section format and all option defaults
- [03-implementation-strategy.md §3.9.2](./03-implementation-strategy.md) — File impact assessment: `Application.cpp` ~15 lines added, ~3 changed (Low risk)
---
## Task 5: Create Instrumentation Macros
**Objective**: Define convenience macros that make instrumenting code one-liners, and that compile to zero-cost no-ops when telemetry is disabled.
**What to do**:
- Create `src/xrpld/telemetry/TracingInstrumentation.h`:
- When `XRPL_ENABLE_TELEMETRY` is defined:
```cpp
    // Parameter names are deliberately underscored: a parameter named
    // `telemetry` would be substituted into the `::xrpl::telemetry::`
    // qualifier during macro expansion (see POC Lessons Learned below).
    #define XRPL_TRACE_SPAN(_tel_obj_, _span_name_) \
        auto _xrpl_span_ = (_tel_obj_).startSpan(_span_name_); \
        ::xrpl::telemetry::SpanGuard _xrpl_guard_(_xrpl_span_)
    #define XRPL_TRACE_RPC(_tel_obj_, _span_name_) \
        std::optional<::xrpl::telemetry::SpanGuard> _xrpl_guard_; \
        if ((_tel_obj_).shouldTraceRpc()) { \
            _xrpl_guard_.emplace((_tel_obj_).startSpan(_span_name_)); \
        }
    #define XRPL_TRACE_SET_ATTR(key, value) \
        if (_xrpl_guard_.has_value()) { \
            _xrpl_guard_->setAttribute(key, value); \
        }
    #define XRPL_TRACE_EXCEPTION(e) \
        if (_xrpl_guard_.has_value()) { \
            _xrpl_guard_->recordException(e); \
        }
```
- When `XRPL_ENABLE_TELEMETRY` is NOT defined, all macros expand to `((void)0)`
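A hypothetical call site showing the intended ergonomics; every identifier except the macros is illustrative:

```cpp
// Illustrative handler: instrumentation stays at one line per concern.
void
handleServerInfo(RPC::JsonContext& context, Json::Value& result)
{
    XRPL_TRACE_RPC(context.app.getTelemetry(), "rpc.command.server_info");
    XRPL_TRACE_SET_ATTR("xrpl.rpc.command", "server_info");

    try
    {
        // ... existing handler logic, unchanged ...
    }
    catch (std::exception const& e)
    {
        XRPL_TRACE_EXCEPTION(e);
        throw;  // tracing must not swallow errors
    }
}
```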
**Key new file**:
- `src/xrpld/telemetry/TracingInstrumentation.h`
**Reference**:
- [04-code-samples.md §4.3](./04-code-samples.md) — Full macro definitions for `XRPL_TRACE_SPAN`, `XRPL_TRACE_RPC`, `XRPL_TRACE_CONSENSUS`, `XRPL_TRACE_SET_ATTR`, `XRPL_TRACE_EXCEPTION` with both enabled and disabled branches
- [03-implementation-strategy.md §3.7.3](./03-implementation-strategy.md) — Conditional instrumentation pattern: compile-time `#ifndef` and runtime `shouldTrace*()` checks
- [03-implementation-strategy.md §3.9.7](./03-implementation-strategy.md) — Before/after code examples showing minimal intrusiveness (~1-3 lines per instrumentation point)
---
## Task 6: Instrument RPC ServerHandler
**Objective**: Add tracing to the HTTP RPC entry point so every incoming RPC request creates a span.
**What to do**:
- Edit `src/xrpld/rpc/detail/ServerHandler.cpp`:
- `#include` the `TracingInstrumentation.h` header
- In `ServerHandler::onRequest(Session& session)`:
- At the top of the method, add: `XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.request");`
- After the RPC command name is extracted, set attribute: `XRPL_TRACE_SET_ATTR("xrpl.rpc.command", command);`
- After the response status is known, set: `XRPL_TRACE_SET_ATTR("http.status_code", static_cast<int64_t>(statusCode));`
- Wrap error paths with: `XRPL_TRACE_EXCEPTION(e);`
- In `ServerHandler::processRequest(...)`:
- Add a child span: `XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.process");`
- Set method attribute: `XRPL_TRACE_SET_ATTR("xrpl.rpc.method", request_method);`
- In `ServerHandler::onWSMessage(...)` (WebSocket path):
- Add: `XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.ws.message");`
- The goal is to see spans like:
```
rpc.request
└── rpc.process
```
in Jaeger for every HTTP RPC call.
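A hedged sketch of the resulting shape (method signatures abbreviated; assuming `startSpan` delegates to `Tracer::StartSpan` with default options, which parent a new span on the currently active one, so `rpc.process` nests under `rpc.request` without explicit plumbing for same-thread calls):

```cpp
// Sketch only: signatures and bodies are abbreviated.
void
ServerHandler::onRequest(Session& session)
{
    // Parent span; stays active for the rest of this thread's work.
    XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.request");

    // ... parse the request, then dispatch ...
    processRequest(session);
}

void
ServerHandler::processRequest(Session& session)
{
    // Started while rpc.request is active on this thread, so it is
    // recorded as a child span automatically.
    XRPL_TRACE_RPC(app_.getTelemetry(), "rpc.process");

    // ... existing processing logic unchanged ...
}
```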
**Key modified file**:
- `src/xrpld/rpc/detail/ServerHandler.cpp` (~15-25 lines added)
**Reference**:
- [04-code-samples.md §4.5.3](./04-code-samples.md) — Complete `ServerHandler::onRequest()` instrumented code sample with W3C header extraction, span creation, attribute setting, and error handling
- [01-architecture-analysis.md §1.5](./01-architecture-analysis.md) — RPC request flow diagram: HTTP request -> attributes -> jobqueue.enqueue -> rpc.command -> response
- [01-architecture-analysis.md §1.6](./01-architecture-analysis.md) — Key trace points table: `rpc.request` in `ServerHandler.cpp::onRequest()` (Priority: High)
- [02-design-decisions.md §2.3](./02-design-decisions.md) — Span naming convention: `rpc.request`, `rpc.command.*`
- [02-design-decisions.md §2.4.2](./02-design-decisions.md) — RPC span attributes: `xrpl.rpc.command`, `xrpl.rpc.version`, `xrpl.rpc.role`, `xrpl.rpc.params`
- [03-implementation-strategy.md §3.9.2](./03-implementation-strategy.md) — File impact: `ServerHandler.cpp` ~40 lines added, ~10 changed (Low risk)
---
## Task 7: Instrument RPC Command Execution
**Objective**: Add per-command tracing inside the RPC handler so each command (e.g., `submit`, `account_info`, `server_info`) gets its own child span.
**What to do**:
- Edit `src/xrpld/rpc/detail/RPCHandler.cpp`:
- `#include` the `TracingInstrumentation.h` header
- In `doCommand(RPC::JsonContext& context, Json::Value& result)`:
- At the top: `XRPL_TRACE_RPC(context.app.getTelemetry(), "rpc.command." + context.method);`
- Set attributes:
- `XRPL_TRACE_SET_ATTR("xrpl.rpc.command", context.method);`
- `XRPL_TRACE_SET_ATTR("xrpl.rpc.version", static_cast<int64_t>(context.apiVersion));`
- `XRPL_TRACE_SET_ATTR("xrpl.rpc.role", (context.role == Role::ADMIN) ? "admin" : "user");`
- On success: `XRPL_TRACE_SET_ATTR("xrpl.rpc.status", "success");`
- On error: `XRPL_TRACE_SET_ATTR("xrpl.rpc.status", "error");` and set the error message
- After this, traces in Jaeger should look like:
```
rpc.request (xrpl.rpc.command=account_info)
└── rpc.process
└── rpc.command.account_info (xrpl.rpc.version=2, xrpl.rpc.role=user, xrpl.rpc.status=success)
```
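For the status attributes, one hedged possibility is to branch on the result object after the handler returns, assuming errors surface as an `error` member of `result`, as rippled RPC responses conventionally report them:

```cpp
// At the end of doCommand(), once the handler has populated `result`.
XRPL_TRACE_SET_ATTR("xrpl.rpc.command", context.method);
XRPL_TRACE_SET_ATTR(
    "xrpl.rpc.version", static_cast<int64_t>(context.apiVersion));
XRPL_TRACE_SET_ATTR(
    "xrpl.rpc.role", (context.role == Role::ADMIN) ? "admin" : "user");
// A result carrying an "error" member is treated as a failed command.
XRPL_TRACE_SET_ATTR(
    "xrpl.rpc.status", result.isMember("error") ? "error" : "success");
```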
**Key modified file**:
- `src/xrpld/rpc/detail/RPCHandler.cpp` (~15-20 lines added)
**Reference**:
- [04-code-samples.md §4.5.3](./04-code-samples.md) — `ServerHandler::onRequest()` code sample (includes child span pattern for `rpc.command.*`)
- [02-design-decisions.md §2.3](./02-design-decisions.md) — Span naming: `rpc.command.*` pattern with dynamic command name (e.g., `rpc.command.server_info`)
- [02-design-decisions.md §2.4.2](./02-design-decisions.md) — RPC attribute schema: `xrpl.rpc.command`, `xrpl.rpc.version`, `xrpl.rpc.role`, `xrpl.rpc.status`
- [01-architecture-analysis.md §1.6](./01-architecture-analysis.md) — Key trace points table: `rpc.command.*` in `RPCHandler.cpp::doCommand()` (Priority: High)
- [02-design-decisions.md §2.6.5](./02-design-decisions.md) — Correlation with PerfLog: how `doCommand()` can link trace_id with existing PerfLog entries
- [03-implementation-strategy.md §3.4.4](./03-implementation-strategy.md) — RPC request overhead budget: ~1.75 μs total per request
---
## Task 8: Build, Run, and Verify End-to-End
**Objective**: Prove the full pipeline works: rippled emits traces -> OTel Collector receives them -> Jaeger displays them.
**What to do**:
1. **Start the Docker stack**:
```bash
docker compose -f docker/telemetry/docker-compose.yml up -d
```
Verify Collector health: `curl http://localhost:13133`
2. **Build rippled with telemetry**:
```bash
# Adjust for your actual build workflow
conan install . --build=missing -o with_telemetry=True
cmake --preset default -DXRPL_ENABLE_TELEMETRY=ON
cmake --build --preset default
```
3. **Configure rippled**:
Add to `rippled.cfg` (or your local test config):
```ini
[telemetry]
enabled=1
endpoint=localhost:4317
sampling_ratio=1.0
trace_rpc=1
```
4. **Start rippled** in standalone mode:
```bash
./rippled --conf rippled.cfg -a --start
```
5. **Generate RPC traffic**:
```bash
# server_info
curl -s -X POST http://localhost:5005 \
-H "Content-Type: application/json" \
-d '{"method":"server_info","params":[{}]}'
# ledger
curl -s -X POST http://localhost:5005 \
-H "Content-Type: application/json" \
-d '{"method":"ledger","params":[{"ledger_index":"current"}]}'
# account_info (will error in standalone, that's fine — we trace errors too)
curl -s -X POST http://localhost:5005 \
-H "Content-Type: application/json" \
-d '{"method":"account_info","params":[{"account":"rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh"}]}'
```
6. **Verify in Jaeger**:
- Open `http://localhost:16686`
- Select service `rippled` from the dropdown
- Click "Find Traces"
- Confirm you see traces with spans: `rpc.request` -> `rpc.process` -> `rpc.command.server_info`
- Click into a trace and verify attributes: `xrpl.rpc.command`, `xrpl.rpc.status`, `xrpl.rpc.version`
7. **Verify zero-overhead when disabled**:
- Rebuild with `XRPL_ENABLE_TELEMETRY=OFF`, or set `enabled=0` in config
- Run the same RPC calls
- Confirm no new traces appear and no errors in rippled logs
**Verification Checklist**:
- [ ] Docker stack starts without errors
- [ ] rippled builds with `-DXRPL_ENABLE_TELEMETRY=ON`
- [ ] rippled starts and connects to OTel Collector (check rippled logs for telemetry messages)
- [ ] Traces appear in Jaeger UI under service "rippled"
- [ ] Span hierarchy is correct (parent-child relationships)
- [ ] Span attributes are populated (`xrpl.rpc.command`, `xrpl.rpc.status`, etc.)
- [ ] Error spans show error status and message
- [ ] Building with `XRPL_ENABLE_TELEMETRY=OFF` produces no regressions
- [ ] Setting `enabled=0` at runtime produces no traces and no errors
**Reference**:
- [06-implementation-phases.md §6.11.1](./06-implementation-phases.md) — Phase 1 definition of done: SDK compiles, runtime toggle works, span creation verified in Jaeger, config validation passes
- [06-implementation-phases.md §6.11.2](./06-implementation-phases.md) — Phase 2 definition of done: 100% RPC coverage, traceparent propagation, <1ms p99 overhead, dashboard deployed
- [06-implementation-phases.md §6.8](./06-implementation-phases.md) — Success metrics: trace coverage >95%, CPU overhead <3%, memory <5 MB, latency impact <2%
- [03-implementation-strategy.md §3.9.5](./03-implementation-strategy.md) — Backward compatibility: config optional, protocol unchanged, `XRPL_ENABLE_TELEMETRY=OFF` produces identical binary
- [01-architecture-analysis.md §1.8](./01-architecture-analysis.md) — Observable outcomes: what traces, metrics, and dashboards to expect
---
## Task 9: Document POC Results and Next Steps
**Objective**: Capture findings, screenshots, and remaining work for the team.
**What to do**:
- Take screenshots of Jaeger showing:
- The service list with "rippled"
- A trace with the full span tree
- Span detail view showing attributes
- Document any issues encountered (build issues, SDK quirks, missing attributes)
- Note performance observations (build time impact, any noticeable runtime overhead)
- Write a short summary of what the POC proves and what it doesn't cover yet:
- **Proves**: OTel SDK integrates with rippled, OTLP export works, RPC traces visible
- **Doesn't cover**: Cross-node P2P context propagation, consensus tracing, protobuf trace context, W3C traceparent header extraction, tail-based sampling, production deployment
- Outline next steps (mapping to the full plan phases):
- [Phase 2](./06-implementation-phases.md) completion: [W3C header extraction](./02-design-decisions.md) (§2.5), WebSocket tracing, all [RPC handlers](./01-architecture-analysis.md) (§1.6)
- [Phase 3](./06-implementation-phases.md): [Protobuf `TraceContext` message](./04-code-samples.md) (§4.4), [transaction relay tracing](./04-code-samples.md) (§4.5.1) across nodes
- [Phase 4](./06-implementation-phases.md): [Consensus round and phase tracing](./04-code-samples.md) (§4.5.2)
- [Phase 5](./06-implementation-phases.md): [Production collector config](./05-configuration-reference.md) (§5.5.2), [Grafana dashboards](./07-observability-backends.md) (§7.6), [alerting](./07-observability-backends.md) (§7.6.3)
**Reference**:
- [06-implementation-phases.md §6.1](./06-implementation-phases.md) — Full 5-phase timeline overview and Gantt chart
- [06-implementation-phases.md §6.10](./06-implementation-phases.md) — Crawl-Walk-Run strategy: POC is the CRAWL phase, next steps are WALK and RUN
- [06-implementation-phases.md §6.12](./06-implementation-phases.md) — Recommended implementation order (14 steps across 9 weeks)
- [03-implementation-strategy.md §3.9](./03-implementation-strategy.md) — Code intrusiveness assessment and risk matrix for each remaining component
- [07-observability-backends.md §7.2](./07-observability-backends.md) — Production backend selection (Tempo, Elastic APM, Honeycomb, Datadog)
- [02-design-decisions.md §2.5](./02-design-decisions.md) — Context propagation design: W3C HTTP headers, protobuf P2P, JobQueue internal
- [00-tracing-fundamentals.md](./00-tracing-fundamentals.md) — Reference for team onboarding on distributed tracing concepts
---
## Summary
| Task | Description | New Files | Modified Files | Depends On |
| ---- | ------------------------------------ | --------- | -------------- | ---------- |
| 0 | Docker observability stack | 4 | 0 | — |
| 1 | OTel C++ SDK dependency | 0 | 2 | — |
| 2 | Core Telemetry interface + NullImpl | 3 | 0 | 1 |
| 3 | OTel-backed Telemetry implementation | 2 | 1 | 1, 2 |
| 4 | Application lifecycle integration | 0 | 3 | 2, 3 |
| 5 | Instrumentation macros | 1 | 0 | 2 |
| 6 | Instrument RPC ServerHandler | 0 | 1 | 4, 5 |
| 7 | Instrument RPC command execution | 0 | 1 | 4, 5 |
| 8 | End-to-end verification | 0 | 0 | 0-7 |
| 9 | Document results and next steps | 1 | 0 | 8 |
**Parallel work**: Tasks 0 and 1 can run in parallel. Task 5 depends only on the `SpanGuard` header from Task 2, so it can begin as soon as that header lands. Tasks 6 and 7 can be done in parallel once Tasks 4 and 5 are complete.
---
## Next Steps (Post-POC)
### Metrics Pipeline for Grafana Dashboards
The current POC exports **traces only**. Grafana's Explore view can query Jaeger for individual traces, but time-series charts (latency histograms, request throughput, error rates) require a **metrics pipeline**. To enable this:
1. **Add a `spanmetrics` connector** to the OTel Collector config that derives RED metrics (Rate, Errors, Duration) from trace spans automatically:
```yaml
connectors:
spanmetrics:
histogram:
explicit:
buckets: [1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 5s]
dimensions:
- name: xrpl.rpc.command
- name: xrpl.rpc.status
exporters:
prometheus:
endpoint: 0.0.0.0:8889
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [debug, otlp/jaeger, spanmetrics]
metrics:
receivers: [spanmetrics]
exporters: [prometheus]
```
2. **Add Prometheus** to the Docker Compose stack to scrape the collector's metrics endpoint.
3. **Add Prometheus as a Grafana datasource** and build dashboards for:
- RPC request latency (p50/p95/p99) by command
- RPC throughput (requests/sec) by command
- Error rate by command
- Span duration distribution
### Additional Instrumentation
- **W3C `traceparent` header extraction** in `ServerHandler` to support cross-service context propagation from external callers (a minimal extraction sketch follows this list)
- **WebSocket RPC tracing** in `ServerHandler::onWSMessage()`
- **Transaction relay tracing** across nodes using protobuf `TraceContext` messages
- **Consensus round and phase tracing** for validator coordination visibility
- **Ledger close tracing** to measure close-to-validated latency
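For the `traceparent` extraction item above, a minimal sketch of the propagation-API side (`HeaderCarrier` and `extractParent` are illustrative names, the carrier wraps a plain header map rather than the HTTP session, and this assumes the W3C `HttpTraceContext` propagator was registered globally during telemetry startup):

```cpp
#include <opentelemetry/context/propagation/global_propagator.h>
#include <opentelemetry/context/propagation/text_map_propagator.h>
#include <opentelemetry/context/runtime_context.h>
#include <opentelemetry/nostd/string_view.h>

#include <map>
#include <string>

// Minimal read-only carrier over a header map. A real version would
// wrap the HTTP session's header accessor instead of std::map.
class HeaderCarrier final
    : public opentelemetry::context::propagation::TextMapCarrier
{
public:
    explicit HeaderCarrier(std::map<std::string, std::string> const& h)
        : headers_(h)
    {
    }

    opentelemetry::nostd::string_view
    Get(opentelemetry::nostd::string_view key) const noexcept override
    {
        auto const it = headers_.find(std::string(key.data(), key.size()));
        return it == headers_.end()
            ? opentelemetry::nostd::string_view{}
            : opentelemetry::nostd::string_view{it->second};
    }

    void
    Set(opentelemetry::nostd::string_view,
        opentelemetry::nostd::string_view) noexcept override
    {
        // Extraction-only carrier: nothing to inject.
    }

private:
    std::map<std::string, std::string> const& headers_;
};

// Returns a context carrying the remote parent if a valid traceparent
// header was present; pass it as the parent when starting rpc.request.
opentelemetry::context::Context
extractParent(std::map<std::string, std::string> const& headers)
{
    HeaderCarrier carrier{headers};
    auto propagator = opentelemetry::context::propagation::
        GlobalTextMapPropagator::GetGlobalPropagator();
    auto current = opentelemetry::context::RuntimeContext::GetCurrent();
    return propagator->Extract(carrier, current);
}
```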
### Production Hardening
- **Tail-based sampling** in the OTel Collector to reduce volume while retaining error/slow traces
- **TLS configuration** for the OTLP exporter in production deployments
- **Resource limits** on the batch processor queue to prevent unbounded memory growth
- **Health monitoring** for the telemetry pipeline itself (collector lag, export failures)
### POC Lessons Learned
Issues encountered during POC implementation that inform future work:
| Issue | Resolution | Impact on Future Work |
| -------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| Conan lockfile rejected `opentelemetry-cpp/1.18.0` | Used `--lockfile=""` to bypass | Lockfile must be regenerated when adding new dependencies |
| Conan package only builds OTLP HTTP exporter, not gRPC | Switched from gRPC to HTTP exporter (`localhost:4318/v1/traces`) | HTTP exporter is the default; gRPC requires custom Conan profile |
| CMake target `opentelemetry-cpp::api` etc. don't exist in Conan package | Use umbrella target `opentelemetry-cpp::opentelemetry-cpp` | Conan targets differ from upstream CMake targets |
| OTel Collector `logging` exporter deprecated | Renamed to `debug` exporter | Use `debug` in all collector configs going forward |
| Macro parameter `telemetry` collided with `::xrpl::telemetry::` namespace | Renamed macro params to `_tel_obj_`, `_span_name_` | Avoid common words as macro parameter names |
| `opentelemetry::trace::Scope` creates new context on move | Store scope as member, create once in constructor | SpanGuard move semantics need care with Scope lifecycle |
| `TracerProviderFactory::Create` returns `unique_ptr<sdk::TracerProvider>`, not `nostd::shared_ptr` | Use `std::shared_ptr` member, wrap in `nostd::shared_ptr` for global provider | OTel SDK factory return types don't match API provider types |
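
To illustrate the `Scope` lesson above, here is a minimal sketch of the guard shape that avoids the move hazard. `SpanGuard` here is illustrative (the POC's actual class may differ); the key point is that the `Scope` member is constructed exactly once, in the constructor:

```cpp
// Sketch: RAII span guard. opentelemetry::trace::Scope re-attaches context
// when moved, so the Scope is created once and the guard is non-movable
// (one conservative way to sidestep the move hazard noted in the table).
#include <opentelemetry/nostd/shared_ptr.h>
#include <opentelemetry/trace/scope.h>
#include <opentelemetry/trace/span.h>

namespace trace_api = opentelemetry::trace;

class SpanGuard
{
public:
    explicit SpanGuard(opentelemetry::nostd::shared_ptr<trace_api::Span> span)
        : span_(std::move(span))
        , scope_(span_)  // attach the span to the active context exactly once
    {
    }

    ~SpanGuard()
    {
        if (span_)
            span_->End();  // end the span when the guard leaves scope
    }

    SpanGuard(SpanGuard&&) = delete;  // moving would re-attach the context
    SpanGuard&
    operator=(SpanGuard&&) = delete;

private:
    opentelemetry::nostd::shared_ptr<trace_api::Span> span_;  // declared first
    trace_api::Scope scope_;  // depends on span_; initialized after it
};
```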

View File

@@ -71,14 +71,12 @@ words:
- coldwallet
- compr
- conanfile
- cppcoro
- conanrun
- confs
- connectability
- coro
- coros
- cowid
- cppcoro
- cryptocondition
- cryptoconditional
- cryptoconditions
@@ -101,14 +99,11 @@ words:
- endmacro
- exceptioned
- Falco
- fcontext
- finalizers
- firewalled
- fcontext
- fmtdur
- fsanitize
- funclets
- gantt
- gcov
- gcovr
- ghead
@@ -183,6 +178,7 @@ words:
- nixpkgs
- nonxrp
- noripple
- nostd
- nudb
- nullptr
- nunl
@@ -190,7 +186,6 @@ words:
- ostr
- pargs
- partitioner
- pratik
- paychan
- paychans
- permdex
@@ -198,7 +193,6 @@ words:
- permissioned
- pointee
- populator
- pratik
- preauth
- preauthorization
- preauthorize
@@ -213,7 +207,6 @@ words:
- queuable
- Raphson
- replayer
- repost
- rerere
- retriable
- RIPD
@@ -244,7 +237,6 @@ words:
- soci
- socidb
- sslws
- stackful
- statsd
- STATSDCOLLECTOR
- stissue
@@ -316,3 +308,9 @@ words:
- xrplf
- xxhash
- xxhasher
- xychart
- otelc
- zpages
- traceql
- Gantt
- gantt

View File

@@ -1,687 +0,0 @@
#pragma once
#include <coroutine>
#include <exception>
#include <utility>
#include <variant>
namespace xrpl {
template <typename T = void>
class CoroTask;
/**
* CoroTask<void> -- coroutine return type for void-returning coroutines.
*
* Class / Dependency Diagram
* ==========================
*
* CoroTask<void>
* +-----------------------------------------------+
* | - handle_ : Handle (coroutine_handle<promise>) |
* +-----------------------------------------------+
* | + handle(), done() |
* | + await_ready/suspend/resume (Awaiter iface) |
* +-----------------------------------------------+
* | owns
* v
* promise_type
* +-----------------------------------------------+
* | - exception_ : std::exception_ptr |
* | - continuation_ : std::coroutine_handle<> |
* +-----------------------------------------------+
* | + get_return_object() -> CoroTask |
* | + initial_suspend() -> suspend_always (lazy) |
* | + final_suspend() -> FinalAwaiter |
* | + return_void() |
* | + unhandled_exception() |
* +-----------------------------------------------+
* | returns at final_suspend
* v
* FinalAwaiter
* +-----------------------------------------------+
* | await_suspend(h): |
* | if continuation_ set -> symmetric transfer |
* | else -> noop_coroutine |
* +-----------------------------------------------+
*
* Design Notes
* ------------
* - Lazy start: initial_suspend returns suspend_always, so the coroutine
* body does not execute until the handle is explicitly resumed.
* - Symmetric transfer: await_suspend returns a coroutine_handle instead
* of void/bool, allowing the scheduler to jump directly to the next
* coroutine without growing the call stack.
* - Continuation chaining: when one CoroTask is co_await-ed inside
* another, the caller's handle is stored as continuation_ so
* FinalAwaiter can resume it when this task finishes.
* - Move-only: the handle is exclusively owned; copy is deleted.
*
* Usage Examples
* ==============
*
* 1. Basic void coroutine (the most common case in rippled):
*
* CoroTask<void> doWork(std::shared_ptr<CoroTaskRunner> runner) {
* // do something
* co_await runner->suspend(); // yield control
* // resumed later via runner->post() or runner->resume()
* co_return;
* }
*
* 2. co_await-ing one CoroTask<void> from another (chaining):
*
* CoroTask<void> inner() {
* // ...
* co_return;
* }
* CoroTask<void> outer() {
* co_await inner(); // continuation_ links outer -> inner
* co_return; // FinalAwaiter resumes outer
* }
*
* 3. Exceptions propagate through co_await:
*
* CoroTask<void> failing() {
* throw std::runtime_error("oops");
* co_return;
* }
* CoroTask<void> caller() {
* try { co_await failing(); }
* catch (std::runtime_error const&) { // caught here }
* }
*
* Caveats / Pitfalls
* ==================
*
* BUG-RISK: Dangling references in coroutine parameters.
* Coroutine parameters are copied into the frame, but references
* are NOT -- they are stored as-is. If the referent goes out of scope
* before the coroutine finishes, you get use-after-free.
*
* // BROKEN -- local dies before coroutine runs:
* CoroTask<void> bad(int& ref) { co_return; }
* void launch() {
* int local = 42;
* auto task = bad(local); // frame stores &local
* } // local destroyed; frame holds dangling ref
*
* // FIX -- pass by value, or ensure lifetime via shared_ptr.
*
* BUG-RISK: GCC 14 corrupts reference captures in coroutine lambdas.
* When a lambda that returns CoroTask captures by reference ([&]),
* GCC 14 may generate a corrupted coroutine frame. Always capture
* by explicit pointer-to-value instead:
*
* // BROKEN on GCC 14:
* jq.postCoroTask(t, n, [&](auto) -> CoroTask<void> { ... });
*
* // FIX -- capture pointers explicitly:
* jq.postCoroTask(t, n, [ptr = &val](auto) -> CoroTask<void> { ... });
*
* BUG-RISK: Resuming a destroyed or completed CoroTask.
* Calling handle().resume() after the coroutine has already run to
* completion (done() == true) is undefined behavior. The CoroTaskRunner
* guards against this with an XRPL_ASSERT, but standalone usage of
* CoroTask must check done() before resuming.
*
* BUG-RISK: Moving a CoroTask that is being awaited.
* If task A is co_await-ed by task B (so A.continuation_ == B), moving
* or destroying A will invalidate the continuation link. Never move
* or reassign a CoroTask while it is mid-execution or being awaited.
*
* LIMITATION: CoroTask is fire-and-forget for the top-level owner.
* There is no built-in notification when the coroutine finishes.
* The caller must use external synchronization (e.g. CoroTaskRunner::join
* or a gate/condition_variable) to know when it is done.
*
* LIMITATION: No cancellation token.
* There is no way to cancel a suspended CoroTask from outside. The
* coroutine body must cooperatively check a flag (e.g. jq_.isStopping())
* after each co_await and co_return early if needed.
*
* LIMITATION: Stackless -- cannot suspend from nested non-coroutine calls.
* If a coroutine calls a regular function that wants to "yield", it
* cannot. Only the immediate coroutine body can use co_await.
* This is acceptable for rippled because all yield() sites are shallow.
*/
template <>
class CoroTask<void>
{
public:
struct promise_type;
using Handle = std::coroutine_handle<promise_type>;
/**
* Coroutine promise. Compiler uses this to manage coroutine state.
* Stores the exception (if any) and the continuation handle for
* symmetric transfer back to the awaiting coroutine.
*/
struct promise_type
{
// Captured exception from the coroutine body, rethrown in
// await_resume() when this task is co_await-ed by a caller.
std::exception_ptr exception_;
// Handle to the coroutine that is co_await-ing this task.
// Set by await_suspend(). FinalAwaiter uses it for symmetric
// transfer back to the caller. Null if this is a top-level task.
std::coroutine_handle<> continuation_;
/**
* Create the CoroTask return object.
* Called by the compiler at coroutine creation.
*/
CoroTask
get_return_object()
{
return CoroTask{Handle::from_promise(*this)};
}
/**
* Lazy start. The coroutine body does not execute until the
* handle is explicitly resumed (e.g. by CoroTaskRunner::resume).
*/
std::suspend_always
initial_suspend() noexcept
{
return {};
}
/**
* Awaiter returned by final_suspend(). Uses symmetric transfer:
* if a continuation exists, transfers control directly to it
* (tail-call, no stack growth). Otherwise returns noop_coroutine
* so the coroutine frame stays alive for the owner to destroy.
*/
struct FinalAwaiter
{
/**
* Always false. We need await_suspend to run for
* symmetric transfer.
*/
bool
await_ready() noexcept
{
return false;
}
/**
* Symmetric transfer: returns the continuation handle so
* the compiler emits a tail-call instead of a nested resume.
* If no continuation is set, returns noop_coroutine to
* suspend at final_suspend without destroying the frame.
*
* @param h Handle to this completing coroutine
*
* @return Continuation handle, or noop_coroutine
*/
std::coroutine_handle<>
await_suspend(Handle h) noexcept
{
if (auto cont = h.promise().continuation_)
return cont;
return std::noop_coroutine();
}
void
await_resume() noexcept
{
}
};
/**
* Returns FinalAwaiter for symmetric transfer at coroutine end.
*/
FinalAwaiter
final_suspend() noexcept
{
return {};
}
/**
* Called by the compiler for `co_return;` (void coroutine).
*/
void
return_void()
{
}
/**
* Called by the compiler when an exception escapes the coroutine
* body. Captures it for later rethrowing in await_resume().
*/
void
unhandled_exception()
{
exception_ = std::current_exception();
}
};
/**
* Default constructor. Creates an empty (null handle) task.
*/
CoroTask() = default;
/**
* Takes ownership of a compiler-generated coroutine handle.
*
* @param h Coroutine handle to own
*/
explicit CoroTask(Handle h) : handle_(h)
{
}
/**
* Destroys the coroutine frame if this task owns one.
*/
~CoroTask()
{
if (handle_)
handle_.destroy();
}
/**
* Move constructor. Transfers handle ownership, leaves other empty.
*/
CoroTask(CoroTask&& other) noexcept : handle_(std::exchange(other.handle_, {}))
{
}
/**
* Move assignment. Destroys current frame (if any), takes other's.
*/
CoroTask&
operator=(CoroTask&& other) noexcept
{
if (this != &other)
{
if (handle_)
handle_.destroy();
handle_ = std::exchange(other.handle_, {});
}
return *this;
}
CoroTask(CoroTask const&) = delete;
CoroTask&
operator=(CoroTask const&) = delete;
/**
* @return The underlying coroutine_handle
*/
Handle
handle() const
{
return handle_;
}
/**
* @return true if the coroutine has run to completion (or thrown)
*/
bool
done() const
{
return handle_ && handle_.done();
}
// -- Awaiter interface: allows `co_await someCoroTask;` --
/**
* Always false. This task is lazy, so co_await always suspends
* the caller to set up the continuation link.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Stores the caller's handle as our continuation, then returns
* our handle for symmetric transfer (caller suspends, we resume).
*
* @param caller Handle of the coroutine doing co_await on us
*
* @return Our handle for symmetric transfer
*/
std::coroutine_handle<>
await_suspend(std::coroutine_handle<> caller) noexcept
{
handle_.promise().continuation_ = caller;
return handle_; // Symmetric transfer
}
/**
* Called when the caller resumes after co_await. Rethrows any
* exception captured by unhandled_exception().
*/
void
await_resume()
{
if (auto& ep = handle_.promise().exception_)
std::rethrow_exception(ep);
}
private:
// Exclusively-owned coroutine handle. Null after move or default
// construction. Destroyed in the destructor.
Handle handle_;
};
/**
* CoroTask<T> -- coroutine return type for value-returning coroutines.
*
* Class / Dependency Diagram
* ==========================
*
* CoroTask<T>
* +-----------------------------------------------+
* | - handle_ : Handle (coroutine_handle<promise>) |
* +-----------------------------------------------+
* | + handle(), done() |
* | + await_ready/suspend/resume (Awaiter iface) |
* +-----------------------------------------------+
* | owns
* v
* promise_type
* +-----------------------------------------------+
* | - result_ : variant<monostate, T, |
* | exception_ptr> |
* | - continuation_ : std::coroutine_handle<> |
* +-----------------------------------------------+
* | + get_return_object() -> CoroTask |
* | + initial_suspend() -> suspend_always (lazy) |
* | + final_suspend() -> FinalAwaiter |
* | + return_value(T) -> stores in result_[1] |
* | + unhandled_exception -> stores in result_[2] |
* +-----------------------------------------------+
* | returns at final_suspend
* v
* FinalAwaiter (same symmetric-transfer pattern as CoroTask<void>)
*
* Value Extraction
* ----------------
* await_resume() inspects the variant:
* - index 2 (exception_ptr) -> rethrow
* - index 1 (T) -> return value via move
*
* Usage Examples
* ==============
*
* 1. Simple value return:
*
* CoroTask<int> computeAnswer() { co_return 42; }
*
* CoroTask<void> caller() {
* int v = co_await computeAnswer(); // v == 42
* }
*
* 2. Chaining value-returning coroutines:
*
* CoroTask<int> add(int a, int b) { co_return a + b; }
* CoroTask<int> doubleSum(int a, int b) {
* int s = co_await add(a, b);
* co_return s * 2;
* }
*
* 3. Exception propagation from inner to outer:
*
* CoroTask<int> failing() {
* throw std::runtime_error("bad");
* co_return 0; // never reached
* }
* CoroTask<void> caller() {
* try {
* int v = co_await failing(); // throws here
* } catch (std::runtime_error const& e) {
* // e.what() == "bad"
* }
* }
*
* Caveats / Pitfalls (in addition to CoroTask<void> caveats above)
* ================================================================
*
* BUG-RISK: await_resume() moves the value out of the variant.
* Calling co_await on the same CoroTask<T> instance twice is undefined
* behavior -- the second call will see a moved-from T. CoroTask is
* single-shot: one co_return, one co_await.
*
* BUG-RISK: T must be move-constructible.
* return_value(T) takes by value and moves into the variant.
* Types that are not movable cannot be used as T.
*
* LIMITATION: No co_yield support.
* CoroTask<T> only supports a single co_return. It does not implement
* yield_value(), so using co_yield inside a CoroTask<T> coroutine is a
* compile error. For streaming values, a different return type
* (e.g. Generator<T>) would be needed.
*
* LIMITATION: Result is only accessible via co_await.
* There is no .get() or .result() method. The value can only be
* extracted by co_await-ing the CoroTask<T> from inside another
* coroutine. For extracting results in non-coroutine code, pass a
* pointer to the caller and write through it (as the tests do).
*/
template <typename T>
class CoroTask
{
public:
struct promise_type;
using Handle = std::coroutine_handle<promise_type>;
/**
* Coroutine promise for value-returning coroutines.
* Stores the result as a variant: monostate (not yet set),
* T (co_return value), or exception_ptr (unhandled exception).
*/
struct promise_type
{
// Tri-state result:
// index 0 (monostate) -- coroutine has not yet completed
// index 1 (T) -- co_return value stored here
// index 2 (exception) -- unhandled exception captured here
std::variant<std::monostate, T, std::exception_ptr> result_;
// Handle to the coroutine co_await-ing this task. Used by
// FinalAwaiter for symmetric transfer. Null for top-level tasks.
std::coroutine_handle<> continuation_;
/**
* Create the CoroTask return object.
* Called by the compiler at coroutine creation.
*/
CoroTask
get_return_object()
{
return CoroTask{Handle::from_promise(*this)};
}
/**
* Lazy start. Coroutine body does not run until explicitly resumed.
*/
std::suspend_always
initial_suspend() noexcept
{
return {};
}
/**
* Symmetric-transfer awaiter at coroutine completion.
* Same pattern as CoroTask<void>::FinalAwaiter.
*/
struct FinalAwaiter
{
bool
await_ready() noexcept
{
return false;
}
/**
* Returns continuation for symmetric transfer, or
* noop_coroutine if this is a top-level task.
*
* @param h Handle to this completing coroutine
*
* @return Continuation handle, or noop_coroutine
*/
std::coroutine_handle<>
await_suspend(Handle h) noexcept
{
if (auto cont = h.promise().continuation_)
return cont;
return std::noop_coroutine();
}
void
await_resume() noexcept
{
}
};
FinalAwaiter
final_suspend() noexcept
{
return {};
}
/**
* Called by the compiler for `co_return value;`.
* Moves the value into result_ at index 1.
*
* @param value The value to store
*/
void
return_value(T value)
{
result_.template emplace<1>(std::move(value));
}
/**
* Captures unhandled exceptions at index 2 of result_.
* Rethrown later in await_resume().
*/
void
unhandled_exception()
{
result_.template emplace<2>(std::current_exception());
}
};
/**
* Default constructor. Creates an empty (null handle) task.
*/
CoroTask() = default;
/**
* Takes ownership of a compiler-generated coroutine handle.
*
* @param h Coroutine handle to own
*/
explicit CoroTask(Handle h) : handle_(h)
{
}
/**
* Destroys the coroutine frame if this task owns one.
*/
~CoroTask()
{
if (handle_)
handle_.destroy();
}
/**
* Move constructor. Transfers handle ownership, leaves other empty.
*/
CoroTask(CoroTask&& other) noexcept : handle_(std::exchange(other.handle_, {}))
{
}
/**
* Move assignment. Destroys current frame (if any), takes other's.
*/
CoroTask&
operator=(CoroTask&& other) noexcept
{
if (this != &other)
{
if (handle_)
handle_.destroy();
handle_ = std::exchange(other.handle_, {});
}
return *this;
}
CoroTask(CoroTask const&) = delete;
CoroTask&
operator=(CoroTask const&) = delete;
/**
* @return The underlying coroutine_handle
*/
Handle
handle() const
{
return handle_;
}
/**
* @return true if the coroutine has run to completion (or thrown)
*/
bool
done() const
{
return handle_ && handle_.done();
}
// -- Awaiter interface: allows `T val = co_await someCoroTask;` --
/**
* Always false. co_await always suspends to set up continuation.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Stores caller as continuation, returns our handle for
* symmetric transfer.
*
* @param caller Handle of the coroutine doing co_await on us
*
* @return Our handle for symmetric transfer
*/
std::coroutine_handle<>
await_suspend(std::coroutine_handle<> caller) noexcept
{
handle_.promise().continuation_ = caller;
return handle_;
}
/**
* Extracts the result: rethrows if exception, otherwise moves
* the T value out of the variant. Single-shot: calling twice
* on the same task is undefined (moved-from T).
*
* @return The co_return-ed value
*/
T
await_resume()
{
auto& result = handle_.promise().result_;
if (auto* ep = std::get_if<2>(&result))
std::rethrow_exception(*ep);
return std::get<1>(std::move(result));
}
private:
// Exclusively-owned coroutine handle. Null after move or default
// construction. Destroyed in the destructor.
Handle handle_;
};
} // namespace xrpl

View File

@@ -1,329 +0,0 @@
#pragma once
/**
* @file CoroTaskRunner.ipp
*
* CoroTaskRunner inline implementation.
*
* This file contains the business logic for managing C++20 coroutines
* on the JobQueue. It is included at the bottom of JobQueue.h.
*
* Data Flow: suspend / post / resume cycle
* =========================================
*
* coroutine body CoroTaskRunner JobQueue
* -------------- -------------- --------
* |
* co_await runner->suspend()
* |
* +--- await_suspend ------> onSuspend()
* | ++nSuspend_ ------------> nSuspend_
* | [coroutine is now suspended]
* |
* . (externally or by JobQueueAwaiter)
* .
* +--- (caller calls) -----> post()
* | ++runCount_
* | addJob(resume) ----------> job enqueued
* | |
* | [worker picks up]
* | |
* +--- <----- resume() <-----------------------------------+
* | --nSuspend_ ------> nSuspend_
* | swap in LocalValues (lvs_)
* | task_.handle().resume()
* | |
* | [coroutine body continues here]
* | |
* | swap out LocalValues
* | --runCount_
* | cv_.notify_all()
* v
*
* Thread Safety
* =============
* - mutex_ : guards task_.handle().resume() so that post()-before-suspend
* races cannot resume the coroutine while it is still running.
* (See the race condition discussion in JobQueue.h)
* - mutex_run_ : guards runCount_ counter; used by join() to wait until
* all in-flight resume operations complete.
* - jq_.m_mutex: guards nSuspend_ increments/decrements.
*
* Common Mistakes When Modifying This File
* =========================================
*
* 1. Changing lock ordering.
* resume() acquires locks in this order: jq_.m_mutex -> mutex_ -> mutex_run_.
* post() acquires only mutex_run_. Any new code path that touches these
* mutexes must follow the same order to avoid deadlocks.
*
* 2. Removing the shared_from_this() capture in post().
* The lambda passed to addJob captures [this, sp = shared_from_this()].
* If you remove sp, 'this' can be destroyed before the job runs,
* causing use-after-free. The sp capture is load-bearing.
*
* 3. Forgetting to decrement nSuspend_ on a new code path.
* Every ++nSuspend_ must have a matching --nSuspend_. If you add a new
* suspension path (e.g. a new awaiter) and forget to decrement on resume
* or on failure, JobQueue::stop() will hang.
*
* 4. Calling task_.handle().resume() without holding mutex_.
* This allows a race where the coroutine runs on two threads
* simultaneously. Always hold mutex_ around resume().
*
* 5. Swapping LocalValues outside of the mutex_ critical section.
* The swap-in and swap-out of LocalValues must bracket the resume()
* call. If you move the swap-out before the lock_guard(mutex_) is
* released, you break LocalValue isolation for any code that runs
* after the coroutine suspends but before the lock is dropped.
*/
//
namespace xrpl {
/**
* Construct a CoroTaskRunner. Sets runCount_ to 0; does not
* create the coroutine. Call init() afterwards.
*
* @param jq The JobQueue this coroutine will run on
* @param type Job type for scheduling priority
* @param name Human-readable name for logging
*/
inline JobQueue::CoroTaskRunner::CoroTaskRunner(
create_t,
JobQueue& jq,
JobType type,
std::string const& name)
: jq_(jq), type_(type), name_(name), runCount_(0)
{
}
/**
* Initialize with a coroutine-returning callable.
* Stores the callable on the heap (FuncStore) so it outlives the
* coroutine frame. Coroutine frames store a reference to the
* callable's implicit object parameter (the lambda). If the callable
* is a temporary, that reference dangles after the caller returns.
* Keeping the callable alive here ensures the coroutine's captures
* remain valid.
*
* @param f Callable: CoroTask<void>(shared_ptr<CoroTaskRunner>)
*/
template <class F>
void
JobQueue::CoroTaskRunner::init(F&& f)
{
using Fn = std::decay_t<F>;
auto store = std::make_unique<FuncStore<Fn>>(std::forward<F>(f));
task_ = store->func(shared_from_this());
storedFunc_ = std::move(store);
}
/**
* Destructor. Waits for any in-flight resume() to complete, then
* asserts (debug) that the coroutine has finished or
* expectEarlyExit() was called.
*
* The join() call is necessary because with async dispatch the
* coroutine runs on a worker thread. The gate signal (which wakes
* the test thread) can arrive before resume() has set finished_.
* join() synchronizes via mutex_run_, establishing a happens-before
* edge: finished_ = true → unlock(mutex_run_) in resume() →
* lock(mutex_run_) in join() → read finished_.
*/
inline JobQueue::CoroTaskRunner::~CoroTaskRunner()
{
#ifndef NDEBUG
join();
XRPL_ASSERT(finished_, "xrpl::JobQueue::CoroTaskRunner::~CoroTaskRunner : is finished");
#endif
}
/**
* Increment the JobQueue's suspended-coroutine count (nSuspend_).
*/
inline void
JobQueue::CoroTaskRunner::onSuspend()
{
std::lock_guard lock(jq_.m_mutex);
++jq_.nSuspend_;
}
/**
* Decrement nSuspend_ without resuming.
*/
inline void
JobQueue::CoroTaskRunner::onUndoSuspend()
{
std::lock_guard lock(jq_.m_mutex);
--jq_.nSuspend_;
}
/**
* Return a SuspendAwaiter whose await_suspend() increments nSuspend_
* before the coroutine actually suspends. The caller must later call
* post() or resume() to continue execution.
*
* @return Awaiter for use with `co_await runner->suspend()`
*/
inline auto
JobQueue::CoroTaskRunner::suspend()
{
/**
* Custom awaiter for suspend(). Always suspends (await_ready
* returns false) and increments nSuspend_ in await_suspend().
*/
struct SuspendAwaiter
{
CoroTaskRunner& runner_; // The runner that owns this coroutine.
/**
* Always returns false so the coroutine suspends.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Called when the coroutine suspends. Increments nSuspend_
* so the JobQueue knows a coroutine is waiting.
*/
void
await_suspend(std::coroutine_handle<>) const
{
runner_.onSuspend();
}
void
await_resume() const noexcept
{
}
};
return SuspendAwaiter{*this};
}
/**
* Schedule coroutine resumption as a job on the JobQueue.
* A shared_ptr capture (sp) prevents this CoroTaskRunner from being
* destroyed while the job is queued but not yet executed.
*
* @return false if the JobQueue rejected the job (shutting down)
*/
inline bool
JobQueue::CoroTaskRunner::post()
{
{
std::lock_guard lk(mutex_run_);
++runCount_;
}
// sp prevents 'this' from being destroyed while the job is pending
if (jq_.addJob(type_, name_, [this, sp = shared_from_this()]() { resume(); }))
{
return true;
}
// The coroutine will not run. Undo the runCount_ increment.
std::lock_guard lk(mutex_run_);
--runCount_;
cv_.notify_all();
return false;
}
/**
* Resume the coroutine on the current thread.
*
* Steps:
* 1. Decrement nSuspend_ (under jq_.m_mutex)
* 2. Swap in this coroutine's LocalValues for thread-local isolation
* 3. Resume the coroutine handle (under mutex_)
* 4. Swap out LocalValues, restoring the thread's previous state
* 5. Decrement runCount_ and notify join() waiters
*
* Note: runCount_ is NOT incremented here — post() already did that.
* This ensures join() stays blocked for the entire post→resume lifetime.
*/
inline void
JobQueue::CoroTaskRunner::resume()
{
{
std::lock_guard lock(jq_.m_mutex);
--jq_.nSuspend_;
}
auto saved = detail::getLocalValues().release();
detail::getLocalValues().reset(&lvs_);
std::lock_guard lock(mutex_);
XRPL_ASSERT(!task_.done(), "xrpl::JobQueue::CoroTaskRunner::resume : task is not done");
task_.handle().resume();
detail::getLocalValues().release();
detail::getLocalValues().reset(saved);
if (task_.done())
{
#ifndef NDEBUG
finished_ = true;
#endif
// Break the shared_ptr cycle: frame -> shared_ptr<runner> -> this.
// Use std::move (not task_ = {}) so task_.handle_ is null BEFORE the
// frame is destroyed. operator= would destroy the frame while handle_
// still holds the old value -- a re-entrancy hazard on GCC-12 if
// frame destruction triggers runner cleanup.
[[maybe_unused]] auto completed = std::move(task_);
}
std::lock_guard lk(mutex_run_);
--runCount_;
cv_.notify_all();
}
/**
* @return true if the coroutine has not yet run to completion
*/
inline bool
JobQueue::CoroTaskRunner::runnable() const
{
// After normal completion, task_ is reset to break the shared_ptr cycle
// (handle_ becomes null). A null handle means the coroutine is done.
return task_.handle() && !task_.done();
}
/**
* Handle early termination when the coroutine never ran (e.g. JobQueue
* is stopping). Decrements nSuspend_ and destroys the coroutine frame
* to break the shared_ptr cycle: frame -> lambda -> runner -> frame.
*/
inline void
JobQueue::CoroTaskRunner::expectEarlyExit()
{
#ifndef NDEBUG
if (!finished_)
#endif
{
std::lock_guard lock(jq_.m_mutex);
--jq_.nSuspend_;
#ifndef NDEBUG
finished_ = true;
#endif
}
// Break the shared_ptr cycle: frame -> shared_ptr<runner> -> this.
// The coroutine is at initial_suspend and never ran user code, so
// destroying it is safe. Use std::move (not task_ = {}) so
// task_.handle_ is null before the frame is destroyed.
{
[[maybe_unused]] auto completed = std::move(task_);
}
storedFunc_.reset();
}
/**
* Block until all pending/active resume operations complete.
* Uses cv_ + mutex_run_ to wait until runCount_ reaches 0.
*/
inline void
JobQueue::CoroTaskRunner::join()
{
std::unique_lock<std::mutex> lk(mutex_run_);
cv_.wait(lk, [this]() { return runCount_ == 0; });
}
} // namespace xrpl

View File

@@ -2,7 +2,6 @@
#include <xrpl/basics/LocalValue.h>
#include <xrpl/core/ClosureCounter.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/core/JobTypeData.h>
#include <xrpl/core/JobTypes.h>
#include <xrpl/core/detail/Workers.h>
@@ -10,7 +9,6 @@
#include <boost/coroutine/all.hpp>
#include <coroutine>
#include <set>
namespace xrpl {
@@ -121,395 +119,6 @@ public:
join();
};
/** C++20 coroutine lifecycle manager. Replaces Coro for new code.
*
* Class / Inheritance / Dependency Diagram
* =========================================
*
* std::enable_shared_from_this<CoroTaskRunner>
* ^
* | (public inheritance)
* |
* CoroTaskRunner
* +---------------------------------------------------+
* | - lvs_ : detail::LocalValues |
* | - jq_ : JobQueue& |
* | - type_ : JobType |
* | - name_ : std::string |
* | - runCount_ : int (in-flight resumes) |
* | - mutex_ : std::mutex (coroutine guard) |
* | - mutex_run_ : std::mutex (join guard) |
* | - cv_ : condition_variable |
* | - task_ : CoroTask<void> |
* | - storedFunc_ : unique_ptr<FuncBase> (type-erased)|
* +---------------------------------------------------+
* | + init(F&&) : set up coroutine callable |
* | + onSuspend() : ++jq_.nSuspend_ |
* | + onUndoSuspend() : --jq_.nSuspend_ |
* | + suspend() : returns SuspendAwaiter |
* | + post() : schedule resume on JobQueue |
* | + resume() : resume coroutine on caller |
* | + runnable() : !task_.done() |
* | + expectEarlyExit() : teardown for failed post |
* | + join() : block until not running |
* +---------------------------------------------------+
* | |
* | owns | references
* v v
* CoroTask<void> JobQueue
* (coroutine frame) (thread pool + nSuspend_)
*
* FuncBase / FuncStore<F> (type-erased heap storage
* for the coroutine lambda)
*
* Coroutine Lifecycle (Control Flow)
* ===================================
*
* Caller thread JobQueue worker thread
* ------------- ----------------------
* postCoroTask(f)
* |
* +-- check stopping_ (reject if JQ shutting down)
* +-- ++nSuspend_ (lazy start counts as suspended)
* +-- make_shared<CoroTaskRunner>
* +-- init(f)
* | +-- store lambda on heap (FuncStore)
* | +-- task_ = f(shared_from_this())
* | [coroutine created, suspended at initial_suspend]
* +-- post()
* | +-- ++runCount_
* | +-- addJob(type_, [resume]{})
* | resume()
* | |
* | +-- --nSuspend_
* | +-- swap in LocalValues
* | +-- task_.handle().resume()
* | | [coroutine body runs]
* | | ...
* | | co_await suspend()
* | | +-- ++nSuspend_
* | | [coroutine suspends]
* | +-- swap out LocalValues
* | +-- --runCount_
* | +-- cv_.notify_all()
* |
* post() <-- called externally or by JobQueueAwaiter
* +-- ++runCount_
* +-- addJob(type_, [resume]{})
* resume()
* |
* +-- [coroutine body continues]
* +-- co_return
* +-- --runCount_
* +-- cv_.notify_all()
* join()
* +-- cv_.wait([]{runCount_ == 0})
* +-- [done]
*
* Usage Examples
* ==============
*
* 1. Fire-and-forget coroutine (most common pattern):
*
* jq.postCoroTask(jtCLIENT, "MyWork",
* [](auto runner) -> CoroTask<void> {
* doSomeWork();
* co_await runner->suspend(); // yield to other jobs
* doMoreWork();
* co_return;
* });
*
* 2. Manually controlling suspend / resume (external trigger):
*
* auto runner = jq.postCoroTask(jtCLIENT, "ExtTrigger",
* [&result](auto runner) -> CoroTask<void> {
* startAsyncOperation(callback);
* co_await runner->suspend();
* // callback called runner->post() to get here
* result = collectResult();
* co_return;
* });
* // ... later, from the callback:
* runner->post(); // reschedule the coroutine on the JobQueue
*
* 3. Using JobQueueAwaiter for automatic suspend + repost:
*
* jq.postCoroTask(jtCLIENT, "AutoRepost",
* [](auto runner) -> CoroTask<void> {
* step1();
* co_await JobQueueAwaiter{runner}; // yield + auto-repost
* step2();
* co_await JobQueueAwaiter{runner};
* step3();
* co_return;
* });
*
* 4. Checking shutdown after co_await (cooperative cancellation):
*
* jq.postCoroTask(jtCLIENT, "Cancellable",
* [&jq](auto runner) -> CoroTask<void> {
* while (moreWork()) {
* co_await JobQueueAwaiter{runner};
* if (jq.isStopping())
* co_return; // bail out cleanly
* processNextItem();
* }
* co_return;
* });
*
* Caveats / Pitfalls
* ==================
*
* BUG-RISK: Calling suspend() without a matching post()/resume().
* After co_await runner->suspend(), the coroutine is parked and
* nSuspend_ is incremented. If nothing ever calls post() or
* resume(), the coroutine is leaked and JobQueue::stop() will
* hang forever waiting for nSuspend_ to reach zero.
*
* BUG-RISK: Calling post() on an already-running coroutine.
* post() schedules a resume() job. If the coroutine has not
* actually suspended yet (no co_await executed), the resume job
* will try to call handle().resume() while the coroutine is still
* running on another thread. This is UB. The mutex_ prevents
* data corruption but the logic is wrong — always co_await
* suspend() before calling post(). (The test testIncorrectOrder
* shows this works only because mutex_ serializes the calls.)
*
* BUG-RISK: Dropping the shared_ptr<CoroTaskRunner> before join().
 * The CoroTaskRunner destructor asserts that finished_ is true.
* If you let the last shared_ptr die while the coroutine is still
* running or suspended, you get an assertion failure in debug and
* UB in release. Always call join() or expectEarlyExit() first.
*
* BUG-RISK: Lambda captures outliving the coroutine frame.
* The lambda passed to postCoroTask is heap-allocated (FuncStore)
* to prevent dangling. But objects captured by pointer still need
* their own lifetime management. If you capture a raw pointer to
* a stack variable, and the stack frame exits before the coroutine
* finishes, the pointer dangles. Use shared_ptr or ensure the
* pointed-to object outlives the coroutine.
*
* BUG-RISK: Forgetting co_return in a void coroutine.
* If the coroutine body falls off the end without co_return,
* the compiler may silently treat it as co_return (per standard),
* but some compilers warn. Always write explicit co_return.
*
* LIMITATION: CoroTaskRunner only supports CoroTask<void>.
* The task_ member is CoroTask<void>. To return values from
* the top-level coroutine, write through a captured pointer
* (as the tests demonstrate), or co_await inner CoroTask<T>
* coroutines that return values.
*
* LIMITATION: One coroutine per CoroTaskRunner.
* init() must be called exactly once. You cannot reuse a
* CoroTaskRunner to run a second coroutine. Create a new one
* via postCoroTask() instead.
*
* LIMITATION: No timeout on join().
* join() blocks indefinitely. If the coroutine is suspended
* and never posted, join() will deadlock. Use timed waits
* on the gate pattern (condition_variable + wait_for) in tests.
*/
class CoroTaskRunner : public std::enable_shared_from_this<CoroTaskRunner>
{
private:
// Per-coroutine thread-local storage. Swapped in before resume()
// and swapped out after, so each coroutine sees its own LocalValue
// state regardless of which worker thread executes it.
detail::LocalValues lvs_;
// Back-reference to the owning JobQueue. Used to post jobs,
// increment/decrement nSuspend_, and acquire jq_.m_mutex.
JobQueue& jq_;
// Job type passed to addJob() when posting this coroutine.
JobType type_;
// Human-readable name for this coroutine job (for logging).
std::string name_;
// Number of in-flight resume operations (pending + active).
// Incremented by post(), decremented when resume() finishes.
// Guarded by mutex_run_. join() blocks until this reaches 0.
//
// A counter (not a bool) is needed because post() can be called
// from within the coroutine body (e.g. via JobQueueAwaiter),
// enqueuing a second resume while the first is still running.
// A bool would be clobbered: R2.post() sets true, then R1's
// cleanup sets false — losing the fact that R2 is still pending.
int runCount_;
// Guards task_.handle().resume() to prevent the coroutine from
// running on two threads simultaneously. Handles the race where
// post() enqueues a resume before the coroutine has actually
// suspended (post-before-suspend pattern).
std::mutex mutex_;
// Guards runCount_. Used with cv_ for join() to wait
// until all pending/active resume operations complete.
std::mutex mutex_run_;
// Notified when runCount_ reaches zero, allowing
// join() waiters to wake up.
std::condition_variable cv_;
// The coroutine handle wrapper. Owns the coroutine frame.
// Set by init(), reset to empty by expectEarlyExit() on
// early termination.
CoroTask<void> task_;
/**
* Type-erased base for heap-stored callables.
* Prevents the coroutine lambda from being destroyed before
* the coroutine frame is done with it.
*
* @see FuncStore
*/
struct FuncBase
{
virtual ~FuncBase() = default;
};
/**
* Concrete type-erased storage for a callable of type F.
* The coroutine frame stores a reference to the lambda's implicit
* object parameter. If the lambda is a temporary, that reference
* dangles after the call returns. FuncStore keeps it alive on
* the heap for the lifetime of the CoroTaskRunner.
*/
template <class F>
struct FuncStore : FuncBase
{
F func; // The stored callable (coroutine lambda).
explicit FuncStore(F&& f) : func(std::move(f))
{
}
};
// Heap-allocated callable storage. Set by init(), ensures the
// lambda outlives the coroutine frame that references it.
std::unique_ptr<FuncBase> storedFunc_;
#ifndef NDEBUG
// Debug-only flag. True once the coroutine has completed or
// expectEarlyExit() was called. Asserted in the destructor
// to catch leaked runners.
bool finished_ = false;
#endif
public:
/**
* Tag type for private construction. Prevents external code
* from constructing CoroTaskRunner directly. Use postCoroTask().
*/
struct create_t
{
explicit create_t() = default;
};
/**
* Construct a CoroTaskRunner. Private by convention (create_t tag).
*
* @param jq The JobQueue this coroutine will run on
* @param type Job type for scheduling priority
* @param name Human-readable name for logging
*/
CoroTaskRunner(create_t, JobQueue&, JobType, std::string const&);
CoroTaskRunner(CoroTaskRunner const&) = delete;
CoroTaskRunner&
operator=(CoroTaskRunner const&) = delete;
/**
* Destructor. Asserts (debug) that the coroutine has finished
* or expectEarlyExit() was called.
*/
~CoroTaskRunner();
/**
* Initialize with a coroutine-returning callable.
* Must be called exactly once, after the object is managed by
* shared_ptr (because init uses shared_from_this internally).
* This is handled automatically by postCoroTask().
*
* @param f Callable: CoroTask<void>(shared_ptr<CoroTaskRunner>)
*/
template <class F>
void
init(F&& f);
/**
* Increment the JobQueue's suspended-coroutine count (nSuspend_).
* Called when the coroutine is about to suspend. Every call
* must be balanced by a corresponding decrement (via resume()
* or onUndoSuspend()), or JobQueue::stop() will hang.
*/
void
onSuspend();
/**
* Decrement nSuspend_ without resuming.
* Used to undo onSuspend() when a scheduled post() fails
* (e.g. JobQueue is stopping).
*/
void
onUndoSuspend();
/**
* Suspend the coroutine.
* The awaiter's await_suspend() increments nSuspend_ before the
* coroutine actually suspends. The caller must later call post()
* or resume() to continue execution.
*
* @return An awaiter for use with `co_await runner->suspend()`
*/
auto
suspend();
/**
* Schedule coroutine resumption as a job on the JobQueue.
* Captures shared_from_this() to prevent this runner from being
* destroyed while the job is queued.
*
* @return true if the job was accepted; false if the JobQueue
* is stopping (caller must handle cleanup)
*/
bool
post();
/**
* Resume the coroutine on the current thread.
* Decrements nSuspend_, swaps in LocalValues, resumes the
* coroutine handle, swaps out LocalValues, and notifies join()
     * waiters. Lock ordering: jq_.m_mutex -> mutex_ -> mutex_run_.
*/
void
resume();
/**
* @return true if the coroutine has not yet run to completion
*/
bool
runnable() const;
/**
* Handle early termination when the coroutine never ran.
* Decrements nSuspend_ and destroys the coroutine frame to
* break the shared_ptr cycle (frame -> lambda -> runner -> frame).
* Called by postCoroTask() when post() fails.
*/
void
expectEarlyExit();
/**
* Block until all pending/active resume operations complete.
* Uses cv_ + mutex_run_ to wait until runCount_ reaches 0.
* Warning: deadlocks if the coroutine is suspended and never posted.
*/
void
join();
};
using JobFunction = std::function<void()>;
JobQueue(
@@ -556,19 +165,6 @@ public:
std::shared_ptr<Coro>
postCoro(JobType t, std::string const& name, F&& f);
/** Creates a C++20 coroutine and adds a job to the queue to run it.
@param t The type of job.
@param name Name of the job.
@param f Callable with signature
CoroTask<void>(std::shared_ptr<CoroTaskRunner>).
@return shared_ptr to posted CoroTaskRunner. nullptr if not successful.
*/
template <class F>
std::shared_ptr<CoroTaskRunner>
postCoroTask(JobType t, std::string const& name, F&& f);
/** Jobs waiting at this priority.
*/
int
@@ -783,7 +379,6 @@ private:
} // namespace xrpl
#include <xrpl/core/Coro.ipp>
#include <xrpl/core/CoroTaskRunner.ipp>
namespace xrpl {
@@ -806,69 +401,4 @@ JobQueue::postCoro(JobType t, std::string const& name, F&& f)
return coro;
}
// postCoroTask — entry point for launching a C++20 coroutine on the JobQueue.
//
// Control Flow
// ============
//
// postCoroTask(t, name, f)
// |
// +-- 1. Check stopping_ — reject if JQ shutting down
// |
// +-- 2. ++nSuspend_ (mirrors Boost Coro ctor's implicit yield)
// | The coroutine is "suspended" from the JobQueue's perspective
// | even though it hasn't run yet — this keeps the JQ shutdown
// | logic correct (it waits for nSuspend_ to reach 0).
// |
// +-- 3. Create CoroTaskRunner (shared_ptr, ref-counted)
// |
// +-- 4. runner->init(f)
// | +-- Heap-allocate the lambda (FuncStore) to prevent
// | | dangling captures in the coroutine frame
// | +-- task_ = f(shared_from_this())
// | [coroutine created but NOT started — lazy initial_suspend]
// |
// +-- 5. runner->post()
// | +-- addJob(type_, [resume]{}) → resume on worker thread
// | +-- failure (JQ stopping):
// | +-- runner->expectEarlyExit()
// | | --nSuspend_, destroy coroutine frame
// | +-- return nullptr
//
// Why async post() instead of synchronous resume()?
// ==================================================
// The initial dispatch MUST use async post() so the coroutine body runs on
// a JobQueue worker thread, not the caller's thread. resume() swaps the
// caller's thread-local LocalValues with the coroutine's private copy.
// If the coroutine mutates LocalValues (e.g. thread_specific_storage test),
// those mutations bleed back into the caller's thread-local state after the
// swap-out, corrupting subsequent tests that share the same thread pool.
// Async post() avoids this by running the coroutine on a worker thread whose
// LocalValues are managed by the thread pool, not by the caller.
//
template <class F>
std::shared_ptr<JobQueue::CoroTaskRunner>
JobQueue::postCoroTask(JobType t, std::string const& name, F&& f)
{
// Reject if the JQ is shutting down — matches addJob()'s stopping_ check.
// Must check before incrementing nSuspend_ to avoid leaving an orphan
// count that would cause stop() to hang.
if (stopping_)
return nullptr;
{
std::lock_guard lock(m_mutex);
++nSuspend_;
}
auto runner = std::make_shared<CoroTaskRunner>(CoroTaskRunner::create_t{}, *this, t, name);
runner->init(std::forward<F>(f));
if (!runner->post())
{
runner->expectEarlyExit();
runner.reset();
}
return runner;
}
} // namespace xrpl

View File

@@ -1,174 +0,0 @@
#pragma once
#include <xrpl/core/JobQueue.h>
#include <coroutine>
#include <memory>
namespace xrpl {
/**
* Awaiter that suspends and immediately reschedules on the JobQueue.
* Equivalent to calling yield() followed by post() in the old Coro API.
*
* Usage:
* co_await JobQueueAwaiter{runner};
*
* What it waits for: The coroutine is re-queued as a job and resumes
* when a worker thread picks it up.
*
* Which thread resumes: A JobQueue worker thread.
*
* What await_resume() returns: void.
*
* Dependency Diagram
* ==================
*
* JobQueueAwaiter
* +----------------------------------------------+
* | + runner : shared_ptr<CoroTaskRunner> |
* +----------------------------------------------+
* | + await_ready() -> false (always suspend) |
* | + await_suspend() -> bool (suspend or cancel) |
* | + await_resume() -> void |
* +----------------------------------------------+
* | |
* | uses | uses
* v v
* CoroTaskRunner JobQueue
* .onSuspend() (via runner->post() -> addJob)
* .onUndoSuspend()
* .post()
*
* Control Flow (await_suspend)
* ============================
*
* co_await JobQueueAwaiter{runner}
* |
* +-- await_ready() -> false
* +-- await_suspend(handle)
* |
* +-- runner->onSuspend() // ++nSuspend_
* +-- runner->post() // addJob to JobQueue
* | |
* | +-- success? return true // coroutine stays suspended
* | | // worker thread will call resume()
* | +-- failure? (JQ stopping)
* | +-- runner->onUndoSuspend() // --nSuspend_
* | +-- return false // coroutine continues immediately
* | // so it can clean up and co_return
*
* Usage Examples
* ==============
*
* 1. Yield and auto-repost (most common -- replaces yield() + post()):
*
* CoroTask<void> handler(auto runner) {
* doPartA();
* co_await JobQueueAwaiter{runner}; // yield + repost
* doPartB(); // runs on a worker thread
* co_return;
* }
*
* 2. Multiple yield points in a loop:
*
* CoroTask<void> batchProcessor(auto runner) {
* for (auto& item : items) {
* process(item);
* co_await JobQueueAwaiter{runner}; // let other jobs run
* }
* co_return;
* }
*
* 3. Graceful shutdown -- checking after resume:
*
* CoroTask<void> longTask(auto runner, JobQueue& jq) {
* while (hasWork()) {
* co_await JobQueueAwaiter{runner};
* // If JQ is stopping, await_suspend returns false and
* // the coroutine continues immediately without re-queuing.
* // Always check isStopping() to decide whether to proceed:
* if (jq.isStopping())
* co_return;
* doNextChunk();
* }
* co_return;
* }
*
* Caveats / Pitfalls
* ==================
*
* BUG-RISK: Using a stale or null runner.
* The runner shared_ptr must be valid and point to the CoroTaskRunner
* that owns the coroutine currently executing. Passing a runner from
* a different coroutine, or a default-constructed shared_ptr, is UB.
*
* BUG-RISK: Assuming resume happens on the same thread.
* After co_await JobQueueAwaiter, the coroutine resumes on whatever
* worker thread picks up the job. Do not rely on thread-local state
* unless it is managed through LocalValue (which CoroTaskRunner
* automatically swaps in/out).
*
* BUG-RISK: Ignoring the shutdown path.
* When the JobQueue is stopping, post() fails and await_suspend()
* returns false (coroutine does NOT actually suspend). The coroutine
* body continues immediately on the same thread. If your code after
* co_await assumes it was re-queued and is running on a worker thread,
* that assumption breaks during shutdown. Always handle the "JQ is
* stopping" case, either by checking jq.isStopping() or by letting
* the coroutine fall through to co_return naturally.
*
* DIFFERENCE from runner->suspend() + runner->post():
* JobQueueAwaiter combines both in one atomic operation. With the
* manual suspend()/post() pattern, there is a window between the
* two calls where an external event could race. JobQueueAwaiter
* removes that window -- onSuspend() and post() happen within the
* same await_suspend() call while the coroutine is guaranteed to
* be suspended. Prefer JobQueueAwaiter unless you need an external
* party to decide *when* to call post().
*/
struct JobQueueAwaiter
{
// The CoroTaskRunner that owns the currently executing coroutine.
std::shared_ptr<JobQueue::CoroTaskRunner> runner;
/**
* Always returns false so the coroutine suspends.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Increment nSuspend (equivalent to yield()) and schedule resume
* on the JobQueue (equivalent to post()). If the JobQueue is
* stopping, undoes the suspend count and returns false so the
* coroutine continues immediately and can clean up.
*
* @return true if coroutine should stay suspended (job posted);
* false if coroutine should continue (JQ stopping)
*/
bool
await_suspend(std::coroutine_handle<>)
{
runner->onSuspend();
if (!runner->post())
{
// JobQueue is stopping. Undo the suspend count and
// don't actually suspend — the coroutine continues
// immediately so it can clean up and co_return.
runner->onUndoSuspend();
return false;
}
return true;
}
void
await_resume() const noexcept
{
}
};
} // namespace xrpl

presentation.md Normal file
View File

@@ -0,0 +1,280 @@
# OpenTelemetry Distributed Tracing for rippled
---
## Slide 1: Introduction
### What is OpenTelemetry?
OpenTelemetry is an open-source, CNCF-backed observability framework for distributed tracing, metrics, and logs.
### Why OpenTelemetry for rippled?
- **End-to-End Transaction Visibility**: Track transactions from submission → consensus → ledger inclusion
- **Cross-Node Correlation**: Follow requests across multiple independent nodes using a unique `trace_id`
- **Consensus Round Analysis**: Understand timing and behavior across validators
- **Incident Debugging**: Correlate events across distributed nodes during issues
```mermaid
flowchart LR
A["Node A<br/>tx.receive<br/>trace_id: abc123"] --> B["Node B<br/>tx.relay<br/>trace_id: abc123"] --> C["Node C<br/>tx.validate<br/>trace_id: abc123"] --> D["Node D<br/>ledger.apply<br/>trace_id: abc123"]
style A fill:#1565c0,stroke:#0d47a1,color:#fff
style B fill:#2e7d32,stroke:#1b5e20,color:#fff
style C fill:#2e7d32,stroke:#1b5e20,color:#fff
style D fill:#e65100,stroke:#bf360c,color:#fff
```
> **Trace ID: abc123** — All nodes share the same trace, enabling cross-node correlation.
---
## Slide 2: OpenTelemetry vs Open Source Alternatives
| Feature | OpenTelemetry | Jaeger | Zipkin | SkyWalking | Pinpoint | Prometheus |
| ------------------- | ---------------- | ---------------- | ------------------ | ---------- | ---------- | ---------- |
| **Tracing** | YES | YES | YES | YES | YES | NO |
| **Metrics** | YES | NO | NO | YES | YES | YES |
| **Logs** | YES | NO | NO | YES | NO | NO |
| **C++ SDK** | YES Official | YES (Deprecated) | YES (Unmaintained) | NO | NO | YES |
| **Vendor Neutral** | YES Primary goal | NO | NO | NO | NO | NO |
| **Instrumentation** | Manual + Auto | Manual | Manual | Auto-first | Auto-first | Manual |
| **Backend** | Any (exporters) | Self | Self | Self | Self | Self |
| **CNCF Status** | Incubating | Graduated | NO | Incubating | NO | Graduated |
> **Why OpenTelemetry?** It's the only actively maintained, full-featured C++ option with vendor neutrality — allowing export to Jaeger, Prometheus, Grafana, or any commercial backend without changing instrumentation.
---
## Slide 3: Comparison with rippled's Existing Solutions
### Current Observability Stack
| Aspect | PerfLog (JSON) | StatsD (Metrics) | OpenTelemetry (NEW) |
| --------------------- | --------------------- | --------------------- | --------------------------- |
| **Type** | Logging | Metrics | Distributed Tracing |
| **Scope** | Single node | Single node | **Cross-node** |
| **Data** | JSON log entries | Counters, gauges | Spans with context |
| **Correlation** | By timestamp | By metric name | By `trace_id` |
| **Overhead** | Low (file I/O) | Low (UDP) | Low-Medium (configurable) |
| **Question Answered** | "What happened here?" | "How many? How fast?" | **"What was the journey?"** |
### Use Case Matrix
| Scenario | PerfLog | StatsD | OpenTelemetry |
| -------------------------------- | ------- | ------ | ------------- |
| "How many TXs per second?" | ❌ | ✅ | ❌ |
| "Why was this specific TX slow?" | ⚠️ | ❌ | ✅ |
| "Which node delayed consensus?" | ❌ | ❌ | ✅ |
| "Show TX journey across 5 nodes" | ❌ | ❌ | ✅ |
> **Key Insight**: OpenTelemetry **complements** (not replaces) existing systems.
---
## Slide 4: Architecture
### High-Level Integration Architecture
```mermaid
flowchart TB
subgraph rippled["rippled Node"]
subgraph services["Core Services"]
direction LR
RPC["RPC Server<br/>(HTTP/WS)"] ~~~ Overlay["Overlay<br/>(P2P Network)"] ~~~ Consensus["Consensus<br/>(RCLConsensus)"]
end
Telemetry["Telemetry Module<br/>(OpenTelemetry SDK)"]
services --> Telemetry
end
Telemetry -->|OTLP/gRPC| Collector["OTel Collector"]
Collector --> Tempo["Grafana Tempo"]
Collector --> Jaeger["Jaeger"]
Collector --> Elastic["Elastic APM"]
style rippled fill:#424242,stroke:#212121,color:#fff
style services fill:#1565c0,stroke:#0d47a1,color:#fff
style Telemetry fill:#2e7d32,stroke:#1b5e20,color:#fff
style Collector fill:#e65100,stroke:#bf360c,color:#fff
```
### Context Propagation
```mermaid
sequenceDiagram
participant Client
participant NodeA as Node A
participant NodeB as Node B
Client->>NodeA: Submit TX (no context)
Note over NodeA: Creates trace_id: abc123<br/>span: tx.receive
NodeA->>NodeB: Relay TX<br/>(traceparent: abc123)
Note over NodeB: Links to trace_id: abc123<br/>span: tx.relay
```
- **HTTP/RPC**: W3C Trace Context headers (`traceparent`)
- **P2P Messages**: Protocol Buffer extension fields
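For reference, a W3C `traceparent` header is a single hex-encoded value (example value taken from the W3C Trace Context spec):

```
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
             ^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^^
             version   trace-id (16 bytes)       parent span-id   flags (01 = sampled)
```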
---
## Slide 5: Implementation Plan
### 5-Phase Rollout (9 Weeks)
```mermaid
gantt
title Implementation Timeline
dateFormat YYYY-MM-DD
axisFormat Week %W
section Phase 1
Core Infrastructure :p1, 2024-01-01, 2w
section Phase 2
RPC Tracing :p2, after p1, 2w
section Phase 3
Transaction Tracing :p3, after p2, 2w
section Phase 4
Consensus Tracing :p4, after p3, 2w
section Phase 5
Documentation :p5, after p4, 1w
```
### Phase Details
| Phase | Focus | Key Deliverables | Effort |
| ----- | ------------------- | -------------------------------------------- | ------- |
| 1 | Core Infrastructure | SDK integration, Telemetry interface, Config | 10 days |
| 2 | RPC Tracing | HTTP context extraction, Handler spans | 10 days |
| 3 | Transaction Tracing | Protobuf context, P2P relay propagation | 10 days |
| 4 | Consensus Tracing | Round spans, Proposal/validation tracing | 10 days |
| 5 | Documentation | Runbook, Dashboards, Training | 7 days |
**Total Effort**: ~47 developer-days (2 developers)
---
## Slide 6: Performance Overhead
### Estimated System Impact
| Metric | Overhead | Notes |
| ----------------- | ---------- | ----------------------------------- |
| **CPU** | 1-3% | Span creation and attribute setting |
| **Memory** | 2-5 MB | Batch buffer for pending spans |
| **Network** | 10-50 KB/s | Compressed OTLP export to collector |
| **Latency (p99)** | <2% | With proper sampling configuration |
### Per-Message Overhead (Context Propagation)
Each P2P message carries trace context with the following overhead:
| Field | Size | Description |
| ------------- | ------------- | ----------------------------------------- |
| `trace_id` | 16 bytes | Unique identifier for the entire trace |
| `span_id` | 8 bytes | Current span (becomes parent on receiver) |
| `trace_flags` | 4 bytes | Sampling decision flags |
| `trace_state` | 0-4 bytes | Optional vendor-specific data |
| **Total** | **~32 bytes** | **Added per traced P2P message** |
```mermaid
flowchart LR
subgraph msg["P2P Message with Trace Context"]
A["Original Message<br/>(variable size)"] --> B["+ TraceContext<br/>(~32 bytes)"]
end
subgraph breakdown["Context Breakdown"]
C["trace_id<br/>16 bytes"]
D["span_id<br/>8 bytes"]
E["flags<br/>4 bytes"]
F["state<br/>0-4 bytes"]
end
B --> breakdown
style A fill:#424242,stroke:#212121,color:#fff
style B fill:#2e7d32,stroke:#1b5e20,color:#fff
style C fill:#1565c0,stroke:#0d47a1,color:#fff
style D fill:#1565c0,stroke:#0d47a1,color:#fff
style E fill:#e65100,stroke:#bf360c,color:#fff
style F fill:#4a148c,stroke:#2e0d57,color:#fff
```
> **Note**: 32 bytes is negligible compared to typical transaction messages (hundreds to thousands of bytes)
### Mitigation Strategies
```mermaid
flowchart LR
A["Head Sampling<br/>10% default"] --> B["Tail Sampling<br/>Keep errors/slow"] --> C["Batch Export<br/>Reduce I/O"] --> D["Conditional Compile<br/>XRPL_ENABLE_TELEMETRY"]
style A fill:#1565c0,stroke:#0d47a1,color:#fff
style B fill:#2e7d32,stroke:#1b5e20,color:#fff
style C fill:#e65100,stroke:#bf360c,color:#fff
style D fill:#4a148c,stroke:#2e0d57,color:#fff
```
### Kill Switches (Rollback Options)
1. **Config Disable**: Set `enabled=0` in the config for an instant disable; sampling settings can likewise be changed without a restart (a config sketch follows this list)
2. **Rebuild**: Compile with `XRPL_ENABLE_TELEMETRY=OFF` for zero overhead (all telemetry code compiles to no-ops)
3. **Full Revert**: Clean separation allows easy commit reversion
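A sketch of the config-level kill switch; the stanza and key names follow the plan's configuration reference and are illustrative here, not confirmed names:

```
# rippled.cfg (sketch): telemetry kill switch
[telemetry]
enabled = 0              # no spans are created or exported
# sampling_ratio = 0.1   # tunable without a rebuild when enabled
```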
---
## Slide 7: Data Collection & Privacy
### What Data is Collected
| Category | Attributes Collected | Purpose |
| --------------- | ---------------------------------------------------------------------------------- | --------------------------- |
| **Transaction** | `tx.hash`, `tx.type`, `tx.result`, `tx.fee`, `ledger_index` | Trace transaction lifecycle |
| **Consensus** | `round`, `phase`, `mode`, `proposers` (public key or public node ID), `duration_ms` | Analyze consensus timing |
| **RPC** | `command`, `version`, `status`, `duration_ms` | Monitor RPC performance |
| **Peer** | `peer.id` (public key), `latency_ms`, `message.type`, `message.size` | Network topology analysis |
| **Ledger** | `ledger.hash`, `ledger.index`, `close_time`, `tx_count` | Ledger progression tracking |
| **Job** | `job.type`, `queue_ms`, `worker` | JobQueue performance |
### What is NOT Collected (Privacy Guarantees)
```mermaid
flowchart LR
subgraph notCollected["❌ NOT Collected"]
direction LR
A["Private Keys"] ~~~ B["Account Balances"] ~~~ C["Transaction Amounts"]
end
subgraph alsoNot["❌ Also Excluded"]
direction LR
D["IP Addresses<br/>(configurable)"] ~~~ E["Personal Data"] ~~~ F["Raw TX Payloads"]
end
style A fill:#c62828,stroke:#8c2809,color:#fff
style B fill:#c62828,stroke:#8c2809,color:#fff
style C fill:#c62828,stroke:#8c2809,color:#fff
style D fill:#c62828,stroke:#8c2809,color:#fff
style E fill:#c62828,stroke:#8c2809,color:#fff
style F fill:#c62828,stroke:#8c2809,color:#fff
```
### Privacy Protection Mechanisms
| Mechanism | Description |
| -------------------------- | ------------------------------------------------------------- |
| **Account Hashing**        | `xrpl.tx.account` is hashed at the collector level before storage |
| **Configurable Redaction** | Sensitive fields can be excluded via config |
| **Sampling** | Only 10% of traces recorded by default (reduces exposure) |
| **Local Control** | Node operators control what gets exported |
| **No Raw Payloads** | Transaction content is never recorded, only metadata |
> **Key Principle**: Telemetry collects **operational metadata** (timing, counts, hashes) — never **sensitive content** (keys, balances, amounts).
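As a concrete example of the hashing mechanism, here is a sketch of redacting an account before it is recorded as a span attribute. `std::hash` stands in for the keyed cryptographic hash a real deployment would use, and whether redaction runs in the node or at the collector is a deployment choice; the function name is hypothetical.

```cpp
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Illustrative redaction: emit a one-way hash, never the raw account.
std::string
redactAccount(std::string const& account)
{
    std::ostringstream out;
    out << std::hex << std::hash<std::string>{}(account);
    return out.str();
}

int
main()
{
    // A span attribute would carry only the digest, e.g.:
    //   span.SetAttribute("xrpl.tx.account", redactAccount(acct));
    std::cout << redactAccount("rExampleAccount123") << '\n';
}
```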
---
_End of Presentation_

View File

@@ -8,7 +8,6 @@
#include <xrpld/rpc/detail/Tuning.h>
#include <xrpl/beast/unit_test.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/json/json_reader.h>
#include <xrpl/protocol/ApiVersion.h>
@@ -132,6 +131,7 @@ public:
c,
Role::USER,
{},
{},
RPC::apiVersionIfUnspecified},
{},
{}};
@@ -155,11 +155,11 @@ public:
Json::Value result;
gate g;
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
context.params = std::move(params);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
using namespace std::chrono_literals;
@@ -240,27 +240,28 @@ public:
c,
Role::USER,
{},
{},
RPC::apiVersionIfUnspecified},
{},
{}};
Json::Value result;
gate g;
// Test RPC::Tuning::max_src_cur source currencies.
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
context.params = rpf(Account("alice"), Account("bob"), RPC::Tuning::max_src_cur);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(!result.isMember(jss::error));
// Test more than RPC::Tuning::max_src_cur source currencies.
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
context.params = rpf(Account("alice"), Account("bob"), RPC::Tuning::max_src_cur + 1);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(result.isMember(jss::error));
@@ -268,22 +269,22 @@ public:
// Test RPC::Tuning::max_auto_src_cur source currencies.
for (auto i = 0; i < (RPC::Tuning::max_auto_src_cur - 1); ++i)
env.trust(Account("alice")[std::to_string(i + 100)](100), "bob");
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
context.params = rpf(Account("alice"), Account("bob"), 0);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(!result.isMember(jss::error));
// Test more than RPC::Tuning::max_auto_src_cur source currencies.
env.trust(Account("alice")["AUD"](100), "bob");
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
context.params = rpf(Account("alice"), Account("bob"), 0);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(result.isMember(jss::error));

View File

@@ -1,537 +0,0 @@
#include <test/jtx.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/core/JobQueueAwaiter.h>
#include <chrono>
#include <mutex>
namespace xrpl {
namespace test {
/**
* Test suite for the C++20 coroutine primitives: CoroTask, CoroTaskRunner,
* and JobQueueAwaiter.
*
* Dependency Diagram
* ==================
*
* CoroTask_test
* +-------------------------------------------------+
* | + gate (inner class) : condition_variable helper |
* +-------------------------------------------------+
* | uses
* v
* jtx::Env --> JobQueue::postCoroTask()
* |
* +-- CoroTaskRunner (suspend / post / resume)
* +-- CoroTask<void> / CoroTask<T>
* +-- JobQueueAwaiter
*
* Test Coverage Matrix
* ====================
*
* Test | Primitives exercised
* --------------------------+----------------------------------------------
* testVoidCompletion | CoroTask<void> basic lifecycle
* testCorrectOrder | suspend() -> join() -> post() -> complete
* testIncorrectOrder | post() before suspend() (race-safe path)
* testJobQueueAwaiter | JobQueueAwaiter suspend + auto-repost
* testThreadSpecificStorage | LocalValue isolation across coroutines
* testExceptionPropagation | unhandled_exception() in promise_type
* testMultipleYields | N sequential suspend/resume cycles
* testValueReturn | CoroTask<T> co_return value
* testValueException | CoroTask<T> exception via co_await
* testValueChaining | nested CoroTask<T> -> CoroTask<T>
* testShutdownRejection | postCoroTask returns nullptr when stopping
*/
class CoroTask_test : public beast::unit_test::suite
{
public:
/**
* Simple one-shot gate for synchronizing between test thread
* and coroutine worker threads. signal() sets the flag;
* wait_for() blocks until signaled or timeout.
*/
class gate
{
private:
std::condition_variable cv_;
std::mutex mutex_;
bool signaled_ = false;
public:
/**
* Block until signaled or timeout expires.
*
* @param rel_time Maximum duration to wait
*
* @return true if signaled before timeout
*/
template <class Rep, class Period>
bool
wait_for(std::chrono::duration<Rep, Period> const& rel_time)
{
std::unique_lock<std::mutex> lk(mutex_);
auto b = cv_.wait_for(lk, rel_time, [this] { return signaled_; });
signaled_ = false;
return b;
}
/**
* Signal the gate, waking any waiting thread.
*/
void
signal()
{
std::lock_guard lk(mutex_);
signaled_ = true;
cv_.notify_all();
}
};
// NOTE: All coroutine lambdas passed to postCoroTask use explicit
// pointer-by-value captures instead of [&] to work around a GCC 14
// bug where reference captures in coroutine lambdas are corrupted
// in the coroutine frame.
/**
* CoroTask<void> runs to completion and runner becomes non-runnable.
*/
void
testVoidCompletion()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("void completion");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [gp = &g](auto) -> CoroTask<void> {
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(!runner->runnable());
}
/**
* Correct order: suspend, join, post, complete.
* Mirrors existing Coroutine_test::correct_order.
*/
void
testCorrectOrder()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("correct order");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g1, g2;
std::shared_ptr<JobQueue::CoroTaskRunner> r;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT,
"CoroTaskTest",
[rp = &r, g1p = &g1, g2p = &g2](auto runner) -> CoroTask<void> {
*rp = runner;
g1p->signal();
co_await runner->suspend();
g2p->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g1.wait_for(5s));
runner->join();
runner->post();
BEAST_EXPECT(g2.wait_for(5s));
runner->join();
}
/**
* Incorrect order: post() before suspend(). Verifies the
* race-safe path. Mirrors Coroutine_test::incorrect_order.
*/
void
testIncorrectOrder()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("incorrect order");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [gp = &g](auto runner) -> CoroTask<void> {
runner->post();
co_await runner->suspend();
gp->signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
}
/**
* JobQueueAwaiter suspend + auto-repost across multiple yield points.
*/
void
testJobQueueAwaiter()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("JobQueueAwaiter");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int step = 0;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [sp = &step, gp = &g](auto runner) -> CoroTask<void> {
*sp = 1;
co_await JobQueueAwaiter{runner};
*sp = 2;
co_await JobQueueAwaiter{runner};
*sp = 3;
gp->signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(step == 3);
}
/**
* Per-coroutine LocalValue isolation. Each coroutine sees its own
* copy of thread-local state. Mirrors Coroutine_test::thread_specific_storage.
*/
void
testThreadSpecificStorage()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("thread specific storage");
Env env(*this);
auto& jq = env.app().getJobQueue();
static int const N = 4;
std::array<std::shared_ptr<JobQueue::CoroTaskRunner>, N> a;
LocalValue<int> lv(-1);
BEAST_EXPECT(*lv == -1);
gate g;
jq.addJob(jtCLIENT, "LocalValTest", [&]() {
this->BEAST_EXPECT(*lv == -1);
*lv = -2;
this->BEAST_EXPECT(*lv == -2);
g.signal();
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(*lv == -1);
for (int i = 0; i < N; ++i)
{
jq.postCoroTask(
jtCLIENT,
"CoroTaskTest",
[this, ap = &a, gp = &g, lvp = &lv, id = i](auto runner) -> CoroTask<void> {
(*ap)[id] = runner;
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(**lvp == -1);
**lvp = id;
this->BEAST_EXPECT(**lvp == id);
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(**lvp == id);
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
a[i]->join();
}
for (auto const& r : a)
{
r->post();
BEAST_EXPECT(g.wait_for(5s));
r->join();
}
for (auto const& r : a)
{
r->post();
r->join();
}
jq.addJob(jtCLIENT, "LocalValTest", [&]() {
this->BEAST_EXPECT(*lv == -2);
g.signal();
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(*lv == -1);
}
/**
* Exception thrown in coroutine body is caught by
* promise_type::unhandled_exception(). Coroutine completes.
*/
void
testExceptionPropagation()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("exception propagation");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [gp = &g](auto) -> CoroTask<void> {
gp->signal();
throw std::runtime_error("test exception");
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
// The exception is caught by promise_type::unhandled_exception()
// and the coroutine is considered done
BEAST_EXPECT(!runner->runnable());
}
/**
* Multiple sequential suspend/resume cycles via co_await.
*/
void
testMultipleYields()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("multiple yields");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int counter = 0;
std::shared_ptr<JobQueue::CoroTaskRunner> r;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT,
"CoroTaskTest",
[rp = &r, cp = &counter, gp = &g](auto runner) -> CoroTask<void> {
*rp = runner;
++(*cp);
gp->signal();
co_await runner->suspend();
++(*cp);
gp->signal();
co_await runner->suspend();
++(*cp);
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(counter == 1);
runner->join();
runner->post();
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(counter == 2);
runner->join();
runner->post();
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(counter == 3);
runner->join();
BEAST_EXPECT(!runner->runnable());
}
/**
* CoroTask<T> returns a value via co_return. Outer coroutine
* extracts it with co_await.
*/
void
testValueReturn()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("value return");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int result = 0;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [rp = &result, gp = &g](auto) -> CoroTask<void> {
auto inner = []() -> CoroTask<int> { co_return 42; };
*rp = co_await inner();
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(result == 42);
BEAST_EXPECT(!runner->runnable());
}
/**
* CoroTask<T> propagates exceptions from inner coroutines.
* Outer coroutine catches via try/catch around co_await.
*/
void
testValueException()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("value exception");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
bool caught = false;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [cp = &caught, gp = &g](auto) -> CoroTask<void> {
auto inner = []() -> CoroTask<int> {
throw std::runtime_error("inner error");
co_return 0;
};
try
{
co_await inner();
}
catch (std::runtime_error const& e)
{
*cp = true;
}
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(caught);
BEAST_EXPECT(!runner->runnable());
}
/**
* CoroTask<T> chaining. Nested value-returning coroutines
* compose via co_await.
*/
void
testValueChaining()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("value chaining");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int result = 0;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [rp = &result, gp = &g](auto) -> CoroTask<void> {
auto add = [](int a, int b) -> CoroTask<int> { co_return a + b; };
auto mul = [add](int a, int b) -> CoroTask<int> {
int sum = co_await add(a, b);
co_return sum * 2;
};
*rp = co_await mul(3, 4);
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(result == 14); // (3 + 4) * 2
BEAST_EXPECT(!runner->runnable());
}
/**
* postCoroTask returns nullptr when JobQueue is stopping.
*/
void
testShutdownRejection()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("shutdown rejection");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
// Stop the JobQueue
env.app().getJobQueue().stop();
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [](auto) -> CoroTask<void> { co_return; });
BEAST_EXPECT(!runner);
}
void
run() override
{
testVoidCompletion();
testCorrectOrder();
testIncorrectOrder();
testJobQueueAwaiter();
testThreadSpecificStorage();
testExceptionPropagation();
testMultipleYields();
testValueReturn();
testValueException();
testValueChaining();
testShutdownRejection();
}
};
BEAST_DEFINE_TESTSUITE(CoroTask, core, xrpl);
} // namespace test
} // namespace xrpl

View File

@@ -40,11 +40,6 @@ public:
}
};
// NOTE: All coroutine lambdas passed to postCoroTask use explicit
// pointer-by-value captures instead of [&] to work around a GCC 14
// bug where reference captures in coroutine lambdas are corrupted
// in the coroutine frame.
void
correct_order()
{
@@ -59,15 +54,13 @@ public:
}));
gate g1, g2;
std::shared_ptr<JobQueue::CoroTaskRunner> c;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTest", [cp = &c, g1p = &g1, g2p = &g2](auto runner) -> CoroTask<void> {
*cp = runner;
g1p->signal();
co_await runner->suspend();
g2p->signal();
co_return;
});
std::shared_ptr<JobQueue::Coro> c;
env.app().getJobQueue().postCoro(jtCLIENT, "CoroTest", [&](auto const& cr) {
c = cr;
g1.signal();
c->yield();
g2.signal();
});
BEAST_EXPECT(g1.wait_for(5s));
c->join();
c->post();
@@ -88,17 +81,11 @@ public:
}));
gate g;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTest", [gp = &g](auto runner) -> CoroTask<void> {
// Schedule a resume before suspending. The posted job
// cannot actually call resume() until the current resume()
// releases CoroTaskRunner::mutex_, which only happens after
// the coroutine suspends at co_await.
runner->post();
co_await runner->suspend();
gp->signal();
co_return;
});
env.app().getJobQueue().postCoro(jtCLIENT, "CoroTest", [&](auto const& c) {
c->post();
c->yield();
g.signal();
});
BEAST_EXPECT(g.wait_for(5s));
}
@@ -114,7 +101,7 @@ public:
auto& jq = env.app().getJobQueue();
static int const N = 4;
std::array<std::shared_ptr<JobQueue::CoroTaskRunner>, N> a;
std::array<std::shared_ptr<JobQueue::Coro>, N> a;
LocalValue<int> lv(-1);
BEAST_EXPECT(*lv == -1);
@@ -131,23 +118,19 @@ public:
for (int i = 0; i < N; ++i)
{
jq.postCoroTask(
jtCLIENT,
"CoroTest",
[this, ap = &a, gp = &g, lvp = &lv, id = i](auto runner) -> CoroTask<void> {
(*ap)[id] = runner;
gp->signal();
co_await runner->suspend();
jq.postCoro(jtCLIENT, "CoroTest", [&, id = i](auto const& c) {
a[id] = c;
g.signal();
c->yield();
this->BEAST_EXPECT(**lvp == -1);
**lvp = id;
this->BEAST_EXPECT(**lvp == id);
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(*lv == -1);
*lv = id;
this->BEAST_EXPECT(*lv == id);
g.signal();
c->yield();
this->BEAST_EXPECT(**lvp == id);
co_return;
});
this->BEAST_EXPECT(*lv == id);
});
BEAST_EXPECT(g.wait_for(5s));
a[i]->join();
}

View File

@@ -43,91 +43,87 @@ class JobQueue_test : public beast::unit_test::suite
}
}
// NOTE: All coroutine lambdas passed to postCoroTask use explicit
// pointer-by-value captures instead of [&] to work around a GCC 14
// bug where reference captures in coroutine lambdas are corrupted
// in the coroutine frame.
void
testPostCoroTask()
testPostCoro()
{
jtx::Env env{*this};
JobQueue& jQueue = env.app().getJobQueue();
{
// Test repeated post()s until the coroutine completes.
// Test repeated post()s until the Coro completes.
std::atomic<int> yieldCount{0};
auto const runner = jQueue.postCoroTask(
jtCLIENT, "PostCoroTest1", [ycp = &yieldCount](auto runner) -> CoroTask<void> {
while (++(*ycp) < 4)
co_await runner->suspend();
co_return;
auto const coro = jQueue.postCoro(
jtCLIENT,
"PostCoroTest1",
[&yieldCount](std::shared_ptr<JobQueue::Coro> const& coroCopy) {
while (++yieldCount < 4)
coroCopy->yield();
});
BEAST_EXPECT(runner != nullptr);
BEAST_EXPECT(coro != nullptr);
// Wait for the Job to run and yield.
while (yieldCount == 0)
;
// Now re-post until the CoroTaskRunner says it is done.
// Now re-post until the Coro says it is done.
int old = yieldCount;
while (runner->runnable())
while (coro->runnable())
{
BEAST_EXPECT(runner->post());
BEAST_EXPECT(coro->post());
while (old == yieldCount)
{
}
runner->join();
coro->join();
BEAST_EXPECT(++old == yieldCount);
}
BEAST_EXPECT(yieldCount == 4);
}
{
// Test repeated resume()s until the coroutine completes.
// Test repeated resume()s until the Coro completes.
int yieldCount{0};
auto const runner = jQueue.postCoroTask(
jtCLIENT, "PostCoroTest2", [ycp = &yieldCount](auto runner) -> CoroTask<void> {
while (++(*ycp) < 4)
co_await runner->suspend();
co_return;
auto const coro = jQueue.postCoro(
jtCLIENT,
"PostCoroTest2",
[&yieldCount](std::shared_ptr<JobQueue::Coro> const& coroCopy) {
while (++yieldCount < 4)
coroCopy->yield();
});
if (!runner)
if (!coro)
{
// There's no good reason we should not get a runner, but we
// There's no good reason we should not get a Coro, but we
// can't continue without one.
BEAST_EXPECT(false);
return;
}
// Wait for the Job to run and yield.
runner->join();
coro->join();
// Now resume until the CoroTaskRunner says it is done.
// Now resume until the Coro says it is done.
int old = yieldCount;
while (runner->runnable())
while (coro->runnable())
{
runner->resume(); // Resume runs synchronously on this thread.
coro->resume(); // Resume runs synchronously on this thread.
BEAST_EXPECT(++old == yieldCount);
}
BEAST_EXPECT(yieldCount == 4);
}
{
// If the JobQueue is stopped, we should no
// longer be able to post a coroutine (and calling postCoroTask()
// should return nullptr).
// longer be able to add a Coro (and calling postCoro() should
// return false).
using namespace std::chrono_literals;
jQueue.stop();
// The coroutine should never run, so having it access this
// The Coro should never run, so having the Coro access this
// unprotected variable on the stack should be completely safe.
// Not recommended for the faint of heart...
bool unprotected;
auto const runner = jQueue.postCoroTask(
jtCLIENT, "PostCoroTest3", [up = &unprotected](auto) -> CoroTask<void> {
*up = false;
co_return;
auto const coro = jQueue.postCoro(
jtCLIENT, "PostCoroTest3", [&unprotected](std::shared_ptr<JobQueue::Coro> const&) {
unprotected = false;
});
BEAST_EXPECT(runner == nullptr);
BEAST_EXPECT(coro == nullptr);
}
}
@@ -136,7 +132,7 @@ public:
run() override
{
testAddJob();
testPostCoroTask();
testPostCoro();
}
};

View File

@@ -6,7 +6,6 @@
#include <xrpld/rpc/RPCHandler.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/protocol/ApiVersion.h>
#include <xrpl/protocol/STParsedJSON.h>
#include <xrpl/resource/Fees.h>
@@ -194,6 +193,7 @@ AMMTest::find_paths_request(
c,
Role::USER,
{},
{},
RPC::apiVersionIfUnspecified},
{},
{}};
@@ -215,11 +215,11 @@ AMMTest::find_paths_request(
Json::Value result;
gate g;
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
context.params = std::move(params);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
using namespace std::chrono_literals;

View File

@@ -1425,6 +1425,7 @@ ApplicationImp::setup(boost::program_options::variables_map const& cmdline)
c,
Role::ADMIN,
{},
{},
RPC::apiMaximumSupportedVersion},
jvCommand};

View File

@@ -3,7 +3,6 @@
#include <xrpl/beast/core/CurrentThreadName.h>
#include <xrpl/beast/net/IPAddressConversion.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/resource/Fees.h>
namespace xrpl {
@@ -100,14 +99,13 @@ GRPCServerImpl::CallData<Request, Response>::process()
// ensures that finished is always true when this CallData object
// is returned as a tag in handleRpcs(), after sending the response
finished_ = true;
auto runner = app_.getJobQueue().postCoroTask(
JobType::jtRPC, "gRPC-Client", [thisShared](auto) -> CoroTask<void> {
thisShared->processRequest();
co_return;
auto coro = app_.getJobQueue().postCoro(
JobType::jtRPC, "gRPC-Client", [thisShared](std::shared_ptr<JobQueue::Coro> coro) {
thisShared->process(coro);
});
// If runner is null, then the JobQueue has already been shutdown
if (!runner)
// If coro is null, then the JobQueue has already been shutdown
if (!coro)
{
grpc::Status status{grpc::StatusCode::INTERNAL, "Job Queue is already stopped"};
responder_.FinishWithError(status, this);
@@ -116,7 +114,7 @@ GRPCServerImpl::CallData<Request, Response>::process()
template <class Request, class Response>
void
GRPCServerImpl::CallData<Request, Response>::processRequest()
GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::Coro> coro)
{
try
{
@@ -158,6 +156,7 @@ GRPCServerImpl::CallData<Request, Response>::processRequest()
app_.getLedgerMaster(),
usage,
role,
coro,
InfoSub::pointer(),
apiVersion},
request_};

View File

@@ -206,12 +206,9 @@ private:
clone() override;
private:
/**
* Process the gRPC request. Called inside the CoroTask lambda
* posted to the JobQueue by process().
*/
// process the request. Called inside the coroutine passed to JobQueue
void
processRequest();
process(std::shared_ptr<JobQueue::Coro> coro);
// return load type of this RPC
Resource::Charge

View File

@@ -3,6 +3,7 @@
#include <xrpld/rpc/Role.h>
#include <xrpl/beast/utility/Journal.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/server/InfoSub.h>
namespace xrpl {
@@ -23,6 +24,7 @@ struct Context
LedgerMaster& ledgerMaster;
Resource::Consumer& consumer;
Role role;
std::shared_ptr<JobQueue::Coro> coro{};
InfoSub::pointer infoSub{};
unsigned int apiVersion;
};

View File

@@ -169,10 +169,13 @@ public:
private:
Json::Value
processSession(std::shared_ptr<WSSession> const& session, Json::Value const& jv);
processSession(
std::shared_ptr<WSSession> const& session,
std::shared_ptr<JobQueue::Coro> const& coro,
Json::Value const& jv);
void
processSession(std::shared_ptr<Session> const&);
processSession(std::shared_ptr<Session> const&, std::shared_ptr<JobQueue::Coro> coro);
void
processRequest(
@@ -180,6 +183,7 @@ private:
std::string const& request,
beast::IP::Endpoint const& remoteIPAddress,
Output&&,
std::shared_ptr<JobQueue::Coro> coro,
std::string_view forwardedFor,
std::string_view user);

View File

@@ -14,7 +14,6 @@
#include <xrpl/basics/make_SSLContext.h>
#include <xrpl/beast/net/IPAddressConversion.h>
#include <xrpl/beast/rfc2616.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/json/json_reader.h>
#include <xrpl/json/to_string.h>
@@ -285,10 +284,9 @@ ServerHandler::onRequest(Session& session)
}
std::shared_ptr<Session> detachedSession = session.detach();
auto const postResult = m_jobQueue.postCoroTask(
jtCLIENT_RPC, "RPC-Client", [this, detachedSession](auto) -> CoroTask<void> {
processSession(detachedSession);
co_return;
auto const postResult = m_jobQueue.postCoro(
jtCLIENT_RPC, "RPC-Client", [this, detachedSession](std::shared_ptr<JobQueue::Coro> coro) {
processSession(detachedSession, coro);
});
if (postResult == nullptr)
{
@@ -324,18 +322,17 @@ ServerHandler::onWSMessage(
JLOG(m_journal.trace()) << "Websocket received '" << jv << "'";
auto const postResult = m_jobQueue.postCoroTask(
auto const postResult = m_jobQueue.postCoro(
jtCLIENT_WEBSOCKET,
"WS-Client",
[this, session, jv = std::move(jv)](auto) -> CoroTask<void> {
auto const jr = this->processSession(session, jv);
[this, session, jv = std::move(jv)](std::shared_ptr<JobQueue::Coro> const& coro) {
auto const jr = this->processSession(session, coro, jv);
auto const s = to_string(jr);
auto const n = s.length();
boost::beast::multi_buffer sb(n);
sb.commit(boost::asio::buffer_copy(sb.prepare(n), boost::asio::buffer(s.c_str(), n)));
session->send(std::make_shared<StreambufWSMsg<decltype(sb)>>(std::move(sb)));
session->complete();
co_return;
});
if (postResult == nullptr)
{
@@ -376,7 +373,10 @@ logDuration(Json::Value const& request, T const& duration, beast::Journal& journ
}
Json::Value
ServerHandler::processSession(std::shared_ptr<WSSession> const& session, Json::Value const& jv)
ServerHandler::processSession(
std::shared_ptr<WSSession> const& session,
std::shared_ptr<JobQueue::Coro> const& coro,
Json::Value const& jv)
{
auto is = std::static_pointer_cast<WSInfoSub>(session->appDefined);
if (is->getConsumer().disconnect(m_journal))
@@ -443,6 +443,7 @@ ServerHandler::processSession(std::shared_ptr<WSSession> const& session, Json::V
app_.getLedgerMaster(),
is->getConsumer(),
role,
coro,
is,
apiVersion},
jv,
@@ -513,14 +514,18 @@ ServerHandler::processSession(std::shared_ptr<WSSession> const& session, Json::V
return jr;
}
// Run as a coroutine.
void
ServerHandler::processSession(std::shared_ptr<Session> const& session)
ServerHandler::processSession(
std::shared_ptr<Session> const& session,
std::shared_ptr<JobQueue::Coro> coro)
{
processRequest(
session->port(),
buffers_to_string(session->request().body().data()),
session->remoteAddress().at_port(0),
makeOutput(*session),
coro,
forwardedFor(session->request()),
[&] {
auto const iter = session->request().find("X-User");
@@ -557,6 +562,7 @@ ServerHandler::processRequest(
std::string const& request,
beast::IP::Endpoint const& remoteIPAddress,
Output&& output,
std::shared_ptr<JobQueue::Coro> coro,
std::string_view forwardedFor,
std::string_view user)
{
@@ -813,6 +819,7 @@ ServerHandler::processRequest(
app_.getLedgerMaster(),
usage,
role,
coro,
InfoSub::pointer(),
apiVersion},
params,

View File

@@ -7,9 +7,6 @@
#include <xrpl/protocol/RPCErr.h>
#include <xrpl/resource/Fees.h>
#include <condition_variable>
#include <mutex>
namespace xrpl {
// This interface is deprecated.
@@ -40,31 +37,98 @@ doRipplePathFind(RPC::JsonContext& context)
PathRequest::pointer request;
lpLedger = context.ledgerMaster.getClosedLedger();
// makeLegacyPathRequest enqueues a path-finding job that runs
// asynchronously. We block this thread with a condition_variable
// until the path-finding continuation signals completion.
// If makeLegacyPathRequest cannot schedule the job (e.g. during
// shutdown), it returns an empty request and we skip the wait.
std::mutex mtx;
std::condition_variable cv;
bool pathDone = false;
// It doesn't look like there's much odd happening here, but you should
// be aware this code runs in a JobQueue::Coro, which is a coroutine.
// And we may be flipping around between threads. Here's an overview:
//
// 1. We're running doRipplePathFind() due to a call to
// ripple_path_find. doRipplePathFind() is currently running
// inside of a JobQueue::Coro using a JobQueue thread.
//
// 2. doRipplePathFind's call to makeLegacyPathRequest() enqueues the
// path-finding request. That request will (probably) run at some
// indeterminate future time on a (probably different) JobQueue
// thread.
//
// 3. As a continuation from that path-finding JobQueue thread, the
// coroutine we're currently running in (!) is posted to the
// JobQueue. Because it is a continuation, that post won't
// happen until the path-finding request completes.
//
// 4. Once the continuation is enqueued, and we have reason to think
// the path-finding job is likely to run, then the coroutine we're
// running in yield()s. That means it surrenders its thread in
// the JobQueue. The coroutine is suspended, but ready to run,
// because it is kept resident by a shared_ptr in the
// path-finding continuation.
//
// 5. If all goes well then path-finding runs on a JobQueue thread
// and executes its continuation. The continuation posts this
// same coroutine (!) to the JobQueue.
//
// 6. When the JobQueue calls this coroutine, this coroutine resumes
// from the line below the coro->yield() and returns the
// path-finding result.
//
// With so many moving parts, what could go wrong?
//
// Just in terms of the JobQueue refusing to add jobs at shutdown
// there are two specific things that can go wrong.
//
// 1. The path-finding Job queued by makeLegacyPathRequest() might be
// rejected (because we're shutting down).
//
// Fortunately this problem can be addressed by looking at the
// return value of makeLegacyPathRequest(). If
// makeLegacyPathRequest() cannot get a thread to run the path-find
// on, then it returns an empty request.
//
// 2. The path-finding job might run, but the Coro::post() might be
// rejected by the JobQueue (because we're shutting down).
//
// We handle this case by resuming (not posting) the Coro.
// By resuming the Coro, we allow the Coro to run to completion
// on the current thread instead of requiring that it run on a
// new thread from the JobQueue.
//
// Both of these failure modes are hard to recreate in a unit test
// because they are so dependent on inter-thread timing. However
// the failure modes can be observed by synchronously (inside the
// rippled source code) shutting down the application. The code to
// do so looks like this:
//
// context.app.signalStop();
// while (! context.app.getJobQueue().jobCounter().joined()) { }
//
// The first line starts the process of shutting down the app.
// The second line waits until no more jobs can be added to the
// JobQueue before letting the thread continue.
//
// May 2017
jvResult = context.app.getPathRequests().makeLegacyPathRequest(
request,
[&]() {
[&context]() {
// Copying the shared_ptr keeps the coroutine alive up
// through the return. Otherwise the storage under the
// captured reference could evaporate when we return from
// coroCopy->resume(). This is not strictly necessary, but
// will make maintenance easier.
std::shared_ptr<JobQueue::Coro> coroCopy{context.coro};
if (!coroCopy->post())
{
std::lock_guard lk(mtx);
pathDone = true;
// The post() failed, so we won't get a thread to let
// the Coro finish. We'll call Coro::resume() so the
// Coro can finish on our thread. Otherwise the
// application will hang on shutdown.
coroCopy->resume();
}
cv.notify_one();
},
context.consumer,
lpLedger,
context.params);
if (request)
{
std::unique_lock lk(mtx);
cv.wait(lk, [&] { return pathDone; });
context.coro->yield();
jvResult = request->doStatus(context.params);
}