mirror of
https://github.com/XRPLF/rippled.git
synced 2026-04-29 15:37:57 +00:00
Introduces task list documents for Phases 2 through 5, with Tempo references (replacing Jaeger) and Task 2.8 dashboard parity spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
239 lines
10 KiB
Markdown
239 lines
10 KiB
Markdown
# Phase 3: Transaction Tracing Task List
|
|
|
|
> **Goal**: Trace the full transaction lifecycle from RPC submission through peer relay, including cross-node context propagation via Protocol Buffer extensions. This is the WALK phase that demonstrates true distributed tracing.
|
|
>
|
|
> **Scope**: Protocol Buffer `TraceContext` message, context serialization, PeerImp transaction instrumentation, NetworkOPs processing instrumentation, HashRouter visibility, and multi-node relay context propagation.
|
|
>
|
|
> **Branch**: `pratik/otel-phase3-tx-tracing` (from `pratik/otel-phase2-rpc-tracing`)
|
|
|
|
### Related Plan Documents
|
|
|
|
| Document | Relevance |
|
|
| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------ |
|
|
| [04-code-samples.md](./04-code-samples.md) | TraceContext protobuf (§4.4.1), PeerImp instrumentation (§4.5.1), context serialization (§4.4.2) |
|
|
| [01-architecture-analysis.md](./01-architecture-analysis.md) | Transaction flow (§1.3), key trace points (§1.6) |
|
|
| [06-implementation-phases.md](./06-implementation-phases.md) | Phase 3 tasks (§6.4), definition of done (§6.11.3) |
|
|
| [02-design-decisions.md](./02-design-decisions.md) | Context propagation design (§2.5), attribute schema (§2.4.3) |
|
|
|
|
---
|
|
|
|
## Task 3.1: Define TraceContext Protocol Buffer Message
|
|
|
|
**Objective**: Add trace context fields to the P2P protocol messages so trace IDs can propagate across nodes.
|
|
|
|
**What to do**:
|
|
|
|
- Edit `include/xrpl/proto/xrpl.proto` (or `src/ripple/proto/ripple.proto`, wherever the proto is):
|
|
- Add `TraceContext` message definition:
|
|
```protobuf
|
|
message TraceContext {
|
|
bytes trace_id = 1; // 16-byte trace identifier
|
|
bytes span_id = 2; // 8-byte span identifier
|
|
uint32 trace_flags = 3; // bit 0 = sampled
|
|
string trace_state = 4; // W3C tracestate value
|
|
}
|
|
```
|
|
- Add `optional TraceContext trace_context = 1001;` to:
|
|
- `TMTransaction`
|
|
- `TMProposeSet` (for Phase 4 use)
|
|
- `TMValidation` (for Phase 4 use)
|
|
- Use high field numbers (1001+) to avoid conflicts with existing fields
|
|
|
|
- Regenerate protobuf C++ code
|
|
|
|
**Key modified files**:
|
|
|
|
- `include/xrpl/proto/xrpl.proto` (or equivalent)
|
|
|
|
**Reference**:
|
|
|
|
- [04-code-samples.md §4.4.1](./04-code-samples.md) — TraceContext message definition
|
|
- [02-design-decisions.md §2.5.2](./02-design-decisions.md) — Protocol buffer context propagation design
|
|
|
|
---
|
|
|
|
## Task 3.2: Implement Protobuf Context Serialization
|
|
|
|
**Objective**: Create utilities to serialize/deserialize OTel trace context to/from protobuf `TraceContext` messages.
|
|
|
|
**What to do**:
|
|
|
|
- Create `include/xrpl/telemetry/TraceContextPropagator.h` (extend from Phase 2 if exists, or add protobuf methods):
|
|
- Add protobuf-specific methods:
|
|
- `static Context extractFromProtobuf(protocol::TraceContext const& proto)` — reconstruct OTel context from protobuf fields
|
|
- `static void injectToProtobuf(Context const& ctx, protocol::TraceContext& proto)` — serialize current span context into protobuf fields
|
|
- Both methods guard behind `#ifdef XRPL_ENABLE_TELEMETRY`
|
|
|
|
- Create/extend `src/libxrpl/telemetry/TraceContextPropagator.cpp`:
|
|
- Implement extraction: read trace_id (16 bytes), span_id (8 bytes), trace_flags from protobuf, construct `SpanContext`, wrap in `Context`
|
|
- Implement injection: get current span from context, serialize its TraceId, SpanId, and TraceFlags into protobuf fields
|
|
|
|
**Key new/modified files**:
|
|
|
|
- `include/xrpl/telemetry/TraceContextPropagator.h`
|
|
- `src/libxrpl/telemetry/TraceContextPropagator.cpp`
|
|
|
|
**Reference**:
|
|
|
|
- [04-code-samples.md §4.4.2](./04-code-samples.md) — Full extract/inject implementation
|
|
|
|
---
|
|
|
|
## Task 3.3: Instrument PeerImp Transaction Handling
|
|
|
|
**Objective**: Add trace spans to the peer-level transaction receive and relay path.
|
|
|
|
**What to do**:
|
|
|
|
- Edit `src/xrpld/overlay/detail/PeerImp.cpp`:
|
|
- In `onMessage(TMTransaction)` / `handleTransaction()`:
|
|
- Extract parent trace context from incoming `TMTransaction::trace_context` field (if present)
|
|
- Create `tx.receive` span as child of extracted context (or new root if none)
|
|
- Set attributes: `xrpl.tx.hash`, `xrpl.peer.id`, `xrpl.tx.status`
|
|
- On HashRouter suppression (duplicate): set `xrpl.tx.suppressed=true`, add `tx.duplicate` event
|
|
- Wrap validation call with child span `tx.validate`
|
|
- Wrap relay with `tx.relay` span
|
|
- When relaying to peers:
|
|
- Inject current trace context into outgoing `TMTransaction::trace_context`
|
|
- Set `xrpl.tx.relay_count` attribute
|
|
|
|
- Include `TracingInstrumentation.h` and use `XRPL_TRACE_TX` macro
|
|
|
|
**Key modified files**:
|
|
|
|
- `src/xrpld/overlay/detail/PeerImp.cpp`
|
|
|
|
**Reference**:
|
|
|
|
- [04-code-samples.md §4.5.1](./04-code-samples.md) — Full PeerImp instrumentation example
|
|
- [01-architecture-analysis.md §1.3](./01-architecture-analysis.md) — Transaction flow diagram
|
|
- [01-architecture-analysis.md §1.6](./01-architecture-analysis.md) — tx.receive trace point
|
|
|
|
---
|
|
|
|
## Task 3.4: Instrument NetworkOPs Transaction Processing
|
|
|
|
**Objective**: Trace the transaction processing pipeline in NetworkOPs, covering both sync and async paths.
|
|
|
|
**What to do**:
|
|
|
|
- Edit `src/xrpld/app/misc/NetworkOPs.cpp`:
|
|
- In `processTransaction()`:
|
|
- Create `tx.process` span
|
|
- Set attributes: `xrpl.tx.hash`, `xrpl.tx.type`, `xrpl.tx.local` (whether from RPC or peer)
|
|
- Record whether sync or async path is taken
|
|
|
|
- In `doTransactionAsync()`:
|
|
- Capture parent context before queuing
|
|
- Create `tx.queue` span with queue depth attribute
|
|
- Add event when transaction is dequeued for processing
|
|
|
|
- In `doTransactionSync()`:
|
|
- Create `tx.process_sync` span
|
|
- Record result (applied, queued, rejected)
|
|
|
|
**Key modified files**:
|
|
|
|
- `src/xrpld/app/misc/NetworkOPs.cpp`
|
|
|
|
**Reference**:
|
|
|
|
- [01-architecture-analysis.md §1.6](./01-architecture-analysis.md) — tx.validate and tx.process trace points
|
|
- [02-design-decisions.md §2.4.3](./02-design-decisions.md) — Transaction attribute schema
|
|
|
|
---
|
|
|
|
## Task 3.5: Instrument HashRouter for Dedup Visibility
|
|
|
|
**Objective**: Make transaction deduplication visible in traces by recording HashRouter decisions as span attributes/events.
|
|
|
|
**What to do**:
|
|
|
|
- Edit `src/xrpld/overlay/detail/PeerImp.cpp` (in handleTransaction):
|
|
- After calling `HashRouter::shouldProcess()` or `addSuppressionPeer()`:
|
|
- Record `xrpl.tx.suppressed` attribute (true/false)
|
|
- Record `xrpl.tx.flags` showing current HashRouter state (SAVED, TRUSTED, etc.)
|
|
- Add `tx.first_seen` or `tx.duplicate` event
|
|
|
|
- This is NOT a modification to HashRouter itself — just recording its decisions as span attributes in the existing PeerImp instrumentation from Task 3.3.
|
|
|
|
**Key modified files**:
|
|
|
|
- `src/xrpld/overlay/detail/PeerImp.cpp` (same changes as 3.3, logically grouped)
|
|
|
|
---
|
|
|
|
## Task 3.6: Context Propagation in Transaction Relay
|
|
|
|
**Objective**: Ensure trace context flows correctly when transactions are relayed between peers, creating linked spans across nodes.
|
|
|
|
**What to do**:
|
|
|
|
- Verify the relay path injects trace context:
|
|
- When `PeerImp` relays a transaction, the `TMTransaction` message should carry `trace_context`
|
|
- When a remote peer receives it, the context is extracted and used as parent
|
|
|
|
- Test context propagation:
|
|
- Manually verify with 2+ node setup that trace IDs match across nodes
|
|
- Confirm parent-child span relationships are correct in Tempo
|
|
|
|
- Handle edge cases:
|
|
- Missing trace context (older peers): create new root span
|
|
- Corrupted trace context: log warning, create new root span
|
|
- Sampled-out traces: respect trace flags
|
|
|
|
**Key modified files**:
|
|
|
|
- `src/xrpld/overlay/detail/PeerImp.cpp`
|
|
- `src/xrpld/overlay/detail/OverlayImpl.cpp` (if relay method needs context param)
|
|
|
|
**Reference**:
|
|
|
|
- [02-design-decisions.md §2.5](./02-design-decisions.md) — Context propagation design
|
|
- [04-code-samples.md §4.5.1](./04-code-samples.md) — Relay context injection pattern
|
|
|
|
---
|
|
|
|
## Task 3.7: Build Verification and Testing
|
|
|
|
**Objective**: Verify all Phase 3 changes compile and work correctly.
|
|
|
|
**What to do**:
|
|
|
|
1. Build with `telemetry=ON` — verify no compilation errors
|
|
2. Build with `telemetry=OFF` — verify no regressions
|
|
3. Run existing unit tests
|
|
4. Verify protobuf regeneration produces correct C++ code
|
|
5. Document any issues encountered
|
|
|
|
**Verification Checklist**:
|
|
|
|
- [ ] Protobuf changes generate valid C++
|
|
- [ ] Build succeeds with telemetry ON
|
|
- [ ] Build succeeds with telemetry OFF
|
|
- [ ] Existing tests pass
|
|
- [ ] No undefined symbols from new telemetry calls
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
| Task | Description | New Files | Modified Files | Depends On |
|
|
| ---- | ----------------------------------- | --------- | -------------- | ---------- |
|
|
| 3.1 | TraceContext protobuf message | 0 | 1 | Phase 2 |
|
|
| 3.2 | Protobuf context serialization | 1-2 | 0 | 3.1 |
|
|
| 3.3 | PeerImp transaction instrumentation | 0 | 1 | 3.2 |
|
|
| 3.4 | NetworkOPs transaction processing | 0 | 1 | Phase 2 |
|
|
| 3.5 | HashRouter dedup visibility | 0 | 1 | 3.3 |
|
|
| 3.6 | Relay context propagation | 0 | 1-2 | 3.3, 3.5 |
|
|
| 3.7 | Build verification and testing | 0 | 0 | 3.1-3.6 |
|
|
|
|
**Parallel work**: Tasks 3.1 and 3.4 can start in parallel. Task 3.2 depends on 3.1. Tasks 3.3 and 3.5 depend on 3.2. Task 3.6 depends on 3.3 and 3.5.
|
|
|
|
**Exit Criteria** (from [06-implementation-phases.md §6.11.3](./06-implementation-phases.md)):
|
|
|
|
- [ ] Transaction traces span across nodes
|
|
- [ ] Trace context in Protocol Buffer messages
|
|
- [ ] HashRouter deduplication visible in traces
|
|
- [ ] <5% overhead on transaction throughput
|