Phase 3: Transaction Tracing Task List
Goal: Trace the full transaction lifecycle from RPC submission through peer relay, including cross-node context propagation via Protocol Buffer extensions. This is the WALK phase that demonstrates true distributed tracing.
Scope: Protocol Buffer `TraceContext` message, context serialization, PeerImp transaction instrumentation, NetworkOPs processing instrumentation, HashRouter visibility, and multi-node relay context propagation.
Branch: `pratik/otel-phase3-tx-tracing` (from `pratik/otel-phase2-rpc-tracing`)
Related Plan Documents
| Document | Relevance |
|---|---|
| 04-code-samples.md | TraceContext protobuf (§4.4.1), PeerImp instrumentation (§4.5.1), context serialization (§4.4.2) |
| 01-architecture-analysis.md | Transaction flow (§1.3), key trace points (§1.6) |
| 06-implementation-phases.md | Phase 3 tasks (§6.4), definition of done (§6.11.3) |
| 02-design-decisions.md | Context propagation design (§2.5), attribute schema (§2.4.3) |
Task 3.1: Define TraceContext Protocol Buffer Message
Objective: Add trace context fields to the P2P protocol messages so trace IDs can propagate across nodes.
What to do:
- Edit `include/xrpl/proto/xrpl.proto` (or `src/ripple/proto/ripple.proto`, wherever the proto is):
  - Add `TraceContext` message definition:

        message TraceContext {
            bytes trace_id = 1;      // 16-byte trace identifier
            bytes span_id = 2;       // 8-byte span identifier
            uint32 trace_flags = 3;  // bit 0 = sampled
            string trace_state = 4;  // W3C tracestate value
        }

  - Add `optional TraceContext trace_context = 1001;` to:
    - `TMTransaction`
    - `TMProposeSet` (for Phase 4 use)
    - `TMValidation` (for Phase 4 use)
  - Use high field numbers (1001+) to avoid conflicts with existing fields
- Regenerate protobuf C++ code
Key modified files:
- `include/xrpl/proto/xrpl.proto` (or equivalent)
Reference:
- 04-code-samples.md §4.4.1 — TraceContext message definition
- 02-design-decisions.md §2.5.2 — Protocol buffer context propagation design
Task 3.2: Implement Protobuf Context Serialization
Objective: Create utilities to serialize/deserialize OTel trace context to/from protobuf TraceContext messages.
What to do:
- Create `include/xrpl/telemetry/TraceContextPropagator.h` (extend from Phase 2 if it exists, or add protobuf methods):
  - Add protobuf-specific methods:
    - `static Context extractFromProtobuf(protocol::TraceContext const& proto)`: reconstruct OTel context from protobuf fields
    - `static void injectToProtobuf(Context const& ctx, protocol::TraceContext& proto)`: serialize current span context into protobuf fields
  - Guard both methods behind `#ifdef XRPL_ENABLE_TELEMETRY`
- Create/extend `src/libxrpl/telemetry/TraceContextPropagator.cpp`:
  - Implement extraction: read trace_id (16 bytes), span_id (8 bytes), trace_flags from protobuf, construct `SpanContext`, wrap in `Context`
  - Implement injection: get current span from context, serialize its TraceId, SpanId, and TraceFlags into protobuf fields
Key new/modified files:
- `include/xrpl/telemetry/TraceContextPropagator.h`
- `src/libxrpl/telemetry/TraceContextPropagator.cpp`
Reference:
- 04-code-samples.md §4.4.2 — Full extract/inject implementation
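The extract/inject pair described in this task can be sketched without any OpenTelemetry or protobuf dependency. The snippet below is only an illustration: `ProtoTraceContext` and `DemoSpanContext` are hypothetical stand-ins for the generated `protocol::TraceContext` message and the OTel `SpanContext`, and the function names are invented for the example. It shows the length checks (16-byte trace ID, 8-byte span ID) that extraction would need before trusting data from a peer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for the generated protocol::TraceContext message.
struct ProtoTraceContext
{
    std::string trace_id;           // raw bytes, expected length 16
    std::string span_id;            // raw bytes, expected length 8
    std::uint32_t trace_flags = 0;  // bit 0 = sampled
    std::string trace_state;        // W3C tracestate value
};

// Hypothetical stand-in for an OTel SpanContext.
struct DemoSpanContext
{
    std::array<std::uint8_t, 16> traceId{};
    std::array<std::uint8_t, 8> spanId{};
    std::uint8_t flags = 0;
    std::string traceState;
};

// Serialize the span context into the protobuf-shaped struct.
inline void
injectToProto(DemoSpanContext const& ctx, ProtoTraceContext& proto)
{
    proto.trace_id.assign(ctx.traceId.begin(), ctx.traceId.end());
    proto.span_id.assign(ctx.spanId.begin(), ctx.spanId.end());
    proto.trace_flags = ctx.flags;
    proto.trace_state = ctx.traceState;
}

// Rebuild a span context; returns nullopt when a field has the wrong size.
inline std::optional<DemoSpanContext>
extractFromProto(ProtoTraceContext const& proto)
{
    if (proto.trace_id.size() != 16 || proto.span_id.size() != 8)
        return std::nullopt;

    DemoSpanContext ctx;
    for (std::size_t i = 0; i < ctx.traceId.size(); ++i)
        ctx.traceId[i] = static_cast<std::uint8_t>(proto.trace_id[i]);
    for (std::size_t i = 0; i < ctx.spanId.size(); ++i)
        ctx.spanId[i] = static_cast<std::uint8_t>(proto.span_id[i]);
    ctx.flags = static_cast<std::uint8_t>(proto.trace_flags & 0xFF);
    ctx.traceState = proto.trace_state;
    return ctx;
}
```

Returning `std::optional` rather than a default-constructed context keeps the "no usable parent" case explicit for the caller; the real propagator may signal this differently.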
Task 3.3: Instrument PeerImp Transaction Handling
Objective: Add trace spans to the peer-level transaction receive and relay path.
What to do:
- Edit `src/xrpld/overlay/detail/PeerImp.cpp`:
  - In `onMessage(TMTransaction)` / `handleTransaction()`:
    - Extract parent trace context from incoming `TMTransaction::trace_context` field (if present)
    - Create `tx.receive` span as child of extracted context (or new root if none)
    - Set attributes: `xrpl.tx.hash`, `xrpl.peer.id`, `xrpl.tx.status`
    - On HashRouter suppression (duplicate): set `xrpl.tx.suppressed=true`, add `tx.duplicate` event
    - Wrap validation call with child span `tx.validate`
    - Wrap relay with `tx.relay` span
  - When relaying to peers:
    - Inject current trace context into outgoing `TMTransaction::trace_context`
    - Set `xrpl.tx.relay_count` attribute
- Include `TracingInstrumentation.h` and use `XRPL_TRACE_TX` macro
Key modified files:
- `src/xrpld/overlay/detail/PeerImp.cpp`
Reference:
- 04-code-samples.md §4.5.1 — Full PeerImp instrumentation example
- 01-architecture-analysis.md §1.3 — Transaction flow diagram
- 01-architecture-analysis.md §1.6 — tx.receive trace point
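The span layout for the receive path can be illustrated with a small RAII guard. This is a sketch under assumptions, not the PeerImp code: `DemoScopedSpan`, `DemoSpanRecord`, and `demoHandleTransaction` are hypothetical names, spans are recorded into a plain vector instead of an OTel exporter, and parents are identified by name only. The span and attribute names are the ones listed in this task.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of a finished span.
struct DemoSpanRecord
{
    std::string name;
    std::string parent;  // empty for a root span
    std::map<std::string, std::string> attributes;
    std::vector<std::string> events;
};

// Hypothetical RAII guard: the span is "ended" (recorded) on destruction.
class DemoScopedSpan
{
public:
    DemoScopedSpan(
        std::vector<DemoSpanRecord>& sink,
        std::string name,
        std::string parent)
        : sink_(sink)
    {
        record_.name = std::move(name);
        record_.parent = std::move(parent);
    }
    ~DemoScopedSpan() { sink_.push_back(std::move(record_)); }
    DemoScopedSpan(DemoScopedSpan const&) = delete;
    DemoScopedSpan& operator=(DemoScopedSpan const&) = delete;

    void setAttribute(std::string key, std::string value)
    {
        record_.attributes[std::move(key)] = std::move(value);
    }
    void addEvent(std::string event) { record_.events.push_back(std::move(event)); }
    std::string const& name() const { return record_.name; }

private:
    std::vector<DemoSpanRecord>& sink_;
    DemoSpanRecord record_;
};

// Mirrors the steps above: receive span, dedup check, validate, relay.
inline void
demoHandleTransaction(
    std::vector<DemoSpanRecord>& sink,
    std::string const& txHash,
    std::string const& peerId,
    std::optional<std::string> const& remoteParent,
    bool duplicate,
    std::size_t relayTargets)
{
    // Child of the remote span when a context was extracted, otherwise a root.
    DemoScopedSpan receive(sink, "tx.receive", remoteParent.value_or(""));
    receive.setAttribute("xrpl.tx.hash", txHash);
    receive.setAttribute("xrpl.peer.id", peerId);
    receive.setAttribute("xrpl.tx.suppressed", duplicate ? "true" : "false");
    if (duplicate)
    {
        receive.addEvent("tx.duplicate");
        return;
    }
    {
        DemoScopedSpan validate(sink, "tx.validate", receive.name());
    }
    {
        DemoScopedSpan relay(sink, "tx.relay", receive.name());
        relay.setAttribute("xrpl.tx.relay_count", std::to_string(relayTargets));
    }
}
```

Because each guard records its span on scope exit, the child spans finish before `tx.receive` does, which matches how nested spans normally appear in a trace viewer.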
Task 3.4: Instrument NetworkOPs Transaction Processing
Objective: Trace the transaction processing pipeline in NetworkOPs, covering both sync and async paths.
What to do:
- Edit `src/xrpld/app/misc/NetworkOPs.cpp`:
  - In `processTransaction()`:
    - Create `tx.process` span
    - Set attributes: `xrpl.tx.hash`, `xrpl.tx.type`, `xrpl.tx.local` (whether from RPC or peer)
    - Record whether sync or async path is taken
  - In `doTransactionAsync()`:
    - Capture parent context before queuing
    - Create `tx.queue` span with queue depth attribute
    - Add event when transaction is dequeued for processing
  - In `doTransactionSync()`:
    - Create `tx.process_sync` span
    - Record result (applied, queued, rejected)
Key modified files:
- `src/xrpld/app/misc/NetworkOPs.cpp`
Reference:
- 01-architecture-analysis.md §1.6 — tx.validate and tx.process trace points
- 02-design-decisions.md §2.4.3 — Transaction attribute schema
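The async path is where context is easiest to lose: by the time the job runs, the submitting thread's active span is gone. A minimal sketch of "capture parent context before queuing", assuming a hypothetical `DemoTxQueue` and string-valued span identifiers rather than the real NetworkOPs job queue:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical queued item; the parent span is stored with the work itself.
struct DemoQueuedTx
{
    std::string txHash;
    std::string parentSpan;           // captured at enqueue time
    std::size_t queueDepthAtEnqueue;  // value for a queue depth attribute
};

class DemoTxQueue
{
public:
    // Capture the caller's current span while it is still active.
    void enqueue(std::string txHash, std::string const& currentSpan)
    {
        pending_.push_back({std::move(txHash), currentSpan, pending_.size()});
    }

    // Dequeue one item and describe the span that processing would create.
    bool processNext(std::vector<std::string>& log)
    {
        if (pending_.empty())
            return false;
        DemoQueuedTx tx = std::move(pending_.front());
        pending_.pop_front();
        log.push_back(
            "tx.process parent=" + tx.parentSpan + " hash=" + tx.txHash +
            " depth=" + std::to_string(tx.queueDepthAtEnqueue));
        return true;
    }

private:
    std::deque<DemoQueuedTx> pending_;
};
```

Storing the parent alongside the queued item, rather than reading a thread-local at dequeue time, is what keeps `tx.process` attached to the trace that submitted the transaction.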
Task 3.5: Instrument HashRouter for Dedup Visibility
Objective: Make transaction deduplication visible in traces by recording HashRouter decisions as span attributes/events.
What to do:
- Edit `src/xrpld/overlay/detail/PeerImp.cpp` (in `handleTransaction`):
  - After calling `HashRouter::shouldProcess()` or `addSuppressionPeer()`:
    - Record `xrpl.tx.suppressed` attribute (true/false)
    - Record `xrpl.tx.flags` showing current HashRouter state (SAVED, TRUSTED, etc.)
    - Add `tx.first_seen` or `tx.duplicate` event
- This is NOT a modification to HashRouter itself; it only records HashRouter's decisions as span attributes in the existing PeerImp instrumentation from Task 3.3.
Key modified files:
- `src/xrpld/overlay/detail/PeerImp.cpp` (same file as 3.3, logically grouped)
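One way to turn a HashRouter decision into the attributes and event named above is a small pure function, sketched here. The flag constants and their bit values are hypothetical placeholders chosen for the example (the real definitions live in the HashRouter header), and `demoAnnotate` is an invented helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag bits for illustration only; not the real HashRouter values.
constexpr std::uint32_t demoFlagBad = 0x02;
constexpr std::uint32_t demoFlagSaved = 0x04;
constexpr std::uint32_t demoFlagTrusted = 0x10;

// Render the flag bitmask as a readable value for the xrpl.tx.flags attribute.
inline std::string
demoFlagsToString(std::uint32_t flags)
{
    std::string out;
    auto append = [&out](char const* name) {
        if (!out.empty())
            out += '|';
        out += name;
    };
    if (flags & demoFlagBad)
        append("BAD");
    if (flags & demoFlagSaved)
        append("SAVED");
    if (flags & demoFlagTrusted)
        append("TRUSTED");
    return out.empty() ? std::string("NONE") : out;
}

// What the tx.receive span would be annotated with.
struct DemoDedupAnnotation
{
    bool suppressed;    // xrpl.tx.suppressed
    std::string flags;  // xrpl.tx.flags
    std::string event;  // tx.first_seen or tx.duplicate
};

inline DemoDedupAnnotation
demoAnnotate(bool shouldProcess, std::uint32_t flags)
{
    return {
        !shouldProcess,
        demoFlagsToString(flags),
        shouldProcess ? "tx.first_seen" : "tx.duplicate"};
}
```

Keeping the mapping separate from PeerImp makes it testable on its own and leaves HashRouter untouched, in line with the note above.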
Task 3.6: Context Propagation in Transaction Relay
Objective: Ensure trace context flows correctly when transactions are relayed between peers, creating linked spans across nodes.
What to do:
- Verify the relay path injects trace context:
  - When `PeerImp` relays a transaction, the `TMTransaction` message should carry `trace_context`
  - When a remote peer receives it, the context is extracted and used as parent
- Test context propagation:
  - Manually verify with 2+ node setup that trace IDs match across nodes
  - Confirm parent-child span relationships are correct in Jaeger
- Handle edge cases:
  - Missing trace context (older peers): create new root span
  - Corrupted trace context: log warning, create new root span
  - Sampled-out traces: respect trace flags
Key modified files:
- `src/xrpld/overlay/detail/PeerImp.cpp`
- `src/xrpld/overlay/detail/OverlayImpl.cpp` (if relay method needs context param)
Reference:
- 02-design-decisions.md §2.5 — Context propagation design
- 04-code-samples.md §4.5.1 — Relay context injection pattern
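The three edge cases can be expressed as one decision function on the receiving side. The sketch below uses hypothetical types (`DemoWireContext`, `DemoParentDecision`) in place of the protobuf and OTel classes. Treating all-zero IDs as invalid follows the W3C Trace Context rules; whether the real implementation applies the same check is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical outcome of inspecting an incoming trace_context field.
enum class DemoParentDecision {
    NewRoot,              // field absent (older peer)
    NewRootAfterWarning,  // field present but malformed
    ChildSampled,         // valid parent, sampled flag set
    ChildNotSampled       // valid parent, sampled flag clear
};

// Hypothetical stand-in for the decoded protobuf fields.
struct DemoWireContext
{
    std::string traceId;
    std::string spanId;
    std::uint32_t traceFlags = 0;
};

inline bool
demoAllZero(std::string const& bytes)
{
    return std::all_of(
        bytes.begin(), bytes.end(), [](char c) { return c == 0; });
}

inline DemoParentDecision
demoDecideParent(std::optional<DemoWireContext> const& wire)
{
    if (!wire)
        return DemoParentDecision::NewRoot;

    bool const wellFormed = wire->traceId.size() == 16 &&
        wire->spanId.size() == 8 && !demoAllZero(wire->traceId) &&
        !demoAllZero(wire->spanId);
    if (!wellFormed)
        return DemoParentDecision::NewRootAfterWarning;

    // Bit 0 of trace_flags is the sampled flag.
    return (wire->traceFlags & 0x01) ? DemoParentDecision::ChildSampled
                                     : DemoParentDecision::ChildNotSampled;
}
```

A caller would log the warning and start a fresh root for `NewRootAfterWarning`, and for `ChildNotSampled` would still propagate the context downstream without recording spans locally.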
Task 3.7: Build Verification and Testing
Objective: Verify all Phase 3 changes compile and work correctly.
What to do:
- Build with `telemetry=ON`: verify no compilation errors
- Build with `telemetry=OFF`: verify no regressions
- Run existing unit tests
- Verify protobuf regeneration produces correct C++ code
- Document any issues encountered
Verification Checklist:
- Protobuf changes generate valid C++
- Build succeeds with telemetry ON
- Build succeeds with telemetry OFF
- Existing tests pass
- No undefined symbols from new telemetry calls
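The ON/OFF builds both succeed only if every telemetry call site compiles to nothing when `XRPL_ENABLE_TELEMETRY` is undefined. A minimal sketch of that pattern follows; `DEMO_TRACE_SPAN` and `demoSpanCounter` are hypothetical and are not the project's `XRPL_TRACE_TX` macro, whose actual definition is not shown in this document.

```cpp
#include <cassert>

// Counts spans started by the demo macro; it never changes when telemetry
// is compiled out.
inline int&
demoSpanCounter()
{
    static int count = 0;
    return count;
}

#ifdef XRPL_ENABLE_TELEMETRY
// Telemetry build: the macro does real work (here, bumping a counter).
#define DEMO_TRACE_SPAN(name) \
    (static_cast<void>(name), static_cast<void>(++demoSpanCounter()))
#else
// Non-telemetry build: the macro expands to a no-op expression, so call
// sites need no #ifdef and reference no telemetry symbols.
#define DEMO_TRACE_SPAN(name) (static_cast<void>(0))
#endif

// Instrumented code looks identical in both configurations.
inline int
demoProcess()
{
    DEMO_TRACE_SPAN("tx.process");
    return 42;
}
```

Because the no-op branch references nothing from the telemetry library, a `telemetry=OFF` build cannot pick up undefined symbols from call sites written this way.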
Summary
| Task | Description | New Files | Modified Files | Depends On |
|---|---|---|---|---|
| 3.1 | TraceContext protobuf message | 0 | 1 | Phase 2 |
| 3.2 | Protobuf context serialization | 1-2 | 0 | 3.1 |
| 3.3 | PeerImp transaction instrumentation | 0 | 1 | 3.2 |
| 3.4 | NetworkOPs transaction processing | 0 | 1 | Phase 2 |
| 3.5 | HashRouter dedup visibility | 0 | 1 | 3.3 |
| 3.6 | Relay context propagation | 0 | 1-2 | 3.3, 3.5 |
| 3.7 | Build verification and testing | 0 | 0 | 3.1-3.6 |
Parallel work: Tasks 3.1 and 3.4 can start in parallel. Task 3.2 depends on 3.1. Task 3.3 depends on 3.2, and Task 3.5 depends on 3.3. Task 3.6 depends on 3.3 and 3.5.
Exit Criteria (from 06-implementation-phases.md §6.11.3):
- Transaction traces span across nodes
- Trace context in Protocol Buffer messages
- HashRouter deduplication visible in traces
- <5% overhead on transaction throughput
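The overhead criterion can be checked with simple arithmetic once baseline and instrumented throughput have been measured. The helper below is a sketch with hypothetical names, and it assumes overhead is defined as the relative drop in transactions per second; the plan does not state how the figure is to be measured.

```cpp
#include <cassert>

// Relative throughput loss, in percent, of the instrumented build.
inline double
demoOverheadPercent(double baselineTps, double instrumentedTps)
{
    assert(baselineTps > 0.0);
    return (baselineTps - instrumentedTps) / baselineTps * 100.0;
}

// True when the loss stays under the budget (5% by default).
inline bool
demoMeetsOverheadBudget(
    double baselineTps,
    double instrumentedTps,
    double budgetPercent = 5.0)
{
    return demoOverheadPercent(baselineTps, instrumentedTps) < budgetPercent;
}
```

With made-up numbers, a drop from 1000 to 960 transactions per second is roughly a 4% loss and passes, while a drop to 940 is roughly 6% and fails.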