Add full consensus tracing with deterministic trace ID correlation and establish-phase instrumentation:
- Deterministic trace_id from previousLedger.id() for cross-node correlation (switchable via consensus_trace_strategy config)
- Round-to-round span links (follows-from) for causal chaining
- Establish phase spans with convergence tracking, dispute resolution events, and threshold escalation attributes
- Validation spans with links to round spans (thread-safe via roundSpanContext_ snapshot for jtACCEPT cross-thread access)
- Mode change spans for proposing/observing transitions
- New startSpan overload with span links in Telemetry interface
- XRPL_TRACE_ADD_EVENT macro with do-while(0) safety wrapper
- Config validation for consensus_trace_strategy
- Test adaptor (csf::Peer) updated with getTelemetry() stub

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 4: Consensus Tracing Task List
Goal: Full observability into consensus rounds — track round lifecycle, phase transitions, proposal handling, and validation. This is the RUN phase that completes the distributed tracing story.
Scope: RCLConsensus instrumentation for round starts, phase transitions (open/establish/accept), proposal send/receive, validation handling, and correlation with transaction traces from Phase 3.
Branch: pratik/otel-phase4-consensus-tracing (from pratik/otel-phase3-tx-tracing)
Related Plan Documents
| Document | Relevance |
|---|---|
| 04-code-samples.md | Consensus instrumentation (§4.5.2), consensus span patterns |
| 01-architecture-analysis.md | Consensus round flow (§1.4), key trace points (§1.6) |
| 06-implementation-phases.md | Phase 4 tasks (§6.5), definition of done (§6.11.4) |
| 02-design-decisions.md | Consensus attribute schema (§2.4.4) |
Task 4.1: Instrument Consensus Round Start
Objective: Create a root span for each consensus round that captures the round's key parameters.
What to do:
- Edit src/xrpld/app/consensus/RCLConsensus.cpp:
  - In RCLConsensus::startRound() (or the Adaptor's startRound):
    - Create consensus.round span using the XRPL_TRACE_CONSENSUS macro
    - Set attributes:
      - xrpl.consensus.ledger.prev: previous ledger hash
      - xrpl.consensus.ledger.seq: target ledger sequence
      - xrpl.consensus.proposers: number of trusted proposers
      - xrpl.consensus.mode: "proposing" or "observing"
    - Store the span context for use by child spans in phase transitions
- Add a member to hold the current round trace context:
  - opentelemetry::context::Context currentRoundContext_ (guarded by #ifdef)
  - Updated at round start, used by phase transition spans
Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp
- src/xrpld/app/consensus/RCLConsensus.h (add context member)
Reference:
- 04-code-samples.md §4.5.2 — startRound instrumentation example
- 01-architecture-analysis.md §1.4 — Consensus round flow
Task 4.2: Instrument Phase Transitions
Objective: Create child spans for each consensus phase (open, establish, accept) to show timing breakdown.
What to do:
- Edit src/xrpld/app/consensus/RCLConsensus.cpp:
  - Identify where phase transitions occur (the Consensus<Adaptor> template drives this)
  - For each phase entry:
    - Create a span as a child of currentRoundContext_: consensus.phase.open, consensus.phase.establish, consensus.phase.accept
    - Set the xrpl.consensus.phase attribute
    - Add a phase.enter event at start and a phase.exit event at end
    - Record phase duration in milliseconds
  - In the onClose adaptor method:
    - Create consensus.ledger_close span
    - Set attributes: close_time, mode, transaction count in initial position
  - Note: The Consensus template class in src/xrpld/consensus/Consensus.h drives phase transitions. Phase 4a instruments directly in the template.
Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp
- Possibly src/xrpld/consensus/Consensus.h (for template-level phase tracking)
Reference:
- 04-code-samples.md §4.5.2 — phaseTransition instrumentation
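The enter/exit events and millisecond duration described in Task 4.2 map naturally onto an RAII helper. The sketch below is illustrative only: PhaseRecord and PhaseTimer are hypothetical stand-ins that record into a plain vector, not the project's SpanGuard or any OpenTelemetry type.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a span: keeps only what this sketch needs.
struct PhaseRecord
{
    std::string phase;                // value for xrpl.consensus.phase
    std::vector<std::string> events;  // phase.enter / phase.exit
    std::int64_t durationMs = -1;     // -1 until the phase exits
};

// Emits phase.enter on construction; on destruction emits phase.exit,
// records the elapsed milliseconds, and hands the record to the sink.
class PhaseTimer
{
public:
    PhaseTimer(std::string phase, std::vector<PhaseRecord>& sink)
        : sink_(sink), start_(std::chrono::steady_clock::now())
    {
        record_.phase = std::move(phase);
        record_.events.push_back("phase.enter");
    }

    ~PhaseTimer()
    {
        auto const elapsed = std::chrono::steady_clock::now() - start_;
        record_.durationMs = static_cast<std::int64_t>(
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed)
                .count());
        record_.events.push_back("phase.exit");
        sink_.push_back(std::move(record_));
    }

    PhaseTimer(PhaseTimer const&) = delete;
    PhaseTimer& operator=(PhaseTimer const&) = delete;

private:
    std::vector<PhaseRecord>& sink_;
    std::chrono::steady_clock::time_point start_;
    PhaseRecord record_;
};
```

A monotonic clock (steady_clock) is used so that wall-clock adjustments cannot produce negative phase durations. The timer does no blocking work, which matters for the "no impact on consensus timing" exit criterion.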
Task 4.3: Instrument Proposal Handling
Objective: Trace proposal send and receive to show validator coordination.
What to do:
- Edit src/xrpld/app/consensus/RCLConsensus.cpp:
  - In Adaptor::propose():
    - Create consensus.proposal.send span
    - Set attributes: xrpl.consensus.round (proposal sequence), proposal hash
    - Inject trace context into outgoing TMProposeSet::trace_context (from Phase 3 protobuf)
  - In Adaptor::peerProposal() (or wherever peer proposals are received):
    - Extract trace context from incoming TMProposeSet::trace_context
    - Create consensus.proposal.receive span as a child of the extracted context
    - Set attributes: xrpl.consensus.proposer (node ID), xrpl.consensus.round
  - In Adaptor::share(RCLCxPeerPos):
    - Create consensus.proposal.relay span for relaying peer proposals
Key modified files:
src/xrpld/app/consensus/RCLConsensus.cpp
Reference:
- 04-code-samples.md §4.5.2 — peerProposal instrumentation
- 02-design-decisions.md §2.4.4 — Consensus attribute schema
Task 4.4: Instrument Validation Handling
Objective: Trace validation send and receive to show ledger validation flow.
What to do:
- Edit src/xrpld/app/consensus/RCLConsensus.cpp (or the validation handler):
  - When sending our validation:
    - Create consensus.validation.send span
    - Set attributes: validated ledger hash, sequence, signing time
  - When receiving a peer validation:
    - Extract trace context from TMValidation::trace_context (if present)
    - Create consensus.validation.receive span
    - Set attributes: xrpl.consensus.validator (node ID), ledger hash
Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp
- src/xrpld/app/misc/NetworkOPs.cpp (if validation handling is here)
Task 4.5: Add Consensus-Specific Attributes
Objective: Enrich consensus spans with detailed attributes for debugging and analysis.
What to do:
- Review all consensus spans and ensure they include:
  - xrpl.consensus.ledger.seq: target ledger sequence number
  - xrpl.consensus.round: consensus round number
  - xrpl.consensus.mode: proposing/observing/wrongLedger
  - xrpl.consensus.phase: current phase name
  - xrpl.consensus.phase_duration_ms: time spent in phase
  - xrpl.consensus.proposers: number of trusted proposers
  - xrpl.consensus.tx_count: transactions in proposed set
  - xrpl.consensus.disputes: number of disputed transactions
  - xrpl.consensus.converge_percent: convergence percentage
Key modified files:
src/xrpld/app/consensus/RCLConsensus.cpp
Task 4.6: Correlate Transaction and Consensus Traces
Objective: Link transaction traces from Phase 3 with consensus traces so you can follow a transaction from submission through consensus into the ledger.
What to do:
- In onClose() or onAccept():
  - When building the consensus position, link the round span to individual transaction spans using span links (if the OTel SDK supports it) or events
  - At minimum, record the transaction hashes included in the consensus set as span events: tx.included with an xrpl.tx.hash attribute
- In processTransactionSet() (NetworkOPs):
  - If the consensus round span context is available, create child spans for each transaction applied to the ledger
Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp
- src/xrpld/app/misc/NetworkOPs.cpp
Task 4.7: Build Verification and Testing
Objective: Verify all Phase 4 changes compile and don't affect consensus timing.
What to do:
- Build with telemetry=ON: verify no compilation errors
- Build with telemetry=OFF: verify no regressions (critical for consensus code)
- Run existing consensus-related unit tests
- Verify that all macros expand to no-ops when disabled
- Check that no consensus-critical code paths are affected by instrumentation overhead
Verification Checklist:
- Build succeeds with telemetry ON
- Build succeeds with telemetry OFF
- Existing consensus tests pass
- No new includes in consensus headers when telemetry is OFF
- Phase timing instrumentation doesn't use blocking operations
Summary
| Task | Description | New Files | Modified Files | Depends On |
|---|---|---|---|---|
| 4.1 | Consensus round start instrumentation | 0 | 2 | Phase 3 |
| 4.2 | Phase transition instrumentation | 0 | 1-2 | 4.1 |
| 4.3 | Proposal handling instrumentation | 0 | 1 | 4.1 |
| 4.4 | Validation handling instrumentation | 0 | 1-2 | 4.1 |
| 4.5 | Consensus-specific attributes | 0 | 1 | 4.2, 4.3, 4.4 |
| 4.6 | Transaction-consensus correlation | 0 | 2 | 4.2, Phase 3 |
| 4.7 | Build verification and testing | 0 | 0 | 4.1-4.6 |
Parallel work: Tasks 4.2, 4.3, and 4.4 can run in parallel after 4.1 is complete. Task 4.5 depends on all three. Task 4.6 depends on 4.2 and Phase 3.
Implemented Spans
| Span Name | Method | Key Attributes |
|---|---|---|
| consensus.proposal.send | Adaptor::propose | xrpl.consensus.round |
| consensus.ledger_close | Adaptor::onClose | xrpl.consensus.ledger.seq, xrpl.consensus.mode |
| consensus.accept | Adaptor::onAccept | xrpl.consensus.proposers, xrpl.consensus.round_time_ms |
| consensus.accept.apply | Adaptor::doAccept | xrpl.consensus.close_time, close_time_correct, close_resolution_ms, state, proposing, round_time_ms, ledger.seq |
| consensus.validation.send | Adaptor::onAccept (via validate) | xrpl.consensus.proposing |
Close Time Attributes (consensus.accept.apply)
The consensus.accept.apply span captures ledger close time agreement details
driven by avCT_CONSENSUS_PCT (75% validator agreement threshold):
- xrpl.consensus.close_time: Agreed-upon ledger close time (epoch seconds). When validators disagree (consensusCloseTime == epoch), this is synthetically set to prevCloseTime + 1s.
- xrpl.consensus.close_time_correct: true if validators reached agreement, false if they "agreed to disagree" (close time forced to prev+1s).
- xrpl.consensus.close_resolution_ms: Rounding granularity for close time (starts at 30s, decreases as ledger interval stabilizes).
- xrpl.consensus.state: "finished" (normal) or "moved_on" (consensus failed, adopted best available).
- xrpl.consensus.proposing: Whether this node was proposing.
- xrpl.consensus.round_time_ms: Total consensus round duration.
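The relationship between close_time and close_time_correct can be sketched as a small pure function. This is a minimal illustration under stated assumptions: CloseTimeAttrs and closeTimeAttrs are hypothetical names, times are plain second counts, and a zero consensusCloseTime stands in for the "epoch" (no agreement) value described above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Seconds = std::chrono::seconds;

// Hypothetical carrier for the two span attributes.
struct CloseTimeAttrs
{
    std::int64_t closeTime;  // xrpl.consensus.close_time (epoch seconds)
    bool closeTimeCorrect;   // xrpl.consensus.close_time_correct
};

// Assumption: consensusCloseTime == 0 means validators did not agree.
inline CloseTimeAttrs
closeTimeAttrs(Seconds consensusCloseTime, Seconds prevCloseTime)
{
    if (consensusCloseTime == Seconds{0})
    {
        // "Agreed to disagree": force the close time to prev + 1s.
        auto const forced = prevCloseTime + Seconds{1};
        return {static_cast<std::int64_t>(forced.count()), false};
    }
    return {static_cast<std::int64_t>(consensusCloseTime.count()), true};
}
```

For example, closeTimeAttrs(Seconds{0}, Seconds{100}) yields a close time of 101 with close_time_correct set to false, while any non-zero agreed time is passed through unchanged.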
Exit Criteria (from 06-implementation-phases.md §6.11.4):
- Complete consensus round traces
- Phase transitions visible
- Proposals and validations traced
- Close time agreement tracked (per avCT_CONSENSUS_PCT)
- No impact on consensus timing
Phase 4a: Establish-Phase Gap Fill & Cross-Node Correlation
Goal: Fill tracing gaps in the consensus establish phase (disputes, convergence, threshold escalation, mode changes) and establish cross-node correlation using a deterministic shared trace ID derived from previousLedger.id().

Approach: Direct instrumentation in Consensus.h. The generic consensus template has full access to internal state (convergePercent_, result_->disputes, mode_, threshold logic). Telemetry access comes via a single new adaptor method, getTelemetry(). Long-lived spans (round, establish) are stored as class members using SpanGuard directly, NOT the XRPL_TRACE_* convenience macros (which create local variables named _xrpl_guard_). Short-lived scoped spans (update_positions, check) can use the macros. All code compiles to no-ops when XRPL_ENABLE_TELEMETRY is not defined.

Branch: pratik/otel-phase4-consensus-tracing
Design: Switchable Correlation Strategy
Two strategies for cross-node trace correlation, switchable via config:
Strategy A — Deterministic Trace ID (Default)
Derive trace_id = SHA256(previousLedger.id())[0:16] so all nodes in the same
consensus round share the same trace_id without P2P context propagation.
- Pros: All nodes appear in the same trace in Tempo/Jaeger automatically. No collector-side post-processing needed.
- Cons: Overrides OTel's random trace_id generation; requires a custom IdGenerator or manual span context construction.
Strategy B — Attribute-Based Correlation
Use normal random trace_id but attach xrpl.consensus.ledger_id as an attribute
on every consensus span. Correlation happens at query time via Tempo/Grafana
by attribute queries.
- Pros: Standard OTel trace_id semantics; no SDK customization.
- Cons: Cross-node correlation requires query-time joins, not automatic.
Config
[telemetry]
# "deterministic" (default) or "attribute"
consensus_trace_strategy=deterministic
Implementation
In RCLConsensus::Adaptor::startRound():
- If deterministic:
  - Compute trace_id_bytes = SHA256(prevLedgerID)[0:16]
  - Construct opentelemetry::trace::TraceId(trace_id_bytes)
  - Create a synthetic SpanContext with this trace_id and a random span_id:

        auto traceId = opentelemetry::trace::TraceId(trace_id_bytes);
        auto spanId = opentelemetry::trace::SpanId(random_8_bytes);
        auto syntheticCtx = opentelemetry::trace::SpanContext(
            traceId, spanId, opentelemetry::trace::TraceFlags(1), false);

  - Wrap in opentelemetry::context::Context via opentelemetry::trace::SetSpan(context, syntheticSpan)
  - Call startSpan("consensus.round", parentContext) so the new span inherits the deterministic trace_id.
- If attribute: start a normal consensus.round span and set xrpl.consensus.ledger_id = previousLedger.id() as an attribute.
Both strategies always set xrpl.consensus.round_id (round number) and
xrpl.consensus.ledger_id (previous ledger hash) as attributes.
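The [0:16] step of Strategy A can be illustrated without any OpenTelemetry dependency. The sketch below only shows the truncation and hex rendering of a 32-byte digest; whether that digest is the ledger ID itself or a SHA-256 of it is left to the caller. All names here (Digest256, TraceIdBytes, truncateToTraceId, isUsableTraceId, toHex) are hypothetical and not part of the codebase.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using Digest256 = std::array<std::uint8_t, 32>;     // 256-bit hash
using TraceIdBytes = std::array<std::uint8_t, 16>;  // 128-bit trace ID

// Keep the first 16 bytes of the digest, i.e. digest[0:16].
inline TraceIdBytes truncateToTraceId(Digest256 const& digest)
{
    TraceIdBytes id{};
    std::copy_n(digest.begin(), id.size(), id.begin());
    return id;
}

// An all-zero trace ID is treated as invalid by the W3C Trace Context
// format, so a caller may want to fall back to a random ID in that case.
inline bool isUsableTraceId(TraceIdBytes const& id)
{
    return std::any_of(
        id.begin(), id.end(), [](std::uint8_t b) { return b != 0; });
}

// Lowercase hex rendering, as trace IDs appear in tracing backends.
inline std::string toHex(TraceIdBytes const& id)
{
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(id.size() * 2);
    for (std::uint8_t b : id)
    {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}
```

Because every node feeds the same previous-ledger digest into the same truncation, each node derives an identical trace ID for the round without exchanging any context over the wire.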
Design: Span Hierarchy
consensus.round (root — created in RCLConsensus::startRound, closed at accept)
│ link → previous round's SpanContext (follows-from)
│
├── consensus.establish (phaseEstablish → acceptance, in Consensus.h)
│ ├── consensus.update_positions (each updateOurPositions call)
│ │ └── consensus.dispute.resolve (per-tx dispute resolution event)
│ ├── consensus.check (each haveConsensus call)
│ └── consensus.mode_change (short-lived span in adaptor on mode transition)
│
├── consensus.accept (existing onAccept span — reparented under round)
│
└── consensus.validation.send (existing — reparented, follows-from link to round)
Span Links (follows-from relationships)
| Link Source | Link Target | Rationale |
|---|---|---|
| consensus.round (N+1) | consensus.round (N) | Causal chain: round N+1 exists because round N accepted |
| consensus.validation.send | consensus.round | Validation follows from the round that produced it; may outlive the round span |
| (Phase 4b) Received proposal processing | Sender's consensus.round | Cross-node causal link via P2P context propagation |
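The round-to-round chain in the first table row amounts to remembering the previous round's context and attaching it as a link when the next round starts. A minimal model, using hypothetical stand-in types (SpanContext, RoundSpan, RoundChain) rather than the OpenTelemetry classes:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified span context: just two identifiers.
struct SpanContext
{
    std::uint64_t traceId;
    std::uint64_t spanId;
};

inline bool operator==(SpanContext const& a, SpanContext const& b)
{
    return a.traceId == b.traceId && a.spanId == b.spanId;
}

// A round span plus an optional follows-from link to the previous round.
struct RoundSpan
{
    SpanContext context;
    std::optional<SpanContext> followsFrom;
};

class RoundChain
{
public:
    // Start round N+1: link to round N if one exists, then remember the
    // new context so the next round can link back to this one.
    RoundSpan startRound(SpanContext context)
    {
        RoundSpan span{context, prevRoundContext_};
        prevRoundContext_ = context;
        return span;
    }

private:
    std::optional<SpanContext> prevRoundContext_;
};
```

The first round after startup has no link; every later round links to its predecessor. Only the small context value is retained, so the previous round's span can end and be exported without being kept alive for the link.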
Task 4a.0: Prerequisites — Extend SpanGuard and Telemetry APIs
Objective: Add missing API surface needed by later tasks.
What to do:
- Add SpanGuard::addEvent() with attributes (needed by Task 4a.5): The current addEvent(string_view name) only accepts a name. Add an overload that accepts key-value attributes:

        void addEvent(std::string_view name,
            std::initializer_list<
                std::pair<opentelemetry::nostd::string_view,
                          opentelemetry::common::AttributeValue>> attributes)
        {
            span_->AddEvent(std::string(name), attributes);
        }

- Add a Telemetry::startSpan() overload that accepts span links (needed by Tasks 4a.2, 4a.8): The current startSpan() has no span link support. Add an overload that accepts a vector of SpanContext links for follows-from relationships:

        virtual opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span>
        startSpan(
            std::string_view name,
            opentelemetry::context::Context const& parentContext,
            std::vector<opentelemetry::trace::SpanContext> const& links,
            opentelemetry::trace::SpanKind kind =
                opentelemetry::trace::SpanKind::kInternal) = 0;

- Add XRPL_TRACE_ADD_EVENT macro (needed by Task 4a.5): Add to TracingInstrumentation.h to expose addEvent(name, attrs) through the macro interface (consistent with the XRPL_TRACE_SET_ATTR pattern). The body is wrapped in do { } while (0) so the macro behaves as a single statement inside if/else:

        #ifdef XRPL_ENABLE_TELEMETRY
        #define XRPL_TRACE_ADD_EVENT(name, ...)                \
            do                                                 \
            {                                                  \
                if (_xrpl_guard_.has_value())                  \
                {                                              \
                    _xrpl_guard_->addEvent(name, __VA_ARGS__); \
                }                                              \
            } while (0)
        #else
        #define XRPL_TRACE_ADD_EVENT(name, ...) ((void)0)
        #endif
Key modified files:
- include/xrpl/telemetry/SpanGuard.h: add addEvent() overload
- include/xrpl/telemetry/Telemetry.h: add startSpan() with links
- src/xrpld/telemetry/Telemetry.cpp: implement new overload
- src/xrpld/telemetry/NullTelemetry.cpp: no-op implementation
- src/xrpld/telemetry/TracingInstrumentation.h: add XRPL_TRACE_ADD_EVENT macro
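The reason an event macro is usually wrapped in do { } while (0) shows up as soon as it is used as the body of an if that has an else. The sketch below is self-contained and hypothetical: FakeGuard stands in for SpanGuard, and SKETCH_TRACE_ADD_EVENT takes the guard explicitly instead of relying on a local named _xrpl_guard_.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SpanGuard: just collects event names.
struct FakeGuard
{
    std::vector<std::string> events;
    void addEvent(std::string name) { events.push_back(std::move(name)); }
};

// do/while(0) turns the expansion into exactly one statement that still
// requires the caller's trailing semicolon.
#define SKETCH_TRACE_ADD_EVENT(guard, name) \
    do                                      \
    {                                       \
        if ((guard).has_value())            \
        {                                   \
            (guard)->addEvent(name);        \
        }                                   \
    } while (0)

// Uses the macro as the unbraced body of an if that has an else branch.
inline std::string classify(std::optional<FakeGuard>& guard, bool changed)
{
    if (changed)
        SKETCH_TRACE_ADD_EVENT(guard, "dispute.resolve");
    else
        return "unchanged";
    return "changed";
}
```

If the macro expanded to a bare if (...) { ... } instead, the caller's semicolon would end the outer if before the else, and classify() would fail to compile. Without the braces, the else would silently bind to the macro's inner if.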
Task 4a.1: Adaptor getTelemetry() Method
Objective: Give Consensus.h access to the telemetry subsystem without
coupling the generic template to OTel headers.
What to do:
- Add a getTelemetry() method to the Adaptor concept (returns xrpl::telemetry::Telemetry&). The return type is already forward-declared behind #ifdef XRPL_ENABLE_TELEMETRY.
- Implement in RCLConsensus::Adaptor: delegates to app_.getTelemetry().
- In Consensus.h, the XRPL_TRACE_* macros call adaptor_.getTelemetry(). When telemetry is disabled, the macros expand to ((void)0) and the method is never called.

Key modified files:
- src/xrpld/app/consensus/RCLConsensus.h: declare getTelemetry()
- src/xrpld/app/consensus/RCLConsensus.cpp: implement getTelemetry()
Task 4a.2: Switchable Round Span with Deterministic Trace ID
Objective: Create a consensus.round root span in startRound() that uses
the switchable correlation strategy. Store span context as a member for child
spans in Consensus.h.
What to do:
-
In
RCLConsensus::Adaptor::startRound()(or a new helper):- Read
consensus_trace_strategyfrom config. - Deterministic: compute
trace_id = SHA256(prevLedgerID)[0:16]. Construct aSpanContextwith this trace_id, then startconsensus.roundspan as child of that context. - Attribute: start normal
consensus.roundspan. - Set attributes on both:
xrpl.consensus.round_id,xrpl.consensus.ledger_id,xrpl.consensus.ledger.seq,xrpl.consensus.mode. - Store the round span in
Consensusas a member (see Task 4a.3). - If a previous round's span context is available, add a span link (follows-from) to establish the round chain.
- Read
-
Add
createDeterministicTraceId(hash)utility toinclude/xrpl/telemetry/Telemetry.h(returns 16-byte trace ID from a 256-bit hash by truncation). -
Add
consensus_trace_strategytoTelemetry::SetupandTelemetryConfig.cppparser:/** Cross-node correlation strategy: "deterministic" or "attribute". */ std::string consensusTraceStrategy = "deterministic";
Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp
- include/xrpl/telemetry/Telemetry.h: createDeterministicTraceId()
- src/xrpld/telemetry/TelemetryConfig.cpp: parse new config option
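Parsing the consensus_trace_strategy option can be reduced to a total function that never rejects input, with anything other than the two known values mapping to the default. The sketch below is one possible shape, not the actual TelemetryConfig.cpp code; TraceStrategy, parseTraceStrategy, and toString are hypothetical names.

```cpp
#include <cassert>
#include <string_view>

// The two correlation strategies described in the design section.
enum class TraceStrategy { Deterministic, Attribute };

// Unknown or empty values fall back to the default, "deterministic".
constexpr TraceStrategy parseTraceStrategy(std::string_view value)
{
    if (value == "attribute")
        return TraceStrategy::Attribute;
    return TraceStrategy::Deterministic;
}

// Value suitable for the xrpl.consensus.trace_strategy span attribute.
constexpr std::string_view toString(TraceStrategy s)
{
    return s == TraceStrategy::Attribute ? "attribute" : "deterministic";
}

static_assert(
    parseTraceStrategy("attribute") == TraceStrategy::Attribute);
static_assert(
    parseTraceStrategy("typo") == TraceStrategy::Deterministic);
```

Converting the string to an enum once at startup keeps string comparisons out of startRound(). It also means a misconfigured value cannot leave the node without a correlation strategy.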
Task 4a.3: Span Members in Consensus.h
Objective: Add span storage to the Consensus class so that spans created
in startRound() (adaptor) are accessible from phaseEstablish(),
updateOurPositions(), and haveConsensus() (template methods).
What to do:
- Add to Consensus private members (guarded by #ifdef XRPL_ENABLE_TELEMETRY):

        #ifdef XRPL_ENABLE_TELEMETRY
        std::optional<xrpl::telemetry::SpanGuard> roundSpan_;
        std::optional<xrpl::telemetry::SpanGuard> establishSpan_;
        opentelemetry::context::Context prevRoundContext_;
        #endif

  - roundSpan_ is created in startRound() via the adaptor and stored. Its SpanGuard::Scope member keeps the span active on the thread context for the entire round lifetime.
  - establishSpan_ is created when entering phaseEstablish and cleared on accept. It becomes a child of roundSpan_ via OTel's thread-local context propagation.
  - prevRoundContext_ stores the previous round's context for follows-from links.

  Threading assumption: startRound(), phaseEstablish(), updateOurPositions(), and haveConsensus() all run on the same thread (the consensus job queue thread). This is required for the SpanGuard::Scope-based parent-child hierarchy to work. The Consensus class documentation confirms it is NOT thread-safe and calls are serialized by the application.

- Add a conditional include at the top of Consensus.h:

        #ifdef XRPL_ENABLE_TELEMETRY
        #include <xrpl/telemetry/SpanGuard.h>
        #include <xrpld/telemetry/TracingInstrumentation.h>
        #endif
Key modified files:
src/xrpld/consensus/Consensus.h
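The lifetime pattern for a long-lived member span (created on the first phaseEstablish() call, ended on accept) can be sketched with std::optional and a stand-in span type. FakeSpan and EstablishTracker below are hypothetical; the real member would hold a SpanGuard.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SpanGuard: logs start/end instead of exporting.
class FakeSpan
{
public:
    FakeSpan(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log)
    {
        log_.push_back("start " + name_);
    }

    ~FakeSpan() { log_.push_back("end " + name_); }

    FakeSpan(FakeSpan const&) = delete;
    FakeSpan& operator=(FakeSpan const&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

class EstablishTracker
{
public:
    explicit EstablishTracker(std::vector<std::string>& log) : log_(log) {}

    // Called repeatedly during the establish phase; the span is created
    // in place only on the first call.
    void phaseEstablish()
    {
        if (!establishSpan_)
            establishSpan_.emplace("consensus.establish", log_);
    }

    // Transition to accept: destroying the guard ends the span.
    void onAccept() { establishSpan_.reset(); }

private:
    std::vector<std::string>& log_;
    std::optional<FakeSpan> establishSpan_;
};
```

Using emplace() constructs the guard directly inside the optional, so the stand-in needs neither copy nor move support. Calling reset() gives an explicit, deterministic end point for the span.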
Task 4a.4: Instrument phaseEstablish()
Objective: Create consensus.establish span wrapping the establish phase,
with attributes for convergence progress.
What to do:
- At the start of phaseEstablish() (line 1298), if establishSpan_ is not yet created, create it as a child of roundSpan_ using the direct API (NOT the XRPL_TRACE_CONSENSUS macro, which creates a local variable):

        #ifdef XRPL_ENABLE_TELEMETRY
        if (!establishSpan_ && adaptor_.getTelemetry().shouldTraceConsensus())
        {
            establishSpan_.emplace(
                adaptor_.getTelemetry().startSpan("consensus.establish"));
        }
        #endif

- Set attributes on each call:
  - xrpl.consensus.converge_percent: convergePercent_
  - xrpl.consensus.establish_count: establishCounter_
  - xrpl.consensus.proposers: currPeerPositions_.size()
- On phase exit (transition to accept), close the establish span and record the final duration.

Key modified files:
- src/xrpld/consensus/Consensus.h: phaseEstablish() method
Task 4a.5: Instrument updateOurPositions()
Objective: Trace each position update cycle including dispute resolution details.
What to do:
- At the start of updateOurPositions() (line 1418), create a scoped child span. This method is called and returns within a single phaseEstablish() call, so the XRPL_TRACE_CONSENSUS macro works here (scoped local):

        XRPL_TRACE_CONSENSUS(adaptor_.getTelemetry(), "consensus.update_positions");

- Set attributes:
  - xrpl.consensus.disputes_count: result_->disputes.size()
  - xrpl.consensus.converge_percent: current convergence
  - xrpl.consensus.proposers_agreed: count of peers with the same position
  - xrpl.consensus.proposers_total: total peer positions
- Inside the dispute resolution loop, for each dispute that changes our vote, add an event with attributes using XRPL_TRACE_ADD_EVENT (from Task 4a.0):

        XRPL_TRACE_ADD_EVENT("dispute.resolve", {
            {"xrpl.tx.id", std::string(tx_id)},
            {"xrpl.dispute.our_vote", our_vote},
            {"xrpl.dispute.yays", static_cast<int64_t>(yays)},
            {"xrpl.dispute.nays", static_cast<int64_t>(nays)}
        });

Key modified files:
- src/xrpld/consensus/Consensus.h: updateOurPositions() method
Task 4a.6: Instrument haveConsensus() (Threshold & Convergence)
Objective: Trace consensus checking including threshold escalation
(ConsensusParms::AvalancheState::{init, mid, late, stuck}).
What to do:
- At the start of haveConsensus() (line 1598), create a scoped child span:

        XRPL_TRACE_CONSENSUS(adaptor_.getTelemetry(), "consensus.check");

- Set attributes:
  - xrpl.consensus.agree_count: peers that agree with our position
  - xrpl.consensus.disagree_count: peers that disagree
  - xrpl.consensus.converge_percent: convergence percentage
  - xrpl.consensus.result: ConsensusState result (Yes/No/MovedOn)
- The free function checkConsensus() in Consensus.cpp (line 151) determines thresholds based on currentAgreeTime. Threshold values come from ConsensusParms::avalancheCutoffs (defined in ConsensusParms.h). The escalation states are ConsensusParms::AvalancheState::{init, mid, late, stuck}. Record the effective threshold as an attribute on the span:
  - xrpl.consensus.threshold_percent: current threshold from avalancheCutoffs

Key modified files:
- src/xrpld/consensus/Consensus.h: haveConsensus() method
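The threshold_percent and result attributes are both small enum-to-value mappings. The sketch below pairs the four escalation states with the four percentages listed for xrpl.consensus.threshold_percent (50/65/70/95) in the order given; that pairing is an assumption of this sketch, and the real values should be read from ConsensusParms::avalancheCutoffs. The enums here are local stand-ins, not the project's types.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Local stand-in for ConsensusParms::AvalancheState.
enum class AvalancheState { init, mid, late, stuck };

// Assumed pairing of states to thresholds (see lead-in).
constexpr std::int64_t thresholdPercent(AvalancheState s)
{
    switch (s)
    {
        case AvalancheState::init:
            return 50;
        case AvalancheState::mid:
            return 65;
        case AvalancheState::late:
            return 70;
        case AvalancheState::stuck:
            return 95;
    }
    return 95;  // unreachable for valid enumerators
}

// Local stand-in covering the three results named in this task.
enum class CheckResult { No, Yes, MovedOn };

// String form used for the xrpl.consensus.result attribute.
constexpr std::string_view resultAttr(CheckResult r)
{
    switch (r)
    {
        case CheckResult::Yes:
            return "yes";
        case CheckResult::MovedOn:
            return "moved_on";
        case CheckResult::No:
            return "no";
    }
    return "no";  // unreachable for valid enumerators
}
```

Recording the threshold as an integer rather than the state name lets dashboards plot escalation over the life of a round without a lookup table.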
Task 4a.7: Instrument Mode Changes
Objective: Trace consensus mode transitions (proposing ↔ observing, wrongLedger, switchedLedger).
What to do:
Mode changes are rare (typically 0-1 per round), so a standalone short-lived span is appropriate (not an event). This captures timing of the mode change itself.
- In RCLConsensus::Adaptor::onModeChange(), create a scoped span:

        XRPL_TRACE_CONSENSUS(app_.getTelemetry(), "consensus.mode_change");
        XRPL_TRACE_SET_ATTR("xrpl.consensus.mode.old", to_string(before).c_str());
        XRPL_TRACE_SET_ATTR("xrpl.consensus.mode.new", to_string(after).c_str());

- Note: MonitoredMode::set() (line 304 in Consensus.h) calls adaptor_.onModeChange(before, after), so the span is created in the adaptor, which already has telemetry access. No instrumentation is needed in Consensus.h for this task.

Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp: onModeChange()
Task 4a.8: Reparent Existing Spans Under Round
Objective: Make existing consensus spans (consensus.accept,
consensus.accept.apply, consensus.validation.send) children of the
consensus.round root span instead of being standalone.
What to do:
- The existing spans in onAccept(), doAccept(), and validate() use XRPL_TRACE_CONSENSUS(app_.getTelemetry(), ...), which creates standalone spans on the current thread's context.
- After Task 4a.2 creates the round span and stores it, these methods run on the same thread within the round span's scope, so they automatically become children. Verify this works correctly.
- For consensus.validation.send: add a span link (follows-from) to the round span context, since the validation may be processed after the round completes.

Key modified files:
- src/xrpld/app/consensus/RCLConsensus.cpp: verify parent-child hierarchy
Task 4a.9: Build Verification and Testing
Objective: Verify all Phase 4a changes compile cleanly with telemetry ON and OFF, and don't affect consensus timing.
What to do:
- Build with telemetry=ON: verify no compilation errors
- Build with telemetry=OFF: verify macros expand to no-ops and no new includes leak into Consensus.h when disabled
- Run existing consensus unit tests
- Verify #ifdef XRPL_ENABLE_TELEMETRY guards on all new members in Consensus.h
- Run pccl pre-commit checks
Verification Checklist:
- Build succeeds with telemetry ON
- Build succeeds with telemetry OFF
- Existing consensus tests pass
- Consensus.h has zero OTel includes when telemetry is OFF
- No new virtual calls in hot consensus paths
- pccl passes
Phase 4a Summary
| Task | Description | New Files | Modified Files | Depends On |
|---|---|---|---|---|
| 4a.0 | Prerequisites: extend SpanGuard & Telemetry APIs | 0 | 4 | Phase 4 |
| 4a.1 | Adaptor getTelemetry() method | 0 | 2 | Phase 4 |
| 4a.2 | Switchable round span with deterministic traceID | 0 | 3 | 4a.0, 4a.1 |
| 4a.3 | Span members in Consensus.h | 0 | 1 | 4a.1 |
| 4a.4 | Instrument phaseEstablish() | 0 | 1 | 4a.3 |
| 4a.5 | Instrument updateOurPositions() | 0 | 1 | 4a.0, 4a.3 |
| 4a.6 | Instrument haveConsensus() (thresholds) | 0 | 1 | 4a.3 |
| 4a.7 | Instrument mode changes | 0 | 1 | 4a.1 |
| 4a.8 | Reparent existing spans under round | 0 | 1 | 4a.0, 4a.2 |
| 4a.9 | Build verification and testing | 0 | 0 | 4a.0-4a.8 |
Parallel work: Tasks 4a.0 and 4a.1 can run in parallel. Tasks 4a.4, 4a.5, 4a.6, and 4a.7 can run in parallel after 4a.3 (and 4a.0 for 4a.5).
New Spans (Phase 4a)
| Span Name | Location | Key Attributes |
|---|---|---|
| consensus.round | RCLConsensus.cpp | round_id, ledger_id, ledger.seq, mode; link → prev round |
| consensus.establish | Consensus.h | converge_percent, establish_count, proposers |
| consensus.update_positions | Consensus.h | disputes_count, converge_percent, proposers_agreed, proposers_total |
| consensus.check | Consensus.h | agree_count, disagree_count, converge_percent, result, threshold_percent |
| consensus.mode_change | RCLConsensus.cpp | mode.old, mode.new |
New Events (Phase 4a)
| Event Name | Parent Span | Attributes |
|---|---|---|
| dispute.resolve | consensus.update_positions | tx_id, our_vote, yays, nays |
New Attributes (Phase 4a)
// Round-level (on consensus.round)
"xrpl.consensus.round_id" = int64 // Consensus round number
"xrpl.consensus.ledger_id" = string // previousLedger.id() hash
"xrpl.consensus.trace_strategy" = string // "deterministic" or "attribute"
// Establish-level
"xrpl.consensus.converge_percent" = int64 // Convergence % (0-100+)
"xrpl.consensus.establish_count" = int64 // Number of establish iterations
"xrpl.consensus.disputes_count" = int64 // Active disputes
"xrpl.consensus.proposers_agreed" = int64 // Peers agreeing with us
"xrpl.consensus.proposers_total" = int64 // Total peer positions
"xrpl.consensus.agree_count" = int64 // Peers that agree (haveConsensus)
"xrpl.consensus.disagree_count" = int64 // Peers that disagree
"xrpl.consensus.threshold_percent" = int64 // Current threshold (50/65/70/95)
"xrpl.consensus.result" = string // "yes", "no", "moved_on"
// Mode change
"xrpl.consensus.mode.old" = string // Previous mode
"xrpl.consensus.mode.new" = string // New mode
Implementation Notes
- Separation of concerns: All non-trivial telemetry code is extracted to private helpers (startRoundTracing, createValidationSpan, startEstablishTracing, updateEstablishTracing, endEstablishTracing). Business logic methods contain only single-line #ifdef blocks calling these helpers.
- Thread safety: createValidationSpan() runs on the jtACCEPT worker thread. Instead of accessing roundSpan_ across threads, a roundSpanContext_ snapshot (lightweight SpanContext value type) is captured on the consensus thread in startRoundTracing() and read by createValidationSpan(). The job queue provides the happens-before guarantee.
- Macro safety: XRPL_TRACE_ADD_EVENT uses do { } while (0) to prevent dangling-else issues.
- Config validation: consensus_trace_strategy is validated to be either "deterministic" or "attribute", falling back to "deterministic" for unrecognised values.
- Plan deviation: roundSpan_ is stored in RCLConsensus::Adaptor (not Consensus.h) because the adaptor has access to telemetry config and can implement the deterministic trace ID strategy. establishSpan_ is in Consensus.h as planned.
Phase 4b: Cross-Node Propagation (Future — Documentation Only)
Goal: Wire TraceContextPropagator for P2P messages so that proposals and validations carry trace context between nodes. This enables true distributed tracing where a proposal sent by Node A creates a child span on Node B.

Status: NOT IMPLEMENTED. The protobuf fields and propagator class exist but are not wired. This section documents the design for future work.
Architecture
Node A (proposing) Node B (receiving)
───────────────── ──────────────────
consensus.round consensus.round
├── propose() ├── peerProposal()
│ └── TraceContextPropagator │ └── TraceContextPropagator
│ ::injectToProtobuf( │ ::extractFromProtobuf(
│ TMProposeSet.trace_context) │ TMProposeSet.trace_context)
│ │ └── span link → Node A's context
└── validate() └── onValidation()
└── inject into TMValidation └── extract from TMValidation
Wiring Points
| Message | Inject Location | Extract Location | Protobuf Field |
|---|---|---|---|
| TMProposeSet | Adaptor::propose() | PeerImp::onMessage(TMProposeSet) | field 1001: TraceContext |
| TMValidation | Adaptor::validate() | PeerImp::onMessage(TMValidation) | field 1001: TraceContext |
| TMTransaction | NetworkOPs::processTransaction() | PeerImp::onMessage(TMTransaction) | field 1001: TraceContext |
Span Link Semantics
Received messages use span links (follows-from), NOT parent-child:
- The receiver's processing span links to the sender's context
- This preserves each node's independent trace tree
- Cross-node correlation visible via linked traces in Tempo/Jaeger
Interaction with Deterministic Trace ID (Strategy A)
When using deterministic trace_id (Phase 4a default), cross-node spans already share the same trace_id. P2P propagation adds span-level linking:
- Without propagation: spans from different nodes appear in the same trace (same trace_id) but without parent-child or follows-from relationships.
- With propagation: spans have explicit links showing which proposal/validation from Node A caused processing on Node B.
Prerequisites
- Phase 4a (this task list) — establish phase tracing must be in place
- TraceContextPropagator class (already exists in include/xrpl/telemetry/TraceContextPropagator.h)
- Protobuf TraceContext message (already exists, field 1001)