SpanGuard::span() hardcoded SpanKind::kInternal for every span. Tempo's
service-graph and spanmetrics RED calculations rely on kServer /
kConsumer / kClient / kProducer to classify inbound vs outbound vs
internal operations. With kInternal everywhere, the service graph
collapses to a single self-loop and RED metrics attribute all latency
to internal work.
Add categoryToSpanKind() mapping:
- Rpc -> kServer (inbound synchronous request)
- Peer -> kConsumer (inbound async peer message)
- Transactions -> kInternal
- Consensus -> kInternal
- Ledger -> kInternal
Only the single-argument overload is affected; childSpan / linkedSpan
continue to default to kInternal because they represent in-process
continuations of an already-kinded parent.
NetworkOPs.h and SpanNames.h were only needed for per-span
nodeAmendmentBlocked/nodeServerState calls, which were removed
in the attr naming simplification. Fixes clang-tidy CI failure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop xrpl.pathfind.* prefix from per-span attrs (source_account,
dest_account, fast, search_level, num_complete_paths, num_paths,
num_requests).
- Keep xrpl.pathfind.ledger_index qualified (rule 5: distinct from
xrpl.ledger.seq).
- Remove per-span nodeAmendmentBlocked/nodeServerState calls from
RPCHandler — promoted to resource-level attrs.
- Mark node-health attrs in SpanNames.h as RESOURCE-ONLY with doc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes so gauges register in Prometheus (via StatsD) even when their
initial/steady-state value is 0:
1. StatsDGaugeImpl m_dirty: default-init to true so the initial value
(0) is emitted on the first flush. Previously, gauges whose value
never changed from 0 were never flushed and never appeared
downstream.
2. io_latency_sampler firstSample_: new atomic<bool>, init true.
m_event.notify now fires when either firstSample_ is true (exchanged
to false) or lastSample >= 10 ms. This guarantees the io_latency
metric is registered on startup; subsequent sub-10 ms samples are
still suppressed to avoid flooding.
Set defaults for tx_span::attr::suppressed (false) and
tx_span::attr::status ("new") immediately after creating the txReceive
span. Without defaults, spans whose suppressed/status attributes would
only be set in the HashRouter-suppressed branch lacked these attributes
entirely, producing incomplete span data in downstream stores.
The suppressed branch still overrides these when the transaction has
already been seen via HashRouter.
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture
Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp
Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
09-data-collection-reference.md and telemetry-runbook.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add [[nodiscard]] to getConsensusTraceStrategy, getYays, getNays
- Add missing <string>, SpanGuard.h, SpanNames.h includes
- Fix widening cast placement (cast before arithmetic, not after)
- Replace nested ternary with lambda for const dir variable
- Add braces to if/else-if chains in Consensus.h
- Concatenate nested namespaces in ConsensusSpanNames.h
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Concatenate nested namespaces in SpanNames.h, RpcSpanNames.h, GrpcSpanNames.h
- Remove unused InfoSub.h and NetworkOPs.h includes from RPCHandler.cpp
- Add missing <string_view> includes in RPCHandler.cpp and GRPCServer.cpp
- Replace nested ternary with if/else-if in RPCHandler.cpp
- Add IWYU pragma keep for json_body.h in ServerHandler.cpp
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clang-tidy misc-include-cleaner requires direct includes for all used
symbols. Application.cpp calls toBase58(TokenType::NodePublic, ...) at
line 1359 but did not directly include PublicKey.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add [[maybe_unused]] to RAII spans in TxQ.cpp
- Include Telemetry.h in RCLConsensus.cpp for complete type
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add cons_span::event namespace with disputeResolve and txIncluded
constants; replace hardcoded strings in Consensus.h and RCLConsensus.cpp
- Move proposal.receive and validation.receive spans in PeerImp into
shared_ptr captured by job lambdas so they measure checkPropose and
checkValidation timing, not just message parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining Phase 4/4a consensus tracing tasks:
- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()
Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 4-arg hashSpan overload was duplicated during a prior rebase
cascade — it appeared at both line 240 and line 305 in SpanGuard.cpp.
This would cause a linker error (multiple definition).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record the close time voting threshold and consensus state on
consensus.update_positions and consensus.check spans:
- xrpl.consensus.close_time_threshold: the avCT_CONSENSUS_PCT (75%)
threshold required for close time agreement
- xrpl.consensus.have_close_time_consensus: whether validators
reached close time consensus in this iteration
These attributes enable dashboards to show how the close time
voting process converges (or stalls) across consensus iterations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the span-replacement logic in startRoundTracing() that was
discarding the hash-derived round span and replacing it with a linked
span (which gets a random trace_id). The deterministic trace_id from
the ledger hash is the key feature enabling cross-node correlation —
replacing it broke correlation on all rounds after the first.
Also: use thread_local mt19937 for hashSpan() span IDs (same fix as
phase-3 txSpan), add Doxygen to establish tracing method declarations
in Consensus.h, and update SpanGuard.h diagram with hashSpan/addEvent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.
- Add TraceBytes struct and SpanGuard::getTraceBytes() for
extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
with sender's trace context as parent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- tx.receive span in PeerImp: convert to shared_ptr, capture in
checkTransaction lambda so it measures actual processing, not just
message parsing
- tx.process span in NetworkOPs: convert to shared_ptr, store in
TransactionStatus so it lives until the batch job processes the entry;
sync path unchanged (span destructs on function return)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace SpanGuard::txSpan(prefix, name, hash) with the generic
SpanGuard::hashSpan(TraceCategory, name, hash) that accepts a
TraceCategory parameter instead of hardcoding Transactions. This
enables reuse for consensus round spans (Phase 4) and any future
subsystem needing deterministic cross-node trace correlation via
hash-derived trace IDs.
Both overloads are replaced:
- hashSpan(cat, name, hash, size) — standalone with random span_id
- hashSpan(cat, name, hash, size, parentSpanId, parentSize, flags)
— with remote parent from protobuf context propagation
Add full span name constants (tx_span::receive, tx_span::process)
to TxSpanNames.h following the ConsensusSpanNames.h pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace thread_local mt19937 with xrpl::default_prng() for span ID
generation — uses the project's existing thread-local xor-shift engine.
One call yields a uint64_t (8 bytes), filling the span ID in a single
memcpy without loops.
Fix compilation failure when XRPL_ENABLE_TELEMETRY is not defined:
move xrpl.pb.h include outside the #ifdef guard in TxTracing.h since
protocol::TMTransaction is used unconditionally in the function
signature.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-call std::random_device with thread_local std::mt19937 in
txSpan() for span ID generation. random_device is ~423x slower due to
/dev/urandom syscalls on each construction; mt19937 is seeded once per
thread and reused for all subsequent span IDs.
Update the SpanGuard class ASCII diagram to include txSpan factory
methods that were added in the hash-derived trace ID commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>