Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive trace_id from txHash[0:16] so all nodes handling the same
transaction produce spans under the same trace. Protobuf span_id
propagation provides parent-child relay ordering when available.
- Add SpanGuard::txSpan() factory methods (hash-derived trace ID)
- Add TxTracing.h helpers: txReceiveSpan(), txProcessSpan()
- Update PeerImp and NetworkOPs to use the new helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move node health attribute strings to compile-time constants in
SpanNames.h (attr::nodeAmendmentBlocked, attr::nodeServerState)
- Add Tempo search filters for node health attributes
- Remove unnecessary .c_str() on strOperatingMode() return
- Add samplingRatio clamping test (values > 1.0 and < 0.0)
- Fix Task 2.3 status: delivered in Phase 1c, not Phase 2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace rpcSpan(), txSpan(), consensusSpan(), peerSpan(), ledgerSpan()
with a single span(TraceCategory, prefix, name) factory method. Adding
a new traceable subsystem now requires only a new enum value and one
switch case — no new methods or header changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SpanContext::isValid(): add inline no-op when XRPL_ENABLE_TELEMETRY
is not defined, preventing a linker error if called in that path
- linkedSpan(): set kIsRootSpanKey on the StartSpanOptions parent
context so linked spans start a genuinely independent sub-tree
instead of silently becoming children of the current active span
- Telemetry::instance_: use std::atomic with acquire/release ordering
to avoid a data race between start()/stop() and factory methods
called from worker threads
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types
from public headers. Add global Telemetry accessor so SpanGuard factory
methods work without explicit Telemetry references. Add child/linked
span creation and cross-thread context propagation. Update plan docs
to reflect macro removal in favor of SpanGuard factory pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add span discard mechanism that drops unwanted spans before they enter
the batch export queue, saving both network bandwidth and storage.
FilteringSpanProcessor is a custom SpanProcessor decorator that wraps
BatchSpanProcessor. SpanGuard::discard() sets a thread-local flag
(tl_discardCurrentSpan) before calling Span::End(). The OTel SDK calls
OnEnd() synchronously on the same thread, where the flag is checked and
cleared to drop the span.
New file: DiscardFlag.h — zero-dependency header for the thread-local
flag, avoiding transitive include bloat from Telemetry.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This change:
* Removes a set of unnecessary brackets in the initialization of an `std::uint32_t`.
* Fixes a couple of incorrect flags (same value, just wrong variables - so no amendment needed).
This change replaces all instances of `<variable> != tesSUCCESS` with `!isTesSuccess(<variable>)` and `<variable> == tesSUCCESS` with `isTesSuccess(<variable>)`.
This change:
* Makes `addSLE` in `DIDSet` a static function, instead of a free function.
* Renames `Attestation` to `Data` everywhere (an artifact of a previous name for the field).
* Actually runs a set of tests that were not included in the `run` function of `DID_test`.
This change:
* Introduces a new helper function on `STTx`, `getFeePayer`.
* Removes the usage of `mSourceBalance` and replaces it with SLE balance lookups.
* Renames `mPriorBalance` to `preFeeBalance_`
This simplifies some of the code in the transactors and makes it a lot more readable.
Throwing exceptions from code sometime confuses ASAN, as it cannot keep track of stack frames. This change therefore adds a macro to skip instrumentation around the `Throw` function.
Per [XLS-0095](https://xls.xrpl.org/xls/XLS-0095-rename-rippled-to-xrpld.html), we are taking steps to rename ripple(d) to xrpl(d). This change modifies the system name from `rippled` to `xrpld`.
The system name is used in limited places:
* When no explicit config file is passed via the `--config` flag, then the system name is used to construct the path where the config file and database may be stored, via the `$XDG_CONFIG_HOME` and `$XDG_DATA_HOME` directories, respectively.
* It is used in the metadata and user-agent as part of RPC calls.
* It is newly used in the full version string.
ASAN wasn't able to keep track of `boost::coroutine` context switches, and would lead to many false positives being detected. By switching to `boost::coroutine2` and `ucontext`, ASAN is able to know about the context switches advertised by the `boost::fiber` class, which in turn leads to more cleaner ASAN analysis.
The `HTTPClient` class initializes a global SSL context via `initializeSSLContext()`. However, it had no way to release it, which caused memory leaks flagged by the LeakSanitizer. Multiple LSAN suppressions in the sanitizers' suppressions file were masking these leaks. Our test code also manually called `initializeSSLContext()` in each test without guaranteed cleanup on failure paths.
This change fixes these memory leaks by adding a `cleanupSSLContext()` method to properly release the global SSL context, and removes the corresponding LSAN suppressions. The change further refactors the `HTTPClient` tests to use a Google Test fixture (`HTTPClientTest`) that manages the SSL context lifecycle via RAII (SetUp/TearDown), making it impossible for tests to leak the context.