Replace thread_local mt19937 with xrpl::default_prng() for span ID
generation — uses the project's existing thread-local xor-shift engine.
One call yields a uint64_t (8 bytes), filling the span ID in a single
memcpy without loops.
Fix compilation failure when XRPL_ENABLE_TELEMETRY is not defined:
move xrpl.pb.h include outside the #ifdef guard in TxTracing.h since
protocol::TMTransaction is used unconditionally in the function
signature.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-call std::random_device with thread_local std::mt19937 in
txSpan() for span ID generation. random_device is ~423x slower due to
/dev/urandom syscalls on each construction; mt19937 is seeded once per
thread and reused for all subsequent span IDs.
Update the SpanGuard class ASCII diagram to include txSpan factory
methods that were added in the hash-derived trace ID commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move TxSpanNames.h and TxQSpanNames.h from src/xrpld/telemetry/ to sit
next to the classes they instrument, matching the PathFindSpanNames.h
convention.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Derive trace_id from txHash[0:16] so all nodes handling the same
transaction produce spans under the same trace. Protobuf span_id
propagation provides parent-child relay ordering when available.
- Add SpanGuard::txSpan() factory methods (hash-derived trace ID)
- Add TxTracing.h helpers: txReceiveSpan(), txProcessSpan()
- Update PeerImp and NetworkOPs to use the new helpers
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add trace_id = txHash[0:16] strategy so all nodes handling the same
transaction independently produce spans under the same trace_id,
combined with protobuf span_id propagation for parent-child ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move scattered string literals from PeerImp.cpp and NetworkOPs.cpp into
compile-time constants in src/xrpld/telemetry/TxSpanNames.h. Follows
the same StaticStr/join() pattern established in Phase 1c for RPC spans.
Constants cover: span prefixes (tx), operations (receive, process),
attribute keys (hash, local, path, suppressed, status, peerId,
peerVersion), and values (sync, async, knownBad).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to old XRPL_TRACE_TX/CONSENSUS macros with
SpanGuard::span(TraceCategory, ...) factory calls introduced in Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the path finding subsystem with full span coverage:
- pathfind.request: wraps doPathFind() and doRipplePathFind() RPC handlers
- pathfind.compute: wraps PathRequest::doUpdate() with fast/normal attr
- pathfind.update_all: wraps PathRequestManager::updateAll() on ledger
close with ledger_index attr
- pathfind.discover: wraps Pathfinder::findPaths() graph exploration
with search_level attr
- pathfind.rank: wraps Pathfinder::computePathRanks() liquidity
validation with num_paths attr
New file: PathFindSpanNames.h with compile-time constants following
the StaticStr/join() pattern from Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move node health attribute strings to compile-time constants in
SpanNames.h (attr::nodeAmendmentBlocked, attr::nodeServerState)
- Add Tempo search filters for node health attributes
- Remove unnecessary .c_str() on strOperatingMode() return
- Add samplingRatio clamping test (values > 1.0 and < 0.0)
- Fix Task 2.3 status: delivered in Phase 1c, not Phase 2
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- serviceName default is "xrpld" not "rippled"
- Remove references to nonexistent exporterType field
- Pass networkId (4th param) to setup_Telemetry()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark deferred tasks (2.1→Phase 3, 2.5→low priority) with rationale.
Mark superseded tasks (2.2→Phase 1c SpanGuard factory). Add Task 2.7
for Grafana search filters. Update summary table with status column.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Grafana Tempo datasource: add rpc-command, rpc-status, rpc-role
search filters for the Explore UI
- Unit tests: TelemetryConfig (config parsing defaults and sections),
SpanGuardFactory (null guard safety, move semantics, discard, all
factory methods)
- Test CMake registration with optional OTel linking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add amendment_blocked and server_state span attributes to every
rpc.command.* span so operators can correlate RPC behavior with node state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces task list documents for Phases 2 through 5, with Tempo
references (replacing Jaeger) and Task 2.8 dashboard parity spec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add [[maybe_unused]] to the RAII span in processSession() — the
variable is not read but its lifetime scopes the active OTel context
for child spans created in processRequest()
- Regenerate levelization: remove premature xrpld.telemetry entries
that reference a module not yet present on this branch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use grpc_span::val::resourceExhausted constant instead of raw
"resource_exhausted" string in GRPCServer.cpp
- Fix unbounded span name cardinality in RPCHandler.cpp error path:
use fixed rpc_span::val::unknownCommand as span name instead of
user-supplied cmdName (attacker-controlled input). The actual
command is still captured in the xrpl.rpc.command attribute.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, cloned CallData instances (created for the next incoming
gRPC request) would have an empty name_, making subsequent span attrs
blank.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow project convention (PerfLog.h, SpanGuard.h) for documentation
diagrams. Show HTTP single, HTTP batch, and WebSocket span nesting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document the span hierarchy, covered paths, and known instrumentation
gaps directly in the header that developers reference when adding spans.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace rpcSpan(fullName) calls with span(TraceCategory::Rpc, prefix,
name). Add 'using namespace telemetry' to both RPC files so call sites
read cleanly without repeated namespace qualifiers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete TracingInstrumentation.h and replace all XRPL_TRACE_* macro
invocations with direct SpanGuard::rpcSpan() calls. SpanGuard's pimpl
design and global Telemetry accessor eliminate the need for macro
wrappers and explicit Telemetry instance passing at call sites.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TracingInstrumentation.h:
- Unify all span-creation macros to use std::optional<SpanGuard>
(fixes type mismatch between XRPL_TRACE_SPAN and SET_ATTR)
- Wrap XRPL_TRACE_SET_ATTR/EXCEPTION in do-while(0) (dangling-else)
- Move macros outside namespace blocks (macros are global)
- Cache telemetry reference to avoid double-evaluation
- Remove leaked _xrpl_span_ intermediate variable
- Add @note tags for thread safety, scope, and usage constraints
- Add 3 usage examples per CLAUDE.md requirements
ServerHandler.cpp:
- Remove misleading rpc.request span from onRequest() (span ended
before coroutine runs, producing orphan spans)
- Add rpc.http_request span to HTTP processSession() (runs inside
the coroutine, correct parent for rpc.process/rpc.command spans)
- Add XRPL_TRACE_EXCEPTION and error status in both catch blocks
(WS processSession and processRequest)
SpanGuard.h:
- Add null guards to all mutating methods (setOk, setStatus,
setAttribute, addEvent, recordException) for safety after discard()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
- Add [[nodiscard]] to factory and accessor methods
- NOLINT no-op stub instance methods that must stay non-static for API
parity with the real implementation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the single-arg span(name) factory that creates unconditional
spans without category gating. All call sites use the 3-arg
span(TraceCategory, prefix, name) variant which checks whether the
category is enabled in config before creating a span. The 1-arg form
was dead code with no callers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to non-existent TracingInstrumentation.h with
SpanGuard.cpp pimpl implementation that actually exists on this branch.
Update conditional compilation section to describe the pimpl approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace rpcSpan(), txSpan(), consensusSpan(), peerSpan(), ledgerSpan()
with a single span(TraceCategory, prefix, name) factory method. Adding
a new traceable subsystem now requires only a new enum value and one
switch case — no new methods or header changes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SpanContext::isValid(): add inline no-op when XRPL_ENABLE_TELEMETRY
is not defined, preventing a linker error if called in that path
- linkedSpan(): set kIsRootSpanKey on the StartSpanOptions parent
context so linked spans start a genuinely independent sub-tree
instead of silently becoming children of the current active span
- Telemetry::instance_: use std::atomic with acquire/release ordering
to avoid a data race between start()/stop() and factory methods
called from worker threads
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types
from public headers. Add global Telemetry accessor so SpanGuard factory
methods work without explicit Telemetry references. Add child/linked
span creation and cross-thread context propagation. Update plan docs
to reflect macro removal in favor of SpanGuard factory pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add DiscardFlag.h and FilteringSpanProcessor references to the file
tree, key files table, and implementation summary in OpenTelemetryPlan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add span discard mechanism that drops unwanted spans before they enter
the batch export queue, saving both network bandwidth and storage.
FilteringSpanProcessor is a custom SpanProcessor decorator that wraps
BatchSpanProcessor. SpanGuard::discard() sets a thread-local flag
(tl_discardCurrentSpan) before calling Span::End(). The OTel SDK calls
OnEnd() synchronously on the same thread, where the flag is checked and
cleared to drop the span.
New file: DiscardFlag.h — zero-dependency header for the thread-local
flag, avoiding transitive include bloat from Telemetry.h.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate otlp/tempo exporter block, duplicate tempo service
definition, and jaeger dependency from docker-compose example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tempo is now the sole trace backend. Remove Jaeger all-in-one service
from docker-compose, otlp/jaeger exporter from OTel Collector config,
and Jaeger Grafana datasource provisioning file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run .github/scripts/rename/docs.sh to replace rippled → xrpld
references in all plan documentation files, fixing the check-rename
CI failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>