feat(telemetry): add Phase 4 consensus tracing with SpanGuard API

Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.

Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:

- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
  (consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
  (used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
  following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
  to select the cross-node trace correlation strategy
- Use SpanGuard::linkedSpan() for follows-from relationships between
  consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
  from consensus thread to jtACCEPT worker thread

Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pratik Mankawde
2026-04-24 21:35:50 +01:00
parent 654fe2d30f
commit 54c97daaf1
18 changed files with 1371 additions and 16 deletions
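
The "deterministic" strategy named in the commit message derives the trace_id from the ledger hash, so every validator in a round arrives at the same ID without coordination. The sketch below shows one way this could work; it is not taken from the commit. `LedgerHash`, `TraceId`, `traceIdFromLedgerHash`, `isValidTraceId`, and `toHex` are hypothetical names, and truncating to the first 16 bytes is an assumption about how `SpanGuard::hashSpan()` might map a 256-bit hash onto a 128-bit OpenTelemetry trace_id.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for illustration; not types from the commit.
using LedgerHash = std::array<std::uint8_t, 32>;  // 256-bit ledger hash
using TraceId = std::array<std::uint8_t, 16>;     // OTel trace_id is 128 bits

// Assumption: truncate the hash to its first 16 bytes. Any node that sees
// the same ledger hash computes the same trace_id.
inline TraceId
traceIdFromLedgerHash(LedgerHash const& hash)
{
    TraceId id{};
    std::copy_n(hash.begin(), id.size(), id.begin());
    return id;
}

// OpenTelemetry treats an all-zero trace_id as invalid, so a caller would
// need to fall back to a random ID in that (very unlikely) case.
inline bool
isValidTraceId(TraceId const& id)
{
    return std::any_of(
        id.begin(), id.end(), [](std::uint8_t b) { return b != 0; });
}

// Lowercase hex rendering, the form trace IDs usually take in tracing UIs.
inline std::string
toHex(TraceId const& id)
{
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(id.size() * 2);
    for (std::uint8_t b : id)
    {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

Under the "attribute" strategy no such derivation is needed: each node keeps a random trace_id and the ledger identifier is attached as a span attribute for correlation at query time.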


@@ -120,8 +120,10 @@
#include <array>
#include <cstdint>
#include <exception>
#include <initializer_list>
#include <memory>
#include <string_view>
#include <utility>
namespace xrpl::telemetry {
@@ -153,6 +155,11 @@ struct TraceBytes
bool valid{false};
};
/** Key-value pair for span event attributes.
Used by addEvent(name, attrs) to attach structured metadata to events.
*/
using EventAttribute = std::pair<std::string_view, std::string_view>;
/** Opaque wrapper for an OTel context snapshot.
Used to propagate trace context across threads. Created by
@@ -362,6 +369,14 @@ public:
void
addEvent(std::string_view name);
/** Add a named event with key-value attributes to the span's timeline.
No-op on a null guard.
@param name Event name.
@param attrs Attribute pairs (all string_view for simplicity).
*/
void
addEvent(std::string_view name, std::initializer_list<EventAttribute> attrs);
/** Record an exception as a span event following OTel semantic
conventions, and mark the span status as error.
No-op on a null guard.
@@ -491,6 +506,10 @@ public:
{
}
void
addEvent(std::string_view, std::initializer_list<EventAttribute>)
{
}
void
recordException(std::exception const&)
{
}
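
To illustrate how the new `addEvent(name, attrs)` overload might be called, here is a minimal sketch using an in-memory recorder. `RecordingSpan` and the attribute keys `tx_id` and `our_vote` are hypothetical and exist only for this example; only the `EventAttribute` alias and the two `addEvent` signatures mirror the diff above.

```cpp
#include <initializer_list>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Same alias as in the diff above.
using EventAttribute = std::pair<std::string_view, std::string_view>;

// Hypothetical in-memory recorder standing in for SpanGuard.
class RecordingSpan
{
public:
    struct Event
    {
        std::string name;
        std::vector<std::pair<std::string, std::string>> attrs;
    };

    // Event without attributes.
    void
    addEvent(std::string_view name)
    {
        events_.push_back(Event{std::string(name), {}});
    }

    // Event with attributes. The string_views are copied into owned
    // strings here because they may dangle once the call returns.
    void
    addEvent(std::string_view name, std::initializer_list<EventAttribute> attrs)
    {
        Event ev{std::string(name), {}};
        ev.attrs.reserve(attrs.size());
        for (auto const& [key, value] : attrs)
            ev.attrs.emplace_back(std::string(key), std::string(value));
        events_.push_back(std::move(ev));
    }

    std::vector<Event> const&
    events() const
    {
        return events_;
    }

private:
    std::vector<Event> events_;
};
```

A call site would then look like `span.addEvent("dispute.resolve", {{"tx_id", txId}, {"our_vote", "yes"}});`. Because both halves of each pair are `std::string_view`, numeric values have to be formatted to strings by the caller, which matches the "all string_view for simplicity" note in the doc comment.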


@@ -187,6 +187,13 @@ public:
/** Enable tracing for ledger close/accept. */
bool traceLedger = true;
/** Strategy for cross-node consensus trace correlation.
"deterministic" — derive trace_id from ledger hash so all
validators in the same round share the same trace_id.
"attribute" — random trace_id, correlate via ledger_id attribute.
*/
std::string consensusTraceStrategy = "deterministic";
};
virtual ~Telemetry() = default;
@@ -244,6 +251,10 @@ public:
[[nodiscard]] virtual bool
shouldTraceLedger() const = 0;
/** @return The configured consensus trace correlation strategy. */
virtual std::string const&
getConsensusTraceStrategy() const = 0;
#ifdef XRPL_ENABLE_TELEMETRY
/** Get or create a named tracer instance.