Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing

Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Pratik Mankawde
2026-05-28 11:38:05 +01:00
19 changed files with 196 additions and 179 deletions


@@ -346,20 +346,20 @@ resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
The following table summarizes what data is collected by category:
| Category | Attributes Collected | Purpose |
| --------------- | ---------------------------------------------------------------------- | ---------------------------- |
| **Transaction** | `tx.hash`, `tx.type`, `tx.result`, `tx.fee`, `ledger_index` | Trace transaction lifecycle |
| **Consensus** | `round`, `phase`, `mode`, `proposers` (public keys), `duration_ms` | Analyze consensus timing |
| **RPC** | `command`, `version`, `status`, `duration_ms` | Monitor RPC performance |
| **Peer** | `peer.id` (public key), `latency_ms`, `message.type`, `message.size` | Network topology analysis |
| **Ledger** | `ledger.hash`, `ledger.index`, `close_time`, `tx_count` | Ledger progression tracking |
| **Job** | `job.type`, `queue_ms`, `worker` | JobQueue performance |
| **PathFinding** | `pathfind.source_currency`, `dest_currency`, `path_count`, `cache_hit` | Payment path analysis |
| **TxQ** | `txq.queue_depth`, `fee_level`, `eviction_reason` | Queue depth and fee tracking |
| **Fee** | `fee.load_factor`, `escalation_level` | Fee escalation monitoring |
| **Validator** | `validator.list_size`, `list_age_sec` | UNL health monitoring |
| **Amendment** | `amendment.name`, `status` | Protocol upgrade tracking |
| **SHAMap** | `shamap.type`, `missing_nodes`, `duration_ms` | State tree sync performance |
| Category | Attributes Collected | Purpose |
| --------------- | ---------------------------------------------------------------------------------------------------------------- | ---------------------------- |
| **Transaction** | `tx.hash`, `tx.type`, `tx.result`, `tx.fee`, `ledger_index` | Trace transaction lifecycle |
| **Consensus** | `round`, `phase`, `mode`, `proposers` (public keys), `duration_ms` | Analyze consensus timing |
| **RPC** | `command`, `version`, `status`, `duration_ms` | Monitor RPC performance |
| **Peer** | `peer.id` (public key), `latency_ms`, `message.type`, `message.size` | Network topology analysis |
| **Ledger** | `ledger.hash`, `ledger.index`, `close_time`, `tx_count` | Ledger progression tracking |
| **Job** | `job.type`, `queue_ms`, `worker` | JobQueue performance |
| **PathFinding** | `pathfind_fast`, `pathfind_search_level`, `pathfind_num_paths`, `pathfind_ledger_index`, `pathfind_num_requests` | Payment path analysis |
| **TxQ** | `txq.queue_depth`, `fee_level`, `eviction_reason` | Queue depth and fee tracking |
| **Fee** | `fee.load_factor`, `escalation_level` | Fee escalation monitoring |
| **Validator** | `validator.list_size`, `list_age_sec` | UNL health monitoring |
| **Amendment** | `amendment.name`, `status` | Protocol upgrade tracking |
| **SHAMap** | `shamap.type`, `missing_nodes`, `duration_ms` | State tree sync performance |
### 2.4.4 Privacy & Sensitive Data Policy


@@ -1,8 +1,8 @@
# Phase 2: RPC Tracing Completion Task List
> **Goal**: Complete RPC tracing coverage with unit tests, Grafana search filters, node health attributes, and config hardening. Build on the Phase 1c SpanGuard factory foundation to achieve production-quality RPC observability.
> **Goal**: Complete RPC tracing coverage with unit tests, Grafana search filters, PathFind instrumentation, and config hardening. Build on the Phase 1c SpanGuard factory foundation to achieve production-quality RPC observability.
>
> **Scope**: Unit tests for core telemetry, Grafana Tempo search filters, node health span attributes, config validation (`std::clamp`).
> **Scope**: Unit tests for core telemetry, Grafana Tempo search filters, PathFind RPC tracing, config validation (`std::clamp`).
>
> **Branch**: `pratik/otel-phase2-rpc-tracing` (from `pratik/otel-phase1c-rpc-integration`)
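The scope line above names `std::clamp` as the config-hardening mechanism. A minimal sketch of that idea follows, assuming a sampling-ratio setting bounded to `[0.0, 1.0]`; the function name, the setting, and the bounds are hypothetical and not taken from this branch.

```cpp
#include <algorithm>

// Hypothetical config value: fraction of traces to sample.
// The name and the [0.0, 1.0] bounds are assumptions for illustration only.
inline double
clampSamplingRatio(double configured)
{
    // std::clamp returns the lower bound when the value is below it, the
    // upper bound when the value is above it, and the value itself otherwise.
    return std::clamp(configured, 0.0, 1.0);
}
```

An out-of-range value from the config file is then pulled back to the nearest bound rather than rejected, which keeps a typo in the config from disabling the node's startup.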
@@ -121,42 +121,9 @@ These can be added later if dashboard queries specifically need them. The node h
## Task 2.8: RPC Span Attribute Enrichment — Node Health Context
> **Source**: [External Dashboard Parity](../docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md) — adds node-level health context inspired by the community [xrpl-validator-dashboard](https://github.com/realgrapedrop/xrpl-validator-dashboard).
>
> **Downstream**: Phase 7 (MetricsRegistry uses these attributes for alerting context), Phase 10 (validation checks for these attributes).
**Status**: DROPPED.
**Objective**: Add node-level health state to every `rpc.command.*` span so operators can correlate RPC behavior with node state in Tempo.
**What to do**:
- Edit `src/xrpld/rpc/detail/RPCHandler.cpp`:
- In the `rpc.command.*` span creation block (after existing `setAttribute` calls for `command`, `version`, etc.):
- Node health attrs (`xrpl.node.amendment_blocked`, `xrpl.node.server_state`) are now resource-level attrs, not per-span. They are set at Tracer init.
**New span attributes**:
| Attribute | Type | Source | Example |
| ----------------------------- | ------ | ------------------------------------------- | -------- |
| `xrpl.node.amendment_blocked` | bool | `context.app.getOPs().isAmendmentBlocked()` | `true` |
| `xrpl.node.server_state` | string | `context.app.getOPs().strOperatingMode()` | `"full"` |
**Rationale**: When a node is amendment-blocked or in a degraded state, every RPC response is suspect. Tagging spans with this state enables Tempo TraceQL queries like:
```
{name=~"rpc.command.*"} | xrpl.node.amendment_blocked = true
```
This surfaces all RPCs served during a blocked period — critical for post-incident analysis.
**Key modified files**:
- `src/xrpld/rpc/detail/RPCHandler.cpp`
**Exit Criteria**:
- [ ] `rpc.command.server_info` spans carry `xrpl.node.amendment_blocked` and `xrpl.node.server_state` attributes
- [ ] No measurable latency impact (attribute values are cached atomics, not computed per-call)
- [ ] Attributes appear in Tempo trace detail view
Node health (`amendment_blocked`, `server_state`) is not part of the telemetry surface. Operators consume the same data via the existing `server_info` / `server_state` RPC commands, so duplicating it on traces adds storage and cardinality cost without new value. The OTel C++ SDK 1.18.0 also does not support runtime updates to the resource, ruling out resource-level emission of these dynamic-by-nature flags.
---
@@ -169,10 +136,11 @@ This surfaces all RPCs served during a blocked period — critical for post-inci
**Spans added**:
- `pathfind.request` — wraps `doPathFind()` and `doRipplePathFind()` RPC handlers
- `pathfind.compute` — wraps `PathRequest::doUpdate()` (fast/normal attr)
- `pathfind.update_all` — wraps `PathRequestManager::updateAll()` on ledger close (ledger_index attr)
- `pathfind.discover` — wraps `Pathfinder::findPaths()` graph exploration (search_level attr)
- `pathfind.rank` — wraps `Pathfinder::computePathRanks()` liquidity validation (num_paths attr)
- `pathfind.compute` — wraps `PathRequest::doUpdate()` (`pathfind_fast` attr)
- `pathfind.update_all` — wraps `PathRequestManager::updateAll()` on ledger close (`pathfind_ledger_index`, `pathfind_num_requests` attrs; emitted only when active subscriptions exist)
- `pathfind.discover` — wraps the entire per-source-asset loop in `PathRequest::findPaths()` (`pathfind_search_level`, `pathfind_num_paths` attrs). One span per RPC call instead of N (one per source asset). Trade-off: per-asset breakdown is lost; storage and cardinality bounded.
**Attribute namespacing**: All pathfind attributes use the `pathfind_*` underscore form per the Phase 1c naming-spec rule 5.
**New file**: `src/xrpld/rpc/detail/PathFindSpanNames.h`
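The hoisting described for `pathfind.discover` can be sketched as follows. `CountingSpan` is a hypothetical stand-in for the real RAII guard (it only counts how many spans were opened), and `discoverAll` is an illustrative name, not a function in this branch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a RAII span guard. It records how many spans
// were opened so the effect of hoisting can be observed.
struct CountingSpan
{
    static inline int opened = 0;
    std::int64_t numPaths = 0;

    explicit CountingSpan(std::string const&)
    {
        ++opened;
    }
};

// pathsPerAsset[i] is the number of paths found for source asset i.
inline std::int64_t
discoverAll(std::vector<std::int64_t> const& pathsPerAsset)
{
    CountingSpan span("pathfind.discover");  // one span for the whole loop
    std::int64_t total = 0;
    for (auto n : pathsPerAsset)
        total += n;         // accumulate instead of opening a span per asset
    span.numPaths = total;  // attribute recorded once, after the loop
    return total;
}
```

With three source assets the sketch opens one span rather than three, at the cost of losing the per-asset timing split.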
@@ -197,9 +165,10 @@ This surfaces all RPCs served during a blocked period — critical for post-inci
| 2.5 | Enhanced RPC span attributes (HTTP-level) | Deferred | Low value; span duration covers timing natively |
| 2.6 | Build verification and performance baseline | Complete | Verified in CI on Phase 1c |
| 2.7 | Grafana Tempo search filters | Complete | rpc-command, rpc-status, rpc-role filters |
| 2.8 | RPC span attribute enrichment (node health) | Complete | amendment_blocked + server_state |
| 2.9 | PathFind RPC instrumentation (5 spans) | Complete | request, compute, update_all, discover, rank |
| 2.8 | RPC span attribute enrichment (node health) | Dropped | Available via `server_info`/`server_state` RPC |
| 2.9 | PathFind RPC instrumentation | Complete | request, compute, update_all, discover |
**Delivered in this branch**: Tasks 2.4, 2.7, 2.8, 2.9.
**Delivered in this branch**: Tasks 2.4, 2.7, 2.9.
**Deferred with rationale**: Tasks 2.1 (→Phase 3), 2.5 (low priority).
**Dropped**: Task 2.8 (node health not duplicated on traces).
**Superseded**: Task 2.2 (Phase 1c SpanGuard factory covers this).


@@ -107,17 +107,6 @@ datasources:
operator: "="
scope: span
type: dynamic
# Phase 2: Node health filters (Task 2.8) — resource attributes
- id: node-amendment-blocked
tag: xrpl.node.amendment_blocked
operator: "="
scope: resource
type: static
- id: node-server-state
tag: xrpl.node.server_state
operator: "="
scope: resource
type: dynamic
# Phase 3: Transaction tracing filters
- id: tx-hash
tag: xrpl.tx.hash


@@ -36,7 +36,8 @@ such as Grafana Tempo.
Telemetry is **off by default** at both compile time and runtime:
- **Compile time**: The Conan option `telemetry` and CMake option `telemetry` must be set to `True`/`ON`.
When disabled, all tracing macros compile to `((void)0)` with zero overhead.
When disabled, all `SpanGuard` calls compile to inline no-ops (defined in `SpanGuard.h`)
with zero overhead — no OTel SDK dependency required.
- **Runtime**: The `[telemetry]` config section must set `enabled=1`.
When disabled at runtime, a no-op implementation is used.
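One way the compile-time path could look is sketched below. This is not the content of `SpanGuard.h`: the class name `SketchSpanGuard` and the macro `SKETCH_ENABLE_TELEMETRY` are hypothetical, and the member set is reduced to a few calls that appear elsewhere in this commit.

```cpp
#include <string_view>

// Hypothetical macro and class names; the real header may be organized
// differently. Only the disabled branch is sketched here.
#ifndef SKETCH_ENABLE_TELEMETRY
class SketchSpanGuard
{
public:
    // Every member is an empty inline function, so no OTel header is
    // needed and the optimizer is free to drop the calls entirely.
    static SketchSpanGuard
    span(std::string_view, std::string_view)
    {
        return {};
    }

    void
    setAttribute(std::string_view, std::string_view)
    {
    }

    void
    setOk()
    {
    }

    // A disabled guard never holds a live span.
    explicit operator bool() const
    {
        return false;
    }
};
#endif
```

Call sites stay identical in both build modes, which is what lets instrumentation be added without `#ifdef` blocks around each span.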


@@ -15,7 +15,6 @@
// Add new amendments to the top of this list.
// Keep it sorted in reverse chronological order.
XRPL_FIX (Cleanup3_2_0, Supported::no, VoteBehavior::DefaultNo)
XRPL_FEATURE(MPTokensV2, Supported::no, VoteBehavior::DefaultNo)
XRPL_FIX (Security3_1_3, Supported::no, VoteBehavior::DefaultNo)
XRPL_FIX (PermissionedDomainInvariant, Supported::yes, VoteBehavior::DefaultNo)


@@ -20,7 +20,13 @@
* - Per-span attribute keys: bare field name (span name carries the domain).
* - Collision qualifier: <domain>_<field> when bare name collides across
* domains or with OTel reserved `status` (e.g. rpc_status, grpc_status).
* - Resource attribute keys: xrpl.<subsystem>.<field> (process-identity).
* - Shared cross-span attributes: <domain>_<field> (underscore) form
* (e.g. tx_hash, peer_id, ledger_seq, consensus_round).
* - Resource attribute keys: xrpl.<subsystem>.<field> (dotted) form is
* RESERVED for process-identity attributes set once at startup on the
* OTel resource (e.g. xrpl.network.id, xrpl.network.type). Do not use
* this form for span attributes — it parses awkwardly in TraceQL and
* blurs the resource/span scope distinction.
* - Span prefixes: <subsystem>[.<component>].
*/
@@ -103,15 +109,6 @@ inline constexpr auto networkId = join(join(seg::xrpl, seg::network), makeStr("i
inline constexpr auto networkType = join(join(seg::xrpl, seg::network), makeStr("type"));
inline constexpr auto linkType = makeStr("link_type");
/// Node health attributes — RESOURCE-ONLY (process identity, not per-span).
/// Set at Tracer init via resource::Resource::Create and refreshed on state
/// transitions. Do NOT use with span.setAttribute().
inline constexpr auto xrplNode = join(seg::xrpl, makeStr("node"));
/// "xrpl.node.amendment_blocked" — resource attribute key.
inline constexpr auto nodeAmendmentBlocked = join(xrplNode, makeStr("amendment_blocked"));
/// "xrpl.node.server_state" — resource attribute key.
inline constexpr auto nodeServerState = join(xrplNode, makeStr("server_state"));
/// Canonical shared attrs (rule 5 — kept xrpl.<domain>.* form).
/// Defined once here, aliased by domain-specific headers.
inline constexpr auto txHash = join(join(seg::xrpl, seg::tx), makeStr("hash"));


@@ -3,13 +3,15 @@
/** Abstract interface for OpenTelemetry distributed tracing.
Provides the Telemetry base class that all components use to create trace
spans. Two concrete implementations exist, selected at construction time
spans. Three concrete implementations exist, selected at construction time
by make_Telemetry():
- TelemetryImpl (Telemetry.cpp): real OTel SDK integration, compiled
only when XRPL_ENABLE_TELEMETRY is defined and enabled at runtime.
- NullTelemetry (NullTelemetry.cpp): no-op stub used when telemetry is
disabled at compile time or runtime.
- NullTelemetryOtel (Telemetry.cpp): no-op stub that still depends on
the OTel API (used during transition or for testing).
Inheritance / dependency diagram:
@@ -35,32 +37,44 @@
Usage examples:
1. Check before tracing (typical guard pattern):
1. Root span at a subsystem entry point (typical usage):
@code
auto& telemetry = registry.getTelemetry();
if (telemetry.isEnabled() && telemetry.shouldTraceRpc())
#include <xrpld/rpc/detail/RpcSpanNames.h>
using namespace xrpl::telemetry;
// In an RPC handler dispatch:
auto guard = SpanGuard::span(
TraceCategory::Rpc, rpc_span::prefix::command, commandName);
guard.setAttribute(rpc_span::attr::command, commandName);
// ... process request
// guard destructor automatically ends the span on scope exit
@endcode
2. Child span for a sub-operation (scoped child):
@code
auto parent = SpanGuard::span(TraceCategory::Transactions, "tx", "process");
{
auto span = telemetry.startSpan("rpc.command.server_info");
// ... do work, span ends when shared_ptr refcount drops to 0
auto child = parent.childSpan("tx.apply");
child.setAttribute("tx_type", txType);
// child ends here
}
@endcode
2. RAII tracing with SpanGuard (preferred):
3. Unrelated span (cross-scope, same thread):
@code
if (telemetry.isEnabled() && telemetry.shouldTraceRpc())
{
SpanGuard guard(telemetry.startSpan("rpc.command.submit"));
guard.setAttribute("command", "submit");
// ... guard ends span automatically on scope exit
}
// Transactions and RPC can be active simultaneously
auto txSpan = SpanGuard::span(TraceCategory::Transactions, "tx", "process");
auto rpcSpan = SpanGuard::span(TraceCategory::Rpc, "rpc", "info");
// both spans end on scope exit
@endcode
3. Cross-thread context propagation:
4. Cross-thread context propagation:
@code
// On thread A: capture context
auto ctx = guard.context();
// On thread B: create child span with explicit parent
auto child = telemetry.startSpan("async.work", ctx);
// Thread A: capture the active context while span is in scope
auto ctx = parentGuard.captureContext();
// Thread B: create child span with explicit parent
auto child = SpanGuard::childSpan("async.work", ctx);
@endcode
@note Thread safety: The Telemetry interface is safe for concurrent reads


@@ -42,6 +42,7 @@
#include <cstring>
#include <string>
#include <typeinfo>
#include <utility>
namespace xrpl {
@@ -423,7 +424,7 @@ SpanGuard::recordException(std::exception const& e)
return;
impl_->span->AddEvent(
"exception",
{{"exception.type", "std::exception"}, {"exception.message", std::string(e.what())}});
{{"exception.type", typeid(e).name()}, {"exception.message", std::string(e.what())}});
impl_->span->SetStatus(otel_trace::StatusCode::kError, e.what());
}
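Note that `typeid(e).name()` returns an implementation-defined string; with GCC and Clang on the Itanium ABI it is a mangled name such as `St13runtime_error`. If a readable `exception.type` is wanted, a helper along the following lines could be used. This is a sketch only: `exceptionTypeName` and `SKETCH_HAVE_CXXABI` are hypothetical names, not part of this commit.

```cpp
#include <cstdlib>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>

#if __has_include(<cxxabi.h>)
#include <cxxabi.h>
#define SKETCH_HAVE_CXXABI 1
#endif

// Returns the dynamic type name of an exception, demangled where the
// toolchain provides abi::__cxa_demangle, and the raw name otherwise.
inline std::string
exceptionTypeName(std::exception const& e)
{
    // std::exception is polymorphic, so typeid reports the dynamic type.
    char const* raw = typeid(e).name();
#ifdef SKETCH_HAVE_CXXABI
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer that must be freed.
    std::unique_ptr<char, void (*)(void*)> demangled(
        abi::__cxa_demangle(raw, nullptr, nullptr, &status), std::free);
    if (status == 0 && demangled)
        return demangled.get();
#endif
    return raw;
}
```

The fallback to the raw name keeps the helper usable on toolchains without `<cxxabi.h>`.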


@@ -169,8 +169,7 @@ void
GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::Coro> coro)
{
using namespace telemetry;
auto span =
SpanGuard::span(TraceCategory::Rpc, grpc_span::prefix::grpc, grpc_span::op::request);
auto span = SpanGuard::span(TraceCategory::Rpc, grpc_span::prefix::grpc, name_);
span.setAttribute(grpc_span::attr::method, name_);
try
@@ -179,6 +178,7 @@ GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::C
bool const isUnlimited = clientIsUnlimited();
if (!isUnlimited && usage.disconnect(app_.getJournal("gRPCServer")))
{
span.setAttribute(grpc_span::attr::grpcStatus, grpc_span::val::error);
span.setError(grpc_span::val::resourceExhausted);
grpc::Status const status{
grpc::StatusCode::RESOURCE_EXHAUSTED, "usage balance exceeds threshold"};
@@ -190,6 +190,11 @@ GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::C
usage.charge(loadType);
auto role = getRole(isUnlimited);
span.setAttribute(
grpc_span::attr::grpcRole,
role == Role::ADMIN ? std::string_view(grpc_span::val::admin)
: std::string_view(grpc_span::val::user));
{
std::stringstream toLog;
toLog << "role = " << (int)role;
@@ -225,6 +230,7 @@ GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::C
if (conditionMetRes != rpcSUCCESS)
{
RPC::ErrorInfo const errorInfo = RPC::get_error_info(conditionMetRes);
span.setAttribute(grpc_span::attr::grpcStatus, grpc_span::val::error);
span.setError(errorInfo.token.c_str());
grpc::Status const status{
grpc::StatusCode::FAILED_PRECONDITION, errorInfo.message.c_str()};
@@ -234,6 +240,7 @@ GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::C
{
std::pair<Response, grpc::Status> result = handler_(context);
setIsUnlimited(result.first, isUnlimited);
span.setAttribute(grpc_span::attr::grpcStatus, grpc_span::val::success);
span.setOk();
responder_.Finish(result.first, result.second, this);
}
@@ -241,6 +248,7 @@ GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::C
}
catch (std::exception const& ex)
{
span.setAttribute(grpc_span::attr::grpcStatus, grpc_span::val::error);
span.recordException(ex);
grpc::Status const status{grpc::StatusCode::INTERNAL, ex.what()};
responder_.FinishWithError(status, this);


@@ -9,13 +9,16 @@
* Span hierarchy:
*
* +-------------------------------------------------------+
* | grpc.request |
* | grpc.<MethodName> (e.g. grpc.GetLedger) |
* | CallData::process(coro) |
* | attrs: method, grpc_role, grpc_status |
* +-------------------------------------------------------+
*
* Unlike the HTTP/WS RPC path, gRPC has a flat single-span structure
* per request since each CallData handles exactly one RPC method.
* The method name is embedded in the span name (rather than only as
* an attribute) so dashboards can break out per-method latency and
* error rates without needing TraceQL attribute filters.
*/
#include <xrpl/telemetry/SpanNames.h>
@@ -25,16 +28,11 @@ namespace xrpl::telemetry::grpc_span {
// ===== Span prefixes =======================================================
namespace prefix {
/// "grpc" — root prefix for gRPC transport spans.
/// "grpc" — root prefix for gRPC transport spans. The full span name is
/// formed at the call site as `grpc.<MethodName>` (see GRPCServer.cpp).
inline constexpr auto grpc = makeStr("grpc");
} // namespace prefix
// ===== Span operation suffixes =============================================
namespace op {
inline constexpr auto request = makeStr("request");
} // namespace op
// ===== Attribute keys ======================================================
namespace attr {
@@ -51,6 +49,8 @@ inline constexpr auto grpcStatus = makeStr("grpc_status");
namespace val {
using telemetry::attr_val::error;
using telemetry::attr_val::success;
inline constexpr auto admin = makeStr("admin");
inline constexpr auto user = makeStr("user");
inline constexpr auto resourceExhausted = makeStr("resource_exhausted");
inline constexpr auto failedPrecondition = makeStr("failed_precondition");
} // namespace val


@@ -9,34 +9,36 @@
*
* RPC entry (one-shot or subscription):
*
* +-------------------------------------------------------+
* | pathfind.request |
* | doPathFind() / doRipplePathFind() |
* | attrs: source_account, dest_account |
* | |
* | +--------------------------------------------------+ |
* | | pathfind.compute | |
* | | PathRequest::doUpdate() | |
* | | attrs: fast, search_level | |
* | | | |
* | | +---------------------+ +--------------------+ | |
* | | | pathfind.discover | | pathfind.rank | | |
* | | | Pathfinder::find() | | computePathRanks() | | |
* | | +---------------------+ +--------------------+ | |
* | +--------------------------------------------------+ |
* +-------------------------------------------------------+
* +----------------------------------------------------------------+
* | pathfind.request |
* | doPathFind() / doRipplePathFind() |
* | attrs: pathfind_source_account, pathfind_dest_account |
* | (set when present in request params) |
* | |
* | +-----------------------------------------------------------+ |
* | | pathfind.compute | |
* | | PathRequest::doUpdate() | |
* | | attrs: pathfind_fast | |
* | | | |
* | | +-----------------------------------------------------+ | |
* | | | pathfind.discover (one per RPC call, hoisted above | |
* | | | the per-source-asset loop in PathRequest::findPaths)| |
* | | | attrs: pathfind_search_level, pathfind_num_paths | |
* | | +-----------------------------------------------------+ | |
* | +-----------------------------------------------------------+ |
* +----------------------------------------------------------------+
*
* Async recomputation (ledger close):
*
* +-------------------------------------------------------+
* | pathfind.update_all |
* | PathRequestManager::updateAll() |
* | attrs: ledger_index, num_requests |
* | |
* | +--------------------------------------------------+ |
* | | pathfind.compute (per active request) | |
* | +--------------------------------------------------+ |
* +-------------------------------------------------------+
* +----------------------------------------------------------------+
* | pathfind.update_all |
* | PathRequestManager::updateAll() |
* | attrs: pathfind_ledger_index, pathfind_num_requests |
* | |
* | +-----------------------------------------------------------+ |
* | | pathfind.compute (per active request) | |
* | +-----------------------------------------------------------+ |
* +----------------------------------------------------------------+
*/
#include <xrpl/telemetry/SpanNames.h>
@@ -57,30 +59,31 @@ inline constexpr auto request = makeStr("request");
inline constexpr auto compute = makeStr("compute");
inline constexpr auto updateAll = makeStr("update_all");
inline constexpr auto discover = makeStr("discover");
inline constexpr auto rank = makeStr("rank");
} // namespace op
// ===== Attribute keys ======================================================
//
// All pathfind attributes are namespaced under `pathfind_*` (underscore form,
// per Phase 1c naming spec rule 5). Avoids collisions with bare keys like
// `fast` or `num_paths` that other subsystems may introduce.
namespace attr {
/// "source_account" — originating account for path search.
inline constexpr auto sourceAccount = makeStr("source_account");
/// "dest_account" — destination account.
inline constexpr auto destAccount = makeStr("dest_account");
/// "fast" — whether fast pathfinding mode enabled.
inline constexpr auto fast = makeStr("fast");
/// "search_level" — depth of graph exploration.
inline constexpr auto searchLevel = makeStr("search_level");
/// "num_complete_paths" — complete paths found.
inline constexpr auto numCompletePaths = makeStr("num_complete_paths");
/// "num_paths" — total paths returned.
inline constexpr auto numPaths = makeStr("num_paths");
/// "num_requests" — active path requests.
inline constexpr auto numRequests = makeStr("num_requests");
/// "xrpl.pathfind.ledger_index" — kept qualified (rule 5): pathfind target
/// ledger is distinct from xrpl.ledger.seq.
inline constexpr auto ledgerIndex =
join(join(seg::xrpl, makeStr("pathfind")), makeStr("ledger_index"));
/// "pathfind_source_account" — originating account for path search.
inline constexpr auto sourceAccount = makeStr("pathfind_source_account");
/// "pathfind_dest_account" — destination account.
inline constexpr auto destAccount = makeStr("pathfind_dest_account");
/// "pathfind_fast" — whether fast pathfinding mode enabled.
inline constexpr auto fast = makeStr("pathfind_fast");
/// "pathfind_search_level" — depth of graph exploration.
inline constexpr auto searchLevel = makeStr("pathfind_search_level");
/// "pathfind_num_paths" — total paths produced across the per-source-asset
/// loop in PathRequest::findPaths (sum of getBestPaths().size() per asset).
inline constexpr auto numPaths = makeStr("pathfind_num_paths");
/// "pathfind_num_requests" — snapshot size of requests_ at update_all start
/// (may include weak_ptrs that subsequently expire during processing).
inline constexpr auto numRequests = makeStr("pathfind_num_requests");
/// "pathfind_ledger_index" — pathfind target ledger index.
inline constexpr auto ledgerIndex = makeStr("pathfind_ledger_index");
} // namespace attr
} // namespace xrpl::telemetry::pathfind_span


@@ -40,6 +40,7 @@
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
@@ -579,6 +580,20 @@ PathRequest::findPaths(
auto const dst_amount = convertAmount(saDstAmount, convert_all_);
hash_map<PathAsset, std::unique_ptr<Pathfinder>> pathasset_map;
// One `pathfind.discover` span wraps the entire per-source-asset loop so
// that a single RPC call produces one discover span instead of N (one per
// candidate source asset). Trade-off: per-asset discovery/ranking timing
// is no longer split into individual spans — span count and Tempo storage
// are bounded per RPC at the cost of per-asset visibility. If per-asset
// breakdown is needed in the future, add child spans inside the loop body
// (`Pathfinder::findPaths`/`computePathRanks`) parented off this span.
using namespace telemetry;
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::discover);
span.setAttribute(pathfind_span::attr::searchLevel, static_cast<int64_t>(level));
std::int64_t totalPaths = 0;
for (auto const& asset : sourceAssets)
{
if (continueCallback && !continueCallback())
@@ -598,6 +613,7 @@ PathRequest::findPaths(
auto ps = pathfinder->getBestPaths(
max_paths_, fullLiquidityPath, mContext[asset], asset.getIssuer(), continueCallback);
mContext[asset] = ps;
totalPaths += static_cast<std::int64_t>(ps.size());
auto const& sourceAccount = [&] {
if (!isXRP(asset.getIssuer()))
@@ -697,6 +713,8 @@ PathRequest::findPaths(
}
}
span.setAttribute(pathfind_span::attr::numPaths, totalPaths);
/* The resource fee is based on the number of source currencies used.
The minimum cost is 50 and the maximum is 400. The cost increases
after four source currencies, 50 - (4 * 4) = 34.


@@ -61,11 +61,6 @@ PathRequestManager::getAssetCache(std::shared_ptr<ReadView const> const& ledger,
void
PathRequestManager::updateAll(std::shared_ptr<ReadView const> const& inLedger)
{
using namespace telemetry;
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::updateAll);
span.setAttribute(pathfind_span::attr::ledgerIndex, static_cast<int64_t>(inLedger->seq()));
auto event = app_.getJobQueue().makeLoadEvent(jtPATH_FIND, "PathRequest::updateAll");
std::vector<PathRequest::wptr> requests;
@@ -78,6 +73,18 @@ PathRequestManager::updateAll(std::shared_ptr<ReadView const> const& inLedger)
cache = getAssetCache(inLedger, true);
}
// updateAll runs on every ledger close; skip span emission entirely when
// there are no active path subscriptions to avoid a steady stream of empty
// spans at mainnet close cadence.
if (requests.empty())
return;
using namespace telemetry;
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::updateAll);
span.setAttribute(pathfind_span::attr::ledgerIndex, static_cast<int64_t>(inLedger->seq()));
span.setAttribute(pathfind_span::attr::numRequests, static_cast<int64_t>(requests.size()));
bool newRequests = app_.getLedgerMaster().isNewPathRequest();
bool mustBreak = false;
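The reordering in this hunk (snapshot the requests first, then decide whether to open a span) can be reduced to the sketch below. `EmissionLog` and `updateAllSketch` are hypothetical names used only to make the early return observable; the counter stands in for the `SpanGuard::span(...)` call.

```cpp
#include <vector>

// Hypothetical counter standing in for span emission.
struct EmissionLog
{
    int spans = 0;
};

inline void
updateAllSketch(std::vector<int> const& activeRequests, EmissionLog& log)
{
    // No active subscriptions: return before any span is created, so an
    // idle node emits nothing on ledger close.
    if (activeRequests.empty())
        return;

    ++log.spans;  // stands in for opening pathfind.update_all
    // ... per-request recomputation would follow here ...
}
```

Placing the check before span creation matters because a guard created earlier would still emit an empty span when the function returns.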


@@ -2,7 +2,6 @@
#include <xrpld/app/main/Application.h>
#include <xrpld/rpc/detail/AssetCache.h>
#include <xrpld/rpc/detail/PathFindSpanNames.h>
#include <xrpld/rpc/detail/PathfinderUtils.h>
#include <xrpld/rpc/detail/RippleLineCache.h>
#include <xrpld/rpc/detail/TrustLine.h>
@@ -30,7 +29,6 @@
#include <xrpl/protocol/STPathSet.h>
#include <xrpl/protocol/TER.h>
#include <xrpl/protocol/UintTypes.h>
#include <xrpl/telemetry/SpanGuard.h>
#include <xrpl/tx/paths/RippleCalc.h>
#include <algorithm>
@@ -229,11 +227,6 @@ Pathfinder::Pathfinder(
bool
Pathfinder::findPaths(int searchLevel, std::function<bool(void)> const& continueCallback)
{
using namespace telemetry;
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::discover);
span.setAttribute(pathfind_span::attr::searchLevel, static_cast<int64_t>(searchLevel));
JLOG(j_.trace()) << "findPaths start";
if (mDstAmount == beast::zero)
{
@@ -444,11 +437,6 @@ Pathfinder::getPathLiquidity(
void
Pathfinder::computePathRanks(int maxPaths, std::function<bool(void)> const& continueCallback)
{
using namespace telemetry;
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::rank);
span.setAttribute(pathfind_span::attr::numPaths, static_cast<int64_t>(maxPaths));
mRemainingAmount = convertAmount(mDstAmount, convert_all_);
// Must subtract liquidity in default path from remaining amount.


@@ -185,7 +185,14 @@ callMethod(JsonContext& context, Method method, std::string const& name, Object&
JLOG(context.j.debug()) << "RPC call " << name << " completed in "
<< ((end - start).count() / 1000000000.0) << " seconds";
perfLog.rpcFinish(name, curId);
span.setAttribute(rpc_span::attr::rpcStatus, rpc_span::val::success);
// Status::operator bool() returns true when there IS an error
// (code_ != OK), so the ternary correctly maps error->error, ok->success.
span.setAttribute(
rpc_span::attr::rpcStatus,
ret ? std::string_view(rpc_span::val::error)
: std::string_view(rpc_span::val::success));
if (!ret)
span.setOk();
return ret;
}
catch (std::exception& e)
@@ -224,8 +231,11 @@ doCommand(RPC::JsonContext& context, Json::Value& result)
{
cmdName = "unknown";
}
auto span = SpanGuard::span(
TraceCategory::Rpc, rpc_span::prefix::command, rpc_span::val::unknownCommand);
// Use the resolved command name as the span suffix so dashboards
// can break out per-command error rates (e.g. rpc.command.submit
// for a submit that hit rpcTOO_BUSY). Fall back to the single
// "unknown" name only when the request omits both fields.
auto span = SpanGuard::span(TraceCategory::Rpc, rpc_span::prefix::command, cmdName);
span.setAttribute(rpc_span::attr::command, cmdName.c_str());
span.setError(get_error_info(error).token.c_str());
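The inverted-looking ternary in `callMethod` depends on a status type whose boolean conversion means "has an error". A small sketch of that convention, using a hypothetical `SketchStatus` type and the literal strings `"error"` and `"success"` as assumed attribute values:

```cpp
#include <string_view>

// Hypothetical miniature of a status type whose bool conversion is true
// when an error is present (code != 0), mirroring the comment in callMethod.
struct SketchStatus
{
    int code = 0;  // 0 stands for OK

    explicit operator bool() const
    {
        return code != 0;
    }
};

inline std::string_view
spanStatusFor(SketchStatus const& s)
{
    // true means error, so the "error" arm comes first.
    return s ? std::string_view("error") : std::string_view("success");
}
```

Reading the condition as "is there an error?" rather than "did it succeed?" is what makes the arm order correct.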


@@ -86,7 +86,7 @@
* gRPC path (see GrpcSpanNames.h for constants):
*
* +-------------------------------------------------------+
* | grpc.request |
* | grpc.<MethodName> (e.g. grpc.GetLedger) |
* | CallData::process(coro) |
* | attrs: method, grpc_role, grpc_status |
* +-------------------------------------------------------+


@@ -355,7 +355,7 @@ ServerHandler::onWSMessage(
Json::Value jvResult(Json::objectValue);
jvResult[jss::type] = jss::error;
jvResult[jss::error] = "jsonInvalid";
jvResult[jss::value] = buffers_to_string(buffers);
jvResult[jss::value] = ::xrpl::buffers_to_string(buffers);
boost::beast::multi_buffer sb;
Json::stream(jvResult, [&sb](auto const p, auto const n) {
sb.commit(boost::asio::buffer_copy(sb.prepare(n), boost::asio::buffer(p, n)));
@@ -564,6 +564,7 @@ ServerHandler::processSession(
jr[jss::api_version] = jv[jss::api_version];
jr[jss::type] = jss::response;
span.setOk();
return jr;
}
@@ -578,7 +579,7 @@ ServerHandler::processSession(
processRequest(
session->port(),
buffers_to_string(session->request().body().data()),
::xrpl::buffers_to_string(session->request().body().data()),
session->remoteAddress().at_port(0),
makeOutput(*session),
coro,
@@ -598,6 +599,7 @@ ServerHandler::processSession(
{
session->close(true);
}
span.setOk();
}
static Json::Value
@@ -1036,6 +1038,7 @@ ServerHandler::processRequest(
}
}
span.setOk();
HTTPReply(httpStatus, response, output, rpcJ);
}


@@ -18,8 +18,13 @@ Json::Value
doPathFind(RPC::JsonContext& context)
{
using namespace telemetry;
[[maybe_unused]] auto span = SpanGuard::span(
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::request);
if (auto const& src = context.params[jss::source_account]; src.isString())
span.setAttribute(pathfind_span::attr::sourceAccount, src.asString());
if (auto const& dst = context.params[jss::destination_account]; dst.isString())
span.setAttribute(pathfind_span::attr::destAccount, dst.asString());
if (context.app.config().PATH_SEARCH_MAX == 0)
return rpcError(rpcNOT_SUPPORTED);


@@ -26,8 +26,13 @@ Json::Value
doRipplePathFind(RPC::JsonContext& context)
{
using namespace telemetry;
[[maybe_unused]] auto span = SpanGuard::span(
auto span = SpanGuard::span(
TraceCategory::Rpc, pathfind_span::prefix::pathfind, pathfind_span::op::request);
if (auto const& src = context.params[jss::source_account]; src.isString())
span.setAttribute(pathfind_span::attr::sourceAccount, src.asString());
if (auto const& dst = context.params[jss::destination_account]; dst.isString())
span.setAttribute(pathfind_span::attr::destAccount, dst.asString());
if (context.app.config().PATH_SEARCH_MAX == 0)
return rpcError(rpcNOT_SUPPORTED);