fix(telemetry): use resolved command/method name as span suffix

Per PR #6438 review thread r3250432621: known-command errors
(rpcTOO_BUSY, rpcNO_PERMISSION, etc.) were collapsing into a
single rpc.command.unknown span name, hiding per-command error
rates in dashboards. Same anti-pattern existed for gRPC, where
every method was bucketed under grpc.request with the method
relegated to an attribute.

- RPCHandler.cpp: doCommand error path uses cmdName as the span
  suffix; the rpc_span::val::unknownCommand fallback only applies
  when the request truly omits both command and method fields.
- GRPCServer.cpp: gRPC span name is now grpc.<MethodName>
  (e.g. grpc.GetLedger). Method also retained as an attribute.
- GrpcSpanNames.h: drop the unused op::request constant; update
  the span-hierarchy comment.
- RpcSpanNames.h: update the gRPC span diagram to match.

Dashboards on downstream phases will benefit from per-command
breakdowns without needing TraceQL attribute filters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Pratik Mankawde
2026-05-27 18:26:29 +01:00
parent f6f0cb1a5f
commit bfdcd3da87
4 changed files with 14 additions and 14 deletions

View File

@@ -169,8 +169,7 @@ void
GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::Coro> coro)
{
using namespace telemetry;
auto span =
SpanGuard::span(TraceCategory::Rpc, grpc_span::prefix::grpc, grpc_span::op::request);
auto span = SpanGuard::span(TraceCategory::Rpc, grpc_span::prefix::grpc, name_);
span.setAttribute(grpc_span::attr::method, name_);
try

View File

@@ -9,13 +9,16 @@
* Span hierarchy:
*
* +-------------------------------------------------------+
* | grpc.request |
* | grpc.<MethodName> (e.g. grpc.GetLedger) |
* | CallData::process(coro) |
* | attrs: method, grpc_role, grpc_status |
* +-------------------------------------------------------+
*
* Unlike the HTTP/WS RPC path, gRPC has a flat single-span structure
* per request since each CallData handles exactly one RPC method.
* The method name is embedded in the span name (rather than only as
* an attribute) so dashboards can break out per-method latency and
* error rates without needing TraceQL attribute filters.
*/
#include <xrpl/telemetry/SpanNames.h>
@@ -25,16 +28,11 @@ namespace xrpl::telemetry::grpc_span {
// ===== Span prefixes =======================================================
namespace prefix {
/// "grpc" — root prefix for gRPC transport spans.
/// "grpc" — root prefix for gRPC transport spans. The full span name is
/// formed at the call site as `grpc.<MethodName>` (see GRPCServer.cpp).
inline constexpr auto grpc = makeStr("grpc");
} // namespace prefix
// ===== Span operation suffixes =============================================
namespace op {
inline constexpr auto request = makeStr("request");
} // namespace op
// ===== Attribute keys ======================================================
namespace attr {

View File

@@ -229,8 +229,11 @@ doCommand(RPC::JsonContext& context, Json::Value& result)
{
cmdName = "unknown";
}
auto span = SpanGuard::span(
TraceCategory::Rpc, rpc_span::prefix::command, rpc_span::val::unknownCommand);
// Use the resolved command name as the span suffix so dashboards
// can break out per-command error rates (e.g. rpc.command.submit
// for a submit that hit rpcTOO_BUSY). Falling back to a single
// "unknown" name only when the request truly omits both fields.
auto span = SpanGuard::span(TraceCategory::Rpc, rpc_span::prefix::command, cmdName);
span.setAttribute(rpc_span::attr::command, cmdName.c_str());
span.setError(get_error_info(error).token.c_str());

View File

@@ -86,9 +86,9 @@
* gRPC path (see GrpcSpanNames.h for constants):
*
* +-------------------------------------------------------+
* | grpc.request |
* | grpc.<MethodName> (e.g. grpc.GetLedger) |
* | CallData::process(coro) |
* | attrs: method, grpc_status |
* | attrs: method, grpc_status |
* +-------------------------------------------------------+
*
* Covered paths: