Commit Graph

14010 Commits

Author SHA1 Message Date
Pratik Mankawde
497dd007d9 refactor(telemetry): simplify attr naming on phase-2 — drop xrpl.pathfind. prefix
- Drop xrpl.pathfind.* prefix from per-span attrs (source_account,
  dest_account, fast, search_level, num_complete_paths, num_paths,
  num_requests).
- Keep xrpl.pathfind.ledger_index qualified (rule 5: distinct from
  xrpl.ledger.seq).
- Remove per-span nodeAmendmentBlocked/nodeServerState calls from
  RPCHandler — promoted to resource-level attrs.
- Mark node-health attrs in SpanNames.h as RESOURCE-ONLY with doc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 15:57:36 +01:00
Pratik Mankawde
0d845149ec Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing 2026-05-13 15:55:39 +01:00
Pratik Mankawde
7a854ccad2 refactor(telemetry): simplify attr naming on phase-1c — drop xrpl.<domain>. prefix
- Drop xrpl.rpc.* prefix from per-span attrs (command, version).
- Qualify collision-prone fields: role -> rpc_role/grpc_role,
  status -> rpc_status/grpc_status.
- Rename payload_size -> request_payload_size for cross-domain clarity.
- Simplify link.type -> link_type (bare name, no join).
- Update convention doc in SpanNames.h to reflect new naming rules.
- Update telemetry.md doc with renamed attr keys.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 15:54:13 +01:00
Pratik Mankawde
87a8780b73 Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing
# Conflicts:
#	src/xrpld/rpc/detail/RPCHandler.cpp
2026-04-29 18:18:37 +01:00
Pratik Mankawde
78522ba18e Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration 2026-04-29 18:17:02 +01:00
Pratik Mankawde
79fbb9c303 fix(telemetry): address clang-tidy errors on phase1c RPC integration files
- Concatenate nested namespaces in SpanNames.h, RpcSpanNames.h, GrpcSpanNames.h
- Remove unused InfoSub.h and NetworkOPs.h includes from RPCHandler.cpp
- Add missing <string_view> includes in RPCHandler.cpp and GRPCServer.cpp
- Replace nested ternary with if/else-if in RPCHandler.cpp
- Add IWYU pragma keep for json_body.h in ServerHandler.cpp

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 18:16:24 +01:00
Pratik Mankawde
e21e7b0d51 fix(telemetry): add missing PublicKey.h include for toBase58 in Application.cpp
Clang-tidy misc-include-cleaner requires direct includes for all used
symbols. Application.cpp calls toBase58(TokenType::NodePublic, ...) at
line 1359 but did not directly include PublicKey.h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:51:35 +01:00
Pratik Mankawde
3ed22580fe fix(telemetry): address remaining clang-tidy and cspell CI failures
- Add "hicpp" to cspell dictionary for NOLINT annotations
- Concatenate nested namespaces in RpcSpanNames.h
- Fix include hygiene and nested ternary in RPCHandler.cpp

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:31:58 +01:00
Pratik Mankawde
20fabbc0ec fix(telemetry): resolve Clang build, clang-tidy, and rename CI failures
- Add [[maybe_unused]] to RAII span variables in PathFind/RipplePathFind
  handlers (Clang -Wunused-variable with -Werror)
- Restore over-renamed values: rippledb, rippled.cfg, historical GitHub URL
- Concatenate nested namespaces in SpanNames.h and PathFindSpanNames.h
  (modernize-concat-nested-namespaces)
- Add missing includes and const qualifiers in test files
- Suppress intentional use-after-move in SpanGuardFactory move test
- Remove unused NetworkOPs.h include from PathRequest.cpp

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:02:55 +01:00
Pratik Mankawde
831be14fd9 Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing 2026-04-29 13:01:14 +01:00
Pratik Mankawde
019e84f0d2 Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration 2026-04-29 13:01:08 +01:00
Pratik Mankawde
694abe2004 docs(telemetry): add thread-safety comments to stop() and sdkProvider_.reset()
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 13:00:39 +01:00
Pratik Mankawde
e07391fe78 chore: apply rename script (rippled → xrpld)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 12:25:03 +01:00
Pratik Mankawde
9515177e29 docs(telemetry): add missing config options to xrpld-example.cfg
Document service_instance_id, use_tls, tls_ca_cert, batch_size,
batch_delay_ms, and max_queue_size in the [telemetry] section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 12:20:18 +01:00
Pratik Mankawde
bd6e58a20e fix(telemetry): add missing span constants, fix test API, update levelization
Add unknownCommand and wsUpgrade span name constants to RpcSpanNames.h,
fix SpanGuardFactory tests to use the 3-argument SpanGuard::span() API,
update levelization results, and apply rename script to docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 11:21:13 +01:00
Pratik Mankawde
e8c826c816 Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing 2026-04-29 11:17:19 +01:00
Pratik Mankawde
d753191d20 Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration 2026-04-29 11:16:51 +01:00
Pratik Mankawde
d4e91b462e fix(telemetry): resolve clang-tidy warnings in Telemetry interfaces
Use C++17 concatenated namespaces, add [[nodiscard]] to query methods,
add missing direct includes, and use pass-by-value + std::move in
NullTelemetry constructor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 11:16:21 +01:00
Pratik Mankawde
fa2736277f Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing 2026-04-28 15:07:17 +01:00
Pratik Mankawde
196c309d1d Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration
# Conflicts:
#	src/libxrpl/telemetry/Telemetry.cpp
2026-04-28 15:07:07 +01:00
Pratik Mankawde
d46d015fd5 fix(telemetry): fix include ordering in PathFind span files
Sort PathFindSpanNames.h after AssetCache.h alphabetically in
PathRequestManager.cpp and Pathfinder.cpp to satisfy the project's
include-order convention enforced by pre-commit hooks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 15:05:57 +01:00
Pratik Mankawde
999bf83f15 fix(telemetry): fix SpanGuard.cpp include ordering
Move SpanGuard.h (associated header) to first include position,
separated by blank line from other project includes, per the
project's include-order convention enforced by pre-commit hooks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 15:05:03 +01:00
Pratik Mankawde
96470e0c8d fix(telemetry): fix include ordering and markdown table formatting
Move Telemetry.h (associated header) to first include position in
Telemetry.cpp per the project's include-order convention. Trim
trailing whitespace from POC_taskList.md markdown table columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 15:04:09 +01:00
Pratik Mankawde
ed8164d502 docs(telemetry): add Task 2.9 PathFind instrumentation to Phase 2 task list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
682d7a8d76 feat(telemetry): add PathFind tracing with 5 spans (Tasks 2.9/2.10)
Instrument the path finding subsystem with full span coverage:

- pathfind.request: wraps doPathFind() and doRipplePathFind() RPC handlers
- pathfind.compute: wraps PathRequest::doUpdate() with fast/normal attr
- pathfind.update_all: wraps PathRequestManager::updateAll() on ledger
  close with ledger_index attr
- pathfind.discover: wraps Pathfinder::findPaths() graph exploration
  with search_level attr
- pathfind.rank: wraps Pathfinder::computePathRanks() liquidity
  validation with num_paths attr

New file: PathFindSpanNames.h with compile-time constants following
the StaticStr/join() pattern from Phase 1c.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
eb51457e69 fix(telemetry): address Phase 2 code review findings
- Move node health attribute strings to compile-time constants in
  SpanNames.h (attr::nodeAmendmentBlocked, attr::nodeServerState)
- Add Tempo search filters for node health attributes
- Remove unnecessary .c_str() on strOperatingMode() return
- Add samplingRatio clamping test (values > 1.0 and < 0.0)
- Fix Task 2.3 status: delivered in Phase 1c, not Phase 2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
65817c4c57 fix(telemetry): align TelemetryConfig tests with current API
- serviceName default is "xrpld" not "rippled"
- Remove references to nonexistent exporterType field
- Pass networkId (4th param) to setup_Telemetry()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
9bc8cc6b4e docs(telemetry): update Phase2 task list to reflect actual implementation
Mark deferred tasks (2.1→Phase 3, 2.5→low priority) with rationale.
Mark superseded tasks (2.2→Phase 1c SpanGuard factory). Add Task 2.7
for Grafana search filters. Update summary table with status column.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
832648c351 feat(telemetry): add RPC trace filters and SpanGuard unit tests
- Grafana Tempo datasource: add rpc-command, rpc-status, rpc-role
  search filters for the Explore UI
- Unit tests: TelemetryConfig (config parsing defaults and sections),
  SpanGuardFactory (null guard safety, move semantics, discard, all
  factory methods)
- Test CMake registration with optional OTel linking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
21b58a8885 feat(telemetry): add node health attributes to RPC spans (Task 2.8)
Add amendment_blocked and server_state span attributes to every
rpc.command.* span so operators can correlate RPC behavior with node state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
a9ee819ea1 docs(telemetry): add Phase 2-5 task lists and appendix update
Introduces task list documents for Phases 2 through 5, with Tempo
references (replacing Jaeger) and Task 2.8 dashboard parity spec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
736579e473 refactor(telemetry): extract span name constants into modular headers
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:07 +01:00
Pratik Mankawde
3b93e2d4d9 fix(telemetry): suppress unused span warning and regenerate levelization
- Add [[maybe_unused]] to the RAII span in processSession() — the
  variable is not read but its lifetime scopes the active OTel context
  for child spans created in processRequest()
- Regenerate levelization: remove premature xrpld.telemetry entries
  that reference a module not yet present on this branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
ac9bd2c055 fix(telemetry): use span name constants and fix cardinality risk
- Use grpc_span::val::resourceExhausted constant instead of raw
  "resource_exhausted" string in GRPCServer.cpp
- Fix unbounded span name cardinality in RPCHandler.cpp error path:
  use fixed rpc_span::val::unknownCommand as span name instead of
  user-supplied cmdName (attacker-controlled input). The actual
  command is still captured in the xrpl.rpc.command attribute.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
4124762343 fix(telemetry): pass name_ through CallData::clone()
Without this, cloned CallData instances (created for the next incoming
gRPC request) would have an empty name_, making subsequent span attrs
blank.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
ea8600e204 feat(telemetry): instrument missing critical/medium RPC span paths
Add spans to previously uninstrumented error and validation paths:

- gRPC: span in CallData::process(coro) with method name attribute,
  covers all 4 gRPC endpoints (GetLedger, GetLedgerData, etc.)
- WebSocket parse errors: span in onWSMessage() for invalid JSON
- WebSocket upgrade failures: span in onHandoff() try/catch
- Command dispatch rejections: span in doCommand() when fillHandler()
  fails (unknown command, too busy, permission denied)

New files: GrpcSpanNames.h (gRPC span constants)
Modified: GRPCServer.h (name_ member), RpcSpanNames.h (wsUpgrade op,
updated coverage diagram)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
895e9167b0 docs(telemetry): replace text hierarchy with ASCII box diagrams
Follow project convention (PerfLog.h, SpanGuard.h) for documentation
diagrams. Show HTTP single, HTTP batch, and WebSocket span nesting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
d15d2d2df6 docs(telemetry): add RPC span coverage map to RpcSpanNames.h
Document the span hierarchy, covered paths, and known instrumentation
gaps directly in the header that developers reference when adding spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
75bcd4ff53 refactor(telemetry): extract span name constants into modular headers
Centralise scattered string literals into compile-time constants using
StaticStr<N> and join() for dot-separated composition. Shared primitives
live in SpanNames.h; RPC-specific names in RpcSpanNames.h. Future modules
(consensus, peer, ledger) add their own *SpanNames.h without bloating
the central header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
a73117ddd0 refactor(telemetry): update RPC call sites to TraceCategory API
Replace rpcSpan(fullName) calls with span(TraceCategory::Rpc, prefix,
name). Add 'using namespace telemetry' to both RPC files so call sites
read cleanly without repeated namespace qualifiers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
9e4d943c69 feat(telemetry): replace tracing macros with SpanGuard factory pattern
Delete TracingInstrumentation.h and replace all XRPL_TRACE_* macro
invocations with direct SpanGuard::rpcSpan() calls. SpanGuard's pimpl
design and global Telemetry accessor eliminate the need for macro
wrappers and explicit Telemetry instance passing at call sites.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
025a8a344b fix(telemetry): address Phase 1c code review findings
TracingInstrumentation.h:
- Unify all span-creation macros to use std::optional<SpanGuard>
  (fixes type mismatch between XRPL_TRACE_SPAN and SET_ATTR)
- Wrap XRPL_TRACE_SET_ATTR/EXCEPTION in do-while(0) (dangling-else)
- Move macros outside namespace blocks (macros are global)
- Cache telemetry reference to avoid double-evaluation
- Remove leaked _xrpl_span_ intermediate variable
- Add @note tags for thread safety, scope, and usage constraints
- Add 3 usage examples per CLAUDE.md requirements

ServerHandler.cpp:
- Remove misleading rpc.request span from onRequest() (span ended
  before coroutine runs, producing orphan spans)
- Add rpc.http_request span to HTTP processSession() (runs inside
  the coroutine, correct parent for rpc.process/rpc.command spans)
- Add XRPL_TRACE_EXCEPTION and error status in both catch blocks
  (WS processSession and processRequest)

SpanGuard.h:
- Add null guards to all mutating methods (setOk, setStatus,
  setAttribute, addEvent, recordException) for safety after discard()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
9ee9e566d4 removed presentation.md from root
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
0de807b1be Phase 1c: RPC integration - ServerHandler tracing, telemetry config wiring
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
59ee027d8a fix(telemetry): resolve clang-tidy warnings in SpanGuard.h
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
- Add [[nodiscard]] to factory and accessor methods
- NOLINT no-op stub instance methods that must stay non-static for API
  parity with the real implementation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:26:05 +01:00
Pratik Mankawde
7aa4486741 refactor(telemetry): remove unused SpanGuard::span(name) overload
Remove the single-arg span(name) factory that creates unconditional
spans without category gating. All call sites use the 3-arg
span(TraceCategory, prefix, name) variant which checks whether the
category is enabled in config before creating a span. The 1-arg form
was dead code with no callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:26:05 +01:00
Pratik Mankawde
5e8277f36a docs(telemetry): fix doc references to match pimpl architecture
Replace references to non-existent TracingInstrumentation.h with
SpanGuard.cpp pimpl implementation that actually exists on this branch.
Update conditional compilation section to describe the pimpl approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:26:05 +01:00
Pratik Mankawde
573593ae31 refactor(telemetry): replace per-category factory methods with TraceCategory enum
Replace rpcSpan(), txSpan(), consensusSpan(), peerSpan(), ledgerSpan()
with a single span(TraceCategory, prefix, name) factory method. Adding
a new traceable subsystem now requires only a new enum value and one
switch case — no new methods or header changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:26:05 +01:00
Pratik Mankawde
a5c405f4be fix(telemetry): address Phase 1b code review findings
- SpanContext::isValid(): add inline no-op when XRPL_ENABLE_TELEMETRY
  is not defined, preventing a linker error if called in that path
- linkedSpan(): set kIsRootSpanKey on the StartSpanOptions parent
  context so linked spans start a genuinely independent sub-tree
  instead of silently becoming children of the current active span
- Telemetry::instance_: use std::atomic with acquire/release ordering
  to avoid a data race between start()/stop() and factory methods
  called from worker threads

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:26:05 +01:00
Pratik Mankawde
e9c5c3520e fix(telemetry): address Phase 1b code review findings
Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types
from public headers. Add global Telemetry accessor so SpanGuard factory
methods work without explicit Telemetry references. Add child/linked
span creation and cross-thread context propagation. Update plan docs
to reflect macro removal in favor of SpanGuard factory pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:26:05 +01:00