Commit Graph

14172 Commits

Author SHA1 Message Date
Pratik Mankawde
fe7cb33b65 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-05-13 16:53:47 +01:00
Pratik Mankawde
f5cf4155c2 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-05-13 16:53:45 +01:00
Pratik Mankawde
ea30adeb47 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-05-13 16:53:43 +01:00
Pratik Mankawde
9bc2e4abb3 Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-05-13 16:53:32 +01:00
Pratik Mankawde
7b9e0bc300 fix(telemetry): remove unused includes from RPCHandler after node-health attr removal
NetworkOPs.h and SpanNames.h were only needed for per-span
nodeAmendmentBlocked/nodeServerState calls, which were removed
in the attr naming simplification. Fixes clang-tidy CI failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:53:19 +01:00
Pratik Mankawde
b05e650b6f docs(telemetry): update 09-data-collection-reference + Phase5 integration test list for simplified attr naming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:42:30 +01:00
Pratik Mankawde
57175ab12c Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-05-13 16:37:37 +01:00
Pratik Mankawde
d44a0aa3ff docs(telemetry): update Phase5 task list for simplified attr naming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:37:27 +01:00
Pratik Mankawde
522fe562ff Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-05-13 16:36:34 +01:00
Pratik Mankawde
745102360b docs(telemetry): update Phase4 task list for simplified consensus attr naming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:36:22 +01:00
Pratik Mankawde
19d9c44cf5 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-05-13 16:31:35 +01:00
Pratik Mankawde
5c14b57462 docs(telemetry): update Phase3 task list for simplified tx/txq attr naming
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:31:22 +01:00
Pratik Mankawde
c875944e05 Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-05-13 16:29:32 +01:00
Pratik Mankawde
2430032e3a docs(telemetry): update Phase2 task list + design docs for attr rename
- Phase2_taskList: update attr refs to bare names, note node-health
  attrs moved to resource level.
- 02-design-decisions: strip xrpl.pathfind.* prefix from planned attrs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:29:20 +01:00
Pratik Mankawde
0f63d14999 Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing 2026-05-13 16:28:07 +01:00
Pratik Mankawde
faaec003f4 docs(telemetry): update plan docs for simplified RPC/gRPC attr naming
Update OpenTelemetryPlan docs and Telemetry.h doc example to reflect
the renamed per-span attributes: xrpl.rpc.command -> command,
xrpl.rpc.status -> rpc_status, xrpl.grpc.method -> method, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:27:55 +01:00
Pratik Mankawde
9e27120a15 refactor(telemetry): simplify ledger/peer attr naming on phase-6, update dashboards
- Add canonical ledgerHash (xrpl.ledger.hash) to SpanNames.h.
- LedgerSpanNames: reuse shared canonicals (ledgerSeq, closeTime,
  closeTimeCorrect, closeResolutionMs, ledgerHash); bare names for
  tx_count, tx_failed, validations.
- PeerSpanNames: reuse shared canonicals (peerId, ledgerHash); bare
  names for proposal_trusted, validation_full, validation_trusted.
- Update call sites in BuildLedger.cpp, LedgerMaster.cpp, PeerImp.cpp.
- Update 5 Grafana dashboards: strip xrpl.<domain>. prefix from
  per-span attr refs in PromQL/TraceQL queries. Keep rule-5 entries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:16:30 +01:00
Pratik Mankawde
e60efd4d2f Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-05-13 16:10:46 +01:00
Pratik Mankawde
c48f5ed6e7 docs(telemetry): update runbook attr names for simplified naming convention
Update 31 attribute references in telemetry-runbook.md to match the
simplified naming: drop xrpl.<domain>. prefix on per-span attrs, use
domain-qualified names for collisions (rpc_status, consensus_state,
etc.), and unify cross-domain refs (xrpl.ledger.seq, xrpl.tx.hash).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:08:48 +01:00
Pratik Mankawde
c9fe4b1a14 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-05-13 16:04:27 +01:00
Pratik Mankawde
46d1012ad4 refactor(telemetry): simplify consensus attr naming on phase-4 — drop xrpl.consensus. prefix
- Add canonical shared bare attrs to SpanNames.h: closeTime,
  closeTimeCorrect, closeResolutionMs (reused by ledger domain).
- Keep qualified (rule 5): ledgerId, mode, round, roundId.
- Domain-qualify collisions: state -> consensus_state,
  result -> consensus_result.
- Reuse canonical ledgerSeq from phase-3.
- Drop xrpl.consensus.* prefix from 20+ attrs (proposers, round_time_ms,
  converge_percent, avalanche_threshold, etc.).
- Dispute attrs: bare names (dispute_our_vote, dispute_yays, etc.).
- Update call sites in RCLConsensus.cpp, Consensus.h.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:04:16 +01:00
Pratik Mankawde
7eeddd3ad9 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-05-13 16:01:13 +01:00
Pratik Mankawde
e339ba1f6b refactor(telemetry): simplify tx/txq attr naming on phase-3 — drop xrpl.<domain>. prefix
- Add canonical shared attrs to SpanNames.h: txHash (xrpl.tx.hash),
  peerId (xrpl.peer.id), ledgerSeq (xrpl.ledger.seq).
- Drop xrpl.tx.* prefix: local, path, suppressed, peer_version.
- Domain-qualify: status -> tx_status, txq status -> txq_status.
- TxQ: tx_hash -> reuse canonical txHash, ledger_seq -> reuse canonical
  ledgerSeq; bare names for fee_level_paid, required_fee_level, etc.
- Update call sites in PeerImp.cpp, NetworkOPs.cpp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:01:00 +01:00
Pratik Mankawde
ac1b01b4c7 Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-05-13 15:57:45 +01:00
Pratik Mankawde
497dd007d9 refactor(telemetry): simplify attr naming on phase-2 — drop xrpl.pathfind. prefix
- Drop xrpl.pathfind.* prefix from per-span attrs (source_account,
  dest_account, fast, search_level, num_complete_paths, num_paths,
  num_requests).
- Keep xrpl.pathfind.ledger_index qualified (rule 5: distinct from
  xrpl.ledger.seq).
- Remove per-span nodeAmendmentBlocked/nodeServerState calls from
  RPCHandler — promoted to resource-level attrs.
- Mark node-health attrs in SpanNames.h as RESOURCE-ONLY with doc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 15:57:36 +01:00
Pratik Mankawde
0d845149ec Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing 2026-05-13 15:55:39 +01:00
Pratik Mankawde
7a854ccad2 refactor(telemetry): simplify attr naming on phase-1c — drop xrpl.<domain>. prefix
- Drop xrpl.rpc.* prefix from per-span attrs (command, version).
- Qualify collision-prone fields: role -> rpc_role/grpc_role,
  status -> rpc_status/grpc_status.
- Rename payload_size -> request_payload_size for cross-domain clarity.
- Simplify link.type -> link_type (bare name, no join).
- Update convention doc in SpanNames.h to reflect new naming rules.
- Update telemetry.md doc with renamed attr keys.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 15:54:13 +01:00
Pratik Mankawde
580ee5ede7 fix(telemetry): StatsD gauge and io_latency first-sample emit
Two fixes so gauges register in Prometheus (via StatsD) even when their
initial/steady-state value is 0:

1. StatsDGaugeImpl m_dirty: default-init to true so the initial value
   (0) is emitted on the first flush. Previously, gauges whose value
   never changed from 0 were never flushed and never appeared
   downstream.

2. io_latency_sampler firstSample_: new atomic<bool>, init true.
   m_event.notify now fires when either firstSample_ is true (exchanged
   to false) or lastSample >= 10 ms. This guarantees the io_latency
   metric is registered on startup; subsequent sub-10 ms samples are
   still suppressed to avoid flooding.
2026-05-13 14:40:58 +01:00
Pratik Mankawde
937d11d7c3 fix(telemetry): default tx span attrs on receive path
Set defaults for tx_span::attr::suppressed (false) and
tx_span::attr::status ("new") immediately after creating the txReceive
span. Without defaults, spans whose suppressed/status attributes would
only be set in the HashRouter-suppressed branch lacked these attributes
entirely, producing incomplete span data in downstream stores.

The suppressed branch still overrides these when the transaction has
already been seen via HashRouter.
2026-05-13 14:40:57 +01:00
Pratik Mankawde
beaf01ae4d fix(telemetry): fix CI failures in phase-6 build, clang-tidy, and rename checks
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
  validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture

Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
  PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp

Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
  09-data-collection-reference.md and telemetry-runbook.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 17:09:17 +01:00
Pratik Mankawde
57ed0d9fd0 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-04-29 19:59:02 +01:00
Pratik Mankawde
51918ef868 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-04-29 19:58:54 +01:00
Pratik Mankawde
c8674d61b8 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing
# Conflicts:
#	src/xrpld/app/consensus/RCLConsensus.cpp
2026-04-29 19:58:45 +01:00
Pratik Mankawde
cf18032e7f fix(telemetry): address clang-tidy errors on phase3 transaction tracing files
- Add [[maybe_unused]] to RAII span variables in TxQ.cpp
- Remove unused st.h include, add missing to_string header in TxQ.cpp
- Concatenate nested namespaces in TxQSpanNames.h, TxSpanNames.h,
  ConsensusReceiveTracing.h, PropagationHelpers.h, TxTracing.h
- Remove unused TraceContextPropagator.h include from RCLConsensus.cpp

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 19:57:42 +01:00
Pratik Mankawde
f18ddd95c1 fix(telemetry): address clang-tidy errors on phase4 consensus tracing files
- Add [[nodiscard]] to getConsensusTraceStrategy, getYays, getNays
- Add missing <string>, SpanGuard.h, SpanNames.h includes
- Fix widening cast placement (cast before arithmetic, not after)
- Replace nested ternary with lambda for const dir variable
- Add braces to if/else-if chains in Consensus.h
- Concatenate nested namespaces in ConsensusSpanNames.h

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 19:54:23 +01:00
Pratik Mankawde
e6266e4e8d Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-04-29 18:20:23 +01:00
Pratik Mankawde
025620cc4e Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-04-29 18:20:19 +01:00
Pratik Mankawde
692ce65f3e Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-04-29 18:20:04 +01:00
Pratik Mankawde
bb3732c22f Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-04-29 18:19:58 +01:00
Pratik Mankawde
87a8780b73 Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing
# Conflicts:
#	src/xrpld/rpc/detail/RPCHandler.cpp
2026-04-29 18:18:37 +01:00
Pratik Mankawde
78522ba18e Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration 2026-04-29 18:17:02 +01:00
Pratik Mankawde
79fbb9c303 fix(telemetry): address clang-tidy errors on phase1c RPC integration files
- Concatenate nested namespaces in SpanNames.h, RpcSpanNames.h, GrpcSpanNames.h
- Remove unused InfoSub.h and NetworkOPs.h includes from RPCHandler.cpp
- Add missing <string_view> includes in RPCHandler.cpp and GRPCServer.cpp
- Replace nested ternary with if/else-if in RPCHandler.cpp
- Add IWYU pragma keep for json_body.h in ServerHandler.cpp

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 18:16:24 +01:00
Pratik Mankawde
e21e7b0d51 fix(telemetry): add missing PublicKey.h include for toBase58 in Application.cpp
Clang-tidy misc-include-cleaner requires direct includes for all used
symbols. Application.cpp calls toBase58(TokenType::NodePublic, ...) at
line 1359 but did not directly include PublicKey.h.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:51:35 +01:00
Pratik Mankawde
3dd2f34591 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
# Conflicts:
#	OpenTelemetryPlan/Phase3_taskList.md
#	docker/telemetry/grafana/provisioning/datasources/tempo.yaml
#	docs/telemetry-runbook.md
#	include/xrpl/proto/xrpl.proto
#	src/xrpld/app/consensus/RCLConsensus.cpp
#	src/xrpld/app/misc/detail/TxQ.cpp
2026-04-29 17:38:03 +01:00
Pratik Mankawde
521e0756e1 docs(telemetry): add cross-node trace propagation to runbook
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:33:10 +01:00
Pratik Mankawde
dbcd040180 fix(telemetry): fix Clang unused-variable and incomplete-type errors
- Add [[maybe_unused]] to RAII spans in TxQ.cpp
- Include Telemetry.h in RCLConsensus.cpp for complete type

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
17e69e660c feat(telemetry): add toDisplayString() and use Title Case in consensus attributes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
ef10c754b1 fix(telemetry): address code review findings for Phase 4 consensus tracing
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
912890c104 fix: address PR review round 2 — event name constants, span timing
- Add cons_span::event namespace with disputeResolve and txIncluded
  constants; replace hardcoded strings in Consensus.h and RCLConsensus.cpp
- Move proposal.receive and validation.receive spans in PeerImp into
  shared_ptr captured by job lambdas so they measure checkPropose and
  checkValidation timing, not just message parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
ac68091bec code review changes
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-04-29 17:32:56 +01:00