Commit Graph

121 Commits

Author SHA1 Message Date
Pratik Mankawde
4d6ddb5f1f Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-01 14:56:09 +01:00
Pratik Mankawde
e7dea147cd Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 18:18:36 +01:00
Pratik Mankawde
8d730b8b9a Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 18:16:35 +01:00
Pratik Mankawde
2f96c6547c Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 16:51:31 +01:00
Pratik Mankawde
c187a62353 Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 16:47:15 +01:00
Pratik Mankawde
c848e51e13 Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 16:44:07 +01:00
Pratik Mankawde
8f9057729c Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 16:14:21 +01:00
Pratik Mankawde
f031befc6e compilation fixes and levelization fixes
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 16:04:19 +01:00
Pratik Mankawde
3a1f22583f Merge branch 'pratik/otel-phase1a-plan-docs' into pratik/otel-phase1b-telemetry-infra
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 15:34:22 +01:00
Ayaz Salikhov
f9551ac5ca style: Run shfmt on workflows, actions and markdown bash code (#7333) 2026-05-27 19:24:18 +00:00
Ayaz Salikhov
23d0812827 style: Use shfmt instead of bashate (#7326) 2026-05-26 18:28:23 +00:00
Pratik Mankawde
e9d885bd9b fix: Fix clang-tidy pre-commit hook to locate compile_commands.json from repo root (#7325)
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-26 13:50:18 +00:00
Michael Legleux
93836f22db ci: Add Linux package builds (DEB + RPM) to CI (#6639) 2026-05-16 05:08:37 +00:00
Alex Kremer
5b6e8b6f93 refactor: Rename static constants (#7120)
Co-authored-by: Bart <bthomee@users.noreply.github.com>
2026-05-15 15:32:19 +00:00
Pratik Mankawde
25d2dae798 fix(tests): align MockServiceRegistry overrides with ServiceRegistry interface
MockServiceRegistry in MetricsRegistry.cpp still used the old method
names (timeKeeper, cachedSLEs, validators, overlay, cluster, app, etc.)
while ServiceRegistry has been standardized on getXxx()/isXxx() forms.
Windows CI caught this as C3668 "did not override any base class methods"
errors and C2259 "cannot instantiate abstract class".

Rename all 13 mismatched overrides to match the current interface:
  timeKeeper          -> getTimeKeeper
  cachedSLEs          -> getCachedSLEs
  validators          -> getValidators
  validatorSites      -> getValidatorSites
  validatorManifests  -> getValidatorManifests
  publisherManifests  -> getPublisherManifests
  overlay             -> getOverlay
  cluster             -> getCluster
  peerReservations    -> getPeerReservations
  pendingSaves        -> getPendingSaves
  openLedger (x2)     -> getOpenLedger
  getPathRequests     -> getPathRequestManager (type rename too)
  journal             -> getJournal
  logs                -> getLogs
  trapTxID            -> getTrapTxID
  app                 -> getApp

Also regenerate levelization ordering.txt to reflect the new
tests.libxrpl -> xrpl.core edge introduced by ServiceRegistry.h include.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 12:12:59 +01:00
Pratik Mankawde
93a7df6147 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-05-13 11:31:16 +01:00
Pratik Mankawde
c096eeb239 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-05-13 11:30:22 +01:00
Bronek Kozicki
d67e06102a chore: Upgrade Clang sanitizer to clang-22 and switch gcc-15 sanitizer to Release (#7079) 2026-05-07 10:36:36 +00:00
Pratik Mankawde
92072ecca4 fix(telemetry): fix CI failures — clang-tidy, levelization, linker
Clang-tidy fixes:
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
  in OTelCollector.h, OTelCollector.cpp, ValidationTracker.h/.cpp
- Add missing direct includes (misc-include-cleaner) in
  ValidationTracker.cpp, test, CollectorManager.cpp, OTelCollector.cpp
- Make lock_guard variables const (misc-const-correctness)
- Add braces around single-line if/else (readability-braces-around-statements)
- Use designated initializer for WindowEvent (modernize-use-designated-initializers)
- Initialize LedgerEvent::seq field (cppcoreguidelines-pro-type-member-init)

Linker fix:
- Add ValidationTracker.cpp as source to xrpl.test.telemetry target
  (it lives in src/xrpld/ but the test links against libxrpl only)

Levelization fix:
- Remove stale dependency edges from ordering.txt that were introduced
  by the erroneous develop-merge commit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:07:25 +01:00
Pratik Mankawde
b62987bda7 Revert "fix: revert all unrelated upstream develop changes from phase-7 PR"
This reverts commit f4555c80fe.
2026-05-06 14:13:59 +01:00
Pratik Mankawde
f08412b3e0 ordering update
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-06 13:27:29 +01:00
Alex Kremer
8995564ed6 refactor: Enable clang-tidy readability-identifier-naming check (#6571) 2026-05-03 10:31:53 +00:00
Pratik Mankawde
40cc7f9ed7 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-04-30 17:00:39 +01:00
Pratik Mankawde
f4555c80fe fix: revert all unrelated upstream develop changes from phase-7 PR
Reverts 259 files that carried unrelated upstream changes through the
phase-6 merge: enum class removals (cppcoreguidelines-use-enum-class),
scoped_lock→lock_guard conversions (modernize-use-scoped-lock),
nodestore Backend API changes (void const* key), .clang-tidy config,
test infrastructure deletions, and miscellaneous develop changes.

These changes belong on develop, not in the telemetry PR chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 16:59:24 +01:00
Pratik Mankawde
9e12e660fe Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:25:13 +01:00
Pratik Mankawde
b65f91117f fix: address CI checks (prettier, docs.sh rename, levelization)
- Prettier formatting for markdown docs and OTelCollector header
- docs.sh rippled→xrpld renames in OTelCollector.cpp comments/strings
- Updated levelization ordering with new dependency edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:03:22 +01:00
Pratik Mankawde
ef10c754b1 fix(telemetry): address code review findings for Phase 4 consensus tracing
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
5f7de1bb48 feat(telemetry): complete Phase 4 consensus tracing
Implement remaining Phase 4/4a consensus tracing tasks:

- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()

Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
8fb33b0818 feat(telemetry): add Phase 4 consensus tracing with SpanGuard API
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.

Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:

- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
  (consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
  (used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
  following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
  for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
  consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
  from consensus thread to jtACCEPT worker thread

Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 17:32:56 +01:00
Pratik Mankawde
61cb1faf8f feat(telemetry): add cross-node trace context propagation
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.

- Add TraceBytes struct and SpanGuard::getTraceBytes() for
  extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
  validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
  with sender's trace context as parent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:49 +01:00
Pratik Mankawde
5cbb349efa fix(telemetry): fix include ordering, levelization, and rename for phase 3
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:32:49 +01:00
Pratik Mankawde
3508917f17 feat(telemetry): Phase 3 transaction tracing with protobuf context propagation
- TraceContext protobuf message for cross-node trace propagation
  (added to TMTransaction, TMProposeSet, TMValidation at field 1001)
- TraceContextPropagator.h: inline extractFromProtobuf/injectToProtobuf
- PeerImp::handleTransaction: tx.receive span with peer.id, peer.version,
  tx.hash, tx.suppressed, tx.status attributes
- NetworkOPsImp::processTransaction: tx.process span with tx.hash,
  tx.local, tx.path attributes
- Tempo search filters for tx.hash, tx.local, tx.status
- Unit tests for TraceContextPropagator (round-trip, edge cases)
- Levelization: xrpld.app/overlay > xrpld.telemetry dependencies

Translated from macro API (XRPL_TRACE_TX/SET_ATTR) to SpanGuard factory
pattern introduced in Phase 1c.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 17:32:49 +01:00
Pratik Mankawde
ac11217195 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment
# Conflicts:
#	OpenTelemetryPlan/Phase3_taskList.md
#	include/xrpl/telemetry/TraceContextPropagator.h
2026-04-29 14:24:38 +01:00
Pratik Mankawde
103dd605d2 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing
# Conflicts:
#	include/xrpl/telemetry/SpanGuard.h
#	src/xrpld/overlay/detail/PeerImp.cpp
2026-04-29 14:23:31 +01:00
Pratik Mankawde
12b7316f71 feat(telemetry): add cross-node trace context propagation
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.

- Add TraceBytes struct and SpanGuard::getTraceBytes() for
  extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
  validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
  with sender's trace context as parent

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 14:21:32 +01:00
Pratik Mankawde
21dad9a17d docs(telemetry): sync runbook, dashboards, and configs with code
- Add 14 missing spans to runbook (6 TxQ + 8 consensus)
- Fix tx.receive attributes and config table in runbook
- Document dispute.resolve and tx.included span events
- Add spanmetrics dimensions for close_time_correct and tx.suppressed
- Fix Close Time Agreement and TX Receive vs Suppressed panel PromQL
- Wire $consensus_mode template variable to all consensus panels
- Add 10 Tempo search filters for operational attributes
- Apply rename script artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 12:29:53 +01:00
Pratik Mankawde
8e97c7329a fix(telemetry): fix include ordering, levelization, and rename for phase 3
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 11:23:43 +01:00
Pratik Mankawde
fe058d49b4 Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-04-29 11:21:35 +01:00
Pratik Mankawde
bd6e58a20e fix(telemetry): add missing span constants, fix test API, update levelization
Add unknownCommand and wsUpgrade span name constants to RpcSpanNames.h,
fix SpanGuardFactory tests to use the 3-argument SpanGuard::span() API,
update levelization results, and apply rename script to docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 11:21:13 +01:00
Pratik Mankawde
c01f8ae99c fix(telemetry): address code review findings for Phase 4 consensus tracing
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 18:14:00 +01:00
Pratik Mankawde
bc49eb6f83 feat(telemetry): complete Phase 4 consensus tracing
Implement remaining Phase 4/4a consensus tracing tasks:

- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()

Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 16:16:53 +01:00
Jingchen
46b997b774 feat: Create new transaction testing framework TxTest (#6537)
Signed-off-by: JCW <a1q123456@users.noreply.github.com>
Co-authored-by: xrplf-ai-reviewer[bot] <266832837+xrplf-ai-reviewer[bot]@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
2026-04-28 14:16:10 +00:00
Pratik Mankawde
34ee231d62 feat(telemetry): add Phase 4 consensus tracing with SpanGuard API
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.

Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:

- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
  (consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
  (used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
  following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
  for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
  consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
  from consensus thread to jtACCEPT worker thread

Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:34:39 +01:00
Pratik Mankawde
19eead6955 feat(telemetry): Phase 3 transaction tracing with protobuf context propagation
- TraceContext protobuf message for cross-node trace propagation
  (added to TMTransaction, TMProposeSet, TMValidation at field 1001)
- TraceContextPropagator.h: inline extractFromProtobuf/injectToProtobuf
- PeerImp::handleTransaction: tx.receive span with peer.id, peer.version,
  tx.hash, tx.suppressed, tx.status attributes
- NetworkOPsImp::processTransaction: tx.process span with tx.hash,
  tx.local, tx.path attributes
- Tempo search filters for tx.hash, tx.local, tx.status
- Unit tests for TraceContextPropagator (round-trip, edge cases)
- Levelization: xrpld.app/overlay > xrpld.telemetry dependencies

Translated from macro API (XRPL_TRACE_TX/SET_ATTR) to SpanGuard factory
pattern introduced in Phase 1c.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-28 14:28:31 +01:00
Pratik Mankawde
3b93e2d4d9 fix(telemetry): suppress unused span warning and regenerate levelization
- Add [[maybe_unused]] to the RAII span in processSession() — the
  variable is not read but its lifetime scopes the active OTel context
  for child spans created in processRequest()
- Regenerate levelization: remove premature xrpld.telemetry entries
  that reference a module not yet present on this branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
0de807b1be Phase 1c: RPC integration - ServerHandler tracing, telemetry config wiring
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:27:31 +01:00
Pratik Mankawde
88686af850 Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-28 14:25:31 +01:00
Pratik Mankawde
3547112540 fix: Fix ubsan flagged issues (#6151)
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: xrplf-ai-reviewer[bot] <266832837+xrplf-ai-reviewer[bot]@users.noreply.github.com>
2026-04-27 20:34:16 +00:00
Alex Kremer
19da25812b fix: Remaining clang-tidy unchecked optionals (#6979) 2026-04-23 16:21:01 +00:00
Ayaz Salikhov
3429845c40 style: Add bashate pre-commit hook to unify bash style (#6994) 2026-04-22 14:26:02 +00:00