Commit Graph

13813 Commits

Author SHA1 Message Date
Pratik Mankawde
d11731517c Set explicit uid on Prometheus datasource for dashboard compatibility
Dashboard template variables reference datasource uid "prometheus" but
the provisioning config had no uid set, causing Grafana to auto-assign
a random one and break dashboard panel queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:36 +00:00
Pratik Mankawde
6bd71dc2a2 Add Grafana dashboard template variables for node and metric filtering
- Add resource_metrics_key_attributes to spanmetrics connector so
  service.instance.id becomes a Prometheus label for per-node filtering
- Add 'node' dropdown (service_instance_id) to all 3 dashboards
- Add 'command' dropdown (xrpl_rpc_command) to RPC Performance
- Add 'tx_origin' dropdown (xrpl_tx_local) to Transaction Overview
- Add 'consensus_mode' dropdown (xrpl_consensus_mode) to Consensus Health
- Update all panel PromQL queries to include $node filter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:36 +00:00
Pratik Mankawde
ca7af13ef2 Add consensus close time panels and docs for accept.apply span
- Add Ledger Apply Duration and Close Time Agreement panels to
  consensus-health dashboard
- Add consensus.accept.apply to telemetry runbook with TraceQL
  queries for close time disagreements and consensus failures
- Add span to TESTING.md expected span catalog and verification loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:36 +00:00
Pratik Mankawde
0eaabfa2dc Phase 5: Observability stack — spanmetrics, dashboards, runbook
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:36 +00:00
Pratik Mankawde
31d0ed178a Phase 4a: Establish-phase gap fill & cross-node correlation
Add full consensus tracing with deterministic trace ID correlation
and establish-phase instrumentation:

- Deterministic trace_id from previousLedger.id() for cross-node
  correlation (switchable via consensus_trace_strategy config)
- Round-to-round span links (follows-from) for causal chaining
- Establish phase spans with convergence tracking, dispute resolution
  events, and threshold escalation attributes
- Validation spans with links to round spans (thread-safe via
  roundSpanContext_ snapshot for jtACCEPT cross-thread access)
- Mode change spans for proposing/observing transitions
- New startSpan overload with span links in Telemetry interface
- XRPL_TRACE_ADD_EVENT macro with do-while(0) safety wrapper
- Config validation for consensus_trace_strategy
- Test adaptor (csf::Peer) updated with getTelemetry() stub

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:00 +00:00
Pratik Mankawde
7675af41ec Add consensus.accept.apply span with ledger close time attributes
Add a new trace span in doAccept() capturing ledger close time details:
- xrpl.consensus.close_time: agreed-upon close time (epoch seconds)
- xrpl.consensus.close_time_correct: whether validators converged
  (per avCT_CONSENSUS_PCT = 75% threshold)
- xrpl.consensus.close_resolution_ms: time rounding granularity
- xrpl.consensus.state: "finished" or "moved_on" (consensus failure)
- xrpl.consensus.proposing: whether this node was proposing

Update Tempo datasource with close time filters, plan docs with
new span inventory, and add test coverage for the attribute pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:00 +00:00
Pratik Mankawde
655b78a7d8 Add consensus trace filters to Grafana Tempo datasource
Add xrpl.consensus.mode (static), xrpl.consensus.round (dynamic),
and xrpl.consensus.ledger.seq (static) search filters for Phase 4
consensus span attributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:24 +00:00
Pratik Mankawde
0bbffaebc4 Phase 4: Consensus tracing — round lifecycle, proposals, validations
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:24 +00:00
Pratik Mankawde
e33e592128 Add transaction trace filters to Grafana Tempo datasource
Add xrpl.tx.hash (static), xrpl.tx.local and xrpl.tx.status
(dynamic) search filters for Phase 3 transaction span attributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
e6b05d700c Phase 3: Transaction tracing — protobuf context, PeerImp, NetworkOPs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
f114ec5462 Fix gersemi cmake formatting in test CMakeLists
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
1d05075585 Appendix: add Phase 2-5 task lists to document index
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
e77f2f0fa5 Add RPC trace filters to Grafana Tempo datasource
Add xrpl.rpc.command (static), xrpl.rpc.status and xrpl.rpc.role
(dynamic) search filters for Phase 2 RPC span attributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
ea00fb9bc8 Phase 2: Complete RPC tracing — interface, macros, attributes, tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
a707abd6f2 Phase 1c: RPC layer telemetry integration
Add tracing instrumentation to the RPC request handling layer:

TracingInstrumentation.h:
- Convenience macros (XRPL_TRACE_RPC, XRPL_TRACE_TX, etc.) that
  create RAII SpanGuard objects when telemetry is enabled
- XRPL_TRACE_SET_ATTR / XRPL_TRACE_EXCEPTION for span enrichment
- Zero-overhead no-ops when XRPL_ENABLE_TELEMETRY is not defined

RPCHandler.cpp:
- Trace each RPC command with span name "rpc.command.<method>"
- Record command name, API version, role, and status as attributes
- Capture exceptions on the span for error visibility

ServerHandler.cpp:
- Trace HTTP requests ("rpc.request"), WebSocket messages
  ("rpc.ws_message"), and processRequest ("rpc.process")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
99f4f16eb9 Phase 1a: OpenTelemetry plan documentation
Add comprehensive planning documentation for the OpenTelemetry
distributed tracing integration:

- Tracing fundamentals and concepts
- Architecture analysis of rippled's tracing surface area
- Design decisions and trade-offs
- Implementation strategy and code samples
- Configuration reference
- Implementation phases roadmap
- Observability backend comparison
- POC task list and presentation materials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
f333ae4c11 Fix gersemi cmake formatting (if/endif spacing, target_link_libraries wrapping)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
9ecea839bb Appendix: add presentation.md to document index
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
ba3bf72326 Move presentation.md into OpenTelemetryPlan directory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
b3609fb345 Respect custom service_instance_id when set in [telemetry] config
Only override serviceInstanceId with the node public key when the user
hasn't explicitly set service_instance_id in the [telemetry] section.
This allows operators to assign human-friendly names (e.g. "validator-1")
while still defaulting to the node's base58 public key.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
99a0b4094e Fix empty service.instance.id by deferring node identity injection
The Telemetry object is constructed in ApplicationImp's member initializer
list where nodeIdentity_ is not yet available, resulting in an empty
service.instance.id resource attribute. Add setServiceInstanceId() virtual
method that Application::setup() calls after nodeIdentity_ is known but
before telemetry_->start() creates the OTel resource.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
48a3cf56b8 Add node identification filters to Grafana Tempo datasource
Add resource-level search filters for node identity:
- service.instance.id (node public key) — unique node identifier
- service.version (rippled build version)
- xrpl.network.id (numeric network ID)
- xrpl.network.type (mainnet/testnet/devnet/standalone)

These enable filtering traces by specific nodes in multi-node
deployments and by network in mixed environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
8e43103310 Add Tempo search filters and metrics generator for trace exploration
Configure Grafana Tempo datasource with pre-built search filters
(service.name, span name, status, duration) for the Explore UI.
Enable Tempo metrics_generator with service-graphs and span-metrics
processors to power Grafana's service map visualization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
d412791a01 Add Grafana Tempo as second trace backend alongside Jaeger
- Add Tempo 2.7.2 service to docker-compose with local storage
- Add otlp/tempo exporter to OTel Collector traces pipeline
- Add Tempo Grafana datasource provisioning with node graph
- Update 05-configuration-reference.md examples with Tempo
- OTel Collector fans traces to both Jaeger and Tempo simultaneously

Jaeger provides a standalone UI at :16686 for quick lookups.
Tempo is queryable via Grafana Explore using TraceQL and is the
recommended backend for production (supports S3/GCS storage).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
d8d7a40fca Phase 1b: Telemetry core infrastructure
Add the OpenTelemetry telemetry library and supporting infrastructure:

Build system:
- Conan opentelemetry-cpp dependency with OTLP/gRPC exporter
- CMake integration for xrpl_telemetry library target
- Levelization ordering updates

Core library (libxrpl):
- Telemetry class: provider lifecycle, span creation, sampling config
- SpanGuard: RAII span management with attribute/exception helpers
- TelemetryConfig: parse [telemetry] config section
- NullTelemetry: no-op implementation when telemetry is disabled

Application integration:
- Telemetry member in ApplicationImp with start/stop lifecycle
- getTelemetry() interface on Application
- ServiceRegistry telemetry accessor

Docker observability stack:
- OTel Collector, Jaeger, Grafana docker-compose setup
- Collector config with OTLP gRPC receiver and Jaeger exporter

Config and docs:
- Example telemetry config section in xrpld-example.cfg
- Build documentation for telemetry setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Pratik Mankawde
fe6cd31762 Add Phase 4a implementation status to plan docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:08:13 +00:00
Pratik Mankawde
fd18cf9e01 Merge remote-tracking branch 'origin/develop' into pratik/otel-phase1a-plan-docs 2026-03-11 14:58:44 +00:00
Alex Kremer
7b3724b7a3 fix: Add missed clang-tidy bugprone-inc-dec-conditions check (#6526) 2026-03-11 14:04:26 +00:00
Ayaz Salikhov
bee2d112c6 ci: Fix how clang-tidy is run when .clang-tidy is changed (#6521) 2026-03-11 14:18:18 +01:00
Bart
01c977bbfe ci: Fix rules used to determine when to upload Conan recipes (#6524)
The refs as previously used pointed to the source branch, not the target branch. However, determining the target branch is different depending on the GitHub event. The pull request logic was incorrect and needed to be fixed, and the logic inside the workflow could be simplified. Both modifications have been made in this commit.
2026-03-11 13:43:58 +01:00
Bart
3baf5454f2 ci: Only upload artifacts in the XRPLF/rippled repository (#6523)
This change will only attempt to upload artifacts for CI runs performed in the XRPLF/rippled repository.
2026-03-11 11:48:40 +01:00
Michael Legleux
24a5cbaa93 chore: Build voidstar on amd64 only (#6481)
* chore: Build voidstar on amd64 only

* fatal error if configuring voidstar on  non x86
2026-03-10 23:59:43 +00:00
Michael Legleux
eb7c8c6c7a chore: Use CMake components for install (#6485)
* chore: Use components for install

* rm CMake export targets

* reformat
2026-03-10 23:38:43 +00:00
Alex Kremer
f27d8f3890 chore: Enable clang-tidy bugprone-inc-dec-in-conditions check (#6455) 2026-03-10 20:12:15 +00:00
Alex Kremer
8345cd77df chore: Enable clang-tidy bugprone-unused-raii check (#6505) 2026-03-10 19:48:56 +00:00
Alex Kremer
c38aabdaee chore: Enable clang-tidy bugprone-unhandled-self-assignment check (#6504) 2026-03-10 17:42:49 +00:00
Alex Kremer
a896ed3987 chore: Enable clang-tidy bugprone-optional-value-conversion check (#6470) 2026-03-10 15:56:24 +01:00
Alex Kremer
1a7d67c4db chore: Enable clang-tidy bugprone-reserved-identifier check (#6456) 2026-03-10 10:29:08 +01:00
Alex Kremer
92983d8040 chore: Enable clang-tidy bugprone-too-small-loop-variable check (#6473) 2026-03-10 08:56:44 +00:00
Alex Kremer
320a65f77c chore: Enable clang-tidy bugprone-suspicious-stringview-data-usage check (#6467) 2026-03-10 08:34:27 +00:00
Ayaz Salikhov
45b8c4d732 chore: Update XRPLF/actions (#6508)
This change mainly includes XRPLF/actions#51.
2026-03-09 21:47:22 +00:00
Pratik Mankawde
d6bf13394e Appendix: add 00-tracing-fundamentals.md and POC_taskList.md to document index
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 19:03:09 +00:00
Pratik Mankawde
34243e0cc2 Phase 1a: OpenTelemetry plan documentation
Add comprehensive planning documentation for the OpenTelemetry
distributed tracing integration:

- Tracing fundamentals and concepts
- Architecture analysis of rippled's tracing surface area
- Design decisions and trade-offs
- Implementation strategy and code samples
- Configuration reference
- Implementation phases roadmap
- Observability backend comparison
- POC task list and presentation materials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 19:03:09 +00:00
Alex Kremer
e284969ae4 chore: Enable clang-tidy bugprone-pointer-arithmetic-on-polymorphic-object check (#6469) 2026-03-09 19:36:56 +01:00
Alex Kremer
0335076359 chore: Fix additional clang-tidy issues for unused-local-non-trivial-variable check (#6509) 2026-03-09 17:16:04 +00:00
Ayaz Salikhov
7e2b137131 chore: Use check-pr-title from XRPLF/actions (#6506) 2026-03-09 17:53:52 +01:00
Sergey Kuznetsov
e2290b1a0a feat: Add mutex wrapper from clio (#6447)
This change adds a mutex wrapper copied from clio. The wrapper attaches a mutex to the data it protects, which improves safety and readability.
2026-03-09 16:33:20 +00:00
Alex Kremer
1ee0567b14 chore: Enable clang-tidy bugprone-suspicious-missing-comma check (#6468) 2026-03-09 15:48:38 +00:00
Alex Kremer
6b301efc8c chore: Enable clang-tidy bugprone-unused-local-non-trivial-variable check (#6458) 2026-03-09 15:25:52 +00:00
dependabot[bot]
9e0d350fca ci: [DEPENDABOT] bump tj-actions/changed-files from 47.0.4 to 47.0.5 (#6501) 2026-03-09 15:27:03 +01:00