Pratik Mankawde
98fc939851
Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-01 15:01:19 +01:00
Pratik Mankawde
4d6ddb5f1f
Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-01 14:56:09 +01:00
Pratik Mankawde
ba7e1f98e4
Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 18:24:43 +01:00
Pratik Mankawde
e7dea147cd
Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 18:18:36 +01:00
Pratik Mankawde
8d730b8b9a
Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 18:16:35 +01:00
Pratik Mankawde
e5fae351d6
Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment
2026-05-29 17:53:29 +01:00
Pratik Mankawde
a44d91ec27
leftover clang-tidy fixes
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 17:52:45 +01:00
Pratik Mankawde
8f9057729c
Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 16:14:21 +01:00
Pratik Mankawde
3a1f22583f
Merge branch 'pratik/otel-phase1a-plan-docs' into pratik/otel-phase1b-telemetry-infra
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 15:34:22 +01:00
Pratik Mankawde
f66a53cfc9
Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-29 14:51:12 +01:00
Pratik Mankawde
8b790ebac9
bumped otel version to 1.26.0
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-28 12:18:20 +01:00
Pratik Mankawde
824f63216a
Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-27 16:57:08 +01:00
Pratik Mankawde
a104140a51
addressing code review comments
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-27 16:46:35 +01:00
Pratik Mankawde
ac57a91b77
merge: phase-9 (dashboard UID + line-number cleanup, detach callbacks) into phase-10
...
# Conflicts:
# docker/telemetry/TESTING.md
2026-05-14 17:23:55 +01:00
Pratik Mankawde
a9f52458b3
merge: pratik/otel-phase8-log-correlation (dashboard UID + line-number cleanup) into pratik/otel-phase9-metric-gap-fill
...
# Conflicts:
# docker/telemetry/grafana/dashboards/consensus-health.json
# docker/telemetry/grafana/dashboards/ledger-operations.json
# docker/telemetry/grafana/dashboards/peer-network.json
# docker/telemetry/grafana/dashboards/rpc-performance.json
# docker/telemetry/grafana/dashboards/system-ledger-data-sync.json
# docker/telemetry/grafana/dashboards/system-network-traffic.json
# docker/telemetry/grafana/dashboards/system-node-health.json
# docker/telemetry/grafana/dashboards/system-overlay-traffic-detail.json
# docker/telemetry/grafana/dashboards/system-rpc-pathfinding.json
# docker/telemetry/grafana/dashboards/transaction-overview.json
2026-05-14 17:10:12 +01:00
Pratik Mankawde
0e5e802e5e
merge: pratik/otel-phase7-native-metrics (dashboard UID + line-number cleanup) into pratik/otel-phase8-log-correlation
2026-05-14 17:07:34 +01:00
Pratik Mankawde
6985e1948b
merge: pratik/otel-phase6-statsd (line-number + docs cleanup) into pratik/otel-phase7-native-metrics
...
# Conflicts:
# OpenTelemetryPlan/06-implementation-phases.md
# docker/telemetry/grafana/dashboards/system-ledger-data-sync.json
# docker/telemetry/grafana/dashboards/system-network-traffic.json
# docker/telemetry/grafana/dashboards/system-node-health.json
# docker/telemetry/grafana/dashboards/system-overlay-traffic-detail.json
# docker/telemetry/grafana/dashboards/system-rpc-pathfinding.json
2026-05-14 17:07:15 +01:00
Pratik Mankawde
dfe91e071f
merge: phase-5 (runbook span-name + line-number fixes) into phase-6
...
# Conflicts:
# OpenTelemetryPlan/06-implementation-phases.md
# docs/telemetry-runbook.md
2026-05-14 16:42:13 +01:00
Pratik Mankawde
dec8b0a9a1
docs(telemetry): fix stale RPC span names + drop volatile line numbers in runbook
...
- RPC Spans table: `rpc.request` was documented but the code actually emits
`rpc.http_request`. Listed the actual emitted names
(`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`)
and their parent/child relationship.
- Drop `:<line>` suffixes from Source File columns in both RPC and
Transaction span tables. Line numbers drift with every refactor; the
filename is enough for operators to grep.
- Summary table: replace the never-emitted `rpc.request` row with the real
entry points so `span_name=` filters in PromQL / TraceQL match.
2026-05-14 16:34:58 +01:00
Pratik Mankawde
ec8e3e2950
Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
2026-05-13 16:17:49 +01:00
Pratik Mankawde
495d5bd8a0
Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
2026-05-13 16:17:12 +01:00
Pratik Mankawde
6cd910f06f
Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
2026-05-13 16:17:05 +01:00
Pratik Mankawde
5cd71ed107
Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics
2026-05-13 16:16:50 +01:00
Pratik Mankawde
e60efd4d2f
Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
2026-05-13 16:10:46 +01:00
Pratik Mankawde
c48f5ed6e7
docs(telemetry): update runbook attr names for simplified naming convention
...
Update 31 attribute references in telemetry-runbook.md to match the
simplified naming: drop xrpl.<domain>. prefix on per-span attrs, use
domain-qualified names for collisions (rpc_status, consensus_state,
etc.), and unify cross-domain refs (xrpl.ledger.seq, xrpl.tx.hash).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 16:08:48 +01:00
Pratik Mankawde
c9fe4b1a14
Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment
2026-05-13 16:04:27 +01:00
Pratik Mankawde
7a854ccad2
refactor(telemetry): simplify attr naming on phase-1c — drop xrpl.<domain>. prefix
...
- Drop xrpl.rpc.* prefix from per-span attrs (command, version).
- Qualify collision-prone fields: role -> rpc_role/grpc_role,
status -> rpc_status/grpc_status.
- Rename payload_size -> request_payload_size for cross-domain clarity.
- Simplify link.type -> link_type (bare name, no join).
- Update convention doc in SpanNames.h to reflect new naming rules.
- Update telemetry.md doc with renamed attr keys.
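
The renaming rules above can be sketched as a small mapping function. This is an illustration only: `simplifyAttrKey` is a hypothetical helper that does not exist in the codebase, and the old-key shape `xrpl.<domain>.<name>` is an assumption inferred from the bullet list.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not from the repository): maps an old attribute key
// to its simplified form. `domain` is assumed to be "rpc" or "grpc".
inline std::string
simplifyAttrKey(std::string_view domain, std::string_view oldKey)
{
    // Drop the "xrpl.<domain>." prefix from per-span attributes.
    std::string const prefix = "xrpl." + std::string(domain) + ".";
    if (oldKey.substr(0, prefix.size()) == std::string_view(prefix))
        oldKey.remove_prefix(prefix.size());

    // Collision-prone names are qualified with the domain.
    if (oldKey == "role" || oldKey == "status")
        return std::string(domain) + "_" + std::string(oldKey);

    // Renamed for cross-domain clarity.
    if (oldKey == "payload_size")
        return "request_payload_size";

    // Bare name, no dotted join.
    if (oldKey == "link.type")
        return "link_type";

    // Everything else (e.g. command, version) keeps its bare name.
    return std::string(oldKey);
}
```

Keys belonging to another domain, such as `xrpl.ledger.seq`, pass through unchanged in this sketch because the prefix check only matches the caller's own domain.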
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 15:54:13 +01:00
Pratik Mankawde
782d98d249
Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
...
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-13 11:40:15 +01:00
Pratik Mankawde
c096eeb239
Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
2026-05-13 11:30:22 +01:00
Pratik Mankawde
fac6c3ac1d
Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
2026-05-06 14:34:17 +01:00
Pratik Mankawde
761688383d
fix(telemetry): address code review issues in OTelCollector
...
- Fix use-after-free: extract gauge callback to static function and call
RemoveCallback in ~OTelGaugeImpl() before unregistering from collector
- Use memory_order_acq_rel on callHooks() debounce CAS for proper
happens-before relationship between hook invocations
- Add explicit 2s timeout to ForceFlush() in destructor to prevent
blocking indefinitely when OTLP endpoint is unreachable at shutdown
- Add OTLP receiver to metrics pipeline so native OTel metrics from
xrpld are actually received by the collector
- Remove stale health check port from docker-compose (extension was
removed from collector config)
- Clarify fallback docs: StatsD path requires re-enabling receiver/port
- Fix comments: Counter uses uint64_t not int64_t, gauge clamps to
[0, INT64_MAX] not [0, UINT64_MAX]
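
For reference, a minimal sketch of a debounce gate built on a compare-and-swap with `memory_order_acq_rel`, as the second bullet describes. The `Debouncer` class, its members, and the nanosecond timestamps are hypothetical and are not taken from OTelCollector; the real `callHooks()` may be structured differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical debounce gate (illustrative names only). Timestamps are
// assumed to be non-negative nanosecond counts supplied by the caller.
class Debouncer
{
public:
    explicit Debouncer(std::int64_t intervalNs)
        : intervalNs_(intervalNs), last_(-intervalNs)
    {
    }

    // Returns true if the caller won the right to run the hooks at nowNs.
    bool
    tryAcquire(std::int64_t nowNs)
    {
        std::int64_t last = last_.load(std::memory_order_acquire);
        while (nowNs - last >= intervalNs_)
        {
            // On success, acq_rel makes this CAS both an acquire (it sees
            // writes sequenced before the previous winner's CAS) and a
            // release (it publishes this thread's earlier writes to the
            // next winner). On failure, `last` is reloaded and re-checked.
            if (last_.compare_exchange_weak(
                    last,
                    nowNs,
                    std::memory_order_acq_rel,
                    std::memory_order_acquire))
                return true;
        }
        return false;
    }

private:
    std::int64_t const intervalNs_;
    std::atomic<std::int64_t> last_;
};
```

Note that the ordering only covers writes sequenced before each winning CAS. Work performed inside the hooks after the CAS would need its own synchronization if a later invocation depends on it.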
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:24:52 +01:00
Ayaz Salikhov
27f7fdb3a6
chore: Do not duplicate sanitizer flags (#7058)
...
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-05 16:32:43 +00:00
Alex Kremer
8995564ed6
refactor: Enable clang-tidy readability-identifier-naming check (#6571)
2026-05-03 10:31:53 +00:00
Pratik Mankawde
beaf01ae4d
fix(telemetry): fix CI failures in phase-6 build, clang-tidy, and rename checks
...
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture
Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp
Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
09-data-collection-reference.md and telemetry-runbook.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 17:09:17 +01:00
Pratik Mankawde
a0477f9475
Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
2026-04-29 21:11:03 +01:00
Pratik Mankawde
1658d3dc40
Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
2026-04-29 21:09:47 +01:00
Pratik Mankawde
8e7a2d6c53
Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
...
# Conflicts:
# OpenTelemetryPlan/06-implementation-phases.md
# OpenTelemetryPlan/08-appendix.md
# OpenTelemetryPlan/OpenTelemetryPlan.md
2026-04-29 21:07:32 +01:00
Pratik Mankawde
9adcc49171
fix: re-apply phase-7 doc/config changes lost during merge
...
Re-applies phase-7 unique modifications to documentation and
configuration files that were overwritten when taking phase-6's
versions during the merge conflict resolution.
Changes:
- docker-compose.yml: comment out StatsD port 8125, add OTLP notes
- otel-collector-config.yaml: remove StatsD receiver, update pipeline
- integration-test.sh: server=otel, check_otel_metric, StatsD port check
- telemetry-runbook.md: System Metrics section, server=otel config,
troubleshooting for missing OTel metrics
- 02-design-decisions.md: Phase 7 coexistence strategy notes
- 05-configuration-reference.md: OTel System Metrics correlation
- 06-implementation-phases.md: add Phase 7 section (~180 lines)
- OpenTelemetryPlan.md: update phases table (7 phases, 60.6 days)
- 08-appendix.md: add Phase7_taskList.md to document index
- Delete 5 statsd-*.json dashboards (replaced by system-*.json)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 21:05:48 +01:00
Pratik Mankawde
70d86d7ebf
Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
...
# Conflicts:
# OpenTelemetryPlan/06-implementation-phases.md
# OpenTelemetryPlan/09-data-collection-reference.md
# OpenTelemetryPlan/OpenTelemetryPlan.md
# docker/telemetry/docker-compose.yml
# docker/telemetry/grafana/dashboards/statsd-network-traffic.json
# docker/telemetry/otel-collector-config.yaml
# src/xrpld/overlay/detail/PeerImp.cpp
2026-04-29 20:38:00 +01:00
Pratik Mankawde
9e12e660fe
Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:25:13 +01:00
Pratik Mankawde
7ab6f4d34b
fix: address CI rename checks (rippled -> xrpld) in phase-8 docs
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:09:43 +01:00
Pratik Mankawde
81b47afde7
Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
...
# Conflicts:
# OpenTelemetryPlan/06-implementation-phases.md
# OpenTelemetryPlan/08-appendix.md
# OpenTelemetryPlan/OpenTelemetryPlan.md
# docker/telemetry/grafana/dashboards/statsd-network-traffic.json
# docker/telemetry/grafana/dashboards/statsd-node-health.json
# docker/telemetry/grafana/dashboards/statsd-rpc-pathfinding.json
2026-04-29 20:07:43 +01:00
Pratik Mankawde
b65f91117f
fix: address CI checks (prettier, docs.sh rename, levelization)
...
- Prettier formatting for markdown docs and OTelCollector header
- docs.sh rippled→xrpld renames in OTelCollector.cpp comments/strings
- Updated levelization ordering with new dependency edges
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:03:22 +01:00
Pratik Mankawde
769668579a
Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics
...
# Conflicts:
# .codecov.yml
# .github/scripts/levelization/results/ordering.txt
# .github/workflows/reusable-clang-tidy-files.yml
# CMakeLists.txt
# OpenTelemetryPlan/00-tracing-fundamentals.md
# OpenTelemetryPlan/01-architecture-analysis.md
# OpenTelemetryPlan/02-design-decisions.md
# OpenTelemetryPlan/03-implementation-strategy.md
# OpenTelemetryPlan/04-code-samples.md
# OpenTelemetryPlan/05-configuration-reference.md
# OpenTelemetryPlan/06-implementation-phases.md
# OpenTelemetryPlan/07-observability-backends.md
# OpenTelemetryPlan/08-appendix.md
# OpenTelemetryPlan/09-data-collection-reference.md
# OpenTelemetryPlan/OpenTelemetryPlan.md
# OpenTelemetryPlan/POC_taskList.md
# OpenTelemetryPlan/Phase2_taskList.md
# OpenTelemetryPlan/Phase3_taskList.md
# OpenTelemetryPlan/Phase4_taskList.md
# OpenTelemetryPlan/Phase5_IntegrationTest_taskList.md
# OpenTelemetryPlan/Phase5_taskList.md
# OpenTelemetryPlan/presentation.md
# cfg/xrpld-example.cfg
# conan.lock
# conanfile.py
# cspell.config.yaml
# docker/telemetry/TESTING.md
# docker/telemetry/docker-compose.yml
# docker/telemetry/grafana/dashboards/consensus-health.json
# docker/telemetry/grafana/dashboards/transaction-overview.json
# docker/telemetry/grafana/provisioning/dashboards/dashboards.yaml
# docker/telemetry/grafana/provisioning/datasources/tempo.yaml
# docker/telemetry/integration-test.sh
# docker/telemetry/otel-collector-config.yaml
# docker/telemetry/tempo.yaml
# docker/telemetry/xrpld-telemetry.cfg
# docs/build/telemetry.md
# docs/telemetry-runbook.md
# include/xrpl/core/ServiceRegistry.h
# include/xrpl/protocol/detail/features.macro
# include/xrpl/telemetry/SpanGuard.h
# include/xrpl/telemetry/Telemetry.h
# include/xrpl/telemetry/TraceContextPropagator.h
# src/libxrpl/basics/MallocTrim.cpp
# src/libxrpl/nodestore/backend/MemoryFactory.cpp
# src/libxrpl/nodestore/backend/NuDBFactory.cpp
# src/libxrpl/nodestore/backend/RocksDBFactory.cpp
# src/libxrpl/telemetry/NullTelemetry.cpp
# src/libxrpl/telemetry/Telemetry.cpp
# src/libxrpl/telemetry/TelemetryConfig.cpp
# src/tests/libxrpl/basics/MallocTrim.cpp
# src/tests/libxrpl/telemetry/TelemetryConfig.cpp
# src/xrpld/app/consensus/RCLConsensus.cpp
# src/xrpld/app/consensus/RCLConsensus.h
# src/xrpld/app/ledger/detail/BuildLedger.cpp
# src/xrpld/app/ledger/detail/LedgerMaster.cpp
# src/xrpld/app/main/Application.cpp
# src/xrpld/app/misc/NetworkOPs.cpp
# src/xrpld/consensus/Consensus.h
# src/xrpld/overlay/detail/PeerImp.cpp
# src/xrpld/rpc/detail/RPCHandler.cpp
# src/xrpld/rpc/detail/ServerHandler.cpp
2026-04-29 19:50:32 +01:00
Pratik Mankawde
3dd2f34591
Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
...
# Conflicts:
# OpenTelemetryPlan/Phase3_taskList.md
# docker/telemetry/grafana/provisioning/datasources/tempo.yaml
# docs/telemetry-runbook.md
# include/xrpl/proto/xrpl.proto
# src/xrpld/app/consensus/RCLConsensus.cpp
# src/xrpld/app/misc/detail/TxQ.cpp
2026-04-29 17:38:03 +01:00
Pratik Mankawde
521e0756e1
docs(telemetry): add cross-node trace propagation to runbook
...
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 17:33:10 +01:00
Pratik Mankawde
39273e3aae
Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
...
# Conflicts:
# docs/telemetry-runbook.md
2026-04-29 14:30:13 +01:00
Pratik Mankawde
9f571e5d1e
docs(telemetry): add cross-node trace propagation to runbook
...
Document the propagation infrastructure: send-side injection in
NetworkOPs/RCLConsensus, receive-side extraction in PeerImp via
PropagationHelpers.h and ConsensusReceiveTracing.h. Update
consensus receive span descriptions to reflect parent extraction.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 14:28:40 +01:00
Pratik Mankawde
b933e8ae00
feat(telemetry): add missing StatsD dashboard panels from production dashboard
...
Compared shared production Grafana dashboard against Phase 6 StatsD
dashboards and added 10 missing panels covering job execution/dequeue
timers, cache metrics, ledger publish gap, state duration rate, duplicate
traffic, and detailed traffic breakdown.
Node Health dashboard: 8 → 16 panels, plus quantile template variable.
Network Traffic dashboard: 8 → 10 panels, Total Network Bytes now rate().
Updated runbook, data collection reference, and implementation phases docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-29 14:02:27 +01:00
Pratik Mankawde
a1cb752745
Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
2026-04-29 13:01:38 +01:00