Commit Graph

14272 Commits

Author SHA1 Message Date
Pratik Mankawde
4470ae7bc9 Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation 2026-05-13 12:13:35 +01:00
Pratik Mankawde
25d2dae798 fix(tests): align MockServiceRegistry overrides with ServiceRegistry interface
MockServiceRegistry in MetricsRegistry.cpp still used the old method
names (timeKeeper, cachedSLEs, validators, overlay, cluster, app, etc.)
while ServiceRegistry has been standardized on getXxx()/isXxx() forms.
Windows CI caught this as C3668 "did not override any base class methods"
errors and C2259 "cannot instantiate abstract class".

Rename all 13 mismatched overrides to match the current interface:
  timeKeeper          -> getTimeKeeper
  cachedSLEs          -> getCachedSLEs
  validators          -> getValidators
  validatorSites      -> getValidatorSites
  validatorManifests  -> getValidatorManifests
  publisherManifests  -> getPublisherManifests
  overlay             -> getOverlay
  cluster             -> getCluster
  peerReservations    -> getPeerReservations
  pendingSaves        -> getPendingSaves
  openLedger (x2)     -> getOpenLedger
  getPathRequests     -> getPathRequestManager (type rename too)
  journal             -> getJournal
  logs                -> getLogs
  trapTxID            -> getTrapTxID
  app                 -> getApp

Also regenerate levelization ordering.txt to reflect the new
tests.libxrpl -> xrpl.core edge introduced by ServiceRegistry.h include.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 12:12:59 +01:00
Pratik Mankawde
47e83c4e65 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-05-13 12:12:38 +01:00
Pratik Mankawde
4bbe28cb92 fix(consensus): restore DisputedTx getYays/getNays accessors
Consensus.h (Phase 4 tracing) depends on DisputedTx::getYays()/getNays()
to build disputeResolve span events. Both accessors were removed by
earlier 'duplicate accessor' cleanup commits on this branch, leaving
Consensus.h referencing non-existent members. CI caught this on
macOS/clang-17/gcc-13/Windows builds.

Restore the accessors on the branch where they were dropped so downstream
phase branches inherit a compiling DisputedTx.h via merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 12:11:52 +01:00
Pratik Mankawde
782d98d249 Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-13 11:40:15 +01:00
Pratik Mankawde
93a7df6147 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-05-13 11:31:16 +01:00
Pratik Mankawde
dd4911ef5e Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-05-13 11:31:00 +01:00
Pratik Mankawde
c096eeb239 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-05-13 11:30:22 +01:00
Pratik Mankawde
e49c5997b7 added loki config.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-06 17:37:43 +01:00
Pratik Mankawde
5ad5bacc94 compilation fixes
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-06 16:26:50 +01:00
Pratik Mankawde
98aa9c9095 fix(telemetry): initialize WindowEvent::agreed field
Fixes cppcoreguidelines-pro-type-member-init clang-tidy error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:25:27 +01:00
Pratik Mankawde
92072ecca4 fix(telemetry): fix CI failures — clang-tidy, levelization, linker
Clang-tidy fixes:
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
  in OTelCollector.h, OTelCollector.cpp, ValidationTracker.h/.cpp
- Add missing direct includes (misc-include-cleaner) in
  ValidationTracker.cpp, test, CollectorManager.cpp, OTelCollector.cpp
- Make lock_guard variables const (misc-const-correctness)
- Add braces around single-line if/else (readability-braces-around-statements)
- Use designated initializer for WindowEvent (modernize-use-designated-initializers)
- Initialize LedgerEvent::seq field (cppcoreguidelines-pro-type-member-init)

Linker fix:
- Add ValidationTracker.cpp as source to xrpl.test.telemetry target
  (it lives in src/xrpld/ but the test links against libxrpl only)

Levelization fix:
- Remove stale dependency edges from ordering.txt that were introduced
  by the erroneous develop-merge commit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:07:25 +01:00
Pratik Mankawde
85330920ac feat(telemetry): add Loki service and filelog receiver for Phase 8 log ingestion
Cherry-pick Loki infrastructure from phase-10 back to where it belongs
(Phase 8, Tasks 8.2/8.3):

- Add Loki 3.4.2 service to docker-compose.yml (port 3100)
- Add filelog receiver to OTel Collector config (tails debug.log,
  regex_parser extracts trace_id/span_id/partition/severity)
- Add otlphttp/loki exporter (uses Loki 3.x native OTLP ingestion)
- Add logs pipeline: filelog -> batch -> otlphttp/loki
- Add health_check extension
- Mount xrpld log directory into collector container
- Add prometheus-data and loki-data persistent volumes

StatsD receiver intentionally excluded — Phase 7 migrated to native
OTLP metrics, making the StatsD receiver unnecessary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 14:55:45 +01:00
Pratik Mankawde
a61cdf0214 fix: remove duplicate DisputedTx getYays/getNays accessors
These were already added in an earlier phase branch. The duplicate
with slightly different Doxygen wording was introduced by the
erroneous merge/revert cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:38:22 +01:00
Pratik Mankawde
fac6c3ac1d Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-05-06 14:34:17 +01:00
Pratik Mankawde
a8549a7ab2 fix(telemetry): address code review findings for Phase 8 log-trace correlation
- Replace GetSpan() with direct context value check in Logs::format()
  to avoid heap allocation (new DefaultSpan) on the no-span path
- Restore Phase 7 documentation accidentally deleted during merge
- Fix undefined $JAEGER variable → use $TEMPO in integration test
- Remove useless LCOV_EXCL markers around #ifdef block
- Fix indentation inconsistencies in Log.cpp injection block
- Remove incorrect url field from loki.yaml derivedFields
- Update stale code sample in Phase8_taskList.md to match implementation
- Correct "<10ns" performance claims to accurate ~15-20ns (no-span)
  and ~50ns (active-span) measurements across all docs
- Replace Jaeger references with Tempo in TESTING.md (port 16686→3200)
- Improve error handling in check_log_correlation(): track files_scanned,
  detect missing log files, fix silent grep error masking

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 14:32:46 +01:00
Pratik Mankawde
761688383d fix(telemetry): address code review issues in OTelCollector
- Fix use-after-free: extract gauge callback to static function and call
  RemoveCallback in ~OTelGaugeImpl() before unregistering from collector
- Use memory_order_acq_rel on callHooks() debounce CAS for proper
  happens-before relationship between hook invocations
- Add explicit 2s timeout to ForceFlush() in destructor to prevent
  blocking indefinitely when OTLP endpoint is unreachable at shutdown
- Add OTLP receiver to metrics pipeline so native OTel metrics from
  xrpld are actually received by the collector
- Remove stale health check port from docker-compose (extension was
  removed from collector config)
- Clarify fallback docs: StatsD path requires re-enabling receiver/port
- Fix comments: Counter uses uint64_t not int64_t, gauge clamps to
  [0, INT64_MAX] not [0, UINT64_MAX]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:24:52 +01:00
Pratik Mankawde
ed31bab500 fix: restore unrelated files to phase-6 state after revert
The blanket revert of f4555c80fe also un-reverted some files that had
been correctly matched to phase-6 (nodestore Backend API refactor,
Vault_test changes). Restore those to the base branch state so the
phase-7 PR only contains telemetry changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:15:14 +01:00
Pratik Mankawde
b62987bda7 Revert "fix: revert all unrelated upstream develop changes from phase-7 PR"
This reverts commit f4555c80fe.
2026-05-06 14:13:59 +01:00
Pratik Mankawde
f08412b3e0 ordering update
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-05-06 13:27:29 +01:00
Pratik Mankawde
a4fc3c878e Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation 2026-04-30 17:14:53 +01:00
Pratik Mankawde
82696f6f8e Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-04-30 17:14:47 +01:00
Pratik Mankawde
43ae5c2d20 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-04-30 17:14:42 +01:00
Pratik Mankawde
8dec56df0b Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-04-30 17:14:34 +01:00
Pratik Mankawde
beaf01ae4d fix(telemetry): fix CI failures in phase-6 build, clang-tidy, and rename checks
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
  validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture

Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
  PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp

Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
  09-data-collection-reference.md and telemetry-runbook.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 17:09:17 +01:00
Pratik Mankawde
bf390559f5 Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation 2026-04-30 17:00:46 +01:00
Pratik Mankawde
40cc7f9ed7 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-04-30 17:00:39 +01:00
Pratik Mankawde
5e37f71139 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-04-30 17:00:27 +01:00
Pratik Mankawde
3f793b2098 fix: revert incorrect docs.sh renames (README.md URL, configLegacyName)
- protocol/README.md: restore historical GitHub URL path (src/ripple/)
- Config.cpp: restore configLegacyName as "rippled.cfg" (legacy name
  must remain as-is for backward compatibility)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 17:00:03 +01:00
Pratik Mankawde
f4555c80fe fix: revert all unrelated upstream develop changes from phase-7 PR
Reverts 259 files that carried unrelated upstream changes through the
phase-6 merge: enum class removals (cppcoreguidelines-use-enum-class),
scoped_lock→lock_guard conversions (modernize-use-scoped-lock),
nodestore Backend API changes (void const* key), .clang-tidy config,
test infrastructure deletions, and miscellaneous develop changes.

These changes belong on develop, not in the telemetry PR chain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 16:59:24 +01:00
Pratik Mankawde
b4f9f47295 Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation 2026-04-30 16:46:00 +01:00
Pratik Mankawde
f2bc0b18f2 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-04-30 16:45:55 +01:00
Pratik Mankawde
5bdb6b4eaf Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-04-30 16:45:49 +01:00
Pratik Mankawde
f44b89b99d fix: revert unrelated develop changes from phase-7 PR
- Revert reusable-build-test-config.yml to develop (action SHA update
  and "Show test failure summary" step removal don't belong here)
- Revert upload-conan-deps.yml to develop (action SHA update)
- Revert features.macro: BatchInnerSigs and Batch back to Supported::no
  (these feature flag changes are unrelated to telemetry)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-30 16:45:43 +01:00
Pratik Mankawde
db4c3788cb Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation 2026-04-29 21:18:10 +01:00
Pratik Mankawde
58a3be4517 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-04-29 21:18:03 +01:00
Pratik Mankawde
1c0d21dba4 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-04-29 21:17:57 +01:00
Pratik Mankawde
f183e9b57f Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-04-29 21:17:48 +01:00
Pratik Mankawde
a0477f9475 Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation 2026-04-29 21:11:03 +01:00
Pratik Mankawde
1658d3dc40 Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill 2026-04-29 21:09:47 +01:00
Pratik Mankawde
8e7a2d6c53 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
# Conflicts:
#	OpenTelemetryPlan/06-implementation-phases.md
#	OpenTelemetryPlan/08-appendix.md
#	OpenTelemetryPlan/OpenTelemetryPlan.md
2026-04-29 21:07:32 +01:00
Pratik Mankawde
9adcc49171 fix: re-apply phase-7 doc/config changes lost during merge
Re-applies phase-7 unique modifications to documentation and
configuration files that were overwritten when taking phase-6's
versions during the merge conflict resolution.

Changes:
- docker-compose.yml: comment out StatsD port 8125, add OTLP notes
- otel-collector-config.yaml: remove StatsD receiver, update pipeline
- integration-test.sh: server=otel, check_otel_metric, StatsD port check
- telemetry-runbook.md: System Metrics section, server=otel config,
  troubleshooting for missing OTel metrics
- 02-design-decisions.md: Phase 7 coexistence strategy notes
- 05-configuration-reference.md: OTel System Metrics correlation
- 06-implementation-phases.md: add Phase 7 section (~180 lines)
- OpenTelemetryPlan.md: update phases table (7 phases, 60.6 days)
- 08-appendix.md: add Phase7_taskList.md to document index
- Delete 5 statsd-*.json dashboards (replaced by system-*.json)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 21:05:48 +01:00
Pratik Mankawde
8e44c95d6a fix: address bashate warnings in benchmark.sh (E042/E044)
Separate local declarations from assignments to avoid hiding errors,
and use [[ instead of [ for non-POSIX comparisons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:42:01 +01:00
Pratik Mankawde
b659d43395 fix: address CI rename checks (rippled -> xrpld) in phase-10 docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:40:44 +01:00
Pratik Mankawde
70d86d7ebf Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation
# Conflicts:
#	OpenTelemetryPlan/06-implementation-phases.md
#	OpenTelemetryPlan/09-data-collection-reference.md
#	OpenTelemetryPlan/OpenTelemetryPlan.md
#	docker/telemetry/docker-compose.yml
#	docker/telemetry/grafana/dashboards/statsd-network-traffic.json
#	docker/telemetry/otel-collector-config.yaml
#	src/xrpld/overlay/detail/PeerImp.cpp
2026-04-29 20:38:00 +01:00
Pratik Mankawde
9e12e660fe Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:25:13 +01:00
Pratik Mankawde
7ab6f4d34b fix: address CI rename checks (rippled -> xrpld) in phase-8 docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:09:43 +01:00
Pratik Mankawde
81b47afde7 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation
# Conflicts:
#	OpenTelemetryPlan/06-implementation-phases.md
#	OpenTelemetryPlan/08-appendix.md
#	OpenTelemetryPlan/OpenTelemetryPlan.md
#	docker/telemetry/grafana/dashboards/statsd-network-traffic.json
#	docker/telemetry/grafana/dashboards/statsd-node-health.json
#	docker/telemetry/grafana/dashboards/statsd-rpc-pathfinding.json
2026-04-29 20:07:43 +01:00
Pratik Mankawde
b65f91117f fix: address CI checks (prettier, docs.sh rename, levelization)
- Prettier formatting for markdown docs and OTelCollector header
- docs.sh rippled→xrpld renames in OTelCollector.cpp comments/strings
- Updated levelization ordering with new dependency edges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:03:22 +01:00
Pratik Mankawde
57ed0d9fd0 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-04-29 19:59:02 +01:00