12 Commits

Author SHA1 Message Date
Pratik Mankawde
3c13d788fd fix(telemetry): address clang-tidy CI failures on phase9
- MetricsRegistry.cpp: concatenate nested namespaces, add missing
  direct includes (Journal.h, string, string_view, cstdint), suppress
  readability-convert-member-functions-to-static in #else stubs by
  referencing enabled_ member, void unused instanceId parameter.
- MetricsRegistry test: add missing direct includes (Log.h, Journal.h,
  uint256.h, io_context.hpp, optional, stdexcept, string), make
  throwUnimplemented() static, add [[nodiscard]] to getOpenLedger/
  isStopping/getTrapTxID overrides, make const-eligible registry const.
- PerfLogImp.cpp: add braces around if/else body per
  readability-braces-around-statements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-13 18:13:19 +01:00
Pratik Mankawde
4cbb1be5b4 fix(telemetry): CI Werror — registry .get() and unused fields
Two build failures surfaced by CI on the Phase 9 branch:

1. NetworkOPsImp stores the ServiceRegistry as
   std::reference_wrapper<ServiceRegistry> registry_, so calls must go
   through registry_.get().<method>(). The MetricsRegistry hooks added
   in setMode() and recvValidation() dereferenced the wrapper directly,
   which compiles against a pre-existing accessor on the wrapper type
   on some toolchains but fails on clang 16/17/20 and gcc 13/15 with
   "no member named 'getMetricsRegistry' in
   std::reference_wrapper<xrpl::ServiceRegistry>".

2. MetricsRegistry::app_ and MetricsRegistry::journal_ are only used
   inside XRPL_ENABLE_TELEMETRY-guarded code paths (gauge callbacks
   and JLOG). When telemetry is disabled, clang's
   -Werror,-Wunused-private-field tripped. Move the two fields under
   the same #ifdef and guard the constructor initialisers with
   [[maybe_unused]] so the no-op build continues to compile cleanly.
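
The two fixes described above can be illustrated with a minimal sketch. The types here are hypothetical stand-ins rather than the real xrpld classes, and DEMO_ENABLE_TELEMETRY is an assumed macro playing the role of XRPL_ENABLE_TELEMETRY. The first class shows why a std::reference_wrapper member needs .get() before a member call; the second shows a field that exists only in the telemetry build, with the constructor parameter marked [[maybe_unused]] so the no-op build stays warning-free.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for illustration; not the real xrpld types.
struct MetricsRegistry
{
    int modeChanges = 0;
    void recordModeChange() { ++modeChanges; }
};

struct ServiceRegistry
{
    MetricsRegistry metrics;
    MetricsRegistry& getMetricsRegistry() { return metrics; }
};

class NetworkOps
{
public:
    explicit NetworkOps(ServiceRegistry& registry) : registry_(registry) {}

    void setMode()
    {
        // registry_.getMetricsRegistry() does not compile:
        // std::reference_wrapper has no such member. Unwrap with .get().
        registry_.get().getMetricsRegistry().recordModeChange();
    }

private:
    std::reference_wrapper<ServiceRegistry> registry_;
};

// A field used only in the telemetry build lives under the same #ifdef as
// its uses, so the disabled build has no unused private field.
class GaugeSource
{
public:
    explicit GaugeSource([[maybe_unused]] ServiceRegistry& app)
#ifdef DEMO_ENABLE_TELEMETRY
        : app_(app)
#endif
    {
    }

    int modeChanges() const
    {
#ifdef DEMO_ENABLE_TELEMETRY
        return app_.getMetricsRegistry().modeChanges;
#else
        return 0;  // no-op build: nothing is observed
#endif
    }

private:
#ifdef DEMO_ENABLE_TELEMETRY
    ServiceRegistry& app_;
#endif
};
```

Without the macro defined, GaugeSource has no data members at all, which is what keeps an unused-private-field diagnostic from firing in the disabled configuration.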
2026-05-13 14:11:16 +01:00
Pratik Mankawde
3c4d51a408 refactor(telemetry): split registerAsyncGauges; record RPC end in OTel
Addresses code review findings on PR #6513:

1. registerAsyncGauges() was ~730 lines, violating the CLAUDE.md
   rule "No function longer than 80 lines." Split into fifteen
   per-domain helpers (cache, TxQ, object count, load factor,
   NodeStore, server info, build info, complete ledgers, DB,
   validator health, peer quality, ledger economy, state tracking,
   storage detail, validation agreement) dispatched from a thin
   shell. Each helper now stays at or below the 80-line limit.

2. PerfLogImp::rpcEnd() only updated the in-memory counter and
   never advanced the OTel xrpld_rpc_method_finished_total,
   xrpld_rpc_method_errored_total, or xrpld_rpc_method_duration_us
   instruments. rpcStart() was already wired up, so the finished
   and errored counters stayed at zero for every RPC call.
   rpcEnd() now computes the duration once, records it under the
   existing mutex, and forwards finish/error events to
   MetricsRegistry::recordRpcFinished / recordRpcErrored outside
   the counter mutex to avoid lock nesting with the OTel SDK.

3. Added class-level Doxygen for MetricsRegistry with an ASCII
   collaborator diagram and explicit @note tags covering
   thread-safety, lifetime, and extension guidance.
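
The locking pattern in item 2 can be sketched as follows. This is an illustrative outline under stated assumptions, not the PerfLogImp code: MetricsSink is a hypothetical stand-in for the registry's recordRpcFinished / recordRpcErrored hooks, and RpcCounters is a hypothetical stand-in for the in-memory counter. The duration is computed once, the in-memory counter is updated under the mutex, and the sink is called only after the lock is released.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sink; a real one would forward to OTel instruments.
struct MetricsSink
{
    std::atomic<std::uint64_t> finished{0};
    std::atomic<std::uint64_t> errored{0};
    std::atomic<std::uint64_t> lastDurationUs{0};

    void recordRpcFinished(std::string const&, std::uint64_t us)
    {
        ++finished;
        lastDurationUs = us;
    }

    void recordRpcErrored(std::string const&, std::uint64_t us)
    {
        ++errored;
        lastDurationUs = us;
    }
};

class RpcCounters
{
public:
    using Clock = std::chrono::steady_clock;

    explicit RpcCounters(MetricsSink& sink) : sink_(sink) {}

    void rpcEnd(
        std::string const& method,
        Clock::time_point start,
        Clock::time_point end,
        bool ok)
    {
        // Compute the duration once and reuse it for both destinations.
        auto const raw =
            std::chrono::duration_cast<std::chrono::microseconds>(end - start)
                .count();
        auto const durationUs = static_cast<std::uint64_t>(raw < 0 ? 0 : raw);

        {
            // In-memory bookkeeping happens under the counter mutex.
            std::lock_guard<std::mutex> lock(mutex_);
            auto& counter = counters_[method];
            if (ok)
                ++counter.finished;
            else
                ++counter.errored;
            counter.durationUs += durationUs;
        }

        // The mutex is released here, so the sink never runs while we hold it.
        if (ok)
            sink_.recordRpcFinished(method, durationUs);
        else
            sink_.recordRpcErrored(method, durationUs);
    }

    std::uint64_t totalDurationUs(std::string const& method) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto const it = counters_.find(method);
        return it == counters_.end() ? 0 : it->second.durationUs;
    }

private:
    struct Counter
    {
        std::uint64_t finished = 0;
        std::uint64_t errored = 0;
        std::uint64_t durationUs = 0;
    };

    MetricsSink& sink_;
    mutable std::mutex mutex_;
    std::unordered_map<std::string, Counter> counters_;
};
```

Calling the sink outside the scoped lock means any lock the metrics backend takes internally is never acquired while the counter mutex is held, which is the lock-nesting concern the message describes.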
2026-05-13 12:23:17 +01:00
Pratik Mankawde
9e12e660fe Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-29 20:25:13 +01:00
Pratik Mankawde
5f139e12c3 feat(telemetry): add 7-day agreement window to validation_agreement gauge
Add agreement_pct_7d, agreements_7d, missed_7d labels to the
rippled_validation_agreement observable gauge, matching the external
xrpl-validator-dashboard's 7-day tracking.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
1defb2111f fix(telemetry): fix ServiceRegistry API names and transaction rate computation
- cachedSLEs() -> getCachedSLEs()
- openLedger() -> getOpenLedger()
- overlay() -> getOverlay()
- Use OpenView::txCount() for transaction rate instead of SHAMap::size()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
350e398aa6 feat(telemetry): wire ValidationTracker to MetricsRegistry and consensus hooks
Add ValidationTracker member to MetricsRegistry with a public accessor,
register a rippled_validation_agreement observable gauge that calls
reconcile() and reports 1h/24h agreement percentages and counts, and
hook recordOurValidation/recordNetworkValidation into RCLConsensus
validate() and LedgerMaster setValidLedger() respectively.
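
As a rough illustration of how windowed agreement figures like these could be derived, the sketch below keeps timestamped outcomes and summarises them over an arbitrary look-back window. AgreementWindow and its members are hypothetical names, and the real ValidationTracker and its reconcile() step may work quite differently; in particular, treating each event as a simple agreed/missed flag is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical sliding-window tracker; not the real ValidationTracker.
class AgreementWindow
{
public:
    using Seconds = std::chrono::seconds;

    struct Stats
    {
        std::size_t agreements;
        std::size_t missed;
        double agreementPct;  // 0.0 when the window holds no events
    };

    // Record one reconciled outcome at time `at` (seconds since any epoch).
    void record(Seconds at, bool agreed)
    {
        events_.push_back({at, agreed});
    }

    // Summarise the events that fall within (now - window, now].
    Stats stats(Seconds now, Seconds window) const
    {
        Stats result{0, 0, 0.0};
        for (auto const& event : events_)
        {
            if (event.at > now || now - event.at >= window)
                continue;
            if (event.agreed)
                ++result.agreements;
            else
                ++result.missed;
        }
        auto const total = result.agreements + result.missed;
        if (total > 0)
            result.agreementPct = 100.0 * result.agreements / total;
        return result;
    }

    // Drop events older than the largest window still being reported.
    void prune(Seconds now, Seconds maxWindow)
    {
        while (!events_.empty() && now - events_.front().at >= maxWindow)
            events_.pop_front();
    }

private:
    struct Event
    {
        Seconds at;
        bool agreed;
    };

    std::deque<Event> events_;
};
```

A gauge callback could then call stats() once per window, for example with std::chrono::hours(1) and std::chrono::hours(24), and emit each result under its own label.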

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
45ffe8e2ec fix(telemetry): add missing counters, fix dashboard metric name, clean dead code
- Add rippled_validation_agreements_total and rippled_validation_missed_total
  counter declarations and creation (wiring to ValidationTracker pending rebase)
- Fix peer-quality dashboard: query rippled_server_info{metric="peer_disconnects_resources"}
  instead of non-existent rippled_Overlay_Peer_Disconnects_Charges
- Remove dead getCountsJson() call in storageDetail callback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
b0e0d5930a fix(telemetry): fix metric labels and add missing parity gauge values
- Rename fee labels to match spec: base_fee_drops -> base_fee_xrp,
  reserve_base_drops -> reserve_base_xrp, reserve_inc_drops -> reserve_inc_xrp
- Add peers_insane_count (stub with TODO for PeerImp::tracking_ exposure)
- Add transaction_rate to ledger economy gauge
- Replace node_store_writes/node_written_bytes with nudb_bytes per spec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
50e6b14c56 feat(telemetry): add external dashboard parity gauges and counters to MetricsRegistry
Add validator health, peer quality, ledger economy, state tracking, and
storage detail observable gauges plus 5 synchronous counters with recording
hooks for ledger close, validation send, state change, and overflow events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
d426f4983a feat(telemetry): add push_metrics.py parity gauges to MetricsRegistry
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00
Pratik Mankawde
892fee638a Phase 9: Metric gap fill - nodestore, cache, TxQ, load factor dashboards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 22:31:49 +01:00