rippled/src
Pratik Mankawde 2735e4ac78 fix(telemetry): detach metrics gauge callbacks before Application services stop
MetricsRegistry observable-gauge callbacks run on the OTel reader thread
and read live state from nodeStore_, overlay_, networkOPs_, ledgerMaster,
inboundLedgers, loadManager, and others. The old shutdown sequence called
metricsRegistry_->stop() AFTER all those services were already stopped,
which left a race window between each service's stop() and the final
provider_->ForceFlush() during which a callback could dereference
already-stopped service state. The try/catch guards in each callback
mitigated crashes from thrown exceptions but did nothing to prevent
reads from freed members.

- Add MetricsRegistry::detachCallbacks() that sets an atomic<bool>
  callbacksDetached_ with release ordering. Idempotent.
- Guard every ObservableGauge callback entry with an acquire-load of the
  same flag and return early if it is set. Covers all 15 registered
  callbacks (cacheHitRate, txq, objectCount, loadFactor, nodeStore,
  serverInfo, buildInfo, completeLedgers, dbMetrics, validatorHealth,
  peerQuality, ledgerEconomy, stateTracking, storageDetail,
  validationAgreement).
- Application::run() shutdown sequence now calls
  metricsRegistry_->detachCallbacks() right after m_loadManager->stop()
  and BEFORE m_shaMapStore, m_jobQueue, overlay_, grpcServer_,
  m_networkOPs, serverHandler_, m_ledgerReplayer, m_inboundTransactions,
  m_inboundLedgers, ledgerCleaner_, m_nodeStore, perfLog_ are stopped.
  The acquire/release pair guarantees subsequent reader-thread ticks see
  the detach before they dereference stopped services.
- metricsRegistry_->stop() keeps setting the flag as a belt-and-suspenders
  defense in case a future caller forgets to detach first.
- Drop the misleading "No explicit RemoveCallback is needed" comment
  from stop(); provider destruction alone does not prevent the reader
  thread from reaching already-freed state.
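
As a rough illustration of the guard and the detach-before-stop ordering listed above, here is a minimal single-threaded sketch. `FakeService`, `MiniRegistry`, `sampleGauge`, and `shutdownInOrder` are hypothetical names invented for this example and are not taken from the rippled sources; only `detachCallbacks()` and `callbacksDetached_` mirror names from this commit. Because the sketch runs on one thread, it shows the guard logic only, not the cross-thread timing.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a service such as nodeStore_ or overlay_.
struct FakeService
{
    bool stopped = false;
    int value = 7;
    void stop() { stopped = true; }
};

// Hypothetical, reduced registry: one flag and one gauge-callback body.
class MiniRegistry
{
public:
    explicit MiniRegistry(FakeService& svc) : svc_(svc) {}

    // Idempotent: calling it again simply stores true a second time.
    void detachCallbacks() noexcept
    {
        callbacksDetached_.store(true, std::memory_order_release);
    }

    // Stand-in for a gauge callback. Returns false without touching the
    // service once the registry has been detached.
    bool sampleGauge(int& out) const
    {
        if (callbacksDetached_.load(std::memory_order_acquire))
            return false;
        out = svc_.value;
        return true;
    }

private:
    FakeService& svc_;
    std::atomic<bool> callbacksDetached_{false};
};

// Shutdown order from the list above: detach first, then stop the service.
inline void shutdownInOrder(MiniRegistry& registry, FakeService& svc)
{
    registry.detachCallbacks();
    int ignored = 0;
    (void)ignored;
    assert(!registry.sampleGauge(ignored));  // no sample after detach
    svc.stop();
}
```

In this sketch, a call to `sampleGauge` before `shutdownInOrder` yields the service value, and any call afterwards returns false without reading `svc_`.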

The objectCountGauge callback previously discarded its state pointer
via `void* /* state */`; restore the state argument so it can access
self->callbacksDetached_ too.
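
To make the state-pointer point concrete, the sketch below shows a C-style callback that names its `void* state` parameter so it can recover the owning object and check the flag. `GaugeOwner` and this simplified `objectCountCallback` signature are assumptions for illustration; the real OpenTelemetry callback also receives an observer result, which is omitted here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical owner type; only callbacksDetached_ mirrors the commit.
struct GaugeOwner
{
    std::atomic<bool> callbacksDetached_{false};
    std::int64_t objectCount = 42;
};

// With an unnamed `void* /* state */` parameter the body cannot reach the
// flag. Naming the parameter lets the callback recover `self`.
inline bool objectCountCallback(std::int64_t& out, void* state)
{
    assert(state != nullptr);
    auto* self = static_cast<GaugeOwner*>(state);
    if (self->callbacksDetached_.load(std::memory_order_acquire))
        return false;  // detached: report nothing, touch nothing
    out = self->objectCount;
    return true;
}
```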
2026-05-14 17:20:52 +01:00