mirror of
https://github.com/XRPLF/rippled.git
synced 2026-06-03 00:36:48 +00:00
MetricsRegistry observable-gauge callbacks run on the OTel reader thread and read live state from nodeStore_, overlay_, networkOPs_, ledgerMaster, inboundLedgers, loadManager, and others. The old shutdown sequence called metricsRegistry_->stop() AFTER all those services were already stopped, which left a race window between each service's stop() and the final provider_->ForceFlush() during which a callback could dereference already-stopped service state. The try/catch guards in each callback mitigated crashes but not reads from freed members.

- Add MetricsRegistry::detachCallbacks() that sets an atomic<bool> callbacksDetached_ with release ordering. Idempotent.
- Guard every ObservableGauge callback entry with an acquire-load of the same flag and return early if it is set. Covers all 15 registered callbacks (cacheHitRate, txq, objectCount, loadFactor, nodeStore, serverInfo, buildInfo, completeLedgers, dbMetrics, validatorHealth, peerQuality, ledgerEconomy, stateTracking, storageDetail, validationAgreement).
- Application::run() shutdown sequence now calls metricsRegistry_->detachCallbacks() right after m_loadManager->stop() and BEFORE m_shaMapStore, m_jobQueue, overlay_, grpcServer_, m_networkOPs, serverHandler_, m_ledgerReplayer, m_inboundTransactions, m_inboundLedgers, ledgerCleaner_, m_nodeStore, perfLog_ are stopped. The acquire/release pair guarantees subsequent reader-thread ticks see the detach before they dereference stopped services.
- metricsRegistry_->stop() keeps setting the flag as a belt-and-suspenders defense in case a future caller forgets to detach first.
- Drop the misleading "No explicit RemoveCallback is needed" comment from stop(); provider destruction alone does not beat the reader thread to already-freed state.

The objectCountGauge callback previously discarded its state pointer via `void* /* state */`; restore the state argument so it can access self->callbacksDetached_ too.
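
The detach-flag pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: RegistrySketch, FakeService, and gaugeCallback are hypothetical names, and the callback signature is simplified (a real OTel observable callback also receives an observer result object). Only detachCallbacks(), stop(), callbacksDetached_, and the release/acquire ordering are taken from the description.

```cpp
#include <atomic>

// Hypothetical stand-in for a service whose state a gauge callback reads.
struct FakeService
{
    int value = 42;
};

// Hypothetical, simplified stand-in for MetricsRegistry.
class RegistrySketch
{
public:
    explicit RegistrySketch(FakeService& svc) : svc_(svc)
    {
    }

    // Idempotent: calling it again simply stores true a second time.
    void
    detachCallbacks() noexcept
    {
        callbacksDetached_.store(true, std::memory_order_release);
    }

    // stop() sets the flag as well, in case a caller forgot to detach first.
    void
    stop() noexcept
    {
        detachCallbacks();
        // Flushing and provider teardown would follow here.
    }

    // Simplified callback shape: an opaque state pointer carries `this`.
    // Returns false without touching the service once detached.
    static bool
    gaugeCallback(void* state, int& out)
    {
        auto* self = static_cast<RegistrySketch*>(state);
        if (self->callbacksDetached_.load(std::memory_order_acquire))
            return false;  // early return: service may already be stopped
        out = self->svc_.value;
        return true;
    }

private:
    FakeService& svc_;
    std::atomic<bool> callbacksDetached_{false};
};
```

In this sketch, a shutdown sequence would call detachCallbacks() before stopping FakeService, so any callback that starts afterwards returns early instead of reading the service. The sketch does not address a callback that has already passed the flag check when the detach happens.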