- MetricsRegistry.cpp: concatenate nested namespaces, add missing
direct includes (Journal.h, string, string_view, cstdint), suppress
readability-convert-member-functions-to-static in #else stubs by
referencing enabled_ member, void unused instanceId parameter.
- MetricsRegistry test: add missing direct includes (Log.h, Journal.h,
uint256.h, io_context.hpp, optional, stdexcept, string), make
throwUnimplemented() static, add [[nodiscard]] to getOpenLedger/
isStopping/getTrapTxID overrides, make const-eligible registry const.
- PerfLogImp.cpp: add braces around if/else body per
readability-braces-around-statements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NetworkOPs.h and SpanNames.h were only needed for per-span
nodeAmendmentBlocked/nodeServerState calls, which were removed
in the attr naming simplification. Fixes clang-tidy CI failure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop xrpl.pathfind.* prefix from per-span attrs (source_account,
dest_account, fast, search_level, num_complete_paths, num_paths,
num_requests).
- Keep xrpl.pathfind.ledger_index qualified (rule 5: distinct from
xrpl.ledger.seq).
- Remove per-span nodeAmendmentBlocked/nodeServerState calls from
RPCHandler — promoted to resource-level attrs.
- Mark node-health attrs in SpanNames.h as RESOURCE-ONLY with doc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes so gauges register in Prometheus (via StatsD) even when their
initial/steady-state value is 0:
1. StatsDGaugeImpl m_dirty: default-init to true so the initial value
(0) is emitted on the first flush. Previously, gauges whose value
never changed from 0 were never flushed and never appeared
downstream.
2. io_latency_sampler firstSample_: new atomic<bool>, init true.
m_event.notify now fires when either firstSample_ is true (exchanged
to false) or lastSample >= 10 ms. This guarantees the io_latency
metric is registered on startup; subsequent sub-10 ms samples are
still suppressed to avoid flooding.
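The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the real StatsDGaugeImpl or io_latency_sampler: SketchGauge, SketchLatencyGate, flush() and shouldNotify() are hypothetical names, and only the start-dirty flag and the exchange-on-first-sample check are taken from the description.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a StatsD gauge; not the real StatsDGaugeImpl.
class SketchGauge
{
public:
    void set(std::uint64_t v)
    {
        if (v != value_)
        {
            value_ = v;
            dirty_ = true;
        }
    }

    // Returns the value to emit, or nothing when no emit is pending.
    std::optional<std::uint64_t> flush()
    {
        if (!dirty_)
            return std::nullopt;
        dirty_ = false;
        return value_;
    }

private:
    std::uint64_t value_ = 0;
    bool dirty_ = true;  // start dirty so the initial 0 is emitted once
};

// Hypothetical stand-in for the sampler's notify decision.
class SketchLatencyGate
{
public:
    bool shouldNotify(std::chrono::milliseconds lastSample)
    {
        // exchange() returns the previous value, so only the first
        // caller observes true.
        bool const first = firstSample_.exchange(false);
        return first || lastSample >= std::chrono::milliseconds{10};
    }

private:
    std::atomic<bool> firstSample_{true};
};
```

With this shape, a gauge that stays at 0 is still flushed exactly once, and a sub-10 ms first sample still notifies once.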
Set defaults for tx_span::attr::suppressed (false) and
tx_span::attr::status ("new") immediately after creating the txReceive
span. Previously these attributes were set only in the
HashRouter-suppressed branch, so spans that took any other path lacked
them entirely, producing incomplete span data in downstream stores.
The suppressed branch still overrides the defaults when the transaction
has already been seen via HashRouter.
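A rough sketch of the default-then-override pattern, under stated assumptions: SketchSpan and receiveTx() are hypothetical, the attribute keys are shortened, and the "known" status used in the suppressed branch is invented for illustration. Only the false/"new" defaults come from the description above.

```cpp
#include <map>
#include <string>

// Hypothetical span type; the real span API is not shown here.
struct SketchSpan
{
    std::map<std::string, std::string> attrs;

    void setAttribute(std::string const& key, std::string const& value)
    {
        attrs[key] = value;  // later writes replace earlier ones
    }
};

inline SketchSpan receiveTx(bool alreadySeen)
{
    SketchSpan span;

    // Defaults are set right after span creation, so every span
    // carries both attributes regardless of the path taken.
    span.setAttribute("suppressed", "false");
    span.setAttribute("status", "new");

    if (alreadySeen)
    {
        // Suppressed branch overrides the defaults.
        span.setAttribute("suppressed", "true");
        span.setAttribute("status", "known");  // hypothetical value
    }
    return span;
}
```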
Two build failures surfaced by CI on the Phase 9 branch:
1. NetworkOPsImp stores the ServiceRegistry as
std::reference_wrapper<ServiceRegistry> registry_, so calls must go
through registry_.get().<method>(). The MetricsRegistry hooks added
in setMode() and recvValidation() called getMetricsRegistry() on the
wrapper itself instead of on the wrapped object. That built on some
toolchains but fails on clang 16/17/20 and gcc 13/15 with
"no member named 'getMetricsRegistry' in
std::reference_wrapper<xrpl::ServiceRegistry>".
2. MetricsRegistry::app_ and MetricsRegistry::journal_ are only used
inside XRPL_ENABLE_TELEMETRY-guarded code paths (gauge callbacks
and JLOG). When telemetry is disabled, clang's
-Wunused-private-field warning, promoted to an error by -Werror,
fired. Move the two fields under the same #ifdef and guard the
constructor initialisers with [[maybe_unused]] so the no-op build
continues to compile cleanly.
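Both fixes can be illustrated with a small sketch. All type names and the SKETCH_ENABLE_TELEMETRY macro are hypothetical stand-ins. Because [[maybe_unused]] cannot be attached to a member initialiser, the sketch assumes the attribute sits on the constructor parameter; the actual change may differ.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical minimal types; the real ServiceRegistry is much larger.
struct SketchMetrics
{
    int modeChanges = 0;
};

struct SketchRegistry
{
    SketchMetrics metrics;
    SketchMetrics& getMetricsRegistry() { return metrics; }
};

class SketchOps
{
public:
    explicit SketchOps(SketchRegistry& r) : registry_(r) {}

    void setMode()
    {
        // registry_.getMetricsRegistry() would not compile:
        // std::reference_wrapper has no such member. Go through get().
        registry_.get().getMetricsRegistry().modeChanges++;
    }

private:
    std::reference_wrapper<SketchRegistry> registry_;
};

class SketchMetricsHolder
{
public:
    // The parameter is only consumed when telemetry is compiled in.
    explicit SketchMetricsHolder([[maybe_unused]] std::string name)
#ifdef SKETCH_ENABLE_TELEMETRY
        : name_(std::move(name))
#endif
    {
    }

    // Reports whether the telemetry-only member exists in this build.
    static constexpr bool hasName()
    {
#ifdef SKETCH_ENABLE_TELEMETRY
        return true;
#else
        return false;
#endif
    }

private:
#ifdef SKETCH_ENABLE_TELEMETRY
    std::string name_;  // absent in the no-op build, so no unused-field warning
#endif
};
```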
Addresses code review findings on PR #6513:
1. registerAsyncGauges() was ~730 lines, violating the CLAUDE.md
rule "No function longer than 80 lines." Split into fifteen
per-domain helpers (cache, TxQ, object count, load factor,
NodeStore, server info, build info, complete ledgers, DB,
validator health, peer quality, ledger economy, state tracking,
storage detail, validation agreement) dispatched from a thin
shell. Each helper now stays at or below the 80-line limit.
2. PerfLogImp::rpcEnd() only updated the in-memory counter and
never advanced the OTel xrpld_rpc_method_finished_total,
xrpld_rpc_method_errored_total, or xrpld_rpc_method_duration_us
instruments. rpcStart() was already wired up, so the finished
and errored counters stayed at zero for every RPC call.
rpcEnd() now computes the duration once, records it under the
existing mutex, and forwards finish/error events to
MetricsRegistry::recordRpcFinished / recordRpcErrored outside
the counter mutex to avoid lock nesting with the OTel SDK.
3. Added class-level Doxygen for MetricsRegistry with an ASCII
collaborator diagram and explicit @note tags covering
thread-safety, lifetime, and extension guidance.
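The rpcEnd() change in item 2 might look roughly like the sketch below. SketchSink and SketchPerfLog are hypothetical; only the ordering (compute the duration once, update the in-memory counter under the mutex, forward to the metrics sink after the lock is released) follows the description.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sink standing in for MetricsRegistry.
struct SketchSink
{
    std::uint64_t finished = 0;
    std::uint64_t errored = 0;

    void recordRpcFinished(std::string const&, std::chrono::microseconds)
    {
        ++finished;
    }
    void recordRpcErrored(std::string const&) { ++errored; }
};

class SketchPerfLog
{
public:
    using Clock = std::chrono::steady_clock;

    explicit SketchPerfLog(SketchSink& sink) : sink_(sink) {}

    void rpcEnd(std::string const& method, Clock::time_point start, bool ok)
    {
        // Compute the duration once and reuse it below.
        auto const duration =
            std::chrono::duration_cast<std::chrono::microseconds>(
                Clock::now() - start);
        {
            std::lock_guard<std::mutex> const lock(mutex_);
            auto& c = counters_[method];
            if (ok)
                ++c.finished;
            else
                ++c.errored;
            c.totalDuration += duration;
        }
        // Forward after the lock is released so the counter mutex is
        // never held while the metrics backend takes its own locks.
        if (ok)
            sink_.recordRpcFinished(method, duration);
        else
            sink_.recordRpcErrored(method);
    }

    std::uint64_t finishedCount(std::string const& method) const
    {
        std::lock_guard<std::mutex> const lock(mutex_);
        auto const it = counters_.find(method);
        return it == counters_.end() ? 0 : it->second.finished;
    }

private:
    struct Counter
    {
        std::uint64_t finished = 0;
        std::uint64_t errored = 0;
        std::chrono::microseconds totalDuration{0};
    };

    SketchSink& sink_;
    mutable std::mutex mutex_;
    std::unordered_map<std::string, Counter> counters_;
};
```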
MockServiceRegistry in MetricsRegistry.cpp still used the old method
names (timeKeeper, cachedSLEs, validators, overlay, cluster, app, etc.)
while ServiceRegistry has been standardized on getXxx()/isXxx() forms.
Windows CI caught this as C3668 "did not override any base class methods"
errors and C2259 "cannot instantiate abstract class".
Rename all mismatched overrides to match the current interface:
timeKeeper -> getTimeKeeper
cachedSLEs -> getCachedSLEs
validators -> getValidators
validatorSites -> getValidatorSites
validatorManifests -> getValidatorManifests
publisherManifests -> getPublisherManifests
overlay -> getOverlay
cluster -> getCluster
peerReservations -> getPeerReservations
pendingSaves -> getPendingSaves
openLedger (x2) -> getOpenLedger
getPathRequests -> getPathRequestManager (type rename too)
journal -> getJournal
logs -> getLogs
trapTxID -> getTrapTxID
app -> getApp
Also regenerate levelization ordering.txt to reflect the new
tests.libxrpl -> xrpl.core edge introduced by ServiceRegistry.h include.
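For context, a minimal sketch of why the stale names in the rename list failed to build. The types are hypothetical and far smaller than the real ServiceRegistry. An `override` on a name the base no longer declares is rejected (MSVC C3668), and the pure virtuals left unimplemented make the mock abstract (MSVC C2259).

```cpp
#include <string>
#include <type_traits>

// Hypothetical reduced interface using the getXxx()/isXxx() forms.
struct SketchServiceRegistry
{
    virtual ~SketchServiceRegistry() = default;
    virtual std::string getJournal() = 0;
    virtual bool isStopping() const = 0;
};

struct SketchMock : SketchServiceRegistry
{
    // Writing `std::string journal() override` here would not compile:
    // no base method has that name, and getJournal() would stay pure.
    std::string getJournal() override { return "mock"; }

    [[nodiscard]] bool isStopping() const override { return false; }
};

// The mock is instantiable only once every pure virtual is overridden.
static_assert(!std::is_abstract_v<SketchMock>);
```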
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Consensus.h (Phase 4 tracing) depends on DisputedTx::getYays()/getNays()
to build disputeResolve span events. Both accessors were removed by
earlier 'duplicate accessor' cleanup commits on this branch, leaving
Consensus.h referencing non-existent members. CI caught this on
macOS/clang-17/gcc-13/Windows builds.
Restore the accessors on the branch where they were dropped so downstream
phase branches inherit a compiling DisputedTx.h via merge.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Clang-tidy fixes:
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
in OTelCollector.h, OTelCollector.cpp, ValidationTracker.h/.cpp
- Add missing direct includes (misc-include-cleaner) in
ValidationTracker.cpp, test, CollectorManager.cpp, OTelCollector.cpp
- Make lock_guard variables const (misc-const-correctness)
- Add braces around single-line if/else (readability-braces-around-statements)
- Use designated initializer for WindowEvent (modernize-use-designated-initializers)
- Initialize LedgerEvent::seq field (cppcoreguidelines-pro-type-member-init)
Linker fix:
- Add ValidationTracker.cpp as source to xrpl.test.telemetry target
(it lives in src/xrpld/ but the test links against libxrpl only)
Levelization fix:
- Remove stale dependency edges from ordering.txt that were introduced
by the erroneous develop-merge commit
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These were already added in an earlier phase branch. The duplicate
with slightly different Doxygen wording was introduced by the
erroneous merge/revert cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace GetSpan() with direct context value check in Logs::format()
to avoid heap allocation (new DefaultSpan) on the no-span path
- Restore Phase 7 documentation accidentally deleted during merge
- Fix undefined $JAEGER variable → use $TEMPO in integration test
- Remove useless LCOV_EXCL markers around #ifdef block
- Fix indentation inconsistencies in Log.cpp injection block
- Remove incorrect url field from loki.yaml derivedFields
- Update stale code sample in Phase8_taskList.md to match implementation
- Correct "<10ns" performance claims to the measured ~15-20ns (no-span)
and ~50ns (active-span) figures across all docs
- Replace Jaeger references with Tempo in TESTING.md (port 16686→3200)
- Improve error handling in check_log_correlation(): track files_scanned,
detect missing log files, fix silent grep error masking
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix use-after-free: extract gauge callback to static function and call
RemoveCallback in ~OTelGaugeImpl() before unregistering from collector
- Use memory_order_acq_rel on callHooks() debounce CAS for proper
happens-before relationship between hook invocations
- Add explicit 2s timeout to ForceFlush() in destructor to prevent
blocking indefinitely when OTLP endpoint is unreachable at shutdown
- Add OTLP receiver to metrics pipeline so native OTel metrics from
xrpld are actually received by the collector
- Remove stale health check port from docker-compose (extension was
removed from collector config)
- Clarify fallback docs: StatsD path requires re-enabling receiver/port
- Fix comments: Counter uses uint64_t not int64_t, gauge clamps to
[0, INT64_MAX] not [0, UINT64_MAX]
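The use-after-free fix and the gauge clamp can be sketched as below. SketchObservable only mimics the general shape of an observable instrument that takes a function pointer plus a state pointer; it is not the OpenTelemetry API, and all names are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical instrument that stores (function pointer, state) pairs.
class SketchObservable
{
public:
    using Callback = void (*)(std::int64_t& out, void* state);

    void addCallback(Callback cb, void* state)
    {
        entries_.push_back({cb, state});
    }

    void removeCallback(Callback cb, void* state)
    {
        entries_.erase(
            std::remove_if(
                entries_.begin(),
                entries_.end(),
                [&](Entry const& e) { return e.cb == cb && e.state == state; }),
            entries_.end());
    }

    // Invokes every registered callback, as a periodic export would.
    std::vector<std::int64_t> collect() const
    {
        std::vector<std::int64_t> out;
        for (auto const& e : entries_)
        {
            std::int64_t v = 0;
            e.cb(v, e.state);
            out.push_back(v);
        }
        return out;
    }

private:
    struct Entry
    {
        Callback cb;
        void* state;
    };
    std::vector<Entry> entries_;
};

class SketchGaugeImpl
{
public:
    explicit SketchGaugeImpl(SketchObservable& inst) : inst_(inst)
    {
        inst_.addCallback(&observe, this);
    }

    // Remove the callback first so the instrument can never call back
    // into a destroyed gauge.
    ~SketchGaugeImpl() { inst_.removeCallback(&observe, this); }

    SketchGaugeImpl(SketchGaugeImpl const&) = delete;
    SketchGaugeImpl& operator=(SketchGaugeImpl const&) = delete;

    void set(std::uint64_t v) { value_.store(v); }

private:
    // Static function: same address for add and remove.
    static void observe(std::int64_t& out, void* state)
    {
        auto* self = static_cast<SketchGaugeImpl*>(state);
        constexpr auto max = static_cast<std::uint64_t>(
            std::numeric_limits<std::int64_t>::max());
        // Clamp to [0, INT64_MAX] before narrowing.
        out = static_cast<std::int64_t>(std::min(self->value_.load(), max));
    }

    SketchObservable& inst_;
    std::atomic<std::uint64_t> value_{0};
};
```

A lambda registered in place would have no stable handle to pass back for removal, which is presumably why the callback was extracted to a static function.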
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The blanket revert of f4555c80fe also un-reverted some files that had
been correctly matched to phase-6 (nodestore Backend API refactor,
Vault_test changes). Restore those to the base branch state so the
phase-7 PR only contains telemetry changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build fixes in PeerImp.cpp:
- Rename duplicate `span` variable to `consSpan` in proposal and
validation handlers to avoid redefinition error
- Fix `->` on non-pointer SpanGuard (now correctly on shared_ptr)
- Fix move-only type copy in lambda capture
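One common way to resolve a move-only capture error is sketched below; whether PeerImp.cpp uses exactly this approach is an assumption. SketchSpanGuard and makeJob() are hypothetical. Holding the guard in a shared_ptr makes the lambda copyable, and also explains why `->` is then valid on the captured object.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical move-only guard; the real SpanGuard is not shown here.
class SketchSpanGuard
{
public:
    explicit SketchSpanGuard(int id) : id_(id) {}

    SketchSpanGuard(SketchSpanGuard const&) = delete;
    SketchSpanGuard& operator=(SketchSpanGuard const&) = delete;
    SketchSpanGuard(SketchSpanGuard&&) = default;
    SketchSpanGuard& operator=(SketchSpanGuard&&) = default;

    int id() const { return id_; }

private:
    int id_;
};

inline std::function<int()> makeJob(SketchSpanGuard guard)
{
    // std::function requires a copyable callable. Capturing the guard
    // by value would make the lambda move-only, so share ownership.
    auto shared = std::make_shared<SketchSpanGuard>(std::move(guard));
    return [shared]() { return shared->id(); };
}
```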
Clang-tidy fixes:
- Concatenate nested namespaces in LedgerSpanNames.h and PeerSpanNames.h
- Add missing SpanNames.h includes in BuildLedger.cpp, LedgerMaster.cpp,
PeerImp.cpp for direct seg:: symbol usage
- Add missing <chrono> and <cstdint> includes in BuildLedger.cpp
- Remove unused Feature.h include from BuildLedger.cpp
Rename check fix:
- Run docs.sh to rename rippled_ metric prefixes to xrpld_ in
09-data-collection-reference.md and telemetry-runbook.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- protocol/README.md: restore historical GitHub URL path (src/ripple/)
- Config.cpp: restore configLegacyName as "rippled.cfg" (legacy name
must remain as-is for backward compatibility)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts 259 files that carried unrelated upstream changes through the
phase-6 merge: enum class removals (cppcoreguidelines-use-enum-class),
scoped_lock→lock_guard conversions (modernize-use-scoped-lock),
nodestore Backend API changes (void const* key), .clang-tidy config,
test infrastructure deletions, and miscellaneous develop changes.
These changes belong on develop, not in the telemetry PR chain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>