- MetricsRegistry.cpp: concatenate nested namespaces, add missing
direct includes (Journal.h, string, string_view, cstdint), suppress
readability-convert-member-functions-to-static in #else stubs by
referencing the enabled_ member, cast the unused instanceId parameter to void.
- MetricsRegistry test: add missing direct includes (Log.h, Journal.h,
uint256.h, io_context.hpp, optional, stdexcept, string), make
throwUnimplemented() static, add [[nodiscard]] to getOpenLedger/
isStopping/getTrapTxID overrides, make const-eligible registry const.
- PerfLogImp.cpp: add braces around if/else body per
readability-braces-around-statements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NetworkOPs.h and SpanNames.h were only needed for per-span
nodeAmendmentBlocked/nodeServerState calls, which were removed
in the attr naming simplification. Fixes clang-tidy CI failure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Phase2_taskList: update attr refs to bare names, note node-health
attrs moved to resource level.
- 02-design-decisions: strip xrpl.pathfind.* prefix from planned attrs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update OpenTelemetryPlan docs and Telemetry.h doc example to reflect
the renamed per-span attributes: xrpl.rpc.command -> command,
xrpl.rpc.status -> rpc_status, xrpl.grpc.method -> method, etc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update 31 attribute references in telemetry-runbook.md to match the
simplified naming: drop xrpl.<domain>. prefix on per-span attrs, use
domain-qualified names for collisions (rpc_status, consensus_state,
etc.), and unify cross-domain refs (xrpl.ledger.seq, xrpl.tx.hash).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drop xrpl.pathfind.* prefix from per-span attrs (source_account,
dest_account, fast, search_level, num_complete_paths, num_paths,
num_requests).
- Keep xrpl.pathfind.ledger_index qualified (rule 5: distinct from
xrpl.ledger.seq).
- Remove per-span nodeAmendmentBlocked/nodeServerState calls from
RPCHandler; they are now resource-level attrs.
- Mark node-health attrs in SpanNames.h as RESOURCE-ONLY with doc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MetricsRegistry emits OTel SDK metrics with the xrpld_ prefix
(MetricsRegistry.cpp defines "xrpld_nodestore_state",
"xrpld_cache_metrics", etc.), but the Phase 9 dashboards and the
Step 10c integration-test assertions introduced in 892fee638a
queried the rippled_ prefix. Every Phase 9 panel and assertion
therefore rendered "No data" or failed on a live run, even though
the underlying series were being exported correctly.
Rename the rippled_ prefix to xrpld_ for every MetricsRegistry
metric in dashboards and the integration test:
- nodestore_state, cache_metrics, txq_metrics, load_factor_metrics,
object_count
- rpc_method_started_total / _finished_total / _errored_total /
_duration_us_bucket
- job_queued_total / _started_total / _finished_total /
_queued_duration_us_bucket / _running_duration_us_bucket
- peer_quality, server_info, validator_health, ledger_economy,
db_metrics, complete_ledgers, build_info, state_tracking
- ledgers_closed_total, validations_sent_total,
validations_checked_total, state_changes_total
- validation_agreement (ValidationTracker 1h/24h/7d windows)
Also add ValidationTracker window-gauge assertions to Step 10c of
integration-test.sh so the 1h/24h/7d agreement and miss counts are
checked alongside the other Phase 9 gauges.
The rippled_ prefix is preserved for beast::insight metrics
(rippled_LedgerMaster_*, rippled_Peer_Finder_*, rippled_total_*,
rippled_Overlay_*, rippled_State_Accounting_*, rippled_transactions_*,
rippled_proposals_*, rippled_validations_Messages_*) because those
flow through the StatsD-style OTelCollector configured with
`[insight] prefix=rippled` and remain on that prefix by design.
Verified against a live 6-node consensus network: all 22 Phase 9 +
ValidationTracker assertions now report 6+ series per metric.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two fixes so gauges register in Prometheus (via StatsD) even when their
initial/steady-state value is 0:
1. StatsDGaugeImpl m_dirty: default-init to true so the initial value
(0) is emitted on the first flush. Previously, gauges whose value
never changed from 0 were never flushed and never appeared
downstream.
2. io_latency_sampler firstSample_: new atomic<bool>, init true.
m_event.notify now fires when either firstSample_ is true (exchanged
to false) or lastSample >= 10 ms. This guarantees the io_latency
metric is registered on startup; subsequent sub-10 ms samples are
still suppressed to avoid flooding.
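The two fixes above can be sketched as follows. This is a minimal
illustration, not the actual StatsDGaugeImpl or io_latency_sampler code:
SketchGauge, SketchSampler, flush() and shouldNotify() are hypothetical
names, and the sketch assumes the gauge only marks itself dirty when the
value changes.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for a StatsD-style gauge: flush() emits only
// when the gauge is dirty.
class SketchGauge
{
public:
    void set(std::uint64_t v)
    {
        // Assumption: only a changed value marks the gauge dirty.
        if (v != value_)
        {
            value_ = v;
            dirty_ = true;
        }
    }

    // Returns the value to send, or nullopt when nothing needs sending.
    std::optional<std::uint64_t> flush()
    {
        if (!dirty_)
            return std::nullopt;
        dirty_ = false;
        return value_;
    }

private:
    std::uint64_t value_ = 0;
    bool dirty_ = true;  // start dirty so the initial 0 is emitted once
};

// Hypothetical stand-in for the sampler's decision to notify the metric.
class SketchSampler
{
public:
    bool shouldNotify(std::chrono::milliseconds lastSample)
    {
        // exchange() returns the previous flag and clears it atomically,
        // so exactly one caller observes the "first sample" state.
        bool const first = firstSample_.exchange(false);
        return first || lastSample >= std::chrono::milliseconds{10};
    }

private:
    std::atomic<bool> firstSample_{true};
};
```

With dirty_ defaulting to false, a gauge that stays at 0 would never
return a value from flush(), which matches the missing-series symptom
described above.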
Set defaults for tx_span::attr::suppressed (false) and
tx_span::attr::status ("new") immediately after creating the txReceive
span. Without defaults, spans whose suppressed/status attributes would
only be set in the HashRouter-suppressed branch lacked these attributes
entirely, producing incomplete span data in downstream stores.
The suppressed branch still overrides these when the transaction has
already been seen via HashRouter.
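A minimal sketch of the default-then-override pattern, assuming a span
whose attribute writes overwrite earlier values for the same key.
SketchSpan, receiveTx(), the literal key strings, and the "known" status
value are all hypothetical; only the defaults (false and "new") come from
the change described above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical span type: a later setAttribute() call replaces the
// earlier value stored under the same key.
struct SketchSpan
{
    std::map<std::string, std::string> attrs;

    void setAttribute(std::string const& key, std::string const& value)
    {
        attrs[key] = value;
    }
};

// alreadySeen models the HashRouter-suppressed branch.
inline SketchSpan receiveTx(bool alreadySeen)
{
    SketchSpan span;

    // Defaults are set right after span creation, so every span carries
    // both attributes regardless of which branch runs.
    span.setAttribute("suppressed", "false");
    span.setAttribute("status", "new");

    if (alreadySeen)
    {
        // The suppressed branch overrides the defaults.
        span.setAttribute("suppressed", "true");
        span.setAttribute("status", "known");  // placeholder value
    }
    return span;
}
```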
Two build failures surfaced by CI on the Phase 9 branch:
1. NetworkOPsImp stores the ServiceRegistry as
std::reference_wrapper<ServiceRegistry> registry_, so calls must go
through registry_.get().<method>(). The MetricsRegistry hooks added
in setMode() and recvValidation() called the method on the wrapper directly,
which compiles against a pre-existing accessor on the wrapper type
on some toolchains but fails on clang 16/17/20 and gcc 13/15 with
"no member named 'getMetricsRegistry' in
std::reference_wrapper<xrpl::ServiceRegistry>".
2. MetricsRegistry::app_ and MetricsRegistry::journal_ are only used
inside XRPL_ENABLE_TELEMETRY-guarded code paths (gauge callbacks
and JLOG). When telemetry is disabled, clang's
-Werror=unused-private-field tripped. Move the two fields and their
constructor initialisers under the same #ifdef and mark the
corresponding constructor parameters [[maybe_unused]] so the no-op
build continues to compile cleanly.
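Both build fixes can be illustrated with a small sketch. All type names
here (SketchRegistry, SketchOps, SketchMetrics) and the
SKETCH_ENABLE_TELEMETRY macro are hypothetical stand-ins, not the real
NetworkOPsImp or MetricsRegistry declarations.

```cpp
#include <cassert>
#include <functional>

struct SketchRegistry
{
    int hookCalls = 0;
    int& getMetricsCounter() { return hookCalls; }
};

class SketchOps
{
public:
    explicit SketchOps(SketchRegistry& r) : registry_(r) {}

    void setMode()
    {
        // registry_.getMetricsCounter() does not compile:
        // std::reference_wrapper has no such member. Go through get().
        ++registry_.get().getMetricsCounter();
    }

private:
    std::reference_wrapper<SketchRegistry> registry_;
};

class SketchMetrics
{
public:
    // The parameter is unused when telemetry is compiled out, so it is
    // marked [[maybe_unused]] to keep warning-as-error builds clean.
    explicit SketchMetrics([[maybe_unused]] SketchRegistry& app)
#ifdef SKETCH_ENABLE_TELEMETRY
        : app_(app)
#endif
    {
    }

    bool telemetryCompiledIn() const
    {
#ifdef SKETCH_ENABLE_TELEMETRY
        return true;
#else
        return false;
#endif
    }

private:
#ifdef SKETCH_ENABLE_TELEMETRY
    // The field exists only in the build that actually reads it.
    SketchRegistry& app_;
#endif
};
```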
Addresses code review findings on PR #6513:
1. registerAsyncGauges() was ~730 lines, violating the CLAUDE.md
rule "No function longer than 80 lines." Split into fifteen
per-domain helpers (cache, TxQ, object count, load factor,
NodeStore, server info, build info, complete ledgers, DB,
validator health, peer quality, ledger economy, state tracking,
storage detail, validation agreement) dispatched from a thin
shell. Each helper now stays at or below the 80-line limit.
2. PerfLogImp::rpcEnd() only updated the in-memory counter and
never advanced the OTel xrpld_rpc_method_finished_total,
xrpld_rpc_method_errored_total, or xrpld_rpc_method_duration_us
instruments. rpcStart() was already wired up, so the finished
and errored counters stayed at zero for every RPC call.
rpcEnd() now computes the duration once, records it under the
existing mutex, and forwards finish/error events to
MetricsRegistry::recordRpcFinished / recordRpcErrored outside
the counter mutex to avoid lock nesting with the OTel SDK.
3. Added class-level Doxygen for MetricsRegistry with an ASCII
collaborator diagram and explicit @note tags covering
thread-safety, lifetime, and extension guidance.
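The locking order described in item 2 might look roughly like the sketch
below. SketchMetricsSink and SketchPerfLog are hypothetical stand-ins,
and the signatures of recordFinished/recordErrored are assumptions rather
than the real MetricsRegistry API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sink standing in for the OTel-backed registry.
struct SketchMetricsSink
{
    std::uint64_t finished = 0;
    std::uint64_t errored = 0;

    void recordFinished(std::string const&, std::chrono::microseconds)
    {
        ++finished;
    }
    void recordErrored(std::string const&)
    {
        ++errored;
    }
};

class SketchPerfLog
{
public:
    explicit SketchPerfLog(SketchMetricsSink& sink) : sink_(sink) {}

    void rpcEnd(
        std::string const& method,
        std::chrono::steady_clock::time_point start,
        bool ok)
    {
        // Compute the duration once, before taking any lock.
        auto const duration =
            std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - start);
        {
            // In-memory counters are updated under the mutex.
            std::lock_guard<std::mutex> lock(mutex_);
            auto& c = counters_[method];
            if (ok)
                ++c.finished;
            else
                ++c.errored;
            c.total += duration;
        }
        // The sink is called after the lock is released, so its internal
        // locking never nests inside mutex_.
        if (ok)
            sink_.recordFinished(method, duration);
        else
            sink_.recordErrored(method);
    }

    std::uint64_t finishedCount(std::string const& method)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return counters_[method].finished;
    }

private:
    struct Counter
    {
        std::uint64_t finished = 0;
        std::uint64_t errored = 0;
        std::chrono::microseconds total{0};
    };

    SketchMetricsSink& sink_;
    std::mutex mutex_;
    std::unordered_map<std::string, Counter> counters_;
};
```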
Tempo /api/traces/{id} returns OTLP-shaped JSON with a top-level
"batches" key, not "data". The cross-check in check_log_correlation
was querying jq '.data | length', which always evaluated to 0 because
the "data" key is absent, causing the Log-Tempo cross-check to fail even
when the trace existed.
MockServiceRegistry in MetricsRegistry.cpp still used the old method
names (timeKeeper, cachedSLEs, validators, overlay, cluster, app, etc.)
while ServiceRegistry has been standardized on getXxx()/isXxx() forms.
Windows CI caught this as C3668 "did not override any base class methods"
errors and C2259 "cannot instantiate abstract class".
Rename the mismatched overrides to match the current interface:
timeKeeper -> getTimeKeeper
cachedSLEs -> getCachedSLEs
validators -> getValidators
validatorSites -> getValidatorSites
validatorManifests -> getValidatorManifests
publisherManifests -> getPublisherManifests
overlay -> getOverlay
cluster -> getCluster
peerReservations -> getPeerReservations
pendingSaves -> getPendingSaves
openLedger (x2) -> getOpenLedger
getPathRequests -> getPathRequestManager (type rename too)
journal -> getJournal
logs -> getLogs
trapTxID -> getTrapTxID
app -> getApp
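For context on why the stale names surfaced as compile errors, here is a
reduced sketch of the same situation. SketchServiceRegistry and
SketchMockRegistry are hypothetical; the real interface has many more
methods and different return types.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface using the getXxx() naming.
struct SketchServiceRegistry
{
    virtual ~SketchServiceRegistry() = default;
    virtual std::string getTimeKeeper() = 0;
    virtual std::string getOverlay() = 0;
};

struct SketchMockRegistry : SketchServiceRegistry
{
    // A stale name marked `override` is rejected by the compiler because
    // no base method has that name (C3668 on MSVC):
    //     std::string timeKeeper() override;
    // Without a matching override the pure virtual stays unimplemented,
    // so the mock is abstract and cannot be instantiated (C2259).

    std::string getTimeKeeper() override
    {
        return "timeKeeper";
    }
    std::string getOverlay() override
    {
        return "overlay";
    }
};
```

Keeping `override` on every mock method is what turns an interface rename
into a build failure instead of a silently unused function.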
Also regenerate levelization ordering.txt to reflect the new
tests.libxrpl -> xrpl.core edge introduced by ServiceRegistry.h include.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>