- `src/test/telemetry/MetricsRegistry_test.cpp` (Beast `unit_test::suite`
format under `src/test/`) duplicates the GTest version already
maintained at `src/tests/libxrpl/telemetry/MetricsRegistry.cpp`.
Project rule (`tasks/lessons.md` §Test Format): all new tests use
GTest under `src/tests/libxrpl/`. The GTest version exercises the
same four cases (disabled construction, start/stop lifecycle, recording
no-op, destructor-calls-stop). Deleting the Beast duplicate eliminates
drift and keeps the test authoritative in one place.
- Drop the matching `test.telemetry > xrpl.basics/xrpl.core/xrpld.telemetry`
entries from `.github/scripts/levelization/results/ordering.txt`
because `xrpl.test.telemetry` (the GTest binary) retains its own
entries; the removed ones belonged to the deleted Beast suite.
- `.claude/instructions.md` was committed as a symlink to an
author-local absolute path (`/home/pratik/sourceCode/personal/Rippled/
instructions.md`) that does not exist for any other contributor or in
CI. Remove the symlink from git tracking and add `.claude/` to
`.gitignore` so future agent commits do not re-add per-developer
settings.
MockServiceRegistry in MetricsRegistry.cpp still used the old method
names (timeKeeper, cachedSLEs, validators, overlay, cluster, app, etc.)
while ServiceRegistry has been standardized on getXxx()/isXxx() forms.
Windows CI caught this as C3668 "did not override any base class methods"
errors and C2259 "cannot instantiate abstract class".
Rename all 13 mismatched overrides to match the current interface:
timeKeeper -> getTimeKeeper
cachedSLEs -> getCachedSLEs
validators -> getValidators
validatorSites -> getValidatorSites
validatorManifests -> getValidatorManifests
publisherManifests -> getPublisherManifests
overlay -> getOverlay
cluster -> getCluster
peerReservations -> getPeerReservations
pendingSaves -> getPendingSaves
openLedger (x2) -> getOpenLedger
getPathRequests -> getPathRequestManager (type rename too)
journal -> getJournal
logs -> getLogs
trapTxID -> getTrapTxID
app -> getApp
Also regenerate levelization ordering.txt to reflect the new
tests.libxrpl -> xrpl.core edge introduced by ServiceRegistry.h include.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Clang-tidy fixes:
- Concatenate nested namespaces (modernize-concat-nested-namespaces)
in OTelCollector.h, OTelCollector.cpp, ValidationTracker.h/.cpp
- Add missing direct includes (misc-include-cleaner) in
ValidationTracker.cpp, test, CollectorManager.cpp, OTelCollector.cpp
- Make lock_guard variables const (misc-const-correctness)
- Add braces around single-line if/else (readability-braces-around-statements)
- Use designated initializer for WindowEvent (modernize-use-designated-initializers)
- Initialize LedgerEvent::seq field (cppcoreguidelines-pro-type-member-init)
Linker fix:
- Add ValidationTracker.cpp as source to xrpl.test.telemetry target
(it lives in src/xrpld/ but the test links against libxrpl only)
Levelization fix:
- Remove stale dependency edges from ordering.txt that were introduced
by the erroneous develop-merge commit
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts 259 files that carried unrelated upstream changes through the
phase-6 merge: enum class removals (cppcoreguidelines-use-enum-class),
scoped_lock→lock_guard conversions (modernize-use-scoped-lock),
nodestore Backend API changes (void const* key), .clang-tidy config,
test infrastructure deletions, and miscellaneous develop changes.
These changes belong on develop, not in the telemetry PR chain.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Revert reusable-build-test-config.yml to develop (action SHA update
and "Show test failure summary" step removal don't belong here)
- Revert upload-conan-deps.yml to develop (action SHA update)
- Revert features.macro: BatchInnerSigs and Batch back to Supported::no
(these feature flag changes are unrelated to telemetry)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prettier formatting for markdown docs and OTelCollector header
- docs.sh rippled→xrpld renames in OTelCollector.cpp comments/strings
- Updated levelization ordering with new dependency edges
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire trace context into P2P message flow so distributed traces
link across nodes. TX relay injects SpanGuard context via
PropagationHelpers.h; consensus propose/validate injects via
TraceContextPropagator.h. Receive-side extraction in PeerImp
creates child spans for proposals and validations.
- Add TraceBytes struct and SpanGuard::getTraceBytes() for
extracting raw trace context without OTel type dependencies
- Add PropagationHelpers.h: injectSpanContext(SpanGuard, proto)
- Add ConsensusReceiveTracing.h: proposalReceiveSpan(),
validationReceiveSpan() with parent context extraction
- NetworkOPs::apply(): inject tx.process context before relay
- RCLConsensus::propose()/validate(): inject active span context
- PeerImp: create receive spans for proposals and validations
with sender's trace context as parent
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 14 missing spans to runbook (6 TxQ + 8 consensus)
- Fix tx.receive attributes and config table in runbook
- Document dispute.resolve and tx.included span events
- Add spanmetrics dimensions for close_time_correct and tx.suppressed
- Fix Close Time Agreement and TX Receive vs Suppressed panel PromQL
- Wire $consensus_mode template variable to all consensus panels
- Add 10 Tempo search filters for operational attributes
- Apply rename script artifacts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move TxQSpanNames.h include to correct alphabetical position, update
levelization results for new xrpld.telemetry module dependencies,
and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add unknownCommand and wsUpgrade span name constants to RpcSpanNames.h,
fix SpanGuardFactory tests to use the 3-argument SpanGuard::span() API,
update levelization results, and apply rename script to docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix quorum attribute to use actual validator quorum instead of proposer
count, add missing ConsensusState::Expired handling in haveConsensus()
span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve
levelization cycle, remove unused constants, enrich proposal receive
span with sequence, and correct stale documentation references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement remaining Phase 4/4a consensus tracing tasks:
- Add consensus.phase.open span (open → closeLedger lifecycle)
- Add consensus.proposal.receive span in PeerImp with trusted attr
- Add consensus.validation.receive span in PeerImp with trusted/seq attrs
- Add tx_count attr on accept.apply, disputes_count on update_positions
- Add tx.included events with txId in doAccept transaction loop
- Enhance dispute.resolve event with yays/nays fields
- Add avalanche_threshold attr on update_positions span
- Reparent accept/accept.apply as children of round span via childSpan()
Also adds compile-time constants in ConsensusSpanNames.h and updates
the span hierarchy diagram.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add [[maybe_unused]] to the RAII span in processSession() — the
variable is not read but its lifetime scopes the active OTel context
for child spans created in processRequest()
- Regenerate levelization: remove premature xrpld.telemetry entries
that reference a module not yet present on this branch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- capture_timings.py: fail when captured/total ratio < 50%
(--min-capture-ratio). Prevents silent pass on unreachable Prometheus.
- run-full-validation.sh: set REGRESSION_EXIT=2 on capture failure so
the final exit code reflects it. Update exit code docs in header.
- compare_to_baseline.py: extract _skip_delta helper to bring
compute_delta under 80 lines. Fix 0.0-as-falsy bug in abs_bound
resolution (use explicit None check instead of `or`). Remove dead
variable override_prefix_key.
- prom_queries.py: extract _build_simple_entries and _build_job_entries
to bring build_query_plan under 80 lines. Fix module docstring return
type example. Use aiohttp.ClientTimeout instead of bare int.
- telemetry-validation.yml: add set -euo pipefail to regression summary
step; guard jq calls with -e flag and fallback; fail on missing
baseline file; emit ::warning annotation when timings.json missing.
- baselines/README.md: document the placeholder field.
Captures per-span / per-RPC / per-job timings from Prometheus after the
workload run and diffs them against a committed baseline. Regression
requires breaching both a percentage and an absolute bound, tolerating
small-value noise. When the baseline is a placeholder, the comparator
emits the captured JSON in the exact schema for one-time paste into
baselines/baseline-timings.json, and the CI Step Summary surfaces that
block for the reviewer.
Scope: gate only — automated baseline persistence, benchmark.sh
PromQL migration, and the historical trend dashboard remain follow-ups.