Fix the quorum attribute to use the actual validator quorum instead of the
proposer count, add missing ConsensusState::Expired handling to the
haveConsensus() span, move ConsensusSpanNames.h to xrpld/consensus/ to
resolve a levelization cycle, remove unused constants, enrich the proposal
receive span with a sequence attribute, and correct stale documentation
references.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Phase4_taskList.md and 06-implementation-phases.md to reflect
completed implementation of all remaining Phase 4/4a tasks (4.2-4.6,
4a.5, 4a.6, 4a.8). Update exit criteria and summary tables.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move Telemetry.h (associated header) to first include position in
Telemetry.cpp per the project's include-order convention. Trim
trailing whitespace from POC_taskList.md markdown table columns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate the existing StatsD metrics pipeline (beast::insight) into
the OpenTelemetry observability stack and add new trace spans for
ledger build/store/validate and peer proposal/validation receive.
Phase 5b — Ledger, peer, and transaction spans:
- Add ledger.build span with close time attributes in BuildLedger.cpp
- Add tx.apply span with tx_count/tx_failed in BuildLedger.cpp
- Add ledger.store and ledger.validate spans in LedgerMaster.cpp
- Add peer.proposal.receive span with trusted attribute in PeerImp.cpp
- Add peer.validation.receive span with ledger_hash, full, and trusted
attributes in PeerImp.cpp
- Add ledger-operations and peer-network Grafana dashboards
Phase 6 — StatsD metrics integration:
- Add StatsD UDP receiver (port 8125) to OTel Collector
- Add 5 StatsD Grafana dashboards: node health, network traffic,
overlay traffic detail, ledger data sync, RPC pathfinding
- Add 09-data-collection-reference.md cataloging all metrics/spans
- Update existing dashboards with new span panels
- Expand telemetry runbook and integration test script
- Add codecov exclusions for telemetry modules
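For reference, the Collector's StatsD receiver parses plain-text UDP datagrams in the standard StatsD line format, `<name>:<value>|<type>`, where `g` marks a gauge and `c` a counter. The sketch below builds such lines. The `formatStatsD` helper and the `xrpld` prefix are hypothetical and are not taken from beast::insight, whose exact output should be checked against its StatsD collector.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Metric types used in this sketch: StatsD "g" (gauge) and "c" (counter).
enum class StatsDType { Gauge, Counter };

// Hypothetical helper: builds one StatsD line, "<prefix>.<name>:<value>|<type>".
// An empty prefix yields "<name>:<value>|<type>" with no leading dot.
std::string
formatStatsD(std::string_view prefix, std::string_view name,
             std::int64_t value, StatsDType type)
{
    std::string line;
    if (!prefix.empty())
    {
        line.append(prefix);
        line.push_back('.');
    }
    line.append(name);
    line.push_back(':');
    line.append(std::to_string(value));
    line.append(type == StatsDType::Gauge ? "|g" : "|c");
    return line;
}
```

Each line would be sent as a UDP datagram to port 8125 on the Collector host.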
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark Tasks 5.3 (alert definitions) and 5.6 (training materials) as
"Deferred — post-MVP" in the implementation phases document to
accurately reflect current delivery scope. Add status column to the
Phase 5 task table.
Also fix a stale reference to XRPL_TRACE_* macros in the Phase 4a
section; the implementation uses SpanGuard factory methods instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record the close time voting threshold and consensus state on
consensus.update_positions and consensus.check spans:
- xrpl.consensus.close_time_threshold: the avCT_CONSENSUS_PCT (75%)
threshold required for close time agreement
- xrpl.consensus.have_close_time_consensus: whether validators
reached close time consensus in this iteration
These attributes enable dashboards to show how the close time
voting process converges (or stalls) across consensus iterations.
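How the percentage threshold becomes a vote count is easiest to see with numbers. The sketch below is modeled on the rounding used by the XRPL consensus code's `participantsNeeded()` helper (treat the exact formula as an assumption to verify against Consensus.h). `haveCloseTimeConsensus()` and its vote map are hypothetical simplifications, not the real updateOurPositions() logic.

```cpp
#include <map>

// Number of participants needed to meet `percent` percent, rounded to the
// nearest integer and never less than 1 (modeled on participantsNeeded()).
int
participantsNeeded(int participants, int percent)
{
    int const result = (participants * percent + percent / 2) / 100;
    return result == 0 ? 1 : result;
}

// Hypothetical check: close-time consensus holds when any single rounded
// close time has at least the required number of votes.
// `votes` maps a rounded close time (seconds) to its vote count.
bool
haveCloseTimeConsensus(std::map<long, int> const& votes, int participants,
                       int thresholdPct = 75)
{
    int const needed = participantsNeeded(participants, thresholdPct);
    for (auto const& entry : votes)
        if (entry.second >= needed)
            return true;
    return false;
}
```

With this rounding, 7 of 10 participants satisfy the nominal 75% threshold, since (10 × 75 + 37) / 100 = 7 in integer arithmetic. Recording the threshold next to the boolean lets a dashboard show how close a stalled iteration came.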
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.
Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:
- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
(consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
(used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
from consensus thread to jtACCEPT worker thread
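The property that makes the captureContext() hand-off thread-safe is that the consensus thread passes a copy of the span context, not the live span, to the jtACCEPT worker. A minimal sketch of that hand-off, using hypothetical `TraceContext` and `ChildSpan` types and a plain std::thread in place of the JobQueue (none of these are the SpanGuard API):

```cpp
#include <array>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for a captured context: plain values with
// W3C-sized IDs, so copying it shares no state with the originating span.
struct TraceContext
{
    std::array<std::uint8_t, 16> traceId{};
    std::array<std::uint8_t, 8> spanId{};
};

// Hypothetical record of a span started on the worker thread.
struct ChildSpan
{
    std::array<std::uint8_t, 16> traceId{};
    std::array<std::uint8_t, 8> parentSpanId{};
};

// Runs a job on a separate thread with its own copy of `ctx` and waits for
// it; stands in for posting work to a jtACCEPT worker.
ChildSpan
runOnWorker(TraceContext const& ctx)
{
    ChildSpan child;
    std::thread worker([ctx, &child] {
        // The worker reads only its private copy of the context.
        child.traceId = ctx.traceId;
        child.parentSpanId = ctx.spanId;
    });
    worker.join();  // join() also makes the writes to `child` visible here
    return child;
}
```

The child span ends up in the same trace as the round, parented to the captured span, even though it was started on another thread.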
Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add trace_id = txHash[0:16] strategy so all nodes handling the same
transaction independently produce spans under the same trace_id,
combined with protobuf span_id propagation for parent-child ordering.
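A minimal sketch of the derivation, assuming `txHash[0:16]` means the first 16 bytes of the 32-byte transaction hash (16 bytes is the size of an OpenTelemetry trace ID). The type aliases and `traceIdFromHash()` are illustrative, not the actual helper.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

using Hash256 = std::array<std::uint8_t, 32>;  // e.g. a transaction hash
using TraceId = std::array<std::uint8_t, 16>;  // OpenTelemetry trace ID size

// Takes the first 16 bytes of the hash as the trace ID, so every node that
// sees the same transaction computes the same trace ID independently.
TraceId
traceIdFromHash(Hash256 const& hash)
{
    TraceId id{};
    std::copy_n(hash.begin(), id.size(), id.begin());
    return id;
}
```

Because the mapping depends only on the hash, nodes need no coordination; the protobuf-propagated span_id then supplies parent-child links inside the shared trace. W3C Trace Context treats an all-zero trace ID as invalid, but a transaction hash whose first 16 bytes are all zero is vanishingly unlikely.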
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to old XRPL_TRACE_TX/CONSENSUS macros with
SpanGuard::span(TraceCategory, ...) factory calls introduced in Phase 1c.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds xrpl.peer.version attribute to tx.receive spans for version-mismatch
correlation during network upgrades.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move node health attribute strings to compile-time constants in
SpanNames.h (attr::nodeAmendmentBlocked, attr::nodeServerState)
- Add Tempo search filters for node health attributes
- Remove unnecessary .c_str() on strOperatingMode() return
- Add samplingRatio clamping test (values > 1.0 and < 0.0)
- Fix Task 2.3 status: delivered in Phase 1c, not Phase 2
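The clamping behavior covered by the new test fits in a few lines. This is a hypothetical helper, not the project's config parser, and it assumes out-of-range values are clamped rather than rejected, as the test description suggests.

```cpp
#include <algorithm>

// Hypothetical: clamps a configured samplingRatio into [0.0, 1.0].
// Values above 1.0 become 1.0 (sample everything); values below 0.0
// become 0.0 (sample nothing).
double
clampSamplingRatio(double ratio)
{
    return std::clamp(ratio, 0.0, 1.0);
}
```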
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mark deferred tasks (2.1→Phase 3, 2.5→low priority) with rationale.
Mark superseded tasks (2.2→Phase 1c SpanGuard factory). Add Task 2.7
for Grafana search filters. Update summary table with status column.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces task list documents for Phases 2 through 5, with Tempo
references (replacing Jaeger) and Task 2.8 dashboard parity spec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace references to the non-existent TracingInstrumentation.h with
the SpanGuard.cpp pimpl implementation that actually exists on this branch.
Update conditional compilation section to describe the pimpl approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types
from public headers. Add global Telemetry accessor so SpanGuard factory
methods work without explicit Telemetry references. Add child/linked
span creation and cross-thread context propagation. Update plan docs
to reflect macro removal in favor of SpanGuard factory pattern.
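With the pimpl idiom, callers only ever see a forward-declared implementation type, so no OpenTelemetry header leaks through the public interface. A minimal sketch of that shape, assuming a hypothetical `ScopedSpan` class and a `setTelemetryEnabled()` switch in place of the real SpanGuard and global Telemetry accessor; comments mark what would live in the header versus the .cpp.

```cpp
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// ---- header part: no tracing-library types appear here ----
class ScopedSpan
{
public:
    // Factory: returns an inactive guard when telemetry is disabled.
    static ScopedSpan
    start(std::string_view name);

    ScopedSpan(ScopedSpan&&) noexcept;
    ScopedSpan& operator=(ScopedSpan&&) noexcept;
    ~ScopedSpan();  // defined where Impl is a complete type

    bool
    active() const noexcept { return impl_ != nullptr; }

private:
    struct Impl;  // only forward-declared in the header
    explicit ScopedSpan(std::unique_ptr<Impl> impl);
    std::unique_ptr<Impl> impl_;
};

// ---- source part: the only place that would include SDK headers ----
struct ScopedSpan::Impl
{
    std::string name;  // a real Impl would hold the SDK span handle
};

namespace {
bool gTelemetryEnabled = false;  // hypothetical global telemetry switch
}  // namespace

void
setTelemetryEnabled(bool enabled)
{
    gTelemetryEnabled = enabled;
}

ScopedSpan
ScopedSpan::start(std::string_view name)
{
    if (!gTelemetryEnabled)
        return ScopedSpan{nullptr};  // no-op guard, nothing allocated
    return ScopedSpan{std::make_unique<Impl>(Impl{std::string(name)})};
}

ScopedSpan::ScopedSpan(std::unique_ptr<Impl> impl) : impl_(std::move(impl)) {}
ScopedSpan::ScopedSpan(ScopedSpan&&) noexcept = default;
ScopedSpan& ScopedSpan::operator=(ScopedSpan&&) noexcept = default;
ScopedSpan::~ScopedSpan() = default;  // a real Impl would end the span here
```

Callers never need a Telemetry reference: the factory consults the global state and hands back either a live or an inert guard.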
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add DiscardFlag.h and FilteringSpanProcessor references to the file
tree, key files table, and implementation summary in OpenTelemetryPlan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate otlp/tempo exporter block, duplicate tempo service
definition, and jaeger dependency from docker-compose example.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run .github/scripts/rename/docs.sh to replace rippled → xrpld
references in all plan documentation files, fixing the check-rename
CI failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sub-task 7.10a: Per-Validator Validation Count (Flag Ledger Window)
to the Phase 7 task list. This metric tracks how many of the last 256
ledgers each UNL validator has validated — the key participation metric
for UNL health monitoring.
Implementation plan:
- Observable gauge rippled_validator_participation with validator label
- Data from RCLValidations::getTrustedForLedger() over 256-ledger window
- Emitted at flag ledger boundaries (~15 min interval)
- Grafana table panel with threshold coloring (green/yellow/red)
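RCLValidations::getTrustedForLedger() is the planned data source. The sketch below replaces it with a hypothetical in-memory map so the windowing logic stands alone. It assumes flag ledgers are sequences divisible by 256 and that the window is the 256 ledgers ending at (and including) the flag ledger; `participation()` and `ValidationsBySeq` are illustrative names.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

constexpr std::uint32_t kFlagLedgerInterval = 256;

// Flag ledgers are those whose sequence is a multiple of 256.
bool
isFlagLedger(std::uint32_t seq)
{
    return seq % kFlagLedgerInterval == 0;
}

// Hypothetical input: ledger sequence -> trusted validators that validated it.
using ValidationsBySeq = std::map<std::uint32_t, std::set<std::string>>;

// For each UNL validator, counts how many ledgers in the window ending at
// `flagSeq` it validated. Silent validators still appear with a count of 0.
std::map<std::string, std::uint32_t>
participation(ValidationsBySeq const& vals, std::set<std::string> const& unl,
              std::uint32_t flagSeq)
{
    std::map<std::string, std::uint32_t> counts;
    for (auto const& v : unl)
        counts[v] = 0;
    std::uint32_t const first =
        flagSeq >= kFlagLedgerInterval ? flagSeq - kFlagLedgerInterval + 1 : 1;
    for (auto it = vals.lower_bound(first);
         it != vals.end() && it->first <= flagSeq; ++it)
        for (auto const& v : it->second)
            if (auto c = counts.find(v); c != counts.end())
                ++c->second;
    return counts;
}
```

At a typical close interval of roughly 3.5 seconds, 256 ledgers take about 15 minutes, which matches the emission interval in the plan.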
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plan documents referenced Application.h and app_ for getTelemetry()
but the codebase now uses ServiceRegistry as the interface. Updated:
- 05-configuration-reference.md: getTelemetry() on ServiceRegistry,
deferred serviceInstanceId pattern in ApplicationImp
- POC_taskList.md Task 4: target ServiceRegistry.h not Application.h,
correct config file path and constructor pattern
- 04-code-samples.md: fix overlay() -> getOverlay(), rewrite JobQueue
sample to reflect actual architecture (no app_ member)
- 03-implementation-strategy.md: fix file impact table path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip effort/risk columns from task tables and remove the §6.9 Effort
Summary section with its pie chart and resource requirements table.
Renumber §6.10 Quick Wins → §6.9.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split document index into Plan Documents and Task Lists sections.
These files were introduced in this branch but missing from the index.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds ValidationTracker (agreement computation with an 8s grace period);
validator health, peer quality, ledger economy, state tracking, and
storage detail gauges; 7 synchronous counters; and an agreement gauge.
29 new metrics covering validation agreement, peer quality, UNL health,
ledger economy, state tracking, and upgrade awareness.
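The commit does not spell out how the grace period is applied. One plausible reading, sketched below as a hypothetical `AgreementSketch` class (not the actual ValidationTracker), is that a ledger is scored as agreed, disagreed, or missed only once 8 seconds have passed since our own validation, so late but valid validations are not counted as misses.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical grace-period agreement accounting for one tracked validator.
class AgreementSketch
{
public:
    explicit AgreementSketch(std::chrono::seconds grace = std::chrono::seconds{8})
        : grace_(grace) {}

    // Our node validated `seq` with `hash` at time `now`.
    void ourValidation(std::uint32_t seq, std::string hash, Clock::time_point now)
    {
        pending_[seq] = Pending{std::move(hash), now, std::nullopt};
    }

    // The tracked validator's validation for `seq` arrived. In this sketch,
    // validations that arrive before ours are simply ignored.
    void theirValidation(std::uint32_t seq, std::string const& hash)
    {
        if (auto it = pending_.find(seq); it != pending_.end())
            it->second.theirs = hash;
    }

    // Scores every ledger whose grace period has fully elapsed.
    void sweep(Clock::time_point now)
    {
        for (auto it = pending_.begin(); it != pending_.end();)
        {
            if (now - it->second.at < grace_) { ++it; continue; }
            if (!it->second.theirs) ++missed_;
            else if (*it->second.theirs == it->second.ours) ++agreed_;
            else ++disagreed_;
            it = pending_.erase(it);
        }
    }

    // Fraction of scored ledgers that agreed; 1.0 before any are scored
    // (an assumption chosen for this sketch).
    double agreement() const
    {
        auto const total = agreed_ + disagreed_ + missed_;
        return total == 0 ? 1.0 : static_cast<double>(agreed_) / total;
    }

private:
    struct Pending
    {
        std::string ours;
        Clock::time_point at;
        std::optional<std::string> theirs;
    };
    std::chrono::seconds grace_;
    std::map<std::uint32_t, Pending> pending_;
    std::uint64_t agreed_ = 0, disagreed_ = 0, missed_ = 0;
};
```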
Part of the external dashboard parity initiative across phases 2-11.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bridge the existing beast::insight gauge for resource-limit peer
disconnects (peerDisconnectsCharges_) into the StatsD metric inventory.
Part of the external dashboard parity initiative.
See docs/superpowers/specs/2026-03-30-external-dashboard-parity-design.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>