Phase 9 introduces the ledger.acquire span (InboundLedger fetch) that phases 7-8
do not have, so the forward-merged 09-data-collection-reference inventory is
extended here:
- §1.1: add ledger.acquire to the Ledger span table.
- §1.2: add its attributes (acquire_reason, timeouts, peer_count, outcome) and
note it also sets ledger_seq; bump the span count.
Also fix two stale StatsD metric references in the Peer Quality dashboard
(xrpld-peer-quality.json): rippled_Peer_Finder_Active_{Inbound,Outbound}_Peers
-> xrpld_Peer_Finder_* to match the xrpld_ metric prefix the rest of the stack uses.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The §1 span and attribute inventory had regressed to an older 16-span snapshot
that uses the pre-2026-05-13 dotted attribute keys, while phase-7's code emits
~36 spans with bare/underscore attribute keys. The §Data Flow Overview and §2
System Metrics sections (native OTLP transport — phase-7's migration) were already
correct and are left unchanged.
- §1.1: expand the span inventory to the full surface — add gRPC (grpc.<MethodName>),
TxQ (txq.*), PathFind (pathfind.*), and the full consensus set (round/phase.open/
establish/update_positions/check/mode_change/proposal.receive/validation.receive).
Fix the phantom rpc.request -> rpc.http_request, add rpc.ws_upgrade. No grpc.request,
no pathfind.rank, no ledger.acquire (the latter is added in phase-9, not yet present here).
- §1.2: convert every span-attribute key from dotted xrpl.<domain>.<field> to the
bare/underscore form. The sole span-attr dotted exception is xrpl.ledger.hash on
peer.validation.receive (shared constant); consensus.validation.send uses bare
ledger_hash. Resource attrs xrpl.network.id/type stay dotted. Fix tx_count/tx_failed
placement (on tx.apply, not ledger.build). Add attribute tables for the new families.
- §1.3: list the full set of spanmetrics dimension labels (bare keys, from the collector
config) instead of the stale xrpl_rpc_command-style names.
- §4/§5: convert Tempo TraceQL and PromQL examples to the bare attribute/label forms.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the two previously-registered-but-never-incremented validation
counters to ValidationTracker's gross lifetime tallies, exported as
monotonic ObservableCounters. New gross atomics count each ledger once at
first classification and are never adjusted on late repair, keeping the
_total counters monotonic and additive (agreements_total + missed_total ==
ledgers reconciled); the repair-aware windowed view stays on the existing
xrpld_validation_agreement gauge. The validator-health dashboard panels
that already query these names now render data instead of "No data".
Also de-stale 09-data-collection-reference.md: §5b documented flat metric
names (xrpld_cache_SLE_hit_rate, ...) that the code never emits — it emits
labeled gauges (xrpld_cache_metrics{metric="SLE_hit_rate"}). Replace the
stale flat-name tables with a pointer to the canonical labeled section,
reconcile the contradictory headline counts, and correct xrpld_job_count
to its real exported name xrpld_jobq_job_count.
Adds two GTests asserting gross tallies stay frozen on repair while net
totals move, plus the additive invariant.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim,
tx.transactor) added on phase-3 through the observability stack so the
spanmetrics connector produces per-stage RED metrics without any native
instruments.
- collector: add the `stage` dimension to the spanmetrics connector so
the three stages split into separate metric series (3 bounded values).
- dashboard: add a "Tx Apply Pipeline" section to transaction-overview
with rate, p95 latency, and failure-rate panels grouped by stage, plus
a `stage` template variable. Panels follow the existing config (node
filter, exported_instance legends, Title Case, axis labels).
- The failure panel filters ter_result != tesSUCCESS rather than span
status, because a failing ter code completes the span normally — only
thrown exceptions set an error status. This matches the existing
"Transaction Results by Type" panel convention.
- docs: document the spans, attributes, and stage dimension in the data
collection reference and runbook, including the sampling caveat that
span-derived metrics inherit tracer head-sampling and undercount at
sampling_ratio < 1.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolve runbook conflict: keep both phase 6 ledger/peer span tables
AND new insights/sample queries section from the enrichment work.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two stray "rippled" tokens introduced by 43258e8d ("docs(telemetry):
add secure-OTel pipeline analysis…") were caught by check-rename in
CI. Re-run docs.sh to convert them to xrpld so the rename check
passes on PR #6425 (and downstream PR #6426 once merged up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document the threat model and chosen hardening approach for the OTel
pipeline: mTLS to the collector as primary defense (across-network
deployment), NetworkPolicy as defense-in-depth, and source-side
validation plus per-peer rate limiting for protocol::TraceContext on
peer messages. Skips Basic Auth (wrong shape for multi-operator
fleet) and HTTP-gateway header stripping (rippled is P2P).
Wires the new doc into the master plan ToC, mermaid diagram, and
body section, plus cross-refs from the privacy section in
02-design-decisions.md and the collector config in
05-configuration-reference.md so readers reach it from natural
in-context entry points. Adds a backlink at the top of secure-OTel.md
to the master plan.
Adds 'exfiltration' and 'htpasswd' to cspell dictionary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>