Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim,
tx.transactor) added on phase-3 through the observability stack so the
spanmetrics connector produces per-stage RED metrics without any native
instruments.
- collector: add the `stage` dimension to the spanmetrics connector so
the three stages split into separate metric series (3 bounded values).
- dashboard: add a "Tx Apply Pipeline" section to transaction-overview
with rate, p95 latency, and failure-rate panels grouped by stage, plus
a `stage` template variable. Panels follow the existing config (node
filter, exported_instance legends, Title Case, axis labels).
- The failure panel filters ter_result != tesSUCCESS rather than span
status, because a failing ter code completes the span normally; only
thrown exceptions set an error status. This matches the existing
"Transaction Results by Type" panel convention.
- docs: document the spans, attributes, and stage dimension in the data
collection reference and runbook, including the sampling caveat that
span-derived metrics inherit tracer head-sampling and undercount at
sampling_ratio < 1.
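As a rough illustration of the failure-panel rule and the sampling caveat above, the C++ sketch below counts spans per stage, treats any ter_result other than tesSUCCESS as a failure regardless of span status, and scales an observed count by the head-sampling ratio. StageSpan, StageCounts, countByStage, estimateTrueCount, and the stage values used are hypothetical names for illustration only; they are not taken from the collector config or the codebase, and the division by sampling_ratio is only an estimate.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one finished apply-pipeline stage span.
struct StageSpan
{
    std::string stage;      // assumed value of the `stage` dimension
    std::string terResult;  // value of the ter_result attribute
    bool errorStatus;       // true only when a thrown exception set an error status
};

struct StageCounts
{
    long total = 0;
    long failed = 0;
};

// Count spans per stage. A span is a failure when ter_result is anything
// other than tesSUCCESS; errorStatus is deliberately ignored, because a
// failing ter code still completes the span normally.
std::map<std::string, StageCounts>
countByStage(std::vector<StageSpan> const& spans)
{
    std::map<std::string, StageCounts> out;
    for (auto const& s : spans)
    {
        auto& c = out[s.stage];
        ++c.total;
        if (s.terResult != "tesSUCCESS")
            ++c.failed;
    }
    return out;
}

// Span-derived counts inherit head sampling, so they undercount when
// samplingRatio < 1. Dividing by the ratio gives a rough estimate only.
double
estimateTrueCount(long observed, double samplingRatio)
{
    assert(samplingRatio > 0.0 && samplingRatio <= 1.0);
    return observed / samplingRatio;
}
```

With a sampling ratio of 0.5, for example, 50 observed spans would suggest roughly 100 real executions of that stage.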

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Resolve runbook conflict: keep both phase 6 ledger/peer span tables
AND new insights/sample queries section from the enrichment work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two stray "rippled" tokens introduced by 43258e8d ("docs(telemetry):
add secure-OTel pipeline analysis…") were caught by check-rename in
CI. Re-run docs.sh to convert them to xrpld so the rename check
passes on PR #6425 (and downstream PR #6426 once merged up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Document the threat model and chosen hardening approach for the OTel
pipeline: mTLS to the collector as primary defense (across-network
deployment), NetworkPolicy as defense-in-depth, and source-side
validation plus per-peer rate limiting for protocol::TraceContext on
peer messages. Skips Basic Auth (a poor fit for a multi-operator
fleet) and HTTP-gateway header stripping (rippled is P2P).
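
A minimal sketch of what source-side validation and per-peer rate limiting could look like, assuming the peer message carries W3C-style identifiers (a 16-byte trace id and an 8-byte span id, neither all zero). TraceContextFields, isValidTraceContext, and PeerTraceLimiter are hypothetical names; the real fields of protocol::TraceContext and the limits actually chosen are not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the fields carried by protocol::TraceContext.
struct TraceContextFields
{
    std::string traceId;  // raw bytes, assumed 16 bytes long
    std::string spanId;   // raw bytes, assumed 8 bytes long
};

inline bool
allZero(std::string const& bytes)
{
    return std::all_of(
        bytes.begin(), bytes.end(), [](char c) { return c == 0; });
}

// Reject contexts with the wrong length or all-zero ids, following the
// W3C Trace Context validity rules.
inline bool
isValidTraceContext(TraceContextFields const& tc)
{
    return tc.traceId.size() == 16 && tc.spanId.size() == 8 &&
        !allZero(tc.traceId) && !allZero(tc.spanId);
}

// Fixed-window limiter: at most maxPerWindow accepted contexts per peer in
// each window. Assumes nowSeconds never goes backwards.
class PeerTraceLimiter
{
    struct Window
    {
        std::uint64_t start = 0;
        std::uint32_t count = 0;
    };

    std::uint32_t maxPerWindow_;
    std::uint64_t windowSeconds_;
    std::unordered_map<std::uint32_t, Window> windows_;

public:
    PeerTraceLimiter(std::uint32_t maxPerWindow, std::uint64_t windowSeconds)
        : maxPerWindow_(maxPerWindow), windowSeconds_(windowSeconds)
    {
        assert(windowSeconds > 0);
    }

    bool
    allow(std::uint32_t peerId, std::uint64_t nowSeconds)
    {
        auto& w = windows_[peerId];
        if (nowSeconds - w.start >= windowSeconds_)
        {
            // Window expired: start a fresh one for this peer.
            w.start = nowSeconds;
            w.count = 0;
        }
        if (w.count >= maxPerWindow_)
            return false;
        ++w.count;
        return true;
    }
};
```

A receiver would adopt an incoming context only when both isValidTraceContext and allow return true, and otherwise start a fresh local trace.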
Wires the new doc into the master plan ToC, mermaid diagram, and
body section, plus cross-refs from the privacy section in
02-design-decisions.md and the collector config in
05-configuration-reference.md so readers reach it from natural
in-context entry points. Adds a backlink at the top of secure-OTel.md
to the master plan.
Adds 'exfiltration' and 'htpasswd' to cspell dictionary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Drop xrpl.node.amendment_blocked / xrpl.node.server_state from the
telemetry surface (constants in SpanNames.h, two filters in tempo.yaml).
Operators read the same data via the server_info / server_state RPCs; OTel
SDK 1.18.0 cannot refresh resource attrs at runtime, so resource-level
emission was not viable either.
- Namespace all pathfind span attributes under pathfind_* (underscore form
per Phase 1c rule 5). Renames in PathFindSpanNames.h and call sites in
PathRequest.cpp, PathRequestManager.cpp, plus the rule-5 retention
xrpl.pathfind.ledger_index -> pathfind_ledger_index.
- Wire pathfind_source_account / pathfind_dest_account on pathfind.request
in the doPathFind / doRipplePathFind handlers (only when the field is
present and is a string).
- Collapse per-asset pathfind.discover / pathfind.rank spans into one
pathfind.discover hoisted around the per-source-asset loop in
PathRequest::findPaths. Span count goes from 2N to 1 per RPC call;
per-asset breakdown traded for bounded storage and cardinality. Trade-off
documented inline.
- Fix pathfind_num_paths semantics: now sums getBestPaths().size() across
the loop (paths actually returned) instead of the maxPaths input cap.
- PathRequestManager::updateAll: move span creation after the locked
requests_ snapshot, early-return when no active subscriptions exist
(avoids empty span on every ledger close), set pathfind_num_requests
= requests.size().
- Update Phase2_taskList.md and 02-design-decisions.md to match.
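
The span collapse, the pathfind_num_paths fix, and the updateAll early return described above can be sketched as follows. This is an illustrative model only: RecordedSpan, findPathsSketch, and updateAllSketch are hypothetical, the "pathfind.update_all" span name is invented for the example, and a plain struct stands in for the real span type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an emitted span: a name plus numeric attributes.
struct RecordedSpan
{
    std::string name;
    std::map<std::string, std::size_t> attrs;
};

// One pathfind.discover span hoisted around the per-source-asset loop.
// Each inner vector models what getBestPaths() returned for one asset.
std::vector<RecordedSpan>
findPathsSketch(std::vector<std::vector<std::string>> const& bestPathsPerAsset)
{
    std::vector<RecordedSpan> spans;
    RecordedSpan discover{"pathfind.discover", {}};

    std::size_t numPaths = 0;
    for (auto const& best : bestPathsPerAsset)
        numPaths += best.size();  // paths actually returned, not the input cap

    discover.attrs["pathfind_num_paths"] = numPaths;
    spans.push_back(discover);

    // One span per call, independent of the number of source assets.
    assert(spans.size() == 1);
    return spans;
}

// Span creation happens after the snapshot is taken; with no active
// subscriptions the function returns early and emits nothing.
std::vector<RecordedSpan>
updateAllSketch(std::size_t numActiveRequests)
{
    if (numActiveRequests == 0)
        return {};

    RecordedSpan span{"pathfind.update_all", {}};  // hypothetical span name
    span.attrs["pathfind_num_requests"] = numActiveRequests;
    return {span};
}
```

The cost of the hoisted span is the one noted in the commit: the per-asset breakdown is no longer visible in the trace.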

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>