Two stray "rippled" tokens introduced by 43258e8d ("docs(telemetry):
add secure-OTel pipeline analysis…") were caught by check-rename in
CI. Re-run docs.sh to convert them to xrpld so the rename check
passes on PR #6425 (and downstream PR #6426 once merged up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in pratik/otel-phase3-tx-tracing's two commits that move all
consensus-tracing content off phase-3:
- c9521b97fe refactor(telemetry): pull consensus-tracing scope-leak
out of phase-3
- d6b101069e refactor(telemetry): remove consensus tracing from
phase-3
Phase-4 already owns this content (with the consensus_span -> cons_span
namespace rename, round/accept/proposalSend/validationSend span
construction, and the relocated ConsensusSpanNames.h under
src/xrpld/consensus/), plus its own evolution on top — so the merge is
resolved as a tree-identity merge:
- src/xrpld/app/consensus/ConsensusSpanNames.h: keep phase-4's
renamed/expanded version (modify/delete conflict, resolved with
--ours).
- src/xrpld/app/consensus/RCLConsensus.cpp: keep phase-4's version
with cons_span attribute calls, the trace_context inject blocks
on broadcastPropose/validate, the secure-OTel TODO, and the
full validation/round span instrumentation (content conflict,
resolved with --ours).
- src/xrpld/overlay/detail/PeerImp.cpp: keep phase-4's version with
the proposalReceiveSpan/validationReceiveSpan calls, lambda span
captures, and cons_span::attr::* setAttribute calls (content
conflict, resolved with --ours).
- src/xrpld/telemetry/ConsensusReceiveTracing.h: phase-3 deleted it,
phase-4 still uses it. Restored from phase-4 HEAD (silent
auto-deletion otherwise).
- include/xrpl/telemetry/TraceContextPropagator.h: phase-3 stripped
consensus references and the secure-OTel TODO; phase-4 still has
both. Restored from phase-4 HEAD.
- src/xrpld/telemetry/PropagationHelpers.h: phase-3 swapped the
@see ConsensusReceiveTracing.h cross-reference for TxTracing.h;
phase-4 still wants the consensus reference. Restored from
phase-4 HEAD.
Net tree change on phase-4: zero. Verified via 'git diff <pre-merge-sha>
HEAD' returning empty.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Phase-3 (PR #6425) is scoped to transaction tracing only; consensus
tracing belongs to phase-4 (PR #6426). The previous commit on this
branch removed the namespace/attribute scaffolding c6c019ed8b leaked
into phase-3, but phase-3 still carried the consensus span construction
and trace-context propagation introduced in earlier commits
(61cb1faf8f, 93bed03d8d). Move that out too so phase-3 creates and
propagates no consensus spans of any kind.
Removed:
- src/xrpld/telemetry/ConsensusReceiveTracing.h (deleted; phase-4
owns it).
- PeerImp.cpp: remove the std::make_shared<SpanGuard>(
proposalReceiveSpan(...))/validationReceiveSpan(...) constructions
in onMessage(TMProposeSet)/onMessage(TMValidation), drop the
sp = std::move(span) lambda captures, and drop the
#include <xrpld/telemetry/ConsensusReceiveTracing.h>.
- RCLConsensus.cpp: drop the two telemetry::injectToProtobuf() blocks
that injected the active trace context into TMProposeSet (in
Adaptor::propose, after addSuppression) and TMValidation (in
Adaptor::validate, around the broadcast call). Drop the now-unused
#include of TraceContextPropagator.h and the
XRPL_ENABLE_TELEMETRY-gated include of
opentelemetry/context/runtime_context.h.
- TraceContextPropagator.h: update file-level @see comment to drop
the ConsensusReceiveTracing.h reference and to scope the
"wired into the P2P message flow via PropagationHelpers.h"
sentence to TMTransaction only.
- PropagationHelpers.h: replace the
@see ConsensusReceiveTracing.h cross-reference with
@see TxTracing.h.
Inert consensus metadata (TraceCategory::Consensus enum value,
seg::consensus constant, isCategoryEnabled/categoryToSpanKind switch
arms, the SpanGuard.h doc-comment example) is intentionally preserved
on phase-3: nothing references it after this commit, but phase-4
needs it and removing it would widen the phase-3 -> phase-4 merge
surface for no benefit.
Verified via git grep: no remaining phase-3 references to
proposalReceiveSpan, validationReceiveSpan, ConsensusReceiveTracing,
consensus_span::, consensus.proposal, or consensus.validation.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Commit c6c019ed8b ("addressed code review comments") bundled tx-tracing
review fixes with consensus-tracing scaffolding that belongs on
pratik/otel-phase4-consensus-tracing (PR #6426). This commit lifts the
consensus-only parts back out of phase-3 so PR #6425 stays scoped to
transaction tracing. Phase-4 already carries the same content (via prior
phase-3 -> phase-4 merges) plus its own evolution on top, so nothing is
moved across — only removed here.
Removed:
- src/xrpld/app/consensus/ConsensusSpanNames.h (deleted; phase-4 owns
it).
- PeerImp.cpp: revert onMessage(TMProposeSet)/onMessage(TMValidation)
consensus-attr setAttribute calls to the hardcoded
"xrpl.consensus.{trusted,round,ledger.seq}" strings used before
c6c019ed8b. Drop the now-unused
#include <xrpld/app/consensus/ConsensusSpanNames.h>.
- RCLConsensus::Adaptor::validate(): remove the
TODO(observability/secure-OTel) block on validation trace_context.
- TraceContextPropagator.h: remove the TODO(observability/secure-OTel)
block on injectToProtobuf().
Tx-tracing parts of c6c019ed8b are intentionally untouched.
No phase-3 caller of telemetry::consensus_span:: remains; verified via
git grep. No test on phase-3 references the removed header.
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Document the threat model and chosen hardening approach for the OTel
pipeline: mTLS to the collector as primary defense (across-network
deployment), NetworkPolicy as defense-in-depth, and source-side
validation plus per-peer rate limiting for protocol::TraceContext on
peer messages. Skips Basic Auth (wrong shape for multi-operator
fleet) and HTTP-gateway header stripping (rippled is P2P).
Wires the new doc into the master plan ToC, mermaid diagram, and
body section, plus cross-refs from the privacy section in
02-design-decisions.md and the collector config in
05-configuration-reference.md so readers reach it from natural
in-context entry points. Adds a backlink at the top of secure-OTel.md
to the master plan.
Adds 'exfiltration' and 'htpasswd' to cspell dictionary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop xrpl.node.amendment_blocked / xrpl.node.server_state from telemetry
surface (constants in SpanNames.h, two filters in tempo.yaml). Operators
read the same data via server_info / server_state RPC; OTel SDK 1.18.0
cannot refresh resource attrs at runtime so resource-level emission was
not viable either.
- Namespace all pathfind span attributes under pathfind_* (underscore form
per Phase 1c rule 5). Renames in PathFindSpanNames.h and call sites in
PathRequest.cpp, PathRequestManager.cpp, plus the rule-5 retention
xrpl.pathfind.ledger_index -> pathfind_ledger_index.
- Wire pathfind_source_account / pathfind_dest_account on pathfind.request
in doPathFind / doRipplePathFind handlers (only when present + string).
- Collapse per-asset pathfind.discover / pathfind.rank spans into one
pathfind.discover hoisted around the per-source-asset loop in
PathRequest::findPaths. Span count goes from 2N to 1 per RPC call;
per-asset breakdown traded for bounded storage and cardinality. Trade-off
documented inline.
- Fix pathfind_num_paths semantics: now sums getBestPaths().size() across
the loop (paths actually returned) instead of the maxPaths input cap.
- PathRequestManager::updateAll: move span creation after the locked
requests_ snapshot, early-return when no active subscriptions exist
(avoids empty span on every ledger close), set pathfind_num_requests
= requests.size().
- Update Phase2_taskList.md and 02-design-decisions.md to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per PR #6438 review thread r3250432621: known-command errors
(rpcTOO_BUSY, rpcNO_PERMISSION, etc.) were collapsing into a
single rpc.command.unknown span name, hiding per-command error
rates in dashboards. Same anti-pattern existed for gRPC, where
every method was bucketed under grpc.request with the method
relegated to an attribute.
- RPCHandler.cpp: doCommand error path uses cmdName as the span
suffix; the rpc_span::val::unknownCommand fallback only applies
when the request truly omits both command and method fields.
- GRPCServer.cpp: gRPC span name is now grpc.<MethodName>
(e.g. grpc.GetLedger). Method also retained as an attribute.
- GrpcSpanNames.h: drop the unused op::request constant; update
the span-hierarchy comment.
- RpcSpanNames.h: update the gRPC span diagram to match.
Dashboards on downstream phases will benefit from per-command
breakdowns without needing TraceQL attribute filters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 added a span catalog in `06-implementation-phases.md` listing the
source location for each consensus span. Line numbers `Consensus.h:707`,
`RCLConsensus.cpp:232/341/492/541/900` drift on every refactor and would
become stale PR after PR. Filename alone is enough for operators to
grep — the RCLConsensus.cpp spans are already unambiguous from the span
name itself.
Phase-1a plan documents advertised OTLP/gRPC on port 4317 as the default
exporter, four unparsed [telemetry] config keys, and "Phase 4a Complete"
status with exit-criteria checkboxes marked done. Every downstream branch
through Phase 5 ships only OTLP/HTTP on port 4318 via OtlpHttpExporterFactory,
never parses the advertised keys, and the Phase 4 work is not yet delivered.
Fixes:
- 02-design-decisions.md: flip §2.1.1 SDK dependency recommendations to
OTLP/HTTP (shipped) with OTLP/gRPC marked Future. Update §2.2 architecture
diagram and text from OTLP/gRPC:4317 to OTLP/HTTP:4318. Rewrite §2.2.1 as
"OTLP/HTTP (Shipped)" and §2.2.2 as "OTLP/gRPC (Future Work — Planned
Upgrade)" with a concrete checklist (Conan dep, config parsing, factory
branch, runbook/dashboard updates) for landing the gRPC transport later.
- 05-configuration-reference.md: drop the fabricated exporter/otlp_grpc key
and the :4317 default from the sample config block and the options-summary
table. Move trace_pathfind, trace_txq, trace_validator, trace_amendment
into a new "Planned (not yet implemented)" table citing the phase that will
add each one. Keep the example config minimal so copy-paste does not produce
a silently-ignored stanza.
- 06-implementation-phases.md: reset Phase 4 Exit Criteria checkboxes from
[x] to [ ] (Phase 4 is not shipped at Phase-1a time). Rename "Phase 4a
Complete" to "Phase 4a Plan" and describe the work as future. Replace the
broken forward link to Phase4_taskList.md (introduced in the Phase 2 PR)
with a sentence pointing readers to where that spec will land. Renumber
the final section 6.12 to 6.11 so it sits directly after 6.10; section 6.11
("Effort Summary") was intentionally removed in earlier edits.
SpanGuard::span() hardcoded SpanKind::kInternal for every span. Tempo's
service-graph and spanmetrics RED calculations rely on kServer /
kConsumer / kClient / kProducer to classify inbound vs outbound vs
internal operations. With kInternal everywhere, the service graph
collapses to a single self-loop and RED metrics attribute all latency
to internal work.
Add categoryToSpanKind() mapping:
- Rpc -> kServer (inbound synchronous request)
- Peer -> kConsumer (inbound async peer message)
- Transactions -> kInternal
- Consensus -> kInternal
- Ledger -> kInternal
Only the single-argument overload is affected; childSpan / linkedSpan
continue to default to kInternal because they represent in-process
continuations of an already-kinded parent.
Consensus span attributes use bare names (close_time_correct,
consensus_state, close_resolution_ms) and shared canonical attrs
(xrpl.ledger.seq) per SpanNames.h. xrpl.consensus.mode and
xrpl.consensus.round are correct (domain-qualified to avoid collision).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Transaction span attributes use bare names (local, tx_status) per
SpanNames.h convention, not xrpl.tx.* qualified names. xrpl.tx.hash
is correct (shared canonical attr defined in SpanNames.h).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RPC span attributes use bare names (command, rpc_status, rpc_role) per
the naming convention in SpanNames.h, not xrpl.rpc.* qualified names.
Node health attributes (amendment_blocked, server_state) are resource
attributes set at Tracer init, not span attributes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NetworkOPs.h and SpanNames.h were only needed for per-span
nodeAmendmentBlocked/nodeServerState calls, which were removed
in the attr naming simplification. Fixes clang-tidy CI failure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Phase2_taskList: update attr refs to bare names, note node-health
attrs moved to resource level.
- 02-design-decisions: strip xrpl.pathfind.* prefix from planned attrs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update OpenTelemetryPlan docs and Telemetry.h doc example to reflect
the renamed per-span attributes: xrpl.rpc.command -> command,
xrpl.rpc.status -> rpc_status, xrpl.grpc.method -> method, etc.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>