
Distributed Tracing Fundamentals

Parent Document: OpenTelemetryPlan.md | Next: Architecture Analysis


What is Distributed Tracing?

Distributed tracing is a method for following a request as it flows through a distributed system. In a network like the XRP Ledger, a single transaction touches multiple independent nodes, none of which share memory or logs. Distributed tracing connects these dots.

Without tracing: You see isolated logs on each node with no way to correlate them.

With tracing: You see the complete journey of a transaction or an event across all nodes it touched.


Actors and Actions at a Glance

Actors

| Who (Plain English) | Technical Term |
| --- | --- |
| A single unit of work being tracked | Span |
| The complete journey of a request | Trace |
| Data that links spans across services | Trace Context |
| Code that creates spans and propagates context | Instrumentation |
| Service that receives and processes traces | Collector |
| Storage and visualization system | Backend (Tempo) |
| Decision logic for which traces to keep | Sampler |

Actions

| What Happens (Plain English) | Technical Term |
| --- | --- |
| Start tracking a new operation | Create a Span |
| Connect a child operation to its parent | Set parent_span_id |
| Group all related operations together | Share a trace_id |
| Pass tracking data between services | Context Propagation |
| Decide whether to record a trace | Sampling (Head or Tail) |
| Send completed traces to storage | Export (OTLP) |

Core Concepts

1. Trace

A trace represents the entire journey of a request through the system. It has a unique trace_id that stays constant across all nodes.

Trace ID: abc123
├── Node A: received transaction
├── Node B: relayed transaction
├── Node C: included in consensus
└── Node D: applied to ledger

2. Span

A span represents a single unit of work within a trace. Each span has:

| Attribute | Description | Example |
| --- | --- | --- |
| trace_id | Identifies the trace | event123 |
| span_id | Unique identifier | span456 |
| parent_span_id | Parent span (if any) | p_span123 |
| name | Operation name | rpc.submit |
| start_time | When work began (per the node's local clock) | 2024-01-15T10:30:00Z |
| end_time | When work completed (per the node's local clock) | 2024-01-15T10:30:00.050Z |
| attributes | Key-value metadata | tx.hash=ABC... |
| status | OK or ERROR (with message) | OK |

3. Trace Context

Trace context is the data that propagates between services to link spans together. It contains:

  • trace_id - The trace this span belongs to
  • span_id - The current span (becomes parent for child spans)
  • trace_flags - Sampling decisions

How Spans Form a Trace

Spans have parent-child relationships forming a tree structure:

flowchart TB
    subgraph trace["Trace: abc123"]
        A["tx.submit<br/>span_id: 001<br/>50ms"] --> B["tx.validate<br/>span_id: 002<br/>5ms"]
        A --> C["tx.relay<br/>span_id: 003<br/>10ms"]
        A --> D["tx.apply<br/>span_id: 004<br/>30ms"]
        D --> E["ledger.update<br/>span_id: 005<br/>20ms"]
    end

    style A fill:#0d47a1,stroke:#082f6a,color:#ffffff
    style B fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style C fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style D fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style E fill:#bf360c,stroke:#8c2809,color:#ffffff

Reading the diagram:

  • tx.submit (blue, root): The top-level span representing the entire transaction submission; all other spans are its descendants.
  • tx.validate, tx.relay, tx.apply (green): Direct children of tx.submit, representing the three main stages -- validation, relay to peers, and application to the ledger.
  • ledger.update (red): A grandchild span nested under tx.apply, representing the actual ledger state mutation triggered by applying the transaction.
  • Arrows (parent to child): Each arrow indicates a parent-child span relationship where the parent's completion depends on the child finishing.

The same trace visualized as a timeline (Gantt chart):

Time →   0ms    10ms    20ms    30ms    40ms    50ms
         ├───────────────────────────────────────────┤
tx.submit│▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
         ├─────┤
tx.valid │▓▓▓▓▓│
         │     ├──────────┤
tx.relay │     │▓▓▓▓▓▓▓▓▓▓│
         │               ├────────────────────────────┤
tx.apply │               │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
         │                         ├──────────────────┤
ledger   │                         │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓│
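
The tree above can be rebuilt from flat span records using only parent_span_id. The following is a minimal sketch of that idea; the tuple layout and the function names (`build_children`, `render_tree`) are illustrative assumptions, not part of any xrpld or OpenTelemetry API.

```python
from collections import defaultdict

# Flat span records as a backend might return them:
# (span_id, parent_span_id, name, duration_ms). The values mirror the example
# trace above; the tuple layout itself is a hypothetical choice for this sketch.
SPANS = [
    ("001", None, "tx.submit", 50),
    ("002", "001", "tx.validate", 5),
    ("003", "001", "tx.relay", 10),
    ("004", "001", "tx.apply", 30),
    ("005", "004", "ledger.update", 20),
]


def build_children(spans):
    """Index spans by parent_span_id so the tree can be walked top-down."""
    children = defaultdict(list)
    for span_id, parent_id, name, duration_ms in spans:
        children[parent_id].append((span_id, name, duration_ms))
    return children


def render_tree(spans):
    """Return one indented line per span, depth-first from the root span."""
    children = build_children(spans)
    lines = []

    def walk(parent_id, depth):
        for span_id, name, duration_ms in children[parent_id]:
            lines.append(f"{'  ' * depth}{name} ({duration_ms}ms)")
            walk(span_id, depth + 1)

    walk(None, 0)  # the root span is the one without a parent
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(SPANS)))
```

The root is simply the span whose parent_span_id is absent, which is why a lost parent reference splits a trace into disconnected fragments.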

Span Relationships

Spans don't always form simple parent-child trees. Distributed tracing defines several relationship types to capture different causal patterns:

1. Parent-Child (ChildOf)

The default relationship. The parent span depends on or contains the child span. The child runs within the scope of the parent.

tx.submit (parent)
├── tx.validate (child)     ← parent waits for this
├── tx.relay (child)        ← parent waits for this
└── tx.apply (child)        ← parent waits for this

When to use: Synchronous calls, nested operations, any case where the parent's completion depends on the child.

2. Follows-From

A causal relationship where the first span triggers the second, but does not wait for it. The originator fires and moves on.

Time →

tx.receive [=======]
                     ↓ triggers (follows-from)
              tx.relay   [===========]   ← runs independently

When to use: Asynchronous jobs, queued work, fire-and-forget patterns. For example, a node receives a transaction and queues it for relay — the relay span follows from the receive span but the receiver doesn't wait for relaying to complete.

OpenTracing defined FollowsFrom as a first-class reference type alongside ChildOf. OpenTelemetry represents this using Span Links with descriptive attributes instead (see below).

3. Span Links

Span links connect spans that are causally related but not in a parent-child hierarchy. Unlike parent-child relationships, links can cross trace boundaries.

Trace A                          Trace B
──────                           ──────
batch.schedule                   batch.execute
├─ item.enqueue (span X)    ┌──► process.item
├─ item.enqueue (span Y) ───┤    (links to X, Y, Z)
├─ item.enqueue (span Z)    └──►

Use cases:

| Pattern | Description |
| --- | --- |
| Batch processing | A batch span links back to all individual spans that contributed to it |
| Fan-in | An aggregation span links to the multiple producer spans it merges |
| Fan-out | Multiple downstream spans link back to the single span that triggered them |
| Async handoff | A deferred job links back to the request that queued it (follows-from) |
| Cross-trace | Correlating spans across independent traces (e.g., retries, related events) |

Link structure: Each link carries the target span's context plus optional attributes:

Link {
    trace_id:   <target trace>
    span_id:    <target span>
    attributes: { "link.description": "triggered by batch scheduler" }
}
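
As a rough illustration of the link structure, the batch example above might be modeled as follows. This is a sketch with hypothetical `Link` and `Span` classes and made-up IDs, not the OpenTelemetry SDK types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Link:
    """Pointer to a span that may live in a different trace."""
    trace_id: str
    span_id: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Span:
    trace_id: str
    span_id: str
    name: str
    parent_span_id: Optional[str] = None
    links: List[Link] = field(default_factory=list)

    def add_link(self, target: "Span", description: str) -> None:
        # A link copies the target's identifiers; it does not change this
        # span's own trace_id or parent_span_id.
        self.links.append(
            Link(target.trace_id, target.span_id, {"link.description": description})
        )


def build_batch_example() -> Span:
    """process.item in trace B links back to three enqueue spans in trace A."""
    enqueued = [
        Span("trace-a", span_id, "item.enqueue", parent_span_id="a-root")
        for span_id in ("X", "Y", "Z")
    ]
    process_item = Span("trace-b", "b-001", "process.item", parent_span_id="b-root")
    for source in enqueued:
        process_item.add_link(source, "enqueued by batch scheduler")
    return process_item


if __name__ == "__main__":
    for link in build_batch_example().links:
        print(link.trace_id, link.span_id, link.attributes)
```

Note that the linking span keeps its own parent inside trace B; the links are additional references, not a second parent.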

Relationship Summary

flowchart LR
    subgraph parent_child["Parent-Child"]
        direction TB
        P["Parent"] --> C["Child"]
    end

    subgraph follows_from["Follows-From"]
        direction TB
        A["Span A"] -.->|triggers| B["Span B"]
    end

    subgraph links["Span Links"]
        direction TB
        X["Span X\n(Trace 1)"] -.-|link| Y["Span Y\n(Trace 2)"]
    end

    parent_child ~~~ follows_from ~~~ links

    style P fill:#0d47a1,stroke:#082f6a,color:#ffffff
    style C fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style A fill:#0d47a1,stroke:#082f6a,color:#ffffff
    style B fill:#bf360c,stroke:#8c2809,color:#ffffff
    style X fill:#4a148c,stroke:#38006b,color:#ffffff
    style Y fill:#4a148c,stroke:#38006b,color:#ffffff

| Relationship | Same Trace? | Dependency? | OTel Mechanism |
| --- | --- | --- | --- |
| Parent-Child | Yes | Parent depends on child | parent_span_id |
| Follows-From | Usually | Causal but no dependency | Link + attributes |
| Span Link | Either | Correlation, no dependency | Link + attributes |

Trace ID Generation

A trace_id is a 128-bit (16-byte) identifier that groups all spans belonging to one logical operation. How it's generated determines how easily you can find and correlate traces later.

General Approaches

1. Random (W3C Default)

Generate a random 128-bit ID when a trace starts. Standard approach for most services.

trace_id = random_128_bits()

| Pros | Cons |
| --- | --- |
| Simple, standard | No natural correlation to domain events |
| Effectively unique per trace (collisions are vanishingly unlikely) | If propagation is lost, trace is broken |
| Works with all OTel tooling | "Find trace for TX abc" requires index lookup |
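
A minimal sketch of random generation, assuming Python's standard `secrets` module as the randomness source; the function name `new_random_trace_id` is hypothetical.

```python
import secrets


def new_random_trace_id() -> str:
    """Return a random 128-bit trace_id as 32 lowercase hex characters."""
    while True:
        raw = secrets.token_bytes(16)
        # W3C Trace Context treats the all-zero trace_id as invalid, so retry
        # in the (astronomically unlikely) case that we draw it.
        if any(raw):
            return raw.hex()


if __name__ == "__main__":
    print(new_random_trace_id())
```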

2. Deterministic (Derived from Domain Data)

Compute the trace_id from a hash of a natural identifier. Every node independently derives the same trace_id for the same event.

trace_id = SHA-256(domain_identifier)[0:16]   // truncate to 128 bits

| Pros | Cons |
| --- | --- |
| Propagation-resilient: same ID computed everywhere | Same event processed twice (retry) shares trace_id |
| Natural search: domain ID maps directly to trace | Non-standard (tooling assumes random) |
| No coordination needed between nodes | 256→128 bit truncation (birthday-bound collisions become likely only near ~2⁶⁴ IDs) |
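
The derivation above can be sketched with the standard `hashlib` module. The function name is hypothetical, and the sketch assumes the domain identifier is already available as raw bytes.

```python
import hashlib


def deterministic_trace_id(domain_identifier: bytes) -> str:
    """SHA-256 the identifier and keep the first 16 bytes (128 bits) as hex."""
    digest = hashlib.sha256(domain_identifier).digest()
    return digest[:16].hex()


if __name__ == "__main__":
    # Two independent calls, as if on two different nodes, agree on the ID.
    print(deterministic_trace_id(b"example-identifier"))
    print(deterministic_trace_id(b"example-identifier"))
```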

3. Hybrid (Deterministic Prefix + Random Suffix)

First 8 bytes derived from domain data, last 8 bytes random.

trace_id = SHA-256(domain_identifier)[0:8] || random_64_bits()

| Pros | Cons |
| --- | --- |
| Prefix search: "find all traces for TX abc" | Must propagate to maintain full trace_id |
| Unique per processing instance | More complex generation logic |
| Retries get distinct trace_ids | Partial correlation only (prefix match) |
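
A sketch of the hybrid scheme, again using only the standard library. `hybrid_trace_id` and `search_prefix` are hypothetical names; a real backend would need to support prefix queries on trace_id for the search side to be useful.

```python
import hashlib
import secrets


def search_prefix(domain_identifier: bytes) -> str:
    """First 8 bytes of SHA-256(identifier), as 16 hex characters."""
    return hashlib.sha256(domain_identifier).digest()[:8].hex()


def hybrid_trace_id(domain_identifier: bytes) -> str:
    """Deterministic 8-byte prefix followed by a random 8-byte suffix."""
    return search_prefix(domain_identifier) + secrets.token_bytes(8).hex()


if __name__ == "__main__":
    first = hybrid_trace_id(b"example-identifier")
    retry = hybrid_trace_id(b"example-identifier")
    # Both attempts share the prefix but remain distinct traces.
    print(first, retry, first[:16] == retry[:16])
```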

XRPL Workflow Analysis

XRPL has a unique advantage: its core workflows produce globally unique 256-bit hashes that are known on every node. This makes deterministic trace_id generation practical in ways most systems can't achieve.

Natural Identifiers by Workflow

| Workflow | Natural Identifier | Size | Known at Start? | Same on All Nodes? |
| --- | --- | --- | --- | --- |
| Transaction | Transaction hash (tid_) | 256-bit | Yes: computed once the transaction is signed | Yes: hash of canonical tx data |
| Consensus round | Previous ledger hash + ledger seq | 256+32 bit | Yes: known when round opens | Yes: all validators agree |
| Validation | Ledger hash being validated | 256-bit | Yes: from consensus result | Yes: same closed ledger |
| Ledger catch-up | Target ledger hash | 256-bit | Yes: we know what to fetch | Yes: identifies ledger globally |

Where These Identifiers Live in Code

Transaction:     STTx::getTransactionID()     → uint256 tid_
                 TMTransaction::rawTransaction → recompute hash from bytes

Consensus:       ConsensusProposal::prevLedger_ → uint256 (previous ledger hash)
                 ConsensusProposal::position_   → uint256 (TxSet hash)
                 LedgerHeader::seq              → uint32_t (ledger sequence)

Validation:      STValidation::getLedgerHash()  → uint256
                 STValidation::getNodeID()      → NodeID (160-bit)

Ledger fetch:    InboundLedger constructor      → uint256 hash, uint32_t seq
                 TMGetLedger::ledgerHash        → bytes (uint256)

Each workflow type derives its trace_id from its natural domain identifier:

Transaction trace:   trace_id = SHA-256("tx"    || tx_hash)[0:16]
Consensus trace:     trace_id = SHA-256("cons"  || prev_ledger_hash || ledger_seq)[0:16]
Ledger catch-up:     trace_id = SHA-256("fetch" || target_ledger_hash)[0:16]

The string prefix ("tx", "cons", "fetch") prevents collisions between workflows that might share underlying hashes.
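
The three derivations can be sketched as below. The byte encoding is an assumption made for illustration: ASCII prefixes, raw 32-byte hashes, and ledger_seq packed as a 4-byte big-endian integer. The plan text does not specify the encoding, and all function names are hypothetical.

```python
import hashlib
import struct


def _trace_id(prefix: bytes, *parts: bytes) -> str:
    """SHA-256(prefix || parts...) truncated to 16 bytes, as hex."""
    return hashlib.sha256(prefix + b"".join(parts)).digest()[:16].hex()


def tx_trace_id(tx_hash: bytes) -> str:
    return _trace_id(b"tx", tx_hash)


def consensus_trace_id(prev_ledger_hash: bytes, ledger_seq: int) -> str:
    # Assumed encoding: ledger_seq as an unsigned 32-bit big-endian integer.
    return _trace_id(b"cons", prev_ledger_hash, struct.pack(">I", ledger_seq))


def fetch_trace_id(target_ledger_hash: bytes) -> str:
    return _trace_id(b"fetch", target_ledger_hash)


if __name__ == "__main__":
    ledger_hash = bytes(range(32))  # placeholder 256-bit hash
    # The same ledger hash yields different trace_ids per workflow,
    # which is the purpose of the string prefix.
    print(consensus_trace_id(ledger_hash, 1000))
    print(fetch_trace_id(ledger_hash))
```

Whatever encoding is chosen, every node must use exactly the same one, otherwise independently derived trace_ids will not match.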

Why this works for XRPL:

  1. Propagation-resilient — Even if a P2P message drops trace context, every node independently computes the same trace_id from the same tx_hash or ledger_hash. Spans still correlate.

  2. Zero-cost search — "Show me the trace for transaction ABC" becomes a direct lookup: compute SHA-256("tx" || ABC)[0:16] and query. No secondary index needed.

  3. Cross-workflow linking via Span Links — A consensus trace links to individual transaction traces. A validation span links to the consensus trace. This connects the full picture without forcing everything into one giant trace.

Cross-Workflow Correlation

Each workflow gets its own trace. Span Links tie them together:

flowchart TB
    subgraph tx_trace["Transaction Trace"]
        direction LR
        Tn["trace_id = f(tx_hash)"]:::note --> T1["tx.receive"] --> T2["tx.validate"] --> T3["tx.relay"]
    end

    subgraph cons_trace["Consensus Trace"]
        direction LR
        Cn["trace_id = f(prev_ledger, seq)"]:::note --> C1["cons.open"] --> C2["cons.propose"] --> C3["cons.accept"]
    end

    subgraph val_trace["Validation"]
        direction LR
        Vn["spans within consensus trace"]:::note --> V1["val.create"] --> V2["val.broadcast"]
    end

    subgraph fetch_trace["Catch-Up Trace"]
        direction LR
        Fn["trace_id = f(ledger_hash)"]:::note --> F1["fetch.request"] --> F2["fetch.receive"] --> F3["fetch.apply"]
    end

    C1 -.-|"span link\n(tx traces)"| T3
    C3 --> V1
    F1 -.-|"span link\n(target ledger)"| C3

    classDef note fill:none,stroke:#888,stroke-dasharray:5 5,color:#333,font-style:italic
    style T1 fill:#0d47a1,stroke:#082f6a,color:#ffffff
    style T2 fill:#0d47a1,stroke:#082f6a,color:#ffffff
    style T3 fill:#0d47a1,stroke:#082f6a,color:#ffffff
    style C1 fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style C2 fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style C3 fill:#1b5e20,stroke:#0d3d14,color:#ffffff
    style V1 fill:#bf360c,stroke:#8c2809,color:#ffffff
    style V2 fill:#bf360c,stroke:#8c2809,color:#ffffff
    style F1 fill:#4a148c,stroke:#38006b,color:#ffffff
    style F2 fill:#4a148c,stroke:#38006b,color:#ffffff
    style F3 fill:#4a148c,stroke:#38006b,color:#ffffff

Reading the diagram:

  • Transaction Trace (blue): An independent trace whose trace_id is deterministically derived from the transaction hash. Contains receive, validate, and relay spans.
  • Consensus Trace (green): An independent trace whose trace_id is derived from the previous ledger hash and sequence number. Covers the open, propose, and accept phases.
  • Validation (red): Validation spans live within the consensus trace (not a separate trace). They are created after the accept phase completes.
  • Catch-Up Trace (purple): An independent trace for ledger acquisition, derived from the target ledger hash. Used when a node is behind and fetching missing ledgers.
  • Dotted arrows (span links): Cross-trace correlations. Consensus links to transaction traces it included; catch-up links to the consensus trace that produced the target ledger.
  • Solid arrow (C3 to V1): A parent-child relationship -- validation spans are direct children of the consensus accept span within the same trace.

How a query flows:

"Why was TX abc slow?"
  1. Compute trace_id = SHA-256("tx" || abc)[0:16]
  2. Find transaction trace → see it was included in consensus round N
  3. Follow span link → consensus trace for round N
  4. See which phase was slow (propose? accept?)
  5. If a node was catching up, follow link → catch-up trace

Trade-offs to Consider

| Concern | Mitigation |
| --- | --- |
| Retries get same trace_id | Add attempt attribute to root span; spans have unique span_ids and timestamps |
| 256→128 bit truncation | Birthday-bound collisions become likely only near ~2⁶⁴ trace IDs, which is negligible at XRPL's throughput |
| Non-standard generation | OTel spec allows any 16-byte non-zero value; tooling works on the hex string |
| Hash computation cost | SHA-256 is ~0.3μs per call on these short inputs; the input hashes are already computed by XRPL for other purposes |
| Late-binding identifiers | Ledger hash isn't known until after consensus, so validation spans use ledger_seq as fallback, then link to the consensus trace |

Distributed Traces Across Nodes

In distributed systems like xrpld, traces span multiple independent nodes. The trace context must be propagated in network messages:

sequenceDiagram
    participant Client
    participant NodeA as Node A
    participant NodeB as Node B
    participant NodeC as Node C

    Client->>NodeA: Submit TX<br/>(no trace context)

    Note over NodeA: Creates new trace<br/>trace_id: abc123<br/>span: tx.receive

    NodeA->>NodeB: Relay TX<br/>(trace_id: abc123, parent: 001)

    Note over NodeB: Creates child span<br/>span: tx.relay<br/>parent_span_id: 001

    NodeA->>NodeC: Relay TX<br/>(trace_id: abc123, parent: 001)

    Note over NodeC: Creates child span<br/>span: tx.relay<br/>parent_span_id: 001

    Note over NodeA,NodeC: All spans share trace_id: abc123<br/>enabling correlation across nodes

Reading the diagram:

  • Client: The external entity that submits a transaction. It does not carry trace context -- the trace originates at the first node.
  • Node A: The entry point that creates a new trace (trace_id: abc123) and the root span tx.receive. It relays the transaction to peers with trace context attached.
  • Node B and Node C: Peer nodes that receive the relayed transaction along with the propagated trace context. Each creates a child span under Node A's span, preserving the same trace_id.
  • Arrows with trace context: The relay messages carry trace_id and parent_span_id, allowing each downstream node to link its spans back to the originating span on Node A.

Context Propagation

For traces to work across nodes, trace context must be propagated in messages.

What's in the Context (~25 bytes, plus optional trace_state)

| Field | Size | Description |
| --- | --- | --- |
| trace_id | 16 bytes | Identifies the entire trace (constant across all nodes) |
| span_id | 8 bytes | The sender's current span (becomes parent on receiver) |
| trace_flags | 1 byte | Sampling decision (bit 0 = sampled; bits 1-7 reserved) |
| trace_state | variable | Optional vendor-specific data (typically omitted) |

How span_id Changes at Each Hop

Only one span_id travels in the context - the sender's current span. Each node:

  1. Extracts the received span_id and uses it as the parent_span_id
  2. Creates a new span_id for its own span
  3. Sends its own span_id as the parent when forwarding
Node A                      Node B                      Node C
──────                      ──────                      ──────

Span AAA                    Span BBB                    Span CCC
   │                           │                           │
   ▼                           ▼                           ▼
Context out:                Context out:                Context out:
├─ trace_id: abc123         ├─ trace_id: abc123         ├─ trace_id: abc123
├─ span_id: AAA ──────────► ├─ span_id: BBB ──────────► ├─ span_id: CCC ──────►
└─ flags: 01                └─ flags: 01                └─ flags: 01
                               │                           │
                          parent = AAA               parent = BBB

The trace_id stays constant, but span_id changes at every hop to maintain the parent-child chain.
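
The three steps each node performs can be sketched as a small simulation. `TraceContext`, `Span`, and `start_span` are hypothetical names for this illustration and do not correspond to xrpld or OpenTelemetry SDK types.

```python
import secrets
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraceContext:
    """What travels in the message: one trace_id, one span_id, flags."""
    trace_id: str
    span_id: str
    flags: int = 0x01


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str


def start_span(name: str, incoming: Optional[TraceContext] = None) -> Tuple[Span, TraceContext]:
    """Create a span from the incoming context and return the context to forward."""
    if incoming is None:
        # No context received: this node starts a new trace.
        trace_id, parent, flags = secrets.token_hex(16), None, 0x01
    else:
        # Step 1: the received span_id becomes our parent_span_id.
        trace_id, parent, flags = incoming.trace_id, incoming.span_id, incoming.flags
    # Step 2: generate a fresh span_id for our own span.
    span = Span(trace_id, secrets.token_hex(8), parent, name)
    # Step 3: forward our own span_id so the next hop uses it as its parent.
    return span, TraceContext(trace_id, span.span_id, flags)


if __name__ == "__main__":
    span_a, ctx_a = start_span("tx.receive")
    span_b, ctx_b = start_span("tx.relay", ctx_a)
    span_c, ctx_c = start_span("tx.relay", ctx_b)
    for span in (span_a, span_b, span_c):
        print(span.trace_id, span.span_id, span.parent_span_id)
```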

Propagation Formats

There are two patterns:

HTTP/RPC Headers (W3C Trace Context)

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             │  │                                │                │
             │  │                                │                └── Flags (sampled)
             │  │                                └── Parent span ID (16 hex)
             │  └── Trace ID (32 hex)
             └── Version
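
A receiver has to split this header back into its four fields. The sketch below handles only the four-field layout shown above and rejects anything else; `parse_traceparent` and `TraceParent` are hypothetical names.

```python
import re
from typing import NamedTuple

# version - trace_id (32 hex) - parent span_id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> TraceParent:
    """Parse a traceparent header value, raising ValueError if malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    if version == "ff":
        raise ValueError("version ff is invalid")
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace_id or parent_id is invalid")
    # Bit 0 of the flags byte is the sampled flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    print(parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
```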

Protocol Buffers (xrpld P2P messages)

message TMTransaction {
    bytes rawTransaction = 1;
    // ... existing fields ...

    // Trace context extension
    bytes trace_parent = 100;  // W3C traceparent
    bytes trace_state = 101;   // W3C tracestate
}

Sampling

Not every trace needs to be recorded. Sampling reduces overhead:

Head Sampling (at trace start)

Request arrives → Random 10% chance → Record or skip entire trace
  • Low overhead
  • May miss interesting traces
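
A head-sampling decision might be sketched as follows. Note one deliberate substitution: instead of a fresh random draw, this sketch derives the decision from the low 64 bits of the trace_id, so every node that sees the same trace_id reaches the same verdict. That choice, and the name `head_sample`, are assumptions for illustration rather than the plan's specified behavior.

```python
def head_sample(trace_id_hex: str, rate: float = 0.10) -> bool:
    """Decide at trace start whether to record, based on the trace_id itself."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Interpret the last 16 hex characters (low 64 bits) as an integer and
    # keep the trace if it falls below rate * 2^64.
    bucket = int(trace_id_hex[-16:], 16)
    return bucket < int(rate * 2**64)


if __name__ == "__main__":
    print(head_sample("4bf92f3577b34da6a3ce929d0e0e4736"))
```

The resulting decision is what the sampled bit in trace_flags carries to downstream nodes.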

Tail Sampling (after trace completes)

Trace completes → Collector evaluates:
                  - Error? → KEEP
                  - Slow? → KEEP
                  - Normal? → Sample 10%
  • Never loses important traces
  • Higher memory usage at collector
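
The collector-side policy above can be sketched as a single decision function. The `FinishedTrace` record, the 1000 ms slow threshold, and the injectable `rng` parameter are assumptions for illustration; a real collector would express this as configuration rather than code.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FinishedTrace:
    trace_id: str
    duration_ms: float
    has_error: bool


def tail_keep(
    trace: FinishedTrace,
    slow_threshold_ms: float = 1000.0,  # assumed threshold, not from the plan
    baseline_rate: float = 0.10,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Keep all error and slow traces; sample the rest at baseline_rate."""
    if trace.has_error:
        return True
    if trace.duration_ms >= slow_threshold_ms:
        return True
    return rng() < baseline_rate


if __name__ == "__main__":
    print(tail_keep(FinishedTrace("abc123", 2500.0, False)))  # slow trace
    print(tail_keep(FinishedTrace("abc124", 40.0, True)))     # error trace
```

Because the decision needs the finished trace, the collector must buffer every span until the trace completes, which is the source of the higher memory usage noted above.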

Key Benefits for xrpld

| Challenge | How Tracing Helps |
| --- | --- |
| "Where is my transaction?" | Follow trace across all nodes it touched |
| "Why was consensus slow?" | See timing breakdown of each phase |
| "Which node is the bottleneck?" | Compare span durations across nodes |
| "What happened during the outage?" | Correlate errors across the network |

Glossary

| Term | Definition |
| --- | --- |
| Trace | Complete journey of a request, identified by trace_id |
| Span | Single operation within a trace |
| Parent-Child | Span relationship where the parent depends on the child |
| Follows-From | Causal relationship where originator doesn't wait for the result |
| Span Link | Non-hierarchical connection between spans, possibly across traces |
| Deterministic ID | Trace ID derived from domain data (e.g., tx_hash) instead of random |
| Context | Data propagated between services (trace_id, span_id, flags) |
| Instrumentation | Code that creates spans and propagates context |
| Collector | Service that receives, processes, and exports traces |
| Backend | Storage/visualization system (Tempo) |
| Head Sampling | Sampling decision at trace start |
| Tail Sampling | Sampling decision after trace completes |

Next: Architecture Analysis | Back to: Overview