rippled/OpenTelemetryPlan/Phase8_taskList.md
Pratik Mankawde 414c30e6e6 Phase 8: Log-trace correlation plan docs and task list
- Add §6.8.1 to 06-implementation-phases.md with full Phase 8 plan
  (motivation, architecture, Mermaid diagrams, tasks table, exit criteria)
- Add Phase8_taskList.md with per-task breakdown (8.1-8.6)
- Add §5a log-trace correlation section to 09-data-collection-reference.md
- Add Phase 8 row to OpenTelemetryPlan.md, update totals to 13 weeks / 8 phases
- Add Phases 6-8 to Gantt chart in 06-implementation-phases.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:28 +00:00


Phase 8: Log-Trace Correlation and Centralized Log Ingestion — Task List

Goal: Inject trace context (trace_id, span_id) into rippled's Journal log output for log-trace correlation, and add OTel Collector filelog receiver to ingest logs into Grafana Loki for unified observability.

Scope: Two independent sub-phases — 8a (code change: trace_id in logs) and 8b (infra only: filelog receiver to Loki). No changes to the beast::Journal public API.

Branch: pratik/otel-phase8-log-correlation (from pratik/otel-phase7-native-metrics)

Document Relevance
  • 06-implementation-phases.md: Phase 8 plan (motivation, architecture, exit criteria; §6.8.1)
  • 07-observability-backends.md: Loki backend recommendation, Grafana data source provisioning
  • Phase7_taskList.md: prerequisite; the native OTel metrics pipeline must be working
  • 05-configuration-reference.md: [telemetry] config (trace_id injection toggle)

Task 8.1: Inject trace_id into Logs::format()

Objective: Add OTel trace context to every log line that is emitted within an active span.

What to do:

  • Edit src/libxrpl/basics/Log.cpp:

    • In Logs::format() (around line 346), after severity is appended, check for active OTel span:
      #ifdef XRPL_ENABLE_TELEMETRY
      auto span = opentelemetry::trace::GetSpan(
          opentelemetry::context::RuntimeContext::GetCurrent());
      auto ctx = span->GetContext();
      if (ctx.IsValid())
      {
          // Append trace context as structured fields. ToLowerBase16
          // takes a fixed-size span, so the buffers are exactly 2x the
          // binary id size and are not NUL-terminated.
          char traceId[32], spanId[16];
          ctx.trace_id().ToLowerBase16(traceId);
          ctx.span_id().ToLowerBase16(spanId);
          output += "trace_id=";
          output.append(traceId, 32);
          output += " span_id=";
          output.append(spanId, 16);
          output += ' ';
      }
      #endif
      
    • Add #include for OTel context headers, guarded by #ifdef XRPL_ENABLE_TELEMETRY
  • Edit include/xrpl/basics/Log.h:

    • No changes needed — format() signature unchanged

Key modified files:

  • src/libxrpl/basics/Log.cpp

Performance note: GetSpan() and GetContext() are thread-local reads with no locking, measured at under 10 ns per call. At roughly 1000 JLOG calls per minute, this adds under 10 µs of overhead per minute.
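For reference, the resulting line prefix can be modeled in Python. This is an illustrative sketch only, not rippled code; the ids shown are the W3C trace-context sample values, and the key=value field layout mirrors what the C++ snippet above appends.

```python
# Illustrative model (not rippled code) of the injected log-line prefix:
# lowercase zero-padded hex, 32 chars for the 128-bit trace id and
# 16 chars for the 64-bit span id, emitted as key=value fields.

def inject_trace_context(line: str, trace_id: int, span_id: int) -> str:
    """Prefix a formatted log message with trace context fields."""
    return f"trace_id={trace_id:032x} span_id={span_id:016x} {line}"

print(inject_trace_context("Consensus round started",
                           0x4BF92F3577B34DA6A3CE929D0E0E4736,
                           0x00F067AA0BA902B7))
# -> trace_id=4bf92f3577b34da6a3ce929d0e0e4736 span_id=00f067aa0ba902b7 Consensus round started
```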


Task 8.2: Add Loki to Docker Compose Stack

Objective: Add Grafana Loki as a log storage backend in the development observability stack.

What to do:

  • Edit docker/telemetry/docker-compose.yml:

    • Add Loki service:
      loki:
        image: grafana/loki:2.9.0
        ports:
          - "3100:3100"
        command: -config.file=/etc/loki/local-config.yaml
      
    • Add Loki as a Grafana data source in provisioning
  • Create docker/telemetry/grafana/provisioning/datasources/loki.yaml:

    • Configure Loki data source with derived fields linking trace_id to Tempo

Key new files:

  • docker/telemetry/grafana/provisioning/datasources/loki.yaml

Key modified files:

  • docker/telemetry/docker-compose.yml

Task 8.3: Add Filelog Receiver to OTel Collector

Objective: Configure the OTel Collector to tail rippled's log file and export to Loki.

What to do:

  • Edit docker/telemetry/otel-collector-config.yaml:

    • Add filelog receiver:
      receivers:
        filelog:
          include: [/var/log/rippled/debug.log]
          operators:
            - type: regex_parser
              regex: '^(?P<timestamp>\S+)\s+(?P<partition>\S+):(?P<severity>\S+)\s+(?:trace_id=(?P<trace_id>[a-f0-9]+)\s+span_id=(?P<span_id>[a-f0-9]+)\s+)?(?P<message>.*)$'
              timestamp:
                parse_from: attributes.timestamp
                layout: "%Y-%m-%dT%H:%M:%S.%fZ"
      
    • Add logs pipeline:
      service:
        pipelines:
          logs:
            receivers: [filelog]
            processors: [batch]
            exporters: [loki]
      
    • Add Loki exporter (the collector-contrib loki exporter pushes to Loki's native push API; Loki 2.9 does not accept OTLP, which arrived in Loki 3.0):
      exporters:
        loki:
          endpoint: http://loki:3100/loki/api/v1/push
      
  • Mount rippled's log directory into the collector container via docker-compose volume
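As a sanity check, the regex above can be exercised against a hypothetical log line. Python's re module accepts the same (?P<name>) group syntax as the collector's RE2-based parser; the timestamp and message shown are assumptions and should be verified against rippled's actual debug.log format.

```python
import re

# The Task 8.3 filelog regex, checked against hypothetical log lines.
LINE_RE = re.compile(
    r'^(?P<timestamp>\S+)\s+(?P<partition>\S+):(?P<severity>\S+)\s+'
    r'(?:trace_id=(?P<trace_id>[a-f0-9]+)\s+span_id=(?P<span_id>[a-f0-9]+)\s+)?'
    r'(?P<message>.*)$')

with_ctx = ("2026-03-12T22:12:28.123456Z LedgerConsensus:DBG "
            "trace_id=4bf92f3577b34da6a3ce929d0e0e4736 "
            "span_id=00f067aa0ba902b7 Proposers: 5")
without_ctx = "2026-03-12T22:12:29.000000Z Application:WRN No trace context here"

m = LINE_RE.match(with_ctx)
assert m.group('partition') == 'LedgerConsensus'
assert m.group('trace_id') == '4bf92f3577b34da6a3ce929d0e0e4736'
# A line emitted outside any span simply has empty trace groups:
assert LINE_RE.match(without_ctx).group('trace_id') is None
```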

Key modified files:

  • docker/telemetry/otel-collector-config.yaml
  • docker/telemetry/docker-compose.yml

Task 8.4: Configure Grafana Trace-to-Log Correlation

Objective: Enable one-click navigation from Tempo traces to Loki logs in Grafana.

What to do:

  • Edit Grafana Tempo data source provisioning to add tracesToLogs configuration:

    tracesToLogs:
      datasourceUid: loki
      filterByTraceID: true
      filterBySpanID: false
      tags: ["partition", "severity"]
    
  • Edit Grafana Loki data source provisioning to add derivedFields linking trace_id back to Tempo:

    derivedFields:
      - datasourceUid: tempo
        matcherRegex: "trace_id=(\\w+)"
        name: TraceID
        url: "$${__value.raw}"
    

Key modified files:

  • docker/telemetry/grafana/provisioning/datasources/loki.yaml
  • docker/telemetry/grafana/provisioning/datasources/ (Tempo data source file)

Task 8.5: Update Integration Tests

Objective: Verify trace_id appears in logs and Loki correlation works.

What to do:

  • Edit docker/telemetry/integration-test.sh:
    • After sending RPC requests (which create spans), grep rippled's log output for trace_id=
    • Verify trace_id matches a trace visible in Jaeger
    • Optionally: query Loki via API to confirm log ingestion
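The optional Loki check could be driven by a small helper, sketched here in Python for illustration. The /loki/api/v1/query_range endpoint is Loki's standard HTTP query API, but the {job="rippled"} label selector is an assumption that depends on how the collector labels the log stream.

```python
import json
import urllib.parse
import urllib.request

def build_loki_query_url(trace_id: str,
                         base: str = "http://localhost:3100") -> str:
    """Build a Loki query_range URL selecting one trace's log lines.

    The {job="rippled"} selector is an assumption; substitute whatever
    labels the collector's Loki exporter actually attaches.
    """
    logql = '{job="rippled"} |= "trace_id=%s"' % trace_id
    query = urllib.parse.urlencode({"query": logql})
    return base + "/loki/api/v1/query_range?" + query

# In the integration test, something like:
#   with urllib.request.urlopen(build_loki_query_url(tid)) as resp:
#       assert json.load(resp)["data"]["result"], "no log lines for trace"
```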

Key modified files:

  • docker/telemetry/integration-test.sh

Task 8.6: Update Documentation

Objective: Document the log correlation feature in runbook and reference docs.

What to do:

  • Edit docs/telemetry-runbook.md:

    • Add "Log-Trace Correlation" section explaining how to use Grafana Tempo -> Loki linking
    • Add LogQL query examples for filtering by trace_id
  • Edit OpenTelemetryPlan/09-data-collection-reference.md:

    • Add new section "3. Log Correlation" between SpanMetrics and StatsD sections
    • Document the log format with trace_id injection
    • Document Loki as a new backend
  • Edit docker/telemetry/TESTING.md:

    • Add log correlation verification steps

Key modified files:

  • docs/telemetry-runbook.md
  • OpenTelemetryPlan/09-data-collection-reference.md
  • docker/telemetry/TESTING.md

Summary Table

| Task | Description | Sub-Phase | New Files | Modified Files | Effort | Risk | Depends On |
|------|-------------|-----------|-----------|----------------|--------|------|------------|
| 8.1 | Inject trace_id into Logs::format() | 8a | 0 | 1 | 1d | Low | Phase 7 |
| 8.2 | Add Loki to Docker Compose stack | 8b | 1 | 1 | 0.5d | Low | -- |
| 8.3 | Add filelog receiver to OTel Collector | 8b | 0 | 2 | 1d | Medium | 8.1, 8.2 |
| 8.4 | Configure Grafana trace-to-log correlation | 8b | 0 | 2 | 0.5d | Low | 8.3 |
| 8.5 | Update integration tests | 8a + 8b | 0 | 1 | 0.5d | Low | 8.4 |
| 8.6 | Update documentation | 8a + 8b | 0 | 3 | 1d | Low | 8.5 |

Total Effort: 4.5 days

Parallel work: Task 8.2 (Loki infra) can run in parallel with Task 8.1 (code change). Tasks 8.3-8.6 are sequential.

Exit Criteria (from 06-implementation-phases.md §6.8.1):

  • Log lines within active spans contain trace_id=<hex> span_id=<hex>
  • Log lines outside spans have no trace context (no empty fields)
  • Loki ingests rippled logs via OTel Collector filelog receiver
  • Grafana Tempo -> Loki one-click correlation works
  • Grafana Loki -> Tempo reverse lookup works via derived field
  • Integration test verifies trace_id presence in logs
  • No performance regression from trace_id injection (< 0.1% overhead)