From 7ab6f4d34b82c599e7d5e661c4a2b4daa1f7ae55 Mon Sep 17 00:00:00 2001 From: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com> Date: Wed, 29 Apr 2026 20:09:43 +0100 Subject: [PATCH] fix: address CI rename checks (rippled -> xrpld) in phase-8 docs Co-Authored-By: Claude Opus 4.6 (1M context) --- OpenTelemetryPlan/06-implementation-phases.md | 14 ++++++------- .../09-data-collection-reference.md | 12 +++++------ OpenTelemetryPlan/Phase8_taskList.md | 10 +++++----- docker/telemetry/TESTING.md | 14 ++++++------- docs/telemetry-runbook.md | 20 +++++++++---------- 5 files changed, 35 insertions(+), 35 deletions(-) diff --git a/OpenTelemetryPlan/06-implementation-phases.md b/OpenTelemetryPlan/06-implementation-phases.md index c8e60f7857..f31ea1424f 100644 --- a/OpenTelemetryPlan/06-implementation-phases.md +++ b/OpenTelemetryPlan/06-implementation-phases.md @@ -391,14 +391,14 @@ The `StatsDMeterImpl` in `StatsDCollector.cpp:706` sends metrics with `|m` suffi ### Motivation -rippled's `beast::Journal` logs and OpenTelemetry traces are currently two disjoint observability signals. When investigating an issue, operators must manually correlate timestamps between log files and Jaeger/Tempo traces. Phase 8 bridges this gap by injecting trace context (`trace_id`, `span_id`) into every log line emitted within an active span, and ingesting those logs into Grafana Loki via the OTel Collector's filelog receiver. +xrpld's `beast::Journal` logs and OpenTelemetry traces are currently two disjoint observability signals. When investigating an issue, operators must manually correlate timestamps between log files and Jaeger/Tempo traces. Phase 8 bridges this gap by injecting trace context (`trace_id`, `span_id`) into every log line emitted within an active span, and ingesting those logs into Grafana Loki via the OTel Collector's filelog receiver. #### Gains 1. 
**One-click trace-to-log navigation** — Click a trace in Tempo/Jaeger and immediately see the corresponding log lines in Loki, filtered by `trace_id`. 2. **Reverse lookup (log-to-trace)** — Loki derived fields make `trace_id` values clickable links back to Tempo. 3. **Unified observability** — All three pillars (traces, metrics, logs) flow through the same OTel Collector pipeline and are visible in a single Grafana instance. -4. **Zero new dependencies in rippled** — Uses existing OTel SDK headers (`GetSpan`, `GetContext`) already linked in Phase 1. +4. **Zero new dependencies in xrpld** — Uses existing OTel SDK headers (`GetSpan`, `GetContext`) already linked in Phase 1. 5. **Negligible overhead** — `GetSpan()` + `GetContext()` are thread-local reads (<10ns/call). At ~1000 JLOG calls/min, this adds <10us/min. #### Losses / Risks @@ -416,13 +416,13 @@ The correlation value far outweighs the risks. The log format change is backward Phase 8 has two independent sub-phases that can be developed in parallel: - **Phase 8a (code change)**: Modify `Logs::format()` in `src/libxrpl/basics/Log.cpp` to append `trace_id= span_id=` when the current thread has an active OTel span. Guarded by `#ifdef XRPL_ENABLE_TELEMETRY`. -- **Phase 8b (infra only)**: Add Loki to the Docker Compose stack, configure the OTel Collector's `filelog` receiver to tail rippled's log file, parse out structured fields (timestamp, partition, severity, trace_id, span_id, message), and export to Loki via OTLP. Configure Grafana Tempo↔Loki bidirectional linking. +- **Phase 8b (infra only)**: Add Loki to the Docker Compose stack, configure the OTel Collector's `filelog` receiver to tail xrpld's log file, parse out structured fields (timestamp, partition, severity, trace_id, span_id, message), and export to Loki via OTLP. Configure Grafana Tempo↔Loki bidirectional linking. 
#### Trace ID Injection Flow ```mermaid flowchart LR - subgraph rippled["rippled process"] + subgraph xrpld["xrpld process"] JLOG["JLOG(j.info())"] Format["Logs::format()"] OTelCtx["OTel Context
(thread-local)"] @@ -436,7 +436,7 @@ flowchart LR Format --> LogLine - style rippled fill:#1a237e,stroke:#0d1642,color:#fff + style xrpld fill:#1a237e,stroke:#0d1642,color:#fff style output fill:#1b5e20,stroke:#0d3d14,color:#fff style JLOG fill:#283593,stroke:#1a237e,color:#fff style Format fill:#283593,stroke:#1a237e,color:#fff @@ -456,7 +456,7 @@ flowchart LR FR --> RP --> BP --> LE end - LogFile["rippled
debug.log"] --> FR + LogFile["xrpld
debug.log"] --> FR LE --> Loki["Grafana Loki
:3100"] Loki <-->|"derivedFields ↔
tracesToLogs"| Tempo["Grafana Tempo"] @@ -487,7 +487,7 @@ flowchart LR - [ ] Log lines within active spans contain `trace_id= span_id=` - [ ] Log lines outside spans have no trace context (no empty fields) -- [ ] Loki ingests rippled logs via OTel Collector filelog receiver +- [ ] Loki ingests xrpld logs via OTel Collector filelog receiver - [ ] Grafana Tempo → Loki one-click correlation works - [ ] Grafana Loki → Tempo reverse lookup works via derived field - [ ] Integration test verifies trace_id presence in logs diff --git a/OpenTelemetryPlan/09-data-collection-reference.md b/OpenTelemetryPlan/09-data-collection-reference.md index f9c0f5def7..ab7a9245ba 100644 --- a/OpenTelemetryPlan/09-data-collection-reference.md +++ b/OpenTelemetryPlan/09-data-collection-reference.md @@ -495,7 +495,7 @@ xrpld_State_Accounting_Full_duration > **Plan details**: [06-implementation-phases.md §6.8.1](./06-implementation-phases.md) — motivation, architecture, Mermaid diagrams > **Task breakdown**: [Phase8_taskList.md](./Phase8_taskList.md) — per-task implementation details -Phase 8 injects OTel trace context into rippled's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field: +Phase 8 injects OTel trace context into xrpld's `Logs::format()` output, enabling log-trace correlation. 
When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field: ### Log Format @@ -520,7 +520,7 @@ The trace context injection is implemented in `Logs::format()` (`src/libxrpl/bas ### Log Ingestion Pipeline ``` -rippled debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki +xrpld debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki ``` The OTel Collector's `filelog` receiver tails `debug.log` files and uses a `regex_parser` operator to extract structured fields: @@ -549,16 +549,16 @@ Grafana Loki (v2.9.0) serves as the log storage backend. It receives log entries ```logql # Find all logs for a specific trace -{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01" +{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01" # Error logs with trace context -{job="rippled"} |= "ERR" |= "trace_id=" +{job="xrpld"} |= "ERR" |= "trace_id=" # Logs from a specific partition with trace context -{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != "" +{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != "" # Count traced log lines over time -count_over_time({job="rippled"} |= "trace_id=" [5m]) +count_over_time({job="xrpld"} |= "trace_id=" [5m]) ``` --- diff --git a/OpenTelemetryPlan/Phase8_taskList.md b/OpenTelemetryPlan/Phase8_taskList.md index 32b19690f2..d7c4770584 100644 --- a/OpenTelemetryPlan/Phase8_taskList.md +++ b/OpenTelemetryPlan/Phase8_taskList.md @@ -1,6 +1,6 @@ # Phase 8: Log-Trace Correlation and Centralized Log Ingestion — Task List -> **Goal**: Inject trace context (trace_id, span_id) into rippled's Journal log output for log-trace correlation, and add OTel Collector filelog receiver to ingest logs into Grafana Loki for unified observability. 
+> **Goal**: Inject trace context (trace_id, span_id) into xrpld's Journal log output for log-trace correlation, and add OTel Collector filelog receiver to ingest logs into Grafana Loki for unified observability. > > **Scope**: Two independent sub-phases — 8a (code change: trace_id in logs) and 8b (infra only: filelog receiver to Loki). No changes to the `beast::Journal` public API. > @@ -89,7 +89,7 @@ ## Task 8.3: Add Filelog Receiver to OTel Collector -**Objective**: Configure the OTel Collector to tail rippled's log file and export to Loki. +**Objective**: Configure the OTel Collector to tail xrpld's log file and export to Loki. **What to do**: @@ -124,7 +124,7 @@ insecure: true ``` -- Mount rippled's log directory into the collector container via docker-compose volume +- Mount xrpld's log directory into the collector container via docker-compose volume **Key modified files**: @@ -172,7 +172,7 @@ **What to do**: - Edit `docker/telemetry/integration-test.sh`: - - After sending RPC requests (which create spans), grep rippled's log output for `trace_id=` + - After sending RPC requests (which create spans), grep xrpld's log output for `trace_id=` - Verify trace_id matches a trace visible in Tempo - Optionally: query Loki via API to confirm log ingestion @@ -225,7 +225,7 @@ - [ ] Log lines within active spans contain `trace_id= span_id=` - [ ] Log lines outside spans have no trace context (no empty fields) -- [ ] Loki ingests rippled logs via OTel Collector filelog receiver +- [ ] Loki ingests xrpld logs via OTel Collector filelog receiver - [ ] Grafana Tempo -> Loki one-click correlation works - [ ] Grafana Loki -> Tempo reverse lookup works via derived field - [ ] Integration test verifies trace_id presence in logs diff --git a/docker/telemetry/TESTING.md b/docker/telemetry/TESTING.md index e3a9525db5..418447e59f 100644 --- a/docker/telemetry/TESTING.md +++ b/docker/telemetry/TESTING.md @@ -469,14 +469,14 @@ Pre-configured datasources: ## Test 3: Log-Trace 
Correlation (Phase 8) -Phase 8 injects `trace_id` and `span_id` into rippled's log output when +Phase 8 injects `trace_id` and `span_id` into xrpld's log output when a log line is emitted within an active OTel span. This test verifies the end-to-end log-trace correlation pipeline. ### Step 1: Verify trace_id in log output After running Test 1 or Test 2 (which generate RPC spans), check the -rippled debug.log for trace context: +xrpld debug.log for trace context: ```bash grep 'trace_id=[a-f0-9]\{32\} span_id=[a-f0-9]\{16\}' /path/to/debug.log @@ -506,13 +506,13 @@ Expected result: `1` (the trace exists in Jaeger). ### Step 3: Verify Loki log ingestion -The OTel Collector's filelog receiver tails rippled's debug.log and +The OTel Collector's filelog receiver tails xrpld's debug.log and exports parsed entries to Loki. Verify Loki has received entries: ```bash -# Query Loki for any rippled logs +# Query Loki for any xrpld logs curl -sG "http://localhost:3100/loki/api/v1/query" \ - --data-urlencode 'query={job="rippled"}' \ + --data-urlencode 'query={job="xrpld"}' \ --data-urlencode 'limit=5' | jq '.data.result | length' ``` @@ -529,7 +529,7 @@ Expected: > 0 results. ### Step 5: Verify Grafana Loki-to-Tempo correlation 1. In Grafana **Explore**, select **Loki** datasource -2. Query: `{job="rippled"} |= "trace_id="` +2. Query: `{job="xrpld"} |= "trace_id="` 3. In the log results, click the **TraceID** derived field link 4. Verify it navigates to the full trace in Tempo @@ -588,7 +588,7 @@ Expected: > 0 results. ### No trace_id in log output (Phase 8) -1. Verify rippled was built with `telemetry=ON` (`-Dtelemetry=ON` in CMake) +1. Verify xrpld was built with `telemetry=ON` (`-Dtelemetry=ON` in CMake) 2. Verify `enabled=1` in the `[telemetry]` config section 3. Log lines only contain trace context when emitted inside an active span. 
Background logs (startup, periodic tasks outside spans) will not have diff --git a/docs/telemetry-runbook.md b/docs/telemetry-runbook.md index 2b0ad1f92e..1aa0462fee 100644 --- a/docs/telemetry-runbook.md +++ b/docs/telemetry-runbook.md @@ -487,7 +487,7 @@ Requires `trace_peer=1` in the `[telemetry]` config section. ## Log-Trace Correlation (Phase 8) -When rippled is built with `telemetry=ON`, log lines emitted within an active OpenTelemetry span automatically include `trace_id` and `span_id` fields: +When xrpld is built with `telemetry=ON`, log lines emitted within an active OpenTelemetry span automatically include `trace_id` and `span_id` fields: ``` 2024-01-15T10:30:45.123Z LedgerMaster:NFO trace_id=abc123def456789012345678abcdef01 span_id=0123456789abcdef Validated ledger 42 @@ -506,27 +506,27 @@ Log files are ingested by the OTel Collector's `filelog` receiver, which tails ` ```logql # Find all logs for a specific trace -{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01" +{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01" # Error logs with trace context (log lines with ERR severity that have a trace_id) -{job="rippled"} |= "ERR" |= "trace_id=" +{job="xrpld"} |= "ERR" |= "trace_id=" # All logs from a specific partition that were emitted during a span -{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != "" +{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != "" # Logs from the last hour containing trace context -{job="rippled"} |= "trace_id=" | regexp `(?P\S+):(?P\S+)\s+trace_id=(?P[a-f0-9]+)` +{job="xrpld"} |= "trace_id=" | regexp `(?P\S+):(?P\S+)\s+trace_id=(?P[a-f0-9]+)` # Count of traced vs untraced log lines -count_over_time({job="rippled"} |= "trace_id=" [5m]) +count_over_time({job="xrpld"} |= "trace_id=" [5m]) ``` ### Verifying Log Correlation -1. Start the observability stack and rippled with telemetry enabled. +1. 
Start the observability stack and xrpld with telemetry enabled. 2. Send an RPC request: `curl http://localhost:5005 -d '{"method":"server_info"}'` 3. Check the debug.log for `trace_id=` entries: `grep trace_id= /path/to/debug.log` -4. Open Grafana at http://localhost:3000 -> Explore -> Loki and search for `{job="rippled"} |= "trace_id="`. +4. Open Grafana at http://localhost:3000 -> Explore -> Loki and search for `{job="xrpld"} |= "trace_id="`. 5. Click the TraceID link to navigate to the corresponding trace in Tempo. ## Troubleshooting @@ -554,14 +554,14 @@ count_over_time({job="rippled"} |= "trace_id=" [5m]) ### No trace_id in log output -- Verify rippled was built with `telemetry=ON` (the `XRPL_ENABLE_TELEMETRY` preprocessor flag) +- Verify xrpld was built with `telemetry=ON` (the `XRPL_ENABLE_TELEMETRY` preprocessor flag) - Verify `enabled=1` in the `[telemetry]` config section - Log lines only contain `trace_id`/`span_id` when emitted inside an active span — background logs outside of RPC/consensus/transaction processing will not have trace context - Check that the specific trace category is enabled (e.g., `trace_rpc=1`) ### No logs in Loki -- Verify the log file mount in docker-compose.yml points to the correct rippled log directory +- Verify the log file mount in docker-compose.yml points to the correct xrpld log directory - Check OTel Collector logs for filelog receiver errors: `docker compose logs otel-collector` - Verify Loki is running: `curl http://localhost:3100/ready` - Check the filelog receiver glob pattern matches your log file paths
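
Reviewer note, not part of the patch: the docs touched here describe a log line whose optional `trace_id`/`span_id` fields follow the `partition:severity` pair. A minimal Python sketch of parsing that layout is below, which may help when checking the collector's `regex_parser` against real output. The field layout is assumed from the sample line in the runbook and the 32/16 hex widths from the `grep` in TESTING.md; the names `LOG_LINE` and `parse_log_line` are hypothetical and do not come from the xrpld codebase or the collector config.

```python
import re

# Assumed layout (from the runbook sample line, not from Logs::format() itself):
#   <timestamp> <partition>:<severity> [trace_id=<32 hex> span_id=<16 hex>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<partition>[^\s:]+):(?P<severity>\S+)\s+"
    # Trace context is optional: lines emitted outside a span carry none.
    r"(?:trace_id=(?P<trace_id>[a-f0-9]{32})\s+span_id=(?P<span_id>[a-f0-9]{16})\s+)?"
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not fit the assumed layout."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    traced = (
        "2024-01-15T10:30:45.123Z LedgerMaster:NFO "
        "trace_id=abc123def456789012345678abcdef01 "
        "span_id=0123456789abcdef Validated ledger 42"
    )
    print(parse_log_line(traced))
```

For a line emitted outside a span, `trace_id` and `span_id` come back as `None` while the other fields are still populated, which mirrors the "no empty fields" exit criterion in the task list.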