Mirror of https://github.com/XRPLF/rippled.git (synced 2026-06-03 16:56:48 +00:00)
fix: address CI rename checks (rippled -> xrpld) in phase-8 docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -391,14 +391,14 @@ The `StatsDMeterImpl` in `StatsDCollector.cpp:706` sends metrics with `|m` suffi
 
 ### Motivation
 
-rippled's `beast::Journal` logs and OpenTelemetry traces are currently two disjoint observability signals. When investigating an issue, operators must manually correlate timestamps between log files and Jaeger/Tempo traces. Phase 8 bridges this gap by injecting trace context (`trace_id`, `span_id`) into every log line emitted within an active span, and ingesting those logs into Grafana Loki via the OTel Collector's filelog receiver.
+xrpld's `beast::Journal` logs and OpenTelemetry traces are currently two disjoint observability signals. When investigating an issue, operators must manually correlate timestamps between log files and Jaeger/Tempo traces. Phase 8 bridges this gap by injecting trace context (`trace_id`, `span_id`) into every log line emitted within an active span, and ingesting those logs into Grafana Loki via the OTel Collector's filelog receiver.
 
 #### Gains
 
 1. **One-click trace-to-log navigation** — Click a trace in Tempo/Jaeger and immediately see the corresponding log lines in Loki, filtered by `trace_id`.
 2. **Reverse lookup (log-to-trace)** — Loki derived fields make `trace_id` values clickable links back to Tempo.
 3. **Unified observability** — All three pillars (traces, metrics, logs) flow through the same OTel Collector pipeline and are visible in a single Grafana instance.
-4. **Zero new dependencies in rippled** — Uses existing OTel SDK headers (`GetSpan`, `GetContext`) already linked in Phase 1.
+4. **Zero new dependencies in xrpld** — Uses existing OTel SDK headers (`GetSpan`, `GetContext`) already linked in Phase 1.
 5. **Negligible overhead** — `GetSpan()` + `GetContext()` are thread-local reads (<10ns/call). At ~1000 JLOG calls/min, this adds <10us/min.
 
 #### Losses / Risks
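As a rough check of the figure in the "Negligible overhead" item in the hunk above, the arithmetic can be worked through directly. This is a minimal sketch that uses only the two numbers quoted in the text (an upper bound of 10 ns per call and roughly 1000 JLOG calls per minute); the function name is hypothetical and not part of the codebase.

```python
def added_overhead_us_per_min(ns_per_call: float, calls_per_min: float) -> float:
    """Upper bound on added logging overhead, in microseconds per minute."""
    return ns_per_call * calls_per_min / 1_000.0  # 1 microsecond = 1000 nanoseconds


# Figures taken from the text: <10 ns per call, ~1000 JLOG calls per minute.
bound = added_overhead_us_per_min(10, 1000)
print(f"upper bound: {bound:.0f} us/min")
```

Because both inputs are upper bounds or estimates, the result is only an order-of-magnitude check of the quoted "<10us/min".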
@@ -416,13 +416,13 @@ The correlation value far outweighs the risks. The log format change is backward
 Phase 8 has two independent sub-phases that can be developed in parallel:
 
 - **Phase 8a (code change)**: Modify `Logs::format()` in `src/libxrpl/basics/Log.cpp` to append `trace_id=<hex32> span_id=<hex16>` when the current thread has an active OTel span. Guarded by `#ifdef XRPL_ENABLE_TELEMETRY`.
-- **Phase 8b (infra only)**: Add Loki to the Docker Compose stack, configure the OTel Collector's `filelog` receiver to tail rippled's log file, parse out structured fields (timestamp, partition, severity, trace_id, span_id, message), and export to Loki via OTLP. Configure Grafana Tempo↔Loki bidirectional linking.
+- **Phase 8b (infra only)**: Add Loki to the Docker Compose stack, configure the OTel Collector's `filelog` receiver to tail xrpld's log file, parse out structured fields (timestamp, partition, severity, trace_id, span_id, message), and export to Loki via OTLP. Configure Grafana Tempo↔Loki bidirectional linking.
 
 #### Trace ID Injection Flow
 
 ```mermaid
 flowchart LR
-subgraph rippled["rippled process"]
+subgraph xrpld["xrpld process"]
 JLOG["JLOG(j.info())"]
 Format["Logs::format()"]
 OTelCtx["OTel Context<br/>(thread-local)"]
@@ -436,7 +436,7 @@ flowchart LR
 
 Format --> LogLine
 
-style rippled fill:#1a237e,stroke:#0d1642,color:#fff
+style xrpld fill:#1a237e,stroke:#0d1642,color:#fff
 style output fill:#1b5e20,stroke:#0d3d14,color:#fff
 style JLOG fill:#283593,stroke:#1a237e,color:#fff
 style Format fill:#283593,stroke:#1a237e,color:#fff
@@ -456,7 +456,7 @@ flowchart LR
 FR --> RP --> BP --> LE
 end
 
-LogFile["rippled<br/>debug.log"] --> FR
+LogFile["xrpld<br/>debug.log"] --> FR
 LE --> Loki["Grafana Loki<br/>:3100"]
 Loki <-->|"derivedFields ↔<br/>tracesToLogs"| Tempo["Grafana Tempo"]
 
@@ -487,7 +487,7 @@ flowchart LR
 
 - [ ] Log lines within active spans contain `trace_id=<hex> span_id=<hex>`
 - [ ] Log lines outside spans have no trace context (no empty fields)
-- [ ] Loki ingests rippled logs via OTel Collector filelog receiver
+- [ ] Loki ingests xrpld logs via OTel Collector filelog receiver
 - [ ] Grafana Tempo → Loki one-click correlation works
 - [ ] Grafana Loki → Tempo reverse lookup works via derived field
 - [ ] Integration test verifies trace_id presence in logs
@@ -495,7 +495,7 @@ xrpld_State_Accounting_Full_duration
 > **Plan details**: [06-implementation-phases.md §6.8.1](./06-implementation-phases.md) — motivation, architecture, Mermaid diagrams
 > **Task breakdown**: [Phase8_taskList.md](./Phase8_taskList.md) — per-task implementation details
 
-Phase 8 injects OTel trace context into rippled's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field:
+Phase 8 injects OTel trace context into xrpld's `Logs::format()` output, enabling log-trace correlation. When a log line is emitted within an active OTel span, the trace and span identifiers are automatically appended after the severity field:
 
 ### Log Format
 
@@ -520,7 +520,7 @@ The trace context injection is implemented in `Logs::format()` (`src/libxrpl/bas
 ### Log Ingestion Pipeline
 
 ```
-rippled debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki
+xrpld debug.log -> OTel Collector filelog receiver -> regex_parser -> Loki exporter -> Grafana Loki
 ```
 
 The OTel Collector's `filelog` receiver tails `debug.log` files and uses a `regex_parser` operator to extract structured fields:
@@ -549,16 +549,16 @@ Grafana Loki (v2.9.0) serves as the log storage backend. It receives log entries
 
 ```logql
 # Find all logs for a specific trace
-{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01"
+{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01"
 
 # Error logs with trace context
-{job="rippled"} |= "ERR" |= "trace_id="
+{job="xrpld"} |= "ERR" |= "trace_id="
 
 # Logs from a specific partition with trace context
-{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
+{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
 
 # Count traced log lines over time
-count_over_time({job="rippled"} |= "trace_id=" [5m])
+count_over_time({job="xrpld"} |= "trace_id=" [5m])
 ```
 
 ---
@@ -1,6 +1,6 @@
 # Phase 8: Log-Trace Correlation and Centralized Log Ingestion — Task List
 
-> **Goal**: Inject trace context (trace_id, span_id) into rippled's Journal log output for log-trace correlation, and add OTel Collector filelog receiver to ingest logs into Grafana Loki for unified observability.
+> **Goal**: Inject trace context (trace_id, span_id) into xrpld's Journal log output for log-trace correlation, and add OTel Collector filelog receiver to ingest logs into Grafana Loki for unified observability.
 >
 > **Scope**: Two independent sub-phases — 8a (code change: trace_id in logs) and 8b (infra only: filelog receiver to Loki). No changes to the `beast::Journal` public API.
 >
@@ -89,7 +89,7 @@
 
 ## Task 8.3: Add Filelog Receiver to OTel Collector
 
-**Objective**: Configure the OTel Collector to tail rippled's log file and export to Loki.
+**Objective**: Configure the OTel Collector to tail xrpld's log file and export to Loki.
 
 **What to do**:
 
@@ -124,7 +124,7 @@
 insecure: true
 ```
 
-- Mount rippled's log directory into the collector container via docker-compose volume
+- Mount xrpld's log directory into the collector container via docker-compose volume
 
 **Key modified files**:
 
@@ -172,7 +172,7 @@
 **What to do**:
 
 - Edit `docker/telemetry/integration-test.sh`:
-- After sending RPC requests (which create spans), grep rippled's log output for `trace_id=`
+- After sending RPC requests (which create spans), grep xrpld's log output for `trace_id=`
 - Verify trace_id matches a trace visible in Tempo
 - Optionally: query Loki via API to confirm log ingestion
 
@@ -225,7 +225,7 @@
 
 - [ ] Log lines within active spans contain `trace_id=<hex> span_id=<hex>`
 - [ ] Log lines outside spans have no trace context (no empty fields)
-- [ ] Loki ingests rippled logs via OTel Collector filelog receiver
+- [ ] Loki ingests xrpld logs via OTel Collector filelog receiver
 - [ ] Grafana Tempo -> Loki one-click correlation works
 - [ ] Grafana Loki -> Tempo reverse lookup works via derived field
 - [ ] Integration test verifies trace_id presence in logs
@@ -469,14 +469,14 @@ Pre-configured datasources:
 
 ## Test 3: Log-Trace Correlation (Phase 8)
 
-Phase 8 injects `trace_id` and `span_id` into rippled's log output when
+Phase 8 injects `trace_id` and `span_id` into xrpld's log output when
 a log line is emitted within an active OTel span. This test verifies the
 end-to-end log-trace correlation pipeline.
 
 ### Step 1: Verify trace_id in log output
 
 After running Test 1 or Test 2 (which generate RPC spans), check the
-rippled debug.log for trace context:
+xrpld debug.log for trace context:
 
 ```bash
 grep 'trace_id=[a-f0-9]\{32\} span_id=[a-f0-9]\{16\}' /path/to/debug.log
@@ -506,13 +506,13 @@ Expected result: `1` (the trace exists in Jaeger).
 
 ### Step 3: Verify Loki log ingestion
 
-The OTel Collector's filelog receiver tails rippled's debug.log and
+The OTel Collector's filelog receiver tails xrpld's debug.log and
 exports parsed entries to Loki. Verify Loki has received entries:
 
 ```bash
-# Query Loki for any rippled logs
+# Query Loki for any xrpld logs
 curl -sG "http://localhost:3100/loki/api/v1/query" \
---data-urlencode 'query={job="rippled"}' \
+--data-urlencode 'query={job="xrpld"}' \
 --data-urlencode 'limit=5' | jq '.data.result | length'
 ```
 
@@ -529,7 +529,7 @@ Expected: > 0 results.
 ### Step 5: Verify Grafana Loki-to-Tempo correlation
 
 1. In Grafana **Explore**, select **Loki** datasource
-2. Query: `{job="rippled"} |= "trace_id="`
+2. Query: `{job="xrpld"} |= "trace_id="`
 3. In the log results, click the **TraceID** derived field link
 4. Verify it navigates to the full trace in Tempo
 
@@ -588,7 +588,7 @@ Expected: > 0 results.
 
 ### No trace_id in log output (Phase 8)
 
-1. Verify rippled was built with `telemetry=ON` (`-Dtelemetry=ON` in CMake)
+1. Verify xrpld was built with `telemetry=ON` (`-Dtelemetry=ON` in CMake)
 2. Verify `enabled=1` in the `[telemetry]` config section
 3. Log lines only contain trace context when emitted inside an active span.
 Background logs (startup, periodic tasks outside spans) will not have
@@ -487,7 +487,7 @@ Requires `trace_peer=1` in the `[telemetry]` config section.
 
 ## Log-Trace Correlation (Phase 8)
 
-When rippled is built with `telemetry=ON`, log lines emitted within an active OpenTelemetry span automatically include `trace_id` and `span_id` fields:
+When xrpld is built with `telemetry=ON`, log lines emitted within an active OpenTelemetry span automatically include `trace_id` and `span_id` fields:
 
 ```
 2024-01-15T10:30:45.123Z LedgerMaster:NFO trace_id=abc123def456789012345678abcdef01 span_id=0123456789abcdef Validated ledger 42
@@ -506,27 +506,27 @@ Log files are ingested by the OTel Collector's `filelog` receiver, which tails `
 
 ```logql
 # Find all logs for a specific trace
-{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01"
+{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01"
 
 # Error logs with trace context (log lines with ERR severity that have a trace_id)
-{job="rippled"} |= "ERR" |= "trace_id="
+{job="xrpld"} |= "ERR" |= "trace_id="
 
 # All logs from a specific partition that were emitted during a span
-{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
+{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
 
 # Logs from the last hour containing trace context
-{job="rippled"} |= "trace_id=" | regexp `(?P<partition>\S+):(?P<sev>\S+)\s+trace_id=(?P<tid>[a-f0-9]+)`
+{job="xrpld"} |= "trace_id=" | regexp `(?P<partition>\S+):(?P<sev>\S+)\s+trace_id=(?P<tid>[a-f0-9]+)`
 
 # Count of traced vs untraced log lines
-count_over_time({job="rippled"} |= "trace_id=" [5m])
+count_over_time({job="xrpld"} |= "trace_id=" [5m])
 ```
 
 ### Verifying Log Correlation
 
-1. Start the observability stack and rippled with telemetry enabled.
+1. Start the observability stack and xrpld with telemetry enabled.
 2. Send an RPC request: `curl http://localhost:5005 -d '{"method":"server_info"}'`
 3. Check the debug.log for `trace_id=` entries: `grep trace_id= /path/to/debug.log`
-4. Open Grafana at http://localhost:3000 -> Explore -> Loki and search for `{job="rippled"} |= "trace_id="`.
+4. Open Grafana at http://localhost:3000 -> Explore -> Loki and search for `{job="xrpld"} |= "trace_id="`.
 5. Click the TraceID link to navigate to the corresponding trace in Tempo.
 
 ## Troubleshooting
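The `regexp` stages in the LogQL queries above and the `grep` in the verification steps both depend on the same line shape: `partition:severity`, followed by `trace_id=` and `span_id=`. The sketch below applies that pattern offline to the sample log line from the Log-Trace Correlation section. It is an illustration only: the function and group names are hypothetical, and the fixed widths (32 and 16 hex characters) are taken from the `<hex32>`/`<hex16>` description in the Phase 8a text, not from the collector's actual `regex_parser` configuration.

```python
import re
from typing import Optional

# Assumed line shape: "<timestamp> <partition>:<severity> trace_id=<hex32> span_id=<hex16> <message>"
TRACE_RE = re.compile(
    r"(?P<partition>\S+):(?P<sev>\S+)\s+"
    r"trace_id=(?P<trace_id>[a-f0-9]{32})\s+span_id=(?P<span_id>[a-f0-9]{16})"
)


def extract_trace_context(line: str) -> Optional[dict]:
    """Return partition, severity, trace_id and span_id, or None if the line has no trace context."""
    match = TRACE_RE.search(line)
    return match.groupdict() if match else None


traced = (
    "2024-01-15T10:30:45.123Z LedgerMaster:NFO "
    "trace_id=abc123def456789012345678abcdef01 span_id=0123456789abcdef "
    "Validated ledger 42"
)
untraced = "2024-01-15T10:30:45.123Z LedgerMaster:NFO Validated ledger 42"

print(extract_trace_context(traced))
print(extract_trace_context(untraced))
```

A line emitted outside an active span yields `None`, which matches the expectation that such lines carry no trace fields at all rather than empty ones.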
@@ -554,14 +554,14 @@ count_over_time({job="rippled"} |= "trace_id=" [5m])
 
 ### No trace_id in log output
 
-- Verify rippled was built with `telemetry=ON` (the `XRPL_ENABLE_TELEMETRY` preprocessor flag)
+- Verify xrpld was built with `telemetry=ON` (the `XRPL_ENABLE_TELEMETRY` preprocessor flag)
 - Verify `enabled=1` in the `[telemetry]` config section
 - Log lines only contain `trace_id`/`span_id` when emitted inside an active span — background logs outside of RPC/consensus/transaction processing will not have trace context
 - Check that the specific trace category is enabled (e.g., `trace_rpc=1`)
 
 ### No logs in Loki
 
-- Verify the log file mount in docker-compose.yml points to the correct rippled log directory
+- Verify the log file mount in docker-compose.yml points to the correct xrpld log directory
 - Check OTel Collector logs for filelog receiver errors: `docker compose logs otel-collector`
 - Verify Loki is running: `curl http://localhost:3100/ready`
 - Check the filelog receiver glob pattern matches your log file paths
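When scripting the "No logs in Loki" checks, the label selector has to be URL-encoded, which the testing guide's `curl -sG ... --data-urlencode` call handles implicitly. A minimal sketch of the same encoding step, assuming the `/loki/api/v1/query` endpoint and port 3100 shown in the hunks above; the helper name is hypothetical and the sketch only builds the URL, without contacting Loki.

```python
from urllib.parse import urlencode


def loki_query_url(base_url: str, logql: str, limit: int = 5) -> str:
    """Build a URL for Loki's query endpoint with the LogQL expression URL-encoded."""
    params = urlencode({"query": logql, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query?{params}"


# The job label follows the renamed selector used in this commit.
url = loki_query_url("http://localhost:3100", '{job="xrpld"} |= "trace_id="')
print(url)
```

The resulting URL can be passed to any HTTP client; an empty `data.result` array in the response would point back to the mount and glob-pattern checks listed above.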