fix: address CI rename checks (rippled -> xrpld) in phase-10 docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-07-30 10:30:22 +00:00 · 2026-04-29 20:40:44 +01:00
parent 70d86d7ebf
commit b659d43395
5 changed files with 72 additions and 72 deletions
--- a/OpenTelemetryPlan/06-implementation-phases.md
+++ b/OpenTelemetryPlan/06-implementation-phases.md
@@ -846,7 +846,7 @@ flowchart LR

 ### Key Implementation Details

-- **Transaction submitter and RPC load generator** both use rippled's native WebSocket command format (`{"command": ...}`) — not JSON-RPC format. Response data lives inside `"result"` with `"status"` at the top level.
+- **Transaction submitter and RPC load generator** both use xrpld's native WebSocket command format (`{"command": ...}`) — not JSON-RPC format. Response data lives inside `"result"` with `"status"` at the top level.
 - **Node config** requires `[signing_support] true` for server-side signing, and `[ips]` (not `[ips_fixed]`) to ensure peer connections count in `Peer_Finder_Active_*` metrics.
 - **Metric validation** uses the Prometheus `/api/v1/series` endpoint (not instant queries) to avoid false negatives from stale StatsD gauges. Every metric in `expected_metrics.json` must have > 0 series.
 - **StatsD gauge fix**: `StatsDGaugeImpl` initializes `m_dirty = true` so all gauges emit their initial value on first flush. Without this, gauges starting at 0 that never change (e.g. `jobq_job_count`) would be invisible in Prometheus.
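
The metric-validation bullet above can be illustrated with a short sketch. It assumes a Prometheus base URL and uses hypothetical helper names (`series_url`, `count_series`, `missing_metrics`) that are not taken from `validate_telemetry.py`. It only shows how a `/api/v1/series` response could be reduced to a "more than 0 series" check.

```python
import json
from urllib.parse import urlencode


def series_url(base_url: str, metric: str) -> str:
    """Build a /api/v1/series URL that matches one metric name."""
    query = urlencode({"match[]": metric})
    return f"{base_url.rstrip('/')}/api/v1/series?{query}"


def count_series(response_body: str) -> int:
    """Count the series in a Prometheus /api/v1/series JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return 0
    return len(payload.get("data", []))


def missing_metrics(expected: list[str], bodies: dict[str, str]) -> list[str]:
    """Return the expected metric names whose response holds 0 series.

    `bodies` maps metric name to the raw response text; fetching it
    (for example with urllib.request) is left out of this sketch.
    """
    return [name for name in expected if count_series(bodies[name]) == 0]
```

Because the series endpoint lists every series seen in the queried time range, a gauge that was written once and has since gone stale still counts, which is the behaviour the bullet relies on.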
@@ -871,13 +871,13 @@ See [Phase10_taskList.md](./Phase10_taskList.md) for detailed per-task breakdown

 The validation suite (`validate_telemetry.py`) runs exactly 71 checks, broken down as:

-- **1 service registration** — `rippled` exists in Tempo
+- **1 service registration** — `xrpld` exists in Tempo
 - **17 span existence** — `rpc.request`, `rpc.process`, `rpc.ws_message`, `rpc.command.*`, `tx.process`, `tx.receive`, `tx.apply`, `consensus.proposal.send`, `consensus.ledger_close`, `consensus.accept`, `consensus.validation.send`, `consensus.accept.apply`, `ledger.build`, `ledger.validate`, `ledger.store`, `peer.proposal.receive`, `peer.validation.receive`
 - **14 span attribute** — required attributes on the 14 spans that define them (22 unique attributes total)
 - **2 span hierarchies** — `rpc.process` -> `rpc.command.*`, `ledger.build` -> `tx.apply` (1 skipped: `rpc.request` -> `rpc.process`, cross-thread)
 - **1 span duration bounds** — all spans > 0 and < 60 s
 - **26 metric existence** — 4 SpanMetrics (`traces_span_metrics_calls_total`, `..._duration_milliseconds_{bucket,count,sum}`), 6 StatsD gauges (`LedgerMaster_Validated_Ledger_Age`, `Published_Ledger_Age`, `State_Accounting_Full_duration`, `Peer_Finder_Active_{Inbound,Outbound}_Peers`, `jobq_job_count`), 2 StatsD counters (`rpc_requests_total`, `ledger_fetches_total`), 3 StatsD histograms (`rpc_time`, `rpc_size`, `ios_latency`), 4 overlay traffic (`total_Bytes_{In,Out}`, `total_Messages_{In,Out}`), 7 Phase 9 OTLP (`nodestore_state`, `cache_metrics`, `txq_metrics`, `rpc_method_{started,finished}_total`, `object_count`, `load_factor_metrics`)
-- **10 dashboard loads** — `rippled-rpc-perf`, `rippled-transactions`, `rippled-consensus`, `rippled-ledger-ops`, `rippled-peer-net`, `rippled-system-node-health`, `rippled-system-network`, `rippled-system-rpc`, `rippled-system-overlay-detail`, `rippled-system-ledger-sync`
+- **10 dashboard loads** — `xrpld-rpc-perf`, `xrpld-transactions`, `xrpld-consensus`, `xrpld-ledger-ops`, `xrpld-peer-net`, `xrpld-system-node-health`, `xrpld-system-network`, `xrpld-system-rpc`, `xrpld-system-overlay-detail`, `xrpld-system-ledger-sync`

 See [Phase10_taskList.md](./Phase10_taskList.md) for the full numbered check-by-check enumeration.
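
As a quick arithmetic cross-check of the breakdown above, the per-category counts can be summed. The dictionary names are illustrative only; the numbers are copied from the list.

```python
# Check counts per category, copied from the breakdown above.
CHECK_COUNTS = {
    "service registration": 1,
    "span existence": 17,
    "span attribute": 14,
    "span hierarchies": 2,
    "span duration bounds": 1,
    "metric existence": 26,
    "dashboard loads": 10,
}

# Sub-breakdown of the 26 metric-existence checks.
METRIC_COUNTS = {
    "SpanMetrics": 4,
    "StatsD gauges": 6,
    "StatsD counters": 2,
    "StatsD histograms": 3,
    "overlay traffic": 4,
    "Phase 9 OTLP": 7,
}

total_checks = sum(CHECK_COUNTS.values())
total_metrics = sum(METRIC_COUNTS.values())

# Both totals should agree with the figures stated in the text.
assert total_checks == 71
assert total_metrics == CHECK_COUNTS["metric existence"]
```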

--- a/OpenTelemetryPlan/Phase11_taskList.md
+++ b/OpenTelemetryPlan/Phase11_taskList.md
@@ -446,40 +446,40 @@ This phase addresses the cross-cutting gap identified during research: **xrpld h
 > **Upstream**: Phase 7 Tasks 7.9-7.16 (metrics), Phase 9 Tasks 9.11-9.13 (dashboards).
 > **Downstream**: None — terminal task in the parity chain.

-**Objective**: Add Grafana alerting rules for the Phase 7+ parity metrics (validation agreement, validator health, peer quality, state tracking, ledger economy). These complement Task 11.8's `xrpl_*` alerts by covering the `rippled_*` internal metrics.
+**Objective**: Add Grafana alerting rules for the Phase 7+ parity metrics (validation agreement, validator health, peer quality, state tracking, ledger economy). These complement Task 11.8's `xrpl_*` alerts by covering the `xrpld_*` internal metrics.

 **Critical Group** (8 rules, eval interval 10s):

-| Rule                | Condition                                                       | For |
-| ------------------- | --------------------------------------------------------------- | --- |
-| Agreement Below 90% | `rippled_validation_agreement{metric="agreement_pct_24h"} < 90` | 30s |
-| Not Proposing       | `rippled_state_tracking{metric="state_value"} < 6`              | 10s |
-| Unhealthy State     | `rippled_state_tracking{metric="state_value"} < 4`              | 10s |
-| Amendment Blocked   | `rippled_validator_health{metric="amendment_blocked"} == 1`     | 1m  |
-| UNL Expiring        | `rippled_validator_health{metric="unl_expiry_days"} < 14`       | 1h  |
-| High IO Latency     | `histogram_quantile(0.95, rippled_ios_latency_bucket) > 50`     | 1m  |
-| High Load Factor    | `rippled_load_factor_metrics{metric="load_factor"} > 1000`      | 1m  |
-| Peer Count Critical | `rippled_server_info{metric="peers"} < 5`                       | 1m  |
+| Rule                | Condition                                                     | For |
+| ------------------- | ------------------------------------------------------------- | --- |
+| Agreement Below 90% | `xrpld_validation_agreement{metric="agreement_pct_24h"} < 90` | 30s |
+| Not Proposing       | `xrpld_state_tracking{metric="state_value"} < 6`              | 10s |
+| Unhealthy State     | `xrpld_state_tracking{metric="state_value"} < 4`              | 10s |
+| Amendment Blocked   | `xrpld_validator_health{metric="amendment_blocked"} == 1`     | 1m  |
+| UNL Expiring        | `xrpld_validator_health{metric="unl_expiry_days"} < 14`       | 1h  |
+| High IO Latency     | `histogram_quantile(0.95, xrpld_ios_latency_bucket) > 50`     | 1m  |
+| High Load Factor    | `xrpld_load_factor_metrics{metric="load_factor"} > 1000`      | 1m  |
+| Peer Count Critical | `xrpld_server_info{metric="peers"} < 5`                       | 1m  |

 **Network Group** (3 rules, eval interval 10s):

-| Rule                      | Condition                                                           | For |
-| ------------------------- | ------------------------------------------------------------------- | --- |
-| Peer Drop >10%            | `delta(rippled_server_info{metric="peers"}[30s]) / ... * 100 < -10` | 30s |
-| Peer Drop >30%            | Same formula, threshold -30                                         | 30s |
-| P90 Latency + Disconnects | `peer_latency_p90_ms > 500 AND rate(disconnects) > 0`               | 2m  |
+| Rule                      | Condition                                                         | For |
+| ------------------------- | ----------------------------------------------------------------- | --- |
+| Peer Drop >10%            | `delta(xrpld_server_info{metric="peers"}[30s]) / ... * 100 < -10` | 30s |
+| Peer Drop >30%            | Same formula, threshold -30                                       | 30s |
+| P90 Latency + Disconnects | `peer_latency_p90_ms > 500 AND rate(disconnects) > 0`             | 2m  |

 **Performance Group** (7 rules, eval interval 10s):

-| Rule                | Condition                                                      | For |
-| ------------------- | -------------------------------------------------------------- | --- |
-| CPU High            | Per-core CPU > 80% (requires node_exporter)                    | 2m  |
-| Memory Critical     | Memory usage > 90% (requires node_exporter)                    | 1m  |
-| Disk Warning        | Disk usage > 85% (requires node_exporter)                      | 2m  |
-| Job Queue Overflow  | `rate(rippled_jq_trans_overflow_total[5m]) > 0`                | 1m  |
-| Upgrade Recommended | `rippled_peer_quality{metric="peers_higher_version_pct"} > 60` | 1m  |
-| TX Rate Drop        | Transaction rate dropped > 50% in 5m window                    | 5m  |
-| Stale Ledger        | `rippled_ledger_economy{metric="ledger_age_seconds"} > 30`     | 1m  |
+| Rule                | Condition                                                    | For |
+| ------------------- | ------------------------------------------------------------ | --- |
+| CPU High            | Per-core CPU > 80% (requires node_exporter)                  | 2m  |
+| Memory Critical     | Memory usage > 90% (requires node_exporter)                  | 1m  |
+| Disk Warning        | Disk usage > 85% (requires node_exporter)                    | 2m  |
+| Job Queue Overflow  | `rate(xrpld_jq_trans_overflow_total[5m]) > 0`                | 1m  |
+| Upgrade Recommended | `xrpld_peer_quality{metric="peers_higher_version_pct"} > 60` | 1m  |
+| TX Rate Drop        | Transaction rate dropped > 50% in 5m window                  | 5m  |
+| Stale Ledger        | `xrpld_ledger_economy{metric="ledger_age_seconds"} > 30`     | 1m  |

 **Notification channel templates**: Email/SMTP, Discord, Slack, PagerDuty.
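
How the "For" column interacts with the 10s evaluation interval can be sketched as follows. This is a simplified model of pending-then-firing alert behaviour, not Grafana's actual evaluator, and `fires_below` is a hypothetical helper: the condition has to hold on consecutive evaluations until the "For" duration has elapsed.

```python
def fires_below(samples, threshold, eval_interval_s, for_s):
    """Return True if the alert would fire for a `value < threshold` rule.

    `samples` holds one value per evaluation, oldest first. The rule turns
    pending on the first evaluation below the threshold and fires once it
    has stayed below for at least `for_s` seconds. Any evaluation at or
    above the threshold resets the pending state.
    """
    pending_since = None
    for i, value in enumerate(samples):
        if value < threshold:
            if pending_since is None:
                pending_since = i
            if (i - pending_since) * eval_interval_s >= for_s:
                return True
        else:
            pending_since = None
    return False


# "Agreement Below 90%": threshold 90, eval interval 10s, For 30s.
recovering = [95, 89, 88, 91, 89, 88]   # resets when the value recovers
sustained = [95, 89, 88, 87, 86]        # below 90 for 30s
print(fires_below(recovering, 90, 10, 30))
print(fires_below(sustained, 90, 10, 30))
```

Under this model a rule with `For` equal to the evaluation interval (for example "Not Proposing", 10s) needs two consecutive breaching evaluations before it fires.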

@@ -507,13 +507,13 @@ This phase addresses the cross-cutting gap identified during research: **xrpld h

 **Use case**: Real-time state panels (server state, ledger age, peer count) where 10-15s latency is too slow for operational dashboards.

-**Decision**: Document as a future option, not implement now. The current 10s interval is acceptable for v1. The external dashboard achieves 2-5s freshness by polling RPC directly, which is what the Phase 11 receiver already does. Adding a separate scrape endpoint to rippled would only be needed if sub-second metric freshness is required from the internal metrics pipeline.
+**Decision**: Document as a future option, not implement now. The current 10s interval is acceptable for v1. The external dashboard achieves 2-5s freshness by polling RPC directly, which is what the Phase 11 receiver already does. Adding a separate scrape endpoint to xrpld would only be needed if sub-second metric freshness is required from the internal metrics pipeline.

 **What to document**:

 - Architecture comparison: OTLP pipeline (10-15s) vs. direct scrape (2-5s) vs. push gateway
 - When to consider: operator feedback indicating 10s is insufficient for alerting SLOs
-- How to implement if needed: add `/metrics` HTTP endpoint to rippled with Prometheus client library
+- How to implement if needed: add `/metrics` HTTP endpoint to xrpld with Prometheus client library
 - Trade-offs: additional port, additional dependency, duplication with OTLP metrics

 **Key files**:
--- a/OpenTelemetryPlan/Phase9_taskList.md
+++ b/OpenTelemetryPlan/Phase9_taskList.md
@@ -127,10 +127,10 @@ These metrics serve multiple external consumer categories identified during rese
 **What to do**:

 - Register OTel instruments for PerfLog RPC counters (from `PerfLogImp.cpp` line ~63):
-  - Counter: `rippled_rpc_method_started_total{method="<name>"}` — calls started
-  - Counter: `rippled_rpc_method_finished_total{method="<name>"}` — calls completed
-  - Counter: `rippled_rpc_method_errored_total{method="<name>"}` — calls errored
-  - Histogram: `rippled_rpc_method_duration_us{method="<name>"}` — execution time distribution
+  - Counter: `xrpld_rpc_method_started_total{method="<name>"}` — calls started
+  - Counter: `xrpld_rpc_method_finished_total{method="<name>"}` — calls completed
+  - Counter: `xrpld_rpc_method_errored_total{method="<name>"}` — calls errored
+  - Histogram: `xrpld_rpc_method_duration_us{method="<name>"}` — execution time distribution

 - Use OTel `Counter<int64_t>` and `Histogram<double>` instruments with `method` attribute label.

@@ -154,11 +154,11 @@ These metrics serve multiple external consumer categories identified during rese
 **What to do**:

 - Register OTel instruments for PerfLog job counters:
-  - Counter: `rippled_job_queued_total{job_type="<name>"}` — jobs queued
-  - Counter: `rippled_job_started_total{job_type="<name>"}` — jobs started
-  - Counter: `rippled_job_finished_total{job_type="<name>"}` — jobs completed
-  - Histogram: `rippled_job_queued_duration_us{job_type="<name>"}` — time spent waiting in queue
-  - Histogram: `rippled_job_running_duration_us{job_type="<name>"}` — execution time distribution
+  - Counter: `xrpld_job_queued_total{job_type="<name>"}` — jobs queued
+  - Counter: `xrpld_job_started_total{job_type="<name>"}` — jobs started
+  - Counter: `xrpld_job_finished_total{job_type="<name>"}` — jobs completed
+  - Histogram: `xrpld_job_queued_duration_us{job_type="<name>"}` — time spent waiting in queue
+  - Histogram: `xrpld_job_running_duration_us{job_type="<name>"}` — execution time distribution

 - Hook into PerfLog's existing job tracking alongside Task 9.4.

@@ -180,15 +180,15 @@ These metrics serve multiple external consumer categories identified during rese
 **What to do**:

 - Register OTel `ObservableGauge` callbacks for `CountedObject<T>` instance counts:
-  - `rippled_object_count{type="Transaction"}` — live Transaction objects
-  - `rippled_object_count{type="Ledger"}` — live Ledger objects
-  - `rippled_object_count{type="NodeObject"}` — live NodeObject instances
-  - `rippled_object_count{type="STTx"}` — serialized transaction objects
-  - `rippled_object_count{type="STLedgerEntry"}` — serialized ledger entries
-  - `rippled_object_count{type="InboundLedger"}` — ledgers being fetched
-  - `rippled_object_count{type="Pathfinder"}` — active pathfinding computations
-  - `rippled_object_count{type="PathRequest"}` — active path requests
-  - `rippled_object_count{type="HashRouterEntry"}` — hash router entries
+  - `xrpld_object_count{type="Transaction"}` — live Transaction objects
+  - `xrpld_object_count{type="Ledger"}` — live Ledger objects
+  - `xrpld_object_count{type="NodeObject"}` — live NodeObject instances
+  - `xrpld_object_count{type="STTx"}` — serialized transaction objects
+  - `xrpld_object_count{type="STLedgerEntry"}` — serialized ledger entries
+  - `xrpld_object_count{type="InboundLedger"}` — ledgers being fetched
+  - `xrpld_object_count{type="Pathfinder"}` — active pathfinding computations
+  - `xrpld_object_count{type="PathRequest"}` — active path requests
+  - `xrpld_object_count{type="HashRouterEntry"}` — hash router entries

 - The `CountedObject` template already tracks these via atomic counters. The callback just reads the current counts.

--- a/docker/telemetry/workload/README.md
+++ b/docker/telemetry/workload/README.md
@@ -1,11 +1,11 @@
 # Telemetry Workload Tools

-Synthetic workload generation and validation tools for rippled's OpenTelemetry telemetry stack. These tools validate that all spans, metrics, dashboards, and log-trace correlation work end-to-end under controlled load.
+Synthetic workload generation and validation tools for xrpld's OpenTelemetry telemetry stack. These tools validate that all spans, metrics, dashboards, and log-trace correlation work end-to-end under controlled load.

 ## Quick Start

 ```bash
-# Build rippled with telemetry enabled
+# Build xrpld with telemetry enabled
 conan install . --build=missing -o telemetry=True
 cmake --preset default -Dtelemetry=ON
 cmake --build --preset default
@@ -19,7 +19,7 @@ docker/telemetry/workload/run-full-validation.sh --cleanup

 ## Architecture

-The validation suite runs a multi-node rippled cluster as local processes alongside
+The validation suite runs a multi-node xrpld cluster as local processes alongside
 a Docker Compose telemetry stack. The cluster exercises consensus, peer-to-peer
 spans (proposals, validations), and all metric pipelines.

@@ -108,7 +108,7 @@ Custom `"weights"` override the default command/transaction distribution.

 ### run-full-validation.sh

-Orchestrates the complete validation pipeline. Starts the telemetry stack, starts a multi-node rippled cluster, generates load, and validates the results.
+Orchestrates the complete validation pipeline. Starts the telemetry stack, starts a multi-node xrpld cluster, generates load, and validates the results.

 ```bash
 # Full validation with defaults (uses full-validation profile)
@@ -146,7 +146,7 @@ python3 workload_orchestrator.py --profile stress --report /tmp/report.json
 ### rpc_load_generator.py

 Generates RPC traffic matching realistic production distribution. Uses
-rippled's **native WebSocket command format** (`{"command": ...}`) with flat
+xrpld's **native WebSocket command format** (`{"command": ...}`) with flat
 parameters — the same format as `tx_submitter.py`.

 - 40% health checks (server_info, fee)
@@ -172,7 +172,7 @@ python3 rpc_load_generator.py --endpoints ws://localhost:6006 \
 ### tx_submitter.py

 Submits diverse transaction types to exercise the full span and metric surface.
-Uses rippled's **native WebSocket command format** (`{"command": ...}`) rather
+Uses xrpld's **native WebSocket command format** (`{"command": ...}`) rather
 than JSON-RPC format. The response payload is inside the `"result"` key, with
 `"status"` at the top level.
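
The difference between the two request shapes, and where the response payload sits, can be sketched like this. The helper names are hypothetical and are not taken from `tx_submitter.py`. The WebSocket connection itself is omitted, so the functions only build and parse message text.

```python
import json


def build_ws_command(command: str, **params) -> str:
    """Serialize a native WebSocket command with flat parameters."""
    return json.dumps({"command": command, **params})


def build_jsonrpc_request(method: str, **params) -> str:
    """Serialize the JSON-RPC shape, shown only for contrast."""
    return json.dumps({"method": method, "params": [params]})


def parse_ws_response(raw: str) -> dict:
    """Return the `result` payload of a WebSocket response.

    `status` is read from the top level of the message, and the
    command-specific data is taken from the `result` key.
    """
    message = json.loads(raw)
    if message.get("status") != "success":
        raise RuntimeError(f"command failed: {message.get('error', 'unknown')}")
    return message["result"]


request = build_ws_command("account_info", account="rEXAMPLE")  # placeholder account
reply = '{"status": "success", "type": "response", "result": {"validated": true}}'
print(request)
print(parse_ws_response(reply))
```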

@@ -310,7 +310,7 @@ Categories:
 The validation runs as a GitHub Actions workflow (`.github/workflows/telemetry-validation.yml`):

 - Triggered manually or on pushes to telemetry branches
-- Builds rippled, starts the full stack, runs load, validates
+- Builds xrpld, starts the full stack, runs load, validates
 - Uploads reports as artifacts
 - Posts summary to PR

--- a/tasks/fix-validation-checks.md
+++ b/tasks/fix-validation-checks.md
@@ -16,11 +16,11 @@ CI run: https://github.com/XRPLF/rippled/actions/runs/23026466191
 **Symptoms:**

 ```
-[FAIL] metric.statsd_gauges.rippled_LedgerMaster_Validated_Ledger_Age: 0 series
-[FAIL] metric.statsd_counters.rippled_rpc_requests: 0 series
-[FAIL] metric.statsd_histograms.rippled_rpc_time: 0 series
-[FAIL] metric.overlay_traffic.rippled_total_Bytes_In: 0 series
-[FAIL] metric.phase9_nodestore.rippled_nodestore_reads_total: 0 series
+[FAIL] metric.statsd_gauges.xrpld_LedgerMaster_Validated_Ledger_Age: 0 series
+[FAIL] metric.statsd_counters.xrpld_rpc_requests: 0 series
+[FAIL] metric.statsd_histograms.xrpld_rpc_time: 0 series
+[FAIL] metric.overlay_traffic.xrpld_total_Bytes_In: 0 series
+[FAIL] metric.phase9_nodestore.xrpld_nodestore_reads_total: 0 series
 ... (25 total)
 ```

@@ -32,7 +32,7 @@ CI run: https://github.com/XRPLF/rippled/actions/runs/23026466191
   the validation harness configures xrpld nodes with `server=statsd`.

 2. **Metric name mismatch:** The `expected_metrics.json` expects StatsD-style metric
-   names (e.g., `rippled_LedgerMaster_Validated_Ledger_Age`). When using `server=otel`,
+   names (e.g., `xrpld_LedgerMaster_Validated_Ledger_Age`). When using `server=otel`,
   beast::insight emits OTLP metrics which may have different names/structure.

 **Fix Options (pick one):**
@@ -127,24 +127,24 @@ individual checks), but the parent-child relationship isn't established.
 **Symptoms:**

 ```
-[FAIL] dashboard.rippled-statsd-node-health: HTTP 404
-[FAIL] dashboard.rippled-statsd-network: HTTP 404
-[FAIL] dashboard.rippled-statsd-rpc: HTTP 404
-[FAIL] dashboard.rippled-statsd-overlay-detail: HTTP 404
-[FAIL] dashboard.rippled-statsd-ledger-sync: HTTP 404
+[FAIL] dashboard.xrpld-statsd-node-health: HTTP 404
+[FAIL] dashboard.xrpld-statsd-network: HTTP 404
+[FAIL] dashboard.xrpld-statsd-rpc: HTTP 404
+[FAIL] dashboard.xrpld-statsd-overlay-detail: HTTP 404
+[FAIL] dashboard.xrpld-statsd-ledger-sync: HTTP 404
 ```

-**Root Cause:** Dashboard UIDs were renamed from `rippled-statsd-*` to `rippled-system-*`
+**Root Cause:** Dashboard UIDs were renamed from `xrpld-statsd-*` to `xrpld-system-*`
 but `expected_metrics.json` still references the old names.

 **Actual UIDs in `docker/telemetry/grafana/dashboards/`:**
 | Expected (in expected_metrics.json) | Actual (in dashboard JSON) |
 |-------------------------------------|-------------------------------|
-| `rippled-statsd-node-health` | `rippled-system-node-health` |
-| `rippled-statsd-network` | `rippled-system-network` |
-| `rippled-statsd-rpc` | `rippled-system-rpc` |
-| `rippled-statsd-overlay-detail` | `rippled-system-overlay-detail` |
-| `rippled-statsd-ledger-sync` | `rippled-system-ledger-sync` |
+| `xrpld-statsd-node-health` | `xrpld-system-node-health` |
+| `xrpld-statsd-network` | `xrpld-system-network` |
+| `xrpld-statsd-rpc` | `xrpld-system-rpc` |
+| `xrpld-statsd-overlay-detail` | `xrpld-system-overlay-detail` |
+| `xrpld-statsd-ledger-sync` | `xrpld-system-ledger-sync` |

 **Fix:** Update the 5 UIDs in `expected_metrics.json` → `grafana_dashboards.uids[]`.
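
A minimal sketch of that fix, assuming `expected_metrics.json` stores the UIDs as a list of strings under `grafana_dashboards.uids` (the path named above). The function name and the in-memory sample are illustrative; reading and writing the real file is left out.

```python
import copy


def rename_statsd_uids(config: dict) -> dict:
    """Return a copy of the config with `-statsd-` UIDs renamed to `-system-`.

    UIDs that do not contain `-statsd-` are left untouched.
    """
    updated = copy.deepcopy(config)
    uids = updated["grafana_dashboards"]["uids"]
    updated["grafana_dashboards"]["uids"] = [
        uid.replace("-statsd-", "-system-", 1) for uid in uids
    ]
    return updated


# Illustrative sample, not the real file contents.
sample = {
    "grafana_dashboards": {
        "uids": [
            "xrpld-rpc-perf",
            "xrpld-statsd-node-health",
            "xrpld-statsd-ledger-sync",
        ]
    }
}
fixed = rename_statsd_uids(sample)
print(fixed["grafana_dashboards"]["uids"])
```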