mirror of
https://github.com/XRPLF/rippled.git
synced 2026-02-27 01:02:32 +00:00
Revert meter wire format change, defer |m -> |c fix as separate task
The StatsDCollector |m -> |c change is a breaking change for existing StatsD backends. Reverted to original |m and added TODO comments noting that Resource Manager warn/drop metrics need this fix to flow through the OTel StatsD receiver. Marked Phase 6 Task 6.1 as deferred.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -212,24 +212,26 @@ rippled has a mature metrics framework (`beast::insight`) that emits StatsD-form
 
 ### Tasks
 
-| Task | Description | Effort | Risk |
-| ---- | ---------------------------------------------------------------- | ------ | ---- |
-| 6.1 | Fix Meter wire format (`\|m` → `\|c`) in StatsDCollector.cpp | 0.5d | Low |
-| 6.2 | Add `statsd` receiver to OTel Collector config | 0.5d | Low |
-| 6.3 | Expose UDP port 8125 in docker-compose.yml | 0.1d | Low |
-| 6.4 | Add `[insight]` config to integration test node configs | 0.5d | Low |
-| 6.5 | Create "Node Health" Grafana dashboard (8 panels) | 1d | Low |
-| 6.6 | Create "Network Traffic" Grafana dashboard (8 panels) | 1d | Low |
-| 6.7 | Create "RPC & Pathfinding (StatsD)" Grafana dashboard (8 panels) | 1d | Low |
-| 6.8 | Update integration test to verify StatsD metrics in Prometheus | 0.5d | Low |
-| 6.9 | Update TESTING.md and telemetry-runbook.md | 0.5d | Low |
+| Task | Description | Effort | Risk |
+| ---- | --------------------------------------------------------------------------------------------------------------- | ------ | ---- |
+| 6.1 | **DEFERRED** Fix Meter wire format (`\|m` → `\|c`) in StatsDCollector.cpp — breaking change, tracked separately | 0.5d | Low |
+| 6.2 | Add `statsd` receiver to OTel Collector config | 0.5d | Low |
+| 6.3 | Expose UDP port 8125 in docker-compose.yml | 0.1d | Low |
+| 6.4 | Add `[insight]` config to integration test node configs | 0.5d | Low |
+| 6.5 | Create "Node Health" Grafana dashboard (8 panels) | 1d | Low |
+| 6.6 | Create "Network Traffic" Grafana dashboard (8 panels) | 1d | Low |
+| 6.7 | Create "RPC & Pathfinding (StatsD)" Grafana dashboard (8 panels) | 1d | Low |
+| 6.8 | Update integration test to verify StatsD metrics in Prometheus | 0.5d | Low |
+| 6.9 | Update TESTING.md and telemetry-runbook.md | 0.5d | Low |
 
 **Total Effort**: 5.6 days
 
-### Wire Format Fix (Task 6.1)
+### Wire Format Fix (Task 6.1) — DEFERRED
 
 The `StatsDMeterImpl` in `StatsDCollector.cpp:706` sends metrics with `|m` suffix, which is non-standard StatsD. The OTel StatsD receiver silently drops these. Fix: change `|m` to `|c` (counter), which is semantically correct since meters are increment-only counters. Only 2 metrics are affected (`warn`, `drop` in Resource Manager).
 
+**Status**: Deferred as a separate change — this is a breaking change for any StatsD backend that previously consumed the custom `|m` type. The Resource Warnings and Resource Drops dashboard panels will show no data until this fix is applied.
+
 ### New Grafana Dashboards
 
 **Node Health** (`statsd-node-health.json`, uid: `rippled-statsd-node-health`):
@@ -249,7 +251,7 @@ The `StatsDMeterImpl` in `StatsDCollector.cpp:706` sends metrics with `|m` suffi
 - [ ] StatsD metrics visible in Prometheus (`curl localhost:9090/api/v1/query?query=rippled_LedgerMaster_Validated_Ledger_Age`)
 - [ ] All 3 new Grafana dashboards load without errors
 - [ ] Integration test verifies at least core StatsD metrics (ledger age, peer counts, RPC requests)
-- [ ] Meter metrics (`warn`, `drop`) flow correctly after `|m` → `|c` fix
+- [ ] ~~Meter metrics (`warn`, `drop`) flow correctly after `|m` → `|c` fix~~ — DEFERRED (breaking change, tracked separately)
 
 ---
 
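As a rough illustration of why the `|m` lines never reach Prometheus, the sketch below classifies a StatsD line by its type token and accepts only the standard types named in this commit's collector config TODO (`c`, `g`, `ms`, `h`, `s`). The function name `standardStatsdType`, the sample metric name `rippled.warn`, and the parsing details are assumptions made for illustration; this is not the OTel StatsD receiver's code.

```cpp
#include <array>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of rippled or the OTel Collector).
// Given a StatsD line of the form "name:value|type[|@rate]", return the type
// token if it is one of the standard types, or std::nullopt otherwise.
std::optional<std::string>
standardStatsdType(std::string_view line)
{
    static constexpr std::array<std::string_view, 5> known = {
        "c", "g", "ms", "h", "s"};

    // Ignore a trailing newline such as the one the meter flush appends.
    while (!line.empty() && (line.back() == '\n' || line.back() == '\r'))
        line.remove_suffix(1);

    auto const colon = line.find(':');
    if (colon == std::string_view::npos)
        return std::nullopt;

    auto const bar = line.find('|', colon + 1);
    if (bar == std::string_view::npos)
        return std::nullopt;

    // The type runs up to the next '|' (sample rate, tags) or end of line.
    auto type = line.substr(bar + 1);
    type = type.substr(0, type.find('|'));

    for (auto const k : known)
        if (type == k)
            return std::string(k);

    return std::nullopt;  // e.g. the custom "m" meter type
}
```

Under these assumptions, `standardStatsdType("rippled.warn:3|c\n")` yields `"c"`, while the same line with `|m` yields no value, which corresponds to the metric being dropped.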
@@ -185,7 +185,7 @@
 },
 {
 "title": "Resource Warnings Rate",
-"description": "Rate of resource warning events from the Resource Manager. Sourced from the warn meter (Logic.h:33) which increments when a consumer (peer or RPC client) exceeds the warning threshold for resource usage. A rising rate indicates aggressive clients that may need throttling.",
+"description": "Rate of resource warning events from the Resource Manager. Sourced from the warn meter (Logic.h:33) which increments when a consumer (peer or RPC client) exceeds the warning threshold for resource usage. A rising rate indicates aggressive clients that may need throttling. NOTE: This panel will show no data until the |m -> |c fix is applied in StatsDCollector.cpp:706 (Phase 6 Task 6.1).",
 "type": "stat",
 "gridPos": { "h": 8, "w": 12, "x": 0, "y": 24 },
 "options": {
@@ -214,7 +214,7 @@
 },
 {
 "title": "Resource Drops Rate",
-"description": "Rate of resource drop events from the Resource Manager. Sourced from the drop meter (Logic.h:34) which increments when a consumer is disconnected or blocked due to excessive resource usage. Non-zero values mean the node is actively rejecting abusive connections.",
+"description": "Rate of resource drop events from the Resource Manager. Sourced from the drop meter (Logic.h:34) which increments when a consumer is disconnected or blocked due to excessive resource usage. Non-zero values mean the node is actively rejecting abusive connections. NOTE: This panel will show no data until the |m -> |c fix is applied in StatsDCollector.cpp:706 (Phase 6 Task 6.1).",
 "type": "stat",
 "gridPos": { "h": 8, "w": 12, "x": 12, "y": 24 },
 "options": {
@@ -11,6 +11,13 @@
 # rippled also sends beast::insight metrics via StatsD/UDP to port 8125.
 # These are ingested by the statsd receiver and merged into the same
 # Prometheus endpoint alongside span-derived metrics.
+#
+# TODO: The Resource Manager's "warn" and "drop" metrics use the non-standard
+# "|m" (meter) StatsD type in StatsDCollector.cpp:706. The OTel StatsD
+# receiver silently drops "|m" metrics since it only recognizes standard
+# types (|c, |g, |ms, |h, |s). To capture these two metrics, change "|m"
+# to "|c" in StatsDCollector.cpp — this is a breaking change for any
+# backend that relied on the custom "|m" type. Tracked as Phase 6 Task 6.1.
 
 receivers:
   otlp:
@@ -703,7 +703,7 @@ StatsDMeterImpl::flush()
 {
 m_dirty = false;
 std::stringstream ss;
-ss << m_impl->prefix() << "." << m_name << ":" << m_value << "|c"
+ss << m_impl->prefix() << "." << m_name << ":" << m_value << "|m"
 << "\n";
 m_value = 0;
 m_impl->post_buffer(ss.str());
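To make the wire-level effect of this one-token revert concrete, here is a minimal sketch of the line the flush path builds, with the type suffix as a parameter. `formatStatsdLine` is a hypothetical stand-in, not a function in `StatsDCollector.cpp`, and the prefix and metric name used below are illustrative assumptions.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the formatting shown in the hunk above:
// "<prefix>.<name>:<value>|<type>\n".
std::string
formatStatsdLine(
    std::string const& prefix,
    std::string const& name,
    std::uint64_t value,
    std::string const& type)
{
    std::stringstream ss;
    ss << prefix << "." << name << ":" << value << "|" << type << "\n";
    return ss.str();
}
```

With an assumed prefix of `rippled` and a meter named `warn` holding the value 3, type `"m"` (the behavior this commit restores) produces `rippled.warn:3|m`, and type `"c"` (the deferred fix) produces `rippled.warn:3|c`, each followed by a newline. Everything before the final token is identical, so the breaking-change concern is confined to how backends interpret that token.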