Revert meter wire format change, defer |m -> |c fix as separate task

The StatsDCollector |m -> |c change is a breaking change for existing
StatsD backends. Reverted to original |m and added TODO comments noting
that Resource Manager warn/drop metrics need this fix to flow through
the OTel StatsD receiver. Marked Phase 6 Task 6.1 as deferred.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pratik Mankawde
2026-02-26 16:23:49 +00:00
parent bac93cceaf
commit aa1bbd3273
4 changed files with 25 additions and 16 deletions

@@ -212,24 +212,26 @@ rippled has a mature metrics framework (`beast::insight`) that emits StatsD-form
### Tasks
| Task | Description | Effort | Risk |
| ---- | ---------------------------------------------------------------- | ------ | ---- |
| 6.1 | Fix Meter wire format (`\|m` → `\|c`) in StatsDCollector.cpp | 0.5d | Low |
| 6.2 | Add `statsd` receiver to OTel Collector config | 0.5d | Low |
| 6.3 | Expose UDP port 8125 in docker-compose.yml | 0.1d | Low |
| 6.4 | Add `[insight]` config to integration test node configs | 0.5d | Low |
| 6.5 | Create "Node Health" Grafana dashboard (8 panels) | 1d | Low |
| 6.6 | Create "Network Traffic" Grafana dashboard (8 panels) | 1d | Low |
| 6.7 | Create "RPC & Pathfinding (StatsD)" Grafana dashboard (8 panels) | 1d | Low |
| 6.8 | Update integration test to verify StatsD metrics in Prometheus | 0.5d | Low |
| 6.9 | Update TESTING.md and telemetry-runbook.md | 0.5d | Low |
| Task | Description | Effort | Risk |
| ---- | --------------------------------------------------------------------------------------------------------------- | ------ | ---- |
| 6.1 | **DEFERRED** Fix Meter wire format (`\|m` → `\|c`) in StatsDCollector.cpp (breaking change, tracked separately) | 0.5d | Low |
| 6.2 | Add `statsd` receiver to OTel Collector config | 0.5d | Low |
| 6.3 | Expose UDP port 8125 in docker-compose.yml | 0.1d | Low |
| 6.4 | Add `[insight]` config to integration test node configs | 0.5d | Low |
| 6.5 | Create "Node Health" Grafana dashboard (8 panels) | 1d | Low |
| 6.6 | Create "Network Traffic" Grafana dashboard (8 panels) | 1d | Low |
| 6.7 | Create "RPC & Pathfinding (StatsD)" Grafana dashboard (8 panels) | 1d | Low |
| 6.8 | Update integration test to verify StatsD metrics in Prometheus | 0.5d | Low |
| 6.9 | Update TESTING.md and telemetry-runbook.md | 0.5d | Low |
**Total Effort**: 5.6 days
### Wire Format Fix (Task 6.1)
### Wire Format Fix (Task 6.1) — DEFERRED
The `StatsDMeterImpl` in `StatsDCollector.cpp:706` sends metrics with `|m` suffix, which is non-standard StatsD. The OTel StatsD receiver silently drops these. Fix: change `|m` to `|c` (counter), which is semantically correct since meters are increment-only counters. Only 2 metrics are affected (`warn`, `drop` in Resource Manager).
**Status**: Deferred to a separate change, because this fix breaks any StatsD backend that previously consumed the custom `|m` type. The Resource Warnings and Resource Drops dashboard panels will show no data until this fix is applied.
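To make the drop behaviour concrete, here is a minimal sketch of the type check described above. It assumes the receiver accepts only the type suffixes listed in the collector config comment of this commit (`c`, `g`, `ms`, `h`, `s`). `statsdType` and `isStandardType` are hypothetical helpers for illustration, not part of rippled or the OTel Collector, and the metric names in the usage note are made up.

```cpp
#include <array>
#include <string_view>

// Hypothetical helper: return the type suffix that follows the first '|'
// in a StatsD line such as "rippled.warn:3|m", or an empty view if the
// line has no '|'. Anything after a second '|' (for example a sample
// rate) and a trailing newline are ignored.
std::string_view
statsdType(std::string_view line)
{
    auto const bar = line.find('|');
    if (bar == std::string_view::npos)
        return {};
    auto const rest = line.substr(bar + 1);
    return rest.substr(0, rest.find_first_of("|\n"));
}

// Assumption: the set of recognized types is exactly the one named in
// the TODO comment of the collector config (|c, |g, |ms, |h, |s).
bool
isStandardType(std::string_view type)
{
    constexpr std::array<std::string_view, 5> standard{
        "c", "g", "ms", "h", "s"};
    for (auto const s : standard)
    {
        if (s == type)
            return true;
    }
    return false;
}
```

Under these assumptions, `isStandardType(statsdType("rippled.warn:3|m\n"))` is false, so the line would be dropped, while the same line with `|c` passes the check.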
### New Grafana Dashboards
**Node Health** (`statsd-node-health.json`, uid: `rippled-statsd-node-health`):
@@ -249,7 +251,7 @@ The `StatsDMeterImpl` in `StatsDCollector.cpp:706` sends metrics with `|m` suffi
- [ ] StatsD metrics visible in Prometheus (`curl localhost:9090/api/v1/query?query=rippled_LedgerMaster_Validated_Ledger_Age`)
- [ ] All 3 new Grafana dashboards load without errors
- [ ] Integration test verifies at least core StatsD metrics (ledger age, peer counts, RPC requests)
- [ ] Meter metrics (`warn`, `drop`) flow correctly after `|m` → `|c` fix
- [ ] ~~Meter metrics (`warn`, `drop`) flow correctly after `|m` → `|c` fix~~ DEFERRED (breaking change, tracked separately)
---

@@ -185,7 +185,7 @@
},
{
"title": "Resource Warnings Rate",
"description": "Rate of resource warning events from the Resource Manager. Sourced from the warn meter (Logic.h:33) which increments when a consumer (peer or RPC client) exceeds the warning threshold for resource usage. A rising rate indicates aggressive clients that may need throttling.",
"description": "Rate of resource warning events from the Resource Manager. Sourced from the warn meter (Logic.h:33) which increments when a consumer (peer or RPC client) exceeds the warning threshold for resource usage. A rising rate indicates aggressive clients that may need throttling. NOTE: This panel will show no data until the |m -> |c fix is applied in StatsDCollector.cpp:706 (Phase 6 Task 6.1).",
"type": "stat",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 24 },
"options": {
@@ -214,7 +214,7 @@
},
{
"title": "Resource Drops Rate",
"description": "Rate of resource drop events from the Resource Manager. Sourced from the drop meter (Logic.h:34) which increments when a consumer is disconnected or blocked due to excessive resource usage. Non-zero values mean the node is actively rejecting abusive connections.",
"description": "Rate of resource drop events from the Resource Manager. Sourced from the drop meter (Logic.h:34) which increments when a consumer is disconnected or blocked due to excessive resource usage. Non-zero values mean the node is actively rejecting abusive connections. NOTE: This panel will show no data until the |m -> |c fix is applied in StatsDCollector.cpp:706 (Phase 6 Task 6.1).",
"type": "stat",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 24 },
"options": {

@@ -11,6 +11,13 @@
# rippled also sends beast::insight metrics via StatsD/UDP to port 8125.
# These are ingested by the statsd receiver and merged into the same
# Prometheus endpoint alongside span-derived metrics.
#
# TODO: The Resource Manager's "warn" and "drop" metrics use the non-standard
# "|m" (meter) StatsD type in StatsDCollector.cpp:706. The OTel StatsD
# receiver silently drops "|m" metrics since it only recognizes standard
# types (|c, |g, |ms, |h, |s). To capture these two metrics, change "|m"
# to "|c" in StatsDCollector.cpp — this is a breaking change for any
# backend that relied on the custom "|m" type. Tracked as Phase 6 Task 6.1.
receivers:
otlp:

@@ -703,7 +703,7 @@ StatsDMeterImpl::flush()
{
m_dirty = false;
std::stringstream ss;
ss << m_impl->prefix() << "." << m_name << ":" << m_value << "|c"
ss << m_impl->prefix() << "." << m_name << ":" << m_value << "|m"
<< "\n";
m_value = 0;
m_impl->post_buffer(ss.str());
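
For reference, the wire line built in this hunk can be reproduced in isolation. The sketch below assumes a plain string prefix and an unsigned counter value. `formatMeterLine` is a hypothetical stand-in for the `ss << ...` expression in `StatsDMeterImpl::flush()`, not a function in rippled, and the prefix and metric name in the usage note are made up.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the stream expression in flush(): builds
// "<prefix>.<name>:<value>|<type>\n". The type suffix is a parameter
// here only so that the "|m" and "|c" variants can be compared.
std::string
formatMeterLine(
    std::string const& prefix,
    std::string const& name,
    std::uint64_t value,
    std::string const& type)
{
    std::stringstream ss;
    ss << prefix << "." << name << ":" << value << "|" << type << "\n";
    return ss.str();
}
```

With these made-up inputs, `formatMeterLine("rippled", "warn", 3, "m")` yields `rippled.warn:3|m` followed by a newline, which is the form this commit restores. Passing `"c"` gives the counter form that Task 6.1 defers.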