rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-06-05 09:46:53 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	61c2760296	cosmetic updates Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 14:32:13 +01:00
Pratik Mankawde	88ac4b6aee	fix(telemetry): use short unit for NodeStore and object-count panels The phase-9 NodeStore I/O totals, write-load/read-queue, read-threads, and object instance-count panels rendered large cumulative values with unit "none". Switch to "short" for readable abbreviation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 14:27:53 +01:00
Pratik Mankawde	90f7a8bd4e	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 14:26:16 +01:00
Pratik Mankawde	a5f80514a9	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 14:26:16 +01:00
Pratik Mankawde	45ab508ed8	fix(telemetry): use short unit for large count/message panels Count and message-volume panels (operating-mode transitions, job queue depth, network/overlay message totals, getobject message counts) used unit "none", rendering large values as raw unscaled numbers. Switch to "short" so Grafana abbreviates (e.g. 1.5 Mil) for readability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 14:26:03 +01:00
Pratik Mankawde	a6cebf21b0	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # docker/telemetry/grafana/dashboards/system-node-health.json	2026-06-04 14:06:46 +01:00
Pratik Mankawde	6c71aa8c2a	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 14:05:25 +01:00
Pratik Mankawde	9b46a343fc	fix(telemetry): migrate system dashboards from dead rippled_ to xrpld_ metrics The system-* dashboards queried the legacy StatsD rippled_ prefix, but the node now emits beast::insight metrics via native OTLP under the xrpld_ prefix (config: [insight] server=otel, prefix=xrpld). All queries returned no data. Migration (names derived from C++ beast::insight registrations, not live Prometheus, since a syncing node does not emit every metric yet): - rippled_ -> xrpld_ prefix across all panel queries and template variables (including the $node variable query, which broke the whole dashboard filter) - Histogram Event instruments export with unit ms, so bare _bucket becomes _milliseconds_bucket: ios_latency, rpc_time, rpc_size, pathfind_fast/full - Job-type metrics were StatsD summaries (label quantile="$quantile"); on the OTLP path they are histograms. Converted those queries to histogram_quantile($quantile, rate(xrpld_<job>_milliseconds_bucket[5m])) and added the previously-undefined $quantile template variable - Per-job-type detail panels: __name__ regex now matches _milliseconds_bucket No panels removed. Panels for metrics not yet emitted (e.g. warn/drop, or job types the syncing node has not run) show no data until the path executes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 14:01:13 +01:00
Pratik Mankawde	10b4112382	fix(telemetry): use p75/p99 quantiles and add gauge panels for job/rpc latency P100 from a histogram is degenerate — it always returns the upper bound of the highest populated bucket (a single slow outlier pins it to the top boundary), producing a flat line. Revert to meaningful quantiles: - Job Queue Wait Time / Job Execution Time: p75 (typical) + p99 (tail) - Per-Job-Type / Per-Method: p99 - Added gauge panels showing current p99 with green/yellow/red thresholds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 12:46:58 +01:00
Pratik Mankawde	859bd21ca5	only render p100. Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 12:16:48 +01:00
Pratik Mankawde	15d3e3a375	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 11:28:04 +01:00
Pratik Mankawde	0fe09cda9b	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-04 11:28:04 +01:00
Pratik Mankawde	a9cc1067d0	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 11:28:04 +01:00
Pratik Mankawde	194f5b8af8	fix(telemetry): set ms unit on duration heatmap y-axes The three duration heatmaps (transaction, consensus accept, RPC latency) had an axisLabel of "Duration (ms)" but no unit code, so y-axis tick values rendered unscaled. Set unit=ms on both the yAxis options and panel defaults so buckets display as proper millisecond values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 11:27:46 +01:00
Pratik Mankawde	37c9168065	fix(telemetry): correct invalid 'us' unit code to 'µs' on duration panels Grafana does not recognize 'us' as a unit code, so microsecond values rendered as raw numbers with a plain 'us' suffix (no scaling). The correct code is 'µs'. Affects job-queue and OTel RPC latency panels backed by *_duration_us histograms. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 11:26:43 +01:00
Pratik Mankawde	373012e84d	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 10:55:36 +01:00
Pratik Mankawde	8f9fa52f93	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 10:55:35 +01:00
Pratik Mankawde	fb7c3bc38d	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # docker/telemetry/grafana/dashboards/transaction-overview.json	2026-06-04 10:55:27 +01:00
Pratik Mankawde	8e606bbaf4	feat(telemetry): add tx_type/ter_result/txq_status dashboard filters Adds template variables $tx_type, $ter_result, $txq_status to the Transaction Overview dashboard. All relevant panels now respect these filters, enabling operators to drill into specific transaction types or result codes. Changes: - Panel 2 renamed to "Transaction Processing Latency by Type" (now shows p95/p50 per tx_type instead of aggregate) - Panels 1,3,4,5,7,9,12 filter by $tx_type - Panel 10 filters by $tx_type and $ter_result - Panel 11 filters by $txq_status - Removed redundant "TX Processing Latency by Type (p95)" panel Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 10:55:11 +01:00
Pratik Mankawde	40fba327cf	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 10:53:56 +01:00
Pratik Mankawde	811b934004	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 10:53:55 +01:00
Pratik Mankawde	c80038fd42	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-04 10:53:55 +01:00
Pratik Mankawde	7397bbcdd2	feat(telemetry): add tx_type/ter_result/txq_status dashboard filters Adds template variables $tx_type, $ter_result, $txq_status to the Transaction Overview dashboard. All relevant panels now respect these filters, enabling operators to drill into specific transaction types or result codes. Changes: - Panel 2 renamed to "Transaction Processing Latency by Type" (now shows p95/p50 per tx_type instead of aggregate) - Panels 1,3,4,5,7,9,12 filter by $tx_type - Panel 10 filters by $tx_type and $ter_result - Panel 11 filters by $txq_status - Removed redundant "TX Processing Latency by Type (p95)" panel Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 10:53:45 +01:00
Pratik Mankawde	8259026a25	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 10:47:47 +01:00
Pratik Mankawde	9947a52e79	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 10:47:47 +01:00
Pratik Mankawde	ee2f1b4fbf	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-04 10:47:47 +01:00
Pratik Mankawde	2627ea7f65	feat(telemetry): add TX Processing Latency by Type panel to dashboard Shows p95 latency of tx.process span broken down by tx_type. Works for both received and locally-processed transactions, unlike the tx.transactor panel which requires the node to be synced and applying. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 10:47:33 +01:00
Pratik Mankawde	a675897aaf	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Resolve consensus dashboard conflict and remove duplicate consensus_state dimension in collector config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:53:10 +01:00
Pratik Mankawde	f60c995fe1	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 16:52:00 +01:00
Pratik Mankawde	fff8598a33	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 16:52:00 +01:00
Pratik Mankawde	ac1805f0a4	feat(telemetry): add spanmetrics dimensions and dashboard panels for enriched attrs Collector config: add tx_type, ter_result, txq_status, consensus_state, load_type, is_batch as spanmetrics dimensions so they appear as Prometheus labels for dashboard queries. New dashboard panels: - Transaction Overview: Rate by Type, Results by Type, TxQ Status (pie), Transactor Duration p95 by Type - Consensus Health: Outcome Distribution (pie), Failures Over Time - RPC Performance: Resource Cost by Command, Batch vs Single Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:51:51 +01:00
Pratik Mankawde	11717a5431	build fixed Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 18:13:10 +01:00
Pratik Mankawde	615d339f84	fix(docs): apply rename scripts — prefix=rippled to prefix=xrpld The check-rename CI job requires all rename scripts to have been run. The telemetry config files had 'prefix=rippled' which should be 'xrpld'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-01 17:03:27 +01:00
Pratik Mankawde	4d6ddb5f1f	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:56:09 +01:00
Pratik Mankawde	ba7e1f98e4	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:24:43 +01:00
Pratik Mankawde	d7579b2861	formatting changes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:21:00 +01:00
Pratik Mankawde	088848e7ab	formatting updates Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:20:08 +01:00
Pratik Mankawde	e7dea147cd	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:18:36 +01:00
Pratik Mankawde	8d730b8b9a	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:16:35 +01:00
Pratik Mankawde	2f96c6547c	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:51:31 +01:00
Pratik Mankawde	c187a62353	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:47:15 +01:00
Pratik Mankawde	c848e51e13	Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:44:07 +01:00
Pratik Mankawde	3a1f22583f	Merge branch 'pratik/otel-phase1a-plan-docs' into pratik/otel-phase1b-telemetry-infra Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 15:34:22 +01:00
Pratik Mankawde	7ac5343119	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 16:09:41 +01:00
Pratik Mankawde	c6c019ed8b	addressed code review comments Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 15:55:25 +01:00
Pratik Mankawde	4bd1176df5	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 11:38:05 +01:00
Pratik Mankawde	9498b2865f	fix(telemetry): address PR #6424 review comments - Drop xrpl.node.amendment_blocked / xrpl.node.server_state from telemetry surface (constants in SpanNames.h, two filters in tempo.yaml). Operators read the same data via server_info / server_state RPC; OTel SDK 1.18.0 cannot refresh resource attrs at runtime so resource-level emission was not viable either. - Namespace all pathfind span attributes under pathfind_* (underscore form per Phase 1c rule 5). Renames in PathFindSpanNames.h and call sites in PathRequest.cpp, PathRequestManager.cpp, plus the rule-5 retention xrpl.pathfind.ledger_index -> pathfind_ledger_index. - Wire pathfind_source_account / pathfind_dest_account on pathfind.request in doPathFind / doRipplePathFind handlers (only when present + string). - Collapse per-asset pathfind.discover / pathfind.rank spans into one pathfind.discover hoisted around the per-source-asset loop in PathRequest::findPaths. Span count goes from 2N to 1 per RPC call; per-asset breakdown traded for bounded storage and cardinality. Trade-off documented inline. - Fix pathfind_num_paths semantics: now sums getBestPaths().size() across the loop (paths actually returned) instead of the maxPaths input cap. - PathRequestManager::updateAll: move span creation after the locked requests_ snapshot, early-return when no active subscriptions exist (avoids empty span on every ledger close), set pathfind_num_requests = requests.size(). - Update Phase2_taskList.md and 02-design-decisions.md to match. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 11:27:29 +01:00
Pratik Mankawde	ce04dac32e	consensus total per round time panel added Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-27 14:54:36 +01:00
Pratik Mankawde	0330d037ef	connection to mainnet added Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-27 14:53:29 +01:00
Ayaz Salikhov	23d0812827	style: Use shfmt instead of bashate (#7326)	2026-05-26 18:28:23 +00:00


144 Commits