rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-07-30 18:40:28 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	18121a8cf4	fix(telemetry): widen only timeseries with right-side table legends Correct the width rule from the previous layout commit. Full width (w=24) is now applied ONLY to timeseries panels whose legend is a right-side table, since those legends need the horizontal room. Panels with default/bottom legends, pie charts, and the heatmap return to half width. This narrows "Transaction Receive vs Suppressed" and "TxQ Enqueue Rate by Transaction Type", which were wrongly widened. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 16:24:06 +01:00
Pratik Mankawde	93c31573c5	refactor(telemetry): stable panel ids and topic-grouped layout Make transaction-overview deep-links stable and improve readability: - Assign explicit sequential panel ids (1..20) so viewPanel=panel-N URLs stay pinned to the same chart across edits. Previously ids were unset and Grafana auto-assigned them by array position, so any reorder silently repointed bookmarks. - Move the single-value stat panel (Transaction Apply Failed Rate) to the top row. - Lay out in three topic sections (Processing, Apply Pipeline, Queue). Within each, timeseries with a breakdown dimension (tx_type, stage, ter_result, suppressed) take full width so their right-side table legends are readable; single-series panels, pie charts, and the heatmap stay half-width and pair up. All six template variables already default to All (includeAll + multi); no change needed there. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 16:02:47 +01:00
Pratik Mankawde	ceea9e49dd	fix(telemetry): standardize transaction-overview legends and cap tooltips Apply the dashboard legend convention across all panels now that the P50 series have been removed (P95-only): - Drop the redundant "P95 " / "P50 " prefix; the panel title already states the percentile. - Put every filter/dimension value inside [] comma-separated, ending with exported_instance, e.g. "AMMDeposit [Preclaim, xrpld-mainnet]". - Add exported_instance to the by() clause and legend of the three panels that filtered on $node but omitted it (Transaction Rate by Type, Transaction Results by Type, TxQ Accept Status), so per-node series are produced. - Title-case the stage value for display via label_replace in the four apply-pipeline panels; the span attribute stays lowercase (preflight/preclaim/apply) since legendFormat cannot change case. - Cap tooltip maxHeight at 500 on every panel. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 15:06:52 +01:00
Pratik Mankawde	db4d70bbc2	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-05 13:40:19 +01:00
Pratik Mankawde	b8dd848899	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-05 13:40:18 +01:00
Pratik Mankawde	b321792a14	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-05 13:40:18 +01:00
Pratik Mankawde	72642b5dc6	feat(telemetry): add tx apply latency panel by type and stage The existing apply-pipeline panels show latency by stage (all types combined) or by type (single span). Neither answers "for a given transaction type, which stage dominates its latency". Add a p95 panel grouped by both tx_type and stage, filterable via the $tx_type and $stage variables. Both dimensions already exist in spanmetrics, so no collector change is needed. Reflow the section so the full-width failure panel sits below the new full-width panel. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 13:39:59 +01:00
Pratik Mankawde	f37a4a1022	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # src/xrpld/app/misc/detail/TxQ.cpp	2026-06-05 12:49:38 +01:00
Pratik Mankawde	8f3974c094	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-05 12:48:40 +01:00
Pratik Mankawde	283fbaa54f	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # OpenTelemetryPlan/09-data-collection-reference.md	2026-06-05 12:48:31 +01:00
Pratik Mankawde	3167a49f41	feat(telemetry): derive per-stage tx metrics from apply-pipeline spans Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim, tx.transactor) added on phase-3 through the observability stack so the spanmetrics connector produces per-stage RED metrics without any native instruments. - collector: add the `stage` dimension to the spanmetrics connector so the three stages split into separate metric series (3 bounded values). - dashboard: add a "Tx Apply Pipeline" section to transaction-overview with rate, p95 latency, and failure-rate panels grouped by stage, plus a `stage` template variable. Panels follow the existing config (node filter, exported_instance legends, Title Case, axis labels). - The failure panel filters ter_result != tesSUCCESS rather than span status, because a failing ter code completes the span normally — only thrown exceptions set an error status. This matches the existing "Transaction Results by Type" panel convention. - docs: document the spans, attributes, and stage dimension in the data collection reference and runbook, including the sampling caveat that span-derived metrics inherit tracer head-sampling and undercount at sampling_ratio < 1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 12:42:53 +01:00
Pratik Mankawde	759d3506b2	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-05 11:58:59 +01:00
Pratik Mankawde	021300538a	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-05 11:58:49 +01:00
Pratik Mankawde	a71d6635e6	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-05 11:58:43 +01:00
Pratik Mankawde	3df7e9cba6	code review changes and wire unused attributes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-05 11:42:33 +01:00
Pratik Mankawde	6a16dfa823	clang-tidy and formatting changes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-05 11:25:29 +01:00
Pratik Mankawde	6428c9f13c	feat(telemetry): add preflight/preclaim stage spans and stage attribute The tx.transactor span covered only the apply stage; preflight and preclaim had no telemetry, so a transaction that hard-failed those stages produced no apply-pipeline span and per-stage latency/failure was invisible. Add tx.preflight and tx.preclaim spans in applySteps.cpp via a makeStageSpan() helper using SpanGuard::hashSpan, so all three stages share a deterministic trace_id derived from txID[0:16] even though they run sequentially and often cross-thread. Each span carries stage, tx_type, and ter_result; exceptions are recorded as tefEXCEPTION before the public wrappers map them. The type lookup is guarded behind the span-active check so it costs nothing when tracing is off. Add a stage="apply" attribute to the tx.transactor span and move its three hardcoded attribute strings to a new library-safe header include/xrpl/tx/detail/TxApplySpanNames.h, which mirrors the daemon-side TxSpanNames.h strings so the collector spanmetrics connector aggregates both span sets under one dimension set. A constants-contract test pins the span-name, attribute-key, and stage-value strings; span content stays covered by the docker integration test, as the rest of the telemetry suite is. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 11:11:55 +01:00
Pratik Mankawde	d7e847a53b	removed p50 renders from all dashboards Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 18:11:23 +01:00
Pratik Mankawde	c3bdcb4291	clang-tidy include Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 18:02:47 +01:00
Pratik Mankawde	478b58395b	loop levelization Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 17:54:52 +01:00
Pratik Mankawde	cf075888ff	docs(telemetry): fix TraceQL/LogQL query syntax in runbook - Replace all `{name="..."} \| attr = val` pipeline queries with the correct `{name="..." && span.attr = val}` inline filter syntax - Add `span.` prefix to all span attribute references; `duration`, `status`, `name`, and `resource.` keep no prefix - Fix Loki stream selector: `{job="xrpld"}` → `{service_name="xrpld"}` in all LogQL examples and the verification step - Fix cross-node queries: `rootServiceName` → `resource.service.name`, `{name=~"tx\\.."} \| attr` → `{name =~ "tx." && span.attr}` - Add DEX section (OfferCreate variants by ter_result, OfferCancel, peer relay) - Add syntax cheat-sheet block at top of Insights section - Expand tx workflow: per-AMM-type queries, Payment tecPATH., TrustSet, OracleSet, NFTokenMint cross-span - Expand consensus: slow rounds, validation send+receive comparison - Expand cross-subsystem: AMM cross-span, tx.receive no-error - Expand TxQ: retried status, NFToken enqueue type - Update Where-to-Look table: add AMM/DEX/NFT/close-time rows, fix attribute references to use span. prefix, fix stale consensus_stalled entry (now consensus_result on consensus.check) - All 57 queries verified against live stack — zero parse errors Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 17:51:37 +01:00
Pratik Mankawde	d6b314e8d5	fix(telemetry): trim Tempo search filters to 7 cross-cutting entry points Reduced from 30 to 7 filters: service.instance.id, name, status, command, tx_hash, tx_type, ledger_hash. Full attribute inventory is in OpenTelemetryPlan/09-data-collection-reference.md §4; TraceQL autocomplete covers the rest. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 17:43:26 +01:00
Pratik Mankawde	0a800069bf	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-04 16:43:25 +01:00
Pratik Mankawde	938a4d17ce	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 16:43:25 +01:00
Pratik Mankawde	ca3a78abce	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 16:43:25 +01:00
Pratik Mankawde	eef11a65fa	fix(telemetry): code-review dashboard cleanups (legends + stale descriptions) From the code-review pass: - transaction-overview.json: the tx.process and tx.transactor latency-by-type panels used lowercase legends (p95/p50) without the per-node dimension. Use Title Case (P95/P50), add exported_instance to the by() clause, and include [{{exported_instance}}] in the legend, per the dashboard legend convention. - consensus-health.json: panel descriptions still referenced the old dotted attribute names (xrpl.consensus.mode, xrpl.ledger.seq) after the A1 rename; update them to the bare emitted names (consensus_mode, ledger_seq). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:43:12 +01:00
Pratik Mankawde	9c40039fda	fix(telemetry): avoid double-counting consensus_txset ledger mismatches handleMismatch() recorded the "consensus_txset" reason but then fell through to the transaction-level comparison, which also recorded a reason ("same_txset_diff_result" / "different_txset"). A single mismatch with disagreeing consensus tx-set hashes therefore incremented xrpld_ledger_history_mismatch_total twice across two reason labels, so the sum over reason exceeded the real mismatch count. The consensus tx-set hash disagreement is the root cause; return after recording it so each mismatch contributes exactly one reason. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:37:59 +01:00
Pratik Mankawde	c20d10fd36	fix(telemetry): restore consensus_mode label fix lost in phase-8->9 merge The A1 fix (xrpl_consensus_mode -> consensus_mode) was applied on phase-6, but the phase-8->phase-9 merge conflict resolution for consensus-health.json took phase-9's pre-fix panel base, silently reintroducing all 11 stale xrpl_consensus_mode label references (the spanmetrics label that is never populated — see the original A1 commit). Re-apply the label fix on phase-9: xrpl_consensus_mode -> consensus_mode in every panel expr, legendFormat, and the $consensus_mode template variable's label_values() query. The Grafana variable name $consensus_mode is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:20:42 +01:00
Pratik Mankawde	7d8e908879	feat(telemetry): add dashboard panels for new T3 metrics Visualise the metrics added in this series: - consensus-health: "Ledger History Mismatch Rate by Reason" (xrpld_ledger_history_mismatch_total by reason — fork diagnostics) - fee-market: "Queue Abandonment Rate (Expired)" and "Queue Admission Rejections (Dropped)" (xrpld_txq_expired_total / dropped_total) - peer-network: "Reduce-Relay Peer Selection" and "Reduce-Relay Missing-Tx Frequency" (xrpld_reduce_relay_metrics) - system-node-health: "Ledger Acquire Duration" and "Ledger Acquire Rate by Outcome" (ledger.acquire span) otel-collector-config.yaml: add outcome and acquire_reason spanmetrics dimensions so the ledger.acquire outcome breakdown populates. All panels follow the existing template: $node filter, exported_instance in legends, Title Case, axis labels. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:16:55 +01:00
Pratik Mankawde	6205199dc7	docs(telemetry): list new instruments in MetricsRegistry class diagram Add the new synchronous counters (ledger_history_mismatch_total{reason}, txq_expired_total, txq_dropped_total{reason}) and the reduce-relay observable gauge to the ASCII ownership diagram in the MetricsRegistry header so the documented instrument inventory matches the code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:15:20 +01:00
Pratik Mankawde	9376aa7c88	feat(telemetry): add reduce-relay efficiency gauge The transaction reduce-relay subsystem (selected vs suppressed peers, feature-disabled peers, missing-tx frequency) was computed in OverlayImpl's TxMetrics but only surfaced via the get_counts JSON RPC — invisible to Prometheus/Grafana, despite being the central efficiency KPI for the feature. Add an observable gauge xrpld_reduce_relay_metrics{metric} that reads Overlay::txMetrics() and parses its rolling-average fields: - selected_peers (txr_selected_cnt) - suppressed_peers (txr_suppressed_cnt) - not_enabled_peers (txr_not_enabled_cnt) - missing_tx_freq (txr_missing_tx_freq) The JSON values are decimal strings (std::to_string), parsed via std::stoll — the same JSON-reading pattern as registerNodeStoreGauge. No new Overlay accessor or core-interface change required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:14:33 +01:00
Pratik Mankawde	864ac729de	feat(telemetry): add ledger.acquire span for inbound ledger fetch InboundLedger drives ledger back-fill and fork recovery with timeout/retry logic (kLedgerTimeoutRetriesMax = 6), but emitted only a global ledger_fetches counter — sync/recovery cost was a telemetry blind spot. Add a ledger.acquire span that wraps the acquisition lifecycle: - Started in InboundLedger::init() with ledger_seq and acquire_reason (history / consensus / generic, mirroring InboundLedger::Reason). - Finalized in InboundLedger::done() with outcome (complete / failed), timeouts, and peer_count, then reset so the span duration is exported. Held as a std::optional<SpanGuard> member (same pattern as RCLConsensus roundSpan_). New op/attr/val constants added to LedgerSpanNames.h. Compiles to a no-op when telemetry is disabled via the SpanGuard fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:11:57 +01:00
Pratik Mankawde	793d2ecfce	feat(telemetry): add txq expired/dropped counters for queue backpressure The transaction queue had no metric for demand that leaves or never enters the queue, so fee-underpayment abandonment and admission-control rejection were invisible (distinct from jq_trans_overflow, which is the job queue). Add two synchronous counters via MetricsRegistry: - xrpld_txq_expired_total — incremented in TxQ::processClosedLedger() for each queued transaction removed because its LastLedgerSequence passed (submitters who under-bid the escalating fee and were never included). - xrpld_txq_dropped_total{reason} — incremented in TxQ::apply() at the queue-full admission-control returns (reason="queue_full"). Both reach MetricsRegistry via the Application& parameter already passed to these methods; calls are null-guarded so they no-op when telemetry is disabled. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:06:33 +01:00
Pratik Mankawde	7a509a01eb	feat(telemetry): add xrpld_ledger_history_mismatch_total{reason} counter LedgerHistory::handleMismatch() already classifies a built-vs-validated ledger mismatch (prior ledger, close time, consensus tx set, same/different tx set), but only bumped a single untyped beast::insight counter — the reason was dropped. Fork diagnosis was therefore a log-grep exercise. Add a labeled OTel counter so the mismatch reason is a queryable time series: - MetricsRegistry: new ledgerHistoryMismatchCounter_ + incrementLedgerHistoryMismatch(reason) - LedgerHistory: record one reason per classification branch (unknown, prior_ledger, close_time, consensus_txset, same_txset_diff_result, different_txset). Reaches MetricsRegistry via the existing app_ reference. The existing beast::insight mismatchCounter_ is left intact. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:02:26 +01:00
Pratik Mankawde	d3955d3639	fix(telemetry): emit real diverged-peer count for peers_insane_count The xrpld_peer_quality{metric="peers_insane_count"} gauge was hardcoded to 0.0 with a TODO, leaving the "Insane/Diverged Peers" panel permanently empty. PeerImp::json() already exposes the peer's tracking state via the "track" field (set to "diverged" when tracking_ == Tracking::Diverged). The peer-quality callback already iterates peer->json() for latency and version, so count peers whose "track" field equals "diverged" in the same loop — no change to the abstract Peer interface required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:53:53 +01:00
Pratik Mankawde	d7baf262f8	fix(telemetry): remove duplicate consensus outcome/failures panels A phase-8->phase-9 merge (`a675897aaf`) duplicated the "Consensus Outcome Distribution" and "Consensus Failures Over Time" panels: both appeared twice with byte-identical queries (verified ignoring gridPos). The pair existed once on phase-6/7/8 and became two on phase-9 only, so the duplication originated in phase-9's own merge history. Remove the second (lower) copy of each and re-stack panel y-positions with no gaps. The single retained copy keeps the original y=64 row. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:51:52 +01:00
Pratik Mankawde	b286335ccf	feat(telemetry): add load-factor attribution and 7-day agreement panels Both metrics are already emitted and live in Prometheus but were not fully visualised. - Fee Market (xrpld-fee-market.json): "Load Factor Attribution (Stacked Components)" — stacks load_factor_fee_escalation / fee_queue / local / net / cluster so an operator can see which component drives the effective fee. The existing panels showed the aggregate only. - Validator Health (xrpld-validator-health.json): "Agreement % (7d)" and "Agreements vs Missed (7d)" — the xrpld_validation_agreement gauge already observes agreement_pct_7d / agreements_7d / missed_7d, but the dashboard only plotted 1h and 24h windows. Panels follow the existing template: $node filter, exported_instance in legends, Title Case, axis labels. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:44:07 +01:00
Pratik Mankawde	5c2997d95e	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # docker/telemetry/grafana/dashboards/consensus-health.json	2026-06-04 15:41:20 +01:00
Pratik Mankawde	342b9f55a1	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 15:40:17 +01:00
Pratik Mankawde	000ad1d1f5	feat(telemetry): add gRPC and pathfinding span panels (RPC dashboard) The grpc.{Method} spans (GRPCServer.cpp) and pathfind.* spans (PathRequest.cpp) are emitted but had no dashboard coverage. The existing RPC & Pathfinding dashboard only plotted StatsD timers. Add span-derived rows: - gRPC Request Rate by Method (grpc.* by method) - gRPC Latency P95 by Method - gRPC Error Rate by Status (by grpc_status) - Pathfinding Compute Duration (pathfind.compute p95/p50) - Pathfinding Request & Discovery Rate (pathfind.request / pathfind.discover) otel-collector-config.yaml: add method, grpc_role, grpc_status spanmetrics dimensions (bounded value sets). Add a $grpc_method template variable so the gRPC panels can be filtered by method, consistent with the dashboard filter conventions. Note: these spans populate only when the node serves gRPC / pathfinding traffic; they are correct but not exercised by the current health-check workload (they will be covered by the Phase 10 workload generator). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:40:07 +01:00
Pratik Mankawde	17ffe8b049	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-04 15:37:55 +01:00
Pratik Mankawde	63c6f3b8df	feat(telemetry): surface consensus + TxQ lifecycle spans in dashboards The consensus state-machine and TxQ lifecycle spans are emitted by the code and present in Prometheus, but no panel visualised them. Add panels keyed on those span_names (verified live) plus the low-cardinality dimensions needed to break them down. Consensus Health (consensus-health.json) — new rows: - Consensus Round Duration (full round, p95/p50, mode-filterable) - Consensus Phase Duration (open vs establish breakdown) - Position Update Duration (update_positions p95/p50) - Consensus Stall Rate (consensus.check by consensus_stalled) - Consensus Mode-Change Rate by Target Mode (mode_change by mode_new) Transaction Overview (transaction-overview.json) — new rows: - TxQ Enqueue Rate by Transaction Type (txq.enqueue by tx_type) - Queue Bypass Ratio (txq.apply_direct vs txq.enqueue) - Queue Accept (Drain) Duration per Ledger (txq.accept p95/p50) - Queue Cleanup Rate (txq.cleanup expired entries) otel-collector-config.yaml — add spanmetrics dimensions for the lifecycle breakdowns: mode_new, consensus_stalled, consensus_phase, consensus_result (all bounded value sets, safe as Prometheus labels). All new panels follow the existing dashboard template: $node filter, exported_instance in every legend, Title Case, axis labels, row layout. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:37:29 +01:00
Pratik Mankawde	4174aef07b	fix(telemetry): align consensus_mode spanmetrics label with emitted attribute The spanmetrics connector dimension was `xrpl.consensus.mode`, but the code emits the span attribute under the bare key `consensus_mode` (matching every other dimension after the Phase 6 rename). The mismatch left the `xrpl_consensus_mode` Prometheus label empty, so the Consensus Health "Consensus Mode Over Time" panel and the `$consensus_mode` template variable (which filters every panel) matched no live series. - otel-collector-config.yaml: dimension `xrpl.consensus.mode` -> `consensus_mode` - consensus-health.json: 11 label refs `xrpl_consensus_mode` -> `consensus_mode` (the `$consensus_mode` Grafana variable name is unchanged) - telemetry-runbook.md: refresh the stale spanmetrics label table to the bare names actually emitted (command/rpc_status/consensus_mode/local/ proposal_trusted/validation_trusted), fix dotted->bare attribute names in span tables and TraceQL examples (tx_hash, ledger_seq, consensus_round_id, consensus_ledger_id, consensus_round, tx_id event attr), correct the consensus_round_id query to int (not quoted string), and fix the load_type value query ("exception_rpc" -> "exceptioned RPC"). Verified against the live stack: Tempo span tags confirm bare attribute keys (consensus_mode, ledger_seq, tx_hash, ...); the populated xrpl_consensus_mode series in Prometheus is stale retained data from an older build. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:29:45 +01:00
Pratik Mankawde	e6643a4389	updated tags Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 14:46:57 +01:00
Pratik Mankawde	80800ee130	use image-renderer in Grafana Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 14:40:35 +01:00
Pratik Mankawde	ebc5c5ed9d	fix(telemetry): set service_instance_id in [insight] so dashboards filter beast::insight metrics exported via OTLP carried no exported_instance label because [insight] omitted service_instance_id (only [telemetry] set it). Every system-* dashboard filters insight metrics with exported_instance=~"$node", and the $node template variable is sourced from label_values(..., exported_instance) — so with the label absent, $node was empty and all insight-backed panels showed no data. Add service_instance_id to [insight] in both telemetry configs, matching the [telemetry] value (xrpld-mainnet / xrpld-devnet). CollectorManager already reads this key and passes it to OTelCollector, which sets the service.instance.id resource attribute. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 14:36:04 +01:00
Pratik Mankawde	61c2760296	cosmetic updates Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 14:32:13 +01:00
Pratik Mankawde	88ac4b6aee	fix(telemetry): use short unit for NodeStore and object-count panels The phase-9 NodeStore I/O totals, write-load/read-queue, read-threads, and object instance-count panels rendered large cumulative values with unit "none". Switch to "short" for readable abbreviation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-04 14:27:53 +01:00
Pratik Mankawde	a5f80514a9	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 14:26:16 +01:00
Pratik Mankawde	90f7a8bd4e	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-04 14:26:16 +01:00

14676 Commits