rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-07-30 10:30:22 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	db5b93e2c4	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-05 12:50:09 +01:00
Pratik Mankawde	f37a4a1022	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # src/xrpld/app/misc/detail/TxQ.cpp	2026-06-05 12:49:38 +01:00
Pratik Mankawde	8f3974c094	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-05 12:48:40 +01:00
Pratik Mankawde	283fbaa54f	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # OpenTelemetryPlan/09-data-collection-reference.md	2026-06-05 12:48:31 +01:00
Pratik Mankawde	3167a49f41	feat(telemetry): derive per-stage tx metrics from apply-pipeline spans Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim, tx.transactor) added on phase-3 through the observability stack so the spanmetrics connector produces per-stage RED metrics without any native instruments. - collector: add the `stage` dimension to the spanmetrics connector so the three stages split into separate metric series (3 bounded values). - dashboard: add a "Tx Apply Pipeline" section to transaction-overview with rate, p95 latency, and failure-rate panels grouped by stage, plus a `stage` template variable. Panels follow the existing config (node filter, exported_instance legends, Title Case, axis labels). - The failure panel filters ter_result != tesSUCCESS rather than span status, because a failing ter code completes the span normally — only thrown exceptions set an error status. This matches the existing "Transaction Results by Type" panel convention. - docs: document the spans, attributes, and stage dimension in the data collection reference and runbook, including the sampling caveat that span-derived metrics inherit tracer head-sampling and undercount at sampling_ratio < 1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 12:42:53 +01:00
Pratik Mankawde	cf075888ff	docs(telemetry): fix TraceQL/LogQL query syntax in runbook - Replace all `{name="..."} \| attr = val` pipeline queries with the correct `{name="..." && span.attr = val}` inline filter syntax - Add `span.` prefix to all span attribute references; `duration`, `status`, `name`, and `resource.` keep no prefix - Fix Loki stream selector: `{job="xrpld"}` → `{service_name="xrpld"}` in all LogQL examples and the verification step - Fix cross-node queries: `rootServiceName` → `resource.service.name`, `{name=~"tx\\.."} \| attr` → `{name =~ "tx." && span.attr}` - Add DEX section (OfferCreate variants by ter_result, OfferCancel, peer relay) - Add syntax cheat-sheet block at top of Insights section - Expand tx workflow: per-AMM-type queries, Payment tecPATH., TrustSet, OracleSet, NFTokenMint cross-span - Expand consensus: slow rounds, validation send+receive comparison - Expand cross-subsystem: AMM cross-span, tx.receive no-error - Expand TxQ: retried status, NFToken enqueue type - Update Where-to-Look table: add AMM/DEX/NFT/close-time rows, fix attribute references to use span. prefix, fix stale consensus_stalled entry (now consensus_result on consensus.check) - All 57 queries verified against live stack — zero parse errors Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 17:51:37 +01:00
Pratik Mankawde	91ff486950	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-04 16:18:13 +01:00
Pratik Mankawde	5c2997d95e	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # docker/telemetry/grafana/dashboards/consensus-health.json	2026-06-04 15:41:20 +01:00
Pratik Mankawde	342b9f55a1	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-04 15:40:17 +01:00
Pratik Mankawde	17ffe8b049	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-04 15:37:55 +01:00
Pratik Mankawde	4174aef07b	fix(telemetry): align consensus_mode spanmetrics label with emitted attribute The spanmetrics connector dimension was `xrpl.consensus.mode`, but the code emits the span attribute under the bare key `consensus_mode` (matching every other dimension after the Phase 6 rename). The mismatch left the `xrpl_consensus_mode` Prometheus label empty, so the Consensus Health "Consensus Mode Over Time" panel and the `$consensus_mode` template variable (which filters every panel) matched no live series. - otel-collector-config.yaml: dimension `xrpl.consensus.mode` -> `consensus_mode` - consensus-health.json: 11 label refs `xrpl_consensus_mode` -> `consensus_mode` (the `$consensus_mode` Grafana variable name is unchanged) - telemetry-runbook.md: refresh the stale spanmetrics label table to the bare names actually emitted (command/rpc_status/consensus_mode/local/ proposal_trusted/validation_trusted), fix dotted->bare attribute names in span tables and TraceQL examples (tx_hash, ledger_seq, consensus_round_id, consensus_ledger_id, consensus_round, tx_id event attr), correct the consensus_round_id query to int (not quoted string), and fix the load_type value query ("exception_rpc" -> "exceptioned RPC"). Verified against the live stack: Tempo span tags confirm bare attribute keys (consensus_mode, ledger_seq, tx_hash, ...); the populated xrpl_consensus_mode series in Prometheus is stale retained data from an older build. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:29:45 +01:00
Pratik Mankawde	b9704c9549	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-03 16:23:47 +01:00
Pratik Mankawde	9c69aab326	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Resolve test conflict: keep xrpl.pb.h include (phase 9) and std::uint8_t qualifiers (phase 8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:23:39 +01:00
Pratik Mankawde	3eeb8b3730	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 16:22:40 +01:00
Pratik Mankawde	93c27997b4	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 16:22:35 +01:00
Pratik Mankawde	ac79a5123e	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Resolve runbook conflict: keep both phase 6 ledger/peer span tables AND new insights/sample queries section from the enrichment work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:22:20 +01:00
Pratik Mankawde	1b227a1eff	docs(telemetry): update runbook with enriched attributes and sample queries Adds comprehensive "Insights and Sample Queries" section showing operators what questions they can answer with the newly-added span attributes: - Transaction workflow analysis (filter by tx_type, fee, ter_result) - TxQ health (txq_status, ledger_changed) - RPC debugging (is_batch, request_payload_size, load_type) - PathFinding performance (dest_currency, num_source_assets) - Consensus health (consensus_state, is_bow_out, disputes_count) - Cross-subsystem correlation examples Also updates all span reference tables with the new attributes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:18:43 +01:00
Pratik Mankawde	98fc939851	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 15:01:19 +01:00
Pratik Mankawde	4d6ddb5f1f	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:56:09 +01:00
Pratik Mankawde	ba7e1f98e4	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:24:43 +01:00
Pratik Mankawde	e7dea147cd	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:18:36 +01:00
Pratik Mankawde	8d730b8b9a	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:16:35 +01:00
Pratik Mankawde	e5fae351d6	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-05-29 17:53:29 +01:00
Pratik Mankawde	a44d91ec27	leftover clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 17:52:45 +01:00
Pratik Mankawde	8f9057729c	Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:14:21 +01:00
Pratik Mankawde	3a1f22583f	Merge branch 'pratik/otel-phase1a-plan-docs' into pratik/otel-phase1b-telemetry-infra Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 15:34:22 +01:00
Pratik Mankawde	f66a53cfc9	Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 14:51:12 +01:00
Pratik Mankawde	8b790ebac9	bumped otel version to 1.26.0 Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 12:18:20 +01:00
Pratik Mankawde	824f63216a	Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-27 16:57:08 +01:00
Pratik Mankawde	a104140a51	addressing code review comments Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-27 16:46:35 +01:00
Pratik Mankawde	ac57a91b77	merge: phase-9 (dashboard UID + line-number cleanup, detach callbacks) into phase-10 # Conflicts: # docker/telemetry/TESTING.md	2026-05-14 17:23:55 +01:00
Pratik Mankawde	a9f52458b3	merge: pratik/otel-phase8-log-correlation (dashboard UID + line-number cleanup) into pratik/otel-phase9-metric-gap-fill # Conflicts: # docker/telemetry/grafana/dashboards/consensus-health.json # docker/telemetry/grafana/dashboards/ledger-operations.json # docker/telemetry/grafana/dashboards/peer-network.json # docker/telemetry/grafana/dashboards/rpc-performance.json # docker/telemetry/grafana/dashboards/system-ledger-data-sync.json # docker/telemetry/grafana/dashboards/system-network-traffic.json # docker/telemetry/grafana/dashboards/system-node-health.json # docker/telemetry/grafana/dashboards/system-overlay-traffic-detail.json # docker/telemetry/grafana/dashboards/system-rpc-pathfinding.json # docker/telemetry/grafana/dashboards/transaction-overview.json	2026-05-14 17:10:12 +01:00
Pratik Mankawde	0e5e802e5e	merge: pratik/otel-phase7-native-metrics (dashboard UID + line-number cleanup) into pratik/otel-phase8-log-correlation	2026-05-14 17:07:34 +01:00
Pratik Mankawde	6985e1948b	merge: pratik/otel-phase6-statsd (line-number + docs cleanup) into pratik/otel-phase7-native-metrics # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # docker/telemetry/grafana/dashboards/system-ledger-data-sync.json # docker/telemetry/grafana/dashboards/system-network-traffic.json # docker/telemetry/grafana/dashboards/system-node-health.json # docker/telemetry/grafana/dashboards/system-overlay-traffic-detail.json # docker/telemetry/grafana/dashboards/system-rpc-pathfinding.json	2026-05-14 17:07:15 +01:00
Pratik Mankawde	dfe91e071f	merge: phase-5 (runbook span-name + line-number fixes) into phase-6 # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # docs/telemetry-runbook.md	2026-05-14 16:42:13 +01:00
Pratik Mankawde	dec8b0a9a1	docs(telemetry): fix stale RPC span names + drop volatile line numbers in runbook - RPC Spans table: `rpc.request` was documented but the code actually emits `rpc.http_request`. Listed the actual emitted names (`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`) and their parent/child relationship. - Drop `:<line>` suffixes from Source File columns in both RPC and Transaction span tables. Line numbers drift with every refactor; the filename is enough for operators to grep. - Summary table: replace the never-emitted `rpc.request` row with the real entry points so `span_name=` filters in PromQL / TraceQL match.	2026-05-14 16:34:58 +01:00
Pratik Mankawde	ec8e3e2950	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-05-13 16:17:49 +01:00
Pratik Mankawde	495d5bd8a0	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-13 16:17:12 +01:00
Pratik Mankawde	6cd910f06f	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-05-13 16:17:05 +01:00
Pratik Mankawde	5cd71ed107	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-05-13 16:16:50 +01:00
Pratik Mankawde	e60efd4d2f	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-05-13 16:10:46 +01:00
Pratik Mankawde	c48f5ed6e7	docs(telemetry): update runbook attr names for simplified naming convention Update 31 attribute references in telemetry-runbook.md to match the simplified naming: drop xrpl.<domain>. prefix on per-span attrs, use domain-qualified names for collisions (rpc_status, consensus_state, etc.), and unify cross-domain refs (xrpl.ledger.seq, xrpl.tx.hash). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 16:08:48 +01:00
Pratik Mankawde	c9fe4b1a14	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-05-13 16:04:27 +01:00
Pratik Mankawde	7a854ccad2	refactor(telemetry): simplify attr naming on phase-1c — drop xrpl.<domain>. prefix - Drop xrpl.rpc.* prefix from per-span attrs (command, version). - Qualify collision-prone fields: role -> rpc_role/grpc_role, status -> rpc_status/grpc_status. - Rename payload_size -> request_payload_size for cross-domain clarity. - Simplify link.type -> link_type (bare name, no join). - Update convention doc in SpanNames.h to reflect new naming rules. - Update telemetry.md doc with renamed attr keys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 15:54:13 +01:00
Pratik Mankawde	782d98d249	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-13 11:40:15 +01:00
Pratik Mankawde	c096eeb239	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-13 11:30:22 +01:00
Pratik Mankawde	fac6c3ac1d	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-05-06 14:34:17 +01:00
Pratik Mankawde	761688383d	fix(telemetry): address code review issues in OTelCollector - Fix use-after-free: extract gauge callback to static function and call RemoveCallback in ~OTelGaugeImpl() before unregistering from collector - Use memory_order_acq_rel on callHooks() debounce CAS for proper happens-before relationship between hook invocations - Add explicit 2s timeout to ForceFlush() in destructor to prevent blocking indefinitely when OTLP endpoint is unreachable at shutdown - Add OTLP receiver to metrics pipeline so native OTel metrics from xrpld are actually received by the collector - Remove stale health check port from docker-compose (extension was removed from collector config) - Clarify fallback docs: StatsD path requires re-enabling receiver/port - Fix comments: Counter uses uint64_t not int64_t, gauge clamps to [0, INT64_MAX] not [0, UINT64_MAX] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 14:24:52 +01:00
Ayaz Salikhov	27f7fdb3a6	chore: Do not duplicate sanitizer flags (#7058) Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-05 16:32:43 +00:00
Alex Kremer	8995564ed6	refactor: Enable clang-tidy `readability-identifier-naming` check (#6571)	2026-05-03 10:31:53 +00:00
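
Commit 761688383d corrects an OTelCollector comment to say that the gauge clamps to [0, INT64_MAX], not [0, UINT64_MAX]. The narrower range is presumably there because the gauge reports a signed 64-bit value, and an unsigned reading above INT64_MAX cannot be represented in it. The C++ sketch below shows one way to write such a clamp. `clampToGauge` is a hypothetical helper that assumes an unsigned 64-bit input. It is not the function used in rippled.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the rippled source): map an unsigned 64-bit
// reading onto [0, INT64_MAX] so it fits a signed 64-bit gauge value.
// The lower bound of 0 holds automatically because the input is unsigned;
// only the upper bound needs an explicit check.
constexpr std::int64_t clampToGauge(std::uint64_t value) noexcept
{
    constexpr auto kMax =
        static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    return static_cast<std::int64_t>(value > kMax ? kMax : value);
}
```

With this sketch, `clampToGauge(42)` yields 42, and any input above INT64_MAX yields INT64_MAX (9223372036854775807).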

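Commit 3167a49f41 explains why the "Tx Apply Pipeline" failure-rate panel filters on `ter_result != tesSUCCESS` instead of span status. A transaction that fails with a ter code still ends its span normally, and only a thrown exception sets an error status. The C++ sketch below models that counting rule over in-memory span records. The `StageSpan` type, the stage strings, and `countFailuresByStage` are hypothetical illustrations, not rippled or collector code. In the actual stack, the spanmetrics connector derives the per-stage metrics and the dashboard queries them.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one apply-pipeline stage span.
struct StageSpan
{
    std::string stage;      // assumed values: "preflight", "preclaim", "transactor"
    std::string terResult;  // e.g. "tesSUCCESS", "tecPATH_DRY"; empty if never set
    bool statusError;       // true only when the stage threw an exception
};

struct StageCounts
{
    std::size_t total = 0;
    std::size_t failed = 0;
};

// Count a span as failed when its ter_result is anything other than
// tesSUCCESS. Checking statusError alone would miss ordinary ter-code
// failures, because those spans end with a normal status. A span that threw
// before recording a ter_result (empty string here) also counts as failed.
std::map<std::string, StageCounts>
countFailuresByStage(std::vector<StageSpan> const& spans)
{
    std::map<std::string, StageCounts> counts;
    for (auto const& span : spans)
    {
        auto& c = counts[span.stage];
        ++c.total;
        if (span.terResult != "tesSUCCESS")
            ++c.failed;
    }
    return counts;
}
```

Here a `preclaim` span ending in `tecPATH_DRY` with `statusError == false` is still counted as a failure, which a status-based filter would miss.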
149 Commits