rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-07-31 02:50:24 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	758a3fec29	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation # Conflicts: # OpenTelemetryPlan/09-data-collection-reference.md	2026-06-05 19:42:53 +01:00
Pratik Mankawde	a23d83f393	docs(telemetry): add ledger.acquire to 09-doc + fix peer-quality dashboard metric prefix Phase 9 introduces the ledger.acquire span (InboundLedger fetch) that phases 7-8 do not have, so the forward-merged 09-data-collection-reference inventory is extended here: - §1.1: add ledger.acquire to the Ledger span table. - §1.2: add its attributes (acquire_reason, timeouts, peer_count, outcome) and note it also sets ledger_seq; bump the span count. Also fix two stale StatsD metric references in the Peer Quality dashboard (xrpld-peer-quality.json): rippled_Peer_Finder_Active_{Inbound,Outbound}_Peers -> xrpld_Peer_Finder_* to match the xrpld_ metric prefix the rest of the stack uses. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 19:36:34 +01:00
Pratik Mankawde	22b533ac51	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-05 19:33:13 +01:00
Pratik Mankawde	8046a30e9b	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-05 19:28:58 +01:00
Pratik Mankawde	4a8aa9e514	docs(telemetry): reconcile 09-data-collection-reference span/attribute inventory The §1 span and attribute inventory had regressed to an older 16-span snapshot that uses the pre-2026-05-13 dotted attribute keys, while phase-7's code emits ~36 spans with bare/underscore attribute keys. The §Data Flow Overview and §2 System Metrics sections (native OTLP transport — phase-7's migration) were already correct and are left unchanged. - §1.1: expand the span inventory to the full surface — add gRPC (grpc.<MethodName>), TxQ (txq.), PathFind (pathfind.), and the full consensus set (round/phase.open/ establish/update_positions/check/mode_change/proposal.receive/validation.receive). Fix the phantom rpc.request -> rpc.http_request, add rpc.ws_upgrade. No grpc.request, no pathfind.rank, no ledger.acquire (the latter is added in phase-9, not yet present here). - §1.2: convert every span-attribute key from dotted xrpl.<domain>.<field> to the bare/underscore form. The sole span-attr dotted exception is xrpl.ledger.hash on peer.validation.receive (shared constant); consensus.validation.send uses bare ledger_hash. Resource attrs xrpl.network.id/type stay dotted. Fix tx_count/tx_failed placement (on tx.apply, not ledger.build). Add attribute tables for the new families. - §1.3: list the full set of spanmetrics dimension labels (bare keys, from the collector config) instead of the stale xrpl_rpc_command-style names. - §4/§5: convert Tempo TraceQL and PromQL examples to the bare attribute/label forms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 19:28:45 +01:00
Pratik Mankawde	dc5bb4b35c	feat(telemetry): emit xrpld_validation_{agreements,missed}_total counters Wire the two previously-registered-but-never-incremented validation counters to ValidationTracker's gross lifetime tallies, exported as monotonic ObservableCounters. New gross atomics count each ledger once at first classification and are never adjusted on late repair, keeping the _total counters monotonic and additive (agreements_total + missed_total == ledgers reconciled); the repair-aware windowed view stays on the existing xrpld_validation_agreement gauge. The validator-health dashboard panels that already query these names now render data instead of "No data". Also de-stale 09-data-collection-reference.md: §5b documented flat metric names (xrpld_cache_SLE_hit_rate, ...) that the code never emits — it emits labeled gauges (xrpld_cache_metrics{metric="SLE_hit_rate"}). Replace the stale flat-name tables with a pointer to the canonical labeled section, reconcile the contradictory headline counts, and correct xrpld_job_count to its real exported name xrpld_jobq_job_count. Adds two GTests asserting gross tallies stay frozen on repair while net totals move, plus the additive invariant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 18:29:29 +01:00
Pratik Mankawde	cb9fce6890	fix(telemetry): align Phase 10 workload harness with current OTel recording surface + fix CI The Phase 10 validation harness had drifted from the code's recording surface and the telemetry-validation CI job was failing before it could build. CI fix (telemetry-validation.yml): - Replace nonexistent local action ./.github/actions/print-env with the remote XRPLF/actions/print-build-env (the build-xrpld job failed in 56s on this). - Sync prepare-runner and upload-artifact action SHAs to the canonical workflow. Recording-surface reconciliation (docker/telemetry/workload/): - Migrate span attributes from dotted xrpl.<domain>.<field> to the bare/underscore form introduced by the 2026-05-13 span-attr naming redesign (tx_hash, peer_id, ledger_seq, consensus_mode, consensus_round, full_validation, quorum, ...). Dotted xrpl.ledger.hash is retained only on peer.validation.receive (shared constant), while consensus.validation.send uses bare ledger_hash. - Fix attribute placement: tx.apply carries tx_count/tx_failed (not ledger_seq); ledger.build carries ledger_seq/close_* (not tx_count/tx_failed). - Replace the phantom rpc.request span with the real WS root rpc.ws_message; drop the never-emitted duration_ms; rebuild the parent-child map accordingly. - Add the new spans the code emits: apply-pipeline stage spans (tx.preflight/preclaim/transactor with stage/tx_type/ter_result), txq., consensus sub-spans (round/establish/update_positions/check/phase.open), ledger.acquire, grpc., pathfind.. Conditional spans are marked optional so they are skipped (not failed) when the workload does not exercise them. - validate_telemetry.py: service.name and Loki job label rippled -> xrpld; fix PARITY_SPAN_ATTRS (rename the 4 real attrs, drop the 3 that are metrics not span attrs); add optional-span handling that skips missing optional spans while still validating attributes when present. 
- expected_metrics.json: rippled_ -> xrpld_ on all beast::insight/overlay metrics, xrpld_job_count, the 15 on-disk xrpld- dashboard UIDs, and the real bare spanmetrics dimension labels. - regression-metrics.json + baseline-timings.json: rpc.request -> rpc.ws_message. Metrics pipeline fix: - Switch node [insight] config from server=statsd/prefix=rippled to server=otel + /v1/metrics endpoint + prefix=xrpld across run-full-validation.sh, xrpld-validator.cfg.template, benchmark.sh and the workload compose. The collector has no StatsD receiver, so system metrics only reach Prometheus over OTLP. Synthetic load for new spans: - Add ripple_path_find to the RPC load generator (drives pathfind.* spans). - Add a high-TPS txq-burst workload phase to force fee escalation (drives txq.). All facts verified against the SpanNames.h headers and a live xrpld node + collector (Tempo service.name=xrpld, tx.preflight attrs [stage,ter_result,tx_type], 279 xrpld_ Prometheus metrics and zero rippled_). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 17:08:58 +01:00
Pratik Mankawde	db5b93e2c4	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-05 12:50:09 +01:00
Pratik Mankawde	f37a4a1022	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # src/xrpld/app/misc/detail/TxQ.cpp	2026-06-05 12:49:38 +01:00
Pratik Mankawde	8f3974c094	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-05 12:48:40 +01:00
Pratik Mankawde	283fbaa54f	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # OpenTelemetryPlan/09-data-collection-reference.md	2026-06-05 12:48:31 +01:00
Pratik Mankawde	3167a49f41	feat(telemetry): derive per-stage tx metrics from apply-pipeline spans Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim, tx.transactor) added on phase-3 through the observability stack so the spanmetrics connector produces per-stage RED metrics without any native instruments. - collector: add the `stage` dimension to the spanmetrics connector so the three stages split into separate metric series (3 bounded values). - dashboard: add a "Tx Apply Pipeline" section to transaction-overview with rate, p95 latency, and failure-rate panels grouped by stage, plus a `stage` template variable. Panels follow the existing config (node filter, exported_instance legends, Title Case, axis labels). - The failure panel filters ter_result != tesSUCCESS rather than span status, because a failing ter code completes the span normally — only thrown exceptions set an error status. This matches the existing "Transaction Results by Type" panel convention. - docs: document the spans, attributes, and stage dimension in the data collection reference and runbook, including the sampling caveat that span-derived metrics inherit tracer head-sampling and undercount at sampling_ratio < 1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 12:42:53 +01:00
Pratik Mankawde	759d3506b2	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-05 11:58:59 +01:00
Pratik Mankawde	021300538a	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-05 11:58:49 +01:00
Pratik Mankawde	a71d6635e6	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-05 11:58:43 +01:00
Pratik Mankawde	3df7e9cba6	code review changes and wire unused attributes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-05 11:42:33 +01:00
Pratik Mankawde	b9704c9549	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-03 16:23:47 +01:00
Pratik Mankawde	9c69aab326	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Resolve test conflict: keep xrpl.pb.h include (phase 9) and std::uint8_t qualifiers (phase 8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:23:39 +01:00
Pratik Mankawde	3eeb8b3730	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 16:22:40 +01:00
Pratik Mankawde	93c27997b4	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 16:22:35 +01:00
Pratik Mankawde	ac79a5123e	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Resolve runbook conflict: keep both phase 6 ledger/peer span tables AND new insights/sample queries section from the enrichment work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:22:20 +01:00
Pratik Mankawde	b0e9e1a24d	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-03 16:16:53 +01:00
Pratik Mankawde	bf0b843ce1	docs(telemetry): document Task 4.9 consensus span attribute gap fill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:16:43 +01:00
Pratik Mankawde	fce770e4f4	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-03 16:15:43 +01:00
Pratik Mankawde	8dd5ac55e8	docs(telemetry): document Task 3.11 TX/TxQ span attribute gap fill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:15:33 +01:00
Pratik Mankawde	507828edde	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing	2026-06-03 16:14:57 +01:00
Pratik Mankawde	aca6623f14	docs(telemetry): document Task 2.10 RPC/PathFind span attribute gap fill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:14:49 +01:00
Pratik Mankawde	98fc939851	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 15:01:19 +01:00
Pratik Mankawde	4d6ddb5f1f	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:56:09 +01:00
Pratik Mankawde	cd6264c02f	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:51:39 +01:00
Pratik Mankawde	7aebc62223	clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:50:54 +01:00
Pratik Mankawde	6554f04252	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-01 14:49:13 +01:00
Pratik Mankawde	ce6a3153a1	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-01 11:49:43 +01:00
Pratik Mankawde	3115313551	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-01 11:49:30 +01:00
Pratik Mankawde	2e61a1c412	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-01 11:49:02 +01:00
Pratik Mankawde	046e2e2b85	minor doc update Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 11:48:47 +01:00
Pratik Mankawde	9f81e770eb	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 11:36:19 +01:00
Pratik Mankawde	e321f294e5	clang issues Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 19:22:07 +01:00
Pratik Mankawde	ba7e1f98e4	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:24:43 +01:00
Pratik Mankawde	e7dea147cd	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:18:36 +01:00
Pratik Mankawde	8d730b8b9a	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:16:35 +01:00
Pratik Mankawde	e5fae351d6	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-05-29 17:53:29 +01:00
Pratik Mankawde	a44d91ec27	leftover clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 17:52:45 +01:00
Pratik Mankawde	2f96c6547c	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:51:31 +01:00
Pratik Mankawde	c187a62353	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:47:15 +01:00
Pratik Mankawde	c848e51e13	Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:44:07 +01:00
Pratik Mankawde	8f9057729c	Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:14:21 +01:00
Pratik Mankawde	f031befc6e	compilation fixes and levelization fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:04:19 +01:00
Pratik Mankawde	4e0b6f5b9e	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-05-28 18:32:44 +01:00
Pratik Mankawde	53e8c4d54e	fix(docs): apply rename scripts to secure-OTel doc references Two stray "rippled" tokens introduced by `43258e8d` ("docs(telemetry): add secure-OTel pipeline analysis…") were caught by check-rename in CI. Re-run docs.sh to convert them to xrpld so the rename check passes on PR #6425 (and downstream PR #6426 once merged up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 18:27:58 +01:00


204 Commits