rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-07-27 00:50:45 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	cb9fce6890	fix(telemetry): align Phase 10 workload harness with current OTel recording surface + fix CI The Phase 10 validation harness had drifted from the code's recording surface and the telemetry-validation CI job was failing before it could build. CI fix (telemetry-validation.yml): - Replace nonexistent local action ./.github/actions/print-env with the remote XRPLF/actions/print-build-env (the build-xrpld job failed in 56s on this). - Sync prepare-runner and upload-artifact action SHAs to the canonical workflow. Recording-surface reconciliation (docker/telemetry/workload/): - Migrate span attributes from dotted xrpl.<domain>.<field> to the bare/underscore form introduced by the 2026-05-13 span-attr naming redesign (tx_hash, peer_id, ledger_seq, consensus_mode, consensus_round, full_validation, quorum, ...). Dotted xrpl.ledger.hash is retained only on peer.validation.receive (shared constant), while consensus.validation.send uses bare ledger_hash. - Fix attribute placement: tx.apply carries tx_count/tx_failed (not ledger_seq); ledger.build carries ledger_seq/close_* (not tx_count/tx_failed). - Replace the phantom rpc.request span with the real WS root rpc.ws_message; drop the never-emitted duration_ms; rebuild the parent-child map accordingly. - Add the new spans the code emits: apply-pipeline stage spans (tx.preflight/preclaim/transactor with stage/tx_type/ter_result), txq.*, consensus sub-spans (round/establish/update_positions/check/phase.open), ledger.acquire, grpc.*, pathfind.*. Conditional spans are marked optional so they are skipped (not failed) when the workload does not exercise them. - validate_telemetry.py: service.name and Loki job label rippled -> xrpld; fix PARITY_SPAN_ATTRS (rename the 4 real attrs, drop the 3 that are metrics not span attrs); add optional-span handling that skips missing optional spans while still validating attributes when present. - expected_metrics.json: rippled_ -> xrpld_ on all beast::insight/overlay metrics, xrpld_job_count, the 15 on-disk xrpld- dashboard UIDs, and the real bare spanmetrics dimension labels. - regression-metrics.json + baseline-timings.json: rpc.request -> rpc.ws_message. Metrics pipeline fix: - Switch node [insight] config from server=statsd/prefix=rippled to server=otel + /v1/metrics endpoint + prefix=xrpld across run-full-validation.sh, xrpld-validator.cfg.template, benchmark.sh and the workload compose. The collector has no StatsD receiver, so system metrics only reach Prometheus over OTLP. Synthetic load for new spans: - Add ripple_path_find to the RPC load generator (drives pathfind.* spans). - Add a high-TPS txq-burst workload phase to force fee escalation (drives txq.*). All facts verified against the SpanNames.h headers and a live xrpld node + collector (Tempo service.name=xrpld, tx.preflight attrs [stage,ter_result,tx_type], 279 xrpld_ Prometheus metrics and zero rippled_). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 17:08:58 +01:00
Pratik Mankawde	db5b93e2c4	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-05 12:50:09 +01:00
Pratik Mankawde	f37a4a1022	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill # Conflicts: # src/xrpld/app/misc/detail/TxQ.cpp	2026-06-05 12:49:38 +01:00
Pratik Mankawde	8f3974c094	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-05 12:48:40 +01:00
Pratik Mankawde	283fbaa54f	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # OpenTelemetryPlan/09-data-collection-reference.md	2026-06-05 12:48:31 +01:00
Pratik Mankawde	3167a49f41	feat(telemetry): derive per-stage tx metrics from apply-pipeline spans Wire the apply-pipeline stage spans (tx.preflight, tx.preclaim, tx.transactor) added on phase-3 through the observability stack so the spanmetrics connector produces per-stage RED metrics without any native instruments. - collector: add the `stage` dimension to the spanmetrics connector so the three stages split into separate metric series (3 bounded values). - dashboard: add a "Tx Apply Pipeline" section to transaction-overview with rate, p95 latency, and failure-rate panels grouped by stage, plus a `stage` template variable. Panels follow the existing config (node filter, exported_instance legends, Title Case, axis labels). - The failure panel filters ter_result != tesSUCCESS rather than span status, because a failing ter code completes the span normally — only thrown exceptions set an error status. This matches the existing "Transaction Results by Type" panel convention. - docs: document the spans, attributes, and stage dimension in the data collection reference and runbook, including the sampling caveat that span-derived metrics inherit tracer head-sampling and undercount at sampling_ratio < 1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-05 12:42:53 +01:00
Pratik Mankawde	759d3506b2	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-05 11:58:59 +01:00
Pratik Mankawde	021300538a	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-05 11:58:49 +01:00
Pratik Mankawde	a71d6635e6	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-05 11:58:43 +01:00
Pratik Mankawde	3df7e9cba6	code review changes and wire unused attributes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-05 11:42:33 +01:00
Pratik Mankawde	b9704c9549	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-03 16:23:47 +01:00
Pratik Mankawde	9c69aab326	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Resolve test conflict: keep xrpl.pb.h include (phase 9) and std::uint8_t qualifiers (phase 8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:23:39 +01:00
Pratik Mankawde	3eeb8b3730	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 16:22:40 +01:00
Pratik Mankawde	93c27997b4	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 16:22:35 +01:00
Pratik Mankawde	ac79a5123e	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Resolve runbook conflict: keep both phase 6 ledger/peer span tables AND new insights/sample queries section from the enrichment work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:22:20 +01:00
Pratik Mankawde	b0e9e1a24d	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-03 16:16:53 +01:00
Pratik Mankawde	bf0b843ce1	docs(telemetry): document Task 4.9 consensus span attribute gap fill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:16:43 +01:00
Pratik Mankawde	fce770e4f4	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-03 16:15:43 +01:00
Pratik Mankawde	8dd5ac55e8	docs(telemetry): document Task 3.11 TX/TxQ span attribute gap fill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:15:33 +01:00
Pratik Mankawde	507828edde	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing	2026-06-03 16:14:57 +01:00
Pratik Mankawde	aca6623f14	docs(telemetry): document Task 2.10 RPC/PathFind span attribute gap fill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:14:49 +01:00
Pratik Mankawde	98fc939851	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 15:01:19 +01:00
Pratik Mankawde	4d6ddb5f1f	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:56:09 +01:00
Pratik Mankawde	cd6264c02f	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:51:39 +01:00
Pratik Mankawde	7aebc62223	clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 14:50:54 +01:00
Pratik Mankawde	6554f04252	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-01 14:49:13 +01:00
Pratik Mankawde	ce6a3153a1	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-01 11:49:43 +01:00
Pratik Mankawde	3115313551	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-06-01 11:49:30 +01:00
Pratik Mankawde	2e61a1c412	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-01 11:49:02 +01:00
Pratik Mankawde	046e2e2b85	minor doc update Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 11:48:47 +01:00
Pratik Mankawde	9f81e770eb	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 11:36:19 +01:00
Pratik Mankawde	e321f294e5	clang issues Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 19:22:07 +01:00
Pratik Mankawde	ba7e1f98e4	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:24:43 +01:00
Pratik Mankawde	e7dea147cd	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:18:36 +01:00
Pratik Mankawde	8d730b8b9a	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 18:16:35 +01:00
Pratik Mankawde	e5fae351d6	Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment	2026-05-29 17:53:29 +01:00
Pratik Mankawde	a44d91ec27	leftover clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 17:52:45 +01:00
Pratik Mankawde	2f96c6547c	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:51:31 +01:00
Pratik Mankawde	c187a62353	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:47:15 +01:00
Pratik Mankawde	c848e51e13	Merge branch 'pratik/otel-phase1c-rpc-integration' into pratik/otel-phase2-rpc-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:44:07 +01:00
Pratik Mankawde	8f9057729c	Merge branch 'pratik/otel-phase1b-telemetry-infra' into pratik/otel-phase1c-rpc-integration Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:14:21 +01:00
Pratik Mankawde	f031befc6e	compilation fixes and levelization fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-29 16:04:19 +01:00
Pratik Mankawde	4e0b6f5b9e	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-05-28 18:32:44 +01:00
Pratik Mankawde	53e8c4d54e	fix(docs): apply rename scripts to secure-OTel doc references Two stray "rippled" tokens introduced by `43258e8d` ("docs(telemetry): add secure-OTel pipeline analysis…") were caught by check-rename in CI. Re-run docs.sh to convert them to xrpld so the rename check passes on PR #6425 (and downstream PR #6426 once merged up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 18:27:58 +01:00
Pratik Mankawde	5700eeed1b	renaming and namespace updates Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 17:52:35 +01:00
Pratik Mankawde	7ac5343119	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 16:09:41 +01:00
Pratik Mankawde	954223958f	renames Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 16:07:34 +01:00
Pratik Mankawde	c6c019ed8b	addressed code review comments Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 15:55:25 +01:00
Pratik Mankawde	43258e8dc0	docs(telemetry): add secure-OTel pipeline analysis and link into plan Document the threat model and chosen hardening approach for the OTel pipeline: mTLS to the collector as primary defense (across-network deployment), NetworkPolicy as defense-in-depth, and source-side validation plus per-peer rate limiting for protocol::TraceContext on peer messages. Skips Basic Auth (wrong shape for multi-operator fleet) and HTTP-gateway header stripping (rippled is P2P). Wires the new doc into the master plan ToC, mermaid diagram, and body section, plus cross-refs from the privacy section in 02-design-decisions.md and the collector config in 05-configuration-reference.md so readers reach it from natural in-context entry points. Adds a backlink at the top of secure-OTel.md to the master plan. Adds 'exfiltration' and 'htpasswd' to cspell dictionary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 12:33:16 +01:00
Pratik Mankawde	4bd1176df5	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-28 11:38:05 +01:00
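
Commit cb9fce6890 above describes optional-span handling in validate_telemetry.py: a missing optional span is skipped rather than failed, while attributes are still validated whenever the span is present. The following is a minimal sketch of that rule, not the harness's actual code. `SpanSpec`, `validate_spans`, the result strings, and the choice of which spans are flagged optional are all hypothetical and chosen for illustration. The span and attribute names are the ones quoted in the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanSpec:
    """Hypothetical description of one expected span (not the harness's real type)."""
    name: str
    required_attrs: frozenset
    optional: bool = False


def validate_spans(specs, observed):
    """Check expected spans against observed ones.

    `observed` maps a span name to the set of attribute keys seen on it.
    A missing optional span is skipped; a missing required span fails;
    a span that is present always has its attributes checked.
    """
    results = {}
    for spec in specs:
        attrs = observed.get(spec.name)
        if attrs is None:
            results[spec.name] = "skipped" if spec.optional else "failed: span missing"
            continue
        missing = sorted(spec.required_attrs - set(attrs))
        if missing:
            results[spec.name] = "failed: missing attrs " + ", ".join(missing)
        else:
            results[spec.name] = "passed"
    return results


# Illustrative expectations. Which spans count as optional is an assumption here.
SPECS = [
    SpanSpec("tx.apply", frozenset({"tx_count", "tx_failed"})),
    SpanSpec("tx.preflight", frozenset({"stage", "tx_type", "ter_result"}), optional=True),
    SpanSpec("ledger.acquire", frozenset(), optional=True),
    SpanSpec("ledger.build", frozenset({"ledger_seq"})),
]

# Made-up observation: tx.preflight is present but lacks ter_result,
# ledger.acquire and ledger.build were never emitted.
OBSERVED = {
    "tx.apply": {"tx_count", "tx_failed"},
    "tx.preflight": {"stage", "tx_type"},
}

results = validate_spans(SPECS, OBSERVED)
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

In this sketch an optional span that does appear gets no leniency on its attributes, which matches the "still validating attributes when present" wording of the commit message. Only the absence of the span is tolerated.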

198 Commits