rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-07-23 23:20:33 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	91ff486950	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-04 16:18:13 +01:00
Pratik Mankawde	6205199dc7	docs(telemetry): list new instruments in MetricsRegistry class diagram Add the new synchronous counters (ledger_history_mismatch_total{reason}, txq_expired_total, txq_dropped_total{reason}) and the reduce-relay observable gauge to the ASCII ownership diagram in the MetricsRegistry header so the documented instrument inventory matches the code. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:15:20 +01:00
Pratik Mankawde	9376aa7c88	feat(telemetry): add reduce-relay efficiency gauge The transaction reduce-relay subsystem (selected vs suppressed peers, feature-disabled peers, missing-tx frequency) was computed in OverlayImpl's TxMetrics but only surfaced via the get_counts JSON RPC — invisible to Prometheus/Grafana, despite being the central efficiency KPI for the feature. Add an observable gauge xrpld_reduce_relay_metrics{metric} that reads Overlay::txMetrics() and parses its rolling-average fields: - selected_peers (txr_selected_cnt) - suppressed_peers (txr_suppressed_cnt) - not_enabled_peers (txr_not_enabled_cnt) - missing_tx_freq (txr_missing_tx_freq) The JSON values are decimal strings (std::to_string), parsed via std::stoll — the same JSON-reading pattern as registerNodeStoreGauge. No new Overlay accessor or core-interface change required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:14:33 +01:00
Pratik Mankawde	864ac729de	feat(telemetry): add ledger.acquire span for inbound ledger fetch InboundLedger drives ledger back-fill and fork recovery with timeout/retry logic (kLedgerTimeoutRetriesMax = 6), but emitted only a global ledger_fetches counter — sync/recovery cost was a telemetry blind spot. Add a ledger.acquire span that wraps the acquisition lifecycle: - Started in InboundLedger::init() with ledger_seq and acquire_reason (history / consensus / generic, mirroring InboundLedger::Reason). - Finalized in InboundLedger::done() with outcome (complete / failed), timeouts, and peer_count, then reset so the span duration is exported. Held as a std::optional<SpanGuard> member (same pattern as RCLConsensus roundSpan_). New op/attr/val constants added to LedgerSpanNames.h. Compiles to a no-op when telemetry is disabled via the SpanGuard fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:11:57 +01:00
Pratik Mankawde	793d2ecfce	feat(telemetry): add txq expired/dropped counters for queue backpressure The transaction queue had no metric for demand that leaves or never enters the queue, so fee-underpayment abandonment and admission-control rejection were invisible (distinct from jq_trans_overflow, which is the job queue). Add two synchronous counters via MetricsRegistry: - xrpld_txq_expired_total — incremented in TxQ::processClosedLedger() for each queued transaction removed because its LastLedgerSequence passed (submitters who under-bid the escalating fee and were never included). - xrpld_txq_dropped_total{reason} — incremented in TxQ::apply() at the queue-full admission-control returns (reason="queue_full"). Both reach MetricsRegistry via the Application& parameter already passed to these methods; calls are null-guarded so they no-op when telemetry is disabled. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:06:33 +01:00
Pratik Mankawde	7a509a01eb	feat(telemetry): add xrpld_ledger_history_mismatch_total{reason} counter LedgerHistory::handleMismatch() already classifies a built-vs-validated ledger mismatch (prior ledger, close time, consensus tx set, same/different tx set), but only bumped a single untyped beast::insight counter — the reason was dropped. Fork diagnosis was therefore a log-grep exercise. Add a labeled OTel counter so the mismatch reason is a queryable time series: - MetricsRegistry: new ledgerHistoryMismatchCounter_ + incrementLedgerHistoryMismatch(reason) - LedgerHistory: record one reason per classification branch (unknown, prior_ledger, close_time, consensus_txset, same_txset_diff_result, different_txset). Reaches MetricsRegistry via the existing app_ reference. The existing beast::insight mismatchCounter_ is left intact. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 16:02:26 +01:00
Pratik Mankawde	d3955d3639	fix(telemetry): emit real diverged-peer count for peers_insane_count The xrpld_peer_quality{metric="peers_insane_count"} gauge was hardcoded to 0.0 with a TODO, leaving the "Insane/Diverged Peers" panel permanently empty. PeerImp::json() already exposes the peer's tracking state via the "track" field (set to "diverged" when tracking_ == Tracking::Diverged). The peer-quality callback already iterates peer->json() for latency and version, so count peers whose "track" field equals "diverged" in the same loop — no change to the abstract Peer interface required. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-04 15:53:53 +01:00
Pratik Mankawde	1ccc1bd286	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-04 14:10:27 +01:00
Pratik Mankawde	859bd21ca5	only render p100. Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-04 12:16:48 +01:00
Pratik Mankawde	cddb220221	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-03 17:25:22 +01:00
Pratik Mankawde	56d33fc87f	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-03 17:25:22 +01:00
Pratik Mankawde	013252f210	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 17:25:22 +01:00
Pratik Mankawde	970914d2ce	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 17:25:22 +01:00
Pratik Mankawde	289b049b70	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-03 17:25:22 +01:00
Pratik Mankawde	36cae13352	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-03 17:25:22 +01:00
Pratik Mankawde	dfd67b8124	fix(telemetry): eliminate duplicate suppressed attribute on tx.receive span The OTel C++ SDK's SetAttribute appends rather than overwrites on in-flight spans. Setting suppressed=false as a default then overriding to true resulted in both values appearing in the exported span. Fix: remove the default-false set, place suppressed=false once after the HashRouter check passes (non-suppressed path), and suppressed=true remains only in the suppressed path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 17:23:59 +01:00
Pratik Mankawde	a13a858112	feat(telemetry): add tx.transactor span for per-transactor execution timing Wraps Transactor::operator() with a span that captures tx_type, ter_result, and applied. This is the universal dispatch point — every transaction flows through it, giving per-type latency breakdown. Adds libxrpl.tx > xrpl.telemetry levelization dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:40:10 +01:00
Pratik Mankawde	f6b4d945d8	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-03 16:39:07 +01:00
Pratik Mankawde	146ea1455b	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-03 16:32:37 +01:00
Pratik Mankawde	d6fe31442e	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-03 16:32:36 +01:00
Pratik Mankawde	8adb5d03da	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-03 16:32:31 +01:00
Pratik Mankawde	66552e7858	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 16:32:31 +01:00
Pratik Mankawde	2264a8427a	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 16:32:31 +01:00
Pratik Mankawde	c5bdaafc39	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-06-03 16:32:31 +01:00
Pratik Mankawde	4b6c1c270f	feat(telemetry): add tx.transactor span for per-transactor execution timing Wraps Transactor::operator() with a span that captures tx_type, ter_result, and applied. This is the universal dispatch point — every transaction flows through it, giving per-type latency breakdown. Adds libxrpl.tx > xrpl.telemetry levelization dependency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:32:16 +01:00
Pratik Mankawde	945355d6c6	fix(build): remove unused includes in Application.cpp Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:29:21 +01:00
Pratik Mankawde	b9704c9549	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-03 16:23:47 +01:00
Pratik Mankawde	9c69aab326	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Resolve test conflict: keep xrpl.pb.h include (phase 9) and std::uint8_t qualifiers (phase 8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:23:39 +01:00
Pratik Mankawde	3eeb8b3730	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-03 16:22:40 +01:00
Pratik Mankawde	93c27997b4	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-06-03 16:22:35 +01:00
Pratik Mankawde	ac79a5123e	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd Resolve runbook conflict: keep both phase 6 ledger/peer span tables AND new insights/sample queries section from the enrichment work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:22:20 +01:00
Pratik Mankawde	765c96919c	feat(telemetry): enrich consensus spans with state, disputes, and ledger_hash Adds workflow-critical attributes to consensus spans: - consensus.proposal.send: is_bow_out (identifies resignation proposals) - consensus.accept: consensus_state (yes/moved_on/expired), disputes_count - consensus.validation.send: ledger_hash (correlates validation to ledger) Enables answering: "Did we reach consensus or time out?", "How many disputes existed at acceptance?", "Which ledger did we validate?" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:09:41 +01:00
Pratik Mankawde	7a9215a4d3	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-03 16:07:02 +01:00
Pratik Mankawde	dd9cde88f3	fix(telemetry): qualify tx_span with telemetry:: namespace in apply() The apply() function doesn't have a `using namespace telemetry` directive (unlike processTransaction), so tx_span attrs need explicit qualification. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:06:51 +01:00
Pratik Mankawde	e52f1470b6	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-06-03 16:02:26 +01:00
Pratik Mankawde	1a2f9a71f5	feat(telemetry): add ter_result and applied attributes to tx.process span Enriches the tx.process span with final outcome after batch application: - ter_result: the TER code string (e.g., "tesSUCCESS", "tecPATH_DRY") - applied: boolean whether the transaction was included in the ledger These attributes complete the tx.process span lifecycle — it now captures identity (tx_type, tx_hash), intent (fee, sequence), and outcome (ter_result, applied) for full workflow traceability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 16:02:04 +01:00
Pratik Mankawde	ebf107e73c	feat(telemetry): enrich TX and TxQ spans with tx_type, fee, sequence, and status Adds workflow-identifying attributes to transaction lifecycle spans: - tx.process: tx_type, fee (drops), sequence - tx.receive: tx_type - txq.enqueue: tx_type - txq.accept.tx: txq_status (applied/failed/retried) - txq.accept: ledger_changed Enables filtering traces by transaction type (Payment, AMMDeposit, etc.) and understanding TxQ outcomes without correlating tx_hash externally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 15:52:21 +01:00
Pratik Mankawde	d5f9242f84	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing	2026-06-03 15:47:42 +01:00
Pratik Mankawde	84fc829be3	feat(telemetry): enrich RPC and PathFind spans with workflow-identifying attributes Wire up span attributes that enable filtering/grouping traces by request characteristics: batch detection, payload size, resource cost category, command name on WS spans, and pathfinding search parameters (destination amount/currency, source asset count). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-06-03 15:46:40 +01:00
Pratik Mankawde	e07a0c347f	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-02 11:13:17 +01:00
Pratik Mankawde	25e08b1840	clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-02 10:46:27 +01:00
Pratik Mankawde	66e6310b56	more clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 19:24:20 +01:00
Pratik Mankawde	11717a5431	build fixed Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 18:13:10 +01:00
Pratik Mankawde	994e425804	more clang-tidy fixes! Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 18:07:23 +01:00
Pratik Mankawde	e804ec83aa	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-06-01 17:03:52 +01:00
Pratik Mankawde	ece8c62bca	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-06-01 17:01:05 +01:00
Pratik Mankawde	bed6770751	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-06-01 16:58:22 +01:00
Pratik Mankawde	dfdda305ee	clang-tidy fixes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-06-01 16:58:05 +01:00
Pratik Mankawde	2ac93c504e	fix(tests): rename make_Telemetry to telemetry::makeTelemetry in Peer.h The project-wide rename check changed the factory function name but missed this call site in the consensus simulation test framework. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-01 16:45:38 +01:00
Pratik Mankawde	124e3a154d	fix(tests): add getMetricsRegistry() override to TestServiceRegistry ServiceRegistry gained the pure virtual getMetricsRegistry() in phase 7 but TestServiceRegistry was never updated. Returns nullptr since tests don't need a real MetricsRegistry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-01 16:32:43 +01:00
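
Commit 9376aa7c88 above describes reading decimal-string fields from the Overlay::txMetrics() JSON and parsing them with std::stoll before exporting them under the xrpld_reduce_relay_metrics{metric} gauge. The sketch below illustrates that parsing step only. It is not the code from the commit: MetricsJson, parseMetric, and collectReduceRelay are hypothetical names, and a plain std::map stands in for the real JSON object. The field-to-label mapping is taken from the commit message. Skipping missing or malformed fields, instead of throwing, is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the JSON object returned by Overlay::txMetrics():
// field name -> decimal string (as produced by std::to_string).
using MetricsJson = std::map<std::string, std::string>;

// Parse one field with std::stoll. Returns std::nullopt when the field is
// missing, is not a number, overflows, or has trailing non-digit characters.
std::optional<std::int64_t>
parseMetric(MetricsJson const& json, std::string const& field)
{
    auto const it = json.find(field);
    if (it == json.end())
        return std::nullopt;
    try
    {
        std::size_t pos = 0;
        long long const value = std::stoll(it->second, &pos);
        if (pos != it->second.size())
            return std::nullopt;  // e.g. "12abc" or "1.5"
        return static_cast<std::int64_t>(value);
    }
    catch (std::exception const&)
    {
        // std::stoll throws std::invalid_argument or std::out_of_range.
        return std::nullopt;
    }
}

// Build the {metric label -> value} pairs that a gauge callback would
// observe. Labels and field names follow the commit message; fields that
// fail to parse are skipped (assumption for this sketch).
std::map<std::string, std::int64_t>
collectReduceRelay(MetricsJson const& json)
{
    static std::pair<char const*, char const*> const fields[] = {
        {"selected_peers", "txr_selected_cnt"},
        {"suppressed_peers", "txr_suppressed_cnt"},
        {"not_enabled_peers", "txr_not_enabled_cnt"},
        {"missing_tx_freq", "txr_missing_tx_freq"},
    };

    std::map<std::string, std::int64_t> out;
    for (auto const& [label, field] : fields)
    {
        if (auto const value = parseMetric(json, field))
            out.emplace(label, *value);
    }
    return out;
}
```

In this sketch a gauge callback would call collectReduceRelay() once per collection cycle and observe each returned pair with its label as the metric attribute. Checking the position returned by std::stoll guards against partially numeric strings, which std::stoll would otherwise accept silently.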

10319 Commits