Commit Graph

14540 Commits

Author SHA1 Message Date
Pratik Mankawde
013252f210 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-03 17:25:22 +01:00
Pratik Mankawde
970914d2ce Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-06-03 17:25:22 +01:00
Pratik Mankawde
289b049b70 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-06-03 17:25:22 +01:00
Pratik Mankawde
4e422a0354 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-06-03 17:25:22 +01:00
Pratik Mankawde
36cae13352 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-06-03 17:25:22 +01:00
Pratik Mankawde
dfd67b8124 fix(telemetry): eliminate duplicate suppressed attribute on tx.receive span
The OTel C++ SDK's SetAttribute appends rather than overwrites on
in-flight spans. Setting suppressed=false as a default then overriding
to true resulted in both values appearing in the exported span.

Fix: remove the default-false set, place suppressed=false once after
the HashRouter check passes (non-suppressed path), and suppressed=true
remains only in the suppressed path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 17:23:59 +01:00
Pratik Mankawde
f60c995fe1 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-03 16:52:00 +01:00
Pratik Mankawde
fff8598a33 Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-06-03 16:52:00 +01:00
Pratik Mankawde
ac1805f0a4 feat(telemetry): add spanmetrics dimensions and dashboard panels for enriched attrs
Collector config: add tx_type, ter_result, txq_status, consensus_state,
load_type, is_batch as spanmetrics dimensions so they appear as
Prometheus labels for dashboard queries.

New dashboard panels:
- Transaction Overview: Rate by Type, Results by Type, TxQ Status (pie),
  Transactor Duration p95 by Type
- Consensus Health: Outcome Distribution (pie), Failures Over Time
- RPC Performance: Resource Cost by Command, Batch vs Single

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:51:51 +01:00
Pratik Mankawde
365907ab22 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-06-03 16:40:22 +01:00
Pratik Mankawde
8b5ded4324 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-06-03 16:40:22 +01:00
Pratik Mankawde
39f3b86d17 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-03 16:40:22 +01:00
Pratik Mankawde
2ef026aef5 Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-06-03 16:40:22 +01:00
Pratik Mankawde
03fffec640 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-06-03 16:40:22 +01:00
Pratik Mankawde
a13a858112 feat(telemetry): add tx.transactor span for per-transactor execution timing
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.

Adds libxrpl.tx > xrpl.telemetry levelization dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:40:10 +01:00
Pratik Mankawde
a4bc7bd611 Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-06-03 16:32:31 +01:00
Pratik Mankawde
8adb5d03da Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-06-03 16:32:31 +01:00
Pratik Mankawde
66552e7858 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-03 16:32:31 +01:00
Pratik Mankawde
2264a8427a Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-06-03 16:32:31 +01:00
Pratik Mankawde
c5bdaafc39 Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-06-03 16:32:31 +01:00
Pratik Mankawde
4b6c1c270f feat(telemetry): add tx.transactor span for per-transactor execution timing
Wraps Transactor::operator() with a span that captures tx_type,
ter_result, and applied. This is the universal dispatch point — every
transaction flows through it, giving per-type latency breakdown.

Adds libxrpl.tx > xrpl.telemetry levelization dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:32:16 +01:00
Pratik Mankawde
3eeb8b3730 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-03 16:22:40 +01:00
Pratik Mankawde
93c27997b4 Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-06-03 16:22:35 +01:00
Pratik Mankawde
ac79a5123e Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd
Resolve runbook conflict: keep both phase 6 ledger/peer span tables
AND new insights/sample queries section from the enrichment work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:22:20 +01:00
Pratik Mankawde
1b227a1eff docs(telemetry): update runbook with enriched attributes and sample queries
Adds comprehensive "Insights and Sample Queries" section showing operators
what questions they can answer with the newly-added span attributes:
- Transaction workflow analysis (filter by tx_type, fee, ter_result)
- TxQ health (txq_status, ledger_changed)
- RPC debugging (is_batch, request_payload_size, load_type)
- PathFinding performance (dest_currency, num_source_assets)
- Consensus health (consensus_state, is_bow_out, disputes_count)
- Cross-subsystem correlation examples

Also updates all span reference tables with the new attributes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:18:43 +01:00
Pratik Mankawde
b0e9e1a24d Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-06-03 16:16:53 +01:00
Pratik Mankawde
bf0b843ce1 docs(telemetry): document Task 4.9 consensus span attribute gap fill
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:16:43 +01:00
Pratik Mankawde
fce770e4f4 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-06-03 16:15:43 +01:00
Pratik Mankawde
8dd5ac55e8 docs(telemetry): document Task 3.11 TX/TxQ span attribute gap fill
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:15:33 +01:00
Pratik Mankawde
507828edde Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-06-03 16:14:57 +01:00
Pratik Mankawde
aca6623f14 docs(telemetry): document Task 2.10 RPC/PathFind span attribute gap fill
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:14:49 +01:00
Pratik Mankawde
765c96919c feat(telemetry): enrich consensus spans with state, disputes, and ledger_hash
Adds workflow-critical attributes to consensus spans:
- consensus.proposal.send: is_bow_out (identifies resignation proposals)
- consensus.accept: consensus_state (yes/moved_on/expired), disputes_count
- consensus.validation.send: ledger_hash (correlates validation to ledger)

Enables answering: "Did we reach consensus or time out?", "How many
disputes existed at acceptance?", "Which ledger did we validate?"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:09:41 +01:00
Pratik Mankawde
7a9215a4d3 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-06-03 16:07:02 +01:00
Pratik Mankawde
dd9cde88f3 fix(telemetry): qualify tx_span with telemetry:: namespace in apply()
The apply() function doesn't have a `using namespace telemetry` directive
(unlike processTransaction), so tx_span attrs need explicit qualification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:06:51 +01:00
Pratik Mankawde
e52f1470b6 Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing 2026-06-03 16:02:26 +01:00
Pratik Mankawde
1a2f9a71f5 feat(telemetry): add ter_result and applied attributes to tx.process span
Enriches the tx.process span with final outcome after batch application:
- ter_result: the TER code string (e.g., "tesSUCCESS", "tecPATH_DRY")
- applied: boolean whether the transaction was included in the ledger

These attributes complete the tx.process span lifecycle — it now captures
identity (tx_type, tx_hash), intent (fee, sequence), and outcome
(ter_result, applied) for full workflow traceability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 16:02:04 +01:00
Pratik Mankawde
ebf107e73c feat(telemetry): enrich TX and TxQ spans with tx_type, fee, sequence, and status
Adds workflow-identifying attributes to transaction lifecycle spans:
- tx.process: tx_type, fee (drops), sequence
- tx.receive: tx_type
- txq.enqueue: tx_type
- txq.accept.tx: txq_status (applied/failed/retried)
- txq.accept: ledger_changed

Enables filtering traces by transaction type (Payment, AMMDeposit, etc.)
and understanding TxQ outcomes without correlating tx_hash externally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 15:52:21 +01:00
Pratik Mankawde
d5f9242f84 Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing 2026-06-03 15:47:42 +01:00
Pratik Mankawde
84fc829be3 feat(telemetry): enrich RPC and PathFind spans with workflow-identifying attributes
Wire up span attributes that enable filtering/grouping traces by request
characteristics: batch detection, payload size, resource cost category,
command name on WS spans, and pathfinding search parameters (destination
amount/currency, source asset count).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-03 15:46:40 +01:00
Pratik Mankawde
25e08b1840 clang-tidy fixes
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-02 10:46:27 +01:00
Pratik Mankawde
994e425804 more clang-tid fixes!
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-01 18:07:23 +01:00
Pratik Mankawde
bed6770751 Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-01 16:58:22 +01:00
Pratik Mankawde
dfdda305ee clang-tidy fixes
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-01 16:58:05 +01:00
Pratik Mankawde
2ac93c504e fix(tests): rename make_Telemetry to telemetry::makeTelemetry in Peer.h
The project-wide rename check changed the factory function name but
missed this call site in the consensus simulation test framework.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-01 16:45:38 +01:00
Pratik Mankawde
47f3480fb9 fix(telemetry): replace indirect OTel includes with direct headers in Log.cpp
clang-tidy misc-include-cleaner requires each symbol to be provided by
a directly-included header. Replace the convenience trace/context.h
(which only provides GetSpan/SetSpan) with the specific headers for
kSpanKey, holds_alternative, get, shared_ptr, Span, and span.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-01 16:11:14 +01:00
Pratik Mankawde
20a6274a48 minor namespace fix
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-06-01 16:07:58 +01:00
Pratik Mankawde
7c09c38a5b Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation 2026-06-01 16:01:56 +01:00
Pratik Mankawde
c5ae16cdd9 Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics 2026-06-01 16:01:42 +01:00
Pratik Mankawde
1162b6f3bc Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd 2026-06-01 16:00:14 +01:00
Pratik Mankawde
0bcc7635ac Merge branch 'pratik/otel-phase4-consensus-tracing' into pratik/otel-phase5-docs-deployment 2026-06-01 16:00:00 +01:00