rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-06-03 00:36:48 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	34bf61ff77	merge: pratik/otel-phase9-metric-gap-fill fix(SpanKind) into pratik/otel-phase10-workload-validation # Conflicts: # docker/telemetry/otel-collector-config.yaml # docker/telemetry/xrpld-telemetry.cfg	2026-05-14 15:59:39 +01:00
Pratik Mankawde	53e1ff82d8	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-14 14:01:46 +01:00
Pratik Mankawde	8df3ea1bbe	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-05-14 14:01:41 +01:00
Pratik Mankawde	5a6882f119	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # docker/telemetry/otel-collector-config.yaml	2026-05-14 14:01:36 +01:00
Pratik Mankawde	b449db0434	fix(telemetry): align spanmetrics dimensions, Tempo tags, and dashboard queries with C++ attribute names Spanmetrics dimensions used xrpl.rpc.command etc. but C++ emits bare "command". Tempo tags for phase6-added consensus/tx/peer filters used qualified names but C++ uses bare names. Dashboard panel referenced xrpl_tx_suppressed (never populated) instead of suppressed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 14:01:12 +01:00
Pratik Mankawde	9babfff3c8	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-05-14 13:59:19 +01:00
Pratik Mankawde	61ab5c6fe3	fix(telemetry): align Tempo consensus search tags with C++ attribute names Consensus span attributes use bare names (close_time_correct, consensus_state, close_resolution_ms) and shared canonical attrs (xrpl.ledger.seq) per SpanNames.h. xrpl.consensus.mode and xrpl.consensus.round are correct (domain-qualified to avoid collision). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 13:59:08 +01:00
Pratik Mankawde	837f7e7b50	Merge branch 'pratik/otel-phase3-tx-tracing' into pratik/otel-phase4-consensus-tracing	2026-05-14 13:58:38 +01:00
Pratik Mankawde	b392035544	fix(telemetry): align Tempo TX search tags with C++ attribute names Transaction span attributes use bare names (local, tx_status) per SpanNames.h convention, not xrpl.tx.* qualified names. xrpl.tx.hash is correct (shared canonical attr defined in SpanNames.h). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 13:58:31 +01:00
Pratik Mankawde	450004ebd8	Merge branch 'pratik/otel-phase2-rpc-tracing' into pratik/otel-phase3-tx-tracing	2026-05-14 13:58:19 +01:00
Pratik Mankawde	6f403fdd1b	fix(telemetry): align Tempo search tags with C++ span attribute names RPC span attributes use bare names (command, rpc_status, rpc_role) per the naming convention in SpanNames.h, not xrpl.rpc.* qualified names. Node health attributes (amendment_blocked, server_state) are resource attributes set at Tracer init, not span attributes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 13:58:13 +01:00
Pratik Mankawde	5dc4ae8fcc	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-14 13:49:59 +01:00
Pratik Mankawde	690841e934	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-05-14 13:49:51 +01:00
Pratik Mankawde	7d61a4a0ef	feat(telemetry): add missing Phase 9 metric panels to dashboards 13 metrics from 09-data-collection-reference.md were not displayed on any Grafana dashboard. Adds panels for all of them: system-node-health.json (+7 panels): - NodeStore Bytes Read/Written (node_written_bytes, node_read_bytes) - NodeStore Read Threads & Duration (node_reads_duration_us, read_request_bundle, read_threads_running, read_threads_total) - AL_size added to Cache Sizes panel - Current Ledger Index (ledger_current_index) - NuDB Storage Size (storage_detail{metric="nudb_bytes"}) rippled-validator-health.json (+2 panels): - UNL Blocked (validator_health{metric="unl_blocked"}) - Agreement/Missed Counters Rate (validation_agreements_total, validation_missed_total) rippled-job-queue.json (+1 panel): - Transaction Overflow Rate (jq_trans_overflow_total) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 13:32:55 +01:00
Pratik Mankawde	93caaba5ca	fix(telemetry): recover Phase 6 dashboard panels lost during statsd→system rename Panels 8-15 from statsd-node-health.json and panels 8-9 from statsd-network-traffic.json were lost when Phase 7 renamed these files to system-*. The merge (`5cd71ed107`) took Phase 7's smaller version without the extra panels added by commit `b933e8ae00` on Phase 6. Recovered panels (system-node-health.json): - Key Jobs Execution Time (11 job types) - Key Jobs Dequeue Wait Time (11 job types) - FullBelowCache Size - FullBelowCache Hit Rate - Ledger Publish Gap (validated - published age delta) - State Duration Rate (Full vs Tracking) - All Jobs Execution Time Detail (34 job types) - All Jobs Dequeue Wait Detail (34 job types) Recovered panels (system-network-traffic.json): - Duplicate Traffic (Wasted Bandwidth) - All Traffic Categories Detail (topk 15 by byte rate) All recovered panels updated to include exported_instance=~"$node" filter per project dashboard guidelines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 12:33:18 +01:00
Pratik Mankawde	02fe838257	auto refresh at 5 seconds Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-13 19:00:36 +01:00
Pratik Mankawde	20477e5494	validator path changes Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-13 18:49:21 +01:00
Pratik Mankawde	f0c6227c06	added config for devnet test run Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-13 18:42:57 +01:00
Pratik Mankawde	a04459f1f8	fix(telemetry): update collector config + tempo datasource + design doc for simplified attr names - otel-collector-config.yaml: spanmetrics dimensions use new bare names. - tempo.yaml: TraceQL filter tags use new bare names. - 02-design-decisions.md: strip xrpl.txq.* prefix from planned attrs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 16:47:36 +01:00
Pratik Mankawde	815e2b1f5d	refactor(telemetry): fix remaining old attr refs in tests, docs, workload - Update Telemetry.h doc example: xrpl.rpc.command -> command. - Update SpanGuardFactory.cpp test: use new bare attr names. - Update TESTING.md: rename attr refs in span table + PromQL example. - Update expected_spans.json: all attrs match simplified naming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 16:21:18 +01:00
Pratik Mankawde	ec8e3e2950	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-05-13 16:17:49 +01:00
Pratik Mankawde	495d5bd8a0	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-13 16:17:12 +01:00
Pratik Mankawde	6cd910f06f	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-05-13 16:17:05 +01:00
Pratik Mankawde	5cd71ed107	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics	2026-05-13 16:16:50 +01:00
Pratik Mankawde	9e27120a15	refactor(telemetry): simplify ledger/peer attr naming on phase-6, update dashboards - Add canonical ledgerHash (xrpl.ledger.hash) to SpanNames.h. - LedgerSpanNames: reuse shared canonicals (ledgerSeq, closeTime, closeTimeCorrect, closeResolutionMs, ledgerHash); bare names for tx_count, tx_failed, validations. - PeerSpanNames: reuse shared canonicals (peerId, ledgerHash); bare names for proposal_trusted, validation_full, validation_trusted. - Update call sites in BuildLedger.cpp, LedgerMaster.cpp, PeerImp.cpp. - Update 5 Grafana dashboards: strip xrpl.<domain>. prefix from per-span attr refs in PromQL/TraceQL queries. Keep rule-5 entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 16:16:30 +01:00
Pratik Mankawde	592e546f82	fix(telemetry): align Phase 10 workload configs with xrpld_ metric prefix Phase 10's workload validation configs (expected_metrics.json, regression-metrics.json, validate_telemetry.py) queried the MetricsRegistry metrics under the rippled_ prefix, but MetricsRegistry emits them as xrpld_ (see MetricsRegistry.cpp). On a live run the workload validator reported every MetricsRegistry metric as missing, masking genuine regressions. Rename the following to xrpld_ across the workload validator, expected-metrics manifest, and regression-metrics template: - nodestore_state, cache_metrics, txq_metrics, load_factor_metrics, object_count - rpc_method_started_total / _finished_total / _errored_total / _duration_us - job_queued_total / _started_total / _finished_total / _queued_duration_us_bucket / _running_duration_us_bucket - peer_quality, server_info, validator_health, ledger_economy, db_metrics, complete_ledgers, build_info, state_tracking, storage_detail - ledgers_closed_total, validations_sent_total, validations_checked_total, state_changes_total - validation_agreement, validation_agreements_total, validation_missed_total Mirrors the phase-9 fix in commit `5601615952`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-13 15:01:13 +01:00
Pratik Mankawde	201da0e00d	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-05-13 14:59:45 +01:00
Pratik Mankawde	5601615952	fix(telemetry): align Phase 9 dashboards and integration-test with xrpld_ metric prefix MetricsRegistry emits OTel SDK metrics with the xrpld_ prefix (MetricsRegistry.cpp defines "xrpld_nodestore_state", "xrpld_cache_metrics", etc.), but the Phase 9 dashboards and the Step 10c integration-test assertions introduced in `892fee638a` queried the rippled_ prefix. Every Phase 9 panel and assertion therefore rendered "No data" or failed on a live run, even though the underlying series were being exported correctly. Rename the rippled_ prefix to xrpld_ for every MetricsRegistry metric in dashboards and the integration test: - nodestore_state, cache_metrics, txq_metrics, load_factor_metrics, object_count - rpc_method_started_total / _finished_total / _errored_total / _duration_us_bucket - job_queued_total / _started_total / _finished_total / _queued_duration_us_bucket / _running_duration_us_bucket - peer_quality, server_info, validator_health, ledger_economy, db_metrics, complete_ledgers, build_info, state_tracking - ledgers_closed_total, validations_sent_total, validations_checked_total, state_changes_total - validation_agreement (ValidationTracker 1h/24h/7d windows) Also add ValidationTracker window-gauge assertions to Step 10c of integration-test.sh so the 1h/24h/7d agreement and miss counts are checked alongside the other Phase 9 gauges. The rippled_ prefix is preserved for beast::insight metrics (rippled_LedgerMaster_, rippled_Peer_Finder_, rippled_total_, rippled_Overlay_, rippled_State_Accounting_, rippled_transactions_, rippled_proposals_, rippled_validations_Messages_) because those flow through the StatsD-style OTelCollector configured with `[insight] prefix=rippled` and remain on that prefix by design. Verified against a live 6-node consensus network: all 22 Phase 9 + ValidationTracker assertions now report 6+ series per metric. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-13 14:59:00 +01:00
Pratik Mankawde	8e9e852b74	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-05-13 12:24:15 +01:00
Pratik Mankawde	db04120f74	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-13 12:24:00 +01:00
Pratik Mankawde	fac3287912	fix(telemetry): use .batches for Tempo trace lookup in integration test Tempo /api/traces/{id} returns OTLP-shaped JSON with a top-level "batches" key, not "data". The cross-check in check_log_correlation was querying jq '.data | length' which always returned null, causing the Log-Tempo cross-check to fail even when the trace existed.	2026-05-13 12:16:41 +01:00
Pratik Mankawde	782d98d249	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-13 11:40:15 +01:00
Pratik Mankawde	c096eeb239	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-13 11:30:22 +01:00
Pratik Mankawde	e49c5997b7	added loki config. Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-06 17:37:43 +01:00
Pratik Mankawde	85330920ac	feat(telemetry): add Loki service and filelog receiver for Phase 8 log ingestion Cherry-pick Loki infrastructure from phase-10 back to where it belongs (Phase 8, Tasks 8.2/8.3): - Add Loki 3.4.2 service to docker-compose.yml (port 3100) - Add filelog receiver to OTel Collector config (tails debug.log, regex_parser extracts trace_id/span_id/partition/severity) - Add otlphttp/loki exporter (uses Loki 3.x native OTLP ingestion) - Add logs pipeline: filelog -> batch -> otlphttp/loki - Add health_check extension - Mount xrpld log directory into collector container - Add prometheus-data and loki-data persistent volumes StatsD receiver intentionally excluded — Phase 7 migrated to native OTLP metrics, making the StatsD receiver unnecessary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:55:45 +01:00
Pratik Mankawde	fac6c3ac1d	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation	2026-05-06 14:34:17 +01:00
Pratik Mankawde	a8549a7ab2	fix(telemetry): address code review findings for Phase 8 log-trace correlation - Replace GetSpan() with direct context value check in Logs::format() to avoid heap allocation (new DefaultSpan) on the no-span path - Restore Phase 7 documentation accidentally deleted during merge - Fix undefined $JAEGER variable → use $TEMPO in integration test - Remove useless LCOV_EXCL markers around #ifdef block - Fix indentation inconsistencies in Log.cpp injection block - Remove incorrect url field from loki.yaml derivedFields - Update stale code sample in Phase8_taskList.md to match implementation - Correct "<10ns" performance claims to accurate ~15-20ns (no-span) and ~50ns (active-span) measurements across all docs - Replace Jaeger references with Tempo in TESTING.md (port 16686→3200) - Improve error handling in check_log_correlation(): track files_scanned, detect missing log files, fix silent grep error masking Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:32:46 +01:00
Pratik Mankawde	761688383d	fix(telemetry): address code review issues in OTelCollector - Fix use-after-free: extract gauge callback to static function and call RemoveCallback in ~OTelGaugeImpl() before unregistering from collector - Use memory_order_acq_rel on callHooks() debounce CAS for proper happens-before relationship between hook invocations - Add explicit 2s timeout to ForceFlush() in destructor to prevent blocking indefinitely when OTLP endpoint is unreachable at shutdown - Add OTLP receiver to metrics pipeline so native OTel metrics from xrpld are actually received by the collector - Remove stale health check port from docker-compose (extension was removed from collector config) - Clarify fallback docs: StatsD path requires re-enabling receiver/port - Fix comments: Counter uses uint64_t not int64_t, gauge clamps to [0, INT64_MAX] not [0, UINT64_MAX] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 14:24:52 +01:00
Pratik Mankawde	a0477f9475	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation	2026-04-29 21:11:03 +01:00
Pratik Mankawde	1658d3dc40	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-04-29 21:09:47 +01:00
Pratik Mankawde	8e7a2d6c53	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # OpenTelemetryPlan/08-appendix.md # OpenTelemetryPlan/OpenTelemetryPlan.md	2026-04-29 21:07:32 +01:00
Pratik Mankawde	9adcc49171	fix: re-apply phase-7 doc/config changes lost during merge Re-applies phase-7 unique modifications to documentation and configuration files that were overwritten when taking phase-6's versions during the merge conflict resolution. Changes: - docker-compose.yml: comment out StatsD port 8125, add OTLP notes - otel-collector-config.yaml: remove StatsD receiver, update pipeline - integration-test.sh: server=otel, check_otel_metric, StatsD port check - telemetry-runbook.md: System Metrics section, server=otel config, troubleshooting for missing OTel metrics - 02-design-decisions.md: Phase 7 coexistence strategy notes - 05-configuration-reference.md: OTel System Metrics correlation - 06-implementation-phases.md: add Phase 7 section (~180 lines) - OpenTelemetryPlan.md: update phases table (7 phases, 60.6 days) - 08-appendix.md: add Phase7_taskList.md to document index - Delete 5 statsd-*.json dashboards (replaced by system-*.json) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 21:05:48 +01:00
Pratik Mankawde	8e44c95d6a	fix: address bashate warnings in benchmark.sh (E042/E044) Separate local declarations from assignments to avoid hiding errors, and use [[ instead of [ for non-POSIX comparisons. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:42:01 +01:00
Pratik Mankawde	b659d43395	fix: address CI rename checks (rippled -> xrpld) in phase-10 docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:40:44 +01:00
Pratik Mankawde	70d86d7ebf	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # OpenTelemetryPlan/09-data-collection-reference.md # OpenTelemetryPlan/OpenTelemetryPlan.md # docker/telemetry/docker-compose.yml # docker/telemetry/grafana/dashboards/statsd-network-traffic.json # docker/telemetry/otel-collector-config.yaml # src/xrpld/overlay/detail/PeerImp.cpp	2026-04-29 20:38:00 +01:00
Pratik Mankawde	9e12e660fe	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:25:13 +01:00
Pratik Mankawde	7ab6f4d34b	fix: address CI rename checks (rippled -> xrpld) in phase-8 docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:09:43 +01:00
Pratik Mankawde	81b47afde7	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # OpenTelemetryPlan/08-appendix.md # OpenTelemetryPlan/OpenTelemetryPlan.md # docker/telemetry/grafana/dashboards/statsd-network-traffic.json # docker/telemetry/grafana/dashboards/statsd-node-health.json # docker/telemetry/grafana/dashboards/statsd-rpc-pathfinding.json	2026-04-29 20:07:43 +01:00
Pratik Mankawde	769668579a	Merge branch 'pratik/otel-phase6-statsd' into pratik/otel-phase7-native-metrics # Conflicts: # .codecov.yml # .github/scripts/levelization/results/ordering.txt # .github/workflows/reusable-clang-tidy-files.yml # CMakeLists.txt # OpenTelemetryPlan/00-tracing-fundamentals.md # OpenTelemetryPlan/01-architecture-analysis.md # OpenTelemetryPlan/02-design-decisions.md # OpenTelemetryPlan/03-implementation-strategy.md # OpenTelemetryPlan/04-code-samples.md # OpenTelemetryPlan/05-configuration-reference.md # OpenTelemetryPlan/06-implementation-phases.md # OpenTelemetryPlan/07-observability-backends.md # OpenTelemetryPlan/08-appendix.md # OpenTelemetryPlan/09-data-collection-reference.md # OpenTelemetryPlan/OpenTelemetryPlan.md # OpenTelemetryPlan/POC_taskList.md # OpenTelemetryPlan/Phase2_taskList.md # OpenTelemetryPlan/Phase3_taskList.md # OpenTelemetryPlan/Phase4_taskList.md # OpenTelemetryPlan/Phase5_IntegrationTest_taskList.md # OpenTelemetryPlan/Phase5_taskList.md # OpenTelemetryPlan/presentation.md # cfg/xrpld-example.cfg # conan.lock # conanfile.py # cspell.config.yaml # docker/telemetry/TESTING.md # docker/telemetry/docker-compose.yml # docker/telemetry/grafana/dashboards/consensus-health.json # docker/telemetry/grafana/dashboards/transaction-overview.json # docker/telemetry/grafana/provisioning/dashboards/dashboards.yaml # docker/telemetry/grafana/provisioning/datasources/tempo.yaml # docker/telemetry/integration-test.sh # docker/telemetry/otel-collector-config.yaml # docker/telemetry/tempo.yaml # docker/telemetry/xrpld-telemetry.cfg # docs/build/telemetry.md # docs/telemetry-runbook.md # include/xrpl/core/ServiceRegistry.h # include/xrpl/protocol/detail/features.macro # include/xrpl/telemetry/SpanGuard.h # include/xrpl/telemetry/Telemetry.h # include/xrpl/telemetry/TraceContextPropagator.h # src/libxrpl/basics/MallocTrim.cpp # src/libxrpl/nodestore/backend/MemoryFactory.cpp # src/libxrpl/nodestore/backend/NuDBFactory.cpp # src/libxrpl/nodestore/backend/RocksDBFactory.cpp # src/libxrpl/telemetry/NullTelemetry.cpp # src/libxrpl/telemetry/Telemetry.cpp # src/libxrpl/telemetry/TelemetryConfig.cpp # src/tests/libxrpl/basics/MallocTrim.cpp # src/tests/libxrpl/telemetry/TelemetryConfig.cpp # src/xrpld/app/consensus/RCLConsensus.cpp # src/xrpld/app/consensus/RCLConsensus.h # src/xrpld/app/ledger/detail/BuildLedger.cpp # src/xrpld/app/ledger/detail/LedgerMaster.cpp # src/xrpld/app/main/Application.cpp # src/xrpld/app/misc/NetworkOPs.cpp # src/xrpld/consensus/Consensus.h # src/xrpld/overlay/detail/PeerImp.cpp # src/xrpld/rpc/detail/RPCHandler.cpp # src/xrpld/rpc/detail/ServerHandler.cpp	2026-04-29 19:50:32 +01:00
Pratik Mankawde	8fb33b0818	feat(telemetry): add Phase 4 consensus tracing with SpanGuard API Instrument the consensus subsystem with OpenTelemetry spans covering the full round lifecycle: round start, establish phase, proposal send, ledger close, position updates, consensus check, accept, validation send, and mode changes. Key design choices adapted from the original Phase 4 implementation to the new SpanGuard factory pattern introduced in Phase 3: - Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs (consensus round spans share trace_id across validators via ledger hash) - Add SpanGuard::addEvent() overload with key-value attribute pairs (used for dispute.resolve events during position updates) - Add ConsensusSpanNames.h with compile-time span name constants following the colocated *SpanNames.h pattern from Phase 3 - Add consensusTraceStrategy config option ("deterministic"/"attribute") for cross-node trace correlation strategy selection - Use SpanGuard::linkedSpan() for follows-from relationships between consecutive rounds and cross-thread validation spans - Use SpanGuard::captureContext() for thread-safe context propagation from consensus thread to jtACCEPT worker thread Spans produced: consensus.round, consensus.proposal.send, consensus.ledger_close, consensus.establish, consensus.update_positions, consensus.check, consensus.accept, consensus.accept.apply, consensus.validation.send, consensus.mode_change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 17:32:56 +01:00
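
Commit 8fb33b0818 says consensus round spans share a trace_id across validators because SpanGuard::hashSpan() derives it from the ledger hash. This sketch shows why a deterministic derivation gives that property. It assumes, purely for illustration, that the 16-byte trace ID is the first 16 bytes of the 32-byte hash; `LedgerHash`, `TraceId`, `traceIdFromHash` and `toHex` are hypothetical names, and the real derivation inside SpanGuard is not shown in this log.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical types: a 32-byte ledger hash and a 16-byte OTel trace ID.
using LedgerHash = std::array<std::uint8_t, 32>;
using TraceId = std::array<std::uint8_t, 16>;

// Assumption: take the leading 16 bytes of the hash. Any pure function of
// the hash has the same key property: every validator that sees the same
// ledger hash computes the same trace ID without coordinating.
TraceId traceIdFromHash(LedgerHash const& hash)
{
    TraceId id{};
    for (std::size_t i = 0; i < id.size(); ++i)
        id[i] = hash[i];
    return id;
}

// Render as the 32-character lowercase hex form used in W3C traceparent.
std::string toHex(TraceId const& id)
{
    static char const digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(id.size() * 2);
    for (std::uint8_t b : id)
    {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}
```

Because the result depends only on the hash, two validators that close the same ledger produce identical trace IDs with no extra message exchange. A production version would also need to avoid the all-zero trace ID, which W3C Trace Context treats as invalid.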

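Commits 85330920ac and a8549a7ab2 describe a filelog receiver whose regex_parser pulls trace_id and span_id out of debug.log lines tagged by Logs::format(). A rough C++ equivalent of that extraction step follows, assuming a hypothetical `trace_id=<32 hex> span_id=<16 hex>` layout; the actual line format and the collector's regex are not shown in this log. The field widths follow W3C Trace Context (16-byte trace ID, 8-byte span ID), and `TraceRef` and `extractTraceRef` are illustrative names.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// IDs recovered from one log line, as lowercase hex strings.
struct TraceRef
{
    std::string traceId;  // 32 hex chars (16 bytes)
    std::string spanId;   // 16 hex chars (8 bytes)
};

// Assumed layout: "... trace_id=<32 hex> span_id=<16 hex> ...".
// Lines written with no active span carry no IDs and yield nullopt.
std::optional<TraceRef> extractTraceRef(std::string const& line)
{
    static std::regex const pattern(
        R"(trace_id=([0-9a-f]{32})\s+span_id=([0-9a-f]{16}))");
    std::smatch m;
    if (!std::regex_search(line, m, pattern))
        return std::nullopt;
    return TraceRef{m[1].str(), m[2].str()};
}
```

Once the trace ID is extracted, a log line can be joined to the matching trace in Tempo, which is what the check_log_correlation cross-check in the integration test exercises.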

109 Commits
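
Commit 761688383d switches the callHooks() debounce compare-and-swap to memory_order_acq_rel. This is a hypothetical illustration of that pattern, not the OTelCollector code: callers pass a monotonic timestamp, at most one caller per interval wins the compare-exchange, and acq_rel on success makes each winner synchronize with the previous winner's CAS. `HookDebounce` and `tryEnter` are invented names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical debounce gate: lets at most one caller per interval proceed.
class HookDebounce
{
public:
    explicit HookDebounce(std::int64_t intervalNs)
        : intervalNs_(intervalNs), lastNs_(-intervalNs)  // first call passes
    {
    }

    // Returns true if the caller should run the hooks at time nowNs.
    bool tryEnter(std::int64_t nowNs)
    {
        std::int64_t last = lastNs_.load(std::memory_order_acquire);
        if (nowNs - last < intervalNs_)
            return false;  // still inside the debounce window
        // Of several concurrent callers reading the same `last`, only one
        // can move lastNs_ forward. acq_rel on success: the acquire half
        // sees writes the previous winner made before its CAS, and the
        // release half publishes this winner's earlier writes to the next.
        return lastNs_.compare_exchange_strong(
            last, nowNs, std::memory_order_acq_rel, std::memory_order_acquire);
    }

private:
    std::int64_t const intervalNs_;
    std::atomic<std::int64_t> lastNs_;
};
```

With a relaxed CAS the winner would still be unique, but nothing would order the hook state written by one winner before the next winner reads it; that ordering is what the commit's happens-before wording refers to.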