rippled

mirror of https://github.com/XRPLF/rippled.git synced 2026-06-03 08:46:46 +00:00

Author	SHA1	Message	Date
Pratik Mankawde	ac57a91b77	merge: phase-9 (dashboard UID + line-number cleanup, detach callbacks) into phase-10 # Conflicts: # docker/telemetry/TESTING.md	2026-05-14 17:23:55 +01:00
Pratik Mankawde	a9f52458b3	merge: pratik/otel-phase8-log-correlation (dashboard UID + line-number cleanup) into pratik/otel-phase9-metric-gap-fill # Conflicts: # docker/telemetry/grafana/dashboards/consensus-health.json # docker/telemetry/grafana/dashboards/ledger-operations.json # docker/telemetry/grafana/dashboards/peer-network.json # docker/telemetry/grafana/dashboards/rpc-performance.json # docker/telemetry/grafana/dashboards/system-ledger-data-sync.json # docker/telemetry/grafana/dashboards/system-network-traffic.json # docker/telemetry/grafana/dashboards/system-node-health.json # docker/telemetry/grafana/dashboards/system-overlay-traffic-detail.json # docker/telemetry/grafana/dashboards/system-rpc-pathfinding.json # docker/telemetry/grafana/dashboards/transaction-overview.json	2026-05-14 17:10:12 +01:00
Pratik Mankawde	0e5e802e5e	merge: pratik/otel-phase7-native-metrics (dashboard UID + line-number cleanup) into pratik/otel-phase8-log-correlation	2026-05-14 17:07:34 +01:00
Pratik Mankawde	6985e1948b	merge: pratik/otel-phase6-statsd (line-number + docs cleanup) into pratik/otel-phase7-native-metrics # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # docker/telemetry/grafana/dashboards/system-ledger-data-sync.json # docker/telemetry/grafana/dashboards/system-network-traffic.json # docker/telemetry/grafana/dashboards/system-node-health.json # docker/telemetry/grafana/dashboards/system-overlay-traffic-detail.json # docker/telemetry/grafana/dashboards/system-rpc-pathfinding.json	2026-05-14 17:07:15 +01:00
Pratik Mankawde	a844c14e49	merge: pratik/otel-phase5-docs-deployment (line-number + docs cleanup) into pratik/otel-phase6-statsd	2026-05-14 17:00:05 +01:00
Pratik Mankawde	92bc0b24b8	docs(telemetry): drop volatile line numbers from Phase 4 span-catalog table Phase 4 added a span catalog in `06-implementation-phases.md` listing the source location for each consensus span. Line numbers `Consensus.h:707`, `RCLConsensus.cpp:232/341/492/541/900` drift on every refactor and would become stale PR after PR. Filename alone is enough for operators to grep — the RCLConsensus.cpp spans are already unambiguous from the span name itself.	2026-05-14 16:59:43 +01:00
Pratik Mankawde	44cdc8133e	fix(telemetry): phase-6 dashboards — rename UIDs, add $node filter, drop line numbers Phase-6 introduces ledger-operations, peer-network, and the five StatsD dashboards. Align them with the rest of the chain: - Rename dashboard UIDs from `rippled-` to `xrpld-` so the provisioned UIDs match the post-rename-script documentation (`docs.sh` rewrites .md but not .json, so the two drifted). Runbook references `xrpld-rpc-perf`, `xrpld-transactions`, etc., now the JSON matches. - Add the `$node` template variable + `exported_instance=~"$node"` filter to every target in the five `statsd-*` dashboards. Mirrors the pattern already used by consensus-health, ledger-operations, and peer-network per the project rule that every dashboard must support per-node filtering. - Strip `:<line>` (and `:NN-NN` range) suffixes from C++ file references in every dashboard panel description and in docker/telemetry/TESTING.md. Line numbers drift on every refactor; the filename alone is enough to grep. - Replace stale `rpc.request` entries with the real emitted span names (`rpc.http_request`, `rpc.ws_upgrade`, `rpc.ws_message`, `rpc.process`) in TESTING.md so operators can copy-paste the filters and hit real traces. - Also drop the `:706` line ref from the `StatsDCollector.cpp` callout in `06-implementation-phases.md`.	2026-05-14 16:51:14 +01:00
Pratik Mankawde	dfe91e071f	merge: phase-5 (runbook span-name + line-number fixes) into phase-6 # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # docs/telemetry-runbook.md	2026-05-14 16:42:13 +01:00
Pratik Mankawde	41d72cb51b	merge: phase-3 (phase-1a docs fixes) into phase-4 # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md	2026-05-14 16:24:27 +01:00
Pratik Mankawde	45e1c15d24	merge: pratik/otel-phase2-rpc-tracing (phase-1a docs fixes) into pratik/otel-phase3-tx-tracing # Conflicts: # OpenTelemetryPlan/05-configuration-reference.md	2026-05-14 16:13:35 +01:00
Pratik Mankawde	f3a095ab65	docs(telemetry): align Phase 1a plan docs with Phase 1b implementation Phase-1a plan documents advertised OTLP/gRPC on port 4317 as the default exporter, four unparsed [telemetry] config keys, and "Phase 4a Complete" status with exit-criteria checkboxes marked done. Every downstream branch through Phase 5 ships only OTLP/HTTP on port 4318 via OtlpHttpExporterFactory, never parses the advertised keys, and the Phase 4 work is not yet delivered. Fixes: - 02-design-decisions.md: flip §2.1.1 SDK dependency recommendations to OTLP/HTTP (shipped) with OTLP/gRPC marked Future. Update §2.2 architecture diagram and text from OTLP/gRPC:4317 to OTLP/HTTP:4318. Rewrite §2.2.1 as "OTLP/HTTP (Shipped)" and §2.2.2 as "OTLP/gRPC (Future Work — Planned Upgrade)" with a concrete checklist (Conan dep, config parsing, factory branch, runbook/dashboard updates) for landing the gRPC transport later. - 05-configuration-reference.md: drop the fabricated exporter/otlp_grpc key and the :4317 default from the sample config block and the options-summary table. Move trace_pathfind, trace_txq, trace_validator, trace_amendment into a new "Planned (not yet implemented)" table citing the phase that will add each one. Keep the example config minimal so copy-paste does not produce a silently-ignored stanza. - 06-implementation-phases.md: reset Phase 4 Exit Criteria checkboxes from [x] to [ ] (Phase 4 is not shipped at Phase-1a time). Rename "Phase 4a Complete" to "Phase 4a Plan" and describe the work as future. Replace the broken forward link to Phase4_taskList.md (introduced in the Phase 2 PR) with a sentence pointing readers to where that spec will land. Renumber the final section 6.12 to 6.11 so it sits directly after 6.10; section 6.11 ("Effort Summary") was intentionally removed in earlier edits.	2026-05-14 16:09:48 +01:00
Pratik Mankawde	782d98d249	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-05-13 11:40:15 +01:00
Pratik Mankawde	c096eeb239	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill	2026-05-13 11:30:22 +01:00
Pratik Mankawde	a8549a7ab2	fix(telemetry): address code review findings for Phase 8 log-trace correlation - Replace GetSpan() with direct context value check in Logs::format() to avoid heap allocation (new DefaultSpan) on the no-span path - Restore Phase 7 documentation accidentally deleted during merge - Fix undefined $JAEGER variable → use $TEMPO in integration test - Remove useless LCOV_EXCL markers around #ifdef block - Fix indentation inconsistencies in Log.cpp injection block - Remove incorrect url field from loki.yaml derivedFields - Update stale code sample in Phase8_taskList.md to match implementation - Correct "<10ns" performance claims to accurate ~15-20ns (no-span) and ~50ns (active-span) measurements across all docs - Replace Jaeger references with Tempo in TESTING.md (port 16686→3200) - Improve error handling in check_log_correlation(): track files_scanned, detect missing log files, fix silent grep error masking Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 14:32:46 +01:00
Pratik Mankawde	9adcc49171	fix: re-apply phase-7 doc/config changes lost during merge Re-applies phase-7 unique modifications to documentation and configuration files that were overwritten when taking phase-6's versions during the merge conflict resolution. Changes: - docker-compose.yml: comment out StatsD port 8125, add OTLP notes - otel-collector-config.yaml: remove StatsD receiver, update pipeline - integration-test.sh: server=otel, check_otel_metric, StatsD port check - telemetry-runbook.md: System Metrics section, server=otel config, troubleshooting for missing OTel metrics - 02-design-decisions.md: Phase 7 coexistence strategy notes - 05-configuration-reference.md: OTel System Metrics correlation - 06-implementation-phases.md: add Phase 7 section (~180 lines) - OpenTelemetryPlan.md: update phases table (7 phases, 60.6 days) - 08-appendix.md: add Phase7_taskList.md to document index - Delete 5 statsd-*.json dashboards (replaced by system-*.json) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 21:05:48 +01:00
Pratik Mankawde	b659d43395	fix: address CI rename checks (rippled -> xrpld) in phase-10 docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:40:44 +01:00
Pratik Mankawde	70d86d7ebf	Merge branch 'pratik/otel-phase9-metric-gap-fill' into pratik/otel-phase10-workload-validation # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # OpenTelemetryPlan/09-data-collection-reference.md # OpenTelemetryPlan/OpenTelemetryPlan.md # docker/telemetry/docker-compose.yml # docker/telemetry/grafana/dashboards/statsd-network-traffic.json # docker/telemetry/otel-collector-config.yaml # src/xrpld/overlay/detail/PeerImp.cpp	2026-04-29 20:38:00 +01:00
Pratik Mankawde	9e12e660fe	Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:25:13 +01:00
Pratik Mankawde	7ab6f4d34b	fix: address CI rename checks (rippled -> xrpld) in phase-8 docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 20:09:43 +01:00
Pratik Mankawde	81b47afde7	Merge branch 'pratik/otel-phase7-native-metrics' into pratik/otel-phase8-log-correlation # Conflicts: # OpenTelemetryPlan/06-implementation-phases.md # OpenTelemetryPlan/08-appendix.md # OpenTelemetryPlan/OpenTelemetryPlan.md # docker/telemetry/grafana/dashboards/statsd-network-traffic.json # docker/telemetry/grafana/dashboards/statsd-node-health.json # docker/telemetry/grafana/dashboards/statsd-rpc-pathfinding.json	2026-04-29 20:07:43 +01:00
Pratik Mankawde	ef10c754b1	fix(telemetry): address code review findings for Phase 4 consensus tracing Fix quorum attribute to use actual validator quorum instead of proposer count, add missing ConsensusState::Expired handling in haveConsensus() span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve levelization cycle, remove unused constants, enrich proposal receive span with sequence, and correct stale documentation references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 17:32:56 +01:00
Pratik Mankawde	2773de7b54	docs(telemetry): mark Phase 4/4a consensus tracing tasks complete Update Phase4_taskList.md and 06-implementation-phases.md to reflect completed implementation of all remaining Phase 4/4a tasks (4.2-4.6, 4a.5, 4a.6, 4a.8). Update exit criteria and summary tables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 17:32:56 +01:00
Pratik Mankawde	264516c37d	docs update Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-04-29 17:32:56 +01:00
Pratik Mankawde	8fb33b0818	feat(telemetry): add Phase 4 consensus tracing with SpanGuard API Instrument the consensus subsystem with OpenTelemetry spans covering the full round lifecycle: round start, establish phase, proposal send, ledger close, position updates, consensus check, accept, validation send, and mode changes. Key design choices adapted from the original Phase 4 implementation to the new SpanGuard factory pattern introduced in Phase 3: - Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs (consensus round spans share trace_id across validators via ledger hash) - Add SpanGuard::addEvent() overload with key-value attribute pairs (used for dispute.resolve events during position updates) - Add ConsensusSpanNames.h with compile-time span name constants following the colocated *SpanNames.h pattern from Phase 3 - Add consensusTraceStrategy config option ("deterministic"/"attribute") for cross-node trace correlation strategy selection - Use SpanGuard::linkedSpan() for follows-from relationships between consecutive rounds and cross-thread validation spans - Use SpanGuard::captureContext() for thread-safe context propagation from consensus thread to jtACCEPT worker thread Spans produced: consensus.round, consensus.proposal.send, consensus.ledger_close, consensus.establish, consensus.update_positions, consensus.check, consensus.accept, consensus.accept.apply, consensus.validation.send, consensus.mode_change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 17:32:56 +01:00
Pratik Mankawde	312dec2baa	docs(telemetry): add deterministic TX trace ID design (Task 3.9) Add trace_id = txHash[0:16] strategy so all nodes handling the same transaction independently produce spans under the same trace_id, combined with protobuf span_id propagation for parent-child ordering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 17:32:49 +01:00
Pratik Mankawde	b933e8ae00	feat(telemetry): add missing StatsD dashboard panels from production dashboard Compared shared production Grafana dashboard against Phase 6 StatsD dashboards and added 10 missing panels covering job execution/dequeue timers, cache metrics, ledger publish gap, state duration rate, duplicate traffic, and detailed traffic breakdown. Node Health dashboard: 8 → 16 panels, plus quantile template variable. Network Traffic dashboard: 8 → 10 panels, Total Network Bytes now rate(). Updated runbook, data collection reference, and implementation phases docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 14:02:27 +01:00
Pratik Mankawde	1a96f75954	fix(telemetry): apply rename script to phase 6 documentation Replace remaining rippled/Ripple references with xrpld/XRPL in data collection reference, implementation phases, and runbook docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 11:30:50 +01:00
Pratik Mankawde	88e25119f0	Merge branch 'pratik/otel-phase5-docs-deployment' into pratik/otel-phase6-statsd	2026-04-29 11:29:14 +01:00
Pratik Mankawde	c5a59645d9	fix(telemetry): resolve merge conflicts, bashate, and rename for phase 5 Resolve merge conflicts taking phase 4 consensus span improvements, fix bashate indentation in integration test script, and apply rename script to Phase5 integration test docs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-29 11:28:54 +01:00
Pratik Mankawde	c01f8ae99c	fix(telemetry): address code review findings for Phase 4 consensus tracing Fix quorum attribute to use actual validator quorum instead of proposer count, add missing ConsensusState::Expired handling in haveConsensus() span, move ConsensusSpanNames.h to xrpld/consensus/ to resolve levelization cycle, remove unused constants, enrich proposal receive span with sequence, and correct stale documentation references. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 18:14:00 +01:00
Pratik Mankawde	1e4ce19556	docs(telemetry): mark Phase 4/4a consensus tracing tasks complete Update Phase4_taskList.md and 06-implementation-phases.md to reflect completed implementation of all remaining Phase 4/4a tasks (4.2-4.6, 4a.5, 4a.6, 4a.8). Update exit criteria and summary tables. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-28 16:17:06 +01:00
Pratik Mankawde	90c2321bb8	docs update Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>	2026-04-28 15:33:45 +01:00
Pratik Mankawde	cbbd6ebee2	feat(telemetry): add Phase 6 StatsD metrics, ledger/peer spans, and expanded dashboards Integrate the existing StatsD metrics pipeline (beast::insight) into the OpenTelemetry observability stack and add new trace spans for ledger build/store/validate and peer proposal/validation receive. Phase 5b — Ledger, peer, and transaction spans: - Add ledger.build span with close time attributes in BuildLedger.cpp - Add tx.apply span with tx_count/tx_failed in BuildLedger.cpp - Add ledger.store and ledger.validate spans in LedgerMaster.cpp - Add peer.proposal.receive span with trusted attribute in PeerImp.cpp - Add peer.validation.receive span with ledger_hash, full, trusted attributes in PeerImp.cpp - Add ledger-operations and peer-network Grafana dashboards Phase 6 — StatsD metrics integration: - Add StatsD UDP receiver (port 8125) to OTel Collector - Add 5 StatsD Grafana dashboards: node health, network traffic, overlay traffic detail, ledger data sync, RPC pathfinding - Add 09-data-collection-reference.md cataloging all metrics/spans - Update existing dashboards with new span panels - Expand telemetry runbook and integration test script - Add codecov exclusions for telemetry modules Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:00:57 +01:00
Pratik Mankawde	ae475793d5	docs(telemetry): mark Phase 5 deferred tasks and fix stale macro reference Mark Tasks 5.3 (alert definitions) and 5.6 (training materials) as "Deferred — post-MVP" in the implementation phases document to accurately reflect current delivery scope. Add status column to the Phase 5 task table. Also fix stale reference to XRPL_TRACE_* macros in Phase 4a section — the implementation uses SpanGuard factory methods. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 15:00:40 +01:00
Pratik Mankawde	34ee231d62	feat(telemetry): add Phase 4 consensus tracing with SpanGuard API Instrument the consensus subsystem with OpenTelemetry spans covering the full round lifecycle: round start, establish phase, proposal send, ledger close, position updates, consensus check, accept, validation send, and mode changes. Key design choices adapted from the original Phase 4 implementation to the new SpanGuard factory pattern introduced in Phase 3: - Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs (consensus round spans share trace_id across validators via ledger hash) - Add SpanGuard::addEvent() overload with key-value attribute pairs (used for dispute.resolve events during position updates) - Add ConsensusSpanNames.h with compile-time span name constants following the colocated *SpanNames.h pattern from Phase 3 - Add consensusTraceStrategy config option ("deterministic"/"attribute") for cross-node trace correlation strategy selection - Use SpanGuard::linkedSpan() for follows-from relationships between consecutive rounds and cross-thread validation spans - Use SpanGuard::captureContext() for thread-safe context propagation from consensus thread to jtACCEPT worker thread Spans produced: consensus.round, consensus.proposal.send, consensus.ledger_close, consensus.establish, consensus.update_positions, consensus.check, consensus.accept, consensus.accept.apply, consensus.validation.send, consensus.mode_change Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:34:39 +01:00
Pratik Mankawde	c585d9b66c	docs(telemetry): add deterministic TX trace ID design (Task 3.9) Add trace_id = txHash[0:16] strategy so all nodes handling the same transaction independently produce spans under the same trace_id, combined with protobuf span_id propagation for parent-child ordering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-28 14:29:16 +01:00
Pratik Mankawde	df79d5e74b	feat: add OTel-driven regression gate for Phase 10 telemetry validation Captures per-span / per-RPC / per-job timings from Prometheus after the workload run and diffs them against a committed baseline. Regression requires breaching both a percentage and an absolute bound, tolerating small-value noise. When the baseline is a placeholder, the comparator emits the captured JSON in the exact schema for one-time paste into baselines/baseline-timings.json, and the CI Step Summary surfaces that block for the reviewer. Scope: gate only — automated baseline persistence, benchmark.sh PromQL migration, and the historical trend dashboard remain follow-ups.	2026-04-24 18:53:44 +01:00
Pratik Mankawde	913a4b794c	docs: correct OTel overhead estimates against SDK benchmarks Verified CPU, memory, and network overhead calculations against official OTel C++ SDK benchmarks (969 CI runs) and source code analysis. Key corrections: - Span creation: 200-500ns → 500-1000ns (SDK BM_SpanCreation median ~1000ns; original estimate matched API no-op, not SDK path) - Per-TX overhead: 2.4μs → 4.0μs (2.0% vs 1.2%; still within 1-3%) - Active span memory: ~200 bytes → ~500-800 bytes (Span wrapper + SpanData + std::map attribute storage) - Static memory: ~456KB → ~8.3MB (BatchSpanProcessor worker thread stack ~8MB was omitted) - Total memory ceiling: ~2.3MB → ~10MB - Memory success metric target: <5MB → <10MB - AddEvent: 50-80ns → 100-200ns Added Section 3.5.4 with links to all benchmark sources. Updated presentation.md with matching corrections. High-level conclusions unchanged (1-3% CPU, negligible consensus). Also includes: review fixes, cross-document consistency improvements, additional component tracing docs (PathFinding, TxQ, Validator, etc.), context size corrections (32 → 25 bytes). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	c6fa00fbe3	Remove effort estimates from implementation phases document Strip effort/risk columns from task tables and remove the §6.9 Effort Summary section with its pie chart and resource requirements table. Renumber §6.10 Quick Wins → §6.9. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	bfb8f4f01a	Add Phase 4a implementation status to plan docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	ddf894dcb0	Phase 1a: OpenTelemetry plan documentation Add comprehensive planning documentation for the OpenTelemetry distributed tracing integration: - Tracing fundamentals and concepts - Architecture analysis of rippled's tracing surface area - Design decisions and trade-offs - Implementation strategy and code samples - Configuration reference - Implementation phases roadmap - Observability backend comparison - POC task list and presentation materials Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 15:00:47 +01:00
Pratik Mankawde	7e149f7773	refactor(telemetry): remove residual Jaeger references across chain Fix remaining Jaeger references that accumulated across intermediate branches in the stacked PR chain. These were in files modified by multiple phases where the per-branch fixes didn't cover all additions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 22:35:04 +01:00
Pratik Mankawde	5de8c520d1	Phase 10: Workload validation - synthetic load generation and telemetry checks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:32:02 +01:00
Pratik Mankawde	936c73982d	docs: update Phase 9 docs and dashboard for push_metrics.py parity gauges - Add Task 9.7a to Phase9_taskList.md documenting new gauges - Add metric tables to 09-data-collection-reference.md (server_info, build_info, complete_ledgers, db_metrics, extended cache/nodestore) - Update metric counts from ~50 to ~68 in 06-implementation-phases.md - Add OTel MetricsRegistry gauge reference to telemetry-runbook.md - Add 11 new panels to system-node-health.json Grafana dashboard (server state, uptime, peers, validated seq, last close info, build version, complete ledgers, db sizes, historical fetch rate, peer disconnects) - Fix leftover merge conflict marker in 08-appendix.md - Add ripplex/mseconds to cspell dictionary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:49 +01:00
Pratik Mankawde	892fee638a	Phase 9: Metric gap fill - nodestore, cache, TxQ, load factor dashboards Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:49 +01:00
Pratik Mankawde	fdec3ce5c4	Phase 8: Log-trace correlation with Loki and filelog receiver Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:37 +01:00
Pratik Mankawde	2f7064ace6	Phase 7: Native OTel metrics migration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:24 +01:00
Pratik Mankawde	21192e9b3f	Phase 6: StatsD metrics integration into telemetry pipeline Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:31:07 +01:00
Pratik Mankawde	a127711b86	Phase 4: Consensus tracing - round lifecycle, proposals, validations, close time Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 22:28:33 +01:00
Pratik Mankawde	f135842071	docs: correct OTel overhead estimates against SDK benchmarks Verified CPU, memory, and network overhead calculations against official OTel C++ SDK benchmarks (969 CI runs) and source code analysis. Key corrections: - Span creation: 200-500ns → 500-1000ns (SDK BM_SpanCreation median ~1000ns; original estimate matched API no-op, not SDK path) - Per-TX overhead: 2.4μs → 4.0μs (2.0% vs 1.2%; still within 1-3%) - Active span memory: ~200 bytes → ~500-800 bytes (Span wrapper + SpanData + std::map attribute storage) - Static memory: ~456KB → ~8.3MB (BatchSpanProcessor worker thread stack ~8MB was omitted) - Total memory ceiling: ~2.3MB → ~10MB - Memory success metric target: <5MB → <10MB - AddEvent: 50-80ns → 100-200ns Added Section 3.5.4 with links to all benchmark sources. Updated presentation.md with matching corrections. High-level conclusions unchanged (1-3% CPU, negligible consensus). Also includes: review fixes, cross-document consistency improvements, additional component tracing docs (PathFinding, TxQ, Validator, etc.), context size corrections (32 → 25 bytes). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 15:55:26 +01:00
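
The Task 3.9 commits above (312dec2baa, c585d9b66c) describe a deterministic transaction trace ID, `trace_id = txHash[0:16]`, so that every node handling the same transaction derives the same trace ID independently. A minimal Python sketch of that derivation follows. It assumes the transaction hash arrives as a hex string of at least 16 bytes; the function name `derive_trace_id` is hypothetical and is not taken from the repository, whose actual implementation is C++ and is not shown in this log.

```python
def derive_trace_id(tx_hash_hex: str) -> str:
    """Return the first 16 bytes of a transaction hash as a 32-character hex trace ID.

    Hypothetical helper: illustrates trace_id = txHash[0:16] only.
    """
    tx_hash = bytes.fromhex(tx_hash_hex)
    if len(tx_hash) < 16:
        raise ValueError("transaction hash must be at least 16 bytes")
    # OpenTelemetry trace IDs are 16 bytes, so the hash prefix fits exactly.
    return tx_hash[:16].hex()


if __name__ == "__main__":
    # Made-up 32-byte hash, used purely for illustration.
    example_hash = "ab" * 32
    print(derive_trace_id(example_hash))
```

Because the function depends only on the hash, two nodes given the same transaction produce identical trace IDs without exchanging any trace context; per the commit message, parent-child ordering still relies on span_id propagation over protobuf.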

53 Commits
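
Commit df79d5e74b states that a regression "requires breaching both a percentage and an absolute bound, tolerating small-value noise." The rule can be sketched in Python as below. The function name, the parameter names, and the default thresholds (20% and 5 ms) are assumptions chosen for illustration; the commit message does not give the comparator's actual thresholds or schema.

```python
def is_regression(
    baseline_ms: float,
    current_ms: float,
    pct_threshold: float = 20.0,   # hypothetical percentage bound
    abs_threshold_ms: float = 5.0,  # hypothetical absolute bound, in milliseconds
) -> bool:
    """Flag a regression only when BOTH bounds are breached."""
    delta = current_ms - baseline_ms
    if delta <= 0:
        return False  # same or faster than baseline
    if baseline_ms <= 0:
        # No meaningful percentage for a zero baseline; treat the
        # percentage bound as breached and rely on the absolute bound.
        pct_breached = True
    else:
        pct_breached = (delta / baseline_ms) * 100.0 > pct_threshold
    return pct_breached and delta > abs_threshold_ms


if __name__ == "__main__":
    print(is_regression(1.0, 1.5))      # 50% slower but only 0.5 ms: not flagged
    print(is_regression(100.0, 130.0))  # 30% and 30 ms slower: flagged
```

Requiring both conditions keeps tiny timings, where a fraction of a millisecond is a large percentage, from tripping the gate, while large timings still need a proportionally meaningful slowdown.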