Commit Graph

4 Commits

Author SHA1 Message Date
Pratik Mankawde
eafdd59121 feat(telemetry): add Phase 4 consensus tracing with SpanGuard API
Instrument the consensus subsystem with OpenTelemetry spans covering
the full round lifecycle: round start, establish phase, proposal send,
ledger close, position updates, consensus check, accept, validation
send, and mode changes.

Key design choices adapted from the original Phase 4 implementation
to the new SpanGuard factory pattern introduced in Phase 3:

- Add SpanGuard::hashSpan() for category-gated hash-derived trace IDs
  (consensus round spans share trace_id across validators via ledger hash)
- Add SpanGuard::addEvent() overload with key-value attribute pairs
  (used for dispute.resolve events during position updates)
- Add ConsensusSpanNames.h with compile-time span name constants
  following the colocated *SpanNames.h pattern from Phase 3
- Add consensusTraceStrategy config option ("deterministic"/"attribute")
  for cross-node trace correlation strategy selection
- Use SpanGuard::linkedSpan() for follows-from relationships between
  consecutive rounds and cross-thread validation spans
- Use SpanGuard::captureContext() for thread-safe context propagation
  from consensus thread to jtACCEPT worker thread

Spans produced: consensus.round, consensus.proposal.send,
consensus.ledger_close, consensus.establish, consensus.update_positions,
consensus.check, consensus.accept, consensus.accept.apply,
consensus.validation.send, consensus.mode_change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 14:48:26 +01:00
Pratik Mankawde
714ac49f47 fix(telemetry): address Phase 1b code review findings
Redesign SpanGuard with pimpl idiom to hide all OpenTelemetry types
from public headers. Add global Telemetry accessor so SpanGuard factory
methods work without explicit Telemetry references. Add child/linked
span creation and cross-thread context propagation. Update plan docs
to reflect macro removal in favor of SpanGuard factory pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 16:31:50 +01:00
Pratik Mankawde
4830715d57 fix(telemetry): address review findings and PR #6437 comments
Critical fixes:
- Restore accidentally removed mallocTrim call and MallocTrim.h include
- Add missing shouldTraceLedger() to interface and all implementations
- Derive networkId/networkType from config_->NETWORK_ID (0=mainnet,
  1=testnet, 2=devnet) instead of leaving defaults unpopulated
- Clamp sampling_ratio to [0.0, 1.0] in config parser

PR comment fixes:
- Rename rippled -> xrpld in service name defaults, getTracer() calls,
  Docker network, comments, and docs/build/telemetry.md
- Remove exporter config option (only otlp_http supported)
- Add trace_ledger and service_name to example config
- Clarify head-based sampling semantics in config comments
- Add filter descriptions for span intrinsic filters in Grafana datasource
- Add inline comments to Docker Compose services

Docker/config improvements:
- Remove deprecated version: "3.8" from docker-compose.yml
- Pin images: collector 0.121.0, grafana 11.5.2
- Add health_check extension to otel-collector-config.yaml
- Comment out Tempo metrics_generator remote_write (no Prometheus service)
- Add Prometheus datasource caveat in Grafana datasource config

Other:
- Revert unrelated formatting changes in ServiceRegistry.h
- Change Conan telemetry default to False (matches CMake OFF)
- Add CLAUDE.md-required docs (ASCII diagrams, usage examples,
  @note thread-safety) to Telemetry.h and SpanGuard.h

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 17:07:01 +01:00
Pratik Mankawde
ce82e067bd Phase 1b: Telemetry core infrastructure - CMake, Conan, SpanGuard, config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 15:37:58 +01:00