diff --git a/OpenTelemetryPlan/02-design-decisions.md b/OpenTelemetryPlan/02-design-decisions.md index 5df51798dd..6523a53d09 100644 --- a/OpenTelemetryPlan/02-design-decisions.md +++ b/OpenTelemetryPlan/02-design-decisions.md @@ -13,13 +13,13 @@ **Primary Choice**: OpenTelemetry C++ SDK (`opentelemetry-cpp`) -| Component | Purpose | Required | -| --------------------------------------- | ---------------------- | ----------- | -| `opentelemetry-cpp::api` | Tracing API headers | Yes | -| `opentelemetry-cpp::sdk` | SDK implementation | Yes | -| `opentelemetry-cpp::ext` | Extensions (exporters) | Yes | -| `opentelemetry-cpp::otlp_grpc_exporter` | OTLP/gRPC export | Recommended | -| `opentelemetry-cpp::otlp_http_exporter` | OTLP/HTTP export | Alternative | +| Component | Purpose | Required | +| --------------------------------------- | ---------------------- | ------------------------- | +| `opentelemetry-cpp::api` | Tracing API headers | Yes | +| `opentelemetry-cpp::sdk` | SDK implementation | Yes | +| `opentelemetry-cpp::ext` | Extensions (exporters) | Yes | +| `opentelemetry-cpp::otlp_http_exporter` | OTLP/HTTP export | Yes (shipped in Phase 1b) | +| `opentelemetry-cpp::otlp_grpc_exporter` | OTLP/gRPC export | Future (not yet wired up) | ### 2.1.2 Instrumentation Strategy @@ -51,9 +51,9 @@ flowchart TB elastic["Elastic
APM"] end - node1 -->|"OTLP/gRPC
:4317"| collector - node2 -->|"OTLP/gRPC
:4317"| collector - node3 -->|"OTLP/gRPC
:4317"| collector + node1 -->|"OTLP/HTTP
:4318"| collector + node2 -->|"OTLP/HTTP
:4318"| collector + node3 -->|"OTLP/HTTP
:4318"| collector collector --> tempo collector --> elastic @@ -65,27 +65,15 @@ flowchart TB **Reading the diagram:** -- **xrpld Nodes (blue)**: The source of telemetry data. Each xrpld node exports spans via OTLP/gRPC on port 4317. +- **xrpld Nodes (blue)**: The source of telemetry data. Each xrpld node exports spans via OTLP/HTTP on port 4318 (the only exporter shipped in Phase 1b). - **OpenTelemetry Collector (red)**: The central aggregation point that receives spans from all nodes. Can run as a sidecar (per-node) or standalone (shared). Handles batching, filtering, and routing. - **Observability Backends (green)**: The storage and visualization destinations. Tempo is the recommended backend for both development and production, and Elastic APM is an alternative. The Collector routes to one or more backends. -- **Arrows (nodes to collector to backends)**: The data pipeline -- spans flow from nodes to the Collector over gRPC, then the Collector fans out to the configured backends. +- **Arrows (nodes to collector to backends)**: The data pipeline -- spans flow from nodes to the Collector over HTTP, then the Collector fans out to the configured backends. -### 2.2.1 OTLP/gRPC (Recommended) +### 2.2.1 OTLP/HTTP (Shipped in Phase 1b) ```cpp -// Configuration for OTLP over gRPC -namespace otlp = opentelemetry::exporter::otlp; - -otlp::OtlpGrpcExporterOptions opts; -opts.endpoint = "localhost:4317"; -opts.useTls = true; -opts.sslCaCertPath = "/path/to/ca.crt"; -``` - -### 2.2.2 OTLP/HTTP (Alternative) - -```cpp -// Configuration for OTLP over HTTP +// Configuration for OTLP over HTTP (the only exporter currently wired up). namespace otlp = opentelemetry::exporter::otlp; otlp::OtlpHttpExporterOptions opts; @@ -93,6 +81,40 @@ opts.url = "http://localhost:4318/v1/traces"; opts.content_type = otlp::HttpRequestContentType::kJson; // or kBinary ``` +### 2.2.2 OTLP/gRPC (Future Work — Planned Upgrade) + +OTLP/gRPC is planned as a future upgrade from the HTTP exporter. The gRPC +transport offers lower per-span overhead and tighter back-pressure semantics +than HTTP/JSON, making it attractive for production deployments once the HTTP +path is validated in earlier phases. + +Required to land this upgrade: + +1. Add `opentelemetry-cpp::otlp_grpc_exporter` to the Conan recipe (the + dependency already exists but is not linked in Phase 1b builds). +2. Extend `TelemetryConfig.cpp` to parse an `exporter` key (`otlp_http` + default, `otlp_grpc` opt-in) and a gRPC endpoint override. +3. In `Telemetry::start()` branch on the parsed exporter type and construct + either `OtlpHttpExporterFactory::Create(httpOpts)` or + `OtlpGrpcExporterFactory::Create(grpcOpts)` accordingly. +4. Update the runbook and dashboards to document the alternate port and TLS + settings. + +Example Phase 1b+ gRPC configuration (when wired up): + +```cpp +// Configuration for OTLP over gRPC (future work). +namespace otlp = opentelemetry::exporter::otlp; + +otlp::OtlpGrpcExporterOptions opts; +opts.endpoint = ":4317"; +opts.use_ssl_credentials = true; +opts.ssl_credentials_cacert_path = "/path/to/ca.crt"; +``` + +Until that work lands, `OtlpGrpcExporterOptions` is **not** used by any code +path in Phase 1b through Phase 5. + --- ## 2.3 Span Naming Conventions diff --git a/OpenTelemetryPlan/05-configuration-reference.md b/OpenTelemetryPlan/05-configuration-reference.md index 6c94e16513..ff7f21e913 100644 --- a/OpenTelemetryPlan/05-configuration-reference.md +++ b/OpenTelemetryPlan/05-configuration-reference.md @@ -26,11 +26,10 @@ Add to `cfg/xrpld-example.cfg`: # # Enable/disable telemetry (default: 0 = disabled) # enabled=1 # -# # Exporter type: "otlp_grpc" (default), "otlp_http", or "none" -# exporter=otlp_grpc -# -# # OTLP endpoint (default: localhost:4317 for gRPC, localhost:4318 for HTTP) -# endpoint=localhost:4317 +# # OTLP endpoint (default: http://localhost:4318/v1/traces - OTLP/HTTP) +# # Note: only OTLP/HTTP is shipped in Phase 1b. OTLP/gRPC support is +# # planned as future work and is not yet parsed by TelemetryConfig.cpp. +# endpoint=http://localhost:4318/v1/traces # # # Use TLS for exporter connection (default: 0) # use_tls=0 @@ -56,10 +55,12 @@ Add to `cfg/xrpld-example.cfg`: # trace_rpc=1 # RPC request handling # trace_peer=0 # Peer messages (high volume, disabled by default) # trace_ledger=1 # Ledger acquisition and building -# trace_pathfind=1 # Path computation (can be expensive) -# trace_txq=1 # Transaction queue and fee escalation -# trace_validator=0 # Validator list and manifest updates (low volume) -# trace_amendment=0 # Amendment voting (very low volume) +# +# # Planned (not yet parsed by TelemetryConfig.cpp): +# # trace_pathfind=1 # Path computation (Phase 2) +# # trace_txq=1 # Transaction queue (Phase 3) +# # trace_validator=0 # Validator list / manifest (future) +# # trace_amendment=0 # Amendment voting (future) # # # Service identification (automatically detected if not specified) # # service_name=xrpld @@ -71,28 +72,35 @@ enabled=0 ### 5.1.2 Configuration Options Summary -| Option | Type | Default | Description | -| --------------------- | ------ | ---------------- | ----------------------------------------- | -| `enabled` | bool | `false` | Enable/disable telemetry | -| `exporter` | string | `"otlp_grpc"` | Exporter type: otlp_grpc, otlp_http, none | -| `endpoint` | string | `localhost:4317` | OTLP collector endpoint | -| `use_tls` | bool | `false` | Enable TLS for exporter connection | -| `tls_ca_cert` | string | `""` | Path to CA certificate file | -| `sampling_ratio` | float | `1.0` | Sampling ratio (0.0-1.0) | -| `batch_size` | uint | `512` | Spans per export batch | -| `batch_delay_ms` | uint | `5000` | Max delay before sending batch (ms) | -| `max_queue_size` | uint | `2048` | Maximum queued spans | -| `trace_transactions` | bool | `true` | Enable transaction tracing | -| `trace_consensus` | bool | `true` | Enable consensus tracing | -| `trace_rpc` | bool | `true` | Enable RPC tracing | -| `trace_peer` | bool | `false` | Enable peer message tracing (high volume) | -| `trace_ledger` | bool | `true` | Enable ledger tracing | -| `trace_pathfind` | bool | `true` | Enable path computation tracing | -| `trace_txq` | bool | `true` | Enable transaction queue tracing | -| `trace_validator` | bool | `false` | Enable validator list/manifest tracing | -| `trace_amendment` | bool | `false` | Enable amendment voting tracing | -| `service_name` | string | `"xrpld"` | Service name for traces | -| `service_instance_id` | string | `` | Instance identifier | +| Option | Type | Default | Description | +| --------------------- | ------ | --------------------------------- | ----------------------------------------- | +| `enabled` | bool | `false` | Enable/disable telemetry | +| `endpoint` | string | `http://localhost:4318/v1/traces` | OTLP/HTTP collector endpoint | +| `use_tls` | bool | `false` | Enable TLS for exporter connection | +| `tls_ca_cert` | string | `""` | Path to CA certificate file | +| `sampling_ratio` | float | `1.0` | Sampling ratio (0.0-1.0) | +| `batch_size` | uint | `512` | Spans per export batch | +| `batch_delay_ms` | uint | `5000` | Max delay before sending batch (ms) | +| `max_queue_size` | uint | `2048` | Maximum queued spans | +| `trace_transactions` | bool | `true` | Enable transaction tracing | +| `trace_consensus` | bool | `true` | Enable consensus tracing | +| `trace_rpc` | bool | `true` | Enable RPC tracing | +| `trace_peer` | bool | `false` | Enable peer message tracing (high volume) | +| `trace_ledger` | bool | `true` | Enable ledger tracing | +| `service_name` | string | `"xrpld"` | Service name for traces | +| `service_instance_id` | string | `` | Instance identifier | + +**Planned (not yet implemented)**: the following options appear in the design +documents but are not parsed by `TelemetryConfig.cpp` in Phase 1b and later +phases. They will be added as the corresponding subsystems are instrumented: + +| Option | Planned Phase | Purpose | +| ----------------- | ------------- | ---------------------------------------- | +| `exporter` | Future | Select between OTLP/HTTP and OTLP/gRPC | +| `trace_pathfind` | Phase 2 | Path computation tracing toggle | +| `trace_txq` | Phase 3 | Transaction queue tracing toggle | +| `trace_validator` | Future | Validator list / manifest update tracing | +| `trace_amendment` | Future | Amendment voting tracing | --- diff --git a/OpenTelemetryPlan/06-implementation-phases.md b/OpenTelemetryPlan/06-implementation-phases.md index ccf1fd54d4..12eea9c67b 100644 --- a/OpenTelemetryPlan/06-implementation-phases.md +++ b/OpenTelemetryPlan/06-implementation-phases.md @@ -166,22 +166,21 @@ gantt ### Exit Criteria -- [x] Complete consensus round traces -- [x] Phase transitions visible -- [x] Proposals and validations traced -- [x] No impact on consensus timing +- [ ] Complete consensus round traces +- [ ] Phase transitions visible +- [ ] Proposals and validations traced +- [ ] No impact on consensus timing - [ ] Multi-validator test network validated -### Implementation Status — Phase 4a Complete +### Implementation Status — Phase 4a Plan -Phase 4a (establish-phase gap fill & cross-node correlation) adds: +Phase 4a (establish-phase gap fill & cross-node correlation) will add: - **Deterministic trace ID** derived from `previousLedger.id()` so all validators in the same round share the same `trace_id` (switchable via `consensus_trace_strategy` config: `"deterministic"` or `"attribute"`). See [Configuration Reference](./05-configuration-reference.md) for full - configuration options. The `consensus_trace_strategy` option will be - documented in the configuration reference as part of Phase 4a implementation. + configuration options. - **Round lifecycle spans**: `consensus.round` with round-to-round span links. - **Establish phase**: `consensus.establish`, `consensus.update_positions` (with `dispute.resolve` events), `consensus.check` (with threshold tracking). @@ -192,7 +191,8 @@ Phase 4a (establish-phase gap fill & cross-node correlation) adds: (`startRoundTracing`, `createValidationSpan`, `startEstablishTracing`, `updateEstablishTracing`, `endEstablishTracing`). -See [Phase4_taskList.md](./Phase4_taskList.md) for the full spec and implementation notes. +The `Phase4_taskList.md` spec document is introduced in the Phase 2 PR (#6424) +and will contain the full task breakdown and implementation notes. --- @@ -490,7 +490,7 @@ Clear, measurable criteria for each phase. --- -## 6.12 Recommended Implementation Order +## 6.11 Recommended Implementation Order Based on ROI analysis, implement in this exact order: