From f3a095ab653fdbbf1bf9f1fc9a94837d0552c535 Mon Sep 17 00:00:00 2001 From: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com> Date: Thu, 14 May 2026 16:09:48 +0100 Subject: [PATCH] docs(telemetry): align Phase 1a plan docs with Phase 1b implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase-1a plan documents advertised OTLP/gRPC on port 4317 as the default exporter, four unparsed [telemetry] config keys, and "Phase 4a Complete" status with exit-criteria checkboxes marked done. Every downstream branch through Phase 5 ships only OTLP/HTTP on port 4318 via OtlpHttpExporterFactory, never parses the advertised keys, and the Phase 4 work is not yet delivered. Fixes: - 02-design-decisions.md: flip §2.1.1 SDK dependency recommendations to OTLP/HTTP (shipped) with OTLP/gRPC marked Future. Update §2.2 architecture diagram and text from OTLP/gRPC:4317 to OTLP/HTTP:4318. Rewrite §2.2.1 as "OTLP/HTTP (Shipped)" and §2.2.2 as "OTLP/gRPC (Future Work — Planned Upgrade)" with a concrete checklist (Conan dep, config parsing, factory branch, runbook/dashboard updates) for landing the gRPC transport later. - 05-configuration-reference.md: drop the fabricated exporter/otlp_grpc key and the :4317 default from the sample config block and the options-summary table. Move trace_pathfind, trace_txq, trace_validator, trace_amendment into a new "Planned (not yet implemented)" table citing the phase that will add each one. Keep the example config minimal so copy-paste does not produce a silently-ignored stanza. - 06-implementation-phases.md: reset Phase 4 Exit Criteria checkboxes from [x] to [ ] (Phase 4 is not shipped at Phase-1a time). Rename "Phase 4a Complete" to "Phase 4a Plan" and describe the work as future. Replace the broken forward link to Phase4_taskList.md (introduced in the Phase 2 PR) with a sentence pointing readers to where that spec will land. Renumber the final section 6.12 to 6.11 so it sits directly after 6.10; section 6.11 ("Effort Summary") was intentionally removed in earlier edits. --- OpenTelemetryPlan/02-design-decisions.md | 74 ++++++++++++------- .../05-configuration-reference.md | 70 ++++++++++-------- OpenTelemetryPlan/06-implementation-phases.md | 20 ++--- 3 files changed, 97 insertions(+), 67 deletions(-) diff --git a/OpenTelemetryPlan/02-design-decisions.md b/OpenTelemetryPlan/02-design-decisions.md index fe87fc78db..184b40bca2 100644 --- a/OpenTelemetryPlan/02-design-decisions.md +++ b/OpenTelemetryPlan/02-design-decisions.md @@ -13,13 +13,13 @@ **Primary Choice**: OpenTelemetry C++ SDK (`opentelemetry-cpp`) -| Component | Purpose | Required | -| --------------------------------------- | ---------------------- | ----------- | -| `opentelemetry-cpp::api` | Tracing API headers | Yes | -| `opentelemetry-cpp::sdk` | SDK implementation | Yes | -| `opentelemetry-cpp::ext` | Extensions (exporters) | Yes | -| `opentelemetry-cpp::otlp_grpc_exporter` | OTLP/gRPC export | Recommended | -| `opentelemetry-cpp::otlp_http_exporter` | OTLP/HTTP export | Alternative | +| Component | Purpose | Required | +| --------------------------------------- | ---------------------- | ------------------------- | +| `opentelemetry-cpp::api` | Tracing API headers | Yes | +| `opentelemetry-cpp::sdk` | SDK implementation | Yes | +| `opentelemetry-cpp::ext` | Extensions (exporters) | Yes | +| `opentelemetry-cpp::otlp_http_exporter` | OTLP/HTTP export | Yes (shipped in Phase 1b) | +| `opentelemetry-cpp::otlp_grpc_exporter` | OTLP/gRPC export | Future (not yet wired up) | ### 2.1.2 Instrumentation Strategy @@ -51,9 +51,9 @@ flowchart TB elastic["Elastic
APM"] end - node1 -->|"OTLP/gRPC
:4317"| collector - node2 -->|"OTLP/gRPC
:4317"| collector - node3 -->|"OTLP/gRPC
:4317"| collector + node1 -->|"OTLP/HTTP
:4318"| collector + node2 -->|"OTLP/HTTP
:4318"| collector + node3 -->|"OTLP/HTTP
:4318"| collector collector --> tempo collector --> elastic @@ -65,27 +65,15 @@ flowchart TB **Reading the diagram:** -- **xrpld Nodes (blue)**: The source of telemetry data. Each xrpld node exports spans via OTLP/gRPC on port 4317. +- **xrpld Nodes (blue)**: The source of telemetry data. Each xrpld node exports spans via OTLP/HTTP on port 4318 (the only exporter shipped in Phase 1b). - **OpenTelemetry Collector (red)**: The central aggregation point that receives spans from all nodes. Can run as a sidecar (per-node) or standalone (shared). Handles batching, filtering, and routing. - **Observability Backends (green)**: The storage and visualization destinations. Tempo is the recommended backend for both development and production, and Elastic APM is an alternative. The Collector routes to one or more backends. -- **Arrows (nodes to collector to backends)**: The data pipeline -- spans flow from nodes to the Collector over gRPC, then the Collector fans out to the configured backends. +- **Arrows (nodes to collector to backends)**: The data pipeline -- spans flow from nodes to the Collector over HTTP, then the Collector fans out to the configured backends. -### 2.2.1 OTLP/gRPC (Recommended) +### 2.2.1 OTLP/HTTP (Shipped in Phase 1b) ```cpp -// Configuration for OTLP over gRPC -namespace otlp = opentelemetry::exporter::otlp; - -otlp::OtlpGrpcExporterOptions opts; -opts.endpoint = "localhost:4317"; -opts.useTls = true; -opts.sslCaCertPath = "/path/to/ca.crt"; -``` - -### 2.2.2 OTLP/HTTP (Alternative) - -```cpp -// Configuration for OTLP over HTTP +// Configuration for OTLP over HTTP (the only exporter currently wired up). namespace otlp = opentelemetry::exporter::otlp; otlp::OtlpHttpExporterOptions opts; @@ -93,6 +81,40 @@ opts.url = "http://localhost:4318/v1/traces"; opts.content_type = otlp::HttpRequestContentType::kJson; // or kBinary ``` +### 2.2.2 OTLP/gRPC (Future Work — Planned Upgrade) + +OTLP/gRPC is planned as a future upgrade from the HTTP exporter. The gRPC +transport offers lower per-span overhead and tighter back-pressure semantics +than HTTP/JSON, making it attractive for production deployments once the HTTP +path is validated in earlier phases. + +Required to land this upgrade: + +1. Add `opentelemetry-cpp::otlp_grpc_exporter` to the Conan recipe (the + dependency already exists but is not linked in Phase 1b builds). +2. Extend `TelemetryConfig.cpp` to parse an `exporter` key (`otlp_http` + default, `otlp_grpc` opt-in) and a gRPC endpoint override. +3. In `Telemetry::start()` branch on the parsed exporter type and construct + either `OtlpHttpExporterFactory::Create(httpOpts)` or + `OtlpGrpcExporterFactory::Create(grpcOpts)` accordingly. +4. Update the runbook and dashboards to document the alternate port and TLS + settings. + +Example Phase 1b+ gRPC configuration (when wired up): + +```cpp +// Configuration for OTLP over gRPC (future work). +namespace otlp = opentelemetry::exporter::otlp; + +otlp::OtlpGrpcExporterOptions opts; +opts.endpoint = ":4317"; +opts.use_ssl_credentials = true; +opts.ssl_credentials_cacert_path = "/path/to/ca.crt"; +``` + +Until that work lands, `OtlpGrpcExporterOptions` is **not** used by any code +path in Phase 1b through Phase 5. + --- ## 2.3 Span Naming Conventions diff --git a/OpenTelemetryPlan/05-configuration-reference.md b/OpenTelemetryPlan/05-configuration-reference.md index 56627c3b6c..6c7161be7b 100644 --- a/OpenTelemetryPlan/05-configuration-reference.md +++ b/OpenTelemetryPlan/05-configuration-reference.md @@ -26,11 +26,10 @@ Add to `cfg/xrpld-example.cfg`: # # Enable/disable telemetry (default: 0 = disabled) # enabled=1 # -# # Exporter type: "otlp_grpc" (default), "otlp_http", or "none" -# exporter=otlp_grpc -# -# # OTLP endpoint (default: localhost:4317 for gRPC, localhost:4318 for HTTP) -# endpoint=localhost:4317 +# # OTLP endpoint (default: http://localhost:4318/v1/traces - OTLP/HTTP) +# # Note: only OTLP/HTTP is shipped in Phase 1b. OTLP/gRPC support is +# # planned as future work and is not yet parsed by TelemetryConfig.cpp. +# endpoint=http://localhost:4318/v1/traces # # # Use TLS for exporter connection (default: 0) # use_tls=0 @@ -56,10 +55,12 @@ Add to `cfg/xrpld-example.cfg`: # trace_rpc=1 # RPC request handling # trace_peer=0 # Peer messages (high volume, disabled by default) # trace_ledger=1 # Ledger acquisition and building -# trace_pathfind=1 # Path computation (can be expensive) -# trace_txq=1 # Transaction queue and fee escalation -# trace_validator=0 # Validator list and manifest updates (low volume) -# trace_amendment=0 # Amendment voting (very low volume) +# +# # Planned (not yet parsed by TelemetryConfig.cpp): +# # trace_pathfind=1 # Path computation (Phase 2) +# # trace_txq=1 # Transaction queue (Phase 3) +# # trace_validator=0 # Validator list / manifest (future) +# # trace_amendment=0 # Amendment voting (future) # # # Service identification (automatically detected if not specified) # # service_name=xrpld @@ -71,28 +72,35 @@ enabled=0 ### 5.1.2 Configuration Options Summary -| Option | Type | Default | Description | -| --------------------- | ------ | ---------------- | ----------------------------------------- | -| `enabled` | bool | `false` | Enable/disable telemetry | -| `exporter` | string | `"otlp_grpc"` | Exporter type: otlp_grpc, otlp_http, none | -| `endpoint` | string | `localhost:4317` | OTLP collector endpoint | -| `use_tls` | bool | `false` | Enable TLS for exporter connection | -| `tls_ca_cert` | string | `""` | Path to CA certificate file | -| `sampling_ratio` | float | `1.0` | Sampling ratio (0.0-1.0) | -| `batch_size` | uint | `512` | Spans per export batch | -| `batch_delay_ms` | uint | `5000` | Max delay before sending batch (ms) | -| `max_queue_size` | uint | `2048` | Maximum queued spans | -| `trace_transactions` | bool | `true` | Enable transaction tracing | -| `trace_consensus` | bool | `true` | Enable consensus tracing | -| `trace_rpc` | bool | `true` | Enable RPC tracing | -| `trace_peer` | bool | `false` | Enable peer message tracing (high volume) | -| `trace_ledger` | bool | `true` | Enable ledger tracing | -| `trace_pathfind` | bool | `true` | Enable path computation tracing | -| `trace_txq` | bool | `true` | Enable transaction queue tracing | -| `trace_validator` | bool | `false` | Enable validator list/manifest tracing | -| `trace_amendment` | bool | `false` | Enable amendment voting tracing | -| `service_name` | string | `"xrpld"` | Service name for traces | -| `service_instance_id` | string | `` | Instance identifier | +| Option | Type | Default | Description | +| --------------------- | ------ | --------------------------------- | ----------------------------------------- | +| `enabled` | bool | `false` | Enable/disable telemetry | +| `endpoint` | string | `http://localhost:4318/v1/traces` | OTLP/HTTP collector endpoint | +| `use_tls` | bool | `false` | Enable TLS for exporter connection | +| `tls_ca_cert` | string | `""` | Path to CA certificate file | +| `sampling_ratio` | float | `1.0` | Sampling ratio (0.0-1.0) | +| `batch_size` | uint | `512` | Spans per export batch | +| `batch_delay_ms` | uint | `5000` | Max delay before sending batch (ms) | +| `max_queue_size` | uint | `2048` | Maximum queued spans | +| `trace_transactions` | bool | `true` | Enable transaction tracing | +| `trace_consensus` | bool | `true` | Enable consensus tracing | +| `trace_rpc` | bool | `true` | Enable RPC tracing | +| `trace_peer` | bool | `false` | Enable peer message tracing (high volume) | +| `trace_ledger` | bool | `true` | Enable ledger tracing | +| `service_name` | string | `"xrpld"` | Service name for traces | +| `service_instance_id` | string | `` | Instance identifier | + +**Planned (not yet implemented)**: the following options appear in the design +documents but are not parsed by `TelemetryConfig.cpp` in Phase 1b and later +phases. They will be added as the corresponding subsystems are instrumented: + +| Option | Planned Phase | Purpose | +| ----------------- | ------------- | ---------------------------------------- | +| `exporter` | Future | Select between OTLP/HTTP and OTLP/gRPC | +| `trace_pathfind` | Phase 2 | Path computation tracing toggle | +| `trace_txq` | Phase 3 | Transaction queue tracing toggle | +| `trace_validator` | Future | Validator list / manifest update tracing | +| `trace_amendment` | Future | Amendment voting tracing | --- diff --git a/OpenTelemetryPlan/06-implementation-phases.md b/OpenTelemetryPlan/06-implementation-phases.md index ccf1fd54d4..12eea9c67b 100644 --- a/OpenTelemetryPlan/06-implementation-phases.md +++ b/OpenTelemetryPlan/06-implementation-phases.md @@ -166,22 +166,21 @@ gantt ### Exit Criteria -- [x] Complete consensus round traces -- [x] Phase transitions visible -- [x] Proposals and validations traced -- [x] No impact on consensus timing +- [ ] Complete consensus round traces +- [ ] Phase transitions visible +- [ ] Proposals and validations traced +- [ ] No impact on consensus timing - [ ] Multi-validator test network validated -### Implementation Status — Phase 4a Complete +### Implementation Status — Phase 4a Plan -Phase 4a (establish-phase gap fill & cross-node correlation) adds: +Phase 4a (establish-phase gap fill & cross-node correlation) will add: - **Deterministic trace ID** derived from `previousLedger.id()` so all validators in the same round share the same `trace_id` (switchable via `consensus_trace_strategy` config: `"deterministic"` or `"attribute"`). See [Configuration Reference](./05-configuration-reference.md) for full - configuration options. The `consensus_trace_strategy` option will be - documented in the configuration reference as part of Phase 4a implementation. + configuration options. - **Round lifecycle spans**: `consensus.round` with round-to-round span links. - **Establish phase**: `consensus.establish`, `consensus.update_positions` (with `dispute.resolve` events), `consensus.check` (with threshold tracking). @@ -192,7 +191,8 @@ Phase 4a (establish-phase gap fill & cross-node correlation) adds: (`startRoundTracing`, `createValidationSpan`, `startEstablishTracing`, `updateEstablishTracing`, `endEstablishTracing`). -See [Phase4_taskList.md](./Phase4_taskList.md) for the full spec and implementation notes. +The `Phase4_taskList.md` spec document is introduced in the Phase 2 PR (#6424) +and will contain the full task breakdown and implementation notes. --- @@ -490,7 +490,7 @@ Clear, measurable criteria for each phase. --- -## 6.12 Recommended Implementation Order +## 6.11 Recommended Implementation Order Based on ROI analysis, implement in this exact order: