mirror of
https://github.com/XRPLF/rippled.git
synced 2026-04-29 15:37:57 +00:00
compare with other open source vendors
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
This commit is contained in:
@@ -136,6 +136,7 @@ For traces to work across nodes, **trace context must be propagated** in message
|
||||
### How span_id Changes at Each Hop
|
||||
|
||||
Only **one** `span_id` travels in the context - the sender's current span. Each node:
|
||||
|
||||
1. Extracts the received `span_id` and uses it as the `parent_span_id`
|
||||
2. Creates a **new** `span_id` for its own span
|
||||
3. Sends its own `span_id` as the parent when forwarding
|
||||
@@ -192,19 +193,23 @@ message TMTransaction {
|
||||
Not every trace needs to be recorded. **Sampling** reduces overhead:
|
||||
|
||||
### Head Sampling (at trace start)
|
||||
|
||||
```
|
||||
Request arrives → Random 10% chance → Record or skip entire trace
|
||||
```
|
||||
|
||||
- ✅ Low overhead
|
||||
- ❌ May miss interesting traces
|
||||
|
||||
### Tail Sampling (after trace completes)
|
||||
|
||||
```
|
||||
Trace completes → Collector evaluates:
|
||||
- Error? → KEEP
|
||||
- Slow? → KEEP
|
||||
- Normal? → Sample 10%
|
||||
```
|
||||
|
||||
- ✅ Never loses important traces
|
||||
- ❌ Higher memory usage at collector
|
||||
|
||||
@@ -236,4 +241,4 @@ Trace completes → Collector evaluates:
|
||||
|
||||
---
|
||||
|
||||
*Next: [Architecture Analysis](./01-architecture-analysis.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Next: [Architecture Analysis](./01-architecture-analysis.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -250,6 +250,7 @@ After implementing OpenTelemetry, operators and developers will gain visibility
|
||||
### 1.8.3 Concrete Dashboard Examples
|
||||
|
||||
**Transaction Trace View (Jaeger/Tempo):**
|
||||
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Trace: abc123... (Transaction Submission) Duration: 847ms │
|
||||
@@ -270,6 +271,7 @@ After implementing OpenTelemetry, operators and developers will gain visibility
|
||||
```
|
||||
|
||||
**RPC Performance Dashboard Panel:**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ RPC Command Latency (Last 1 Hour) │
|
||||
@@ -325,4 +327,4 @@ xychart-beta
|
||||
|
||||
---
|
||||
|
||||
*Next: [Design Decisions](./02-design-decisions.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Next: [Design Decisions](./02-design-decisions.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -95,6 +95,7 @@ opts.content_type = otlp::HttpRequestContentType::kJson; // or kBinary
|
||||
```
|
||||
|
||||
**Examples**:
|
||||
|
||||
- `tx.receive` - Transaction received from peer
|
||||
- `consensus.phase.establish` - Consensus establish phase
|
||||
- `rpc.command.server_info` - server_info RPC command
|
||||
@@ -104,51 +105,51 @@ opts.content_type = otlp::HttpRequestContentType::kJson; // or kBinary
|
||||
```yaml
|
||||
# Transaction Spans
|
||||
tx:
|
||||
receive: "Transaction received from network"
|
||||
validate: "Transaction signature/format validation"
|
||||
process: "Full transaction processing"
|
||||
relay: "Transaction relay to peers"
|
||||
apply: "Apply transaction to ledger"
|
||||
receive: "Transaction received from network"
|
||||
validate: "Transaction signature/format validation"
|
||||
process: "Full transaction processing"
|
||||
relay: "Transaction relay to peers"
|
||||
apply: "Apply transaction to ledger"
|
||||
|
||||
# Consensus Spans
|
||||
consensus:
|
||||
round: "Complete consensus round"
|
||||
round: "Complete consensus round"
|
||||
phase:
|
||||
open: "Open phase - collecting transactions"
|
||||
open: "Open phase - collecting transactions"
|
||||
establish: "Establish phase - reaching agreement"
|
||||
accept: "Accept phase - applying consensus"
|
||||
accept: "Accept phase - applying consensus"
|
||||
proposal:
|
||||
receive: "Receive peer proposal"
|
||||
send: "Send our proposal"
|
||||
receive: "Receive peer proposal"
|
||||
send: "Send our proposal"
|
||||
validation:
|
||||
receive: "Receive peer validation"
|
||||
send: "Send our validation"
|
||||
receive: "Receive peer validation"
|
||||
send: "Send our validation"
|
||||
|
||||
# RPC Spans
|
||||
rpc:
|
||||
request: "HTTP/WebSocket request handling"
|
||||
request: "HTTP/WebSocket request handling"
|
||||
command:
|
||||
"*": "Specific RPC command (dynamic)"
|
||||
"*": "Specific RPC command (dynamic)"
|
||||
|
||||
# Peer Spans
|
||||
peer:
|
||||
connect: "Peer connection establishment"
|
||||
disconnect: "Peer disconnection"
|
||||
connect: "Peer connection establishment"
|
||||
disconnect: "Peer disconnection"
|
||||
message:
|
||||
send: "Send protocol message"
|
||||
receive: "Receive protocol message"
|
||||
send: "Send protocol message"
|
||||
receive: "Receive protocol message"
|
||||
|
||||
# Ledger Spans
|
||||
ledger:
|
||||
acquire: "Ledger acquisition from network"
|
||||
build: "Build new ledger"
|
||||
validate: "Ledger validation"
|
||||
close: "Close ledger"
|
||||
acquire: "Ledger acquisition from network"
|
||||
build: "Build new ledger"
|
||||
validate: "Ledger validation"
|
||||
close: "Close ledger"
|
||||
|
||||
# Job Spans
|
||||
job:
|
||||
enqueue: "Job added to queue"
|
||||
execute: "Job execution"
|
||||
enqueue: "Job added to queue"
|
||||
execute: "Job execution"
|
||||
```
|
||||
|
||||
---
|
||||
@@ -173,6 +174,7 @@ resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
|
||||
### 2.4.2 Span Attributes by Category
|
||||
|
||||
#### Transaction Attributes
|
||||
|
||||
```cpp
|
||||
"xrpl.tx.hash" = string // Transaction hash (hex)
|
||||
"xrpl.tx.type" = string // "Payment", "OfferCreate", etc.
|
||||
@@ -184,6 +186,7 @@ resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
|
||||
```
|
||||
|
||||
#### Consensus Attributes
|
||||
|
||||
```cpp
|
||||
"xrpl.consensus.round" = int64 // Round number
|
||||
"xrpl.consensus.phase" = string // "open", "establish", "accept"
|
||||
@@ -196,6 +199,7 @@ resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
|
||||
```
|
||||
|
||||
#### RPC Attributes
|
||||
|
||||
```cpp
|
||||
"xrpl.rpc.command" = string // Command name
|
||||
"xrpl.rpc.version" = int64 // API version
|
||||
@@ -204,6 +208,7 @@ resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
|
||||
```
|
||||
|
||||
#### Peer & Message Attributes
|
||||
|
||||
```cpp
|
||||
"xrpl.peer.id" = string // Peer public key (base58)
|
||||
"xrpl.peer.address" = string // IP:port
|
||||
@@ -215,6 +220,7 @@ resource::SemanticConventions::SERVICE_INSTANCE_ID = <node_public_key_base58>
|
||||
```
|
||||
|
||||
#### Ledger & Job Attributes
|
||||
|
||||
```cpp
|
||||
"xrpl.ledger.hash" = string // Ledger hash
|
||||
"xrpl.ledger.index" = int64 // Ledger sequence/index
|
||||
@@ -352,6 +358,7 @@ rippled already has two observability mechanisms. OpenTelemetry complements (not
|
||||
### 2.6.2 What Each Framework Does Best
|
||||
|
||||
#### PerfLog
|
||||
|
||||
- **Purpose**: Detailed local event logging for RPC and job execution
|
||||
- **Strengths**:
|
||||
- Rich JSON output with timing data
|
||||
@@ -373,6 +380,7 @@ rippled already has two observability mechanisms. OpenTelemetry complements (not
|
||||
```
|
||||
|
||||
#### Beast Insight (StatsD)
|
||||
|
||||
- **Purpose**: Real-time metrics for monitoring dashboards
|
||||
- **Strengths**:
|
||||
- Aggregated metrics (counters, gauges, histograms)
|
||||
@@ -391,6 +399,7 @@ insight.timing("consensus.round", duration);
|
||||
```
|
||||
|
||||
#### OpenTelemetry (NEW)
|
||||
|
||||
- **Purpose**: Distributed request tracing across nodes
|
||||
- **Strengths**:
|
||||
- **Cross-node correlation** via `trace_id`
|
||||
@@ -411,14 +420,14 @@ span->SetAttribute("peer.id", peerId);
|
||||
|
||||
### 2.6.3 When to Use Each
|
||||
|
||||
| Scenario | PerfLog | StatsD | OpenTelemetry |
|
||||
| --------------------------------------- | --------- | ------ | ------------- |
|
||||
| "How many TXs per second?" | ❌ | ✅ | ❌ |
|
||||
| "What's the p99 RPC latency?" | ❌ | ✅ | ✅ |
|
||||
| "Why was this specific TX slow?" | ⚠️ partial | ❌ | ✅ |
|
||||
| "Which node delayed consensus?" | ❌ | ❌ | ✅ |
|
||||
| "What happened on node X at time T?" | ✅ | ❌ | ✅ |
|
||||
| "Show me the TX journey across 5 nodes" | ❌ | ❌ | ✅ |
|
||||
| Scenario | PerfLog | StatsD | OpenTelemetry |
|
||||
| --------------------------------------- | ---------- | ------ | ------------- |
|
||||
| "How many TXs per second?" | ❌ | ✅ | ❌ |
|
||||
| "What's the p99 RPC latency?" | ❌ | ✅ | ✅ |
|
||||
| "Why was this specific TX slow?" | ⚠️ partial | ❌ | ✅ |
|
||||
| "Which node delayed consensus?" | ❌ | ❌ | ✅ |
|
||||
| "What happened on node X at time T?" | ✅ | ❌ | ✅ |
|
||||
| "Show me the TX journey across 5 nodes" | ❌ | ❌ | ✅ |
|
||||
|
||||
### 2.6.4 Coexistence Strategy
|
||||
|
||||
@@ -482,4 +491,4 @@ Status doCommand(RPC::JsonContext& context, Json::Value& result)
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Architecture Analysis](./01-architecture-analysis.md)* | *Next: [Implementation Strategy](./03-implementation-strategy.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Architecture Analysis](./01-architecture-analysis.md)_ | _Next: [Implementation Strategy](./03-implementation-strategy.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -191,6 +191,7 @@ xychart-beta
|
||||
```
|
||||
|
||||
**Notes**:
|
||||
|
||||
- Memory increases linearly with span rate
|
||||
- Batch export prevents unbounded growth
|
||||
- Queue size is configurable (default 2048 spans)
|
||||
@@ -386,8 +387,8 @@ quadrantChart
|
||||
|
||||
### 3.9.5 Backward Compatibility
|
||||
|
||||
| Compatibility | Status | Notes |
|
||||
| --------------- | ------ | ----------------------------------------------------- |
|
||||
| Compatibility | Status | Notes |
|
||||
| --------------- | ------- | ----------------------------------------------------- |
|
||||
| **Config File** | ✅ Full | New `[telemetry]` section is optional |
|
||||
| **Protocol** | ✅ Full | Optional protobuf fields with high field numbers |
|
||||
| **Build** | ✅ Full | `XRPL_ENABLE_TELEMETRY=OFF` produces identical binary |
|
||||
@@ -405,6 +406,7 @@ If issues are discovered after deployment:
|
||||
### 3.9.7 Code Change Examples
|
||||
|
||||
**Minimal RPC Instrumentation (Low Intrusiveness):**
|
||||
|
||||
```cpp
|
||||
// Before
|
||||
void ServerHandler::onRequest(...) {
|
||||
@@ -425,6 +427,7 @@ void ServerHandler::onRequest(...) {
|
||||
```
|
||||
|
||||
**Consensus Instrumentation (Medium Intrusiveness):**
|
||||
|
||||
```cpp
|
||||
// Before
|
||||
void RCLConsensusAdaptor::startRound(...) {
|
||||
@@ -445,4 +448,4 @@ void RCLConsensusAdaptor::startRound(...) {
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Design Decisions](./02-design-decisions.md)* | *Next: [Code Samples](./04-code-samples.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Design Decisions](./02-design-decisions.md)_ | _Next: [Code Samples](./04-code-samples.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -979,4 +979,4 @@ flowchart TB
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Implementation Strategy](./03-implementation-strategy.md)* | *Next: [Configuration Reference](./05-configuration-reference.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Implementation Strategy](./03-implementation-strategy.md)_ | _Next: [Configuration Reference](./05-configuration-reference.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -506,7 +506,7 @@ service:
|
||||
|
||||
```yaml
|
||||
# docker-compose-telemetry.yaml
|
||||
version: '3.8'
|
||||
version: "3.8"
|
||||
|
||||
services:
|
||||
# OpenTelemetry Collector
|
||||
@@ -517,8 +517,8 @@ services:
|
||||
volumes:
|
||||
- ./otel-collector-dev.yaml:/etc/otel-collector-config.yaml:ro
|
||||
ports:
|
||||
- "4317:4317" # OTLP gRPC
|
||||
- "4318:4318" # OTLP HTTP
|
||||
- "4317:4317" # OTLP gRPC
|
||||
- "4318:4318" # OTLP HTTP
|
||||
- "13133:13133" # Health check
|
||||
depends_on:
|
||||
- jaeger
|
||||
@@ -628,8 +628,8 @@ datasources:
|
||||
httpMethod: GET
|
||||
tracesToLogs:
|
||||
datasourceUid: loki
|
||||
tags: ['service.name', 'xrpl.tx.hash']
|
||||
mappedTags: [{ key: 'trace_id', value: 'traceID' }]
|
||||
tags: ["service.name", "xrpl.tx.hash"]
|
||||
mappedTags: [{ key: "trace_id", value: "traceID" }]
|
||||
mapTagNamesEnabled: true
|
||||
filterByTraceID: true
|
||||
serviceMap:
|
||||
@@ -656,7 +656,7 @@ datasources:
|
||||
jsonData:
|
||||
tracesToLogs:
|
||||
datasourceUid: loki
|
||||
tags: ['service.name']
|
||||
tags: ["service.name"]
|
||||
```
|
||||
|
||||
#### Elastic APM
|
||||
@@ -685,10 +685,10 @@ datasources:
|
||||
apiVersion: 1
|
||||
|
||||
providers:
|
||||
- name: 'rippled-dashboards'
|
||||
- name: "rippled-dashboards"
|
||||
orgId: 1
|
||||
folder: 'rippled'
|
||||
folderUid: 'rippled'
|
||||
folder: "rippled"
|
||||
folderUid: "rippled"
|
||||
type: file
|
||||
disableDeletion: false
|
||||
updateIntervalSeconds: 30
|
||||
@@ -880,7 +880,7 @@ In Tempo data source configuration, set up the derived field:
|
||||
jsonData:
|
||||
tracesToLogs:
|
||||
datasourceUid: loki
|
||||
tags: ['trace_id', 'xrpl.tx.hash']
|
||||
tags: ["trace_id", "xrpl.tx.hash"]
|
||||
filterByTraceID: true
|
||||
filterBySpanID: false
|
||||
```
|
||||
@@ -894,9 +894,9 @@ To correlate traces with existing Beast Insight metrics:
|
||||
```yaml
|
||||
# prometheus.yaml
|
||||
scrape_configs:
|
||||
- job_name: 'rippled-statsd'
|
||||
- job_name: "rippled-statsd"
|
||||
static_configs:
|
||||
- targets: ['statsd-exporter:9102']
|
||||
- targets: ["statsd-exporter:9102"]
|
||||
```
|
||||
|
||||
**Step 2: Add exemplars to metrics**
|
||||
@@ -933,4 +933,4 @@ This allows clicking on metric data points to jump directly to the related trace
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Code Samples](./04-code-samples.md)* | *Next: [Implementation Phases](./06-implementation-phases.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Code Samples](./04-code-samples.md)_ | _Next: [Implementation Phases](./06-implementation-phases.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -315,6 +315,7 @@ flowchart TB
|
||||
**Goal**: Get basic tracing working with minimal code changes.
|
||||
|
||||
**What You Get**:
|
||||
|
||||
- RPC request/response traces for all commands
|
||||
- Latency breakdown per RPC command
|
||||
- Error visibility with stack traces
|
||||
@@ -323,6 +324,7 @@ flowchart TB
|
||||
**Code Changes**: ~15 lines in `ServerHandler.cpp`, ~40 lines in new telemetry module
|
||||
|
||||
**Why Start Here**:
|
||||
|
||||
- RPC is the lowest-risk, highest-visibility component
|
||||
- Immediate value for debugging client issues
|
||||
- No cross-node complexity
|
||||
@@ -333,6 +335,7 @@ flowchart TB
|
||||
**Goal**: Add transaction lifecycle tracing across nodes.
|
||||
|
||||
**What You Get**:
|
||||
|
||||
- End-to-end transaction traces from submit to relay
|
||||
- Cross-node correlation (see transaction path)
|
||||
- HashRouter deduplication visibility
|
||||
@@ -341,6 +344,7 @@ flowchart TB
|
||||
**Code Changes**: ~120 lines across 4 files, plus protobuf extension
|
||||
|
||||
**Why Do This Second**:
|
||||
|
||||
- Builds on RPC tracing (transactions submitted via RPC)
|
||||
- Moderate complexity (requires context propagation)
|
||||
- High value for debugging transaction issues
|
||||
@@ -350,6 +354,7 @@ flowchart TB
|
||||
**Goal**: Full observability including consensus.
|
||||
|
||||
**What You Get**:
|
||||
|
||||
- Complete consensus round visibility
|
||||
- Phase transition timing
|
||||
- Validator proposal tracking
|
||||
@@ -358,6 +363,7 @@ flowchart TB
|
||||
**Code Changes**: ~100 lines across 3 consensus files
|
||||
|
||||
**Why Do This Last**:
|
||||
|
||||
- Highest complexity (consensus is critical path)
|
||||
- Requires thorough testing
|
||||
- Lower relative value (consensus issues are rarer)
|
||||
@@ -392,7 +398,7 @@ Clear, measurable criteria for each phase.
|
||||
|
||||
| Criterion | Measurement | Target |
|
||||
| --------------- | ---------------------------------------------------------- | ---------------------------- |
|
||||
| SDK Integration | `cmake --build` succeeds with `-DXRPL_ENABLE_TELEMETRY=ON` | ✅ Compiles |
|
||||
| SDK Integration | `cmake --build` succeeds with `-DXRPL_ENABLE_TELEMETRY=ON` | ✅ Compiles |
|
||||
| Runtime Toggle | `enabled=0` produces zero overhead | <0.1% CPU difference |
|
||||
| Span Creation | Unit test creates and exports span | Span appears in Jaeger |
|
||||
| Configuration | All config options parsed correctly | Config validation tests pass |
|
||||
@@ -534,4 +540,4 @@ flowchart TB
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Configuration Reference](./05-configuration-reference.md)* | *Next: [Observability Backends](./07-observability-backends.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Configuration Reference](./05-configuration-reference.md)_ | _Next: [Observability Backends](./07-observability-backends.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -137,6 +137,7 @@ flowchart TB
|
||||
| **Gateway** | Central collector(s) | Centralized processing | Single point of failure |
|
||||
|
||||
**Recommendation**: Use **Gateway** pattern with regional collectors for rippled networks:
|
||||
|
||||
- One collector cluster per datacenter/region
|
||||
- Tail-based sampling at collector level
|
||||
- Multiple export destinations for redundancy
|
||||
@@ -472,23 +473,27 @@ flowchart TB
|
||||
### 7.7.3 Example: Debugging a Slow Transaction
|
||||
|
||||
**Step 1: Find the trace**
|
||||
|
||||
```
|
||||
# In Grafana Explore with Tempo
|
||||
{resource.service.name="rippled" && span.xrpl.tx.hash="ABC123..."}
|
||||
```
|
||||
|
||||
**Step 2: Get the trace_id from the trace view**
|
||||
|
||||
```
|
||||
Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
|
||||
```
|
||||
|
||||
**Step 3: Find related PerfLog entries**
|
||||
|
||||
```
|
||||
# In Grafana Explore with Loki
|
||||
{job="rippled"} |= "4bf92f3577b34da6a3ce929d0e0e4736"
|
||||
```
|
||||
|
||||
**Step 4: Check Insight metrics for the time window**
|
||||
|
||||
```
|
||||
# In Grafana with Prometheus
|
||||
rate(rippled_tx_applied_total[1m])
|
||||
@@ -587,4 +592,4 @@ rate(rippled_tx_applied_total[1m])
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Implementation Phases](./06-implementation-phases.md)* | *Next: [Appendix](./08-appendix.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Implementation Phases](./06-implementation-phases.md)_ | _Next: [Appendix](./08-appendix.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -130,4 +130,4 @@ flowchart TB
|
||||
|
||||
---
|
||||
|
||||
*Previous: [Observability Backends](./07-observability-backends.md)* | *Back to: [Overview](./OpenTelemetryPlan.md)*
|
||||
_Previous: [Observability Backends](./07-observability-backends.md)_ | _Back to: [Overview](./OpenTelemetryPlan.md)_
|
||||
|
||||
@@ -130,6 +130,7 @@ Performance optimization strategies include probabilistic head sampling (10% def
|
||||
## 4. Code Samples
|
||||
|
||||
Complete C++ implementation examples are provided for all telemetry components:
|
||||
|
||||
- `Telemetry.h` - Core interface for tracer access and span creation
|
||||
- `SpanGuard.h` - RAII wrapper for automatic span lifecycle management
|
||||
- `TracingInstrumentation.h` - Macros for conditional instrumentation
|
||||
@@ -186,4 +187,4 @@ The appendix contains a glossary of OpenTelemetry and rippled-specific terms, re
|
||||
|
||||
---
|
||||
|
||||
*This document provides a comprehensive implementation plan for integrating OpenTelemetry distributed tracing into the rippled XRP Ledger node software. For detailed information on any section, follow the links to the corresponding sub-documents.*
|
||||
_This document provides a comprehensive implementation plan for integrating OpenTelemetry distributed tracing into the rippled XRP Ledger node software. For detailed information on any section, follow the links to the corresponding sub-documents._
|
||||
|
||||
@@ -303,3 +303,9 @@ words:
|
||||
- xrplf
|
||||
- xxhash
|
||||
- xxhasher
|
||||
- xychart
|
||||
- otelc
|
||||
- zpages
|
||||
- traceql
|
||||
- Gantt
|
||||
- gantt
|
||||
|
||||
@@ -29,7 +29,24 @@ flowchart LR
|
||||
|
||||
---
|
||||
|
||||
## Slide 2: Comparison with Existing Solutions
|
||||
## Slide 2: OpenTelemetry vs Open Source Alternatives
|
||||
|
||||
| Feature | OpenTelemetry | Jaeger | Zipkin | SkyWalking | Pinpoint | Prometheus |
|
||||
| ------------------- | ---------------- | ---------------- | ------------------ | ---------- | ---------- | ---------- |
|
||||
| **Tracing** | YES | YES | YES | YES | YES | NO |
|
||||
| **Metrics** | YES | NO | NO | YES | YES | YES |
|
||||
| **Logs** | YES | NO | NO | YES | NO | NO |
|
||||
| **C++ SDK** | YES Official | YES (Deprecated) | YES (Unmaintained) | NO | NO | YES |
|
||||
| **Vendor Neutral** | YES Primary goal | NO | NO | NO | NO | NO |
|
||||
| **Instrumentation** | Manual + Auto | Manual | Manual | Auto-first | Auto-first | Manual |
|
||||
| **Backend** | Any (exporters) | Self | Self | Self | Self | Self |
|
||||
| **CNCF Status** | Incubating | Graduated | NO | Incubating | NO | Graduated |
|
||||
|
||||
> **Why OpenTelemetry?** It's the only actively maintained, full-featured C++ option with vendor neutrality — allowing export to Jaeger, Prometheus, Grafana, or any commercial backend without changing instrumentation.
|
||||
|
||||
---
|
||||
|
||||
## Slide 3: Comparison with rippled's Existing Solutions
|
||||
|
||||
### Current Observability Stack
|
||||
|
||||
@@ -46,16 +63,16 @@ flowchart LR
|
||||
|
||||
| Scenario | PerfLog | StatsD | OpenTelemetry |
|
||||
| -------------------------------- | ------- | ------ | ------------- |
|
||||
| "How many TXs per second?" | ❌ | ✅ | ❌ |
|
||||
| "Why was this specific TX slow?" | ⚠️ | ❌ | ✅ |
|
||||
| "Which node delayed consensus?" | ❌ | ❌ | ✅ |
|
||||
| "Show TX journey across 5 nodes" | ❌ | ❌ | ✅ |
|
||||
| "How many TXs per second?" | ❌ | ✅ | ❌ |
|
||||
| "Why was this specific TX slow?" | ⚠️ | ❌ | ✅ |
|
||||
| "Which node delayed consensus?" | ❌ | ❌ | ✅ |
|
||||
| "Show TX journey across 5 nodes" | ❌ | ❌ | ✅ |
|
||||
|
||||
> **Key Insight**: OpenTelemetry **complements** (not replaces) existing systems.
|
||||
|
||||
---
|
||||
|
||||
## Slide 3: Architecture
|
||||
## Slide 4: Architecture
|
||||
|
||||
### High-Level Integration Architecture
|
||||
|
||||
@@ -103,7 +120,7 @@ sequenceDiagram
|
||||
|
||||
---
|
||||
|
||||
## Slide 4: Implementation Plan
|
||||
## Slide 5: Implementation Plan
|
||||
|
||||
### 5-Phase Rollout (9 Weeks)
|
||||
|
||||
@@ -143,7 +160,7 @@ gantt
|
||||
|
||||
---
|
||||
|
||||
## Slide 5: Performance Overhead
|
||||
## Slide 6: Performance Overhead
|
||||
|
||||
### Estimated System Impact
|
||||
|
||||
@@ -211,7 +228,7 @@ flowchart LR
|
||||
|
||||
---
|
||||
|
||||
## Slide 6: Data Collection & Privacy
|
||||
## Slide 7: Data Collection & Privacy
|
||||
|
||||
### What Data is Collected
|
||||
|
||||
@@ -260,4 +277,4 @@ flowchart LR
|
||||
|
||||
---
|
||||
|
||||
*End of Presentation*
|
||||
_End of Presentation_
|
||||
|
||||
Reference in New Issue
Block a user