Merge branch 'pratik/otel-phase8-log-correlation' into pratik/otel-phase9-metric-gap-fill

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Pratik Mankawde
2026-04-29 20:24:52 +01:00
1526 changed files with 75760 additions and 25422 deletions

View File

@@ -558,7 +558,7 @@ network delay. A test case specifies:
1. a UNL with different number of validators for different test cases,
1. a network with zero or more non-validator nodes,
1. a sequence of validator reliability change events (by killing/restarting
nodes, or by running modified rippled that does not send all validation
nodes, or by running modified xrpld that does not send all validation
messages),
1. the correct outcomes.
@@ -566,7 +566,7 @@ For all the test cases, the correct outcomes are verified by examining logs. We
will grep the log to see if the correct negative UNLs are generated, and whether
or not the network is making progress when it should be. The ripdtop tool will
be helpful for monitoring validators' states and ledger progress. Some of the
timing parameters of rippled will be changed to have faster ledger time. Most if
timing parameters of xrpld will be changed to have faster ledger time. Most if
not all test cases do not need client transactions.
For example, the test cases for the prototype:
@@ -583,7 +583,7 @@ For example, the test cases for the prototype:
We considered testing with the current unit test framework, specifically the
[Consensus Simulation
Framework](https://github.com/ripple/rippled/blob/develop/src/test/csf/README.md)
Framework](https://github.com/XRPLF/rippled/blob/develop/src/test/csf/README.md)
(CSF). However, the CSF currently can only test the generic consensus algorithm
as in the paper: [Analysis of the XRP Ledger Consensus
Protocol](https://arxiv.org/abs/1802.07242).

View File

@@ -4,7 +4,7 @@ skinparam sequenceArrowThickness 2
skinparam roundcorner 20
skinparam maxmessagesize 160
actor "Rippled Start" as RS
actor "Xrpld Start" as RS
participant "Timer" as T
participant "NetworkOPs" as NOP
participant "ValidatorList" as VL #lightgreen

View File

@@ -1,5 +1,5 @@
# `rippled` Docker Image
# `xrpld` Docker Image
- Some info relating to Docker containers can be found here: [../Builds/containers](../Builds/containers)
- Images for building and testing rippled can be found here: [thejohnfreeman/rippled-docker](https://github.com/thejohnfreeman/rippled-docker/)
- These images do not have rippled. They have all the tools necessary to build rippled.
- Images for building and testing xrpld can be found here: [thejohnfreeman/rippled-docker](https://github.com/thejohnfreeman/rippled-docker/)
- These images do not have xrpld. They have all the tools necessary to build xrpld.

View File

@@ -2,7 +2,7 @@
# Project related configuration options
#---------------------------------------------------------------------------
DOXYFILE_ENCODING = UTF-8
PROJECT_NAME = "rippled"
PROJECT_NAME = "xrpld"
PROJECT_NUMBER =
PROJECT_BRIEF =
PROJECT_LOGO =

View File

@@ -1,4 +1,4 @@
## Heap profiling of rippled with jemalloc
## Heap profiling of xrpld with jemalloc
The jemalloc library provides a good API for doing heap analysis,
including a mechanism to dump a description of the heap from within the
@@ -7,26 +7,26 @@ activity in general, as well as how to acquire the software, are available on
the jemalloc site:
[https://github.com/jemalloc/jemalloc/wiki/Use-Case:-Heap-Profiling](https://github.com/jemalloc/jemalloc/wiki/Use-Case:-Heap-Profiling)
jemalloc is acquired separately from rippled, and is not affiliated
jemalloc is acquired separately from xrpld, and is not affiliated
with Ripple Labs. If you compile and install jemalloc from the
source release with default options, it will install the library and header
under `/usr/local/lib` and `/usr/local/include`, respectively. Heap
profiling has been tested with rippled on a Linux platform. It should
work on platforms on which both rippled and jemalloc are available.
profiling has been tested with xrpld on a Linux platform. It should
work on platforms on which both xrpld and jemalloc are available.
To link rippled with jemalloc, the argument
To link xrpld with jemalloc, the argument
`profile-jemalloc=<jemalloc_dir>` is provided after the optional target.
The `<jemalloc_dir>` argument should be the same as that of the
`--prefix` parameter passed to the jemalloc configure script when building.
## Examples:
Build rippled with jemalloc library under /usr/local/lib and
Build xrpld with jemalloc library under /usr/local/lib and
header under /usr/local/include:
$ scons profile-jemalloc=/usr/local
Build rippled using clang with the jemalloc library under /opt/local/lib
Build xrpld using clang with the jemalloc library under /opt/local/lib
and header under /opt/local/include:
$ scons clang profile-jemalloc=/opt/local

View File

@@ -1,61 +0,0 @@
# Building documentation
## Dependencies
Install these dependencies:
- [Doxygen](http://www.doxygen.nl): All major platforms have [official binary
distributions](http://www.doxygen.nl/download.html#srcbin), or you can
build from [source](http://www.doxygen.nl/download.html#srcbin).
- MacOS: We recommend installing via Homebrew: `brew install doxygen`.
The executable will be installed in `/usr/local/bin` which is already
in the default `PATH`.
If you use the official binary distribution, then you'll need to make
Doxygen available to your command line. You can do this by adding
a symbolic link from `/usr/local/bin` to the `doxygen` executable. For
example,
```
$ ln -s /Applications/Doxygen.app/Contents/Resources/doxygen /usr/local/bin/doxygen
```
- [PlantUML](http://plantuml.com):
1. Install a functioning Java runtime, if you don't already have one.
2. Download [`plantuml.jar`](http://sourceforge.net/projects/plantuml/files/plantuml.jar/download).
- [Graphviz](https://www.graphviz.org):
- Linux: Install from your package manager.
- Windows: Use an [official installer](https://graphviz.gitlab.io/_pages/Download/Download_windows.html).
- MacOS: Install via Homebrew: `brew install graphviz`.
## Docker
Instead of installing the above dependencies locally, you can use the official
build environment Docker image, which has all of them installed already.
1. Install [Docker](https://docs.docker.com/engine/installation/)
2. Pull the image:
```
sudo docker pull rippleci/rippled-ci-builder:2944b78d22db
```
3. Run the image from the project folder:
```
sudo docker run -v $PWD:/opt/rippled --rm rippleci/rippled-ci-builder:2944b78d22db
```
## Build
There is a `docs` target in the CMake configuration.
```
mkdir build
cd build
cmake -Donly_docs=ON ..
cmake --build . --target docs --parallel
```
The output will be in `build/docs/html`.

View File

@@ -20,7 +20,7 @@ CMakeToolchain
```
# If you want to depend on a version of libxrpl that is not in ConanCenter,
# then you can export the recipe from the rippled project.
# then you can export the recipe from the xrpld project.
conan export <path>
```
@@ -49,9 +49,9 @@ cmake --build . --parallel
## CMake subdirectory
The second method adds the [rippled][] project as a CMake
The second method adds the [xrpld][] project as a CMake
[subdirectory][add_subdirectory].
This method works well when you keep the rippled project as a Git
This method works well when you keep the xrpld project as a Git
[submodule][].
It's good for when you want to make changes to libxrpl as part of your own
project.
@@ -90,6 +90,6 @@ cmake --build . --parallel
[add_subdirectory]: https://cmake.org/cmake/help/latest/command/add_subdirectory.html
[submodule]: https://git-scm.com/book/en/v2/Git-Tools-Submodules
[rippled]: https://github.com/ripple/rippled
[xrpld]: https://github.com/XRPLF/rippled
[Conan]: https://docs.conan.io/
[CMake]: https://cmake.org/cmake/help/latest/

View File

@@ -55,7 +55,7 @@ clang --version
### Install Xcode Specific Version (Optional)
If you develop other applications using XCode you might be consistently updating to the newest version of Apple Clang.
This will likely cause issues building rippled. You may want to install a specific version of Xcode:
This will likely cause issues building xrpld. You may want to install a specific version of Xcode:
1. **Download Xcode**
- Visit [Apple Developer Downloads](https://developer.apple.com/download/more/)

58
docs/build/install.md vendored
View File

@@ -1,4 +1,4 @@
This document contains instructions for installing rippled.
This document contains instructions for installing xrpld.
The APT package manager is common on Debian-based Linux distributions like
Ubuntu,
while the YUM package manager is common on Red Hat-based Linux distributions
@@ -8,7 +8,7 @@ and the only supported option for installing custom builds.
## From source
From a source build, you can install rippled and libxrpl using CMake's
From a source build, you can install xrpld and libxrpl using CMake's
`--install` mode:
```
@@ -16,7 +16,7 @@ cmake --install . --prefix /opt/local
```
The default [prefix][1] is typically `/usr/local` on Linux and macOS and
`C:/Program Files/rippled` on Windows.
`C:/Program Files/xrpld` on Windows.
[1]: https://cmake.org/cmake/help/latest/variable/CMAKE_INSTALL_PREFIX.html
@@ -50,9 +50,9 @@ The default [prefix][1] is typically `/usr/local` on Linux and macOS and
In particular, make sure that the fingerprint matches. (In the above example, the fingerprint is on the third line, starting with `C001`.)
5. Add the appropriate Ripple repository for your operating system version:
5. Add the appropriate XRPL repository for your operating system version:
echo "deb [signed-by=/usr/local/share/keyrings/ripple-key.gpg] https://repos.ripple.com/repos/rippled-deb focal stable" | \
echo "deb [signed-by=/usr/local/share/keyrings/ripple-key.gpg] https://repos.ripple.com/repos/xrpld-deb focal stable" | \
sudo tee -a /etc/apt/sources.list.d/ripple.list
The above example is appropriate for **Ubuntu 20.04 Focal Fossa**. For other operating systems, replace the word `focal` with one of the following:
@@ -61,33 +61,33 @@ The default [prefix][1] is typically `/usr/local` on Linux and macOS and
- `bullseye` for **Debian 11 Bullseye**
- `buster` for **Debian 10 Buster**
If you want access to development or pre-release versions of `rippled`, use one of the following instead of `stable`:
- `unstable` - Pre-release builds ([`release` branch](https://github.com/ripple/rippled/tree/release))
- `nightly` - Experimental/development builds ([`develop` branch](https://github.com/ripple/rippled/tree/develop))
If you want access to development or pre-release versions of `xrpld`, use one of the following instead of `stable`:
- `unstable` - Pre-release builds ([`release` branch](https://github.com/XRPLF/rippled/tree/release))
- `nightly` - Experimental/development builds ([`develop` branch](https://github.com/XRPLF/rippled/tree/develop))
**Warning:** Unstable and nightly builds may be broken at any time. Do not use these builds for production servers.
6. Fetch the Ripple repository.
6. Fetch the XRPL repository.
sudo apt -y update
7. Install the `rippled` software package:
7. Install the `xrpld` software package:
sudo apt -y install rippled
sudo apt -y install xrpld
8. Check the status of the `rippled` service:
8. Check the status of the `xrpld` service:
systemctl status rippled.service
systemctl status xrpld.service
The `rippled` service should start automatically. If not, you can start it manually:
The `xrpld` service should start automatically. If not, you can start it manually:
sudo systemctl start rippled.service
sudo systemctl start xrpld.service
9. Optional: allow `rippled` to bind to privileged ports.
9. Optional: allow `xrpld` to bind to privileged ports.
This allows you to serve incoming API requests on port 80 or 443. (If you want to do so, you must also update the config file's port settings.)
sudo setcap 'cap_net_bind_service=+ep' /opt/ripple/bin/rippled
sudo setcap 'cap_net_bind_service=+ep' /opt/xrpld/bin/xrpld
## With the YUM package manager
@@ -106,8 +106,8 @@ The default [prefix][1] is typically `/usr/local` on Linux and macOS and
enabled=1
gpgcheck=0
repo_gpgcheck=1
baseurl=https://repos.ripple.com/repos/rippled-rpm/stable/
gpgkey=https://repos.ripple.com/repos/rippled-rpm/stable/repodata/repomd.xml.key
baseurl=https://repos.ripple.com/repos/xrpld-rpm/stable/
gpgkey=https://repos.ripple.com/repos/xrpld-rpm/stable/repodata/repomd.xml.key
REPOFILE
_Unstable_
@@ -118,8 +118,8 @@ The default [prefix][1] is typically `/usr/local` on Linux and macOS and
enabled=1
gpgcheck=0
repo_gpgcheck=1
baseurl=https://repos.ripple.com/repos/rippled-rpm/unstable/
gpgkey=https://repos.ripple.com/repos/rippled-rpm/unstable/repodata/repomd.xml.key
baseurl=https://repos.ripple.com/repos/xrpld-rpm/unstable/
gpgkey=https://repos.ripple.com/repos/xrpld-rpm/unstable/repodata/repomd.xml.key
REPOFILE
_Nightly_
@@ -130,22 +130,22 @@ The default [prefix][1] is typically `/usr/local` on Linux and macOS and
enabled=1
gpgcheck=0
repo_gpgcheck=1
baseurl=https://repos.ripple.com/repos/rippled-rpm/nightly/
gpgkey=https://repos.ripple.com/repos/rippled-rpm/nightly/repodata/repomd.xml.key
baseurl=https://repos.ripple.com/repos/xrpld-rpm/nightly/
gpgkey=https://repos.ripple.com/repos/xrpld-rpm/nightly/repodata/repomd.xml.key
REPOFILE
2. Fetch the latest repo updates:
sudo yum -y update
3. Install the new `rippled` package:
3. Install the new `xrpld` package:
sudo yum install -y rippled
sudo yum install -y xrpld
4. Configure the `rippled` service to start on boot:
4. Configure the `xrpld` service to start on boot:
sudo systemctl enable rippled.service
sudo systemctl enable xrpld.service
5. Start the `rippled` service:
5. Start the `xrpld` service:
sudo systemctl start rippled.service
sudo systemctl start xrpld.service

View File

@@ -1,9 +1,9 @@
# Sanitizer Configuration for Rippled
# Sanitizer Configuration for Xrpld
This document explains how to properly configure and run sanitizers (AddressSanitizer, undefinedbehaviorSanitizer, ThreadSanitizer) with the xrpld project.
Corresponding suppression files are located in the `sanitizers/suppressions` directory.
- [Sanitizer Configuration for Rippled](#sanitizer-configuration-for-rippled)
- [Sanitizer Configuration for Xrpld](#sanitizer-configuration-for-xrpld)
- [Building with Sanitizers](#building-with-sanitizers)
- [Summary](#summary)
- [Build steps:](#build-steps)
@@ -100,7 +100,7 @@ export LSAN_OPTIONS="include=sanitizers/suppressions/runtime-lsan-options.txt:su
- Boost intrusive containers (used in `aged_unordered_container`) trigger false positives
- Boost context switching (used in `Workers.cpp`) confuses ASAN's stack tracking
- Since we usually don't build Boost (because we don't want to instrument Boost and detect issues in Boost code) with ASAN but use Boost containers in ASAN instrumented rippled code, it generates false positives.
- Since we usually don't build Boost (because we don't want to instrument Boost and detect issues in Boost code) with ASAN but use Boost containers in ASAN instrumented xrpld code, it generates false positives.
- Building dependencies with ASAN instrumentation reduces false positives. But we don't want to instrument dependencies like Boost with ASAN because it is slow (to compile as well as run tests) and not necessary.
- See: https://github.com/google/sanitizers/wiki/AddressSanitizerContainerOverflow
- More such flags are detailed [here](https://github.com/google/sanitizers/wiki/AddressSanitizerFlags)

View File

@@ -1,8 +1,8 @@
# OpenTelemetry Tracing for Rippled
# OpenTelemetry Tracing for xrpld
This document explains how to build rippled with OpenTelemetry distributed tracing support, configure the runtime telemetry options, and set up the observability backend to view traces.
This document explains how to build xrpld with OpenTelemetry distributed tracing support, configure the runtime telemetry options, and set up the observability backend to view traces.
- [OpenTelemetry Tracing for Rippled](#opentelemetry-tracing-for-rippled)
- [OpenTelemetry Tracing for xrpld](#opentelemetry-tracing-for-xrpld)
- [Overview](#overview)
- [Building with Telemetry](#building-with-telemetry)
- [Summary](#summary)
@@ -28,7 +28,7 @@ This document explains how to build rippled with OpenTelemetry distributed traci
## Overview
Rippled supports optional [OpenTelemetry](https://opentelemetry.io/) distributed tracing.
xrpld supports optional [OpenTelemetry](https://opentelemetry.io/) distributed tracing.
When enabled, it instruments RPC requests with trace spans that are exported via
OTLP/HTTP to an OpenTelemetry Collector, which forwards them to a tracing backend
such as Grafana Tempo.
@@ -55,7 +55,7 @@ Follow the same instructions as mentioned in [BUILD.md](../../BUILD.md) but with
### Build steps
```bash
cd /path/to/rippled
cd /path/to/xrpld
rm -rf .build
mkdir .build
cd .build
@@ -119,13 +119,13 @@ Add a `[telemetry]` section to your `xrpld.cfg` file:
```ini
[telemetry]
enabled=1
service_name=rippled
endpoint=http://localhost:4318/v1/traces
sampling_ratio=1.0
trace_rpc=1
trace_transactions=1
trace_consensus=1
trace_peer=0
trace_ledger=1
```
### Configuration options
@@ -133,13 +133,12 @@ trace_peer=0
| Option | Type | Default | Description |
| --------------------- | ------ | --------------------------------- | -------------------------------------------------- |
| `enabled` | int | `0` | Enable (`1`) or disable (`0`) telemetry at runtime |
| `service_name` | string | `rippled` | Service name reported in traces |
| `service_name` | string | `xrpld` | Service name reported in traces |
| `service_instance_id` | string | node public key | Unique instance identifier |
| `exporter` | string | `otlp_http` | Exporter type |
| `endpoint` | string | `http://localhost:4318/v1/traces` | OTLP/HTTP collector endpoint |
| `use_tls` | int | `0` | Enable TLS for the exporter connection |
| `tls_ca_cert` | string | (empty) | Path to CA certificate for TLS |
| `sampling_ratio` | double | `1.0` | Fraction of traces to sample (`0.0` to `1.0`) |
| `sampling_ratio` | double | `1.0` | Head-based sampling ratio (`0.0` to `1.0`) |
| `batch_size` | uint32 | `512` | Maximum spans per export batch |
| `batch_delay_ms` | uint32 | `5000` | Maximum delay (ms) before flushing a batch |
| `max_queue_size` | uint32 | `2048` | Maximum spans queued in memory |
@@ -179,7 +178,7 @@ open http://localhost:3000
1. Open `http://localhost:3000` in a browser.
2. Navigate to **Explore** and select the **Tempo** datasource.
3. Use **Search** or **TraceQL** to find traces by service name (e.g. `rippled`).
3. Use **Search** or **TraceQL** to find traces by service name (e.g. `xrpld`).
4. Click into any trace to see the span tree and attributes.
Traced RPC operations produce a span hierarchy like:
@@ -210,7 +209,7 @@ silently drops spans with no impact on test results.
./xrpld --unittest --unittest-jobs $(nproc)
```
To generate traces during manual testing, start rippled in standalone mode:
To generate traces during manual testing, start xrpld in standalone mode:
```bash
./xrpld --conf /path/to/xrpld.cfg --standalone --start
@@ -230,7 +229,7 @@ curl -s -X POST http://127.0.0.1:5005/ \
1. Confirm the OTel Collector is running: `docker compose -f docker/telemetry/docker-compose.yml ps`
2. Check collector logs for errors: `docker compose -f docker/telemetry/docker-compose.yml logs otel-collector`
3. Confirm `[telemetry] enabled=1` is set in the rippled config.
3. Confirm `[telemetry] enabled=1` is set in the xrpld config.
4. Confirm `endpoint` points to the correct collector address (`http://localhost:4318/v1/traces`).
5. Wait for the batch delay to elapse (default `5000` ms) before checking Grafana Explore.
@@ -252,24 +251,48 @@ The Conan package provides a single umbrella target
### Key files
| File | Purpose |
| ---------------------------------------------- | ----------------------------------------------------------- |
| `include/xrpl/telemetry/Telemetry.h` | Abstract telemetry interface and `Setup` struct |
| `include/xrpl/telemetry/SpanGuard.h` | RAII span guard (activates scope, ends span on destruction) |
| `src/libxrpl/telemetry/Telemetry.cpp` | OTel-backed implementation (`TelemetryImpl`) |
| `src/libxrpl/telemetry/TelemetryConfig.cpp` | Config parser (`setup_Telemetry()`) |
| `src/libxrpl/telemetry/NullTelemetry.cpp` | No-op implementation (used when disabled) |
| `src/xrpld/telemetry/TracingInstrumentation.h` | Convenience macros (`XRPL_TRACE_RPC`, etc.) |
| `src/xrpld/rpc/detail/ServerHandler.cpp` | RPC entry point instrumentation |
| `src/xrpld/rpc/detail/RPCHandler.cpp` | Per-command instrumentation |
| `docker/telemetry/docker-compose.yml` | Observability stack (Collector + Tempo + Grafana) |
| `docker/telemetry/otel-collector-config.yaml` | OTel Collector pipeline configuration |
| File | Purpose |
| --------------------------------------------- | ------------------------------------------------------------ |
| `include/xrpl/telemetry/Telemetry.h` | Abstract telemetry interface and `Setup` struct |
| `include/xrpl/telemetry/SpanGuard.h` | RAII span guard with `discard()` for dropping unwanted spans |
| `include/xrpl/telemetry/DiscardFlag.h` | Thread-local discard flag (zero-dependency header) |
| `src/libxrpl/telemetry/Telemetry.cpp` | OTel SDK setup, `FilteringSpanProcessor`, provider lifecycle |
| `src/libxrpl/telemetry/TelemetryConfig.cpp` | Config parser (`setup_Telemetry()`) |
| `src/libxrpl/telemetry/NullTelemetry.cpp` | No-op implementation (used when disabled) |
| `src/libxrpl/telemetry/SpanGuard.cpp` | Pimpl implementation for SpanGuard (all OTel types confined) |
| `src/xrpld/rpc/detail/ServerHandler.cpp` | RPC entry point instrumentation |
| `src/xrpld/rpc/detail/RPCHandler.cpp` | Per-command instrumentation |
| `docker/telemetry/docker-compose.yml` | Observability stack (Collector + Tempo + Grafana) |
| `docker/telemetry/otel-collector-config.yaml` | OTel Collector pipeline configuration |
### Span discard mechanism
`SpanGuard::discard()` allows callers to silently drop spans that turn out to be
uninteresting (e.g., failed preflight transactions). This saves both network bandwidth
and storage by preventing the span from being exported.
The mechanism uses a thread-local flag (`tl_discardCurrentSpan` in `DiscardFlag.h`) as a
side-channel to the `FilteringSpanProcessor` (in `Telemetry.cpp`):
1. `SpanGuard::discard()` sets the thread-local flag and calls `Span::End()`
2. The OTel SDK calls `FilteringSpanProcessor::OnEnd()` synchronously on the same thread
3. The processor checks the flag, clears it, and drops the span before it enters the batch queue
```cpp
SpanGuard guard(telemetry.startSpan("tx.process"));
auto result = preflight(tx);
if (result != tesSUCCESS)
{
guard.discard(); // span is dropped, never exported
return result;
}
```
### Conditional compilation
All OpenTelemetry SDK headers are guarded behind `#ifdef XRPL_ENABLE_TELEMETRY`.
The instrumentation macros in `TracingInstrumentation.h` compile to `((void)0)` when
the define is absent.
All OpenTelemetry SDK types are hidden behind the pimpl idiom in `SpanGuard.cpp`.
When `XRPL_ENABLE_TELEMETRY` is not defined, `SpanGuard.h` provides an all-inline
no-op stub class with zero overhead and zero OTel dependencies.
At runtime, if `enabled=0` is set in config (or the section is omitted), a
`NullTelemetry` implementation is used that returns no-op spans.
This two-layer approach ensures zero overhead when telemetry is not wanted.

View File

@@ -5,9 +5,9 @@
Consensus is the task of reaching agreement within a distributed system in the
presence of faulty or even malicious participants. This document outlines the
[XRP Ledger Consensus Algorithm](https://arxiv.org/abs/1802.07242)
as implemented in [rippled](https://github.com/ripple/rippled), but
as implemented in [xrpld](https://github.com/XRPLF/rippled), but
focuses on its utility as a generic consensus algorithm independent of the
detailed mechanics of the Ripple Consensus Ledger. Most notably, the algorithm
detailed mechanics of the XRPL consensus Ledger. Most notably, the algorithm
does not require fully synchronous communication between all nodes in the
network, or even a fixed network topology, but instead achieves consensus via
collectively trusted subnetworks.
@@ -15,7 +15,7 @@ collectively trusted subnetworks.
## Distributed Agreement
A challenge for distributed systems is reaching agreement on changes in shared
state. For the Ripple network, the shared state is the current ledger--account
state. For the XRPL network, the shared state is the current ledger--account
information, account balances, order books and other financial data. We will
refer to shared distributed state as a /ledger/ throughout the remainder of this
document.
@@ -23,7 +23,7 @@ document.
![Ledger Chain](images/consensus/ledger_chain.png "Ledger Chain")
As shown above, new ledgers are made by applying a set of transactions to the
prior ledger. For the Ripple network, transactions include payments,
prior ledger. For the XRPL network, transactions include payments,
modification of account settings, updates to offers and more.
In a centralized system, generating the next ledger is trivial since there is a
@@ -33,10 +33,10 @@ the set of transactions to include, the order to apply those transactions, and
even the resulting ledger after applying the transactions. This is even more
difficult when some participants are faulty or malicious.
The Ripple network is a decentralized and **trust-full** network. Anyone is free
The XRPL network is a decentralized and **trust-full** network. Anyone is free
to join and participants are free to choose a subset of peers that are
collectively trusted to not collude in an attempt to defraud the participant.
Leveraging this network of trust, the Ripple algorithm has two main components.
Leveraging this network of trust, the XRPL algorithm has two main components.
- _Consensus_ in which network participants agree on the transactions to apply
to a prior ledger, based on the positions of their chosen peers.
@@ -54,9 +54,9 @@ and was abandoned.
The remainder of this section describes the Consensus and Validation algorithms
in more detail and is meant as a companion guide to understanding the generic
implementation in `rippled`. The document **does not** discuss correctness,
implementation in `xrpld`. The document **does not** discuss correctness,
fault-tolerance or liveness properties of the algorithms or the full details of
how they integrate within `rippled` to support the Ripple Consensus Ledger.
how they integrate within `xrpld` to support the XRPL consensus Ledger.
## Consensus Overview

View File

@@ -7,7 +7,7 @@
## Summary
Integrate 29 missing metrics, 18 alert rules, and enriched span attributes from the community `xrpl-validator-dashboard` into rippled's native OpenTelemetry instrumentation. Changes are distributed across phases 2, 3, 4, 6, 7, 9, 10, and 11 of the OTel PR chain.
Integrate 29 missing metrics, 18 alert rules, and enriched span attributes from the community `xrpl-validator-dashboard` into xrpld's native OpenTelemetry instrumentation. Changes are distributed across phases 2, 3, 4, 6, 7, 9, 10, and 11 of the OTel PR chain.
## Gap Analysis
@@ -76,13 +76,13 @@ New span attributes on `rpc.command.*`:
**Task 3.7: Transaction Span Peer Version Attribute**
Add the relaying peer's rippled version to transaction receive spans to enable version-mismatch correlation.
Add the relaying peer's xrpld version to transaction receive spans to enable version-mismatch correlation.
New span attribute on `tx.receive`:
| Attribute | Type | Source | Value Example |
| ------------------- | ------ | -------------------- | ----------------- |
| `xrpl.peer.version` | string | `peer->getVersion()` | `"rippled-2.4.0"` |
| Attribute | Type | Source | Value Example |
| ------------------- | ------ | -------------------- | --------------- |
| `xrpl.peer.version` | string | `peer->getVersion()` | `"xrpld-2.4.0"` |
**File**: `src/xrpld/overlay/detail/PeerImp.cpp` (in the `tx.receive` span block, after existing `xrpl.peer.id` setAttribute)
@@ -150,12 +150,12 @@ The overlay already tracks resource-limit disconnects via `OverlayImpl::Stats::p
**What to do**:
- Ensure `rippled_Overlay_Peer_Disconnects_Charges` appears in the StatsD-to-Prometheus metric name mapping
- Ensure `xrpld_Overlay_Peer_Disconnects_Charges` appears in the StatsD-to-Prometheus metric name mapping
- Verify the metric appears in Prometheus after StatsD bridge is active
**File**: `src/xrpld/overlay/detail/OverlayImpl.cpp`
**Prometheus name**: `rippled_Overlay_Peer_Disconnects_Charges`
**Prometheus name**: `xrpld_Overlay_Peer_Disconnects_Charges`
---
@@ -262,12 +262,12 @@ class ValidationTracker
New MetricsRegistry observable gauge for amendment, UNL, and quorum health.
| Gauge Name | Label `metric=` | Type | Source |
| -------------------------- | ------------------- | ------ | ------------------------------------------------- |
| `rippled_validator_health` | `amendment_blocked` | int64 | `app_.getOPs().isAmendmentBlocked()` → 0/1 |
| | `unl_blocked` | int64 | `app_.getOPs().isUNLBlocked()` → 0/1 |
| | `unl_expiry_days` | double | `app_.validators().expires()` → days until expiry |
| | `validation_quorum` | int64 | `app_.validators().quorum()` |
| Gauge Name | Label `metric=` | Type | Source |
| ------------------------ | ------------------- | ------ | ------------------------------------------------- |
| `xrpld_validator_health` | `amendment_blocked` | int64 | `app_.getOPs().isAmendmentBlocked()` → 0/1 |
| | `unl_blocked` | int64 | `app_.getOPs().isUNLBlocked()` → 0/1 |
| | `unl_expiry_days` | double | `app_.validators().expires()` → days until expiry |
| | `validation_quorum` | int64 | `app_.validators().quorum()` |
**File**: `src/xrpld/telemetry/MetricsRegistry.cpp` (new gauge callback in `registerAsyncGauges()`)
@@ -283,12 +283,12 @@ New MetricsRegistry observable gauge for amendment, UNL, and quorum health.
New MetricsRegistry observable gauge for peer health aggregates.
| Gauge Name | Label `metric=` | Type | Source |
| ---------------------- | -------------------------- | ------ | ------------------------------------------ |
| `rippled_peer_quality` | `peer_latency_p90_ms` | double | Iterate peers, compute P90 from `latency_` |
| | `peers_insane_count` | int64 | Count peers with `tracking_ == diverged` |
| | `peers_higher_version_pct` | double | Compare `getVersion()` to own version |
| | `upgrade_recommended` | int64 | 1 if `peers_higher_version_pct > 60%` |
| Gauge Name | Label `metric=` | Type | Source |
| -------------------- | -------------------------- | ------ | ------------------------------------------ |
| `xrpld_peer_quality` | `peer_latency_p90_ms` | double | Iterate peers, compute P90 from `latency_` |
| | `peers_insane_count` | int64 | Count peers with `tracking_ == diverged` |
| | `peers_higher_version_pct` | double | Compare `getVersion()` to own version |
| | `upgrade_recommended` | int64 | 1 if `peers_higher_version_pct > 60%` |
**Implementation note**: The callback iterates `app_.overlay().foreach(...)` to collect per-peer latency and version data. This runs every 10s on the metrics reader thread — acceptable overhead for ~50-200 peers.
@@ -298,7 +298,7 @@ New MetricsRegistry observable gauge for peer health aggregates.
- [ ] P90 latency computed correctly (sort peer latencies, pick 90th percentile)
- [ ] Insane count matches `peers` RPC output
- [ ] Version comparison handles format variations (e.g., "rippled-2.4.0-rc1")
- [ ] Version comparison handles format variations (e.g., "xrpld-2.4.0-rc1")
- [ ] Values visible in Prometheus
---
@@ -307,13 +307,13 @@ New MetricsRegistry observable gauge for peer health aggregates.
New MetricsRegistry observable gauge for fee and ledger metrics.
| Gauge Name | Label `metric=` | Type | Source |
| ------------------------ | -------------------- | ------ | ----------------------------------------- |
| `rippled_ledger_economy` | `base_fee_xrp` | double | `app_.getFeeTrack().getBaseFee()` → drops |
| | `reserve_base_xrp` | double | From validated ledger fee settings |
| | `reserve_inc_xrp` | double | From validated ledger fee settings |
| | `ledger_age_seconds` | double | `now - lastValidatedCloseTime` |
| | `transaction_rate` | double | Derived: tx count delta / time delta |
| Gauge Name | Label `metric=` | Type | Source |
| ---------------------- | -------------------- | ------ | ----------------------------------------- |
| `xrpld_ledger_economy` | `base_fee_xrp` | double | `app_.getFeeTrack().getBaseFee()` → drops |
| | `reserve_base_xrp` | double | From validated ledger fee settings |
| | `reserve_inc_xrp` | double | From validated ledger fee settings |
| | `ledger_age_seconds` | double | `now - lastValidatedCloseTime` |
| | `transaction_rate` | double | Derived: tx count delta / time delta |
**File**: `src/xrpld/telemetry/MetricsRegistry.cpp`
@@ -329,14 +329,14 @@ New MetricsRegistry observable gauge for fee and ledger metrics.
New MetricsRegistry observable gauge for node state duration.
| Gauge Name | Label `metric=` | Type | Source |
| ------------------------ | ------------------------------- | ------ | ------------------------------------------------ |
| `rippled_state_tracking` | `state_value` | int64 | 0-7 numeric encoding matching external dashboard |
| | `time_in_current_state_seconds` | double | `now - lastModeChangeTime` |
| Gauge Name | Label `metric=` | Type | Source |
| ---------------------- | ------------------------------- | ------ | ------------------------------------------------ |
| `xrpld_state_tracking` | `state_value` | int64 | 0-7 numeric encoding matching external dashboard |
| | `time_in_current_state_seconds` | double | `now - lastModeChangeTime` |
**State value encoding**:
rippled's `OperatingMode` enum maps 0-4 (DISCONNECTED through FULL). The external dashboard extends this to 0-6 by combining operating mode with consensus participation:
xrpld's `OperatingMode` enum maps 0-4 (DISCONNECTED through FULL). The external dashboard extends this to 0-6 by combining operating mode with consensus participation:
| Value | State | Source |
| ----- | ------------ | ----------------------------------------------------------- |
@@ -361,9 +361,9 @@ rippled's `OperatingMode` enum maps 0-4 (DISCONNECTED through FULL). The externa
**Task 7.13: Storage Detail Observable Gauge**
| Gauge Name | Label `metric=` | Type | Source |
| ------------------------ | --------------- | ----- | ---------------------------------------- |
| `rippled_storage_detail` | `nudb_bytes` | int64 | NuDB backend file size (filesystem stat) |
| Gauge Name | Label `metric=` | Type | Source |
| ---------------------- | --------------- | ----- | ---------------------------------------- |
| `xrpld_storage_detail` | `nudb_bytes` | int64 | NuDB backend file size (filesystem stat) |
**File**: `src/xrpld/telemetry/MetricsRegistry.cpp`
@@ -378,15 +378,15 @@ rippled's `OperatingMode` enum maps 0-4 (DISCONNECTED through FULL). The externa
New counters incremented at event sites. Declared in MetricsRegistry, recording sites added in consensus/overlay/network code.
| Counter Name | Increment Site | Source File |
| ------------------------------------- | -------------------------------- | --------------------- |
| `rippled_ledgers_closed_total` | `onAccept()` in consensus | RCLConsensus.cpp |
| `rippled_validations_sent_total` | `validate()` in consensus | RCLConsensus.cpp |
| `rippled_validations_checked_total` | Network validation received | LedgerMaster.cpp |
| `rippled_validation_agreements_total` | ValidationTracker reconciliation | ValidationTracker.cpp |
| `rippled_validation_missed_total` | ValidationTracker reconciliation | ValidationTracker.cpp |
| `rippled_state_changes_total` | `setMode()` in NetworkOPs | NetworkOPs.cpp |
| `rippled_jq_trans_overflow_total` | Job queue overflow path | JobQueue.cpp |
| Counter Name | Increment Site | Source File |
| ----------------------------------- | -------------------------------- | --------------------- |
| `xrpld_ledgers_closed_total` | `onAccept()` in consensus | RCLConsensus.cpp |
| `xrpld_validations_sent_total` | `validate()` in consensus | RCLConsensus.cpp |
| `xrpld_validations_checked_total` | Network validation received | LedgerMaster.cpp |
| `xrpld_validation_agreements_total` | ValidationTracker reconciliation | ValidationTracker.cpp |
| `xrpld_validation_missed_total` | ValidationTracker reconciliation | ValidationTracker.cpp |
| `xrpld_state_changes_total` | `setMode()` in NetworkOPs | NetworkOPs.cpp |
| `xrpld_jq_trans_overflow_total` | Job queue overflow path | JobQueue.cpp |
**Key modified files**:
@@ -407,14 +407,14 @@ New counters incremented at event sites. Declared in MetricsRegistry, recording
Reads from the `ValidationTracker` (Task 7.8) to export rolling window stats.
| Gauge Name | Label `metric=` | Type | Source |
| ------------------------------ | ------------------- | ------ | --------------------------- |
| `rippled_validation_agreement` | `agreement_pct_1h` | double | `tracker.agreementPct1h()` |
| | `agreements_1h` | int64 | `tracker.agreements1h()` |
| | `missed_1h` | int64 | `tracker.missed1h()` |
| | `agreement_pct_24h` | double | `tracker.agreementPct24h()` |
| | `agreements_24h` | int64 | `tracker.agreements24h()` |
| | `missed_24h` | int64 | `tracker.missed24h()` |
| Gauge Name | Label `metric=` | Type | Source |
| ---------------------------- | ------------------- | ------ | --------------------------- |
| `xrpld_validation_agreement` | `agreement_pct_1h` | double | `tracker.agreementPct1h()` |
| | `agreements_1h` | int64 | `tracker.agreements1h()` |
| | `missed_1h` | int64 | `tracker.missed1h()` |
| | `agreement_pct_24h` | double | `tracker.agreementPct24h()` |
| | `agreements_24h` | int64 | `tracker.agreements24h()` |
| | `missed_24h` | int64 | `tracker.missed24h()` |
**File**: `src/xrpld/telemetry/MetricsRegistry.cpp`
@@ -432,23 +432,23 @@ Reads from the `ValidationTracker` (Task 7.8) to export rolling window stats.
**Task 9.11: Validator Health Dashboard**
New Grafana dashboard: `rippled-validator-health.json`
New Grafana dashboard: `xrpld-validator-health.json`
| Panel | Type | PromQL |
| -------------------------- | ---------- | ---------------------------------------------------------------- |
| Agreement % (1h) | stat | `rippled_validation_agreement{metric="agreement_pct_1h"}` |
| Agreement % (24h) | stat | `rippled_validation_agreement{metric="agreement_pct_24h"}` |
| Agreements vs Missed (1h) | bargauge | `agreements_1h` and `missed_1h` side by side |
| Agreements vs Missed (24h) | bargauge | `agreements_24h` and `missed_24h` side by side |
| Validation Rate | stat | `rate(rippled_validations_sent_total[5m]) * 60` |
| Validations Checked Rate | stat | `rate(rippled_validations_checked_total[5m]) * 60` |
| Amendment Blocked | stat | `rippled_validator_health{metric="amendment_blocked"}` |
| UNL Expiry (days) | stat | `rippled_validator_health{metric="unl_expiry_days"}` |
| Validation Quorum | stat | `rippled_validator_health{metric="validation_quorum"}` |
| State Value Timeline | timeseries | `rippled_state_tracking{metric="state_value"}` |
| Time in Current State | stat | `rippled_state_tracking{metric="time_in_current_state_seconds"}` |
| State Changes Rate | stat | `rate(rippled_state_changes_total[1h])` |
| Ledgers Closed Rate | stat | `rate(rippled_ledgers_closed_total[5m]) * 60` |
| Panel | Type | PromQL |
| -------------------------- | ---------- | -------------------------------------------------------------- |
| Agreement % (1h) | stat | `xrpld_validation_agreement{metric="agreement_pct_1h"}` |
| Agreement % (24h) | stat | `xrpld_validation_agreement{metric="agreement_pct_24h"}` |
| Agreements vs Missed (1h) | bargauge | `agreements_1h` and `missed_1h` side by side |
| Agreements vs Missed (24h) | bargauge | `agreements_24h` and `missed_24h` side by side |
| Validation Rate | stat | `rate(xrpld_validations_sent_total[5m]) * 60` |
| Validations Checked Rate | stat | `rate(xrpld_validations_checked_total[5m]) * 60` |
| Amendment Blocked | stat | `xrpld_validator_health{metric="amendment_blocked"}` |
| UNL Expiry (days) | stat | `xrpld_validator_health{metric="unl_expiry_days"}` |
| Validation Quorum | stat | `xrpld_validator_health{metric="validation_quorum"}` |
| State Value Timeline | timeseries | `xrpld_state_tracking{metric="state_value"}` |
| Time in Current State | stat | `xrpld_state_tracking{metric="time_in_current_state_seconds"}` |
| State Changes Rate | stat | `rate(xrpld_state_changes_total[1h])` |
| Ledgers Closed Rate | stat | `rate(xrpld_ledgers_closed_total[5m]) * 60` |
**Dashboard conventions**: `$node` template variable for `exported_instance` filtering, dark theme, matching existing panel sizes and color schemes.
@@ -456,16 +456,16 @@ New Grafana dashboard: `rippled-validator-health.json`
**Task 9.12: Peer Quality Dashboard**
New Grafana dashboard: `rippled-peer-quality.json`
New Grafana dashboard: `xrpld-peer-quality.json`
| Panel | Type | PromQL |
| ---------------------- | ---------- | ---------------------------------------------------------------- |
| P90 Peer Latency | timeseries | `rippled_peer_quality{metric="peer_latency_p90_ms"}` |
| Insane/Diverged Peers | stat | `rippled_peer_quality{metric="peers_insane_count"}` |
| Higher Version Peers % | stat | `rippled_peer_quality{metric="peers_higher_version_pct"}` |
| Upgrade Recommended | stat | `rippled_peer_quality{metric="upgrade_recommended"}` |
| Resource Disconnects | timeseries | `rippled_Overlay_Peer_Disconnects_Charges` |
| Inbound vs Outbound | bargauge | `rippled_Peer_Finder_Active_Inbound_Peers`, `..._Outbound_Peers` |
| Panel | Type | PromQL |
| ---------------------- | ---------- | -------------------------------------------------------------- |
| P90 Peer Latency | timeseries | `xrpld_peer_quality{metric="peer_latency_p90_ms"}` |
| Insane/Diverged Peers | stat | `xrpld_peer_quality{metric="peers_insane_count"}` |
| Higher Version Peers % | stat | `xrpld_peer_quality{metric="peers_higher_version_pct"}` |
| Upgrade Recommended | stat | `xrpld_peer_quality{metric="upgrade_recommended"}` |
| Resource Disconnects | timeseries | `xrpld_Overlay_Peer_Disconnects_Charges` |
| Inbound vs Outbound | bargauge | `xrpld_Peer_Finder_Active_Inbound_Peers`, `..._Outbound_Peers` |
---
@@ -473,13 +473,13 @@ New Grafana dashboard: `rippled-peer-quality.json`
Add a "Ledger Economy" row to the existing `system-node-health.json` dashboard:
| Panel | Type | PromQL |
| -------------------- | ---------- | ----------------------------------------------------- |
| Base Fee (drops) | stat | `rippled_ledger_economy{metric="base_fee_xrp"}` |
| Reserve Base (drops) | stat | `rippled_ledger_economy{metric="reserve_base_xrp"}` |
| Reserve Inc (drops) | stat | `rippled_ledger_economy{metric="reserve_inc_xrp"}` |
| Ledger Age | stat | `rippled_ledger_economy{metric="ledger_age_seconds"}` |
| Transaction Rate | timeseries | `rippled_ledger_economy{metric="transaction_rate"}` |
| Panel | Type | PromQL |
| -------------------- | ---------- | --------------------------------------------------- |
| Base Fee (drops) | stat | `xrpld_ledger_economy{metric="base_fee_xrp"}` |
| Reserve Base (drops) | stat | `xrpld_ledger_economy{metric="reserve_base_xrp"}` |
| Reserve Inc (drops) | stat | `xrpld_ledger_economy{metric="reserve_inc_xrp"}` |
| Ledger Age | stat | `xrpld_ledger_economy{metric="ledger_age_seconds"}` |
| Transaction Rate | timeseries | `xrpld_ledger_economy{metric="transaction_rate"}` |
---
@@ -506,28 +506,28 @@ Add checks to `validate_telemetry.py` for all new span attributes and metrics.
**New metric existence checks (~13)**:
| Metric Name |
| ---------------------------------------------------------- |
| `rippled_validation_agreement{metric="agreement_pct_1h"}` |
| `rippled_validation_agreement{metric="agreement_pct_24h"}` |
| `rippled_validator_health{metric="amendment_blocked"}` |
| `rippled_validator_health{metric="unl_expiry_days"}` |
| `rippled_peer_quality{metric="peer_latency_p90_ms"}` |
| `rippled_peer_quality{metric="peers_insane_count"}` |
| `rippled_ledger_economy{metric="base_fee_xrp"}` |
| `rippled_ledger_economy{metric="transaction_rate"}` |
| `rippled_state_tracking{metric="state_value"}` |
| `rippled_ledgers_closed_total` |
| `rippled_validations_sent_total` |
| `rippled_state_changes_total` |
| `rippled_storage_detail{metric="nudb_bytes"}` |
| Metric Name |
| -------------------------------------------------------- |
| `xrpld_validation_agreement{metric="agreement_pct_1h"}` |
| `xrpld_validation_agreement{metric="agreement_pct_24h"}` |
| `xrpld_validator_health{metric="amendment_blocked"}` |
| `xrpld_validator_health{metric="unl_expiry_days"}` |
| `xrpld_peer_quality{metric="peer_latency_p90_ms"}` |
| `xrpld_peer_quality{metric="peers_insane_count"}` |
| `xrpld_ledger_economy{metric="base_fee_xrp"}` |
| `xrpld_ledger_economy{metric="transaction_rate"}` |
| `xrpld_state_tracking{metric="state_value"}` |
| `xrpld_ledgers_closed_total` |
| `xrpld_validations_sent_total` |
| `xrpld_state_changes_total` |
| `xrpld_storage_detail{metric="nudb_bytes"}` |
**New dashboard load checks (~3)**:
| Dashboard |
| ------------------------------ |
| `rippled-validator-health` |
| `rippled-peer-quality` |
| `xrpld-validator-health` |
| `xrpld-peer-quality` |
| `system-node-health` (updated) |
**New metric value sanity checks (~4)**:
@@ -553,36 +553,36 @@ Port 18 alert rules from the external `xrpl-validator-dashboard` to Grafana aler
**Critical Group** (8 rules, eval interval 10s):
| Rule | Condition | For |
| ------------------- | --------------------------------------------------------------- | --- |
| Agreement Below 90% | `rippled_validation_agreement{metric="agreement_pct_24h"} < 90` | 30s |
| Not Proposing | `rippled_state_tracking{metric="state_value"} < 6` | 10s |
| Unhealthy State | `rippled_state_tracking{metric="state_value"} < 4` | 10s |
| Amendment Blocked | `rippled_validator_health{metric="amendment_blocked"} == 1` | 1m |
| UNL Expiring | `rippled_validator_health{metric="unl_expiry_days"} < 14` | 1h |
| High IO Latency | `histogram_quantile(0.95, rippled_ios_latency_bucket) > 50` | 1m |
| High Load Factor | `rippled_load_factor_metrics{metric="load_factor"} > 1000` | 1m |
| Peer Count Critical | `rippled_server_info{metric="peers"} < 5` | 1m |
| Rule | Condition | For |
| ------------------- | ------------------------------------------------------------- | --- |
| Agreement Below 90% | `xrpld_validation_agreement{metric="agreement_pct_24h"} < 90` | 30s |
| Not Proposing | `xrpld_state_tracking{metric="state_value"} < 6` | 10s |
| Unhealthy State | `xrpld_state_tracking{metric="state_value"} < 4` | 10s |
| Amendment Blocked | `xrpld_validator_health{metric="amendment_blocked"} == 1` | 1m |
| UNL Expiring | `xrpld_validator_health{metric="unl_expiry_days"} < 14` | 1h |
| High IO Latency | `histogram_quantile(0.95, xrpld_ios_latency_bucket) > 50` | 1m |
| High Load Factor | `xrpld_load_factor_metrics{metric="load_factor"} > 1000` | 1m |
| Peer Count Critical | `xrpld_server_info{metric="peers"} < 5` | 1m |
**Network Group** (3 rules, eval interval 10s):
| Rule | Condition | For |
| ------------------------- | ------------------------------------------------------------------- | --- |
| Peer Drop >10% | `delta(rippled_server_info{metric="peers"}[30s]) / ... * 100 < -10` | 30s |
| Peer Drop >30% | Same formula, threshold -30 | 30s |
| P90 Latency + Disconnects | `peer_latency_p90_ms > 500 AND rate(disconnects) > 0` | 2m |
| Rule | Condition | For |
| ------------------------- | ----------------------------------------------------------------- | --- |
| Peer Drop >10% | `delta(xrpld_server_info{metric="peers"}[30s]) / ... * 100 < -10` | 30s |
| Peer Drop >30% | Same formula, threshold -30 | 30s |
| P90 Latency + Disconnects | `peer_latency_p90_ms > 500 AND rate(disconnects) > 0` | 2m |
**Performance Group** (7 rules, eval interval 10s):
| Rule | Condition | For |
| ------------------- | -------------------------------------------------------------- | --- |
| CPU High | Per-core CPU > 80% | 2m |
| Memory Critical | Memory usage > 90% | 1m |
| Disk Warning | Disk usage > 85% | 2m |
| Job Queue Overflow | `rate(rippled_jq_trans_overflow_total[5m]) > 0` | 1m |
| Upgrade Recommended | `rippled_peer_quality{metric="peers_higher_version_pct"} > 60` | 1m |
| TX Rate Drop | Transaction rate dropped > 50% in 5m window | 5m |
| Stale Ledger | `rippled_ledger_economy{metric="ledger_age_seconds"} > 30` | 1m |
| Rule | Condition | For |
| ------------------- | ------------------------------------------------------------ | --- |
| CPU High | Per-core CPU > 80% | 2m |
| Memory Critical | Memory usage > 90% | 1m |
| Disk Warning | Disk usage > 85% | 2m |
| Job Queue Overflow | `rate(xrpld_jq_trans_overflow_total[5m]) > 0` | 1m |
| Upgrade Recommended | `xrpld_peer_quality{metric="peers_higher_version_pct"} > 60` | 1m |
| TX Rate Drop | Transaction rate dropped > 50% in 5m window | 5m |
| Stale Ledger | `xrpld_ledger_economy{metric="ledger_age_seconds"} > 30` | 1m |
**Notification channels**: Template configs for Email/SMTP, Discord, Slack, PagerDuty.

View File

@@ -1,8 +1,8 @@
# rippled Telemetry Operator Runbook
# xrpld Telemetry Operator Runbook
## Overview
rippled supports OpenTelemetry distributed tracing to provide visibility into RPC requests, transaction processing, and consensus rounds.
xrpld supports OpenTelemetry distributed tracing to provide visibility into RPC requests, transaction processing, and consensus rounds.
## Quick Start
@@ -20,7 +20,7 @@ This starts:
- **Loki** on http://localhost:3100 (log aggregation)
- **Grafana** on http://localhost:3000
### 2. Enable telemetry in rippled
### 2. Enable telemetry in xrpld
Add to your `xrpld.cfg`:
@@ -40,26 +40,28 @@ cmake --build --preset default
## Configuration Reference
| Option | Default | Description |
| -------------------- | --------------------------------- | ----------------------------------------- |
| `enabled` | `0` | Master switch for telemetry |
| `endpoint` | `http://localhost:4318/v1/traces` | OTLP/HTTP endpoint |
| `exporter` | `otlp_http` | Exporter type |
| `sampling_ratio` | `1.0` | Head-based sampling ratio (0.01.0) |
| `trace_rpc` | `1` | Enable RPC request tracing |
| `trace_transactions` | `1` | Enable transaction tracing |
| `trace_consensus` | `1` | Enable consensus tracing |
| `trace_peer` | `0` | Enable peer message tracing (high volume) |
| `trace_ledger` | `1` | Enable ledger tracing |
| `batch_size` | `512` | Max spans per batch export |
| `batch_delay_ms` | `5000` | Delay between batch exports |
| `max_queue_size` | `2048` | Max spans queued before dropping |
| `use_tls` | `0` | Use TLS for exporter connection |
| `tls_ca_cert` | (empty) | Path to CA certificate bundle |
| Option | Default | Description |
| -------------------------- | --------------------------------- | --------------------------------------------------------- |
| `enabled` | `0` | Master switch for telemetry |
| `endpoint` | `http://localhost:4318/v1/traces` | OTLP/HTTP endpoint |
| `service_name` | `xrpld` | OpenTelemetry service name resource attribute |
| `service_instance_id` | node public key | OpenTelemetry service instance ID resource attribute |
| `sampling_ratio` | `1.0` | Head-based sampling ratio (0.0--1.0) |
| `trace_rpc` | `1` | Enable RPC request tracing |
| `trace_transactions` | `1` | Enable transaction tracing |
| `trace_consensus` | `1` | Enable consensus tracing |
| `trace_peer` | `0` | Enable peer message tracing (high volume) |
| `trace_ledger` | `1` | Enable ledger tracing |
| `consensus_trace_strategy` | `deterministic` | Consensus trace ID strategy (`deterministic` or `random`) |
| `batch_size` | `512` | Max spans per batch export |
| `batch_delay_ms` | `5000` | Delay between batch exports |
| `max_queue_size` | `2048` | Max spans queued before dropping |
| `use_tls` | `0` | Use TLS for exporter connection |
| `tls_ca_cert` | (empty) | Path to CA certificate bundle |
## Span Reference
All spans instrumented in rippled, grouped by subsystem:
All spans instrumented in xrpld, grouped by subsystem:
### RPC Spans (Phase 2)
@@ -72,21 +74,47 @@ All spans instrumented in rippled, grouped by subsystem:
### Transaction Spans (Phase 3)
| Span Name | Source File | Attributes | Description |
| ------------ | ------------------- | ---------------------------------------------------------------------- | ------------------------------------- |
| `tx.process` | NetworkOPs.cpp:1227 | `xrpl.tx.hash`, `xrpl.tx.local`, `xrpl.tx.path` | Transaction submission and processing |
| `tx.receive` | PeerImp.cpp:1273 | `xrpl.peer.id`, `xrpl.tx.hash`, `xrpl.tx.suppressed`, `xrpl.tx.status` | Transaction received from peer relay |
| `tx.apply` | BuildLedger.cpp:88 | `xrpl.ledger.seq`, `xrpl.ledger.tx_count`, `xrpl.ledger.tx_failed` | Transaction set applied per ledger |
| Span Name | Source File | Attributes | Description |
| ------------ | ------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------- |
| `tx.process` | NetworkOPs.cpp:1227 | `xrpl.tx.hash`, `xrpl.tx.local`, `xrpl.tx.path` | Transaction submission and processing |
| `tx.receive` | PeerImp.cpp:1273 | `xrpl.peer.id`, `xrpl.tx.hash`, `xrpl.peer.version`, `xrpl.tx.suppressed`, `xrpl.tx.status` | Transaction received from peer relay |
| `tx.apply` | BuildLedger.cpp:88 | `xrpl.ledger.seq`, `xrpl.ledger.tx_count`, `xrpl.ledger.tx_failed` | Transaction set applied per ledger |
### Transaction Queue Spans (Phase 3)
| Span Name | Source File | Attributes | Description |
| ------------------ | ----------- | --------------------------------------------------------------------- | -------------------------------------------------- |
| `txq.enqueue` | TxQ.cpp | `xrpl.txq.tx_hash` | Transaction enqueue decision (child of tx.process) |
| `txq.apply_direct` | TxQ.cpp | -- | Direct apply attempt (bypassing queue) |
| `txq.batch_clear` | TxQ.cpp | -- | Batch clear of queued transactions for an account |
| `txq.accept` | TxQ.cpp | `xrpl.txq.queue_size` | Ledger-close accept loop over queued transactions |
| `txq.accept_tx` | TxQ.cpp | `xrpl.txq.tx_hash`, `xrpl.txq.retries_remaining`, `xrpl.txq.ter_code` | Per-transaction apply during accept |
| `txq.cleanup` | TxQ.cpp | `xrpl.txq.ledger_seq` | Post-close cleanup of expired queue entries |
### Consensus Spans (Phase 4)
| Span Name | Source File | Attributes | Description |
| --------------------------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ |
| `consensus.proposal.send` | RCLConsensus.cpp:177 | `xrpl.consensus.round` | Consensus proposal broadcast |
| `consensus.ledger_close` | RCLConsensus.cpp:282 | `xrpl.consensus.ledger.seq`, `xrpl.consensus.mode` | Ledger close event |
| `consensus.accept` | RCLConsensus.cpp:395 | `xrpl.consensus.proposers`, `xrpl.consensus.round_time_ms` | Ledger accepted by consensus |
| `consensus.validation.send` | RCLConsensus.cpp:753 | `xrpl.consensus.ledger.seq`, `xrpl.consensus.proposing` | Validation sent after accept |
| `consensus.accept.apply` | RCLConsensus.cpp:453 | `xrpl.consensus.close_time`, `close_time_correct`, `close_resolution_ms`, `state`, `proposing`, `round_time_ms`, `ledger.seq` | Ledger application with close time details |
| Span Name | Source File | Attributes | Description |
| ------------------------------ | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------- |
| `consensus.round` | RCLConsensus.cpp | `xrpl.consensus.ledger_id`, `xrpl.consensus.ledger.seq`, `xrpl.consensus.mode`, `xrpl.consensus.trace_strategy`, `xrpl.consensus.round_id` | Root span for a consensus round (deterministic or random trace ID) |
| `consensus.phase.open` | Consensus.h | -- | Open phase duration (child of round) |
| `consensus.proposal.send` | RCLConsensus.cpp | `xrpl.consensus.round` | Consensus proposal broadcast |
| `consensus.ledger_close` | RCLConsensus.cpp | `xrpl.consensus.ledger.seq`, `xrpl.consensus.mode` | Ledger close event |
| `consensus.establish` | Consensus.h | `xrpl.consensus.converge_percent`, `xrpl.consensus.establish_count`, `xrpl.consensus.proposers` | Establish phase duration (child of round) |
| `consensus.update_positions` | Consensus.h | `xrpl.consensus.converge_percent`, `xrpl.consensus.proposers`, `xrpl.consensus.disputes_count` | Position update and dispute resolution (see Events below) |
| `consensus.check` | Consensus.h | `xrpl.consensus.agree_count`, `xrpl.consensus.disagree_count`, `xrpl.consensus.converge_percent`, `xrpl.consensus.have_close_time_consensus`, `xrpl.consensus.threshold_percent`, `xrpl.consensus.result` | Consensus threshold check |
| `consensus.accept` | RCLConsensus.cpp | `xrpl.consensus.proposers`, `xrpl.consensus.round_time_ms`, `xrpl.consensus.quorum` | Ledger accepted by consensus |
| `consensus.accept.apply` | RCLConsensus.cpp | `xrpl.consensus.ledger.seq`, `xrpl.consensus.close_time`, `xrpl.consensus.close_time_correct`, `xrpl.consensus.close_resolution_ms`, `xrpl.consensus.state`, `xrpl.consensus.proposing`, `xrpl.consensus.round_time_ms`, `xrpl.consensus.parent_close_time`, `xrpl.consensus.close_time_self`, `xrpl.consensus.close_time_vote_bins`, `xrpl.consensus.resolution_direction`, `xrpl.consensus.tx_count` | Ledger application with close time details (see Events below) |
| `consensus.validation.send` | RCLConsensus.cpp | `xrpl.consensus.ledger.seq`, `xrpl.consensus.proposing` | Validation sent after accept (follows-from link) |
| `consensus.mode_change` | RCLConsensus.cpp | `xrpl.consensus.mode.old`, `xrpl.consensus.mode.new` | Consensus mode transition |
| `consensus.proposal.receive` | PeerImp.cpp | `xrpl.consensus.trusted`, `xrpl.consensus.round` | Proposal received from peer (extracts parent context from TraceContext when present; falls back to standalone span for older peers) |
| `consensus.validation.receive` | PeerImp.cpp | `xrpl.consensus.trusted`, `xrpl.consensus.ledger.seq` | Validation received from peer (extracts parent context from TraceContext when present; falls back to standalone span for older peers) |
#### Consensus Span Events
| Parent Span | Event Name | Event Attributes | Description |
| ---------------------------- | ----------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------- |
| `consensus.update_positions` | `dispute.resolve` | `xrpl.tx.id`, `xrpl.dispute.our_vote`, `xrpl.dispute.yays`, `xrpl.dispute.nays` | Emitted per dispute when votes are tallied |
| `consensus.accept.apply` | `tx.included` | `xrpl.tx.id` | Emitted per transaction included in the accepted ledger |
#### Close Time Queries (Tempo TraceQL)
@@ -102,6 +130,12 @@ All spans instrumented in rippled, grouped by subsystem:
# Find specific ledger's consensus details
{name="consensus.accept.apply"} | xrpl.consensus.ledger.seq = 92345678
# Find all spans in a consensus round (deterministic trace strategy)
{name="consensus.round"} | xrpl.consensus.round_id = "<round_id>"
# Find dispute resolutions
{name="consensus.update_positions"} >> {event:name="dispute.resolve"}
```
### Ledger Spans (Phase 5)
@@ -119,9 +153,90 @@ All spans instrumented in rippled, grouped by subsystem:
| `peer.proposal.receive` | PeerImp.cpp:1667 | `xrpl.peer.id`, `xrpl.peer.proposal.trusted` | Proposal received from peer |
| `peer.validation.receive` | PeerImp.cpp:2264 | `xrpl.peer.id`, `xrpl.peer.validation.trusted` | Validation received from peer |
## Cross-Node Trace Propagation
xrpld propagates trace context across nodes via protobuf `TraceContext` fields
embedded in peer-to-peer messages. When Node A sends a transaction, proposal,
or validation, it injects its active span's trace/span IDs into the protobuf
message. Node B extracts that context on receipt and creates a child span,
linking the two nodes into a single distributed trace.
### How It Works
```
Node A (sender) Node B (receiver)
+-----------------------------+ +-------------------------------+
| tx.process / consensus.* | | PeerImp::onMessage() |
| | | | | |
| v | | v |
| SpanGuard::getTraceBytes() | | extract TraceContext from |
| | | | protobuf message |
| v | send | | |
| injectSpanContext() --------|--------->| v |
| sets TraceContext fields | proto | txReceiveSpan() |
| (trace_id, span_id, flags) | msg | proposalReceiveSpan() |
+-----------------------------+ | validationReceiveSpan() |
| | |
| v |
| child span with parent link |
+-------------------------------+
```
### Send-Side Injection
| Message Type | Injection Point | Mechanism |
| ------------- | -------------------------- | ------------------------------------------ |
| TMTransaction | `NetworkOPs::apply()` | Injects `tx.process` span into relay msg |
| TMProposeSet | `RCLConsensus::propose()` | Injects active context into proposal msg |
| TMValidation | `RCLConsensus::validate()` | Injects active context into validation msg |
### Receive-Side Extraction
| Message Type | Extraction Point | Helper Function |
| ------------- | ----------------------------------- | -------------------------------------------------- |
| TMTransaction | `PeerImp::onMessage(TMTransaction)` | `TxTracing::txReceiveSpan()` |
| TMProposeSet | `PeerImp::onMessage(TMProposeSet)` | `ConsensusReceiveTracing::proposalReceiveSpan()` |
| TMValidation | `PeerImp::onMessage(TMValidation)` | `ConsensusReceiveTracing::validationReceiveSpan()` |
### Key Files
| File | Role |
| ------------------------------------------------- | ----------------------------------------------- |
| `src/xrpld/telemetry/PropagationHelpers.h` | `injectSpanContext()` — SpanGuard to protobuf |
| `include/xrpl/telemetry/TraceContextPropagator.h` | OTel context <-> protobuf conversion primitives |
| `src/xrpld/telemetry/ConsensusReceiveTracing.h` | Proposal/validation receive span factories |
| `src/xrpld/telemetry/TxTracing.h` | Transaction receive span factory |
### Backwards Compatibility
Older peers that do not populate `TraceContext` fields in their messages will
simply produce empty trace bytes on the receive side. The extraction helpers
detect this and create standalone (root) spans instead of child spans. No
errors are logged and no data is lost — the receive span is still created with
all its normal attributes, it just lacks a cross-node parent link.
### Example Tempo Queries
```
# Find cross-node transaction traces (tx.process -> tx.receive across nodes)
{name="tx.receive"} && status != error
# Find proposals received with cross-node parent context
{name="consensus.proposal.receive"} && nestedSetParent > 0
# Trace a transaction across the network by its hash
{name=~"tx\\..*"} | xrpl.tx.hash = "<hash>"
# Find all spans in a cross-node consensus trace
{rootServiceName="xrpld"} | xrpl.consensus.round_id = "<round_id>"
# Compare latency between sender and receiver for validations
{name="consensus.validation.send" || name="consensus.validation.receive"}
```
## Prometheus Metrics (Spanmetrics)
The OTel Collector's spanmetrics connector automatically derives RED (Rate, Errors, Duration) metrics from every span. No custom metrics code is needed in rippled.
The OTel Collector's spanmetrics connector automatically derives RED (Rate, Errors, Duration) metrics from every span. No custom metrics code is needed in xrpld.
### Generated Metric Names
@@ -140,7 +255,7 @@ Every metric carries these standard labels:
| -------------- | ------------------ | ---------------------------------------- |
| `span_name` | Span name | `rpc.command.server_info` |
| `status_code` | Span status | `STATUS_CODE_UNSET`, `STATUS_CODE_ERROR` |
| `service_name` | Resource attribute | `rippled` |
| `service_name` | Resource attribute | `xrpld` |
| `span_kind` | Span kind | `SPAN_KIND_INTERNAL` |
Additionally, span attributes configured as dimensions in the collector become metric labels (dots → underscores):
@@ -162,104 +277,91 @@ Configured in `otel-collector-config.yaml`:
1ms, 5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 5s
```
## System Metrics (beast::insight via OTel native)
## StatsD Metrics (beast::insight)
rippled has a built-in metrics framework (`beast::insight`) that exports metrics natively via OTLP/HTTP. These complement the span-derived RED metrics by providing system-level gauges, counters, and timers that don't map to individual trace spans.
xrpld has a built-in metrics framework (`beast::insight`) that emits StatsD-format metrics over UDP. These complement the span-derived RED metrics by providing system-level gauges, counters, and timers that don't map to individual trace spans.
### Configuration
Add to `xrpld.cfg`:
```ini
[insight]
server=otel
endpoint=http://localhost:4318/v1/metrics
prefix=rippled
```
The OTel Collector receives these via the OTLP receiver (same endpoint as traces, port 4318) and exports them to Prometheus alongside spanmetrics.
#### StatsD fallback (backward compatibility)
The legacy StatsD backend is still available:
```ini
[insight]
server=statsd
address=127.0.0.1:8125
prefix=rippled
prefix=xrpld
```
When using StatsD, uncomment the `statsd` receiver in `otel-collector-config.yaml` and add port `8125:8125/udp` to the docker-compose otel-collector service.
The OTel Collector receives these via a `statsd` receiver on UDP port 8125 and exports them to Prometheus alongside spanmetrics.
### Metric Reference
#### Gauges
| Prometheus Metric | Source | Description |
| --------------------------------------------- | ------------------------- | -------------------------------------------------------------------------- |
| `rippled_LedgerMaster_Validated_Ledger_Age` | LedgerMaster.h:373 | Age of validated ledger (seconds) |
| `rippled_LedgerMaster_Published_Ledger_Age` | LedgerMaster.h:374 | Age of published ledger (seconds) |
| `rippled_State_Accounting_{Mode}_duration` | NetworkOPs.cpp:774 | Time in each operating mode (Disconnected/Connected/Syncing/Tracking/Full) |
| `rippled_State_Accounting_{Mode}_transitions` | NetworkOPs.cpp:780 | Transition count per mode |
| `rippled_Peer_Finder_Active_Inbound_Peers` | PeerfinderManager.cpp:214 | Active inbound peer connections |
| `rippled_Peer_Finder_Active_Outbound_Peers` | PeerfinderManager.cpp:215 | Active outbound peer connections |
| `rippled_Overlay_Peer_Disconnects` | OverlayImpl.h:557 | Peer disconnect count |
| `rippled_job_count` | JobQueue.cpp:26 | Current job queue depth |
| `rippled_{category}_Bytes_In/Out` | OverlayImpl.h:535 | Overlay traffic bytes per category (57 categories) |
| `rippled_{category}_Messages_In/Out` | OverlayImpl.h:535 | Overlay traffic messages per category |
| Prometheus Metric | Source | Description |
| ------------------------------------------- | ------------------------- | -------------------------------------------------------------------------- |
| `xrpld_LedgerMaster_Validated_Ledger_Age` | LedgerMaster.h:373 | Age of validated ledger (seconds) |
| `xrpld_LedgerMaster_Published_Ledger_Age` | LedgerMaster.h:374 | Age of published ledger (seconds) |
| `xrpld_State_Accounting_{Mode}_duration` | NetworkOPs.cpp:774 | Time in each operating mode (Disconnected/Connected/Syncing/Tracking/Full) |
| `xrpld_State_Accounting_{Mode}_transitions` | NetworkOPs.cpp:780 | Transition count per mode |
| `xrpld_Peer_Finder_Active_Inbound_Peers` | PeerfinderManager.cpp:214 | Active inbound peer connections |
| `xrpld_Peer_Finder_Active_Outbound_Peers` | PeerfinderManager.cpp:215 | Active outbound peer connections |
| `xrpld_Overlay_Peer_Disconnects` | OverlayImpl.h:557 | Peer disconnect count |
| `xrpld_job_count` | JobQueue.cpp:26 | Current job queue depth |
| `xrpld_{category}_Bytes_In/Out` | OverlayImpl.h:535 | Overlay traffic bytes per category (57 categories) |
| `xrpld_{category}_Messages_In/Out` | OverlayImpl.h:535 | Overlay traffic messages per category |
#### OTel MetricsRegistry Gauges (Phase 9)
These gauges are exported via the OTel Metrics SDK `PeriodicMetricReader` (10s interval), NOT through beast::insight.
| Prometheus Metric | Source | Description |
| ----------------------------------------------------------- | ------------------- | -------------------------------------------- |
| `rippled_server_info{metric="server_state"}` | MetricsRegistry.cpp | Operating mode (0=DISCONNECTED .. 4=FULL) |
| `rippled_server_info{metric="uptime"}` | MetricsRegistry.cpp | Seconds since server start |
| `rippled_server_info{metric="peers"}` | MetricsRegistry.cpp | Total connected peers |
| `rippled_server_info{metric="validated_ledger_seq"}` | MetricsRegistry.cpp | Validated ledger sequence number |
| `rippled_server_info{metric="ledger_current_index"}` | MetricsRegistry.cpp | Current open ledger sequence |
| `rippled_server_info{metric="peer_disconnects_resources"}` | MetricsRegistry.cpp | Cumulative resource-related peer disconnects |
| `rippled_server_info{metric="last_close_proposers"}` | MetricsRegistry.cpp | Proposers in last closed round |
| `rippled_server_info{metric="last_close_converge_time_ms"}` | MetricsRegistry.cpp | Last close convergence time (ms) |
| `rippled_build_info{version="<ver>"}` | MetricsRegistry.cpp | Info-style metric (always 1) |
| `rippled_complete_ledgers{bound="start\|end",index="<N>"}` | MetricsRegistry.cpp | Complete ledger range start/end pairs |
| `rippled_db_metrics{metric="db_kb_total"}` | MetricsRegistry.cpp | Total database size (KB) |
| `rippled_db_metrics{metric="db_kb_ledger"}` | MetricsRegistry.cpp | Ledger database size (KB) |
| `rippled_db_metrics{metric="db_kb_transaction"}` | MetricsRegistry.cpp | Transaction database size (KB) |
| `rippled_db_metrics{metric="historical_perminute"}` | MetricsRegistry.cpp | Historical ledger fetches per minute |
| `rippled_cache_metrics{metric="AL_size"}` | MetricsRegistry.cpp | AcceptedLedger cache size |
| `rippled_nodestore_state{metric="node_reads_duration_us"}` | MetricsRegistry.cpp | Cumulative read time (microseconds) |
| `rippled_nodestore_state{metric="read_request_bundle"}` | MetricsRegistry.cpp | Read request bundle count |
| `rippled_nodestore_state{metric="read_threads_running"}` | MetricsRegistry.cpp | Active read threads |
| `rippled_nodestore_state{metric="read_threads_total"}` | MetricsRegistry.cpp | Total read threads configured |
| Prometheus Metric | Source | Description |
| --------------------------------------------------------- | ------------------- | -------------------------------------------- |
| `xrpld_server_info{metric="server_state"}` | MetricsRegistry.cpp | Operating mode (0=DISCONNECTED .. 4=FULL) |
| `xrpld_server_info{metric="uptime"}` | MetricsRegistry.cpp | Seconds since server start |
| `xrpld_server_info{metric="peers"}` | MetricsRegistry.cpp | Total connected peers |
| `xrpld_server_info{metric="validated_ledger_seq"}` | MetricsRegistry.cpp | Validated ledger sequence number |
| `xrpld_server_info{metric="ledger_current_index"}` | MetricsRegistry.cpp | Current open ledger sequence |
| `xrpld_server_info{metric="peer_disconnects_resources"}` | MetricsRegistry.cpp | Cumulative resource-related peer disconnects |
| `xrpld_server_info{metric="last_close_proposers"}` | MetricsRegistry.cpp | Proposers in last closed round |
| `xrpld_server_info{metric="last_close_converge_time_ms"}` | MetricsRegistry.cpp | Last close convergence time (ms) |
| `xrpld_build_info{version="<ver>"}` | MetricsRegistry.cpp | Info-style metric (always 1) |
| `xrpld_complete_ledgers{bound="start\|end",index="<N>"}` | MetricsRegistry.cpp | Complete ledger range start/end pairs |
| `xrpld_db_metrics{metric="db_kb_total"}` | MetricsRegistry.cpp | Total database size (KB) |
| `xrpld_db_metrics{metric="db_kb_ledger"}` | MetricsRegistry.cpp | Ledger database size (KB) |
| `xrpld_db_metrics{metric="db_kb_transaction"}` | MetricsRegistry.cpp | Transaction database size (KB) |
| `xrpld_db_metrics{metric="historical_perminute"}` | MetricsRegistry.cpp | Historical ledger fetches per minute |
| `xrpld_cache_metrics{metric="AL_size"}` | MetricsRegistry.cpp | AcceptedLedger cache size |
| `xrpld_nodestore_state{metric="node_reads_duration_us"}` | MetricsRegistry.cpp | Cumulative read time (microseconds) |
| `xrpld_nodestore_state{metric="read_request_bundle"}` | MetricsRegistry.cpp | Read request bundle count |
| `xrpld_nodestore_state{metric="read_threads_running"}` | MetricsRegistry.cpp | Active read threads |
| `xrpld_nodestore_state{metric="read_threads_total"}` | MetricsRegistry.cpp | Total read threads configured |
#### Counters
| Prometheus Metric | Source | Description |
| --------------------------------- | --------------------- | ------------------------------ |
| `rippled_rpc_requests` | ServerHandler.cpp:108 | Total RPC request count |
| `rippled_ledger_fetches` | InboundLedgers.cpp:44 | Ledger fetch request count |
| `rippled_ledger_history_mismatch` | LedgerHistory.cpp:16 | Ledger hash mismatch count |
| `rippled_warn` | Logic.h:33 | Resource manager warning count |
| `rippled_drop` | Logic.h:34 | Resource manager drop count |
| Prometheus Metric | Source | Description |
| ------------------------------- | --------------------- | ------------------------------ |
| `xrpld_rpc_requests` | ServerHandler.cpp:108 | Total RPC request count |
| `xrpld_ledger_fetches` | InboundLedgers.cpp:44 | Ledger fetch request count |
| `xrpld_ledger_history_mismatch` | LedgerHistory.cpp:16 | Ledger hash mismatch count |
| `xrpld_warn` | Logic.h:33 | Resource manager warning count |
| `xrpld_drop` | Logic.h:34 | Resource manager drop count |
#### Histograms (from StatsD timers)
| Prometheus Metric | Source | Description |
| ----------------------- | --------------------- | ------------------------------ |
| `rippled_rpc_time` | ServerHandler.cpp:110 | RPC response time (ms) |
| `rippled_rpc_size` | ServerHandler.cpp:109 | RPC response size (bytes) |
| `rippled_ios_latency` | Application.cpp:438 | I/O service loop latency (ms) |
| `rippled_pathfind_fast` | PathRequests.h:23 | Fast pathfinding duration (ms) |
| `rippled_pathfind_full` | PathRequests.h:24 | Full pathfinding duration (ms) |
| Prometheus Metric | Source | Description |
| --------------------- | --------------------- | ------------------------------ |
| `xrpld_rpc_time` | ServerHandler.cpp:110 | RPC response time (ms) |
| `xrpld_rpc_size` | ServerHandler.cpp:109 | RPC response size (bytes) |
| `xrpld_ios_latency` | Application.cpp:438 | I/O service loop latency (ms) |
| `xrpld_pathfind_fast` | PathRequests.h:23 | Fast pathfinding duration (ms) |
| `xrpld_pathfind_full` | PathRequests.h:24 | Full pathfinding duration (ms) |
## Grafana Dashboards
Eight dashboards are pre-provisioned in `docker/telemetry/grafana/dashboards/`:
Ten dashboards are pre-provisioned in `docker/telemetry/grafana/dashboards/`:
### RPC Performance (`rippled-rpc-perf`)
### RPC Performance (`xrpld-rpc-perf`)
| Panel | Type | PromQL | Labels Used |
| --------------------------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------- |
@@ -272,7 +374,7 @@ Eight dashboards are pre-provisioned in `docker/telemetry/grafana/dashboards/`:
| Top Commands by Volume | bargauge | `topk(10, ...)` by `xrpl_rpc_command` | `xrpl_rpc_command` |
| WebSocket Message Rate | stat | `rpc.ws_message` rate | — |
### Transaction Overview (`rippled-transactions`)
### Transaction Overview (`xrpld-transactions`)
| Panel | Type | PromQL | Labels Used |
| --------------------------------- | ---------- | -------------------------------------------------------------------------------------------- | --------------- |
@@ -285,7 +387,7 @@ Eight dashboards are pre-provisioned in `docker/telemetry/grafana/dashboards/`:
| Peer TX Receive Rate | timeseries | `tx.receive` rate | — |
| TX Apply Failed Rate | stat | `tx.apply` with `STATUS_CODE_ERROR` | `status_code` |
### Consensus Health (`rippled-consensus`)
### Consensus Health (`xrpld-consensus`)
| Panel | Type | PromQL | Labels Used |
| ----------------------------- | ---------- | ---------------------------------------------------------------------------------- | --------------------- |
@@ -300,7 +402,7 @@ Eight dashboards are pre-provisioned in `docker/telemetry/grafana/dashboards/`:
| Validation vs Close Rate | timeseries | `consensus.validation.send` vs `consensus.ledger_close` | — |
| Accept Duration Heatmap | heatmap | `consensus.accept` histogram buckets | `le` |
### Ledger Operations (`rippled-ledger-ops`)
### Ledger Operations (`xrpld-ledger-ops`)
| Panel | Type | PromQL | Labels Used |
| ----------------------- | ---------- | ---------------------------------------------- | ----------- |
@@ -313,7 +415,7 @@ Eight dashboards are pre-provisioned in `docker/telemetry/grafana/dashboards/`:
| Ledger Store Rate | stat | `ledger.store` call rate | — |
| Build vs Close Duration | timeseries | p95 `ledger.build` vs `consensus.ledger_close` | — |
### Peer Network (`rippled-peer-net`)
### Peer Network (`xrpld-peer-net`)
Requires `trace_peer=1` in the `[telemetry]` config section.
@@ -324,78 +426,102 @@ Requires `trace_peer=1` in the `[telemetry]` config section.
| Proposals Trusted vs Untrusted | piechart | by `xrpl_peer_proposal_trusted` | `xrpl_peer_proposal_trusted` |
| Validations Trusted vs Untrusted | piechart | by `xrpl_peer_validation_trusted` | `xrpl_peer_validation_trusted` |
### Node Health — System Metrics (`rippled-system-node-health`)
### Node Health -- StatsD (`xrpld-statsd-node-health`)
| Panel | Type | PromQL | Labels Used |
| -------------------------- | ---------- | ------------------------------------------------------ | ---------------- |
| Validated Ledger Age | stat | `rippled_LedgerMaster_Validated_Ledger_Age` | — |
| Published Ledger Age | stat | `rippled_LedgerMaster_Published_Ledger_Age` | — |
| Operating Mode Duration | timeseries | `rippled_State_Accounting_*_duration` | — |
| Operating Mode Transitions | timeseries | `rippled_State_Accounting_*_transitions` | — |
| I/O Latency | timeseries | `histogram_quantile(0.95, rippled_ios_latency_bucket)` | — |
| Job Queue Depth | timeseries | `rippled_job_count` | — |
| Ledger Fetch Rate | stat | `rate(rippled_ledger_fetches[5m])` | — |
| Ledger History Mismatches | stat | `rate(rippled_ledger_history_mismatch[5m])` | — |
| Server State | stat | `rippled_server_info{metric="server_state"}` | `metric` |
| Uptime | stat | `rippled_server_info{metric="uptime"}` | `metric` |
| Peer Count | stat | `rippled_server_info{metric="peers"}` | `metric` |
| Validated Ledger Seq | stat | `rippled_server_info{metric="validated_ledger_seq"}` | `metric` |
| Build Version | stat | `rippled_build_info` | `version` |
| Complete Ledger Ranges | table | `rippled_complete_ledgers` | `bound`, `index` |
| Database Sizes | timeseries | `rippled_db_metrics{metric=~"db_kb_.*"}` | `metric` |
| Historical Fetch Rate | stat | `rippled_db_metrics{metric="historical_perminute"}` | `metric` |
| Panel | Type | PromQL | Labels Used |
| -------------------------------------- | ---------- | --------------------------------------------------------------- | ---------------- |
| Validated Ledger Age | stat | `xrpld_LedgerMaster_Validated_Ledger_Age` | — |
| Published Ledger Age | stat | `xrpld_LedgerMaster_Published_Ledger_Age` | — |
| Operating Mode Duration | timeseries | `xrpld_State_Accounting_*_duration` | — |
| Operating Mode Transitions | timeseries | `xrpld_State_Accounting_*_transitions` | — |
| I/O Latency | timeseries | `histogram_quantile(0.95, xrpld_ios_latency_bucket)` | — |
| Job Queue Depth | timeseries | `xrpld_job_count` | — |
| Ledger Fetch Rate | stat | `rate(xrpld_ledger_fetches[5m])` | — |
| Ledger History Mismatches | stat | `rate(xrpld_ledger_history_mismatch[5m])` | — |
| Key Jobs Execution Time | timeseries | `xrpld_acceptLedger{quantile="$quantile"}` (+ 10 more key jobs) | `quantile` |
| Key Jobs Dequeue Wait Time | timeseries | `xrpld_acceptLedger_q{quantile="$quantile"}` (+ 10 more) | `quantile` |
| FullBelowCache Size | timeseries | `xrpld_Node_family_full_below_cache_size` | |
| FullBelowCache Hit Rate | gauge | `xrpld_Node_family_full_below_cache_hit_rate` | — |
| Ledger Publish Gap | stat | `Published_Ledger_Age - Validated_Ledger_Age` | — |
| State Duration Rate (Full vs Tracking) | timeseries | `rate(xrpld_State_Accounting_Full_duration[5m]) / 1000000` | — |
| All Jobs Execution Time (Detail) | timeseries | `{__name__=~"xrpld_<all_jobs>", quantile="$quantile"}` | `quantile` |
| All Jobs Dequeue Wait (Detail) | timeseries | `{__name__=~"xrpld_<all_jobs>_q", quantile="$quantile"}` | `quantile` |
| Server State | stat | `xrpld_server_info{metric="server_state"}` | `metric` |
| Uptime | stat | `xrpld_server_info{metric="uptime"}` | `metric` |
| Peer Count | stat | `xrpld_server_info{metric="peers"}` | `metric` |
| Validated Ledger Seq | stat | `xrpld_server_info{metric="validated_ledger_seq"}` | `metric` |
| Build Version | stat | `xrpld_build_info` | `version` |
| Complete Ledger Ranges | table | `xrpld_complete_ledgers` | `bound`, `index` |
| Database Sizes | timeseries | `xrpld_db_metrics{metric=~"db_kb_.*"}` | `metric` |
| Historical Fetch Rate | stat | `xrpld_db_metrics{metric="historical_perminute"}` | `metric` |
### Network Traffic — System Metrics (`rippled-system-network`)
### Network Traffic -- StatsD (`xrpld-statsd-network`)
| Panel | Type | PromQL | Labels Used |
| ---------------------- | ---------- | -------------------------------------- | ----------- |
| Active Peers | timeseries | `rippled_Peer_Finder_Active_*_Peers` | — |
| Peer Disconnects | timeseries | `rippled_Overlay_Peer_Disconnects` | — |
| Total Network Bytes | timeseries | `rippled_total_Bytes_In/Out` | — |
| Total Network Messages | timeseries | `rippled_total_Messages_In/Out` | — |
| Transaction Traffic | timeseries | `rippled_transactions_Messages_In/Out` | — |
| Proposal Traffic | timeseries | `rippled_proposals_Messages_In/Out` | — |
| Validation Traffic | timeseries | `rippled_validations_Messages_In/Out` | — |
| Traffic by Category | bargauge | `topk(10, rippled_*_Bytes_In)` | — |
| Panel | Type | PromQL | Labels Used |
| ------------------------------------ | ---------- | ------------------------------------------ | ----------- |
| Active Peers | timeseries | `xrpld_Peer_Finder_Active_*_Peers` | — |
| Peer Disconnects | timeseries | `xrpld_Overlay_Peer_Disconnects` | — |
| Total Network Bytes | timeseries | `rate(xrpld_total_Bytes_In/Out[5m])` | — |
| Total Network Messages | timeseries | `xrpld_total_Messages_In/Out` | — |
| Transaction Traffic | timeseries | `xrpld_transactions_Messages_In/Out` | — |
| Proposal Traffic | timeseries | `xrpld_proposals_Messages_In/Out` | — |
| Validation Traffic | timeseries | `xrpld_validations_Messages_In/Out` | — |
| Traffic by Category | bargauge | `topk(10, xrpld_*_Bytes_In)` | — |
| Duplicate Traffic (Wasted Bandwidth) | timeseries | `rate(xrpld_*_duplicate_Bytes_In/Out[5m])` | — |
| All Traffic Categories (Detail) | timeseries | `topk(15, rate(xrpld_*_Bytes_In[5m]))` | — |
### RPC & Pathfinding — System Metrics (`rippled-system-rpc`)
### RPC & Pathfinding -- StatsD (`xrpld-statsd-rpc`)
| Panel | Type | PromQL | Labels Used |
| ------------------------- | ---------- | -------------------------------------------------------- | ----------- |
| RPC Request Rate | stat | `rate(rippled_rpc_requests[5m])` | — |
| RPC Response Time | timeseries | `histogram_quantile(0.95, rippled_rpc_time_bucket)` | — |
| RPC Response Size | timeseries | `histogram_quantile(0.95, rippled_rpc_size_bucket)` | — |
| RPC Response Time Heatmap | heatmap | `rippled_rpc_time_bucket` | — |
| Pathfinding Fast Duration | timeseries | `histogram_quantile(0.95, rippled_pathfind_fast_bucket)` | — |
| Pathfinding Full Duration | timeseries | `histogram_quantile(0.95, rippled_pathfind_full_bucket)` | — |
| Resource Warnings Rate | stat | `rate(rippled_warn[5m])` | — |
| Resource Drops Rate | stat | `rate(rippled_drop[5m])` | — |
| Panel | Type | PromQL | Labels Used |
| ------------------------- | ---------- | ------------------------------------------------------ | ----------- |
| RPC Request Rate | stat | `rate(xrpld_rpc_requests[5m])` | — |
| RPC Response Time | timeseries | `histogram_quantile(0.95, xrpld_rpc_time_bucket)` | — |
| RPC Response Size | timeseries | `histogram_quantile(0.95, xrpld_rpc_size_bucket)` | — |
| RPC Response Time Heatmap | heatmap | `xrpld_rpc_time_bucket` | — |
| Pathfinding Fast Duration | timeseries | `histogram_quantile(0.95, xrpld_pathfind_fast_bucket)` | — |
| Pathfinding Full Duration | timeseries | `histogram_quantile(0.95, xrpld_pathfind_full_bucket)` | — |
| Resource Warnings Rate | stat | `rate(xrpld_warn[5m])` | — |
| Resource Drops Rate | stat | `rate(xrpld_drop[5m])` | — |
### Span → Metric → Dashboard Summary
| Span Name | Prometheus Metric Filter | Grafana Dashboard |
| --------------------------- | ----------------------------------------- | --------------------------------------------- |
| `rpc.request` | `{span_name="rpc.request"}` | RPC Performance (Overall Throughput) |
| `rpc.process` | `{span_name="rpc.process"}` | RPC Performance (Overall Throughput) |
| `rpc.ws_message` | `{span_name="rpc.ws_message"}` | RPC Performance (WebSocket Rate) |
| `rpc.command.*` | `{span_name=~"rpc.command.*"}` | RPC Performance (Rate, Latency, Error, Top) |
| `tx.process` | `{span_name="tx.process"}` | Transaction Overview (Rate, Latency, Heatmap) |
| `tx.receive` | `{span_name="tx.receive"}` | Transaction Overview (Rate, Receive) |
| `tx.apply` | `{span_name="tx.apply"}` | Transaction Overview + Ledger Ops (Apply) |
| `consensus.accept` | `{span_name="consensus.accept"}` | Consensus Health (Duration, Rate, Heatmap) |
| `consensus.proposal.send` | `{span_name="consensus.proposal.send"}` | Consensus Health (Proposals Rate) |
| `consensus.ledger_close` | `{span_name="consensus.ledger_close"}` | Consensus Health (Close, Mode) |
| `consensus.validation.send` | `{span_name="consensus.validation.send"}` | Consensus Health (Validation Rate) |
| `consensus.accept.apply` | `{span_name="consensus.accept.apply"}` | Consensus Health (Apply Duration, Close Time) |
| `ledger.build` | `{span_name="ledger.build"}` | Ledger Ops (Build Rate, Duration, Heatmap) |
| `ledger.validate` | `{span_name="ledger.validate"}` | Ledger Ops (Validation Rate) |
| `ledger.store` | `{span_name="ledger.store"}` | Ledger Ops (Store Rate) |
| `peer.proposal.receive` | `{span_name="peer.proposal.receive"}` | Peer Network (Rate, Trusted/Untrusted) |
| `peer.validation.receive` | `{span_name="peer.validation.receive"}` | Peer Network (Rate, Trusted/Untrusted) |
| Span Name | Prometheus Metric Filter | Grafana Dashboard |
| ------------------------------ | -------------------------------------------- | --------------------------------------------- |
| `rpc.request` | `{span_name="rpc.request"}` | RPC Performance (Overall Throughput) |
| `rpc.process` | `{span_name="rpc.process"}` | RPC Performance (Overall Throughput) |
| `rpc.ws_message` | `{span_name="rpc.ws_message"}` | RPC Performance (WebSocket Rate) |
| `rpc.command.*` | `{span_name=~"rpc.command.*"}` | RPC Performance (Rate, Latency, Error, Top) |
| `tx.process` | `{span_name="tx.process"}` | Transaction Overview (Rate, Latency, Heatmap) |
| `tx.receive` | `{span_name="tx.receive"}` | Transaction Overview (Rate, Receive) |
| `tx.apply` | `{span_name="tx.apply"}` | Transaction Overview + Ledger Ops (Apply) |
| `txq.enqueue` | `{span_name="txq.enqueue"}` | -- (available but not paneled) |
| `txq.apply_direct` | `{span_name="txq.apply_direct"}` | -- (available but not paneled) |
| `txq.batch_clear` | `{span_name="txq.batch_clear"}` | -- (available but not paneled) |
| `txq.accept` | `{span_name="txq.accept"}` | -- (available but not paneled) |
| `txq.accept_tx` | `{span_name="txq.accept_tx"}` | -- (available but not paneled) |
| `txq.cleanup` | `{span_name="txq.cleanup"}` | -- (available but not paneled) |
| `consensus.round` | `{span_name="consensus.round"}` | -- (available but not paneled) |
| `consensus.phase.open` | `{span_name="consensus.phase.open"}` | -- (available but not paneled) |
| `consensus.establish` | `{span_name="consensus.establish"}` | -- (available but not paneled) |
| `consensus.update_positions` | `{span_name="consensus.update_positions"}` | -- (available but not paneled) |
| `consensus.check` | `{span_name="consensus.check"}` | -- (available but not paneled) |
| `consensus.accept` | `{span_name="consensus.accept"}` | Consensus Health (Duration, Rate, Heatmap) |
| `consensus.proposal.send` | `{span_name="consensus.proposal.send"}` | Consensus Health (Proposals Rate) |
| `consensus.ledger_close` | `{span_name="consensus.ledger_close"}` | Consensus Health (Close, Mode) |
| `consensus.validation.send` | `{span_name="consensus.validation.send"}` | Consensus Health (Validation Rate) |
| `consensus.accept.apply` | `{span_name="consensus.accept.apply"}` | Consensus Health (Apply Duration, Close Time) |
| `consensus.mode_change` | `{span_name="consensus.mode_change"}` | -- (available but not paneled) |
| `consensus.proposal.receive` | `{span_name="consensus.proposal.receive"}` | -- (available but not paneled) |
| `consensus.validation.receive` | `{span_name="consensus.validation.receive"}` | -- (available but not paneled) |
| `ledger.build` | `{span_name="ledger.build"}` | Ledger Ops (Build Rate, Duration, Heatmap) |
| `ledger.validate` | `{span_name="ledger.validate"}` | Ledger Ops (Validation Rate) |
| `ledger.store` | `{span_name="ledger.store"}` | Ledger Ops (Store Rate) |
| `peer.proposal.receive` | `{span_name="peer.proposal.receive"}` | Peer Network (Rate, Trusted/Untrusted) |
| `peer.validation.receive` | `{span_name="peer.validation.receive"}` | Peer Network (Rate, Trusted/Untrusted) |
## Log-Trace Correlation (Phase 8)
When rippled is built with `telemetry=ON`, log lines emitted within an active OpenTelemetry span automatically include `trace_id` and `span_id` fields:
When xrpld is built with `telemetry=ON`, log lines emitted within an active OpenTelemetry span automatically include `trace_id` and `span_id` fields:
```
2024-01-15T10:30:45.123Z LedgerMaster:NFO trace_id=abc123def456789012345678abcdef01 span_id=0123456789abcdef Validated ledger 42
@@ -414,45 +540,39 @@ Log files are ingested by the OTel Collector's `filelog` receiver, which tails `
```logql
# Find all logs for a specific trace
{job="rippled"} |= "trace_id=abc123def456789012345678abcdef01"
{job="xrpld"} |= "trace_id=abc123def456789012345678abcdef01"
# Error logs with trace context (log lines with ERR severity that have a trace_id)
{job="rippled"} |= "ERR" |= "trace_id="
{job="xrpld"} |= "ERR" |= "trace_id="
# All logs from a specific partition that were emitted during a span
{job="rippled"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
{job="xrpld"} |= "LedgerMaster" | regexp `trace_id=(?P<trace_id>[a-f0-9]+)` | trace_id != ""
# Logs from the last hour containing trace context
{job="rippled"} |= "trace_id=" | regexp `(?P<partition>\S+):(?P<sev>\S+)\s+trace_id=(?P<tid>[a-f0-9]+)`
{job="xrpld"} |= "trace_id=" | regexp `(?P<partition>\S+):(?P<sev>\S+)\s+trace_id=(?P<tid>[a-f0-9]+)`
# Count of traced vs untraced log lines
count_over_time({job="rippled"} |= "trace_id=" [5m])
count_over_time({job="xrpld"} |= "trace_id=" [5m])
```
### Verifying Log Correlation
1. Start the observability stack and rippled with telemetry enabled.
1. Start the observability stack and xrpld with telemetry enabled.
2. Send an RPC request: `curl http://localhost:5005 -d '{"method":"server_info"}'`
3. Check the debug.log for `trace_id=` entries: `grep trace_id= /path/to/debug.log`
4. Open Grafana at http://localhost:3000 -> Explore -> Loki and search for `{job="rippled"} |= "trace_id="`.
4. Open Grafana at http://localhost:3000 -> Explore -> Loki and search for `{job="xrpld"} |= "trace_id="`.
5. Click the TraceID link to navigate to the corresponding trace in Tempo.
## Troubleshooting
### No traces appearing in Tempo
1. Check rippled logs for `Telemetry starting` message
1. Check xrpld logs for `Telemetry starting` message
2. Verify `enabled=1` in the `[telemetry]` config section
3. Test collector connectivity: `curl -v http://localhost:4318/v1/traces`
4. Check collector logs: `docker compose logs otel-collector`
### No system metrics in Prometheus
1. Check rippled logs for `OTelCollector starting` message
2. Verify `server=otel` in the `[insight]` config section
3. Verify the endpoint in `[insight]` points to the OTLP/HTTP port (default: `http://localhost:4318/v1/metrics`)
4. Check that the `otlp` receiver is in the metrics pipeline receivers in `otel-collector-config.yaml`
5. Query Prometheus directly: `curl 'http://localhost:9090/api/v1/query?query=rippled_job_count'`
4. Check collector logs: `docker compose -f docker/telemetry/docker-compose.yml logs otel-collector`
5. Verify Tempo is receiving data: open Grafana → Explore → select Tempo datasource → search by `service.name = xrpld`
6. Check Tempo logs: `docker compose -f docker/telemetry/docker-compose.yml logs tempo`
### Server info gauge shows server_state=0
@@ -479,14 +599,14 @@ The `getKBUsed*()` methods require SQLite databases to exist. If running with
### No trace_id in log output
- Verify rippled was built with `telemetry=ON` (the `XRPL_ENABLE_TELEMETRY` preprocessor flag)
- Verify xrpld was built with `telemetry=ON` (the `XRPL_ENABLE_TELEMETRY` preprocessor flag)
- Verify `enabled=1` in the `[telemetry]` config section
- Log lines only contain `trace_id`/`span_id` when emitted inside an active span — background logs outside of RPC/consensus/transaction processing will not have trace context
- Check that the specific trace category is enabled (e.g., `trace_rpc=1`)
### No logs in Loki
- Verify the log file mount in docker-compose.yml points to the correct rippled log directory
- Verify the log file mount in docker-compose.yml points to the correct xrpld log directory
- Check OTel Collector logs for filelog receiver errors: `docker compose logs otel-collector`
- Verify Loki is running: `curl http://localhost:3100/ready`
- Check the filelog receiver glob pattern matches your log file paths