Commit Graph

61 Commits

Author SHA1 Message Date
Pratik Mankawde
4db67bc191 Phase 10: Synthetic workload generation and telemetry validation tools
Add comprehensive workload harness for end-to-end validation of the
Phases 1-9 telemetry stack:

Task 10.1 — Multi-node test harness:
  - docker-compose.workload.yaml with full OTel stack (Collector, Jaeger,
    Tempo, Prometheus, Loki, Grafana)
  - generate-validator-keys.sh for automated key generation
  - xrpld-validator.cfg.template for node configuration

Task 10.2 — RPC load generator:
  - rpc_load_generator.py with WebSocket client, configurable rates,
    realistic command distribution (40% health, 30% wallet, 15% explorer,
    10% tx lookups, 5% DEX), W3C traceparent injection

Task 10.3 — Transaction submitter:
  - tx_submitter.py with 10 transaction types (Payment, OfferCreate,
    OfferCancel, TrustSet, NFTokenMint, NFTokenCreateOffer, EscrowCreate,
    EscrowFinish, AMMCreate, AMMDeposit), auto-funded test accounts

Task 10.4 — Telemetry validation suite:
  - validate_telemetry.py checking spans (Jaeger), metrics (Prometheus),
    log-trace correlation (Loki), dashboards (Grafana)
  - expected_spans.json (17 span types, 22 attributes, 3 hierarchies)
  - expected_metrics.json (SpanMetrics, StatsD, Phase 9, dashboards)

Task 10.5 — Performance benchmark suite:
  - benchmark.sh for baseline vs telemetry comparison
  - collect_system_metrics.sh for CPU/memory/latency sampling
  - Thresholds: <3% CPU, <5MB memory, <2ms RPC p99, <5% TPS, <1% consensus

Task 10.6 — CI integration:
  - telemetry-validation.yml GitHub Actions workflow
  - run-full-validation.sh orchestrator script
  - Manual trigger + telemetry branch auto-trigger

Task 10.7 — Documentation:
  - workload/README.md with quick start and tool reference
  - Updated telemetry-runbook.md with validation and benchmark sections
  - Updated 09-data-collection-reference.md with validation inventory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:57 +00:00
Pratik Mankawde
055b88687a Phase 9: Internal Metric Instrumentation Gap Fill (Tasks 9.1-9.10)
Implement ~50 OTel metrics covering NodeStore I/O, cache hit rates,
TxQ state, PerfLog per-RPC/per-job counters, CountedObject instances,
and load factor breakdown via MetricsRegistry.

Core implementation:
- MetricsRegistry class with synchronous instruments (Counter, Histogram)
  for RPC and Job metrics, and ObservableGauge callbacks for cache, TxQ,
  CountedObject, LoadFactor, and NodeStore state polling.
- ServiceRegistry extended with getMetricsRegistry() virtual method.
- Application wires MetricsRegistry lifecycle (create/start/stop).
- PerfLogImp instrumented to emit OTel metrics on RPC and Job events.

Dashboards & observability:
- 3 new Grafana dashboards: RPC Performance, Job Queue, Fee Market/TxQ.
- Extended statsd-node-health dashboard with NodeStore, Cache, and
  CountedObject panels.
- 10 alerting rules added to telemetry-runbook.md.
- Integration test extended with 12 OTel metric validation checks.

Documentation:
- 09-data-collection-reference.md updated with Phase 9 metric tables.
- Unit tests for MetricsRegistry disabled-path (no-op) behavior.

All OTel SDK code guarded with #ifdef XRPL_ENABLE_TELEMETRY.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:28 +00:00
Pratik Mankawde
7d4e00c0b0 Phase 8: Update documentation for log-trace correlation
Task 8.6: Add Log-Trace Correlation section to telemetry-runbook.md
with LogQL examples, verification steps, and troubleshooting guidance.
Update 09-data-collection-reference.md section 5a from "Future" to
actual implementation docs covering log format, ingestion pipeline,
Grafana correlation config, and Loki backend. Add Phase 8 log
correlation test section and troubleshooting to TESTING.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:28 +00:00
Pratik Mankawde
9a69b91f36 Phase 7: Native OTel metrics migration (Tasks 7.1-7.7)
Replace StatsD UDP metric transport with native OpenTelemetry Metrics SDK
export via OTLP/HTTP behind the existing beast::insight::Collector interface.

- Task 7.1: Link opentelemetry-cpp to beast module in CMake when telemetry=ON
- Task 7.2: New OTelCollector class mapping beast::insight instruments to OTel
  SDK (Counter, ObservableGauge, Histogram, Counter<uint64>) with OTLP/HTTP
  export via PeriodicMetricReader at 1s intervals
- Task 7.3: Add server=otel branch to CollectorManager with endpoint config
- Task 7.4: Update otel-collector-config.yaml to use OTLP receiver for metrics
  pipeline (StatsD receiver commented out for backward compat)
- Task 7.5: Metric names preserved via dot-to-underscore formatting matching
  StatsD->Prometheus conventions
- Task 7.6: Rename Grafana dashboards from statsd-* to system-*, update titles
  and UIDs from "StatsD" to "System Metrics"
- Task 7.7: Update integration test to use server=otel, verify OTLP metrics
- Task 7.8: Update runbook, TESTING.md, config reference, and data collection
  reference docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:27 +00:00
Pratik Mankawde
634f7fbbdf formatting
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-03-12 22:12:13 +00:00
Pratik Mankawde
18f9467987 Phase 6: Integrate beast::insight StatsD metrics into telemetry pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:12:13 +00:00
Pratik Mankawde
3d9275fb99 formatting changes
Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
2026-03-12 22:11:46 +00:00
Pratik Mankawde
12f8099c51 Phase 5b: Add ledger/peer/tx spans + expand Grafana dashboards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:11:46 +00:00
Pratik Mankawde
ca7af13ef2 Add consensus close time panels and docs for accept.apply span
- Add Ledger Apply Duration and Close Time Agreement panels to
  consensus-health dashboard
- Add consensus.accept.apply to telemetry runbook with TraceQL
  queries for close time disagreements and consensus failures
- Add span to TESTING.md expected span catalog and verification loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:36 +00:00
Pratik Mankawde
0eaabfa2dc Phase 5: Observability stack — spanmetrics, dashboards, runbook
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:10:36 +00:00
Pratik Mankawde
d8d7a40fca Phase 1b: Telemetry core infrastructure
Add the OpenTelemetry telemetry library and supporting infrastructure:

Build system:
- Conan opentelemetry-cpp dependency with OTLP/gRPC exporter
- CMake integration for xrpl_telemetry library target
- Levelization ordering updates

Core library (libxrpl):
- Telemetry class: provider lifecycle, span creation, sampling config
- SpanGuard: RAII span management with attribute/exception helpers
- TelemetryConfig: parse [telemetry] config section
- NullTelemetry: no-op implementation when telemetry is disabled

Application integration:
- Telemetry member in ApplicationImp with start/stop lifecycle
- getTelemetry() interface on Application
- ServiceRegistry telemetry accessor

Docker observability stack:
- OTel Collector, Jaeger, Grafana docker-compose setup
- Collector config with OTLP gRPC receiver and Jaeger exporter

Config and docs:
- Example telemetry config section in xrpld-example.cfg
- Build documentation for telemetry setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:09:01 +00:00
Sergey Kuznetsov
0fd237d707 chore: Add nix development environment (#6314)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-23 20:10:07 -05:00
nuxtreact
cd218346ff chore: Fix minor issues in comments (#6346) 2026-02-12 14:55:27 -05:00
Pratik Mankawde
96d17b7f66 ci: Add sanitizers to CI builds (#5996)
This change adds support for sanitizer build options in CI builds workflow. Currently `asan+ubsan` is enabled, while `tsan+ubsan` is left disabled as more changes are required.
2026-01-15 16:18:14 +00:00
Mayukha Vadari
3c9f5b6252 refactor: Fix typos in comments, configure cspell (#6164)
This change sets up a `cspell `configuration and fixes lots of typos in comments. There are no other code changes.
2026-01-07 12:10:19 -05:00
Bart
4565cc280b chore: Fix docs readme and cmake (#6122)
This change removes the unused `with_docs` option and fixes the README instructions on how to build the `docs` target.
2025-12-08 18:39:38 +00:00
Bart
9625514da8 chore: Clean up .gitignore and .gitattributes (#6001)
The .gitignore and .gitattributes files contain references to files and directories that the current build no longer produces, so this change removes obsolete entries in these files, and does some general reorganizing of the remaining entries.
2025-12-08 12:35:23 -05:00
Bart
896b8c3b54 chore: Fix file formatting (#5718) 2025-08-22 10:02:56 -04:00
Mayukha Vadari
97f0747e10 chore: Run prettier on all files (#5657) 2025-08-11 16:15:42 +00:00
Bronek Kozicki
dbeb841b5a docs: Update BUILD.md for Conan 2 (#5478)
This change updates BUILD.md for Conan 2, add fixes/workarounds for Apple Clang 17, Clang 20 and CMake 4. This also removes (from BUILD.md only) workarounds for compiler versions which we no longer support e.g. Clang 15 and adds compilation flag -Wno-deprecated-declarations to enable building with Clang 20 on Linux.
2025-08-06 10:18:41 +00:00
Denis Angell
6419f9a253 docs: Set up developer environment with specific XCode version (#5645) 2025-08-04 10:54:54 -04:00
Valentin Balaschenko
8b9e21e3f5 docs: Update build instructions for Ubuntu 22.04+ (#5292) 2025-05-27 18:32:25 +00:00
Ed Hennis
d22a5057b9 Prevent consensus from getting stuck in the establish phase (#5277)
- Detects if the consensus process is "stalled". If it is, then we can declare a 
  consensus and end successfully even if we do not have 80% agreement on
  our proposal.
  - "Stalled" is defined as:
    - We have a close time consensus
    - Each disputed transaction is individually stalled:
      - It has been in the final "stuck" 95% requirement for at least 2
        (avMIN_ROUNDS) "inner rounds" of phaseEstablish,
      - and either all of the other trusted proposers or this validator, if proposing,
        have had the same vote(s) for at least 4 (avSTALLED_ROUNDS) "inner
        rounds", and at least 80% of the validators (including this one, if
        appropriate) agree about the vote (whether yes or no).
- If we have been in the establish phase for more than 10x the previous
  consensus establish phase's time, then consensus is considered "expired",
  and we will leave the round, which sends a partial validation (indicating
  that the node is moving on without validating). Two restrictions avoid
  prematurely exiting, or having an extended exit in extreme situations.
  - The 10x time is clamped to be within a range of 15s
    (ledgerMAX_CONSENSUS) to 120s (ledgerABANDON_CONSENSUS).
  - If consensus has not had an opportunity to walk through all avalanche
    states (defined as not going through 8 "inner rounds" of phaseEstablish),
    then ConsensusState::Expired is treated as ConsensusState::No.
- When enough nodes leave the round, any remaining nodes will see they've
  fallen behind, and move on, too, generally before hitting the timeout. Any
  validations or partial validations sent during this time will help the
  consensus process bring the nodes back together.
2025-03-20 12:41:44 -04:00
Valentin Balaschenko
0d887ad815 docs: Add protobuf dependencies to linux setup instructions (#5156) 2024-10-29 16:26:20 -04:00
Ed Hennis
d9bd75e683 chore: Fix documentation generation job: (#5091)
* Add "doxygen" to list of supported branches to allow for testing and
  development.
* Add titles / H1 to some .md files that don't have them.
2024-08-15 17:03:50 -04:00
John Freeman
d9a5bca625 docs: fix broken links in docs/build/conan.md (#4699) 2024-01-23 20:35:22 -08:00
Elliot Lee
85342b21c8 chore: delete unused Dockerfile (#4791) 2023-10-31 23:00:11 -07:00
Jackson Mills
c915984340 docs(conan.md): fix broken link to conanfile.py (#4740) 2023-10-12 10:12:17 -07:00
David Fuelling
67238b9fa6 Update environment.md build doc to install lzma: (#4498)
On macOS, if you have not installed something that depends on `xz`, then your
system may lack `lzma`, resulting in a build error similar to:

```
Downloading libarchive-3.6.0.tar.xz completed [6250.61k]
libarchive/3.6.0: 
ERROR: libarchive/3.6.0: Error in source() method, line 120
        get(self, **self.conan_data["sources"][self.version], strip_root=True)
        ReadError: file could not be opened successfully:
- method gz: ReadError('not a gzip file')
- method bz2: ReadError('not a bzip2 file')
- method xz: CompressionError('lzma module is not available')
- method tar: ReadError('invalid header')
```

The solution is to ensure that `lzma` is installed by installing `xz`.
2023-04-27 10:18:59 -07:00
John Freeman
1f417764c3 Add install instructions for package managers: (#4472)
Add instructions for installing rippled using the package managers APT
and YUM. Some steps were adapted from xrpl.org.

---------

Co-authored-by: Michael Legleux <mlegleux@ripple.com>
2023-04-13 10:41:16 -07:00
John Freeman
d7725837f5 build: add interface library libxrpl: (#4449)
Make it easy for projects to depend on libxrpl by adding an `ALIAS`
target named `xrpl::libxrpl` for projects to link.

The name was chosen because:

* The current library target is named `xrpl_core`. There is no other
  "non-core" library target against which we need to distinguish the
  "core" library. We only export one library target, and it should just
  be named after the project to keep things simple and predictable.
* Underscores in target or library names are generally discouraged.
* Every target exported in CMake should be prefixed with the project
  name.

By adding an `ALIAS` target, existing consumers who use the `xrpl_core`
target will not be affected.

* In the future, there can be a migration plan to make `xrpl_core` the
  `ALIAS` target (and `libxrpl` the "real" target, which will affect the
  filename of the compiled binary), and eventually remove it entirely.

Also:

* Fix the Conan recipe so that consumers using Conan import a target
  named `xrpl::libxrpl`. This way, every consumer can use the same
  instructions.
* Document the two easiest methods to depend on libxrpl. Both have been
  tested.
* See #4443.
2023-03-22 17:21:03 -07:00
John Freeman
7745c72b2c docs: update build instructions: (#4381)
* Remove obsolete build instructions.
* By using Conan, builders can choose which dependencies specifically to
  build and link as shared objects.
* Refactor the build instructions based on the plan in #4433.
2023-03-22 12:02:42 -07:00
Elliot Lee
c77a8d5ec6 Update Docker.md (#4432)
* Add links to some related resources that may be helpful.
* Docker images can make testing easier to do.
2023-03-02 10:07:09 -08:00
Ikko Ashimine
81e7ec859d Fix typo in consensus.md
determing -> determining
2021-12-14 17:47:04 -08:00
Crypto Brad Garlinghouse
cf8438fe1d Remove obsolete URLs and references to Ripple 2021-04-01 10:38:47 -07:00
Peng Wang
7e97bfce10 Implement ledger forward replay 2021-01-25 18:49:49 -08:00
Peng Wang
706ca874b0 Implement negative UNL functionality:
This change can help improve the liveness of the network during periods of network
instability, by allowing the network to track which validators are presently not online
and to disregard them for the purposes of quorum calculations.
2020-06-30 09:15:37 -07:00
John Freeman
858e93c7f8 Improve the Doxygen Workflow
* Use official build environment Docker image
* Update documentation and Doxygen website URLs
2020-04-14 19:42:42 -07:00
John Freeman
3e9cff9287 Fix Doxygen build 2020-04-06 17:28:53 -07:00
Carl Hua
f00f263852 Set version to 1.5.0 2020-03-30 18:57:35 -04:00
Edward Hennis
e14f913244 Update TxQ developer docs:
* Rename a couple of member variables for clarity.
2018-10-01 11:26:22 -07:00
mDuo13
32ca1dd6ed Update README with XRP Ledger branding 2018-07-26 16:06:16 -07:00
Mike Ellery
37d9544ef7 Refactor/modernize our cmake:
Switch to target-oriented dependencies. Use imported targets for
dependencies (openssl, boost). Localize FindBoost to remove cmake
version dependence for latest boost support. Logically separate
"ripple-libpp" core sources and add install targets.
Add ninja build for msvc. Add two clang sanitizer builds. Misc script
changes to work with latest modernized cmake.
2018-07-20 08:58:04 -07:00
seelabs
cff1abba5d Add .clang-format rules and update code style 2018-07-16 17:49:42 -07:00
Mike Ellery
755849115e Add dev docs generation to Jenkins:
Fixes: RIPD-1521

Switch to pure doxygen HTML for developer docs. Remove docca/boostbook
system. Convert consensus document to markdown. Add existing markdown
files to doxygen input set. Fix some image paths and scale images for
use with MD links. Rename/cleanup some files for consistency.
Add pipeline logic for windows slaves. Add ninja and parallel test run
option. Add make doc target build in build-and-test.sh. Cleanup README
files. Add nounity windows build. Add link to jenkins summary table.
Add rippled_classic build (win). Improve formatting of summary table.
2018-03-02 07:37:15 -08:00
Brad Chase
94c6a2a850 Use LedgerTrie for preferred ledger (RIPD-1551):
These changes augment the Validations class with a LedgerTrie to better
track the history of support for validated ledgers. This improves the
selection of the preferred working ledger for consensus. The Validations
class now tracks both full and partial validations. Partial validations
are only used to determine the working ledger; full validations are
required for any quorum related function. Validators are also now
explicitly restricted to sending validations with increasing ledger
sequence number.
2018-02-02 20:38:38 -05:00
Brad Chase
2c13d9eb57 Redesign CSF framework (RIPD-1361):
- Separate `Scheduler` from `BasicNetwork`.
- Add an event/collector framework for monitoring invariants and calculating statistics.
- Allow distinct network and trust connections between Peers.
- Add a simple routing strategy to support broadcasting arbitrary messages.
- Add a common directed graph (`Digraph`) class for representing network and trust topologies.
- Add a `PeerGroup` class for simpler specification of the trust and network topologies.
- Add a `LedgerOracle` class to ensure distinct ledger histories and simplify branch checking.
- Add a `Submitter` to send transactions in at fixed or random intervals to fixed or random peers.

Co-authored-by: Joseph McGee
2017-12-01 14:15:04 -05:00
Brad Chase
01b4d5cdd4 Migrate thread safety to RCLConsensus (RIPD-1389):
Moves thread safety from generic Consensus to RCLConsensus and switch generic
Consensus to adaptor design.
2017-07-20 14:14:03 -04:00
Brad Chase
3dfb4a13f1 Expose consensus parameters for simulation (RIPD-1355) 2017-07-11 12:53:53 -04:00
Brad Chase
7ae3c91015 Refactor Validations (RIPD-1412,RIPD-1356):
Introduces a generic Validations class for storing and querying current and
recent validations.  Aditionally migrates the validation related timing
constants from LedgerTiming to the new Validations code.

Introduces RCLValidations as the version of Validations adapted for use in the
RCL.  This adds support for flushing/writing validations to the sqlite log and
also manages concurrent access to the Validations data.

RCLValidations::flush() no longer uses the JobQueue for its database
write at shutdown.  It performs the write directly without
changing threads.
2017-07-11 12:53:34 -04:00