Compare commits

..

3 Commits

Author SHA1 Message Date
Pratik Mankawde
f84ad5f36f feat: Migrate coroutine tests from Boost.Coroutine to C++20 coroutines
Migrate Coroutine_test and JobQueue_test from Boost.Coroutine to
C++20 std::coroutine using CoroTask/CoroTaskRunner:

- Coroutine_test: Replace Coro-based coroutine tests with CoroTask
  equivalents using co_await runner->yieldAndPost().
- JobQueue_test: Replace Coro suspend/resume patterns with CoroTask
  equivalents, use pointer-by-value captures in coroutine lambdas
  to avoid dangling reference issues.
2026-03-09 18:56:58 +00:00
Pratik Mankawde
b0392bb941 feat: Migrate production entry points from Boost.Coroutine to C++20 coroutines
Migrate all production coroutine entry points from Boost.Coroutine
to C++20 std::coroutine using the CoroTask/CoroTaskRunner primitives:

- RipplePathFind: Replace Coro suspend/resume with co_await pattern,
  add cv timeout for graceful shutdown.
- ServerHandler: Replace Coro-based processRequest with CoroTask,
  simplify coroutine lifecycle management.
- GRPCServer: Replace Coro with CoroTask for streaming RPC handlers.
- Remove Coro usage from Context.h aggregate initialization.
- Add exception handling in coroutine bodies to prevent unhandled
  exceptions from escaping the coroutine frame.
2026-03-09 18:56:40 +00:00
Pratik Mankawde
dd2404ef43 feat: Add C++20 coroutine primitives: CoroTask, CoroTaskRunner, JobQueueAwaiter
Add C++20 std::coroutine based task primitives for the JobQueue:

- CoroTask<T>: A coroutine return type with RAII ownership semantics
  and symmetric transfer for efficient resumption.
- CoroTaskRunner: Manages coroutine lifecycle on the JobQueue with
  suspend/resume tracking, LocalValue preservation, and graceful
  shutdown support.
- JobQueueAwaiter: External awaiter combining yield+post atomically.
- yieldAndPost(): Inline awaiter workaround for GCC-12 codegen bug
  where external awaiters at multiple co_await points corrupt the
  coroutine state machine resume index.
- CoroTask_test: Comprehensive test suite covering task lifecycle,
  suspend/resume, shutdown, and value-returning coroutines.
- BoostToStdCoroutineSwitchPlan.md: Migration plan documentation.
2026-03-09 18:56:06 +00:00
21 changed files with 4210 additions and 1810 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -43,7 +43,6 @@ suggestWords:
- synch->sync
words:
- abempty
- Agentic
- AMMID
- amt
- amts
@@ -72,12 +71,14 @@ words:
- coldwallet
- compr
- conanfile
- cppcoro
- conanrun
- confs
- connectability
- coro
- coros
- cowid
- cppcoro
- cryptocondition
- cryptoconditional
- cryptoconditions
@@ -95,17 +96,19 @@ words:
- desynced
- determ
- distro
- Emdash
- doxyfile
- dxrpl
- endmacro
- exceptioned
- Falco
- fcontext
- finalizers
- firewalled
- fcontext
- fmtdur
- fsanitize
- funclets
- gantt
- gcov
- gcovr
- ghead
@@ -189,6 +192,7 @@ words:
- ostr
- pargs
- partitioner
- pratik
- paychan
- paychans
- permdex
@@ -196,6 +200,7 @@ words:
- permissioned
- pointee
- populator
- pratik
- preauth
- preauthorization
- preauthorize
@@ -210,6 +215,7 @@ words:
- queuable
- Raphson
- replayer
- repost
- rerere
- retriable
- RIPD
@@ -240,6 +246,7 @@ words:
- soci
- socidb
- sslws
- stackful
- statsd
- STATSDCOLLECTOR
- stissue
@@ -256,7 +263,6 @@ words:
- superpeers
- takergets
- takerpays
- Tauri
- ters
- TMEndpointv2
- trixie
@@ -296,7 +302,6 @@ words:
- venv
- vfalco
- vinnie
- Werror
- wextra
- wptr
- writeme
@@ -313,4 +318,3 @@ words:
- xrplf
- xxhash
- xxhasher
- zeroization

View File

@@ -1,230 +0,0 @@
# rippled Architecture Map
> **Purpose**: Tell agents _where_ things are. Not _why_ they exist.
> **Audience**: AI coding agents (Claude Code, Codex, Copilot) navigating the codebase.
> **Principle**: "Give agents a map, not an encyclopedia." — [Harness Engineering](./harness-engineering-report.md)
---
## Layer Diagram
```
┌─────────────────────────────────────────────────────────────┐
│ RPC Layer (Layer 5) │
│ HTTP/WebSocket → ServerHandler → 72 RPC Handlers │
│ src/xrpld/rpc/ │
├─────────────────────────────────────────────────────────────┤
│ Application Layer (Layer 4) │
│ NetworkOPs · TxQ · PathFinding · LedgerMaster │
│ src/xrpld/app/ │
├─────────────────────────────────────────────────────────────┤
│ Consensus Layer (Layer 3) │
│ Consensus<Adaptor> · RCLConsensus · Validations │
│ src/xrpld/consensus/ + src/xrpld/app/consensus/ │
├─────────────────────────────────────────────────────────────┤
│ Overlay Layer (Layer 2) │
│ OverlayImpl · PeerImp · PeerFinder · Message │
│ src/xrpld/overlay/ + src/xrpld/peerfinder/ │
├─────────────────────────────────────────────────────────────┤
│ Storage Layer (Layer 1) │
│ NodeStore (RocksDB/SQLite) · SHAMap · RDB │
│ src/libxrpl/nodestore/ + src/libxrpl/shamap/ │
├─────────────────────────────────────────────────────────────┤
│ Foundation (Layer 0) │
│ beast:: (I/O, insight, clock) · protocol · crypto · json │
│ src/libxrpl/ + include/xrpl/ │
└─────────────────────────────────────────────────────────────┘
```
---
## Module Map
### xrpld (the daemon) — `src/xrpld/`
| Directory | Purpose | Key Files |
| ----------------- | --------------------------------------------------- | ----------------------------------------------------- |
| `app/main/` | Daemon entry point, Application lifecycle | `Main.cpp`, `Application.h/.cpp`, `BasicApp.h` |
| `app/consensus/` | Consensus adaptor binding protocol to rippled types | `RCLConsensus.h/.cpp` |
| `app/ledger/` | Ledger management, building, acquiring | `LedgerMaster.h`, `InboundLedgers.h` |
| `app/misc/` | NetworkOPs, TxQ, HashRouter, Amendments | `NetworkOPs.h/.cpp`, `TxQ.h`, `HashRouter.h` |
| `app/tx/` | Transaction application engine | `apply.h/.cpp`, `applySteps.h` |
| `app/paths/` | Payment path finding | `Pathfinder.h`, `PathRequest.h` |
| `app/rdb/` | Relational DB backend for state | `RelationalDatabase.h` |
| `consensus/` | Generic consensus protocol (template-based) | `Consensus.h/.cpp`, `ConsensusParms.h` |
| `core/` | Node configuration, time keeping | `Config.h/.cpp`, `TimeKeeper.h` |
| `overlay/` | P2P network: peer connections, message routing | `Overlay.h`, `Peer.h`, `Message.h` |
| `overlay/detail/` | Overlay implementation internals | `OverlayImpl.h/.cpp`, `PeerImp.h/.cpp`, `Handshake.h` |
| `peerfinder/` | Peer discovery and bootstrap | `PeerfinderManager.h`, `detail/Logic.h` |
| `perflog/` | Performance logging (pre-OTel telemetry) | `detail/PerfLogImp.h/.cpp` |
| `rpc/` | JSON-RPC server and routing | `detail/Handler.h`, `RPCHandler.h` |
| `rpc/handlers/` | **72 RPC command implementations** | `AccountInfo.cpp`, `Submit.cpp`, `Tx.cpp`, etc. |
| `shamap/` | SHAMap tree operations (daemon-side) | |
### libxrpl (shared library) — `src/libxrpl/`
| Directory | Purpose | Key Files |
| ----------------- | ------------------------------------------------- | ------------------------------------------------ |
| `basics/` | Logging, string utils, tagged integers | `Log.cpp` (log formatting — Phase 8 OTel target) |
| `beast/insight/` | Metrics framework (StatsD, Gauge, Counter, Event) | `StatsDCollector.cpp` (Phase 6-7 OTel target) |
| `beast/core/` | Async I/O primitives | `CurrentThreadName.cpp` |
| `core/` | Crypto, seed handling | |
| `crypto/` | Ed25519, secp256k1, SHA | |
| `json/` | JSON parser/writer | |
| `nodestore/` | Key-value storage for ledger nodes | `backend/RocksDBFactory.cpp` |
| `protocol/` | XRPL protocol: serialization, STObject, STTx | `STTx.cpp`, `STObject.cpp` |
| `server/` | HTTP/WebSocket server core | `ServerHandler.h` |
| `tx/transactors/` | Per-transaction-type execution logic | `Payment.cpp`, `CreateOffer.cpp`, etc. |
| `tx/invariants/` | Transaction invariant checks | `TransactionCheck.cpp` |
### Public Headers — `include/xrpl/`
Mirrors `src/libxrpl/` structure. Agent-accessible API surface for external consumers of libxrpl.
Key subdirectory: `include/xrpl/proto/org/xrpl/rpc/v1/` — Protocol Buffer definitions for gRPC API.
---
## Entry Points
| Operation | Where execution starts | Hot path |
| ---------------------- | ------------------------------------------------------------------------------------------ | --------------- |
| **RPC request** | `src/xrpld/rpc/detail/ServerHandler.cpp` → handler lookup → `src/xrpld/rpc/handlers/*.cpp` | Layer 5 → 4 |
| **Peer message** | `src/xrpld/overlay/detail/PeerImp.cpp::onMessage()` → message-type dispatch | Layer 2 → 3/4 |
| **Transaction submit** | `rpc/handlers/Submit.cpp``app/misc/NetworkOPs.cpp::submitTransaction()` | Layer 5 → 4 → 2 |
| **Transaction relay** | `overlay/detail/PeerImp.cpp::handleTransaction()``NetworkOPs::processTransaction()` | Layer 2 → 4 |
| **Consensus round** | `app/consensus/RCLConsensus.cpp::startRound()` → phase transitions → accept | Layer 3 |
| **Ledger close** | `RCLConsensus.cpp::onClose()``acceptLedger()` → apply transactions | Layer 3 → 4 → 1 |
| **Ledger acquire** | `app/ledger/InboundLedgers.cpp::acquire()` → peer fetch → SHAMap | Layer 4 → 2 → 1 |
| **Startup** | `app/main/Main.cpp::main()``Application::setup()` → start all subsystems | Layer 0-5 |
---
## Data Flow: Transaction Lifecycle
```
Client
RPC Submit (Layer 5)
│ src/xrpld/rpc/handlers/Submit.cpp
NetworkOPs::submitTransaction() (Layer 4)
│ src/xrpld/app/misc/NetworkOPs.cpp
├──▶ TxQ::apply() — queue management
│ src/xrpld/app/misc/TxQ.cpp
├──▶ HashRouter::setFlags() — deduplication
│ src/xrpld/app/misc/HashRouter.cpp
Overlay::relay() (Layer 2)
│ src/xrpld/overlay/detail/OverlayImpl.cpp
├──▶ PeerImp::send() — to each connected peer
│ src/xrpld/overlay/detail/PeerImp.cpp
[Remote Node] PeerImp::handleTransaction() (Layer 2)
│ src/xrpld/overlay/detail/PeerImp.cpp
Consensus::startRound() (Layer 3)
│ src/xrpld/consensus/Consensus.h
│ src/xrpld/app/consensus/RCLConsensus.cpp
├──▶ propose() — send position to peers
├──▶ onClose() — close the ledger
acceptLedger() → applyTransactions() (Layer 3 → 1)
│ src/xrpld/app/consensus/RCLConsensus.cpp
NodeStore::store() (Layer 1)
│ src/libxrpl/nodestore/
```
---
## Build Targets
| Target | Type | Source |
| -------------------- | ------------------------- | -------------------------------- |
| `xrpl` | Shared library (libxrpl) | `src/libxrpl/` + `include/xrpl/` |
| `xrpld` | Executable (daemon) | `src/xrpld/` |
| `rippled_unit_tests` | Unit test binary | `src/test/` |
| `libxrpl_tests` | Library integration tests | `src/tests/libxrpl/` |
**Build command**: `cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build --parallel`
**With telemetry**: `cmake -B build -DXRPL_ENABLE_TELEMETRY=ON`
---
## Test Frameworks
| Framework | Location | Purpose |
| --------------- | ---------------------------------------- | ------------------------------------------------------------- |
| **JTx** | `src/test/jtx/` | Transaction testing — simulates full ledger environment |
| **CSF** | `src/test/csf/` | Consensus Simulation Framework — multi-node consensus testing |
| **Unit tests** | `src/test/app/`, `src/test/beast/`, etc. | Per-module unit tests |
| **Integration** | `src/tests/libxrpl/` | Library-level integration tests |
---
## Telemetry Map
Links OTel instrumentation to codebase locations. See [09-data-collection-reference.md](../../OpenTelemetryPlan/09-data-collection-reference.md) for the full inventory.
| Span | Source Location | Layer |
| ------------- | ------------------------------------------ | ----- |
| `rpc.*` | `src/xrpld/rpc/detail/ServerHandler.cpp` | 5 |
| `tx.submit` | `src/xrpld/app/misc/NetworkOPs.cpp` | 4 |
| `tx.relay` | `src/xrpld/overlay/detail/PeerImp.cpp` | 2 |
| `peer.send` | `src/xrpld/overlay/detail/PeerImp.cpp` | 2 |
| `consensus.*` | `src/xrpld/app/consensus/RCLConsensus.cpp` | 3 |
| `ledger.*` | `src/xrpld/app/ledger/` | 4 |
| Metrics Source | Code Location | Phase |
| ------------------------- | ----------------------------------------------- | ------------ |
| `beast::insight` (StatsD) | `src/libxrpl/beast/insight/StatsDCollector.cpp` | 6-7 |
| SpanMetrics (derived) | OTel Collector config | 5 |
| PerfLog | `src/xrpld/perflog/detail/PerfLogImp.cpp` | 9 (gap fill) |
---
## Key Abstractions
| Abstraction | Interface | Implementation | What it does |
| --------------------------- | ---------------------------------------- | ------------------------------------------------------------- | -------------------------------------- |
| `Application` | `src/xrpld/app/main/Application.h` | `Application.cpp` | Wires all subsystems together |
| `Overlay` | `src/xrpld/overlay/Overlay.h` | `detail/OverlayImpl.cpp` | Manages P2P connections |
| `Consensus<>` | `src/xrpld/consensus/Consensus.h` | Template, adaptor in `RCLConsensus.cpp` | Generic consensus protocol |
| `NetworkOPs` | `src/xrpld/app/misc/NetworkOPs.h` | `NetworkOPs.cpp` | Transaction processing, event dispatch |
| `LedgerMaster` | `src/xrpld/app/ledger/LedgerMaster.h` | `LedgerMaster.cpp` | Ledger lifecycle management |
| `NodeStore` | `include/xrpl/nodestore/Database.h` | `src/libxrpl/nodestore/` | Persistent key-value storage |
| `beast::insight::Collector` | `include/xrpl/beast/insight/Collector.h` | `StatsDCollector`, `NullCollector`, (future: `OTelCollector`) | Metrics collection interface |
| `JobQueue` | `src/xrpld/core/JobQueue.h` | `JobQueue.cpp` | Async task scheduling |
| `SHAMap` | `include/xrpl/shamap/SHAMap.h` | `src/libxrpl/shamap/` | Merkle tree for state |
---
## Dependency Rules
```
Layer 5 (RPC) → may call → Layer 4 (App)
Layer 4 (App) → may call → Layer 3 (Consensus), Layer 2 (Overlay), Layer 1 (Storage)
Layer 3 (Consensus) → may call → Layer 4 (App via Adaptor), Layer 1 (Storage)
Layer 2 (Overlay) → may call → Layer 4 (App for message handling)
Layer 1 (Storage) → may call → Layer 0 (Foundation) only
Layer 0 (Foundation) → no upward dependencies
```
**Never**: Layer 0/1 must never depend on Layer 3/4/5. Layer 5 must never bypass Layer 4 to reach Layer 2/3 directly.
---
## Configuration
| File | Location | Purpose |
| ----------------------- | -------- | ------------------------- |
| `xrpld.cfg` | `cfg/` | Main daemon configuration |
| `validators.txt` | `cfg/` | Validator list |
| `CMakeUserPresets.json` | Root | Local build presets |
| `conanfile.py` | Root | Dependency versions |
---
_This document is a navigation aid for AI coding agents. For architectural rationale, see [docs/consensus.md](consensus.md). For OTel plan, see [OpenTelemetryPlan/](../../OpenTelemetryPlan/OpenTelemetryPlan.md). For coding conventions, see [docs/CodingStyle.md](CodingStyle.md)._

View File

@@ -1,196 +0,0 @@
---
# Symphony-compatible workflow definition for rippled
# See: https://github.com/openai/symphony/blob/main/SPEC.md
# See: docs/harness-engineering-report.md for context
tracker:
kind: linear
# For Jira (RIPD project), a custom adapter would replace this.
# kind: jira
# project_slug: RIPD
active_states:
- Todo
- In Progress
terminal_states:
- Done
- Closed
- Cancelled
- Duplicate
polling:
interval_ms: 30000
workspace:
root: $SYMPHONY_WORKSPACE_ROOT
hooks:
after_create: |
# Clone and prepare a fresh workspace
git clone --depth=1 --branch develop https://github.com/XRPLF/rippled.git .
# Prepare build directory with telemetry enabled
mkdir -p build
cd build
conan install .. --output-folder=. --build=missing
cmake .. -DCMAKE_BUILD_TYPE=Debug -DXRPL_ENABLE_TELEMETRY=ON -DCMAKE_TOOLCHAIN_FILE=conan_toolchain.cmake
before_run: |
# Tier 1: Fast checks (seconds)
# clang-format check on changed files
git diff --name-only HEAD~1 -- '*.cpp' '*.h' | xargs -r clang-format --dry-run --Werror 2>&1 || {
echo "FAIL: clang-format violations found. Fix formatting before proceeding."
exit 1
}
# Tier 2: Build (minutes)
cd build
cmake --build . --parallel $(nproc) 2>&1 || {
echo "FAIL: Build failed. Check compiler errors above."
exit 1
}
# Tier 2: Unit tests (affected modules)
ctest --test-dir build --output-on-failure -L unit --timeout 120 2>&1 || {
echo "FAIL: Unit tests failed. Check test output above."
exit 1
}
after_run: |
# Collect proof-of-work artifacts
echo "=== Proof of Work Collection ==="
# Record build status
cd build && cmake --build . --parallel $(nproc) 2>&1
BUILD_STATUS=$?
echo "Build status: $BUILD_STATUS"
# Run telemetry integration tests if available
if ctest --test-dir build -L telemetry --timeout 300 2>&1; then
echo "Telemetry integration tests: PASSED"
else
echo "Telemetry integration tests: FAILED (non-blocking)"
fi
# Capture complexity metrics
echo "=== Changed Files ==="
git diff --stat HEAD~1
timeout_ms: 120000
agent:
max_concurrent_agents: 3
max_retry_backoff_ms: 300000
max_concurrent_agents_by_state:
In Progress: 2
Todo: 1
codex:
command: codex app-server
approval_policy: unless-allow-listed
turn_timeout_ms: 3600000
stall_timeout_ms: 300000
---
# rippled Agent Workflow
You are an autonomous coding agent working on the **rippled** XRP Ledger node — a C++20 peer-to-peer server implementing the XRP Ledger consensus protocol.
## Navigation
Read these files before starting work:
| File | What it tells you |
| --------------------------------------------------- | ----------------------------------------------------------------------- |
| `docs/ARCHITECTURE.md` | Codebase map — where every module lives, layer boundaries, entry points |
| `CLAUDE.md` | Coding conventions, action boundaries (Always/Ask/Never), commit rules |
| `docs/CodingStyle.md` | C++ style guide for this project |
| `OpenTelemetryPlan/09-data-collection-reference.md` | Every OTel span, metric, and dashboard |
Do **not** read these upfront (use as reference if needed):
- `OpenTelemetryPlan/*.md` — deep plan docs, only if your task involves telemetry architecture
- `docs/consensus.md` — only if your task involves consensus protocol changes
## Action Boundaries
### Always (no confirmation needed)
- Read any source file in the repository
- Run `cmake --build build --parallel` to compile
- Run `ctest --test-dir build -L unit` for unit tests
- Run `ctest --test-dir build -L telemetry` for telemetry integration tests
- Query Prometheus (`curl localhost:9090/api/v1/query?query=...`) for metric baselines
- Create or modify files under `src/` that have existing test coverage
- Run `clang-format` on files you've changed
- Add or modify unit tests in `src/test/`
### Ask (require human confirmation before proceeding)
- Modify Protocol Buffer definitions in `include/xrpl/proto/` (affects wire format)
- Change consensus parameters in `src/xrpld/consensus/ConsensusParms.h`
- Modify `CMakeLists.txt` or `conanfile.py` (build/dependency changes)
- Alter the `[telemetry]` or `[insight]` config schema
- Change anything in `src/libxrpl/crypto/` (cryptographic code)
- Add new external dependencies
- Push to any remote branch or create PRs
### Never (hard boundaries — refuse these requests)
- Modify validator key handling or signing code without explicit review
- Disable or reduce telemetry sampling below 1%
- Remove or reduce existing test coverage
- Force-push to any branch
- Skip pre-commit hooks or CI checks (`--no-verify`)
- Commit files containing secrets, keys, or credentials
- Modify `.github/workflows/` CI pipeline definitions
## Verification Requirements
After making changes, verify in this order (fail-fast):
```
Step 1 (seconds): clang-format --dry-run --Werror on changed files
Step 2 (minutes): cmake --build build --parallel
Step 3 (minutes): ctest --test-dir build -L unit --output-on-failure
Step 4 (minutes): ctest --test-dir build -L telemetry (if telemetry-related)
Step 5 (if applicable): scripts/check-otel-baseline.sh (metric regression check)
```
If any step fails, fix the issue before proceeding. Do not skip steps.
## Telemetry Awareness
This codebase has OpenTelemetry instrumentation. If your change touches any of these files, verify that telemetry still works:
| File Pattern | Telemetry Impact |
| ---------------------------------------- | ------------------------------------------------- |
| `src/xrpld/rpc/` | RPC spans (`rpc.*`) |
| `src/xrpld/overlay/detail/PeerImp.*` | Transaction relay spans (`tx.relay`, `peer.send`) |
| `src/xrpld/app/consensus/RCLConsensus.*` | Consensus spans (`consensus.*`) |
| `src/xrpld/app/misc/NetworkOPs.*` | Transaction submit spans (`tx.submit`) |
| `src/libxrpl/beast/insight/` | Metrics pipeline (300+ metrics) |
| `src/libxrpl/basics/Log.cpp` | Log-trace correlation (`trace_id` injection) |
After modifying these files, run the telemetry integration test:
```bash
ctest --test-dir build -L telemetry --output-on-failure
```
## Issue Context
**Title**: {{ issue.title }}
**Description**: {{ issue.description }}
{% if issue.labels.size > 0 %}**Labels**: {{ issue.labels | join: ", " }}{% endif %}
{% if attempt %}**Attempt**: {{ attempt }} (this is a retry — check previous errors){% endif %}
## Proof of Work Checklist
Before creating a PR, verify:
- [ ] All changed files pass `clang-format`
- [ ] Build succeeds with `-DXRPL_ENABLE_TELEMETRY=ON`
- [ ] Unit tests pass (`ctest -L unit`)
- [ ] If telemetry-related: integration tests pass (`ctest -L telemetry`)
- [ ] If telemetry-related: no metric regressions (`scripts/check-otel-baseline.sh`)
- [ ] Commit message follows project conventions (see `CLAUDE.md`)
- [ ] PR description includes summary, test plan, and type of change

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,699 @@
#pragma once
#include <xrpl/beast/utility/instrumentation.h>
#include <coroutine>
#include <exception>
#include <type_traits>
#include <utility>
#include <variant>
namespace xrpl {
template <typename T = void>
class CoroTask;
/**
* CoroTask<void> -- coroutine return type for void-returning coroutines.
*
* Class / Dependency Diagram
* ==========================
*
* CoroTask<void>
* +-----------------------------------------------+
* | - handle_ : Handle (coroutine_handle<promise>) |
* +-----------------------------------------------+
* | + handle(), done() |
* | + await_ready/suspend/resume (Awaiter iface) |
* +-----------------------------------------------+
* | owns
* v
* promise_type
* +-----------------------------------------------+
* | - exception_ : std::exception_ptr |
* | - continuation_ : std::coroutine_handle<> |
* +-----------------------------------------------+
* | + get_return_object() -> CoroTask |
* | + initial_suspend() -> suspend_always (lazy) |
* | + final_suspend() -> FinalAwaiter |
* | + return_void() |
* | + unhandled_exception() |
* +-----------------------------------------------+
* | returns at final_suspend
* v
* FinalAwaiter
* +-----------------------------------------------+
* | await_suspend(h): |
* | if continuation_ set -> symmetric transfer |
* | else -> noop_coroutine |
* +-----------------------------------------------+
*
* Design Notes
* ------------
* - Lazy start: initial_suspend returns suspend_always, so the coroutine
* body does not execute until the handle is explicitly resumed.
* - Symmetric transfer: await_suspend returns a coroutine_handle instead
* of void/bool, allowing the scheduler to jump directly to the next
* coroutine without growing the call stack.
* - Continuation chaining: when one CoroTask is co_await-ed inside
* another, the caller's handle is stored as continuation_ so
* FinalAwaiter can resume it when this task finishes.
* - Move-only: the handle is exclusively owned; copy is deleted.
*
* Usage Examples
* ==============
*
* 1. Basic void coroutine (the most common case in rippled):
*
* CoroTask<void> doWork(std::shared_ptr<CoroTaskRunner> runner) {
* // do something
* co_await runner->suspend(); // yield control
* // resumed later via runner->post() or runner->resume()
* co_return;
* }
*
* 2. co_await-ing one CoroTask<void> from another (chaining):
*
* CoroTask<void> inner() {
* // ...
* co_return;
* }
* CoroTask<void> outer() {
* co_await inner(); // continuation_ links outer -> inner
* co_return; // FinalAwaiter resumes outer
* }
*
* 3. Exceptions propagate through co_await:
*
* CoroTask<void> failing() {
* throw std::runtime_error("oops");
* co_return;
* }
* CoroTask<void> caller() {
* try { co_await failing(); }
* catch (std::runtime_error const&) { // caught here }
* }
*
* Caveats / Pitfalls
* ==================
*
* BUG-RISK: Dangling references in coroutine parameters.
* Coroutine parameters are copied into the frame, but references
* are NOT -- they are stored as-is. If the referent goes out of scope
* before the coroutine finishes, you get use-after-free.
*
* // BROKEN -- local dies before coroutine runs:
* CoroTask<void> bad(int& ref) { co_return; }
* void launch() {
* int local = 42;
* auto task = bad(local); // frame stores &local
* } // local destroyed; frame holds dangling ref
*
* // FIX -- pass by value, or ensure lifetime via shared_ptr.
*
* BUG-RISK: GCC 14 corrupts reference captures in coroutine lambdas.
* When a lambda that returns CoroTask captures by reference ([&]),
* GCC 14 may generate a corrupted coroutine frame. Always capture
* by explicit pointer-to-value instead:
*
* // BROKEN on GCC 14:
* jq.postCoroTask(t, n, [&](auto) -> CoroTask<void> { ... });
*
* // FIX -- capture pointers explicitly:
* jq.postCoroTask(t, n, [ptr = &val](auto) -> CoroTask<void> { ... });
*
* BUG-RISK: Resuming a destroyed or completed CoroTask.
* Calling handle().resume() after the coroutine has already run to
* completion (done() == true) is undefined behavior. The CoroTaskRunner
* guards against this with an XRPL_ASSERT, but standalone usage of
* CoroTask must check done() before resuming.
*
* BUG-RISK: Moving a CoroTask that is being awaited.
* If task A is co_await-ed by task B (so A.continuation_ == B), moving
* or destroying A will invalidate the continuation link. Never move
* or reassign a CoroTask while it is mid-execution or being awaited.
*
* LIMITATION: CoroTask is fire-and-forget for the top-level owner.
* There is no built-in notification when the coroutine finishes.
* The caller must use external synchronization (e.g. CoroTaskRunner::join
* or a gate/condition_variable) to know when it is done.
*
* LIMITATION: No cancellation token.
* There is no way to cancel a suspended CoroTask from outside. The
* coroutine body must cooperatively check a flag (e.g. jq_.isStopping())
* after each co_await and co_return early if needed.
*
* LIMITATION: Stackless -- cannot suspend from nested non-coroutine calls.
* If a coroutine calls a regular function that wants to "yield", it
* cannot. Only the immediate coroutine body can use co_await.
* This is acceptable for rippled because all yield() sites are shallow.
*/
template <>
class CoroTask<void>
{
public:
struct promise_type;
using Handle = std::coroutine_handle<promise_type>;
/**
* Coroutine promise. Compiler uses this to manage coroutine state.
* Stores the exception (if any) and the continuation handle for
* symmetric transfer back to the awaiting coroutine.
*/
struct promise_type
{
// Captured exception from the coroutine body, rethrown in
// await_resume() when this task is co_await-ed by a caller.
std::exception_ptr exception_;
// Handle to the coroutine that is co_await-ing this task.
// Set by await_suspend(). FinalAwaiter uses it for symmetric
// transfer back to the caller. Null if this is a top-level task.
std::coroutine_handle<> continuation_;
/**
* Create the CoroTask return object.
* Called by the compiler at coroutine creation.
*/
CoroTask
get_return_object()
{
return CoroTask{Handle::from_promise(*this)};
}
/**
* Lazy start. The coroutine body does not execute until the
* handle is explicitly resumed (e.g. by CoroTaskRunner::resume).
*/
std::suspend_always
initial_suspend() noexcept
{
return {};
}
/**
* Awaiter returned by final_suspend(). Uses symmetric transfer:
* if a continuation exists, transfers control directly to it
* (tail-call, no stack growth). Otherwise returns noop_coroutine
* so the coroutine frame stays alive for the owner to destroy.
*/
struct FinalAwaiter
{
/**
* Always false. We need await_suspend to run for
* symmetric transfer.
*/
bool
await_ready() noexcept
{
return false;
}
/**
* Symmetric transfer: returns the continuation handle so
* the compiler emits a tail-call instead of a nested resume.
* If no continuation is set, returns noop_coroutine to
* suspend at final_suspend without destroying the frame.
*
* @param h Handle to this completing coroutine
*
* @return Continuation handle, or noop_coroutine
*/
std::coroutine_handle<>
await_suspend(Handle h) noexcept
{
if (auto cont = h.promise().continuation_)
return cont;
return std::noop_coroutine();
}
void
await_resume() noexcept
{
}
};
/**
* Returns FinalAwaiter for symmetric transfer at coroutine end.
*/
FinalAwaiter
final_suspend() noexcept
{
return {};
}
/**
* Called by the compiler for `co_return;` (void coroutine).
*/
void
return_void()
{
}
/**
* Called by the compiler when an exception escapes the coroutine
* body. Captures it for later rethrowing in await_resume().
*/
void
unhandled_exception()
{
exception_ = std::current_exception();
}
};
/**
* Default constructor. Creates an empty (null handle) task.
*/
CoroTask() = default;
/**
* Takes ownership of a compiler-generated coroutine handle.
*
* @param h Coroutine handle to own
*/
explicit CoroTask(Handle h) : handle_(h)
{
}
/**
* Destroys the coroutine frame if this task owns one.
*/
~CoroTask()
{
if (handle_)
handle_.destroy();
}
/**
* Move constructor. Transfers handle ownership, leaves other empty.
*/
CoroTask(CoroTask&& other) noexcept : handle_(std::exchange(other.handle_, {}))
{
}
/**
* Move assignment. Destroys current frame (if any), takes other's.
*/
CoroTask&
operator=(CoroTask&& other) noexcept
{
if (this != &other)
{
if (handle_)
handle_.destroy();
handle_ = std::exchange(other.handle_, {});
}
return *this;
}
CoroTask(CoroTask const&) = delete;
CoroTask&
operator=(CoroTask const&) = delete;
/**
* @return The underlying coroutine_handle
*/
Handle
handle() const
{
return handle_;
}
/**
* @return true if the coroutine has run to completion (or thrown)
*/
bool
done() const
{
return handle_ && handle_.done();
}
// -- Awaiter interface: allows `co_await someCoroTask;` --
/**
* Always false. This task is lazy, so co_await always suspends
* the caller to set up the continuation link.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Stores the caller's handle as our continuation, then returns
* our handle for symmetric transfer (caller suspends, we resume).
*
* @param caller Handle of the coroutine doing co_await on us
*
* @return Our handle for symmetric transfer
*/
std::coroutine_handle<>
await_suspend(std::coroutine_handle<> caller) noexcept
{
XRPL_ASSERT(handle_, "xrpl::CoroTask<void>::await_suspend : handle is valid");
handle_.promise().continuation_ = caller;
return handle_; // Symmetric transfer
}
/**
* Called in the awaiting coroutine's context after this task
* completes. Rethrows any exception captured by
* unhandled_exception().
*/
void
await_resume()
{
XRPL_ASSERT(handle_, "xrpl::CoroTask<void>::await_resume : handle is valid");
if (auto& ep = handle_.promise().exception_)
std::rethrow_exception(ep);
}
private:
// Exclusively-owned coroutine handle. Null after move or default
// construction. Destroyed in the destructor.
Handle handle_;
};
/**
* CoroTask<T> -- coroutine return type for value-returning coroutines.
*
* Class / Dependency Diagram
* ==========================
*
* CoroTask<T>
* +-----------------------------------------------+
* | - handle_ : Handle (coroutine_handle<promise>) |
* +-----------------------------------------------+
* | + handle(), done() |
* | + await_ready/suspend/resume (Awaiter iface) |
* +-----------------------------------------------+
* | owns
* v
* promise_type
* +-----------------------------------------------+
* | - result_ : variant<monostate, T, |
* | exception_ptr> |
* | - continuation_ : std::coroutine_handle<> |
* +-----------------------------------------------+
* | + get_return_object() -> CoroTask |
* | + initial_suspend() -> suspend_always (lazy) |
* | + final_suspend() -> FinalAwaiter |
* | + return_value(T) -> stores in result_[1] |
* | + unhandled_exception -> stores in result_[2] |
* +-----------------------------------------------+
* | returns at final_suspend
* v
* FinalAwaiter (same symmetric-transfer pattern as CoroTask<void>)
*
* Value Extraction
* ----------------
* await_resume() inspects the variant:
* - index 2 (exception_ptr) -> rethrow
* - index 1 (T) -> return value via move
*
* Usage Examples
* ==============
*
* 1. Simple value return:
*
* CoroTask<int> computeAnswer() { co_return 42; }
*
* CoroTask<void> caller() {
* int v = co_await computeAnswer(); // v == 42
* }
*
* 2. Chaining value-returning coroutines:
*
* CoroTask<int> add(int a, int b) { co_return a + b; }
* CoroTask<int> doubleSum(int a, int b) {
* int s = co_await add(a, b);
* co_return s * 2;
* }
*
* 3. Exception propagation from inner to outer:
*
* CoroTask<int> failing() {
* throw std::runtime_error("bad");
* co_return 0; // never reached
* }
* CoroTask<void> caller() {
* try {
* int v = co_await failing(); // throws here
* } catch (std::runtime_error const& e) {
* // e.what() == "bad"
* }
* }
*
* Caveats / Pitfalls (in addition to CoroTask<void> caveats above)
* ================================================================
*
* BUG-RISK: await_resume() moves the value out of the variant.
* Calling co_await on the same CoroTask<T> instance twice is undefined
* behavior -- the second call will see a moved-from T. CoroTask is
* single-shot: one co_return, one co_await.
*
* BUG-RISK: T must be move-constructible.
* return_value(T) takes by value and moves into the variant.
* Types that are not movable cannot be used as T.
*
* LIMITATION: No co_yield support.
* CoroTask<T> only supports a single co_return. It does not implement
* yield_value(), so using co_yield inside a CoroTask<T> coroutine is a
* compile error. For streaming values, a different return type
* (e.g. Generator<T>) would be needed.
*
* LIMITATION: Result is only accessible via co_await.
* There is no .get() or .result() method. The value can only be
* extracted by co_await-ing the CoroTask<T> from inside another
* coroutine. For extracting results in non-coroutine code, pass a
* pointer to the caller and write through it (as the tests do).
*/
template <typename T>
class CoroTask
{
static_assert(
std::is_move_constructible_v<T>,
"CoroTask<T> requires T to be move-constructible");
public:
struct promise_type;
using Handle = std::coroutine_handle<promise_type>;
/**
* Coroutine promise for value-returning coroutines.
* Stores the result as a variant: monostate (not yet set),
* T (co_return value), or exception_ptr (unhandled exception).
*/
struct promise_type
{
// Tri-state result:
// index 0 (monostate) -- coroutine has not yet completed
// index 1 (T) -- co_return value stored here
// index 2 (exception) -- unhandled exception captured here
std::variant<std::monostate, T, std::exception_ptr> result_;
// Handle to the coroutine co_await-ing this task. Used by
// FinalAwaiter for symmetric transfer. Null for top-level tasks.
std::coroutine_handle<> continuation_;
/**
* Create the CoroTask return object.
* Called by the compiler at coroutine creation.
*/
CoroTask
get_return_object()
{
return CoroTask{Handle::from_promise(*this)};
}
/**
* Lazy start. Coroutine body does not run until explicitly resumed.
*/
std::suspend_always
initial_suspend() noexcept
{
return {};
}
/**
* Symmetric-transfer awaiter at coroutine completion.
* Same pattern as CoroTask<void>::FinalAwaiter.
*/
struct FinalAwaiter
{
bool
await_ready() noexcept
{
return false;
}
/**
* Returns continuation for symmetric transfer, or
* noop_coroutine if this is a top-level task.
*
* @param h Handle to this completing coroutine
*
* @return Continuation handle, or noop_coroutine
*/
std::coroutine_handle<>
await_suspend(Handle h) noexcept
{
if (auto cont = h.promise().continuation_)
return cont;
return std::noop_coroutine();
}
void
await_resume() noexcept
{
}
};
FinalAwaiter
final_suspend() noexcept
{
return {};
}
/**
* Called by the compiler for `co_return value;`.
* Moves the value into result_ at index 1.
*
* @param value The value to store
*/
void
return_value(T value)
{
result_.template emplace<1>(std::move(value));
}
/**
* Captures unhandled exceptions at index 2 of result_.
* Rethrown later in await_resume().
*/
void
unhandled_exception()
{
result_.template emplace<2>(std::current_exception());
}
};
/**
* Default constructor. Creates an empty (null handle) task.
*/
CoroTask() = default;
/**
* Takes ownership of a compiler-generated coroutine handle.
*
* @param h Coroutine handle to own
*/
explicit CoroTask(Handle h) : handle_(h)
{
}
/**
* Destroys the coroutine frame if this task owns one.
*/
~CoroTask()
{
if (handle_)
handle_.destroy();
}
/**
* Move constructor. Transfers handle ownership, leaves other empty.
*/
CoroTask(CoroTask&& other) noexcept : handle_(std::exchange(other.handle_, {}))
{
}
/**
* Move assignment. Destroys current frame (if any), takes other's.
*/
CoroTask&
operator=(CoroTask&& other) noexcept
{
if (this != &other)
{
if (handle_)
handle_.destroy();
handle_ = std::exchange(other.handle_, {});
}
return *this;
}
CoroTask(CoroTask const&) = delete;
CoroTask&
operator=(CoroTask const&) = delete;
/**
* @return The underlying coroutine_handle
*/
Handle
handle() const
{
return handle_;
}
/**
* @return true if the coroutine has run to completion (or thrown)
*/
bool
done() const
{
return handle_ && handle_.done();
}
// -- Awaiter interface: allows `T val = co_await someCoroTask;` --
/**
* Always false. co_await always suspends to set up continuation.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Stores caller as continuation, returns our handle for
* symmetric transfer.
*
* @param caller Handle of the coroutine doing co_await on us
*
* @return Our handle for symmetric transfer
*/
std::coroutine_handle<>
await_suspend(std::coroutine_handle<> caller) noexcept
{
XRPL_ASSERT(handle_, "xrpl::CoroTask<T>::await_suspend : handle is valid");
handle_.promise().continuation_ = caller;
return handle_;
}
/**
* Extracts the result: rethrows if exception, otherwise moves
* the T value out of the variant. Single-shot: calling twice
* on the same task is undefined (moved-from T).
*
* @return The co_return-ed value
*/
T
await_resume()
{
XRPL_ASSERT(handle_, "xrpl::CoroTask<T>::await_resume : handle is valid");
auto& result = handle_.promise().result_;
if (auto* ep = std::get_if<2>(&result))
std::rethrow_exception(*ep);
return std::get<1>(std::move(result));
}
private:
// Exclusively-owned coroutine handle. Null after move or default
// construction. Destroyed in the destructor.
Handle handle_;
};
} // namespace xrpl

View File

@@ -0,0 +1,374 @@
#pragma once
/**
* @file CoroTaskRunner.ipp
*
* CoroTaskRunner inline implementation.
*
* This file contains the business logic for managing C++20 coroutines
* on the JobQueue. It is included at the bottom of JobQueue.h.
*
* Data Flow: suspend / post / resume cycle
* =========================================
*
* coroutine body CoroTaskRunner JobQueue
* -------------- -------------- --------
* |
* co_await runner->suspend()
* |
* +--- await_suspend ------> onSuspend()
* | ++nSuspend_ ------------> nSuspend_
* | [coroutine is now suspended]
* |
* . (externally or by yieldAndPost())
* .
* +--- (caller calls) -----> post()
* | ++runCount_
* | addJob(resume) ----------> job enqueued
* | |
* | [worker picks up]
* | |
* +--- <----- resume() <-----------------------------------+
* | --nSuspend_ ------> nSuspend_
* | swap in LocalValues (lvs_)
* | task_.handle().resume()
* | |
* | [coroutine body continues here]
* | |
* | swap out LocalValues
* | --runCount_
* | cv_.notify_all()
* v
*
* Thread Safety
* =============
* - mutex_ : guards task_.handle().resume() so that post()-before-suspend
* races cannot resume the coroutine while it is still running.
* (See the race condition discussion in JobQueue.h)
* - mutex_run_ : guards runCount_ counter; used by join() to wait until
* all in-flight resume operations complete.
* - jq_.m_mutex: guards nSuspend_ increments/decrements.
*
* Common Mistakes When Modifying This File
* =========================================
*
* 1. Changing lock ordering.
* resume() acquires locks sequentially (never held simultaneously):
* jq_.m_mutex (released immediately), then mutex_ (held across resume),
* then mutex_run_ (released after decrement). post() acquires only
* mutex_run_. Any new code path must follow the same order.
*
* 2. Removing the shared_from_this() capture in post().
* The lambda passed to addJob captures [this, sp = shared_from_this()].
* If you remove sp, 'this' can be destroyed before the job runs,
* causing use-after-free. The sp capture is load-bearing.
*
* 3. Forgetting to decrement nSuspend_ on a new code path.
* Every ++nSuspend_ must have a matching --nSuspend_. If you add a new
* suspension path (e.g. a new awaiter) and forget to decrement on resume
* or on failure, JobQueue::stop() will hang.
*
* 4. Calling task_.handle().resume() without holding mutex_.
* This allows a race where the coroutine runs on two threads
* simultaneously. Always hold mutex_ around resume().
*
* 5. Swapping LocalValues outside of the mutex_ critical section.
* The swap-in and swap-out of LocalValues must bracket the resume()
* call. If you move the swap-out before the lock_guard(mutex_) is
* released, you break LocalValue isolation for any code that runs
* after the coroutine suspends but before the lock is dropped.
*/
namespace xrpl {
/**
* Construct a CoroTaskRunner. Sets runCount_ to 0; does not
* create the coroutine. Call init() afterwards.
*
* @param jq The JobQueue this coroutine will run on
* @param type Job type for scheduling priority
* @param name Human-readable name for logging
*/
inline JobQueue::CoroTaskRunner::CoroTaskRunner(
create_t,
JobQueue& jq,
JobType type,
std::string const& name)
: jq_(jq), type_(type), name_(name), runCount_(0)
{
}
/**
* Initialize with a coroutine-returning callable.
* Stores the callable on the heap (FuncStore) so it outlives the
* coroutine frame. Coroutine frames store a reference to the
* callable's implicit object parameter (the lambda). If the callable
* is a temporary, that reference dangles after the caller returns.
* Keeping the callable alive here ensures the coroutine's captures
* remain valid.
*
* @param f Callable: CoroTask<void>(shared_ptr<CoroTaskRunner>)
*/
template <class F>
void
JobQueue::CoroTaskRunner::init(F&& f)
{
using Fn = std::decay_t<F>;
auto store = std::make_unique<FuncStore<Fn>>(std::forward<F>(f));
task_ = store->func(shared_from_this());
storedFunc_ = std::move(store);
}
/**
* Destructor. Waits for any in-flight resume() to complete, then
* asserts (debug) that the coroutine has finished or
* expectEarlyExit() was called.
*
* The join() call is necessary because with async dispatch the
* coroutine runs on a worker thread. The gate signal (which wakes
* the test thread) can arrive before resume() has set finished_.
* join() synchronizes via mutex_run_, establishing a happens-before
* edge: finished_ = true -> unlock(mutex_run_) in resume() ->
* lock(mutex_run_) in join() -> read finished_.
*/
inline JobQueue::CoroTaskRunner::~CoroTaskRunner()
{
#ifndef NDEBUG
join();
XRPL_ASSERT(finished_, "xrpl::JobQueue::CoroTaskRunner::~CoroTaskRunner : is finished");
#endif
}
/**
* Increment the JobQueue's suspended-coroutine count (nSuspend_).
*/
inline void
JobQueue::CoroTaskRunner::onSuspend()
{
std::lock_guard lock(jq_.m_mutex);
++jq_.nSuspend_;
}
/**
* Decrement nSuspend_ without resuming.
*/
inline void
JobQueue::CoroTaskRunner::onUndoSuspend()
{
std::lock_guard lock(jq_.m_mutex);
--jq_.nSuspend_;
}
/**
* Return a SuspendAwaiter whose await_suspend() increments nSuspend_
* before the coroutine actually suspends. The caller must later call
* post() or resume() to continue execution.
*
* @return Awaiter for use with `co_await runner->suspend()`
*/
inline auto
JobQueue::CoroTaskRunner::suspend()
{
/**
* Custom awaiter for suspend(). Always suspends (await_ready
* returns false) and increments nSuspend_ in await_suspend().
*/
struct SuspendAwaiter
{
CoroTaskRunner& runner_; // The runner that owns this coroutine.
/**
* Always returns false so the coroutine suspends.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Called when the coroutine suspends. Increments nSuspend_
* so the JobQueue knows a coroutine is waiting.
*/
void
await_suspend(std::coroutine_handle<>) const
{
runner_.onSuspend();
}
void
await_resume() const noexcept
{
}
};
return SuspendAwaiter{*this};
}
/**
* Suspend and immediately repost on the JobQueue. Equivalent to
* `co_await JobQueueAwaiter{runner}` but uses an inline struct
* to work around a GCC-12 codegen bug (see declaration in JobQueue.h).
*
* If the JobQueue is stopping (post fails), the suspend count is
* undone and the coroutine is resumed immediately via h.resume().
*
* @return An inline YieldPostAwaiter
*/
inline auto
JobQueue::CoroTaskRunner::yieldAndPost()
{
struct YieldPostAwaiter
{
CoroTaskRunner& runner_;
bool
await_ready() const noexcept
{
return false;
}
void
await_suspend(std::coroutine_handle<> h)
{
runner_.onSuspend();
if (!runner_.post())
{
runner_.onUndoSuspend();
h.resume();
}
}
void
await_resume() const noexcept
{
}
};
return YieldPostAwaiter{*this};
}
/**
* Schedule coroutine resumption as a job on the JobQueue.
* A shared_ptr capture (sp) prevents this CoroTaskRunner from being
* destroyed while the job is queued but not yet executed.
*
* @return false if the JobQueue rejected the job (shutting down)
*/
inline bool
JobQueue::CoroTaskRunner::post()
{
{
std::lock_guard lk(mutex_run_);
++runCount_;
}
// sp prevents 'this' from being destroyed while the job is pending
if (jq_.addJob(type_, name_, [this, sp = shared_from_this()]() { resume(); }))
{
return true;
}
// The coroutine will not run. Undo the runCount_ increment.
std::lock_guard lk(mutex_run_);
--runCount_;
cv_.notify_all();
return false;
}
/**
* Resume the coroutine on the current thread.
*
* Steps:
* 1. Decrement nSuspend_ (under jq_.m_mutex)
* 2. Swap in this coroutine's LocalValues for thread-local isolation
* 3. Resume the coroutine handle (under mutex_)
* 4. Swap out LocalValues, restoring the thread's previous state
* 5. Decrement runCount_ and notify join() waiters
*
* @pre post() must have been called before resume(). Direct calls
* without a prior post() will corrupt runCount_ and break join().
* Note: runCount_ is NOT incremented here — post() already did that.
* This ensures join() stays blocked for the entire post->resume lifetime.
*/
inline void
JobQueue::CoroTaskRunner::resume()
{
{
std::lock_guard lock(jq_.m_mutex);
--jq_.nSuspend_;
}
auto saved = detail::getLocalValues().release();
detail::getLocalValues().reset(&lvs_);
std::lock_guard lock(mutex_);
XRPL_ASSERT(
task_.handle() && !task_.done(),
"xrpl::JobQueue::CoroTaskRunner::resume : task handle is valid and not done");
task_.handle().resume();
detail::getLocalValues().release();
detail::getLocalValues().reset(saved);
if (task_.done())
{
finished_ = true;
// Break the shared_ptr cycle: frame -> shared_ptr<runner> -> this.
// Use std::move (not task_ = {}) so task_.handle_ is null BEFORE the
// frame is destroyed. operator= would destroy the frame while handle_
// still holds the old value -- a re-entrancy hazard on GCC-12 if
// frame destruction triggers runner cleanup.
[[maybe_unused]] auto completed = std::move(task_);
}
std::lock_guard lk(mutex_run_);
--runCount_;
cv_.notify_all();
}
/**
* @return true if the coroutine has not yet run to completion
*/
inline bool
JobQueue::CoroTaskRunner::runnable() const
{
// After normal completion, task_ is reset to break the shared_ptr cycle
// (handle_ becomes null). A null handle means the coroutine is done.
return task_.handle() && !task_.done();
}
/**
* Handle early termination when the coroutine never ran (e.g. JobQueue
* is stopping). Decrements nSuspend_ and destroys the coroutine frame
* to break the shared_ptr cycle: frame -> lambda -> runner -> frame.
*/
inline void
JobQueue::CoroTaskRunner::expectEarlyExit()
{
if (!finished_)
{
std::lock_guard lock(jq_.m_mutex);
--jq_.nSuspend_;
finished_ = true;
}
// Break the shared_ptr cycle: frame -> shared_ptr<runner> -> this.
// The coroutine is at initial_suspend and never ran user code, so
// destroying it is safe. Use std::move (not task_ = {}) so
// task_.handle_ is null before the frame is destroyed.
{
[[maybe_unused]] auto completed = std::move(task_);
}
storedFunc_.reset();
}
/**
* Block until all pending/active resume operations complete.
* Uses cv_ + mutex_run_ to wait until runCount_ reaches 0 or
* finished_ becomes true. The finished_ check handles the case
* where resume() is called directly (without post()), which
* decrements runCount_ below zero. In that scenario runCount_
* never returns to 0, but finished_ becoming true guarantees
* the coroutine is done and no more resumes will occur.
*/
inline void
JobQueue::CoroTaskRunner::join()
{
std::unique_lock<std::mutex> lk(mutex_run_);
cv_.wait(lk, [this]() { return runCount_ == 0 || finished_; });
}
} // namespace xrpl

View File

@@ -2,6 +2,7 @@
#include <xrpl/basics/LocalValue.h>
#include <xrpl/core/ClosureCounter.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/core/JobTypeData.h>
#include <xrpl/core/JobTypes.h>
#include <xrpl/core/detail/Workers.h>
@@ -9,6 +10,7 @@
#include <boost/coroutine/all.hpp>
#include <coroutine>
#include <set>
namespace xrpl {
@@ -119,6 +121,419 @@ public:
join();
};
/** C++20 coroutine lifecycle manager. Replaces Coro for new code.
*
* Class / Inheritance / Dependency Diagram
* =========================================
*
* std::enable_shared_from_this<CoroTaskRunner>
* ^
* | (public inheritance)
* |
* CoroTaskRunner
* +---------------------------------------------------+
* | - lvs_ : detail::LocalValues |
* | - jq_ : JobQueue& |
* | - type_ : JobType |
* | - name_ : std::string |
* | - runCount_ : int (in-flight resumes) |
* | - mutex_ : std::mutex (coroutine guard) |
* | - mutex_run_ : std::mutex (join guard) |
* | - cv_ : condition_variable |
* | - task_ : CoroTask<void> |
* | - storedFunc_ : unique_ptr<FuncBase> (type-erased)|
* +---------------------------------------------------+
* | + init(F&&) : set up coroutine callable |
* | + onSuspend() : ++jq_.nSuspend_ |
* | + onUndoSuspend() : --jq_.nSuspend_ |
* | + suspend() : returns SuspendAwaiter |
* | + post() : schedule resume on JobQueue |
* | + resume() : resume coroutine on caller |
* | + runnable() : !task_.done() |
* | + expectEarlyExit() : teardown for failed post |
* | + join() : block until not running |
* +---------------------------------------------------+
* | |
* | owns | references
* v v
* CoroTask<void> JobQueue
* (coroutine frame) (thread pool + nSuspend_)
*
* FuncBase / FuncStore<F> (type-erased heap storage
* for the coroutine lambda)
*
* Coroutine Lifecycle (Control Flow)
* ===================================
*
* Caller thread JobQueue worker thread
* ------------- ----------------------
* postCoroTask(f)
* |
* +-- check stopping_ (reject if JQ shutting down)
* +-- ++nSuspend_ (lazy start counts as suspended)
* +-- make_shared<CoroTaskRunner>
* +-- init(f)
* | +-- store lambda on heap (FuncStore)
* | +-- task_ = f(shared_from_this())
* | [coroutine created, suspended at initial_suspend]
* +-- post()
* | +-- ++runCount_
* | +-- addJob(type_, [resume]{})
* | resume()
* | |
* | +-- --nSuspend_
* | +-- swap in LocalValues
* | +-- task_.handle().resume()
* | | [coroutine body runs]
* | | ...
* | | co_await suspend()
* | | +-- ++nSuspend_
* | | [coroutine suspends]
* | +-- swap out LocalValues
* | +-- --runCount_
* | +-- cv_.notify_all()
* |
* post() <-- called externally or by yieldAndPost()
* +-- ++runCount_
* +-- addJob(type_, [resume]{})
* resume()
* |
* +-- [coroutine body continues]
* +-- co_return
* +-- --runCount_
* +-- cv_.notify_all()
* join()
* +-- cv_.wait([]{runCount_ == 0})
* +-- [done]
*
* Usage Examples
* ==============
*
* 1. Fire-and-forget coroutine (most common pattern):
*
* jq.postCoroTask(jtCLIENT, "MyWork",
* [](auto runner) -> CoroTask<void> {
* doSomeWork();
* co_await runner->suspend(); // yield to other jobs
* doMoreWork();
* co_return;
* });
*
* 2. Manually controlling suspend / resume (external trigger):
*
* auto runner = jq.postCoroTask(jtCLIENT, "ExtTrigger",
* [&result](auto runner) -> CoroTask<void> {
* startAsyncOperation(callback);
* co_await runner->suspend();
* // callback called runner->post() to get here
* result = collectResult();
* co_return;
* });
* // ... later, from the callback:
* runner->post(); // reschedule the coroutine on the JobQueue
*
* 3. Using yieldAndPost() for automatic suspend + repost:
*
* jq.postCoroTask(jtCLIENT, "AutoRepost",
* [](auto runner) -> CoroTask<void> {
* step1();
* co_await runner->yieldAndPost(); // yield + auto-repost
* step2();
* co_await runner->yieldAndPost();
* step3();
* co_return;
* });
*
* 4. Checking shutdown after co_await (cooperative cancellation):
*
* jq.postCoroTask(jtCLIENT, "Cancellable",
* [&jq](auto runner) -> CoroTask<void> {
* while (moreWork()) {
* co_await runner->yieldAndPost();
* if (jq.isStopping())
* co_return; // bail out cleanly
* processNextItem();
* }
* co_return;
* });
*
* Caveats / Pitfalls
* ==================
*
* BUG-RISK: Calling suspend() without a matching post()/resume().
* After co_await runner->suspend(), the coroutine is parked and
* nSuspend_ is incremented. If nothing ever calls post() or
* resume(), the coroutine is leaked and JobQueue::stop() will
* hang forever waiting for nSuspend_ to reach zero.
*
* BUG-RISK: Calling post() on an already-running coroutine.
* post() schedules a resume() job. If the coroutine has not
* actually suspended yet (no co_await executed), the resume job
* will try to call handle().resume() while the coroutine is still
* running on another thread. This is UB. The mutex_ prevents
* data corruption but the logic is wrong — always co_await
* suspend() before calling post(). (The test incorrect_order()
* shows this works only because mutex_ serializes the calls.)
*
* BUG-RISK: Dropping the shared_ptr<CoroTaskRunner> before join().
* The CoroTaskRunner destructor asserts that finished_ is true
* (the coroutine completed). If you let the last shared_ptr die
* while the coroutine is still running or suspended, you get an
* assertion failure in debug and UB in release. Always call
* join() or expectEarlyExit() first.
*
* BUG-RISK: Lambda captures outliving the coroutine frame.
* The lambda passed to postCoroTask is heap-allocated (FuncStore)
* to prevent dangling. But objects captured by pointer still need
* their own lifetime management. If you capture a raw pointer to
* a stack variable, and the stack frame exits before the coroutine
* finishes, the pointer dangles. Use shared_ptr or ensure the
* pointed-to object outlives the coroutine.
*
* BUG-RISK: Forgetting co_return in a void coroutine.
* If the coroutine body falls off the end without co_return,
* the compiler may silently treat it as co_return (per standard),
* but some compilers warn. Always write explicit co_return.
*
* LIMITATION: CoroTaskRunner only supports CoroTask<void>.
* The task_ member is CoroTask<void>. To return values from
* the top-level coroutine, write through a captured pointer
* (as the tests demonstrate), or co_await inner CoroTask<T>
* coroutines that return values.
*
* LIMITATION: One coroutine per CoroTaskRunner.
* init() must be called exactly once. You cannot reuse a
* CoroTaskRunner to run a second coroutine. Create a new one
* via postCoroTask() instead.
*
* LIMITATION: No timeout on join().
* join() blocks indefinitely. If the coroutine is suspended
* and never posted, join() will deadlock. Use timed waits
* on the gate pattern (condition_variable + wait_for) in tests.
*/
class CoroTaskRunner : public std::enable_shared_from_this<CoroTaskRunner>
{
private:
// Per-coroutine thread-local storage. Swapped in before resume()
// and swapped out after, so each coroutine sees its own LocalValue
// state regardless of which worker thread executes it.
detail::LocalValues lvs_;
// Back-reference to the owning JobQueue. Used to post jobs,
// increment/decrement nSuspend_, and acquire jq_.m_mutex.
JobQueue& jq_;
// Job type passed to addJob() when posting this coroutine.
JobType type_;
// Human-readable name for this coroutine job (for logging).
std::string name_;
// Number of in-flight resume operations (pending + active).
// Incremented by post(), decremented when resume() finishes.
// Guarded by mutex_run_. join() blocks until this reaches 0.
//
// A counter (not a bool) is needed because post() can be called
// from within the coroutine body (e.g. via yieldAndPost()),
// enqueuing a second resume while the first is still running.
// A bool would be clobbered: R2.post() sets true, then R1's
// cleanup sets false — losing the fact that R2 is still pending.
int runCount_;
// Serializes all coroutine resume() calls, preventing concurrent
// execution of the coroutine body on multiple threads. Handles the
// race where post() enqueues a resume before the coroutine has
// actually suspended (post-before-suspend pattern).
std::mutex mutex_;
// Guards runCount_. Used with cv_ for join() to wait
// until all pending/active resume operations complete.
std::mutex mutex_run_;
// Notified when runCount_ reaches zero, allowing
// join() waiters to wake up.
std::condition_variable cv_;
// The coroutine handle wrapper. Owns the coroutine frame.
// Set by init(). Reset to empty in resume() upon coroutine
// completion (to break the shared_ptr cycle) or in
// expectEarlyExit() on early termination.
CoroTask<void> task_;
/**
* Type-erased base for heap-stored callables.
* Prevents the coroutine lambda from being destroyed before
* the coroutine frame is done with it.
*
* @see FuncStore
*/
struct FuncBase
{
virtual ~FuncBase() = default;
};
/**
* Concrete type-erased storage for a callable of type F.
* The coroutine frame stores a reference to the lambda's implicit
* object parameter. If the lambda is a temporary, that reference
* dangles after the call returns. FuncStore keeps it alive on
* the heap for the lifetime of the CoroTaskRunner.
*/
template <class F>
struct FuncStore : FuncBase
{
F func; // The stored callable (coroutine lambda).
explicit FuncStore(F&& f) : func(std::move(f))
{
}
};
// Heap-allocated callable storage. Set by init(), ensures the
// lambda outlives the coroutine frame that references it.
std::unique_ptr<FuncBase> storedFunc_;
// True once the coroutine has completed or expectEarlyExit() was
// called. Asserted in the destructor (debug) to catch leaked
// runners. Available in all builds to guard expectEarlyExit()
// against double-decrementing nSuspend_.
bool finished_ = false;
public:
/**
* Tag type for private construction. Prevents external code
* from constructing CoroTaskRunner directly. Use postCoroTask().
*/
struct create_t
{
explicit create_t() = default;
};
/**
* Construct a CoroTaskRunner. Private by convention (create_t tag).
*
* @param jq The JobQueue this coroutine will run on
* @param type Job type for scheduling priority
* @param name Human-readable name for logging
*/
CoroTaskRunner(create_t, JobQueue&, JobType, std::string const&);
CoroTaskRunner(CoroTaskRunner const&) = delete;
CoroTaskRunner&
operator=(CoroTaskRunner const&) = delete;
/**
* Destructor. Asserts (debug) that the coroutine has finished
* or expectEarlyExit() was called.
*/
~CoroTaskRunner();
/**
* Initialize with a coroutine-returning callable.
* Must be called exactly once, after the object is managed by
* shared_ptr (because init uses shared_from_this internally).
* This is handled automatically by postCoroTask().
*
* @param f Callable: CoroTask<void>(shared_ptr<CoroTaskRunner>)
*/
template <class F>
void
init(F&& f);
/**
* Increment the JobQueue's suspended-coroutine count (nSuspend_).
* Called when the coroutine is about to suspend. Every call
* must be balanced by a corresponding decrement (via resume()
* or onUndoSuspend()), or JobQueue::stop() will hang.
*/
void
onSuspend();
/**
* Decrement nSuspend_ without resuming.
* Used to undo onSuspend() when a scheduled post() fails
* (e.g. JobQueue is stopping).
*/
void
onUndoSuspend();
/**
* Suspend the coroutine.
* The awaiter's await_suspend() increments nSuspend_ before the
* coroutine actually suspends. The caller must later call post()
* or resume() to continue execution.
*
* @return An awaiter for use with `co_await runner->suspend()`
*/
auto
suspend();
/**
* Suspend the coroutine and immediately repost it on the
* JobQueue. Combines suspend() + post() atomically inside
* await_suspend, so there is no window where an external
* event could race between the two.
*
* Equivalent to JobQueueAwaiter but defined as an inline
* awaiter returned from a member function. This avoids a
* GCC-12 coroutine codegen bug where an external awaiter
* struct (JobQueueAwaiter) used at multiple co_await points
* corrupts the coroutine state machine's resume index,
* causing the coroutine to hang on the third resumption.
*
* @return An awaiter for use with `co_await runner->yieldAndPost()`
*/
auto
yieldAndPost();
/**
* Schedule coroutine resumption as a job on the JobQueue.
* Captures shared_from_this() to prevent this runner from being
* destroyed while the job is queued.
*
* @return true if the job was accepted; false if the JobQueue
* is stopping (caller must handle cleanup)
*/
bool
post();
/**
* Resume the coroutine on the current thread.
* Decrements nSuspend_, swaps in LocalValues, resumes the
* coroutine handle, swaps out LocalValues, and notifies join()
* waiters. Lock ordering (sequential, non-overlapping):
* jq_.m_mutex -> mutex_ -> mutex_run_.
*
* @pre post() must have been called before resume(). Direct
* calls without a prior post() will corrupt runCount_
* and break join().
*/
void
resume();
/**
* @return true if the coroutine has not yet run to completion
*/
bool
runnable() const;
/**
* Handle early termination when the coroutine never ran.
* Decrements nSuspend_ and destroys the coroutine frame to
* break the shared_ptr cycle (frame -> lambda -> runner -> frame).
* Called by postCoroTask() when post() fails.
*/
void
expectEarlyExit();
/**
* Block until all pending/active resume operations complete.
* Uses cv_ + mutex_run_ to wait until runCount_ reaches 0.
* Warning: deadlocks if the coroutine is suspended and never posted.
*/
void
join();
};
using JobFunction = std::function<void()>;
JobQueue(
@@ -165,6 +580,19 @@ public:
std::shared_ptr<Coro>
postCoro(JobType t, std::string const& name, F&& f);
/** Creates a C++20 coroutine and adds a job to the queue to run it.
@param t The type of job.
@param name Name of the job.
@param f Callable with signature
CoroTask<void>(std::shared_ptr<CoroTaskRunner>).
@return shared_ptr to posted CoroTaskRunner. nullptr if not successful.
*/
template <class F>
std::shared_ptr<CoroTaskRunner>
postCoroTask(JobType t, std::string const& name, F&& f);
/** Jobs waiting at this priority.
*/
int
@@ -379,6 +807,7 @@ private:
} // namespace xrpl
#include <xrpl/core/Coro.ipp>
#include <xrpl/core/CoroTaskRunner.ipp>
namespace xrpl {
@@ -401,4 +830,69 @@ JobQueue::postCoro(JobType t, std::string const& name, F&& f)
return coro;
}
// postCoroTask — entry point for launching a C++20 coroutine on the JobQueue.
//
// Control Flow
// ============
//
// postCoroTask(t, name, f)
// |
// +-- 1. Check stopping_ — reject if JQ shutting down
// |
// +-- 2. ++nSuspend_ (mirrors Boost Coro ctor's implicit yield)
// | The coroutine is "suspended" from the JobQueue's perspective
// | even though it hasn't run yet — this keeps the JQ shutdown
// | logic correct (it waits for nSuspend_ to reach 0).
// |
// +-- 3. Create CoroTaskRunner (shared_ptr, ref-counted)
// |
// +-- 4. runner->init(f)
// | +-- Heap-allocate the lambda (FuncStore) to prevent
// | | dangling captures in the coroutine frame
// | +-- task_ = f(shared_from_this())
// | [coroutine created but NOT started — lazy initial_suspend]
// |
// +-- 5. runner->post()
// | +-- addJob(type_, [resume]{}) → resume on worker thread
// | +-- failure (JQ stopping):
// | +-- runner->expectEarlyExit()
// | | --nSuspend_, destroy coroutine frame
// | +-- return nullptr
//
// Why async post() instead of synchronous resume()?
// ==================================================
// The initial dispatch MUST use async post() so the coroutine body runs on
// a JobQueue worker thread, not the caller's thread. resume() swaps the
// caller's thread-local LocalValues with the coroutine's private copy.
// If the coroutine mutates LocalValues (e.g. thread_specific_storage test),
// those mutations bleed back into the caller's thread-local state after the
// swap-out, corrupting subsequent tests that share the same thread pool.
// Async post() avoids this by running the coroutine on a worker thread whose
// LocalValues are managed by the thread pool, not by the caller.
//
template <class F>
std::shared_ptr<JobQueue::CoroTaskRunner>
JobQueue::postCoroTask(JobType t, std::string const& name, F&& f)
{
// Reject if the JQ is shutting down — matches addJob()'s stopping_ check.
// Must check before incrementing nSuspend_ to avoid leaving an orphan
// count that would cause stop() to hang.
if (stopping_)
return nullptr;
{
std::lock_guard lock(m_mutex);
++nSuspend_;
}
auto runner = std::make_shared<CoroTaskRunner>(CoroTaskRunner::create_t{}, *this, t, name);
runner->init(std::forward<F>(f));
if (!runner->post())
{
runner->expectEarlyExit();
runner.reset();
}
return runner;
}
} // namespace xrpl

View File

@@ -0,0 +1,206 @@
#pragma once
#include <xrpl/core/JobQueue.h>
#include <coroutine>
#include <memory>
namespace xrpl {
/**
* Awaiter that suspends and immediately reschedules on the JobQueue.
* Equivalent to calling yield() followed by post() in the old Coro API.
*
* Usage:
* co_await JobQueueAwaiter{runner};
*
* What it waits for: The coroutine is re-queued as a job and resumes
* when a worker thread picks it up.
*
* Which thread resumes: A JobQueue worker thread.
*
* What await_resume() returns: void.
*
* Dependency Diagram
* ==================
*
* JobQueueAwaiter
* +----------------------------------------------+
* | + runner : shared_ptr<CoroTaskRunner> |
* +----------------------------------------------+
* | + await_ready() -> false (always suspend) |
* | + await_suspend() -> bool (suspend or cancel) |
* | + await_resume() -> void |
* +----------------------------------------------+
* | |
* | uses | uses
* v v
* CoroTaskRunner JobQueue
* .onSuspend() (via runner->post() -> addJob)
* .onUndoSuspend()
* .post()
*
* Control Flow (await_suspend)
* ============================
*
* co_await JobQueueAwaiter{runner}
* |
* +-- await_ready() -> false
* +-- await_suspend(handle)
* |
* +-- runner->onSuspend() // ++nSuspend_
* +-- runner->post() // addJob to JobQueue
* | |
* | +-- success? return noop_coroutine()
* | | // coroutine stays suspended;
* | | // worker thread will call resume()
* | +-- failure? (JQ stopping)
* | +-- runner->onUndoSuspend() // --nSuspend_
* | +-- return handle // symmetric transfer back
* | // coroutine continues immediately
* | // so it can clean up and co_return
*
* DEPRECATED — prefer `co_await runner->yieldAndPost()`
* =====================================================
*
* GCC-12 has a coroutine codegen bug where using this external awaiter
* struct at multiple co_await points in the same coroutine corrupts the
* state machine's resume index. After the second co_await, the third
* resumption enters handle().resume() but never reaches await_resume()
* or any subsequent user code — the coroutine hangs indefinitely.
*
* The fix is `co_await runner->yieldAndPost()`, which defines the
* awaiter as an inline struct inside a CoroTaskRunner member function.
* GCC-12 handles inline awaiters correctly at multiple co_await points.
*
* This struct is retained for single-use scenarios and documentation
* purposes. For any code that may use co_await in a loop or at
* multiple points, always use `runner->yieldAndPost()`.
*
* Usage Examples
* ==============
*
* 1. Yield and auto-repost (preferred — works on all compilers):
*
* CoroTask<void> handler(auto runner) {
* doPartA();
* co_await runner->yieldAndPost(); // yield + repost
* doPartB(); // runs on a worker thread
* co_return;
* }
*
* 2. Multiple yield points in a loop:
*
* CoroTask<void> batchProcessor(auto runner) {
* for (auto& item : items) {
* process(item);
* co_await runner->yieldAndPost(); // let other jobs run
* }
* co_return;
* }
*
* 3. Graceful shutdown — checking after resume:
*
* CoroTask<void> longTask(auto runner, JobQueue& jq) {
* while (hasWork()) {
* co_await runner->yieldAndPost();
* // If JQ is stopping, await_suspend resumes the coroutine
* // immediately without re-queuing. Always check
* // isStopping() to decide whether to proceed:
* if (jq.isStopping())
* co_return;
* doNextChunk();
* }
* co_return;
* }
*
* Caveats / Pitfalls
* ==================
*
* BUG-RISK: Using a stale or null runner.
* The runner shared_ptr must be valid and point to the CoroTaskRunner
* that owns the coroutine currently executing. Passing a runner from
* a different coroutine, or a default-constructed shared_ptr, is UB.
*
* BUG-RISK: Assuming resume happens on the same thread.
* After co_await, the coroutine resumes on whatever worker thread
* picks up the job. Do not rely on thread-local state unless it is
* managed through LocalValue (which CoroTaskRunner automatically
* swaps in/out).
*
* BUG-RISK: Ignoring the shutdown path.
* When the JobQueue is stopping, post() fails and await_suspend()
* resumes the coroutine immediately (symmetric transfer back to h).
* The coroutine body continues on the same thread. If your code
* after co_await assumes it was re-queued and is running on a worker
* thread, that assumption breaks during shutdown. Always handle the
* "JQ is stopping" case, either by checking jq.isStopping() or by
* letting the coroutine fall through to co_return naturally.
*
* DIFFERENCE from runner->suspend() + runner->post():
* Both JobQueueAwaiter and yieldAndPost() combine suspend + post
* in one atomic operation. With the manual suspend()/post() pattern,
* there is a window between the two calls where an external event
* could race. The atomic awaiters remove that window — onSuspend()
* and post() happen within the same await_suspend() call while the
* coroutine is guaranteed to be suspended. Use yieldAndPost() unless
* you need an external party to decide *when* to call post().
*/
struct JobQueueAwaiter
{
// The CoroTaskRunner that owns the currently executing coroutine.
std::shared_ptr<JobQueue::CoroTaskRunner> runner;
/**
* Always returns false so the coroutine suspends.
*/
bool
await_ready() const noexcept
{
return false;
}
/**
* Increment nSuspend (equivalent to yield()) and schedule resume
* on the JobQueue (equivalent to post()). If the JobQueue is
* stopping, undoes the suspend count and transfers back to the
* coroutine so it can clean up and co_return.
*
* Returns a coroutine_handle<> (symmetric transfer) instead of
* bool to work around a GCC-12 codegen bug where bool-returning
* await_suspend leaves the coroutine in an invalid state —
* neither properly suspended nor resumed — causing a hang.
*
* WARNING: GCC-12 has an additional codegen bug where using this
* external awaiter struct at multiple co_await points in the same
* coroutine corrupts the state machine's resume index, causing the
* coroutine to hang on the third resumption. Prefer
* `co_await runner->yieldAndPost()` which uses an inline awaiter
* that GCC-12 handles correctly.
*
* @return noop_coroutine() to stay suspended (job posted);
* the caller's handle to resume immediately (JQ stopping)
*/
std::coroutine_handle<>
await_suspend(std::coroutine_handle<> h)
{
XRPL_ASSERT(runner, "xrpl::JobQueueAwaiter::await_suspend : runner is valid");
runner->onSuspend();
if (!runner->post())
{
// JobQueue is stopping. Undo the suspend count and
// transfer back to the coroutine so it can clean up
// and co_return.
runner->onUndoSuspend();
return h;
}
return std::noop_coroutine();
}
void
await_resume() const noexcept
{
}
};
} // namespace xrpl

View File

@@ -8,6 +8,7 @@
#include <xrpld/rpc/detail/Tuning.h>
#include <xrpl/beast/unit_test.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/json/json_reader.h>
#include <xrpl/protocol/ApiVersion.h>
@@ -131,7 +132,6 @@ public:
c,
Role::USER,
{},
{},
RPC::apiVersionIfUnspecified},
{},
{}};
@@ -155,11 +155,11 @@ public:
Json::Value result;
gate g;
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
context.params = std::move(params);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
using namespace std::chrono_literals;
@@ -240,28 +240,27 @@ public:
c,
Role::USER,
{},
{},
RPC::apiVersionIfUnspecified},
{},
{}};
Json::Value result;
gate g;
// Test RPC::Tuning::max_src_cur source currencies.
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
context.params = rpf(Account("alice"), Account("bob"), RPC::Tuning::max_src_cur);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(!result.isMember(jss::error));
// Test more than RPC::Tuning::max_src_cur source currencies.
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
context.params = rpf(Account("alice"), Account("bob"), RPC::Tuning::max_src_cur + 1);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(result.isMember(jss::error));
@@ -269,22 +268,22 @@ public:
// Test RPC::Tuning::max_auto_src_cur source currencies.
for (auto i = 0; i < (RPC::Tuning::max_auto_src_cur - 1); ++i)
env.trust(Account("alice")[std::to_string(i + 100)](100), "bob");
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
context.params = rpf(Account("alice"), Account("bob"), 0);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(!result.isMember(jss::error));
// Test more than RPC::Tuning::max_auto_src_cur source currencies.
env.trust(Account("alice")["AUD"](100), "bob");
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
context.params = rpf(Account("alice"), Account("bob"), 0);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(result.isMember(jss::error));

View File

@@ -0,0 +1,537 @@
#include <test/jtx.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/core/JobQueueAwaiter.h>
#include <chrono>
#include <mutex>
namespace xrpl {
namespace test {
/**
* Test suite for the C++20 coroutine primitives: CoroTask, CoroTaskRunner,
* and JobQueueAwaiter.
*
* Dependency Diagram
* ==================
*
* CoroTask_test
* +-------------------------------------------------+
* | + gate (inner class) : condition_variable helper |
* +-------------------------------------------------+
* | uses
* v
* jtx::Env --> JobQueue::postCoroTask()
* |
* +-- CoroTaskRunner (suspend / post / resume)
* +-- CoroTask<void> / CoroTask<T>
* +-- JobQueueAwaiter
*
* Test Coverage Matrix
* ====================
*
* Test | Primitives exercised
* --------------------------+----------------------------------------------
* testVoidCompletion | CoroTask<void> basic lifecycle
* testCorrectOrder | suspend() -> join() -> post() -> complete
* testIncorrectOrder | post() before suspend() (race-safe path)
* testJobQueueAwaiter | JobQueueAwaiter suspend + auto-repost
* testThreadSpecificStorage | LocalValue isolation across coroutines
* testExceptionPropagation | unhandled_exception() in promise_type
* testMultipleYields | N sequential suspend/resume cycles
* testValueReturn | CoroTask<T> co_return value
* testValueException | CoroTask<T> exception via co_await
* testValueChaining | nested CoroTask<T> -> CoroTask<T>
* testShutdownRejection | postCoroTask returns nullptr when stopping
*/
class CoroTask_test : public beast::unit_test::suite
{
public:
/**
* Simple one-shot gate for synchronizing between test thread
* and coroutine worker threads. signal() sets the flag;
* wait_for() blocks until signaled or timeout.
*/
class gate
{
private:
std::condition_variable cv_;
std::mutex mutex_;
bool signaled_ = false;
public:
/**
* Block until signaled or timeout expires.
*
* @param rel_time Maximum duration to wait
*
* @return true if signaled before timeout
*/
template <class Rep, class Period>
bool
wait_for(std::chrono::duration<Rep, Period> const& rel_time)
{
std::unique_lock<std::mutex> lk(mutex_);
auto b = cv_.wait_for(lk, rel_time, [this] { return signaled_; });
signaled_ = false;
return b;
}
/**
* Signal the gate, waking any waiting thread.
*/
void
signal()
{
std::lock_guard lk(mutex_);
signaled_ = true;
cv_.notify_all();
}
};
// NOTE: All coroutine lambdas passed to postCoroTask use explicit
// pointer-by-value captures instead of [&] to work around a GCC 14
// bug where reference captures in coroutine lambdas are corrupted
// in the coroutine frame.
/**
* CoroTask<void> runs to completion and runner becomes non-runnable.
*/
void
testVoidCompletion()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("void completion");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [gp = &g](auto) -> CoroTask<void> {
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(!runner->runnable());
}
/**
* Correct order: suspend, join, post, complete.
* Mirrors existing Coroutine_test::correct_order.
*/
void
testCorrectOrder()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("correct order");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g1, g2;
std::shared_ptr<JobQueue::CoroTaskRunner> r;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT,
"CoroTaskTest",
[rp = &r, g1p = &g1, g2p = &g2](auto runner) -> CoroTask<void> {
*rp = runner;
g1p->signal();
co_await runner->suspend();
g2p->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g1.wait_for(5s));
runner->join();
runner->post();
BEAST_EXPECT(g2.wait_for(5s));
runner->join();
}
/**
* Incorrect order: post() before suspend(). Verifies the
* race-safe path. Mirrors Coroutine_test::incorrect_order.
*/
void
testIncorrectOrder()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("incorrect order");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [gp = &g](auto runner) -> CoroTask<void> {
runner->post();
co_await runner->suspend();
gp->signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
}
/**
* JobQueueAwaiter suspend + auto-repost across multiple yield points.
*/
void
testJobQueueAwaiter()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("JobQueueAwaiter");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int step = 0;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [sp = &step, gp = &g](auto runner) -> CoroTask<void> {
*sp = 1;
co_await runner->yieldAndPost();
*sp = 2;
co_await runner->yieldAndPost();
*sp = 3;
gp->signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(step == 3);
}
/**
* Per-coroutine LocalValue isolation. Each coroutine sees its own
* copy of thread-local state. Mirrors Coroutine_test::thread_specific_storage.
*/
void
testThreadSpecificStorage()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("thread specific storage");
Env env(*this);
auto& jq = env.app().getJobQueue();
static int const N = 4;
std::array<std::shared_ptr<JobQueue::CoroTaskRunner>, N> a;
LocalValue<int> lv(-1);
BEAST_EXPECT(*lv == -1);
gate g;
jq.addJob(jtCLIENT, "LocalValTest", [&]() {
this->BEAST_EXPECT(*lv == -1);
*lv = -2;
this->BEAST_EXPECT(*lv == -2);
g.signal();
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(*lv == -1);
for (int i = 0; i < N; ++i)
{
jq.postCoroTask(
jtCLIENT,
"CoroTaskTest",
[this, ap = &a, gp = &g, lvp = &lv, id = i](auto runner) -> CoroTask<void> {
(*ap)[id] = runner;
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(**lvp == -1);
**lvp = id;
this->BEAST_EXPECT(**lvp == id);
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(**lvp == id);
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
a[i]->join();
}
for (auto const& r : a)
{
r->post();
BEAST_EXPECT(g.wait_for(5s));
r->join();
}
for (auto const& r : a)
{
r->post();
r->join();
}
jq.addJob(jtCLIENT, "LocalValTest", [&]() {
this->BEAST_EXPECT(*lv == -2);
g.signal();
});
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(*lv == -1);
}
/**
* Exception thrown in coroutine body is caught by
* promise_type::unhandled_exception(). Coroutine completes.
*/
void
testExceptionPropagation()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("exception propagation");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [gp = &g](auto) -> CoroTask<void> {
gp->signal();
throw std::runtime_error("test exception");
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
// The exception is caught by promise_type::unhandled_exception()
// and the coroutine is considered done
BEAST_EXPECT(!runner->runnable());
}
/**
* Multiple sequential suspend/resume cycles via co_await.
*/
void
testMultipleYields()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("multiple yields");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int counter = 0;
std::shared_ptr<JobQueue::CoroTaskRunner> r;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT,
"CoroTaskTest",
[rp = &r, cp = &counter, gp = &g](auto runner) -> CoroTask<void> {
*rp = runner;
++(*cp);
gp->signal();
co_await runner->suspend();
++(*cp);
gp->signal();
co_await runner->suspend();
++(*cp);
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(counter == 1);
runner->join();
runner->post();
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(counter == 2);
runner->join();
runner->post();
BEAST_EXPECT(g.wait_for(5s));
BEAST_EXPECT(counter == 3);
runner->join();
BEAST_EXPECT(!runner->runnable());
}
/**
* CoroTask<T> returns a value via co_return. Outer coroutine
* extracts it with co_await.
*/
void
testValueReturn()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("value return");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int result = 0;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [rp = &result, gp = &g](auto) -> CoroTask<void> {
auto inner = []() -> CoroTask<int> { co_return 42; };
*rp = co_await inner();
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(result == 42);
BEAST_EXPECT(!runner->runnable());
}
/**
* CoroTask<T> propagates exceptions from inner coroutines.
* Outer coroutine catches via try/catch around co_await.
*/
void
testValueException()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("value exception");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
bool caught = false;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [cp = &caught, gp = &g](auto) -> CoroTask<void> {
auto inner = []() -> CoroTask<int> {
throw std::runtime_error("inner error");
co_return 0;
};
try
{
co_await inner();
}
catch (std::runtime_error const& e)
{
*cp = true;
}
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(caught);
BEAST_EXPECT(!runner->runnable());
}
/**
* CoroTask<T> chaining. Nested value-returning coroutines
* compose via co_await.
*/
void
testValueChaining()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("value chaining");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
gate g;
int result = 0;
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [rp = &result, gp = &g](auto) -> CoroTask<void> {
auto add = [](int a, int b) -> CoroTask<int> { co_return a + b; };
auto mul = [add](int a, int b) -> CoroTask<int> {
int sum = co_await add(a, b);
co_return sum * 2;
};
*rp = co_await mul(3, 4);
gp->signal();
co_return;
});
BEAST_EXPECT(runner);
BEAST_EXPECT(g.wait_for(5s));
runner->join();
BEAST_EXPECT(result == 14); // (3 + 4) * 2
BEAST_EXPECT(!runner->runnable());
}
/**
* postCoroTask returns nullptr when JobQueue is stopping.
*/
void
testShutdownRejection()
{
using namespace std::chrono_literals;
using namespace jtx;
testcase("shutdown rejection");
Env env(*this, envconfig([](std::unique_ptr<Config> cfg) {
cfg->FORCE_MULTI_THREAD = true;
return cfg;
}));
// Stop the JobQueue
env.app().getJobQueue().stop();
auto runner = env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTaskTest", [](auto) -> CoroTask<void> { co_return; });
BEAST_EXPECT(!runner);
}
void
run() override
{
testVoidCompletion();
testCorrectOrder();
testIncorrectOrder();
testJobQueueAwaiter();
testThreadSpecificStorage();
testExceptionPropagation();
testMultipleYields();
testValueReturn();
testValueException();
testValueChaining();
testShutdownRejection();
}
};
BEAST_DEFINE_TESTSUITE(CoroTask, core, xrpl);
} // namespace test
} // namespace xrpl

View File

@@ -40,6 +40,11 @@ public:
}
};
// NOTE: All coroutine lambdas passed to postCoroTask use explicit
// pointer-by-value captures instead of [&] to work around a GCC 14
// bug where reference captures in coroutine lambdas are corrupted
// in the coroutine frame.
void
correct_order()
{
@@ -54,13 +59,15 @@ public:
}));
gate g1, g2;
std::shared_ptr<JobQueue::Coro> c;
env.app().getJobQueue().postCoro(jtCLIENT, "CoroTest", [&](auto const& cr) {
c = cr;
g1.signal();
c->yield();
g2.signal();
});
std::shared_ptr<JobQueue::CoroTaskRunner> c;
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTest", [cp = &c, g1p = &g1, g2p = &g2](auto runner) -> CoroTask<void> {
*cp = runner;
g1p->signal();
co_await runner->suspend();
g2p->signal();
co_return;
});
BEAST_EXPECT(g1.wait_for(5s));
c->join();
c->post();
@@ -81,11 +88,17 @@ public:
}));
gate g;
env.app().getJobQueue().postCoro(jtCLIENT, "CoroTest", [&](auto const& c) {
c->post();
c->yield();
g.signal();
});
env.app().getJobQueue().postCoroTask(
jtCLIENT, "CoroTest", [gp = &g](auto runner) -> CoroTask<void> {
// Schedule a resume before suspending. The posted job
// cannot actually call resume() until the current resume()
// releases CoroTaskRunner::mutex_, which only happens after
// the coroutine suspends at co_await.
runner->post();
co_await runner->suspend();
gp->signal();
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
}
@@ -101,7 +114,7 @@ public:
auto& jq = env.app().getJobQueue();
static int const N = 4;
std::array<std::shared_ptr<JobQueue::Coro>, N> a;
std::array<std::shared_ptr<JobQueue::CoroTaskRunner>, N> a;
LocalValue<int> lv(-1);
BEAST_EXPECT(*lv == -1);
@@ -118,19 +131,23 @@ public:
for (int i = 0; i < N; ++i)
{
jq.postCoro(jtCLIENT, "CoroTest", [&, id = i](auto const& c) {
a[id] = c;
g.signal();
c->yield();
jq.postCoroTask(
jtCLIENT,
"CoroTest",
[this, ap = &a, gp = &g, lvp = &lv, id = i](auto runner) -> CoroTask<void> {
(*ap)[id] = runner;
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(*lv == -1);
*lv = id;
this->BEAST_EXPECT(*lv == id);
g.signal();
c->yield();
this->BEAST_EXPECT(**lvp == -1);
**lvp = id;
this->BEAST_EXPECT(**lvp == id);
gp->signal();
co_await runner->suspend();
this->BEAST_EXPECT(*lv == id);
});
this->BEAST_EXPECT(**lvp == id);
co_return;
});
BEAST_EXPECT(g.wait_for(5s));
a[i]->join();
}

View File

@@ -43,87 +43,91 @@ class JobQueue_test : public beast::unit_test::suite
}
}
// NOTE: All coroutine lambdas passed to postCoroTask use explicit
// pointer-by-value captures instead of [&] to work around a GCC 14
// bug where reference captures in coroutine lambdas are corrupted
// in the coroutine frame.
void
testPostCoro()
testPostCoroTask()
{
jtx::Env env{*this};
JobQueue& jQueue = env.app().getJobQueue();
{
// Test repeated post()s until the Coro completes.
// Test repeated post()s until the coroutine completes.
std::atomic<int> yieldCount{0};
auto const coro = jQueue.postCoro(
jtCLIENT,
"PostCoroTest1",
[&yieldCount](std::shared_ptr<JobQueue::Coro> const& coroCopy) {
while (++yieldCount < 4)
coroCopy->yield();
auto const runner = jQueue.postCoroTask(
jtCLIENT, "PostCoroTest1", [ycp = &yieldCount](auto runner) -> CoroTask<void> {
while (++(*ycp) < 4)
co_await runner->suspend();
co_return;
});
BEAST_EXPECT(coro != nullptr);
BEAST_EXPECT(runner != nullptr);
// Wait for the Job to run and yield.
while (yieldCount == 0)
;
// Now re-post until the Coro says it is done.
// Now re-post until the CoroTaskRunner says it is done.
int old = yieldCount;
while (coro->runnable())
while (runner->runnable())
{
BEAST_EXPECT(coro->post());
BEAST_EXPECT(runner->post());
while (old == yieldCount)
{
}
coro->join();
runner->join();
BEAST_EXPECT(++old == yieldCount);
}
BEAST_EXPECT(yieldCount == 4);
}
{
// Test repeated resume()s until the Coro completes.
// Test repeated resume()s until the coroutine completes.
int yieldCount{0};
auto const coro = jQueue.postCoro(
jtCLIENT,
"PostCoroTest2",
[&yieldCount](std::shared_ptr<JobQueue::Coro> const& coroCopy) {
while (++yieldCount < 4)
coroCopy->yield();
auto const runner = jQueue.postCoroTask(
jtCLIENT, "PostCoroTest2", [ycp = &yieldCount](auto runner) -> CoroTask<void> {
while (++(*ycp) < 4)
co_await runner->suspend();
co_return;
});
if (!coro)
if (!runner)
{
// There's no good reason we should not get a Coro, but we
// There's no good reason we should not get a runner, but we
// can't continue without one.
BEAST_EXPECT(false);
return;
}
// Wait for the Job to run and yield.
coro->join();
runner->join();
// Now resume until the Coro says it is done.
// Now resume until the CoroTaskRunner says it is done.
int old = yieldCount;
while (coro->runnable())
while (runner->runnable())
{
coro->resume(); // Resume runs synchronously on this thread.
runner->resume(); // Resume runs synchronously on this thread.
BEAST_EXPECT(++old == yieldCount);
}
BEAST_EXPECT(yieldCount == 4);
}
{
// If the JobQueue is stopped, we should no
// longer be able to add a Coro (and calling postCoro() should
// return false).
// longer be able to post a coroutine (and calling postCoroTask()
// should return nullptr).
using namespace std::chrono_literals;
jQueue.stop();
// The Coro should never run, so having the Coro access this
// The coroutine should never run, so having it access this
// unprotected variable on the stack should be completely safe.
// Not recommended for the faint of heart...
bool unprotected;
auto const coro = jQueue.postCoro(
jtCLIENT, "PostCoroTest3", [&unprotected](std::shared_ptr<JobQueue::Coro> const&) {
unprotected = false;
auto const runner = jQueue.postCoroTask(
jtCLIENT, "PostCoroTest3", [up = &unprotected](auto) -> CoroTask<void> {
*up = false;
co_return;
});
BEAST_EXPECT(coro == nullptr);
BEAST_EXPECT(runner == nullptr);
}
}
@@ -132,7 +136,7 @@ public:
run() override
{
testAddJob();
testPostCoro();
testPostCoroTask();
}
};

View File

@@ -6,6 +6,7 @@
#include <xrpld/rpc/RPCHandler.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/protocol/ApiVersion.h>
#include <xrpl/protocol/STParsedJSON.h>
#include <xrpl/resource/Fees.h>
@@ -193,7 +194,6 @@ AMMTest::find_paths_request(
c,
Role::USER,
{},
{},
RPC::apiVersionIfUnspecified},
{},
{}};
@@ -215,11 +215,11 @@ AMMTest::find_paths_request(
Json::Value result;
gate g;
app.getJobQueue().postCoro(jtCLIENT, "RPC-Client", [&](auto const& coro) {
app.getJobQueue().postCoroTask(jtCLIENT, "RPC-Client", [&](auto) -> CoroTask<void> {
context.params = std::move(params);
context.coro = coro;
RPC::doCommand(context, result);
g.signal();
co_return;
});
using namespace std::chrono_literals;

View File

@@ -1431,7 +1431,6 @@ ApplicationImp::setup(boost::program_options::variables_map const& cmdline)
c,
Role::ADMIN,
{},
{},
RPC::apiMaximumSupportedVersion},
jvCommand};

View File

@@ -3,6 +3,7 @@
#include <xrpl/beast/core/CurrentThreadName.h>
#include <xrpl/beast/net/IPAddressConversion.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/resource/Fees.h>
namespace xrpl {
@@ -99,13 +100,14 @@ GRPCServerImpl::CallData<Request, Response>::process()
// ensures that finished is always true when this CallData object
// is returned as a tag in handleRpcs(), after sending the response
finished_ = true;
auto coro = app_.getJobQueue().postCoro(
JobType::jtRPC, "gRPC-Client", [thisShared](std::shared_ptr<JobQueue::Coro> coro) {
thisShared->process(coro);
auto runner = app_.getJobQueue().postCoroTask(
JobType::jtRPC, "gRPC-Client", [thisShared](auto) -> CoroTask<void> {
thisShared->processRequest();
co_return;
});
// If coro is null, then the JobQueue has already been shutdown
if (!coro)
// If runner is null, then the JobQueue has already been shutdown
if (!runner)
{
grpc::Status status{grpc::StatusCode::INTERNAL, "Job Queue is already stopped"};
responder_.FinishWithError(status, this);
@@ -114,7 +116,7 @@ GRPCServerImpl::CallData<Request, Response>::process()
template <class Request, class Response>
void
GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::Coro> coro)
GRPCServerImpl::CallData<Request, Response>::processRequest()
{
try
{
@@ -156,7 +158,6 @@ GRPCServerImpl::CallData<Request, Response>::process(std::shared_ptr<JobQueue::C
app_.getLedgerMaster(),
usage,
role,
coro,
InfoSub::pointer(),
apiVersion},
request_};

View File

@@ -206,9 +206,12 @@ private:
clone() override;
private:
// process the request. Called inside the coroutine passed to JobQueue
/**
* Process the gRPC request. Called inside the CoroTask lambda
* posted to the JobQueue by process().
*/
void
process(std::shared_ptr<JobQueue::Coro> coro);
processRequest();
// return load type of this RPC
Resource::Charge

View File

@@ -3,7 +3,6 @@
#include <xrpld/rpc/Role.h>
#include <xrpl/beast/utility/Journal.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/server/InfoSub.h>
namespace xrpl {
@@ -24,7 +23,6 @@ struct Context
LedgerMaster& ledgerMaster;
Resource::Consumer& consumer;
Role role;
std::shared_ptr<JobQueue::Coro> coro{};
InfoSub::pointer infoSub{};
unsigned int apiVersion;
};

View File

@@ -169,13 +169,10 @@ public:
private:
Json::Value
processSession(
std::shared_ptr<WSSession> const& session,
std::shared_ptr<JobQueue::Coro> const& coro,
Json::Value const& jv);
processSession(std::shared_ptr<WSSession> const& session, Json::Value const& jv);
void
processSession(std::shared_ptr<Session> const&, std::shared_ptr<JobQueue::Coro> coro);
processSession(std::shared_ptr<Session> const&);
void
processRequest(
@@ -183,7 +180,6 @@ private:
std::string const& request,
beast::IP::Endpoint const& remoteIPAddress,
Output&&,
std::shared_ptr<JobQueue::Coro> coro,
std::string_view forwardedFor,
std::string_view user);

View File

@@ -14,6 +14,7 @@
#include <xrpl/basics/make_SSLContext.h>
#include <xrpl/beast/net/IPAddressConversion.h>
#include <xrpl/beast/rfc2616.h>
#include <xrpl/core/CoroTask.h>
#include <xrpl/core/JobQueue.h>
#include <xrpl/json/json_reader.h>
#include <xrpl/json/to_string.h>
@@ -284,9 +285,17 @@ ServerHandler::onRequest(Session& session)
}
std::shared_ptr<Session> detachedSession = session.detach();
auto const postResult = m_jobQueue.postCoro(
jtCLIENT_RPC, "RPC-Client", [this, detachedSession](std::shared_ptr<JobQueue::Coro> coro) {
processSession(detachedSession, coro);
auto const postResult = m_jobQueue.postCoroTask(
jtCLIENT_RPC, "RPC-Client", [this, detachedSession](auto) -> CoroTask<void> {
try
{
processSession(detachedSession);
}
catch (std::exception const& e)
{
JLOG(m_journal.error()) << "RPC-Client coroutine exception: " << e.what();
}
co_return;
});
if (postResult == nullptr)
{
@@ -322,17 +331,26 @@ ServerHandler::onWSMessage(
JLOG(m_journal.trace()) << "Websocket received '" << jv << "'";
auto const postResult = m_jobQueue.postCoro(
auto const postResult = m_jobQueue.postCoroTask(
jtCLIENT_WEBSOCKET,
"WS-Client",
[this, session, jv = std::move(jv)](std::shared_ptr<JobQueue::Coro> const& coro) {
auto const jr = this->processSession(session, coro, jv);
auto const s = to_string(jr);
auto const n = s.length();
boost::beast::multi_buffer sb(n);
sb.commit(boost::asio::buffer_copy(sb.prepare(n), boost::asio::buffer(s.c_str(), n)));
session->send(std::make_shared<StreambufWSMsg<decltype(sb)>>(std::move(sb)));
session->complete();
[this, session, jv = std::move(jv)](auto) -> CoroTask<void> {
try
{
auto const jr = this->processSession(session, jv);
auto const s = to_string(jr);
auto const n = s.length();
boost::beast::multi_buffer sb(n);
sb.commit(
boost::asio::buffer_copy(sb.prepare(n), boost::asio::buffer(s.c_str(), n)));
session->send(std::make_shared<StreambufWSMsg<decltype(sb)>>(std::move(sb)));
session->complete();
}
catch (std::exception const& e)
{
JLOG(m_journal.error()) << "WS-Client coroutine exception: " << e.what();
}
co_return;
});
if (postResult == nullptr)
{
@@ -373,10 +391,7 @@ logDuration(Json::Value const& request, T const& duration, beast::Journal& journ
}
Json::Value
ServerHandler::processSession(
std::shared_ptr<WSSession> const& session,
std::shared_ptr<JobQueue::Coro> const& coro,
Json::Value const& jv)
ServerHandler::processSession(std::shared_ptr<WSSession> const& session, Json::Value const& jv)
{
auto is = std::static_pointer_cast<WSInfoSub>(session->appDefined);
if (is->getConsumer().disconnect(m_journal))
@@ -443,7 +458,6 @@ ServerHandler::processSession(
app_.getLedgerMaster(),
is->getConsumer(),
role,
coro,
is,
apiVersion},
jv,
@@ -514,18 +528,14 @@ ServerHandler::processSession(
return jr;
}
// Run as a coroutine.
void
ServerHandler::processSession(
std::shared_ptr<Session> const& session,
std::shared_ptr<JobQueue::Coro> coro)
ServerHandler::processSession(std::shared_ptr<Session> const& session)
{
processRequest(
session->port(),
buffers_to_string(session->request().body().data()),
session->remoteAddress().at_port(0),
makeOutput(*session),
coro,
forwardedFor(session->request()),
[&] {
auto const iter = session->request().find("X-User");
@@ -562,7 +572,6 @@ ServerHandler::processRequest(
std::string const& request,
beast::IP::Endpoint const& remoteIPAddress,
Output&& output,
std::shared_ptr<JobQueue::Coro> coro,
std::string_view forwardedFor,
std::string_view user)
{
@@ -819,7 +828,6 @@ ServerHandler::processRequest(
app_.getLedgerMaster(),
usage,
role,
coro,
InfoSub::pointer(),
apiVersion},
params,

View File

@@ -7,6 +7,9 @@
#include <xrpl/protocol/RPCErr.h>
#include <xrpl/resource/Fees.h>
#include <condition_variable>
#include <mutex>
namespace xrpl {
// This interface is deprecated.
@@ -37,98 +40,40 @@ doRipplePathFind(RPC::JsonContext& context)
PathRequest::pointer request;
lpLedger = context.ledgerMaster.getClosedLedger();
// It doesn't look like there's much odd happening here, but you should
// be aware this code runs in a JobQueue::Coro, which is a coroutine.
// And we may be flipping around between threads. Here's an overview:
//
// 1. We're running doRipplePathFind() due to a call to
// ripple_path_find. doRipplePathFind() is currently running
// inside of a JobQueue::Coro using a JobQueue thread.
//
// 2. doRipplePathFind's call to makeLegacyPathRequest() enqueues the
// path-finding request. That request will (probably) run at some
// indeterminate future time on a (probably different) JobQueue
// thread.
//
// 3. As a continuation from that path-finding JobQueue thread, the
// coroutine we're currently running in (!) is posted to the
// JobQueue. Because it is a continuation, that post won't
// happen until the path-finding request completes.
//
// 4. Once the continuation is enqueued, and we have reason to think
// the path-finding job is likely to run, then the coroutine we're
// running in yield()s. That means it surrenders its thread in
// the JobQueue. The coroutine is suspended, but ready to run,
// because it is kept resident by a shared_ptr in the
// path-finding continuation.
//
// 5. If all goes well then path-finding runs on a JobQueue thread
// and executes its continuation. The continuation posts this
// same coroutine (!) to the JobQueue.
//
// 6. When the JobQueue calls this coroutine, this coroutine resumes
// from the line below the coro->yield() and returns the
// path-finding result.
//
// With so many moving parts, what could go wrong?
//
// Just in terms of the JobQueue refusing to add jobs at shutdown
// there are two specific things that can go wrong.
//
// 1. The path-finding Job queued by makeLegacyPathRequest() might be
// rejected (because we're shutting down).
//
// Fortunately this problem can be addressed by looking at the
// return value of makeLegacyPathRequest(). If
// makeLegacyPathRequest() cannot get a thread to run the path-find
// on, then it returns an empty request.
//
// 2. The path-finding job might run, but the Coro::post() might be
// rejected by the JobQueue (because we're shutting down).
//
// We handle this case by resuming (not posting) the Coro.
// By resuming the Coro, we allow the Coro to run to completion
// on the current thread instead of requiring that it run on a
// new thread from the JobQueue.
//
// Both of these failure modes are hard to recreate in a unit test
// because they are so dependent on inter-thread timing. However
// the failure modes can be observed by synchronously (inside the
// rippled source code) shutting down the application. The code to
// do so looks like this:
//
// context.app.signalStop();
// while (! context.app.getJobQueue().jobCounter().joined()) { }
//
// The first line starts the process of shutting down the app.
// The second line waits until no more jobs can be added to the
// JobQueue before letting the thread continue.
//
// May 2017
// makeLegacyPathRequest enqueues a path-finding job that runs
// asynchronously. We block this thread with a condition_variable
// until the path-finding continuation signals completion.
// If makeLegacyPathRequest cannot schedule the job (e.g. during
// shutdown), it returns an empty request and we skip the wait.
// Replaces the old Coro yield/resume pattern with synchronous
// blocking, eliminating shutdown race conditions.
std::mutex mtx;
std::condition_variable cv;
bool pathDone = false;
jvResult = context.app.getPathRequests().makeLegacyPathRequest(
request,
[&context]() {
// Copying the shared_ptr keeps the coroutine alive up
// through the return. Otherwise the storage under the
// captured reference could evaporate when we return from
// coroCopy->resume(). This is not strictly necessary, but
// will make maintenance easier.
std::shared_ptr<JobQueue::Coro> coroCopy{context.coro};
if (!coroCopy->post())
[&]() {
{
// The post() failed, so we won't get a thread to let
// the Coro finish. We'll call Coro::resume() so the
// Coro can finish on our thread. Otherwise the
// application will hang on shutdown.
coroCopy->resume();
std::lock_guard lk(mtx);
pathDone = true;
}
cv.notify_one();
},
context.consumer,
lpLedger,
context.params);
if (request)
{
context.coro->yield();
using namespace std::chrono_literals;
std::unique_lock lk(mtx);
if (!cv.wait_for(lk, 30s, [&] { return pathDone; }))
{
// Path-finding continuation never fired (e.g. shutdown
// race or unexpected failure). Return an internal error
// rather than blocking the RPC thread indefinitely.
return rpcError(rpcINTERNAL);
}
jvResult = request->doStatus(context.params);
}