fix: Guard Coro::resume() against completed coroutines

A documented race condition (JobQueue.h:354-377) allows post() to schedule a resume() job that executes after the coroutine has already completed. With boost::coroutine v1, calling operator() on a completed coroutine was technically UB but happened to be benign in Release builds (BOOST_ASSERT compiled out). After the switch to boost::coroutine2, the same call becomes genuine UB — the underlying boost::context::continuation is in a moved-from state, leading to crashes or memory corruption. The XRPL_ASSERT at Coro.ipp:79 caught this in CI Release builds with -Dassert=ON, surfacing as an intermittent assertion failure. Replace the assertion with an early-return guard: if coro_ is already exhausted when resume() acquires the mutex, clean up and return. This makes the documented race condition safe by design rather than relying on undefined behavior being benign. Update the race condition documentation and resume() doc comment to reflect the new behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 12:12:34 +00:00 · 2026-03-20 15:30:30 +00:00
2 changed files with 21 additions and 5 deletions
--- a/include/xrpl/core/Coro.ipp
+++ b/include/xrpl/core/Coro.ipp
@@ -76,7 +76,22 @@ JobQueue::Coro::resume()
    auto saved = detail::getLocalValues().release();
    detail::getLocalValues().reset(&lvs_);
    std::lock_guard lock(mutex_);
-    XRPL_ASSERT(static_cast<bool>(coro_), "xrpl::JobQueue::Coro::resume : is runnable");
+    // A late resume() can arrive after the coroutine has already completed.
+    // This is an expected (if rare) outcome of the race condition documented
+    // in JobQueue.h:354-377 where post() schedules a resume job before the
+    // coroutine yields — the mutex serializes access, but by the time this
+    // resume() acquires the lock the coroutine may have already run to
+    // completion. Calling operator() on a completed boost::coroutine2 is
+    // undefined behavior, so we must check and return early.
+    if (!coro_)
+    {
+        detail::getLocalValues().release();
+        detail::getLocalValues().reset(saved);
+        std::lock_guard lk(mutex_run_);
+        running_ = false;
+        cv_.notify_all();
+        return;
+    }
    coro_();
    detail::getLocalValues().release();
    detail::getLocalValues().reset(saved);
--- a/include/xrpl/core/JobQueue.h
+++ b/include/xrpl/core/JobQueue.h
@@ -99,8 +99,8 @@ public:
            Effects:
               The coroutine continues execution from where it last left off
                 using this same thread.
-            Undefined behavior if called after the coroutine has completed
-              with a return (as opposed to a yield()).
+               If the coroutine has already completed, returns immediately
+                 (handles the documented post-before-yield race condition).
            Undefined behavior if resume() or post() called consecutively
              without a corresponding yield.
        */
@@ -357,8 +357,9 @@ private:
    If the post() job were to be executed before yield(), undefined behavior
    would occur. The lock ensures that coro_ is not called again until we exit
    the coroutine. At which point a scheduled resume() job waiting on the lock
-    would gain entry, harmlessly call coro_ and immediately return as we have
-    already completed the coroutine.
+    would gain entry. resume() checks if the coroutine has already completed
+    (coro_ converts to false) and returns early if so, since calling operator()
+    on a completed boost::coroutine2 pull_type is undefined behavior.

    The race condition occurs as follows: