Compare commits

...

1 Commits

Author SHA1 Message Date
Pratik Mankawde
331b4739ae fix: Guard Coro::resume() against completed coroutines
A documented race condition (JobQueue.h:354-377) allows post() to
schedule a resume() job that executes after the coroutine has already
completed. With boost::coroutine v1, calling operator() on a completed
coroutine was technically UB but happened to be benign in Release
builds (BOOST_ASSERT compiled out). After the switch to
boost::coroutine2, the same call becomes genuine UB — the underlying
boost::context::continuation is in a moved-from state, leading to
crashes or memory corruption.

The XRPL_ASSERT at Coro.ipp:79 caught this in CI Release builds with
-Dassert=ON, surfacing as an intermittent assertion failure.

Replace the assertion with an early-return guard: if coro_ is already
exhausted when resume() acquires the mutex, clean up and return. This
makes the documented race condition safe by design rather than relying
on undefined behavior being benign.

Update the race condition documentation and resume() doc comment to
reflect the new behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 15:30:30 +00:00
2 changed files with 21 additions and 5 deletions

View File

@@ -76,7 +76,22 @@ JobQueue::Coro::resume()
auto saved = detail::getLocalValues().release();
detail::getLocalValues().reset(&lvs_);
std::lock_guard lock(mutex_);
XRPL_ASSERT(static_cast<bool>(coro_), "xrpl::JobQueue::Coro::resume : is runnable");
// A late resume() can arrive after the coroutine has already completed.
// This is an expected (if rare) outcome of the race condition documented
// in JobQueue.h:354-377 where post() schedules a resume job before the
// coroutine yields — the mutex serializes access, but by the time this
// resume() acquires the lock the coroutine may have already run to
// completion. Calling operator() on a completed boost::coroutine2 is
// undefined behavior, so we must check and return early.
if (!coro_)
{
detail::getLocalValues().release();
detail::getLocalValues().reset(saved);
std::lock_guard lk(mutex_run_);
running_ = false;
cv_.notify_all();
return;
}
coro_();
detail::getLocalValues().release();
detail::getLocalValues().reset(saved);

View File

@@ -99,8 +99,8 @@ public:
Effects:
The coroutine continues execution from where it last left off
using this same thread.
Undefined behavior if called after the coroutine has completed
with a return (as opposed to a yield()).
If the coroutine has already completed, returns immediately
(handles the documented post-before-yield race condition).
Undefined behavior if resume() or post() called consecutively
without a corresponding yield.
*/
@@ -357,8 +357,9 @@ private:
If the post() job were to be executed before yield(), undefined behavior
would occur. The lock ensures that coro_ is not called again until we exit
the coroutine. At which point a scheduled resume() job waiting on the lock
would gain entry, harmlessly call coro_ and immediately return as we have
already completed the coroutine.
would gain entry. resume() checks if the coroutine has already completed
(coro_ converts to false) and returns early if so, since calling operator()
on a completed boost::coroutine2 pull_type is undefined behavior.
The race condition occurs as follows: