fix: Guard Coro::resume() against completed coroutines (#6608)

Signed-off-by: Pratik Mankawde <3397372+pratikmankawde@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Pratik Mankawde
2026-03-27 18:52:18 +00:00
committed by Bart
parent 29aba28f5b
commit 0c76bf991a
2 changed files with 19 additions and 7 deletions

View File

@@ -70,14 +70,24 @@ JobQueue::Coro::resume()
running_ = true;
}
{
std::lock_guard lock(jq_.m_mutex);
std::lock_guard lk(jq_.m_mutex);
--jq_.nSuspend_;
}
auto saved = detail::getLocalValues().release();
detail::getLocalValues().reset(&lvs_);
std::lock_guard lock(mutex_);
XRPL_ASSERT(static_cast<bool>(coro_), "xrpl::JobQueue::Coro::resume : is runnable");
coro_();
// A late resume() can arrive after the coroutine has already completed.
// This is an expected (if rare) outcome of the race condition documented
// in JobQueue.h:354-377 where post() schedules a resume job before the
// coroutine yields — the mutex serializes access, but by the time this
// resume() acquires the lock the coroutine may have already run to
// completion. Calling operator() on a completed boost::coroutine2 is
// undefined behavior, so we must check and skip invoking the coroutine
// body if it has already completed.
if (coro_)
{
coro_();
}
detail::getLocalValues().release();
detail::getLocalValues().reset(saved);
std::lock_guard lk(mutex_run_);

View File

@@ -99,8 +99,8 @@ public:
Effects:
The coroutine continues execution from where it last left off
using this same thread.
Undefined behavior if called after the coroutine has completed
with a return (as opposed to a yield()).
If the coroutine has already completed, returns immediately
(handles the documented post-before-yield race condition).
Undefined behavior if resume() or post() called consecutively
without a corresponding yield.
*/
@@ -357,8 +357,10 @@ private:
If the post() job were to be executed before yield(), undefined behavior
would occur. The lock ensures that coro_ is not called again until we exit
the coroutine. At which point a scheduled resume() job waiting on the lock
would gain entry, harmlessly call coro_ and immediately return as we have
already completed the coroutine.
would gain entry. resume() checks if the coroutine has already completed
(coro_ converts to false) and, if so, skips invoking operator() since
calling operator() on a completed boost::coroutine2 pull_type is undefined
behavior.
The race condition occurs as follows: