<condition_variable>: Parallel invocations of stop token condition_variable_any::wait_until can result in deadlocks #2218

Closed
TruePikachu opened this issue Sep 26, 2021 · 5 comments · Fixed by #2220
Labels
bug Something isn't working fixed Something works now, yay!

Comments

@TruePikachu

Parallel calls (e.g. from multiple threads) to the stop_token overload of std::condition_variable_any::wait_until can deadlock. Specifically, given

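#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stop_token>

std::mutex mutex;
std::condition_variable_any cv;
std::stop_token stoken;
auto pred = []()->bool { return false; };  // never satisfied, so only the deadline ends the wait

void do_work() {
    std::unique_lock lock(mutex);
    // The deadline has already passed by the time the wait checks it.
    cv.wait_until(lock, stoken, std::chrono::steady_clock::now(), pred);
}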

Invoking do_work on two separate threads can result in a deadlock:

  • Thread 1 takes the lock on mutex; thread 2 fails to get it and blocks.
  • Thread 1 enters wait_until, which locks the internal guard and then unlocks mutex.
  • Thread 2 now unblocks and takes the lock on mutex.
  • Thread 2 enters wait_until and blocks trying to lock the guard.
  • Thread 1, still inside wait_until, sees that the wait has expired and breaks out of an internal loop. This takes three variables out of scope, which are destroyed in the reverse order of their construction:
    • The first object destroyed is _Now, which is trivial.
    • The second object destroyed is _Unlock_outer, whose destructor needs to take mutex, which is currently locked by thread 2.
    • The third object destroyed is _Guard, whose destructor would unlock the guard thread 2 is currently waiting for.

This results in thread 1 wanting to take a lock held by thread 2, and thread 2 wanting to take a lock held by thread 1. There are, of course, different sequences of events which can also lead to this (thread 1 could have been waiting for a while already, or thread 2 could have taken the lock immediately but then yielded for thread 1).
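To make the destruction order concrete, here is a minimal single-threaded sketch. It does not use the real STL internals: Tracer and destruction_order are hypothetical names introduced only for illustration, and the three locals simply borrow the names from the steps above.

```cpp
#include <string>
#include <vector>

// Hypothetical tracer: appends its name to a log when it is destroyed.
struct Tracer {
    std::string name;
    std::vector<std::string>& log;
    ~Tracer() { log.push_back(name); }
};

// Constructs three locals in the order described above, then leaves the
// loop body via break so that their destructors run.
std::vector<std::string> destruction_order() {
    std::vector<std::string> log;
    for (;;) {
        Tracer guard{"_Guard", log};                // constructed first
        Tracer unlock_outer{"_Unlock_outer", log};  // constructed second
        Tracer now{"_Now", log};                    // constructed third
        break;  // locals are destroyed in reverse order of construction
    }
    return log;
}
```

The returned log lists _Now, then _Unlock_outer, then _Guard, so whatever _Unlock_outer does in its destructor happens while _Guard is still alive.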

I'm not versed well enough in the actual implementation of the STL to attempt a fix myself, but I've currently worked around the issue by using the non-stop-token version of the method.

@AlexGuteniev
Contributor

  • The second object destroyed is _Unlock_outer, whose destructor needs to take mutex, which is currently locked by thread 2.
  • The third object destroyed is _Guard, whose destructor would unlock the guard thread 2 is currently waiting for.

Doesn't _Guard already release the _Lck here:

_Guard.unlock();

?

@TruePikachu
Author

Execution only reaches that point if _Now < _Abs_time, i.e. the provided time point hasn't been reached yet. The issue arises when it has been reached.

@AlexGuteniev
Contributor

Oh, I see. Then we should probably call _Guard.unlock() before this break.
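A single-threaded model of that idea, as a sketch only: RelockOnExit and guard_held_during_relock are hypothetical names, the two std::mutex objects stand in for the caller's mutex and the internal guard, and this is not the actual change in the STL sources.

```cpp
#include <mutex>

// Hypothetical stand-in for the helper called _Unlock_outer in the report:
// releases the caller's lock on construction and reacquires it on
// destruction. It also records whether the internal guard was still owned
// at the moment of reacquisition.
class RelockOnExit {
public:
    RelockOnExit(std::unique_lock<std::mutex>& outer,
                 const std::unique_lock<std::mutex>& guard,
                 bool& held_out)
        : outer_(outer), guard_(guard), held_(held_out) {
        outer_.unlock();
    }
    ~RelockOnExit() {
        held_ = guard_.owns_lock();
        outer_.lock();  // in the real scenario another thread may hold this
    }
    RelockOnExit(const RelockOnExit&) = delete;
    RelockOnExit& operator=(const RelockOnExit&) = delete;

private:
    std::unique_lock<std::mutex>& outer_;
    const std::unique_lock<std::mutex>& guard_;
    bool& held_;
};

// Models the expired-deadline exit path and reports whether the guard was
// still held while the outer mutex was being reacquired.
bool guard_held_during_relock(bool unlock_guard_before_break) {
    std::mutex outer_mutex;
    std::mutex guard_mutex;
    bool held = false;
    std::unique_lock<std::mutex> outer(outer_mutex);
    for (;;) {
        std::unique_lock<std::mutex> guard(guard_mutex);  // constructed first
        RelockOnExit unlock_outer(outer, guard, held);    // constructed second
        if (unlock_guard_before_break) {
            guard.unlock();  // proposed change: drop the guard explicitly
        }
        break;  // unlock_outer is destroyed before guard
    }
    return held;
}
```

In this model guard_held_during_relock(false) reports that the guard is still held during the relock, which is the lock-order inversion from the report, while guard_held_during_relock(true) reports that it has already been released.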

@AlexGuteniev
Contributor

See #2220

@CaseyCarter CaseyCarter added the bug Something isn't working label Sep 27, 2021
@CaseyCarter
Member

Thanks for the report, and thanks for the fix!

@StephanTLavavej StephanTLavavej added the fixed Something works now, yay! label Oct 20, 2021
4 participants