[manager/dispatcher] Fix deadlock in dispatcher #2744

Merged 1 commit on Sep 10, 2018

Commits on Sep 10, 2018

  1. Fix deadlock in dispatcher

    There was a rare case where the dispatcher could end up deadlocked when
    calling stop, which would cause the whole leadership change procedure to
    go sideways, the dispatcher to pile up with goroutines, and the node to
    crash.
    
    In a nutshell, calls to the Session RPC end up in a (*Cond).Wait(),
    waiting for a Broadcast that, once Stop is called, may never come. To
    avoid that case, Stop, after being called and canceling the Dispatcher
    context, does one final Broadcast to wake the sleeping waiters.
    
    However, because the rpcRW lock, which stops Stop from proceeding until
    all RPCs have returned, was previously obtained BEFORE the call to
    Broadcast, Stop would never reach this final Broadcast call, waiting on
    the Session RPCs to release the rpcRW lock, which they could not do
    until Broadcast was called. Hence, deadlock.
    
    To fix this, we simply have to move this final Broadcast above the
    attempt to acquire the rpcRW lock, allowing everything to proceed
    correctly.
    
    Signed-off-by: Drew Erny <[email protected]>
    dperny committed Sep 10, 2018