Fix race in grpcsync.CallbackSerializer #6778
Conversation
Codecov Report
@@             Coverage Diff             @@
##           master    #6778      +/-   ##
==========================================
+ Coverage   83.34%   83.51%   +0.17%
==========================================
  Files         285      285
  Lines       30966    30965       -1
==========================================
+ Hits        25809    25861      +52
+ Misses       4076     4036      -40
+ Partials     1081     1068      -13
Actually, now I am not sure how exactly this could happen:
My guess is that close was called twice at the same time.
Thanks for the fix. Please see the inline comment for a simplification.
@@ -85,11 +85,10 @@ func (cs *CallbackSerializer) run(ctx context.Context) {
 			// Do nothing here. Next iteration of the for loop will not happen,
 			// since ctx.Err() would be non-nil.
 		case callback, ok := <-cs.callbacks.Get():
-			if !ok {
-				return
+			if ok {
I don't think we should need to read or check `ok` at all here. There's no way `ok` can ever be false, because the channel is only closed later. We'll unblock the `select` only when the context is canceled or a callback is created.
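To make that argument concrete, here is a minimal, self-contained sketch (illustrative only, not the actual grpcsync code; the channel and function names are made up): because nothing closes the callback channel while the loop is live, the receive inside the `select` never needs its `ok` result, and the loop exits solely through the `ctx.Err()` check.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// run mirrors the shape of the loop under discussion: it exits via the
// ctx.Err() check after cancellation, so the receive never needs to look at
// the channel's closed state while the loop is running.
func run(ctx context.Context, callbacks <-chan func(context.Context)) {
	for ctx.Err() == nil {
		select {
		case <-ctx.Done():
			// Do nothing here; the next loop-condition check sees a
			// non-nil ctx.Err() and terminates the loop.
		case cb := <-callbacks:
			cb(ctx)
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	callbacks := make(chan func(context.Context), 1)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		run(ctx, callbacks)
	}()

	executed := make(chan struct{})
	callbacks <- func(context.Context) {
		fmt.Println("callback executed")
		close(executed)
	}
	<-executed // Ensure the callback ran before shutting the loop down.

	cancel()
	wg.Wait()
}
```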
Ah, right. Because …
There is no …
One thing strange: …
Aha, I found it. Line 429 in be1d1c1
And Line 829 in be1d1c1
So if …
This makes sense; however, I am not sure what the best way to fix it is without adding a lot of additional complexity. I guess it might be easier for you to close this PR and do the fix yourself?
Sounds good. I'm not sure what the right fix is at this point either, so it may take a while, unfortunately.
#6783 FYI
Here is the race condition that we encountered on one of our servers.
My understanding of this trace is the following:
The fix is to always check the `backlog` while holding the `closedMu` lock before returning from `CallbackSerializer.run()`.
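As a rough illustration of that pattern (a hand-rolled sketch under assumed names such as `backlog`, `closedMu`, `kick`, and `drain`, not the actual grpc-go implementation or the fix that eventually landed in #6783): draining the backlog under `closedMu` before `run` returns means a callback scheduled concurrently with shutdown is either executed during the drain or rejected once `closed` is set, so it can no longer be silently dropped.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// serializer is a stripped-down stand-in for a callback serializer. Schedule
// appends to a backlog under closedMu; run drains the backlog under the same
// lock before returning, so a callback scheduled concurrently with shutdown is
// either executed here or rejected by Schedule once closed is set.
type serializer struct {
	closedMu sync.Mutex
	closed   bool
	backlog  []func(context.Context)
	kick     chan struct{} // Signals run that the backlog is non-empty.
}

func newSerializer() *serializer {
	return &serializer{kick: make(chan struct{}, 1)}
}

// Schedule queues f for execution and reports whether it was accepted.
func (s *serializer) Schedule(f func(context.Context)) bool {
	s.closedMu.Lock()
	defer s.closedMu.Unlock()
	if s.closed {
		return false
	}
	s.backlog = append(s.backlog, f)
	select {
	case s.kick <- struct{}{}:
	default: // A kick is already pending; run will pick this callback up too.
	}
	return true
}

// drain grabs the current backlog under the lock and runs it outside the lock.
func (s *serializer) drain(ctx context.Context) {
	s.closedMu.Lock()
	pending := s.backlog
	s.backlog = nil
	s.closedMu.Unlock()
	for _, f := range pending {
		f(ctx)
	}
}

func (s *serializer) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			// Mark closed first (so Schedule starts rejecting), then drain
			// whatever made it into the backlog before the flag was set.
			s.closedMu.Lock()
			s.closed = true
			s.closedMu.Unlock()
			s.drain(ctx)
			return
		case <-s.kick:
			s.drain(ctx)
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	s := newSerializer()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		s.run(ctx)
	}()

	s.Schedule(func(context.Context) { fmt.Println("executed before or during shutdown") })
	cancel()
	wg.Wait()
	fmt.Println("accepted after close:", s.Schedule(func(context.Context) {}))
}
```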