-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: rework resolver and balancer wrappers to avoid deadlock #6804
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #6804 +/- ##
==========================================
+ Coverage 83.40% 83.59% +0.18%
==========================================
Files 285 285
Lines 30879 30763 -116
==========================================
- Hits 25754 25715 -39
+ Misses 4053 3987 -66
+ Partials 1072 1061 -11
|
- All channel idleness management goes through idle manager -- e.g. Connect(). - Add lock and closed state to balancer_wrapper to prevent a race where the balancer creates a subchannel and we enter idle mode during that process. - Split resolver wrapper into constructor and start method so the serializer is available during resolver.Build. - Add EnterIdleModeForTesting to idleness manager and use idlenessMgr methods in the idleness test instead of methods on the ClientConn which should not be called directly.
@zasweq I may have misunderstood, so if @arvindbr8 can review this tomorrow then feel free to ignore. Otherwise if you could review this next week so we can have it done in time to cherry-pick it to the release branch, that would be great. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a style pass. Technically, I have a very minute understanding of the original Exit Idle code, and this fix. I think since Easwar is back next week he should be primary reviewer since he wrote the ExitIdle code. We should also have a technical sync where you briefly describe to us the original code, all the issues, the fix and why it overall fixes.
@dfawley : I've only looked through the idleness code so far. But assigning back to you to make the requested changes while I will continue to review other changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All comments address (I think!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM module some nits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments.
balancer_wrapper.go
Outdated
@@ -184,115 +168,49 @@ func (ccb *ccBalancerWrapper) buildLoadBalancingPolicy(name string) { | |||
ccb.curBalancerName = builder.Name() | |||
} | |||
|
|||
// close initiates async shutdown of the wrapper. To determine the wrapper has | |||
// finished shutting down, the channel should block on ccb.serializer.Done() | |||
// without cc.mu held. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could we also please mention here that the caller (channel) should hold cc.mu
when invoking close()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// ccBalancerWrapper sits between the ClientConn and the Balancer. | ||
// | ||
// ccBalancerWrapper implements methods corresponding to the ones on the | ||
// balancer.Balancer interface. The ClientConn is free to call these methods | ||
// concurrently and the ccBalancerWrapper ensures that calls from the ClientConn | ||
// to the Balancer happen synchronously and in order. | ||
// to the Balancer happen in order by performing them in the serializer, without | ||
// any mutexes held. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do hold ccb.mu
in close()
, and require that the caller hold cc.mu
when calling into close()
. Would it be better to have that spelled out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is about how the calls into the plugin, themselves, are made without cc.mu
held. Not that calls into ccb
must not hold cc.mu
.
Probably we should just document the functions. (Done, and it looks like the only one that matters either way is close
, which needs cc.mu
.)
@@ -313,10 +231,6 @@ func (ccb *ccBalancerWrapper) RemoveSubConn(sc balancer.SubConn) { | |||
} | |||
|
|||
func (ccb *ccBalancerWrapper) UpdateAddresses(sc balancer.SubConn, addrs []resolver.Address) { | |||
if ccb.isIdleOrClosed() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not want to check if the balancer is closed here? UpdateAddresses
could lead to transport creation, right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more relevant whether the SubConn/addrConn was closed and not whether the balancer was closed. And we do that check here:
Line 1031 in 93389b7
if ac.state == connectivity.Shutdown || |
resolver_balancer_ext_test.go
Outdated
@@ -0,0 +1,264 @@ | |||
/* | |||
* | |||
* Copyright 2014 gRPC authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2023?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O_O
resolver_balancer_ext_test.go
Outdated
// the channel enters idle mode. | ||
func (s) TestEnterIdleDuringResolverUpdateState(t *testing.T) { | ||
enterIdle := internal.EnterIdleModeForTesting.(func(*grpc.ClientConn)) | ||
const name = "testeidrus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: In each of these tests, could we use t.Name()
as the resolver name instead? I think a shortened name could be very confusing for someone looking at a failed test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that originally, but it doesn't work because of case/slashes. I can fix it, though. Done.
balancer creates a subchannel and we enter idle mode during that process.
available during resolver.Build.
in the idleness test instead of methods on the ClientConn which should not be
called directly.
Fixes #6783
RELEASE NOTES: