Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

registry/rpc: use simpleBalancer instead of ClientConn.State() #1673

Merged
merged 3 commits into from
Nov 24, 2016

Conversation

dongsupark
Copy link
Contributor

As ClientConn.State() of gRPC has disappeared, we need to avoid using ClientConn.State(). Instead we should make use of gRPC rebalancer mechanism, just like etcdv3 is doing. To do that, introduce simpleBalancer, as a minimum structure to be used for grpc.Balancer.

Note that this change could be possibly invasive to the current behavior of fleet's gRPC. A careful review is needed.

/cc @hectorj2f
Fixes #1672

log.Infof("Status of rpc service: %d, connection state: %s", status, connState)
return status == pb.HealthCheckResponse_SERVING && err == nil
}
// NOTE: checking for readyc doesn't work as expected.
Copy link
Contributor

@hectorj2f hectorj2f Sep 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part WIP ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah right. I need to have a deeper look into this one. Unfortunately checking the readyc doesn't work as expected. If you have a better idea, please let me know. TBH I'm just a beginner of the new API of gRPC v1.0 or etcdv3.

@hectorj2f
Copy link
Contributor

@dongsupark could add a bigger description about what the balancer does ?
Also, this PR is a bit big. What do you think about a PR for the vendoring packages and another with the logic ? That would make the reviewing much easier and faster.

@dongsupark
Copy link
Contributor Author

@hectorj2f Sure, I will split it into multiple PRs, and add more descriptions.

@dongsupark dongsupark force-pushed the dongsu/grpc-fix-build-clientconn branch from 6d3a053 to f5f9223 Compare September 8, 2016 14:58
@dongsupark dongsupark changed the title registry/rpc: fix build errors from ClientConn.State() in gRPC registry/rpc: use simpleBalancer instead of ClientConn.State() Sep 8, 2016
@dongsupark
Copy link
Contributor Author

Force-pushed it by changing the following:

  • Fixed bugs in IsRegistryReady(). This function still needs to check r.Status() like before. It also needs to check if r.registryConn or r.balancer is nil.
  • Fixed a bug in rpcAcquireLeadership() by inserting reg.IsRegistryReady() to the condition for checking existing lease. This bug resulted in another failure in TestDynamicClusterNewMemberUnitMigration(), which got me locked up in a debugging nightmare. (Now it's solved)
  • Simplified balancer.go, removing unnecessary parts like getHost().
  • Added high-level descriptions on simpleBalancer.

And I created 2 PRs, #1676 (only vendor updates) and #1677 (relevant code), for easier reviews.

@dongsupark
Copy link
Contributor Author

FYI.
Running benchmarks with this PR included, I can observe performance regressions. Nomi benchmarks starting thousands of units over a multi-node cluster, unit starts take much longer time than the current master branch.
I'm not sure why. (gRPC balancer issue? Bug in my patches?)
Until we figured out its reason, let's not merge this PR.

Dongsu Park added 2 commits November 10, 2016 11:36
Update vendor/google.golang.org/grpc to the most recent master,
as of 2016-08-26.
Now that gRPC as well as other dependencies were updated, we need to update
fleet.pb.go too.
@dongsupark dongsupark force-pushed the dongsu/grpc-fix-build-clientconn branch 2 times, most recently from 4ef338e to ecb121a Compare November 10, 2016 11:26
As ClientConn.State() of gRPC has disappeared, we need to avoid using
ClientConn.State(). Instead we should make use of gRPC rebalancer
mechanism, just like etcdv3 is doing. To do that, introduce
simpleBalancer, as a minimum structure to be used for grpc.Balancer.
@dongsupark
Copy link
Contributor Author

dongsupark commented Nov 23, 2016

Running benchmarks with this PR included, I can observe performance regressions. Nomi benchmarks starting thousands of units over a multi-node cluster, unit starts take much longer time than the current master branch.

I finally got some time to do nomi benchmarks again. This time, I cannot see the regression any more.

When tested with etcd v3, results of master branch and this branch are nearly the same.
With etcd v2, this branch is even faster than master.
That's actually an expected result, as it's already known that performance got improved with gRPC v1.0, compared to the previous versions.

What I saw in September was probably just because of mistake in testing.
Tomorrow I'll run again benchmarks several times, and if it's still green, I'll merge this PR.

@dongsupark
Copy link
Contributor Author

I can confirm that its performance is all right.
With etcd2, unit start time was improved by 16%. With etcd3, it's nearly the same.
Benchmark raw data is available on https://github.com/dongsupark/fleet-benchmarks.
I'll merge this PR. Thanks!

@dongsupark dongsupark merged commit 8ecf7e8 into coreos:master Nov 24, 2016
@dongsupark dongsupark deleted the dongsu/grpc-fix-build-clientconn branch November 24, 2016 14:43
dongsupark pushed a commit that referenced this pull request Nov 24, 2016
registry/rpc: use simpleBalancer instead of ClientConn.State()
dongsupark pushed a commit that referenced this pull request Nov 24, 2016
registry/rpc: use simpleBalancer instead of ClientConn.State()
dongsupark pushed a commit that referenced this pull request Nov 24, 2016
registry/rpc: use simpleBalancer instead of ClientConn.State()
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants