etcdmain: configure keep-alive policy from server #8258
Conversation
Force-pushed from b2e14fa to 0be8a68.
@@ -0,0 +1,36 @@
// Copyright 2017 The etcd Authors
Instead of a separate package under etcdserver, could this all go in embed and extend v3rpc.Server() to accept a list of server options?
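A minimal sketch of what that could look like, assuming a variadic `gopts` parameter; the helper name and TLS handling are illustrative, not the real v3rpc signature:

```go
package v3rpcsketch

import (
	"crypto/tls"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// newServer builds the base options as v3rpc does today and lets the caller
// (embed) append extra grpc.ServerOptions, e.g. a keepalive enforcement policy.
func newServer(tlsCfg *tls.Config, gopts ...grpc.ServerOption) *grpc.Server {
	var opts []grpc.ServerOption
	if tlsCfg != nil {
		opts = append(opts, grpc.Creds(credentials.NewTLS(tlsCfg)))
	}
	opts = append(opts, gopts...)
	return grpc.NewServer(opts...)
}
```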
clientv3/client.go
	opts = append(opts, grpc.WithKeepaliveParams(params))

	params := keepalive.ClientParameters{
		Time: kp.Policy.MinTime,
Is there a reason why this can't be kept user configurable?
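For reference, a minimal sketch of the user-configurable alternative, using the DialKeepAliveTime field that appears later in this thread (the Timeout field here is illustrative): the ping interval comes from the client config rather than being pinned to the server policy's MinTime.

```go
package clientsketch

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// keepAliveDialOption builds the dial option from user-supplied values.
func keepAliveDialOption(keepAliveTime, keepAliveTimeout time.Duration) grpc.DialOption {
	return grpc.WithKeepaliveParams(keepalive.ClientParameters{
		Time:    keepAliveTime,    // user-chosen interval between pings on an idle connection
		Timeout: keepAliveTimeout, // how long to wait for a ping ack before closing the connection
	})
}
```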
Force-pushed from c3561de to 82ccb21.
We need a way to check whether the client is configured to ping too frequently and trigger too_many_pings from the server (slow pings are fine). So there needs to be a separate package to import the server-side configuration. If we defined it in embed, clientv3 would need to vendor all server-side dependencies.
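A sketch of that separate package (path and name hypothetical): it carries only the policy value, so both the server and clientv3 can import it without clientv3 vendoring any server-side code.

```go
// Package keepalivepolicy is a dependency-free home for the server's
// keep-alive enforcement policy values (hypothetical sketch).
package keepalivepolicy

import "time"

// MinTime is the smallest ping interval the server tolerates; clients that
// ping more often get a GOAWAY with too_many_pings.
const MinTime = 5 * time.Second
```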
clientv3/client.go
@@ -217,6 +218,10 @@ func (c *Client) dialSetupOpts(endpoint string, dopts ...grpc.DialOption) (opts
		opts = []grpc.DialOption{grpc.WithTimeout(c.cfg.DialTimeout)}
	}
	if c.cfg.DialKeepAliveTime > 0 {
		// prevent incompatible settings that can lead to connection close from server-side
It should be OK to let this be misconfigured and let the client have trouble (instead of panicking), since it's opt-in. In the future the etcd server may need a lower MinTime, and it'd be nicer if older clients could use it without any trouble.
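A sketch of that opt-in behavior: warn rather than panic when the configured interval is below the server minimum; serverMinTime here is illustrative, not an actual clientv3 field.

```go
package clientsketch

import (
	"log"
	"time"
)

// checkKeepAliveTime keeps the user's value and only logs when it looks too
// aggressive, so older clients still work if the server lowers MinTime later.
func checkKeepAliveTime(requested, serverMinTime time.Duration) time.Duration {
	if requested > 0 && requested < serverMinTime {
		log.Printf("keepalive time %v is below server minimum %v; the server may close the connection with too_many_pings",
			requested, serverMinTime)
	}
	return requested
}
```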
Force-pushed from de2e92d to aaf3ae3.
etcdmain/config.go
@@ -143,6 +144,9 @@ func newConfig() *config {
	fs.Int64Var(&cfg.QuotaBackendBytes, "quota-backend-bytes", cfg.QuotaBackendBytes, "Raise alarms when backend size exceeds the given quota. 0 means use the default quota.")
	fs.UintVar(&cfg.MaxTxnOps, "max-txn-ops", cfg.MaxTxnOps, "Maximum number of operations permitted in a transaction.")
	fs.UintVar(&cfg.MaxRequestBytes, "max-request-bytes", cfg.MaxRequestBytes, "Maximum client request size in bytes the server will accept.")
	fs.DurationVar(&cfg.KeepAliveEnforcementPolicyMinTime, "keepalive-min-time", time.Duration(0), "Minimum interval duration that a client should wait before pinging server.")
Should we give a good default value? Can we add a gRPC prefix to these flags?
Force-pushed from 899b1ae to fcbb66d.
@gyuho should gRPC health checking be kept to support #8308? HTTP/2 keepalives don't give any indication that the client should change endpoints, and even watcher leader checking isn't enough for the general case (e.g., if a put fails on a partitioned member, it'd be nice if the client would try to switch endpoints).
Health checking is also helpful to detect an HTTP/2 blackhole (an application that can talk HTTP/2, but is not an etcd server).
@xiang90 yes, I think etcd needs both.
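For context, a hedged sketch of what keeping gRPC health checking could look like on the server side, using grpc-go's standard health service (whether etcd wires it exactly this way is not settled in this thread):

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Expose the standard gRPC health service so clients can tell apart
	// "speaks HTTP/2" from "is actually a serving etcd member".
	srv := grpc.NewServer()
	hs := health.NewServer()
	hs.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
	healthpb.RegisterHealthServer(srv, hs)

	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	log.Println("health service listening on", l.Addr())
	if err := srv.Serve(l); err != nil {
		log.Fatal(err)
	}
}
```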
A clientv3/integration or e2e test to confirm that a watch will detect the dead connection and switch to another endpoint would be good for knowing this works.
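A sketch of such a test, reusing the helpers that appear later in this thread (integration.NewClusterV3, GRPCAddr, Blackhole); the test name, import paths, and clus.Client usage are assumptions based on that usage:

```go
package integrationsketch

import (
	"context"
	"testing"
	"time"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/integration"
)

func TestWatchSwitchesEndpointOnBlackhole(t *testing.T) {
	clus := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 3})
	defer clus.Terminate(t)

	// Watch through a client that knows about two endpoints.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{clus.Members[0].GRPCAddr(), clus.Members[1].GRPCAddr()},
	})
	if err != nil {
		t.Fatal(err)
	}
	defer cli.Close()
	wch := cli.Watch(context.Background(), "foo")

	// Blackhole the member the watch is most likely connected to.
	clus.Members[0].Blackhole()

	// Put through a healthy member; the watch should still deliver the event
	// by failing over to the other endpoint.
	if _, err := clus.Client(1).Put(context.Background(), "foo", "bar"); err != nil {
		t.Fatal(err)
	}
	select {
	case wresp := <-wch:
		if len(wresp.Events) != 1 {
			t.Fatalf("expected 1 event, got %d", len(wresp.Events))
		}
	case <-time.After(10 * time.Second):
		t.Fatal("watch did not fail over to a healthy endpoint")
	}
}
```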
etcdmain/config.go
@@ -143,6 +144,9 @@ func newConfig() *config {
	fs.Int64Var(&cfg.QuotaBackendBytes, "quota-backend-bytes", cfg.QuotaBackendBytes, "Raise alarms when backend size exceeds the given quota. 0 means use the default quota.")
	fs.UintVar(&cfg.MaxTxnOps, "max-txn-ops", cfg.MaxTxnOps, "Maximum number of operations permitted in a transaction.")
	fs.UintVar(&cfg.MaxRequestBytes, "max-request-bytes", cfg.MaxRequestBytes, "Maximum client request size in bytes the server will accept.")
	fs.DurationVar(&cfg.GRPCKeepAliveEnforcementPolicyMinTime, "grpc-keepalive-min-time", 5*time.Second, "Minimum interval duration that a client should wait before pinging server.")
This default should be set in embed.NewConfig().
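A sketch of that: keep the default in embed so etcdmain only references it. The constant name and trimmed Config type below are illustrative; the field name follows the diff above.

```go
package embedsketch

import "time"

// DefaultGRPCKeepAliveMinTime would live in embed and be the single source
// of the 5s default used by the etcdmain flag.
const DefaultGRPCKeepAliveMinTime = 5 * time.Second

// Config is trimmed to the one field relevant here.
type Config struct {
	GRPCKeepAliveEnforcementPolicyMinTime time.Duration `json:"grpc-keepalive-min-time"`
}

func NewConfig() *Config {
	return &Config{GRPCKeepAliveEnforcementPolicyMinTime: DefaultGRPCKeepAliveMinTime}
}
```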
embed/config.go
	// sends goaway and closes the connection (too_many_pings). When too slow,
	// nothing happens. Server expects client pings only when there is any
	// active streams by setting 'PermitWithoutStream' false.
	GRPCKeepAliveEnforcementPolicyMinTime time.Duration `json:"grpc-keepalive-min-time"`
Should the names match the JSON more closely? Like GRPCKeepAliveMinTime.
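A sketch with the shorter name, and how the server side would map it onto gRPC's enforcement policy; the mapping function and package are illustrative.

```go
package rpcsketch

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// Config uses the shorter field name suggested above.
type Config struct {
	GRPCKeepAliveMinTime time.Duration `json:"grpc-keepalive-min-time"`
}

// serverOptions turns the config value into the gRPC server option.
func serverOptions(cfg Config) []grpc.ServerOption {
	return []grpc.ServerOption{
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime:             cfg.GRPCKeepAliveMinTime,
			PermitWithoutStream: false, // only expect pings while streams are active
		}),
	}
}
```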
etcdmain/help.go
@@ -54,6 +54,12 @@ member flags:
	time (in milliseconds) of a heartbeat interval.
--election-timeout '1000'
	time (in milliseconds) for an election to timeout. See tuning documentation for details.
--grpc-keepalive-interval '5s'
	minimum duration interval that a client should wait before pinging server.
--grpc-keepalive-interval '0s'
--grpc-keepalive-interval is already given above.
Force-pushed from 43811e4 to 9c716ba.
@heyitsanthony I am tracking this down through gRPC, and now into the Go dialing code. The problem is that even if we blackhole the endpoint, Go's http dialContext succeeds without failure.

func TestDial(t *testing.T) {
	capnslog.SetGlobalLogLevel(capnslog.CRITICAL)
	clus := integration.NewClusterV3(t, &integration.ClusterConfig{
		Size: 3,
	})
	defer func() {
		capnslog.SetGlobalLogLevel(capnslog.CRITICAL)
		clus.Terminate(t)
	}()

	ccfg := clientv3.Config{
		Endpoints: []string{clus.Members[0].GRPCAddr()},
	}

	fmt.Println("blackhole-d 1", clus.Members[0].GRPCAddr())
	clus.Members[0].Blackhole()
	fmt.Println("blackhole-d 2", clus.Members[0].GRPCAddr())
	time.Sleep(3 * time.Second)

	capnslog.SetGlobalLogLevel(capnslog.INFO)
	cli, err := clientv3.New(ccfg)
	if err != nil {
		t.Fatal(err)
	}
	defer cli.Close()

	fmt.Println("PUT 1")
	if _, err := cli.Put(context.TODO(), "foo", "bar"); err != nil {
		t.Fatal(err)
	}
	fmt.Println("PUT 2")
	fmt.Println("DONE")
}

https://github.com/coreos/etcd/blob/master/clientv3/client.go#L244-L245

dialer := &net.Dialer{Timeout: t}
conn, err := dialer.DialContext(c.ctx, proto, host)

This always succeeds, even with a blackholed endpoint. The grpc-go HTTP/2 client then establishes a transport to the blackholed endpoint but fails on keepalive, so it keeps retrying. Keepalive fails because no frame is ever received, even though the initial connection succeeds.
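For reference, a sketch of the client-side knobs that make the blackhole detectable at all: with keepalive Time/Timeout set on the dial options, the unanswered HTTP/2 ping eventually closes the dead transport instead of leaving it open forever. The values and helper name are illustrative.

```go
package clientsketch

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func blackholeAwareDialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:    10 * time.Second, // ping after 10s without frames
			Timeout: 3 * time.Second,  // close the connection if the ping ack never arrives
		}),
	}
}
```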
Dial should be able to connect in this case, and retrying is OK if there's only one endpoint. Is there starvation when trying to fail over to another endpoint?
Got it. grpc-go, as long as the connectivity state is not Shutdown, will keep retrying, even against the blackholed endpoint. I am experimenting with manually flagging the blackholed endpoint so that the balancer does not switch to blackholed ones.

EDIT: The connection works fine on the blackholed endpoint, and grpc-go never calls

TODO: maybe another PR
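A purely illustrative sketch of that manual-flagging experiment; none of these names exist in clientv3, it only shows the shape of letting endpoint selection skip members that look blackholed.

```go
package clientsketch

import "sync"

type endpointHealth struct {
	mu        sync.Mutex
	unhealthy map[string]bool
}

func newEndpointHealth() *endpointHealth {
	return &endpointHealth{unhealthy: make(map[string]bool)}
}

// markUnhealthy flags an endpoint, e.g. after a keepalive timeout against it.
func (e *endpointHealth) markUnhealthy(ep string) {
	e.mu.Lock()
	e.unhealthy[ep] = true
	e.mu.Unlock()
}

// pick returns the first endpoint that has not been flagged.
func (e *endpointHealth) pick(eps []string) (string, bool) {
	e.mu.Lock()
	defer e.mu.Unlock()
	for _, ep := range eps {
		if !e.unhealthy[ep] {
			return ep, true
		}
	}
	return "", false
}
```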
Signed-off-by: Gyu-Ho Lee <[email protected]>
Signed-off-by: Gyu-Ho Lee <[email protected]>
Signed-off-by: Gyu-Ho Lee <[email protected]>
Signed-off-by: Gyu-Ho Lee <[email protected]>
Signed-off-by: Gyu-Ho Lee <[email protected]>
Force-pushed from 97bfa47 to e5d8187.
Signed-off-by: Gyu-Ho Lee <[email protected]>
Signed-off-by: Gyu-Ho Lee <[email protected]>
Closing. Need more research into the grpc-go logic. Will open a new PR soon.
Parameters should match the ones in the server.
https://github.com/grpc/grpc-go/blob/master/keepalive/keepalive.go#L30
Updates #8199.
Addresses #8022.