-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
etcdserver/api/v3rpc: debug user cancellation and log warning for rest #9105
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9105 +/- ##
=========================================
Coverage ? 75.93%
=========================================
Files ? 359
Lines ? 29987
Branches ? 0
=========================================
Hits ? 22771
Misses ? 5626
Partials ? 1590
Continue to review full report at Codecov.
|
etcdserver/api/v3rpc/watch.go
Outdated
@@ -140,7 +140,11 @@ func (ws *watchServer) Watch(stream pb.Watch_WatchServer) (err error) { | |||
// deadlock when calling sws.close(). | |||
go func() { | |||
if rerr := sws.recvLoop(); rerr != nil { | |||
plog.Debugf("failed to receive watch request from gRPC stream (%q)", rerr.Error()) | |||
if stream.Context().Err() == context.Canceled { | |||
plog.Debugf("failed to receive watch request from gRPC stream 11 (%q)", rerr.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove 11
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooops lol .. :( yes
etcdserver/api/v3rpc/watch.go
Outdated
@@ -140,7 +140,11 @@ func (ws *watchServer) Watch(stream pb.Watch_WatchServer) (err error) { | |||
// deadlock when calling sws.close(). | |||
go func() { | |||
if rerr := sws.recvLoop(); rerr != nil { | |||
plog.Debugf("failed to receive watch request from gRPC stream (%q)", rerr.Error()) | |||
if stream.Context().Err() == context.Canceled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both context.Cancel and context.DeadlineExceeded should be ignored i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we should have a shared function for all streaming API logs #8939 (e.g. lease keepalive).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gyuho thanks! I just learned how to use lease :-) and could see the related log messages. So also modifying the message on LOC 110 (for user cancelled lease) in #8939 is what your suggesting right? I know you suggested to refer to the toErr
function in etcd/clientv3/client.go
in the issue discussion but I believe that was just to learn about different error code and not to make any changes there. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just like we do in clientv3, there should be an error parsing function on this rerr
error. Convert it to gRPC *status.statusError
and match the error code. If the error is canceled or deadline exceeded, we log with debug level.
can we manually test this PR?
|
@xiang90 +1 thanks, I actually pretty much already tested the first scenario before submitting commit - i.e. manually cancelling watcher and making sure that warning is printed but not debug message. Testing more per your suggestion. |
I think if you manually callcelling watcher, the log should be at debug level not warning level? |
Hmm..so by default the debug level is off right? For example, when I run etcd as
|
correct. and what you saw is the expected behavior. |
:-) OK, prefect. Thought so too :-). |
I have tested following, manual and auto interruption for watch and lease keep-alive, with new commit:
etcd was ran with and without |
etcdserver/api/v3rpc/lease.go
Outdated
@@ -107,7 +110,13 @@ func (ls *LeaseServer) leaseKeepAlive(stream pb.Lease_LeaseKeepAliveServer) erro | |||
return nil | |||
} | |||
if err != nil { | |||
plog.Debugf("failed to receive lease keepalive request from gRPC stream (%q)", err.Error()) | |||
ev, _ := status.FromError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spzala Let's wrap this as a separate function inside etcdserver/api/v3rpc
.
func isClientCtxErr(err error) error {
ev, _ := status.FromError(err)
...
return true // when code == codes.Canceled || code == codes.DeadlineExceeded
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, yup, makes sense specially that we are doing for all of the messages. Thanks!
@spzala We should do this for all streaming error handling (watch,keepalive receive/send). There are total 6 error we need to handle (see for others https://github.com/coreos/etcd/pull/8939/files). |
etcdserver/api/v3rpc/watch.go
Outdated
@@ -26,6 +26,9 @@ import ( | |||
pb "github.com/coreos/etcd/etcdserver/etcdserverpb" | |||
"github.com/coreos/etcd/mvcc" | |||
"github.com/coreos/etcd/mvcc/mvccpb" | |||
|
|||
"google.golang.org/grpc/codes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation.
@gyuho ah, I notice that in your previous PR earlier but thought user interruption/cancellation only impacts 'receive', something that I could test by cancelling watch or lease. But sounds good, I will be covering all. Is there anyway I can test 'send' as well quickly or let's not worry about testing send. Thanks! |
WIP. |
Does this silence the client watch canceling logs when |
@gyuho thanks! So when --debug is on (
And sure I will squash the commits. |
etcdserver/api/v3rpc/util.go
Outdated
@@ -81,3 +81,9 @@ func togRPCError(err error) error { | |||
} | |||
return grpcErr | |||
} | |||
|
|||
func isClientCtxErr(err error) bool { | |||
ev, _ := status.FromError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to be more defensive, we should check the error against context.DeadlineExceeded and context.Canceled when the fromError fails with error. i am not sure if grpc.send and recv will always return statusError type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Does it make sense that we pass stream.Context().Err()
(or sws.gRPCStream.Context().Err()
when needed) to this func and modify the func as below?:
func isClientCtxErr(err error) bool {
if err == nil {
return false
}
return err == context.Canceled || err == context.DeadlineExceeded
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not sure how exactly the client side context error gets propagated to server side. maybe the stream.Context().Err() only returns the server side context error not the one propagated from the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe add another parameter to isClientCtxErr
function to parse stream.Context
errors as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gyuho !!
@@ -81,3 +81,12 @@ func togRPCError(err error) error { | |||
} | |||
return grpcErr | |||
} | |||
|
|||
func isClientCtxErr(ctxErr error, err error) bool { | |||
if ctxErr != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ctxErr != nil {
return true
}
ev, err := status.FromError(err)
if err != nil {
return false
}
code := ev.Code()
return code == codes.Canceled || code == codes.DeadlineExceeded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On ctxErr != nil, return true
(instead of ctxErr == context.Canceled || ctxErr == context.DeadlineExceeded) makes sense since context errors seems either context.Canceled
or context.DeadlineExceeded
etcdserver/api/v3rpc/util.go
Outdated
if ctxErr != nil { | ||
return ctxErr == context.Canceled || ctxErr == context.DeadlineExceeded | ||
} | ||
ev, _ := status.FromError(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when there is an error, ev will be nil. if we try to get ev.Code on nil, it might panic. see the code above to improve that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiang90 thanks!! Sure, agree. As you mentioned and just looking at status doc, since FromError can return nil we should handle it. I didn't think of it seeing few other places we aren't checking for nil on status.FromError (e.g in client.go and other places) and now it seems like we should, but that's something I will do in separate PR and see if sense that makes sense. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I guess we can check on nil or boolean.)
Lgtm. Defer to @gyuho |
@spzala Let's squash commits into one with commit title |
The context error with cancel code is typically for user cancellation which should be at debug level. For other error codes we should display a warning. Fixes etcd-io#9085
The context error with cancel code is typically for user cancellation which
should be at debug level. For other error codes we should display a warning.
Fixes #9085
Contributing guidelines
Please read our contribution workflow before submitting a pull request.