chore: annotate every cancel cause with message #5278
Otherwise error messages still just show up as "context canceled" without any extra context on where the cancelation came from.
Signed-off-by: Erik Sipsma <[email protected]>
Why isn't WithStack enough for understanding the code path where the error appears?
If we are changing the text of the error message, then I think we should do it in smaller patches that show how each one updates the error shown to users in these cases. That way we see the before and after, and can confirm that the new error message is an improvement.
@tonistiigi Because
Can you explain what you mean by "showing how it updates the error shown to users in these cases"? The error messages would be modified to include the text I annotated in the error message. Part of the problem here is that we have no idea where the cancelation is coming from right now, so I can't really say what the error messages will look like until this change is made. If there are concerns with this I can just put it in a fork for dagger instead; I'm just trying to avoid that. I think you also mentioned that some upstream buildkit users were reporting
Yes, but it is hard for me to predict the full error message user sees without examples, and evaluate if that message would be clear to them. For some of them like the
Why can't you show the stack trace with the line locations?
Agreed, but surely any information at all is better than the current state of just "context cancelled" on the client? Currently to debug one of these, if a user gets Aside from just making some of this easier to debug, it would also help improve triage (both for buildkit directly, and how we consume it in dagger), with issues where there are large numbers of users complaining about
@jedevc For the user, after they hit ctrl-c and the build gracefully stops, the meaningful error would be "your build was cancelled". While "context canceled" isn't the most amazing English, "solve status done after timeout" doesn't make it any better for the user in my opinion. @colinhemmings Off-topic, but we should probably replace the user message with a nice sentence in this case, at least when we can confirm both that the user invoked a cancellation request and that the server shut down with "context canceled".
Note that even if you don't want to show the full stack trace, you could parse out the frame that matches the cancellation and just add "(filename:linenumber)" at the end of the error string.
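A rough sketch of what a "(filename:linenumber)" suffix could look like follows. It does not parse a github.com/pkg/errors stack trace as suggested above; it swaps in a simpler standard-library technique and records the call site with runtime.Caller at the moment the cancel cause is created. The names locatedError, withLocation, cancelHere and isCanceled are hypothetical and not part of buildkit.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"path/filepath"
	"runtime"
)

// locatedError is a hypothetical error type that remembers where it was created.
type locatedError struct {
	err  error
	file string
	line int
}

// Error appends "(filename:linenumber)" to the wrapped error's message.
func (e *locatedError) Error() string {
	return fmt.Sprintf("%v (%s:%d)", e.err, filepath.Base(e.file), e.line)
}

// Unwrap keeps errors.Is(err, context.Canceled) working.
func (e *locatedError) Unwrap() error { return e.err }

// withLocation records the file and line of its caller.
func withLocation(err error) error {
	_, file, line, ok := runtime.Caller(1)
	if !ok {
		return err
	}
	return &locatedError{err: err, file: file, line: line}
}

// cancelHere cancels a context with a located cause and returns that cause.
func cancelHere() error {
	ctx, cancel := context.WithCancelCause(context.Background())
	cancel(withLocation(context.Canceled))
	return context.Cause(ctx)
}

// isCanceled reports whether err still unwraps to context.Canceled.
func isCanceled(err error) bool {
	return errors.Is(err, context.Canceled)
}

func main() {
	err := cancelHere()
	fmt.Println(err)             // "context canceled (" followed by file:line
	fmt.Println(isCanceled(err)) // true
}
```

Because the location is captured where cancel is called, this only helps for cancel sites that opt in; parsing an existing stack trace on the client side would cover errors that were already wrapped with WithStack.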
I had tried this a while back but failed, took a look again and realized it's something very dumb: if you get an error from I had been I sent out a dagger PR for this (dagger/dagger#8553). I'm not 100% sure if the stack trace parsing is really robust, but if there's more complication with all this we can just make a temporary fork of buildkit that has the change in this PR to ease debugging and run with that until we have figured out what's going wrong with those If there's anything worth upstreaming let me know @tonistiigi but will close this for now.
If you use
We are definitely interested in upstreaming any cases you have for improving the error messages shown to users on specific code paths.
We've been hitting some random ephemeral "failed to compute cache key: context canceled" errors for a while now, which have been very hard to track down.
Buildkit does use ContextCause everywhere, but only wraps the error using github.com/pkg/errors.WithStack, so unless I'm missing something obvious the error messages don't actually contain any extra useful information.
This changes everywhere to use errors.Wrap, which includes the stack trace same as before but also includes a message, so error messages here will have actual context on the cause.
I just went ahead and updated the entire codebase rather than only the places most likely to be the culprit for this bug, since it seemed worth squashing this everywhere, but I can be more selective if there are objections for any reason.
Also happy to be told I'm missing something obvious and there's a better way of going about this.
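To illustrate the difference a message makes, here is a minimal sketch using only the standard library (Go 1.20+). It uses fmt.Errorf with %w as a stand-in for github.com/pkg/errors.Wrap, so it carries no stack trace, and the helper names cancelWithMessage and stillCanceled are hypothetical. The message text is borrowed from the discussion above purely as an example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// cancelWithMessage cancels a context with an annotated cause and returns
// the text of ctx.Err() and of context.Cause(ctx).
func cancelWithMessage(msg string) (plain string, annotated string) {
	ctx, cancel := context.WithCancelCause(context.Background())
	// Wrap context.Canceled so the cause carries a message but still
	// matches errors.Is(err, context.Canceled).
	cancel(fmt.Errorf("%s: %w", msg, context.Canceled))
	return ctx.Err().Error(), context.Cause(ctx).Error()
}

// stillCanceled reports whether an annotated cause still unwraps to
// context.Canceled.
func stillCanceled(msg string) bool {
	ctx, cancel := context.WithCancelCause(context.Background())
	cancel(fmt.Errorf("%s: %w", msg, context.Canceled))
	return errors.Is(context.Cause(ctx), context.Canceled)
}

func main() {
	plain, annotated := cancelWithMessage("solve status done after timeout")
	fmt.Println(plain)     // context canceled
	fmt.Println(annotated) // solve status done after timeout: context canceled
	fmt.Println(stillCanceled("solve status done after timeout")) // true
}
```

The point of the sketch is that ctx.Err() alone never shows the annotation; callers have to surface context.Cause(ctx) for the message to reach the user.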