-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Engine healthcheck: deal with empty uuid file #2926
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lmbarros would logging something in the journal not work (and persisting on request if needed)? I know this is meant to be temporary but the writing logs to disk without persistent logging enabled makes me uneasy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexgg, I expected this would be controversial. :-)
Here's my reasoning:
ctr version
creating an empty uuid file), we don't even know if they ever happen. Asking a user to enable persistent logging after the fact would almost certainly mean that we'd lose data.date
of the last event).stat
ing these files to get their creation date.)The code is already logging to the journal. I can remove the creation of the breadcrumb files, but I think it could make it quite harder for us to get to the root cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @lmbarros
We ask the user to enable persistent logging, reproduce, and then provide the logs
I fail to see how this is better that having all the events logged in persistent logs
Let's not speculate, if we don't write there is no media wear. If we do write, we don't have data to tell.
Could you summarize why file logging is better in any way than enabling persistent logging before the event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But also, I don't think logging these will help us debug the root cause. We will also need extra debug in containerd to understand when the empty file is being created, and we need those events to be in context with other system events to understand for example whether containerd is killed. So enabling persistent logging to debug the root cause is something we will need anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @alexgg,
The problem is that we don't know how to reproduce. My goal here is to collect data about a rare event that would be otherwise lost. If we know how to reproduce I wouldn't bother with the breadcrumbs.
I largely, but not completely agree here.
First, all evidence we have so far points that this is a rare event. What do we do after asking a user to enable persistent logging? Maybe we remove the uuid or replace it with an empty one to try to force the issue happening? There's a very high probability the result will be a fully working device and we don't learn anything. (Maybe you have some different procedure in mind that what I said.)
Second, I would be particularly interested in learning that
ctr version
could complete successfully and still generate an empty uuid file. This is a sanity check: I don't think this can possibly happen, but if it ever does it means that there's something important out of our radar. I am not sure we'd be able to detect this from inside containerd.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to look at it: what's the worst thing that could happen if we leave the breadcrumbs creation in? Even if all our beliefs about how rare this is are completely wrong, we'll make one or two writes to disk every time the health checks run. This is not good, granted, but for years we did much worse than that (when we used to run a container as part of the health check, this causes even more writes to disk).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @lmbarros there are several things I don't like:
But there might be something that persistent logs are missing that I can't see. I'll open a thread in Zulip to discuss how you view the debugging.