-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure ContainerHealthy condition is set back to True #15503
Open
SaschaSchwarze0
wants to merge
1
commit into
knative:main
Choose a base branch
from
SaschaSchwarze0:sascha-set-container-healthy-true
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+63
−0
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SaschaSchwarze0 hi, should we relax the condition?
We set the revision as containerUnhealthy when a "permanent" like failure is detected:
Resetting this should happen when we have at least one pod up (deployment.Status.AvailableReplicas>0) no ?
I am thinking of a bursty load scenario where not all pods become ready (eg. some new stay in pending state and some old recover from the old issue and become ready). However, then we keep the revision in false ready status if we don't reach the desired number of pods as set in the deployment replicas field.
Could this be true if we have a bursty load where deployment is set to replicas> currentScale>= minsCale >= 1 directly due to some autoscaling decision? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @skonto, sorry for the late response. I was out for some time.
Can you please help me and clarify what you're after. The code change I am making is to set container healthy to true. You now come up with a discussion and a related condition about when container healthy should be set to false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SaschaSchwarze0 I am saying that a revision is serving traffic even when pods are not all ready right? So resetting this to true only when
*deployment.Spec.Replicas == deployment.Status.ReadyReplicas
seems a bit too strict no?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
Correct, a revision may be serving traffic even if it does have unhealthy containers in some of the replicas.
Whether that means that it is okay to set ContainerHealthy to True if already one of the replicas is running fine, I do not know.
My proposed condition basically was when I think that ContainerHealthy should definitely be set to true (because all replicas are fully ready).
How strict or relaxed we can be, I do not know. At some point I tried to find a spec on the exact meaning of the conditions of the different resources. But I had not found something.
What your condition can btw lead to is the following:
Would not happen with my proposed code because if one pod is not ready, there are not all replicas of the deployment ready. Anyway, just checking the first pod to determine whether ContainerHealthy is set to False is another questionable piece of code because it leads to different results when one has different pod statuses depending on the order in which the pods are returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree in general. Btw I am also trying to figure out the semantics. Maybe we should have at least one ready to reset to true (assuming we list all pods and we have containerHealthy=false). 🤔
This should happen only when all pods fail with probably the same reason, that is my understanding for the intention thus the check for available replicas=0.
if *deployment.Spec.Replicas > 0 && deployment.Status.AvailableReplicas == 0 {
@dprotaso any background info to add for the limitations here?