Unclear wording in “Disruptions” concept page #22391

Closed
mltsy opened this issue Jul 6, 2020 · 14 comments
Labels
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
language/en: Issues or PRs related to English language.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
priority/backlog: Higher priority than priority/awaiting-more-evidence.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@mltsy (Contributor) commented Jul 6, 2020

This is a Bug Report

Problem:
Under the "Dealing with Disruptions" heading, it says "The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are no voluntary disruptions at all." This is just not true... unless I'm missing some very limited definition of "Basic Kubernetes Cluster".

Essentially every possible "voluntary disruption" mentioned earlier on the page does happen on a basic Kubernetes cluster (deploying a new revision, deleting a pod accidentally, upgrading the cluster, etc.). I assume what is meant here is that no voluntary disruptions are automated on a basic K8s cluster? But I'm not sure...

Proposed Solution:
Possibly change the wording to "On a basic Kubernetes cluster, there is no automated process that causes voluntary disruptions (all processes involving voluntary disruption are initiated manually, by default)."

Page to Update:
https://kubernetes.io/docs/concepts/workloads/pods/disruptions/

@sftim (Contributor) commented Jul 6, 2020

On a basic Kubernetes cluster, there are no voluntary disruptions at all.

This seems valid to me, although the text doesn't back this up with an explanation. The blue-green deployment pattern is simpler to implement than, say, a canary release.

If you use blue-green deployments then you have two Deployments (blue and green) and neither of these has any voluntary disruptions whilst in service.

If rewording this page to make things clearer, bear in mind that coarse-grained deployments are easier to implement than the fine-grained kind that need to heed PodDisruptionBudget etc.
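For readers who haven't used one, a PodDisruptionBudget is a small standalone object; the sketch below is a minimal, hypothetical example (the name, selector labels, and minAvailable value are illustrative, and the apiVersion depends on your cluster version).

```yaml
# Hypothetical example: keep at least 2 replicas of the "blue" Deployment
# available while voluntary disruptions (drains, evictions) are in progress.
apiVersion: policy/v1        # policy/v1beta1 on clusters older than v1.21
kind: PodDisruptionBudget
metadata:
  name: blue-pdb             # illustrative name
spec:
  minAvailable: 2            # never let voluntary disruptions drop below 2 pods
  selector:
    matchLabels:
      app: web               # illustrative labels matching the "blue" Deployment
      track: blue
```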

@mltsy the text you suggested:

On a basic Kubernetes cluster, there is no automated process that causes voluntary disruptions (all processes involving voluntary disruption are initiated manually, by default).

also doesn't feel quite right, because you can have a very basic-looking cluster where deployments (i.e., updates to Deployments) are triggered automatically. I think that might even be on the CKAD syllabus; it's that fundamental.

(Aside, but relevant: PodDisruptionBudget and EvenPodsSpread are both beta features).

@mltsy (Contributor, Author) commented Jul 6, 2020

Interesting... though I'm not sure I understand what your opinion is. First you say that the phrase "there are no voluntary disruptions at all" seems valid to you, but then you also say it's incorrect to claim there are no automated voluntary disruptions, because "you can have a very basic-looking cluster where deployments are triggered automatically". So does a basic cluster have voluntary disruptions or not? If it has "no voluntary disruptions at all", then it certainly doesn't have any automated voluntary disruptions, right?

Even if we ignore the case of deployments for the time being, and assume that on a "basic cluster" you should be using blue-green deployments, surely someone who launches a basic cluster will upgrade it to a new Kubernetes version at some point, right? That's a voluntary disruption. It just seemed confusing to me, right after reading about all the things that might cause a "voluntary disruption", to be told that they never happen on a "basic Kubernetes cluster".

I guess I'm just not sure what the point of that sentence is in the first place. I think it's meant to illustrate the difference between a hosted cluster (like GKE) and the most basic kind you might set up on your own: in the hosted case there are automations, by default, that will cause voluntary disruptions, whereas in a default cluster you set up yourself there would not be, unless you add them. (But that doesn't mean there will never be voluntary disruptions on that cluster; it just means they aren't going to happen unless you make them happen.)

@sftim (Contributor) commented Jul 6, 2020

We'd have to ask the original author to uncover their intent.

I suspect they were trying to distinguish between cases where people aren't worried about voluntary disruptions and cases where someone is actively keen to manage the impact of voluntary disruptions, whether those come from application rollouts or from cluster upgrades.

@mltsy (Contributor, Author) commented Jul 6, 2020

Sure - that makes sense. Maybe better wording, then, would be:

"On a basic Kubernetes cluster, there are no hidden/default automated voluntary disruptions to worry about."

That seems more to the point and a bit less confusing to me. Does it seem at least as accurate as the current wording?

@sftim (Contributor) commented Aug 10, 2020

/kind cleanup
/language en
/priority backlog
/retitle Unclear wording in “Disruptions” concept page

@k8s-ci-robot changed the title from "Issue with k8s.io/docs/concepts/workloads/pods/disruptions/" to "Unclear wording in “Disruptions” concept page" on Aug 10, 2020
@k8s-ci-robot added the kind/cleanup, language/en, and priority/backlog labels on Aug 10, 2020
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Nov 8, 2020
@mltsy (Contributor, Author) commented Nov 9, 2020

/remove-lifecycle stale

The confusing documentation still exists.

@k8s-ci-robot removed the lifecycle/stale label on Nov 9, 2020
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Feb 7, 2021
@mltsy (Contributor, Author) commented Feb 8, 2021

/remove-lifecycle stale

I think this could be resolved by replacing:
"On a basic Kubernetes cluster, there are no voluntary disruptions at all."
with:
"On a basic Kubernetes cluster, there are no automated voluntary disruptions."

Although, is this actually true? The text mentions "Removing a pod from a node to permit something else to fit on that node," which I believe is something a basic Kubernetes cluster will do if you try to deploy something that doesn't currently fit on any node but could fit if other pods were shifted around... isn't it? That strikes me as an automated process (albeit with a manual catalyst) that causes a voluntary disruption.

If that's the case, it might be better to say: "Rescheduling (moving) other pods during a deployment is the only voluntary disruption that may be considered automated (or indirectly triggered at least) on a basic Kubernetes cluster." Or maybe more succinctly and more to the point... "Every multi-node cluster is, by default, subject to some voluntary disruption"

@k8s-ci-robot removed the lifecycle/stale label on Feb 8, 2021
@sftim (Contributor) commented Feb 8, 2021

Removing a pod from a node to permit something else to fit on that node.

AIUI Kubernetes does not come with a descheduler, but you can add one in, such as: https://github.com/kubernetes-sigs/descheduler
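To illustrate what "adding one in" involves, the descheduler is configured with a policy object roughly like the sketch below. This follows the v1alpha1 policy format; treat the exact field and strategy names as assumptions to verify against the descheduler README for the version you run.

```yaml
# Rough sketch of a DeschedulerPolicy (v1alpha1-style); field and strategy
# names vary between descheduler releases, so check the project's docs.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":           # evict duplicate pods crowded onto one node
    enabled: true
  "LowNodeUtilization":         # rebalance pods off overloaded nodes
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:             # nodes below these are considered underutilized
          "cpu": 20
          "memory": 20
        targetThresholds:       # nodes above these are candidates for eviction
          "cpu": 50
          "memory": 50
```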

/sig scheduling

@k8s-ci-robot added the sig/scheduling label on Feb 8, 2021
@mltsy (Contributor, Author) commented Feb 8, 2021

Ah! Okay, well that simplifies it then. (I'm using GKE, so it must use a custom descheduler if Kubernetes doesn't come with a standard one that has this behavior.) It looks like the default plugin that determines this behavior is the DefaultPreemption plugin, which only evicts pods when pod priorities are set, and that is not the default, so I think it's safe to say preemption is not a "default" behavior (since you have to enable it by creating and using PriorityClasses).
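For anyone following along, enabling preemption means creating a PriorityClass and referencing it from a Pod spec; the sketch below is a minimal, hypothetical example (the class name, value, and image are placeholders).

```yaml
# Hypothetical PriorityClass; without objects like this, pods all get the
# default priority (0) and the scheduler has nothing to preempt for.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority          # placeholder class name
value: 100000                  # higher value = higher scheduling priority
globalDefault: false
description: "May preempt lower-priority pods when nodes are full."
---
# A pod opts in by naming the class; only then can scheduling it evict others.
apiVersion: v1
kind: Pod
metadata:
  name: important-app          # placeholder pod name
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image
```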

So, how about: "On a basic Kubernetes cluster, there are no automated voluntary disruptions (only user-triggered ones)."

@sftim (Contributor) commented Mar 3, 2021

So, how about: "On a basic Kubernetes cluster, there are no automated voluntary disruptions (only user-triggered ones)."

(Sounds good to me)

mltsy pushed a commit to mltsy/website-1 that referenced this issue Mar 4, 2021
mltsy added a commit to mltsy/website-1 that referenced this issue Mar 4, 2021
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 1, 2021
@mltsy (Contributor, Author) commented Jun 2, 2021

This is resolved :)

@mltsy closed this as completed on Jun 2, 2021