[DRAFT] docs: update do-not-disrupt description #6977

Open
wants to merge 1 commit into
base: main

jmdeal
Contributor

@jmdeal jmdeal commented Sep 10, 2024

Fixes #N/A

Description
Updates the description for karpenter.sh/do-not-disrupt to reflect the changes made when TGP was introduced.

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jmdeal jmdeal requested a review from a team as a code owner September 10, 2024 23:31

netlify bot commented Sep 10, 2024

Deploy Preview for karpenter-docs-prod ready!

Name Link
🔨 Latest commit 18d505d
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/66e0d6f169ab4f00086332c8
😎 Deploy Preview https://deploy-preview-6977--karpenter-docs-prod.netlify.app

@coveralls

Pull Request Test Coverage Report for Build 10802199988

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 83.04%

Totals Coverage Status
Change from base Build 10799626785: 0.02%
Covered Lines: 5513
Relevant Lines: 6639

💛 - Coveralls

@@ -411,6 +411,7 @@ Karpenter should now be pulling and operating against the v1beta1 APIVersion as
* API Rename: NodePool’s ConsolidationPolicy `WhenUnderutilized` is now renamed to `WhenEmptyOrUnderutilized`
* Behavior Changes:
* Expiration is now forceful and begins draining as soon as it’s expired. Karpenter does not wait for replacement capacity to be available before draining, but will start provisioning a replacement as soon as the node is expired and begins draining.
* Pods with the `karpenter.sh/do-not-disrupt` annotation now block node termination. Termination of a node with these pods will be blocked until those pods are removed, enter a terminating or terminal state, or the NodeClaim's TerminationGracePeriod has expired.
Contributor

Can we link out to the section of the docs that describes TerminationGracePeriod here?

@@ -274,8 +282,14 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa

### Pod-Level Controls

You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod.
You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod.
You can treat this annotation as a single-node, permanently blocking PDB.
Contributor

Suggested change
You can treat this annotation as a single-node, permanently blocking PDB.
You can treat this annotation as a single-pod, permanently blocking PDB.

You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod.
You can treat this annotation as a single-node, permanently blocking PDB.
This has the following consequences:
- Nodes with `do-not-disrupt` pods will be excluded from **voluntary** disruption, i.e. [Consolidation]({{<ref "#consolidation" >}}) and [Drift]({{<ref "#drift" >}}).
Contributor

Doesn't voluntary disruption include drift? I'm not sure that they are considered different things here

You can treat this annotation as a single-node, permanently blocking PDB.
This has the following consequences:
- Nodes with `do-not-disrupt` pods will be excluded from **voluntary** disruption, i.e. [Consolidation]({{<ref "#consolidation" >}}) and [Drift]({{<ref "#drift" >}}).
- Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}).
Contributor

Suggested change
- Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}).
- Like pods with a fully blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}).

You can treat this annotation as a single-node, permanently blocking PDB.
This has the following consequences:
- Nodes with `do-not-disrupt` pods will be excluded from **voluntary** disruption, i.e. [Consolidation]({{<ref "#consolidation" >}}) and [Drift]({{<ref "#drift" >}}).
- Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}).
Contributor

Suggested change
- Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}).
- Like pods with a blocking PDB, pods with the `do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#terminationcontroller" >}}). These pods will either run to completion or be forcefully terminated as the node approaches the end of its `terminationGracePeriod`.

Consider linking to terminationGracePeriod if you update the docs wording in this way too

@@ -274,8 +282,14 @@ Duration and Schedule must be defined together. When omitted, the budget is alwa

### Pod-Level Controls

You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod. This is useful for pods that you want to run from start to finish without disruption. By opting pods out of this disruption, you are telling Karpenter that it should not voluntarily remove a node containing this pod.
You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod.
Contributor

Suggested change
You can block Karpenter from voluntarily choosing to disrupt certain pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod.
You can block Karpenter from voluntarily disrupting pods by setting the `karpenter.sh/do-not-disrupt: "true"` annotation on the pod.


You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field. This field defines the duration of time that a node can be draining before it's forcibly deleted. A node begins draining when it's deleted. Pods will be deleted preemptively, based on their `terminationGracePeriodSeconds`, before this `terminationGracePeriod` ends to give them as much time to clean up as possible. Note that if your pod's `terminationGracePeriodSeconds` is larger than this `terminationGracePeriod`, Karpenter may forcibly delete the pod before it has had its full `terminationGracePeriodSeconds` to clean up.
You can set a NodePool's `terminationGracePeriod` through the `spec.template.spec.terminationGracePeriod` field.
This field defines the duration of time that a node can be draining before it's forcibly deleted.
Contributor

Suggested change
This field defines the duration of time that a node can be draining before it's forcibly deleted.
This field defines the duration of time that a node can be draining before it's forcibly deleted.


This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter. Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.
This is especially useful in combination with `nodepool.spec.template.spec.expireAfter` to define an absolute maximum on the lifetime of a node, where a node is deleted at `expireAfter` and finishes draining within the `terminationGracePeriod` thereafter.
Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.
Contributor

Suggested change
Pods blocking eviction like PDBs and do-not-disrupt will block full draining until the `terminationGracePeriod` is reached.
Pods blocking eviction like PDBs and `do-not-disrupt` will block full draining until the `terminationGracePeriod` is reached.
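
To make the "absolute maximum lifetime" claim concrete, here is a minimal sketch of the arithmetic, assuming the node is deleted exactly at `expireAfter` and is force-terminated no later than `terminationGracePeriod` after that. The function name `max_node_lifetime` is hypothetical and is not part of Karpenter.

```python
from datetime import timedelta


def max_node_lifetime(expire_after: timedelta,
                      termination_grace_period: timedelta) -> timedelta:
    """Upper bound on node lifetime (illustrative helper, not a Karpenter API).

    Assumes draining starts at expire_after and is cut off once
    termination_grace_period has elapsed.
    """
    return expire_after + termination_grace_period


# expireAfter: 23h, terminationGracePeriod: 1h
lifetime = max_node_lifetime(timedelta(hours=23), timedelta(hours=1))
print(lifetime)  # 1 day, 0:00:00
```

A node that drains cleanly is removed earlier than this bound; blocking PDBs and `do-not-disrupt` pods are what can push a node toward it.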


For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow its full `terminationGracePeriodSeconds` to clean up. If no pods are blocking draining, Karpenter will clean up the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish.
For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`.
Contributor

Suggested change
For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`.
For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining `23h` after its creation. The NodeClaim will then be allowed to drain for up to `1h` before it's forcefully terminated from the cluster.


For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`. Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds. If the node hasn't been fully drained after `55m`, Karpenter will delete the pod to allow its full `terminationGracePeriodSeconds` to clean up. If no pods are blocking draining, Karpenter will clean up the node as soon as the node is fully drained, rather than waiting for the NodeClaim's `terminationGracePeriod` to finish.
For instance, a NodeClaim with `terminationGracePeriod` set to `1h` and an `expireAfter` set to `23h` will begin draining after it's lived for `23h`.
Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds.
Contributor

Suggested change
Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds.
Let's say a `do-not-disrupt` pod has `TerminationGracePeriodSeconds` set to `300` seconds (`5m`).
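
The `55m` figure in this example follows from subtracting the pod's grace period from the node's. A minimal sketch of that calculation is below; the helper name `pod_deletion_offset` is hypothetical, and clamping at zero when the pod's grace period exceeds the node's is an assumption made for illustration, not a statement about Karpenter's implementation.

```python
from datetime import timedelta


def pod_deletion_offset(node_termination_grace_period: timedelta,
                        pod_grace_seconds: int) -> timedelta:
    """Time after draining begins at which a blocking pod would be deleted.

    Illustrative helper only. The pod is deleted early enough to still get
    its full terminationGracePeriodSeconds before the node's
    terminationGracePeriod ends. If the pod's grace period is longer than the
    node's, the offset is clamped to zero (an assumption for this sketch).
    """
    pod_grace = timedelta(seconds=pod_grace_seconds)
    return max(node_termination_grace_period - pod_grace, timedelta(0))


# terminationGracePeriod: 1h, pod terminationGracePeriodSeconds: 300
offset = pod_deletion_offset(timedelta(hours=1), 300)
print(offset)  # 0:55:00
```

With `expireAfter` set to `23h`, this places the pod's deletion at roughly 23h55m after the NodeClaim was created.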

@jmdeal jmdeal changed the title from "docs: update do-not-disrupt description" to "[DRAFT] docs: update do-not-disrupt description" Sep 17, 2024
Contributor

github-actions bot commented Oct 2, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

3 participants