From 25eac06e7098046a74179bc814bb9d7a4a63835d Mon Sep 17 00:00:00 2001
From: Christian Kotzbauer
Date: Sat, 19 Aug 2023 10:35:04 +0200
Subject: [PATCH 1/4] doc: add drain-pod-selector (#71)

Signed-off-by: Christian Kotzbauer
---
 content/en/docs/configuration.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/content/en/docs/configuration.md b/content/en/docs/configuration.md
index 0a80816..defec42 100644
--- a/content/en/docs/configuration.md
+++ b/content/en/docs/configuration.md
@@ -17,6 +17,7 @@ Flags:
  --annotate-nodes if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots
  --blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
  --drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default -1)
+ --drain-pod-selector string only drain pods with labels matching the selector (default: '', all pods)
  --drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
  --ds-name string name of daemonset on which to place lock (default "kured")
  --ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")

From 9b0429a4ef998a86a996a3242f6bd24b9bbc54c9 Mon Sep 17 00:00:00 2001
From: nkinkade
Date: Sat, 19 Aug 2023 02:35:28 -0600
Subject: [PATCH 2/4] doc: Documents new --metrics-host flag (#69)

https://github.com/kubereboot/kured/pull/811

Signed-off-by: Nathan Kinkade
---
 content/en/docs/configuration.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/content/en/docs/configuration.md b/content/en/docs/configuration.md
index defec42..b0ffefb 100644
--- a/content/en/docs/configuration.md
+++ b/content/en/docs/configuration.md
@@ -31,6 +31,7 @@ Flags:
  --message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
  --message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
  --message-template-uncordon string message template used to notify about a node being successfully uncordoned (default "Node %s rebooted & uncordoned successfully!")
+ --metrics-host string host where metrics will listen (default "")
  --metrics-port int port number where metrics will listen (default 8080)
  --node-id string node name kured runs on, should be passed down from spec.nodeName via KURED_NODE_ID environment variable
  --notify-url string notify URL for reboot notifications (cannot use with --slack-hook-url flags)
@@ -175,7 +176,10 @@ indicates the presence of the sentinel file:
 ```
 kured_reboot_required{node="ip-xxx-xxx-xxx-xxx.ec2.internal"} 0
 ```
-Note: Use `--metrics-port` to set a different post where metrics should listen.
+Note: Use `--metrics-host` and/or `--metrics-port` to set a different address
+where metrics should listen. The values of these flags are joined as
+`host:port` to form the complete listen address of the metrics
+server.

 The purpose of this metric is to power an alert which will summon an
 operator if the cluster cannot reboot itself automatically for a

From 7b02140b278391ea2924648d7923397baf22821b Mon Sep 17 00:00:00 2001
From: Christian Kotzbauer
Date: Sat, 19 Aug 2023 10:48:28 +0200
Subject: [PATCH 3/4] doc: add version range

Signed-off-by: Christian Kotzbauer
---
 content/en/docs/installation.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/content/en/docs/installation.md b/content/en/docs/installation.md
index b74e242..4cba9e7 100644
--- a/content/en/docs/installation.md
+++ b/content/en/docs/installation.md
@@ -24,6 +24,7 @@ server:

 | kured  | {k8s.io/,}kubectl | k8s.io/client-go | k8s.io/apimachinery | expected kubernetes compatibility |
 | ------ | ----------------- | ---------------- | ------------------- | --------------------------------- |
+| 1.14.0 | 0.27.4            | v0.27.4          | v0.27.4             | 1.26.x, 1.27.x, 1.28.x            |
 | 1.13.2 | 0.26.7            | v0.26.7          | v0.26.7             | 1.25.x, 1.26.x, 1.27.x            |
 | 1.12.2 | 0.25.5            | v0.25.5          | v0.25.5             | 1.24.x, 1.25.x, 1.26.x            |
 | 1.11.0 | 0.24.7            | v0.24.7          | v0.24.7             | 1.23.x, 1.24.x, 1.25.x            |

From 0b451168d3b7ad0db2a191a4d50bfe0b21e13636 Mon Sep 17 00:00:00 2001
From: Christian Kotzbauer
Date: Sat, 19 Aug 2023 10:48:39 +0200
Subject: [PATCH 4/4] doc: add new arguments

Signed-off-by: Christian Kotzbauer
---
 content/en/docs/configuration.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/content/en/docs/configuration.md b/content/en/docs/configuration.md
index b0ffefb..16450a4 100644
--- a/content/en/docs/configuration.md
+++ b/content/en/docs/configuration.md
@@ -13,6 +13,7 @@ Usage:

 Flags:
  --alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
+ --alert-filter-match-only Only block if the alert-filter-regexp matches active alerts
  --alert-firing-only only consider firing alerts when checking for active alerts
  --annotate-nodes if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots
  --blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
@@ -51,6 +52,7 @@ Flags:
  --slack-username string slack username for reboot notifications (default "kured")
  --start-time string schedule reboot only after this time of day (default "0:00")
  --time-zone string use this timezone for schedule inputs (default "UTC")
+ --concurrency int amount of nodes to concurrently reboot. (default 1)
 ```

 ## Reboot Sentinel File & Period
@@ -124,6 +126,12 @@ You can also only block reboots for firing alerts:
 --alert-firing-only=true
 ```

+To invert the matching logic, so that only alerts matching the filter can block a reboot:
+
+```console
+--alert-filter-match-only=true
+```
+
 See the section on Prometheus metrics for an important application of this
 filter.

@@ -247,3 +255,12 @@ the daemonset YAML provided in the repository.
 Similarly `--lock-annotation` can be used to change the name of the
 annotation kured will use to store the lock, but the default is almost
 certainly safe.
+
+## Concurrent reboots
+
+> Note: Concurrent reboots are not safe for production environments, as
+> there are no safeguards for workloads running on simultaneously rebooted nodes.
+
+The `--concurrency` flag allows kured to reboot multiple nodes at once.
+For example, with `--concurrency=3`, at most three nodes are rebooted at the same time.
+This is useful for development clusters where workload interruptions are acceptable.
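Patch 2 says the values of `--metrics-host` and `--metrics-port` are combined into one listen address for the metrics server. A minimal Go sketch of that combination follows; it assumes the address is built with the standard library's `net.JoinHostPort`, and the helper `metricsListenAddr` is hypothetical, not taken from kured's source.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsListenAddr is a hypothetical helper that joins the values of
// --metrics-host and --metrics-port into one "host:port" listen address.
// An empty host (the --metrics-host default) yields ":<port>", which Go's
// net package interprets as "listen on all available addresses".
func metricsListenAddr(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(metricsListenAddr("", 8080))          // :8080
	fmt.Println(metricsListenAddr("127.0.0.1", 8080)) // 127.0.0.1:8080
	fmt.Println(metricsListenAddr("::1", 8080))       // [::1]:8080
}
```

Compared with plain string concatenation, `net.JoinHostPort` also wraps IPv6 literals in brackets, so a host such as `::1` still produces a parseable listen address.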
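The concurrent-reboots section in patch 4 describes `--concurrency` as an upper bound on how many nodes reboot at once. The sketch below models that bound as an in-memory lock with a fixed number of slots. kured itself coordinates through a lock placed on its DaemonSet (see `--ds-name` and `--ds-namespace`), so `rebootLock` and its methods are hypothetical and only illustrate the semantics, not kured's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// rebootLock is a hypothetical, in-memory stand-in for a shared reboot lock
// that admits at most maxHolders nodes at a time (like --concurrency).
type rebootLock struct {
	mu         sync.Mutex
	maxHolders int
	holders    map[string]bool
}

func newRebootLock(maxHolders int) *rebootLock {
	return &rebootLock{maxHolders: maxHolders, holders: make(map[string]bool)}
}

// Acquire reports whether node may start its reboot now.
// Re-acquiring a slot the node already holds succeeds without using another slot.
func (l *rebootLock) Acquire(node string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.holders[node] {
		return true
	}
	if len(l.holders) >= l.maxHolders {
		return false
	}
	l.holders[node] = true
	return true
}

// Release frees the node's slot, e.g. after it has rebooted and been uncordoned.
func (l *rebootLock) Release(node string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.holders, node)
}

func main() {
	lock := newRebootLock(3) // as with --concurrency=3
	for _, n := range []string{"node-a", "node-b", "node-c", "node-d"} {
		fmt.Println(n, lock.Acquire(n)) // node-a, node-b, node-c: true; node-d: false
	}
	lock.Release("node-a")
	fmt.Println("node-d", lock.Acquire("node-d")) // node-d true
}
```

With the default of 1, the same structure degenerates to an ordinary mutual-exclusion lock, which matches the one-node-at-a-time behaviour kured has without this flag.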