Skip to content

Commit

Permalink
Add label-with-exclude-from-external-lbs CLI argument to enable gra…
Browse files Browse the repository at this point in the history
…ceful removal/addition from external load balancers

Previously, kured issued the system reboot command without first
removing nodes from any connected external load balancers (ELBs).

This behavior caused downtime on restart because ELBs send traffic to
kube-proxy pods running on nodes until the ELB health checks fail or the
node is de-registered explicitly.

This patch solves the problem by adding a command line argument
(`label-with-exclude-from-external-lbs`) that, when enabled, adds a
"node.kubernetes.io/exclude-from-external-load-balancers" label to nodes
undergoing kured reboot. This label tells the Kubernetes control plane
to de-register the affected node from any connected ELBs. The node label
is removed after restart which causes the control plane to re-register
the node with the ELBs.

Close #358
  • Loading branch information
amorey committed Aug 26, 2021
1 parent 3682eb3 commit a9cdeb4
Show file tree
Hide file tree
Showing 7 changed files with 167 additions and 30 deletions.
59 changes: 30 additions & 29 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -84,35 +84,36 @@ The following arguments can be passed to kured via the daemonset pod template:

```console
Flags:
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--alert-firing-only bool only consider firing alerts when checking for active alerts
--blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
--drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
--skip-wait-for-delete-timeout int when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
--end-time string schedule reboot only before this time of day (default "23:59:59")
--force-reboot bool force a reboot even if the drain is still running (default: false)
--drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
-h, --help help for kured
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--lock-release-delay duration hold lock after reboot by this duration (default: 0, disabled)
--lock-ttl duration expire lock annotation after this duration (default: 0, disabled)
--message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
--message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
--notify-url url for reboot notifications (cannot use with --slack-hook-url flags)
--period duration reboot check period (default 1h0m0s)
--prefer-no-schedule-taint string Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-command string command to run when a reboot is required by the sentinel (default "/sbin/systemctl reboot")
--reboot-days strings schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-sentinel string path to file whose existence signals need to reboot (default "/var/run/reboot-required")
--reboot-sentinel-command string command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
--slack-channel string slack channel for reboot notfications
--slack-hook-url string slack hook URL for reboot notfications [deprecated in favor of --notify-url]
--slack-username string slack username for reboot notfications (default "kured")
--start-time string schedule reboot only after this time of day (default "0:00")
--time-zone string use this timezone for schedule inputs (default "UTC")
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--alert-firing-only bool only consider firing alerts when checking for active alerts
--blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
--drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default: -1)
--skip-wait-for-delete-timeout int when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node (default: 0)
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
--end-time string schedule reboot only before this time of day (default "23:59:59")
--force-reboot bool force a reboot even if the drain is still running (default: false)
--drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
-h, --help help for kured
--label-with-exclude-from-external-lbs add "node.kubernetes.io/exclude-from-external-load-balancers" label to nodes undergoing kured reboot (default: false)
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--lock-release-delay duration hold lock after reboot by this duration (default: 0, disabled)
--lock-ttl duration expire lock annotation after this duration (default: 0, disabled)
--message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
--message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
--notify-url url for reboot notifications (cannot use with --slack-hook-url flags)
--period duration reboot check period (default 1h0m0s)
--prefer-no-schedule-taint string Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-command string command to run when a reboot is required by the sentinel (default "/sbin/systemctl reboot")
--reboot-days strings schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-sentinel string path to file whose existence signals need to reboot (default "/var/run/reboot-required")
--reboot-sentinel-command string command for which a successful run signals need to reboot (default ""). If non-empty, sentinel file will be ignored.
--slack-channel string slack channel for reboot notfications
--slack-hook-url string slack hook URL for reboot notfications [deprecated in favor of --notify-url]
--slack-username string slack username for reboot notfications (default "kured")
--start-time string schedule reboot only after this time of day (default "0:00")
--time-zone string use this timezone for schedule inputs (default "UTC")
```

### Reboot Sentinel File & Period
Expand Down
1 change: 1 addition & 0 deletions charts/kured/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,7 @@ The following changes have been made compared to the stable chart:
| `configuration.startTime` | cli-parameter `--start-time` | `""` |
| `configuration.timeZone` | cli-parameter `--time-zone` | `""` |
| `configuration.annotateNodes` | cli-parameter `--annotate-nodes` | `false` |
| `configuration.labelWithELBX` | cli-parameter `--label-with-exclude-from-external-lbs` | `false` |
| `rbac.create` | Create RBAC roles | `true` |
| `serviceAccount.create` | Create a service account | `true` |
| `serviceAccount.name` | Service account name to create (or use if `serviceAccount.create` is false) | (chart fullname) |
Expand Down
3 changes: 3 additions & 0 deletions charts/kured/templates/daemonset.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,9 @@ spec:
{{- if .Values.configuration.annotateNodes }}
- --annotate-nodes={{ .Values.configuration.annotateNodes }}
{{- end }}
{{- if .Values.configuration.labelWithELBX }}
- --label-with-exclude-from-external-lbs={{ .Values.configuration.labelWithELBX }}
{{- end }}
{{- range $key, $value := .Values.extraArgs }}
{{- if $value }}
- --{{ $key }}={{ $value }}
Expand Down
1 change: 1 addition & 0 deletions charts/kured/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ configuration:
startTime: "" # only reboot after this time of day (default "0:00")
timeZone: "" # time-zone to use (valid zones from "time" golang package)
annotateNodes: false # enable 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' node annotations to signify kured reboot operations
labelWithELBX: false # add "node.kubernetes.io/exclude-from-external-load-balancers" label to nodes undergoing kured reboot
lockReleaseDelay: 0 # hold lock after reboot by this amount of time (default 0, disabled)

rbac:
Expand Down
71 changes: 70 additions & 1 deletion cmd/kured/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ var (
rebootEnd string
timezone string
annotateNodes bool
labelWithELBX bool

// Metrics
rebootRequiredGauge = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Expand All @@ -88,6 +89,15 @@ const (
KuredMostRecentRebootNeededAnnotation string = "weave.works/kured-most-recent-reboot-needed"
)

const (
// ExcludeFromELBsLabelKey is a label key that tells the K8S control plane to exclude a node from external load balancers
ExcludeFromELBsLabelKey = "node.kubernetes.io/exclude-from-external-load-balancers"
// ExcludeFromELBsLabelVal is a label value used to track label placement by kured
ExcludeFromELBsLabelVal = "kured-remove-after-reboot"
// ExcludeFromELBsLabelKeyEscaped is the escaped label key value passed to the Patch() function
ExcludeFromELBsLabelKeyEscaped = "node.kubernetes.io~1exclude-from-external-load-balancers"
)

func init() {
prometheus.MustRegister(rebootRequiredGauge)
}
Expand Down Expand Up @@ -163,7 +173,9 @@ func main() {

rootCmd.PersistentFlags().BoolVar(&annotateNodes, "annotate-nodes", false,
"if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots")

rootCmd.PersistentFlags().BoolVar(&labelWithELBX, "label-with-exclude-from-external-lbs", false,
"if set, the label 'node.kubernetes.io/exclude-from-external-load-balancers' will be added to nodes undergoing kured reboots")

if err := rootCmd.Execute(); err != nil {
log.Fatal(err)
}
Expand Down Expand Up @@ -348,6 +360,54 @@ func release(lock *daemonsetlock.DaemonSetLock) {
}
}

func enableExcludeFromELBs(client kubernetes.Interface, nodeID string) error {
log.Infof("Adding ExcludeFromELBs label to node")

// Add ExcludeFromELBs node label
labelPatch := fmt.Sprintf(`[{"op":"add","path":"/metadata/labels/%s","value":"%s" }]`, ExcludeFromELBsLabelKeyEscaped, ExcludeFromELBsLabelVal)
_, err := client.CoreV1().Nodes().Patch(context.Background(), nodeID, types.JSONPatchType, []byte(labelPatch), metav1.PatchOptions{})
if err != nil {
log.Errorf("Unable to add ExcludeFromELBs label to node: %s", err.Error())
return err
}

return nil
}

func disableExcludeFromELBs(client kubernetes.Interface, nodeID string) error {
ctx := context.Background()

// Get node
node, err := client.CoreV1().Nodes().Get(ctx, nodeID, metav1.GetOptions{})
if err != nil {
log.Warnf("Unable to find node: %s", nodeID)
return err
}

// Check ExcludeFromELBs node label
labelVal, ok := node.Labels[ExcludeFromELBsLabelKey]
if !ok {
return nil
}

// Different label value found
if labelVal != ExcludeFromELBsLabelVal {
log.Debugf("Found ExcludeFromELBs label on node with value: '%s' (no action taken)", labelVal)
return nil
}

// Remove ExcludeFromELBs node label
labelPatch := fmt.Sprintf(`[{"op":"remove","path":"/metadata/labels/%s"}]`, ExcludeFromELBsLabelKeyEscaped)
_, err = client.CoreV1().Nodes().Patch(ctx, nodeID, types.JSONPatchType, []byte(labelPatch), metav1.PatchOptions{})
if err != nil {
log.Errorf("Unable to remove ExcludeFromELBs label from node: %s", err.Error())
return err
}

log.Infof("Removed ExcludeFromELBs label from node")
return nil
}

func drain(client *kubernetes.Clientset, node *v1.Node) {
nodename := node.GetName()

Expand Down Expand Up @@ -492,6 +552,9 @@ func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []s
log.Fatal(err)
}

// Remove ExcludeFromELBs label immediately to allow ELB registration
disableExcludeFromELBs(client, nodeID)

lock := daemonsetlock.New(client, nodeID, dsNamespace, dsName, lockAnnotation)

nodeMeta := nodeMeta{}
Expand All @@ -500,6 +563,7 @@ func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []s
if err != nil {
log.Fatalf("Error retrieving node object via k8s API: %v", err)
}

if !nodeMeta.Unschedulable {
uncordon(client, node)
}
Expand All @@ -513,6 +577,7 @@ func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []s
deleteNodeAnnotation(client, nodeID, KuredRebootInProgressAnnotation)
}
}

throttle(releaseDelay)
release(lock)
}
Expand Down Expand Up @@ -583,6 +648,10 @@ func rebootAsRequired(nodeID string, rebootCommand []string, sentinelCommand []s
continue
}

if labelWithELBX {
enableExcludeFromELBs(client, nodeID)
}

drain(client, node)

if rebootDelay > 0 {
Expand Down
61 changes: 61 additions & 0 deletions cmd/kured/main_test.go
Original file line number Diff line number Diff line change
@@ -1,12 +1,16 @@
package main

import (
"context"
"reflect"
"testing"

log "github.com/sirupsen/logrus"
"github.com/weaveworks/kured/pkg/alerts"
assert "gotest.tools/v3/assert"
k8sv1 "k8s.io/api/core/v1"
k8smetav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
k8sfake "k8s.io/client-go/kubernetes/fake"

papi "github.com/prometheus/client_golang/api"
)
Expand Down Expand Up @@ -223,3 +227,60 @@ func Test_rebootRequired_fatals(t *testing.T) {
}

}

func newK8STestClient() *k8sfake.Clientset {
// Init test client and add a node
client := k8sfake.NewSimpleClientset()

node := &k8sv1.Node{
ObjectMeta: k8smetav1.ObjectMeta{
Name: "test-node-id",
Labels: map[string]string{"key1": "val1"}, // seed map so patch `add` operation will work
},
}

_, err := client.CoreV1().Nodes().Create(context.TODO(), node, k8smetav1.CreateOptions{})
if err != nil {
log.Fatal(err.Error())
}

return client
}

func Test_enableExcludeFromELBs(t *testing.T) {
testclient := newK8STestClient()

// Test that method returns error if node doesn't exist
err := enableExcludeFromELBs(testclient, "doesnt-exist")
assert.Error(t, err, "nodes \"doesnt-exist\" not found")

// Test that method adds ExcludeFromELBs label if node exists
err = enableExcludeFromELBs(testclient, "test-node-id")
assert.Assert(t, err == nil)

nodes, err := testclient.CoreV1().Nodes().List(context.TODO(), k8smetav1.ListOptions{})
labels := nodes.Items[0].Labels
assert.Equal(t, labels[ExcludeFromELBsLabelKey], ExcludeFromELBsLabelVal)
}

func Test_disableExcludeFromELBs(t *testing.T) {
testclient := newK8STestClient()

// Test that method returns error if node doesn't exist
err := disableExcludeFromELBs(testclient, "doesnt-exist")
assert.Error(t, err, "nodes \"doesnt-exist\" not found")

// Test that method executes silently on existing node without pre-existing ExcludeFromELBs label
err = disableExcludeFromELBs(testclient, "test-node-id")
assert.Assert(t, err == nil)

// Add ExcludeFromELBs label and check that method removes it
enableExcludeFromELBs(testclient, "test-node-id")
err = disableExcludeFromELBs(testclient, "test-node-id")
assert.Assert(t, err == nil)

nodes, err := testclient.CoreV1().Nodes().List(context.TODO(), k8smetav1.ListOptions{})
labels := nodes.Items[0].Labels
_, ok := labels[ExcludeFromELBsLabelKey]
assert.Assert(t, !ok)
}
1 change: 1 addition & 0 deletions kured-ds.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -73,4 +73,5 @@ spec:
# - --end-time=23:59:59
# - --time-zone=UTC
# - --annotate-nodes=false
# - --label-with-exclude-from-external-lbs=false
# - --lock-release-delay=30m

0 comments on commit a9cdeb4

Please sign in to comment.