
Enterprise Linux Sentinel Command seems bugged. #498

Closed
alb-dev opened this issue Jan 17, 2022 · 15 comments · Fixed by #504

@alb-dev

alb-dev commented Jan 17, 2022

Hello,

Right now I am testing a k3s cluster with Rocky Linux 8.5. I deployed kured via the Helm chart with the following configuration:

  configuration:
      timeZone: "Europe/Berlin"
      startTime: "10pm"
      endTime: "2am"
      messageTemplateDrain: "⏳ Draining node %s"
      messageTemplateReboot: "♻️ Rebooted node %s"
      rebootSentinelCommand: "needs-restarting -r"
      rebootCommand: "/usr/bin/systemctl reboot"
      period: "1h0m0s"  

It seems like "period", "startTime" and "endTime" are ignored.
Besides that, it looks like the result of dnf-utils' "needs-restarting -r" is not picked up properly. Is something wrong with my setup?

Best regards
alb

@ckotzbauer
Member

ckotzbauer commented Jan 18, 2022

Hi @alb-dev,
can you please share the following info?

  • Kubernetes version
  • Kured version
  • Kured Pod-Logs

If there are problems with config parameters or commands, they are visible in the logs. Kured only executes the given commands as-is with "nsenter": https://github.com/weaveworks/kured/blob/96bf7c1addef0b31ec6c8ab49e927480f0c337e7/cmd/kured/main.go#L260-L269

@alb-dev
Author

alb-dev commented Jan 18, 2022

Hey @ckotzbauer
Thanks for your fast response.

Kubernetes version: 1.23.1+k3s2
Kured version: 2.11.2
Kured pod logs:

time="2022-01-18T20:53:28Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2022-01-18T20:53:28Z" level=info msg="Kubernetes Reboot Daemon: 1.9.1"
time="2022-01-18T20:53:28Z" level=info msg="Node ID: wk-intel"
time="2022-01-18T20:53:28Z" level=info msg="Lock Annotation: kube-system/kured:weave.works/kured-node-lock"
time="2022-01-18T20:53:28Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2022-01-18T20:53:28Z" level=info msg="Lock release delay not set, lock will be released immediately after rebooting"
time="2022-01-18T20:53:28Z" level=info msg="PreferNoSchedule taint: "
time="2022-01-18T20:53:28Z" level=info msg="Blocking Pod Selectors: []"
time="2022-01-18T20:53:28Z" level=info msg="Reboot schedule: SunMonTueWedThuFriSat between 22:00 and 02:00 Europe/Berlin"
time="2022-01-18T20:53:28Z" level=info msg="Reboot check command: [needs-restarting -r] every 1h0m0s"
time="2022-01-18T20:53:28Z" level=info msg="Reboot command: [/usr/bin/systemctl reboot]"
time="2022-01-18T20:53:29Z" level=info msg="Waiting for process with pid 8223 to finish." cmd=/usr/bin/nsenter std=out
time="2022-01-18T20:53:29Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-18T20:53:29Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-18T20:53:30Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-18T20:53:30Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-18T20:54:29Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-18T20:54:29Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out

@ckotzbauer
Member

Thanks @alb-dev for the details.
From my perspective, no reboot seems to be required. The log messages "No core libraries or services have been updated since boot-up. Reboot should not be necessary." are emitted by "needs-restarting -r", as we log the stdout of the reboot-sentinel command here.
Are you able to execute "needs-restarting -r" directly on the host? It should print the same output and exit with a non-zero code.
Can you check this?
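A quick way to reproduce what kured concludes from a sentinel command is to run it by hand and look only at its exit status; the text it prints is just logged. The helper below is a hypothetical sketch (`sentinel_verdict` is not part of kured) that applies the rule stated in this thread: exit code 0 means "reboot required", any other code means no reboot.

```sh
#!/bin/sh
# Hypothetical helper: run a sentinel command and report what a runner that
# treats exit code 0 as "reboot required" would conclude.
sentinel_verdict() {
  "$@"                # stdout is informational only (kured just logs it)
  rc=$?               # the exit status is what decides
  echo "exit code: $rc"
  if [ "$rc" -eq 0 ]; then
    echo "verdict: reboot required"
  else
    echo "verdict: no reboot required"
  fi
}
```

On the host, `sentinel_verdict needs-restarting -r` shows both the messages seen in the pod logs and the exit code that actually drives the decision.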

@alb-dev
Author

alb-dev commented Jan 19, 2022

I guess the command is working correctly, @ckotzbauer, as the kernel is up to date and the return code is 0.

What I don't understand is why the sentinel command is executed every minute instead of once an hour, and why the start and end times are ignored.

Thanks for the support!

@ckotzbauer
Member

Okay.

  • When the command exits with code 0, kured assumes that a reboot is required. So you have to wrap the command in a script that only exits with code 0 when a reboot is required.
  • The check is executed at the interval of the period setting (start-time and end-time are ignored here).
  • The start-time and end-time only matter when it comes to reboots. The detection logic can run all the time.
  • The period is set to 1h in your config, but the three command outputs in your logs are only 1s and 1m apart. This is a bit strange. Can you share logs covering a longer period?
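A minimal sketch of such a wrapper for `needs-restarting -r`, assuming (as reported later in this thread) that it exits 1 when a reboot is needed and 0 when it is not. The function name `needs_reboot` is hypothetical; in a standalone script you would call it and `exit` with its status.

```sh
#!/bin/sh
# Hypothetical wrapper: kured treats exit code 0 as "reboot required",
# but `needs-restarting -r` returns 0 when NO reboot is needed, so invert it.
needs_reboot() {
  if "$@"; then
    return 1   # host is up to date: report "no reboot required"
  else
    return 0   # updates need a reboot: report "reboot required"
  fi
}
# Standalone script body: needs_reboot needs-restarting -r; exit $?
```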

@alb-dev
Author

alb-dev commented Jan 20, 2022

Hello

Yeah, no problem. Because of the new 5.1 CVE, a kernel update was released, which may help with testing.

time="2022-01-20T21:26:04Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:26:04Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:27:05Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:27:05Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:28:05Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:28:05Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:29:05Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:29:05Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:30:06Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:30:06Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:31:06Z" level=info msg="No core libraries or services have been updated since boot-up." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:31:06Z" level=info msg="Reboot should not be necessary." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:32:07Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:32:07Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:32:07Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:32:07Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:32:07Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:33:07Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:33:07Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:33:07Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:33:07Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:33:07Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:34:08Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:34:08Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:34:08Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:34:08Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:34:08Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:35:08Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:35:08Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:35:08Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:35:08Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:35:08Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:36:09Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:36:09Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:36:09Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:36:09Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-01-20T21:36:09Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out

The kernel update is registered by "needs-restarting", which returns exit code 1 if a reboot is required. Neither 0 nor 1 triggers a reboot at the moment.

@ckotzbauer
Member

Hm, the time in the logs is 21:36:09 (UTC), i.e. 22:36 in UTC+1, and the time frame that allows reboots is 22:00 to 02:00 (UTC+1), so I see no reason why the reboot is not triggered. I think the exit-code handling of custom sentinel commands may not be working correctly in your case.
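The reboot window in this configuration wraps past midnight (22:00 to 02:00), so a plain `start <= now < end` comparison would never match. The sketch below checks a wrapping window in minutes since midnight; the function names and the choice to treat the end time as exclusive are assumptions for illustration, not kured's implementation.

```sh
#!/bin/sh
# Convert "HH:MM" to minutes since midnight.
to_minutes() {
  # expr parses "08" as decimal 8; $(( 08 )) would be rejected as octal.
  h=$(expr "${1%%:*}" + 0)
  m=$(expr "${1#*:}" + 0)
  echo $(( h * 60 + m ))
}

# Succeeds if NOW lies in [START, END). When START > END the window
# wraps past midnight, e.g. 22:00 to 02:00.
in_window() {
  now=$(to_minutes "$1")
  start=$(to_minutes "$2")
  end=$(to_minutes "$3")
  if [ "$start" -le "$end" ]; then
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
  else
    [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
  fi
}
```

To apply it to a log timestamp, first convert to local time, e.g. with GNU date: `TZ=Europe/Berlin date -d '2022-01-20 21:36:09 UTC' +%H:%M` should print `22:36`, and `in_window 22:36 22:00 02:00` then succeeds.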

@ckotzbauer ckotzbauer added bug and removed question labels Jan 24, 2022
@alb-dev
Author

alb-dev commented Jan 24, 2022

I will try to build a wrapper so that only 1 or 0 is returned. Maybe this will work for EL distros.

@stephenl03

Would something as simple as this wrapper script be sufficient? I'm about to deploy kured to a Fedora-based cluster and stumbled upon this open issue during my research.

needs-restarting -r && exit 1 || exit 0
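That one-liner does invert the result, but `a && b || c` also runs `exit 0` whenever `needs-restarting` fails for any other reason (for example exit status 127 if it is not installed on the host), which kured would read as "reboot required". A stricter sketch, assuming exit code 1 means "reboot needed" as reported in this thread, only signals a reboot for that exact code. The function name `strict_needs_reboot` is hypothetical.

```sh
#!/bin/sh
# Hypothetical strict wrapper around a reboot check such as `needs-restarting -r`.
# Assumption (from this thread): exit 0 = no reboot needed, exit 1 = reboot needed.
# Result for kured: 0 = reboot required, non-zero = no reboot (or check failed).
strict_needs_reboot() {
  "$@"
  rc=$?
  case "$rc" in
    0) return 1 ;;    # check says the host is current
    1) return 0 ;;    # check says a reboot is needed
    *) echo "reboot check failed with exit code $rc" >&2
       return 2 ;;    # anything else: do not trigger a reboot
  esac
}
# Standalone script body: strict_needs_reboot needs-restarting -r; exit $?
```

Returning a distinct non-zero code for a failed check keeps a broken or missing `needs-restarting` from turning into a reboot request.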

@khuedoan
Contributor

khuedoan commented Feb 22, 2022

I've tried the following config (and other combinations) but it doesn't seem to work:

configuration:
  rebootSentinelCommand: 'sh -c "! needs-restarting --reboothint"'

time="2022-02-22T19:30:04Z" level=info msg="Binding node-id command flag to environment variable: KURED_NODE_ID"
time="2022-02-22T19:30:04Z" level=info msg="Kubernetes Reboot Daemon: 1.9.1"
time="2022-02-22T19:30:04Z" level=info msg="Node ID: metal0"
time="2022-02-22T19:30:04Z" level=info msg="Lock Annotation: kured/kured:weave.works/kured-node-lock"
time="2022-02-22T19:30:04Z" level=info msg="Lock TTL not set, lock will remain until being released"
time="2022-02-22T19:30:04Z" level=info msg="Lock release delay not set, lock will be released immediately after rebooting"
time="2022-02-22T19:30:04Z" level=info msg="PreferNoSchedule taint: "
time="2022-02-22T19:30:04Z" level=info msg="Blocking Pod Selectors: []"
time="2022-02-22T19:30:04Z" level=info msg="Reboot schedule: SunMonTueWedThuFriSat between 00:00 and 23:59 UTC"
time="2022-02-22T19:30:04Z" level=info msg="Reboot check command: [sh -c ! needs-restarting --reboothint] every 1h0m0s"
time="2022-02-22T19:30:04Z" level=info msg="Reboot command: [/bin/systemctl reboot]"
time="2022-02-22T19:30:04Z" level=info msg="Waiting for process with pid 215543 to finish." cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:04Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:04Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:04Z" level=info msg="  * systemd" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:04Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:04Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:04Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:05Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:05Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:05Z" level=info msg="  * systemd" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:05Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:05Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:30:05Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:31:04Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:31:04Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:31:04Z" level=info msg="  * systemd" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:31:04Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:31:04Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:31:04Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:32:04Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:32:04Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:32:04Z" level=info msg="  * systemd" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:32:04Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:32:04Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:32:04Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:33:05Z" level=info msg="Core libraries or services have been updated since boot-up:" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:33:05Z" level=info msg="  * kernel" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:33:05Z" level=info msg="  * systemd" cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:33:05Z" level=info cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:33:05Z" level=info msg="Reboot is required to fully utilize these updates." cmd=/usr/bin/nsenter std=out
time="2022-02-22T19:33:05Z" level=info msg="More information: https://access.redhat.com/solutions/27943" cmd=/usr/bin/nsenter std=out

Running the command manually on the host returns 0:

[root@metal0 ~]# sh -c "! needs-restarting -r"
Core libraries or services have been updated since boot-up:
  * kernel
  * systemd

Reboot is required to fully utilize these updates.
More information: https://access.redhat.com/solutions/27943
[root@metal0 ~]# echo $?
0

And I get the same return code if I run it with nsenter:

[root@metal1 ~]# /usr/bin/nsenter sh -c "! needs-restarting -r"
Core libraries or services have been updated since boot-up:
  * kernel
  * systemd

Reboot is required to fully utilize these updates.
More information: https://access.redhat.com/solutions/27943
[root@metal1 ~]# echo $?
0

If I remove sh -c, it shows:

nsenter: failed to execute !: No such file or directory
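This error fits how the command is run: judging by the logged "Reboot check command: [...]" list and this nsenter message, the configured string is split into an argument list and executed directly, without a shell, so `!` is looked up as a program name instead of being treated as shell negation. The sketch below imitates that with `env`, which also executes its arguments directly; `run_argv` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: execute an argument list directly, with no shell
# parsing, roughly the way an exec-based runner such as nsenter does.
run_argv() {
  env "$@"
}

# '!' is not a program, so env cannot find it (POSIX exit status 127).
run_argv '!' true
echo "without sh -c: $?"

# With sh -c, the shell parses '!' and negates the exit status of `true`.
run_argv sh -c '! true'
echo "with sh -c: $?"
```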

@khuedoan
Contributor

My bad, I needed to wait about an hour. The following config in values.yaml works:

configuration:
  rebootSentinelCommand: 'sh -c "! needs-restarting --reboothint"'

@alb-dev
Author

alb-dev commented Feb 23, 2022

> My bad, I needed to wait about an hour. The following config in values.yaml works:
>
> configuration:
>   rebootSentinelCommand: 'sh -c "! needs-restarting --reboothint"'

Nice. I built a cron wrapper that creates the reboot-required file, which is kind of dirty. I like your solution. Maybe this could be added to the docs?
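The cron-based alternative mentioned here could look roughly like the sketch below: a script run periodically on each host that creates the file kured watches when the check reports a pending reboot, and removes it otherwise. It assumes kured's default `--reboot-sentinel` path `/var/run/reboot-required` (verify against your deployment) and, as elsewhere in this thread, that `needs-restarting -r` exits 1 when a reboot is needed. The function name `update_sentinel` is hypothetical.

```sh
#!/bin/sh
# Hypothetical cron job: translate a reboot check into kured's file sentinel.
# Usage: update_sentinel FILE CHECK_COMMAND...
#   e.g. update_sentinel /var/run/reboot-required needs-restarting -r
update_sentinel() {
  sentinel=$1
  shift
  "$@" >/dev/null
  rc=$?
  if [ "$rc" -eq 1 ]; then
    touch "$sentinel"          # reboot needed: create the sentinel file
  elif [ "$rc" -eq 0 ]; then
    rm -f "$sentinel"          # host is current: remove any stale file
  else
    echo "reboot check failed with exit code $rc" >&2
    return "$rc"               # leave the sentinel untouched on errors
  fi
}
```

Compared with the `sh -c "! ..."` sentinel command, this adds a moving part on every host, which is likely why it felt "dirty"; its advantage is that kured's default file-based configuration stays unchanged.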

@khuedoan
Contributor

Sure, I've created a PR.

@radhika-pr

radhika-pr commented Sep 28, 2023

It looks like the issue still exists on Red Hat (Red Hat Enterprise Linux release 8.8 (Ootpa)).
Kured Helm chart release: 5.2.0

The configurations tested were:

rebootSentinelCommand: 'sh -c "! needs-restarting --reboothint"'

rebootSentinelCommand: sh -c "! needs-restarting --reboothint"

The default period is used.
The configuration looks like this:

configuration:
  timeZone: "UTC"
  startTime: "8:00"
  endTime: "20:00"
  rebootDays: [mo,tu,we,th]
  drainTimeout: "30m"
  drainPodSelector: "longhorn.io/component!=instance-manager"
  forceReboot: false
  lockReleaseDelay: "30m"
  rebootSentinelCommand: sh -c "! needs-restarting --reboothint" 
  rebootCommand: sh -c "/bin/systemctl disable reboot-guard.service;reboot"
  logFormat: "json"
  metricsPort: 8089  

tolerations:
- key: node-role.kubernetes.io/controller
  operator: Exists

hostNetwork: true

service:
  create: true
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "8089"
  name: kured

Logs from the worker node:

{"cmd":"/usr/bin/nsenter","level":"info","msg":"Reboot should not be necessary.","std":"out","time":"2023-09-28T07:29:24Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Updating Subscription Management repositories.","std":"out","time":"2023-09-28T07:30:25Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"No core libraries or services have been updated since boot-up.","std":"out","time":"2023-09-28T07:30:26Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Reboot should not be necessary.","std":"out","time":"2023-09-28T07:30:26Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Updating Subscription Management repositories.","std":"out","time":"2023-09-28T07:31:26Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"No core libraries or services have been updated since boot-up.","std":"out","time":"2023-09-28T07:31:27Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Reboot should not be necessary.","std":"out","time":"2023-09-28T07:31:27Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Updating Subscription Management repositories.","std":"out","time":"2023-09-28T07:32:27Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"No core libraries or services have been updated since boot-up.","std":"out","time":"2023-09-28T07:32:28Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Reboot should not be necessary.","std":"out","time":"2023-09-28T07:32:28Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Updating Subscription Management repositories.","std":"out","time":"2023-09-28T07:33:29Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"No core libraries or services have been updated since boot-up.","std":"out","time":"2023-09-28T07:33:30Z"}
{"cmd":"/usr/bin/nsenter","level":"info","msg":"Reboot should not be necessary.","std":"out","time":"2023-09-28T07:33:30Z"}

@llyons

llyons commented Jun 21, 2024

Where do you add the

  rebootSentinelCommand: "needs-restarting -r"
  rebootCommand: "/usr/bin/systemctl reboot"

in the 1.15.1 version of the kured yaml? I don't see where a configuration section is defined.

I do see this:

```
template:
  metadata:
    creationTimestamp: null
    labels:
      name: kured
  spec:
    containers:
    - command:
      - /usr/bin/kured
      - --reboot-sentinel=/sentinel/reboot-required
      - --period=30m
      - --message-template-drain=Draining node %s
      - --message-template-reboot=Rebooting node %s
      - --message-template-uncordon=Node %s rebooted & uncordoned successfully!
      - --reboot-days=sun,mon,tue,wed,thu,fri,sat
      - --reboot-delay=90s
      - --start-time=10pm
      - --end-time=1am
      - --time-zone=America/Chicago
      - --log-format=text
      env:
      - name: KURED_NODE_ID
        valueFrom:
          fieldRef:
            apiVersion: v1
            fieldPath: spec.nodeName
```
