When upgrading through suc the hosts file gets extra entries from the pod #2934

Closed
Akvanvig opened this issue Oct 10, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@Akvanvig

Akvanvig commented Oct 10, 2024

Kairos version:

PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
KAIROS_ID_LIKE="kairos-standard-ubuntu-24.04"
KAIROS_IMAGE_LABEL="24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1-k3s1"
KAIROS_ARTIFACT="kairos-ubuntu-24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1+k3s1"
KAIROS_FLAVOR="ubuntu"
KAIROS_FLAVOR_RELEASE="24.04"
KAIROS_FAMILY="ubuntu"
KAIROS_MODEL="generic"
KAIROS_NAME="kairos-standard-ubuntu-24.04"
KAIROS_BUG_REPORT_URL="https://github.com/kairos-io/kairos/issues"
KAIROS_SOFTWARE_VERSION="v1.31.1+k3s1"
KAIROS_TARGETARCH="amd64"
KAIROS_GITHUB_REPO="kairos-io/kairos"
KAIROS_VERSION="v3.2.1-v1.31.1-k3s1"
KAIROS_REGISTRY_AND_ORG="quay.io/kairos"
KAIROS_HOME_URL="https://github.com/kairos-io/kairos"
KAIROS_ID="kairos"
KAIROS_PRETTY_NAME="kairos-standard-ubuntu-24.04 v3.2.1-v1.31.1-k3s1"
KAIROS_IMAGE_REPO="quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.2.1-k3sv1.31.1-k3s1"
KAIROS_VARIANT="standard"
KAIROS_RELEASE="v3.2.1"
KAIROS_SOFTWARE_VERSION_PREFIX="k3s"
KAIROS_VERSION_ID="v3.2.1-v1.31.1-k3s1"

CPU architecture, OS, and Version:

Linux localhost 6.8.0-45-generic #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Describe the bug

When upgrading using the routine documented for system-upgrade-controller, the container's /etc/hosts file seemingly gets merged with the host's /etc/hosts file, which ends up with more and more entries.
This is the state after two upgrades using suc since provisioning:

root@localhost:~# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost
127.0.0.1       localhost
root@localhost:~#
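
To check a node for this symptom without reading the file by eye, a small helper along these lines could be used. It is only a sketch: the function name `duplicate_host_lines` is made up for illustration and is not part of Kairos or yip. It collapses runs of whitespace before comparing, so entries that differ only in spacing count as the same line.

```shell
# Hypothetical helper (not part of Kairos or yip): list every line that
# occurs more than once in a hosts-style file, prefixed by its count.
# Blank lines are skipped; "$1 = $1" makes awk rebuild the line with
# single spaces so spacing differences do not hide duplicates.
duplicate_host_lines() {
    awk 'NF { $1 = $1; count[$0]++ }
         END { for (line in count) if (count[line] > 1) print count[line], line }' \
        "${1:-/etc/hosts}" | LC_ALL=C sort
}
```

Calling `duplicate_host_lines /etc/hosts` prints nothing on a healthy node and one line per repeated entry otherwise.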

To Reproduce

Using the plan shown in the documentation, apply it and upgrade a cluster. After the reboot, check the /etc/hosts file:

root@localhost:~# cat /etc/hosts
127.0.0.1       localhost
root@localhost:~# kubectl apply -f upgrade.yaml 
plan.upgrade.cattle.io/os-upgrade configured
root@localhost:~# 
Broadcast message from root@localhost (Thu 2024-10-10 14:17:44 UTC):

The system will reboot now!
Connection to 192.168.122.63 closed.
$ ssh 192.168.122.63
$ cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost
$

Expected behavior

It should work the same way as when upgrading with the kairos-agent upgrade command directly, and so result in a hosts file equal to the one we started with:

root@localhost:~# cat /etc/hosts
127.0.0.1       localhost
root@localhost:~#

Logs

I have not been able to find any logs indicating what has gone wrong here.

Additional context

This does not seem to affect upgrades using kairos-agent upgrade directly.
My guess is that it's related to the suc upgrade using the container's root as a source, but I haven't yet found how it could be prevented. I'm not aware of any other files being affected in a similar way.

@Akvanvig Akvanvig added bug Something isn't working triage Add this label to issues that should be triaged and prioretized in the next planning call unconfirmed labels Oct 10, 2024
@Itxaka
Member

Itxaka commented Oct 14, 2024

Yep, confirmed.

Built with master, k3s image, set up a single-node k8s cluster, upgraded with system-upgrade-controller: this results in duplicated lines in /etc/hosts

kairos@kairos-k3s:~$ cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1 localhost kairos-k3s
127.0.0.1 localhost kairos-k3s

Not sure what's going on, lol

@Itxaka
Member

Itxaka commented Oct 14, 2024

This can be reproduced by running the initramfs stage several times with kairos-agent run-stage initramfs

It seems like yip is not picking up or checking that the line already exists?
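
For illustration, the kind of "does the line already exist?" check being discussed can be sketched in shell as below. This is not yip's implementation (yip is written in Go); `ensure_line` is a hypothetical name, and the sketch only shows what an idempotent append looks like.

```shell
# Hypothetical sketch of an idempotent append (not yip's code).
# grep flags: -q quiet, -x match the whole line, -F treat the pattern
# as a fixed string rather than a regular expression.
ensure_line() {
    if ! grep -qxF -e "$1" "$2" 2>/dev/null; then
        printf '%s\n' "$1" >> "$2"
    fi
}
```

Running `ensure_line '127.0.0.1 localhost' /etc/hosts` any number of times leaves at most one such line. Note that the match is exact, so an existing entry written with tabs or different spacing would not be recognised as the same line.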

@Itxaka
Member

Itxaka commented Oct 14, 2024

Using yip directly seems to work, though??

@Itxaka
Member

Itxaka commented Oct 14, 2024

Ah, it seems that it's the 31_host file from system/oem, and it's only run in initramfs.before

@Itxaka
Member

Itxaka commented Oct 14, 2024

Yes, somehow the check is failing, so it recreates the hostname...

@Akvanvig
Author

Could it be just the check in 31_host? It seems a bit strange that it would then also add the extra comment from the container, as in the example under "Describe the bug". 🤔
I checked on a cluster that had been upgraded a few times using suc, and it ends up with one extra comment and one extra hosts line for each upgrade, plus the original one.

@Akvanvig
Author

Akvanvig commented Oct 14, 2024

I went and tested in a VM, and it seems to be as you say: the extra 127.0.0.1 localhost line is simply 31_hosts adding an extra line.
The extra comment line seems to be just Kubernetes mounting the node's hosts file into the container and then adding its own comment at the top (again).

I set up a pod that is roughly equivalent to the suc container, and there it is:

root@localhost:~# kubectl exec -it -n system-upgrade suc-busybox-test -- sh
/ # cat /etc/hosts 
# Kubernetes-managed hosts file (host network).
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost
/ # exit

root@localhost:~# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1       localhost
127.0.0.1       localhost

I guess the second problem could be solved by dropping comments from the hosts file in yip (unless that's something you aim not to modify), somewhere in this loop 🤔
https://github.com/mudler/yip/blob/master/pkg/plugins/hostname.go#L74-L82
If modifying yip is not an option, suc-upgrade.sh could maybe be modified to either remove the comments with sed first, or copy the original from /host/etc/hosts and then do the upgrade here:
https://github.com/kairos-io/packages/blob/main/packages/system/suc-upgrade/suc-upgrade.sh#L39
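
As a rough sketch of the "remove comments first" idea, a filter like the one below would strip the comments and the repeated entries. It is hypothetical: `clean_hosts` is not part of suc-upgrade.sh, it uses awk rather than sed, and it drops all comment lines, not only the Kubernetes one. It writes to stdout, because where the cleaned copy could safely be written during an upgrade is exactly the open question.

```shell
# Hypothetical filter (not part of suc-upgrade.sh): print a cleaned copy
# of a hosts file on stdout.
#  - comment lines and blank lines are dropped
#  - whitespace is normalised to single spaces
#  - only the first occurrence of each entry is kept, in original order
clean_hosts() {
    awk '/^[[:space:]]*#/ { next }
         NF == 0 { next }
         { $1 = $1 }
         !seen[$0]++' "$1"
}
```

For the pod's hosts file shown above, `clean_hosts /etc/hosts` prints a single `127.0.0.1 localhost` line.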

@Itxaka
Member

Itxaka commented Oct 15, 2024

This patch seems to alleviate it; after 2 upgrades I no longer get the duplicated entries: kairos-io/packages#1113

I do get duplicated comments, though, and I still don't get why. If k8s mounts /etc/hosts from the host into the container, that's OK, but the upgrade should just ignore that and copy it. Plus, /etc is ephemeral, so after a reboot it should go away?

The only thing I can see touching that file is that YAML file... No idea where the duplication comes from. Could it be that the plugin is adding extra lines somehow? But maybe the underlying /etc in the image does already have the duplicated /etc/hosts??

I'm really confused by this one

@Itxaka Itxaka removed triage Add this label to issues that should be triaged and prioretized in the next planning call unconfirmed labels Oct 15, 2024
@Akvanvig
Author

Looking at the Kairos container image available in the Kairos registries, it contains an etc/hosts file, but the file is empty, which explains why the hosts file is reset once you upgrade with kairos-agent upgrade $image

On the other hand, when it's run as a pod in Kubernetes, Kubernetes gives it a hosts file based on either the cluster network or the node (host network). This seems to be what is causing the problem here. I don't really have a good solution for it, though, as long as the pod filesystem is mounted and used for the upgrade.
My suggestion about overwriting the existing file or copying the file from the host doesn't seem to work, based on some testing, since I couldn't find a way to delete or overwrite the file from within the pod.

Not sure how you could get around this? I assume the upgrade command, when given --source, just takes the entire OS there and packs it up?

kubernetes doc: https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/

@Itxaka
Member

Itxaka commented Oct 16, 2024

> Not sure how you could get around this? I assume the upgrade command when provided the --source just takes the entire OS there and packs it up?

Yep, it does. Maybe we should either skip the hosts file or overwrite it on each boot before filling in the hostname? That way we start from initramfs with a hosts file that we know is "clean" on each boot.
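
The "overwrite on each boot" option can be sketched as follows. This is an illustration only, not the actual cloud-config stage or yip plugin: `write_baseline_hosts` is a hypothetical name, and the single `127.0.0.1 localhost <hostname>` entry is modelled on the output shown earlier in this thread.

```shell
# Hypothetical sketch: rewrite the hosts file from a fixed baseline so
# that nothing from a previous boot or from the upgrade source survives.
# $1 = target file, $2 = hostname (optional).
write_baseline_hosts() {
    if [ -n "$2" ]; then
        printf '127.0.0.1\tlocalhost %s\n' "$2" > "$1"
    else
        printf '127.0.0.1\tlocalhost\n' > "$1"
    fi
}
```

Because the file is truncated and rewritten rather than appended to, running this on every boot gives the same result every time, whatever the file contained before.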

@Itxaka
Member

Itxaka commented Oct 16, 2024

Mmmh, going into the upgrade container I can see this:

/dev/disk/by-label/COS_PERSISTENT on /etc/hosts type ext4 (rw,relatime)

So it's storing the hosts file in the persistent partition, but only in the running container; outside, on the host, /etc/hosts is not persistent...
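
To repeat this check inside a container, the `mount` output can be filtered for a single path. The helper below is a sketch with a made-up name (`mount_source_for`); it assumes the usual Linux `mount` line format, `SOURCE on TARGET type FSTYPE (OPTIONS)`, and does not handle sources or targets that contain spaces.

```shell
# Hypothetical helper: read `mount`-style lines on stdin and print the
# source and filesystem type of anything mounted exactly at the path $1.
mount_source_for() {
    awk -v target="$1" '$2 == "on" && $3 == target { print $1, $5 }'
}
```

Usage: `mount | mount_source_for /etc/hosts`. No output means nothing is mounted directly on that path.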

@Itxaka
Member

Itxaka commented Oct 16, 2024

Somehow, somewhere, with the patch this suddenly seems to be fixed. There is also another patch that may affect this; it changes the config read paths, as we were not reading the current system paths for configs (kairos-io/kairos-agent#579)

I could not reproduce it anymore with framework 2.14.1 (latest agent and cloud configs). I need to try it again tomorrow from a clean image, though, but it may have gone away.

@Itxaka
Member

Itxaka commented Oct 22, 2024

  • Built Kairos from master: earthly +iso --FLAVOR=ubuntu --FLAVOR_RELEASE=24.04 --FAMILY=ubuntu --MODEL=generic --VARIANT=standard --BASE_IMAGE=ubuntu:24.04
  • Installed as normal, checked /etc/hosts, checked the version: KAIROS_VERSION_ID="v3.2.1-23-g409dc0d"
  • Installed system-upgrade-controller
  • Built and pushed an "upgrade" image, which is the same one but with a different version in /etc/kairos-release:
FROM quay.io/kairos/ubuntu:24.04-standard-amd64-generic-v3.2.1-23-g409dc0d
RUN sudo sed -i 's/^KAIROS_VERSION=.*/KAIROS_VERSION="10.0.0"/' /etc/kairos-release
  • Applied the upgrade:
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: os-upgrade
  namespace: system-upgrade
  labels:
    k3s-upgrade: server
spec:
  concurrency: 1
  # This is the version (tag) of the image to upgrade to.
  version: "test5"
  nodeSelector:
    matchExpressions:
      - {key: kubernetes.io/hostname, operator: Exists}
  serviceAccountName: system-upgrade
  cordon: false
  drain:
    force: false
    disableEviction: true
  upgrade:
    # Here goes the image which is tied to the flavor being used.
    # You can also specify your custom image stored in a public registry.
    image: ttl.sh/upgradekairos
    command:
    - "/usr/sbin/suc-upgrade"
  • On reboot, checked the version (KAIROS_VERSION="10.0.0") and /etc/hosts, and they seem okay:
root@localhost:~# cat /etc/hosts
# Kubernetes-managed hosts file (host network).
127.0.0.1	localhost

So I think this is fixed. I'm closing it; please reopen if it happens again :D

@Itxaka Itxaka closed this as completed Oct 22, 2024