Cannot create new worker node after upgrade from 4.7.0-0.okd-2021-06-04-191031 - VMware IPI #699
Comments
Relevant Slack thread: https://kubernetes.slack.com/archives/C6BRQSH2S/p1623781138240100 Interestingly enough, I can't reproduce it on a clean install of 4.7.0-0.okd-2021-06-13-090745.
I started with 4.7.0-0.okd-2021-05-22-050008, which still uses FCOS-33.
@aalgera you could check MachineConfigs/99-worker-okd-extensions, and if …
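A minimal sketch of that check, assuming cluster-admin access (the object name comes from the comment above; the rest is standard oc usage, not from the original thread):

```sh
# Dump the okd-extensions MachineConfig and look at which extensions
# (NetworkManager-ovs, glusterfs, ...) it still requests.
oc get machineconfig 99-worker-okd-extensions -o yaml

# List all MachineConfigs that apply to the worker role.
oc get machineconfig -l machineconfiguration.openshift.io/role=worker
```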
This should be resolved by openshift/okd-machine-os#145 and vrutkovs/machine-config-operator@90c5c91
I've removed all extensions and then MCO turned Degraded, because the existing nodes weren't able to uninstall the packages, as their removal was never requested. As the node isn't in production yet, I just removed the node and then MCO reconciled. Although I think there's a Red Hat KB article that provides a workaround to bypass it without removing the node. After that I just scaled up new nodes with success.
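A hedged sketch of the machine-removal workaround described above; the machine name is a hypothetical placeholder, and openshift-machine-api is the standard namespace for IPI-managed machines:

```sh
# Find the machine backing the stuck worker node.
oc get machines -n openshift-machine-api

# Delete it; the MachineSet controller provisions a replacement,
# which then boots with the corrected configuration.
oc delete machine <stuck-worker-machine> -n openshift-machine-api
```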
Verified that scale-up works again: installed 06-04, upgraded to 06-13 and then to the 2021-06-19-191547 nightly. The upgrade worked correctly with 99-okd-master/worker-extensions in place, and the machineset can be scaled up.
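For reference, that verification could look like this (the machineset name and replica count are placeholders, not from the original report):

```sh
# Scale the worker MachineSet up and watch the new machine come up.
oc scale machineset <worker-machineset> -n openshift-machine-api --replicas=3
oc get machines -n openshift-machine-api -w
```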
Fixed in https://amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.7.0-0.okd-2021-06-19-191547, please reopen if this still happens
After upgrading from 4.7.0-0.okd-2021-06-04-191031 to 4.7.0-0.okd-2021-06-13-090745, I am not able to create new worker nodes. I did manage to access the machine and found that its configuration halts due to an error in machine-config-daemon-firstboot.service.
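To get at those logs on the node itself, a standard journalctl query works (not part of the original report):

```sh
# On the affected worker, dump the firstboot service log.
journalctl -u machine-config-daemon-firstboot.service --no-pager
```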
In the logs of this service I found the following error:
```
jun 17 20:54:35 localhost machine-config-daemon[2363]: I0617 20:54:35.639020 2363 rpm-ostree.go:261] Running captured: rpm-ostree update --install NetworkManager-ovs --install glusterfs --install glusterfs-fuse --install qemu-guest-agent
jun 17 20:54:36 localhost machine-config-daemon[2363]: I0617 20:54:36.370100 2363 update.go:439] Rolling back applied changes to OS due to error: error running rpm-ostree update --install NetworkManager-ovs --install glusterfs --install glusterfs-fuse --install qemu-guest-agent: error: "NetworkManager-ovs" is already provided by: NetworkManager-ovs-1:1.30.4-1.fc34.x86_64. Use --allow-inactive to explicitly require it.
jun 17 20:54:36 localhost machine-config-daemon[2363]: : exit status 1
```
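The error message itself hints at the fix: the requested extension is already provided by the FCOS 34 base image, so rpm-ostree refuses a plain --install and suggests --allow-inactive. A sketch based only on that hint (not a verified fix), run manually on an affected node:

```sh
# --allow-inactive lets rpm-ostree record layered-package requests for
# packages the base image already provides (NetworkManager-ovs in FCOS 34).
rpm-ostree update \
  --install NetworkManager-ovs \
  --install glusterfs \
  --install glusterfs-fuse \
  --install qemu-guest-agent \
  --allow-inactive
```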
I did a clean install starting with 4.7.0-0.okd-2021-05-22-050008, updated to 4.7.0-0.okd-2021-06-04-191031 and then to 4.7.0-0.okd-2021-06-13-090745. I didn't see this problem in 4.7.0-0.okd-2021-05-22-050008 or 4.7.0-0.okd-2021-06-04-191031.
Version
4.7.0-0.okd-2021-06-13-090745 - IPI VMware
How reproducible
100% reproducible; I did a clean install of …
Log bundle