
Cannot create new worker node after upgrade from 4.7.0-0.okd-2021-06-04-191031 - VMWare IPI #699

Closed
aalgera opened this issue Jun 17, 2021 · 8 comments

Comments

@aalgera

aalgera commented Jun 17, 2021

After upgrading from 4.7.0-0.okd-2021-06-04-191031 to 4.7.0-0.okd-2021-06-13-090745, I am no longer able to create new worker nodes. I can access the new machine and see that its configuration halts due to an error in machine-config-daemon-firstboot.service.
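For reference, the failing unit can be inspected on the machine with something like the following (a sketch; the node IP and the core user are assumptions for a stock FCOS-based IPI install):

# SSH to the partially provisioned machine (IP from the Machine object or the vSphere console)
ssh core@<new-worker-ip>

# Check the unit that applies the rendered MachineConfig on first boot
systemctl status machine-config-daemon-firstboot.service
journalctl -u machine-config-daemon-firstboot.service -b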

In the logs of this service I found the following error:
jun 17 20:54:35 localhost machine-config-daemon[2363]: I0617 20:54:35.639020 2363 rpm-ostree.go:261] Running captured: rpm-ostree update --install NetworkManager-ovs --install glusterfs --install glusterfs-fuse --install qemu-guest-agent
jun 17 20:54:36 localhost machine-config-daemon[2363]: I0617 20:54:36.370100 2363 update.go:439] Rolling back applied changes to OS due to error: error running rpm-ostree update --install NetworkManager-ovs --install glusterfs --install glusterfs-fuse --install qemu-guest-agent: error: "NetworkManager-ovs" is already provided by: NetworkManager-ovs-1:1.30.4-1.fc34.x86_64. Use --allow-inactive to explicitly require it.
jun 17 20:54:36 localhost machine-config-daemon[2363]: : exit status 1
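The rejected layering step can be reproduced by hand on the node (a sketch, mirroring the command from the log above; on FCOS 34 NetworkManager-ovs already ships in the base image, which is why rpm-ostree refuses the plain install and suggests --allow-inactive):

# The exact command the MCD runs, taken from the log above; fails on FCOS 34
sudo rpm-ostree update --install NetworkManager-ovs --install glusterfs --install glusterfs-fuse --install qemu-guest-agent

# rpm-ostree's own suggestion for a package the base image already provides
sudo rpm-ostree install --allow-inactive NetworkManager-ovs

Whether --allow-inactive is the right fix is up to the MCO / okd-machine-os side; this just shows why the firstboot service bails out.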
I did a clean install starting with 4.7.0-0.okd-2021-05-22-050008, updated to 4.7.0-0.okd-2021-06-04-191031, and then to 4.7.0-0.okd-2021-06-13-090745. I didn't see this problem in 4.7.0-0.okd-2021-05-22-050008 or 4.7.0-0.okd-2021-06-04-191031.

Version
4.7.0-0.okd-2021-06-13-090745 - VMware IPI

How reproducible
100% reproducible. I did a clean install of 4.7.0-0.okd-2021-05-22-050008 and upgraded through 4.7.0-0.okd-2021-06-04-191031 to 4.7.0-0.okd-2021-06-13-090745.

Log bundle

@vrutkovs
Member

vrutkovs commented Jun 17, 2021

Relevant Slack thread: https://kubernetes.slack.com/archives/C6BRQSH2S/p1623781138240100

Interestingly enough, I can't reproduce it on a clean install of 4.7.0-0.okd-2021-06-13-090745.
I'll check if it's happening on the 2021-06-04 -> 2021-06-13 upgrade and then try the scale-up.

@aalgera
Author

aalgera commented Jun 17, 2021

I started with 4.7.0-0.okd-2021-05-22-050008, which still uses FCOS 33.

@Udbv

Udbv commented Jun 18, 2021

@aalgera you could check the MachineConfig 99-worker-okd-extensions, and if NetworkManager-ovs is listed there, remove it.
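Something along these lines should show whether the extension is still carried by the worker pool (a sketch; adjust the object name if your pool differs):

# Inspect the extensions in the MachineConfig
oc get machineconfig 99-worker-okd-extensions -o yaml

# Drop NetworkManager-ovs from the list if it is present
oc edit machineconfig 99-worker-okd-extensions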

@vrutkovs
Member

vrutkovs commented Jun 18, 2021

This should be resolved by openshift/okd-machine-os#145 and vrutkovs/machine-config-operator@90c5c91

@uselessidbr

@aalgera you could check the MachineConfig 99-worker-okd-extensions, and if NetworkManager-ovs is listed there, remove it.

I removed all extensions, and then the MCO went into a degraded state because the existing nodes could not uninstall packages that were no longer requested. Since the node isn't in production yet, I just deleted the node and the MCO reconciled. I think there is also a Red Hat KB article with a workaround that avoids removing the node.

After that I just scaled up new nodes with success.
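For reference, the workaround above boils down to roughly this (a sketch; node and machine names are placeholders):

# Confirm the machine-config operator / worker pool is degraded
oc get clusteroperator machine-config
oc get machineconfigpool worker

# The stuck worker was not in production, so deleting it let the MCO reconcile
oc delete node <stuck-worker-node>
oc delete machine <stuck-worker-machine> -n openshift-machine-api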

@vrutkovs
Member

Verified the scale-up works again: installed 06-04, upgraded to 06-13 and then to the 2021-06-19-191547 nightly. The upgrade worked correctly with 99-okd-master/worker-extensions, and the machineset can be scaled up.
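The scale-up check itself is just the usual machineset bump (a sketch; the machineset name and replica count are placeholders):

# Bump the worker machineset and watch the new Machine and Node come up
oc get machinesets -n openshift-machine-api
oc scale machineset <worker-machineset> -n openshift-machine-api --replicas=<n>
oc get machines -n openshift-machine-api -w
oc get nodes -w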


@yaroslavkasatikov

yaroslavkasatikov commented Jul 14, 2021

Hi @vrutkovs

We are seeing this issue again in #441.
