etcd upgrade failure from kubernetes 1.21.4 -> 1.22.1 #7937

Closed
cristicalin opened this issue Sep 4, 2021 · 1 comment · Fixed by #7938
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@cristicalin
Contributor

During an upgrade from kubernetes 1.21.4 (with containerd and calico) to kubernetes 1.22.1, the etcd upgrade failed and the upgrade was rolled back.

Environment:

  config file = /root/kubespray/ansible.cfg
  configured module search path = ['/root/kubespray/library']
  ansible python module location = /root/venv/lib/python3.8/site-packages/ansible
  executable location = /root/venv/bin/ansible
  python version = 3.8.10 (default, Jun  2 2021, 10:49:15) [GCC 9.4.0]

Version of Python (python --version):

Python 3.8.10

Kubespray version (commit) (git rev-parse --short HEAD):

d5e8797a (with crictl fixes from https://github.com/kubernetes-sigs/kubespray/pull/7936)

Network plugin used:

Calico 3.20.0

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

ubuntu-nuc-00.kaveman.intra | SUCCESS => {
    "hostvars[inventory_hostname]": {                                                                    
        "ansible_check_mode": false, 
        "ansible_config_file": "/root/kubespray/ansible.cfg",        
        "ansible_connection": "local",       
        "ansible_diff_mode": false,                                                                      
        "ansible_facts": {},                                                                             
        "ansible_forks": 5,                  
        "ansible_inventory_sources": [         
            "/root/inventory.ini"                                                                        
        ],                                                                                               
        "ansible_playbook_python": "/root/venv/bin/python",
        "ansible_verbosity": 0,
        "ansible_version": {       
            "full": "2.10.11",    
            "major": 2,                    
            "minor": 10,                   
            "revision": 11,      
            "string": "2.10.11"                 
        },                                                                                               
        "calico_advertise_cluster_ips": true,                                                            
        "calico_advertise_service_loadbalancer_ips": [                             
            "10.5.0.0/16"                  
        ],                               
        "calico_datastore": "kdd",                                                                       
        "calico_felix_prometheusmetricsenabled": true,                                                   
        "calico_ip_auto_method: \"interface": "eno1\"\"",
        "calico_ipip_mode": "Never",
        "calico_iptables_backend": "NFT",
        "calico_version": "v3.20.0",
        "calico_vxlan_mode": "Never",
        "cert_manager_enabled": false,
        "container_manager": "containerd",
        "dns_min_replicas": 1,
        "dns_prevent_single_point_failure": "false", 
        "download_container": false,
        "dynamic_kubelet_configuration": true,
        "etcd_kubeadm_enabled": true,
        "force_certificate_regeneration": true,
        "group_names": [
            "etcd",
            "k8s_cluster",
            "kube_control_plane",
            "kube_node"
        ],
        "groups": {
            "all": [
                "ubuntu-nuc-00.kaveman.intra"
            ],
            "calico_rr": [],
            "etcd": [
                "ubuntu-nuc-00.kaveman.intra"
            ],
            "k8s_cluster": [
                "ubuntu-nuc-00.kaveman.intra"
            ],
            "kube_control_plane": [
                "ubuntu-nuc-00.kaveman.intra"
            ],
            "kube_node": [
                "ubuntu-nuc-00.kaveman.intra"
            ],
            "ungrouped": []
        },
        "helm_enabled": true,
        "ingress_nginx_enabled": true,
        "ingress_nginx_host_network": true,
        "inventory_dir": "/root",
        "inventory_file": "/root/inventory.ini",
        "inventory_hostname": "ubuntu-nuc-00.kaveman.intra",
        "inventory_hostname_short": "ubuntu-nuc-00", 
        "kata_containers_enabled": true,
        "kata_containers_version": "2.1.0",
        "kube_encrypt_secret_data": true,
        "kube_network_plugin_multus": true,
        "kube_proxy_strict_arp": true,
        "kube_version": "v1.22.1",
        "kubernetes_audit": true,
        "local_as": 64512,
        "metallb_controller_tolerations": [
            {
                "effect": "NoSchedule",
                "key": "node-role.kubernetes.io/master"
            },
            {
                "effect": "NoSchedule",
                "key": "node-role.kubernetes.io/control-plane"
            }
        ],
        "metallb_enabled": true,
        "metallb_ip_range": [
            "10.5.0.0/16"
        ],
        "metallb_protocol": "bgp",
        "metallb_speaker_enabled": false,
        "metrics_server_enabled": true,
        "nerdctl_enabled": true,
        "nodelocaldns_bind_metrics_host_ip": true,
        "nodelocaldns_external_zones": [
            {
                "cache": 30,
                "nameservers": [
                    "192.168.0.1"
                ],
                "zones": [
                    "kaveman.intra"
                ]
            }
        ],
        "omit": "__omit_place_holder__20d9ffeefac342598c3d43f0748b87bc0a174122",
        "peer_with_router": true,
        "peers": [
            {
                "as": "64520",
                "name": "gateway",
                "router_id": "192.168.0.1",
                "scope": "global"
            }
        ],
        "playbook_dir": "/root/kubespray",
        "resolvconf_mode": "host_resolvconf",
        "typha_enabled": true,
        "upgrade_cluster_setup": true
    }
}

Command used to invoke ansible:

ansible-playbook -i ../inventory.ini cluster.yml -vvvv

Output of ansible run:

        "Static pod: etcd-ubuntu-nuc-00.kaveman.intra hash: 2f298c5a6b9d58ec166f7bc71c708f85",                                                                                                                     
        "Static pod: etcd-ubuntu-nuc-00.kaveman.intra hash: 2f298c5a6b9d58ec166f7bc71c708f85",                                                                                                                     
        "Static pod: etcd-ubuntu-nuc-00.kaveman.intra hash: 2f298c5a6b9d58ec166f7bc71c708f85",                                                                                                                     
        "Static pod: etcd-ubuntu-nuc-00.kaveman.intra hash: 2f298c5a6b9d58ec166f7bc71c708f85",                                                                                                                     
        "Static pod: etcd-ubuntu-nuc-00.kaveman.intra hash: 2f298c5a6b9d58ec166f7bc71c708f85",                                                                                                                     
        "[upgrade/etcd] Failed to upgrade etcd: couldn't upgrade control plane. kubeadm has tried to recover everything into the earlier state. Errors faced: timed out waiting for the condition",                
        "[upgrade/etcd] Waiting for previous etcd to become available",                                                                                                                                            
        "[upgrade/etcd] Etcd was rolled back and is now available"                                                                                                                                                 
    ]                                                                                                                                                                                                              
}                                                                                                                

Anything else we need to know:

@cristicalin added the kind/bug label Sep 4, 2021
@cristicalin
Contributor Author

Turns out my issue was related to dynamic_kubelet_configuration=True in my inventory. According to the kubelet logs, Kubernetes 1.22.1 requires a feature gate to be enabled for dynamic configuration:

Sep 04 11:50:53 ubuntu-nuc-00.kaveman.intra kubelet[2517133]: Flag --dynamic-config-dir has been deprecated, Feature DynamicKubeletConfig is deprecated in 1.22 and will not move to GA. It is planned to be removed from Kubernetes in the version 1.23. Please use alternative ways to update kubelet configuration.

We should ensure this is properly disabled when kube_version >= 1.22.0 to avoid upgrade issues.
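
As a rough illustration of the guard suggested above (this is not the actual change made in #7938), a minimal Python sketch could compare kube_version against 1.22.0 and force the setting off from that version onward. The helper names `parse_kube_version` and `effective_dynamic_kubelet_configuration` are hypothetical and not part of kubespray; only the variable names kube_version and dynamic_kubelet_configuration come from the inventory above. It assumes plain `vMAJOR.MINOR.PATCH` version strings.

```python
from typing import Tuple


def parse_kube_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as 'v1.22.1' into a comparable tuple (1, 22, 1).

    Assumption: no pre-release suffix (e.g. '-rc.1') is present.
    """
    return tuple(int(part) for part in version.lstrip("v").split("."))


def effective_dynamic_kubelet_configuration(kube_version: str, requested: bool) -> bool:
    """Honor the requested setting only for versions below 1.22.0.

    Hypothetical guard: from 1.22.0 on, the setting is forced off
    regardless of what the inventory asks for.
    """
    return requested and parse_kube_version(kube_version) < (1, 22, 0)


if __name__ == "__main__":
    # Values taken from the upgrade described in this issue.
    for version in ("v1.21.4", "v1.22.1"):
        print(version, effective_dynamic_kubelet_configuration(version, True))
```

With the inventory from this report (dynamic_kubelet_configuration set to true), such a guard would keep the setting for 1.21.4 and drop it for 1.22.1. Until something like that exists, setting dynamic_kubelet_configuration to false in the inventory before upgrading is the workaround implied by the finding above.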
