
Kubernetes cluster becomes unresponsive after one node goes down #31553

Closed
mayanksinghse opened this issue Jan 28, 2022 · 3 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@mayanksinghse

We set up a k8s cluster in a datacenter with 2 master and 5 worker nodes, using kubeadm to initialize the cluster. Cluster config added for reference:

```yaml
kind: InitConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
localAPIEndpoint:
  advertiseAddress: VIP_ADDRESS
---
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
kubernetesVersion: v1.22.0
controlPlaneEndpoint: "MANAGEMENT_VIP_ADDRESS:6444"
networking:
  podSubnet: 192.168.0.0/16
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
cgroupDriver: ADVERTISED_CGROUP_DRIVER_NAME
shutdownGracePeriod: 6m
shutdownGracePeriodCriticalPods: 4m
```
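A config like this is typically passed to kubeadm at init time; a minimal sketch, assuming the three documents above are saved together as kubeadm-config.yaml (the filename is illustrative):

```bash
# run on the first control-plane node; --upload-certs makes it easier
# to join additional control-plane nodes afterwards
sudo kubeadm init --config kubeadm-config.yaml --upload-certs
```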

Versions used for the different components: etcd 3.5.0-0, kube-apiserver v1.22, kube-controller-manager v1.22, kubelet 1.22.1.

The issue we have is that once we shut down one node, the whole cluster starts misbehaving and the nodes become read-only in most cases. We are not even able to run kubectl commands.

Exception in the kube-apiserver log:

```
W0128 11:23:25.351294 1 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0128 11:23:26.347009 1 clientconn.go:1326] [core] grpc: addrConn.createTransport failed to connect to {127.0.0.1:2379 127.0.0.1 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
W0128 11:23:26.352155 1 clientconn.go:1
```
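The "connection refused" on 127.0.0.1:2379 indicates the API server cannot reach its local (stacked) etcd member, either because etcd is down or because the cluster has lost quorum. A minimal sketch of checking etcd directly from a surviving control-plane node, assuming the default kubeadm stacked-etcd certificate paths under /etc/kubernetes/pki/etcd:

```bash
# check health of the local etcd member (kubeadm stacked-etcd paths assumed)
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health

# list the members etcd believes are in the cluster
sudo ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  member list
```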

@k8s-ci-robot
Contributor

@mayanksinghse: This issue is currently awaiting triage.

SIG Docs takes a lead on issue triage for this website, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 28, 2022
@neolit123
Member

neolit123 commented Jan 28, 2022 via email

@k8s-ci-robot k8s-ci-robot added the kind/support Categorizes issue or PR as a support question. label Jan 28, 2022
@k8s-ci-robot
Contributor

@neolit123: Closing this issue.

In response to this:

You might be seeing kubernetes/kubeadm#2567, which concerns a missing etcd feature for checking the health of a single member vs. the whole cluster. kubeadm is waiting on core k8s to support a version of etcd that has it.

You can comment on that issue.

But also, an even number of CP nodes does not make sense; you need a minimum of 3. Check what quorum means in the etcd docs.

This is not a k/website issue.
/kind support
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
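For context on the quorum point: etcd can only commit writes while a majority of members (quorum = floor(n/2) + 1) is reachable, so a 2-member cluster stops accepting writes as soon as either member goes down, which matches the read-only behavior described above. A small illustration of the arithmetic:

```bash
# quorum = floor(n/2) + 1; tolerated failures = n - quorum
# note that 2 members tolerate zero failures, while 3 tolerate one
for n in 1 2 3 4 5; do
  q=$(( n / 2 + 1 ))
  echo "members=$n quorum=$q tolerated_failures=$(( n - q ))"
done
```

This is also why the etcd docs recommend an odd number of members: going from 3 to 4 raises the quorum without raising the failure tolerance.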
