“kube-subnet-mgr” doesn't work over 100 nodes #719
Comments
@drinktee Do you have the logs from the API server when this was happening?
It's possible (though I can't see how) that it could be related to the "100" here - https://github.com/coreos/flannel/blob/master/subnet/kube/kube.go#L129
I'm running into the same issue - kube-flannel seems to hang on new nodes once the cluster has reached 100 running nodes. The kube-flannel logs show "Waiting 10m0s for node controller to sync", but that timeout never seems to expire. I don't see any red flags in the logs myself, but I've included them below. Logs from a broken kube-flannel pod, which has been running for ~30 minutes now:
API server logs:
Thanks @Capitrium - I managed to repro the problem. I've added a fix in #729
See the flannel-git repo on quay if you want an image to try out.
Expected Behavior
I have set up a 170-node Kubernetes cluster. I used this daemonset yaml to deploy flannel. The network was set to 172.17.0.0/16. I found that only about 100 nodes work; the other nodes don't have the flannel0 or flannel.1 interfaces.
Current Behavior
The log prints 'Waiting %s for node controller to sync'. The code seems to hang at this point.
After I removed '--kube-subnet-mgr' and used etcd to configure the network, the cluster works. Every node has a flannel interface.
Your Environment