Watching stops working after 10 minutes #85
Comments
I have the same issue with AKS 1.16.10 and sidecar 0.1.151.
@monotek Could you please take care of it?
Please try with the current version, 0.1.193.
The same issue.
As a containment I'm using the `SLEEP` method. But what's causing the problem in `WATCH` mode? I created #90, maybe this helps 🤞🏼
@auroaj & @PeterGerrard |
@axdotl |
Any updates on this? I'm getting the same issue, and the sidecar stops working after a couple of minutes.
No; as you can see, there was no feedback on whether it works with image 0.1.209.
I have tried it with image 0.1.209, and it doesn't work.
Does anyone know what the last working version is?
I think it is a general issue. It ran for a long time w/ …
So the last working k8s version would be interesting too.
The title says it stops working after 10 min. I adjusted the test to add a new configmap after 11 min, and it works with all k8s versions. Can anyone say after what time they observed the problem?
My assumption is that this is related to interruptions of the connection to the kube-api. This might cause the resource watching to stop.
3-4 hours for me.
Checked with kiwigrid/k8s-sidecar:1.1.0 and AKS K8s rev v1.17.11.
There is an interesting fix in the Kubernetes Python Client v12.0.0; see the changelog: https://github.com/kubernetes-client/python/blob/release-12.0/CHANGELOG.md
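For reference, here is a minimal sketch of the restart pattern that fix enables (my illustration, not the sidecar's actual code): when the server closes the stream, re-open the watch from the last seen `resourceVersion`.

```python
from kubernetes import client, config, watch

config.load_kube_config()  # assumes a local kubeconfig; use load_incluster_config() in-cluster
v1 = client.CoreV1Api()

resource_version = None
while True:
    stream = watch.Watch().stream(
        v1.list_config_map_for_all_namespaces,
        resource_version=resource_version,  # resume where the last stream ended
        timeout_seconds=60,                 # ask the server to close the stream cleanly
    )
    for event in stream:
        obj = event["object"]
        resource_version = obj.metadata.resource_version
        print(event["type"], obj.metadata.name)
    # Stream ended (server-side timeout): loop to re-establish the watch.
    # Handling of 410 Gone (stale resource_version) is omitted for brevity.
```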
Thanks for merging. I updated the deployment yesterday to the Docker image tag 0.1.259, and this morning, 15 hours later, it still detects modifications on configmaps 👍 And so the resource watcher gets restarted. BTW, the tag 1.2.0 had a build error; that's why I used 0.1.259 from the CI build.
Hmmm... I've checked twice, with … I'll try to check it in another location.
I'm also still getting an issue with version `k8s-sidecar:0.1.259`. It stops working after a couple of minutes.
Checked in another location. Unfortunately, it was updated to v1.17.11 too.
Are we sure that 0.1.259 matches the 1.2.0 release? The deploy stage failed: https://app.circleci.com/pipelines/github/kiwigrid/k8s-sidecar/47/workflows/f0000c91-ba71-42b7-828d-0f235915ab29/jobs/274
It seems to be. The only commit in the … However, I still had the same issue as @qaiserali, where it stopped working after a few minutes.
Any luck from anyone with a more recent version? We moved to the SLEEP method, but now the dashboards are not getting removed on ConfigMap deletion :(
@djsly We noticed exactly the same thing about dashboards not getting deleted with a specific use of this sidecar (Grafana) earlier today; this behaviour under … So we went back to … In our case …
I think there was a similar issue with the notifications API that affected aadpodidentity as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm also seeing this issue. Running …
Still experiencing this issue on AKS with k8s-sidecar:1.14.2. Is there still the problem that, when using LIST, deleted resources are not removed?
Yes. With LIST we're not keeping track of resources.
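To illustrate why deletion needs extra work in LIST mode, here is a hypothetical reconcile sketch (not the sidecar's code; `TARGET_DIR` and the label selector are assumptions): with a LIST-based poll, the only way to drop stale files is to diff the freshly listed state against what's already on disk.

```python
import os
from kubernetes import client, config

config.load_incluster_config()  # assumes in-cluster credentials
v1 = client.CoreV1Api()
TARGET_DIR = "/tmp/dashboards"  # assumed output folder

def reconcile(label_selector: str) -> None:
    configmaps = v1.list_config_map_for_all_namespaces(label_selector=label_selector)
    wanted = {}
    for cm in configmaps.items:
        # Collect every file the matching ConfigMaps say should exist
        # (ignoring key collisions across namespaces for brevity).
        for name, content in (cm.data or {}).items():
            wanted[name] = content
    for name, content in wanted.items():
        with open(os.path.join(TARGET_DIR, name), "w") as f:
            f.write(content)
    # Delete files whose source ConfigMap entry no longer exists.
    for name in os.listdir(TARGET_DIR):
        if name not in wanted:
            os.remove(os.path.join(TARGET_DIR, name))
```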
The issue is reproducible with the 1.14.2 sidecar as well:
```
[2021-11-10 03:47:48] ProtocolError when calling kubernetes: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
```
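One hedged sketch of how the watch loop could tolerate that error (my illustration; `open_stream` and `handle_event` are hypothetical callables, not sidecar APIs): catch urllib3's `ProtocolError` and re-establish the watch instead of letting it kill the watcher.

```python
import logging
import time

from urllib3.exceptions import ProtocolError

def watch_forever(open_stream, handle_event):
    """Keep re-opening the watch when the peer resets the connection."""
    while True:
        try:
            for event in open_stream():   # hypothetical: returns a fresh watch stream
                handle_event(event)       # hypothetical: processes one watch event
        except ProtocolError as e:
            logging.warning("ProtocolError when calling kubernetes: %s; reconnecting", e)
            time.sleep(5)  # brief back-off before re-opening the watch
```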
Thanks for the report. Unfortunately, I have no idea what's going wrong here. Any ideas? The current state is that we pass urllib … So I'd be happy to follow any pointers you might have.
Unfortunately, I have no idea either.
Is anyone here familiar with how the network timeouts are configured when using watches/informers with client-go?
Ah, I always thought this repo was in Go. Now I checked the code for the first time with @vsliouniaev's kubernetes-client/python#1148 (comment) comment in mind. Apparently, details about these settings are now covered in https://github.com/kubernetes-client/python/blob/master/examples/watch/timeout-settings.md.

We can give the change below a try here. Update this part (k8s-sidecar/sidecar/resources.py, lines 193 to 199 in cbb48df) as follows:

```python
additional_args = {
    'label_selector': label_selector,
    # Tune the default timeouts as outlined in
    # https://github.com/kubernetes-client/python/issues/1148#issuecomment-626184613
    # https://github.com/kubernetes-client/python/blob/master/examples/watch/timeout-settings.md
    # I picked 60 and 66 due to https://github.com/nolar/kopf/issues/847#issuecomment-971651446
    # 60 is a polite request to the server, asking it to cleanly close the connection after that.
    # If you have a network outage, this does nothing.
    # You can set this number much higher, maybe to 3600 seconds (1h).
    # WATCH_SERVER_TIMEOUT is assumed to be a constant holding the env var name;
    # the value is cast to int since environment values arrive as strings.
    'timeout_seconds': int(os.environ.get(WATCH_SERVER_TIMEOUT, 60)),
    # 66 is a client-side timeout, configuring your local socket.
    # If you have a network outage dropping all packets with no RST/FIN,
    # this is how long your client waits before realizing & dropping the connection.
    # You can keep this number low, maybe 60 seconds.
    '_request_timeout': int(os.environ.get(WATCH_CLIENT_TIMEOUT, 66)),
}
...
stream = watch.Watch().stream(getattr(v1, _list_namespace[namespace][resource]), **additional_args)
```

This is also effectively what the alternative kopf-based implementation does; see nolar/kopf#585 for the historical context on these settings. And, ironically, the kopf project (which OmegaVVeapon/kopf-k8s-sidecar is based on) currently has nolar/kopf#847 open, which seems to be related, but I guess that's another edge case. We have been using kopf in our AKS clusters without regular issues, but this particular kiwigrid/k8s-sidecar issue is quite frequent. I'll try to give my suggestion above a chance if I can reserve some time, but given that we already have a workaround in place (grafana/helm-charts#18 (comment)), it won't likely be soon.
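If that change lands, the two knobs would presumably surface as container env vars on the sidecar (e.g. `WATCH_SERVER_TIMEOUT=3600`, `WATCH_CLIENT_TIMEOUT=66`); the names are taken from the snippet above, not from any released sidecar version.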
@bergerx Thanks a lot for this detailed analysis. I took the opportunity and incorporated your proposal into a PR. Hopefully we can get this issue fixed with it. 🤞
The watch on a set of configmaps stops being alerted of new changes after 10 minutes of no changes.

## Repro steps

## Expected Behaviour
Will see a modification occur.

## Actual Behaviour
Nothing.

Done on AKS with Kubernetes version 1.16.10.
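A minimal repro sketch (assumed, since the original steps were not captured here; the ConfigMap name and namespace are hypothetical): stay idle past the 10-minute mark, then patch a watched ConfigMap and check whether the sidecar logs the MODIFIED event.

```python
import time

from kubernetes import client, config

config.load_kube_config()  # run from a workstation against the AKS cluster
v1 = client.CoreV1Api()

time.sleep(11 * 60)  # stay idle past the ~10 minute mark
v1.patch_namespaced_config_map(
    name="grafana-dashboard-example",  # hypothetical watched ConfigMap
    namespace="monitoring",            # hypothetical namespace
    body={"data": {"ping": str(time.time())}},
)
# Now check the sidecar logs: a working watch should report a MODIFIED event.
```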