
Watch pods times out / stops recognizing new notices after ~30 mins #512

Open
khchau7 opened this issue Aug 11, 2021 · 2 comments

Comments


khchau7 commented Aug 11, 2021

client.watch_pods times out / stops recognizing new notices if no watch event occurs for about 30 minutes, even with timeouts set (open=60, read=nil) when creating the kubeclient.
The watch_pods call is wrapped in an infinite loop, so if it simply exits or raises an exception, there is a mechanism in place to sleep and then restart the watch. However, we currently never break out of watch_pods; instead it times out silently and stops acting on any new notices received.
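The wrapper loop looks roughly like this (a sketch; `run_watch`, the delay, and `max_restarts` are illustrative names, not kubeclient API — `client` is anything responding to `#watch_pods`):

```ruby
# Restart wrapper sketch: any exit or exception from watch_pods leads to
# a sleep followed by a fresh watch. The bug is that watch_pods sometimes
# neither yields nor raises, so this loop never gets a chance to restart.
def run_watch(client, delay: 1, max_restarts: Float::INFINITY)
  restarts = 0
  loop do
    begin
      client.watch_pods { |notice| yield notice }
    rescue StandardError => e
      warn "watch_pods exited: #{e.class}: #{e.message}"
    end
    restarts += 1
    break if restarts >= max_restarts
    sleep delay
  end
  restarts
end
```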

We might need to set TCP keepalive (keep_alive_timeout) in the http_options in kubeclient.rb to avoid this, replicating the behavior of client-go as described here: https://github.com/yangl900/knet
"The k8s client-go by default turns on TCP Keepalive, and the client side will send an ACK packet to API server every 30s. With this, even though the SLB default timeout is 4 minutes, the TCP connection will never be idle and so will never be reset."
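For illustration, this is what client-go's keepalive behavior looks like at the raw-socket level in Ruby (a sketch only; kubeclient would need to apply this to the socket underlying its HTTP connection, and the Linux-only constants are guarded):

```ruby
require 'socket'

# Enable TCP keepalive on a socket, roughly matching client-go's 30s
# keepalive period. Parameter values here are assumptions for illustration.
def enable_keepalive(sock, idle: 30, interval: 10, probes: 3)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  # Linux-specific tuning knobs; not all platforms define these constants.
  if Socket.const_defined?(:TCP_KEEPIDLE)
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, idle)      # seconds of idle before first probe
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, interval) # seconds between probes
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, probes)     # failed probes before reset
  end
  sock
end
```

With probes flowing every 30 seconds the connection never looks idle to an intermediate load balancer, which is the behavior the knet write-up attributes to client-go.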

Any other ideas as to why this could be occurring?


cben commented Aug 17, 2021

https://github.com/yangl900/knet is very interesting, I need to absorb the info there...
I had "folk knowledge" that, whatever you do, the apiserver eventually closes inactive connections, but I can't point to a source. In any case, more understanding and tuning are good.

Note that keeping a connection open for many minutes with no activity is likely to cause a "version too old" error if it does get disconnected. Consider also adding opt-in kubeclient support for watch bookmarks, which reduce this issue:
https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/956-watch-bookmark#motivation
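On the caller side, bookmark and "version too old" handling might look like this (a hypothetical sketch; `process_notice` and the state hash are assumed names — only the `BOOKMARK`/`ERROR` notice types and the 410 status come from the Kubernetes API):

```ruby
# Track the latest resourceVersion from normal events and BOOKMARK
# notices; on a 410 "Gone" ERROR notice, clear it so the next watch
# relists from scratch instead of resuming from a too-old version.
def process_notice(notice, state)
  case notice['type']
  when 'BOOKMARK'
    state[:resource_version] = notice.dig('object', 'metadata', 'resourceVersion')
  when 'ERROR'
    state[:resource_version] = nil if notice.dig('object', 'code') == 410
  else
    state[:resource_version] = notice.dig('object', 'metadata', 'resourceVersion')
    yield notice if block_given? # real event processing goes here
  end
  state
end
```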

@chrisjohnson

We're seeing this as well. When the actual connection gets closed, the watch command exits with an error (as documented), which we catch and use to restart the listener. But sometimes the listener stops receiving notices without actually exiting, leaving the process more or less stuck. We're going to add a timer-based kill on our side as a workaround, but it seems like kubeclient has a condition it needs to account for.
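The timer-based kill could be as simple as a watchdog that tracks the time of the last notice (a sketch; the class and method names are assumptions, not kubeclient API):

```ruby
require 'monitor'

# Watchdog for a stuck watch: beat on every notice, and consider the
# watch stale once no beat has arrived within max_idle seconds.
class Watchdog
  def initialize(max_idle)
    @max_idle = max_idle
    @last_seen = Time.now
    @lock = Monitor.new
  end

  # Call this from the watch block on every received notice.
  def beat
    @lock.synchronize { @last_seen = Time.now }
  end

  # True once the watch has been silent longer than max_idle.
  def stale?
    @lock.synchronize { Time.now - @last_seen > @max_idle }
  end
end
```

A separate monitoring thread could then poll `stale?` and stop the watch (for example via `finish` on the watch stream, which kubeclient documents for ending a watch), forcing the wrapper loop's restart path.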

jperville added a commit to PerfectMemory/kubeclient that referenced this issue Nov 21, 2023
To configure the keepalive timeout, instantiate Kubeclient::Client with
the `:keep_alive_timeout` keyword argument, which defaults to 60 (seconds).

Fixes issue ManageIQ#512.
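For reference, the proposed option would be used like this (a configuration sketch; the server URL and values are placeholders, and `:keep_alive_timeout` exists only on the referenced branch, not in released kubeclient):

```ruby
require 'kubeclient'

# Placeholder endpoint; timeouts match the reporter's settings above.
client = Kubeclient::Client.new(
  'https://api.example.invalid:6443', 'v1',
  timeouts: { open: 60, read: nil },
  keep_alive_timeout: 30
)
```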