
Watch pods times out / stops recognizing new notices after ~30 mins #512

Open
khchau7 opened this issue Aug 11, 2021 · 2 comments

Comments


khchau7 commented Aug 11, 2021

client.watch_pods times out / stops recognizing new notices if no watch event occurs for about 30 minutes, even with timeouts set (open=60, read=nil) when creating the kubeclient.
The watch_pods call is wrapped in an infinite loop, so if it simply exits or raises an exception, there is a mechanism in place to sleep and then restart the watch. However, we currently never break out of watch_pods; instead it times out silently and stops acting on any new notices received.
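The wrapper loop looks roughly like this (a sketch; `run_watch`, the delay, and `max_restarts` are illustrative names, not kubeclient API — `client` is anything responding to `#watch_pods`):

```ruby
# Restart wrapper sketch: any exit or exception from watch_pods leads to
# a sleep followed by a fresh watch. The bug is that watch_pods sometimes
# neither yields nor raises, so this loop never gets a chance to restart.
def run_watch(client, delay: 1, max_restarts: Float::INFINITY)
  restarts = 0
  loop do
    begin
      client.watch_pods { |notice| yield notice }
    rescue StandardError => e
      warn "watch_pods exited: #{e.class}: #{e.message}"
    end
    restarts += 1
    break if restarts >= max_restarts
    sleep delay
  end
  restarts
end
```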

We might need to set TCP keepalive (keep_alive_timeout) in the http_options in kubeclient.rb to avoid this, replicating the behavior of client-go as described here: https://github.com/yangl900/knet
"The k8s client-go by default turns on TCP Keepalive, and the client side will send an ACK packet to API server every 30s. With this, even though the SLB default timeout is 4 minutes, the TCP connection will never be idle and so will never be reset."
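For illustration, this is what client-go's keepalive behavior looks like at the raw-socket level in Ruby (a sketch only; kubeclient would need to apply this to the socket underlying its HTTP connection, and the Linux-only constants are guarded):

```ruby
require 'socket'

# Enable TCP keepalive on a socket, roughly matching client-go's 30s
# keepalive period. Parameter values here are assumptions for illustration.
def enable_keepalive(sock, idle: 30, interval: 10, probes: 3)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  # Linux-specific tuning knobs; not all platforms define these constants.
  if Socket.const_defined?(:TCP_KEEPIDLE)
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPIDLE, idle)      # seconds of idle before first probe
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPINTVL, interval) # seconds between probes
    sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_KEEPCNT, probes)     # failed probes before reset
  end
  sock
end
```

With probes flowing every 30 seconds the connection never looks idle to an intermediate load balancer, which is the behavior the knet write-up attributes to client-go.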

Any other ideas as to why this could be occurring?


cben commented Aug 17, 2021

https://github.com/yangl900/knet is very interesting, I need to absorb the info there...
I had "folk knowledge" that, whatever you do, the apiserver eventually closes inactive connections, but I can't point to a source. In any case, more understanding and tuning are good.

Note that keeping a connection open for many minutes with no activity is likely to cause a "version too old" error if it does get disconnected. Consider also adding opt-in kubeclient support for watch bookmarks, which reduce this issue:
https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/956-watch-bookmark#motivation
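On the caller side, bookmark and "version too old" handling might look like this (a hypothetical sketch; `process_notice` and the state hash are assumed names — only the `BOOKMARK`/`ERROR` notice types and the 410 status come from the Kubernetes API):

```ruby
# Track the latest resourceVersion from normal events and BOOKMARK
# notices; on a 410 "Gone" ERROR notice, clear it so the next watch
# relists from scratch instead of resuming from a too-old version.
def process_notice(notice, state)
  case notice['type']
  when 'BOOKMARK'
    state[:resource_version] = notice.dig('object', 'metadata', 'resourceVersion')
  when 'ERROR'
    state[:resource_version] = nil if notice.dig('object', 'code') == 410
  else
    state[:resource_version] = notice.dig('object', 'metadata', 'resourceVersion')
    yield notice if block_given? # real event processing goes here
  end
  state
end
```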

@chrisjohnson

We're seeing this as well. When the actual connection gets closed, the watch command exits with an error (as documented), which we catch and use to restart the listener. But sometimes the listener stops receiving notices without actually exiting, leaving the process more or less stuck. We're going to add a timer-based kill on our side as a workaround, but it seems like kubeclient has a condition it needs to account for.
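The timer-based kill could be as simple as a watchdog that tracks the time of the last notice (a sketch; the class and method names are assumptions, not kubeclient API):

```ruby
require 'monitor'

# Watchdog for a stuck watch: beat on every notice, and consider the
# watch stale once no beat has arrived within max_idle seconds.
class Watchdog
  def initialize(max_idle)
    @max_idle = max_idle
    @last_seen = Time.now
    @lock = Monitor.new
  end

  # Call this from the watch block on every received notice.
  def beat
    @lock.synchronize { @last_seen = Time.now }
  end

  # True once the watch has been silent longer than max_idle.
  def stale?
    @lock.synchronize { Time.now - @last_seen > @max_idle }
  end
end
```

A separate monitoring thread could then poll `stale?` and stop the watch (for example via `finish` on the watch stream, which kubeclient documents for ending a watch), forcing the wrapper loop's restart path.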

jperville added a commit to PerfectMemory/kubeclient that referenced this issue Nov 21, 2023
To configure the keepalive timeout, instantiate Kubeclient::Client with
the `:keep_alive_timeout` keyword argument, which defaults to 60 (seconds).

Fixes issue ManageIQ#512.
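For reference, the proposed option would be used like this (a configuration sketch; the server URL and values are placeholders, and `:keep_alive_timeout` exists only on the referenced branch, not in released kubeclient):

```ruby
require 'kubeclient'

# Placeholder endpoint; timeouts match the reporter's settings above.
client = Kubeclient::Client.new(
  'https://api.example.invalid:6443', 'v1',
  timeouts: { open: 60, read: nil },
  keep_alive_timeout: 30
)
```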