SharedInformer does not survive to an API server restart #2992
Comments
@akram you should see reconnects - Line 113 in b91fd7e
kubernetes-client/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/RequestConfig.java, Line 46 in 74cc63d

I believe we have hit a similar situation. If the relist operation fails because the API server is unavailable, it looks like no further reconnects are attempted.
Specifically, we see this in the logs: `ERROR [io.fab.kub.cli.dsl.int.WatchConnectionManager] (OkHttp https://172.30.0.1/...) Unhandled exception encountered in watcher event handler: java.util.concurrent.RejectedExecutionException: Error while doing ReflectorRunnable list`, where the root exception is a timeout.

Relates to: #2010
A `SharedInformer` created to watch `ImageStreams` or `BuildConfig` does not survive a k8s API server restart.

Please note that this is probably related to a bug in the k8s API, as I was able to reproduce the behaviour using the `oc` command. As a user of `oc`, if I restart the API server while watching `imagestreams`, I got the following error:

The same operation using `oc get secrets -w` does not fail.

In the kubernetes-client, this materializes as an `EOFException` caught in `io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener`, which does not restart the WebSocket but instead discards it from the manager. This is silent from the user's point of view.

As a possible fix, we could consider adding an `else` statement here: https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatcherWebSocketListener.java#L105
For an existing, already-started WebSocket it is still possible to get a null response, which may mean the WebSocket was started previously but has since become unavailable.
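The fix proposed above amounts to scheduling a reconnect instead of discarding the WebSocket from the manager. A minimal plain-Java sketch of such a reconnect loop with exponential backoff, in the spirit of the `watchReconnectInterval` / `watchReconnectLimit` settings in `RequestConfig` (all class and method names below are hypothetical illustrations, not the client's actual code):

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: retry a watch connection with exponential backoff
// until it succeeds or a reconnect limit is exhausted.
public class WatchReconnector {
    private final long baseIntervalMillis; // analogous to watchReconnectInterval
    private final int reconnectLimit;      // analogous to watchReconnectLimit (-1 = unlimited)

    public WatchReconnector(long baseIntervalMillis, int reconnectLimit) {
        this.baseIntervalMillis = baseIntervalMillis;
        this.reconnectLimit = reconnectLimit;
    }

    // Delay before the given (0-based) reconnect attempt: base * 2^attempt, capped.
    public long delayForAttempt(int attempt) {
        long delay = baseIntervalMillis << Math.min(attempt, 10);
        return Math.min(delay, 32_000L);
    }

    // Run the supplied connect action until it succeeds or the limit is hit,
    // instead of silently dropping the watch on the first failure.
    public boolean runWithReconnects(BooleanSupplier connect) throws InterruptedException {
        for (int attempt = 0; reconnectLimit < 0 || attempt <= reconnectLimit; attempt++) {
            if (connect.getAsBoolean()) {
                return true; // connected (or re-listed) successfully
            }
            Thread.sleep(delayForAttempt(attempt));
        }
        return false; // gave up after reconnectLimit attempts
    }

    public static void main(String[] args) throws InterruptedException {
        WatchReconnector r = new WatchReconnector(10, 5);
        // Simulate an API server that comes back on the third attempt.
        final int[] calls = {0};
        boolean ok = r.runWithReconnects(() -> ++calls[0] >= 3);
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

The point of the sketch is only that a failed relist or a closed WebSocket should feed back into a retry loop rather than terminate the informer silently.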
edit:
Discussing with the apiserver team, it seems this also impacts core objects, not only OpenShift-specific ones. In my test I was deleting only the OpenShift apiserver part, but the same error is then raised for any other objects as well.