Watch stream disconnecting - how to watch forever? #728

Closed
richstokes opened this issue Jan 17, 2019 · 12 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@richstokes

richstokes commented Jan 17, 2019

Hi,

Is there a way to keep the watch stream connected forever? I am getting disconnected after approximately 5-10 minutes with this error:

Traceback (most recent call last):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 572, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 331, in _error_catcher
    yield
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 637, in read_chunked
    self._update_chunk_length()
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 576, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "k8s-live.py", line 69, in watchPodEvents
    for event in w.stream(api_instance.list_pod_for_all_namespaces):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/kubernetes/watch/watch.py", line 130, in stream
    for line in iter_resp_lines(resp):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/kubernetes/watch/watch.py", line 45, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 665, in read_chunked
    self._original_response.close()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 349, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

The watch function I am using (which works great up until it gets disconnected):

from kubernetes import client, watch

def watchPodEvents(clusterName):
    api_instance = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(api_instance.list_pod_for_all_namespaces):
        print("Cluster: %s Event: %s %s %s" % (clusterName, event['type'], event['object'].kind, event['object'].metadata.name))

    return
@richstokes
Author

I built a workaround to handle the exception and reconnect. I'm not sure if this is the right solution, or whether the Python Kubernetes client should be taking care of this?

from urllib3.exceptions import ProtocolError
from kubernetes import client, watch

def watchPodEvents(clusterName):
    api_instance = client.CoreV1Api()
    last_seen_version = ''
    while True:
        w = watch.Watch()
        try:
            for event in w.stream(api_instance.list_pod_for_all_namespaces, resource_version=last_seen_version):
                last_seen_version = event['object'].metadata.resource_version
                print("Cluster: %s Event: %s %s %s" % (clusterName, event['type'], event['object'].kind, event['object'].metadata.name))
        except ProtocolError:
            print("watchPodEvents ProtocolError (%s), continuing.." % clusterName)
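A related pattern worth noting: the watch stream accepts a timeout_seconds argument (passed through to list_pod_for_all_namespaces), so the server ends each watch cleanly after the given interval and the outer loop simply restarts it instead of waiting for the connection to break. A minimal sketch along those lines; the 300-second value is arbitrary:

from kubernetes import client, config, watch

def watchPodEvents(clusterName):
    # Assumes kube config is already loadable (e.g. config.load_kube_config()).
    config.load_kube_config()
    api_instance = client.CoreV1Api()
    while True:
        w = watch.Watch()
        # Ask the server to close the watch after 300s; the stream then ends
        # normally and the while-loop opens a fresh one.
        for event in w.stream(api_instance.list_pod_for_all_namespaces,
                              timeout_seconds=300):
            print("Cluster: %s Event: %s %s %s" % (clusterName, event['type'],
                  event['object'].kind, event['object'].metadata.name))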

@richstokes
Author

Spoke too soon. Now I'm running into this bug with the above workaround: #701

@govindKAG

Running into the same issue. Got anything?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 20, 2019
@richstokes
Author

Still an issue

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@richstokes
Author

/reopen - not sure why closing issues based on elapsed time is a thing.

@shanit-saha

shanit-saha commented Sep 12, 2019

@richstokes & @govindKAG: We're facing the same problem at our end. By any chance, have you been able to find a solution or workaround?

@logicfox

logicfox commented Dec 2, 2019

Has anyone figured out a reliable solution to this problem?

@shanit-saha

@logicfox

A given Kubernetes API server only preserves a historical list of changes for a limited time (5-15 minutes, depending on configuration). Kubernetes stores this history in etcd3, which keeps changes for the last 5 minutes by default. A watch request fails when the historical version of the resource it asks for is no longer available. If the specified resourceVersion is no longer valid, whether due to expiration (generally five to fifteen minutes) or a configuration change on the server, the server responds with a 410 ResourceExpired error together with a continue token.
When retrieving a collection of resources (either namespace or cluster scoped), the response from the server contains a resourceVersion value that can be used to initiate a watch against the server. Since the watch fails because the historical version of the resource is no longer available, you have to refactor the code to keep track of the last seen resource_version. On a 410 Gone status code, the code then restarts the watch from the last resourceVersion preserved in a variable. The code snippet below should help.

pod_stream = watch.Watch().stream(self._k8PodClient.list_namespaced_pod,
                                  namespace="default",
                                  label_selector=label_selector_str)
for pod_event in pod_stream:
    # Preserve the resourceVersion of every event we see
    self._Pod_Resource_version = pod_event['object'].metadata.resource_version
    .....
    .....

Every time the watch runs, the pod resourceVersion needs to be preserved as shown in the snippet above. Then, in the watch exception-handling block, restart the watch with the last preserved resource version, with something like the below:

if self._Pod_Resource_version:
    # Resume the watch from the last resourceVersion we preserved
    pod_stream = watch.Watch().stream(self._k8PodClient.list_namespaced_pod,
                                      namespace="default",
                                      label_selector=label_selector_str,
                                      resource_version=self._Pod_Resource_version)
else:
    # No resourceVersion preserved yet; start a fresh watch
    pod_stream = watch.Watch().stream(self._k8PodClient.list_namespaced_pod,
                                      namespace="default",
                                      label_selector=label_selector_str)
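Putting the thread's pieces together, a self-contained sketch of a watch loop that tracks the last seen resourceVersion, resumes from it after a dropped connection, and starts over when the server answers 410 Gone could look like the following. This is only a sketch (the function name watch_pods_forever and the 300-second timeout are illustrative), and depending on the client version the 410 may also surface as an ERROR event inside the stream rather than as an ApiException.

from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException
from urllib3.exceptions import ProtocolError

def watch_pods_forever(namespace="default"):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    resource_version = ''
    while True:
        w = watch.Watch()
        try:
            for event in w.stream(v1.list_namespaced_pod,
                                  namespace=namespace,
                                  resource_version=resource_version,
                                  timeout_seconds=300):
                # Remember where we are so a reconnect can resume from here
                resource_version = event['object'].metadata.resource_version
                print("%s %s" % (event['type'], event['object'].metadata.name))
        except ApiException as e:
            if e.status == 410:
                # 410 Gone: the saved resourceVersion has expired; restart from "now"
                resource_version = ''
            else:
                raise
        except ProtocolError:
            # Connection dropped mid-stream; loop around and reconnect
            pass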
