Watch stream disconnecting - how to watch forever? #728

Closed
richstokes opened this issue Jan 17, 2019 · 12 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@richstokes

richstokes commented Jan 17, 2019

Hi,

Is there a way to keep the watch stream connected forever? I am getting disconnected after approximately 5-10 minutes with this error:

Traceback (most recent call last):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 572, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 331, in _error_catcher
    yield
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 637, in read_chunked
    self._update_chunk_length()
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 576, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "k8s-live.py", line 69, in watchPodEvents
    for event in w.stream(api_instance.list_pod_for_all_namespaces):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/kubernetes/watch/watch.py", line 130, in stream
    for line in iter_resp_lines(resp):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/kubernetes/watch/watch.py", line 45, in iter_resp_lines
    for seg in resp.read_chunked(decode_content=False):
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 665, in read_chunked
    self._original_response.close()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/rich/Library/Python/3.7/lib/python/site-packages/urllib3/response.py", line 349, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

The watch function I am using (which works great up until it gets disconnected):

from kubernetes import client, watch

def watchPodEvents(clusterName):
    api_instance = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(api_instance.list_pod_for_all_namespaces):
        print("Cluster: %s Event: %s %s %s" % (clusterName, event['type'], event['object'].kind, event['object'].metadata.name))

    return
@richstokes
Author

I built a workaround to handle the exception and reconnect. I'm not sure if this is the right solution, or whether the Python Kubernetes client should be taking care of this?

from urllib3.exceptions import ProtocolError
from kubernetes import client, watch

def watchPodEvents(clusterName):
    api_instance = client.CoreV1Api()
    last_seen_version = ''
    while True:
        w = watch.Watch()
        try:
            for event in w.stream(api_instance.list_pod_for_all_namespaces, resource_version=last_seen_version):
                last_seen_version = event['object'].metadata.resource_version
                print("Cluster: %s Event: %s %s %s" % (clusterName, event['type'], event['object'].kind, event['object'].metadata.name))
        except ProtocolError:
            print("watchPodEvents ProtocolError (%s), continuing.." % clusterName)
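A related pattern worth noting: the watch stream accepts a timeout_seconds argument (passed through to list_pod_for_all_namespaces), so the server ends each watch cleanly after the given interval and the outer loop simply restarts it instead of waiting for the connection to break. A minimal sketch along those lines; the 300-second value is arbitrary:

from kubernetes import client, config, watch

def watchPodEvents(clusterName):
    # Assumes kube config is already loadable (e.g. config.load_kube_config()).
    config.load_kube_config()
    api_instance = client.CoreV1Api()
    while True:
        w = watch.Watch()
        # Ask the server to close the watch after 300s; the stream then ends
        # normally and the while-loop opens a fresh one.
        for event in w.stream(api_instance.list_pod_for_all_namespaces,
                              timeout_seconds=300):
            print("Cluster: %s Event: %s %s %s" % (clusterName, event['type'],
                  event['object'].kind, event['object'].metadata.name))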

@richstokes
Author

Spoke too soon. Now I'm running into this bug with the above workaround: #701

@govindKAG

Running into the same issue. Got anything?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 20, 2019
@richstokes
Author

Still an issue

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@richstokes
Author

/reopen - not sure why closing issues based on elapsed time is a thing.

@shanit-saha

shanit-saha commented Sep 12, 2019

@richstokes & @govindKAG: We're facing the same problem at our end. By any chance, have you been able to find a solution or workaround?

@logicfox

logicfox commented Dec 2, 2019

Has anyone figured out a reliable solution to this problem?

@shanit-saha

@logicfox

A given Kubernetes API server only preserves a historical list of changes for a limited time (5-15 minutes, depending on configuration). Kubernetes stores this history in etcd3, which keeps changes for the last 5 minutes by default. A watch request fails when the historical version of the resource it asks for is no longer available. If the specified resourceVersion is no longer valid, whether due to expiration (generally five to fifteen minutes) or a configuration change on the server, the server responds with a 410 ResourceExpired error together with a continue token.
When retrieving a collection of resources (either namespace or cluster scoped), the response from the server contains a resourceVersion value that can be used to initiate a watch against the server. Since the watch fails because the historical version of the resource is no longer available, you have to refactor the code to keep track of the last seen resource_version. On a 410 Gone status code, the code then restarts the watch from the last resourceVersion preserved in a variable. The code snippet below should help.

pod_stream = watch.Watch().stream(self._k8PodClient.list_namespaced_pod,
                                  namespace="default",
                                  label_selector=label_selector_str)
for pod_event in pod_stream:
    # Preserve the resourceVersion of every event we see
    self._Pod_Resource_version = pod_event['object'].metadata.resource_version
    .....
    .....

Every time the watch runs, the pod resourceVersion needs to be preserved as shown in the snippet above. Then, in the watch exception-handling block, restart the watch with the last preserved resource version, with something like the below:

if self._Pod_Resource_version:
    # Resume the watch from the last resourceVersion we preserved
    pod_stream = watch.Watch().stream(self._k8PodClient.list_namespaced_pod,
                                      namespace="default",
                                      label_selector=label_selector_str,
                                      resource_version=self._Pod_Resource_version)
else:
    # No resourceVersion preserved yet; start a fresh watch
    pod_stream = watch.Watch().stream(self._k8PodClient.list_namespaced_pod,
                                      namespace="default",
                                      label_selector=label_selector_str)
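Putting the thread's pieces together, a self-contained sketch of a watch loop that tracks the last seen resourceVersion, resumes from it after a dropped connection, and starts over when the server answers 410 Gone could look like the following. This is only a sketch (the function name watch_pods_forever and the 300-second timeout are illustrative), and depending on the client version the 410 may also surface as an ERROR event inside the stream rather than as an ApiException.

from kubernetes import client, config, watch
from kubernetes.client.rest import ApiException
from urllib3.exceptions import ProtocolError

def watch_pods_forever(namespace="default"):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    resource_version = ''
    while True:
        w = watch.Watch()
        try:
            for event in w.stream(v1.list_namespaced_pod,
                                  namespace=namespace,
                                  resource_version=resource_version,
                                  timeout_seconds=300):
                # Remember where we are so a reconnect can resume from here
                resource_version = event['object'].metadata.resource_version
                print("%s %s" % (event['type'], event['object'].metadata.name))
        except ApiException as e:
            if e.status == 410:
                # 410 Gone: the saved resourceVersion has expired; restart from "now"
                resource_version = ''
            else:
                raise
        except ProtocolError:
            # Connection dropped mid-stream; loop around and reconnect
            pass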
