
internal server error when call watch on list_pod_for_all_namespace #701

Closed
yuyang0 opened this issue Dec 11, 2018 · 6 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@yuyang0

yuyang0 commented Dec 11, 2018

My code is like this:

from kubernetes import watch
from urllib3.exceptions import ProtocolError

last_seen_version = None
label_selector = "xxxxx"
while True:
    try:
        w = watch.Watch()
        if last_seen_version is not None:
            # resume the watch from the last resourceVersion we saw
            watcher = w.stream(self.core_v1api.list_pod_for_all_namespaces, label_selector=label_selector, resource_version=last_seen_version)
        else:
            watcher = w.stream(self.core_v1api.list_pod_for_all_namespaces, label_selector=label_selector)
        for event in watcher:
            obj = event['object']
            labels = obj.metadata.labels or {}
            last_seen_version = obj.metadata.resource_version
    except ProtocolError:
        logger.warning('skip this error... because kubernetes disconnects the client after the default 10m...')

and the code throws an ApiException(500):

Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/kae/app/console/bin/watch_pods.py", line 75, in watch_app_job_pods
raise e
File "/kae/app/console/bin/watch_pods.py", line 53, in watch_app_job_pods
for event in watcher:
File "/usr/local/lib/python3.6/site-packages/kubernetes/watch/watch.py", line 128, in stream
resp = func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 13812, in list_pod_for_all_namespaces
(data) = self.list_pod_for_all_namespaces_with_http_info(**kwargs)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 13909, in list_pod_for_all_namespaces_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
_return_http_data_only, collection_formats, _preload_content, _request_timeout)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 342, in request
headers=headers)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in GET
query_params=query_params)
File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'Date': 'Tue, 11 Dec 2018 08:01:02 GMT', 'Content-Length': '186'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"resourceVersion: Invalid value: \"None\": strconv.ParseUint: parsing \"None\": invalid syntax","code":500}'

I'm not sure whether this is a bug in the Python client or in Kubernetes itself.
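
For what it's worth, the 500 appears to come from the client's internal reconnect rather than from the caller: after the server closes the stream, Watch.stream re-sends kwargs['resource_version'] = self.resource_version, and if that attribute is Python None it gets stringified into the query as the literal "None". A caller-side workaround, sketched against the loop above, is to also catch the ApiException and drop the cached version; the status/body check below is an assumption about the error format, not a documented contract:

from kubernetes import watch
from kubernetes.client.rest import ApiException
from urllib3.exceptions import ProtocolError

last_seen_version = None
while True:
    try:
        w = watch.Watch()
        kwargs = {'label_selector': label_selector}
        if last_seen_version is not None:
            kwargs['resource_version'] = last_seen_version
        for event in w.stream(self.core_v1api.list_pod_for_all_namespaces, **kwargs):
            last_seen_version = event['object'].metadata.resource_version
    except ProtocolError:
        logger.warning('server closed the stream (default ~10m), reconnecting')
    except ApiException as e:
        # Workaround sketch: if the server rejected the resourceVersion that
        # was sent (by us or by the library's internal reconnect), forget the
        # cached value and start a fresh watch instead of re-sending it.
        if e.status == 500 and 'resourceVersion' in str(e.body or ''):
            last_seen_version = None
        else:
            raise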

@ananace

ananace commented Dec 13, 2018

I've been testing a patch that seems to work around this same issue for me; I'll throw up a PR once I'm sure I'm not breaking anything:

--- a/kubernetes/watch/watch.py    2018-12-13 12:55:30.282078582 +0100
+++ b/kubernetes/watch/watch.py    2018-12-13 12:57:34.727886214 +0100
@@ -79,17 +79,23 @@
     def unmarshal_event(self, data, return_type):
         js = json.loads(data)
         js['raw_object'] = js['object']
+        version = None
         if return_type:
             obj = SimpleNamespace(data=json.dumps(js['raw_object']))
             js['object'] = self._api_client.deserialize(obj, return_type)
             if hasattr(js['object'], 'metadata'):
-                self.resource_version = js['object'].metadata.resource_version
-            # For custom objects that we don't have model defined, json
-            # deserialization results in dictionary
-            elif (isinstance(js['object'], dict) and 'metadata' in js['object']
-                  and 'resourceVersion' in js['object']['metadata']):
-                self.resource_version = js['object']['metadata'][
-                    'resourceVersion']
+                version = js['object'].metadata.resource_version
+        # For custom objects that we don't have model defined, json
+        # deserialization results in dictionary
+        if (version is None
+            and isinstance(js['object'], dict)
+            and 'metadata' in js['object']
+            and 'resourceVersion' in js['object']['metadata']):
+            version = js['object']['metadata']['resourceVersion']
+
+        if version is not None:
+            self.resource_version = version
+
         return js
 
     def stream(self, func, *args, **kwargs):

Edit: that broke a few things (not everything, but a few), so I've written up a new patch instead:

--- a/kubernetes/watch/watch.py	2018-12-14 08:15:10.574488606 +0100
+++ b/kubernetes/watch/watch.py	2018-12-14 08:30:41.285017302 +0100
@@ -14,6 +14,7 @@
 
 import json
 import pydoc
+import re
 
 from kubernetes import client
 
@@ -123,12 +124,27 @@
         kwargs['watch'] = True
         kwargs['_preload_content'] = False
 
+        reloading = None
         timeouts = ('timeout_seconds' in kwargs)
         while True:
+            reloading = False
             resp = func(*args, **kwargs)
             try:
                 for line in iter_resp_lines(resp):
-                    yield self.unmarshal_event(line, return_type)
+                    ev = self.unmarshal_event(line, return_type)
+                    raw = ev['raw_object']
+
+                    if 'Status' in raw.get('kind', '') and\
+                       'Failure' in raw.get('status', '') and\
+                       'Gone' in raw.get('reason', ''):
+                        new_ver = re.search(r"\((\d+)\)",
+                                            raw.get('message', ''))
+                        if new_ver and new_ver.group(1):
+                            self.resource_version = int(new_ver.group(1)) + 1
+                            reloading = True
+                            break
+
+                    yield ev
                     if self._stop:
                         break
             finally:
@@ -136,5 +156,5 @@
                 resp.close()
                 resp.release_conn()
 
-            if timeouts or self._stop:
+            if (timeouts or self._stop) and not reloading:
                 break
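
For anyone who would rather not patch the library, roughly the same recovery can be approximated from the caller's side. This is only a sketch: it assumes the apiserver's "too old resource version: <old> (<newest>)" message format that the patch above also relies on, and the function and selector names are placeholders:

import re

from kubernetes import watch

def watch_all_pods(core_v1, label_selector='app=example'):  # names are placeholders
    resource_version = None
    w = watch.Watch()
    while True:
        kwargs = {'label_selector': label_selector}
        if resource_version:
            kwargs['resource_version'] = resource_version
        for event in w.stream(core_v1.list_pod_for_all_namespaces, **kwargs):
            raw = event['raw_object']
            if raw.get('kind') == 'Status' and raw.get('reason') == 'Gone':
                # Same trick as the patch above: recover the newest version
                # from the "too old resource version: <old> (<newest>)" text.
                match = re.search(r'\((\d+)\)', raw.get('message', ''))
                resource_version = match.group(1) if match else None
                break  # reconnect with the recovered (or reset) version
            resource_version = raw['metadata']['resourceVersion']
            yield event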

@richstokes

I'm running into the same issue. Something is not right here: we see the response "Failure","message":"resourceVersion: Invalid value: \"None\"", yet we are passing a valid resourceVersion to the watch method.
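
That "None" is consistent with Python's None being stringified during the library's internal reconnect rather than with anything the caller passed; the standard library reproduces the effect:

from urllib.parse import urlencode

# A None query-parameter value survives URL encoding as the literal
# string "None", which the apiserver then fails to parse as a uint:
print(urlencode({'resourceVersion': None}))  # -> resourceVersion=None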

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 28, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
