Monitoring - Look for specific messages in retries #2451

waprin · 2016-09-27T19:11:26Z

This starts to address #2415.

This was the most obvious check to add. As for the other errors, the 404s, 500s, and 503s all return very generic error messages (and really we need the API team to just fix the 500s).

As for the retry logic, I'm not convinced it can be significantly improved: using a base of 3 makes the jumps too big. Maybe we could start from a higher number, but I think it would complicate the retry logic to save at best a few seconds.
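To make the "jumps too big" point concrete, here is a minimal sketch that lists the waits produced by an exponential backoff for a given base. The helper name `backoff_delays`, the one-second starting delay, and the count of five waits are assumptions chosen for illustration, not values taken from the test suite.

```python
def backoff_delays(base, waits, initial=1.0):
    """Return the sleep before each retry: initial, initial*base, ...

    Hypothetical helper for illustration; `initial` and `waits` are
    assumed values, not the settings used by the system tests.
    """
    return [initial * base ** n for n in range(waits)]


if __name__ == "__main__":
    for base in (2, 3):
        delays = backoff_delays(base, waits=5)
        print("base", base, "delays", delays, "total", sum(delays))
```

Under these assumptions, five waits add up to 31 seconds with a base of 2 and 121 seconds with a base of 3, and the last wait alone grows from 16 to 81 seconds.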

So I am voting to just close #2415 after this is merged but let me know if you disagree.

system_tests/monitoring.py

@@ -207,9 +207,14 @@ def _query_timeseries_with_retries():
            def _has_timeseries(result):
                return len(list(result)) > 0

+            def _unknown_metric(result):
+                return ('The provided filter doesn\'t refer to any known '
+                        'metric.' in result.message)


system_tests/monitoring.py

            retry_result = RetryResult(_has_timeseries,
                                       max_tries=MAX_RETRIES)(client.query)
-            return RetryErrors(BadRequest, max_tries=MAX_RETRIES)(retry_result)
+            return RetryErrors(BadRequest, _unknown_metric,
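The hunk above passes a predicate as a second argument to `RetryErrors`, so that only a `BadRequest` carrying the expected message is retried. The following is a rough, self-contained sketch of that idea, not the repository's retry module: `retry_errors`, its parameters, and the stand-in `BadRequest` class are all hypothetical names introduced here.

```python
import functools
import time


class BadRequest(Exception):
    """Stand-in for an HTTP 400 error type (hypothetical, for this sketch)."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


def retry_errors(exception, error_predicate=lambda exc: True,
                 max_tries=4, delay=0.0, backoff=2):
    """Retry calls that raise `exception`, but only while
    `error_predicate(exc)` is true. All defaults are assumed values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exception as exc:
                    # Re-raise unrelated errors and the final failure.
                    if not error_predicate(exc) or attempt == max_tries:
                        raise
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


def unknown_metric(exc):
    """Predicate: true only for the 'unknown metric' message."""
    return "doesn't refer to any known metric" in exc.message
```

With this shape, `retry_errors(BadRequest, unknown_metric)(func)` retries the eventual-consistency failure while letting any other 400 surface on the first attempt. The trade-off is that the predicate depends on message text.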


dhermes · 2016-09-27T19:20:47Z

RE: "generic messages" Sometimes the error payload isn't even JSON:
https://github.com/GoogleCloudPlatform/google-cloud-python/blob/4395b35adfabac9084c67184c74070db017a8a35/system_tests/storage.py#L34-L38

Sometimes it's a gRPC error (which is usually full of good info):
https://github.com/GoogleCloudPlatform/google-cloud-python/blob/4395b35adfabac9084c67184c74070db017a8a35/system_tests/bigtable.py#L89-L92

But usually we can get specific error info from the error response:

waprin · 2016-09-27T22:15:01Z

Review comments addressed. I am looking at the other retries; some of them are a bit harder to repro, so I'm still playing with them.

system_tests/monitoring.py

@@ -29,6 +29,21 @@
 retry_500 = RetryErrors(InternalServerError)
 retry_503 = RetryErrors(ServiceUnavailable)

+# Retry predicates


rimey · 2016-09-27T22:39:04Z

The error message text returned by the API is not part of the API. It's subject to change, in which case the system tests will break. If you think it's worth the hassle, I won't object, but I'm not sure why you do.
I find this code confusing because it executes the query twice as many times as it should. The query() method returns a Query object; it doesn't execute the query and return a "result" as this code seems to think.
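One reading of the second point is that the check `len(list(result)) > 0` already runs the query, and the caller then runs it again when it iterates the returned object. A minimal sketch of retrying the iteration itself follows. `LazyQuery`, `retry_result`, and `query_with_retries` are hypothetical stand-ins written for this note, not the library's `Query` class or the test suite's helpers, and the sketch assumes the query's time interval still covers the data on later attempts.

```python
class LazyQuery:
    """Hypothetical stand-in: building it is free, iterating it fetches."""

    def __init__(self, fetch):
        self._fetch = fetch

    def __iter__(self):
        # Every iteration issues a fresh request.
        return iter(self._fetch())


def retry_result(predicate, func, max_tries=4):
    """Call func() until predicate(result) is true or tries run out."""
    for _ in range(max_tries):
        result = func()
        if predicate(result):
            return result
    raise RuntimeError("no acceptable result after %d tries" % max_tries)


def query_with_retries(query, max_tries=4):
    # Materialize the rows once per attempt and test that list,
    # instead of retrying the constructor and listing again later.
    return retry_result(lambda rows: len(rows) > 0,
                        lambda: list(query), max_tries)
```

Here each attempt costs exactly one fetch, and the caller receives the already materialized list.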

waprin · 2016-09-27T22:51:16Z

@rimey

I am inclined to agree. I started working on this because the issue got created, but I'm not sure what problem it's solving; it seems unlikely we will start passing tests we should fail because of different errors with the same error code. I am more than happy to just close the issue, but I think tres and danny disagree.
I was aware of this when I originally wrote the code, but I'm not sure how to fix it, even after discussing it with @supriyagarg. It's true that query() just creates the object, but once you've called list on it to iterate over it, you can't do so again. The results for a given Query aren't stored anywhere in the object after iteration either. So without modifying the class there is no obvious method to retry besides the query call itself unless I'm missing something.

@dhermes I also don't see anything in the errors array in this specific response, message seems like the best we can do in this case.

dhermes · 2016-09-27T22:53:40Z

> I also don't see anything in the errors array in this specific response, message seems like the best we can do in this case.

That's why I mentioned

> See #2414 about issues parsing the errors out of the payload

@waprin I can just take over that issue if you like. Wasn't trying to make it an undue burden on you.

waprin · 2016-09-27T23:06:39Z

@dhermes Definitely not an undue burden, but you seem to understand what you want better, so I'm happy to punt it to you. If you change your mind, I am more than happy to do it.

rimey · 2016-09-27T23:10:35Z

@waprin You wrote:

> It's true that query() just creates the object, but once you've called list on it to iterate over it, you can't do so again.

That's incorrect.

> The results for a given Query aren't stored anywhere in the object after iteration either. So without modifying the class there is no obvious method to retry besides the query call itself unless I'm missing something.

I'm not understanding the problem you are running into.

Please don't modify the class. It's working as intended.

waprin · 2016-09-28T19:42:24Z

@rimey

> That's incorrect.

Yes, I was totally confused and misunderstood the problem I had previously encountered.

> I'm not understanding the problem you are running into.

Looked at it again and realized this is the issue:

#2459

By re-creating the Query object I was getting a new end_time, which is what makes the tests pass. I think we should probably fix Query to not replace seconds, but alternatively I could manually specify the correct time interval for the query in the system test.
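The effect described here can be reproduced with plain `datetime` arithmetic. The sketch below assumes, as the comment suggests, that a default end time is the current time with seconds dropped. The function names and the interval convention (start exclusive, end inclusive) are assumptions made for illustration, not the library's code.

```python
from datetime import datetime, timedelta


def truncated_interval(now, minutes):
    """Interval ending at `now` with seconds and microseconds dropped."""
    end = now.replace(second=0, microsecond=0)
    return end - timedelta(minutes=minutes), end


def in_interval(ts, interval):
    # Assumed convention: start exclusive, end inclusive.
    start, end = interval
    return start < ts <= end


# A point written at 12:00:30 ...
written = datetime(2016, 9, 27, 12, 0, 30)
# ... falls after the end of a query built at 12:00:45 (end is 12:00:00),
first = truncated_interval(datetime(2016, 9, 27, 12, 0, 45), minutes=5)
# ... but inside a query rebuilt at 12:01:05 (end is 12:01:00).
later = truncated_interval(datetime(2016, 9, 27, 12, 1, 5), minutes=5)

print(in_interval(written, first))  # False
print(in_interval(written, later))  # True
```

Under these assumptions, reusing the first query object can never return the point no matter how often it is retried, which is consistent with the tests passing only when the object is re-created. Passing an explicit end time that is at or after the write would avoid depending on the truncation.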

lukesneeringer · 2017-08-07T20:07:56Z

This issue may no longer be relevant due to its age. Feel free to re-open.

Monitoring - Look for specific messages in retries

efea375

googlebot added the "cla: yes" label Sep 27, 2016

dhermes reviewed Sep 27, 2016

review comments

08ca167

dhermes reviewed Sep 27, 2016

Bill Prin added 2 commits September 27, 2016 15:51

review comments

4206d2d

Fix messup

e651d3e

Actually needed those newlines

539449b

tseaver added the "api: monitoring" label Sep 29, 2016

lukesneeringer closed this Aug 7, 2017
