Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Monitoring - Look for specific messages in retries #2451

Closed
wants to merge 5 commits into from

Conversation

waprin
Copy link
Contributor

@waprin waprin commented Sep 27, 2016

This starts to address #2415.

This was the most obvious check to add. For the other errors, 404s, 500s, and 503s all provide very generic error messages (and really we need the API team to just fix the 500s).

As far as retry logic, I'm not convinced it can be significantly improved, using a base of 3 makes the jumps too big. Maybe we could start from a higher number, but I think it would complicate the retry logic to save at best a few seconds.

So I am voting to just close #2415 after this is merged but let me know if you disagree.

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Sep 27, 2016
@@ -207,9 +207,14 @@ def _query_timeseries_with_retries():
def _has_timeseries(result):
return len(list(result)) > 0

def _unknown_metric(result):
return ('The provided filter doesn\'t refer to any known '
'metric.'in result.message)

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

retry_result = RetryResult(_has_timeseries,
max_tries=MAX_RETRIES)(client.query)
return RetryErrors(BadRequest, max_tries=MAX_RETRIES)(retry_result)
return RetryErrors(BadRequest, _unknown_metric,

This comment was marked as spam.

This comment was marked as spam.

@waprin
Copy link
Contributor Author

waprin commented Sep 27, 2016

Review comments addressed, I am looking at the other retries, some of them are a bit harder to repro so still playing with it.

@@ -29,6 +29,21 @@
retry_500 = RetryErrors(InternalServerError)
retry_503 = RetryErrors(ServiceUnavailable)

# Retry predicates

This comment was marked as spam.

@rimey
Copy link
Contributor

rimey commented Sep 27, 2016

  1. The error message text returned by the API is not part of the API. It's subject to change, in which case the system tests will break. If you think it's worth the hassle, I won't object, but I'm not sure why you do.
  2. I find this code confusing because it executes the query twice as many times as it should. The query() method returns a Query object; it doesn't execute the query and return a "result" as this code seems to think.

@waprin
Copy link
Contributor Author

waprin commented Sep 27, 2016

@rimey

  1. I am inclined to agree, I started working on this because the issue got created, but not sure what problems it's solving, it seems unlikely we will start passing tests we should fail because of different errors with the same error code. I am more than happy to just close the issue but I think tres and danny disagree.
  2. I was aware of this when I originally wrote the code, but not sure how to fix it even after discussing it with @supriyagarg . It's true that query() just creates the object, but one you've called list on it to iterate over it, you can't do so again. The results for a given Query aren't stored anywhere in the object after iteration either. So without modifying the class there is no obvious method to retry besides the query call itself unless I'm missing something.

@dhermes I also don't see anything in the errors array in this specific response, message seems like the best we can do in this case.

@dhermes
Copy link
Contributor

dhermes commented Sep 27, 2016

I also don't see anything in the errors array in this specific response, message seems like the best we can do in this case.

That's why I mentioned

See #2414 about issues parsing the errors out of the payload

@waprin I can just take over that issue if you like. Wasn't trying to make it an undue burden on you.

@waprin
Copy link
Contributor Author

waprin commented Sep 27, 2016

@dhermes definitely not an undue burden, but you seem like you understand what you want better, so happy to punt it to you, but if you change your mind I am more than happy to do it.

@rimey
Copy link
Contributor

rimey commented Sep 27, 2016

@waprin You wrote:

It's true that query() just creates the object, but one you've called list on it to iterate over it, you can't do so again.

That's incorrect.

The results for a given Query aren't stored anywhere in the object after iteration either. So without modifying the class there is no obvious method to retry besides the query call itself unless I'm missing something.

I'm not understanding the problem you are running into.

Please don't modify the class. It's working as intended.

@waprin
Copy link
Contributor Author

waprin commented Sep 28, 2016

@rimey

That's incorrect.

Yes, I was totally confused and misunderstood the problem I had previously encountered.

I'm not understanding the problem you are running into.

Looked a it again and realized this is the issue:

#2459

by re-creating the Query object I was getting a new end_time which is making the tests pass. I think we should probably fix Query to not replace seconds, but alternatively I could manually specify the correct time interval for the time query in the system test.

@tseaver tseaver added the api: monitoring Issues related to the Cloud Monitoring API. label Sep 29, 2016
@lukesneeringer
Copy link
Contributor

This issue may no longer be relevant due to its age. Feel free to re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: monitoring Issues related to the Cloud Monitoring API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Monitoring System Tests Retries More Granular
6 participants