
Determine a more appropriate sleep time for BigQuery system tests #1391

Closed
dhermes opened this issue Jan 15, 2016 · 7 comments
Labels
api: bigquery Issues related to the BigQuery API.

@dhermes dhermes added the api: bigquery Issues related to the BigQuery API. label Jan 15, 2016
@blowmage
Contributor

In gcloud-ruby we do a couple of things. First, we have an incremental backoff that waits for a job to complete. Second, we retry failed acceptance tests, which has been a huge win for us. BigQuery has many hiccups; we used to restart failed tests manually, but retrying has all but eliminated those manual restarts.
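
A minimal Python sketch of the incremental-backoff polling described above, assuming a hypothetical job object that exposes `reload()` and a `state` attribute (the real gcloud-ruby and gcloud-python surfaces differ):

```python
import time

def wait_for_job(job, initial_delay=1.0, max_delay=30.0, deadline=90.0):
    """Poll ``job`` until it completes, sleeping with incremental backoff.

    ``job`` is assumed to expose ``reload()`` and ``state``; both are
    hypothetical names for this sketch.
    """
    delay = initial_delay
    start = time.time()
    while True:
        job.reload()  # refresh the job's status from the API
        if job.state == 'DONE':
            return job
        if time.time() - start > deadline:
            raise RuntimeError('job did not complete within %s seconds' % deadline)
        time.sleep(delay)
        # Grow the sleep each round, capped so polling stays responsive.
        delay = min(delay * 1.5, max_delay)
```

Capping the per-round delay keeps the test from sleeping far past the moment the job actually finishes, while the overall deadline bounds how long the test can block.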

@dhermes
Contributor Author

dhermes commented Jan 15, 2016

We toyed with retries in #535 but dismissed them; I suppose this may be a reason to bring the idea back.

(That'd also fix our flaky "eventual consistency" errors with storage API queries.)

@callmehiphop

If I had to guess, I would say we have a roughly 60% success rate on system tests, with BigQuery being a frequent offender. When I first started we would time out after 30 seconds; we have since upped that to 60, which has helped a lot, but failures are still pretty common for us.

@blowmage
Contributor

Retrying tests has been huge for us; it has saved us so much time. Many services have problems, not just BigQuery and Storage.

But the incremental backoff seems to be closer to what this issue is about. We don't limit how long it blocks; the CI build will eventually time out if it takes too long, but we've never seen a job in our tests take that long. We do occasionally see 2+ minutes for a job to complete, though.
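
A sketch of the retry-on-failure idea in Python, as a decorator for flaky system tests; the attempt count, delay, and the broad exception catch are illustrative choices for this sketch, not anything from gcloud-ruby:

```python
import functools
import time

def retry_flaky(attempts=3, delay=5.0):
    """Rerun a flaky test up to ``attempts`` times before letting it fail."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return test_fn(*args, **kwargs)
                except Exception:
                    # Broad on purpose for this sketch; real code would
                    # catch the specific transient backend errors.
                    if attempt == attempts:
                        raise  # out of retries; surface the real failure
                    time.sleep(delay)  # give the backend a moment to settle
        return wrapper
    return decorator
```

Applied as `@retry_flaky()` on a test method, this turns a transient backend hiccup into at most a short extra delay instead of a failed CI run.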

@dhermes
Contributor Author

dhermes commented Jan 15, 2016

👍 I like it. @tseaver WDYT?

@tmatsuo I'm still curious if there is a magic number > 90 seconds.

@tmatsuo
Contributor

tmatsuo commented May 12, 2016

@dhermes
I don't know, but if it's an eventual-consistency issue, I would just use a longer deadline.

Do you run the tests anywhere other than Travis? Do you run nightly builds?

I'm asking because if you're running nightly builds, you can usually treat the test in question as flaky and run it in the nightly builds with a much longer timeout.
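
A sketch of that nightly-build approach, assuming a made-up convention where the nightly CI job exports `NIGHTLY=1` and flaky tests get a much longer deadline there:

```python
import os
import unittest

# Hypothetical convention for this sketch: nightly CI sets NIGHTLY=1,
# and flaky suites run there with a much longer deadline than per-PR runs.
NIGHTLY = os.getenv('NIGHTLY') == '1'
JOB_DEADLINE = 600.0 if NIGHTLY else 90.0  # seconds

class TestBigQueryJobs(unittest.TestCase):

    @unittest.skipUnless(NIGHTLY, 'flaky: runs only in nightly builds')
    def test_long_running_job(self):
        # A real test would start a BigQuery job here and poll it
        # (e.g. with an incremental-backoff helper) using JOB_DEADLINE.
        self.assertGreater(JOB_DEADLINE, 90.0)
```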

@dhermes dhermes added the flaky label Aug 11, 2016
@dhermes
Contributor Author

dhermes commented Aug 11, 2016

@tseaver Closing this since our retry work has picked up steam.
