Determine a more appropriate sleep time for BigQuery system tests #1391
Comments
In gcloud-ruby we do a couple of things. First, we have an incremental backoff that waits for a job to complete. The other thing we do is retry failed acceptance tests, which has been a huge win for us. BigQuery has many hiccups, and we used to restart failed tests manually, but retrying has all but eliminated manual restarts.
We toyed with retries in #535 but dismissed them. I suppose this may be a reason to bring them back. (That would also fix our flaky "eventual consistency" errors with storage API queries.)
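For reference, a minimal sketch of what retrying a flaky system test could look like in Python. This is not what #535 proposed and not how gcloud-ruby does it; the `retry` decorator, its parameters, and the choice to retry only on `AssertionError` are all assumptions made for illustration.

```python
import functools
import time


def retry(max_tries=3, delay=1.0, exceptions=(AssertionError,),
          sleep=time.sleep):
    """Hypothetical decorator: re-run a test that raises ``exceptions``.

    The wrapped function is called up to ``max_tries`` times, sleeping
    ``delay`` seconds between attempts. The last failure is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries:
                        raise  # out of attempts, surface the real failure
                    sleep(delay)
        return wrapper
    return decorator
```

Decorating a single test method with `@retry(max_tries=3)` keeps the retry local to the tests known to be flaky, rather than hiding failures across the whole suite.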
If I had to guess, I would say we have a 60%-ish success rate on system tests, with BigQuery a frequent source of failures. When I first started we would time out after 30 seconds, and we have since upped that to 60. That has helped a lot, but failures are still pretty common for us.
Retrying tests has been super huge for us. It has saved us so much time. Many services have problems, not just BigQuery and Storage. But the incremental backoff seems to be more of what this issue is about. We don't limit how long it blocks. The CI build will eventually time out if it takes too long, but we've never seen a job in our tests take that long. We do occasionally see a job take 2+ minutes to complete, though.
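A rough Python sketch of the incremental backoff idea, as an alternative to one fixed sleep. The `wait_for_job` helper, its parameters, and the `is_done` callable are hypothetical; `is_done` stands in for whatever reloads the job and checks its state. The doubling factor and delay cap are illustrative values, not numbers taken from gcloud-ruby.

```python
import time


def wait_for_job(is_done, initial_delay=1.0, factor=2.0, max_delay=30.0,
                 timeout=None, sleep=time.sleep):
    """Poll ``is_done()`` with a growing delay until it returns True.

    Returns the total number of seconds slept. With ``timeout=None`` the
    call blocks for as long as the job takes (the CI build limit is the
    only bound). Otherwise ``RuntimeError`` is raised once the total time
    slept reaches ``timeout``.
    """
    delay = initial_delay
    waited = 0.0
    while not is_done():
        if timeout is not None and waited >= timeout:
            raise RuntimeError('job not done after %.1f seconds' % waited)
        sleep(delay)
        waited += delay
        # Grow the delay, but cap it so late polls stay reasonably frequent.
        delay = min(delay * factor, max_delay)
    return waited
```

A fast job returns after a second or two instead of paying the full fixed sleep, while a slow job (the 2+ minute case) still gets polled until it finishes.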
@dhermes Do you run the tests anywhere other than Travis? Do you run nightly builds? I'm asking because, if you do run nightly builds, you can usually treat the test in question as flaky and run it in the nightly builds with a much longer timeout.
@tseaver Closing this since our |
We sleep for 90 seconds but we get lots of failures when the job state is not done. This makes the BigQuery system tests fail quite often.
@tmatsuo Is there a better number of seconds we could sleep for? @stephenplusplus @callmehiphop @blowmage have you guys run into any issues with your own system tests?
Some more (collected in #1104)