dbt seed should retry on bigquery when it tells us to #1579
This is something I have a need for, and plan on implementing in a fork, but would love to contribute back to dbt. My rough plan is to wrap calls to the client (I use bigquery) in a helper function that retries in accordance with the timeout and a new configurable 'retries' parameter set in profiles.yml. It would be nice to delegate the retrying logic to a library that already implements niceties like polite exponential backoff (the bigquery adapter already depends on google.api_core, which has a 'retry' module that does exactly this).

Before I get too far on something that doesn't align well with what dbt wants, it would be helpful to know whether it makes sense to implement this per adapter. The adapter feels to me like the most logical place, as the first point where exceptions from the client can be caught and retried, but this is a feature all adapters could benefit from. A compelling reason to implement it per adapter is that exceptions vary by adapter, so the logic that determines whether an exception is retryable will depend on the adapter. Curious to hear any thoughts on this approach!
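The wrapper described above could be sketched roughly as follows. This is a hypothetical illustration, not dbt's actual implementation: the function name `with_retries` and the parameter names (`retries`, `initial_delay`, `backoff`) are made up for this sketch, standing in for the proposed profiles.yml settings. In practice google.api_core's retry module could play this role instead.

```python
import random
import time

def with_retries(fn, retries=3, initial_delay=1.0, backoff=2.0,
                 is_retryable=lambda exc: True):
    """Call fn(); on a retryable exception, retry with exponential backoff.

    Hypothetical sketch of the approach described above -- the names here
    are assumptions, not part of dbt's or google.api_core's API.
    """
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up once we exhaust the budget or the error is permanent.
            if attempt == retries or not is_retryable(exc):
                raise
            # A little jitter avoids synchronized retry storms against the API.
            time.sleep(delay + random.uniform(0, delay * 0.1))
            delay *= backoff
```

A call site would then wrap each client call, e.g. `with_retries(lambda: client.query(sql))`, with an adapter-specific `is_retryable` predicate.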
Hey @kconvey - I think this feature definitely has merit on BigQuery. I don't really perceive a need/desire for it on other databases, though. BQ returns random 500 errors with some regularity, and retrying really does make the query succeed. That just... doesn't usually happen on Snowflake/Redshift/Postgres.

My thinking is that when BigQuery returns a 500 error code (or an error message that says "Retrying may solve the problem"), dbt can retry the query. There could be merit to retrying with some sort of backoff, but really, I'd be equally comfortable retrying a single time after something like 10 seconds. I'm happy for the number of retries and the timeout interval to be configurable, though.

Overall, I very much agree with your thinking here! I'd just say that we can make this BQ-specific initially, but we should implement it in a way that could be extended to other plugins in the future.
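The retry criteria proposed above (a 500-class status code, or an error message that explicitly suggests retrying) could be expressed as a small predicate. This is only an illustrative sketch; the function and constant names are invented here, and the set of retryable codes is an assumption, not dbt's actual policy.

```python
# Assumed set of HTTP-style codes worth retrying (500 Internal Server Error,
# 503 Service Unavailable) -- an illustration, not dbt's actual list.
RETRYABLE_CODES = {500, 503}

# BigQuery sometimes includes this phrase in transient-error messages.
RETRYABLE_HINT = "Retrying may solve the problem"

def is_retryable_bq_error(code, message):
    """Return True if the error looks transient per the heuristics above."""
    return code in RETRYABLE_CODES or RETRYABLE_HINT in message
```

Such a predicate would live in the BigQuery adapter, keeping the per-adapter classification logic separate from a shared retry loop.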
Closed by #1963 - this is going out in dbt v0.15.1 |
Issue
Sometimes dbt seed runs fail on bigquery with an error telling us to retry, but we don't.

Issue description
This is especially annoying in integration tests, here's an example:
https://dev.azure.com/fishtown-analytics/dbt/_build/results?buildId=302
The failure text is:
And, in fact, retrying will solve the problem. This is some weird transient failure inside bigquery. I'm pretty much positive that the second message is caused by the first (BQ gets an empty csv due to the previous error?) and should be ignored. dbt should catch this and retry the seed.
I'm sure this also happens in the "real world" once in a while.
Results

dbt seed failed with an error. I expected it to succeed.

System information
dbt version 0.14.0-ish
The operating system you're running on: Any OS
The python version you're using (probably the output of python --version): Any python

Steps to reproduce
Run dbt seed a few thousand times because you run a lot of tests. Experience a couple of failures, and feel frustrated about having to re-run the entire suite each time because of this.