dbt seed should retry on bigquery when it tells us to #1579
This is something I have a need for, and plan on implementing in a fork, but would love to contribute back to dbt. My rough plan is to wrap calls to the client (I use bigquery) in a helper function that retries in accordance with the timeout and a new configurable 'retries' parameter set in profiles.yml. It would be nice to delegate the retrying logic to a library that already implements niceties like polite exponential backoff (the bigquery adapter already depends on google.api_core, which has a 'retry' module that does exactly this).

Before I get too far on something that doesn't align well with what dbt wants, it would be helpful to know whether it makes sense to implement this per adapter. The adapter feels to me like the most logical place, as the first point where exceptions from the client can be caught and retried, but this is a feature all adapters could benefit from. A compelling reason to implement it per adapter is that exceptions vary by adapter, so the logic that determines whether an exception is retryable will depend on the adapter. Curious to hear any thoughts on this approach!
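The wrapper described above could be sketched roughly as follows. This is a hypothetical illustration, not dbt's actual implementation: the function name `with_retries` and the parameter names (`retries`, `initial_delay`, `backoff`) are made up for this sketch, standing in for the proposed profiles.yml settings. In practice google.api_core's retry module could play this role instead.

```python
import random
import time

def with_retries(fn, retries=3, initial_delay=1.0, backoff=2.0,
                 is_retryable=lambda exc: True):
    """Call fn(); on a retryable exception, retry with exponential backoff.

    Hypothetical sketch of the approach described above -- the names here
    are assumptions, not part of dbt's or google.api_core's API.
    """
    delay = initial_delay
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up once we exhaust the budget or the error is permanent.
            if attempt == retries or not is_retryable(exc):
                raise
            # A little jitter avoids synchronized retry storms against the API.
            time.sleep(delay + random.uniform(0, delay * 0.1))
            delay *= backoff
```

A call site would then wrap each client call, e.g. `with_retries(lambda: client.query(sql))`, with an adapter-specific `is_retryable` predicate.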
Hey @kconvey - I think this feature definitely has merit on BigQuery. I don't really perceive a need/desire for it on other databases, though. BQ returns random 500 errors with some regularity, and retrying really does make the query succeed. That just... doesn't usually happen on Snowflake/Redshift/Postgres.

My thinking is that when BigQuery returns a 500 error code (or an error message that says "Retrying may solve the problem"), dbt can retry the query. There could be merit to retrying with some sort of backoff, but really, I'd be equally comfortable retrying a single time after something like 10 seconds. I'm happy for the number of retries and the timeout interval to be configurable, though.

Overall, I very much agree with your thinking here! I'd just say that we can make this BQ-specific initially, but we should implement it in a way that could be extended to other plugins in the future.
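The retry criteria proposed above (a 500-class status code, or an error message that explicitly suggests retrying) could be expressed as a small predicate. This is only an illustrative sketch; the function and constant names are invented here, and the set of retryable codes is an assumption, not dbt's actual policy.

```python
# Assumed set of HTTP-style codes worth retrying (500 Internal Server Error,
# 503 Service Unavailable) -- an illustration, not dbt's actual list.
RETRYABLE_CODES = {500, 503}

# BigQuery sometimes includes this phrase in transient-error messages.
RETRYABLE_HINT = "Retrying may solve the problem"

def is_retryable_bq_error(code, message):
    """Return True if the error looks transient per the heuristics above."""
    return code in RETRYABLE_CODES or RETRYABLE_HINT in message
```

Such a predicate would live in the BigQuery adapter, keeping the per-adapter classification logic separate from a shared retry loop.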
Closed by #1963 - this is going out in dbt v0.15.1 |
Issue
Sometimes dbt seed runs fail on bigquery with an error telling us to retry, but we don't.

Issue description
This is especially annoying in integration tests, here's an example:
https://dev.azure.com/fishtown-analytics/dbt/_build/results?buildId=302
The failure text is:
And, in fact, retrying will solve the problem. This is some weird transient failure inside bigquery. I'm pretty much positive that the second message is caused by the first (BQ gets an empty csv due to the previous error?) and should be ignored. dbt should catch this and retry the seed.
I'm sure this also happens in the "real world" once in a while.
Results

dbt seed failed with an error. I expected it to succeed.

System information
dbt version 0.14.0-ish
The operating system you're running on: Any OS
The python version you're using (probably the output of python --version): Any python

Steps to reproduce
Run dbt seed a few thousand times because you run a lot of tests. Experience a couple of failures, and feel frustrated about having to re-run the entire suite each time because of this.