Option to "Continue loading from failure point" when restarting a previous failed run #2568

bashyroger · 2020-06-18T09:22:22Z

Describe the feature

First, this request is a sort of inverse of 2142

When a model run fails, all dependent, downstream model runs are skipped.
Once you have fixed the error and restart the run, it would be great to be able to continue / retry from the point of failure with a flag like continue_from_last_failure or skip_already_processed. This would save a lot of processing time / cost when the job runs for a long time, and it would make restarting inherently simple.

As an example, take a scheduled daily run that takes 2 hours and fails at the 1.5-hour mark: it would be great not to have to restart it completely OR go through the laborious work of using the models clause to start running from the point of failure.

I realize that this is complex to add, as it requires logging / tracking the state of every model that has been run. But I do know this is possible, as I built something like this a decade ago when developing an ETL automation tool for SQL Server, written in procedural SQL...
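To illustrate the idea, here is a minimal sketch in Python of how a "continue from failure" selection could work, assuming a record of each model's status from the previous run is available. The function name, the status strings, and the model names are all hypothetical and are not part of dbt; this is not how dbt implements anything.

```python
from collections import deque


def models_to_rerun(dag, statuses):
    """Return the models that still need to run, in dependency order.

    dag:      dict mapping each model to the list of models it depends on.
    statuses: dict mapping model name to its status in the previous run
              ("success", "error" or "skipped"; hypothetical values).
    """
    # Everything that did not succeed last time has to run again.
    pending = {m for m in dag if statuses.get(m) != "success"}

    # Topological sort (Kahn's algorithm) restricted to the pending models.
    indegree = {m: sum(1 for d in dag[m] if d in pending) for m in pending}
    children = {m: [] for m in pending}
    for m in pending:
        for d in dag[m]:
            if d in pending:
                children[d].append(m)

    queue = deque(sorted(m for m in pending if indegree[m] == 0))
    order = []
    while queue:
        m = queue.popleft()
        order.append(m)
        for c in sorted(children[m]):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return order


# Hypothetical project: "orders" failed, so "customer_orders" was skipped.
dag = {
    "stg_orders": [],
    "stg_customers": [],
    "orders": ["stg_orders"],
    "customer_orders": ["orders", "stg_customers"],
}
previous = {
    "stg_orders": "success",
    "stg_customers": "success",
    "orders": "error",
    "customer_orders": "skipped",
}
print(models_to_rerun(dag, previous))
```

With this input, only the failed model and the model that was skipped because of it are selected; the two staging models that already succeeded are left alone.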

Describe alternatives you've considered

The alternative is going through the laborious work of using the models clause to restart everything from the point of failure, or paying the price of a total re-run.

Who will this benefit?

Basically everyone whose failed run carries significant costs (in processing and / or time) and who wants to restart it as efficiently as possible.

jtcohen6 · 2020-06-18T14:21:08Z

I agree this would be very compelling. It's already on our radar, as one of the intended use cases of #2465. Could you take a look at that issue, and comment here if it differs from what you had in mind?

bashyroger · 2020-06-19T07:11:56Z

#2465 indeed makes possible what I am requesting, @jtcohen6! So this ticket is a duplicate of that one and can be closed.

The core indeed is that you need to start tracking the state of everything that ran.
IMO that should be done in a database, which you can then also use for reporting on your runs, either directly in dbt Cloud or (even better) by logging that data to the target analytical solution in a dbt-specific logging database / schema.
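As a rough sketch of what such a logging table could look like, the following Python example uses an in-memory SQLite database as a stand-in for the target warehouse. The table name model_runs, its columns, the helper functions, and the sample rows are all hypothetical; nothing here reflects an existing dbt schema.

```python
import sqlite3

# In-memory SQLite stands in for a logging schema in the target database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE model_runs (
        run_id  TEXT,
        model   TEXT,
        status  TEXT,   -- 'success', 'error' or 'skipped' (hypothetical values)
        seconds REAL
    )
    """
)


def log_result(conn, run_id, model, status, seconds):
    """Record the outcome of one model in one run."""
    conn.execute(
        "INSERT INTO model_runs VALUES (?, ?, ?, ?)",
        (run_id, model, status, seconds),
    )


def unfinished_models(conn, run_id):
    """Models that did not succeed in the given run, i.e. candidates for a retry."""
    rows = conn.execute(
        "SELECT model FROM model_runs "
        "WHERE run_id = ? AND status != 'success' ORDER BY model",
        (run_id,),
    )
    return [row[0] for row in rows]


# Made-up sample data for one run.
log_result(conn, "run-1", "stg_orders", "success", 12.5)
log_result(conn, "run-1", "orders", "error", 3.0)
log_result(conn, "run-1", "customer_orders", "skipped", 0.0)

print(unfinished_models(conn, "run-1"))
```

The same table serves both purposes mentioned above: it can be queried to decide what to retry, and it can be aggregated for reporting on run history.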

jtcohen6 · 2020-06-19T13:56:40Z

I see what you mean. While dbt has had a light touch historically in terms of the metadata it persists in the database—almost none, outside of snapshots and a (fairly slow) add-on logging package—there is good cause to start preserving past invocations and state to power longitudinal analysis.

Some datasets are conducive to that right from within an analytical SQL database. For other, less-tabular data, we may be able to provide more compelling analyses from within dbt Cloud.

bashyroger added the enhancement (New feature or request) and triage labels Jun 18, 2020

jtcohen6 removed the triage label Jun 18, 2020

jtcohen6 added the duplicate (This issue or pull request already exists) label Jun 19, 2020

jtcohen6 closed this as completed Jun 19, 2020
