
Option to "Continue loading from failure point" when restarting a previous failed run #2568

Closed
bashyroger opened this issue Jun 18, 2020 · 3 comments
Labels
duplicate (This issue or pull request already exists), enhancement (New feature or request)

Comments

@bashyroger

Describe the feature

First, this request is sort of the inverse of #2142.

When a model run fails, all dependent downstream models are skipped.
Once you have fixed the error and restart the run, it would be great to be able to continue or retry from the point of failure with a flag like continue_from_last_failure or skip_already_processed. This would save a lot of processing time and cost when the job runs for a long time, and would make restarting inherently simple.

As an example, take a scheduled daily run that takes 2 hours and fails after 1.5 hours: it would be great not to have to restart it completely OR go through the laborious work of using the models clause to start running from the point of failure.

I realize this is complex to add, as it requires logging / tracking the state of every model that has been run. But I do know it is possible, as I built something like this a decade ago when developing an ETL automation tool for SQL Server, written in procedural SQL...
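
To illustrate the idea, here is a minimal Python sketch of the selection step. It is not dbt's implementation: the function name `models_to_resume`, the status values, and the model names are all hypothetical. It assumes the previous run left behind a per-model status record, and it simply reruns everything that did not succeed.

```python
from typing import Dict, List


def models_to_resume(previous_status: Dict[str, str], run_order: List[str]) -> List[str]:
    """Return the models that must run again, in their original execution order.

    previous_status is a hypothetical record of the last run that maps a model
    name to "success", "error", or "skipped". A model missing from the record
    is treated as not yet run.
    """
    return [model for model in run_order if previous_status.get(model) != "success"]


# Hypothetical project: two staging models succeeded, "orders" failed,
# and its downstream model was skipped as a result.
run_order = ["stg_orders", "stg_customers", "orders", "customer_orders"]
previous_status = {
    "stg_orders": "success",
    "stg_customers": "success",
    "orders": "error",
    "customer_orders": "skipped",
}

resume = models_to_resume(previous_status, run_order)
print(resume)
```

Here only the failed model and the skipped downstream model are selected, so the work that already succeeded is not repeated.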

Describe alternatives you've considered

The alternative is going through the laborious work of using the models clause to restart everything from the point of failure, or paying the price of a total re-run.

Who will this benefit?

Basically everyone with a failing run that carries significant costs (in processing and / or time) who wants to restart it as efficiently as possible.

@bashyroger added the enhancement (New feature or request) and triage labels Jun 18, 2020
@jtcohen6
Contributor

I agree this would be very compelling. It's already on our radar, as one of the intended use cases of #2465. Could you take a look at that issue, and comment here if it differs from what you had in mind?

@jtcohen6 removed the triage label Jun 18, 2020
@bashyroger
Author

#2465 indeed makes possible what I requested, @jtcohen6! So this ticket is a duplicate of that one and can be closed.

The core is indeed that you need to start tracking the state of everything that ran.
IMO that should be done in a database, which you can then also use for reporting on your runs, either directly in dbt Cloud or (even better) by logging that data to the target analytical solution in a dbt-specific logging database / schema.
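
A rough sketch of what such a logging table could look like, using Python's built-in `sqlite3` as a stand-in for the target database. The table name `model_run_log`, its columns, and both helper functions are assumptions made for illustration only; nothing here is an existing dbt feature.

```python
import sqlite3
from datetime import datetime, timezone


def record_result(conn: sqlite3.Connection, run_id: str, model: str, status: str) -> None:
    """Append one row per model to a hypothetical logging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_run_log "
        "(run_id TEXT, model TEXT, status TEXT, finished_at TEXT)"
    )
    conn.execute(
        "INSERT INTO model_run_log VALUES (?, ?, ?, ?)",
        (run_id, model, status, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def unfinished_models(conn: sqlite3.Connection, run_id: str) -> list:
    """List the models of a given run that did not succeed, in insertion order."""
    rows = conn.execute(
        "SELECT model FROM model_run_log "
        "WHERE run_id = ? AND status != 'success' ORDER BY rowid",
        (run_id,),
    )
    return [row[0] for row in rows]


# In-memory database and made-up model names, purely for demonstration.
conn = sqlite3.connect(":memory:")
for model, status in [("stg_orders", "success"), ("orders", "error"), ("customer_orders", "skipped")]:
    record_result(conn, "run-1", model, status)

pending = unfinished_models(conn, "run-1")
print(pending)
```

The same table could serve both purposes mentioned above: deciding what to rerun, and reporting on past runs with plain SQL.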

@jtcohen6 added the duplicate (This issue or pull request already exists) label Jun 19, 2020
@jtcohen6
Contributor

I see what you mean. While dbt has had a light touch historically in terms of the metadata it persists in the database—almost none, outside of snapshots and a (fairly slow) add-on logging package—there is good cause to start preserving past invocations and state to power longitudinal analysis.

Some datasets are conducive to that right from within an analytical SQL database. For other, less-tabular data, we may be able to provide more compelling analyses from within dbt Cloud.
