Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smart Reruns from Failures/Errors #4017

Merged
merged 61 commits into from
Oct 18, 2021
Merged

Conversation

sungchun12
Copy link
Contributor

@sungchun12 sungchun12 commented Oct 6, 2021

resolves #3891

Description

Instead of having to manually override dbt commands with the models in scope that need to rerun, dbt can now parse the run_results.json to do so automatically.

Example Usage
First dbt job model has an error.
image

Rerun dbt job errors and downstream nodes.
image

Result Selectors in Scope
Note: This pull request does NOT limit which selectors the user can call on from the below list. There are use cases to analyze the nodes for any of these result statuses(ex: dbt ls --select result:skipped)

  • result:fail
  • result:error
  • result:warn
  • result:success
  • result:skipped
  • result:pass

Testing Approach
Tested result selectors relevant to the commands in scope. For example, you'll see a test for dbt run --select result:error, but you won't see a test for dbt run --select result:fail. This is because a result:fail selector will never exist for dbt run in the run_results.json.

For concurrent selector tests, we tested the most common use cases below.

  • dbt run --select state:modified+ result:error+ -—defer -—state ./target
    • Rerun all my erroneous models AND run changes I made concurrently that may relate to the erroneous models for downstream use
  • dbt build --select state:modified+ result:error+ -—defer -—state ./target
    • Rerun and retest all my erroneous models AND run changes I made concurrently that may relate to the erroneous models for downstream use
  • dbt build --select state:modified+ result:error+ result:fail+ --defer --state ./target
    • Rerun all my erroneous models AND all my failed tests
    • Rerun all my erroneous models AND run changes I made concurrently that may relate to the erroneous models for downstream use
    • There's a failed test that's unrelated to modified or error nodes(think: source test that needs to refresh a data load in order to pass)
  • dbt test --select result:fail --exclude <example test> -—defer -—state ./target
    • Rerun all my failed tests and exclude tests that I know will still fail
    • This can apply to updates in source data during the "EL" process that need to be rerun after they are refreshed

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@sungchun12
Copy link
Contributor Author

I noticed there a test error that's unrelated to the changes made in this pull request.
image

Succeeded in a previous run with the same command syntax: https://github.com/dbt-labs/dbt/runs/3820843861#step:9:1319
image

@sungchun12 sungchun12 marked this pull request as ready for review October 7, 2021 18:34
@sungchun12
Copy link
Contributor Author

The same thing is happening in the latest merge to develop: https://github.com/dbt-labs/dbt/runs/3829307022#step:9:1227

@sungchun12
Copy link
Contributor Author

@fclesio, would love for you to try this out!

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sungchun12 @matt-winkler This looks fantastic. I especially appreciate the thoroughness with which you've tested and vested the array of anticipated use cases.

@leahwicz I contributed some of the initial code for this :), so I wouldn't hate if someone on the Core team could give this a quick code review, as a complement to my function review here. (This definitely falls under the Execution sub-team.) At the same time, it doesn't touch that much code, builds pretty firmly on top of the precedent set by the state: selection method, and the integration testing looks solid. I could go either way, depending on your comfort level.

(For the moment, I'm just going to commit a quick update to the changelog, since we've released v1.0.0-b1. I have every intention of including this feature in v1.0.0-b2.)

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(quick hits to get the tests passing)

@sungchun12
Copy link
Contributor Author

Looks like the adapter tests have changed since the previous commit because they're being skipped now. Let me know if I need to pass a parameter somewhere to get things running.

Copy link
Contributor

@gshank gshank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here is not that big and is limited to the new feature implemented in this pull request, so I think the risk is pretty low. The new code follows the standard of the existing code quite well.

I'd like to have the commits squashed into one and rebased on current main before we merge this though :)

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this contribution, @sungchun12 @matt-winkler!

@gshank I agree re: adding this to main as a single commit. I think GitHub's squash-merge capabilities could get the job done.

@jtcohen6 jtcohen6 requested a review from gshank October 18, 2021 07:42
@jtcohen6 jtcohen6 merged commit 97f31c8 into main Oct 18, 2021
@jtcohen6 jtcohen6 deleted the feature/smart-rerun-from-failures branch October 18, 2021 14:43
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Add result: selection method

* make a copy for modified state test suite

* test case notes

* remove macro tests

* add a test setup command

* copy run results state

* passing test case, todos, split work

* clean up result:success test case

* start with build command and remove previous state where needed

* add error result selector tests for seed

* add another error seed test case

* remove todo

* passing build result:error tests

* single failure build test

* add passing test

* fix node assertions for tests

* fix tests

* draft fail+ tests

* add severity to test

* result:warn passing test

* result:warn+ passing tests

* add passing concurrent selector test

* add downstream flag

* add comment

* passing test

* fix test for dynamic node selection

* add build concurrent selector passing test

* add run test cases

* add integration tests for dbt test

* fix formatting

* rename test

* remove extra comments

* add extra newline

* add concurrent selector test / build cases

* clean up todos

* test all nodes

* DRY rebuild code

* test all nodes

* add TODO update assertion code

* cleaner assert code

* fix this test to have a fixed set

* more cleanup

* add changelog

* update concurrent selectors on dbt test

* remove todo

* Update changelog

* Apply suggestions from code review

Co-authored-by: Jeremy Cohen <[email protected]>

* fix changelog

* fix Contributors headers

Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Matt Winkler <[email protected]>

automatic commit by git-black, original commits:
  97f31c8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Checkpointing in dbt
4 participants