
[CT-2427] [Feature] [Investigate] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes #7387

Closed
3 of 5 tasks
Tracked by #7301
jtcohen6 opened this issue Apr 18, 2023 · 16 comments · Fixed by #10699
Labels
enhancement New feature or request Impact: CIP paper_cut A small change that impacts lots of users in their day-to-day user docs [docs.getdbt.com] Needs better documentation
jtcohen6 (Contributor) commented Apr 18, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

As a dbt developer, I need to write a custom "preflight" check in an on-run-start hook, and know that if it raises an exception, the rest of the invocation (including all selected DAG nodes) will be skipped.

At the same time, I don't want a failing on-run-* hook (including one that raises an exception) to cause dbt's invocation to terminate gracelessly, and to prevent it from writing run_results.json. Instead, I want a failing hook to be treated more like a failing model: handle the error, then skip downstream resources.
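For illustration, a custom preflight check of this kind might be wired up as follows (the `assert_preflight_ok` macro name is hypothetical, not an existing dbt macro):

```yaml
# dbt_project.yml -- hypothetical wiring for a custom preflight check.
# If the macro raises a compiler error, the desired behavior is that all
# selected nodes are skipped, and run_results.json is still written.
on-run-start:
  - "{{ assert_preflight_ok() }}"
```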


Copying from #7045 (comment):

We should treat hooks as participants in the DAG. Hooks are meaningfully ordered (linked lists). A failure in one hook should prevent the subsequent hooks from running. A failure in ANY on-run-start hook should prevent the selected DAG nodes from running.

on-run-start-1 --> on-run-start-2 --> [selection from DAG] --> on-run-end-1 --> on-run-end-2

As for the underlying implementation, I don't feel strongly that these hooks must be actual participants in the actual dbt graph. It could stay a simple for loop, so long as we have enough try/except handling to meet the expected behavior described above.
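As one illustration (not dbt's actual implementation), the "simple for loop with try handling" could be sketched in Python like this: a failure in any hook marks the remaining hooks and all selected nodes as skipped, while every result is still recorded so a run-results artifact can be written.

```python
# Minimal sketch of graceful hook-failure handling. All names here are
# illustrative; this is not dbt-core's real runner code.

def run_with_hooks(start_hooks, nodes, end_hooks):
    """Run start hooks, then nodes, then end hooks; record every result."""
    results = {}

    def run_sequence(items, skip):
        failed = skip
        for name, fn in items:
            if failed:
                # A prior failure skips this item, but we still record it.
                results[name] = "skipped"
                continue
            try:
                fn()
                results[name] = "success"
            except Exception:
                results[name] = "error"
                failed = True  # skip everything after this item
        return failed

    # A failure in ANY on-run-start hook skips all selected nodes.
    start_failed = run_sequence(start_hooks, skip=False)
    run_sequence(nodes, skip=start_failed)
    # on-run-end hooks still run here so results can always be written;
    # whether they should run after a start failure is a design choice.
    run_sequence(end_hooks, skip=False)
    return results
```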

Describe alternatives you've considered

These are both strictly worse:

  • Previous behavior (pre-1.4): A failing on-run-end hook would cause the entire invocation to terminate gracelessly, and dbt would not write run_results.json
  • Current behavior (since v1.4): A failing on-run-start hook does not cause the invocation to skip subsequent nodes. This removes the possibility of using on-run-start hooks as custom "preflight" checks for the command, which is the use case outlined in [CT-2179] [Bug] Failing on-run-start does not stop dbt run #7045.

Who will this benefit?

Users of on-run-* hooks

Acceptance criteria

  • If there is a failure in an on-run-start hook, dbt should skip running all selected resources. This is not true right now.
  • If there is a failure in an on-run-end hook, dbt should still write run_results.json with all previous results. This is true right now, and the GAP team is counting on it remaining true.
@jtcohen6 jtcohen6 added enhancement New feature or request Team:Execution labels Apr 18, 2023
@github-actions github-actions bot changed the title [Feature] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes [CT-2427] [Feature] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes Apr 18, 2023
@iknox-fa iknox-fa added the Refinement Maintainer input needed label May 1, 2023
iknox-fa (Contributor) commented May 1, 2023

Per BLG--
We'd like to take a quick step back and discuss the utility/future of pre and post hooks. @jtcohen6 can you and @aranke get together and discuss?

jtcohen6 (Contributor, Author) commented May 2, 2023

@iknox-fa @aranke The context for this issue was a behavior change in v1.4, around whether a failing on-run-start hook would cause the rest of the invocation to skip. For that full context, see the conversation in:

We discussed that issue at some length during a live BLG back in March (thanks @dbeatty10 @MichelleArk @emmyoop for weighing in!), and I opened the present issue (#7387) to capture the proposal. So - I have fairly high confidence in what's being proposed in this issue, as a way of providing the advantages of both the previous (pre-1.4) + the current (1.4+) behavior. If you have specific product/UX concerns, let's find time to chat!

@jtcohen6 jtcohen6 removed the Refinement Maintainer input needed label May 2, 2023
jtcohen6 (Contributor, Author):

While the previous behavior was never explicitly documented, it's important for reenabling patterns like the one described in this article:

graciegoheen (Contributor):

@jtcohen6 Is this just for on-run-start and on-run-end? Or do we want the same behavior for pre-hooks and post-hooks?

graciegoheen (Contributor):

@jtcohen6 Would you also be able to add acceptance criteria to the ticket?

jtcohen6 (Contributor, Author):

@graciegoheen

  • just on-run-start / on-run-end hooks
  • updated the description to more clearly state acceptance criteria

martynydbt:

This may already be how this works.

@martynydbt martynydbt changed the title [CT-2427] [Feature] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes [CT-2427] [Feature] [Investigate] Gracefully handle failures in on-run-* hooks, and skip subsequent hooks/nodes Aug 28, 2023
peterallenwebb (Contributor):

We (Peter/Michelle) aren't sure whether it already works like this or not. Another question is what a failure in the on-run-end hook would look like: would it result in a fail-fast exception, or some other exception we aren't currently handling? So there is some investigation and experimentation to be done as part of this work.

jtcohen6 (Contributor, Author) commented Aug 29, 2023

This issue reflects that, since v1.4:

  • dbt has gracefully handled failures in on-run-end hooks — we still write run_results.json, which is good! There might be no change needed here.
  • dbt no longer stops runs due to a failure in an on-run-start hook — we don't skip subsequent models. This issue is not fixed in recent versions of dbt-core.

Repro case: Make a project with some models, and add this to the project file:

# dbt_project.yml

on-run-start:
  - "select askdjf"

The expected behavior is that, if an on-run-start hook fails, the rest of the models for that invocation will not run. That is not the case:

$ dbt run
07:40:37  Running with dbt=1.7.0-b1
07:40:37  Registered adapter: postgres=1.7.0-b1
07:40:37  Unable to do partial parsing because of a version mismatch
07:40:37  Found 2 models, 1 operation, 0 sources, 0 exposures, 0 metrics, 353 macros, 0 groups, 0 semantic models
07:40:37
07:40:38
07:40:38  Running 1 on-run-start hook
07:40:38  1 of 1 START hook: my_dbt_project.on-run-start.0 ............................... [RUN]
07:40:38  Database error while running on-run-start
07:40:38  Concurrency: 5 threads (target='dev')
07:40:38
07:40:38  1 of 2 START sql view model dbt_jcohen.model_one ............................... [RUN]
07:40:38  1 of 2 OK created sql view model dbt_jcohen.model_one .......................... [CREATE VIEW in 0.13s]
07:40:38  2 of 2 START sql view model dbt_jcohen.model_two ............................... [RUN]
07:40:38  2 of 2 OK created sql view model dbt_jcohen.model_two .......................... [CREATE VIEW in 0.04s]
07:40:38
07:40:38  Finished running 2 view models in 0 hours 0 minutes and 0.32 seconds (0.32s).
07:40:38
07:40:38  Completed with 1 error and 0 warnings:
07:40:38
07:40:38    on-run-start failed, error:
 column "askdjf" does not exist
LINE 2: select askdjf
               ^
07:40:38
07:40:38  Done. PASS=2 WARN=0 ERROR=1 SKIP=0 TOTAL=3

This is important to support use cases where users want to define custom "project checks" that run before models are built — sort of like pre-commit hooks — and raise a compiler error if the check fails. For example, this discourse post which I linked in the comment above.

KhalidF-Sky:

Hello! We have some code that runs on the "on-run-start" hook. If this code fails to run, we would like the entire process to stop at that point. We expect the pre and post-hooks to also cause the process to fail and stop if there are any SQL errors. It would be great if this could be included, but if not, could it be made configurable? This way, everyone will be satisfied. ;)

graciegoheen (Contributor):

We expect the pre and post-hooks to also cause the process to fail and stop if there are any SQL errors.

I believe this is already the way pre and post-hooks work. This issue is to do the same for on-run-start and on-run-end.

@jtcohen6 jtcohen6 added the paper_cut A small change that impacts lots of users in their day-to-day label Sep 22, 2023
dbeatty10 (Contributor) commented Oct 5, 2023

Lightly editing questions from @katarinagacnik here:

Since this behavior changed, we lost the ability to forcefully end the run.
Do you have any alternative solutions for this? How could we automatically force-exit a dbt run when certain conditions are met?

@jtcohen6 what are the workarounds that you mentioned here?

In the meantime, there are workarounds to achieve this sort of behavior in a multi-step dbt deployment.
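One workaround along these lines might look like the following sketch of a multi-step deployment (`preflight_check` is a hypothetical macro name): run the check as its own invocation, so its exit code can gate the rest.

```shell
# Hypothetical multi-step deployment: run the preflight check on its own.
# dbt exits non-zero if the macro raises an error, and `&&` prevents the
# subsequent build from running in that case.
dbt run-operation preflight_check && dbt build
```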

jade-thirdwave:

Hi, I'm wondering if there are any updates on this issue? We also want to stop the subsequent process when an exception is raised in the on-run-start hooks.

@graciegoheen graciegoheen added this to the v1.9 milestone Apr 10, 2024
vitorweiss:

I think I've figured out the behavior here.

I have a similar context here where I want to block users from running a dbt command without a select flag. For that I have this macro:

{% macro check_select_arg() %}

    {% if
        not invocation_args_dict.select
        and target.name not in ['prod']
    %}

        {{ exceptions.raise_compiler_error("Error: You must provide at least one select argument") }}

    {% endif %}

{% endmacro %}

I added it to dbt_project.yml and it worked fine; it raised a compilation error on every run:

(venv) vitor_weiss@vitor-weiss ~/dbt $ dbt test
19:07:49  Running with dbt=1.7.10
19:07:50  Registered adapter: databricks=1.7.0
19:07:50  Unable to do partial parsing because saved manifest not found. Starting full parse.
19:07:53  Encountered an error:
Compilation Error in operation dbt-on-run-start-0 (./dbt_project.yml)
  Error: You must provide at least one select argument
  
  > in macro check_select_arg (macros/general_config/check_select_arg.sql)
  > called by operation dbt-on-run-start-0 (./dbt_project.yml)
(venv) vitor_weiss@vitor-weiss ~/dbt $ dbt test
19:08:06  Running with dbt=1.7.10
19:08:07  Registered adapter: databricks=1.7.0
19:08:07  Unable to do partial parsing because saved manifest not found. Starting full parse.
19:08:11  Encountered an error:
Compilation Error in operation dbt-on-run-start-0 (./dbt_project.yml)
  Error: You must provide at least one select argument
  
  > in macro check_select_arg (macros/general_config/check_select_arg.sql)
  > called by operation dbt-on-run-start-0 (./dbt_project.yml)
(venv) vitor_weiss@vitor-weiss ~/dbt $ dbt test

But later, I realized that I needed to add another condition to my macro that only blocks certain commands (build, test, ...) as below:

{% macro check_select_arg() %}

    {% if
        not invocation_args_dict.select
        and target.name not in ['prod']
        and invocation_args_dict.which in ['build', 'run', 'test', 'source', 'snapshot']
    %}

        {{ exceptions.raise_compiler_error("Error: You must provide at least one select argument") }}

    {% endif %}

{% endmacro %}

After that, my macro stopped blocking the execution of dbt, although it generated a database error as @jtcohen6 showed us.

If I remove my target folder and use the macro for the first time, without changing it, it works as expected. If I modify anything in the macro, the behavior changes, allowing dbt to run even though a compilation error occurs. Knowing how to reproduce both the error and the “normal” behavior could perhaps lead to a solution.

I'm using dbt 1.7.10 by the way.

dbeatty10 (Contributor):

If I remove my target folder and use the macro for the first time, without changing it, it works as expected. If I modify anything in the macro it changes the behavior, allowing it to run even when a compilation error occurs.

@vitorweiss The actual vs. expected behavior you describe sounds like it works the way you want when dbt does not use partial parsing, but doesn't work the way you want otherwise.

Workaround

If so, then you can add the --no-partial-parse flag to get your desired behavior. In v1.8+, this can be set in your dbt_project.yml like this:

# dbt_project.yml
flags:
  # set default for running this project -- anywhere, anytime, by anyone
  partial_parse: false

Other feature request

Maybe #10323 could be expanded to also cover errors in addition to warnings 🤷

Namely, always raise warnings (and errors!) in every invocation, even when partial parsing is used.

@aranke aranke self-assigned this Sep 5, 2024
@graciegoheen graciegoheen added the user docs [docs.getdbt.com] Needs better documentation label Sep 24, 2024
FishtownBuildBot (Collaborator):

Opened a new issue in dbt-labs/docs.getdbt.com: dbt-labs/docs.getdbt.com#6172
