BashOperator to raise AirflowSkipException on exit code 127 #13421

dstandish · 2021-01-01T07:15:02Z

Airflow's skipped task status can be useful when building dags for some trickier requirements.

This PR adds to BashOperator the capability to end the task in skipped state.

If the command exits with code 255, an AirflowSkippedException is thrown.

If you have a better suggestion re choice of exit code, suggest. I chose 255 because I think it is pretty unlikely to be found in the wild, which means this would only be a "breaking" change in a negligible sense.

If we don't like exit codes for this, we could come up with some convention for parsing stdout. But exit codes seems simple and good enough.

kurtqq · 2021-01-01T07:37:04Z

What is special about BashOperator that you suggest this modification only for this operator?
I agree that SKIP state can be useful but I think there should be a more generic approach that will fit all operators rather than specific one.

dstandish · 2021-01-01T07:40:04Z

What is special about BashOperator that you suggest this modification only for this operator?
I agree that SKIP state can be useful but I think there should be a more generic approach that will fit all operators rather than specific one.

Most other operators do their work in python. So they can easily raise AirflowSkipException. The generic approach is simply to raise AirflowSkipException.

But you can't do that from bash. That's what's special about bash operator. You need some way to signal from within the bash process "ok, i want to skip now", so that when it comes back to python, the exception can be raised. And my proposal is to use exit code for this signalling.

The idea came to me from a stack overflow question where the user was using bash operator, and a possible solution would have been to trigger a skip from within bash, but it's not currently possible.

kurtqq · 2021-01-01T07:57:58Z

Most other operators do their work in python. So they can easily raise AirflowSkipException. The generic approach is simply to raise AirflowSkipException.

Many operators are executed on remote machines that don't have airflow installed in so you cant just raise airflow exception.
Can you explain how to do that for SQL & Spark operators?

dstandish · 2021-01-01T08:18:38Z

I understand what you're saying. I'm not sure it would be practical to implement for sql, especially in a uniform way. There is so much variation in sql databases, and limitless variation with the types of queries people execute and the returns there may or may not be and the methods used.... In a database where you can throw exceptions like MS Sql, probably you could pick one to be "the skip exception". In others (like snowflake) where you cannot, I can't think of a practical way. Not sure about spark.

In any case though, for bash it is pretty straightforward and clear and easy to understand. We're already evaluating exit code. Why not reserve one of them to mean skip. Anyway, it's an idea 🤷.

potiuk · 2021-01-01T08:26:35Z

In any case though, for bash it is pretty straightforward and clear and easy to understand. We're already evaluating exit code. Why not reserve one of them to mean skip. Anyway, it's an idea .

Yep. I think this is a good idea and Spark/SQL opertors can raise their own ways of raising Skip exception on their own terms. Nothing wrong with that.

potiuk · 2021-01-01T08:28:20Z

However before relesing 2.0.1 and deciding what to do with branching we should only merge after this happens (as this is new feature and should go to 2.1.

github-actions · 2021-01-01T08:29:01Z

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

dstandish · 2021-01-01T08:38:12Z

deciding what to do with branching

not familiar with what's going on with branching.... but no need to recap it here if it will surface on dev list

in other news, it looks like one of the tests did not like the addition of the task in example_bash_operator.py... i'll look into this tomorrow.

thanks

dstandish · 2021-01-02T04:36:41Z

OK @potiuk I fixed the test.

One problem is that our example dags also serve as test dags, so if you add another example task (as I did) you might break a test.

The other contributing factor was that the test depended on hardcoded counts of the tasks in the dag.

In my resultion I thought about simply excluding my task from the dag (i.e. purge it from dag1.task_dict). But task removal isn't really supported so I abandoned that.

What I ended up doing is a mild refactor of the test such that the "starting states" were controlled in a dictionary and instead of hardcoding state stats, I add a helper function to read this from the config.

github-actions · 2021-01-02T05:19:52Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

dstandish · 2021-01-03T21:44:38Z

Found more tests that need to be updated

I think example dags should not be used in tests: #13453

potiuk · 2021-01-03T22:27:57Z

@dstandish - while I understand your frustrations (been there, done that), I honestly think it is quite the opposite (and no offence but it is a bit selfish thinking).

To be honest, our plans are to turn all example DAGs (in providers) into system tests. It makes perfect sense to use examples as tests. If we did not do it, your change would make examples invalid for example.

I think having example DAGs wrong (which would have happen if they were not part of the tests) is far worse (for the community) than you having to fix the examples during the change. You ask to trade off your time with the time of at least hundreds of people (users) scratching their heads while following the examples.

Imagine the users asking questions: why it does not work if I follow the example ? Repeat x-times. Trade it off with the time of a single person simply iterating on fixing the tests with a clear information (failed tests). While I think it's worse for you as a peraon, it's far better for the community as a whole to fix the examples in this case.

That is at least my way of thinking - when I spend 5x times more time than I initially thought about on fixing something where i also have to fix documentation, tests and related tests that start to break (and when it turns out that my change need to be a bit more generalised).

No hard feeling - it's just just an opinion and a.way of thinking :). I might be wrong, of course, and the examples might be useless as examples. But I would rather spend even a bit more time to add them to documentation and make them useful in this case as well.

It just follows my motto of 'wirh every change leave the (small) world around you behind a little better that before' :).

dstandish · 2021-01-03T23:45:37Z

hehe no worries :)

i think the example_bash_operator does a good job of illustrating why example dags should not be used in tests.

in general, tests should have only the elements that are absolutely necessary to test the behavior they are examining.

let me give an example.

consider this class:

class MyObj:
    def __init__(self, param1=None, param2=None):
        self.param1 = param1
        self.param2 = param2
        
    def get_param1(self):
        return self.param1

consider these tests

def test_get_param1():
    m = MyObj('a')
    assert m.get_param1() == 'a'

def test_get_param1():
    m = MyObj('a', 'b')
    assert m.get_param1() == 'a'

The first test is better because the value for param2 is totally irrelevant to the test. Including 'b' is just extra noise, and it is in fact a costly distraction because if there is something wrong with the test, it's another thing we have to chase down to see if it is the cause. And if we need to make a change, then it's another thing we need to understand.

Similarly, when we use example_bash_operator as a test dag for many different tests, inevitably we end up with a lot of extra noise in tests. An example of this is in test test_backfill_multi_dates in test_backfill_job.py. It uses example bash operator.

There is a component in there that spells out expected execution order:

        expected_execution_order = [
            ("runme_0", DEFAULT_DATE),
            ("runme_1", DEFAULT_DATE),
            ("runme_2", DEFAULT_DATE),
            ("runme_0", end_date),
            ("runme_1", end_date),
            ("runme_2", end_date),
            ("also_run_this", DEFAULT_DATE),
            ("also_run_this", end_date),
            ("run_after_loop", DEFAULT_DATE),
            ("run_after_loop", end_date),
            ("run_this_last", DEFAULT_DATE),
            ("run_this_last", end_date),
        ]

I'd bet this test could be accomplished with fewer tasks. But it uses all of them because that's what is in example_bash_operator. And probably as people have added to that example, they add to this and other tests test. But by doing so, we are spending effort, and not getting any return on our effort. Extraneous content in a test makes it tougher to understand exactly what it is testing. So adding more tasks to this test makes this test worse, not better. It increases technical debt. And that is not good for the community. If I add another task to this test, merely because I want to demonstrate new feature in an example dag, then I make this test worse.

Keeping test and example combined is not good for users either. In handcuffs us with regard to writing example dags because we are constrained by the tests that reference it, tests which we don't want to modify without good reason.

So in sum there are two main problems with this approach:

test and example dags are the same
- constrained from designing example dags to best exemplify features because they have mixed purposes
- can't freely alter an example without dealing with entirely unrelated tests
having too many tests share the same test dag
- makes tests unnecessarily complicated and noisy
- can reach a point where the structure required by one test is incompatible, or at least very inconvenient, when trying to meet the needs of another test
- developer inconvenience -- need to understand, and alter, many different tests, entirely unrelated to your change
  - non-trivial risk that "fixing" a test actually breaks it, when did not really need to change it at all for your PR

So, I do believe there are costs incurred in (1) mixing test and example dags and (2) having many tests share the same test dag. And my question is, what is the benefit? If there is a return, then great. But I don't see it.

What do you think?

dstandish · 2021-01-04T06:10:52Z

ok @potiuk ..... if only for the sake of discussion, i have copied example_bash_operator.py as it existed before to tests/dags/test_miscellaneous.py and renamed the dag to miscellaneous_test_dag.

i have called it miscellaneous because, although the bash operator is used in it, it's not used in any tests of the bash operator. it's just a generic test dag used in a variety of tests with no unifying theme.

i did not update references in other tests which also use example_bash_operator but which are not so picky about its structure that they fail when you add a task.

I also left in place the refactor of test_mark_tasks.py because even though the refactor would not be needed with the creation of miscellaneous_test_dag, the improvements help readability and are worth keeping.

airflow/example_dags/example_bash_operator.py

github-actions · 2021-02-05T04:02:51Z

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

dstandish · 2021-02-05T05:56:00Z

@ashb @potiuk i think this one's good to go

ashb · 2021-02-05T13:48:47Z

airflow/operators/bash.py

-    with ``set -e;``
-    Example:
+    Airflow will evaluate the exit code of the bash command.  In general, a non-zero exit code will result in
+    task failure and zero will result in task success.  Exit code ``255`` will throw an


Suggested change

task failure and zero will result in task success. Exit code ``255`` will throw an

task failure and zero will result in task success. Exit code ``127`` will throw an

etc. Docs still refer to 255 below too.

ashb

LGTM other than the docs still saying 255

dstandish · 2021-02-06T08:38:38Z

LGTM other than the docs still saying 255

my bad... fixed. just one flakey build error now so i'm letting this sleeping dog lie.

boring-cyborg bot added the kind:documentation label Jan 1, 2021

potiuk approved these changes Jan 1, 2021

View reviewed changes

potiuk added this to the Airflow 2.1 milestone Jan 1, 2021

github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jan 1, 2021

dstandish force-pushed the bash-skip-operator branch from e2b5da0 to 9a97f88 Compare January 2, 2021 04:29

dstandish force-pushed the bash-skip-operator branch from 9a97f88 to b444a42 Compare January 3, 2021 05:28

dstandish requested a review from potiuk January 4, 2021 06:14

ashb requested changes Jan 4, 2021

View reviewed changes

airflow/example_dags/example_bash_operator.py Outdated Show resolved Hide resolved

github-actions bot removed the full tests needed We need to run full set of tests for this PR to merge label Jan 4, 2021

dstandish force-pushed the bash-skip-operator branch from a732dd8 to 609b50b Compare January 7, 2021 17:31

dstandish requested a review from ashb January 7, 2021 17:32

dstandish force-pushed the bash-skip-operator branch from 609b50b to d6348c3 Compare January 9, 2021 17:36

dstandish added 4 commits February 4, 2021 18:16

BashOperator raises AirflowSkipException on exit code 255

e902df8

!fixup

64da1ec

fix test

08f6917

move move existing bash operator dag to tests/dags

5528dd7

dstandish added 2 commits February 4, 2021 18:24

!fixup

4635fc1

use code 127

b34668c

dstandish force-pushed the bash-skip-operator branch from d6348c3 to b34668c Compare February 5, 2021 02:24

ashb reviewed Feb 5, 2021

View reviewed changes

dstandish changed the title ~~BashOperator to raise AirflowSkipException on exit code 255~~ BashOperator to raise AirflowSkipException on exit code 127 Feb 6, 2021

fix missed 255 references

0084a76

ashb approved these changes Feb 7, 2021

View reviewed changes

ashb merged commit 31b6918 into apache:master Feb 7, 2021

dstandish mentioned this pull request Feb 7, 2021

Add SubprocessHook for running commands from operators #13423

Merged

ashb mentioned this pull request Mar 26, 2021

Add Option for BashOperator to Set State to SKIPPED #8245

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BashOperator to raise AirflowSkipException on exit code 127 #13421

BashOperator to raise AirflowSkipException on exit code 127 #13421

dstandish commented Jan 1, 2021

kurtqq commented Jan 1, 2021

dstandish commented Jan 1, 2021 •

edited

Loading

kurtqq commented Jan 1, 2021 •

edited

Loading

dstandish commented Jan 1, 2021 •

edited

Loading

potiuk commented Jan 1, 2021

potiuk commented Jan 1, 2021

github-actions bot commented Jan 1, 2021

dstandish commented Jan 1, 2021

dstandish commented Jan 2, 2021

github-actions bot commented Jan 2, 2021

dstandish commented Jan 3, 2021

potiuk commented Jan 3, 2021

dstandish commented Jan 3, 2021 •

edited

Loading

dstandish commented Jan 4, 2021 •

edited

Loading

github-actions bot commented Feb 5, 2021

dstandish commented Feb 5, 2021

ashb Feb 5, 2021

ashb Feb 5, 2021

ashb left a comment

dstandish commented Feb 6, 2021 •

edited

Loading

	task failure and zero will result in task success. Exit code ``255`` will throw an
	task failure and zero will result in task success. Exit code ``127`` will throw an

BashOperator to raise AirflowSkipException on exit code 127 #13421

BashOperator to raise AirflowSkipException on exit code 127 #13421

Conversation

dstandish commented Jan 1, 2021

kurtqq commented Jan 1, 2021

dstandish commented Jan 1, 2021 • edited Loading

kurtqq commented Jan 1, 2021 • edited Loading

dstandish commented Jan 1, 2021 • edited Loading

potiuk commented Jan 1, 2021

potiuk commented Jan 1, 2021

github-actions bot commented Jan 1, 2021

dstandish commented Jan 1, 2021

dstandish commented Jan 2, 2021

github-actions bot commented Jan 2, 2021

dstandish commented Jan 3, 2021

potiuk commented Jan 3, 2021

dstandish commented Jan 3, 2021 • edited Loading

dstandish commented Jan 4, 2021 • edited Loading

github-actions bot commented Feb 5, 2021

dstandish commented Feb 5, 2021

ashb Feb 5, 2021

Choose a reason for hiding this comment

ashb Feb 5, 2021

Choose a reason for hiding this comment

ashb left a comment

Choose a reason for hiding this comment

dstandish commented Feb 6, 2021 • edited Loading

dstandish commented Jan 1, 2021 •

edited

Loading

kurtqq commented Jan 1, 2021 •

edited

Loading

dstandish commented Jan 1, 2021 •

edited

Loading

dstandish commented Jan 3, 2021 •

edited

Loading

dstandish commented Jan 4, 2021 •

edited

Loading

dstandish commented Feb 6, 2021 •

edited

Loading