BashOperator to raise AirflowSkipException on exit code 127 #13421
Conversation
What is special about `BashOperator`?

Most other operators do their work in python, so they can easily raise `AirflowSkipException`. But you can't do that from bash. That's what's special about the bash operator. You need some way to signal from within the bash process "ok, i want to skip now", so that when control comes back to python, the exception can be raised. My proposal is to use the exit code for this signalling. The idea came to me from a stack overflow question where the user was using the bash operator, and a possible solution would have been to trigger a skip from within bash, but it's not currently possible.
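The signalling described above could look roughly like the sketch below. This is not the PR's actual implementation: the exception classes are local stand-ins for the ones in `airflow.exceptions`, and the reserved code 127 follows the PR title.

```python
# Hedged sketch of the exit-code signalling idea, not the PR's actual code.
# The exception classes stand in for airflow.exceptions; 127 is the reserved
# "skip" code from the PR title.

class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""

class AirflowSkipException(AirflowException):
    """Stand-in for airflow.exceptions.AirflowSkipException."""

SKIP_EXIT_CODE = 127  # reserved exit code meaning "please skip this task"

def evaluate_exit_code(exit_code: int) -> str:
    """Map a bash exit code to a task outcome, as the operator might."""
    if exit_code == SKIP_EXIT_CODE:
        raise AirflowSkipException(f"Bash returned {exit_code}; skipping.")
    if exit_code != 0:
        raise AirflowException(f"Bash failed with exit code {exit_code}.")
    return "success"
```

With something like this in place, a bash command can `exit 127` to end the task in `skipped` state instead of `failed`.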
Many operators are executed on remote machines that don't have airflow installed, so you can't just raise an airflow exception.
I understand what you're saying. I'm not sure it would be practical to implement for sql, especially in a uniform way. There is so much variation in sql databases, and limitless variation in the types of queries people execute, the returns they may or may not produce, and the methods used.... In a database where you can throw exceptions, like MS Sql, you could probably pick one to be "the skip exception". In others (like snowflake) where you cannot, I can't think of a practical way. Not sure about spark. In any case though, for bash it is pretty straightforward, clear, and easy to understand. We're already evaluating the exit code. Why not reserve one of them to mean skip? Anyway, it's an idea 🤷.
Yep. I think this is a good idea, and Spark/SQL operators can implement their own ways of raising the skip exception on their own terms. Nothing wrong with that.
However, before releasing 2.0.1 and deciding what to do with branching, we should only merge after this happens (as this is a new feature and should go to 2.1).
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
not familiar with what's going on with branching.... but no need to recap it here if it will surface on the dev list. in other news, it looks like one of the tests did not like the addition of the task in example_bash_operator.py... i'll look into this tomorrow. thanks
OK @potiuk I fixed the test. One problem is that our example dags also serve as test dags, so if you add another example task (as I did) you might break a test. The other contributing factor was that the test depended on hardcoded counts of the tasks in the dag. In my resolution I thought about simply excluding my task from the dag (i.e. purging it before the test runs). What I ended up doing is a mild refactor of the test such that the "starting states" are controlled in a dictionary and, instead of hardcoding state stats, I added a helper function to read this from the config.
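The refactor described above might look something like this minimal sketch. All names here (`starting_states`, `expected_state_counts`) are hypothetical illustrations, not the actual test code from the PR.

```python
from collections import Counter

# Hypothetical sketch of the refactor described above: the starting states
# live in one dictionary, and the expected per-state counts are derived from
# it instead of being hardcoded in each assertion. Names are illustrative.
starting_states = {
    "runme_0": "success",
    "runme_1": "success",
    "also_run_this": "running",
    "run_this_last": None,
}

def expected_state_counts(states: dict) -> dict:
    """Tally expected state counts from the config instead of hardcoding."""
    return dict(Counter(states.values()))
```

Adding a new task then only requires one new dictionary entry; the count-based assertions update themselves.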
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*. |
Found more tests that need to be updated. I think example dags should not be used in tests: #13453
@dstandish - while I understand your frustrations (been there, done that), I honestly think it is quite the opposite (and no offence, but it is a bit selfish thinking). To be honest, our plans are to turn all example DAGs (in providers) into system tests. It makes perfect sense to use examples as tests. If we did not do it, your change would make the examples invalid, for example. I think having example DAGs wrong (which would have happened if they were not part of the tests) is far worse for the community than you having to fix the examples during the change. You ask to trade off your time against the time of at least hundreds of people (users) scratching their heads while following the examples. Imagine the users asking: why does it not work if I follow the example? Repeat x times. Trade that off against the time of a single person simply iterating on fixing the tests with clear information (failed tests). While I think it's worse for you as a person, it's far better for the community as a whole to fix the examples in this case. That is at least my way of thinking - when I spend 5x more time than I initially expected on fixing something, I also have to fix documentation, tests, and related tests that start to break (and sometimes it turns out that my change needs to be a bit more generalised). No hard feelings - it's just an opinion and a way of thinking :). I might be wrong, of course, and the examples might be useless as examples. But I would rather spend even a bit more time to add them to documentation and make them useful in this case as well. It just follows my motto of 'with every change, leave the (small) world around you a little better than before' :).
hehe no worries :) i think example_bash_operator does a good job of illustrating why example dags should not be used in tests. in general, tests should have only the elements that are absolutely necessary to test the behavior they are examining. let me give an example. consider this class:

```python
class MyObj:
    def __init__(self, param1=None, param2=None):
        self.param1 = param1
        self.param2 = param2

    def get_param1(self):
        return self.param1
```

consider these tests:

```python
def test_get_param1():
    m = MyObj('a')
    assert m.get_param1() == 'a'

def test_get_param1():
    m = MyObj('a', 'b')
    assert m.get_param1() == 'a'
```

The first test is better because the value for `param2` is irrelevant to the behavior under test.

Similarly, when we use example_bash_operator as a test dag for many different tests, inevitably we end up with a lot of extra noise in the tests. An example of this is in one such test. There is a component in there that spells out the expected execution order:

```python
expected_execution_order = [
    ("runme_0", DEFAULT_DATE),
    ("runme_1", DEFAULT_DATE),
    ("runme_2", DEFAULT_DATE),
    ("runme_0", end_date),
    ("runme_1", end_date),
    ("runme_2", end_date),
    ("also_run_this", DEFAULT_DATE),
    ("also_run_this", end_date),
    ("run_after_loop", DEFAULT_DATE),
    ("run_after_loop", end_date),
    ("run_this_last", DEFAULT_DATE),
    ("run_this_last", end_date),
]
```

I'd bet this test could be accomplished with fewer tasks. But it uses all of them because that's what is in example_bash_operator. And probably as people have added to that example, they have added to this and other tests. But by doing so, we are spending effort and not getting any return on it. Extraneous content in a test makes it tougher to understand exactly what it is testing. So adding more tasks to this test makes the test worse, not better. It increases technical debt. And that is not good for the community. If I add another task to this test, merely because I want to demonstrate a new feature in an example dag, then I make this test worse.

Keeping test and example combined is not good for users either. It handcuffs us with regard to writing example dags because we are constrained by the tests that reference them, tests which we don't want to modify without good reason.

So in sum there are two main problems with this approach: (1) mixing test and example dags, and (2) having many tests share the same test dag.
So, I do believe there are costs incurred in (1) mixing test and example dags and (2) having many tests share the same test dag. And my question is, what is the benefit? If there is a return, then great. But I don't see it. What do you think?
ok @potiuk ..... if only for the sake of discussion, i have copied example_bash_operator into a dedicated test dag. i have called it miscellaneous because, although the bash operator is used in it, it's not used in any tests of the bash operator. it's just a generic test dag used in a variety of tests with no unifying theme. i did not update references in other tests which also use example_bash_operator but which are not so picky about its structure that they fail when you add a task. i also left in place the refactor of the test described above.
airflow/operators/bash.py
```
with ``set -e;``
Example:
Airflow will evaluate the exit code of the bash command. In general, a non-zero exit code will result in
task failure and zero will result in task success. Exit code ``255`` will throw an
```
Suggested change:

```diff
- task failure and zero will result in task success. Exit code ``255`` will throw an
+ task failure and zero will result in task success. Exit code ``127`` will throw an
```
etc. Docs still refer to 255 below too.
LGTM other than the docs still saying 255
my bad... fixed. just one flaky build error now, so i'm letting this sleeping dog lie.
Airflow's `skipped` task status can be useful when building dags for some trickier requirements. This PR adds to `BashOperator` the capability to end the task in `skipped` state. If the command exits with code 255, an `AirflowSkipException` is thrown.

If you have a better suggestion re choice of exit code, suggest. I chose 255 because I think it is pretty unlikely to be found in the wild, which means this would only be a "breaking" change in a negligible sense.

If we don't like exit codes for this, we could come up with some convention for parsing stdout. But exit codes seem simple and good enough.
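The convention proposed in this description can be demonstrated without Airflow at all. The sketch below is only an illustration of the idea: it runs a shell command with `subprocess` and maps the reserved code 255 from this description to a "skipped" outcome (`run_bash` is a made-up helper, not part of any library).

```python
import subprocess

SKIP_EXIT_CODE = 255  # the reserved code proposed in this PR description

def run_bash(command: str) -> str:
    """Run a shell command and map its exit code to a task outcome."""
    result = subprocess.run(command, shell=True)
    if result.returncode == SKIP_EXIT_CODE:
        return "skipped"
    if result.returncode == 0:
        return "success"
    return "failed"

# The command itself decides at runtime whether to signal a skip.
outcome = run_bash('echo "nothing to do"; exit 255')
```

Because the signal travels through the exit code, it works even if the bash side knows nothing about Airflow.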