Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-6569] Flush pending Sentry exceptions before exiting forked process #7232

Merged

Conversation

mikeclarke
Copy link
Contributor

@mikeclarke mikeclarke commented Jan 21, 2020

After switching to os.fork() for the task runner, there is the possibility that exceptions queued by Sentry will not be emitted prior to the process exiting.

This fixes AIRFLOW-6569 by explicitly flushing pending exceptions prior to calling os._exit() within the forked task runner.

This is covered by existing unit tests within tests/task/task_runner/test_standard_task_runner.py.


Issue link: AIRFLOW-6569

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Jan 21, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Jan 21, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community!
If you have any issues or are unsure about any anything please check our
Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)

Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits
    will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory).
    Adding a new operator? Check this short [guide](https://github
    .com/apache/airflow/blob/master/docs/howto/custom-operator.rst) Consider adding an example DAG that shows
    how users should use it.
  • Consider using Breeze environment for testing
    locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from
    Committers.

Apache Airflow is a community-driven project and together we are making it better 🚀.

In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://apache-airflow-slack.herokuapp.com/

@robinedwards
Copy link
Contributor

Hahah I was just working on an almost identical solution. Nice one :-)

@codecov-io
Copy link

codecov-io commented Feb 4, 2020

Codecov Report

Merging #7232 into master will increase coverage by 54.78%.
The diff coverage is 22.22%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #7232       +/-   ##
===========================================
+ Coverage   33.23%   88.01%   +54.78%     
===========================================
  Files         935      935               
  Lines       45170    45177        +7     
===========================================
+ Hits        15010    39764    +24754     
+ Misses      30160     5413    -24747     
Impacted Files Coverage Δ
airflow/task/task_runner/standard_task_runner.py 64.06% <0.00%> (+41.11%) ⬆️
airflow/sentry.py 87.01% <50.00%> (+32.21%) ⬆️
airflow/cli/cli_parser.py 97.24% <0.00%> (+0.91%) ⬆️
.../providers/google/cloud/operators/presto_to_gcs.py 38.66% <0.00%> (+2.66%) ⬆️
airflow/executors/celery_executor.py 88.51% <0.00%> (+3.37%) ⬆️
airflow/utils/decorators.py 90.47% <0.00%> (+4.76%) ⬆️
airflow/models/base.py 100.00% <0.00%> (+6.25%) ⬆️
airflow/exceptions.py 100.00% <0.00%> (+6.66%) ⬆️
airflow/www/forms.py 100.00% <0.00%> (+6.77%) ⬆️
airflow/utils/timeout.py 76.00% <0.00%> (+8.00%) ⬆️
... and 807 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0dafdd0...212485b. Read the comment docs.

@ashb
Copy link
Member

ashb commented Feb 5, 2020

Does this need a specific version of the sentry library to work?

airflow/sentry.py Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Mar 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added stale Stale PRs per the .github/workflows/stale.yml policy file and removed stale Stale PRs per the .github/workflows/stale.yml policy file labels Mar 21, 2020
@robinedwards
Copy link
Contributor

Any chance you could rebase this again? Cheers

@mikeclarke mikeclarke force-pushed the fix/flush-sentry-errors-AIRFLOW-6569 branch from 3408e42 to 212485b Compare April 6, 2020 16:53
@mikeclarke
Copy link
Contributor Author

Rebased again, thanks for the heads up.

@mikeclarke mikeclarke force-pushed the fix/flush-sentry-errors-AIRFLOW-6569 branch 3 times, most recently from 261facf to 7b46d39 Compare April 10, 2020 21:52
@maxcountryman
Copy link
Contributor

I believe the bug this patch addresses is actively preventing us from receiving alerts in production.

@will-misslin
Copy link

I believe the bug this patch addresses is actively preventing us from receiving alerts in production.

I second this affecting our alerting.

@kellyldougan
Copy link

When can we expect this bug fix to get merged?

@mik-laj
Copy link
Member

mik-laj commented May 19, 2020

@tiopi @AbhiPrasad Can I ask for review? Can you share your experience with applications that use fork and Sentry?

@AbhiPrasad
Copy link
Contributor

Yeah these changes LGTM, and are in line with how we have to use the python sdk until we add extended multiprocessing support. We should merge after rebase if tests pass.

@mikeclarke mikeclarke force-pushed the fix/flush-sentry-errors-AIRFLOW-6569 branch from 7b46d39 to 1d19d01 Compare May 20, 2020 18:04
@mikeclarke
Copy link
Contributor Author

Rebased master just now - last time the tests were failing for unrelated reasons, but we'll see.

@maxcountryman
Copy link
Contributor

It seems like the tests that are failing are unrelated to this patch and seem to be the same tests that are also failing on master.

What's blocking merging this in at this point?

@mik-laj
Copy link
Member

mik-laj commented May 22, 2020

Can you do rebase to the latest master? I would like to accept these changes, but I need green tests (excluding quarantine)

Fixes AIRFLOW-6569 by explicitly flushing pending exceptions prior
to calling `os._exit` within the forked task runner.
@mikeclarke mikeclarke force-pushed the fix/flush-sentry-errors-AIRFLOW-6569 branch from 1d19d01 to b113457 Compare May 22, 2020 21:28
@mikeclarke
Copy link
Contributor Author

Can you do rebase to the latest master? I would like to accept these changes, but I need green tests (excluding quarantine)

Rebased again, thanks.

@mik-laj mik-laj merged commit db70da2 into apache:master May 23, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented May 23, 2020

Awesome work, congrats on your first merged pull request!

@nacosta
Copy link

nacosta commented May 26, 2020

@mikeclarke thanks a lot for the fix.

@mik-laj , do you know if this fix will go out in the next release 1.10.11?

Many thanks

@mik-laj
Copy link
Member

mik-laj commented May 26, 2020

@kaxil do you have any information about it?

@kaxil kaxil added this to the Airflow 1.10.11 milestone May 27, 2020
@kaxil
Copy link
Member

kaxil commented May 27, 2020

Yes can be included in 1.10.11

@ad-m
Copy link
Contributor

ad-m commented May 27, 2020

@kaxil , thank you for information and add to milestone "Airflow 1.10.11".

kaxil pushed a commit that referenced this pull request Jun 22, 2020
Fixes AIRFLOW-6569 by explicitly flushing pending exceptions prior
to calling `os._exit` within the forked task runner.

(cherry picked from commit db70da2)
potiuk pushed a commit that referenced this pull request Jun 29, 2020
Fixes AIRFLOW-6569 by explicitly flushing pending exceptions prior
to calling `os._exit` within the forked task runner.

(cherry picked from commit db70da2)
@kaxil kaxil added the type:improvement Changelog: Improvements label Jul 1, 2020
kaxil pushed a commit that referenced this pull request Jul 1, 2020
Fixes AIRFLOW-6569 by explicitly flushing pending exceptions prior
to calling `os._exit` within the forked task runner.

(cherry picked from commit db70da2)
ncolomer added a commit to ncolomer/airflow that referenced this pull request Oct 20, 2020
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
Fixes AIRFLOW-6569 by explicitly flushing pending exceptions prior
to calling `os._exit` within the forked task runner.

(cherry picked from commit db70da2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.