Fix webserver exiting when gunicorn master crashes. Closes #13469 #13470

drago-f5a
Contributor

As described in #13469, when the gunicorn master process exits, the webserver process continues to run indefinitely, periodically logging an error message.

The method _spawn_new_workers() attempts to spawn new workers, and if that fails the webserver is meant to exit. However, this does not currently work because of a logic error in computing the number of workers to spawn, and a typo (passing the current number of workers instead of the number of workers to spawn).

Btw, I believe the same issue is present in the 2.0 branch.
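
The failure mode described above can be sketched roughly as follows. This is a simplified illustration, not the Airflow implementation: the names `shortfall`, `spawn_new_workers`, `check_workers`, `WorkerSpawnTimeout`, and the `request_worker` callback are hypothetical, and the `buggy` flag only models the typo (passing the running count instead of the count to spawn).

```python
from typing import Callable


class WorkerSpawnTimeout(Exception):
    """Hypothetical error: a requested worker never appeared."""


def shortfall(expected: int, running: int, batch_size: int) -> int:
    """Workers to spawn: the missing count, capped at batch_size, never negative."""
    return max(0, min(expected - running, batch_size))


def spawn_new_workers(count: int, request_worker: Callable[[], bool]) -> None:
    """Request `count` workers one by one; raise if a request goes unanswered."""
    for _ in range(count):
        if not request_worker():
            raise WorkerSpawnTimeout("worker did not start; master may be dead")


def check_workers(
    expected: int,
    running: int,
    batch_size: int,
    request_worker: Callable[[], bool],
    buggy: bool = False,
) -> None:
    """One monitoring pass: top up workers if fewer are running than expected."""
    if running >= expected:
        return
    # The buggy variant passes the current worker count instead of the number
    # of workers to spawn. With a dead master that count is 0, so nothing is
    # requested, nothing fails, and the caller never learns it should exit.
    count = running if buggy else shortfall(expected, running, batch_size)
    spawn_new_workers(count, request_worker)
```

With a dead master (a `request_worker` that always returns `False`) and zero running workers, the buggy call returns silently, while the corrected call raises `WorkerSpawnTimeout`, which a caller could turn into a process exit.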

@boring-cyborg

boring-cyborg bot commented Jan 4, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: [email protected]
Slack: https://s.apache.org/airflow-slack

@kaxil kaxil requested a review from mik-laj January 4, 2021 23:31
@kaxil kaxil added this to the Airflow 2.0.1 milestone Jan 4, 2021
@mik-laj
Member

mik-laj commented Jan 6, 2021

@drago-f5a can you create a new PR for Airflow 2.0? We do not accept new changes to Airflow 1.10 unless they have already been merged to Airflow 2.0. It would also be nice to add tests to prevent regressions.

class TestGunicornMonitor(unittest.TestCase):

Thanks for finding this bug. I suspect it might not have been easy.

@drago-f5a
Contributor Author

@mik-laj I've created a PR for 2.0, and updated/added tests. See:
#13518

Note that you cannot cherry-pick the fix into 1.10, due to the package reorganization.

@kaxil kaxil modified the milestones: Airflow 2.0.1, Airflow 1.10.15 Jan 19, 2021
@kaxil kaxil force-pushed the fix/webserver-restart-on-gunicorn-master-crash branch from 2070aaf to a702693 Compare January 22, 2021 22:01
@jhtimmins
Contributor

@drago-f5a @potiuk Is this still relevant or should this PR be closed?

@drago-f5a
Contributor Author

> @drago-f5a @potiuk Is this still relevant or should this PR be closed?

@jhtimmins I believe the plan is to backport #13518 to 1.10.15. However, given the CLI reorganization that occurred between 1.10.x and 2.0.0, the commit from #13518 cannot be cherry-picked into the v1-10-stable branch. This PR is essentially the backport, minus the tests.

If it seems like I am dancing around answering the question, it's only because I am not familiar with the backporting process you follow. Should this PR be used for the purpose of backporting? If yes, should we also backport the tests from #13518? Let me know if I can be of any help to move this forward.

@github-actions

github-actions bot commented Mar 3, 2021

The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the "full tests needed" label (We need to run full set of tests for this PR to merge) Mar 3, 2021
@kaxil kaxil force-pushed the fix/webserver-restart-on-gunicorn-master-crash branch from a702693 to 7179912 Compare March 3, 2021 00:42
@github-actions

github-actions bot commented Mar 3, 2021

The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.

@kaxil kaxil merged commit c990931 into apache:v1-10-stable Mar 9, 2021
linuxfft added a commit to linuxfft/airflow that referenced this pull request Aug 13, 2021
Apache Airflow 1.10.15

- Fix `airflow db upgrade` to upgrade db as intended (apache#13267)
- Moved boto3 limitation to snowflake (apache#13286)
- `KubernetesExecutor` should accept images from `executor_config` (apache#13074)
- Scheduler should acknowledge active runs properly (apache#13803)
- Bugfix: Unable to import Airflow plugins on Python 3.8 (apache#12859)
- Include `airflow/contrib/executors` in the dist package
- Pin Click version for Python 2.7 users
- Ensure all statsd timers use millisecond values. (apache#10633)
- [`kubernetes_generate_dag_yaml`] - Fix dag yaml generate function (apache#13816)
- Fix `airflow tasks clear` cli command with `--yes` (apache#14188)
- Fix permission error on non-POSIX filesystem (apache#13121) (apache#14383)
- Fixed deprecation message for "variables" command (apache#14457)
- BugFix: fix the `delete_dag` function of json_client (apache#14441)
- Fix merging of secrets and configmaps for `KubernetesExecutor` (apache#14090)
- Fix webserver exiting when gunicorn master crashes (apache#13470)
- Bump ini from 1.3.5 to 1.3.8 in `airflow/www_rbac`
- Bump datatables.net from 1.10.21 to 1.10.23 in `airflow/www_rbac`
- Webserver: Sanitize string passed to origin param (apache#14738)
- Make `rbac_app`'s `db.session` use the same timezone with `@provide_session` (apache#14025)

- Adds airflow as viable docker command in official image (apache#12878)
- `StreamLogWriter`: Provide (no-op) close method (apache#10885)
- Add 'airflow variables list' command for 1.10.x transition version (apache#14462)

- Update URL for Airflow docs (apache#13561)
- Clarifies version args for installing 1.10 in Docker (apache#12875)
andrewdanks added a commit to Affirm/airflow that referenced this pull request Mar 18, 2022
Apache Airflow 1.10.15

Labels
area:CLI, full tests needed