
high memory leak, cannot start even webserver #29841

Closed
antonio-antuan opened this issue Mar 1, 2023 · 5 comments · Fixed by #29916
Labels: area:core, kind:bug

Comments

@antonio-antuan

Apache Airflow version

2.5.1

What happened

I'd been using Airflow 2.3.1 and everything was fine.
Then I decided to move to Airflow 2.5.1.
Now I can't even start the webserver: Airflow consumes my laptop's entire memory (32 GB) and the OOM killer steps in.

I investigated a bit. It starts with Airflow 2.3.4, only with the official Docker image (apache/airflow:2.3.4), and only on a Linux laptop; macOS is fine.

The memory leak starts when the source code imports, for example, the airflow.cli.commands.webserver_command module via airflow.utils.module_loading.import_string.
I dug deeper and found that it happens when "import daemon" is executed.
You can reproduce it with this command: docker run --rm --entrypoint="" apache/airflow:2.3.4 /bin/bash -c "python -c 'import daemon'". Once again, it reproduces only on Linux (my kernel is 6.1.12).
That's weird, considering daemon hasn't changed since 2018.
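For context, an import_string-style dynamic import boils down to importlib; a generic sketch of the idea (not Airflow's exact implementation):

```python
# Generic sketch of an import_string-style helper: resolve a dotted path
# to a module, or to an attribute of a module (not Airflow's exact code).
import importlib

def import_string(dotted_path):
    try:
        # First try the whole path as a module (e.g. "json").
        return importlib.import_module(dotted_path)
    except ImportError:
        # Otherwise treat the last segment as an attribute of a module.
        module_path, _, attr = dotted_path.rpartition(".")
        return getattr(importlib.import_module(module_path), attr)

dumps = import_string("json.dumps")
print(dumps({"a": 1}))  # {"a": 1}
```

The point is that importing airflow.cli.commands.webserver_command this way transitively executes "import daemon", and daemon's module-level code is where the memory blows up.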

What you think should happen instead

No response

How to reproduce

docker run --rm --entrypoint="" apache/airflow:2.3.4 /bin/bash -c "python -c 'import daemon'"

Operating System

Arch Linux (kernel 6.1.12)

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@antonio-antuan added the area:core, kind:bug, and needs-triage labels Mar 1, 2023
@antonio-antuan
Author

antonio-antuan commented Mar 1, 2023

ok, clarified

this is the place where it happens: site-packages/daemon/daemon.py:868

def get_maximum_file_descriptors():
    """ Get the maximum number of open file descriptors for this process.

        :return: The number (integer) to use as the maximum number of open
            files for this process.

        The maximum is the process hard resource limit of maximum number of
        open file descriptors. If the limit is “infinity”, a default value
        of ``MAXFD`` is returned.
        """
    (__, hard_limit) = resource.getrlimit(resource.RLIMIT_NOFILE)

    result = hard_limit
    if hard_limit == resource.RLIM_INFINITY:
        result = MAXFD

    return result


_total_file_descriptor_range = (0, get_maximum_file_descriptors())
_total_file_descriptor_set = set(range(*_total_file_descriptor_range))

apache/airflow:2.3.1 (and, I think, the other versions before 2.3.4) doesn't ship this version of the library, so nothing happens during initialization.
I'd like to suggest better ulimits for the image
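To see why a huge nofile hard limit is fatal here: _total_file_descriptor_set builds a Python set with one entry per possible file descriptor, so a hard limit in the billions means billions of int objects at import time. A rough back-of-the-envelope sketch (the 28-byte int size is an assumption about 64-bit CPython; the resource module is Unix-only):

```python
# Rough estimate of the memory set(range(0, hard_limit)) would need,
# without actually allocating it (assumes 64-bit CPython).
import resource
import sys

def estimated_set_bytes(n, sample=100_000):
    # Measure the per-slot overhead of a small set of ints and extrapolate.
    per_slot = sys.getsizeof(set(range(sample))) / sample
    int_obj = 28  # assumed size of a small CPython int object (64-bit)
    return int(n * (per_slot + int_obj))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if hard == resource.RLIM_INFINITY:
    print("hard limit is RLIM_INFINITY; daemon would fall back to MAXFD")
else:
    print(f"hard limit {hard}: ~{estimated_set_bytes(hard) / 2**30:.2f} GiB for the fd set")
```

With a hard limit around 2**30, as newer containerd defaults can produce, this lands in the tens of gigabytes, which matches the 32 GB exhaustion described above.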

upd: forgot the most important thing, please see the screenshot
[screenshot attached]

@antonio-antuan
Author

created an issue for python-daemon: https://pagure.io/python-daemon/issue/72

@potiuk
Member

potiuk commented Mar 1, 2023

Thanks for looking into it and finding that the root cause is python-daemon. So far we had missed the root cause, but we knew the new containerd had the problem due to its changed settings.

This is a known issue with containerd changing its default settings; we have been discussing it recently in #29731, and that discussion also contains some workarounds you can use.
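One workaround of that kind (sketched here with illustrative values, not a vetted recommendation) is to cap the container's nofile limit explicitly, e.g. in docker-compose:

```yaml
# Illustrative workaround: cap the container's open-file limits so
# python-daemon's fd bookkeeping stays small (values are examples).
services:
  airflow-webserver:
    image: apache/airflow:2.5.1
    ulimits:
      nofile:
        soft: 1024
        hard: 1048576
```

The equivalent for plain docker run is the --ulimit nofile=1024:1048576 flag.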

The fact that we know it's python-daemon opens up the possibility that we can patch it somehow while waiting for either containerd to revert the change or python-daemon to fix its behaviour.

@antonio-antuan
Author

Glad to help :)

@potiuk potiuk added this to the Airflow 2.5.2 milestone Mar 1, 2023
@potiuk potiuk self-assigned this Mar 1, 2023
@josh-fell josh-fell removed the needs-triage label Mar 2, 2023
potiuk added a commit to potiuk/airflow that referenced this issue Mar 3, 2023
This PR vendors in the 2.3.2 version of the `python-daemon` package in
order to fix apache#29841, which results from containerd configuring an
infinite nofile limit by default.
potiuk added a commit to potiuk/airflow that referenced this issue Mar 3, 2023
This PR synchronizes to the main version of python-daemon
(e38f05a5780626637e7ff10a08f9ed354acbe399) as of 03.03.2023.

This version has been tested and solves the problem where
starting python-daemon inside newer containerd with unlimited
nofile limits uses up the whole available memory.

It is added on top of apache#29845, which vendors in the 2.3.2 version of the package.

This one and apache#29845 should be removed and replaced with a >= requirement
on python-daemon once it is released.

Fixes: apache#29841
@potiuk
Member

potiuk commented Mar 4, 2023

python-daemon==3.0.0 has been released with the fix - you can upgrade and it should solve the problem.
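Concretely, the upgrade is just a minimum-version pin, e.g. in a requirements or constraints file:

```
# make sure pip picks up the release containing the fix
python-daemon>=3.0.0
```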

potiuk added a commit to potiuk/airflow that referenced this issue Mar 4, 2023
A recent change in the new containerd causes memory exhaustion, as a
huge amount of memory is used by python-daemon when starting; thus
running Airflow in Docker on multiple OSes using the new containerd
was impossible without workarounds.

The python-daemon fix has been released in version 3.0.0 in response to
https://pagure.io/python-daemon/issue/72, and we should add a min
version for the package to make sure the new version is used.

Fixes: apache#29841
potiuk added a commit that referenced this issue Mar 4, 2023
…9916)

Fixes: #29841
pierrejeambrun pushed a commit that referenced this issue Mar 7, 2023
…9916)

(cherry picked from commit c8cc49a)
pierrejeambrun pushed a commit that referenced this issue Mar 8, 2023
…9916)

(cherry picked from commit c8cc49a)