
Fix problems with, and test against, Python 3.8 #8674

Closed
2 of 3 tasks
ashb opened this issue May 1, 2020 · 11 comments
Labels: area:Scheduler including HA (high availability) scheduler; pinned (Protect from Stalebot auto closing)

Comments

ashb (Member) commented May 1, 2020

Airflow is currently tested against Python 3.5 only. This may be insufficient, as users may wish to run Airflow on newer versions, which may introduce breaking changes.

This is a bit of an "epic" issue to track the currently known Python 3.8 bugs.

Migrated from AIRFLOW-4762.

ashb added the "pinned" and "area:Scheduler" labels on May 1, 2020
jhtimmins (Contributor) commented

I'm currently working on this one 🐍 🤙

jhtimmins (Contributor) commented May 5, 2020

Additional known issue:
The TestDagFileProcessorAgent test fails because it assumes the parent and child processes share memory, which is no longer true with spawn.

jhtimmins removed their assignment May 6, 2020
jhtimmins (Contributor) commented May 6, 2020

Oops, accidentally unassigned myself.

mik-laj (Member) commented May 6, 2020

@jhtimmins fixed

jhtimmins (Contributor) commented

@mik-laj Thanks very much!

jhtimmins (Contributor) commented

PR #8794 is part of this issue

jhtimmins (Contributor) commented May 13, 2020

Should Airflow support process spawning?

TL;DR - Spawning creates the possibility of hard-to-detect bugs in child processes. It is the default on macOS (as of Python 3.8), the sole option on Windows, and available on Linux, which defaults to fork.

Overview

  • Prior to Python 3.8, multiprocessing.Process used fork by default on Linux and macOS.
  • As of 3.8, spawn is the default method on macOS: “The fork start method should be considered unsafe as it can lead to crashes of the subprocess”.
  • Windows continues to support only spawn.

Differences - Fork vs Spawn

  • Spawn starts a new Python interpreter process for the child. Only a small subset of data from the parent’s scope is passed to the child. Safer than fork, but slower.
  • Fork uses os.fork() to duplicate the parent process. All parent memory objects are available to the child. (See the short sketch after this list.)
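
As a minimal sketch (not Airflow code) of the difference, Python lets you inspect the platform default and select a start method explicitly via a context object:

```python
import multiprocessing


def work(msg):
    # With spawn, the child is a fresh interpreter: this module is re-imported
    # and `msg` arrives via pickling rather than shared memory.
    print(msg)


if __name__ == "__main__":
    # Platform defaults: "fork" on Linux, "spawn" on macOS (3.8+) and Windows.
    print("default start method:", multiprocessing.get_start_method())

    # A per-use context avoids changing the global default for the whole process.
    ctx = multiprocessing.get_context("spawn")  # "fork" is also available on Linux/macOS
    p = ctx.Process(target=work, args=("hello from the child",))
    p.start()
    p.join()
```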

Issue in 3.8:

  • This is actually a macOS issue rather than a Python one.
  • Calling an Objective-C method between fork() and exec() may cause the child process to fail because of a memory lock issue.
  • As of macOS 10.13, Apple has added a check for these cases, automatically crashing threads where this behavior occurs.
  • Python 3.7 disables this check; 3.8 gets around the issue by spawning new processes instead of forking.

Problems with spawn:
Spawn doesn’t have access to the parent’s state. This can cause unexpected behavior changes. For example, I noticed that a test passed on both spawn and fork, but spawning took substantially longer to finish. This was caused by a config value set in the parent process, which the child couldn’t access. When the child looked up the config value, the default value was returned. This caused ~13 DAGs to be tested, instead of the single DAG that the test required.

This class of bug can be mitigated by using environment variables to set config values globally, rather than only in the parent process’s memory.
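
A minimal sketch of that failure mode (not Airflow code; the variable names are made up, and the fork branch assumes Linux or macOS): module-level state mutated in the parent is reset in a spawned child, while environment variables survive either way.

```python
import multiprocessing
import os

CONFIG = {"dags_to_test": "all"}  # stand-in for an in-process config store


def child():
    # Under spawn this module is re-imported, so CONFIG is back to its default;
    # under fork the child inherits the parent's memory and sees the override.
    print("in-process config:", CONFIG["dags_to_test"])
    # Environment variables set before the child starts are inherited either way.
    print("environment variable:", os.environ.get("MY_APP__DAGS_TO_TEST"))


if __name__ == "__main__":
    CONFIG["dags_to_test"] = "example_dag"              # visible only after fork
    os.environ["MY_APP__DAGS_TO_TEST"] = "example_dag"  # visible after fork or spawn

    for method in ("fork", "spawn"):
        print("--", method)
        p = multiprocessing.get_context(method).Process(target=child)
        p.start()
        p.join()
```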

Using environment variables to pass state presents two issues:

  1. Adding to global state can be dangerous, as it is hard to tell what other code is reading/writing that state. The risk of conflict is high.
  2. Airflow has a substantial amount of code already using configs to share state between processes. It isn’t obvious where this is/isn’t an issue, especially when silent failure is a possibility.

Because of these issues, we should thoughtfully consider whether to use the system defaults, or to hardcode fork as the only start method on macOS and Linux (h/t to Daniel for suggesting this discussion).

Considerations:

  • While Python 3.8 can use fork instead of spawn, doing so reintroduces the original possibility of child process failure due to memory lock conflicts.
  • There is some discussion among Python maintainers about whether to standardize on spawn across all systems or keep a per-platform default. They chose not to standardize for 3.8, but it’s worth noting that it’s an option.
  • Using fork on macOS is essentially a continuation of the status quo on all three platforms.
    • macOS will work the same as it did for 3.7. Tests will pass.
    • Given that Windows only ever supported spawning, I assume that neither the scheduler nor the tests have ever worked there. I have not tested this personally.
    • Linux will not change.

Recommendations:
The choice here depends on whether stability or multi-system support is the higher priority for the Airflow project.

If stability > multi-system support - If stability is key, then I recommend we hardcode forking on all systems (see the rough sketch at the end of this comment). This will continue to make Airflow incompatible with Windows, and introduces the possibility of memory bugs on macOS. The benefit is that we don’t need to store state globally on macOS, and we don’t need to hunt down pieces of code that are currently failing silently.

If multi-system support > stability - We should continue to support the default method on every system, and address bugs as they arise. We should come up with a standardized process for storing configuration values in either config files or environment variables. Right now those mechanisms are closely coupled, increasing the likelihood of errors.

I lean towards multi-system support, with the caveat that this will likely cause hard-to-find bugs. Those bugs will probably be localized to macOS, which should lower the risk for Airflow instances running on production Linux machines.
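
For reference, a rough sketch of what "hardcode forking" could look like if it were done once at start-up; this is illustrative only, not an actual Airflow patch, and Windows would still have to stay on spawn since it has no fork:

```python
import multiprocessing
import platform


def force_fork_start_method():
    # Revert to the pre-3.8 behaviour on macOS (and keep Linux as it is).
    if platform.system() in ("Linux", "Darwin"):
        multiprocessing.set_start_method("fork", force=True)


if __name__ == "__main__":
    force_fork_start_method()
    print("start method:", multiprocessing.get_start_method())
```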

potiuk (Member) commented May 14, 2020

Great analysis @jhtimmins thanks!

Indeed - we discussed the fork/spawn case in the past, as well as the macOS vs. Linux change in 3.8. I think we are not at all concerned with Windows - the only recommended way to run Airflow on Windows is WSL2 anyway. Regarding macOS vs. Linux: I think it's great that it works on macOS for development purposes, but for production I believe we only ever support Linux. Also, we are moving more and more towards Dockerised solutions - with the official production image, the coming docker-compose support, and the Helm chart, we are Docker-first for executing Airflow in production.

For development we also have the Breeze environment, which I think might become the default in the long term. While we still support (and will support) running a local virtualenv on macOS, we already have a number of tests that will not work straight from the IDE or when run locally - they need integrations (RabbitMQ, Cassandra, databases, etc.), and for those Breeze's Docker environment is the one that should be used. We have full macOS support for Breeze as well, so making sure tests pass and are fast natively on macOS is not a huge problem. I think our main concern should be that all true "unit" tests work in any environment, while the more complex "integration" tests can only work in Breeze.

I think it really depends on how many tests will not work. I believe this is a very small subset of tests. Those are important ones, but I believe there are not many of them. If that's the case, then I personally would be for hard-coding forking and marking the tests that are bound to fail on macOS with a custom pytest marker (say @pytest.mark.linux). Those tests would then be automatically skipped when run on macOS, and the message we can show is "These tests only work in a Linux environment, please use the Breeze docker environment if you are running them on macOS" or similar. This way we avoid people being surprised by failing tests and give them clear information on how to run them. At the same time, people who develop on Linux will still be able to run those tests "natively" without having to switch to the Breeze environment.
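
A sketch of how such a marker could be wired up in a conftest.py; the marker name and skip message simply follow the suggestion above and are not an existing implementation:

```python
# conftest.py (sketch)
import sys

import pytest


def pytest_configure(config):
    # Register the marker so `--strict-markers` doesn't reject it.
    config.addinivalue_line(
        "markers", "linux: test requires Linux (e.g. fork-based multiprocessing)"
    )


def pytest_collection_modifyitems(config, items):
    if sys.platform == "linux":
        return
    skip_non_linux = pytest.mark.skip(
        reason="These tests only work in a Linux environment; "
        "please use the Breeze docker environment on macOS."
    )
    for item in items:
        if "linux" in item.keywords:
            item.add_marker(skip_non_linux)
```

Individual tests would then just be decorated with @pytest.mark.linux.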

ashb (Member, Author) commented May 14, 2020

For development we also have Breeze environment which I think long term might become the default

Let's separate out developing Airflow itself from developing DAGs etc.

For the former: maybe, but it should never be a requirement on contributors. (And as long as I'm working on Airflow my default's gonna be direct development, not using Docker 😀. I am on Linux now though.)

For the latter: being able to use OSX (and yes, eventually Windows) as a user writing DAGs and running a development Airflow instance without Docker is a critical on-boarding goal for new users because:

  1. Running multiple docker containers at once can start to seriously impact performance on less-powerful laptops.
  2. Docker is still quite confusing if you aren't familiar with it, especially when it comes to rebuilding and volume mounts etc.

We can solve some of that, but I am set on still supporting running directly for users on OSX.

I believe this is a very small subset of tests.

Yes, it is. The only real issue in the spawn-vs-fork case comes when we use the conf_vars decorator in tests -- the temporarily changed config values don't affect any spawned subprocess. Given that the fix for that is quite easy (we can set the AIRFLOW__* environment variables in the conf_vars decorator too), I wouldn't even say we need a special decorator for it.

And the config mocking issue aside, I think being able to test on OSX/with spawn is useful for getting us towards testing that it runs on OSX/Windows.
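
A sketch of the conf_vars idea above - a test helper that exports the matching AIRFLOW__SECTION__KEY environment variables so that spawned subprocesses also see the overrides. This is illustrative only, not the actual helper in the test suite (which would also patch the in-process conf object):

```python
import contextlib
import os


@contextlib.contextmanager
def conf_vars(overrides):
    """overrides maps (section, key) tuples to string values."""
    saved = {}
    try:
        for (section, key), value in overrides.items():
            env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
            saved[env_name] = os.environ.get(env_name)
            os.environ[env_name] = value
        yield
    finally:
        for env_name, old_value in saved.items():
            if old_value is None:
                os.environ.pop(env_name, None)
            else:
                os.environ[env_name] = old_value
```

Used as, e.g., `with conf_vars({("core", "dags_folder"): "/tmp/dags"}): ...`, the override would be visible to a spawned subprocess because it reads the environment rather than the parent's memory.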

BrainLogic commented May 28, 2020

Hi Airflow Team,

I tried to run Airflow 1.10.10 with Python 3.8.3 (the same happens with 3.8.0) on macOS Mojave 10.14.6 as a daemon process: airflow webserver -p 8080 -D

This does not work:

Traceback (most recent call last):
  File "/.../airflow/.venv/bin/airflow", line 37, in <module>
    args.func(args)
  File "/.../airflow/.venv/lib/python3.8/site-packages/airflow/utils/cli.py", line 75, in wrapper
    return f(*args, **kwargs)
  File "/.../airflow/.venv/lib/python3.8/site-packages/airflow/bin/cli.py", line 966, in webserver
    ctx = daemon.DaemonContext(
  File "/.../airflow/.venv/lib/python3.8/site-packages/daemon/daemon.py", line 280, in __init__
    detach_process = is_detach_process_context_required()
  File "/.../airflow/.venv/lib/python3.8/site-packages/daemon/daemon.py", line 801, in is_detach_process_context_required
    if is_process_started_by_init() or is_process_started_by_superserver():
  File "/.../airflow/.venv/lib/python3.8/site-packages/daemon/daemon.py", line 778, in is_process_started_by_superserver
    if is_socket(stdin_fd):
  File "/.../airflow/.venv/lib/python3.8/site-packages/daemon/daemon.py", line 743, in is_socket
    file_socket = socket.fromfd(fd, socket.AF_INET, socket.SOCK_RAW)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 544, in fromfd
    return socket(family, type, proto, nfd)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py", line 231, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
OSError: [Errno 38] Socket operation on non-socket

Probably it is related to https://bugs.python.org/issue39685.
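
For what it's worth, the traceback seems to reduce to python-daemon's is_socket() check; the following snippet mirrors that call and is expected to raise the same OSError on an affected macOS + Python 3.8 setup (untested here):

```python
import socket
import sys

# python-daemon's is_socket() does roughly this for each standard stream; on
# macOS with Python 3.8 it raises "OSError: [Errno 38] Socket operation on
# non-socket" when the descriptor is a terminal or pipe rather than a socket.
socket.fromfd(sys.stdin.fileno(), socket.AF_INET, socket.SOCK_RAW)
```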

potiuk (Member) commented Jul 14, 2020

Fixed in 1.10.11
