ARM64 support in docker images #15635

Closed
andormarkus opened this issue May 3, 2021 · 35 comments
Labels
area:production-image Production image improvements and fixes kind:feature Feature Requests

Comments

@andormarkus
Contributor

andormarkus commented May 3, 2021

Description

ARM64 support in docker images

Use case / motivation

In recent years the popularity of ARM-based CPUs has increased hugely. ARM-based CPUs offer a better price-to-performance ratio, and outstanding performance in laptops. It is becoming popular to run ARM nodes in Kubernetes clusters.
If you want to run Airflow on an ARM node in Kubernetes, the Docker image won't start; it fails with standard_init_linux.go:219: exec user process caused: exec format error.

ARM-based CPUs are found in the database world (for example AWS Graviton) and in the consumer world (for example Apple M1).
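
The `exec format error` above typically means the image was built for a different CPU architecture than the node it runs on. A minimal sketch of working out which image architecture a host needs, assuming Docker's `amd64`/`arm64` naming; the helper name `docker_arch` is hypothetical and not part of Airflow:

```python
import platform

# Map common `uname -m` style machine names to Docker architecture names.
_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine=None):
    """Return the Docker-style architecture name for a machine string.

    Defaults to the machine the interpreter is running on.
    """
    name = (machine or platform.machine()).lower()
    try:
        return _DOCKER_ARCH[name]
    except KeyError:
        raise ValueError(f"unrecognised machine architecture: {name!r}")
```

Calling `docker_arch()` on a Graviton node or an Apple M1 should give `arm64`, which is the architecture an image must provide to start there without emulation.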

Related Issues

In #14796 @potiuk confirmed there is no official support for ARM64 in docker images.

Update: ARM is supported for development now. Support for released images is coming.

@andormarkus andormarkus added the kind:feature Feature Requests label May 3, 2021
@boring-cyborg

boring-cyborg bot commented May 3, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Member

potiuk commented May 3, 2021

Absolutely

@potiuk
Member

potiuk commented May 3, 2021

Been on my radar for quite some time. We were mostly waiting for Docker for MacOS to reach a stable version. And yeah, I think this is the next 'big' thing for the images once we get full PIP21 compatibility and Python 3.9 support.

@uranusjr
Member

uranusjr commented May 3, 2021

The biggest obstacle is likely not Airflow itself (which would likely run on ARM just fine), but the many dependencies it has. Either upstreams need to publish ARM wheels (although my understanding is the most difficult-to-build ones already do), or Airflow needs to develop infrastructures to build the images.

@potiuk
Member

potiuk commented May 3, 2021

The biggest obstacle is likely not Airflow itself (which would likely run on ARM just fine), but the many dependencies it has. Either upstreams need to publish ARM wheels (although my understanding is the most difficult-to-build ones already do), or Airflow needs to develop infrastructures to build the images.

Correct. We tried it before with limited success. The biggest problems were pyarrow/numpy and related packages, especially the older versions. But with the recent dependency bumps for pyarrow I think this problem might be gone already.

@potiuk
Member

potiuk commented May 3, 2021

Airflow needs to develop infrastructures to build the images.

Correct. We have credits from GCP that we will be able to use to run Runners in the ARM infrastructure. We are now running out of S3 credits (trying to get some more) and I am working with Ash, Bin and Claudio on switching to GCP. I think infra won't be a problem :)

@potiuk
Member

potiuk commented May 3, 2021

Just checked: pyarrow 4.0.0 has aarch64 wheels. We are using 3.0 now, so maybe it's not yet solved.
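
Whether a release ships aarch64 wheels can be read off the wheel filenames, because the last dash-separated field of a wheel name is the platform tag. A rough sketch, assuming you already have the list of files for a release; the filenames below are made up for illustration and are not copied from PyPI:

```python
def wheel_platform_tag(filename):
    """Return the platform tag of a wheel filename.

    Wheel names look like name-version[-build]-python-abi-platform.whl,
    so the platform tag is always the last dash-separated field.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    return filename[: -len(".whl")].split("-")[-1]


def has_linux_aarch64_wheel(filenames):
    """True if any wheel in the list targets Linux on aarch64."""
    for name in filenames:
        if not name.endswith(".whl"):
            continue  # skip sdists such as .tar.gz
        # A platform field may hold several dot-joined tags.
        for tag in wheel_platform_tag(name).split("."):
            if "linux" in tag and tag.endswith("aarch64"):
                return True
    return False


# Hypothetical file listing for one release of an imaginary package.
example_files = [
    "examplepkg-1.0.0-cp38-cp38-manylinux2014_x86_64.whl",
    "examplepkg-1.0.0-cp38-cp38-manylinux2014_aarch64.whl",
    "examplepkg-1.0.0.tar.gz",
]
```

With this listing `has_linux_aarch64_wheel(example_files)` is true; drop the second entry and pip would have to build the package from source on an ARM node.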

@andormarkus
Contributor Author

We could install and run numpy on AWS ARM without issue; however, installing pyarrow 2.0.0 and 3.0.0 was not possible without serious workarounds. Pyarrow 4.0.0 supports aarch64.

@potiuk Can Airflow move to pyarrow 4.0.0? Should I create a separate issue for it?

@potiuk
Member

potiuk commented May 3, 2021

This ain't as easy as that. The pyarrow limitation comes from apache-beam currently. And there will likely be other dependencies with no ARM support - with all the transitive ones we have 500 (!) dependencies for Airflow: https://github.com/apache/airflow/blob/constraints-master/constraints-3.8.txt.

The basic "no-providers" dependencies are far fewer - just 142 of those :) https://github.com/apache/airflow/blob/constraints-master/constraints-no-providers-3.8.txt. So those might be much easier to get working.
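
Counts like the 500 and 142 above can be reproduced by counting the pinned lines in a constraints file. A small sketch, assuming the usual pip constraints layout of one `name==version` pin per line with `#` comments; the sample text is invented and is not an excerpt of the real files:

```python
def count_pins(constraints_text):
    """Count `name==version` pins, ignoring blank lines and comments."""
    count = 0
    for raw in constraints_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "==" in line:
            count += 1
    return count


sample = """\
# hypothetical excerpt, not the real constraints file
examplepkg-a==1.2.3
examplepkg-b==0.9  # pinned

examplepkg-c==4.0.0
"""

print(count_pins(sample))
```

Running the same function over the downloaded text of each constraints file gives a quick way to compare how many dependencies would need ARM support in each case.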

For ARM we might have to add support for excluding some of the providers (like apache-beam) to make it work, but that would require implementing a more complex CI build/test path specifically for the ARM image. We will eventually do that, but I think this will come only after we have full GCP support for our CI builds, release all remaining providers for PyPI 21 and add support for Python 3.9. All of those are a somewhat higher priority.

We might also want to exclude some packages for Python 3.9 (#15515) - for now we have thrift-sasl, which comes from Hive/Kerberos, for example (but they are working on support). So I am likely going to develop some exclusion mechanism for our CI build + tests that we might want to use.
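
To make the idea of an exclusion mechanism concrete, here is a sketch only. The table, the function name and the extras listed are assumptions for illustration (`apache.beam` is picked because of the pyarrow limitation mentioned above); no such mechanism had been designed at this point in the thread:

```python
# Hypothetical table: extras to skip per CPU architecture.
EXCLUDED_EXTRAS = {
    "arm64": {"apache.beam"},
    "amd64": set(),
}


def extras_for(arch, all_extras):
    """Return the extras to install on `arch`, preserving input order."""
    excluded = EXCLUDED_EXTRAS.get(arch, set())
    return [extra for extra in all_extras if extra not in excluded]


requested = ["apache.beam", "postgres", "celery"]
print(extras_for("arm64", requested))
print(extras_for("amd64", requested))
```

Keeping the exclusions in one table would let both the image build and the test selection read from the same source, which is the kind of shared CI path described above.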

However, if someone would like to take the current Dockerfile/constraints/extras and try to make it work for ARM - to see what else we are still missing - happy to hear some experiences from that.

@vikramkoka vikramkoka added the area:production-image Production image improvements and fixes label May 4, 2021
@FyZzyss

FyZzyss commented May 24, 2021

I managed to run airflow 2.0 in docker on the apple m1 with python 3.8.2.
This was helped by upgrading docker to version 3.3.2 with qemu support and adding more resources to docker.

@potiuk
Member

potiuk commented May 24, 2021

Ah cool. Any special considerations?

@FyZzyss

FyZzyss commented May 24, 2021

Ah cool. Any special considerations?

There were no more problems.

@zeevo

zeevo commented Jun 22, 2021

I managed to run airflow 2.0 in docker on the apple m1 with python 3.8.2.
This was helped by upgrading docker to version 3.3.2 with qemu support and adding more resources to docker.

Woah! I'm trying to run Airflow on arm64 docker. How did you do it?

@tikikun

tikikun commented Sep 14, 2021

Hi everyone, is Airflow supporting ARM64 now?

@potiuk
Member

potiuk commented Sep 15, 2021

Not yet (natively). The Docker on MacOS with ARM however should work with emulation.

@zeevo

zeevo commented Dec 23, 2021

Does anyone have any ideas or Dockerfiles to share for this?

@andormarkus
Contributor Author

As far as I know, this project uses GitHub Actions to build images. GitHub Actions only supports amd64 runners and does not offer arm64 runners like other CI providers. You can build arm64 images on an amd64 platform with the help of Docker Buildx; however, it carries a 3-4x time penalty compared to a native arm64 runner.
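
For reference, a cross-platform Buildx build is driven by the `--platform` flag. The sketch below only assembles the command line and does not run it. It assumes a Dockerfile in the current directory, and the tag `example/airflow:arm64-test` is a placeholder rather than a real image:

```python
import shlex


def buildx_command(tag, platforms, context=".", push=False):
    """Assemble a `docker buildx build` command as an argument list."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),
        "--tag", tag,
    ]
    if push:
        # Push the result to the registry named in the tag.
        cmd.append("--push")
    cmd.append(context)
    return cmd


cmd = buildx_command(
    "example/airflow:arm64-test",
    ["linux/amd64", "linux/arm64"],
    push=True,
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine with Buildx and emulation set up; the time penalty mentioned above applies to the non-native platform.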

To make sure Airflow runs on arm64 without hiccups, you need to rerun all of the tests on an arm64 runner as well. I don't know how to do that on GitHub with amd64 runners. Our CI provider has amd64 and arm64 runners, and we run our tests twice to make sure both platforms are fully supported.

Our CI provider has a very generous open source offering, but migrating this project to another CI provider would be a lot of work. Running Airflow on Amazon Graviton (other public providers do not have an ARM offering yet) would not have a significant impact on our monthly bill.

@potiuk
Member

potiuk commented Dec 23, 2021

Thanks @andormarkus - I understand your CI provides ARM. And I am pretty sure it is weeks rather than months before GitHub Actions provides one - ARM is becoming a "highly requested" thing. So nice try ;) but GitHub Actions is not the problem here. This is by far the least of our blockers (and easy to bypass).

As far as I know, this project uses GitHub Actions to build images. GitHub Actions only supports amd64 runners and does not offer arm64 runners like other CI providers. You can build arm64 images on an amd64 platform with the help of Docker Buildx; however, it carries a 3-4x time penalty compared to a native arm64 runner.

Actually, we build our images for CI tests with our self-hosted runners on Amazon. So we could set up the pipelines on GitHub Actions using those. We will just need to build a bit of infrastructure for that.

The actual blocker here is native ARM support in a number of libraries and tools we are using. Airflow has 550+ Python dependencies, some of which have native C/C++ libraries and need precompiled versions. We have complex cross-dependencies, which simply make it hard to get the right set that will work.

Another thing that "blocks us" from having full ARM support is the lack of ARM images for MySQL https://bugs.mysql.com/bug.php?id=103462.

However, while we won't be able to get "full" support for ARM yet for all the dependencies, I am pretty confident we can get "selective" support. Python 3.9 + Postgres + a subset of providers should actually work.

I am going to take a stab at it after Xmas, however. I even have a new MacBook PRO to play a bit with that. This will require a few upgrades to our build infrastructure:

  • adapting our Dockerfiles to support multi-platform builds
  • changing the build infrastructure to use BUILDX and produce multi-platform images (including proper caching)
  • adding the possibility to select test scope depending on the architecture
  • adding the possibility to select and build constraints that depend on the architecture (I am pretty sure we will have different sets of constraints for our dependencies on different architectures)
  • converting our self-hosted AMIs to be ARM-based

All that is needed to make it work. The final "result" that you see as an image seems like a "simple" thing to do. But in order to do that, a set of prerequisites needs to be fulfilled.

@zeevo

zeevo commented Dec 23, 2021

@potiuk

I am going to take a stab at it after Xmas, however. I even have a new MacBook PRO to play a bit with that. This will require a few upgrades to our build infrastructure:

🙏 Thanks and good luck!

@Mr-YYM

Mr-YYM commented Dec 27, 2021

CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O         PIDS
ce252f9769da   airflow_airflow-webserver_1   107.54%   997.1MiB / 5.8GiB   16.79%    1.77MB / 1.21MB   114MB / 24.6kB    41
efeef0b18e3a   airflow_airflow-worker_1      94.19%    716.1MiB / 5.8GiB   12.06%    25.8kB / 44kB     36.2MB / 0B       33
45f05042fd21   airflow_airflow-scheduler_1   100.43%   587.2MiB / 5.8GiB   9.89%     767kB / 889kB     32MB / 4.1kB      26
76495df5987e   airflow_airflow-triggerer_1   86.23%    264.5MiB / 5.8GiB   4.45%     381kB / 459kB     38.5MB / 0B       9
d50abc5e2641   airflow_flower_1              0.04%     255.9MiB / 5.8GiB   4.31%     26.2kB / 65.5kB   57.4MB / 0B       28
f2d578f6c774   airflow_redis_1               0.08%     7.754MiB / 5.8GiB   0.13%     96.1kB / 33.6kB   12.2MB / 0B       5
916b26b80728   airflow_postgres_1            1.64%     53.78MiB / 5.8GiB   0.91%     2.58MB / 2.92MB   34.8MB / 2.25MB   14

When running with emulation, CPU usage is high.
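
To put a number on it, the CPU column of the table can be summed. A small sketch that parses the `docker stats` table format, with the first four rows reproduced from the output above and the trailing columns trimmed. As far as I know, 100% in this column corresponds to one fully used core:

```python
def total_cpu_percent(stats_text):
    """Sum the CPU % column of `docker stats` table output."""
    lines = [ln for ln in stats_text.splitlines() if ln.strip()]
    total = 0.0
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        # Data columns: container id, name, CPU %, ...
        total += float(fields[2].rstrip("%"))
    return total


stats = """\
CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT   MEM %
ce252f9769da   airflow_airflow-webserver_1   107.54%   997.1MiB / 5.8GiB   16.79%
efeef0b18e3a   airflow_airflow-worker_1      94.19%    716.1MiB / 5.8GiB   12.06%
45f05042fd21   airflow_airflow-scheduler_1   100.43%   587.2MiB / 5.8GiB   9.89%
76495df5987e   airflow_airflow-triggerer_1   86.23%    264.5MiB / 5.8GiB   4.45%
"""

print(f"{total_cpu_percent(stats):.2f}% CPU across the four Airflow services")
```

For these four rows the sum is 388.39%, so under emulation the Airflow services alone keep close to four cores busy.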

@tikikun

tikikun commented Jan 4, 2022

Has anyone done this ?

@potiuk
Member

potiuk commented Jan 5, 2022

I am getting closer to attempting it. For now I need to merge a series of PRs for docker image optimisation, and mostly this one finally - #20664 -> this will change the build infrastructure for our image to buildx/buildkit, which opens up the build to be multi-platform. Keep your fingers crossed.

BTW. My new MacPro is barely usable without it, so I have some incentive to make it happen :)

@potiuk
Member

potiuk commented Jan 24, 2022

FYI. One of the last "serious" blockers for ARM image is numpy. The <1.21 limit added by Apache Beam is going to be removed once Apache Beam releases next version, which is planned for first week of February #19059 (comment)

I could - potentially - exclude Beam (we did that in the past), but since we know it will be fixed in weeks, we should wait. I will attempt to prepare the ARM image right after (cc: @dstandish :))

@andormarkus
Contributor Author

@potiuk We have been waiting for this for almost a year, so a few more weeks do not matter.
Will this be part of 2.2.4, or do we need to wait until 2.2.5?

@potiuk
Member

potiuk commented Jan 24, 2022

I think 2.3.0, realistically

@potiuk
Member

potiuk commented Jan 24, 2022

And also copying my comment from airflow-helm/charts#488 (comment)

Just to explain where it comes from.

The problem is mainly because many of the dependencies are scrambling to get proper "architecture" support. This is a good thing - long term - that Apple switched to ARM (more choice is a good thing) and it pretty much forced all the developers to pay attention.

The ARM support has to - unfortunately - bubble up - from OS support for ARM (already there for linux for quite some time thanks to Android), then low-level libraries in C/C++/Rust, then Python low-level libraries (like Numpy) finally applications and tools like Airflow (or for example MySQL or MSSQL that have to provide clients and images that are also ARM-based). All this is a huge effort by all parties involved - not as much to make the change but mainly to make sure that it works and that they have the right continuous integration in-place to support it, so that they can release the software with confidence after passing all the tests.

I think ~ mid 2022 will be the time when all this effort will be nearly complete, and those who won't be there will have to do it as they will lag behind (this is the main reason I deferred buying an M1 MacBook, as I knew it was very far from being ready when the M1 was first released, and I hated the touchbar and the lack of HDMI. MagSafe is great BTW and I am glad it is back).

So fingers crossed that all our dependencies will be ready soon (or at least the crucial ones that will allow Airflow to reliably and reproducibly build an ARM image and add a CI harness for it).

potiuk added a commit to potiuk/airflow that referenced this issue Mar 10, 2022
This support is mostly for the developers, not for CI full chain yet.
It has several limitations:

* no MySQL client support
* no MsSQL client support
* no CI tests yet

What is implemented:

* automated detection of ARM/AMD architecture when building and
  running breeze
* automated cache refresh on CI for ARM/AMD

Currently only development (ghcr.io) images are supported for ARM.

Fixes: apache#18849
Fixes: apache#17494
Relates to: apache#15635

The images published in DockerHub for now are AMD64 only. We will
run development with M1 images for some time and later we will
likely make our DockerHub images multi-platform as well.

Also Hadolint does not have ARM images yet so we had to disable it
and we should re-enable it back after the support is added.
See hadolint/hadolint#411
potiuk added a commit that referenced this issue Mar 10, 2022
@phyyou

phyyou commented Apr 13, 2022

I'm waiting for my raspberry PI...

@potiuk
Member

potiuk commented Apr 13, 2022

OK. Just to update it a bit - the dev image is there. The prod image might be added as experimental in 2.3.0 as well. I think we are more or less ready for that.

@houqp
Member

houqp commented Apr 15, 2022

@potiuk do you know which set of dev images were built with arm support? I tried some of the newer ones from the github registry but they were all amd64 it looks like.

@potiuk
Member

potiuk commented Apr 15, 2022

For example this one:

https://github.com/apache/airflow/pkgs/container/airflow%2Fmain%2Fprod%2Fpython3.7/19115034?tag=latest


Generally all our "dev" images follow the scheme:

  • ghcr.io/apache/airflow/main/prod/pythonX.Y
  • ghcr.io/apache/airflow/main/ci/pythonX.Y

Following the naming convention we have for those: https://github.com/apache/airflow/blob/main/IMAGES.rst#naming-conventions

You can always retag the latest image to a regular airflow one if you want to use some more familiar names:

docker pull ghcr.io/apache/airflow/main/prod/python3.7
docker tag ghcr.io/apache/airflow/main/prod/python3.7 apache/airflow:latest

Note that X.Y >= 3.7 because we dropped Python 3.6 in main and it is not refreshed any more.

We only build and refresh the "latest" image for multi-platform. The images are refreshed automatically after a successful main build, so they usually reflect the latest "main" (unless we are in a period of fixing main failures caused by dependency updates - then they might be a few commits behind).

Those are the images that breeze uses, thus the naming convention is different from that of the "release" images, which are published on DockerHub. We do not publish multi-platform images there yet.

We also have "per-commit" images that are used in CI - but for now those are single-platform (amd64) only. The naming convention there is:

  • ghcr.io/apache/airflow/main/ci/pythonX.Y:<COMMIT_SHA>
  • ghcr.io/apache/airflow/main/prod/pythonX.Y:<COMMIT_SHA>
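
The two naming schemes above can be captured in a small helper. This is a sketch under the conventions stated in this comment; the function name `dev_image` is hypothetical and is not something breeze provides:

```python
def dev_image(kind, python_version, commit_sha=None):
    """Build a ghcr.io image reference following the scheme above.

    kind is "prod" or "ci". A commit_sha selects a per-commit image,
    otherwise the "latest" tag is used.
    """
    if kind not in ("prod", "ci"):
        raise ValueError(f"unknown image kind: {kind!r}")
    major, minor = (int(part) for part in python_version.split("."))
    if (major, minor) < (3, 7):
        raise ValueError("images are only refreshed for Python >= 3.7")
    tag = commit_sha or "latest"
    return f"ghcr.io/apache/airflow/main/{kind}/python{python_version}:{tag}"


print(dev_image("prod", "3.7"))
```

Only the `latest` references are multi-platform at this point; a reference built with a commit SHA points at an amd64-only image.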

I proposed to start releasing the experimental multi-platform images to DockerHub as well (we could do it for one of the betas/rcs of 2.3.0) - the tooling we have is ready for it - but I have not received any answers yet :) - feel free to comment if you think it is a good idea: https://lists.apache.org/thread/pqhks390dkso9x668gbnvjq6k6wv8h9h

@houqp
Member

houqp commented Apr 19, 2022

@potiuk do you plan to wait for a couple more comments before we move on to pushing the official experimental arm images?

@potiuk
Member

potiuk commented Apr 19, 2022

Yes. I am going to follow up. There was the Easter period, and 2.3.0 was a kinda busy period for many people, so I expect some discussion to resume this week.

@potiuk
Member

potiuk commented May 1, 2022

Airflow 2.3.0 now officially supports ARM64 and AMD64:

Screenshot 2022-05-01 at 13 41 28

Closing this one :).

@potiuk potiuk closed this as completed May 1, 2022
@zeevo

zeevo commented May 2, 2022

@potiuk You are awesome! Thanks

@houqp
Member

houqp commented May 2, 2022

Thank you @potiuk !

10 participants