Use manylinux2014 to get aarch64/ppc64le support #11170

Closed
hrw opened this issue Dec 4, 2019 · 95 comments · Fixed by #12705 or MacPython/scipy-wheels#93
Labels
Official binaries Items related to the official SciPy binaries, including wheels and vendored libs
Milestone

Comments

@hrw

hrw commented Dec 4, 2019

The manylinux2014 image was released some time ago. One of the things it brings is support for non-x86 architectures.

SciPy is using Travis CI, so the two can be combined to generate wheels for the aarch64 and ppc64le architectures.

This would cut the amount of software that has to be installed to be able to run 'pip install scipy', and install times would be shorter.
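
To make the benefit concrete: pip can skip the local build only when a wheel with a matching platform tag exists. Here is a minimal sketch of reading that tag from a wheel filename, assuming the standard name-version-python-abi-platform.whl layout and a single (non-compound) platform tag. The helper names and the filename are made up for illustration and do not refer to an actual release artifact.

```python
def wheel_platform_tag(filename):
    """Return the platform tag (last dash-separated field) of a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    stem = filename[: -len(".whl")]
    # Fields: name, version, optional build tag, python tag, abi tag, platform tag.
    return stem.split("-")[-1]


def wheel_arch(filename):
    """Return the CPU architecture encoded in a manylinux2014 platform tag."""
    tag = wheel_platform_tag(filename)
    prefix = "manylinux2014_"
    if not tag.startswith(prefix):
        raise ValueError(f"not a manylinux2014 tag: {tag}")
    return tag[len(prefix):]


# Hypothetical filename, used only for illustration.
example = "scipy-1.5.0-cp37-cp37m-manylinux2014_aarch64.whl"
print(wheel_platform_tag(example))
print(wheel_arch(example))
```

A machine whose architecture differs from the one in the tag would fall back to the source distribution, which is where the extra build dependencies and long install times come from.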

@rgommers
Member

rgommers commented Dec 4, 2019

There's limited tooling support and a bunch of TODOs on the rollout plan (pypa/manylinux#338). So I'm wondering what the best plan is for producing wheels right now. It's too late for SciPy 1.4.0, which is ready to go. So we have 6 months till the next release.

Some questions:

  • do we need to produce both manylinux2010 and manylinux2014 wheels for x86, or just one of them?
  • do we want ppc64le wheels before fixing the knownfails and persistent crash we have at the moment?
  • we'll need aarch64 CI, do we use Drone or TravisCI or ...?

@rgommers rgommers added the Official binaries Items related to the official SciPy binaries, including wheels and vendored libs label Dec 4, 2019
@rgommers rgommers added this to the 1.5.0 milestone Dec 4, 2019
@tylerjereddy
Contributor

we'll need aarch64 CI, do we use Drone or TravisCI or ...?

I think Drone has been working well for OpenBLAS ARM testing (@martin-frbg @isuruf ?). I feel like Shippable was causing quite a few issues for NumPy, though maybe that has quieted down?

@isuruf
Contributor

isuruf commented Dec 4, 2019

Drone has been working well for conda-forge. It has a few advantages over Travis CI:

  • Faster to boot
  • 60 minutes per job instead of 50 mins
  • Large limit on log size
  • Travis times out after 10 minutes with no output, but Drone doesn't (AFAIK)
  • arm32 (armhf) support
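
On the no-output timeout: a common workaround is to wrap the long-running step so that something is printed at regular intervals. The following is only a sketch of that idea, not anything taken from the SciPy or conda-forge CI configuration. The function name is hypothetical, and the demo uses a very short interval so that it finishes quickly.

```python
import subprocess
import sys


def run_with_heartbeat(cmd, interval=60.0):
    """Run cmd, printing a marker every `interval` seconds until it exits.

    Returns (exit_code, number_of_heartbeats_printed).
    """
    proc = subprocess.Popen(cmd)
    beats = 0
    while True:
        try:
            # wait() returns the exit code, or raises if the interval elapses.
            code = proc.wait(timeout=interval)
            return code, beats
        except subprocess.TimeoutExpired:
            beats += 1
            print(f"still running after about {beats * interval:.1f}s", flush=True)


# Demo with a short-lived child process and a small interval.
code, beats = run_with_heartbeat(
    [sys.executable, "-c", "import time; time.sleep(0.5)"], interval=0.1
)
print(code, beats)
```

This only addresses the silence limit. It does nothing about the overall 50 or 60 minute cap per job.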

@martin-frbg

Yes, Drone has been working quite nicely for OpenBLAS since isuruf set it up.

@hrw
Author

hrw commented Dec 5, 2019

@rgommers which manylinux to use is a question of which distros you support. Manylinux2014 is CentOS 7+ while manylinux2010 is CentOS 6+ (afaik).
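
In glibc terms, the manylinux PEPs set the baselines at glibc 2.12 for manylinux2010 (CentOS 6) and glibc 2.17 for manylinux2014 (CentOS 7). Also note that manylinux2010 is only defined for x86_64 and i686, so it cannot cover aarch64 or ppc64le at all. Below is a rough sketch of which policies a host could accept, judged by glibc version alone. Real installers apply more checks, and the helper names are made up for illustration.

```python
import platform

# Minimum glibc version per manylinux policy (PEP 571 and PEP 599).
MANYLINUX_GLIBC = {
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def supported_policies(glibc_version):
    """Return the policies whose glibc baseline is <= the given (major, minor)."""
    return sorted(
        name for name, minimum in MANYLINUX_GLIBC.items() if glibc_version >= minimum
    )


def host_glibc_version():
    """Return the host glibc version as (major, minor), or None if unknown."""
    lib, version = platform.libc_ver()
    parts = version.split(".")
    if lib != "glibc" or len(parts) < 2:
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return None


print(supported_policies((2, 12)))  # a CentOS 6 era glibc
print(supported_policies((2, 17)))  # a CentOS 7 era glibc
print(host_glibc_version())
```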

@matthew-brett
Contributor

matthew-brett commented Dec 5, 2019 via email

This was referenced Mar 27, 2020
@odidev
Contributor

odidev commented Apr 2, 2020

Hi @rgommers, as per the discussion in #11724, I have modified the Travis.yml file to run builds for aarch64 (currently only for Python 3.7), but since the tests take longer than 50 minutes to execute, the build is timing out. Please have a look:

https://travis-ci.com/github/odidev/scipy/builds/157507705

I have already requested the Travis community to increase the timeout here: https://travis-ci.community/t/request-regarding-timeout-extension-for-arm64-builds/7938

Also, since you suggested that Drone CI would be the preferable option, I have been working on a drone.yml for aarch64 support (for Python 3.7). The tests are triggered successfully, but here too the test suite takes longer than 60 minutes to complete, so the build is halted: https://cloud.drone.io/odidev/scipy/13/1/2

I will be more than happy to contribute to making aarch64 CI support available for SciPy. Please share your thoughts on this.

@hrw
Author

hrw commented Apr 2, 2020

Can tests be run in parallel? AArch64 machines usually have more cores than x86-64 ones.

@rgommers
Member

rgommers commented Apr 5, 2020

TravisCI gets to 65%, Drone only to 25% of the test suite - not what I had expected.

Tests can be run in parallel. If I look at what's happening now, NPY_NUM_BUILD_JOBS=2 is set so that should speed up the build a little, however the tests aren't yet run in parallel. Installing pytest-xdist and running python runtests.py --parallel=2 should fix that. I believe a TravisCI job gets 2 cores, although it would be worth double checking that for ARM64.
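
As a rough illustration, the job count could be derived from the machine instead of being hard-coded. This sketch assumes it is run from the repository root and that pytest-xdist is installed. The helper name is made up, and os.cpu_count() reports the host's cores, which on shared CI machines may be more than the job is actually allowed to use.

```python
import os
import sys


def runtests_command(max_jobs=None):
    """Build a `runtests.py --parallel=N` command line for this machine."""
    cores = os.cpu_count() or 1
    jobs = cores if max_jobs is None else min(cores, max_jobs)
    return [sys.executable, "runtests.py", f"--parallel={jobs}"]


# Cap at 2 jobs, matching the core count assumed for a TravisCI job above.
print(" ".join(runtests_command(max_jobs=2)))
```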

@isuruf
Contributor

isuruf commented Apr 7, 2020

Drone machines have 96 cores each, but they are shared across projects.

@odidev
Contributor

odidev commented Apr 7, 2020

@rgommers I am running the Travis CI builds and facing the issue below while building. Could you please suggest what I am missing? https://travis-ci.com/github/odidev/scipy

@rgommers
Member

rgommers commented Apr 7, 2020

It's probably caused by removing -u from the runtests command, see https://stackoverflow.com/questions/14258500/python-significance-of-u-option. Best to keep modifications to a minimum I'd think, rather than throwing out so many build/test flags.

@odidev
Contributor

odidev commented Apr 8, 2020

Thanks @rgommers. I have built SciPy, and here is the Travis CI build log for parallel=2:
https://travis-ci.com/github/odidev/scipy/builds/159274043.

Since the above build was halting, I tried the same with parallel=3. Please have a look at the log: https://travis-ci.com/github/odidev/scipy

I have also tried parallel=3 on Drone CI. Please have a look at the log:
https://cloud.drone.io/odidev/scipy/19/1/2

Please share your thoughts on this.

@rgommers
Member

rgommers commented Apr 9, 2020

Okay, so parallel=2 still times out (just, gets to 98%) and parallel=3 passes all tests except for:

$ if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then ./tools/check_pyext_symbol_hiding.sh build; fi

build/testenv/lib/python3.7/site-packages/scipy/cluster/_vq.cpython-37m-aarch64-linux-gnu.so: too many public symbols!
0000000000005be4 T PyInit__vq
0000000000026718 B __pyx_module_is_main_scipy__cluster___vq
00000000000128d4 T _fini
0000000000002aa8 T _init
The command "if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then ./tools/check_pyext_symbol_hiding.sh build; fi" exited with 1.

See gh-8463 for context on this test. The actual build log is hidden so it's hard to be completely sure, but it looks like it's using the conda-forge compilers and linker (gcc-7.4 is also installed with apt-get, but conda compilers will be the first ones found normally). Those haven't been tested before on aarch64, so likely a real issue. Not clear to me where, could be in conda-forge's binutils, in cython or in cluster.vq.
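
To illustrate the kind of check involved, here is a sketch that scans nm-style output for unexpected public symbols. It assumes the rule is "only the PyInit_ entry point plus _init/_fini may be global". The real tools/check_pyext_symbol_hiding.sh may apply different rules, and the names below are made up for this example.

```python
# Symbols assumed acceptable for this sketch; the real script may differ.
ALLOWED_EXACT = {"_init", "_fini"}
ALLOWED_PREFIXES = ("PyInit_",)


def unexpected_public_symbols(nm_output):
    """Return names of global defined symbols that are not on the allow-list."""
    unexpected = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        _address, kind, name = fields
        if not kind.isupper():
            continue  # lowercase type letters mark local symbols
        if name in ALLOWED_EXACT or name.startswith(ALLOWED_PREFIXES):
            continue
        unexpected.append(name)
    return unexpected


# The four lines reported in the log above.
NM_OUTPUT = """\
0000000000005be4 T PyInit__vq
0000000000026718 B __pyx_module_is_main_scipy__cluster___vq
00000000000128d4 T _fini
0000000000002aa8 T _init
"""

print(unexpected_public_symbols(NM_OUTPUT))
```

Under these assumed rules, the only symbol flagged is __pyx_module_is_main_scipy__cluster___vq, the global (B) data symbol that Cython generates.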

@pv or @isuruf any thoughts?

@isuruf
Contributor

isuruf commented Apr 9, 2020

#11833

@odidev
Contributor

odidev commented Apr 15, 2020

@rgommers I have been working on the Travis build for arm64. Here is the outcome:

  • The builds pass on the arm64 platform with --parallel=3. Please have a look: https://travis-ci.com/github/ossdev07/scipy/jobs/318718768. Note: this is when I trigger only the two arm64 builds.
  • When running all the builds, all of them pass except the two arm64 builds, which time out. I have tried increasing to --parallel=4 and --parallel=5 for arm64 but still get the timeout. Please have a look:
    https://travis-ci.com/github/ossdev07/scipy

Please suggest what the next step should be.

@isuruf
Contributor

isuruf commented Apr 15, 2020

@odidev, can you install pytest-xdist on drone as well and see if that helps?

@isuruf
Contributor

isuruf commented Apr 15, 2020

Nvm, it doesn't.

@rgommers
Member

The builds are getting successfully passed for arm64 platform with --parallel=3, Please have a look: https://travis-ci.com/github/ossdev07/scipy/jobs/318718768.

Green in 48 minutes, nice.

When running all the builds, all are getting successfully passed except the two arm64 related builds they are getting timed-out

Do we need two jobs? I'd be happy to have just the first one, and leave out TESTMODE=full. That may speed things up for the other job.

@odidev
Contributor

odidev commented Apr 16, 2020

@rgommers I have raised a PR as per your suggestion. Please have a look: #11867

@rgommers
Member

rgommers commented May 2, 2020

Okay, step 1 is done, we have ARM64 on TravisCI. Step 2 will be to add support for producing manylinux2014 wheels to https://github.com/MacPython/scipy-wheels, and then we need to put them on PyPI for 1.5.0.

@andyfaff
Contributor

andyfaff commented May 2, 2020

Is there a reason we don't make wheel artefacts in the CI processes? It's good to have a defined process for releases, but it might be nice to make bleeding-edge wheel artefacts available.

@charris
Member

charris commented May 2, 2020

ISTR that there were still some bugs in manylinux2014, at least for aarch64. @mattip Thoughts?

@andyfaff
Contributor

andyfaff commented May 2, 2020

This is more of a general question for all OS and arches.

@rgommers
Member

rgommers commented May 2, 2020

Good point @andyfaff, once the migration from Rackspace to Anaconda is done, we should copy what NumPy does to provide latest wheels for CI purposes (see tools/travis-upload-wheel.sh in the numpy repo).

@hrw
Author

hrw commented Aug 4, 2020

According to pyca/cryptography#5292 (comment) you may try new aarch64 nodes on travis ci (if you are on .com not .org). Those are AWS Graviton2 based so should be faster.

@odidev
Contributor

odidev commented Aug 10, 2020

@rgommers, almost all the test cases pass on the AArch64 machine. One test fails, and the same test fails on the x86_64 machine as well.

Below are the steps that resolved the aarch64 issue:

  1. As suggested by @rgommers, ran the tests in fast test_mode instead of full test_mode.
  2. Installed OpenBLAS from https://anaconda.org/multibuild-wheels-staging/openblas-libs/v0.3.9/download/openblas-v0.3.9-manylinux2014_aarch64.tar.gz via the built-in function instead of installing it through apt/yum. This required adding the SHA-256 value of the openblas-v0.3.9-manylinux2014_aarch64.tar.gz file to the sha256_vals list at openblas_support.py#L37.
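
For reference, the digest for step 2 can be computed with the standard library. This is a minimal sketch. The function names are hypothetical and not taken from openblas_support.py, and the actual structure of sha256_vals there may differ.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hashes):
    """Raise ValueError unless the file's digest is in `expected_hashes`."""
    actual = sha256_of_file(path)
    if actual not in expected_hashes:
        raise ValueError(f"unexpected SHA-256 for {path}: {actual}")
    return actual
```

Running sha256_of_file on the downloaded tarball gives the value to record. verify_download shows how a list of known digests can then reject a corrupted or substituted download.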

Below is the failing test case on both platforms.
Failure log snippet:
_/venv/lib/python3.6/site-packages/scipy/optimize/_linprog_simplex.py:166: in pivot_row
return True, min_rows[0]
E IndexError: index 0 is out of bounds for axis 0 with size 0
T = array([[ 1., 0., 0., 1., 0., 1., 0., nan],
[ 0., 0., 1., 0., 1., 0., 1., nan],
[ 1., 2., 3., 0., 0., 0., 0., inf],
[-1., -0., -1., -1., -1., 0., 0., nan]])

Failure on x86_64:
https://travis-ci.com/github/odidev/scipy-wheels/builds/179065731

Failure on AArch64:
https://travis-ci.com/github/odidev/scipy-wheels/jobs/369550585

Can you please let me know whether we need to solve this issue, or whether it is known and accepted? I am unable to verify this, as I can't find a link to any build log in the scipy-wheels repository.

Thanks

@rgommers
Member

Interesting, that test failure doesn't seem like a blocker for ARM64 wheels then, but it would be good to figure it out (possibly in a separate issue). Did you change the OpenBLAS version for x86_64? If not, then I'm not sure why we don't see it on master.

It seems like you're at the point where we can merge your work @odidev, very nice. Can you open a PR with your changes? Please leave that one failure as is, and open a separate issue in this repo for it. Then we'll figure it out there; we may add an xfail to it temporarily to make the wheel build pass.

@odidev
Contributor

odidev commented Aug 11, 2020

@rgommers, no, I did not change the OpenBLAS version. I just used the AArch64-specific build of OpenBLAS in SciPy. To do this, I forked scipy, added the AArch64-specific OpenBLAS to it, and integrated the fork as a git submodule in the scipy-wheels repository.

First we need to open a PR in scipy and then one in scipy-wheels. I am working on it. Thanks.

@odidev
Contributor

odidev commented Aug 12, 2020

@tylerjereddy, @rgommers It looks like this issue got auto-closed when PR #12705 was merged. Can you please re-open it?

@tylerjereddy
Contributor

The ARM64 wheels PR is getting a bit closer I think: MacPython/scipy-wheels#93

I've added some minor comments there just now. A few other thoughts:

  • do we plan to try to backport to 1.5.x to release ARM64 wheels before 1.6.0?
  • travis-ci.com instead of the older travis-ci.org apparently allows for better/faster ARM nodes, but can we reasonably expect all of MacPython to migrate over soon-ish?

@rgommers
Member

do we plan to try to backport to 1.5.x to release ARM64 wheels before 1.6.0?

This would be helpful for quite a few users, but I also have a memory of breaking people's deployment pipelines when we added wheels for old releases once for NumPy. I can't remember why though. I'd be inclined to not do this, IIRC we once decided it was a bad idea to go amend old releases.

@matthew-brett
Contributor

As I remember it, when I uploaded Intel-architecture NumPy wheels for previous releases, it turned out that someone's pipeline depended on NumPy installing from source for an older release. I suspect this is very unusual.

@tylerjereddy
Contributor

To clarify, if we release a new 1.5.3 point release, it could still break the workflow for people because it has one more binary/wheel available vs. 1.5.2?

I'm not suggesting we add an extra binary/wheel to something that has already been released, but just have another point release that adds ARM64 wheels alongside new wheels of the usual type.
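
The behaviour change in question can be sketched as follows. This is a deliberately simplified model of how an installer picks a file within one release: it only compares the platform tag, whereas real installers also check the Python and ABI tags. The package name and file lists are hypothetical.

```python
def pick_artifact(filenames, platform_tag):
    """Pick the file an installer would likely use for one release.

    Simplification: only the platform tag is compared.
    """
    for name in filenames:
        if name.endswith(f"-{platform_tag}.whl"):
            return name  # a matching wheel wins over the source archive
    for name in filenames:
        if name.endswith((".tar.gz", ".zip")):
            return name  # fall back to building from source
    return None


# Hypothetical file lists for two consecutive point releases.
without_arm = [
    "pkg-1.0.2.tar.gz",
    "pkg-1.0.2-cp37-cp37m-manylinux1_x86_64.whl",
]
with_arm = [
    "pkg-1.0.3.tar.gz",
    "pkg-1.0.3-cp37-cp37m-manylinux1_x86_64.whl",
    "pkg-1.0.3-cp37-cp37m-manylinux2014_aarch64.whl",
]

print(pick_artifact(without_arm, "manylinux2014_aarch64"))
print(pick_artifact(with_arm, "manylinux2014_aarch64"))
```

On an aarch64 machine, the first release resolves to the source archive and the second to the new wheel. A pipeline that relied on the source build would therefore see different behaviour after upgrading, even though nothing was added to an already published release.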

@tylerjereddy
Contributor

See also some discussion here: MacPython/scipy-wheels#93 (comment)

We should also keep in mind that while I should probably do a 1.5.3 release anyway because of the Xcode 12 compatibility break, backporting ARM64 shims/fixes on both the main repo and wheels repo is sure to run into some hiccups.

@tylerjereddy
Contributor

One might even wonder if adding support for a new architecture is considered a "feature" that falls outside the scope of a point release semantic versioning-like approach, though that's perhaps a bit formal. The main concern I have is just quite experienced folks above indicating that problems have happened in the past.

@rgommers
Member

I think adding the wheel for 1.5.3 makes sense. If it breaks, we fix it or we pull it quickly.

@AGSaidi

AGSaidi commented Oct 1, 2020

Personally, I'd love to see a 1.5.3 arm64 wheel too. @tylerjereddy if the time to backport is the concern, I think you'll have some volunteers to help.

@tylerjereddy
Contributor

Okie, we're working on it

@mattkanwisher

Seems like aarch64 is only for Macs? There is a lot of machine learning happening on the edge with SciPy, it currently takes over an hour to install on embedded arm devices. Any chance we can get wheels for linux aarch64 also?

@mattip
Contributor

mattip commented Oct 2, 2020

Seems like aarch64 is only for Macs?

The wheels are called manylinux2014 because they are for linux. MacOS11 hybrid (or whatever they call it) is a different kettle of fish to fry.

@rgommers
Member

rgommers commented Nov 8, 2020

@tylerjereddy manylinux2014 wheels for aarch64 are up, and ppc64le is probably not worth doing at this point given new TravisCI plan limitations. So this can be closed?

@tylerjereddy
Contributor

Alright, I'll close it then. The time/energy needed to deal with that Travis CI shift and also expand (or even just maintain) our wheels offerings is maybe something worthy of a mention in a maintenance grant.

@rgommers
Member

rgommers commented Nov 9, 2020

The time/energy needed to deal with that Travis CI shift and also expand (or even just maintain) our wheels offerings is maybe something worthy of a mention in a maintenance grant.

Yes I agree. It's one of the most essential as well as most painful things that needs doing in SciPy.
