
TEST: 1.21.x + blas variants #237

Closed
wants to merge 4 commits into from


h-vetinari
Member

Continuing the analysis from #227 & #196. Should not be merged for the same reasons as #227.

@conda-forge-linter

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Member Author

Update for 1.21.0

From 1 failure out of 64 for 1.20.3, there are now 4 (mostly flaky) failures.

Note: Travis seems to be hanging across the board, but should pass once restarted. Same expectation for aarch, which is still stuck in a long queue. Will update this comment if necessary.

The bad news:

  • win+blis remains flaky
  • broken pipes reappeared on win

Details

lib before after updated
numpy 1.20.3 1.21.0 X
libblas 3.9.0-9 3.9.0-9
blis 0.8.1-0 0.8.1-0
openblas 0.3.15-pthreads-1 0.3.15-pthreads-1
mkl 2021.2-389 2021.2-389
netlib 3.9.0-5 3.9.0-5
pypy 7.3.4-4 7.3.4-4

variant before after
win + blis 12 failures for py37-only 12 failures for py39-only
win Passed Reappearing failures due to "The process tried to write to a nonexistent pipe."

variant blis mkl netlib openblas sum*
linux / x86 ✔️ ✔️ ✔️ ✔️ -
linux / aarch ✔️ ✔️ -
linux / ppc64le ✔️ ✔️ -
osx / arm ✔️ ✔️ -
osx / x86 ✔️ ✔️ ✔️ ✔️ -
win / x86 ✔️ / ❌ ✔️ / ❌ ✔️ / ❌ ✔️ / ❌ 4F
sum* 1F 1F 1F 1F 4F

* sum of Failures (out of a total of 64 CI combinations being tested)

Build logs:
Azure
Drone
Travis

win + blis + cpython 3.9: 12 failures
=========================== short test summary info ===========================
FAILED core/tests/test_multiarray.py::TestMatmul::test_dot_equivalent[args4]
FAILED core/tests/test_multiarray.py::TestMatmul::test_matmul_object - Assert...
FAILED linalg/tests/test_linalg.py::TestSolve::test_sq_cases - AssertionError...
FAILED linalg/tests/test_linalg.py::TestSolve::test_generalized_sq_cases - As...
FAILED linalg/tests/test_linalg.py::TestInv::test_sq_cases - AssertionError: ...
FAILED linalg/tests/test_linalg.py::TestInv::test_generalized_sq_cases - Asse...
FAILED linalg/tests/test_linalg.py::TestPinv::test_generalized_sq_cases - Ass...
FAILED linalg/tests/test_linalg.py::TestPinv::test_generalized_nonsq_cases - ...
FAILED linalg/tests/test_linalg.py::TestDet::test_sq_cases - AssertionError: ...
FAILED linalg/tests/test_linalg.py::TestDet::test_generalized_sq_cases - Asse...
FAILED linalg/tests/test_linalg.py::TestMatrixPower::test_power_is_minus_one[dt13]
FAILED linalg/tests/test_linalg.py::TestCholesky::test_basic_property - Asser...
= 12 failed, 16017 passed, 354 skipped, 20 xfailed, 1 xpassed, 229 warnings in 622.85s (0:10:22) =

@h-vetinari
Member Author

Finally opened an issue for blis: flame/blis#514

@h-vetinari
Member Author

So I had to restart the CI because Travis died (and I don't have restart rights). This led to the reappearance of numpy/numpy#19192, though exclusively for PyPy.

@mattip @r-devulap, is it possible that something is messing with the glibc-detection used in numpy/numpy#19209 on PyPy? Also, I really don't understand why this passed an hour before (with the exact same commit).

@mattip

mattip commented Jun 30, 2021

I really don't understand why this passed an hour before

It may be running on different machines, some with AVX512 and some without.

@mattip

mattip commented Jun 30, 2021

It seems NumPy master has extended numpy.show_config to show which CPU features are detected. It would be nice if we could see that here; it would allow reasoning about runs on different CI machines.

@h-vetinari
Member Author


After following @mattip's tip for investigating the SIMD capabilities of the agents again, we're basically reconfirming numpy/numpy#19192 (failing runs have AVX512F? AVX512CD? AVX512_SKX?, passing runs have AVX512F* AVX512CD* AVX512_SKX*), but that just shows that the glibc-version-skip introduced in numpy/numpy#19209 does not properly work on PyPy for some reason.
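The ? and * markers above refer to the CPU-feature line NumPy prints when running its test suite. A minimal sketch of how such a line can be reproduced, assuming the private attributes __cpu_baseline__, __cpu_dispatch__ and __cpu_features__ of NumPy's _multiarray_umath extension module (not a public API; they may move between releases): baseline features are listed as-is, and each runtime-dispatched feature gets * if this CPU supports it and ? if it does not. The helper names are hypothetical.

```python
import importlib


def format_cpu_features(baseline, dispatch, supported):
    """Build a NumPy-style CPU feature line.

    baseline:  features compiled in unconditionally (printed as-is)
    dispatch:  features with runtime-dispatched code paths
    supported: mapping feature name -> bool (does this CPU have it?)
    """
    parts = list(baseline)
    for name in dispatch:
        # '*' = available on this machine, '?' = not available
        parts.append(name + ("*" if supported.get(name, False) else "?"))
    return " ".join(parts)


def installed_numpy_cpu_features():
    """Return the feature line for the installed NumPy, or None.

    Relies on private NumPy attributes (an assumption; they may move).
    """
    for module_name in ("numpy._core._multiarray_umath",  # NumPy >= 2.0
                        "numpy.core._multiarray_umath"):  # NumPy 1.x
        try:
            mod = importlib.import_module(module_name)
            return format_cpu_features(mod.__cpu_baseline__,
                                       mod.__cpu_dispatch__,
                                       mod.__cpu_features__)
        except (ImportError, AttributeError):
            continue  # NumPy missing, or attributes not found here
    return None
```

Printing installed_numpy_cpu_features() at the start of a CI job is a cheap way to compare agents, which is the kind of comparison that pointed at AVX512 here.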

@h-vetinari
Member Author


However, the SIMD check did help to verify that the blis failures are not actually flaky, but happen in the presence of AVX512.

@h-vetinari
Member Author

[...] that just shows that the glibc-version-skip introduced in numpy/numpy#19209 does not properly work on PyPy for some reason.

@mattip, is ver = os.confstr('CS_GNU_LIBC_VERSION').rsplit(' ')[1] supposed to work on PyPy? Or perhaps, what would be the correct way to pick up the system glibc version?

@r-devulap

Hi @h-vetinari, it looks like ver = os.confstr('CS_GNU_LIBC_VERSION').rsplit(' ')[1] doesn't work on pypy3. It throws a ValueError: unrecognized configuration name. This explains why you are seeing the error again. I am not sure how to make it work on pypy3; still looking into it.

@h-vetinari
Member Author

This explains why you are seeing the error again. I am not sure how to make it work on pypy3, still looking into it..

Thanks a lot for investigating!

@mattip

mattip commented Jul 1, 2021

PyPy does not implement that value. I opened an issue in PyPy and a corresponding one in NumPy numpy/numpy#19385. Note that packaging.tags uses a ctypes workaround, but that seems like overkill for a problem that PyPy should solve.
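Since os.confstr('CS_GNU_LIBC_VERSION') can raise ValueError on PyPy, a more defensive lookup tries os.confstr first and falls back to calling glibc's gnu_get_libc_version() through ctypes, similar in spirit to the ctypes workaround in packaging.tags mentioned above. This is a sketch, not NumPy's or packaging's actual code; all function names are hypothetical.

```python
import ctypes
import os
import re


def glibc_version_confstr():
    """glibc version via os.confstr, or None if unsupported here."""
    try:
        value = os.confstr("CS_GNU_LIBC_VERSION")  # e.g. "glibc 2.17"
        name, version = value.split()
    except (AttributeError, OSError, ValueError):
        # AttributeError: no os.confstr, or it returned None
        # ValueError: unknown config name (e.g. the PyPy case) or odd format
        return None
    return version if name == "glibc" else None


def glibc_version_ctypes():
    """glibc version via gnu_get_libc_version(), or None if unavailable."""
    try:
        process = ctypes.CDLL(None)  # symbols of the running process
        get_version = process.gnu_get_libc_version
    except (OSError, AttributeError, TypeError):
        return None  # not linked against glibc (or no dlopen support)
    get_version.restype = ctypes.c_char_p
    return get_version().decode("ascii")


def glibc_version():
    """Try os.confstr first, then fall back to ctypes."""
    return glibc_version_confstr() or glibc_version_ctypes()


def glibc_older_than(minimum):
    """True only if glibc was detected AND is older than `minimum`.

    Compares integer tuples, not strings (as strings, "2.9" > "2.17").
    """
    version = glibc_version()
    if version is None:
        return False
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    detected = (int(match.group(1)), int(match.group(2)))
    wanted = tuple(int(part) for part in minimum.split(".")[:2])
    return detected < wanted
```

Returning False when glibc cannot be detected is a policy choice: it means a glibc-based skip silently stops applying, which is how the PyPy failures above showed up. Treating "unknown" as "old" would skip instead.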

@h-vetinari
Member Author

PyPy does not implement that value. I opened an issue in PyPy and a corresponding one in NumPy numpy/numpy#19385.

Thanks! :)

@mattip

mattip commented Jul 1, 2021

PyPy issue is fixed. Is it worth making a patch and releasing a new pypy7.3.5 build? The next PyPy release is probably a few months away, and I don't know how common it is to use PyPy + centos7.

@h-vetinari
Member Author

PyPy issue is fixed. Is it worth making a patch and releasing a new pypy7.3.5 build?

I think that would be worthwhile; carrying a patch is not a big deal IMO.

The next PyPy release will probably be a few months coming, and I don't know how common it is to use PyPy + centos7

Maybe I misunderstand, but CentOS 6/7 are just stand-ins for Linux here, which is where PyPy usage is highest.

@mattip

mattip commented Jul 1, 2021

I misstated the failing combination above. It is PyPy + glibc 2.12, which shipped with CentOS 6, not CentOS 7. CentOS 6 has been EOL since Nov 2020. But for some reason the conda environment uses it.

@h-vetinari
Member Author

Centos6 is EOL since Nov 2020. But for some reason the conda environment uses it.

See here: conda-forge/conda-forge.github.io#1436

@isuruf
Member

isuruf commented Jul 1, 2021

But for some reason the conda environment uses it.

It's the same reason that numpy supports manylinux2010 (which is glibc 2.12). 😉

@mattip

mattip commented Jul 2, 2021

NumPy uses manylinux2010 not to support the outdated CentOS 6, but because it still supports older Linux versions that may not have pip v20. I am not sure conda has the same problem.

@h-vetinari
Member Author

I am not sure conda has the same problem.

A very similar one - once conda moves off of CentOS 6, the packages built for linux are not usable on older distros anymore. As can be seen from the issue I linked, a move away from this is on the horizon, but that wasn't realistic or desirable until quite recently.
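The compatibility rule behind the manylinux2010 / CentOS 6 discussion is that a binary built against glibc X.Y generally needs glibc X.Y or newer at runtime. A simplified sketch using the glibc floors of the legacy manylinux tags (manylinux1: 2.5, manylinux2010: 2.12, manylinux2014: 2.17); the helper name is hypothetical, and real compatibility also depends on other system libraries.

```python
# Minimum glibc required by the legacy manylinux tags (PEP 513, 571, 599).
LEGACY_MANYLINUX_GLIBC = {
    "manylinux1": (2, 5),
    "manylinux2010": (2, 12),
    "manylinux2014": (2, 17),
}


def installable_tags(system_glibc):
    """Legacy manylinux tags whose binaries can run on `system_glibc`.

    system_glibc is a (major, minor) tuple, e.g. (2, 12) for CentOS 6.
    A build against a newer glibc than the system's is not usable there,
    so raising the build baseline drops support for older distros.
    """
    return [tag for tag, needed in LEGACY_MANYLINUX_GLIBC.items()
            if system_glibc >= needed]
```

For example, installable_tags((2, 12)) covers manylinux1 and manylinux2010 but not manylinux2014, which is the same trade-off conda-forge faces when moving its baseline off CentOS 6.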

@h-vetinari
Member Author

Update for numpy 1.21.2: still all green! 🥳

After 0 failures out of 76 runs for 1.21.1, we remain at 0 failures (out of 68, having removed the AVX512-specific runs for blis).

Details

lib before after updated (version / build)
numpy 1.21.1 1.21.2 X
libblas 3.9.0-10 3.9.0-11 X
blis 0.8.1-1 0.8.1-1
openblas 0.3.17-pthreads-1 0.3.17-pthreads-1
mkl 2021.3.0-557 2021.3.0-564 X
netlib 3.9.0-5 3.9.0-5
pypy 7.3.5-7 7.3.5-9 X

variant blis mkl netlib openblas sum*
linux / x86 ✔️ ✔️ ✔️ ✔️ -
linux / aarch ✔️ ✔️ -
linux / ppc64le ✔️ ✔️ -
osx / arm ✔️ ✔️ -
osx / x86 ✔️ ✔️ ✔️ ✔️ -
win / x86 ✔️ ✔️ ✔️ ✔️ -
sum* - - - - 0

* sum of Failures (out of a total of 68 CI combinations being tested)

Build logs:
Azure
Drone
Travis

@h-vetinari
Member Author

Update for numpy 1.21.3: still all green! 🥳

After 0 failures out of 68 runs for 1.21.2, we remained at 0 failures.

Details

lib before after updated (version / build)
numpy 1.21.2 1.21.3 X
libblas 3.9.0-11 3.9.0-12 X
blis 0.8.1-1 0.8.1-1
openblas 0.3.17-pthreads-1 0.3.18-pthreads-0 X
mkl 2021.3.0-564 2021.4.0-729 X
netlib 3.9.0-5 3.9.0-5
pypy 7.3.5-9 7.3.5-9

variant blis mkl netlib openblas sum*
linux / x86 ✔️ ✔️ ✔️ ✔️ -
linux / aarch ✔️ ✔️ -
linux / ppc64le ✔️ ✔️ -
osx / arm ✔️ ✔️ -
osx / x86 ✔️ ✔️ ✔️ ✔️ -
win / x86 ✔️ ✔️ ✔️ ✔️ -
sum* - - - - 0

* sum of Failures (out of a total of 68 CI combinations being tested)

Build logs:
Azure
Drone

@rgommers
Contributor

very nice!

@h-vetinari
Member Author

h-vetinari commented Dec 28, 2021

Update for numpy 1.21.5: all green except PPC (as before)

Because numpy.test was not wrapped in sys.exit, some failures were not being reported. In particular, the PPC builds had all been failing already. With that in mind: after 8 failures (PPC-only) out of 68 runs for 1.21.3, we are now at 10 failures (PPC-only) out of 86 runs (Python 3.10 was added everywhere).
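numpy.test() returns True or False instead of exiting, so a test command that only calls it exits with status 0 even when tests fail. A minimal sketch of the wrapper, assuming the tests are launched from a small Python script; exit_status and run_numpy_tests are hypothetical helper names, and label takes 'fast' or 'full' as in numpy.test.

```python
import sys


def exit_status(passed):
    """Map numpy.test()'s boolean result onto a process exit status."""
    return 0 if passed else 1


def run_numpy_tests(label="fast"):
    """Run NumPy's own test suite and return an exit status."""
    import numpy  # imported lazily so this module loads without NumPy

    # numpy.test() returns True on success; it does not exit by itself
    return exit_status(numpy.test(label=label))


# A recipe test script would then end with:
#     sys.exit(run_numpy_tests("full"))
```

The same idea fits in a one-line test command: python -c "import sys, numpy; sys.exit(not numpy.test())", where sys.exit(False) exits with status 0 and sys.exit(True) with status 1.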

Notable

Details

lib before after updated (version / build)
numpy 1.21.3 1.21.5 X
libblas 3.9.0-12 3.9.0-12
blis 0.8.1-1 0.8.1-1
openblas 0.3.18-pthreads-0 0.3.18-pthreads-0
mkl 2021.4.0-729 2021.4.0-729
netlib 3.9.0-5 3.9.0-5
pypy 7.3.5-9 7.3.7-3 X

variant blis mkl netlib openblas sum*
linux / x86 ✔️ ✔️ ✔️ ✔️ -
linux / aarch ✔️ ✔️ -
linux / ppc64le ✖️ ✖️ 10F
osx / arm ✔️ ✔️ -
osx / x86 ✔️ ✔️ ✔️ ✔️ -
win / x86 ✔️ ✔️ ✔️ ✔️ -
sum* - - 5F 5F 10F

* sum of Failures (out of a total of 86 CI combinations being tested)

Build logs:
Azure

@mattip

mattip commented Dec 28, 2021

Nice. A heads-up that there is apparently a problem with the recently released OpenBLAS 0.3.19 and NumPy: see numpy/numpy#20660

@h-vetinari
Member Author

Not expecting a 1.21.6 release, so closing this.

h-vetinari closed this Feb 5, 2022
h-vetinari reopened this Apr 12, 2022
h-vetinari changed the base branch from master to numpy121 April 12, 2022 22:24
@h-vetinari
Member Author

Update for 1.21.6 (+ new PyPy builds and BLAS updates): all green except PPC (as before)

Turns out I guessed wrong about:

Not expecting a 1.21.6 release, so closing this.

Also, given the rebuilds for pypy3.8/3.9, not to mention several relevant BLAS (& infrastructure) changes, it makes sense to post an update here.

From 10 failures (PPC-only) out of 86 runs, we're now at 12 failures (PPC-only) out of 108 runs.

Notable

  • Added accelerate BLAS flavour on osx
  • Testing against PyPy 3.8 and 3.9 added everywhere except osx-arm
  • Version bumps for openblas, blis & MKL
  • Switched to running the full test suite; emulation keeps running only label='fast' tests.

Details

variant before after
linux + ppc test failures due to emulation problems as before

lib before after updated (version / build)
numpy 1.21.5 1.21.6 X
libblas 3.9.0-12 3.9.0-14 X
blis 0.8.1-1 0.9.0-0 X
openblas 0.3.18-pthreads-1 0.3.20-pthreads-0 X
mkl 2021.4.0-729 2022.0.1-803 X
netlib 3.9.0-5 3.9.0-5
pypy 7.3.7-3 7.3.9-1 X
qemu-user-static ? 6.1.0-8

variant accelerate blis mkl netlib openblas sum*
linux / x86 ✔️ ✔️ ✔️ ✔️ -
linux / aarch ✔️ ✔️ -
linux / ppc64le ✖️ ✖️ 12F
osx / arm ✔️ ✔️ ✔️ -
osx / x86 ✔️ ✔️ ✔️ ✔️ ✔️ -
win / x86 ✔️ ✔️ ✔️ ✔️ -
sum* - - - 6F 6F 12F

* sum of Failures (out of a total of 108 CI combinations being tested)

Build logs:
Azure
