
{lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 #11629

Merged


boegel
Member

@boegel boegel commented Nov 8, 2020

(created using eb --new-pr)
requires #11337 (intel/2020b), #11489 (foss/2020b)

note: marked as WIP since the versionsuffix should be removed (and the tests should be changed accordingly)

…SciPy-bundle-2020.11-intel-2020b-Python-3.8.6.eb
@boegel boegel added the update label Nov 8, 2020
@boegel
Member Author

boegel commented Nov 8, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3108.skitty.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/c4cbdde126de8ecac55c4987b319f310 for a full test report.

@boegel
Member Author

boegel commented Nov 8, 2020

Test report by @boegel
SUCCESS
Build succeeded for 15 out of 15 (2 easyconfigs in total)
node2406.golett.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/7418d361621ab3dfa92cb89a3718dc71 for a full test report.

@boegel boegel changed the title from "{lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6" to "{lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 (WIP)" Nov 9, 2020
@easybuilders easybuilders deleted a comment from boegelbot Nov 9, 2020
@boegel boegel added the 2020b issues & PRs related to 2020b label Nov 9, 2020
@boegel boegel added this to the 4.3.2 milestone Nov 9, 2020
@boegel boegel changed the title from "{lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6 (WIP)" to "{lang}[foss/2020b,intel/2020b] SciPy-bundle v2020.11 w/ Python 3.8.6" Nov 9, 2020
@easybuilders easybuilders deleted a comment from boegelbot Nov 9, 2020
@boegel
Member Author

boegel commented Nov 9, 2020

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
easybuild1.novalocal - Linux centos linux 8.2.2004, POWER, IBM pSeries (emulated by qemu) (power8le), Python 3.6.8
See https://gist.github.com/63a2955015aa5b1cc2c6621250d33604 for a full test report.

@boegel
Member Author

boegel commented Nov 9, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11629 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11629 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9724

Test results coming soon (I hope)...

- notification for comment with ID 724282509 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Nov 9, 2020

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3568.doduo.os - Linux RHEL 8.2, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/f7a4b7f0618fb5510c5986547cf2d51b for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/2224cfc86d9db434b2fcfbbc48602a4f for a full test report.

@boegel boegel requested a review from Micket November 10, 2020 08:10
Micket
Micket previously approved these changes Nov 12, 2020
Contributor

@Micket Micket left a comment

Let's make a mental note to include hypothesis in Python in 2021a.

@Micket
Contributor

Micket commented Nov 12, 2020

I just now saw that we already have a stand-alone hypothesis easyconfig, used in PyTorch and some other thing.
(Somehow this is just a builddep in PyTorch, which seems a bit odd; hypothesis isn't a build-related package, is it?)
But since we are probably going to have such a module for 2020a and 2020b as well, should we use it here?

@boegel
Member Author

boegel commented Nov 12, 2020

> I just now saw that we already have a stand-alone hypothesis easyconfig, used in PyTorch and some other thing.
> (Somehow this is just a builddep in PyTorch, which seems a bit odd; hypothesis isn't a build-related package, is it?)
> But since we are probably going to have such a module for 2020a and 2020b as well, should we use it here?

hypothesis is a testing library, and it's actually only a build dep for numpy too, see https://github.com/numpy/numpy/tree/master/INSTALL.rst.txt#prerequisites .

So it makes total sense to make it a separate easyconfig and only include it as a build dep, I'll look into changing that 👍
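
For readers less familiar with the distinction, a build-only dependency in an easyconfig could look roughly like the fragment below. This is a minimal sketch, not the easyconfig from this PR: the hypothesis version is a placeholder, and the dependency lists are deliberately incomplete. Only the `builddependencies` and `dependencies` parameter names are standard EasyBuild.

```python
# Hypothetical easyconfig fragment (easyconfigs use Python syntax).
# The hypothesis version is a placeholder, not taken from this PR.
name = 'SciPy-bundle'
version = '2020.11'

toolchain = {'name': 'foss', 'version': '2020b'}

# Only available while building and testing; the generated module
# does not load these.
builddependencies = [
    ('hypothesis', 'X.Y.Z'),  # placeholder version
]

# Available at build time and loaded by the generated module at runtime.
dependencies = [
    ('Python', '3.8.6'),
]
```

With this layout the test suite can import hypothesis during the build, while users loading the SciPy-bundle module do not get it pulled in.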

@boegel
Member Author

boegel commented Nov 12, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11629 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11629 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9763

Test results coming soon (I hope)...

- notification for comment with ID 726113140 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member Author

boegel commented Nov 12, 2020

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3401.kirlia.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), Python 2.7.5
See https://gist.github.com/5dc4e7b21c9a9b1c34fdfb52da36c175 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/bd919ac550e9a0e335b494f52f2ef052 for a full test report.

@boegel boegel force-pushed the 20201108102926_new_pr_SciPy-bundle202011 branch from 8df1b02 to 4367b20 Compare November 16, 2020 15:02
@easybuilders easybuilders deleted a comment from boegelbot Nov 16, 2020
@easybuilders easybuilders deleted a comment from boegelbot Nov 16, 2020
@boegel
Member Author

boegel commented Nov 16, 2020

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on generoso

PR test command 'EB_PR=11629 EB_ARGS= /apps/slurm/default/bin/sbatch --job-name test_PR_11629 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 9772

Test results coming soon (I hope)...

- notification for comment with ID 728118363 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
generoso-c1-s-1 - Linux centos linux 8.2.2004, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/b178deeb993f3279c3d17dba1a445212 for a full test report.

@boegel
Member Author

boegel commented Nov 16, 2020

Test report by @boegel
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node2304.phanpy.os - Linux centos linux 7.8.2003, x86_64, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (haswell), Python 2.7.5
See https://gist.github.com/53474dbcde81a6ef12edb20ab43b105c for a full test report.

Contributor

@Micket Micket left a comment

lgtm

@Micket
Contributor

Micket commented Nov 16, 2020

Test report by @Micket
FAILED
Build succeeded for 7 out of 13 (3 easyconfigs in total)
alvis-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/ef9794eae3ea0619d9a2d78cbd4c1ed6 for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
bear-pg0306u19a.bear.cluster - Linux RHEL 8.2, POWER, 8335-GTX (power9le), Python 3.6.8
See https://gist.github.com/f3ba48b399e418a5cbdf132f196daf03 for a full test report.

@boegel
Member Author

boegel commented Nov 16, 2020

@Micket Any idea what the problem is here in your failing test report?

Abort(1091215) on node 5 (rank 5 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........: 
MPID_Init(1149)..............: 
MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
[1605544318.626885] [alvis-c1:196032:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable

@Micket
Contributor

Micket commented Nov 16, 2020

I forgot to set UCX_TLS for the VM I'm building on.

@Micket
Contributor

Micket commented Nov 16, 2020

Test report by @Micket
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/4db0242f4dbf0e230780c61045147caf for a full test report.

@Micket
Contributor

Micket commented Nov 16, 2020

Test report by @Micket
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/c14f65d4ee6670c909f5801a0bcefae7 for a full test report.

Contributor

@lexming lexming left a comment

LGTM

@lexming
Contributor

lexming commented Nov 19, 2020

Test report by @lexming
SUCCESS
Build succeeded for 7 out of 7 (3 easyconfigs in total)
node381.hydra.os - Linux centos linux 7.7.1908, x86_64, Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz, Python 2.7.5
See https://gist.github.com/3e39596d2852c49762bdf001a418d2f7 for a full test report.

@Micket
Contributor

Micket commented Nov 20, 2020

I can't make sense of my build errors:

== 2020-11-20 11:01:20,166 run.py:222 INFO running cmd: python -c "import numexpr"
== 2020-11-20 11:01:20,411 extensioneasyblock.py:181 INFO Sanity check for numexpr successful!
== 2020-11-20 11:01:20,412 easyblock.py:2669 WARNING failing sanity check for 'numexpr' extension: (see log for details)

what?!

@Micket
Contributor

Micket commented Nov 20, 2020

Test report by @Micket
FAILED
Build succeeded for 1 out of 3 (3 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/904c3ae9c7d99eb1245aeab6051e0ffb for a full test report.

@boegel
Member Author

boegel commented Nov 23, 2020

@Micket There must be an error higher up? Are you sure you're using the numexpr easyblock from develop?
See easybuilders/easybuild-easyblocks#2022.

@schiotz
Contributor

schiotz commented Nov 27, 2020

> @Micket Any idea what the problem is here in your failing test report?
>
> Abort(1091215) on node 5 (rank 5 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(136)........: 
> MPID_Init(1149)..............: 
> MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
> [1605544318.626885] [alvis-c1:196032:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable

We are seeing this error in some of our own code with intel/2020a. Apparently, calling MPI_Init from a program that is not started with mpiexec / mpirun causes this error. According to the MPI standard, an MPI implementation is strongly encouraged (but not required) to allow this situation and simply initialize an MPI environment with a single process (size=1, rank=0), as if called with mpiexec -n 1. But newer versions of Intel MPI apparently fail to do so. I found a bug report somewhere where Intel claims to have fixed it in 2019 update 6, but that does not seem to be the case.
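
One possible workaround, sketched below, is to make sure the program is always started through a launcher. The helper name `ensure_launcher` is hypothetical, and the environment variables used to detect a launcher (`PMI_RANK` for Hydra-based launchers, `OMPI_COMM_WORLD_RANK` for Open MPI) are assumptions that should be verified against the MPI stack in use.

```python
import os

# Assumed markers set by MPI launchers; verify for your MPI implementation.
LAUNCHER_VARS = ("PMI_RANK", "OMPI_COMM_WORLD_RANK")


def ensure_launcher(cmd, env=None):
    """Return cmd unchanged if a launcher is detected, else prefix 'mpiexec -n 1'."""
    env = os.environ if env is None else env
    if any(var in env for var in LAUNCHER_VARS):
        # Already running under a launcher: do not wrap again.
        return list(cmd)
    # Not launched via mpiexec/mpirun: request a single-process MPI run.
    return ["mpiexec", "-n", "1"] + list(cmd)
```

The returned list can be passed to `subprocess.run(..., check=True)`. This sidesteps the singleton-init code path rather than fixing it.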

@Micket
Contributor

Micket commented Nov 27, 2020

@schiotz I was just building on a VM that doesn't have InfiniBand. I had to set UCX_TLS.

I'm rebuilding now with the easyblock from easybuilders/easybuild-easyblocks#2022.

@schiotz
Contributor

schiotz commented Nov 27, 2020

@Micket Is that an issue, running on machines without InfiniBand when MPI is installed with InfiniBand support? In that case, what should UCX_TLS be set to?

@Micket
Contributor

Micket commented Nov 27, 2020

Yes, seems to be. Intel MPI will try to use a transport that isn't supported. In fact, it's even an issue on older IB hardware, cf.
#10899
easybuilders/easybuild-easyblocks#2253

On my build machine (which completely lacks IB) I set UCX_TLS=self,tcp, which seems to do the trick.
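
A minimal sketch of that workaround for a build script: restrict UCX to the `self` and `tcp` transports when no InfiniBand device is visible. The helper name `build_env` and the `/sys/class/infiniband` check are assumptions for illustration; only the `UCX_TLS=self,tcp` value comes from the comment above.

```python
import os


def build_env(base_env=None, ib_sysfs="/sys/class/infiniband"):
    """Copy the environment, restricting UCX to self,tcp if no IB device is found."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: Linux lists InfiniBand devices as entries in this sysfs directory.
    has_ib = os.path.isdir(ib_sysfs) and bool(os.listdir(ib_sysfs))
    if not has_ib:
        # Keep an explicit user setting if one is already present.
        env.setdefault("UCX_TLS", "self,tcp")
    return env
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when invoking `eb` on the build host.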

@Micket
Contributor

Micket commented Nov 27, 2020

Test report by @Micket
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2022
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
vera-c1 - Linux centos linux 7.8.2003, x86_64, Intel Xeon Processor (Skylake), Python 2.7.5
See https://gist.github.com/c1837efe1bb256c7b2900f48c273fadd for a full test report.

@lexming
Contributor

lexming commented Nov 28, 2020

Going in, thanks @boegel !

@lexming lexming merged commit b823998 into easybuilders:develop Nov 28, 2020
@boegel boegel deleted the 20201108102926_new_pr_SciPy-bundle202011 branch November 28, 2020 16:39
@boegel
Member Author

boegel commented Nov 28, 2020

>> @Micket Any idea what the problem is here in your failing test report?
>>
>> Abort(1091215) on node 5 (rank 5 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(136)........: 
>> MPID_Init(1149)..............: 
>> MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed
>> [1605544318.626885] [alvis-c1:196032:0]         select.c:444  UCX  ERROR no active messages transport to <no debug data>: self/memory - Destination is unreachable
>
> We are seeing this error in some of our own code with intel/2020a. Apparently, calling MPI_Init from a program that is not started with mpiexec / mpirun causes this error. According to the MPI standard, an MPI implementation is strongly encouraged (but not required) to allow this situation and simply initialize an MPI environment with a single process (size=1, rank=0), as if called with mpiexec -n 1. But newer versions of Intel MPI apparently fail to do so. I found a bug report somewhere where Intel claims to have fixed it in 2019 update 6, but that does not seem to be the case.

I can confirm this problem, it's very annoying, but it's only an issue with the impi in intel/2020a (it doesn't happen with intel/2020b, I think).

@lexming
Contributor

lexming commented Nov 28, 2020

@boegel that is the error without easybuilders/easybuild-easyblocks#2253

@boegel
Member Author

boegel commented Nov 28, 2020

> @boegel that is the error without easybuilders/easybuild-easyblocks#2253

I thought that fix was only relevant for multi-node runs? Perhaps not, ok :)

Labels
2020b issues & PRs related to 2020b update

6 participants