{chem}[GCCcore/11.2.0,foss/2021b] LAMMPS v29Sep2021 (with and without cuda) #14815
Hi, I wanted to test the easyconfigs on my laptop, but running
There are plenty of lines below, e.g. libwebp-1.1.0-GCCcore-10.2.0.eb:31:37: E741 ambiguous variable name 'l'. I have loaded the development version of EasyBuild via a module. Any idea why that is? Best regards, |
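For context, flake8's E741 check flags the single-character names `l`, `O` and `I` because they are easily confused with digits. Below is a minimal sketch of the kind of line that triggers it and a rename that avoids it; the list contents and variable names are hypothetical and are not taken from the libwebp easyconfig.

```python
# Hypothetical library names, for illustration only.
libs = ["webp", "webpdemux"]

# flake8 would report E741 for the following line, because 'l' is easy to
# confuse with '1':
#   paths = ["lib/lib%s.so" % l for l in libs]

# Renaming the loop variable avoids the warning without changing the result.
paths = ["lib/lib%s.so" % lib for lib in libs]
print(paths)
```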
One failure, e.g. test-suite (3.5, Lmod-7.8.22, Lua), "Error: Version 3.5 with arch x64 not found", is not clear to me. It does not look like it is caused by the submitted easyconfig files. |
I will produce some suggestions for this in a while; I am currently testing a build with a slightly different CUDA version, so don't merge just yet. |
Thanks, don't worry, there is a long way to go before merging because easybuilders/easybuild-easyblocks#2213 has to be submitted first. |
]

# hardware-specific option
cuda_compute_capabilities = ['8.0']
This shouldn't be included here, it should be set by the site
(but you can comment it out and leave it as a guide)
Replacing it with a comment about cuda_compute_capabilities would be good, especially as this can highlight that the LAMMPS easyblock builds for the highest item in the list, as opposed to some other software that produces a fat binary compatible with all items in that list.
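To illustrate what "the highest item in the list" means, here is a minimal sketch that picks the numerically highest compute capability from a list of strings. The function name is hypothetical, and this is not the LAMMPS easyblock's actual code; it only shows why a numeric comparison is needed rather than a plain string comparison.

```python
def highest_compute_capability(capabilities):
    """Return the numerically highest 'major.minor' string from a list."""
    if not capabilities:
        raise ValueError("no CUDA compute capabilities given")
    # Compare as integer tuples so that '10.0' sorts above '8.6',
    # which a plain string comparison would get wrong.
    return max(capabilities, key=lambda cc: tuple(int(part) for part in cc.split(".")))


print(highest_compute_capability(["7.0", "8.0", "6.1"]))
```

With a site-wide setting such as `['7.0', '8.0']`, a build that targets only the highest item would therefore target 8.0.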
There are quite a few options being given that don't seem to be understood by CMake:
|
The docs seem to want to do a git clone:
I found a
|
Ah, I was fixing the git issue with either a patch or local configuration. I am not sure what we should do here. Perhaps just abandoning the documentation and the corresponding patch would be easier. Regarding the point above (CMake), compare the messages about the configured libraries with the libraries configured within 29Sep2021: Eigen is missing in the second one, and Boost is not present in either because it is actually not in the dependencies list. I guess it is possible to fix Eigen (in the easyblock), or just remove it (easier), and ignore the Boost warning. It is harder to work through the packages because the previous LAMMPS version did not report the packages it was built with in the log. But
are set by the easyblock. As far as I understand, PKG_USER-OMP was configuring what is now OPENMP, and that is configured. I will also try adding INTEL to the package list, since it is not configured in the new build. I definitely have the executable (lmp), and I do have liblammps.so in the lib64 directory, so the BUILD_EXE and BUILD_LIB warnings can also be ignored, since the corresponding targets are present anyway. Best, |
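The package rename mentioned above could be handled with a small lookup table. This is a sketch only: the function and table names are hypothetical, it is not how the easyblock does it, and the USER-INTEL entry is an assumption by analogy with the USER-OMP rename discussed in this thread.

```python
# Rename table for illustration. USER-OMP -> OPENMP is the rename mentioned
# in this thread; the USER-INTEL entry is an assumption.
RENAMED_PACKAGES = {
    "USER-OMP": "OPENMP",
    "USER-INTEL": "INTEL",
}


def modernize_pkg_option(option):
    """Rewrite a '-DPKG_<name>=<value>' option if <name> was renamed."""
    prefix = "-DPKG_"
    if not option.startswith(prefix) or "=" not in option:
        # Not a package option (e.g. -DBUILD_EXE=yes): leave it untouched.
        return option
    name, value = option[len(prefix):].split("=", 1)
    return "%s%s=%s" % (prefix, RENAMED_PACKAGES.get(name, name), value)


print(modernize_pkg_option("-DPKG_USER-OMP=on"))
```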
I think the easyblock needs some more tweaking to handle the newer scenarios. |
With the option
this builds ok for me (but I built it on a node with no GPUs, so it failed the tests!). |
I see, I will try to work on the easyblock next week. It does not look that complicated. |
Ah, according to #14517 it looks like we do need to force-enable the MPI CUDA support |
Hi @ocaisa, I was working on it, and CUDA-awareness is on via NCCL (which has UCX-CUDA inside) and LAMMPS-29Sep2021-force-cudaaware-kokkos.patch |
I came up with another patch yesterday that I think is a little safer (will still work on non-GPU nodes): lammps/lammps#3140 (comment) |
Ok, let me test/think about it today to see if it does the job for the submitted easyconfig |
I've got an error: nvlink fatal : Could not open input file '/home/adavydov/easybuild/local_installpath/software/tbb/2020.3-GCCcore-11.2.0/lib/libtbbmalloc.so'. The mentioned file is present, though, and has "INPUT (libtbbmalloc.so.2)" inside. Any ideas? |
It is failing to deal correctly with that file (see https://sourceware.org/binutils/docs/ld/Implicit-Linker-Scripts.html for information about this type of file), which is an alternative method to having a symlink of
Does it mean that one has to change the way tbb sets up its libraries? I don't see how to compile LAMMPS without it at the moment, besides ignoring tbb... Although #15876 seems to work with it. |
Only one package needs TBB, but I can't remember which now. If you exclude that you can move on with your life! |
From https://docs.lammps.org/Build_extras.html, it looks like only the INTEL package needs TBB |
@arkdavy I see the same issue for #15900 (see https://gist.github.com/boegelbot/99fd77113c6b64c588539ac40def2884#file-lammps-23jun2022-foss-2021a-kokkos-cuda-11-3-1_partial-log-L484). The issue is with |
I did a bit of digging around and this is triggered by the When TBB is there you get
So probably we could skirt around this by providing the path to the versioned |
The "fix" is in 920d48e (alternatively one could define CMake variables to the library but it didn't seem worth the effort) |
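To make the linker-script issue discussed above concrete, here is a minimal sketch that tells a real shared library apart from an implicit linker script and extracts the file names listed in its `INPUT (...)` command. The function names are hypothetical, and only plain file names inside `INPUT` are handled (not `-l` options or nested commands such as `AS_NEEDED`); it is not the fix applied in 920d48e.

```python
import os
import re

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF shared library


def parse_input_names(script_text):
    """Return the names listed in the first INPUT(...) command of a linker script."""
    match = re.search(r"INPUT\s*\(([^)]*)\)", script_text)
    if match is None:
        return []
    # Entries may be separated by whitespace or commas.
    return match.group(1).replace(",", " ").split()


def resolve_library(path):
    """Return [path] for a real ELF file, else the files its INPUT(...) names."""
    with open(path, "rb") as handle:
        head = handle.read(4)
        if head == ELF_MAGIC:
            return [path]
        text = (head + handle.read()).decode("utf-8", errors="replace")
    libdir = os.path.dirname(path)
    return [os.path.join(libdir, name) for name in parse_input_names(text)]


print(parse_input_names("INPUT (libtbbmalloc.so.2)"))
```

Applied to the `libtbbmalloc.so` file from the error message, `resolve_library` would be expected to return the path of `libtbbmalloc.so.2` in the same directory, which is the versioned library a tool that cannot read linker scripts would need to be given instead.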
Ok, I have removed tbb and could build the code. I will update the branch, but I will still have to test whether the new CUDA-aware patch works. Meanwhile, --update-pr is broken for me because EB does not see GitPython, and I have to go now. I will write here as soon as the CUDA-aware tests are done (I do not see how to check it definitively without running a couple of calculations on a cluster). |
Hi @ocaisa. I have tried the new patch (in the last commit), and apparently it didn't work: WARNING: Turning off GPU-aware MPI since it is not detected, use '-pk kokkos gpu/aware on' to override (src/KOKKOS/kokkos.cpp:277). Comparing the timings of my tests between 1-GPU and 2-GPU calculations leads to the same conclusion. |
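A check like the one described above can be automated by scanning the LAMMPS output for that warning. This is a minimal sketch; the function name is hypothetical and the matched string is copied from the warning quoted in this thread, so it may differ in other LAMMPS versions.

```python
# Text copied from the warning quoted above; other versions may word it differently.
GPU_AWARE_WARNING = "Turning off GPU-aware MPI since it is not detected"


def gpu_aware_mpi_disabled(log_text):
    """Return True if the log shows that LAMMPS switched off GPU-aware MPI."""
    return any(GPU_AWARE_WARNING in line for line in log_text.splitlines())


sample = (
    "LAMMPS (29 Sep 2021)\n"
    "WARNING: Turning off GPU-aware MPI since it is not detected, "
    "use '-pk kokkos gpu/aware on' to override (src/KOKKOS/kokkos.cpp:277)\n"
)
print(gpu_aware_mpi_disabled(sample))
```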
@arkdavy is that with a rebuild of the OpenMPI module? |
Ok, I have rebuilt OpenMPI and see that there are CUDA-related patches and config options, but the result is still the same. |
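One way to confirm that a rebuilt OpenMPI module was actually compiled with CUDA support is to inspect the output of `ompi_info --parsable --all`. The sketch below assumes that output contains a line ending in `mpi_built_with_cuda_support:value:true` (or `false`), as described in the Open MPI documentation; the function names are hypothetical.

```python
import subprocess


def parse_cuda_support(ompi_info_output):
    """Return True/False from the mpi_built_with_cuda_support line, None if absent."""
    for line in ompi_info_output.splitlines():
        if "mpi_built_with_cuda_support:value:" in line:
            return line.rsplit(":", 1)[1].strip() == "true"
    return None


def openmpi_built_with_cuda():
    """Run ompi_info (must be on PATH, e.g. after loading the module) and parse it."""
    result = subprocess.run(
        ["ompi_info", "--parsable", "--all"],
        capture_output=True, text=True, check=True,
    )
    return parse_cuda_support(result.stdout)


print(parse_cuda_support("mca:mpi:base:param:mpi_built_with_cuda_support:value:true"))
```

This only reports build-time support, so a `True` result does not by itself show that LAMMPS detects GPU-aware MPI at run time.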
(created using eb --new-pr) requires easybuilders/easybuild-easyblocks#2213