
{lib}[GCC/11.2.0] MPItrampoline v2.8.0 w/ OpenMPI v4.1.2 (WIP) #14607

Draft · wants to merge 1 commit into base: develop

Conversation

@boegel (Member) commented Dec 21, 2021

(created using eb --new-pr)

Marked as WIP, since I haven't experimented with this yet, and we need to figure out how to easily switch to other MPI libraries as well...

cc @eschnett

@boegel added the new label Dec 21, 2021
@boegel marked this pull request as draft Dec 21, 2021
@boegel changed the title {lib}[GCC/11.2.0] MPItrampoline v2.8.0 → {lib}[GCC/11.2.0] MPItrampoline v2.8.0 w/ OpenMPI v4.1.2 (WIP) Dec 21, 2021
@boegel added this to the 4.x milestone Dec 21, 2021
@boegel (Member, Author) commented Dec 21, 2021

@boegelbot please test @ generoso

@boegelbot (Collaborator) commented:

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=14607 EB_ARGS= /opt/software/slurm/bin/sbatch --job-name test_PR_14607 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 7553

Test results coming soon (I hope)...

- notification for comment with ID 999069507 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot (Collaborator) commented:

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns2 - Linux rocky linux 8.4, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/930375fecdfd2773ce0d745c06bdfe84 for a full test report.

@boegelbot (Collaborator) commented:

@boegel: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/1608513698
Output from first failing test suite run:

FAIL: test_style_conformance (test.easyconfigs.styletests.StyleTest)
Check the easyconfigs for style
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test/easyconfigs/styletests.py", line 68, in test_style_conformance
    self.assertEqual(result, 0, error_msg)
  File "/opt/hostedtoolcache/Python/2.7.18/x64/lib/python2.7/site-packages/easybuild/base/testing.py", line 116, in assertEqual
    raise AssertionError("%s:\nDIFF%s:\n%s" % (msg, limit, ''.join(diff[:self.ASSERT_MAX_DIFF])))
AssertionError: There shouldn't be any code style errors (and/or warnings), found 1:
/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/m/MPItrampoline/MPItrampoline-2.8.0-GCC-11.2.0.eb:50:5: E265 block comment should start with '# '

: 1 != 0:
DIFF:
- 1

----------------------------------------------------------------------
Ran 13912 tests in 513.797s

FAILED (failures=1)
ERROR: Not all tests were successful

bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Please talk to my owner @boegel if you notice me acting stupid,
or submit a pull request to https://github.com/boegel/boegelbot to fix the problem.

@eschnett commented:

In principle, it is possible to install MPItrampoline by itself, without any other MPI implementation. In practice, it can be convenient to ensure that a real MPI implementation together with its MPIwrapper are installed at the same time, since this provides a convenient fallback for users of MPItrampoline. It seems you want to do this here.

When doing so, it is important that MPItrampoline is installed as a regular library (that can be found by applications), whereas the real MPI implementation as well as MPIwrapper must remain hidden. (Of course, MPIwrapper must be pointed to OpenMPI.) The "clients" of MPItrampoline must not accidentally see the real MPI library nor MPIwrapper.

When MPItrampoline is configured, it might be convenient to add configuration options pointing it to the installed MPIwrapper, if you have enough control over the install path to make this possible. If so, the environment variable MPITRAMPOLINE_LIB does not need to be set at run time.

I don't know enough of EasyBuild to know how to achieve this. If you were using configure for everything, this is how it would approximately go:

  1. Install OpenMPI into $prefix/mpiwrapper-openmpi
  2. Install MPIwrapper into the same directory, passing $prefix/mpiwrapper-openmpi as value of MPI_HOME
  3. Install MPItrampoline into $prefix, setting the configuration option MPITRAMPOLINE_DEFAULT_LIB to $prefix/mpiwrapper-openmpi/lib/libmpiwrapper.so
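A minimal shell sketch of those three steps, assuming CMake-based builds for MPIwrapper and MPItrampoline; $prefix, the source directory names, and the version numbers are placeholders, and the option names are taken from the description above, so adjust as needed:

# hypothetical install prefix
prefix=$HOME/mpitrampoline-stack

# 1. install OpenMPI into $prefix/mpiwrapper-openmpi
cd openmpi-4.1.2
./configure --prefix=$prefix/mpiwrapper-openmpi && make -j$(nproc) install

# 2. install MPIwrapper into the same directory, passing that OpenMPI as MPI_HOME
cd ../MPIwrapper-2.2.1
MPI_HOME=$prefix/mpiwrapper-openmpi cmake -S . -B build -DCMAKE_INSTALL_PREFIX=$prefix/mpiwrapper-openmpi
cmake --build build && cmake --install build

# 3. install MPItrampoline into $prefix, with MPIwrapper as the default plugin
cd ../MPItrampoline-2.8.0
cmake -S . -B build -DMPITRAMPOLINE_DEFAULT_LIB=$prefix/mpiwrapper-openmpi/lib/libmpiwrapper.so -DCMAKE_INSTALL_PREFIX=$prefix
cmake --build build && cmake --install build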

This should get you started. I'd be happy to discuss further.

In addition to the above, the environment variable MPITRAMPOLINE_MPIEXEC needs to point to $prefix/mpiwrapper-openmpi/bin/mpiwrapperexec if the default OpenMPI implementation is to be used. (Apparently I didn't introduce an MPItrampoline configuration option for this yet!)

For debugging, you can set the environment variable MPITRAMPOLINE_VERBOSE=1 while running.

As briefly mentioned in the talk, on macOS, OpenMPI uses the build option -flat_namespace, which doesn't work with MPItrampoline. Since there is no option to disable this, you need to run a global replace over the configure scripts in the source tree. I use these commands (in the OpenMPI source tree, before configuring):

find . -type f -print0 | xargs -0 perl -pi -e 's/-Wl,-flat_namespace//g';
./autogen.pl

Fun fact: OpenMPI and MPIwrapper can be compiled with a completely different tool chain than MPItrampoline. (Of course, the CPU architecture must match.) Any incompatibilities between C++ and Fortran compilers will not matter.

default_component_specs = {'start_dir': '%(name)s-%(version)s'}

components = [
('MPIwrapper', '2.2.1', {
Review comment (Member) on this snippet:

From @eschnett's comment, it seems that MPIwrapper (and OpenMPI, which would be a dependency of it) should be extracted and used as a build dependency. That way we can reliably set the environment variables that MPItrampoline requires at runtime, without having those modules loaded when something is being built with the toolchain.

Review comment (Member):

This would mean we would need to poke around in the module files that are generated for MPI implementations, to make sure that required environment variables are not being missed. Maybe the right thing to do is to have MPIwrapper (with OpenMPI as a build dependency) loaded, but not allow it to add to LD_LIBRARY_PATH/LIBRARY_PATH?

Review comment by @ocaisa (Member), Feb 2, 2022:

The MPIwrapper easyblock could be smart enough to inherit any MPI-relevant environment variables from OpenMPI (or whichever MPI it is wrapping).

Review comment:

Technically, MPItrampoline loads MPIwrapper as a "plugin". (That's the technical term used by ld and/or cmake.) If you have another project using plugins, then you could look there for guidance.

@ocaisa (Member) commented Feb 2, 2022

@eschnett One question I was wondering about: let's say a user wants to compile their own code, they use EB for some dependencies, and they load GCC + OpenMPI (so now mpicc from OpenMPI is at the front of the path). They don't know anything about MPItrampoline and use OpenMPI directly to compile their code, linking to some libraries from EB that use MPItrampoline. I guess this will link OK, but what will happen at runtime?

@eschnett commented Feb 2, 2022

@ocaisa You cannot mix MPI libraries. This is like mixing MPICH and OpenMPI, or mixing OpenBLAS and MKL. I assume there would be linker errors, but if not, there will be a segfault at run time. Each application must choose a unique MPI implementation, and all its dependencies must also be built with the same MPI implementation.

@ocaisa (Member) commented Feb 2, 2022

OK, so from a user-facing perspective this would be a very major change: we would need a guard on the OpenMPI module that prevents it from being loaded when MPItrampoline is loaded.

@eschnett commented Feb 2, 2022

Isn't that the same case as with MPICH and OpenMPI? Or OpenBLAS and MKL? Or gcc and clang? How do you handle these cases?

The whole point of MPItrampoline, of course, is to avoid this complication: In an ideal future, all packages would use MPItrampoline (which could be configured to use OpenMPI as default).

@ocaisa (Member) commented Feb 2, 2022

Yes, indeed it's the same, the difference would be the brand recognition 😄

For end users who don't actually build their own software, EasyBuild can handle it all through its toolchain definitions. Drilling that toolchain into people who do build their own software would be harder. In Lmod you can handle this to some extent with hierarchical module naming schemes and family(), which would block users from mixing and matching.

@ocaisa (Member) commented Feb 8, 2022

@eschnett Sorry to keep bothering you, but I've been thinking a lot about the right way to implement this. The issue I am concerned about is what happens if we arrange things such that the OpenMPI library can't be seen by the linker. In a typical scenario this means not loading the OpenMPI module file, which in turn means not loading the OpenMPI dependencies (UCX, libfabric, ...). Now, what happens when OpenMPI actually gets used by MPIwrapper and OpenMPI attempts to load plugins? Since the dependencies are not loaded, it is very likely that necessary libraries will not be resolved by the dynamic linker at runtime. Is the expectation here that the MPI library being linked to by MPIwrapper uses RPATH?

For example, here is what the loader resolution looks like for an OpenMPI from JSC:

[ocais1@jsfl02 lib]$ libtree -p libmpi.so
libmpi.so 
├── /p/software/jusuf/stages/2022/software/OpenMPI/4.1.2-GCC-11.2.0/lib/libopen-pal.so.40 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/libevent/2.1.12-GCCcore-11.2.0/lib/libevent_pthreads-2.1.so.7 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/libevent/2.1.12-GCCcore-11.2.0/lib/libevent_core-2.1.so.7 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/hwloc/2.5.0-GCCcore-11.2.0/lib/libhwloc.so.15 [LD_LIBRARY_PATH]
│   │   ├── /p/software/jusuf/stages/2022/software/CUDA/11.5/lib/libOpenCL.so.1 [LD_LIBRARY_PATH]
│   │   ├── /p/software/jusuf/stages/2022/software/CUDA/11.5/lib/libcudart.so.11.0 [LD_LIBRARY_PATH]
│   │   │   └── /usr/lib64/librt.so.1 [ld.so.conf]
│   │   ├── /p/software/jusuf/stages/2022/software/libpciaccess/0.16-GCCcore-11.2.0/lib/libpciaccess.so.0 [LD_LIBRARY_PATH]
│   │   ├── /p/software/jusuf/stages/2022/software/libxml2/2.9.10-GCCcore-11.2.0/lib/libxml2.so.2 [LD_LIBRARY_PATH]
│   │   │   ├── /p/software/jusuf/stages/2022/software/XZ/5.2.5-GCCcore-11.2.0/lib/liblzma.so.5 [LD_LIBRARY_PATH]
│   │   │   └── /p/software/jusuf/stages/2022/software/zlib/1.2.11-GCCcore-11.2.0/lib/libz.so.1 [LD_LIBRARY_PATH]
│   │   ├── /p/software/jusuf/stages/2022/software/XZ/5.2.5-GCCcore-11.2.0/lib/liblzma.so.5 [LD_LIBRARY_PATH]
│   │   ├── /p/software/jusuf/stages/2022/software/zlib/1.2.11-GCCcore-11.2.0/lib/libz.so.1 [LD_LIBRARY_PATH]
│   │   └── /usr/lib64/libnvidia-ml.so.1 [ld.so.conf]
│   ├── /p/software/jusuf/stages/2022/software/CUDA/11.5/lib/libOpenCL.so.1 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/CUDA/11.5/lib/libcudart.so.11.0 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/libpciaccess/0.16-GCCcore-11.2.0/lib/libpciaccess.so.0 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/libxml2/2.9.10-GCCcore-11.2.0/lib/libxml2.so.2 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/XZ/5.2.5-GCCcore-11.2.0/lib/liblzma.so.5 [LD_LIBRARY_PATH]
│   ├── /p/software/jusuf/stages/2022/software/zlib/1.2.11-GCCcore-11.2.0/lib/libz.so.1 [LD_LIBRARY_PATH]
│   ├── /usr/lib64/librt.so.1 [ld.so.conf]
│   ├── /usr/lib64/libnvidia-ml.so.1 [ld.so.conf]
│   └── /usr/lib64/libutil.so.1 [ld.so.conf]
├── /p/software/jusuf/stages/2022/software/libevent/2.1.12-GCCcore-11.2.0/lib/libevent_core-2.1.so.7 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/libevent/2.1.12-GCCcore-11.2.0/lib/libevent_pthreads-2.1.so.7 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/hwloc/2.5.0-GCCcore-11.2.0/lib/libhwloc.so.15 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/CUDA/11.5/lib/libOpenCL.so.1 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/CUDA/11.5/lib/libcudart.so.11.0 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/libpciaccess/0.16-GCCcore-11.2.0/lib/libpciaccess.so.0 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/libxml2/2.9.10-GCCcore-11.2.0/lib/libxml2.so.2 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/XZ/5.2.5-GCCcore-11.2.0/lib/liblzma.so.5 [LD_LIBRARY_PATH]
├── /p/software/jusuf/stages/2022/software/zlib/1.2.11-GCCcore-11.2.0/lib/libz.so.1 [LD_LIBRARY_PATH]
├── /usr/lib64/libnvidia-ml.so.1 [ld.so.conf]
├── /usr/lib64/libutil.so.1 [ld.so.conf]
└── /usr/lib64/librt.so.1 [ld.so.conf]

Not loading the module would leave most of these libraries unresolved, since LD_LIBRARY_PATH is being relied upon (and this is just for the direct dependencies, not including the libraries that OpenMPI would load via plugins).

@eschnett commented Feb 8, 2022

I don't think that LD_LIBRARY_PATH or similar mechanisms are going to work if there are shared libraries with identical names.

MPItrampoline is used in the build process of potentially large applications. It would be convenient if this didn't need to be changed.

MPIwrapper, on the other hand, is usually only used to wrap a single library (an MPI implementation), and this can be controlled much more easily. Often, it would wrap a system-provided MPI implementation.

My suggestion would be to ensure that MPIwrapper is built and installed in such a way that it does not rely on LD_LIBRARY_PATH, nor on modules that need to be loaded, nor other similar mechanisms. If you control how the underlying MPI library is built, then you can also build it as a static library (still using -fPIC), and then MPIwrapper will not need to find it at all.

When I build MPIwrapper, I tend to explicitly pass the location of the desired MPI library, and cmake will then bake this into the generated libraries. Here is an example:

cmake -S . -B mpiwrapper-openmpi \
    -DCMAKE_CXX_COMPILER=/cm/shared/apps/openmpi/gcc-9/64/4.1.0/bin/mpic++ \
    -DCMAKE_Fortran_COMPILER=/cm/shared/apps/openmpi/gcc-9/64/4.1.0/bin/mpifort \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$HOME/src/c/MPIstuff/mpiwrapper-openmpi
cmake --build mpiwrapper-openmpi
cmake --install mpiwrapper-openmpi

@boegel (Member, Author) commented Feb 9, 2022

@eschnett OK, but what if OpenMPI links to UCX? Those libuc*.so libraries need to be found.

Are you implicitly assuming that the libmpi.so of OpenMPI knows where the libraries it depends on are located (via RPATH)?
If so, that's absolutely fine, we can deal with that, EasyBuild has robust support for RPATH linking.

The default mechanism in EasyBuild is currently still to resolve libraries via $LD_LIBRARY_PATH, but if that won't work for MPItrampoline and the MPI libraries it can talk to (and their dependencies), then that's OK (right, @ocaisa?); it just needs to be clear...
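For illustration only, a small shell sketch of what the RPATH route could look like with EasyBuild; the easyconfig name and the readelf check are assumptions of mine, not something prescribed in this PR:

# build OpenMPI with EasyBuild's RPATH linking enabled
eb OpenMPI-4.1.2-GCC-11.2.0.eb --rpath

# after loading the resulting module, check that libmpi.so carries an RPATH/RUNPATH entry
module load OpenMPI/4.1.2-GCC-11.2.0
readelf -d $EBROOTOPENMPI/lib/libmpi.so | grep -E 'RPATH|RUNPATH'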

@eschnett commented Feb 9, 2022

MPItrampoline offers several options to "help" MPIwrapper load the underlying MPI library. The main problem in practice is that one doesn't have any control on how the underlying MPI library is installed on an HPC system.

If you are building e.g. MPICH or OpenMPI yourself, then things are much easier. For example, in Julia we ship a standard MPICH for use with MPItrampoline, and this is built as a static library, and thus MPIwrapper is a single shared library (technically called a "plugin").

There are also various issues specific to macOS. I don't know whether this is a major concern for you. I am not aware of any macOS HPC system, so I didn't put much effort into this. Building MPICH or OpenMPI there should work fine.

There is an environment variable MPITRAMPOLINE_PRELOAD that allows you to define a list of libraries (including their complete paths) that will be loaded before MPIwrapper is loaded. This is intended for MPI libraries that are not installed well, e.g. in case they need librt but don't declare this dependency, and if MPItrampoline loads MPIwrapper into its own namespace (this is disabled by default) there can then be unresolved symbols. In practice, I find that namespaces do not work (probably because MPI libraries are installed in ways not quite designed for that), and that using RTLD_DEEPBIND (which is the default) is sufficient.

I don't know of a way to influence the search path when loading the dependencies of a plugin (i.e. the actual MPI library). It would be straightforward to introduce a new environment variable MPITRAMPOLINE_LD_LIBRARY_PATH that makes MPItrampoline temporarily modify LD_LIBRARY_PATH when loading the plugin. I don't think this will work – I think that the ELF loader takes a snapshot of LD_LIBRARY_PATH very early during startup – but if you can point me to some documentation, I'd be happy to add that feature.

You can set the environment variable MPITRAMPOLINE_VERBOSE=1 when running the application to see more verbose output.

In short:

  • I recommend using LD_LIBRARY_PATH (if at all) only for the main application and MPItrampoline, not for MPIwrapper or the actual MPI library
  • If possible, I would use RPATH for the actual MPI library and all its dependencies
  • If that isn't feasible, then you can try MPITRAMPOLINE_PRELOAD, listing all MPIwrapper dependencies there (this would be quite tedious, though); see the sketch below
  • If it is possible to re-define LD_LIBRARY_PATH while loading the plugin, then I'd be happy to add that feature
  • Building MPItrampoline as a static library might also be a good approach; it's a rather small library compared to an actual MPI library
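For concreteness, a rough shell sketch of the runtime variables mentioned above; all paths are hypothetical placeholders, and the assumption that MPITRAMPOLINE_PRELOAD takes a colon-separated list of full paths is mine, not confirmed here:

# point MPItrampoline at the MPIwrapper plugin and the matching launcher
# (MPITRAMPOLINE_LIB is not needed if MPITRAMPOLINE_DEFAULT_LIB was baked in at build time)
export MPITRAMPOLINE_LIB=$prefix/mpiwrapper-openmpi/lib/libmpiwrapper.so
export MPITRAMPOLINE_MPIEXEC=$prefix/mpiwrapper-openmpi/bin/mpiwrapperexec

# optionally pre-load libraries the wrapped MPI needs but cannot resolve itself
# (assumed to be a colon-separated list of full paths)
export MPITRAMPOLINE_PRELOAD=/usr/lib64/librt.so.1:/example/path/libucp.so.0

# verbose output while running, for debugging
export MPITRAMPOLINE_VERBOSE=1
./my_mpi_app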

As a side note: I tried renaming MPItrampoline's libmpi.so to libmpitrampoline.so, but this doesn't work. There are important applications (I don't recall which, might have been Boost or PETSc) which expect the MPI library to be linked via -lmpi. I would very much like to rename it if that was possible because this would avoid quite a bit of confusion.

If you have further questions I'd be happy to have a Zoom call.

@ocaisa (Member) commented Feb 10, 2022

OK, I think we can probably work with MPITRAMPOLINE_PRELOAD. We can use libtree to check which libraries are resolved by LD_LIBRARY_PATH and preload those. For OpenMPI, we'd also need to check the plugins (I bet there is something similar for MPICH). It might be a long list, but that's OK. I like this idea because it also gives a sensible way to deal with mixing a system installation of MPI inside MPIwrapper with a Gentoo Prefix installation of MPItrampoline (which would be using its own linker, and therefore have different default search paths).
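A rough shell sketch of that idea; it assumes libtree's output format shown above and, again, that MPITRAMPOLINE_PRELOAD accepts a colon-separated list of paths, so treat it as a starting point rather than a recipe:

# collect every library that libtree resolves via LD_LIBRARY_PATH for libmpi.so
# and join them into a colon-separated preload list
preload=$(libtree -p $EBROOTOPENMPI/lib/libmpi.so \
  | grep '\[LD_LIBRARY_PATH\]' \
  | grep -o '/[^ ]*\.so[^ ]*' \
  | sort -u | paste -sd: -)
export MPITRAMPOLINE_PRELOAD="$preload"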

The complexity is definitely in figuring out exactly what to do with MPIwrapper. We can break the effort into two parts: the initial installation of MPItrampoline with an OpenMPI default (built as a component of that installation) should work already, and we can start exploring that now. We can figure out the tricky part of MPIwrapper later.
