add documentation on heterogeneous clusters (WIP) #448

122 changes: 122 additions & 0 deletions docs/Heterogeneous_clusters.rst
@@ -0,0 +1,122 @@
.. _heterogeneous_clusters:

Heterogeneous clusters
=======================================

This page provides an overview of the different ways in which EasyBuild can be set up in heterogeneous clusters.

Here, by "heterogeneous clusters" we mean clusters with nodes that support different instruction sets, either of the
same family (e.g. Intel "broadwell", "skylake") or different ones (e.g. Intel "skylake", AMD "epyc"). There are other
ways in which a cluster can be heterogeneous, e.g. different OS versions, and some of the options covered here can be
applied to those, but they will not be covered explicitly.

For some time now, new instruction sets have been how the most significant performance differences between CPU
generations are realized. The most common example is vectorization extensions (e.g. SSE, AVX), which differ in vector
width and supported operations, so building software that takes advantage of them is crucial for HPC.

.. contents::
    :depth: 3
    :backlinks: none

.. _heterogenous_clusters_defaults:

Default behaviour of EasyBuild in heterogeneous clusters
--------------------------------------------------------

By default, EasyBuild optimizes builds for the CPU architecture of the build host, by instructing the compiler to
generate instructions for the highest instruction set supported by the build host's processor
(cf. :ref:`controlling_compiler_optimization_flags`).

In a heterogeneous cluster, this has two consequences. First, the software may not run on nodes that do not support
the build host's instruction set: it exits with an ``Illegal instruction`` error when built with GNU toolchains, or
with ``Please verify that both the operating system and the processor support X, Y and Z instructions`` when built
with Intel toolchains. Second, it will not be fully optimized when running on nodes that support higher instruction
sets than those of the build host. The first problem can be solved by building with ``--optarch=GENERIC``, but that
makes the second problem even worse.

(With an Intel toolchain, the problem can be reduced by generating multiple code paths with the ``-ax`` compiler
option in ``--optarch``, but no such option is available (yet) in the GNU toolchains. The Intel Math Kernel Library
automatically dispatches optimized versions of routines according to the execution node's instruction set, and
OpenBLAS can also be built for multiple instruction sets, but that is not its default behaviour.)
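
As a rough sketch of the ``-ax`` approach, the per-compiler form of ``--optarch`` can be set through its
environment-variable equivalent, ``EASYBUILD_OPTARCH``. The chosen instruction sets (an AVX baseline with extra
AVX2 and AVX-512 code paths) are only an assumption for illustration, and the flags are written without the leading
dash on the assumption that EasyBuild prepends it; check this against the EasyBuild version in use.

```shell
# Illustrative values only: Intel builds get an AVX baseline plus alternative
# AVX2/AVX-512 code paths; GCC builds fall back to generic x86-64 code.
# Assumed value layout: <compiler>:<flags>, with entries separated by ';'.
export EASYBUILD_OPTARCH='Intel:xAVX -axCORE-AVX2,CORE-AVX512;GCC:march=x86-64 -mtune=generic'
```

With such a setting, software built with GNU toolchains remains generic, so it only helps where Intel toolchains
dominate.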

The solution is then to build multiple copies of each software package, at least of those where performance is
crucial. This is easily achieved by running EasyBuild on each type of node. The caveat is where exactly to install
the copies for different architectures so that they can be loaded, with their dependencies, and used effectively
across the cluster by all users.
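
Deciding where each copy goes usually starts with a label for the node type. The following is a minimal sketch,
assuming Linux nodes and a coarse classification by vector extension; the function names and the labels (``avx512``,
``avx2``, ``avx``, ``generic``) are hypothetical, and a site may well prefer microarchitecture names instead.

```shell
# Hypothetical helper: map a space-separated CPU flag list to a coarse label.
arch_label_from_flags() {
    flags=" $1 "   # pad with spaces so whole words can be matched
    case "$flags" in
        *" avx512f "*) echo avx512 ;;
        *" avx2 "*)    echo avx2 ;;
        *" avx "*)     echo avx ;;
        *)             echo generic ;;
    esac
}

# On Linux, read the flags of the current node from /proc/cpuinfo.
current_arch_label() {
    arch_label_from_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
}
```

Running ``current_arch_label`` on each node type then gives the label under which that node's builds can be
installed.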

.. _heterogeneous_clusters_visibility:

Visibility of architectures in heterogeneous clusters
-----------------------------------------------------

One way of distinguishing between the many alternatives for using EasyBuild in a heterogeneous cluster is whether
each host sees only the software compiled for its own architecture (plus any software compiled for ``GENERIC``) or
sees everything.

By mounting architecture-dependent targets on the same mountpoint on every host, the configuration becomes very
similar to that of a homogeneous cluster, except that each (non-``GENERIC``) software package still needs to be
built for each architecture.
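
A sketch of what this could look like follows, assuming per-architecture trees named ``/apps/easybuild-<arch>`` on
shared storage and a common mountpoint ``/apps/easybuild``. Both paths and the function name are hypothetical, and
the function only prints the command, since mounting requires root privileges and is normally handled by the site's
provisioning system.

```shell
# Hypothetical layout: /apps/easybuild-<arch> holds the tree for one node type,
# and every node exposes its own tree at the same path, /apps/easybuild.
arch_mount_command() {
    arch="$1"
    # Dry run: print the bind mount instead of executing it.
    echo "mount --bind /apps/easybuild-${arch} /apps/easybuild"
}
```

With such a setup, the EasyBuild configuration itself can be identical on all nodes.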

This can be more robust, in the sense that from the point of view of each node, it looks like a homogeneous cluster.
On the other hand, it is less flexible, as there are situations where it can be useful to load software built for
another instruction set (usually, a subset).

In order to maximize visibility and flexibility, all architectures can be made visible, with the default
architecture controlled by an architecture environment variable inserted into at least ``EASYBUILD_INSTALLPATH``
and ``MODULEPATH``.
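
The idea can be sketched as follows. The variable name ``EB_ARCH``, the ``/apps/easybuild`` prefix and the function
name are assumptions made for illustration; the ``modules/all`` subdirectory is where EasyBuild places module files
under the install path by default.

```shell
# Hypothetical names: EB_ARCH and the /apps/easybuild prefix are site choices.
setup_arch_paths() {
    EB_ARCH="$1"
    export EB_ARCH
    export EASYBUILD_INSTALLPATH="/apps/easybuild/${EB_ARCH}"
    # Prepend this architecture's modules, keeping any existing MODULEPATH entries.
    export MODULEPATH="${EASYBUILD_INSTALLPATH}/modules/all${MODULEPATH:+:${MODULEPATH}}"
}
```

A login script could call ``setup_arch_paths`` with the label of the current node, while a user who wants software
built for another instruction set can call it again with a different label.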

.. _heterogenous_clusters_reducing_clutter:

Reducing clutter in heterogeneous clusters
------------------------------------------

In either of the two options above, multiple paths can be used to separate ``GENERIC`` software that only needs to be
compiled once, or not at all (template libraries and software only available in binary form), from software that needs
to be optimized for each architecture.
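
One possible arrangement is sketched below, assuming a ``generic`` tree next to the per-architecture trees under a
common prefix. The prefix, the ``EB_PREFIX`` variable and the function names are hypothetical.

```shell
# Hypothetical prefix; adjust to the site layout.
EB_PREFIX="${EB_PREFIX:-/apps/easybuild}"

# Point EasyBuild at the generic tree or at the tree of one architecture.
select_build_tree() {
    case "$1" in
        generic)
            export EASYBUILD_OPTARCH=GENERIC
            export EASYBUILD_INSTALLPATH="${EB_PREFIX}/generic"
            ;;
        *)
            unset EASYBUILD_OPTARCH   # back to the default: optimize for the build host
            export EASYBUILD_INSTALLPATH="${EB_PREFIX}/$1"
            ;;
    esac
}

# Users see both trees: the architecture-specific one first, then the generic one.
module_path_for() {
    echo "${EB_PREFIX}/$1/modules/all:${EB_PREFIX}/generic/modules/all"
}
```

Listing the architecture-specific tree first means an optimized build, where one exists, takes precedence over a
generic build of the same module.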

One specific case is the Hierarchical Module Naming Scheme (HMNS): the packages in the ``Core`` level are good
candidates for a single ``GENERIC`` build, but this needs to be done manually. Since most of the modules at this
level are typically built as dependencies, this option implies building all ``Core`` software separately with
``--optarch=GENERIC`` before building the applications that depend on it.

Alternatively, instead of using the hierarchy to decide what to build for a generic architecture, one could decide
based on the toolchain, e.g. by associating ``--optarch=GENERIC`` with the ``GCCcore`` toolchain, which would work for
any module naming scheme.
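
A wrapper script could make this decision from the easyconfig file name, which conventionally has the form
``<name>-<version>-<toolchain>-<toolchain version>.eb``. The sketch below is a heuristic under that assumption, and
the function name is hypothetical; it does not inspect the easyconfig contents.

```shell
# Hypothetical helper: choose extra eb arguments from the toolchain that
# appears in an easyconfig file name.
optarch_args_for() {
    case "$(basename "$1")" in
        *-GCCcore-*) echo "--optarch=GENERIC" ;;  # build once, generically
        *)           echo "" ;;                   # default: optimize for the build host
    esac
}

# Possible use in a build wrapper (not executed here):
#   eb $(optarch_args_for "$ec") "$ec"
```

Easyconfigs for other toolchains are then built with EasyBuild's default host-specific optimization.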

.. _heterogenous_clusters_users:

User-built software in heterogeneous clusters
---------------------------------------------

EasyBuild allows users to leverage the central software, modules, sources and easyconfigs repositories and build
additional or customized software in their own home directories. However, in this case the use of mountpoints
described above would be cumbersome at best. One solution is to add an architecture-dependent path to
``--subdir-user-modules`` (link to documentation on framework PR #2395?).
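
As a sketch of that idea, the option can be set through its environment-variable equivalent,
``EASYBUILD_SUBDIR_USER_MODULES``, with an architecture label in the path. The ``.local/easybuild/<arch>/modules``
layout and the function names are hypothetical. Note that here the shell expands the label when the configuration
is set, so EasyBuild receives a plain path relative to the user's home directory.

```shell
# Hypothetical layout: one user module subdirectory (relative to $HOME) per architecture.
user_modules_subdir() {
    echo ".local/easybuild/$1/modules"
}

# Set the environment-variable form of --subdir-user-modules for one architecture.
configure_user_modules() {
    EASYBUILD_SUBDIR_USER_MODULES="$(user_modules_subdir "$1")"
    export EASYBUILD_SUBDIR_USER_MODULES
}
```

Calling ``configure_user_modules`` with the label of the build node keeps user-built modules for different
architectures apart.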

.. _heterogenous_clusters_examples:

Examples of EasyBuild configurations in sites with heterogeneous clusters
-------------------------------------------------------------------------

While the alternatives mentioned above should be simple to implement and solve the main issues in heterogeneous
clusters, many sites using EasyBuild have more sophisticated configurations, either with better solutions to these
issues or addressing further, possibly site-specific, issues.

Regardless, it is useful to look at examples of how sites are using EasyBuild with heterogeneous clusters (and why).

HPC UGent
~~~~~~~~~

...

sciCORE Basel
~~~~~~~~~~~~~

...

Compute Canada
~~~~~~~~~~~~~~

...

...