In the latest version of QM (21.06.04) I manually added `export OMP_NUM_THREADS=1` in the `~/.bashrc`. This was to remedy issues we have found with parallel (MPI) runs of the simulation codes (at least Quantum ESPRESSO and SIESTA); these work fine in isolation, but have issues in the full build (with all codes installed).
To finalise this, we still need to establish:
- whether this is sufficient to solve the problem and/or whether it has any negative impact on other codes;
- where this should be injected into the build automation (possibly as a task in `marvel-nccr.simulationbase`); see the sketch below.
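For illustration, this is roughly what the injection could look like as a plain shell step (a sketch only; the actual task would presumably live in the `marvel-nccr.simulationbase` role, and the exact mechanism shown here is an assumption):

```bash
# Sketch: idempotently add the setting to the user's ~/.bashrc.
# The grep guard avoids appending the line twice on repeated provisioning runs.
grep -qxF 'export OMP_NUM_THREADS=1' "$HOME/.bashrc" \
    || echo 'export OMP_NUM_THREADS=1' >> "$HOME/.bashrc"
```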
From email discussion:
@giovannipizzi:
A bit more debugging, and possibly the answer to the problem:
I tried to increase the number of CPUs available to the VM and discovered that in the problematic one the actual CPU usage (even when running with `mpirun -np 2`) was 400% rather than 200%; this hinted at concurrent usage of both OpenMP and MPI.
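For reference, one way to reproduce this kind of observation (a sketch; `pw.x` is the Quantum ESPRESSO executable, and the input/output file names are placeholders):

```bash
# Launch a 2-rank MPI run in the background, then look at its CPU usage.
mpirun -np 2 pw.x -in scf.in > scf.out &
top -b -n 1 | grep pw.x           # %CPU near 400 (not ~200) suggests extra OpenMP threads
ps -eLf | grep '[p]w.x' | wc -l   # total thread count across the two MPI ranks
```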
It seems that setting `export OMP_NUM_THREADS=1` before running in the problematic VM fixes the issue (@chris or @iurii, can you confirm?). If this is the case, we can set this in the `.bashrc`.
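In other words, the per-run workaround on the problematic VM amounts to something like (executable and input names are placeholders):

```bash
# Limit every process in this shell session to a single OpenMP thread,
# then launch the MPI run as before.
export OMP_NUM_THREADS=1
mpirun -np 2 pw.x -in scf.in > scf.out
```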
What was still puzzling is that setting various values of `OMP_NUM_THREADS` on the fast machine does not change the behaviour.
I then checked the list of linked dynamic libraries for the two executables; the difference was this single line: `libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0`
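A sketch of how such a comparison can be made with `ldd` (the `pw.x` executable and the output file names are placeholders):

```bash
# List the dynamically linked libraries of the executable on each VM,
# keeping only the library names so the two lists can be diffed cleanly.
ldd "$(which pw.x)" | awk '{print $1}' | sort > libs_slow.txt   # on the problematic VM
ldd "$(which pw.x)" | awk '{print $1}' | sort > libs_fast.txt   # on the fast VM
diff libs_slow.txt libs_fast.txt   # expected difference: libopenblas.so.0
```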
I tried to uninstall libopenblas as follows: `sudo apt-get remove libopenblas-base`
and now QE seems to be back to normal speed!
As a double check, I installed the package on the fast machine: `sudo apt-get update && sudo apt-get install libopenblas-base`
and indeed the `mpirun` runs went back to being very slow.
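To see which of the two states a given VM is in, the package status can be queried directly (a sketch):

```bash
# Query dpkg to see whether the OpenBLAS runtime package is currently installed.
dpkg -s libopenblas-base > /dev/null 2>&1 \
    && echo "libopenblas-base is installed" \
    || echo "libopenblas-base is not installed"
```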
CONCLUSION: it seems the problem is the presence of OpenBLAS, which will try to use as many OpenMP threads as possible unless `export OMP_NUM_THREADS=1` is set. @chris, can you confirm this behaviour?
Since OpenBLAS might be needed by some codes, I think the best approach is to add `export OMP_NUM_THREADS=1` in the `.bashrc`.
@chris: once this is confirmed, can you please summarise the results in the appropriate issues, so that we will find this information in a few years when this problem pops up again? :-)
And also see whether you can prepare a "fixed" Quantum Mobile for the future? (For Iurii's school it is probably OK to use the VM that already works, so there is no need for additional tests.)
@chrisjsewell:
OpenBLAS appears to be installed (only) by Fleur.
From https://groups.google.com/g/openblas-users/c/W6ehBvPsKTw/m/N0_nyMyYlS0J:
"When both OPENBLAS_NUM_THREADS and OMP_NUM_THREADS are set, OpenBLAS will use OPENBLAS_NUM_THREADS. If OPENBLAS_NUM_THREADS is unset, OpenBLAS will try to use OMP_NUM_THREADS."
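Given that precedence, a narrower alternative (a sketch only, not something tested in the VM; executable and input names are placeholders) would be to pin just OpenBLAS and leave `OMP_NUM_THREADS` untouched for codes that use OpenMP deliberately:

```bash
# Pin only OpenBLAS to one thread; OPENBLAS_NUM_THREADS overrides OMP_NUM_THREADS.
export OPENBLAS_NUM_THREADS=1
mpirun -np 2 pw.x -in scf.in > scf.out
```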
Nicola:
My understanding is that the MPI parallelization is already hard-coded in optimal ways, and having libraries spin off threads is actually counterproductive and interferes.
No idea whether this is still true.