Juropatest

The prototype for the replacement of Juropa can currently be accessed via juropatest.fz-juelich.de.

The system is equipped with 70 compute nodes, each of which has two 14-core (!) Xeon E5-2695 v3 processors at 2.3 GHz, which support FMA and simultaneous multithreading. As a result, the theoretical peak performance per core is 18.4 GFlop/s (but this is a very synthetic number).
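To put these figures in context, the following sketch derives the core counts and the aggregate peak from the numbers quoted above. It takes the 18.4 GFlop/s per-core figure at face value; the variable names are illustrative only, and the results are just as synthetic as the per-core number.

```shell
#!/bin/sh
# Figures quoted in the text above.
nodes=70
sockets_per_node=2
cores_per_socket=14
peak_per_core="18.4"   # GFlop/s, taken at face value

cores_per_node=$((sockets_per_node * cores_per_socket))
total_cores=$((nodes * cores_per_node))

# Floating-point arithmetic via awk; LC_ALL=C keeps "." as the decimal separator.
node_peak=$(LC_ALL=C awk -v c="$cores_per_node" -v p="$peak_per_core" \
  'BEGIN { printf "%.1f", c * p }')
system_peak=$(LC_ALL=C awk -v n="$nodes" -v np="$node_peak" \
  'BEGIN { printf "%.1f", n * np / 1000 }')

echo "cores per node:   $cores_per_node"
echo "cores in total:   $total_cores"
echo "peak per node:    $node_peak GFlop/s"
echo "peak for system:  $system_peak TFlop/s"
```

This gives 28 cores per node and 1960 cores in total, with a nominal peak of 515.2 GFlop/s per node and about 36.1 TFlop/s for the whole system.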

Compilation

The first step is to load the Intel environment, which provides the Intel compiler.

$ module load intel
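After loading the module, it can be worth confirming that the compiler wrappers used in the configure lines further down (mpicc and ifort) are actually on the PATH. The helper below is a hypothetical convenience function, not part of tmLQCD or the module system.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool is on the PATH.
# Returns 0 if all tools were found and 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run this after "module load intel".
if require_tools mpicc ifort; then
  echo "Intel environment looks usable."
else
  echo "Some tools are missing; was the intel module loaded?"
fi
```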

Pure MPI

To configure tmLQCD (in pure MPI mode and with 4D parallelization), we proceed as follows:

$CODEPATH/configure --disable-omp --enable-mpi --with-mpidimension=4 --enable-alignment=32 \
--with-lapack="-L/usr/local/software/juropatest/Stage1/software/MPI/intel/2015.0.090/impi/5.0.1.035/imkl/11.2.0.090/mkl/lib/intel64 \
-lmkl_blas95_lp64 -lmkl_avx2 -lmkl_core -lmkl_sequential -lmkl_intel_lp64" \
--with-limedir=$yourlimedir --with-lemondir=$yourlemondir --disable-sse2 --disable-sse3 --enable-gaugecopy \
--enable-halfspinor CC=mpicc CFLAGS="-fma -axCORE-AVX2 -O3 -std=c99" F77=ifort

and in one line for easy copying:

$CODEPATH/configure --disable-omp --enable-mpi --with-mpidimension=4 --enable-alignment=32 --with-lapack="-L/usr/local/software/juropatest/Stage1/software/MPI/intel/2015.0.090/impi/5.0.1.035/imkl/11.2.0.090/mkl/lib/intel64 -lmkl_blas95_lp64 -lmkl_avx2 -lmkl_core -lmkl_sequential -lmkl_intel_lp64" --with-limedir=$yourlimedir --with-lemondir=$yourlemondir --disable-sse2 --disable-sse3 --enable-gaugecopy --enable-halfspinor CC=mpicc CFLAGS="-fma -axCORE-AVX2 -O3 -std=c99" F77=ifort
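Because CFLAGS and the LAPACK link line both contain spaces, each must reach configure as a single argument. The sketch below is one way to keep the quoting explicit in a small wrapper script. The function name run_configure, the DRY_RUN switch and the default directories are assumptions made for illustration; only the flags and the MKL path come from the command above.

```shell
#!/bin/sh
# Placeholder locations (assumptions); point these at your own checkout and installs.
CODEPATH=${CODEPATH:-$HOME/tmLQCD}
yourlimedir=${yourlimedir:-$HOME/lime}
yourlemondir=${yourlemondir:-$HOME/lemon}
MKL_LIB=/usr/local/software/juropatest/Stage1/software/MPI/intel/2015.0.090/impi/5.0.1.035/imkl/11.2.0.090/mkl/lib/intel64

# Hypothetical wrapper: builds the argument list once, with every
# space-containing value quoted so that it stays a single argument.
run_configure() {
  set -- \
    --disable-omp --enable-mpi --with-mpidimension=4 --enable-alignment=32 \
    "--with-lapack=-L$MKL_LIB -lmkl_blas95_lp64 -lmkl_avx2 -lmkl_core -lmkl_sequential -lmkl_intel_lp64" \
    "--with-limedir=$yourlimedir" "--with-lemondir=$yourlemondir" \
    --disable-sse2 --disable-sse3 --enable-gaugecopy --enable-halfspinor \
    CC=mpicc "CFLAGS=-fma -axCORE-AVX2 -O3 -std=c99" F77=ifort

  if [ "${DRY_RUN:-1}" = 1 ]; then
    # Default: only print the arguments, one per line, for inspection.
    printf '%s\n' "$@"
  else
    "$CODEPATH/configure" "$@"
  fi
}

run_configure
```

With the default DRY_RUN=1 the script only lists the arguments, so the quoting can be checked before anything is configured; setting DRY_RUN=0 runs configure with exactly the same list.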

Hybrid

Each node has 2 × 14 = 28 cores, and the pesky factor of 7 in this number rarely matches typical lattice extents. It might therefore make sense to use the hybrid code, with or without overlapping of communication and computation (to be tested!). The overlapping variant does three volume loops and is not necessarily faster because of this overhead.
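To see where the factor of 7 can be placed, the sketch below lists every way to split the 28 cores of a node into MPI tasks and threads per task. The function name list_splits is illustrative, simultaneous multithreading is ignored, and none of these splits has been benchmarked here.

```shell
#!/bin/sh
# Hypothetical helper: print all (tasks, threads) pairs whose product
# equals the number of cores given as the first argument.
list_splits() {
  cores=$1
  tasks=1
  while [ "$tasks" -le "$cores" ]; do
    if [ $((cores % tasks)) -eq 0 ]; then
      echo "$tasks MPI tasks x $((cores / tasks)) threads"
    fi
    tasks=$((tasks + 1))
  done
}

# 2 sockets x 14 cores per node, as described above.
list_splits 28
```

For 28 cores there are six possible splits. Those with 1, 2 or 4 MPI tasks per node keep the task count a power of two and move the factor of 7 into the thread count (28, 14 or 7 threads per task).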
