Bartosz Kostrzewa edited this page Jan 7, 2015 · 11 revisions

The prototype for the replacement of Juropa can currently be accessed via juropatest.fz-juelich.de.

The system has 70 compute nodes, each with two 14-core (!) Xeon E5-2695 v3 processors at 2.3 GHz that support FMA and simultaneous multithreading. The resulting theoretical peak performance per core is 18.4 GFlop/s, which corresponds to 8 floating-point operations per cycle at 2.3 GHz (but this is a very synthetic number).
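For orientation, the per-core figure can be scaled up to a node and to the full prototype. The following is only a back-of-the-envelope sketch: it takes the 18.4 GFlop/s per core quoted above at face value, and the variable names are made up for this illustration.

```shell
# Hardware numbers taken from the description above.
nodes=70
sockets_per_node=2
cores_per_socket=14
peak_per_core=18.4   # GFlop/s, the synthetic per-core figure quoted above

cores_per_node=$((sockets_per_node * cores_per_socket))
total_cores=$((nodes * cores_per_node))

# The shell only does integer arithmetic, so awk handles the floating-point part.
peak_per_node=$(awk -v c="$cores_per_node" -v p="$peak_per_core" 'BEGIN { printf "%.1f", c * p }')
peak_total=$(awk -v c="$total_cores" -v p="$peak_per_core" 'BEGIN { printf "%.1f", c * p / 1000 }')

echo "cores per node: $cores_per_node, cores in total: $total_cores"
echo "peak per node:  $peak_per_node GFlop/s"
echo "peak in total:  $peak_total TFlop/s"
```

These are upper bounds in the same synthetic sense as the per-core number; real application performance will be well below them.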

Compilation

The first step is to load the Intel environment, which provides the Intel compiler.

$ module load intel
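Before running configure it can be worth checking that the compilers are actually on the PATH. The sketch below assumes that the module provides icc, ifort and an mpicc wrapper; the exact set of tools depends on the installation, and check_tools is a helper name invented here.

```shell
# Report where each requested tool is found, or that it is missing.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools icc ifort mpicc
```

If one of the tools is reported as not found, the configure line further down will fail at the compiler check.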

Pure MPI

To configure tmLQCD (in pure MPI mode and with 4D parallelization), we proceed as follows:

$CODEPATH/configure --disable-omp --enable-mpi --with-mpidimension=4 --enable-alignment=32 \
--with-lapack="-L/usr/local/software/juropatest/Stage1/software/MPI/intel/2015.0.090/impi/5.0.1.035/imkl/11.2.0.090/mkl/lib/intel64 \
-lmkl_blas95_lp64 -lmkl_avx2 -lmkl_core -lmkl_sequential -lmkl_intel_lp64" \
--with-limedir=$yourlimedir --with-lemondir=$yourlemondir --disable-sse2 --disable-sse3 --enable-gaugecopy \
--enable-halfspinor CC=mpicc CFLAGS="-fma -axCORE-AVX2 -O3 -std=c99" F77=ifort

and in one line for easy copying:

$CODEPATH/configure --disable-omp --enable-mpi --with-mpidimension=4 --enable-alignment=32 --with-lapack="-L/usr/local/software/juropatest/Stage1/software/MPI/intel/2015.0.090/impi/5.0.1.035/imkl/11.2.0.090/mkl/lib/intel64 -lmkl_blas95_lp64 -lmkl_avx2 -lmkl_core -lmkl_sequential -lmkl_intel_lp64" --with-limedir=$yourlimedir --with-lemondir=$yourlemondir --disable-sse2 --disable-sse3 --enable-gaugecopy --enable-halfspinor CC=mpicc CFLAGS="-fma -axCORE-AVX2 -O3 -std=c99" F77=ifort
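Note that the CFLAGS value contains spaces, so it has to reach configure as a single argument. The small sketch below illustrates how the shell splits the flags with and without quotes; count_args is a throwaway helper defined only for this demonstration and is not part of tmLQCD or configure.

```shell
# Print how many separate arguments the shell hands to a command.
count_args() {
  echo "$#"
}

# Without quotes, every flag after the first becomes its own argument.
unquoted=$(count_args CFLAGS=-fma -axCORE-AVX2 -O3 -std=c99)

# With quotes, the whole flag list stays attached to CFLAGS.
quoted=$(count_args CFLAGS="-fma -axCORE-AVX2 -O3 -std=c99")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted argument"
```

In the unquoted case configure would see -axCORE-AVX2, -O3 and -std=c99 as options of its own instead of as compiler flags.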

Hybrid

Each node has 28 = 2 · 2 · 7 cores, and typical lattice extents are not divisible by 7. Because of this pesky factor of 7 it might make sense to use the hybrid (MPI + OpenMP) code, either with or without overlapping of communication and computation (to be tested!). The overlapping variant performs three volume loops and is not necessarily faster because of this overhead.
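To see which hybrid layouts fill a node completely, one can list all ways of splitting the 28 cores into MPI tasks and OpenMP threads. This is a plain arithmetic sketch, list_layouts is a hypothetical helper, and which layout performs best remains to be tested.

```shell
cores_per_node=28   # 2 sockets x 14 cores

# Print every (MPI tasks per node) x (OpenMP threads per task) pair
# whose product equals the given number of cores.
list_layouts() {
  cores=$1
  tasks=1
  while [ "$tasks" -le "$cores" ]; do
    if [ $((cores % tasks)) -eq 0 ]; then
      echo "$tasks tasks x $((cores / tasks)) threads"
    fi
    tasks=$((tasks + 1))
  done
}

list_layouts "$cores_per_node"
```

Layouts with 1, 2 or 4 tasks per node move the factor of 7 into the thread count (28, 14 or 7 threads), so that the MPI process grid only contains powers of two.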

Overlapping

No Overlapping