
Eigen compatibility with OpenACC #311

Closed
iomaganaris opened this issue Apr 27, 2020 · 1 comment · Fixed by #728
Labels: bug (Something isn't working), codegen (Code generation backend), gpu, solver (Solver and numerical methods)


iomaganaris commented Apr 27, 2020

This issue was raised when running nrnivmodl-core on the following ModelDB model https://senselab.med.yale.edu/ModelDB/ShowModel?model=19176&file=%2fHCN2k%2fhcn2.mod#tabs-2 using PGI 19.4 and PGI 19.10 with the OpenACC backend code generated by NMODL.
The generated C++ file for hcn2.mod contains a call to the following Eigen solver, produced by the translation of the DERIVATIVE block:

X = Eigen::PartialPivLU<Eigen::Ref<Eigen::Matrix<double, 4, 4>>>(Jm).solve(F);
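Standalone, this pattern is equivalent to the following minimal host-side sketch (Jm, F and X mirror the generated names; the values are placeholders):

#include <Eigen/Dense>

int main() {
    // Minimal reproduction of the generated pattern: an in-place
    // PartialPivLU over an Eigen::Ref, followed by a solve.
    Eigen::Matrix<double, 4, 4> J = Eigen::Matrix<double, 4, 4>::Identity();
    Eigen::Matrix<double, 4, 1> F = Eigen::Matrix<double, 4, 1>::Ones();
    Eigen::Ref<Eigen::Matrix<double, 4, 4>> Jm(J);
    Eigen::Matrix<double, 4, 1> X =
        Eigen::PartialPivLU<Eigen::Ref<Eigen::Matrix<double, 4, 4>>>(Jm).solve(F);
    return X.allFinite() ? 0 : 1;
}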

Compiling the generated file with pgc++ (with -Minfo=acc added to the compilation flags) produces the following warnings:

PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 63)
std::exception::exception():
     22, include "multicore.hpp"
          32, include "membfunc.hpp"
               31, include "vector"
                    61, include "allocator.h"
                         46, include "c++allocator.h"
                              33, include "new_allocator.h"
                                   33, include "new"
                                        40, include "exception"
                                             63, Generating implicit acc routine seq
PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 63)
std::exception::exception() [subobject]:
      0, Generating implicit acc routine seq
PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 57)
std::bad_alloc::bad_alloc():
     22, include "multicore.hpp"
          32, include "membfunc.hpp"
               31, include "vector"
                    61, include "allocator.h"
                         46, include "c++allocator.h"
                              33, include "new_allocator.h"
                                   33, include "new"
                                        57, Generating implicit acc routine seq

This was caused by the exception thrown in https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/util/Memory.h#L70.
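That helper looks roughly like this (paraphrased from Eigen's Memory.h at the time; the exact code may differ):

EIGEN_DEVICE_FUNC inline void throw_std_bad_alloc()
{
  #ifdef EIGEN_EXCEPTIONS
    throw std::bad_alloc(); // the throw drags std::exception's vtable (a static) into device code
  #endif
}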
After working around this by commenting out the problematic line, another issue appeared regarding std::atomic (shown below), coming from https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/products/Parallelizer.h#L14; it was fixed by adding the -DEIGEN_HAS_CXX11_ATOMIC=0 compiler flag to pgc++.

PGCC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected load/store type (x86_64/core/mod2c/hcn2.cpp: 1)
std::__atomic_base<int>::store(int, std::memory_order):
     32, include "LU"
          11, include "Core"
              297, include "Parallelizer.h"
                    14, include "atomic"
                         41, include "atomic_base.h"
                             468, Generating implicit acc routine seq
                                  Generating acc routine seq
                                  Generating Tesla code
PGCC-F-0704-Compilation aborted due to previous errors. (x86_64/core/mod2c/hcn2.cpp
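The fix is just a define on the compile line; a minimal invocation (project-specific flags and include paths elided, taken from the full command further below) looks like:

pgc++ -acc --c++14 -Minfo=acc -DEIGEN_HAS_CXX11_ATOMIC=0 -c x86_64/core/mod2c/hcn2.cpp -o x86_64/core/build/hcn2.o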

After these fixes, there was an issue coming from the LLVM-based pgc++ compiler, so we tried the nollvm backend.

Eigen::EigenBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double, double>, const Eigen::Transpose<const Eigen::Block<const Eigen::Block<Eigen::Block<Eigen::Ref<Eigen::Matrix<double, (int)4, (int)4, (int)0, (int)4, (int)4>, (int)0, Eigen::OuterStride<(int)-1>>, (int)4, (int)1, (bool)1>, (int)-1, (int)1, (bool)0>, (int)1, (int)1, (bool)0>>, const Eigen::Block<const Eigen::Block<Eigen::Block<Eigen::Ref<Eigen::Matrix<double, (int)4, (int)4, (int)0, (int)4, (int)4>, (int)0, Eigen::OuterStride<(int)-1>>, (int)1, (int)4, (bool)0>, (int)1, (int)-1, (bool)0>, (int)1, (int)1, (bool)0>>>::cols() const:
     32, include "LU"
          11, include "Core"
              240, include "EigenBase.h"
                    62, Generating implicit acc routine seq
                        Generating acc routine seq
                        Generating Tesla code
              259, include "NoAlias.h"
              261, include "Matrix.h"
              265, include "CwiseUnaryOp.h"
              271, include "Stride.h"
              273, include "Map.h"
              275, include "Block.h"
              279, include "Transpose.h"
              283, include "Redux.h"
              289, include "Solve.h"
              291, include "SolverBase.h"
              293, include "Transpositions.h"
pgc++-Fatal-/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/19.4/bin/pggpp2-llvm TERMINATED by signal 11
Arguments to /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/19.4/bin/pggpp2-llvm
/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/19.4/bin/pggpp2-llvm x86_64/core/mod2c/hcn2.cpp -opt 2 -terse 1 -inform warn -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -vect 56 -y 34 16 -x 34 0x8 -x 32 25952256 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 59 4 -x 129 2 -tp skylake -x 120 0x1000 -astype 0 -x 121 1 -fn x86_64/core/mod2c/hcn2.cpp -il /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/773510/pgc++uoOV8vHgaqXY.il/hcn2.il -inlib /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/773510/pgc++0oOVCczrYGzB.ext -insize 200 -x 221 25000 -x 222 5 -x 115 10 -x 14 32 -x 117 0x200 -x 123 0x80000000 -x 123 4 -x 119 0x20 -def __pgnu_vsn=80300 -x 70 0x40000000 -x 183 4 -x 121 0x800 -x 6 0x20000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -x 249 70 -x 120 0x200000 -x 70 0x40000000 -x 164 0x800000 -x 39 4 -x 68 0x1 -accel tesla -accel host -x 186 0x80000 -x 180 0x4000400 -x 163 0x1 -cudaver 10010 -x 176 0x100 -cudacap 70 -x 121 0xc00 -x 194 0x40000 -x 186 0x80 -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -y 189 0x4000000 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/2019/cuda/10.1 -x 9 1 -x 72 0x1 -x 136 0x11 -x 37 0x481000 -mp -x 69 0x200 -x 69 0x400 -x 69 2 -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 16384 -x 162 16384 -x 62 8 -gnuvsn 80300 -x 69 0x200 -x 123 0x400 -cmdline '+pgc++ x86_64/core/mod2c/hcn2.cpp -O2 -Mvect=simd -acc --diag_suppress 177 -mp --c++14 -Minline=size:200,levels:10 -Minfo=acc -DEIGEN_HAS_CXX11_ATOMIC=0 -DPG_ACC_BUGS -DCUDA_PROFILING -DCORENEURON_BUILD -DHAVE_MALLOC_H -DSWAP_ENDIAN_DISABLE_ASM -DEIGEN_DONT_VECTORIZE=1 -DNRNMPI=1 -DLAYOUT=0 -DDISABLE_HOC_EXP -DENABLE_SPLAYTREE_QUEUING -I/gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/CoreNeuron/build_gpu_sympy/install/include -I/gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/CoreNeuron/build_gpu_sympy/install/include/coreneuron/utils/randoms -I/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/tools/2020-02-01/linux-rhel7-x86_64/gcc-8.3.0/hpe-mpi-2.21-7pbszh6v5u/include -fPIC -c -o x86_64/core/build/hcn2.o' -asm /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/773510/pgc++uoOV8WmKrzY7.ll
make: *** [x86_64/core/build/hcn2.o] Error 127

The final issue we came across was the following:

PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 336)
void Eigen::internal::gemv_dense_selector<(int)2, (int)1, (bool)1>::run<Eigen::Transpose<const Eigen::Ref<Eigen::Matrix<double, (int)-1, (int)-1, (int)0, (int)-1, (int)-1>, (int)0, Eigen::OuterStride<(int)-1>>>, Eigen::Transpose<const Eigen::Block<const Eigen::Ref<Eigen::Matrix<double, (int)-1, (int)-1, (int)0, (int)-1, (int)-1>, (int)0, Eigen::OuterStride<(int)-1>>, (int)1, (int)-1, (bool)0>>, Eigen::Transpose<Eigen::Block<Eigen::Ref<Eigen::Matrix<double, (int)-1, (int)-1, (int)0, (int)-1, (int)-1>, (int)0, Eigen::OuterStride<(int)-1>>, (int)1, (int)-1, (bool)0>>>(const T1 &, const T2 &, T3 &, const T3::Scalar &):
     32, include "LU"
          11, include "Core"
              288, include "GeneralProduct.h"
                   309, Generating implicit acc routine seq
              292, include "PermutationMatrix.h"
              294, include "TriangularMatrix.h"
              317, include "VectorwiseOp.h"
Unimplemented opcode: 0
PGCC-F-0000-Internal compiler error. Unimplemented opcode.       4  (x86_64/core/mod2c/hcn2.cpp: 46)
PGCC/x86 Linux 19.10-0: compilation aborted
make: *** [x86_64/core/build/hcn2.o] Error 2

For this last issue no solution has been found.
To reproduce all the issues on a GPU node:

module load git
git clone https://github.com/BlueBrain/CoreNeuron.git
cd CoreNeuron
mkdir build_gpu_sympy
cd build_gpu_sympy
module load nvhpc cuda hpe-mpi boost python-dev flex bison cmake
cmake .. -DCMAKE_INSTALL_PREFIX=./install -DCORENRN_ENABLE_NMODL=ON -DCORENRN_ENABLE_GPU=ON -DCORENRN_NMODL_FLAGS="sympy --analytic" -DNRN_ENABLE_CORENEURON=ON -DCMAKE_BUILD_TYPE=Release
make -j36
cd HCN2k
mkdir mod_eigen
cp hcn2.mod mod_eigen
<install-dir>/bin/nrnivmodl -coreneuron mod_eigen

The CoreNeuron and NMODL master branches were used.

@iomaganaris iomaganaris added codegen Code generation backend solver Solver and numerical methods gpu labels Apr 27, 2020

iomaganaris commented Jun 3, 2020

Following the incompatibilities of Eigen with OpenACC, I started investigating whether Eigen can be called from CUDA kernels. To do this I am using a simple example with the Eigen::PartialPivLU solver, created by @cattabiani, on top of which I added a CUDA kernel that runs the same solver.
My attempts are currently WIP here.
During development I faced four issues:

  1. Compilation issues when adding the Eigen::PartialPivLU solver to the CUDA kernel. To get the code to compile I needed to make the following changes in the Eigen source code (a kernel sketch using these decorated entry points follows this list):
diff --git a/Eigen/src/Core/SolverBase.h b/Eigen/src/Core/SolverBase.h
index 501461042..e7d5ca5a3 100644
--- a/Eigen/src/Core/SolverBase.h
+++ b/Eigen/src/Core/SolverBase.h
@@ -94,7 +94,7 @@ class SolverBase : public EigenBase<Derived>
     SolverBase()
     {}

-    ~SolverBase()
+    EIGEN_DEVICE_FUNC ~SolverBase()
     {}

     using Base::derived;
@@ -102,7 +102,7 @@ class SolverBase : public EigenBase<Derived>
     /** \returns an expression of the solution x of \f$ A x = b \f$ using the current decomposition of A.
       */
     template<typename Rhs>
-    inline const Solve<Derived, Rhs>
+    EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const Solve<Derived, Rhs>
     solve(const MatrixBase<Rhs>& b) const
     {
       internal::solve_assertion<typename internal::remove_all<Derived>::type>::template run<false>(derived(), b);
diff --git a/Eigen/src/LU/PartialPivLU.h b/Eigen/src/LU/PartialPivLU.h
index b8938013a..dafcab719 100644
--- a/Eigen/src/LU/PartialPivLU.h
+++ b/Eigen/src/LU/PartialPivLU.h
@@ -173,7 +173,7 @@ template<typename _MatrixType> class PartialPivLU
       * \sa TriangularView::solve(), inverse(), computeInverse()
       */
     template<typename Rhs>
-    inline const Solve<PartialPivLU, Rhs>
+    EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const Solve<PartialPivLU, Rhs>
     solve(const MatrixBase<Rhs>& b) const;
     #endif

@@ -593,7 +593,7 @@ struct Assignment<DstXprType, Inverse<PartialPivLU<MatrixType> >, internal::assi
   * \sa class PartialPivLU
   */
 template<typename Derived>
-inline const PartialPivLU<typename MatrixBase<Derived>::PlainObject>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const PartialPivLU<typename MatrixBase<Derived>::PlainObject>
 MatrixBase<Derived>::partialPivLu() const
 {
   return PartialPivLU<PlainObject>(eval());
  2. Moving the Eigen MatrixXd and VectorXd structs to the GPU. For this I followed two paths:
  • Move the structs as they are, based on this suggestion; I managed to get it compiled, but the matrices in the kernel contained only zeros.
  • Turn the structs into C-style arrays (double*), move those, and then use Eigen::Map to view them as Eigen structs (see the second sketch after this list). This is the current implementation, but there is still some memory issue, since everything is 0 on the GPU.
    Moving the Eigen structs directly would be nicer, but it needs more work to understand the issue I ran into.
  3. After doing all of the above, I get the following error during execution:
bash-4.2$ ./testEigenGPU
Size of the matrix?
4
v_device data:
1
3
5
3
v in device
0.000000
0.000000
0.000000
0.000000
Error with cudaDeviceSync: unspecified launch failure
Random matrix:

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1

Random vector:

1
3
5
3

Solution (x) of M*x = v:

0
1
2
3

Device Solution (x) of M*x = v:

0x7fff7a000400

I tried to debug this with ddt, cuda-gdb and cuda-memcheck. With cuda-gdb I get the following:

Starting program: /gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/testEigen/build/testEigenGPU
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffcffff700 (LWP 244304)]
Size of the matrix?
4
v_device data:
1
3
5
3
[New Thread 0x7fffbdb06700 (LWP 244306)]
v in device
0.000000
0.000000
0.000000
0.000000

Thread 1 "testEigenGPU" received signal SIGTRAP, Trace/breakpoint trap.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x0000000000db4900 in runPartialPivLuGPU(double*, double*, double*, int)<<<(1,1,1),(1,1,1)>>> ()

and cuda-memcheck:

bash-4.2$ cuda-memcheck ./testEigenGPU
========= CUDA-MEMCHECK
Size of the matrix?
4
v_device data:
1
3
5
3
========= Unknown Error
=========
v in device
0.000000
0.000000
0.000000
0.000000
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib64/libcuda.so.1 [0x3b9803]
=========     Host Frame:./testEigenGPU [0x54d16]
=========     Host Frame:./testEigenGPU [0x4caf]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22495]
=========     Host Frame:./testEigenGPU [0x4489]
=========
Error with cudaDeviceSync: unspecified launch failure

Searching for these errors online suggests that they probably come from a segmentation fault inside the Eigen::PartialPivLU solver.
4. A bunch of "warning: calling a __host__ function from a __host__ __device__ function is not allowed" messages during compilation; I don't know whether the functions in question are actually used by the solver and are the root of all the errors.
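For reference, here is a minimal sketch of the kernel from point 1. The signature runPartialPivLuGPU(double*, double*, double*, int) matches the cuda-gdb output above; the body is illustrative and assumes the Eigen patches from the diff:

#include <Eigen/Dense>

__global__ void runPartialPivLuGPU(double* M, double* v, double* x, int n) {
    // Fixed-size path for the n == 4 test case: wrap the raw device pointers
    // in Eigen views and run the (patched) device-side solver.
    Eigen::Map<Eigen::Matrix<double, 4, 4>> Mm(M);
    Eigen::Map<Eigen::Vector4d> vm(v), xm(x);
    xm = Eigen::PartialPivLU<Eigen::Matrix4d>(Mm).solve(vm);
}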

CUDA used: 10.1.243
GCC used: 9.3.0
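And a host-side sketch of the C-style-array path from point 2, using the kernel from the first sketch (variable names are illustrative):

#include <Eigen/Dense>
#include <cuda_runtime.h>

int main() {
    int n = 4;
    Eigen::MatrixXd M = Eigen::MatrixXd::Random(n, n);  // column-major by default
    Eigen::VectorXd v = Eigen::VectorXd::Random(n);
    Eigen::VectorXd x(n);
    double *M_dev, *v_dev, *x_dev;
    cudaMalloc(&M_dev, n * n * sizeof(double));
    cudaMalloc(&v_dev, n * sizeof(double));
    cudaMalloc(&x_dev, n * sizeof(double));
    // Eigen stores its data contiguously, so flat copies preserve the layout.
    cudaMemcpy(M_dev, M.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(v_dev, v.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    runPartialPivLuGPU<<<1, 1>>>(M_dev, v_dev, x_dev, n);
    cudaDeviceSynchronize();  // the "unspecified launch failure" above surfaces here
    cudaMemcpy(x.data(), x_dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    return 0;
}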

TODO:
1. Check why the matrices are not copied correctly into the device
2. Try to debug the errors

cc: @pramodk @ohm314

@kotsaloscv kotsaloscv self-assigned this May 21, 2021
@kotsaloscv kotsaloscv linked a pull request Jun 7, 2021 that will close this issue
kotsaloscv pushed several commits that referenced this issue Jun 8–10, 2021, among them "…wMajor storage order (improvement with in-place transposition)" and "…r and its unit test (full compatibility with OpenACC/CUDA backends)".
@pramodk pramodk added the bug Something isn't working label Jul 20, 2021
@kotsaloscv kotsaloscv linked a pull request Sep 6, 2021 that will close this issue
pramodk pushed a commit that referenced this issue Sep 6, 2021
* Two factors contribute to the above solution:
   - A new Eigen branch (version 3.5 and above). Currently we are
     using a mirrored version of Eigen in the BlueBrain organisation:
     https://github.com/BlueBrain/eigen/releases/tag/v3.5-alpha
   - An API that makes it possible to call any Eigen `__device__`
     function from within OpenACC regions.
* More details: Eigen 3.5+ provides better GPU support; however,
   some functions cannot be called directly from within OpenACC regions.
   Therefore, we need to wrap them in a special API (decorate them with
   `__device__` & `acc routine` tokens), which allows us to eventually call
   them from OpenACC. Calling these functions from CUDA kernels presents
   no issue.
* From #726: avoid using the `[]` operator with Eigen Matrix objects; it
   results in a runtime error with OpenACC and the PGI compiler.
* Note that this should work in combination with BlueBrain/CoreNeuron/pull/624.

fixes #311
fixes #135
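A rough sketch of what such a decorated wrapper can look like (the function name is illustrative; the actual NMODL API may differ):

#include <Eigen/Dense>

// Illustrative only: a __device__ function declared as an OpenACC seq routine,
// so both CUDA kernels and OpenACC regions can reach the Eigen solver through it.
#pragma acc routine seq
__device__ void eigen_solve_4x4(double* J, double* F, double* X) {
    Eigen::Map<Eigen::Matrix<double, 4, 4>> Jm(J);
    Eigen::Map<Eigen::Vector4d> Fm(F), Xm(X);
    // Per #726: use operator() for element access, never operator[].
    Xm = Eigen::PartialPivLU<Eigen::Matrix4d>(Jm).solve(Fm);
}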
olupton added a commit to neuronsimulator/nrn that referenced this issue Apr 6, 2022
BlueBrain/nmodl#311 has been closed, so hopefully these work now.
olupton added a commit to neuronsimulator/nrn that referenced this issue Apr 7, 2022
- Create nrnivmodl targets in nrn_add_test_group and add a CORENEURON
  option to that command. This means that we do not have to compile
  *all* MOD files for CoreNEURON if CoreNEURON is enabled in the build.
- Drop MODFILE_PATTERNS, NRNIVMODL_ARGS and SUBMODULE arguments
  to nrn_add_test that were not being used.
- The same de-duplication is done as before, so there is no change to
  the number of nrnivmodl invocations, but some targets no longer pass
  -coreneuron to nrnivmodl.
* testcorenrn: re-enable NMODL+GPU tests now that BlueBrain/nmodl#311 is closed.