
Eigen compatibility with OpenACC #311

Closed
iomaganaris opened this issue Apr 27, 2020 · 1 comment · Fixed by #728
Labels: bug (Something isn't working), codegen (Code generation backend), gpu, solver (Solver and numerical methods)


iomaganaris commented Apr 27, 2020

This issue was raised when running nrnivmodl-core on the following ModelDB model https://senselab.med.yale.edu/ModelDB/ShowModel?model=19176&file=%2fHCN2k%2fhcn2.mod#tabs-2 using PGI 19.4 and PGI 19.10 with the OpenACC backend code generated by NMODL.
The generated C++ file for hcn2.mod contains a call to the following Eigen solver, produced by the translation of the DERIVATIVE block:

X = Eigen::PartialPivLU<Eigen::Ref<Eigen::Matrix<double, 4, 4>>>(Jm).solve(F);
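Standalone, this pattern is equivalent to the following minimal host-side sketch (Jm, F and X mirror the generated names; the values are placeholders):

#include <Eigen/Dense>

int main() {
    // Minimal reproduction of the generated pattern: an in-place
    // PartialPivLU over an Eigen::Ref, followed by a solve.
    Eigen::Matrix<double, 4, 4> J = Eigen::Matrix<double, 4, 4>::Identity();
    Eigen::Matrix<double, 4, 1> F = Eigen::Matrix<double, 4, 1>::Ones();
    Eigen::Ref<Eigen::Matrix<double, 4, 4>> Jm(J);
    Eigen::Matrix<double, 4, 1> X =
        Eigen::PartialPivLU<Eigen::Ref<Eigen::Matrix<double, 4, 4>>>(Jm).solve(F);
    return X.allFinite() ? 0 : 1;
}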

Compiling the generated file with pgc++ (with -Minfo=acc added to the compilation flags) produces the following warnings:

PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 63)
std::exception::exception():
     22, include "multicore.hpp"
          32, include "membfunc.hpp"
               31, include "vector"
                    61, include "allocator.h"
                         46, include "c++allocator.h"
                              33, include "new_allocator.h"
                                   33, include "new"
                                        40, include "exception"
                                             63, Generating implicit acc routine seq
PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 63)
std::exception::exception() [subobject]:
      0, Generating implicit acc routine seq
PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 57)
std::bad_alloc::bad_alloc():
     22, include "multicore.hpp"
          32, include "membfunc.hpp"
               31, include "vector"
                    61, include "allocator.h"
                         46, include "c++allocator.h"
                              33, include "new_allocator.h"
                                   33, include "new"
                                        57, Generating implicit acc routine seq

This was caused by the exception thrown in https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/util/Memory.h#L70.
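That helper looks roughly like this (paraphrased from Eigen's Memory.h at the time; the exact code may differ):

EIGEN_DEVICE_FUNC inline void throw_std_bad_alloc()
{
  #ifdef EIGEN_EXCEPTIONS
    throw std::bad_alloc(); // the throw drags std::exception's vtable (a static) into device code
  #endif
}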
After working around this by commenting out the problematic line, another issue appeared regarding std::atomic (shown below), coming from https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/products/Parallelizer.h#L14; it was fixed by adding the -DEIGEN_HAS_CXX11_ATOMIC=0 compiler flag to pgc++.

PGCC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected load/store type (x86_64/core/mod2c/hcn2.cpp: 1)
std::__atomic_base<int>::store(int, std::memory_order):
     32, include "LU"
          11, include "Core"
              297, include "Parallelizer.h"
                    14, include "atomic"
                         41, include "atomic_base.h"
                             468, Generating implicit acc routine seq
                                  Generating acc routine seq
                                  Generating Tesla code
PGCC-F-0704-Compilation aborted due to previous errors. (x86_64/core/mod2c/hcn2.cpp
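The fix is just a define on the compile line; a minimal invocation (project-specific flags and include paths elided, taken from the full command further below) looks like:

pgc++ -acc --c++14 -Minfo=acc -DEIGEN_HAS_CXX11_ATOMIC=0 -c x86_64/core/mod2c/hcn2.cpp -o x86_64/core/build/hcn2.o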

After these fixes, there was an issue coming from the LLVM-based pgc++ compiler, so we tried the nollvm backend.

Eigen::EigenBase<Eigen::CwiseBinaryOp<Eigen::internal::scalar_product_op<double, double>, const Eigen::Transpose<const Eigen::Block<const Eigen::Block<Eigen::Block<Eigen::Ref<Eigen::Matrix<double, (int)4, (int)4, (int)0, (int)4, (int)4>, (int)0, Eigen::OuterStride<(int)-1>>, (int)4, (int)1, (bool)1>, (int)-1, (int)1, (bool)0>, (int)1, (int)1, (bool)0>>, const Eigen::Block<const Eigen::Block<Eigen::Block<Eigen::Ref<Eigen::Matrix<double, (int)4, (int)4, (int)0, (int)4, (int)4>, (int)0, Eigen::OuterStride<(int)-1>>, (int)1, (int)4, (bool)0>, (int)1, (int)-1, (bool)0>, (int)1, (int)1, (bool)0>>>::cols() const:
     32, include "LU"
          11, include "Core"
              240, include "EigenBase.h"
                    62, Generating implicit acc routine seq
                        Generating acc routine seq
                        Generating Tesla code
              259, include "NoAlias.h"
              261, include "Matrix.h"
              265, include "CwiseUnaryOp.h"
              271, include "Stride.h"
              273, include "Map.h"
              275, include "Block.h"
              279, include "Transpose.h"
              283, include "Redux.h"
              289, include "Solve.h"
              291, include "SolverBase.h"
              293, include "Transpositions.h"
pgc++-Fatal-/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/19.4/bin/pggpp2-llvm TERMINATED by signal 11
Arguments to /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/19.4/bin/pggpp2-llvm
/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/19.4/bin/pggpp2-llvm x86_64/core/mod2c/hcn2.cpp -opt 2 -terse 1 -inform warn -x 119 0xa10000 -x 122 0x40 -x 123 0x1000 -x 127 4 -x 127 17 -x 19 0x400000 -x 28 0x40000 -x 120 0x10000000 -x 70 0x8000 -x 122 1 -x 125 0x20000 -quad -vect 56 -y 34 16 -x 34 0x8 -x 32 25952256 -y 19 8 -y 35 0 -x 42 0x30 -x 39 0x40 -x 39 0x80 -x 59 4 -x 129 2 -tp skylake -x 120 0x1000 -astype 0 -x 121 1 -fn x86_64/core/mod2c/hcn2.cpp -il /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/773510/pgc++uoOV8vHgaqXY.il/hcn2.il -inlib /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/773510/pgc++0oOVCczrYGzB.ext -insize 200 -x 221 25000 -x 222 5 -x 115 10 -x 14 32 -x 117 0x200 -x 123 0x80000000 -x 123 4 -x 119 0x20 -def __pgnu_vsn=80300 -x 70 0x40000000 -x 183 4 -x 121 0x800 -x 6 0x20000 -autoinl 10 -x 168 400 -x 174 128000 -x 14 0x200000 -x 14 0x400000 -x 249 70 -x 120 0x200000 -x 70 0x40000000 -x 164 0x800000 -x 39 4 -x 68 0x1 -accel tesla -accel host -x 186 0x80000 -x 180 0x4000400 -x 163 0x1 -cudaver 10010 -x 176 0x100 -cudacap 70 -x 121 0xc00 -x 194 0x40000 -x 186 0x80 -x 189 0x8000 -y 163 0xc0000000 -x 189 0x10 -y 189 0x4000000 -cudaroot /gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/compilers/2020-02-01/linux-rhel7-x86_64/gcc-4.8.5/pgi-19.4-hdirysdrvd/linux86-64-llvm/2019/cuda/10.1 -x 9 1 -x 72 0x1 -x 136 0x11 -x 37 0x481000 -mp -x 69 0x200 -x 69 0x400 -x 69 2 -x 0 0x1000000 -x 2 0x100000 -x 0 0x2000000 -x 161 16384 -x 162 16384 -x 62 8 -gnuvsn 80300 -x 69 0x200 -x 123 0x400 -cmdline '+pgc++ x86_64/core/mod2c/hcn2.cpp -O2 -Mvect=simd -acc --diag_suppress 177 -mp --c++14 -Minline=size:200,levels:10 -Minfo=acc -DEIGEN_HAS_CXX11_ATOMIC=0 -DPG_ACC_BUGS -DCUDA_PROFILING -DCORENEURON_BUILD -DHAVE_MALLOC_H -DSWAP_ENDIAN_DISABLE_ASM -DEIGEN_DONT_VECTORIZE=1 -DNRNMPI=1 -DLAYOUT=0 -DDISABLE_HOC_EXP -DENABLE_SPLAYTREE_QUEUING -I/gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/CoreNeuron/build_gpu_sympy/install/include -I/gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/CoreNeuron/build_gpu_sympy/install/include/coreneuron/utils/randoms -I/gpfs/bbp.cscs.ch/ssd/apps/hpc/jenkins/deploy/tools/2020-02-01/linux-rhel7-x86_64/gcc-8.3.0/hpe-mpi-2.21-7pbszh6v5u/include -fPIC -c -o x86_64/core/build/hcn2.o' -asm /gpfs/bbp.cscs.ch/ssd/slurmTmpFS/magkanar/773510/pgc++uoOV8WmKrzY7.ll
make: *** [x86_64/core/build/hcn2.o] Error 127

The final issue we came across was the following:

PGCC-W-0155-External and Static variables are not supported in acc routine - _ZTVSt9exception (x86_64/core/mod2c/hcn2.cpp: 336)
void Eigen::internal::gemv_dense_selector<(int)2, (int)1, (bool)1>::run<Eigen::Transpose<const Eigen::Ref<Eigen::Matrix<double, (int)-1, (int)-1, (int)0, (int)-1, (int)-1>, (int)0, Eigen::OuterStride<(int)-1>>>, Eigen::Transpose<const Eigen::Block<const Eigen::Ref<Eigen::Matrix<double, (int)-1, (int)-1, (int)0, (int)-1, (int)-1>, (int)0, Eigen::OuterStride<(int)-1>>, (int)1, (int)-1, (bool)0>>, Eigen::Transpose<Eigen::Block<Eigen::Ref<Eigen::Matrix<double, (int)-1, (int)-1, (int)0, (int)-1, (int)-1>, (int)0, Eigen::OuterStride<(int)-1>>, (int)1, (int)-1, (bool)0>>>(const T1 &, const T2 &, T3 &, const T3::Scalar &):
     32, include "LU"
          11, include "Core"
              288, include "GeneralProduct.h"
                   309, Generating implicit acc routine seq
              292, include "PermutationMatrix.h"
              294, include "TriangularMatrix.h"
              317, include "VectorwiseOp.h"
Unimplemented opcode: 0
PGCC-F-0000-Internal compiler error. Unimplemented opcode.       4  (x86_64/core/mod2c/hcn2.cpp: 46)
PGCC/x86 Linux 19.10-0: compilation aborted
make: *** [x86_64/core/build/hcn2.o] Error 2

For this last issue no solution has been found.
To reproduce all the issues on a GPU node:

module load git
git clone https://github.com/BlueBrain/CoreNeuron.git
cd CoreNeuron
mkdir build_gpu_sympy
cd build_gpu_sympy
module load nvhpc cuda hpe-mpi boost python-dev flex bison cmake
cmake .. -DCMAKE_INSTALL_PREFIX=./install -DCORENRN_ENABLE_NMODL=ON -DCORENRN_ENABLE_GPU=ON -DCORENRN_NMODL_FLAGS="sympy --analytic" -DNRN_ENABLE_CORENEURON=ON -DCMAKE_BUILD_TYPE=Release
make -j36
cd HCN2k
mkdir mod_eigen
cp hcn2.mod mod_eigen
<install-dir>/bin/nrnivmodl -coreneuron mod_eigen

The CoreNeuron and NMODL master branches were used.

@iomaganaris iomaganaris added codegen Code generation backend solver Solver and numerical methods gpu labels Apr 27, 2020

iomaganaris commented Jun 3, 2020

Following the incompatibilities of Eigen with OpenACC, I started investigating whether Eigen can be called from CUDA kernels. To do this I am using a simple example with the Eigen::PartialPivLU solver, created by @cattabiani, on top of which I added a CUDA kernel that runs the same solver.
My attempts are currently WIP here.
During development I faced four issues:

  1. Compilation issues when adding the Eigen::PartialPivLU solver to the CUDA kernel. To get the code to compile I needed to make the following changes in the Eigen source code (a kernel sketch using these decorated entry points follows this list):
diff --git a/Eigen/src/Core/SolverBase.h b/Eigen/src/Core/SolverBase.h
index 501461042..e7d5ca5a3 100644
--- a/Eigen/src/Core/SolverBase.h
+++ b/Eigen/src/Core/SolverBase.h
@@ -94,7 +94,7 @@ class SolverBase : public EigenBase<Derived>
     SolverBase()
     {}

-    ~SolverBase()
+    EIGEN_DEVICE_FUNC ~SolverBase()
     {}

     using Base::derived;
@@ -102,7 +102,7 @@ class SolverBase : public EigenBase<Derived>
     /** \returns an expression of the solution x of \f$ A x = b \f$ using the current decomposition of A.
       */
     template<typename Rhs>
-    inline const Solve<Derived, Rhs>
+    EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const Solve<Derived, Rhs>
     solve(const MatrixBase<Rhs>& b) const
     {
       internal::solve_assertion<typename internal::remove_all<Derived>::type>::template run<false>(derived(), b);
diff --git a/Eigen/src/LU/PartialPivLU.h b/Eigen/src/LU/PartialPivLU.h
index b8938013a..dafcab719 100644
--- a/Eigen/src/LU/PartialPivLU.h
+++ b/Eigen/src/LU/PartialPivLU.h
@@ -173,7 +173,7 @@ template<typename _MatrixType> class PartialPivLU
       * \sa TriangularView::solve(), inverse(), computeInverse()
       */
     template<typename Rhs>
-    inline const Solve<PartialPivLU, Rhs>
+    EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const Solve<PartialPivLU, Rhs>
     solve(const MatrixBase<Rhs>& b) const;
     #endif

@@ -593,7 +593,7 @@ struct Assignment<DstXprType, Inverse<PartialPivLU<MatrixType> >, internal::assi
   * \sa class PartialPivLU
   */
 template<typename Derived>
-inline const PartialPivLU<typename MatrixBase<Derived>::PlainObject>
+EIGEN_DEVICE_FUNC EIGEN_ALWAYS_INLINE const PartialPivLU<typename MatrixBase<Derived>::PlainObject>
 MatrixBase<Derived>::partialPivLu() const
 {
   return PartialPivLU<PlainObject>(eval());
  2. Moving the Eigen MatrixXd and VectorXd structs to the GPU. For this I followed two paths:
  • Move the structs as they are, based on this suggestion; I managed to get it compiled, but the matrices in the kernel contained only zeros.
  • Turn the structs into C-style arrays (double*), move those, and then use Eigen::Map to view them as Eigen structs (see the second sketch after this list). This is the current implementation, but there is still some memory issue, since everything is 0 on the GPU.
    Moving the Eigen structs directly would be nicer, but it needs more work to understand the issue I ran into.
  3. After doing all of the above, I get the following error during execution:
bash-4.2$ ./testEigenGPU
Size of the matrix?
4
v_device data:
1
3
5
3
v in device
0.000000
0.000000
0.000000
0.000000
Error with cudaDeviceSync: unspecified launch failure
Random matrix:

1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1

Random vector:

1
3
5
3

Solution (x) of M*x = v:

0
1
2
3

Device Solution (x) of M*x = v:

0x7fff7a000400

I tried to debug this with ddt, cuda-gdb and cuda-memcheck. With cuda-gdb I get the following:

Starting program: /gpfs/bbp.cscs.ch/project/proj16/magkanar/GPU_EIGEN/testEigen/build/testEigenGPU
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffcffff700 (LWP 244304)]
Size of the matrix?
4
v_device data:
1
3
5
3
[New Thread 0x7fffbdb06700 (LWP 244306)]
v in device
0.000000
0.000000
0.000000
0.000000

Thread 1 "testEigenGPU" received signal SIGTRAP, Trace/breakpoint trap.
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x0000000000db4900 in runPartialPivLuGPU(double*, double*, double*, int)<<<(1,1,1),(1,1,1)>>> ()

and cuda-memcheck:

bash-4.2$ cuda-memcheck ./testEigenGPU
========= CUDA-MEMCHECK
Size of the matrix?
4
v_device data:
1
3
5
3
========= Unknown Error
=========
v in device
0.000000
0.000000
0.000000
0.000000
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/lib64/libcuda.so.1 [0x3b9803]
=========     Host Frame:./testEigenGPU [0x54d16]
=========     Host Frame:./testEigenGPU [0x4caf]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22495]
=========     Host Frame:./testEigenGPU [0x4489]
=========
Error with cudaDeviceSync: unspecified launch failure

Searching for these errors online suggests that they probably come from a segmentation fault inside the Eigen::PartialPivLU solver.
4. A bunch of "warning: calling a __host__ function from a __host__ __device__ function is not allowed" messages during compilation; I don't know whether the functions in question are actually used by the solver and are the root of all the errors.
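For reference, here is a minimal sketch of the kernel from point 1. The signature runPartialPivLuGPU(double*, double*, double*, int) matches the cuda-gdb output above; the body is illustrative and assumes the Eigen patches from the diff:

#include <Eigen/Dense>

__global__ void runPartialPivLuGPU(double* M, double* v, double* x, int n) {
    // Fixed-size path for the n == 4 test case: wrap the raw device pointers
    // in Eigen views and run the (patched) device-side solver.
    Eigen::Map<Eigen::Matrix<double, 4, 4>> Mm(M);
    Eigen::Map<Eigen::Vector4d> vm(v), xm(x);
    xm = Eigen::PartialPivLU<Eigen::Matrix4d>(Mm).solve(vm);
}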

CUDA used: 10.1.243
GCC used: 9.3.0
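And a host-side sketch of the C-style-array path from point 2, using the kernel from the first sketch (variable names are illustrative):

#include <Eigen/Dense>
#include <cuda_runtime.h>

int main() {
    int n = 4;
    Eigen::MatrixXd M = Eigen::MatrixXd::Random(n, n);  // column-major by default
    Eigen::VectorXd v = Eigen::VectorXd::Random(n);
    Eigen::VectorXd x(n);
    double *M_dev, *v_dev, *x_dev;
    cudaMalloc(&M_dev, n * n * sizeof(double));
    cudaMalloc(&v_dev, n * sizeof(double));
    cudaMalloc(&x_dev, n * sizeof(double));
    // Eigen stores its data contiguously, so flat copies preserve the layout.
    cudaMemcpy(M_dev, M.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(v_dev, v.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    runPartialPivLuGPU<<<1, 1>>>(M_dev, v_dev, x_dev, n);
    cudaDeviceSynchronize();  // the "unspecified launch failure" above surfaces here
    cudaMemcpy(x.data(), x_dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    return 0;
}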

TODO:
1. Check why the matrices are not copied correctly into the device
2. Try to debug the errors

cc: @pramodk @ohm314

@kotsaloscv kotsaloscv self-assigned this May 21, 2021
@kotsaloscv kotsaloscv linked a pull request Jun 7, 2021 that will close this issue
kotsaloscv pushed several commits that referenced this issue Jun 8–10, 2021, among them "…wMajor storage order (improvement with in-place transposition)" and "…r and its unit test (full compatibility with OpenACC/CUDA backends)".
@pramodk pramodk added the bug Something isn't working label Jul 20, 2021
@kotsaloscv kotsaloscv linked a pull request Sep 6, 2021 that will close this issue
pramodk pushed a commit that referenced this issue Sep 6, 2021
* Two factors contribute to the above solution:
   - A new Eigen branch (version 3.5 and above). Currently we are
     using a mirrored version of Eigen in the BlueBrain organisation:
     https://github.com/BlueBrain/eigen/releases/tag/v3.5-alpha
   - An API that makes it possible to call any Eigen `__device__`
     function from within OpenACC regions.
* More details: Eigen 3.5+ provides better GPU support; however,
   some functions cannot be called directly from within OpenACC regions.
   Therefore, we need to wrap them in a special API (decorate them with
   `__device__` & `acc routine` tokens), which allows us to eventually call
   them from OpenACC. Calling these functions from CUDA kernels presents
   no issue.
* From #726: avoid using the `[]` operator with Eigen Matrix objects; it
   results in a runtime error with OpenACC and the PGI compiler.
* Note that this should work in combination with BlueBrain/CoreNeuron/pull/624.

fixes #311
fixes #135
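A rough sketch of what such a decorated wrapper can look like (the function name is illustrative; the actual NMODL API may differ):

#include <Eigen/Dense>

// Illustrative only: a __device__ function declared as an OpenACC seq routine,
// so both CUDA kernels and OpenACC regions can reach the Eigen solver through it.
#pragma acc routine seq
__device__ void eigen_solve_4x4(double* J, double* F, double* X) {
    Eigen::Map<Eigen::Matrix<double, 4, 4>> Jm(J);
    Eigen::Map<Eigen::Vector4d> Fm(F), Xm(X);
    // Per #726: use operator() for element access, never operator[].
    Xm = Eigen::PartialPivLU<Eigen::Matrix4d>(Jm).solve(Fm);
}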
olupton added a commit to neuronsimulator/nrn that referenced this issue Apr 6, 2022
BlueBrain/nmodl#311 has been closed, so hopefully these work now.
olupton added a commit to neuronsimulator/nrn that referenced this issue Apr 7, 2022
- Create nrnivmodl targets in nrn_add_test_group and add a CORENEURON
  option to that command. This means that we do not have to compile
  *all* MOD files for CoreNEURON if CoreNEURON is enabled in the build.
- Drop MODFILE_PATTERNS, NRNIVMODL_ARGS and SUBMODULE arguments
  to nrn_add_test that were not being used.
- The same de-duplication is done as before, so there is no change to
  the number of nrnivmodl invocations, but some targets no longer pass
  -coreneuron to nrnivmodl.
* testcorenrn: re-enable NMODL+GPU tests now that BlueBrain/nmodl#311 is closed.