Skip to content

Commit

Permalink
Reduce redundant GPU allocations (#2393)
Browse files Browse the repository at this point in the history
# Introduction

- Do not allocate and fill storage for resetting ion concentrations if
these are never changed.
Reasoning: If concentrations are never changed, we do not reset them and
thus do not need
  to store the values. 
- Remove a duplicate of the CV area in solver and shared state and pass
a reference during solving
- Remove redundant `solution` from GPU solver
- Do not store / allocate diffusion values if not needed.

This saves **per CV**
- `1 x 8B` for `cv_area` unconditionally
- `1 x 8B` for `Xd` for each ion with no diffusion is in use (majority
of cases)
- `2 x 8B` for `Xi` for each ion (`reset` and `init`) if not written
(reasonably often)
- `2 x 8B` for `Xo` for each ion (`reset` and `init`) if not written
(majority of cases)
- `1 x 8B` for `eX` reset for each ion if not read (majority)
- `1 x 8B` for `eX` for each ion if not read (rarely)

In my standard benchmark, `busyring` with complex cells, this saves
about 18% of the total GPU
allocation for the cell data (`shared_state`).

This has become a mixed bag, fixing a few additional things that came up
during testing this:
- a bug in event handling on GPU
- pybind11 stub generation

---------

Co-authored-by: Jannik Luboeinski <[email protected]>
Co-authored-by: boeschf <[email protected]>
  • Loading branch information
3 people authored Oct 23, 2024
1 parent fd2718c commit b8b768d
Show file tree
Hide file tree
Showing 17 changed files with 184 additions and 185 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/codeql.yml
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ jobs:
run: |
mkdir build
cd build
cmake .. -DCMAKE_CXX_COMPILER=g++-12 -DCMAKE_C_COMPILER=gcc-12 -DARB_WITH_PYTHON=ON -DARB_VECTORIZE=ON -DPython3_EXECUTABLE=`which python` -DARB_WITH_MPI=OFF -DARB_WITH_ASSERTIONS=ON -DARB_WITH_PROFILING=ON
cmake .. -DCMAKE_CXX_COMPILER=g++-12 -DCMAKE_C_COMPILER=gcc-12 -DARB_WITH_PYTHON=ON -DARB_VECTORIZE=ON -DPython3_EXECUTABLE=`which python` -DARB_WITH_MPI=OFF -DARB_WITH_ASSERTIONS=ON -DARB_WITH_PROFILING=ON -DARB_BUILD_PYTHON_STUBS=OFF
make -j4 tests examples pyarb
cd -
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/sanitize.yml
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ jobs:
mkdir build
cd build
export SAN="-fsanitize=${{ matrix.sanitizer }} -fno-omit-frame-pointer"
cmake .. -GNinja -DCMAKE_BUILD_TYPE=debug -DCMAKE_CXX_FLAGS="$SAN" -DCMAKE_C_FLAGS="$SAN" -DCMAKE_EXE_LINKER_FLAGS="$SAN" -DCMAKE_MODULE_LINKER_FLAGS="$SAN" -DARB_WITH_ASSERTIONS=ON -DCMAKE_CXX_COMPILER=$CXX -DCMAKE_C_COMPILER=$CC -DARB_VECTORIZE=${{ matrix.simd }} -DARB_WITH_MPI=OFF -DARB_WITH_PYTHON=ON -DPython3_EXECUTABLE=`which python`
cmake .. -GNinja -DCMAKE_BUILD_TYPE=debug -DCMAKE_CXX_FLAGS="$SAN" -DCMAKE_C_FLAGS="$SAN" -DCMAKE_EXE_LINKER_FLAGS="$SAN" -DCMAKE_MODULE_LINKER_FLAGS="$SAN" -DARB_BUILD_PYTHON_STUBS=OFF -DARB_WITH_ASSERTIONS=ON -DCMAKE_CXX_COMPILER=$CXX -DCMAKE_C_COMPILER=$CC -DARB_VECTORIZE=${{ matrix.simd }} -DARB_WITH_MPI=OFF -DARB_WITH_PYTHON=ON -DPython3_EXECUTABLE=`which python`
ninja -j4 -v tests examples pyarb
cd -
- name: Run unit tests
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/test-matrix.yml
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ jobs:
- name: Update pip
run: python -m pip install --upgrade pip
- name: Install Python packages
run: pip install numpy sphinx svgwrite sphinx-rtd-theme mpi4py pandas seaborn
run: pip install numpy sphinx svgwrite sphinx-rtd-theme mpi4py pandas seaborn pybind11-stubgen
- name: Clone w/ submodules
uses: actions/checkout@v4
with:
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/test-spack.yml
Original file line number Diff line number Diff line change
Expand Up @@ -74,9 +74,9 @@ jobs:
- name: build and install arbor through spack dev-build
run: |
source spack/share/spack/setup-env.sh
spack install --no-check-signature --only dependencies arbor@develop target=x86_64_v2 +python ^python@${{ matrix.python-version }}
spack install --no-check-signature --only dependencies arbor@develop target=x86_64_v2 +python ~pystubs ^python@${{ matrix.python-version }}
cd arbor
spack dev-build arbor@develop target=x86_64_v2 +python ^python@${{ matrix.python-version }}
spack dev-build arbor@develop target=x86_64_v2 +python ~pystubs ^python@${{ matrix.python-version }}
- name: load arbor and verify installation, python examples.
run: |
Expand Down
4 changes: 3 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@ set(CMAKE_SUPPRESS_DEVELOPER_WARNINGS ON CACHE INTERNAL "" FORCE)

# Make CUDA support throw errors if architectures remain unclear
cmake_policy(SET CMP0104 NEW)

# Ensure CMake is aware of the policies for modern RPATH behavior
cmake_policy(SET CMP0072 NEW)

Expand Down Expand Up @@ -46,6 +45,9 @@ if(NOT DEFINED CMAKE_INTERPROCEDURAL_OPTIMIZATION)
endif()
endif()

# Use pybind11-stubgen to make type stubs.
cmake_dependent_option(ARB_BUILD_PYTHON_STUBS "Use pybind11-stubgen to build type stubs." ON "ARB_WITH_PYTHON" OFF)

# Turn on this option to force the compilers to produce color output when output is
# redirected from the terminal (e.g. when using ninja or a pager).

Expand Down
41 changes: 41 additions & 0 deletions arbor/backends/common_types.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

#include <arbor/common_types.hpp>

#include "fvm_layout.hpp"
#include "util/range.hpp"
#include "backends/threshold_crossing.hpp"
#include "execution_context.hpp"
Expand All @@ -21,4 +22,44 @@ struct fvm_detector_info {
execution_context ctx;
};

struct ion_data_flags {
// flags for resetting ion states after W access
bool write_eX_:1 = false; // is eX written?
bool write_Xo_:1 = false; // is Xo written?
bool write_Xi_:1 = false; // is Xi written?
bool write_Xd_:1 = false; // is Xd written?
bool read_eX_:1 = false; // is eX read?
bool read_Xo_:1 = false; // is Xo read?
bool read_Xi_:1 = false; // is Xi read?
bool read_Xd_:1 = false; // is Xd read?

ion_data_flags(const fvm_ion_config& config):
write_eX_(config.revpot_written),
write_Xo_(config.econc_written),
write_Xi_(config.iconc_written),
write_Xd_(config.is_diffusive),
read_eX_(config.revpot_read),
read_Xo_(config.econc_read),
read_Xi_(config.iconc_read),
read_Xd_(config.is_diffusive)
{}

ion_data_flags() = default;
ion_data_flags(const ion_data_flags&) = default;
ion_data_flags& operator=(const ion_data_flags&) = default;

bool xi() const { return read_Xi_ || write_Xi_; }
bool reset_xi() const { return write_Xi_; }

bool xo() const { return read_Xo_ || write_Xo_; }
bool reset_xo() const { return write_Xo_; }

bool xd() const { return read_Xd_ || write_Xd_; }
bool reset_xd() const { return write_Xd_; }

bool ex() const { return read_eX_ || write_eX_; }
bool reset_ex() const { return write_eX_; }
};


}
2 changes: 1 addition & 1 deletion arbor/backends/gpu/event_stream.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ struct event_stream : BaseEventStream {

protected:
void init() override final {
resize(this->device_ev_data_, this->device_ev_data_.size());
resize(this->device_ev_data_, this->ev_data_.size());
memory::copy_async(this->ev_data_, this->device_ev_data_);
this->base_ptr_ = this->device_ev_data_.data();
}
Expand Down
63 changes: 20 additions & 43 deletions arbor/backends/gpu/shared_state.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -46,51 +46,31 @@ std::pair<arb_value_type, arb_value_type> minmax_value_impl(arb_size_type n, con
ion_state::ion_state(const fvm_ion_config& ion_data,
unsigned, // alignment/padding ignored.
solver_ptr ptr):
write_eX_(ion_data.revpot_written),
write_Xo_(ion_data.econc_written),
write_Xi_(ion_data.iconc_written),
write_Xd_(ion_data.is_diffusive),
read_Xo_(ion_data.econc_written || ion_data.econc_read), // ensure that if we have W access, also R access is flagged
read_Xi_(ion_data.iconc_written || ion_data.iconc_read),
flags_(ion_data),
node_index_(make_const_view(ion_data.cv)),
iX_(ion_data.cv.size(), NAN),
eX_(ion_data.init_revpot.begin(), ion_data.init_revpot.end()),
gX_(ion_data.cv.size(), NAN),
charge(1u, static_cast<arb_value_type>(ion_data.charge)),
solver(std::move(ptr)) {
// We don't need to allocate these if we never use them...
if (read_Xi_) {
Xi_ = make_const_view(ion_data.init_iconc);
}
if (read_Xo_) {
Xo_ = make_const_view(ion_data.init_econc);
}
if (write_Xi_ || write_Xd_) {
// ... but this is used by Xd and Xi!
reset_Xi_ = make_const_view(ion_data.reset_iconc);
}
if (write_Xi_) {
init_Xi_ = make_const_view(ion_data.init_iconc);
arb_assert(node_index_.size()==init_Xi_.size());
}
if (write_Xo_) {
init_Xo_ = make_const_view(ion_data.init_econc);
reset_Xo_ = make_const_view(ion_data.reset_econc);
arb_assert(node_index_.size()==init_Xo_.size());
}
if (write_eX_) {
init_eX_ = make_const_view(ion_data.init_revpot);
arb_assert(node_index_.size()==init_eX_.size());
}
if (write_Xd_) {
Xd_ = array(ion_data.cv.size(), NAN);
}
if (flags_.reset_xi()
||flags_.reset_xd()) reset_Xi_ = make_const_view(ion_data.reset_iconc);
if (flags_.reset_xi()) init_Xi_ = make_const_view(ion_data.init_iconc);
if (flags_.xi()) Xi_ = make_const_view(ion_data.init_iconc);

if (flags_.reset_xo()) reset_Xo_ = make_const_view(ion_data.reset_econc);
if (flags_.reset_xo()) init_Xo_ = make_const_view(ion_data.init_econc);
if (flags_.xo()) Xo_ = make_const_view(ion_data.init_econc);

if (flags_.reset_ex()) init_eX_ = make_const_view(ion_data.init_revpot);
if (flags_.ex()) eX_ = make_const_view(ion_data.init_revpot);

if (flags_.xd()) Xd_ = make_const_view(ion_data.reset_iconc);
}

void ion_state::init_concentration() {
// NB. not resetting Xd here, it's controlled via the solver.
if (write_Xi_) memory::copy(init_Xi_, Xi_);
if (write_Xo_) memory::copy(init_Xo_, Xo_);
if (flags_.reset_xi()) memory::copy(init_Xi_, Xi_);
if (flags_.reset_xo()) memory::copy(init_Xo_, Xo_);
}

void ion_state::zero_current() {
Expand All @@ -100,13 +80,10 @@ void ion_state::zero_current() {

void ion_state::reset() {
zero_current();
if (write_Xi_) memory::copy(reset_Xi_, Xi_);
if (write_Xo_) memory::copy(reset_Xo_, Xo_);
if (write_eX_) memory::copy(init_eX_, eX_);
// This goes _last_ or at least after Xi since we might have removed reset_Xi
// when Xi is constant. Thus conditionally resetting Xi first and then copying
// Xi -> Xd is save in all cases.
if (write_Xd_) memory::copy(reset_Xi_, Xd_);
if (flags_.reset_xi()) memory::copy(reset_Xi_, Xi_);
if (flags_.reset_xo()) memory::copy(reset_Xo_, Xo_);
if (flags_.reset_ex()) memory::copy(init_eX_, eX_);
if (flags_.reset_xd()) memory::copy(reset_Xi_, Xd_);
}

// istim_state methods:
Expand Down
42 changes: 17 additions & 25 deletions arbor/backends/gpu/shared_state.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,6 @@

namespace arb {
namespace gpu {

/*
* Ion state fields correspond to NMODL ion variables, where X
* is replaced with the name of the ion. E.g. for calcium 'ca':
Expand All @@ -40,30 +39,23 @@ struct ARB_ARBOR_API ion_state {
using solver_type = arb::gpu::diffusion_state<arb_value_type, arb_index_type>;
using solver_ptr = std::unique_ptr<solver_type>;

bool write_eX_:1; // is eX written?
bool write_Xo_:1; // is Xo written?
bool write_Xi_:1; // is Xi written?
bool write_Xd_:1; // is Xd written?
bool read_eX_:1; // is eX read?
bool read_Xo_:1; // is Xo read?
bool read_Xi_:1; // is Xi read?
bool read_Xd_:1; // is Xd read?

iarray node_index_; // Instance to CV map.
array iX_; // (A/m²) current density
array eX_; // (mV) reversal potential
array Xi_; // (mM) internal concentration
array Xd_; // (mM) diffusive concentration
array Xo_; // (mM) external concentration
array gX_; // (kS/m²) per-species conductivity

array init_Xi_; // (mM) area-weighted initial internal concentration
array init_Xo_; // (mM) area-weighted initial external concentration
array reset_Xi_; // (mM) area-weighted user-set internal concentration
array reset_Xo_; // (mM) area-weighted user-set internal concentration
array init_eX_; // (mM) initial reversal potential

array charge; // charge of ionic species (global, length 1)
ion_data_flags flags_; // Track what and when to reset / allocate

iarray node_index_; // Instance to CV map.
array iX_; // (A/m²) current density
array eX_; // (mV) reversal potential
array Xi_; // (mM) internal concentration
array Xd_; // (mM) diffusive concentration
array Xo_; // (mM) external concentration
array gX_; // (kS/m²) per-species conductivity

array init_Xi_; // (mM) area-weighted initial internal concentration
array init_Xo_; // (mM) area-weighted initial external concentration
array reset_Xi_; // (mM) area-weighted user-set internal concentration
array reset_Xo_; // (mM) area-weighted user-set internal concentration
array init_eX_; // (mM) initial reversal potential

array charge; // charge of ionic species (global, length 1)

solver_ptr solver = nullptr;

Expand Down
64 changes: 20 additions & 44 deletions arbor/backends/multicore/shared_state.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,52 +54,31 @@ ion_state::ion_state(const fvm_ion_config& ion_data,
unsigned align,
solver_ptr ptr):
alignment(min_alignment(align)),
write_eX_(ion_data.revpot_written),
write_Xo_(ion_data.econc_written),
write_Xi_(ion_data.iconc_written),
write_Xd_(ion_data.is_diffusive),
read_Xo_(ion_data.econc_written || ion_data.econc_read), // ensure that if we have W access, also R access is flagged
read_Xi_(ion_data.iconc_written || ion_data.iconc_read),
read_Xd_(ion_data.is_diffusive),
flags_{ion_data},
node_index_(ion_data.cv.begin(), ion_data.cv.end(), pad(alignment)),
iX_(ion_data.cv.size(), NAN, pad(alignment)),
eX_(ion_data.init_revpot.begin(), ion_data.init_revpot.end(), pad(alignment)),
gX_(ion_data.cv.size(), NAN, pad(alignment)),
charge(1u, ion_data.charge, pad(alignment)),
solver(std::move(ptr)) {
// We don't need to allocate these if we never use them...
if (read_Xi_) {
Xi_ = {ion_data.init_iconc.begin(), ion_data.init_iconc.end(), pad(alignment)};
}
if (read_Xo_) {
Xo_ = {ion_data.init_econc.begin(), ion_data.init_econc.end(), pad(alignment)};
}
if (write_Xi_ || write_Xd_) {
// ... but this is used by Xd and Xi!
reset_Xi_ = {ion_data.reset_iconc.begin(), ion_data.reset_iconc.end(), pad(alignment)};
}
if (write_Xi_) {
init_Xi_ = {ion_data.init_iconc.begin(), ion_data.init_iconc.end(), pad(alignment)};
arb_assert(node_index_.size()==init_Xi_.size());
}
if (write_Xo_) {
init_Xo_ = {ion_data.init_econc.begin(), ion_data.init_econc.end(), pad(alignment)};
reset_Xo_ = {ion_data.reset_econc.begin(), ion_data.reset_econc.end(), pad(alignment)};
arb_assert(node_index_.size()==init_Xo_.size());
}
if (write_eX_) {
init_eX_ = {ion_data.init_revpot.begin(), ion_data.init_revpot.end(), pad(alignment)};
arb_assert(node_index_.size()==init_eX_.size());
}
if (read_Xd_) {
Xd_ = {ion_data.reset_iconc.begin(), ion_data.reset_iconc.end(), pad(alignment)};
}
if (flags_.reset_xi()
||flags_.reset_xd()) reset_Xi_ = {ion_data.reset_iconc.begin(), ion_data.reset_iconc.end(), pad(alignment)};
if (flags_.reset_xi()) init_Xi_ = {ion_data.init_iconc.begin(), ion_data.init_iconc.end(), pad(alignment)};
if (flags_.xi()) Xi_ = {ion_data.init_iconc.begin(), ion_data.init_iconc.end(), pad(alignment)};

if (flags_.reset_xo()) reset_Xo_ = {ion_data.reset_econc.begin(), ion_data.reset_econc.end(), pad(alignment)};
if (flags_.reset_xo()) init_Xo_ = {ion_data.init_econc.begin(), ion_data.init_econc.end(), pad(alignment)};
if (flags_.xo()) Xo_ = {ion_data.init_econc.begin(), ion_data.init_econc.end(), pad(alignment)};

if (flags_.reset_ex()) init_eX_ = {ion_data.init_revpot.begin(), ion_data.init_revpot.end(), pad(alignment)};
if (flags_.ex()) eX_ = {ion_data.init_revpot.begin(), ion_data.init_revpot.end(), pad(alignment)};

if (flags_.xd()) Xd_ = {ion_data.reset_iconc.begin(), ion_data.reset_iconc.end(), pad(alignment)};
}

void ion_state::init_concentration() {
// NB. not resetting Xd here, it's controlled via the solver.
if (write_Xi_) std::copy(init_Xi_.begin(), init_Xi_.end(), Xi_.begin());
if (write_Xo_) std::copy(init_Xo_.begin(), init_Xo_.end(), Xo_.begin());
if (flags_.reset_xi()) std::copy(init_Xi_.begin(), init_Xi_.end(), Xi_.begin());
if (flags_.reset_xo()) std::copy(init_Xo_.begin(), init_Xo_.end(), Xo_.begin());
}

void ion_state::zero_current() {
Expand All @@ -109,13 +88,10 @@ void ion_state::zero_current() {

void ion_state::reset() {
zero_current();
if (write_Xi_) std::copy(reset_Xi_.begin(), reset_Xi_.end(), Xi_.begin());
if (write_Xo_) std::copy(reset_Xo_.begin(), reset_Xo_.end(), Xo_.begin());
if (write_eX_) std::copy(init_eX_.begin(), init_eX_.end(), eX_.begin());
// This goes _last_ or at least after Xi since we might have removed reset_Xi
// when Xi is constant. Thus conditionally resetting Xi first and then copying
// Xi -> Xd is safe in all cases.
if (write_Xd_) std::copy(Xi_.begin(), Xi_.end(), Xd_.begin());
if (flags_.reset_xi()) std::copy(reset_Xi_.begin(), reset_Xi_.end(), Xi_.begin());
if (flags_.reset_xo()) std::copy(reset_Xo_.begin(), reset_Xo_.end(), Xo_.begin());
if (flags_.reset_ex()) std::copy(init_eX_.begin(), init_eX_.end(), eX_.begin());
if (flags_.reset_xd()) std::copy(reset_Xi_.begin(), reset_Xi_.end(), Xd_.begin());
}

// istim_state methods:
Expand Down
43 changes: 18 additions & 25 deletions arbor/backends/multicore/shared_state.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -48,32 +48,25 @@ struct ARB_ARBOR_API ion_state {
using solver_type = diffusion_solver;
using solver_ptr = std::unique_ptr<solver_type>;

unsigned alignment = 1; // Alignment and padding multiple.
unsigned alignment = 1; // Alignment and padding multiple.

ion_data_flags flags_; // Track what and when to reset / allocate

iarray node_index_; // Instance to CV map.
array iX_; // (A/m²) current density
array eX_; // (mV) reversal potential
array Xi_; // (mM) internal concentration
array Xd_; // (mM) diffusive internal concentration
array Xo_; // (mM) external concentration
array gX_; // (kS/m²) per-species conductivity

array init_Xi_; // (mM) area-weighted initial internal concentration
array init_Xo_; // (mM) area-weighted initial external concentration
array reset_Xi_; // (mM) area-weighted user-set internal concentration
array reset_Xo_; // (mM) area-weighted user-set internal concentration
array init_eX_; // (mV) initial reversal potential

bool write_eX_:1; // is eX written?
bool write_Xo_:1; // is Xo written?
bool write_Xi_:1; // is Xi written?
bool write_Xd_:1; // is Xd written?
bool read_eX_:1; // is eX read?
bool read_Xo_:1; // is Xo read?
bool read_Xi_:1; // is Xi read?
bool read_Xd_:1; // is Xd read?

iarray node_index_; // Instance to CV map.
array iX_; // (A/m²) current density
array eX_; // (mV) reversal potential
array Xi_; // (mM) internal concentration
array Xd_; // (mM) diffusive internal concentration
array Xo_; // (mM) external concentration
array gX_; // (kS/m²) per-species conductivity

array init_Xi_; // (mM) area-weighted initial internal concentration
array init_Xo_; // (mM) area-weighted initial external concentration
array reset_Xi_; // (mM) area-weighted user-set internal concentration
array reset_Xo_; // (mM) area-weighted user-set internal concentration
array init_eX_; // (mV) initial reversal potential

array charge; // charge of ionic species (global value, length 1)
array charge; // charge of ionic species (global value, length 1)

solver_ptr solver = nullptr;

Expand Down
Loading

0 comments on commit b8b768d

Please sign in to comment.