LB particle coupling: performance drag due to exceptions #4752

Closed
RudolfWeeber opened this issue Jul 14, 2023 · 2 comments · Fixed by #4757

@RudolfWeeber
Contributor

RudolfWeeber commented Jul 14, 2023

Some recent changes in the waLBerla branch seem to have introduced a lot of (caught) exceptions in the particle coupling code. As a result, the stack-unwinding functions of the GCC runtime library (libgcc_s) are among the most dominant contributions to the runtime.

A perf report of the LB benchmark with 10 LB sites per particle and 10,000 particles per core:

Overhead  Command     Shared Object                                      Symbol
........  ..........  .................................................  .......................................................................................................................................................
   8.20%  python3.10  espresso_core.so                                   [.] RegularDecomposition::init_cell_interactions
   6.58%  python3.10  libgcc_s.so.1                                      [.] _Unwind_Find_FDE
   4.57%  python3.10  ld-linux-x86-64.so.2                               [.] _dl_find_object
   3.15%  python3.10  espresso_walberla.so                               [.] walberla::pystencils::internal_streamsweepsingleprecision_streamsweepsingleprecision::streamsweepsingleprecision_streamsweepsingleprecision
   3.08%  python3.10  espresso_core.so                                   [.] integrate
   2.72%  python3.10  espresso_core.so                                   [.] force_calc
   2.63%  python3.10  espresso_walberla.so                               [.] walberla::get_block_and_cell
   2.27%  python3.10  espresso_core.so                                   [.] LB::get_agrid
   2.06%  python3.10  espresso_core.so                                   [.] friction_thermo_langevin

As can be seen, the stack unwinding is more expensive than the stream/collide.

 1.87%  python3.10  espresso_core.so                                   [.] CellStructure::verlet_list_loop<force_calc(CellStructure&, double, double)::{lambda(Particle&, Particle&, Distance const&)#2}, VerletCriterion<GetNonbondedCutoff> >
 1.85%  python3.10  espresso_core.so                                   [.] in_local_halo
 1.70%  python3.10  libgcc_s.so.1                                      [.] 0x0000000000014b67
 1.57%  python3.10  espresso_walberla.so                               [.] walberla::pystencils::internal_48c9ee502281a70505dce0378c55abd5::collidesweepsingleprecisionthermalizedavx_collidesweepsingleprecisionthermalizedavx
 1.39%  python3.10  libgcc_s.so.1                                      [.] 0x00000000000157eb
 1.37%  python3.10  espresso_core.so                                   [.] add_non_bonded_pair_force
 1.26%  python3.10  espresso_core.so                                   [.] resort_particles_if_needed

The call graph shows that the exception is thrown (and apparently caught) during LB velocity interpolation. It seems to be thrown in an internal routine of the waLBerla domain decomposition when retrieving the data from the field.
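To illustrate why caught exceptions in a hot loop show up as unwinder symbols such as `_Unwind_Find_FDE`, here is a minimal sketch. It is not the waLBerla or ESPResSo code: the functions `lookup_throwing`, `lookup_nothrow` and the two `sum_*` helpers are hypothetical names made up for this example. Both loops compute the same result, but the first one pays for a throw and a stack unwind on every miss, while the second reports a miss through its return value.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical lookup that signals a miss by throwing.
double lookup_throwing(std::vector<double> const &field, std::size_t i) {
  if (i >= field.size()) {
    throw std::out_of_range("cell is not on this block");
  }
  return field[i];
}

// Same lookup, but a miss is reported as a null pointer (no unwinding).
double const *lookup_nothrow(std::vector<double> const &field, std::size_t i) {
  return (i < field.size()) ? &field[i] : nullptr;
}

// Sums all valid cells; every invalid index triggers throw + catch.
double sum_with_exceptions(std::vector<double> const &field,
                           std::vector<std::size_t> const &indices) {
  double sum = 0.;
  for (auto const i : indices) {
    try {
      sum += lookup_throwing(field, i);
    } catch (std::out_of_range const &) {
      // miss: skip this index (the unwinder has already run)
    }
  }
  return sum;
}

// Sums all valid cells; invalid indices cost only a branch.
double sum_without_exceptions(std::vector<double> const &field,
                              std::vector<std::size_t> const &indices) {
  double sum = 0.;
  for (auto const i : indices) {
    if (auto const *value = lookup_nothrow(field, i)) {
      sum += *value;
    }
  }
  return sum;
}
```

When the miss path is taken once per particle per time step, the first variant would be expected to spend a noticeable share of its runtime in the unwinder, which is consistent with the profile above.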

@jngrad
Member

jngrad commented Jul 18, 2023

Does this help?

diff --git a/src/walberla_bridge/src/lattice_boltzmann/LBWalberlaImpl.hpp b/src/walberla_bridge/src/lattice_boltzmann/LBWalberlaImpl.hpp
index 6c230b158..1bcfbb182 100644
--- a/src/walberla_bridge/src/lattice_boltzmann/LBWalberlaImpl.hpp
+++ b/src/walberla_bridge/src/lattice_boltzmann/LBWalberlaImpl.hpp
@@ -590,3 +590,3 @@ class LBWalberlaImpl : public LBWalberlaBase {
 
-    auto field = bc->block->template getData<VectorField>(m_velocity_field_id);
+    auto field = bc->block->template uncheckedFastGetData<VectorField>(m_velocity_field_id);
     auto const vec = lbm::accessor::Vector::get(field, bc->cell);
@@ -917,3 +917,3 @@ class LBWalberlaImpl : public LBWalberlaBase {
 
-    auto pdf_field = bc->block->template getData<PdfField>(m_pdf_field_id);
+    auto pdf_field = bc->block->template uncheckedFastGetData<PdfField>(m_pdf_field_id);
     auto const density = lbm::accessor::Density::get(pdf_field, bc->cell);
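
The patch above replaces a checked getter with an unchecked one. The general trade-off can be sketched as follows. This is only an illustration under assumptions: `ErasedData`, `checked_get` and `unchecked_get` are hypothetical names, and the sketch does not reproduce how waLBerla implements `getData` or `uncheckedFastGetData` internally.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical type-erased storage slot (not the waLBerla class).
class ErasedData {
public:
  template <typename T>
  explicit ErasedData(std::shared_ptr<T> ptr)
      : m_ptr(std::move(ptr)), m_type(&typeid(T)) {}

  // Checked access: verifies the requested type on every call.
  template <typename T> T *checked_get() const {
    if (*m_type != typeid(T)) {
      throw std::runtime_error("requested type does not match stored type");
    }
    return static_cast<T *>(m_ptr.get());
  }

  // Unchecked access: the caller guarantees that T is the stored type.
  // Requesting the wrong type is undefined behavior.
  template <typename T> T *unchecked_get() const noexcept {
    return static_cast<T *>(m_ptr.get());
  }

private:
  std::shared_ptr<void> m_ptr;  // type-erased owner of the data
  std::type_info const *m_type; // remembers the stored type for the check
};
```

The unchecked variant is only safe when the type behind an identifier is fixed by construction, as one would assume for a field ID that is always registered with the same field type. In that case the per-call check adds cost without adding safety in the hot path.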

@jngrad jngrad changed the title from "LB particel coupling: performance drag due to exceptions" to "LB particle coupling: performance drag due to exceptions" Jul 18, 2023
@jngrad
Member

jngrad commented Jul 18, 2023

Using the same benchmark with the same parameters on 1 MPI rank, runtime goes from 175 ms/loop to 45 ms/loop.

Before the patch:

  Children      Self  Command     Shared Object  Symbol
    13.95%     6.62%  python3.10  libgcc_s.so.1  [.] _Unwind_Find_FDE

After the patch:

  Children      Self  Command     Shared Object  Symbol
     0.16%     0.08%  python3.10  libgcc_s.so.1  [.] _Unwind_Find_FDE

@jngrad jngrad self-assigned this Jul 18, 2023
@kodiakhq kodiakhq bot closed this as completed in #4757 Jul 19, 2023
kodiakhq bot added a commit that referenced this issue Jul 19, 2023
Fixes #4752

Description of changes:
- skip expensive runtime checks on type-erased waLBerla fields during particle coupling