Trilinos Master Merge PR Generator: Auto PR created to promote from master_merge_20240301_175926 branch to master #12793

Merged: 31 commits merged into master from master_merge_20240301_175926 on Mar 2, 2024.
Changes from all 31 commits.

Commits:
b0ceb69
Fix hypergraph crashing when local IDs aren't used
lkotipal Feb 9, 2024
f5ba916
Prevent hierarchical partitioning from doubly freeing communicators
lkotipal Feb 9, 2024
07ab17c
Stokhos: Fix Tpetra-related build errors with UVM enabled.
etphipp Feb 24, 2024
5bf7427
ifpack2 : split extract of SparseContainer to extractGraph and extrac…
iyamazaki Feb 24, 2024
313e4a4
ifpack2 : add timers to SparseContainer
iyamazaki Feb 25, 2024
307304e
Add URLs to release notes
cgcgcg Feb 25, 2024
62e7221
ifpack2 : add tester for BlockRelaxation with Amesos2
iyamazaki Feb 26, 2024
bc8999b
Merge Pull Request #12774 from cgcgcg/Trilinos/releaseNotes15.1
trilinos-autotester Feb 26, 2024
603334a
Update version strings post 15.1.0 release for develop
achauphan Feb 26, 2024
d49940c
Updated RELEASE_NOTES noting the switch to semantic versioning
achauphan Feb 26, 2024
ce68165
Merge Pull Request #12771 from etphipp/Trilinos/stokhos_cuda_uvm
trilinos-autotester Feb 27, 2024
13aa3f7
Add PR UVM build config
sebrowne Feb 28, 2024
d871563
Merge Pull Request #12781 from sebrowne/Trilinos/uvm_updates
trilinos-autotester Feb 28, 2024
21257b9
Krino: Snapshot 02-28-24 10:34 from Sierra 5.17.7-196-gef0c0e5f (#12782)
drnobleabq Feb 28, 2024
ae6a065
Sacado: update mat_vec performance test for HIP backend
rppawlo Feb 28, 2024
e39fae2
Panzer: updates for HIP backend
rppawlo Feb 28, 2024
3e997d6
Merge pull request #12731 from lkotipal/fix-crashes
ndellingwood Feb 29, 2024
cabbf69
Tpetra_Details_Behavior.cpp: add missing <array> include
cwpearson Feb 29, 2024
81b9eb2
Merge pull request #12775 from iyamazaki/ifpack2-blkRelax
iyamazaki Feb 29, 2024
456b133
Merge Pull Request #12787 from cwpearson/Trilinos/fix/12773
trilinos-autotester Feb 29, 2024
a46263e
add getAutomaticNSubparts
kliegeois Feb 29, 2024
ae5f385
Merge Pull Request #12785 from rppawlo/Trilinos/sacado-mat-vec-perf-t…
trilinos-autotester Feb 29, 2024
9725f81
Panzer MiniEM: Rewrite interpolation assembly
cgcgcg Feb 27, 2024
41d062d
Teko Utilities: Add explicitScale
cgcgcg Feb 28, 2024
183f7a6
Panzer MiniEM: Reduce number of assembled operators
cgcgcg Feb 28, 2024
846c0c7
Panzer MiniEM: Remove unneeded matrix from maxwell-large input deck
cgcgcg Feb 29, 2024
d230b84
guard the sort not to be run with Jacobi
kliegeois Feb 29, 2024
c501a6d
Merge pull request #12777 from achauphan/post-15-1-version-update
achauphan Feb 29, 2024
7379acb
Merge Pull Request #12788 from cgcgcg/Trilinos/miniemInterpolation
trilinos-autotester Feb 29, 2024
29e57ff
Modify num_teams to account for the vector length
kliegeois Mar 1, 2024
a469209
Merge Pull Request #12790 from kliegeois/Trilinos/getAutomaticNSubparts
trilinos-autotester Mar 1, 2024
20 changes: 19 additions & 1 deletion RELEASE_NOTES
@@ -1,75 +1,93 @@

###############################################################################
# #
# Trilinos Release 15.1 Release Notes TBD, 2024 #
# Trilinos Release 15.1.0 Release Notes February 26, 2024 #
# #
###############################################################################

Amesos2

- The interface to SuperLU_DIST now also works for the CUDA-enabled
variant of the library.
https://github.com/trilinos/Trilinos/pull/12524


Framework

- Began using semantic versioning for Trilinos with 15.1.0 release.


Ifpack2

- BlockRelaxation can now generate blocks using a Zoltan2.
https://github.com/trilinos/Trilinos/pull/12728
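As an illustration of how a user might request such blocks, a hedged sketch follows; the "zoltan2" value for "partitioner: type" is an assumed name for the new option (see PR #12728 above for the actual interface), while "relaxation: type", "relaxation: sweeps", and "partitioner: local parts" are existing BlockRelaxation parameters.

#include <iostream>
#include <Teuchos_ParameterList.hpp>

int main() {
  // Hypothetical parameter list for an Ifpack2::BlockRelaxation preconditioner.
  Teuchos::ParameterList params("Ifpack2::BlockRelaxation");
  params.set("relaxation: type", "Jacobi");
  params.set("relaxation: sweeps", 1);
  params.set("partitioner: type", "zoltan2");   // assumed name of the Zoltan2-backed option
  params.set("partitioner: local parts", 8);    // number of blocks to generate
  params.print(std::cout);
  return 0;
}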


Kokkos & Kokkos Kernels

- Inclusion of version 4.2.1 of Kokkos and Kokkos Kernels
https://github.com/trilinos/Trilinos/pull/12707


MueLu

- The reformulated Maxwell solver (RefMaxwell) was generalized to
also work for grad-div / Darcy flow problems.
https://github.com/trilinos/Trilinos/pull/12142

- In an effort to consolidate the old non-Kokkos code path with the
newer Kokkos code path, the following factories were deprecated
and should be removed from input decks: NullspaceFactory_kokkos,
SaPFactory_kokkos, UncoupledAggregationFactory_kokkos.
https://github.com/trilinos/Trilinos/pull/12720
https://github.com/trilinos/Trilinos/pull/12740


Panzer

- MiniEM can now also assemble and solve Darcy problems using first
or higher order mixed finite elements.
https://github.com/trilinos/Trilinos/pull/12142


PyTrilinos2

- New package that auto-generates Python interfaces for Trilinos
packages. Currently, most of Tpetra is exposed. We are planning on
adding other packages.
https://github.com/trilinos/Trilinos/pull/12332


ROL

- An auto-generated Python interface was added. A standalone Python
package can be downloaded from rol.sandia.gov
https://github.com/trilinos/Trilinos/pull/12770


Teko

- Block Jacobi and Gauss-Seidel methods allow now to specify
preconditioners for the iterative solves of the diagonal blocks.
https://github.com/trilinos/Trilinos/pull/12675


Tpetra

- Tpetra will now assume by default that the MPI library is GPU
aware, unless automatic detection or the user indicates otherwise.
https://github.com/trilinos/Trilinos/pull/12517

- Reject unrecognized TPETRA_* environment variable. Misspelled or
removed environment variables are no longer silently ignored.
https://github.com/trilinos/Trilinos/pull/12722

- In order to allocate in shared host/device space (i.e.
CudaUVMSpace, HIPManagedSpace or SYCLSharedUSMSpace) by default,
please use the CMake options
KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON
Tpetra_ALLOCATE_IN_SHARED_SPACE=ON
https://github.com/trilinos/Trilinos/pull/12622


###############################################################################
6 changes: 3 additions & 3 deletions Version.cmake
@@ -59,10 +59,10 @@
# for release mode and set the version.
#

SET(Trilinos_VERSION 15.1)
SET(Trilinos_VERSION 15.2.0)
SET(Trilinos_MAJOR_VERSION 15)
SET(Trilinos_MAJOR_MINOR_VERSION 150100)
SET(Trilinos_VERSION_STRING "15.1 (Dev)")
SET(Trilinos_MAJOR_MINOR_VERSION 150200)
SET(Trilinos_VERSION_STRING "15.2.0-dev")
SET(Trilinos_ENABLE_DEVELOPMENT_MODE_DEFAULT ON) # Change to 'OFF' for a release

# Used by testing scripts and should not be used elsewhere
4 changes: 2 additions & 2 deletions packages/framework/ini-files/config-specs.ini
@@ -2409,9 +2409,9 @@ use CUDA11-RUN-SERIAL-TESTS
opt-set-cmake-var ROL_example_PinT_parabolic-control_AugmentedSystem_test_MPI_2_DISABLE BOOL FORCE : ON


[rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_all]
[rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_pr]
use rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables
use PACKAGE-ENABLES|ALL
use PACKAGE-ENABLES|PR
opt-set-cmake-var Trilinos_ENABLE_TESTS BOOL FORCE : OFF


2 changes: 1 addition & 1 deletion packages/ifpack2/src/Ifpack2_BlockTriDiContainer_def.hpp
@@ -187,7 +187,7 @@ namespace Ifpack2 {
const bool useSeqMethod = false;
const bool overlapCommAndComp = false;
initInternal(matrix, importer, overlapCommAndComp, useSeqMethod);
n_subparts_per_part_ = 1;
n_subparts_per_part_ = -1;
IFPACK2_BLOCKHELPER_TIMER_FENCE(typename BlockHelperDetails::ImplType<MatrixType>::execution_space)
}

112 changes: 98 additions & 14 deletions packages/ifpack2/src/Ifpack2_BlockTriDiContainer_impl.hpp
@@ -842,22 +842,83 @@
return Teuchos::null;
}

template<typename local_ordinal_type>
local_ordinal_type costTRSM(const local_ordinal_type block_size) {
return block_size*block_size;
}

template<typename local_ordinal_type>
local_ordinal_type costGEMV(const local_ordinal_type block_size) {
return 2*block_size*block_size;
}

template<typename local_ordinal_type>
local_ordinal_type costTriDiagSolve(const local_ordinal_type subline_length, const local_ordinal_type block_size) {
return 2 * subline_length * costTRSM(block_size) + 2 * (subline_length-1) * costGEMV(block_size);
}
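A quick worked check of the model above (not part of the PR): with subline_length = 4 and block_size = 3, costTRSM(3) = 9 and costGEMV(3) = 18, so costTriDiagSolve(4, 3) = 2*4*9 + 2*3*18 = 72 + 108 = 180 modeled flops for one subline solve.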

template<typename local_ordinal_type>
local_ordinal_type costSolveSchur(const local_ordinal_type num_parts,
const local_ordinal_type num_teams,
const local_ordinal_type line_length,
const local_ordinal_type block_size,
const local_ordinal_type n_subparts_per_part) {
const local_ordinal_type subline_length = ceil(double(line_length - (n_subparts_per_part-1) * 2) / n_subparts_per_part);
if (subline_length < 1) {
return INT_MAX;
}

const local_ordinal_type p_n_lines = ceil(double(num_parts)/num_teams);
const local_ordinal_type p_n_sublines = ceil(double(n_subparts_per_part)*num_parts/num_teams);
const local_ordinal_type p_n_sublines_2 = ceil(double(n_subparts_per_part-1)*num_parts/num_teams);

const local_ordinal_type p_costApplyE = p_n_sublines_2 * subline_length * 2 * costGEMV(block_size);
const local_ordinal_type p_costApplyS = p_n_lines * costTriDiagSolve((n_subparts_per_part-1)*2,block_size);
const local_ordinal_type p_costApplyAinv = p_n_sublines * costTriDiagSolve(subline_length,block_size);
const local_ordinal_type p_costApplyC = p_n_sublines_2 * 2 * costGEMV(block_size);

if (n_subparts_per_part == 1) {
return p_costApplyAinv;
}
return p_costApplyE + p_costApplyS + p_costApplyAinv + p_costApplyC;
}

template<typename local_ordinal_type>
local_ordinal_type getAutomaticNSubparts(const local_ordinal_type num_parts,
const local_ordinal_type num_teams,
const local_ordinal_type line_length,
const local_ordinal_type block_size) {
local_ordinal_type n_subparts_per_part_0 = 1;
local_ordinal_type flop_0 = costSolveSchur(num_parts, num_teams, line_length, block_size, n_subparts_per_part_0);
local_ordinal_type flop_1 = costSolveSchur(num_parts, num_teams, line_length, block_size, n_subparts_per_part_0+1);
while (flop_0 > flop_1) {
flop_0 = flop_1;
flop_1 = costSolveSchur(num_parts, num_teams, line_length, block_size, (++n_subparts_per_part_0)+1);
}
return n_subparts_per_part_0;
}
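For reference, a minimal standalone sketch (not part of this PR) of the search above, with local_ordinal_type fixed to int and purely illustrative problem sizes: the four terms of costSolveSchur model the decoupled subline solves (Ainv), the interface Schur-complement solve (S), and the application of the coupling blocks (E, C), and getAutomaticNSubparts keeps increasing the number of subparts per part while the modeled flop count decreases.

#include <climits>
#include <cmath>
#include <cstdio>

// Illustrative stand-ins for the templated helpers above, with local_ordinal_type = int.
int costTRSM(int block_size) { return block_size * block_size; }

int costGEMV(int block_size) { return 2 * block_size * block_size; }

// Modeled flops for one block-tridiagonal solve over a subline: two triangular
// solves per block row plus two GEMVs per off-diagonal coupling.
int costTriDiagSolve(int subline_length, int block_size) {
  return 2 * subline_length * costTRSM(block_size)
       + 2 * (subline_length - 1) * costGEMV(block_size);
}

// Modeled per-team cost of the Schur-complement apply for a given splitting.
int costSolveSchur(int num_parts, int num_teams, int line_length,
                   int block_size, int n_subparts_per_part) {
  const int subline_length = (int)std::ceil(
      double(line_length - (n_subparts_per_part - 1) * 2) / n_subparts_per_part);
  if (subline_length < 1) return INT_MAX;  // splitting finer than the line allows

  const int p_n_lines      = (int)std::ceil(double(num_parts) / num_teams);
  const int p_n_sublines   = (int)std::ceil(double(n_subparts_per_part) * num_parts / num_teams);
  const int p_n_sublines_2 = (int)std::ceil(double(n_subparts_per_part - 1) * num_parts / num_teams);

  const int costApplyE    = p_n_sublines_2 * subline_length * 2 * costGEMV(block_size);
  const int costApplyS    = p_n_lines * costTriDiagSolve((n_subparts_per_part - 1) * 2, block_size);
  const int costApplyAinv = p_n_sublines * costTriDiagSolve(subline_length, block_size);
  const int costApplyC    = p_n_sublines_2 * 2 * costGEMV(block_size);

  return (n_subparts_per_part == 1)
             ? costApplyAinv
             : costApplyE + costApplyS + costApplyAinv + costApplyC;
}

// Greedy search: keep splitting while the modeled cost decreases.
int getAutomaticNSubparts(int num_parts, int num_teams, int line_length, int block_size) {
  int n = 1;
  int flop_0 = costSolveSchur(num_parts, num_teams, line_length, block_size, n);
  int flop_1 = costSolveSchur(num_parts, num_teams, line_length, block_size, n + 1);
  while (flop_0 > flop_1) {
    flop_0 = flop_1;
    flop_1 = costSolveSchur(num_parts, num_teams, line_length, block_size, (++n) + 1);
  }
  return n;
}

int main() {
  // Made-up sizes, only to exercise the search.
  printf("many parts, few teams : %d\n", getAutomaticNSubparts(64, 8, 200, 5));
  printf("few parts, many teams : %d\n", getAutomaticNSubparts(4, 16, 200, 5));
  return 0;
}

With these made-up sizes the first configuration (many parts per team) stays at 1 subpart per part, while the second (a few long lines and many teams) settles on 4; this is the choice that now happens automatically when n_subparts_per_part_ is left at its new default of -1.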

template<typename ArgActiveExecutionMemorySpace>
struct SolveTridiagsDefaultModeAndAlgo;

///
/// setup part interface using the container partitions array
///
template<typename MatrixType>
BlockHelperDetails::PartInterface<MatrixType>
createPartInterface(const Teuchos::RCP<const typename BlockHelperDetails::ImplType<MatrixType>::tpetra_block_crs_matrix_type> &A,
const Teuchos::Array<Teuchos::Array<typename BlockHelperDetails::ImplType<MatrixType>::local_ordinal_type> > &partitions,
const typename BlockHelperDetails::ImplType<MatrixType>::local_ordinal_type n_subparts_per_part) {
const typename BlockHelperDetails::ImplType<MatrixType>::local_ordinal_type n_subparts_per_part_in) {
IFPACK2_BLOCKHELPER_TIMER("createPartInterface");
using impl_type = BlockHelperDetails::ImplType<MatrixType>;
using local_ordinal_type = typename impl_type::local_ordinal_type;
using local_ordinal_type_1d_view = typename impl_type::local_ordinal_type_1d_view;
using local_ordinal_type_2d_view = typename impl_type::local_ordinal_type_2d_view;
using size_type = typename impl_type::size_type;

const auto blocksize = A->getBlockSize();
constexpr int vector_length = impl_type::vector_length;
constexpr int internal_vector_length = impl_type::internal_vector_length;

const auto comm = A->getRowMap()->getComm();

@@ -867,6 +928,40 @@
const local_ordinal_type A_n_lclrows = A->getLocalNumRows();
const local_ordinal_type nparts = jacobi ? A_n_lclrows : partitions.size();

typedef std::pair<local_ordinal_type,local_ordinal_type> size_idx_pair_type;
std::vector<size_idx_pair_type> partsz(nparts);

if (!jacobi) {
for (local_ordinal_type i=0;i<nparts;++i)
partsz[i] = size_idx_pair_type(partitions[i].size(), i);
std::sort(partsz.begin(), partsz.end(),
[] (const size_idx_pair_type& x, const size_idx_pair_type& y) {
return x.first > y.first;
});
}

local_ordinal_type n_subparts_per_part;
if (n_subparts_per_part_in == -1) {
// If the number of subparts is set to -1, the user let the algorithm
// decides the value automatically
using execution_space = typename impl_type::execution_space;

const int line_length = partsz[0].first;

const local_ordinal_type team_size =
SolveTridiagsDefaultModeAndAlgo<typename execution_space::memory_space>::
recommended_team_size(blocksize, vector_length, internal_vector_length);

const local_ordinal_type num_teams = execution_space().concurrency() / (team_size * vector_length);

n_subparts_per_part = getAutomaticNSubparts(nparts, num_teams, line_length, blocksize);

printf("Automatically chosen n_subparts_per_part = %d for nparts = %d, num_teams = %d, team_size = %d, line_length = %d, and blocksize = %d;\n", n_subparts_per_part, nparts, num_teams, team_size, line_length, blocksize);
}
else {
n_subparts_per_part = n_subparts_per_part_in;
}

// Total number of sub lines:
const local_ordinal_type n_sub_parts = nparts * n_subparts_per_part;
// Total number of sub lines + the Schur complement blocks.
@@ -896,14 +991,6 @@
// reorder parts to maximize simd packing efficiency
p.resize(nparts);

typedef std::pair<local_ordinal_type,local_ordinal_type> size_idx_pair_type;
std::vector<size_idx_pair_type> partsz(nparts);
for (local_ordinal_type i=0;i<nparts;++i)
partsz[i] = size_idx_pair_type(partitions[i].size(), i);
std::sort(partsz.begin(), partsz.end(),
[] (const size_idx_pair_type& x, const size_idx_pair_type& y) {
return x.first > y.first;
});
for (local_ordinal_type i=0;i<nparts;++i)
p[i] = partsz[i].second;

@@ -2074,9 +2161,6 @@
};
#endif

template<typename ArgActiveExecutionMemorySpace>
struct SolveTridiagsDefaultModeAndAlgo;

template<typename impl_type, typename WWViewType>
KOKKOS_INLINE_FUNCTION
void
@@ -3251,7 +3335,7 @@ namespace Ifpack2 {

{
#ifdef IFPACK2_BLOCKTRIDICONTAINER_USE_PRINTF
printf("Star ComputeSchurTag\n");
printf("Start ComputeSchurTag\n");
#endif
IFPACK2_BLOCKHELPER_TIMER("BlockTriDi::NumericPhase::ComputeSchurTag");
writeBTDValuesToFile(part2packrowidx0_sub.extent(0), scalar_values_schur, "before_schur.mm");
@@ -3270,7 +3354,7 @@

{
#ifdef IFPACK2_BLOCKTRIDICONTAINER_USE_PRINTF
printf("Star FactorizeSchurTag\n");
printf("Start FactorizeSchurTag\n");
#endif
IFPACK2_BLOCKHELPER_TIMER("BlockTriDi::NumericPhase::FactorizeSchurTag");
Kokkos::TeamPolicy<execution_space,FactorizeSchurTag>
4 changes: 3 additions & 1 deletion packages/ifpack2/src/Ifpack2_SparseContainer_decl.hpp
@@ -172,7 +172,7 @@ class SparseContainer
using inverse_mv_type = Tpetra::MultiVector<InverseScalar, InverseLocalOrdinal, InverseGlobalOrdinal, InverseNode>;
using InverseCrs = Tpetra::CrsMatrix<InverseScalar, InverseLocalOrdinal, InverseGlobalOrdinal, InverseNode>;
using InverseMap = typename Tpetra::Map<InverseLocalOrdinal, InverseGlobalOrdinal, InverseNode>;

using InverseGraph = typename InverseCrs::crs_graph_type;
using typename Container<MatrixType>::HostView;
using typename Container<MatrixType>::ConstHostView;
using HostViewInverse = typename inverse_mv_type::dual_view_type::t_host;
@@ -287,6 +287,8 @@ class SparseContainer

//! Extract the submatrices identified by the local indices set by the constructor.
void extract ();
void extractGraph ();
void extractValues ();

/// \brief Post-permutation, post-view version of apply().
///