Slow performance of function `dsyev` and `dsyevx` (not fully paralleled) #4758

ajz34 · 2024-06-19T09:11:37Z

Hello developers!

I found that the functions dsyev and dsyevx do not seem to be fully parallelized when built and run with:

- Compiler: gcc (11.4/12.3)
- CPU: AMD (16 cores @ Ryzen 7945HX laptop / 2 cores @ EPYC 7763 on GitHub Actions)
- make options: TARGET=ZEN USE_64BITINT=1 DYNAMIC_ARCH=1 NO_CBLAS=0 NO_LAPACK=0 NO_LAPACKE=0 NO_AFFINITY=1 USE_OPENMP=1

Preliminary testing on an Intel CPU suggests it may show a similar problem.

I'm not sure whether this is a problem with my make configuration, or whether OpenBLAS currently does not implement fully parallel versions of dsyev and dsyevx.
I hope to hear any thoughts or advice, and thanks in advance!

I guess that dsyevr and dsyevd could be better replacements for dsyev. dsyevd is the fastest but consumes more memory, while dsyevr uses a much smaller temporary workspace.
So additionally, as a programmer not very familiar with low-level BLAS/LAPACK, I wonder whether it is common to use dsyevr and dsyevd as eigensolvers instead of dsyev. If so, this may not be such an important issue.
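
The memory trade-off mentioned above can be made concrete with a small sketch. It assumes the minimum workspace sizes documented in the Reference-LAPACK routine headers for the eigenvector case (JOBZ = 'V'); these are minimums, and a workspace query may report larger optimal sizes. The helper names `min_workspace` and `workspace_mib` are hypothetical and exist only for illustration, and the 8-byte integer size is an assumption matching USE_64BITINT=1.

```python
from typing import Tuple


def min_workspace(routine: str, n: int) -> Tuple[int, int]:
    """Return (LWORK in doubles, LIWORK in integers) for an n x n problem.

    Values follow the documented minimums for JOBZ = 'V'; they are not
    the optimal sizes a workspace query would return.
    """
    if routine == "dsyev":
        return max(1, 3 * n - 1), 0          # no integer workspace
    if routine == "dsyevd":
        if n <= 1:
            return 1, 1
        return 1 + 6 * n + 2 * n * n, 3 + 5 * n
    if routine == "dsyevr":
        return max(1, 26 * n), max(1, 10 * n)
    if routine == "dsyevx":
        return (8 * n if n > 1 else 1), 5 * n
    raise ValueError(f"unknown routine: {routine}")


def workspace_mib(routine: str, n: int, int_bytes: int = 8) -> float:
    """Total minimum workspace in MiB (doubles are 8 bytes each)."""
    lwork, liwork = min_workspace(routine, n)
    return (lwork * 8 + liwork * int_bytes) / 2**20


if __name__ == "__main__":
    for name in ("dsyev", "dsyevd", "dsyevr", "dsyevx"):
        print(f"{name}: {workspace_mib(name, 2048):.3f} MiB")
```

The quadratic term in the dsyevd formula is what dominates for a 2048 x 2048 matrix, while the other three routines only need workspace linear in n.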

Benchmark results (16 cores @ Ryzen 7945HX)

- CPU ratio refers to (CPU time) / (elapsed time), so a value close to the core count indicates that all cores were busy.
- Test problem: eigendecomposition of a 2048 x 2048 matrix, eigenvectors required, filled with values from 1 -- 2048.

| function | OpenBLAS 0.3.27 elapsed time | OpenBLAS 0.3.27 CPU ratio | MKL 2024.1 elapsed time | MKL 2024.1 CPU ratio |
|---|---|---|---|---|
| `dsyev` | 4861.5 msec | 1.65 | 588.7 msec | 15.75 |
| `dsyevd` | 392.2 msec | 13.69 | 225.9 msec | 15.42 |
| `dsyevr` | 805.6 msec | 6.30 | 698.6 msec | 4.69 |
| `dsyevx` | 3969.8 msec | 1.82 | 542.2 msec | 15.76 |
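
For readers who want to reproduce the CPU ratio metric, here is a minimal sketch of one way to measure it. It is not the script used for the numbers above; the helper name `cpu_ratio` is hypothetical, and it relies on Python's `time.process_time()` reporting CPU time summed over all threads of the process.

```python
import time


def cpu_ratio(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds, CPU time / elapsed time)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of the process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = (cpu / wall) if wall > 0 else float("nan")
    return result, wall, ratio


if __name__ == "__main__":
    # A single-threaded busy loop; a threaded LAPACK call would go here instead.
    _, wall, ratio = cpu_ratio(sum, range(5_000_000))
    print(f"elapsed {wall * 1e3:.1f} msec, CPU ratio {ratio:.2f}")
```

A routine that keeps all 16 cores busy should report a ratio approaching 16, whereas a mostly serial routine stays near 1, which is the pattern the table shows for `dsyev` and `dsyevx` with OpenBLAS.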

A reproduction of this issue can be found in a GitHub Actions CI run (2 physical cores @ EPYC 7763): https://github.com/ajz34/issue_openblas_dsyev/actions/runs/9578584638/job/26409147609

For the scripts used on the 16-core Ryzen 7945HX, see https://github.com/ajz34/issue_openblas_dsyev/tree/16-cores-Ryzen-7945HX.


martin-frbg · 2024-06-19T11:01:51Z

The LAPACK included in OpenBLAS is almost completely copied from the reference implementation, also known as "netlib" LAPACK (https://github.com/Reference-LAPACK/lapack), which is not optimized for speed (and not parallelized except for a few functions that can use OpenMP parallelism if available). Only a handful of functions (such as getrf/potrf) have been reimplemented in OpenBLAS; for everything else, the only performance advantage over the reference implementation comes from using the optimized BLAS functions.
(And yes, I would expect that one "normally" prefers DSYEVD or DSYEVR over the original QR algorithm; see e.g. https://scicomp.stackexchange.com/questions/11827/flop-counts-for-lapack-symmetric-eigenvalue-routines-dsyev-dsyevd-dsyevx-and-d )

ajz34 · 2024-06-19T13:11:22Z

@martin-frbg
I searched for dsyev in the issue list and found few issues discussing this topic. But I had not expected that this also happens to zheev, in issues tagged with the "LAPACK issue" label 😹
And thanks for the explanation! 😄

martin-frbg · 2024-06-19T13:50:24Z

Well, it might be a useful coincidence if the bottleneck in DSYEV turned out to be the (D)LASR function too, but I have not checked.

martin-frbg added the LAPACK issue Deficiency in code imported from Reference-LAPACK label Jun 19, 2024

ajz34 mentioned this issue Oct 10, 2024

CMake causes slow eigenvalue decomposition (dsyev, dspgv, etc.) #4931

Closed
