OpenBLAS 0.3.27 version

@martin-frbg martin-frbg released this 04 Apr 20:33
· 402 commits to release-0.3.0 since this release
ce3f668

general:

  • added initial (generic) support for the CSKY architecture
  • capped the maximum number of threads used in GEMM, GETRF and POTRF to avoid creating
    underutilized or idle threads
  • sped up multithreaded POTRF on all platforms
  • added extension openblas_set_num_threads_local() that returns the previous thread count
  • re-evaluated the SGEMV and DGEMV load thresholds to avoid activating multithreading
    for too small workloads
  • improved the fallback code used when the thread limit set at build time is exceeded,
    and made it callable multiple times during the lifetime of an instance
  • added CBLAS interfaces for the BLAS extensions ?AMIN, ?AMAX, CAXPYC and ZAXPYC
  • fixed a potential buffer overflow in the interface to the GEMMT kernels
  • fixed use of incompatible pointer types in GEMMT and C/ZAXPBY as flagged by GCC-14
  • fixed unwanted case sensitivity of the character parameters in ?TRTRS
  • sped up the OpenMP thread management code
  • fixed sizing of logical variables in INTERFACE64 builds of the C version of LAPACK
  • fixed inclusion of new LAPACK and LAPACKE functions from LAPACK 3.11 in the shared library
  • added a testsuite for the BLAS extensions
  • modified the error thresholds for SGS/DGS functions in the LAPACK testsuite to suppress
    spurious errors
  • added support for building the benchmark collection with CMAKE
  • added rewriting of linker options to avoid linking both libgomp and libomp in CMAKE builds
    with OpenMP enabled that use clang with gfortran
  • fixed building on systems with uClibc
  • added support for calling ?NRM2 with a negative increment value on all architectures
  • added support for the LLVM18 version of the flang-new compiler
  • fixed handling of the OPENBLAS_LOOPS variable in several benchmarks
  • integrated fixes from the Reference-LAPACK project:
    • increased accuracy in C/ZLARFGP (Reference-LAPACK PR 981)
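
Because openblas_set_num_threads_local() returns the previous thread count, it lends itself to a save-and-restore pattern. The following is a minimal Python sketch of that pattern, not part of OpenBLAS itself. It assumes the C prototype is `int openblas_set_num_threads_local(int)` and that the shared library can be found under the name "openblas"; the helper names `scoped_threads` and `load_setter` are hypothetical.

```python
import ctypes
import ctypes.util
from contextlib import contextmanager


@contextmanager
def scoped_threads(setter, num_threads):
    """Temporarily set the thread count, restoring the old value on exit.

    `setter` is any callable that sets a new count and returns the
    previous one, as described for openblas_set_num_threads_local().
    """
    previous = setter(num_threads)
    try:
        yield previous
    finally:
        # Restore the count that was active before entering the block.
        setter(previous)


def load_setter(library_name="openblas"):
    """Return the OpenBLAS setter via ctypes, or None if unavailable.

    Assumption: the prototype is int openblas_set_num_threads_local(int).
    """
    path = ctypes.util.find_library(library_name)
    if path is None:
        return None
    try:
        func = ctypes.CDLL(path).openblas_set_num_threads_local
    except (OSError, AttributeError):
        # Library not loadable, or it predates this extension.
        return None
    func.argtypes = [ctypes.c_int]
    func.restype = ctypes.c_int
    return func
```

A caller might obtain `setter = load_setter()` and, if it is not `None`, wrap a small workload in `with scoped_threads(setter, 1): ...` so that the earlier thread count is restored even if the workload raises.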

x86:

  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed GEMM3M functions failing in CMAKE builds

x86-64:

  • removed all instances of sched_yield() on Linux and BSD
  • fixed a potential deadlock in the thread server on Windows (introduced in 0.3.26)
  • fixed GEMM3M functions failing in CMAKE builds
  • fixed handling of NaN and Inf arguments in ZSCAL
  • added compiler checks for AVX512BF16 compatibility
  • fixed LLVM compiler options for Sapphire Rapids
  • fixed the cpu fallback handling for Sapphire Rapids with AVX2 disabled
    in DYNAMIC_ARCH mode
  • fixed extensions SCSUM and DZSUM
  • improved GEMM performance for ZEN targets

arm:

  • fixed handling of NaN and Inf arguments in ZSCAL

arm64:

  • added initial support for the Cortex-A76 cpu
  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed default compiler options for gcc (-march and -mtune)
  • added support for ArmCompilerForLinux
  • added support for the NeoverseV2 cpu in DYNAMIC_ARCH builds
  • fixed mishandling of the INTERFACE64 option in CMAKE builds
  • corrected SCSUM kernels (erroneously duplicating SCASUM behaviour)
  • added SVE-enabled kernels for CSUM/ZSUM
  • worked around an inaccuracy in the NRM2 kernels for NeoverseN1 and Apple M

power:

  • improved performance of SGEMM on POWER8/9/10
  • improved performance of DGEMM on POWER10
  • added support for OpenMP builds with xlc/xlf on AIX
  • improved cpu autodetection for DYNAMIC_ARCH builds on older AIX
  • fixed cpu core counting on AIX
  • added support for building a shared library on AIX

riscv64:

  • added support for the X280 cpu
  • added support for semi-generic RISCV models with vector length 128 or 256
  • added support for compiling with either RVV 0.7.1 or RVV 1.0 standard compilers
  • fixed handling of NaN and Inf arguments in ZSCAL
  • improved cpu model autodetection
  • fixed corner cases in ?AXPBY for C910V
  • fixed handling of zero increments in ?AXPY kernels for C910V

loongarch64:

  • added optimized kernels for ?AMIN and ?AMAX
  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed handling of corner cases in ?AXPBY
  • fixed computation of SAMIN and DAMIN in LSX mode
  • fixed computation of ?ROT
  • added optimized SSYMV and DSYMV kernels for LSX and LASX mode
  • added optimized CGEMM and ZGEMM kernels for LSX and LASX mode
  • added optimized CGEMV and ZGEMV kernels

mips:

  • fixed use of MSA on P5600 and related cpus (broken in 0.3.22)
  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed mishandling of the INTERFACE64 option in CMAKE builds

zarch:

  • fixed handling of NaN and Inf arguments in ZSCAL
  • fixed calculation of ?SUM on Z13

md5sum
ef71c66ffeb1ab0f306a37de07d2667f OpenBLAS-0.3.27.tar.gz
4b85246b10d61f362fe8b9b45cd145f0 OpenBLAS-0.3.27.zip
317c6c4f93f233d8be8ea0ad6fd7979e OpenBLAS-0.3.27-x64-64.zip
2b8d25e6a01ad4830ecca4e521172b02 OpenBLAS-0.3.27-x64.zip
c59038e5ea36ee431f5cb7f5de8bf9d9 OpenBLAS-0.3.27-x86.zip
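
A downloaded archive can be checked against the md5sum list above. The sketch below uses Python's standard hashlib; the function names `md5_of_file` and `verify` are hypothetical, and it assumes the file keeps its release name. An MD5 match mainly guards against corrupted downloads and is not a strong authenticity check.

```python
import hashlib
import os

# Checksums copied from the md5sum list in these release notes.
EXPECTED_MD5 = {
    "OpenBLAS-0.3.27.tar.gz": "ef71c66ffeb1ab0f306a37de07d2667f",
    "OpenBLAS-0.3.27.zip": "4b85246b10d61f362fe8b9b45cd145f0",
    "OpenBLAS-0.3.27-x64-64.zip": "317c6c4f93f233d8be8ea0ad6fd7979e",
    "OpenBLAS-0.3.27-x64.zip": "2b8d25e6a01ad4830ecca4e521172b02",
    "OpenBLAS-0.3.27-x86.zip": "c59038e5ea36ee431f5cb7f5de8bf9d9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the entry for its base name.

    Raises KeyError if the base name is not in the checksum table.
    """
    name = os.path.basename(path)
    return md5_of_file(path) == expected[name]
```

For example, `verify("OpenBLAS-0.3.27.tar.gz")` returns `True` only when the file in the current directory hashes to the value listed for that name.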
