Releases: ROCm/rocFFT
rocFFT 1.0.24 for ROCm 5.7.0
Optimizations
- Improved performance of complex forward/inverse 1D FFTs (2049 <= length <= 131071) that use Bluestein's algorithm.
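For readers unfamiliar with Bluestein's algorithm, the following is a minimal CPU sketch of the idea, not rocFFT's GPU implementation. It rewrites a DFT of arbitrary length N as a convolution with a "chirp" sequence, which can then be evaluated with power-of-two FFTs of a padded length. The function names (`naive_dft`, `fft_pow2`, `bluestein_dft`) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// Reference O(N^2) forward DFT, used only to check the sketch below.
std::vector<cplx> naive_dft(const std::vector<cplx>& x)
{
    const std::size_t n  = x.size();
    const double      pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for(std::size_t k = 0; k < n; ++k)
        for(std::size_t j = 0; j < n; ++j)
        {
            const double ang = -2.0 * pi * double((j * k) % n) / double(n);
            out[k] += x[j] * cplx(std::cos(ang), std::sin(ang));
        }
    return out;
}

// In-place iterative radix-2 FFT; the size must be a power of two.
// inverse = true computes the unnormalized inverse transform.
void fft_pow2(std::vector<cplx>& a, bool inverse)
{
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation.
    for(std::size_t i = 1, j = 0; i < n; ++i)
    {
        std::size_t bit = n >> 1;
        for(; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if(i < j)
            std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for(std::size_t len = 2; len <= n; len <<= 1)
    {
        const double ang = (inverse ? 2.0 : -2.0) * pi / double(len);
        const cplx   wlen(std::cos(ang), std::sin(ang));
        for(std::size_t i = 0; i < n; i += len)
        {
            cplx w(1.0, 0.0);
            for(std::size_t k = 0; k < len / 2; ++k)
            {
                const cplx u       = a[i + k];
                const cplx v       = a[i + k + len / 2] * w;
                a[i + k]           = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Forward DFT of arbitrary length n, using n*k = (n^2 + k^2 - (k-n)^2) / 2
// to turn the DFT into a convolution with a chirp sequence.
std::vector<cplx> bluestein_dft(const std::vector<cplx>& x)
{
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    // Padded length: a power of two that is at least 2n - 1.
    std::size_t m = 1;
    while(m < 2 * n - 1)
        m <<= 1;
    // chirp[k] = exp(-i*pi*k^2/n); k^2 is reduced mod 2n to keep the angle small.
    std::vector<cplx> chirp(n);
    for(std::size_t k = 0; k < n; ++k)
    {
        const double ang = -pi * double((k * k) % (2 * n)) / double(n);
        chirp[k]         = cplx(std::cos(ang), std::sin(ang));
    }
    std::vector<cplx> a(m), b(m); // zero-initialized
    for(std::size_t k = 0; k < n; ++k)
        a[k] = x[k] * chirp[k];
    b[0] = std::conj(chirp[0]);
    for(std::size_t k = 1; k < n; ++k)
        b[k] = b[m - k] = std::conj(chirp[k]); // wrap negative indices
    // Circular convolution of a and b via power-of-two FFTs.
    fft_pow2(a, false);
    fft_pow2(b, false);
    for(std::size_t i = 0; i < m; ++i)
        a[i] *= b[i];
    fft_pow2(a, true);
    std::vector<cplx> out(n);
    for(std::size_t k = 0; k < n; ++k)
        out[k] = a[k] / double(m) * chirp[k];
    return out;
}
```

Calling `bluestein_dft` on a vector of prime length (for example 7 or 13) should agree with `naive_dft` to within floating-point rounding, even though only power-of-two FFTs are used internally.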
Added
- Implemented a solution map version converter and completed the first conversion, from version 0 to version 1. Version 1 removes some incorrect kernels (sbrc/sbcr using half_lds).
Changed
- Moved rocfft_rtc_helper executable to lib/rocFFT directory on Linux.
- Moved library kernel cache to lib/rocFFT directory.
rocFFT 1.0.23 for ROCm 5.6.1
rocFFT code for ROCm 5.6.1 did not change. The library was rebuilt for the updated ROCm 5.6.1 stack.
rocFFT 1.0.23 for ROCm 5.6.0
Added
- Implemented half-precision transforms, which can be requested by passing rocfft_precision_half to rocfft_plan_create.
- Implemented a hierarchical solution map which saves how to decompose a problem and the kernels to be used.
- Implemented a first version of an offline tuner to support tuning kernels for C2C/Z2Z problems.
Changed
- Replaced std::complex with hipComplex data types in the data generator.
- FFT plan dimensions are now sorted to be row-major internally where possible, which produces better plans if the dimensions were accidentally specified in a different order (column-major, for example).
- Added --precision argument to benchmark/test clients. --double is still accepted but is deprecated as a method to request a double-precision transform.
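As a rough picture of what sorting plan dimensions by memory layout means, the sketch below reorders a list of (length, stride) pairs so that the dimension with the smallest stride comes first. This is only an illustration under assumptions: the `Dim` struct and `sort_fastest_first` function are hypothetical, and rocFFT's internal logic (including when reordering is "possible") is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one FFT dimension: its length and the
// distance in elements between consecutive entries along it.
struct Dim
{
    std::size_t length;
    std::size_t stride;
};

// Return the dimensions ordered so the smallest stride (the
// fastest-varying dimension in memory) comes first. stable_sort keeps
// the caller's order for dimensions with equal strides.
std::vector<Dim> sort_fastest_first(std::vector<Dim> dims)
{
    assert(!dims.empty());
    std::stable_sort(dims.begin(), dims.end(), [](const Dim& a, const Dim& b) {
        return a.stride < b.stride;
    });
    return dims;
}
```

For example, dimensions given as {4, 1024}, {16, 64}, {64, 1} come back as {64, 1}, {16, 64}, {4, 1024}, so the contiguous dimension is treated as the innermost one regardless of the order in which the caller listed them.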
Fixed
- Fixed over-allocation of LDS in some real-complex kernels, which caused kernel launch failures.
rocFFT 1.0.22 for ROCm 5.5.1
rocFFT code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.
rocFFT 1.0.22 for ROCm 5.5.0
Optimizations
- Improved performance of 1D FFTs of length < 2048 that use Bluestein's algorithm.
- Reduced time for generating code during plan creation.
- Optimized 3D R2C/C2R lengths 32, 84, 128.
- Optimized batched small 1D R2C/C2R cases.
Added
- Added gfx1101 to default AMDGPU_TARGETS.
Changed
- Moved client programs to C++17.
- Moved planar kernels and infrequently used Stockham kernels to be runtime-compiled.
- Moved transpose, real-complex, Bluestein, and Stockham kernels to library kernel cache.
Fixed
- Removed zero-length twiddle table allocations, which fixes errors from hipMallocManaged.
- Fixed incorrect freeing of HIP stream handles during twiddle computation when multiple devices are present.
rocFFT 1.0.21 for ROCm 5.4.4
rocFFT code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.
rocFFT 1.0.21 for ROCm 5.4.3
Fixed
- Removed source directory from rocm_install_targets call to prevent installation of rocfft.h in an unintended location.
rocFFT 1.0.20 for ROCm 5.4.2
rocFFT code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.
rocFFT 1.0.20 for ROCm 5.4.1
Fixed
- Fixed incorrect results on strided large 1D FFTs where batch size does not equal the stride.
rocFFT 1.0.19 for ROCm 5.4.0
Optimizations
- Optimized some strided large 1D plans.
Added
- Added rocfft_plan_description_set_scale_factor API to efficiently multiply each output element of an FFT by a given scaling factor.
- Created a rocfft_kernel_cache.db file next to the installed library. SBCC kernels are moved to this file when built with the library, and are runtime-compiled for new GPU architectures.
- Added gfx1100 and gfx1102 to default AMDGPU_TARGETS.
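To illustrate why a per-output scale factor is useful, the sketch below uses a naive CPU DFT rather than rocFFT itself. It assumes unnormalized transforms, so a forward transform followed by an inverse returns the input multiplied by the length N; scaling the inverse by 1/N restores the original values. The `dft_scaled` function is hypothetical and does not call rocfft_plan_description_set_scale_factor.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) DFT. sign = -1 gives the forward transform, +1 the
// unnormalized inverse. Every output element is multiplied by `scale`,
// mimicking a scale factor applied as part of the transform.
std::vector<cplx> dft_scaled(const std::vector<cplx>& in, int sign, double scale)
{
    assert(sign == 1 || sign == -1);
    const std::size_t n  = in.size();
    const double      pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for(std::size_t k = 0; k < n; ++k)
    {
        cplx sum(0.0, 0.0);
        for(std::size_t j = 0; j < n; ++j)
        {
            const double ang = sign * 2.0 * pi * double((j * k) % n) / double(n);
            sum += in[j] * cplx(std::cos(ang), std::sin(ang));
        }
        out[k] = sum * scale; // scaling happens as the output is written
    }
    return out;
}
```

With `fwd = dft_scaled(x, -1, 1.0)`, the call `dft_scaled(fwd, +1, 1.0)` yields N times `x`, while `dft_scaled(fwd, +1, 1.0 / N)` yields `x` again. Applying the factor while outputs are written avoids a separate pass over the data, which is the efficiency the release note refers to.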
Changed
- Moved runtime compilation cache to in-memory by default. A default on-disk cache can encounter contention problems on multi-node clusters with a shared filesystem. rocFFT can still be told to use an on-disk cache by setting the ROCFFT_RTC_CACHE_PATH environment variable.
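A minimal sketch of opting back into an on-disk cache from within a program, assuming a POSIX system and assuming the variable must be set before rocFFT is first initialized in the process. The helper name `use_on_disk_rtc_cache` and any path passed to it are hypothetical; exporting the variable in the shell before launching the program has the same effect.

```cpp
#include <cassert>
#include <stdlib.h> // setenv (POSIX)
#include <string>

// Set ROCFFT_RTC_CACHE_PATH for the current process.
// Assumption: this is called before the first use of rocFFT, so that
// the library sees the variable when it sets up its cache.
// Returns true on success.
bool use_on_disk_rtc_cache(const std::string& path)
{
    assert(!path.empty());
    // The third argument (1) overwrites any existing value.
    return setenv("ROCFFT_RTC_CACHE_PATH", path.c_str(), 1) == 0;
}
```

On a multi-node cluster, pointing each node at a node-local path rather than a shared filesystem would avoid the contention described above.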