This repository has been archived by the owner on Mar 12, 2021. It is now read-only.
Releases: JuliaGPU/CuArrays.jl
v1.3.0
New features:
- #330: wrappers for CUTENSOR 0.2.2
- #414: support for multiple memory pools, and an initial split memory allocator. Activate it by running with CUARRAYS_MEMORY_POOL=split; feedback is appreciated.
- #421: headers are now autogenerated, and expose all of the underlying APIs
- #416: optimized ND-reverse kernel
- #447: optimized ND-accumulate kernel (also findall, cumsum, etc)
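For #414, the allocator is selected through the CUARRAYS_MEMORY_POOL environment variable named in the note. The snippet below is a minimal sketch of one way to set it for a shell session; the script name my_script.jl is hypothetical, and it is an assumption here that the variable must be in the environment before CuArrays is loaded.

```shell
# Select the split allocator for every Julia process started from this shell.
# The variable name and value come from the release note for #414.
export CUARRAYS_MEMORY_POOL=split

# Confirm what child processes will see.
echo "CUARRAYS_MEMORY_POOL=${CUARRAYS_MEMORY_POOL}"

# Then start Julia as usual (my_script.jl is a hypothetical file name):
#   julia my_script.jl
# Or set the variable for a single run only:
#   CUARRAYS_MEMORY_POOL=split julia my_script.jl
```

Unsetting the variable (unset CUARRAYS_MEMORY_POOL) should return to whatever allocator the package uses by default.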
Important bug fixes:
v1.2.1
Hotfix: avoid errors when CUDNN is not available and its functionality isn't used.
v1.2.0 (2019-08-23)
Closed issues:
- size(collect(F.Q)) != size(Matrix(F.Q)) when size(F, 1) > size(F, 2) for F :: CuQR (#393)
- CuArrays.randn(odd_number, another_odd_number) throws CURANDError (#392)
- Pkg.test("CuArrays") fails on julia-1.1.1, linux, sm35, sm70, cuda10.1, cudnn7.0.6 (#387)
- Assigning a number to a CuArray slice via broadcasting fails (#386)
- using CuArrays fails (#384)
- Invalid versions with Adapt (#374)
- Allow CuArrays to initialise if there is no CUDA/GPU on the system (#371)
- LLVM error when using abs function to TrackedArray + CuArray (#346)
Merged pull requests:
- Improve CURAND: support for non-pow2 dims, and other fixes (#399) (maleadt)
- Define det(::CuQRPackedQ) (#397) (tkf)
- Only set the CUBLAS math mode on CUBLAS 9.0 or higher (#396) (maleadt)
- FFT test failure workarounds (#395) (maleadt)
- Avoid ugly rounding issues when printing pool stats. (#394) (maleadt)
- Define det(::CuQRPackedQ) (#391) (tkf)
- Add ldiv! and tests (#383) (willtebbutt)
- Don't use map to convert SubArray indices. (#382) (maleadt)
- Warn about missing libraries. (#381) (maleadt)
- Support for limiting total memory usage. (#379) (maleadt)
- Replace Pkg.build with runtime initialization (#375) (maleadt)
- Add more methods for seed! (#344) (findmyway)
v1.1.0 (2019-07-17)
Closed issues:
- Slow permute(view) (#367)
- view of CuArray with CPU indices yields CPU array (#360)
- Sudden performance drop after many matrix multiplications (#359)
- Avoid scalar fallback for non-contiguous views (#354)
- unsupported call through a literal pointer on float power broadcast (#349)
- Using CuArrays with complex arrays (#347)
- CuArrays test fail: an illegal memory access (#342)
- Fail to precompile CuArrays (#329)
- Wrong NNlib bounds with CuArrays 1.0.2 (#327)
- Register v1.0.2 (#326)
- RFC: Rename CuArrays.CURAND.curand -> CuArrays.CURAND.rand (#310)
- reverse(::CuArray) (#299)
- Support for cufftSetStream() (#256)
- Package is not loadable if e.g. CUBLAS cannot be opened (#255)
- Mimic Base exports (#224)
- Reduction to a tuple (or any other complex type) (#28)
Merged pull requests:
- Add 1.2 testing. (#370) (maleadt)
- Fix ND GPUArrays.LocalMemory allocation. (#369) (maleadt)
- Allow combined use of SubArray and PermutedDimsArray (#368) (maleadt)
- Add dims support to similar for broadcasted arrays (#358) (lcw)
- remove redundant global annotation on TimerOutput in function (#357) (KristofferC)
- Various small tests (#353) (kshyatt)
- More tests for BLAS and improve error msg (#352) (kshyatt)
- precompile is the default now. (#337) (maleadt)
- Use more efficient / nondeprecated array ctors. (#335) (maleadt)
- Support for reversing arrays. (#334) (maleadt)
- Wrap cufftSetStream. (#332) (maleadt)
- Don't export cu-prefixed functions, module prefix instead. (#331) (maleadt)
- Always require the libraries that are part of CUDA. (#328) (maleadt)
- broadcast for ^ (#325) (chengchingwen)
v1.0.2
Use released versions of dependencies.
v1.0.1
v1.0.0
v0.9.1
Bug fix release:
- fix loading the package when build failed
- fix CuArrays.@sync not properly synchronizing
- fix svd wrappers and high level interface (@andreasnoack)
- improve broadcast rewrite of custom functions (@chengchingwen)