Releases · JuliaGPU/CUDA.jl
v2.5.0
CUDA v2.5.0
v2.4.0
CUDA v2.4.0
Closed issues:
- cublasXtStrmm test failures on Windows 10 Julia 1.1 (#124)
- CUSPARSE tests broken (#259)
- Make @cuda return a kernel object (#341)
- Depend on CompilerSupportLibraries (#359)
- CUBLAS and exceptions test failures on Windows (#536)
- argmax(::CuArray) returns nothing with NaN-values (#553) (see the sketch after this list)
- Multiple @cuDynamicSharedMem in kernel causes unexpected behavior (#555)
- Illegal memory access with atomic shared memory (#558)
- CUDA.sqrt does not find symbol "__nv_sqrt" (#559)
- Exception with CUDA.exp (#561)
- Use LazyArtifacts instead of Pkg (#570)
- Test runner: early bail out (#578)
- memory reporting issue (#579)
- c[3:4]=0 leads to exception (#580)
- Add math ops (including broadcast) for half types (#581)
- Dot product of Array and CuArray fails with CPU address error. (#586)
- Support for CUDA-capable GPU with compute capability 4.0 like GTX 1080 (#587)
- mapreducedim! not threadsafe (#588)
- Allow separate directories for cuda and cudnn (#590)
- Difficulties installing CUDA on Julia 1.6.0. (#591)
- Bug in Initialisation Error (#603)
- CUDA.jl initialisation fails after suspending Ubuntu 20.04 with CUDA 11.2 (#605)
- CUDA 11.2 CUBLASError and "CUDA.jl does not yet support CUDA with nvdisasm 11.2.67" (#607)
- This intrinsic must be compiled to be called (#611)
- OpenGL interop (#612)
- Add support for CuFFT callback functions (#614)
- I can’t multiply a CSR sparse matrix anymore (#615)
- Julia version requirement (#619)
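The argmax fix (#553) brings CuArray in line with Base, where isless orders NaN above every other value. A minimal sketch of the expected behavior, assuming default CUDA.jl semantics:

```julia
using CUDA

# After #553, argmax on a CuArray should follow Base's isless ordering,
# where NaN compares greater than any number.
x = CuArray([1.0, NaN, 3.0])
@assert argmax(x) == 2   # same as argmax(Array(x))
```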
Merged pull requests:
- Support all combinations of datatypes and transposes/adjoints in LinearAlgebra (#535) (@cqql)
- Use structs for texture intrinsic return types. (#554) (@maleadt)
- Backport some 1.6 fixes (#557) (@maleadt)
- Update manifest (#560) (@github-actions[bot])
- Correct dims error (#562) (@DhairyaLGandhi)
- Lock `_shmem_cb` (#564) (@vchuravy)
- Move to Julia 1.6 (#566) (@maleadt)
- Adapt to JuliaLang/julia#38487. (#568) (@maleadt)
- Support for 'delayed kernels' (#569) (@maleadt)
- Run cuda-memcheck as part of CI (#571) (@maleadt)
- Use @sync instead of calls to synchronize in tests. (#572) (@maleadt)
- Update artifacts to include cuda-memcheck (#573) (@maleadt)
- Use LazyArtifacts instead of Pkg. (#574) (@maleadt)
- Improve LinearAlgebra impl methods for triangular types (#575) (@maleadt)
- New findmin/max implementation using single-pass reduction (#576) (@maleadt) (see the usage sketch after this list)
- Fix synchronization before testing cublasXt calls. (#577) (@maleadt)
- Fix used memory reporting. (#582) (@maleadt)
- Implement Statistics.varm/stdm instead of Statistics._var (#583) (@sdewaele)
- Test for #558. (#584) (@maleadt)
- Add a quick failure option to the test runner. (#585) (@maleadt)
- Add lock around `cfunction` lookup (#589) (@vchuravy)
- Catch all initialization errors. (#593) (@maleadt)
- Update dependencies. (#596) (@maleadt)
- Fix wrong initialisation error message (#604) (@qin-yu)
- Fixes wrong spacing in docstring admonition (#608) (@navidcy)
- Fix broadcasting with Base.angle (#618) (@marius311)
- Test with the 1.6 nightly, not 1.7. (#620) (@maleadt)
- Wrap cudaGL.h (#621) (@maleadt)
- Initial compatibility with CUDA 11.2. (#622) (@maleadt)
- 1.5 compatibility release (#623) (@maleadt)
- Add CUDA 11.2 artifacts. (#624) (@maleadt)
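The single-pass findmin/findmax reduction (#576) keeps the Base API. A minimal usage sketch; the @allowscalar read is only for the host-side check:

```julia
using CUDA

# Same Base API; the value and its index come from one reduction pass.
x = CUDA.rand(1024)
val, idx = findmax(x)
CUDA.@allowscalar @assert x[idx] == val   # scalar read, host-side check only
```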
v2.3.0
CUDA v2.3.0
v2.2.1
CUDA v2.2.1
v2.2.0
CUDA v2.2.0
Closed issues:
- cudnn missing after downloading artifact (#521)
- Downloading artifact: CUDA110 when using DiffEqFlux (#542)
Merged pull requests:
- Update manifest (#520) (@github-actions[bot])
- Try out Buildkite. (#522) (@maleadt)
- Update manifest (#529) (@github-actions[bot])
- Support for / Upgrade to CUDA 11.1 update 1. (#530) (@maleadt)
- Fix and test svd! (#531) (@maleadt) (see the sketch after this list)
- Move more CI to Buildkite. (#532) (@maleadt)
- Use type symbols to generate wrapper methods (#534) (@cqql)
- Fully move to Buildkite. (#537) (@maleadt)
- Add unit_diag option for sv2! functions (#540) (@amontoison)
- Documentation fixes (#543) (@maleadt)
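A minimal sketch exercising the svd! fix (#531), assuming the CUSOLVER-backed path; the round-trip check is done on the CPU:

```julia
using CUDA, LinearAlgebra

# Factorize on the GPU, then round-trip the check on the CPU.
A = CUDA.rand(Float32, 4, 4)
F = svd(A)
@assert Array(F.U) * Diagonal(Array(F.S)) * Array(F.Vt) ≈ Array(A)
```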
v2.1.0
CUDA v2.1.0
Closed issues:
- CUDNN convolution with Float16 always returns zeros (#92)
- axp(b)y! and mul! (scalar multiplication) with mixed argument types (#144)
- Dispatching to generic matmul instead of CUBLAS (#164)
- Support for Ints and Float16? (#165)
- Subarrays/views support (#172)
- Easy way to pick among multiple GPUs (#174)
- More prominently document JULIA_CUDA_USE_BINARYBUILDER (#204)
- ERROR_COOPERATIVE_LAUNCH_TOO_LARGE during tests (#247)
- Pkg.test error for cutensor test on Windows (#422)
- Runtime build improvements (#456)
- Fusing Wrappers (#467)
- Could not find nvToolsExt (libnvToolsExt.dylib.1.0 or libnvToolsExt.dylib.1) in /Users/imac/.julia/artifacts/b502baf54095dff4a69fd6aba8667124583f6929/lib (#482)
- mapreduce assumes commutative op (#484)
- SubArray Broadcast Bug in 2.0 (#488) (see the sketch after this list)
- Nested SubArray Scalar Indexing (#490)
- Sparse matrix * view(vector) regression in 2.0 (#493)
- Error transforming a reshaped 0-dimensional GPU array to a CPU array (#494)
- test cuda FAILURE (#496)
- Reshaped CuArray is not DenseCuArray (#511)
- assignment failure when using array slicing. (#516)
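A minimal sketch of the SubArray broadcast fix (#488), assuming in-place broadcast over a contiguous view:

```julia
using CUDA

# Broadcasting into a column view should run on the GPU without
# scalar indexing.
a = CUDA.rand(Float32, 4, 4)
v = view(a, :, 1)
v .= 0f0
@assert all(iszero, Array(a)[:, 1])
```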
Merged pull requests:
- Use the correct CUDNN scaling parameter type. (#454) (@maleadt)
- Fix versioned dylib discovery. (#486) (@maleadt)
- Move inv from GPUArrays. (#487) (@maleadt)
- Use dense array types in sparse wrappers. (#495) (@maleadt)
- Update manifest (#497) (@github-actions[bot])
- Revert array wrapper union changes (#498) (@maleadt)
- Clean-up pointer field. (#499) (@maleadt)
- mapreduce: change iteration for compatibility with non-commutative operators. (#500) (@maleadt)
- Use versioned libcuda (#502) (@maleadt)
- Dynamically choose versioned libcuda (#503) (@mustafaquraish)
- Update multigpu.md (#504) (@efmanu)
- Upgrade artifacts for CUDA 11 compatibility. (#506) (@maleadt)
- Update dependencies. (#507) (@maleadt)
- Convert unsigned short ints to Cint for printf. (#508) (@maleadt)
- Update manifest (#510) (@github-actions[bot])
- Fix reshape with missing dimensions. (#512) (@maleadt) (see the sketch after this list)
- Don't return a pointer from 'alias'. (#513) (@maleadt)
- Add some docs (#514) (@maleadt)
- Fix CUDNN-optimized activation broadcasts (#515) (@maleadt)
- Fix cooperative launch test. (#517) (@maleadt)
- Fixes for Windows (#518) (@maleadt)
- CUTENSOR fixes on Windows (#519) (@maleadt)
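A minimal sketch of the reshape fix (#512), where a Colon dimension is inferred from the remaining lengths:

```julia
using CUDA

# The Colon dimension is inferred from the array length (16 / 2 == 8).
a = CUDA.rand(Float32, 4, 4)
b = reshape(a, (2, :))
@assert size(b) == (2, 8)
```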
v2.0.2
CUDA v2.0.2
Closed issues:
- cu() behavior for complex floating point numbers (#91)
- Error when following example on using multiple GPUs on multiple processes (#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
- Drop BinaryProvider? (#474)
- Latest version of master doesn't work on Windows (#477)
- `sum(CUDA.rand(3,3))` broken (#480)
- copyto!() between cpu and gpu with subarrays (#491)
Merged pull requests:
- Adapt to GPUCompiler changes. (#458) (@maleadt)
- Fix initialization of global state (#471) (@maleadt)
- Remove 'view' implementation. (#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (#473) (@ianshmean)
- Remove BinaryProvider dep. (#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (#476) (@Alexander-Barth)
- NFC array simplifications. (#481) (@maleadt)
- Update manifest (#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (#489) (@pabloferz) (see the sketch below)
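A minimal sketch of the default conversion from #489: just as cu lowers Float64 data to Float32, ComplexF64 inputs become ComplexF32:

```julia
using CUDA

# cu lowers Float64 to Float32; after #489 it likewise lowers
# ComplexF64 to ComplexF32.
x = cu(ComplexF64[1.0 + 2.0im, 3.0 - 4.0im])
@assert eltype(x) == ComplexF32
```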
v2.0.1
CUDA v2.0.1
v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155) (see the sketch after this list)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: convert `SparseMatrixCSC` to `CuSparseMatrixCSR` via `cu` by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- `StatsBase.transform` fails on `CuArray` (#426)
- Further unification of `CUBLAS.axpy!` and `LinearAlgebra.BLAS.axpy!` (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
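A minimal sketch of CuArray-with-CuArray indexing (#155), assuming the gather stays on the device:

```julia
using CUDA

# Index a device array with a device array of indices; the gather
# happens on the GPU, not via scalar iteration.
x = CuArray(collect(1.0f0:10.0f0))
inds = CuArray([1, 3, 5])
y = x[inds]
@assert Array(y) == [1.0f0, 3.0f0, 5.0f0]
```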
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt) (see the sketch after this list)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)
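A minimal sketch of strided-view support in mul! (#450), assuming the view dispatches to CUBLAS gemv! rather than a generic fallback:

```julia
using CUDA, LinearAlgebra

# A stride-2 view of a device vector, multiplied through mul!.
A  = CUDA.rand(Float32, 8, 8)
xh = rand(Float32, 16)
x  = view(CuArray(xh), 1:2:16)   # strided (non-contiguous) view
y  = CUDA.zeros(Float32, 8)
mul!(y, A, x)
@assert Array(y) ≈ Array(A) * xh[1:2:16]
```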