Releases · JuliaGPU/CUDA.jl
v2.5.0
CUDA v2.5.0
v2.4.0
CUDA v2.4.0
Closed issues:
- cublasXtStrmm test failures on Windows 10 Julia 1.1 (#124)
- CUSPARSE tests broken (#259)
- Make @cuda return a kernel object (#341)
- Depend on CompilerSupportLibraries (#359)
- CUBLAS and exceptions test failures on Windows (#536)
- argmax(::CuArray) returns nothing with NaN-values (#553) (see the sketch after this list)
- Multiple @cuDynamicSharedMem in kernel causes unexpected behavior (#555)
- Illegal memory access with atomic shared memory (#558)
- CUDA.sqrt does not find symbol "__nv_sqrt" (#559)
- Exception with CUDA.exp (#561)
- Use LazyArtifacts instead of Pkg (#570)
- Test runner: early bail out (#578)
- memory reporting issue (#579)
- c[3:4]=0 leads to exception (#580)
- Add math ops (including broadcast) for half types (#581)
- Dot product of Array and CuArray fails with CPU address error. (#586)
- Support for CUDA-capable GPU with compute capability 4.0 like GTX 1080 (#587)
- mapreducedim! not threadsafe (#588)
- Allow separate directories for cuda and cudnn (#590)
- Difficulties installing CUDA on Julia 1.6.0. (#591)
- Bug in Initialisation Error (#603)
- CUDA.jl initialisation fails after suspending Ubuntu 20.04 with CUDA 11.2 (#605)
- CUDA 11.2 CUBLASError and "CUDA.jl does not yet support CUDA with nvdisasm 11.2.67" (#607)
- This intrinsic must be compiled to be called (#611)
- OpenGL interop (#612)
- Add support for CuFFT callback functions (#614)
- I can’t multiply a CSR sparse matrix anymore (#615)
- Julia version requirement (#619)
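The argmax fix (#553) brings CuArray in line with Base, where isless orders NaN above every other value. A minimal sketch of the expected behavior, assuming default CUDA.jl semantics:

```julia
using CUDA

# After #553, argmax on a CuArray should follow Base's isless ordering,
# where NaN compares greater than any number.
x = CuArray([1.0, NaN, 3.0])
@assert argmax(x) == 2   # same as argmax(Array(x))
```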
Merged pull requests:
- Support all combinations of datatypes and transposes/adjoints in LinearAlgebra (#535) (@cqql)
- Use structs for texture intrinsic return types. (#554) (@maleadt)
- Backport some 1.6 fixes (#557) (@maleadt)
- Update manifest (#560) (@github-actions[bot])
- Correct dims error (#562) (@DhairyaLGandhi)
- Lock `_shmem_cb` (#564) (@vchuravy)
- Move to Julia 1.6 (#566) (@maleadt)
- Adapt to JuliaLang/julia#38487. (#568) (@maleadt)
- Support for 'delayed kernels' (#569) (@maleadt)
- Run cuda-memcheck as part of CI (#571) (@maleadt)
- Use @sync instead of calls to synchronize in tests. (#572) (@maleadt)
- Update artifacts to include cuda-memcheck (#573) (@maleadt)
- Use LazyArtifacts instead of Pkg. (#574) (@maleadt)
- Improve LinearAlgebra impl methods for triangular types (#575) (@maleadt)
- New findmin/max implementation using single-pass reduction (#576) (@maleadt) (see the usage sketch after this list)
- Fix synchronization before testing cublasXt calls. (#577) (@maleadt)
- Fix used memory reporting. (#582) (@maleadt)
- Implement Statistics.varm/stdm instead of Statistics._var (#583) (@sdewaele)
- Test for #558. (#584) (@maleadt)
- Add a quick failure option to the test runner. (#585) (@maleadt)
- Add lock around `cfunction` lookup (#589) (@vchuravy)
- Catch all initialization errors. (#593) (@maleadt)
- Update dependencies. (#596) (@maleadt)
- Fix wrong initialisation error message (#604) (@qin-yu)
- Fixes wrong spacing in docstring admonition (#608) (@navidcy)
- Fix broadcasting with Base.angle (#618) (@marius311)
- Test with the 1.6 nightly, not 1.7. (#620) (@maleadt)
- Wrap cudaGL.h (#621) (@maleadt)
- Initial compatibility with CUDA 11.2. (#622) (@maleadt)
- 1.5 compatibility release (#623) (@maleadt)
- Add CUDA 11.2 artifacts. (#624) (@maleadt)
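The single-pass findmin/findmax reduction (#576) keeps the Base API. A minimal usage sketch; the @allowscalar read is only for the host-side check:

```julia
using CUDA

# Same Base API; the value and its index come from one reduction pass.
x = CUDA.rand(1024)
val, idx = findmax(x)
CUDA.@allowscalar @assert x[idx] == val   # scalar read, host-side check only
```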
v2.3.0
CUDA v2.3.0
v2.2.1
CUDA v2.2.1
v2.2.0
CUDA v2.2.0
Closed issues:
- cudnn missing after downloading artifact (#521)
- Downloading artifact: CUDA110 when using DiffEqFlux (#542)
Merged pull requests:
- Update manifest (#520) (@github-actions[bot])
- Try out Buildkite. (#522) (@maleadt)
- Update manifest (#529) (@github-actions[bot])
- Support for / Upgrade to CUDA 11.1 update 1. (#530) (@maleadt)
- Fix and test svd! (#531) (@maleadt) (see the sketch after this list)
- Move more CI to Buildkite. (#532) (@maleadt)
- Use type symbols to generate wrapper methods (#534) (@cqql)
- Fully move to Buildkite. (#537) (@maleadt)
- Add unit_diag option for sv2! functions (#540) (@amontoison)
- Documentation fixes (#543) (@maleadt)
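A minimal sketch exercising the svd! fix (#531), assuming the CUSOLVER-backed path; the round-trip check is done on the CPU:

```julia
using CUDA, LinearAlgebra

# Factorize on the GPU, then round-trip the check on the CPU.
A = CUDA.rand(Float32, 4, 4)
F = svd(A)
@assert Array(F.U) * Diagonal(Array(F.S)) * Array(F.Vt) ≈ Array(A)
```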
v2.1.0
CUDA v2.1.0
Closed issues:
- CUDNN convolution with Float16 always returns zeros (#92)
- axp(b)y! and mul! (scalar multiplication) with mixed argument types (#144)
- Dispatching to generic matmul instead of CUBLAS (#164)
- Support for Ints and Float16? (#165)
- Subarrays/views support (#172)
- Easy way to pick among multiple GPUs (#174)
- More prominently document JULIA_CUDA_USE_BINARYBUILDER (#204)
- ERROR_COOPERATIVE_LAUNCH_TOO_LARGE during tests (#247)
- Pkg.test error for cutensor test on Windows (#422)
- Runtime build improvements (#456)
- Fusing Wrappers (#467)
- Could not find nvToolsExt (libnvToolsExt.dylib.1.0 or libnvToolsExt.dylib.1) in /Users/imac/.julia/artifacts/b502baf54095dff4a69fd6aba8667124583f6929/lib (#482)
- mapreduce assumes commutative op (#484)
- SubArray Broadcast Bug in 2.0 (#488) (see the sketch after this list)
- Nested SubArray Scalar Indexing (#490)
- Sparse matrix * view(vector) regression in 2.0 (#493)
- Error transforming a reshaped 0-dimensional GPU array to a CPU array (#494)
- test cuda FAILURE (#496)
- Reshaped CuArray is not DenseCuArray (#511)
- assignment failure when using array slicing. (#516)
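A minimal sketch of the SubArray broadcast fix (#488), assuming in-place broadcast over a contiguous view:

```julia
using CUDA

# Broadcasting into a column view should run on the GPU without
# scalar indexing.
a = CUDA.rand(Float32, 4, 4)
v = view(a, :, 1)
v .= 0f0
@assert all(iszero, Array(a)[:, 1])
```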
Merged pull requests:
- Use the correct CUDNN scaling parameter type. (#454) (@maleadt)
- Fix versioned dylib discovery. (#486) (@maleadt)
- Move inv from GPUArrays. (#487) (@maleadt)
- Use dense array types in sparse wrappers. (#495) (@maleadt)
- Update manifest (#497) (@github-actions[bot])
- Revert array wrapper union changes (#498) (@maleadt)
- Clean-up pointer field. (#499) (@maleadt)
- mapreduce: change iteration for compatibility with non-commutative operators. (#500) (@maleadt)
- Use versioned libcuda (#502) (@maleadt)
- Dynamically choose versioned libcuda (#503) (@mustafaquraish)
- Update multigpu.md (#504) (@efmanu)
- Upgrade artifacts for CUDA 11 compatibility. (#506) (@maleadt)
- Update dependencies. (#507) (@maleadt)
- Convert unsigned short ints to Cint for printf. (#508) (@maleadt)
- Update manifest (#510) (@github-actions[bot])
- Fix reshape with missing dimensions. (#512) (@maleadt) (see the sketch after this list)
- Don't return a pointer from 'alias'. (#513) (@maleadt)
- Add some docs (#514) (@maleadt)
- Fix CUDNN-optimized activation broadcasts (#515) (@maleadt)
- Fix cooperative launch test. (#517) (@maleadt)
- Fixes for Windows (#518) (@maleadt)
- CUTENSOR fixes on Windows (#519) (@maleadt)
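A minimal sketch of the reshape fix (#512), where a Colon dimension is inferred from the remaining lengths:

```julia
using CUDA

# The Colon dimension is inferred from the array length (16 / 2 == 8).
a = CUDA.rand(Float32, 4, 4)
b = reshape(a, (2, :))
@assert size(b) == (2, 8)
```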
v2.0.2
CUDA v2.0.2
Closed issues:
- cu() behavior for complex floating point numbers (#91)
- Error when following example on using multiple GPUs on multiple processes (#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
- Drop BinaryProvider? (#474)
- Latest version of master doesn't work on Windows (#477)
- `sum(CUDA.rand(3,3))` broken (#480)
- copyto!() between cpu and gpu with subarrays (#491)
Merged pull requests:
- Adapt to GPUCompiler changes. (#458) (@maleadt)
- Fix initialization of global state (#471) (@maleadt)
- Remove 'view' implementation. (#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (#473) (@ianshmean)
- Remove BinaryProvider dep. (#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (#476) (@Alexander-Barth)
- NFC array simplifications. (#481) (@maleadt)
- Update manifest (#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (#489) (@pabloferz) (see the sketch below)
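A minimal sketch of the default conversion from #489: just as cu lowers Float64 data to Float32, ComplexF64 inputs become ComplexF32:

```julia
using CUDA

# cu lowers Float64 to Float32; after #489 it likewise lowers
# ComplexF64 to ComplexF32.
x = cu(ComplexF64[1.0 + 2.0im, 3.0 - 4.0im])
@assert eltype(x) == ComplexF32
```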
v2.0.1
CUDA v2.0.1
v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155) (see the sketch after this list)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: convert `SparseMatrixCSC` to `CuSparseMatrixCSR` via `cu` by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- `StatsBase.transform` fails on `CuArray` (#426)
- Further unification of `CUBLAS.axpy!` and `LinearAlgebra.BLAS.axpy!` (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
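A minimal sketch of CuArray-with-CuArray indexing (#155), assuming the gather stays on the device:

```julia
using CUDA

# Index a device array with a device array of indices; the gather
# happens on the GPU, not via scalar iteration.
x = CuArray(collect(1.0f0:10.0f0))
inds = CuArray([1, 3, 5])
y = x[inds]
@assert Array(y) == [1.0f0, 3.0f0, 5.0f0]
```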
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt) (see the sketch after this list)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)
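A minimal sketch of strided-view support in mul! (#450), assuming the view dispatches to CUBLAS gemv! rather than a generic fallback:

```julia
using CUDA, LinearAlgebra

# A stride-2 view of a device vector, multiplied through mul!.
A  = CUDA.rand(Float32, 8, 8)
xh = rand(Float32, 16)
x  = view(CuArray(xh), 1:2:16)   # strided (non-contiguous) view
y  = CUDA.zeros(Float32, 8)
mul!(y, A, x)
@assert Array(y) ≈ Array(A) * xh[1:2:16]
```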