v1.2.1
Released by github-actions on 31 Jul 08:08
CUDA v1.2.1
Closed issues:
- CuArrays.zeros(T, 0) fails (#81)
- CUDAnative.cos calls the base cos function in nested broadcast (#102)
- CuSparseMatrixHYB * CuMatrix = nothing (#256)
- Strange reordering of struct fields with dynamic parallelism (#263)
- Performance: bias add (#298)
- CUDA 11 libraries incorrectly looked up in artifact (#300)
- CUTENSOR for windows (#301)
- Performance: sum (#302)
- Performance: getindex(a, i::Array{Int}) (#303)
- Display for CuArray within Tuples does not respect :limit=>true (#305)
- Performance: elementwise operations (#307)
- Performance: perceptron (#312)
- windows install error: isfile(__libcupti[]) (#324)
- std with dims is not type stable (#336)
Merged pull requests:
- Re-enable threading tests. (#25) (@maleadt)
- Reorganize and simplify some includes (#296) (@maleadt)
- Only run benchmarks on the master branch. (#297) (@maleadt)
- Optimizations for broadcast (#299) (@maleadt)
- Update manifest (#304) (@github-actions[bot])
- Test runner improvements for multigpu mode (#309) (@maleadt)
- Artifact improvements for CUDA 11 on Windows (#310) (@maleadt)
- Optimize element-wise operations (#313) (@maleadt)
- Check if reported GPU memory use is available. (#314) (@maleadt)
- Update artifacts: include cusolverMg, and use Yggdrasil binaries. (#315) (@maleadt)
- Specialization fixes for mapreducedim. (#316) (@maleadt)
- Fix invalid conversion of pointer to signed integer. (#317) (@maleadt)
- Work around (presumed) Windows driver bug in exception test. (#319) (@maleadt)
- Update manifest (#323) (@github-actions[bot])
- Bump CUDNN and CUTENSOR (#325) (@maleadt)
- Simplify NVML discovery. (#326) (@maleadt)
- Separate CURAND wrappers from Random impl. (#327) (@maleadt)
- Simplify discovering binaries by using Sys.which. (#328) (@maleadt)
- Add wrapper for NVML utilization rates. (#329) (@maleadt)
- Attach CUSPARSE docstrings to bare methods, not empty functions. (#331) (@maleadt)
- Eagerly reduce the amount of worker threads. (#332) (@maleadt)
- Bump dependencies. (#333) (@maleadt)
- Clean-up library wrappers [NFC] (#334) (@maleadt)
- Fix CUDNN v8 discovery and loading on Windows (#335) (@maleadt)
- Fix type stability of Statistics.var with dims. (#337) (@maleadt)
- Fix parameter alignment for dynamic parallelism. (#338) (@maleadt)
- Micro-optimize Base.fill. (#339) (@maleadt)