Releases: IntelPython/numba-dpex
0.23.0
Fixed
- Array alignment problem for stack arrays allocated for kernel arguments. (#1357)
- Issue #892, #906 caused by incorrect code generation for indexing (#1377)
- Generation of `KernelHasReturnValueError` error inside `KernelDispatcher`. (#1394)
- Issue #1390: broken support for slicing into `dpctl.tensor.usm_ndarray` in kernels (#1425)
- Support for Wheels package on Windows (#1430)
- Incorrect mangled name for kernel function arguments (#1443)
- Remove artifacts from conda/wheel packages residing in root level (#1450)
- GDB tests to work properly on Intel Max GPU (#1451)
- Improper wheels installation on unsupported platforms (#1452)
- Ref-counting of Python object temporaries in unboxing code (#1454)
- Segfault caused by using `malloc` to allocate `NRT_MemInfo`. Replaced with Numba's NRT `alloc` (#1458)
- Incorrect package name in README.md (#1463)
Added
- A new overloaded `dimensions` attribute for all index-space id classes (#1359)
- Support for `AtomicRef` creation using multi-dimensional arrays (#1367)
- Support for linearized indexing functions inside a JIT compiled kernel (#1368)
- Improved documentation: overview (#1341), kernel programming guide (#1388), API docs (#1414), config options (#1415), comparison with SYCL API (#1417)
- New `PrivateArray` class in `kernel_api` to replace `dpex.private.array` (#1370, #1377)
- Support for libsyclinterface::DPCTLKernelArgType enum for specifying type of kernel args instead of hard coding (#1382)
- New indexing unit tests for kernel_api simulator and JIT compiled modes (#1378)
- New unit tests to verify all `kernel_api` features usable inside `device_func` (#1391)
- A `sycl::local_accessor`-like API (`kernel_api.LocalAccessor`) for numba-dpex kernel (#1331)
- Specialization support for `device_func` decorator (#1398)
- Support for all `kernel_api` functions inside the `numba_dpex.kernel` decorator. (#1400)
- Support for dpnp 0.15 (#1434, #1464)
- Improvements to pyproject.toml configs to build numba-dpex from source. (#1449)
- Load the `SPV_INTEL_variable_length_array` SPIR-V extension to support arrays in private address-space on Intel Max GPU. (#1451)
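The linearized indexing functions mentioned in the list above can be pictured with a plain-Python sketch. It assumes the row-major convention that SYCL uses for linear ids; the helper name `linear_id` is hypothetical and is not part of the numba-dpex API.

```python
def linear_id(index, extent):
    """Row-major linearization of a multi-dimensional index.

    Hypothetical helper for illustration only; it mirrors the SYCL
    convention, e.g. for 3 dimensions:
        id0 * r1 * r2 + id1 * r2 + id2
    """
    if len(index) != len(extent):
        raise ValueError("index and extent must have the same rank")
    linear = 0
    for i, n in zip(index, extent):
        if not 0 <= i < n:
            raise IndexError(f"index {i} is out of range for extent {n}")
        # Horner-style accumulation: shift by the current extent, add the index.
        linear = linear * n + i
    return linear


if __name__ == "__main__":
    print(linear_id((1, 2), (3, 4)))        # 1*4 + 2 = 6
    print(linear_id((1, 2, 3), (2, 3, 4)))  # 1*12 + 2*4 + 3 = 23
```

Inside a real kernel the equivalent value would come from the index-space id object rather than from a helper like this one.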
Changed
- Default inline threshold value set to `2` from `None`. (#1385)
- Port parfor kernel templates to `kernel_api` (#1416), (#1424)
- Use `SPIRVKernelDispatcher` for parfor kernel dispatch (#1435, #1448)
- All examples use the latest dpctl API (#1431)
- Minimum required dpctl version is now 0.16.1
- Minimum required numba version is now 0.59.0 (#1462)
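One way to check an environment against the minimum versions listed above is sketched below. The helper names (`release_tuple`, `meets_minimum`, `check_environment`) are hypothetical, and the comparison is a simplification: it looks only at the leading numeric release components and ignores pre-release or local suffixes such as `dev0`.

```python
import re
from importlib import metadata

# Minimums taken from the 0.23.0 "Changed" entries above.
MINIMUMS = {"dpctl": "0.16.1", "numba": "0.59.0"}


def release_tuple(version):
    """Return the leading numeric components, e.g. '0.16.1dev0' -> (0, 16, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare release components, padding the shorter tuple with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    results = {}
    for name, minimum in minimums.items():
        try:
            results[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            results[name] = None
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(name, ok)
```

For strict PEP 440 ordering, including pre-releases, a dedicated version-parsing library would be a better fit than this regex-based sketch.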
Removed
- OpenCL-like kernel API functions (#1420)
- `func` decorator (replaced by `device_func`) (#1400)
- `numba_dpex.experimental.kernel` and `numba_dpex.experimental.device_func` (#1400)
New Contributors
- @AndresGuzman-Ballen made their first contribution in #1447
0.22.1
Fixed
- Bug in boxing a DpnpNdArray from parent (#1155)
- Strided layouts and F-contiguous layouts supported in experimental kernel (#1178)
- Barrier call code-generation on OpenCL CPU devices (#1280, #1310)
- Importing numba-dpex can break numba execution (#1267)
- Overhead on launching numba_dpex.kernel functions (#1236)
Added
- Support for dpctl.SyclEvent data type inside dpjit (#1134)
- Support for kernel_api.Range and kernel_api.NdRange inside dpjit (#1148)
- DPEX_OPT: a numba-dpex-specific optimization level config option (#1158)
- Uploading wheels packages to anaconda (#1160)
- flake8 eradicate linter option (#1177)
- Support dpctl.SyclEvent.wait call inside dpjit (#1179)
- Creation of sycl event and queue inside dpjit (#1193, #1190, #1218)
- Experimental kernel dispatcher for kernel compilation (#1178, #1205)
- Added experimental target context for SPIRV codegen (#1213, #1225)
- GDB test cases in public CI (#1209)
- Async kernel submission option (#1219, #1249)
- A new literal type to store IntEnum as Literal types (#1227)
- SYCL-like memory enum classes to the experimental module (#1239)
- call_kernel function to launch kernels (#1260)
- Experimental overloads for an AtomicRef class and fetch_* methods (#1257, #1261)
- New device-specific USMNdArrayModel for USMNdArray and DpnpNdArray types (#1293)
- Experimental atomic load, store and exchange operations (#1297)
- Kernel_api module to simulate kernel functions in pure Python (#1304, #1326)
- Experimental implementation of group barrier operation (#1280)
- Experimental atomic compare_exchange implementation (#1312)
- Experimental group index class (#1310)
- OpenSSF scorecard (#1320)
- Experimental feature index overload methods (#1323)
- Experimental feature group index overload methods (#1330)
- API Documentation for kernel API (#1332)
Changed
- Switch to dpc++ compiler for building numba-dpex (#1210)
- Versioneer and pytest configs into pyproject.toml (#1212)
- numba-dpex can be imported even if no SYCL device is detected by dpctl (#1272)
Removed
- Kernel launch params as lists/tuple. Only Range/NdRange supported (#1251)
- DEFAULT_LOCAL_SIZE global constant (#1291)
- Functions to invoke spirv-tools utilities from spirv_generator (#1292)
- Incomplete vectorize decorator from numba-dpex (#1298)
- Support for Numba 0.57 (#1307)
Deprecated
- OpenCL-like kernel API functions in numba_dpex.ocldecl module
0.21.4
0.21.3
Fixed
Added
- Support tests on single point precision GPUs (#1143)
- Initial work on Coverity scan CI (#1128)
- Python 3.11 support (#1123)
- Security policy (#1117)
- scikit-build to build native extensions (#1107, #1116, #1127, #1139, #1140)
Changed
- Rename helper function to clearly indicate its usage (#1145)
- The data model used by the DpnpNdArray type for kernel functions (#1118)
Removed
- Support for Python 3.8 (#1113)
0.21.2
0.21.1
0.21.0 Clean House
Added
- Support addition and multiplication-based prange reduction loops (#999)
- Proper boxing, unboxing of dpctl.SyclQueue objects inside dpjit decorated functions (#963, #1064)
- Support for `queue` keyword arguments inside dpnp array constructors in dpjit (#1032)
- Overloads for dpnp array constructors: dpnp.full (#991), dpnp.full_like (#997)
- Support for complex64 and complex128 types as kernel arguments and in parfors (#1033, #1035)
- New config to run the ConstantSizeStaticLocalMemoryPass optionally (#999)
- Support for Numba 0.57 (#1030, #1003, #1002)
- Support for Python 3.11 (#1054)
- Support for SPIRV 1.4 (#1056, #1060)
Changed
- Parfor lowering happens using the kernel pipeline (#996)
- Minimum required Numba version is now 0.57 (#1030)
- Numba monkey patches are moved to numba_dpex.numba_patches (#1030)
- Redesigned unit test suite (#1018, #1017, #1015, #1036, #1037, #1072)
Fixed
- Fix stride computation when unboxing a dpnp array (#1023)
- Using cached queue instead of creating new one on type inference (#946)
- Fixed bug in reduction mul operation for dpjit (#1048)
- Offload of parfor nodes to OpenCL UHD GPU devices (#1074)
Removed
- Support for offloading NumPy-based parfor nodes to SYCL devices (#1041)
- Removed rename_numpy_functions_pass (#1041)
- Dpnp overloads using stubs (#1041, #1025)
- Support for `like` keyword argument in dpnp array constructor overloads (#1043)
- Support for NumPy arrays as kernel arguments (#1049)
- Kernel argument access specifiers (#1049)
- Support for dpctl.device_context to launch kernels and njit offloading (#1041)
0.20.1
Added
- Replaced llvm_spirv from oneAPI path by dpcpp-llvm-spirv package. (#979)
- Added Dockerfile and a manual workflow to publish pre-built packages to the repo. (#973)
Fixed
- Fixed default dtype derivation when creating a dpnp.ndarray. (#993)
- Adjusted test_windows step to work with intel-opencl-rt=2023.1.0. (#990)
- Fixed layout in dpnp overload. (#987)
- Handled the case when arraystruct->meminfo is null to close gh-965. (#972)
0.20.0 Phoenix
Added
- New dpjit decorator supporting dpnp compilation (#887)
- Boxing and unboxing functionality for dpnp.ndarray to numba_dpex (#902)
- New DpexTarget and dispatcher for compiling dpnp using numba-dpex (#887)
- Overload implementation for dpnp.empty (#902)
- Overload implementation for dpnp.empty_like, dpnp.zeros_like and dpnp.ones_like inside dpjit (#928)
- Overload implementation for dpnp.zeros and dpnp.ones inside dpjit (#923)
- Compilation and offload support for dpnp vector style expressions using Numba parfors (#957)
- Compilation of over 70 ufuncs for dpnp inside dpjit (#957)
- Backported the split parfor pass from upstream Numba. (#949)
- Numba type aliases to numba_dpex. (#851)
- Numba prange alias inside numba_dpex. (#957)
- New LRU cache for kernels (#804) and funcs (#877)
- New Range and NdRange classes for kernel submission that follow sycl's range and ndrange syntax. (#888)
- Monkey patches to Numba 0.56.4 to support dpnp ufuncs, allocating dpnp arrays (#954)
- New config flag (NUMBA_DPEX_DUMP_KERNEL_LLVM) to dump a kernel's LLVM IR (#924)
- A badge to our gitter chatroom (#919)
- A small script to update copyright headers (#917)
- A new dpexrt_python extension to support USM allocators for Numba NRT_MemInfo (#902)
- Updated examples for kernel API demonstrating compute-follows-data programming model. (#826)
Changed
- `CLK_GLOBAL_MEM_FENCE` and `CLK_LOCAL_MEM_FENCE` flags renamed to `GLOBAL_MEM_FENCE` and `LOCAL_MEM_FENCE`. (#844)
- Switched from Ubuntu-latest to Ubuntu-20.04 for conda package build (#836)
- Rename USMNdArrayType to USMNdArray (#851)
- The Numba type that represents dpnp ndarray types is now renamed to DpnpNdarray (#880)
- Improved exceptions and user errors (#804)
- Updated internal API for kernel interface with improved support for `__sycl_usm_array_interface__` protocol (#804)
- Pin generated spirv version for kernels to 1.1 (#885)
- Rename DpexContext and DpexTypingContext to DpexKernelTarget and DpexKernelTypingContext (#887)
- Renamed existing dpnp overloads that used stubs to dpnp_stubs_impl.py (#953)
- Dpctl version requirement mismatch is now a warning and not an ImportError (#925)
- Update to versioneer 0.28 (#827)
- Update to dpctl 0.14 (#858)
- Update linters: black to 23.1.0, isort to 5.12.0 (#900)
- License in setup.py to match actual project licensing (#904)
Fixed
- Kernel specialization, compute follows data programming model for kernels (#804)
- Dispatcher/caching rewrite to address performance regression (#912, #896)
- func decorator qualname ambiguation fix (#905)
Removed
- Removes the numpy_usm_shared module from numba_dpex. (#841)
- Removes the usage of llvmlite.llvmpy (#932)