Summary
This release reaches an important milestone by making offloading fully asynchronous. Calls to `dpnp` submit tasks for execution to the DPC++ runtime and return without waiting for these tasks to finish. The sequential semantics a user expects from the execution of a Python script are preserved, though.
In addition, this release completes the implementation of the `dpnp.fft` module and adds several new array manipulation, indexing, and elementwise routines. Moreover, it adds support for building `dpnp` for Nvidia GPUs.
DPNP is now compatible with NumPy 2.0.
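The asynchronous model described above can be pictured with a small standard-library sketch. This is only a conceptual analogy, not how `dpnp` or the DPC++ runtime is implemented: the names `LazyValue`, `submit`, and `slow_add` are hypothetical, and a single-worker thread pool stands in for an in-order device queue. Submission returns immediately, while reading a result acts as the synchronization point, so the program still observes sequential semantics.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class LazyValue:
    """Hypothetical stand-in for an array whose data is produced asynchronously."""

    def __init__(self, future):
        self._future = future

    def get(self):
        # Reading the data is the synchronization point: block until it is ready.
        return self._future.result()


# A single worker executes tasks in submission order (a stand-in for an in-order queue).
_queue = ThreadPoolExecutor(max_workers=1)


def submit(fn, *args):
    """Schedule fn(*args) and return at once, without waiting for the result."""

    def task():
        # Inputs produced by earlier tasks are resolved inside the worker,
        # so the caller never blocks while submitting.
        resolved = [a.get() if isinstance(a, LazyValue) else a for a in args]
        return fn(*resolved)

    return LazyValue(_queue.submit(task))


def slow_add(a, b):
    time.sleep(0.05)  # pretend this is a long-running device kernel
    return a + b


start = time.perf_counter()
x = submit(slow_add, 1, 2)    # returns immediately
y = submit(slow_add, x, 10)   # depends on x, still returns immediately
submit_time = time.perf_counter() - start

result = y.get()              # blocks until both tasks have finished
total_time = time.perf_counter() - start
_queue.shutdown()

print(result, submit_time < total_time)
```

The second task consumes the output of the first, yet the caller only waits when it asks for the final value, which is the behavior the summary describes from the user's point of view.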
Details
Added
- Added implementation of `dpnp.gradient` function #1859
- Added implementation of `dpnp.sort_complex` function #1864
- Added implementation of `dpnp.fft.fft` and `dpnp.fft.ifft` functions #1879
- Added implementation of `dpnp.isneginf` and `dpnp.isposinf` functions #1888
- Added implementation of `dpnp.fft.fftfreq` and `dpnp.fft.rfftfreq` functions #1898
- Added implementation of `dpnp.fft.fftshift` and `dpnp.fft.ifftshift` functions #1900
- Added implementation of `dpnp.isreal`, `dpnp.isrealobj`, `dpnp.iscomplex`, and `dpnp.iscomplexobj` functions #1916
- Added support to build `dpnp` for Nvidia GPU #1926
- Added implementation of `dpnp.fft.rfft` and `dpnp.fft.irfft` functions #1928
- Added implementation of `dpnp.nextafter` function #1938
- Added implementation of `dpnp.trim_zeros` function #1941
- Added implementation of `dpnp.fft.hfft` and `dpnp.fft.ihfft` functions #1954
- Added implementation of `dpnp.logaddexp2` function #1955
- Added implementation of `dpnp.flatnonzero` function #1956
- Added implementation of `dpnp.float_power` function #1957
- Added implementation of `dpnp.fft.fft2`, `dpnp.fft.ifft2`, `dpnp.fft.fftn`, and `dpnp.fft.ifftn` functions #1961
- Added implementation of `dpnp.array_equal` and `dpnp.array_equiv` functions #1965
- Added implementation of `dpnp.nan_to_num` function #1966
- Added implementation of `dpnp.fix` function #1971
- Added implementation of `dpnp.fft.rfft2`, `dpnp.fft.irfft2`, `dpnp.fft.rfftn`, and `dpnp.fft.irfftn` functions #1982
- Added implementation of `dpnp.argwhere` function #2000
- Added implementation of `dpnp.real_if_close` function #2002
- Added implementation of `dpnp.ndim` and `dpnp.size` functions #2014
- Added implementation of `dpnp.append` and `dpnp.asarray_chkfinite` functions #2015
- Added implementation of `dpnp.array_split`, `dpnp.split`, `dpnp.hsplit`, `dpnp.vsplit`, and `dpnp.dsplit` functions #2017
- Added runtime dependency on `intel-gpu-ocl-icd-system` package #2023
- Added implementation of `dpnp.ravel_multi_index` and `dpnp.unravel_index` functions #2022
- Added implementation of `dpnp.resize` and `dpnp.rot90` functions #2030
- Added implementation of `dpnp.require` function #2036
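A few of the newly added routines can be tried with a short sketch like the one below. It assumes these functions follow the NumPy call signatures of the same name, and it falls back to NumPy when `dpnp` is not installed so the snippet still runs; the helper `to_list` is hypothetical and only copies results to the host for printing.

```python
try:
    import dpnp as xp  # offloads to a SYCL device when dpnp is available
except ImportError:
    import numpy as xp  # assumption: NumPy stands in so the sketch still runs


def to_list(a):
    """Copy an array to the host and convert it to a plain Python list."""
    if hasattr(xp, "asnumpy"):
        a = xp.asnumpy(a)  # dpnp arrays live in USM memory
    return a.tolist()


# float32 keeps the example usable on devices without fp64 support
x = xp.asarray([1.0, 2.0, 4.0, 7.0], dtype=xp.float32)

grad = to_list(xp.gradient(x))                              # finite differences
nonzero = to_list(xp.flatnonzero(xp.asarray([0, 3, 0, 5])))  # indices of non-zero items
parts = [to_list(p) for p in xp.array_split(xp.arange(7), 3)]  # uneven split allowed

print(grad)
print(nonzero)
print(parts)
```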
Changed
- Extended pre-commit pylint check to `dpnp.fft` module #1860
- Reworked `vm` vector math backend to reuse `dpctl.tensor` functions around unary and binary functions #1868
- Extended `dpnp.ndarray.astype` method to support `device` keyword argument #1870
- Improved performance of `dpnp.linalg.solve` by implementing a dedicated kernel for its batch implementation #1877
- Extended `dpnp.fabs` to support `order` and `out` keyword arguments by writing a dedicated kernel for it #1878
- Extended `dpnp.linalg` module to support `usm_ndarray` as input #1880
- Reworked `dpnp.mod` implementation to be an alias for `dpnp.remainder` #1882
- Removed the legacy implementation of linear algebra functions from the backend #1887
- Removed the legacy implementation of elementwise functions from the backend #1890
- Extended `dpnp.all` and `dpnp.any` to support `out` keyword argument #1893
- Reworked `dpnp.repeat` to add an explicit type check of input array #1894
- Improved performance of different functions by adopting asynchronous implementation of `dpctl` #1897
- Extended `dpnp.fmax` and `dpnp.fmin` to support `order` and `out` keyword arguments by writing dedicated kernels for them #1905
- Removed the legacy implementation of array creation and manipulation functions from the backend #1903
- Extended `dpnp.extract` implementation to align with NumPy #1906
- Reworked backend implementation to align with non-backward compatible changes in DPC++ 2025.0 #1907
- Removed the legacy implementation of indexing functions from the backend #1908
- Extended `dpnp.take` implementation to align with NumPy #1909
- Extended `dpnp.place` implementation to align with NumPy #1912
- Reworked the implementation of indexing functions to avoid unnecessary casting to `dpnp_array` when input is `usm_ndarray` #1913
- Reduced code duplication in the implementation of sorting functions #1914
- Removed the obsolete dparray interface #1915
- Improved performance of `dpnp.linalg` module for BLAS routines by adopting asynchronous implementation of `dpctl` #1919
- Relocated `dpnp.einsum` utility functions to a separate file #1920
- Improved performance of `dpnp.linalg` module for LAPACK routines by adopting asynchronous implementation of `dpctl` #1922
- Reworked `dpnp.matmul` to allow larger batch size to be used #1927
- Removed data synchronization where it is not needed #1930
- Leveraged `dpctl.tensor` implementation for `dpnp.where` to support scalar as input #1932
- Improved performance of `dpnp.linalg.eigh` by implementing a dedicated kernel for its batch implementation #1936
- Reworked `dpnp.isclose` and `dpnp.allclose` to comply with compute follows data approach #1937
- Extended `dpnp.deg2rad` and `dpnp.radians` to support `order` and `out` keyword arguments by writing dedicated kernels for them #1943
- `dpnp` uses pybind11 2.13.1 #1944
- Extended `dpnp.degrees` and `dpnp.rad2deg` to support `order` and `out` keyword arguments by writing dedicated kernels for them #1949
- Extended `dpnp.unwrap` to support all keyword arguments provided by NumPy #1950
- Leveraged `dpctl.tensor` implementation for `dpnp.count_nonzero` function #1962
- Leveraged `dpctl.tensor` implementation for `dpnp.diff` function #1963
- Leveraged `dpctl.tensor` implementation for `dpnp.take_along_axis` function #1969
- Reworked `dpnp.ediff1d` implementation through existing functions instead of a separate kernel #1970
- Reworked `dpnp.unique` implementation through existing functions when `axis` is given, otherwise through leveraging `dpctl.tensor` implementation #1972
- Improved performance of `dpnp.linalg.svd` by implementing a dedicated kernel for its batch implementation #1936
- Leveraged `dpctl.tensor` implementation for `shape.setter` method #1975
- Extended `dpnp.ndarray.copy` to support compute follows data keyword arguments #1976
- Reworked `dpnp.select` implementation through existing functions instead of a separate kernel #1977
- Leveraged `dpctl.tensor` implementation for `dpnp.from_dlpack` and `dpnp.ndarray.__dlpack__` functions #1980
- Reworked `dpnp.linalg` module backend implementation for BLAS routines to work with OneMKL interfaces #1981
- Reworked `dpnp.ediff1d` implementation to reduce code duplication #1983
- `dpnp` can be used with any NumPy from 1.23 to 2.0 #1985
- Reworked `dpnp.unique` implementation to properly handle NaN values #1972
- Removed `dpnp.issubsctype` per NumPy 2.0 recommendation #1996
- Reworked `dpnp.unique` implementation to align with NumPy 2.0 #1999
- Reworked `dpnp.linalg.solve` backend implementation to work with OneMKL Interfaces #2001
- Reworked `dpnp.trapezoid` implementation through existing functions instead of falling back on NumPy #2003
- Added `copy` keyword to `dpnp.array` to align with NumPy 2.0 #2006
- Extended `dpnp.heaviside` to support `order` and `out` keyword arguments by writing a dedicated kernel for it #2008
- `dpnp` uses pybind11 2.13.5 #2010
- Added `COMPILER_VERSION_2025_OR_LATER` flag to be able to run `dpnp.fft` module with both 2024.2 and 2025.0 versions of the compiler #2025
- Cleaned up the implementation of `dpnp.gradient` by removing an obsolete TODO that is not going to be done #2032
- Updated Array Manipulation Routines page in documentation to add missing functions and to remove duplicate entries #2033
- `dpnp` uses pybind11 2.13.6 #2041
- Updated `dpnp.fft` backend to depend on `INTEL_MKL_VERSION` flag to ensure that the appropriate code segment is executed based on the version of OneMKL #2035
- Use `dpctl::tensor::alloc_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations #2058
- Improved implementation of `dpnp.kron` to avoid unnecessary copy for non-contiguous arrays #2059
- Updated the test suite for `dpnp.fft` module #2071
- Reworked `dpnp.clip` implementation to align with Python Array API 2023.12 specification #2048
- Skipped outdated tests for `dpnp.linalg.solve` due to compatibility issues with NumPy 2.0 #2074
- Updated installation instructions #2098
Fixed
- Resolved an issue with `dpnp.matmul` when an f_contiguous `out` keyword is passed to the function #1872
- Resolved a possible race condition in `dpnp.inv` #1940
- Resolved an issue with failing tests for `dpnp.append` when running on a device without fp64 support #2034
- Resolved an issue with input array of `usm_ndarray` passed into `dpnp.ix_` #2047
#2047 - Added a workaround to prevent crash in tests on Windows in internal CI/CD (when running on either Lunar Lake or Arrow Lake) #2062
- Fixed a crash in `dpnp.choose` caused by missing control of releasing temporary allocated device memory #2063
- Resolved compilation warning and error while building in debug mode #2066
- Fixed an issue with asynchronous execution in `dpnp.fft` module #2067
Full Changelog: 0.15.0...0.16.0