Replace SYCL 1.2.1 group barrier with SYCL 2020 alternative #1679
sycl::group_barrier is not used in __group_barrier(_Item __item).
There are two major reasons for using the outdated barrier API:
I suppose that the most important reason is 1. However, oneDPL is claimed to be SYCL 2020 conformant, so the SYCL 2020 group_barrier should be used, at least for compilers other than the oneAPI DPC++ compiler.
Let's finally clarify the question of semantics. The
In other words, it serves as both a barrier for work items and as a memory fence (with unclear ordering semantics, but at least acquire-release as it seems from the description) for operations within local (i.e. work group) memory scope. The work-group barrier
In other words, it serves as both the barrier and the acquire-release memory fence in the specified memory scope, which is the group scope. The differences I observe are:
All in all, it seems that the new I think the comment telling about them being "not quite equivalent" is there either because earlier versions of SYCL 2020 did not provide enough clarity or because the barrier memory ordering semantics of 1.2.1 were confused with those of atomics, for which 1.2.1 only supported relaxed memory ordering. But I do not believe that the relaxed ordering would satisfy the described "complete before" requirement.
As far as I understand, another difference (and the reason for performance drop mentioned in the linked issues) is that the old version "Executes a work-group barrier with memory ordering on the local address space", while the new version affects all memory operations, in both local and global address spaces (but both only within work-group/local scope):
Thanks @al42and - indeed, this is an important difference that I missed, and that impacts performance. If that difference is important for oneDPL code, then we should make it visible - either in the oneDPL wrapper name or maybe with a template parameter - that this barrier orders operations only for data in local memory but does not order global data accesses.
The comment intel/llvm#12531 (comment) refers to a device compiler bug that has been fixed (intel/intel-graphics-compiler@ed639f6) and that should improve performance. Despite the subtle semantic difference, we still need to drop the use of the outdated API.
oneDPL/include/oneapi/dpl/pstl/hetero/dpcpp/sycl_defs.h
Line 170 in 470df99
Please see details in intel/llvm#12531
Please see details in https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_group_broadcast