OpenCL sin Performance
Similar to the CPU test, we will first measure the performance while involving as little memory access as possible. Due to the complexity of the OpenCL driver, we will also measure some of the overhead of scheduling a job from the CPU, waiting for a previous dependent job to finish, and running a dummy kernel. These numbers should give us an idea of how much work we need to schedule to avoid being bottlenecked by this overhead. The roundtrip time for scheduling a single dummy kernel will also give us an upper bound on the overhead latency we should expect, though we still need to measure that in a more realistic setting later.
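As a rough illustration of what a roundtrip measurement involves, here is a minimal sketch of a timing helper. The name `avg_roundtrip_ns` and the callable `submit_and_wait` are hypothetical and are not taken from the test code; in a real test the callable would enqueue the dummy kernel and block until it finishes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from opencl-dry-compute.cpp): average wall-clock
// time, in nanoseconds, of calling `submit_and_wait` `nrep` times in a row.
// `submit_and_wait` stands in for "enqueue a dummy kernel and wait for it".
template<typename F>
double avg_roundtrip_ns(F &&submit_and_wait, uint64_t nrep)
{
    assert(nrep > 0);
    using clock = std::chrono::steady_clock; // monotonic, safe for intervals
    auto start = clock::now();
    for (uint64_t i = 0; i < nrep; i++)
        submit_and_wait();
    auto stop = clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / double(nrep);
}
```

Averaging over many repetitions hides the timer resolution but also hides per-call variation, so a single-shot measurement may still be worth recording separately.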
In order to force the CPU to do computation without accessing memory, we use `asm volatile` to create a dummy use of the result on the CPU. AFAICT, we do not have anything as direct as this in OpenCL, so we need to find another way. In this test, we do this by storing the result to memory behind a branch that will never be taken at runtime. We also make the condition of the store depend on the calculated value so that the compiler cannot move the computation into the branch. More specifically, we have something similar to

    float res = amp * sin(...);
    if (res > threshold) {
        // store `res` to memory
    }

As long as we pass in an `amp` that is significantly smaller than `threshold`, the branch will never be taken, and the compiler will generally not optimize this case out.
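To make the guarded-store pattern concrete, the sketch below shows one possible kernel together with a host-side emulation of the same logic. The kernel name `dry_sin`, its argument list, and the helper `emulate_dry_sin` are assumptions made for illustration; the actual kernel in the test may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical OpenCL C kernel source illustrating the guarded store.
// It is only kept as a string here; nothing in this sketch compiles it.
static const char *dry_sin_kernel_src = R"CLC(
__kernel void dry_sin(float amp, float threshold, __global float *out)
{
    size_t id = get_global_id(0);
    float res = amp * sin((float)id);
    // The condition depends on `res`, so the sin call has to be evaluated
    // before the branch. With amp well below threshold, no store happens.
    if (res > threshold) {
        out[id] = res;
    }
}
)CLC";

// Host-side emulation of the same logic, one loop iteration per work item.
// Returns how many stores were actually performed.
std::size_t emulate_dry_sin(float amp, float threshold, std::vector<float> &out)
{
    assert(std::isfinite(amp) && std::isfinite(threshold));
    std::size_t nstores = 0;
    for (std::size_t id = 0; id < out.size(); id++) {
        float res = amp * std::sin(float(id));
        if (res > threshold) {
            out[id] = res;
            nstores++;
        }
    }
    return nstores;
}
```

With `amp = 1` and `threshold = 100`, the emulation performs no stores and leaves the output buffer untouched, which is the behavior the test relies on.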
As mentioned in the accuracy test, we will test both `sin` and `native_sin`.
For each test, including the dummy one mentioned above, we vary the dimension we run each kernel on and the number of times we schedule it in the command queue. We do this for command queues that are either in order or out of order. For the computation tests (i.e. not the dummy one), we also vary the number of times we evaluate the `sin`/`native_sin` function inside the kernel to minimize the effect of the kernel overhead on the measurement.
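One way to see why varying the number of evaluations per kernel helps is to assume a simple linear cost model, total time = fixed overhead + evaluations × per-evaluation cost. The sketch below fits that model from two measurements. The model, the `KernelCost` struct, and both function names are assumptions for illustration, not the analysis used for the results.

```cpp
#include <cassert>

// Hypothetical linear cost model (an assumption, not the author's analysis):
//   total kernel time = overhead + neval * per_eval
// Times are in arbitrary but consistent units.
struct KernelCost {
    double overhead; // fixed time per kernel launch
    double per_eval; // time per sin/native_sin evaluation
};

// Solve the model from two measurements (neval1, t1) and (neval2, t2).
KernelCost fit_cost(double neval1, double t1, double neval2, double t2)
{
    assert(neval1 != neval2);
    double per_eval = (t2 - t1) / (neval2 - neval1);
    return {t1 - per_eval * neval1, per_eval};
}

// Apparent time per evaluation when the overhead is not subtracted.
// It approaches per_eval as neval grows.
double apparent_per_eval(const KernelCost &c, double neval)
{
    assert(neval > 0);
    return c.overhead / neval + c.per_eval;
}
```

Under this model, a kernel with an overhead of 10 units and a cost of 2 units per evaluation appears to cost 12 units per evaluation with a single evaluation, but only about 2.01 with 1000 evaluations.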
The full code for the test can be found in `opencl-dry-compute.cpp` and the results can be found under `data/cl-dry-compute`.
As mentioned before, we have three different platforms to test.
- Intel OpenCL CPU runtime
  - i7-6700K
  - i9-10885H
- Intel Compute OpenCL runtime
  - UHD 530
  - UHD 640
- AMD ROCm OpenCL driver
  - AMD Radeon RX 5500 XT