OpenCL Backend NUMA Issues #8
Comments
@bhack
Yes, this is not related to NUMA, but it is something to consider on some Intel devices. Also interesting: https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance
@bhack
And also APU systems like AMD HSA Kaveri and Intel Broadwell, as this PDF points out, yup.
Excerpt from my current thesis:
An issue that came up while testing the OpenCL hybrid backend was
that performance did not scale as expected on systems that have more than
one CPU. Such systems have non-uniform memory access (NUMA): the
CPUs share one address space for memory, but every processor has its own cache
and memory interface. Accessing data that resides on the other CPU comes with a
large performance penalty. Compute kernels, such as the matrix-matrix multiplication
in the BLAS library or the custom OpenCL kernels, cause the threads to work on
adjacent data. This means a write operation by one CPU is likely to invalidate
cache lines on both CPUs. At this point, the synchronization overhead seems
to become larger than any speedup from having additional cores working on the
algorithms.
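The effect of threads writing to adjacent data can be illustrated with a small model. The following is only a sketch, not a measurement: it assumes a 64-byte cache line, 4-byte floats, and a line-aligned array, and the helper names `cache_line_of` and `shared_lines` are hypothetical. It counts how many cache lines are written by more than one worker, which is where invalidation traffic between the CPUs would be expected.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size
ELEMENT_BYTES = 4      # assumption: single-precision float


def cache_line_of(index):
    """Cache line number holding element `index` (array assumed line-aligned)."""
    return (index * ELEMENT_BYTES) // CACHE_LINE_BYTES


def shared_lines(n_elements, n_workers, interleaved):
    """Count cache lines that are written by more than one worker."""
    block = -(-n_elements // n_workers)  # ceiling division: elements per worker
    writers = {}
    for i in range(n_elements):
        # Interleaved: neighbouring elements go to different workers.
        # Blocked: each worker gets one contiguous range.
        worker = i % n_workers if interleaved else i // block
        writers.setdefault(cache_line_of(i), set()).add(worker)
    return sum(1 for w in writers.values() if len(w) > 1)


print(shared_lines(1024, 2, interleaved=True))   # 64: every line has two writers
print(shared_lines(1024, 2, interleaved=False))  # 0: the blocks end on line boundaries
```

In this model, a contiguous split avoids shared lines only when the block boundaries happen to coincide with cache line boundaries. It also says nothing about the cost of reading memory attached to the other processor, which is a separate part of the NUMA penalty.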
To get the expected speedup, the two (or more) processors need to be presented to the Caffe
library as separate devices. The library can then be used in individual
instances, one per processor. As the OpenCL hybrid backend uses two separate
parallelization mechanisms (OpenCL kernels and a parallelized BLAS), two solutions
would need to be applied:
library does not show NUMA issues.
using device fission. The splitting rule needs to be that all cores belonging
to one processor (tested by cache affinity) are tied to the same sub-device.
Only one sub-device is then used per Caffe instance. Device fission is an
extension to OpenCL that is already available (cl_ext_device_fission).
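The splitting rule can be sketched independently of OpenCL. The example below is a minimal illustration, assuming the core-to-processor mapping is already known (on Linux it could be read from sysfs, for instance); `plan_sub_devices` and the sample topology are hypothetical and not part of Caffe or the OpenCL backend. In the core OpenCL 1.2 API, a comparable partition would be requested through `clCreateSubDevices` with `CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN` and `CL_DEVICE_AFFINITY_DOMAIN_NUMA`, provided the driver supports it.

```python
from collections import defaultdict


def plan_sub_devices(core_to_package):
    """Group core ids so that each sub-device holds the cores of one processor.

    `core_to_package` maps a core id to the id of the physical processor
    (package) it belongs to. Returns one sorted core list per package.
    """
    groups = defaultdict(list)
    for core, package in sorted(core_to_package.items()):
        groups[package].append(core)
    return [groups[package] for package in sorted(groups)]


# Hypothetical dual-socket machine: cores 0-3 on package 0, cores 4-7 on package 1.
topology = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
sub_devices = plan_sub_devices(topology)
print(sub_devices)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each resulting core list would then back one sub-device, and each Caffe instance would be given exactly one of them.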