TensorFlow tf.matmul ends up using CPU backend for 32bit floats #14120
Comments
Just to clarify: these example matrices are tiny, but there doesn't seem to be a case where it dispatches based on size.

```python
a = tf.constant(np.random.rand(10000, 10000), dtype=tf.float32)
b = tf.constant(np.random.rand(10000, 10000), dtype=tf.float32)
c = tf.matmul(a, b)
```

still uses the CPU.
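For anyone reproducing this, TensorFlow can log where each op is actually placed; a minimal check (a sketch assuming a TF 2.x install, with smaller matrices so it runs quickly) looks like:

```python
import numpy as np
import tensorflow as tf

# Log the device every op is placed on (printed to stderr).
tf.debugging.set_log_device_placement(True)

a = tf.constant(np.random.rand(100, 100), dtype=tf.float32)
b = tf.constant(np.random.rand(100, 100), dtype=tf.float32)
c = tf.matmul(a, b)

# The tensor's .device string names the backend that produced it,
# e.g. ".../device:GPU:0" when a GPU kernel was selected.
print(c.device)
```

On an affected build the placement log also shows the concrete kernel name, which is how the `_MklMatMul` dispatch shows up.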
On containers installed from pytorch/pytorch on Docker Hub or NVIDIA NGC, the output for 16-, 32- and 64-bit floats always seems to be the same. However, installing TF 2.5.0 with conda through an overlay leads to:
This seems to get solved by easybuilders/easybuild-easyblocks#2583, according to initial testing.
It seems like there's a runtime switch.
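The snippet for that switch didn't survive the copy; one runtime switch that MKL-enabled TensorFlow builds honor is the `TF_DISABLE_MKL` environment variable (named here as an assumption, since the original comment's snippet is missing):

```shell
# Assumption: MKL-enabled TensorFlow builds check TF_DISABLE_MKL at
# startup; setting it to 1 makes TF skip the _Mkl* kernel rewrites.
export TF_DISABLE_MKL=1
```

With this set before launching Python, the matmul above should dispatch to the plain (non-MKL) kernel and hence be eligible for GPU placement.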
This was brought up in the Slack, but it seems our TF CUDA builds end up using _MklMatMul on the CPU when using 32-bit floats.
In my testing, versions 2.2.0, 2.3.1, 2.4.1 and 2.5.0 all have this problem; example output looks like
TensorFlow 2.6.0 seems to work correctly (though I have not extensively tested all types of operations).
Using 16-bit or 64-bit floats, they all use the GPU.
Containers with TF don't seem to have this issue, so it's something specific to our builds. Perhaps the MKL stuff should be disabled somehow for CUDA builds?
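One way to do that, assuming a standard bazel source build of TensorFlow, is simply to leave `--config=mkl` out when configuring the CUDA build (a sketch of the idea, not our actual easyconfig):

```shell
# Hypothetical build line: enable CUDA but omit --config=mkl so the
# _MklMatMul CPU kernels are never compiled into the binary.
bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
```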