Low accuracy on Arc A380 #5
Comments
@ip2016 Thanks for reporting this issue. We will take a look. On a first try with another Intel GPU we could not reproduce it; we will also try on an A380.
@ip2016 May I ask what kind of CPU you're using? We tested your example on our hardware platforms, and our results suggest that your CPU results look a little odd.
Hello @Tengfei09, thanks for your response.
I did some additional testing and still get inconsistent results sometimes. Here is one case that is reproducible:
Running the same code with tensorflow-cpu, I'm getting:
Virtual environment version:
I changed the code a bit, and now the CPU result from the Intel TensorFlow plugin matches the result from tensorflow-cpu, but the GPU result still does not.
@ip2016, good to see your latest result. For floating point, I think this is reasonable: we can't expect bit-for-bit identical results in floating-point arithmetic. Normally we compare floating-point values using a relative tolerance and an absolute tolerance, and here the difference is below 1e-6, which seems reasonable to me.
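As a rough illustration of the tolerance-based comparison described above, here is a minimal sketch using only the Python standard library. The helper name `all_close`, the default tolerances, and the sample values are assumptions made for this example; they are not taken from the issue or from the plugin's test suite.

```python
import math

def all_close(expected, actual, rel_tol=1e-5, abs_tol=1e-6):
    """Return True if every pair of values agrees within the given tolerances.

    The default tolerances are illustrative assumptions, not project settings.
    """
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )

# Hypothetical values standing in for CPU and GPU outputs (not from the issue).
cpu_out = [0.5, 1.25, 100.0]
gpu_out = [0.5000004, 1.2500006, 100.0002]

# Exact equality fails even though the values are very close.
print(cpu_out == gpu_out)

# Comparison with a relative and an absolute tolerance.
print(all_close(cpu_out, gpu_out))

# A much stricter tolerance rejects the same pair of results.
print(all_close(cpu_out, gpu_out, rel_tol=1e-9, abs_tol=0.0))
```

The absolute tolerance matters for values near zero, where a purely relative check becomes too strict, while the relative tolerance scales the allowed difference with the magnitude of the values being compared.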
@yiqianglee Thanks for your input. This is how it runs on CPU: (in progress)
And this is an XPU run (in progress):
I also noticed that it runs much slower on GPU. The "train" code is below (a simple example from Hugging Face):
@ip2016 Thanks for reporting. We can reproduce it now and are working on a fix.
Thanks for the fast fix. I have two more issues, and I'm not sure whether they are bugs or limitations.
@ip2016
It seems that XPU calculation accuracy deteriorates in the 4th-5th digit after the decimal point on common math operations.
Here is the sample code:
Which yields the following results:
System:
Asrock A380
Ubuntu 22.04 (kernel 5.17.0-1019-oem)