Some rounding error issue between CPU & GPU #23

Closed · haampie opened this issue Apr 25, 2021 · 6 comments

haampie (Contributor) commented Apr 25, 2021

I've tried MultiFloats.jl on the GPU, but I'm getting a loss of precision compared to the CPU:

using CUDA, MultiFloats
A = rand(Float64x8, 100, 100)
B = rand(Float64x8, 100, 100)
A * B - Array(CuArray(A) * CuArray(B))

Gives me

100×100 Matrix{MultiFloat{Float64, 8}}:
 -1.23827e-98    9.35263e-99  -8.83181e-99  …  -4.70324e-99   -1.3348e-98
 -1.98421e-99    8.20389e-99   1.67043e-98      1.45499e-98    2.32225e-98
 -2.77264e-99   -3.30951e-99   1.32426e-98     -1.09181e-98    7.84157e-100
  1.92544e-98    6.35776e-99  -8.85547e-99      1.29435e-98   -4.89252e-99
 -5.52038e-99    5.35901e-99  -3.705e-98        1.53947e-99    7.38954e-99
 -2.16904e-98    1.64505e-98  -1.16536e-98  …  -3.19036e-98    7.5397e-99
  6.72487e-98    6.07349e-99  -2.87359e-98      ...

but eps(Float64x8) is 5.9091063153828709e-126.

What explains this? The order of iteration?
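
For reference, a minimal sketch of how one might quantify the discrepancy in units of eps(Float64x8); only public conversion and arithmetic operations are used, and the variable names are just illustrative:

using CUDA, MultiFloats
A = rand(Float64x8, 100, 100)
B = rand(Float64x8, 100, 100)
ref = A * B                             # CPU reference
gpu = Array(CuArray(A) * CuArray(B))    # GPU result copied back to the host
# worst-case relative error, then expressed in units of eps(Float64x8)
worst = maximum(Float64(abs(ref[i] - gpu[i]) / abs(ref[i])) for i in eachindex(ref))
worst / eps(Float64x8)                  # about 1 would mean full Float64x8 accuracy; here it is astronomically larger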

dzhang314 (Owner) commented

Thanks for the find, @haampie! This is very mysterious to me. I haven't played with Julia/CUDA interop before, so I'm unsure how Julia code gets compiled for GPU execution. The fact that most of the limbs are accurate is especially puzzling; if CUDA's arithmetic/fma operations were simply rounded improperly, then you would expect all of limbs 2-8 to be garbage. But the fact that limbs 1-6 are right, while limbs 7-8 are wrong, rules out all of my easy hypotheses for what could be going wrong.

I'll look into this the next time I work on MultiFloats.jl (which honestly might take a while... grad student life has me swamped these days).

kunyuan commented Mar 7, 2022

Just to add another data point: I tried exactly the same code on my GPU (a 1080 Ti) and I'm getting accuracy of about 1e-114. Is this the expected accuracy level?

100×100 Matrix{MultiFloat{Float64, 8}}:
-1.14468e-115 2.54748e-115 -6.94563e-115 -4.96936e-115 2.13842e-115 8.64325e-115 1.6073e-115 … 3.4182e-115 6.56281e-115 3.55884e-115 4.6424e-115 -1.1596e-115 2.23816e-115
...

dzhang314 (Owner) commented

Hey @kunyuan, thanks for your interest in MultiFloats.jl! No, this is not the expected accuracy, and I'm afraid to report I still don't understand what's going on with GPU MultiFloat calculations. I'll report back when I have time to take a look at this in detail.

orkolorko commented

Hi @dzhang314, do you know CAMPARY? It is a library whose idea is similar to your MultiFloats library, but made to work on the GPU. I did a reimplementation of it in Julia some time ago, on the CPU, so it is quite similar to your library. If you're interested, I can try to dust it off and compare it with MultiFloats...

rguerrab commented

Is this still an issue in v2?

dzhang314 (Owner) commented Sep 26, 2024

@rguerrab I'm not sure about the current status of this issue, since I don't do any Julia GPU development myself. I originally wanted to investigate this, but after a few years I still haven't found the opportunity or resources.

What I can tell you is that MultiFloats.jl does not contain any GPU-specific code -- it is written purely in terms of Float64 arithmetic operations (including FMA) that are specified by IEEE 754 to return identical results on any standards-conforming CPU or GPU. Therefore, any discrepancy between CPU and GPU indicates a fault in either the hardware or the compiler toolchain, not MultiFloats.jl itself. This was always true in MultiFloats.jl v1.0 and remains true in v2.0.
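
To make that concrete, here is a minimal sketch of the textbook error-free transformations this kind of library is built from (not the actual MultiFloats.jl source); each one recovers a rounding error exactly, but only if every operation is rounded exactly once, to nearest, as IEEE 754 specifies:

# Knuth's two-sum: s + e == a + b exactly, assuming each operation below
# is rounded once, to nearest (ties to even).
function two_sum(a::Float64, b::Float64)
    s = a + b
    t = s - b                      # the part of a that actually made it into s
    e = (a - t) + (b - (s - t))    # the rounding error of s = a + b
    return s, e
end

# Two-prod via FMA: p + e == a * b exactly, assuming fma is a true fused
# multiply-add with a single rounding at the end.
function two_prod(a::Float64, b::Float64)
    p = a * b
    e = fma(a, b, -p)
    return p, e
end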

I'm going to close this issue until someone can demonstrate that this is a fault in MultiFloats.jl as opposed to, say, an overly-aggressive optimization pass in CUDA.jl. Note that the algorithms in MultiFloats.jl are all sensitive to rounding mode (must be round-to-nearest, ties-to-even) and contraction (rounding must occur exactly in the places where IEEE 754 specifies it, no more and no less). If you have an aggressive optimizing compiler that doesn't strictly preserve IEEE 754 semantics, then MultiFloats.jl will silently and catastrophically break.
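
As an illustration of that last point (again using the textbook two_sum sketch above, not MultiFloats.jl's actual code): under strict IEEE 754 semantics the rounding error is recovered exactly, but a value-unsafe optimizer is allowed to simplify it away entirely.

s, e = two_sum(1.0, 1e-20)
# Strict IEEE 754: s == 1.0 and e == 1e-20; the part that didn't fit into s
# is recovered exactly and would be carried into a lower limb.
#
# A compiler that reassociates floating-point expressions (fast-math style) may
# treat (a - t) + (b - (s - t)) as algebraically equal to (a + b) - s, i.e. zero,
# so e silently becomes 0.0 and every limb below the first loses its meaning.
# Likewise, contracting a separate multiply and add into one fma (one rounding
# instead of two) moves a rounding step and breaks these identities the same way.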
