fp32 to f16 benchmarking #343

Open
markreidvfx opened this issue Aug 4, 2023 · 0 comments

This article, published a few months ago, shows various methods for converting f32 to f16:
https://www.corsix.org/content/converting-fp32-to-fp16

I had some existing code for checking accuracy against hardware, which I've adapted to benchmark these methods along with Imath's implementation and a few of my own.
https://github.com/markreidvfx/float2half_test

x86 CPUs since Ivy Bridge (2012) and AMD Jaguar (2013) can have the F16C extension. I'm not sure if it's an optional feature, but you can use the cpuid instruction to find out if it's present. Ideally this benchmark should be tested on machines that don't have the F16C extension. Unfortunately, I only have 2 macOS machines that are that old.
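
As a rough illustration of the cpuid check, here is a minimal sketch that assumes GCC or Clang and their `<cpuid.h>` helper. The function name `cpu_has_f16c` is hypothetical. F16C is reported in CPUID leaf 1, ECX bit 29. A fuller check would also confirm that the OS has enabled AVX state, since the F16C instructions are VEX-encoded; that step is left out here.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang wrapper around the cpuid instruction */
#endif

/* Hypothetical helper: returns true if CPUID reports the F16C extension.
 * Always returns false on non-x86 targets. */
static bool cpu_has_f16c(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1 holds the basic feature flags; __get_cpuid returns 0
     * if the leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    /* F16C is ECX bit 29 of leaf 1. */
    return ((ecx >> 29) & 1u) != 0;
#else
    return false;
#endif
}
```

A dispatcher could call this once at startup and pick the hardware path or a software fallback accordingly.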

Interestingly, ryg performs somewhat better on these machines. I think it's because the macOS 10.13 platform might guarantee SSE4.2 is present, so clang is able to vectorize better. I haven't really analyzed the asm yet.

Results from more machines I ran the benchmark on are on my GitHub project page.

ryg and maratyszcza seem to perform the best, but results can vary a lot between compilers. They use a clever trick: the floating-point unit does the rounding that is needed for the conversion. I implemented an SSE2 version of maratyszcza that I think would be a good fallback for x86_64.
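
To make the rounding trick concrete, here is a scalar sketch modeled on the approach used in maratyszcza's FP16 library. It is a reconstruction for illustration, not code taken from the benchmark, and the function names are hypothetical. It assumes IEEE-754 binary32 `float`, the default round-to-nearest-even mode, and no `-ffast-math`. The idea is to add a "magic" value whose exponent sits well above the input's, so that the float addition itself rounds away the mantissa bits a half cannot hold.

```c
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers (memcpy avoids strict-aliasing problems). */
static uint32_t f32_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float f32_from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical name. Converts float to half bits, letting the FPU round. */
static uint16_t f32_to_f16_fpu_round(float f)
{
    const float scale_to_inf  = 0x1.0p+112f;
    const float scale_to_zero = 0x1.0p-110f;

    const uint32_t w      = f32_to_bits(f);
    const uint32_t shl1_w = w + w;              /* shifts the sign bit out */
    const uint32_t sign   = w & 0x80000000u;

    /* |f| scaled up then down: inputs too large for a half overflow to
     * infinity in the first multiply; the net factor is otherwise 4. */
    float base = (f32_from_bits(w & 0x7FFFFFFFu) * scale_to_inf) * scale_to_zero;

    /* Exponent of the input, clamped from below so that half subnormals
     * are rounded at a fixed position. */
    uint32_t bias = shl1_w & 0xFF000000u;
    if (bias < 0x71000000u)
        bias = 0x71000000u;

    /* Adding the magic value makes the FPU round base so that only the
     * bits a half can represent survive (round-to-nearest-even). */
    base = f32_from_bits((bias >> 1) + 0x07800000u) + base;

    const uint32_t bits          = f32_to_bits(base);
    const uint32_t exp_bits      = (bits >> 13) & 0x00007C00u;
    const uint32_t mantissa_bits = bits & 0x00000FFFu;
    const uint32_t nonsign       = exp_bits + mantissa_bits;

    /* Any NaN input becomes the quiet NaN 0x7E00 (sign preserved). */
    return (uint16_t)((sign >> 16) | (shl1_w > 0xFF000000u ? 0x7E00u : nonsign));
}
```

For example, `f32_to_f16_fpu_round(1.0f)` gives `0x3C00` and `f32_to_f16_fpu_round(65504.0f)` gives `0x7BFF`, the largest finite half.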

Another test the benchmark runs is exact accuracy against hardware.

Imath doesn't exactly match hardware for all NaN values. The results are still NaN, but different than hardware.

This can be fixed by changing
https://github.com/AcademySoftwareFoundation/Imath/blob/main/src/Imath/half.h#L401

return ret | (uint16_t) m | (uint16_t) (m == 0);

to

return ret | (uint16_t) m | 0x0200;
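
To show what the change does, here is a small sketch that isolates only the NaN branch. The function names and the way the sign is extracted are assumptions for illustration, not the actual layout of `half.h`. The bit `0x0200` is the top mantissa bit of a half, i.e. the quiet-NaN bit, so the proposed version always returns a quiet NaN while keeping the truncated payload.

```c
#include <stdint.h>

/* Hypothetical helper: NaN handling as in the current return statement.
 * `bits` must be the bit pattern of a float NaN. */
static uint16_t nan_to_half_current(uint32_t bits)
{
    uint16_t ret = (uint16_t)(((bits >> 16) & 0x8000u) | 0x7C00u);
    uint32_t m   = (bits & 0x007FFFFFu) >> 13;  /* top 10 payload bits */

    /* Sets the lowest bit only when the shifted payload is zero. */
    return (uint16_t)(ret | (uint16_t) m | (uint16_t) (m == 0));
}

/* Hypothetical helper: NaN handling with the proposed change. */
static uint16_t nan_to_half_proposed(uint32_t bits)
{
    uint16_t ret = (uint16_t)(((bits >> 16) & 0x8000u) | 0x7C00u);
    uint32_t m   = (bits & 0x007FFFFFu) >> 13;

    /* Always sets the quiet bit. */
    return (uint16_t)(ret | (uint16_t) m | 0x0200u);
}
```

With the signaling NaN `0x7F800001`, the current code yields `0x7C01` and the proposed code yields `0x7E00`. For the quiet NaN `0x7FC00000`, both yield `0x7E00`.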