This article was published a few months ago showing various methods for converting f32 to f16:
https://www.corsix.org/content/converting-fp32-to-fp16
I had some existing code for checking accuracy against hardware, which I've adapted to benchmark those methods along with Imath's implementation and a few of my own:
https://github.com/markreidvfx/float2half_test
x86 CPUs since Intel Ivy Bridge (2012) and AMD Jaguar (2013) can have the F16C extension. I'm not sure if it's an optional feature, but you can use the `cpuid` instruction to find out if it's present. Ideally this benchmark should be tested on machines that don't have the F16C extension. Unfortunately I only have two macOS machines that are that old.

Interestingly, `ryg` performs somewhat better. I think it's because the macOS 10.13 platform might guarantee SSE4.2 is present, so clang is able to vectorize better. I haven't really analyzed the asm yet.

Results from more machines I ran the benchmark on are on my GitHub project page.
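For the `cpuid` check mentioned above, here is a minimal sketch. It assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`; the function name `cpu_has_f16c` is made up for illustration. F16C is reported in bit 29 of ECX for CPUID leaf 1.

```c
#include <stdio.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, assumed available */
#endif

/* Hypothetical helper: returns 1 if the CPU reports F16C, 0 otherwise.
 * On non-x86 targets it simply returns 0. */
static int cpu_has_f16c(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 1 holds the basic feature flags; __get_cpuid returns 0
     * if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* ECX bit 29 is the F16C feature flag. */
    return (int)((ecx >> 29) & 1u);
#else
    return 0;
#endif
}
```

Calling `cpu_has_f16c()` at startup would be enough to choose between a hardware path and a software fallback. A production check would probably also confirm that the OS has enabled AVX state (OSXSAVE plus `xgetbv`), since the F16C instructions are VEX encoded; that part is left out here.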
`ryg` and `maratyszcza` seem to perform the best, but it can vary a lot between compilers. They do a clever trick by using the floating point unit to do the rounding that is needed for the conversion. I implemented an SSE2 version of `maratyszcza` that I think would be a good fallback for x86_64.

Another test the benchmark runs is exact accuracy against hardware.
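To make the rounding trick concrete, here is a scalar sketch modeled on the conversion routine in the FP16 library that the `maratyszcza` variant is named after. It is not the benchmarked code, and the function names are made up for illustration. The idea is to add a carefully chosen power of two to the scaled input so that the float addition itself rounds the mantissa to 10 bits (round to nearest even). It assumes float arithmetic is evaluated in single precision, as on x86_64 with SSE.

```c
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers; memcpy avoids strict-aliasing problems. */
static uint32_t f32_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical name: converts a float to IEEE binary16 bits, letting
 * the FPU perform the rounding. All NaNs map to the single value 0x7E00. */
static uint16_t f32_to_f16_sketch(float f)
{
    const float scale_to_inf  = 0x1.0p+112f;
    const float scale_to_zero = 0x1.0p-110f;

    const uint32_t w      = f32_to_bits(f);
    const uint32_t shl1_w = w + w;                 /* drops the sign bit */
    const uint32_t sign   = w & UINT32_C(0x80000000);

    /* |f| scaled so values too large for half overflow to infinity,
     * then scaled back down; net effect is a multiply by 4. */
    float base = (bits_to_f32(w & UINT32_C(0x7FFFFFFF)) * scale_to_inf) * scale_to_zero;

    /* Exponent of the input, clamped so tiny inputs round as subnormals. */
    uint32_t bias = shl1_w & UINT32_C(0xFF000000);
    if (bias < UINT32_C(0x71000000))
        bias = UINT32_C(0x71000000);

    /* Adding this power of two pushes the unwanted low mantissa bits
     * out of the float, so the addition rounds for us. */
    base = bits_to_f32((bias >> 1) + UINT32_C(0x07800000)) + base;

    const uint32_t bits          = f32_to_bits(base);
    const uint32_t exp_bits      = (bits >> 13) & UINT32_C(0x00007C00);
    const uint32_t mantissa_bits = bits & UINT32_C(0x00000FFF);
    const uint32_t nonsign       = exp_bits + mantissa_bits;

    /* shl1_w above 0xFF000000 means exponent all ones with a nonzero
     * mantissa, i.e. NaN. */
    return (uint16_t)((sign >> 16) |
                      (shl1_w > UINT32_C(0xFF000000) ? UINT32_C(0x7E00) : nonsign));
}
```

An exhaustive accuracy test would loop over all 2^32 input bit patterns and compare the result against the hardware instruction. Note that this sketch collapses every NaN to `0x7E00`, so it would not pass an exact NaN comparison as written.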
Imath doesn't do an exact hardware match of all NaN values. They are still NaN, but different than what the hardware produces.
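To illustrate what "matching hardware" means for NaN, here is a small sketch of the mapping the F16C conversion instruction applies as I understand it: keep the sign, keep the top 10 mantissa bits as payload, and force the quiet bit. The helper name is hypothetical, it works on raw bit patterns, and it is not the change being proposed for Imath.

```c
#include <stdint.h>

/* Hypothetical helper: given the bit pattern of a float32 NaN, return
 * the binary16 NaN a hardware-style conversion would produce.
 * The caller is assumed to have already established that the input
 * is a NaN (exponent all ones, mantissa nonzero). */
static uint16_t f32_nan_bits_to_f16_hw_style(uint32_t w)
{
    const uint32_t sign    = (w >> 16) & UINT32_C(0x8000);
    const uint32_t payload = (w >> 13) & UINT32_C(0x03FF);  /* top 10 mantissa bits */

    /* 0x7C00 sets the exponent to all ones, 0x0200 is the quiet bit. */
    return (uint16_t)(sign | UINT32_C(0x7C00) | UINT32_C(0x0200) | payload);
}
```

For example, the signaling NaN `0x7FA00000` becomes `0x7F00` here, whereas a conversion that returns one fixed NaN would give `0x7E00`. Both are NaN, but only the first keeps the payload bits.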
This can be fixed by changing
https://github.com/AcademySoftwareFoundation/Imath/blob/main/src/Imath/half.h#L401
to