Add AVX512 accelerated 1D/3D LUTS #1932

markreidvfx · 2024-01-17T04:01:20Z

ocioperf.exe --transform tests/data/files/clf/lut1d_32f_example.clf

Line by Line Average, lut dim 65, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz

ocioperf.exe --transform tests/data/files/clf/lut3d_preview_tier_test.clf

Line by Line Average, lut dim 33x33x33, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz

I've only been able to test on one machine with AVX512, and these are not exactly the performance gains I was hoping for. I'm still new to the instruction set, so there may be more optimizations we could do. There are quite a few AVX512 extensions; I've limited this implementation to just the AVX512F (Foundation) instructions, which basically means any AVX512-capable CPU should be able to run it.

GitHub Actions used to have more Intel CPUs with AVX512 available. Lately I've been getting only AMD EPYC CPUs without AVX512 for CI. I don't think there is any way to request a specific CPU. This is very frustrating and will make this more difficult to maintain and test.

doug-walker

Thanks Mark!

I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, it's surprising the timings are not faster. Maybe it's bound by memory accesses? I've seen cases where rearranging how the LUTs are stored in memory (at the cost of taking more space) resulted in a speed-up, though not sure if that would help here.
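To make the memory-layout idea concrete, here is a minimal sketch of one possible rearrangement: padding each tightly packed RGB entry to four floats so every entry starts on a 16-byte boundary within the buffer. The helper name `padLutToRGBA` is hypothetical and not part of OpenColorIO, and this is not what the PR does. Whether it would help these timings is untested.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenColorIO function): repack a tightly packed
// RGB LUT (3 floats per entry) into RGBA entries padded to 4 floats.
// The padded table is one third larger, but each entry then occupies exactly
// 16 bytes, so a single entry can be fetched with one 128-bit load.
std::vector<float> padLutToRGBA(const std::vector<float>& rgb)
{
    // Any trailing floats that do not form a full RGB triple are ignored.
    const std::size_t numEntries = rgb.size() / 3;

    // Zero-initialised, so the fourth (padding) channel is 0.0f.
    std::vector<float> rgba(numEntries * 4, 0.0f);

    for (std::size_t i = 0; i < numEntries; ++i)
    {
        rgba[4 * i + 0] = rgb[3 * i + 0]; // R
        rgba[4 * i + 1] = rgb[3 * i + 1]; // G
        rgba[4 * i + 2] = rgb[3 * i + 2]; // B
    }
    return rgba;
}
```

For a 33x33x33 LUT this grows the table from 107811 to 143748 floats, which is the "taking more space" side of the trade-off.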

tests/cpu/CMakeLists.txt

src/OpenColorIO/ops/lut1d/Lut1DOpCPU_AVX512.cpp

tests/cpu/UnitTestMain.cpp

tests/cpu/AVX512_tests.cpp

src/OpenColorIO/AVX512.h

markreidvfx · 2024-01-22T04:16:22Z

> I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, the half float conversion instructions are all part of the AVX512F (Foundation) extension.

The exact overlap between AVX, AVX2, and F16C support has never been clear to me. I think AVX2 pretty much guarantees F16C, but I think it's best to check for it explicitly when using those extensions.
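As an illustration of checking each extension on its own rather than inferring one from another, here is a minimal sketch that reads the CPUID feature bits separately. The names `CpuFeatures`, `decodeFeatures`, and `queryCpuFeatures` are hypothetical and are not OpenColorIO's detection code. The sketch assumes GCC or Clang on x86 (via `<cpuid.h>`), and it only reads the CPU's advertised bits; a complete check would also confirm through OSXSAVE/XGETBV that the OS saves the wider register state.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> // GCC/Clang header; MSVC would use __cpuidex from <intrin.h>
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical struct for this sketch: one flag per extension, each read
// from its own CPUID bit rather than inferred from another extension.
struct CpuFeatures
{
    bool avx     = false;
    bool f16c    = false;
    bool avx2    = false;
    bool avx512f = false;
};

// Decode the raw register values. Bit positions are from the CPUID
// definition: leaf 1 ECX bit 28 = AVX, bit 29 = F16C;
// leaf 7 (subleaf 0) EBX bit 5 = AVX2, bit 16 = AVX512F.
CpuFeatures decodeFeatures(std::uint32_t leaf1Ecx, std::uint32_t leaf7Ebx)
{
    CpuFeatures f;
    f.avx     = ((leaf1Ecx >> 28) & 1u) != 0;
    f.f16c    = ((leaf1Ecx >> 29) & 1u) != 0;
    f.avx2    = ((leaf7Ebx >> 5)  & 1u) != 0;
    f.avx512f = ((leaf7Ebx >> 16) & 1u) != 0;
    return f;
}

// Query the running CPU. On targets where <cpuid.h> was not included above,
// both registers stay zero and every flag is reported as false.
CpuFeatures queryCpuFeatures()
{
    std::uint32_t leaf1Ecx = 0;
    std::uint32_t leaf7Ebx = 0;
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    {
        leaf1Ecx = ecx;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    {
        leaf7Ebx = ebx;
    }
#endif
    return decodeFeatures(leaf1Ecx, leaf7Ebx);
}
```

Keeping the bit decoding separate from the CPUID query means the decoding can be unit tested on CI machines that lack the extensions.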

markreidvfx · 2024-01-30T07:34:03Z

I did a bit more perf testing of this with my old lut3d_perf tool.

It also turns out that GitHub runners on private repos are different than the public repo ones. The private ones can have AVX512.

I was able to test this pull request on Windows with AVX512 by setting up a private fork. I kinda used up all my free minutes for the month doing it, but all the tests pass 😆

doug-walker

LGTM!

tests/cpu/AVX512_tests.cpp

Signed-off-by: Mark Reid <[email protected]>

markreidvfx · 2024-03-20T06:16:41Z

@remia I added your suggestion to all the SIMD tests. I also rebased on top of the current main.

doug-walker reviewed Jan 22, 2024

doug-walker approved these changes Mar 18, 2024

doug-walker requested a review from remia March 18, 2024 04:19

remia approved these changes Mar 18, 2024

markreidvfx added 9 commits March 19, 2024 21:54

Initial AVX512 support

4f6f8d7

Signed-off-by: Mark Reid <[email protected]>

Lut1DOp add AVX512 implementation

49f9ee4

Signed-off-by: Mark Reid <[email protected]>

Lut3DOp add AVX512 implementation

f35b33a

Signed-off-by: Mark Reid <[email protected]>

Don't use SIMD if only 1 pixel is requested

916f133

Signed-off-by: Mark Reid <[email protected]>

Remove #if, f16c is always available with AVX512

0c7ee25

Signed-off-by: Mark Reid <[email protected]>

Cast pointers to __m512 instead of __m256

3a69085

Signed-off-by: Mark Reid <[email protected]>

Use size method from vector being tested

ce3fcbd

Signed-off-by: Mark Reid <[email protected]>

Add to help message that f16c is only used with AVX/AVX2

15a41f9

Signed-off-by: Mark Reid <[email protected]>

Clarify test case by using uint8 maxValue

3e33b9c

Signed-off-by: Mark Reid <[email protected]>

markreidvfx force-pushed the avx512_v1 branch from 0a82724 to 3e33b9c Compare March 20, 2024 05:21

doug-walker merged commit 91e8826 into AcademySoftwareFoundation:main Mar 21, 2024
25 checks passed
