Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AVX512 accelerated 1D/3D LUTS #1932

Merged
merged 9 commits into from
Mar 21, 2024

Conversation

markreidvfx
Copy link
Contributor

@markreidvfx markreidvfx commented Jan 17, 2024

ocioperf.exe --transform tests/data/files/clf/lut1d_32f_example.clf

Line by Line Average, lut dim 65, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
image

ocioperf.exe --transform tests/data/files/clf/lut3d_preview_tier_test.clf

Line by Line Average, lut dim 33x33x33, 3840x2160 image, Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
image

I've only been able to test on one machine with AVX512. Not exactly the performance gains I was hoping for. I'm still new to the instructions set, maybe there are some more optimizations we could do. There are quite a few AVX512 extensions. I've limited this implementation to just the AVX512F (foundation) instructions. That basically means any AVX512 capable CPU should be able run it.

Github actions use to have more intel CPU's with AVX512 available. Lately I've been getting only AMD EPYC CPU's without AVX512 for CI. I don't think there is anyway to request a specific cpu. This is very frustrating and will make this more difficult to maintain and test.

Copy link
Collaborator

@doug-walker doug-walker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark!

I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, it's surprising the timings are not faster. Maybe it's bound by memory accesses? I've seen cases where rearranging how the LUTs are stored in memory (at the cost of taking more space) resulted in a speed-up, though not sure if that would help here.

tests/cpu/CMakeLists.txt Show resolved Hide resolved
src/OpenColorIO/ops/lut1d/Lut1DOpCPU_AVX512.cpp Outdated Show resolved Hide resolved
tests/cpu/UnitTestMain.cpp Outdated Show resolved Hide resolved
tests/cpu/AVX512_tests.cpp Outdated Show resolved Hide resolved
tests/cpu/AVX512_tests.cpp Outdated Show resolved Hide resolved
src/OpenColorIO/AVX512.h Outdated Show resolved Hide resolved
src/OpenColorIO/AVX512.h Outdated Show resolved Hide resolved
@markreidvfx
Copy link
Contributor Author

markreidvfx commented Jan 22, 2024

I'd like to clarify the F16C option in relation to this. I guess if AVX512 is supported then we should assume F16C is always supported too, right? I have a few comments below related to this.

Yes, the half float conversion instructions are all part of the AVX512F (Foundation) extension.

The exact overlap between AVX and AVX2 and F16c support has never been exactly clear to me. I think AVX2 pretty much guarantees F16c but I think its best to check with those extensions.

@markreidvfx
Copy link
Contributor Author

I did a bit more perf testing of this with my old lut3d_perf tool

image

It also turns out that github runners on a private repos are different then the public repo ones. The private ones can have avx512.

I was able to test this pull request on windows with avx512 by setting up a private fork. I kinda used up all my free minutes for the month doing it but all the tests pass 😆

Copy link
Collaborator

@doug-walker doug-walker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@doug-walker doug-walker requested a review from remia March 18, 2024 04:19
tests/cpu/AVX512_tests.cpp Outdated Show resolved Hide resolved
@markreidvfx
Copy link
Contributor Author

@remia I added your suggestion to all the SIMD tests. I also rebased on top of the current main.

@doug-walker doug-walker merged commit 91e8826 into AcademySoftwareFoundation:main Mar 21, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants