-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adsk contrib - Add support for neon intrinsic integration #1828
Adsk contrib - Add support for neon intrinsic integration #1828
Conversation
It seems this PR fixes the universal build from Intel based macOS, so it could fix the failing Python wheels nightly build, will be good to check, once this is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a couple really minor notes which could be cleaned up later.
Turning off SSE2NEON with
On apple arm64 i'm getting 5 test_cpu_exec failed tests
I don't think they are related to this pull request as I get the same ones from the current main branch. I haven't dug into it but they look kind of like rounding errors. All the cpu test pass under rosetta. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented on a few minor points. Looking at @markreidvfx latest comment, do we need to add / update the macOS build variants in CI to cover all the new configurations?
Signed-off-by: Cédrik Fuoco <[email protected]>
…on unifying the way related cmake variables Signed-off-by: Cédrik Fuoco <[email protected]>
…ey serve the same purpose Signed-off-by: Cédrik Fuoco <[email protected]>
…n false positive. Stubbing cpuinfo for Apple ARM plateform. Handling Apple M1 correctly and adding support for SSE2NEON. Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
…cmake Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
… code and fixed CheckSupport compiler flags. Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]>
This isolates the compulation units and avoids executing illegal hardware instructions Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
… on APPLE platform Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
bcd853f
to
5bb952a
Compare
…logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF. Signed-off-by: Cédrik Fuoco <[email protected]>
…clearer. Signed-off-by: Cédrik Fuoco <[email protected]>
Thanks @remia, @markreidvfx and @michdolan for your comments. Everything should be covered by the new commits.
At least, I think adding an entry for OCIO_USE_SSE2NEON=OFF in the macOS build variants would be good. |
Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
@cedrik-fuoco-adsk I think only 20 jobs can run in parallel on a workflow but nothing prevent us from having more than 20, it will just take longer. Another possibility is also to add new variants to a nightly build if these are rare build settings. |
…takes double the time to do the universal build. OCIO no longuer build a universal binary by default Signed-off-by: Cédrik Fuoco <[email protected]>
…integration Signed-off-by: Cédrik Fuoco <[email protected]>
#cmakedefine01 OCIO_USE_SSE3 | ||
#cmakedefine01 OCIO_USE_SSSE3 | ||
#cmakedefine01 OCIO_USE_SSE4 | ||
#cmakedefine01 OCIO_USE_SSE42 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Building non universal arm64, I still needed to specify -DOCIO_USE_SSE2=ON
etc to turn all these on. Even with -DOCIO_USE_SSE2NEON=ON
. -DOCIO_USE_SSE2NEON=ON
should probably imply all these SSE features are ON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should already be ON by default if building a non-universal arm64.
See in CMakeLists.txt
They are not turned off since the CheckSupportX86SIMD.cmake
is not run on an arm64-only build. I've tested on my M1 and I get the expected CPUInfoConfig.h
, It should go in that path here:
I've tested with ocioperf using only OCIO_USE_SSE2NEON=ON and OFF. With my heavy_transforms.clf and the line by line test, I get about 226 ms
with -DOCIO_USE_SSE2NEON=ON
and about 467 ms
with -DOCIO_USE_SSE2NEON=OFF
.
What do you mean by "still needed"? Did you do another build beforehand or it was from a cleaned directory? @markreidvfx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are not turning on for me when I build natively on arm64. I've tried cleaning my env several times.
I'm just doing the basic mkdir build && cd build && cmake .. && cmake --build .
and my CPUInfoConfig.h
shows
#if (OCIO_ARCH_X86 && !defined(__aarch64__)) || (defined(__aarch64__) && OCIO_USE_SSE2NEON)
#define OCIO_USE_SSE2 0
#define OCIO_USE_SSE3 0
#define OCIO_USE_SSSE3 0
#define OCIO_USE_SSE4 0
#define OCIO_USE_SSE42 0
It looks like its because CMAKE_OSX_ARCHITECTURES
isn't set to anything when I build it this way.
That section in CMakeLists.txt needs it to be arm64
.
(APPLE AND "${CMAKE_SYSTEM_PROCESSOR}" MATCHES "arm64" AND "${CMAKE_OSX_ARCHITECTURES}" MATCHES "arm64")
cmake .. -DCMAKE_OSX_ARCHITECTURES=arm64
works.
Should we set CMAKE_OSX_ARCHITECTURES = CMAKE_SYSTEM_PROCESSOR if undefined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, I get it now. I'll prepare a new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest commit should have fixed the issue (I've tested it on my M1). @markreidvfx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That fixes it for me on my M2 mac too.
… documentations. Signed-off-by: Cédrik Fuoco <[email protected]>
|
||
On MacOS, the default is to build for the native architecture. The ``-DCMAKE_OSX_ARCHITECTURES`` option | ||
may be set to ``arm64;x86_64`` to build the universal binaries. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubled checked, the order the architectures doesn't cause any issues, but we might want to reword this to show you need quotes around the architectures -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"
. It took me like 10mins to realize the semicolon ;
in bash or zshell ends a command :p
Signed-off-by: Cédrik Fuoco <[email protected]>
…' of https://github.com/autodesk-forks/OpenColorIO into adsk_contrib/add-support-for-neon-intrinsic-integration
Thanks for your help with this @markreidvfx ! We updated the install instructions accordingly. Going to merge this as the expected tests passed before we updated the documentation. |
…twareFoundation#1828) * Merging the previous ARM Neon branch Signed-off-by: Cédrik Fuoco <[email protected]> * Testing each SIMD variants using a small code snippet and first pass on unifying the way related cmake variables Signed-off-by: Cédrik Fuoco <[email protected]> * Removing the usage of USE_SSE in favor of the new OCIO_USE_SSE2 as they serve the same purpose Signed-off-by: Cédrik Fuoco <[email protected]> * Using try_compile instead of check_cxx_source_compiles as it was given false positive. Stubbing cpuinfo for Apple ARM plateform. Handling Apple M1 correctly and adding support for SSE2NEON. Signed-off-by: Cédrik Fuoco <[email protected]> * Comments clean up and refactor some comments and documentations Signed-off-by: Cédrik Fuoco <[email protected]> * Added something in the documentation for Rosetta and small change in cmake Signed-off-by: Cédrik Fuoco <[email protected]> * Fixing the build under Rosetta and fixing issue in the MacOS CI. Signed-off-by: Cédrik Fuoco <[email protected]> * Using try_compile for CheckSupportSSEUsingSSE2NEON to standardize the code and fixed CheckSupport compiler flags. Signed-off-by: Cédrik Fuoco <[email protected]> * use emmintrin.h for only sse2 features Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> * Allow F16C to be completely turned off Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> * Seperate SIMD test code from code that adds unit tests. This isolates the compulation units and avoids executing illegal hardware instructions Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> * remove uneeded checks Signed-off-by: Cédrik Fuoco <[email protected]> * use software implementations of f16c intrinsics for SSE2 Signed-off-by: Cédrik Fuoco <[email protected]> * Added preprocessor checks for ARM as it is needed for universal build on APPLE platform Signed-off-by: Cédrik Fuoco <[email protected]> * Adding missing checks for "not arm64" Signed-off-by: Cédrik Fuoco <[email protected]> * Ease the future maintainability a the new OCIO_USE_xyz be moving the logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF. Signed-off-by: Cédrik Fuoco <[email protected]> * Fixing some spacing, documentations and making some cmake conditions clearer. Signed-off-by: Cédrik Fuoco <[email protected]> * Adding a build in ci_workflow for macos USE_OCIO_SSE2NEON=OFF Signed-off-by: Cédrik Fuoco <[email protected]> * typo Signed-off-by: Cédrik Fuoco <[email protected]> * Changing back all the macOS (except one) builds to x86_64 only as it takes double the time to do the universal build. OCIO no longuer build a universal binary by default Signed-off-by: Cédrik Fuoco <[email protected]> * Update the CMakeLists.txt logic to accomodate all scenario and fixing documentations. Signed-off-by: Cédrik Fuoco <[email protected]> * Update documentation and remove ocio_use_sse2neon from the CI matrix Signed-off-by: Cédrik Fuoco <[email protected]> --------- Signed-off-by: Cédrik Fuoco <[email protected]> Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> Co-authored-by: Mark Reid <[email protected]> Co-authored-by: Doug Walker <[email protected]> Signed-off-by: Brooke <[email protected]>
…twareFoundation#1828) * Merging the previous ARM Neon branch Signed-off-by: Cédrik Fuoco <[email protected]> * Testing each SIMD variants using a small code snippet and first pass on unifying the way related cmake variables Signed-off-by: Cédrik Fuoco <[email protected]> * Removing the usage of USE_SSE in favor of the new OCIO_USE_SSE2 as they serve the same purpose Signed-off-by: Cédrik Fuoco <[email protected]> * Using try_compile instead of check_cxx_source_compiles as it was given false positive. Stubbing cpuinfo for Apple ARM plateform. Handling Apple M1 correctly and adding support for SSE2NEON. Signed-off-by: Cédrik Fuoco <[email protected]> * Comments clean up and refactor some comments and documentations Signed-off-by: Cédrik Fuoco <[email protected]> * Added something in the documentation for Rosetta and small change in cmake Signed-off-by: Cédrik Fuoco <[email protected]> * Fixing the build under Rosetta and fixing issue in the MacOS CI. Signed-off-by: Cédrik Fuoco <[email protected]> * Using try_compile for CheckSupportSSEUsingSSE2NEON to standardize the code and fixed CheckSupport compiler flags. Signed-off-by: Cédrik Fuoco <[email protected]> * use emmintrin.h for only sse2 features Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> * Allow F16C to be completely turned off Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> * Seperate SIMD test code from code that adds unit tests. This isolates the compulation units and avoids executing illegal hardware instructions Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> * remove uneeded checks Signed-off-by: Cédrik Fuoco <[email protected]> * use software implementations of f16c intrinsics for SSE2 Signed-off-by: Cédrik Fuoco <[email protected]> * Added preprocessor checks for ARM as it is needed for universal build on APPLE platform Signed-off-by: Cédrik Fuoco <[email protected]> * Adding missing checks for "not arm64" Signed-off-by: Cédrik Fuoco <[email protected]> * Ease the future maintainability a the new OCIO_USE_xyz be moving the logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF. Signed-off-by: Cédrik Fuoco <[email protected]> * Fixing some spacing, documentations and making some cmake conditions clearer. Signed-off-by: Cédrik Fuoco <[email protected]> * Adding a build in ci_workflow for macos USE_OCIO_SSE2NEON=OFF Signed-off-by: Cédrik Fuoco <[email protected]> * typo Signed-off-by: Cédrik Fuoco <[email protected]> * Changing back all the macOS (except one) builds to x86_64 only as it takes double the time to do the universal build. OCIO no longuer build a universal binary by default Signed-off-by: Cédrik Fuoco <[email protected]> * Update the CMakeLists.txt logic to accomodate all scenario and fixing documentations. Signed-off-by: Cédrik Fuoco <[email protected]> * Update documentation and remove ocio_use_sse2neon from the CI matrix Signed-off-by: Cédrik Fuoco <[email protected]> --------- Signed-off-by: Cédrik Fuoco <[email protected]> Signed-off-by: Mark Reid <[email protected]> Signed-off-by: Cédrik Fuoco <[email protected]> Co-authored-by: Mark Reid <[email protected]> Co-authored-by: Doug Walker <[email protected]> Signed-off-by: Doug Walker <[email protected]>
This is a PR that is replacing the PR about ARM Neon here: #1775.
The new PR contains all the previous code changes that were already accepted in the previous PR and contains new commits that were needed to integrate it with the code from the PR Add AVX2/AVX/SSE2 SIMD accelerated 1D/3D LUTS (merged in main already).
Therefore, please ignore the first commit named "Merging the previous ARM Neon branch".