Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adsk contrib - Add support for neon intrinsic integration #1828

Conversation

cedrik-fuoco-adsk
Copy link
Contributor

This is a PR that is replacing the PR about ARM Neon here: #1775.

The new PR contains all the previous code changes that were already accepted in the previous PR and contains new commits that were needed to integrate it with the code from the PR Add AVX2/AVX/SSE2 SIMD accelerated 1D/3D LUTS (merged in main already).

Therefore, please ignore the first commit named "Merging the previous ARM Neon branch".

@remia
Copy link
Collaborator

remia commented Aug 27, 2023

It seems this PR fixes the universal build from Intel based macOS, so it could fix the failing Python wheels nightly build, will be good to check, once this is merged.

Copy link
Collaborator

@michdolan michdolan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a couple really minor notes which could be cleaned up later.

share/cmake/utils/CheckSupportAVX512.cmake Outdated Show resolved Hide resolved
src/OpenColorIO/SSE.h Outdated Show resolved Hide resolved
src/OpenColorIO/SSE2.h Outdated Show resolved Hide resolved
@markreidvfx
Copy link
Contributor

markreidvfx commented Aug 29, 2023

Turning off SSE2NEON with -DOCIO_USE_SSE2NEON=OFF fails to compile. Looks like clang is trying to compile SSE code as arm64.

In file included from /Users/mark/Dev/OpenColorIO/src/OpenColorIO/ops/cdl/CDLOpCPU.cpp:12:
/Users/mark/Dev/OpenColorIO/src/OpenColorIO/SSE.h:45:23: error: unknown type name '__m128'
        static inline __m128 _mm_max_ps(__m128 a, __m128 b)

On apple arm64 i'm getting 5 test_cpu_exec failed tests

[ 153/1049] [CPUProcessor / optimizations                                ] - FAILED
[ 417/1049] [FileFormatCTF / bake_1d_3d                                  ] - FAILED
[ 427/1049] [FileFormatHDL / bake_1d_shaper                              ] - FAILED
[ 462/1049] [FileFormatResolveCube / bake_1d_shaper                      ] - FAILED
[ 471/1049] [FileFormatSpi1D / bake_1d_shaper                            ] - FAILED

I don't think they are related to this pull request as I get the same ones from the current main branch. I haven't dug into it but they look kind of like rounding errors.

All the cpu test pass under rosetta.

Copy link
Collaborator

@remia remia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on a few minor points. Looking at @markreidvfx latest comment, do we need to add / update the macOS build variants in CI to cover all the new configurations?

CMakeLists.txt Outdated Show resolved Hide resolved
docs/quick_start/installation.rst Outdated Show resolved Hide resolved
docs/quick_start/installation.rst Show resolved Hide resolved
docs/quick_start/installation.rst Show resolved Hide resolved
share/cmake/utils/CheckSupportAVX.cmake Outdated Show resolved Hide resolved
share/cmake/utils/CheckSupportF16C.cmake Outdated Show resolved Hide resolved
share/cmake/modules/install/Installsse2neon.cmake Outdated Show resolved Hide resolved
cedrik-fuoco-adsk and others added 15 commits August 29, 2023 12:55
…on unifying the way related cmake variables

Signed-off-by: Cédrik Fuoco <[email protected]>
…ey serve the same purpose

Signed-off-by: Cédrik Fuoco <[email protected]>
…n false positive. Stubbing cpuinfo for Apple ARM plateform. Handling Apple M1 correctly and adding support for SSE2NEON.

Signed-off-by: Cédrik Fuoco <[email protected]>
… code and fixed CheckSupport compiler flags.

Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
This isolates the compulation units and
avoids executing illegal hardware instructions

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
@cedrik-fuoco-adsk cedrik-fuoco-adsk force-pushed the adsk_contrib/add-support-for-neon-intrinsic-integration branch from bcd853f to 5bb952a Compare August 29, 2023 16:56
…logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF.

Signed-off-by: Cédrik Fuoco <[email protected]>
@cedrik-fuoco-adsk
Copy link
Contributor Author

cedrik-fuoco-adsk commented Aug 29, 2023

Thanks @remia, @markreidvfx and @michdolan for your comments.

Everything should be covered by the new commits.

do we need to add / update the macOS build variants in CI to cover all the new configurations?

At least, I think adding an entry for OCIO_USE_SSE2NEON=OFF in the macOS build variants would be good.
@remia, Is there a limit to the number of tasks in a particular workflow?

@remia
Copy link
Collaborator

remia commented Aug 29, 2023

@cedrik-fuoco-adsk I think only 20 jobs can run in parallel on a workflow but nothing prevent us from having more than 20, it will just take longer. Another possibility is also to add new variants to a nightly build if these are rare build settings.

cedrik-fuoco-adsk and others added 2 commits August 29, 2023 16:10
…takes double the time to do the universal build.

OCIO no longuer build a universal binary by default

Signed-off-by: Cédrik Fuoco <[email protected]>
#cmakedefine01 OCIO_USE_SSE3
#cmakedefine01 OCIO_USE_SSSE3
#cmakedefine01 OCIO_USE_SSE4
#cmakedefine01 OCIO_USE_SSE42
Copy link
Contributor

@markreidvfx markreidvfx Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building non universal arm64, I still needed to specify -DOCIO_USE_SSE2=ON etc to turn all these on. Even with -DOCIO_USE_SSE2NEON=ON. -DOCIO_USE_SSE2NEON=ON should probably imply all these SSE features are ON.

Copy link
Contributor Author

@cedrik-fuoco-adsk cedrik-fuoco-adsk Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should already be ON by default if building a non-universal arm64.
See in CMakeLists.txt

They are not turned off since the CheckSupportX86SIMD.cmake is not run on an arm64-only build. I've tested on my M1 and I get the expected CPUInfoConfig.h, It should go in that path here:
image

I've tested with ocioperf using only OCIO_USE_SSE2NEON=ON and OFF. With my heavy_transforms.clf and the line by line test, I get about 226 ms with -DOCIO_USE_SSE2NEON=ON and about 467 ms with -DOCIO_USE_SSE2NEON=OFF.

What do you mean by "still needed"? Did you do another build beforehand or it was from a cleaned directory? @markreidvfx

Copy link
Contributor

@markreidvfx markreidvfx Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not turning on for me when I build natively on arm64. I've tried cleaning my env several times.
I'm just doing the basic mkdir build && cd build && cmake .. && cmake --build .

and my CPUInfoConfig.h shows

#if (OCIO_ARCH_X86 && !defined(__aarch64__)) || (defined(__aarch64__) && OCIO_USE_SSE2NEON)
    #define OCIO_USE_SSE2 0
    #define OCIO_USE_SSE3 0
    #define OCIO_USE_SSSE3 0
    #define OCIO_USE_SSE4 0
    #define OCIO_USE_SSE42 0

It looks like its because CMAKE_OSX_ARCHITECTURES isn't set to anything when I build it this way.
That section in CMakeLists.txt needs it to be arm64.

(APPLE AND "${CMAKE_SYSTEM_PROCESSOR}" MATCHES "arm64" AND "${CMAKE_OSX_ARCHITECTURES}" MATCHES "arm64")

cmake .. -DCMAKE_OSX_ARCHITECTURES=arm64 works.

Should we set CMAKE_OSX_ARCHITECTURES = CMAKE_SYSTEM_PROCESSOR if undefined?

Copy link
Contributor Author

@cedrik-fuoco-adsk cedrik-fuoco-adsk Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I get it now. I'll prepare a new commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit should have fixed the issue (I've tested it on my M1). @markreidvfx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That fixes it for me on my M2 mac too.


On MacOS, the default is to build for the native architecture. The ``-DCMAKE_OSX_ARCHITECTURES`` option
may be set to ``arm64;x86_64`` to build the universal binaries.

Copy link
Contributor

@markreidvfx markreidvfx Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubled checked, the order the architectures doesn't cause any issues, but we might want to reword this to show you need quotes around the architectures -DCMAKE_OSX_ARCHITECTURES="arm64;x86_64". It took me like 10mins to realize the semicolon ; in bash or zshell ends a command :p

@doug-walker
Copy link
Collaborator

Thanks for your help with this @markreidvfx !

We updated the install instructions accordingly. Going to merge this as the expected tests passed before we updated the documentation.

@doug-walker doug-walker merged commit caa20dd into AcademySoftwareFoundation:main Sep 1, 2023
22 checks passed
@doug-walker doug-walker deleted the adsk_contrib/add-support-for-neon-intrinsic-integration branch September 1, 2023 03:02
brkglvn01 pushed a commit to brkglvn01/OpenColorIO that referenced this pull request Oct 23, 2023
…twareFoundation#1828)

* Merging the previous ARM Neon branch

Signed-off-by: Cédrik Fuoco <[email protected]>

* Testing each SIMD variants using a small code snippet and first pass on unifying the way related cmake variables

Signed-off-by: Cédrik Fuoco <[email protected]>

* Removing the usage of USE_SSE in favor of the new OCIO_USE_SSE2 as they serve the same purpose

Signed-off-by: Cédrik Fuoco <[email protected]>

* Using try_compile instead of check_cxx_source_compiles as it was given false positive.  Stubbing cpuinfo for Apple ARM plateform. Handling Apple M1 correctly and adding support for SSE2NEON.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Comments clean up and refactor some comments and documentations

Signed-off-by: Cédrik Fuoco <[email protected]>

* Added something in the documentation for Rosetta and small change in cmake

Signed-off-by: Cédrik Fuoco <[email protected]>

* Fixing the build under Rosetta and fixing issue in the MacOS CI.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Using try_compile for CheckSupportSSEUsingSSE2NEON to standardize the code and fixed CheckSupport compiler flags.

Signed-off-by: Cédrik Fuoco <[email protected]>

* use emmintrin.h for only sse2 features

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>

* Allow F16C to be completely turned off

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>

* Seperate SIMD test code from code that adds unit tests.

This isolates the compulation units and
avoids executing illegal hardware instructions

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>

* remove uneeded checks

Signed-off-by: Cédrik Fuoco <[email protected]>

* use software implementations of f16c intrinsics for SSE2

Signed-off-by: Cédrik Fuoco <[email protected]>

* Added preprocessor checks for ARM as it is needed for universal build on APPLE platform

Signed-off-by: Cédrik Fuoco <[email protected]>

* Adding missing checks for "not arm64"

Signed-off-by: Cédrik Fuoco <[email protected]>

* Ease the future maintainability a the new OCIO_USE_xyz be moving the logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Fixing some spacing, documentations and making some cmake conditions clearer.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Adding a build in ci_workflow for macos USE_OCIO_SSE2NEON=OFF

Signed-off-by: Cédrik Fuoco <[email protected]>

* typo

Signed-off-by: Cédrik Fuoco <[email protected]>

* Changing back all the macOS (except one) builds to x86_64 only as it takes double the time to do the universal build.
OCIO no longuer build a universal binary by default

Signed-off-by: Cédrik Fuoco <[email protected]>

* Update the CMakeLists.txt logic to accomodate all scenario and fixing documentations.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Update documentation and remove ocio_use_sse2neon from the CI matrix

Signed-off-by: Cédrik Fuoco <[email protected]>

---------

Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
Co-authored-by: Mark Reid <[email protected]>
Co-authored-by: Doug Walker <[email protected]>
Signed-off-by: Brooke <[email protected]>
doug-walker added a commit to autodesk-forks/OpenColorIO that referenced this pull request Dec 6, 2023
…twareFoundation#1828)

* Merging the previous ARM Neon branch

Signed-off-by: Cédrik Fuoco <[email protected]>

* Testing each SIMD variants using a small code snippet and first pass on unifying the way related cmake variables

Signed-off-by: Cédrik Fuoco <[email protected]>

* Removing the usage of USE_SSE in favor of the new OCIO_USE_SSE2 as they serve the same purpose

Signed-off-by: Cédrik Fuoco <[email protected]>

* Using try_compile instead of check_cxx_source_compiles as it was given false positive.  Stubbing cpuinfo for Apple ARM plateform. Handling Apple M1 correctly and adding support for SSE2NEON.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Comments clean up and refactor some comments and documentations

Signed-off-by: Cédrik Fuoco <[email protected]>

* Added something in the documentation for Rosetta and small change in cmake

Signed-off-by: Cédrik Fuoco <[email protected]>

* Fixing the build under Rosetta and fixing issue in the MacOS CI.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Using try_compile for CheckSupportSSEUsingSSE2NEON to standardize the code and fixed CheckSupport compiler flags.

Signed-off-by: Cédrik Fuoco <[email protected]>

* use emmintrin.h for only sse2 features

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>

* Allow F16C to be completely turned off

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>

* Seperate SIMD test code from code that adds unit tests.

This isolates the compulation units and
avoids executing illegal hardware instructions

Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>

* remove uneeded checks

Signed-off-by: Cédrik Fuoco <[email protected]>

* use software implementations of f16c intrinsics for SSE2

Signed-off-by: Cédrik Fuoco <[email protected]>

* Added preprocessor checks for ARM as it is needed for universal build on APPLE platform

Signed-off-by: Cédrik Fuoco <[email protected]>

* Adding missing checks for "not arm64"

Signed-off-by: Cédrik Fuoco <[email protected]>

* Ease the future maintainability a the new OCIO_USE_xyz be moving the logic into CPUInfoConfig.h.in as well as fixing issue on ARM when building with OCIO_USE_SSE2NEON=OFF.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Fixing some spacing, documentations and making some cmake conditions clearer.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Adding a build in ci_workflow for macos USE_OCIO_SSE2NEON=OFF

Signed-off-by: Cédrik Fuoco <[email protected]>

* typo

Signed-off-by: Cédrik Fuoco <[email protected]>

* Changing back all the macOS (except one) builds to x86_64 only as it takes double the time to do the universal build.
OCIO no longuer build a universal binary by default

Signed-off-by: Cédrik Fuoco <[email protected]>

* Update the CMakeLists.txt logic to accomodate all scenario and fixing documentations.

Signed-off-by: Cédrik Fuoco <[email protected]>

* Update documentation and remove ocio_use_sse2neon from the CI matrix

Signed-off-by: Cédrik Fuoco <[email protected]>

---------

Signed-off-by: Cédrik Fuoco <[email protected]>
Signed-off-by: Mark Reid <[email protected]>
Signed-off-by: Cédrik Fuoco <[email protected]>
Co-authored-by: Mark Reid <[email protected]>
Co-authored-by: Doug Walker <[email protected]>
Signed-off-by: Doug Walker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants