Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abseil doesn't build with clang on windows due to missing AVX types #1759

Closed
achernya opened this issue Sep 18, 2024 · 8 comments
Closed

abseil doesn't build with clang on windows due to missing AVX types #1759

achernya opened this issue Sep 18, 2024 · 8 comments

Comments

@achernya
Copy link

achernya commented Sep 18, 2024

Describe the issue

On clang on Windows, absl/crc/internal/non_temporal_memcpy.h fails to build with errors

In file included from third-party\abseil-cpp\absl\crc\internal\crc_non_temporal_memcpy.cc:20:                                                                                      
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:147:5: error: unknown type name '__m256i'                                                          
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);                                                                                                                       
    ^                                                                                                                                                                              
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:147:47: error: unknown type name '__m256i'                                                         
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);                                                                                                                       
                                              ^                                                                                                                                    
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:148:11: error: unknown type name '__m256i'                                                         
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);                                                                                                           
          ^                                                                                                                                                                        
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:148:59: error: unknown type name '__m256i'                                                         
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);                                                                                                           
                                                          ^                                                                                                                        
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:149:62: error: use of undeclared identifier '__m256i'                                              
    constexpr int kOpsPerCacheLine = kCacheLineSize / sizeof(__m256i);                                                                                                             
                                                             ^                                                                                                                     
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:153:7: error: unknown type name '__m256i'                                                          
      __m256i temp1, temp2;                                                                                                                                                        
      ^                                                                                                                                                                            
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:155:7: error: use of undeclared identifier 'temp2'; did you mean 'temp1'?                          
      temp2 = _mm256_lddqu_si256(src_cacheline + 1);                                                                                                                               
      ^                                                                                                                                                                            
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:153:15: note: 'temp1' declared here                                                                
      __m256i temp1, temp2;                                                                                                                                                        
              ^                                                                                                                                                                    
[redacted]\third-party\abseil-cpp\absl\crc\internal\non_temporal_memcpy.h:157:46: error: use of undeclared identifier 'temp2'; did you mean 'exp2'?                          
      _mm256_stream_si256(dst_cacheline + 1, temp2);                                                                                                                               
                                             ^                                                                                                                                     
[redacted]\third-party\toolchains\visual_studio\14.29.30133\VC\Tools\MSVC\14.29.30133\include\cmath:651:16: note: 'exp2' declared here                                  
_GENERIC_MATH1(exp2)

It looks like clang picked up some headers from MSVC (although we are not running clang-cl.exe, but clang++.exe)
https://github.com/abseil/abseil-cpp/blob/master/absl/crc/internal/non_temporal_memcpy.h#L124-L127 does and have __m256i defined.

If I add a !defined(_MSC_VER_) to the __SSE3__ clause, everything builds fine.

Steps to reproduce the problem

Attempt to build absl with clang 15 on Windows. (I suspect it affects newer clang as well, but I haven't checked.) Per my read of https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md clang-cl 15 should still be supported.

The invocation does not have any special clang-cl command line, i.e., no equivalent of -mavx or -march=native or similar -- just the usual -c, -o, -MD, -MF args.

What version of Abseil are you using?

20240722.0 (4447c75)

What operating system and version are you using?

CentOS Stream release 9 (although buck2 is running some parts of the build on Windows)

What compiler and version are you using?

llvm 15.0.5 (windows-x86_64 clang++.exe)

What build system are you using?

buck2

Additional context

No response

@achernya achernya changed the title [Bug]: Please title this bug report abseil doesn't build with clang on windows due to missing AVX types Sep 18, 2024
@derekmauro
Copy link
Member

Steps to reproduce the problem

Attempt to build absl with clang 15 on Windows. (I suspect it affects newer clang as well, but I haven't checked.) Per my read of https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md clang-cl 15 should still be supported.

The invocation does not have any special clang-cl command line, i.e., no equivalent of -mavx or -march=native or similar -- just the usual -c, -o, -MD, -MF args.

We do test this configuration and do not see this issue, so there must be some difference that isn't getting captured here. Please provide concrete reproduction steps using a supported build system.

@achernya
Copy link
Author

I'll look into getting a repro, which will be challenging since I don't have direct access to the Windows builders.

Can you help me understand why you believe the current #if defined(...) are correct? 256-bit registers are only available starting AVX, not SSE3, so even with the comment above the check, it doesn't look right to me.

@derekmauro
Copy link
Member

I think I understand what is going on here. 18018aa tries to force that function to compile with AVX (see the gnu::target line above the function) so that runtime dispatch can be used when AVX is available, but to also allow the same binary to work with CPUs that do not support AVX. I believe the #ifdef needs to be modified to verify that the compiler supports gnu::target.

@achernya
Copy link
Author

Okay, reproduction steps:

  1. Get a WIndows 11 machine. I used a VM from https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/. I set up my VM to have an EPYC CPU (because my host is AMD based).
  2. Install llvm-15.0.7-win64.exe from https://github.com/llvm/llvm-project/releases/tag/llvmorg-15.0.7, and install it, selecting "add to PATH for all users"
  3. Install cmake-3.30.3-windows-x86_64.msi from https://cmake.org/download/ and install it
  4. Download ninja-win.zip from https://github.com/ninja-build/ninja/releases and extract it. I put it in C:\Users\User\Documents
  5. Install Visual Studio Community 2019 with the C++ workload. (This is technically out of support, but only the Windows headers are needed here. I believe this will work fine with 2022, but I haven't tried it yet since I am trying to reproduce the environment that found the error)
  6. Clone abseil-cpp.git and checkout the 20240722.0 tag.
  7. In cmd.exe, run call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvarsall.bat" x64
  8. mkdir build && cd build
  9. cmake -DCMAKE_MAKE_PROGRAM:PATH=C:\Users\User\Documents\ninja.exe -DCMAKE_CXX_COMPILER:PATH="C:\Program Files\LLVM\bin\clang++.exe" -DCMAKE_CXX_FLAGS="-march=westmere" -G Ninja ..
  10. cmake --build . The build will fail, although the error message won't be as obvious. I recommend re-running with cmake --build . -- -j1 to get it.
C:\Users\User\Documents\src\abseil-cpp\build>cmake --build . -- -j1
[1/108] Building CXX object absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj
FAILED: absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj
C:\PROGRA~1\LLVM\bin\CLANG_~1.EXE  -IC:/Users/User/Documents/src/abseil-cpp -march=westmere -D_DLL -D_MT -Xclang --dependent-lib=msvcrt -Wall -Wextra -Wc++98-compat-extra-semi -Wcast-qual -Wconversion -Wdeprecated-pragma -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj -MF absl\crc\CMakeFiles\crc32c.dir\internal\crc_non_temporal_memcpy.cc.obj.d -o absl/crc/CMakeFiles/crc32c.dir/internal/crc_non_temporal_memcpy.cc.obj -c C:/Users/User/Documents/src/abseil-cpp/absl/crc/internal/crc_non_temporal_memcpy.cc
In file included from C:/Users/User/Documents/src/abseil-cpp/absl/crc/internal/crc_non_temporal_memcpy.cc:20:
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:147:5: error: unknown type name '__m256i'
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);
    ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:147:47: error: unknown type name '__m256i'
    __m256i *dst_cacheline = reinterpret_cast<__m256i *>(d);
                                              ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:148:11: error: unknown type name '__m256i'
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);
          ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:148:59: error: unknown type name '__m256i'
    const __m256i *src_cacheline = reinterpret_cast<const __m256i *>(s);
                                                          ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:149:62: error: use of undeclared identifier '__m256i'
    constexpr int kOpsPerCacheLine = kCacheLineSize / sizeof(__m256i);
                                                             ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:153:7: error: unknown type name '__m256i'
      __m256i temp1, temp2;
      ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:155:7: error: use of undeclared identifier 'temp2'; did you mean 'temp1'?
      temp2 = _mm256_lddqu_si256(src_cacheline + 1);
      ^~~~~
      temp1
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:153:15: note: 'temp1' declared here
      __m256i temp1, temp2;
              ^
C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:157:46: error: use of undeclared identifier 'temp2'; did you mean 'exp2'?
      _mm256_stream_si256(dst_cacheline + 1, temp2);
                                             ^~~~~
                                             exp2
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include\cmath:651:16: note: 'exp2' declared here
_GENERIC_MATH1(exp2)
               ^
8 errors generated.
ninja: build stopped: subcommand failed.

The -march=westmere is load-bearing -- without it, __SSE3__ never gets set by clang, and never exposes this issue. So my comment in the initial report about there being no -march or similar flags must be in error (there's possibly something in the clang wrapper script by buck, or LLVM was built to assume a different CPU)

Either way, this reproduces with a supported build-system, and a fairly reasonable -march CPU variant.

@achernya
Copy link
Author

achernya commented Sep 19, 2024

I believe the #ifdef needs to be modified to verify that the compiler supports gnu::target

If I put #error Supports gnu::target on line 119, my reproducer gets

C:/Users/User/Documents/src/abseil-cpp\absl/crc/internal/non_temporal_memcpy.h:119:2: error: Supports gnu::target
#error Supports gnu::target

So apparently clang++.exe supports gnu::target, but there's still no __m256i defined in that case.

Looking at clang 15's immintrin.h, it has

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
    defined(__AVX__)
#include <avxintrin.h>
#endif

I checked to see if clang++.exe is defining _MSC_VER by adding

#if defined(_MSC_VER)
#error "Windows?"
#endif

and I do get this error.

So avxintrin.h (which is what has the __m256i typedefs) never gets included without __AVX__ on Windows, but does get included without Windows, which is why this code compiles on Linux (and presumably macos) with -march=westmere.

@achernya
Copy link
Author

achernya commented Sep 19, 2024

Looking at immintrin.h from gcc 12 as well as LLVM 15, it looks like gcc unconditionally includes avxintrin.h and makes the typedefs available.

LLVM makes avxintrin.h available if __AVX__ is defined, or _MSC_VER is not defined.

So I think it's likely sufficient to change from

#if defined(__SSE3__) || (defined(_MSC_VER) && defined(__AVX__))

to

#if  !defined(_MSC_VER) || defined(__AVX__)

(I don't know if we need an explicit gcc check with (defined(__GNUC__) && !defined(__llvm__)))

@derekmauro
Copy link
Member

Thanks. I managed to reproduce. I'll try to submit a fix in the next few days.

@achernya
Copy link
Author

achernya commented Sep 23, 2024

Thanks for the fix, @derekmauro!

However, I think

#if ((defined(__AVX__) || defined(ABSL_INTERNAL_CAN_FORCE_AVX)) && \
     defined(__SSE3__)) ||                                         \
    (defined(_MSC_VER) && defined(__AVX__))

should be simplified to something similar to what I suggested above. Probably

#if defined(__AVX__) || defined(ABSL_INTERNAL_CAN_FORCE_AVX)

given how you defined ABSL_INTERNAL_CAN_FORCE_AVX to already include !defined(_MSC_VER). Primarily the issue here is we need to remove __SSE3__ as it's misleading and confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants