-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang] Add intrin0.h
header to mimic intrin0.h
used by MSVC STL for clang-cl
#75711
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write If you have received no comments on your PR for a week, you can request a review If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-backend-x86 @llvm/pr-subscribers-clang Author: None (MaxEW707) ChangesFixes #53520. DescriptionProvide Inside This should now allow I still need to figure out how to best update MSVC STL to detect for the presence on TestingBuilt clang locally and ran the test suite. I still need to do a pass over the existing unit tests for the ms intrinsics to make sure there aren't any gaps. Wanted to get this PR up for discussion first. Modified latest MSVC STL from github to point to Wrote some test files that included MSVC STL headers that rely on intrinsics such as BenchmarksThe following include times are based on the x64 target with the modified headers in this PR.
If there is anything required or a different approach is preferred let me know. I would very much like to move this over the finish line so we can use function targets with clang-cl. Patch is 46.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75711.diff 4 Files Affected:
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index f8fdd402777e48..e5ce039d5789ad 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -252,6 +252,7 @@ set(x86_files
)
set(windows_only_files
+ intrin0.h
intrin.h
vadefs.h
)
diff --git a/clang/lib/Headers/immintrin.h b/clang/lib/Headers/immintrin.h
index 9bfe2fcdabdb3a..f57cd385455c22 100644
--- a/clang/lib/Headers/immintrin.h
+++ b/clang/lib/Headers/immintrin.h
@@ -16,62 +16,62 @@
#include <x86gprintrin.h>
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__MMX__)
#include <mmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SSE__)
#include <xmmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SSE2__)
#include <emmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SSE3__)
#include <pmmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SSSE3__)
#include <tmmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__SSE4_2__) || defined(__SSE4_1__))
#include <smmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AES__) || defined(__PCLMUL__))
#include <wmmintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__CLFLUSHOPT__)
#include <clflushoptintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__CLWB__)
#include <clwbintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX__)
#include <avxintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX2__)
#include <avx2intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__F16C__)
#include <f16cintrin.h>
#endif
@@ -79,217 +79,217 @@
/* No feature check desired due to internal checks */
#include <bmiintrin.h>
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__BMI2__)
#include <bmi2intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__LZCNT__)
#include <lzcntintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__POPCNT__)
#include <popcntintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__FMA__)
#include <fmaintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512F__)
#include <avx512fintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512VL__)
#include <avx512vlintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512BW__)
#include <avx512bwintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512BITALG__)
#include <avx512bitalgintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512CD__)
#include <avx512cdintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512VPOPCNTDQ__)
#include <avx512vpopcntdqintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512VPOPCNTDQ__))
#include <avx512vpopcntdqvlintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512VNNI__)
#include <avx512vnniintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512VNNI__))
#include <avx512vlvnniintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVXVNNI__)
#include <avxvnniintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512DQ__)
#include <avx512dqintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512BITALG__))
#include <avx512vlbitalgintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512BW__))
#include <avx512vlbwintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512CD__))
#include <avx512vlcdintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512DQ__))
#include <avx512vldqintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512ER__)
#include <avx512erintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512IFMA__)
#include <avx512ifmaintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512IFMA__) && defined(__AVX512VL__))
#include <avx512ifmavlintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVXIFMA__)
#include <avxifmaintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512VBMI__)
#include <avx512vbmiintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VBMI__) && defined(__AVX512VL__))
#include <avx512vbmivlintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512VBMI2__)
#include <avx512vbmi2intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VBMI2__) && defined(__AVX512VL__))
#include <avx512vlvbmi2intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512PF__)
#include <avx512pfintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512FP16__)
#include <avx512fp16intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512FP16__))
#include <avx512vlfp16intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512BF16__)
#include <avx512bf16intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512BF16__))
#include <avx512vlbf16intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__PKU__)
#include <pkuintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__VPCLMULQDQ__)
#include <vpclmulqdqintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__VAES__)
#include <vaesintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__GFNI__)
#include <gfniintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVXVNNIINT8__)
#include <avxvnniint8intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVXNECONVERT__)
#include <avxneconvertintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SHA512__)
#include <sha512intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SM3__)
#include <sm3intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SM4__)
#include <sm4intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVXVNNIINT16__)
#include <avxvnniint16intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__RDPID__)
/// Reads the value of the IA32_TSC_AUX MSR (0xc0000103).
///
@@ -304,7 +304,7 @@ _rdpid_u32(void) {
}
#endif // __RDPID__
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__RDRND__)
/// Returns a 16-bit hardware-generated random value.
///
@@ -367,7 +367,7 @@ _rdrand64_step(unsigned long long *__p)
}
#endif /* __RDRND__ */
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__FSGSBASE__)
#ifdef __x86_64__
/// Reads the FS base register.
@@ -481,7 +481,7 @@ _writegsbase_u64(unsigned long long __V)
#endif
#endif /* __FSGSBASE__ */
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__MOVBE__)
/* The structs used below are to force the load/store to be unaligned. This
@@ -541,18 +541,18 @@ _storebe_i64(void * __P, long long __D) {
#endif
#endif /* __MOVBE */
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__RTM__)
#include <rtmintrin.h>
#include <xtestintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SHA__)
#include <shaintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__FXSR__)
#include <fxsrintrin.h>
#endif
@@ -560,22 +560,22 @@ _storebe_i64(void * __P, long long __D) {
/* No feature check desired due to internal MSC_VER checks */
#include <xsaveintrin.h>
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__XSAVEOPT__)
#include <xsaveoptintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__XSAVEC__)
#include <xsavecintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__XSAVES__)
#include <xsavesintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SHSTK__)
#include <cetintrin.h>
#endif
@@ -584,91 +584,91 @@ _storebe_i64(void * __P, long long __D) {
* whereas others are also available at all times. */
#include <adxintrin.h>
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__RDSEED__)
#include <rdseedintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__WBNOINVD__)
#include <wbnoinvdintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__CLDEMOTE__)
#include <cldemoteintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__WAITPKG__)
#include <waitpkgintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__MOVDIRI__) || defined(__MOVDIR64B__)
#include <movdirintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__PCONFIG__)
#include <pconfigintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SGX__)
#include <sgxintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__PTWRITE__)
#include <ptwriteintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__INVPCID__)
#include <invpcidintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AMX_FP16__)
#include <amxfp16intrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__KL__) || defined(__WIDEKL__)
#include <keylockerintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AMX_TILE__) || defined(__AMX_INT8__) || defined(__AMX_BF16__)
#include <amxintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AMX_COMPLEX__)
#include <amxcomplexintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__AVX512VP2INTERSECT__)
#include <avx512vp2intersectintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
(defined(__AVX512VL__) && defined(__AVX512VP2INTERSECT__))
#include <avx512vlvp2intersectintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__ENQCMD__)
#include <enqcmdintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__SERIALIZE__)
#include <serializeintrin.h>
#endif
-#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
+#if !defined(__SCE__) || __has_feature(modules) || \
defined(__TSXLDTRK__)
#include <tsxldtrkintrin.h>
#endif
diff --git a/clang/lib/Headers/intrin.h b/clang/lib/Headers/intrin.h
index 9ebaea9fee9421..2d9ebe2be9c01a 100644
--- a/clang/lib/Headers/intrin.h
+++ b/clang/lib/Headers/intrin.h
@@ -15,6 +15,8 @@
#ifndef __INTRIN_H
#define __INTRIN_H
+#include <intrin0.h>
+
/* First include the standard intrinsics. */
#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
@@ -131,9 +133,6 @@ void __writefsqword(unsigned long, unsigned __int64);
void __writefsword(unsigned long, unsigned short);
void __writemsr(unsigned long, unsigned __int64);
void *_AddressOfReturnAddress(void);
-unsigned char _BitScanForward(unsigned long *_Index, unsigned long _Mask);
-unsigned char _BitScanReverse(unsigned long *_Index, unsigned long _Mask);
-unsigned char _bittest(long const *, long);
unsigned char _bittestandcomplement(long *, long);
unsigned char _bittestandreset(long *, long);
unsigned char _bittestandset(long *, long);
@@ -151,7 +150,6 @@ long _InterlockedExchangeAdd_HLERelease(long volatile *, long);
__int64 _InterlockedExchangeAdd64_HLEAcquire(__int64 volatile *, __int64);
__int64 _InterlockedExchangeAdd64_HLERelease(__int64 volatile *, __int64);
void _ReadBarrier(void);
-void _ReadWriteBarrier(void);
unsigned int _rorx_u32(unsigned int, const unsigned int);
int _sarx_i32(int, unsigned int);
#if __STDC_HOSTED__
@@ -182,12 +180,6 @@ unsigned char __readgsbyte(unsigned long);
unsigned long __readgsdword(unsigned long);
unsigned __int64 __readgsqword(unsi...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Is |
Looks like it belongs to MSVC: https://github.com/microsoft/STL/blob/a8888806c6960f1687590ffd4244794c753aa819/stl/inc/yvals_core.h#L2040 So I think the right order here is:
If we land this as-is, it'll tank build time on Windows. |
In other words, please undo the changes to |
While this is true, I don't think it's the right tradeoff for us to leave Intel intrinsics inaccessible for users who don't want to enable new microarchitectural features globally with command line flags. You may recall there are ODR issues, where enabling AVX globally results in AVX instructions being generated in inline functions, which then prevail at link time, and crash at runtime on chips where AVX is not available. We are currently prioritizing compile time above those needs, and that doesn't seem like the right tradeoff. What I'd like to see is a pull request sent to https://github.com/microsoft/stl with some agreement about how to structure the ifdefs so we can use intrin0.h when it is available. We probably can't use Then, when we land this PR, the net effect is that compile time with MSVC slows down until the next release of the Microsoft STL, but users have access to Intel intrinsics as soon as possible. Does that seem reasonable? |
It ships with MSVC.
Sounds good I'll do that.
I was thinking I could use Another option I had was to preemptively get a PR up to MSVC STL with a A custom name like |
Thanks!
I think we can be flexible, and the important thing is that our solution should work for them. |
MSVC STL requires the x64 adc intel intrinsics. I moved those to a separate file I also made a table here of all the intrinsics available with MSVC's The last table at the end shows the intrinsics that aren't implemented in clang but MSVC STL has workarounds using the clang intrinsics instead. We don't have to worry about supporting those at the moment.
|
@MaxEW707 If you pull out the adcintrin.h change into a separate PR we can fast track it as it should be done anyhow |
See #75711 for discussion. As a summary from the PR above, `<adxintrin.h>` includes adc intrinsics and adx intrinsics. To support MSVC STL we need to expose the adc intrinsics inside the currently proposed `<intrin0.h>` header. Move the processor agnostic adc intrinsics into a separate file that can be included from `<immintrin.h>` and the currently proposed `<intrin0.h>`.
Wow, that clang/lib/Headers/yvals_core.h hacks is gross, I love it :D lgtm, and good to land provided you do a quick re-check to confirm that this doesn't tank compile times now with that part included. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are risks to shadowing yvals_core.h
, but @CaseyCarter is at least aware of it here.
Once Clang ships its own intrin0.h header, the MSVC STL can adjust their yvals_core.h code to use it, and then we can delete this shadow header. This gives us the best outcome: Fast compiles today, target attribute support in MSVC-mode, and minimal tech debt in the long run. Sounds good to me, thanks for finding a solution!
clang/lib/Headers/yvals_core.h
Outdated
*===-----------------------------------------------------------------------=== | ||
*/ | ||
|
||
/* Only include this if we are aiming for MSVC compatibility. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newer lib/Headers changes prefer //
-style comments. Do we need /* */
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need /* */
.
I was just copying from the other files in this folder for the copyright header and clang-format didn't complain.
I'll give the style guide a read and get this fixed :).
This file is inside msvc stl so we don't need to worry about C89 compat on microsoft platforms.
I am not familiar with the MSVC ecosystem, but the change looks reasonable. Thanks! |
I will need someone to commit on my behalf since I do not have write access.
The include times below are done with the following source file using a locally release build of clang,
clang-cl.exe without clang-cl.exe with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the timing information! The yvals_core.h hack is actually a really clever way to handle this. I'm a bit worried that we don't have any test coverage for this file to begin with, so it's a bit hard to validate that the changes are correct, but we're early enough in a release cycle that I think we'll get feedback with plenty of time to address unintentional fallout.
The changes should come with a release note in clang/docs/ReleaseNotes.rst
so that users know about the compile time performance improvements. Otherwise, this LGTM! I can land the changes on your behalf once there's a release note.
Just want to say thanks for everyone taking the time to review the PR and providing feedback :).
I'll look into adding unit tests next week in a separate PR. If not it will be closer towards the end of March due to personal things.
Done. Feel free to reword the release notes as necessary. I believe I landed on something that is informative but succinct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for devising this clever way to improve MSVCSTL throughput!
Friendly reminder that I require someone to commit on my behalf since I do not have write access :). |
@AaronBallman Can you merge this please? We are eagerly waiting for this to end up in a release! ;-) |
Thank you for the ping, this fell off my radar! |
…for clang-cl (llvm#75711) Fixes llvm#53520. #### Description #### Provide `intrin0.h` to be the minimal set of intrinsics that the MSVC STL requires. The `intrin0.h` header matches the latest header provided by MSVC 1939 which does include some extra intrinsics that the MSVC STL does not use. Inside `BuiltinHeaders.def` I kept the header description as `intrin.h`. If you want me to change those to `intrin0.h` for the moved intrinsics let me know. This should now allow `immintrin.h` to be used with function targets for runtime cpu detection of simd instruction sets without worrying about the compile-time overhead from MSVC STL including `intrin.h` on clang. I still need to figure out how to best update MSVC STL to detect for the presence of `intrin0.h` from clang and to use this header over `intrin.h`. #### Testing #### Built clang locally and ran the test suite. I still need to do a pass over the existing unit tests for the ms intrinsics to make sure there aren't any gaps. Wanted to get this PR up for discussion first. Modified latest MSVC STL from github to point to `intrin0.h` for clang. Wrote some test files that included MSVC STL headers that rely on intrinsics such as `atomic`, `bit` and `vector`. Built the unit tests against x86, arm, aarch64, and x64. #### Benchmarks #### The following include times are based on the x64 target with the modified headers in this PR. These timings were done by using `clang-cl.exe -ftime-trace` and taking the wall time for parsing `intrin.h` and `intrin0.h`. `intrin.h` takes ~897ms to parse. `intrin0.h` takes ~1ms to parse. If there is anything required or a different approach is preferred let me know. I would very much like to move this over the finish line so we can use function targets with clang-cl.
See llvm/llvm-project#75711 for discussion. As a summary from the PR above, `<adxintrin.h>` includes adc intrinsics and adx intrinsics. To support MSVC STL we need to expose the adc intrinsics inside the currently proposed `<intrin0.h>` header. Move the processor agnostic adc intrinsics into a separate file that can be included from `<immintrin.h>` and the currently proposed `<intrin0.h>`.
…for clang-cl (llvm#75711) Fixes llvm#53520. #### Description #### Provide `intrin0.h` to be the minimal set of intrinsics that the MSVC STL requires. The `intrin0.h` header matches the latest header provided by MSVC 1939 which does include some extra intrinsics that the MSVC STL does not use. Inside `BuiltinHeaders.def` I kept the header description as `intrin.h`. If you want me to change those to `intrin0.h` for the moved intrinsics let me know. This should now allow `immintrin.h` to be used with function targets for runtime cpu detection of simd instruction sets without worrying about the compile-time overhead from MSVC STL including `intrin.h` on clang. I still need to figure out how to best update MSVC STL to detect for the presence of `intrin0.h` from clang and to use this header over `intrin.h`. #### Testing #### Built clang locally and ran the test suite. I still need to do a pass over the existing unit tests for the ms intrinsics to make sure there aren't any gaps. Wanted to get this PR up for discussion first. Modified latest MSVC STL from github to point to `intrin0.h` for clang. Wrote some test files that included MSVC STL headers that rely on intrinsics such as `atomic`, `bit` and `vector`. Built the unit tests against x86, arm, aarch64, and x64. #### Benchmarks #### The following include times are based on the x64 target with the modified headers in this PR. These timings were done by using `clang-cl.exe -ftime-trace` and taking the wall time for parsing `intrin.h` and `intrin0.h`. `intrin.h` takes ~897ms to parse. `intrin0.h` takes ~1ms to parse. If there is anything required or a different approach is preferred let me know. I would very much like to move this over the finish line so we can use function targets with clang-cl.
Fixes #53520.
Description
Provide
intrin0.h
to be the minimal set of intrinsics that the MSVC STL requires.The
intrin0.h
header matches the latest header provided by MSVC 1939 which does include some extra intrinsics that the MSVC STL does not use.Inside
BuiltinHeaders.def
I kept the header description asintrin.h
. If you want me to change those tointrin0.h
for the moved intrinsics let me know.This should now allow
immintrin.h
to be used with function targets for runtime cpu detection of simd instruction sets without worrying about the compile-time overhead from MSVC STL includingintrin.h
on clang.I still need to figure out how to best update MSVC STL to detect for the presence of
intrin0.h
from clang and to use this header overintrin.h
.Testing
Built clang locally and ran the test suite. I still need to do a pass over the existing unit tests for the ms intrinsics to make sure there aren't any gaps. Wanted to get this PR up for discussion first.
Modified latest MSVC STL from github to point to
intrin0.h
for clang.Wrote some test files that included MSVC STL headers that rely on intrinsics such as
atomic
,bit
andvector
. Built the unit tests against x86, arm, aarch64, and x64.Benchmarks
The following include times are based on the x64 target with the modified headers in this PR.
These timings were done by using
clang-cl.exe -ftime-trace
and taking the wall time for parsingintrin.h
andintrin0.h
.intrin.h
takes ~897ms to parse.intrin0.h
takes ~1ms to parse.If there is anything required or a different approach is preferred let me know. I would very much like to move this over the finish line so we can use function targets with clang-cl.