Implement remaining __clz*i2 intrinsics #639

tea · 2024-07-06T12:58:18Z

This one was a bit interesting to figure out.

Firstly, the existing implementation was wrong, as it was working with whatever native word is, whereas __clzsi2 should be 32-bit specifically (SI is 32-bit exactly). compiler-rt implementation is always 32-bit. Interestingly, existing CI tests didn't catch this because testcrate only tests against compiler-rt when compiled with --no-default-features -F c (because testcrate defaults mangled-names). When tested in this way, testcreate surely enough failed __clzsi2 test on 64 bit target.

Even more interestingly, the compiler-rt implementation is also wrong! It is 32-bit implementation but it uses int type for its argument, so on 16-bit platform it couldn't get 32-bit value. I guess compiler-rt is not used much for 16 bit?

Now, libgcc implementation is the correct one. It always uses 16 bit types for HI functions, 32 bit types for SI, 64 bit for DI and 128 bit for TI. But it has its own twist: it only implements two functions out of all possible - one for native word size and one for double that. So on 16 bit it would be HI and SI, on 32 bit SI/DI and on 64 bit DI/TI. Which is why buggy existing implementation didn't caused any dramas: it would never be called as the compilers know it shouldn't even exist (indeed, even trying hard to call it via __builtin_clz(uint32_t) results in something akin to _clzdi2(x as u64) - 32).

So I did the same thing as with bswap - simply relegate work to the LLVM and let it spit out proper builtin implementation. i686 and x86-64 versions look great using bsr. Riscv64... cannot say if it became better or worse than it was; its a different algorithm, though both are branchless and logarithmic in complexity. One would need to benchmark it like crazy to decide which is better :)

I like new version better because of two things:

it could get better "for free" as llvm improves (definitely more eyes on llvm builtins than on compiler-builtins repo).
it could do much better if allowed (e.g. when there is a native cpu instruction to do it; Riscv64gc+Zbb functions const of two instructions).

With this implementation tests are not quite as useful; they can only work as a documentation of sorts since they check the same implementation against itself. Well, they also can check against compiler-rt so there's that.

Amanieu · 2024-07-06T13:22:36Z

Unfortunately that won't work here because LLVM does in fact use this builtin, for example on the armv4t-unknown-linux-gnueabi target.

it could do much better if allowed (e.g. when there is a native cpu instruction to do it; Riscv64gc+Zbb functions const of two instructions).

In that case the implementation in compiler-builtins won't be called anyways since LLVM/GCC will just use the Zbb instruction directly.

tgross35 · 2024-07-06T19:51:47Z

Possible generic version:

fn leading_zeros<I: Int>(mut x: I) -> usize {
    let mut z = I::BITS;
    let mut t: I;
    let mut shift = I::BITS / 2;

    while shift > 1 {
        t = x >> shift;
        if t != I::ZERO {
            z -= shift;
            x = t;
        }

        shift /= 2;
    }

    t = x >> 1;
    let res = if t != I::ZERO { z - 2 } else { z - x.cast() };
    res as usize
}

LLVM correctly unrolls the loop to get the same result as the existing usize-only version. Which means it doesn't make use of e.g. bsr on x86, but I think we'd have to handwrite those if we want them emitted reliably (and not much purpose since LLVM knows that lowering on those platforms).

tgross35 · 2024-07-06T20:00:18Z

testcrate/tests/misc.rs

+    #[cfg(any(target_pointer_width = "32", target_pointer_width = "64"))]
+    {
+        use compiler_builtins::int::leading_zeros::__clzdi2;
+        fuzz(N, |x: u64| {
+            if x == 0 {
+                return; // undefined value for an intrinsic
+            }
+            let lz = x.leading_zeros() as usize;
+            let lz0 = __clzdi2(x);
+            if lz0 != lz {
+                panic!("__clzdi2({}): std: {}, builtins: {}", x, lz, lz0);
+            }
+        });
+    }
+    #[cfg(target_pointer_width = "64")]
+    {
+        use compiler_builtins::int::leading_zeros::__clzti2;
+        fuzz(N, |x: u128| {
+            if x == 0 {
+                return; // undefined value for an intrinsic
+            }
+            let lz = x.leading_zeros() as usize;
+            let lz0 = __clzti2(x);
+            if lz0 != lz {
+                panic!("__clzti2({}): std: {}, builtins: {}", x, lz, lz0);
+            }
+        });
+    }


What exactly is happening on 16 & 32-bit platforms for the gating - is LLVM lowering .leading_zeros() to this missing symbol? From CI it looks like MSVC is missing ___clzti2 on arm and 32-bit so you'll have to update the gates.

If this logic is a bit more complex than just pointer sizes, it might be better to add a NoSysU128Clz option to

compiler-builtins/testcrate/build.rs

Lines 5 to 10 in 06db2de

enum Feature {

NoSysF128,

NoSysF128IntConvert,

NoSysF16,

NoSysF16F128Convert,

}

, update that file, and then gate on that (easier if the gates need to be reused).

I gated these, and the intrinsics themselves, the same way the compilers do (well, libgcc does so they have to follow along?). Makes no sense to test for intrinsics which aren't ever used.

As for 128 bit on 64-bit msvc - I guess they don't have 128 bit int somehow, according to compiler-rt. I just took "optimized c" attribute off the implementation instead of trying to gate it. Single branch isn't that bad.

Ah, I missed that the gates were on the intrinsics as well. Could you please remove them? The reasons that libgcc doesn't supply a specific intrinsic on specific platforms may vary (no integers of that type, GCC itself doesn't use them, platform bugs, untested, just not done yet etc). But if we don't have any specific reason not to make them available and there’s a chance that the compilers make use of them, we should just provide them.

Also a lot of the purpose of this library is to fill in for the shortcomings of libgcc, so it doesn't make any sense to copy their limitations :).

well, libgcc does so they have to follow along?

Unfortunately this isn't very accurate, LLVM (and GCC) both pretty frequently emit symbols that aren't available on lesser-known platforms. Rust hits this more often because we make less common types available on all platforms (e.g. u128, f16, f128) where there might not be a C equivalent available.

Not like this specific symbol is likely to be a problem anywhere. But we should avoid unnecessary cfg (unless it's a arch-specific symbol) because it just means somebody might have to chase them down in the future and wonder why it's disabled.

Removed all the gate. Let the flood begin!

As for libgcc and their builtins, the reason there is simple: that's how they do it. Basically they have two and only two functions __clzSI2 and __clzDI2 (note uppercase letters). Despite their names, they operate on native word size and double that respectively. These function names (and argument types) get macroed into whatever proper name and argument type is for the platform.

Same is true for many other builtins there.

Thanks for the change. I think you might be able to remove some C sources that we are building (mentioned below) but otherwise the implementation looks good to me now.

src/int/leading_zeros.rs

tgross35 · 2024-07-16T08:34:26Z

src/int/leading_zeros.rs

+    const { assert!(T::BITS <= 64) };
+    if T::BITS >= 64 {


Optional nit: the second condition could become == 64 since > 64 is rejected right above

I'd rather leave this as is, as it is more future-proof (in case any 128 bit implementations would actually pop up, and whoever applies 128 bit part happens to be both incredibly careless and doesn't test anything...)

More seriously though, result of the expression is a compile-time constant so it doesn't matter at all which operator to use here, and >= is more consistent with 32 bit part down below.

tgross35 · 2024-07-16T08:43:35Z

build.rs

@@ -382,7 +383,6 @@ mod c {
            sources.extend(&[
                ("__absvti2", "absvti2.c"),
                ("__addvti3", "addvti3.c"),
-                ("__clzti2", "clzti2.c"),


I think you could also remove __clzdi2 and __clzsi2 on lines 349-350 now that these aren't broken. Possibly

compiler-builtins/build.rs

Line 577 in 2bf0425

sources.extend(&[("__clzdi2", "clzdi2.c"), ("__clzsi2", "clzsi2.c")])

too if our implementation works there.

I wasn't sure what's the policy on these. Is it "only apply C replacements for broken/missing functions"?

That is accurate, part of the goal of this crate is to be able to target some platforms without needing a C toolchain. I think that usually once something gets ported over and tested, it can be removed from the C sources lists.

I guess maybe we only used the C implementations on thumb because ours was broken?

Done.

Aarch64 CI is down because of new rustc I guess (0/1 asm labels are now off limits instead of just discouraged). This particular thing about numeric labels amazes me so much... This is due to llvm bug, but instead of dealing with that people decided to document it in rust book (so I guess it is a feature now) and make it a compile-time error.

I don't think it's an easy llvm bug to fix unfortunately. That lint should almost certainly be x86-only though, where LLVM was already throwing an error and the lint just makes it a more accurate error. I brought it up.

You can add

#![allow(binary_asm_labels)] // aarch64 has no problems with binary labels

to the top of src/aarch64_linux.rs just to get that to pass.

tgross35

LGTM now, thanks. @Amanieu still needs to do the final review. The CI error obviously isn't your problem.

tgross35 · 2024-07-18T21:30:53Z

Fix coming for the lint btw, probably will be on tomorrow's nightly. Just rebase once it's available rust-lang/rust#127935

tgross35 reviewed Jul 6, 2024

View reviewed changes

tea force-pushed the clz branch from b4ab209 to e21b939 Compare July 6, 2024 20:20

tgross35 reviewed Jul 6, 2024

View reviewed changes

src/int/leading_zeros.rs Outdated Show resolved Hide resolved

tea force-pushed the clz branch from e21b939 to 2bf0425 Compare July 8, 2024 07:04

tgross35 reviewed Jul 16, 2024

View reviewed changes

tgross35 approved these changes Jul 16, 2024

View reviewed changes

tea added 2 commits July 24, 2024 12:50

Implement remaining __clz*i2 intrinsics

3041451

Never use C version of clz builtins

29d3466

Amanieu force-pushed the clz branch from bb60456 to 29d3466 Compare July 24, 2024 11:50

Amanieu enabled auto-merge July 24, 2024 11:50

Amanieu merged commit b78d0f1 into rust-lang:master Jul 24, 2024
24 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement remaining __clz*i2 intrinsics #639

Implement remaining __clz*i2 intrinsics #639

tea commented Jul 6, 2024

Amanieu commented Jul 6, 2024

tgross35 commented Jul 6, 2024 •

edited

Loading

tgross35 Jul 6, 2024 •

edited

Loading

tea Jul 6, 2024

tgross35 Jul 6, 2024 •

edited

Loading

tea Jul 8, 2024

tea Jul 15, 2024

tgross35 Jul 16, 2024

tgross35 Jul 16, 2024

tea Jul 16, 2024

tgross35 Jul 16, 2024

tea Jul 16, 2024

tgross35 Jul 16, 2024 •

edited

Loading

tea Jul 16, 2024

tgross35 Jul 16, 2024 •

edited

Loading

tgross35 left a comment •

edited

Loading

tgross35 commented Jul 18, 2024

	enum Feature {
	NoSysF128,
	NoSysF128IntConvert,
	NoSysF16,
	NoSysF16F128Convert,
	}

Implement remaining __clz*i2 intrinsics #639

Implement remaining __clz*i2 intrinsics #639

Conversation

tea commented Jul 6, 2024

Amanieu commented Jul 6, 2024

tgross35 commented Jul 6, 2024 • edited Loading

tgross35 Jul 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tgross35 Jul 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tgross35 Jul 16, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tgross35 Jul 16, 2024 • edited Loading

Choose a reason for hiding this comment

tgross35 left a comment • edited Loading

Choose a reason for hiding this comment

tgross35 commented Jul 18, 2024

tgross35 commented Jul 6, 2024 •

edited

Loading

tgross35 Jul 6, 2024 •

edited

Loading

tgross35 Jul 6, 2024 •

edited

Loading

tgross35 Jul 16, 2024 •

edited

Loading

tgross35 Jul 16, 2024 •

edited

Loading

tgross35 left a comment •

edited

Loading