Easy multiplatform support #23

Closed
miroslavp opened this issue Mar 18, 2023 · 15 comments

@miroslavp
Contributor

Hey,
I noticed that we can address this quite easily. Currently we use only two x86 SSE2 intrinsics: Sse2.CompareEqual and Sse2.MoveMask. Both can be replaced with platform-independent equivalents: the static method Vector128.Equals and the extension method Vector128.ExtractMostSignificantBits.
At run-time the JIT will replace them with SSE2 or their analogues on ARM, depending on which architecture the app is running on.

So basically code like this

//compare vectors
var comparison = Sse2.CompareEqual(left, right);
//convert to int bitarray
int result = Sse2.MoveMask(comparison);

can be changed to

int result = (int)Vector128.Equals(left, right).ExtractMostSignificantBits();

I tried replacing the SSE2 calls with Vector128.Equals and Vector128.ExtractMostSignificantBits, and the code started working on my brother's MacBook Air with an M1 processor, without x86 emulation.
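To make the replacement concrete, here is a minimal sketch of a portable match-mask helper. It assumes .NET 7 or later (where Vector128.Equals and ExtractMostSignificantBits are available) and a 16-byte group of sbyte values; the class name, method name, and signature are made up for illustration and are not taken from the project.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class PortableSimd
{
    // Hypothetical helper (name and signature are illustrative, not from the project).
    // Returns a mask in which bit i is set when group[i] == target,
    // looking only at the first 16 elements of the span.
    public static int MatchMask(ReadOnlySpan<sbyte> group, sbyte target)
    {
        // Vector128.Create(ReadOnlySpan<T>) throws if fewer than 16 sbytes are supplied.
        Vector128<sbyte> left = Vector128.Create<sbyte>(group);

        // Broadcast the value we are looking for into all 16 lanes.
        Vector128<sbyte> right = Vector128.Create(target);

        // Lane-wise comparison: matching lanes become all ones, others all zeros.
        Vector128<sbyte> comparison = Vector128.Equals(left, right);

        // Collect the top bit of every lane into one integer.
        return (int)comparison.ExtractMostSignificantBits();
    }
}
```

For a group whose elements 1 and 3 equal the target, the call returns a mask with bits 1 and 3 set.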

To make it work, we also need to change this check

if (!Sse2.IsSupported)
{
throw new NotSupportedException("Simd SSe2 is not supported");
}

to

 if (!Vector128.IsHardwareAccelerated) 
 { 
     throw new NotSupportedException("Simd is not supported"); 
 } 
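The check above rejects hardware without SIMD support outright. One possible alternative, sketched here on the assumption that a slower scalar path would be acceptable, is to branch on Vector128.IsHardwareAccelerated instead of throwing. The names MaskDispatch, Compute, and ScalarMask are hypothetical and do not come from the project.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class MaskDispatch
{
    // Hypothetical entry point: picks the vectorized or the scalar path at run time.
    public static int Compute(ReadOnlySpan<sbyte> group, sbyte target)
    {
        if (group.Length < Vector128<sbyte>.Count)
        {
            throw new ArgumentException("Expected at least 16 elements.", nameof(group));
        }

        if (Vector128.IsHardwareAccelerated)
        {
            // Vectorized path: compare all 16 lanes at once, then pack the result bits.
            Vector128<sbyte> comparison =
                Vector128.Equals(Vector128.Create<sbyte>(group), Vector128.Create(target));
            return (int)comparison.ExtractMostSignificantBits();
        }

        return ScalarMask(group, target);
    }

    // Scalar path: builds the same mask one element at a time.
    public static int ScalarMask(ReadOnlySpan<sbyte> group, sbyte target)
    {
        int mask = 0;
        for (int i = 0; i < Vector128<sbyte>.Count; i++)
        {
            if (group[i] == target)
            {
                mask |= 1 << i;
            }
        }
        return mask;
    }
}
```

Whether a fallback is worth having depends on the project; throwing, as in the snippet above, keeps the code simpler and makes the missing acceleration obvious.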

Isn't that cool or what?

@Wsm2110 Wsm2110 closed this as completed Mar 18, 2023
@Wsm2110
Owner

Wsm2110 commented Mar 18, 2023

"At run-time the JIT will replace them with SSE2 or their analogues on ARM, depending on which architecture the app is running on."

Would love to see where you read this :)

The thing is i don't have a macbook or any arm related hardware. So it's really hard to test.

@Wsm2110 Wsm2110 reopened this Mar 18, 2023
@miroslavp
Contributor Author

No need to read it; I've tried it myself on sharplab.io. Click the link below to see for yourself. The C# code is on the left and the JIT-compiled code is on the right. You can see that the JIT compiler generates identical code for both versions. By the way, there is a nice VS plugin called Disasmo. You can use it on your methods to see the generated ASM code if you don't want to use sharplab.io. The downside is that it only works for console apps.

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4a9Jq049+ggJI8ZEMfKhKV6zY2btufASyMYovLrmXXatnQ/3OPG4eXiwAGgAcSBrUmgDMTKQMQgwA3jQMmVlZYm4K2BgwDABqMGAY0OSkEQA8uMAAnoUAfAyysgCipB0AjhzYkgAUpeWV1XWNLQwKZeRoJWUVUFW19U0wrTNgpACUGdnZ6dQHJ5nEAOxtuDCkwqJi2LC9/UNbc9NluzGnmQC++5kASdcrx8oUFqNluM1lMAAqSAoAM2g+CMABMYGIYFwMTxngNhosxqtJhsPmB3iMlisJutNp89scfkcfgcLhDqdUWPjJLhBm95lsvkCDv8mQcRdkQWCiu4MG1OqQALIQGZKvAAa0JkJpMLJW0ZrJZrKy7Nk11uKrVmv5ZR23x+YuykpyeQKsp4DHhSJR6Mx2NxGCtMHVuC1VOJtKmBpdmWNJqYl0Gcp2W25CFc2HKKtwGFkvAA5lxeIjlNgeAAhXgYPn22MMX4MGi/IA

@miroslavp
Contributor Author

miroslavp commented Mar 18, 2023

By the way, the M1 result for the Get benchmark is about the same as yours: 7.5 ms (on a MacBook Air, without a fan), which is pretty impressive.

Updated:
On my desktop machine with an AMD Ryzen 5 3600 and a gigantic fan it is 10 ms ... which is a shame.

ARM CPUs are getting stronger and stronger.

@miroslavp
Contributor Author

miroslavp commented Mar 18, 2023

If you really want to see the code where the JIT emits the instructions, take a look at these two files:
for x86/x64
https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicxarch.cpp
for arm64
https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsicarm64.cpp

Search for NI_Vector128_Equals and NI_Vector128_ExtractMostSignificantBits

@miroslavp
Contributor Author

These are the default implementations of the methods.
https://github.com/dotnet/runtime/blob/7a0b0e138750b896b4ea0a8ac8fca77e266dcd56/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs#L1423-L1438
https://github.com/dotnet/runtime/blob/7a0b0e138750b896b4ea0a8ac8fca77e266dcd56/src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs#L1466-L1480
However, pay attention to the [Intrinsic] attribute they have. It tells the JIT compiler that, if the hardware supports SIMD, it should replace the implementation with the corresponding SIMD instructions: Sse2.CompareEqual (vpcmpeqb) and Sse2.MoveMask (vpmovmskb) on x86.
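As a rough mental model of what those two methods compute, the sketch below spells out the lane-by-lane semantics in scalar code for 16 sbyte lanes. It is only an illustration: the class and method names are invented, and it is not the code from the linked default implementations.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class IntrinsicSemantics
{
    // Scalar model of Vector128.Equals for sbyte lanes:
    // a matching lane becomes all ones (-1), any other lane becomes 0.
    public static sbyte[] EqualsScalar(sbyte[] left, sbyte[] right)
    {
        var result = new sbyte[Vector128<sbyte>.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = left[i] == right[i] ? (sbyte)-1 : (sbyte)0;
        }
        return result;
    }

    // Scalar model of ExtractMostSignificantBits for sbyte lanes:
    // bit i of the result is the sign (top) bit of lane i.
    public static uint MostSignificantBitsScalar(sbyte[] lanes)
    {
        uint mask = 0;
        for (int i = 0; i < Vector128<sbyte>.Count; i++)
        {
            if (lanes[i] < 0)
            {
                mask |= 1u << i;
            }
        }
        return mask;
    }
}
```

Feeding the same inputs to Vector128.Equals and ExtractMostSignificantBits should produce the same lanes and the same mask as these loops, whichever instructions the JIT selects.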

@miroslavp
Contributor Author

miroslavp commented Mar 18, 2023

FYI, I just had my brother run the DenseMapSIMD tests, and all 26 of them passed on his MacBook Air M1. This is after replacing the Sse2 intrinsics with the platform-independent ones.

However, 6 FastMap (non-SIMD) tests failed both on his M1 and on my AMD Ryzen. Please take a look.

@Wsm2110
Owner

Wsm2110 commented Mar 18, 2023

I'll take a look, mate :)

@Wsm2110
Owner

Wsm2110 commented Mar 18, 2023

FYI, just made my brother run the DenseMapSIMD tests, and all 26 of them have passed successfully on his Macbook Air M1. This is after replacing the Sse2 intrinsics with the platform independent ones.

However 6 DenseMap (non-SIMD) tests have failed both on his M1 and on my AMD Ryzen. Please take a look

Not really sure why they are failing; they seem to work on my Intel CPU.

@miroslavp
Contributor Author

miroslavp commented Mar 18, 2023

Sorry about that ... the failing tests are for FastMap
This is one of them

        [TestMethod]
        public void AssertCustomEnumerators()
        {
            var map = new FastMap<uint, uint>();
            map.Emplace(202, 202); //13
            map.Emplace(131, 131); //15
            map.Emplace(597, 597); //15
            map.Emplace(681, 681); //14
            map.Emplace(893, 893); //14
            map.Emplace(516, 516); //14

            var count = 0;
            var count2 = 0;
            foreach (uint unused in map.Keys)
            {
                count++;
            }

            Assert.IsTrue(count == 6);

            foreach (uint unused in map.Values)
            {
                count2++;
            }

            Assert.IsTrue(count2 == 6);
        }

It fails on Assert.IsTrue(count == 6);

@Wsm2110
Owner

Wsm2110 commented Mar 18, 2023

Should probably add a deprecated attribute. I like my Robin Hood hashmap implementation; that's the only reason it's still in here.

@miroslavp
Contributor Author

Did you manage to reproduce it on your machine?

@Wsm2110
Owner

Wsm2110 commented Mar 18, 2023

yea thnx

@miroslavp
Contributor Author

miroslavp commented Mar 18, 2023

Here's a sample run of GetBenchmark on the M1 using Visual Studio for Mac. The FastMap results are debatable, considering that 6 of its tests are failing.

BenchmarkDotNet=v0.13.1, OS=macOS 13.2.1 (22D68) [Darwin 22.3.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
.NET SDK=7.0.202
[Host] : .NET 7.0.4 (7.0.423.11508), Arm64 RyuJIT [AttachedDebugger]
Job-YUSQNG : .NET 7.0.4 (7.0.423.11508), Arm64 RyuJIT

InvocationCount=128 IterationCount=50 LaunchCount=1
RunStrategy=Monitoring WarmupCount=10

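|         Method |      Mean |     Error |    StdDev |
|--------------- |----------:|----------:|----------:|
|   DenseMapSIMD |  7.940 ms | 0.0127 ms | 0.0257 ms |
|       DenseMap | 11.415 ms | 0.0321 ms | 0.0648 ms |
|        FastMap |  7.046 ms | 0.0342 ms | 0.0691 ms |
| SlimDictionary | 14.283 ms | 0.0241 ms | 0.0486 ms |
|     Dictionary | 17.728 ms | 0.0582 ms | 0.1176 ms |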

@Wsm2110
Owner

Wsm2110 commented Mar 18, 2023

Actually was looking for something like this:

dotnet/runtime#49397
dotnet/runtime#63331

Anyhow, if you like you can submit a pull request :)

@miroslavp
Contributor Author

miroslavp commented Mar 19, 2023

Done
pull request

@Wsm2110 Wsm2110 closed this as completed Mar 19, 2023