Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What to do about big endian architectures' support? #348

Open
Oppen opened this issue Jan 13, 2022 · 6 comments
Open

What to do about big endian architectures' support? #348

Oppen opened this issue Jan 13, 2022 · 6 comments

Comments

@Oppen
Copy link
Contributor

Oppen commented Jan 13, 2022

During evaluation of #198 the issue of big endian code not being tested by the CI pipelines was raised.
This means, despite code being in place to support it, the architectures should be considered unsupported because they're untested.

There are a few options to deal with this:

  • Properly considering those archs unsupported, in which case the reasonable way to deal with it is to remove the code and just assume little endian;
  • Add pipelines to run on at least one such architecture.

The Go toolchain supports a few BE archs, in particular PowerPC64 with GOARCH=ppc64, and qemu can be used to run commands on the emulated architecture, and I successfully cross-compiled and run the tests for ppc64 that way locally:

$ GOARCH=ppc64 go test -c -o ppc64_test
$ qemu-ppc64 ./ppc64_test 
==roaring==
{1,2,3,4,5,100,1000}
{3,4,1000}
{}
Cardinality:  7
Contains 3?  true
1
3
4
5
1000

Wrote  22  bytes
I wrote the content to a byte stream and read it back.
size before run optimize: 1810 bytes, and after: 38 bytes.
2022/01/13 19:39:53 rtest N= 15
2022/01/13 19:39:53 rtest N= 1024
2022/01/13 19:39:54 rtest N= 4096
2022/01/13 19:39:54 rtest N= 65536
2022/01/13 19:39:55 rtest N= 1048576
2022/01/13 19:40:04 rtest N= 15
2022/01/13 19:40:04 rtest N= 100
2022/01/13 19:40:04 rtest N= 512
2022/01/13 19:40:04 rtest N= 1023
2022/01/13 19:40:04 rtest N= 1025
2022/01/13 19:40:04 rtest N= 4095
2022/01/13 19:40:04 rtest N= 4096
2022/01/13 19:40:04 rtest N= 4097
2022/01/13 19:40:04 rtest N= 65536
2022/01/13 19:40:05 rtest N= 1048576
2022/01/13 19:41:26 rtest N= 15
2022/01/13 19:41:26 rtest N= 1024
2022/01/13 19:41:26 rtest N= 4096
2022/01/13 19:41:26 rtest N= 65536
2022/01/13 19:41:26 rtest N= 1048576
2022/01/13 19:41:35 rtest N= 15
2022/01/13 19:41:35 rtest N= 100
2022/01/13 19:41:35 rtest N= 512
2022/01/13 19:41:35 rtest N= 1023
2022/01/13 19:41:35 rtest N= 1025
2022/01/13 19:41:35 rtest N= 4095
2022/01/13 19:41:35 rtest N= 4096
2022/01/13 19:41:35 rtest N= 4097
2022/01/13 19:41:35 rtest N= 65536
2022/01/13 19:41:36 rtest N= 1048576
200100
PASS

In case we choose to support those by adding pipelines to check them, we should decide whether we want to make them first or second class citizens with regards to performance. Most mainstream architectures being little endian, it may make sense to trade off performance in BE if this improves performance for the LE case.
This question is practical rather than hypothetical, as it was suggested for hashing to cast slices in a way that would mean extra allocations for BE, in exchange for the ability to pass bigger blocks to the hasher.

For a start, it would be interesting to know whether there are currently any users of BE roaring.

@lemire
Copy link
Member

lemire commented Jan 13, 2022

Most mainstream architectures being little endian, it may make sense to trade off performance in BE if this improves performance for the LE case.

I think that this is absolutely true. BE is a fringe requirement.

I bet that we have no BE user. It might be fine to have a fallback (slow) approach.

@Oppen
Copy link
Contributor Author

Oppen commented Jan 14, 2022

I think it's worth documenting in the README that BE is only supported as a fallback and will suffer degraded performance for some operations. Just to avoid surprises and spurious issues in the unlikely case someone chooses to user roaring there.

@Oppen
Copy link
Contributor Author

Oppen commented Jan 14, 2022

I'm working on a PR to try and test on ppc64. I could add a line in that same PR about it.

@jacksonrnewhouse
Copy link

I believe https://github.com/RoaringBitmap/roaring/blob/master/serialization_generic.go does not assume endianness, hence the "generic" monicker. I also believe it to be correct, as that was what was being used on arm64 machines until we updated the build flags. Now, those arm64 machines are little endian, so there might be something else that breaks on BE, but the intent to support both is there, at least. The performance wasn't terrible using serialization_generic, but it did do many more allocations, as it couldn't do unsafe slice manipulations.

@Oppen
Copy link
Contributor Author

Oppen commented Jan 16, 2022

@jacksonrnewhouse that's not the code that runs for little endian. Despite saying it's generic, because it technically is, for little endian this is what gets built: https://github.com/RoaringBitmap/roaring/blob/master/serialization_littleendian.go

This performs better because it simply does unsafe casting of the slices, but is only valid on LE. It doesn't allocate or copy or call generic functions (those get inlined in practice tho, but all the rest is still overhead).

@Oppen
Copy link
Contributor Author

Oppen commented Jan 16, 2022

Regarding whether it is correct: we still need to test it, and we still prefer to update the build flags so LE use the LE-optimized path. We'll inevitably lag behind for some time when new archs come out, but that still is gonna happen and thus we'll still be skipping the generic code from tests.

lemire added a commit that referenced this issue Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants