-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid usage of __builtin_popcountll when -simdutf is unset #453
Conversation
Thanks @wz1000. First of all, before we discuss the chosen approach, I'd like to see a reproducible evidence of the issue as an additional CI job. This would help us to validate the solution and prevent future regressions. |
@Bodigrim I've modified the See https://github.com/haskell/text/runs/7279337599 (which is a run from #454) for an example of the failure without this patch. |
Lines 197 to 198 in 971051b
is there for a reason indeed. If you remove it, the build breaks. But this does not motivate me to avoid __builtin_popcountll , because extra-libraries: gcc_s is a much simpler workaround. What I've been asking you was to demonstrate that the existing setup is not enough and there exists a reproducible configuration with a build failure.
|
I've improved the implementation to only use 2 bit shifts and a multiplication to compute the popcount and added a CI job that tests it on alpine. |
72e91da
to
4820609
Compare
Unfortunately the static alpine GHC binaries are not usable because of https://gitlab.haskell.org/ghc/ghc/-/issues/21844 I'm stumped by this, not sure how to add a non windows CI job that demonstrates the problem. |
a777146
to
f6c3cd4
Compare
I guess I can try to run it in an alpine container and it should work. I'll try this tomorrow. |
As I suggested in #450 (comment), try to reproduce |
I've fixed the alpine job and also attempted to add a windows job to perform the check you added in bytestring, but that job seems to fail in a different way to the intended failure mode. I'm not sure about this. See https://github.com/haskell/text/runs/7337239414?check_suite_focus=true |
The simplest setup I can come up is https://github.com/Bodigrim/text/tree/purge-path. After the first commit the build succeeds, but as soon as we purge PATH, if fails (the error message is just error code 1, that's fine). If this setup is enough, I'd rather avoid Alpine job with unreleased GHC version. Now I don't quite understand what does it have to do with |
The failure in #454 is due not to This should be fixed in the |
#454 also neglects to disable the |
Like Ben said, there are two mostly independent issues here:
#450 is only triggered if we don't link against The alpine CI job I added demonstrates issue #450. The windows CI job added by this PR demonstrates #456. |
@wz1000 could you please check performance impact of your change? |
and avoid linking against gcc/gcc_s on all platforms. This works around https://gitlab.haskell.org/ghc/ghc/-/issues/21787 and https://gitlab.haskell.org/ghc/ghc/-/issues/19900 which cause problems when GHC's RTS linker tries to load `text`, which occurs if you use a statically linked GHC to compile a file with a TH splice that depends on `text`. Fixes haskell#450 Please enter the commit message for your changes. Lines starting
Here are the results of running Before: https://gist.github.com/wz1000/7604ab1663d8612bbd8485b13ae22f86#file-before-csv |
Comparisions using
|
Tried to improve the implementation of Baseline is
|
I've successfully validated the implementation of #include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
static inline const size_t popcount16(uint16_t x) {
// Taken from https://en.wikipedia.org/wiki/Hamming_weight
const uint16_t m1 = 0x5555; //binary: 0101...
const uint16_t m2 = 0x3333; //binary: 00110011..
const uint16_t m4 = 0x0f0f; //binary: 4 zeros, 4 ones ...
x -= (x >> 1) & m1; //put count of each 2 bits into those 2 bits
x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits
x = (x + (x >> 4)) & m4; //put count of each 8 bits into those 8 bits
return (x >> 8) + (x & 0x00FF);
}
int main() {
for(int i = 0; i <= 0xFFFF; i++) {
size_t a,b;
a = __builtin_popcount((uint16_t) i);
b = popcount16(i);
if (a != b) {
printf("No match %d %d \n", a, b);
exit(1);
}
}
printf("All values validated\n");
} |
It is very surprising that software emulation of @wz1000 I assume you want to take care of all |
The problem is only triggered if GCC ends up using its own software implementation of popcount, rather than emitting the actual instruction. It is a known issue that the GCC emulation is suboptimal. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041
I'm reasonably confident all the other usages of the symbol are OK because they are guarded by sufficient feature flags to guarantee that GCC emits the popcount instruction for those usages. The RTS linker bug is only triggered if GCC decides to use its software emulation for popcount. |
tests/THLinkTest.hs
Outdated
@@ -0,0 +1,12 @@ | |||
-- Simple test script for #450 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please inline this into simdutf-flag-alpine.yml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done this.
Thanks @wz1000! |
@Bodigrim Could we have a release for inclusion into 9.4? |
The RC will be out this week, but we can use the master branch for that without waiting for a release. The final release will be made in about 2 weeks (early August) and we do need a release by then, ideally before. |
I will test the master branch one more time to be sure, but I think all the patches we need have been merged. |
@wz1000 @bgamari Released as |
Also avoid linking against gcc/gcc_s on all platforms.
This works around https://gitlab.haskell.org/ghc/ghc/-/issues/21787
and https://gitlab.haskell.org/ghc/ghc/-/issues/19900 which cause
problems when GHC's RTS linker tries to load
text
, which occurs ifyou use a statically linked GHC to compile a file with a TH splice that
depends on
text
.Since we don't require SSE4.2 to build
text -simdutf
, this shouldn'tbe much of a pessimisation.
Fixes #450
This is meant to be a temporary workaround for https://gitlab.haskell.org/ghc/ghc/-/issues/21787 while we work on a robust method for properly exposing all GCC symbols from the RTS linker.
However, older versions of GHC (particularly the 9.0 series and earlier) which won't be patched still need a workaround so that
text-2.0
is usable under all configurations.It is also problematic to include
extra-libraries: gcc_s
when compiling with clang.