Improve `search`/`find_end` perf by dropping `memcmp` #4654
The difference in "before" results between It is curious that I would want someone else to confirm the results, and maintainers decision what to do with this. |
A possible way to handle it is to remove 32 and 64 bit optimization/vectorization attempts at all, so that |
And if we are to keep vectorization only for 8-bit and 16-bit elements, we may drop the current implementation and not review/commit it in the first place, if SSE4.2 |
Partial review - looking good so far!
Who's a good search? You are! Yes you!
I pushed changes to address the issues that I found, but I still need to think about whether we should be vectorizing this at all. I'm leaning towards ripping out the existing |
…pred`. `_Equal_rev_pred_unchecked` is called by classic/parallel `search`/`find_end`. `_Equal_rev_pred` is called by ranges `search`/`find_end`. This doesn't affect `equal` etc.
Ok, after looking at the benchmarks, I've taken the radical step of reverting both the vectorization changes and the existing
This compares There's some noise here (e.g. Note that dropping the The In addition to keeping the benchmark from this vectorization attempt, I've also kept the correctness test (even though |
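To make the `memcmp` point concrete, here is a minimal sketch of two naive substring searches that differ only in how each candidate position is compared. The function names `search_with_memcmp`, `search_with_loop`, and `check_variants_agree` are hypothetical and chosen for illustration; this is not the STL's actual implementation, and it says nothing about which variant is faster on a given input. That has to be measured, as the benchmarks in this PR do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not STL code): calls memcmp at every candidate position.
inline const char* search_with_memcmp(const char* hay, std::size_t hay_len,
                                      const char* needle, std::size_t needle_len) {
    if (needle_len == 0) return hay;            // empty needle matches at the start
    if (needle_len > hay_len) return nullptr;   // needle cannot fit
    for (std::size_t i = 0; i + needle_len <= hay_len; ++i) {
        if (std::memcmp(hay + i, needle, needle_len) == 0) return hay + i;
    }
    return nullptr;
}

// Hypothetical helper (not STL code): same search, but each candidate is
// compared with a plain element-wise loop that stops at the first mismatch.
inline const char* search_with_loop(const char* hay, std::size_t hay_len,
                                    const char* needle, std::size_t needle_len) {
    if (needle_len == 0) return hay;
    if (needle_len > hay_len) return nullptr;
    for (std::size_t i = 0; i + needle_len <= hay_len; ++i) {
        std::size_t j = 0;
        while (j < needle_len && hay[i + j] == needle[j]) ++j;
        if (j == needle_len) return hay + i;    // all elements matched
    }
    return nullptr;
}

// Sanity check: both variants must agree on the match position.
inline void check_variants_agree() {
    const char hay[] = "abracadabra";
    const char needle[] = "cad";
    const char* a = search_with_memcmp(hay, 11, needle, 3);
    const char* b = search_with_loop(hay, 11, needle, 3);
    assert(a == b);
    assert(a == hay + 4);
}
```

Both functions return the same result for every input; only the cost profile of the inner comparison differs.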
Agreed.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Thanks for investigating and improving the performance here even if the final result was very different from the initial vision! 🔮 🪄 🚀
It is used only once after microsoft#4654
Resolves #2453
These benchmark results are no longer relevant, as the PR intention has changed
Before:
After:
`strstr` is given for reference in the benchmark; it is not affected by the optimization. It may be impossible to reach `strstr` performance, as it uses `pcmpistri` (and reads beyond the last element, as `pcmpistri` is not very useful otherwise). We can try `pcmpestri` for the 8-bit and 16-bit cases, but it still may not be as efficient as `strstr`. I'd prefer to try this additional optimization in a follow-up PR, though.
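As a rough illustration of why `strstr` works as a reference point, the sketch below runs the same query through `std::strstr` and `std::search`. The wrappers `find_with_strstr` and `find_with_search` are hypothetical names introduced here, not part of the benchmark in this PR. The point is the interface difference: `strstr` finds the end of its input via the null terminator, while `std::search` is given explicit iterator ranges.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: offset of the first match via strstr, or -1 if absent.
inline std::ptrdiff_t find_with_strstr(const char* hay, const char* needle) {
    const char* p = std::strstr(hay, needle);
    return p ? p - hay : -1;
}

// Hypothetical wrapper: same query via std::search over explicit ranges.
inline std::ptrdiff_t find_with_search(const char* hay, const char* needle) {
    const char* hay_end = hay + std::strlen(hay);
    const char* needle_end = needle + std::strlen(needle);
    if (needle == needle_end) return 0;  // match strstr: empty needle is found at offset 0
    const char* p = std::search(hay, hay_end, needle, needle_end);
    return p != hay_end ? p - hay : -1;
}
```

For null-terminated narrow strings the two wrappers return the same offsets, which is what makes a side-by-side timing meaningful.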