Vectorize find_end
#4943
I can look into improving the "evil" case results by detecting it and switching strategy.
Resolved adjacent add/edit conflict in stl/src/vector_algorithms.cpp.
Thanks! 😸 I pushed some fixes, please double-check. Final perf results on my 5950X look good. The regressions for the highly pathological cases aren't too bad, and I don't think we need to add extra implementation complexity to avoid them.
I observe that you dropped parens in
Nah, I just wasn't consistently complaining. 😹 When an expression in a conditional operator is especially long then I don't find extra parens to be as objectionable.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Thanks for finding a way to make this faster! 🔍 🔎 😹
📜 Overview
Vectorize std::search of 1 and 2 bytes elements with pcmpestri #4745 in the opposite direction.
The pcmpestri instruction with a variable step was considered, but it looked like multiple-indices pcmpestrm with a fixed step would work better for this direction, because the former would return more partial false matches than in the forward direction, as the match with the higher index is more likely to be partial.
The pcmpestrm approach could have been made more efficient if DevCom-10689455 were fixed, but honestly I don't hope for that much.
__std_search_impl, the place where _Match_1st_16 is introduced. I basically implemented the same thing from scratch, and the new attempt gave a clearer control flow. They are not shared though, as the _First1 > _Stop1 check is needed for variable step, but not needed for fixed step, so there's still some variation.
replace with smaller elements.
search + closer to start and find_end + closer to end. It is useful to test the case where startup cost contributes more.
👿 The Evil case
Unlike many other algorithms, the run time of search algorithms highly depends on the data. It is possible to craft data which takes way more time than typical data of the same length.
The worst case could be a haystack of a single repeating value, and a needle with the same repeating value and another value at the end. This is bad for plain search, although Boyer-Moore or similar algorithms are expected to handle that better.
Results for find_end are much worse for that case.
Results for search are also worse after vectorization for such a case, but not as much as for find_end.
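To make the worst case concrete, here is a minimal scalar sketch that counts element comparisons for a naive backward search. It is not the vectorized code from this PR; the names naive_find_end and evil_comparisons are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive backward search: tries every candidate position from the last one
// down to 0 and counts element comparisons. Returns the position of the
// last match, or std::string::npos if there is none.
// (Illustrative helper, not part of the STL implementation.)
std::size_t naive_find_end(const std::string& haystack, const std::string& needle,
                           std::size_t& comparisons) {
    assert(!needle.empty());
    comparisons = 0;
    if (needle.size() > haystack.size()) {
        return std::string::npos;
    }
    for (std::size_t pos = haystack.size() - needle.size() + 1; pos-- > 0;) {
        std::size_t i = 0;
        while (i < needle.size()) {
            ++comparisons;
            if (haystack[pos + i] != needle[i]) {
                break;
            }
            ++i;
        }
        if (i == needle.size()) {
            return pos;
        }
    }
    return std::string::npos;
}

// Builds the evil input described above: a haystack of one repeated value
// and a needle of the same value whose last element differs.
std::size_t evil_comparisons(std::size_t haystack_len, std::size_t needle_len) {
    assert(needle_len > 0);
    const std::string haystack(haystack_len, 'a');
    std::string needle(needle_len, 'a');
    needle.back() = 'b';
    std::size_t comparisons = 0;
    naive_find_end(haystack, needle, comparisons);
    return comparisons;
}
```

With a haystack of length H and a needle of length N, every one of the H - N + 1 candidate positions matches N - 1 elements before failing on the last one, so the sketch performs (H - N + 1) * N comparisons, for example 90100 for H = 1000 and N = 100.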
It looks possible to detect the evil case (by the number of matched beginnings per some amount of data) and switch to another algorithm, or to modify the whole algorithm to be less affected. It can be done in this or a subsequent PR.
I'm not sure how important the Evil case is.
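One way the detection idea might look, as a scalar sketch only: count candidates whose first element matched but whose full comparison failed, and flag the input when a window of positions contains too many of them. The function name looks_evil and both constants are assumptions made for this illustration, not values or code from the PR, and the sketch does not say which algorithm to switch to.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tuning constants, not taken from the PR.
constexpr std::size_t window = 64;           // positions examined per check
constexpr std::size_t max_false_starts = 48; // tolerated failed candidates per window

// Scans candidate positions from the end, as find_end would. Returns true
// if some window of positions contains more than max_false_starts
// candidates whose first element matched but whose full comparison failed.
// Returns false as soon as a real match is found, since the search would
// stop there anyway.
bool looks_evil(const std::string& haystack, const std::string& needle) {
    if (needle.empty() || needle.size() > haystack.size()) {
        return false;
    }
    std::size_t examined = 0;
    std::size_t false_starts = 0;
    for (std::size_t pos = haystack.size() - needle.size() + 1; pos-- > 0;) {
        if (haystack[pos] == needle[0]) {
            if (haystack.compare(pos, needle.size(), needle) == 0) {
                return false; // real match, nothing pathological happened
            }
            ++false_starts;
        }
        if (++examined == window) {
            if (false_starts > max_false_starts) {
                return true; // too many matched beginnings in this window
            }
            examined = 0;
            false_starts = 0;
        }
    }
    return false;
}
```

A vectorized version would presumably derive the same counter from the match masks it already computes rather than from a separate pass; that part is left out here.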
🏁 Benchmark results
Legend for the benchmark parameter:
STL/benchmarks/src/search.cpp
Lines 71 to 77 in 60825c4
Re-testing of the search PR against then-main, as there are more cases in the benchmark: