Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vectorize find_first_of for long needle 🪡 #4557

Merged
merged 4 commits into from
Apr 9, 2024

Conversation

AlexGuteniev
Copy link
Contributor

@AlexGuteniev AlexGuteniev commented Apr 2, 2024

In benchmark results only the last line changed:

before

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
bm<uint8_t, 2, 3>            2.70 ns         2.35 ns    597333333
bm<uint16_t, 2, 3>           2.71 ns         2.13 ns    689230769
bm<uint8_t, 7, 4>            14.1 ns         10.9 ns    112000000
bm<uint16_t, 7, 4>           12.4 ns         10.6 ns    100000000
bm<uint8_t, 9, 3>            9.77 ns         7.70 ns    190638298
bm<uint16_t, 9, 3>           10.4 ns         8.06 ns    162909091
bm<uint8_t, 22, 5>           10.1 ns         8.02 ns    208372093
bm<uint16_t, 22, 5>          11.0 ns         8.64 ns    151864407
bm<uint8_t, 3056, 7>          273 ns          233 ns      5973333
bm<uint16_t, 3056, 7>         561 ns          425 ns      3089655
bm<uint8_t, 1011, 11>        95.5 ns         79.2 ns     17568627
bm<uint16_t, 1011, 11>       2872 ns         2448 ns       497778

after

-----------------------------------------------------------------
Benchmark                       Time             CPU   Iterations
-----------------------------------------------------------------
bm<uint8_t, 2, 3>            2.71 ns         1.80 ns    746666667
bm<uint16_t, 2, 3>           2.71 ns         1.55 ns   1000000000
bm<uint8_t, 7, 4>            14.1 ns         7.47 ns    175686275
bm<uint16_t, 7, 4>           12.3 ns         5.82 ns    190638298
bm<uint8_t, 9, 3>            9.78 ns         6.75 ns    208372093
bm<uint16_t, 9, 3>           10.7 ns         7.54 ns    190638298
bm<uint8_t, 22, 5>           10.3 ns         7.21 ns    169056604
bm<uint16_t, 22, 5>          11.1 ns         8.63 ns    175686275
bm<uint8_t, 3056, 7>          276 ns          205 ns     11200000
bm<uint16_t, 3056, 7>         552 ns          432 ns      2890323
bm<uint8_t, 1011, 11>        94.4 ns         76.2 ns     19478261
bm<uint16_t, 1011, 11>        481 ns          365 ns      3895652

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner April 2, 2024 19:22
@StephanTLavavej StephanTLavavej added the performance Must go faster label Apr 2, 2024
@StephanTLavavej

This comment was marked as resolved.

@AlexGuteniev
Copy link
Contributor Author

There are real test failures here.

There were multiple errors.

After fixing all of them, I thought it would be better to preserve the <= 16 version, as the generic becomes noticeable slower.

@AlexGuteniev
Copy link
Contributor Author

Review with "hide whitespace difference" to see the <= 16 branch preserved.

@StephanTLavavej StephanTLavavej self-assigned this Apr 3, 2024
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 5ced5d2 into microsoft:main Apr 9, 2024
35 checks passed
@StephanTLavavej
Copy link
Member

Thanks for improving your vectorized implementation even further! 🚀 🚀 🚀

@AlexGuteniev AlexGuteniev deleted the long_needle branch April 10, 2024 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants