`<xutility>`: optimize `_Find_unchecked` for data sizes other than 1 #2379

AlexGuteniev · 2021-12-05T16:11:42Z

`_Find_unchecked` is the implementation of `std::find`.

Currently it is optimized for 8-bit array elements by using `memchr`.

It is possible to extend this optimization to 16-bit array elements by using `wmemchr`.
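
To illustrate the CRT-based approach, here is a minimal sketch. The names `find_byte` and `find_wide` are hypothetical and not part of the STL. It assumes `wchar_t` is 16 bits wide, which holds on Windows but not on most Unix-like targets. Whether `wmemchr` is actually fast depends on the CRT implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: find a byte via memchr; returns last when not found.
inline const unsigned char* find_byte(
    const unsigned char* first, const unsigned char* last, unsigned char value) {
    const void* hit = std::memchr(first, value, static_cast<std::size_t>(last - first));
    return hit ? static_cast<const unsigned char*>(hit) : last;
}

// Hypothetical helper: find a wchar_t via wmemchr; returns last when not found.
// Assumption: wchar_t is the 16-bit element type of interest (true on Windows).
inline const wchar_t* find_wide(
    const wchar_t* first, const wchar_t* last, wchar_t value) {
    const wchar_t* hit = std::wmemchr(first, value, static_cast<std::size_t>(last - first));
    return hit ? hit : last;
}
```

Both helpers follow the `std::find` convention of returning `last` when there is no match, rather than the null pointer the CRT functions return.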

For larger element sizes, like 32 or 64 bits, the optimization is possible as well, but it requires a manual implementation.

Assuming 32-bit elements, the comparison can be made using SSE2 with `_mm_cmpeq_epi32`. The mask can then be extracted with `_mm_movemask_epi8`. If it is nonzero, `countr_zero` can be used to determine the position of the first match (the bit index divided by 4, since each 32-bit lane contributes four mask bits). This only requires SSE2, which is part of the x64 baseline.
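
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that ended up in the STL: `find_u32_sse2` and `trailing_zeros` are hypothetical names, and `trailing_zeros` is a portable stand-in for `std::countr_zero` so that the sketch also compiles before C++20.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

// Hypothetical stand-in for std::countr_zero (C++20); mask must be nonzero.
inline unsigned trailing_zeros(unsigned mask) {
    unsigned n = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++n;
    }
    return n;
}

// Hypothetical helper: returns a pointer to the first element equal to value,
// or last if there is none.
inline const std::uint32_t* find_u32_sse2(
    const std::uint32_t* first, const std::uint32_t* last, std::uint32_t value) {
    const __m128i needle = _mm_set1_epi32(static_cast<int>(value));

    // Compare four 32-bit elements per iteration.
    while (last - first >= 4) {
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(first));
        const __m128i eq    = _mm_cmpeq_epi32(chunk, needle);
        const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));
        if (mask != 0) {
            // Each matching 32-bit lane sets four consecutive mask bits.
            return first + trailing_zeros(mask) / 4;
        }
        first += 4;
    }

    // Scalar tail for the remaining 0 to 3 elements.
    for (; first != last; ++first) {
        if (*first == value) {
            return first;
        }
    }
    return last;
}
```

The sketch never loads past `last`: the vector loop only runs while four full elements remain, and a scalar loop handles the rest.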

With AVX2, 256-bit vectors are available, and 64-bit element comparisons become possible.

The implementations should probably go in vector_algorithm.cpp.


AlexGuteniev · 2021-12-05T16:37:37Z

Note there's a PR currently in review in the same area: #2380

AlexGuteniev · 2021-12-15T18:27:44Z

Correction: we can't rely on wmemchr, as it is not vectorized and thus slow.
This is reported as DevCom-1614562.
We need to wait until that is fixed, or we can implement the 16-bit version on our own as well.

AlexGuteniev · 2021-12-18T11:04:16Z

Inspected memchr under the debugger.
Apparently it implements exactly the algorithm I suggest.
But it is just SSE2, and we deserve AVX2 already.
Besides, it suffers from DevCom-1615707, which is relevant for ranges find with unreachable_sentinel, and may be relevant elsewhere.

So it looks like we should implement a custom solution even for size 1.

AlexGuteniev · 2021-12-18T11:10:10Z

Relevant test: std\tests\Dev11_0316853_find_memchr_optimization
Also ranges find test: std\tests\P0896R4_ranges_alg_find
Also parallel find test: std\tests\P0024R2_parallel_algorithms_find


StephanTLavavej added the performance Must go faster label Dec 8, 2021

AlexGuteniev mentioned this issue Dec 9, 2021

<xutility>: vectorize std::count #2384

Closed

AlexGuteniev mentioned this issue Dec 16, 2021

Random effect of Intel JCC Errata on micro optimizations #2405

Closed

AlexGuteniev mentioned this issue Dec 18, 2021

<ranges>: Access violation on find in byte range with unreachable_sentinel #2431

Closed

AlexGuteniev added a commit to AlexGuteniev/STL that referenced this issue Dec 18, 2021

SSE2 and AVX2 std::find

0a740fc

New vector search to resolve microsoft#2379 and test that it fixes microsoft#2431 find

AlexGuteniev mentioned this issue Dec 18, 2021

SSE2 & AVX2 std::find & std::count #2434

Merged

StephanTLavavej closed this as completed in #2434 Apr 4, 2022

StephanTLavavej added the fixed Something works now, yay! label Apr 4, 2022
