Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizations for unreachable sentinels #1810

Merged

Conversation

AdamBucior
Copy link
Contributor

@AdamBucior AdamBucior commented Apr 6, 2021

Enables optimizations for contiguous iterator + unreachable_sentinel_t ranges.

The only thing I'm concerned about is users overloading operator==(custom contiguous iterator, unreachable_sentinel_t) which doesn't always return false. Such code should definitely be forbidden by the standard but I'm not sure if it is.

@AdamBucior AdamBucior requested a review from a team as a code owner April 6, 2021 15:57
@StephanTLavavej StephanTLavavej added performance Must go faster ranges C++20/23 ranges labels Apr 6, 2021
Copy link
Contributor

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned about the additional derivations of _Copy_memcpy

Also I believe the _STL_ASSERTs should bee _STATIC_ASSERT(_Always_false, ...)

stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
&& (_Is_sized1 || same_as<_Se1, unreachable_sentinel_t>) //
#pragma warning(suppress : 6287) // Redundant code: the left and right subexpressions are identical
&&(_Is_sized2 || same_as<_Se2, unreachable_sentinel_t>) //
&&same_as<_Pj1, identity> && same_as<_Pj2, identity>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above case you put those constraints on top. I would prefer to keep them there but downgrade to is_same_v or _Same_impl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the ranges algorithm optimizations I've written already use same_as.

stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/memory Show resolved Hide resolved
stl/inc/xmemory Show resolved Hide resolved
stl/inc/xmemory Outdated
Comment on lines 1579 to 1604
template <class _InIt, class _OutIt, typename _DistIt>
in_out_result<_InIt, _OutIt> _Copy_memcpy_distance(
_InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept {
const auto _IFirstPtr = _To_address(_IFirst);
const auto _OFirstPtr = _To_address(_OFirst);
const auto _DFirstPtr = _To_address(_DFirst);
const auto _DLastPtr = _To_address(_DLast);
const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr));
const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr));
const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr));
const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr));
const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch);
_CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count);
if constexpr (is_pointer_v<_InIt>) {
_IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count);
} else {
_IFirst += _Count / sizeof(iter_value_t<_InIt>);
}

if constexpr (is_pointer_v<_OutIt>) {
_OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count);
} else {
_OFirst += _Count / sizeof(iter_value_t<_OutIt>);
}
return {_STD move(_IFirst), _STD move(_OFirst)};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that overload is superfluous. The only difference is how we get to _Count

I guess we can get to that easier that duplicating all that machinery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might just use the _count version but that might have worse codegen. #1433 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought more about determining _Count via the right method and passing that around. That should give the optimal codegen all the time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.

stl/inc/xmemory Outdated Show resolved Hide resolved
stl/inc/xutility Show resolved Hide resolved
@CaseyCarter CaseyCarter self-assigned this Apr 28, 2021
stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/memory Outdated Show resolved Hide resolved
stl/inc/algorithm Outdated Show resolved Hide resolved
stl/inc/xmemory Show resolved Hide resolved
stl/inc/xmemory Outdated Show resolved Hide resolved
stl/inc/xmemory Outdated
Comment on lines 1579 to 1604
template <class _InIt, class _OutIt, typename _DistIt>
in_out_result<_InIt, _OutIt> _Copy_memcpy_distance(
_InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept {
const auto _IFirstPtr = _To_address(_IFirst);
const auto _OFirstPtr = _To_address(_OFirst);
const auto _DFirstPtr = _To_address(_DFirst);
const auto _DLastPtr = _To_address(_DLast);
const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr));
const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr));
const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr));
const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr));
const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch);
_CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count);
if constexpr (is_pointer_v<_InIt>) {
_IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count);
} else {
_IFirst += _Count / sizeof(iter_value_t<_InIt>);
}

if constexpr (is_pointer_v<_OutIt>) {
_OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count);
} else {
_OFirst += _Count / sizeof(iter_value_t<_OutIt>);
}
return {_STD move(_IFirst), _STD move(_OFirst)};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.

stl/inc/xmemory Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
@CaseyCarter CaseyCarter removed their assignment Jun 2, 2021
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - I'll validate and push changes for the very minor things I noticed.

stl/inc/memory Outdated Show resolved Hide resolved
stl/inc/xmemory Outdated Show resolved Hide resolved
@StephanTLavavej
Copy link
Member

@AdamBucior @CaseyCarter FYI I added _STD move occurrences and verified that the tests are passing, please hiss if I damaged something (I'm 99.999% sure they're good).

@StephanTLavavej
Copy link
Member

Also had to merge main and fix a stealth merge conflict - now that views are guarded by /std:c++latest after GH-1929, the tests being modified to use views::take_while need to use the concepts_latest_matrix.

@StephanTLavavej StephanTLavavej self-assigned this Jun 11, 2021
@StephanTLavavej StephanTLavavej merged commit e326ec1 into microsoft:main Jun 12, 2021
@StephanTLavavej
Copy link
Member

Thanks for increasing the STL's performance to previously unreachable levels! 😹 🚀 📈

@AdamBucior AdamBucior deleted the unreachable-sentinel-optimizations branch June 12, 2021 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster ranges C++20/23 ranges
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants