-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimizations for unreachable sentinels #1810
Optimizations for unreachable sentinels #1810
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit concerned about the additional derivations of _Copy_memcpy
Also I believe the _STL_ASSERT
s should bee _STATIC_ASSERT(_Always_false, ...)
stl/inc/algorithm
Outdated
&& (_Is_sized1 || same_as<_Se1, unreachable_sentinel_t>) // | ||
#pragma warning(suppress : 6287) // Redundant code: the left and right subexpressions are identical | ||
&&(_Is_sized2 || same_as<_Se2, unreachable_sentinel_t>) // | ||
&&same_as<_Pj1, identity> && same_as<_Pj2, identity>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the above case you put those constraints on top. I would prefer to keep them there but downgrade to is_same_v
or _Same_impl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the ranges algorithm optimizations I've written already use same_as
.
stl/inc/xmemory
Outdated
template <class _InIt, class _OutIt, typename _DistIt> | ||
in_out_result<_InIt, _OutIt> _Copy_memcpy_distance( | ||
_InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept { | ||
const auto _IFirstPtr = _To_address(_IFirst); | ||
const auto _OFirstPtr = _To_address(_OFirst); | ||
const auto _DFirstPtr = _To_address(_DFirst); | ||
const auto _DLastPtr = _To_address(_DLast); | ||
const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr)); | ||
const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr)); | ||
const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr)); | ||
const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr)); | ||
const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch); | ||
_CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count); | ||
if constexpr (is_pointer_v<_InIt>) { | ||
_IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count); | ||
} else { | ||
_IFirst += _Count / sizeof(iter_value_t<_InIt>); | ||
} | ||
|
||
if constexpr (is_pointer_v<_OutIt>) { | ||
_OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count); | ||
} else { | ||
_OFirst += _Count / sizeof(iter_value_t<_OutIt>); | ||
} | ||
return {_STD move(_IFirst), _STD move(_OFirst)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that overload is superfluous. The only difference is how we get to _Count
I guess we can get to that easier that duplicating all that machinery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might just use the _count
version but that might have worse codegen. #1433 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought more about determining _Count
via the right method and passing that around. That should give the optimal codegen all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.
stl/inc/xmemory
Outdated
template <class _InIt, class _OutIt, typename _DistIt> | ||
in_out_result<_InIt, _OutIt> _Copy_memcpy_distance( | ||
_InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept { | ||
const auto _IFirstPtr = _To_address(_IFirst); | ||
const auto _OFirstPtr = _To_address(_OFirst); | ||
const auto _DFirstPtr = _To_address(_DFirst); | ||
const auto _DLastPtr = _To_address(_DLast); | ||
const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr)); | ||
const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr)); | ||
const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr)); | ||
const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr)); | ||
const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch); | ||
_CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count); | ||
if constexpr (is_pointer_v<_InIt>) { | ||
_IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count); | ||
} else { | ||
_IFirst += _Count / sizeof(iter_value_t<_InIt>); | ||
} | ||
|
||
if constexpr (is_pointer_v<_OutIt>) { | ||
_OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count); | ||
} else { | ||
_OFirst += _Count / sizeof(iter_value_t<_OutIt>); | ||
} | ||
return {_STD move(_IFirst), _STD move(_OFirst)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.
Co-authored-by: Casey Carter <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - I'll validate and push changes for the very minor things I noticed.
@AdamBucior @CaseyCarter FYI I added |
Also had to merge |
Thanks for increasing the STL's performance to previously unreachable levels! 😹 🚀 📈 |
Enables optimizations for contiguous iterator +
unreachable_sentinel_t
ranges.The only thing I'm concerned about is users overloading
operator==(custom contiguous iterator, unreachable_sentinel_t)
which doesn't always returnfalse
. Such code should definitely be forbidden by the standard but I'm not sure if it is.