run tests |
DVS CL: 30580794 |
There is one thing that might impact the performance on thrust and libcxx pointers. In cub, https://github.com/NVIDIA/cub/blob/main/cub/agent/agent_scan_by_key.cuh#L124-L129

    using WrappedKeysInputIteratorT = typename If<IsPointer<KeysInputIteratorT>::VALUE,
        CacheModifiedInputIterator<AgentScanByKeyPolicyT::LOAD_MODIFIER, KeyT, OffsetT>,   // Wrap the native input pointer with CacheModifiedInputIterator
        KeysInputIteratorT>::Type;
    using WrappedValuesInputIteratorT = typename If<IsPointer<ValuesInputIteratorT>::VALUE,
        CacheModifiedInputIterator<AgentScanByKeyPolicyT::LOAD_MODIFIER, InputT, OffsetT>, // Wrap the native input pointer with CacheModifiedInputIterator
        ValuesInputIteratorT>::Type;

These two lines are copied from

    template <typename Iterator>
    struct is_contiguous_iterator_impl
      : integral_constant<
          bool
        ,    is_pointer<Iterator>::value
          || is_thrust_pointer<Iterator>::value
          || is_libcxx_wrap_iter<Iterator>::value
          || is_libstdcxx_normal_iterator<Iterator>::value
          || is_msvc_contiguous_iterator<Iterator>::value
          || proclaim_contiguous_iterator<Iterator>::value
        >
    {};
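To make the concern concrete, here is a minimal standalone sketch of why a pointer-only check skips the cache-modified wrapper for fancy pointers. It does not use CUB or Thrust: `fancy_ptr`, `cache_modified_iter`, and `wrapped_input_t` are hypothetical stand-ins for `thrust::device_ptr`, `CacheModifiedInputIterator`, and the `If<IsPointer<...>>` alias quoted above.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch_wrap {

// Hypothetical stand-in for a fancy pointer such as thrust::device_ptr<T>.
template <typename T>
struct fancy_ptr { T* raw; };

// Hypothetical stand-in for cub::CacheModifiedInputIterator.
template <typename T>
struct cache_modified_iter { const T* ptr; };

// Same shape as the quoted alias: wrap only when It is a native pointer,
// otherwise pass the iterator type through unchanged.
template <typename It, typename T>
using wrapped_input_t =
    typename std::conditional<std::is_pointer<It>::value,
                              cache_modified_iter<T>,
                              It>::type;

// Native pointers get the cache-modified wrapper.
static_assert(std::is_same<wrapped_input_t<int*, int>,
                           cache_modified_iter<int>>::value,
              "native pointer should be wrapped");

// A fancy pointer is not a native pointer, so it is passed through as-is
// and never receives the load modifier.
static_assert(std::is_same<wrapped_input_t<fancy_ptr<int>, int>,
                           fancy_ptr<int>>::value,
              "fancy pointer is passed through");

} // namespace sketch_wrap
```

Under these assumptions, the wrapper is only applied if the iterator has already been converted to a raw pointer before it reaches this alias.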
It might be worth addressing @zasdfgbnm's comment by implementing a `make_input_iterator` version without caching policy:

    template <class It>
    auto __device__ __forceinline__
    make_load_iterator_impl(It it, thrust::detail::true_type /* is_trivial */)
    {
      return raw_pointer_cast(&*it);
    }

    template <class It>
    It __device__ __forceinline__
    make_load_iterator_impl(It it, thrust::detail::false_type /* is_trivial */)
    {
      return it;
    }

    template <class It>
    typename LoadIterator<It>::type __device__ __forceinline__
    make_load_iterator(It it)
    {
      return make_load_iterator_impl(
          it, typename is_contiguous_iterator<It>::type());
    }

This version would be used before dispatching into CUB code. I think we will need this kind of facility later in any case.
Agreed -- I'll add something like this to the contiguous iterator implementation. |
17e2f3a to 431d86a
run tests |
Still need to write/verify benchmarks, and if we're happy with the new contiguous iterator unwrapping API I'll update the other CUB-backed algorithms in a followup. |
431d86a to d11776b
run tests |
d11776b to 9037c1d
run tests |
9037c1d to 95e0d6c
run tests |
95e0d6c to 9bc0c59
run tests |
9bc0c59 to c4ddcd2
run tests |
All of these are internal implementation details in the `thrust::detail` namespace:

Contiguous iterators only:
- `contiguous_iterator_traits`
- `contiguous_iterator_raw_pointer_t`
- `contiguous_iterator_raw_pointer_cast`

These work on all iterators, but convert to a raw pointer if given a contiguous iterator:
- `try_unwrap_contiguous_iterator_return_t`
- `try_unwrap_contiguous_iterator`
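The split between the contiguous-only helpers and the "try unwrap" helpers can be sketched roughly as follows. This is a standalone illustration, not the Thrust implementation: `wrapped_ptr`, `is_contiguous`, `contiguous_raw_pointer_cast`, and `try_unwrap` are hypothetical names, and the trait here only recognizes raw pointers and the stand-in fancy pointer.

```cpp
#include <cassert>
#include <list>
#include <type_traits>

namespace sketch_unwrap {

// Hypothetical stand-in for a fancy pointer such as thrust::device_ptr<T>.
template <typename T>
struct wrapped_ptr {
  T* raw;
  T& operator*() const { return *raw; }
};

// Opt-in trait (assumption): only raw pointers and wrapped_ptr are contiguous.
template <typename It>
struct is_contiguous : std::is_pointer<It> {};
template <typename T>
struct is_contiguous<wrapped_ptr<T>> : std::true_type {};

// Contiguous iterators only: convert to the underlying raw pointer.
template <typename T>
T* contiguous_raw_pointer_cast(T* p) { return p; }
template <typename T>
T* contiguous_raw_pointer_cast(wrapped_ptr<T> p) { return p.raw; }

// Works on all iterators: unwrap if contiguous, otherwise pass through.
template <typename It>
auto try_unwrap_impl(It it, std::true_type)
    -> decltype(contiguous_raw_pointer_cast(it)) {
  return contiguous_raw_pointer_cast(it);
}
template <typename It>
It try_unwrap_impl(It it, std::false_type) { return it; }

template <typename It>
auto try_unwrap(It it)
    -> decltype(try_unwrap_impl(it, typename is_contiguous<It>::type{})) {
  return try_unwrap_impl(it, typename is_contiguous<It>::type{});
}

} // namespace sketch_unwrap
```

With these definitions, `try_unwrap` on a `wrapped_ptr<int>` or an `int*` yields an `int*`, while a `std::list<int>::iterator` comes back with its type unchanged.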
c4ddcd2 to 3f4485e
Split the scan_by_key test into separate exclusive and inclusive variants to reduce memory usage during compilation. run tests |
This test was consuming excessive memory during nvc++ compilation. Splitting into two TUs should remedy this. Ran clang-format on the new test files, but the contents are the same.
3f4485e to 5f794f6
run tests |
A few comments, but nothing critical. Thank you for separating the test!
testing/is_contiguous_iterator.cu
Outdated
    true>::value));
    THRUST_STATIC_ASSERT((check_unwrapped_iterator<T *,
                                                   T *,
                                                   false>::value));
Is there a reason to test for `check_unwrapped_iterator<T, T, true>` + `check_unwrapped_iterator<T, T, false>` that I'm missing?
Since the unwrapped raw pointer is the same type as the pass-through, both the `true` and `false` cases should pass here. Testing both just ensures that everything is working as expected.
              typename ScanOpT,
              typename SizeT>
    __host__ __device__
    ValuesOutIt exclusive_scan_by_key_n(
Optional: this function duplicates the inclusive version. We might always have `InitValueT` and pass `cub::NullType{}` to indicate exclusiveness. The error messages will get back to `scan_by_key`, but it was like that before anyway.
Good catch, I'll clean this up to reduce duplication.
On second thought, let's keep these separate -- otherwise we'll just have to re-split them when we address NVIDIA/cub#384 and add `InitialValue` support to inclusive scans.
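For readers less familiar with the two variants discussed here, the following is a sequential reference sketch of inclusive and exclusive scan-by-key semantics. It is not the CUB-backed implementation from this PR: the function names are hypothetical, the scan operator is fixed to `+`, keys are compared with `==`, and only the exclusive form takes an initial value, matching the state of affairs described in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch_scan {

// Inclusive: each output includes its own input; the running sum restarts
// whenever the key differs from the previous key.
template <typename K, typename V>
std::vector<V> inclusive_scan_by_key_ref(const std::vector<K>& keys,
                                         const std::vector<V>& vals) {
  std::vector<V> out(vals.size());
  for (std::size_t i = 0; i < vals.size(); ++i) {
    const bool head = (i == 0) || !(keys[i] == keys[i - 1]);
    out[i] = head ? vals[i] : out[i - 1] + vals[i];
  }
  return out;
}

// Exclusive: each output excludes its own input; every segment starts
// from `init`.
template <typename K, typename V>
std::vector<V> exclusive_scan_by_key_ref(const std::vector<K>& keys,
                                         const std::vector<V>& vals,
                                         V init) {
  std::vector<V> out(vals.size());
  V running = init;
  for (std::size_t i = 0; i < vals.size(); ++i) {
    const bool head = (i == 0) || !(keys[i] == keys[i - 1]);
    if (head) running = init;  // new segment: reset to the initial value
    out[i] = running;
    running = running + vals[i];
  }
  return out;
}

} // namespace sketch_scan
```

For keys `{0, 0, 0, 1, 1, 2}` and values `{1, 2, 3, 4, 5, 6}`, the inclusive form yields `{1, 3, 6, 4, 9, 6}` and the exclusive form with `init = 0` yields `{0, 1, 3, 0, 4, 0}`.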
run tests |
I will update `thrust_benchmark` to test this for perf regressions before merging. (Hence the `blocked` tag.)
cc: @zasdfgbnm for awareness (don't want to duplicate effort)
run tests