Minor fixes and additions on cub developer guides #1559

Merged
merged 3 commits into from
Mar 22, 2024
2 changes: 1 addition & 1 deletion cub/docs/developer_overview.rst
@@ -543,7 +543,7 @@ The dispatch entry point is typically represented by a static member function th
};

For many algorithms, the dispatch layer is part of the API.
-The first reason for this to be the case is ``size_t`` support.
+The main reason for this integration is to support ``size_t``.
Our API uses ``int`` as the type of ``num_items``.
Users call the dispatch layer directly to work around this limitation.
Exposing the dispatch layer also allows users to tune algorithms for their use cases.
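A host-only C++ sketch can illustrate why exposing the dispatch layer helps with ``size_t`` support. It assumes that the dispatch layer is templated on the offset type while the public entry point fixes ``num_items`` to ``int``. ``dispatch_sum`` and ``device_sum`` are hypothetical names, not CUB APIs, and the reduction runs sequentially on the host instead of on a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dispatch layer (not a CUB API): templated on the offset type,
// so it can address as many items as OffsetT can represent.
template <typename OffsetT>
struct dispatch_sum
{
  static std::int64_t run(const std::int32_t* input, OffsetT num_items)
  {
    std::int64_t sum = 0;
    for (OffsetT i = 0; i < num_items; ++i)
    {
      sum += input[i];
    }
    return sum;
  }
};

// Hypothetical public entry point: like an API whose num_items is an int,
// it always instantiates the dispatch layer with an int offset.
inline std::int64_t device_sum(const std::int32_t* input, int num_items)
{
  assert(num_items >= 0); // negative item counts are invalid
  return dispatch_sum<int>::run(input, num_items);
}

// Usage: the int-based entry point and a direct std::size_t dispatch agree.
// Only the direct dispatch could handle more than INT_MAX items.
inline bool sums_agree()
{
  std::vector<std::int32_t> data(1000, 3);
  const std::int64_t via_api = device_sum(data.data(), static_cast<int>(data.size()));
  const std::int64_t via_dispatch = dispatch_sum<std::size_t>::run(data.data(), data.size());
  return via_api == via_dispatch;
}
```

Because the offset type is a template parameter of the dispatch layer only, the public signature stays simple, while callers with more than ``INT_MAX`` items can still instantiate the dispatch layer with a wider offset type.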
34 changes: 34 additions & 0 deletions cub/docs/test_overview.rst
@@ -200,6 +200,40 @@ The code above leads to the following combinations being compiled:
- ``type = std::int32_t``, ``threads_per_block = 128``
- ``type = std::int32_t``, ``threads_per_block = 256``

As an example, the following test case combines a multidimensional configuration space
with multiple randomly generated input sequences.

.. code-block:: c++

    using block_sizes = c2h::enum_type_list<int, 128, 256>;
    using types = c2h::type_list<std::uint8_t, std::int32_t>;

    CUB_TEST("SCOPE FACILITY works with CONDITION",
             "[FACILITY][SCOPE]",
             types,
             block_sizes)
    {
      using type = typename c2h::get<0, TestType>;
      constexpr int threads_per_block = c2h::get<1, TestType>::value;
      // ...
      c2h::device_vector<type> d_input(5);
      c2h::gen(CUB_SEED(2), d_input);
    }

The code above compiles the four combinations of ``type`` and ``threads_per_block``, and each of
them runs once per randomly generated input sequence, which leads to the following eight test executions:

- ``type = std::uint8_t``, ``threads_per_block = 128``, 1st randomly generated input sequence
- ``type = std::uint8_t``, ``threads_per_block = 256``, 1st randomly generated input sequence
- ``type = std::int32_t``, ``threads_per_block = 128``, 1st randomly generated input sequence
- ``type = std::int32_t``, ``threads_per_block = 256``, 1st randomly generated input sequence
- ``type = std::uint8_t``, ``threads_per_block = 128``, 2nd randomly generated input sequence
- ``type = std::uint8_t``, ``threads_per_block = 256``, 2nd randomly generated input sequence
- ``type = std::int32_t``, ``threads_per_block = 128``, 2nd randomly generated input sequence
- ``type = std::int32_t``, ``threads_per_block = 256``, 2nd randomly generated input sequence

Each additional generator multiplies the number of executions by its number of seeds. For example,
adding another sequence generator (``c2h::gen(CUB_SEED(X), ...)``) to the test above would multiply
its number of executions by X, and so on.
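The multiplication can be checked with a small host-only C++ sketch that enumerates executions as a Cartesian product. It does not use c2h: ``enumerate_executions`` and ``execution`` are hypothetical names, the types are plain strings standing in for the entries of ``c2h::type_list``, and each generator is reduced to its number of seeds. The enumeration order is arbitrary; only the count matters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One test execution: a type, a block size, and one seed index per generator.
struct execution
{
  std::string type;
  int threads_per_block;
  std::vector<int> seed_indices;
};

// Hypothetical helper (not part of c2h): enumerates the Cartesian product of
// the compile-time axes (types x block sizes) and the seeds of every generator.
std::vector<execution> enumerate_executions(const std::vector<std::string>& types,
                                            const std::vector<int>& block_sizes,
                                            const std::vector<int>& seeds_per_generator)
{
  // Build every combination of seed indices, one generator at a time.
  std::vector<std::vector<int>> seed_tuples(1); // start with one empty tuple
  for (int num_seeds : seeds_per_generator)
  {
    assert(num_seeds > 0); // a generator without seeds would run nothing
    std::vector<std::vector<int>> extended;
    for (const auto& tuple : seed_tuples)
    {
      for (int seed = 0; seed < num_seeds; ++seed)
      {
        std::vector<int> next = tuple;
        next.push_back(seed);
        extended.push_back(next);
      }
    }
    seed_tuples = extended;
  }

  // Every compiled (type, block size) instantiation runs every seed tuple.
  std::vector<execution> result;
  for (const auto& type : types)
  {
    for (int block_size : block_sizes)
    {
      for (const auto& tuple : seed_tuples)
      {
        result.push_back(execution{type, block_size, tuple});
      }
    }
  }
  return result;
}
```

With the two types, two block sizes and the single two-seed generator of the example, the sketch produces 8 executions; adding a second generator with 3 seeds raises that to 24, and with no generators only the 4 compiled combinations remain.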

Speedup Compilation Time
=====================================