NVIDIA · gonidelis · Mar 22, 2024
@@ -543,7 +543,7 @@ The dispatch entry point is typically represented by a static member function th
     };
 
 For many algorithms, the dispatch layer is part of the API. 
-The first reason for this to be the case is ``size_t`` support. 
+The main reason for exposing the dispatch layer is ``size_t`` support.
 Our API uses ``int`` as a type for ``num_items``. 
 Users call the dispatch layer directly to work around this limitation. 
 Exposing the dispatch layer also allows users to tune algorithms for their use cases. 
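To make the ``size_t`` point concrete, here is a minimal host-only C++ sketch. The names ``dispatch_sum`` and ``device_sum`` are hypothetical and do not match CUB's actual dispatch signatures. The sketch assumes that the public entry point takes an ``int`` count, as described above, while the dispatch layer is templated on the offset type, so a caller can instantiate it with a 64-bit type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical dispatch layer (illustrative only, not CUB's real signature):
// templated on the offset type, so callers may choose a 64-bit type.
template <typename OffsetT>
struct dispatch_sum
{
  static std::int64_t Dispatch(const int* d_in, OffsetT num_items)
  {
    std::int64_t sum = 0;
    for (OffsetT i = 0; i < num_items; ++i)
    {
      sum += d_in[i];
    }
    return sum;
  }
};

// Hypothetical public entry point: num_items is an int, so it forwards
// to the dispatch layer with OffsetT = int.
inline std::int64_t device_sum(const int* d_in, int num_items)
{
  assert(num_items >= 0 && "num_items must be non-negative");
  return dispatch_sum<int>::Dispatch(d_in, num_items);
}
```

A caller whose input exceeds ``INT_MAX`` elements cannot go through ``device_sum`` and would instead call ``dispatch_sum<std::size_t>::Dispatch(d_in, n)`` directly, which mirrors the workaround described above.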

@@ -200,6 +200,40 @@ The code above leads to the following combinations being compiled:
 - ``type = std::int32_t``, ``threads_per_block = 128``
 - ``type = std::int32_t``, ``threads_per_block = 256``
 
+The following test case combines a multidimensional configuration space with multiple
+randomly generated input sequences.
+
+.. code-block:: c++
+
+    using block_sizes = c2h::enum_type_list<int, 128, 256>;
+    using types = c2h::type_list<std::uint8_t, std::int32_t>;
+
+    CUB_TEST("SCOPE FACILITY works with CONDITION",
+            "[FACILITY][SCOPE]",
+            types,
+            block_sizes)
+    {
+      using type = typename c2h::get<0, TestType>;
+      constexpr int threads_per_block = c2h::get<1, TestType>::value;
+      // ...
+      c2h::device_vector<type> d_input(5);
+      c2h::gen(CUB_SEED(2), d_input);
+    }
+
+The code above leads to the following combinations being executed:
+
+- ``type = std::uint8_t``, ``threads_per_block = 128``, first randomly generated input sequence
+- ``type = std::uint8_t``, ``threads_per_block = 256``, first randomly generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 128``, first randomly generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 256``, first randomly generated input sequence
+- ``type = std::uint8_t``, ``threads_per_block = 128``, second randomly generated input sequence
+- ``type = std::uint8_t``, ``threads_per_block = 256``, second randomly generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 128``, second randomly generated input sequence
+- ``type = std::int32_t``, ``threads_per_block = 256``, second randomly generated input sequence
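The split between the two axes can be modelled on the host. The following C++17 sketch is not c2h code: ``type_list``, ``int_list``, ``run_instance`` and ``enumerate_runs`` are hypothetical stand-ins, and a plain loop over ``num_seeds`` stands in for ``CUB_SEED(2)``. The type and block-size axes are expanded into template instantiations at compile time, while each instantiation loops over its seeds at run time, which yields the same eight runs as the list above (grouped by instantiation rather than by seed).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for c2h::type_list: the value-type axis.
template <typename... Ts>
struct type_list {};

// Hypothetical stand-in for c2h::enum_type_list<int, ...>: the block-size axis.
template <int... Vs>
struct int_list {};

// Printable names for the value types used in the example.
template <typename T>
const char* type_name();
template <>
inline const char* type_name<std::uint8_t>() { return "std::uint8_t"; }
template <>
inline const char* type_name<std::int32_t>() { return "std::int32_t"; }

// One template instantiation per (type, block size) pair, fixed at compile time.
// Inside it, a run-time loop stands in for the seeds produced by CUB_SEED(num_seeds).
template <typename T, int ThreadsPerBlock>
void run_instance(int num_seeds, std::vector<std::string>& log)
{
  for (int seed = 1; seed <= num_seeds; ++seed)
  {
    log.push_back(std::string(type_name<T>()) + ", " + std::to_string(ThreadsPerBlock) +
                  ", seed " + std::to_string(seed));
  }
}

// Expands the block-size axis for one value type (C++17 fold expression).
template <typename T, int... Vs>
void run_block_sizes(int_list<Vs...>, int num_seeds, std::vector<std::string>& log)
{
  (run_instance<T, Vs>(num_seeds, log), ...);
}

// Expands the value-type axis and records one entry per execution.
template <typename... Ts, int... Vs>
std::vector<std::string> enumerate_runs(type_list<Ts...>, int_list<Vs...> sizes, int num_seeds)
{
  assert(num_seeds > 0 && "each generator needs at least one seed");
  std::vector<std::string> log;
  (run_block_sizes<Ts>(sizes, num_seeds, log), ...);
  return log;
}
```

Calling ``enumerate_runs(type_list<std::uint8_t, std::int32_t>{}, int_list<128, 256>{}, 2)`` returns eight entries, one per row of the list above; only four of them correspond to distinct compiled instantiations.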
+
+Each additional generator multiplies the number of executions by its number of seeds. For
+example, adding another generator (``c2h::gen(CUB_SEED(X), ...)``) to the test above would
+multiply its eight executions by ``X``, for ``8 * X`` executions in total.
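The arithmetic can be written down as a small ``constexpr`` helper. ``total_runs`` is a hypothetical name used only for illustration; it assumes, as stated above, that every compile-time combination runs once for every combination of seeds, so each generator multiplies the total by its seed count.

```cpp
#include <cstddef>
#include <initializer_list>

// Total number of test executions (hypothetical helper):
// product of the configuration-axis sizes times the product of the
// seed counts of all generators in the test body.
constexpr std::size_t total_runs(std::initializer_list<std::size_t> axis_sizes,
                                 std::initializer_list<std::size_t> seeds_per_generator)
{
  std::size_t runs = 1;
  for (std::size_t n : axis_sizes)
  {
    runs *= n;
  }
  for (std::size_t s : seeds_per_generator)
  {
    runs *= s;
  }
  return runs;
}

// Example from the text: 2 types x 2 block sizes, one generator using CUB_SEED(2).
static_assert(total_runs({2, 2}, {2}) == 8, "eight executions");
```

For instance, adding a second generator that uses ``CUB_SEED(3)`` corresponds to ``total_runs({2, 2}, {2, 3})``, that is 24 executions.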
 
 Speedup Compilation Time
 =====================================