Add device subsets example #346

Merged on Sep 26, 2023 (34 commits)

@PointKernel (Member) commented Aug 5, 2023

Depends on #349

This PR adds an example demonstrating how to create multiple subsets with one single storage. It includes necessary changes and cleanups that will unblock orc/parquet dictionary encoding (rapidsai/cudf#12261) to use the new map/set data structures.
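The idea of several subsets sharing one storage can be pictured with a small host-only sketch. It is not code from this PR: `subset_offsets`, `subset_view`, and `make_subset_view` are hypothetical names, and a `std::vector` stands in for the device buffer. It only illustrates the `sizes[idx]` / `windows + offsets[idx]` pattern that appears later in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuco): given the slot count of each subset,
// compute where each subset starts inside one shared buffer.
inline std::vector<std::size_t> subset_offsets(std::vector<std::size_t> const& sizes)
{
  std::vector<std::size_t> offsets(sizes.size());
  std::size_t running = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    offsets[i] = running;  // exclusive prefix sum: start of subset i
    running += sizes[i];
  }
  return offsets;
}

// Hypothetical non-owning view of one subset: just a pointer and a size.
template <typename T>
struct subset_view {
  T* data;
  std::size_t size;
};

// Carve subset `idx` out of the shared buffer, mirroring the
// `windows + offsets[idx]` / `sizes[idx]` pattern from the discussion.
template <typename T>
subset_view<T> make_subset_view(std::vector<T>& storage,
                                std::vector<std::size_t> const& sizes,
                                std::vector<std::size_t> const& offsets,
                                std::size_t idx)
{
  assert(idx < sizes.size());
  assert(offsets[idx] + sizes[idx] <= storage.size());
  return subset_view<T>{storage.data() + offsets[idx], sizes[idx]};
}
```

With sizes `{4, 8, 2}` the offsets come out as `{0, 4, 12}`, so a single allocation of 14 slots backs all three subsets.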

@PointKernel PointKernel added the type: feature request, In Progress, and topic: static_set labels Aug 5, 2023
@PointKernel PointKernel marked this pull request as ready for review August 16, 2023 00:32
Comment on lines 67 to 72
// static_assert(is_window_extent_v<typename StorageRef::extent_type>,
// "Extent is not a valid cuco::window_extent");
// static_assert(ProbingScheme::cg_size == StorageRef::extent_type::cg_size,
// "Extent has incompatible CG size");
// static_assert(StorageRef::window_size == StorageRef::extent_type::window_size,
// "Extent has incompatible window size");
Member Author

How could we solve the issue where the sum of window_extents is not a window_extent itself?

Collaborator

Losing these checks isn't ideal. We could create a new window_extent from the sum using make_window_extent and pass that to the ctor.
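A host-only sketch of why the sum loses its type, and of the re-wrapping suggested above. Everything here is a simplified model rather than cuco's definitions: `window_extent_model` and `make_window_extent_model` are hypothetical names, and the rounding rule (round up to a multiple of `cg_size * window_size`) is an assumption made only so the example has something to compute.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Simplified stand-in for a strong extent type. NOT cuco's actual definition.
template <int CGSize, int WindowSize>
struct window_extent_model {
  static constexpr int cg_size     = CGSize;
  static constexpr int window_size = WindowSize;
  std::size_t value;
  // Implicit conversion lets the extent be used wherever a plain size is expected.
  constexpr operator std::size_t() const { return value; }
};

// Hypothetical factory. Assumption: a "valid" size is a multiple of
// cg_size * window_size; the real make_window_extent applies its own rules.
template <int CGSize, int WindowSize>
constexpr window_extent_model<CGSize, WindowSize> make_window_extent_model(std::size_t n)
{
  constexpr std::size_t stride = static_cast<std::size_t>(CGSize) * WindowSize;
  return {(n + stride - 1) / stride * stride};
}

using extent_t = window_extent_model<8, 1>;
constexpr extent_t ext_a = make_window_extent_model<8, 1>(10);  // rounds up to 16
constexpr extent_t ext_b = make_window_extent_model<8, 1>(20);  // rounds up to 24

// Adding two extents goes through the implicit conversion, so the result is a
// plain std::size_t and the cg_size / window_size information is gone.
static_assert(std::is_same<decltype(ext_a + ext_b), std::size_t>::value,
              "sum of two extents decays to a plain size");

// Re-wrapping the sum through the factory restores the strong type.
constexpr extent_t ext_total = make_window_extent_model<8, 1>(ext_a + ext_b);
static_assert(ext_total.value == 40, "16 + 24 is already a multiple of 8");
```

In this model the re-wrap is a no-op on the value, since a sum of multiples of the stride is again a multiple. Whether that holds for the real validity rule is not something this sketch can show.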

Member Author

> Losing these checks isn't ideal.

Agreed. That's the complicated part.

In general, when users have a pointer and a size on hand, creating a ref should be as simple as:

auto ref = ref_type{&data, size};

That's what I'm trying to achieve for storage_ref in the subset example:

  auto set_ref = ref_type<cuco::experimental::find_tag>{
    cuco::empty_key<key_type>{-1}, {}, {}, storage_ref_type{sizes[idx], windows + offsets[idx]}};

Enabling those checks forces users to invoke make_window_extent again on sizes[idx] (note that sizes[idx] is already the output of make_window_extent). Default template parameters also stop working, so users need to specify them explicitly. The above code would turn into:

  using extent_type =
    decltype(make_window_extent<cg_size, window_size>(std::declval<cuco::experimental::extent<size_t>>()));
  auto set_ref = ref_type<cuco::experimental::find_tag>{
    cuco::empty_key<key_type>{-1},
    {},
    {},
    aow_storage_ref<key_type,
                    window_size,
                    extent_type>{make_window_extent<cg_size, window_size>(sizes[idx]), windows + offsets[idx]}};

This is way more complex than needed.

One solution I can think of is to give the sizes array the proper element type, so instead of:

  auto valid_sizes = std::vector<std::size_t>(num);

Users should get the return type of make_window_extent first and then declare the array:

  using extent_type =
    decltype(make_window_extent<cg_size, window_size>(std::declval<cuco::experimental::extent<size_t>>()));
  auto valid_sizes = std::vector<extent_type>(num);

One thing I don't like here is the obscure procedure users have to follow to set up the proper extent type. Isn't all this fiddling with cg_size, window_size, size_t, etc. too complicated for people who just want a size? This is a doable workaround, but it doesn't solve the core problem: we are preventing users from creating a data ref from a pointer and a size.

Collaborator

100% agree on the unnecessary complexity. It should be as simple as passing a pointer and a size (or a cuda::std::span once we have it).

Can we provide an additional ctor with signature storage_ref_type(ptr, size), which then internally constructs a window_extent from the size?

We can discuss this in today's dev sync.

Member Author

> which then internally constructs a window_extent from the size?

We lose the motivation of having a window_extent strong type in that way.
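One way to read this trade-off is sketched below, host-only. `window_extent_model` and `storage_ref_model` are hypothetical stand-ins, not cuco types, and "valid means a multiple of `cg_size * window_size`" is an assumption. The point is only that a `(ptr, size)` constructor has to accept any integer, so whatever the strong type guaranteed at compile time can at best be re-checked at run time.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for a strong extent type. NOT cuco's actual definition.
template <int CGSize, int WindowSize>
struct window_extent_model {
  std::size_t value;
  static constexpr std::size_t stride()
  {
    return static_cast<std::size_t>(CGSize) * WindowSize;
  }
};

// Hypothetical storage ref offering both construction paths from the thread.
template <typename T, int CGSize, int WindowSize>
class storage_ref_model {
 public:
  using extent_type = window_extent_model<CGSize, WindowSize>;

  // Checked path: the extent type itself carries the CG size and window size.
  storage_ref_model(extent_type extent, T* data) : extent_(extent), data_(data) {}

  // Convenience path: pointer + raw size. The extent is built internally, so
  // validity can only be checked at run time (assumed rule: multiple of stride).
  storage_ref_model(T* data, std::size_t size) : extent_(extent_type{size}), data_(data)
  {
    assert(size % extent_type::stride() == 0 && "size must already be a valid extent");
  }

  std::size_t capacity() const { return extent_.value; }
  T* data() const { return data_; }

 private:
  extent_type extent_;
  T* data_;
};
```

Usage would look like `storage_ref_model<int, 8, 1>{buffer, 16}`, which is as short as the `ref_type{&data, size}` form asked for earlier, at the cost of moving the check from the type system into an assertion.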

#include <iostream>

auto constexpr cg_size = 8; ///< A CUDA Cooperative Group of 8 threads to handle each subset
auto constexpr window_size = 1; ///< TODO: how to explain window size (vector length) to users
Collaborator

memory access granularity (which may impact performance depending on the size of the slot type)?

Collaborator

I still like referring to it as "items per thread" or "thread granularity" as it controls how many elements an individual thread processes

Member Author

Agreed, items_per_thread is definitely a less abstract name than window_size.

  cuco::storage<key_type, items_per_thread> { ... };

vs.

  cuco::aow_storage<key_type, window_size> { ... };
  // or
  cuco::window_storage<key_type, window_size> { ... };

Actually, the former one is not bad at all.

Collaborator

The only confusion that might occur is that items_per_thread in e.g. CUB refers to input items per thread, whilst our items_per_thread means slots per thread. Just a minor thing. I'm ok with it. thread_granularity would remove the items part which might be less confusing but also less descriptive. Meh.
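For readers unfamiliar with the term, the "slots per thread" reading of window size can be modeled on the host as follows. This is an illustrative sketch, not cuco code: `window_model`, `find_in_window`, and `num_windows` are hypothetical names, and a plain loop stands in for what a single GPU thread would do with one window.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A window is a small fixed-size group of slots that one thread handles at once.
template <typename T, std::size_t WindowSize>
using window_model = std::array<T, WindowSize>;

// Look for `key` inside one window. Returns the slot index within the window,
// or WindowSize if the key is absent.
template <typename T, std::size_t WindowSize>
std::size_t find_in_window(window_model<T, WindowSize> const& window, T const& key)
{
  for (std::size_t i = 0; i < WindowSize; ++i) {
    if (window[i] == key) { return i; }
  }
  return WindowSize;
}

// Number of windows needed to hold `num_slots` slots (rounded up).
constexpr std::size_t num_windows(std::size_t num_slots, std::size_t window_size)
{
  return (num_slots + window_size - 1) / window_size;
}
```

With `window_size = 1`, as in the example under review, each thread inspects exactly one slot per probing step. A larger value means each thread loads and compares several adjacent slots, which is where the "memory access granularity" description comes from.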

@PointKernel PointKernel added the Needs Review label and removed the In Progress label Aug 16, 2023
@PointKernel PointKernel added the helps: rapids label Aug 18, 2023

@PointKernel (Member Author) commented:

/ok to test

@PointKernel (Member Author) commented:

/ok to test

@sleeepyjack (Collaborator) left a comment

LGTM. Let's get this merged as is and fix the ref type problem later.

@PointKernel (Member Author) commented:

/ok to test

@PointKernel PointKernel merged commit 359f5ae into NVIDIA:dev Sep 26, 2023
11 checks passed
@PointKernel PointKernel deleted the subset-example branch September 26, 2023 20:13
Labels: helps: rapids, Needs Review, topic: static_set, type: feature request