[ENHANCEMENT]: Add a callback to the bulk_insert functions #376
Comments
Hm, I'm not against this, but this kind of custom operation seems like what the device-side API is intended for.
I'd like to avoid duplicating the work that cuco has already done. So if I can use the bulk operations, all the better!
I totally get your point about avoiding writing a kernel that is already 90% existent in cuco. It's really about API flexibility:
The solution you are proposing is somewhere in between: So, when a developer plans to use cuco for a particular complex task, they have the following thought process:
I'm not convinced 2. is helpful when we can just move to 3. directly and be able to optimize the hell out of the custom kernel.
However, we could solve your particular problem with a different approach: So you can achieve the same thing that you asked for by using the bulk API with Thrust fancy iterators.
@PointKernel @jrhemstad I have one minor concern about my proposal:
The output iterator is a nice idea. No way it works with the memcpy_async, right? Surely memcpy_async requires that the destination is a real pointer. Worse still, it might compile and work just fine on A100 and before but fail on Hopper.
Correction: We use the
So using a fancy iterator should work in this scenario.
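To make the output-iterator idea from the comments above concrete, here is a minimal host-side sketch in plain C++. It is an analogue only: `InsertResult`, `CallbackOutputIterator`, `toy_bulk_insert` and `collect_new_keys` are hypothetical names invented for this illustration, `std::unordered_set` stands in for the hash table, and none of this is cuco's or Thrust's actual API. The point is just that a bulk routine which writes its results through an output iterator can be handed an iterator whose assignment runs user code.

```cpp
#include <cstddef>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical result record: the key and whether it was newly inserted.
struct InsertResult {
  int key;
  bool inserted;
};

// Hypothetical "fancy" output iterator: assigning through it calls a callback
// instead of storing the value in memory.
template <typename Callback>
class CallbackOutputIterator {
 public:
  using iterator_category = std::output_iterator_tag;
  using value_type = void;
  using difference_type = std::ptrdiff_t;
  using pointer = void;
  using reference = void;

  explicit CallbackOutputIterator(Callback* cb) : cb_(cb) {}

  // Dereference and increment are no-ops that keep algorithms happy.
  CallbackOutputIterator& operator*() { return *this; }
  CallbackOutputIterator& operator++() { return *this; }
  CallbackOutputIterator operator++(int) { return *this; }

  // The "write" forwards the result to the user's callback.
  CallbackOutputIterator& operator=(const InsertResult& result) {
    (*cb_)(result);
    return *this;
  }

 private:
  Callback* cb_;  // non-owning; the callback must outlive the iterator
};

// Stand-in for a bulk insert that reports one result per input key.
template <typename InputIt, typename OutputIt>
void toy_bulk_insert(std::unordered_set<int>& set, InputIt first, InputIt last,
                     OutputIt out) {
  for (; first != last; ++first) {
    const bool inserted = set.insert(*first).second;
    *out++ = InsertResult{*first, inserted};
  }
}

// Usage: gather each key the first time it is seen, in a single pass.
std::vector<int> collect_new_keys(const std::vector<int>& keys) {
  std::unordered_set<int> set;
  std::vector<int> fresh;
  auto on_result = [&fresh](const InsertResult& r) {
    if (r.inserted) fresh.push_back(r.key);
  };
  toy_bulk_insert(set, keys.begin(), keys.end(),
                  CallbackOutputIterator<decltype(on_result)>(&on_result));
  return fresh;
}
```

On the device, Thrust's own fancy iterators (for example `thrust::transform_output_iterator`) would play the role of the hand-written iterator here, and the callback would need to be safe to call from many threads concurrently.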
Is your feature request related to a problem? Please describe.
I would like to be able to run a function on every key that is inserted by the bulk insert functions.
Describe the solution you'd like
Each insert would report the slot and whether the key was newly inserted or was already present (in which case the operation was effectively just a lookup). I could then do whatever I want with that information. For example, I could use an atomic counter to fill an array with the unique elements, allowing me to perform "retrieve_all" during the insert without reading the table twice.
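As a rough illustration of the request above, the following host-side C++ sketch shows a bulk insert that calls a user callback with the slot and an inserted/already-present flag, and a callback that uses an atomic counter to fill an array of unique keys. Everything in it is an assumption made for illustration: `ToySet`, its `bulk_insert` and `key_at` members, and `insert_and_collect_unique` are hypothetical and are not cuco's API; the loop runs sequentially, so the atomic only mirrors what a parallel device kernel would need.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical open-addressing set with linear probing (not cuco's API).
// Assumes keys are non-negative, because -1 marks an empty slot.
class ToySet {
 public:
  static constexpr int kEmpty = -1;

  explicit ToySet(std::size_t capacity) : slots_(capacity, kEmpty) {}

  // Inserts every key and calls callback(slot, inserted) once per key:
  // inserted == true for a new key, false if the key was already present.
  template <typename Callback>
  void bulk_insert(const std::vector<int>& keys, Callback callback) {
    for (int key : keys) {
      const std::size_t start = std::hash<int>{}(key) % slots_.size();
      for (std::size_t probe = 0; probe < slots_.size(); ++probe) {
        const std::size_t slot = (start + probe) % slots_.size();
        if (slots_[slot] == kEmpty) {
          slots_[slot] = key;
          callback(slot, true);
          break;
        }
        if (slots_[slot] == key) {
          callback(slot, false);
          break;
        }
      }
    }
  }

  int key_at(std::size_t slot) const { return slots_[slot]; }

 private:
  std::vector<int> slots_;
};

// "retrieve_all" during the insert: each newly inserted key is appended to
// an output array at an index claimed from an atomic counter.
std::vector<int> insert_and_collect_unique(const std::vector<int>& keys) {
  ToySet set(2 * keys.size() + 1);  // sized so the table can never fill up
  std::vector<int> unique(keys.size());
  std::atomic<std::size_t> count{0};
  set.bulk_insert(keys, [&](std::size_t slot, bool inserted) {
    if (inserted) unique[count.fetch_add(1)] = set.key_at(slot);
  });
  unique.resize(count.load());
  return unique;
}
```

In a real parallel version the order of the collected keys would depend on thread scheduling; only the set of keys would be deterministic.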
Describe alternatives you've considered
I can copy and paste all the bulk insert code into my own code and use that. The problem is that I trust cuco to get the grid shape and everything else right, and if I do it myself I might get it wrong. Also, if I use cuco's own bulk insert, I automatically benefit from any improvements cuco makes to it.
Additional context
No response