Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] support lists::contains on lists of structs. #8958

Closed
revans2 opened this issue Aug 4, 2021 · 6 comments · Fixed by #10548
Closed

[FEA] support lists::contains on lists of structs. #8958

revans2 opened this issue Aug 4, 2021 · 6 comments · Fixed by #10548
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS

Comments

@revans2
Copy link
Contributor

revans2 commented Aug 4, 2021

Is your feature request related to a problem? Please describe.
We would like to extend the lists::contains function to support lists of structs and list of more deeply nested structs, like a struct with structs in it. We are prioritizing structs right now, and this is not intended to cover structs that also contain lists.

Just like with the existing code if the input value is a null or if the search value is a null, then the output should also be a null.

Describe the solution you'd like
Have lists::contains just work.

Describe alternatives you've considered
We could try and hack this together ourselves by picking apart the child columns and then checking for equality for each of them, with and-ing them all together followed by a reduction, but that gets really complicated.

@revans2 revans2 added feature request New feature or request Needs Triage Need team to review and classify labels Aug 4, 2021
@beckernick beckernick added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Aug 23, 2021
@beckernick beckernick added the Spark Functionality that helps Spark RAPIDS label Aug 25, 2021
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@ttnghia ttnghia self-assigned this Dec 11, 2021
@github-actions
Copy link

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@revans2
Copy link
Contributor Author

revans2 commented Mar 11, 2022

This is still wanted. Not a huge priority though

@ttnghia
Copy link
Contributor

ttnghia commented Mar 11, 2022

Blocked by hash refactoring. I'm still investigating it.

@jrhemstad
Copy link
Contributor

Blocked by hash refactoring. I'm still investigating it.

This will be solved trivially by #10186

@ttnghia
Copy link
Contributor

ttnghia commented Mar 14, 2022

@revans2 I think we should better support trivial lists::contain first (i.e., check if each list contains a given scalar), as we did for the general cudf::contain before. General support for checking a list containing a set a scalars is more difficult, is currently blocked by hash refactoring. We will implement it later if there is customer request.

rapids-bot bot pushed a commit that referenced this issue Aug 22, 2022
This extends the `lists::contains` API to support nested types (lists + structs) with arbitrarily nested levels. As such, `lists::contains` will work with literally any type of input data.

In addition, the related implementation has been significantly refactored to facilitate adding new implementation.

Closes #8958.
Depends on:
 * #10730
 * #10883
 * #10999
 * #11019
 * #11037

Authors:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Bradley Dice (https://github.com/bdice)

URL: #10548
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants