
[BUG] Stream compaction drop_duplicates does not use stable sort when removing duplicates #9356

Closed
ttnghia opened this issue Oct 1, 2021 · 2 comments
Labels: bug (Something isn't working) · libcudf (Affects libcudf (C++/CUDA) code.) · non-breaking (Non-breaking change)

Comments

@ttnghia
Contributor

ttnghia commented Oct 1, 2021

Currently, the stream compaction API drop_duplicates has an option to keep either the first or the last element of each group of duplicates. For example, if the input keys are [1, 1, 2, 2] and the values are [1, 2, 3, 4], then removing duplicates (by keys) with the KEEP_FIRST option should produce the values [1, 3].
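To pin down the expected semantics independently of any sorting, here is a minimal host-side C++ sketch that computes the KEEP_FIRST and KEEP_LAST results directly from the input order. The function name `expected_drop_duplicates` and its `keep_last` flag are hypothetical illustrations, not libcudf APIs, and the output is simply returned in ascending key order.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical reference helper (not a libcudf API): for each distinct key, keep the
// value of its first occurrence (keep_last == false) or its last occurrence
// (keep_last == true). Results are returned in ascending key order.
std::vector<int> expected_drop_duplicates(const std::vector<int>& keys,
                                          const std::vector<int>& values,
                                          bool keep_last) {
  std::map<int, int> kept;  // key -> retained value
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (keep_last) {
      kept.insert_or_assign(keys[i], values[i]);  // later rows overwrite earlier ones
    } else {
      kept.emplace(keys[i], values[i]);  // no-op if the key was already seen
    }
  }
  std::vector<int> out;
  out.reserve(kept.size());
  for (const auto& kv : kept) out.push_back(kv.second);
  return out;
}
```

On the example above it yields [1, 3] for keep-first and [2, 4] for keep-last, which makes it usable as an oracle when checking a sort-based implementation.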

Internally, drop_duplicates sorts the key elements and then calls unique_copy. With the KEEP_FIRST and KEEP_LAST options, a stable sort is required to guarantee the expected result, because only a stable sort preserves the original relative order of rows with equal keys. However, the current implementation uses the default (unstable) sort.
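The sort-then-unique approach can be sketched on the host with the C++ standard library. This is a simplified illustration under that assumption, not libcudf's device implementation; `keep_option` and `sort_based_drop_duplicates` are hypothetical names. Stable-sorting the row indices by key is what makes the first (or last) index in each run of equal keys correspond to the first (or last) occurrence in the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical enum mirroring the KEEP_FIRST / KEEP_LAST options (not the libcudf type).
enum class keep_option { KEEP_FIRST, KEEP_LAST };

// Host-side sketch: stable-sort row indices by key, then keep one row per run of equal keys.
std::vector<int> sort_based_drop_duplicates(const std::vector<int>& keys,
                                            const std::vector<int>& values,
                                            keep_option keep) {
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  // stable_sort keeps rows with equal keys in their original relative order,
  // so within each run the indices still appear in input order.
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  auto same_key = [&](std::size_t a, std::size_t b) { return keys[a] == keys[b]; };
  std::vector<std::size_t> kept;
  if (keep == keep_option::KEEP_FIRST) {
    // unique_copy copies the first element of every run of equal keys.
    std::unique_copy(order.begin(), order.end(), std::back_inserter(kept), same_key);
  } else {
    // Walking backwards makes unique_copy keep the last element of each run;
    // reversing the result restores ascending key order.
    std::unique_copy(order.rbegin(), order.rend(), std::back_inserter(kept), same_key);
    std::reverse(kept.begin(), kept.end());
  }

  std::vector<int> out;
  out.reserve(kept.size());
  for (std::size_t i : kept) out.push_back(values[i]);
  return out;
}
```

If `std::sort` were used instead of `std::stable_sort`, rows with equal keys could be reordered, so the row kept from each run would no longer be guaranteed to be the first or last occurrence in the input.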

Because an unstable sort can happen to produce the same order as a stable sort, all current unit tests for drop_duplicates still pass. Nevertheless, we should switch to a stable sort as soon as possible.

@ttnghia ttnghia added bug Something isn't working libcudf blocker libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change labels Oct 1, 2021
@ttnghia ttnghia self-assigned this Oct 1, 2021
@davidwendt
Contributor

Was this fixed by #9417?

@ttnghia
Contributor Author

ttnghia commented Oct 18, 2021

Yes. Sorry, I forgot to link it.

@ttnghia ttnghia closed this as completed Oct 18, 2021