GH-38770: [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray #40971

Merged (4 commits) on May 8, 2024

Conversation

AlenkaF
Member

@AlenkaF AlenkaF commented Apr 3, 2024

Rationale for this change

Filtering a record batch with a boolean mask in the form of a ChunkedArray results in a segmentation fault.

What changes are included in this PR?

If a chunked array is passed as the mask when filtering a record batch, the code path for pa.Table.filter() is taken, resulting in a filtered table.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.


github-actions bot commented Apr 3, 2024

⚠️ GitHub issue #38770 has been automatically assigned in GitHub to PR creator.

Member

@danepitkin danepitkin left a comment

LGTM!

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Apr 3, 2024
Member

pitrou commented Apr 4, 2024

Hmm... if the crash occurs due to a defect in the C++ code, perhaps this can be fixed on the C++ side? (or at least the C++ side could return a proper error)

Member Author

AlenkaF commented Apr 4, 2024

> Hmm... if the crash occurs due to a defect in the C++ code, perhaps this can be fixed on the C++ side? (or at least the C++ side could return a proper error)

Ah yes, you are correct. That is also what Joris meant in the issue thread. Will correct.

Member Author

AlenkaF commented Apr 4, 2024

@pitrou I have moved the check to C++ but changed the logic to raise an error when the mask is a chunked array. This way the bug will be resolved more quickly, as I would need to figure out some things in order to use the FilterTable path =)

@AlenkaF AlenkaF changed the title from "GH-38770: [Python] RecordBatch.filter() segfaults if passed a ChunkedArray" to "GH-38770: [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray" Apr 11, 2024
@AlenkaF AlenkaF requested a review from pitrou April 11, 2024 13:35
@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label Apr 11, 2024
@github-actions github-actions bot added the "awaiting change review" label and removed the "awaiting changes" label Apr 15, 2024
@AlenkaF AlenkaF requested a review from amol- May 7, 2024 09:57
Member

@jorisvandenbossche jorisvandenbossche left a comment

Looks good to me!

@github-actions github-actions bot added the "awaiting merge" label and removed the "awaiting change review" label May 8, 2024
@AlenkaF AlenkaF merged commit d83af8f into apache:main May 8, 2024
36 of 37 checks passed
@AlenkaF AlenkaF removed the "awaiting merge" label May 8, 2024
@AlenkaF AlenkaF deleted the gh-38770-filter-segfault branch May 8, 2024 10:57

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit d83af8f.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 24 possible false positives for unstable benchmarks that are known to sometimes produce them.

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…d a ChunkedArray (apache#40971)

### Rationale for this change

Filtering a record batch with a boolean mask in the form of a `ChunkedArray` results in a segmentation fault.

### What changes are included in this PR?

If a chunked array is passed as the mask when filtering a record batch, the code path for `pa.Table.filter()` is taken, resulting in a filtered table.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

No.
* GitHub Issue: apache#38770

Authored-by: AlenkaF <[email protected]>
Signed-off-by: AlenkaF <[email protected]>