GH-42240: [R] Fix crash in ParquetFileWriter$WriteTable and add WriteBatch #42241
Conversation
Thanks for jumping on this! I have one comment, feel free to take it or leave it.
I pushed up a change addressing that comment and noticed I missed documenting the new class method, so I added another commit for that. Letting CI run and then I'll merge.
One of the tests this PR adds, in conjunction with the minimum supported C++ version check, caught a segfault in Arrow C++ that wasn't fixed until Arrow 15. I think we need to put a closed check in the R package somewhere to catch this. I'll look into that and include a note that the check can be removed once we bump the minimum version to or above 15.
Does this mean we should bump our minimum libarrow to 15, then? If we can't support it, that's the way to go IMO. The backwards compatibility with libarrow is nice, but as far as I know it's not used anywhere specifically. It was added with the hope that we might use system-library libarrow on CRAN, among other places, but that hasn't happened.
Hrm, maybe. I think the only alternative would be to duplicate the open/closed state tracking on the R side until we bump the minimum version for some other reason. It may be relevant that this segfault only occurs in the manual invocation of the ParquetFileWriter; I'm not sure whether that affects the decision here.
Do we know of any actual uses of the backwards compatibility in the wild?
I'm not aware of any.
In that case I think we should just require >= 15 and that's ok (that might also let us delete some of the
Force-pushed from 22a706b to e3c460d
If you use 15.0.0 here like you'd expect you could, you get:

```
root@0347b6360c69:/# apt install -y -V libarrow-dev=15.0.0-1 \
    libarrow-acero-dev=15.0.0-1 \
    libparquet-dev=15.0.0-1 \
    libarrow-dataset-dev=15.0.0-1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming. The following information may help to
resolve the situation:

The following packages have unmet dependencies:
 libarrow-acero-dev : Depends: libarrow-acero1500 (= 15.0.0-1) but 15.0.2-1 is to be installed
 libarrow-dataset-dev : Depends: libarrow-dataset1500 (= 15.0.0-1) but 15.0.2-1 is to be installed
 libarrow-dev : Depends: libarrow1500 (= 15.0.0-1) but 15.0.2-1 is to be installed
 libparquet-dev : Depends: libparquet1500 (= 15.0.0-1) but 15.0.2-1 is to be installed
E: Unable to correct problems, you have held broken packages.
```
Okay @jonkeane, I've bumped the minimum version from 13 to 15 and updated the PR body. There was only one
Thanks! Sorry, I should have linked this before, but we also need to change `arrow/r/tools/check-versions.R` (line 27 in 89fd566).
My fault for missing that. I changed that and added a NEWS.md entry while I was at it. |
I think CI is happy now that I pushed c2d39d0. |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 84df343. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change

See #42240.

What changes are included in this PR?

- Fix a crash in `ParquetFileWriter$WriteTable` by asserting the class of what's passed in and stopping if it's not a `Table`
- Add `WriteBatch` to match `pyarrow.parquet.ParquetWriter.write_batch`, which is just a convenience

Are these changes tested?

Yes.

Are there any user-facing changes?

New method on `ParquetFileWriter` (`WriteBatch`).