Switch to Arrow storage format for TestData #381
Comments
I'm happy to do the conversion, but I still can't figure out how to read the produced files into Python. The documentation for the relevant packages there treats Arrow as a memory format and not as a disk format, and none of the various disk formats listed seem to match the output of |
I am trying out the conversion now. Did you see Jacob's answer on Slack (https://julialang.slack.com/archives/C674VR0HH/p1600454109147800)? I wasn't quite sure what arguments could be used to
import pyarrow as pa
df = pa.ipc.open_file(buf).read_pandas() |
I have added the Arrow files to the osf.io repo. If you add the master branch of https://github.com/JuliaData/Arrow.jl (which also requires the master branch of Tables.jl) you can read these files with, e.g., |
This issue may come to the fore earlier than we had anticipated. I just installed a prerelease version of julia-1.5.2 and was unable to test MixedModels because compilation of the release version of In the discourse.julialang.org discussion on julia-1.5.2 the conclusion seems to be that the compilation failure is in
|
The experience with #380 makes me more convinced that it would be good to switch from the Feather storage format, which brings in DataFrames and CategoricalArrays when reading a file, to the new Arrow format as implemented in https://github.com/JuliaData/Arrow.jl (note that this is not the currently registered repository for Arrow). On the Slack data channel, Jacob indicated that he hopes to release the new Arrow implementation in a week or so.
It will take us a while to switch formats because all the datasets must be saved in the new format and I haven't worked out a way of having both Feather and the new Arrow loaded at the same time.