Switch to Arrow storage format for TestData #381

Closed
dmbates opened this issue Sep 19, 2020 · 4 comments · Fixed by #382

Comments

@dmbates
Collaborator

dmbates commented Sep 19, 2020

The experience with #380 makes me more convinced that it would be good to switch from the Feather storage format, which brings in DataFrames and CategoricalArrays when reading a file, to the new Arrow format as implemented in https://github.com/JuliaData/Arrow.jl (note that this is not the currently registered repository for Arrow). On the Slack data channel, Jacob indicated that he hopes to release the new Arrow implementation in a week or so.

It will take us a while to switch formats because all the datasets must be saved in the new format and I haven't worked out a way of having both Feather and the new Arrow loaded at the same time.

@palday
Member

palday commented Sep 19, 2020

I'm happy to do the conversion, but I still can't figure out how to read the produced files into Python. The documentation for the relevant packages there treats Arrow as a memory format and not as a disk format, and none of the various disk formats listed seems to match the output of Arrow.jl.

@dmbates
Collaborator Author

dmbates commented Sep 19, 2020

I am trying out the conversion now. Did you see Jacob's answer on Slack (https://julialang.slack.com/archives/C674VR0HH/p1600454109147800)?

I wasn't quite sure what arguments could be passed to open_file, as in:

import pyarrow as pa
df = pa.ipc.open_file(buf).read_pandas()
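
One way to narrow down which reader to try is to look at the first and last bytes of the file. The sketch below uses only the Python standard library and relies on the magic bytes defined by the formats themselves: the Arrow IPC file format starts and ends with ARROW1, legacy Feather V1 starts and ends with FEA1, and recent Arrow IPC streams start with the continuation marker 0xFFFFFFFF. The function name sniff_format and the returned labels are made up for illustration, and the demo file is synthetic placeholder bytes, not a real Arrow file. Which of these formats a given Arrow.jl version writes is not something this sketch assumes.

```python
import os
import tempfile

ARROW_MAGIC = b"ARROW1"                    # Arrow IPC *file* format (head and tail)
FEATHER_V1_MAGIC = b"FEA1"                 # legacy Feather V1 (head and tail)
STREAM_CONTINUATION = b"\xff\xff\xff\xff"  # start of a recent Arrow IPC *stream*


def sniff_format(path):
    """Guess the on-disk format from the leading and trailing bytes.

    Hypothetical helper: the labels it returns are illustrative only.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(6)
        tail = b""
        if size >= 6:
            f.seek(-6, os.SEEK_END)
            tail = f.read(6)
    # The file format needs room for both a leading and a trailing magic.
    if size >= 12 and head == ARROW_MAGIC and tail == ARROW_MAGIC:
        return "arrow-ipc-file"
    if head[:4] == FEATHER_V1_MAGIC and tail[-4:] == FEATHER_V1_MAGIC:
        return "feather-v1"
    if head[:4] == STREAM_CONTINUATION:
        return "arrow-ipc-stream"
    return "unknown"


# Demo on synthetic bytes that only mimic the framing of an IPC file.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.arrow")
    with open(demo, "wb") as f:
        f.write(ARROW_MAGIC + b"\x00\x00" + b"\x00" * 16 + ARROW_MAGIC)
    print(sniff_format(demo))  # arrow-ipc-file
```

If the result is "arrow-ipc-file", pa.ipc.open_file is the matching pyarrow reader; for "arrow-ipc-stream", pa.ipc.open_stream is the one to try instead.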

@dmbates
Collaborator Author

dmbates commented Sep 20, 2020

I have added the Arrow files to the osf.io repo. If you add the master branch of https://github.com/JuliaData/Arrow.jl (which also requires the master branch of Tables.jl) you can read these files with, e.g., Arrow.Table("cbpp.arrow")

@dmbates
Collaborator Author

dmbates commented Sep 20, 2020

This issue may come to the fore earlier than we had anticipated. I just installed a prerelease version of julia-1.5.2 and was unable to test MixedModels because compilation of the release version of Arrow.jl (from https://github.com/ExpandingMan/Arrow.jl) segfaulted. The development version in https://github.com/JuliaData/Arrow.jl did not segfault.

In the discourse.julialang.org discussion on julia-1.5.2 the conclusion seems to be that the compilation failure is in CategoricalArrays.jl and is a problem for any 1.5 series version. It does not show up in 1.5.1 because assertions are not turned on in the distributed version whereas they are in the 1.5.2 test version.

julia: /buildworker/worker/package_linux64/build/src/subtype.c:1978: jl_types_equal: Assertion `subtype_ab == 3 || subtype_ab == subtype || jl_has_free_typevars(a) || jl_has_free_typevars(b)' failed.

signal (6): Aborted
in expression starting at /home/bates/.julia/packages/Arrow/q3tEJ/src/Arrow.jl:3
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f52176df728)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
jl_types_equal at /buildworker/worker/package_linux64/build/src/subtype.c:1978
jl_typemap_entry_lookup_by_type at /buildworker/worker/package_linux64/build/src/typemap.c:537
jl_typemap_assoc_by_type at /buildworker/worker/package_linux64/build/src/typemap.c:599
check_ambiguous_visitor at /buildworker/worker/package_linux64/build/src/gf.c:1302
jl_typemap_intersection_node_visitor at /buildworker/worker/package_linux64/build/src/typemap.c:312
jl_typemap_intersection_visitor at /buildworker/worker/package_linux64/build/src/typemap.c:408
jl_typemap_intersection_visitor at /buildworker/worker/package_linux64/build/src/typemap.c:399
check_ambiguous_matches at /buildworker/worker/package_linux64/build/src/gf.c:1394
jl_method_table_insert at /buildworker/worker/package_linux64/build/src/gf.c:1709
jl_insert_methods at /buildworker/worker/package_linux64/build/src/dump.c:2292 [inlined]
_jl_restore_incremental at /buildworker/worker/package_linux64/build/src/dump.c:3248
jl_restore_incremental at /buildworker/worker/package_linux64/build/src/dump.c:3299
_include_from_serialized at ./loading.jl:681
_require_search_from_serialized at ./loading.jl:782
_require at ./loading.jl:1007
require at ./loading.jl:928
require at ./loading.jl:923
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
call_require at /buildworker/worker/package_linux64/build/src/toplevel.c:425 [inlined]
eval_import_path at /buildworker/worker/package_linux64/build/src/toplevel.c:462
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:682
jl_eval_module_expr at /buildworker/worker/package_linux64/build/src/toplevel.c:197
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:666
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:913
jl_load_rewrite at /buildworker/worker/package_linux64/build/src/toplevel.c:914
include at ./Base.jl:380
include at ./Base.jl:368
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:117
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:206
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:157 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:552
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:492
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:660
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:840
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:883
eval at ./boot.jl:331 [inlined]
eval at ./client.jl:467
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
top-level scope at ./none:3
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2238
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:834
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:790
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:883
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
exec_options at ./client.jl:272
_start at ./client.jl:506
jfptr__start_52252.clone_1 at /home/bates/src/julia-1.5.2-DEV/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/ui/../src/julia.h:1690 [inlined]
true_main at /buildworker/worker/package_linux64/build/ui/repl.c:106
main at /buildworker/worker/package_linux64/build/ui/repl.c:227
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/bates/src/julia-1.5.2-DEV/bin/julia (unknown line)
Allocations: 2545 (Pool: 2535; Big: 10); GC: 0
ERROR: LoadError: Failed to precompile Arrow [69666777-d1a9-59fb-9406-91d4454c9d45] to /home/bates/.julia/compiled/v1.5/Arrow/QnF3w_WvzKc.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923
 [6] include(::Function, ::Module, ::String) at ./Base.jl:380
 [7] include(::Module, ::String) at ./Base.jl:368
 [8] top-level scope at none:2
 [9] eval at ./boot.jl:331 [inlined]
 [10] eval(::Expr) at ./client.jl:467
 [11] top-level scope at ./none:3
in expression starting at /home/bates/.julia/packages/Feather/pbm3o/src/Feather.jl:3
ERROR: LoadError: Failed to precompile Feather [becb17da-46f6-5d3c-ad1b-1c5fe96bc73c] to /home/bates/.julia/compiled/v1.5/Feather/RgcL0_WvzKc.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923
 [6] include(::Function, ::Module, ::String) at ./Base.jl:380
 [7] include(::Module, ::String) at ./Base.jl:368
 [8] top-level scope at none:2
 [9] eval at ./boot.jl:331 [inlined]
 [10] eval(::Expr) at ./client.jl:467
 [11] top-level scope at ./none:3
in expression starting at /home/bates/.julia/dev/MixedModels/src/MixedModels.jl:5
ERROR: LoadError: Failed to precompile MixedModels [ff71e718-51f3-5ec2-a782-8ffcbfa3c316] to /home/bates/.julia/compiled/v1.5/MixedModels/tBiYK_WvzKc.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compilecache(::Base.PkgId, ::String) at ./loading.jl:1305
 [3] _require(::Base.PkgId) at ./loading.jl:1030
 [4] require(::Base.PkgId) at ./loading.jl:928
 [5] require(::Module, ::Symbol) at ./loading.jl:923
 [6] include(::String) at ./client.jl:457
 [7] top-level scope at none:6
in expression starting at /home/bates/.julia/dev/MixedModels/test/runtests.jl:1
ERROR: Package MixedModels errored during testing
