Register .pkl in FileIO? #8
Comments
Is |
From https://fileinfo.com/extension/pkl it seems that [...]. To avoid conflicts with other file formats, FileIO requires you to write a small detector that checks the first few bytes, for instance, https://github.com/JuliaIO/FileIO.jl/blob/21f435dc3ea63c4786e4cb894f855aef0692c056/src/registry.jl#L108
Thus it might be
function is_pickle_format(...)
...
end
add_format(format"PyPickle", is_pickle_format, [".pkl", ".p", ".pickle"], [idPickle])
By doing this, when FileIO sees a file with the extension ".pkl", ".p", or ".pickle", it will call the detector function to confirm the format. If the pickle format has fixed bytes at the beginning of the file, those magic bytes can be passed in place of the detector function.
You can check out Netpbm.jl as a simple example. |
Unfortunately, regular Python pickle files don't have magic bytes. We would need to scan through the whole file to see whether it is potentially loadable by the pickler. Would that be acceptable? |
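For reference, a cheaper heuristic than scanning the whole file might look like the sketch below. It relies on two properties of the pickle format: streams written with protocol 2 or newer begin with the PROTO opcode (0x80) followed by a version byte, and every pickle stream ends with the STOP opcode ('.'). Protocol 0 and 1 pickles have no such prefix, so this is only a plausibility check, not a real magic number. The name `looks_like_pickle` is hypothetical and not part of FileIO or Pickle.jl.

```julia
# Heuristic check for a Python pickle stream (protocol 2 or newer).
# `looks_like_pickle` is a hypothetical helper, not a FileIO or Pickle.jl API.
function looks_like_pickle(io::IO)
    start = position(io)
    try
        header = read(io, 2)                  # PROTO opcode + version byte
        length(header) == 2 || return false
        header[1] == 0x80 || return false     # PROTO opcode
        2 <= header[2] <= 5 || return false   # protocol versions defined so far
        seekend(io)
        position(io) - start >= 3 || return false
        skip(io, -1)
        return read(io, UInt8) == UInt8('.')  # STOP opcode closes the stream
    finally
        seek(io, start)                       # leave the stream where we found it
    end
end

# Bytes of a small protocol-4 pickle stream
looks_like_pickle(IOBuffer(UInt8[0x80, 0x04, 0x4b, 0x01, 0x2e]))
```

A function of this shape could be passed as the detector argument of `add_format`, with the caveat that protocol 0 and 1 files would be rejected and that unrelated binary files could occasionally pass.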
Googling [...]
add_format(format"PyPickle", "", [".pkl", ".p", ".pickle"], [idPickle])
If [...]
BTW, it says the pickle format is unsafe: https://docs.python.org/3/library/pickle.html. Does this package handle this security issue? |
The pickle format is unsafe because it can execute arbitrary Python functions by explicitly storing instructions to call those functions. But since we don't (and can't) map every Python function to an equivalent Julia one, we won't encounter that security issue. You can safely "load" any legal pickle file, even a malicious one, but it would just return a [...]. This brings up another issue. If we want to get the stored data correctly (i.e. without any [...]) [...] |
As long as it's clearly documented somewhere in the Pickle README or raises a friendly error message, I think it already improves the usability. BSON as another Julia serialization format is also registered in FileIO and I haven't seen people complaining about this. |
We could also register ".npy", ".npz" and pytorch's ".pt" or ".pth" support |
I thought ".npy" and ".npz" are already supported now JuliaIO/FileIO.jl#358? |
Yes by NPZ.jl, but we can register Pickle.jl as an alternative |
NPZ and Pickle are different. ".npy" and ".npz" are special formats defined by numpy. You cannot load a numpy file with pickle and vice versa. Usually a numpy file contains no pickle data, unless you're storing an ndarray with object dtype. This means there are actually two different ways (formats, to be precise) to serialize an ndarray in numpy: one is numpy's own npy/npz format, the other is numpy's extension of the pickle format. |
Some explanation about each format:
So npy/npz and torch pickle are not real pickle files, but custom formats that use the pickle format as part of their definition. |
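To make the distinction concrete, here is a rough sketch that tells these containers apart by their leading bytes. It assumes the following: ".npy" files start with the bytes "\x93NUMPY", ".npz" files are zip archives (starting with "PK\x03\x04"), recent PyTorch checkpoints are also zip archives, and a bare pickle of protocol 2 or newer starts with 0x80. The function name `sniff_container` and the returned symbols are made up for illustration and are not part of Pickle.jl, NPZ.jl, or FileIO.

```julia
# Leading bytes of the container formats discussed above.
const NPY_MAGIC = UInt8[0x93, 0x4e, 0x55, 0x4d, 0x50, 0x59]  # "\x93NUMPY"
const ZIP_MAGIC = UInt8[0x50, 0x4b, 0x03, 0x04]              # "PK\x03\x04"

# Hypothetical helper: guess the container type from the first few bytes.
function sniff_container(io::IO)
    start = position(io)
    head = read(io, 6)
    seek(io, start)                       # restore the stream position
    if length(head) == 6 && head == NPY_MAGIC
        return :npy                       # numpy's own array format
    elseif length(head) >= 4 && head[1:4] == ZIP_MAGIC
        return :zip                       # npz, or a zip-based torch checkpoint
    elseif length(head) >= 2 && head[1] == 0x80
        return :maybe_pickle              # PROTO opcode of a bare pickle stream
    else
        return :unknown
    end
end
```

A file written by `np.save` would come back as `:npy` under this check, which is consistent with it not being loadable as a plain pickle.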
Thanks for the explanation, these formats were very confusing to me. |
torch pickle? yes. npy file? no, I didn't handle that, but shouldn't be too hard to add. |
But I have been using with success |
I wonder if it has the wrong file extension? Could you try loading with [...]
julia> using PyCall, NPZ
julia> np = pyimport("numpy")
PyObject <module 'numpy' from '/home/peter/pyenv/lib/python3.6/site-packages/numpy/__init__.py'>
julia> np.save("npsave.npy", np.random.randn(10))
julia> Pickle.npyload("npsave.npy")
ERROR: ArgumentError: Deque must be non-empty
Stacktrace:
[...]
julia> NPZ.npzread("npsave.npy")
10-element Vector{Float64}:
-1.064963962117236
0.5587883282776522
-1.8944936182698036
1.2505924109230577
0.36063115028026194
3.3166609146251327
-2.31110326775469
-1.0430525977835379
1.2946367099663125
-1.441233594994848
|
This is a low priority issue.
Ref: https://github.com/JuliaIO/FileIO.jl#adding-new-formats