[Question] How can I save a vector as raw binary blob? #338

Closed
maxfreu opened this issue Mar 4, 2024 · 4 comments
Comments

maxfreu (Contributor) commented Mar 4, 2024

Hi! I have a dataframe column containing vectors of 10 Int16s. I would like to save the vectors as 20 bytes of blob data. How can I do that? Right now I work around it by converting the reinterpreted chars to a string, but that has issues with null termination etc.

quinnj (Member) commented Mar 12, 2024

It's a little hard to tell what you're trying to do; can you share some example code showing what you would like to do, or what you're currently doing and the problems you're having? Having a concrete code example to work with helps in answering your question.

maxfreu (Contributor, Author) commented Apr 2, 2024

Actually, my question was imprecise. It's more directed towards how blob data can be written without unnecessary copies.

```julia
using DataFrames, SQLite

# My data looks like this, just with 80 million rows:
data = [rand(UInt16, 10) for _ in 1:10]

# What I want is to write the data as a contiguous blob (NOT serialized Julia structs).
# I achieve this like so; collect copies the reinterpreted bytes into a Vector{UInt8}:
data2blob(v) = collect(reinterpret(UInt8, v))

df = DataFrame(:foo => data2blob.(data))
db = SQLite.DB("deleteme.sqlite")
SQLite.load!(df, db, "foo")
close(db)
```

The resulting file has a blob column with the correct data written to it. However, I'd like to avoid calling collect on 1.6 GB of data. But when I leave it out, like so:

```julia
df = DataFrame(:foo => reinterpret.(UInt8, data))
```

the Julia types get serialized somehow before being written. That kind of makes sense, but then I can't read the data from other programs anymore. Maybe it would be good to special-case reinterpret arrays of basic integer types somewhere in the code?

quinnj (Member) commented Apr 2, 2024

Yeah, that makes sense to me. It might just be that we only support Vector{UInt8} for blobs, but we could relax that to AbstractVector{UInt8} so these get stored as blobs too.
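
For readers following along, here is a minimal sketch of the type distinction being discussed. It uses only Base Julia and makes no SQLite.jl calls; whether SQLite.jl actually dispatches on Vector{UInt8} is the guess made in this thread, not something the sketch verifies. The variable names are just for illustration.

```julia
# Illustration only: shows why a lazily reinterpreted array would miss a
# method written for Vector{UInt8} but match one for AbstractVector{UInt8}.
v = rand(UInt16, 10)

lazy = reinterpret(UInt8, v)   # lazy byte view over v, no copy is made
copied = collect(lazy)         # materializes the bytes into a new Vector{UInt8}

@assert !(lazy isa Vector{UInt8})         # the view is not a plain Vector
@assert lazy isa AbstractVector{UInt8}    # but it is an AbstractVector of bytes
@assert copied isa Vector{UInt8}          # collect yields the concrete type
@assert length(lazy) == sizeof(v) == 20   # 10 UInt16 values are 20 bytes
@assert copied == lazy                    # same bytes either way
```

If the assumption holds, relaxing the accepted type would let the view be written directly, without the intermediate copy that collect makes.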

maxfreu (Contributor, Author) commented Apr 3, 2024

Oh yes, relaxing to AbstractVector{UInt8} is way better than specializing for reinterpret arrays. Where would that go?

maxfreu closed this as completed May 22, 2024