
Add gzip decompression from file #48

Open
wants to merge 1 commit into base: main
Conversation

DerAndereJohannes

Hi.
I generally prefer to store my AcqKnowledge files in gzip format to save quite a bit of storage space. With this package, I usually have to add a bit of boilerplate to my code to read the gzipped file into an io.BytesIO object and then pass that to bioread.

This works fine, but I was wondering whether you would be open to letting bioread.read accept .gz files directly, which would handle this boilerplate automatically.
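For reference, the boilerplate in question might look roughly like this (a sketch; `open_maybe_gzipped` is a hypothetical helper name, not part of bioread):

```python
import gzip
import io
import tempfile

def open_maybe_gzipped(path):
    """Return a seekable file-like object for path.

    If the path ends in .gz, decompress the whole file into an
    io.BytesIO first, since downstream readers may need to seek.
    """
    if str(path).endswith(".gz"):
        with gzip.open(path, "rb") as f:
            return io.BytesIO(f.read())
    return open(path, "rb")

# Round-trip demo with a throwaway gzipped file standing in for an .acq.gz.
with tempfile.NamedTemporaryFile(suffix=".acq.gz", delete=False) as tmp:
    gz_path = tmp.name
with gzip.open(gz_path, "wb") as gz:
    gz.write(b"fake acq bytes")
data = open_maybe_gzipped(gz_path).read()
```

The resulting object could then be handed to bioread, assuming it accepts a file-like object as the workflow above suggests.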

Thank you in advance for your time.

@njvack
Member

njvack commented Jul 15, 2024

Wait, why not use the compressed .acq format? That compresses a whole heck of a lot better than gzipping interleaved data and doesn't require any extra steps to read.

I'm not super keen on adding .acq.gz as a natively-supported format, I think

@DerAndereJohannes
Author

Oof, this is quite a way to find out that there are already ways to compress these files natively. Funnily enough, at least for the files I receive, gzipping them manually gets me pretty good results. Here is an example file from a recent recording:

gzip.exe --list .\2024-07-10-005.acq.gz
 compressed        uncompressed  ratio uncompressed_name
   62798861           228915926  72.6% .\2024-07-10-005.acq

I'm guessing that since you use zlib to decompress, the compression ratios must be somewhat similar to these. Or does AcqKnowledge not store the data in an interleaved fashion when it compresses?

Thanks for the heads-up and the clarification; I will go look into the native AcqKnowledge options. I understand that this pull request may be less relevant than I had thought.

@njvack
Member

njvack commented Jul 15, 2024

That is a pretty good ratio!

But yes, when you're using the built-in compression, each channel's data is chunked together before compression; since physio data tend to be quite autocorrelated, this does better than just running the normal interleaved file through gzip.
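The effect described here can be illustrated with a toy experiment (made-up synthetic data, not actual .acq internals): compress the same two-channel payload once with each channel's bytes chunked together and once interleaved, and compare the compressed sizes.

```python
import hashlib
import zlib

def pseudo_random_bytes(n, seed=b"physio"):
    """Deterministic, essentially incompressible bytes from chained SHA-256."""
    out = bytearray()
    block = seed
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out.extend(block)
    return bytes(out[:n])

n = 100_000
chan_a = pseudo_random_bytes(n)  # stand-in for a noisy channel
chan_b = bytes(n)                # stand-in for a flat, highly redundant channel

grouped = chan_a + chan_b        # each channel's data chunked together
interleaved = bytes(b for pair in zip(chan_a, chan_b) for b in pair)

size_grouped = len(zlib.compress(grouped, 9))
size_interleaved = len(zlib.compress(interleaved, 9))
```

Grouping lets the compressor exploit each channel's internal redundancy, while interleaving breaks up the runs, so the grouped layout compresses smaller in this sketch.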

I'm kinda torn; this is a super clean PR, it doesn't add much code, and it would make life easier for at least some folks. Let me think on it a little bit, and very seriously, thank you for your contribution.

@DerAndereJohannes
Author

Thanks for the info. I went to the lab today and checked out the compression options; you are right that the native compression beats gzip (not really a surprise):

Raw file: 228915926 bytes
Gzip compression: 62798874 bytes (72.6% ratio)
Acq compression: 40934100 bytes (82.1% ratio)

I should probably have tested this on more files, but I did not have enough time for that today.

There are really only two arguments for the non-native compression:

  • It does not require the AcqKnowledge software / dongle to perform
  • You retain the ability to append to the file if you reimport it into AcqKnowledge (after decompression)

I am also torn on what I should do. By automatically gzipping in my current pipeline, I don't have to worry about the files that people send me and am guaranteed pretty good results regardless of whether they are natively compressed or not. But you are right, this is probably not that orthodox.

Thank you for your time!
