qs2 proof of concept #57
…o unblock the thread
Persistent wait thread
Just 3 requests if I may:
Thanks, all good suggestions! I've been using …
Great, thanks @traversc. No particular rush from my side. I have a small bugfix release of … I'll open up new PRs at … I haven't arrived at a conclusion on the user interface yet, but I'd probably be keen to offer just one …
Check out the latest commit. Here is a revised example:
I'd recommend the following:
The default compression level (CL) in the package is 3, but that's because I'm concerned with minimizing disk usage over the long term. For in-memory or over-the-network use, CL 1 is a better tradeoff in my view, since the objects are temporary. I recommend shuffle = TRUE: this often gives a moderate compression improvement on numerical data at very little computational cost. In earlier ZSTD versions shuffling was a large benefit, but in more recent versions it does not seem to matter as much (ZSTD is likely finding much of the improvement by itself). You can experiment with your data.
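To illustrate what shuffling does, here is a minimal Python sketch of byte shuffling, assuming qs2's shuffle option is the usual byte transposition applied to numeric data before compression. byte_shuffle and byte_unshuffle are hypothetical helper names, not part of qs2, and the standard-library zlib at level 1 stands in for ZSTD, so the printed sizes are illustrative only.

```python
import struct
import zlib


def byte_shuffle(data: bytes, width: int) -> bytes:
    """Group byte j of every element together (all byte-0s, then all byte-1s, ...).

    Hypothetical helper for illustration; not the qs2 implementation.
    """
    if len(data) % width:
        raise ValueError("length must be a multiple of the element width")
    # data[j::width] picks byte j from each fixed-width element
    return b"".join(data[j::width] for j in range(width))


def byte_unshuffle(data: bytes, width: int) -> bytes:
    """Invert byte_shuffle, restoring the original element layout."""
    if len(data) % width:
        raise ValueError("length must be a multiple of the element width")
    n = len(data) // width
    out = bytearray(len(data))
    for j in range(width):
        # the j-th block of n bytes goes back to byte position j of each element
        out[j::width] = data[j * n:(j + 1) * n]
    return bytes(out)


# A slowly varying numeric vector packed as little-endian doubles (8 bytes each).
# Neighbouring values share their high-order (sign/exponent) bytes, which the
# shuffle places next to each other so the compressor sees long similar runs.
values = [i * 0.001 for i in range(100_000)]
raw = struct.pack(f"<{len(values)}d", *values)
shuffled = byte_shuffle(raw, 8)

print("plain:   ", len(zlib.compress(raw, 1)))
print("shuffled:", len(zlib.compress(shuffled, 1)))
```

How much shuffling helps depends on the data and on the compressor; as noted above, recent ZSTD versions appear to recover much of this gain without it, so measuring on your own objects is the reliable test.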
Thanks @traversc, it seems to work and tests fine. Noted on the recommended settings. Just a thought: to really get some mileage out of qs2, I could eventually enable it as the default when qs2 is installed, rather than making it an opt-in for the user. For this to work smoothly, I'd like to be able to swap the functions provided to …
Outside of those functions, I am writing header info and a hash. The … I think that might be a little cumbersome, so I'd be inclined to keep it as an opt-in.
Sure, it was just a thought. I think I can work with what you have here.
I did a short analysis to benchmark qs2 in nanonext. Inter-process on the same machine, sending 1 GB of data takes 1.9 seconds using "serial" and 4.4 seconds using qs2. So the benefit of qs2 (or of compression in general) depends heavily on network speed and on what your data looks like; it's not always better. qs2 will outperform "serial" once network speed drops below ~500 Mb/s. The bottleneck for qs2 is ZSTD compression speed, which is several hundred MB/s depending on the data. There are things that can be done to improve the tradeoff:
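Separately, the crossover in the benchmark above can be reasoned about with a rough model, sketched here in Python under loudly stated assumptions: the measured 1.9 s and 4.4 s are treated as fixed per-GB overheads, network time is added on top with no overlap between compression and transmission, and ratio (compressed size divided by original size) is a hypothetical input. The actual ratio behind the ~500 Mb/s figure is not stated, so this does not reproduce that number; transfer_time and crossover_mbps are illustrative names only.

```python
import math


def transfer_time(encode_s: float, size_gb: float, bandwidth_mbps: float) -> float:
    """Rough end-to-end time: fixed encode/IPC cost plus wire time.

    Assumes no overlap between compression and transmission (a simplification).
    """
    return encode_s + size_gb * 8000 / bandwidth_mbps  # 1 GB = 8000 Mb (decimal units)


def crossover_mbps(serial_s: float, qs2_s: float, size_gb: float, ratio: float) -> float:
    """Bandwidth (Mb/s) below which the compressed path finishes first.

    Solves serial_s + 8000*size/B == qs2_s + 8000*size*ratio/B for B.
    """
    if not 0 < ratio < 1:
        raise ValueError("ratio must be strictly between 0 and 1")
    extra_cost = qs2_s - serial_s  # extra time spent compressing
    if extra_cost <= 0:
        raise ValueError("model assumes the compressed path is slower to encode")
    return 8000 * size_gb * (1 - ratio) / extra_cost


# Timings from the benchmark above (1 GB, same machine); the ratios are hypothetical.
for ratio in (0.25, 0.5, 0.75):
    b = crossover_mbps(1.9, 4.4, 1.0, ratio)
    print(f"ratio={ratio}: compressed path wins below ~{b:.0f} Mb/s")
```

The model makes the point above explicit: the better the data compresses, the higher the bandwidth at which compression still pays off, and data that barely compresses only benefits on slow links.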
Makes sense, and very useful to know. I wonder what the tradeoffs are for thousands of smaller objects of around 1 MB each. AFAIK …
Not for merging
Very quick concept.
The short example I've added to the end of the recv() docs appears to work well. @traversc
@wlandau FYI