qs2 proof of concept #57
Draft · wants to merge 12 commits into base: main

Conversation

shikokuchuo (Owner)

Not for merging

Very quick concept.

The short example I've added to the end of the recv() docs appears to work well.

@traversc
@wlandau FYI

shikokuchuo marked this pull request as draft on November 9, 2024, 21:11
shikokuchuo (Owner, Author) commented Nov 9, 2024

@traversc

Just 3 requests if I may:

  1. Pls make the buffer type unsigned char rather than char. It's more usual for binary data (e.g. R itself defines Rbyte as unsigned char: https://github.com/wch/r-source/blob/2aeb2367202a89d408afed23839b139197b6bfde/src/include/Rinternals.h#L64).

  2. Pls make the size type size_t rather than uint64_t (compiler warning here: https://github.com/shikokuchuo/nanonext/actions/runs/11759589221/job/32759211560?pr=57#step:6:75).

  3. Either provide a qs2_free() function if you're allocating the buffer, so that a consumer can be sure what they're using is correct, or as this is R, use R_Calloc() when you allocate so I can call R_Free() on my side. The latter would integrate well with my existing code, but I think the former would be a more idiomatic general solution.
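
For concreteness, a minimal sketch of the interface these three requests imply (editorial illustration; the prototype names and exact signatures are assumptions, not qs2's actual header):

#include <Rinternals.h>  /* SEXP */
#include <stddef.h>      /* size_t */

/* Hypothetical prototypes reflecting requests 1-3: an unsigned char
   buffer, a size_t length out-parameter, and a paired free function so
   the consumer releases memory with the same allocator that created it. */
unsigned char *qs2_serialize(SEXP object, size_t *len);
void qs2_free(unsigned char *buffer);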

traversc

> @traversc
>
> Just 3 requests if I may:
>
>   1. Pls make the buffer type unsigned char rather than char. It's more usual for binary data (e.g. R itself defines Rbyte as unsigned char: https://github.com/wch/r-source/blob/2aeb2367202a89d408afed23839b139197b6bfde/src/include/Rinternals.h#L64).
>   2. Pls make the size type size_t rather than uint64_t (compiler warning here: https://github.com/shikokuchuo/nanonext/actions/runs/11759589221/job/32759211560?pr=57#step:6:75).
>   3. Either provide a qs2_free() function if you're allocating the buffer, so that a consumer can be sure what they're using is correct, or as this is R, use R_Calloc() when you allocate so I can call R_Free() on my side. The latter would integrate well with my existing code, but I think the former would be a more idiomatic general solution.

Thanks, all good suggestions! I've been using char instead of unsigned char as lots of older libraries (e.g. lz4) used char. But you're right, the newer convention seems to be to use unsigned char. I'll get back to you shortly.

shikokuchuo (Owner, Author)

Great thanks @traversc. No particular rush from my side. I have a small bugfix release of nanonext already lined up, and I'd be happy for this to enter at the next feature release.

I'll open up new PRs at nanonext and mirai for the actual work.

I haven't arrived at a conclusion on the user interface yet, but I'd probably be keen to offer just one qs2 option if that makes sense. Are there options for compression / shuffle that you can recommend as defaults?

traversc

Check out the latest commit. Here is a revised example:

SEXP test_qs_serialize(SEXP x) {
  size_t len = 0;                                                // receives the serialized buffer length
  unsigned char * buffer = c_qs_serialize(x, &len, 10, true, 4); // object, &length (out), compress_level, shuffle, nthreads
  SEXP y = c_qs_deserialize(buffer, len, false, 4);              // buffer, buffer length, validate_checksum, nthreads
  c_qs_free(buffer);                                             // buffer must be manually freed with c_qs_free()
  return y;
}

traversc

> I haven't arrived at a conclusion on the user interface yet, but I'd probably be keen to offer just one qs2 option if that makes sense. Are there options for compression / shuffle that you can recommend as defaults?

I'd recommend the following:

qs2_serialize(data, &buf.cur, 1 /*compress level*/, true /*shuffle*/, 1);

The default compress level in the package is 3, but that's because I'm concerned with minimizing disk usage over the long term. For in-memory / over-network use, CL 1 is to me a better tradeoff since the objects are temporary.

I recommend shuffle = true. This often provides a moderate compression improvement on numerical data at very little computational cost. In previous versions of ZSTD this used to be a massive benefit, but in more recent ZSTD versions shuffling does not seem to be as important (ZSTD is likely finding the compression improvement by itself). You can experiment with your data.
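
As one way to experiment with your data, a small sketch comparing compressed sizes with shuffle on and off, using the c_qs_serialize() / c_qs_free() interface from the example earlier in this thread (the wrapper name is hypothetical, and those prototypes are assumed to be in scope):

#include <Rinternals.h>
#include <stdbool.h>
#include <stddef.h>

// Serialize the same object twice at compress level 1 (the recommended
// default above) and return the two buffer lengths for comparison.
SEXP compare_shuffle(SEXP x) {
  size_t len_shuffle = 0, len_plain = 0;
  unsigned char *b1 = c_qs_serialize(x, &len_shuffle, 1, true, 1);
  unsigned char *b2 = c_qs_serialize(x, &len_plain, 1, false, 1);
  c_qs_free(b1);  // buffers must be freed manually
  c_qs_free(b2);
  SEXP out = Rf_allocVector(REALSXP, 2);
  REAL(out)[0] = (double) len_shuffle;  // bytes with shuffle = true
  REAL(out)[1] = (double) len_plain;    // bytes with shuffle = false
  return out;
}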

shikokuchuo (Owner, Author)

Thanks @traversc, seems to work and test fine. Noted on the recommended settings.

Just a thought I had: to really get some mileage out of qs2, I could eventually enable it as the default when qs2 is installed, rather than as an opt-in by the user.

For this to work smoothly, I'd like to be able to swap the functions provided to R_InitOutPStream() and R_InitInPStream(). I don't know if it's that simple in this case, or whether you're doing more around the serialization interface.
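
To illustrate what swapping those functions involves (an editorial sketch: R_InitOutPStream(), R_Serialize() and the R_outpstream_st fields are R's documented serialization API, but the growable sink and function names here are illustrative, not nanonext's or qs2's actual code):

#include <R.h>
#include <Rinternals.h>
#include <string.h>

// A growable byte sink passed through the stream's data pointer.
typedef struct { unsigned char *buf; size_t len, cap; } sink_t;

static void sink_bytes(R_outpstream_t stream, void *src, int n) {
  sink_t *s = (sink_t *) stream->data;
  while (s->len + (size_t) n > s->cap) {  // grow geometrically as needed
    s->cap = s->cap ? 2 * s->cap : 4096;
    s->buf = R_Realloc(s->buf, s->cap, unsigned char);
  }
  memcpy(s->buf + s->len, src, (size_t) n);
  s->len += (size_t) n;
}

static void sink_char(R_outpstream_t stream, int c) {
  unsigned char b = (unsigned char) c;
  sink_bytes(stream, &b, 1);              // delegate to the byte writer
}

// Serialize x into the sink via R's native stream machinery; the idea
// above would be to point OutBytes/OutChar at qs2 instead.
static void serialize_to_sink(SEXP x, sink_t *s) {
  struct R_outpstream_st stream;
  R_InitOutPStream(&stream, (R_pstream_data_t) s, R_pstream_binary_format, 3,
                   sink_char, sink_bytes, NULL, R_NilValue);
  R_Serialize(x, &stream);
}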

traversc

Outside of those functions I am writing header info and a hash. The R_pstream_data_t holds various templated classes, so we'd need at least 4 more functions for creating and freeing the data structures for In and Out.

I think that might be a little cumbersome, so I'd be inclined to keep it as an opt in.

shikokuchuo (Owner, Author)

Sure, was just a thought. I think I can work with what you have here.

traversc

I did a short analysis to benchmark qs2 in nanonext.

Inter-process (same machine), sending 1 GB of data takes 1.9 seconds using "serial" and 4.4 seconds using qs2.

So the benefit of using qs2 (or compression in general) will depend heavily on network speed and also on what your data looks like. It's not always better.

Simulated network speed:

[Figure "nanonext_qs2_bench": total send time for qs2 vs serial across simulated network speeds]

qs2 will outperform serial once network speed drops below ~500 Mb/s. The bottleneck for qs2 is ZSTD compression speed, which is several hundred MB/s depending on the data.
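
As a rough model of that crossover (editorial sketch; the symbols and the ~0.8 compression ratio are illustrative assumptions, not measurements from this thread): sending S bytes uncompressed takes about S/B at network bandwidth B, while qs2 takes about S/C + rS/B, where C is compression throughput and r is the compressed-size ratio (decompression ignored). Compression pays off when S/C + rS/B < S/B, i.e. when B < C(1 - r). With C ~ 300 MB/s and r ~ 0.8, that gives a crossover near 60 MB/s, roughly 500 Mb/s, consistent with the plot.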

There are things that can be done to improve the tradeoff:

  • Allow multithreading
  • Async compress data and send data (Difficult, would require refactors)
  • Look into other compression algorithms that are faster at low compression levels (would require some research)

wlandau commented Nov 15, 2024

Makes sense, and very useful to know.

I wonder what the tradeoffs are for thousands of smaller objects around 1 MB or so. AFAIK mirai retains the data for each task until the task is complete, because sometimes it needs to retry a task. I wonder if compression could help with the accumulation of these smaller objects waiting to finish on thousands of parallel workers.
