qs2 proof of concept #57
Draft · wants to merge 12 commits into base: main

Conversation

shikokuchuo (Owner)

Not for merging

Very quick concept.

The short example I've added to the end of the recv() docs appears to work well.

@traversc
@wlandau FYI

shikokuchuo marked this pull request as draft on November 9, 2024, 21:11
shikokuchuo (Owner, Author) commented Nov 9, 2024

@traversc

Just 3 requests if I may:

  1. Pls make the buffer type unsigned char rather than char. It's more usual for binary data (e.g. R itself defines Rbyte as unsigned char: https://github.com/wch/r-source/blob/2aeb2367202a89d408afed23839b139197b6bfde/src/include/Rinternals.h#L64).

  2. Pls make the size type size_t rather than uint64_t (compiler warning here: https://github.com/shikokuchuo/nanonext/actions/runs/11759589221/job/32759211560?pr=57#step:6:75).

  3. Either provide a qs2_free() function if you're allocating the buffer, so that a consumer can be sure what they're using is correct, or as this is R, use R_Calloc() when you allocate so I can call R_Free() on my side. The latter would integrate well with my existing code, but I think the former would be a more idiomatic general solution.
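
For concreteness, a minimal sketch of the interface these three requests imply (editorial illustration; the prototype names and exact signatures are assumptions, not qs2's actual header):

#include <Rinternals.h>  /* SEXP */
#include <stddef.h>      /* size_t */

/* Hypothetical prototypes reflecting requests 1-3: an unsigned char
   buffer, a size_t length out-parameter, and a paired free function so
   the consumer releases memory with the same allocator that created it. */
unsigned char *qs2_serialize(SEXP object, size_t *len);
void qs2_free(unsigned char *buffer);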

traversc

> @traversc
>
> Just 3 requests if I may:
>
>   1. Pls make the buffer type unsigned char rather than char. It's more usual for binary data (e.g. R itself defines Rbyte as unsigned char: https://github.com/wch/r-source/blob/2aeb2367202a89d408afed23839b139197b6bfde/src/include/Rinternals.h#L64).
>   2. Pls make the size type size_t rather than uint64_t (compiler warning here: https://github.com/shikokuchuo/nanonext/actions/runs/11759589221/job/32759211560?pr=57#step:6:75).
>   3. Either provide a qs2_free() function if you're allocating the buffer, so that a consumer can be sure what they're using is correct, or as this is R, use R_Calloc() when you allocate so I can call R_Free() on my side. The latter would integrate well with my existing code, but I think the former would be a more idiomatic general solution.

Thanks, all good suggestions! I've been using char instead of unsigned char as lots of older libraries (e.g. lz4) used char. But you're right, the newer convention seems to be to use unsigned char. I'll get back to you shortly.

shikokuchuo (Owner, Author)

Great thanks @traversc. No particular rush from my side. I have a small bugfix release of nanonext already lined up, and I'd be happy for this to enter at the next feature release.

I'll open up new PRs at nanonext and mirai for the actual work.

I haven't arrived at a conclusion on the user interface yet, but I'd probably be keen to offer just one qs2 option if that makes sense. Are there options for compression / shuffle that you can recommend as defaults?

traversc

Check out the latest commit. Here is a revised example:

SEXP test_qs_serialize(SEXP x) {
  size_t len = 0;                                                // receives the serialized buffer length
  unsigned char * buffer = c_qs_serialize(x, &len, 10, true, 4); // object, &length (out), compress_level, shuffle, nthreads
  SEXP y = c_qs_deserialize(buffer, len, false, 4);              // buffer, buffer length, validate_checksum, nthreads
  c_qs_free(buffer);                                             // buffer must be manually freed with c_qs_free()
  return y;
}

traversc

> I haven't arrived at a conclusion on the user interface yet, but I'd probably be keen to offer just one qs2 option if that makes sense. Are there options for compression / shuffle that you can recommend as defaults?

I'd recommend the following:

qs2_serialize(data, &buf.cur, 1 /*compress level*/, true /*shuffle*/, 1);

The default compress level in the package is 3, but that's because I'm concerned with minimizing disk usage over the long term. For in-memory / over-network use, CL 1 is to me a better tradeoff since the objects are temporary.

I recommend shuffle = true. This often provides a moderate compression improvement on numerical data at very little computational cost. In previous versions of ZSTD this used to be a massive benefit, but in more recent ZSTD versions shuffling does not seem to be as important (ZSTD is likely finding the compression improvement by itself). You can experiment with your data.
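
As one way to experiment with your data, a small sketch comparing compressed sizes with shuffle on and off, using the c_qs_serialize() / c_qs_free() interface from the example earlier in this thread (the wrapper name is hypothetical, and those prototypes are assumed to be in scope):

#include <Rinternals.h>
#include <stdbool.h>
#include <stddef.h>

// Serialize the same object twice at compress level 1 (the recommended
// default above) and return the two buffer lengths for comparison.
SEXP compare_shuffle(SEXP x) {
  size_t len_shuffle = 0, len_plain = 0;
  unsigned char *b1 = c_qs_serialize(x, &len_shuffle, 1, true, 1);
  unsigned char *b2 = c_qs_serialize(x, &len_plain, 1, false, 1);
  c_qs_free(b1);  // buffers must be freed manually
  c_qs_free(b2);
  SEXP out = Rf_allocVector(REALSXP, 2);
  REAL(out)[0] = (double) len_shuffle;  // bytes with shuffle = true
  REAL(out)[1] = (double) len_plain;    // bytes with shuffle = false
  return out;
}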

shikokuchuo (Owner, Author)

Thanks @traversc, seems to work and test fine. Noted on the recommended settings.

Just a thought I had: to really get some mileage out of qs2, I could eventually enable it as the default when qs2 is installed, rather than as an opt-in by the user.

For this to work smoothly, I'd like to be able to swap the functions provided to R_InitOutPStream() and R_InitInPStream(). I don't know if it's that simple in this case, or whether you're doing more around the serialization interface.
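
To illustrate what swapping those functions involves (an editorial sketch: R_InitOutPStream(), R_Serialize() and the R_outpstream_st fields are R's documented serialization API, but the growable sink and function names here are illustrative, not nanonext's or qs2's actual code):

#include <R.h>
#include <Rinternals.h>
#include <string.h>

// A growable byte sink passed through the stream's data pointer.
typedef struct { unsigned char *buf; size_t len, cap; } sink_t;

static void sink_bytes(R_outpstream_t stream, void *src, int n) {
  sink_t *s = (sink_t *) stream->data;
  while (s->len + (size_t) n > s->cap) {  // grow geometrically as needed
    s->cap = s->cap ? 2 * s->cap : 4096;
    s->buf = R_Realloc(s->buf, s->cap, unsigned char);
  }
  memcpy(s->buf + s->len, src, (size_t) n);
  s->len += (size_t) n;
}

static void sink_char(R_outpstream_t stream, int c) {
  unsigned char b = (unsigned char) c;
  sink_bytes(stream, &b, 1);              // delegate to the byte writer
}

// Serialize x into the sink via R's native stream machinery; the idea
// above would be to point OutBytes/OutChar at qs2 instead.
static void serialize_to_sink(SEXP x, sink_t *s) {
  struct R_outpstream_st stream;
  R_InitOutPStream(&stream, (R_pstream_data_t) s, R_pstream_binary_format, 3,
                   sink_char, sink_bytes, NULL, R_NilValue);
  R_Serialize(x, &stream);
}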

traversc

Outside of those functions I am writing header info and a hash. The R_pstream_data_t holds various templated classes, so we'd need at least 4 more functions for creating and freeing the data structures for In and Out.

I think that might be a little cumbersome, so I'd be inclined to keep it as an opt in.

shikokuchuo (Owner, Author)

Sure, was just a thought. I think I can work with what you have here.

traversc

I did a short analysis to benchmark qs2 in nanonext.

Inter-process (same machine), sending 1 GB of data takes 1.9 seconds using "serial" and 4.4 seconds using qs2.

So the benefit of using qs2 (or compression in general) will depend heavily on network speed and also on what your data looks like. It's not always better.

Simulated network speed:

[Figure "nanonext_qs2_bench": total send time for qs2 vs serial across simulated network speeds]

qs2 will outperform serial once network speed drops below ~500 Mb/s. The bottleneck for qs2 is ZSTD compression speed, which is several hundred MB/s depending on the data.
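
As a rough model of that crossover (editorial sketch; the symbols and the ~0.8 compression ratio are illustrative assumptions, not measurements from this thread): sending S bytes uncompressed takes about S/B at network bandwidth B, while qs2 takes about S/C + rS/B, where C is compression throughput and r is the compressed-size ratio (decompression ignored). Compression pays off when S/C + rS/B < S/B, i.e. when B < C(1 - r). With C ~ 300 MB/s and r ~ 0.8, that gives a crossover near 60 MB/s, roughly 500 Mb/s, consistent with the plot.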

There are things that can be done to improve the tradeoff:

  • Allow multithreading
  • Async compress data and send data (Difficult, would require refactors)
  • Look into other compression algorithms that are faster at low compression levels (would require some research)

wlandau commented Nov 15, 2024

Makes sense, and very useful to know.

I wonder what the tradeoffs are for thousands of smaller objects around 1 MB or so. AFAIK mirai retains the data for each task until the task is complete, because sometimes it needs to retry a task. I wonder if compression could help with the accumulation of these smaller objects waiting to finish on thousands of parallel workers.
