Is it possible to get pointer of Uint8List instead of allocating then copy #31
Comments
The simplest solution would be to use C memory as the buffer. You can use a
With pure usage of
Your code contains, for example:
var sourceBuffer = List<int>();
...
await for (final chunk in stream) {
...
sourceBuffer.addAll(chunk);
...
nextSrcSize = _decompressFrame(
context, dstBuffer, dstSizePtr, srcBuffer, srcSizePtr);
...
sourceBuffer = sourceBuffer.sublist(consumedSrcSize);
Firstly, please note that using
Secondly, instead of using the
int sourceBufferPos = 0;
int bufferSize = 4 * 1024;
var sourceBuffer = allocate<Uint8>(bufferSize);
// ^^^^^^^^^^^^^^^^^^^^^^^^^^-- make a generous estimate of the chunk sizes we get
...
await for (final chunk in stream) {
...
// <- some code to resize [sourceBuffer] if our estimate above was too small
sourceBuffer.asTypedList(bufferSize).setRange(sourceBufferPos, <length>, ...);
...
nextSrcSize = _decompressFrame(..., sourceBuffer, ...);
Another comment regarding your code, you can use the
...
srcSizePtr[0] = sourceBuffer.length; // <-- Instead of srcSizePtr.asTypedList(1).setAll(0, [sourceBuffer.length]);
...
final consumedSrcSize = srcSizePtr[0]; // <-- Instead of srcSizePtr.elementAt(0).value
Lastly, please notice that the |
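A minimal sketch of the native-buffer approach described in this comment, written against the current `package:ffi` allocators (`calloc`) rather than the `allocate` function used in the thread. The helper name `appendChunk`, the 4 KiB capacity, and the sample chunks are assumptions for illustration, and the real `_decompressFrame` call is only indicated by a comment.

```dart
import 'dart:ffi';
import 'dart:typed_data';

import 'package:ffi/ffi.dart'; // assumed dependency: provides `calloc`

/// Copies [chunk] into the native buffer [dst] at [pos] and returns the new
/// write position. Hypothetical helper, not part of any library.
int appendChunk(Pointer<Uint8> dst, int capacity, int pos, Uint8List chunk) {
  if (pos + chunk.length > capacity) {
    throw StateError('native buffer too small; resize before copying');
  }
  // asTypedList exposes the C memory as a Uint8List view; setRange performs
  // the single Dart-heap -> C-heap copy.
  dst.asTypedList(capacity).setRange(pos, pos + chunk.length, chunk);
  return pos + chunk.length;
}

void main() {
  const bufferSize = 4 * 1024;
  final sourceBuffer = calloc<Uint8>(bufferSize);
  // Uint64 mirrors the thread's code; it is not a portable stand-in for size_t.
  final srcSizePtr = calloc<Uint64>();
  try {
    var pos = 0;
    pos = appendChunk(
        sourceBuffer, bufferSize, pos, Uint8List.fromList([1, 2, 3]));
    pos = appendChunk(sourceBuffer, bufferSize, pos, Uint8List.fromList([4, 5]));

    srcSizePtr[0] = pos; // index operator instead of asTypedList(1).setAll
    // A native call such as _decompressFrame(..., sourceBuffer, srcSizePtr)
    // would go here.
    final consumedSrcSize = srcSizePtr[0];
    print(consumedSrcSize); // 5
  } finally {
    calloc.free(sourceBuffer);
    calloc.free(srcSizePtr);
  }
}
```

Tracking a write position avoids rebuilding the Dart-side list with `sublist` on every chunk; each chunk is copied once, straight into C memory.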
As always, if you can reduce your code to a small benchmark we could take a look and see if we can make something faster. Exposing pointers to Dart VM heap managed objects would need to be thought through very carefully, so it's unlikely to happen very soon. |
@mkustermann Thanks for the hints! Good to know the plan, I will create some FFI functions to allocate native memory directly then. |
@mkustermann Also, I cannot find size_t in Dart FFI, so my code uses Uint64, which is not accurate and will crash on non-x64 builds. Any suggestions? |
@mkustermann Regarding the native buffer suggestion, I thought about it but I feel it makes things worse. No matter how we implement this, the compress / decompress function takes its input buffer from the Dart heap, and its output buffer will eventually land on the Dart heap (and thus be accessible from Dart functions). The problem is that the native algorithm does not have access to the Dart heap, so we copy to the C heap and then copy back to the Dart heap, which is definitely not efficient. Given that input and output have to be on the Dart heap, the most efficient way must be to give native code access to it, so only if the Dart VM provides 2 APIs can we achieve this efficiently:
1. Be able to pass a Uint8List as a Pointer, to grant native code access to the raw input.
2. (This is actually halfway done.) Grant native code access to output on the Dart heap (I assume the allocate function in dart:ffi does this). What is missing is a way to let a Dart reference (Uint8List) claim the memory of the output buffer, with the GC taking control (not sure if Uint8List.view lets the GC take control of deallocating the memory).
Does that make sense? |
That is not necessary. You can have the buffer live in C heap, and expose it to Dart with
Finalizers are tracked in dart-lang/sdk#35770.
As @mkustermann mentioned, please provide a benchmark so that we can assess whether it's the copying that's slow, or whether it's something else that we should optimize. |
@hanabi1224 If you follow our advice (**) - how much time is spent in copying vs performing the actual compression/decompression in
(**) By having two buffers in C (the input/output buffers for the compression/decompression), copying into the input buffer before compression/decompression, copying out of the C buffer after compression/decompression, and performing no other copies. |
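One rough way to answer the "how much time is spent in copying" question is to time only the two copies. The sketch below assumes `package:ffi` for `malloc`; the helper name `roundTripCopy` and the 16 MiB input are made up for illustration, and a single C buffer stands in for the separate input and output buffers to keep it short. The native call itself is left as a comment and is not timed.

```dart
import 'dart:ffi';
import 'dart:typed_data';

import 'package:ffi/ffi.dart'; // assumed dependency: provides `malloc`

/// Copies [input] into a C buffer and back out, accumulating the time spent
/// in both copies on [copyTimer]. Hypothetical helper for measurement only.
Uint8List roundTripCopy(Uint8List input, Stopwatch copyTimer) {
  final native = malloc<Uint8>(input.length);
  try {
    copyTimer.start();
    native.asTypedList(input.length).setAll(0, input); // Dart heap -> C heap
    copyTimer.stop();

    // The native compression/decompression call would run here, untimed.

    copyTimer.start();
    // fromList copies the bytes out of the C buffer (C heap -> Dart heap).
    final output = Uint8List.fromList(native.asTypedList(input.length));
    copyTimer.stop();
    return output;
  } finally {
    malloc.free(native);
  }
}

void main() {
  const size = 16 * 1024 * 1024;
  final input = Uint8List(size)..fillRange(0, size, 7);
  final copyTimer = Stopwatch();
  final output = roundTripCopy(input, copyTimer);
  print('copied ${output.length} bytes twice in '
      '${copyTimer.elapsedMicroseconds} us');
}
```

Comparing this number with the time of the native call on the same input gives a first estimate of whether copying is the bottleneck at all.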
@dcharkes @mkustermann I have refined my code with @mkustermann's suggestions to replace List with BytesBuilder, but one-shot decompression is still much faster than stream decompression, even though both use exactly the same native API. The only difference, as far as I can tell, is that the stream API performs multiple copies of the source buffer. I just created a benchmark here
Bench result on CI: https://cirrus-ci.com/task/5740408574836736
BTW, if you want to run the bench locally, Rust is needed in addition to a C compiler (install command: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable) |
@mkustermann Regarding your advice (**): in one-shot compression / decompression, the input / output buffer is only allocated once (currently with the dart:ffi allocate method; I tried native allocation but I don't see a difference). In stream decompression mode, the input buffer is streamed, so it's impossible to copy only once; the output buffer is reused, though, so it's only allocated once (with the dart:ffi allocate method). |
@hanabi1224 I have a bit of trouble building the rust lib:
|
@dcharkes Ah it's the submodule of zlib source code, sry for not mentioning that, please run git submodule update --init --recursive |
edit: |
Replace all your
-->
Explanation
Which shows that BytesBuilder is copying data. The thing that is slowest after that is |
@dcharkes Awesome, thanks! |
@dcharkes Regarding the effort of replacing
Another question: how do I efficiently make a slice of a BytesBuilder? Not sure if I'm doing it properly:
final tmpBuffer = sourceBufferBuilder.takeBytes();
final remaining = Uint8List.view(tmpBuffer.buffer, consumedSrcSize,
tmpBuffer.length - consumedSrcSize);
sourceBufferBuilder.add(remaining); |
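For reference, the slicing pattern in that snippet can be written as a small self-contained function. This is only a sketch: `dropConsumed` is a made-up name, and `Uint8List.sublistView` is used in place of `Uint8List.view` because it takes element offsets relative to the list it is given. Creating the view copies nothing, but `add` on a default (`copy: true`) BytesBuilder still copies the remaining bytes once.

```dart
import 'dart:io' show BytesBuilder;
import 'dart:typed_data';

/// Drops the first [consumed] bytes held by [builder] and keeps the rest.
/// Hypothetical helper, not part of any library.
void dropConsumed(BytesBuilder builder, int consumed) {
  final all = builder.takeBytes(); // returns the bytes and empties the builder
  // A view shares the underlying buffer; no bytes are copied here.
  final remaining = Uint8List.sublistView(all, consumed);
  // With the default `copy: true` builder, add() copies `remaining`.
  builder.add(remaining);
}

void main() {
  final builder = BytesBuilder();
  builder.add([1, 2, 3, 4, 5]);
  dropConsumed(builder, 2);
  print(builder.toBytes()); // [3, 4, 5]
}
```

If that remaining copy matters, keeping a read offset into the buffer (as with `sourceBufferPos` earlier in the thread) avoids re-adding the tail at all.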
Closing old issue, please reopen if issue persists. |
Echo #27
I have a package that binds to native lz4 compression lib via dart ffi, however, with this restriction, I have to duplicate the data before compression or decompression, which I really really want to avoid.
I understand a Dart reference is GC-controlled, and I can play some tricks to hold the reference so the GC does not kick in unexpectedly, rather than actually duplicating data before decompression. Please let me know your concerns. Thanks!
BTW, If you'd like to understand my scenario better, source code is here