[Go] ipc.Writer Option to skip appending data buffers #76

asfimport · 2019-08-02T01:06:57Z

For cases where we have a known shared memory region, it would be great if the ipc.Writer (and by extension ipc.Reader?) had an option to write out everything except the actual buffers holding the data. That way we can still use the IPC mechanisms to communicate without having to serialize all the underlying data across the wire.

This seems like it should be possible since the RecordBatch flatbuffers only contain the metadata and the underlying data buffers are appended later. We just need to skip appending the underlying data buffers.

@sbinet thoughts?

Reporter: Nick Poorman / @nickpoorman

Note: This issue was originally created as ARROW-6107. Please see the migration documentation for further details.


asfimport · 2019-08-02T13:18:34Z

Sebastien Binet / @sbinet:
Not saying it wouldn't be advisable or doable, but: if the data is already in a shmem region, why not just use it directly?

(and I guess it's kind of implementing: https://issues.apache.org/jira/browse/ARROW-4852)

asfimport · 2019-08-02T13:43:02Z

Nick Poorman / @nickpoorman:
https://issues.apache.org/jira/browse/ARROW-4852 is the same use case I'm thinking of.

If you have an Arrow Table in C (or Python) and you want to access the data in Go, you can pass a pointer back from C to the underlying data buffers. However, you still have to collect all the metadata to utilize the buffers. Making CGO calls is slow, so being able to pass a pointer to the data buffers and a pointer to the serialized metadata would ensure a more constant time when crossing the language boundary.

I did a simple POC to demonstrate what it would take to collect all the information from Python and re-materialize it in Go. https://github.com/nickpoorman/go-py-arrow-bridge The bottleneck is the number of CGO calls required to fetch all the metadata.
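As a rough illustration of the receiving side, the sketch below assumes the metadata has already been decoded into a list of (offset, length) pairs describing where each buffer sits inside a shared region. The `bufferMeta` type and `sliceBuffers` function are hypothetical and not part of the Arrow Go API; the point is only that, once the metadata arrives in one piece, the buffers can be re-materialized as zero-copy slices without a per-buffer call across the language boundary.

```go
package main

import "fmt"

// bufferMeta is a hypothetical decoded metadata entry: where one data
// buffer starts inside the shared region and how long it is.
type bufferMeta struct {
	Offset int64
	Length int64
}

// sliceBuffers returns one sub-slice of region per metadata entry.
// No bytes are copied; every returned slice aliases the shared region.
func sliceBuffers(region []byte, metas []bufferMeta) ([][]byte, error) {
	out := make([][]byte, 0, len(metas))
	for i, m := range metas {
		end := m.Offset + m.Length
		if m.Offset < 0 || m.Length < 0 || end < m.Offset || end > int64(len(region)) {
			return nil, fmt.Errorf("buffer %d: range [%d, %d) outside region of %d bytes",
				i, m.Offset, end, len(region))
		}
		// The three-index slice caps capacity so that an append on one
		// buffer cannot overwrite the bytes of the next one.
		out = append(out, region[m.Offset:end:end])
	}
	return out, nil
}

func main() {
	// region stands in for memory handed over from C or Python.
	region := []byte{0, 1, 2, 3, 4, 5, 6, 7}
	metas := []bufferMeta{{Offset: 0, Length: 4}, {Offset: 4, Length: 4}}

	bufs, err := sliceBuffers(region, metas)
	if err != nil {
		panic(err)
	}
	region[5] = 99          // a write to the shared region...
	fmt.Println(bufs[1][1]) // ...is visible through the view: prints 99
}
```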

asfimport · 2019-08-06T12:17:00Z

Sebastien Binet / @sbinet:
ok.

(just nit-picking but to really assess the CGo overhead, one should directly call C, not C++-via-python :P. that said, it's a nice PoC.)

SGTM.

assignUser transferred this issue from apache/arrow Aug 30, 2024
