[Go] ipc.Writer Option to skip appending data buffers #76

Open
asfimport opened this issue Aug 2, 2019 · 3 comments
Labels
Type: enhancement New feature or request

Comments

@asfimport

For cases where we have a known shared memory region, it would be great if the ipc.Writer (and, by extension, the ipc.Reader?) had an option to write out everything except the buffers that hold the actual data. That way we could still use the IPC mechanisms to communicate without having to serialize all of the underlying data across the wire.

This seems like it should be possible, since the RecordBatch flatbuffers contain only the metadata and the underlying data buffers are appended afterwards. We would just need to skip appending the data buffers.

@sbinet thoughts?

Reporter: Nick Poorman / @nickpoorman

Note: This issue was originally created as ARROW-6107. Please see the migration documentation for further details.

@asfimport

Sebastien Binet / @sbinet:
Not saying it wouldn't be advisable or doable, but: if the data is already in a shared-memory region, why not just use it directly?

(And I guess this is more or less implementing https://issues.apache.org/jira/browse/ARROW-4852.)

@asfimport

Nick Poorman / @nickpoorman:
https://issues.apache.org/jira/browse/ARROW-4852 is the same use case I'm thinking of.

If you have an Arrow Table in C (or Python) and want to access the data from Go, you can pass a pointer to the underlying data buffers back from C. However, you still have to collect all of the metadata before you can use those buffers. CGO calls are slow, so being able to pass one pointer to the data buffers and one pointer to the serialized metadata would keep the cost of crossing the language boundary roughly constant, instead of growing with the number of calls needed to gather the metadata piece by piece.

I did a simple POC to demonstrate what it would take to collect all the information from Python and re-materialize it in Go. https://github.com/nickpoorman/go-py-arrow-bridge The bottleneck is the number of CGO calls required to fetch all the metadata.

@asfimport

Sebastien Binet / @sbinet:
ok.

(Just nit-picking, but to really assess the CGo overhead one should call C directly, not C++ via Python :P. That said, it's a nice PoC.)

SGTM.

@assignUser transferred this issue from apache/arrow on Aug 30, 2024