Interoperability between arrow-rs and nanoarrow #5052
Comments
My memory is admittedly a little hazy, but I definitely remember that flatbuffers do not mandate any alignment internally. I am therefore not sure why flatcc would be including this in its verification process... Is nanoarrow using it in some especially pedantic mode or something? Edit: reviewing the linked example code it does not appear to be doing anything to guarantee the alignment of the buffer the data is being read into - https://github.com/apache/arrow-nanoarrow/blob/d104e9065101401c63e931acdc7c10f114c47eaf/examples/cmake-ipc/src/app.c#L26. Does this work with data written by other systems? If so could you perhaps provide an example of an IPC file that works and one containing the same data that doesn't? |
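The point about the reading buffer's alignment can be illustrated with a minimal sketch. The reader in this thread is C code, so this is not nanoarrow's API; it is a Rust illustration in which `Word`, `AlignedBytes`, `copy_from`, and `as_bytes` are hypothetical names. It copies arbitrary bytes into storage whose start address is guaranteed to be a multiple of 8, which removes the buffer's own placement as a variable when investigating an alignment failure.

```rust
/// One 8-byte unit with an explicit 8-byte alignment requirement.
/// (Hypothetical helper type for this sketch.)
#[derive(Clone, Copy)]
#[repr(C, align(8))]
struct Word([u8; 8]);

/// Owned byte buffer whose first byte sits on an 8-byte boundary.
struct AlignedBytes {
    words: Vec<Word>,
    len: usize,
}

impl AlignedBytes {
    /// Copy `src` into freshly allocated, 8-byte-aligned storage.
    fn copy_from(src: &[u8]) -> Self {
        let n_words = (src.len() + 7) / 8;
        let mut words = vec![Word([0u8; 8]); n_words];
        // SAFETY: `words` owns n_words * 8 initialized bytes, which is at
        // least src.len(); the two regions cannot overlap because `words`
        // was just allocated.
        unsafe {
            std::ptr::copy_nonoverlapping(
                src.as_ptr(),
                words.as_mut_ptr() as *mut u8,
                src.len(),
            );
        }
        AlignedBytes { words, len: src.len() }
    }

    /// View the stored bytes; the returned slice starts 8-byte aligned.
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: the first `len` bytes of `words` are initialized and
        // live as long as `self`.
        unsafe { std::slice::from_raw_parts(self.words.as_ptr() as *const u8, self.len) }
    }
}
```

Aligning the buffer only fixes the starting address. If offsets inside the serialized message are themselves only 4-byte aligned, a strict verifier can still reject it.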
recordbatch.tgz. Also enclosing the slices used to generate the binaries containing the RecordBatch (the working one was not generated by Rust).
|
Not sure if it helps, in the failed condition the misalignment is 4 bytes |
The provided non-working buf doesn't even read with arrow-rs, what code did you use to produce it? Edit: In fact neither file is a valid IPC stream AFAICT... |
Apologies for not being clear: the provided working and non-working samples contain only a RecordBatch, without the preceding schema, so that they can be tested with the nanoarrow example (app.c); the example app doesn't iterate over all headers. The code used to reproduce is at the beginning of the thread. I am also enclosing full samples of the working and non-working cases. |
Annotating the relevant RecordBatch we get Good
Bad
In the bad example we can see that the vectors don't contain padding to align them to an 8 byte boundary; instead they only have 4 byte alignment. This in turn means that the structs are not correctly aligned, which I suspect is what flatcc is then complaining about. This appears to be an upstream bug. Edit: Filed google/flatbuffers#8150 |
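The arithmetic behind the 4 byte versus 8 byte distinction can be sketched as follows. This is not the code of flatcc or flatbuffers; `padding_for` and `is_aligned` are hypothetical helper names, and the offset 12 in the usage note is an arbitrary illustration rather than a value taken from the attached files.

```rust
/// Number of padding bytes needed to move `offset` up to the next
/// multiple of `align`. `align` must be a power of two.
fn padding_for(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (align - (offset & (align - 1))) & (align - 1)
}

/// True when `offset` is a multiple of `align` (a power of two).
fn is_aligned(offset: usize, align: usize) -> bool {
    debug_assert!(align.is_power_of_two());
    offset & (align - 1) == 0
}
```

For example, an offset of 12 satisfies `is_aligned(12, 4)` but not `is_aligned(12, 8)`, and `padding_for(12, 8)` reports that 4 bytes of padding would be needed, which matches the kind of 4 byte misalignment reported earlier in the thread.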
@tustvold appreciate your prompt assistance :) |
@bkietz mentioned on #6449 (comment):
|
We have disabled the CI test in #6449, so as part of closing this PR we should re-enable the tests.
Closes #641. Unfortunately we just have to skip checking Rust compatibility due to apache/arrow-rs#5052 (e.g., apache/arrow-rs#6449). This PR also ensures compatibility with big endian Arrow files and Arrow files from before the continuation token. Support for those had already been added in the decoder but hadn't made it to the stream reader yet.

Local check:

```bash
# Assumes arrow-testing, arrow-nanoarrow, and arrow are all checked out in the same dir
export gold_dir=../arrow-testing/data/arrow-ipc-stream/integration
export ARROW_NANOARROW_PATH=$(pwd)/build
pip install -e "../arrow/dev/archery/[all]"
archery integration --with-nanoarrow=true --run-ipc \
  --gold-dirs=$gold_dir/0.14.1 \
  --gold-dirs=$gold_dir/0.17.1 \
  --gold-dirs=$gold_dir/1.0.0-bigendian \
  --gold-dirs=$gold_dir/1.0.0-littleendian \
  --gold-dirs=$gold_dir/2.0.0-compression \
  --gold-dirs=$gold_dir/4.0.0-shareddict
```
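For readers unfamiliar with the "continuation token" mentioned above, here is a minimal sketch of how a reader might tell the two message prefixes apart. It is not nanoarrow's or arrow-rs's implementation; `MessagePrefix`, `read_prefix`, and `le_u32` are hypothetical names. It assumes the layout described in the Arrow IPC format: newer streams begin each message with the 32-bit value `0xFFFFFFFF` followed by a little-endian 32-bit metadata length, while older streams begin with the length alone.

```rust
/// Result of reading the length prefix of an encapsulated IPC message.
#[derive(Debug, PartialEq)]
struct MessagePrefix {
    /// Bytes consumed by the prefix: 8 with a continuation token, 4 without.
    prefix_len: usize,
    /// Declared size in bytes of the metadata that follows the prefix.
    metadata_len: i32,
}

/// Marker that precedes the length in newer streams.
const CONTINUATION: u32 = 0xFFFF_FFFF;

/// Read a little-endian u32 at `at`, or None if the buffer is too short.
fn le_u32(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the prefix in either layout; None means not enough bytes.
fn read_prefix(buf: &[u8]) -> Option<MessagePrefix> {
    let first = le_u32(buf, 0)?;
    if first == CONTINUATION {
        // Newer layout: continuation token, then the metadata length.
        let len = le_u32(buf, 4)? as i32;
        Some(MessagePrefix { prefix_len: 8, metadata_len: len })
    } else {
        // Older layout: the first four bytes are already the length.
        Some(MessagePrefix { prefix_len: 4, metadata_len: first as i32 })
    }
}
```

The sketch only inspects the prefix. A real reader would go on to validate the length and decode the flatbuffer metadata that follows.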
Which part is this question about
Deserialization from arrow-rs into nanoarrow
Describe your question
I’ve encountered a problem when serializing a basic Arrow object using StreamWriter with a single RecordBatch and then deserializing it using nanoarrow: deserializing the RecordBatch fails due to header alignment verification in flatcc https://github.com/apache/arrow-nanoarrow/blob/d104e9065101401c63e931acdc7c10f114c47eaf/dist/flatcc.c#L2453
The alignment failure occurs in the calculation of the base offset and the offset of the union value relative to the base.
I'm not sure whether the problem is in arrow-rs or in flatbuffers.
Additional context
Tested arrow-rs versions 5.0.0 and 47.0.0, so this is not a regression; it appears never to have worked.
Steps to reproduce:
Create an Arrow object and save only the RecordBatch bytes
Test using nanoarrow or the example code https://github.com/apache/arrow-nanoarrow/blob/d104e9065101401c63e931acdc7c10f114c47eaf/examples/cmake-ipc/src/app.c
Reproduced on Debian 11 x86 and macOS M1
Code snippet