[Java] The offset buffer of empty BaseVariableWidthVector should not be empty when exposed through C Data Interface #40038
…-size layout should not be empty
@viirya Thanks for reporting this issue. Planning to make a PR?
@vibhatha Yeah, I'm working on a fix locally, but it currently causes a few tests to fail. Still looking into fixing the tests.
Wonderful!
…e-size layout should not be empty
It looks like both cases are acceptable (empty, or a single zero-valued element): https://lists.apache.org/thread/w7g1zfqrjxx0bvrct0mt5zwxvdnc9nob Closing this for now.
Per further discussion in the PR, we probably need to fix the C Data Interface of Java Arrow so that it properly exports the offset buffer of empty variable-size arrays.
… variable-size layout should not be empty" This reverts commit 5eb34e1.
Is there sample code we could use to reproduce the issue and look for a fix?
Hmm, the code that produces the issue is complicated. We execute TPC-DS queries in Spark/DataFusion and pass the results as Arrow vectors through the C Data Interface to Rust (arrow-rs). I saw there are some tests in
Ah, after deeper debugging today, I found we were misled by
But when I debugged it today, it actually returns a zero-valued pointer, which causes arrow-rs to treat it as a NULL pointer. There is also an update to the API documentation; it was changed to:
The above is easy to verify with a simple test in
@Test
public void testEmptyBuffer() throws Exception {
  try (ListVector empty = ListVector.empty("empty", allocator)) {
    // An empty vector's offset buffer should still have a non-zero (non-NULL) address
    Assert.assertTrue("memory address shouldn't be zero", empty.getOffsetBuffer().memoryAddress() != 0);
  }
}
It passes in the default run but fails in the unsafe run. So once
Actually we have
You could statically allocate a buffer in the allocator itself to represent the zero-size buffer (in fact, doesn't the allocator already do this?)
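The suggestion above, a statically allocated buffer standing in for every zero-size allocation, can be illustrated with a toy model. This is a hypothetical sketch: `ToyAllocator`, its fake addresses, and the `useSentinel` switch are illustrations only, not Arrow Java's `BufferAllocator` API. It contrasts an allocator that returns address 0 (NULL) for empty requests, mirroring the zero address seen in the unsafe run above, with one that returns a shared non-zero sentinel.

```java
// Hypothetical toy allocator that hands out fake, non-zero "addresses".
final class ToyAllocator {
  // Shared sentinel returned for every zero-size request when enabled.
  static final long EMPTY_BUFFER_ADDRESS = 0x1000L;

  private final boolean useSentinel;
  private long nextAddress = 0x2000L; // bump pointer; not thread-safe

  ToyAllocator(boolean useSentinel) {
    this.useSentinel = useSentinel;
  }

  long allocate(long size) {
    if (size < 0) {
      throw new IllegalArgumentException("negative size: " + size);
    }
    if (size == 0) {
      // Without a sentinel, model an allocator that returns 0 (NULL) for empty requests.
      return useSentinel ? EMPTY_BUFFER_ADDRESS : 0L;
    }
    long address = nextAddress;
    nextAddress += (size + 7) & ~7L; // keep fake addresses 8-byte aligned
    return address;
  }
}
```

A sentinel only guarantees a non-NULL pointer; the buffer behind it is still zero bytes long, so it cannot hold the single `0` offset that the variable-size layout expects for an empty array.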
A zero-size buffer? Why do we need a zero-size buffer? Do you mean a buffer holding a single zero value?
Er, weren't we talking about this?
I don't see why the allocator would give NULL for an allocation of size 4 or 8...
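For context on the "size 4 or 8" remark: a one-element offset buffer is 4 bytes for types with 32-bit offsets (e.g. Utf8, Binary, List) and 8 bytes for types with 64-bit offsets (e.g. LargeUtf8, LargeBinary, LargeList). A minimal sketch of the arithmetic; `OffsetWidth` and `OffsetBufferSize.minOffsetBufferBytes` are hypothetical names, not Arrow Java APIs.

```java
// Hypothetical offset widths for variable-size layouts.
enum OffsetWidth {
  INT32(4), // e.g. Utf8, Binary, List
  INT64(8); // e.g. LargeUtf8, LargeBinary, LargeList

  final int bytes;

  OffsetWidth(int bytes) {
    this.bytes = bytes;
  }
}

final class OffsetBufferSize {
  // The layout needs length + 1 offsets, so even an empty array needs one slot.
  static long minOffsetBufferBytes(long length, OffsetWidth width) {
    if (length < 0) {
      throw new IllegalArgumentException("negative length: " + length);
    }
    return (length + 1) * width.bytes;
  }
}
```

So for an empty array, the smallest valid offset buffer is a single zero offset: 4 bytes with 32-bit offsets, 8 bytes with 64-bit offsets.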
As you said, we already have it. But as we discussed yesterday, it is not valid under the C Data Interface. I think we are now going to send a buffer with one (zero) value instead of a zero-size buffer, aren't we?
To clarify, we currently send a zero-size buffer (i.e., As for why the allocator gives NULL for an allocation of an empty buffer, see my previous comment: #40038 (comment). In short, for the
Let me try to summarize the issue:
OK. (1) sounds fine. (2) is also fine; that's explicitly allowed as long as the buffer size really is 0. It won't be relevant after (1) is fixed, right?
Yeah, if we don't send an empty buffer as the offset buffer, it's fine.
…out through C Data Interface (#40043)

### Rationale for this change

We encountered an error when exchanging string array from Java to Rust through Arrow C data interface. At Rust side, it complains that the buffer at position 1 (offset buffer) is null. After tracing down and some debugging, it looks like the issue is Java Arrow `BaseVariableWidthVector` class assigns an empty offset buffer if the array is empty (value count 0). According to Arrow [spec](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) for variable size binary layout:

> The offsets buffer contains length + 1 signed integers ...

So for an empty string array, its offset buffer should be a buffer with one element (generally it is `0`).

### What changes are included in this PR?

This patch replaces current empty offset buffer in variable-size layout vector classes when exporting arrays through C Data Interface.

### Are these changes tested?

Added test cases.

### Are there any user-facing changes?

No

* Closes: #40038

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: David Li <[email protected]>
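The change described above (substituting a one-element offset buffer when exporting an empty variable-size array) can be sketched without the Arrow libraries. This is a hypothetical model, not the code from the PR: `ExportOffsets.offsetsForExport` operates on a `java.nio.ByteBuffer` rather than an Arrow `ArrowBuf`, and assumes 32-bit offsets.

```java
import java.nio.ByteBuffer;

// Hypothetical model of an export-time fix; not Arrow Java's actual implementation.
final class ExportOffsets {
  private static final int OFFSET_WIDTH = 4; // assumes 32-bit offsets (e.g. Utf8/Binary)

  // Substitutes a one-element offsets buffer when an empty array has none;
  // otherwise returns the input unchanged.
  static ByteBuffer offsetsForExport(ByteBuffer offsets, int valueCount) {
    if (valueCount < 0) {
      throw new IllegalArgumentException("negative valueCount: " + valueCount);
    }
    if (valueCount == 0 && offsets.capacity() < OFFSET_WIDTH) {
      // A single 0 offset; a zero value has the same bytes in either byte order.
      ByteBuffer single = ByteBuffer.allocateDirect(OFFSET_WIDTH);
      single.putInt(0, 0);
      return single;
    }
    return offsets;
  }
}
```

Doing the substitution only at export time leaves the in-memory Java vector untouched, which matches the PR's note that there are no user-facing changes.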
…ze layout through C Data Interface (apache#40043) ### Rationale for this change We encountered an error when exchanging string array from Java to Rust through Arrow C data interface. At Rust side, it complains that the buffer at position 1 (offset buffer) is null. After tracing down and some debugging, it looks like the issue is Java Arrow `BaseVariableWidthVector` class assigns an empty offset buffer if the array is empty (value count 0). According to Arrow [spec](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) for variable size binary layout: > The offsets buffer contains length + 1 signed integers ... So for an empty string array, its offset buffer should be a buffer with one element (generally it is `0`). ### What changes are included in this PR? This patch replaces current empty offset buffer in variable-size layout vector classes when exporting arrays through C Data Interface. ### Are these changes tested? Added test cases. ### Are there any user-facing changes? No * Closes: apache#40038 Authored-by: Liang-Chi Hsieh <[email protected]> Signed-off-by: David Li <[email protected]>
Describe the bug, including details regarding any error messages, version, and platform.
We encountered an error when exchanging a string array from Java to Rust through the Arrow C Data Interface. On the Rust side, it complains that the buffer at position 1 (the offset buffer) is null. After tracing it down and some debugging, it looks like the issue is that the Java Arrow `BaseVariableWidthVector` class assigns an empty offset buffer if the array is empty (value count 0). According to the Arrow spec for the variable-size binary layout:

> The offsets buffer contains length + 1 signed integers ...

So for an empty string array, its offset buffer should be a buffer with one element (generally `0`).

Component(s)
Java
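To make the spec requirement concrete, here is a minimal sketch of how the offsets of a variable-size string array are derived from its values: `length + 1` entries, each the running total of UTF-8 byte lengths. `Utf8Offsets.build` is a hypothetical helper, not an Arrow Java API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the variable-size binary layout's offsets.
final class Utf8Offsets {
  // Returns values.length + 1 offsets; offsets[i + 1] - offsets[i] is the
  // UTF-8 byte length of values[i].
  static int[] build(String[] values) {
    int[] offsets = new int[values.length + 1];
    offsets[0] = 0; // generally 0 (a sliced array may start elsewhere)
    for (int i = 0; i < values.length; i++) {
      int len = values[i].getBytes(StandardCharsets.UTF_8).length;
      offsets[i + 1] = Math.addExact(offsets[i], len); // fail loudly on int overflow
    }
    return offsets;
  }
}
```

For example, `build(new String[] {"ab", "", "h\u00e9llo"})` yields `[0, 2, 2, 8]` (the accented letter takes two bytes in UTF-8), and `build(new String[0])` yields `[0]`: the one-element offset buffer this issue asks for.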