Skip to content

Commit

Permalink
apacheGH-40038: [Java] Export non empty offset buffer for variable-si…
Browse files Browse the repository at this point in the history
…ze layout through C Data Interface (apache#40043)

### Rationale for this change

We encountered an error when exchanging string array from Java to Rust through Arrow C data interface. At Rust side, it complains that the buffer at position 1 (offset buffer) is null. After tracing down and some debugging, it looks like the issue is Java Arrow `BaseVariableWidthVector` class assigns an empty offset buffer if the array is empty (value count 0).

According to Arrow [spec](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout) for variable size binary layout:

> The offsets buffer contains length + 1 signed integers ...

So for an empty string array, its offset buffer should be a buffer with one element (generally it is `0`).

### What changes are included in this PR?

This patch replaces current empty offset buffer in variable-size layout vector classes when exporting arrays through C Data Interface.

### Are these changes tested?

Added test cases.

### Are there any user-facing changes?

No

* Closes: apache#40038

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: David Li <[email protected]>
  • Loading branch information
viirya authored and rok committed May 8, 2024
1 parent f8fa0a5 commit e4f4745
Showing 1 changed file with 5 additions and 15 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -553,21 +553,11 @@ public void splitAndTransfer(int startIndex, int length) {
Preconditions.checkArgument(startIndex >= 0 && length >= 0 && startIndex + length <= valueCount,
"Invalid parameters startIndex: %s, length: %s for valueCount: %s", startIndex, length, valueCount);
to.clear();
if (length > 0) {
final int startPoint = offsetBuffer.getInt(startIndex * OFFSET_WIDTH);
final int sliceLength = offsetBuffer.getInt((startIndex + length) * OFFSET_WIDTH) - startPoint;
to.offsetBuffer = to.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
/* splitAndTransfer offset buffer */
for (int i = 0; i < length + 1; i++) {
final int relativeOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - startPoint;
to.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeOffset);
}
/* splitAndTransfer validity buffer */
splitAndTransferValidityBuffer(startIndex, length, to);
/* splitAndTransfer data buffer */
dataTransferPair.splitAndTransfer(startPoint, sliceLength);
to.lastSet = length - 1;
to.setValueCount(length);
to.offsetBuffer = to.allocateOffsetBuffer((length + 1) * OFFSET_WIDTH);
/* splitAndTransfer offset buffer */
for (int i = 0; i < length + 1; i++) {
final int relativeOffset = offsetBuffer.getInt((startIndex + i) * OFFSET_WIDTH) - startPoint;
to.offsetBuffer.setInt(i * OFFSET_WIDTH, relativeOffset);
}
}

Expand Down

0 comments on commit e4f4745

Please sign in to comment.