[Java] VarBinaryWriter should support writing from byte[] or ByteBuffer #37705

Closed
lidavidm opened this issue Sep 13, 2023 · 1 comment · Fixed by #37791 or #37883

Comments

@lidavidm
Member

Describe the enhancement requested

The writer API only supports writing from ArrowBuf. If you don't already have an ArrowBuf, you have to make a new allocation and copy the data into it, only for the writer to copy the data again from that allocation into the vector. VarBinaryWriter should also accept byte[] and ByteBuffer, the common types for byte strings, which would eliminate the unnecessary copy. (VarBinaryVector already has overloads for both.)
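
To make the extra copy concrete, here is a minimal toy model using only the JDK. It is not Arrow code: `ToyVector`, `writeFromOffHeap`, and `writeFromArray` are hypothetical names invented for illustration, and a direct `ByteBuffer` stands in for an ArrowBuf. The sketch only counts how many bytes each path moves.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Toy model only: direct ByteBuffers stand in for Arrow's off-heap buffers. */
final class CopyCountDemo {

    /** Hypothetical stand-in for a variable-width vector's data buffer. */
    static final class ToyVector {
        private final ByteBuffer data = ByteBuffer.allocateDirect(1024);
        long bytesCopied = 0; // total bytes moved by any copy in this demo

        /** Mirrors an off-heap-only API: the source must already be an off-heap buffer. */
        void writeFromOffHeap(ByteBuffer offHeap, int start, int end) {
            ByteBuffer slice = offHeap.duplicate(); // leave the caller's position untouched
            slice.limit(end);
            slice.position(start);
            bytesCopied += slice.remaining();
            data.put(slice);
        }

        /** Mirrors the requested overload: copy straight from a heap array. */
        void writeFromArray(byte[] value) {
            bytesCopied += value.length;
            data.put(value);
        }

        /** Returns everything written so far, for comparison. */
        byte[] contents() {
            ByteBuffer view = data.duplicate();
            view.flip();
            byte[] out = new byte[view.remaining()];
            view.get(out);
            return out;
        }
    }

    /** Old path: stage the bytes off-heap, then let the writer copy them again. */
    static long viaStagingBuffer(ToyVector vector, byte[] value) {
        ByteBuffer staging = ByteBuffer.allocateDirect(value.length);
        staging.put(value);                                 // copy 1: heap array -> staging
        vector.bytesCopied += value.length;
        vector.writeFromOffHeap(staging, 0, value.length);  // copy 2: staging -> vector
        return vector.bytesCopied;
    }

    /** Requested path: one copy and no intermediate allocation. */
    static long direct(ToyVector vector, byte[] value) {
        vector.writeFromArray(value);
        return vector.bytesCopied;
    }

    public static void main(String[] args) {
        byte[] payload = "hello arrow".getBytes(StandardCharsets.UTF_8);
        System.out.println("staged: " + viaStagingBuffer(new ToyVector(), payload) + " bytes copied");
        System.out.println("direct: " + direct(new ToyVector(), payload) + " bytes copied");
    }
}
```

Both paths leave the same bytes in the toy vector; the staged path simply moves every byte twice and pays for a second allocation.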

Component(s)

Java

@jduo
Member

jduo commented Sep 18, 2023

take

jduo added a commit to jduo/arrow that referenced this issue Sep 19, 2023
Add methods to VarBinary and LargeVarBinary writers to take in
common binary parameters - byte[] and ByteBuffer.
lidavidm pushed a commit that referenced this issue Sep 19, 2023
### Rationale for this change
ByteBuffer and byte[] are commonly used to hold binary data. The current writers require working
with ArrowBuf objects which need to be populated by copying from these types, then copying
into the vector.

### What changes are included in this PR?
Add methods to VarBinary and LargeVarBinary writers to take in common binary parameters - byte[] and ByteBuffer.
The writer now sets these objects on the Vectors directly.

### Are these changes tested?
Yes.

* Closes: #37705

Authored-by: James Duong <[email protected]>
Signed-off-by: David Li <[email protected]>
lidavidm added this to the 14.0.0 milestone Sep 19, 2023
jduo added a commit to jduo/arrow that referenced this issue Sep 26, 2023
Add write() methods for Text and String types.
Ensure these methods are part of the writer interfaces and not
just the Impls.
lidavidm pushed a commit that referenced this issue Sep 26, 2023
### Rationale for this change
Improve the convenience of using VarCharWriter and LargeVarCharWriter interfaces.
Also allow users to avoid unnecessary overhead creating Arrow buffers when writing
String and Text data.

### What changes are included in this PR?
Add write() methods for Text and String types.
Ensure these methods are part of the writer interfaces and not just the Impls.
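
As a rough illustration of why the overloads belong on the interface, the JDK-only sketch below uses hypothetical types (`ToyVarCharWriter`, `ToyVarCharWriterImpl`), not Arrow's actual classes. Code that holds only the interface type can call the `String` overload because it is declared there. A default method is used here purely for brevity; it is an assumption of this sketch and not necessarily how the Arrow writers are implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Toy sketch; none of these types are Arrow classes. */
final class InterfaceOverloadDemo {

    /** Hypothetical writer interface. */
    interface ToyVarCharWriter {
        /** Low-level entry point: write the byte range [start, end). */
        void write(byte[] utf8, int start, int end);

        /** Convenience overload declared on the interface, so every caller can see it. */
        default void write(String value) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            write(utf8, 0, utf8.length);
        }
    }

    /** Hypothetical implementation that appends to an in-memory sink. */
    static final class ToyVarCharWriterImpl implements ToyVarCharWriter {
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

        @Override
        public void write(byte[] utf8, int start, int end) {
            sink.write(utf8, start, end - start);
        }

        String written() {
            return new String(sink.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Callers usually hold the interface type; impl-only methods are not visible here. */
    static void fill(ToyVarCharWriter writer) {
        writer.write("hello");
        writer.write(", arrow");
    }

    public static void main(String[] args) {
        ToyVarCharWriterImpl impl = new ToyVarCharWriterImpl();
        fill(impl);
        System.out.println(impl.written());
    }
}
```

If `write(String)` existed only on the implementation class, `fill` would not compile without a downcast, which is the inconvenience the change above removes.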

### Are these changes tested?
Yes.

### Are there any user-facing changes?
No.
* Closes: #37705

Authored-by: James Duong <[email protected]>
Signed-off-by: David Li <[email protected]>
etseidl pushed a commit to etseidl/arrow that referenced this issue Sep 28, 2023
…e#37883)

JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
…e#37883)

dongjoon-hyun pushed a commit to apache/spark that referenced this issue Nov 4, 2023
### What changes were proposed in this pull request?
This PR upgrades Apache Arrow from 13.0.0 to 14.0.0.

### Why are the changes needed?
The Apache Arrow 14.0.0 release brings a number of enhancements and bug fixes.

In terms of bug fixes, the release addresses several critical issues that were causing failures in integration jobs with Spark ([GH-36332](apache/arrow#36332)) and problems with importing empty data arrays ([GH-37056](apache/arrow#37056)). It also optimizes the process of appending variable-length vectors ([GH-37829](apache/arrow#37829)) and includes C++ libraries for macOS AArch64 in the Java JARs ([GH-38076](apache/arrow#38076)).

The new features and improvements focus on enhancing the handling and manipulation of data. This includes the introduction of DefaultVectorComparators for large types ([GH-25659](apache/arrow#25659)), support for extended expressions in ScannerBuilder ([GH-34252](apache/arrow#34252)), and the exposure of the VectorAppender class ([GH-37246](apache/arrow#37246)).

The release also brings enhancements to the development and testing process, with the CI environment now using JDK 21 ([GH-36994](apache/arrow#36994)). In addition, the release introduces vector validation consistent with C++, ensuring consistency across different languages ([GH-37702](apache/arrow#37702)).

Furthermore, the usability of VarChar writers and binary writers has been improved with the addition of extra input methods ([GH-37705](apache/arrow#37705)), and VarCharWriter now supports writing from `Text` and `String` ([GH-37706](apache/arrow#37706)). The release also adds typed getters for StructVector, improving the ease of accessing data ([GH-37863](apache/arrow#37863)).

The full release notes are as follows:
- https://arrow.apache.org/release/14.0.0.html

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43650 from LuciferYang/arrow-14.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…#37791)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…e#37883)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…#37791)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…e#37883)
