Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C#] Add LargeBinary, LargeString and LargeList array types #43266

Closed
adamreeve opened this issue Jul 15, 2024 · 2 comments
Closed

[C#] Add LargeBinary, LargeString and LargeList array types #43266

adamreeve opened this issue Jul 15, 2024 · 2 comments

Comments

@adamreeve
Copy link
Contributor

Describe the enhancement requested

This is a subtask of #34736

These arrays will not be able to support value buffers > 2 GB until #23776 is addressed, but there's some value in supporting these types anyway, to simplify integration with other Arrow implementations.

See #34736 (comment) for some more context.

Component(s)

C#

@adamreeve
Copy link
Contributor Author

take

CurtHagenlocher pushed a commit that referenced this issue Jul 19, 2024
…#43269)

### Rationale for this change

See #43266. Note that LargeBinary and LargeString are still limited to 2 GiB buffers, and LargeList is limited to offsets that can be represented as int32.

### What changes are included in this PR?

* Add new Array subtypes: LargeBinaryArray, LargeStringArray and LargeListArray
* Support round-tripping these array types via the IPC format
* Support round-tripping these array types via the C Data Interface
* Improve error messages when importing arrays that are too large via IPC or C Data Interface
* Enable integration tests for the new types 
* Update documentation

### Are these changes tested?

Yes, I've added some basic tests specifically for the new array types, and added these to the test data generator so they're covered by the existing tests for round tripping using IPC and C Data Interface.

### Are there any user-facing changes?

Yes, this is a new user facing feature.

### Implementation notes

* I haven't added builders for these new array types. Given they're added to help with interoperability with other libraries, I wouldn't expect .NET users to build arrays of these types as they provide no other benefit over the non-large types until we have proper large memory support. But I'm happy to add this if it would be useful.
* The new array types share a lot of logic with the non-large types. I considered trying to consolidate this logic by adding a new `BinaryArrayBase<TOffset>` class for example, but I think this would require generic math support to work nicely, and would still complicate the code quite a bit and add extra virtual method call overhead. So I think it's fine to keep these new Array subtypes independent from the non-large types.
* I haven't included support for materializing a LargeStringArray (see #41048). I'm not sure whether there would be a use for this, but it could be added later if needed.
* GitHub Issue: #43266

Authored-by: Adam Reeve <[email protected]>
Signed-off-by: Curt Hagenlocher <[email protected]>
@CurtHagenlocher CurtHagenlocher added this to the 18.0.0 milestone Jul 19, 2024
@CurtHagenlocher
Copy link
Contributor

Issue resolved by pull request 43269
#43269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants