

[C#] Support new data types #34736

Open
4 of 9 tasks
wjones127 opened this issue Mar 27, 2023 · 10 comments

Comments

@wjones127
Member

wjones127 commented Mar 27, 2023

Describe the enhancement requested

The C# implementation still needs to add support for:

Component(s)

C#

@teo-tsirpanis
Contributor

Can you also add Tensors (#34746)?

@Tommo56700

Map type addressed here: #35243

@istvan-fodor

Hi All, just wanted to check if there is any plan on adding support for LargeBinary and LargeList?

@CurtHagenlocher
Contributor

CurtHagenlocher commented Jun 13, 2024

> Hi All, just wanted to check if there is any plan on adding support for LargeBinary and LargeList?

There are basically two scenarios here:

  1. Some producer is creating large arrays even though the data itself still fits into a non-large array.
  2. Supporting arrays whose buffer sizes exceed what the CLR supports.

The first of these is probably pretty easy but I don't see much value in having it -- and it would be misleading to "support" LargeBinary but have it work only for smaller arrays.

I have an idea for how to support the large buffers required for the second of these (see #38086) but don't expect to be able to work on it for a while. I also suspect it may require a bunch of Flatbuffers-related hackery.

Is this something you're interested in implementing? :D

@istvan-fodor

Hahh, I wish I could; unfortunately I am completely new to C#.

I work on a Rust project where we use LargeList and LargeBinary values, and now we need to pass those into a C# context. Right now I am just trying to figure out what's possible and what isn't, but this seems to be a major blocker unless we find some workarounds. Looking at the scenarios you describe, supporting only 1. would definitely be a problem for us, as we have some heavy Lidar / image datasets that we can only safely handle in LargeBinary arrays.

@CurtHagenlocher
Contributor

Are there also individual values in these arrays which are larger than 2GB or is it that the array itself exceeds that size?

@istvan-fodor

istvan-fodor commented Jun 13, 2024

The array itself can exceed 2GB. Individual records would never be close to 2GB, values are mostly under 1 MB.

@adamreeve
Contributor

Hi @CurtHagenlocher, we've run into issues integrating with Polars, which always exports string data to Arrow as the LargeString type (see pola-rs/polars#15047). We can work around this by casting to String first via PyArrow, but it would simplify things if there was LargeString support in .NET Arrow, even if it didn't yet support values buffers that were actually > 2GB. Would you be open to accepting a PR to add LargeString, LargeBinary and LargeList arrays?

I'm hopeful I might eventually be able to help with adding support for IPC record batches and buffers > 2 GB too, but I think there is some value in having support for LargeString etc even if they don't actually support large buffers yet, and it makes sense to me to split this work out from adding support for large buffers.

@CurtHagenlocher
Contributor

I think blocking integration with Polars is probably reason enough to add support for these types. I'd expect, though, that there's a reasonable error experience when the actual sizes do exceed what we can currently support.
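One way to picture the "reasonable error experience" is a consumer that accepts 64-bit (Large*) offsets but only keeps the data if it also fits in 32-bit offsets, failing with a descriptive error otherwise. The sketch below is a hypothetical illustration in Python for brevity. It is not the Arrow C# API, and `narrow_offsets` and `OffsetOverflowError` are invented names. It relies on Arrow offsets being non-decreasing, so checking the last offset is enough:

```python
import struct
from typing import Sequence

# Largest value a signed 32-bit offset (as used by String, Binary and List) can hold.
INT32_MAX = 2**31 - 1


class OffsetOverflowError(ValueError):
    """Hypothetical error raised when 64-bit offsets cannot be narrowed to 32 bits."""


def narrow_offsets(offsets: Sequence[int]) -> bytes:
    """Narrow Large* (int64) offsets into a little-endian int32 offsets buffer.

    Raises OffsetOverflowError with a descriptive message when the data is too
    large for 32-bit offsets, instead of silently truncating.
    """
    # Basic validity checks: Arrow offsets are non-negative and non-decreasing.
    if any(later < earlier for earlier, later in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    if offsets and offsets[0] < 0:
        raise ValueError("offsets must be non-negative")
    # Because offsets are non-decreasing, the last one is the largest.
    if offsets and offsets[-1] > INT32_MAX:
        raise OffsetOverflowError(
            f"final offset {offsets[-1]} exceeds the 32-bit offset limit of "
            f"{INT32_MAX}; arrays this large are not supported yet"
        )
    # Pack every offset as a little-endian signed 32-bit integer.
    return struct.pack(f"<{len(offsets)}i", *offsets)
```

Under these assumptions, an array like `[0, 3, 3, 10]` is narrowed to a 16-byte buffer, while a final offset above `INT32_MAX` produces a clear error naming the limit. This matches the distinction between the two scenarios earlier in the thread: the first works, and the second fails loudly until large buffers are supported.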

@adamreeve
Contributor

👍 good point, I'll make sure there are helpful error messages if consuming data that's too large via IPC or the C Data Interface.

I've opened #43266 for adding these array types.

6 participants