-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-32240: [C#] Add new Apache.Arrow.Compression package to implement IPC decompression #33893
Conversation
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<PackageReference Include="CommunityToolkit.HighPerformance" Version="8.0.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CommunityToolkit.HighPerformance
dependency is only needed for the AsStream
extension methods on Memory<byte>
and ReadOnlyMemory<byte>
which could be re-implemented if we wanted to minimize extra dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that the LZ4 library offers APIs that takes Spans. We should just use those APIs instead, and then we can get rid of this dependency.
Note that there are 2 LZ4 libraries - one for streams (which this pr is using now) and one that doesn't use streams - https://www.nuget.org/packages/K4os.Compression.LZ4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two libraries actually implement different formats. The K4os.Compression.LZ4.Streams
library implements the LZ4 frame format, and the K4os.Compression.LZ4
library implements the LZ4 block format. Arrow IPC uses the frame format (https://github.com/apache/arrow/blob/apache-arrow-11.0.0/format/Message.fbs#L46-L48), so we need to use the streams library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened MiloszKrajewski/K4os.Compression.LZ4#79 to request this is added to the K4os.Compression.LZ4
library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Eric, your issue prompted me to dig more into the K4os
library though and I realised that since I initially started working on this, a new API has been added that allows using the frame format with more types than just Stream
, so I've switched to this and removed the CommunityToolkit.HighPerformance
dependency.
@github-actions crossbow submit nuget verify-rc-source-csharp* |
@adamreeve I have kicked of some additional CI jobs that should be sufficent to test the changes to the ci side of things and building the nuget packages. The release script is tested with @kou can you give details on if additional tests are needed for this PR and if so how to implement them? |
Revision: 72f1c16 Submitted crossbow builds: ursacomputing/crossbow @ actions-c460b32b4b |
looks like the binary is properly uploaded: https://github.com/ursacomputing/crossbow/releases/tag/actions-c460b32b4b-github-nuget |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI side looks good, cant comment on the C# +1
@github-actions crossbow submit nuget |
@eerhardt Could you review this? |
We don't have a test for |
Revision: 6b5145e Submitted crossbow builds: ursacomputing/crossbow @ actions-2b874cf487
|
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<PackageReference Include="CommunityToolkit.HighPerformance" Version="8.0.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine.
csharp/src/Apache.Arrow.Compression/Apache.Arrow.Compression.csproj
Outdated
Show resolved
Hide resolved
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<PackageReference Include="CommunityToolkit.HighPerformance" Version="8.0.0" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that the LZ4 library offers APIs that takes Spans. We should just use those APIs instead, and then we can get rid of this dependency.
Note that there are 2 LZ4 libraries - one for streams (which this pr is using now) and one that doesn't use streams - https://www.nuget.org/packages/K4os.Compression.LZ4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple last nits. I think this can be merged after they are addressed.
Thanks for the great work here, @adamreeve!
csharp/test/Apache.Arrow.Compression.Tests/Apache.Arrow.Compression.Tests.csproj
Outdated
Show resolved
Hide resolved
csharp/test/Apache.Arrow.Compression.Tests/Apache.Arrow.Compression.Tests.csproj
Outdated
Show resolved
Hide resolved
Co-authored-by: Eric Erhardt <[email protected]>
Co-authored-by: Eric Erhardt <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks again!
I'll merge this soon, assuming no more feedback is submitted.
Benchmark runs are scheduled for baseline = cb63068 and contender = 6776229. 6776229 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Rationale for this change
This further addresses #32240 and is a follow up to PR #33603 to provide an implementation of the
ICompressionCodecFactory
interface in a newApache.Arrow.Compression
package. Making this a separate package means users who don't need IPC decompression support don't need to pull in extra dependencies.What changes are included in this PR?
Adds a new
Apache.Arrow.Compression
package and moves the existing compression codec implementations used for testing into this package.Are these changes tested?
There are unit tests verifying the decompression support, but this also affects the release scripts and I'm not sure how to fully test these.
Are there any user-facing changes?
Yes, this adds a new package users can install for IPC decompression support, so documentation has been updated.