A hello world round trip between write_feather() and ArrowFileReader.ReadNextRecordBatch() fails with default settings. This is specific to compressed files (see the workaround below). What appears to happen is that C# reads the record batches correctly but hands the caller the compressed versions of the data buffers instead of the decompressed ones. While the various Length properties are set correctly in C#, the data buffers are too short to hold all of the values in the file, the bytes do not match what the decompressed bytes should be, and basic data accessors like PrimitiveArray.Values can't be used because they throw ArgumentOutOfRangeException. Looking through the C# classes in the GitHub repo, there doesn't appear to be a way for the caller to request decompression, so I'm guessing decompression is supposed to be automatic but, for some reason, isn't.
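The mismatch is easy to see directly; something like this (reading the same file as the reprex below) prints a buffer size smaller than Length * sizeof(double):
using System;
using System.IO;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using FileStream stream = new("test lz4.feather", FileMode.Open, FileAccess.Read, FileShare.Read);
using ArrowFileReader reader = new(stream);
RecordBatch batch = reader.ReadNextRecordBatch();
DoubleArray values = (DoubleArray)batch.Column(0);
// Length reports 21 as expected, but the buffer still holds the compressed
// bytes, so it is shorter than Length * sizeof(double) and values.Values throws.
Console.WriteLine($"Length = {values.Length}");
Console.WriteLine($"buffer bytes = {values.ValueBuffer.Length}, expected = {values.Length * sizeof(double)}");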
While functionally successful, the workaround of writing uncompressed feather isn't great, as the uncompressed files are bigger than .csv. In my application the resulting disk space penalty is hundreds of megabytes compared to the footprint with compressed feather.
Simple single field reprex:
In R (arrow 8.0.0):
library(arrow)
library(tibble)
write_feather(tibble(value = seq(0, 1, length.out = 21)), "test lz4.feather")
In C# (Apache.Arrow 8.0.0):
using System;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using FileStream stream = new("test lz4.feather", FileMode.Open, FileAccess.Read, FileShare.Read);
using ArrowFileReader arrowFile = new(stream);
for (RecordBatch batch = arrowFile.ReadNextRecordBatch(); batch != null; batch = arrowFile.ReadNextRecordBatch())
{
    IArrowArray[] fields = batch.Arrays.ToArray();
    // 15 incorrect values instead of 21 correctly incrementing ones (0, 0.05, 0.10, ..., 1)
    ReadOnlySpan<double> test = MemoryMarshal.Cast<byte, double>(((DoubleArray)fields[0]).ValueBuffer.Span);
}
Workaround in R:
write_feather(tibble(value = seq(0, 1, length.out = 21)), "test.feather", compression = "uncompressed")
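Reading the uncompressed file with the same C# approach then behaves as expected; something like this prints the 21 values 0, 0.05, 0.10, ..., 1:
using System;
using System.IO;
using System.Runtime.InteropServices;
using Apache.Arrow;
using Apache.Arrow.Ipc;
using FileStream stream = new("test.feather", FileMode.Open, FileAccess.Read, FileShare.Read);
using ArrowFileReader reader = new(stream);
RecordBatch batch = reader.ReadNextRecordBatch();
// With the uncompressed file the buffer length matches Length * sizeof(double),
// so the cast span contains all 21 values.
ReadOnlySpan<double> values = MemoryMarshal.Cast<byte, double>(((DoubleArray)batch.Column(0)).ValueBuffer.Span);
foreach (double value in values)
{
    Console.WriteLine(value);
}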
Apologies if this is a known issue. I didn't find anything in a Jira search, and this isn't included in the known issues list on GitHub.
Environment: Arrow 8.0.0, R 4.2.1, VS 17.2.4
Reporter: Todd West
Note: This issue was originally created as ARROW-17062. Please see the migration documentation for further details.