Issue description
Hi!
I would like to read Parquet files in parallel to significantly speed up a processing pipeline. Reading the row groups sequentially works as expected. But reading them in parallel with Parallel.For or Parallel.ForAsync fails with seemingly random error messages like System.InvalidOperationException: "don't know how to skip type Double" or System.InvalidOperationException: "don't know how to skip type 14".
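For reference, this is roughly the parallel version that fails (a minimal sketch assuming the parquet-dotnet v4 API; the file path is a placeholder):

```csharp
using System.Threading.Tasks;
using Parquet;

// One reader shared across all parallel iterations -- this is the failing pattern.
using ParquetReader reader = await ParquetReader.CreateAsync("data.parquet"); // placeholder path
var fields = reader.Schema.GetDataFields();

await Parallel.ForAsync(0, reader.RowGroupCount, async (i, ct) =>
{
    using ParquetRowGroupReader rowGroup = reader.OpenRowGroupReader(i);
    foreach (var field in fields)
    {
        // Concurrent reads race on the single underlying stream, which is what
        // appears to produce the "don't know how to skip type ..." exceptions.
        var column = await rowGroup.ReadColumnAsync(field);
    }
});
```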
Is there a thread-safe way to read row groups in parallel already implemented?
Cheers!
File streams are generally not compatible with parallel processing. You can, however, open a file stream per parallel thread, i.e. your Parallel.For body should perform the file-opening operation; see the sketch below. Or you can introduce a lock around file reads; it depends on what works better for you. I might be stating the obvious here, but asynchronous and parallel are not the same thing.
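Something along these lines (untested sketch, v4 API; the path is a placeholder):

```csharp
using System.Threading.Tasks;
using Parquet;

const string path = "data.parquet"; // placeholder

// Open the file once, only to read the row group count from the metadata.
int rowGroupCount;
using (ParquetReader meta = await ParquetReader.CreateAsync(path))
    rowGroupCount = meta.RowGroupCount;

// Each iteration opens its own reader, so no stream is shared between threads.
await Parallel.ForAsync(0, rowGroupCount, async (i, ct) =>
{
    using ParquetReader reader = await ParquetReader.CreateAsync(path);
    using ParquetRowGroupReader rowGroup = reader.OpenRowGroupReader(i);
    foreach (var field in reader.Schema.GetDataFields())
    {
        var column = await rowGroup.ReadColumnAsync(field);
        // ...process the column...
    }
});
```

Opening a reader per iteration re-parses the file metadata each time, but it keeps the hot path lock-free.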
I think a short section about reading row groups in parallel would be a cool addition to the documentation: https://aloneguid.github.io/parquet-dotnet/reading.html
When passing a filePath (string) to ParquetReader.CreateAsync(), it might also be possible to handle the concurrency within the library itself, along the lines of the sketch below. This would be a little more elegant, sparing the need to figure out the row group count beforehand with a separate ParquetReader.CreateAsync() call.
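For illustration, a hypothetical convenience method (ReadAllRowGroupsAsync is not an existing API, just a sketch of the idea):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Parquet;
using Parquet.Data;

// Hypothetical helper -- NOT part of parquet-dotnet today.
public static class ParquetReaderExtensions
{
    public static async Task<DataColumn[][]> ReadAllRowGroupsAsync(
        string filePath, CancellationToken ct = default)
    {
        // One metadata-only open to learn the row group count.
        int count;
        using (ParquetReader meta = await ParquetReader.CreateAsync(filePath))
            count = meta.RowGroupCount;

        var result = new DataColumn[count][];

        // Each iteration writes to a distinct slot, so no locking is needed.
        await Parallel.ForAsync(0, count, ct, async (i, token) =>
        {
            using ParquetReader reader = await ParquetReader.CreateAsync(filePath);
            using ParquetRowGroupReader rg = reader.OpenRowGroupReader(i);
            var fields = reader.Schema.GetDataFields();
            var columns = new DataColumn[fields.Length];
            for (int f = 0; f < fields.Length; f++)
                columns[f] = await rg.ReadColumnAsync(fields[f]);
            result[i] = columns;
        });

        return result;
    }
}
```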
Tangential question: is there any way to write row groups in parallel?
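Right now I work around it by preparing the column data in parallel and keeping the actual writes sequential, since a ParquetWriter appends row groups to a single stream (sketch with a placeholder schema and made-up data):

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Parquet;
using Parquet.Data;
using Parquet.Schema;

var field = new DataField<int>("value"); // placeholder schema
var schema = new ParquetSchema(field);

// Stage 1: build the columns for each row group in parallel (the CPU-bound part).
DataColumn[] columns = Enumerable.Range(0, 10)
    .AsParallel().AsOrdered()
    .Select(g => new DataColumn(field, Enumerable.Range(g * 1000, 1000).ToArray()))
    .ToArray();

// Stage 2: append the row groups sequentially -- the writer owns the one stream.
using Stream stream = File.Create("out.parquet");
using ParquetWriter writer = await ParquetWriter.CreateAsync(schema, stream);
foreach (DataColumn column in columns)
{
    using ParquetRowGroupWriter rg = writer.CreateRowGroup();
    await rg.WriteColumnAsync(column);
}
```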