We're using parquet-cli (brew install parquet-cli) to read files that were created with this lib, but we're running into issues: either errors or empty values for fields with repeated: true and/or type: 'LIST'. Reading with ParquetReader.openFile from this lib works fine, though!
Steps to reproduce
Example 1 - repeated: true
Using the following schema and code, based on this README example
Example 2 - type: 'LIST'
Using the following schema and code, based on the tests for array list
Read these files with parquet-cli using parquet cat <path-to-file>.
Expected behaviour
Example 1
Being able to read the file without errors.
Example 2
The result having { list: [ { element: 'abcdef' }, { element: 'fedcba' } ] } in the test field, like when reading the file using ParquetReader.openFile.
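For readers unfamiliar with the nested shape quoted above, here is a minimal sketch in plain JavaScript of how a flat array maps to and from the list/element structure. The helpers toListShape and fromListShape are hypothetical names made up for illustration; they are not part of this lib or of parquet-cli.

```javascript
// Hypothetical helpers (not part of any library) that convert between a
// plain JavaScript array and the nested list/element shape shown above.
function toListShape(values) {
  return { list: values.map((value) => ({ element: value })) };
}

function fromListShape(wrapped) {
  // Treat a missing or null field as "no list".
  if (wrapped == null || !Array.isArray(wrapped.list)) {
    return null;
  }
  return wrapped.list.map((entry) => entry.element);
}

// Build a row whose test field has the expected nested shape.
const row = { id: 'Row1', test: toListShape(['abcdef', 'fedcba']) };
console.log(JSON.stringify(row));
console.log(fromListShape(row.test));
```

A helper like fromListShape could be used to compare what different readers return for the test field without caring about the wrapper objects.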
Actual behaviour
Example 1
An error is thrown; see Error logs below.
Example 2
Getting the result {"id": "Row1", "test": null}
Error logs
From Example 1
Unknown error
java.lang.RuntimeException: Failed on record 0 in <omitted>/output-basic.parquet
    at org.apache.parquet.cli.commands.ScanCommand.run(ScanCommand.java:75)
    at org.apache.parquet.cli.Main.run(Main.java:163)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
    at org.apache.parquet.cli.Main.main(Main.java:191)
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:<omitted>/output-basic.parquet
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:280)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
    at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:140)
    at org.apache.parquet.cli.BaseCommand$1$1.advance(BaseCommand.java:356)
    at org.apache.parquet.cli.BaseCommand$1$1.<init>(BaseCommand.java:337)
    at org.apache.parquet.cli.BaseCommand$1.iterator(BaseCommand.java:335)
    at org.apache.parquet.cli.commands.ScanCommand.run(ScanCommand.java:70)
    ... 3 more
Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required group stock (LIST) {
  repeated group array {
    required double price;
    required int64 quantity;
  }
} != repeated group stock {
  required double price;
  required int64 quantity;
}
    at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:104)
    at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:81)
    at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57)
    at org.apache.parquet.schema.MessageType.accept(MessageType.java:52)
    at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:167)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:155)
    at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:245)
    ... 9 more
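The final "Caused by" compares two shapes for the stock field: a LIST-annotated group that wraps a repeated group named array, and a bare repeated group (presumably what a repeated: true field produces). As a rough illustration of that difference only, the sketch below models both shapes with plain JavaScript objects and prints them in the same textual form as the error message. The node objects and the wrapAsList and render helpers are hypothetical and are not the API of this lib or of parquet-cli.

```javascript
// Minimal, hypothetical model of a Parquet schema node; these objects are
// not the API of any Parquet library.
const bareRepeatedField = {
  repetition: 'repeated',
  name: 'stock',
  fields: [
    { repetition: 'required', type: 'double', name: 'price' },
    { repetition: 'required', type: 'int64', name: 'quantity' },
  ],
};

// Wrap a bare repeated group in a LIST-annotated group, matching the shape
// on the left-hand side of the "!=" in the error message.
function wrapAsList(field) {
  return {
    repetition: 'required',
    name: field.name,
    annotation: 'LIST',
    fields: [{ ...field, name: 'array' }],
  };
}

// Render a node in the same textual form the error message uses.
function render(node, indent = '') {
  if (!node.fields) {
    return `${indent}${node.repetition} ${node.type} ${node.name};`;
  }
  const annotation = node.annotation ? ` (${node.annotation})` : '';
  const children = node.fields.map((child) => render(child, indent + '  '));
  return [
    `${indent}${node.repetition} group ${node.name}${annotation} {`,
    ...children,
    `${indent}}`,
  ].join('\n');
}

console.log(render(wrapAsList(bareRepeatedField)));
console.log('!=');
console.log(render(bareRepeatedField));
```

This only visualises the two structures side by side; it does not change how files are written or read.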
johanfunnel changed the title from "Issues using parquet-cli to read files with repeated fields or LIST" to "Issues using parquet-cli to read created files with repeated fields or LIST" on Nov 13, 2024