-
Hi,
-
Hi @Mydurian, by complex type, do you mean complex numbers with real and imaginary components? The Parquet format doesn't have built-in support for complex numbers, so I'm not sure what you're asking. Can you provide an example of the schema of a file you're trying to read?
-
Hi @adamreeve, not complex numbers. I'm reading a Parquet file with complex column types such as Struct, List and Map, that is, nested data. Here's the example.
-
Right, yes, you can read such files with ParquetSharp, but there isn't a way to read the data directly as .NET lists and dictionaries. One approach is to use the Arrow-based API to read the data as Arrow record batches. There is documentation on this at https://github.com/G-Research/ParquetSharp/blob/master/docs/Arrow.md.

Otherwise, if you want to use the original ParquetSharp API that is based on standard .NET types, you can't read this in a structured form directly, but you can read the leaf-level columns. For example, the first column can be read as string arrays, where each row is an array of the "line1" values, and the second column will contain arrays of the "name" values. You can also read maps by separately reading arrays of keys and arrays of values. There's some documentation on this at https://github.com/G-Research/ParquetSharp/blob/master/docs/Nested.md#reading-dictionary-data
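To make the keys-and-values approach concrete, here is a minimal sketch using the standard ParquetSharp reader API. The file name `data.parquet`, the column indices 0 and 1, and the `string` key and value types are assumptions for illustration only; the actual leaf column order and element types depend on your schema.

```csharp
using System;
using ParquetSharp;

// Hypothetical file; assumed to contain a single map<string, string> column,
// which Parquet stores as two leaf columns: keys (index 0) and values (index 1).
using var fileReader = new ParquetFileReader("data.parquet");

for (var rowGroup = 0; rowGroup < fileReader.FileMetaData.NumRowGroups; ++rowGroup)
{
    using var rowGroupReader = fileReader.RowGroup(rowGroup);
    var numRows = checked((int) rowGroupReader.MetaData.NumRows);

    // Each row of a leaf column inside a list or map is read as an array.
    using var keyReader = rowGroupReader.Column(0).LogicalReader<string[]>();
    using var valueReader = rowGroupReader.Column(1).LogicalReader<string[]>();

    string[][] keys = keyReader.ReadAll(numRows);
    string[][] values = valueReader.ReadAll(numRows);

    // keys[i][j] and values[i][j] together form one map entry of row i.
    for (var i = 0; i < keys.Length; ++i)
    {
        for (var j = 0; j < keys[i].Length; ++j)
        {
            Console.WriteLine($"row {i}: {keys[i][j]} = {values[i][j]}");
        }
    }
}
```

Leaf columns of a list of structs can be read the same way, one `LogicalReader` per leaf column, with the element type adjusted to match the field (for example a nullable type if the field is optional).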
Yes, you can access the Parquet schema from the file reader's metadata. The root schema node (`schemaNode` here) is a `ParquetSharp.Schema.GroupNode` and has a `Fields` property containing a node for each top-level column of the schema. Fields can be a `PrimitiveNode` or a `GroupNode` with their own subfields for more complex types like lists, maps and structs. The `LogicalType` property of a node will tell you if it represents a List or Map.

Alternatively, if you're working with the Arrow API, you can also get the schema in Arrow format from the Arrow file reader.
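As a rough sketch of both approaches, assuming a hypothetical file named `data.parquet`: the first part walks the Parquet schema tree and flags List and Map nodes, and the second part prints the Arrow schema. The `PrintNode` helper is made up for this example and is not part of ParquetSharp.

```csharp
using System;
using ParquetSharp;
using ParquetSharp.Schema;

// Hypothetical path used for illustration.
const string path = "data.parquet";

// Parquet schema: the root is a GroupNode with one field per top-level column.
using (var fileReader = new ParquetFileReader(path))
{
    GroupNode schemaNode = fileReader.FileMetaData.Schema.GroupNode;
    PrintNode(schemaNode, 0);
}

// Arrow schema, read through the Arrow-based API.
using (var arrowReader = new ParquetSharp.Arrow.FileReader(path))
{
    var arrowSchema = arrowReader.Schema;
    foreach (var field in arrowSchema.FieldsList)
    {
        Console.WriteLine($"{field.Name}: {field.DataType.Name}");
    }
}

// Illustrative helper (not a ParquetSharp API): recursively prints the schema tree.
static void PrintNode(Node node, int depth)
{
    var indent = new string(' ', depth * 2);
    var kind = node.LogicalType switch
    {
        ListLogicalType => "list",
        MapLogicalType => "map",
        _ => node.LogicalType.GetType().Name,
    };

    if (node is GroupNode group)
    {
        Console.WriteLine($"{indent}{group.Name} (group, {kind})");
        foreach (var child in group.Fields)
        {
            PrintNode(child, depth + 1);
        }
    }
    else if (node is PrimitiveNode primitive)
    {
        Console.WriteLine($"{indent}{primitive.Name} ({primitive.PhysicalType}, {kind})");
    }
}
```

Group nodes without a List or Map logical type are typically plain structs, so the printed tree should be enough to decide how each leaf column needs to be read.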