Clarify support for zero geometry columns #165

himikof · 2023-01-27T18:31:15Z

Should the format allow for no geometry columns in a file?

I think it should, because it is occasionally useful. For example, a tool converting schema-less (GeoJSON-like) input to GeoParquet has to either guess the geometry columns or error out when the input is empty. Both options seem less than ideal.
Also, geopandas.GeoDataFrame().to_parquet(...) is a similar case and should do something reasonable and compliant.

As the spec is currently written, I think the answer is yes, but in a counter-intuitive way: columns can be empty, and the required primary_column field could be anything:

The name of the "primary" geometry column. In cases where a GeoParquet file contains multiple geometry columns, the primary geometry may be used by default in geospatial operations.

There is no requirement that primary_column actually be contained in columns, so the spec could be read as "the name that would be used if there were any (multiple?) geometry columns".
Interestingly, the current implementation of geopandas.GeoDataFrame().to_parquet agrees, writing {"primary_column": "geometry", "columns": {}} in the metadata.

But the JSON schema contradicts the written spec here: it requires columns to be non-empty, and it has an additional check that primary_column is contained in columns.
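To make the conflict concrete, here is a minimal sketch that hand-codes the two constraints attributed to the JSON schema above and applies them to the metadata geopandas writes for an empty frame. The function name `check_strict` is hypothetical, and this is not the actual schema file or a jsonschema-based validator; it only restates the two rules described in this issue.

```python
import json


def check_strict(geo: dict) -> list:
    """Hypothetical check mirroring the two JSON-schema rules described above."""
    errors = []
    columns = geo.get("columns", {})
    # Rule 1 (as described): at least one geometry column must be present.
    if len(columns) == 0:
        errors.append("columns must be non-empty")
    # Rule 2 (as described): primary_column must name one of the geometry columns.
    if geo.get("primary_column") not in columns:
        errors.append("primary_column must be a key of columns")
    return errors


# The two fields geopandas writes for an empty GeoDataFrame, as quoted in this issue.
empty_geo = json.loads('{"primary_column": "geometry", "columns": {}}')
print(check_strict(empty_geo))
```

The same metadata that the prose of the spec arguably permits fails both rules, which is exactly the inconsistency this issue asks to resolve.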

I think this corner case is important enough to be made explicit in the specification, either by making primary_column optional/nullable and empty columns valid, or by explicitly allowing {"primary_column": "geometry", "columns": {}}.
Or, alternatively, the specification could explicitly prohibit this case, though I think that would be rather unfortunate.
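As a sketch of what the permissive options could look like for a validator, the hypothetical `check_relaxed` below keeps the existing rule whenever at least one geometry column exists, but treats an empty columns object as valid with primary_column either absent, null, or a plain name such as "geometry". This is an assumption about how such a rule might be phrased, not proposed spec text.

```python
def check_relaxed(geo: dict) -> list:
    """Hypothetical validator: empty "columns" is allowed; primary_column is then optional."""
    columns = geo.get("columns")
    if not isinstance(columns, dict):
        return ["columns must be an object"]
    primary = geo.get("primary_column")
    errors = []
    if columns:
        # With at least one geometry column, keep the existing containment rule.
        if primary not in columns:
            errors.append("primary_column must be a key of columns")
    elif primary is not None and not isinstance(primary, str):
        # With zero geometry columns, primary_column may be absent, null, or a bare name.
        errors.append("primary_column must be a string or null")
    return errors


print(check_relaxed({"primary_column": "geometry", "columns": {}}))
print(check_relaxed({"columns": {}}))
```

Both calls return an empty error list, so a writer such as `GeoDataFrame().to_parquet` would remain compliant under either form, while files with geometry columns are checked exactly as before.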


jorisvandenbossche · 2023-01-27T19:19:16Z

What is the advantage of having the metadata without columns (in which case it is basically just a version, i.e. "geo": {"columns": {}, "version": "1.0.0-dev"}), compared to just leaving out the "geo" metadata?

himikof · 2023-01-28T18:04:52Z

I think that depends on whether the "geo" metadata is made optional in the specification (that is, whether any other Parquet file would be a valid GeoParquet file, just without geometry columns). If so, leaving out the "geo" metadata could indeed be another solution. But if the spec requires a valid GeoParquet file to always have this metadata (as it currently does), then it would be very strange for a GeoDataFrame (or similar) GeoParquet serialization function to write a file that:

- is not valid according to the GeoParquet specification;
- would make many (if not all) readers error out because the required metadata is missing.

For GeoParquet readers, it seems like a trade-off between supporting missing "geo" metadata (and possibly interpreting a totally unrelated Parquet file as GeoParquet) and supporting empty "columns" (which likely requires some careful handling).
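The reader-side trade-off can be sketched as a single hypothetical helper. It assumes the file's key-value metadata has already been read into a dict with bytes keys and values (pyarrow exposes schema metadata in this form, but the sketch itself uses only the standard library). The `allow_missing_geo` flag and the `NotGeoParquet` exception are illustrative names, not part of any existing reader.

```python
import json


class NotGeoParquet(ValueError):
    """Hypothetical error for a file without "geo" metadata when the reader is strict."""


def geometry_columns(kv_metadata: dict, allow_missing_geo: bool = False) -> dict:
    """Return the geometry-column mapping from Parquet key-value metadata.

    allow_missing_geo=False: a missing "geo" key is an error (the current spec).
    allow_missing_geo=True: any Parquet file is accepted as having zero geometry columns.
    An empty "columns" object yields {} in both modes, so callers must handle
    the zero-geometry case either way.
    """
    raw = kv_metadata.get(b"geo")
    if raw is None:
        if allow_missing_geo:
            return {}
        raise NotGeoParquet('missing required "geo" metadata')
    geo = json.loads(raw)  # json.loads accepts bytes directly
    return dict(geo.get("columns", {}))


# A plain Parquet file's metadata, and one with "geo" metadata but no geometry columns.
plain = {b"pandas": b"{}"}
empty_geo = {b"geo": b'{"version": "1.0.0-dev", "primary_column": "geometry", "columns": {}}'}
print(geometry_columns(empty_geo))
print(geometry_columns(plain, allow_missing_geo=True))
```

With `allow_missing_geo=True` the two files become indistinguishable to the caller, which is convenient but is also how an unrelated Parquet file ends up being read as GeoParquet. Keeping the flag off preserves that distinction, at the cost of requiring writers to emit "geo" metadata even when "columns" is empty.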
