Should the format allow for no geometry columns in a file?
I think that it should, because it is occasionally useful: for example, a tool converting a schema-less (GeoJSON-like) input to GeoParquet must, on an empty input, either guess the geometry columns or error out. Both options seem less than ideal.
Also, geopandas.GeoDataFrame().to_parquet(...) is a similar case and should do something reasonable and compliant.
As the spec is currently written, I think the answer is yes, but in a counter-intuitive way: columns can be empty, and the required primary_column field could be anything:
The name of the "primary" geometry column. In cases where a GeoParquet file contains multiple geometry columns, the primary geometry may be used by default in geospatial operations.
There is no requirement for primary_column to actually be contained in columns, so the spec could be taken to mean "the name that would be used if there were any (multiple?) geometry columns".
Interestingly, the current implementation of geopandas.GeoDataFrame().to_parquet agrees, writing {"primary_column": "geometry", "columns": {}} in the metadata.
But the JSON schema contradicts the written spec here, requiring the columns to be non-empty, and there is an additional check for primary_column to be contained in columns.
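To make the contradiction concrete, here is a minimal sketch of the two checks mentioned above, applied to the metadata that geopandas writes for an empty GeoDataFrame. The function name check_geo_metadata is hypothetical and the checks are my paraphrase of the schema's behaviour, not the actual validator code.

```python
import json


def check_geo_metadata(geo: dict) -> list:
    """Return the problems found by the two checks described above.

    Hypothetical helper: it only mirrors the "columns must be non-empty"
    and "primary_column must be in columns" rules, nothing else.
    """
    problems = []
    columns = geo.get("columns", {})
    primary = geo.get("primary_column")
    if not columns:
        problems.append("columns must not be empty")
    if primary not in columns:
        problems.append(f"primary_column {primary!r} is not listed in columns")
    return problems


# Metadata reported above as written for an empty GeoDataFrame.
empty_case = json.loads('{"primary_column": "geometry", "columns": {}}')
for problem in check_geo_metadata(empty_case):
    print(problem)
```

Under these assumptions the empty case fails both checks, even though the written spec does not seem to forbid it.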
I think this corner case is important enough to be made explicit in the specification, either by making primary_column optional/nullable and empty columns valid, or by explicitly allowing {"primary_column": "geometry", "columns": {}}.
Or, alternatively, by explicitly prohibiting this case in the specification, even if I think that would be rather unfortunate.
What is the advantage of having the metadata without columns (in which case it is basically just a version, i.e. "geo": {"columns": [], "version": "1.0.0-dev"}), compared to just leaving out the "geo" metadata?
I think that depends on whether the "geo" metadata is made optional in the specification (that is, whether any other Parquet file would be a valid GeoParquet file, just without geometry columns). If so, then leaving out the "geo" metadata could indeed be another solution. But if the spec requires a valid GeoParquet file to always have this metadata (as it currently does), then it would be very strange for a GeoDataFrame (or similar) GeoParquet serialization function to write a file that:
is not valid according to the GeoParquet specification;
would error out due to the missing required metadata in many (if not all) readers.
It seems like a trade-off between supporting missing "geo" metadata (and possibly interpreting a totally unrelated parquet file as GeoParquet) and empty "columns" (which likely requires a bit of careful handling) in GeoParquet readers.
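As a rough illustration of that trade-off from the reader's side, the sketch below handles both situations. It assumes the file's key-value metadata is already available as a plain dict of strings; geometry_column_names and the allow_missing_geo flag are hypothetical names, not part of any existing reader.

```python
import json


def geometry_column_names(key_value_metadata: dict, allow_missing_geo: bool = False) -> list:
    """List the geometry column names declared in the "geo" metadata.

    Hypothetical reader helper. An empty "columns" value yields an empty
    list; a missing "geo" key is an error unless explicitly allowed.
    """
    raw = key_value_metadata.get("geo")
    if raw is None:
        if allow_missing_geo:
            # Lenient policy: treat any Parquet file as having no geometry columns.
            return []
        raise ValueError('missing "geo" metadata: not a GeoParquet file')
    geo = json.loads(raw)
    return list(geo.get("columns", {}))


# Empty "columns": the reader simply finds no geometry columns.
print(geometry_column_names({"geo": '{"primary_column": "geometry", "columns": {}}'}))
```

With the strict default, an unrelated Parquet file is rejected, while a file with empty "columns" is read as having no geometry columns, which is the bit of careful handling mentioned above.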