-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify the dst crs in convert? #160
Comments
Following up, my guess is that storage_options = {"account_name": <account_name> "credential": <token>}
href = "<protocol>/<container>/<prefix>/valid-geo.parquet"
gdf = dask_geopandas.read_parquet(href, storage_options=storage_options) Python traceback---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask/backends.py:136, in CreationDispatch.register_inplace.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
135 try:
--> 136 return func(*args, **kwargs)
137 except Exception as e:
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask/dataframe/io/parquet/core.py:538, in read_parquet(path, columns, filters, categories, index, storage_options, engine, use_nullable_dtypes, dtype_backend, calculate_divisions, ignore_metadata_file, metadata_task_size, split_row_groups, blocksize, aggregate_files, parquet_file_extension, filesystem, **kwargs)
536 blocksize = None
--> 538 read_metadata_result = engine.read_metadata(
539 fs,
540 paths,
541 categories=categories,
542 index=index,
543 use_nullable_dtypes=use_nullable_dtypes,
544 dtype_backend=dtype_backend,
545 gather_statistics=calculate_divisions,
546 filters=filters,
547 split_row_groups=split_row_groups,
548 blocksize=blocksize,
549 aggregate_files=aggregate_files,
550 ignore_metadata_file=ignore_metadata_file,
551 metadata_task_size=metadata_task_size,
552 parquet_file_extension=parquet_file_extension,
553 dataset=dataset_options,
554 read=read_options,
555 **other_options,
556 )
558 # In the future, we may want to give the engine the
559 # option to return a dedicated element for `common_kwargs`.
560 # However, to avoid breaking the API, we just embed this
561 # data in the first element of `parts` for now.
562 # The logic below is inteded to handle backward and forward
563 # compatibility with a user-defined engine.
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask_geopandas/io/parquet.py:57, in GeoArrowEngine.read_metadata(cls, fs, paths, **kwargs)
55 @classmethod
56 def read_metadata(cls, fs, paths, **kwargs):
---> 57 meta, stats, parts, index = super().read_metadata(fs, paths, **kwargs)
59 gather_spatial_partitions = kwargs.pop("gather_spatial_partitions", True)
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask/dataframe/io/parquet/arrow.py:549, in ArrowDatasetEngine.read_metadata(cls, fs, paths, categories, index, use_nullable_dtypes, dtype_backend, gather_statistics, filters, split_row_groups, blocksize, aggregate_files, ignore_metadata_file, metadata_task_size, parquet_file_extension, **kwargs)
548 # Stage 2: Generate output `meta`
--> 549 meta = cls._create_dd_meta(dataset_info)
551 # Stage 3: Generate parts and stats
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask_geopandas/io/parquet.py:103, in GeoArrowEngine._create_dd_meta(cls, dataset_info, use_nullable_dtypes)
99 raise ValueError(
100 "No dataset parts discovered. Use dask.dataframe.read_parquet "
101 "to read it as an empty DataFrame"
102 )
--> 103 meta = cls._update_meta(meta, schema)
104 return meta
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask_geopandas/io/parquet.py:77, in GeoArrowEngine._update_meta(cls, meta, schema)
74 """
75 Convert meta to a GeoDataFrame and update with potential GEO metadata
76 """
---> 77 return _update_meta_to_geodataframe(meta, schema.metadata)
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask_geopandas/io/arrow.py:36, in _update_meta_to_geodataframe(meta, schema_metadata)
35 geometry_column_name = geo_meta["primary_column"]
---> 36 crs = geo_meta["columns"][geometry_column_name]["crs"]
37 geometry_columns = geo_meta["columns"]
KeyError: 'crs'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[36], line 6
3 with fsspec.open(href, mode="rb", **storage_options) as f:
4 gpd.read_parquet(f)
----> 6 dask_geopandas.read_parquet(href, storage_options=storage_options)
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask_geopandas/io/parquet.py:112, in read_parquet(*args, **kwargs)
111 def read_parquet(*args, **kwargs):
--> 112 result = dd.read_parquet(*args, engine=GeoArrowEngine, **kwargs)
113 # check if spatial partitioning information was stored
114 spatial_partitions = result._meta.attrs.get("spatial_partitions", None)
File ~/mambaforge/envs/jl-full/lib/python3.11/site-packages/dask/backends.py:138, in CreationDispatch.register_inplace.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
136 return func(*args, **kwargs)
137 except Exception as e:
--> 138 raise type(e)(
139 f"An error occurred while calling the {funcname(func)} "
140 f"method registered to the {self.backend} backend.\n"
141 f"Original Message: {e}"
142 ) from e
KeyError: "An error occurred while calling the read_parquet method registered to the pandas backend.\nOriginal Message: 'crs'" |
It looks like you are running into an issue with dask-geopandas. The I think this is the same issue geopandas/dask-geopandas#270 |
I don't think gpq currently contains a method to specify the target crs. Also I see that by default you use "OGC:CRS84", what is your rationale for that? Why not, for example, use "EPSG:4326"?
I'll add a little bit of context on my use case. So I just used
gpq
to convert a 'big' collection of parquet files to geoparquet by simply doinggpq convert non-geo.parquet valid-geo.parquet
in a for loop. Further in my processing chain I load these geoparquet files usingGeoPandas
, but I ran into an issue because when thecrs == "OGC:CRS84"
it cannot be converted to epgs. Although it's expected behaviour I'm mostly just curious why you use "OGC:CRS84" instead of "EPSG:4326".I'll probably change my routines from
gdf.crs.to_epsg()
togdf.crs.to_string()
, but I guess that several others rely on to_epsg() as well when using GeoPandas, so I thought it's worth opening a discussion point here.The text was updated successfully, but these errors were encountered: