
Storing data in a cross-language form #71

Open
martinfleis opened this issue Sep 19, 2024 · 11 comments

Comments

@martinfleis

Hi,

would you be keen on storing the data in some open formats alongside the .rda files so we could link to it from Python? We have a geodatasets package that holds metadata and some tooling to cache the data locally, so if you included the data here as GeoJSON, CSV, GPKG or whatever is needed, we could include them in geodatasets. That would allow easier access to the same data from R and Python, avoiding the need to run R first to save the data in a form Python can read.
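The "cache the data locally" idea can be sketched with the Python standard library alone: download a file once, then reuse the local copy on later calls. This is a hypothetical helper (`fetch_cached` and its arguments are illustrative names), not geodatasets' actual implementation.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_cached(url: str, cache_dir: Path) -> Path:
    """Return a local path for ``url``, downloading it only on first use.

    Hypothetical helper for illustration; not geodatasets' real code.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Use the last component of the URL path as the cached file name.
    target = cache_dir / url.rstrip("/").rsplit("/", 1)[-1]
    if not target.exists():
        # Download under a temporary name first so an interrupted
        # transfer never leaves a half-written file in the cache.
        tmp = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        tmp.replace(target)
    return target
```

Once a file has been fetched, later calls return the cached path without touching the network, so Python examples no longer depend on R having written the file first.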

@Robinlovelace
Collaborator

It would be really useful to have cross-language datasets. Maybe a spDatapy or spDatax repo could be worthwhile, to avoid issues with CRAN.

@Nowosad
Owner

Nowosad commented Sep 19, 2024

@martinfleis what do you have in mind? Do you want to store the files in some python package? Many of the datasets from spData are available in inst/shapes -- https://github.com/Nowosad/spData/tree/master/inst/shapes (although we plan to remove shapefiles soon from there -- #62). Do you need any other dataset from spData as a file?

@martinfleis
Author

> Many of the datasets from spData are available in inst/shapes

Missed that! That is what I was looking for. If these links are considered stable, I would just include them in geodatasets for easy access from Python.

@Nowosad
Owner

Nowosad commented Sep 19, 2024

Yes, they are very stable (except the .shp files, which will be removed in about two months).
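Linking to the stable files from Python amounts to turning a path under inst/shapes into a raw download URL. The sketch below assumes GitHub's standard `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` scheme and the `master` branch from the tree URL above; `raw_url` is a hypothetical helper, and it refuses .shp files because those are slated for removal.

```python
from pathlib import PurePosixPath

# Repository coordinates taken from the inst/shapes link in this thread.
OWNER, REPO, BRANCH = "Nowosad", "spData", "master"
SHAPES_DIR = "inst/shapes"


def raw_url(filename: str) -> str:
    """Build a raw download URL for a file in spData's inst/shapes.

    Hypothetical helper: rejects .shp files, which the maintainers
    plan to remove from inst/shapes.
    """
    if PurePosixPath(filename).suffix.lower() == ".shp":
        raise ValueError(f"{filename}: .shp files are slated for removal")
    return (
        f"https://raw.githubusercontent.com/"
        f"{OWNER}/{REPO}/{BRANCH}/{SHAPES_DIR}/{filename}"
    )
```

The resulting URL can be passed to `geopandas.read_file`, which accepts URLs as well as local paths; the file names you would pass in are whatever currently lives in inst/shapes.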

@Nowosad
Owner

Nowosad commented Sep 19, 2024

@martinfleis
Author

I have exposed the datasets that live in inst/shapes in geodatasets in geopandas/geodatasets#27. It is far from the complete list, but I believe the rest are not available as files and are instead generated in some form?

@Nowosad
Owner

Nowosad commented Sep 20, 2024

The rest of them are .rda objects. Do you want all of the datasets from the README available (except the one we discussed yesterday)? If so, I could just create another GitHub repo for that.

@martinfleis
Author

It would be nice for R and Python examples that depend on the same data to be independent of each other. The tiny snippet @Robinlovelace used during SDSL required running R before Python to load the data and dump it to disk before it could be read by geopandas. Having it available directly would allow more freedom in what runs first and in what runs at all.

@Robinlovelace
Collaborator

+1 to increasing modularity and x-language compat (without having to depend on either for shared examples).

@Nowosad
Owner

Nowosad commented Sep 27, 2024

@martinfleis I took a look at the data available in R files: they consist of spatial vector data, some raster data, a few tables, and also some graph data. Do you have any suggestions on the data formats you would prefer for each of the data types (e.g., vector: gpkg, raster: geotiff, etc.)?

@martinfleis
Author

As long as GDAL can read it I don't really care.
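One way to apply "as long as GDAL can read it" is a small lookup from data type to a GDAL driver and file extension. Vector to GeoPackage and raster to GeoTIFF follow the examples in the question above; CSV for plain tables is an assumption, and graph data is left unmapped because the thread settles on no format for it. `FORMATS` and `output_file` are hypothetical names.

```python
# Hypothetical convention: data type -> (GDAL driver short name, extension).
# "GPKG", "GTiff" and "CSV" are real GDAL driver names; the mapping itself
# is an illustrative assumption, not a decision recorded in this issue.
FORMATS = {
    "vector": ("GPKG", ".gpkg"),
    "raster": ("GTiff", ".tif"),
    "table": ("CSV", ".csv"),
}


def output_file(dataset: str, data_type: str) -> tuple[str, str]:
    """Return (GDAL driver, file name) for a dataset of the given type."""
    try:
        driver, ext = FORMATS[data_type]
    except KeyError:
        # e.g. graph data, for which no GDAL-readable format was chosen.
        raise ValueError(f"no GDAL-readable format chosen for {data_type!r}") from None
    return driver, dataset + ext
```

Keeping the choice in one table makes it easy to revisit later, for example once a format for the graph data is agreed on.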


3 participants