Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor GeoDataset #73

Merged
merged 4 commits into from
Aug 6, 2021
Merged

Refactor GeoDataset #73

merged 4 commits into from
Aug 6, 2021

Conversation

adamjstewart
Copy link
Collaborator

This PR refactors GeoDataset by adding two additional base classes:

  • RasterDataset: for datasets composed of raster images
  • VectorDataset: for datasets composed of vector shapefiles

All existing GeoDataset implementations now subclass these new base classes.

The motivation behind this PR is to reduce the significant amount of code duplication that has already accumulated. Note that since this is a refactor, no changes to the API or tests have been made.

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Aug 5, 2021
@adamjstewart adamjstewart marked this pull request as ready for review August 5, 2021 22:15
Comment on lines +224 to +231
hits = self.index.intersection(query, objects=True)

try:
hit = next(hits) # TODO: this assumes there is only a single hit
except StopIteration:
raise IndexError(
f"query: {query} is not within bounds of the index: {self.bounds}"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes #57 (comment)

Comment on lines +187 to +188
if self.crs is None:
self.crs = src.crs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes #57 (comment)

crs is now optional. Instead of computing the most common CRS, I just took the first CRS we find, as it's much simpler.

Comment on lines +207 to +210
if i == 0:
raise FileNotFoundError(
f"No {self.__class__.__name__} data was found in '{root}'"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change fixes #57 (comment)

with WarpedVRT(src, crs=self.crs) as vrt:
minx, miny, maxx, maxy = vrt.bounds
except rasterio.errors.RasterioIOError:
# Skip files that rasterio is unable to read
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to spit out a warning here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could issue debug-level logging messages for these (#39), but these are far too common to warrant info-level warning messages. This will be hit for every .zip, .txt, .dbf, etc. file in the directory.

@adamjstewart adamjstewart merged commit da7ba0b into main Aug 6, 2021
@adamjstewart adamjstewart deleted the datasets/geo branch August 6, 2021 14:55
@adamjstewart adamjstewart mentioned this pull request Aug 9, 2021
29 tasks
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants