More intelligent sampling #82

Closed
adamjstewart opened this issue Aug 11, 2021 · 0 comments · Fixed by #84

adamjstewart commented Aug 11, 2021

Here are some ideas:

  • All GeoSamplers should take a GeoDataset index as input
  • Randomly choose a file, then randomly sample from within bounds of that file (solves sampling out of bounds problem)
  • Add new sampler (RandomBatchGeoSampler) that subclasses BatchSampler and returns a batch of random patches from a single tile
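
The second and third ideas could be sketched roughly as follows. This is a minimal, library-free illustration and not the eventual implementation: the `BBox` tuple layout, the plain list of tile bounds standing in for a GeoDataset index, and the function names `random_patch`, `random_patches`, and `random_batches` are all assumptions made for the example.

```python
import random
from typing import Iterator, List, Tuple

# Hypothetical bounding box layout: (minx, maxx, miny, maxy).
BBox = Tuple[float, float, float, float]


def random_patch(bounds: BBox, size: float, rng: random.Random) -> BBox:
    """Return a size x size patch that lies entirely inside `bounds`."""
    minx, maxx, miny, maxy = bounds
    if maxx - minx < size or maxy - miny < size:
        raise ValueError("tile is smaller than the requested patch size")
    # Sample the lower-left corner so the patch cannot leave the tile.
    x = rng.uniform(minx, maxx - size)
    y = rng.uniform(miny, maxy - size)
    return (x, x + size, y, y + size)


def random_patches(
    tiles: List[BBox], size: float, length: int, seed: int = 0
) -> Iterator[BBox]:
    """Pick a random tile, then a random patch inside it, `length` times."""
    rng = random.Random(seed)
    for _ in range(length):
        yield random_patch(rng.choice(tiles), size, rng)


def random_batches(
    tiles: List[BBox], size: float, batch_size: int, length: int, seed: int = 0
) -> Iterator[List[BBox]]:
    """Yield `length` batches; all patches in a batch come from one tile."""
    rng = random.Random(seed)
    for _ in range(length):
        tile = rng.choice(tiles)
        yield [random_patch(tile, size, rng) for _ in range(batch_size)]
```

Because the tile is chosen first and the patch corner is drawn from the tile's own bounds shrunk by the patch size, a patch can never fall outside the file it was sampled from.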

When using ZipDataset with random samplers, the index should come from whichever dataset is tile-based. When using ZipDataset with grid samplers, the index should come from whichever dataset is not tile-based. Not yet sure how to handle something like Landsat + Sentinel, but we can figure that out another day.

Class hierarchy:

  • Sampler
    • GeoSampler
      • RandomGeoSampler
      • GridGeoSampler
    • BatchGeoSampler
      • RandomBatchGeoSampler
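
A rough skeleton of this hierarchy, to make the sampler versus batch sampler split concrete. Everything here is an assumption for illustration: `Sampler` is a local stand-in for `torch.utils.data.Sampler` so the sketch runs without PyTorch, the constructor arguments are invented, and only `GridGeoSampler` is filled in to keep it short.

```python
import abc
from typing import Iterator, List, Tuple

# Hypothetical bounding box layout: (minx, maxx, miny, maxy).
BBox = Tuple[float, float, float, float]


class Sampler(abc.ABC):
    """Local stand-in for torch.utils.data.Sampler (assumption)."""

    @abc.abstractmethod
    def __iter__(self) -> Iterator:
        ...


class GeoSampler(Sampler):
    """Base class for samplers that yield one bounding box at a time."""

    @abc.abstractmethod
    def __iter__(self) -> Iterator[BBox]:
        ...


class BatchGeoSampler(Sampler):
    """Base class for samplers that yield a list of bounding boxes per batch."""

    @abc.abstractmethod
    def __iter__(self) -> Iterator[List[BBox]]:
        ...


class GridGeoSampler(GeoSampler):
    """Walks each tile in a regular grid of size x size patches."""

    def __init__(self, tiles: List[BBox], size: float, stride: float) -> None:
        self.tiles = tiles
        self.size = size
        self.stride = stride

    def __iter__(self) -> Iterator[BBox]:
        for minx, maxx, miny, maxy in self.tiles:
            # Number of patch positions that fit fully inside the tile.
            rows = int((maxy - miny - self.size) // self.stride) + 1
            cols = int((maxx - minx - self.size) // self.stride) + 1
            for i in range(rows):
                for j in range(cols):
                    x = minx + j * self.stride
                    y = miny + i * self.stride
                    yield (x, x + self.size, y, y + self.size)
```

In this sketch a `GeoSampler` yields single bounding boxes while a `BatchGeoSampler` yields whole batches, which mirrors how PyTorch's `DataLoader` distinguishes its `sampler` and `batch_sampler` arguments.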

Make sure to document the difference between samplers and batch samplers, and when to use each. We should store samplers and batch samplers in separate files and combine them in __init__ like we do with datasets. Add a utils.py for helpers like _to_tuple.

Question: if I'm using an LRU cache, a BatchSampler, and multiple workers, and a tile isn't in the cache yet, will PyTorch spawn multiple workers that all try to warp the entire tile? It may actually be faster to use a single worker in this case.
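
A toy, single-process illustration of how the request order interacts with an LRU cache. `warp_tile` is a hypothetical stand-in for an expensive per-tile warp, and the cache size of 1 is chosen only to make the effect obvious. It does not answer the multi-worker question: `functools.lru_cache` lives in process memory, so separate worker processes would each hold their own cache.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def warp_tile(tile_id: int) -> str:
    """Hypothetical stand-in for an expensive warp of a whole tile."""
    return f"warped-{tile_id}"


# Two batches of four patches, each batch drawn from a single tile.
for tile_id in [0, 0, 0, 0, 1, 1, 1, 1]:
    warp_tile(tile_id)
grouped = warp_tile.cache_info()

# Same number of patches, but tiles alternate between requests.
warp_tile.cache_clear()  # also resets the hit/miss counters
for tile_id in [0, 1, 0, 1, 0, 1, 0, 1]:
    warp_tile(tile_id)
mixed = warp_tile.cache_info()

print(grouped.misses, mixed.misses)  # prints "2 8"
```

With patches grouped by tile, the warp runs once per batch; with tiles interleaved and a cache too small to hold both, every request triggers a new warp.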

@adamjstewart adamjstewart added the samplers Samplers for indexing datasets label Aug 11, 2021
This was referenced Aug 11, 2021
@adamjstewart adamjstewart added this to the 0.1.0 milestone Nov 20, 2021