Currently RasterRegion.raster triggers a read when the raster is requested: geotrellis-contrib/vlm/src/main/scala/geotrellis/contrib/vlm/RasterRegion.scala, lines 41 to 55 in aca902c.

This is a particular problem when writing tiles sourced from the RasterSource API using a GeoTrellis LayerWriter, because the first action taken is to groupBy the records by their index: https://github.com/locationtech/geotrellis/blob/474ed9019b1281ce9e134167e7f7f3b0fc3e2eae/s3-spark/src/main/scala/geotrellis/spark/store/s3/S3RDDWriter.scala#L81

Since the read is triggered before this groupBy, all of the raster pixels are shuffled, which is quite expensive.

What would be preferable is an instance of MultibandTile that contains a RasterRegion but does not read the pixels until they are explicitly requested by one of its functions. This would allow the groupBy to be performed on metadata only, greatly improving the performance of all ingests.

The same behavior would help in other, similar situations where the tiles need to be sorted, filtered, or joined before they are actually used.

I'm not sure if this should be the default behavior (probably?) or if we should provide both behaviors as part of the RasterRegion interface: eagerRaster and lazyRaster.

Implement LazyMultibandTile
RasterRegion produces LazyMultibandTile
Benchmark a sample ingest with eager vs lazy tile read to validate the assumption and document the results
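
To make the proposal concrete, here is a minimal sketch of the lazy-read idea in plain Scala, with no GeoTrellis or Spark dependency. Everything in it is an assumption for illustration: Region is a hypothetical stand-in for RasterRegion, LazyTile is a hypothetical stand-in for the proposed LazyMultibandTile, and eagerRaster / lazyRaster only mirror the names floated above. It is not the actual API. The sketch counts reads to show that grouping by key touches metadata only.

```scala
object LazyTileSketch {
  // Hypothetical stand-in for RasterRegion: it knows its key (metadata)
  // and how to read its pixels, and counts how often a read happens.
  final class Region(val key: Int, cols: Int, rows: Int) {
    var reads: Int = 0

    private def readPixels(): Array[Int] = {
      reads += 1
      Array.fill(cols * rows)(key)
    }

    // Eager variant: the read happens as soon as the raster is requested.
    def eagerRaster: Array[Int] = readPixels()

    // Lazy variant: returns a wrapper that defers the read.
    def lazyRaster: LazyTile = new LazyTile(key, () => readPixels())
  }

  // Hypothetical stand-in for LazyMultibandTile: pixels are read on
  // first access and cached afterwards.
  final class LazyTile(val key: Int, read: () => Array[Int]) {
    lazy val pixels: Array[Int] = read()
    def get(i: Int): Int = pixels(i)
  }

  // Returns (reads after groupBy, reads after the pixels are used).
  def demo(): (Int, Int) = {
    val regions = Vector(new Region(0, 2, 2), new Region(1, 2, 2), new Region(0, 2, 2))
    val tiles = regions.map(_.lazyRaster)

    // Grouping looks only at the key, so no pixels are read here.
    val grouped: Map[Int, Vector[LazyTile]] = tiles.groupBy(_.key)
    val readsAfterGroupBy = regions.map(_.reads).sum

    // Pixels are read only when a consumer asks for them.
    grouped(0).foreach(_.get(0))
    val readsAfterUse = regions.map(_.reads).sum

    (readsAfterGroupBy, readsAfterUse)
  }
}
```

Calling LazyTileSketch.demo() reports zero reads after the groupBy and two reads once the two tiles under key 0 are accessed. In a real Spark job the lazy tile would also have to serialize across the shuffle without forcing the read, which this sketch does not attempt to show.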