Regridding from a large tile size (e.g. 1024 or 512) to a small size (e.g. 64) results in unexpectedly large RDD/partition sizes.
My theory is that the `crop` method used in Regrid:
https://github.com/locationtech/geotrellis/blob/d65d6a22eb70efd96caa5c6f5f660b2b936b2763/spark/src/main/scala/geotrellis/spark/regrid/Regrid.scala#L122
is a lazy crop: it keeps a reference to the original backing array instead of copying the smaller window of data out of the larger one. When Spark serializes the RDD, it therefore serializes the full backing array behind every cropped tile, inflating the data substantially.
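To illustrate the mechanism (not GeoTrellis code, just a minimal language-agnostic sketch in Python, with hypothetical class names): a "lazy" crop that holds a reference to the full backing array serializes at the size of the whole array, while a crop that copies its window out serializes at the size of the window. Spark's shuffle serialization would hit the same inflation.

```python
import pickle

class LazyCrop:
    """Analogue of a lazy crop: retains a reference to the full backing array."""
    def __init__(self, backing, start, stop):
        self.backing = backing            # whole array kept alive
        self.start, self.stop = start, stop
    def values(self):
        return self.backing[self.start:self.stop]

class MaterializedCrop:
    """Analogue of forcing a copy: only the cropped window survives."""
    def __init__(self, backing, start, stop):
        self.values = backing[start:stop]  # data copied out of the backing array

big = list(range(512 * 512))              # stands in for a 512x512 tile
lazy = LazyCrop(big, 0, 64 * 64)          # a 64x64 window, lazily
eager = MaterializedCrop(big, 0, 64 * 64) # the same window, copied

lazy_size = len(pickle.dumps(lazy))
eager_size = len(pickle.dumps(eager))
print(lazy_size > 10 * eager_size)        # the lazy crop drags the full array along
```

If this theory is right, the fix would be to force the cropped tile into its own array before the shuffle (in GeoTrellis terms, something along the lines of converting to an `ArrayTile`, e.g. via `toArrayTile`), so that only the 64x64 window is serialized.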