worldcover extract requires too much memory #99

Closed
jdries opened this issue Dec 2, 2022 · 9 comments

@jdries
Contributor

jdries commented Dec 2, 2022

I had to set memory very high for this one:

from pathlib import Path
from openeo.processes import array_create, count, mean

# c is assumed to be an existing, authenticated openeo connection.
wc = c.load_collection("ESA_WORLDCOVER_10M_2021_V2", bands="MAP",
                       temporal_extent=["2020-12-30", "2022-01-01"])

statsfile = "cropland_mean_laea_2021.json"
if not Path(statsfile).exists():
    # Class 40 is cropland; aggregate the boolean mask per LAEA 20 km grid cell.
    (wc.band("MAP") == 40).aggregate_spatial(
        "https://artifactory.vgt.vito.be/auxdata-public/grids/LAEA-20km-EU27.geojson",
        reducer=lambda x: array_create(mean(x), count(x)),
    ).execute_batch(
        statsfile, title="Worldcover stats LAEA",
        job_options={"executor-memory": "7G", "executor-memoryOverhead": "2G"})

This stage is the problem:
flatMap at FileLayerProvider.scala:758

Image

Thread dumps mostly seem to be stuck in tileToLayout.

Worldcover products are quite large (36000 x 36000 pixels) and have a block size of 1024x1024.
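
For a sense of scale, the numbers above can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the one byte per pixel is an assumption (a single 8-bit band) and is not stated in this thread.

```python
import math

width = height = 36000   # product size in pixels, from the comment above
block = 1024             # internal block size in pixels

# Blocks needed to cover one side; the last block is only partially filled.
blocks_per_side = math.ceil(width / block)
total_blocks = blocks_per_side ** 2
total_pixels = width * height

BYTES_PER_PIXEL = 1  # assumption: single 8-bit band
uncompressed_mib = total_pixels * BYTES_PER_PIXEL / 2**20

print(blocks_per_side, total_blocks, round(uncompressed_mib))
```

Under that assumption, a single product already exceeds a gigabyte once fully decoded, so any step that touches whole products per task becomes memory hungry quickly.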

@jdries
Contributor Author

jdries commented Jan 5, 2023

Note that I fixed a similar issue for Sentinel-2, where I avoided the expensive tileToLayout by setting a 'predefinedExtent' when creating the BandCompositeRasterSource.
We may be able to set a similar extent for worldcover, which also uses a fixed tiling scheme.

@jdries
Contributor Author

jdries commented Jan 25, 2023

@EmileSonneveld based on what we saw yesterday, with the mask rasterization going OOM, I committed one fix to really ensure that the spatial partitioner is used.
We could, however, also write a test for this case that simply counts the number of resulting partitions. (The test does not necessarily need to perform the full rasterization; it may be enough to establish that there are sufficient partitions in the RDD.)
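
The idea of such a test can be illustrated in plain Python. This is a hypothetical sketch, not the GeoTrellis partitioner: `partition_index` and `count_partitions` are made-up helpers that group (col, row) tile keys into coarse blocks, and the check is simply that the key set spreads over enough partitions. In a real Spark test the count would come from the RDD itself, for example via `getNumPartitions`.

```python
def partition_index(col, row, bits=3):
    """Map a tile key to a coarse block of 2**bits x 2**bits tiles (hypothetical scheme)."""
    return (col >> bits, row >> bits)


def count_partitions(keys, bits=3):
    """Count the distinct partitions that a collection of (col, row) keys falls into."""
    return len({partition_index(col, row, bits) for col, row in keys})


# A 64 x 64 grid of tile keys, grouped into blocks of 8 x 8 tiles.
keys = [(col, row) for col in range(64) for row in range(64)]
n_partitions = count_partitions(keys)

# The test only checks the partition count; no pixel data is rasterized.
assert n_partitions >= 32, "too few partitions, spatial partitioner probably not applied"
```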

@EmileSonneveld
Contributor

This looks like a case where Scala does not behave as expected. There may already be a compiler warning for it; I'll check.
When I remove the new PartitionerIndex[SpatialKey] code here and run aggregateTemporalTest, Scala finds or creates a new PartitionerIndex on its own. That sounds like it could cause hard-to-spot bugs.

@jdries
Contributor Author

jdries commented Jan 27, 2023

Current state is in screenshot below:

Image

@jdries
Contributor Author

jdries commented Feb 6, 2023

Just tried running it again: stage 10 is now much better.
To fix the subsequent problem with stage 14, we can try filling in the 'predefinedExtent'. For worldcover, the CRS is EPSG:4326, so we can assume that the provided bbox is the exact bbox of the raster. Setting that extent may speed up this stage because we avoid retrieving metadata from every single raster.
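
To illustrate why a fixed tiling scheme makes this possible, here is a minimal Python sketch that derives a bounding box from a tile identifier without opening any file. The naming convention (an id such as N51E003 giving the lower-left corner) and the 3 x 3 degree tile size are assumptions made for the example, and `extent_from_tile_id` is a hypothetical helper, not part of the openEO code base.

```python
import re

TILE_SIZE_DEG = 3  # assumption: each tile spans 3 x 3 degrees in EPSG:4326


def extent_from_tile_id(tile_id):
    """Return (xmin, ymin, xmax, ymax) for an id like 'N51E003' (assumed lower-left corner)."""
    match = re.fullmatch(r"([NS])(\d{2})([EW])(\d{3})", tile_id)
    if match is None:
        raise ValueError(f"unrecognized tile id: {tile_id!r}")
    lat = int(match.group(2)) * (1 if match.group(1) == "N" else -1)
    lon = int(match.group(4)) * (1 if match.group(3) == "E" else -1)
    return (lon, lat, lon + TILE_SIZE_DEG, lat + TILE_SIZE_DEG)


print(extent_from_tile_id("N51E003"))
```

Because the extent is known up front, no file header has to be read just to decide where a raster sits in the layout.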

@jdries jdries added this to the sap05-usability milestone Feb 8, 2023
@jdries
Contributor Author

jdries commented Feb 8, 2023

The last improvement did not solve the issue yet. We're still stuck with a resampling that should not happen in the first place:

java.base/java.io.FileInputStream.open0(Native Method)
java.base/java.io.FileInputStream.open(FileInputStream.java:219)
java.base/java.io.FileInputStream.<init>(FileInputStream.java:157)
geotrellis.util.FileRangeReader.readClippedRange(FileRangeReader.scala:34)
geotrellis.util.RangeReader.readRange(RangeReader.scala:42)
geotrellis.util.RangeReader.readRange$(RangeReader.scala:41)
geotrellis.util.FileRangeReader.readRange(FileRangeReader.scala:30)
geotrellis.util.StreamingByteReader.readChunk(StreamingByteReader.scala:99)
geotrellis.util.StreamingByteReader.ensureChunk(StreamingByteReader.scala:112)
geotrellis.util.StreamingByteReader.get(StreamingByteReader.scala:130)
geotrellis.raster.io.geotiff.reader.GeoTiffInfo$.read(GeoTiffInfo.scala:127)
geotrellis.raster.io.geotiff.reader.GeoTiffReader$.readMultiband(GeoTiffReader.scala:211)
geotrellis.raster.geotiff.GeoTiffResampleRasterSource.$anonfun$tiff$1(GeoTiffResampleRasterSource.scala:45)
geotrellis.raster.geotiff.GeoTiffResampleRasterSource$$Lambda$1280/0x0000000840aa6440.apply(Unknown Source)
app//scala.Option.getOrElse(Option.scala:189)
geotrellis.raster.geotiff.GeoTiffResampleRasterSource.tiff$lzycompute(GeoTiffResampleRasterSource.scala:42) => holding Monitor(geotrellis.raster.geotiff.GeoTiffResampleRasterSource@1926505049})
geotrellis.raster.geotiff.GeoTiffResampleRasterSource.tiff(GeoTiffResampleRasterSource.scala:39)
geotrellis.raster.geotiff.GeoTiffResampleRasterSource.crs(GeoTiffResampleRasterSource.scala:58)
geotrellis.raster.RasterSource.reproject(RasterSource.scala:54)
org.openeo.geotrellis.layers.BandCompositeRasterSource.$anonfun$reprojectedSources$1(FileLayerProvider.scala:48)
org.openeo.geotrellis.layers.BandCompositeRasterSource$$Lambda$1279/0x0000000840aa5840.apply(Unknown Source)
app//cats.data.NonEmptyList.map(NonEmptyList.scala:77)
org.openeo.geotrellis.layers.BandCompositeRasterSource.reprojectedSources(FileLayerProvider.scala:48)
org.openeo.geotrellis.layers.BandCompositeRasterSource.resample(FileLayerProvider.scala:109)
geotrellis.raster.RasterSource.resampleToGrid(RasterSource.scala:92)
geotrellis.layer.Implicits$TileToLayoutOps.tileToLayout(Implicits.scala:67)
geotrellis.layer.Implicits$TileToLayoutOps.tileToLayout(Implicits.scala:70)
org.openeo.geotrellis.layers.FileLayerProvider.$anonfun$readMultibandTileLayer$3(FileLayerProvider.scala:804)
org.openeo.geotrellis.layers.FileLayerProvider$$Lambda$1239/0x0000000840b34040.apply(Unknown Source)
app//scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)

jdries added a commit that referenced this issue Feb 12, 2023
jdries added a commit that referenced this issue Feb 13, 2023
…the incoming extent, reduces special cases
jdries added a commit that referenced this issue Feb 13, 2023
* apply global_bounds also when not in utm projection
* do not apply rounding to resolution as a general rule, stick to globalbounds which is assumed to be rounded already if needed
* also skip resampling for layers that are not utm, all raster sources should be pre-aligned to layout
* align extent to layout before loading data
* further apply principle of always aligning raster sources to the incoming extent, reduces special cases
* update agera5 and dem tests, output has improved thanks to changes
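
The "align extent to layout" step in the commit message above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the actual Scala change: `align_extent_to_grid` is a hypothetical helper, and the grid is described by an origin and a square cell size in CRS units.

```python
import math


def align_extent_to_grid(extent, origin, cell):
    """Expand (xmin, ymin, xmax, ymax) outward so every edge lies on a grid line.

    The grid lines are at origin + k * cell for integer k (hypothetical layout).
    """
    xmin, ymin, xmax, ymax = extent
    ox, oy = origin
    return (
        ox + math.floor((xmin - ox) / cell) * cell,
        oy + math.floor((ymin - oy) / cell) * cell,
        ox + math.ceil((xmax - ox) / cell) * cell,
        oy + math.ceil((ymax - oy) / cell) * cell,
    )


aligned = align_extent_to_grid((3.2, 51.4, 5.1, 52.9), origin=(0, 0), cell=3)
print(aligned)
```

Once the requested extent and the raster sources share the same grid, pixels can be read as they are, which is what makes the resampling step unnecessary.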
@jdries
Contributor Author

jdries commented Feb 14, 2023

The issue with stage 14 in previous screenshot is entirely gone.
Stage 15, with 51376 tasks, now has the issue that task deserialization time and GC are much longer than the actual task time. This points either to too many partitions or to tasks with a scope that is too large, resulting in serialization overhead.

@jdries
Contributor Author

jdries commented Mar 8, 2023

Stage 15 went from one hour to 5 minutes by changing partitioner settings.
Now we only need to tune the final stage.

@jdries
Contributor Author

jdries commented Mar 8, 2023

Looks like we're there. I ran a job with
{"executor-memory":"4G","executor-memoryOverhead":"2G"}
which is much more reasonable than before, and it also finished in reasonable time.

Screenshot of current state:

Image

@jdries jdries closed this as completed Mar 8, 2023