Naively, I had thought that for a remote proxy cube, taking advantage of gdalcubes' lazy evaluation to perform the computation in a single go would be faster than forcing it to write out to disk half-way, and then reading it back in and continuing; i.e. that the one-pass pipeline would be faster than the two-step version.
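For concreteness, here is a minimal sketch of the two patterns I mean (the collection `col`, view `v`, extent `aoi`, band name, and file names are all placeholders, not the actual reprex):

```r
library(gdalcubes)

## one pass: crop + temporal reduce composed lazily, materialized only at the final write
raster_cube(col, v) |>
  crop(extent = aoi) |>
  reduce_time("mean(B1)") |>
  write_ncdf("reduced.nc")

## two steps: write the cropped cube to disk first, then read it back and reduce
raster_cube(col, v) |>
  crop(extent = aoi) |>
  write_ncdf("cropped.nc")

ncdf_cube("cropped.nc") |>
  reduce_time("mean(B1)") |>
  write_ncdf("reduced.nc")
```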
However, in my current examples I'm seeing over a 5x improvement in runtime, from over an hour in the first case to about 12 minutes in the second. Notably, monitoring my network download rates, in the first case the rate stays steadily around 100Mb/s, whereas in the second case I see rates of 700-999Mb/s (my max), suggesting that however gdalcubes computes the temporal reduce in streaming mode is limiting how fast it can read from the network? (Though I'm not sure what part of the hardware is the bottleneck here -- CPU does not max out either.)
It kinda makes sense to me -- it's still worth doing the crop so we never download bytes we don't touch, but the temporal reduction requires all the data, so maybe it is faster to stream it to disk in big chunks or something and then process it? Do you think that's true in general, or just here?
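(For what it's worth, the only chunk-size knob I know of is the `chunking` argument of `raster_cube`; the numbers below are made up, just to show what I mean by bigger chunks:)

```r
# read the proxy cube with larger (t, y, x) chunks before reducing;
# chunk sizes here are arbitrary placeholders, not tuned values
cube <- raster_cube(col, v, chunking = c(31, 1024, 1024))
```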
This should be a reprex for the full data. (Aside: note we annoyingly need a custom url function wrapper -- I'm thrilled gdalcubes supports that even for stac_image_collection, though here I read from NASA's custom API, which is more reliable than NASA's STAC API. Also note this example reads netcdf, not COG, because that's what NASA gives us.)
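(The url wrapper bit looks roughly like this -- `stac_image_collection()` takes a `url_fun` that is applied to each asset href; the prefix below is just illustrative, the real wrapper for NASA's netcdf assets is a bit more involved, and `items` stands in for the item list returned by the API:)

```r
col <- stac_image_collection(
  items$features,
  url_fun = function(url) paste0("/vsicurl/", url)  # placeholder wrapper
)
```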