gpkg / gdal parallel performance less than sequential #286
Comments
After some profiling I conclude that opening the GDAL Dataset (multiple times, especially concurrently) is the bottleneck. I think that the dataset should be opened once (in the
|
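To illustrate the "open once" idea, here is a minimal sketch in plain Rust (standard library only). It is not t-rex or GDAL code: `Dataset`, `read_tile_cached` and `serve` are hypothetical stand-ins, the sleep only simulates the cost of opening a dataset, and the cache assumes a single datasource path. Each worker thread opens its own handle on first use and reuses it for later requests, so the number of opens follows the number of threads rather than the number of requests.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Counts how many times the simulated dataset has been opened.
static OPEN_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a GDAL dataset handle (not the real API).
struct Dataset {
    path: String,
}

impl Dataset {
    /// Simulates an expensive open; the sleep is an assumption for illustration.
    fn open(path: &str) -> Dataset {
        OPEN_COUNT.fetch_add(1, Ordering::SeqCst);
        thread::sleep(Duration::from_millis(20));
        Dataset { path: path.to_string() }
    }

    /// Pretends to read a tile and returns a label instead of real data.
    fn read_tile(&self, z: u32, x: u32, y: u32) -> String {
        format!("{}:{}/{}/{}", self.path, z, x, y)
    }
}

thread_local! {
    // One cached handle per thread, so handles are never shared across threads.
    static CACHED: RefCell<Option<Dataset>> = RefCell::new(None);
}

/// Opens the dataset on the first call in a thread and reuses it afterwards.
/// Assumes every call uses the same path.
fn read_tile_cached(path: &str, z: u32, x: u32, y: u32) -> String {
    CACHED.with(|slot| {
        let mut slot = slot.borrow_mut();
        let ds = slot.get_or_insert_with(|| Dataset::open(path));
        ds.read_tile(z, x, y)
    })
}

/// Runs `workers` threads with `requests_per_worker` requests each and
/// returns how many opens happened during this call.
fn serve(workers: u32, requests_per_worker: u32) -> usize {
    let before = OPEN_COUNT.load(Ordering::SeqCst);
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            thread::spawn(move || {
                for i in 0..requests_per_worker {
                    let tile = read_tile_cached("data.gpkg", 10, w, i);
                    assert!(tile.starts_with("data.gpkg:"));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    OPEN_COUNT.load(Ordering::SeqCst) - before
}

fn main() {
    let opens = serve(4, 25);
    println!("100 requests served with {} dataset opens", opens);
}
```

Whether a real GDAL handle can be kept per thread like this depends on the driver and the wrapper, so treat the pattern as a starting point for a proof of concept rather than a fix.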
Thanks for the thorough investigation! I guess that my GDAL driver implementation came earlier than this change in the Rust wrapper. It would be great to have a little proof of concept as a standalone GDAL example.
(Sorry to bump this. I should have responded earlier.) I agree that would be useful. But I don't have the skills to do that (or most things) in Rust. Or do you mean just a setup with some gpkg and a t-rex config file?
Our team has been working on this recently because we want to start serving tiles "on-the-fly" with t-rex instead of pre-tiling as we do now. We haven't found the cause in t-rex (Rust is too hard 😉), but we did make a setup for a multi-process t-rex, with an HTTP server in front. This circumvents the bottleneck. But of course it's not ideal: multithreading (as already supported in t-rex) should be more efficient than spinning up multiple processes. Perhaps you want to take a look at the difference. We made a simple test setup here. On my machine there's a factor ~5 difference in favour of multi-processing (same underlying resources).
Thanks for the test setup! Is this |
Sure. It's public topography data (from here).
…ocess setup behind a lighttpd reverse proxy. Due to t-rex-tileserver/t-rex#286 (comment)
When serving tiles from a geopackage (~85 GiB, ~20 layers) I see that the response times are drastically higher when requests are made in parallel. E.g. when I request 6 tiles in parallel the response time is ~8000 ms, versus ~300 ms when I request them one after another.
I've tried looking at the source code (I'm a Rust rookie) but I couldn't find a hint yet. I've also tried supplying the NOLOCK open option, thinking that the delay might be the overhead of a locked GPKG being opened by multiple clients at the same time, even though it should be possible to open an SQLite db concurrently. That helped a little, bringing it down to ~7000 ms.
Does anyone have an idea where the delay in response times might originate?
Edit: connection is already opened read-only by default. Opening shared would be unsafe.
Edit2: Made an image with gdal 3.4.2 and t-rex to try out NOLOCK.
Edit3: The same effect happens with a FlatGeobuf datasource.