
gpkg / gdal parallel performance less than sequential #286

Open
roelarents opened this issue Apr 19, 2022 · 6 comments

Comments

@roelarents

roelarents commented Apr 19, 2022

When serving tiles from a GeoPackage (~85 GiB, ~20 layers), I see that response times are drastically higher when requests are made in parallel. For example, when I send test requests for 6 tiles in parallel, the response time is ~8000 ms, versus ~300 ms when I send them one after another.
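To make the comparison concrete, here is a minimal sketch of a harness that measures per-request latency sequentially and in parallel. It is hypothetical: `fetch_tile` is a placeholder that just sleeps, whereas a real measurement would send an HTTP request to the running t-rex server. With this placeholder, parallel and sequential latencies stay roughly equal; the problem reported above is that with t-rex the parallel latencies are many times higher.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one tile request; a real test would issue
/// an HTTP GET for tile (z, x, y) against the t-rex server instead.
fn fetch_tile(_z: u8, _x: u32, _y: u32) {
    thread::sleep(Duration::from_millis(50));
}

/// Send the requests one after another; return each request's latency.
fn sequential(tiles: &[(u8, u32, u32)]) -> Vec<Duration> {
    tiles
        .iter()
        .map(|&(z, x, y)| {
            let start = Instant::now();
            fetch_tile(z, x, y);
            start.elapsed()
        })
        .collect()
}

/// Send all requests at once (one thread each); return each request's latency.
fn parallel(tiles: &[(u8, u32, u32)]) -> Vec<Duration> {
    let handles: Vec<_> = tiles
        .iter()
        .map(|&(z, x, y)| {
            thread::spawn(move || {
                let start = Instant::now();
                fetch_tile(z, x, y);
                start.elapsed()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

/// Mean latency of a non-empty list of measurements.
fn mean(v: &[Duration]) -> Duration {
    v.iter().sum::<Duration>() / v.len() as u32
}

fn main() {
    // Six hypothetical tile coordinates (z, x, y).
    let tiles: [(u8, u32, u32); 6] = [
        (14, 8418, 5384), (14, 8419, 5384), (14, 8420, 5384),
        (14, 8418, 5385), (14, 8419, 5385), (14, 8420, 5385),
    ];
    println!("sequential mean: {:?}", mean(&sequential(&tiles)));
    println!("parallel mean:   {:?}", mean(&parallel(&tiles)));
}
```

Reporting the per-request latency (rather than only the total wall time) is what exposes the effect described in this issue: parallel requests should not each take longer than a single sequential one.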

I've tried looking at the source code (I'm a Rust rookie), but I haven't found a clue yet. I've also tried supplying the NOLOCK open option, thinking that the delay might be the overhead of a locked GPKG being opened by multiple clients at the same time, even though it should be possible to open an SQLite database concurrently. That helped a little, bringing it down to ~7000 ms.

Does anyone have an idea where the delay in response times might originate?

Edit: the connection is already opened read-only by default. Opening it shared would be unsafe.
Edit2: Made an image with GDAL 3.4.2 and t-rex to try out NOLOCK.
Edit3: The same effect happens with a FlatGeobuf datasource.

roelarents changed the title from "gpkg parallel performance less than sequential" to "gpkg / gdal parallel performance less than sequential" on Apr 21, 2022
@roelarents
Author

roelarents commented Apr 22, 2022

After some profiling, I conclude that opening the GDAL Dataset (multiple times, especially concurrently) is the bottleneck. I think the dataset should be opened once (in the connected or new method) and then shared using Rust's Send concept. However:

  • I'm very new to Rust. I'm trying to figure it out, but it will take time, if it is possible or feasible at all.
  • In this comment I see that the Dataset was explicitly not shared because it needed to be mutable (mut). I'm not sure that's still relevant, though.
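A minimal sketch of the "open once, reuse" idea, assuming a GDAL handle may be moved to another thread but must not be used by two threads at the same time (GDAL dataset handles are generally not safe for concurrent use). Instead of one shared handle, each worker thread lazily opens its own handle on first use and reuses it for every later request. `Dataset`, `serve_tile` and `OPEN_COUNT` are hypothetical stand-ins, not t-rex or gdal crate APIs; the counter only exists to show how many expensive opens happen.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how often the (hypothetical) expensive open is performed.
static OPEN_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a GDAL dataset handle.
struct Dataset {
    path: String,
}

impl Dataset {
    /// Stands in for the costly open of a large GeoPackage.
    fn open(path: &str) -> Dataset {
        OPEN_COUNT.fetch_add(1, Ordering::SeqCst);
        Dataset { path: path.to_string() }
    }

    /// Stands in for reading the features of one tile.
    fn read_tile(&self, z: u8, x: u32, y: u32) -> String {
        format!("{}:{}/{}/{}", self.path, z, x, y)
    }
}

thread_local! {
    // One handle per worker thread, created on first use, never shared.
    static DATASET: RefCell<Option<Dataset>> = RefCell::new(None);
}

/// Serve a tile using this thread's handle, opening it only the first time.
fn serve_tile(path: &str, z: u8, x: u32, y: u32) -> String {
    DATASET.with(|cell| {
        let mut slot = cell.borrow_mut();
        let ds = slot.get_or_insert_with(|| Dataset::open(path));
        ds.read_tile(z, x, y)
    })
}

fn main() {
    // 4 worker threads serve 25 tiles each: 100 requests in total.
    let workers: Vec<_> = (0..4u32)
        .map(|w| {
            thread::spawn(move || {
                for i in 0..25u32 {
                    let _tile = serve_tile("all.gpkg", 14, w, i);
                }
            })
        })
        .collect();
    for h in workers {
        h.join().unwrap();
    }
    // One open per worker thread instead of one per request.
    println!("opens: {} for 100 requests", OPEN_COUNT.load(Ordering::SeqCst));
}
```

A per-thread handle avoids both repeated opens and lock contention. Wrapping a single handle in a `Mutex` would also be safe, but it would serialize all tile reads, which is exactly the behaviour this issue is trying to avoid.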

@pka
Member

pka commented Apr 22, 2022

Thanks for the thorough investigation! I guess that my GDAL driver implementation came earlier than this change in the Rust wrapper. Would be great to have a little proof of concept as standalone GDAL example.

@roelarents
Author

> Would be great to have a little proof of concept as standalone GDAL example.

(Sorry to bump this; I should have responded earlier.) I agree that would be useful, but I don't have the skills to do that (or most things) in Rust. Or do you just mean a setup with a GPKG and a t-rex config file?

@roelarents
Author

Our team has been working on this recently because we want to start serving tiles on the fly with t-rex instead of pre-tiling, as we do now. We haven't found the cause in t-rex (Rust is too hard 😉), but we did make a multi-process t-rex setup with an HTTP server in front. This circumvents the bottleneck. Of course it's not ideal: multithreading (already supported in t-rex) should be more efficient than spinning up multiple processes.

Perhaps you want to take a look at the difference. We made a simple test setup here. On my machine, the multi-process setup is roughly 5 times faster than the multithreaded one (with the same underlying resources).
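The front server's job in such a multi-process setup is to spread incoming tile requests over several t-rex processes. The thread does not say which HTTP server or balancing policy the test setup uses, so the following is only a sketch assuming simple round-robin selection; `RoundRobin` and the backend URLs are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical round-robin selector over N t-rex backend processes.
struct RoundRobin {
    backends: Vec<String>,
    next: AtomicUsize,
}

impl RoundRobin {
    fn new(backends: Vec<String>) -> Self {
        assert!(!backends.is_empty(), "need at least one backend");
        RoundRobin { backends, next: AtomicUsize::new(0) }
    }

    /// Pick the backend for the next request. The atomic counter makes this
    /// safe to call from many request-handling threads at once.
    fn pick(&self) -> &str {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.backends.len();
        &self.backends[i]
    }
}

fn main() {
    // Hypothetical: three t-rex processes listening on consecutive ports.
    let rr = RoundRobin::new(vec![
        "http://127.0.0.1:6767".to_string(),
        "http://127.0.0.1:6768".to_string(),
        "http://127.0.0.1:6769".to_string(),
    ]);
    for _ in 0..4 {
        println!("forward to {}", rr.pick());
    }
}
```

Because each process opens its own datasets, this setup sidesteps the shared-open bottleneck at the cost of duplicated memory and process overhead, which is why in-process multithreading would still be preferable once the bottleneck is fixed.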

@pka
Member

pka commented Jun 20, 2023

Thanks for the test setup! Is this all.gpkg free to use in a different repo to implement the same test?

@roelarents
Author

Sure. It's public topography data (from here).

RoelvandenBerg pushed a commit to PDOK/trex-docker that referenced this issue Jun 22, 2023