
Slow loading time reported by Vroom when under heavy load #1180

Open
SimonBradley1993 opened this issue Nov 5, 2024 · 4 comments

@SimonBradley1993
Contributor

We have a setup using vroom-express deployed on large AWS EC2 machines, with Vroom handling around 500+ req/s. Under this load we're seeing frequent loading times of over 1s. Usually this loading time is in the low tens of milliseconds, up to around 100ms. OSRM doesn't report any particular slowdowns in response times. CPU usage remains low, around 20%, on all machines.

We've tried switching Vroom to libOSRM, and this exacerbated the problem, with loading times going into the multiple-second range.

Is there anything that can be improved with the loading time to resolve this issue when Vroom is under load?

Alternatively, is there any chance this value is being reported incorrectly by Vroom? Our service that interacts directly with the vroom-express instances doesn't show a slowdown in response times, but another service further up the chain seems to.

@jcoupey
Collaborator

jcoupey commented Nov 6, 2024

The reported loading time includes everything prior to actually running the solving approach:

  • parsing the JSON payload;
  • building up internal data structures;
  • computing the matrices (external calls to the routing engine);
  • precomputing various things used down the line.

In general, most of the time is spent computing the matrices, but of course if the system is under heavy load, all of the above could be slowed down. This is also very dependent on the matrix sizes and the routing setup (same machine or not, etc.).

The first thing that comes to mind would be to check whether OSRM spends more time per request under load, but you seem to have ruled that out. Next in line would be to investigate network latency, especially if OSRM is on a remote machine and/or behind a proxy that may be limiting throughput.
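One way to investigate the latency lead above is to time a number of round trips to the routing engine and compare the average against the worst case. The sketch below is a generic timing helper; the `send` callable is a stand-in for a real request (e.g. an HTTP call to OSRM's `/table` endpoint), here replaced by a short sleep so the snippet is self-contained:

```python
import time

def time_request(send, n=20):
    """Time `send()` over n calls; return (avg_ms, max_ms).

    `send` stands in for a round trip to the routing engine,
    e.g. an HTTP request to OSRM's /table endpoint.
    """
    samples = []
    for _ in range(n):
        t0 = time.monotonic()
        send()
        samples.append((time.monotonic() - t0) * 1000.0)
    return sum(samples) / len(samples), max(samples)

# Dummy workload standing in for a real OSRM call.
avg_ms, max_ms = time_request(lambda: time.sleep(0.001))
print(f"avg={avg_ms:.1f}ms max={max_ms:.1f}ms")
```

A large gap between the average and the maximum under load would point at queueing somewhere between VROOM and the routing engine (proxy, connection pool, network), rather than at the solving itself.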

@jcoupey
Collaborator

jcoupey commented Nov 6, 2024

> Alternatively, is there any chance this value is being reported incorrectly by Vroom?

The reported value is just a subtraction between two points in time, so nothing fancy or error-prone here. Again, this includes a variety of different tasks that may be impacted by many different factors.

@SimonBradley1993
Contributor Author

SimonBradley1993 commented Nov 6, 2024

Hi Julien,

Thanks for picking this up.

On the value being reported, I thought as much: that it is just a simple subtraction of two timestamps.

So, our setup has Vroom and OSRM on different machines, and I'd considered network latency and JSON parsing after reading another GitHub issue discussing slowdowns with OSRM.

This led us to deploying Vroom with libOSRM, meaning OSRM and Vroom were then on the same machine, which would rule out those factors, and we saw a significant decrease in performance for the loading step reported by Vroom.

It seems that even though the machine's CPU usage is quite low (around 20%), due to the volume of requests coming in, Vroom is struggling to perform the loading step as efficiently as it does under much lighter load.

Currently our solution is going to be to talk to OSRM directly and pass the matrices to Vroom, cutting out the loading work it has to do. Do you know if Vroom will still have to perform any computations on the matrices passed into the request?

@jcoupey
Collaborator

jcoupey commented Nov 6, 2024

> and we saw a significant decrease in performance for the loading step reported by Vroom.

Using libOSRM also requires some boilerplate to create objects in C++. I'd say whether the gain is worth the trouble depends a lot on how long your typical OSRM requests actually take. I have no data to back this thought, but if you have a lot of very small requests, this may be a lead.

> It seems that even though the machine's CPU usage is quite low, around 20%

Not sure how you measure this, but if the system load is reported as some kind of average over a certain period, then it is still possible to see low usage on average, with huge peaks where you experience slowdowns due to concurrent requests.
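The point about averaging can be illustrated with a toy example: a window-averaged utilisation metric can sit near 20% even when short bursts fully saturate the CPU. The per-second percentages below are made up:

```python
# Made-up per-second CPU utilisation over a 10-second window:
# mostly idle, with two 100% bursts where requests pile up.
samples = [5, 5, 100, 5, 5, 5, 100, 5, 5, 5]

avg = sum(samples) / len(samples)
peak = max(samples)
print(f"average={avg:.0f}% peak={peak}%")
```

Here the averaged figure looks modest while the bursts are exactly when concurrent requests would see long loading times, so a per-second (or finer) view of CPU usage during a slow request is worth capturing.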

> Currently our solution is going to be talking to OSRM directly and passing the matrices to Vroom to cut out the loading step that it has to do.

I don't really see how this would solve the problem if it is related to network or OSRM load. You'd most probably just be reproducing the same problem outside VROOM, in your additional layer.
