Slow to respond concurrent requests for new shapes #1785

msfstef · 2024-10-03T07:42:29Z

When multiple concurrent requests for new shapes come in, requests and responses are all routed through the `ShapeCache`, which causes delays and floods its process mailbox. Requests can instead be routed to the individual shape worker processes, which can handle their needs themselves, such as waiting for snapshots or requesting a log.

The goal is to move stuff off of the critical path of the requests and enable quick responses, even if the responses are 429s because of e.g. pool connection resource limitations.
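As a rough illustration of the routing idea only (not Electric's actual code), the sketch below registers one worker process per shape in a `Registry` and sends calls straight to that worker, so no single process sits between the requester and the shape. The names `ShapeWorker`, `ShapeRegistry`, and `await_snapshot/1` are hypothetical.

```elixir
defmodule ShapeWorker do
  # Hypothetical per-shape worker; each shape gets its own process and mailbox.
  use GenServer

  def start_link(shape_id) do
    GenServer.start_link(__MODULE__, shape_id, name: via(shape_id))
  end

  # Callers address the worker by shape id instead of going through a
  # central cache process.
  def await_snapshot(shape_id) do
    GenServer.call(via(shape_id), :await_snapshot)
  end

  # Looks the worker up in a Registry named ShapeRegistry (an assumed name).
  defp via(shape_id), do: {:via, Registry, {ShapeRegistry, shape_id}}

  @impl true
  def init(shape_id), do: {:ok, shape_id}

  @impl true
  def handle_call(:await_snapshot, _from, shape_id) do
    # A real worker would wait for its snapshot here; this one replies at once.
    {:reply, {:ok, shape_id}, shape_id}
  end
end
```

With this layout a slow shape only backs up its own mailbox, and requests for other shapes are unaffected.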


msfstef · 2024-10-03T15:46:17Z

#1787 should address sending requests directly to the shape workers

There's more work that's easily done to remove responsibilities from ShapeCache, but I'd rather do it after merging the above PR.

@robacourt

Addresses #1785 and partially addresses #1770.

Moves a lot of the operations that went through `ShapeCache` directly into the `Shape.Consumer`, so that requests can be replied to directly from the shape consumers rather than flooding the `ShapeCache` with casts that take a while to reach the requesters.

I've tried to keep changes to a minimum in order to do this incrementally and keep these PRs easily reviewable - the `ShapeStatus` still persists data on every call, the relations and truncates still go through `ShapeCache` rather than individual shapes, etc.

I've also caught the `DBConnection.ConnectionError`s for queue timeouts and converted them to 429 errors. We also need to handle `GenServer.call` timeouts, as sometimes the PG query might not fail but take longer than the default 5 seconds for the GenServer call.

NOTE: I have not updated any tests yet as I first want to ensure people agree with the approach.

PERFORMANCE CHECK:
- On my local machine, using in-memory stores, running 1000 concurrent new shape connections consistently took ~20sec with these changes, compared to ~33sec on main, so a ~30% improvement.
- I was also able to successfully run 10k concurrent connections with this, although it took ~10min to serve, but on main I wasn't able to run it successfully (@robacourt I think that was the case for you too?) - at least we know it does not get into an unrecoverable state.
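The `GenServer.call` timeout mentioned above surfaces in the caller as an exit rather than an exception, so it has to be caught with `catch :exit`. A minimal sketch of mapping it to a 429, assuming a hypothetical `SlowWorker` standing in for a shape consumer and a hypothetical `RequestHandler` that returns `{status, body}` tuples (neither is from the PR):

```elixir
defmodule SlowWorker do
  # Hypothetical stand-in for a shape consumer whose query takes delay_ms.
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms), do: {:ok, delay_ms}

  @impl true
  def handle_call(:await_snapshot, _from, delay_ms) do
    Process.sleep(delay_ms)
    {:reply, :ready, delay_ms}
  end
end

defmodule RequestHandler do
  # Returns {http_status, body}. 5_000 ms mirrors the GenServer.call default.
  def await_snapshot(worker, timeout_ms \\ 5_000) do
    case GenServer.call(worker, :await_snapshot, timeout_ms) do
      :ready -> {200, "snapshot ready"}
    end
  catch
    # GenServer.call exits with {:timeout, {GenServer, :call, args}} when
    # no reply arrives in time; the worker itself keeps running.
    :exit, {:timeout, _} -> {429, "too busy, retry later"}
  end
end
```

Whether 429 is the right status for a call timeout, as opposed to a pool queue timeout, is a design choice this sketch simply assumes.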

msfstef added the improvement label Oct 3, 2024

msfstef self-assigned this Oct 3, 2024

msfstef mentioned this issue Oct 3, 2024

feat: Move operations from ShapeCache to Shape.Consumer #1787

Merged

balegas added this to the Electric Scales milestone Oct 3, 2024
