
feat: Move operations from ShapeCache to Shape.Consumer #1787

Merged
merged 7 commits into main on Oct 9, 2024

Conversation

msfstef
Contributor

@msfstef msfstef commented Oct 3, 2024

Addresses #1785 and partially addresses #1770

Moves a lot of the operations that went through ShapeCache directly into the Shape.Consumer, so that requests can be replied to directly from the shape consumers rather than flooding the ShapeCache with casts that take a while to reach the requesters.

I've tried to keep changes to a minimum in order to do this incrementally and keep these PRs easily reviewable - the ShapeStatus still persists data on every call, the relations and truncates still go through ShapeCache rather than individual shapes, etc.

I've also caught the DBConnection.ConnectionErrors for queue timeouts and converted them to 429 errors.
We also need to handle GenServer.call timeouts, as sometimes the PG query might not fail but take longer than the default 5 seconds for the GenServer call.
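The two failure modes described above could be handled roughly as in the following sketch - note this is illustrative only, and the module and function names (`call_with_overload_handling/2`) are hypothetical, not taken from the actual diff:

```elixir
# Hedged sketch: convert pool-overload and call-timeout failures to 429s.
# `call_with_overload_handling/2` is a hypothetical helper, not the PR's code.
defp call_with_overload_handling(conn, fun) do
  try do
    fun.()
  rescue
    DBConnection.ConnectionError ->
      # Pool queue timeout: the connection pool is exhausted,
      # so ask the client to back off and retry.
      Plug.Conn.send_resp(conn, 429, "Too many concurrent requests")
  catch
    :exit, {:timeout, {GenServer, :call, _}} ->
      # The GenServer.call itself timed out (default 5s) even though
      # the underlying PG query may not have failed.
      Plug.Conn.send_resp(conn, 429, "Too many concurrent requests")
  end
end
```

The `catch :exit` clause matters because a `GenServer.call` timeout surfaces as a process exit rather than a raised exception, so a `rescue` alone would not see it.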

NOTE: I have not updated any tests yet as I first want to ensure people agree with the approach

PERFORMANCE CHECK:

  • On my local machine, using in-memory stores, running 1000 concurrent new shape connections consistently took ~20sec with these changes, compared to ~33sec on main - a ~30% improvement.
  • I was also able to successfully run 10k concurrent connections with this, although it took ~10min to serve; on main I wasn't able to run it successfully at all (@robacourt I think that was the case for you too?) - at least we know it does not get into an unrecoverable state.

@msfstef msfstef requested review from alco and robacourt October 3, 2024 09:30

netlify bot commented Oct 3, 2024

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit edf9eed
🔍 Latest deploy log https://app.netlify.com/sites/electric-next/deploys/66feb82caa29010008b9c0e0
😎 Deploy Preview https://deploy-preview-1787--electric-next.netlify.app

@@ -293,9 +296,12 @@ defmodule Electric.ShapeCache do
{:reply, :started, [], state}

true ->
Logger.debug("Starting a wait on the snapshot #{shape_id} for #{inspect(from)}}")
GenServer.cast(
Contributor Author

We could avoid going through the shape cache and directly go to a shape consumer, but for the sake of incrementally refactoring this I kept it like this as I think there are more concurrency considerations in the other case

Consumer.Supervisor.clean_and_stop(%{
electric_instance_id: electric_instance_id,
shape_id: shape_id
})
Contributor Author

I was unsure if we need to also have a fallback here to forcefully do DynamicSupervisor.stop(name, pid) - I figured if the Supervisor is alive it should be able to handle its own shutdown gracefully

Member

I'm not entirely sure what you're referring to here. The code feels weird in the sense that the passed name argument is not used but a name is constructed for the consumer process. It's not your changes, it was weird before them.

It's not important to dwell on this right now, IMO. We can clear up process lifecycles at another incremental step of the overall move from ShapeCache to individual shape process trees.

Contributor Author

I think the idea was that we would terminate processes via the dynamic supervisor, hence the name being passed - but for some reason we check if the consumer is alive (rather than handling the error from the DynamicSupervisor)

I've changed things around such that processes clean up after themselves, so an external termination would only be a fallback

Either way, I agree that we should do these in steps to keep things reviewable!

Member

@alco alco left a comment

Looking great so far 👍

@msfstef msfstef marked this pull request as ready for review October 3, 2024 15:27
@msfstef
Contributor Author

msfstef commented Oct 3, 2024

@balegas PR should be ready. @robacourt has been kind enough to kick off a benchmark run for this PR, so we can see results on Monday and hopefully merge (?). I expect an improvement in concurrent shape creation and no regressions on other benchmarks - let's see.

@balegas
Contributor

balegas commented Oct 3, 2024

Yeah, looking forward to that! If the benchmarks show anything odd, we'll be glad we ran them beforehand :).

Contributor

@robacourt robacourt left a comment

Great work! This shows a massive speed improvement for concurrent shape creation while the other benchmarks remain unchanged:
[Screenshot: benchmark results, 2024-10-08]

@msfstef msfstef merged commit edb0f72 into main Oct 9, 2024
23 checks passed
@msfstef msfstef deleted the msfstef/handle-overloads-gracefully branch October 9, 2024 08:17
msfstef added a commit that referenced this pull request Oct 10, 2024
Fixes #1770

With #1787 we've managed to
return 429s whenever there's too many concurrent shape creations that
cause the database connection pool to be exhausted.

This PR just ensures that the client does indeed retry on 429s - for now
just with our regular exponential backoff, as there is no standard for
retry headers to respect.
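The retry behaviour described above - exponential backoff with no retry headers to consult - could be sketched as follows. The actual client is TypeScript; this is an illustrative Elixir sketch with hypothetical names (`request_with_backoff/2` and the delay constants are assumptions):

```elixir
# Hedged sketch of retry-on-429 with exponential backoff and jitter.
# Names and constants are illustrative, not the client's actual code.
defmodule BackoffExample do
  @initial_delay_ms 100
  @max_delay_ms 10_000

  def request_with_backoff(fun, attempt \\ 0) do
    case fun.() do
      {:ok, resp} ->
        {:ok, resp}

      {:error, %{status: 429}} ->
        # Double the delay on each attempt, capped at the maximum,
        # with some jitter so retrying clients don't synchronise.
        delay = min(@max_delay_ms, @initial_delay_ms * Integer.pow(2, attempt))
        Process.sleep(delay + :rand.uniform(div(delay, 2)))
        request_with_backoff(fun, attempt + 1)

      {:error, other} ->
        {:error, other}
    end
  end
end
```

The jitter is the important detail: without it, every client that received a 429 at the same moment would retry at the same moment, recreating the overload the 429 was meant to shed.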

P.S. additional changes to the openapi spec done by my formatter 👀 I can
roll them back if you think they are worse than before