feat: Move operations from ShapeCache to Shape.Consumer #1787
Conversation
✅ Deploy Preview for electric-next ready!
@@ -293,9 +296,12 @@ defmodule Electric.ShapeCache do
        {:reply, :started, [], state}

      true ->
        Logger.debug("Starting a wait on the snapshot #{shape_id} for #{inspect(from)}")
        GenServer.cast(
We could avoid going through the shape cache and go directly to a shape consumer, but for the sake of refactoring incrementally I've kept it like this, as I think the other case has more concurrency considerations.
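For context, a hedged sketch of the deferred-reply pattern this diff is part of: the cache forwards the caller's `from` tuple to the shape's consumer via `GenServer.cast/2` instead of replying itself, and the consumer replies once the snapshot is ready. The module, state shape, and message names below are illustrative, not the PR's actual code.

```elixir
# Minimal sketch, assuming the cache has already done something like:
#   GenServer.cast(consumer, {:await_snapshot_start, from})
# from inside its handle_call, returning {:noreply, state}.
defmodule SnapshotWaiterSketch do
  use GenServer

  def init(_), do: {:ok, %{waiters: []}}

  # Consumer side: a cast from the cache registers a waiter without blocking.
  def handle_cast({:await_snapshot_start, from}, state) do
    {:noreply, %{state | waiters: [from | state.waiters]}}
  end

  # When the snapshot is ready, reply to every registered waiter directly,
  # bypassing the cache entirely.
  def handle_info(:snapshot_started, state) do
    Enum.each(state.waiters, &GenServer.reply(&1, :started))
    {:noreply, %{state | waiters: []}}
  end
end
```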
packages/sync-service/lib/electric/shapes/consumer/supervisor.ex
Consumer.Supervisor.clean_and_stop(%{
  electric_instance_id: electric_instance_id,
  shape_id: shape_id
})
I was unsure whether we also need a fallback here to forcefully do DynamicSupervisor.stop(name, pid). I figured that if the Supervisor is alive, it should be able to handle its own shutdown gracefully.
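For reference, a minimal sketch of what such a fallback could look like. This assumes `clean_and_stop/1` returns `:ok | {:error, reason}` (an assumption, not the PR's actual contract), and the standard OTP call for external termination is `DynamicSupervisor.terminate_child/2`; `Electric.DynamicConsumerSupervisor` stands in for whatever supervisor name is actually used.

```elixir
# Hedged sketch: graceful stop first, forceful termination only as a fallback.
case Consumer.Supervisor.clean_and_stop(%{
       electric_instance_id: electric_instance_id,
       shape_id: shape_id
     }) do
  :ok ->
    :ok

  {:error, _reason} ->
    # Look up the consumer supervisor's pid and terminate it externally.
    case GenServer.whereis(name) do
      nil -> :ok
      pid -> DynamicSupervisor.terminate_child(Electric.DynamicConsumerSupervisor, pid)
    end
end
```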
I'm not entirely sure what you're referring to here. The code feels weird in the sense that the passed `name` argument is not used, but a name is constructed for the consumer process. It's not your changes; it was weird before them.

It's not important to dwell on this right now, IMO. We can clear up process lifecycles at another incremental step of the overall move from ShapeCache to individual shape process trees.
I think the idea was that we would terminate processes via the dynamic supervisor, hence the `name` being passed - but for some reason we check whether the consumer is alive rather than handling the error from the DynamicSupervisor.

I've changed things around so that processes clean up after themselves, and an external termination would only be a fallback (see the sketch below).

Either way, I agree that we should do these in steps to keep things reviewable!
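A minimal sketch, under stated assumptions, of the "clean up after yourself" shape described above; `cleanup/1` and the `:clean_and_stop` message are hypothetical names, not the PR's actual API.

```elixir
# Hypothetical: the consumer handles its own shutdown message, removes its
# state, and stops normally, so external termination is only a fallback.
def handle_cast(:clean_and_stop, state) do
  cleanup(state)
  {:stop, :normal, state}
end

# Safety net: terminate/2 runs on a :stop return from a callback, and on
# external exits if the process traps exits.
def terminate(_reason, state) do
  cleanup(state)
end
```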
Looking great so far 👍
Force-pushed from dd9738c to edf9eed
@balegas the PR should be ready. @robacourt has been kind enough to kick off a benchmark run for this PR, so we can see results on Monday and hopefully merge. I expect an improvement in concurrent shape creation and no regressions on the other benchmarks - let's see.
Yeah, looking forward to that! If the benchmarks show anything odd, we'll be happy to have run them beforehand :).
Fixes #1770. With #1787 we've managed to return 429s whenever too many concurrent shape creations exhaust the database connection pool. This PR just ensures that the client does indeed retry on 429s - for now with our regular exponential backoff, as there is no standard retry header to respect.

P.S. My formatter made additional changes to the OpenAPI spec 👀 I can roll them back if you think they are worse than before.
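A minimal sketch of the retry behaviour described above, written in Elixir for consistency with the rest of this thread (the actual change is in the client); `request_fun`, the delay constants, and the jitter scheme are assumptions for illustration.

```elixir
defmodule BackoffSketch do
  @base_delay_ms 100
  @max_delay_ms 10_000

  # `request_fun` is a stand-in for the actual HTTP call; it should return
  # {:ok, %{status: status, ...}} or an error tuple.
  def request_with_backoff(request_fun, attempt \\ 0) do
    case request_fun.() do
      {:ok, %{status: 429}} ->
        # No standard retry header to respect, so back off exponentially,
        # with jitter, and try again.
        delay = min(@base_delay_ms * Integer.pow(2, attempt), @max_delay_ms)
        Process.sleep(delay + :rand.uniform(div(delay, 2) + 1))
        request_with_backoff(request_fun, attempt + 1)

      other ->
        other
    end
  end
end
```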
Addresses #1785 and partially addresses #1770.

Moves a lot of the operations that went through `ShapeCache` directly into the `Shape.Consumer`, so that requests can be replied to directly from the shape consumers rather than flooding the `ShapeCache` with casts that take a while to reach the requesters.

I've tried to keep changes to a minimum in order to do this incrementally and keep these PRs easily reviewable: the `ShapeStatus` still persists data on every call, the relations and truncates still go through `ShapeCache` rather than individual shapes, etc.

I've also caught the `DBConnection.ConnectionError`s for queue timeouts and converted them to 429 errors. We also need to handle `GenServer.call` timeouts, as sometimes the PG query might not fail but takes longer than the default 5 seconds for the GenServer call.

NOTE: I have not updated any tests yet, as I first want to ensure people agree with the approach.
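A hedged sketch (not the PR's actual code) of mapping both failure modes to a 429-style error: pool-queue exhaustion surfaces as a raised `DBConnection.ConnectionError`, while a slow-but-eventually-successful query can instead exceed the call timeout, which surfaces as an exit in the caller. The module, function, and message names are hypothetical.

```elixir
defmodule TimeoutTo429Sketch do
  # Hypothetical helper: returns {:error, :too_many_requests}, which the HTTP
  # layer could then render as a 429.
  def await_snapshot(consumer, shape_id) do
    try do
      GenServer.call(consumer, {:await_snapshot_start, shape_id}, 15_000)
    rescue
      # Raised when the DB pool's queue times out before a connection frees up.
      _ in DBConnection.ConnectionError -> {:error, :too_many_requests}
    catch
      # The call itself timing out surfaces as an exit in the caller.
      :exit, {:timeout, {GenServer, :call, _}} -> {:error, :too_many_requests}
    end
  end
end
```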
PERFORMANCE CHECK: