Periodic Job enqueuing & leader election with multiple different river clients #343

magaldima · 2024-05-07T16:18:36Z

magaldima
May 7, 2024

While developing with River (v0.4.1), I've observed the (expected) behavior that periodic jobs are only enqueued by the elected leader, which I now understand to be part of the leader's maintenance service responsibilities.

The "problem" that I'm experiencing is that I'm attempting to use river (and initialize River Clients) across a few different services that all have different workers and different river periodic job configurations and are mostly isolated by different queues. So if I have service A with a periodic job that executes every hour and service B with a different periodic job, if service B is the elected leader for more than 1 hour, service A's periodic job will not be enqueued per its configured schedule.
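To make the scenario concrete, here is a toy model of the behavior described above. It does not use River's API: `Client`, `EnqueuedThisPeriod`, and `Missed` are hypothetical names, and the only assumption is the one stated in this post, that the elected leader enqueues just the periodic jobs from its own configuration.

```go
package main

import "sort"

// Client is a toy stand-in for a River client (hypothetical type, not
// River's API): it only knows the periodic jobs configured in its own service.
type Client struct {
	Service      string
	PeriodicJobs []string
}

// EnqueuedThisPeriod returns the periodic jobs that get enqueued while the
// given client holds leadership: only the leader's own configuration counts.
func EnqueuedThisPeriod(leader Client) []string {
	out := append([]string(nil), leader.PeriodicJobs...)
	sort.Strings(out)
	return out
}

// Missed lists jobs configured on some client that the current leader does
// not know about and therefore never enqueues.
func Missed(leader Client, all []Client) []string {
	known := make(map[string]bool)
	for _, j := range leader.PeriodicJobs {
		known[j] = true
	}
	var missed []string
	for _, c := range all {
		for _, j := range c.PeriodicJobs {
			if !known[j] {
				missed = append(missed, j)
				known[j] = true // report each missed job only once
			}
		}
	}
	sort.Strings(missed)
	return missed
}
```

With service B as leader, `Missed` reports service A's hourly job, which is exactly the gap described above.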

This is less of an issue and more of a question around how River was designed and if this use case was considered. Presumably, several services sharing a background job queue is a common pattern and we'd like for each service to contain the logic of processing jobs specific to that service. Does this necessitate that scheduled jobs should be declared and managed by a single service or that all the different applications that create a River client need to use the same River config (with the same set of periodic jobs including those that are not in scope for the specific service)?

The other workaround currently under discussion is creating separate DB schemas per service and instantiating the River tables in each schema so that each service has its own elected leader. On the surface this seems like something River should be able to handle and I'm worried about the increased overhead with maintaining separate DB schemas for each application.

I was hoping you would be able to share insights/context into River's design: was this something that was considered at any point in time, how did you think about it, and is it on your radar/roadmap? Curious what your recommendation is for a solution: consolidating job definitions vs. separate DB schemas (or something else entirely)?

Another last idea that I had was to create a new DB table in River river_periodic_job where the definition/configuration for periodic jobs could live and be shared across all services in a way that the river client configuration state can be pushed into the DB layer. In this way, the elected leader would have visibility into all periodic jobs and be able to enqueue jobs based on the periodic jobs defined in that DB table. Curious your thoughts on this idea as well - happy to help put forth a proposal and contribute time/work if you think this feature would be doable.

Thanks for the time for reading that through! I've been using River for a few months now and have loved it. If anything, this "problem" has arisen out of my exuberance to extend River's capabilities to many of our backend services.

elee1766 · 2024-05-08T01:58:13Z

elee1766
May 8, 2024

> that all the different applications that create a River client need to use the same River config (with the same set of periodic jobs including those that are not in scope for the specific service)?

not the exact same river config, but yes, sharing the same scheduled_job across both workers and insert only client is what we are doing.

i ran into a similar problem here, you can see a response: #336 (comment)

bgentry · 2024-05-08T02:18:09Z

bgentry
May 8, 2024
Maintainer

Hi @magaldima, there's a lot in here and it will probably be fairly open-ended, so I converted it to a discussion for now. There may still be some fixable issues to come out of the conversation though so don't consider it a dismissal 🙂

This is definitely an area where we could use better documentation, particularly including some architectural diagrams.

River provides some isolation between queues of different names operating in the same schema and database, but only really as far as which jobs get inserted & worked on the named queues. Higher level Client functionality like leader election and maintenance services (including periodic job insertion, job cleaning, etc) are done at the schema level and are not isolated per-queue.

While you will get some isolation with different services enqueueing and running their jobs on different named queues, it's not full isolation, and really you should have e.g. periodic jobs configured the same across all started clients in a single Postgres (database, schema/search_path) pair. This setup of course means your services need to contain each other's code, at least in order to be able to enqueue any of the periodic jobs configured across all of them. Definitely not ideal!
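One way to picture "configured the same across all started clients" is a small shared package that every service imports, so each client starts with the union of all periodic job definitions. The sketch below is library-agnostic and makes assumptions: `PeriodicSpec`, `MergeSpecs`, and the example job names are hypothetical, and translating the merged list into the `PeriodicJobs` field of River's client config is left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// PeriodicSpec is a hypothetical, library-agnostic description of one
// periodic job. It is not a River type.
type PeriodicSpec struct {
	Name     string
	Queue    string
	Interval time.Duration
}

// Example per-service definitions (made-up names for illustration).
var (
	ServiceASpecs = []PeriodicSpec{{Name: "a_hourly_report", Queue: "service_a", Interval: time.Hour}}
	ServiceBSpecs = []PeriodicSpec{{Name: "b_nightly_cleanup", Queue: "service_b", Interval: 24 * time.Hour}}
)

// MergeSpecs returns the union of all services' periodic jobs, sorted by
// name. Identical duplicates are collapsed; two different definitions under
// the same name are reported as an error so config drift is caught at startup.
func MergeSpecs(perService ...[]PeriodicSpec) ([]PeriodicSpec, error) {
	byName := make(map[string]PeriodicSpec)
	for _, specs := range perService {
		for _, s := range specs {
			if prev, ok := byName[s.Name]; ok && prev != s {
				return nil, fmt.Errorf("conflicting definitions for periodic job %q", s.Name)
			}
			byName[s.Name] = s
		}
	}
	merged := make([]PeriodicSpec, 0, len(byName))
	for _, s := range byName {
		merged = append(merged, s)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].Name < merged[j].Name })
	return merged, nil
}
```

Every service would call `MergeSpecs(ServiceASpecs, ServiceBSpecs)` at startup, so whichever client wins the election knows about all periodic jobs. The cost is the coupling mentioned above: each service has to be able to build the job args for the others.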

We have also been working on full support for running totally isolated River instances across different Postgres schemas/search paths within a single database. I'm not certain we're 100% of the way there yet, but we are at least pretty close as of the v4 migration changes. One of the ways we intend to confirm we're there is to migrate River's test suite from handing out independent databases per-test to instead using schemas.

With a schema-per-service setup, you would be able to have independent leaders per service, as well as different periodic job configs per service. If you decide to experiment with this, please let us know what you find as it might already work well! In theory it should "just work" by setting the search_path connection param in your pgx config per jackc/pgx#1013 (comment).
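As a rough sketch of the schema-per-service idea, the helper below adds a `search_path` parameter to a URL-style Postgres connection string using only the standard library. `WithSearchPath` is a hypothetical helper, and the assumption (per the pgx comment linked above) is that pgx forwards `search_path` from the connection string to the session; whether River is fully isolated per schema is still the open question described here.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// WithSearchPath returns a copy of a URL-style Postgres DSN with the
// search_path parameter set to the given schema. Existing query parameters
// are preserved; an existing search_path is overwritten.
func WithSearchPath(dsn, schema string) (string, error) {
	if schema == "" {
		return "", errors.New("schema must not be empty")
	}
	u, err := url.Parse(dsn)
	if err != nil {
		return "", fmt.Errorf("parse dsn: %w", err)
	}
	if u.Scheme != "postgres" && u.Scheme != "postgresql" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	q := u.Query()
	q.Set("search_path", schema)
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}
```

Each service would then build its pool from its own DSN, for example `WithSearchPath(baseDSN, "service_a")`, after the River migrations have been applied inside that schema.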

> Another last idea that I had was to create a new DB table in River river_periodic_job where the definition/configuration for periodic jobs could live and be shared across all services in a way that the river client configuration state can be pushed into the DB layer. In this way, the elected leader would have visibility into all periodic jobs and be able to enqueue jobs based on the periodic jobs defined in that DB table. Curious your thoughts on this idea as well - happy to help put forth a proposal and contribute time/work if you think this feature would be doable.

We have some plans for a more robust periodic job implementation that allows for perfect gap-free scheduling even as leaders come & go. I'm hoping it's something we'll start making progress on during the next few months, but it's hard to be sure since we're working on shipping a lot of new stuff (most notably a UI).

> Thanks for the time for reading that through! I've been using River for a few months now and have loved it. If anything, this "problem" has arisen out of my exuberance to extend River's capabilities to many of our backend services.

Thank you, this is great to hear 😄 ❤️

magaldima · 2024-05-08T03:15:33Z

magaldima
May 8, 2024
Author

Hi @bgentry and thanks for the additional information. While the schema-per-service setup may work I'm not sure how viable the option is given the overhead in managing n schemas.

As for the plans mentioned around more robust periodic job scheduling, is there any additional info or resources that you can share? Will that solution also address the "problem" around periodic job state living in the river config? Thanks!

1 reply

bgentry May 8, 2024
Maintainer

> As for the plans mentioned around more robust periodic job scheduling, is there any additional info or resources that you can share? Will that solution also address the "problem" around periodic job state living in the river config? Thanks!

Not really at this time, we haven't put a ton of thought into how it will work. Mostly we know we'd like to eventually address the shortcomings in the current design like potential gaps or duplicates that occur if a leader change happens right around the time of a periodic job being inserted, and potentially allowing for periodic jobs to be managed by code instead of being essentially hardcoded at init time.

brandur · 2024-05-08T05:24:32Z

brandur
May 8, 2024
Maintainer

@magaldima Can you share anything broadly about the use case you have that requires that periodic jobs be so dynamic?

River's definitely constructed in such a way that you're assumed to have most of the information about the periodic jobs you want at compile time. You can add more dynamically, but the set is still more static than dynamic, which for most cases of periodic jobs I've worked with in my career at least, would seem to me a sound model. You might change some, but doing so with a new code deploy is dynamic enough to be practical.

It is possible here that you might be deviating far enough off the mainline happy path of periodic jobs that a better solution for you might be to write your own periodic scheduling module that's more aware of and smarter about the specific domain you're working with. That might sound like a large task, but it's not that bad: River's internal one is < 500 LOC (and a lot of that is fluff for testability and such).

1 reply

magaldima May 8, 2024
Author

@brandur Our use case doesn't necessitate dynamic periodic jobs - in fact, statically defined periodic jobs would be fine if a service's periodic jobs (within many other services that also used River) were guaranteed to be enqueued according to their own schedules and not based on the chance of them being the elected leader out of a group of other River clients composed of distinct/disjoint services.

Our use case is simply having different services (with different periodic jobs) encapsulate the job definition and all logic around how the job is worked where each service instantiates a River client and uses the same River queue (db database/schema). Let me know if this part isn't clear and/or outside the scope of River's initial designs.

I appreciate the idea on implementing our own periodic scheduling module. Between that option and exploring the alternate schema per service I'll report back on where we land and what makes the most sense. That being said, would love to learn more about if this use case can be generally applicable and potentially incorporated into River which @bgentry hinted towards.

magaldima · 2024-09-23T16:11:59Z

magaldima
Sep 23, 2024
Author

Hey @brandur and @bgentry, are there any updates that you can share here? We're now managing 4 DB schemas where we have deployed separate River instances. This is manageable for now, but it's starting to add technical debt to our code and prevents us from exposing a single view of all jobs running across our set of services. We would love to be able to consolidate the queue while keeping scheduled job logic separated per service within our cluster (ideally we would partition service jobs via the queue).

