Using River in a separate database w/o transactional enqueuing #554
Replies: 2 comments 2 replies
-
Regarding your proposed algorithm: this definitely can be made to work, and is pretty much how every non-transactional job queue works, whether built on Redis (e.g. Sidekiq), RabbitMQ, or Kafka. This is how we ran things at Heroku and Stripe for many years. It works, but it's definitely a reliable source of churn/errors/bugs. You get a lot of job churn (i.e. workers are unnecessarily busy) as jobs check to see if they can run yet, realize they can't, and back off. You will see subtle bugs too — for some jobs the backoff condition will be an easy binary (e.g. does the user exist yet or not?), but it might not be so clear for others, causing a job to think it's ready to run when it isn't and to operate on incomplete/bad data. That said, there definitely are advantages to keeping high-throughput changes out of the main database. I do so at work regularly and the result is a main database that's had next to zero downtime in years of operation, which is pretty great.
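To make that churn concrete, here's a rough sketch of the check-and-back-off dance in a River worker. The job type, worker, and `users` table are hypothetical, and this simply returns an error to lean on River's normal retry backoff when the data isn't visible yet:

```go
package main

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

type SendWelcomeEmailArgs struct {
	UserID int64 `json:"user_id"`
}

func (SendWelcomeEmailArgs) Kind() string { return "send_welcome_email" }

type SendWelcomeEmailWorker struct {
	river.WorkerDefaults[SendWelcomeEmailArgs]
	appDB *pgxpool.Pool // the application database, separate from River's
}

func (w *SendWelcomeEmailWorker) Work(ctx context.Context, job *river.Job[SendWelcomeEmailArgs]) error {
	// Without transactional enqueueing the job may become workable before the
	// transaction that created the user has committed, so the worker checks.
	var exists bool
	if err := w.appDB.QueryRow(ctx,
		`SELECT EXISTS (SELECT 1 FROM users WHERE id = $1)`, job.Args.UserID,
	).Scan(&exists); err != nil {
		return err
	}
	if !exists {
		// Back off and try again later; returning an error reschedules the
		// job with retry backoff. This is the churn described above.
		return fmt.Errorf("user %d not visible yet", job.Args.UserID)
	}
	// ... actually send the welcome email ...
	return nil
}
```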
-
In addition to @brandur’s great comment, I’ll share my perspective. I see this primarily as a question about where I’d like to spend my time, with some tradeoffs and constraints that come with scale. In general I would say the transactional enqueuing model is an excellent starting point for the vast majority of users. It’s quite easy to spend a little more to scale your database, and databases can get very big these days with large instance & memory options. Compared to the time you would spend developing all your jobs so that they can robustly handle all the different failure modes with transaction rollbacks and other distributed systems failures, it’s a no-brainer for me to start here.

The approach does eventually hit some constraints due to write throughput and bloat. Some of these can be worked around quite easily to give you a lot more headroom (periodic concurrent reindexing, for example). Once you start running out of scaling headroom, you’ve probably reached a scale where it makes sense to start optimizing things, or maybe you just have a use case with a massive number of jobs and are willing to tolerate the complexity that goes along with it.

The good news is, once you hit this point it’s straightforward to migrate any individual task out of your DB-backed queue into Kafka or Redis or whatever else. At that point you will get to solve all the same distributed systems challenges you would have had to tackle if you hadn’t started with transactional enqueueing in River, except you will have significantly delayed that time investment. You’ll also only need to make that investment for specific high-throughput jobs that demand it, and can do so gradually as needed. Even once you hit this point, you may still want to utilize the transactional outbox pattern to give you transactionality before you push that work downstream into another system. Read up on the transactional outbox pattern or the “Life Beyond Distributed Transactions” paper from Amazon circa 2007 iirc.

In short, this evolution seems to me like an ideal application of pragmatic engineering tradeoffs, and it’s why I recommend this model as the default for most use cases 🙂 If your use case is something where you’re going to be ingesting thousands of events per second from an analytics service, yeah, you probably don’t want to stuff those all in River. But that would seem like a situation where the use case requiring that scale deserves a specialized solution—and even then the rest of the main business logic of your app could still benefit from transactional enqueueing in River. I hope that helps!
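For reference, here's a minimal sketch of the transactional outbox pattern mentioned above. The `orders` and `outbox_events` tables are hypothetical, and a separate relay process (not shown) would read unpublished outbox rows and forward them to Kafka/Redis/etc.:

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// createOrder writes the business row and the outbox row in one transaction,
// so the event exists if and only if the order does.
func createOrder(ctx context.Context, pool *pgxpool.Pool, customerID int64, payload []byte) error {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op once the transaction commits

	var orderID int64
	if err := tx.QueryRow(ctx,
		`INSERT INTO orders (customer_id) VALUES ($1) RETURNING id`,
		customerID,
	).Scan(&orderID); err != nil {
		return err
	}

	// The outbox row rides in the same transaction. A downstream relay later
	// publishes it to the external queue and marks it as sent.
	if _, err := tx.Exec(ctx,
		`INSERT INTO outbox_events (topic, payload) VALUES ($1, $2)`,
		"order.created", payload,
	); err != nil {
		return err
	}

	return tx.Commit(ctx)
}
```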
-
Hi there! I've been following River for a while and the project is looking great, thank you for building this :)
My team is considering adopting River, primarily to minimize the number of stateful services we depend on (we already use Postgres, so a Postgres-backed queue would be great).
One decision we're thinking about is whether to use the same database for River as we use for our application data (so we can take advantage of transactional enqueuing), or to use a separate database (so we can scale the database instances independently, and database load from River will be less likely to impact our app's synchronous operations).
For now I'm trying to better understand how much complexity "enqueue before transaction completes" adds vs. transactional enqueuing. I know both the River site and Brandur's site have discussed this topic quite a bit, and I would really appreciate any additional insight you can share. Below is how I think I would implement the "enqueue before transaction completes" pattern; curious whether there are any major pitfalls you see with it?
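Roughly, something like the following sketch, with River running against its own database so the job insert and the application write are two separate commits (table, job, and function names are just for illustration):

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

// Hypothetical job args; the corresponding worker would check that the user
// row is visible before doing its work, backing off if it isn't.
type SendWelcomeEmailArgs struct {
	UserID int64 `json:"user_id"`
}

func (SendWelcomeEmailArgs) Kind() string { return "send_welcome_email" }

func registerUser(ctx context.Context, appDB *pgxpool.Pool, riverClient *river.Client[pgx.Tx], email string) error {
	tx, err := appDB.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op once committed

	var userID int64
	if err := tx.QueryRow(ctx,
		`INSERT INTO users (email) VALUES ($1) RETURNING id`, email,
	).Scan(&userID); err != nil {
		return err
	}

	// Enqueue on the separate queue database before the app transaction
	// commits. If this insert fails we can abort here; if the commit below
	// fails, the job still exists and the worker must tolerate a missing user.
	if _, err := riverClient.Insert(ctx, SendWelcomeEmailArgs{UserID: userID}, nil); err != nil {
		return err
	}

	return tx.Commit(ctx)
}
```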
In addition, if there are other trade-offs you would recommend thinking about (with same database vs. separate database), I'd appreciate the insight! Thank you :)