Using River in a separate database w/o transactional enqueuing #554
Replies: 2 comments 2 replies
-
Regarding your proposed algorithm: this definitely can be made to work, and is pretty much how every non-transactional job queue works, whether built on Redis (e.g. Sidekiq), RabbitMQ, or Kafka. This is how we ran things at Heroku and Stripe for many years. It works, but it's definitely a reliable source of churn/errors/bugs. You get a lot of job churn (i.e. workers are unnecessarily busy) as jobs check to see if they can run yet, realize they can't, and back off. You will see subtle bugs too — for some jobs the backoff condition will be an easy binary (e.g. does the user exist yet or not?), but it might not be so clear for others, causing a job to think it's ready to run when it isn't and to operate on incomplete/bad data. That said, there definitely are advantages to keeping high-throughput changes out of the main database. I do so at work regularly and the result is a main database that's had next to zero downtime in years of operation, which is pretty great.
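To make that churn concrete, here's a rough sketch of the check-and-back-off dance in a River worker. The job type, worker, and `users` table are hypothetical, and this simply returns an error to lean on River's normal retry backoff when the data isn't visible yet:

```go
package main

import (
	"context"
	"fmt"

	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

type SendWelcomeEmailArgs struct {
	UserID int64 `json:"user_id"`
}

func (SendWelcomeEmailArgs) Kind() string { return "send_welcome_email" }

type SendWelcomeEmailWorker struct {
	river.WorkerDefaults[SendWelcomeEmailArgs]
	appDB *pgxpool.Pool // the application database, separate from River's
}

func (w *SendWelcomeEmailWorker) Work(ctx context.Context, job *river.Job[SendWelcomeEmailArgs]) error {
	// Without transactional enqueueing the job may become workable before the
	// transaction that created the user has committed, so the worker checks.
	var exists bool
	if err := w.appDB.QueryRow(ctx,
		`SELECT EXISTS (SELECT 1 FROM users WHERE id = $1)`, job.Args.UserID,
	).Scan(&exists); err != nil {
		return err
	}
	if !exists {
		// Back off and try again later; returning an error reschedules the
		// job with retry backoff. This is the churn described above.
		return fmt.Errorf("user %d not visible yet", job.Args.UserID)
	}
	// ... actually send the welcome email ...
	return nil
}
```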
-
In addition to @brandur’s great comment, I’ll share my perspective. I see this primarily as a question about where I’d like to spend my time, with some tradeoffs and constraints that come with scale. In general I would say the transactional enqueuing model is an excellent starting point for the vast majority of users. It’s quite easy to spend a little more to scale your database, and databases can get very big these days with large instance & memory options. Compared to the time you would spend developing all your jobs so that they can robustly handle all the different failure modes with transaction rollbacks and other distributed systems failures, it’s a no-brainer for me to start here.

The approach does eventually hit some constraints due to write throughput and bloat. Some of these can be worked around quite easily to give you a lot more headroom (periodic concurrent reindexing, for example). Once you start running out of scaling headroom, you’ve probably reached a scale where it makes sense to start optimizing things, or maybe you just have a use case with a massive number of jobs and are willing to tolerate the complexity that goes along with it.

The good news is, once you hit this point it’s straightforward to migrate any individual task out of your DB-backed queue into Kafka or Redis or whatever else. At that point you will get to solve all the same distributed systems challenges you would have had to tackle if you hadn’t started with transactional enqueueing in River, except you will have significantly delayed that time investment. You’ll also only need to make that investment for specific high-throughput jobs that demand it, and can do so gradually as needed. Even once you hit this point, you may still want to utilize the transactional outbox pattern to give you transactionality before you push that work downstream into another system. Read up on the transactional outbox pattern or the “Life Beyond Distributed Transactions” paper from Amazon circa 2007 iirc.

In short, this evolution seems to me like an ideal application of pragmatic engineering tradeoffs, and it’s why I recommend this model as the default for most use cases 🙂 If your use case is something where you’re going to be ingesting thousands of events per second from an analytics service, yeah, you probably don’t want to stuff those all in River. But that would seem like a situation where the use case requiring that scale deserves a specialized solution—and even then the rest of the main business logic of your app could still benefit from transactional enqueueing in River. I hope that helps!
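For reference, here's a minimal sketch of the transactional outbox pattern mentioned above. The `orders` and `outbox_events` tables are hypothetical, and a separate relay process (not shown) would read unpublished outbox rows and forward them to Kafka/Redis/etc.:

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// createOrder writes the business row and the outbox row in one transaction,
// so the event exists if and only if the order does.
func createOrder(ctx context.Context, pool *pgxpool.Pool, customerID int64, payload []byte) error {
	tx, err := pool.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op once the transaction commits

	var orderID int64
	if err := tx.QueryRow(ctx,
		`INSERT INTO orders (customer_id) VALUES ($1) RETURNING id`,
		customerID,
	).Scan(&orderID); err != nil {
		return err
	}

	// The outbox row rides in the same transaction. A downstream relay later
	// publishes it to the external queue and marks it as sent.
	if _, err := tx.Exec(ctx,
		`INSERT INTO outbox_events (topic, payload) VALUES ($1, $2)`,
		"order.created", payload,
	); err != nil {
		return err
	}

	return tx.Commit(ctx)
}
```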
-
Hi there! I've been following River for a while and the project is looking great, thank you for building this :)
My team is considering adopting River, primarily to minimize the number of stateful services we depend on (we already use Postgres, so a Postgres-backed queue would be great).
One decision we're thinking about is whether to use the same database for River as we use for our application data (so we can take advantage of transactional enqueuing), or to use a separate database (so we can scale the database instances independently, and database load from River will be less likely to impact our app's synchronous operations).
For now I'm trying to better understand how much complexity "enqueue before transaction completes" adds vs. transactional enqueuing. I know both the River site and Brandur's site have discussed this topic quite a bit, and I would really appreciate any additional insight you can share. Below is how I think I would implement the "enqueue before transaction completes" pattern; curious whether there are any major pitfalls you see with it?
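Roughly, something like the following sketch, with River running against its own database so the job insert and the application write are two separate commits (table, job, and function names are just for illustration):

```go
package main

import (
	"context"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
	"github.com/riverqueue/river"
)

// Hypothetical job args; the corresponding worker would check that the user
// row is visible before doing its work, backing off if it isn't.
type SendWelcomeEmailArgs struct {
	UserID int64 `json:"user_id"`
}

func (SendWelcomeEmailArgs) Kind() string { return "send_welcome_email" }

func registerUser(ctx context.Context, appDB *pgxpool.Pool, riverClient *river.Client[pgx.Tx], email string) error {
	tx, err := appDB.Begin(ctx)
	if err != nil {
		return err
	}
	defer tx.Rollback(ctx) // no-op once committed

	var userID int64
	if err := tx.QueryRow(ctx,
		`INSERT INTO users (email) VALUES ($1) RETURNING id`, email,
	).Scan(&userID); err != nil {
		return err
	}

	// Enqueue on the separate queue database before the app transaction
	// commits. If this insert fails we can abort here; if the commit below
	// fails, the job still exists and the worker must tolerate a missing user.
	if _, err := riverClient.Insert(ctx, SendWelcomeEmailArgs{UserID: userID}, nil); err != nil {
		return err
	}

	return tx.Commit(ctx)
}
```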
In addition, if there are other trade-offs you would recommend thinking about (with same database vs. separate database), I'd appreciate the insight! Thank you :)