-
-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dequeing should be first-in first-out #651
Conversation
Secondarily to priority, within a given priority (including null priority), dequeue respecting first-in first-out, by sorting on created_at
Warning: I'm actually going on vacation at end of day today, and wont' be back until Monday July 11th, so won't respond to any feedback/comments until then, sorry! Also: Benchmarking -- Locally, I tried to add a fourth benchmarking case for JUST created_at, without priority -- and in that run, got the benchmarker saying that all four cases were |
OK, I tried actually adding a million records to the db (haven't committed this to this PR), and saw a difference, like so...
Maybe just the result of having two sort conditions instead of one, since both of the single sort orderings were identical? May having more than one sort ordering (sql I suspect this isn't going to be an issue for less than hundreds of thousands of enqueued records. I also wonder how much "speed to dequeue" matters in total execution time -- Do those results really mean it can only dequeue 6-per-second in priority-sort, the current behavior? That seems surprisingly slow! I guess if you have a million records in the queue though... Going back to only 10K records in the queue, it's more like 60 dequeues per second (still not winning any contests with sidekiq, but I don't think it matters), and is not very different in the different cases, adding the second sort order is slight-to-no difference. It's possible we could add indexes to improve the million record performance, not sure. |
@jrochkind thank you thank you! This looks great. I'll do a little more poking and get this merged. I'm happy to finish it out if I discover anything that needs tweaking. |
Thanks! I really do think we should have some tests -- having some tests for I just wasn't able to figure out how to do those tests yet in the time I have today! |
Hi @bensheldon , I'm back from a week off. Thanks for adding the test! I'm curious what your current thoughts on this one are? And is there anything else I can do on it? |
@jrochkind welcome back! Hope you had a restful week off 🌴 Let's do it. I think FIFO is a good improvement. I'm not too worried about performance because I don't think it makes performance meaningfully worse than it already is at huge queue sizes. We should maybe open a separate Discussion for that. |
Thanks! If I have time, i'm going to follow-up with a more complex PR for "ordered queues" too, which I think is pretty do-able, and will only have a performance implication if someone chooses to use it, it'll be something you choose to use in a given worker, similar to how it works in resque and sidekiq. |
Btw, my intuition says there MIGHT be a way to use FIFO to actually make performance BETTER at huge queue sizes than it was previously, as there should be a way to make pg use the created_at index -- but it MIGHT require turning off the |
@jrochkind I'm colder on the ordered-queues option. Having had more discussion in #624, I think that's trying to recreate behavior that isn't necessary with GoodJob because GoodJob has job-level priority, whereas Sidekiq and Resque only have queue-level priority. |
I'll go discuss more there. |
I will give you my use case for this one here, because it is related to "well, can't you just add more capacity" -- your original argument against FIFO I think. I have very spikey job patterns, there can be a while with few jobs, then a period iwth a lot of jobs at once I have my workers on heroku with an autoscaler from hirefire. So there might be only one worker dyno running normally, but if the autoscaler sees a lot of work building up in the queue, it scales up more worker dynos. But it can take a few minutes for the dynos to scale up. This might be fine, because the load might be expected with like "ten minutes" latency. Or in pathological cases, it might take even longer -- it might scale up a few dynos, see that the queue has gotten even BIGGEr, then scale up a few more (wait a few more minutes), wash rinse repeat. It could be 20+ minutes until all the workers are fully scaled up to handle the load. That's okay in exceptional situations -- but during that time I want FIFO, I would not want something that came in 30 minutes ago happening, by chance, to still be waiting in the queue, while something that came in 1 second ago gets handled right away. The latency of the auto-scaler is acceptable because the queue is FIFO, so even if it takes 20 minutes to get the resources fully allocated, that doesn't necessarily mean any job was waiting 20 minutes, the oldest jobs are being handled first, the jobs waiting are the ones that came in later. It significantly reduces median or mean latency of jobs, under circumstances where the auto-scaler is catching up with capacity. |
Connects to #624.
For the sake of having something concrete to discuss!
Secondarily to priority, within a given priority (including null priority), dequeue respecting first-in first-out, by sorting on created_at
FIFO is how all other ActiveJob queues perform (so far as I know), and is kind of implied by the very word "queue". It will be less surprising to those coming from other ActiveJob queues, and that this is how every other queue performs is probably a signal that there are good reasons for it. (I can definitely think of many desirable things in my own use cases).
This PR does not yet include any tests. Any advice about the best place/way to do a test on ordering of dequeue? I looked for an existing test on ordering of dequeue based on
priority
to use as a model, but didn't find one of those. If one of those doesn't exist yet, would be good to add that too. I think we do want tests on dequeue ordering.I tried to enhance the existing benchmark script for this case. It does seem to suggest that including this sort is a bit slower? That's surprising to me, I wasn't really expecting it. I'm not sure to what extent speed to dequeue is really a bottleneck in most real use cases though -- it may depend on how many jobs you have in queue of course. If we do want to speed it up, perhaps there is a way to do so with indexing... but there's already a
created_at
index, although maybe it can't be used in this particular sort, I'm not really a pg index expert.Ideally I think FIFO would just be standard always, as here, because I think of it as table stakes for an ActiveJob queue. But if we do need to make it configurable because of perf reasons... perhaps make paying attention to priority configurable too? Not sure.