Can we run multiple workers for a delayed queue #91
@junwchina Do you have a proposal for how to address this issue?
@gresrun Yes, I have a thought. The basic idea is that all ready jobs should be pushed to a regular queue, which is implemented as a Redis list. Let's say each queue has two internal queues: one is a delayed queue in a ZSET, the other the regular list queue. Also, there is a […]. That's all; please let me know what you think.
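A minimal sketch of this two-internal-queues idea, using Jedis; the key names (`resque:delayed:<queue>` and `resque:queue:<queue>`) are illustrative assumptions, not Jesque's actual key layout:

```java
// Sketch only: delayed jobs are parked in a ZSET scored by their run-at
// time; jobs that are already due go straight onto the regular list queue.
import redis.clients.jedis.Jedis;

public class TwoQueueEnqueue {
    static void enqueue(Jedis jedis, String queue, String jobJson, long runAtMillis) {
        if (runAtMillis > System.currentTimeMillis()) {
            // Not ready yet: score by run-at time so ZRANGEBYSCORE can find due jobs.
            jedis.zadd("resque:delayed:" + queue, runAtMillis, jobJson);
        } else {
            // Ready now: workers pop this list as usual.
            jedis.rpush("resque:queue:" + queue, jobJson);
        }
    }
}
```

The main loop would then periodically move due jobs from the ZSET to the list, which is the step debated in the replies below.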
hey @junwchina, moving that to the main loop sounds nice, but this issue can crop up again when running jesque on multiple machines.
related: https://groups.google.com/forum/#!msg/redis-db/ur9U8o-Sko0/tgefLK3zrzQJ cc @gresrun
@argvk @junwchina @zhangliuping
@argvk Yes, MULTI/EXEC is a way to fix this issue. Can you clarify why you think my solution will have issues on multiple machines? I prefer to use two internal queues. The reason is: […]
@junwchina I meant to say that Jesque instances running on multiple machines might have concurrency issues, due to multiple main loops running across them all. Totally agree with your points; the MULTI/EXEC is in addition to moving it to the main loop. Thoughts?
@argvk I think that should not be a problem on multiple machines. We just need to ensure every Redis operation is atomic. That is possible with the current Redis version.
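For illustration, a promotion step made atomic without Lua could use Jedis WATCH/MULTI/EXEC along these lines; this is a sketch under assumed key names and a hypothetical `promoteOne` helper, not Jesque's actual code:

```java
// Atomically move one due job from the delayed ZSET to the regular list.
// If another worker touches the ZSET between WATCH and EXEC, exec()
// returns null and the caller can simply retry.
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class DelayedJobPromoter {
    static boolean promoteOne(Jedis jedis, String delayedKey, String queueKey) {
        jedis.watch(delayedKey);
        Set<String> due = jedis.zrangeByScore(
                delayedKey, "-inf", String.valueOf(System.currentTimeMillis()), 0, 1);
        if (due.isEmpty()) {
            jedis.unwatch();
            return false; // nothing ready yet
        }
        String payload = due.iterator().next();
        Transaction tx = jedis.multi();
        tx.zrem(delayedKey, payload);  // remove from the delayed ZSET...
        tx.rpush(queueKey, payload);   // ...and hand it to the list queue
        return tx.exec() != null;      // null means we lost the race; retry
    }
}
```

Because EXEC aborts if the watched key changed, this stays safe even with main loops running on several machines; the cost is retries under contention.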
@junwchina OK, I'm trying to work on this and I'm having a little difficulty understanding this part: […]
AFAIK, […]
@argvk Just like you said in the previous message, you can use […].
Any implementation that requires two extra Redis calls will not scale well under load. Even the query to check the type is overloading Redis in my case, because Redis performance in VM environments is not very good: I get around 40k-50k ops per second on a VM, as opposed to around 200k on dedicated hardware.
Here's a novel idea: replace the majority of the pop logic with a Lua script:

```lua
local queueKey = KEYS[1]
local inFlightKey = KEYS[2]
local freqKey = KEYS[3]
local now = ARGV[1]
local payload = nil
local not_empty = function(x)
  return (type(x) == 'table') and (not x.err) and (#x ~= 0)
end
-- TYPE returns a status reply; next() extracts the type name string
local ok, queueType = next(redis.call('TYPE', queueKey))
if queueType == 'zset' then
  -- Delayed queue: take the first job whose score (run-at time) is due
  local i, lPayload = next(redis.call('ZRANGEBYSCORE', queueKey, '-inf', now, 'WITHSCORES'))
  if lPayload then
    payload = lPayload
    local frequency = redis.call('HGET', freqKey, payload)
    if frequency then
      -- Recurring job: reschedule by bumping its score by its frequency
      redis.call('ZINCRBY', queueKey, frequency, payload)
    else
      -- One-shot job: remove it from the delayed queue
      redis.call('ZREM', queueKey, payload)
    end
  end
elseif queueType == 'list' then
  -- Regular queue: pop the job and track it in the in-flight list
  payload = redis.call('LPOP', queueKey)
  if payload then
    redis.call('LPUSH', inFlightKey, payload)
  end
end
return payload
```

Since the evaluation of scripts is atomic, this is guaranteed to be safe for multiple workers. Thoughts?
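For reference, invoking such a script from Java with Jedis might look like the sketch below; the key names and the local `pop.lua` file are assumptions made for the example, not Jesque's actual wiring:

```java
// Sketch only: load the Lua pop script above (saved locally as pop.lua)
// and execute it atomically via EVALSHA.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import redis.clients.jedis.Jedis;

public class LuaPopExample {
    public static void main(String[] args) throws Exception {
        String popScript = new String(
                Files.readAllBytes(Paths.get("pop.lua")), StandardCharsets.UTF_8);
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Load once; evalsha avoids resending the script body on every pop.
            String sha = jedis.scriptLoad(popScript);
            Object payload = jedis.evalsha(sha,
                    Arrays.asList(
                            "resque:queue:myQueue",    // KEYS[1]: the queue (list or zset)
                            "resque:inflight:myQueue", // KEYS[2]: in-flight list
                            "resque:freqs:myQueue"),   // KEYS[3]: recurring-job frequencies
                    Arrays.asList(String.valueOf(System.currentTimeMillis()))); // ARGV[1]: now
            System.out.println(payload == null ? "queue empty" : "popped: " + payload);
        }
    }
}
```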
@gresrun that'd be perfect, debugging 🎄
@gresrun It looks like one big MULTI/EXEC operation. I think these three keys will be blocked longer than with separate MULTI/EXEC blocks. Happy Christmas! 🎄
@junwchina I was concerned about the same thing but my initial testing showed just the opposite; overall job throughput almost doubled(!) for a single worker using the new Lua-based implementation:

Jesque-2.1.0 => ~2500 jobs/sec
Jesque-2.1.1-SNAPSHOT/lua_pop => ~4200 jobs/sec
@gresrun Maybe it's because you wrapped all of these Redis operations into one and eliminated some network round trips. If there is only one worker consuming the jobs, I think it should be perfect. I am afraid of what happens when there are many workers consuming and lots of clients enqueuing at the same time. All workers and clients will want to operate on these keys, so a bottleneck might occur.
@junwchina OK, I wanted to see if that is the case, so I modified my test to utilize multiple workers, and the results indicate the Lua pop is far better under heavy contention!

4 Workers:
Jesque-2.1.0 => ~4250 jobs/sec with 4 workers 😦
Jesque-2.1.1-SNAPSHOT/lua_pop => ~10550 jobs/sec with 4 workers 😄

8 Workers:
Jesque-2.1.0 => ~2800 jobs/sec with 8 workers 😦 😦
Jesque-2.1.1-SNAPSHOT/lua_pop => ~12950 jobs/sec with 8 workers 😄 😄
@gresrun OK. It seems very good.
Merged.
@gresrun Thanks for your hard work. 👍
Is there already a release date for 2.1.1? I'd need a release that includes the fix for this issue to depend on. Thanks for sharing Jesque!
@thammerl I just cut Jesque 2.1.1; it should appear in Maven Central in a few hours.
Copying my old and latest benchmarks: […]

Case 2 - Jesque 2.1.1: […]

As can be seen, the performance for low-duration jobs is largely unaffected by the increase in the number of workers from 10 to 150... What's really cool is that in a web/mobile backend scenario, where business-logic computations and database interactions can vary from tens of milliseconds for one task to hundreds of milliseconds for another, we can keep a common worker pool and considerably increase the number of workers without the fear of contention issues. Previously I used to assign fractions of the total worker pool to different types of jobs.
@gresrun Thanks a lot!
Hello,
The delayed queue is a good feature, but it seems there is an issue when a delayed queue is consumed by multiple workers. The implementation of WorkerImpl#pop will return null when it is processing the same job as another worker. That will be a big performance issue if there are many collisions like this.
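A simplified sketch of the racy check-then-remove pattern described here, assuming a plain Jedis client and a hypothetical delayed-queue key; this is not the actual WorkerImpl code:

```java
// Two workers can both read the same due job; only one ZREM succeeds,
// so the loser's pop returns null even though the queue was not empty.
import java.util.Set;
import redis.clients.jedis.Jedis;

public class RacyDelayedPop {
    static String pop(Jedis jedis, String delayedKey) {
        Set<String> due = jedis.zrangeByScore(
                delayedKey, "-inf", String.valueOf(System.currentTimeMillis()), 0, 1);
        if (due.isEmpty()) {
            return null; // genuinely nothing ready
        }
        String payload = due.iterator().next();
        // Race window: another worker may ZREM the same member right here.
        return jedis.zrem(delayedKey, payload) == 1 ? payload : null;
    }
}
```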