
Can we run multiple workers for a delayed queue #91

Closed
junwchina opened this issue Nov 17, 2015 · 24 comments

@junwchina

Hello,

The delayed queue is a good feature, but there seems to be an issue when a delayed queue is consumed by multiple workers. The implementation of WorkerImpl#pop returns null when another worker is already processing the same job. That can become a big performance issue if there are many collisions like this.

@gresrun (Owner) commented Nov 30, 2015

@junwchina Do you have a proposal for how to address this issue?

@junwchina (Author)

@gresrun Yes, I have a thought. The basic idea is that all ready jobs should be pushed to a regular queue, which is implemented with the Redis list type.

Let's say each queue has two internal queues: a delayed queue stored as a zset and a regular queue stored as a list. In the main loop, I would handle the delayed queue first and push all ready delayed jobs into the regular queue. Whether or not there are any ready delayed jobs, I would then handle the regular queue. Since the regular queue is a list, we can use LPOP or RPOP to get a job to run. These commands are atomic, so we don't need to worry about multi-threading issues; every thread can get a job to run as long as the queue is not empty.

Also, Redis has an RPOPLPUSH command, which is atomic as well; maybe we can use it to implement an "lpoplpush"-style operation. The current implementation can also collide in multi-threaded environments.

That's all. Please let me know what you think.
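
For illustration, a minimal sketch of the worker-side pop described above, assuming the Jedis client; the queue and in-flight key names are hypothetical:

import redis.clients.jedis.Jedis;

public class RegularQueuePop {

    // RPOPLPUSH atomically removes the tail of the regular queue (a Redis list)
    // and pushes it onto the in-flight list, so two workers can never receive
    // the same payload. Returns null when the queue is empty.
    public static String pop(Jedis jedis, String queueKey, String inFlightKey) {
        return jedis.rpoplpush(queueKey, inFlightKey);
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String payload = pop(jedis, "resque:queue:foo", "resque:inflight:foo");
            System.out.println(payload == null ? "queue empty" : "got job: " + payload);
        }
    }
}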

@argvk (Contributor) commented Dec 14, 2015

Hey @junwchina, moving that to the main loop sounds nice, but this issue can crop up again when running Jesque on multiple machines.
Another way would be to perform ZRANGEBYSCORE and ZREMRANGEBYSCORE within a MULTI/EXEC; that way it acts like an atomic zpop.
Something like this:

> multi
OK
> zrangebyscore resque:queue:fooSchedule -1 1450151702000 withscores
QUEUED
> zremrangebyscore resque:queue:fooSchedule -1 1450151702000
QUEUED
> EXEC
1) 1) "{\"class\":\"fooScheduleJob\",\"args\":[54625],\"vars\":null}"
   2) "1450151700000"
2) (integer) 1
> multi
OK
> zrangebyscore resque:queue:fooSchedule -1 1450151701000 withscores
QUEUED
> zremrangebyscore resque:queue:fooSchedule -1 1450151701000
QUEUED
> EXEC
1) (empty list or set)
2) (integer) 0

related: https://groups.google.com/forum/#!msg/redis-db/ur9U8o-Sko0/tgefLK3zrzQJ

cc @gresrun
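
For reference, a rough Jedis sketch of the MULTI/EXEC "atomic zpop" above; this is illustrative only (the key name is taken from the redis-cli example, everything else is assumed):

import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Response;
import redis.clients.jedis.Transaction;

public class AtomicZPop {

    // Queue ZRANGEBYSCORE and ZREMRANGEBYSCORE inside MULTI/EXEC so they run
    // back-to-back on the server; no other client can pop the same members in between.
    public static Set<String> popReady(Jedis jedis, String delayedKey, long nowMillis) {
        Transaction tx = jedis.multi();
        Response<Set<String>> ready = tx.zrangeByScore(delayedKey, 0, nowMillis);
        tx.zremrangeByScore(delayedKey, 0, nowMillis);
        tx.exec();
        return ready.get();
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            Set<String> jobs = popReady(jedis, "resque:queue:fooSchedule", System.currentTimeMillis());
            jobs.forEach(System.out::println);
        }
    }
}

Note that the WITHSCORES from the redis-cli example is dropped here, since a worker only needs the payloads.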

@gresrun (Owner) commented Dec 14, 2015

@argvk @junwchina @zhangliuping
This definitely looks like an issue that needs to be addressed. I'm currently in the process of relocating from Florida to California, so I'm not able to take a look at this right now. Pull requests are most welcome!

@junwchina (Author)

@argvk Yes, MULTI/EXEC is a way to fix this issue. Can you clarify why you think my solution will have issues on multiple machines?

I prefer to use two internal queues. The reasons are:

  1. ZREMRANGEBYSCORE's time complexity is O(log(N)), but LPOP and RPOP are only O(1). I think it's better to keep as few jobs as possible in the delayed queue, so in the main loop I would like to fetch all ready jobs from the delayed queue in one operation and push all of them into the regular queue (see the sketch below).
  2. Since all running jobs are fetched from the internal regular queue, we can enqueue all kinds of jobs into one queue.
  3. is coming.. :)

@argvk @gresrun Please let me know what you think.
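
Point 1 might look roughly like this in the main loop (a hypothetical sketch assuming Jedis; it reuses the MULTI/EXEC trick from the earlier comment and then hands the ready jobs to the regular list):

import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Response;
import redis.clients.jedis.Transaction;

public class DelayedQueueDrain {

    // Atomically read and delete the ready slice of the delayed zset, then push
    // everything onto the regular list; from there, workers compete with plain
    // LPOP/RPOP (or RPOPLPUSH), which are O(1).
    public static void drainReadyJobs(Jedis jedis, String delayedKey, String regularKey) {
        long now = System.currentTimeMillis();
        Transaction tx = jedis.multi();
        Response<Set<String>> ready = tx.zrangeByScore(delayedKey, 0, now);
        tx.zremrangeByScore(delayedKey, 0, now);
        tx.exec();

        Set<String> jobs = ready.get();
        if (!jobs.isEmpty()) {
            jedis.rpush(regularKey, jobs.toArray(new String[0]));
        }
    }
}

One caveat with this sketch: between the EXEC and the RPUSH the ready jobs exist only in the client's memory, so a crash in that window would drop them; the Lua-script approach proposed later in the thread avoids that gap.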

@argvk (Contributor) commented Dec 15, 2015

@junwchina I meant that Jesque instances running on multiple machines might have concurrency issues, due to multiple main loops running across them all. I totally agree with your points; the MULTI/EXEC is in addition to moving it to the main loop.

Thoughts?

@junwchina (Author)

@argvk I think that should not be a problem on multiple machines. We just need to ensure that every Redis operation is atomic, which is possible with the current Redis version.

@argvk (Contributor) commented Dec 15, 2015

@junwchina OK, I'm trying to work on this and I'm having a little difficulty understanding this part:

I would like to fetch all ready jobs from the delayed queue in one operation,

AFAIK, ZREMRANGEBYSCORE only returns the number of elements removed. So, is there something else I can use to return and remove the elements atomically?

@junwchina (Author)

@argvk Just like you said in your earlier message: you can use ZRANGEBYSCORE and ZREMRANGEBYSCORE together within MULTI/EXEC.

@abhinavdwivedi

Any implementation that requires two extra Redis calls will not scale well under load. Even the query to check the key type is overloading Redis in my case, because Redis performance in VM environments is not very good: I get around 40k-50k ops per second on a VM, as opposed to around 200k on dedicated hardware.

@gresrun (Owner) commented Dec 24, 2015

Here's a novel idea: replace the majority of WorkerImpl.pop() with an EVAL of a Lua script that returns the next job.

local queueKey = KEYS[1]
local inFlightKey = KEYS[2]
local freqKey = KEYS[3]
local now = ARGV[1]

local payload = nil

local not_empty = function(x)
  return (type(x) == 'table') and (not x.err) and (#x ~= 0)
end

local ok, queueType = next(redis.call('TYPE', queueKey))
if queueType == 'zset' then
    local i, lPayload = next(redis.call('ZRANGEBYSCORE', queueKey, '-inf', now, 'WITHSCORES'))
    if lPayload then
        payload = lPayload
        local frequency = redis.call('HGET', freqKey, payload)
        if frequency then
            redis.call('ZINCRBY', queueKey, frequency, payload)
        else
            redis.call('ZREM', queueKey, payload)
        end
    end
elseif queueType == 'list' then
    payload = redis.call('LPOP', queueKey)
    if payload then
        redis.call('LPUSH', inFlightKey, payload)
    end
end

return payload

Since the evaluation of a script is atomic, this is guaranteed to be safe for multiple workers. Thoughts?

cc @junwchina @argvk @abhinavdwivedi
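
For anyone wanting to try the script from a worker, an EVAL call from Jedis might look roughly like this; it's a sketch only, not the actual Jesque code, and the key names and the pop.lua file are hypothetical:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import redis.clients.jedis.Jedis;

public class LuaPopExample {

    // EVAL runs the whole script atomically on the server, so concurrent workers
    // can never pop the same payload; returns null when nothing is ready.
    public static String pop(Jedis jedis, String script, String queueKey,
                             String inFlightKey, String freqKey) {
        Object result = jedis.eval(script,
                Arrays.asList(queueKey, inFlightKey, freqKey),                          // KEYS[1..3]
                Collections.singletonList(String.valueOf(System.currentTimeMillis()))); // ARGV[1] = now
        return (String) result;
    }

    public static void main(String[] args) throws Exception {
        String script = new String(Files.readAllBytes(Paths.get("pop.lua")), StandardCharsets.UTF_8);
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println(pop(jedis, script, "resque:queue:foo",
                    "resque:inflight:worker-0:foo", "resque:freqs:foo"));
        }
    }
}

In practice the script body would likely be cached with SCRIPT LOAD and invoked via EVALSHA so it isn't resent on every pop.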

@gresrun (Owner) commented Dec 25, 2015

Take a look at 5721a43 (on a new branch lua_pop) for a proof-of-concept. As a side-note, the Lua-based pop() is over twice as fast in my micro-benchmark!

cc @junwchina @argvk @abhinavdwivedi

@argvk (Contributor) commented Dec 25, 2015

@gresrun That'd be perfect. Debugging WorkerImpl.pop() might not be all that straightforward, but personally that's a trade-off I'm OK with.

🎄

@junwchina (Author)

@gresrun It is like one big MULTI/EXEC operation. I think these three keys will be blocked longer than with separate MULTI/EXEC blocks.

Happy Christmas! 🎄

@gresrun (Owner) commented Dec 25, 2015

@junwchina I was concerned about the same thing, but my initial testing showed just the opposite; overall job throughput almost doubled(!) for a single worker using the new Lua-based implementation of pop(). Using this micro-benchmark, which uses standard queues, I got the following results:

Jesque-2.1.0 => ~2500 jobs/sec

10:13:41.287 [main] INFO  n.g.jesque.perftest.PerfTest - Starting test...
10:13:47.806 [main] INFO  n.g.jesque.perftest.PerfTest - Enqueue complete!
10:13:47.807 [Worker-0 Jesque-2.1.0: RUNNING] INFO  n.g.jesque.perftest.PerfTest - Started the clock...
10:14:27.757 [Worker-0 Jesque-2.1.0: RUNNING] INFO  n.g.jesque.perftest.PerfTest - Completed 100000 jobs in 39950ms - Avg. 2503.1289111389237 jobs/sec
10:14:27.761 [main] INFO  n.g.jesque.perftest.PerfTest - Test complete!

Jesque-2.1.1-SNAPSHOT/lua_pop => ~4200 jobs/sec

10:15:38.862 [main] INFO  n.g.jesque.perftest.PerfTest - Starting test...
10:15:45.436 [main] INFO  n.g.jesque.perftest.PerfTest - Enqueue complete!
10:15:45.437 [Worker-0 Jesque-2.1.1-SNAPSHOT: RUNNING] INFO  n.g.jesque.perftest.PerfTest - Started the clock...
10:16:09.293 [Worker-0 Jesque-2.1.1-SNAPSHOT: RUNNING] INFO  n.g.jesque.perftest.PerfTest - Completed 100000 jobs in 23855ms - Avg. 4191.993292810732 jobs/sec
10:16:09.294 [main] INFO  n.g.jesque.perftest.PerfTest - Test complete!

@junwchina (Author)

@gresrun Maybe it's because you wrapped all of these Redis operations into one and reduced the number of network round trips. If there is only one worker consuming the jobs, I think it should be perfect. What I'm afraid of is many workers consuming and lots of clients enqueuing at the same time; since all workers and clients want to operate on these keys, a bottleneck might occur.

@gresrun (Owner) commented Dec 26, 2015

@junwchina OK, I wanted to see if that was the case, so I modified my test to utilize multiple workers, and the results indicate the Lua pop is far better under heavy contention!

4 Workers

Jesque-2.1.0 => ~4250 jobs/sec with 4 workers 😦

23:54:41.600 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Starting test...
23:54:48.020 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Enqueue complete!
23:54:48.021 [Worker-2 Jesque-2.1.0: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Started the clock...
23:55:11.662 [Worker-2 Jesque-2.1.0: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Completed 100000 jobs in 23641ms - Avg. 4229.93951186498 jobs/sec
23:55:11.664 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Test complete!

Jesque-2.1.1-SNAPSHOT/lua_pop => ~10550 jobs/sec with 4 workers 😄

23:55:57.815 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Starting test...
23:56:04.314 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Enqueue complete!
23:56:04.315 [Worker-2 Jesque-2.1.1-SNAPSHOT: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Started the clock...
23:56:13.805 [Worker-2 Jesque-2.1.1-SNAPSHOT: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Completed 100000 jobs in 9489ms - Avg. 10538.518284329224 jobs/sec
23:56:13.807 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Test complete!

8 Workers

Jesque-2.1.0 => ~2800 jobs/sec with 8 workers 😦 😦

23:58:36.598 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Starting test...
23:58:43.007 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Enqueue complete!
23:58:43.008 [Worker-5 Jesque-2.1.0: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Started the clock...
23:59:19.015 [Worker-3 Jesque-2.1.0: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Completed 100000 jobs in 36007ms - Avg. 2777.2377593245756 jobs/sec
23:59:19.018 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Test complete!

Jesque-2.1.1-SNAPSHOT/lua_pop => ~12950 jobs/sec with 8 workers 😄 😄

23:57:17.676 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Starting test...
23:57:24.218 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Enqueue complete!
23:57:24.220 [Worker-2 Jesque-2.1.1-SNAPSHOT: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Started the clock...
23:57:31.950 [Worker-4 Jesque-2.1.1-SNAPSHOT: RUNNING] INFO  n.g.jesque.perftest.MultiPerfTest - Completed 100000 jobs in 7730ms - Avg. 12936.6106080207 jobs/sec
23:57:31.953 [main] INFO  n.g.jesque.perftest.MultiPerfTest - Test complete!

@junwchina (Author)

@gresrun OK. It seems very good.

@gresrun (Owner) commented Dec 27, 2015

Merged.

gresrun closed this as completed Dec 27, 2015
@junwchina (Author)

@gresrun Thanks for your hard work. 👍

@thammerl commented Jan 8, 2016

Is there a release date for 2.1.1 yet? I need a release that includes the fix for this issue to depend on. Thanks for sharing Jesque!

@gresrun (Owner) commented Jan 10, 2016

@thammerl I just cut Jesque 2.1.1; it should appear in Maven Central in a few hours.

@abhinavdwivedi

Copying my old and latest benchmarks:
Job times: 10, 50, 200, 1000, 5000 in ms
Num of Workers: 10, 20, 40, 50, 100, 150
Case 1: Jesque 2.0.2
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (10), Time: (15860ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (50), Time: (10662ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (200), Time: (25759ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (1000), Time: (100355ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (5000), Time: (500966ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (10), Time: (6136ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (50), Time: (6072ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (200), Time: (16253ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (1000), Time: (51090ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (5000), Time: (251239ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (10), Time: (6769ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (50), Time: (7054ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (200), Time: (6898ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (1000), Time: (26891ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (5000), Time: (126925ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (10), Time: (7253ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (50), Time: (7798ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (200), Time: (7323ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (1000), Time: (22162ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (5000), Time: (102479ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (10), Time: (14764ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (50), Time: (14705ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (200), Time: (9567ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (1000), Time: (14912ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (5000), Time: (54647ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (10), Time: (22356ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (50), Time: (22306ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (200), Time: (17401ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (1000), Time: (18004ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (5000), Time: (42544ms)

Case 2: Jesque 2.1.1
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (10), Time: (5919ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (50), Time: (10252ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (200), Time: (25213ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (1000), Time: (100254ms)
Start of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (10), numTask: (1000), jobTimeInMillis: (5000), Time: (500699ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (10), Time: (5226ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (50), Time: (5218ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (200), Time: (15176ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (1000), Time: (50192ms)
Start of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (20), numTask: (1000), jobTimeInMillis: (5000), Time: (250299ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (10), Time: (5245ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (50), Time: (5185ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (200), Time: (10176ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (1000), Time: (25172ms)
Start of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (40), numTask: (1000), jobTimeInMillis: (5000), Time: (125264ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (10), Time: (5262ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (50), Time: (5170ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (200), Time: (5177ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (1000), Time: (20165ms)
Start of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (50), numTask: (1000), jobTimeInMillis: (5000), Time: (100216ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (10), Time: (5289ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (50), Time: (5204ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (200), Time: (5170ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (1000), Time: (10180ms)
Start of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (100), numTask: (1000), jobTimeInMillis: (5000), Time: (50246ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (10)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (10), Time: (5250ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (50)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (50), Time: (5314ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (200)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (200), Time: (5183ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (1000)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (1000), Time: (10175ms)
Start of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (5000)
End of Run - maxWorkers: (150), numTask: (1000), jobTimeInMillis: (5000), Time: (35209ms)

As can be seen, the performance for short-duration jobs is largely unaffected by increasing the number of workers from 10 to 150... What's really cool is that in a web/mobile backend scenario, where business logic and database interactions can vary from tens of milliseconds for one task to hundreds of milliseconds for another, we can keep a common worker pool and considerably increase the number of workers without fear of contention issues. Previously I used to assign fractions of the total worker pool to different types of jobs.

@thammerl

@gresrun Thanks a lot!
