
threadpool: spawn new tasks onto a random worker #683

Merged: 2 commits merged into tokio-rs:master from submit-random on Oct 3, 2018
Conversation

@ghost commented Oct 3, 2018

Motivation

An IO handle is assigned to a reactor the first time it is polled, and once assigned, it stays with that reactor forever.

Now that every worker thread in tokio-threadpool drives its own reactor, we'd like to distribute the IO work among reactors as evenly as possible. That is only possible if each newly spawned task gets its first poll on a random worker.

Currently, a task spawned from inside the threadpool always goes into the local worker's queue. If a task spawns a bunch of child IO tasks, a disproportionate number of them will be polled for the first time by the current worker thread. Work stealing doesn't help much here: many tasks get polled before other worker threads have a chance to steal them, which skews the distribution of IO handles across reactors.
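
For illustration, here is a minimal sketch (tokio 0.1-style APIs; the address and the trivial write-back handler are placeholders, not code from this PR) of the kind of accept loop where this skew shows up:

```rust
extern crate tokio;

use tokio::net::TcpListener;
use tokio::prelude::*;

fn main() {
    let addr = "127.0.0.1:3000".parse().unwrap();
    let listener = TcpListener::bind(&addr).unwrap();

    // The accept loop itself is a single task running on one worker thread.
    let server = listener
        .incoming()
        .map_err(|e| eprintln!("accept error: {}", e))
        .for_each(|socket| {
            // With the current strategy, every task spawned here goes into the
            // local worker's queue, so most sockets get their first poll (and
            // therefore their reactor assignment) on that same worker.
            tokio::spawn(
                tokio::io::write_all(socket, &b"hello\n"[..])
                    .map(|_| ())
                    .map_err(|e| eprintln!("write error: {}", e)),
            );
            Ok(())
        });

    tokio::run(server);
}
```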

Solution

When spawning a new task, always pick a random worker and push the task into that worker's inbound queue.

Note: this heuristic is not perfect, but seems to be an improvement over the current task spawning strategy. We might tweak it further in the future.
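
Conceptually, the new spawn path looks something like this sketch (hypothetical Pool/Worker/Task types, with a Mutex'd VecDeque standing in for the real lock-free inbound queues and the rand crate standing in for the pool's RNG; not the actual tokio-threadpool code):

```rust
extern crate rand;

use std::collections::VecDeque;
use std::sync::Mutex;

struct Task;

struct Worker {
    // The real inbound queue is lock-free; a Mutex keeps the sketch short.
    inbound: Mutex<VecDeque<Task>>,
}

struct Pool {
    workers: Vec<Worker>,
}

impl Pool {
    fn spawn(&self, task: Task) {
        // Pick a random worker instead of the current one, so a task's first
        // poll (and therefore its reactor assignment) is spread evenly.
        let idx = rand::random::<usize>() % self.workers.len();
        self.workers[idx].inbound.lock().unwrap().push_back(task);
    }
}
```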

I've expanded the tokio-reactor benchmark. Here's our baseline using tokio-io-pool:

test io_pool::notify_many    ... bench:  34,479,546 ns/iter (+/- 10,232,063)

This is tokio-threadpool before PR #660 (at commit 886511c), with a single global reactor:

test threadpool::notify_many ... bench:  87,539,637 ns/iter (+/- 4,874,898)

After PR #660 (commit d35d051), with a reactor per worker thread:

test threadpool::notify_many ... bench:  57,666,138 ns/iter (+/- 6,762,130)

Finally, this PR, with randomized task submission:

test threadpool::notify_many ... bench:  46,792,382 ns/iter (+/- 5,140,460)

Still not as fast as tokio-io-pool, but we're getting there.

cc @jonhoo

@ghost requested a review from carllerche October 3, 2018 12:48
@jonhoo (Contributor) commented Oct 3, 2018

Interesting... tokio-io-pool uses a slightly different heuristic here: anything that gets spawned onto the pool (e.g. with Runtime::spawn, Runtime::spawn_all, or through a Handle) is sent to a random worker, whereas anything that is spawned on the pool (through tokio::spawn) stays on the local worker. The idea behind this was that when you add a new thing onto the pool, it's likely the result of something like an accept, and then it should definitely be distributed evenly (as you observe in this PR). Once something is on the pool, though, we want to keep spawn fast, so we may as well spawn it locally. Specifically, the observation is that I/O tasks don't generally spawn lots of other I/O tasks.

This might be trickier in tokio proper since the listener will also be running on the pool, so we don't have a good way of distinguishing the two...
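
Roughly, the distinction could be sketched with a thread-local worker index (hypothetical Pool/Task types, not the actual tokio-io-pool code; the rand crate stands in for the random pick):

```rust
extern crate rand;

use std::cell::Cell;
use std::collections::VecDeque;
use std::sync::Mutex;

struct Task;

struct Pool {
    queues: Vec<Mutex<VecDeque<Task>>>,
}

thread_local! {
    // Some(index) on the pool's worker threads, None on every other thread.
    static WORKER_INDEX: Cell<Option<usize>> = Cell::new(None);
}

impl Pool {
    fn spawn(&self, task: Task) {
        let target = match WORKER_INDEX.with(|w| w.get()) {
            // Spawned *on* the pool (tokio::spawn from a worker): stay local.
            Some(local) => local,
            // Spawned *onto* the pool (Runtime::spawn or via a Handle): random.
            None => rand::random::<usize>() % self.queues.len(),
        };
        self.queues[target].lock().unwrap().push_back(task);
    }
}
```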

@ghost (Author) commented Oct 3, 2018

This might be trickier in tokio proper since the listener will also be running on the pool, so we don't have a good way of distinguishing the two...

Yep, that's why I'm leaning towards always spawning a new task onto a random worker - just in case.

Once something is on the pool though, we want to keep spawn fast, so we may as well spawn it locally.

I'm not as experienced with the real-world use of Tokio, but I wonder: Is the cost of spawning something we should be particularly worried about?

My gut feeling is that tasks are spawned relatively rarely and that the cost of spawning tends to be negligible in the grand scheme of things. On the other hand, I'd wager tasks are yielded much more often than they're spawned, so we should strive to optimize task switching as much as possible.

@jonhoo (Contributor) commented Oct 3, 2018

Yeah, you could be right that spawning is relatively rare on the critical path, so this may just be a non-issue :)

@carllerche (Member) left a review comment:

Looks fine to me; I just had a question about the dependency version bump in tokio-reactor/Cargo.toml, but it's not a blocker.

@@ -27,9 +27,9 @@ mio = "0.6.14"
 num_cpus = "1.8.0"
 parking_lot = "0.6.3"
 slab = "0.4.0"
-tokio-executor = { version = "0.1.1", path = "../tokio-executor" }
+tokio-executor = { version = "0.1.5", path = "../tokio-executor" }
 tokio-io = { version = "0.1.6", path = "../tokio-io" }
carllerche (Member):

Why was this bumped? Were the minimal versions wrong?

ghost (Author):

That was just a desperate attempt while I was struggling to patch the dependencies. :)

In the end, moving [patch.crates-io] to the root Cargo.toml fixed the issue.

Thanks for reporting this to rust-lang/cargo!

@@ -89,3 +89,20 @@ serde = "1.0"
 serde_derive = "1.0"
 serde_json = "1.0"
 time = "0.1"
+
+[patch.crates-io]
carllerche (Member):

I filed rust-lang/cargo#6126 to address this.

@carllerche (Member) commented:

I'm using this patch to bench the hyper hello example:

With this patch:

$ wrk -t1 -d10 -c50 http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   368.86us   69.77us   4.96ms   90.77%
    Req/Sec    75.06k     6.64k   79.88k    88.00%
  746917 requests in 10.00s, 62.68MB read
Requests/sec:  74679.24
Transfer/sec:      6.27MB

Requests/sec: 74679.24

Against current_thread::Runtime:

$ wrk -t1 -d10 -c50 http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   603.60us   84.34us   5.56ms   91.32%
    Req/Sec    81.17k     2.25k   82.73k    98.00%
  807192 requests in 10.00s, 67.74MB read
Requests/sec:  80708.63
Transfer/sec:      6.77MB

Requests/sec: 80708.63

So, it seems like there is still some source of additional overhead there.

@carllerche (Member) commented:

Hyper is doing basic spawning here, so in theory it should be distributed across the runtime.

It could be that multithreading provides no added benefit in this benchmark, but I am not 100% sure what is going on.

@ghost (Author) commented Oct 3, 2018

Here's the same hyper hello benchmark on my machine.

Baseline tokio-io-pool:

$ wrk -t1 -d10 -c50 http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   362.54us  603.80us  14.48ms   97.47%
    Req/Sec    91.21k     3.24k   96.54k    72.00%
  907058 requests in 10.00s, 76.12MB read
Requests/sec:  90675.35
Transfer/sec:      7.61MB

tokio-threadpool before PR #660:

$ wrk -t1 -d10 -c50 http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   452.42us  743.86us  26.01ms   96.34%
    Req/Sec    77.83k     3.30k   85.16k    83.00%
  774089 requests in 10.00s, 64.96MB read
Requests/sec:  77374.46
Transfer/sec:      6.49MB

PR #660:

$ wrk -t1 -d10 -c50 http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   418.65us    0.90ms  18.85ms   97.56%
    Req/Sec    86.94k     2.85k   92.42k    78.00%
  865452 requests in 10.01s, 72.63MB read
Requests/sec:  86487.88
Transfer/sec:      7.26MB

This PR:

$ wrk -t1 -d10 -c50 http://127.0.0.1:3000/
Running 10s test @ http://127.0.0.1:3000/
  1 threads and 50 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   420.79us  812.86us  18.17ms   97.32%
    Req/Sec    87.12k     3.24k   95.14k    71.00%
  867375 requests in 10.01s, 72.79MB read
Requests/sec:  86687.78
Transfer/sec:      7.28MB

@carllerche (Member) commented:

@stjepang Interesting. What is the system that you are running the benchmarks on?

@carllerche (Member) commented:

@stjepang either way, I have approved the PR, so feel free to merge 👍

@ghost (Author) commented Oct 3, 2018

My machine:

  • OS: GNU/Linux, kernel v4.17.19
  • Arch: x86-64
  • CPU: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
  • Cores: 2 physical / 4 logical

@ghost merged commit e27b0a4 into tokio-rs:master Oct 3, 2018
@ghost deleted the submit-random branch October 3, 2018 21:09
@ghost mentioned this pull request Nov 13, 2018