
Improve thread notify #597

Merged
merged 4 commits into rust-lang:master on Oct 25, 2017

Conversation

carllerche
Member

This depends on #596.

This PR improves ThreadNotify performance by more than 2x in some cases by doing two things:

  • Avoiding an Arc allocation per call to wait; instead, a thread-local is used to cache a ThreadNotify handle.
  • Using an atomic to avoid locking / signaling when not needed.
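As a rough illustration of the first bullet, a thread-local can lazily allocate one handle per thread and hand out the same Arc on every blocking call. This is only a sketch of the caching shape; the names (`ThreadNotify`, `CURRENT_THREAD_NOTIFY`, `with_current`) are illustrative, not the PR's exact internals:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the real notifier; the mutex/condvar used
// for actual sleeping is omitted here.
struct ThreadNotify {
    state: AtomicUsize,
}

thread_local! {
    // One Arc per thread, created lazily on first use and then reused,
    // instead of a fresh Arc allocation on every call to `wait`.
    static CURRENT_THREAD_NOTIFY: Arc<ThreadNotify> = Arc::new(ThreadNotify {
        state: AtomicUsize::new(0),
    });
}

// Run `f` with this thread's cached handle; cloning the Arc (a ref-count
// bump) only happens when a waker must outlive the call.
fn with_current<F: FnOnce(&Arc<ThreadNotify>)>(f: F) {
    CURRENT_THREAD_NOTIFY.with(|notify| f(notify));
}
```

Because the handle lives in TLS, repeated calls on the same thread observe the same allocation, which is what removes the per-call Arc cost.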

This renames an internal type away from deprecated terminology.

This change avoids the Arc allocation for each blocking call and eliminates the need to perform the Arc ref-count increment when unnecessary.

Unfortunately, using an atomic requires a final atomic CAS within the "wakeup" mutex. This means we cannot use the thread park / unpark helpers from std.

This change improves a single-threaded "yield" benchmark by almost 40%.
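The commit messages above describe an IDLE/NOTIFY/SLEEP state machine in which the final transition into NOTIFY must happen while holding the wakeup mutex. A minimal sketch of that shape follows; the method names and exact transitions are illustrative rather than the PR's verbatim code, and `compare_and_swap` matches the era's API (it has since been deprecated in favor of `compare_exchange`):

```rust
use std::sync::{Condvar, Mutex};
use std::sync::atomic::{AtomicUsize, Ordering};

const IDLE: usize = 0;
const NOTIFY: usize = 1;
const SLEEP: usize = 2;

struct ThreadNotify {
    state: AtomicUsize,
    mutex: Mutex<()>,
    condvar: Condvar,
}

impl ThreadNotify {
    fn notify(&self) {
        // Fast path: if the other thread is running, just flag the
        // wakeup with a lock-free CAS.
        let prev = self.state.compare_and_swap(IDLE, NOTIFY, Ordering::SeqCst);
        if prev != SLEEP {
            return;
        }
        // The other half is sleeping: the final SLEEP -> NOTIFY
        // transition must happen inside the mutex, which is why std's
        // park/unpark helpers could not be reused directly.
        let _m = self.mutex.lock().unwrap();
        if self.state.compare_and_swap(SLEEP, NOTIFY, Ordering::SeqCst) == SLEEP {
            self.condvar.notify_one();
        }
    }

    fn park(&self) {
        // Fast path: consume a pending notification without the mutex.
        if self.state.compare_and_swap(NOTIFY, IDLE, Ordering::SeqCst) == NOTIFY {
            return;
        }
        let mut m = self.mutex.lock().unwrap();
        // Transition to SLEEP; a notification may have raced in first.
        if self.state.compare_and_swap(IDLE, SLEEP, Ordering::SeqCst) == NOTIFY {
            self.state.store(IDLE, Ordering::SeqCst);
            return;
        }
        while self.state.load(Ordering::SeqCst) != NOTIFY {
            // Condvars can wake spuriously, so loop until notified.
            m = self.condvar.wait(m).unwrap();
        }
        // Transition back to idle.
        self.state.store(IDLE, Ordering::SeqCst);
    }
}
```

The point of the extra states is that both the "already notified" and "not yet sleeping" cases resolve with a single CAS and never touch the mutex or condvar.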
@alexcrichton
Member

Is the improvement over thread park/unpark here critical to land for now? Ideally we'd leave it as is if it's otherwise a "nice to have" and instead upstream the changes to libstd I think?

@carllerche
Member Author

Avoiding the Arc on each call is pretty important and unrelated to std. Given that change, using a mutex / condvar avoids the extra indirection through std::thread::Thread.

@alexcrichton
Member

Oh sorry yeah the arc caching is fine, but I was wondering if we could avoid adding the atomic optimizations here and instead upstream them.

@carllerche
Member Author

If you want to pull this into rust proper, feel free to do so. For now, I'll just leave this PR here and it can be pulled in whenever you want to get the perf increase.

@alexcrichton
Member

Was this needed for a particular project? I'm just thinking it's probably best to land the park/unpark improvements upstream and land the Arc caching here, but if the park/unpark improvements are needed now then it seems fine to land them here and upstream in parallel.

@carllerche
Member Author

Yes, this change is required when going from async -> sync in a perf-sensitive scenario. Specifically, using a channel to send telemetry info to a sync thread that processes it.

That said, I am currently using a custom waiter for this scenario until the perf improvements land upstream.
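The scenario described above has roughly this shape. In this sketch `std::sync::mpsc` stands in for the futures channel the PR is actually about, and `run_pipeline` is a made-up name; the point is that every blocking receive on the sync side is a park/notify cycle on exactly the hot path these optimizations target:

```rust
use std::sync::mpsc;
use std::thread;

// Producers push telemetry records into a channel while a dedicated
// sync thread blocks on the receiving end and processes them.
fn run_pipeline(records: u64) -> u64 {
    let (tx, rx) = mpsc::channel();
    let processor = thread::spawn(move || {
        let mut total = 0u64;
        // Each blocking receive here parks until a sender notifies.
        for value in rx {
            total += value; // stand-in for real telemetry processing
        }
        total
    });
    for v in 1..=records {
        tx.send(v).unwrap();
    }
    drop(tx); // closing the channel lets the processor thread exit
    processor.join().unwrap()
}
```

When the sync side keeps up with the producers, most receives find data already queued, so the fast path that avoids the mutex entirely dominates.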

loop {
    m = self.condvar.wait(m).unwrap();

    // Transition back to idle, loop otherwise

Why would this ever not be NOTIFY? Can this be just an atomic store and return?

Member Author


condvars can wake up spuriously


Ah right, I forgot that actual OS condvars can do that - Go's sync.Cond does not. Apologies!

}

// The other half is sleeping, this requires a lock
let _m = self.mutex.lock().unwrap();

@twmb twmb Oct 24, 2017


Would it be worthwhile to have a separate notifier_mutex? This would allow something like...

let _nm = match self.notifier_mutex.try_lock() {
    Ok(g) => g,
    Err(TryLockError::Poisoned(e)) => panic!("{}", e),
    Err(TryLockError::WouldBlock) => return,
};
let _m = self.mutex.lock().unwrap();
...

which would avoid simultaneous notifications sitting on a mutex to notify a thread that will only need the first (and would also help avoid [not eliminate] the scenario where the first notification woke the sleeping thread, which then consumes all events, and then gets falsely notified by the other notifications that hadn't hit yet).


Eh nvm - this would still require keeping one notifier on deck, which TryLock does not provide.

let _m = self.mutex.lock().unwrap();

// Transition from SLEEP -> NOTIFY
match self.state.compare_and_swap(SLEEP, NOTIFY, Ordering::SeqCst) {

@twmb twmb Oct 24, 2017


I think this cas would just need to be a store with notifier_mutex.

[edit: nvm - it still needs to be a compare_and_swap, b/c a simultaneous notification that is slow to the try_lock could still fall in here after the parked thread awakens and swaps to idle]

@alexcrichton
Member

@carllerche ok sounds good to me! I'll send a PR to libstd with these changes so we can eventually move back to thread park/unpark

@alexcrichton alexcrichton merged commit 6d861ee into rust-lang:master Oct 25, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Oct 25, 2017
This is an adaptation of rust-lang/futures-rs#597 for the standard library.
The goal here is to avoid locking a mutex on the "fast path" for thread
park/unpark where you're waking up a thread that isn't sleeping or otherwise
trying to park a thread that's already been notified. Mutex performance varies
quite a bit across platforms so this should provide a nice consistent speed
boost for the fast path of these functions.
@carllerche
Member Author

Thanks.

Incidentally, the entire impl and thread-local could be avoided if we could convert std::thread::Thread to and from *mut UnsafeNotify. This should be possible, as Thread is just a wrapper around Arc<Inner>, but the necessary APIs aren't present.

I don't really have much insight into what std would want to provide to support this.

@carllerche
Member Author

I mean, we could technically do it w/ mem::transmute, but that seems unwise.

bors added a commit to rust-lang/rust that referenced this pull request Oct 27, 2017
std: Optimize thread park/unpark implementation

@alexcrichton
Copy link
Member

Nah yeah I think we don't want to assume the size of Thread just yet, but we could consider into/from usize upstream!

3 participants