
Improve thread notify #597

Merged
merged 4 commits into rust-lang:master on Oct 25, 2017

Conversation

carllerche
Member

This depends on #596.

This PR improves ThreadNotify performance by more than 2x in some cases by doing two things:

  • Avoiding an Arc allocation per call to wait; instead, a thread-local is used to cache a ThreadNotify handle.
  • Using an atomic to avoid locking / signaling when not needed.
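As a rough illustration of the first bullet, a thread-local can lazily allocate one handle per thread and hand out the same Arc on every blocking call. This is only a sketch of the caching shape; the names (`ThreadNotify`, `CURRENT_THREAD_NOTIFY`, `with_current`) are illustrative, not the PR's exact internals:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for the real notifier; the mutex/condvar used
// for actual sleeping is omitted here.
struct ThreadNotify {
    state: AtomicUsize,
}

thread_local! {
    // One Arc per thread, created lazily on first use and then reused,
    // instead of a fresh Arc allocation on every call to `wait`.
    static CURRENT_THREAD_NOTIFY: Arc<ThreadNotify> = Arc::new(ThreadNotify {
        state: AtomicUsize::new(0),
    });
}

// Run `f` with this thread's cached handle; cloning the Arc (a ref-count
// bump) only happens when a waker must outlive the call.
fn with_current<F: FnOnce(&Arc<ThreadNotify>)>(f: F) {
    CURRENT_THREAD_NOTIFY.with(|notify| f(notify));
}
```

Because the handle lives in TLS, repeated calls on the same thread observe the same allocation, which is what removes the per-call Arc cost.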

This renames an internal type away from deprecated terminology.

This change avoids the Arc allocation for each blocking call and eliminates the need to perform the Arc ref-count increment when unnecessary.

Unfortunately, using an atomic requires a final atomic CAS within the "wakeup" mutex. This means we cannot use the thread park / unpark helpers from std.

This change improves a single-threaded "yield" benchmark by almost 40%.
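The commit messages above describe an IDLE/NOTIFY/SLEEP state machine in which the final transition into NOTIFY must happen while holding the wakeup mutex. A minimal sketch of that shape follows; the method names and exact transitions are illustrative rather than the PR's verbatim code, and `compare_and_swap` matches the era's API (it has since been deprecated in favor of `compare_exchange`):

```rust
use std::sync::{Condvar, Mutex};
use std::sync::atomic::{AtomicUsize, Ordering};

const IDLE: usize = 0;
const NOTIFY: usize = 1;
const SLEEP: usize = 2;

struct ThreadNotify {
    state: AtomicUsize,
    mutex: Mutex<()>,
    condvar: Condvar,
}

impl ThreadNotify {
    fn notify(&self) {
        // Fast path: if the other thread is running, just flag the
        // wakeup with a lock-free CAS.
        let prev = self.state.compare_and_swap(IDLE, NOTIFY, Ordering::SeqCst);
        if prev != SLEEP {
            return;
        }
        // The other half is sleeping: the final SLEEP -> NOTIFY
        // transition must happen inside the mutex, which is why std's
        // park/unpark helpers could not be reused directly.
        let _m = self.mutex.lock().unwrap();
        if self.state.compare_and_swap(SLEEP, NOTIFY, Ordering::SeqCst) == SLEEP {
            self.condvar.notify_one();
        }
    }

    fn park(&self) {
        // Fast path: consume a pending notification without the mutex.
        if self.state.compare_and_swap(NOTIFY, IDLE, Ordering::SeqCst) == NOTIFY {
            return;
        }
        let mut m = self.mutex.lock().unwrap();
        // Transition to SLEEP; a notification may have raced in first.
        if self.state.compare_and_swap(IDLE, SLEEP, Ordering::SeqCst) == NOTIFY {
            self.state.store(IDLE, Ordering::SeqCst);
            return;
        }
        while self.state.load(Ordering::SeqCst) != NOTIFY {
            // Condvars can wake spuriously, so loop until notified.
            m = self.condvar.wait(m).unwrap();
        }
        // Transition back to idle.
        self.state.store(IDLE, Ordering::SeqCst);
    }
}
```

The point of the extra states is that both the "already notified" and "not yet sleeping" cases resolve with a single CAS and never touch the mutex or condvar.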
@alexcrichton
Member

Is the improvement over thread park/unpark here critical to land for now? Ideally we'd leave it as is if it's otherwise a "nice to have" and instead upstream the changes to libstd I think?

@carllerche
Member Author

Avoiding the Arc on each call is pretty important and unrelated to std. Given that change, using a mutex / condvar avoids the extra indirection through std::thread::Thread.

@alexcrichton
Member

Oh sorry yeah the arc caching is fine, but I was wondering if we could avoid adding the atomic optimizations here and instead upstream them.

@carllerche
Member Author

If you want to pull this into rust proper, feel free to do so. For now, I'll just leave this PR here and it can be pulled in whenever you want to get the perf increase.

@alexcrichton
Member

Was this needed for a particular project? I'm just thinking it's probably best to land the park/unpark improvements upstream and land the Arc caching here, but if the park/unpark improvements are needed now then it seems fine to land them here and upstream in parallel.

@carllerche
Member Author

Yes, this change is required when going from async -> sync in a perf-sensitive scenario. Specifically, using a channel to send telemetry info to a sync thread that processes it.

That said, I am currently using a custom waiter for this scenario until the perf improvements land upstream.
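The scenario described above has roughly this shape. In this sketch `std::sync::mpsc` stands in for the futures channel the PR is actually about, and `run_pipeline` is a made-up name; the point is that every blocking receive on the sync side is a park/notify cycle on exactly the hot path these optimizations target:

```rust
use std::sync::mpsc;
use std::thread;

// Producers push telemetry records into a channel while a dedicated
// sync thread blocks on the receiving end and processes them.
fn run_pipeline(records: u64) -> u64 {
    let (tx, rx) = mpsc::channel();
    let processor = thread::spawn(move || {
        let mut total = 0u64;
        // Each blocking receive here parks until a sender notifies.
        for value in rx {
            total += value; // stand-in for real telemetry processing
        }
        total
    });
    for v in 1..=records {
        tx.send(v).unwrap();
    }
    drop(tx); // closing the channel lets the processor thread exit
    processor.join().unwrap()
}
```

When the sync side keeps up with the producers, most receives find data already queued, so the fast path that avoids the mutex entirely dominates.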

loop {
    m = self.condvar.wait(m).unwrap();

    // Transition back to idle, loop otherwise

Why would this ever not be NOTIFY? Can this be just an atomic store and return?

Member Author


condvars can wake up spuriously


Ah right, I forgot that actual OS condvars can do that - Go's sync.Cond does not. Apologies!

}

// The other half is sleeping, this requires a lock
let _m = self.mutex.lock().unwrap();

@twmb twmb Oct 24, 2017


Would it be worthwhile to have a separate notifier_mutex? This would allow something like...

let _nm = match self.notifier_mutex.try_lock() {
    Ok(g) => g,
    Err(TryLockError::Poisoned(e)) => panic!("{}", e),
    Err(TryLockError::WouldBlock) => return,
};
let _m = self.mutex.lock().unwrap();
...

which would avoid simultaneous notifications sitting on a mutex to notify a thread that will only need the first (and would also help avoid [not eliminate] the scenario where the first notification woke the sleeping thread, which then consumes all events, and then gets falsely notified by the other notifications that hadn't hit yet).


Eh nvm - this would still require keeping one notifier on deck, which TryLock does not provide.

let _m = self.mutex.lock().unwrap();

// Transition from SLEEP -> NOTIFY
match self.state.compare_and_swap(SLEEP, NOTIFY, Ordering::SeqCst) {

@twmb twmb Oct 24, 2017


I think this cas would just need to be a store with notifier_mutex.

[edit: nvm - it still needs to be a compare_and_swap, b/c a simultaneous notification that is slow to the try_lock could still fall in here after the parked thread awakens and swaps to idle]

@alexcrichton
Member

@carllerche ok sounds good to me! I'll send a PR to libstd with these changes so we can eventually move back to thread park/unpark

@alexcrichton alexcrichton merged commit 6d861ee into rust-lang:master Oct 25, 2017
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Oct 25, 2017
This is an adaptation of rust-lang/futures-rs#597 for the standard library.
The goal here is to avoid locking a mutex on the "fast path" for thread
park/unpark where you're waking up a thread that isn't sleeping or otherwise
trying to park a thread that's already been notified. Mutex performance varies
quite a bit across platforms so this should provide a nice consistent speed
boost for the fast path of these functions.
@carllerche
Member Author

Thanks.

Incidentally, the entire impl and thread-local could be avoided if we could convert std::thread::Thread to and from *mut UnsafeNotify. This should be possible, as Thread is just a wrapper around Arc<Inner>, but the necessary APIs aren't present.

I don't really have much insight into what std would want to provide to support this.

@carllerche
Member Author

I mean, we could technically do it w/ mem::transmute, but that seems unwise.

bors added a commit to rust-lang/rust that referenced this pull request Oct 27, 2017
std: Optimize thread park/unpark implementation

@alexcrichton
Copy link
Member

Nah yeah I think we don't want to assume the size of Thread just yet, but we could consider into/from usize upstream!

3 participants