Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a lock-free work-stealing deque #10678

Merged
merged 1 commit into from
Nov 29, 2013

Conversation

alexcrichton
Copy link
Member

This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).

The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.

I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).

Closes #4877

@alexcrichton
Copy link
Member Author

I've updated this with a slight optimization. The Buffer class no longer uses ~[T] and uses *T instead, and I also shuffled around a few things to get rid of a div instruction that LLVM was generating (saw a ~5% decrease in push time).

bors added a commit that referenced this pull request Nov 29, 2013
This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).

The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.

I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).

Closes #4877
This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).

The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.

I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).

Closes rust-lang#4877
bors added a commit that referenced this pull request Nov 29, 2013
This adds an implementation of the Chase-Lev work-stealing deque to libstd
under std::rt::deque. I've been unable to break the implementation of the deque
itself, and it's not super highly optimized just yet (everything uses a SeqCst
memory ordering).

The major snag in implementing the chase-lev deque is that the buffers used to
store data internally cannot get deallocated back to the OS. In the meantime, a
shared buffer pool (synchronized by a normal mutex) is used to
deallocate/allocate buffers from. This is done in hope of not overcommitting too
much memory. It is in theory possible to eventually free the buffers, but one
must be very careful in doing so.

I was unable to get some good numbers from src/test/bench tests (I don't think
many of them are slamming the work queue that much), but I was able to get some
good numbers from one of my own tests. In a recent rewrite of select::select(),
I found that my implementation was incredibly slow due to contention on the
shared work queue. Upon switching to the parallel deque, I saw the contention
drop to 0 and the runtime go from 1.6s to 0.9s with the most amount of time
spent in libuv awakening the schedulers (plus allocations).

Closes #4877
@bors bors closed this Nov 29, 2013
@bors bors merged commit a70f9d7 into rust-lang:master Nov 29, 2013
@alexcrichton alexcrichton deleted the issue-4877 branch December 1, 2013 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write a parallel deque for work stealing
2 participants