Add a MarkableArcCell to allow building reference-counted concurrent collections #80

Vtec234 · 2016-08-01T14:56:15Z

What?

This PR adds a new type to crossbeam::sync - the MarkableArcCell. This type provides all the functionality of ArcCell and more. In particular, it allows one to mark the cell with a boolean value and then perform atomic compare_exchange operations on the (Arc<T>, bool) pair. It is based on a type from the java.util.concurrent.atomic module called the AtomicMarkableReference (docs) and is supposed to provide the same kind of functionality.

I also made a tiny addition to ArcCell - ArcCell::with_val(v) is equivalent to ArcCell::new(Arc::new(v)). I think it's nicer not to have to expliciitly initialize the Arc each time, but if the change is unwanted I don't mind removing it as it's not too important.

Why?

This type allows us to build concurrent lock-free collections as described in "The Art of Multiprocessor Programming" by M. Herlihy and N. Shavit. The book provides code in Java, which is a garbage-collected language. The big problem in writing such structures in non-GC languages is memory cleanup. In my opinion, the easiest to use equivalent of a GC is atomic reference counting, provided by Arc<T>.

I have already written a lock-free linked-list-based map and will publish it after this PR or its equivalent is merged, as the map depends on the MarkableArcCell. This will allow us to then create more performant maps, such as skiplist-based ones and stabilize the interface for a concurrent map in Rust, as discussed in #41, after which work on a lock-free hashmap could begin in crossbeam.

I have made some decisions regarding the interface of MarkableArcCell which might raise questions, so please ask and I will try to justify them.

schets · 2016-08-01T15:44:07Z

src/sync/markable_arc_cell.rs

+    // Locks the internal spinlock.
+    fn take(&self) -> Arc<T> {
+        loop {
+            match self.0.swap(0, Ordering::Acquire) {


This isn't lock free, a suspended thread between a take/put operation can block others here arbitrarily.

While this isn't inherently wrong, this does mean that collections written with this won't be lock-free either.

You're right, and we should consider other ways of creating a markable pointer.

schets · 2016-08-01T15:52:09Z

What's the use case for this? The only literature that I found used this as some sort of atomic Optional. Are there uses for this where the nullness of the pointer and the marker status differ?

Is there a reason this must be reference counted instead of using the epoch memory management?

If there are any guarantees on heap pointer alignment in rust, this could also be done by using unused bits and would be implementable in a lock-free manner. If/when double word atomics are added they could make this current implementation lock free as well.

Vtec234 · 2016-08-01T17:10:15Z

If there are any guarantees on heap pointer alignment in rust, this could also be done by using unused bits and would be implementable in a lock-free manner.

Yes, a tagged pointer would be an optimization over two values, but the reason I use a spinlock is not that there are two values. When you look at ArcCell, it also spinlocks. The problem is in the Arc->AtomicUsize conversion.

Consider the situation:
Thread 1 reads the usize and transmutes it to an Arc. Assuming no other Arc copies are present, the refcount is 1. Then it suspends. Thread 2 comes in, and uses set_arc (or set in ArcCell) to set a new value. set_arc returns the old value, which Thread 2 then drops. The refcount is 0, and Thread 1 holds an invalid value.
The spinlock protects from this, since between a take and put the value is guaranteed to be valid.

The ability to make this lock-free therefore depends not on squashing the boolean into the unused bits of the Arc pointer, but on making ArcCell lock-free. It would be great if somebody could do that, unfortunately I have no idea how at the moment.

If/when double word atomics are added they could make this current implementation lock free as well.

I wouldn't count on hardware support for this, unfortunately, but as explained above a DCAS is not necessary.

What's the use case for this?

The problem I am trying to solve is that we need a stronger guarantee than just knowing that the memory we have a pointer to can be accessed, because while the memory may be valid, it might've been logically deleted from the list.

When holding a mem::epoch::Shared pointer, we know that the node has not been deallocated, but we are not sure whether it is still in the list. Inserting a new node after a node that has been logically deleted will have no effect, and would therefore be invalid. A marker bit solves this, as we can use an atomic operation to make sure that we are setting the next pointer of the previous node to the new one only if the previous node's marker is false, which indicates still being in the list. A remove procedure would switch that bit to true.

In other words, we need a way to make sure that the pointer we hold is still accessible from the list when modifying it.

My explanation is probably a bit wonky, so please see 4.3.3.1, p. 57 in this.

Is there a reason this must be reference counted instead of using the epoch memory management?

Probably not, but to my knowledge epoch memory management would require a similar addition of a pointer-boolean pair for this to work, unless it already contains something like this. Due to the MarkableArcCell not actually being lock-free as you remarked, we might consider adding a markable pointer to mem::epoch instead.

EDIT: In fact, I think this could be done with some additions mem::epoch::Atomic. I'll think about it some more, but that might be a better option.

schets · 2016-08-02T17:58:04Z

I'm at work right now so this will be short, but I think that this commit is really doing two things at once - it's adding the idea of a markable pointer, and also the idea of an Atomic Arc. Would it make sense to separate them? If only a markable pointer was needed it would be far cheaper than the AtomicMarkableArc for loads/stores, and the AtomicArc also wouldn't have to deal with the overhead of being markable.

Vtec234 · 2016-08-02T21:49:12Z

it's adding the idea of a markable pointer, and also the idea of an Atomic Arc.

If by AtomicArc you mean a type that provides atomic storage and retrieval of an Arc, then, well, it's already in crossbeam and it's called ArcCell. It's true that it doesn't provide CAS, but it might be extended to do that as well. Anyhow, this PR doesn't add that idea, as it's already there.

A markable pointer type is something that is indeed not yet present, and it's the aim of this PR to add one.

If only a markable pointer was needed it would be far cheaper than the AtomicMarkableArc for loads/stores, and the AtomicArc also wouldn't have to deal with the overhead of being markable.

The aim of this PR is to build a markable pointer that is already integrated with a cleanup scheme (not simply a ptr-boolean pair that has to be freed like a normal ptr) and the current choice is ref-counting or the epoch GC. It's built on top of ArcCell, because I want to try to build some ref-counted structures and see how they hold up.

An alternative to ref-counting, as you mentioned, is a GC (e.g. the epoch GC), and a similar marked pointer type might be added to mem::epoch, but it's a matter of benchmarking to see whether it is really cheaper - no refcounts, but GC sweeps, etc.
You also rightly said that it's not lock-free, so I made an ArcCell with a wait-free get() and I'll try to rewrite the markable variant on top of that.

Vtec234 · 2016-08-25T19:34:59Z

Finally got around to writing a more concurrency-friendly version of MarkableArcCell. In this one, all read methods are wait-free and all methods which write the Arc part are wait-free with respect to each other (other writes), but will lock until all readers are done. compare_arc_exchange_mark, which writes only the marker bit, may loop a little due to a CAS operation, so i'm not sure whether it can be classified as wait-free, but it's definitely lock-free.

With these guarantees, the guarantees that a skiplist as described in "The Art of Multiprocessor Programming" by M. Herlihy and N. Shavit implemented using this type would provide are:
find/contains - always lock-free
concurrent removes and inserts - won't block
concurrent writes with finds - writes may block until reads are done

What do you think?

arthurprs · 2016-10-07T08:25:05Z

I think this would be more appealing if you could link the code of the use cases you said to have. Code can be enlightening.

Vtec234 · 2016-11-10T16:22:59Z

@arthurprs You're right. I have uploaded a repository containing a concurrent linked list based on the MarkableArcCell: SortedLazyList. It's just a PoC, so I might've gotten my lock-freedom guarantees wrong, there might be undetected race conditions (even though the tests seem to pass), etc., but it demonstrates how data structures can be built based on this primitive. It's also terribly slow, but it could be extended relatively easily into the skiplist I mentioned earlier, and skiplists are not bad at all.
I'm sorry for taking so long, but I've been busy with uni and other silly things.

arthurprs · 2016-11-10T17:30:59Z

That's great. But I'm confused, you say the linked list is terribly slow so how the skip list would fare better?

arthurprs · 2016-11-10T17:33:30Z

Oh I see it's not actually a linked list, just similar name.

Vtec234 · 2016-11-10T17:37:11Z

Yes, skiplists are essentialy made by using several linked lists internally, each one with a different number of elements. Thanks to this and some other properties, skiplists scale much better than linked lists.

arthurprs · 2016-11-23T20:47:04Z

I see the use case more clearly now, thanks. Does this scheme perform reasonably well in practice though?

Vtec234 · 2017-01-01T20:51:30Z

The linked list itself is not that useful, but good as a starting point. I used it in that way to write a map implemented as a skiplist using pretty much the same techniques as the linked list does. The problem with that map is that it is only lock-free if the underlying primitive (MarkableArcCell) is lock-free, and at the moment it is not (Please disregard the documentation on my repo, it assumes that MarkableArcCell is lock-free. Also, I was wrong when stating here earlier that it is). It nevertheless seems to perform reasonably well, maintaining the O(log n) complexity of a skiplist, although I hadn't yet done any significant optimizations or benchmarked it properly, since I focused on correctness when writing it. Like the linked list, it's based on "The Art of Multiprocessor Programming".

Now, even though the skiplist is more or less fast, we want it to perform well under contention, which can only be guaranteed by lock-freedom. I can see two ways of achieving this. One is to fix MarkableArcCell to be lock-free. I am unsure whether that's even possible, but feel free to try.

The other way is to replace the memory recollection scheme with crossbeam's epoch GC. That seems like the way to go for now, since plain pointers (epoch::Atomic) can trivially be paired with a bool (we cannot avoid the Markable (ptr, bool) primitive for skiplists) in a lock-free atomic primitive. I will probably PR something like MarkableAtomic soon and then try to rewrite the skiplist on top of that. This will have the additional advantage of being most likely faster. (Link taken from aturon's excellent blog post), but some disadvantages with respect to interface design (&ValueType cannot be returned, only ValueType).

TL;DR concurrency is hard

arthurprs · 2017-01-26T11:38:50Z

@Vtec234 should this be closed (same as #102)?

Vtec234 · 2017-01-26T17:52:24Z

@arthurprs This primitive is integrated with a different memory-recollection scheme - reference counting. If the authors of crossbeam do not want to look into this scheme, since it's quite a bit slower, this should be closed, but i'll keep it open for now as reference counting has some advantages over garbage collection, primarily having more predictable recollection times.

Vtec234 · 2018-03-07T19:08:17Z

More general types have been proposed in crossbeam-rs/rfcs#28.

schets reviewed Aug 1, 2016
View reviewed changes

Vtec234 mentioned this pull request Aug 2, 2016

ArcCell: wait-free get() #81

Closed

Vtec234 added 4 commits August 24, 2016 13:18

ArcCell: wait-free get()

573adf3

ArcCell: fix ordering

8ce8fa9

Add with_val to ArcCell; Add MarkableArcCell

eae9a71

Make MarkableArcCell docs marginally more readable

7bc6e35

Vtec234 force-pushed the lazy-list-map branch from 8b1abae to 7bc6e35 Compare August 24, 2016 11:19

Vtec234 added 5 commits August 25, 2016 00:15

Revert ArcCell with_val; Begin wait-free MarkableArcCell

b202d26

Initial better concurrency for MarkableArcCell

cf588d9

Make some methods on MarkableArcCell wait-free

a23b2cc

Revert ArcCell changes

aedf256

Add concurrent MarkableArcCell test

33febf2

Vtec234 mentioned this pull request Jan 6, 2017

Add MarkableAtomic #102

Closed

Vtec234 closed this Mar 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a MarkableArcCell to allow building reference-counted concurrent collections #80

Add a MarkableArcCell to allow building reference-counted concurrent collections #80

Vtec234 commented Aug 1, 2016 •

edited

Loading

schets Aug 1, 2016 •

edited

Loading

Vtec234 Aug 1, 2016

schets commented Aug 1, 2016

Vtec234 commented Aug 1, 2016 •

edited

Loading

schets commented Aug 2, 2016

Vtec234 commented Aug 2, 2016

Vtec234 commented Aug 25, 2016 •

edited

Loading

arthurprs commented Oct 7, 2016

Vtec234 commented Nov 10, 2016

arthurprs commented Nov 10, 2016

arthurprs commented Nov 10, 2016

Vtec234 commented Nov 10, 2016

arthurprs commented Nov 23, 2016 •

edited

Loading

Vtec234 commented Jan 1, 2017 •

edited

Loading

arthurprs commented Jan 26, 2017

Vtec234 commented Jan 26, 2017

Vtec234 commented Mar 7, 2018

Add a MarkableArcCell to allow building reference-counted concurrent collections #80

Add a MarkableArcCell to allow building reference-counted concurrent collections #80

Conversation

Vtec234 commented Aug 1, 2016 • edited Loading

What?

Why?

schets Aug 1, 2016 • edited Loading

Choose a reason for hiding this comment

Vtec234 Aug 1, 2016

Choose a reason for hiding this comment

schets commented Aug 1, 2016

Vtec234 commented Aug 1, 2016 • edited Loading

schets commented Aug 2, 2016

Vtec234 commented Aug 2, 2016

Vtec234 commented Aug 25, 2016 • edited Loading

arthurprs commented Oct 7, 2016

Vtec234 commented Nov 10, 2016

arthurprs commented Nov 10, 2016

arthurprs commented Nov 10, 2016

Vtec234 commented Nov 10, 2016

arthurprs commented Nov 23, 2016 • edited Loading

Vtec234 commented Jan 1, 2017 • edited Loading

arthurprs commented Jan 26, 2017

Vtec234 commented Jan 26, 2017

Vtec234 commented Mar 7, 2018

Vtec234 commented Aug 1, 2016 •

edited

Loading

schets Aug 1, 2016 •

edited

Loading

Vtec234 commented Aug 1, 2016 •

edited

Loading

Vtec234 commented Aug 25, 2016 •

edited

Loading

arthurprs commented Nov 23, 2016 •

edited

Loading

Vtec234 commented Jan 1, 2017 •

edited

Loading