Add garbage collector to std::gc #11399
cc @pnkfelix, @pcwalton, @nikomatsakis (anyone else) |
I've filled in more details. |
How about special-casing RefCell and using the RefCell borrow flags to implement write barriers? (that is, add a new GcWriteBarrier state that causes borrow to append the RefCell address to a task-local modification log) This way, they should work automagically with no API change, and allow safe code to borrow with no write barriers (since it's done at the proper place, i.e. the RefCell that can potentially give &muts and not the Gc which doesn't). [Cell can only contain POD types, so the GC can ignore them] Of course, the GC needs to be made fully accurate and aware of types beforehand, but that's essential anyway. |
That may work, but it requires chaining RefCell and Gc together, while the former is definitely useful outside of Gc... would it make RefCell slower in the non-Gc case? Also, RefCell isn't necessarily the only type with interior mutability; people can define their own outside of libstd. In any case, Gc as implemented here does satisfy Pod; we may wish not to have this, but satisfying Pod is certainly convenient. |
Excellent work! For the trie, here's the gold standard: http://www.hpl.hp.com/personal/Hans_Boehm/gc/tree.html |
Cool! This seems like a good place to float an idea I've been thinking about: what about doing tracing with a trait? I've never written a GC before, only read a few papers. Most GCs seem to do tracing entirely based on dynamic information: a header attached to each heap object, and/or runtime calls to register/deregister traceable areas of memory (as here). Instead of that, what if we had a trait like:
The If necessary, For the built-in types except An invariant that would have to be maintained in the type system is that data which transitively contains |
That would certainly be precise. However, it's not immediately clear to me how it interacts with Rust's stack, since AIUI, being precise on the stack (to work out which (Also, would that preclude performance optimisations, so that in applications just using the GC in only one task get slowed globally (which this GC doesn't do) because LLVM can't reorder/SROA things on the stack?) |
Not exactly, although the behavior would obviously be triggered only for RefCells that the garbage collector sees
No, because it already needs to check the flags to make sure it's not already borrowed, so the check for a write barrier and for the value already being borrowed can be done together at no extra performance cost.
This would have to be banned by adding a FreezeExceptNonFreezeTypesWhichSupportWriteBarriersOrCannotHoldGc kind bound to Gc or something like that. Is there any real use case for this?
It would either need to stop being Pod, or Cell needs to also have a NonManaged bound. Overall, the issue is that write barrier functionality must be where you can get &muts, since otherwise you cannot safely request to borrow without write barriers (and in fact your code makes such functionality unsafe), which seems unacceptable. So the other alternative is to add a Freeze bound to Gc and add a GcMut, but that's less flexible, and it appears the plan is to move away from that model towards using Cell and RefCell inside "immutable" types. |
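A minimal sketch of the write-barrier-on-borrow idea discussed above, in present-day Rust rather than the 2014 dialect used in this thread. The names `BarrierCell`, `arm_barrier`, `MOD_LOG` and `drain_mod_log` are hypothetical, and the sketch keeps the barrier flag next to a standard `RefCell` instead of folding it into the borrow flag itself, which is what the proposal actually suggests.

```rust
use std::cell::{Cell, Ref, RefCell, RefMut};

thread_local! {
    // Hypothetical task-local modification log: addresses of cells that were
    // mutably borrowed while their write barrier was armed.
    static MOD_LOG: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

/// Hypothetical RefCell wrapper carrying one extra "write barrier armed" flag.
pub struct BarrierCell<T> {
    barrier_armed: Cell<bool>,
    value: RefCell<T>,
}

impl<T> BarrierCell<T> {
    pub fn new(value: T) -> Self {
        BarrierCell { barrier_armed: Cell::new(false), value: RefCell::new(value) }
    }

    /// What a collector might call after it has scanned this cell.
    pub fn arm_barrier(&self) {
        self.barrier_armed.set(true);
    }

    /// Shared borrows never log: they cannot change what the cell points to.
    pub fn borrow(&self) -> Ref<'_, T> {
        self.value.borrow()
    }

    /// The first mutable borrow after arming records the cell's address once.
    pub fn borrow_mut(&self) -> RefMut<'_, T> {
        if self.barrier_armed.replace(false) {
            let addr = self as *const Self as usize;
            MOD_LOG.with(|log| log.borrow_mut().push(addr));
        }
        self.value.borrow_mut()
    }
}

/// Hand the log to the (imaginary) collector and clear it.
pub fn drain_mod_log() -> Vec<usize> {
    MOD_LOG.with(|log| std::mem::take(&mut *log.borrow_mut()))
}
```

Only mutable borrows consult the flag, which mirrors the point made above that the barrier belongs where `&mut` references can be obtained.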
This occurred to me as well. I plead total ignorance. How was the planned "conservative on the stack, precise on the heap" GC going to do this? With headers on all heap objects?
Can SROA take something off the stack entirely and put it in a register? Would this GC handle that? EDIT: Would it be unreasonable to have the compiler explicitly generate code to call |
On Wed, Jan 8, 2014 at 2:38 PM, Gábor Lehel wrote:
Sure, the LLVM is even free to remove calls to |
My current plan was storing the information in the table in the GC. (I.e. expand the
Is this a serious suggestion? In any case, I'm not too concerned by the (as yet hypothetical) write barriers for now:
|
@glaebhoerl I've been thinking about precision/tracing, and I get the feeling that your suggestion does actually work (I'd been thinking kinda-similar thoughts, but it was late and I got distracted by your mention of stack precision): trait Trace {
fn trace(&self, gc_info: &mut GcInfo);
}
impl<T: Trace> Trace for Gc<T> {
fn trace(&self, gc_info: &mut GcInfo) {
if !gc_info.mark_reachable(self) { // mark_reachable returns true if already traced
self.borrow().trace(gc_info);
}
}
}
impl<T: Trace> Trace for RefCell<T> {
fn trace(&self, gc_info: &mut GcInfo) {
self.value.trace(gc_info);
}
}
impl Trace for int { fn trace(&self, _: &mut GcInfo) {} }
impl<T> Trace for *T { fn trace(&self, _: &mut GcInfo) {} }
// and similarly for the other basic types (I don't think we can/should
// impose any particular tracing semantics on `*`?)
// these are registered as roots for (precise) scanning separately, or have
// been registered to be included in any conservative scans (and have
// proper impls here). (The latter is probably better; see below.)
impl<T> Trace for Uniq<T> { fn trace(&self, _: &mut GcInfo) {} }
impl<T> Trace for Vec<T> { fn trace(&self, _: &mut GcInfo) {} }
impl<T> Trace for Rc<T> { fn trace(&self, _: &mut GcInfo) {} }
// etc.
Then we could have a deriving mode that takes
#[deriving(Trace)]
enum Foo {
X(Gc<int>, Vec<Gc<int>>),
Y(int, int),
Z(int, Gc<Vec<int>>)
}
and generates (after inlining & removing the no-op methods)
impl Trace for Foo {
fn trace(&self, gc_info: &mut GcInfo) {
match *self {
X(a, _) => { a.trace(gc_info) }
Y(_, _) => {}
Z(_, b) => { b.trace(gc_info) }
}
}
} And then We could make Re Also In any case, this seems significantly more feasible than I first thought; I'll experiment. (Also, I'm not sure how this API would work if we were to try to support other tracers, rather than just the one in libstd, I guess |
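The trait-based tracing idea above can be tried out in present-day Rust with a toy heap. This is only a sketch under stated assumptions: `Heap`, `Tracer`, the index-based `Gc` handle and `reachable_from` are hypothetical stand-ins (the thread's `GcInfo` and real `Gc<T>` pointers are not modelled), and the `Trace` impl for `Foo` is written by hand where a deriving mode would generate it.

```rust
use std::collections::HashSet;

pub trait Trace {
    fn trace(&self, tracer: &mut Tracer<'_>);
}

/// Hypothetical handle: an index into the toy heap instead of a real pointer.
#[derive(Clone, Copy)]
pub struct Gc {
    pub idx: usize,
}

pub struct Heap {
    objects: Vec<Box<dyn Trace>>,
}

pub struct Tracer<'h> {
    heap: &'h Heap,
    marked: HashSet<usize>,
}

impl Heap {
    pub fn new() -> Self {
        Heap { objects: Vec::new() }
    }

    pub fn alloc(&mut self, obj: Box<dyn Trace>) -> Gc {
        self.objects.push(obj);
        Gc { idx: self.objects.len() - 1 }
    }

    /// Mark phase only: returns the indices reachable from `roots`.
    pub fn reachable_from(&self, roots: &[Gc]) -> HashSet<usize> {
        let mut tracer = Tracer { heap: self, marked: HashSet::new() };
        for root in roots {
            root.trace(&mut tracer);
        }
        tracer.marked
    }
}

impl Trace for Gc {
    fn trace(&self, tracer: &mut Tracer<'_>) {
        // `insert` returns true only the first time, so cycles terminate.
        if tracer.marked.insert(self.idx) {
            let heap = tracer.heap;
            heap.objects[self.idx].trace(tracer);
        }
    }
}

// Leaf types contain no managed references, so tracing them is a no-op.
impl Trace for i64 {
    fn trace(&self, _: &mut Tracer<'_>) {}
}

pub enum Foo {
    X(Gc, Vec<Gc>),
    Y(i64, i64),
    Z(i64, Gc),
}

// Hand-written equivalent of what a deriving mode might generate.
impl Trace for Foo {
    fn trace(&self, tracer: &mut Tracer<'_>) {
        match self {
            Foo::X(a, rest) => {
                a.trace(tracer);
                for g in rest {
                    g.trace(tracer);
                }
            }
            Foo::Y(_, _) => {}
            Foo::Z(_, b) => b.trace(tracer),
        }
    }
}
```

Unlike the listing above, this sketch traces through the `Vec` field directly rather than treating it as a separately registered root.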
I was thinking Why can't you
Ah right. In my head I was using my earlier-proposed idea to have
What do you mean here exactly? This seems logical to me (e.g.
If we do this I think an |
It would be possible (see the "Re Uniq, and Vec and so on" paragraph of my previous comment). In fact, just thinking about it now, it is necessary. My current implementation is actually incorrect for something like
struct X {
x: Gc<Uniq<RefCell<X>>>
} The
Something different. Smart pointers and generic data structures that can contain GC'd values will need to be able to register themselves with the garbage collector, passing in their tracing info (e.g. a function pointer to something wrapping the
fn run_tracing<T: Trace>(x: *(), gc_info: &mut GcInfo) {
unsafe { (*(x as *T)).trace(gc_info) }
}
impl<T> Uniq<T> {
fn new(x: T) -> Uniq<T> {
// pseudo-Rust
let ptr = malloc(size);
*ptr = x;
// get the appropriately monomorphised version of `run_tracing`
register_with_gc(ptr, run_tracing::<T>);
Uniq { ptr: ptr }
}
}
The problem is the My goal above was to be fancier/work around the type system: e.g. a pointer to a function equivalent to
#[lang="trace_traceable"]
fn trace_traceable<T: Trace>(start: *(), end: *(), gc_info: &mut GcInfo) {
unsafe { (*(start as *T)).trace(gc_info) }
}
#[lang="trace_nontraceable"]
fn trace_nontraceable<T>(start: *(), end: *(), gc_info: &mut GcInfo) {
gc_info.conservative_scan(start, end)
}
where the compiler uses the first where possible (i.e. in the tydescs of types with Trace impls) and the second to make up the slack (i.e. in the tydescs of types without Trace impls). In any case, this tracing stuff seems pretty similar to Another problem I just thought of (which is completely obvious in hindsight): when we are scanning the stack conservatively and see a pointer to a
I'm leaning toward 1 as the solution that has the least impact: it only affects vectors of managed things stored directly on the stack (or in other such vectors); as soon as you're behind, e.g., a smart pointer, you can trace precisely (so |
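The registration scheme described in this comment (an untyped address stored together with a monomorphised `run_tracing::<T>` function pointer) can be sketched in present-day Rust as follows. `Registry`, `Pair` and the `seen` field on `GcInfo` are hypothetical illustration names; the sketch records visited leaf values only so that the effect of the type-erased call is observable.

```rust
/// Hypothetical collector state: records the leaf values that were visited.
pub struct GcInfo {
    pub seen: Vec<i64>,
}

pub trait Trace {
    fn trace(&self, gc_info: &mut GcInfo);
}

impl Trace for i64 {
    fn trace(&self, gc_info: &mut GcInfo) {
        gc_info.seen.push(*self);
    }
}

pub struct Pair(pub i64, pub i64);

impl Trace for Pair {
    fn trace(&self, gc_info: &mut GcInfo) {
        self.0.trace(gc_info);
        self.1.trace(gc_info);
    }
}

/// Type-erased tracing entry point, one instantiation per registered type.
type TraceFn = unsafe fn(*const (), &mut GcInfo);

/// Safety: `x` must point to a live, valid `T`.
unsafe fn run_tracing<T: Trace>(x: *const (), gc_info: &mut GcInfo) {
    unsafe { (*(x as *const T)).trace(gc_info) }
}

pub struct Registry {
    entries: Vec<(*const (), TraceFn)>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { entries: Vec::new() }
    }

    /// Store the address together with the tracer monomorphised for `T`.
    pub fn register<T: Trace>(&mut self, value: &T) {
        let ptr = value as *const T as *const ();
        self.entries.push((ptr, run_tracing::<T> as TraceFn));
    }

    /// Safety: every registered value must still be alive and unmoved.
    pub unsafe fn trace_all(&self, gc_info: &mut GcInfo) {
        for &(ptr, trace_fn) in &self.entries {
            unsafe { trace_fn(ptr, gc_info) };
        }
    }
}
```

The registry never learns the concrete types; all type knowledge lives in the function pointer, which is the property the comment relies on.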
Ultimately, the only solution is to have a fully precise GC, including the stack. There already seems to be support for that in LLVM through the shadow stack plugin (albeit with some performance degradation due to explicit bookkeeping and the need to keep roots on the stack and not in registers), and there is work on something that can work with no performance degradation at https://groups.google.com/forum/#!topic/llvm-dev/5yHl6JMFWqs And in fact the latter appears to be already experimentally available in LLVM 3.4 according to http://www.llvm.org/docs/StackMaps.html#stackmap-section Anyway, as long as trait objects without Send or NonManaged bounds are not used, the Rust compiler has perfect knowledge of whether any value holds Gc or not, so since trait objects are supposed to be rare (especially those without such bounds), even a precise GC with suboptimal performance should not really impact non-GC code much. |
the llvm-dev thread rooted here may also be relevant/of interest: http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066782.html Note in particular the discussion of Bartlett mostly-copying gc
(Update: ah, indeed, the google groups link that bill-myers provided is to a proposal that came after (and in response to) the thread I linked above.) |
Essentially, yes, that was my plan (that, in addition to some way to map from interior pointers to the start of an object). |
@pnkfelix: What about heap objects without managed values? Does rooting borrowed pointers require adding overhead to all heap allocations? |
@thestinger my plan was that ~-allocated objects that could contain references to managed-refs or borrowed-refs would need a header. (I'll need to double check about ~-allocated objects whose references are solely to other ~-allocated objects.) |
@pnkfelix: So won't this require an extra branch in every unique trait object that's not |
I don't understand your question. There will be extra branches in some places, but I don't see what trait objects would have to do with it ... I was planning on storing all such header information at a negative offset from the referenced address, so that the object layout would look the same from the view point of client code accessing the state of a ~-allocated object. |
(or maybe @thestinger is talking about the trait objects that store a |
Do borrowed refs need a header? Doesn't the borrow freeze the original value in-place and hence the GC will trace it through that value? |
The references themselves should not need any header. I think the sentence I wrote was unclear. The point I was trying to make was just that: Under the scheme I was envisaging (a variant of Bartlett), a ~-allocated object of type ~T would need a header, unless rustc can statically prove that any instance of ~T would not need to be traced for the GC. What sort of header would the ~T need? At minimum, just the extent of the object (and you would conservatively scan the ~T in the same way that you would the stack). Of course, we can provide a more precise header (and may want to do so). |
(a clarification: by "need a header", I am really thinking "will need meta-data somewhere tracking extent, and/or type, etc"; a side-table might be a better choice, or even an outright necessity, for some cases. Either way, whether it's a side-table or a header at a negative-offset, this is all stuff just for the Gc, not for the object to use.) |
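As a rough illustration of the side-table option mentioned here, the following present-day Rust sketch tracks only the extent of each allocation and maps an interior pointer back to the start of its object. `SideTable` and its methods are hypothetical names, and addresses are plain integers so the example stays free of unsafe code.

```rust
use std::collections::BTreeMap;

/// Hypothetical GC side table: allocation start address -> length in bytes.
pub struct SideTable {
    extents: BTreeMap<usize, usize>,
}

impl SideTable {
    pub fn new() -> Self {
        SideTable { extents: BTreeMap::new() }
    }

    pub fn register(&mut self, start: usize, len: usize) {
        self.extents.insert(start, len);
    }

    pub fn unregister(&mut self, start: usize) {
        self.extents.remove(&start);
    }

    /// Map a possibly-interior address to the (start, len) of its object.
    pub fn find(&self, addr: usize) -> Option<(usize, usize)> {
        // The candidate is the allocation with the greatest start <= addr.
        let (&start, &len) = self.extents.range(..=addr).next_back()?;
        if addr < start + len {
            Some((start, len))
        } else {
            None
        }
    }
}
```

A header at a negative offset would answer the same question once the object start is known; the table additionally answers it for interior pointers found during a conservative scan.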
I've pushed two commits that turn this into a tracing garbage collector. (Conservative on the stack, precise on the heap.) Unfortunately, it has some drawbacks, like forcing
|
On Wed, Jan 08, 2014 at 07:32:41AM -0800, Huon Wilson wrote:
When we last discussed this, we did plan to have |
@huonw I've been thinking about this paradox:
Can you pinpoint where the contradiction is? I get the feeling that I'm using circular logic in there somewhere. Maybe there should be two separate (Is this the same thing as the The key to unlocking this might also be in the particulars of how 2. happens, which isn't clear to me either. (I also had an idea w.r.t a primitive method for precisely scanning the stack that doesn't rely on LLVM (but does require compiler support) that I edited into the end of one of my previous comments, did you catch it?) |
@glaebhoerl I think I've actually been thinking a similar thing to you (and have finally worked out that I'm on approximately the same page as you). If
Even symbolically like that, it's not exactly obvious to me where that breaks down.
I think the two traits you mention are basically equivalent to
I didn't catch it (but I have now). I'd guess that something like that is the simplest way to do it, assuming that stops LLVM from optimising out the references. (As @nikomatsakis implies, the Mozilla employees (who've more experience in the GC space than I anyway, and more experience hacking on the compiler) have been thinking about this on and off for a while, so I assume they may have solutions for many of these problems.)
The API currently exposed here is designed to have a write barrier on (Although, if they were privileged, |
@pnkfelix and I had some discussion on IRC about the fact that the current collection policy (i.e. deciding when to run a collection; at the time of writing, it is just after some fixed number of allocations) is very subpar, causing quadratic behaviour: the number should adjust to the heap size. (Noting this here for posterity, and so I don't forget to fix it.) |
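The quadratic behaviour can be seen with a small model in present-day Rust. This is a sketch, not the policy used by the PR: `total_trace_work` is a hypothetical helper that assumes every allocated object stays live and that each collection traces all live objects.

```rust
/// Total number of objects traced while allocating `n` objects that all stay
/// live. `next_threshold(live)` returns the live-object count at which the
/// next collection should run.
pub fn total_trace_work(n: usize, next_threshold: impl Fn(usize) -> usize) -> usize {
    let mut live = 0usize;
    let mut work = 0usize;
    let mut threshold = next_threshold(0);
    for _ in 0..n {
        live += 1;
        if live >= threshold {
            // A collection has to trace every live object.
            work += live;
            threshold = next_threshold(live);
        }
    }
    work
}

/// Collect after every 100 allocations, regardless of heap size.
pub fn fixed_interval(live: usize) -> usize {
    live + 100
}

/// Collect when the heap has doubled since the last collection.
pub fn proportional(live: usize) -> usize {
    (2 * live).max(100)
}
```

With the fixed interval the total work grows with the square of the number of allocations, while the heap-proportional threshold keeps it linear (bounded by roughly twice the final heap size in this model).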
Me too, but can you think of a better plan? The one I described is the only one I managed to arrive at which actually holds together. I don't see any promising leads for a potentially better one, but that doesn't mean there isn't one. W.r.t. a copying/moving collector (it's good to see we've been thinking about the same things again!), the thought I keep returning to is that the garbage collector would have to be made to "obey the borrow checker", i.e. to not move things around if it would violate the contracts of any outstanding loans. I have very little idea about how (or whether) this could happen, though. (There are potentially two parts to it: one is moving an object in the heap and leaving an indirection to it in its former location. This might be handled by changing the representation of a Gc box (to an enum of indirection or value etc.), and potentially storing bit flags like Cell to know if there are any loans. The harder-seeming part is cleaning up the indirection and rewriting references to point to the new object... for this you would have to know that there aren't any loans of the Gc references themselves (as opposed to their contents).)
Right... my thought was that the parts which relate to the properties and representation of types would all be handled by the given type's trace glue (i.e. instead of storing data that "this type is enum, it has a discriminant of size S, and members of type either A, B, and C or X, Y, and Z" for the GC to interpret, the |
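The "enum of indirection or value" representation floated above can be sketched in present-day Rust like this. `Slot`, `Heap`, `relocate` and `get` are hypothetical names, slots are indices rather than addresses, and the hard part identified in the comment (rewriting references and cleaning up the indirection once no loans remain) is deliberately left out.

```rust
/// Hypothetical Gc box representation: the value itself, or a forwarding
/// indirection left behind after the value was moved.
pub enum Slot<T> {
    Value(T),
    Forwarded(usize),
}

pub struct Heap<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Heap<T> {
    pub fn new() -> Self {
        Heap { slots: Vec::new() }
    }

    pub fn alloc(&mut self, value: T) -> usize {
        self.slots.push(Slot::Value(value));
        self.slots.len() - 1
    }

    /// Move the contents of `idx` to a fresh slot and leave an indirection.
    pub fn relocate(&mut self, idx: usize) -> usize {
        let new_idx = self.slots.len();
        let old = std::mem::replace(&mut self.slots[idx], Slot::Forwarded(new_idx));
        self.slots.push(old);
        new_idx
    }

    /// Follow indirections until a value is found, so stale handles still work.
    pub fn get(&self, mut idx: usize) -> &T {
        loop {
            match &self.slots[idx] {
                Slot::Value(v) => return v,
                Slot::Forwarded(next) => idx = *next,
            }
        }
    }
}
```

Every access through an old handle pays for the extra hop, which is one reason a collector would eventually want to rewrite references to point at the new location.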
At the very least, by tracing through Another option is to actually rewrite
I assume you're talking about
The only other one I can think of that I think has any hope of working is the one that is similar to
extern "rust-intrinsic" {
/// Trace the value for the GC, calling a `Trace` impl and[/or]
/// the auto-generated tracing glue.
fn trace<T>(value: T, tracer: &mut Tracer);
}
fn foo(t: &mut Tracer) {
// ok
trace(0i, t);
trace(Gc::new(Gc::new(1)), t);
// error: trace called on a raw pointer that contains a managed type
trace(0 as *Gc<int>, t);
}
But, as I said above, this will have similar errors to transmute, where generic instantiations of functions from other crates can cause errors (similar to C++, too). |
This is kind of what I was thinking. I don't suppose there's any way that the compiler could relay the static information that it has about borrows/loans to the garbage collector, and that this information could be sufficient? (I have a hard time wrapping my head around this, but my suspicion is no.)
Right. Any time the garbage collector disregards the borrow checker, I think you need to have a very strong reason why it's safe for it to do that, when it's not safe for other code to do so.
Yes, or more likely a borrow of some larger structure with All this talk of rewriting pointers and references where it's not expected has me wondering whether we couldn't, or even shouldn't, adopt SpiderMonkey's solution involving double indirection. After all, they're facing very similar constraints: a managed heap hosted in a non-managed environment. Their rooting API also seems to be very similar to the suggestion I had for how to do precise tracing of the stack. So essentially in this scheme our
Just to see if I'm understanding you correctly: is the idea that the transmute-like error would be generated if a type contains a raw pointer, which the automatically generated trace glue would attempt to trace? And that to resolve it you'd have to write your own In the context of my earlier plan, i.e. not the afore-discussed transmute-like one, what about this solution: a lint which warns (or errors) if a type contains a raw pointer potentially referencing managed data, UNLESS any of the following are true:
For a type like This would still be best-effort (of course, I don't think you can make any guarantees around raw pointers), but it seems less error-prone than the prior plan. |
@glaebhoerl @huonw regarding this issue: "W.r.t. a copying/moving collector (it's good to see we've been thinking about the same things again!), the thought I keep returning to is that the garbage collector would have to be made to "obey the borrow checker", i.e. to not move things around if it would violate the contracts of any outstanding loans." In the design I have been envisaging, this would fall semi-naturally out of a Bartlett style mostly-copying GC, with a few caveats. Note that currently in
I do not know if this answers your question about how to integrate with the borrow checker, but it was enough to convince me that we could consider doing a moving collector that even allowed for (some) (The reason that I want to ensure that we handle updating the |
I didn't address native |
At the very least, we have to convey the actual value of the pointers that are borrowed so the GC knows which they are. (I imagine some sort of static information may be possible with precision everywhere... not really sure.)
Yes, that's the idea, but I think we could still run the generated trace glue with a manual Trace impl: basically
struct NoTraceImpl<T>(*T);
struct HasATraceImpl<T>(T);
impl<T> Trace for HasATraceImpl<T> { ... }
// errors:
trace(raw_ptr, ...)
trace(NoTraceImpl(raw_ptr), ...);
trace(~NoTraceImpl(raw_ptr), ...); // `~` calls `trace` on its interior
// ok
trace(HasATraceImpl(NoTraceImpl(raw_ptr)), ...);
trace(~HasATraceImpl(NoTraceImpl(raw_ptr)), ...);
trace(HasATraceImpl(~NoTraceImpl(raw_ptr)), ...);
To be clear, in my mind, all this work with raw pointers is just designed to make it harder to do the wrong thing; it's not particularly fundamental to the semantics of the GC, it's just something that users should avoid making mistakes on, and it'd be nice if To this end, a lint makes sense, but I think it could be rather difficult to do properly, since you may have the tracer for a raw pointer on a different type, i.e.:
struct Inner {
ptr: *X
}
pub struct Outer {
priv inner: Complicated<Wrappers<Inner>>
}
impl Trace for Outer { ... }
Of course, peculiar wrappers like this may just be where we say "you're doing weird things, and the compiler can only be so smart; you're on your own, be careful".
(FWIW, the code in this PR currently does this.) Does this include conservatively scanning Also, on |
I might be misunderstanding something: wouldn't the compiler generate trace glue for
We're on the same page.
In this example, the compiler would complain about (That is, my thinking was that it wouldn't try to "look inside" anything, and would raise the error at the type which actually contains the What I'm not so clear about is how something like
The compiler's information about borrows is per function, since of course function bodies are typechecked separately. So to again start from the least sophisticated solution that could possibly work, what the compiler could do is insert calls to What's not clear to me yet is the case where the function takes out a loan on some larger structure, but the interior of that structure (which is thereby also borrowed) is also reachable through other means. First of all, is this possible? (Does the borrow checker allow it? Does the compiler have some notion of active vs. inactive pointers which the GC doesn't (yet)?) Second of all, if it's possible, could the GC detect it in a reasonable way? And of course the other big question is what the GC would actually do with all this information about active borrows that we'd be giving it.
|
No, that's the "magic". There are two concepts under that plan: the actual trace glue and the calling of the trace glue, (normally) done via some intrinsic (I called it Semantically, creating the trace glue for types without a As an example, say we have
impl<T> Trace for Uniq<T> {
fn trace(&self, t: &mut Tracer) {
unsafe { trace_checked(&*self.ptr, t) }
}
}
i.e. just tracing its contents. Then the trace glue for
// explicit Trace impl:
call trace_checked::<T> // contents
// autogenerated:
call trace_glue::<*mut T> // noop
For
// explicit Trace impl:
call trace_checked::<*T> // error: calling trace_checked on *T
// autogenerated:
call trace_glue::<*mut *T> // noop
As a different example, say we have
// explicit Trace impl: <no impl>
// autogenerated:
call trace_checked::<Uniq<int>> // code above
call trace_checked::<*int> // error: calling trace_checked on *T
The difference is the call of [1]: I don't know how this would interact with trait objects. I guess the trace glue in a trait object would be created as if it were a call to
My point is that is a case when it shouldn't complain about Inner, because
Maybe something like Also, re spidermonkey-style double-indirection, I'd thought about it briefly, but didn't really pursue it at all, since we actually have control of the compiler & language and the ability to put in the appropriate hooks to make that unnecessary, in theory. (I'd assumed that avoiding two layers of pointers is a Good Thing, without considering the benefits of such an approach in depth.) |
This is a game of trade-offs. The word "possible" is very broad. I do not doubt that it is "possible" to implement a completely precise GC for Rust. But that does not imply that it would be the right engineering decision for the short term (and perhaps not even for the long term). Largely I keep pushing for a Mostly-Copying Collector due to ease of implementation: to get a tracing GC into Rust ASAP, I do not want to spend more time than necessary wrestling with LLVM. I am under the impression that LLVM's API for precise stack scanning may be in flux in the near term, so that's a distraction, and I do not want to spend time finding out what the performance overheads are of its current API. I'm not saying that I would immediately veto a fully-precise collector. I just think it would be a mistake to build full-precision in as an up-front requirement for the GC. Anyway, there are other reasons that a Mostly-Copying GC could be preferable. I continue to stress: this is all about trade-offs:
I think my mindset coincides with Filip Pizlo's on this matter; see the LLVMdev emails I linked in another comment above. In particular this one: "I do not mean to imply that accurate GC shouldn't be considered at all. It's an interesting compiler feature." I'll repeat a few links that I found again while double-checking my work writing this comment: |
@pnkfelix Thanks for the pointers, I've finally borrowed some time to start following them. Below are my notes. I did not give them very thorough readings, but only attempted to understand the most important aspects of each.
Bartlett's 1989 paper explains mostly-copying GC and how to extend it to a generational collector. The key idea is that besides the area of memory they reside in, whether a page is in old-space or new-space can also be tracked by a field stored in the page. The stack is scanned conservatively, and pages pointed into are "moved" to new-space by updating their space fields. In this way the address of objects which may-or-may-not be referenced from the stack is unchanged. After this, objects referenced from these pages are physically copied into new-space. (This requires those objects to be self-identifying, e.g. to have headers.)
Appel's paper is about GC in Pascal and ML without headers or tags on objects. This is accomplished by the compiler generating tables containing type information (layout etc) for the garbage collector, which traverses these tables in parallel with the traced objects to keep track of types. The type of roots is determined by associating this information with the functions containing them, which is looked up by the GC based on the return-address pointer, and in the case of polymorphic functions, walking up the stack until the function where the type variables are instantiated is found. It also has a section on how to do breadth-first copying collection with this scheme.
Goldberg 91 seems closer to what I was thinking about: here garbage collection routines are generated by the compiler for each type. He takes it a step further, and borrowing Appel's idea of associating GC information with functions, also generates GC routines per function to trace its stack variables (and in fact a different routine for each point in the function where different variables are live). There is also analysis to avoid doing this when a function doesn't allocate and therefore cannot trigger collection. Tracing polymorphic values (here all polymorphism is with boxing) is accomplished by parameterizing the tracing routine for each polymorphic function and type over the tracing routines of their type variables, which during GC are passed in by the tracing routine for the containing type/stack frame. There is also a section on extending this to parallel programs which I skipped.
Goldberg 92 is about how to extend tagless GC to copying incremental GC. Read about halfway, seems quite interesting, might be worth returning to later. (Also the pages are in reverse order, which is kinda weird.)
Goldberg/Gloger 92 realize that tracing vptrs must also be stored for closures. For polymorphic closures there seems to be a problem, because there is no way to go back to the point where the closure was created to find out what types it is closing over. They then realize that by parametricity, if a closure is polymorphic over some types, then its code cannot follow pointers to those types, therefore doing this is actually unnecessary: any values whose type the GC cannot establish can only be garbage.
Huang et al 04 appears to be about using JIT technology to help the Java GC optimize locality based on the dynamic execution of the program. I'm not sure which part of this is relevant here?
Cheadle et al 04 is about using code specialization to reduce the heap usage of their incremental GHC/Haskell garbage collector. As Haskell is lazily evaluated, heap objects in GHC contain a pointer to "entry code" which, if the object is an unevaluated thunk, points to code which evaluates it, stores the result, and then overwrites the entry code pointer with a different one which merely returns the stored result. Objects are accessed by the mutator "entering" them in this way. As this is already a read barrier, they take advantage of the same mechanism for incremental GC to scavenge the object (copy its referents from old-space to new-space) when the mutator "enters" it. In their earlier version, this necessitated storing an additional pointer to the original non-scavenging entry code in each heap object, so that after scavenging the normal entry code can be used. In this paper they remove the need for this additional pointer by instead generating specialized scavenging and non-scavenging versions of each function, and making the entry code pointer point to the appropriate one. Except for perhaps their measurements I'm not sure how relevant all of this is to us. They do observe that closures are just trait objects:
I haven't looked at the LLVM links yet, I'll have to get back to those later. |
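The page-promotion step from the notes on Bartlett's paper can be illustrated with a small present-day Rust model. It is a sketch under simplifying assumptions: `PagedHeap`, `Space` and `promote_from_stack` are hypothetical names, pages are a fixed 4096 bytes, the "stack" is a slice of integers, and the later phase that copies objects referenced from promoted pages is not modelled.

```rust
pub const PAGE_SIZE: usize = 4096;

#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Space {
    Old,
    New,
}

/// Hypothetical heap: a contiguous run of pages, each tagged with its space.
pub struct PagedHeap {
    pub base: usize,
    pub spaces: Vec<Space>,
}

impl PagedHeap {
    pub fn new(base: usize, pages: usize) -> Self {
        PagedHeap { base, spaces: vec![Space::Old; pages] }
    }

    /// Index of the page containing `word`, if it points into the heap at all.
    pub fn page_of(&self, word: usize) -> Option<usize> {
        let end = self.base + self.spaces.len() * PAGE_SIZE;
        if word >= self.base && word < end {
            Some((word - self.base) / PAGE_SIZE)
        } else {
            None
        }
    }

    /// Conservative stack scan: any word that might be a heap pointer promotes
    /// its whole page by flipping the space tag, so nothing on it moves.
    pub fn promote_from_stack(&mut self, stack_words: &[usize]) -> Vec<usize> {
        let mut promoted = Vec::new();
        for &word in stack_words {
            if let Some(page) = self.page_of(word) {
                if self.spaces[page] == Space::Old {
                    self.spaces[page] = Space::New;
                    promoted.push(page);
                }
            }
        }
        promoted
    }
}
```

Because promotion only changes a tag, an integer on the stack that merely looks like a pointer costs some retained memory but never corrupts anything.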
@pnkfelix most of your concerns w.r.t. precise tracing appear to be about tracing the stack. I'll re-suggest my earlier suggestion here: the compiler could generate code in each function to register and deregister those stack variables as roots (in the form of their address plus the trace glue vptr corresponding to their type) which may possibly contain references to managed data. While this may not be optimal for mutator performance when using GC, it also avoids relying on LLVM and imposes no cost on code which is known not to require GC. I also had a similar idea for identifying (and presumably pinning?) objects which have been borrowed. In both cases, there is also the advantage that besides the compiler generating these calls automatically for safe code, they would also be available for unsafe code to invoke manually where appropriate. The reason I was wondering what advantages semi-precise GC has (in other words, what the other side of the tradeoffs is) is that one of our main objectives, which is not putting a burden on code which doesn't use GC, seems (to me) to be easier to achieve with precise collection. (Which is not so surprising if you consider that precise collection involves knowing what not to trace.) @huonw Gah, I still need to process your last comment as well. :) I'll get around to it. |
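The register/deregister suggestion can be imitated by hand in present-day Rust with a guard object, which is roughly what compiler-inserted calls would amount to. This is a sketch only: `RootGuard`, `ROOTS` and `registered_roots` are hypothetical names, and the trace glue pointer that the comment says would accompany each address is omitted for brevity.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

thread_local! {
    // Hypothetical per-task root list: addresses of stack slots that may
    // hold references to managed data.
    static ROOTS: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

/// Registers a stack slot on creation and deregisters it when dropped.
/// The lifetime ties the guard to the slot so the slot cannot die first.
pub struct RootGuard<'a> {
    _slot: PhantomData<&'a ()>,
}

impl<'a> RootGuard<'a> {
    pub fn new<T>(slot: &'a T) -> RootGuard<'a> {
        let addr = slot as *const T as usize;
        ROOTS.with(|roots| roots.borrow_mut().push(addr));
        RootGuard { _slot: PhantomData }
    }
}

impl<'a> Drop for RootGuard<'a> {
    fn drop(&mut self) {
        // Guards are dropped in reverse creation order, so the list
        // behaves as a stack and a plain pop is sufficient.
        ROOTS.with(|roots| {
            roots.borrow_mut().pop();
        });
    }
}

/// Snapshot of the currently registered roots, as a collector would see them.
pub fn registered_roots() -> Vec<usize> {
    ROOTS.with(|roots| roots.borrow().clone())
}
```

Code that never creates a guard never touches the list, which matches the stated goal of imposing no cost on code that does not use the GC.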
@glaebhoerl you said "most of your concerns w.r.t. precise tracing appear to be about tracing the stack": (update: I was being a bit flip in the previous comment. I do realize you've been talking about trying to do a heavily type-driven tracing gc, which would imply that you really do need precise knowledge of the types of one's registers and stack slots in order to properly drive the tracing itself, at least if one wants to do it without any headers, and that would be impossible in a conservative stack scanning setting. I just am not yet convinced that this is a realistic option for us. Maybe I am too pessimistic.) There is plenty of precedent for having the compiler emit code to automatically maintain a root set (or a shadow stack of roots, register/deregister, etc.) I think the LLVM-dev email thread I linked to earlier has some pointers to related work here. As you said, it's not a technique that's known to be terribly performant. I'll have to go back through your comments on this thread, as I feel like I must have misunderstood something in the line of discussion here. |
You can have optimal performance by generating tables that tell you, for every instruction that can potentially trigger GC (i.e. all function calls), which registers or locations on the stack contain GC roots, and what their type is. When GC is triggered you use unwind tables to find the IPs in all functions on the call stack, look up the GC tables for all of them, and trace them. You can probably do that now on LLVM with http://www.llvm.org/docs/StackMaps.html I think that ultimately this is the only option for Rust that would be acceptable for "production": conservative scanning is conceptually really ugly, cannot scan datatypes that hide pointers but have a custom Trace implementation, can incorrectly keep dead objects alive forever, adds the overhead of having to annotate all allocations with type information and adds the overhead of a data structure that allows looking up whether a value is a pointer to an object. |
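The table-driven lookup described in this comment can be modelled in present-day Rust as below. This is a toy under stated assumptions, not LLVM's stack map format: `StackMaps`, `Frame` and `root_slots` are hypothetical, frames are supplied by the caller instead of being recovered from unwind tables, and only stack-slot roots (as offsets from a frame base) are handled, not registers or type information.

```rust
use std::collections::HashMap;

/// One activation on the call stack, as an unwinder might report it.
pub struct Frame {
    pub return_addr: usize,
    pub frame_base: usize,
}

/// Hypothetical compiler-generated table: for each call site (keyed by its
/// return address), the frame-relative offsets of slots holding GC roots.
pub struct StackMaps {
    roots_at: HashMap<usize, Vec<isize>>,
}

impl StackMaps {
    pub fn new() -> Self {
        StackMaps { roots_at: HashMap::new() }
    }

    pub fn record(&mut self, return_addr: usize, root_offsets: Vec<isize>) {
        self.roots_at.insert(return_addr, root_offsets);
    }

    /// Addresses of all root slots for the given call stack. Frames with no
    /// table entry (e.g. code known not to use the GC) contribute nothing.
    pub fn root_slots(&self, frames: &[Frame]) -> Vec<usize> {
        let mut slots = Vec::new();
        for frame in frames {
            if let Some(offsets) = self.roots_at.get(&frame.return_addr) {
                for &off in offsets {
                    slots.push((frame.frame_base as isize + off) as usize);
                }
            }
        }
        slots
    }
}
```

The mutator does no bookkeeping at run time in this scheme; all the cost is paid in table size and in the lookup performed when a collection actually happens.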
@bill-myers you've said basically the same thing in your earlier comment. Out of curiosity:
In my opinion we need precise heap scanning but can live with conservative stack scanning for 1.0. I agree that the other drawbacks with conservative scanning you pointed out are present, at least partially (I may quibble about details), but I disagree about their impact. I do not think mostly-copying is conceptually ugly, and I stand by the points about the trade-offs here that I made earlier. Looking over the LLVM stack map API, it says outright in the motivation section that the functionality is currently experimental. The only linked client of the API is FTL JIT, which is also experimental and disabled by default. So I stand especially by this comment I made earlier:
|
I realized something while reflecting on this dialogue. I think many of the participants here are focused on using precise stack scanning to enable type-driven heap tracing from the outset in the roots+stack. Notably, people haven't been discussing whether these tracing procedures are using a mark-sweep style GC or a copying GC, which makes sense, because that is an orthogonal decision once you assume that you are going to have 100% precise tracing on the roots and stack. Meanwhile, my focus has been on how to get a GC into Rust that makes use of techniques such as copying collection. I do not want to build in a mark-sweep GC as the only GC we deploy in 1.0, because I worry that end-users will then build the assumption that objects do not move into their own code, and we'll never get a relocating collector into the runtime once libraries deployed with that assumption become popular. (And Rust may not need a relocating collector; but since it might, I would prefer to start with one and then see whether it fails to pay for itself.) My recent realization is that these two ends: type-driven GC and a mostly-copying style GC, may not be at odds with one another. Assuming that we have precise type information for at least one GC reference stored on the stack (or in a ~[T] solely reachable from the stack), via a stack map or what-have-you: then that may be enough info to drive the GC in the manner that @glaebhoerl wants, while still allowing us to use a Bartlett style system that still pins all the objects immediately reachable from the stack (which I conservatively assume to be borrowed and have outstanding borrowed references) but allows relocation of the other heap-allocated objects. There would be no need, I think, to worry about borrowed-references to moving objects, which has been my primary motivation for focusing on mostly-copying GC. I admit: the above is just conjecture, I haven't thought it through completely. 
It may not address all of the drawbacks that @bill-myers pointed out. I guess my point is, I may have been mistakenly conflating "precise stack scanning" with "fully-moving GC", and the latter I have been treating as "too risky for 1.0." But I would be happy to adopt type-driven tracing in combination with mostly-copying GC. |
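To make the pinning rule concrete, here is a small sketch under simplifying assumptions: objects are plain indices into a toy heap, and `Heap`, `Plan` and `plan_collection` are hypothetical names rather than anything from the PR. Objects referenced directly from the stack are pinned, while everything reachable only through other heap objects is free to be relocated.

```rust
use std::collections::HashSet;

/// Toy heap: objects are indices, and `edges[i]` lists what object `i` points to.
struct Heap {
    edges: Vec<Vec<usize>>,
}

/// Outcome of planning one mostly-copying collection.
#[derive(Debug, PartialEq, Eq)]
struct Plan {
    /// Directly referenced from the stack: must stay where they are.
    pinned: Vec<usize>,
    /// Live, but reachable only through the heap: may be relocated.
    movable: Vec<usize>,
}

/// `stack_roots` must hold valid object indices (an assumption of this sketch).
fn plan_collection(heap: &Heap, stack_roots: &[usize]) -> Plan {
    let pinned_set: HashSet<usize> = stack_roots.iter().copied().collect();

    // Plain worklist reachability from the stack roots.
    let mut live = HashSet::new();
    let mut worklist: Vec<usize> = stack_roots.to_vec();
    while let Some(obj) = worklist.pop() {
        if live.insert(obj) {
            worklist.extend(heap.edges[obj].iter().copied());
        }
    }

    let mut movable: Vec<usize> = live
        .iter()
        .copied()
        .filter(|obj| !pinned_set.contains(obj))
        .collect();
    let mut pinned: Vec<usize> = pinned_set.into_iter().collect();
    pinned.sort_unstable();
    movable.sort_unstable();
    Plan { pinned, movable }
}
```

Anything that appears in neither list is unreachable and can simply be reclaimed.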
@huonw I think I finally understand the
In total you'd need three attributes, with the hard part being figuring out ergonomic bikeshed colors:
The latter two are rather different, but capturing that distinction in their names is another matter.
A good example might actually be the reverse: consider The restrictions on (a) seem like they could be satisfied with a I think my attitude is basically that it would be awfully nice to have the infrastructure for fully precise, type-based, Rustic tracing right from the start, in a way that minimizes (ideally all the way to 0) the impact on non-GC code, while the performance of the GC itself and GC-using code can be improved later as long as the semantics and programmer-facing parts of it are there. Hence why the compiler-inserted
Given that Rust's semantics guarantee memory safety, this should only be an issue for Also questions:
(I don't really mind what method is used, as long as it's compatible with the mentioned goals, so if "borrowing is pinning" is implemented by conservatively scanning the stack that's fine by me. I'm just trying to understand how it would work.)
Could you expand on this point? I'm not sure I grok it.
Is this because of having to rely on uncertain-at-best LLVM support, or some other reason? @bill-myers do you have any thoughts about the generated stack tracing code approach from Goldberg's 1991 paper? (This is kind of like treating a stack frame as a big struct and generating code to trace that type (not unlike how stack-closures-as-trait-objects can be interpreted), with the additional complication that different variables are live at different points. Of course that complication exists with stack maps as well.) |
I'll come back to it, but:
It's possible that |
Closing to clear up the queue. |
(I don't think this should impede us in continuing the discussion?) @huonw I noticed that of the "three attributes" from above, the second can be adequately expressed by writing an explicit no-op |
(No, it certainly doesn't) On Mon, Feb 3, 2014 at 2:58 AM, Gábor Lehel [email protected]:
|
Lints can examine attributes etc. on things, so it would be possible to have a
I'm probably misunderstanding (I'm interpreting that as "change the value that the
As long as we can guarantee we rewrite all references (and only rewrite actual references) as we move things, it seems reasonable for this to always be possible... although this means everything can change under the feet of a function (since the values can change due to some far removed (If so, it would have to apply to all
I think the |
Some GCs, I think generational collectors, do a thing where moves happen in two stages: in minor collections moved objects are rewritten with forwarding pointers (indirections) to their new location, and in major collections these are cleaned up by actually making all references point to the new location (after which the indirection becomes garbage). A simplistic formulation in Rust might be something like:
Clearly the contents can't be overwritten with an indirection as long as the object itself is borrowed.
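One way to picture the two-stage scheme is the hypothetical sketch below. It is an illustration written against a toy index-based heap; `Slot`, `Heap`, `relocate` and `major_collect` are invented names, and borrowing is not modelled at all. A minor step moves the value and leaves an indirection behind, and a major step rewrites references and reclaims the indirections.

```rust
/// A heap slot either holds a value, forwards to another slot, or is free.
#[derive(Debug, PartialEq)]
enum Slot<T> {
    Value(T),
    Forwarded(usize),
    Free,
}

struct Heap<T> {
    slots: Vec<Slot<T>>,
}

impl<T> Heap<T> {
    fn new() -> Self {
        Heap { slots: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> usize {
        self.slots.push(Slot::Value(value));
        self.slots.len() - 1
    }

    /// "Minor" step: move the value to a fresh slot and leave an indirection.
    fn relocate(&mut self, index: usize) -> usize {
        match std::mem::replace(&mut self.slots[index], Slot::Free) {
            Slot::Value(value) => {
                self.slots.push(Slot::Value(value));
                let new_index = self.slots.len() - 1;
                self.slots[index] = Slot::Forwarded(new_index);
                new_index
            }
            // Already forwarded or free: put it back and leave it alone.
            other => {
                self.slots[index] = other;
                index
            }
        }
    }

    /// Follow indirections until a non-forwarding slot is reached.
    fn resolve(&self, mut index: usize) -> usize {
        while let Slot::Forwarded(next) = &self.slots[index] {
            index = *next;
        }
        index
    }

    /// "Major" step: rewrite every reference, then reclaim the indirections.
    fn major_collect(&mut self, references: &mut [usize]) {
        for reference in references.iter_mut() {
            *reference = self.resolve(*reference);
        }
        for slot in self.slots.iter_mut() {
            if matches!(slot, Slot::Forwarded(_)) {
                *slot = Slot::Free;
            }
        }
    }

    fn get(&self, index: usize) -> Option<&T> {
        match &self.slots[self.resolve(index)] {
            Slot::Value(value) => Some(value),
            _ => None,
        }
    }
}
```

Note that every read has to go through `resolve`, which is exactly the kind of indirection check a direct `&T` into the object cannot perform.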
I don't have specific arguments for this position yet (though perhaps you have just stated one), but my very strong feeling is that borrowed references should not be rewritten. It should work in the opposite direction: an
In this particular case yes, this is true. But in the case of |
@glaebhoerl First off, I'm inclined to agree that a I think this property (that I do not yet know how to handle doing the unpin exactly when the borrow expires, which is one reason strategy 2 worries me, although maybe we can do something that just keeps things pinned for longer than strictly necessary, I am not yet sure. Second, just an aside / FYI / heads-up: I think you are conflating generational collectors with replicating incremental collectors. A relocating generational collector will move a subset of the objects (namely the live ones in the nursery) during a minor collection, but in my experience it is also responsible for updating all references to the moved objects before yielding control back to the mutator. (This is one purpose of a remembered set: to provide the collector with a narrowed subset of the fields that need to be updated when the objects in the nursery are moved, so that hopefully one will do less work than a scan of the entire heap in order to maintain this invariant.) In an incremental copying (i.e. relocating) collector, objects can be copied and then there may be two copies in existence while the mutator is allowed to run. Supporting a system where an object can be forwarded by the GC and then control returns to the mutator before all of the outstanding references have been updated in this strategy is often accomplished via a read-barrier (which is a non-starter for us IMO), although I think alternative schemes that still rely solely on write-barriers have been devised such as the one from Cheng and Blelloch (which I think is a variant of Nettles and O'Toole; and here's another contemporary paper by Blelloch and Cheng, I cannot recall offhand which one is most relevant here; in any case the write-barriers in these cases are far more expensive than the ones one typically sees in a normal generational collector.) (my apologies for not providing links that are not behind a paywall.
If other people take the time to track down variant links to free pre-prints of those papers, please feel free to add those links to the end of this comment, but please leave the acm links above alone to aid with finding cross-references.) |
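As an illustration of the remembered-set idea mentioned above, here is a minimal sketch with hypothetical names (`Heap`, `Object`, `write_field`, `fields_to_fix_up`) on a toy index-based heap; it is not code from this PR. The write barrier records only old-to-nursery fields, so a minor collection can fix up just those fields instead of scanning the whole heap.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Generation {
    Nursery,
    Old,
}

struct Object {
    generation: Generation,
    /// Each field optionally points at another object (by index).
    fields: Vec<Option<usize>>,
}

struct Heap {
    objects: Vec<Object>,
    /// (owner, field) pairs where an old object points into the nursery.
    remembered: BTreeSet<(usize, usize)>,
}

impl Heap {
    fn new() -> Self {
        Heap { objects: Vec::new(), remembered: BTreeSet::new() }
    }

    fn alloc(&mut self, generation: Generation, field_count: usize) -> usize {
        self.objects.push(Object { generation, fields: vec![None; field_count] });
        self.objects.len() - 1
    }

    /// Write barrier: every pointer store goes through here.
    fn write_field(&mut self, owner: usize, field: usize, target: Option<usize>) {
        self.objects[owner].fields[field] = target;
        let owner_is_old = self.objects[owner].generation == Generation::Old;
        let target_is_young = target
            .map_or(false, |t| self.objects[t].generation == Generation::Nursery);
        if owner_is_old && target_is_young {
            self.remembered.insert((owner, field));
        } else {
            // The field no longer holds an old-to-nursery pointer.
            self.remembered.remove(&(owner, field));
        }
    }

    /// The only fields a minor collection must update after moving nursery objects.
    fn fields_to_fix_up(&self) -> Vec<(usize, usize)> {
        self.remembered.iter().copied().collect()
    }
}
```

Real barriers are usually coarser (card marking, or a log of written objects), but the invariant they maintain is the same one shown here.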
@pnkfelix I was thinking in particular of GHC's collector, which does do the thing where indirections are cleaned up by the GC, but indeed, my hazarded guess that it has to do with generational collection was wrong: in fact the indirections are created when thunks are evaluated. (I had a surprising amount of trouble finding a good reference for this which does more than just mention it in passing, but see e.g. the "Graph Reduction: Thunks & Updates" section here.) So mea culpa, but it doesn't end up making much difference: if the object is moved by rewriting all references, the old location will become garbage, so we still can't allow it while borrowed references exist (as @huonw noted). (Contrast to a borrow not of the managed object, but of a
My thoughts here are still the same: I can see how this would straightforwardly handle most cases, but we need to handle all of them, and it's not obvious to me what the story for the remainder is. In particular I'm still concerned about two things:
Something like that. In the tradition of starting with a simple solution which is obviously correct, consider:
(I now regret using the word "obviously": it's not obvious to me that I got all of that right.) But in any case, the basic idea that just as with
Could you elaborate on what kind of mechanisms you were thinking about here? @huonw, re: interior references, I think my concerns can be distilled down to the fact that the borrow checker doesn't let you do it. Any time you take out a loan on the interior of an object, whether owned box, tuple, or |
@glaebhoerl I'll address your questions in reverse order
My assumption is either (1.) we require the owning pointer to the managed object to be kept alive on the stack for as long as it has outstanding I have more hands-on experience with (2) than (1), but either should be workable.
Let me see if I understand this scenario correctly. Any kind of value, including the smart-pointers holding the The main worrisome case that I can see here is But at that point we must be in the realm of user-defined tracers, no? (Or user-annotations on the types that yield tracers, etc; this is a topic I've been sidestepping since I wanted to get the basics working first.) Or are you talking about the pointer to malloc'ed memory then itself being transmuted to a I continue to worry that I jump into responding to your questions while being unsure that I actually understand the scenarios you are describing... |
(also I am fully aware that @glaebhoerl has posed questions to me that I have not answered. That's mostly because it takes too long for me to come up with concise answers, while lengthy answers make this long thread even more unmanageable. I wonder whether there is a better forum for us to carry on this discussion...) |
Not ready for merging.
Summary
Tracing non-generational task-local GC, with stop-the-task non-incremental collections. The GC stores nothing inline in any values, and so doesn't need a header and a change of representation of generic types (like `@` does).

Includes two new modules `std::{libvec, uniq}` for examples of library defined versions of `~[]` and `~` respectively, which use the rooting API defined in `std::gc` to properly hold references to GC pointers (unlike `~`, `@`, `~[]` and `@[]`). (These modules are mostly demonstrations, not necessarily designed for landing.)

Note: when looking over this code, keep in mind I have pretty much no idea what I'm doing, so if something seems stupid, unconventional or silly; it almost certainly is.
Details
This adds a `#[managed]` annotation and an intrinsic `reachable_new_managed::<T>() -> bool` (cf. the old `owns_managed` intrinsic, renamed to `owns_at_managed` in this PR). The intrinsic is designed to check whether a type contains any `#[managed]` annotated types, so that the library types can avoid touching the GC if they aren't storing GC'd pointers.

The GC is conservative on the stack: when checking for GC'd references on the stack, it will scan every word, and any bit pattern that matches the pointer to the start of a GC'd pointer or one of the other pointer types will be considered a valid region.
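The conservative scan described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the stack is modelled as a slice of words, the set of known allocation starts is a `BTreeSet` rather than the trie used here, and `conservative_scan` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Treat every stack word as a potential pointer and keep the ones whose bit
/// pattern equals the start address of a known GC allocation.
fn conservative_scan(stack_words: &[usize], allocation_starts: &BTreeSet<usize>) -> Vec<usize> {
    let mut seen = BTreeSet::new();
    let mut found = Vec::new();
    for &word in stack_words {
        // Only exact matches with an allocation start count; interior
        // pointers are ignored in this sketch.
        if allocation_starts.contains(&word) && seen.insert(word) {
            found.push(word);
        }
    }
    found
}
```

An integer that happens to equal an allocation's address is indistinguishable from a real pointer here, which is why a conservative scan can keep dead objects alive.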
It does support finalisers, which are just run on the memory when a pointer is determined to be unused; so `Gc<T>` uses this to run the destructor of `T` (if it has one).

The rooting API mentioned above is simply a function `register_root_changes<T>(removals: &[*T], additions: &[(*T, uint, TraceFunc)])` which lets us indicate that certain regions are no longer possibly rooting `Gc`s, and add regions `(*T, uint, TraceFunc) == (start pointer, metadata, function to start tracing)` that possibly are rooting `Gc`s. The trick with this API that stops us having to scan everything is being generic: it knows the type `T` a pointer contains and, via the intrinsic, knows statically whether a certain type `T` can contain `#[managed]` types. So if `T` can't contain managed pointers, there's no need to register it with the GC, so all one needs to do to make things GC safe is (unconditionally) pass a pointer of the appropriate type to the relevant memory regions and the `std::gc` library will automatically figure out (by calling the intrinsic) if it actually needs to register those regions. In particular, this means that programs that never use any GC'd types can have the GC code removed, because nothing will call it. (`register_root_changes` will be inlined and reduce to a no-op for non-managed types.)

The metadata mentioned above is arbitrary (not examined by the GC) and can be set with `update_metadata`; this is essentially just designed to allow storing the length of vectors for tracing.

There are a few commits which act as an example for this API: I add the library vector type from strcat's rust-core, and then make the appropriate adjustments to make it GC safe (5 calls to `register_root_changes`), and also add an equivalent to `~`, `std::uniq::Uniq<T>`, which is similarly GC-safe.

To support tracing, these types (`Uniq` and `Vec`) require a `Trace` bound on their contents, which is unfortunate, as they should be allowed to store non-traceable types if those types don't contain any `Gc<T>` pointers.

Problems
- `~`, `@`, `~[]` and `@[]` do not act as roots. That is, having a pile of `Gc` pointers such that the only reference to them is a `~[Gc<int>]` will cause them to be considered unreachable and be garbage collected. As such, I've marked any method that could demonstrate a symptom of this problem (the various `.borrow`s) as `unsafe`. It's probably possible to do something with lang-items to get them to work... but, personally having them as library types seems simpler (look at how simple it was to add GC support to the two new modules: adding support to `Rc` would be as easy as it was to add to `Uniq` too.)
- … `register_root_changes` themselves to register one if they must)
- … ~~20x~~ ~~6.5x~~ ~~5x~~ 4x slower than straight malloc for a microbenchmark of just doing a lot of allocations) and memory hungry. Reasons:
  - … `free`d (I'm working on caching unused allocations now)
  - `std::trie` is slow, and I'm using it wrong, and it's possibly not the best data structure.
    - … `insert` and `remove`, if allocations were cached and not removed from the `trie` this would help a lot
    - … `uint` into chunks of 4 bits from the most significant bit, and most `malloc`'d pointers agree on their first 30 bits, so we spend a lot of time just traversing that with no ability to distinguish between keys (very hard to fix, since the only way to get the correct order is to traverse in this manner; requires a path-compressing trie)
  - … `unsafe`)
  - … `Local::borrow(None::<Task>)` twice on every allocation (to retrieve and return the GC, see below) (although `perf` indicates the vast majority of the time is spent in collection)
- … `Task` because that would be far worse than what I'm about to describe), which means that any finalisers that need to call into the GC (like those that need to unregister roots) will crash; in particular, a type like `Gc<Vec<Gc<T>>>` will fail because the `Vec` destructor calls `register_root_changes`.
- `fail!`-ing finalisers aren't considered at all, and also cause failure. (Both of these can hit the double-unwinding and cause an abort.)
  - … `#[unsafe_destructor]` so it's the user's "fault" if they crash due to this
- … `Gc` (all are `Gc<T> -> &T`):
  - `borrow` for only `Freeze` types, that does not have any write barriers
  - `borrow_write_barrier`, the general borrow that does have a write barrier; although in theory the write barrier could be elided when `owns_new_managed::<T>` is false (since any writes couldn't add/change references to `Gc` pointers)
  - `borrow_no_write_barrier`; same as `.borrow`, but implemented for all `T` and `unsafe`, designed for when someone is definitely sure they're going to be reading only, or are not going to be changing any `Gc` references.
- `std::{uniq, libvec}` have no documentation or tests (since they're just designed to exhibit the rooting API)
- `Uniq` and `Vec` are less flexible than desirable (which flows downstream to any generic users of them) because they require a `Trace` bound to be able to register a handle to run when discovered by a conservative scan. Possible solutions:
  - … `.push` and `.pop` for `Vec`!) but this would then require similar contortions downstream;
  - … `TraceOrNonManaged` bound (which would also require downstream generic to have that bound)
  - … `Gc` can only go into `Trace` types (a little like `transmute` requires types of the same size)
  - … `call_trace_glue` intrinsic, but then we'd have to have some way to get the appropriate types and information into it).

Because of this list (mainly the memory-unsafety problem with `~` etc not acting as roots), I've marked `std::gc::Gc` as `#[experimental]`.
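For readers skimming the description, the shape of the rooting API from the Details section can be approximated in present-day Rust as below. Everything here is a stand-in and should be read as an assumption: the `MaybeManaged` trait with its associated constant plays the role of the `reachable_new_managed::<T>()` intrinsic, `FakeGc` stands in for `Gc`, pointers are `*const T` rather than `*T`, and `RootSet` is a hypothetical container for the registered regions.

```rust
use std::collections::BTreeMap;

/// Stand-in for the intrinsic: can values of this type contain managed pointers?
trait MaybeManaged {
    const CAN_CONTAIN_MANAGED: bool;
}

/// Hypothetical managed pointer type, used only to mark "contains GC data".
struct FakeGc<T>(T);

impl MaybeManaged for i32 {
    const CAN_CONTAIN_MANAGED: bool = false;
}
impl<T> MaybeManaged for FakeGc<T> {
    const CAN_CONTAIN_MANAGED: bool = true;
}
impl<T: MaybeManaged> MaybeManaged for Vec<T> {
    const CAN_CONTAIN_MANAGED: bool = T::CAN_CONTAIN_MANAGED;
}

/// Called by the collector with (start pointer, metadata) to trace a region.
type TraceFunc = fn(*const (), usize);

/// A tracer that does nothing, for illustration.
fn noop_trace(_start: *const (), _metadata: usize) {}

/// Registered regions: start address -> (metadata, tracer).
struct RootSet {
    roots: BTreeMap<usize, (usize, TraceFunc)>,
}

impl RootSet {
    fn new() -> Self {
        RootSet { roots: BTreeMap::new() }
    }

    fn len(&self) -> usize {
        self.roots.len()
    }

    /// Callers pass their regions unconditionally; for types that cannot
    /// contain managed pointers the constant is false and this does nothing.
    fn register_root_changes<T: MaybeManaged>(
        &mut self,
        removals: &[*const T],
        additions: &[(*const T, usize, TraceFunc)],
    ) {
        if !T::CAN_CONTAIN_MANAGED {
            return;
        }
        for start in removals {
            self.roots.remove(&(*start as usize));
        }
        for (start, metadata, trace) in additions {
            self.roots.insert(*start as usize, (*metadata, *trace));
        }
    }
}
```

Because the check is a compile-time constant per instantiation, the non-managed case can be optimised down to nothing, which is the property the description relies on.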