-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The great API simplification #9
Conversation
Here's a large scale adaptation of this API in rust-analyzer: rust-lang/rust-analyzer#449 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems fine to me, but as I said, I don't know much about unsafe
:)
migrated rust-analyzer to this API, everything seems to work fine! |
@flodiebold thanks for the review! |
published: https://crates.io/crates/rowan/0.2.0 |
55: Fix the s_expressions example r=CAD97 a=nickolay It was broken since the refactoring in PR #9 (v0.2.0) Co-authored-by: Nickolay Ponomarev <[email protected]>
This PR greatly
breakssimplifies the API by getting rid ofRefRoot / OwnedRoot
and leveraging native rust references instead.To explain what's going on, it is helpful to start with the history of the current implementation.
Originally, the red-green tree approach was popularized by Roslyn. Red-Green solves allows one to have an immutable tree with cheap updates and with offests and parent pointers on nodes. At the first sight, this seems impossible: an offset of the node depends on the offsets of all of the previous nodes, so if you insert a new node at the beginning of the tree, then all offsets must be recalculated, resulting in O(N) performance. The same reasoning applies to parent links.
The sword to cut this knot is laziness. First, we build a green tree -- a usual immutable tree without parent pointers and with lengths instead of offsets (lengths can be maintained cheaply by summing lengths of children). Then, we build a lazy read tree. Each read node stores a green node, a parent red node, start offset and an array of red children which are calculated lazily. Traversing the tree is O(1): we either use a cached node (constant) or create a one new node (constant as well). When modifying the tree, we keep the green nodes, but tear down all the red nodes. This is still amortizing O(1) though, if we count traversals as well: to tear down a read node, you first need to traverse it.
Roslyn is implemented in C#, so their red-green relies on garbage collection for memory management.
The red-green implementation of red-green in C++ exists in Swift. To solve memory management, they added a third layer of
SyntaxNode
s. In Swifts libsyntax, green nodes are a usual immutable trees, red nodes own a green node and children, and store a raw pointer to the parent node.SyntaxNode
s bind together a reference-counted pointer to the root of the tree and a raw pointer to the red node. OnlySyntaxNode
is a public API, and it guarantees that the underling red-node is alive. To be more concrete, here are equivalent Rust type definitions:The problem with this approach is that traversing of the syntax tree constantly creates new
SyntaxNode
s, bumping theArc
. Ideally, reading the tree should just traverse the pointers, without write operations. It would also be cool to be able to work withCopy
syntax nodes! And that's where the old implementation ofrowan
comes from. The idea of old implementation is to make theroot
field of theSynaxNode
generic:root: R
, so that R can be either&'a RedNode
or aArc<RedNode>
. That way, you'll get to chose between owned version (arc bumps, .clone required, no lifetimes) and borrowed version (cheap, Copy, requires lifetime annotations). Unfortunately, being generic over ownership is not the most pleasant thing to do in Rust, and the resulting API was quite mind-bending and verbose.This new API started from the desire to leverage usual reference and use
&'a SyntaxNode
for borrowed variant andSyntaxNode
for owned variant. However, this doesn't quite work, due to the requirement of sharing data betweenSyntaxNode
s. A similar situation happens if you are writing an application, and you have someCtx
struct which you'd like to store behind theArc
. If you do this, some parts of the app will work withArc<Ctx>
and some with&'a Ctx
: note how owned variant have a smart pointer around. So, in this version a borrowed node looks like&'s SyntaxNode
and an owned one asTreePtr<SyntaxNode>
.Specifically, we merge
RedTree
andSyntaxNode
into one struct, and add aroot: *const TreeRoot
field to anode
. Physically, thisroot
is always stored inside anArc
. An ownedTreePtr
stores a*const SyntaxNode
inside, but it bumps the root's Arc on creation/clone. That is,TreePtr<T>
is basically andArc
to the tree root, except that it points not to the root itself, but to some node within a tree. This "manual" management of ref counts forms one bit of unsafely here.The other bit of unsafely comes from the desire to be able to use
TreePtr
not only with rawSyntaxNode
s, but with ast wrappers as well. Becauseast
and raw syntax use exactly the same binary representation (it's only compile-time types that differ), there's actually no implementation required for this to work: just some casts. So,TransparentNewType
trait establishes this "it's safe to cast between newtypes" relation.