Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The great API simplification #9

Merged
merged 16 commits into from
Jan 8, 2019
Merged

The great API simplification #9

merged 16 commits into from
Jan 8, 2019

Conversation

matklad
Copy link
Member

@matklad matklad commented Jan 7, 2019

This PR greatly breaks simplifies the API by getting rid of RefRoot / OwnedRoot and leveraging native rust references instead.

To explain what's going on, it is helpful to start with the history of the current implementation.

Originally, the red-green tree approach was popularized by Roslyn. Red-Green solves allows one to have an immutable tree with cheap updates and with offests and parent pointers on nodes. At the first sight, this seems impossible: an offset of the node depends on the offsets of all of the previous nodes, so if you insert a new node at the beginning of the tree, then all offsets must be recalculated, resulting in O(N) performance. The same reasoning applies to parent links.

The sword to cut this knot is laziness. First, we build a green tree -- a usual immutable tree without parent pointers and with lengths instead of offsets (lengths can be maintained cheaply by summing lengths of children). Then, we build a lazy read tree. Each read node stores a green node, a parent red node, start offset and an array of red children which are calculated lazily. Traversing the tree is O(1): we either use a cached node (constant) or create a one new node (constant as well). When modifying the tree, we keep the green nodes, but tear down all the red nodes. This is still amortizing O(1) though, if we count traversals as well: to tear down a read node, you first need to traverse it.

Roslyn is implemented in C#, so their red-green relies on garbage collection for memory management.

The red-green implementation of red-green in C++ exists in Swift. To solve memory management, they added a third layer of SyntaxNodes. In Swifts libsyntax, green nodes are a usual immutable trees, red nodes own a green node and children, and store a raw pointer to the parent node. SyntaxNodes bind together a reference-counted pointer to the root of the tree and a raw pointer to the red node. Only SyntaxNode is a public API, and it guarantees that the underling red-node is alive. To be more concrete, here are equivalent Rust type definitions:

struct RedNode {
    offest: usize,
    parent: *const RedNode,
    children: Vec<Lazy<RedNode>>,
}

struct SyntaxNode {
    root: Arc<RedNode>,
    node: *const RedNode,
}

The problem with this approach is that traversing of the syntax tree constantly creates new SyntaxNodes, bumping the Arc. Ideally, reading the tree should just traverse the pointers, without write operations. It would also be cool to be able to work with Copy syntax nodes! And that's where the old implementation of rowan comes from. The idea of old implementation is to make the root field of the SynaxNode generic: root: R, so that R can be either &'a RedNode or a Arc<RedNode>. That way, you'll get to chose between owned version (arc bumps, .clone required, no lifetimes) and borrowed version (cheap, Copy, requires lifetime annotations). Unfortunately, being generic over ownership is not the most pleasant thing to do in Rust, and the resulting API was quite mind-bending and verbose.

This new API started from the desire to leverage usual reference and use &'a SyntaxNode for borrowed variant and SyntaxNode for owned variant. However, this doesn't quite work, due to the requirement of sharing data between SyntaxNodes. A similar situation happens if you are writing an application, and you have some Ctx struct which you'd like to store behind the Arc. If you do this, some parts of the app will work with Arc<Ctx> and some with &'a Ctx: note how owned variant have a smart pointer around. So, in this version a borrowed node looks like &'s SyntaxNode and an owned one as TreePtr<SyntaxNode>.

Specifically, we merge RedTree and SyntaxNode into one struct, and add a root: *const TreeRoot field to a node. Physically, this root is always stored inside an Arc. An owned TreePtr stores a *const SyntaxNode inside, but it bumps the root's Arc on creation/clone. That is, TreePtr<T> is basically and Arc to the tree root, except that it points not to the root itself, but to some node within a tree. This "manual" management of ref counts forms one bit of unsafely here.

The other bit of unsafely comes from the desire to be able to use TreePtr not only with raw SyntaxNodes, but with ast wrappers as well. Because ast and raw syntax use exactly the same binary representation (it's only compile-time types that differ), there's actually no implementation required for this to work: just some casts. So, TransparentNewType trait establishes this "it's safe to cast between newtypes" relation.

@matklad
Copy link
Member Author

matklad commented Jan 7, 2019

Here's a large scale adaptation of this API in rust-analyzer: rust-lang/rust-analyzer#449

Copy link
Member

@flodiebold flodiebold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fine to me, but as I said, I don't know much about unsafe :)

src/imp.rs Outdated Show resolved Hide resolved
src/imp.rs Outdated Show resolved Hide resolved
src/imp.rs Outdated Show resolved Hide resolved
src/imp.rs Outdated Show resolved Hide resolved
src/imp.rs Show resolved Hide resolved
src/imp.rs Show resolved Hide resolved
@matklad
Copy link
Member Author

matklad commented Jan 8, 2019

migrated rust-analyzer to this API, everything seems to work fine!

@matklad matklad merged commit 9be5641 into master Jan 8, 2019
@matklad
Copy link
Member Author

matklad commented Jan 8, 2019

@flodiebold thanks for the review!

@matklad matklad deleted the new-impl branch January 8, 2019 09:02
@matklad
Copy link
Member Author

matklad commented Jan 8, 2019

published: https://crates.io/crates/rowan/0.2.0

bors bot added a commit that referenced this pull request Apr 23, 2020
55: Fix the s_expressions example r=CAD97 a=nickolay

It was broken since the refactoring in PR #9 (v0.2.0)

Co-authored-by: Nickolay Ponomarev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants