The great API simplification #9

matklad · 2019-01-07T11:16:53Z

This PR greatly ~~breaks~~ simplifies the API by getting rid of RefRoot / OwnedRoot and leveraging native rust references instead.

To explain what's going on, it is helpful to start with the history of the current implementation.

Originally, the red-green tree approach was popularized by Roslyn. Red-Green solves allows one to have an immutable tree with cheap updates and with offests and parent pointers on nodes. At the first sight, this seems impossible: an offset of the node depends on the offsets of all of the previous nodes, so if you insert a new node at the beginning of the tree, then all offsets must be recalculated, resulting in O(N) performance. The same reasoning applies to parent links.

The sword to cut this knot is laziness. First, we build a green tree -- a usual immutable tree without parent pointers and with lengths instead of offsets (lengths can be maintained cheaply by summing lengths of children). Then, we build a lazy read tree. Each read node stores a green node, a parent red node, start offset and an array of red children which are calculated lazily. Traversing the tree is O(1): we either use a cached node (constant) or create a one new node (constant as well). When modifying the tree, we keep the green nodes, but tear down all the red nodes. This is still amortizing O(1) though, if we count traversals as well: to tear down a read node, you first need to traverse it.

Roslyn is implemented in C#, so their red-green relies on garbage collection for memory management.

The red-green implementation of red-green in C++ exists in Swift. To solve memory management, they added a third layer of SyntaxNodes. In Swifts libsyntax, green nodes are a usual immutable trees, red nodes own a green node and children, and store a raw pointer to the parent node. SyntaxNodes bind together a reference-counted pointer to the root of the tree and a raw pointer to the red node. Only SyntaxNode is a public API, and it guarantees that the underling red-node is alive. To be more concrete, here are equivalent Rust type definitions:

struct RedNode {
    offest: usize,
    parent: *const RedNode,
    children: Vec<Lazy<RedNode>>,
}

struct SyntaxNode {
    root: Arc<RedNode>,
    node: *const RedNode,
}

The problem with this approach is that traversing of the syntax tree constantly creates new SyntaxNodes, bumping the Arc. Ideally, reading the tree should just traverse the pointers, without write operations. It would also be cool to be able to work with Copy syntax nodes! And that's where the old implementation of rowan comes from. The idea of old implementation is to make the root field of the SynaxNode generic: root: R, so that R can be either &'a RedNode or a Arc<RedNode>. That way, you'll get to chose between owned version (arc bumps, .clone required, no lifetimes) and borrowed version (cheap, Copy, requires lifetime annotations). Unfortunately, being generic over ownership is not the most pleasant thing to do in Rust, and the resulting API was quite mind-bending and verbose.

This new API started from the desire to leverage usual reference and use &'a SyntaxNode for borrowed variant and SyntaxNode for owned variant. However, this doesn't quite work, due to the requirement of sharing data between SyntaxNodes. A similar situation happens if you are writing an application, and you have some Ctx struct which you'd like to store behind the Arc. If you do this, some parts of the app will work with Arc<Ctx> and some with &'a Ctx: note how owned variant have a smart pointer around. So, in this version a borrowed node looks like &'s SyntaxNode and an owned one as TreePtr<SyntaxNode>.

Specifically, we merge RedTree and SyntaxNode into one struct, and add a root: *const TreeRoot field to a node. Physically, this root is always stored inside an Arc. An owned TreePtr stores a *const SyntaxNode inside, but it bumps the root's Arc on creation/clone. That is, TreePtr<T> is basically and Arc to the tree root, except that it points not to the root itself, but to some node within a tree. This "manual" management of ref counts forms one bit of unsafely here.

The other bit of unsafely comes from the desire to be able to use TreePtr not only with raw SyntaxNodes, but with ast wrappers as well. Because ast and raw syntax use exactly the same binary representation (it's only compile-time types that differ), there's actually no implementation required for this to work: just some casts. So, TransparentNewType trait establishes this "it's safe to cast between newtypes" relation.

matklad · 2019-01-07T13:22:27Z

Here's a large scale adaptation of this API in rust-analyzer: rust-lang/rust-analyzer#449

flodiebold

It seems fine to me, but as I said, I don't know much about unsafe :)

src/imp.rs

matklad · 2019-01-08T09:02:38Z

migrated rust-analyzer to this API, everything seems to work fine!

matklad · 2019-01-08T09:02:51Z

@flodiebold thanks for the review!

matklad · 2019-01-08T09:04:16Z

published: https://crates.io/crates/rowan/0.2.0

55: Fix the s_expressions example r=CAD97 a=nickolay It was broken since the refactoring in PR #9 (v0.2.0) Co-authored-by: Nickolay Ponomarev <[email protected]>

matklad added 14 commits January 7, 2019 03:03

new implementation

c3f719d

make TreePtr clone

d2c3ed9

parametrize TreePtr with type

aac5ced

move impl to impl

4d6a55f

fix vis

8188649

format

5e5af1a

move safe code out of mip

d791a09

Make Repr assoc type to make Clone work

e5150d8

reshuffle bounds

e01137d

cast is assoc function

c482360

fix-tests

b22d4b6

tweak where clauses

a7d3d76

comment unsafe code

4a76a40

bump-major-version

688693c

This was referenced Jan 7, 2019

Simplify sytax trees API rust-lang/rust-analyzer#448

Closed

switch to new rowan API rust-lang/rust-analyzer#449

Merged

matklad mentioned this pull request Jan 7, 2019

RFC: libsyntax2.0 rust-lang/rfcs#2256

Closed

implement equality for TreePtr

6b50db0

flodiebold reviewed Jan 7, 2019

View reviewed changes

src/imp.rs Outdated Show resolved Hide resolved

src/imp.rs Outdated Show resolved Hide resolved

src/imp.rs Outdated Show resolved Hide resolved

src/imp.rs Outdated Show resolved Hide resolved

src/imp.rs Show resolved Hide resolved

src/imp.rs Show resolved Hide resolved

review fixed

c921101

matklad merged commit 9be5641 into master Jan 8, 2019

matklad deleted the new-impl branch January 8, 2019 09:02

nickolay mentioned this pull request Apr 23, 2020

Fix the s_expressions example #55

Merged

bors bot added a commit that referenced this pull request Apr 23, 2020

Merge #55

815a5f3

55: Fix the s_expressions example r=CAD97 a=nickolay It was broken since the refactoring in PR #9 (v0.2.0) Co-authored-by: Nickolay Ponomarev <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The great API simplification #9

The great API simplification #9

matklad commented Jan 7, 2019

matklad commented Jan 7, 2019

flodiebold left a comment

matklad commented Jan 8, 2019

matklad commented Jan 8, 2019

matklad commented Jan 8, 2019

The great API simplification #9

The great API simplification #9

Conversation

matklad commented Jan 7, 2019

matklad commented Jan 7, 2019

flodiebold left a comment

Choose a reason for hiding this comment

matklad commented Jan 8, 2019

matklad commented Jan 8, 2019

matklad commented Jan 8, 2019