Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_formatter): Comments multi map #3230

Merged
merged 3 commits into from
Sep 16, 2022

Conversation

MichaReiser
Copy link
Contributor

@MichaReiser MichaReiser commented Sep 15, 2022

Summary

This PR is part of the Comments refactoring (#3227)

This PR implements a multimap for storing the leading, dangling, and trailing comments for nodes.

A naive implementation using three multimaps: one to store the leading, dangling, and trailing parts of each node, requires between nodes < allocations < nodes * 3 vec allocations (nodes = number of nodes with at least one comment).

This purpose-built multimap uses a shared Vec to store all comments where this is possible (99.99% of comments) and only allocates node-specific Vecs for nodes appearing "out of order"

The idea behind the implementation is that comments normally remain in the same order as in the source document. That means:

  • All comments for a node are inserted at "the same" time (one at a time, but before inserting the comments of the next node)
  • It first inserts the leading, then dangling, and finally, a node's trailing comments.

Using this fact allows storing the vast majority of comments in a single Vec and only storing the offsets into this shared vec indicating the start/end of the leading/dangling/trailing comments. The implementation falls back to allocating (up to) three vectors for a node in case either of the above rules is violated.

Using a single vector for the majority of comments is also beneficial when it comes to iterating because:

  • Retrieving the comments for a node requires a single hash map lookup instead of 3
  • All comments for a node are located next to each other in the vector, making better use of cache locality.

Metrics

I used countme to verify that the hypothesis that comments appearing out of order are rare by formatting all benchmark files:

cargo run -p xtask_bench --release --features=count  -- --feature=formatter  --criterion=false
rome_formatter::comments::map::InOrderEntry           39_383       34_333            0
rome_formatter::comments::map::OutOfOrderEntry            33           22            0
                                                       total     max_live         live

Out of the close to 40k comments, only 33 nodes required their "own" leading, dangling, and trailing collections. That's 0.0837925%. Or, in other words. This implementation avoids between 40k and 120k vector allocations compared to using three general purpose multimap implementations.

Test Plan

I added a handful of tests to the map implementation

@MichaReiser MichaReiser added the A-Formatter Area: formatter label Sep 15, 2022
@MichaReiser MichaReiser added this to the 0.10.0 milestone Sep 15, 2022
@MichaReiser MichaReiser requested review from leops and a team September 15, 2022 12:11
@MichaReiser MichaReiser changed the title Delete JsConstructorParameter kind feat(rome_formatter): Comments multi map Sep 15, 2022
This PR implements a multimap for storing the leading, dangling, and trailing comments for nodes.

An implementation using a general-purpose implementation would use a multimaps for the leading, trailing, and dangling comments and
 each multimap would allocate a `Vec` for every node with a comment.

 This purpose-built multimap uses a shared `Vec` to store all comments where this is possible (99.99% of comments) and only allocates
 node specific `Vec`s if necessary.

The idea behind the implementation is that comments should maintain the same order as in the source document. That means:
* All comments for a node are inserted at "the same" time (one at a time, but all together)
* It first inserts the leading, then dangling, and finally, a node's trailing comments.
crates/rome_formatter/src/comments/map.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Outdated Show resolved Hide resolved
crates/rome_formatter/src/comments/map.rs Show resolved Hide resolved
leading.push(part);
}

Some(entry) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we hit this variant? Asking because I read the function entry_to_out_of_order which does match against entry... so I am confused 😅

Copy link
Contributor Author

@MichaReiser MichaReiser Sep 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is hit if entry is an Entry::InOrder and either:

  • There are existing dangling comments or trailing comments for this node. It then isn't possible to insert the leading comment because it would require inserting in the middle of the Vec which would invalidate the indices and requires copying the dangling and trailing comments
  • Parts for another key have been inserted in the meantime: So if you have leading(a, 1), leading(b, 1), leading(a, 2) (notice the a, b, a sequence). Here we have the same issue, it is now required to insert the comment in between the leading(a, 1) and leading(b, 1) but that requires copying all comments and invalidates all indices.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Formatter Area: formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants