Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTreeMap: tag and explain unsafe internal functions or assert preconditions #68468

Merged
merged 1 commit into from
Jan 30, 2020

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Jan 22, 2020

#68418 concluded that it's not desirable to tag all internal functions with preconditions as being unsafe. This PR does it to some functions, documents why, and elsewhere enforces the preconditions with asserts.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 22, 2020
@ssomers
Copy link
Contributor Author

ssomers commented Jan 22, 2020

Benchmark comparison should be the same as in #68418, I suppose - just some more unsafe tags and comments. These do seem to shift the focus of the optimizer somewhat away from set::is_subset, regardless of the usual noise.

>cargo benchcmp mastr2.txt middle2.txt --threshold 3
 name                                           mastr2.txt ns/iter  middle2.txt ns/iter  diff ns/iter  diff %  speedup
 btree::map::find_rand_100                      18                  17                             -1  -5.56%   x 1.06
 btree::map::find_rand_10_000                   58                  56                             -2  -3.45%   x 1.04
 btree::map::insert_seq_100                     45                  47                              2   4.44%   x 0.96
 btree::map::iter_1000                          2,641               2,880                         239   9.05%   x 0.92
 btree::map::iter_mut_1000                      2,705               2,823                         118   4.36%   x 0.96
 btree::set::build_and_pop_all                  6,065               5,828                        -237  -3.91%   x 1.04
 btree::set::difference_staggered_100_vs_100    855                 795                           -60  -7.02%   x 1.08
 btree::set::difference_staggered_100_vs_10k    2,405               2,311                         -94  -3.91%   x 1.04
 btree::set::difference_staggered_10k_vs_10k    78,083              75,095                     -2,988  -3.83%   x 1.04
 btree::set::intersection_100_neg_vs_100_pos    25                  24                             -1  -4.00%   x 1.04
 btree::set::intersection_10k_neg_vs_10k_pos    30                  31                              1   3.33%   x 0.97
 btree::set::intersection_random_100_vs_100     591                 646                            55   9.31%   x 0.91
 btree::set::intersection_random_100_vs_10k     2,251               2,361                         110   4.89%   x 0.95
 btree::set::intersection_random_10k_vs_100     2,276               2,362                          86   3.78%   x 0.96
 btree::set::intersection_staggered_100_vs_100  652                 717                            65   9.97%   x 0.91
 btree::set::intersection_staggered_100_vs_10k  2,075               2,201                         126   6.07%   x 0.94
 btree::set::intersection_staggered_10k_vs_10k  63,366              67,628                      4,262   6.73%   x 0.94
 btree::set::is_subset_100_vs_100               426                 459                            33   7.75%   x 0.93
 btree::set::is_subset_100_vs_10k               1,363               1,458                          95   6.97%   x 0.93
 btree::set::is_subset_10k_vs_10k               40,384              45,252                      4,868  12.05%   x 0.89

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Left a few comments.

@@ -369,7 +369,7 @@ impl<BorrowType, K, V, Type> NodeRef<BorrowType, K, V, Type> {
/// If the node is a leaf, this function simply opens up its data.
/// If the node is an internal node, so not a leaf, it does have all the data a leaf has
/// (header, keys and values), and this function exposes that.
/// See `NodeRef` on why the node may not be a shared root.
/// Unsafe because the node may not be a shared root, see why in the `NodeRef` comments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Unsafe because the node may not be a shared root, see why in the `NodeRef` comments.
/// The node must not be a shared root. For more information, see the `NodeRef` comments.

Copy link
Contributor Author

@ssomers ssomers Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to believe I fell in the old must-not/may-not trap (a Dutch thing). But now, is it clear why the function is unsafe? I edited the age-old comments on new_kv and new_edge just to get that across.

src/liballoc/collections/btree/map.rs Show resolved Hide resolved
src/liballoc/collections/btree/node.rs Outdated Show resolved Hide resolved
@@ -647,7 +637,7 @@ impl<'a, K, V> NodeRef<marker::Mut<'a>, K, V, marker::Internal> {

fn correct_childrens_parent_links(&mut self, first: usize, after_last: usize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be annotated, I think, as the first/after_last must be <= self.len() I believe.

(Or we can assert that condition).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we can't simply assert it like that. split_off invokes move_suffix which invokes move_edges with a fixed dest_offset of 1, which is out of bounds if right_new_len is zero. Nothing is moved, no children's parents are corrected.

One may condemn split_off for investing time to move an empty suffix but I'm opting for a stranger debug_assert right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed my mind, added an if-test on right_new_len

@Mark-Simulacrum
Copy link
Member

I also resolved your comments, as I agree that neither need be addressed here.

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully a final round and then we can get this landed.

@@ -240,7 +240,7 @@ impl<K, V> Root<K, V> {
/// Adds a new internal node with a single edge, pointing to the previous root, and make that
/// new node the root. This increases the height by 1 and is the opposite of `pop_level`.
pub fn push_level(&mut self) -> NodeRef<marker::Mut<'_>, K, V, marker::Internal> {
debug_assert!(!self.is_shared_root());
debug_assert!(!self.is_shared_root()); // &mut to a static is impossible anyway
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see us drop the debug asserts that should be statically impossible (e.g. like here, where we have unique access).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually, not committing that (yet). Seems I should retract the comment instead. And perhaps mark unsafe. Or do you imply that &mut self logically implies that the caller should know that the shared root is unacceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should retract this. I missed that we don't actually "own" the root here necessarily (i.e., we may have a NonNull or perhaps not)

pub fn new_kv(node: NodeRef<BorrowType, K, V, NodeType>, idx: usize) -> Self {
// Necessary for correctness, but in a private module
/// Creates a new handle to a key/value pair in `node`.
/// Unsafe because the caller must ensure that `idx` be less than `node.len()`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Unsafe because the caller must ensure that `idx` be less than `node.len()`.
/// Unsafe because the caller must ensure that `idx < node.len()`.
/// Note that `node.len()` is always safe to call.

Please verify that the second line is actually true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that node.len() is safe. Everything I learnt over the past months tells me that it reads through a pointer that is always well aligned from memory that is always allocated, storing the value in a register or stack exclusive to the caller. But in the course of #58431, @RalfJung wrote in into_slices_mut that len wreaks havoc on things elsewhere and had to be avoided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it is true that under stacked borrows a call to len() can invalidate some other reference, but that doesn't really make it unsafe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading up on stacked borrows, and two hours later I might have understood something, i.e. what you're asking. I guess invalidating is a lot more subtle than killing. Still bad enough that it prompted the rewrite of into_slices_mut, but that was only to satisfy Miri and to avoid future trouble - there was no genuine UB. "Verify" means to check that Miri is happy executing the expressions in the debug_assert (while it normally skips those).

If that's right, then yes, the second line is actually true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also applied that comment style to new_edge

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling len has no preconditon other than the argument being well-typed

Sure, but I was thinking of a precondition (or restriction, if precondition is the wrong term) on the unsafe new_kv function. Though it does seem that if you start documenting that, you'd have to do that in a lot of functions for a lot of "interactions" with memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a precondition, I'd say it's part of the side-effects of this function. It's part of the side-effect of all &self functions that calling it kills exclusive aliases to self. This one here is special only insofar as the NodeRef type is used instead of a "normal" reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I understand. "Note that node.len() is always safe to call" does not go into what side effects there might be.

My answer to "that the second line is actually true" is simply: yes, NodeRef::len() is not marked unsafe.

And Mark's point is that it is merely a hint to the caller, not a justification of the debug_assert, so there's not real reason to keep it nor to remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's drop it from here perhaps and make an explicit note on NodeRef::len()? Unfortunately given the unsafety landscape today the "is not unsafe" part of NodeRef::len() doesn't really actually play out to being necessarily indicative of lacking preconditions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped it and recycled the bytes to somewhat wishy-washy comments in NodeRef::len(). And not on NodeRef::height() or any other &self method...

src/liballoc/collections/btree/node.rs Show resolved Hide resolved
@Mark-Simulacrum
Copy link
Member

Please also squash the commits into one while addressing the latest feedback.

Copy link
Contributor Author

@ssomers ssomers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I transplanted the earlier demo into a commit on top of the current commits here, which gives branch btreemap_prefer_middle_demo. Testing btree::map::test_basic_small clearly demonstrates to me that there is no protection against a &mut self to the shared root, apart from the debug_assert inside into_key_slice_mut. Even Miri doesn't object (though of course a little later Miri does object to transmuting the shared root to a leaf pointer).

Perhaps what you based on is that the node field of a NodeRef cannot be turned into a '&mut` on its own?

@Mark-Simulacrum
Copy link
Member

Okay, I think this is good to go modulo the &mut to a static is impossible comments, sorry for misleading you there. (Well, it's true, but that's not what those method signatures mean, due to indirection through a raw pointer).

@ssomers
Copy link
Contributor Author

ssomers commented Jan 24, 2020

I've become quite fond of the idea that marker::Mut should imply exclusivity. But alas, pretending to have a marker::Mut reference to the shared root is exactly what get_mut and remove do. search_node returns such a marker::Mut edge handle into the shared root hinting where the desired key could be inserted. Other mutable operations on BTreeMap don't face this because they swap out the shared root for a real root up front, but that would obviously be wasteful for get_mut and remove.

@ssomers
Copy link
Contributor Author

ssomers commented Jan 24, 2020

So really what I just wrote is that search_node should be made unsafe, because it can return a baloney SearchResult::GoDown result.

@ssomers ssomers force-pushed the btreemap_prefer_middle branch 2 times, most recently from ee8f7a2 to 5aecf55 Compare January 24, 2020 17:48
@ssomers
Copy link
Contributor Author

ssomers commented Jan 28, 2020

search_node should be made unsafe, because it can return a baloney SearchResult::GoDown result.

I tried split up the search code such that it never returns baloney handles, but it's not the only place where this happens: even the Drop implementation constructs a RangeMut with equal front and back edge handles, which point into the shared root.

So such handles are not baloney but unavoidable, and marker::Mut cannot imply exclusivity.

@ssomers
Copy link
Contributor Author

ssomers commented Jan 29, 2020

So such handles are not baloney but unavoidable, and marker::Mut cannot imply exclusivity.

I couldn't let it be, could I... Turns out it's relatively easy to get rid of marker::Mut/Owned NodeRefs to shared roots altogether. But I'm not planning to include any of that in this PR.

@Mark-Simulacrum
Copy link
Member

@bors r+

Let's land this so we can make some progress, and we can continue iterating in future PRs if necessary.

@bors
Copy link
Contributor

bors commented Jan 29, 2020

📌 Commit ba87a50 has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 29, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Jan 30, 2020
…ark-Simulacrum

BTreeMap: tag and explain unsafe internal functions or assert preconditions

rust-lang#68418 concluded that it's not desirable to tag all internal functions with preconditions as being unsafe. This PR does it to some functions, documents why, and elsewhere enforces the preconditions with asserts.
bors added a commit that referenced this pull request Jan 30, 2020
Rollup of 6 pull requests

Successful merges:

 - #66648 (Implement clone_from for BTreeMap and BTreeSet)
 - #68468 (BTreeMap: tag and explain unsafe internal functions or assert preconditions)
 - #68626 (Use termize instead of term_size)
 - #68640 (Document remaining undocumented `From` implementations for IPs)
 - #68651 (Document `From` implementation for NonZero nums)
 - #68655 (Fix revision annotations in borrowck-feature-nll-overrides-migrate)

Failed merges:

r? @ghost
@bors
Copy link
Contributor

bors commented Jan 30, 2020

⌛ Testing commit ba87a50 with merge 3024c4e...

@bors bors merged commit ba87a50 into rust-lang:master Jan 30, 2020
@ssomers ssomers deleted the btreemap_prefer_middle branch January 30, 2020 08:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants