Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend entry API to work on borrowed keys. #1769

Closed
wants to merge 13 commits into from

Conversation

cristicbz
Copy link

@cristicbz cristicbz commented Oct 12, 2016

Rendered

Playground

Prototype Implementation

Analysis of a Preliminary Crater Run

Alternative to #1203 and #1533 .

cc @aturon @gankro @gereeter
(@aturon this works around the coherence issues we were talking about so it's fully general!)

@cristicbz cristicbz mentioned this pull request Oct 12, 2016
// ...

*string_map.entry("foo").or_insert(0) += 1; // Clones if "foo" not in map.
*string_map.entry("bar".to_string()) += 1; // By-value, never clones.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is this missing .or_insert(0)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@llogiq
Copy link
Contributor

llogiq commented Oct 12, 2016

This looks like a good way to avoid yet more clones while making entry (_) more ergonomic. I think the IntoOwned trait (as a superset of ToOwned in particular can be useful in other libraries, too, which should be listed as an advantage.

@cristicbz
Copy link
Author

cristicbz commented Oct 12, 2016

@llogiq I added a counterpoint to the 'Drawbacks' section that IntoOwned may be independently useful in generic contexts. Thanks!

Also added drawback pointing out that impl would be insta-stable.

@burdges
Copy link

burdges commented Oct 12, 2016

I've only given this a cursory glance, so maybe I do not understand it yet, but..

There are plenty of situations where delaying the duplication of the key sounds useful, but also many just call for better use of hash functions too, i.e. If the key value still exists elsewhere, then you likely only need the hash, not the key itself. Or maybe that's kinda what you're effectively trying to do here?

@cristicbz
Copy link
Author

@burdges I'm not sure I follow what you're saying:

If the key value still exists elsewhere, then you likely only need the hash, not the key itself. Or maybe that's kinda what you're effectively trying to do here?

Are you suggesting storing only the hash in the map? The map needs to own the key itself if on insertion, regardless of whether the it exists elsewhere or not. This is to be able to compare for equality when hashes match. BTreeMap is an even clearer case of having to own the key on insertion.

Maybe I'm misunderstanding?

@cristicbz
Copy link
Author

cristicbz commented Oct 12, 2016

This concern has now been included in the RFC, you can skip reading this comment

Ok, so I ran into trouble: the `{Vacant,Occupied,}Entry` types have a `.key() -> &K` method on them. My proposed implementation would have stored a `Q: IntoOwned` in the entries rather than a `K`.

I can see two options, both far from perfect:

  1. In OccupiedEntry keep on returning a reference to the key already in the map. In VacantEntry always call into_owned on construction to have a reasonable thing to return.
  2. Swap the type argument order for Entry to Entry<'a, Q, V, K=Q> where Q: IntoOwned<K> and have key() return &Q. I think this would not break any build since all current legal uses of Entry will return entries with Q=K. However, the semantics around OccupiedEntry-s return value would change: currently it returns a reference to the key in the map, now it would return a reference to the key passed in on construction. These keys are equal in the Eq sense, but not the same object referentially.

I'd really appreciate some ideas around this.

@Diggsey
Copy link
Contributor

Diggsey commented Oct 12, 2016

@cristicbz Can't Entry only implement the key() method when Q = K? That way it can have identical semantics (returning the real key for occupied entries, and the provided key for vacant entries).

You could add additional helper methods for accessing the key when Q != K, or the user can pattern match to get a vacant or occupied key (which will have their own methods to access the key value, of type Q or K respectively.

@cristicbz
Copy link
Author

cristicbz commented Oct 12, 2016

@Diggsey Ah yes! That's a good point, using impl<K, V> Entry<K, V, K> { fn key(&self) -> &K {} } would work! I've been doing (2) so far in my implementation, but that I could change.

Right now I'm running into trouble over the comparison implementation. I sort of waved my hand in the POC, by always performing into_owned. Requiring K: Borrow<Q> doesn't work because of an extra reference frustratingly; you'd need String: Borrow<&str>. Any ideas would be greatly appreciated! https://is.gd/F0m5Xg

It feels like there should be a natural way of extending the ability to do key.eq(other.borrow())

@burdges
Copy link

burdges commented Oct 12, 2016

In hind sight, I think my comment is kinda off topic for this pull request @cristicbz . I said basically :

If you need this for performance, then maybe you should use probabilistic data structures more aggressively instead. In particular, you might use a data structure and algorithm where you can proceed if the hashes match and need not worry about the actual key being correct.

Rust is kinda lacking on probabilistic data structures though, and they always require more detailed analysis, including estimating the expected table size in advance. It's clear someone might want this optimization without going through the hassle of tweaking cuckoo hash table parameters or something.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Oct 12, 2016
@Kixunil
Copy link

Kixunil commented Oct 12, 2016

Actually, we don't need IntoOwned. Into<Cow<K>> is sufficient, because you can do this: k.into().into_owned(). I believe compiler will optimize that away. (Reminder: every T: ToOwned impls Into<Cow<K>> and every T: Clone impls T: ToOwned.)

The only problem is that people might accidentally pass in e.g. &String instead of String even though they don't need the key anymore. This might be solved by some lint. (Generally, the lint should warn if you pass variable by reference even though you don't use it anymore and it would work with owned type.)

And yes, Into<Cow<K>> might look weird, so we might just create blanket impl for IntoOwned.

@cristicbz
Copy link
Author

cristicbz commented Oct 12, 2016

@Kixunil I tired to explain the problem with Into<Cow<K>> approach at the beginning of the Approach section.

The gist is it requires clone-able keys. So it'd work fine for String etc. But it would break backwards compatibiliy for !Clone keys which work now and wouldn't anymore.

And yes, Into<Cow<K>> might look weird, so we might just create blanket impl for IntoOwned.

That's roughly what the impl

impl<'a, T: ?Sized + ToOwned> RefIntoOwned for &'a T {
    type Owned = <T as ToOwned>::Owned;
    fn ref_into_owned(self) -> T::Owned { (*self).to_owned() }
}

expresses.

@cristicbz
Copy link
Author

cristicbz commented Oct 12, 2016

This solution has now been simplified and included in the RFC, you can skip reading this comment

[New playground link!](https://play.rust-lang.org/?gist=97b23ca418b29b0009fbfab2299874f3&version=nightly&backtrace=1)

Ok, so I think I worked out a solution for the Borrow conundrum. We need another trait!

pub trait AsBorrowOf<T, B: ?Sized>: IntoOwned<T> where T: Borrow<B> {
    fn as_borrow_of(&self) -> &B;
}

impl<T> AsBorrowOf<T, T> for T {
    default fn as_borrow_of(&self) -> &Self {
        self
    }
}

impl<'a, B: ToOwned + ?Sized> AsBorrowOf<B::Owned, B> for &'a B {
    default fn as_borrow_of(&self) -> &B {
        *self
    }
}

Then, the signature of Entry becomes:

#[stable(feature = "rust1", since = "1.0.0")]
pub fn entry<Q, B: ?Sized>(&mut self, key: Q) -> Entry<Q, V, K>
    where Q: AsBorrowOf<K, B>,
          K: Borrow<B>,
          B: Hash + Eq {
    // Gotta resize now.
    self.reserve(1);
    self.search_mut(key.as_borrow_of()).into_entry(key).expect("unreachable")
}

The trick is to not ever use Q directly. Instead find a B to which can get from both Q and K. This avoids needing to create a &&T via Borrow and other similar issues.

There are two remaining issues as far as I can tell:

  • Type inference breakage. Consider the following code which compiles today:
  let hash_map = HashMap::new();
  *hash_map.entry("hello").or_insert(0u64) += 1;

currently the inference hash_map: HashMap<&'static str, u64> is valid, but it will stop being valid and may require more type annotations to build.

  • The other issue is stranger and I think it's a compiler bug. Currently the following code compiles
fn generic<K: Hash + Eq + Borrow<Q>, V, Q>(map: &mut HashMap<K, V>, key: K) {
  map.entry(key)
}

but after changing the signature of Entry, it fails with: trait 'K: AsBorrowOf<K, Q>' not satisfied. Adding a superfluous K: Borrow<K> fixes it (despite there being a blanket impl<T> Borrow<T> for T):

fn generic<K: Hash + Eq + Borrow<Q> + Borrow<K>, V, Q>(map: &mut HashMap<K, V>, key: K) {
  map.entry(key)
}

I'll update the RFC with this information and add a link with a preliminary rustc branch tomorrow.

@withoutboats
Copy link
Contributor

withoutboats commented Oct 12, 2016

I wonder if there are specialization extensions which would enable the approach ruled out at the beginning, rather than adding IntoOwned. I notice that IntoOwned has the same type signature as simple Into, and I'm worried about adding more identical traits with untyped contracts to the standard library (we already have this with AsRef/Borrow for example).

@Kixunil
Copy link

Kixunil commented Oct 13, 2016

@cristicbz oh, seems like I skimmed through it too fast. Sorry. Anyway, I think IntoOwned seems useful.

@cristicbz
Copy link
Author

cristicbz commented Oct 13, 2016

@withoutboats IntoOwned<K> could just become Into<K> if the following signature existed:

impl<'a, T: ?Sized> From<&'a T> for T::Owned where T: ToOwned { /* ... */ }

(edit: I fixed the above signature which was broken before)

Not sure if there is any reasonable way of adding this backwards compatibly (even with specialization). But maybe there is, there's a lot I don't get about how far we can push coherence.

I don't know if we can get around expressing somehow though AsBorrowOf<K, B>. It's like a type function (with output B --- which could be an assoc type param if we had full assoc type specialization) whose purpose is to borrow K while leaving borrows of K unchanged.

Maybe some arcane magick exists to sort this out some other way.

The counterpoint to 'identical traits with untyped contracts' is that they allow distinguishing between semantics (referring to Borrow requiring Hash/Ord + Eq equivalence). To be clear, it'd make me happy to get rid of IntoOwned specifically though. If we don't stabilise it however, it can be swapped out backwards compatibly for Into when we gain the ability to express the From constraint at the top.

@cristicbz
Copy link
Author

cristicbz commented Oct 13, 2016

This solution is now the main version proposed in the RFC, you can skip reading this comment

So it turns out these three traits can all be compacted into a single one without specialisation:
pub trait AsBorrowOf<T, B: ?Sized>: Sized where T: Borrow<B> {
    fn into_owned(self) -> T;
    fn as_borrow_of(&self) -> &B;
}

impl<T> AsBorrowOf<T, T> for T {
    fn into_owned(self) -> T { self }
    fn as_borrow_of(&self) -> &Self {
        self
    }
}

impl<'a, B: ToOwned + ?Sized> AsBorrowOf<B::Owned, B> for &'a B {
    fn into_owned(self) -> B::Owned { self.to_owned() }
    fn as_borrow_of(&self) -> &B {
        *self
    }
}

The meaning of impl AsBorrowOf<K, B> for Q is:

  • Q can be borrowed to B just like K and
  • Q can be by-value converted to K (potentially).

For entry that's exactly what we want to express:

  • There exists a B such that Q and K can both be borrowed as &B. This enables hashing & comparison.
  • Q can be converted to a K once if needed (on vacant insert).

It is annoying that for a given K and Q there is a single B that makes sense, but I wasn't able to make coherence work with B as a type parameter.

@cristicbz
Copy link
Author

OK now that I've written an implementation and found a bunch of issues, I updated the RFC to address them and moved to the single-trait-no-specialization AsBorrowOf version. please have a look! cc @aturon @llogiq @Diggsey @withoutboats @SimonSapin

@cristicbz
Copy link
Author

cristicbz commented Oct 14, 2016

I added an analysis of the crater run results. Manually went through each one, created a minimal test case and described the causes and results.

In short, there are 9 irreconcilable regressions caused by actual inference failures introduced by widening the allowed argument types of Entry, split about evenly between:

  • Short lived local HashMap with reference key types whose type can no longer be inferred from .entry(some_reference).
  • Usage of Into to convert from &str -> String in calls like string_map.entry(some_str.into()) (this is ambiguous since both the intended conversion and &str -> &str are valid).

Instances of rust-lang/rust#37164 and rust-lang/rust#37138 appear once each.

@cristicbz
Copy link
Author

@Skrapion I think that case would be supported with the proposed API by adding a Query impl either for the the query type itself or for a wrapper of it (the return type of your query_factory).

@cristicbz
Copy link
Author

@aturon @withoutboats I'd love to get this going again, what do you think of the entry_or_clone description I gave in my previous comment? I think that's the extent to which the complexity can be pushed into Entry. And maybe it's worth it for the explicitness of clone in the name rather than silently cloning an e.g. &String even though passing in a String would have been acceptable as well.

@aturon
Copy link
Member

aturon commented Aug 3, 2017

@cristicbz Gah, I'm incredibly sorry I've let this sit idle for so long again; it just never reaches the top of my stack. Please feel free to ping me on IRC or Twitter to get my attention back on it. I will push toward a resolution ASAP.

@Gankra
Copy link
Contributor

Gankra commented Aug 3, 2017

Ok I've been roped into helping out here. Disclaimer: I am so bloody sick of this problem and I am fairly certain it will haunt me on my death bed.

As such I am inclined to pick the maximally flexible solution, no matter how ugly it is. So presumably the second option in #1769 (comment)

People will ask what the heck is up with it, and we'll shrug and say "computers are bad".


With that out of the way, I would like to discuss where we're heading with this in bigger picture terms. This is basically a big aside, and everyone can ignore it if they want. Again, I want this to all go away, and the choice I mentioned above does that... for now.

So the Entry API was borne of two things:

  • borrowck bites, and won't let us double lookup naturally
  • double lookup is inefficient anyway

In solving this issue, we took on several other constraints:

  1. it should be easy to use for common cases (entry(k).or_insert(v), introduced in refinements)
  2. it should be possible to use effeciently in complex cases (match entry(k) { ... }, the "core" API)
  3. it should be hard to misuse

Constraint (3) mostly meant we won't let you give one thing for lookup, and then another thing for the actual insertion. I fear this constraint has led us awry, and caused us to sacrifice too much of (2).

As a result later generations ran into several issues in serving (3).

  • the key must be owned, and that's bad if you have a borrowed key and the entry is Occupied
  • but it's just as bad if the key must be borrowed, you have an owned one, and it's Vacant
    • Also this wasn't compatible with typemap or something
  • and also we can't require that keys be Clone/ToOwned

Which is what this RFC is trying to address.

However I forsee further complaints that this still isn't addressing these other common complaints:

  • I can't memoize hashes
  • I can't implement custom searches without newtypes

I would like to propose a rejection of (3), producing an incredibly flexible API that you can legitimately misuse and lose your keys with.

It is a three-phase entry:

let map = ...;
let key = ...;

// Start the algorithm
let raw_entry = map.raw_entry();

// First do hashing (can memoize)
let hashed_entry = raw_entry.hash_with(|hasher| -> u64 {
  key.handle_hashing(hasher)
});

// Then search, transforming both keys however you need
let searched_entry = hashed_entry.search_with(|found_key| -> bool {
  key.field == &**found_key.whatever
);

// And finally handle the entry
let val = match searched_entry {
  Occupied(entry) => entry.into_mut(),
  
  // And you can just give the dang key here 
  Vacant(entry) => entry.set(key.clone(), 0),
}

*val += 1;

I believe this would handle everyone's usecase for this nonsense from now until eternity. It would be as efficient as possible. It would put no constraints on implementations. And it lets you misuse it as much as you please. (BTreeMap would just remove the hash_with part, and search_with would be -> Ordering)

We could also add conveniences so you can do raw_entry.hash(&key).search(&key).or_insert_with(key, val) or whatever.

@cristicbz
Copy link
Author

@gankro When you say 2nd option, to be maximally clear, do you specifically mean entry_or_clone + no blanket impl or would you be open to just extending the existing entry method with corresponding ergonomic gains, but inference / turbofish-inducing breakage and potential accidental cloning?

Regarding your sketch proposal, it sounds a little like #1533. My personal preference would be for something like that to be added in_addition_ to this RFC.

For the common case of borrow+clone (like word count, or for caching) , raw_entry feels like dropping too many levels of abstraction, but as you point out, examples like @shepmaster 's would still need the power provided by it.

On mobile and thumbs hurt, but thanks a lot for breathing in some new life into this RFC ; it would make me so happy to see it resolved one way or another.

@Gankra
Copy link
Contributor

Gankra commented Aug 4, 2017

I was suggesting taking "entry_or_clone with explicit Query bound, Query without ToOwned blanket" (I would call it cow_entry/entry_cow, but that's a minor quibble). It's ugly, but it gives our users the the most (I think? there's a lot of moving parts and it's hard to keep them all in my head at once.)

I agree that landing this on its own is probably fine.

@aturon
Copy link
Member

aturon commented Aug 9, 2017

@cristicbz So I spent some time re-reading the thread, still haunted by the feeling that there must be a way we can do better.

I wonder whether we could consider deprecating ToOwned, and introducing a new trait that is identical except that the associated type instead becomes a type parameter (permitting a many-to-many relationship). We could provide blanket impls based on Clone, as today. Is there any way that can lead to a solution with the same flexibility, but with fewer traits over all?

@cristicbz
Copy link
Author

cristicbz commented Aug 19, 2017

I've been travelling, so sorry if I'm slow to reply.

Two things here:

  1. I have the feeling that ToOwned was specifically designed with an associated type to fix some kind of coherence issue, but I can't find a mention of it in the discussion of the collections reform RFC; I think the only way to figure this out is to try to replace ToOwned with a type parameter (without actually merging this change) to see if any of the trait impl-s breaks.
  2. I think that adding this said trait, would still require a Query-like trait, but would at most get rid of explicit extra impl-s (instead being able to provide a NewToOwned-based blanket impl of Query). It might actually not work even for that though.

It's a bit laborious, but I'd need to experiment to see how such an impl would pan out.

@Binero
Copy link

Binero commented Sep 3, 2017

While far from addressing all the issues this RFC addressed, I submitted a temporary fix (non-breaking) for some of the issues with Entry: rust-lang/rust#44278

@cristicbz
Copy link
Author

Sadly, I do not have the resources I had when I initially wrote this RFC and I have left it in lmbo for way too long a time. I appreciate everyone's time and feedback and I think some really useful research came out of this, but it doesn't seem like the initially proposed solution is good enough to get merged.

Rather than keep this RFC open and modifying it many times asynchronously, it'd better for someone else to come with a fresh perspective and improve on the ideas from this conversation, crystallising them into a new RFC. Then it would be easier to request a fresh batch of comments and feedback from the community on that new design.

@cristicbz cristicbz closed this Feb 28, 2018
@burdges
Copy link

burdges commented Feb 28, 2018

Just as an aside, I'd love to see (a) more probabilistic data structures implemented in Rust and (b) an effort to unify their interface in some trait hierarchy, but with both happening outside the core language.

@burdges burdges mentioned this pull request May 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.