Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't require a clone in Solver's methods #18

Open
Eh2406 opened this issue Oct 6, 2020 · 8 comments
Open

Don't require a clone in Solver's methods #18

Eh2406 opened this issue Oct 6, 2020 · 8 comments

Comments

@Eh2406
Copy link
Member

Eh2406 commented Oct 6, 2020

Is there a way to avoid the clones and allocation in https://github.com/mpizenberg/elm-test-rs/blob/31596f874a6e9340350de3e7c783a5cde9ca396b/src/solver.rs#L439? In Cargo we use Rc<Vec<Summary>> https://github.com/rust-lang/cargo/blob/07162dbaa84aaad97d6f3be5069265e784179bcd/src/cargo/core/resolver/dep_cache.rs#L35, so that a cache hit is and Rc clone instead of a Vec clone. I wonder if there is a trate bound that works for the algorithm and allows returning Vec<>, Rc<Vec<_>>, im::Vector<_>, or even a leaked &'static [_]?

Originally brought up in #15 (comment)

@mpizenberg
Copy link
Member

We are mostly interested in the iterating over versions to pick one, which is done in PartialSolution::pick_version. Maybe using a impl Iter<V> in that function would make more sense. Which means we can also have impl Iter<V> on list_available_versions()? that would avoid the need to generate a new Vec. But I think I tried it and the compiler was not happy with my usage of impl. I might have done something wrong. Maybe a dyn was needed and I didn't know if it was worth it. One thing that could be quite nice if we manage to have Iter somwhere in that signature is that from a user point of view, a strategy to choose the oldest version for testing purposes for example only cost rev() on an iterator which is cheap I imagine.

Otherwise, I opted for Vec as a first iteration "make it simple, make it work". But I suppose avoiding a clone with an Rc is good. I probably avoided the slice there because of the life time. Haven't really grasp yet what 'static means except making the compiler happy :)

@aleksator
Copy link
Member

'static means the reference to the item is going to be alive for the whole duration of the program.

As an example:
All &strs that you create with double quotes, e.g. let s = "hello", are actually let s: &'static str = "hello".
They are kept inside the binary and not allocated on the heap, and binary itself is obviously present at any point of program's execution; that's why they are 'static.

@mpizenberg
Copy link
Member

'static means the reference to the item is going to be alive for the whole duration of the program.

Doesn't sound like a good thing XD

@mpizenberg
Copy link
Member

mpizenberg commented Oct 7, 2020

One thing that could be quite nice if we manage to have Iter somwhere

However, the nice thing about having an actual Vec or Rc<Vec> is that we could cache it internally, whereas I'm not sure if that's easy to do with an iterator on references. At some point we where thinking with Alex about caching things we ask multiple times inside pubgrub. This may have two advantages, (1) faster even if the retriever does not cache itself, (2) we might actually want to prevent cases where the retriever answers differently for the same request, for correctness of the algorithm. Another advantage related to (2) in the case of the solver compared to the retriever, is that if we make hypothesis (2) it makes sense to cache the answer, while from the context of the retriever, you never know if a new version exists at the time of list_available_versions, so it could be correct to always ask the file system/network. In elm-test-rs, I'll just say that list_available_versions never hits the network, cache updates of available versions have to be done before calling the solver by the caller.

Regarding get_dependencies(), I wonder if we could also ask for an iterator, or if for the same reasons (internal caching VS references) asking for an owned data structure is better (with Rc then). In the iterator case, it's the responsibility of the caller to not give duplicate package entries, or else they will be overwritten.

PS: caching is probably a concern for another issue, but directly related to this one so I wanted to add it here

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 13, 2020

We can change resolve to only use iterators. So I started with returning Iterator<Item = V>, but Vec is not an iterator it is an IntoIterator, so lets try:

pub trait DependencyProvider<P: Package, V: Version> {
    type IV: IntoIterator<Item = V>;
...
    fn list_available_versions(&self, package: &P) -> Result<Self::IV, Box<dyn Error>>;

    type ID: IntoIterator<Item = (P, Range<V>)>;
...
    fn get_dependencies(
        &self,
        package: &P,
        version: &V,
    ) -> Result<Option<Self::ID>, Box<dyn Error>>;
...
}

This works for Vec<V>, BTreeSet<V>, and im::Vector<V>, but impl IntoIterarater is not supported (yet?) for associated types. So anything more complicated than a plane owned type is really painful or does not work. So no BTreeMap.keys().cloned().rev(), not even Box<Vec<V>>.

Looking back, I may just have rediscovered @mpizenberg's point:

But I think I tried it and the compiler was not happy with my usage of impl.

So if impl does not work lets try using dyn, but dyn can't live on its own so it is really Box<dyn Iterator<Item = V> + '_>.

pub trait DependencyProvider<P: Package, V: Version> {
...
    fn list_available_versions<'s>(
        &'s self,
        package: &P,
    ) -> Result<Box<dyn Iterator<Item = V> + 's>, Box<dyn Error>>;
...
    fn get_dependencies<'s>(
        &'s self,
        package: &P,
        version: &V,
    ) -> Result<Option<Box<dyn Iterator<Item = (P, Range<V>)> + 's>>, Box<dyn Error>>;
...
}

This lets us do all kinds of fancy things, but we have an allocation anyway. Is it better to have a Box<_> vs a Vec?

Looking back, I may just have rediscovered @mpizenberg's point:

Maybe a dyn was needed and I didn't know if it was worth it.

So not having found a good api, I am leaning toward leaving things alone.

@aleksator
Copy link
Member

This lets us do all kinds of fancy things, but we have an allocation anyway. Is it better to have a Box<_> vs a Vec?

What is the usage of such API? What would returning for example a Vec look like?

So not having found a good api, I am leaning toward leaving things alone.

What about RC like the original post suggested?

@mpizenberg
Copy link
Member

What about RC like the original post suggested?

For versions, the algorithm needs to pick one from those provided. The code to pick one only needs references and will only clone the picked version. So avoiding to clone the whole vec of versions thanks to Rc should be a perf improvement in theory. This would have to be benchmarked though to see under which conditions the performance improvements are worth it / how much it degrades performances for example when there is only one version per package.
It could also make sense to ask for a &[V] since we don't need owned versions and are cloning only one of those. The lifetime should be different than the one of the dependency provider though since it can change order of versions. I don't think I tried that. An iterator of references would be better performance-wise I suppose, as soon as there are usually more than 1 version per package when we do things like reversing the order, or custom orders (like some preferred versions are listed first, because they are on the file system for example). Also when the retriever does not store versions in a Vec, such as in a Set for example.

For dependencies, the code using those (from_dependencies(..., deps: &Map<P, Range<V>>)) is currently using a reference to the hashmap. But it's actually not a good thing since we need owned values when building the dependency kind. So we should probably change the signatures of incompatibility::from_dependencies and incompatibility::from_dependency to take owned values. In that case, the Rc around the Map would not be useful. We might want to have Rc around ranges but that's another level of API complexity that should be reflected everywhere else we use those ranges to have an effect. So returning an owned hashmap here is probably the right choice. An iterator of owned values may be better to avoid the cloning costs of the hasmap's structure, or when the dependency retriever doesn't store dependencies with hash maps internally. But again that should be benchmarked.

@Eh2406
Copy link
Member Author

Eh2406 commented Oct 27, 2023

but impl IntoIterarater is not supported (yet?) for associated types.

This will be stabilized in a few weeks. We could now have fn get_dependencies<'s>(...) -> Result<Option<Box<impl IntoIterator<Item = (P, Range<V>)> + 's>>, Box<dyn Error ...>>;. Current implementations would only be required to change their signature not the body of their implementation. Other implementations can avoid a clone by returning an iterator directly. (Which is a not insignificant speed up for OfflineDependencyProvider.) Although this signature is starting to get pretty gnarly. So I don't know if it's worth it. (Also are cashing dependency provider example will need a pretty thorough rewrite for dumb reasons.)

Another stable way to get this Vantage is to use Cow. Something like fn get_dependencies(...) -> Result<Cow<'s, Dependencies<P, VS>>, Box<dyn Error ...>>. Which would require existing implementations to either add Cow::Owned around the return, or remove a clone() and add a Cow::Borrowed around the return. Still to gnarly.

zanieb added a commit that referenced this issue Feb 3, 2024
github-merge-queue bot pushed a commit that referenced this issue Mar 11, 2024
Co-authored-by: Zanie Blue <[email protected]>

Otherwise, it's not possible to implement custom formatting of `Range` types. This seems generally useful to expose.

Example usages:

https://github.com/astral-sh/uv/blob/8d721830db8ad75b8b7ef38edc0e346696c52e3d/crates/uv-resolver/src/pubgrub/report.rs#L97-L112

https://github.com/astral-sh/uv/blob/8d721830db8ad75b8b7ef38edc0e346696c52e3d/crates/uv-resolver/src/pubgrub/report.rs#L549-L560

https://github.com/astral-sh/uv/blob/8d721830db8ad75b8b7ef38edc0e346696c52e3d/crates/uv-resolver/src/pubgrub/report.rs#L568-L605

Upstream port of astral-sh#18, but `impl Iterator<Item = (&Bound<V>, &Bound<V>)>` instead of `impl Iterator<Item = &(Bound<V>, Bound<V>)>` to avoid constraining it to a tuple.

Co-authored-by: Zanie Blue <[email protected]>
@Eh2406 Eh2406 added this to the v0.4 Dependency constraints milestone Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants