feat: add a simplify for error messages #156

Merged
merged 5 commits into from
Nov 29, 2023

Eh2406
Member

@Eh2406 Eh2406 commented Nov 22, 2023

Ranges generated by PubGrub often end up being verbose and pedantic about versions that do not matter. To take an example from #155, 2 | 3 | 4 | 5 could be stated more concisely as >=2, <=5, given that no versions exist between the integers. More generally, it could be expressed as >=2 if the list of available versions were taken into account.

The logic for simplifying a VS given a complete set of versions feels like it should be simple, but it is trickier to implement than I expected, especially if you want to guarantee O(len(VS) + len(versions)) time and O(1) allocations.
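To make the goal concrete, here is a deliberately naive sketch of what simplifying against a version list means. It is not the implementation from this PR: it uses inclusive integer segments, it is quadratic, and it allocates freely. The names `Seg`, `contains`, and `simplify` are hypothetical and chosen only for illustration.

```rust
/// Inclusive integer segment; `None` means unbounded on that side.
/// (Hypothetical stand-in for the real range type.)
type Seg = (Option<u32>, Option<u32>);

/// Does any segment accept `v`?
fn contains(segs: &[Seg], v: u32) -> bool {
    segs.iter()
        .any(|&(lo, hi)| lo.map_or(true, |l| l <= v) && hi.map_or(true, |h| v <= h))
}

/// Naive simplify: walk the sorted available versions and emit one segment
/// per maximal run of consecutive available versions that `segs` accepts.
/// A run touching the first or last available version becomes unbounded there.
fn simplify(segs: &[Seg], versions: &[u32]) -> Vec<Seg> {
    let mut out: Vec<Seg> = Vec::new();
    // Current run: (start, last matched version); `start == None` is unbounded.
    let mut run: Option<(Option<u32>, u32)> = None;
    for (i, &v) in versions.iter().enumerate() {
        if contains(segs, v) {
            run = Some(match run {
                Some((start, _)) => (start, v),
                None if i == 0 => (None, v),
                None => (Some(v), v),
            });
        } else if let Some((start, last)) = run.take() {
            // A rejected available version closes the current run.
            out.push((start, Some(last)));
        }
    }
    // A run that reaches the last available version is unbounded above.
    if let Some((start, _)) = run {
        out.push((start, None));
    }
    out
}
```

With the set 2 | 3 | 4 | 5 and available versions 1 through 5, this sketch returns the single segment (Some(2), None), i.e. >=2. Note that the result also accepts version 6, which the original set did not; that is harmless only because 6 is not an available version.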

While working through the logic, I had to implement a check for which versions from a list match, in O(len(VS) + len(versions)) time, which felt useful enough to be worth including in the API. In implementing that, I noticed that our implementation of contains did not aggressively short-circuit. The short-circuiting implementation is just as easy to read, so I decided to include that as well.
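As a rough illustration of the linear-time check, the sketch below walks a sorted list of segments and a sorted list of versions with two forward-only cursors. It assumes `u32` versions and sorted, non-overlapping segments; the function names are hypothetical and need not match what the PR adds to the API.

```rust
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// A segment is a pair of bounds; segments are assumed sorted and non-overlapping.
type Segment = (Bound<u32>, Bound<u32>);

/// True if `v` lies before the start of `seg`.
fn below_lower_bound(v: u32, seg: &Segment) -> bool {
    match seg.0 {
        Included(start) => v < start,
        Excluded(start) => v <= start,
        Unbounded => false,
    }
}

/// True if `v` lies past the end of `seg`.
fn above_upper_bound(v: u32, seg: &Segment) -> bool {
    match seg.1 {
        Included(end) => v > end,
        Excluded(end) => v >= end,
        Unbounded => false,
    }
}

/// For each version (sorted ascending), report whether some segment contains it.
/// Both cursors only move forward, so the work is O(len(segments) + len(versions)).
fn contains_many(segments: &[Segment], versions: &[u32]) -> Vec<bool> {
    let mut out = Vec::with_capacity(versions.len());
    let mut i = 0;
    for &v in versions {
        // Skip segments that end before `v`; later versions can never match them.
        while i < segments.len() && above_upper_bound(v, &segments[i]) {
            i += 1;
        }
        // The first segment not entirely below `v` either contains it or starts after it.
        out.push(i < segments.len() && !below_lower_bound(v, &segments[i]));
    }
    out
}
```

This version collects into a `Vec` for brevity; returning an iterator instead would avoid the allocation.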

@Eh2406
Member Author

Eh2406 commented Nov 22, 2023

cc @zanieb, I think this can be used to dramatically improve your error messages.

cc @baszalmstra mamba-org/resolvo#2 (comment): I don't know if these are improvements you'd be interested in including in your copy of this type.

A future possibility is figuring out when it is safe for PubGrub to ask for a range to be simplified during processing. This would be enormously helpful for #135. But needs to be done with extreme care. Basic set properties do not hold if simplify is added to the equation: A.intersection(A.negate()) == empty is true, but simplify(A).intersection(simplify(A.negate())) == empty is not.

@mpizenberg
Member

It feels like a very good idea for error reporting where solver logic is not important anymore, and readability is much more useful. Also at this point the potential costs are likely negligible compared to the solving costs?

src/range.rs Outdated
(Excluded(start), Excluded(end)) => v > start && v < end,
(Excluded(start), _) => v <= start,
(Included(start), _) => v < start,
(Unbounded, _) => false,
Member

Isn't this wrong if the very first segment is (Unbounded, Unbounded)?

Member

This whole short-circuit change feels shady ...

Member Author

The false here means that it fails to short-circuit and falls through to the check below, which correctly returns true. That being said, I needed to reread it twice to figure out what was going on here, and I wrote the code yesterday. There has to be a clearer way to write this.

Similarly, we can probably improve the testing to ensure that the code is correct, even when we're not paying attention. I will need to look into what is worth the effort. (One day, I would love to use kani or creusot to prove the code correct, not that I have time for it now.)

@Eh2406
Member Author

Eh2406 commented Nov 24, 2023

I tinkered with the code to make it more "obviously correct". While I was at it, I noticed that std provides some helpful methods which simplified a few things.

I looked at our test code generation, and I'm pretty confident it will cover all the corner cases.

src/range.rs Outdated
(Excluded(start), Included(end)) => v > start && v <= end,
(Excluded(start), Excluded(end)) => v > start && v < end,
} {
if !within_lower_bound(segment, v) {
Member

@mpizenberg mpizenberg Nov 24, 2023

Ah ok, I'm getting it now. The naming sounds a bit weird? "within bounds" makes sense, but "within lower bound" feels odd. I'd say something like "above lower bound", "below upper bound". Or if we want to avoid the negation, use "below" in both cases, like this:

if version_below_lower_bound( v, segment ) {
  return false;
} else if version_below_upper_bound( v, segment ) {
  return true;
}

PS, there is a typo in within_uppern_bound with an "n" at the end of "upper".

Member

@mpizenberg mpizenberg Nov 24, 2023

Or alternatively, a method on the bounded range maybe:

if segment.is_above(v) {
  return false;
} else if !segment.is_below(v) {
  return true;
}

I'm not a super fan of the not ! in there though.

EDIT: forget that, we don't have a type for the segment

Member Author

How about a within_bounds that returns a https://doc.rust-lang.org/std/cmp/enum.Ordering.html ?

Member

So taking Ordering as a slightly richer Bool. It's tempting. In that case it's important to have the version argument before the segment argument (within_bounds(v, segment)) because ordering can be confusing.
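One possible shape for that idea, as a sketch only: `within_bounds` takes the version first and the segment second, and returns `Less` when the version is before the segment, `Equal` when it is inside, and `Greater` when it is after. The signature and the `contains` helper below are assumptions for illustration, not the code that was merged.

```rust
use std::cmp::Ordering;
use std::ops::Bound::{self, Excluded, Included, Unbounded};

/// Where does `v` sit relative to `segment`?
/// Less: before the segment, Equal: inside it, Greater: after it.
fn within_bounds<V: Ord>(v: &V, segment: &(Bound<V>, Bound<V>)) -> Ordering {
    let below_lower = match &segment.0 {
        Included(start) => v < start,
        Excluded(start) => v <= start,
        Unbounded => false,
    };
    if below_lower {
        return Ordering::Less;
    }
    let above_upper = match &segment.1 {
        Included(end) => v > end,
        Excluded(end) => v >= end,
        Unbounded => false,
    };
    if above_upper { Ordering::Greater } else { Ordering::Equal }
}

/// `contains` over sorted segments, stopping as soon as the answer is known.
fn contains<V: Ord>(segments: &[(Bound<V>, Bound<V>)], v: &V) -> bool {
    for segment in segments {
        match within_bounds(v, segment) {
            // Every later segment starts even higher, so `v` cannot match.
            Ordering::Less => return false,
            Ordering::Equal => return true,
            // `v` is past this segment; keep looking.
            Ordering::Greater => {}
        }
    }
    false
}
```

With this shape, a first segment of (Unbounded, Unbounded) yields `Equal` immediately, so the case raised earlier in the review is handled without a fall-through.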

@mpizenberg
Member

mpizenberg commented Nov 26, 2023

What do you think of splitting the simplify function up a bit, to make it easier to follow? I'm thinking of something like this (pseudo code):

// return the segment index in the range for each version in the range, None otherwise
locate_versions(&self, versions: Iter<V>) -> Iter<Option<usize>>

// group adjacent versions locations
// [None, 3, 6, 7, None] -> [(3, 7)]
// [3, 6, 7, None] -> [(None, 7)]
// [3, 6, 7] -> [(None, None)]
// [None, 1, 4, 7, None, None, None, 8, None, 9] -> [(1, 7), (8, 8), (9, None)]
group_adjacent_locations(locations: Iter<Option<usize>>) -> Iter<(Option<usize>, Option<usize>)>

// simplify range with segments at given location bounds.
keep_segments(&self, kept_segments: Iter<(Option<usize>, Option<usize>)>) -> Self

simplify(&self, versions: Iter<V>) -> Self {
  let version_locations = self.locate_versions(versions);
  let kept_segments = group_adjacent_locations(version_locations);
  self.keep_segments(kept_segments)
}

@Eh2406
Member Author

Eh2406 commented Nov 27, 2023

group_adjacent_locations was harder than I was expecting. It's hard to build out of normal iterator methods, because it may need to return one more result after the underlying iterator has been exhausted. It would be straightforward to implement using generators, if they were stable. But eventually I made it work.
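For illustration, the grouping step can be sketched with `std::iter::from_fn`, which makes it easy to flush one last group at the moment the input runs out. This is an assumed reconstruction from the examples in the pseudo code (a run touching the front gets an unbounded start, a run touching the end gets an unbounded end), not the code in the PR.

```rust
/// Group runs of consecutive `Some` locations into (first, last) pairs.
/// A run at the very front has start `None`; a run reaching the end has end `None`.
fn group_adjacent_locations(
    locations: impl Iterator<Item = Option<usize>>,
) -> impl Iterator<Item = (Option<usize>, Option<usize>)> {
    // Fuse so that calling `next` again after exhaustion is well-defined.
    let mut locations = locations.fuse();
    // True until the first item has been consumed.
    let mut at_front = true;
    std::iter::from_fn(move || {
        // Current run: (start, last index seen); `start == None` is unbounded.
        let mut run: Option<(Option<usize>, usize)> = None;
        loop {
            match locations.next() {
                Some(Some(idx)) => {
                    run = Some(match run {
                        Some((start, _)) => (start, idx),
                        None if at_front => (None, idx),
                        None => (Some(idx), idx),
                    });
                    at_front = false;
                }
                Some(None) => {
                    at_front = false;
                    // A gap closes the current run, if there is one.
                    if let Some((start, last)) = run {
                        return Some((start, Some(last)));
                    }
                }
                // Input exhausted: flush the pending run, unbounded above.
                None => return run.map(|(start, _)| (start, None)),
            }
        }
    })
}
```

The state kept between calls is just the fused input and the `at_front` flag; the pending run lives inside a single call, so the extra result after exhaustion falls out of the final `None` arm.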

@zanieb
Member

zanieb commented Nov 27, 2023

I'll try to use this today!

@mpizenberg
Member

group_adjacent_locations was harder than I was expecting.

If you feel it's harder than it's worth, maybe it's better not to split group_adjacent_locations and keep_segments. I'm not finding more straightforward ways to write it either. Your call.

I don't see your previous version to compare complexity. I guess you force-pushed a rewrite. I haven't followed the CI merging changes. Are PRs still squash-merged, or are they "normal" merges? If they are still squash-merged, there is no need to overwrite your commits while working on the PR. And if they aren't, well, it's slightly annoying to make sure a PR's whole history is clean. I don't want to derail this PR; I just thought it was relevant, as it makes reviewing the PR harder for me.

@Eh2406
Member Author

Eh2406 commented Nov 27, 2023

I think they're worth keeping. I wish they had been easier to come up with or more direct code, but they do make things clear.

I did force push, and will try to avoid doing so while you're reviewing. I got into the habit because it simplifies the history, which is helpful when dealing with several long-lived branches. Hopefully those will become less common. If you scroll through the conversation here on GitHub, it provides a link for every force push, and the compare button shows what changed between each force push.

@konstin
Member

konstin commented Nov 28, 2023

But needs to be done with extreme care. Basic set properties do not hold if simplify is added to the equation: A.intersection(A.negate()) == empty is true, but simplify(A).intersection(simplify(A.negate())) == empty is not.

Could you give an example where this doesn't hold? I tried but couldn't find one.

@konstin
Member

konstin commented Nov 28, 2023

I made a POC for simplifying all terms: astral-sh/pubgrub@main...zanieb:pubgrub:know-thy-versions-rebase. The main problem is that this breaks accum_term.subset_of(&incompat_term), any ideas? I've just inserted the simplification at all the places where we regularly intersect, though it feels like we shouldn't need to build up all the intersections every time in the first place.

Co-authored-by: konsti <[email protected]>
@zanieb
Member

zanieb commented Nov 28, 2023

Hm so I gave this a try by hacking it into the reporter and testing a simple holes case.

b9b0e1d

Before:

Because there is no available version for bar and foo 1.0.0 depends on bar, foo 1.0.0 is forbidden.
And because there is no version of foo in <1.0.0 | >1.0.0, <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0, foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (1)

Because there is no available version for bar and foo 2.0.0 depends on bar, foo 2.0.0 is forbidden.
And because foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (1), foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (2)

Because there is no available version for bar and foo 3.0.0 depends on bar, foo 3.0.0 is forbidden.
And because foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (2), foo <4.0.0 | >4.0.0 is forbidden. (3)

Because there is no available version for bar and foo 4.0.0 depends on bar, foo 4.0.0 is forbidden.
And because foo <4.0.0 | >4.0.0 is forbidden (3), foo * is forbidden.
And because root 1.0.0 depends on foo, root 1.0.0 is forbidden.

After:


Because there is no version of bar in ∅ and foo <=1.0.0 depends on bar ∅, foo 1.0.0 is forbidden.
And because there is no version of foo in ∅, foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (1)

Because there is no version of bar in ∅ and foo 2.0.0 depends on bar ∅, foo 2.0.0 is forbidden.
And because foo <2.0.0 | >2.0.0, <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (1), foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden. (2)

Because there is no version of bar in ∅ and foo 3.0.0 depends on bar ∅, foo 3.0.0 is forbidden.
And because foo <3.0.0 | >3.0.0, <4.0.0 | >4.0.0 is forbidden (2), foo <4.0.0 | >4.0.0 is forbidden. (3)

Because there is no version of bar in ∅ and foo >=4.0.0 depends on bar ∅, foo 4.0.0 is forbidden.
And because foo <4.0.0 | >4.0.0 is forbidden (3), foo * is forbidden.
And because root depends on foo, root 1.0.0 is forbidden.

While we can certainly do something to improve the null cases, I'm not seeing this as a drastic improvement. Perhaps I'm doing something wrong?

@Eh2406
Member Author

Eh2406 commented Nov 28, 2023

But needs to be done with extreme care. Basic set properties do not hold if simplify is added to the equation: A.intersection(A.negate()) == empty is true, but simplify(A).intersection(simplify(A.negate())) == empty is not.

Could you give an example where this doesn't hold? I tried but couldn't find one.

You seem to be correct: this implementation of simplify does uphold this property. The point I was trying to make is that defining what properties simplify must uphold, and making sure that the rest of the algorithm doesn't rely on anything else, requires some care.

I made a POC for simplifying all terms: astral-sh/pubgrub@main...zanieb:pubgrub:know-thy-versions-rebase. The main problem is that this breaks accum_term.subset_of(&incompat_term), any ideas? I've just inserted the simplification at all the places where we regularly intersect,

There are clearly some properties that code is relying on that simplify does not uphold. What they are, I do not yet know. And I would like to separate/delay the conversation about how the algorithm can use simplify until after we merge making the freestanding method useful/available.

though it feels like we shouldn't need to build up all the intersections every time in the first place.

You may want to look at the "accumulated_intersection" and the "fewer_intersections" branches. I'd be happy to discuss other changes to the algorithm to reduce intersections, either on zulip or in an issue.

Hm so I gave this a try by hacking it into the reporter and testing a simple holes case.

This is exactly where I was hoping this freestanding method would be useful. Let's see how much it helped...

While we can certainly do something to improve the null cases, I'm not seeing this as a drastic improvement. Perhaps I'm doing something wrong?

That is not as helpful as I was hoping :-( Clearly, simplify is not working as well as I'd like. Let me experiment with your branch and I will report back.

@Eh2406
Member Author

Eh2406 commented Nov 28, 2023

Got it. Most of the error reporting is based on the terms. Replacing terms: derived.terms.clone(), with

terms: derived
    .terms
    .iter()
    .map(|(p, t)| (p.clone(), t.simplify(versions.get(&p).unwrap_or(&Vec::new()).into_iter())))
    .collect(),

and adding a simplify method to Term that forwards along, I get:

Because there is no version of bar in ∅ and foo <=1.0.0 depends on bar ∅, foo <=1.0.0 is forbidden.
And because there is no version of foo in ∅, foo <2.0.0 is forbidden. (1)

Because there is no version of bar in ∅ and foo 2.0.0 depends on bar ∅, foo 2.0.0 is forbidden.
And because foo <2.0.0 is forbidden (1), foo <3.0.0 is forbidden. (2)

Because there is no version of bar in ∅ and foo 3.0.0 depends on bar ∅, foo 3.0.0 is forbidden.
And because foo <3.0.0 is forbidden (2), foo <4.0.0 is forbidden. (3)

Because there is no version of bar in ∅ and foo >=4.0.0 depends on bar ∅, foo >=4.0.0 is forbidden.
And because foo <4.0.0 is forbidden (3), foo * is forbidden.
And because root depends on foo, root * is forbidden.

@zanieb
Member

zanieb commented Nov 28, 2023

@Eh2406 ah that makes a lot of sense! I'll explore the effect on more error messages then.

@zanieb
Member

zanieb commented Nov 28, 2023

With 53c9f6d we get better handling of the empty ranges

Because there is no available version for bar and foo <=1.0.0 depends on bar, foo <=1.0.0 is forbidden.
And because there is no available version for foo, foo <2.0.0 is forbidden. (1)

Because there is no available version for bar and foo 2.0.0 depends on bar, foo 2.0.0 is forbidden.
And because foo <2.0.0 is forbidden (1), foo <3.0.0 is forbidden. (2)

Because there is no available version for bar and foo 3.0.0 depends on bar, foo 3.0.0 is forbidden.
And because foo <3.0.0 is forbidden (2), foo <4.0.0 is forbidden. (3)

Because there is no available version for bar and foo >=4.0.0 depends on bar, foo >=4.0.0 is forbidden.
And because foo <4.0.0 is forbidden (3), foo * is forbidden.
And because root depends on foo, root * is forbidden.

There's a bit of a problem because "And because there is no available version for foo" should read "And because there is no available version for foo >1.0.0,<2.0.0", and "And because foo <2.0.0 is forbidden (1)" should read "And because foo <2.0.0 is forbidden (1) and there is no available version for foo >2.0.0,<3.0.0".

While attempting to use this simplification code, I got an odd lifetime error with
```
let c = set.complement();
let s = c.simplify(versions);
s.complement()
```
By inlining locate_versions, the lifetimes could be simplified so that this code works.
@Eh2406
Member Author

Eh2406 commented Nov 28, 2023

Right. This simplification code assumes that information about versions that don't exist is unneeded, which is not true when dealing with "NoVersions" or anything derived from them. Because of #155, everything in this example derives from a "NoVersions". I'm open to ideas on how to get us to a better place.

In the meantime, the open question in this PR is whether this code is useful and worth merging.

@zanieb
Member

zanieb commented Nov 28, 2023

I think this is a pretty clear path to better error messages. We can either continue tackling it piecewise by merging or devote a new branch to error messaging and merge the whole thing to dev at once.

@Eh2406
Member Author

Eh2406 commented Nov 28, 2023

This project has been plagued with long-lived branches, so I'm biased toward merging as often as is acceptable.

Furthermore there are at least two independent ways to build on this PR, incorporating it in the algorithm and using it on the output, which could be done in parallel and may each take several attempts.

@mpizenberg
Member

That's a lot nicer error message @zanieb ! If this is useful as-is I'd agree with @Eh2406 that we can merge it.

Eh2406 and others added 2 commits November 29, 2023 11:09
Co-authored-by: Zanie Blue <[email protected]>
Co-authored-by: Zanie Blue <[email protected]>
@Eh2406 Eh2406 added this pull request to the merge queue Nov 29, 2023
Merged via the queue into dev with commit 2b2d8d4 Nov 29, 2023
5 checks passed
@Eh2406 Eh2406 deleted the simplify branch November 29, 2023 16:13
zanieb added a commit to astral-sh/uv that referenced this pull request Dec 12, 2023
Uses pubgrub-rs/pubgrub#156 to consolidate
version ranges in error reports using the actual available versions for
each package.

Alternative to astral-sh/pubgrub#8 which implements
this behavior as a method in the `Reporter` — here it's implemented in
our custom report formatter (#521) instead which requires no upstream
changes.

Requires astral-sh/pubgrub#11 to only retrieve the
versions for packages that will be used in the report.

This is a work in progress. Some things to do:
- ~We may want to allow lazy retrieval of the version maps from the
formatter~
- [x] We should probably create a separate error type for no solution
instead of mixing them with other resolve errors
- ~We can probably do something smarter than creating vectors to hold
the versions~
- [x] This degrades error messages when a single version is not
available, we'll need to special case that
- [x] It seems safer to coerce the error type in `resolve` instead of
`solve` if feasible