-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use strict optional checking in req_install.py #11379
Conversation
Suggested by pradyunsg in pypa#11374 Since half of the API of this class depends on self.req not being None, it seems like we should just prevent users from passing None here. However, I wasn't able to make that change. Rather than sprinkle asserts everywhere, I added "checked" properties. I find this less ad hoc and easier to adapt if e.g. we're able to make self.req never None in the future. There are now some code paths where we have asserts that we didn't before. I relied on other type hints in pip's code base to be accurate. If that is not the case and we actually relied on some function being able to accept None when not typed as such, we may hit these asserts. But hopefully tests would catch such a thing.
src/pip/_internal/req/req_install.py
Outdated
def checked_req(self) -> Requirement: | ||
assert self.req is not None | ||
return self.req | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m sorry but I don’t really like this approach. If we can’t (via type checks) guarantee that the input is never None
, then that means we expect the assert to trigger. And we shouldn’t ever be adding asserts that we expect to get triggered - IMO asserts are just for cases that “cannot happen”.
If the problem is that the requirement can be in two states, “under construction” where it’s not safe to assume these values are set up, and “ready”, where everything is definitely not None
, then we should work out how to express that in the type system - and if we can’t, raise the issue with the typing people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, these are all asserts that "cannot happen". I've only added these asserts on code paths that mypy indicates you'd get e.g. a TypeError or AttributeError if something was actually None. The current PR should preserve all non-exceptional semantics. (For completeness, in the PR description I mention that mypy could be wrong if existing annotations in pip are wrong, but I validated several of them and found no issues. I'm quite confident in this change)
I can't quite tell exactly what you're objecting to... If it's just the use of property
I'd be happy to inline these asserts wherever they get used. Note that this sort of defensive programming / use of self-documenting preconditions is already commonly used in this file, e.g. see the use of assert self.source_dir
on all the methods that touch it.
If the problem is that the requirement can be in two states,
It is possible to express this kind of thing in the type system, but a) it would require much larger changes than this one, b) knowing what I know of your feelings about typing, I suspect you wouldn't like it either ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I'm here because @pradyunsg indicated such a PR would be welcome and because I do think this is an improvement over unsound type checking.
I'd be happy to inline asserts instead of using properties and address uranusjr's comments. But if your hesitation about the approach goes beyond that, feel free to close this PR :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually say that tackling a different file than this one would be a better place to start. But, too late for that. :)
That said, inline asserts are definitely a better thing to do here instead of these "checked_foo" properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AAARGH!!!! I had an almost-complete extended explanation of my position here, and github (or my browser) threw it away when I went to check a detail of the PR. Apologies if this is a bit disjointed, I'm trying to remember the points I made before...
It is possible to express this kind of thing in the type system, but a) it would require much larger changes than this one, b) knowing what I know of your feelings about typing, I suspect you wouldn't like it either ;-)
lol, thanks for your understanding :-) After looking more closely, I see a bit better what you're trying to do here. I am trying to look at this from the perspective that type checks are a good thing, apologies if I don't manage to completely hide my reservations1.
First of all, I'm honestly rather appalled by mypy's current behaviour here. We tell the type checker that req
is an Optional[Requirement]
, and yet it still lets us try to access properties of req
without ensuring it's not None
? That's basically just not doing its job. I'm sorry, I don't want to hear any "practicality" arguments here, either check the damn types or don't claim to be a type checker...
So yes, I guess that means I do support making this change. My concern is that we do it in a way that doesn't simply ignore the issue that type checking has found. And my worry is that asserts do precisely that - they simply add a runtime check that replicates the type check, and yet doesn't give any more user friendly result than the exception you get if you try to access an attribute of None
. If we're willing to dump a traceback on the user anyway, why not just use cast
to ignore the type2?
More realistically, I do think we should consider refactoring InstallRequirement
so that we can somehow say "until this happens, req
is optional, and we can't use methods that expect req
to be set". As things currently stand, I don't even know when that point would be, so that would definitely be a benefit. But that involves a potentially major refactor of one of the ugliest bits of pip, so I'm certainly not suggesting it's something we should tackle lightly.
So where does that leave us? Apparently, mypy knew about a problem in our code, so that's good - sort of. It seems like we have a few options:
- Defer fixing the issue until we can fix it "properly". This could take some time, and in the meantime we have to allow mypy to continue ignoring the issue.
- Try to find a way to check and report the problem at runtime, in a way that gives the user proper feedback on what went wrong (that would likely need to be done on a case by case basis).
- Patch over the problem with a runtime assert, to satisfy mypy but without improving the user experience if something goes wrong.
- Add complicated type machinery to express what's happening properly. I'm not against this, despite your suspicions, but I am worried that the more we use complex type features, the more we add to the maintenance burden as few of the maintainers will have the skills to work with that code.
- Do something adhoc like
getattr(self.req, "name", "<unknown requirement>")
. Is that better than an error? Who knows? - Do nothing. Accept that we can't type this properly, but we don't want to spend time fixing the underlying problem.
At the moment, I'm not sure this problem is important enough to need anything more than (6). But if it does, I think we should at least consider a better solution than just adding asserts (which let us fix the types, but otherwise gain us nothing).
@pypa/pip-committers what do you think? Am I being too strict3 here?
I'd be happy to inline asserts instead of using properties and address uranusjr's comments. But if your hesitation about the approach goes beyond that, feel free to close this PR :-)
If we go with asserts, I agree that a property is a good way of handling it. And I certainly don't intend to just close this PR - your work is appreciated even if I have questions about the approach. Ultimately, all I wanted to do here was to raise the question of whether we should do something to tackle the underlying issue, and to make the point that if we're going to implement type checking, we should take type errors seriously - and at least consider other options before asserting.
Footnotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch over the problem with a runtime assert, to satisfy mypy but without improving the user experience if something goes wrong.
IMO, this is the cleanest thing to do here to make things explicit at the point of use. The underlying problem here is of bad design of this object -- something that we've discussed at length in #6607. :)
Having these asserts would make it easier to identify the underlying assumptions and make it easier to spot what bits we need to refactor, when we finally come around to #6607.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're saying that we take the type error as flagging an implied contract that we've not been enforcing, rather than an actual error? OK, I can see that. And if that's the way we're looking at it, I agree that asserting at the point of use is more appropriate than having a checked_*
property.
There's still something that makes me uncomfortable about this approach to type errors. I'm finding it very hard to put my reservation into words, but essentially it feels like we're not actually fixing type errors, but rather just finding ways to shut the checks up. Maybe that's just an artifact of the process of adding type annotations to an established codebase. But if we establish a "the code is right, get the type checker to understand that" mindset, we're not getting the benefit from type annotations.
🤷 At this point, I've probably spent more time (of mine and everyone else's) on this than it could ever justify. So I'll leave it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that replacing static type checks with runtime type checks alone doesn't get anyone very far. Static analysis works best when it confirms your invariants for you, which is not what happens with assert.
In fact, my first attempt at this PR was to disallow InstallRequirement from taking req=None, but I lack familiarity with the codebase and it ended up becoming a daunting change.
I do think there is still some incremental benefit to option 3 over your option 1:
- It lets us get rid of
# mypy: strict-optional=False
, which I agree is appalling. Doing so allows the type checker to point out potential problems when modifying or refactoring, even if we paper over existing issues to some extent. - It states our invariants more clearly. Right now one has to trace code to know whether a method can be called when self.req is None.
- The user experience is marginally better with an assert, as opposed to e.g. "None has no attribute" on some line somewhere and having to figure out how far back in the stack that None comes from.
I'll go ahead and inline the checked* properties. I agree that an inline assert more clearly documents implied contracts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience while I thought this through in public 🙂
src/pip/_internal/req/req_install.py
Outdated
return self.req.specifier | ||
return self.checked_req.specifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more appropriate fix would be
if self.req is None:
return SpecifierSet()
return self.req.specifier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather that we have assert self.req
here, making the existing contract that req needs to be non-None explicit.
src/pip/_internal/req/req_install.py
Outdated
if link and link.hash: | ||
if link and link.hash and link.hash_name: | ||
good_hashes.setdefault(link.hash_name, []).append(link.hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I wonder if this is actually an error in the annotation and good_hashes
should really be typed as Dict[Optional[str], List[str]]
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to have None
be the hash algorithm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be assert link.hash_name is not None
or equivalent here, instead of changing the if statement here.
Thanks for picking this up @hauntsaninja!
Personally, I don't think checked properties are a good way to do this. I'd much prefer to have asserts sprinkled in various places for this. That'll do a better job of making it explicit what the contract is, and can help guide refactors as well. FWIW, I'll say that you picked the most difficult one first. ;) |
Sure thing! Sounds good and can do. I'll wait to hear back from Paul to make sure he's onboard with adding asserts, just so I don't end up wasting my time with this! |
That's because we'd have disabled the check that it does via the |
Okay, I've gotten rid of the checked properties. The one exception is that the pre-existing |
Hmm, I’d got the impression this was the default for mypy, and we had enabled the check explicitly but then re-disabled it on a file by file basis. Regardless, I think it’s bizarre to be able to disable this check. Why just this one? Maybe people want to disable checks that things are |
Since you seem curious for the history here / why the appalling behaviour existed. It did use to be the default for mypy a long time ago (like >5 years).
Here's the pip commit where the relevant mypy config was changed: #6704 |
Bump! :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review!
Looks like there’s some edge case you’re not handling correctly. I think |
Fixed the bug, I'd incorrectly shortened an if condition. Wasn't related to case where I went through and audited all the assertions again. They all look sound to me in that all of the relevant ones would currently raise TypeError if |
@@ -457,6 +459,7 @@ def is_wheel_from_cache(self) -> bool: | |||
# Things valid for sdists | |||
@property | |||
def unpacked_source_directory(self) -> str: | |||
assert self.source_dir, f"No source dir for {self}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is one place where we could raise an AssertionError where the existing code would not fail: if self.source_dir
was an empty string. But assert self.source_dir
appears many other times in this file, so I think is okay?
Okay great. |
Merged master and the spurious test failure has gone away :-) |
Thanks @hauntsaninja! |
* Use strict optional checking in req_install.py Suggested by pradyunsg in pypa#11374 Since half of the API of this class depends on self.req not being None, it seems like we should just prevent users from passing None here. However, I wasn't able to make that change. Rather than sprinkle asserts everywhere, I added "checked" properties. I find this less ad hoc and easier to adapt if e.g. we're able to make self.req never None in the future. There are now some code paths where we have asserts that we didn't before. I relied on other type hints in pip's code base to be accurate. If that is not the case and we actually relied on some function being able to accept None when not typed as such, we may hit these asserts. But hopefully tests would catch such a thing. * news * black * inline asserts * code review * fix up merge issue * fix specifier bug
Suggested by pradyunsg in #11374
The new assertions look sound to me in that they would currently raise TypeError if self.req was ever None. It's pretty straightforward to check this yourself, the only case I found tricky is the assertion in install where you have to go a couple layers deep to find where it would fail if self.name was None.
link.hash_name
is minorly tricky too, but there are plenty of places inHashes
where things would fail if it was ever None.