-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache calculated hashes for requirements and _InstallRequirementBackedCandidate #12657
Cache calculated hashes for requirements and _InstallRequirementBackedCandidate #12657
Conversation
Even running the very simple |
Why not hash on demand? Initialise Also, this assumes the fields that are used to compute the hash never change. Do we have anything to ensure that's the case? I'm sure it is, but it would be a little annoying if some clever hack in the future violated that invariant and we didn't catch it. |
I can do that, but be aware it will almost certainly be a little slower, in the scenario of a large number of packages the And I don't think there's a scenario where the hashes aren't being used, maybe in the case of
Only that the hash is dependent on internally marked attributes of internally marked apis, nothing in pip's codebase is changing them. Of course if someone is calling pip as a library and using the internal api and changing the internally marked attributed this will break, but I think lots of other assumptions will break if these attributes are changed, including, I think, the way #12453 was implemented. |
I performance tested this to see if this assumption held up in the face of real data, and found in general the performance difference was within the margin of error. So I've switched to @pfmoore's suggestion of lazily calculating hashes and caching their results. |
78c36e4
to
d15be97
Compare
d15be97
to
1e510e3
Compare
Fixed MyPy errors, cleaned comments, rebased |
This builds on @sbidoul PR #12453.
I was profiling the scenario here #12314 (comment), and found over 30% of the time is spent calculating the hash of specifier requirements, which is mostly spent calculating the string of the install requirement, and is done repeatedly.
After this PR installing all the homeassistant wheels from local directory went from ~48 seconds to ~35 seconds, and dry-run install of apache-airflow[all] with all packages cached on Python 3.12 went from ~4 minutes to ~3 minutes.