Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poc: replace cachy with cache module #5868

Merged
merged 2 commits into from
Oct 11, 2022

Conversation

chadac
Copy link
Contributor

@chadac chadac commented Jun 18, 2022

Related to the discussion in #5720.

This is a test to see what would be needed to pull out of cachy to get it into Poetry entirely. The result is a new utils module, poetry.utils.cache, that is about 250 LOC. I also went ahead and made a couple of small improvements:

  • Broke up cachy into simpler classes, mainly to push stuff out of config and into typed variables. Now caches are declared separately -- FileCache, DictCache
  • Add types to methods and generics
  • Update the file format. Before it was the expiration time concatenated with the payload, now it's purely JSON encoded. Added compatibility tests and checks to ensure the migration won't break existing caches.

Pull Request Check List

  • Added tests for changed code.
  • Updated documentation for changed code.

@chadac chadac changed the title poc: Replace cachy with cache module poc: replace cachy with cache module Jun 18, 2022
@chadac chadac marked this pull request as ready for review June 18, 2022 18:41
src/poetry/repositories/pypi_repository.py Outdated Show resolved Hide resolved
src/poetry/utils/cache.py Outdated Show resolved Hide resolved
src/poetry/utils/cache.py Outdated Show resolved Hide resolved
src/poetry/utils/cache.py Outdated Show resolved Hide resolved
abn
abn previously requested changes Jun 20, 2022
src/poetry/repositories/legacy_repository.py Outdated Show resolved Hide resolved
src/poetry/repositories/pypi_repository.py Outdated Show resolved Hide resolved
src/poetry/repositories/pypi_repository.py Outdated Show resolved Hide resolved
src/poetry/utils/cache.py Show resolved Hide resolved
src/poetry/utils/cache.py Outdated Show resolved Hide resolved
@abn
Copy link
Member

abn commented Jun 20, 2022

Have not gone through the cache logic itself, but left some high level comments regarding the change.

@Secrus
Copy link
Member

Secrus commented Aug 26, 2022

@chadac any progress on this?

@chadac
Copy link
Contributor Author

chadac commented Sep 13, 2022

@Secrus Apologies, got pulled into some busy separate stuff that has occupied my time. I'll commit to get some of those profiling stats up for next week so to help reach a decision on one of those threads above (for DictCache).

@neersighted neersighted added this to the 1.3 milestone Oct 10, 2022
@neersighted neersighted added kind/refactor Pulls that refactor, or clean-up code area/sources Releated to package sources/indexes/repositories labels Oct 10, 2022
@chadac
Copy link
Contributor Author

chadac commented Oct 10, 2022

Posted an update which cuts down a lot of the junk. I did do a test on the proposed match and package caches referenced above. For PyPiRepository I found relatively little difference, so I focused testing on LegacyRepository with a sample package of about 10 packages. With caching, average install time took about 55.2s +/- 1.5s, while without caching it took around 56.1s +/- 2.7s. This was for a sample size of 10, each run inside a Docker container under the same conditions. Since this is well within the margin of error I believe it's not a significant difference, but I'm increasing the sample size now to increase confidence. I'll get a proper hypothesis test once ready to officially confirm.

pyproject.toml Outdated Show resolved Hide resolved
src/poetry/utils/cache.py Outdated Show resolved Hide resolved
tests/console/commands/cache/test_clear.py Outdated Show resolved Hide resolved
tests/utils/test_cache.py Outdated Show resolved Hide resolved
@chadac
Copy link
Contributor Author

chadac commented Oct 11, 2022

I did end up running a more complete performance comparison for the package & match cache. The results mostly agreed with what was shown earlier; when comparing 50 identical runs between the two, the improvement in performance was about 0.5%, which was well within the margin of error (difference in average: 0.3s, with standard deviation of +/- 3.2s). The student t test reports a p-value of around 0.6, indicating that there was very little certainty that this adds any improvement at all.

Copy link
Member

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good to me -- could you address the last review comments and then squash history into something mergable?

e.g. two commits: introduce FileCache and tests, then cut over to the new FileCache

pyproject.toml Show resolved Hide resolved
tests/utils/test_cache.py Outdated Show resolved Hide resolved
@chadac
Copy link
Contributor Author

chadac commented Oct 11, 2022

Sounds good, lemme refactor the commits.

@neersighted neersighted enabled auto-merge (rebase) October 11, 2022 20:54
@neersighted neersighted merged commit c2b1fb3 into python-poetry:master Oct 11, 2022
@chadac chadac deleted the feature/cachy-migration branch October 11, 2022 21:08
neersighted pushed a commit that referenced this pull request Nov 3, 2022
At #5868, I
[suggested](#5868 (comment))
that some of the caching that was being reworked could be removed
altogether, because cachecontrol was already taking care of it just
fine.

But now I find myself using an Azure artifacts repository, and it is
returning headers that insist that the client does not do any caching:
```
cache-control: no-cache
pragma: no-cache           
x-cache: CONFIG_NOCACHE 
```
(pypi, by contrast, sets max-age to 10 minutes here).

So I was wrong! And now I am seeing a big performance hit in some
projects where the solve involves overrides and backtracking: and
therefore hitting the legacy simple API repeatedly.

However, we don't need all the mechanism of cachy and its like for this,
a well-placed `@lru_cache()` seems more than sufficient.

This makes me wonder whether it wouldn't be better to do similar for
pypi anyway, and rip out cachecontrol altogether. But let's keep it
simple for now: this is an easy fix to a performance regression.
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/sources Releated to package sources/indexes/repositories kind/refactor Pulls that refactor, or clean-up code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants