[fast-deps] Caching during dependency resolution #8720

Open
McSinyx opened this issue Aug 6, 2020 · 4 comments
Labels
"C: cache" (Dealing with cache and files in it); "C: download" (About fetching data from PyPI and other sources); "S: needs triage" (Issues/PRs that need to be triaged); "state: needs discussion" (This needs some more discussion)

Contributor

McSinyx commented Aug 6, 2020

Depending on the command, the file may already exist:

  1. pip install - wheel cache or HTTP cache
  2. pip wheel - wheel download folder, wheel cache, or HTTP cache
  3. pip download - download folder, wheel cache, or HTTP cache

In all three cases, I think the wheel cache lookup may be handled earlier, during resolution, so we wouldn't inadvertently use a lazy wheel for a cached distribution.

In all three cases, the file we want may already be in the HTTP cache. I don't know whether range requests bypass the HTTP cache or do what we want (that is, return the requested bytes from the file on disk). Maybe it is worth checking first whether the request is cached and, if so, just skipping the lazy wheel?
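To make "return the requested bytes from the file on disk" concrete, here is a minimal sketch of answering a byte range from a file that is already fully cached. The helper name `read_cached_range` is hypothetical and is not part of pip or cachecontrol; the inclusive `start`/`end` convention mirrors the HTTP `Range: bytes=start-end` header.

```python
import os
import tempfile


def read_cached_range(path, start, end):
    """Return bytes start..end (inclusive) from a fully cached local file.

    Hypothetical helper: this only illustrates the desired behaviour and
    is not how pip or cachecontrol is implemented.
    """
    size = os.path.getsize(path)
    if start < 0 or start > end or start >= size:
        raise ValueError("unsatisfiable range")
    # Clamp the end to the last byte of the file, as HTTP servers do.
    end = min(end, size - 1)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)


def demo():
    # Stand-in for a file that the HTTP cache already holds in full.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0123456789")
        path = f.name
    try:
        return read_cached_range(path, 2, 5)
    finally:
        os.unlink(path)
```

If the cache layer could do something like this, a lazy wheel would cost no network traffic for files that were already downloaded.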

In pip wheel and pip download, the user may have already downloaded some of the files, and we would want to be able to use those rather than lazily downloading the metadata. Note that the logic refactored in #8685 does not account for this.
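The check described above could look roughly like the following sketch. Everything here is an assumption made for illustration: `find_local_copy` and `should_use_lazy_wheel` are hypothetical names, and both directories are treated as flat folders keyed by filename, which is a simplification (pip's real wheel cache is not laid out this way).

```python
import os


def find_local_copy(filename, wheel_cache_dir=None, download_dir=None):
    """Return the path of an existing local copy of `filename`, or None.

    Hypothetical helper; the flat directory layout is a simplification.
    """
    for directory in (wheel_cache_dir, download_dir):
        if directory is None:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def should_use_lazy_wheel(filename, wheel_cache_dir=None, download_dir=None):
    """Fetch metadata lazily only when no complete local copy exists."""
    return find_local_copy(filename, wheel_cache_dir, download_dir) is None
```

With a check like this in front of the lazy wheel, files the user already downloaded would be used directly instead of triggering range requests for their metadata.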

Originally posted by @chrahunt in #8697 (comment)

triage-new-issues bot added the "S: needs triage" (Issues/PRs that need to be triaged) label Aug 6, 2020
Contributor Author

McSinyx commented Aug 6, 2020

The first thing I'll try to do is prevent the lazy wheel from being used if the distribution is available in the wheel cache, and probably in the download directory too, since the two are handled close together in the execution path.

pip's current implementation of the HTTP cache doesn't work out of the box for range requests (GH-8701), but it might be possible to make it work.

Member

pradyunsg commented Aug 6, 2020

If the distribution is in the cache, I think we can skip the fast-deps logic for it. There's no point in doing a "partial" download for something that's available in a local cache.

Member

chrahunt commented Aug 6, 2020

A few possible approaches for handling the HTTP cache:

  1. Query the HTTP cache directly. This adds some more dependencies between internal components, but it would all be in pip itself.
  2. If cachecontrol is Range-request aware and returns from_cache=True on responses for files that are already fully cached, then the point at which we'd know not to use fast-deps for the file would be the first request made inside LazyWheel.
  3. If cachecontrol is Range-request aware and we ignore whether the response is cached, then there's a little more overhead for metadata than there would be otherwise, but the code stays pretty clean compared to options 1 and 2.

The last two assume we make cachecontrol Range-request aware, and I'm not sure what the effort for that would be. I would guess not much if there's agreement on how it should behave. If we go that way then it'd be good to compare what we can do internally vs changes that'd be best made upstream in cachecontrol itself.
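As a rough illustration of option 2, the decision could hinge on a `from_cache` flag on the first response. This is only a sketch under the stated assumption that the cache layer sets `from_cache=True` solely for fully cached files; `choose_strategy` and the strategy names are hypothetical, and `SimpleNamespace` stands in for a real response object.

```python
from types import SimpleNamespace


def is_fully_cached(response):
    """True if the cache layer flagged the response as served from cache.

    Assumption: `from_cache` is only set to True when the whole file is
    cached, which is the behaviour option 2 presupposes.
    """
    return bool(getattr(response, "from_cache", False))


def choose_strategy(first_response):
    """Pick how to proceed after the first request made by the lazy wheel."""
    if is_fully_cached(first_response):
        return "use-cached-file"
    return "lazy-wheel"


# Stand-ins for responses; real ones would come from the HTTP session.
cached = SimpleNamespace(from_cache=True)
uncached = SimpleNamespace(from_cache=False)
```

Option 3 would amount to dropping this check entirely and always going through the lazy wheel, relying on the cache to serve the ranges.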

pradyunsg added the "C: download" (About fetching data from PyPI and other sources), "state: needs discussion" (This needs some more discussion), and "C: cache" (Dealing with cache and files in it) labels Aug 13, 2020
Contributor Author

McSinyx commented Aug 25, 2020

I just found out that the wheel cache is handled by the resolver:

    class LinkCandidate(_InstallRequirementBackedCandidate):
        is_editable = False

        def __init__(
            self,
            link,          # type: Link
            template,      # type: InstallRequirement
            factory,       # type: Factory
            name=None,     # type: Optional[str]
            version=None,  # type: Optional[_BaseVersion]
        ):
            # type: (...) -> None
            source_link = link
            cache_entry = factory.get_wheel_cache_entry(link, name)
            if cache_entry is not None:
                logger.debug("Using cached wheel link: %s", cache_entry.link)
                link = cache_entry.link

Download folders will be checked once GH-8804 is merged. I will look into what is possible with the HTTP cache.

3 participants