Download models via pip/requirements always re-runs #1143
Comments
As far as I know, this is just pip's default behaviour: if you install a package from a file or URL, it will always be overwritten (possibly because pip can't fetch any metadata for the package from a server). I'll look into the pip options; maybe there's a flag that can be set, or some other workaround to prevent this. (In any case, in v2.0+ this will at least be less annoying, as the models are much smaller, e.g. 15 MB for the small English model.)
Normal pip install does detect if things are already installed and does not re-download. For example, here's some output:
Note all the "Requirement already satisfied" lines. Those come back immediately, nothing is downloaded, and the whole thing ran in under a second. When you run it for the first time, with nothing installed, everything is downloaded and it takes several minutes to run. The models are different: they get re-downloaded every time, because pip cannot tell that it has already installed them. I'm not a pip expert, so I'm not sure whether that's because they're specified by URL rather than being downloaded from PyPI via some metadata, or whether there's an issue with the setup.py. Either way, it means that adding one or two new dependencies makes the runtime go from a few seconds to tens of minutes.
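One way to confirm that the model really is installed between runs, independently of what pip decides to download, is to ask the environment's package metadata directly. This is only a minimal sketch: the helper name installed_version is hypothetical, and it assumes Python 3.8+ (for importlib.metadata) and that the model registers itself under a distribution name such as es_core_web_md.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper: it only reads metadata from the current environment
    and says nothing about how pip resolves URL requirements.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The distribution name is an assumption based on the archive name in this thread.
    print(installed_version("es_core_web_md"))
```

If this reports a version while pip still re-downloads the archive, the package itself is installed correctly and the problem lies in how the requirement is specified.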
Yeah, I'm pretty sure this is the case, sorry if this wasn't clear from my comment. In your example above, all packages listed as "Requirement already satisfied" are available on PyPI with metadata. The good news is, I just did some digging in the pip docs and I think I found a solution: specifying the package name via the #egg= fragment at the end of the URL:
pip install "https://github.com/explosion/spacy-models/releases/download/en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz#egg=en_core_web_md"
Does this work for you?
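For anyone generating requirements.txt entries for several models, the same idea can be sketched in Python. The helper name with_egg_fragment is hypothetical, and it assumes the archives are named <package>-<version>.tar.gz, as in the URL in this thread; it is not part of spaCy or pip.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def with_egg_fragment(url):
    """Return `url` with '#egg=<package>' appended.

    Assumption: the archive file is named '<package>-<version>.tar.gz',
    so the package name is everything before the last hyphen.
    """
    parts = urlsplit(url)
    filename = posixpath.basename(parts.path)
    stem = filename[: -len(".tar.gz")] if filename.endswith(".tar.gz") else filename
    name, sep, _version = stem.rpartition("-")
    if not sep or not name:
        raise ValueError(f"cannot infer package name from {filename!r}")
    # Replace any existing fragment with the egg name.
    return urlunsplit(parts._replace(fragment=f"egg={name}"))


if __name__ == "__main__":
    url = (
        "https://github.com/explosion/spacy-models/releases/download/"
        "en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz"
    )
    print(with_egg_fragment(url))
```

The printed line can be pasted into requirements.txt as-is; whether pip then skips the download still depends on the pip version in use.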
That's perfect. Thanks! Just a note: I think you've got a typo in that example. The model being downloaded is Spanish ("es" not "en") but the
Oops, copied the wrong URL. Fixed, thanks! And I'm glad it worked. I'll make sure to add this to the docs as well; it might be very helpful for others, too. Btw, it looks like you can even specify the version within
I added
https://github.com/explosion/spacy-models/releases/download/es_core_web_md-1.0.0/es_core_web_md-1.0.0.tar.gz
to my requirements.txt and then installed it via pip install -r requirements.txt. That worked as expected. However, if I then re-run pip install -r requirements.txt, it doesn't detect that the model is already installed and re-installs it. That means that every time I add a package to my requirements.txt, I have to wait while a 350 MB file is downloaded and installed. Am I doing something wrong, or is there perhaps something wrong with the pip file provided?