Download models via pip/requirements always re-runs #1143

Closed
oliverdain opened this issue Jun 21, 2017 · 6 comments
Labels: docs (Documentation and website), install (Installation issues), third-party (Third-party packages and services)

Comments

@oliverdain

I added https://github.com/explosion/spacy-models/releases/download/es_core_web_md-1.0.0/es_core_web_md-1.0.0.tar.gz to my requirements.txt and then installed it via pip install -r requirements.txt. That worked as expected. However, if I then re-run pip install -r requirements.txt, pip doesn't detect that the model is already installed and re-installs it. That means that every time I add a package to my requirements.txt, I have to wait while a 350 MB file is downloaded and installed. Am I doing something wrong, or is there perhaps something wrong with the package provided?

@ines ines added the install Installation issues label Jun 22, 2017
@ines
Member

ines commented Jun 22, 2017

As far as I know, this is just pip's default behaviour: If you install a package from a file or URL, it'll always be overwritten (possibly because pip can't fetch any metadata for the package from a server).

I'll look into the pip options – maybe there's a flag that can be set, or some other workaround to prevent this. (In any case, in v2.0+, this will be at least less annoying, as the models are much smaller - e.g. 15 MB for the small English model.)

@oliverdain
Author

Normal pip install does detect if things are already installed and does not re-download. For example, here's some output:

$ time pip install -r requirements.txt
Requirement already satisfied: spacy==1.8 in ./build/virtualenv/lib/python3.5/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied: pandas==0.20.2 in ./build/virtualenv/lib/python3.5/site-packages (from -r requirements.txt (line 2))
Requirement already satisfied: pathlib in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: preshed<2.0.0,>=1.0.0 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: ujson>=1.35 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: plac<1.0.0,>=0.9.6 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: six in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: regex==2017.4.5 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: cymem<1.32,>=1.30 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: thinc<6.6.0,>=6.5.0 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: dill<0.3,>=0.2 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: numpy>=1.7 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: murmurhash<0.27,>=0.26 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: requests<3.0.0,>=2.13.0 in ./build/virtualenv/lib/python3.5/site-packages (from spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: pytz>=2011k in ./build/virtualenv/lib/python3.5/site-packages (from pandas==0.20.2->-r requirements.txt (line 2))
Requirement already satisfied: python-dateutil>=2 in ./build/virtualenv/lib/python3.5/site-packages (from pandas==0.20.2->-r requirements.txt (line 2))
Requirement already satisfied: cytoolz<0.9,>=0.8 in ./build/virtualenv/lib/python3.5/site-packages (from thinc<6.6.0,>=6.5.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in ./build/virtualenv/lib/python3.5/site-packages (from thinc<6.6.0,>=6.5.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: wrapt in ./build/virtualenv/lib/python3.5/site-packages (from thinc<6.6.0,>=6.5.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: termcolor in ./build/virtualenv/lib/python3.5/site-packages (from thinc<6.6.0,>=6.5.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: certifi>=2017.4.17 in ./build/virtualenv/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in ./build/virtualenv/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: urllib3<1.22,>=1.21.1 in ./build/virtualenv/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: idna<2.6,>=2.5 in ./build/virtualenv/lib/python3.5/site-packages (from requests<3.0.0,>=2.13.0->spacy==1.8->-r requirements.txt (line 1))
Requirement already satisfied: toolz>=0.8.0 in ./build/virtualenv/lib/python3.5/site-packages (from cytoolz<0.9,>=0.8->thinc<6.6.0,>=6.5.0->spacy==1.8->-r requirements.txt (line 1))

real	0m0.567s
user	0m0.487s
sys	0m0.062s

Note all the "Requirement already satisfied". Those come back immediately, nothing is downloaded, and the whole thing ran in under a second. When you run it for the first time, with nothing installed, everything is downloaded and it takes several minutes to run.

The models are different: they get re-downloaded every time, because pip cannot tell that it has already installed them. I'm not a pip expert, so I'm not sure if that's because they're specified by URL rather than being downloaded from PyPI via some metadata, or if there's an issue with the setup.py. Either way, it means that adding one or two new dependencies makes the runtime go from a few seconds to tens of minutes.

@ines
Member

ines commented Jun 23, 2017

> I'm not a pip expert so I'm not sure if that's because they're specified by URL rather than being downloaded from PyPI via some metadata

Yeah, I'm pretty sure this is the case – sorry if this wasn't clear from my comment. In your example above, all packages listed as "Requirement already satisfied" are available on PyPI with metadata.

The good news is, I just did some digging in the pip docs and I think I found a solution: If I specify the package name as #egg=en_core_web_md attached to the URL, it doesn't redownload and I get a "Requirement already satisfied" message:

pip install "https://github.com/explosion/spacy-models/releases/download/en_core_web_md-1.2.0/en_core_web_md-1.2.0.tar.gz#egg=en_core_web_md"

Does this work for you?
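
For a requirements.txt, the same line can be assembled programmatically. The following is only a minimal sketch: the helper name model_requirement and its parameters are hypothetical (not part of spaCy or pip), and it assumes the release archives follow the name-version/name-version.tar.gz layout seen in the URLs in this thread.

```python
# Hypothetical helper, not part of spaCy or pip: builds a model URL
# with the "#egg=<name>" fragment described above.
BASE = "https://github.com/explosion/spacy-models/releases/download"


def model_requirement(name, version, base=BASE):
    """Return a requirements.txt line for a model archive.

    Assumes the archive lives at <base>/<name>-<version>/<name>-<version>.tar.gz.
    """
    archive = "{}-{}".format(name, version)
    return "{}/{}/{}.tar.gz#egg={}".format(base, archive, archive, name)


if __name__ == "__main__":
    # Prints a line that can be pasted into requirements.txt.
    print(model_requirement("es_core_web_md", "1.0.0"))
```

The egg name has to match the package that the archive actually installs, so the name passed in should be the same one that appears in the archive file name.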

@ines ines added the third-party Third-party packages and services label Jun 23, 2017
@oliverdain
Author

That's perfect. Thanks!

Just a note: I think you've got a typo in that example. The model being downloaded is Spanish ("es" not "en") but the #egg= line is specifying "en".

@ines
Member

ines commented Jun 24, 2017

Oops, copied the wrong URL – fixed, thanks! And I'm glad it worked – I'll make sure to add this to the docs as well, might be very helpful for others, too.

Btw, it looks like you can even specify the version within #egg= and use the same method in the dependency_links in your setup.py, see here: https://stackoverflow.com/a/3481388/6400719 (I'm actually working on another project at the moment that'll depend on spaCy and a model, so I'll play around with this some more.)
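
As a rough sketch of what that could look like, assuming the #egg=name-version form from the linked answer: the helper model_dependency and the setup_kwargs dictionary below are hypothetical names used for illustration only, and the archive layout is inferred from the URLs in this thread. Whether dependency_links is honored depends on the pip and setuptools versions in use; later pip releases removed support for it, so this should be read as an illustration of the idea rather than a tested recipe.

```python
# Hypothetical helper, not part of spaCy, pip or setuptools.
BASE = "https://github.com/explosion/spacy-models/releases/download"


def model_dependency(name, version, base=BASE):
    """Return (requirement, link) for a model archive.

    The link carries "#egg=<name>-<version>" so the installer can match
    it against the pinned requirement "<name>==<version>".
    """
    archive = "{}-{}".format(name, version)
    link = "{}/{}/{}.tar.gz#egg={}".format(base, archive, archive, archive)
    requirement = "{}=={}".format(name, version)
    return requirement, link


requirement, link = model_dependency("es_core_web_md", "1.0.0")

# Keyword arguments that could be passed to setuptools.setup(...)
# alongside the usual name/version/packages arguments.
setup_kwargs = {
    "install_requires": ["spacy", requirement],
    "dependency_links": [link],
}
```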

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018