Installing models using pip: improve documentation #1099

danielhers · 2017-06-04T14:25:16Z

Right now, my package has spaCy as a requirement in requirements.txt, and python -m spacy download en is run as part of the installation process.
According to the documentation, models can be listed in requirements.txt, but no example is given. How can I add a requirements.txt entry to just install the default English models?
And is this enough, or will I then also need to run python -m spacy link or something?


ines · 2017-06-04T17:19:49Z

Thanks – and sorry about the confusion. I agree, this should definitely be more clear!

The standard way of installing packages specified in the requirements.txt assumes they're downloadable via a PyPi server (usually pypi.python.org). While model packages are valid pip packages, they can't be uploaded to the official PyPi directory, as they don't meet the requirements (they're too large and consist of mostly binary data). However, a lot of companies run their own internal installations of PyPi – in that case, you can simply upload the model there and point your pip at the internal server.

Alternatively, pip also lets you specify URLs and other sources in the requirements – see here for more info and examples. So instead of only the package name, you can add the URLs of the models you want to install.
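As a rough illustration of what such an entry could look like: the release URLs quoted later in this thread follow the pattern .../releases/download/<name>-<version>/<name>-<version>.tar.gz. The sketch below assumes that pattern holds for the model you want. The helper name model_url is made up for this example, and the version number is only illustrative.

```python
# Sketch: build the download URL of a spaCy model release so it can be
# pasted into requirements.txt. The URL pattern is an assumption based on
# the release links quoted in this thread; model_url is a hypothetical helper.
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_url(name, version):
    """Return the archive URL for one model release."""
    release = f"{name}-{version}"
    return f"{BASE_URL}/{release}/{release}.tar.gz"


# A requirements.txt could then list spaCy by name and the model by URL,
# one entry per line.
requirements = "\n".join(["spacy", model_url("en_core_web_sm", "1.2.0")])
print(requirements)
```

The printed text is what would go into requirements.txt: the plain package name for spaCy, followed by the full archive URL for the model.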

This won't run any spaCy internals like download (which is mostly a convenience wrapper for pip's installer) or link. So you'll either have to create the symlink yourself afterwards, or load the model by importing the package and calling its load() method with no arguments:

import en_core_web_sm
nlp = en_core_web_sm.load()

In general, we do recommend this syntax for larger code bases because it doesn't depend on symlinks, and is cleaner and more "native" – for example, if a model package is not installed, Python will raise an ImportError immediately, instead of failing somewhere down the line when calling spacy.load().
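To make the "fail early" behaviour concrete, here is a minimal sketch of a loader that turns the ImportError into a more descriptive message. It assumes only that the model package exposes a load() function, as shown above; the helper name load_model and the wording of the error are invented for this example.

```python
import importlib


def load_model(package_name="en_core_web_sm"):
    """Import an installed model package by name and return its pipeline.

    Hypothetical helper: fails immediately with a readable message if the
    model package is missing, instead of failing later in the application.
    """
    try:
        model_package = importlib.import_module(package_name)
    except ImportError as err:
        raise RuntimeError(
            f"Model package '{package_name}' is not installed. "
            "Add its download URL to requirements.txt and reinstall."
        ) from err
    # Model packages expose load(), which needs no arguments.
    return model_package.load()
```

Calling load_model() at application start-up means a missing model shows up as soon as the process boots, rather than on the first request that needs it.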

So if specifying models in your requirements.txt is useful for your project, there's a high chance that native model imports will actually be more convenient as well. I hope this helps – will definitely add a section about this to the docs as well 👍

TL;DR Adding the model URL instead of the package name to your requirements.txt and importing the model as a package in your code should do the trick.

danielhers · 2017-06-04T17:55:38Z

Thank you, this is very clear!
So is en_core_web_sm the same package I get when I run python -m spacy download en?

ines · 2017-06-04T17:59:01Z

Yes, en and all other shortcuts download the default models, usually the most compact ones – in this case en_core_web_sm. (In the list of available models, the default models are the ones marked with a star. Internally, spaCy resolves the shortcuts by looking them up in this table.)
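Conceptually, resolving a shortcut is just a dictionary lookup. The snippet below is an illustrative sketch, not spaCy's actual code: the table is hypothetical and contains only the one mapping confirmed in this thread, and resolve_model_name is a made-up name.

```python
# Hypothetical shortcut table; only the "en" entry is confirmed in this thread.
SHORTCUTS = {"en": "en_core_web_sm"}


def resolve_model_name(name):
    """Map a shortcut to its full package name; full names pass through unchanged."""
    return SHORTCUTS.get(name, name)


print(resolve_model_name("en"))
print(resolve_model_name("en_core_web_sm"))
```

Both calls yield the same full package name, which is why a requirements.txt entry should use the full name and not the shortcut.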

lalvarezguillen · 2017-06-06T20:07:24Z

Very clear indeed! Now I'm wondering if there's a simple equivalent for setup.py

We used a call to spacy.en.download in our setup.py to install the required modules, I believe the practice is deprecated or frowned upon.

ines · 2017-06-06T21:45:23Z

@lalvarezguillen I think you might be looking for a solution like this: https://stackoverflow.com/a/3481388/6400719

"We used a call to spacy.en.download in our setup.py to install the required modules, I believe the practice is deprecated or frowned upon."

In theory, you could still use spacy.cli.download for this (spacy.en.download is deprecated since v1.7). I wouldn't say that this practice is frowned upon, but we definitely wouldn't recommend it for production use. If you know which model your application needs, you shouldn't have to do an additional roundtrip and depend on spaCy's downloader just to fetch and pip install a package from a URL. (This was also part of the reason we decided to publish the models on GitHub and not just route all requests via our server. Especially since there's not just one "the model" anymore, but several different ones for different languages and use cases.)
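One possible way to declare the model directly in setup.py, without going through spaCy's downloader, is a direct URL reference in install_requires. This is a sketch under assumptions: it relies on the "name @ url" requirement syntax from PEP 508, which needs a reasonably recent pip and setuptools, and it may differ from the approach in the linked answer. The project name and version in the comment are placeholders, and the model URL is one quoted elsewhere in this thread.

```python
# Sketch of the dependency list for a setup.py. The "name @ url" form is the
# PEP 508 direct-reference syntax; whether your tooling supports it is an
# assumption to verify.
MODEL_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz"
)

INSTALL_REQUIRES = [
    "spacy",
    f"en_core_web_sm @ {MODEL_URL}",
]

# In a real setup.py this list would be passed to setuptools, for example:
#
#     from setuptools import setup
#     setup(name="my_package", version="0.1.0", install_requires=INSTALL_REQUIRES)
print(INSTALL_REQUIRES[1])
```

With this in place the model is installed by pip like any other dependency, and the application code can import it as a package.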

Btw, in spaCy v2.x, another option could be to simply package the models with your application. The new alpha models are only 12 and 15 MB – about the size of the spaCy package, and probably smaller than many other random pip packages.

Edit: Just to clarify, this approach would be mostly for internal production use – not if you're actually distributing your package on PyPi or GitHub. While the model licenses (CC BY-SA) allow redistribution, we don't want to encourage people to reupload and mirror the official spaCy models. After all, they're just binary data and we want to make sure that there's only one official distribution. This makes things safer and less confusing for everyone.

ines · 2017-07-22T13:44:59Z

Addressed in 7c4bf99 and live here!


shuhei · 2018-01-15T08:39:07Z

For Python newbies like me, here is how to add a model to a Pipfile:

[packages]

spacy = "*"
de_core_news_sm = { file = 'https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.0.0/de_core_news_sm-2.0.0.tar.gz' }

msmedes · 2018-04-22T16:01:57Z

Not sure why but I added the model to my Pipfile, updated the lock file, but spacy doesn't appear to be working. Right now my Pipfile looks like this:

[packages]
spacy = "*"
gunicorn = "*"
flask = "*"
"en_core_web_sm" = {file = "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz"}

and my import and package loading looks like this:

import en_core_web_sm
print("Loading spacy...")
nlp = en_core_web_sm.load()
print(nlp)
print("Spacy loaded.")

my print statements look like this:
11:45:02 web.1 | Loading spacy something...
11:45:03 web.1 | <spacy.lang.en.English object at 0x10d676e80>
11:45:03 web.1 | Spacy loaded.

But when I actually process text or do anything with the nlp object, nothing happens. It might be tokenizing the text, but not much else. If I pass text in with doc = nlp(text) and run print(doc), I get the text back, but so far every attempt at looking at doc.ents has failed: printing doc.ents returns an empty set.

I should mention that the whole thing works when it isn't run through Heroku. If I run it in the local environment using python app.py, it fires up with no problem and processes text. However, when I run heroku local web or git push heroku master, I get diddly, despite the fact that it appears to be loading the spaCy model. Any ideas as to what I'm doing wrong?

(Apologies if this is in the wrong place or I should have made a new issue. If so let me know and I'll do so.)

lock · 2018-05-22T16:35:58Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

ines added the docs (Documentation and website) and models (Issues related to the statistical models) labels Jun 4, 2017

ines added a commit that referenced this issue Jun 4, 2017

Add more details on model packages and requirements.txt (see #1099)

63cd539

ines mentioned this issue Jun 13, 2017

Specify model in requirements.txt #1129

Closed

ines closed this as completed Jul 22, 2017

Jeiwan mentioned this issue Jul 23, 2017

Deploying to Heroku #308

Closed

chinying referenced this issue in chinying/learn2telegram Jul 29, 2017

heroku issues again ref https://github.com/explosion/spaCy/issues/109…

2cb1c9f

…9\#issuecomment-306053749

lock bot locked as resolved and limited conversation to collaborators May 22, 2018
