-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue278 dockerize annif #279
Conversation
Codecov Report
@@ Coverage Diff @@
## master #279 +/- ##
=======================================
Coverage 99.31% 99.31%
=======================================
Files 52 52
Lines 2624 2624
=======================================
Hits 2606 2606
Misses 18 18 Continue to review full report at Codecov.
|
Excellent! In the end we probably want a single Docker image which contains all optional dependencies. The TFIDF-only variant is only useful as an intermediate step, it's unlikely to be useful on its own. Basing on python3.6-slim seems like the best choice. With Alpine it's going to be difficult to install all the dependencies, and the image size is already huge without them for some reason. I think we also want to include a WSGI server in the image, for example uwsgi or gunicorn, so that we can run a production web server. Flask includes a simple HTTP server ( We also need documentation in the wiki for how to run Annif via Docker, something similar to this page on the Skosmos wiki. @kinow any comments? (closes #278) |
@osma I saw this commit this morning and was really happy as this would make it easier for some devs - myself included - to run and contribute to Annif. I noticed the several images, which is not a problem, if the intention is to allow users to choose the flavour, and use the image not only for development, but also in production - the Skosmos one was created with development in mind, but with some polishing should be able to run on Amazon, Docker Swarm, etc. Only comment I have is that it would be good if the documentation included how to map the volumes needed for Annif. It would be really great if I could have all the corpora and configuration files needed in some volume, map to one or multiple containers, and be able to switch branches while developing, re-using the Docker files without destroying the data. 🎉 |
Thanks for chiming in @kinow! I think we want to end up with one or maybe two Dockerfiles, not more, but it's good to test different variations first. I agree that at least for development, the configuration and data files should live outside the Docker image, or at least that should be the default option. For some deployment scenarios it might actually be desirable to include the config and data (pretrained models) within the container, but I suppose we could support both. |
Thank you @natlibfi-arlehiko for your comments. Some responses:
The current Dockerfile uses a multi-stage build already.
Swagger UI is built in to Connexion, which is a core component of Annif. I don't see how to easily move it to a separate container.
This is the current plan. The main container only includes the software itself, not the data which is mounted separately.
Unfortunately it seems difficult to get rid of any of these, @juhoinkinen already tried excising the most useless ones, without success. @juhoinkinen Do you think this is starting to be ready for merging? The PR is still marked as draft but I don't see any major actionable TODOs in the recent discussion. |
The way we do it in Melinda's record import API:
Might be possible with connexion too. |
@natlibfi-arlehiko I see. That's good from the perspective of separating concerns between containers, but I don't think it will reduce the container size, since Connexion will include Swagger UI regardless of whether we enable it or not. Did I understand correctly? |
Yeah. Good point :D |
Yes, I just updated the wiki https://github.com/NatLibFi/Annif/wiki/Usage-with-Docker about the docker-compose to combine Annif+NGINX+Mauiservice (before there were two docker-compose files, one for Annif+NGINX and other for Annif+Mauiservice). The current There is also |
Looks good now. My only concern is that the Drone build is failing according to the above status display. I can see there is this error at the end of the Drone
Do you need to add some kind of credentials somewhere? |
Fixed. |
Different Dockerfiles for creating images.
Based on python:3.6-slim (138 MB):
Based on python:3.6-alpine (79.1MB):
Alpine approach seems quite tricky regarding all backends and size increase (and the alpine base image is only 60 MB smaller than python-slim). Maybe best is to proceed on python:3.6-slim based images.
Tests pass in all containers.
There was a problem using Vowpal Wabbit (related to libboost-python-dev, which has dependency on Python2 and also wrongly linked to it). Using VW version 8.4 solves this.
Few TODOs in Dockerfiles for shrinking image size.
Usage of containers is e.g. like this
(or, actually, maybe the container doesn't need to be running all the time...):Start bash in a container running in background (bind-mounting vocabylary and training data):
docker run -v /home/local/jmminkin/data:/data -itd --name annif_container annif bash
Execute commands in the already running container:
docker exec annif_container annif
docker exec annif_container annif loadvoc tfidf-en /data/vocab/yso-en.tsv
docker exec annif_container annif train tfidf-en /data/training/yso-finna-en.tsv.gz
echo This is the sample text to run suggest test on. | docker exec -i annif_container annif suggest tfidf-en