Use a multi-stage build process. #475
Make the odklite image the minimal image needed to use a "standard" ODK-generated repository.
Update the run.sh wrapper script to allow choosing the ODK image to use without having to edit the file. Use the odklite image by default.
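The image-selection logic of the wrapper can be illustrated with a short sketch. It is written in Python only for illustration (run.sh itself is a shell script), and the environment variable names ODK_IMAGE and ODK_TAG, as well as the "latest" default tag, are assumptions rather than the names the actual script uses.

```python
import os
from typing import Mapping, Optional


def select_image(env: Optional[Mapping[str, str]] = None,
                 default_image: str = "odklite",
                 default_tag: str = "latest") -> str:
    """Return the Docker image the wrapper should run.

    ODK_IMAGE and ODK_TAG are hypothetical variable names used for this sketch.
    A missing or empty value falls back to the defaults, so the script never
    needs to be edited to switch images.
    """
    env = os.environ if env is None else env
    image = env.get("ODK_IMAGE") or default_image
    tag = env.get("ODK_TAG") or default_tag
    return f"obolibrary/{image}:{tag}"
```

With this design, switching to the full image is a matter of setting ODK_IMAGE=odkfull when invoking the wrapper; without any override, the wrapper falls back to odklite.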
Use constraints to fix the versions of Python packages (constraint file generated by 'pip freeze').
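A constraints file of this kind is simply the name==version lines printed by pip freeze (e.g. pip freeze > constraints.txt), and pip applies it with pip install -r requirements.txt -c constraints.txt. The Python sketch below is a hypothetical helper, not part of the ODK: it reads such lines into a mapping from normalized package names to pinned versions, skipping comments and entries that are not plain pins.

```python
import re
from typing import Dict, Iterable

# A pinned requirement as printed by `pip freeze`: "<name>==<version>".
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;]+)")


def normalize(name: str) -> str:
    """PEP 503 name normalization: lowercase, runs of '-', '_' and '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_constraints(lines: Iterable[str]) -> Dict[str, str]:
    """Map normalized package names to the versions pinned in freeze-style lines."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        match = _PIN.match(line)
        if match is None:
            # Editable installs ("-e ...") and direct references ("pkg @ URL")
            # are not simple pins; this sketch just skips them.
            continue
        pins[normalize(match.group(1))] = match.group(2)
    return pins
```

Usage: with open("constraints.txt") as fh: pins = parse_constraints(fh).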
Avoid messing with the default /usr/bin/python executable. Just make sure to always call python3.
Update the version number in docker/odklite/Makefile (used when not called from the top-level Makefile). Define ODK_VERSION in the environment of the odklite image.
Build the odkfull image on top of the odklite image. Use a multi-stage build to install SWI-Prolog and Soufflé. Do not build the obolibrary/robot image, as it is redundant with the obolibrary/odklite image (which is now barely larger).
There's no pre-compiled binary for fastobo-validator targeting the arm64 architecture, so we build it ourselves on the builder image.
Konclude is broken on all platforms: on x86_64, the version we used to install depends on the musl C library, which is not provided in the image; on arm64, we don't have a compatible binary. On x86_64, we now install a more recent version of Konclude that depends on the standard GNU libc. On arm64, we compile Konclude ourselves.
Docker should default to building an image for the architecture it is currently running on, but for some reason this does not always seem to work on M1 machines, where it may sometimes try to build an amd64 image instead (causing all sorts of failures down the line). It is better to be explicit about which architecture we are targeting. The default is taken from the output of the system's uname command, so it *should* always give the expected result without any user intervention. If needed, the user can override the automatic detection by setting the ARCH variable herself when calling make.
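The detection-with-override logic can be sketched in Python as follows. The mapping from uname -m values (x86_64, aarch64, arm64) to Docker architecture names (amd64, arm64) is an assumption about what the Makefile does, not a copy of it; an ARCH value supplied by the user, modelled here as an entry in env, always wins over the detected value.

```python
import os
import platform
from typing import Mapping, Optional

# Assumed mapping from `uname -m` / platform.machine() values to Docker architectures.
_MACHINE_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",    # Windows reports "AMD64"; input is lowercased below
    "aarch64": "arm64",  # Linux on 64-bit ARM
    "arm64": "arm64",    # macOS on Apple Silicon (M1)
}


def target_arch(env: Optional[Mapping[str, str]] = None,
                machine: Optional[str] = None) -> str:
    """Return the Docker architecture to build for, honoring an ARCH override."""
    env = os.environ if env is None else env
    override = env.get("ARCH")
    if override:
        return override
    machine = platform.machine() if machine is None else machine
    try:
        return _MACHINE_TO_DOCKER_ARCH[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported machine {machine!r}; set ARCH explicitly") from None


def platform_flag(arch: str) -> str:
    """Value to pass to `docker build --platform`, e.g. linux/arm64."""
    return f"linux/{arch}"
```

Passing the result explicitly to docker build --platform means the build no longer relies on Docker's own default, which is the behaviour reported as unreliable on M1 machines.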
Remove useless '>= 0.0' version qualifiers and remove packages that are not strictly needed in a standard ODK-generated repository.
Move all the build steps to a dedicated "builder" image called "obolibrary/odkbuild".
Create separate rules to build the builder image and the odklite images, and adjust the dependency chain so that odkfull depends on odklite which depends on odkbuild.
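In the PR, this chain is expressed as prerequisites of Makefile rules. As an illustration only, the same dependency chain can be modelled as a small graph and turned into a build order with a topological sort. This Python sketch uses graphlib.TopologicalSorter (Python 3.9+); the IMAGE_DEPS table and the build_order helper are hypothetical names.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Each image maps to the images that must exist before it can be built.
IMAGE_DEPS: Dict[str, Set[str]] = {
    "obolibrary/odkbuild": set(),
    "obolibrary/odklite": {"obolibrary/odkbuild"},
    "obolibrary/odkfull": {"obolibrary/odklite"},
}


def build_order(target: str, deps: Dict[str, Set[str]] = IMAGE_DEPS) -> List[str]:
    """List the images needed for target, dependencies first, target last."""
    needed: Dict[str, Set[str]] = {}
    stack = [target]
    while stack:  # collect only the part of the graph reachable from target
        image = stack.pop()
        if image in needed:
            continue
        needed[image] = deps.get(image, set())
        stack.extend(needed[image])
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(needed).static_order())
```

Asking for obolibrary/odkfull therefore yields odkbuild, then odklite, then odkfull, while asking for obolibrary/odklite never touches odkfull.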
Even though the builder image is not intended for public use, we may have to push it to the Docker Hub for building multi-arch images, so we might as well try to reduce its size if we can.
Make sure to build the builder image (as a multi-arch image itself) before attempting to build the final multi-arch images. Because there does not seem to be a way to load a multi-arch image locally, the builder image has to be pushed to the Docker Hub before it can be used to build the final odklite and odkfull images. This is silly and inefficient, but Docker does not offer any other solution.
Even though odklite is now intended to be sufficient to run a standard ODK-generated repository, it is probably too early to push such a change. We should collect some feedback from ODK users first.
When installing the Python packages for the odkfull image, pip is ignorant of the packages we have already installed for the odklite image, so some packages already installed in the odklite staging tree may be installed again in the odkfull staging tree, if they are dependencies of the odkfull packages. This would result in artificially and needlessly increasing the size of the odkfull image, because those Python packages will be present in two different layers. To avoid that, we forcibly remove from the odkfull staging tree any file that is already present in the odklite staging tree.
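The clean-up step can be sketched in Python as follows: walk the odkfull staging tree bottom-up, delete every file whose relative path also exists in the odklite staging tree, and then remove any directory left empty. As in the commit message, only paths are compared, not file contents. The function name and the tree arguments are hypothetical; this is not the script used in the ODK build.

```python
import os
from pathlib import Path
from typing import List, Union

PathArg = Union[str, "os.PathLike[str]"]


def prune_duplicates(full_tree: PathArg, lite_tree: PathArg) -> List[Path]:
    """Delete from full_tree every file that also exists (same relative path) in lite_tree.

    Directories emptied by the deletions are removed too. Returns the relative
    paths of the deleted files, sorted.
    """
    full_root, lite_root = Path(full_tree), Path(lite_tree)
    removed: List[Path] = []
    # Bottom-up walk: children are visited before their parent, so a parent
    # directory is checked for emptiness only after its contents were pruned.
    for dirpath, _dirnames, filenames in os.walk(full_root, topdown=False):
        current = Path(dirpath)
        rel_dir = current.relative_to(full_root)
        for name in filenames:
            rel = rel_dir / name
            # lexists() also matches symlinks (even dangling ones) in the lite tree
            if os.path.lexists(lite_root / rel):
                (current / name).unlink()
                removed.append(rel)
        if current != full_root and not any(current.iterdir()):
            current.rmdir()
    return sorted(removed)
```

Walking bottom-up is what makes the empty-directory removal work in a single pass; a top-down walk would visit each directory before its contents had been pruned.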
The standard ODK Makefile will soon use curl in some rules, so it needs to be part of the odklite image.
Check that the various programs installed as part of the ODK can be run from the image. This should catch the most obvious installation problems, such as a binary with the wrong architecture (as happened with Konclude recently) or a missing shared library.
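A check of this kind can be sketched in Python by running each program inside a throwaway container and collecting the failures. The CHECKS table below is an illustrative subset (swipl and souffle are only expected in odkfull), not the list used by the actual test suite, and the runner parameter exists only so the logic can be exercised without Docker.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Illustrative subset; the actual test suite may check other programs and options.
CHECKS: Dict[str, List[str]] = {
    "robot": ["robot", "--version"],
    "curl": ["curl", "--version"],
    "python3": ["python3", "--version"],
    "swipl": ["swipl", "--version"],      # odkfull only
    "souffle": ["souffle", "--version"],  # odkfull only
}

Runner = Callable[[str, Sequence[str]], subprocess.CompletedProcess]


def run_in_image(image: str, argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run one command in a throwaway container created from image."""
    return subprocess.run(["docker", "run", "--rm", image, *argv],
                          capture_output=True, text=True)


def check_programs(image: str, checks: Dict[str, List[str]] = CHECKS,
                   runner: Runner = run_in_image) -> List[str]:
    """Return one message per program that failed to run (empty list = all good)."""
    failures: List[str] = []
    for name, argv in checks.items():
        result = runner(image, argv)
        if result.returncode != 0:
            # A wrong-architecture binary or a missing shared library typically
            # shows up as a non-zero exit status with an error on stderr.
            detail = (result.stderr or "").strip()
            failures.append(f"{name}: exit status {result.returncode}: {detail}")
    return failures
```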
When running the test suite, if the image to be tested is not locally available, print a message to say so instead of silently terminating.
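One way to detect a missing image is to ask docker image inspect, which exits with a non-zero status when the image is not in the local image store. The Python sketch below assumes that approach (the PR does not say how the check is implemented), and the function names are hypothetical.

```python
import subprocess
import sys
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def image_available(image: str, runner: Runner = subprocess.run) -> bool:
    """True if `docker image inspect` finds the image locally."""
    result = runner(["docker", "image", "inspect", image],
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def require_image(image: str, runner: Runner = subprocess.run) -> bool:
    """Print an explicit message, instead of silently stopping, when the image is missing."""
    if image_available(image, runner):
        return True
    print(f"Image {image} is not available locally; build or pull it first.",
          file=sys.stderr)
    return False
```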
Update the README files in the docker directory to reflect the recent changes in the different images.
This is so awesome, I can't contain myself. Running builds locally now!
Do not mandate package versions in requirements.txt. Use the constraints.txt file to freeze version numbers.
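The split between the two files can be checked mechanically. This Python sketch is a hypothetical helper, not part of the ODK: it flags any requirements.txt entry that still carries a version specifier (such as the '>= 0.0' qualifiers removed earlier) and any package that has no pinned version in the constraints. The pins argument is a plain mapping of normalized names to versions, however it was obtained.

```python
import re
from typing import Dict, Iterable, List

# "<name>[extras] <rest>": anything after the name and optional extras is a specifier.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*(.*)$")


def _normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def check_requirements(requirements: Iterable[str], pins: Dict[str, str]) -> List[str]:
    """Report requirements that carry a version specifier or lack a pin."""
    problems: List[str] = []
    for raw in requirements:
        # Ignore comments and environment markers (after ';') in this sketch.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        match = _REQ.match(line)
        if match is None:
            problems.append(f"cannot parse: {line!r}")
            continue
        name, specifier = match.group(1), match.group(3).strip()
        if specifier:
            problems.append(f"{name}: specifier {specifier!r} belongs in constraints.txt")
        if _normalize(name) not in pins:
            problems.append(f"{name}: no pinned version in constraints.txt")
    return problems
```

Keeping requirements.txt free of versions means it only states *what* is needed, while constraints.txt alone decides *which* versions get installed.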
When cleaning up redundant packages in the odkfull staging tree, make sure to remove empty directories (this was not the case because rmdir was called with an invalid -f option).
Since the ODK build process is now a bit more complex, add some docs to explain how and where to add a new component when needed.
OK, I did a few checks locally. I am wondering if more needs to be done. If you have no reasonable concerns, I am happy. I:
Happy to merge if you are.
The one thing that still needs to be tested is building multi-arch images. It should work as expected (i.e., the If you agree, what I would like to do is to temporarily bump the version number to "1.2.30-alpha" and manually trigger the
I am not worried about the -alpha; if the build succeeds and the image is pushed, then a
I was hoping there would be a way to mark an image on Docker Hub as "beta", "not-production-ready", or anything like that, but that does not seem to be the case. Damn. The next best option is then to publish under the obotools name (where we initially published the M1-specific images when we were still figuring out how to build multi-arch images) instead of the obolibrary name. I'll do that from my own repo, which is already set up for publishing in obotools.
OK, if this works as well, great. I would like to aim for an ODK update soon anyway, but I want to deal with some of the smaller tickets first. Maybe the first week of October.
GitHub multi-arch builders are so slow… 5 hours in and it's still building the
Use the same name as in the other Makefiles and make sure to push the images to the hub.
… And the build was forcibly cancelled because it was taking too long… There's a 360-minute limit for each job, and building the There are a few things we can try to circumvent this, such as creating another intermediate image just to build Konclude (so that it is built in a different job, which would be subject to its own time limit), but I am very reluctant to make the build process even more complex than it already is just to accommodate GitHub's arbitrary limits… Maybe we should just go back to building the images-to-be-released on one of our own machines and forget the
It's a pity, but I don't mind. I have servers etc. that do this work for me! Whatever you think is best. I have no problem building locally and pushing!
I'd say merge as it is then. At the next release, we can try the automated build again, and if it fails again (which is likely, unless in the meantime GitHub switches to faster machines for its builders), you can build and push the images yourself. Let me know if you need any help setting up your local builder for multi-arch builds, if you have not done so already.
When building the builder image, remove the source code and object files of the projects we are compiling (SWI-Prolog, Soufflé, Fastobo, Konclude), and only keep what has been installed in the staging area. This saves more than 1 GB in the odkbuild image.
I've just pushed one last bit to remove intermediate object files from the I am done tweaking things with that PR, I promise! :)
Thank you, running tests now :) Is it ready to merge then? Are you happy? If my tests pass here, I will just merge it.
Still disappointed by how slow the GitHub builders are, and the subsequent inability to use the
First failure on my third server:
Let's see whether it was a fluke.
Unfortunately it was not. Building odklite fails on my Ubuntu machine.
I am not too worried, but so far I had been able to run the builds there. What is your opinion?
I can't reproduce the error here, whether on my macOS machine or my Linux machine… What's the Docker version on your Ubuntu 18.04 machine?
I completely uninstalled and reinstalled Docker:
So that's the same version as the one on my macOS machine, and an even fresher version than the one on my Linux machine… Sorry, I have no idea what could cause the problem here. It looks as if
Maybe it's related to the OS version in some weird way we don't understand. Let's merge it; it works on two of my machines, so it's fine.
This PR completely overhauls the way both odkfull and odklite are built, in order to:

Along the way, it also includes the proposed freeze of Python packages (#463), fixes the broken Konclude (#473), and adds some basic checks to ensure the various programs of the ODK are installed correctly (#472).
Basically, building odkfull is now a three-step process:

(1) First, an intermediate image called odkbuild is built. This image serves as a sort of "staging area", where we can install all the build-time dependencies we want (without worrying about image size) and build everything we need. This includes installing the Python modules, because some of them have to be compiled on arm64 for lack of pre-compiled arm64 wheels.

(2) Second, odklite is built, using some of the tricks proposed in #471 to keep the image size minimal (mostly forbidding apt-get from installing "recommended" packages and using a headless variant of the JRE). That image contains ROBOT, DOSDPTOOLS, OWLTOOLS, and Ammonite, along with the minimal set of Python modules required to run odk.py. It is intended to be sufficient for use with a "standard" ODK-generated repository. Size of the image: ~650 MB.

(3) Third, odkfull is built on top of odklite (which is now, by construction, a strict subset of odkfull), adding Konclude, Soufflé, Jena, SPARQL-prog, along with a richer set of Python modules and build tools. Size of the image: ~1.8 GB.