-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
container-storage-setup fails when updating Atomic Host OS on existing system #267
Comments
I think problem is with the upgrade. Upgrade should not have overwritten /etc/sysconfig/docker-storage-setup. It should write STORAGE_DRIVER=overlay2 only if this file is not present. @cgwalters Any thoughts on what's going on. cc @rhatdan |
Yeah, I have a variant of this on my workstation too;
Which IIRC I upgraded from F26. So...we obviously talked about this scenario in the original PR: Rereading that I'm not certain we actually found a solution to that problem? You know in general I sort of feel like c-s-s should just be a "run at most once" service, like we drop a stamp file in |
To give my answers to the questions:
It's a bug.
Yes, it's safe. |
@cgwalters @rhvgoyal |
I haven't fully analyzed this but here's what I think is happening. Today in Fedora docker spec we mark it as (ostree-managed systems are basically Now the problem comes in that the docker package also has a crazy posttrans which overrides its own config file - hence making it modified. So on "classic" systems managed with |
To rephrase this: a very important detail here being that with "classic" systems, To say it another way: what the |
Actually this is all fixed since ec391a1 But we haven't actually e.g. shipped any updates to c-s-s in Fedora 28 since https://src.fedoraproject.org/rpms/docker/blob/901d9fb5aba12bde899b8be1b915b4a60a4f9afc/f/docker.spec#_52 which is...ancient. |
No wait, that's dead code since we split off c-s-s into its own package. But we're still waiting on https://bodhi.fedoraproject.org/updates/FEDORA-2018-03bdc0733a |
Without going into specific details, so how %config(noreplace) equivalent is implemented, in atomic image upgrades. IOW, if a user decided to edit a file in /etc/sysconfig/ which now should not be replaced over upgrade, how atomic upgrade takes care of that? |
Ok, in fedora docker spec, /etc/sysconfig/docker-storage-setup is generated on the fly (only if file is not present already). And it is marked as %ghost file. That is package does not install it but if package is un-installed, then this file will be removed. This means that over upgrade of docker package, this fill will not be touched at all. Only during fresh install, it will be created as part of %posttrans. That also means, any changes user/other scripts have made, will continue to be there over upgrade. How does atomic upgrade take care of this situation. That is user made changes are not lost over upgrade for a file present in /etc/ To me it does not matter if changes to /etc/sysconfig/docker-storage-setup are done in %posttrnas, or manually by user or by an external script or by cloud-config. Atomic has to deal with it over upgrade. |
It's implemented in libostree. There's some mention of this in https://ostree.readthedocs.io/en/latest/manual/atomic-upgrades/ Basically, modified config files are always preserved. Does it matter though now that we discovered we'd already fixed the root issue here? yum-managed systems will stay with the default config file, but that seems OK since we want them to also stay with devicemapper anyways by default. |
container-storage-setup fix is more of a workaround. We should not even be getting to that point if our assumptions w.r.t upgrades are correct. I am trying to understand that why did we end up in this situation at all over upgrade. That is not clear to me yet. My guess is that atomic overwrote /etc/sysconfig/docker-storage-setup and that's a problem. We have not addressed the real problem. |
Why did storage driver change to begin with over upgrade, that's the real problem which needs fixing. How container-storage-setup copes with the change, is a different issue altogether. |
I don't agree, getting updated config files should be the default. What's happening in the |
Documentation says following. "Now that we have a deployment directory, a 3-way merge is performed between the (by default) currently booted deployment's /etc, its default configuration, and the new deployment (based on its /usr/etc)." I am not sure what does it mean. Why /etc/ needs to be touched by upgrade at all. That's host detail and should probably be touched only during install. Any upgrades should go to files in /usr/... dir. I mean if a user installed atomic images with some custom changes in /etc/sysconfig/foo.txt using cloud-init, how would you retain these changes over upgrade if you decide to override these files. |
How does it matter if /etc/sysconfig/docker-storage-setup was touched by %posttrans. It could have very well be done by cloud-init. I really can't see anything wrong with what's being done in %posttrans. That section is there so that user can do system specific things. IMO, real problem is that atomic upgrades are modifying a file over upgrade, which they should not be modifying. |
I think real issue is that atomic upgrade will not deal with any file being dropped by %posttrans section in /etc. We drop this file to setup default for the system and it is marked as "%ghost". So while it works well for rpms, it seems to become an issue with atomic. I am not sure how to setup a default on a system dynamically during package installation if I don't use %posttrans section. Dropping config files only works for static defaults. |
Do it at service startup via systemd. If you think about it, that's what we're effectively doing today as state is also stored in |
We had gone towards that path with the option of STORAGE_DRIVER=auto and ran into various issues. Also, that means we will need to carry additional patches to source code to change fedora/redhat defaults. (if they are different from upstream default. For example, upstream default is devicemapper while we want overlay2 as default). |
@cgwalters looks like exiting early is not enough. It exposes a race where we wait for thin pool to come up and now that wait does not happen and docker fails saying thin pool device does not exist. @dustymabe is able to reproduce it. We had fixed it using following commit. |
reference: https://pagure.io/atomic-wg/issue/498 |
OK, and we reintroduced that race condition because the "storage is changed" check is before the thin pool wait, and this only affects Atomic Host as we propagated the new default config file. Whee. So... ?
|
I don't want to be calling setup_lvm_thin_pool, when I know /etc/sysconfig/docker-storage-setup has changed and it has lost valid options like AUTO_POOL_EXTEND=yes. That would disable auto extension. I am working on a hack patch to just wait for thin pool to come up if storage driver is already configured and it has changed. |
This also breaks the reset storage path. That path looks at /etc/sysconfig/docker-storage-setup to figure out storage driver and other options and reset accordingly. |
Ok, I have proposed a PR to fix this corner case here. PTAL |
Hi,
We are using Atomic Host CentOS on our systems in the field to run our docker application containers.
We are also running cockpit to have easy access to system services and Atomic Host updates. We noticed when updating to the latest Atomic Host release (via cockpit), docker storage setup fails on reboot.
systemctl diagnostics give the following errors:
`bash-4.2# systemctl status docker-storage-setup.service
â docker-storage-setup.service - Docker Storage Setup
Loaded: loaded (/usr/lib/systemd/system/docker-storage-setup.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2018-04-23 10:48:40 CEST; 42min ago
Process: 859 ExecStart=/usr/bin/container-storage-setup (code=exited, status=1/FAILURE)
Main PID: 859 (code=exited, status=1/FAILURE)
Apr 23 10:48:39 master systemd[1]: Starting Docker Storage Setup...
Apr 23 10:48:40 master container-storage-setup[859]: ERROR: Storage is already configured with devicemapper driver. Can't configure it wi... retry.
Apr 23 10:48:40 master systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Apr 23 10:48:40 master systemd[1]: Failed to start Docker Storage Setup.
Apr 23 10:48:40 master systemd[1]: Unit docker-storage-setup.service entered failed state.
Apr 23 10:48:40 master systemd[1]: docker-storage-setup.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
`
I think this is caused by the docker storage system switching to overlay2 FS by default ?
On a system BEFORE the Atomic Host update I notice that the file /etc/sysconfig/docker-storage-setup contains the configuration:
CONTAINER_THINPOOL=docker-pool
On a system after the Atomic Host update I notice that the same file now contains the configuration:
STORAGE_DRIVER=overlay2
There is already a docker storage setup on this system however, in /etc/sysconfig/docker-storage:
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/cah_master-docker--pool --storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true "
My understanding is that this existing configuration is what container-storage-setup complains about during startup of the updated system.
I have two questions:
The text was updated successfully, but these errors were encountered: