
Dockerfile: use correct VOLUMEs #23

Closed
wants to merge 1 commit into from

Conversation

@cyphar (Member) commented Dec 14, 2016

The persistent data is just the contents of config/ (configuration),
apps/ (installed apps), and data/ (user data). The source code should
not be made into a volume because it will cause persistence issues when
you try to update an image.

This matches how ownCloud's setup works as well.

Signed-off-by: Aleksa Sarai [email protected]
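
For reference, a run command against an image with the VOLUMEs proposed here would look roughly like the following. This is only a sketch: the volume names are placeholders, and it assumes the image is tagged nextcloud.

docker run -d --name nextcloud \
  -v nextcloud_config:/var/www/html/config \
  -v nextcloud_apps:/var/www/html/apps \
  -v nextcloud_data:/var/www/html/data \
  nextcloud

The rest of /var/www/html then comes from the image itself, so replacing the image replaces the code.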

@SnowMB (Contributor) commented Dec 14, 2016

I haven't dug into this topic, but is this the reason why there are additional files left over after updating and the integrity check fails?

For example:

[gallery] => Array
   (
       [EXTRA_FILE] => Array
           (
               [.github/CONTRIBUTING.md] => Array
                   (
                       [expected] => 
                       [current] => 19c87bcc731aea406018e9621d810c4e8b661d39689156ef026be0bd35c864ccba9c51fdae82589e63aa65e786652fe0052d8b6375f0cd4e3e176d1d926a31ab
                   )

           )

   )

[user_saml] => Array
   (
       [EXTRA_FILE] => Array
           (
               [js/preauth.js] => Array
                   (
                       [expected] => 
                       [current] => ed3e953457de95454fe5e8749ab6ea62c11bd005f6e787c66e53ff0648e777f91f343b9f8add432e26e7bbfda35aaf34675b199683e93cc698ac0f31dcedcd5a
                   )

           )

   )

I just ignored it in the past.

@cyphar (Member, Author) commented Dec 14, 2016

@SnowMB It's possible, though I assume that the "upgrade path" for Docker images is the same (you run the upgrader, and you don't start a new image with the new code). So unless you're restarting into a new image with /var/www/html as a volume I think it shouldn't make a difference.

@pierreozoux (Member) commented:

@cyphar different remarks:

What do you think?

@SnowMB (Contributor) commented Jan 7, 2017

After reading a bit further into the Docker documentation and fooling around a little, I think I can explain in more detail what the problem is. Correct me if I'm wrong.

Since the Dockerfile defines the whole /var/www/html folder as a volume, stopping and restarting the container reuses the same files, and changes made to the installation files are persistent.
So upgrading from the updater app works fine at first, but it can have consequences in the long run. If you somehow rebuild the image or switch machines, another volume is used, and the Nextcloud installation falls back to what is specified in the Dockerfile.
The volume also remains on the host as a dangling volume even when no Nextcloud container is running anymore.
That has hit me really badly a few times now ;). You have to specifically delete the volume to force a new build and update. For example, if you have already built and run the container with Nextcloud 10.0.2 (ENV), then change the variable (e.g. to 11.0.0) in the Dockerfile, rebuild and run again -> surprise, you are still running a 10.0.2 instance.

I think this is not desirable behaviour.

I would suggest, just as cyphar said, ditching the /var/www/html volume and using the three specific ones. You can then be sure that running a Docker container with a specified version actually runs that version and not whatever is left over in your volume.

For upgrading, this means that using the updater app will not cause a persistent change; you have to use the "Docker way" instead. That means killing the old container and starting a new one with the newer Nextcloud installation.

Another possibility would be to recommend keeping the whole Nextcloud installation and all data on the host. That means changing the documentation and dropping the "fine grained" approach. When running the container, one would then always mount the whole /var/www/html folder. Upgrading with the updater app is then possible and always persistent.
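
To illustrate the "dangling volume" behaviour described above (a sketch, assuming the container was started without an explicit -v for /var/www/html):

# the VOLUME /var/www/html declaration creates an anonymous volume per container
docker inspect -f '{{ .Mounts }}' <container>
# that volume outlives the container and keeps the old source code around
docker volume ls -f dangling=true
# only removing it forces the entrypoint to extract the code from the new image
docker volume rm <volume-id>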

@SnowMB (Contributor) commented Jan 9, 2017

Possibly related: #24

@cyphar (Member, Author) commented Jan 13, 2017

So, @SnowMB already explained this for me while I was on holidays. 😉

Odd. That means they have the same issue as us.

  • if we accept this change, we also need to update README in the hub

Sure, that can be done.

I'm a bit afraid of the consequences for the users that use these volumes

I'd need to test how --volumes-from acts in this instance, but the issue with keeping it as-is is that the Docker image isn't actually being used to ship source code -- the source is stored inside a volume alongside the data.

@SnowMB (Contributor) commented Jan 13, 2017

Just adding a little info:

Docker image isn't actually being used to ship source code -- the source is stored inside a volume alongside the data.

That is not entirely correct. The source code is present inside the container in /usr/src/.
But when running the container, the source code inside the volume is used.

The source code from inside the container is only copied to the volume if the version.php file is missing. (I don't know why this file was chosen.) See https://github.com/nextcloud/docker/blob/master/docker-entrypoint.sh#L4.
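
For anyone following along, the check referenced above is roughly the following (paraphrased from the entrypoint, not quoted verbatim):

# docker-entrypoint.sh (paraphrased): populate the volume only when it looks empty
if [ ! -e /var/www/html/version.php ]; then
    # copy the code shipped in the image into the (possibly mounted) volume
    tar cf - --one-file-system -C /usr/src/nextcloud . | tar xf -
fi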

@cyphar (Member, Author) commented Jan 24, 2017

Rebased atop master. PTAL.

The persistent data is just the contents of config/ (configuration),
apps/ (installed apps), and data/ (user data). The source code should
not be made into a volume because it will cause persistence issues when
you try to update an image.

Currently the existing setup means that standard Docker-style updates
(where you start a new Docker image) won't actually update the source
code -- causing confusion and other issues.

This matches how ownCloud's setup works as well.

Signed-off-by: Aleksa Sarai <[email protected]>
@albgus commented Jan 29, 2017

I have run with this kind of separation for a while, or rather the "fine grained data persistence" setup from the current README. I have seen the same issues that @SnowMB saw, specifically a lot of leftover files from the apps/ tree after an update from 10 to 11. I found that when /var/www/html/apps was persisted while /var/www/html wasn't, the entrypoint script would extract the tar file, but it does not remove files from the core apps that shouldn't exist anymore.

While this change would mean that updating the container "the Docker way" becomes possible, I'm not sure what the impact of possibly leaving old files behind could be in the long run. Preferably all code should be updated at the same time, but as long as core and third-party code are mixed in one volume this will stay tricky.

Either way, the correct update procedure must be properly documented. Currently the default is to have /var/www/html as a volume, but the documented update procedure is to spin up a new container, so the code will not actually be updated since version.php still exists.

@cyphar (Member, Author) commented Jan 29, 2017

@Gusson Both upgrade methods would actually work with this PR applied. You could start up a new container with the new version and do php occ maintenance:upgrade, or do an upgrade inside the running container. Restarting the container with a newer version should continue to work (because the code will be identical) -- and if you set up your volumes correctly your data will be safe.

The first method does not work with the current images, and changing images doesn't actually help the situation. Sure, you can use the "fine grained" method, but declaring /var/www/html as the VOLUME will break things like --volumes-from.

@SnowMB (Contributor) commented Jan 29, 2017

I just recently updated to 11.0.1 (started with 10.0.0) and it's annoying how many leftover files I've got in my apps folder now. The "Docker way" to upgrade definitely runs into problems there.

I ignored the integrity check in the past, because it only showed a few files, but this time it went nuts ;)

Does anyone know any easy cleanup methods?

@cyphar (Member, Author) commented Jan 30, 2017

When you say "docker way", what are you referring to? Do you mean that you have /var/www/html/{data,apps,config} as volumes and /var/www/html is not a volume, and that you started a new container with the same volumes?

In that case, the integrity check failing outside of those directories shouldn't be caused by the volumes being there. Is there some process that happens during "official" upgrade which involves anything other than swapping out the source code -- is there some change to data, apps or config which isn't done if you just swap out the source tree?

@SnowMB (Contributor) commented Jan 30, 2017

Sorry for the confusion.

My configuration is set up this way:

The Dockerfile contains the line: VOLUME /var/www/html
The docker-compose file contains the lines:

volumes:
      - ./nextcloud/apps:/var/www/html/apps
      - ./nextcloud/config:/var/www/html/config
      - ./nextcloud/data:/var/www/html/data

When I upgrade, I stop and delete the Docker container, then pull or build the new image and run the new configuration.

docker-compose down -v
docker-compose pull
docker-compose up -d

When I next access the site, I go through the upgrade dialogue for my data.

In that case, the integrity check failing outside of those directories shouldn't be caused by the volumes being there. Is there some process that happens during "official" upgrade which involves anything other than swapping out the source code -- is there some change to data, apps or config which isn't done if you just swap out the source tree?

Hmm, I just realised that this way I might miss changes to the system apps (the default ones, or whatever you would call them) completely. 😟

@ambis commented Mar 3, 2017

Does this all also mean that you are not going to release patches (like 11.0.1->11.0.2) to the Nextcloud docker hub image?

I've been anxiously waiting for an 11.0.2 tag. But do you guys think the updater is the correct way to update?

Personally, I'm completely with everyone here suggesting /var/www/html should NOT be a volume. Using Docker completely removes the need to have an updater in the first place. And my updater does not even work because of this: https://github.com/nextcloud/server/issues/3663

@SnowMB (Contributor) commented Mar 3, 2017

Like I said in another issue, just make a quick PR and change the version number from 11.0.1 to 11.0.2 and it will get merged. Someone just has to do it ;)

@ambis commented Mar 3, 2017

@SnowMB See #39

@pierreozoux (Member) commented:

In the meantime, I remembered why /var/www/html is a volume:

https://github.com/indiehosters/nextcloud/blob/master/docker-compose.yml#L14-L15
https://github.com/indiehosters/nextcloud/blob/master/nginx.conf#L82

It allows us, and people in general, to mount the assets into nginx from the fpm image.
(This was the recommended way to do it before, but we can discuss whether it is still relevant.)
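
With plain docker commands the asset-sharing setup looks roughly like this (a sketch; the container names are illustrative, and the nginx config must point its root at /var/www/html as in the linked nginx.conf):

# the fpm container owns the /var/www/html volume declared in the image
docker run -d --name nextcloud-fpm nextcloud:fpm
# nginx reuses that volume read-only so it can serve the static assets
docker run -d --name nextcloud-web -p 80:80 --volumes-from nextcloud-fpm:ro nginx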

So let's go back to the problem space.
What is the real problem? The update?
Then we can be slightly smarter and compare the versions in /usr/src/ and /var/www/html.
We were discussing this with invoiceninja, and this is the way they solved it:
https://github.com/invoiceninja/dockerfiles/blob/master/entrypoint.sh#L26-L31

@cyphar (Member, Author) commented Mar 14, 2017

@pierreozoux From my point of view, the problem that I was trying to fix is that the upgrade process for Docker containers is not well-defined. Either you say that we should always run the upgrade process in the same way as you would on a normal server, at which point you're not really using Docker "properly" (you're treating the container as a VM) -- which is fine, but it needs to be documented properly. Or you embrace the "Docker way" of managing version upgrades, in which case it needs to be possible to restart a container with a newer version of NextCloud and then simply do docker exec -u www-user -it <container> php occ upgrade to perform the upgrade.

The second option is currently broken because all of the source code is stored in the volume, so changing the image you're using doesn't change the source code (and this will eventually break if you start upgrading libraries or other things that become too new for older NextCloud versions).
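
A sketch of that second workflow, assuming the source is no longer kept in the volume (container and volume names are placeholders):

# pull the newer image and recreate the container on top of the same data volumes
docker pull nextcloud
docker stop nextcloud && docker rm nextcloud
docker run -d --name nextcloud \
  -v nextcloud_config:/var/www/html/config \
  -v nextcloud_apps:/var/www/html/apps \
  -v nextcloud_data:/var/www/html/data \
  nextcloud
# then run the upgrade against the new code
docker exec -u www-data -it nextcloud php occ upgrade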

@SnowMB (Contributor) commented Mar 14, 2017

@cyphar, @pierreozoux: there is one problem remaining. The upgrade process (before upgrading the data) also makes changes to the apps folder. I'm guessing it's updating the core apps that are bundled with the Nextcloud installation. So at the moment the Docker way to upgrade (pull new, throw away old, start new) would only be possible if neither the whole /var/www/html folder nor the /var/www/html/apps folder is mounted as a volume.

But mounting at least the apps folder makes sense, because you wouldn't want to lose your apps when simply recreating the image (without any update).

It's a problem where the implementation wasn't designed for Docker and is therefore a little rough around the edges 😕.

I don't know how this would affect the solution @pierreozoux suggested. Maybe you have a word on that?

Right now the best solution I see, is sticking with the web-updater.

@pierreozoux (Member) commented:

The solution space is full of hope once the problem space is well defined :)

I've been using this to track how to update ownCloud/Nextcloud with different versions:
libresh/compose-owncloud#3

And so the conclusion, in pseudo code, would be to replace this block with:

if version is different in /usr/src and /var/www/html then:
  tar cf - --one-file-system -C /usr/src/nextcloud . | tar xf -
  php occ app:list > /tmp/list_before
  for app in `ls /usr/src/nextcloud/apps/`; do rm -rf ./apps/$app;cp -TR /usr/src/nextcloud/apps/$app/ ./apps/$app/; chown -R www-data:www-data ./apps/$app/; done
  php occ upgrade --no-app-disable
  php occ app:list > /tmp/list_after
  echo "the following app have beed disabled:
  diff <(sed -n "/Enabled:/,/Disabled:/p" /tmp/list_before) <(sed -n "/Enabled:/,/Disabled:/p" /tmp/list_after) | grep '<' | cut -d- -f2 | cut -d: -f1
fi

We have to make sure that the tar keeps data, apps and config folder in shape.
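
One way to be explicit about that, assuming rsync is available in the image, would be to sync the new code while excluding the persistent directories instead of relying on the tar extraction alone (just a sketch; core apps would still be handled by the loop above):

# refresh the server code but never touch the persistent directories
rsync -a --delete \
  --exclude /config/ \
  --exclude /data/ \
  --exclude /apps/ \
  /usr/src/nextcloud/ /var/www/html/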

Then, we are as good as the web updater:

  • we keep config/data and apps
  • integrity check should be good
  • apps should be updated
  • it is "docker compliant"

What do you think?

@SnowMB (Contributor) commented Mar 15, 2017

Sounds great.

The only thing that could go bad is the following:

if version is different in /usr/src and /var/www/html then:

We should actually check whether the version is greater, because... you know... downgrades are not supported 😉
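
A sketch of what that "only upgrade, never downgrade" check could look like in the entrypoint; reading the version out of version.php and using sort -V for the comparison are assumptions here, not what the image currently does:

installed_ver="$(php -r 'require "/var/www/html/version.php"; echo implode(".", $OC_Version);')"
image_ver="$(php -r 'require "/usr/src/nextcloud/version.php"; echo implode(".", $OC_Version);')"

if [ "$installed_ver" != "$image_ver" ] && \
   [ "$(printf '%s\n%s\n' "$installed_ver" "$image_ver" | sort -V | tail -n 1)" = "$image_ver" ]; then
    echo "upgrading from $installed_ver to $image_ver"
    # ... run the steps from the pseudo code above ...
else
    echo "image version is not newer than the installed one, skipping (no downgrades)"
fi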

@SnowMB mentioned this pull request Mar 15, 2017
@radhus (Member) commented Mar 16, 2017

I would personally like to see upgrading Nextcloud become as simple as stepping the image version, and I think @pierreozoux's proposal looks good.

I do however still have my doubts whether keeping the source extracted to a volume is a good idea in the long run. Things I've thought about:

  1. It would be nice to have the Nextcloud code immutable. In case of unwanted modification, simply relaunching the container would solve it, or the code could even be mounted as read-only from the beginning.
  2. Will upgrades always work with the current solution? Like when the Docker image base is stepped from PHP 5.x to 7.x while older Nextcloud releases didn't support PHP 7. Would the old code support the upgrade?
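
Point 1 could look something like this once the code is baked into the image (a sketch only; in practice the container would probably need a few more writable tmpfs mounts for Apache/PHP to start):

docker run -d --read-only \
  --tmpfs /tmp --tmpfs /run \
  -v nextcloud_config:/var/www/html/config \
  -v nextcloud_apps:/var/www/html/apps \
  -v nextcloud_data:/var/www/html/data \
  nextcloud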

@MK-42 commented Mar 18, 2017

But mounting at least the apps folder makes sense, because you wouldn't want to lose your apps when simply recreating the image (without any update).

Wouldn't custom app directories help there? Put the original apps dir in read-only mode and allow the user to install their own apps in another directory (mounted as a volume). Wonderfall/Nextcloud does the same thing, as far as I can tell, here.

I don't know if that addition can be made without touching the whole bootstrapping/installation. Maybe using an autoconfig.php?
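
For illustration, the split described here would keep the bundled apps read-only inside the image and persist only user-installed apps. The custom_apps directory and the volume names are hypothetical, and it relies on Nextcloud's apps_paths setting in config.php listing both directories (the bundled one non-writable, the custom one writable):

# bundled apps stay inside the image; only user-installed apps are persisted
docker run -d --name nextcloud \
  -v nextcloud_custom_apps:/var/www/html/custom_apps \
  -v nextcloud_config:/var/www/html/config \
  -v nextcloud_data:/var/www/html/data \
  nextcloud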

@SnowMB (Contributor) commented Mar 19, 2017

Wouldn't custom app directories help there?

I love the idea of splitting the app folder.

However, it would only be easy to implement this for new installations where config.php does not exist yet. People who switch to Docker and have existing config and data folders would have to edit their config file by hand (which might be a bit inconvenient), or someone would have to write a script that checks and manipulates the config file.

@ambis commented Mar 20, 2017

However, it would only be easy to implement this for new installations where config.php does not exist yet.

How about a simple "How to migrate your installation to a Docker container" tutorial? Maybe on the Docker Hub page?

I think the gain here is much greater than the small amount of trouble for those migrating from an existing installation into Docker containers. If that even happens that often?

@MBurchard commented:

Just my 5 cents on this topic.

If I update my Docker image, it's absolutely clear that I can't just start a previous image, access the data directory again, and have everything be as it was before. That cannot work. The database schema has potentially been updated as well.

So in my opinion a Docker update must automatically remove all old files that are not data (meaning user content) and replace them with new versions. The image documentation should point this out, so that everybody is aware of it.

If possible, it would be nice if manually installed/activated apps like "Two Factor TOTP" were not deactivated...

@ChessSpider commented Apr 12, 2017

In the meantime, pretty, pretty please just remove any and all VOLUME declarations from the Dockerfile and let docker-compose or the docker run command handle it.

This gives 100% control to the user, and doesn't cause any unexpected behaviour if someone is doing something silly like trying to have their fpm container start/run as www-data instead of the default root (and then dropping privileges).

This currently fails because www-data tries to unpack into the VOLUME-declared /var/www/html, which is owned by root, and therefore fails. It also prevents me from unpacking the code myself and properly setting the permissions (everything read-only except for the dynamic stuff like the data directory) in my own Dockerfile, because everything is gone when the container starts, since at that point the VOLUME takes over...

(Tbh, I don't understand why I cannot undo the VOLUME declaration in my own Dockerfile.)

@tilosp requested review from pierreozoux and removed the request for pierreozoux on April 13, 2017 08:20
@tilosp (Member) commented Apr 14, 2017

I started to work on this in #65.

@cyphar (Member, Author) commented Apr 17, 2017

@ChessSpider

In the meantime, pretty, pretty please just remove any and all VOLUME declarations from the Dockerfile and let docker-compose or the docker run command handle it.

Absolutely, most definitely, not. This would break --volumes-from and would make it possible for users to accidentally snapshot their data into the container. Let's not make our lives harder by constructing new footguns for no reason.

@ChessSpider commented Apr 17, 2017

@cyphar

Absolutely, most definitely, not. This would break --volumes-from and would make it possible for users to accidentally snapshot their data into the container. Let's not make our lives harder by constructing new footguns for no reason.

About volumes_from: in Docker Compose v3 they removed that syntax for a good reason. Generally you only want specific volumes from a different container -- e.g. mount the volume with only the static files (css, js, etc.) read-only in your nginx, as opposed to also mounting the volumes with the PHP config files.

I'm not really sure what your second argument is, but if you're arguing against putting control in the hands of the users then I do oppose it.

I do agree with the underlying argument that changing critical behaviour of the container without properly informing the users is a really bad idea.

But the fact remains that the VOLUME declaration is being used for source files. AFAIK that is against Docker best practices, which state that user data should live in dynamic volumes and that a new container should be created when the source code changes. Or at least that's how I interpreted it.

At the moment I have these problems with this official Nextcloud repository:

  1. I cannot start the container as a non-privileged user, because the VOLUME created by Docker is owned by root and the extract script obviously cannot write to that folder.
  2. I cannot start the container with the read-only flag, because extracting the source code is critical behaviour of the container. (Note: mounted volumes (on RW locations like the data or config directories) are not affected by the container's read-only flag.)
  3. The container recommends the use of a Compose file which mounts the entire source code of Nextcloud, including essential PHP configuration scripts, into an ill-configured webserver. Instead, only the directories containing static files should be explicitly mounted, in read-only mode.

Anyway, I'm not just ranting for the sake of ranting. I'm willing to help and be part of the solution :).

@tilosp (Member) commented Apr 17, 2017

@ChessSpider what do you think about the changes in #65 ?

@ChessSpider commented:

@tilosp Thanks, I'll have a look tomorrow.

@cyphar (Member, Author) commented Apr 19, 2017

@ChessSpider

About volumes_from: in Docker Compose v3 they removed that syntax for a good reason. Generally you only want specific volumes from a different container -- e.g. mount the volume with only the static files (css, js, etc.) read-only in your nginx, as opposed to also mounting the volumes with the PHP config files.

Okay, but there's also Docker's volume drivers which require the image to contain VOLUMEs. This isn't a contentious position, mysql (for example) uses VOLUMEs for its database files. And so does basically every other project that wants to have some persistent data.

I'm not really sure what your second argument is. But if you're arguing against putting control in the hands of the users then I do oppose it.

Your change will restrict users from being able to use features of Docker. My suggestion will make it so that they don't do something they shouldn't be doing (snapshotting their data into a container image rather than having it persistently stored external to the image -- which is a horrible practice to encourage).

I do agree with the underlying argument that changing critical behaviour of the container without properly informing the users is a really bad idea.

But the fact remains that the VOLUME declaration is being used for source files. AFAIK that is against Docker best practices, which state that user data should live in dynamic volumes and that a new container should be created when the source code changes. Or at least that's how I interpreted it.

Yes, that is correct. And I would love it if we could split up the data into separate places so we could just create volumes for the non-source-code bits. And trust me, I tried. However, that's not how NextCloud works at the moment. Apps are source code that you want to be persistent, because they're (mostly) not shipped as part of the image -- they are additions to the image. In addition, the updating process for apps is entirely separate from the process for the core server. And there is also some weirdness that happens if you run the upgrader app (it doesn't appear to upgrade everything that needs to be upgraded). This might be because it uses --one-file-system or something like that, which will break if you have a separate volume for the different subdirectories of the source code.

I cannot start the container as a non-privileged user, because the VOLUME created by Docker is owned by root and the extract script obviously cannot write to that folder.

... You can chmod the folder you're connecting the volume to. If you want we can also just add a chmod to the image as well.
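
A sketch of that workaround when bind-mounting from the host, using chown rather than chmod since the complaint is about ownership (uid/gid 33 is www-data in the Debian-based images; the paths are only examples):

mkdir -p /srv/nextcloud/html
chown -R 33:33 /srv/nextcloud/html
docker run -d -v /srv/nextcloud/html:/var/www/html nextcloud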

I cannot start the container with the read-only flag, because extracting the source code is critical behaviour of the container. (Note: mounted volumes (on RW locations like the data or config directories) are not affected by the container's read-only flag.)

If you actually use the volume you won't have this problem (mount the entire http volume).

The container recommends the use of a Compose file which mounts the entire source code of Nextcloud, including essential PHP configuration scripts, into an ill-configured webserver. Instead, only the directories containing static files should be explicitly mounted, in read-only mode.

Yes, I would love if we did that but as I mentioned above, I don't know if NextCloud would allow us to do that in a sane way. I've run into a lot of issues with splitting up the volumes and upgrading. It's not as simple as it sounds to fix.

@ChessSpider commented:

@cyphar of course I trust you 👍

Okay, but there's also Docker's volume drivers which require the image to contain VOLUMEs. This isn't a contentious position, mysql (for example) uses VOLUMEs for its database files. And so does basically every other project that wants to have some persistent data.

Where did it say in the documentation that Docker's volume drivers require the Dockerfile to contain VOLUME declarations? This is new to me. To my knowledge a volume can be mounted with docker run -v at any location, and docker-compose can do that too.

Your change will restrict users from being able to use features of Docker. My suggestion will make it so that they don't do something they shouldn't be doing (snapshotting their data into a container image rather than having it persistently stored external to the image -- which is a horrible practice to encourage).

Please see my docker-compose examples in #65 for the functionality I'd like to have from Nextcloud.

(....) In addition, the updating process for apps is entirely separate from the process for the core server. (....)

Which is why I propose, in #65, to disable the update process for the apps, so that users are forced to do a docker pull to upgrade. This ensures users get both the latest version of Nextcloud and the latest versions of all (host) dependencies (and their security fixes!).

... You can chmod the folder you're connecting the volume to. If you want we can also just add a chmod to the image as well.

Maybe a chown to www-data is better? It would definitely be helpful <3, but it's not a main issue.

Yes, I would love if we did that but as I mentioned above, I don't know if NextCloud would allow us to do that in a sane way. I've run into a lot of issues with splitting up the volumes and upgrading. It's not as simple as it sounds to fix.

I'm beginning to notice that as well! If Nextcloud still had asset pipelining it would've been a lot easier: just put a VOLUME on /assets, re-run the asset pipeline on upgrade, and done! But I still think keeping the VOLUME on /var/www/html is a bad thing. You don't need it, and you want users to not upgrade the app in place but always do a docker pull, because security.
