Why is there volume for data in the first place? #255

etki · 2017-01-11T16:28:02Z

Hi.
My inquiry may seem strange, but I really don't get it.
Why does the MySQL Dockerfile contain a VOLUME directive? From my perspective, there are more cons than pros:

(+) Users who did not specify a host mount on startup have a chance to recover their data
(-) An anonymous volume is terribly hard to find once the container is gone and you have literally hundreds of them
(-) Volumes consume free space at dramatic speed, because there is no GC. They were designed not to be garbage collected, which means they should not be created automagically, because that in turn means they would require garbage collection.
(-) Users who persist their data will certainly use a regular host mount, which renders the volume useless.
(-) Users who run tons of CI builds a day and thought they could finally forget about data bloat when using containers are highly annoyed when they discover the consequences.
(-) You can add a volume later, but you literally can't cancel a declared volume. And if you mount it to your host, and that's a remote host, you have no chance of cleaning up the volume at the end of the build.

I know I'll get a huge 'watch it, kid!' in a second, but shouldn't it be dropped? I really don't see any benefits that outweigh the drawbacks it brings.
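To illustrate the "hard to find" point, leftover volumes can at least be listed with the `dangling` filter of `docker volume ls`. This is a minimal sketch, not part of the original post: the function name `dangling_volumes` and the `DOCKER` override variable are hypothetical, and the filter matches every volume no container references, named or anonymous.

```shell
# List volumes that no container references any more.
# DOCKER is a hypothetical override so the command can be dry-run (DOCKER=echo).
dangling_volumes() {
  "${DOCKER:-docker}" volume ls -q -f dangling=true
}
```

Piping the result through `wc -l` gives a rough idea of how many volumes have piled up on a node.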

ltangvald · 2017-01-12T07:29:56Z

Volumes will also be faster than using the container's internal storage, but I agree it tends to very quickly clutter up the disk with anonymous volumes, since starting the container without any volumes specified is something I at least mostly do for quick testing.
@tianon @yosifkit
What are your thoughts on this? While volume cleanup has become simpler with the «docker volume» commands, do we need /var/lib/mysql to be a volume by default?

yosifkit · 2017-01-12T20:38:06Z

I just use a series of aliases and periodically clean up stopped containers, unused volumes, and dangling images.

alias dclean='docker ps -aq | xargs --no-run-if-empty docker rm'
alias dcleanvol="docker volume ls | awk '/^local/ { print \$2 }' | xargs --no-run-if-empty docker volume rm"
alias ddangling='docker images --filter dangling=true -q | sort -u | xargs --no-run-if-empty docker rmi'

With upcoming docker 1.13.0 there will be built-in commands with docker container prune, docker volume prune, docker image prune.

Edit: If you do docker rm -v mysql-container it will also clean up the volumes associated with the stopped container you are deleting. It is automatic on a docker run -it --rm.
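The built-in prune commands mentioned above can be wrapped in a single helper. This is a minimal sketch, not something from the image or its documentation: the function name `dprune` and the `DOCKER` override variable are hypothetical, and `-f` only skips the confirmation prompt. Newer Docker releases may limit `docker volume prune` to anonymous volumes unless told otherwise, so check the behavior of your version.

```shell
# Remove stopped containers, unused volumes and dangling images in one go.
# DOCKER is a hypothetical override so the helper can be dry-run (DOCKER=echo).
dprune() {
  "${DOCKER:-docker}" container prune -f
  "${DOCKER:-docker}" volume prune -f
  "${DOCKER:-docker}" image prune -f
}
```

Running `DOCKER=echo dprune` prints the three commands instead of executing them, which is a cheap way to review the helper before pointing it at a real daemon.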

etki · 2017-01-17T12:36:17Z

@yosifkit this is a fine workaround for local (and probably swarm, I haven't worked with it) containers, but as soon as nomad / kubernetes / another orchestration hero enters the picture, things go bad; sometimes you don't have easy automation for the node itself at all.

shitalm · 2017-02-08T12:31:38Z

Agree with etki. The biggest problem for us is that we can't save data in the image itself. Why not leave it to the users to decide whether they want to use volumes or not? They can always add one later, but as etki mentioned there's no way to remove it.

schwamster · 2017-05-09T20:57:14Z

We have the same problem as shitalm: we would like to add data to the image itself so we can e.g. prepare test/demo data or just deliver static/read-only data. It's tedious to have to copy Dockerfiles, remove the VOLUME instruction and build the image ourselves... I could also live with a separate image with a tag such as 5.7-no-volume.

tianon · 2017-05-09T20:59:15Z

If you run your container with "--datadir=...", you should be able to adjust where MySQL stores data (or put "datadir" into a configuration file in your image), which will then free you from the volume.

etki · 2017-05-10T10:07:50Z

@tianon it won't. The data will be stored in another place, but an untracked directory will still be created on the host, consuming an inode. This is not as bad, but it is still something that has zero positive effect.
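Whether an image still declares a volume, regardless of where `--datadir` points, can be checked from the image configuration. The following is a minimal sketch: the function name `image_volumes` and the `DOCKER` override variable are hypothetical, while `.Config.Volumes` is the field of `docker image inspect` output that records VOLUME declarations.

```shell
# Print the volumes declared by an image (default: mysql:5.7) as JSON.
# DOCKER is a hypothetical override so the command can be dry-run (DOCKER=echo).
image_volumes() {
  "${DOCKER:-docker}" image inspect -f '{{ json .Config.Volumes }}' "${1:-mysql:5.7}"
}
```

Any path listed there gets an anonymous volume on every `docker run` that does not mount something else at that location.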

armpogart · 2017-06-23T20:07:16Z

As far as I know it's now recommended by docker team not to declare volumes in base images (or in this case official images). It would be better in my opinion to document the usage of volume, but not declare it in the image as it depends on the end user case, how he/she is going to store the data (local vs production cases).

ltangvald · 2017-06-26T06:10:52Z

@tianon, @yosifkit
Should we just remove it? The only downside I see is that people doing very basic testing (as most proper use will generally map the data directory anyway) would experience worse performance?

ltangvald · 2017-06-26T06:20:21Z

Actually, there is one other effect: https://docs.docker.com/engine/reference/builder/#notes-about-specifying-volumes
When installing the standard Debian packages (5.7 and older), /var/lib/mysql will be populated with a database as part of package installation. Since /var/lib/mysql is declared a volume that database will then be discarded.

If we just drop the VOLUME statement, the database will still be there, and no database initialization will be performed for basic testing of the image. I don't think this would require anything more than clearing out the directory after installing, though.

yosifkit · 2017-06-27T22:53:05Z

I feel like removing the volume would break many users that rely on the volume when using docker-compose. Compose tries hard to keep the volume between restarts of the container to persist the data and these users would suddenly see new deployments unable to survive a re-creation. My opinion is that if it is such a problem to have a volume defined, then docker needs to provide an unvolume/"don't use any defined volumes" (via Dockerfile and docker run) and users should set automatic volume and image deletion if space/inode usage is a problem.

I would think that most users would rather their database data preserved by default rather than discovering that their data has been automatically deleted when they did a docker-compose restart (or docker stack deploy) after bumping a database version number.

There is not a good alternative for telling the user where persistent data lives. Labels are not standardized and many users skip over the Docker Hub documentation.

Being able to build an image that ships with a database already initialized is still possible and the automatic volume would be left empty.

FROM mysql:5.7
CMD ["--datadir=/sql"]
# assuming ./sql-datadir contains an already initialized database
# copy the directory as a whole: a `*` wildcard would flatten the per-schema subdirectories
COPY ./sql-datadir/ /sql/
# on startup the entrypoint script will detect the already initialized database and start right up
# leaving /var/lib/mysql empty

or.... without having to use a different data directory:

FROM mysql:5.7
# ./sql-datadir contains a database dump of *.sql files
COPY ./sql-datadir/* /docker-entrypoint-initdb.d/
# initdb logic will restore the database via the sql files in alphanumeric order on first container start
# users will have to `docker rm -vf sql-container` when a new image is pulled with a new database dump

@ltangvald, as for the automatic population of /var/lib/mysql/ by the apt package, that is already deleted as soon as it is created (since the volume is declared later).

bflad · 2017-07-06T17:48:26Z

Would it be simple to tag the image twice for both use cases? e.g. do everything the same sans VOLUME in the Dockerfile and tag it something like #-no-volume (naming is hard) then simply have another Dockerfile do the below and tag with the existing tags:

FROM mysql:#-no-volume
VOLUME ["/var/lib/mysql"]

Image behavior stays the same for existing tags while we allow the other use case for those who want it.

ltangvald · 2017-07-11T06:28:00Z

@yosifkit I hadn't considered the compose use case
I agree this would probably be too big a behavior change to the existing images.

@bflad In general I don't think we want more files to maintain (though it's simple enough), but when/if we get a template system in place (discussed in issue #289) this might be an option.

codycraven · 2017-12-26T15:26:17Z

@yosifkit I agree with you regarding an "UNVOLUME" command, however I don't see Docker implementing that anytime in the near future.

Until that occurs we're basically stuck telling educated Docker users that they need to copy the Dockerfile from the MySQL image they want and create their own image with the VOLUME line commented out or deleted. That either prevents the user from automatically receiving potential security updates or forces them to write a script to automate the process (which makes me uneasy, but I have seriously considered it...).
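The kind of automation script alluded to here could be as small as a filter that drops VOLUME instructions from a copied Dockerfile. This is a minimal sketch under stated assumptions: the function name `strip_volume` is hypothetical, it only handles single-line VOLUME instructions (no backslash continuations), and a real rebuild would also need the other files from the upstream build context, such as the entrypoint script.

```shell
# Read a Dockerfile on stdin and write it to stdout without VOLUME instructions.
# Instruction names are case-insensitive, hence the toupper() comparison.
strip_volume() {
  awk 'toupper($1) != "VOLUME"'
}
```

It could be used as `strip_volume < Dockerfile > Dockerfile.no-volume`, followed by a normal `docker build -f Dockerfile.no-volume .`.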

I'm a heavy user of Compose (doing a lot of local-dev with Docker) and would have been perfectly fine seeing the documentation on Docker Hub stating that I need to define a volume in my run command or docker-compose service.

I know you stated that many users skip over the Docker Hub documentation, but the image is already relatively useless if you don't scroll down to read the section regarding environment variables. The Compose/Stack documentation appears before that section, which could certainly include a sample Volume definition with a comment above it, something like:

# Use root/example as user/password credentials
version: '3.1'

services:

  db:
    image: mysql
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: example
    # Use a volume to support persistent storage on container restart.
    volumes:
      - data-volume:/var/lib/mysql

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080

volumes:
  data-volume:

I'd be happy to write a suggestion for the "Where to Store Data" section as well, if that's a hangup.

thaJeztah · 2017-12-26T17:23:02Z

however I don't see Docker implementing that anytime in the near future.

If someone wants to work on that, it may be implemented, see moby/moby#3465 (comment) and moby/moby#3465 (comment)

Nobody so far offered working on it though

ufoscout · 2018-03-22T08:00:10Z

The request for docker to support an "UNSET" feature is there only to help people to cope with bad images.
In addition, it is a workaround that will force everyone to create custom images to unset something that should have never been set.
Setting an anonymous volume is clearly a bad practice that is discouraged everywhere. In my company, we use lots of different database docker images and the MySQL ones are the only ones with this annoying problem.

About this sentence:

I feel like removing the volume would break many users that rely on the volume when using docker-compose.

it is completely wrong.

In every company and project I have ever worked, when it is desired to persist data between docker restarts, either you don't delete the container or you explicitly mount a volume. I have never seen someone relying on (or being in love with) anonymous volumes in the real world.

If you don't want to break the (frustrating) behavior of this image, you should really adopt another tag and offer both the alternatives.
Anyway, from my point of view, the default tags (e.g. "5.7") should offer the behavior that everybody expects, which is without the volume; then you can extend the default image adding the VOLUME option and offer another specific tag (e.g. "5.7-persistent" or whatever). Obviously, this should be clearly reported and highlighted in the documentation.

t-hofmann · 2018-11-27T13:40:29Z

I would second the request to remove the volume definition from the Dockerfile.
Dockerfiles should merely define how an image is built (build-time configuration) and not how a container is run (runtime configuration), and I consider the definition of volumes and also ports to be runtime configuration.

As a user of docker-compose I see the build-time and runtime configuration nicely separated: the docker-compose.yml refers to the build environment (including the Dockerfile) for the underlying image, and it lets you define the runtime configuration of the actual container (including volumes and ports).

alexlatchford · 2019-01-14T17:24:10Z

Just hit this issue as well. It took several hours of a junior dev's time before we found the underlying cause, as we didn't think this would be included by default, and it was a big surprise. Having a no-volume tag would suffice for me as well, understanding the compatibility concerns, but I guess we're stuck with a forked Dockerfile for now.

swen128 · 2021-03-26T14:36:16Z

I'm confused about this part:

I feel like removing the volume would break many users that rely on the volume when using docker-compose. Compose tries hard to keep the volume between restarts of the container to persist the data and these users would suddenly see new deployments unable to survive a re-creation.

docker-compose up after docker-compose down creates a new anonymous volume. While the older volume is preserved on the host's disk, it is not reused by a re-created container unless explicitly specified.

How is docker-compose relevant to the problem? Could someone give me an example use case that would be affected by the removal of the VOLUME directive?

tianon · 2021-03-26T17:24:52Z

$ cat Dockerfile
FROM bash
VOLUME /foo
$ cat docker-compose.yml
version: '3.8'
services:
  bash:
    build: .
    tty: true

$ docker-compose build
...
$ docker-compose up -d
Starting tmp_bash_1 ... done
$ docker-compose exec bash touch /foo/bar
$ docker-compose exec bash ls /foo
bar
$ docker-compose up -d --force-recreate
Recreating tmp_bash_1 ...
$ docker-compose exec bash ls /foo
bar

(Docker Compose works extra hard to keep even anonymous unspecified volumes around and attached to the appropriate container.)
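One way to observe this behavior is to compare the volume name attached to the container before and after the `--force-recreate`. The following is a minimal sketch: the function name `container_volumes` and the `DOCKER` override variable are hypothetical, and the default container name `tmp_bash_1` is taken from the session above.

```shell
# Print "<volume name> <mount path>" for every mount of a container.
# DOCKER is a hypothetical override so the command can be dry-run (DOCKER=echo).
container_volumes() {
  "${DOCKER:-docker}" inspect -f \
    '{{ range .Mounts }}{{ println .Name .Destination }}{{ end }}' \
    "${1:-tmp_bash_1}"
}
```

If the volume name printed for `/foo` is the same before and after the recreate, Compose carried the anonymous volume over to the new container.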

lucasbasquerotto · 2021-10-19T12:56:23Z

@tianon Couldn't you just add an explicit anonymous volume to the docker command or docker-compose file, like docker run -v /foo image, or in the docker-compose file:

volumes:
  - /foo

The above has the benefit of defining the volume explicitly, so that people are not caught by surprise by lots of anonymous volumes that shouldn't have been created, and persisted data that should not have been persisted in the first place is not reused. It also removes the need for hacks like moving the mysql data directory to another place, which doesn't stop the volume creation anyway.

Furthermore, as @ufoscout said:

In every company and project I have ever worked, when it is desired to persist data between docker restarts, either you don't delete the container or you explicitly mount a volume. I have never seen someone relying on (or being in love with) anonymous volumes in the real world.

Last year Oracle removed the volume from their official image, and I am not aware of it impacting people negatively (although the mysql image is probably more widely used):

oracle/docker-images#640 (comment)

etki mentioned this issue Feb 27, 2017

Dockerfile issues reportportal/service-gateway#3

Closed

ufoscout mentioned this issue Jun 23, 2017

Is there a plan to make an alpine build of mysql? #179

Closed

rfay mentioned this issue Jun 27, 2017

Persist data after container removal with named MySQL volumes ddev/ddev#349

Closed

brancz mentioned this issue Mar 26, 2018

Image improvements grafana/grafana-docker#146

Merged

wglambert added the question Usability question, not directly related to an error with the image label Apr 24, 2018

wglambert added the Request Request for image modification or feature label May 3, 2018

tianon mentioned this issue Aug 20, 2018

MY-010123 error running version 8.0 #475

Closed

wglambert mentioned this issue May 31, 2019

Volume-less flavours docker-library/cassandra#180

Closed

MaddieM4 added a commit to MaddieM4/mysql that referenced this issue Sep 24, 2019

Don't use VOLUME in a base Dockerfile

694d376

Addresses issues like docker-library#255 and docker-library#214

wglambert mentioned this issue May 19, 2020

VOLUME declaration can result in difficult to diagnose misbehavior docker-library/rabbitmq#410

Closed

wglambert mentioned this issue Dec 28, 2020

The meaning and purpose of the volume field of the dockerfile file #736

Closed

wglambert mentioned this issue Feb 23, 2021

Building a Wordpress production image with a custom theme. docker-library/wordpress#567

Closed

lucasbasquerotto mentioned this issue Jan 26, 2022

Feature: Ignore image volumes when running containers moby/moby#43190

Open

polarathene mentioned this issue Feb 18, 2022

Remove VOLUME instructions caddyserver/caddy-docker#118

Closed

polarathene mentioned this issue Mar 12, 2024

Should not declare VOLUME for /data/db docker-library/mongo#306

Open

lucasbasquerotto mentioned this issue Mar 22, 2024

Reset properties inherited from parent image moby/moby#3465

Open
