Underlying kernel does not support BTRFS #388

Closed
willejs opened this issue Aug 22, 2016 · 30 comments

Comments

@willejs

willejs commented Aug 22, 2016

Expected behavior

When selecting the btrfs docker driver, it works

Actual behavior

It does not work as btrfs is not enabled in the kernel.

Information

  • This affects concourse ci workers
  • Using the latest docker for mac

Steps to reproduce the behavior

  1. try and run docker with the btrfs driver
  2. try to start a concourse ci worker in a container

Possible fix

Compile the kernel with btrfs support enabled in the underlying VM for mac.

@willejs
Author

willejs commented Aug 24, 2016

@dsheets @samoht Ideas?

@dsheets
Contributor

dsheets commented Sep 26, 2016

Could you direct us to something about why btrfs is necessary for this use case? We currently support aufs and overlay2 and we would like the user experience of Docker for Mac to abstract the graph driver decision entirely. Any information about why btrfs is necessary would be really helpful for us. Thanks!

@willejs
Author

willejs commented Sep 26, 2016

Hi @dsheets, I'm using Concourse CI, which creates a scratch disk using btrfs when starting a worker container; if it can't, it falls back to tmpfs, which is slow. However, given that btrfs is in the mainline kernel and hailed as the future filesystem, would it not make sense to add it to the underlying image? Even a way to bake a custom image with a custom kernel would suffice... What are your thoughts?

@dsheets
Contributor

dsheets commented Sep 27, 2016

I'm curious about why Concourse CI requires btrfs and falls back to tmpfs. Do you know (or could you find out) why Concourse has these particular requirements and fallback chain?

@cirocosta

Not using Concourse here, but adding another reason to the discussion: aufs does not support a disk size quota, right? (as of https://github.com/docker/docker/blob/7e29b33546098816d5dbc1fc429e868f02b69e44/docs/reference/commandline/run.md#set-storage-driver-options-per-container) - btrfs does.

@willejs
Author

willejs commented Oct 4, 2016

@dsheets It's hazy. Basically, when you run a Concourse worker in Docker, it runs tasks in Docker containers inside the worker, so it does Docker in Docker. When it does this, it creates btrfs filesystems inside the Docker container; I think it mounts them into the task containers, but I'm not sure. Anyway, if it can't, it falls back to VFS and creates a tmpfs filesystem, which is slow.

Anyway, that aside, why wouldn't you want to support BTRFS?
Are there any plans to make this all open source?

@topherbullock

Related Concourse issue concourse/concourse#896

@eedwardsdisco

+1

I had to downgrade from 1.13.0 back to 1.12.x due to a breaking change with btrfs affecting Concourse.

@berisberis

@willejs Concourse CI does not use Docker in Docker; in fact, it does not use Docker Compose to orchestrate its internal containers. It uses cloudfoundry/garden. I'm planning to use Docker in Docker to run multiple containers in one Concourse task (which can only use one garden container), so that I can run integration tests against dependencies like selenium, mysql, and nginx, each in its own container, using docker-compose as in my dev/stage/prod environments.

@berisberis

+1 I also had to downgrade to docker-toolbox because of this issue. I don't know if docker/for-mac should support btrfs or concourse should use other storage drivers.

@vito

vito commented Feb 2, 2017

@dsheets Concourse uses btrfs because it nests trivially, allowing our docker-image resource to simply spin up Docker, have it use its btrfs driver, and fetch images with the Docker CLI. If we were to use aufs or overlay the resource would have to use a loopback device to make a local system image, as neither of those nest. This is costly as there can be many docker-image resources, and loopback devices are a global system resource, that can outlive their container if we're not careful.

@doubledgedboard

@dsheets sorry to ping on this one, but it's forcing me to stay on 1.12.x and that's quickly going to become a pain point as pressure to upgrade increases

is there an official stance on why btrfs support is gone? I'm surprised considering how glowingly positive the official docker article and release notes are for it

@dsheets
Contributor

dsheets commented Feb 24, 2017

@doubledgedboard As far as I know, Docker for Mac has never supported btrfs. What is the breaking change from 1.12.x to 1.13.x?

We haven't enabled btrfs because it slows boot by an unacceptably long time for a feature that is typically unused.

@eedwardsdisco

@dsheets so what is the change that caused concourse/concourse#896 to break from 1.12.x to 1.13.x?

The concourse team is saying it's a btrfs issue with docker for mac.

@dsheets
Contributor

dsheets commented Feb 24, 2017

@eedwardsdisco I don't know what change caused the regression. Could you please post a step-by-step reproduction with any required configuration files here so we can investigate or bisect the issue? We are not familiar with Concourse so a sequence of steps to go from a fresh macOS install to either success (under 1.12.x) or failure (under 1.13.x) would greatly speed our work. Thanks!

@berisberis

berisberis commented Feb 24, 2017

@dsheets Try this docker-compose in docker/for-mac:

concourse-db:
  image: postgres:9.5
  environment:
    POSTGRES_DB: concourse
    POSTGRES_USER: concourse
    POSTGRES_PASSWORD: changeme
    PGDATA: /database

concourse-web:
  image: concourse/concourse
  links: [concourse-db]
  command: web
  ports: ["8080:8080"]
  volumes: ["./keys/web:/concourse-keys"]
  environment:
    CONCOURSE_BASIC_AUTH_USERNAME: concourse
    CONCOURSE_BASIC_AUTH_PASSWORD: changeme
    CONCOURSE_EXTERNAL_URL: http://ci.example.app:8080
    CONCOURSE_POSTGRES_DATA_SOURCE: |-
      postgres://concourse:changeme@concourse-db:5432/concourse?sslmode=disable

concourse-worker:
  image: concourse/concourse
  privileged: true
  links: [concourse-web]
  command: worker
  volumes: ["./keys/worker:/concourse-keys"]
  environment:
    CONCOURSE_TSA_HOST: concourse-web

Use this docker-compose script to bring up concourse in docker for mac and then try to run this (or any) simple pipeline.

groups:
- name: develop
  jobs:
  - navi
  
resources:
- name: every-1m
  type: time
  source: {interval: 1m}

jobs:
- name: navi
  plan:
  - get: every-1m
    trigger: true
  - task: annoy
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: {repository: ubuntu}
      run:
        path: echo
        args: ["Hey! Listen!"]

@berisberis

@dsheets you will also need the fly-cli to login and register the pipeline
https://concourse.ci/fly-cli.html

@dsheets
Contributor

dsheets commented Feb 24, 2017

@berisberis Ok, I run docker-compose up with the compose file and get

concourse-web_1     | failed to load authorized keys: open : no such file or directory
concoursebug_concourse-web_1 exited with code 1

I'm not sure what to do with your second file. Where do I save it and with what file name? Do I need to install software on the host? Which software exactly (version)? How do I run the pipeline?

@berisberis

berisberis commented Feb 24, 2017

@dsheets you also need the folders ./keys/web and ./keys/worker in the same path as the docker-compose file.

@berisberis

berisberis commented Feb 24, 2017

@dsheets also... you can name the second file whatever you want, as long as it has a .yml extension.
That is the file you will point to when registering the pipeline with the fly-cli.

@berisberis

When you have the fly-cli run this to login:
fly -t concourse login -c http://ci.example.app:8080
Use the username and password from the docker-compose file.
then use this to register the pipeline:
fly sp -t concourse -c ~/path/to/your/pipeline.yml -p MyPipeline

@dsheets
Contributor

dsheets commented Feb 24, 2017

@berisberis I have created keys/web and I still get the error above. I downloaded the fly CLI binary 2.7.0 from https://concourse.ci/downloads.html but I'm not sure if I need the web container running before testing the system. I don't understand which steps must be done in order to observe the failure and what state is present after they are done. It would be very helpful to have a list of exactly the steps needed to reproduce the issue, preferably with as few steps as possible. Additionally, knowing the easiest way to reset the system (other than deleting everything related) would be helpful but isn't necessary. We don't know how to use Concourse or what its state model looks like and we unfortunately don't have time to learn how to use Concourse competently and then guess whether we are seeing the same failure you are seeing.

@eedwardsdisco

@dsheets

Hey David,

Here's some explicit steps (from http://concourse.ci/docker-repository.html)

Create docker-compose.yml (uses latest concourse binary)

concourse-db:
  image: postgres:9.5
  environment:
    POSTGRES_DB: concourse
    POSTGRES_USER: concourse
    POSTGRES_PASSWORD: changeme
    PGDATA: /database

concourse-web:
  image: concourse/concourse
  links: [concourse-db]
  command: web
  ports: ["8080:8080"]
  volumes: ["./keys/web:/concourse-keys"]
  environment:
    CONCOURSE_BASIC_AUTH_USERNAME: concourse
    CONCOURSE_BASIC_AUTH_PASSWORD: changeme
    CONCOURSE_EXTERNAL_URL: "${CONCOURSE_EXTERNAL_URL}"
    CONCOURSE_POSTGRES_DATA_SOURCE: |-
      postgres://concourse:changeme@concourse-db:5432/concourse?sslmode=disable

concourse-worker:
  image: concourse/concourse
  privileged: true
  links: [concourse-web]
  command: worker
  volumes: ["./keys/worker:/concourse-keys"]
  environment:
    CONCOURSE_TSA_HOST: concourse-web

create keys

mkdir -p keys/web keys/worker

ssh-keygen -t rsa -f ./keys/web/tsa_host_key -N ''
ssh-keygen -t rsa -f ./keys/web/session_signing_key -N ''

ssh-keygen -t rsa -f ./keys/worker/worker_key -N ''

cp ./keys/worker/worker_key.pub ./keys/web/authorized_worker_keys
cp ./keys/web/tsa_host_key.pub ./keys/worker

create a host entry in /private/etc/hosts pointing 'concourse' to your current local interface IP (not loopback!)

192.168.1.10  concourse

export env var mapping external host to your custom local dns

export CONCOURSE_EXTERNAL_URL=http://concourse:8080

start the concourse stack (web\worker\coordinator)

docker-compose up

browse to the url and download the fly cli from the link on the page

http://concourse:8080

use the fly cli to create your login target (yes the password literally is 'changeme' as per above)

fly login --target=main --concourse-url=http://concourse:8080 --username=concourse --password=changeme --team-name=main

create navi-pipeline.yml pipeline file

resources:
- name: every-1m
  type: time
  source: {interval: 1m}

jobs:
- name: navi
  plan:
  - get: every-1m
    trigger: true
  - task: annoy
    config:
      platform: linux
      image_resource:
        type: docker-image
        source: {repository: ubuntu}
      run:
        path: echo
        args: ["Hey! Listen!"]

upload pipeline to concourse

fly -t main set-pipeline -p hello-world -c navi-pipeline.yml

observe automatic (every minute) invocation of pipeline at the url

http://concourse:8080

destroy the stack (e.g. to then switch underlying docker versions...)

docker-compose down

repeat as needed

@dsheets
Contributor

dsheets commented Mar 1, 2017

@eedwardsdisco Thanks! I did all of that under 1.13.1 and 1.12.6 and, as far as I could tell, the behavior was the same. The pipeline shows "pending" pulsing, "starting" pulsing, and eventually "failed" highlighted in the web UI. The logs of the worker show:

concourse-worker_1 | {"timestamp":"1488378340.510931969","source":"worker","message":"worker.baggageclaim.fs.run-command.failed","log_level":2,"data":{"args":["bash","-e","-x","-c","\n\t\tif [ ! -e $IMAGE_PATH ] || [ "$(stat --printf="%s" $IMAGE_PATH)" != "$SIZE_IN_BYTES" ]; then\n\t\t\ttouch $IMAGE_PATH\n\t\t\ttruncate -s ${SIZE_IN_BYTES} $IMAGE_PATH\n\t\tfi\n\n\t\tlo="$(losetup -j $IMAGE_PATH | cut -d':' -f1)"\n\t\tif [ -z "$lo" ]; then\n\t\t\tlo="$(losetup -f --show $IMAGE_PATH)"\n\t\tfi\n\n\t\tif ! file $IMAGE_PATH | grep BTRFS; then\n\t\t\t/worker-state/2.7.0/linux/btrfs/mkfs.btrfs --nodiscard $IMAGE_PATH\n\t\tfi\n\n\t\tmkdir -p $MOUNT_PATH\n\n\t\tif ! mountpoint -q $MOUNT_PATH; then\n\t\t\tmount -t btrfs $lo $MOUNT_PATH\n\t\tfi\n\t"],"command":"/bin/bash","env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","MOUNT_PATH=/worker-state/volumes","IMAGE_PATH=/worker-state/volumes.img","SIZE_IN_BYTES=63381999616"],"error":"exit status 32","session":"2.2.1","stderr":"+ '[' '!' -e /worker-state/volumes.img ']'\n++ stat --printf=%s /worker-state/volumes.img\n+ '[' 63381999616 '!=' 63381999616 ']'\n++ losetup -j /worker-state/volumes.img\n++ cut -d: -f1\n+ lo=\n+ '[' -z '' ']'\n++ losetup -f --show /worker-state/volumes.img\n+ lo=/dev/loop1\n+ file /worker-state/volumes.img\n+ grep BTRFS\nbash: line 11: file: command not found\n+ /worker-state/2.7.0/linux/btrfs/mkfs.btrfs --nodiscard /worker-state/volumes.img\n+ mkdir -p /worker-state/volumes\n+ mountpoint -q /worker-state/volumes\n+ mount -t btrfs /dev/loop1 /worker-state/volumes\nmount: unknown filesystem type 'btrfs'\n","stdout":"btrfs-progs v4.4\nSee http://btrfs.wiki.kernel.org for more information.\n\nLabel: (null)\nUUID: b2f2ac93-4ccd-4f47-9216-f3c06ff4223e\nNode size: 16384\nSector size: 4096\nFilesystem size: 59.03GiB\nBlock group profiles:\n Data: single 8.00MiB\n Metadata: DUP 1.01GiB\n System: DUP 12.00MiB\nSSD detected: no\nIncompat features: extref, skinny-metadata\nNumber of devices: 1\nDevices:\n ID SIZE PATH\n 1 59.03GiB /worker-state/volumes.img\n\n"}}

which contains the error:

mount: unknown filesystem type 'btrfs'

from

mount -t btrfs /dev/loop1 /worker-state/volumes

I don't see a difference between running your reproduction on 1.12.6 and 1.13.1, so either there has been no regression in Docker for Mac or this test case does not show it. I see

{"timestamp":"1488379673.105854750","source":"worker","message":"worker.baggageclaim.falling-back-on-naive-driver","log_level":2,"data":{"error":"exit status 32","session":"2"}}

in the logs but no further mention of the driver. Later, I see

{"timestamp":"1488379673.116163015","source":"worker","message":"worker.beacon.restarting","log_level":2,"data":{"error":"failed to dial: failed to connect to TSA: dial tcp 172.17.0.3:2222: getsockopt: connection refused","session":"3"}}{"timestamp":"1488379673.116110563","source":"baggageclaim","message":"baggageclaim.listening","log_level":1,"data":{"addr":"127.0.0.1:7788"}}

which sounds potentially fatal. Is this error expected?
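
For anyone checking their own worker logs for the same symptom, here is a minimal sketch that pulls the "message" field out of each JSON log entry and tests for the fallback message quoted above. The function names are hypothetical (not part of Concourse or Docker), and it assumes plain grep and cut rather than a real JSON parser, so it only suits flat log lines like these.

```shell
#!/bin/sh

# Prints the "message" field of every JSON log entry read from stdin.
# Any "concourse-worker_1 | " prefix added by docker-compose is ignored.
log_messages() {
    grep -o '"message":"[^"]*"' | cut -d'"' -f4
}

# Succeeds (exit 0) if the log on stdin shows baggageclaim
# falling back to its naive driver, as in the entry quoted above.
fell_back_to_naive() {
    log_messages | grep -qF 'baggageclaim.falling-back-on-naive-driver'
}
```

One possible use, assuming the service name from the compose file above: `docker-compose logs concourse-worker | fell_back_to_naive && echo "worker is not using btrfs"`.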

@eedwardsdisco

@dsheets

Hrm strange. I'm using 1.12.3 and it worked. I'm going to try and see if I can get a copy of 1.12.6 and see if it breaks on that. (and I'm on OSX Sierra 10.12.3)

Pulling ubuntu@sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535...
sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535: Pulling from library/ubuntu
d54efb8db41d: Pulling fs layer
f8b845f45a87: Pulling fs layer
e8db7bf7c39f: Pulling fs layer
9654c40e9079: Pulling fs layer
6d9ef359eaaa: Pulling fs layer
9654c40e9079: Waiting
6d9ef359eaaa: Waiting
f8b845f45a87: Verifying Checksum
f8b845f45a87: Download complete
e8db7bf7c39f: Download complete
9654c40e9079: Verifying Checksum
9654c40e9079: Download complete
6d9ef359eaaa: Verifying Checksum
6d9ef359eaaa: Download complete
d54efb8db41d: Verifying Checksum
d54efb8db41d: Download complete
d54efb8db41d: Pull complete
f8b845f45a87: Pull complete
e8db7bf7c39f: Pull complete
9654c40e9079: Pull complete
6d9ef359eaaa: Pull complete
Digest: sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535
Status: Downloaded newer image for ubuntu@sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535

Successfully pulled ubuntu@sha256:dd7808d8792c9841d0b460122f1acf0a2dd1f56404f8d1e56298048885e45535.

Hey! Listen!

@eedwardsdisco

@dsheets

Very strange. While you're seeing failure on both versions, I'm now seeing success.

I tried the repro I gave you on 1.12.3 (what I was running) and it worked. I then upgraded to 1.12.6 (the version you were running) and it worked, and finally upgraded to the latest (17.03.0-ce-mac1) and it still worked.

Rebooted, cleared my local volumes, ran again, still worked.

I'm stumped but I can't complain. I'll wait for others to see what they report when running latest concourse + docker.

@ericis

ericis commented May 2, 2017

I get a similar error running Windows 10 w/ Docker 17.03.1-ce, build c6d412e and using latest instructions and builds from http://concourse.ci/docker-repository.html.

docker: Error response from daemon: error creating aufs mount to /var/lib/docker/aufs/mnt/9df06285b4b1b55e5c87c8f9b74274b6404cc2fd5e259db0127f236946322fed-init: invalid argument.
See 'docker run --help'.

@rn
Copy link

rn commented Mar 23, 2018

btrfs is enabled in the LinuxKit-based Docker for Mac. However, it is only available as a module, because compiling it into the kernel slows down the boot process considerably.

You may have to modprobe btrfs from a sufficiently privileged container first.
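
A minimal sketch of that step, assuming a POSIX shell with awk and modprobe available inside the container. The function names are hypothetical and not part of Docker for Mac; the only kernel interface relied on is /proc/filesystems, which lists the filesystem types the running kernel currently supports.

```shell
#!/bin/sh

# Succeeds (exit 0) if the named filesystem appears in the kernel's list.
# $1 = filesystem name, $2 = list file (defaults to /proc/filesystems).
fs_supported() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}

# Loads the btrfs module only when the kernel does not already list btrfs,
# then re-checks so the exit status reflects the final state.
ensure_btrfs() {
    if fs_supported btrfs; then
        return 0
    fi
    modprobe btrfs && fs_supported btrfs
}
```

Sourcing this and calling `ensure_btrfs` before starting the worker would be one way to use it; a non-zero exit status means btrfs is still unavailable, for example because the container is not privileged enough to load modules.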

@rn rn closed this as completed Mar 23, 2018
@vito

vito commented Mar 28, 2018

@rn Thanks!

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jun 19, 2020