Note: due to the lack of privileged: true containers in Swarm, there is no way to run Ceph BlueStore.
Consider a swarm cluster of 5 nodes:
$ docker node ls --format 'table {{.Hostname}}\t{{.ManagerStatus}}'
HOSTNAME             MANAGER STATUS
node1.domain.local   Leader
node2.domain.local   Reachable
node3.domain.local   Reachable
node4.domain.local
node5.domain.local
The first 3 nodes are masters and the rest are workers. The Ceph roles are placed as follows (see the sketch after the list):
- mon: to master nodes
- osd: to each node
- mds: two (active/standby), anywhere
- mgr: one, anywhere
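A minimal sketch of how these placements could be expressed in the deploy sections of the stack file (the service names here are assumptions, not taken from the actual file):
services:
  mon:                  # one mon per master node
    deploy:
      mode: global
      placement:
        constraints: [node.role == manager]
  osd:                  # one osd per node, masters and workers alike
    deploy:
      mode: global
  mds:                  # two mds anywhere; one becomes active, the other standby
    deploy:
      replicas: 2
  mgr:                  # a single mgr anywhere
    deploy:
      replicas: 1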
Since osd will run in directory mode, prepare the disks on each swarm node manually:
apt install xfsprogs
mkfs.xfs -f -i size=2048 /dev/sdX
echo '/dev/sdX /mnt/osd xfs rw,noatime,inode64 0 0' >> /etc/fstab
mkdir -p /mnt/osd && mount /mnt/osd
Generate the secrets and configs to be uploaded to swarm. This should be done on any swarm master node via a throw-away container:
docker run -d --rm --net=host \
--name ceph_mon \
-v `pwd`/etc:/etc/ceph \
-v `pwd`/var:/var/lib/ceph \
-e NETWORK_AUTO_DETECT=4 \
-e DEBUG=verbose \
ceph/daemon mon
docker exec -it ceph_mon ceph mon getmap -o /etc/ceph/ceph.monmap
docker stop ceph_mon
We need to fix the main config and list all mon hostnames (which are the same as the swarm masters):
# cat etc/ceph.conf
[global]
fsid = 1e4d9f52-314e-49f4-a2d3-5283da875e33
mon initial members = node1, node2, node3
mon host = node1.domain.local, node2.domain.local, node3.domain.local
osd journal size = 100
log file = /dev/null
mon cluster log file = /var/lib/ceph/mon/$cluster-$id/$channel.log
Create secrets and configs in swarm:
docker config create ceph.conf etc/ceph.conf
docker config ls
docker secret create ceph.monmap etc/ceph.monmap
docker secret create ceph.client.admin.keyring etc/ceph.client.admin.keyring
docker secret create ceph.mon.keyring etc/ceph.mon.keyring
docker secret create ceph.bootstrap-osd.keyring var/bootstrap-osd/ceph.keyring
docker secret ls
# Cleanup
rm -r ./var ./etc
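The docker-compose.yml itself is not reproduced here. As a rough sketch, the mon service in it could consume the objects created above like this (compose file format 3.x; absolute target paths for secrets are assumed to be supported by the Docker version in use, otherwise they end up under /run/secrets):
version: "3.4"
services:
  mon:
    image: ceph/daemon
    command: mon
    environment:
      NETWORK_AUTO_DETECT: "4"
    configs:
      - source: ceph.conf
        target: /etc/ceph/ceph.conf
    secrets:
      - source: ceph.monmap
        target: /etc/ceph/ceph.monmap
      - source: ceph.client.admin.keyring
        target: /etc/ceph/ceph.client.admin.keyring
      - source: ceph.mon.keyring
        target: /etc/ceph/ceph.mon.keyring
    deploy:
      mode: global
      placement:
        constraints: [node.role == manager]
    # networking, the /var/lib/ceph volumes and the osd/mgr/mds services are omitted for brevity
configs:
  ceph.conf:
    external: true
secrets:
  ceph.monmap:
    external: true
  ceph.client.admin.keyring:
    external: true
  ceph.mon.keyring:
    external: true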
Deploy the stack:
docker stack deploy -c docker-compose.yml ceph
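Optionally, verify that all services of the stack have converged:
docker stack services ceph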
After everything is up, log in to any mon container:
# docker exec -it `docker ps -qf name=ceph_mon` bash
# ceph -s
  cluster:
    id:     1e4d9f52-314e-49f4-a2d3-5283da875e33
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum node3,node2,node1
    mgr: node1(active)
    osd: 5 osds: 5 up, 5 in
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   209 MB used, 10020 MB / 10230 MB avail
    pgs:
# Configure CephFS
ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 64
ceph fs new cephfs cephfs_metadata cephfs_data
# User for mounting, save this key
ceph fs authorize cephfs client.swarm / rw
# Tweak for VMs
ceph osd pool set cephfs_data nodeep-scrub 1
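If the key printed by ceph fs authorize was not saved, it can be retrieved again later from any mon container:
ceph auth get-key client.swarm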
On each node, specify at least 2 swarm master nodes to mount from:
# Save the key from previous step:
echo 'AQDilPRa1BYKFxAanqbBx0JnutW4AdlYJmUehg==' > /root/.ceph
apt install ceph-fs-common
echo 'node1.domain.local,node2.domain.local:/ /mnt/ceph ceph _netdev,name=swarm,secretfile=/root/.ceph 0 0' >> /etc/fstab
mkdir /mnt/ceph && mount /mnt/ceph
GlusterFS was installed on the same 3 swarm master nodes, with a single replica=3 volume on default settings mounted to /mnt/gluster:
gluster volume info
Volume Name: data
Type: Replicate
Volume ID: 9a582ddc-b593-4694-921c-d5601787936d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1.domain.local:/var/lib/brick/data
Brick2: node2.domain.local:/var/lib/brick/data
Brick3: node3.domain.local:/var/lib/brick/data
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
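For reference, a volume like this could be created roughly as follows (a sketch; it assumes the peers were already probed and uses the brick paths from the output above):
# gluster may require appending 'force' if the brick path is on the root partition
gluster volume create data replica 3 \
  node1.domain.local:/var/lib/brick/data \
  node2.domain.local:/var/lib/brick/data \
  node3.domain.local:/var/lib/brick/data
gluster volume start data
mkdir -p /mnt/gluster
mount -t glusterfs node1.domain.local:/data /mnt/gluster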
Write throughput:
dd if=/dev/zero of=/mnt/gluster/test bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.28377 s, 45.9 MB/s
dd if=/dev/zero of=/mnt/ceph/test bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0868178 s, 1.2 GB/s
Metadata ops (same content, ~125 dirs):
time ls -R /mnt/gluster >/dev/null
real 0m0.101s
user 0m0.000s
sys 0m0.004s
time ls -R /mnt/ceph >/dev/null
real 0m0.004s
user 0m0.000s
sys 0m0.000s