Change root dir after using nydus for a while #488

Bubbleioa · 2023-05-29T03:29:29Z

Situation

After nydus was deployed on a k8s cluster, it had been accepting scheduled pods (OCI images) for a while.
We then needed to change its root dir to a new one.
Copying with cp <old dir> <new dir> would have been fine, but we had lost the data from the old folder.

After starting nydus and pulling an image again, it reports:

FATA[0000] run pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/2/sha256:xxxx" bucket: not found

Causes

This is caused by an inconsistency between containerd's meta.db and the nydus-snapshotter data.

/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db

Specifically, the containers started through nydus-snapshotter still exist in meta.db, and the images they use are still recorded under snapshots, but the nydus-snapshotter folder is newly created, so the parent snapshots those records refer to can no longer be found.

Solution

All we have to do is delete the nydus records in meta.db.

First, make sure containerd's snapshotter is set to overlayfs and nydus is turned off.
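One way to double-check the snapshotter setting before going further is to look at containerd's effective configuration. The following is only a sketch: the helper name cri_snapshotters is hypothetical, and it assumes the setting appears as a `snapshotter = "..."` line in the TOML printed by `containerd config dump`.

```shell
#!/bin/bash
# cri_snapshotters (hypothetical helper): read containerd config TOML on stdin
# and print every non-empty value assigned to a `snapshotter = "..."` key.
# Keys such as disable_snapshot_annotations are not matched, because the
# line must start with the word "snapshotter".
cri_snapshotters() {
  sed -n 's/^[[:space:]]*snapshotter[[:space:]]*=[[:space:]]*"\([^"]\{1,\}\)".*$/\1/p'
}
```

Usage: `containerd config dump | cri_snapshotters`. If the output still contains nydus, the configuration has not been switched to overlayfs yet.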

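#!/bin/bash
# Remove every container in the k8s.io namespace that was created with the nydus snapshotter.
# awk skips the header row of `nerdctl ps` and prints the container ID column.
for ctr in $(nerdctl ps -a --namespace k8s.io | awk 'NR>1 {print $1}')
do
  # echo "$(nerdctl inspect "$ctr" --namespace k8s.io)"
  if [[ $(nerdctl inspect "$ctr" --namespace k8s.io) == *'"Driver": "nydus"'* ]]; then
    nerdctl rm "$ctr" --namespace k8s.io
  fi
done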

WARN[0000] failed to inspect Task error="no running task found: task 18aa081ccfe4a0a8fe77ff758c038dd7ec12b195580cacebc8ba51b72e927c75 not found: not found" id=18aa081ccfe4a0a8fe77ff758c038dd7ec12b195580cacebc8ba51b72e927c75
You may see the warning above; it can be safely ignored.
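Because the removal cannot be undone, it may help to list the affected containers first without deleting anything. The sketch below reuses the same nerdctl commands and the same "Driver": "nydus" match as the script; the function names uses_nydus and list_nydus_containers are hypothetical, and the whitespace between the key and the value is matched loosely in case the inspect output is formatted differently.

```shell
#!/bin/bash
# uses_nydus (hypothetical helper): succeed if the inspect JSON on stdin
# names nydus as the storage driver.
uses_nydus() {
  grep -Eq '"Driver":[[:space:]]*"nydus"'
}

# list_nydus_containers (hypothetical helper): print the IDs of all containers
# in the k8s.io namespace whose inspect output matches, without removing them.
list_nydus_containers() {
  local id
  for id in $(nerdctl ps -a --namespace k8s.io | awk 'NR>1 {print $1}'); do
    if nerdctl inspect "$id" --namespace k8s.io 2>/dev/null | uses_nydus; then
      echo "$id"
    fi
  done
}
```

Running list_nydus_containers and reviewing the printed IDs gives a chance to confirm the selection before running the removal loop.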

The same cleanup can also be done with crictl.

Then we delete all the images.

crictl images -q | xargs -n 1 crictl rmi 2>/dev/null

After that, nydus will be able to start normally.

Troubleshooting

When you run nerdctl rm $ctr --namespace k8s.io, you may see this error:

FATA[0000] 1 errors:
container 6f823cbfaf16735ed9d4978f9d204d023381a5e0c7aa6eff8876a9b0613e4cbf is in running status. unpause/stop container first or force removal

This happens because a container started during the previous nydus deployment is still running. There are two ways to solve the problem.

Add nerdctl stop $ctr --namespace k8s.io before the remove. This stops and then deletes a running container, so be very careful.
Use boltbrowser to remove snapshots/nydus in /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db, then restart containerd. Since the side effects are unknown, always back up meta.db first.

HelpWanted: boltbrowser is a TUI tool; how can this step be written into a script?
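The meta.db backup can at least be scripted. The following is a minimal sketch, not a tested procedure: the function name backup_meta_db and the timestamped .bak suffix are hypothetical choices, and it assumes containerd has been stopped beforehand so the database file is not being written while it is copied.

```shell
#!/bin/bash
# backup_meta_db (hypothetical helper): copy meta.db next to itself with a
# timestamp suffix and print the path of the copy.
# The default path is the one given in this issue; pass another path as $1.
backup_meta_db() {
  local src="${1:-/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db}"
  local dest
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  # -p preserves mode, ownership and timestamps of the original file.
  cp -p -- "$src" "$dest" && echo "$dest"
}
```

To roll back, stop containerd again and copy the printed backup file over meta.db.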


adamqqqplay · 2023-06-27T08:29:05Z

Already linked to Nydus FAQ: https://github.com/dragonflyoss/image-service/wiki/FAQ#q-how-to-migrate-the-root-dir-of-nydus-shanpshotter

Maybe we could also include it in some docs in this repo? @imeoer @changweige

imeoer · 2023-06-27T08:32:52Z

@adamqqqplay It's ok to link to FAQ page, thanks @Bubbleioa

imeoer closed this as completed Jun 27, 2023
