Change root dir after using nydus for a while #488

Closed
Bubbleioa opened this issue May 29, 2023 · 2 comments

Bubbleioa commented May 29, 2023

Situation

After nydus was deployed on a k8s cluster, it had been serving scheduled workloads (OCI images) for a while.
We then needed to change its root dir to a new one.
Copying with cp <old dir> <new dir> is fine, but in our case the data in the old folder was lost.

After starting nydus and pulling the image again, it reports:

FATA[0000] run pod sandbox: rpc error: code = NotFound desc = failed to create containerd container: failed to create snapshot: missing parent "k8s.io/2/sha256:xxxx" bucket: not found

Causes

This is caused by an inconsistency between containerd's meta.db and the nydus-snapshotter data.

/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db

Specifically, the containers started through nydus-snapshotter still exist in meta.db, and the images they use are still recorded under snapshots, but the nydus-snapshotter folder is newly created and contains none of that data.

Solution

All we have to do is delete the nydus records in meta.db.

First make sure containerd's snapshotter is set to overlayfs and nydus is turned off.
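
One way to check that precondition is to look at the snapshotter configured for containerd. The following is a minimal sketch, not part of the original procedure: it assumes the config is a TOML file in which the snapshotter is written as a line of the form snapshotter = "name", and the function names configured_snapshotters and nydus_is_off are hypothetical helpers introduced here for illustration.

```shell
#!/bin/bash
# Print the value of every `snapshotter = "..."` line in a containerd config.
# Assumption: the key is written on its own line as snapshotter = "name".
configured_snapshotters() {
  local config="$1"
  sed -n 's/^[[:space:]]*snapshotter[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$config"
}

# Succeed only when no snapshotter line in the file names nydus.
# A file with no snapshotter line at all also counts as "off".
nydus_is_off() {
  ! configured_snapshotters "$1" | grep -qx 'nydus'
}
```

For example, nydus_is_off /etc/containerd/config.toml && echo "ok to clean up" would print the message only if that file does not select nydus. The path is containerd's usual default and may differ on your nodes. This only inspects the file; it does not tell you whether the nydus-snapshotter process itself is stopped.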

#!/bin/bash
# Remove every container in the k8s.io namespace whose snapshot driver is nydus.
for ctr in $(nerdctl ps -a --namespace k8s.io | awk 'NR>1 {print $1}')
do
  # nerdctl inspect prints JSON; match the driver field as plain text.
  if [[ $(nerdctl inspect "$ctr" --namespace k8s.io) == *'"Driver": "nydus"'* ]]; then
    nerdctl rm "$ctr" --namespace k8s.io
  fi
done
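
Since the loop removes containers, it may be worth listing the matches first. Below is a dry-run sketch of the same filter, under the same assumptions as the script above (nerdctl available, k8s.io namespace, the inspect output containing "Driver": "nydus"). The function names uses_nydus and list_nydus_containers are hypothetical and only used for illustration.

```shell
#!/bin/bash
# Return success when an inspect JSON dump names nydus as the snapshot driver.
# The match string is the same one used by the removal script.
uses_nydus() {
  [[ "$1" == *'"Driver": "nydus"'* ]]
}

# Print, without removing anything, the IDs of containers backed by nydus.
list_nydus_containers() {
  local ctr
  for ctr in $(nerdctl ps -a --namespace k8s.io | awk 'NR>1 {print $1}'); do
    if uses_nydus "$(nerdctl inspect "$ctr" --namespace k8s.io)"; then
      echo "$ctr"
    fi
  done
}
```

Running list_nydus_containers prints one container ID per line; if the list looks right, run the removal loop.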

WARN[0000] failed to inspect Task error="no running task found: task 18aa081ccfe4a0a8fe77ff758c038dd7ec12b195580cacebc8ba51b72e927c75 not found: not found" id=18aa081ccfe4a0a8fe77ff758c038dd7ec12b195580cacebc8ba51b72e927c75
You may see a warning like the one above; it is harmless and can be ignored.

The same cleanup can also be done with crictl.

Then we delete all the images.

crictl images -q | xargs -n 1 crictl rmi 2>/dev/null

After that, nydus will be able to start normally.

Troubleshooting

When you run nerdctl rm $ctr --namespace k8s.io, you may see this error:

FATA[0000] 1 errors:
container 6f823cbfaf16735ed9d4978f9d204d023381a5e0c7aa6eff8876a9b0613e4cbf is in running status. unpause/stop container first or force removal

This happens because a container started during the previous nydus deployment is still running. There are two ways to solve the problem.

  1. Add nerdctl stop $ctr --namespace k8s.io before the remove step. This stops and then deletes a running container, so be very careful.
  2. Use boltbrowser to remove snapshots/nydus in /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db, then restart containerd. The side effects are unknown, so always back up meta.db first.

HelpWanted: boltbrowser is a TUI tool. How can this step be scripted?

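
The backup recommended in option 2 can be scripted even though the boltbrowser step cannot. This is a minimal sketch, assuming containerd has been stopped so that meta.db is not being written during the copy. The function name backup_meta_db and the .bak.<timestamp> suffix are hypothetical choices made for this example.

```shell
#!/bin/bash
# Copy a bolt database to a timestamped sibling file and print the backup path.
# Assumption: containerd is stopped first, so the file is not being modified.
backup_meta_db() {
  local db="$1"
  local backup
  backup="${db}.bak.$(date +%Y%m%d%H%M%S)"
  cp -p -- "$db" "$backup" || return 1
  # Refuse to report success unless the copy is byte-identical.
  cmp -s -- "$db" "$backup" || return 1
  echo "$backup"
}
```

For example, backup_meta_db /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db prints the path of the copy, which can be moved back into place if the edit goes wrong.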
@adamqqqplay
Contributor

This is already linked from the Nydus FAQ: https://github.com/dragonflyoss/image-service/wiki/FAQ#q-how-to-migrate-the-root-dir-of-nydus-shanpshotter

Maybe we could also include it in some docs in this repo? @imeoer @changweige

@imeoer
Collaborator

imeoer commented Jun 27, 2023

@adamqqqplay It's OK to link to the FAQ page. Thanks @Bubbleioa

@imeoer imeoer closed this as completed Jun 27, 2023