containerd does not fall back to default endpoint when using wildcard registry entry (airgap env) #10351

Closed
siddjellali opened this issue Jun 13, 2024 · 11 comments

@siddjellali

siddjellali commented Jun 13, 2024

Environmental Info:
K3s Version:

k3s version v1.30.0+k3s1 (1454953)
go version go1.22.2

Node(s) CPU architecture, OS, and Version:

Linux dane 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 server

Describe the bug:

Steps To Reproduce:

INSTALL_K3S_SKIP_DOWNLOAD=true sh $INSTALL_DIR/install.sh - server \
    --cluster-init \
    --debug \
    --disable=traefik \
    --disable=local-storage \
    --disable=servicelb \
    --disable-network-policy \
    --flannel-backend=none \
    --disable-kube-proxy \
    --embedded-registry \
    --disable metrics-server

cat > /etc/rancher/k3s/registries.yaml << EOF
mirrors:
  "*":
EOF

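For reference (not part of this reproduction), the k3s private-registry docs also allow a wildcard mirror entry to carry explicit endpoints. A minimal sketch, where registry.example.com is a placeholder for a reachable internal mirror:

cat > /etc/rancher/k3s/registries.yaml << EOF
mirrors:
  "*":
    endpoint:
      - "https://registry.example.com"   # placeholder internal mirror, sketch only
EOF
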
Expected behavior:

all pods and deployments come up correctly on service startup

Actual behavior:

essential static pods do not start up and remain waiting

Additional context / logs:

# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   cilium-2nl44                               0/1     Init:0/6            0          47m
kube-system   cilium-operator-6c48dbf7b9-d8hrf           0/1     ContainerCreating   0          48m
kube-system   coredns-576bfc4dc7-8l777                   0/1     ContainerCreating   0          48m
kube-system   longhorn-driver-deployer-dd57bf77b-66q4j   0/1     Init:0/1            0          48m
kube-system   longhorn-manager-v4hnb                     0/1     ContainerCreating   0          48m
kube-system   longhorn-ui-966cf4bb4-9tvqn                0/1     ContainerCreating   0          48m
kube-system   longhorn-ui-966cf4bb4-stc5t                0/1     ContainerCreating   0          48m
kube-system   traefik-prtxp                              0/1     ContainerCreating   0          48m
# kubectl get --raw /api/v1/nodes/dane/proxy/metrics | grep -F 'spegel'
# HELP spegel_mirror_requests_total Total number of mirror requests.
# TYPE spegel_mirror_requests_total counter
spegel_mirror_requests_total{cache="miss",registry="docker.io",source="internal"} 742

Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Warning  FailedScheduling        48m                   default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
  Warning  FailedScheduling        48m (x2 over 48m)     default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
  Normal   Scheduled               48m                   default-scheduler  Successfully assigned kube-system/cilium-operator-6c48dbf7b9-d8hrf to dane
  Warning  FailedCreatePodSandBox  43m (x7 over 47m)     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error
  Warning  FailedCreatePodSandBox  42m                   kubelet            Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp 34.226.69.105:443: i/o timeout
  Warning  FailedCreatePodSandBox  41m (x4 over 41m)     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error
  Warning  FailedCreatePodSandBox  39m (x6 over 40m)     kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error
  Warning  FailedCreatePodSandBox  36m (x15 over 39m)    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error
  Warning  FailedCreatePodSandBox  20m (x71 over 36m)    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error
  Warning  FailedCreatePodSandBox  17m (x12 over 20m)    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error
  Warning  FailedCreatePodSandBox  2m21s (x72 over 17m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error

# journalctl -u k3s -f | grep -i spegel
Jun 13 15:31:58 dane k3s[74749]: time="2024-06-13T15:31:58+02:00" level=info msg="spegel 2024/06/13 15:31:58 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"405.253µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:01 dane k3s[74749]: time="2024-06-13T15:32:01+02:00" level=info msg="spegel 2024/06/13 15:32:01 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"359.004µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:01 dane k3s[74749]: time="2024-06-13T15:32:01+02:00" level=info msg="spegel 2024/06/13 15:32:01 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"379.008µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:01 dane k3s[74749]: time="2024-06-13T15:32:01+02:00" level=info msg="spegel 2024/06/13 15:32:01 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"339.396µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:03 dane k3s[74749]: time="2024-06-13T15:32:03+02:00" level=info msg="spegel 2024/06/13 15:32:03 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"249.859µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:04 dane k3s[74749]: time="2024-06-13T15:32:04+02:00" level=info msg="spegel 2024/06/13 15:32:04 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"302.011µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:05 dane k3s[74749]: time="2024-06-13T15:32:05+02:00" level=info msg="spegel 2024/06/13 15:32:05 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"369.366µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:09 dane k3s[74749]: time="2024-06-13T15:32:09+02:00" level=info msg="spegel 2024/06/13 15:32:09 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"615.352µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:11 dane k3s[74749]: time="2024-06-13T15:32:11+02:00" level=info msg="spegel 2024/06/13 15:32:11 p2p: \"level\"=0 \"msg\"=\"leader is self skipping connection to bootstrap node\""
Jun 13 15:32:11 dane k3s[74749]: time="2024-06-13T15:32:11+02:00" level=info msg="spegel 2024/06/13 15:32:11 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"299.93µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:12 dane k3s[74749]: time="2024-06-13T15:32:12+02:00" level=info msg="spegel 2024/06/13 15:32:12 \"level\"=0 \"msg\"=\"running scheduled image state update\""
Jun 13 15:32:12 dane k3s[74749]: time="2024-06-13T15:32:12+02:00" level=info msg="spegel 2024/06/13 15:32:12 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"324.435µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:12 dane k3s[74749]: time="2024-06-13T15:32:12+02:00" level=info msg="spegel 2024/06/13 15:32:12 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"331.427µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:13 dane k3s[74749]: time="2024-06-13T15:32:13+02:00" level=info msg="spegel 2024/06/13 15:32:13 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"356.44µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""
Jun 13 15:32:14 dane k3s[74749]: time="2024-06-13T15:32:14+02:00" level=info msg="spegel 2024/06/13 15:32:14 \"msg\"=\"\" \"error\"=\"mirror resolve retries exhausted for key: docker.io/rancher/mirrored-pause:3.6\" \"path\"=\"/v2/rancher/mirrored-pause/manifests/3.6\" \"status\"=500 \"method\"=\"HEAD\" \"latency\"=\"245.941µs\" \"ip\"=\"127.0.0.1\" \"handler\"=\"mirror\""

# cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/_default/hosts.toml
# File generated by k3s. DO NOT EDIT.

[host]
[host."https://127.0.0.1:6443/v2"]
  capabilities = ["pull", "resolve"]
  ca = ["/var/lib/rancher/k3s/agent/server-ca.crt"]
  client = [["/var/lib/rancher/k3s/agent/client-k3s-controller.crt", "/var/lib/rancher/k3s/agent/client-k3s-controller.key"]]
@brandond
Contributor

Can you check the containerd log file to see exactly what it's trying? The kubelet logs do not reflect all of the endpoints that are attempted during the pull.

@brandond brandond added this to the Backlog milestone Jun 13, 2024
@brandond brandond self-assigned this Jun 13, 2024
@siddjellali
Author

siddjellali commented Jun 13, 2024

Please find the logs below:

# journalctl -u containerd
Jun 13 22:12:06 dane systemd[1]: Starting containerd container runtime...
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06+02:00" level=warning msg="containerd config version `1` has been deprecated and will be converted on each startup in containerd v2.0, use `containerd config migrate` after upgrading to containerd 2.0 to avoid conversion on startup"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.779983420+02:00" level=info msg="starting containerd" revision=d2d58213f83a351ca8f528a95fbd145f5654e957 version=1.6.33
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.806971366+02:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.aufs\"..." type=io.containerd.snapshotter.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811661649+02:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.aufs\"..." error="aufs is not supported (modprobe aufs failed: exit status 1 \"modprobe: FATAL: Module aufs not found in directory /lib/modules/5.14.0-362.24.1.el9_3>
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811699248+02:00" level=info msg="loading plugin \"io.containerd.content.v1.content\"..." type=io.containerd.content.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811727474+02:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.devmapper\"..." type=io.containerd.snapshotter.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811744053+02:00" level=warning msg="failed to load plugin io.containerd.snapshotter.v1.devmapper" error="devmapper not configured"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811756820+02:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.native\"..." type=io.containerd.snapshotter.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811784558+02:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.overlayfs\"..." type=io.containerd.snapshotter.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.811915103+02:00" level=info msg="loading plugin \"io.containerd.snapshotter.v1.zfs\"..." type=io.containerd.snapshotter.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812102408+02:00" level=info msg="skip loading plugin \"io.containerd.snapshotter.v1.zfs\"..." error="path /var/lib/containerd/io.containerd.snapshotter.v1.zfs must be a zfs filesystem to be used with the zfs snapshotter: skip plugin" type=io.contain>
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812129381+02:00" level=info msg="loading plugin \"io.containerd.metadata.v1.bolt\"..." type=io.containerd.metadata.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812150951+02:00" level=warning msg="could not use snapshotter devmapper in metadata plugin" error="devmapper not configured"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812162007+02:00" level=info msg="metadata content store policy set" policy=shared
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812300655+02:00" level=info msg="loading plugin \"io.containerd.differ.v1.walking\"..." type=io.containerd.differ.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812328614+02:00" level=info msg="loading plugin \"io.containerd.event.v1.exchange\"..." type=io.containerd.event.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812343427+02:00" level=info msg="loading plugin \"io.containerd.gc.v1.scheduler\"..." type=io.containerd.gc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812378510+02:00" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812431131+02:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812482466+02:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812517165+02:00" level=info msg="loading plugin \"io.containerd.warning.v1.deprecations\"..." type=io.containerd.warning.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812532804+02:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812795644+02:00" level=info msg="loading plugin \"io.containerd.service.v1.containers-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812822207+02:00" level=info msg="loading plugin \"io.containerd.service.v1.content-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812838877+02:00" level=info msg="loading plugin \"io.containerd.service.v1.diff-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812855760+02:00" level=info msg="loading plugin \"io.containerd.service.v1.images-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812877434+02:00" level=info msg="loading plugin \"io.containerd.service.v1.introspection-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812893113+02:00" level=info msg="loading plugin \"io.containerd.service.v1.leases-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812908440+02:00" level=info msg="loading plugin \"io.containerd.service.v1.namespaces-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812923601+02:00" level=info msg="loading plugin \"io.containerd.service.v1.snapshots-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812939217+02:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812962099+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812979473+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.812994925+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813009819+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813023834+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813039029+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813053456+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813068628+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813084303+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813100624+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813117134+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813130931+02:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813147985+02:00" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="skip plugin: tracing endpoint not configured" type=io.containerd.tracing.processor.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813161390+02:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813175942+02:00" level=info msg="skip loading plugin \"io.containerd.internal.v1.tracing\"..." error="skip plugin: tracing endpoint not configured" type=io.containerd.internal.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813188779+02:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813258181+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813286437+02:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813395115+02:00" level=info msg="Start cri plugin with config {PluginConfig:{ContainerdConfig:{Snapshotter:overlayfs DefaultRuntimeName:runc DefaultRuntime:{Type: Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[] Privileged>
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813454431+02:00" level=info msg="Connect containerd service"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.813548758+02:00" level=info msg="Get image filesystem path \"/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs\""
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814182979+02:00" level=info msg="Start subscribing containerd event"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814303227+02:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814311354+02:00" level=info msg="Start recovering state"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814375950+02:00" level=info msg=serving... address=/run/containerd/containerd.sock
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814463729+02:00" level=info msg="Start event monitor"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814527225+02:00" level=info msg="Start snapshots syncer"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814555792+02:00" level=info msg="Start cni network conf syncer for default"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814576978+02:00" level=info msg="Start streaming server"
Jun 13 22:12:06 dane containerd[32602]: time="2024-06-13T22:12:06.814722808+02:00" level=info msg="containerd successfully booted in 0.035886s"
Jun 13 22:12:06 dane systemd[1]: Started containerd container runtime.

@siddjellali
Author

You might also find this one interesting.

[root@dane pods]# tail -f  /var/lib/rancher/k3s/agent/containerd/containerd.log
time="2024-06-13T22:28:38.407225150+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:longhorn-ui-966cf4bb4-fkk69,Uid:eebdc890-fac8-4b99-b2ea-c5d8e3b8fbbd,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error"
time="2024-06-13T22:28:38.407288713+02:00" level=info msg="stop pulling image docker.io/rancher/mirrored-pause:3.6: active requests=0, bytes read=0"
time="2024-06-13T22:28:40.378677781+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:cilium-ppw2b,Uid:3608a2ba-395a-4328-bb54-ab458faebeb7,Namespace:kube-system,Attempt:0,}"
time="2024-06-13T22:28:42.361881615+02:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6\": dial tcp 34.226.69.105:443: i/o timeout" host=registry-1.docker.io
time="2024-06-13T22:28:42.363255193+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:traefik-j7sbz,Uid:92c2c2db-a035-4a99-aed0-bacfae01c27c,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error"
time="2024-06-13T22:28:42.363315655+02:00" level=info msg="stop pulling image docker.io/rancher/mirrored-pause:3.6: active requests=0, bytes read=0"
time="2024-06-13T22:28:45.362642962+02:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6\": dial tcp 34.226.69.105:443: i/o timeout" host=registry-1.docker.io
time="2024-06-13T22:28:45.364140685+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:longhorn-ui-966cf4bb4-vmr5w,Uid:88d51dcf-04ac-43a9-b24e-fa999a47d088,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error"
time="2024-06-13T22:28:45.364238148+02:00" level=info msg="stop pulling image docker.io/rancher/mirrored-pause:3.6: active requests=0, bytes read=0"
time="2024-06-13T22:28:46.351879381+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:coredns-576bfc4dc7-ctrj8,Uid:d72dc388-2480-410c-b193-399b31321db7,Namespace:kube-system,Attempt:0,}"
time="2024-06-13T22:28:51.360740186+02:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6\": dial tcp 34.226.69.105:443: i/o timeout" host=registry-1.docker.io
time="2024-06-13T22:28:51.362068104+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:longhorn-manager-ml6z9,Uid:3e4a460f-c35a-44ce-b17c-b58146cad5b1,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error"
time="2024-06-13T22:28:51.362135009+02:00" level=info msg="stop pulling image docker.io/rancher/mirrored-pause:3.6: active requests=0, bytes read=0"
time="2024-06-13T22:28:51.378998139+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:longhorn-driver-deployer-dd57bf77b-js5zh,Uid:610ffe0e-d6c5-4d3f-b9bb-fe3d6a6974b1,Namespace:kube-system,Attempt:0,}"
time="2024-06-13T22:28:52.365412952+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:longhorn-ui-966cf4bb4-fkk69,Uid:eebdc890-fac8-4b99-b2ea-c5d8e3b8fbbd,Namespace:kube-system,Attempt:0,}"

@brandond
Contributor

The first log shows only the startup, no pulls. The second log shows only pulling from the upstream, which is the opposite of what you reported. Can you attach the full log, showing a pull that behaves as you described?

@siddjellali
Author

siddjellali commented Jun 13, 2024

The Traefik image is stored locally, but I cannot access it.
I don't have internet access on this server.

#ctr images ls
REF                                                    TYPE                                                 DIGEST                                                                  SIZE      PLATFORMS   LABELS
docker.io/library/traefik:v3.0.2                       application/vnd.docker.distribution.manifest.v2+json sha256:6e5dce54a05259388ae7f48d88a6080adb7e149f46b91327cb4e0b741260d0e1 164.6 MiB linux/amd64 -

# crictl pull docker.io/library/traefik:v3.0.2
E0613 22:59:14.794517   35177 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/library/traefik:v3.0.2\": failed to resolve reference \"docker.io/library/traefik:v3.0.2\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/library/traefik/manifests/v3.0.2?ns=docker.io: 500 Internal Server Error" image="docker.io/library/traefik:v3.0.2"
FATA[0030] pulling image: failed to pull and unpack image "docker.io/library/traefik:v3.0.2": failed to resolve reference "docker.io/library/traefik:v3.0.2": unexpected status from HEAD request to https://127.0.0.1:6443/v2/library/traefik/manifests/v3.0.2?ns=docker.io: 500 Internal Server Error

# tail -f  /var/lib/rancher/k3s/agent/containerd/containerd.log | grep traefik
time="2024-06-13T22:58:44.784037699+02:00" level=info msg="PullImage \"docker.io/library/traefik:v3.0.2\""
time="2024-06-13T22:58:51.400635406+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:traefik-j7sbz,Uid:92c2c2db-a035-4a99-aed0-bacfae01c27c,Namespace:kube-system,Attempt:0,}"
time="2024-06-13T22:59:14.792084668+02:00" level=info msg="trying next host" error="failed to do request: Head \"https://registry-1.docker.io/v2/library/traefik/manifests/v3.0.2\": dial tcp 54.196.99.49:443: i/o timeout" host=registry-1.docker.io
time="2024-06-13T22:59:14.793745397+02:00" level=error msg="PullImage \"docker.io/library/traefik:v3.0.2\" failed" error="failed to pull and unpack image \"docker.io/library/traefik:v3.0.2\": failed to resolve reference \"docker.io/library/traefik:v3.0.2\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/library/traefik/manifests/v3.0.2?ns=docker.io: 500 Internal Server Error"
time="2024-06-13T22:59:14.793832826+02:00" level=info msg="stop pulling image docker.io/library/traefik:v3.0.2: active requests=0, bytes read=0"
time="2024-06-13T22:59:21.411640254+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:traefik-j7sbz,Uid:92c2c2db-a035-4a99-aed0-bacfae01c27c,Namespace:kube-system,Attempt:0,} failed, error" error="failed to get sandbox image \"rancher/mirrored-pause:3.6\": failed to pull image \"rancher/mirrored-pause:3.6\": failed to pull and unpack image \"docker.io/rancher/mirrored-pause:3.6\": failed to resolve reference \"docker.io/rancher/mirrored-pause:3.6\": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error"
time="2024-06-13T22:59:33.375585471+02:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:traefik-j7sbz,Uid:92c2c2db-a035-4a99-aed0-bacfae01c27c,Namespace:kube-system,Attempt:0,}"


@brandond
Contributor

brandond commented Jun 13, 2024

I'm confused by what your problem is then. You're airgapped, and don't want it to fall back to a default endpoint because it's not reachable, but you opened an issue complaining that it doesn't fall back to the default endpoint? And then provided logs showing that it IS falling back to the default endpoint.

What do you want to happen, and what is actually happening?

@brandond
Contributor

brandond commented Jun 13, 2024

#ctr images ls

How did you import this image? If you're importing it manually, did you follow the documented instructions to import it into the correct namespace? If it isn't in the correct containerd namespace, it won't be visible to the kubelet or available to pull from other nodes.
https://docs.k3s.io/installation/private-registry#adding-images-to-the-private-registry
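
For readers following along, a minimal sketch of what the linked docs describe: containerd's CRI plugin keeps its images in the k8s.io namespace (unrelated to Kubernetes namespaces), so a manual import needs to target it. The tarball path below is a placeholder.

ctr -n k8s.io images import /path/to/airgap-images.tar
crictl images | grep mirrored-pause   # confirm the image is now visible to the kubelet/CRI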

@siddjellali
Author

siddjellali commented Jun 13, 2024

I might be a bit confused myself, so thank you for giving me some time to clarify. I am in an air-gapped environment where I upload my images to the registry using the command ctr pull image . When I start my environment, the first image that Kubernetes tries to load is docker.io/rancher/mirrored-pause:3.6, which is available in the image registry:

# ctr image ls | grep docker.io/rancher/mirrored-pause
docker.io/rancher/mirrored-pause:3.6                   application/vnd.docker.distribution.manifest.v2+json sha256:79b611631c0d19e9a975fb0a8511e5153789b4c26610d1842e9f735c57cc8b13 669.8 KiB linux/amd64 -

However, it cannot find the image.
To address this, I have enabled the Embedded Registry Mirror but without success.
I don't think I need to retag the images since I don't have an endpoint, correct?

The image is stored locally, but k3s is still trying to access the internet to download the image docker.io/rancher/mirrored-pause:3.6. This doesn't make sense, right?

# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS              RESTARTS   AGE
kube-system   cilium-operator-6c48dbf7b9-4ncr8           0/1     ContainerCreating   0          87m
kube-system   cilium-ppw2b                               0/1     Init:0/6            0          87m

# kubectl describe pods -n kube-system cilium-operator-6c48dbf7b9-4ncr8
Events:
  Type     Reason                  Age                    From     Message
  ----     ------                  ----                   ----     -------
  Warning  FailedCreatePodSandBox  2m51s (x120 over 87m)  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error

@brandond
Contributor

brandond commented Jun 13, 2024

I am in an air-gapped environment where I upload my images to the registry using the command ctr pull image.

That pulls images from a registry, into the containerd image store. Where exactly are you pulling it from if you are airgapped and don't have an upstream registry, or a local mirror? How did it get there?

When I start my environment, the first image that Kubernetes tries to load is docker.io/rancher/mirrored-pause:3.6, which is available in the image registry:

You're not showing the contents of your registry here, you're showing the contents of the containerd image store.

The image is stored locally, but k3s is still trying to access the internet to download the image

Is the image in the correct namespace, as described in the documentation I linked earlier? If it's in the wrong namespace, it's not visible to the kubelet.

@siddjellali
Author

siddjellali commented Jun 13, 2024

You are right! I store images with ctr image import <tarball_file>.
It looks like all the images are stored in the default namespace...

I moved every image into the namespace where the pods are running and I got the same error:
ctr -n kube-system image import <tarball_file>

Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               81s                default-scheduler  Successfully assigned kube-system/coredns-576bfc4dc7-z69qj to dane
  Warning  FailedCreatePodSandBox  10s (x2 over 51s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": unexpected status from HEAD request to https://127.0.0.1:6443/v2/rancher/mirrored-pause/manifests/3.6?ns=docker.io: 500 Internal Server Error

What is the ?ns=docker.io suffix in the request URL? Is ns short for namespace, maybe?

@brandond
Contributor

brandond commented Jun 13, 2024

Please stop trying random things and go read the docs page I linked above... you're still using the wrong namespace.

You need to use the actual namespace shown in the docs, not whatever namespace the pods are in. Containerd's namespaces have nothing to do with Kubernetes namespaces.
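
For completeness, a sketch of how to check which containerd namespace the imported images actually landed in; the image name is taken from the thread above:

ctr namespaces ls
ctr -n default images ls | grep mirrored-pause
ctr -n k8s.io images ls | grep mirrored-pause    # the namespace read by the CRI plugin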
