verifyStatusResponse does not work with containerd v2 #579

Closed
jfroy opened this issue Sep 10, 2024 · 15 comments · Fixed by #581
Labels
bug Something isn't working

Comments

@jfroy
Contributor

jfroy commented Sep 10, 2024

Spegel version

v0.0.24

Kubernetes distribution

Talos

Kubernetes version

v1.31.0

CNI

Cilium

Describe the bug

After updating my homelab cluster to Talos v1.8.0-beta0, I noticed that the Spegel pod no longer started. Before the program terminates, the logs show only:

{"time":"2024-09-10T07:29:25.425786168Z","level":"ERROR","source":{"function":"main.main","file":"/build/main.go","line":89},"msg":"run exit with error","err":"Containerd registry config path needs to be set for mirror configuration to take effect"}

Looking at the code, this is verifyStatusResponse() in pkg/oci/containerd.go failing its check for ConfigPath in the containerd config it receives.

https://github.com/spegel-org/spegel/blob/main/pkg/oci/containerd.go#L126

I have a feeling Talos is not configuring containerd correctly, but the config file reads:

version = 3

[plugins]
  [plugins.'io.containerd.cri.v1.images']
    discard_unpacked_layers = false

    [plugins.'io.containerd.cri.v1.images'.registry]
      config_path = '/etc/cri/conf.d/hosts'

which according to https://github.com/containerd/containerd/blob/main/docs/cri/config.md is the correct way in 2.x.
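A quick way to confirm which directory should be passed to Spegel (for example through the Helm value spegel.containerdRegistryConfigPath) is to read config_path straight out of a copy of the containerd config. The sketch below assumes a POSIX shell with awk. registry_config_path is a hypothetical helper, and it is a line-based scan that only understands the two registry table names used in this thread (version 2 and version 3 schema), not a full TOML parser. It also does not reproduce the check Spegel itself performs, which inspects the config returned by containerd.

```shell
# Hypothetical helper (not part of Spegel): print the CRI registry config_path
# from a containerd config.toml. It recognizes both registry table names:
#   version 2: [plugins."io.containerd.grpc.v1.cri".registry]
#   version 3: [plugins.'io.containerd.cri.v1.images'.registry]
# Line-based sketch, not a TOML parser: it assumes the table header and the
# config_path key each sit on their own line, with no trailing comment.
registry_config_path() {
  awk '
    # Table header: remember whether it is one of the registry tables.
    /^[ \t]*\[/ {
      in_registry = 0
      if ($0 ~ /io\.containerd\.grpc\.v1\.cri["'\'']?\.registry]/) in_registry = 1
      if ($0 ~ /io\.containerd\.cri\.v1\.images["'\'']?\.registry]/) in_registry = 1
      next
    }
    # Inside a registry table: print the config_path value and stop.
    in_registry && /^[ \t]*config_path[ \t]*=/ {
      value = $0
      sub(/^[^=]*=[ \t]*/, "", value)   # drop the "config_path =" prefix
      gsub(/["'\'']/, "", value)        # drop TOML string quotes
      sub(/[ \t]+$/, "", value)         # drop trailing whitespace
      print value
      exit
    }
  ' "$1"
}
```

Run against a copy of the Talos config above, registry_config_path prints /etc/cri/conf.d/hosts. If it prints nothing, the file has no config_path in either registry table.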

@jfroy jfroy added the bug Something isn't working label Sep 10, 2024
@jfroy
Contributor Author

jfroy commented Sep 10, 2024

I am pretty sure containerd v2 no longer has an API endpoint that returns the Registry config object. It has been refactored into internal/cri/config.ImageConfig, and the Status RPC returns the internal/cri/config.Config object.

So we may want to make the verification optional, or at least not fatal, and also open an issue with containerd.

@jfroy jfroy changed the title from "verifyStatusResponse does not work with containerd v2 (config schema 3)" to "verifyStatusResponse does not work with containerd v2" on Sep 10, 2024
@onedr0p
Contributor

onedr0p commented Sep 26, 2024

@phillebaba any chance you could take a look at that PR? Talos 1.8.0 ships containerd 2.0.0-rc.4.

@aki263

aki263 commented Sep 27, 2024

Yeah, same error for me on Talos 1.8.0 with containerd://2.0.0-rc.4.

@onedr0p Any workaround for this error? For now I am just not deploying the Spegel chart.

@RobReus

RobReus commented Sep 27, 2024

I cherry-picked the patch from PR #581 onto the v0.0.24 release tag and built the container image locally. I then pushed it to my own registry and configured the Helm chart to use my custom image instead. With the patch applied, Spegel works perfectly again on Talos 1.8.0.

@onedr0p
Contributor

onedr0p commented Sep 27, 2024

@aki263 for now I am using this image in the Helm values:

image:
  repository: ghcr.io/deedee-ops/spegel
  tag: 0.0.24
  digest: ""

@phillebaba
Member

I have been busy with work lately, so I have not had time for Spegel. I would rather fix the underlying problem than just skip verification when the version is Containerd v2. I will have a look and see if this can be solved some other way.

@phillebaba
Member

phillebaba commented Sep 29, 2024

My current guess is that the problem comes from the difference between the Containerd config versions that are supported. Containerd v2 will support both v2 and v3 configuration files, so either way we will have to deal with this. What better time than now to figure out the problems?

It seems like the compatibility guide no longer works: the default config no longer writes the certs.d path. Could someone share a talosctl command that worked before v1.8.0, or one that currently configures everything except for the failing verification step?

@james-callahan

> Could someone share a talosctl command that used to work pre v1.8.0 or one that currently configures everything minus the verification step failing.

What do you mean by "a talosctl command" here? Installing spegel I never had to run anything via talosctl...

@phillebaba
Member

I am not going to spin up a production cluster to test these things. Using talosctl allows me to spin up a local cluster running in docker. When I follow the quick start guide and install Spegel it will not start because there are missing directories or directories that are read only.

That is why I am asking how others currently install Spegel on Talos, and whether any special configuration is required.
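When Spegel fails because a directory is missing or read-only, one quick sanity check is to inspect the registry config path directly on a node, or inside a node container that has a shell (for example a kind node). A minimal sketch, assuming a POSIX shell and containerd's hosts directory layout of <config_path>/<registry>/hosts.toml. check_registry_config_path is a hypothetical helper, and the argument (for example /etc/cri/conf.d/hosts on Talos or /etc/containerd/certs.d elsewhere) must match the config_path containerd is actually using.

```shell
# Hypothetical helper (not part of Spegel): sanity-check a containerd registry
# config directory, assuming the layout <config_path>/<registry>/hosts.toml.
check_registry_config_path() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # Assumption: Spegel writes its mirror configuration here, so the
  # directory has to be writable.
  if [ ! -w "$dir" ]; then
    echo "read-only: $dir"
    return 1
  fi
  echo "ok: $dir"
  # List registries that already have a hosts.toml.
  for f in "$dir"/*/hosts.toml; do
    [ -e "$f" ] || continue   # the glob matched nothing
    host_dir=${f%/hosts.toml}
    echo "mirror config: ${host_dir##*/}"
  done
}
```

For example, check_registry_config_path /etc/containerd/certs.d prints one "mirror config:" line per registry directory that contains a hosts.toml, or a single "missing:" or "read-only:" line and a non-zero exit status.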

@solarisfire

Can confirm, the following works on Talos OS v1.8.0:

helm upgrade --create-namespace --namespace spegel --install --version v0.0.24 spegel oci://ghcr.io/spegel-org/helm-charts/spegel \
  --set image.repository=ghcr.io/deedee-ops/spegel \
  --set image.digest=sha256:267dc3b79750dfca9a961be1f6f9b39997e598f6ee4b2cac7a98641d163a7a78 \
  --set spegel.containerdRegistryConfigPath="/etc/cri/conf.d/hosts"

@RobReus

RobReus commented Oct 4, 2024

> I am not going to spin up a production cluster to test these things. Using talosctl allows me to spin up a local cluster running in docker. When I follow the quick start guide and install Spegel it will not start because there are missing directories or directories that are read only.
>
> Which is why I ask how others are installing Spegel on Talos currently and if there are any special configurations that are required.

If I recall correctly, you are probably getting these errors because Talos uses a non-standard containerd registry path. These are the Helm chart values I use for my Spegel installation:

spegel:
  containerdSock: /run/containerd/containerd.sock
  containerdRegistryConfigPath: /etc/cri/conf.d/hosts

With these values the installation should work. However, I have never run Talos in Docker, so it might behave quite differently from bare metal.

@onedr0p
Contributor

onedr0p commented Oct 4, 2024

> I am not going to spin up a production cluster to test these things. Using talosctl allows me to spin up a local cluster running in docker. When I follow the quick start guide and install Spegel it will not start because there are missing directories or directories that are read only.

If you are following this guide, I am guessing this is caused by Docker being the CRI instead of plain containerd. Would it be better to test with QEMU or VirtualBox?

I do wonder if this issue can be reproduced on any k8s distribution that is using containerd v2.0.0-rc.4 🤔

@phillebaba
Member

I have spent some time looking at alternative ways of verifying the configuration, without much luck. I have opened containerd/containerd#10780 in Containerd to track the long-term work. In the meantime I will work on merging #581 to disable verification for Containerd v2.

It would be nice to start testing against Containerd v2, and Talos may be a good candidate for running e2e tests.
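To see which nodes in a cluster run containerd v2, and are therefore affected by this verification change, one option is to look at the runtime version string Kubernetes reports per node, which has the containerd://2.0.0-rc.4 form quoted in this thread. A minimal sketch; containerd_major and list_containerd_versions are hypothetical helpers, the second one assumes kubectl access, and neither reflects how Spegel itself detects the containerd version.

```shell
# Hypothetical helper: print the major version from a runtime string such as
# "containerd://2.0.0-rc.4", the format Kubernetes reports in
# .status.nodeInfo.containerRuntimeVersion. Prints nothing for other runtimes.
containerd_major() {
  case "$1" in
    containerd://*)
      _ver=${1#containerd://}      # e.g. "2.0.0-rc.4"
      _ver=${_ver#v}               # tolerate a leading "v"
      printf '%s\n' "${_ver%%.*}"  # e.g. "2"
      ;;
  esac
}

# Hypothetical helper: list every node with its runtime string and containerd
# major version. Requires kubectl access to the cluster.
list_containerd_versions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' |
    while IFS="$(printf '\t')" read -r node runtime; do
      printf '%s\t%s\tmajor=%s\n' "$node" "$runtime" "$(containerd_major "$runtime")"
    done
}
```

On a Talos 1.8.0 cluster, list_containerd_versions should report major=2 for every node, matching the containerd 2.0.0-rc.4 version mentioned above.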

soulwhisper added a commit to soulwhisper/home-ops that referenced this issue Oct 11, 2024
@sunnoy

sunnoy commented Oct 29, 2024

If you use kind with Kubernetes 1.31.1, this might work:

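#!/bin/bash

# Containers (kind nodes) to process
CONTAINERS=(
    "kind-control-plane"
    "kind-worker"
    "kind-worker2"
    "kind-worker3"
)

# Configuration to append. This is the version 2 config table name; a
# version 3 config (containerd 2.x) uses
# [plugins.'io.containerd.cri.v1.images'.registry] instead.
CONFIG_ADDITION='
[plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"'

# Iterate over all containers
for container in "${CONTAINERS[@]}"; do
    echo "Processing container: $container"

    # Check whether the container is running (exact name match, so that
    # "kind-worker" does not also match "kind-worker2")
    if ! docker ps --format '{{.Names}}' | grep -qx "$container"; then
        echo "Warning: Container $container is not running, skipping..."
        continue
    fi

    # Check whether the configuration already exists
    if docker exec "$container" grep -q 'config_path = "/etc/containerd/certs.d"' /etc/containerd/config.toml; then
        echo "Config already exists in $container, skipping..."
        continue
    fi

    # Back up the original config file (only when it is about to be modified,
    # so a re-run does not overwrite the backup with an already-patched file)
    echo "Creating backup of containerd config in $container..."
    docker exec "$container" cp /etc/containerd/config.toml /etc/containerd/config.toml.backup

    echo "Adding registry configuration to $container..."

    # Append the configuration to the end of the file. If config.toml already
    # defines this table, appending it again creates a duplicate TOML table and
    # containerd will likely fail to load the file; restore config.toml.backup
    # in that case.
    printf '%s\n' "$CONFIG_ADDITION" | docker exec -i "$container" sh -c 'cat >> /etc/containerd/config.toml'

    # Restart containerd (inside the container)
    echo "Restarting containerd in $container..."
    docker exec "$container" systemctl restart containerd

    # Verify that the configuration was added
    echo "Verifying configuration in $container..."
    docker exec "$container" grep -A 2 "registry" /etc/containerd/config.toml

    echo "Completed processing $container"
    echo "----------------------------------------"
done

echo "All containers processed."

# Check the containerd status in all containers
echo "Checking containerd status in all containers:"
for container in "${CONTAINERS[@]}"; do
    echo "Status for $container:"
    docker exec "$container" systemctl status containerd | grep Active
    echo "----------------------------------------"
done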

@phillebaba
Member

@sunnoy this has temporarily been fixed in the latest release of Spegel until an upstream fix has been implemented in Containerd.


8 participants