Skip to content
This repository has been archived by the owner on May 6, 2020. It is now read-only.

How to make dockershim support clear containers? #987

Open
miaoyq opened this issue Feb 6, 2018 · 11 comments
Open

How to make dockershim support clear containers? #987

miaoyq opened this issue Feb 6, 2018 · 11 comments

Comments

@miaoyq
Copy link

miaoyq commented Feb 6, 2018

Recently I was trying to make dockershim support cc, and found that dockershim supporting cc need to meet the following conditions at least:

  1. Dockershim need provide container type(app or sandbox container) info in Annotation that clear containers need.
    dockershim have provided this info in Label` field of docker container struct.
  2. Docker need pass Label of docker container to Annotations of OCI runtime spec.
    I have sent a pr to moby project about this, see: moby/moby/pull/36181
  3. Annotations(Labels) of dockershim can be recognized by cc-runtime
    The annotations(labels) are defined as private fields in dockershim and can't be referenced by virtcontainer. So I only modify the codes locally for testing.
  4. Kubelet should be able to get the IP address after the sandbox started
    The IP address of the veth device that is create by cni plugin will be remove when cc-runtime create a tap device linked to the veth device. So we must create the network namespace and get the IP address of the veth before creating the sandbox container with the existing network namespace, just like this issue description in cri-containerd. However, docker does not provide an interface for using existing network namespace to create containers, we must use the namespace created by docker, so we can't get the IP via cni plugin after setting up the network since the IP of veth have been removed. This is the main obstruction point for me at present.

After modifying the code corresponding to the first three conditions, and compiling codes of each component and setting up the k8s cluster, I could create the pod successfully but the network of pod could not be set up correctly. I do not know whether it is related to conditions, expect to get help here. Thanks!

@sameo @plutoinmii

/cc @guangxuli

@sameo
Copy link

sameo commented Feb 6, 2018

cc @egernst

@miaoyq I think we're looking at a couple of potential solutions here:

  1. Have virtcontainers somehow not cleaning the veth IP so that PodSandboxStatus() would just work.
  2. Have docker-shim cache the pod IP and use that cache instead of running unnecessary netns calls into the pod networking namespace. A pod IP is immutable, so there's no reasons for always checking about it.
  3. Use the Docker engine API (GET /containers/(id or name)/json) to ask for the pod container network settings from the Docker engine itself.

Solution #2 is obviously the most appealing to us, as #1 would be quite disruptive for CC in general.
Solution #3 may work, but since I don't have a working docker-shim based CC deployment, I can't guarantee that. If it does work, it would be a quick and portable solution. #2 would still be more efficient...

@egernst
Copy link

egernst commented Feb 6, 2018

Thanks for the heads up @sameo

Interesting, painful work @miaoyq! I don't have any other input aside from what sameo already pointed out. Are you looking to compare CC with dockershim v cri-containerd?

@miaoyq
Copy link
Author

miaoyq commented Feb 7, 2018

@sameo Thanks for providing the potential solutions

  1. Use the Docker engine API (GET /containers/(id or name)/json) to ask for the pod container network settings from the Docker engine itself.

The third solution doesn't work, since the network of sandbox is set up by cni plugin, so docker can not get the IP of the veth. I have verified this locally. :-)

The second solution may be the best one. I will also try to seek better ways to get the sandbox IP on dockershim side.

@miaoyq
Copy link
Author

miaoyq commented Feb 7, 2018

Are you looking to compare CC with dockershim v cri-containerd?

@egernst Yeah. k8s + cri-containerd(containerd) + cc can work normally. Currently, The main difference between dockershim and cri-containerd is the network creation of the sandbox.

@sameo
Copy link

sameo commented Feb 7, 2018

@miaoyq There already is a per pod map in the dockerService to track if a pod's network is ready. It would seem logical to add an IP string there in order to track both the network readiness and its IP.
It should not be a big effort neither, so probably worth it.

@miaoyq
Copy link
Author

miaoyq commented Feb 9, 2018

@sameo Yeah, it's a good idea.
Actually, dockershim have provided the way to store the pod data. When run a sandbox, the checkpointHandler of dockerService will create a checkpoint for this pod, the checkpoint can be used to track the pod IP. :-)

@miaoyq
Copy link
Author

miaoyq commented Feb 9, 2018

@sameo Today, I ran a container via docker + cc, and get the container pid by docker inspect -f '{{.State.Pid}}' [container_id], I found the pid of container is not the pid of qemu process (in container network namespace) but the pid of cc-shim process (in host network namespace).

In dockershim, the pid of container is used to get the netns path of sandbox container, so dockershim can't get the right netns path currently.

Could we get the pid of qemu process instead of cc-shim process when cc-runtme create vm-based container?

@sboeuf
Copy link
Contributor

sboeuf commented Feb 9, 2018

@miaoyq we actually don't want to provide the pid of the qemu process since this is related to the pod and not to the container itself.
What we could do to solve your issue is to make sure that we run the shim in the right network namespace. Look into virtcontainers shim implementation cc_shim.go pointing to some function in shim.go. This is where we should spawn the shim process inside the right namespace.

Please open an issue on virtcontainers repo for that, so that we can discuss further on this.

@miaoyq
Copy link
Author

miaoyq commented Feb 11, 2018

Thanks @sboeuf, I also think it's reasonable to run shim in container network namespace. :)
I have filed an issue on virtcontainers repo. See: containers/virtcontainers#615

@miaoyq
Copy link
Author

miaoyq commented Feb 11, 2018

@sameo @sboeuf BTW, Could we reset the network for a exist cc container whose network have't been set when created?

@miaoyq
Copy link
Author

miaoyq commented Feb 11, 2018

@sameo @sboeuf BTW, Could we reset the network for a exist cc container whose network have't been set when created?

I think I have got the answer in containers/virtcontainers#241.
Looking forward to this feature. :-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants