-
Notifications
You must be signed in to change notification settings - Fork 374
cli: Don't wait for OCI delete to stop the sandbox #232
Conversation
@chavafg could you please try this patch with both |
cli/delete.go
Outdated
if _, err := vci.StopSandbox(sandboxID); err != nil { | ||
return err | ||
func deleteSandbox(sandboxID, containerID string, forceStop bool) error { | ||
if forceStop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a fairly fundamental change so I was expecting to see some new/modified tests for this...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but I need to make sure this is the right thing to do and it fixes all issues before I spend some time on those tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sboeuf - in which case, I wonder if @grahamwhaley's soak test could help us in this regard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have in mind a more global PR. I am currently working on #213 (the unit tests are a nightmare...), but I hope I can update this one tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the soak test, we should look at kata-containers/tests#195 or maybe preferably kata-containers/tests#215 to add that ability/check to kata.
Hi @sboeuf |
I see 2 kata-runtime errors when running
Container ID: Pod ID: Full log:
Again after this,
|
Thanks @chavafg , I'll have to investigate a bit more on this one! |
@chavafg @egernst FWIW, I could properly run a
But something went wrong with
Resulting in error when running
|
3dd3747
to
82a0d12
Compare
Ready to be reviewed. |
82a0d12
to
26ba724
Compare
cli/delete.go
Outdated
// reason because that's the behavior of "--force". Also, a | ||
// previous call to kill might have been performed earlier, | ||
// causing this one to fail. | ||
if err := vci.KillContainer(sandboxID, containerID, syscall.SIGKILL, true); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleteContainer(forceStop==true) will also send SIGKILL to the container. Is there any reason that we need to do it here before virtcontainers does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, deleteSandbox() also sends it. I do not see why we need it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are we killing the containers ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh nevermind, I forgot this was handled through virtcontainers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have addressed your comment.
202cd2d
to
ff6c8c6
Compare
@WeiZhang555 @laijs @bergwolf PTAL |
} | ||
|
||
switch containerType { | ||
case vc.PodSandbox: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sboeuf when do crio/cri-containerd/docker-shim etc. set the podsandbox annotation? It is set for the first container created in a sandbox?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is set for the container representing the sandbox, and yes it is usually the first one. In case of CRIO, they create what they call an infra
container as being the sandbox, and then they spawn a bunch of other containers later.
case vc.PodContainer: | ||
_, err = vci.StopContainer(sandboxID, containerID) | ||
default: | ||
return fmt.Errorf("Invalid container type found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add containerID
to this error?
I think the changes make, but I see them as two separate changesets, hence better as two separate commits. Also, I am finding the description of the PR (what has this got to do with "wait"-ing?) and the commit description confusing. |
@jodh-intel I can split them into 2 commits but this means the first one won't be working without the second, which should be fine since we only care about the status of the whole PR. |
The way a delete works, it was always trying to stop the sandbox, even when the force flag was not enabled. Because we want to be able to stop the sandbox from a kill command, this means a sandbox stop might be called twice, and we don't want the second stop to fail, leading to the failure of the delete command. That's why this commit checks for the sandbox status before to try stopping the sandbox. Fixes kata-containers#246 Signed-off-by: Sebastien Boeuf <[email protected]>
The same way a caller of "kata-runtime kill 12345" expects the container 12345 to be killed, the same call to a container representing a sandbox should actually kill the sandbox, meaning it would be stopped after the container has been killed. This way, the caller knows the VM is stopped after kill returns. This is an issue raised by Openshift and Kubernetes tests. They call into delete way after the call to kill has been submitted, and in the meantime they kill all processes related to the container, meaning they do kill the VM before we could do it ourselves. In this case, the delete responsible of stopping the VM comes too late and it returns an error when trying to destroy the sandbox while trying to communicate with the agent since the VM is not here anymore. This commit addresses this issue by letting "kill" call into StopSandbox() if the command relates to a sandbox instead of a simple container. Fixes kata-containers#246 Signed-off-by: Sebastien Boeuf <[email protected]>
ff6c8c6
to
07af4ed
Compare
Codecov Report
@@ Coverage Diff @@
## master #232 +/- ##
=========================================
+ Coverage 65.15% 65.26% +0.1%
=========================================
Files 74 74
Lines 7863 7876 +13
=========================================
+ Hits 5123 5140 +17
+ Misses 2186 2182 -4
Partials 554 554
Continue to review full report at Codecov.
|
@jodh-intel I have just split into 2 different commits, providing a separate description for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @sboeuf! I appreciate that that's a lot of text to describe a relatively small code change, but I do think it makes the behaviour and rationale for the change much clearer. lgtm |
@sboeuf - I'll let you remove the dnm label when you're ready. |
This change adds openshift installation into the CI process and executes a simple test that verifies connectivity between an openshift pod and the host. Fixes: clearcontainers#220. Depends-on: github.com/kata-containers/runtime#232 Signed-off-by: Salvador Fuentes <[email protected]>
Once all vCPUs have been connected, cpuset cgroup MUST BE updated, to achieve that, each cpuset cgroup parent of the container MUST BE updated with the actual range of vCPUs. fixes kata-containers#239 fixes kata-containers#232 Signed-off-by: Julio Montes <[email protected]>
This commit tries to address uses cases where the caller of the
runtime expects a kill to stop the container representing the pod,
meaning in our case the end of the VM.
This leads to some rework of the delete command since we want a
delete --force to always work, even if the container/pod has been
killed before.
Fixes #197
Signed-off-by: Sebastien Boeuf [email protected]