Fix container package #820

Merged · 6 commits · Mar 19, 2019
Conversation

@NicolasMahe (Member) commented Mar 16, 2019

This PR fixes and simplifies a few things in the container package:

  • Introduce the StopGracePeriod option.
    This option tells Docker how long to wait between removing the service and killing the associated container. It is now possible to set a custom grace period when starting a Docker service (see the sketch after this list).

  • Update the deletePendingContainer function to wait for a grace period before removing the container.
    It uses the grace period set in the Docker service data, or falls back to the default one (10s) if none is set.
    The previous logic of deletePendingContainer killed and removed the container after only a few seconds, with no way to change that delay (the timeout of containerStop was somehow not taken into account, maybe because of Docker service).

  • Simplify / improve the DeleteNetwork function.
    DeleteNetwork used the Docker Events function to listen for the network deletion event with a context with timeout. The problem is that sometimes the network is deleted either too early or too late (after the context timeout), so DeleteNetwork failed in those cases.
    I updated the code to use logic similar to deletePendingContainer: a recursive call that first checks whether the network has actually been deleted.

  • Change the default container.callTimeout to 60s.
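
A minimal sketch of the StopGracePeriod idea, assuming the docker/docker Go SDK; the ServiceOptions struct and toSwarmSpec helper are illustrative stand-ins, not the PR's actual code (swarm.ContainerSpec really does expose a StopGracePeriod field):

```go
package container

import (
	"time"

	"github.com/docker/docker/api/types/swarm"
)

// ServiceOptions is a hypothetical options struct carrying the new field.
type ServiceOptions struct {
	Image           string
	StopGracePeriod *time.Duration // nil lets Docker fall back to its default (10s)
}

// toSwarmSpec forwards the grace period to the swarm service spec, so Docker
// knows how long to wait between removing the service and killing the container.
func (o *ServiceOptions) toSwarmSpec() swarm.ServiceSpec {
	return swarm.ServiceSpec{
		TaskTemplate: swarm.TaskSpec{
			ContainerSpec: &swarm.ContainerSpec{
				Image:           o.Image,
				StopGracePeriod: o.StopGracePeriod,
			},
		},
	}
}
```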

@NicolasMahe NicolasMahe marked this pull request as ready for review March 16, 2019 17:15
@antho1404 (Member)

Why remove the part that listens for when the network is actually deleted?

We need to ensure that the network is deleted when we remove the service, otherwise we will have some problems again. Also, adding 1s here and there in the code is not a good solution; sometimes it might take more than 1s to delete the network/container/service.

@NicolasMahe (Member, Author) commented Mar 17, 2019

@antho1404

> Why remove the part that listens for when the network is actually deleted?
>
> We need to ensure that the network is deleted when we remove the service, otherwise we will have some problems again. Also, adding 1s here and there in the code is not a good solution; sometimes it might take more than 1s to delete the network/container/service.

Did you see the explanation about this in the PR description? Do you want more explanation?

> DeleteNetwork used the Docker Events function to listen for the network deletion event with a context with timeout. The problem is that sometimes the network is deleted either too early or too late (after the context timeout), so DeleteNetwork failed in those cases.
> I updated the code to use logic similar to deletePendingContainer: a recursive call that first checks whether the network has actually been deleted.

@antho1404 (Member)

Sorry, I missed the recursive call. It's better, but I feel that polling or any solution of that kind is not great when we can have events. Why not just use a context.Background, remove the timeout, and keep the event-based system?

@NicolasMahe (Member, Author) commented Mar 17, 2019

> Sorry, I missed the recursive call. It's better, but I feel that polling or any solution of that kind is not great when we can have events. Why not just use a context.Background, remove the timeout, and keep the event-based system?

That's what I tried first, and it doesn't solve the issue.
I guess the problem is that some events are fired before the listener is actually ready (the same problem we had and solved with the gRPC acknowledgement header system), or we don't listen for the right event, or the event is never triggered under certain conditions.

A Docker event system would make sense for syncing the service status between Docker and the service database in real time, in the background. Otherwise, I feel it's too complicated.
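
A minimal sketch of the polling approach being described, assuming a docker/docker API client; the function shape and the one-second interval are illustrative assumptions, not the PR's exact code:

```go
package container

import (
	"context"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// deleteNetwork removes the network, then polls until Docker has actually
// deleted it, instead of waiting for a deletion event that may fire too
// early or too late.
func deleteNetwork(ctx context.Context, cli *client.Client, networkID string) error {
	if err := cli.NetworkRemove(ctx, networkID); err != nil && !client.IsErrNotFound(err) {
		return err
	}
	for {
		_, err := cli.NetworkInspect(ctx, networkID, types.NetworkInspectOptions{})
		if client.IsErrNotFound(err) {
			return nil // the network is really gone
		}
		if err != nil {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second): // check again shortly
		}
	}
}
```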

container/service.go (review thread marked outdated and resolved)
@NicolasMahe (Member, Author) commented Mar 18, 2019

@mesg-foundation/core Modifications done, PR description updated, please review.

```diff
-func (c *DockerContainer) deletePendingContainer(namespace []string) error {
 	ctx, cancel := context.WithTimeout(context.Background(), c.callTimeout)
 	defer cancel()
+func (c *DockerContainer) deletePendingContainer(namespace []string, maxGraceTime time.Time) error {
```
Contributor:

We can simplify the time logic by using <-time.After(stopGracePeriod).

Member Author:

I don't see how it would simplify things.

The goal here is to return as soon as the container is not found. maxGraceTime is effectively a timeout.

I think using <-time.After(stopGracePeriod) would either always end up calling ContainerRemove or, to prevent that, would need an extra done channel to tell the goroutine to stop. So it actually seems more complex.
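
An illustrative sketch of the trade-off described here (hypothetical names, same docker client imports as the sketch above, not code from the PR): a bare <-time.After wait needs a second channel so that an early "container already gone" check can skip the forced removal:

```go
// waitThenRemove force-removes the container once the grace period elapses,
// unless done is closed first because the container has already disappeared.
func waitThenRemove(ctx context.Context, cli *client.Client, containerID string,
	stopGracePeriod time.Duration, done <-chan struct{}) error {
	select {
	case <-time.After(stopGracePeriod):
		return cli.ContainerRemove(ctx, containerID, types.ContainerRemoveOptions{Force: true})
	case <-done:
		return nil // container is already gone; skip the forced removal
	}
}
```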

Contributor:

Yes, I guess the current implementation is fine: when we remove the service and check right after whether the container exists, Docker might still be in the process of deleting the container. So it's good that we don't block for the whole timeout and instead check the container's status again at short intervals.
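
A minimal sketch of the short-interval check being discussed, assuming a docker/docker API client; the function shape and the one-second polling interval are illustrative assumptions, not the PR's exact code:

```go
// deletePendingContainer polls in short intervals and returns as soon as the
// container is gone; once maxGraceTime has passed, it force-removes it.
func deletePendingContainer(ctx context.Context, cli *client.Client,
	containerID string, maxGraceTime time.Time) error {
	for {
		_, err := cli.ContainerInspect(ctx, containerID)
		if client.IsErrNotFound(err) {
			return nil // container already removed by Docker
		}
		if err != nil {
			return err
		}
		if time.Now().After(maxGraceTime) {
			break // grace period elapsed; stop waiting
		}
		time.Sleep(time.Second) // re-check shortly instead of blocking for the whole timeout
	}
	return cli.ContainerRemove(ctx, containerID, types.ContainerRemoveOptions{Force: true})
}
```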

@ilgooz (Contributor) left a comment:

Manual tests are good.
