Fix container package #820

Merged · 6 commits · Mar 19, 2019
Conversation

@NicolasMahe (Member) commented Mar 16, 2019

This PR fixes and simplifies a few things in the container package:

  • Introduce the StopGracePeriod option.
    This option tells Docker how long to wait between removing the service and killing the associated container. It is now possible to set a custom grace period when starting a Docker service (see the sketch after this list).

  • Update the deletePendingContainer function to wait for a grace period before removing the container.
    It uses the grace period set in the Docker service data, or falls back to the default one (10s) if none is set.
    The previous logic of deletePendingContainer killed and removed the container after only a few seconds, with no way to change that delay (the timeout of containerStop was somehow not taken into account, maybe because of Docker service).

  • Simplify / improve the DeleteNetwork function.
    DeleteNetwork used the Docker Events function to listen for the network deletion event with a context with timeout. The problem is that sometimes the network is deleted either too early or too late (after the context timeout), so DeleteNetwork failed in those cases.
    I updated the code to use logic similar to deletePendingContainer: a recursive call that first checks whether the network has actually been deleted.

  • Change the default container.callTimeout to 60s.
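
A minimal sketch of the StopGracePeriod idea, assuming the docker/docker Go SDK; the ServiceOptions struct and toSwarmSpec helper are illustrative stand-ins, not the PR's actual code (swarm.ContainerSpec really does expose a StopGracePeriod field):

```go
package container

import (
	"time"

	"github.com/docker/docker/api/types/swarm"
)

// ServiceOptions is a hypothetical options struct carrying the new field.
type ServiceOptions struct {
	Image           string
	StopGracePeriod *time.Duration // nil lets Docker fall back to its default (10s)
}

// toSwarmSpec forwards the grace period to the swarm service spec, so Docker
// knows how long to wait between removing the service and killing the container.
func (o *ServiceOptions) toSwarmSpec() swarm.ServiceSpec {
	return swarm.ServiceSpec{
		TaskTemplate: swarm.TaskSpec{
			ContainerSpec: &swarm.ContainerSpec{
				Image:           o.Image,
				StopGracePeriod: o.StopGracePeriod,
			},
		},
	}
}
```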

@NicolasMahe NicolasMahe marked this pull request as ready for review March 16, 2019 17:15
@antho1404 (Member)

Why remove the part that listens for when the network is actually deleted?

We need to ensure that the network is deleted when we remove the service, otherwise we will have some problems again. Also, adding 1s here and there in the code is not a good solution; sometimes it might take more than 1s to delete the network/container/service.

@NicolasMahe (Member, Author) commented Mar 17, 2019

@antho1404

> Why remove the part that listens for when the network is actually deleted?
>
> We need to ensure that the network is deleted when we remove the service, otherwise we will have some problems again. Also, adding 1s here and there in the code is not a good solution; sometimes it might take more than 1s to delete the network/container/service.

Did you see the explanation about this in the PR description? Do you want more explanation?

> DeleteNetwork used the Docker Events function to listen for the network deletion event with a context with timeout. The problem is that sometimes the network is deleted either too early or too late (after the context timeout), so DeleteNetwork failed in those cases.
> I updated the code to use logic similar to deletePendingContainer: a recursive call that first checks whether the network has actually been deleted.

@antho1404 (Member)

Sorry, I missed the recursive call. It's better, but I feel that polling or any solution of that kind is not great when we can have events. Why not just use a context.Background, remove the timeout, and keep the event-based system?

@NicolasMahe (Member, Author) commented Mar 17, 2019

> Sorry, I missed the recursive call. It's better, but I feel that polling or any solution of that kind is not great when we can have events. Why not just use a context.Background, remove the timeout, and keep the event-based system?

That's what I tried first, and it doesn't solve the issue.
I guess the problem is that some events are fired before the listener is actually ready (the same problem we had and solved with the gRPC acknowledgement header system), or we don't listen for the right event, or the event is never triggered under certain conditions.

A Docker event system would make sense for syncing the service status between Docker and the service database in real time, in the background. Otherwise, I feel it's too complicated.
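
A minimal sketch of the polling approach being described, assuming a docker/docker API client; the function shape and the one-second interval are illustrative assumptions, not the PR's exact code:

```go
package container

import (
	"context"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// deleteNetwork removes the network, then polls until Docker has actually
// deleted it, instead of waiting for a deletion event that may fire too
// early or too late.
func deleteNetwork(ctx context.Context, cli *client.Client, networkID string) error {
	if err := cli.NetworkRemove(ctx, networkID); err != nil && !client.IsErrNotFound(err) {
		return err
	}
	for {
		_, err := cli.NetworkInspect(ctx, networkID, types.NetworkInspectOptions{})
		if client.IsErrNotFound(err) {
			return nil // the network is really gone
		}
		if err != nil {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second): // check again shortly
		}
	}
}
```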

container/service.go (review thread marked outdated and resolved)
@NicolasMahe (Member, Author) commented Mar 18, 2019

@mesg-foundation/core Modifications done, PR description updated, please review.

```diff
-func (c *DockerContainer) deletePendingContainer(namespace []string) error {
 	ctx, cancel := context.WithTimeout(context.Background(), c.callTimeout)
 	defer cancel()
+func (c *DockerContainer) deletePendingContainer(namespace []string, maxGraceTime time.Time) error {
```
Contributor:

We can simplify the time logic by using <-time.After(stopGracePeriod).

Member Author:

I don't see how it would simplify things.

The goal here is to return as soon as the container is not found. maxGraceTime is effectively a timeout.

I think using <-time.After(stopGracePeriod) would either always end up calling ContainerRemove or, to prevent that, would need an extra done channel to tell the goroutine to stop. So it actually seems more complex.
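
An illustrative sketch of the trade-off described here (hypothetical names, same docker client imports as the sketch above, not code from the PR): a bare <-time.After wait needs a second channel so that an early "container already gone" check can skip the forced removal:

```go
// waitThenRemove force-removes the container once the grace period elapses,
// unless done is closed first because the container has already disappeared.
func waitThenRemove(ctx context.Context, cli *client.Client, containerID string,
	stopGracePeriod time.Duration, done <-chan struct{}) error {
	select {
	case <-time.After(stopGracePeriod):
		return cli.ContainerRemove(ctx, containerID, types.ContainerRemoveOptions{Force: true})
	case <-done:
		return nil // container is already gone; skip the forced removal
	}
}
```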

Contributor:

Yes, I guess the current implementation is fine: when we remove the service and check right after whether the container exists, Docker might still be in the process of deleting the container. So it's good that we don't block for the whole timeout and instead check the container's status again at short intervals.
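
A minimal sketch of the short-interval check being discussed, assuming a docker/docker API client; the function shape and the one-second polling interval are illustrative assumptions, not the PR's exact code:

```go
// deletePendingContainer polls in short intervals and returns as soon as the
// container is gone; once maxGraceTime has passed, it force-removes it.
func deletePendingContainer(ctx context.Context, cli *client.Client,
	containerID string, maxGraceTime time.Time) error {
	for {
		_, err := cli.ContainerInspect(ctx, containerID)
		if client.IsErrNotFound(err) {
			return nil // container already removed by Docker
		}
		if err != nil {
			return err
		}
		if time.Now().After(maxGraceTime) {
			break // grace period elapsed; stop waiting
		}
		time.Sleep(time.Second) // re-check shortly instead of blocking for the whole timeout
	}
	return cli.ContainerRemove(ctx, containerID, types.ContainerRemoveOptions{Force: true})
}
```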

@ilgooz (Contributor) left a comment:

Manual tests are good.
