Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail earlier on compose errors #13110

Closed
wants to merge 1 commit into from

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Jul 30, 2019

Compose wrapper was retrying on docker-compose up till timeout in case
it was caused by resources exhaustion, but most of the times it fails, it
is caused by unrecoverable errors like mistakes in docker-compose.yml or
Dockerfiles. Fail earlier in all cases except in the ones that can be
recovered with the time.

The motivation to handle the case of networks exhaustion is that we are
planning to support multiple docker compose files, and multiple
scenarios or versions at the same time, this can consume all the
available network pools when tests for several modules are run in
parallel.

Related to #7957, #12909.

Compose wrapper was retrying on docker-compose up till timeout in case
it was caused by resources exhaustion, but most of the times it fails it
is caused by unrecoverable errors like mistakes in docker-compose.yml or
Dockerfiles. Fail earlier in all cases except in the ones that can be
recovered with the time.

The motivation to handle the case of networks exhaustion is that we are
planning to support multiple docker compose files, and multiple
scenarios or versions at the same time, this can consume all the
available network ranges when tests for several modules are run in
parallel.
@jsoriano jsoriano requested a review from a team as a code owner July 30, 2019 18:28
@jsoriano jsoriano self-assigned this Jul 30, 2019
@jsoriano jsoriano requested a review from a team July 30, 2019 18:28
@jsoriano
Copy link
Member Author

jenkins, test this again please

@@ -95,7 +98,47 @@ func (d *wrapperDriver) Up(ctx context.Context, opts UpOptions, service string)
args = append(args, service)
}

return d.cmd(ctx, "up", args...).Run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where was the retry before this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have been living too much in the world of #7957 🤦‍♂️

I added a retry there because with the changes for multiple docker compose files and multiple versions the number of scenarios and thus the number of networks increases, what can end up taking all the address pools. On #7957 scenarios are started and destroyed on each test "suite", so this was a recoverable error.

I am going to close this PR by now, till the problem arises again.

}

var recoverableErrors = []string{
`could not find an available, non-overlapping IPv4 address pool`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this that common? I wonder if this should be fatal (something to handle in CI)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens if multiple scenarios are started at the same time, what can happen if we merge the parts of #7957 for multiple docker compose scenarios. I am going to close this till this happen.

@jsoriano
Copy link
Member Author

jsoriano commented Jul 31, 2019

Closing this by now, it would be needed if the parts of #7957 to have multiple docker compose scenarios are merged, waiting till then.

Main motivation to open this PR was to remove the code from #13055, where this shouldn't be needed yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metricbeat Metricbeat module review Team:Integrations Label for the Integrations team :Testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants