Fail earlier on compose errors #13110

jsoriano · 2019-07-30T18:28:10Z

The compose wrapper was retrying `docker-compose up` until the timeout, on
the assumption that failures were caused by resource exhaustion. In practice,
most failures are caused by unrecoverable errors such as mistakes in
docker-compose.yml or Dockerfiles. Fail earlier in all cases except those
that can recover over time.

The motivation for handling network exhaustion is that we are planning to
support multiple docker compose files, and multiple scenarios or versions
at the same time. This can consume all the available network pools when
tests for several modules are run in parallel.

Related to #7957, #12909.


jsoriano · 2019-07-30T23:07:12Z

jenkins, test this again please

exekias · 2019-07-31T07:44:10Z

libbeat/tests/compose/wrapper.go

@@ -95,7 +98,47 @@ func (d *wrapperDriver) Up(ctx context.Context, opts UpOptions, service string)
 		args = append(args, service)
 	}

-	return d.cmd(ctx, "up", args...).Run()


where was the retry before this change?

I think I have been living too much in the world of #7957 🤦‍♂️

I added a retry there because, with the changes for multiple docker compose files and multiple versions, the number of scenarios and thus the number of networks increases, which can end up taking all the address pools. In #7957, scenarios are started and destroyed on each test "suite", so this was a recoverable error.

I am going to close this PR for now, until the problem arises again.

exekias · 2019-07-31T07:44:46Z

libbeat/tests/compose/wrapper.go

+}
+
+var recoverableErrors = []string{
+	`could not find an available, non-overlapping IPv4 address pool`,


is this that common? I wonder if this should be fatal (something to handle in CI)

It happens if multiple scenarios are started at the same time, which can happen if we merge the parts of #7957 for multiple docker compose scenarios. I am going to close this until that happens.

jsoriano · 2019-07-31T08:41:16Z

Closing this for now. It would be needed if the parts of #7957 that add multiple docker compose scenarios are merged, so waiting until then.

The main motivation for opening this PR was to remove the code from #13055, where this shouldn't be needed yet.

jsoriano added the module, review, Metricbeat, :Testing, [zube]: In Review, and Team:Integrations labels Jul 30, 2019

jsoriano requested a review from a team as a code owner July 30, 2019 18:28

jsoriano self-assigned this Jul 30, 2019

jsoriano requested a review from a team July 30, 2019 18:28

exekias reviewed Jul 31, 2019

jsoriano closed this Jul 31, 2019

zube bot added [zube]: Done and removed [zube]: In Review labels Jul 31, 2019

andresrc removed the [zube]: Done label Aug 5, 2019
