Fail earlier on compose errors #13110
The compose wrapper retried docker-compose up until the timeout in case the failure was caused by resource exhaustion, but most failures are caused by unrecoverable errors such as mistakes in docker-compose.yml or Dockerfiles. Fail earlier in all cases except those that can recover over time. The motivation for handling network exhaustion is that we are planning to support multiple docker compose files, and multiple scenarios or versions at the same time; this can consume all the available network ranges when tests for several modules are run in parallel.
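To illustrate the behavior described above, here is a minimal sketch of a retry loop that gives up immediately on unrecoverable errors and retries only when the output matches a known recoverable message. It is not the code from this PR: upWithRetry, isRecoverable, errUpFailed, and the maxAttempts bound are hypothetical names chosen for illustration (the wrapper itself retried until a timeout rather than counting attempts). Only the recoverableErrors entry is taken from the diff.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// recoverableErrors holds substrings of docker-compose output that point to a
// condition which may clear up with time. The single entry is the message
// quoted in the diff; the rest of this file is an illustrative assumption.
var recoverableErrors = []string{
	"could not find an available, non-overlapping IPv4 address pool",
}

// errUpFailed is a hypothetical sentinel standing in for a failed
// `docker-compose up` invocation.
var errUpFailed = errors.New("docker-compose up failed")

// isRecoverable reports whether the command output contains one of the
// known recoverable error messages.
func isRecoverable(output string) bool {
	for _, msg := range recoverableErrors {
		if strings.Contains(output, msg) {
			return true
		}
	}
	return false
}

// upWithRetry calls up until it succeeds, fails with an unrecoverable error,
// or maxAttempts is reached. up returns the command output and an error.
func upWithRetry(up func() (string, error), maxAttempts int, wait time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		output, err := up()
		if err == nil {
			return nil
		}
		if !isRecoverable(output) {
			// Fail early: waiting will not fix a broken compose file or Dockerfile.
			return fmt.Errorf("unrecoverable error on attempt %d: %w", attempt, err)
		}
		lastErr = err
		if attempt < maxAttempts {
			time.Sleep(wait)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}
```

A caller would pass a closure that runs the command and returns its output, for example upWithRetry(runUp, 10, 5*time.Second), where runUp is whatever function executes docker-compose up. Matching on output substrings is brittle if the message wording changes between Docker versions, so the list would need to be kept in sync.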
jenkins, test this again please
@@ -95,7 +98,47 @@ func (d *wrapperDriver) Up(ctx context.Context, opts UpOptions, service string)
		args = append(args, service)
	}

	return d.cmd(ctx, "up", args...).Run()
where was the retry before this change?
I think I have been living too much in the world of #7957 🤦♂️
I added a retry there because, with the changes for multiple docker compose files and multiple versions, the number of scenarios and thus the number of networks increases, which can end up exhausting all the address pools. In #7957, scenarios are started and destroyed on each test "suite", so this was a recoverable error.
I am going to close this PR for now, until the problem arises again.
}

var recoverableErrors = []string{
	`could not find an available, non-overlapping IPv4 address pool`,
Is this that common? I wonder if this should be fatal (something to handle in CI).
It happens if multiple scenarios are started at the same time, which can happen if we merge the parts of #7957 for multiple docker compose scenarios. I am going to close this until that happens.
Related to #7957, #12909.