Fail earlier on compose errors #13110
```diff
@@ -18,11 +18,14 @@
 package compose
 
 import (
+	"bytes"
 	"context"
 	"fmt"
+	"io"
 	"os"
 	"os/exec"
+	"strings"
 	"time"
 
 	"github.com/docker/docker/api/types"
 	"github.com/docker/docker/api/types/filters"
```
```diff
@@ -95,7 +98,47 @@ func (d *wrapperDriver) Up(ctx context.Context, opts UpOptions, service string)
 		args = append(args, service)
 	}
 
-	return d.cmd(ctx, "up", args...).Run()
+	var stdout, stderr bytes.Buffer
+	defer io.Copy(os.Stdout, &stdout)
+	defer io.Copy(os.Stderr, &stderr)
+	for {
+		// Up can fail if we have reached some system limit, especially the
+		// number of networks. Retry while the context is not done.
+		cmd := d.cmd(ctx, "up", args...)
+		cmd.Stdout = &stdout
+		cmd.Stderr = &stderr
+
+		err := cmd.Run()
+		if err == nil {
+			return nil
+		}
+		if err := fatalError(&stderr); err != nil {
+			return errors.Wrapf(err, "docker-compose up failed for service %s", service)
+		}
+
+		select {
+		case <-time.After(time.Second):
+		case <-ctx.Done():
+			return err
+		}
+	}
+}
+
+var recoverableErrors = []string{
+	`could not find an available, non-overlapping IPv4 address pool`,
+}
+
+// fatalError parses the error message and checks if it was caused by an unrecoverable error.
+// It considers errors caused by resource exhaustion recoverable.
+// Parsing the error message is not nice, but it is the only way to classify docker compose errors.
+func fatalError(message *bytes.Buffer) error {
+	data := message.String()
+	for _, errorMsg := range recoverableErrors {
+		if strings.Contains(data, errorMsg) {
+			return nil
+		}
+	}
+	return errors.New(data)
+}
 
 func (d *wrapperDriver) Kill(ctx context.Context, signal string, service string) error {
```

Review thread on the `could not find an available, non-overlapping IPv4 address pool` entry:

**Reviewer:** Is this that common? I wonder if this should be fatal (something to handle in CI).

**Author:** It happens if multiple scenarios are started at the same time, which can happen if we merge the parts of #7957 for multiple docker compose scenarios. I am going to close this until that happens.
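For illustration, here is a minimal test sketch (not part of the PR) exercising `fatalError` as defined in the diff above. The recoverable message is taken verbatim from `recoverableErrors`; the other sample stderr content is made up, not real docker-compose output:

```go
package compose

import (
	"bytes"
	"testing"
)

// Sketch only: assumes it lives in the same package as fatalError.
func TestFatalErrorClassification(t *testing.T) {
	// Output matching an entry in recoverableErrors should return nil,
	// which lets the retry loop in Up try again.
	var recoverable bytes.Buffer
	recoverable.WriteString("ERROR: could not find an available, non-overlapping IPv4 address pool")
	if err := fatalError(&recoverable); err != nil {
		t.Errorf("expected recoverable error to return nil, got %v", err)
	}

	// Any other output is treated as fatal and returned as an error,
	// making Up fail immediately instead of retrying.
	var fatal bytes.Buffer
	fatal.WriteString("ERROR: some unexpected docker-compose failure")
	if err := fatalError(&fatal); err == nil {
		t.Error("expected fatal error to be returned, got nil")
	}
}
```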
Review thread on the retry loop in `Up`:

**Reviewer:** Where was the retry before this change?

**Author:** I think I have been living too much in the world of #7957 🤦‍♂️ I added a retry here because, with the changes for multiple docker compose files and multiple versions, the number of scenarios, and thus the number of networks, increases, which can end up exhausting all the address pools. In #7957, scenarios are started and destroyed in each test "suite", so this was a recoverable error.

I am going to close this PR for now, until the problem arises again.
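Since the loop in `Up` only gives up when the context is done, a caller controls the total retry budget through the context it passes in. A hypothetical usage sketch of that design (the `newWrapperDriver` constructor, the zero-value `UpOptions`, and the service name are placeholders, not taken from this PR):

```go
package main

import (
	"context"
	"log"
	"time"
)

func main() {
	// Bound the total retry budget: Up keeps retrying recoverable failures
	// (such as address pool exhaustion) until the context is done, then
	// returns the last error it saw from `docker-compose up`.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// newWrapperDriver and "elasticsearch" are hypothetical stand-ins for
	// however the compose driver and service are set up in this codebase.
	driver := newWrapperDriver()
	if err := driver.Up(ctx, UpOptions{}, "elasticsearch"); err != nil {
		log.Fatalf("compose up did not succeed within the deadline: %v", err)
	}
}
```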