Use --bootstrap to wait for remote to become active #1103

Merged (1 commit) on May 12, 2022


@jedevc (Collaborator) commented May 9, 2022

Previously, the `--bootstrap` flag didn't do anything with the remote driver; now it should wait until the remote becomes active, retrying with a small linear backoff.

This should help in scenarios where the remote may not come up instantly because it was started just beforehand, as was the case in the issue that prompted #1094.

@crazy-max (Member) commented:

@jedevc Can we remove this step now, given these changes?

```yaml
-
  name: Check remote buildkitd
  if: matrix.driver == 'remote'
  run: |
    try=0
    max=10
    until [ "$(docker container inspect remote-buildkit --format '{{ .State.Health.Status }}')" = "healthy" ]; do
      if [ $try -gt $max ]; then
        echo >&2 "healthcheck failed after $max trials"
        exit 1
      fi
      sleep $(awk "BEGIN{print (100 + $try * 20) * 0.002}")
      try=$(expr $try + 1)
    done
```

@jedevc force-pushed the remote-driver-bootstrap branch 2 times, most recently from 56d4ce3 to c245f30 (May 9, 2022 12:50)
Review thread on this change to `Info` (excerpt):

```diff
 }
 
 func (d *Driver) Info(ctx context.Context) (*driver.Info, error) {
 	c, err := d.Client(ctx)
 	if err != nil {
-		return nil, errors.Wrapf(driver.ErrNotConnecting, err.Error())
+		return &driver.Info{
```
Review comment (Member):

I think this should still return an error, and we should check for a "network error" on the caller side.

Reply (Collaborator, Author):

I agree about preserving the error so that it can be displayed, but I'm not sure how much to change on the caller side. `Info` gets called before `Boot`, so ignoring network errors in that case would require some fiddly logic changes in the generic code.

I think there are two error scenarios here:

  • Failure to connect to the endpoint providing the driver (can't connect to Kubernetes, can't connect to the Docker daemon, etc.): this kind of failure means it's not worth trying to bootstrap, because something is wrong.
  • Failure to connect to the individual driver node (as in this case): bootstrapping here is fine.

Could we maybe rework the status into a struct with an `Err` field? Then we could express that a node is inactive because of an error. Currently, the error returned from here feels tightly coupled to the node's status, and it is a little inconsistent across the drivers.

@tonistiigi merged commit 062cf29 into docker:master on May 12, 2022
@jedevc deleted the remote-driver-bootstrap branch on September 6, 2023 22:19
3 participants