Wait for cloud-init on dstack-gateway before attempting any operations #1220

Open
jvstme opened this issue May 14, 2024 · 10 comments

Comments

@jvstme
Collaborator

jvstme commented May 14, 2024

Current

After connecting to dstack-gateway via SSH, dstack-server attempts to update the gateway with update.sh or to configure it by calling the /api/config endpoint. However, the gateway's installation and setup via cloud-init may not have finished by that point. This leads to unclear dstack-server errors such as

Failed to configure gateway 35.202.8.178: ReadError('')

or

Failed to update gateway 35.202.8.178: /bin/sh: 0: cannot open dstack/update.sh: No such file

Proposed

  • After establishing each SSH connection to dstack-gateway, ensure that cloud-init has finished by running
    cloud-init status --wait
  • Check the output of cloud-init status and report an error to the user if cloud-init was not successful
  • Add a timeout for waiting for cloud-init status and report an error to the user if the timeout is reached
  • Remove the retry logic when configuring dstack-gateway or reduce the number of attempts

This should improve the user experience, facilitate troubleshooting, and prevent bugs.
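
The proposed steps could be sketched roughly as follows. This is a minimal illustration, not dstack's actual code: `wait_for_cloud_init`, `CloudInitError`, the `Runner` callable, and the 300-second default are all hypothetical names and values. It assumes a successful `cloud-init status --wait` prints a line containing `status: done`, and it runs the command locally, whereas dstack-server would run it over its SSH connection to the gateway.

```python
import subprocess
from typing import Callable, Sequence


class CloudInitError(Exception):
    """cloud-init failed or did not finish in time."""


# Hypothetical runner signature: (command, timeout in seconds) -> CompletedProcess.
Runner = Callable[[Sequence[str], float], subprocess.CompletedProcess]


def run_local(cmd: Sequence[str], timeout: float) -> subprocess.CompletedProcess:
    """Run the command on the local machine; a real server would run it over SSH."""
    return subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)


def wait_for_cloud_init(run: Runner = run_local, timeout: float = 300.0) -> None:
    """Block until cloud-init finishes; raise CloudInitError on failure or timeout."""
    try:
        result = run(["cloud-init", "status", "--wait"], timeout)
    except subprocess.TimeoutExpired:
        raise CloudInitError(f"cloud-init did not finish within {timeout:.0f}s") from None
    # Assumption: a successful run prints a line containing "status: done".
    if "status: done" not in result.stdout:
        details = (result.stdout + result.stderr).strip()
        raise CloudInitError(f"cloud-init was not successful: {details}")
```

Calling `wait_for_cloud_init()` before running update.sh or calling /api/config would turn the errors quoted above into a single explicit "cloud-init did not finish" or "cloud-init was not successful" message. Passing the runner as a parameter keeps the check independent of how the command is transported.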

@r4victor
Collaborator

After #1236, we give the gateway more than enough time to install and set up. If it takes longer for some reason, we should fix the underlying problem. This issue only addresses the error messages, so I'd classify it as minor.

@peterschmidt85
Contributor

This issue is stale because it has been open for 30 days with no activity.

@peterschmidt85
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

peterschmidt85 closed this as not planned on Jul 1, 2024
jvstme reopened this on Jul 1, 2024
@peterschmidt85
Contributor

This issue is stale because it has been open for 30 days with no activity.


This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

github-actions bot closed this as not planned on Aug 16, 2024
jvstme reopened this on Aug 16, 2024
jvstme added minor and removed stale labels on Aug 16, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label on Sep 16, 2024
@peterschmidt85
Contributor

@jvstme is this issue still valid?

@jvstme
Collaborator Author

jvstme commented Sep 27, 2024

@peterschmidt85, yes


This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label on Oct 28, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

github-actions bot closed this as not planned on Nov 12, 2024