Kubeflow 1.0.2 Katib NAS default example fails due to overwritten container #1365
Comments
@philwinder Can you check which Trial template you use in the UI? I assume you use the template for the mxnet-mnist HP example? You should use If you want to use out-of-the-box NAS, it's better to use at least Kubeflow 1.1. We have an issue to create tags for training container images: #1272.
Hi @andreyvelich. Yes it was the
Yes, that's a valid solution, but this particular cluster is stuck on 1.0.2 for the moment. Thanks again!
Sorry for that. We have added tags to training container images in this PR: #1372 to avoid this problem.
Just for your information, you can update the Katib version of your Kubeflow cluster without deleting other Kubeflow components.
@philwinder I am closing this issue; feel free to re-open it if you have any other questions.
/kind bug
What steps did you take and what happened:
The trial pods fail due to a change in the underlying mnist Dockerfile.
I tried to find an older version of the container here: https://hub.docker.com/r/kubeflowkatib/mxnet-mnist/tags, but only one exists, and it was updated 3 months ago. This is likely why it is failing now.
What did you expect to happen:
The default examples should work out of the box. That's going to be hard to fix now, because the container tag isn't set.
Ideally, in the future, I'd like properly tagged examples that work forever.
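To illustrate the tagging point: an image reference with no explicit tag resolves to `latest`, so the example silently changes whenever the image is rebuilt. Below is a minimal Python sketch, not part of Katib, that flags unpinned image references. The function name `is_pinned` and the tag `v1` are hypothetical and used only for illustration.

```python
def is_pinned(image: str) -> bool:
    """Return True if the image reference has an explicit non-latest tag or a digest."""
    # A digest (name@sha256:...) always pins the exact image content.
    if "@" in image:
        return True
    # Only the last path component can carry a tag; a colon earlier in the
    # string belongs to a registry port (e.g. localhost:5000/image).
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # no tag: the container runtime falls back to "latest"
    tag = last_component.split(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    # "v1" is a hypothetical tag, not one known to exist on Docker Hub.
    for ref in ("kubeflowkatib/mxnet-mnist", "kubeflowkatib/mxnet-mnist:v1"):
        print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```

A check like this could run over the image fields of the example trial templates before a release, so that an untagged image is caught before it breaks older deployments.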
In the meantime, can you suggest the best way of getting a working out-of-the-box NAS example (CPU preferable, for testing), on the 1.0.2 version of Kubeflow?
Thanks,
Phil