AutoML WG and Kubeflow 1.5 release #2106

DnPlas · 2022-01-19T17:26:02Z

@kubeflow/wg-automl-leads let's use this tracking issue to coordinate the integration of AutoML with the Kubeflow 1.5 release.

First off, a heads-up: the feature freeze phase starts on Tuesday, 25th January. Before then, I'd like to update this repo with the manifests from the kubeflow/katib repo so that we can cut the first RC tag here.

So what I'd like to ask as a first step before the feature freeze is:

1. What version of Katib would you like to include for the 1.5 release?
2. Could you provide me with a branch/tag for this version? It doesn't have to be final. The branch/tag can keep receiving fixes throughout the release process, but not new features.
3. Are there any open issues or work in progress that you will be working on for your version while the KF release process is under way?
4. Which K8s versions will kubeflow/katib support?

kimwnasptd · 2022-01-24T16:13:48Z

From the versioning discussion in #2098 (comment), we know we are targeting 0.13. @kubeflow/wg-automl-leads let's use this issue for further updates, new tags, in-progress issues, etc.

DomFleischmann · 2022-02-08T08:59:16Z

Hi @kubeflow/wg-automl-leads, before the manifest testing on Wednesday, Feb 9th, the release team is planning to cut another RC to use for the testing.

Based on a previous communication, the release team will be using AutoML version 0.13rc0. If the AutoML WG has identified any issues since the feature freeze and would like to update the AutoML version before the manifest testing, let us know before Feb 9th. Thank you!

@andreyvelich

kimwnasptd · 2022-02-09T15:41:50Z

After syncing in today's AutoML meeting, we will keep using the 0.13-rc0 tag for RC1 of the manifests. A newer RC might be cut for the kubeflow/katib repo later on, in case more issues arise.

One more note: @kubeflow/wg-automl-leads will update the kubeflow/katib e2e tests to use the v1.5-branch branch of the manifests. This means the e2e tests will use the latest training operators, so we'll keep an eye on any issues that arise.

yhwang · 2022-02-09T21:50:00Z

I deployed Kubeflow from v1.5-branch and ran this example: https://github.com/kubeflow/katib/blob/master/examples/v1beta1/kubeflow-pipelines/kubeflow-e2e-mnist.ipynb
I ran into this issue: kubeflow/katib#1795

I found that the metrics collector is not injected into the trial pods:

mnist-e2e-jxnc28x2-chief-0                                        0/1     Completed
mnist-e2e-jxnc28x2-worker-0                                       0/1     Completed

Does anyone have the same issue? I'm not sure if this is the right place to discuss or report it.

BTW, the early-stopping sample works well, and I do see that the metrics collector container was injected:

median-stop-new2-nxh6jbn7-h7h48                                   0/2     Completed

kimwnasptd · 2022-02-10T09:14:30Z

Thanks for raising this, @yhwang! I also bumped into this when writing the e2e tests.

The fix for this should be to use training.kubeflow.org/job-role: master as the PrimaryPodLabel. Here's how I did it in the codified version of the above notebook:
https://github.com/kubeflow/manifests/pull/2128/files#diff-ba317d8735e3ac6c584fe8dc196fddb304ad5e548b94599c35eeb59bcfa8e89eR159
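For readers who don't want to open the diff, here is a minimal sketch of where the label goes in a Katib Experiment's trial template. It uses plain Python dicts rather than the Katib SDK. Only the label key and value come from this thread; the container name, the replica-type label used in the usage lines, and both helper functions are hypothetical and just for illustration, so treat the linked PR as the actual reference.

```python
# Sketch only: the relevant part of a Katib Experiment trialTemplate for a TFJob trial.
# The label below is the one suggested in this thread; everything else is an assumption.

PRIMARY_POD_LABELS = {"training.kubeflow.org/job-role": "master"}


def build_trial_template(trial_spec: dict) -> dict:
    """Wrap a TFJob spec in a trialTemplate that marks the master pod as primary."""
    return {
        "primaryContainerName": "tensorflow",  # assumed training container name
        "primaryPodLabels": PRIMARY_POD_LABELS,
        "trialParameters": [],  # left empty in this sketch
        "trialSpec": trial_spec,
    }


def is_primary_pod(pod_labels: dict, template: dict) -> bool:
    """Hypothetical helper: True when the pod carries every primary pod label.

    This only illustrates label matching; it is not Katib's own code.
    """
    wanted = template.get("primaryPodLabels", {})
    return all(pod_labels.get(key) == value for key, value in wanted.items())


if __name__ == "__main__":
    template = build_trial_template({"kind": "TFJob"})
    chief_labels = {"training.kubeflow.org/job-role": "master"}
    worker_labels = {"training.kubeflow.org/replica-type": "worker"}  # illustrative
    print(is_primary_pod(chief_labels, template))
    print(is_primary_pod(worker_labels, template))
```

With this template, only a pod carrying the job-role: master label matches, which is the pod the metrics collector should be attached to.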

We also discussed this in this week's AutoML meeting, and we'll expose the full list of annotations/changes users need to keep in mind for the new 1.4 version of the Training Operators.

yhwang · 2022-02-10T18:19:46Z

Thanks @kimwnasptd. I tried training.kubeflow.org/job-role: master and the metrics collector is now injected. However, only the first trial finished, and no subsequent trials were scheduled. The experiment is still in the Running state but makes no further progress. Do you see the same issue?

kimwnasptd · 2022-02-11T10:46:15Z

I haven't bumped into this. In my case, with a KinD 1.20 cluster, all the trials reached the Succeeded state after running the test https://github.com/kubeflow/manifests/blob/master/tests/e2e/runner.sh.

Can you open a separate issue in kubeflow/katib so that we can dig deeper into it?

I'll also start using Prow for the e2e tests with AWS clusters in the manifests repo. I'll give a heads-up if I bump into this.

yhwang · 2022-02-15T17:21:08Z

I forgot to update you on my latest Katib status. The problem seems to have been a TFJob from a previous run that got stuck in a weird state. After I removed that job, Katib works well. Thanks for the script and the hint.

kimwnasptd · 2022-03-01T07:17:15Z

@andreyvelich @johnugeorge @gaocegege I'm working on finalizing the manifests for the release, as we are getting closer to the release date of March 9th.

Regarding the kubeflow/katib repo, when are you planning to cut the final v0.13 tag? Could you do it within this week so that we can get the manifests closer to their final state?

johnugeorge · 2022-03-01T11:42:28Z

@kimwnasptd We will do it this week.

kimwnasptd · 2022-03-04T17:21:16Z

Just saw it's ready. Congrats on the release 🎉

shannonbradshaw · 2022-03-07T23:00:55Z

Hey folks, any docs changes required as a result of this work? Please create an issue and mention it on this tracking issue.
kubeflow/website#3130

DnPlas · 2023-04-25T13:08:39Z

This effort has been finalised.

This was referenced Jan 24, 2022

Notebooks WG and Kubeflow 1.5 release #2109

Closed

KF 1.5 tracking #2112

Closed

kimwnasptd mentioned this issue Jan 26, 2022

Update kubeflow/katib manifests from v0.13.0-rc.0 #2116

Merged

andreyvelich mentioned this issue Feb 15, 2022

Update manifests for Katib v0.13.0-rc.1 release #2139

Merged

1 task

kimwnasptd mentioned this issue Mar 4, 2022

Sync kubeflow katib manifests v0.13.0 #2156

Merged

DnPlas closed this as completed Apr 25, 2023
