Use low-level controller and handlers in SetupWithManager #1315
Signed-off-by: Jiaxin Shan <[email protected]>
/cc @kubeflow/wg-training-leads
Thanks for your contribution! 🎉 👍
Signed-off-by: Jiaxin Shan <[email protected]>
We enabled tests for this branch. The SDK tests failed, and the running job takes a long time to finish. Note: the enabled test cases run against the original operator; we have not tested the new operator yet (that should come soon).
/retest
I will fix the test failure today.
Signed-off-by: Jiaxin Shan <[email protected]>
It seems tf-operator was not successfully deployed in the cluster, which caused every TFJob submitted to the cluster to fail.
Hmm... our CI images are actually private, so I have to rebuild the image myself.
This is because we import controller-runtime, which also defines a `kubeconfig` flag. The original build should not pull in the new changes. Let me see if we can exclude those tools when building the original binary.
CI reports a `flag redefined: kubeconfig` error caused by duplicate flag registration. See kubeflow#1316 for more details. Signed-off-by: Jiaxin Shan <[email protected]>
This fixes an issue in the original controller, which relies on init to register defaulting functions in the scheme. In our universal operator project we cleaned up register.go, which broke this case. For more details, please check kubeflow#1317 (comment). Signed-off-by: Jiaxin Shan <[email protected]>
/cc @gaocegege The test failures are fixed. Please have another look.
return nil
}
// using onOwnerCreateFunc is easier to set defaults
if err = c.Watch(&source.Kind{Type: &mxjobv1.MXJob{}}, &handler.EnqueueRequestForObject{},
Any reason we prefer using `Watch` instead of `Owns`?
`Watch` is the low-level function underneath the builder. `Owns` and `For` can be replaced by `Watch` with the appropriate enqueue handlers.
We want to leverage `onOwnerCreateFunc` to set defaults instead of using the Kubebuilder way (a webhook). I noticed `Owns` cannot take a pluggable handler.
If there's a way to do that, let me know.
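To make the trade-off concrete, here is a stdlib-only toy model in Go, not controller-runtime code: `Controller`, `Watch`, `Dispatch`, `enqueueForObject`, `onOwnerCreate`, and the `Replicas` default are hypothetical names chosen for illustration. It shows the idea discussed above: a `Watch` call pairs a caller-chosen enqueue handler with a create predicate, so the predicate can fill in defaults before the job reaches the work queue, whereas a fixed helper such as `Owns` always installs its own handler.

```go
package main

import "fmt"

// Job is a hypothetical stand-in for a job CRD such as MXJob.
type Job struct {
	Name      string
	Namespace string
	Replicas  int // hypothetical field filled in by defaulting
}

// CreateEvent loosely mirrors a controller-runtime create event.
type CreateEvent struct{ Object *Job }

// Predicate filters events; here it may also mutate the object (defaulting).
type Predicate func(CreateEvent) bool

// Handler maps an event to a work-queue key ("namespace/name").
type Handler func(CreateEvent) string

type watch struct {
	handler Handler
	preds   []Predicate
}

// Controller is a toy controller with a plain slice as its work queue.
type Controller struct {
	queue   []string
	watches []watch
}

// Watch pairs a caller-chosen handler with optional predicates.
func (c *Controller) Watch(h Handler, preds ...Predicate) {
	c.watches = append(c.watches, watch{handler: h, preds: preds})
}

// Dispatch delivers a create event to every watch whose predicates all pass.
func (c *Controller) Dispatch(e CreateEvent) {
	for _, w := range c.watches {
		pass := true
		for _, p := range w.preds {
			if !p(e) {
				pass = false
				break
			}
		}
		if pass {
			c.queue = append(c.queue, w.handler(e))
		}
	}
}

// enqueueForObject enqueues the object itself, like handler.EnqueueRequestForObject.
func enqueueForObject(e CreateEvent) string {
	return e.Object.Namespace + "/" + e.Object.Name
}

// onOwnerCreate plays the role of onOwnerCreateFunc: set defaults, then let the event through.
func onOwnerCreate(e CreateEvent) bool {
	if e.Object.Replicas == 0 {
		e.Object.Replicas = 1 // hypothetical default
	}
	return true
}

func main() {
	c := &Controller{}
	c.Watch(enqueueForObject, onOwnerCreate)

	job := &Job{Name: "mnist", Namespace: "default"}
	c.Dispatch(CreateEvent{Object: job})
	fmt.Println(c.queue, job.Replicas) // [default/mnist] 1
}
```

In controller-runtime itself, the corresponding wiring is the `c.Watch(&source.Kind{...}, &handler.EnqueueRequestForObject{}, ...)` call in the diff excerpt above, with the create hook passed as a predicate (for example `predicate.Funcs{CreateFunc: onOwnerCreateFunc()}`); the exact `Watch` signature varies between controller-runtime versions.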
I see. It makes sense.
Generally LGTM!
@@ -51,7 +51,7 @@ func NewServerOption() *ServerOption {

// AddFlags adds flags for a specific CMServer to the specified FlagSet.
func (s *ServerOption) AddFlags(fs *flag.FlagSet) {
fs.StringVar(&s.Kubeconfig, "kubeconfig", "", "The path of kubeconfig file")
//fs.StringVar(&s.Kubeconfig, "kubeconfig", "", "The path of kubeconfig file")
Is it temporary?
Yeah, I cannot find a way to work around it. Users can still use the `KUBECONFIG` environment variable to specify the path of the config file. This is the only user-facing change for tf controller.v1 users.
The next step is to build the universal operator to replace tf controller and run it against all test cases. Once the conformance tests pass, we will remove tf controller.v1.
@@ -448,17 +456,12 @@ func onOwnerCreateFunc() func(event.CreateEvent) bool {
}

// TODO: check default setting
Can we remove the TODO?
I double-checked this comment. Let's keep it for now.
I remember the reason I put a TODO there is that we use `mxjobv1.SetDefaults_MXJob(mxjob)` to set defaults. We could use `scheme.Scheme.Default(mxjob)` instead; the two are effectively equivalent, but I prefer to switch to `scheme.Scheme.Default(mxjob)` in the near future.
Currently we have not registered defaulters in the scheme, so `scheme.Scheme.Default(mxjob)` does nothing there.
The description "check default setting" is not accurate; I should change it to something more descriptive later.
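The difference between calling `SetDefaults_MXJob` directly and calling `scheme.Scheme.Default` can be modelled with a small registry keyed by Go type. This is a hypothetical, stdlib-only sketch (`Scheme`, `MXJob`, `SetDefaultsMXJob`, the `Port` field, and its default value are stand-ins, not the real kubeflow types), assuming the scheme's `Default` behaves like the apimachinery one: it runs a defaulting function only if one was registered for that exact type and is otherwise a silent no-op. The same mechanism explains the missing TF default port mentioned in this PR: registration normally happens in an `init()` function, so cleaning up `register.go` removed it without any error.

```go
package main

import (
	"fmt"
	"reflect"
)

// MXJob is a hypothetical, trimmed-down stand-in for the real API type.
type MXJob struct {
	Name string
	Port int32 // hypothetical field that a defaulter fills in
}

// Scheme models only the defaulting part of a runtime scheme:
// a registry of defaulting funcs keyed by Go type.
type Scheme struct {
	defaulters map[reflect.Type]func(interface{})
}

func NewScheme() *Scheme {
	return &Scheme{defaulters: map[reflect.Type]func(interface{}){}}
}

// AddTypeDefaultingFunc registers fn for the dynamic type of obj.
func (s *Scheme) AddTypeDefaultingFunc(obj interface{}, fn func(interface{})) {
	s.defaulters[reflect.TypeOf(obj)] = fn
}

// Default runs the registered defaulter, or silently does nothing if none exists.
func (s *Scheme) Default(obj interface{}) {
	if fn, ok := s.defaulters[reflect.TypeOf(obj)]; ok {
		fn(obj)
	}
}

// SetDefaultsMXJob is a hypothetical defaulting function, similar in role
// to mxjobv1.SetDefaults_MXJob.
func SetDefaultsMXJob(job *MXJob) {
	if job.Port == 0 {
		job.Port = 8080 // illustrative value, not the real MXJob default
	}
}

func main() {
	s := NewScheme()

	a := &MXJob{Name: "a"}
	s.Default(a)        // nothing registered yet: no-op
	fmt.Println(a.Port) // 0

	// Registration like this usually lives in the API package's init();
	// if it is removed, Default stops applying defaults without any error.
	s.AddTypeDefaultingFunc(&MXJob{}, func(o interface{}) { SetDefaultsMXJob(o.(*MXJob)) })

	b := &MXJob{Name: "b"}
	s.Default(b)
	fmt.Println(b.Port) // 8080
}
```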
/cc @gaocegege
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Jeffwan
Resolve #1299
* Use low-level controller and handlers in SetupWithManager
* Add job validation in reconcile loop
* Set defaults in onOwnerCreateFunc
* Correctly update job status in apiserver
* Remove Flag for Kubeconfig to fix flag redefined
  CI reports a `flag redefined: kubeconfig` error caused by duplicate flag registration. See #1316 for more details.
* Fix tensorflow job missing default port issue
  This fixes an issue in the original controller, which relies on init to register defaulting functions in the scheme. In our universal operator project we cleaned up register.go, which broke this case. For more details, please check #1317 (comment).

Signed-off-by: Jiaxin Shan <[email protected]>
Resolve #1312 #1314 #1316 #1317
Since we use kubeflow/common to observe expectations, the current (kubebuilder v3) approach is hard to make compatible with the kubeflow/common pattern, and the expectation calculation is not accurate. I decided to change back to the kubebuilder v1 pattern and use a low-level controller in SetupWithManager. The rest of the logic and the programming model are still the same as Kubebuilder v3.0, so they remain compatible.
Add validation back to reconciler.
Set defaults in onOwnerCreateFunc
Fix job status update issue.
Make CI work in this dev branch
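For readers unfamiliar with expectations, here is a minimal, hypothetical sketch of the bookkeeping in Go. The method names loosely follow the upstream Kubernetes `ControllerExpectations` (`ExpectCreations`, `CreationObserved`), but this simplified type omits TTLs and deletions and is not the kubeflow/common implementation. Under that assumption it illustrates why the event handlers matter: the count only drops when the controller's own create handler calls `CreationObserved`, so if create events are delivered through a fixed handler that cannot be swapped, the expectations are never satisfied and the calculation drifts.

```go
package main

import (
	"fmt"
	"sync"
)

// Expectations is a simplified, hypothetical model of controller expectations:
// the reconciler records how many creations it expects, event handlers lower
// the count as objects are observed, and the job is acted on again only once
// the count reaches zero.
type Expectations struct {
	mu      sync.Mutex
	pending map[string]int // job key -> outstanding creations
}

func NewExpectations() *Expectations {
	return &Expectations{pending: map[string]int{}}
}

// ExpectCreations is called by the reconciler before it creates n pods.
func (e *Expectations) ExpectCreations(key string, n int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending[key] += n
}

// CreationObserved must be called from the controller's own create-event handler.
func (e *Expectations) CreationObserved(key string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.pending[key] > 0 {
		e.pending[key]--
	}
}

// Satisfied reports whether every expected creation has been observed.
func (e *Expectations) Satisfied(key string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.pending[key] == 0
}

func main() {
	exp := NewExpectations()
	exp.ExpectCreations("default/mnist", 2)

	// If create events never reach a handler that calls CreationObserved,
	// the count stays at 2 and the job never looks satisfied.
	fmt.Println(exp.Satisfied("default/mnist")) // false

	// With a pluggable handler wired via Watch, each observed pod lowers it.
	exp.CreationObserved("default/mnist")
	exp.CreationObserved("default/mnist")
	fmt.Println(exp.Satisfied("default/mnist")) // true
}
```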