Skip to content
This repository has been archived by the owner on Sep 12, 2023. It is now read-only.

feat(common) add reconciler.v1 #141

Merged
merged 1 commit into from
Aug 19, 2021
Merged

Conversation

zw0610
Copy link
Member

@zw0610 zw0610 commented Jul 10, 2021

#140

The common repository offers a controller.v1 package for developers to make kubeflow operators. The controller mode exposes low-level APIs for operators needs high customization requests, which are more about the controller mechanism, such like pre-processing/post-processing for enqueue/dequeue actions.

However, for more entry-level developers or users who may not that familiar with controller, exposing to many low-level APIs could be confusing. Meanwhile, with effort from controller-runtime, kubebuilder and operator-sdk, the reconciler mode is offering a high-level, controller-mechanism-decoupled APIs which are only related to how to meet the resources expectation defined in the declarative API like TFJob, MXNetJob, etc.

To add the reconciler.v1 package as long as to keep code re-use between the controller.v1 and reconciler.v1 packages, core package is extracted to be shared between reconciler.v1 and controller.v1.

The reconciler.v1 defines its API in reconciler.v1/common/interface.go which methods only related to reconciliation exposed.

A base/parent implementation named KubeflowReconciler is defined so developers only need to override methods in reconciler.v1/common/must_customize.go.

  • Add reconciler.v1 package (controller.v1 is also modified to extract shared functionality)
  • Add API docs in terms of comment
  • Add unit-test cases for reconciler.v1

@Jeffwan
Copy link
Member

Jeffwan commented Jul 10, 2021

Let's try to minimize the code changes. If pod.go services.go are reusable, then let's try not to replica the changes

@Jeffwan
Copy link
Member

Jeffwan commented Jul 11, 2021

/hold

@zw0610
Copy link
Member Author

zw0610 commented Jul 12, 2021

Let's try to minimize the code changes. If pod.go services.go are reusable, then let's try not to replica the changes

sure. I shall finish the gang part and try to maximize code re-use in the next commit.

@zw0610
Copy link
Member Author

zw0610 commented Jul 30, 2021

@Jeffwan While I'm still working on adding comment and unit-tests for reconciler.v1 package, you may take a look at this pull request and let me know any concern to the changes especially to the existing controller.v1 package.

Let me know if I need to rebase it first.


// JobInterface defines the abstract interface for Pod related actions, such like get, create or delete TFJob,
// PyTorchJob or KFJob
type JobInterface interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it brings more interfaces. I notice this change is very large and includes lots of refactoring. Maybe it's better to add more details in the issue for training leads to review?

It would be great to cover following topics

  1. What's the advantage of using reconciler.v1 over controller.v1?
  2. What's the user facing changes? for example, if user wants to build an operator, how this solution benefit their development?

/cc @kubeflow/wg-training-leads

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For questions here, I updated #141 (comment) to elaborate the benefit of reconciler.v1.

But from my perspective, I would keep both (reconciler and controller) for developers/users at different level.

@google-oss-robot google-oss-robot requested a review from a team August 2, 2021 03:25
@Jeffwan
Copy link
Member

Jeffwan commented Aug 4, 2021

/cc @kubeflow/wg-training-leads Please help review the proposal.

@johnugeorge
Copy link
Member

In general, do we need to low level api code in longer run? Can we discuss in next meeting before merge?

@Jeffwan
Copy link
Member

Jeffwan commented Aug 6, 2021

@johnugeorge Sure, this is a big change requires user (operator) facing changes. Let's have an agreement in the meeting first.

@Jeffwan
Copy link
Member

Jeffwan commented Aug 6, 2021

/hold

@zw0610 zw0610 changed the title WIP: feat(common) add reconciler.v1 feat(common) add reconciler.v1 Aug 7, 2021
@zw0610
Copy link
Member Author

zw0610 commented Aug 7, 2021

In general, do we need to low level api code in longer run? Can we discuss in next meeting before merge?

I think the Reconciler struct exposes limited APIs that are sufficient for most operators development. While there might always be some customization supported only in Controller mode, especially for users with private deploy, these two mode should be

The concern on keeping both reconciler.v1 and controller.v1, I believe, is about the effort to maintain features and bug-fix for both package. I would suggest the following steps:

  1. add reconciler.v1 package to kubeflow/common repository
  2. convert controllers in kubeflow/tf-operator based on kubeflow/common reconciler.v1
  3. deprecate controller.v1

Because the scheduling for the community meeting next week (August 11/12th) is not so friendly to my local time, I can attend if the discussion is quite critical. Otherwise, could we discuss on August 25th?

@johnugeorge Sure. Let's discuss this on the community meeting on August 11th.

@gaocegege
Copy link
Member

/cc @kubeflow/wg-automl-leads

go.mod Outdated
@@ -23,4 +26,6 @@ replace (
k8s.io/apimachinery => k8s.io/apimachinery v0.19.9
k8s.io/client-go => k8s.io/client-go v0.19.9
k8s.io/code-generator => k8s.io/code-generator v0.19.9
sigs.k8s.io/controller-runtime => sigs.k8s.io/controller-runtime v0.7.2
github.com/prometheus/client_golang v1.11.0 => github.com/prometheus/client_golang v1.7.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this replace or we can remove it since we are using v0.21.3 version of K8s packages ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to keep go packages in common and tf-operator consistent. We can discuss this dependency issue on the next community meeting (August 11th).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are using v0.21.3, shall I get rid of other replace items like k8s.io/client-go => k8s.io/client-go v0.19.9?

@zw0610 zw0610 force-pushed the reconciler branch 3 times, most recently from 3b28ba7 to 7551d6b Compare August 13, 2021 01:39
@gaocegege
Copy link
Member

/cc @alculquicondor

@alculquicondor
Copy link
Contributor

alculquicondor commented Aug 13, 2021

This is a long PR. Would it be worth splitting the go.mod upgrades in its own commit or PR?

Any specific parts were you would like my input, given that I'm still unfamiliar with this project?

}

// BareKubeflowJobReconciler returns the pointer of a KubeflowJobReconciler with minimal implementation
func BareKubeflowJobReconciler(client client.Client) *KubeflowJobReconciler {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use term like "Base"

Copy link
Member Author

@zw0610 zw0610 Aug 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the Bare means only method interfaces independently defined in JobInterface is implemented in the returned KubeflowJobReconciler pointer, I think the Base conveys the same message.

pkg/reconciler.v1/common/job_must_customize.go Outdated Show resolved Hide resolved
pkg/reconciler.v1/common/job_must_customize.go Outdated Show resolved Hide resolved
@Jeffwan
Copy link
Member

Jeffwan commented Aug 17, 2021

The scope and codes overall looks good to me. Other leads, please leave comments.

/cc @kubeflow/wg-training-leads

@google-oss-robot google-oss-robot requested a review from a team August 17, 2021 04:09
go.mod Outdated Show resolved Hide resolved
@zw0610 zw0610 force-pushed the reconciler branch 2 times, most recently from a2a0424 to 0daf75e Compare August 17, 2021 13:31
@Jeffwan
Copy link
Member

Jeffwan commented Aug 18, 2021

Overall looks good to me. I think this is safe to merge and it doesn't introduce user facing low level JobController. (just some reorganization to extract common codes).

If we plan to support a CustomJob which can support arbitrary roles. reconciler.v1 would be a good start. For existing frameworks, we can make some plans later. This is not a blocking issue to 1.4 release and beta release of all-in-one operator.

We can probably introduce this to v2 apis & controllers.

@kubeflow/wg-training-leads Please have a double check. Let's hold it for two days.

@johnugeorge
Copy link
Member

I agree with @Jeffwan For existing frameworks this is not necessary at this point. We can plan for it at a later point

@Jeffwan
Copy link
Member

Jeffwan commented Aug 19, 2021

/lgtm
/approve

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Jeffwan
Copy link
Member

Jeffwan commented Aug 19, 2021

/hold cancel

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants