Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Trial Template API controller implementation #1202

Merged
merged 10 commits into from
Jun 10, 2020

Conversation

andreyvelich
Copy link
Member

This is new API for Trial Template.
See comment: #906 (comment).

In TrialParameters user must define parameters that are replaced in Trial Template.
As @gaocegege proposed, I added Reference to make connection between TrialParameters and Parameters (search space).
Also, in TrialSpec metadata.name and metadata.namespace must be omitted, since we automatically added it in Katib controller.

I added ConfigMap specification to TrialTemplate, if user wants to read Trial Template from configMap.

In the future, we can extend TrialTemplate with custom settings (for example, how to get succeeded state of the Job), to support any Kubernetes CRDs.
In this CRDs we should be able to run TrialJobs, such as Argo, etc (#1081).

/assign @gaocegege @jlewi @johnugeorge @sperlingxx
/cc @czheng94 @terrykong @nielsmeima

@k8s-ci-robot
Copy link

@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: nielsmeima, czheng94, terrykong.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

This is new API for Trial Template.
See comment: #906 (comment).

In TrialParameters user must define parameters that are replaced in Trial Template.
As @gaocegege proposed, I added Reference to make connection between TrialParameters and Parameters (search space).
Also, in TrialSpec metadata.name and metadata.namespace must be omitted, since we automatically added it in Katib controller.

I added ConfigMap specification to TrialTemplate, if user wants to read Trial Template from configMap.

In the future, we can extend TrialTemplate with custom settings (for example, how to get succeeded state of the Job), to support any Kubernetes CRDs.
In this CRDs we should be able to run TrialJobs, such as Argo, etc (#1081).

/assign @gaocegege @jlewi @johnugeorge @sperlingxx
/cc @czheng94 @terrykong @nielsmeima

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubeflow-bot
Copy link

This change is Reviewable

TrialSpec *unstructured.Unstructured `json:"trialSpec,omitempty"`

// Name of config map where Trial template is located
ConfigMapName string `json:"configMapName,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field is already in GoTemplate, should we remove it here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I think when we implement new logic we will not support GoTemplate ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, then can we just remove gotemplate?

Retain bool `json:"retain,omitempty"`
// Retain indicates that Trial resources must be not cleanup
Retain bool `json:"retain,omitempty"`

GoTemplate *GoTemplate `json:"goTemplate,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move it into TrialSpec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case what do you think about the name where Unstructured template will be located?

type TrialSpec struct {
<must be some name>     *unstructured.Unstructured `json:"<name>,omitempty"`

ConfigMapName string `json:"configMapName,omitempty"`
ConfigMapNamespace string `json:"configMapNamespace,omitempty"`
TemplatePath string `json:"templatePath,omitempty"`
}

Copy link
Member

@gaocegege gaocegege Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this (learn from https://github.com/kubernetes/api/blob/master/core/v1/types.go#L51):

type TrialTemplate struct {
	Retain     bool        `json:"retain,omitempty"`

	TrialSource `json:",inline"`

	// List of parameres that are used in Trial template
	TrialParameters []TrialParameterSpec `json:"trialParameters,omitempty"`
}

type TrialSource struct {
	ConfigMap *ConfigMapSource `json:"configMap"`
	TrialSpec *unstructured.Unstructured `json:"trialSpec"`
}

type ConfigMapSource struct {
	// Name of config map where Trial template is located
	ConfigMapName string `json:"configMapName,omitempty"`

	// Namespace of config map where Trial template is located
	ConfigMapNamespace string `json:"configMapNamespace,omitempty"`
	
	// Path in config map where Trial template is located
	TemplatePath string `json:"templatePath,omitempty"`
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaocegege SGTM, it is nice to follow k8s APIs structure way.
What do you think @johnugeorge ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great

@sperlingxx
Copy link
Member

sperlingxx commented Jun 3, 2020

Generally, it looks great to me. Will we deprecate rawTemplate or keep supporting it?

Retain bool `json:"retain,omitempty"`
// Retain indicates that Trial resources must be not cleanup
Retain bool `json:"retain,omitempty"`

GoTemplate *GoTemplate `json:"goTemplate,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supporting Gotemplate also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All current examples in v1beta1 are working with GoTemplate.
You think we should remove it and don't run e2e test until we implement new logic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove and disable test if we decide to remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Lets make all changes to trial template API in a single PR for better tracking . Once new TrialTemplate implementation is added, move all examples to new trial template.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will remove GoTemplate in this PR and disable all e2e tests.

@andreyvelich
Copy link
Member Author

Generally, it looks great to me. Will we deprecate rawTemplate or keep supporting it?

Thanks. I think we can deprecate it, when we implement new logic. Can't find any case when rawTemplate can give us additional functionality.

@jlewi
Copy link
Contributor

jlewi commented Jun 3, 2020

Great to see this coming along. I'll let the Katib experts review this.

@jlewi jlewi removed their assignment Jun 3, 2020
Remove GoTemplate from Experiment API
@andreyvelich
Copy link
Member Author

If we delete GoTemplate API in this PR, v1alpha3 controller image also can't be built.

We need to fix problems in API package, for example: https://github.com/andreyvelich/katib/blob/new-trial-template-api/pkg/apis/controller/experiments/v1beta1/experiment_defaults.go#L57.

So maybe we can delete GoTemplate in the next PR, where we implement new logic for the controller.
In that case, we will not stop Katib development on other issues.

What do you think @gaocegege @johnugeorge ?

@gaocegege
Copy link
Member

If we delete GoTemplate API in this PR, v1alpha3 controller image also can't be built.

I am not sure why. I think v1alpha3 is decoupled with v1beta1.

@andreyvelich
Copy link
Member Author

If we delete GoTemplate API in this PR, v1alpha3 controller image also can't be built.

I am not sure why. I think v1alpha3 is decoupled with v1beta1.

I think the problem is with this: https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/addtoscheme_katib_v1beta1.go#L26-L29.
Since v1beta1 and v1alpha3 in the same package apis, while building image for v1alpha3, it checks that we don't have errors in v1beta1 package: https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/register.go#L23.

So the solution can be comment the lines for v1beta1, from init() function:
https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/addtoscheme_katib_v1beta1.go#L26-L29

Then, I can build image for v1alpha3.

Fix experiment defaults
change valid/invalid experiment
@andreyvelich
Copy link
Member Author

I commited changes for the controller and examples in this PR to avoid problems in testing.
List of changes:

  1. Change all examples in v1beta1 to support new Trial Template
  2. Submit new API changes for Experiment API and Trial API (RunSpec now also unstructured).
  3. Modify deepcopy and experiment_defaults to support new API
  4. Delete old trial template configMap. I left only ConfigMap with trial template which is labeled: app: katib-trial-templates. So users can correct see them in the UI.
  5. In Experiment Controller I think we should createTrialInstance only when Suggestion returns some trials. It can avoid unexpected errors with already created trials.
    // If Trial Assignment is not ready we don't need to create Trials
    if len(trialNames) != 0 {
    logger.Info("Create Trials", "trialNames", trialNames)
    for _, trial := range trials {
    if err = r.createTrialInstance(instance, &trial); err != nil {
    logger.Error(err, "Create trial instance error", "trial", trial)
    continue
    }
  6. Modify all tests for Experiment and Trial controller.
  7. Add new logic in generator.go to support new Trial Template.
  8. Add new test for generator_test.go.
  9. Comment Validator for Trial Template. Need to modify it in the future PRs.

I think it is ready for review.
After this PR, we should pass the test for the new Trial Template. We still need to add validation and improve generator_test.go with more tests, which can be tracked here: #1208.

/cc @gaocegege @johnugeorge

if err = r.createTrialInstance(instance, &trial); err != nil {
logger.Error(err, "Create trial instance error", "trial", trial)
continue
trialNames = append(trialNames, trial.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph can be simplified to

for _, trial := range trials {
    if err = r.createTrialInstance(instance, &trial); err != nil {
        logger.Error(err, "Create trial instance error", "trial", trial)
        continue
    }
}			

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case we try to create trials even if trials==[ ] . Is that correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich If trials == [], the foreach loop will be skipped?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sperlingxx Thanks, I got your point. Will make change.

@andreyvelich andreyvelich changed the title New Trial Template API New Trial Template API controller implementation Jun 8, 2020
Copy link
Member

@gaocegege gaocegege left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks for your contribution! 🎉 👍

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jun 9, 2020
@gaocegege
Copy link
Member

/lgtm

/assign @sperlingxx @johnugeorge

Copy link
Member

@sperlingxx sperlingxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Great contribution!

@johnugeorge
Copy link
Member

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 88da808 into kubeflow:master Jun 10, 2020
@czheng94
Copy link

So excited about this!! 🎉

sperlingxx pushed a commit to sperlingxx/katib that referenced this pull request Jul 9, 2020
* Add new Trial Template API

* Add TrialSource in TrialTemplate
Remove GoTemplate from Experiment API

* Add init logic for new trial template

* Change examples
Implement new Trial Template to controller

* Modify trial template configMap for new version of Trial Template
Fix experiment defaults
change valid/invalid experiment

* Run gofmt

* Fix hyperband

* Remove num-epochs from grid example

* Add tag to file mc example

* Modify create Trials loop
@andreyvelich andreyvelich deleted the new-trial-template-api branch October 6, 2021 00:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants