
Refactor Kustomize manifests for Katib #1464

Merged · 18 commits · Mar 12, 2021

Conversation

andreyvelich
Member

I refactored the Kustomize manifests for Katib.
I followed this approach:

  1. All Katib manifests are located in /components.
  2. All installs with kustomization files are located in /installs.

Users can select the required install kustomization file, e.g. Katib standalone, Kubeflow, IBM, etc. (see the sketch below).
Users are also able to use pure kubectl to deploy Katib; we can extend the deploy.sh script for that later.
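
For illustration, an install kustomization might compose the shared components roughly like this (a minimal sketch; the component directory names below are assumptions, not the exact layout in this PR):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
# Hypothetical component paths, relative to an install such as installs/katib-standalone:
- ../../components/crd
- ../../components/controller
- ../../components/db-manager
- ../../components/mysql
- ../../components/ui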

We can keep the latest tag for the Katib standalone installation and add the Katib 0.11 tag for the Kubeflow Katib installation.

For the Katib config, I added patchesStrategicMerge to the kustomization file. I am not sure whether it is possible to modify the image tags in another way.

For the Kubeflow installation I added:

commonLabels:
  app.kubernetes.io/component: katib

I left only one application with this label for the Katib component.

Please take a look @yanniszark.

I am still updating some docs in this PR.

/cc @gaocegege @johnugeorge

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich changed the title from "[WIP] Refactor Kustomize manifests for Katib" to "Refactor Kustomize manifests for Katib" on Mar 10, 2021
@andreyvelich
Member Author

I updated the docs in the Katib repo.
/hold for the reviews

@andreyvelich
Member Author

andreyvelich commented Mar 10, 2021

@DomFleischmann @knkski @RFMVasconcelos I disabled GitHub Actions in this PR, since we should fix #1453 and add the cert-generator to the manifests.
If you check the controller logs (https://github.com/kubeflow/katib/pull/1464/checks?check_run_id=2080590497), the manager fails because the /tmp/cert/tls.crt file doesn't exist.

@andreyvelich
Member Author

I also updated tags for the Trial images to v1beta1-c6c9172 for these changes: #1457.

@yanniszark
Contributor

@andreyvelich to explore the diff, I built:

  • manifests/v1beta1/installs/katib-standalone on master
  • manifests/v1beta1/installs/katib-with-kubeflow on this branch

I'm uploading the old and new YAML for completeness:
old.yaml.log
new.yaml.log

The main differences I saw are:

  • The new YAML has the webhook configurations. Nice!
  • The new YAML has the cert-generator script. Also nice; it would be good if we could somehow make it optional, but we can discuss this later.
  • The new YAML creates the following PersistentVolume:
apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    app.kubernetes.io/component: katib
    type: local
  name: katib-mysql
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 10Gi
  hostPath:
    path: /tmp/katib
  storageClassName: katib

I don't think it's right to include a PersistentVolume, as it ties Katib installation to a particular type of storage (hostPath). It also presumes the existence of the Katib storageclass. By including just a PersistentVolumeClaim, we make storage pluggable. It's standard to include a PVC (not a PV) in other manifests as well. What do you think?
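
For reference, a PVC-only sketch along those lines could look like the following (the claim name, label, and 10Gi size mirror the PV above; everything else is an assumption):

# No PV and no custom StorageClass: the cluster's default StorageClass
# provisions the volume for this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: katib-mysql
  labels:
    app.kubernetes.io/component: katib
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi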

@andreyvelich
Member Author

Thank you for the review @yanniszark!

The new YAML has the cert-generator script. Also nice; it would be good if we could somehow make it optional, but we can discuss this later.

I think we can make it optional in the next release (~ Katib 0.11.1 or Katib 0.12). We should have a discussion about how to do it (a cert-manager overlay, for instance; see the sketch below).
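
As a rough illustration of that idea (the resource names and the webhook DNS name below are assumptions, not a final design), a cert-manager overlay could look something like:

# Hypothetical overlay: a self-signed Issuer plus a Certificate for the
# webhook, instead of running the cert-generator Job.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: katib-selfsigned-issuer
  namespace: kubeflow
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: katib-webhook-cert
  namespace: kubeflow
spec:
  secretName: katib-webhook-cert
  dnsNames:
  - katib-controller.kubeflow.svc
  issuerRef:
    name: katib-selfsigned-issuer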

I don't think it's right to include a PersistentVolume, as it ties Katib installation to a particular type of storage (hostPath). It also presumes the existence of the Katib storageclass. By including just a PersistentVolumeClaim, we make storage pluggable. It's standard to include a PVC (not a PV) in other manifests as well. What do you think?

We can exclude this PV from the Kubeflow installation.
Before the manifest migration, the PV was included in the Katib installation but excluded from the Kubeflow installation.
We have seen a lot of problems when users use their own StorageClass to provision the volume (#1415, #1156, #1212, etc.).
That is why we included a PV with hostPath in the Katib installation by default.

If users know that the cluster StorageClass can properly provision volumes, they can always modify the kustomization file (see the sketch below for one hypothetical way to do that).
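
For example, a hypothetical kustomization-level override (assuming the PVC pins storageClassName to katib, and using a placeholder class name) would drop the PV from resources and patch the PVC:

# Hypothetical user-side override: keep everything except the hostPath PV
# and point the PVC at the cluster's own StorageClass.
patchesStrategicMerge:
- |-
  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: katib-mysql
    namespace: kubeflow
  spec:
    storageClassName: standard  # placeholder for the cluster's StorageClass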

WDYT @yanniszark? cc @gaocegege @johnugeorge

@yanniszark
Contributor

We have seen a lot of problems when users use their own StorageClass to provision the volume (#1415, #1156, #1212, etc.).
That is why we included a PV with hostPath in the Katib installation by default.

Thanks for including the related issues for context! I agree that it's good to try to make users' lives easier. Since creating the PV is essentially an environment setup step (and may have security implications because it uses a hostPath volume, thus having side effects on the host machine), I would include it in a separate manifest and add an optional step to install it.

For this PR, excluding it in the Kubeflow overlay seems ok. I'd like to open the discussion in the future if you agree!

@andreyvelich
Member Author

Thanks for including the related issues for context! I agree that it's good to try to make users' lives easier. Since creating the PV is essentially an environment setup step (and may have security implications because it uses a hostPath volume, thus having side effects on the host machine), I would include it in a separate manifest and add an optional step to install it.

For this PR, excluding it in the Kubeflow overlay seems ok. I'd like to open the discussion in the future if you agree!

Sure, that makes sense. Let's keep it for the Kubeflow 1.3 release as it was before (Katib installation: PV included; Kubeflow installation: PV excluded). I will create an issue and we can discuss how we want to manage it in a future release.

@andreyvelich
Member Author

@yanniszark Also what do you think about this patch:

patchesStrategicMerge:
- |-
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: katib-config
    namespace: kubeflow
  data:
    metrics-collector-sidecar: |-
      {
        "StdOut": {
          "image": "docker.io/kubeflowkatib/file-metrics-collector:v1beta1-c6c9172"
        },
        "File": {
          "image": "docker.io/kubeflowkatib/file-metrics-collector:v1beta1-c6c9172"
        },
        "TensorFlowEvent": {
          "image": "docker.io/kubeflowkatib/tfevent-metrics-collector:v1beta1-c6c9172",
          "resources": {
            "limits": {
              "memory": "1Gi"
            }
          }
        }
      }
    suggestion: |-
      {
        "random": {
          "image": "docker.io/kubeflowkatib/suggestion-hyperopt:v1beta1-c6c9172"
        },
        "grid": {
          "image": "docker.io/kubeflowkatib/suggestion-chocolate:v1beta1-c6c9172"
        },
        "hyperband": {
          "image": "docker.io/kubeflowkatib/suggestion-hyperband:v1beta1-c6c9172"
        },
        "bayesianoptimization": {
          "image": "docker.io/kubeflowkatib/suggestion-skopt:v1beta1-c6c9172"
        },
        "tpe": {
          "image": "docker.io/kubeflowkatib/suggestion-hyperopt:v1beta1-c6c9172"
        },
        "enas": {
          "image": "docker.io/kubeflowkatib/suggestion-enas:v1beta1-c6c9172",
          "resources": {
            "limits": {
              "memory": "200Mi"
            }
          }
        },
        "cmaes": {
          "image": "docker.io/kubeflowkatib/suggestion-goptuna:v1beta1-c6c9172"
        },
        "darts": {
          "image": "docker.io/kubeflowkatib/suggestion-darts:v1beta1-c6c9172"
        }
      }
    early-stopping: |-
      {
        "medianstop": {
          "image": "docker.io/kubeflowkatib/earlystopping-medianstop:v1beta1-c6c9172"
        }
      }

Does that sound good to you?

@yanniszark
Contributor

Does that sound good to you?

Yes, unfortunately you need to patch the whole ConfigMap :/
You can also use the configMapGenerator with behavior: merge:
https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/configmapgenerator/
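
For reference, a minimal sketch of that alternative (the patch file path and key are illustrative) would be something like:

# Merge extra keys into the existing katib-config ConfigMap instead of
# patching the whole thing.
configMapGenerator:
- name: katib-config
  namespace: kubeflow
  behavior: merge
  files:
  - metrics-collector-sidecar=patches/metrics-collector-sidecar.json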

@andreyvelich
Member Author

andreyvelich commented Mar 11, 2021

I believe configMapGenerator will add a hash to the name when it generates the resource, which doesn't work for us currently.
Let's continue with the patch for now.

@yanniszark If you are fine with the other changes, could you please give your lgtm so we can move forward.

@andreyvelich
Member Author

/retest

@yanniszark
Contributor

/lgtm

@andreyvelich
Member Author

/retest

@andreyvelich
Member Author

@yanniszark I was actually thinking about the patch some more.
Since we are trying to follow the release process you mentioned in #1466, we don't need a specific image tag in the katib-with-kubeflow install.

Thus, we can re-use the patch for the Katib config and simplify the kustomization files.

@google-oss-robot

New changes are detected. LGTM label has been removed.

@andreyvelich
Member Author

/retest
