[Discussion] Including a PV for MySQL in the Katib standalone manifests #1479
Comments
One of your users here. I think you don't have any problem with volume provisioners. I think what's causing all the issues you linked is this overzealous liveness probe (katib/manifests/v1beta1/components/mysql/mysql.yaml, lines 49 to 57 at f83df1b).

In environments that have slightly slower storage, the mysql container gets killed in its initialisation phase and never recovers from that. I would suggest increasing the probe's initial delay and timeout; see [1], [2], and the sketch at the end of this comment. Also, I can confirm from personal experience that the empty root password issue (#1212) with the mysql container is also caused by the container being killed before it has gotten rid of the temporary initialisation database.

Regarding providing an "easy and stable" way for your users to deploy Katib: you should seriously reconsider supplying a static hostPath PV under any circumstances. As I understand it, the mysql database does not store ephemeral data in this case. Supplying a static hostPath PV creates a bunch of problems, for example the data is tied to whichever node runs the pod and is left behind or lost when the pod is rescheduled elsewhere.
So unless I am mistaken here and the database indeed stores only ephemeral data, supplying unsuspecting users with a static hostPath PV for non-ephemeral data seems like a recipe for disaster.
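A minimal sketch of what relaxing the probe could look like, assuming an exec-style `mysqladmin ping` check like the one in mysql.yaml; the command and numbers below are illustrative, not the values currently in the repo:

```yaml
# Illustrative livenessProbe for the MySQL container with relaxed timing.
# The exact command and thresholds in Katib's mysql.yaml may differ.
livenessProbe:
  exec:
    command:
      - /bin/bash
      - -c
      - mysqladmin ping -u root -p"${MYSQL_ROOT_PASSWORD}"
  initialDelaySeconds: 30   # give slow storage time to finish initialisation
  periodSeconds: 10
  timeoutSeconds: 5         # the Kubernetes default of 1s is easily exceeded on slow disks
  failureThreshold: 10      # tolerate a slow start instead of killing the pod
```

On newer clusters a startupProbe could also absorb the slow first start without loosening the steady-state liveness check, though that option was not raised in this thread.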
Thank you so much for sharing your thoughts @heilerich!
What is the value we should specify for the liveness probe's initialDelaySeconds?

Do we have any other PV type that satisfies any Kubernetes environment, whether it is GCP, AWS, on-prem, etc.? In general, I agree with you @heilerich. Setting up the proper PV is part of the Kubernetes infrastructure environment, not Katib's responsibility. We can provide the manifest for the MySQL PV and not include it in the Kustomize install. In the documentation, we have to explain that the user's Kubernetes cluster should have a provisioner, or that the user can deploy the PV manifest themselves. What do you think @gaocegege @johnugeorge @yanniszark?
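A minimal sketch of what that opt-in could look like from the user's side, assuming the PV manifest is shipped but not referenced by the default kustomization; the paths and file names here are illustrative, not the actual repo layout:

```yaml
# Hypothetical user overlay: pull in the standalone install and, only on
# clusters without a dynamic provisioner, additionally apply the PV manifest.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - github.com/kubeflow/katib/manifests/v1beta1/installs/katib-standalone?ref=master
  - mysql-pv.yaml   # optional: a local copy of the provided PV manifest
```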
My preferred solution would be to just ship the PVC. I don't think you should be including any PV in the default installation manifests. It is common practice to just include the PVC and rely on the dynamic provisioning process to handle the rest. Dynamic provisioning is available from all the big cloud providers and comes with many on-premise distributions nowadays. In fact, I have not encountered a PV that was bundled by default in any software that I deployed on Kubernetes for a long time (of course excluding situations where it is explicitly necessary, like installing drivers through a DaemonSet or physically tying hardware configuration to a node). To me, including a PV with a specific storage flavour feels like actively circumventing the storage considerations that the administrators made for their environment. Which is fine if it is an optional setting, but might be dangerous if it is the default and can be overlooked when deploying Katib.
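For reference, a minimal sketch of a PVC that leans entirely on dynamic provisioning; the name, namespace, and size are illustrative rather than taken from the Katib manifests:

```yaml
# Only the claim is shipped; the cluster's default StorageClass provisions
# the backing volume, so no PV object appears in the install manifests.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: katib-mysql
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  # storageClassName is intentionally omitted so the default StorageClass is used
```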
Makes sense, I agree with that. As you said, let's just include the PVC in the default manifests.
I agree. We should probably also reconsider our default configuration for resuming Experiments from a volume: https://www.kubeflow.org/docs/components/katib/resume-experiment/#resume-policy-fromvolume. It would be great if we could discuss this issue in the next AutoML WG meeting.
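For context, the setting in question is the Experiment's resumePolicy. A fragment of what an Experiment using it looks like is sketched below; everything except the resumePolicy field is illustrative, and the required objective, algorithm, parameters, and trialTemplate sections are omitted:

```yaml
# Experiment fragment: with resumePolicy set to FromVolume, Katib attaches a
# PVC to the suggestion deployment so the experiment can be resumed later,
# which is why the default volume/provisioner setup matters for this feature too.
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: resumable-experiment   # illustrative name
  namespace: kubeflow
spec:
  resumePolicy: FromVolume
```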
According to Kubernetes concepts, [...]

There is no [...]. By the way, the sole requirement I stumbled upon when running MySQL on a PV is setting the PVC's [...]
/kind feature
See discussion: #1464 (comment).
Currently, we exclude the PV for MySQL in the katib-with-kubeflow installation and include the PV in the katib-standalone installation. Our users have faced many problems when using a custom volume provisioner: #1415, #1156, #1212.
We should discuss what the easiest and most stable way is for users to install Katib.
/cc @anneum @hjkkhj123 @Utkagr
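For reference, the kind of static hostPath PV being discussed looks roughly like the sketch below; the names, size, and path are illustrative, not necessarily the exact katib-standalone manifest:

```yaml
# Illustrative static hostPath PV of the kind bundled with the standalone
# install: the data lives in a directory on whichever node schedules the
# MySQL pod, which is what this discussion is about.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: katib-mysql
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  hostPath:
    path: /tmp/katib/mysql   # illustrative path
```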