From a5321ea0ca8e2d96ebdeb78cedc36e0e00ec8b4a Mon Sep 17 00:00:00 2001
From: avelichk
Date: Wed, 28 Oct 2020 19:39:56 +0000
Subject: [PATCH 1/5] Add istio sidecar annotation inforation

---
 .../en/docs/components/katib/experiment.md | 20 ++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index 35a9210a75..7742f291b6 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -716,9 +716,23 @@ to launch an experiment from the command line:
 kubectl apply -f
 ```
 
-**Note:** If you deployed Katib as part of Kubeflow (your Kubeflow deployment
-should include Katib), you need to change Kubeflow namespace to your
-profile namespace. Run the following command to launch an experiment
+**Note:**
+
+- If you deployed Katib as part of Kubeflow (your Kubeflow deployment
+  should include Katib), you need to change Kubeflow namespace to your
+  profile namespace.
+
+- (Optional) Katib's experiments don't work with
+  [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection).
+  If you install Kubeflow using
+  [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/),
+  you have to disable sidecar injection. To do that, specify this annotation:
+  `sidecar.istio.io/inject: "false"` in your experiment's trial template.
+  For examples on how to do it for `Job`, `TFJob` (TensorFlow) or
+  `PyTorchJob` (PyTorch), refer to the
+  [getting-started guide](http://localhost:1313/docs/components/hyperparameter-tuning/hyperparameter/#examples).
+
+Run the following command to launch an experiment
 using the random algorithm example:
 
 ```shell

From daa3c282d4c2080269da30318d7bf78d0055503f Mon Sep 17 00:00:00 2001
From: avelichk
Date: Tue, 3 Nov 2020 00:09:38 +0000
Subject: [PATCH 2/5] Address review comments

---
 content/en/docs/components/katib/experiment.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md
index 7742f291b6..f55e5dc6ac 100644
--- a/content/en/docs/components/katib/experiment.md
+++ b/content/en/docs/components/katib/experiment.md
@@ -727,8 +727,8 @@ kubectl apply -f
   If you install Kubeflow using
   [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/),
   you have to disable sidecar injection. To do that, specify this annotation:
-  `sidecar.istio.io/inject: "false"` in your experiment's trial template.
-  For examples on how to do it for `Job`, `TFJob` (TensorFlow) or
+  `sidecar.istio.io/inject: "false"` in your experiment's trial template. For
+  examples on how to do it for `Job`, `TFJob` (TensorFlow) or
   `PyTorchJob` (PyTorch), refer to the
   [getting-started guide](http://localhost:1313/docs/components/hyperparameter-tuning/hyperparameter/#examples).
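
Reviewer illustration (not part of the patch series): the note added above asks readers to set `sidecar.istio.io/inject: "false"` in the trial template but does not show where it goes. The fragment below is a minimal sketch for a Kubernetes `Job` trial template. It assumes the annotation belongs in the pod template metadata (`spec.template.metadata.annotations` of the `Job`) and that the template is nested under a `trialSpec` key, which may differ between Katib API versions. The container name is hypothetical; the image, command, and restart policy are taken from the random example used elsewhere in this series.

```yaml
# Sketch only: fragment of an Experiment's trial template, not a complete manifest.
trialSpec:
  apiVersion: batch/v1
  kind: Job
  spec:
    template:
      metadata:
        annotations:
          # Disables automatic Istio sidecar injection for the trial's pods.
          # The value must be the quoted string "false", not a YAML boolean.
          sidecar.istio.io/inject: "false"
      spec:
        containers:
          - name: training-container  # hypothetical name, for illustration
            image: docker.io/kubeflowkatib/mxnet-mnist-example
            command:
              - "python"
              - "/mxnet/example/image-classification/train_mnist.py"
              - "--batch-size=64"
        restartPolicy: Never
```

For `TFJob` or `PyTorchJob`, the same annotation would presumably go on the pod template metadata of each replica spec; the operator's own documentation should be checked for the exact path.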
From bc352d670125708a720da1fc0f37205a393abba3 Mon Sep 17 00:00:00 2001 From: avelichk Date: Tue, 3 Nov 2020 00:11:32 +0000 Subject: [PATCH 3/5] Should be your experiment --- .../hyperparameter-tuning/hyperparameter.md | 406 ++++++++++++++++++ 1 file changed, 406 insertions(+) create mode 100644 content/en/docs/components/hyperparameter-tuning/hyperparameter.md diff --git a/content/en/docs/components/hyperparameter-tuning/hyperparameter.md b/content/en/docs/components/hyperparameter-tuning/hyperparameter.md new file mode 100644 index 0000000000..16fe40c8ea --- /dev/null +++ b/content/en/docs/components/hyperparameter-tuning/hyperparameter.md @@ -0,0 +1,406 @@ ++++ +title = "Getting started with Katib" +description = "How to set up Katib and run some hyperparameter tuning examples" +weight = 20 + ++++ + +This page gets you started with Katib. Follow this guide to perform any +additional setup you may need, depending on your environment, and to run a few +examples using the command line and the Katib user interface (UI). + +For an overview of the concepts around Katib and hyperparameter tuning, read the +[introduction to +Katib](/docs/components/hyperparameter-tuning/overview/). + +## Katib setup + +This section describes some configurations that you may need to add to your +Kubernetes cluster, depending on the way you're using Kubeflow and Katib. + + + +### Installing Katib + +You can skip this step if you have already installed Kubeflow. Your Kubeflow +deployment includes Katib. + +To install Katib as part of Kubeflow, follow the +[Kubeflow installation guide](/docs/started/getting-started/). 
+ +If you want to install Katib separately from Kubeflow, or to get a later version +of Katib, run the following commands to install Katib directly from its +repository on GitHub and deploy Katib to your cluster: + +``` +git clone https://github.com/kubeflow/katib +bash ./katib/scripts/v1alpha3/deploy.sh +``` + +### Setting up persistent volumes + +If you used [above script](#katib-install) to deploy Katib, you can skip this step. This script deploys PVC and PV on your cluster. + +You can skip this step if you're using Kubeflow on Google Kubernetes Engine +(GKE) or if your Kubernetes cluster includes a StorageClass for dynamic volume +provisioning. For more information, see the Kubernetes documentation on +[dynamic provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/) +and [persistent volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/). + +If you're using Katib outside GKE and your cluster doesn't include a +StorageClass for dynamic volume provisioning, you must create a persistent +volume (PV) to bind to the persistent volume claim (PVC) required by Katib. + +After deploying Katib to your cluster, run the following command to create the +PV: + +``` +kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/manifests/v1alpha3/pv/pv.yaml +``` + +The above `kubectl apply` command uses a YAML file +([`pv.yaml`](https://raw.githubusercontent.com/kubeflow/katib/master/manifests/v1alpha3/pv/pv.yaml)) +that defines the properties of the PV. + + + +## Accessing the Katib UI + +You can use the Katib user interface (UI) to submit experiments and to monitor +your results. The Katib home page within Kubeflow looks like this: + +The Katib home page within the Kubeflow UI + +If you installed Katib as part of Kubeflow, you can access the +Katib UI from the Kubeflow UI: + +1. Open the Kubeflow UI. See the guide to + [accessing the central dashboard](/docs/components/central-dash/overview/). +1. 
Click **Katib** in the left-hand menu. + +Alternatively, you can set port-forwarding for the Katib UI service: + +``` +kubectl port-forward svc/katib-ui -n kubeflow 8080:80 +``` + +Then you can access the Katib UI at this URL: + +``` +http://localhost:8080/katib/ +``` + +## Examples + +This section introduces some examples that you can run to try Katib. + + + +### Example using random algorithm + +You can create an experiment for Katib by defining the experiment in a YAML +configuration file. The YAML file defines the configurations for the experiment, +including the hyperparameter feasible space, optimization parameter, +optimization goal, suggestion algorithm, and so on. + +This example uses the [YAML file for the +random algorithm example](https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/random-example.yaml). + +The random algorithm example uses an MXNet neural network to train an image +classification model using the MNIST dataset. You can check training container source code [here](https://github.com/kubeflow/katib/tree/master/examples/v1alpha3/mxnet-mnist). The experiment runs three training jobs with various hyperparameters and saves the results. + +Run the following commands to launch an experiment using the random algorithm +example: + +1. Download the example: + + ``` + curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/random-example.yaml --output random-example.yaml + ``` + +1. Edit `random-example.yaml` and change the following line to use your Kubeflow user profile namespace: + + ``` + Namespace: kubeflow + ``` + +1. (Optional) **Note:** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. 
To do that, specify annotation `sidecar.istio.io/inject: "false"` + in your experiment's trial template. + + For the provided random example with Kubernetes [`Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/) + trial template, annotation should be under + [`.trialSpec.spec.template.metadata.annotations`](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/random-example.yaml#L52). + For the Kubeflow `TFJob` or other training operators check + [here](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob) how to set annotation. + +1. Deploy the example: + ``` + kubectl apply -f random-example.yaml + ``` + +This example embeds the hyperparameters as arguments. You can embed +hyperparameters in another way (for example, using environment variables) +by using the template defined in the `TrialTemplate.GoTemplate.RawTemplate` +section of the YAML file. The template uses the +[Go template format](https://golang.org/pkg/text/template/). + +This example randomly generates the following hyperparameters: + +- `--lr`: Learning rate. Type: double. +- `--num-layers`: Number of layers in the neural network. Type: integer. +- `--optimizer`: Optimizer. Type: categorical. 
+ +Check the experiment status: + +``` +kubectl -n describe experiment random-example +``` + +The output of the above command should look similar to this: + +``` +Name: random-example +Namespace: +Labels: controller-tools.k8s.io=1.0 +Annotations: +API Version: kubeflow.org/v1alpha3 +Kind: Experiment +Metadata: + Creation Timestamp: 2019-12-22T22:53:25Z + Finalizers: + update-prometheus-metrics + Generation: 2 + Resource Version: 720692 + Self Link: /apis/kubeflow.org/v1alpha3/namespaces//experiments/random-example + UID: dc6bc15a-250d-11ea-8cae-42010a80010f +Spec: + Algorithm: + Algorithm Name: random + Algorithm Settings: + Max Failed Trial Count: 3 + Max Trial Count: 12 + Metrics Collector Spec: + Collector: + Kind: StdOut + Objective: + Additional Metric Names: + accuracy + Goal: 0.99 + Objective Metric Name: Validation-accuracy + Type: maximize + Parallel Trial Count: 3 + Parameters: + Feasible Space: + Max: 0.03 + Min: 0.01 + Name: --lr + Parameter Type: double + Feasible Space: + Max: 5 + Min: 2 + Name: --num-layers + Parameter Type: int + Feasible Space: + List: + sgd + adam + ftrl + Name: --optimizer + Parameter Type: categorical + Resume Policy: LongRunning + Trial Template: + Go Template: + Raw Template: apiVersion: batch/v1 +kind: Job +metadata: + name: {{.Trial}} + namespace: {{.NameSpace}} +spec: + template: + spec: + containers: + - name: {{.Trial}} + image: docker.io/kubeflowkatib/mxnet-mnist-example + command: + - "python" + - "/mxnet/example/image-classification/train_mnist.py" + - "--batch-size=64" + {{- with .HyperParameters}} + {{- range .}} + - "{{.Name}}={{.Value}}" + {{- end}} + {{- end}} + restartPolicy: Never +Status: + Conditions: + Last Transition Time: 2019-12-22T22:53:25Z + Last Update Time: 2019-12-22T22:53:25Z + Message: Experiment is created + Reason: ExperimentCreated + Status: True + Type: Created + Last Transition Time: 2019-12-22T22:55:10Z + Last Update Time: 2019-12-22T22:55:10Z + Message: Experiment is running + Reason: 
ExperimentRunning + Status: True + Type: Running + Current Optimal Trial: + Observation: + Metrics: + Name: Validation-accuracy + Value: 0.981091 + Parameter Assignments: + Name: --lr + Value: 0.025139701133432946 + Name: --num-layers + Value: 4 + Name: --optimizer + Value: sgd + Start Time: 2019-12-22T22:53:25Z + Trials: 12 + Trials Running: 2 + Trials Succeeded: 10 +Events: +``` + +When the last value in `Status.Conditions.Type` is `Succeeded`, the experiment +is complete. + + +View the results of the experiment in the Katib UI: + +1. Open the Katib UI as described [above](#katib-ui). +1. Click **Hyperparameter Tuning** on the Katib home page. +1. Open the Katib menu panel on the left, then open the **HP** section and + click **Monitor**: + + The Katib menu panel + +1. You should see the list of experiments: + + The random example in the list of Katib experiments + +1. Click the name of the experiment, **random-example**. +1. You should see a graph showing the level of validation and train accuracy for various + combinations of the hyperparameter values (learning rate, number of layers, + and optimizer): + + Graph produced by the random example + +1. Below the graph is a list of trials that ran within the experiment: + + Trials that ran during the experiment + +1. You can click on trial name to see metrics for the particular trial: + + Trials that ran during the experiment + +### TensorFlow example + +Run the following commands to launch an experiment using the Kubeflow's +TensorFlow training job operator, TFJob: + +1. Download the tfjob-example.yaml file + + ``` + curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml --output tfjob-example.yaml + ``` + +1. Edit `tfjob-example.yaml` and change the following line to use your Kubeflow user profile namespace: + + ``` + Namespace: kubeflow + ``` + +1. Deploy the example: + + ``` + kubectl apply -f tfjob-example.yaml + ``` + +1. 
(Optional) **Note that** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. To do that, specify annotation `sidecar.istio.io/inject: "false"` + in your experiment's trial template. For the provided `TFJob` example check + [here](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob) how to set annotation. + +1. You can check the status of the experiment: + ``` + kubectl -n describe experiment tfjob-example + ``` + +Follow the steps as described for the _random algorithm example_ +[above](#view-ui), to see the results of the experiment in the Katib UI. + +### PyTorch example + +Run the following commands to launch an experiment using Kubeflow's PyTorch +training job operator, PyTorchJob: + +1. Download the pytorchjob-example.yaml file + + ``` + curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/pytorchjob-example.yaml --output pytorchjob-example.yaml + ``` + +1. Edit `pytorchjob-example.yaml` and change the following line to use your Kubeflow user profile namespace: + + ``` + Namespace: kubeflow + ``` + +1. Deploy the example: + + ``` + kubectl apply -f pytorchjob-example.yaml + ``` + +1. (Optional) **Note that** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. To do that, specify annotation `sidecar.istio.io/inject: "false"` + in your experiment's trial template. 
For the provided `PyTorchJob` example setting the annotation should be similar to + [`TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob). + +1. You can check the status of the experiment: + ``` + kubectl -n describe experiment pytorchjob-example + ``` + +Follow the steps as described for the _random algorithm example_ +[above](#view-ui), to see the results of the experiment in the Katib UI. + +## Cleanup + +Delete the installed components: + +``` +bash ./scripts/v1alpha3/undeploy.sh +``` + +## Next steps + +- For details of how to configure and run your experiment, see the guide to + [running an experiment](/docs/components/hyperparameter-tuning/experiment/). + +- For a detailed instruction of the Katib Configuration file, + read the [Katib config page](/docs/components/hyperparameter-tuning/katib-config/). + +- See how you can change installation of Katib component in the [environment variables guide](/docs/components/hyperparameter-tuning/env-variables/). From cd7f36f07441d35c3e727bf95519f801752a49a2 Mon Sep 17 00:00:00 2001 From: avelichk Date: Thu, 5 Nov 2020 22:46:20 +0000 Subject: [PATCH 4/5] Annotation step after changing namespace --- .../hyperparameter-tuning/hyperparameter.md | 33 +++++++++++-------- .../en/docs/components/katib/experiment.md | 11 +++++++ 2 files changed, 30 insertions(+), 14 deletions(-) diff --git a/content/en/docs/components/hyperparameter-tuning/hyperparameter.md b/content/en/docs/components/hyperparameter-tuning/hyperparameter.md index 16fe40c8ea..2b067b19e1 100644 --- a/content/en/docs/components/hyperparameter-tuning/hyperparameter.md +++ b/content/en/docs/components/hyperparameter-tuning/hyperparameter.md @@ -328,19 +328,22 @@ TensorFlow training job operator, TFJob: Namespace: kubeflow ``` +1. (Optional) **Note:** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). 
+ If you installed Kubeflow using + [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. To do that, specify annotation + `sidecar.istio.io/inject: "false"` in your experiment's trial template. + For the provided `TFJob` example check + [here](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob) + how to set annotation. + 1. Deploy the example: ``` kubectl apply -f tfjob-example.yaml ``` -1. (Optional) **Note that** Katib's experiments don't work with - [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). - If you installed Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), - you have to disable sidecar injection. To do that, specify annotation `sidecar.istio.io/inject: "false"` - in your experiment's trial template. For the provided `TFJob` example check - [here](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob) how to set annotation. - 1. You can check the status of the experiment: ``` kubectl -n describe experiment tfjob-example @@ -366,19 +369,21 @@ training job operator, PyTorchJob: Namespace: kubeflow ``` +1. (Optional) **Note:** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using + [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. To do that, specify annotation + `sidecar.istio.io/inject: "false"` in your experiment's trial template. + For the provided `PyTorchJob` example setting the annotation should be similar to + [`TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob). + 1. Deploy the example: ``` kubectl apply -f pytorchjob-example.yaml ``` -1. 
(Optional) **Note that** Katib's experiments don't work with - [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). - If you installed Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), - you have to disable sidecar injection. To do that, specify annotation `sidecar.istio.io/inject: "false"` - in your experiment's trial template. For the provided `PyTorchJob` example setting the annotation should be similar to - [`TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob). - 1. You can check the status of the experiment: ``` kubectl -n describe experiment pytorchjob-example diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md index f55e5dc6ac..6009a92821 100644 --- a/content/en/docs/components/katib/experiment.md +++ b/content/en/docs/components/katib/experiment.md @@ -718,17 +718,28 @@ kubectl apply -f **Note:** +<<<<<<< HEAD:content/en/docs/components/katib/experiment.md - If you deployed Katib as part of Kubeflow (your Kubeflow deployment should include Katib), you need to change Kubeflow namespace to your profile namespace. +======= +- If you deploy Katib as part of Kubeflow, you have to change the Kubeflow + namespace to your profile namespace. +>>>>>>> Annotation step after changing namespace:content/en/docs/components/hyperparameter-tuning/experiment.md - (Optional) Katib's experiments don't work with [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). If you install Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), +<<<<<<< HEAD:content/en/docs/components/katib/experiment.md you have to disable sidecar injection. To do that, specify this annotation: `sidecar.istio.io/inject: "false"` in your experiment's trial template. 
For examples on how to do it for `Job`, `TFJob` (TensorFlow) or +======= + you have to disable sidecar injection. To do that, specify annotation + `sidecar.istio.io/inject: "false"` in your experiment's trial template. + For examples on how to do it for `Job`, `TFJob` (TensorFlow) or +>>>>>>> Annotation step after changing namespace:content/en/docs/components/hyperparameter-tuning/experiment.md `PyTorchJob` (PyTorch), refer to the [getting-started guide](http://localhost:1313/docs/components/hyperparameter-tuning/hyperparameter/#examples). From d406cc63faf626dbd78d18a71691272caf678291 Mon Sep 17 00:00:00 2001 From: avelichk Date: Wed, 11 Nov 2020 13:47:33 +0000 Subject: [PATCH 5/5] Fix links --- .../hyperparameter-tuning/hyperparameter.md | 411 ------------------ .../en/docs/components/katib/experiment.md | 13 +- .../docs/components/katib/hyperparameter.md | 34 ++ 3 files changed, 35 insertions(+), 423 deletions(-) delete mode 100644 content/en/docs/components/hyperparameter-tuning/hyperparameter.md diff --git a/content/en/docs/components/hyperparameter-tuning/hyperparameter.md b/content/en/docs/components/hyperparameter-tuning/hyperparameter.md deleted file mode 100644 index 2b067b19e1..0000000000 --- a/content/en/docs/components/hyperparameter-tuning/hyperparameter.md +++ /dev/null @@ -1,411 +0,0 @@ -+++ -title = "Getting started with Katib" -description = "How to set up Katib and run some hyperparameter tuning examples" -weight = 20 - -+++ - -This page gets you started with Katib. Follow this guide to perform any -additional setup you may need, depending on your environment, and to run a few -examples using the command line and the Katib user interface (UI). - -For an overview of the concepts around Katib and hyperparameter tuning, read the -[introduction to -Katib](/docs/components/hyperparameter-tuning/overview/). 
- -## Katib setup - -This section describes some configurations that you may need to add to your -Kubernetes cluster, depending on the way you're using Kubeflow and Katib. - - - -### Installing Katib - -You can skip this step if you have already installed Kubeflow. Your Kubeflow -deployment includes Katib. - -To install Katib as part of Kubeflow, follow the -[Kubeflow installation guide](/docs/started/getting-started/). - -If you want to install Katib separately from Kubeflow, or to get a later version -of Katib, run the following commands to install Katib directly from its -repository on GitHub and deploy Katib to your cluster: - -``` -git clone https://github.com/kubeflow/katib -bash ./katib/scripts/v1alpha3/deploy.sh -``` - -### Setting up persistent volumes - -If you used [above script](#katib-install) to deploy Katib, you can skip this step. This script deploys PVC and PV on your cluster. - -You can skip this step if you're using Kubeflow on Google Kubernetes Engine -(GKE) or if your Kubernetes cluster includes a StorageClass for dynamic volume -provisioning. For more information, see the Kubernetes documentation on -[dynamic provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/) -and [persistent volumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/). - -If you're using Katib outside GKE and your cluster doesn't include a -StorageClass for dynamic volume provisioning, you must create a persistent -volume (PV) to bind to the persistent volume claim (PVC) required by Katib. - -After deploying Katib to your cluster, run the following command to create the -PV: - -``` -kubectl apply -f https://raw.githubusercontent.com/kubeflow/katib/master/manifests/v1alpha3/pv/pv.yaml -``` - -The above `kubectl apply` command uses a YAML file -([`pv.yaml`](https://raw.githubusercontent.com/kubeflow/katib/master/manifests/v1alpha3/pv/pv.yaml)) -that defines the properties of the PV. 
- - - -## Accessing the Katib UI - -You can use the Katib user interface (UI) to submit experiments and to monitor -your results. The Katib home page within Kubeflow looks like this: - -The Katib home page within the Kubeflow UI - -If you installed Katib as part of Kubeflow, you can access the -Katib UI from the Kubeflow UI: - -1. Open the Kubeflow UI. See the guide to - [accessing the central dashboard](/docs/components/central-dash/overview/). -1. Click **Katib** in the left-hand menu. - -Alternatively, you can set port-forwarding for the Katib UI service: - -``` -kubectl port-forward svc/katib-ui -n kubeflow 8080:80 -``` - -Then you can access the Katib UI at this URL: - -``` -http://localhost:8080/katib/ -``` - -## Examples - -This section introduces some examples that you can run to try Katib. - - - -### Example using random algorithm - -You can create an experiment for Katib by defining the experiment in a YAML -configuration file. The YAML file defines the configurations for the experiment, -including the hyperparameter feasible space, optimization parameter, -optimization goal, suggestion algorithm, and so on. - -This example uses the [YAML file for the -random algorithm example](https://github.com/kubeflow/katib/blob/master/examples/v1alpha3/random-example.yaml). - -The random algorithm example uses an MXNet neural network to train an image -classification model using the MNIST dataset. You can check training container source code [here](https://github.com/kubeflow/katib/tree/master/examples/v1alpha3/mxnet-mnist). The experiment runs three training jobs with various hyperparameters and saves the results. - -Run the following commands to launch an experiment using the random algorithm -example: - -1. Download the example: - - ``` - curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/random-example.yaml --output random-example.yaml - ``` - -1. 
Edit `random-example.yaml` and change the following line to use your Kubeflow user profile namespace: - - ``` - Namespace: kubeflow - ``` - -1. (Optional) **Note:** Katib's experiments don't work with - [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). - If you installed Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), - you have to disable sidecar injection. To do that, specify annotation `sidecar.istio.io/inject: "false"` - in your experiment's trial template. - - For the provided random example with Kubernetes [`Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/) - trial template, annotation should be under - [`.trialSpec.spec.template.metadata.annotations`](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/random-example.yaml#L52). - For the Kubeflow `TFJob` or other training operators check - [here](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob) how to set annotation. - -1. Deploy the example: - ``` - kubectl apply -f random-example.yaml - ``` - -This example embeds the hyperparameters as arguments. You can embed -hyperparameters in another way (for example, using environment variables) -by using the template defined in the `TrialTemplate.GoTemplate.RawTemplate` -section of the YAML file. The template uses the -[Go template format](https://golang.org/pkg/text/template/). - -This example randomly generates the following hyperparameters: - -- `--lr`: Learning rate. Type: double. -- `--num-layers`: Number of layers in the neural network. Type: integer. -- `--optimizer`: Optimizer. Type: categorical. 
- -Check the experiment status: - -``` -kubectl -n describe experiment random-example -``` - -The output of the above command should look similar to this: - -``` -Name: random-example -Namespace: -Labels: controller-tools.k8s.io=1.0 -Annotations: -API Version: kubeflow.org/v1alpha3 -Kind: Experiment -Metadata: - Creation Timestamp: 2019-12-22T22:53:25Z - Finalizers: - update-prometheus-metrics - Generation: 2 - Resource Version: 720692 - Self Link: /apis/kubeflow.org/v1alpha3/namespaces//experiments/random-example - UID: dc6bc15a-250d-11ea-8cae-42010a80010f -Spec: - Algorithm: - Algorithm Name: random - Algorithm Settings: - Max Failed Trial Count: 3 - Max Trial Count: 12 - Metrics Collector Spec: - Collector: - Kind: StdOut - Objective: - Additional Metric Names: - accuracy - Goal: 0.99 - Objective Metric Name: Validation-accuracy - Type: maximize - Parallel Trial Count: 3 - Parameters: - Feasible Space: - Max: 0.03 - Min: 0.01 - Name: --lr - Parameter Type: double - Feasible Space: - Max: 5 - Min: 2 - Name: --num-layers - Parameter Type: int - Feasible Space: - List: - sgd - adam - ftrl - Name: --optimizer - Parameter Type: categorical - Resume Policy: LongRunning - Trial Template: - Go Template: - Raw Template: apiVersion: batch/v1 -kind: Job -metadata: - name: {{.Trial}} - namespace: {{.NameSpace}} -spec: - template: - spec: - containers: - - name: {{.Trial}} - image: docker.io/kubeflowkatib/mxnet-mnist-example - command: - - "python" - - "/mxnet/example/image-classification/train_mnist.py" - - "--batch-size=64" - {{- with .HyperParameters}} - {{- range .}} - - "{{.Name}}={{.Value}}" - {{- end}} - {{- end}} - restartPolicy: Never -Status: - Conditions: - Last Transition Time: 2019-12-22T22:53:25Z - Last Update Time: 2019-12-22T22:53:25Z - Message: Experiment is created - Reason: ExperimentCreated - Status: True - Type: Created - Last Transition Time: 2019-12-22T22:55:10Z - Last Update Time: 2019-12-22T22:55:10Z - Message: Experiment is running - Reason: 
ExperimentRunning - Status: True - Type: Running - Current Optimal Trial: - Observation: - Metrics: - Name: Validation-accuracy - Value: 0.981091 - Parameter Assignments: - Name: --lr - Value: 0.025139701133432946 - Name: --num-layers - Value: 4 - Name: --optimizer - Value: sgd - Start Time: 2019-12-22T22:53:25Z - Trials: 12 - Trials Running: 2 - Trials Succeeded: 10 -Events: -``` - -When the last value in `Status.Conditions.Type` is `Succeeded`, the experiment -is complete. - - -View the results of the experiment in the Katib UI: - -1. Open the Katib UI as described [above](#katib-ui). -1. Click **Hyperparameter Tuning** on the Katib home page. -1. Open the Katib menu panel on the left, then open the **HP** section and - click **Monitor**: - - The Katib menu panel - -1. You should see the list of experiments: - - The random example in the list of Katib experiments - -1. Click the name of the experiment, **random-example**. -1. You should see a graph showing the level of validation and train accuracy for various - combinations of the hyperparameter values (learning rate, number of layers, - and optimizer): - - Graph produced by the random example - -1. Below the graph is a list of trials that ran within the experiment: - - Trials that ran during the experiment - -1. You can click on trial name to see metrics for the particular trial: - - Trials that ran during the experiment - -### TensorFlow example - -Run the following commands to launch an experiment using the Kubeflow's -TensorFlow training job operator, TFJob: - -1. Download the tfjob-example.yaml file - - ``` - curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml --output tfjob-example.yaml - ``` - -1. Edit `tfjob-example.yaml` and change the following line to use your Kubeflow user profile namespace: - - ``` - Namespace: kubeflow - ``` - -1. 
(Optional) **Note:** Katib's experiments don't work with - [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). - If you installed Kubeflow using - [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), - you have to disable sidecar injection. To do that, specify annotation - `sidecar.istio.io/inject: "false"` in your experiment's trial template. - For the provided `TFJob` example check - [here](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob) - how to set annotation. - -1. Deploy the example: - - ``` - kubectl apply -f tfjob-example.yaml - ``` - -1. You can check the status of the experiment: - ``` - kubectl -n describe experiment tfjob-example - ``` - -Follow the steps as described for the _random algorithm example_ -[above](#view-ui), to see the results of the experiment in the Katib UI. - -### PyTorch example - -Run the following commands to launch an experiment using Kubeflow's PyTorch -training job operator, PyTorchJob: - -1. Download the pytorchjob-example.yaml file - - ``` - curl https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/pytorchjob-example.yaml --output pytorchjob-example.yaml - ``` - -1. Edit `pytorchjob-example.yaml` and change the following line to use your Kubeflow user profile namespace: - - ``` - Namespace: kubeflow - ``` - -1. (Optional) **Note:** Katib's experiments don't work with - [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). - If you installed Kubeflow using - [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), - you have to disable sidecar injection. To do that, specify annotation - `sidecar.istio.io/inject: "false"` in your experiment's trial template. 
- For the provided `PyTorchJob` example setting the annotation should be similar to - [`TFJob`](https://www.kubeflow.org/docs/components/training/tftraining/#what-is-tfjob). - -1. Deploy the example: - - ``` - kubectl apply -f pytorchjob-example.yaml - ``` - -1. You can check the status of the experiment: - ``` - kubectl -n describe experiment pytorchjob-example - ``` - -Follow the steps as described for the _random algorithm example_ -[above](#view-ui), to see the results of the experiment in the Katib UI. - -## Cleanup - -Delete the installed components: - -``` -bash ./scripts/v1alpha3/undeploy.sh -``` - -## Next steps - -- For details of how to configure and run your experiment, see the guide to - [running an experiment](/docs/components/hyperparameter-tuning/experiment/). - -- For a detailed instruction of the Katib Configuration file, - read the [Katib config page](/docs/components/hyperparameter-tuning/katib-config/). - -- See how you can change installation of Katib component in the [environment variables guide](/docs/components/hyperparameter-tuning/env-variables/). diff --git a/content/en/docs/components/katib/experiment.md b/content/en/docs/components/katib/experiment.md index 6009a92821..a4cd3722ce 100644 --- a/content/en/docs/components/katib/experiment.md +++ b/content/en/docs/components/katib/experiment.md @@ -718,30 +718,19 @@ kubectl apply -f **Note:** -<<<<<<< HEAD:content/en/docs/components/katib/experiment.md - If you deployed Katib as part of Kubeflow (your Kubeflow deployment should include Katib), you need to change Kubeflow namespace to your profile namespace. -======= -- If you deploy Katib as part of Kubeflow, you have to change the Kubeflow - namespace to your profile namespace. 
->>>>>>> Annotation step after changing namespace:content/en/docs/components/hyperparameter-tuning/experiment.md - (Optional) Katib's experiments don't work with [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). If you install Kubeflow using [Istio config](https://www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/), -<<<<<<< HEAD:content/en/docs/components/katib/experiment.md you have to disable sidecar injection. To do that, specify this annotation: `sidecar.istio.io/inject: "false"` in your experiment's trial template. For examples on how to do it for `Job`, `TFJob` (TensorFlow) or -======= - you have to disable sidecar injection. To do that, specify annotation - `sidecar.istio.io/inject: "false"` in your experiment's trial template. - For examples on how to do it for `Job`, `TFJob` (TensorFlow) or ->>>>>>> Annotation step after changing namespace:content/en/docs/components/hyperparameter-tuning/experiment.md `PyTorchJob` (PyTorch), refer to the - [getting-started guide](http://localhost:1313/docs/components/hyperparameter-tuning/hyperparameter/#examples). + [getting-started guide](/docs/components/katib/hyperparameter/#examples). Run the following command to launch an experiment using the random algorithm example: diff --git a/content/en/docs/components/katib/hyperparameter.md b/content/en/docs/components/katib/hyperparameter.md index 3716353c1a..48f234efc7 100644 --- a/content/en/docs/components/katib/hyperparameter.md +++ b/content/en/docs/components/katib/hyperparameter.md @@ -135,6 +135,21 @@ an experiment using the random algorithm example: Namespace: kubeflow ``` +1. (Optional) **Note:** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using + [Istio config](/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. 
To do that, specify this annotation: + `sidecar.istio.io/inject: "false"` in your experiment's trial template. + + For the provided random example with Kubernetes + [`Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/) + trial template, the annotation should be under + [`.trialSpec.spec.template.metadata.annotations`](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/random-example.yaml#L52). + For the Kubeflow `TFJob` or other training operators, check + [here](/docs/components/training/tftraining/#what-is-tfjob) + how to set the annotation. + 1. Deploy the example: ```shell @@ -373,6 +388,16 @@ the Kubeflow's TensorFlow training job operator, TFJob: Namespace: kubeflow ``` +1. (Optional) **Note:** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using + [Istio config](/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. To do that, specify this annotation: + `sidecar.istio.io/inject: "false"` in your experiment's trial template. + For the provided `TFJob` example, check + [here](/docs/components/training/tftraining/#what-is-tfjob) + how to set the annotation. + 1. Deploy the example: ```shell @@ -407,6 +432,15 @@ using Kubeflow's PyTorch training job operator, PyTorchJob: Namespace: kubeflow ``` +1. (Optional) **Note:** Katib's experiments don't work with + [Istio sidecar injection](https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/#automatic-sidecar-injection). + If you installed Kubeflow using + [Istio config](/docs/started/k8s/kfctl-k8s-istio/), + you have to disable sidecar injection. To do that, specify this annotation: + `sidecar.istio.io/inject: "false"` in your experiment's trial template. + For the provided `PyTorchJob` example, setting the annotation should be similar to + [`TFJob`](/docs/components/training/tftraining/#what-is-tfjob). + 1. 
Deploy the example: ```shell