Add resume policy instructions for Katib experiments #2324

andreyvelich · 2020-10-30T00:03:06Z

Fixes: kubeflow/katib#1292.
Blocked by: #2312.

I've added a doc about restarting Katib experiments; please take a look.

/assign @johnugeorge @gaocegege
/cc @RFMVasconcelos @8bitmp3

kubeflow-bot · 2020-10-30T00:03:15Z

This change is

8bitmp3 · 2020-11-08T23:32:09Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md


- For a detailed instruction of the Katib Configuration file,
-  read the [Katib config page](/docs/components/hyperparameter-tuning/katib-config/).
+- Read about [Katib Configuration (Katib config)](/docs/components/katib/katib-config/).


for a11y:

Suggested change

-- Read about [Katib Configuration (Katib config)](/docs/components/katib/katib-config/).
+- Check the [Katib configuration (Katib config)](/docs/components/katib/katib-config/) page.

8bitmp3 · 2020-11-08T23:33:24Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

-Suggestion data can be retained in the volume.
-When you restart the experiment, suggestion's deployment and service are created and
-suggestion statistics can be recovered from the volume.
+After the experiment has succeeded, the suggestion's deployment and


Suggested change

-After the experiment has succeeded, the suggestion's deployment and
+After the experiment is successful, the suggestion's deployment and

or maybe "has finished"? "Successful" can be subjective, IMHO

8bitmp3 · 2020-11-08T23:36:56Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

+See the
+[from volume policy example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/resume-experiment/from-volume-resume.yaml#L18).


for a11y:

Suggested change

-See the
-[from volume policy example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/resume-experiment/from-volume-resume.yaml#L18).
+Check the
+[`from-volume-resume.yaml`](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/resume-experiment/from-volume-resume.yaml#L18)
+example to learn more.

WDYT? It's more precise, since it's not a "tutorial" example, just a "hands-on" YAML file.

8bitmp3 · 2020-11-08T23:37:32Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

 and [service](https://kubernetes.io/docs/concepts/services-networking/service/)
 are deleted and you can't restart the experiment.
-Read more about Katib concepts in [overview guide](/docs/components/hyperparameter-tuning/overview/#katib-concepts).
+Read more about Katib concepts in the


For a11y:

Suggested change

-Read more about Katib concepts in the
+Learn more about Katib concepts in the

8bitmp3 · 2020-11-08T23:37:59Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

 and [service](https://kubernetes.io/docs/concepts/services-networking/service/)
 are deleted and you can't restart the experiment.
-Read more about Katib concepts in [overview guide](/docs/components/hyperparameter-tuning/overview/#katib-concepts).
+Read more about Katib concepts in the
+[overview guide](/docs/components/hyperparameter-tuning/overview/#katib-concepts).

 See the [never resume policy example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/resume-experiment/never-resume.yaml#L20).


For a11y:

Suggested change

-See the [never resume policy example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/resume-experiment/never-resume.yaml#L20).
+Check the [`never-resume.yaml`](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/resume-experiment/never-resume.yaml#L20)
+example for more details.

WDYT?

8bitmp3 · 2020-11-08T23:38:32Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md


 ## Resume succeeded experiment

-To control various resume policies, you can specify `.spec.resumePolicy` for the experiment.
+To control various resume policies, you can specify `.spec.resumePolicy`
+for the experiment.
 See the [`ResumePolicy` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_types.go#L54).


Suggested change

-See the [`ResumePolicy` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_types.go#L54).
+(Refer to the [`ResumePolicy` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_types.go#L54).)

8bitmp3 · 2020-11-08T23:41:01Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

+While the experiment is running you are able to change trial count parameters.
+For example, if you want to decrease the maximum number of
+hyperparameter sets that are trained parallel.


This may appear like an incomplete sentence because of the "if" statement. Let's try the following:

Suggested change

-While the experiment is running you are able to change trial count parameters.
-For example, if you want to decrease the maximum number of
-hyperparameter sets that are trained parallel.
+While the experiment is running you are able to change trial count parameters.
+For example, you can decrease the maximum number of
+hyperparameter sets that are trained in parallel.

Note: hyperparam sets can be trained "in parallel" not "parallel", I think.
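
For context on the sentence under review: the trial count parameters are, to my understanding, the `parallelTrialCount`, `maxTrialCount`, and `maxFailedTrialCount` fields of the experiment spec. A minimal sketch of an edited spec that lowers the parallel count follows. The experiment name, namespace, and all numbers are hypothetical, and the rest of the spec is left out, so this is a fragment rather than a complete manifest:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: example-experiment   # hypothetical name
  namespace: kubeflow        # hypothetical namespace
spec:
  parallelTrialCount: 1      # hypothetical value, lowered so fewer hyperparameter sets train in parallel
  maxTrialCount: 12          # hypothetical value
  maxFailedTrialCount: 3     # hypothetical value
  # objective, algorithm, parameters, and trialTemplate omitted for brevity
```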

8bitmp3 · 2020-11-08T23:43:36Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

+This page describes in detail how to modify running experiment
+and restart succeeded experiment. Follow this guide to know more
+about changing the experiment execution process and use various


"a running experiment" or "running experiment"

"page" -> "guide" (everything can be a page?)

"succeeded"? -> "completed" (success in ML experiments can be subjective, but completion (it's over) is more objective, I think)

Talk to the reader - "you will learn more..."

Suggested change

-This page describes in detail how to modify running experiment
-and restart succeeded experiment. Follow this guide to know more
-about changing the experiment execution process and use various
+This guide describes how to modify running experiments
+and restart completed experiments. You will learn
+about changing the experiment execution process and use various

8bitmp3 · 2020-11-08T23:44:11Z

content/en/docs/components/hyperparameter-tuning/resume-experiment.md

-Follow this guide to known more about changing experiment execution process and use various
+This page describes in detail how to modify running experiment
+and restart succeeded experiment. Follow this guide to know more
+about changing the experiment execution process and use various
 resume policies for the Katib experiment.

 For details of how to configure and run your experiment, see the guide to


Grammar

a11y

Suggested change

-For details of how to configure and run your experiment, see the guide to
+For details on how to configure and run your experiment, check the guide on

Also, please check the links (e.g. /docs/components/hyperparameter-tuning/experiment/), since some may be moving to .../katib/.. I think.

8bitmp3 · 2020-11-08T23:45:03Z

content/en/docs/components/hyperparameter-tuning/experiment.md

@@ -142,7 +142,8 @@ These are the fields in the experiment configuration spec:
 * **resumePolicy**: Experiment resume policy. Can be one of `LongRunning`, `Never` or `FromVolume`.
  Default value is `LongRunning`.
  See the [`ResumePolicy` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_types.go#L54).


Suggested change

-See the [`ResumePolicy` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_types.go#L54).
+(Refer to the [`ResumePolicy` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/experiments/v1beta1/experiment_types.go#L54).)

For other Go files in the doc, can you please apply this logic, if it makes sense to you? It's mainly for a11y, and it also tells the reader why they should "see" the file, since no reason is given originally. I think it can be good practice.
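
To make the field description in this hunk concrete, here is a minimal sketch of where `resumePolicy` sits in an experiment. The three values come from the description above; the metadata is hypothetical and the remaining spec fields are omitted, so treat it as a fragment, not a working manifest:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: example-experiment   # hypothetical name
  namespace: kubeflow        # hypothetical namespace
spec:
  # One of LongRunning (the default), Never, or FromVolume.
  # Per the reviewed doc text: with Never, the suggestion's deployment and
  # service are deleted and the experiment cannot be restarted; with
  # FromVolume, suggestion data is retained in a volume and can be
  # recovered on restart.
  resumePolicy: FromVolume
  # objective, algorithm, parameters, and trialTemplate omitted for brevity
```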

k8s-ci-robot · 2020-11-11T15:26:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~content/en/docs/components/katib/OWNERS~~ [andreyvelich]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andreyvelich · 2020-11-11T15:29:28Z

This PR is ready.
/cc @8bitmp3 @gaocegege @johnugeorge

gaocegege

LGTM 👍
/lgtm

gaocegege · 2020-11-13T23:53:24Z

Please do a rebase.

andreyvelich · 2020-11-14T01:13:42Z

/hold

andreyvelich · 2020-11-14T01:18:43Z

@gaocegege It's done.
/hold cancel

gaocegege · 2020-11-14T02:18:14Z

LGTM 👍
/lgtm

k8s-ci-robot assigned gaocegege Oct 30, 2020

k8s-ci-robot added the do-not-merge/work-in-progress label Oct 30, 2020

k8s-ci-robot assigned johnugeorge Oct 30, 2020

k8s-ci-robot requested review from 8bitmp3 and rui-vas October 30, 2020 00:03

k8s-ci-robot added approved size/L labels Oct 30, 2020

andreyvelich mentioned this pull request Oct 30, 2020

Kubeflow 1.2 release doc changes #2322

Closed

8bitmp3 reviewed Nov 8, 2020


andreyvelich force-pushed the issue-1292-resume-experiment-doc branch from 9c71d2e to e1fc105 Compare November 11, 2020 15:26

andreyvelich changed the title ~~[WIP] Add resume policy instructions for Katib experiments~~ Add resume policy instructions for Katib experiments Nov 11, 2020

k8s-ci-robot removed the do-not-merge/work-in-progress label Nov 11, 2020

k8s-ci-robot requested review from 8bitmp3, gaocegege and johnugeorge November 11, 2020 15:29

andreyvelich force-pushed the issue-1292-resume-experiment-doc branch from 6b8a0b7 to 49856e2 Compare November 13, 2020 18:49

andreyvelich mentioned this pull request Nov 13, 2020

stop a running experiment without deleting it kubeflow/katib#934

Closed

gaocegege reviewed Nov 13, 2020


k8s-ci-robot added the lgtm label Nov 13, 2020

andreyvelich added 10 commits November 14, 2020 00:11

8748560 Add resume policy instructions
0f444f5 Change order
ff32a54 Delete resource when experiment is deleted
3dff6ea Fix few spelling mistakes
271d21c Fix line
dbe07a3 Fix default value
c6607d2 Modify resume experiment doc
9105dc8 Plural description
dbcf277 Make capital title
391bfd6 Fix links

andreyvelich force-pushed the issue-1292-resume-experiment-doc branch from 49856e2 to 391bfd6 Compare November 14, 2020 01:11

k8s-ci-robot removed the lgtm label Nov 14, 2020

k8s-ci-robot added the do-not-merge/hold label Nov 14, 2020

041ffdb Fix next steps

k8s-ci-robot removed the do-not-merge/hold label Nov 14, 2020

k8s-ci-robot added the lgtm label Nov 14, 2020

k8s-ci-robot merged commit 8c7a5d3 into kubeflow:master Nov 14, 2020
