Reconcile semantics for Suggestion Algorithms #1633

johnugeorge · 2021-08-22T15:21:34Z

Currently, the GetSuggestions call does not follow Kubernetes reconcile semantics. For example, if the suggestion controller fails to update the Suggestion with the results returned by GetSuggestions (from the suggestion algorithm service), the next retry generates new suggestions again. As a result, the earlier suggestions are leaked.

This PR passes a new variable in the GetSuggestions call that indicates the total number of suggestions requested to date. If the DB holds trials that were never recorded, the service reuses those missed suggestions from the DB and generates only the remaining required number. GetSuggestions therefore reuses missed suggestions before generating new ones.
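
To illustrate the intended behaviour, here is a minimal sketch of a retry-safe suggestion service. It is not the code in this PR: `ToySuggestionService`, `get_suggestions`, and the `lr` parameter are hypothetical names, and the in-memory list stands in for the DB. It assumes `total_request_number` includes the current request.

```python
import random
from typing import Dict, List

Assignment = Dict[str, float]


class ToySuggestionService:
    """Hypothetical in-memory stand-in for a suggestion algorithm service."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        # Every assignment ever generated, in creation order (plays the role of the DB).
        self.created_trials: List[Assignment] = []

    def _generate(self) -> Assignment:
        # Create one new assignment and remember it.
        assignment = {"lr": self._rng.uniform(0.001, 0.1)}
        self.created_trials.append(assignment)
        return assignment

    def get_suggestions(self, request_number: int, total_request_number: int) -> List[Assignment]:
        # Suggestions the controller has already recorded (assumption: the
        # total includes the current request).
        recorded = total_request_number - request_number
        # Assignments generated earlier that the controller never recorded.
        reused = self.created_trials[recorded:recorded + request_number]
        # Generate only what is still missing after reuse.
        fresh = [self._generate() for _ in range(request_number - len(reused))]
        return reused + fresh
```

Calling `get_suggestions(3, 3)` twice in a row (a retry after a failed update) returns the same three assignments and leaves `created_trials` at length 3, which is the reconcile-style behaviour described above.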

Fixes #1534
/hold

gaocegege · 2021-08-23T02:33:11Z

/retest

johnugeorge · 2021-08-23T06:15:30Z

/test

aws-kf-ci-bot · 2021-08-23T06:15:37Z

@johnugeorge: The /test command needs one or more targets.
The following commands are available to trigger jobs:

/test kubeflow-katib-presubmit

Use /test all to run all jobs.

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

johnugeorge · 2021-08-23T06:35:30Z

/test kubeflow-katib-presubmit

johnugeorge · 2021-08-23T07:48:36Z

/test kubeflow-katib-presubmit

johnugeorge · 2021-08-23T08:46:19Z

/hold cancel

andreyvelich · 2021-08-23T15:10:42Z

pkg/controller.v1beta1/suggestion/suggestionclient/suggestionclient.go

+	logger.Info("Getting suggestions", "endpoint", endpoint, "response", len(responseSuggestion.ParameterAssignments),
+		"requestNum", requestNum)


Can we make this clearer?

Suggested change

-	logger.Info("Getting suggestions", "endpoint", endpoint, "response", len(responseSuggestion.ParameterAssignments),
-		"requestNum", requestNum)
+	logger.Info("Getting suggestions", "endpoint", endpoint, "number of response parameters", len(responseSuggestion.ParameterAssignments),
+		"number of request parameters", requestNum)

andreyvelich · 2021-08-23T15:11:36Z

pkg/controller.v1beta1/suggestion/suggestionclient/suggestionclient.go

 	if len(responseSuggestion.ParameterAssignments) != requestNum {
 		err := fmt.Errorf("The response contains unexpected trials")
-		logger.Error(err, "The response contains unexpected trials", "requestNum", requestNum, "response", responseSuggestion)
+		logger.Error(err, "The response contains unexpected trials", "requestNum", requestNum, "response", len(responseSuggestion.ParameterAssignments))


Suggested change

-		logger.Error(err, "The response contains unexpected trials", "requestNum", requestNum, "response", len(responseSuggestion.ParameterAssignments))
+		logger.Error(err, "The response contains unexpected trials", "number of request parameters", requestNum, "number of response parameters", len(responseSuggestion.ParameterAssignments))

andreyvelich · 2021-08-23T15:14:49Z

pkg/mock/v1beta1/api/earlystopping.go

@@ -6,10 +6,11 @@ package mock

 import (
 	context "context"
+	reflect "reflect"


We might need to pin a specific mockgen version so that everyone generates identical files.
Do we need these changes in this PR?

andreyvelich · 2021-08-23T15:28:34Z

pkg/suggestion/v1beta1/chocolate/base_service.py

+        new_actual_requested_no = total_request_number - len(self.created_trials)
+        prev_generated_no = request_number - new_actual_requested_no
+        logger.info("In this call, New {} Trials will be generated, {} Trials will be reused from previously generated".format(new_actual_requested_no, prev_generated_no))


In the first call, when request_number = 3, new_actual_requested_no = 0 and prev_generated_no = 3.
Is that correct?

johnugeorge · 2021-08-23

It is the other way around.
In the normal case, total_request_number == len(self.created_trials) + request_number, where len(self.created_trials) is the number of previously created trials. So prev_generated_no is 0 in this case.

When the two sides differ, some of the suggestions in self.created_trials (the same as in the DB) are not recorded in the K8s Suggestion resource. So prev_generated_no is greater than 0 in this case.
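
As a worked check of this arithmetic, the two formulas from the diff can be wrapped in a small helper. `split_request` is a hypothetical name used only for illustration. The formulas are copied as written, with no clamping for edge cases.

```python
from typing import Tuple


def split_request(total_request_number: int, created_count: int, request_number: int) -> Tuple[int, int]:
    """Return (new_actual_requested_no, prev_generated_no) using the formulas from the diff."""
    new_actual_requested_no = total_request_number - created_count
    prev_generated_no = request_number - new_actual_requested_no
    return new_actual_requested_no, prev_generated_no


# First call: nothing created yet, 3 requested, so the total is 3.
print(split_request(total_request_number=3, created_count=0, request_number=3))  # (3, 0)

# Mismatch: 5 trials exist in the DB but only 3 were recorded in K8s,
# and 3 more are requested, so the total is 6.
print(split_request(total_request_number=6, created_count=5, request_number=3))  # (1, 2)
```

In the first call all three trials are new and none are reused. In the mismatch case two unrecorded trials are reused and only one new trial is generated.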

andreyvelich · 2021-08-23T15:30:19Z

pkg/suggestion/v1beta1/chocolate/base_service.py

+        if total_request_number != len(self.created_trials) + request_number:
+            logger.info("Mismatch in generated trials with k8s suggestions trials")


What does this log message mean?

johnugeorge · 2021-08-23

See the earlier comment on new_actual_requested_no and prev_generated_no.

andreyvelich · 2021-08-23T17:13:12Z

/hold for the testing

gaocegege · 2021-08-24T03:31:34Z

/lgtm

gaocegege · 2021-08-24T03:34:30Z

/retest

andreyvelich

I tested this fix.

/lgtm
/approve
/hold cancel

google-oss-robot · 2021-08-24T15:27:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [andreyvelich,johnugeorge]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

andreyvelich · 2021-08-24T15:27:57Z

/retest

andreyvelich · 2021-08-24T15:41:20Z

/lgtm

google-oss-robot added do-not-merge/hold approved labels Aug 22, 2021

google-oss-robot requested review from gaocegege, hougangliu and sperlingxx August 22, 2021 15:21

google-oss-robot added the size/L label Aug 22, 2021

google-oss-robot removed the do-not-merge/hold label Aug 23, 2021

andreyvelich reviewed Aug 23, 2021

google-oss-robot added the do-not-merge/hold label Aug 23, 2021

google-oss-robot assigned gaocegege Aug 24, 2021

google-oss-robot added the lgtm label Aug 24, 2021

johnugeorge force-pushed the master branch from fd4980e to cec0198 Compare August 24, 2021 04:40

google-oss-robot removed the lgtm label Aug 24, 2021

Reuse suggestions

ce12a89

johnugeorge force-pushed the master branch from cec0198 to ce12a89 Compare August 24, 2021 13:45

andreyvelich reviewed Aug 24, 2021

google-oss-robot removed the do-not-merge/hold label Aug 24, 2021

google-oss-robot assigned andreyvelich Aug 24, 2021

google-oss-robot added the lgtm label Aug 24, 2021

Fix tests

acc67db

google-oss-robot removed the lgtm label Aug 24, 2021

andreyvelich mentioned this pull request Aug 24, 2021

Rename request_number parameter in gRPC manager #1637

Closed

google-oss-robot added the lgtm label Aug 24, 2021

google-oss-robot merged commit fe5963f into kubeflow:master Aug 24, 2021

johnugeorge added a commit to johnugeorge/katib that referenced this pull request Aug 27, 2021

Reconcile semantics for Suggestion Algorithms (kubeflow#1633)

5528447

* Reuse suggestions * Fix tests

johnugeorge mentioned this pull request Aug 27, 2021

CherryPick: Reconcile semantics for Suggestion Algorithms (#1633) #1644

Merged

google-oss-robot pushed a commit that referenced this pull request Aug 27, 2021

CherryPick: Reconcile semantics for Suggestion Algorithms (#1633) (#1644)

698a9c6

* Reuse suggestions * Fix tests

This was referenced Feb 15, 2022

Trials with same hyperparameters - Random search #1566

Closed

Multiple trials spawned with the same parameters when using RANDOM search #842

Closed
