This repository has been archived by the owner on Sep 5, 2019. It is now read-only.

Add possibility for a client to cancel a build… #510

Merged
merged 1 commit into from
Jan 22, 2019

Conversation

vdemeester
Member

… by updating the status of a build.

It will mark the build as failed and clean any pods related to it.

Related to tektoncd/pipeline#272
I tried that on knative/build first (as it's simpler 🙃)

cc @abayer @imjasonh

Signed-off-by: Vincent Demeester [email protected]

Member

@imjasonh imjasonh left a comment


This looks great, thanks for adding this. Two questions:

1. Is it common practice for a client to be able to update a k8s resource's status this way? I'm thinking of something like Job, which runs to completion, but which you basically just delete if you want to cancel it. In this case, we want to stop the build's execution without also deleting any record of that resource existing.

2. Once this is done we should also make the same change for TaskRun, and then PipelineRun, which should cancel (but not delete) its created TaskRuns. We might even want to go straight to TaskRun and drop this PR. I'll leave that decision up to you.

@vdemeester
Member Author

@imjasonh

1. Is it common practice for a client to be able to update a k8s resource's status this way? I'm thinking of something like Job, which runs to completion, but which you basically just delete if you want to cancel it. In this case, we want to stop the build's execution without also deleting any record of that resource existing.

I really am not sure… I'm trying to look around for that 😅

2. Once this is done we should also make the same change for TaskRun, and then PipelineRun, which should cancel (but not delete) its created TaskRuns. We might even want to go straight to TaskRun and drop this PR. I'll leave that decision up to you.

Yes, I did it here as it was simpler/quicker, but I intend to follow up on build-pipeline. I'm also fine doing it directly on build-pipeline; I just thought that if we want it in build-pipeline, we may want it here too 👼

@vdemeester vdemeester force-pushed the cancel-build branch 2 times, most recently from 641b3c4 to d8a3947 on January 7, 2019 15:32
@shashwathi
Contributor

@vdemeester Great PR

If the build is completed, then what is the point of updating the status to Cancelled?
The build is completed, so all associated resources are deleted.
If the build is running, then the cluster name would be set in the build status.
So I don't understand why there is an API call to get the pod resource; why not just call the delete API directly?

@vdemeester
Member Author

@vdemeester Great PR

If the build is completed, then what is the point of updating the status to Cancelled?
The build is completed, so all associated resources are deleted.

So, if you want to cancel the build (currently via the spec), then you want to mark it as Cancelled as soon as possible and stop the execution, thus deleting the pod right away 👼, the same way we do on build timeout 🙃 (which makes me think there could be a small refactor to share code there).

If the build is running, then the cluster name would be set in the build status.
So I don't understand why there is an API call to get the pod resource; why not just call the delete API directly?

Indeed, there is no need.

@abayer

abayer commented Jan 7, 2019

At Kubecon, @bobcatfish told me (in the context of tektoncd/pipeline#355) that I shouldn't be making changes to the status of one CRD while reconciling another one, so I jumped through some hoops to handle having the PipelineRun's timeout apply to its TaskRuns. Not sure if that is a general thing where nothing should be modifying a CRD's status but its reconciler or if it's just that one CRD's reconciler shouldn't mess with another CRD's status.

@shashwathi
Contributor

shashwathi commented Jan 7, 2019

So, if you want to cancel the build (currently using the spec), then you want to mark it as Cancelled as soon as possible and stop the execution, thus deleting the pod real quick 👼

I agree with this completely. I do not think you understood my question, so let me rephrase:
If the build is completed (the pods are not present and only the build object exists), then what is the point of marking the build as "Cancelled"?

I am thinking of this like a cycle.
build starts -> pod creation -> pod finished and deleted -> build completed.

If the user marks the build as "Cancelled" after completion, I do not see how that has any effect on the underlying resource (the pod). I would expect the status not to change at all in this case. If the build has finished, then the status should not be changed (IMO).

same way we do on build timeout 🙃. (making me think there could be a small refactor to share code there).

The controller checks whether the build is still running and then cancels it if the timeout has passed. It doesn't cancel a build that has already completed.

@vdemeester
Member Author

I agree with this completely. I do not think you understood my question, so let me rephrase:
If the build is completed (the pods are not present and only the build object exists), then what is the point of marking the build as "Cancelled"?

I am thinking of this like a cycle.
build starts -> pod creation -> pod finished and deleted -> build completed.

If the user marks the build as "Cancelled" after completion, I do not see how that has any effect on the underlying resource (the pod). I would expect the status not to change at all in this case. If the build has finished, then the status should not be changed (IMO).

Indeed, there is no point, and if a user marks the build as Cancelled after completion, nothing should happen. That should already be the case (https://github.com/knative/build/pull/510/files#diff-f2d9d6d36e3340295720d4f3e028cb14R152): if the build is completed (successfully or not), it will never go into the "cancel" block 👼

The controller checks whether the build is still running and then cancels it if the timeout has passed. It doesn't cancel a build that has already completed.

Yes, this is what happens.

Contributor

@shashwathi shashwathi left a comment


@vdemeester my apologies for not understanding the flow correctly :) Thanks for pinging and explaining it. I appreciate it
I have reviewed again. 👍 Please take a look at my comments


// Wait for a little while
// FIXME(vdemeester) I would prefer something less flaky-prone
time.Sleep(20 * time.Second)
Contributor


Consider adding a polling function that checks that the build status is "Running", for 10s (or more?). That would be better than a fixed wait, IMO. WDYT @vdemeester?

Member Author


@shashwathi yeah, that's why the FIXME 👼 not sure why I didn't use the polling functions here 😅 will update 😉

}

if _, err := clients.buildClient.watchBuild(buildName); err == nil {
t.Fatalf("watchBuild did not return expected `cancelled` error")
Contributor


The watchBuild function returns the build object, so you could consider verifying here that the build status is updated with the expected reason ("BuildCancelled").

t.Errorf("error syncing build: %v", err)
}

// Check that the build has the expected timeout status.
Contributor


typo: I think you meant cancelled status here

logger.Warnf("build %q has no pod running yet", build.Name)
return nil
}
p, err := c.kubeclientset.CoreV1().Pods(build.Namespace).Get(build.Status.Cluster.PodName, metav1.GetOptions{})
Contributor


I think I already mentioned this in a previous comment (?) about reusing build.Status.Cluster.PodName for the Delete API directly. Sorry to repeat it.

Member Author


oh right, didn't update that 😓

@vdemeester vdemeester force-pushed the cancel-build branch 3 times, most recently from a2c94b6 to 5d8630b on January 8, 2019 19:02
Contributor

@shashwathi shashwathi left a comment


/lgtm

Thank you for addressing my comments.

@shashwathi
Contributor

/hold

@shashwathi
Contributor

I noticed @imjasonh had some comments earlier in the thread, so I am adding a hold for him to add his review.

@imjasonh
Member

/lgtm
/approve
/hold cancel

Oh my goodness I didn't realize this was blocking on me, so so sorry to keep this held up so long. Thanks for this change! ❤️

@vdemeester
Member Author

ah.. I may need to rebase and fix 😅

Contributor

@shashwathi shashwathi left a comment


/approve
/lgtm

@knative-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ImJasonH, shashwathi, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ImJasonH,shashwathi]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

… by updating the status of a build.

It will mark the build as failed and clean any pods related to it.

Signed-off-by: Vincent Demeester <[email protected]>
@knative-metrics-robot

The following is the coverage report on pkg/.
Say /test pull-knative-build-go-coverage to re-run this coverage report

File | Old Coverage | New Coverage | Delta
pkg/reconciler/build/build.go | 76.8% | 75.9% | -0.9

@vdemeester
Member Author

Hum… the coverage build is a bit of a pain 😅: since the threshold is higher than the current coverage on that file, any change triggers a failure… 😓

@knative-prow-robot

knative-prow-robot commented Jan 22, 2019

@vdemeester: The following test failed, say /retest to rerun them all:

Test name | Commit | Details | Rerun command
pull-knative-build-go-coverage | 0e37001 | link | /test pull-knative-build-go-coverage

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vdemeester
Member Author

/test pull-knative-build-integration-tests

@bobcatfish
Contributor

Hum… the coverage build is a bit of a pain 😅: since the threshold is higher than the current coverage on that file, any change triggers a failure… 😓

I agree @vdemeester - feel free to open an issue and we could work on making this better - tbh I think we should drop the threshold for the reconcilers themselves, and/or add end-to-end test coverage measurement to the equation

/lgtm
/meow space

@knative-prow-robot

@bobcatfish: [cat image]

In response to this:

Hum… the coverage build is a bit of a pain 😅: since the threshold is higher than the current coverage on that file, any change triggers a failure… 😓

I agree @vdemeester - feel free to open an issue and we could work on making this better - tbh I think we should drop the threshold for the reconcilers themselves, and/or add end-to-end test coverage measurement to the equation

/lgtm
/meow space

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot merged commit addfaf4 into knative:master Jan 22, 2019
@vdemeester vdemeester deleted the cancel-build branch January 22, 2019 19:13
vdemeester added a commit to vdemeester/knative-build that referenced this pull request Apr 3, 2019
… by updating the status of a build.

It will mark the build as failed and clean any pods related to it.

Signed-off-by: Vincent Demeester <[email protected]>