TEP 0097: Breakpoints for TaskRun and PipelineRuns #572

waveywaves · 2021-12-09T22:08:18Z

This proposal builds on top of TEP-0042. It outlines the implementation details for attaching breakpoints against steps in TaskRuns and breakpoints against Tasks in PipelineRuns.

todo:

Add details on PipelineRun timeout
Implications for Dependent TaskRuns
Expound on the usage of /tekton/run

waveywaves · 2021-12-14T21:19:45Z

/assign vdemeester

jerop · 2021-12-16T22:51:49Z

/assign

jerop

Thanks for the detailed proposal @waveywaves!

Data Structure

As discussed with @bobcatfish, our main concern is that the API change is encoded into Strings. Prefixing the name or index of a Step with before or after specializes some strings and restricts their use. Instead, we can use the appropriate data structure that supports the type needed here. I see that in TEP-0042: TaskRun Breakpoint on Failure of Step, there was an alternative that has the data structure to support this (we can explore other alternatives too):

debug:
  breakpoint: 
    onFailure: true
    beforeStep: []
    afterStep: []

Names vs Indexes

Another concern that came up in an API WG discussion was the use of Step indexes which may change if a Task is updated without bumping its version (bumping policy to be addressed in tektoncd/catalog#784). In the meantime, can we please scope down this TEP to use the name of the Steps only (not the indexes)? Given that there are some Steps that are unnamed, we can make it explicit that this functionality is only supported in named Steps -- and we can explore supporting unnamed Steps in future work (preferably after we establish the version bumping policy).

waveywaves · 2021-12-31T08:06:18Z

@jerop Thank you for the review.

Prefixing the name or index of a Step with before or after specializes some strings and restricts their use. Instead, we can use the appropriate data structure that supports the type needed here. I see that in TEP-0042: TaskRun Breakpoint on Failure of Step, there was an alternative that has the data structure to support this (we can explore other alternatives too):
debug:
 breakpoint: 
   onFailure: true
   beforeStep: []
   afterStep: []

I agree. Prefixing stepNames with "before" or "after" for breakpointing would disallow the user from using certain strings as step/task name eg: before-build etc. +1 for using the proposed alternative of the breakpoint datastructure.

Another concern that came up in an API WG discussion was the use of Step indexes which may change if a Task is updated without bumping its version (bumping policy to be addressed in tektoncd/catalog#784). In the meantime, can we please scope down this TEP to use the name of the Steps only (not the indexes)? Given that there are some Steps that are unnamed, we can make it explicit that this functionality is only supported in named Steps -- and we can explore supporting unnamed Steps in future work (preferably after we establish the version bumping policy).

That makes sense. Using indexes does add some ambiguity in the mix of which step/task might be getting used (especially if the tasks are getting unversioned updates). +1 for scoping this proposal down to referencing step/task names using just their names initially.

waveywaves · 2022-01-16T13:13:14Z

@jerop I have updated the proposal let me know what you think

based on tektoncd/community#572 (review) --- the onFailure breakpoint is updated to be a bool to avoid considering "onFailure" as reserved string.

Today, we don't always bump versions of resources when updating them. The guideline has been to bump versions only when behavior changes, but it's hard to figure out when the behavior has changed (a change that could be trivial to one user could be meaningful to another). Not bumping resource versions when changing them causes issues where the resource definition becomes dependent on the time when it was applied by the user - which causes unexpected failures as described in tektoncd#784. This issue also came up as an issue where users cannot depend on the Step indices because they can change: tektoncd/community#572 (review). In TEP-0003, we already proposed that a policy for versioning of resources: https://github.com/tektoncd/community/blob/main/teps/0003-tekton-catalog-organization.md#versioning-resources In Catalog Working Group on 01/13/2022, we revisted that policy and: - agreed to follow the versioning policy - make it easier to bump resources (see tektoncd/plumbing#994) This change documents the versioning policy. Fxes tektoncd#784

Today, we don't always bump versions of resources when updating them. The guideline has been to bump versions only when behavior changes, but it's hard to figure out when the behavior has changed (a change that could be trivial to one user could be meaningful to another). Not bumping resource versions when changing them causes issues where the resource definition becomes dependent on the time when it was applied by the user - which causes unexpected failures as described in tektoncd#784. This issue also came up as an issue where users cannot depend on the Step indices because they can change: tektoncd/community#572 (review). In TEP-0003, we already proposed that a policy for versioning of resources: https://github.com/tektoncd/community/blob/main/teps/0003-tekton-catalog-organization.md#versioning-resources In Catalog Working Group on 01/13/2022, we revisted that policy and: - agreed to follow the versioning policy - make it easier to bump resources (see tektoncd/plumbing#994) This change documents the versioning policy in the contributions guide. Fixes tektoncd#784

Today, we don't always bump versions of resources when updating them. The guideline has been to bump versions only when behavior changes, but it's hard to figure out when the behavior has changed (a change that could be trivial to one user could be meaningful to another). Not bumping resource versions when changing them causes issues where the resource definition becomes dependent on the time when it was applied by the user - which causes unexpected failures as described in tektoncd#784. This issue also came up as an issue where users cannot depend on the Step indices because they can change: tektoncd/community#572 (review). In TEP-0003, we already proposed that a policy for versioning of resources: https://github.com/tektoncd/community/blob/main/teps/0003-tekton-catalog-organization.md#versioning-resources In Catalog Working Group on 01/13/2022, we revisited that policy and: - agreed to follow the versioning policy - make it easier to bump resources (see tektoncd/plumbing#994) This change documents the versioning policy in the contributions guide. Fixes tektoncd#784

afrittoli

Thanks for this, and sorry it took so long to review on my side 😓

I think the proposal looks solid, I have a few comments but they are more related to the format of the TEP than to the essence of the design.

From the TEP it looks like debug is entirely new functionality, while some part of it is already implemented, so maybe you could distinguish the two parts in the TEP so the scope is more clear.

I see no mention about the timeout implications, do you plan to add those in a follow-up PR?

/approve

afrittoli · 2022-02-03T14:14:54Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+    - [CLI integration](#cli-integration)
+        - [Environment access](#environment-access)
+- [Alternatives](#alternatives)
+-<!-- /toc -->


NIT: empty -

afrittoli · 2022-02-03T21:50:49Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+```yaml
+apiVersion: tekton.dev/v1beta1
+kind: PipelineRun
+metadata: 
+  generateName: quarante-deux-deux
+spec:
+  debug:
+    breakpoint:
+      onFailure: true
+      beforeStep: ["task-build-archive"]
+      afterStep: ["deploy-frontend"]
+```
+
+```yaml
+apiVersion: tekton.dev/v1beta1
+kind: TaskRun
+metadata:
+  generateName: quarante-deux
+spec:
+  debug:
+    breakpoint:
+      onFailure: true
+      beforeStep: ["test-js"]
+      afterStep: ["push-to-registry"]
+```


This looks more like the proposed API design than the goal :)
Maybe you could move this to a section further down.

You can use goals and non-goals to define the scope you have in mind for the TEP.

afrittoli · 2022-02-03T21:53:40Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+
+## Requirements
+
+* To enable breakpoint on failure of a step, it would be necessary to update the Entrypoint to support lifecycle


NIT: the steps required for implementation should be further down in the proposal.
Requirements could be something like:

being able to set a breakpoint on failure for a step

being able to set a breakpoint before a step

etc...

afrittoli · 2022-02-03T21:55:12Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps
+which the user plans on debugging. 
+
+_This functionality will only support named steps_


+1 - I think this is a fair assumption. We had a similar discussion about resources, and we want to document setting the name for steps as the recommended best practice.

afrittoli · 2022-02-03T21:56:35Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+      beforeStep: ["task-build-archive"]
+      afterStep: ["deploy-frontend"]


is this before|afterStep or before|afterTask ?

afrittoli · 2022-02-03T22:30:02Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+      beforeStep: ["test-js"]
+      afterStep: ["push-to-registry"]


Both before and after be specified, and they're both lists, correct?

maybe renaming them to beforeSteps and afterSteps could make it clear that they are lists?

afrittoli · 2022-02-03T22:38:19Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+`/tekton/debug/scripts/debug-continue` : Mark the step as completed with success by writing to `/tekton/run/<step-no.>/`. eg: User wants to mark
+failed step 0 as a success. Running this script would create `/tekton/run/0/out`.


Nice, so even in case of breakpoint on failure, the user can simulate success and continue?

afrittoli · 2022-02-03T22:41:33Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+`/tekton/debug/info/<n>` : Contains information about the step. Single EmptyDir shared between all step containers, but renamed 
+to reflect step number. eg: Step 0 will have `/tekton/debug/info/0`, Step 1 will have `/tekton/debug/info/1` etc.
+
+#### Debug Scripts


All the scripts below talk about a failed step 0 - are these scripts only meant to be used for the case of breakpoint on failure? Can they be used for other breakpoint types as well, with similar effects?

It would be possible to use them in other scenarios as well.

afrittoli · 2022-02-03T22:46:02Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+        afterTask: ["deploy-frontend"]
+```
+
+### Breakpoint before or after a TaskRun


Nice! :)

Since a pipeline may be running several tasks in parallel, in case of breakpoints, multiple ones could be hit in parallel, and the user will have the opportunity of interacting with all the tasks under debug.
Tasks that do not depend on any task being debugged will continue to run and be scheduled normally, correct?

Tasks that do not depend on any task being debugged will continue to run and be scheduled normally, correct?

Yes. Those tasks will continue running normally.

I have added your comment as a statement under Dependent and Independent TaskRuns in this proposal.

afrittoli · 2022-02-03T22:49:10Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+   for the TaskRun to stop executing (due to failure) and go into the limbo state which would be leveraged for debugging.
+   The CLI will open a shell to the step container which would be a reimplementation of `kubectl exec -it`.
+
+## Alternatives


It's nice to see alternatives but there's no pros and cons specified for them, so it's not clear why you picked the other option? I think the proposed option looks good, but it would be good to call out pros / cons for these, or if none is available mention it in a comment.

Makes sense, I'll add these.

Done ! ✔️

waveywaves · 2022-07-10T11:03:02Z

Finally updated this TEP after ages.
@jerop @vdemeester @afrittoli If you would be so kind would love to have your eyes on this one. I have added the timeout implications, changed breakpoint to breakpoints to not break previous API behavior.

jerop

@waveywaves thanks for the updates - the API is much clearer 🎉

left some comments to clarify that we're adding support for debugging pipelineruns - but overall looks good to me

jerop · 2022-07-11T14:37:21Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+Debugging TaskRuns can be tiresome. Re-running Tasks to figure out which part of a particular step is to blame would tax productivity.
+By enabling breakpoint on failure for a TaskRun we should be able to halt a TaskRun at the failing step and get access to the
+step environment to analyze cause of the failure. Unlike legacy systems, this allows the user to debug TaskRuns while the TaskRun 
+is still running, hence improving developer productivity and Pipeline debuggability.


@waveywaves this summary is focused on taskruns - which was addressed in https://github.com/tektoncd/community/blob/main/teps/0042-taskrun-breakpoint-on-failure.md - seems copy pasted from there?

we need to update it to reflect that this TEP proposes debugging for pipelineruns

jerop · 2022-07-11T14:37:48Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+This enhancement can allow a more interactive way of working with TaskRuns. CLI or IDE extensions could allow users to run a 
+Task with breakpoint-on-failure and while the logs are being streamed, the user can be dropped to a shell in the breakpoint
+failed step and allow the user to debug as they develop.


and pipelineruns?

jerop · 2022-07-11T14:38:37Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps
+which the user plans on debugging.


Suggested change

The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps

which the user plans on debugging.

The `breakpoint` spec would contain the `beforeSteps` and `afterSteps` specs which are arrays containing names of steps

which the user plans on debugging.

Suggested change

The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps

which the user plans on debugging.

The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps

which the user plans on debugging.

jerop · 2022-07-11T14:43:29Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+authors:
+  - "@waveywaves"
+creation-date: 2021-12-10
+last-updated: 2021-12-10


please update this date

jerop · 2022-07-11T14:44:07Z

teps/0097-breakpoints-for-taskruns-and-pipelineruns.md

+  - "@waveywaves"
+creation-date: 2021-12-10
+last-updated: 2021-12-10
+status: proposed 


you can mark the tep as implementable given that the design is included in this change

pritidesai · 2022-07-11T16:09:43Z

API WG - ready to merge this offline after the TEP is updated, we have enough approval!

waveywaves · 2022-07-12T05:19:05Z

@jerop I have added the updates based on your comments. The summary indeed was a copy 😝 I have added a more relevant one and updated the use cases to reflect changes introduced in this PR and pipelineruns. Changed before|afterSteps -> before|afterStep. Updated the date to today's and status to implementable.

jerop · 2022-07-12T12:20:48Z

@jerop I have added the updates based on your comments. The summary indeed was a copy 😝 I have added a more relevant one and updated the use cases to reflect changes introduced in this PR and pipelineruns. Changed before|afterSteps -> before|afterStep. Updated the date to today's and status to implementable.

@waveywaves thanks for making the updates - i intended the change to be before|afterStep -> before|afterSteps, the same way that we have before|afterTasks, because it's an array

waveywaves · 2022-07-12T12:32:41Z

@jerop just updated. I misread your comment earlier then.

jerop

@waveywaves thank you!

/lgtm

tekton-robot · 2022-07-12T12:33:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, jerop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [afrittoli,jerop]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

This proposal builds on top of TEP-0042. It outlines the implementation details for attaching breakpoints against steps in TaskRuns and breakpoints against Tasks in PipelineRuns.

jerop · 2022-07-12T12:49:36Z

/lgtm

wuhuizuo · 2023-11-01T03:55:46Z

Is it implemented? tektoncd/pipeline#5184 was closed by bot without any pull requests.

vdemeester · 2023-11-02T11:07:40Z

@wuhuizuo not yet probably.

tekton-robot requested review from bobcatfish and imjasonh December 9, 2021 22:08

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 9, 2021

waveywaves force-pushed the update/debug-pipelinerun branch 2 times, most recently from f57840e to 64f4873 Compare December 10, 2021 02:47

tekton-robot assigned vdemeester Dec 14, 2021

tekton-robot assigned jerop Dec 16, 2021

jerop reviewed Dec 20, 2021

View reviewed changes

dibyom added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Dec 20, 2021

tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 21, 2021

waveywaves force-pushed the update/debug-pipelinerun branch from 64f4873 to fd5bf13 Compare January 16, 2022 11:07

tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 16, 2022

waveywaves force-pushed the update/debug-pipelinerun branch from fd5bf13 to 5c74bd5 Compare January 16, 2022 12:56

waveywaves mentioned this pull request Jan 20, 2022

convert onFailure breakpoint to a bool tektoncd/pipeline#4500

Closed

5 tasks

msvticket mentioned this pull request Jan 27, 2022

Change to use vanilla tektoncd/pipeline jenkins-x/lighthouse#1413

Closed

jerop mentioned this pull request Jan 31, 2022

Versioning Resources tektoncd/catalog#918

Closed

3 tasks

tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 1, 2022

afrittoli reviewed Feb 3, 2022

View reviewed changes

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 3, 2022

tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2022

waveywaves force-pushed the update/debug-pipelinerun branch 2 times, most recently from 9c79355 to c961121 Compare July 10, 2022 11:01

waveywaves force-pushed the update/debug-pipelinerun branch 3 times, most recently from 6578c0b to 3a3bcf3 Compare July 10, 2022 11:47

jerop approved these changes Jul 11, 2022

View reviewed changes

jerop reviewed Jul 11, 2022

View reviewed changes

waveywaves force-pushed the update/debug-pipelinerun branch 3 times, most recently from 37ffbf7 to b0a115b Compare July 12, 2022 05:15

waveywaves force-pushed the update/debug-pipelinerun branch from b0a115b to 257a160 Compare July 12, 2022 12:30

jerop approved these changes Jul 12, 2022

View reviewed changes

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2022

TEP 0097: Breakpoints for TaskRun and PipelineRuns

031c292

This proposal builds on top of TEP-0042. It outlines the implementation details for attaching breakpoints against steps in TaskRuns and breakpoints against Tasks in PipelineRuns.

waveywaves force-pushed the update/debug-pipelinerun branch from 257a160 to 031c292 Compare July 12, 2022 12:34

tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2022

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 12, 2022

tekton-robot merged commit 6a0bd53 into tektoncd:main Jul 12, 2022

waveywaves mentioned this pull request Jul 21, 2022

TEP 0097: Add breakpoints, breakpoints.onFailure to debug and extend support to PipelineRun tektoncd/pipeline#5184

Open


		## Requirements

		* To enable breakpoint on failure of a step, it would be necessary to update the Entrypoint to support lifecycle

		beforeStep: ["task-build-archive"]
		afterStep: ["deploy-frontend"]

		`/tekton/debug/scripts/debug-continue` : Mark the step as completed with success by writing to `/tekton/run/<step-no.>/`. eg: User wants to mark
		failed step 0 as a success. Running this script would create `/tekton/run/0/out`.

		The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps
		which the user plans on debugging.

TEP 0097: Breakpoints for TaskRun and PipelineRuns #572

TEP 0097: Breakpoints for TaskRun and PipelineRuns #572

Conversation

waveywaves commented Dec 9, 2021 • edited Loading

waveywaves commented Dec 14, 2021

jerop commented Dec 16, 2021

jerop left a comment • edited Loading

Choose a reason for hiding this comment

Data Structure

Names vs Indexes

waveywaves commented Dec 31, 2021 • edited Loading

waveywaves commented Jan 16, 2022

afrittoli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

waveywaves Jul 10, 2022 • edited Loading

Choose a reason for hiding this comment

waveywaves Jul 10, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

waveywaves commented Jul 10, 2022

jerop left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jerop Jul 11, 2022 • edited Loading

Choose a reason for hiding this comment

pritidesai commented Jul 11, 2022

waveywaves commented Jul 12, 2022

jerop commented Jul 12, 2022

waveywaves commented Jul 12, 2022

jerop left a comment

Choose a reason for hiding this comment

tekton-robot commented Jul 12, 2022

jerop commented Jul 12, 2022

wuhuizuo commented Nov 1, 2023 • edited Loading

vdemeester commented Nov 2, 2023

waveywaves commented Dec 9, 2021 •

edited

Loading

jerop left a comment •

edited

Loading

waveywaves commented Dec 31, 2021 •

edited

Loading

waveywaves Jul 10, 2022 •

edited

Loading

waveywaves Jul 10, 2022 •

edited

Loading

jerop Jul 11, 2022 •

edited

Loading

wuhuizuo commented Nov 1, 2023 •

edited

Loading