-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP 0097: Breakpoints for TaskRun and PipelineRuns #572
TEP 0097: Breakpoints for TaskRun and PipelineRuns #572
Conversation
f57840e
to
64f4873
Compare
/assign vdemeester |
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed proposal @waveywaves!
Data Structure
As discussed with @bobcatfish, our main concern is that the API change is encoded into Strings. Prefixing the name or index of a Step
with before
or after
specializes some strings and restricts their use. Instead, we can use the appropriate data structure that supports the type needed here. I see that in TEP-0042: TaskRun Breakpoint on Failure of Step, there was an alternative that has the data structure to support this (we can explore other alternatives too):
debug:
breakpoint:
onFailure: true
beforeStep: []
afterStep: []
Names vs Indexes
Another concern that came up in an API WG discussion was the use of Step
indexes which may change if a Task
is updated without bumping its version (bumping policy to be addressed in tektoncd/catalog#784). In the meantime, can we please scope down this TEP to use the name of the Steps
only (not the indexes)? Given that there are some Steps
that are unnamed, we can make it explicit that this functionality is only supported in named Steps
-- and we can explore supporting unnamed Steps
in future work (preferably after we establish the version bumping policy).
@jerop Thank you for the review.
I agree. Prefixing stepNames with "before" or "after" for breakpointing would disallow the user from using certain strings as step/task name eg: before-build etc. +1 for using the proposed alternative of the breakpoint datastructure.
That makes sense. Using indexes does add some ambiguity in the mix of which step/task might be getting used (especially if the tasks are getting unversioned updates). +1 for scoping this proposal down to referencing step/task names using just their names initially. |
64f4873
to
fd5bf13
Compare
fd5bf13
to
5c74bd5
Compare
@jerop I have updated the proposal let me know what you think |
based on tektoncd/community#572 (review) --- the onFailure breakpoint is updated to be a bool to avoid considering "onFailure" as reserved string.
based on tektoncd/community#572 (review) --- the onFailure breakpoint is updated to be a bool to avoid considering "onFailure" as reserved string.
based on tektoncd/community#572 (review) --- the onFailure breakpoint is updated to be a bool to avoid considering "onFailure" as reserved string.
based on tektoncd/community#572 (review) --- the onFailure breakpoint is updated to be a bool to avoid considering "onFailure" as reserved string.
Today, we don't always bump versions of resources when updating them. The guideline has been to bump versions only when behavior changes, but it's hard to figure out when the behavior has changed (a change that could be trivial to one user could be meaningful to another). Not bumping resource versions when changing them causes issues where the resource definition becomes dependent on the time when it was applied by the user - which causes unexpected failures as described in tektoncd#784. This issue also came up as an issue where users cannot depend on the Step indices because they can change: tektoncd/community#572 (review). In TEP-0003, we already proposed that a policy for versioning of resources: https://github.com/tektoncd/community/blob/main/teps/0003-tekton-catalog-organization.md#versioning-resources In Catalog Working Group on 01/13/2022, we revisted that policy and: - agreed to follow the versioning policy - make it easier to bump resources (see tektoncd/plumbing#994) This change documents the versioning policy. Fxes tektoncd#784
Today, we don't always bump versions of resources when updating them. The guideline has been to bump versions only when behavior changes, but it's hard to figure out when the behavior has changed (a change that could be trivial to one user could be meaningful to another). Not bumping resource versions when changing them causes issues where the resource definition becomes dependent on the time when it was applied by the user - which causes unexpected failures as described in tektoncd#784. This issue also came up as an issue where users cannot depend on the Step indices because they can change: tektoncd/community#572 (review). In TEP-0003, we already proposed that a policy for versioning of resources: https://github.com/tektoncd/community/blob/main/teps/0003-tekton-catalog-organization.md#versioning-resources In Catalog Working Group on 01/13/2022, we revisted that policy and: - agreed to follow the versioning policy - make it easier to bump resources (see tektoncd/plumbing#994) This change documents the versioning policy in the contributions guide. Fixes tektoncd#784
Today, we don't always bump versions of resources when updating them. The guideline has been to bump versions only when behavior changes, but it's hard to figure out when the behavior has changed (a change that could be trivial to one user could be meaningful to another). Not bumping resource versions when changing them causes issues where the resource definition becomes dependent on the time when it was applied by the user - which causes unexpected failures as described in tektoncd#784. This issue also came up as an issue where users cannot depend on the Step indices because they can change: tektoncd/community#572 (review). In TEP-0003, we already proposed that a policy for versioning of resources: https://github.com/tektoncd/community/blob/main/teps/0003-tekton-catalog-organization.md#versioning-resources In Catalog Working Group on 01/13/2022, we revisited that policy and: - agreed to follow the versioning policy - make it easier to bump resources (see tektoncd/plumbing#994) This change documents the versioning policy in the contributions guide. Fixes tektoncd#784
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, and sorry it took so long to review on my side 😓
I think the proposal looks solid, I have a few comments but they are more related to the format of the TEP than to the essence of the design.
From the TEP it looks like debug is entirely new functionality, while some part of it is already implemented, so maybe you could distinguish the two parts in the TEP so the scope is more clear.
I see no mention about the timeout implications, do you plan to add those in a follow-up PR?
/approve
- [CLI integration](#cli-integration) | ||
- [Environment access](#environment-access) | ||
- [Alternatives](#alternatives) | ||
-<!-- /toc --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: empty -
```yaml | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: PipelineRun | ||
metadata: | ||
generateName: quarante-deux-deux | ||
spec: | ||
debug: | ||
breakpoint: | ||
onFailure: true | ||
beforeStep: ["task-build-archive"] | ||
afterStep: ["deploy-frontend"] | ||
``` | ||
|
||
```yaml | ||
apiVersion: tekton.dev/v1beta1 | ||
kind: TaskRun | ||
metadata: | ||
generateName: quarante-deux | ||
spec: | ||
debug: | ||
breakpoint: | ||
onFailure: true | ||
beforeStep: ["test-js"] | ||
afterStep: ["push-to-registry"] | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks more like the proposed API design than the goal :)
Maybe you could move this to a section further down.
You can use goals and non-goals to define the scope you have in mind for the TEP.
|
||
## Requirements | ||
|
||
* To enable breakpoint on failure of a step, it would be necessary to update the Entrypoint to support lifecycle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: the steps required for implementation should be further down in the proposal.
Requirements could be something like:
- being able to set a breakpoint on failure for a step
- being able to set a breakpoint before a step
etc...
The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps | ||
which the user plans on debugging. | ||
|
||
_This functionality will only support named steps_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 - I think this is a fair assumption. We had a similar discussion about resources, and we want to document setting the name for steps as the recommended best practice.
beforeStep: ["task-build-archive"] | ||
afterStep: ["deploy-frontend"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this before|afterStep
or before|afterTask
?
beforeStep: ["test-js"] | ||
afterStep: ["push-to-registry"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both before
and after
be specified, and they're both lists, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe renaming them to beforeSteps
and afterSteps
could make it clear that they are lists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done !
`/tekton/debug/scripts/debug-continue` : Mark the step as completed with success by writing to `/tekton/run/<step-no.>/`. eg: User wants to mark | ||
failed step 0 as a success. Running this script would create `/tekton/run/0/out`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, so even in case of breakpoint on failure, the user can simulate success and continue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep !
`/tekton/debug/info/<n>` : Contains information about the step. Single EmptyDir shared between all step containers, but renamed | ||
to reflect step number. eg: Step 0 will have `/tekton/debug/info/0`, Step 1 will have `/tekton/debug/info/1` etc. | ||
|
||
#### Debug Scripts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the scripts below talk about a failed step 0
- are these scripts only meant to be used for the case of breakpoint on failure? Can they be used for other breakpoint types as well, with similar effects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be possible to use them in other scenarios as well.
afterTask: ["deploy-frontend"] | ||
``` | ||
|
||
### Breakpoint before or after a TaskRun |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! :)
Since a pipeline may be running several tasks in parallel, in case of breakpoints, multiple ones could be hit in parallel, and the user will have the opportunity of interacting with all the tasks under debug.
Tasks that do not depend on any task being debugged will continue to run and be scheduled normally, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tasks that do not depend on any task being debugged will continue to run and be scheduled normally, correct?
Yes. Those tasks will continue running normally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added your comment as a statement under Dependent and Independent TaskRuns
in this proposal.
for the TaskRun to stop executing (due to failure) and go into the limbo state which would be leveraged for debugging. | ||
The CLI will open a shell to the step container which would be a reimplementation of `kubectl exec -it`. | ||
|
||
## Alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to see alternatives but there's no pros and cons specified for them, so it's not clear why you picked the other option? I think the proposed option looks good, but it would be good to call out pros / cons for these, or if none is available mention it in a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll add these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ! ✔️
9c79355
to
c961121
Compare
Finally updated this TEP after ages. |
6578c0b
to
3a3bcf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@waveywaves thanks for the updates - the API is much clearer 🎉
left some comments to clarify that we're adding support for debugging pipelineruns - but overall looks good to me
Debugging TaskRuns can be tiresome. Re-running Tasks to figure out which part of a particular step is to blame would tax productivity. | ||
By enabling breakpoint on failure for a TaskRun we should be able to halt a TaskRun at the failing step and get access to the | ||
step environment to analyze cause of the failure. Unlike legacy systems, this allows the user to debug TaskRuns while the TaskRun | ||
is still running, hence improving developer productivity and Pipeline debuggability. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@waveywaves this summary is focused on taskruns - which was addressed in https://github.com/tektoncd/community/blob/main/teps/0042-taskrun-breakpoint-on-failure.md - seems copy pasted from there?
we need to update it to reflect that this TEP proposes debugging for pipelineruns
This enhancement can allow a more interactive way of working with TaskRuns. CLI or IDE extensions could allow users to run a | ||
Task with breakpoint-on-failure and while the logs are being streamed, the user can be dropped to a shell in the breakpoint | ||
failed step and allow the user to debug as they develop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and pipelineruns?
The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps | ||
which the user plans on debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps | |
which the user plans on debugging. | |
The `breakpoint` spec would contain the `beforeSteps` and `afterSteps` specs which are arrays containing names of steps | |
which the user plans on debugging. |
The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps | |
which the user plans on debugging. | |
The `breakpoint` spec would contain the `beforeStep` and `afterStep` specs which are arrays containing names of steps | |
which the user plans on debugging. |
authors: | ||
- "@waveywaves" | ||
creation-date: 2021-12-10 | ||
last-updated: 2021-12-10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update this date
- "@waveywaves" | ||
creation-date: 2021-12-10 | ||
last-updated: 2021-12-10 | ||
status: proposed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can mark the tep as implementable given that the design is included in this change
API WG - ready to merge this offline after the TEP is updated, we have enough approval! |
37ffbf7
to
b0a115b
Compare
@jerop I have added the updates based on your comments. The summary indeed was a copy 😝 I have added a more relevant one and updated the use cases to reflect changes introduced in this PR and pipelineruns. Changed |
@waveywaves thanks for making the updates - i intended the change to be |
b0a115b
to
257a160
Compare
@jerop just updated. I misread your comment earlier then. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@waveywaves thank you!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: afrittoli, jerop The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This proposal builds on top of TEP-0042. It outlines the implementation details for attaching breakpoints against steps in TaskRuns and breakpoints against Tasks in PipelineRuns.
257a160
to
031c292
Compare
/lgtm |
Is it implemented? tektoncd/pipeline#5184 was closed by bot without any pull requests. |
@wuhuizuo not yet probably. |
This proposal builds on top of TEP-0042. It outlines the implementation details for attaching breakpoints against steps in TaskRuns and breakpoints against Tasks in PipelineRuns.
todo: