Add granular termination reason in container termination message #7390

Closed
renzodavid9 wants to merge 1 commit

Conversation

renzodavid9
Contributor

@renzodavid9 renzodavid9 commented Nov 17, 2023

Fixes #7223.

To report specific Step termination reasons, we need to know why each Step's container finished; we use the termination message to store a new "state" entry with this information. We evaluated changing the container reason directly, but it looks like k8s doesn't allow this.

/kind feature

Changes

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Has Docs if any changes are user facing, including updates to minimum requirements e.g. Kubernetes version bumps
  • Has Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add one by adding a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings). See some examples of good release notes.
  • Release notes contains the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Steps in a TaskRun will have more granular termination reasons indicating what exactly happened: Completed, Continued, Error, TimeoutExceeded, Skipped, TaskRunCancelled

@tekton-robot tekton-robot added release-note-none Denotes a PR that doesnt merit a release note. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Nov 17, 2023

linux-foundation-easycla bot commented Nov 17, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: renzodavid9 / name: Renzo Rojas (2b5e3f3)
  • ✅ login: chitrangpatel / name: Chitrang Patel (2b5e3f3)

@tekton-robot
Collaborator

Hi @renzodavid9. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 17, 2023
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.6% 86.5% -1.2

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 23, 2023
@tekton-robot tekton-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 27, 2023
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/entrypoint/entrypointer.go 87.0% 85.4% -1.6
pkg/pod/status.go 92.9% 95.1% 2.2

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 28, 2023
@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 4, 2023
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/waiter.go 81.0% 85.0% 4.0
pkg/entrypoint/entrypointer.go 87.0% 86.5% -0.5
pkg/pod/status.go 92.9% 95.1% 2.2

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesnt merit a release note. labels Dec 4, 2023

@JeromeJu JeromeJu self-assigned this Dec 4, 2023
@JeromeJu
Member

JeromeJu commented Dec 4, 2023

/ok-to-test

@tekton-robot tekton-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Dec 4, 2023
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/pod/status.go 92.9% 95.1% 2.2

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/waiter.go 81.0% 85.0% 4.0
pkg/entrypoint/entrypointer.go 87.0% 86.5% -0.5
pkg/pod/status.go 92.9% 95.1% 2.2

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 6, 2023
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 6, 2023
@JeromeJu
Member

JeromeJu commented Dec 6, 2023

cc @chitrangpatel for the context of #7223

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/waiter.go 81.0% 85.0% 4.0
pkg/entrypoint/entrypointer.go 87.0% 86.5% -0.5
pkg/pod/status.go 93.0% 95.2% 2.2

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage-df to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/waiter.go 81.0% 85.0% 4.0
pkg/entrypoint/entrypointer.go 87.0% 86.5% -0.5
pkg/pod/status.go 93.0% 95.2% 2.2

@chitrangpatel
Contributor

chitrangpatel commented Dec 11, 2023

/hold Would changing the Reason to a different string be something that is considered a breaking change according to our API compatibility policy?

#7223 (comment) shows the changes to the Reason value that have been implemented here.

Looking for feedback before we merge this. Wondering how this might affect users of this field (maybe @tektoncd/dashboard-maintainers )

cc @tektoncd/core-maintainers

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2023
@chitrangpatel
Contributor

/approve

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chitrangpatel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2023
@JeromeJu
Member

JeromeJu commented Dec 11, 2023

> /hold Would changing the Reason to a different string be something that is considered a breaking change according to our API compatibility policy?

From my understanding, this change should not be categorized as a breaking change. In context, it seems more like a "bug" fix than something that breaks our users, given that we were previously masking the actual termination reasons.

Related to tektoncd#7223.

To report specific Step termination reasons, we need to know why each Step's container finished; we use the termination message to store a new "state" entry with this information. We evaluated changing the container `reason` directly, but it looks like k8s doesn't allow this.

Co-authored-by: JeromeJu <[email protected]>
Co-authored-by: Chitrang Patel <[email protected]>

@AlanGreene
Member

AlanGreene commented Dec 11, 2023

> Looking for feedback before we merge this. Wondering how this might affect users of this field (maybe @tektoncd/dashboard-maintainers )

This will affect the Dashboard, and likely any other client, which relies on the current reason values to correctly determine if a step was completed successfully. For example, changing the reason from Completed to Continued where a step uses onError: continue to ignore errors would no longer be considered a success by the Dashboard and would display an unknown or pending status.

I'm not sure I agree with the original categorisation of #7223 as a bug, it seems more like an enhancement to differentiate between these cases with more granular reason values.
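To make the concern concrete, here is a minimal sketch of how a client might map step reasons to a display status. It is not the Dashboard's actual code; `displayStatus`, `classify`, and the status names are hypothetical, and the only assumption about real values is that clients today match on strings such as Completed and Error.

```go
package main

// displayStatus is a hypothetical UI state; the names are illustrative only.
type displayStatus string

const (
	statusSuccess displayStatus = "success"
	statusFailed  displayStatus = "failed"
	statusUnknown displayStatus = "unknown"
)

// classify mimics a client that only recognises the reason values it was
// written against. Any value it has never seen falls through to unknown.
func classify(reason string) displayStatus {
	switch reason {
	case "Completed":
		return statusSuccess
	case "Error":
		return statusFailed
	default:
		return statusUnknown
	}
}
```

With a mapping like this, a step that previously reported Completed and now reports Continued would move from the success branch to the default branch without any change on the client side, which is the kind of regression described above.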

@vdemeester
Member

> > Looking for feedback before we merge this. Wondering how this might affect users of this field (maybe @tektoncd/dashboard-maintainers )

> This will affect the Dashboard, and likely any other client, which relies on the current reason values to correctly determine if a step was completed successfully. For example, changing the reason from Completed to Continued where a step uses onError: continue to ignore errors would no longer be considered a success by the Dashboard and would display an unknown or pending status.

> I'm not sure I agree with the original categorisation of #7223 as a bug, it seems more like an enhancement to differentiate between these cases with more granular reason values.

I agree with @AlanGreene there. It's gonna affect any clients, so we have to take any change there very carefully, and ideally, in a non breaking way.

@chitrangpatel
Contributor

> > > Looking for feedback before we merge this. Wondering how this might affect users of this field (maybe @tektoncd/dashboard-maintainers )

> > This will affect the Dashboard, and likely any other client, which relies on the current reason values to correctly determine if a step was completed successfully. For example, changing the reason from Completed to Continued where a step uses onError: continue to ignore errors would no longer be considered a success by the Dashboard and would display an unknown or pending status.
> > I'm not sure I agree with the original categorisation of #7223 as a bug, it seems more like an enhancement to differentiate between these cases with more granular reason values.

> I agree with @AlanGreene there. It's gonna affect any clients, so we have to take any change there very carefully, and ideally, in a non breaking way.

Sounds good! I'm glad that I put a hold on it 😅. Let's discuss this more at the API WG call on Monday (I believe there is an agenda item for this already).

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Surfacing of actual Termination Reason in Step Status
6 participants