tep to ignore step error
Proposing a TEP to ignore step error and provide an option to
continue after capturing the non-zero exit code. Also document the
container termination state to access it after the pipeline execution finishes.
pritidesai committed Feb 2, 2021
1 parent 97f1064 commit 2729f73
Showing 2 changed files with 177 additions and 0 deletions.
176 changes: 176 additions & 0 deletions teps/0040-ignore-step-error.md
@@ -0,0 +1,176 @@
---
status: proposed
title: 'Ignore Step Error'
creation-date: '2021-01-06'
last-updated: '2021-02-02'
authors:
- '@pritidesai'
- '@afrittoli'
---

# TEP-0040: Ignore Step Error

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Requirements](#requirements)
- [Use Cases](#use-cases)
- [References](#references)
<!-- /toc -->

## Summary

Tekton tasks are defined as a collection of steps, each of which specifies a container image to run.
Steps are executed in the order in which they are specified. A single step failure results in a task failure,
i.e. once a step fails, the rest of the steps are not executed. When a container exits with a
non-zero exit code, the step results in an error:

```shell
$ kubectl get tr failing-taskrun-hw5xj -o json | jq .status.steps
[
  {
    "container": "step-failing-step",
    "imageID": "...",
    "name": "failing-step",
    "terminated": {
      "containerID": "...",
      "exitCode": 244,
      "finishedAt": "2021-02-02T18:27:46Z",
      "reason": "Error",
      "startedAt": "2021-02-02T18:27:46Z"
    }
  }
]
```

A `TaskRun` with such a step error stops executing subsequent steps and results in a failure:

```shell
$ kubectl get tr failing-taskrun-hw5xj -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-02-02T18:27:47Z",
    "message": "\"step-failing-step\" exited with code 244 (image: \"...\"); for logs run: kubectl -n default logs failing-taskrun-hw5xj-pod-wj6vn -c step-failing-step\n",
    "reason": "Failed",
    "status": "False",
    "type": "Succeeded"
  }
]
```

If such a task with a failing step is part of a pipeline, the `PipelineRun` stops executing subsequent steps in that task
(similar to the `TaskRun`), stops executing any other task in the pipeline, and results in a pipeline failure:

```shell
$ kubectl get pr pipelinerun-with-failing-step-csmjr -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2021-02-02T18:51:15Z",
    "message": "Tasks Completed: 1 (Failed: 1, Cancelled 0), Skipped: 3",
    "reason": "Failed",
    "status": "False",
    "type": "Succeeded"
  }
]
```

Many common tasks require that a step failure not stop the execution of the rest of the steps.
To continue executing subsequent steps, task authors can wrap the image's command in a script and
exit that step with success. This turns the failing step into a successful one and does not block further
execution. But this is a workaround, and it only works with images that can be wrapped:

```yaml
steps:
  - image: docker.io/library/golang:latest
    name: ignore-unit-test-failure
    script: |
      go test .
      TEST_EXIT_CODE=$?
      if [ $TEST_EXIT_CODE != 0 ]; then
        exit 0
      fi
```

This workaround does not apply to off-the-shelf container images.

Similarly, many pipelines need to continue executing the rest of their tasks by preventing a step failure
from failing the task that contains it.

As a pipeline execution engine, we want to support off-the-shelf container images as steps and provide
an option to ignore such step errors. The task author can choose to continue execution, capture the original non-zero
exit code, and make it available to the rest of the steps in that task. We also want to give pipeline authors an option
to continue executing the rest of the tasks by ignoring a step failure, and to allow access to the original non-zero
exit code of that step from the rest of the tasks.
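
The intended behaviour (continue, but keep the original exit code) can be approximated with a small wrapper. This is a minimal sketch and not the proposed design: the function name `run_and_record` and the file path are hypothetical, and it assumes the image ships a POSIX shell, which is exactly the limitation this proposal wants to remove.

```shell
#!/bin/sh
# Hypothetical helper: run a command, record its exit code in a file,
# and report success so that whatever comes next still runs.
run_and_record() {
  exit_code_file="$1"
  shift
  rc=0
  # "|| rc=$?" captures a failure without aborting the script,
  # even when the script runs under "set -e".
  "$@" || rc=$?
  printf '%s\n' "$rc" > "$exit_code_file"
  return 0
}

# Example: a command that fails with exit code 244, like the step shown earlier.
run_and_record /tmp/failing-step.exitcode sh -c 'exit 244'

# Later code can still read the original exit code.
echo "original exit code: $(cat /tmp/failing-step.exitcode)"
```

Writing the exit code to a file is only one possible way to hand it to later steps; where such a value should live is left open here.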

Issue: [tektoncd/pipeline#2800](https://github.com/tektoncd/pipeline/issues/2800)


## Motivation

It should be possible to easily use off-the-shelf (OTS) images as steps in Tekton tasks. A task author has no
control over the image but may want to ignore an error and continue executing the rest of the steps.

A second motivation for this proposal is to expose step-level failures at the pipeline level to support tasks from
the catalog. Allowing step-level failure handling to be configured at pipeline authoring time lets
the pipeline author use catalog tasks even though the author has no control over them.

**Note:** The two motivations might bring separate API changes (the former at the task level, and the latter at the pipeline level)
but the changes must be compatible with each other.

### Goals

Design a step failure strategy so that the task author can control the behaviour of an image and decide to
continue executing the rest of the steps in the task.

Prevent a task from failing when a step fails.

Store the container termination state or error state and make it accessible to the rest of the steps in a task
and after the task finishes execution.

This proposal must be applicable to any container image including custom images and off-the-shelf images.

### Non-Goals

This design is limited to a step within a task and does not apply to pipeline tasks.

## Requirements

* Users should be able to use prebuilt images as-is without having to do one or more of the following
  (see also [TEP-0011](https://github.com/tektoncd/community/blob/master/teps/0011-redirecting-step-output-streams.md)):
  * Investigate how they are built to understand whether they contain a shell, and possibly override the entrypoint
  * Build and maintain their own images (i.e. add in the required shell or other binaries) from those images

* It should be possible to know that a step was allowed to fail by observing the status of the `TaskRun`
(and `PipelineRun` if applicable) (e.g. to show a "warning" / display as "yellow" status in a UI)

* When a step is allowed to fail, the exit code of the process that failed should not be lost and should at a minimum be
available in the status of the `TaskRun` (and `PipelineRun` if applicable).
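
As an illustration of the last two requirements, a UI could derive a display status from a step's exit code together with a marker saying the step was allowed to fail. This is a sketch under stated assumptions: the function `step_display_status`, the `ignored` flag, and the status names are all hypothetical and are not defined by this proposal.

```shell
#!/bin/sh
# Hypothetical mapping from (exit code, "allowed to fail" flag) to a display status.
# The original exit code is kept in the output so that it is never lost.
step_display_status() {
  exit_code="$1"
  ignored="$2"   # "true" if the step was allowed to fail (assumed flag)
  if [ "$exit_code" -eq 0 ]; then
    echo "succeeded"
  elif [ "$ignored" = "true" ]; then
    echo "warning (exit code $exit_code)"
  else
    echo "failed (exit code $exit_code)"
  fi
}

step_display_status 0 false
step_display_status 244 true
step_display_status 244 false
```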


### Use Cases

* As a task author, I would like to design a task with multiple steps. One of the steps runs an
  enterprise image to execute unit tests, and the next step needs to report the test results even when the previous
  step fails because of test failures.

* Allow migrating scripts and automations from other CI/CD systems that allowed image failures.

* A [platform team](https://github.com/tektoncd/community/blob/master/user-profiles.md#1-pipeline-and-task-authors)
  wants to share a `Task` with their team which runs the following steps in sequence:
  * Run unit tests (which may fail)
  * Apply a mutation to the test results (e.g. convert them to a certain format such as JUnit)
  * Upload the results to a central location used by all the teams

* As a pipeline author, I would like to use a shared `Task` (which may result in a step error) and configure the pipeline
  to ignore such a step error.
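
The platform-team use case can be sketched as a single script in which the test step is allowed to fail while the conversion and upload still run. The three functions below are hypothetical stand-ins; in a real `Task` each would be a separate step with its own container image, and the flow is only an illustration of the desired outcome, not of the mechanism.

```shell
#!/bin/sh
# Hypothetical stand-ins for the three steps of the shared Task.
run_unit_tests()  { echo "2 tests failed"; return 1; }
convert_results() { echo "converted: $1"; }
upload_results()  { echo "uploaded: $1"; }

run_task() {
  test_rc=0
  # Capture the output and the exit code of the step that may fail.
  raw=$(run_unit_tests) || test_rc=$?
  # The remaining steps run regardless of the test outcome.
  converted=$(convert_results "$raw")
  upload_results "$converted"
  # The original exit code stays visible after the task finishes.
  echo "unit test exit code: $test_rc"
}

run_task
```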


## References

* [Capture Exit Code, tektoncd/pipeline#2800](https://github.com/tektoncd/pipeline/issues/2800)
* [Add a field to Step that allows it to ignore failed prior Steps *within the same Task, tektoncd/pipeline#1559](https://github.com/tektoncd/pipeline/issues/1559)
* [Scott's Changes to allow steps to run regardless of previous step errors](https://github.com/tektoncd/pipeline/pull/1573)
* [Christie's Notes](https://docs.google.com/document/d/11wygsRe2d4G-wTJMddIdBgSOB5TpsWCqGGACSXusy_U/edit?resourcekey=0-skOAYQiz0xIktxYxCm-SFg) - Thank You, Christie!
1 change: 1 addition & 0 deletions teps/README.md
@@ -148,4 +148,5 @@ This is the complete list of Tekton teps:
|[TEP-0035](0035-document-tekton-position-around-policy-authentication-authorization.md) | document-tekton-position-around-policy-authentication-authorization | implementable | 2020-12-09 |
|[TEP-0036](0036-start-measuring-tekton-pipelines-performance.md) | Start Measuring Tekton Pipelines Performance | proposed | 2020-11-20 |
|[TEP-0037](0037-remove-gcs-fetcher.md) | Remove `gcs-fetcher` image | implementing | 2021-01-27 |
|[TEP-0040](0040-ignore-step-error.md) | Ignore Step Error | proposed | 2021-02-02 |
|[TEP-0045](0045-whenexpressions-in-finally-tasks.md) | WhenExpressions in Finally Tasks | implementable | 2021-01-28 |
