tep to ignore task failures

Adding a tep to ignore task failures, allowing pipeline authors to unblock execution after a single failure
tektoncd · Feb 5, 2021 · b0fb546
1 parent 8c30b1d
commit b0fb546

Showing 2 changed files with 103 additions and 0 deletions.
diff --git a/teps/0050-ignore-task-failures.md b/teps/0050-ignore-task-failures.md
@@ -0,0 +1,102 @@
+---
+status: proposed
+title: 'Ignore Task Failures'
+creation-date: '2021-02-05'
+last-updated: '2021-02-05'
+authors:
+- '@pritidesai'
+---
+
+# TEP-0050: Ignore Task Failures
+
+<!-- toc -->
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Requirements](#requirements)
+  - [Use Cases](#use-cases)
+- [References](#references)
+<!-- /toc -->
+
+## Summary
+
+Tekton pipelines are defined as a collection of tasks in which each task is executed as a pod on a Kubernetes cluster.
+Tasks are scheduled and executed as a directed acyclic graph (DAG) in which each task is a node. Two nodes (tasks) are
+connected by an edge, which is defined by either a resource dependency (`from` or `task results`) or an ordering
+dependency (`runAfter`). A single task failure results in a pipeline failure, i.e. a failed task blocks execution of
+the rest of the graph.
+
+```shell
+$ kubectl get pr pipelinerun-with-failing-task-csmjr -o json | jq .status.conditions
+[
+  {
+    "lastTransitionTime": "2021-02-05T18:51:15Z",
+    "message": "Tasks Completed: 1 (Failed: 1, Cancelled 0), Skipped: 3",
+    "reason": "Failed",
+    "status": "False",
+    "type": "Succeeded"
+  }
+]
+```
+
+The Tekton [catalog](https://github.com/tektoncd/catalog) has a wide range of `tasks` which are designed to be reusable
+in many pipelines. As a pipeline execution engine, Tekton encourages pipeline authors to use arbitrary tasks from
+the Tekton catalog. However, many common pipelines require that a task failure not block execution of the
+rest of the tasks.
+
+A pipeline author can use the `finally` section of the pipeline, in which all the final tasks are executed
+after all the tasks in the graph have completed, regardless of success or failure. `finally` has its own advantages and
+is very helpful in various use cases, including notifications and cleanup.
+
+However, pipeline authors do not have the flexibility to continue executing the rest of the graph after a
+single task failure.
+
+
+## Motivation
+
+It should be possible to utilize tasks from the Tekton catalog in a pipeline. A pipeline author has no
+control over the task definitions but may want to ignore a failure and continue executing the rest of the graph.
+
+
+### Goals
+
+* Design a task failure strategy so that the pipeline author can control the behavior of the underlying tasks 
+  and decide whether to continue executing the rest of the graph in the event of failure.
+
+* Be applicable to any pipeline with references to the tasks in a catalog or inlined task specifications.
+
+### Non-Goals
+
+* Not an alternative to combining the tasks in a pipeline, which is covered in
+  [TEP-0044 Composing Tasks with Tasks](https://github.com/tektoncd/community/pull/316).
+
+* Not an optimization of pipeline runtime, which is covered in
+  [TEP-0046 PipelineRun in a Pod](https://github.com/tektoncd/community/pull/318).
+
+## Requirements
+
+* Users should be able to use any task from the catalog without having to alter its specification to allow that task to
+  fail without stopping the execution of a pipeline.
+
+* By observing the status of the `PipelineRun`, it should be possible to tell that a task failed and that the rest of
+  the graph was allowed to continue.
+
+
+### Use Cases
+
+* As a pipeline author, I would like to design a pipeline where a task running
+  [unit tests](https://github.com/tektoncd/catalog/tree/master/task/golang-test/0.1) might fail,
+  but the pipeline can continue running integration tests, so that it can identify failures in both sets of tests.
+
+* As a pipeline author, I would like to design a pipeline where a task running
+  [linting](https://github.com/tektoncd/catalog/tree/master/task/golangci-lint/0.1) might fail,
+  but the pipeline can continue running tests, so that it can report failures from the linting and all the tests.
+
+* As a new Tekton user, I want to migrate existing workflows from other CI/CD systems that allow a
+  similar task-level unit of failure.
+
+
+## References
+
+* [TEP-0040 Ignore Step Errors](https://github.com/tektoncd/community/pull/302)
diff --git a/teps/README.md b/teps/README.md
@@ -150,3 +150,4 @@ This is the complete list of Tekton teps:
 |[TEP-0037](0037-remove-gcs-fetcher.md) | Remove `gcs-fetcher` image | implementing | 2021-01-27 |
 |[TEP-0039](0039-add-variable-retries-and-retrycount.md) | Add Variable `retries` and `retry-count` | proposed | 2021-01-31 |
 |[TEP-0045](0045-whenexpressions-in-finally-tasks.md) | WhenExpressions in Finally Tasks | implementable | 2021-01-28 |
+|[TEP-0050](0050-ignore-task-failures.md) | Ignore Task Failures | proposed | 2021-02-05 |