[TEP-0140] Produce Results from Matrixed PipelineTasks #1047

EmmaMunley · 2023-07-31T16:20:45Z

TEP-0090: Matrix proposed executing a PipelineTask in parallel TaskRuns and Runs with substitutions from combinations of Parameters in a Matrix. Specifying Results in a Matrix was in scope in [TEP-0090][results], and is already supported. However, producing Results from PipelineTasks with a Matrix was out of scope to await
TEP-0075: Object Parameters and Results and TEP-0076: Array Results. This TEP aims to enable producing Results from PipelineTasks with a Matrix so that they can be used in subsequent PipelineTasks.

/kind TEP

vdemeester · 2023-07-31T16:50:20Z

/assign @lbernick

vdemeester · 2023-07-31T16:50:23Z

/assign

teps/0140-producing-results-in-matrix.md

lbernick · 2023-08-01T15:47:52Z

teps/0140-producing-results-in-matrix.md

+              - $(tasks.matrix-emitting-results.results.report-url[0]) # linux-chrome
+              - $(tasks.matrix-emitting-results.results.report-url[3]) # linux-safari
+              - $(tasks.matrix-emitting-results.results.report-url[6]) # linux-firefox


What is the use case for dynamically fanning out combinations based on params, but then only using some of those combinations in later tasks? Would matrix include be better suited to these use cases?

The syntax here would only work if the param names are static right? If we added a new parameter e.g. android to the list then the task would produce unexpected results?

What is the use case for dynamically fanning out combinations based on params, but then only using some of those combinations in later tasks?

One of them is documented under Platforms and Browsers section.

In addition to this, there are many common CI/CD use cases with this kind of requirement in the industry. In microservices world that we are in today, most of the applications comprised of multiple repos, e.g. (1) Source Code (2) Configurations. To bless an application deployment in production, all the repos has to be built using the same build task which might be producing a digest as a task result. When it comes to deployment task, a digest from building a source repo is needed but not the configuration repo.

There is an additional use case which we discussed this morning of running the same scan task for a given GitHub repo and retrieving only a scan report from the instance which ran scan with X to continue execution. The rest of the results might be consumed by the rest of the tasks in a pipeline.

None of these use cases are coming from a single vendor.

Would matrix include be better suited to these use cases?

Matrix include is designed to specify a combination of params for a given task. This particular sample pipeline if we re-write using include, it will result in a very complex and verbose definition. We will have to spell out all possible combinations of params which were expected to be generated by the controller. And in turn, it defeats the purpose of introducing matrix. Include is a very useful feature for the use cases documented in the TEP.

The syntax here would only work if the param names are static right?

The syntax here is ideal when the param values are static. Its inherent nature of the params which allows pipeline and task definitions to provide static values. We are taking advantage of such static values in our own CI.

If we added a new parameter e.g. android to the list then the task would produce unexpected results?

The pipeline author when updates the static value of a param, he/she is generally aware of its dependent tasks which are consuming the results. The task would not produce any unexpected results, the pipeline author will have to re-map the indices to match with her/his expectation.

Indexing into an array param/results is not something new being introduced by this TEP. Indexing into an array results is already a beta feature - https://github.com/tektoncd/pipeline/blob/main/docs/pipelines.md#passing-one-tasks-results-into-the-parameters-or-when-expressions-of-another

All the concerns we have here also applies to the pipeline which consumes array results by indexing into an array result or to the pipeline which consumes array params by indexing into the param from a pipelineRun (without matrix).

This example will result in the same failure or will question the underlying task definition if the order of environments changes from [ 'staging', 'qa', 'prod'] to [ 'qa', 'prod', 'staging' ] in the pipelineRun.

I would like to continue addressing the rest of the comments if we all align on the use case and reach a consensus on this feature request such that @EmmaMunley can focus on updating the TEP if needed.

@lbernick @dibyom @vdemeester @afrittoli thoughts?

One of them is documented under Platforms and Browsers section.

To be honest, I'm not sure why someone would want to run tests on different combinations of browsers and platforms, but then only fetch some of the test results. Is there user feedback reflecting this use case?

To bless an application deployment in production, all the repos has to be built using the same build task which might be producing a digest as a task result. When it comes to deployment task, a digest from building a source repo is needed but not the configuration repo.

I'm a little bit confused about why you'd want to matrix a single build task for a source repo and a configuration repo. For example, let's say you're using kaniko build for your source repo. Does your configuration repo also contain a Dockerfile that should be built in parallel with the source repo? It seems more likely to me that a configuration repo would contain e.g. infrastructure as code and not a binary that should be built in parallel in the same way as the source repo.

There is an additional use case which we discussed this morning of running the same scan task for a given GitHub repo and retrieving only a scan report from the instance which ran scan with X to continue execution. The rest of the results might be consumed by the rest of the tasks in a pipeline.

Can you elaborate a bit more on why the rest of the pipeline is gated on only one of the scan tasks in this use case? @EmmaMunley it would be helpful to add this use case to the TEP proposal. It may also be helpful to add an example pipeline based on this use case, so we can show what the example would look like with different syntax options.

Would matrix include be better suited to these use cases?

Matrix include is designed to specify a combination of params for a given task. This particular sample pipeline if we re-write using include, it will result in a very complex and verbose definition. We will have to spell out all possible combinations of params which were expected to be generated by the controller.

@EmmaMunley would you be able to write out your example of selecting only certain matrixed results using include instead of indexing so we can compare the syntax in each case?

Indexing into an array param/results is not something new being introduced by this TEP... This example will result in the same failure or will question the underlying task definition if the order of environments changes from [ 'staging', 'qa', 'prod'] to [ 'qa', 'prod', 'staging' ] in the pipelineRun.

Thanks for this! I thought about it a bit more and this actually makes me more concerned about array indexing rather than making me more convinced we should use it here :(
The TEP that introduced array indexing explained the use case in this way: "Without indexing, arrays will only work with arrays. This means you cannot combine arrays with Tasks that only take string parameters and this limits the interoperability between Tasks...Use case for array indexing: Unwinding loops (i.e. using one Task multiple times for some set of items)". Now that we can produce matrixed taskruns from array results, I wonder if array indexing is still the best way of doing this?

I would like to continue addressing the rest of the comments if we all align on the use case and reach a consensus on this feature request such that @EmmaMunley can focus on updating the TEP if needed.

Thanks very much @pritidesai! I would love to see the use cases you've identified here added to the TEP, with any related user feedback if we have it. It sounds like the main goal they have in common is to be able to use a result that was generated with a particular set of parameters; this goal would be helpful to have in the TEP as well, and then we can explore different solutions for achieving this goal.

To be honest, I'm not sure why someone would want to run tests on different combinations of browsers and platforms, but then only fetch some of the test results. Is there user feedback reflecting this use case?

It all depends on how they have configured their pipelines.

Its not about only fetch some of the test results. This example could be interpreted as classify and collect the reports for a given platform before sending it for further processing. There could be a subsequent set of parallel tasks to collect reports based on the platform and send them to their platform specific buckets.

pipelineTask: fetch-and-send-linux-reports pipelineTask: fetch-and-send-mac-reports pipelineTask: fetch-and-scan-windows-report

As a platform provider, we provide building blocks and its up to the users how they would like to build their choice of resource.

lbernick · 2023-08-01T15:52:35Z

teps/0140-producing-results-in-matrix.md

+
+### Support String Results produced by a matrix, Aggregated as An Array during Fanning Out - Referencing using indexes
+
+Since the ordering of a fanned out matrix is predeterministic due to sorting the combinations, we can reference results produced from different TaskRuns using indexing using the syntax  `$(tasks.<pipelinetask-name>.results.<result-name>[INDEX])`.  This approach will work for any matrix regardless if it has matrix params and/or include params.  A con to this approach is that users will need to know in advance how the order of the TaskRuns is determined and figure out what the order will be ahead of time if users need to reference or filter for a specific result. At the pipeline definition time, the parameters are unknown. Another con is that it might become exponentially more challenging as you scale with more combinations, making it less user friendly.


I'm concerned about array indexing into matrixed results for a few reasons:

UX issues: the user needs to run the pipeline to determine the correct indices to use. If they rename or change params, they will need to rewrite their pipeline.

Pipelines will not work for different input parameters. This conflicts with our reusability design principle (https://github.com/tektoncd/community/blob/main/design-principles.md#reusability) which states that "At run time (i.e. when invoking a Pipeline or Task via PipelineRun or TaskRun), users should be able to control execution as needed by their context without having to modify Tasks and Pipelines".

We cannot evolve the implementation of matrix. Deterministic ordering of taskruns was not intended to be a behavior users should rely on (see discussion here) and in fact, I think we should allow ordering to be randomized so that users don't come to depend on it. (This is what Go chose to do for the order of iteration over map keys, for a very similar reason: https://go.dev/doc/go1#iteration)

For now, I have descoped this TEP to only support emitting and consuming the entire array of results produced by fanned out TaskRuns as this is what everyone currently agrees on and marked this TEP implementable so I can continue to make progress, while we continue the discussion in the comments section and in the API working group around handling referencing results for a singular or specific combination(s) of params.

@lbernick @pritidesai @dibyom @Yongxuanzhang @vdemeester @afrittoli

lbernick · 2023-08-01T15:53:16Z

teps/0140-producing-results-in-matrix.md

+### Support only Matrix.Include Results by Result Name
+If the only use case we need to support is for explicit combinations, meaning only `Matrix.Include` without `Matrix Params`, then we can support results using the syntax of $(tasks.<include-name>.results.<result-name>) so that we can have an explicit mapping for each combination.
+Pros: This is a straightforward approach to map the combinations to the original params.
+Cons: This approach won’t support `Matrix` with `Params` or `Matrix` with both `Matrix` `Include` and `Matrix` `Params`.


I like this approach. Can we expand it to support using full array results fanned out from matrix params, but not array indexing of matrixed results?

teps/0140-producing-results-in-matrix.md

lbernick · 2023-08-01T19:05:30Z

teps/0140-producing-results-in-matrix.md

+
+In the example below, a user wants to run tests on a combination of different platforms and browsers and then fetch the reports for only linux platforms.
+```yaml
+apiVersion: tekton.dev/v1beta1


Would the following alternative syntax work? This uses pipelines in pipelines.

apiVersion: tekton.dev/v1beta1 kind: PipelineRun metadata: generateName: platforms-with-results spec: serviceAccountName: "default" pipelineSpec: tasks: - name: matrix-emitting-results matrix: params: - name: platform value: - linux - mac - windows - name: browser value: - chrome - safari - firefox pipelineSpec: tasks: - name: produce-results params: - name: platform default: "" - name: browser default: "" results: - name: report-url type: string steps: - name: produce-report-url image: alpine script: | echo "Running tests on $(params.platform)-$(params.browser)" echo -n "https://api.example/get-report/$(params.platform)-$(params.browser)" | tee $(results.report-url.path) - name: task-consuming-results when: - input: $(params.platform) operator: in value: ["linux"] params: - name: url value: $(tasks.produce-results.results.report-url) taskSpec: params: - name: url type: string steps: - name: echo image: alpine script: | for arg in "$@"; do echo "Arg: $arg" done

teps/0140-producing-results-in-matrix.md

dibyom · 2023-08-01T19:44:46Z

teps/0140-producing-results-in-matrix.md

+              - $(tasks.matrix-emitting-results.results.report-url[0]) # linux-chrome
+              - $(tasks.matrix-emitting-results.results.report-url[3]) # linux-safari
+              - $(tasks.matrix-emitting-results.results.report-url[6]) # linux-firefox


The syntax here would only work if the param names are static right? If we added a new parameter e.g. android to the list then the task would produce unexpected results?

dibyom · 2023-08-01T19:51:12Z

teps/0140-producing-results-in-matrix.md

+        params:
+          - name: url
+            value:
+              - $(tasks.matrix-emitting-results.results.report-url[0]) # linux-chrome


spitballing here, if the use case is filter all the results where params.platform == linux, we could add some new syntax similar to JSONPath filters e.g. $(tasks.matrix-emitting-results.results.report-url?(@params.platform.value=='linux')]

teps/0140-producing-results-in-matrix.md

vdemeester · 2023-08-07T08:56:32Z

teps/0140-producing-results-in-matrix.md

+               echo -n "https://api.example/get-report/$(params.platform)-$(params.browser)" | tee $(results.report-url.path)
+```
+
+## Proposal


What happens if one of the matrixed Task fails (prior to write the result) ? Are we producing incomplete results array or no results at all ?

In the event that a Matrixed PipelineTask that produces Results fans out into multiple TaskRuns and one or more of the fanned out TaskRuns fail, the current behavior of allowing a failed TaskRun to still produce a Result would be supported tektoncd/pipeline#6510

Thanks Vincent for this question, I think it's interesting/important! There are actually a few related questions here IMO:

If a matrixed taskrun fails before writing a result, will the array result not exist or be length n - 1 or n (i.e. we provide some "zero value" for the result of a taskrun that doesn't produce a result that should) where n is the number of combinations? I think this is what Vincent was asking. IMO it should be length n - 1, and then once tep 50 is implemented, pipeline authors can choose whether to continue operating on the incomplete result even though the pipeline task is failed.

If a matrixed taskrun fails after writing a result, will the array result exist? I think feat: support to produce results from a failed task pipeline#6510 allows a taskrun to produce a result even if it failed after writing it, so it seems reasonable to me to have an array result populated for a matrixed pipeline task for consistency.

If a matrixed taskrun is supposed to produce a result, but doesn't, and still succeeds, should the array result not exist or be length n - 1 or n? Reading through TEP 48 I'm not sure exactly what the right call is here but I'd say an incomplete array result makes sense for consistency; the pipeline task will succeed so it wouldn't make sense to have it not produce a result.

In general I don't think it would be a good idea to produce "zero value" results for tasks that don't write the results to ensure the array result is the length of the number of combinations. (Note that this is another reason indexing might be a confusing UX: we'd have to either produce results for tasks that don't write them, or not produce an array result any time a task didn't write a result, even if the task succeeded.)

lbernick · 2023-08-14T16:11:35Z

teps/0140-producing-results-in-matrix.md

+
+1. Support a matrixed `pipelineTask` that produces `result` of type `array` or `object` that are then consumed by
+   another `pipelineTask`.
+  - A matrixed `pipelineTask` can support `result` of any type as long as they aren’t consumed by


I'm assuming they also can't be consumed in pipeline results, but you might be able to consume them with pipelines in pipelines or with a script?

Yes, these can be consumed by chains and in future will be helpful in consuming results from an individual instances and could be useful while implementing pipelines in pipelines or with a scripts.

teps/0140-producing-results-in-matrix.md

lbernick · 2023-08-14T16:18:58Z

teps/0140-producing-results-in-matrix.md

+#### Access Aggregated Results Length
+
+The pipeline authors can access the length of the array of aggregated results that were
+actually produced using the syntax: `tasks.<pipelineTaskName>.matrix.<resultName>`


I can't find this in our docs, but I thought it was possible to reference a full array result with our without the [*] at the end? If so, could we use a different syntax for this context variable?

I am not sure either, @Yongxuanzhang do you any insight into such syntax? We can always adapt if such syntax is already supported.

I think we need to add [*] for array results reference.
tep: https://github.com/tektoncd/community/blob/main/teps/0076-array-result-types.md#to-provide-values-for-array-params
doc: https://github.com/tektoncd/pipeline/blob/main/docs/tasks.md#substituting-array-parameters

Do we want to support the reference without using [*] here?

lbernick · 2023-08-14T16:19:46Z

teps/0140-producing-results-in-matrix.md

+
+The pipeline authors can access the length of the array of aggregated results that were
+actually produced using the syntax: `tasks.<pipelineTaskName>.matrix.<resultName>`
+and `finally.<pipelineTaskName>.matrix.<resultName>`. This will allow users to loop over the


Would finally.<pipelineTaskName>.matrix.<resultName> be used in pipeline results? I don't think it could be used in a subsequent task

Yes finally can be used in the pipeline results.

lbernick · 2023-08-14T16:20:32Z

teps/0140-producing-results-in-matrix.md

+The producer task must have defined a `result` of type `string`, `matrix` aggregates the results from each instance of
+the matrixed `pipelineTask` which can be consumed into a `param` or `when` expressions as type `array`.
+
+#### Missing Task Results


tekton-robot · 2023-08-21T09:15:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lbernick, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [lbernick,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

teps/0140-producing-results-in-matrix.md

[TEP-0090: Matrix][tep-0090] proposed executing a `PipelineTask` in parallel `TaskRuns` and `Runs` with substitutions from combinations of `Parameters` in a `Matrix`. Specifying `Results` in a `Matrix` was in scope in [TEP-0090][results], and is already supported. However, producing `Results` from `PipelineTasks` with a `Matrix` was out of scope to await [TEP-0075: Object Parameters and Results][tep-0075] and [TEP-0076: Array Results][tep-0076]. This TEP aims to enable producing `Results` from `PipelineTasks` with a `Matrix` so that they can be used in subsequent `PipelineTasks`. [tep-0075]: https://github.com/tektoncd/community/blob/main/teps/0075-object-param-and-result-types.md [tep-0076]: https://github.com/tektoncd/community/blob/main/teps/0076-array-result-types.md [tep-0090]: https://github.com/tektoncd/community/blob/main/teps/0090-matrix.md [tep-0118]: https://github.com/tektoncd/community/blob/main/teps/0118-matrix-with-explicit-combinations-of-parameters.md Signed-off-by: Priti Desai <[email protected]>

lbernick · 2023-08-21T16:11:32Z

API WG:
/lgtm

tekton-robot requested review from dibbles and piyush-garg July 31, 2023 16:20

tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 31, 2023

tekton-robot assigned lbernick Jul 31, 2023

tekton-robot assigned vdemeester Jul 31, 2023

lbernick reviewed Aug 1, 2023

View reviewed changes

dibyom reviewed Aug 1, 2023

View reviewed changes

teps/0140-producing-results-in-matrix.md Outdated Show resolved Hide resolved

dibyom reviewed Aug 1, 2023

View reviewed changes

Yongxuanzhang reviewed Aug 2, 2023

View reviewed changes

teps/0140-producing-results-in-matrix.md Show resolved Hide resolved

EmmaMunley force-pushed the produce-results-matrix branch from 4d3cb3b to 2e75724 Compare August 4, 2023 19:00

vdemeester reviewed Aug 7, 2023

View reviewed changes

EmmaMunley changed the title ~~TEP-0140: Produce Results from Matrixed PipelineTasks~~ [TEP-0140] Produce Results from Matrixed PipelineTasks Aug 7, 2023

afrittoli added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 7, 2023

EmmaMunley force-pushed the produce-results-matrix branch 2 times, most recently from 77981e4 to b3e2e4e Compare August 7, 2023 20:48

pritidesai force-pushed the produce-results-matrix branch 3 times, most recently from 9b58427 to 7f6e51e Compare August 11, 2023 00:31

pritidesai mentioned this pull request Aug 11, 2023

Produce Results from PipelineTasks fanned out in a Matrix tektoncd/pipeline#5266

Closed

lbernick approved these changes Aug 14, 2023

View reviewed changes

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2023

EmmaMunley force-pushed the produce-results-matrix branch from 7f6e51e to d0159ed Compare August 16, 2023 15:38

vdemeester approved these changes Aug 21, 2023

View reviewed changes

jerop reviewed Aug 21, 2023

View reviewed changes

teps/0140-producing-results-in-matrix.md Outdated Show resolved Hide resolved

jerop reviewed Aug 21, 2023

View reviewed changes

teps/0140-producing-results-in-matrix.md Outdated Show resolved Hide resolved

EmmaMunley force-pushed the produce-results-matrix branch from d0159ed to 5460a22 Compare August 21, 2023 14:22

EmmaMunley force-pushed the produce-results-matrix branch from 5460a22 to 21b49ff Compare August 21, 2023 14:23

EmmaMunley force-pushed the produce-results-matrix branch from 21b49ff to 8100b7a Compare August 21, 2023 14:35

EmmaMunley mentioned this pull request Aug 21, 2023

Change ordering of Matrixed TaskRuns Produced to Not Be Deterministic tektoncd/pipeline#7056

Closed

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 21, 2023

tekton-robot merged commit fe0c2bb into tektoncd:main Aug 21, 2023

EmmaMunley deleted the produce-results-matrix branch August 21, 2023 16:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TEP-0140] Produce Results from Matrixed PipelineTasks #1047

[TEP-0140] Produce Results from Matrixed PipelineTasks #1047

EmmaMunley commented Jul 31, 2023 •

edited

Loading

vdemeester commented Jul 31, 2023

vdemeester commented Jul 31, 2023

lbernick Aug 1, 2023

dibyom Aug 1, 2023

pritidesai Aug 1, 2023 •

edited

Loading

lbernick Aug 3, 2023 •

edited

Loading

pritidesai Aug 4, 2023

lbernick Aug 1, 2023

EmmaMunley Aug 4, 2023

lbernick Aug 1, 2023

lbernick Aug 1, 2023

dibyom Aug 1, 2023

dibyom Aug 1, 2023 •

edited

Loading

vdemeester Aug 7, 2023

EmmaMunley Aug 7, 2023

lbernick Aug 8, 2023

lbernick Aug 14, 2023

pritidesai Aug 15, 2023

lbernick Aug 14, 2023

pritidesai Aug 15, 2023

Yongxuanzhang Aug 15, 2023

lbernick Aug 14, 2023

pritidesai Aug 15, 2023

lbernick Aug 14, 2023

tekton-robot commented Aug 21, 2023

lbernick commented Aug 21, 2023


		### Support String Results produced by a matrix, Aggregated as An Array during Fanning Out - Referencing using indexes

		Since the ordering of a fanned out matrix is predeterministic due to sorting the combinations, we can reference results produced from different TaskRuns using indexing using the syntax `$(tasks.<pipelinetask-name>.results.<result-name>[INDEX])`. This approach will work for any matrix regardless if it has matrix params and/or include params. A con to this approach is that users will need to know in advance how the order of the TaskRuns is determined and figure out what the order will be ahead of time if users need to reference or filter for a specific result. At the pipeline definition time, the parameters are unknown. Another con is that it might become exponentially more challenging as you scale with more combinations, making it less user friendly.

[TEP-0140] Produce Results from Matrixed PipelineTasks #1047

[TEP-0140] Produce Results from Matrixed PipelineTasks #1047

Conversation

EmmaMunley commented Jul 31, 2023 • edited Loading

vdemeester commented Jul 31, 2023

vdemeester commented Jul 31, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pritidesai Aug 1, 2023 • edited Loading

Choose a reason for hiding this comment

lbernick Aug 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dibyom Aug 1, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tekton-robot commented Aug 21, 2023

lbernick commented Aug 21, 2023

EmmaMunley commented Jul 31, 2023 •

edited

Loading

pritidesai Aug 1, 2023 •

edited

Loading

lbernick Aug 3, 2023 •

edited

Loading

dibyom Aug 1, 2023 •

edited

Loading