
TEP-0060: Propose remote resolution experimental controller #493

Merged: 1 commit merged into tektoncd:main on Aug 31, 2021

Conversation

@ghost commented Aug 4, 2021:

This commit updates the proposed remote resolution feature with a suggestion to
develop a controller in our experimental repo to try out some of the
Alternatives.

In order to support such an experimental controller two changes are also
proposed for Tekton Pipelines that would make such a controller much easier to
develop and test.

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 4, 2021
@ghost (Author) commented Aug 4, 2021:

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 4, 2021
@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 5, 2021
@pritidesai (Member):

Keeping this proposal open for a week, giving folks a chance to review if anyone is interested 🧑‍💻

@bobcatfish (Contributor):

/assign

@tekton-robot tekton-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 12, 2021
> In order to support an experimental controller independently performing
> resolution we could add an arg to Tekton Pipelines' controller (e.g.
> `--experimental-disable-in-tree-resolution`) that completely disables the
> built-in resolution logic.
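The gating behaviour the quoted flag proposes could look roughly like the sketch below. This is a hedged illustration only: the flag name comes from the quote, but the function shape and the `taskRef`/`pipelineRef` field checks are assumptions about how a reconciler might branch, not actual Tekton Pipelines code.

```python
# Hypothetical sketch: decide whether the Pipelines controller should resolve
# a run's ref itself, or leave it to an external experimental controller.

def should_resolve_in_tree(run_spec: dict, disable_in_tree_resolution: bool) -> bool:
    """Return True if the built-in controller should perform resolution."""
    if disable_in_tree_resolution:
        # --experimental-disable-in-tree-resolution is set: an external
        # controller owns resolution; the run waits until it embeds the spec.
        return False
    # Only runs that reference something by name or bundle need resolving;
    # fully inlined specs are already resolved.
    return "taskRef" in run_spec or "pipelineRef" in run_spec
```
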
Contributor:

I wonder if there's an alternative where we could leverage the "pending" state - e.g. if you want to use the experimental controller, you create "pending" PipelineRuns, then the experimental resolution controller can do its thing (embedding the spec) and remove the "pending" status

Author:

Yeah agree this is an option for PipelineRuns and I considered it. We don't have that Pending feature for TaskRuns though. Should we gate this work on that being decided on and delivered first?

Contributor:

ah right - yeah that's kind of a pain 🤔

At the least, I think the "deciding" part of adding "pending" for TaskRuns would be very brief: the PR that was accepted for the pending TEP was actually called "TEP-0015 - Add a pending setting to Tekton PipelineRun and TaskRuns" (#203) (more discussion in the PR that adds the feature, tektoncd/pipeline#3522 (comment)) - so since we accepted adding pending PipelineRuns, I'm not sure why we wouldn't accept pending TaskRuns.

I agree with you though: making this TEP depend on that happening would be putting yet another obstacle in its path, and I'd rather accept the lesser evil of an extra config option than make it take even longer to start experimenting with this.

(And as a side note, I've had this awful/wonderful idea recently that we'd be better off making the pending setting more like a semaphore - i.e. what if you have multiple controllers that need to sign off before executing a pending run? Maybe there's a better way than that, e.g. a list of explicit controllers you want to process a run before it executes? LONG STORY SHORT, I think you're making the right call.)
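The "pending as semaphore" aside could be sketched as below. This is purely speculative illustration of the idea, not a proposal from the TEP: both annotation keys and the comma-separated encoding are invented for the example.

```python
# Invented annotation keys for illustration only.
REQUIRED_KEY = "example.dev/required-signoffs"
DONE_KEY = "example.dev/completed-signoffs"

def is_still_pending(annotations: dict) -> bool:
    """True while any controller listed as required has not yet signed off."""
    required = {c for c in annotations.get(REQUIRED_KEY, "").split(",") if c}
    done = {c for c in annotations.get(DONE_KEY, "").split(",") if c}
    # The run stays pending until every required controller appears in done.
    return not required.issubset(done)
```

A run annotated with `required-signoffs: resolver,policy` would remain pending until both the resolver and the policy controller recorded their sign-off.
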

Author:

> (and as a side note, ive had this awful/wonderful idea recently that we'd be better off making the pending setting more like a semaphore - i.e. what if you have multiple controllers that need to sign off before executing a pending run? maybe there's a better way than that, e.g. a list of explicit controllers you want to process a run before it executes? LONG STORY SHORT i think you're making the right call)

YES. This is an excellent point. Very tricky edge case if a user wants both resolution and has their own pending processing logic.

Author:

(I've also updated the doc to include mention of this alternative approach)

> also other Tekton projects. Having the metadata copy performed by an external
> controller doesn't make a ton of sense in that case: metadata copying is a
> feature/requirement of Tekton Pipelines, not a general requirement for
> resolution.
Contributor:

Could you explain a bit more about this? I'm not totally following because:

  1. AFAIK the experimental resolution controllers know enough about PipelineRuns + TaskRuns to modify them directly, so in that sense they're not generic enough to use for anything else (maybe I'm missing something though)
  2. metadata like labels and annotations is pretty universal across k8s resources, so in that sense having some logic around this even in a very generic tool seems reasonable

Author:

The resolution controller I'm proposing for the experimental repo would aim to deliver the functionality that we've been discussing through TEP-0060, which has now largely boiled down to: (1) It should be a separate project (2) It should process the resources before Pipelines sees them and (3) it shouldn't be Pipelines-specific.

It's true that metadata is a universal feature of objects in k8s. But the behaviour where metadata is copied verbatim from Tasks to TaskRuns and from Pipelines to PipelineRuns is quite specific to Tekton Pipelines, I think? Let's say we open the system up in such a way that anyone can write a Resolver. Should we require their resolver to also perform this Pipelines-specific metadata copying action? I guess I'm not convinced that this is an action we want left to external resolvers to perform. Those labels / annotations can trip up the pipelines controller when they're missing, and they're part of our backwards-compat story I think.

Contributor:

> But the behaviour where metadata is copied verbatim from Tasks to TaskRuns and Pipelines to PipelineRuns is quite specifically Tekton Pipelines I think?

I wonder if one of the reasons we're copying around metadata like this is because we needed a better way to query PipelineRuns and TaskRuns - which maybe is the role that Tekton Results fills now? Might be something we want to consider for v1.

> I guess I'm not convinced that this is an action we want left to external resolvers to perform.

In that case I'm leaning more toward the approach where we embed the entire resolved yaml document (or as much of it as we know the controller needs) 🤔

Author:

Yeah, possibly the biggest reason for the metadata duplication is about querying the cluster for PipelineRuns associated to Pipelines (and TaskRuns to Tasks). I definitely agree that Results might be a better host for this kind of relational data.

Just to throw in an additional wrinkle (because I'm also generally in favor of embedding the entire yaml doc): if we go the route of embedding the entire resolved yaml, I imagine there will be some amount of time where we need to store both pipelineSpec and (hypothetical) pipelineYAML in a single PipelineRun's status for compatibility's sake. In turn I wonder if this would push PipelineRuns much more quickly toward etcd's size limits when very large Pipelines are involved. wdyt?

Author (@ghost, Aug 17, 2021):

Sorry, didn't quite finish my thought re: metadata duplication. While querying associations is one very big use case, I think there are probably others too, and it would behoove us to enumerate those and make sure they're accounted for before we decide to drop the copying behaviour.

Contributor:

> I imagine there will be some amount of time where we need to store both pipelineSpec and (hypothetical) pipelineYAML in single PipelineRuns' status for compatibility's sake.

Hmm, I'm trying to think through the scenario where we'd need to do this - all I can think of is if the controller is downgraded during PipelineRun execution?

The current controller expects to find a pipeline spec in the PipelineRun status - the updated version would expect to find the entire pipelineYAML but could also support the pipeline spec if it found it (we could wait until we're outside the compatibility window to remove that support) - but most cases wouldn't require actually storing both, I don't think?

Scenarios I can think of:

  1. upgrade from "pipeline spec" controller to new version:
    • completed PipelineRuns would contain "pipeline spec" in status; doesn't matter since they're done
    • in-progress PipelineRuns would be using "pipeline spec", which would be okay as long as the new version still supported it
    • new PipelineRuns would start using "pipelineYAML"
  2. downgrade from new version of the controller to "pipeline spec" controller:
    • completed PipelineRuns contain "pipelineYAML"; doesn't matter since they're done (possible that on controller startup it gets upset about unexpected fields?)
    • in-progress PipelineRuns would be in trouble - the old controller wouldn't know how to read "pipelineYAML" out of their specs
    • new PipelineRuns would use "pipeline spec"

Do we need to support the downgrade case in (2)? (specifically: support runs which execute during the update?)
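The compatibility window described in scenario (1) can be sketched as a simple fallback read. This is an illustrative assumption about how a newer controller might behave, not actual Tekton code; `pipelineYAML` is the hypothetical field from the discussion and `pipelineSpec` is the existing status field.

```python
# Sketch: a newer controller prefers the (hypothetical) pipelineYAML status
# field, but still understands the older pipelineSpec field so in-progress
# runs survive an upgrade.

def resolved_pipeline_spec(status: dict):
    doc = status.get("pipelineYAML")
    if doc is not None:
        # New format: the whole resolved document (metadata + spec) is stored.
        return doc.get("spec")
    # Legacy format written by the "pipeline spec" controller.
    return status.get("pipelineSpec")
```

The downgrade case in scenario (2) is exactly the branch this sketch cannot help with: an old controller has no code path for `pipelineYAML` at all.
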

Author:

I was specifically thinking of users who have written code that expects the pipelineSpec to be embedded in the status rather than scenarios where the system itself might be caught in an intermediate state. But it sounds like you don't think this is a situation we need to worry about?

Author:

OK, I've updated the proposal to pass the entire resolved yaml via a taskYaml field for taskruns and a pipelineYaml field for pipelineruns.

FYI @vdemeester and @dibyom - this is a change to the proposal so probably worth giving it one more look to confirm you're happy to keep approving.

Member:

Resolving both metadata and spec under a taskYAML or pipelineYAML field sounds good to me.
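Resolving metadata and spec together under one field could look like the sketch below. The `taskYAML`/`pipelineYAML` field names follow the discussion; the function shape and `kind` parameter are invented for illustration.

```python
# Sketch: an external resolver embeds the whole resolved document
# (metadata + spec) into the run's status under a single field.

def embed_resolved_document(run_status: dict, resolved: dict, kind: str) -> dict:
    field = "taskYAML" if kind == "TaskRun" else "pipelineYAML"
    run_status[field] = {
        "metadata": resolved.get("metadata", {}),
        "spec": resolved.get("spec", {}),
    }
    return run_status
```

With this shape, the reviewer's earlier correction follows naturally: the resolved spec lives at `status.taskYAML.spec` / `status.pipelineYAML.spec` rather than directly at `status.taskSpec`.
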

Contributor:

> users who have written code that expects the pipelineSpec to be embedded in the status

Ah, that's a good point - I was only thinking about this from the controller's perspective; I always forget about status being part of the API 😅

You're probably right, it's probably best to support both - maybe we can use (even more) config options to control whether we do that or not?

@bobcatfish (Contributor):

/assign vdemeester
/assign dibyom

@ghost (Author) commented Aug 17, 2021:

The refactoring I'd started in pipelines#4108 ended up being far too invasive to deliver in a single PR. I've instead added the new resolving reconcilers behind flags in a way that's similar to what's described by this proposal. This protects pipelines' users from being exposed to the new changes until we're ready to commit to them. It also clears the way for external processes (like the experimental controller proposed here) to perform remote resolution.

@jerop (Member) commented Aug 23, 2021:

needs re-approvals with the TEP changes, please take a look @tektoncd/core-maintainers

> When the proposed experimental resolution controller sees an unresolved
> `TaskRun` or `PipelineRun` it will read the resource and then resolve whatever
> ref, bundle or inline document is included. The resolved `spec` will then be
> embedded in the `status.taskSpec` field for `TaskRuns` or the
Member:

Should this be status.taskYAML.spec and status.pipelineYAML.spec?

Author:

Well spotted, thank you! Fixed and pushed.

> also other Tekton projects. Having the metadata copy performed by an external
> controller doesn't make a ton of sense in that case: metadata copying is a
> feature/requirement of Tekton Pipelines, not a general requirement for
> resolution.
Member:

Resolving both metadata and spec under a taskYAML or pipelineYAML field sounds good to me.

@bobcatfish (Contributor) left a comment:

Looking great to me, I really like your thorough incremental approach.

/approve


> There are two approaches that I've considered for solutions to this:
>
> 1. Change the contract such that the _entire_ resolved YAML document is passed
Contributor:

At the risk of making this too complicated and unreliable, I wonder if we should identify some exceptions to this right off the bat.

I was just looking at a Task in the dogfooding cluster and the metadata looks like this:
[screenshot: Task metadata from the dogfooding cluster]

I'm wondering if we want to identify a list of metadata fields and specific annotations we don't copy, e.g.:

  • the annotation kubectl.kubernetes.io/last-applied-configuration
  • managedFields

@_@ OR we start with only labels? we could pave the way for potentially embedding the entire thing by making the field able to store the entire yaml, but initially only store a certain subset that we know we need?
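The filtered copy suggested above can be sketched as follows. The two exclusions come straight from the discussion (`managedFields` and the `kubectl.kubernetes.io/last-applied-configuration` annotation); the function shape is an assumption, and the precise exclusion lists would be decided during the experimental phase.

```python
# Sketch: copy Task/Pipeline metadata onto its run, skipping fields that
# shouldn't travel (server-side apply bookkeeping, kubectl's applied config).

SKIPPED_FIELDS = {"managedFields"}
SKIPPED_ANNOTATIONS = {"kubectl.kubernetes.io/last-applied-configuration"}

def copy_metadata(source_meta: dict) -> dict:
    copied = {}
    for field, value in source_meta.items():
        if field in SKIPPED_FIELDS:
            continue
        if field == "annotations":
            value = {k: v for k, v in value.items()
                     if k not in SKIPPED_ANNOTATIONS}
        copied[field] = value
    return copied
```

The "start with only labels" variant would simply replace the loop with `{"labels": source_meta.get("labels", {})}`.
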

Author:

Updated with a note to decide precisely how we filter the copied metadata as part of this experimental phase.

Member:

Indeed, we should exclude storing managedFields and last-applied-configuration.

Author:

Updated with mention of managedFields in addition to last-applied-configuration.

@tekton-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, dibyom, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2021
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 31, 2021
@bobcatfish (Contributor):

Looking at the working group notes, it looks like the only reason this wasn't LGTM-ed in the meeting was because it needed a rebase, so:

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 31, 2021
@tekton-robot tekton-robot merged commit bb8b823 into tektoncd:main Aug 31, 2021