
Update enhancement proposal to support update of user-defined AWS tags #1000

Merged (38 commits), Feb 25, 2022

Conversation

TrilokGeer
Contributor

This PR addresses design considerations to support updating user-defined AWS tags.
It also details the graduation criteria for Tech Preview and GA.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 7, 2022
openshift-ci bot commented Jan 7, 2022

Hi @TrilokGeer. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Review thread on enhancements/api-review/custom-tags-aws.md (outdated; resolved)
@@ -75,10 +75,70 @@ will apply these tags to all AWS resources they create.

userTags that are specified in the infrastructure resource will merge with userTags specified in an individual component. In the case where a userTag is specified in the infrastructure resource and there is a tag with the same key specified in an individual component, the value from the individual component will have precedence and be used.

The userTags field is intended to be set at install time and is considered immutable. Components that respect this field must only ever add tags that they retrieve from this field to cloud resources; they must never remove tags from the existing underlying cloud resource, even if the tags are removed from this field (despite it being immutable).
Users can update `resourceTags` by editing `.spec.aws` of the `infrastructure.config.openshift.io` resource. Newly added tags are always appended. Any update to an existing tag, whether it was added by the installer or by a previous edit of `resourceTags`, replaces the previous tag value.
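The merge-and-precedence rule described in the excerpt above can be sketched as a small function; the name and data shapes here are illustrative, not taken from the actual implementation:

```python
def effective_tags(infra_user_tags, component_tags):
    """Merge cluster-wide userTags from the infrastructure resource with a
    component's own tags; on a key conflict the component's value wins."""
    merged = dict(infra_user_tags)   # start from .spec.aws resourceTags
    merged.update(component_tags)    # component-specific keys take precedence
    return merged

# Newly added keys are appended; a changed value replaces the previous one.
print(effective_tags({"env": "prod", "team": "infra"}, {"team": "ingress"}))
# → {'env': 'prod', 'team': 'ingress'}
```

The same dict-update semantics also model a later edit of `resourceTags`: new keys are appended and changed values overwrite the old ones.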
Contributor:
Are we sure that all resources will be updated with the new tag values? Are there any that are create and forget?

Contributor:

Are we sure that all resources will be updated with the new tag values?

As I understand it, the point of this PR is to make it so.

Are there any that are create and forget?

This needs to be clarified in this PR. The enhancement summary only lists one component (the ingress operator) that required changes for the original user-defined AWS tags API implementation. However, there are probably other components that also required changes. We'll need to make sure that every team that has a component that required changes be aware of this update to the API. We really don't want to get into a state where updating tags is kind of sort of supported because some components handle updates while others do not.

Contributor:

Are we sure that all resources will be updated with the new tag values? Are there any that are create and forget?

@JoelSpeed EBS would be a candidate as we lose track of the volumes and won't attempt to re-tag existing ones. This should also be discussed here, I think we added this into our user story back then when we refined it.

@Miciah you can find some overview of all components that have tags and need them updated as stories in the epic: https://issues.redhat.com/browse/CFE-69

I agree that this should be included in here as a listing and we should get the respective teams to review too.

Contributor:

How do we find out which components create other AWS resources, for example, I'm working on adding support for a new AWS resource, Placement Groups, and at the moment, we aren't tagging it with user defined tags. We will need to do so for GA but it's not in the MVP.

This is another resource where we are creating but not updating the object and so I'm not sure how the tagging would work here, I guess it will be like the EBS situation?

Contributor:

How do we find out which components create other AWS resources, for example, I'm working on adding support for a new AWS resource, Placement Groups, and at the moment, we aren't tagging it with user defined tags. We will need to do so for GA but it's not in the MVP.

I looked into this a couple of months back: one can tap into CloudTrail and see which resources were PUT by the IAM user that created the cluster, and then request the resource tags by the respective resource ID (then check for missing tags, etc.). We would have to set up a proper environment for this, I believe, so we didn't pursue it any further because of the effort.

Do you know how many new AWS (taggable) resources we introduce into core OpenShift per release? I was under the impression that this will continuously reduce towards zero as we focus on addon and component optionality. Placement Groups is certainly a great counter-example :)

This is another resource where we are creating but not updating the object and so I'm not sure how the tagging would work here, I guess it will be like the EBS situation?

I guess so, and there will for sure be other cases that could also be outside of our control in the future. We need to ensure to document them properly.
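The audit idea above (correlate CloudTrail create events with each resource's current tags, then check for gaps) could be codified roughly as follows; the function name and data shapes are hypothetical, standing in for whatever the CloudTrail and per-service tag lookups would return:

```python
def tag_deltas(expected, resource_tags):
    """For each resource (ID -> its current tags), report the expected
    tags that are missing or hold a stale value."""
    report = {}
    for resource_id, tags in resource_tags.items():
        delta = {k: v for k, v in expected.items() if tags.get(k) != v}
        if delta:
            report[resource_id] = delta
    return report

print(tag_deltas(
    {"team": "a", "cost-center": "42"},
    {"vol-0abc": {"team": "a"},                        # missing cost-center
     "sg-0def": {"team": "a", "cost-center": "42"}},   # fully tagged
))
# → {'vol-0abc': {'cost-center': '42'}}
```

A check like this could form the core of the E2E test suggested below: an empty report means every discovered resource carries the expected tags.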

Contributor:

I've been looking into this a couple months back, one can tap into CloudTrail and see what resources were PUT by the IAM user that created the cluster and then request the resource tags by the respective resource ID (then check for missing tags etc). We have to setup a proper environment for this I believe, so we didn't pursue this any further because of the effort.

This sounds like something we could possibly codify into an E2E test, have you considered that?

Do you know how many new AWS (taggable) resources we introduce into core OpenShift per release? I was under the impression that this will continuously reduce towards zero as we focus on addon and component optionality. Placement Groups is certainly a great counter-example :)

Our team frequently adds support for new EC2 features to the Machine API. Sometimes these rely on creating additional resources, as with the Placement Group example. I doubt we will see that slow down, as we still have many customer RFEs asking for support for various use cases on each of the clouds.

I guess so, and there will for sure be other cases that could also be outside of our control in the future. We need to ensure to document them properly.

Agreed, would be good to enumerate all of the components and the resources they create and have that documented. Sounds like your CloudTrail thing could be used to generate a list?

Contributor:

This sounds like something we could possibly codify into an E2E test, have you considered that?

I do remember that there was a problem with this approach and thus we dropped it, @TrilokGeer it's probably good to spin up a R&D task to figure out if we can test this case in the e2e too. @thejasn is working on testing right now I believe.

Agreed, if we can automate this to generate a list, that would be even cooler!

Contributor (Author):

@tjungblu @JoelSpeed yes, this can be checked out. Thanks.


This precedence helps ensure that the creator/updater tool (in the case of external tool usage) remains the same for the user-defined tags it created.

The userTags field is intended to be set at install time and to be updatable thereafter (deleting tags is not allowed). Components that respect this field must only ever add tags that they retrieve from this field to cloud resources; they must never remove tags from the existing underlying cloud resource, even if the tags are removed from this field.
Contributor:

I feel like we probably will want to have some way to deny removal of tags from the infrastructure spec. Either that, or we treat the spec as a source of requested tags, where users can make changes freely, but the status is where other components copy from, and we have the config controller sync the spec to the status based on the rules outlined in this document.

The latter of these is probably easier to implement and gives the benefit of an "observed config" which makes it easy for end users to see what OpenShift thinks it should be applying

I left a comment describing some of this on the API PR, openshift/api#1064 (comment)

One note: if we do go down the route of a sync between spec and status, how will we deal with the 25-tag limit for users? We will need to know which tags in the status were added by users and which were added by OpenShift (assuming OpenShift adds its own tags to the status list as well).
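A minimal sketch of the spec-to-status sync being discussed, assuming the 25-user-tag cap and that OpenShift-owned tags are tracked separately from user tags; rejecting overlapping keys is an assumed policy here, not something settled in this thread:

```python
MAX_USER_TAGS = 25  # the user-tag limit mentioned in this discussion

def sync_tags_to_status(spec_user_tags, openshift_tags):
    """Validate the user's requested tags and produce the observed set the
    status would carry. OpenShift-owned tags stay distinguishable so the
    cap is enforced only against user-added tags."""
    if len(spec_user_tags) > MAX_USER_TAGS:
        raise ValueError(f"at most {MAX_USER_TAGS} user tags are allowed")
    if set(spec_user_tags) & set(openshift_tags):
        # assumed policy: users may not shadow OpenShift-owned tags
        raise ValueError("user tags may not override OpenShift-owned tags")
    return {"openshift": dict(openshift_tags), "user": dict(spec_user_tags)}
```

Keeping the two buckets separate in status is one way to answer the question above of which tags are user-added versus OpenShift-added.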

Contributor:

We can keep the distinction between desired (spec) and observed (status) clear and still restrict unsupported values, using dynamic admission control.

Contributor:

We could use dynamic admission control for this, though at the moment, I don't think we have anything set up to observe the infrastructure object.

If that's the route you want to take here, I would consult the API team for their suggestion as you will need to create and manage a validating and/or mutating webhook which comes with some overhead

Contributor (Author):

This is a great point, thanks.

The idea is to have an "observed config" as a way to see how OpenShift deals with the tags.

`.status.platformStatus.aws.resourceTags` reflects the current set of userTags. If a tag that was created by the installer, or newly added to the infrastructure resource, is modified directly on an individual component using external tools, the value from the infrastructure resource takes precedence.

This way, the precedence of user tag updates can be maintained.

This precedence helps ensure that the creator/updater tool (in the case of external tool usage) remains the same for the user-defined tags it created.

Contributor:

I feel like we probably will want to have some way to deny removal of tags from the infrastructure spec.

Just my 2 cents here: I think it's totally fine from a customer PoV to remove tags from the infra spec to set those tags as "unmanaged" by OpenShift. Imagine switching to a 3rd-party component to manage tags, like this one: https://github.com/mtougeron/k8s-aws-ebs-tagger

Would be bad if a customer would need to reinstall the entire cluster to get rid of the tag management.

What we continue to NOT do, is try to be smart about deleting existing tags on resources or detecting what we created vs. what was already there and such.

Contributor:

Would be bad if a customer would need to reinstall the entire cluster to get rid of the tag management.

This is a pretty good argument. If that's the route we want to take, I'd say it needs to be written out in the enhancement. The existing language saying not to delete the tags will need to be updated/clarified.

staebler (Contributor) commented Jan 22, 2022:

is how we will deal with the 25 tag limit from users

The reality is even worse. Since S3 buckets can have at most 10 tags, and OpenShift is already using 2 of those, the user can only add 8 more tags.
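Using the numbers from this thread (a general cap of 25 user tags, and S3 buckets allowing only 10 tags total, 2 of which OpenShift already uses, leaving 8), a per-service capacity check might look like this; the table and function are illustrative, not from the implementation:

```python
# Effective user-tag capacity per service, per the figures in this thread.
USER_TAG_CAPACITY = {
    "default": 25,  # general user-tag cap
    "s3": 10 - 2,   # 10-tag S3 bucket limit minus 2 OpenShift-owned tags
}

def check_user_tag_count(user_tags, service="default"):
    """Raise if the user's tag set exceeds what the given service can hold."""
    cap = USER_TAG_CAPACITY.get(service, USER_TAG_CAPACITY["default"])
    if len(user_tags) > cap:
        raise ValueError(
            f"{service}: {len(user_tags)} user tags exceed the cap of {cap}")

check_user_tag_count({f"k{i}": "v" for i in range(8)}, "s3")  # exactly 8: OK
```

This suggests any admission-time validation would need to enforce the tightest per-service limit, not just the general cap.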

Review thread on enhancements/api-review/custom-tags-aws.md (outdated; resolved)
@staebler
Contributor

/cc @staebler

@TrilokGeer
Contributor Author

/assign @dhellmann

@tjungblu

/assign @tjungblu
/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 12, 2022
@TrilokGeer
Contributor Author

/label tide/merge-method-squash

@openshift-ci openshift-ci bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jan 12, 2022
TrilokGeer and others added 14 commits February 23, 2022 16:29
openshift-ci bot commented Feb 23, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tjungblu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Contributor

Given the volume of outstanding comments on this, I think for now we should remove the approval to prevent an accidental merge

/approve cancel

@TrilokGeer
Contributor Author

I'm still of the opinion that, until we address resources created by the installer, adding support for updating the user-defined tags is severely incomplete. Perhaps we could have an operator that behaves similarly to the installer's destroy path, namely by using the resource tagging API to search for resources with the kubernetes.io/.../cluster=owned tag. The operator would reconcile on changes to the infrastructure CR to ensure that resources owned by the cluster have tags matching what is in the infrastructure CR. That would get tricky when it came to resources owned by in-cluster operators, though, particularly with tags that may have different values in the corresponding kube resources.

For now, I think, it is better to have this as a limitation.
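The operator sketched in this comment (search by the ownership tag, as the installer's destroy path does via the AWS Resource Groups Tagging API, then reconcile tag values) reduces to decision logic like the following; the cluster infra ID and resource shapes are invented for illustration:

```python
CLUSTER_TAG = "kubernetes.io/cluster/demo-abc123"  # hypothetical infra ID

def owned(resources):
    """Resources the operator would reconcile: those carrying the cluster
    ownership tag with value 'owned', i.e. what a search by that tag
    (as the installer's destroy code does) would return."""
    return [r for r in resources if r["tags"].get(CLUSTER_TAG) == "owned"]

def tags_to_apply(desired, resource):
    """Tags to add or update on one resource; per the enhancement, tags
    are only ever added or updated, never removed from the cloud resource."""
    return {k: v for k, v in desired.items() if resource["tags"].get(k) != v}

fleet = [
    {"arn": "arn:aws:ec2:us-east-1:1:volume/vol-1",
     "tags": {CLUSTER_TAG: "owned", "team": "old"}},
    {"arn": "arn:aws:ec2:us-east-1:1:volume/vol-2", "tags": {}},  # not ours
]
for r in owned(fleet):
    print(r["arn"], tags_to_apply({"team": "new"}, r))
# → arn:aws:ec2:us-east-1:1:volume/vol-1 {'team': 'new'}
```

The tricky case named above shows up here too: resources owned by in-cluster operators may legitimately differ from the infrastructure CR, so a naive reconcile over everything tagged `owned` would fight those operators.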

openshift-ci bot commented Feb 25, 2022

@TrilokGeer: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@tjungblu

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2022
@openshift-merge-robot openshift-merge-robot merged commit e2d3fc8 into openshift:master Feb 25, 2022
@JoelSpeed
Contributor

I do not think this was ready to merge. There are a number of outstanding comments which have not been addressed and I'm still not clear exactly how this is going to work given we have some resources that have controllers and some that do not. IMO this should be reverted and a new PR should be opened to continue addressing the outstanding comments

@sdodson
Member

sdodson commented Feb 25, 2022

Reverted in #1044. @TrilokGeer Can you please open a new PR to re-introduce these changes and address outstanding questions there? Also, before we merge that we'll want to squash it down so that it's either one atomic change or at least a minimal set of logically grouped changes rather than 38 commits.
