Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add var automatic_channel_upgrade #281

Merged
merged 4 commits into from
Dec 15, 2022
Merged

feat: add var automatic_channel_upgrade #281

merged 4 commits into from
Dec 15, 2022

Conversation

the-technat
Copy link
Contributor

@the-technat the-technat commented Dec 9, 2022

Since AKS now supports automatic upgrades, we @swisspost like to use this feature. I addressed the concerns described in the issue with the following tests:

  • Created cluster on v1.23.8 using examples/startup.
  • Configured automatic_channel_upgrade=patch using Terraform
  • Manually upgrade cluster to v1.23.12 using Azure CLI
  • Rerun Terraform

=> As expected Terraform will downgrade the cluster to v1.23.8, same must the case when automatically patching minor versions and specifying the version.

So the correct usage would be that one specifies only the minor version (e.g 1.23) and enables automatic patch upgrade or omits the version field completely and enables automatic minor upgrades. But since Terraform variable validation can't reference other variables I can't catch misconfiguration there. But If I'm honest, it makes absolutely sense to me, that you don't specify the patch version if you want automatic upgrades.

Note: the default value must be null and note none as the Provider docs would suggest. I got errors in validation when trying to pass none as default value.

Issue number

Closes #280

Checklist before requesting a review

  • The pr title can be used to describe what this pr did in CHANGELOG.md file
  • I have executed pre-commit on my machine
  • I have passed pr-check on my machine

@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2022

Potential Breaking Changes in 07aef57:
[update] "Variables.log_analytics_solution_id.Nullable" from 'true' to ''
[update] "Variables.log_analytics_workspace.Nullable" from 'true' to ''
[update] "Variables.log_analytics_workspace_resource_group_name.Nullable" from 'true' to ''

variables.tf Show resolved Hide resolved
variables.tf Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Member

@lonegunmanb lonegunmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @the-technat thanks for opening this pr! The pr-check job failed so would you please run the pre-commit and pr-check job on your machine as the document instructed then try again?

I also recommend to add this new variable in examples/startup's code so our e2e test could cover it. Again, thanks for your contribution!

main.tf Outdated Show resolved Hide resolved
variables.tf Outdated Show resolved Hide resolved
variables.tf Outdated Show resolved Hide resolved
main.tf Outdated Show resolved Hide resolved
main.tf Outdated Show resolved Hide resolved
main.tf Outdated
condition = (var.automatic_channel_upgrade == null || (
// or we have set it to patch and only specify the minor version
contains(["patch"], var.automatic_channel_upgrade) &&
matches("^[0-9]{1,}\\.[0-9]{1,}$", var.kubernetes_version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any special reason we should use this regex here instead of determining whether var.kubernetes_version is null or empty?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lonegunmanb when using the patch channel the user can specify a minor version, like 1.20 or 1.21, but the user cannot specify a patch version, like 1.20.0 or 1.21.4.
This is because if the cluster is autoupgraded to a newer patch version because of the channel setting, we dont want Terraform to roll back that change because a specific patch version is specified in the variable kubernetes_version.
Instead setting a minor version does not trigger a rollback when the patch version is upgraded by the channel

Copy link
Contributor Author

@the-technat the-technat Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Simply checking if it's null is only viable if someone set automatic_channel_upgrade=stable|rapid since then AKS automatically updates to new minor versions and no one only specifies the major version I guess. But with automatic patch upgrades we want to make sure that a valid minor version (e.g 1.24 or 1.25) is specified, thus the regex matching those versions.

Copy link
Member

@lonegunmanb lonegunmanb Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm not an aks expert, I'll leave the shot to @zioproto , you make the call.

@zioproto
Copy link
Collaborator

@the-technat I have to go back on the nullable = true removals in the variables.tf file. Those changes were correct. The CI was updated, and now tflint shows as redundant to have nullable = true when the default value is already null.

Please dont fix it in this PR. Could you please rebase your branch on the current master branch ? This will solve that issue.

@the-technat
Copy link
Contributor Author

@zioproto, @lonegunmanb I addressed some of your suggestions already, rebased the PR and fixed the pre-commit/pr-test check.

I'm happy to add some unit tests as well but first we need to find the correct way of validation. @lonegunmanb suggested moving the checks into locals.tf cause they are quite complex. I like this idea but locals can only control the value of something, not if it get's applied or not, so what about doing the following:

  • remove automatic_channel_upgrade variable completely
  • set the automatic_channel_upgrade filed on the cluster resource based on the version the user specifies
    • if kubernetes_version=1.23.12, set automatic_channel_upgrade=null
    • if kubernetes_version=1.23, set automatic_channel_upgrade=patch
    • if kubernetes_version=null, set automatic_channel_upgrade=stable (or rapid, maybe controlled via extra input var)

@lonegunmanb
Copy link
Member

@zioproto, @lonegunmanb I addressed some of your suggestions already, rebased the PR and fixed the pre-commit/pr-test check.

I'm happy to add some unit tests as well but first we need to find the correct way of validation. @lonegunmanb suggested moving the checks into locals.tf cause they are quite complex. I like this idea but locals can only control the value of something, not if it get's applied or not, so what about doing the following:

  • remove automatic_channel_upgrade variable completely

  • set the automatic_channel_upgrade filed on the cluster resource based on the version the user specifies

    • if kubernetes_version=1.23.12, set automatic_channel_upgrade=null
    • if kubernetes_version=1.23, set automatic_channel_upgrade=patch
    • if kubernetes_version=null, set automatic_channel_upgrade=stable (or rapid, maybe controlled via extra input var)

I think we could just extract the expression we're using now for the condition in the precondition block into a new local variable, and this expression would only referred variable, so we can mock different values to test different cases. Could we do that?

@the-technat
Copy link
Contributor Author

@lonegunmanb ah now I see what you meant. Yeah that should work. I'll try to do it.

- AKS clusters can now be automatically upgraded to new patch or minor versions
- Added unit tests to covert differnet auto-upgrade settings
- Extended startup example to enable automatic patch upgrades (most common use-case)

Signed-off-by: Nathanael Liechti <[email protected]>
@the-technat
Copy link
Contributor Author

@lonegunmanb I added a table-driven unit test to ensure that the upgrade check produces the right result. Is this what you expected?

@the-technat
Copy link
Contributor Author

Tests still fail but locally they pass.

@lonegunmanb
Copy link
Member

Tests still fail but locally they pass.

No worries, I'll check the test.

@lonegunmanb
Copy link
Member

Tests still fail but locally they pass.

Apology for the inconvenience @the-technat , after a quick glance your unit tests are good, the failure was caused by Checkov check. I'll keep digging.

@lonegunmanb
Copy link
Member

lonegunmanb commented Dec 14, 2022

Hi @the-technat , thanks for your patience.

It looks like Checkov (the policy checking tool we're using now in this pipeline) was broken, to fix the Checkov we need upgrade Checkov's version in our CI runner image, but that brought another issue, in newer Checkov they've introduced new policies for Aks resource, which this module failed to comply for now.

I don't want to make this pr too complex, so I want to skip these new policies, would you please help me by adding this config file with you pr so the Checkov would just skip these policies? The new checkov file name should be .checkov_config.yaml, that would fix our issue.

@the-technat
Copy link
Contributor Author

the-technat commented Dec 14, 2022

@lonegunmanb no problem, we all well know flaky CI tests 😉. I've added the config file as you recommended and now the tests seem to pass here as well. Anything else that needs to be addressed?

Copy link
Member

@lonegunmanb lonegunmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @the-technat for your update, almost LGTM, once the e2e test passed, only a tiny change request to solve, one extra commit contains only one line change would make this pr been merged.

locals.tf Outdated Show resolved Hide resolved
@the-technat
Copy link
Contributor Author

Based on
image

I switched the version to 1.24 in e2e tests. @lonegunmanb would you rerun them?

@the-technat the-technat temporarily deployed to acctests December 14, 2022 23:57 — with GitHub Actions Inactive
@lonegunmanb
Copy link
Member

Based on image

I switched the version to 1.24 in e2e tests. @lonegunmanb would you rerun them?

With pleasure!

Copy link
Member

@lonegunmanb lonegunmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @the-technat , LGTM! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Auto-upgrade feature
3 participants