Update VMSS to Mariner with FIPS enabled #3741

Merged: 2 commits, Aug 22, 2024

Conversation

@s-fairchild (Collaborator) commented Jul 30, 2024

Which issue this PR addresses:

Covers improving the bootstrap scripts: ARO-6773
Covers the move to Mariner OS for FIPS: ARO-8989

Fixes

What this PR does / why we need it:

This moves our RP and Gateway VMSS instances to Mariner OS, which is compatible with FIPS.
It is also needed to make RP deployments resilient to failure.
Our current implementation has shown faults in RP deployments over time and needs improvement in general.

Move all shared code into a common file that is sourced by all bootstrapping scripts. This allows code reuse with minimal duplication.
Change VMSS OS to Mariner with FIPS enabled by default.
Firewalls are configured at the Azure vnet level. I did not configure firewalld, because in my testing Mariner OS does not support firewalld out of the box.
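
For reviewers, a minimal sketch of the shared-file pattern described above. The file names `util-common.sh` and `rpVMSS.sh` come from this PR, but the contents below are placeholders, and the temporary directory and the body of `configure_sshd` are assumptions made only so the example runs on its own.

```shell
#!/bin/bash
set -eu

# A temp directory stands in for the real scripts directory (assumption).
workdir="$(mktemp -d)"

# Shared helpers live in one file. The function body is a placeholder.
cat > "$workdir/util-common.sh" <<'EOF'
configure_sshd() {
    echo "configure_sshd: setting ssh password authentication"
}
EOF

# A role-specific script sources the shared file and only orchestrates steps.
cat > "$workdir/rpVMSS.sh" <<'EOF'
#!/bin/bash
set -o errexit -o nounset -o pipefail
source "$(dirname "$0")/util-common.sh"

main() {
    configure_sshd
}

main "$@"
EOF

output="$(bash "$workdir/rpVMSS.sh")"
echo "$output"
rm -rf "$workdir"
```

With this layout, a fix to a shared step is made once and picked up by every script that sources the file.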

Test plan for issue:

Local Full Service Dev RP Testing

Tested during development by deploying a full service development RP environment.
Further testing will be done by deploying to INT. I will update this PR once the INT deployment succeeds.

Deployment to Canary Sector

Successfully deployed to Canary sector in pipeline run

Example Log Output

Abort
Error: Unable to find a match: azsec-clamav azsec-monitor azure-cli azure-mdsd azure-security
dnf_install_pkgs: failed to install required packages
Log
configure_sshd: setting ssh password authentication
configure_disk_partitions: extending partition table
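
The `function_name: message` shape of the log lines above could be produced by helpers along these lines. This is a sketch, not the code in this PR: the helper names `log` and `abort`, the use of bash's `FUNCNAME` array, and the placeholder function bodies are all assumptions.

```shell
#!/bin/bash

# log prints a message prefixed with the name of the calling function.
# FUNCNAME[0] is "log" itself, FUNCNAME[1] is the caller.
log() {
    local msg="${1:-}"
    echo "${FUNCNAME[1]}: ${msg}"
}

# abort reports on behalf of its caller on stderr, then exits non-zero.
abort() {
    local msg="${1:-}"
    echo "${FUNCNAME[1]}: ${msg}" >&2
    exit 1
}

# Placeholder step: only the log call is shown.
configure_sshd() {
    log "setting ssh password authentication"
}

# Placeholder step: `false` stands in for a failing package install.
dnf_install_pkgs() {
    false || abort "failed to install required packages"
}
```

Prefixing each line with the function name makes it easier to see which bootstrap step failed when reading the extension logs.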

Is there any documentation that needs to be updated for this PR?

No

How do you know this will function as expected in production?

Per testing mentioned in the test plan.

@s-fairchild added the chainsaw (Pull requests or issues owned by Team Chainsaw) label Jul 30, 2024
@s-fairchild changed the title from "S fairchild/aro 8989 update bootstrap" to "Update VMSS to Mariner with FIPS enabled" Jul 30, 2024
@s-fairchild s-fairchild force-pushed the s-fairchild/ARO-8989-update-bootstrap branch 6 times, most recently from 36529ee to 3863fee Compare August 9, 2024 16:03
@s-fairchild (Collaborator, Author)

azp run ci,e2e

@kimorris27 (Contributor) left a comment

There's a lot to review here. Thank you in advance for humoring my questions 🙇🏻

pkg/deploy/generator/resources_rp.go (resolved)
pkg/deploy/generator/resources_rp.go (resolved)
pkg/deploy/generator/resources_rp.go (outdated, resolved)
pkg/deploy/generator/scripts/util-common.sh (outdated, resolved)
pkg/deploy/generator/scripts/util-common.sh (outdated, resolved)
pkg/deploy/generator/scripts/rpVMSS.sh (resolved)
pkg/deploy/generator/scripts/gatewayVMSS.sh (resolved)
pkg/deploy/generator/scripts/gatewayVMSS.sh (outdated, resolved)
pkg/deploy/generator/scripts/gatewayVMSS.sh (resolved)
pkg/deploy/generator/scripts/devProxyVMSS.sh (outdated, resolved)
@s-fairchild added the release-blocker and next-release (To be included in the next RP release rollout) labels Aug 14, 2024
@tsatam (Collaborator) left a comment

Changes mostly LGTM - @kimorris27 covered most of the suggestions I have, and I'd like to see a deployment of this change in a non-dev environment.

pkg/deploy/generator/resources_rp.go (resolved)
@s-fairchild s-fairchild force-pushed the s-fairchild/ARO-8989-update-bootstrap branch from 8e6db08 to 374af12 Compare August 15, 2024 20:33
@s-fairchild (Collaborator, Author) commented Aug 16, 2024

> Do we have an understanding of when automatic updates are run, or is it just as-needed, outside of our control? Do we know how long they take to execute per VM? It seems like we'd need that to make adjustments to our alerting.
>
> I am in agreement that we should tolerate 1 unhealthy instance at a time, maximum, but only for a short period of time.

@cadenmarchese We could manually apply updates ourselves. I think that would require creating a pipeline to do so, which would give us more control. That's a topic I think is worth discussion amongst the team.

When updates are applied automatically, we have no real insight or control.
They don't provide a time estimate per machine. I have configured a 10 minute waiting period between updating machines, so an update will apply to instance 0, wait 10 minutes, then proceed with the next instance.

The OS updates re-provision the instance with an updated OS image. I believe that means our custom script will have to run after each update, so we'd see two reboots before the update is completed: one for the OS re-image, then our custom script runs and reboots on completion, and then the instance is back to responding to requests.

You can read their documentation here: Automatic Guest VM Patching

@s-fairchild (Collaborator, Author)

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

@cadenmarchese (Collaborator)

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

@s-fairchild s-fairchild force-pushed the s-fairchild/ARO-8989-update-bootstrap branch 2 times, most recently from b7cc458 to 08003b3 Compare August 20, 2024 16:00
@s-fairchild (Collaborator, Author)

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

@s-fairchild s-fairchild force-pushed the s-fairchild/ARO-8989-update-bootstrap branch from 08003b3 to f5debe1 Compare August 21, 2024 16:58
…ly Configured FIPS Mode

System Changes:

Remove LVM disk resize; Mariner does not use LVM, and the disk is automatically grown to the full size specified.
Remove semanage; Mariner Linux does not have SELinux configured.

Remove gateway log rotation config
Log rotation for the podman level driver log was not the correct
approach. The podman log driver is now journald, so all logs will be
shipped to journald rather than a ctr.log file.
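
A sketch of what the journald log driver looks like in practice, for anyone debugging on an instance. `--log-driver=journald` is a standard `podman run` option and `CONTAINER_NAME` is the journal field podman sets for container output; the function names, container name, and image below are hypothetical and not taken from this PR.

```shell
#!/bin/bash

# Start a detached container whose stdout/stderr go to journald
# instead of a ctr.log file. Name and image are supplied by the caller.
run_with_journald() {
    local name="$1"
    local image="$2"
    podman run --detach --name "$name" --log-driver=journald "$image"
}

# Read one container's logs back out of the journal.
show_container_logs() {
    local name="$1"
    journalctl --no-pager "CONTAINER_NAME=${name}"
}

# Example (placeholders):
#   run_with_journald aro-gateway example.invalid/gateway:tag
#   show_container_logs aro-gateway
```

Because journald handles its own retention, no separate rotation config is needed for the container log.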

FIPS mode is configured manually, following the example code at https://eng.ms/docs/products/azure-linux/features/security/fips
The cbl-mariner-2-gen2-fips SKU does not support Automatic OS Updates, so we are switching to cbl-mariner-2-gen2 and configuring FIPS mode manually to allow Automatic OS Updates.
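
Since FIPS mode is no longer guaranteed by the image SKU, a post-bootstrap check may be useful. The sketch below does not reproduce the linked configuration steps; it only reads the kernel flag at `/proc/sys/crypto/fips_enabled`, which reports `1` when the kernel is running in FIPS mode. The function name and the optional path argument (there so the check can be exercised against a test file) are assumptions.

```shell
#!/bin/bash

# Succeeds only if the kernel reports FIPS mode as enabled.
# The path can be overridden, which is only intended for testing.
fips_enabled() {
    local flag_file="${1:-/proc/sys/crypto/fips_enabled}"
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Example:
#   fips_enabled || echo "FIPS mode is not enabled" >&2
```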

Script Changes:

Restructure VMSS bootstrap bash scripts for increased reliability, and easier debugging
Move all shared code into a common file that is sourced by all
bootstrapping scripts. This allows code reuse with minimal duplication.

Fix mdm mdsd certificate download script
During mdm and mdsd setup, I've added wait steps for the download
scripts to complete getting certificates. Without this, the download
scripts are still running in a subshell when the certificate fix-up starts, so the fix-up fails.
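
The wait step can be illustrated as follows, assuming the downloads run as background jobs. `download_cert` is a hypothetical stand-in for the real mdm and mdsd download scripts, and the temporary directory and file names are placeholders.

```shell
#!/bin/bash

# Stand-in for a certificate download that takes a moment to finish.
download_cert() {
    local dest="$1"
    sleep 1
    echo "certificate placeholder" > "$dest"
}

certdir="$(mktemp -d)"

# Start both downloads in the background and remember their PIDs.
download_cert "$certdir/mdm.pem" &
mdm_pid=$!
download_cert "$certdir/mdsd.pem" &
mdsd_pid=$!

# Block until each download has finished, and stop if either one failed.
wait "$mdm_pid" || { echo "mdm certificate download failed" >&2; exit 1; }
wait "$mdsd_pid" || { echo "mdsd certificate download failed" >&2; exit 1; }

# Only now is it safe to fix up the downloaded files.
for cert in "$certdir"/*.pem; do
    chmod 0600 "$cert"
    echo "fixed up $(basename "$cert")"
done
```

Waiting on each PID individually also surfaces the exit status of each download, which a bare `wait` would not.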

Add firewalld configuration, required for podman networking
Add podman aro network creation to isolate RP containers from possible
interaction on the default podman network.
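
A sketch of idempotent network creation, assuming a podman version that provides `podman network exists`. The network name `aro` follows the description above; the function name is hypothetical.

```shell
#!/bin/bash

# Create the named podman network only if it does not already exist,
# so the bootstrap script can be re-run safely.
ensure_podman_network() {
    local network="${1:-aro}"
    if podman network exists "$network"; then
        return 0
    fi
    podman network create "$network"
}

# Containers then join the dedicated network instead of the default one:
#   podman run --network aro ...
```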

Package Changes:

Install Azure Security Monitor via VMSS Extension
Remove RHUI and Microsoft repo configuration, add Mariner Extended repo config
Increase rpm retry time to 30 minutes total, every 30 seconds.
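
The retry budget works out to 60 attempts at a 30 second interval (60 x 30 s = 30 minutes). A generic retry helper along these lines would express that; the function name and argument order are assumptions, not the helper used in this PR.

```shell
#!/bin/bash

# retry MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts="$1"
    local interval="$2"
    shift 2

    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt ${attempt}/${max_attempts} failed: $*" >&2
        attempt=$((attempt + 1))
        # Do not sleep after the final failed attempt.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$interval"
        fi
    done
    return 1
}

# Roughly 30 minutes in total, retrying every 30 seconds:
#   retry 60 30 <package command>
```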
Embed scripts as strings rather than []byte. This reduces the number of type conversions needed.
@s-fairchild s-fairchild force-pushed the s-fairchild/ARO-8989-update-bootstrap branch from f5debe1 to 169f42c Compare August 21, 2024 17:55
@s-fairchild (Collaborator, Author)

/azp run e2e

Azure Pipelines successfully started running 1 pipeline(s).

@kimorris27 (Contributor) left a comment

My remaining outstanding comments are small things, so they can be addressed in a follow-up PR 👍🏻

@tsatam tsatam merged commit 89cf7d3 into master Aug 22, 2024
44 of 47 checks passed
@s-fairchild s-fairchild deleted the s-fairchild/ARO-8989-update-bootstrap branch August 22, 2024 14:38
@s-fairchild s-fairchild mentioned this pull request Sep 13, 2024
edisonLcardenas pushed a commit that referenced this pull request Sep 16, 2024
* Update RP and Gateway vmss OS image to cbl-mariner-2-gen2 with Manually Configured FIPS Mode

* Embed scripts as strings rather than []byte

This is to reduce the amount of type conversions needed.
Labels
chainsaw Pull requests or issues owned by Team Chainsaw next-release To be included in the next RP release rollout release-blocker
5 participants