Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARO-9570: Add a controller to the ARO operator to lay down etc hosts machine config #3771

Closed
wants to merge 1 commit into from

Conversation

lranjbar
Copy link
Collaborator

@lranjbar lranjbar commented Aug 15, 2024

Which issue this PR addresses:

Fixes ARO-9570

What this PR does / why we need it:

This PR adds a MachineConfigs for the ARO operator to lay down:

99-master-aro-etc-hosts-gateway-domains
99-worker-aro-etc-hosts-gateway-domains

During the UDR installation incident we have gone back and forth on how resolve the root issue. The root cause is that the gateway domains are not populated for OpenShift images to be pulled and installed. These domains are IPs so we can pass them into /etc/hosts to resolve them before other DNS resolution options are available.

Test plan for issue:

I've some minimal smoke testing on the operator with this PR.

Is there any documentation that needs to be updated for this PR?

How do you know this will function as expected in production?

Copy link
Contributor

@kimorris27 kimorris27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM broadly speaking, but I asked a few questions about parts I'm uncertain about. It would also be nice to have E2E tests, though they could probably go in a separate PR if we're in a rush.

Also, I don't think we can run E2E unless you reopen your PR using a branch directly off of Azure/ARO-RP. I'll copy my review comments over to the new PR once you're opened it to keep things organized in one place.

pkg/operator/controllers/etchosts/etchosts.go Show resolved Hide resolved
pkg/operator/controllers/etchosts/etchosts.go Show resolved Hide resolved
Comment on lines +212 to +244
b, err := json.Marshal(ignConfig)
if err != nil {
return nil, err
}

// canonicalise the machineconfig payload the same way as MCO
var i interface{}
err = json.Unmarshal(b, &i)
if err != nil {
return nil, err
}

rawExt := runtime.RawExtension{}
rawExt.Raw, err = json.Marshal(i)
if err != nil {
return nil, err
}

return &mcfgv1.MachineConfig{
TypeMeta: metav1.TypeMeta{
APIVersion: mcfgv1.SchemeGroupVersion.String(),
Kind: "MachineConfig",
},
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("99-%s-aro-etc-hosts-gateway-domains", role),
Labels: map[string]string{
"machineconfiguration.openshift.io/role": role,
},
},
Spec: mcfgv1.MachineConfigSpec{
Config: rawExt,
},
}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly necessary for this PR if we're in a rush to get this merged, but this portion seems like it could be factored out into a shared utility function that accepts the ignition config, the role, and the MachineConfig name and returns the MachineConfig.

)

const (
EtcHostsMachineConfigControllerName = "DnsmasqMachineConfig"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super up-to-speed with the full incident response plan, but I guess this name choice indicates that this controller is intended to completely replace the existing dnsmasq MachineConfig controller?

If so, does it need the additional controllers for watching the ARO cluster CR and the MCPs (like the existing controllers in pkg/operator/controllers/dnsmasq)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, its actually just that I missed this rename. This code its based off the dnsmasq controller. It's not super clear to me why that controller also watches those things. When you place the machine config MCO will reconcile the MCPs. There should be no reason to watch the CR at all.

@SudoBrendan
Copy link
Collaborator

/azp run ci

@SudoBrendan
Copy link
Collaborator

/azp run e2e

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

No pipelines are associated with this pull request.

@SudoBrendan
Copy link
Collaborator

No pipelines are associated with this pull request.

@lranjbar yea - kipp is right - can you move this to a branch under ARO-RP so we can run our pipelines? This is a result of security hardening... you can run make init-contrib to set up some local hooks on branch naming strategies, generally, <GITHUB_USERNAME>/<JIRA_ID><other-optional-text>.

@lranjbar
Copy link
Collaborator Author

Opened #3781 to merge from branch inside the repo

@lranjbar lranjbar closed this Aug 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next-release To be included in the next RP release rollout
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants