-
Notifications
You must be signed in to change notification settings - Fork 410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-11437: MCO keeps the pull secret to .orig file once it replaced #3759
OCPBUGS-11437: MCO keeps the pull secret to .orig file once it replaced #3759
Conversation
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Skipping CI for Draft Pull Request. |
/test all |
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest-required |
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
24a1f49
to
1845ca2
Compare
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Verified using IPI on OSP
The file is created and removed normally. We can read the delete-stale logs when removing the file
When we deploy the MC the file /var/lib/kubelet/config.json has this content: {} When we remove the MC the file /var/lib/kubelet/config.json has the content defined in the pull-secret.
After configuring the emtpy pull-secret the file /var/lib/kubelet/config.json does not exist
Before the upgrade, the file exists and contains the right content:
After the upgrade, the file still exists
Even if we re-configure the pull-secret, and we set an empty pull-secret, the file is not removed.
Look for that file:
We will use /etc/tpm2-tss/fapi-config.json. We create a MC to replace the file in the machine
The the configuration is applied without problems. We see that there is no "orig" file created for this file:
When we remove the MC, the file is deleted. We can see these lines in the daemon logs
And the original file is lost:
My main concerns after the verification steps are:
Before adding the qe-appoved label I would like to make sure that those behaviors are expected/OK. Thank you very much! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ines! The general approach makes sense to me, some comments:
When we upgrade a cluster to a version with this fix, the file /etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdorig is still present in the cluster, and is not removed.
Agree with Sergio. I think eventually we want to remove the orig approach. We could make it such that we delete the orig file entirely for the pull secret even on upgrades as noted by Sergio's test process. I don't think it must be part of this PR though, so it depends on what you want to do @inesqyx
Also minor nit: if you can update the commit message to also have some of the context in your PR message, that would be good as well!
pkg/daemon/file_writers.go
Outdated
@@ -60,6 +61,12 @@ func createOrigFile(fromPath, fpath string) error { | |||
return writeFileAtomicallyWithDefaults(noOrigFileStampName(fpath), nil) | |||
} | |||
|
|||
// https://issues.redhat.com/browse/OCPBUGS-11437 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When commenting a bug like this, it means that the code that follows is intended as a workaround, and we would want to remove the code at a later time.
I think that makes sense here, since we eventually want to fix the "orig file" implementation, either separately or eventually as part of layering. Maybe it would be good to mention that here? i.e. this code can be eventually superceded by a proper fix to the underlying problem
pkg/daemon/file_writers.go
Outdated
@@ -60,6 +61,12 @@ func createOrigFile(fromPath, fpath string) error { | |||
return writeFileAtomicallyWithDefaults(noOrigFileStampName(fpath), nil) | |||
} | |||
|
|||
// https://issues.redhat.com/browse/OCPBUGS-11437 | |||
// MCO keeps the pull secret to .orig file once it replaced | |||
if strings.HasSuffix(fpath, "config.json") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think like Sergio mentions, this probably should check specifically for the pull secret path and not just any config.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @sergiordlr and @yuqi-zhang for pointing this out. I have made the changes to more carefully point at the desired path, and yet to be tested.
I set the jira status to "assigned" again, please, set it back to "post" when the PR is ready for verification again. |
1845ca2
to
27901de
Compare
/retest-required |
1 similar comment
/retest-required |
I think we can more categorically exclude everything in That would avoid hardcoding anything relating to the pull secret. |
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
f28b928
to
7e615b5
Compare
/retest-required |
73fa61d
to
9e77c24
Compare
/retest-required |
e08f5db
to
8a36c31
Compare
/retest-required |
1 similar comment
/retest-required |
/test unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nits on logs and comments, otherwise lgtm!
pkg/daemon/file_writers.go
Outdated
@@ -45,12 +46,80 @@ func noOrigFileStampName(fpath string) string { | |||
return filepath.Join(noOrigParentDir(), fpath+".mcdnoorig") | |||
} | |||
|
|||
func isFileOwnedByRPMPkg(fpath string) (bool, bool, error) { | |||
// The first bool returned indicated whether the underlying OS is Mac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for clarity purposes, maybe something like:
The first bool returns false if rpm exists, and true otherwise (indicating other Linux distros or Mac). We would like to skip orig file creation and preservation
pkg/daemon/file_writers.go
Outdated
} | ||
} else if isMac { | ||
// Run on Mac | ||
klog.Infof("Running on Mac,skip orig file preservation.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same as above, I think rephrase to "rpm binary not found", and change the references from Mac to "rpmNotFound" or similar, I think that's more clear in case someone gets this error on a non-fedora/rhel machine for some reason
pkg/daemon/update.go
Outdated
} | ||
} else if isMac { | ||
// Run on Mac | ||
klog.Infof("Running on Mac,skip file restoration.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same as above
pkg/daemon/update_test.go
Outdated
assert.Nil(t, errr) | ||
assert.Nil(t, errr) | ||
// fileNotOwnedMsg := "not owned" | ||
// if strings.Contains(string(out), fileNotOwnedMsg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove unused code
pkg/daemon/update_test.go
Outdated
assert.Nil(t, err) | ||
// Check whether the given path is owned by an rpm pkg | ||
cmd := exec.Command(path, "-qf", path) | ||
_, errr := cmd.CombinedOutput() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, why is this errr? Why not just _, err = cmd.CombinedOutput()
@inesqyx @yuqi-zhang any update on this PR? we have a automation PR depends on this one. https://github.com/openshift/openshift-tests-private/pull/11128 |
Hey, Rio! I am back from the PTO and sorry for the late reply. I will have the changes in place and make it through in a day or so. Should be quick. Thanks for the reminder! |
8a36c31
to
39cfba3
Compare
/retest-required |
…are claimed by the rpm; delete all incorrectly preserved orig files and create noorigs for them
39cfba3
to
1cebf67
Compare
@inesqyx: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Thanks for the fixes! @rioliu-rh would you say this is a critical fix then? Merging to master (4.14) is currently blocked unless it can be considered a critical fix
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: inesqyx, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required |
/jira refresh The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target different OCP), recalculating validity. |
@openshift-bot: This pull request references Jira Issue OCPBUGS-11437, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh |
@inesqyx: This pull request references Jira Issue OCPBUGS-11437, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@inesqyx: Jira Issue OCPBUGS-11437: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-11437 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
The pull secret is preserved in the path /etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdorig. Whereas, the user needs it to be deleted when the pull secret is being deleted/ replaced.
There is complaint about that the MCD is creating orig files to preserve files that should not be preserved. To be more specific, the orig preserving system should only preserve files that are originally on the disk, but are now preserving files that are written by Ignition. The root cause is that Ignition writes the files before the MCD takes the control as a result when MCD comes in, it sees all the files written before as files on disk, which is very wrong.
Add more constraints for orig file generation: orig files should only be generated on files that 1) claimed by the rpm 2) exist in the usr/etc directory 3) exist before MCD takes control
Ignition writes files in the /var directory. And now when cat into the orig folder for the MCD, there is no var folder, meaning that files written by Ignition are not preserved now. Specific example would be that catting into /etc/machine-config-daemon/orig/var/lib/kubelet/config.json.mcdori will now return no so such file