-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
microsoft/azure: allow empty certificate chain in PKCS12 file #1074
Conversation
67b15ec
to
cbdb2a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly LGTM
Thanks for working on this!
src/providers/microsoft/azure/mod.rs
Outdated
Ok(k) => k, | ||
Err(e) => match e.root_cause().to_string().as_ref() { | ||
"SSH public key not found in chain" => { | ||
warn!("SSH pubkeys requested, but not provisioned for this instance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the right warning to give to users, this makes it seem like we desired something (pubkeys requested
), but couldn't find them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something like:
No SSH keys found in the certificate chain for this instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a much better warning.
While we have this open, should I change the message a few lines up as well? I was attempting to show the warning we would have expected prior to coreos/fedora-coreos-tracker#1553
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warn!("SSH pubkeys requested, but not provisioned for this instance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since that warning would display when the certs endpoint doesn't exist, maybe that one could just read:
SSH pubkeys not provisioned for this instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me
2db1a5d
to
250091c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
docs/release-notes.md
Outdated
@@ -12,6 +12,8 @@ Major changes: | |||
|
|||
Minor changes: | |||
|
|||
- Fix Azure SSH key fetching when no key provisioned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: we usually don't have a trailing period in these notes
src/providers/microsoft/azure/mod.rs
Outdated
let key = match self.get_ssh_pubkey(certs_endpoint) { | ||
Ok(k) => k, | ||
Err(e) => match e.root_cause().to_string().as_ref() { | ||
"detected empty certificate chain" => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So instead of string matching errors, a more rustic way to do this is to have get_ssh_pubkey()
return a Result<Option<PublicKey>>
. Then, get_ssh_pubkey()
would return Ok(None)
for this scenario. And here, we'd do e.g.:
let key: Vec<PublicKey> = self.get_ssh_pubkeys(certs_endpoint)?.unwrap_or_default();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the line based on your suggestion. Since PublicKey
doesn't implement Default, I used unwrap_or(vec![])
instead to assign an empty vector if a None value is returned.
src/providers/microsoft/azure/mod.rs
Outdated
Ok(k) => k, | ||
Err(e) => match e.root_cause().to_string().as_ref() { | ||
"detected empty certificate chain" => { | ||
warn!("No SSH keys found in the certificate chain for this instance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this cause for warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not warn, but at least an info log statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it is cause for warning, but I chose to warn here because that's the behavior we expected when an SSH key wasn't supplied before this issue arose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to make sure I understand this: before, Azure would not offer up a certificate endpoint if no SSH keys were provided, but now it always does and instead signals no SSH keys through the certificate itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, yes. I don't think Azure is directly signaling that no SSH key is provided through the certificate, but that's what we've found so far: that the certificate chain is empty when no SSH key is provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, if the new behaviour is that the certificate chain is empty when no SSH keys are present, then it should not even be considered a warning since it's completely supported to not pass an SSH key. For the same reason as the other comment, I wouldn't even info!
it (just like other providers don't info!
if they didn't get a key), but not strongly against. We already log something when we do write SSH keys.
debug!()
sounds appropriate as a debugging aid if we want and could live directly in src/providers/microsoft/crypto/mod.rs
where we return Ok(None)
so that the calling code here can stay simple with unwrap_or_default()
(otherwise, to log it here, we'd need to inspect the Option
first to know if it's None
).
src/providers/microsoft/azure/mod.rs
Outdated
@@ -403,7 +403,7 @@ impl MetadataProvider for Azure { | |||
let certs_endpoint = match goalstate.certs_endpoint() { | |||
Some(ep) => ep, | |||
None => { | |||
warn!("SSH pubkeys requested, but not provisioned for this instance"); | |||
warn!("SSH pubkeys not provisioned for this instance"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "requested" wording in this warning refers to the fact that something called afterburn --ssh-keys USER
but we couldn't find anything. And so at this layer of this stack, the warning as written previously was accurate. (Even though we know that at the FCOS layer, it's because we unconditionally run this service for the core
user.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does afterburn --ssh-keys USER
imply, though?
It's hard not to look at this from the lens of FCOS where we are saying essentially "if there are SSH keys give them to me".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at the code for the other providers and the history of this line, I think we should drop this warning entirely. Other providers don't warn in the case that no SSH keys were found, and AFAICT there is no reason to treat Azure differently.
This test is still failing, but there's a PR open to resolve this. Let's snooze for another week so we can land: coreos/afterburn#1074
This test is still failing, but there's a PR open to resolve this. Let's snooze for another week so we can land: coreos/afterburn#1074
28baf16
to
388b98d
Compare
I reworked this PR and updated the description based on the review comments. |
This test is still failing. Let's snooze again while we wait for coreos/afterburn#1074 to land.
458dafc
to
c3aa867
Compare
The "Rust / Lints, pinned toolchain (pull_request)" check is failing on |
This test is still failing. Let's snooze again while we wait for coreos/afterburn#1074 to land.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Rust / Lints, pinned toolchain (pull_request)" check is failing on cargo fmt -- --check -l, but I get no errors when I run this command locally on this branch. What am I missing here?
This likely means that you're using a different version than what CI is using. You can see the version it's using in
afterburn/.github/workflows/rust.yml
Line 21 in 547c950
ACTIONS_LINTS_TOOLCHAIN: 1.75.0 |
dnf update rust cargo clippy rustfmt
) or from upstream if you used rustup (e.g. rustup update
).
src/providers/microsoft/azure/mod.rs
Outdated
|
||
Ok(ssh_pubkey) | ||
Ok(Some(vec![ssh_pubkey])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making the signature of this function as Result<Option<PublicKey>>
and then here it'd just be
Ok(Some(vec![ssh_pubkey])) | |
Ok(Some(ssh_pubkey)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think once we do the above, we can shrink this whole thing to just call crypto::p12_to_ssh_pubkey(&p12).context("failed to convert pkcs12 blob to ssh pubkey")
as the final line of the function since it already matches the signature we need.
src/providers/microsoft/azure/mod.rs
Outdated
let key = self.get_ssh_pubkey(certs_endpoint)?; | ||
Ok(vec![key]) | ||
let key: Vec<PublicKey> = self.get_ssh_pubkey(certs_endpoint)?.unwrap_or(vec![]); | ||
|
||
Ok(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then this code here would remain untouched.
IOW, the lower-level get_ssh_pubkey()
function can just focus on doing the obvious thing and this function here is where we have to deal with what our callers expect from us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having a difficult time figuring out how to return a vector on None
without inspecting the Option like
afterburn/src/providers/microsoft/azure/mod.rs
Lines 406 to 408 in c3aa867
let certs_endpoint = match goalstate.certs_endpoint() { | |
Some(ep) => ep, | |
None => return Ok(vec![]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right sorry, I missed that detail here. There's a neat Rust trick which is that Option
actually supports iteration, in which case it'll iterate zero times or once (if the item exists). So something like Ok(maybe_key.into_iter().collect())
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks! I made that change.
a032055
to
21b1e89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit, LGTM otherwise!
src/providers/microsoft/azure/mod.rs
Outdated
let ssh_pubkey = match crypto::p12_to_ssh_pubkey(&p12) | ||
.context("failed to convert pkcs12 blob to ssh pubkey")? | ||
{ | ||
Some(key) => key, | ||
None => return Ok(None), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that all this can now be
let ssh_pubkey = match crypto::p12_to_ssh_pubkey(&p12) | |
.context("failed to convert pkcs12 blob to ssh pubkey")? | |
{ | |
Some(key) => key, | |
None => return Ok(None), | |
}; | |
crypto::p12_to_ssh_pubkey(&p12) | |
.context("failed to convert pkcs12 blob to ssh pubkey") |
And we can remove the Ok()
line below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated! Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt update it properly. I'll work on it.
a82baf2
to
04be6a4
Compare
Azure is populating the certs endpoint even when an SSH key is not provided. The certificate chain in the PKCS12 file is empty in this case causing afterburn to fail with the current logic. Check if the certificate chain is empty before retrieving the SSH public key and handle this case as a valid option. The more general `crypto/mod.rs` was updated here which also affected the azurestack code. Update azurestack to accomodate the changes when retrieving the SSH public key.
04be6a4
to
622b0d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Extend the snooze on the `coreos.ignition.ssh.key` test failure for another month until a new release of Afterburn can be performed that will include the fix for the failure: coreos/afterburn#1074
Extend the snooze on the `coreos.ignition.ssh.key` test failure for another month until a new release of Afterburn can be performed that will include the fix for the failure: coreos/afterburn#1074
Extend the snooze on the `coreos.ignition.ssh.key` test failure for another month until a new release of Afterburn can be performed that will include the fix for the failure: coreos/afterburn#1074. A release is pending in: coreos/afterburn#1095
Extend the snooze on the `coreos.ignition.ssh.key` test failure for another month until a new release of Afterburn can be performed that will include the fix for the failure: coreos/afterburn#1074. A release is pending in: coreos/afterburn#1095
Azure is populating the certs endpoint even when an SSH key is not
provided. The certificate chain in the PKCS12 file is empty in this
case causing afterburn to fail with the current logic. Check if the
certificate chain is empty before retrieving the SSH public key
and handle this case as a valid option.
The more general
crypto/mod.rs
was updated here which also affectedthe azurestack code. Update azurestack to accommodate the changes when
retrieving the SSH public key.
See: coreos/fedora-coreos-tracker#1553