Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

microsoft/azure: allow empty certificate chain in PKCS12 file #1074

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

marmijo
Copy link
Member

@marmijo marmijo commented May 8, 2024

Azure is populating the certs endpoint even when an SSH key is not
provided. The certificate chain in the PKCS12 file is empty in this
case causing afterburn to fail with the current logic. Check if the
certificate chain is empty before retrieving the SSH public key
and handle this case as a valid option.

The more general crypto/mod.rs was updated here which also affected
the azurestack code. Update azurestack to accommodate the changes when
retrieving the SSH public key.

See: coreos/fedora-coreos-tracker#1553

Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM

Thanks for working on this!

src/providers/microsoft/azure/mod.rs Outdated Show resolved Hide resolved
Ok(k) => k,
Err(e) => match e.root_cause().to_string().as_ref() {
"SSH public key not found in chain" => {
warn!("SSH pubkeys requested, but not provisioned for this instance");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right warning to give to users, this makes it seem like we desired something (pubkeys requested), but couldn't find them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe something like:

No SSH keys found in the certificate chain for this instance. 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a much better warning.
While we have this open, should I change the message a few lines up as well? I was attempting to show the warning we would have expected prior to coreos/fedora-coreos-tracker#1553

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warn!("SSH pubkeys requested, but not provisioned for this instance");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since that warning would display when the certs endpoint doesn't exist, maybe that one could just read:

SSH pubkeys not provisioned for this instance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me

src/providers/microsoft/crypto/mod.rs Outdated Show resolved Hide resolved
@marmijo marmijo force-pushed the marmijo-allow-empty-ca branch 2 times, most recently from 2db1a5d to 250091c Compare May 9, 2024 21:42
Copy link
Member

@dustymabe dustymabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -12,6 +12,8 @@ Major changes:

Minor changes:

- Fix Azure SSH key fetching when no key provisioned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: we usually don't have a trailing period in these notes

let key = match self.get_ssh_pubkey(certs_endpoint) {
Ok(k) => k,
Err(e) => match e.root_cause().to_string().as_ref() {
"detected empty certificate chain" => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of string matching errors, a more rustic way to do this is to have get_ssh_pubkey() return a Result<Option<PublicKey>>. Then, get_ssh_pubkey() would return Ok(None) for this scenario. And here, we'd do e.g.:

let key: Vec<PublicKey> = self.get_ssh_pubkeys(certs_endpoint)?.unwrap_or_default();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the line based on your suggestion. Since PublicKey doesn't implement Default, I used unwrap_or(vec![]) instead to assign an empty vector if a None value is returned.

Ok(k) => k,
Err(e) => match e.root_cause().to_string().as_ref() {
"detected empty certificate chain" => {
warn!("No SSH keys found in the certificate chain for this instance");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this cause for warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not warn, but at least an info log statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it is cause for warning, but I chose to warn here because that's the behavior we expected when an SSH key wasn't supplied before this issue arose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to make sure I understand this: before, Azure would not offer up a certificate endpoint if no SSH keys were provided, but now it always does and instead signals no SSH keys through the certificate itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, yes. I don't think Azure is directly signaling that no SSH key is provided through the certificate, but that's what we've found so far: that the certificate chain is empty when no SSH key is provided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, if the new behaviour is that the certificate chain is empty when no SSH keys are present, then it should not even be considered a warning since it's completely supported to not pass an SSH key. For the same reason as the other comment, I wouldn't even info! it (just like other providers don't info! if they didn't get a key), but not strongly against. We already log something when we do write SSH keys.

debug!() sounds appropriate as a debugging aid if we want and could live directly in src/providers/microsoft/crypto/mod.rs where we return Ok(None) so that the calling code here can stay simple with unwrap_or_default() (otherwise, to log it here, we'd need to inspect the Option first to know if it's None).

@@ -403,7 +403,7 @@ impl MetadataProvider for Azure {
let certs_endpoint = match goalstate.certs_endpoint() {
Some(ep) => ep,
None => {
warn!("SSH pubkeys requested, but not provisioned for this instance");
warn!("SSH pubkeys not provisioned for this instance");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "requested" wording in this warning refers to the fact that something called afterburn --ssh-keys USER but we couldn't find anything. And so at this layer of this stack, the warning as written previously was accurate. (Even though we know that at the FCOS layer, it's because we unconditionally run this service for the core user.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does afterburn --ssh-keys USER imply, though?

It's hard not to look at this from the lens of FCOS where we are saying essentially "if there are SSH keys give them to me".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the code for the other providers and the history of this line, I think we should drop this warning entirely. Other providers don't warn in the case that no SSH keys were found, and AFAICT there is no reason to treat Azure differently.

marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request May 13, 2024
This test is still failing, but there's a PR open to resolve this.
Let's snooze for another week so we can land:
coreos/afterburn#1074
dustymabe pushed a commit to coreos/fedora-coreos-config that referenced this pull request May 13, 2024
This test is still failing, but there's a PR open to resolve this.
Let's snooze for another week so we can land:
coreos/afterburn#1074
@marmijo marmijo force-pushed the marmijo-allow-empty-ca branch 3 times, most recently from 28baf16 to 388b98d Compare June 19, 2024 23:51
@marmijo
Copy link
Member Author

marmijo commented Jun 19, 2024

I reworked this PR and updated the description based on the review comments.

marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request Jun 21, 2024
This test is still failing. Let's snooze again while we wait for
coreos/afterburn#1074 to land.
@marmijo marmijo force-pushed the marmijo-allow-empty-ca branch 2 times, most recently from 458dafc to c3aa867 Compare June 21, 2024 19:24
@marmijo
Copy link
Member Author

marmijo commented Jun 21, 2024

The "Rust / Lints, pinned toolchain (pull_request)" check is failing on cargo fmt -- --check -l, but I get no errors when I run this command locally on this branch. What am I missing here?

jlebon pushed a commit to coreos/fedora-coreos-config that referenced this pull request Jun 21, 2024
This test is still failing. Let's snooze again while we wait for
coreos/afterburn#1074 to land.
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Rust / Lints, pinned toolchain (pull_request)" check is failing on cargo fmt -- --check -l, but I get no errors when I run this command locally on this branch. What am I missing here?

This likely means that you're using a different version than what CI is using. You can see the version it's using in

ACTIONS_LINTS_TOOLCHAIN: 1.75.0
and compare to yours. You don't have to match it; it should be enough to make sure you have the latest toolchain from Fedora (e.g. dnf update rust cargo clippy rustfmt) or from upstream if you used rustup (e.g. rustup update).


Ok(ssh_pubkey)
Ok(Some(vec![ssh_pubkey]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making the signature of this function as Result<Option<PublicKey>> and then here it'd just be

Suggested change
Ok(Some(vec![ssh_pubkey]))
Ok(Some(ssh_pubkey))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think once we do the above, we can shrink this whole thing to just call crypto::p12_to_ssh_pubkey(&p12).context("failed to convert pkcs12 blob to ssh pubkey") as the final line of the function since it already matches the signature we need.

Comment on lines 415 to 412
let key = self.get_ssh_pubkey(certs_endpoint)?;
Ok(vec![key])
let key: Vec<PublicKey> = self.get_ssh_pubkey(certs_endpoint)?.unwrap_or(vec![]);

Ok(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this code here would remain untouched.

IOW, the lower-level get_ssh_pubkey() function can just focus on doing the obvious thing and this function here is where we have to deal with what our callers expect from us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having a difficult time figuring out how to return a vector on None without inspecting the Option like

let certs_endpoint = match goalstate.certs_endpoint() {
Some(ep) => ep,
None => return Ok(vec![]),

Copy link
Member

@jlebon jlebon Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right sorry, I missed that detail here. There's a neat Rust trick which is that Option actually supports iteration, in which case it'll iterate zero times or once (if the item exists). So something like Ok(maybe_key.into_iter().collect()) should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks! I made that change.

@marmijo marmijo force-pushed the marmijo-allow-empty-ca branch 2 times, most recently from a032055 to 21b1e89 Compare June 21, 2024 21:34
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, LGTM otherwise!

Comment on lines 274 to 279
let ssh_pubkey = match crypto::p12_to_ssh_pubkey(&p12)
.context("failed to convert pkcs12 blob to ssh pubkey")?
{
Some(key) => key,
None => return Ok(None),
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that all this can now be

Suggested change
let ssh_pubkey = match crypto::p12_to_ssh_pubkey(&p12)
.context("failed to convert pkcs12 blob to ssh pubkey")?
{
Some(key) => key,
None => return Ok(None),
};
crypto::p12_to_ssh_pubkey(&p12)
.context("failed to convert pkcs12 blob to ssh pubkey")

And we can remove the Ok() line below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! Thank you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt update it properly. I'll work on it.

@marmijo marmijo force-pushed the marmijo-allow-empty-ca branch 2 times, most recently from a82baf2 to 04be6a4 Compare June 21, 2024 22:04
@marmijo marmijo requested a review from jlebon June 21, 2024 22:05
Azure is populating the certs endpoint even when an SSH key is not
provided. The certificate chain in the PKCS12 file is empty in this
case causing afterburn to fail with the current logic. Check if the
certificate chain is empty before retrieving the SSH public key
and handle this case as a valid option.

The more general `crypto/mod.rs` was updated here which also affected
the azurestack code. Update azurestack to accomodate the changes when
retrieving the SSH public key.
Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

@jlebon jlebon enabled auto-merge June 21, 2024 22:40
@jlebon jlebon merged commit 4dfc444 into coreos:main Jun 21, 2024
9 checks passed
marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request Jul 1, 2024
Extend the snooze on the `coreos.ignition.ssh.key` test failure
for another month until a new release of Afterburn can be
performed that will include the fix for the failure:
coreos/afterburn#1074
aaradhak pushed a commit to coreos/fedora-coreos-config that referenced this pull request Jul 1, 2024
Extend the snooze on the `coreos.ignition.ssh.key` test failure
for another month until a new release of Afterburn can be
performed that will include the fix for the failure:
coreos/afterburn#1074
marmijo added a commit to marmijo/fedora-coreos-config that referenced this pull request Aug 12, 2024
Extend the snooze on the `coreos.ignition.ssh.key` test failure
for another month until a new release of Afterburn can be
performed that will include the fix for the failure:
coreos/afterburn#1074.

A release is pending in:
coreos/afterburn#1095
aaradhak pushed a commit to coreos/fedora-coreos-config that referenced this pull request Aug 12, 2024
Extend the snooze on the `coreos.ignition.ssh.key` test failure
for another month until a new release of Afterburn can be
performed that will include the fix for the failure:
coreos/afterburn#1074.

A release is pending in:
coreos/afterburn#1095
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coreos.ignition.ssh.key failing in kola-azure
3 participants