-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
connect: connect CA Roots in secondary datacenters should use a SigningKeyID derived from their local intermediate #6513
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: Might have been a little preemptive with the review as you have a couple todos for tests and some other things.
The changes look great and make total sense. Is there a reasonable way to unit test this? Something maybe that validates the root rotation logic no longer fires on intermediate certificates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of this thought inline?
// the root changed. | ||
// authorityKeyId is the key ID of the CA (intermediate) root that signed | ||
// the current cert. This is just to save parsing the whole cert everytime | ||
// we have to check if the root changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// we have to check if the root changed. | |
// we have to check if the signing key changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 actually I'm not sure this is quite right.
The only reason we care about rotating the certificate in this case is if the actual root changes. If for some reason a new intermediate was generated but for the same root, we don't need to rotate certs signed by the old intermediate because the will still validate against the same root cert.
We only care about rotating if the actual root key is different.
So this was kinda more correct before, but I think we misguidedly built the logic to look at the (intermediate) signing key assuming that was the right thing to do for intermediates and so broke this assumption.
So I think this PR is more correct than before it because it at least behaves well and has the same set of assumptions in the logic and the data we are propagating.
But actually in some sense it's not necessary to rotate if only the intermediate is changing and it would be more correct to only compare against the actual root key even when an intermediate is being used.
I think that boils down to SigningKeyID
being misnamed in the CARoot
since the thing we care about it more like RootKeyID
.
What do you think? This is better than no change for sure but should we go back and make this all correct and optimal instead? Or am I missing a reason you would need to rotate if intermediate changed too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To play devil's advocate... the only time intermediates should change in Connect is if roots do too currently which possibly makes this argument moot on an optimality point of view...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to rotate the cert even when the intermediate changes. This could use verification but I am pretty sure we have to push both the root and intermediate cert into Envoy for verification. It has to have the whole chain and if the leaf cert was signed by an unknown intermediate (an old one that we have since regenerated) then I don't think the certs will validate and connections will be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So unless there's something non obvious here, I can't see a field on a leaf cert that expresses a link back to the primary root, only a link to the intermediate that signed it.
We could walk the linked list of intermediates back to the primary root in the leaf caching code and compare the leaf.SigningKeyID->intermediate
, intermediate->root
, root=?=prevRoot
but that would require unpacking the whole chain of certs every time the leaf cache Fetch
is called.
But as you said given that intermediates rotate only when the primary root rotates (or possibly in rare other circumstances), we can just rely upon waiting until the intermediates rotate first because they detect that the primary root rotated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkeeler the intermediate is delivered by the client Envoy along with the leaf. As long as the intermediate is still valid (i.e. not revoked and we don't have revocation checking yet..) then it's fine as long as it was signed by the same root as the current intermediate.
That said, this will be more important if we allow revocation of intermediates in future.
So unless there's something non obvious here, I can't see a field on a leaf cert that expresses a link back to the primary root, only a link to the intermediate that signed it.
Yep that is the root (pardon the pun) of the confusion here - it's my mistake as I thought the signing key was the important thing but then only verified against the root. We could instead have store the key ID of the root in that struct and just compared that to the current root instead.
That said on the basis of the future-proofing against intermediate revocation and my indication that it makes not actual difference currently I think this is good as it is.
I've updated this to include tests and cleanup logic to fix previously persisted CARoot objects in the state store during secondary init after upgrade. |
0665f40
to
02dcbb1
Compare
a33f90b
to
00453e3
Compare
02dcbb1
to
e2a1c3f
Compare
00453e3
to
6612a78
Compare
…ngKeyID derived from their local intermediate This fixes an issue where leaf certificates issued in secondary datacenters would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering because a hash of the ultimate root (in the primary) was being compared against a hash of the local intermediate root (in the secondary) and always failing.
5be8002
to
e9b3e66
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
…ngKeyID derived from their local intermediate (#6513) This fixes an issue where leaf certificates issued in secondary datacenters would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering because a hash of the ultimate root (in the primary) was being compared against a hash of the local intermediate root (in the secondary) and always failing.
…ingKeyID derived from their local intermediate This fixes an issue where leaf certificates issued in primary datacenters using Vault as a Connect CA would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering. The hash of the rootCA was being compared against a hash of the intermediateCA and always failing. This doesn't apply to the Consul built-in CA provider because there is no intermediate in use in the primary DC. This is reminiscent of #6513.
…ingKeyID derived from their local intermediate This fixes an issue where leaf certificates issued in primary datacenters using Vault as a Connect CA would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering. The hash of the rootCA was being compared against a hash of the intermediateCA and always failing. This doesn't apply to the Consul built-in CA provider because there is no intermediate in use in the primary DC. This is reminiscent of #6513.
…ingKeyID derived from their local intermediate This fixes an issue where leaf certificates issued in primary datacenters using Vault as a Connect CA would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering. The hash of the rootCA was being compared against a hash of the intermediateCA and always failing. This doesn't apply to the Consul built-in CA provider because there is no intermediate in use in the primary DC. This is reminiscent of #6513.
…ingKeyID derived from their local intermediate This fixes an issue where leaf certificates issued in primary datacenters using Vault as a Connect CA would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering. The hash of the rootCA was being compared against a hash of the intermediateCA and always failing. This doesn't apply to the Consul built-in CA provider because there is no intermediate in use in the primary DC. This is reminiscent of #6513.
…ingKeyID derived from their local intermediate (#9428) This fixes an issue where leaf certificates issued in primary datacenters using Vault as a Connect CA would be reissued very frequently (every ~20 seconds) because the logic meant to detect root rotation was errantly triggering. The hash of the rootCA was being compared against a hash of the intermediateCA and always failing. This doesn't apply to the Consul built-in CA provider because there is no intermediate in use in the primary DC. This is reminiscent of #6513
Fixes #6507
This fixes an issue where leaf certificates issued in secondary
datacenters would be reissued very frequently (every ~20 seconds)
because the logic meant to detect root rotation was errantly triggering
because a hash of the ultimate root (in the primary) was being compared
against a hash of the local intermediate root (in the secondary) and
always failing.