Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AUT-228 - Fixing monitor errors in stage #981

Merged
merged 15 commits into from
Sep 12, 2024
Merged

AUT-228 - Fixing monitor errors in stage #981

merged 15 commits into from
Sep 12, 2024

Conversation

alexcottner
Copy link
Contributor

@alexcottner alexcottner commented Sep 6, 2024

AUT-228
Working on resolving monitor errors in stage. This is part 1 of N.

  1. Fixing our stage config to use correct certs and signatures
  2. Removing oddity from verifyXPISignature that allowed it to verify with a nil truststore

Need to:

  • Create followup ticket to prevent signatures from being verified without truststores (AUT-245)
  • Review updated monitor logs post deployment to stage

@alexcottner alexcottner marked this pull request as ready for review September 6, 2024 20:42
@alexcottner alexcottner requested review from a team as code owners September 6, 2024 20:42
@alexcottner alexcottner requested review from jmhodges and removed request for a team September 6, 2024 20:42
@@ -8,100 +8,83 @@ const autographDevRootHash = `5E:36:F2:14:DE:82:3F:8B:29:96:89:23:5F:03:41:AC:AF

// firefoxPkiStageRootHash is the SHA2 hash of the firefoxPkiStageRoot
// cert raw bytes
const firefoxPkiStageRootHash = `DB:74:CE:58:E4:F9:D0:9E:E0:42:36:BE:6C:C5:C4:F6:6A:E7:74:7D:C0:21:42:7A:03:BC:2F:57:0C:8B:9B:90`
const firefoxPkiStageRootHash = `C0:F0:5D:59:B1:FD:E2:57:80:85:4C:32:FA:E8:FA:BA:84:81:C2:33:B4:C1:D3:90:CC:A5:F2:CE:A8:19:30:EE`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only one of the two XPI roots that are used in staging. You'll need to include both. The two were used to do initial "backwards compat" testing during cert succession because the prior staging root key had been lost to time.

Both are on https://bugzilla.mozilla.org/show_bug.cgi?id=1882192 (and soon https://github.com/mozilla-services/hsm/pull/31)

The cas_new is the "long-term" staging root we're using and is used for all of the cas_new_ signers in staging of types xpi and contentsignaturepki and will be in all new signers of those types.

cas_cur was the root that'll expire in a few months and is used for all of the cas_cur_ signers in staging of both xpi and contentsignaturepki types.

The easiest thing to do would be to add both as constants here and then add both to the truststore and do the sha checking for both. Key agility, baby.

Copy link
Contributor

@jmhodges jmhodges Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, also, I don't think we've removed the old staging signer ids, yet, either, so you probably ALSO need to keep the current cert included in the truststore, too.

Copy link
Contributor

@jmhodges jmhodges Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wellll, actually, that would be a good forcing function for us to remove the signers using that old staging root.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've converted this to an array for now and have the 3 root certs in there.

Maybe I'm thinking about this backwards. But we should remove the certs from the store before removing them from monitor, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct, imo. We should see this working and then see it continue to work after we remove them from the configs.

It could go the other way, but this lets us scream test at our leisure instead of gating the GCP migration on the scream test having without screams.

Copy link
Contributor

@jmhodges jmhodges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep forgetting if I had reviewed the latest changes here, so let me drop this that we discussed offline:

the rootHash var also needs this change.

@alexcottner alexcottner marked this pull request as draft September 10, 2024 15:47
switch conf.env {
case "stage":
conf.rootHash = firefoxPkiStageRootHash
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring to just have truststore and depTruststore.
Also using arrays for multiple roots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a helper function to populate and no longer need root hashes for stage and prod.

if rhash != certHash {
return fmt.Errorf("hash does not match expected root: expected=%s; got=%s", rhash, certHash)
var matchFound = false
for _, rootHash := range rootHashes {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to check each root hash.
Argued with myself about removing colons ahead of time or in here. This seems like less code, the other way seems like less compute time. Minimal difference either way I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use slices.ContainsFunc or slices.Contains (if you move the strings.Replace to earlier) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, ContainsFunc seems like the cleanest way.

// autograph dev config:
// https://github.com/mozilla-services/autograph/blob/b3081068f4a9c0c1de02150432f2d02887dd6722/autograph.yaml#L113-L126
// used on the normandy and remote settings development servers
const autographDevRootHash = `5E:36:F2:14:DE:82:3F:8B:29:96:89:23:5F:03:41:AC:AF:A0:75:AF:82:CB:4C:D4:30:7C:3D:B3:43:39:2A:FE`
var autographDevRootHashes = []string{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit weird because arrays can't be constants in golang. But these are scoped to monitor so seems fine?

@@ -109,8 +109,8 @@ func signTestCert(options signOptions) *x509.Certificate {
return certX509
}

func sha2Fingerprint(cert *x509.Certificate) string {
return strings.ToUpper(fmt.Sprintf("%x", sha256.Sum256(cert.Raw)))
func sha2Fingerprint(cert *x509.Certificate) []string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand turning this into a []string return type is convenient, but this is not what I would expect to receive from a func taking a single cert and would spend time investigating why. Could we do this outside of this func (and its similarly named friend elsewhere) or change this (and its similarly named friend) to take multiple certs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense. Adjusting this and fixed the property names too.

xNCR3VBsgIgAhyO11hXiBo5nX5OHn2PG1MPV9+H0RQb6g5zOuKrWoh9CzB8mBl7d
0JMkzy6TFAcAo3KL5ZNvRqqB8xRmIYLK
-----END CERTIFICATE-----`,
// current root cert
Copy link
Contributor

@jmhodges jmhodges Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"current" isn't accurate. I know we called it "cas_cur" but that was a temporary thing. It's now the one that isn't used by staging services. cas_new is. I know, I know, this is confusing now, but we had to optimize for not confusing the folks doing backwards compat testing during cert succession.

Future signers won't be prefixed with cas_new_ but will use that "cas_new" intermediate cert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, updating comments to make this (hopefully) a little less confusing.

@jmhodges
Copy link
Contributor

And additional note that isn't strongly required: You know, you also could use a func to calculate the rootHashes of the certificates instead of us having to get both the cert and the root hash right. That would let you fix up the hash strings however you want and we could avoid some annoying global churn

@alexcottner alexcottner marked this pull request as ready for review September 11, 2024 16:59
Copy link
Contributor

@jmhodges jmhodges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dope and very close! Thank you

@@ -113,6 +95,15 @@ func main() {
}
}

// Helper function to load a series of certificates and their hashes to a given truststore and hash list
func LoadCertsToTruststore(hashArr *[]string, truststore *x509.CertPool, certificates []string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could create the CertPool inside itself and return the hashArr and the CertPool, couldn't it? Passing a pointer to a slice like this is not typical in Go, and hints that we may be too side-effect-y. (Even if we did take the slice as param, typical style is to return a slice -- that might be unchanged -- from the func instead to avoid nil pointer panics from being possible)

So the new type I'm proposing would be loadCertsToTrustStore(certificates []string) ([]string, *x509.CertPool).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omg duh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chaa, been there. Go (and Rust) are interesting because you get pointers (unlike most languages us web-y folks work in these days), but you also don't have to weird C stuff with them for simple things like output params etc etc!

tools/autograph-monitor/constants.go Outdated Show resolved Hide resolved
@alexcottner alexcottner marked this pull request as ready for review September 12, 2024 18:08
if rhash != certHash {
return fmt.Errorf("hash does not match expected root: expected=%s; got=%s", rhash, certHash)
var matchFound = slices.ContainsFunc(rootHashes, func(rootHash string) bool {
return certHash == strings.Replace(rootHash, ":", "", -1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this Replace now that we have the LoadCertsToTruststore func? I think no (or, if so, could we do this there?)

if err != nil {
break
}
conf.depTruststore, conf.depRootHashes, err = LoadCertsToTruststore(firefoxPkiProdRoots)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

firefoxPkiStageRoots I believe?

conf.depTruststore.AppendCertsFromPEM([]byte(firefoxPkiStageRoot))
conf.truststore, conf.rootHashes, err = LoadCertsToTruststore(firefoxPkiProdRoots)
if err != nil {
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll admit I'm not stoked by having to reuse err across the switch (yes, i didn't mention this on the zoom; it's not a huge deal), but since we do, we probalby want to make this err and the stage one after it more specific with a

err = fmt.Errorf("unable to load production roots: %w", err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, that could get confusing. I think I can make this a bit more explicit since we have 2 trust stores, why not have 2 errors?

depErr = fmt.Errorf("failed to load depTruststore root certificates: %w", depErr)
}
if err != nil || depErr != nil {
log.Fatalf("%s", errors.Join(err, depErr))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should allow us to know if either truststore is failing. Which could be useful in the future if we start using both for stage or dev for whatever reason.

Copy link
Contributor

@jmhodges jmhodges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@alexcottner alexcottner merged commit 19c4068 into main Sep 12, 2024
14 checks passed
@alexcottner alexcottner deleted the aut-228 branch September 26, 2024 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants