Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix child manifest handling #199

Merged
merged 22 commits into from
Mar 27, 2020
Merged

Conversation

listx
Copy link
Contributor

@listx listx commented Mar 12, 2020

This should fix #191

The problem was that the parsing around child manifests was not done correctly. I have added a unit test to cover this case.

Putting this up now for feedback. I still have to (1) add more unit tests for Audit() and (2) add an e2e test for the bug in #191. But I would like to do those in separate PRs as this PR is already pretty big.

/cc @justinsb @thockin @dims @bartsmykla

Linus Arver added 9 commits March 11, 2020 19:10
This makes the Contains() method much more intelligent about which
pieces of the GCR payload were matched (found) in a given promoter
manifest.
Otherwise, the unit tests for Audit() will require actually fetching
data from GCR.
Also make Auditor() into a method on the ServerContext to keep the args
list minimal.
This way, we can examine the logged output during tests after the
logging has finished.
This part of the code deals with handling child image manifests whose
digests are NOT recorded in the promoter manifests [1]. Such use cases
are perfectly valid because it may be the case that users only want to
track the parent digest, not all of the component child digests as well.
So in the event that a child image's digest is received by the auditor,
it must only reject if the corresponding parent digest cannot be found
in the promoter manifests.

The fix involves using the improved Contains() method (which now has
much higher granularity about how close of a match the GCR payload for
the child image had with a promoter manifest) instead of using a very
crude suffix matching algorithm.

[1]: kubernetes-sigs#165
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. sig/release Categorizes an issue or PR as relevant to SIG Release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 12, 2020
@bartsmykla
Copy link

/assign

Comment on lines +26 to +31
deps = [
"//lib/dockerregistry:go_default_library",
"//lib/logclient:go_default_library",
"//lib/remotemanifest:go_default_library",
"//lib/report:go_default_library",
],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @listx. It's not actually review comment but I decided I'm gonna try to understand bazel more in the future so could you point me to the documentation maybe about what are these doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deps is just a list of dependencies. The auditor.go file (which is part of package audit) now depends on these (new) local dependencies. BTW I didn't manually put this in, I just ran bazel run //:gazelle and it does all of these Bazel imports for me automatically.

for _, manifest := range manifests {
if manifest.Contains(*gcrPayload) {
if manifest.Contains(*gcrPayload)&(reg.FlagDigestMatched|reg.FlagTagMatched) != 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the use of the bitwise operator but I feel it can be problematic for some of the less experienced contributors. What do you think about maybe add some of the examples how it should/would behave in the comment above? Let's treat it as a place to discussion, not a suggestion yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add maybe more explanations where I declare FlagDigestMatched instead. I'd prefer adding comments in a central place (where they are declared) over where they are used.

if gcrPayload.Digest == fqin {
return true
r |= FlagDigestMatched

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same comment about bitwise operator as above, let's discuss if maybe we could provide some information about how it will behave for less experienced contributors?

}
}
return false
return r
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is VERY complex, do you think we can maybe simplify it somehow? Maybe spliting it into smaller functions would help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can split it up at least 1 more level. ACK

@listx listx changed the title WIP: fix child manifest handling fix child manifest handling Mar 12, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2020
@listx
Copy link
Contributor Author

listx commented Mar 12, 2020

I think this PR is ready for merge. I'll work on adding more test cases in subsequent PRs.

/cc @justinsb

Contains() is now replaced with Match(). The function is no longer a
receiver on the Manifest, but rather the GCRPubSubPayload.

Also, a new flag has been introduced: "FlagTagMismatch", to better
capture the range of matching results.
We now additionally capture and check against the error and alert
loggers, as well as the report buffer.

Also, use "iota" because otherwise the consts for IndexLogError and
IndexLogAlert are also set to 0.
This information is not really useful because we already log the
found (if matched) parent digest in the "TRANSACTION VERIFIED ..."
message.
@listx
Copy link
Contributor Author

listx commented Mar 14, 2020

FYI gonna sync up with @justinsb on Monday to try to simplify this. Setting this to work-in-progres to reflect that.

/wip

@listx listx added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 14, 2020
for _, manifest := range manifests {
if manifest.Contains(*gcrPayload) {
m := gcrPayload.Match(manifest)
if m&(reg.FlagDigestMatched|reg.FlagTagMatched) != 0 &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: I do wonder if the flags would be better expressed as a struct with bool members, particularly as some of them have true for match and some true for mismatch.

But ... what if match results (bool, Info), where info was the bitset or struct with more info, and the bool was true if it's a match.

}
}
// If we can't find the source registry for this image, then reject the
// Find the subproject repository responsible for the GCRPubSubPayload. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to split out PubSub (a transport) from the idea of a "Change" we are validating. You could also then split this function into two, one that parses pubsub and one that validates a change.

// repository and manifest list.
//
// nolint[lll]
type GcrReadingFacility struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't love the name, but I'm not sure I understand it well enough to propose a better one. GcrReader?

// FlagDigestMatched is set if the digest in the payload matches a digest in
// the promoter manifest. This is ONLY matched if the path also matches.
FlagDigestMatched
// FlagTagMatched is ONLY matched if the digest also matches.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think these rules suggest it isn't a bitset after all, hence why I suggested just returning (bool, Info)

return r
}
}
return r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever non-zero?

}
// Speed up the search by skipping over registry names whose leading
// characters do not match.
if !strings.HasPrefix(payload.Digest, (string)(rc.Name)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're comparing Digest against name - is that deliberate?

return r
}
}
return r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here - is this ever non-zero? It makes the code harder to reason about if there's the potential for results carrying over


var r GcrPayloadMatch

if !strings.Contains(payload.Digest, (string)(image.ImageName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again - digest vs image name? Needs a comment if it's right :-)

r |= FlagPathMatched
for digest, tags := range image.Dmap {
fqin := ToFQIN(rc.Name, image.ImageName, digest)
// The payload.Digest field actually holds the FQIN.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see - and I presume that's an artifact of the GCR pubsub format? If so, let's definitely convert that notification into something that is easier to work with... even if we just rename Digest to FQIN

return r
}
r |= FlagPathMatched
for digest, tags := range image.Dmap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to loop? It looks like we are only interested in one key value...

@justinsb
Copy link
Contributor

Two concrete suggestions that I think aren't too big changes and could make this much clearer:

  1. Replace the bitset with a struct, and return (bool, InfoStruct). If there's a match, the first value is true and that's that. If there's no match, we look at InfoStruct to see how bad things are.

  2. Parse the pubsub message into the fields we want, and isolate ourselves from that rather than letting that flow into our code. It looks like splitting out the digest and the tags from the name would simplify things.

@listx
Copy link
Contributor Author

listx commented Mar 24, 2020

Two concrete suggestions that I think aren't too big changes and could make this much clearer:

  1. Replace the bitset with a struct, and return (bool, InfoStruct). If there's a match, the first value is true and that's that. If there's no match, we look at InfoStruct to see how bad things are.

I think just replacing the bitset with a struct would be enough, but I'll try out your suggestion. I imagine this will just be a struct with some bools in it.

  1. Parse the pubsub message into the fields we want, and isolate ourselves from that rather than letting that flow into our code. It looks like splitting out the digest and the tags from the name would simplify things.

This... Yes, it would make things a lot cleaner. I'll prioritize this change first.

This is just a variable name change.
Linus Arver added 3 commits March 25, 2020 15:56
This method takes the FQIN and PQIN fields in the payload and uses them
to generate much more user-friendly fields (path, digest, tag) to use
internally within our codebase.
This increases code readability, because we match on smaller
pieces (digest, tag) directly. It also makes the code faster because we
no longer have to construct FQINs and PQINs.

Also, we match more strictly for FlagPathMatched. Instead of just
checking for a prefix, we match on the entire path for an exact match.
This should make it more robust for cases where we have nested projects.
Since we are only interested in looking up a single digest (the one in
the payload), we can remove the loop with a simple lookup.
@listx
Copy link
Contributor Author

listx commented Mar 25, 2020

Thinking about the (bool, InfoStruct) suggestion again, I'm not sure if it will make it clearer.

The problem here is that there are 2 dimensions of matching:

  1. does payload match the promoter manifest, field-for-field (first the path, then digest, then tag (if present))?
  2. if the payload matches up to the digest, and the payload has a tag field, does the payload's tag match the promoter manifest's tag? (FlagTagMismatch)

Replace the bitset with a struct, and return (bool, InfoStruct). If there's a match, the first value is true and that's that. If there's no match, we look at InfoStruct to see how bad things are.

It's not clear what bool here would mean in the first part of the tuple.

@listx
Copy link
Contributor Author

listx commented Mar 26, 2020

@justinsb I've gone ahead with your second suggestion to rename the misnamed Digest field in the payload to be FQIN (also PQIN in place of Tag).

Linus Arver added 2 commits March 25, 2020 17:28
The fields have become a bit more verbose, but that's because of the
extra fields we store. We can simplify how these messages are logged in
the future.
This does a 1:1 replacement of bits with bools. No functional change.
@justinsb
Copy link
Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 26, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, listx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@listx listx removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 27, 2020
@k8s-ci-robot k8s-ci-robot merged commit 71603cc into kubernetes-sigs:master Mar 27, 2020
listx pushed a commit to listx/k8s.io that referenced this pull request Mar 30, 2020
This version includes a cip-auditor version with fixes to logging [1] and
also child digest image detection [2].

[1]: kubernetes-sigs/promo-tools#206
[2]: kubernetes-sigs/promo-tools#199
@listx listx deleted the fix-auditor branch May 8, 2020 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/release Categorizes an issue or PR as relevant to SIG Release. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

auditor fails to parse valid image references
4 participants