Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load entrypoint from remote #183

Merged

Conversation

tanner-bruce
Copy link

Currently, any steps in a Pipeline that rely on the built in entrypointof a container will not have the expected behaviour while running due to the entrypoint being overridden at runtime. This fixes #175.

A major side effect of this work is that the override step will now possibly make HTTP calls every time a step is overridden. There is very rudimentary caching in place which could be improved if performance becomes an issue.

This builds on the work in PR #167

@knative-prow-robot knative-prow-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 24, 2018
@knative-prow-robot knative-prow-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 24, 2018
@imjasonh
Copy link
Member

imjasonh commented Oct 24, 2018

This is awesome. 👍

  1. We probably need images to be PullPolicy: Always to have this work. If an image is already present on the node and gets pushed with a different entrypoint, this code will assume the new entrypoint, even if the image it runs might not want/have it.

  2. Is there any way to move this up into the controller, rather than running it from the pod each time? This would help us de-duplicate the lookups per build at least, and would enable us to keep a cache of image digest -> entrypoint, which could be reused across runs.

  3. Does this work with private builder images?

@imjasonh
Copy link
Member

/ok-to-test

@knative-prow-robot knative-prow-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 24, 2018
@tanner-bruce
Copy link
Author

  1. We probably need images to be PullPolicy: Always to have this work. If an image is already present on the node and gets pushed with a different entrypoint, this code will assume the new entrypoint, even if the image it runs might not want/have it.

Good point. I don't think it would be possible to check the node itself unless the pod ran as privileged. Adding PullPolicy should be harmless and expected - if you are using overwriting tags on your repo you would likely want Always regardless, and if you are not overwriting tags, this shouldn't be an issue.

  1. Is there any way to move this up into the controller, rather than running it from the pod each time? This would help us de-duplicate the lookups per build at least, and would enable us to keep a cache of image digest -> entrypoint, which could be reused across runs.

For sure, I'll take a look! I maintain the minimal cache per build but it would definitely be better higher up the control chain.

  1. Does this work with private builder images?

Yes, it uses the same authentication mechanism other go-containerregistry tools use.

@imjasonh
Copy link
Member

Yes, it uses the same authentication mechanism other go-containerregistry tools use.

I'm not sure that's enough to be sure it has the necessary credentials. Among the e2e tests we should have for this, we should make sure this works (and keeps working) with private builder images.

We might find out that supporting entrypoint with private images isn't feasible today, in which case we should at least document that you can't use the two together for now.

ep, err := getEntrypoint(step.Image)
if err != nil {
return fmt.Errorf("couldn't get entrypoint for %s: %v", step.Image, err)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my feeling is that if there is an error here we should log it but continue onward

Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like @imjasonh has the bulk of the review covered, I just made a couple minor style comments!

@bobcatfish
Copy link
Collaborator

Sorry for all the conflicts @tanner-bruce there have been a lot of changes going in lately!

Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of this change, could you also update the image contract docs to explain that entrypoint can be used (and maybe the limitations on it - i.e. if the image pulled by the node uses a different entrypoint than the image pulled by the controller, there will be problems)?

(I think these docs were added while you were working on this @tanner-bruce sorry about that! Hopefully we can get this in asap so you don't have to keep updating the PR!)

I have a couple remaining questions:

I'm not sure that's enough to be sure it has the necessary credentials. Among the e2e tests we should have for this, we should make sure this works (and keeps working) with private builder images.
We might find out that supporting entrypoint with private images isn't feasible today, in which case we should at least document that you can't use the two together for now.

@tanner-bruce any update about this?

I think the existing e2e tests would cover this, since they push to a private registry afaik? But I would think you'd run into problems if the controller didn't have the right credentials to pull from the private registry - what's the behavior in that case? I would think it should log an error but still continue on without failing the TaskRun?

Is there any way to move this up into the controller, rather than running it from the pod each time?

I'm confused by this @imjasonh - I thought this logic was in the controller all along?

@tanner-bruce
Copy link
Author

tanner-bruce commented Oct 31, 2018

@bobcatfish @imjasonh I tested that a private image works with the entrypoint lookup, however to accomplish it I set the HOME envvar on the controller to be /root (just to be explicit), made a ~/.docker/config.json` that had a _json_key/service account username/password combo and then tested with the private image.

It successfully pulled the entrypoint and the tests pass, but it did require modifying the runtime of the controller itself to have the docker creds. This could likely be done in a nicer way - maybe go-containerregistry can have creds loaded via code and we could allow config for this, or have the task itself contain a ref to the secret and the controller would load it.

For testing this as it stands we would need to add some sort of creds to the controller deployment.

I think the existing e2e tests would cover this, since they push to a private registry afaik? But I would think you'd run into problems if the controller didn't have the right credentials to pull from the private registry - what's the behavior in that case? I would think it should log an error but still continue on without failing the TaskRun?

Right now we log a warning message and carry on.

I'm confused by this @imjasonh - I thought this logic was in the controller all along?

Previously it was only caching at the build level. I moved the entrypoint.NewCache up to the reconciler level but am not persisting the cache inside the actual reconciler at this moment

@tanner-bruce tanner-bruce force-pushed the load-entrypoint-from-remote branch 2 times, most recently from 7fd99fb to c5d7706 Compare October 31, 2018 18:40
ep, err := entrypoint.GetRemoteEntrypoint(cache, step.Image)
if err != nil {
c.Logger.Warnf("could not get remote entrypoint for %s: %v", step.Image, err)
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, perfect 👍

only improvement i can think of is the log message could elaborate on what this means, e.g. something like "could not get remote entrypoint for %s, if %s relies on entrypoint behaviour may be blah blah blah blah: %v"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i just realized we'd only in this if the command is not specified - maybe failing silently isn't the right way to go since there's pretty much no chance the behaviour you'd end up with is what the user wants in this case

do you have a preference @tanner-bruce re. whether we fail the whole TaskRun in this case or keep going?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think it should definitely fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message returned is pretty explicit:

couldn't get config for image gcr.io/build-crd-testing/secret-sauce: UNAUTHORIZED: "You don't have the needed permissions to perform this operation, and you may have invalid credentials.

Copy link
Collaborator

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple minor comments!

/lgtm

remove if you want to merge as-is:
/hold

or ping me if you make changes and need another lgtm

@knative-prow-robot knative-prow-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Oct 31, 2018
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2018
@tanner-bruce
Copy link
Author

/test pull-knative-build-pipeline-unit-tests

@tanner-bruce
Copy link
Author

/hold cancel

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 1, 2018
command: [docker]
```

However, if the steps specified a custom `command`, that is what would be used.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is a little odd now since the example above specifies a custom command as well, but we can iterate on it!

@bobcatfish
Copy link
Collaborator

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2018
@tejal29
Copy link
Contributor

tejal29 commented Nov 1, 2018

@tanner-bruce can you please rebase from latest master and resolve conflict.

@tejal29 tejal29 closed this Nov 1, 2018
@tejal29 tejal29 reopened this Nov 1, 2018
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label Nov 1, 2018
@tanner-bruce
Copy link
Author

/test pull-knative-build-pipeline-integration-tests
/test pull-knative-build-pipeline-unit-tests

@tanner-bruce tanner-bruce force-pushed the load-entrypoint-from-remote branch 2 times, most recently from 5bbcd2f to 2ebd459 Compare November 1, 2018 22:44
Currently, any steps in a Pipeline that rely on the built in entrypoint
of a container will not have the expected behaviour while running due to
the entrypoint being overridden at runtime. This fixes tektoncd#175.

A major side effect of this work is that the override step will now
possibly make HTTP calls every time a step is overridden. There is very
rudimentary caching in place, however this could likely be improved if
performance becomes an issue.

Fixes tektoncd#175
@knative-metrics-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-knative-build-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/v1alpha1/taskrun/entrypoint/entrypoint.go Do not exist 84.2%
pkg/reconciler/v1alpha1/taskrun/taskrun.go 72.7% 71.4% -1.3

@bobcatfish bobcatfish added this to the 0.0.1 Alpha release milestone Nov 3, 2018
@bobcatfish
Copy link
Collaborator

looks like the test coverage went down slightly but lets go ahead with this anyway, we can keep iterating on it.

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2018
@bobcatfish
Copy link
Collaborator

🤦‍♀️
/approve

@knative-prow-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bobcatfish, tanner-bruce

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2018
@knative-prow-robot knative-prow-robot merged commit 8fa4437 into tektoncd:master Nov 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build steps that rely on entrypoint wont work properly with logging, i.e. Command is required currently
7 participants