Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add jit config #163

Merged
merged 14 commits into from
Sep 24, 2023
Merged

Conversation

gabriel-samfira
Copy link
Member

@gabriel-samfira gabriel-samfira commented Aug 22, 2023

This PR changes the way GARM registers runners. With this change, GARM will now attempt to use Just-In-Time configuration instead of passing a runner registration token to the instance to run against the ./config.sh script.

This adds a number of advantages:

  • The runner registration token is valid for one hour, and can theoretically be used to register any number of runners. If compromised, an attacker could inject their own runners inside the entity that generated the token. GARM attempts to mitigate this by not sending the token via user-data, and will only allow the retrieval of the token once. If only saved in memory, that should mitigate the issue to some extent.
  • The JIT configuration can only be used by one runner at any given point in time. Once we spin up the runner and it connects to github, it should not matter if an attacker manages to read the config (they can do that anyway once a job runs), because:
    • A runner is already running
    • The runner is ephemeral. Killing it while the job is running will invalidate the credentials, as JIT runners are ephemeral by nature.
  • With JIT runners, we finally have the ability to skip adding the default labels to runners (eg: self-hosted, x64, linux)

This change is currently blocked until the following PRs merge and a release is cut:

@gabriel-samfira gabriel-samfira self-assigned this Aug 22, 2023
@gabriel-samfira gabriel-samfira marked this pull request as draft August 22, 2023 19:08
@gabriel-samfira
Copy link
Member Author

CC @maigl This change will work with GHES 3.10 and above. Before this is ready to merge, I will ensure that it can fall back to the old way of registering a runner. It should already be able to do that, but some extra checks would not hurt.

@maigl
Copy link
Contributor

maigl commented Aug 23, 2023

CC @maigl This change will work with GHES 3.10 and above. Before this is ready to merge, I will ensure that it can fall back to the old way of registering a runner. It should already be able to do that, but some extra checks would not hurt.

Okay thx, we're not at 3.10 yet, so the fallback is important to us.

labels := r.getLabelsForInstance(pool)
// Attempt to create JIT config
jitConfig, runner, err := r.helper.GetJITConfig(ctx, instance, pool, labels)
if err == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maigl This should fall back to the old way of registering a runner. Any error in creating a JIT config should make it fall back to the old registration method.

@gabriel-samfira gabriel-samfira force-pushed the add-jit-config branch 2 times, most recently from 08b9bf1 to bd50c05 Compare August 24, 2023 13:47
@gabriel-samfira
Copy link
Member Author

gabriel-samfira commented Aug 24, 2023

@maigl with the latest commit, fallback should work.

The logs will show something like this:

Aug 24 13:37:36 garm garm[3161868]: 2023/08/24 13:37:36 [Pool mgr ghadmin/test-repo] failed to get JIT config, falling back to registration token: failed to get JIT config: POST https://ghe.example.com/api/v3/repos/ghadmin/test-repo/actions/runners/generate-jitconfig: 404 Not Found []

If you have some cycles, it would be great if you could test it against your env. Note, you will also need to update the openstack external provider to latest main and optionally add:

[default]
webhook_url = "https://garm.example.com/webhooks"
enable_webhook_management = true

in your garm config.

Once your instance updates to 3.10 or newer, it should automatically switch to JIT.

This PR will most likely linger in draft until there is an upstream go-github release that includes the PRs in the description.

@gabriel-samfira
Copy link
Member Author

CC @mkulke

I know you were looking into adding JIT at some point.

@mkulke
Copy link

mkulke commented Aug 24, 2023

CC @mkulke

I know you were looking into adding JIT at some point.

Awesome, thanks. I'll take a look

@gabriel-samfira gabriel-samfira force-pushed the add-jit-config branch 6 times, most recently from 5eb76d7 to 09405dd Compare August 28, 2023 10:12
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
When using JIT runners, we register the runner on GitHub before we get
a chance to spin up the instance in the provider. In such cases, we end
up with a runner in "offline" state while we're creating the actual resource
that will embody the runner. This change will give runners a chance to come
online before garm tries to clean them up.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
We must create the DB entry for a runner with a JIT config included. Adding it later
via an update runs the risk of having the consolidate loop pick up the incomplete instance.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>
@gabriel-samfira gabriel-samfira marked this pull request as ready for review September 24, 2023 13:58
@gabriel-samfira gabriel-samfira merged commit a48ec0c into cloudbase:main Sep 24, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants