Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

audit-followup: deal with project ssh-keys growing #1751

Closed
spiffxp opened this issue Mar 3, 2021 · 13 comments · Fixed by #1904
Closed

audit-followup: deal with project ssh-keys growing #1751

spiffxp opened this issue Mar 3, 2021 · 13 comments · Fixed by #1904
Assignees
Labels
area/access Define who has access to what via IAM bindings, role bindings, policy, etc. area/audit Audit of project resources, audit followup issues, code in audit/ area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Milestone

Comments

@spiffxp
Copy link
Member

spiffxp commented Mar 3, 2021

Followup from #1748

over time k8s-infra-e2e-* projects are accumulating entries in project.commonInstanceMetadata["ssh-keys"] (eg: https://github.com/kubernetes/k8s.io/pull/1748/files#diff-358508c57e9216cf9e24c719a001712286b35054ea2618b4c7904db2290e8337R6)

briefly:

  • first entry ends in prow\nprow:prow:ssh-rsa (this is intended, same across all projects and managed by scripts in this repo)
  • second entry ends in prow\nkubetest2:ssh-rsa (not sure if this is actively used / the same key everywhere)
  • a lot of the entries end in something like root@6e6f003c-7933-11eb-a603-2218f636630c\nkubetest2:ssh-rsa

concerns:

  • are these growing unbounded?
  • what happens if we hit a max entries limit?
  • do we want to pre-create these keys ourselves vs. trusting CI to be allowed to do this?
  • if we preserve status quo of ever-changing, can we make audit of these expected changes less noisy?

/kind bug
/wg k8s-infra
/sig testing
/area prow
/area infra-mgmt
/area access
/area infra/auditing
/priority important-longerm

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. wg/k8s-infra sig/testing Categorizes an issue or PR as relevant to SIG Testing. area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters area/access Define who has access to what via IAM bindings, role bindings, policy, etc. labels Mar 3, 2021
@k8s-ci-robot
Copy link
Contributor

@spiffxp: The label(s) area/infra-mgmt, priority/important-longerm cannot be applied, because the repository doesn't have them

In response to this:

Followup from #1748

over time k8s-infra-e2e-* projects are accumulating entries in project.commonInstanceMetadata["ssh-keys"] (eg: https://github.com/kubernetes/k8s.io/pull/1748/files#diff-358508c57e9216cf9e24c719a001712286b35054ea2618b4c7904db2290e8337R6)

briefly:

  • first entry ends in prow\nprow:prow:ssh-rsa (this is intended, same across all projects and managed by scripts in this repo)
  • second entry ends in prow\nkubetest2:ssh-rsa (not sure if this is actively used / the same key everywhere)
  • a lot of the entries end in something like root@6e6f003c-7933-11eb-a603-2218f636630c\nkubetest2:ssh-rsa

concerns:

  • are these growing unbounded?
  • what happens if we hit a max entries limit?
  • do we want to pre-create these keys ourselves vs. trusting CI to be allowed to do this?

/kind bug
/wg k8s-infra
/sig testing
/area prow
/area infra-mgmt
/area access
/area infra/auditing
/priority important-longerm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the area/audit Audit of project resources, audit followup issues, code in audit/ label Mar 3, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Mar 3, 2021

/assign @amwat
To answer questions re: kubetest2

@amwat
Copy link
Contributor

amwat commented Mar 3, 2021

so i think both of those are generated by kube-up for the same reasons, except they have a different $USER
kubetest just inherits from https://github.com/kubernetes/test-infra/blob/master/config/prow/config.yaml#L614-L615 kubetest2 overrides it to use kubetest2

@spiffxp
Copy link
Member Author

spiffxp commented Mar 3, 2021

if that's the case I guess I would expect to see a lot more spurious keys with not-kubetest2 at the end of them

wonder what it is that kube-up is doing differently on its own? regardless, I'm feeling like some possible paths forward could be:

  • make kube-up clean after itself
  • make boskos clean up some of this stuff (I don't think wiping all common metadata is what we want, and not even all keys in this one field)
  • have some other custom janitor sweep through and clean these up ~daily/weekly

@amwat
Copy link
Contributor

amwat commented Mar 4, 2021

I think kube-down cleaning after itself and janitor both make sense.

Just to elaborate on the "why they are all the same":

we add preset-k8s-ssh keys to all the jobs

https://github.com/kubernetes/test-infra/blob/92cd24353f22a1d9be918983bb7330d4a317ea55/config/prow/config.yaml#L611-L628

which mounts a pair of existing keys (as secrets in the prow cluster) to each job.

which scenarios/kubernetes_e2e.py copies into /workspace/.ssh/google_compute_engine
which gcloud ends up reusing.

https://github.com/kubernetes/test-infra/blob/92cd24353f22a1d9be918983bb7330d4a317ea55/scenarios/kubernetes_e2e.py#L192-L205

This will propagate once for every time it encounters a new project, after that the keys will already exist (so makes it faster).

Whereas by default, gcloud will just create a new pair of keys (currently what's happening for kubetest2)

Depending on what's more important (time saving with static keys/ clean runs)
we can (port over all this logic to kubetest2/go the should be cleaned up in kube-down/janitor route)

@spiffxp
Copy link
Member Author

spiffxp commented Mar 9, 2021

That assumed key the preset adds is already prepopulated in all projects by ensure-e2e-projects.sh

We should figure out why it's not being detected

@spiffxp
Copy link
Member Author

spiffxp commented Mar 9, 2021

New key per run and cleaning up afterward would be a good followup and reduce impact of the assumed key needing to be rotated, but let's cut the noise down first

@spiffxp
Copy link
Member Author

spiffxp commented Mar 11, 2021

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 11, 2021
@spiffxp
Copy link
Member Author

spiffxp commented Apr 1, 2021

kubernetes-sigs/kubetest2#109 - support for reusing ssh keys
kubernetes/test-infra#21561 - enable for our kubetest2 jobs

At this point I'm inclined to try a one-shot that nukes the ssh-keys and resets to known default. I will try to take a look at currently running jobs (look at boskos resources on relevant build clusters) to see if it's worth avoiding potential disruption caused by wiping keys.

@spiffxp
Copy link
Member Author

spiffxp commented Apr 9, 2021

Latest audit PR's don't show keys randomly getting added anymore.

Opened #1894 to stage a manual edit so I can edit keys more quickly, will run K8S_INFRA_ENSURE_E2E_PROJECTS_RESETS_SSH_KEYS=true ./infra/gcp/prow/ensure-e2e-projects.sh to purge all the random ssh-keys, and then look for results to be reflected in latest audit PR

@spiffxp
Copy link
Member Author

spiffxp commented Apr 9, 2021

Latest audit PR (#1897) shows the results of #1894

I sorta want to wait one more round and see if any more keys come back

@spiffxp
Copy link
Member Author

spiffxp commented Apr 9, 2021

OK, so I think we're at a steady state,

prow:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCmYxHh/wwcV0P1aChuFLpl28w6DFyc7G5Xrw1F8wH1Re9AdxyemM2bTZ/PhsP3u9VDnNbyOw3UN00VFdumkFLjLf1WQ7Q6rZDlPjlw7urBIvAMqUecY6ae1znqsZ0dMBxOuPXHznlnjLjM5b7O7q5WsQMCA9Szbmz6DsuSyCuX0It2osBTN+8P/Fa6BNh3W8AF60M7L8/aUzLfbXVS2LIQKAHHD8CWqvXhLPuTJ03iSwFvgtAK1/J2XJwUP+OzAFrxj6A9LW5ZZgk3R3kRKr0xT/L7hga41rB1qy8Uz+Xr/PTVMNGW+nmU4bPgFchCK0JBK7B12ZcdVVFUEdpaAiKZ prow
prow:prow:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCmYxHh/wwcV0P1aChuFLpl28w6DFyc7G5Xrw1F8wH1Re9AdxyemM2bTZ/PhsP3u9VDnNbyOw3UN00VFdumkFLjLf1WQ7Q6rZDlPjlw7urBIvAMqUecY6ae1znqsZ0dMBxOuPXHznlnjLjM5b7O7q5WsQMCA9Szbmz6DsuSyCuX0It2osBTN+8P/Fa6BNh3W8AF60M7L8/aUzLfbXVS2LIQKAHHD8CWqvXhLPuTJ03iSwFvgtAK1/J2XJwUP+OzAFrxj6A9LW5ZZgk3R3kRKr0xT/L7hga41rB1qy8Uz+Xr/PTVMNGW+nmU4bPgFchCK0JBK7B12ZcdVVFUEdpaAiKZ prow

But I don't grok why prow:prow:ssh-rsa is part of it:

  • I get that there's a bug in ensure-e2e-projects that ignores the last line of ssh-keys (should have ignored only empty lines), so that's why the extra key isn't getting removed
  • but there is some interaction I don't understand between one or more of
    • prow
    • kubetest2-gce
    • cluster/log-dump/log-dump.sh
    • gcloud ssh

Honestly, as long as keys aren't getting constantly added and blowing up audit PRs, I'm happy to leave this as is. If I can't figure this out after 30min, I'm going to open a followup issue for this bug, and just make the "steady state" the expected set of ssh-keys

@spiffxp
Copy link
Member Author

spiffxp commented Jul 13, 2021

/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/access Define who has access to what via IAM bindings, role bindings, policy, etc. area/audit Audit of project resources, audit followup issues, code in audit/ area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants