monitor boskos cleanup timing #13
@ixdy: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/unassign |
/help |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
@fejta-bot: Closing this issue. In response to this:
/reopen |
@ixdy: Reopened this issue. In response to this:
/lifecycle frozen |
Is this a matter of adding a Prometheus metric for this operation? @detiber |
@cpanato I believe that to be the case, yes. That said, I haven't dug into how the existing metrics are exposed for boskos. The dashboards sit at monitoring.prow.k8s.io, though. |
Hello @ixdy, should the metric in question be added in this part, https://github.com/kubernetes-sigs/boskos/tree/master/cmd/cleaner, or is it meant for another part of the code? |
Sorry for the delay in response. To clarify, this would be metrics added to the janitor(s), not the (unfortunately named) cleaner component.
The basic gist is just adding some Prometheus metrics to the janitors, yes. The primary challenge is that in some deployments (such as k8s.io prow) Boskos and the janitors run in a completely separate build cluster from the prow monitoring stack, so the metrics aren't directly accessible and collecting them is harder.
In the case of k8s.io prow, to collect metrics from the core boskos service, we expose the boskos metrics port on an external IP and then explicitly collect from that address. Since the janitors run as a separate container, we'd need to either expose additional IPs for each janitor (non-ideal) or set up some sort of collector for all of the boskos metrics (core and janitor) and then expose that to the prow monitoring stack. Alternatively, we could collect/push these metrics to the monitoring stack. [Note: I'm probably using the wrong Prometheus terminology here.]
Figuring all of this out is the harder aspect of this issue. If this sounds interesting to you, please take it on! |
@ixdy thanks, and my turn to say sorry for the delay 😄 There are two different things we need to do: first, add the metric to the janitor; second, sort out the infrastructure part. For the second I have a couple of questions:
I will work on the first part, adding the metrics, while we discuss the second, if that sounds good to you. Thanks! |
/assign |
It depends. There are 3 (or 4) different janitor endpoints right now:
The one-shot janitors could be run as CronJobs, with or without Boskos (e.g. to manage AWS environments, GCP projects, etc. that are not managed by Boskos). The Boskos-specific janitors tend to run as long-running pods. (So one follow-up question you might have: which janitor? The ones most relevant to this issue are probably
In general, yes, the janitors run in the same cluster as Boskos. This is because the credentials/service accounts needed to interact with AWS accounts/GCP projects likely already exist in those clusters, as they are used by the test jobs. |
thanks for the clarification @ixdy |
aws-janitor-boskos: add clean time and process time metrics
aws-janitor: add job duration metric
Originally filed as kubernetes/test-infra#14715 by @BenTheElder
What would you like to be added: export and graph metrics for boskos cleanup timing.
Why is this needed: so we can determine whether cleanup time is increasing, and whether we need to scale up the janitor or fix boskos. xref #14697
Possibly this should also move to the new monitoring stack? cc @cjwagner @detiber
/area boskos
/assign @krzyzacy
cc @fejta @mm4tt
/kind feature