-
Hi @maigl! Ideally this should be handled by GitHub itself. Garm should concern itself with making runners available to GitHub and managing the lifecycle of the instances on which the runners are hosted, but not the lifecycle of the runner itself. That should be GitHub's responsibility. After we run the command to register the runner with GitHub, we no longer dictate what that runner does. Jobs are sent by GitHub to the runner, and GitHub removes the runner (if it's ephemeral) from the list of available runners once its job is done.
Having garm intervene here would cross an architectural boundary that would probably be difficult to move away from once it's adopted. The only real way garm could enforce something like this is if we are willing to have garm forcefully cancel jobs. Having garm cancel jobs leads to jobs failing in non-obvious ways, and inevitably to frustration if the developer is unaware that garm has a specific timeout set. This seems like something that should reside outside of garm, at least at this stage.
There is a discussion on the same matter here: https://github.com/orgs/community/discussions/25631. A timeout can be set per job, but sadly there is no org/enterprise level setting to enforce this. It seems that people need this, as evidenced by this comment: https://github.com/orgs/community/discussions/25631#discussioncomment-3248533, so it may be worth pinging that thread.
In the absence of an org/enterprise-wide default timeout, this could possibly be enforced as a "best practice" and caught through proper vetting/linting in pre-push hooks. For example, a pre-push hook could be created that parses all workflow jobs and ensures that an explicit timeout is set for each of them. Alternatively, monitoring of job run times can be implemented, and jobs can be canceled using a cron job or something similar.
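To make the pre-push hook idea a bit more concrete, here is a minimal sketch of such a check in Python. It is only an illustration, not something garm ships: the function names `jobs_missing_timeout` and `check_files` are made up for this example, it assumes PyYAML is installed for parsing, and it assumes the per-job `timeout-minutes` key from the GitHub Actions workflow syntax is what you want to require. Jobs that call a reusable workflow via `uses:` are skipped, since the timeout has to be set inside the called workflow.

```python
import sys
from collections.abc import Mapping
from typing import Any


def jobs_missing_timeout(workflow: Mapping[str, Any]) -> list[str]:
    """Return the IDs of jobs that do not set an explicit `timeout-minutes`."""
    missing = []
    jobs = workflow.get("jobs") or {}
    for job_id, job in jobs.items():
        if not isinstance(job, Mapping):
            continue  # malformed job entry; leave that to a real linter
        if "uses" in job:
            continue  # reusable-workflow call; timeout lives in the called workflow
        if "timeout-minutes" not in job:
            missing.append(job_id)
    return missing


def check_files(paths: list[str]) -> int:
    """Check workflow files; return 1 if any job lacks a timeout, else 0."""
    import yaml  # PyYAML (third-party), assumed to be available to the hook

    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            workflow = yaml.safe_load(fh)
        if not isinstance(workflow, Mapping):
            continue  # empty file or not a workflow mapping
        for job_id in jobs_missing_timeout(workflow):
            print(f"{path}: job '{job_id}' has no timeout-minutes", file=sys.stderr)
            failed = True
    return 1 if failed else 0
```

A hook script could pass the files under `.github/workflows/` to `check_files` and exit with its return value, so that a non-zero status blocks the push. This only catches missing timeouts; it does not enforce an upper bound, which would need an extra comparison against whatever limit you agree on.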
-
We want to use garm in a wider scenario with many users.
For fairness, to avoid misuse, and for optimization, we need to be able to see who is using the system and to what extent, and we also need to be able to set limits.
In github.com you also have a number of limits:
https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration#usage-limits
One important first step would be a feature to add a max job execution time.
For example, 6 hours: after that time, an active runner would be stopped.
What do you think?
(Happy to provide a PR if that's the right solution.)