Support for autoscaling self-hosted GitHub runners #845
In AWS we do this: GitHub App tokens:
Runner registration:
The runner instance:
The ASGs:
The way we do registration is so that:
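The App-token part of the flow above isn't spelled out in this thread, but it can be sketched roughly as follows. This is only an illustration: the app ID is a placeholder, and the actual RS256 signing with the App's private key (e.g. via PyJWT) is omitted.

```python
import base64
import json
import time

def app_jwt_claims(app_id, now=None):
    """Claims for a GitHub App JWT; GitHub caps their lifetime at 10 minutes."""
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,      # backdate slightly to allow for clock drift
        "exp": now + 9 * 60,  # stay under the 10-minute maximum
        "iss": app_id,
    }

def b64url(segment):
    """Base64url-encode one JWT segment (header or payload), no padding."""
    raw = json.dumps(segment, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Unsigned header.payload; append ".<RS256 signature>" to complete the JWT,
# then exchange it for an installation token, and that for a registration token.
header = {"alg": "RS256", "typ": "JWT"}
unsigned = b64url(header) + "." + b64url(app_jwt_claims("12345", now=1_650_000_000))
```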
@j3parker thank you for the solution!
Fantastic question -- I opened an issue about that over here: #699. We've prototyped a few hacks to detect when a job starts (removing the runner from the ASG, which triggers a new one to start booting as a replacement, plus ASG policies to scale up). We're just waiting patiently for ephemeral runners to be supported 😄
Our plan is to terminate, yes. Vaguely I'm assuming the runner will exit and we will trigger a shutdown. You can configure an EC2 instance to terminate on shutdown. Spinning up VMs for builds might be expensive. We do a fair clip of builds during the day so one option I'm mulling is to use firecracker rather than VMs, but you need to buy a whole (metal) instance for that. We haven't costed out if that would make sense for us yet. Hopefully in the long-term someone will develop a turn-key AWS solution that can do a mix of spot-based instances for small load and bulk firecracker-based ones for better latency at scale.
I guess GitHub is going to present something new in Q3.
@j3parker good tip, thank you.
Why expensive, since EC2 instances are currently billed per second?
Nice! That is simple.
Oh sorry, that was unclear. I meant in terms of time (there is a latency to spin up a machine.) Spinning up hot capacity in the background can hide that from users, but of course you're also paying for that. With enough concurrent builds it could be worth it (both in terms of money and managing perceived latency) to have an entire machine rented from AWS and use firecracker (which will boot things faster than EC2, e.g. it's what powers AWS Lambda).
😄 We do that by taking actions/virtual-environments, which defines the GitHub-hosted runners, and patching the packer files with jsonnet to tweak things for our purposes (and install the runner exe.) I definitely recommend it. You need to keep up with versions of the runner so that when your VM connects to GitHub it doesn't accept a job and then have to download the newer version of the runner (we have a scheduled GitHub Action that polls for new releases of the runner.)
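The release-polling step described above could be as simple as comparing the runner version baked into the image against the latest actions/runner release tag. A sketch: the `v2.x.y` tag format matches the actions/runner releases, but the function names here are illustrative.

```python
def parse_runner_version(tag):
    """Parse an actions/runner release tag like 'v2.296.1' into a tuple."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def image_is_stale(baked_in, latest):
    """True when the image should be rebuilt with a newer runner."""
    return parse_runner_version(latest) > parse_runner_version(baked_in)

# 'latest' would be fetched from
# https://api.github.com/repos/actions/runner/releases/latest
print(image_is_stale("v2.296.1", "v2.296.2"))  # True -> trigger a rebuild
```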
I'm doing a project like this using GCP Preemptible VMs, but there are some issues:
I'm switching to Google Cloud Build. I think it's easier.
GitLab has already supported this feature for a long time, via the GitLab Runner Manager.
Waiting for this feature to run on AWS ECS Fargate.
Check this out as well
@vietanhduong , how did you implement that in GCP? I'm trying to use a MIG with runners on them. |
How will they make you pay if runners are easy to autoscale? It's similar to "planned obsolescence"; this would be an "authentication nightmare".
You can create a simple cron job to regenerate the token every 30 minutes, say. I created a scalable environment in an ECS cluster, and sometimes the containers die after more than 1 h; before unregistering the runner, a function refreshes the token.
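A sketch of the refresh check such a cron job might run, assuming the `expires_at` timestamp (ISO 8601) that GitHub's registration-token endpoint returns; the function name and margin are illustrative.

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(expires_at, now, margin_minutes=30):
    """True when the registration token should be regenerated -- the
    tokens GitHub issues are only valid for about an hour."""
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return expiry - now <= timedelta(minutes=margin_minutes)

now = datetime(2022, 7, 22, 17, 0, tzinfo=timezone.utc)
print(needs_refresh("2022-07-22T17:20:00Z", now))  # True: under 30 min left
print(needs_refresh("2022-07-22T18:00:00Z", now))  # False: a full hour left
```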
Strange that no one is pointing to the docs on this: https://docs.github.com/en/actions/hosting-your-own-runners/autoscaling-with-self-hosted-runners
I would suppose that's because the features that doc is written around are fairly new, released 20 Sept. :D
I've just noticed this warning in the logs of my runner:
However, this won't work for us as a project in the … It has not changed, as per https://docs.github.com/en/rest/reference/actions#self-hosted-runners. In order to create a registration token for an org group (i.e. not belonging to a single repo), I'll need an access token with Admin rights on the org:
If this goes ahead, then no Apache projects will be able to have single-shot runners anymore.
Hi, in the CloudWatch logs I see that the Lambda triggers the scale-up function, but it is not creating the EC2 instance, and the job builds are not being queued in SQS. If my understanding is right, whenever a job is queued it should be posted to the SQS queue, and from there the Lambda scale-up function picks it up. But that is not happening: I'm not seeing any messages arrive in SQS; available messages is always "0". CloudWatch logs for the scale-up function: 2022-07-22 17:55:35.045 INFO [scale-up:b0c371ee-c099-xxxxxxxx index.js:1142xx scaleUp] Received workflow_job from xxxxxxx
Disclaimer: this doesn't answer the actual question, but suggests an alternative: you can achieve this easily with https://cirun.io/. It creates on-demand runners for GitHub Actions on your cloud and manages the complete lifecycle. You simply connect your cloud provider and define the runners you need in a simple YAML file, and that's it. See https://docs.cirun.io/reference/examples.html#aws for an example.
Describe the enhancement
I'm looking for a way to put a self-hosted GitHub runner into an autoscaling group.
I've discussed this with GitHub Support, and they've explained that the tokens are only valid for one hour. That's problematic for an autoscaling group, because it means the group will fail to bring up a runner starting an hour after I deploy it. They recommended raising my issue here; I apologize if we've both missed an obvious solution for this.
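For context, the usual workaround is to have each instance mint its own short-lived registration token at boot, instead of baking one into the launch template, so the one-hour expiry no longer matters to the group itself. A sketch of the documented `POST .../actions/runners/registration-token` call, with placeholder owner/repo/token values:

```python
import urllib.request

def registration_token_request(owner, repo, api_token):
    """Build the request an instance sends at boot to mint a fresh
    runner registration token (valid for about an hour)."""
    url = (f"https://api.github.com/repos/{owner}/{repo}"
           "/actions/runners/registration-token")
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": "Bearer " + api_token,
        },
    )

# Sent from user data at boot; the PAT or App token would come from a
# secret store, not the launch template.
req = registration_token_request("my-org", "my-repo", "<token-from-secrets>")
# urllib.request.urlopen(req) returns JSON with "token" and "expires_at".
```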
Code Snippet
Not Applicable.