GitHub Action
Time To Merge Tool - Model Inference Test
Want an estimate on how long it will take to get your PR merged?
The data science team within Red Hat's Emerging Technologies group wanted to know whether we could leverage GitHub Actions to do some predictive analytics and measure the velocity of our projects. To that end, we developed the "Time to Merge" (TTM) tool.
This repository contains the TTM GitHub Action needed to train a custom model for individual projects and provide predictions in the form of PR comments. This model can be trained on any repository and can be used to predict the time to merge of new pull requests. To learn more about this approach, please see here.
NOTE: Currently the GitHub Action succeeds and comments time to merge estimates only on pull requests that are opened from a feature branch of the same repository.
To use the GitHub Action for your own repository and train a model, follow these steps:
- S3 bucket: You will need an S3 bucket to store the data and the model generated as part of the training process. You can pass the S3 bucket credentials in two ways: either set them up as GitHub Actions secrets, or pass them as a payload in your HTTP request.
- Personal Access Token: You also need a personal access token to trigger the workflow and download data from GitHub. You can generate that by going here.
Once you have the prerequisites in place, add your S3 credentials to your repository's Actions secrets. This is the recommended approach if they are private and you don't want to pass them through the HTTP request.
To do that, go to your repository "Settings" -> "Security" -> "Secrets" -> "Actions" -> "New Repository Secret" and add the following secrets:
- S3_BUCKET, S3_ENDPOINT_URL, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY: your S3 credentials
- GITHUB_TOKEN: your personal access token
- CEPH_BUCKET_PREFIX: the prefix/folder where you want to store the data on the S3 bucket
- REPO and ORG: the GitHub repository that you want to train the model on and the organization it belongs to
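Before triggering a run, it can help to confirm that none of the settings listed above is missing. The following is a minimal sketch, not part of the TTM tool itself: it assumes you have exported the same names as environment variables in your local shell, and the helper name missing_settings is hypothetical.

```python
import os

# Setting names as listed in this README.
REQUIRED_SETTINGS = (
    "S3_BUCKET",
    "S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "GITHUB_TOKEN",
    "CEPH_BUCKET_PREFIX",
    "REPO",
    "ORG",
)


def missing_settings(env):
    """Return the required names that are absent or blank in `env`.

    `env` is any mapping of names to string values, e.g. os.environ.
    This helper is hypothetical and only illustrates a local sanity check.
    """
    return [name for name in REQUIRED_SETTINGS if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Note that this only checks your local environment; secrets stored in the repository settings cannot be read back this way.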
To use this GitHub Action for both training and inference on new PRs, you will need to add two workflow files to your repository.
One action controls the training workflow, and is triggered on-demand and as needed. The other action controls the inference workflow, and is triggered on each new PR submission.
- Training Mode:
For every new repository, you first need to train the model on the historical pull requests.
To do that, add a train-ttm.yaml file to the .github/workflows/ folder of your repository that looks like this. To run the action in training mode, make sure you specify the MODE as 1.
This mode initiates the model training process, which includes data collection, feature engineering, and model training on the historical pull requests. It then runs inference, i.e. it predicts the time to merge for the last pull request on the repository.
(NOTE: This workflow will fail if there are no PRs on the repository. For the model to train correctly, you need at least 11 closed PRs on the repository being trained. For good model performance, we recommend training the model on repositories with at least a few hundred PRs.)
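The requirements in the note above can be checked before you start a training run. The sketch below is illustrative only and is not how the action itself validates its input: it assumes you already have a list of pull request records as dictionaries with a "state" field of "open" or "closed" (the shape returned by the GitHub REST API when listing pull requests), and the function names are hypothetical.

```python
MIN_CLOSED_PRS = 11  # minimum number of closed PRs stated in the note above


def count_closed(pull_requests):
    """Count PR records whose "state" field is "closed"."""
    return sum(1 for pr in pull_requests if pr.get("state") == "closed")


def training_readiness(pull_requests):
    """Return (ok, message) describing whether training is likely to run.

    Hypothetical pre-flight check based only on the thresholds in this README.
    """
    if not pull_requests:
        return False, "No PRs found: the training workflow will fail."
    closed = count_closed(pull_requests)
    if closed < MIN_CLOSED_PRS:
        return False, f"Only {closed} closed PRs: at least {MIN_CLOSED_PRS} are needed."
    return True, f"{closed} closed PRs: enough to train."


if __name__ == "__main__":
    # Ten closed PRs and one open PR: one short of the minimum.
    sample = [{"state": "closed"}] * 10 + [{"state": "open"}]
    print(training_readiness(sample))
```

Passing the minimum only means the workflow can train; as noted above, a repository with a few hundred PRs is recommended for good model performance.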
You can also trigger the workflow manually. Go to "Actions" for your repository, select Run Time to Merge Model Training, then click Run workflow on the upper right and run it. This will initiate the model training and inference action.
- Inference Mode:
Similar to the train-ttm.yaml file, you need to add another file called predict-ttm.yaml to the .github/workflows/ folder in your repository that looks like this. This file sets MODE to 0, which enables inference on all new incoming pull requests and adds a comment on each pull request specifying the approximate time it will take to be merged.
(NOTE: For the inference workflow to successfully comment a time range prediction, you need at least one open PR on the repository.)
To view your running workflow from the GitHub UI, go to "Actions" and click on the workflow run. Then click on pipeline to see logs and errors.
You can also use this tool on your repository without adding the workflow files to it. Here are the steps that you can follow:
- Fork this repository and add the secrets to your fork as mentioned here. Make sure to set REPO and ORG for the repository you want to run TTM on.
- Go to Actions for your fork and select the run in container workflow to train the model.
- You can also interact with this tool by sending a POST request to the GitHub API endpoint. From your terminal, clone your repository and run bash run-ttm.sh. This will run the training workflow and train the TTM model on the repo and org of your choice.
- Enter your GitHub username
- Enter the repository you want to train the model on, e.g. community
- Enter the organization the repo belongs to, e.g. operate-first
- Enter the personal access token generated in the previous step, e.g. ghp_xyzxyzxyz
If you are passing your S3 credentials here:
- Enter your bucket name
- Enter your endpoint url
- Enter your Access Key
- Enter your Secret Key
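To give a feel for what triggering a workflow through a POST request can look like, here is a minimal sketch. It does not reproduce what run-ttm.sh actually sends. It assumes the training workflow listens for a repository_dispatch event (GitHub's POST /repos/{owner}/{repo}/dispatches endpoint); the event type "train-ttm", the payload keys, and the function name are all hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_dispatch_request(owner, workflow_repo, token, target_repo, target_org,
                           event_type="train-ttm", s3=None):
    """Build (but do not send) a repository_dispatch POST request.

    owner / workflow_repo: where the workflow lives, e.g. your fork.
    target_repo / target_org: the repository to train the model on.
    event_type and the payload key names are assumptions for illustration.
    """
    payload = {"REPO": target_repo, "ORG": target_org}
    if s3:
        # Optional S3 settings passed in the payload instead of as secrets.
        payload.update(s3)
    body = {"event_type": event_type, "client_payload": payload}
    return urllib.request.Request(
        url=f"{API_ROOT}/repos/{owner}/{workflow_repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # "your-username" and "your-fork" are placeholders.
    req = build_dispatch_request("your-username", "your-fork", "ghp_xyzxyzxyz",
                                 "community", "operate-first")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to urllib.request.urlopen. Keeping construction separate from sending makes it easy to inspect the payload first, which is useful when it carries credentials.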