A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel.
The motivation behind writing this script was:
- implement workaround for
gitleaks
intermittent failures when cloning very large repositories - implement ability to scan multiple repostiories in parallel
- implement ability to scan repositories for a user, a specified organization or read from a file
Notes:
- the script uses https to clone the repos
- you must set the
USERNAME
andPASSWORD
environment variables - this credential needs to have access to the repos being scanned - if using
--file
then https clone urls must be supplied in the file
- you must set the
- the maximum number of background processes (workers) that will be started is
35
- if the number of repos to process is less than the maximum number of workers
- the script will start one worker per repository
- if the number of repos to process is greater than the maximum number of workers
- the repos will be added to a thread-safe queue and processed by all the workers
- if the number of repos to process is less than the maximum number of workers
- the Docker container must run with a bind mount to the working directory in order to access logs/reports
- the repos will be cloned to the
./scans/clones
folder in the working directory - the reports will be written to the
./scans/reports/
folder in the working directory - a summary report will be written to
mpgitleaks.csv
- the repos will be cloned to the
usage: mpgitleaks [-h] [--file FILENAME] [--user] [--org ORG] [--exclude EXCLUDE] [--include INCLUDE] [--size SIZE] [--log]
A Python script that wraps the gitleaks tool to enable scanning of multiple repositories in parallel
optional arguments:
-h, --help show this help message and exit
--file FILENAME scan repos contained in the specified file
--user scan repos for the authenticated GitHub user where user is owner or collaborator
--org ORG scan repos for the specified GitHub organization
--exclude EXCLUDE a regex to match name of repos to exclude from scanning
--include INCLUDE a regex to match name of repos to include in scanning
--size SIZE scan repos less than specified size (in KB)
--log log messages to log file
Set the required environment variables:
export USERNAME='--username--'
export PASSWORD='--password-or-token--'
If using --user
or --org
options and GitHub instance is not api.github.com
:
export GH_BASE_URL='--api-address-to-github-instance--'
Execute the Docker container:
docker container run \
--rm \
-it \
-e http_proxy \
-e https_proxy \
-e GH_BASE_URL \
-e USERNAME \
-e PASSWORD \
-v $PWD:/opt/mpgitleaks \
soda480/mpgitleaks:latest \
[MPGITLEAKS OPTIONS]
Note: the http[s]_proxy
environment variables are only required if executing behind a proxy server
Scan all repos contained in the file repos.txt
but exclude the repos that match the specified regex, an example of a repos.txt
can be found here:
mpgitleaks --file 'repos.txt' --exclude 'soda480/mplogp'
Scan all repos for the authenticated user but exclude the repos that match the specified regex:
mpgitleaks --user --exclude 'intel|edgexfoundry|soda480/openhack'
Scan all repos in the specified organization but only include the repos that match the specified regex:
mpgitleaks --org 'myorg' --include '.*-go'
Clone the repository and ensure the latest version of Docker is installed on your development server.
Build the Docker image:
docker image build \
--target build \
--build-arg http_proxy \
--build-arg https_proxy \
-t \
mpgitleaks:latest .
Run the Docker container:
docker container run \
--rm \
-it \
-e http_proxy \
-e https_proxy \
-v $PWD:/code \
mpgitleaks:latest \
/bin/bash
Build application:
pyb -X