This repo is a quick tutorial on how to scan a repo for secrets, remove them from history, and prevent secrets from leaking. Go through these steps in order.
Any step that has "(no action)" at the end indicates that no action is required before moving to the next step of the tutorial.
- Fork this repo. If you simply clone instead, you won't be able to push the changes you make.
- We're going to scan our repo with Gitleaks (though there are many options). Follow the directions here to install it.
- Make sure that you follow the instructions to install `pre-commit` (`pip install pre-commit` or `brew install pre-commit`) and run `pre-commit autoupdate`.
- In a terminal, navigate to the base of the repo: `cd /path/to/this/repo`.
- Run `pre-commit install`.
- Think about what we want to catch. Gitleaks has an impressive list of rules that it uses to identify data it strongly suspects are secrets. It will catch secrets that follow the format of common secrets like AWS secret keys and GCP API keys, or even obscure ones like "Etsy Access Tokens." You can also add custom formats. (no action)
- We could run a check right now against the default rules, but we will add one rule first to catch when a file contains the exact string `PASSWORD=` followed by anything. At the root of this repo, create a file called `.gitleaks.toml` with the following content:
title = "Gitleaks Custom Rules"
[extend]
useDefault = true
[[rules]]
id = "exclude PASSWORD="
description = "checks for instances of PASSWORD= followed by anything"
regex = '''PASSWORD=.*'''
[allowlist]
description = "allow README, for examples"
paths = [
'''README.md'''
]
- Now, with our default and custom rules set up, let's test that we can't commit something bad. Create a file called `password.txt` with the following content: `AWS_KEY=AKIAIMNOJVGFDXXXE4OA`
- Add the file: `git add password.txt`
- Commit it: `git commit -m "uh oh, secret here"`. If you get an error, congratulations! You've blocked yourself from committing a secret!
- Yeah, let's just delete that `password.txt`. We don't want that anymore.
- Let's direct our attention to the repo and our history. If you look at our current files, we don't see anything that raises concern. It's just this `README.md`, `.pre-commit-config.yaml`, a `docs` folder with images, and a `.gitleaks.toml` you created. But anything can happen in our history. (no action)
- Run `git log --pretty=format:"%h%x09%an%x09%ad%x09%s"` (you can also run `git log`, but this will just be longer output, and we want something short and pretty). It should look something like the following, perhaps longer:
e9e8a60 Nathaniel Larson Wed Oct 11 23:07:04 2023 -0500 gitleaks update install and wording
ffc33c7 Nathaniel Larson Wed Oct 11 22:55:40 2023 -0500 delete bad.env
e016876 Nathaniel Larson Wed Oct 11 22:54:53 2023 -0500 not supicious 2023
a42c0ea Nathaniel Larson Fri Oct 28 03:37:02 2022 -0500 Preventative Measures steps 1-3 added
cdba762 Nathaniel Larson Fri Oct 28 03:36:09 2022 -0500 pre-commit config for repo
8b0e924 Nathaniel Larson Fri Oct 28 03:35:51 2022 -0500 Removing Sensitive Information steps 1-7 added
0b8a5c0 Nathaniel Larson Fri Oct 28 03:08:57 2022 -0500 steps 10-14 added, with error image
cdf02c1 Nathaniel Larson Fri Oct 28 03:01:33 2022 -0500 steps 5-9 added
6e82a4a Nathaniel Larson Fri Oct 28 02:59:47 2022 -0500 steps 2-4 added
d18cd01 Nathaniel Larson Fri Oct 28 02:48:13 2022 -0500 not suspicious
67f13d5 Nathaniel Larson Fri Oct 28 00:34:54 2022 -0500 step 1 added
ca2dbf1 Nathaniel Larson Fri Oct 28 00:33:55 2022 -0500 init README.md
- Nothing suspicious there, right? Just kidding. A couple of those commits may contain a secret. We will use the `detect` command: `gitleaks detect`. It should give us a warning ("WRN"):
WRN leaks found: 3
- Re-run, this time with the `-v` flag for verbose output: `gitleaks detect -v`
- Did you get an error for the `bad.env` file? It has triggered our custom rule as well as one of the default rules, but note that only the first (custom) rule is listed in the warning. Expected output should look something like the following:
[Image: expected output of `gitleaks detect -v`]
- What's next? Removing that secret that some bumbling idiot committed!!
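To get a rough feel for what the custom `PASSWORD=.*` rule matches, the regex can be tried with `grep -E` on a throwaway file. This is only a sketch: the sample contents and the helper name `matches_custom_rule` are made up for illustration, and `grep` is standing in for Gitleaks, which uses its own regex engine and also applies the default rules and the allowlist.

```shell
# List the lines of a file that the custom rule's regex would match.
matches_custom_rule() {
  grep -nE 'PASSWORD=.*' "$1"
}

# Hypothetical sample content; none of these values are real secrets.
sample=$(mktemp)
printf '%s\n' 'PASSWORD=hunter2' 'USER=alice' 'DB_PASSWORD=abc123' 'password=lowercase' > "$sample"

# Print matching lines, prefixed with their line numbers.
matches_custom_rule "$sample"

rm -f "$sample"
```

With this sample, the first and third lines are printed: the pattern is unanchored, so `DB_PASSWORD=` matches too, and it is case-sensitive, so `password=` does not.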
Note: `bfg` may not remove all instances of a secret. For instance, you're removing it from your fork of my repo, but the secret is still in mine! Be cognizant of this when using it in the real world.
These instructions closely follow those posted here by GitHub.
- Install bfg. On macOS: `brew install bfg`.
- Navigate to our repo root: `cd /path/to/this/repo` (you're probably already there).
- Identify the file we want to filter OUT: `bad.env`. We can either remove this file completely (using the `--delete-files` flag) or replace specific strings wherever they appear (using the `--replace-text` flag). We will remove `bad.env` completely.
- In this case, we will check our history for the commit(s) that we expect to filter OUT: `git log --pretty=format:"%h%x09%an%x09%ad%x09%s"`. The descriptions of the two commits are `not suspicious` and `not suspicious 2023`. We expect not to see these commits after the filter, since they only involve adding or modifying `bad.env`.
- Now for the powerful command: `bfg --delete-files bad.env` (this is an intense command; double-check that it is correct before running).
- Once we run this, let's make sure that the logs look correct. If we run `git log --pretty=format:"%h%x09%an%x09%ad%x09%s"`, most of the history should be the same, with the notable exception of our `not suspicious` and `not suspicious 2023` commits. If you look closely at our history, `bad.env` is gone!
- Let's run `gitleaks detect` to make sure. It should pass.
- Now that we're sure that we're good, let's run `git push --force`. This will update the remote repository. We're all fixed!
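As an extra check that does not depend on Gitleaks, plain `git log` can report whether any commit still touches a given path. The sketch below assumes it is run from the repo root; `file_in_history` is a hypothetical helper name, not part of git or bfg.

```shell
# Succeeds (exit 0) if any commit reachable from any ref touched the given path.
file_in_history() {
  [ -n "$(git log --all --format=%h -- "$1")" ]
}

if file_in_history bad.env; then
  echo "bad.env still appears in history"
else
  echo "bad.env not found in history"
fi
```

Deleting a file in a later commit is not enough to make this check pass, which is why the history rewrite is needed. Note that the check only covers commits reachable from refs; old objects can linger in the local clone until git garbage-collects them.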
That's nerve-wracking, and can be a bit of work! How can we prevent these things from happening in the first place? There are many ways. Some of the most effective ways to prevent leaking secrets are to not have them in the repository at all:
- put secrets in a location outside of the repo,
- use environment variables to export secrets, and then read them from those variables in the code, or
- use a secrets management system.
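The environment-variable approach can be sketched in shell as follows. `DB_PASSWORD` and the helper `require_env` are hypothetical names chosen for illustration; the point is only that the value lives in the environment rather than in a tracked file.

```shell
# Print the value of the named variable, or fail with a message if it is unset or empty.
require_env() {
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
  printf '%s\n' "$_value"
}

# Normally the variable is exported outside the repo (for example, in your shell profile).
# It is set here only so the sketch runs on its own.
export DB_PASSWORD='example-only'

# Read the secret without ever printing its value.
db_password=$(require_env DB_PASSWORD) && echo "DB_PASSWORD is set (${#db_password} characters)"
```

Failing loudly when the variable is missing tends to be safer than falling back to a default value committed in the code.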
But for this small demonstration, let's say you want a `.env` file in your repo:
- Let's prevent that specific case of secret-committing from happening. Create a `.gitignore` with the following contents, which will ignore any file with the `.env` extension:
*.env
- Actually, we can do even better than that. GitHub maintains a list of `.gitignore` templates for dozens of languages here. Let's just choose the Python one here. Copy the contents of that into our `.gitignore`.
- Create a file called `bad.env`. Run `git status`. Notice how that file doesn't even show up? That's the power of `.gitignore`. You're safe, *as long as the `.gitignore` ignores this file!*
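To confirm which files a `.gitignore` actually covers, `git check-ignore` can be asked directly. A minimal sketch, assuming it is run inside the repo; `is_ignored` is a hypothetical helper name.

```shell
# Succeeds (exit 0) if git's ignore rules match the given path.
is_ignored() {
  git check-ignore -q "$1"
}

for f in bad.env README.md; do
  if is_ignored "$f"; then
    echo "$f: ignored"
  else
    echo "$f: NOT ignored"
  fi
done
```

Running `git check-ignore -v bad.env` instead prints the `.gitignore` line responsible for the match. Keep in mind that `.gitignore` has no effect on files that are already tracked.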
The seCureLI open source project not only scans your local repo for secrets before you commit code to a remote repository, as Gitleaks does, but also installs linters based on the code in your project to support security and coding best practices, and configures all the hooks needed so you don't have to. It collects valuable tools based on the type of project you're working on.
Visit the seCureLI page to get started with this tool.