A set of CLI tools for downloading, creating, and searching an offline version of the Pwned Passwords database.
Due to the complexity of efficiently loading and querying the 850+ million passwords into any sort of database this project uses Golomb Coded Sets as the storage medium. Golomb Coded Sets are a probabilistic data structure similar to bloom filters, but with a more compact memory footprint and a somewhat slower query time.
Using GCS, the Pwned Passwords v8 list (SHA1 encoded) that occupies around 40 GB uncompressed only uses about 3 GB as a Golomb Coded Set. The set, when used for querying returns results with a one in a hundred million false positive rate in less than 10ms (with SSD storage).
This project also includes a simple REST API for querying the generated GCS Pwned Password database for use in other systems that need this kind of check.
Some use cases (like the one that inspired this project) require that no passwords (not even partial hashes) ever, ever leave the local network. This means that the use of the excellent Have I Been Pwned range API is forbidden.
It may also be used in airgapped networks where a strong security posture is required for in-house applications.
The CLI tool includes 3 different modes of operation: download
, create
, and query
. Each
command needs the output of the previous one to work correctly. This means that the create
command
requires the output file from the download
command, and the query
command needs the file output
from the create
command.
Each sub-command has a -h
(help) flag to help with their use, but a fast example on how to get up
and running "quickly" are presented here:
# Download the pwned passwords SHA1 file
# this takes a while, more threads may make this go faster
go run cmd/pwd-checker/main.go download -o "/home/user/pwned-pwds.txt"
# Create a GCS file with a 1-in-100m false positive rate
# this also may take a while, SSD storage makes this go much faster, RAM is also important
go run cmd/pwd-checker/main.go create -i "/home/user/pwned-pwds.txt" -o "/home/user/pwned-pwds-p100m.gcs" -g 1024 -p 100000000
# Run an interactive shell session to query the database.
# May also be run non interactively if omitting the -n flag
# CTRL+C to interrupt
go run cmd/pwd-checker/main.go query -n -i "/home/user/pwned-pwds-p100m.gcs"
- The download command uses the haveibeenpwned.com API to download the password hashes. It does not download the files from the website, so the file downloaded using this command has more leaked passwords than the ones available on the website.
- The download may appear stuck at the end, it's not. With slower storage the writing of the file may take more time, even after all hashes have already been downloaded. Faster storage (SSD) makes this final step faster.
- The downloaded file requires at least 40GB of storage. If your output path does not have enough storage the command may error out before starting.
- Increasing the thread count when using the download command improves the download speed. 128 threads netted me a constant 150Mbps download speed. If the download feels to slow with the default threads set you may increase it, but there are diminishing returns based on the amount of logical CPU cores, internet speed, and storage speed.
- The create command has a minimum RAM warning. The calculation is not that precise. It will eat all your available RAM, but the minimum amount of memory is enforced. In my experience closing all other programs when running this command reduces the processing time by 2-3 minutes.
The serve
command exposes a simple unauthenticated REST API to query an already generated GCS
database from the other commands of this project. The server uses and requires TLS to start.
the flags --self-tls
, --tls-key
, and --tls-cert
configure the certificate to be used by the
server. If --tls-key
and --tls-cert
are used the value of --self-tls
is ignored. --self-tls
generates a self-signed certificate on server start. It has a validity of 30 days, and regenerates
on each server restart.
To change the port use the --port
flag, by default it uses port 3100
.
To start the server use the serve
command:
# Start the server with a self signed certificate on port 3100
go run cmd/pwd-checker/main.go serve -i "/home/user/pwned-pwds-p100m.gcs" --self-tls
# Start the server with your own certificates on port 3100
go run cmd/pwd-checker/main.go serve -i "/home/user/pwned-pwds-p100m.gcs" --tls-key "/home/user/tls/pwned.key" --tls-cert "/home/user/tls/pwned.pem"
The server exposes two endpoints, one to check a SHA1 hash directly, for example if you don't want to expose user passwords over the network; and another to check a plain text password directly.
POST /v1/check/hash
# Request
{
"hash": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
}
# Response
{
"pwned": true
}
This API also checks the password strength of the supplied password. The check is done using the zxcvbn algorithm developed by Dropbox. This can be useful to present the user with a strength meter on password change, or for use in other risk evaluation systems you may have in use.
POST /v1/check/password
# Request
{
"password": "Password1"
}
# Response
{
"pwned": true,
"strength": {
"crackTime": 0,
"crackTimeDisplay": "instant",
"score": 0
}
}
- To improve performance each query opens a file handle to the referenced GCS file, this may require an increase of the max open files OS variables. The handle is closed as soon as the query is complete.
- The server logs to stdout in JSON format.
- The server logs the HTTP calls, also in JSON format.
- The server caches the password check requests for one hour, with a max of 50.000 unique requests cached.
- The server supports the autoconfiguration of a self-signed TLS certificate (valid for 30 days)
with the use of the
self-tls
flag. This certificate is regenerated on each server start.
There is a docker-compose.yaml
and Dockerfile
file that builds the server to be used. Be warned
that the file reads are incredibly slow, so the start-up procedure takes some time, and requests
tend to take 500-1500ms.
Use the Docker container at your own peril...
PD: I will probably accept PR's for the Docker container to work correctly.
Every command of this project has support for GO's pprof package. To enable profiling set
the --profile
and --profile-port
flags to start an HTTP pprof server. By default, the profile
port used is 6060
.
The project comes with unit tests for the gcs
amd hibp
package only. All other packages are
untested for now.
I have done some load tests using K6. The test checks 3 different password / hash combinations per virtual user, so 100 VUs equal 300 requests per iteration.
In the tests/k6
folder are the k6 scripts that test the API endpoints. There is also a Grafana
dashboard, with a statsd service to see the load test results in real time.
To use the Grafana dashboard a docker-compose.yaml
file is provided in the test/k6/realtime
folder. Some manual configuration is needed before the test results can be seen in the dashboard:
- After running
docker compose up
, open grafana. User:admin
, Pass:admin
- Create the prometheus datasource pointing to
http://prometheus:9090
- Import the dashboard with id
13861
- Turn the API server on
- Run the tests
# Test the hash check API
k6 run -o statsd -e API_BASE_URL="https://localhost:3100" hash.js
# Test the password check API
k6 run -o statsd -e API_BASE_URL="https://localhost:3100" password.js
# Test the hash check API
k6 run -e API_BASE_URL="https://localhost:3100" hash.js
# Test the password check API
k6 run -e API_BASE_URL="https://localhost:3100" password.js
This test was done on an i7 7700HQ (4c/8t) laptop with 32GB of RAM with no request caching done.
/\ |‾‾| /‾‾/ /‾‾/
/\ / \ | |/ / / /
/ \/ \ | ( / ‾‾\
/ \ | |\ \ | (‾) |
/ __________ \ |__| \__\ \_____/ .io
execution: local
script: password.js
output: -
scenarios: (100.00%) 2 scenarios, 100 max VUs, 6m30s max duration (incl. graceful stop):
* smoke: 1 looping VUs for 1m0s (gracefulStop: 30s)
* load: Up to 100 looping VUs for 5m0s over 3 stages (gracefulRampDown: 30s, startTime: 1m0s, gracefulStop: 30s)
running (6m00.0s), 000/100 VUs, 27324 complete and 0 interrupted iterations
smoke ✓ [======================================] 1 VUs 1m0s
load ✓ [======================================] 000/100 VUs 5m0s
✓ is ok
✓ is pwned
✓ is not pwned
checks.........................: 100.00% ✓ 163944 ✗ 0
data_received..................: 18 MB 49 kB/s
data_sent......................: 14 MB 40 kB/s
http_req_blocked...............: avg=9.1µs min=0s med=0s max=12.39ms p(90)=0s p(95)=0s
http_req_connecting............: avg=5.49µs min=0s med=0s max=12.39ms p(90)=0s p(95)=0s
http_req_duration..............: avg=669.04ms min=9.51ms med=681.53ms max=4.21s p(90)=1.21s p(95)=1.39s
{ expected_response:true }...: avg=669.04ms min=9.51ms med=681.53ms max=4.21s p(90)=1.21s p(95)=1.39s
✓ http_req_failed................: 0.00% ✓ 0 ✗ 81972
http_req_receiving.............: avg=59.15µs min=0s med=0s max=6.51ms p(90)=81.78µs p(95)=520.29µs
http_req_sending...............: avg=17.16µs min=0s med=0s max=9.41ms p(90)=0s p(95)=0s
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=668.96ms min=9.12ms med=681.47ms max=4.21s p(90)=1.21s p(95)=1.39s
http_reqs......................: 81972 227.6835/s
iteration_duration.............: avg=881.67ms min=21.99ms med=894ms max=4.21s p(90)=1.5s p(95)=1.76s
iterations.....................: 27324 75.8945/s
vus............................: 1 min=1 max=100
vus_max........................: 100 min=100 max=100
Special thanks to Thomas Hurst for having an example of using the GCS algorithm in this repository, and to Giovanni Bajo for writing the original GCS blog that inspired all of this.
Also thanks to haveibeenpwned for providing the hashes of leaked passwords for free. Troy Hunt is the best.