dangardner/typogard

 
 

TypoGard

TypoGard is a client-side tool created to protect users from package typosquatting attacks. TypoGard was originally implemented as a standalone tool for npm, but it can easily be extended to other languages and can even be embedded in package installation software.

TypoGard-Crates

TypoGard-Crates is a fork of TypoGard with support for the crates.io registry. It implements the following additional features:

  • Queries a local Postgres instance containing a nightly crates.io database dump
  • Bit-flip detection - detects single bit-flip variations of popular crates
  • Filtering based on similarity of package descriptions (using spaCy NLP, falling back to Levenshtein distance)
  • Uses crate authors instead of namespaces to identify common ownership
  • Downloads tarballs of potentially typosquatting crates, to make investigation more convenient
  • Ignores known non-malicious typosquatters with expected metadata signatures:
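To make the bit-flip feature concrete, here is a minimal sketch of how single bit-flip variations of a crate name could be enumerated. It does not use the blip library that this repository depends on; the function name `bitflip_variants` and the `ALLOWED` character set are assumptions made for illustration only.

```python
import string
from typing import Set

# Assumption: only lowercase ASCII letters, digits, '-' and '_' are
# treated as valid crate name characters in this sketch.
ALLOWED = set(string.ascii_lowercase + string.digits + "-_")


def bitflip_variants(name: str) -> Set[str]:
    """Return every name that differs from `name` by one flipped bit."""
    variants = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            # Flip a single bit of this character's code point.
            flipped = chr(ord(ch) ^ (1 << bit))
            # Keep the result only if it is still a permitted character.
            if flipped in ALLOWED:
                variants.add(name[:i] + flipped + name[i + 1:])
    return variants


if __name__ == "__main__":
    print(sorted(bitflip_variants("serde")))
```

A newly published crate whose name appears in the variant set of a popular crate would then be a candidate for closer inspection.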

How TypoGard Works

TypoGard works by applying a number of transformations to a given package name and comparing the results to a list of popular package names. These transformations are based on the same transformations made by malicious actors in past package typosquatting attacks. For more detailed information, read the TypoGard paper on arXiv (the tool was referred to as SpellBound at the time of publication).
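The general idea can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the transformations chosen here (undoing a repeated character, an adjacent swap, an omitted character, and a changed delimiter) are examples only, and the names `transformations`, `possible_targets`, and `ALPHABET` are hypothetical.

```python
import string
from typing import Iterable, Set

# Assumption: characters considered when undoing an omitted character.
ALPHABET = string.ascii_lowercase + string.digits + "-_"


def transformations(name: str) -> Set[str]:
    """Generate names that `name` could be a typo of."""
    out = set()
    for i in range(len(name) - 1):
        # Undo a repeated character: "serrde" -> "serde".
        if name[i] == name[i + 1]:
            out.add(name[:i] + name[i + 1:])
        # Undo a swap of two adjacent characters: "reqwset" -> "reqwest".
        out.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Undo an omitted character by trying every insertion: "serd" -> "serde".
    for i in range(len(name) + 1):
        for ch in ALPHABET:
            out.add(name[:i] + ch + name[i:])
    # Undo a changed or dropped delimiter: "tokio_util" -> "tokio-util".
    out.add(name.replace("-", "_"))
    out.add(name.replace("_", "-"))
    out.add(name.replace("-", "").replace("_", ""))
    out.discard(name)
    return out


def possible_targets(name: str, popular: Iterable[str]) -> Set[str]:
    """Return the popular names that `name` may be typosquatting."""
    popular = set(popular)
    if name in popular:
        # A popular package is not treated as a typosquat of another one.
        return set()
    return transformations(name) & popular


if __name__ == "__main__":
    popular = {"serde", "reqwest", "tokio-util"}
    for candidate in ("reqwset", "serd", "tokio_util", "serde"):
        print(candidate, "->", possible_targets(candidate, popular))
```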

TypoGard-Crates Requirements

The version of TypoGard in this repository, which specifically targets crates.io, relies on the following:

  • Python 3 (tested with Python 3.7.10, but other versions may work too)
  • psycopg2 for Postgres client functionality
  • spaCy for NLP-based crate description similarity scoring
  • semver to compare semantic versions of crates
  • requests to fetch crate tarballs from crates.io
  • blip to calculate single bit-flip variations
  • rapidfuzz to calculate Levenshtein distance

TypoGard-Crates Usage

usage: typogard_crates.py [-h] [--days CHECK_DAYS] [--top MOST_POPULAR]
                          [--similarity-threshold SIMILARITY_THRESHOLD]
                          [--lev-threshold LEVENSHTEIN_THRESHOLD]
                          [--download-dir CRATE_DOWNLOAD_DIR]
                          [--dbconf DB_CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  --days CHECK_DAYS     Only check for crates with versions created/updated
                        within this number of days (default: 3)
  --top MOST_POPULAR    Number of most popular crates to consider as
                        typosquatting targets (default: 3000)
  --similarity-threshold SIMILARITY_THRESHOLD
                        Similarity threshold in the range 0-1, over which we
                        consider crate descriptions similar (default: 0.97)
  --lev-threshold LEVENSHTEIN_THRESHOLD
                        Levenshtein distance threshold under which we consider
                        crate descriptions similar (default: 10)
  --download-dir CRATE_DOWNLOAD_DIR
                        Directory into which discovered crates will be
                        downloaded, will be created if necessary (default:
                        /var/tmp/cratefiles)
  --dbconf DB_CONFIG    Database configuration file (default: db.conf)
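One way to read the two description thresholds above is sketched below. This is not the tool's code: the function names are hypothetical, the Levenshtein distance is implemented in plain Python rather than with rapidfuzz, and the `similarity` argument stands in for an NLP score that is computed elsewhere (pass `None` when no score is available). How the tool itself decides when to fall back is not described here, so that part is an assumption.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def descriptions_similar(desc_a: str, desc_b: str,
                         similarity: Optional[float] = None,
                         similarity_threshold: float = 0.97,
                         lev_threshold: int = 10) -> bool:
    """Apply the NLP threshold if a score is given, else the edit distance."""
    if similarity is not None:
        # "over which we consider crate descriptions similar"
        return similarity > similarity_threshold
    # "under which we consider crate descriptions similar"
    return levenshtein(desc_a, desc_b) < lev_threshold


if __name__ == "__main__":
    print(descriptions_similar("A fast JSON parser", "A fast JSON parsers"))
    print(descriptions_similar("A fast JSON parser", "An async runtime",
                               similarity=0.41))
```

The default values mirror the documented defaults of --similarity-threshold and --lev-threshold.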

TypoGard fundamentally relies on a list of packages considered to be popular. TypoGard-Crates builds this list from the nightly crates.io database dump: --top sets how many of the most popular crates to include, and --days sets how far back to look for newly published crate versions that may be typosquatting them.

The database configuration file specified with --dbconf is in the standard psycopg2 format. See the file db.conf.example for an example configuration.

Example Usage:

To check whether any crates with new versions published in the last 3 days are typosquatting any of the 3000 most popular crates (the defaults), run:

python3 typogard_crates.py

This will notify the user and download the tarballs of any crates that appear to be typosquatting.
