Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wire up typosquatting checks when new packages are published #7206

Merged
merged 8 commits into from
Nov 14, 2023

Conversation

LawnGnome
Copy link
Contributor

@LawnGnome LawnGnome commented Sep 29, 2023

This PR adds experimental support for checking the names of newly published crates for potential typosquatting. It uses essentially the same algorithm as typogard as adapted by @dangardner for offline crates.io checks, reimplemented in Rust by me in the new typomania crate.

I want to emphasise the word experimental in the above paragraph: right now, the only thing that will happen if a new crate might be typosquatting a popular crate is that an e-mail will be sent to a list of interested parties, which at present is defined as myself and @walterhpearce (who has kindly volunteered to help me with this), acting on behalf of the Rust Foundation Security Initiative. No private information is sent in those e-mails; it's a simple notification that crate A might be squatting crates B and C. If anyone else would like to be on that list, I'm happy to update the configuration accordingly.

Should there be actual typosquatting, we'll bring it to the crates.io team via Zulip and we can handle it from there using our usual process. (Well, realistically, I'll probably take off my Rust Foundation hat, put on my crates.io hat, and handle it myself in consultation with the rest of you.)

I would like to run this experiment for about two months before taking any further action, to judge the potential impact of using these checks for anything more formal or in a way that might impact the load on the crates.io team.

To be very clear, the crates.io team can say no to this. I promise I won't be mad.

Possible courses of action after that time period might include:

  1. If the notifications are way too noisy to be useful, backing this PR out. (I've intentionally kept it as separate as possible — even in places where it would have been simpler to integrate more closely with the core of crates.io — to make this straightforward.)
  2. If the notifications are a little bit too noisy, I may adjust the checks to try to improve the signal to noise ratio. (This may be something I do sooner if it's very obviously bad.)
  3. If the notifications feel just right, then we might do one or both of:
    1. Notify the crates.io team using Zendesk or some other method, and/or
    2. If the crate quarantine RFC is ultimately finished and accepted, then we could also feed this into a potential quarantine of new crates that are potential typosquats.

To lay my cards on the table, my hope is that this ends up with course 3(i) — notifying the crates.io team in general. Using quarantine is a possibility, but it's probably something I'd prefer to keep off unless we're actively under a spam attack.

Technical detail

This is implemented as a new background job that is enqueued when a new crate is published. The background job keeps an in-memory cache of the most popular 3000 crates (as judged by download count) to make the checks speedy and minimise extra load on the database. (Regenerating the cache locally takes about 700 ms, so I'm not terribly concerned about this in practice.)

There's no TTL on the cache because (as @Turbo87 pointed out) the dynos get restarted every 24 hours anyway, which is plenty fresh enough for what we need.

With the top 3000 crates, the cache uses less than 1 MiB of heap. I think we can afford that.

Because we have to send e-mails from the background job, Emails now has to be plumbed through the Environment that is exposed to background jobs. I don't think this is a problem in practice, but it is an extra change.

Finally, as this is an experiment, the configuration is hardcoded into the new worker job module. If this becomes a longer term thing, this would be split out into our normal configuration system for easier management, but right now this isolates the changes as much as possible in worker::typosquat.

@LawnGnome LawnGnome added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear A-publish A-backend ⚙️ labels Sep 29, 2023
@LawnGnome LawnGnome requested a review from a team September 29, 2023 02:50
@LawnGnome LawnGnome self-assigned this Sep 29, 2023
Cargo.toml Outdated Show resolved Hide resolved
src/tests/util/test_app.rs Outdated Show resolved Hide resolved
src/email.rs Outdated Show resolved Hide resolved
src/worker/typosquat/config.rs Outdated Show resolved Hide resolved
src/worker/typosquat/config.rs Outdated Show resolved Hide resolved
src/worker/typosquat/types.rs Outdated Show resolved Hide resolved
src/worker/typosquat/types.rs Outdated Show resolved Hide resolved
src/worker/typosquat/cache.rs Outdated Show resolved Hide resolved
src/worker/typosquat.rs Outdated Show resolved Hide resolved
src/worker/typosquat/cache.rs Outdated Show resolved Hide resolved
@bors
Copy link
Contributor

bors commented Oct 3, 2023

☔ The latest upstream changes (presumably 5210643) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented Oct 31, 2023

☔ The latest upstream changes (presumably #7395) made this pull request unmergeable. Please resolve the merge conflicts.

@LawnGnome
Copy link
Contributor Author

This has been heavily rebased after the various background worker changes in the last couple of weeks — although @Turbo87 looked at this a while back, I'd appreciate a re-review (preferably from him, but I'm also grateful to anyone else who wants to cast a quick eye over this).

Tested locally; all seems to work as expected based on trying to publish a serd crate.

src/worker/typosquat/database.rs Outdated Show resolved Hide resolved
src/worker/environment.rs Show resolved Hide resolved
src/worker/typosquat/mod.rs Outdated Show resolved Hide resolved
@bors
Copy link
Contributor

bors commented Nov 13, 2023

☔ The latest upstream changes (presumably d40529a) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@Turbo87 Turbo87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: let's give this a try :)

@LawnGnome LawnGnome merged commit 7608eab into rust-lang:main Nov 14, 2023
6 checks passed
LawnGnome added a commit to LawnGnome/crates.io that referenced this pull request Nov 20, 2023
This extends our new typosquatting checks (see rust-lang#7206) to detect an
attack vector we've seen more recently where a bad actor tries to squat
an existing, popular crate by adding or removing a common suffix (such
as `-rs` or `-sys`).

The suffix list in the configuration has been taken _approximately_ from
the most popular suffixes in the existing set of crates, with a small
amount of human judgement involved on which ones are more likely to be
abused based on recent incidents.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ A-publish C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants