spit-out-the-onion

With the current state of the world, I'm sure everyone has second-guessed whether a news headline was real or satire. To save people the trouble of actually reading the article, I built a classifier that takes in your headline and gives the percentage chance that it came from The Onion, the largest satirical news network on the web.

The heart of the process is a pre-trained DistilBERT model, which eats a tokenized version of your headline and kindly spits out an embedding vector that represents it. A logistic regression model trained on top of those embeddings then gives the probability that the headline is from The Onion. To train the logistic regression, I used 30k headlines gathered from two subreddits, r/TheOnion and r/NotTheOnion. With a balanced dataset, the downstream logistic regression model achieves 87% accuracy on the training set and 85% accuracy on the test set.
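The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the checkpoint name `distilbert-base-uncased`, the use of the first token's hidden state as the headline embedding, the 80/20 split, and the helper names `embed_headlines`, `train_classifier`, and `onion_probability` are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: the exact DistilBERT checkpoint is not stated in this README.
MODEL_NAME = "distilbert-base-uncased"


def embed_headlines(headlines, model_name=MODEL_NAME):
    """Return one embedding vector per headline (hidden state of the first token)."""
    # Imported lazily so the classifier helpers below work without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(list(headlines), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**batch)
    # last_hidden_state has shape (batch, tokens, hidden); keep the first token only.
    return output.last_hidden_state[:, 0, :].numpy()


def train_classifier(embeddings, labels, seed=0):
    """Fit logistic regression on embeddings; return model, train accuracy, test accuracy."""
    # Assumption: an 80/20 stratified split; the README does not give the ratio.
    x_train, x_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000)
    clf.fit(x_train, y_train)
    return clf, clf.score(x_train, y_train), clf.score(x_test, y_test)


def onion_probability(clf, embedding):
    """Probability that one embedding belongs to label 1 (assumed to mean The Onion)."""
    # With labels 0 and 1, column 1 of predict_proba corresponds to label 1.
    return float(clf.predict_proba(np.asarray(embedding).reshape(1, -1))[0, 1])
```

With a list of headlines and matching 0/1 labels, usage would look like `clf, train_acc, test_acc = train_classifier(embed_headlines(headlines), labels)`, followed by `onion_probability(clf, embed_headlines(["Some headline"])[0])` for a new headline. Keeping DistilBERT frozen and training only the logistic regression keeps training cheap, since the embeddings only need to be computed once.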

This project is just for fun, but if you'd like more details, check out the notebook that walks through my process 🤗.

Data pull originally sourced from https://github.com/lukefeilberg/onion

