
Machine learning model to determine whether a news headline is actually from The Onion


awalker88/spit-out-the-onion


spit-out-the-onion

With the current state of the world, I'm sure everyone has second-guessed whether a news headline was real or satire. To save people the trouble of actually reading the article, I built a classifier that takes in your headline and gives the percentage chance that it came from The Onion, the largest satirical news outlet on the web.

The heart of the process is a pre-trained DistilBERT model, which eats a tokenized version of your headline and kindly spits out a fixed-length embedding vector that represents it. Then a logistic regression model trained on top of those embeddings gives the probability that the headline is from The Onion. To train the logistic regression, I used 30k headlines gathered from two subreddits: r/TheOnion (satire) and r/NotTheOnion (real headlines that sound like satire). With a balanced dataset, the downstream logistic regression model achieves 87% accuracy on the training set and 85% accuracy on the test set.
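The two-stage design described above (a frozen embedder, then a linear classifier on its output) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the notebook's code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for DistilBERT so the example runs without downloading a model, and `LogisticRegression`, `onion_percentage`, `accuracy` and the hyperparameters (`DIM`, `lr`, `epochs`) are illustrative names and values. In a real DistilBERT setup, one common choice of headline embedding is the final hidden state of the first ([CLS]) token, a 768-dimensional vector.

```python
import math
import zlib
from typing import Callable, List, Sequence

# Hypothetical embedding size for the toy embedder below.
# (DistilBERT's hidden size is 768; 64 just keeps the example small.)
DIM = 64


def toy_embed(headline: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for the DistilBERT embedding step.

    Builds a hashed bag-of-words vector: each lowercase word is hashed with
    CRC-32 (deterministic, unlike Python's salted hash()) into one of `dim`
    buckets, then the vector is L2-normalized. This is NOT DistilBERT; it
    only produces a fixed-length vector so the classifier below can run.
    """
    vec = [0.0] * dim
    for word in headline.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Binary logistic regression fit with full-batch gradient descent."""

    def __init__(self, dim: int, lr: float = 0.5, epochs: int = 500):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x: Sequence[float]) -> float:
        # P(label = 1 | x), where label 1 means "from The Onion".
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "LogisticRegression":
        n = len(X)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, label in zip(X, y):
                # Gradient of the log-loss with respect to the logit.
                err = self.predict_proba(x) - label
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            # Average the gradient over the batch and take one step.
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self


def onion_percentage(model: LogisticRegression,
                     embed: Callable[[str], List[float]],
                     headline: str) -> float:
    """Embed the headline, then return P(The Onion) as a percentage."""
    return 100.0 * model.predict_proba(embed(headline))


def accuracy(model: LogisticRegression, X: List[Sequence[float]],
             y: List[int], threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((model.predict_proba(x) >= threshold) == bool(label)
                  for x, label in zip(X, y))
    return correct / len(y)
```

Only the `embed` callable would change when swapping in DistilBERT: the classifier sees nothing but fixed-length vectors, which is what lets the pre-trained model be used as-is while only the cheap logistic regression is trained. With labels of 1 for r/TheOnion and 0 for r/NotTheOnion, `onion_percentage` returns the kind of percentage chance this README describes, and `accuracy` on held-out embeddings is the sort of test-set figure quoted above.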

This project is just for fun, but if you'd like more details, check out the notebook that walks through my process 🤗.

Data pull originally sourced from https://github.com/lukefeilberg/onion
