subreddit-clustering

Madeline Powers, [email protected], April 28, 2024

Project description

This repository contains an analysis of subreddit topics, using topic modeling with scikit-learn on a large dataset of Reddit comments.

Data description

The data for this project was collected specifically for it using PRAW (the Python Reddit API Wrapper). The repository includes the data in partial form only; the full dataset can be reconstructed with the unredact_data.py script.

Directory

Guestbook

A guestbook for fellow students.
