-
Notifications
You must be signed in to change notification settings - Fork 19
/
README
28 lines (20 loc) · 1.17 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
NLP Project
===========
Finding the genre of a song based on its lyrics.
------------------------------------------------
This repo is part of a project done during the Natural Language Processing in Python
course given at KAIST during Spring 2012. The aim of this project was to automatically
find the theme of a song based on its lyrics.
Tagging by Hand
---------------
The first goal was to find plenty of lyrics, tag them by hand and then run a machine learning
algorithm for the program to learn to which words is associated a particular theme.
We first scraped 250 lyrics from letssingit.com and built an interface using Meteor.js to
tag them with words like "violence", "love", "party".
Compute TF-IDF and train classifier
-----------------------------------
The last step was to compute the tf-idf for each term and then train a Naive
Bayes classifier from the library nltk on this data. After this step, our algorithm
would normally be able to automatically classify new songs based on its lyrics.
Unfortunately, since the sample of lyrics was too small and the tagging a bit incoherent
since it had been done by 3 different people, the classifier had a poor recall and precision.