Who Wins? Utilizing Facebook Comments for Football Match Predictions
CS372 Natural Language Processing with Python, KAIST (Spring 2022)
Check our paper.
We propose a novel approach to predicting football match results that relies only on public Facebook comments posted before kick-off. It does not require historical statistics, which can be expensive to access or demand intensive feature engineering.
EPC20-21 Dataset: Facebook comments on starting line-up update posts, collected from 405 posts across 16 teams and 301 matches of the English Premier League 2020-21 season.
| Label | Number of Comments |
|---|---|
| Win | 19392 |
| Draw | 13352 |
| Lose | 20388 |
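The three labels are roughly balanced. The short computation below uses only the counts from the table to show each label's share of the dataset; it is an illustrative calculation, not part of the repository.

```python
# Comment counts per label, copied from the EPC20-21 table.
label_counts = {"Win": 19392, "Draw": 13352, "Lose": 20388}

total = sum(label_counts.values())
# Fraction of all comments carrying each label.
shares = {label: count / total for label, count in label_counts.items()}

if __name__ == "__main__":
    print(f"Total comments: {total}")
    for label, share in shares.items():
        print(f"{label}: {share:.1%}")
```

This gives 53132 comments in total, with Lose the largest label at about 38.4%, Win at about 36.5%, and Draw at about 25.1%.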
Install the following libraries:
- nltk
- pandas
- scikit-learn
- tqdm
- facebook_scraper
Optional libraries (needed only for the semantic-based encoders):
- tensorflow-hub
- sentence-transformers
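On PyPI these are published as `nltk`, `pandas`, `scikit-learn`, `tqdm`, `facebook-scraper`, `tensorflow-hub` and `sentence-transformers`, but several are imported under a different name. The helper below is a small sketch (not shipped with the repository) that uses `importlib.util.find_spec` to report which required and optional modules are missing from the current environment.

```python
import importlib.util

# Import names, which differ from the pip package names for some libraries
# (e.g. scikit-learn is imported as sklearn, facebook-scraper as facebook_scraper).
REQUIRED = ["nltk", "pandas", "sklearn", "tqdm", "facebook_scraper"]
OPTIONAL = ["tensorflow_hub", "sentence_transformers"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED) or "none")
    print("Missing optional:", missing_modules(OPTIONAL) or "none")
```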
Examples:
python main.py --encoder unigram --predictor spearman --eval_mode single
python main.py --encoder tfidf --predictor cossim --eval_mode single
python main.py --encoder sentbert --predictor cossim --eval_mode double
To collect the comments yourself, run the following commands:
# Run once for each team you want to include in the dataset
python scraper/page_scraper.py [team]
# Manually arrange the post IDs in the same format as the epl_fixtures-post_ids.csv file
python scraper/post_scraper.py
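Looping `page_scraper.py` over several teams can be scripted. The sketch below builds one invocation per team and, by default, only prints the commands (dry run). The team identifiers in `TEAMS` are hypothetical placeholders: the exact format that `page_scraper.py` expects for its `[team]` argument is an assumption here, so check the script before running it for real.

```python
import subprocess
import sys

# Hypothetical placeholders; replace with the identifiers page_scraper.py expects.
TEAMS = ["TEAM_1", "TEAM_2"]


def scraper_commands(teams):
    """Build one page_scraper.py invocation per team, using the current interpreter."""
    return [[sys.executable, "scraper/page_scraper.py", team] for team in teams]


def run_all(teams, dry_run=True):
    """Print each command; when dry_run is False, also execute it."""
    for cmd in scraper_commands(teams):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing team


if __name__ == "__main__":
    run_all(TEAMS)  # pass dry_run=False to actually run the scraper
```

After the page scrapes finish and the post IDs are arranged, `post_scraper.py` is run once as shown in the commands above.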