Social Media Analytics 2020 - student project

Overview

The project focuses on Twitter data analysis because Twitter is the most affordable platform for educational research and offers several handy analysis tools, such as TAGS.

Since historical data is not accessible, I focus on current events (Tuesday 6 to Monday 12 October 2020). One of the most prominent of these is the military conflict between Armenia and Azerbaijan over disputed territories, tracked on Twitter with the “#karabakh OR #artsakh” hashtags.
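The tweets were collected with the Twitter search query “#karabakh OR #artsakh”. If the data ever needs to be re-filtered locally (for example after merging several TAGS exports), the same OR condition can be applied in plain Python. This is a minimal sketch assuming tweet texts are plain strings; `hashtags` and `matches_query` are hypothetical helper names, not code from the project notebooks, and Twitter's own hashtag matching may differ in edge cases.

```python
import re
from typing import Iterable, List

# Hashtags from the search query "#karabakh OR #artsakh" (compared in lower case)
QUERY_HASHTAGS = ("karabakh", "artsakh")

# Matches a hashtag token; in Python 3, \w also covers non-Latin letters
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text: str) -> List[str]:
    """Return all hashtags in a tweet, lower-cased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def matches_query(text: str, wanted: Iterable[str] = QUERY_HASHTAGS) -> bool:
    """True if the tweet carries at least one of the wanted hashtags (OR logic)."""
    wanted_set = {w.lower() for w in wanted}
    return any(tag in wanted_set for tag in hashtags(text))


if __name__ == "__main__":
    samples = [
        "Ceasefire talks continue #Karabakh",
        "News from #Artsakh today",
        "Unrelated tweet #python",
    ]
    for s in samples:
        print(matches_query(s), "|", s)
```

Running the script prints `True` for the first two samples and `False` for the third.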

Main points of the exercise

read a very large Google Sheets data file and clean it
write the processed data to an .xlsx file
process the resulting .xlsx file locally
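The first two points can be sketched with pandas. This assumes the TAGS sheet has already been loaded into a DataFrame and that it contains the columns `id_str`, `from_user`, `text` and `created_at` (adjust `KEEP_COLUMNS` to the actual header). `clean_tags_frame` and `save_xlsx` are hypothetical names, and the cleaning rules shown (drop empty and duplicate tweets) are illustrative rather than the exact steps in 1_get_data_from_googlesheets.ipynb. Writing .xlsx with pandas requires the openpyxl package.

```python
import pandas as pd

# Assumed TAGS column names; adjust to the actual sheet header
KEEP_COLUMNS = ["id_str", "from_user", "text", "created_at"]


def clean_tags_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the columns needed for analysis, drop empty and duplicate tweets."""
    out = df[KEEP_COLUMNS].copy()
    # Store tweet ids as text: Excel keeps only 15 significant digits for numbers,
    # which would corrupt 19-digit tweet ids
    out["id_str"] = out["id_str"].astype(str)
    # Tweets without text are useless for word counts
    out = out.dropna(subset=["text"])
    # The same tweet can be collected more than once; keep the first copy
    out = out.drop_duplicates(subset="id_str", keep="first")
    return out.reset_index(drop=True)


def save_xlsx(df: pd.DataFrame, path: str) -> None:
    """Write the cleaned frame to .xlsx (pandas uses the openpyxl engine here)."""
    df.to_excel(path, index=False)
```

Dropping unused columns before saving is one way to keep the output file small.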

All processing is done in Python (v3.8) using Jupyter Notebooks.

Obstacles

Very large Google Sheets data file

When TAGS generates a Google Sheets data file of around 187,000 records, the file becomes almost impossible to work with in the browser, so it has to be downloaded to a local machine.

A large Google Sheets data file cannot be downloaded without errors

When the file is downloaded to a local machine as .xlsx, Excel detects errors in it and, at least in the macOS version, is unable to repair them.
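One way to avoid both the browser and Excel is to let pandas fetch the sheet directly. The sketch below assumes the sheet is shared so that anyone with the link can view it, and uses the Google Sheets CSV export URL; `sheet_csv_url` and `load_sheet` are hypothetical helpers, and this is not necessarily how 1_get_data_from_googlesheets.ipynb accesses the data.

```python
import pandas as pd


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV export URL for one tab (identified by gid) of a Google Sheet."""
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}/export"
        f"?format=csv&gid={gid}"
    )


def load_sheet(sheet_id: str, gid: int = 0) -> pd.DataFrame:
    """Read the tab straight into pandas (the sheet must be viewable by link)."""
    # dtype=str keeps long tweet ids intact instead of parsing them as floats
    return pd.read_csv(sheet_csv_url(sheet_id, gid), dtype=str)
```

For example, `load_sheet("<SHEET_ID>")` would return the first tab as a DataFrame of strings, where `<SHEET_ID>` stands for the long identifier in the sheet's URL.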

Solutions

Read the Google Sheets data file in Google Colab, clean it, and save it as an .xlsx file (to save space)
- see 1_get_data_from_googlesheets.ipynb
Download the resulting (much smaller) data file to a local machine
Process the data locally with pandas and other libraries, and generate a WordCloud image
- see 2_tweets.ipynb
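A word cloud is drawn from word frequencies, so the local processing step boils down to counting words. Below is a minimal standard-library sketch of that counting, assuming the input is an iterable of tweet texts (e.g. the `text` column of the cleaned file); the stop-word list, the `min_len` threshold and the `word_frequencies` name are illustrative choices, not taken from 2_tweets.ipynb.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Tiny illustrative stop-word list (assumption); a fuller list would be used in practice
STOPWORDS = {"the", "and", "for", "are", "was", "with", "this", "that", "from", "have", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only, in any alphabet


def word_frequencies(texts: Iterable[str], min_len: int = 3) -> Dict[str, int]:
    """Count words across tweets after stripping URLs, mentions and stop words."""
    counts: Counter = Counter()
    for text in texts:
        text = URL_RE.sub(" ", text)
        text = MENTION_RE.sub(" ", text)
        # Lower-case so "Karabakh" and "#karabakh" count as the same word
        for word in WORD_RE.findall(text.lower()):
            if len(word) >= min_len and word not in STOPWORDS:
                counts[word] += 1
    return dict(counts)
```

The resulting dict can be passed to `WordCloud(...).generate_from_frequencies()` from the `wordcloud` package to render the image. The query hashtags themselves (karabakh, artsakh) appear in almost every tweet, so they may be worth adding to the stop words.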

Data and results

Original Google Sheets data sample:
Downloaded .xlsx data sample:
WordCloud resulting image:
Final project report: PDF
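Before interpreting the results, it can help to check how the collected tweets are spread over the 6-12 October window. A minimal sketch, assuming the sheet has a `created_at` column in Twitter's standard timestamp format (e.g. `Tue Oct 06 14:05:09 +0000 2020`); `tweets_per_day` and the window constants are illustrative, and no output from the actual dataset is implied.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable

# Twitter timestamp format, e.g. "Tue Oct 06 14:05:09 +0000 2020" (assumed)
TWITTER_TS = "%a %b %d %H:%M:%S %z %Y"

WINDOW_START = date(2020, 10, 6)  # Tuesday
WINDOW_END = date(2020, 10, 12)   # Monday


def tweets_per_day(timestamps: Iterable[str]) -> Dict[date, int]:
    """Count tweets per day (timestamps are UTC), keeping only the collection window."""
    counts: Counter = Counter()
    for ts in timestamps:
        day = datetime.strptime(ts, TWITTER_TS).date()
        if WINDOW_START <= day <= WINDOW_END:
            counts[day] += 1
    # Return the days in chronological order
    return dict(sorted(counts.items()))
```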

andrejkurusiov/social-media-analytics-2020
