The project centers on Twitter data analysis, because Twitter is the most affordable platform for educational research and several handy analysis tools, such as TAGS, are available for it.
Since historical data cannot be accessed, I focus on current events from 6-12 October 2020 (Tue-Mon). One of the most prominent is the military conflict between Armenia and Azerbaijan over disputed territories, tracked on Twitter with the “#karabakh OR #artsakh” hashtag query.
- read a very large Google Sheets data file and clean it
- write processed data back to .xlsx file
- process resulting .xlsx file locally
All processing is done in Python 3.8 using Jupyter Notebooks.
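The cleaning step listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function name `clean_tweets` is hypothetical, and the column names `id_str` and `text` are assumptions about the TAGS archive layout that should be checked against the real sheet.

```python
import pandas as pd


def clean_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper; column names are assumed, not confirmed."""
    # Drop columns that contain no data at all.
    cleaned = df.dropna(axis="columns", how="all")
    # Drop rows that have no tweet text.
    cleaned = cleaned.dropna(subset=["text"])
    # Keep one row per tweet ID (assumed to be stored in "id_str").
    cleaned = cleaned.drop_duplicates(subset=["id_str"])
    return cleaned.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny made-up sample standing in for the real archive.
    raw = pd.DataFrame(
        {
            "id_str": ["1", "1", "2", "3"],
            "text": ["first", "first", None, "third"],
            "unused": [None, None, None, None],
        }
    )
    print(clean_tweets(raw))
```

Writing the result back out could then be done with `cleaned.to_excel("tweets_clean.xlsx", index=False)`, where the file name is a placeholder and an Excel writer engine such as `openpyxl` is assumed to be installed.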
- Very large Google Sheets data file
  When the TAGS software generates a Google Sheets data file of around 187,000 records, it is almost impossible to work with it in the browser, so it has to be downloaded to a local machine.
- It's impossible to download a large Google Sheets data file without errors
  After the file is downloaded to a local machine as an .xlsx file, Excel detects errors and, at least in the macOS version, is unable to fix them.
- Read the Google Sheets data file in Google Colab, clean it, and save it as an .xlsx file (to save space)
- Download the resulting (much smaller) data file to a local machine
- Process the data locally with `pandas` and other libraries, and generate a WordCloud image (see 2_tweets.ipynb)
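As a rough illustration of the last step, the word counts that feed a word cloud can be computed with the standard library alone. This is a sketch under stated assumptions, not the code from the notebook: the function name `word_frequencies`, the tiny stop-word list, and the sample tweets are all made up for the example.

```python
import re
from collections import Counter

# Matches URLs and @mentions so they can be removed before counting.
URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
# Matches words, optionally prefixed with "#" so hashtags stay intact.
TOKEN = re.compile(r"#?\w+")
# Deliberately small, illustrative stop-word list.
STOPWORDS = {"rt", "the", "a", "and", "of", "to", "in", "for", "on"}


def word_frequencies(tweets):
    """Count words across tweets, ignoring URLs, mentions, and stop words."""
    counts = Counter()
    for tweet in tweets:
        text = URL_OR_MENTION.sub(" ", tweet.lower())
        for token in TOKEN.findall(text):
            if token not in STOPWORDS and len(token) > 1:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "RT @user: Peace for #Karabakh https://t.co/x",
        "Peace talks on #karabakh",
    ]
    print(word_frequencies(sample).most_common(3))
```

A mapping like this can be passed to `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package to render the image; whether the notebook builds its image this way or from raw text is not stated here.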
- Final project report: PDF