Emily Martin, Spring 2021
Sentiment analysis of German newspapers across the political spectrum during the 2015 refugee crisis, using spaCy and its pipeline extension SentiWS. The goal was to determine whether sentiment towards refugees differed between right-leaning and left-leaning news sources. I also examined SentiWS, a spaCy extension for German sentiment analysis, more closely to understand how it works and where it falls short.
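To give a feel for what lexicon-based sentiment scoring involves, here is a minimal sketch in plain Python. It is not the spacy-sentiws API and not necessarily how my notebook aggregates scores: the lexicon `TOY_LEXICON`, its weights, and the function `score_text` are all hypothetical and made up for illustration. The only thing borrowed from SentiWS is the general idea of word-level polarity weights between -1 and 1.

```python
import re
from statistics import mean
from typing import Dict, Optional

# Hypothetical toy lexicon: word -> polarity weight in [-1, 1].
# These weights are invented for illustration; they are NOT real SentiWS values.
TOY_LEXICON: Dict[str, float] = {
    "gut": 0.5,
    "hilfe": 0.25,
    "schlecht": -0.5,
    "gefahr": -0.25,
}


def score_text(text: str, lexicon: Dict[str, float]) -> Optional[float]:
    """Average the weights of all lexicon words found in the text.

    Returns None when no word in the text has a lexicon entry, so that
    "no sentiment-bearing words" stays distinct from a neutral score of 0.
    """
    # Lowercase and split on word characters; a real pipeline would use
    # spaCy tokens and lemmas instead of this simple regular expression.
    tokens = re.findall(r"\w+", text.lower())
    weights = [lexicon[token] for token in tokens if token in lexicon]
    return mean(weights) if weights else None


if __name__ == "__main__":
    print(score_text("Die Hilfe war gut.", TOY_LEXICON))
    print(score_text("Das Wetter in Berlin.", TOY_LEXICON))
```

Returning `None` for texts without any lexicon hits matters in practice: most words in a news article carry no weight, so coverage gaps in the lexicon are one place where this kind of analysis can fall short.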
Here you can find my guestbook
I scraped my data using urllib and Beautiful Soup from four newspapers: Die Zeit, one of the largest weekly newspapers in Germany, centrist/liberal in its political leanings; Die Tageszeitung (TAZ), a daily newspaper with a modest circulation that leans left-wing/green; Süddeutsche Zeitung, a daily newspaper with a very wide circulation (second largest after Die Zeit) that leans left-liberal; and Junge Freiheit, a small weekly newspaper with fairly strong right-wing leanings. Because of copyright issues I am not able to share the data itself. However, my scraping scripts and the TAZ links in my scraping folder can be shared, so anyone interested can acquire the data for themselves.
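As a rough illustration of the urllib plus Beautiful Soup approach, here is a minimal sketch. It is not one of the actual scraping scripts: the helper names `fetch_html` and `extract_article_text` are hypothetical, and the assumption that article text lives in plain `<p>` elements is a simplification, since each newspaper site needs its own selectors.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with urllib and decode it (UTF-8 is an assumption)."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_article_text(html: str) -> str:
    """Return the text of every non-empty <p> element, one paragraph per line."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        # Collapse runs of whitespace left over from the markup.
        text = " ".join(p.get_text().split())
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


if __name__ == "__main__":
    # Offline demo on an inline HTML snippet; no network access needed.
    sample = (
        "<html><body><h1>Titel</h1>"
        "<p>Erster  Absatz.</p><p> </p><p>Zweiter <b>Absatz</b>.</p>"
        "</body></html>"
    )
    print(extract_article_text(sample))
```

To use it on real pages, each URL (for example, each line of taz_urls.txt) would be passed to `fetch_html` and the result to `extract_article_text`, with the paragraph selection adapted to the site's HTML.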
- README.md: you are here ;)
- LICENSE.md: information about the license for this project.
- project_plan.md: my original proposal and plan for the project.
- progress_report.md: a report of my progress throughout the semester.
- Presentation.ipynb: a notebook with the sentiment analysis and SentiWS exploration.
  - Here it is in nbviewer with working links!
- scraping: a folder with all my scraping scripts and the manually compiled TAZ links.
  - Junge_Freiheit.ipynb: code for scraping Junge Freiheit. Here is the nbviewer version.
  - taz.ipynb: code for scraping Die Tageszeitung. Here is the nbviewer version.
  - taz_urls.txt: manually compiled links for scraping TAZ.
  - zeit.ipynb: code for scraping Die Zeit. Here is the nbviewer version.
  - Süddeutsce_zeitung.ipynb: code for scraping Süddeutsche Zeitung. Here is the nbviewer version.
- final_report.md: final report, documenting my process and findings.
- DS_presentation.pdf: a PDF version of my class presentation.
- html pics: pictures of the HTML for each scraped site.
- images: a folder with all the images generated in my notebook.