Material for the text analysis internship SoSe2021 @ISTO
- Starting date: September 2021
- Where: ISTO, LMU Munich (3rd floor)
- Coordination: Joy Wu and Giulia Solinas
The intern will use R to develop a repository of tools for text analysis on large datasets. We are integrating our coding workflow in a GitHub repository that can be edited directly through RStudio.
- To familiarize yourself with GitHub, Git, and RStudio, please check the materials in "Happy Git and GitHub for the useR."
- This GitHub guide gets you started.
For the internship, we expect intermediate familiarity with R and RStudio. Knowledge of the tidyverse packages, especially dplyr and ggplot2, may be helpful.
The R community usually refers to two very useful online books: "R for Data Science" (at this link) and "R Cookbook" (at this link). These books range from basic R workflow to model building.
Emil Hvitfeldt and Julia Silge ran this fantastic tutorial on predictive modeling with text using tidy data principles. They have also published this online book, "Supervised Machine Learning for Text Analysis in R."
This GitHub repo by Arthur Spirling contains materials from his NYU "Text as Data" course from spring 2021.
For references on text mining, you can consult the book "Text Mining with R" by Julia Silge and David Robinson here. A summary is available at Julia's blog.
Julia Silge is an expert on modeling with R and, together with Max Kuhn, a co-author of "Tidy Modeling with R" (here). Modeling is not the main objective of this internship, but it is worth considering the potential applications that may derive from it.
Several further resources are available online. Members of the Academy of Management at the Terry College of Business, University of Georgia, have created this webpage with links to workshops, resources, and publications.