Merge remote-tracking branch 'origin/master' into corrections-2024-09
mjpost committed Sep 24, 2024
2 parents 6cf9e2c + edae538 commit 6a401bf
Showing 4 changed files with 630 additions and 1 deletion.
2 changes: 1 addition & 1 deletion data/xml/2024.acl.xml
@@ -341,7 +341,7 @@
<author><first>Elliot</first><last>Pickens</last><affiliation>Department of Computer Science, University of Wisconsin - Madison</affiliation></author>
<author><first>Coen</first><last>Needell</last></author>
<author><first>David</first><last>Rothschild</last><affiliation>Research, Microsoft</affiliation></author>
-<author><first>Maria</first><last>Pacheco</last><affiliation>University of Colorado at Boulder</affiliation></author>
+<author><first>Maria Leonor</first><last>Pacheco</last><affiliation>University of Colorado at Boulder</affiliation></author>
<pages>393-415</pages>
<abstract>The mainstream media has much leeway in what it chooses to cover and how it covers it. These choices have real-world consequences on what people know and their subsequent behaviors. However, the lack of objective measures to evaluate editorial choices makes research in this area particularly difficult. In this paper, we argue that there are newsworthy topics where objective measures exist in the form of supporting data and propose a computational framework to analyze editorial choices in this setup. We focus on the economy because the reporting of economic indicators presents us with a relatively easy way to determine both the selection and framing of various publications. Their values provide a ground truth of how the economy is doing relative to how the publications choose to cover it. To do this, we define frame prediction as a set of interdependent tasks. At the article level, we learn to identify the reported stance towards the general state of the economy. Then, for every numerical quantity reported in the article, we learn to identify whether it corresponds to an economic indicator and whether it is being reported in a positive or negative way. To perform our analysis, we track six American publishers and each article that appeared in the top 10 slots of their landing page between 2015 and 2023.</abstract>
<url hash="09aa02ed">2024.acl-long.24</url>
141 changes: 141 additions & 0 deletions data/xml/2024.cpss.xml
@@ -0,0 +1,141 @@
<?xml version='1.0' encoding='UTF-8'?>
<collection id="2024.cpss">
<volume id="1" ingest-date="2024-09-18" type="proceedings">
<meta>
<booktitle>Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers</booktitle>
<editor><first>Christopher</first><last>Klamm</last></editor>
<editor><first>Gabriella</first><last>Lapesa</last></editor>
<editor><first>Simone Paolo</first><last>Ponzetto</last></editor>
<editor><first>Ines</first><last>Rehbein</last></editor>
<editor><first>Indira</first><last>Sen</last></editor>
<publisher>Association for Computational Linguistics</publisher>
<address>Vienna, Austria</address>
<month>Sep</month>
<year>2024</year>
<url hash="c6e14888">2024.cpss-1</url>
<venue>cpss</venue>
<venue>ws</venue>
</meta>
<frontmatter>
<url hash="82986c8e">2024.cpss-1.0</url>
<bibkey>cpss-2024-1</bibkey>
</frontmatter>
<paper id="1">
<title>Detecting Calls to Action in Multimodal Content: Analysis of the 2021 <fixed-case>G</fixed-case>erman Federal Election Campaign on <fixed-case>I</fixed-case>nstagram</title>
<author><first>Michael</first><last>Achmann-Denkler</last></author>
<author><first>Jakob</first><last>Fehle</last></author>
<author><first>Mario</first><last>Haim</last></author>
<author><first>Christian</first><last>Wolff</last></author>
<pages>1–13</pages>
<abstract>This study investigates the automated classification of Calls to Action (CTAs) within the 2021 German Instagram election campaign to advance the understanding of mobilization in social media contexts. We analyzed over 2,208 Instagram stories and 712 posts using fine-tuned BERT models and OpenAI’s GPT-4 models. The fine-tuned BERT model incorporating synthetic training data achieved a macro F1 score of 0.93, demonstrating a robust classification performance. Our analysis revealed that 49.58% of Instagram posts and 10.64% of stories contained CTAs, highlighting significant differences in mobilization strategies between these content types. Additionally, we found that FDP and the Greens had the highest prevalence of CTAs in posts, whereas CDU and CSU led in story CTAs.</abstract>
<url hash="7753c9d4">2024.cpss-1.1</url>
<bibkey>achmann-denkler-etal-2024-detecting</bibkey>
</paper>
<paper id="2">
<title>Multilingual Bot Accusations: How Different Linguistic Contexts Shape Perceptions of Social Bots</title>
<author><first>Leon</first><last>Fröhling</last></author>
<author><first>Xiaofei</first><last>Li</last></author>
<author><first>Dennis</first><last>Assenmacher</last></author>
<pages>14–32</pages>
<abstract>Recent research indicates that the online use of the term “bot” has evolved over time. In the past, people used the term to accuse others of displaying automated behavior. However, it has gradually transformed into a linguistic tool to dehumanize the conversation partner, particularly on polarizing topics. Although this trend has been observed in English-speaking contexts, it is still unclear whether it holds true in other socio-linguistic environments. In this work, we extend existing work on bot accusations and explore the phenomenon in a multilingual setting. We identify three distinct accusation patterns that characterize the different languages.</abstract>
<url hash="660f9508">2024.cpss-1.2</url>
<bibkey>frohling-etal-2024-multilingual</bibkey>
</paper>
<paper id="3">
<title>Operationalising the Hermeneutic Grouping Process in Corpus-assisted Discourse Studies</title>
<author><first>Philipp</first><last>Heinrich</last></author>
<author><first>Stephanie</first><last>Evert</last></author>
<pages>33–44</pages>
<abstract>We propose a framework for quantitative-qualitative research in corpus-assisted discourse studies (CADS), which operationalises the central process of manually forming groups of related words and phrases in terms of “discoursemes” and their constellations. We introduce an open-source implementation of this framework in the form of a REST API based on Corpus Workbench. Going through the workflow of a collocation analysis for fleeing and related terms in the German Federal Parliament, the paper gives details about the underlying algorithms, with available parameters and further possible choices. We also address multi-word units (which are often disregarded by CADS tools), a semantic map visualisation of collocations, and how to compute associations between discoursemes.</abstract>
<url hash="a3a78eb0">2024.cpss-1.3</url>
<bibkey>heinrich-evert-2024-operationalising</bibkey>
</paper>
<paper id="4">
<title>A Few Hypocrites: Few-Shot Learning and Subtype Definitions for Detecting Hypocrisy Accusations in Online Climate Change Debates</title>
<author><first>Paulina Garcia</first><last>Corral</last></author>
<author><first>Avishai</first><last>Green</last></author>
<author><first>Hendrik</first><last>Meyer</last></author>
<author><first>Anke</first><last>Stoll</last></author>
<author><first>Xiaoyue</first><last>Yan</last></author>
<author><first>Myrthe</first><last>Reuver</last></author>
<pages>45–60</pages>
<abstract>The climate crisis is a salient issue in online discussions, and hypocrisy accusations are a central rhetorical element in these debates. However, for large-scale text analysis, hypocrisy accusation detection is an understudied tool, most often defined as a smaller subtask of fallacious argument detection. In this paper, we define hypocrisy accusation detection as an independent task in NLP, and identify different relevant subtypes of hypocrisy accusations. Our Climate Hypocrisy Accusation Corpus (CHAC) consists of 420 Reddit climate debate comments, expert-annotated into two different types of hypocrisy accusations: personal versus political hypocrisy. We evaluate few-shot in-context learning with 6 shots and 3 instruction-tuned Large Language Models (LLMs) for detecting hypocrisy accusations in this dataset. Results indicate that the GPT-4o and Llama-3 models in particular show promise in detecting hypocrisy accusations (F1 reaching 0.68, while previous work shows F1 of 0.44). However, context matters for a complex semantic concept such as hypocrisy accusations, and we find models struggle especially at identifying political hypocrisy accusations compared to personal moral hypocrisy. Our study contributes new insights in hypocrisy detection and climate change discourse, and is a stepping stone for large-scale analysis of hypocrisy accusation in online climate debates.</abstract>
<url hash="d6e535da">2024.cpss-1.4</url>
<bibkey>corral-etal-2024-hypocrites</bibkey>
</paper>
<paper id="5">
<title>Language Complexity in Populist Rhetoric</title>
<author><first>Sergio E.</first><last>Zanotto</last></author>
<author><first>Diego</first><last>Frassinelli</last></author>
<author><first>Miriam</first><last>Butt</last></author>
<pages>61–80</pages>
<abstract>Research suggests that politicians labeled as populists tend to use simpler language than their mainstream opponents. Yet, the metrics traditionally employed to assess the complexity of their language do not show consistent and generalizable results across different datasets and languages. These inconsistencies raise questions about the claimed simplicity of populist discourse, suggesting that the issue may be more nuanced than it initially seemed. To address this topic, we analyze the linguistic profile of IMPAQTS, a dataset of transcribed Italian political speeches, to identify linguistic features differentiating populist and non-populist parties. Our methodology ensures comparability of political texts and combines various statistical analyses to reliably identify key linguistic characteristics to test our case study. Results show that the “simplistic” language features previously described in the literature are not robust predictors of populism. This suggests that the characteristics defining populist statements are highly dependent on the specific dataset and the language being analysed, thus limiting the conclusions drawn in previous research. In our study, various linguistic features statistically differentiate between populist and mainstream parties, indicating that populists tend to employ specific well-known rhetorical strategies more frequently; however, none of them strongly indicate that populist parties use simpler language.</abstract>
<url hash="2a5d38b6">2024.cpss-1.5</url>
<bibkey>zanotto-etal-2024-language</bibkey>
</paper>
<paper id="6">
<title><fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case> as Your n-th Annotator: Experiments in Leveraging Large Language Models for Social Science Text Annotation in <fixed-case>S</fixed-case>lovak Language</title>
<author><first>Endre</first><last>Hamerlik</last></author>
<author><first>Marek</first><last>Šuppa</last></author>
<author><first>Miroslav</first><last>Blšták</last></author>
<author><first>Jozef</first><last>Kubík</last></author>
<author><first>Martin</first><last>Takáč</last></author>
<author><first>Marián</first><last>Šimko</last></author>
<author><first>Andrej</first><last>Findor</last></author>
<pages>81–89</pages>
<abstract>Large Language Models (LLMs) are increasingly influential in Computational Social Science, offering new methods for processing and analyzing data, particularly in lower-resource language contexts. This study explores the use of OpenAI’s GPT-3.5 Turbo and GPT-4 for automating annotations for a unique news media dataset in a lower resourced language, focusing on stance classification tasks. Our results reveal that prompting in the native language, explanation generation, and advanced prompting strategies like Retrieval Augmented Generation and Chain of Thought prompting enhance LLM performance, particularly noting GPT-4’s superiority in predicting stance. Further evaluation indicates that LLMs can serve as a useful tool for social science text annotation in lower resourced languages, notably in identifying inconsistencies in annotation guidelines and annotated datasets.</abstract>
<url hash="0406e88a">2024.cpss-1.6</url>
<bibkey>hamerlik-etal-2024-chatgpt</bibkey>
</paper>
<paper id="7">
<title>Detecting emotional polarity in <fixed-case>F</fixed-case>innish parliamentary proceedings</title>
<author><first>Suvi</first><last>Lehtosalo</last></author>
<author><first>John</first><last>Nerbonne</last></author>
<pages>90–100</pages>
<abstract>Few studies have focused on detecting emotion in parliamentary corpora, and none have done this for the Finnish parliament. In this paper, this gap is addressed by applying the polarity lexicon–based methodology of a study by Rheault et al. (2016) on speeches in the British Parliament to a Finnish corpus. The findings show an increase in positive sentiment over time. Additionally, the findings indicate that politicians’ emotional states may be impacted by the state of the economy and other major events, such as the Covid-19 pandemic and the Russian invasion of Ukraine.</abstract>
<url hash="ed124e81">2024.cpss-1.7</url>
<bibkey>lehtosalo-nerbonne-2024-detecting</bibkey>
</paper>
<paper id="8">
<title>Topic-specific social science theory in stance detection: a proposal and interdisciplinary pilot study on sustainability initiatives</title>
<author><first>Myrthe</first><last>Reuver</last></author>
<author><first>Alessandra</first><last>Polimeno</last></author>
<author><first>Antske</first><last>Fokkens</last></author>
<author><first>Ana Isabel</first><last>Lopes</last></author>
<pages>101–111</pages>
<abstract>Topic-specificity is often seen as a limitation of stance detection models and datasets, especially for analyzing political and societal debates. However, stances contain topic-specific aspects that are crucial for an in-depth understanding of these debates. Our interdisciplinary approach identifies social science theories on specific debate topics as an opportunity for further defining stance detection research and analyzing online debate. This paper explores sustainability as debate topic, and connects stance to the sustainability-related Value-Belief-Norm (VBN) theory. VBN theory states that arguments in favor or against sustainability initiatives contain the dimensions of feeling power to change the issue with the initiative, and thinking whether or not the initiative tackles an urgent threat to the environment. In a pilot study with our Reddit European Sustainability Initiatives corpus, we develop an annotation procedure for these complex concepts. We then compare crowd-workers with Natural Language Processing experts’ annotation proficiency. Both crowd-workers and NLP experts find the tasks difficult, but experts reach more agreement on some difficult examples. This pilot study shows that complex theories about debate topics are feasible and worthwhile as annotation tasks for stance detection.</abstract>
<url hash="2c3a4d52">2024.cpss-1.8</url>
<bibkey>reuver-etal-2024-topic</bibkey>
</paper>
<paper id="9">
<title>The Echoes of the ‘<fixed-case>I</fixed-case>’: Tracing Identity with Demographically Enhanced Word Embeddings</title>
<author><first>Ivan</first><last>Smirnov</last></author>
<pages>112–118</pages>
<abstract>Identity is one of the most commonly studied constructs in social science. However, despite extensive theoretical work on identity, there remains a need for additional empirical data to validate and refine existing theories. This paper introduces a novel approach to studying identity by enhancing word embeddings with socio-demographic information. As a proof of concept, we demonstrate that our approach successfully reproduces and extends established findings regarding gendered self-views. Our methodology can be applied in a wide variety of settings, allowing researchers to tap into a vast pool of naturally occurring data, such as social media posts. Unlike similar methods already introduced in computer science, our approach allows for the study of differences between social groups. This could be particularly appealing to social scientists and may encourage the faster adoption of computational methods in the field.</abstract>
<url hash="feb9fbea">2024.cpss-1.9</url>
<bibkey>smirnov-2024-echoes</bibkey>
</paper>
<paper id="10">
<title><fixed-case>TPPMI</fixed-case> - a Temporal Positive Pointwise Mutual Information Embedding of Words</title>
<author><first>Paul</first><last>Schmitt</last></author>
<author><first>Zsófia</first><last>Rakovics</last></author>
<author><first>Márton</first><last>Rakovics</last></author>
<author><first>Gábor</first><last>Recski</last></author>
<pages>119–125</pages>
<abstract>We present Temporal Positive Pointwise Mutual Information (TPPMI) embeddings as a robust and data-efficient alternative for modeling temporal semantic change. Based on the assumption that the semantics of the most frequent words in a corpus are relatively stable over time, our model represents words as vectors of their PPMI similarities with a predefined set of such context words. We evaluate our method on the temporal word analogy benchmark of Yao et al. (2018) and compare it to the TWEC model (Di Carlo et al., 2019), demonstrating the competitiveness of the approach. While the performance of TPPMI stays below that of the state-of-the-art TWEC model, it offers a higher degree of interpretability and is applicable in scenarios where only a limited amount of data is available.</abstract>
<url hash="8c1c9308">2024.cpss-1.10</url>
<bibkey>schmitt-etal-2024-tppmi</bibkey>
</paper>
<paper id="11">
<title>Augmented Political Leaning Detection: Leveraging Parliamentary Speeches for Classifying News Articles</title>
<author><first>Charlott</first><last>Jakob</last></author>
<author><first>Pia</first><last>Wenzel</last></author>
<author><first>Salar</first><last>Mohtaj</last></author>
<author><first>Vera</first><last>Schmitt</last></author>
<pages>126–133</pages>
<abstract>In an era where political discourse infiltrates online platforms and news media, identifying opinion is increasingly critical, especially in news articles, where objectivity is expected. Readers frequently encounter authors’ inherent political viewpoints, challenging them to discern facts from opinions. Classifying text on a spectrum from left to right is a key task for uncovering these viewpoints. Previous approaches rely on outdated datasets to classify current articles, neglecting that political opinions on certain subjects change over time. This paper explores a novel methodology for detecting political leaning in news articles by augmenting them with political speeches specific to the topic and publication time. We evaluated the impact of the augmentation using BERT and Mistral models. The results show that the BERT model’s F1 score improved from a baseline of 0.82 to 0.85, while the Mistral model’s F1 score increased from 0.30 to 0.31.</abstract>
<url hash="94b431ea">2024.cpss-1.11</url>
<bibkey>jakob-etal-2024-augmented</bibkey>
</paper>
</volume>
</collection>