diff --git a/.github/ISSUE_TEMPLATE/04-ingestion-request.yml b/.github/ISSUE_TEMPLATE/04-ingestion-request.yml index 09f9d10939..663ac659df 100644 --- a/.github/ISSUE_TEMPLATE/04-ingestion-request.yml +++ b/.github/ISSUE_TEMPLATE/04-ingestion-request.yml @@ -24,12 +24,19 @@ body: placeholder: ex. emnlp, repl4nlp validations: required: true + - type: input + id: venue_sig + attributes: + label: "ACL SIG(s) sponsoring or endorsing the whole venue" + description: | + Provide a comma-separated list of any SIGs that apply to the whole venue. If there are multiple subvenues/volumes with different SIGs, provide the mapping under Supporting Information. + placeholder: ex. SIGLEX, SIGSEM - type: input id: volume_title attributes: label: Volume Title description: | - What is the title of the volume that should be published? + What is the title of the (main) volume that should be published? placeholder: ex. Proceedings of the 2019 Meeting of the Conference on Empirical Methods in Natural Language Processing (EMNLP) validations: required: true @@ -54,9 +61,16 @@ body: description: | When would you like the material to be published on the ACL Anthology? If you are submitting material that can be published immediately (e.g. for conferences that already happened in the past), you can leave this field blank. placeholder: ex. 2023-12-31 + - type: input + id: volume_address + attributes: + label: Location + description: | + What address should be included in bibliography entries, if any? For conferences, this is the location of the conference. For a fully-online event, use "Online", optionally preceded by the host team's location. Ensure the address field is consistent across submitted volumes. + placeholder: ex. Barcelona, Spain (Online) - type: textarea id: ingestion_information attributes: label: Supporting Information description: | - If there is anything else we should know about this ingestion request, please provide the information here. 
You can also use this field to **provide links or attach files** of the material, if you already have them. + If there is anything else we should know about this ingestion request, please provide the information here. For example, for venues with multiple volumes, list each volume with its identifier, title, and any SIGs that apply to it. You can also use this field to **provide links or attach files** of the material, if you already have them. diff --git a/bin/anthology/papers.py b/bin/anthology/papers.py index 9e334e6f95..1bbf0cb4dd 100644 --- a/bin/anthology/papers.py +++ b/bin/anthology/papers.py @@ -180,14 +180,8 @@ def from_xml(xml_element, *args): paper.attrib["retracted"] = " " # Adjust the title for retracted papers - if ( - "retracted" in paper.attrib - and "xml_title" in paper.attrib - and paper.attrib["xml_title"].text is not None - ): - paper.attrib["xml_title"].text = ( - "[RETRACTED] " + paper.attrib["xml_title"].text - ) + if "retracted" in paper.attrib and "xml_title" in paper.attrib: + paper.add_prefix_to_title("[RETRACTED] ") if "removed" in paper.attrib and paper.attrib["removed"] is None: paper.attrib["removed"] = " " @@ -307,6 +301,13 @@ def get(self, name, default=None): except KeyError: return default + def add_prefix_to_title(self, prefix): + """Add a prefix to the paper's title. + self.attrib["xml_title"] is an lxml Element; a title whose text is None is treated as empty.""" + if self.attrib["xml_title"].text is None: + self.attrib["xml_title"].text = "" + self.attrib["xml_title"].text = prefix + self.attrib["xml_title"].text + def get_title(self, form="xml"): """Returns the paper title, optionally formatting it. 
diff --git a/bin/requirements.txt b/bin/requirements.txt index 05ba7edb6d..6760374aff 100644 --- a/bin/requirements.txt +++ b/bin/requirements.txt @@ -1,3 +1,4 @@ +filelock==3.15.1 black~=23.9.0 citeproc-py citeproc-py-styles diff --git a/data/xml/2020.aacl.xml b/data/xml/2020.aacl.xml index 5704c4b343..b74ca8f616 100644 --- a/data/xml/2020.aacl.xml +++ b/data/xml/2020.aacl.xml @@ -462,7 +462,6 @@ nadeem-etal-2020-systematic moinnadeem/characterizing-sampling-algorithms WikiText-103 - WikiText-2 <fixed-case>C</fixed-case>hinese Content Scoring: Open-Access Datasets and Features on Different Segmentation Levels diff --git a/data/xml/2020.acl.xml b/data/xml/2020.acl.xml index 566a7524ab..d79c11ec9f 100644 --- a/data/xml/2020.acl.xml +++ b/data/xml/2020.acl.xml @@ -4006,7 +4006,6 @@ press-etal-2020-improving WikiText-103 - WikiText-2 Single Model Ensemble using Pseudo-Tags and Distinct Vectors @@ -12697,7 +12696,6 @@ bhatt-etal-2020-much bhattg/Decay-RNN-ACL-SRW2020 WikiText-103 - WikiText-2 Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining diff --git a/data/xml/2020.coling.xml b/data/xml/2020.coling.xml index ad11e8413d..8fe9e02327 100644 --- a/data/xml/2020.coling.xml +++ b/data/xml/2020.coling.xml @@ -4761,7 +4761,6 @@ GLUE SQuAD WikiText-103 - WikiText-2 How <fixed-case>LSTM</fixed-case> Encodes Syntax: Exploring Context Vectors and Semi-Quantization on Natural Text @@ -6271,7 +6270,7 @@ <fixed-case>S</fixed-case>a<fixed-case>SAKE</fixed-case>: Syntax and Semantics Aware Keyphrase Extraction from Research Papers - SantoshTokala + SantoshT.y.s.s DebarshiKumar Sanyal Plaban KumarBhowmick Partha PratimDas diff --git a/data/xml/2020.conll.xml b/data/xml/2020.conll.xml index 4c8838e8f5..5c952a7be2 100644 --- a/data/xml/2020.conll.xml +++ b/data/xml/2020.conll.xml @@ -643,7 +643,6 @@ 10.18653/v1/2020.conll-1.49 eisape-etal-2020-cloze WikiText-103 - WikiText-2 Disentangling dialects: a neural approach to 
<fixed-case>I</fixed-case>ndo-<fixed-case>A</fixed-case>ryan historical phonology and subgrouping diff --git a/data/xml/2020.emnlp.xml b/data/xml/2020.emnlp.xml index d109f8c3a3..770c9e4bbc 100644 --- a/data/xml/2020.emnlp.xml +++ b/data/xml/2020.emnlp.xml @@ -409,10 +409,12 @@ DavidSchlangen 357–374 While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We test five models on various NLU datasets and compare their performance using three incremental evaluation metrics. The results support the possibility of using bidirectional encoders in incremental mode while retaining most of their non-incremental quality. The “omni-directional” BERT model, which achieves better non-incremental performance, is impacted more by the incremental access. This can be alleviated by adapting the training regime (truncated training), or the testing procedure, by delaying the output until some right context is available or by incorporating hypothetical right contexts generated by a language model like GPT-2. 
- 2020.emnlp-main.26 + 2020.emnlp-main.26 10.18653/v1/2020.emnlp-main.26 @@ -6347,7 +6348,6 @@ shen-etal-2020-blank Varal7/blank_language_model WikiText-103 - WikiText-2 <fixed-case>COD3S</fixed-case>: Diverse Generation with Discrete Semantic Signatures @@ -7533,7 +7533,7 @@ 10.18653/v1/2020.emnlp-main.498 The importance of fillers for text representations of speech transcripts @@ -10682,7 +10681,6 @@ 10.18653/v1/2020.emnlp-main.703 @@ -11319,7 +11317,6 @@ 10.18653/v1/2020.emnlp-main.743 An information theoretic view on selecting linguistic probes diff --git a/data/xml/2020.eval4nlp.xml b/data/xml/2020.eval4nlp.xml index a84ee3843d..cd7fd69e87 100644 --- a/data/xml/2020.eval4nlp.xml +++ b/data/xml/2020.eval4nlp.xml @@ -193,7 +193,6 @@ dudy-bedrick-2020-words shiranD/word_level_evaluation WikiText-103 - WikiText-2 On Aligning <fixed-case>O</fixed-case>pen<fixed-case>IE</fixed-case> Extractions with Knowledge Bases: A Case Study diff --git a/data/xml/2020.findings.xml b/data/xml/2020.findings.xml index 94137820f9..13c9df719a 100644 --- a/data/xml/2020.findings.xml +++ b/data/xml/2020.findings.xml @@ -115,7 +115,6 @@ huang-etal-2020-reducing SST WikiText-103 - WikiText-2 Improving Text Understanding via Deep Syntax-Semantics Communication @@ -3740,7 +3739,6 @@ 10.18653/v1/2020.findings-emnlp.250 lioutas-etal-2020-improving WikiText-103 - WikiText-2 <fixed-case>P</fixed-case>harm<fixed-case>MT</fixed-case>: A Neural Machine Translation Approach to Simplify Prescription Directions @@ -6440,7 +6438,6 @@ NCBI Disease SST WikiText-103 - WikiText-2 <fixed-case>E</fixed-case>xploiting <fixed-case>U</fixed-case>nsupervised <fixed-case>D</fixed-case>ata for <fixed-case>E</fixed-case>motion <fixed-case>R</fixed-case>ecognition in <fixed-case>C</fixed-case>onversations @@ -6471,7 +6468,6 @@ IMDb Movie Reviews SST WikiText-103 - WikiText-2 Speaker or Listener? 
The Role of a Dialog Agent diff --git a/data/xml/2020.lrec.xml b/data/xml/2020.lrec.xml index 10b9d60a5c..3faa42f108 100644 --- a/data/xml/2020.lrec.xml +++ b/data/xml/2020.lrec.xml @@ -5590,7 +5590,7 @@ <fixed-case>NMT</fixed-case> and <fixed-case>PBSMT</fixed-case> Error Analyses in <fixed-case>E</fixed-case>nglish to <fixed-case>B</fixed-case>razilian <fixed-case>P</fixed-case>ortuguese Automatic Translations HelenaCaseli - MarcioInácio + MarcioLima Inácio 3623–3629 Machine Translation (MT) is one of the most important natural language processing applications. Independently of the applied MT approach, a MT system automatically generates an equivalent version (in some target language) of an input sentence (in some source language). Recently, a new MT approach has been proposed: neural machine translation (NMT). NMT systems have already outperformed traditional phrase-based statistical machine translation (PBSMT) systems for some pairs of languages. However, any MT approach outputs errors. In this work we present a comparative study of MT errors generated by a NMT system and a PBSMT system trained on the same English – Brazilian Portuguese parallel corpus. This is the first study of this kind involving NMT for Brazilian Portuguese. Furthermore, the analyses and conclusions presented here point out the specific problems of NMT outputs in relation to PBSMT ones and also give lots of insights into how to implement automatic post-editing for a NMT system. Finally, the corpora annotated with MT errors generated by both PBSMT and NMT systems are also available. 
2020.lrec-1.446 diff --git a/data/xml/2020.msr.xml b/data/xml/2020.msr.xml index 219ff20831..10f00762a8 100644 --- a/data/xml/2020.msr.xml +++ b/data/xml/2020.msr.xml @@ -33,7 +33,6 @@ mille-etal-2020-third talnupf/ud2deep WikiText-103 - WikiText-2 <fixed-case>BME</fixed-case>-<fixed-case>TUW</fixed-case> at <fixed-case>SR</fixed-case>’20: Lexical grammar induction for surface realization diff --git a/data/xml/2020.scil.xml b/data/xml/2020.scil.xml index 810e825d92..3816227342 100644 --- a/data/xml/2020.scil.xml +++ b/data/xml/2020.scil.xml @@ -356,7 +356,6 @@ hu-etal-2020-closer jennhu/reflexive-anaphor-licensing WikiText-103 - WikiText-2 <fixed-case>M</fixed-case>ona<fixed-case>L</fixed-case>og: a Lightweight System for Natural Language Inference Based on Monotonicity diff --git a/data/xml/2020.tacl.xml b/data/xml/2020.tacl.xml index cdd99ee3da..17daba8d35 100644 --- a/data/xml/2020.tacl.xml +++ b/data/xml/2020.tacl.xml @@ -364,7 +364,6 @@ GLUE WebText WikiText-103 - WikiText-2 Reproducible and Efficient Benchmarks for Hyperparameter Optimization of Neural Machine Translation Systems diff --git a/data/xml/2021.acl.xml b/data/xml/2021.acl.xml index d90bc03c95..91b0d01c81 100644 --- a/data/xml/2021.acl.xml +++ b/data/xml/2021.acl.xml @@ -724,7 +724,6 @@ LZhengisme/CODA WMT 2014 WikiText-103 - WikiText-2 Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor @@ -1123,7 +1122,6 @@ Integrated Directional Gradients: Feature Interaction Attribution for Neural <fixed-case>NLP</fixed-case> Models @@ -1404,7 +1402,6 @@ SST SST-2 WikiText-103 - WikiText-2 Explainable Prediction of Text Complexity: The Missing Preliminaries for Text Simplification @@ -3725,7 +3722,6 @@ DuReader IMDb Movie Reviews WikiText-103 - WikiText-2 Marginal Utility Diminishes: Exploring the Minimum Knowledge for <fixed-case>BERT</fixed-case> Knowledge Distillation @@ -4778,7 +4774,6 @@ <fixed-case>T</fixed-case>ext<fixed-case>SETTR</fixed-case>: Few-Shot Text 
Style Extraction and Tunable Targeted Restyling @@ -6582,7 +6577,6 @@ Lower Perplexity is Not Always Human-Like @@ -6953,7 +6947,6 @@ ofirpress/shortformer BookCorpus WikiText-103 - WikiText-2 <fixed-case>B</fixed-case>andit<fixed-case>MTL</fixed-case>: Bandit-based Multi-task Learning for Text Classification @@ -10657,7 +10650,6 @@ 10.18653/v1/2021.acl-short.90 he-etal-2021-towards @@ -11003,7 +10995,6 @@ 10.18653/v1/2021.acl-short.112 zhou-etal-2021-generation Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images @@ -12263,7 +12254,7 @@ <fixed-case>CRSL</fixed-case>ab: An Open-Source Toolkit for Building Conversational Recommender System KunZhou - XiaoleiWang + XiaoleiWang YuanhangZhou ChenzhanShang YuanCheng diff --git a/data/xml/2021.adaptnlp.xml b/data/xml/2021.adaptnlp.xml index 91c1580f9c..027c1d0957 100644 --- a/data/xml/2021.adaptnlp.xml +++ b/data/xml/2021.adaptnlp.xml @@ -203,7 +203,6 @@ 2021.adaptnlp-1.15 buck-vlachos-2021-trajectory WikiText-103 - WikiText-2 Dependency Parsing Evaluation for Low-resource Spontaneous Speech @@ -223,7 +222,7 @@ Compound probabilistic context-free grammars (C-PCFGs) have recently established a new state of the art for phrase-structure grammar induction. However, due to the high time-complexity of chart-based representation and inference, it is difficult to investigate them comprehensively. In this work, we rely on a fast implementation of C-PCFGs to conduct evaluation complementary to that of (CITATION). We highlight three key findings: (1) C-PCFGs are data-efficient, (2) C-PCFGs make the best use of global sentence-level information in preterminal rule probabilities, and (3) the best configurations of C-PCFGs on English do not always generalize to morphology-rich languages. 
2021.adaptnlp-1.17 zhao-titov-2021-empirical - zhaoyanpeng/cpcfg + zhaoyanpeng/xcfg English Web Treebank diff --git a/data/xml/2021.cmcl.xml b/data/xml/2021.cmcl.xml index 29dc25c61d..4c637ba2a0 100644 --- a/data/xml/2021.cmcl.xml +++ b/data/xml/2021.cmcl.xml @@ -203,7 +203,6 @@ 10.18653/v1/2021.cmcl-1.16 vickers-etal-2021-cognlp WikiText-103 - WikiText-2 Team <fixed-case>R</fixed-case>ead<fixed-case>M</fixed-case>e at <fixed-case>CMCL</fixed-case> 2021 Shared Task: Predicting Human Reading Patterns by Traditional Oculomotor Control Models and Machine Learning diff --git a/data/xml/2021.emnlp.xml b/data/xml/2021.emnlp.xml index a0c055baba..a1961628bf 100644 --- a/data/xml/2021.emnlp.xml +++ b/data/xml/2021.emnlp.xml @@ -1164,7 +1164,6 @@ CoLA Natural Stories WikiText-103 - WikiText-2 Condenser: a Pre-training Architecture for Dense Retrieval @@ -2385,7 +2384,7 @@ wang-etal-2021-gender 10.18653/v1/2021.emnlp-main.151 @@ -3003,7 +3002,6 @@ hu-etal-2021-ranknas 10.18653/v1/2021.emnlp-main.191 WikiText-103 - WikiText-2 <fixed-case>FL</fixed-case>i<fixed-case>T</fixed-case>ext: A Faster and Lighter Semi-Supervised Text Classification with Convolution Networks @@ -3743,7 +3741,6 @@ QNLI SNLI WikiText-103 - WikiText-2 Adversarial Mixing Policy for Relaxing Locally Linear Constraints in Mixup @@ -7029,7 +7026,6 @@ Connecting Attributions and <fixed-case>QA</fixed-case> Model Behavior on Realistic Counterfactuals @@ -7242,7 +7238,6 @@ <fixed-case>ST</fixed-case>ra<fixed-case>TA</fixed-case>: Self-Training with Task Augmentation for Better Few-shot Learning @@ -9524,7 +9519,6 @@ asappresearch/sru Billion Word Benchmark WikiText-103 - WikiText-2 Universal-<fixed-case>KD</fixed-case>: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation @@ -11217,7 +11211,7 @@ hardalov-etal-2021-cross 10.18653/v1/2021.emnlp-main.710 Text <fixed-case>A</fixed-case>uto<fixed-case>A</fixed-case>ugment: Learning Compositional Augmentation Policy for Text Classification @@ 
-12620,7 +12614,6 @@ Mewsli-9 SQuAD Tatoeba - TyDiQA TyDiQA-GoldP XCOPA XNLI @@ -13031,7 +13024,6 @@ Block Pruning For Faster Transformers @@ -13072,7 +13064,6 @@ How to Train <fixed-case>BERT</fixed-case> with an Academic Budget diff --git a/data/xml/2021.findings.xml b/data/xml/2021.findings.xml index 38768c60ac..a1238639e8 100644 --- a/data/xml/2021.findings.xml +++ b/data/xml/2021.findings.xml @@ -1168,7 +1168,6 @@ CONAN NEWSROOM WikiText-103 - WikiText-2 <fixed-case>SOLID</fixed-case>: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification @@ -3144,7 +3143,7 @@ 2021.findings-acl.217 10.18653/v1/2021.findings-acl.217 kruengkrai-etal-2021-multi - nii-yamagishilab/mla + nii-yamagishilab/mla FEVER @@ -4916,7 +4915,6 @@ QUASAR-T SearchQA WikiText-103 - WikiText-2 Minimally-Supervised Morphological Segmentation using <fixed-case>A</fixed-case>daptor <fixed-case>G</fixed-case>rammars with Linguistic Priors @@ -5620,7 +5618,6 @@ garimella-etal-2021-intelligent Task-adaptive Pre-training of Language Models with Word Embedding Regularization @@ -11686,7 +11683,6 @@ machelreid/subformer CNN/Daily Mail WikiText-103 - WikiText-2 Leveraging Information Bottleneck for Scientific Document Summarization @@ -11718,7 +11714,6 @@ SST SST-2 WikiText-103 - WikiText-2 Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots diff --git a/data/xml/2021.konvens.xml b/data/xml/2021.konvens.xml index a5d88a6479..68876689cb 100644 --- a/data/xml/2021.konvens.xml +++ b/data/xml/2021.konvens.xml @@ -44,7 +44,6 @@ SST SST-2 WikiText-103 - WikiText-2 <fixed-case>A</fixed-case>rgue<fixed-case>BERT</fixed-case>: How To Improve <fixed-case>BERT</fixed-case> Embeddings for Measuring the Similarity of Arguments diff --git a/data/xml/2021.mtsummit.xml b/data/xml/2021.mtsummit.xml index e27d4dc876..43f8a714a7 100644 --- a/data/xml/2021.mtsummit.xml +++ b/data/xml/2021.mtsummit.xml @@ -474,7 +474,6 @@ Our models outperform massively multilingual 
models such as Google (+8 2021.mtsummit-at4ssl.7 A cascaded Sign Language Translation system first maps sign videos to gloss annotations and then translates glosses into a spoken languages. This work focuses on the second-stage gloss translation component, which is challenging due to the scarcity of publicly available parallel data. We approach gloss translation as a low-resource machine translation task and investigate two popular methods for improving translation quality: hyperparameter search and backtranslation. We discuss the potentials and pitfalls of these methods based on experiments on the RWTH-PHOENIX-Weather 2014T dataset. zhang-duh-2021-approaching - PHOENIX14T RWTH-PHOENIX-Weather 2014 T @@ -514,7 +513,6 @@ Our models outperform massively multilingual models such as Google (+8 2021.mtsummit-at4ssl.10 One of the major challenges in sign language translation from a sign language to a spoken language is the lack of parallel corpora. Recent works have achieved promising results on the RWTH-PHOENIX-Weather 2014T dataset, which consists of over eight thousand parallel sentences between German sign language and German. However, from the perspective of neural machine translation, this is still a tiny dataset. To improve the performance of models trained on small datasets, transfer learning can be used. While this has been previously applied in sign language translation for feature extraction, to the best of our knowledge, pretrained language models have not yet been investigated. We use pretrained BERT-base and mBART-50 models to initialize our sign language video to spoken language text translation model. To mitigate overfitting, we apply the frozen pretrained transformer technique: we freeze the majority of parameters during training. Using a pretrained BERT model, we outperform a baseline trained from scratch by 1 to 2 BLEU-4. 
Our results show that pretrained language models can be used to improve sign language translation performance and that the self-attention patterns in BERT transfer in zero-shot to the encoder and decoder of sign language translation models. de-coster-etal-2021-frozen - PHOENIX14T RWTH-PHOENIX-Weather 2014 T diff --git a/data/xml/2021.naacl.xml b/data/xml/2021.naacl.xml index 711cc7fb6e..631062899b 100644 --- a/data/xml/2021.naacl.xml +++ b/data/xml/2021.naacl.xml @@ -1232,6 +1232,7 @@ yang-etal-2021-mtag @@ -2586,7 +2587,6 @@ SST-2 SST-5 WikiText-103 - WikiText-2 <fixed-case>DA</fixed-case>-Transformer: Distance-aware Transformer @@ -5517,6 +5517,7 @@ 10.18653/v1/2021.naacl-main.359 gosangi-etal-2021-use Data and Model Distillation as a Solution for Domain-transferable Fact Verification @@ -6108,7 +6109,6 @@ ding-koehn-2021-evaluating @@ -6236,7 +6236,6 @@ SimengSun/revisit-nplm LAMBADA WikiText-103 - WikiText-2 <fixed-case>R</fixed-case>ead<fixed-case>T</fixed-case>wice: Reading Very Large Documents with Memories diff --git a/data/xml/2021.ranlp.xml b/data/xml/2021.ranlp.xml index f8b59a27dd..a4bb656c24 100644 --- a/data/xml/2021.ranlp.xml +++ b/data/xml/2021.ranlp.xml @@ -798,7 +798,7 @@ Semantic-Based Opinion Summarization - MarcioInácio + MarcioLima Inácio ThiagoPardo 619–628 The amount of information available online can be overwhelming for users to digest, specially when dealing with other users’ comments when making a decision about buying a product or service. In this context, opinion summarization systems are of great value, extracting important information from the texts and presenting them to the user in a more understandable manner. It is also known that the usage of semantic representations can benefit the quality of the generated summaries. This paper aims at developing opinion summarization methods based on Abstract Meaning Representation of texts in the Brazilian Portuguese language. 
Four different methods have been investigated, alongside some literature approaches. The results show that a Machine Learning-based method produced summaries of higher quality, outperforming other literature techniques on manually constructed semantic graphs. We also show that using parsed graphs over manually annotated ones harmed the output. Finally, an analysis of how important different types of information are for the summarization process suggests that using Sentiment Analysis features did not improve summary quality. diff --git a/data/xml/2021.spnlp.xml b/data/xml/2021.spnlp.xml index 85c798a91a..4605fac2bf 100644 --- a/data/xml/2021.spnlp.xml +++ b/data/xml/2021.spnlp.xml @@ -81,7 +81,6 @@ Using Hierarchical Class Structure to Improve Fine-Grained Claim Classification diff --git a/data/xml/2021.sustainlp.xml b/data/xml/2021.sustainlp.xml index 67d8ea1ba1..a0c337486a 100644 --- a/data/xml/2021.sustainlp.xml +++ b/data/xml/2021.sustainlp.xml @@ -97,7 +97,6 @@ ROPES SQuAD WikiText-103 - WikiText-2 <fixed-case>B</fixed-case>io<fixed-case>C</fixed-case>opy: A Plug-And-Play Span Copy Mechanism in <fixed-case>S</fixed-case>eq2<fixed-case>S</fixed-case>eq Models @@ -250,7 +249,6 @@ GLUE QNLI WikiText-103 - WikiText-2 Efficient Domain Adaptation of Language Models via Adaptive Tokenization diff --git a/data/xml/2021.textgraphs.xml b/data/xml/2021.textgraphs.xml index 8aa46fd4de..d073ad2bcb 100644 --- a/data/xml/2021.textgraphs.xml +++ b/data/xml/2021.textgraphs.xml @@ -117,7 +117,6 @@ DBpedia GenWiki WikiText-103 - WikiText-2 Selective Attention Based Graph Convolutional Networks for Aspect-Level Sentiment Classification diff --git a/data/xml/2022.acl.xml b/data/xml/2022.acl.xml index 8273af5a86..034ed48462 100644 --- a/data/xml/2022.acl.xml +++ b/data/xml/2022.acl.xml @@ -66,7 +66,6 @@ yu-etal-2022-rare 10.18653/v1/2022.acl-long.3 WikiText-103 - WikiText-2 <fixed-case>A</fixed-case>leph<fixed-case>BERT</fixed-case>: Language Model Pre-training and Evaluation from 
Sub-Word to Sentence Level @@ -414,7 +413,6 @@ LAMBADA SuperGLUE WikiText-103 - WikiText-2 <fixed-case>Q</fixed-case>uote<fixed-case>R</fixed-case>: A Benchmark of Quote Recommendation for Writing @@ -1565,7 +1563,6 @@ 10.18653/v1/2022.acl-long.96 richardbaihe/robustlm WikiText-103 - WikiText-2 Tackling Fake News Detection by Continually Improving Social Context Representations using Graph Neural Networks @@ -3146,7 +3143,6 @@ PIQA RiddleSense WikiText-103 - WikiText-2 A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models @@ -6070,7 +6066,6 @@ in the Case of Unambiguous Gender deep-spin/infinite-former PG-19 WikiText-103 - WikiText-2 Systematic Inequalities in Language Technology Performance across the World’s Languages @@ -8352,7 +8347,6 @@ in the Case of Unambiguous Gender RealNews WMT 2014 WikiText-103 - WikiText-2 The Dangers of Underclaiming: Reasons for Caution When Reporting How <fixed-case>NLP</fixed-case> Systems Fail @@ -8985,6 +8979,7 @@ in the Case of Unambiguous Gender lu-etal-2022-fantastically 10.18653/v1/2022.acl-long.556 Simple and Effective Knowledge-Driven Query Expansion for <fixed-case>QA</fixed-case>-Based Product Attribute Extraction @@ -11494,7 +11488,6 @@ in the Case of Unambiguous Gender angelova-etal-2022-using 10.18653/v1/2022.acl-srw.21 dfki-signlanguage/gloss-to-text-sign-language-translation - PHOENIX14T Flexible Visual Grounding diff --git a/data/xml/2022.coling.xml b/data/xml/2022.coling.xml index c6ad791da2..42a975572c 100644 --- a/data/xml/2022.coling.xml +++ b/data/xml/2022.coling.xml @@ -3344,6 +3344,7 @@ Social media spreads both real news and fake news in various domains including politics, health, entertainment, etc. It is crucial to automatically detect fake news, especially for news of influential domains like politics and health because they may lead to serious social impact, e.g., panic in the COVID-19 pandemic. 
Some studies indicate the correlation between domains and perform multi-domain fake news detection. However, these multi-domain methods suffer from a seesaw problem that the performance of some domains is often improved by hurting the performance of other domains, which could lead to an unsatisfying performance in the specific target domains. To address this issue, we propose a Domain- and Instance-level Transfer Framework for Fake News Detection (DITFEND), which could improve the performance of specific target domains. To transfer coarse-grained domain-level knowledge, we train a general model with data of all domains from the meta-learning perspective. To transfer fine-grained instance-level knowledge and adapt the general model to a target domain, a language model is trained on the target domain to evaluate the transferability of each data instance in source domains and re-weight the instance’s contribution. Experiments on two real-world datasets demonstrate the effectiveness of DITFEND. According to both offline and online experiments, the DITFEND shows superior effectiveness for fake news detection. 2022.coling-1.250 nan-etal-2022-improving + ICTMCG/DITFEND Student Surpasses Teacher: Imitation Attack for Black-Box <fixed-case>NLP</fixed-case> <fixed-case>API</fixed-case>s @@ -4112,8 +4113,11 @@ MarcoAvagnano 3465–3479 Driven by deep learning breakthroughs, natural language generation (NLG) models have been at the center of steady progress in the last few years, with a ubiquitous task influence. However, since our ability to generate human-indistinguishable artificial text lags behind our capacity to assess it, it is paramount to develop and apply even better automatic evaluation metrics. To facilitate researchers to judge the effectiveness of their models broadly, we introduce NLG-Metricverse—an end-to-end open-source library for NLG evaluation based on Python. 
Our framework provides a living collection of NLG metrics in a unified and easy-to-use environment, supplying tools to efficiently apply, analyze, compare, and visualize them. This includes (i) the extensive support to heterogeneous automatic metrics with n-arity management, (ii) the meta-evaluation upon individual performance, metric-metric and metric-human correlations, (iii) graphical interpretations for helping humans better gain score intuitions, (iv) formal categorization and convenient documentation to accelerate metrics understanding. NLG-Metricverse aims to increase the comparability and replicability of NLG research, hopefully stimulating new contributions in the area. - 2022.coling-1.306 + 2022.coling-1.306 frisoni-etal-2022-nlg + + Retracted by the COLING 2022 PC chairs. + Retracted by the COLING 2022 PC chairs. <fixed-case>T</fixed-case>est<fixed-case>A</fixed-case>ug: A Framework for Augmenting Capability-based <fixed-case>NLP</fixed-case> Tests @@ -4153,7 +4157,6 @@ 2022.coling-1.309 abonizio-etal-2022-monobyte lersouza/lang-agnostic - TyDiQA XNLI mC4 @@ -5488,6 +5491,7 @@ Conditional computation algorithms, such as the early exiting (EE) algorithm, can be applied to accelerate the inference of pretrained language models (PLMs) while maintaining competitive performance on resource-constrained devices. However, this approach is only applied to the vertical architecture to decide which layers should be used for inference. Conversely, the operation of the horizontal perspective is ignored, and the determination of which tokens in each layer should participate in the computation fails, leading to a high redundancy for adaptive inference. To address this limitation, a unified horizontal and vertical multi-perspective early exiting (MPEE) framework is proposed in this study to accelerate the inference of transformer-based models. 
Specifically, the vertical architecture uses recycling EE classifier memory and weighted self-distillation to enhance the performance of the EE classifiers. Then, the horizontal perspective uses recycling class attention memory to emphasize the informative tokens. Conversely, the tokens with less information are truncated by weighted fusion and isolated from the following computation. Based on this, both horizontal and vertical EE are unified to obtain a better tradeoff between performance and efficiency. Extensive experimental results show that MPEE can achieve higher acceleration inference with competent performance than existing competitive methods. 2022.coling-1.414 kong-etal-2022-accelerating + junkong5/mpee GLUE MRPC MultiNLI @@ -6724,7 +6728,7 @@ Research on Automatic Story Generation (ASG) relies heavily on human and automatic evaluation. However, there is no consensus on which human evaluation criteria to use, and no analysis of how well automatic criteria correlate with them. In this paper, we propose to re-evaluate ASG evaluation. We introduce a set of 6 orthogonal and comprehensive human criteria, carefully motivated by the social sciences literature. We also present HANNA, an annotated dataset of 1,056 stories produced by 10 different ASG systems. HANNA allows us to quantitatively evaluate the correlations of 72 automatic metrics with human criteria. Our analysis highlights the weaknesses of current metrics for ASG and allows us to formulate practical recommendations for ASG evaluation. 2022.coling-1.509 chhun-etal-2022-human - dig-team/hanna-benchmark-asg + lashoun/hanna-benchmark-asg HANNA WritingPrompts @@ -6871,6 +6875,7 @@ Recent research on code summarization relies on the structural information from the abstract syntax tree (AST) of source codes. It is, however, questionable whether it is the most effective to use AST for expressing the structural information. 
We find that a program dependency graph (PDG) can represent the structure of a code more effectively. We propose PDG Boosting Module (PBM) that encodes PDG into graph embedding and the framework to implement the proposed PBM with the existing models. PBM achieves improvements of 6.67% (BLEU) and 7.47% (ROUGE) on average. We then analyze the experimental results, and examine how PBM helps the training of baseline models and its performance robustness. For the validation of robustness, we measure the performance of an out-of-domain benchmark dataset, and confirm its robustness. In addition, we apply a new evaluation measure, SBERT score, to evaluate the semantic performance. The models implemented with PBM improve the performance of SBERT score. This implies that they generate summaries that are semantically more similar to the reference summary. 2022.coling-1.521 son-etal-2022-boosting + sjk0825/coling2022 CodeSearchNet @@ -7324,7 +7329,6 @@ sun-etal-2022-summarize BookCorpus WikiText-103 - WikiText-2 WritingPrompts @@ -7500,7 +7504,6 @@ fadedcosine/pos-guided-neural-text-generation PARANMT-50M WikiText-103 - WikiText-2 Enhancing Pre-trained Models with Text Structure Knowledge for Question Generation diff --git a/data/xml/2022.creativesumm.xml b/data/xml/2022.creativesumm.xml index 0dd2542e4a..9b44072853 100644 --- a/data/xml/2022.creativesumm.xml +++ b/data/xml/2022.creativesumm.xml @@ -24,6 +24,7 @@ Summarizing Interactive Digital Narratives (IDN) presents some unique challenges to existing text summarization models especially around capturing interactive elements in addition to important plot points. In this paper, we describe the first IDN dataset (IDN-Sum) designed specifically for training and testing IDN text summarization algorithms. Our dataset is generated using random playthroughs of 8 IDN episodes, taken from 2 different IDN games, and consists of 10,000 documents. 
Playthrough documents are annotated through automatic alignment with fan-sourced summaries using a commonly used alignment algorithm. We also report and discuss results from experiments applying common baseline extractive text summarization algorithms to this dataset. Qualitative analysis of the results reveals shortcomings in common annotation approaches and evaluation methods when applied to narrative and interactive narrative datasets. The dataset is released as open source for future researchers to train and test their own approaches for IDN text. 2022.creativesumm-1.1 revi-etal-2022-idn + ashwathytr/idn-sum CNN/Daily Mail CRD3 diff --git a/data/xml/2022.emnlp.xml b/data/xml/2022.emnlp.xml index 611d516d91..2416893581 100644 --- a/data/xml/2022.emnlp.xml +++ b/data/xml/2022.emnlp.xml @@ -981,7 +981,7 @@ Deconfounding Legal Judgment Prediction for <fixed-case>E</fixed-case>uropean Court of Human Rights Cases Towards Better Alignment with Experts - T.y.s.sSantoshTechnical University of Munich + SantoshT.y.s.sTechnical University of Munich ShanshanXuTechnical University of Munich OanaIchimGraduate Institute of International and Development Studies MatthiasGrabmairTechnical University of Munich @@ -3307,6 +3307,7 @@ How to disagree well: Investigating the dispute tactics used on <fixed-case>W</fixed-case>ikipedia ChristineDe KockUniversity of Cambridge + TomStaffordUniversity of Cambridge AndreasVlachosUniversity of Cambridge 3824-3837 Disagreements are frequently studied from the perspective of either detecting toxicity or analysing argument structure. We propose a framework of dispute tactics which unifies these two perspectives, as well as other dialogue acts which play a role in resolving disputes, such as asking questions and providing clarification. This framework includes a preferential ordering among rebuttal-type tactics, ranging from ad hominem attacks to refuting the central argument. 
Using this framework, we annotate 213 disagreements (3,865 utterances) from Wikipedia Talk pages. This allows us to investigate research questions around the tactics used in disagreements; for instance, we provide empirical validation of the approach to disagreement recommended by Wikipedia. We develop models for multilabel prediction of dispute tactics in an utterance, achieving the best performance with a transformer-based label powerset model. Adding an auxiliary task to incorporate the ordering of rebuttal tactics further yields a statistically significant increase. Finally, we show that these annotations can be used to provide useful additional signals to improve performance on the task of predicting escalation. diff --git a/data/xml/2022.findings.xml b/data/xml/2022.findings.xml index 4a8281c64e..f5e7dd16e0 100644 --- a/data/xml/2022.findings.xml +++ b/data/xml/2022.findings.xml @@ -911,7 +911,6 @@ Question Answering Infused Pre-training of General-Purpose Contextualized Representations @@ -3584,7 +3583,6 @@ 10.18653/v1/2022.findings-acl.228 Controllable Natural Language Generation with Contrastive Prefixes @@ -4361,10 +4359,12 @@ AshutoshModi 3521-3536 Many populous countries including India are burdened with a considerable backlog of legal cases. Development of automated systems that could process legal documents and augment legal practitioners can mitigate this. However, there is a dearth of high-quality corpora that is needed to develop such data-driven systems. The problem gets even more pronounced in the case of low resource languages such as Hindi. In this resource paper, we introduce the Hindi Legal Documents Corpus (HLDC), a corpus of more than 900K legal documents in Hindi. Documents are cleaned and structured to enable the development of downstream applications. Further, as a use-case for the corpus, we introduce the task of bail prediction. We experiment with a battery of models and propose a Multi-Task Learning (MTL) based model for the same. 
MTL models use summarization as an auxiliary task along with bail prediction as the main task. Experiments with different models are indicative of the need for further research in this area. - 2022.findings-acl.278 + 2022.findings-acl.278 2022.findings-acl.278.software.zip kapoor-etal-2022-hldc 10.18653/v1/2022.findings-acl.278 + + This revision updates funding information in the Acknowledgements section of the paper. exploration-lab/hldc @@ -4661,7 +4661,6 @@ 2022.findings-acl.297 jin-etal-2022-prior 10.18653/v1/2022.findings-acl.297 - PHOENIX14T RWTH-PHOENIX-Weather 2014 T @@ -6592,7 +6591,6 @@ SST SST-2 WikiText-103 - WikiText-2 Towards Computationally Feasible Deep Active Learning @@ -7635,7 +7633,6 @@ 10.18653/v1/2022.findings-naacl.151 What kinds of errors do reference resolution models make and what can we learn from them? @@ -11177,6 +11174,11 @@ Faster and Smaller Speech Translation without Quality Compromise 2022.findings-emnlp.149.software.zip li-etal-2022-self 10.18653/v1/2022.findings-emnlp.149 + pldlgb/MetaSD + FB15k + FB15k-237 + WN18 + WN18RR <fixed-case>CQR</fixed-case>-<fixed-case>SQL</fixed-case>: Conversational Question Reformulation Enhanced Context-Dependent Text-to-<fixed-case>SQL</fixed-case> Parsers diff --git a/data/xml/2022.fl4nlp.xml b/data/xml/2022.fl4nlp.xml index 876b60eaff..e616ba926e 100644 --- a/data/xml/2022.fl4nlp.xml +++ b/data/xml/2022.fl4nlp.xml @@ -63,7 +63,6 @@ wu-etal-2022-adaptive 10.18653/v1/2022.fl4nlp-1.3 WikiText-103 - WikiText-2 Intrinsic Gradient Compression for Scalable and Efficient Federated Learning diff --git a/data/xml/2022.in2writing.xml b/data/xml/2022.in2writing.xml index 38d64c797f..544ef1c559 100644 --- a/data/xml/2022.in2writing.xml +++ b/data/xml/2022.in2writing.xml @@ -88,7 +88,6 @@ webis-de/in2writing22-language-models-as-context-sensitive-word-search-engines CLOTH WikiText-103 - WikiText-2 Plug-and-Play Controller for Story Completion: A Pilot Study toward Emotion-aware Story Writing Assistance diff 
--git a/data/xml/2022.iwslt.xml b/data/xml/2022.iwslt.xml index 136e4f4886..f3661e5e48 100644 --- a/data/xml/2022.iwslt.xml +++ b/data/xml/2022.iwslt.xml @@ -478,7 +478,7 @@ PatrickFernandes SiddharthDalmia JiatongShi - YifanPeng + YifanPeng DanBerrebbi XinyiWang GrahamNeubig diff --git a/data/xml/2022.lrec.xml b/data/xml/2022.lrec.xml index 940e63b8de..2122dd1ee5 100644 --- a/data/xml/2022.lrec.xml +++ b/data/xml/2022.lrec.xml @@ -4887,6 +4887,7 @@ This paper presents ClinIDMap, a tool for mapping identifiers between clinical ontologies and lexical resources. ClinIDMap interlinks identifiers from UMLS, SMOMED-CT, ICD-10 and the corresponding Wikipedia articles for concepts from the UMLS Metathesaurus. Our main goal is to provide semantic interoperability across the clinical concepts from various knowledge bases. As a side effect, the mapping enriches already annotated corpora in multiple languages with new labels. For instance, spans manually annotated with IDs from UMLS can be annotated with Semantic Types and Groups, and its corresponding SNOMED CT and ICD-10 IDs. We also experiment with sequence labelling models for detecting Diagnosis and Procedures concepts and for detecting UMLS Semantic Groups trained on Spanish, English, and bilingual corpora obtained with the new mapping procedure. The ClinIDMap tool is publicly available. 2022.lrec-1.390 zotova-etal-2022-clinidmap + vicomtech/clinidmap MedMentions @@ -4968,6 +4969,7 @@ In the field of Japanese medical information extraction, few analyzing tools are available and relation extraction is still an under-explored topic. In this paper, we first propose a novel relation annotation schema for investigating the medical and temporal relations between medical entities in Japanese medical reports. We experiment with the practical annotation scenarios by separately annotating two different types of reports. 
We design a pipeline system with three components for recognizing medical entities, classifying entity modalities, and extracting relations. The empirical results show accurate analyzing performance and suggest the satisfactory annotation quality, the superiority of the latest contextual embedding models, and the feasible annotation strategy for high-accuracy demand. 2022.lrec-1.397 cheng-etal-2022-jamie + racerandom/jamie Enhanced Entity Annotations for Multilingual Corpora @@ -5345,6 +5347,7 @@ Olfactory references play a crucial role in our memory and, more generally, in our experiences, since researchers have shown that smell is the sense that is most directly connected with emotions. Nevertheless, only few works in NLP have tried to capture this sensory dimension from a computational perspective. One of the main challenges is the lack of a systematic and consistent taxonomy of olfactory information, where concepts are organised also in a multi-lingual perspective. WordNet represents a valuable starting point in this direction, which can be semi-automatically extended taking advantage of Google n-grams and of existing language models. In this work we describe the process that has led to the semi-automatic development of a taxonomy for olfactory information in four languages (English, French, German and Italian), detailing the different steps and the intermediate evaluations. Along with being multi-lingual, the taxonomy also encloses temporal marks for olfactory terms thus making it a valuable resource for historical content analysis. The resource has been released and is freely available. 
2022.lrec-1.429 menini-etal-2022-building + odeuropa/multilingualtaxonomies Attention Understands Semantic Relations @@ -6138,7 +6141,6 @@ jmeadows17/physnlu PhysNLU WikiText-103 - WikiText-2 <fixed-case>HECTOR</fixed-case>: A Hybrid <fixed-case>TE</fixed-case>xt <fixed-case>S</fixed-case>implifi<fixed-case>C</fixed-case>ation <fixed-case>TO</fixed-case>ol for Raw Texts in <fixed-case>F</fixed-case>rench diff --git a/data/xml/2022.naacl.xml b/data/xml/2022.naacl.xml index 9215cabda3..629dd27a77 100644 --- a/data/xml/2022.naacl.xml +++ b/data/xml/2022.naacl.xml @@ -1341,7 +1341,6 @@ Database Search Results Disambiguation for Task-Oriented Dialog Systems @@ -1571,7 +1570,6 @@ ahuja-etal-2022-economics 10.18653/v1/2022.naacl-main.98 @@ -5250,7 +5248,6 @@ GLUE SQuAD WikiText-103 - WikiText-2 <fixed-case>FN</fixed-case>et: Mixing Tokens with <fixed-case>F</fixed-case>ourier Transforms @@ -6872,7 +6869,6 @@ Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with <fixed-case>VAE</fixed-case>s @@ -6965,7 +6961,6 @@ llyx97/TAMT GLUE WikiText-103 - WikiText-2 You Don’t Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers’ Private Personas diff --git a/data/xml/2022.osact.xml b/data/xml/2022.osact.xml index bc75243449..7a0e3e75f9 100644 --- a/data/xml/2022.osact.xml +++ b/data/xml/2022.osact.xml @@ -143,7 +143,6 @@ aftab-malik-2022-erock ARCD SQuAD - TyDiQA TyDiQA-GoldP diff --git a/data/xml/2022.repl4nlp.xml b/data/xml/2022.repl4nlp.xml index 725e9095f1..9b27c0131c 100644 --- a/data/xml/2022.repl4nlp.xml +++ b/data/xml/2022.repl4nlp.xml @@ -147,7 +147,6 @@ IMDb Movie Reviews MultiNLI WikiText-103 - WikiText-2 A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning diff --git a/data/xml/2022.sltat.xml b/data/xml/2022.sltat.xml index 897b6a41db..155e2c41de 100644 --- a/data/xml/2022.sltat.xml +++ b/data/xml/2022.sltat.xml @@ -180,7 +180,6 @@ Recent approaches to Sign Language 
Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a skeletal graph structure, with joints as nodes and both spatial and temporal connections as edges. To operate on this graphical structure, we propose Skeletal Graph Self-Attention (SGSA), a novel graphical attention layer that embeds a skeleton inductive bias into the SLP model. Retaining the skeletal feature representation throughout, we directly apply a spatio-temporal adjacency matrix into the self-attention formulation. This provides structure and context to each skeletal joint that is not possible when using a non-graphical abstract representation, enabling fluid and expressive sign language production. We evaluate our Skeletal Graph Self-Attention architecture on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset, achieving state-of-the-art back translation performance with an 8% and 7% improvement over competing methods for the dev and test sets. 
2022.sltat-1.15 saunders-etal-2022-skeletal - PHOENIX14T RWTH-PHOENIX-Weather 2014 T diff --git a/data/xml/2022.spnlp.xml b/data/xml/2022.spnlp.xml index 36b7040f67..d7ccd5e27b 100644 --- a/data/xml/2022.spnlp.xml +++ b/data/xml/2022.spnlp.xml @@ -112,7 +112,6 @@ treviso-etal-2022-predicting 10.18653/v1/2022.spnlp-1.7 WikiText-103 - WikiText-2 diff --git a/data/xml/2023.acl.xml b/data/xml/2023.acl.xml index a6f8a59ae4..333c094645 100644 --- a/data/xml/2023.acl.xml +++ b/data/xml/2023.acl.xml @@ -6061,7 +6061,7 @@ Answering Ambiguous Questions via Iterative Prompting - WeiweiSunShandong University + WeiweiSunShandong University HengyiCaiJD.com HongshenChenJD.com PengjieRenShandong University @@ -10357,7 +10357,7 @@ <fixed-case>RADE</fixed-case>: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue ZhengliangShiShandong University - WeiweiSunShandong University + WeiweiSunShandong University ShuoZhangBloomberg ZhenZhangShandong University PengjieRenSchool of Computer Science and Technology, Shandong University @@ -12556,7 +12556,7 @@ Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression WenWuUniversity of Cambridge - ChaoZhangTsinghua University + ChaoZhangTsinghua University PhilipWoodlandUniversity of Cambridge 15681-15695 In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception. Though deterministic labels generated by averaging or voting are often used as the ground truth, it ignores the intrinsic uncertainty revealed by the inconsistent labels. This paper proposes a Bayesian approach, deep evidential emotion regression (DEER), to estimate the uncertainty in emotion attributes. 
Treating the emotion attribute labels of an utterance as samples drawn from an unknown Gaussian distribution, DEER places an utterance-specific normal-inverse gamma prior over the Gaussian likelihood and predicts its hyper-parameters using a deep neural network model. It enables a joint estimation of emotion attributes along with the aleatoric and epistemic uncertainties. AER experiments on the widely used MSP-Podcast and IEMOCAP datasets showed DEER produced state-of-the-art results for both the mean values and the distribution of emotion attributes. @@ -14871,7 +14871,7 @@ <fixed-case>MOSPC</fixed-case>: <fixed-case>MOS</fixed-case> Prediction Based on Pairwise Comparison - KexinWangBytedance + KexinWangBytedance YunlongZhaoInstitute of Automation, Chinese Academy of Sciences QianqianDongByteDance AI Lab TomKoByteDance AI Lab @@ -15923,7 +15923,7 @@ JiatongShiCarnegie Mellon University YunTangFacebook HirofumiInagumaMeta AI - YifanPengCarnegie Mellon University + YifanPengCarnegie Mellon University SiddharthDalmiaGoogle PeterPolákCharles University, MFF UFAL PatrickFernandesCarnegie Mellon University, Instituto de Telecomunicações diff --git a/data/xml/2023.calcs.xml b/data/xml/2023.calcs.xml index 9b8c3cbc69..e3ad0a6e47 100644 --- a/data/xml/2023.calcs.xml +++ b/data/xml/2023.calcs.xml @@ -15,12 +15,11 @@
Singapore
December 2023 - 2023.calcs-1 + 2023.calcs-1 calcs - ws - 2023.calcs-1.0 + 2023.calcs-1.0 calcs-2023-approaches @@ -29,9 +28,8 @@ SimoneTeufel 1-13 This paper contributes to German-English code-switching research. We provide the largest corpus of naturally occurring German-English code-switching, where English is included in German text, and two methods for code-switching identification. The first method is rule-based, using wordlists and morphological processing. We use this method to compile a corpus of 25.6M tweets employing German-English code-switching. In our second method, we continue pretraining of a neural language model on this corpus and classify tokens based on embeddings from this language model. Our systems establish SoTA on our new corpus and an existing German-English code-switching benchmark. In particular, we systematically study code-switching for language-ambiguous words which can only be resolved in context, and morphologically mixed words consisting of both English and German morphemes. We distribute both corpora and systems to the research community. - 2023.calcs-1.1 + 2023.calcs-1.1 sterner-teufel-2023-tongueswitcher - 10.18653/v1/2023.calcs-1.1 Towards Real-World Streaming Speech Translation for Code-Switched Speech @@ -43,9+41,8 @@ AashishAgarwalUniversität Duisburg-Essen 14-22 Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation to one of the languages present in the source (monolingual transcription). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation to a third language (i.e., a language not included in the source). 
To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and we establish baseline results for the two settings mentioned earlier. - 2023.calcs-1.2 + 2023.calcs-1.2 alastruey-etal-2023-towards - 10.18653/v1/2023.calcs-1.2 Language Preference for Expression of Sentiment for <fixed-case>N</fixed-case>epali-<fixed-case>E</fixed-case>nglish Bilingual Speakers on Social Media @@ -53,9 +50,8 @@ KazutakaShimada 23-32 Nepali-English code-switching (CS) has been a growing phenomenon in Nepalese society, especially in social media. The code-switching text can be leveraged to understand the socio-linguistic behaviours of the multilingual speakers. Existing studies have attempted to identify the language preference of the multilingual speakers for expressing different emotions using text in different language pairs. In this work, we aim to study the language preference of multilingual Nepali-English CS speakers while expressing sentiment in social media. We create a novel dataset for sentiment analysis using the public Nepali-English code-switched comments in YouTube. After performing the statistical study on the dataset, we find that the proportion of use of Nepali language is higher in negative comments when compared with positive comments, hence concluding the preference for using native language while expressing negative sentiment. Machine learning and transformer-based models are used as the baseline models for the dataset for sentiment classification. The dataset is released publicly. 
- 2023.calcs-1.3 + 2023.calcs-1.3 pahari-shimada-2023-language - 10.18653/v1/2023.calcs-1.3 Text-Derived Language Identity Incorporation for End-to-End Code-Switching Speech Recognition @@ -63,9 +59,8 @@ HaizhouLi 33-42 Recognizing code-switching (CS) speech often presents challenges for an automatic speech recognition system (ASR) due to limited linguistic context in short monolingual segments, resulting in language confusion. To mitigate this issue, language identity (LID) is often integrated into the speech recognition system to provide additional linguistic context. However, previous works predominately focus on extracting language identity from speech signals. We introduce a novel approach to learn language identity from pure text data via a dedicated language identity-language model. Besides, we explore two strategies: LID state fusion and language posterior biasing, to integrate the text-derived language identities into the end-to-end ASR system. By incorporating hypothesized language identities, our ASR system gains crucial contextual cues, effectively capturing language transitions and patterns within code-switched utterances. We conduct speech recognition experiments on the SEAME corpus and demonstrate the effectiveness of our proposed methods. Our results reveal significantly improved transcriptions in code-switching scenarios, underscoring the potential of text-derived LID in enhancing code-switching speech recognition. - 2023.calcs-1.4 + 2023.calcs-1.4 wang-li-2023-text - 10.18653/v1/2023.calcs-1.4 Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South <fixed-case>E</fixed-case>ast <fixed-case>A</fixed-case>sian Languages @@ -84,12 +79,11 @@ LongPhan RowenaGarcia ThamarSolorio - AlhamAji + Alham FikriAji 43-63 - While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research. 
The recent proliferation of Large Language Models (LLMs) compels one to ask: how capable are these systems in generating code-mixed data? In this paper, we explore prompting multilingual LLMs in a zero-shot manner to generate code-mixed data for seven languages in South East Asia (SEA), namely Indonesian, Malay, Chinese, Tagalog, Vietnamese, Tamil, and Singlish. We find that publicly available multilingual instruction-tuned models such as BLOOMZ and Flan-T5-XXL are incapable of producing texts with phrases or clauses from different languages. ChatGPT exhibits inconsistent capabilities in generating code-mixed texts, wherein its performance varies depending on the prompt template and language pairing. For instance, ChatGPT generates fluent and natural Singlish texts (an English-based creole spoken in Singapore), but for English-Tamil language pair, the system mostly produces grammatically incorrect or semantically meaningless utterances. Furthermore, it may erroneously introduce languages not specified in the prompt. Based on our investigation, existing multilingual LLMs exhibit a wide range of proficiency in code-mixed data generation for SEA languages. As such, we advise against using LLMs in this context without extensive human checks. - 2023.calcs-1.5 + While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research. The recent proliferation of Large Language Models (LLMs) compels one to ask: how capable are these systems in generating code-mixed data? In this paper, we explore prompting multilingual LLMs in a zero-shot manner to generate code-mixed data for seven languages in South East Asia (SEA), namely Indonesian, Malay, Chinese, Tagalog, Vietnamese, Tamil, and Singlish. We find that publicly available multilingual instruction-tuned models such as BLOOMZ and Flan-T5-XXL are incapable of producing texts with phrases or clauses from different languages. ChatGPT exhibits inconsistent capabilities in generating code-mixed texts, wherein its performance varies depending on the prompt template and language pairing. For instance, ChatGPT generates fluent and natural Singlish texts (an English-based creole spoken in Singapore), but for English-Tamil language pair, the system mostly produces grammatically incorrect or semantically meaningless utterances. Furthermore, it may erroneously introduce languages not specified in the prompt. Based on our investigation, existing multilingual LLMs exhibit a wide range of proficiency in code-mixed data generation for SEA languages. As such, we advise against using LLMs in this context without extensive human checks. 
+ 2023.calcs-1.5 yong-etal-2023-prompting - 10.18653/v1/2023.calcs-1.5 <fixed-case>CONFLATOR</fixed-case>: Incorporating Switching Point based Rotatory Positional Encodings for Code-Mixed Language Modeling @@ -102,12 +96,9 @@ AmanChadha AmitavaDas 64-73 - The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been effective on many NLP tasks. However, NLM for CM is an under-explored area. Though transformers are capable and powerful, they cannot always encode positional information since they are non-recurrent. Therefore, to enrich word information and incorporate positional information, positional encoding is defined. We hypothesize that Switching Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2 or L2 -> L1), pose a challenge for CM Language Models (LMs), and hence give special emphasis to SPs in the modeling process. We experiment with several positional encoding mechanisms and show that rotatory positional encodings along with switching point information yield the best results. - -We introduce CONFLATOR: a neural language modeling approach for code-mixed languages. CONFLATOR tries to learn to emphasize switching points using smarter positional encoding, both at unigram and bigram levels. CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English (Hinglish): (i) sentiment analysis and (ii) machine translation. - 2023.calcs-1.6 + The mixing of two or more languages is called Code-Mixing (CM). CM is a social norm in multilingual societies. Neural Language Models (NLMs) like transformers have been effective on many NLP tasks. However, NLM for CM is an under-explored area. Though transformers are capable and powerful, they cannot always encode positional information since they are non-recurrent. 
Therefore, to enrich word information and incorporate positional information, positional encoding is defined. We hypothesize that Switching Points (SPs), i.e., junctions in the text where the language switches (L1 -> L2 or L2 -> L1), pose a challenge for CM Language Models (LMs), and hence give special emphasis to SPs in the modeling process. We experiment with several positional encoding mechanisms and show that rotatory positional encodings along with switching point information yield the best results. We introduce CONFLATOR: a neural language modeling approach for code-mixed languages. CONFLATOR tries to learn to emphasize switching points using smarter positional encoding, both at unigram and bigram levels. CONFLATOR outperforms the state-of-the-art on two tasks based on code-mixed Hindi and English (Hinglish): (i) sentiment analysis and (ii) machine translation. + 2023.calcs-1.6 mohammed-etal-2023-conflator - 10.18653/v1/2023.calcs-1.6 Unified Model for Code-Switching Speech Recognition and Language Identification Based on Concatenated Tokenizer @@ -116,9+107,8 @@ We introduce CONFLATOR: a neural language modeling approach for code-mixed langu BorisGinsburg 74-82 Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation. This paper proposes (1) a new method for creating code-switching ASR datasets from purely monolingual data sources, and (2) a novel Concatenated Tokenizer that enables ASR models to generate language ID for each emitted text token while reusing existing monolingual tokenizers. The efficacy of these approaches for building CS ASR models is demonstrated for two language pairs, English-Hindi and English-Spanish, where we achieve new state-of-the-art results on the Miami Bangor CS evaluation corpus. 
In addition to competitive ASR performance, the proposed Concatenated Tokenizer models are highly effective for spoken language identification, achieving 98%+ accuracy on the out-of-distribution FLEURS dataset. - 2023.calcs-1.7 + 2023.calcs-1.7 dhawan-etal-2023-unified - 10.18653/v1/2023.calcs-1.7 Multilingual self-supervised speech representations improve the speech recognition of low-resource <fixed-case>A</fixed-case>frican languages with codeswitching @@ -127,9 +117,8 @@ We introduce CONFLATOR: a neural language modeling approach for code-mixed langu DanJurafsky 83-88 While many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, datasets of codeswitched speech are too small to train bespoke acoustic models from scratch or do language model rescoring. Here we propose finetuning self-supervised speech representations such as wav2vec 2.0 XLSR to recognize code-switched data. We find that finetuning self-supervised multilingual representations and augmenting them with n-gram language models trained from transcripts reduces absolute word error rates by up to 20% compared to baselines of hybrid models trained from scratch on code-switched data. Our findings suggest that in circumstances with limited training data finetuning self-supervised representations is a better performing and viable solution. - 2023.calcs-1.8 + 2023.calcs-1.8 ogunremi-etal-2023-multilingual - 10.18653/v1/2023.calcs-1.8 diff --git a/data/xml/2023.ccl.xml b/data/xml/2023.ccl.xml index 2b6b5a16f0..7ae56b94d6 100644 --- a/data/xml/2023.ccl.xml +++ b/data/xml/2023.ccl.xml @@ -1626,7 +1626,7 @@
Studying Language Processing in the Human Brain with Speech and Language Models - ZhangChao + ZhangChao ThwaitesAndrew WingfieldCai 17–23 diff --git a/data/xml/2023.cl.xml b/data/xml/2023.cl.xml index 198b403465..820954d873 100644 --- a/data/xml/2023.cl.xml +++ b/data/xml/2023.cl.xml @@ -303,4 +303,85 @@ tait-etal-2023-obituary + + + Computational Linguistics, Volume 49, Issue 4 - December 2023 + MIT Press +
Cambridge, MA
+ December + 2023 + cl + 49 + 4 + + + My Tenure as the Editor-in-Chief of Computational Linguistics + Hwee TouNg + 10.1162/coli_e_00505 + Times flies and it has been close to five and a half years since I became the editor-in-chief of Computational Linguistics on 15 July 2018. In this editorial, I will describe the changes that I have introduced at the journal, and highlight the achievements and challenges of the journal. + 773–775 + 2023.cl-4.1 + ng-2023-tenure + + + Measuring Attribution in Natural Language Generation Models + HannahRashkin + VitalyNikolaev + MatthewLamm + LoraAroyo + MichaelCollins + DipanjanDas + SlavPetrov + Gaurav SinghTomar + IuliaTurc + DavidReitter + 10.1162/coli_a_00486 + Large neural models have brought a new challenge to natural language generation (NLG): It has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world is to be verified against an independent, provided source. We define AIS and a two-stage annotation pipeline for allowing annotators to evaluate model output according to annotation guidelines. We successfully validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset). We provide full annotation guidelines in the appendices and publicly release the annotated data at https://github.com/google-research-datasets/AIS. + 777–840 + 2023.cl-4.2 + rashkin-etal-2023-measuring + + + Generation and Polynomial Parsing of Graph Languages with Non-Structural Reentrancies + JohannaBjörklund + FrankDrewes + AnnaJonsson + 10.1162/coli_a_00488 + Graph-based semantic representations are popular in natural language processing, where it is often convenient to model linguistic concepts as nodes and relations as edges between them. 
Several attempts have been made to find a generative device that is sufficiently powerful to describe languages of semantic graphs, while at the same time allowing efficient parsing. We contribute to this line of work by introducing graph extension grammar, a variant of the contextual hyperedge replacement grammars proposed by Hoffmann et al. Contextual hyperedge replacement can generate graphs with non-structural reentrancies, a type of node-sharing that is very common in formalisms such as abstract meaning representation, but that context-free types of graph grammars cannot model. To provide our formalism with a way to place reentrancies in a linguistically meaningful way, we endow rules with logical formulas in counting monadic second-order logic. We then present a parsing algorithm and show as our main result that this algorithm runs in polynomial time on graph languages generated by a subclass of our grammars, the so-called local graph extension grammars. + 841–882 + 2023.cl-4.3 + bjorklund-etal-2023-generation + + + Capturing Fine-Grained Regional Differences in Language Use through Voting Precinct Embeddings + AlexRosenfeld + LarsHinrichs + 10.1162/coli_a_00487 + Linguistic variation across a region of interest can be captured by partitioning the region into areas and using social media data to train embeddings that represent language use in those areas. Recent work has focused on larger areas, such as cities or counties, to ensure that enough social media data is available in each area, but larger areas have a limited ability to find fine-grained distinctions, such as intracity differences in language use. We demonstrate that it is possible to embed smaller areas, which can provide higher resolution analyses of language variation. We embed voting precincts, which are tiny, evenly sized political divisions for the administration of elections. 
The issue with modeling language use in small areas is that the data becomes incredibly sparse, with many areas having scant social media data. We propose a novel embedding approach that alternates training with smoothing, which mitigates these sparsity issues. We focus on linguistic variation across Texas as it is relatively understudied. We develop two novel quantitative evaluations that measure how well the embeddings can be used to capture linguistic variation. The first evaluation measures how well a model can map a dialect given terms specific to that dialect. The second evaluation measures how well a model can map preference of lexical variants. These evaluations show how embedding models could be used directly by sociolinguists and measure how much sociolinguistic information is contained within the embeddings. We complement this second evaluation with a methodology for using embeddings as a kind of genetic code where we identify “genes” that correspond to a sociological variable and connect those “genes” to a linguistic phenomenon thereby connecting sociological phenomena to linguistic ones. Finally, we explore approaches for inferring isoglosses using embeddings. + 883–942 + 2023.cl-4.4 + rosenfeld-hinrichs-2023-capturing + + + Languages Through the Looking Glass of <fixed-case>BPE</fixed-case> Compression + XimenaGutierrez-Vasques + ChristianBentz + TanjaSamardžić + 10.1162/coli_a_00489 + Byte-pair encoding (BPE) is widely used in NLP for performing subword tokenization. It uncovers redundant patterns for compressing the data, and hence alleviates the sparsity problem in downstream applications. Subwords discovered during the first merge operations tend to have the most substantial impact on the compression of texts. However, the structural underpinnings of this effect have not been analyzed cross-linguistically. 
We conduct in-depth analyses across 47 typologically diverse languages and three parallel corpora, and thereby show that the types of recurrent patterns that have the strongest impact on compression are an indicator of morphological typology. For languages with richer inflectional morphology there is a preference for highly productive subwords on the early merges, while for languages with less inflectional morphology, idiosyncratic subwords are more prominent. Both types of patterns contribute to efficient compression. Counter to the common perception that BPE subwords are not linguistically relevant, we find patterns across languages that resemble those described in traditional typology. We thus propose a novel way to characterize languages according to their BPE subword properties, inspired by the notion of morphological productivity in linguistics. This allows us to have language vectors that encode typological knowledge induced from raw text. Our approach is easily applicable to a wider range of languages and texts, as it does not require annotated data or any external linguistic knowledge. We discuss its potential contributions to quantitative typology and multilingual NLP. + 943–1001 + 2023.cl-4.5 + gutierrez-vasques-etal-2023-languages + + + Language Embeddings Sometimes Contain Typological Generalizations + RobertÖstling + MurathanKurfalı + 10.1162/coli_a_00491 + To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. 
We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most of our models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations. Careful attention to details in the evaluation turns out to be essential to avoid false positives. Furthermore, to encourage continued work in this field, we release several resources covering most or all of the languages in our data: (1) multiple sets of language representations, (2) multilingual word embeddings, (3) projected and predicted syntactic and morphological features, (4) software to provide linguistically sound evaluations of language representations. + 1003–1051 + 2023.cl-4.6 + ostling-kurfali-2023-language + +
diff --git a/data/xml/2023.emnlp.xml b/data/xml/2023.emnlp.xml index 0a8f9eaa89..76bc5bf299 100644 --- a/data/xml/2023.emnlp.xml +++ b/data/xml/2023.emnlp.xml @@ -603,7 +603,7 @@ <fixed-case>LLM</fixed-case>-powered Data Augmentation for Enhanced Cross-lingual Performance ChenxiWhitehouse MonojitChoudhury - AlhamAji + Alham FikriAji 671-686 This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in multilingual commonsense reasoning datasets where the available training data is extremely limited. To achieve this, we utilise several LLMs, namely Dolly-v2, StableVicuna, ChatGPT, and GPT-4, to augment three datasets: XCOPA, XWinograd, and XStoryCloze. Subsequently, we evaluate the effectiveness of fine-tuning smaller multilingual models, mBERT and XLMR, using the synthesised data. We compare the performance of training with data generated in English and target languages, as well as translated English-generated data, revealing the overall advantages of incorporating data generated by LLMs, e.g. a notable 13.4 accuracy score improvement for the best case. Furthermore, we conduct a human evaluation by asking native speakers to assess the naturalness and logical coherence of the generated examples across different languages. The results of the evaluation indicate that LLMs such as ChatGPT and GPT-4 excel at producing natural and coherent text in most languages, however, they struggle to generate meaningful text in certain languages like Tamil. We also observe that ChatGPT falls short in generating plausible alternatives compared to the original dataset, whereas examples from GPT-4 exhibit competitive logical consistency. 
2023.emnlp-main.44 @@ -3940,7 +3940,7 @@ Towards Building More Robust <fixed-case>NER</fixed-case> datasets: An Empirical Study on <fixed-case>NER</fixed-case> Dataset Bias from a Dataset Difficulty View RuotianMa - XiaoleiWang + XiaoleiWang XinZhou QiZhang XuanjingHuang @@ -7032,10 +7032,12 @@ RyanCotterell 8069-8086 Studying language models (LMs) in terms of well-understood formalisms allows us to precisely characterize their abilities and limitations. Previous work has investigated the expressive power of recurrent neural network (RNN) LMs in terms of their capacity to recognize unweighted formal languages. However, LMs do not describe unweighted formal languages—rather, they define probability distributions over strings. In this work, we study what classes of such probability distributions RNN LMs can represent, which allows us to make more direct statements about their capabilities. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata, and can thus model a strict subset of probability distributions expressible by finite-state models. Furthermore, we study the space complexity of representing finite-state LMs with RNNs. We show that, to represent an arbitrary deterministic finite-state LM with N states over an alphabet \Sigma, an RNN requires \Omega\left(N |\Sigma|\right) neurons. These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations. 
- 2023.emnlp-main.502 + 2023.emnlp-main.502 svete-cotterell-2023-recurrent 10.18653/v1/2023.emnlp-main.502 Revisiting Source Context in Nearest Neighbor Machine Translation @@ -8681,7 +8683,7 @@ Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models - XiaoleiWang + XiaoleiWang XinyuTang XinZhao JingyuanWang @@ -10792,7 +10794,7 @@ SamuelCahyawijaya Jan Christian BlaiseCruz GentaWinata - AlhamAji + Alham FikriAji 12567-12582 Multilingual Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting methods. While there have been extensive studies on their abilities in monolingual tasks, the investigation of their potential in the context of code-switching (CSW), the practice of alternating languages within an utterance, remains relatively uncharted. In this paper, we provide a comprehensive empirical analysis of various multilingual LLMs, benchmarking their performance across four tasks: sentiment analysis, machine translation, summarization and word-level language identification. Our results indicate that despite multilingual LLMs exhibiting promising outcomes in certain tasks using zero or few-shot prompting, they still underperform in comparison to fine-tuned models of much smaller scales. We argue that current “multilingualism’ in LLMs does not inherently imply proficiency with code-switching texts, calling for future research to bridge this discrepancy. 2023.emnlp-main.774 @@ -12199,7 +12201,7 @@ FahimFaisal AlissaOstapenko GentaWinata - AlhamAji + Alham FikriAji SamuelCahyawijaya YuliaTsvetkov AntoniosAnastasopoulos @@ -12893,7 +12895,7 @@ Is <fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case> Good at Search? 
Investigating Large Language Models as Re-Ranking Agents - WeiweiSun + WeiweiSun LingyongYan XinyuMa ShuaiqiangWang @@ -14100,7 +14102,7 @@ Not all Fake News is Written: A Dataset and Analysis of Misleading Video Headlines - YooSung + Yoo YeonSung JordanBoyd-Graber NaeemulHassan 16241-16258 diff --git a/data/xml/2023.findings.xml b/data/xml/2023.findings.xml index e7974149b6..d753431a58 100644 --- a/data/xml/2023.findings.xml +++ b/data/xml/2023.findings.xml @@ -2081,7 +2081,7 @@ Generative Knowledge Selection for Knowledge-Grounded Dialogues - WeiweiSunShandong University + WeiweiSunShandong University PengjieRenShandong University ZhaochunRenShandong University 2077-2088 @@ -4468,7 +4468,7 @@ Speaking Multiple Languages Affects the Moral Bias of Language Models - KatharinaHaemmerlCenter for Information and Language Processing, LMU + KatharinaHämmerlCenter for Information and Language Processing, LMU BjoernDeiserothTU Darmstadt, Aleph Alpha PatrickSchramowskiTU Darmstadt JindřichLibovickýCharles Univeristy @@ -4999,6 +4999,7 @@ 10.18653/v1/2023.findings-acl.173 Author name correction. 
+ <fixed-case>X</fixed-case>-<fixed-case>R</fixed-case>i<fixed-case>SAWOZ</fixed-case>: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents @@ -8513,7 +8514,7 @@ Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity - KatharinaHaemmerlCenter for Information and Language Processing, LMU + KatharinaHämmerlCenter for Information and Language Processing, LMU AlinaFastowskiCenter for Information and Language Processing, LMU Munich JindřichLibovickýCharles Univeristy AlexanderFraserLudwig-Maximilians-Universität München @@ -27763,7 +27764,7 @@ <fixed-case>D</fixed-case>i<fixed-case>QAD</fixed-case>: A Benchmark Dataset for Open-domain Dialogue Quality Assessment YukunZhao LingyongYan - WeiweiSun + WeiweiSun ChongMeng ShuaiqiangWang ZhicongCheng diff --git a/data/xml/2023.ijcnlp.xml b/data/xml/2023.ijcnlp.xml index caae68176a..a8a2da7114 100644 --- a/data/xml/2023.ijcnlp.xml +++ b/data/xml/2023.ijcnlp.xml @@ -1384,7 +1384,7 @@ 10.18653/v1/2023.ijcnlp-tutorials.6 - + Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations SriparnaSaha @@ -1394,10 +1394,9 @@ November 2023 ijcnlp - aacl - 2023.ijcnlp-demo.0 + 2023.ijcnlp-demo.0 ijcnlp-2023-international-joint-natural-language-processing @@ -1492,6 +1491,20 @@ moon-etal-2023-wamp 10.18653/v1/2023.ijcnlp-demo.8 + + <fixed-case>ERNIE</fixed-case>-Music: Text-to-Waveform Music Generation with Diffusion Models + PengfeiZhu + ChaoPang + YekunChai + LeiLi + ShuohuanWang + YuSun + HaoTian + HuaWu + 86–95 + 2023.ijcnlp-demo.9 + zhu-etal-2023-ernie + diff --git a/data/xml/2023.inlg.xml b/data/xml/2023.inlg.xml index d5bdf65766..79da3ed210 100644 --- a/data/xml/2023.inlg.xml +++ b/data/xml/2023.inlg.xml @@ -428,10 +428,12 @@ DimitraGkatzia 443–448 Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k 
people in Scotland, but remains an under-resourced language with respect to natural language processing in general and natural language generation (NLG) in particular. To address this gap, we developed the first datasets for Scottish Gaelic NLG, collecting both conversational and summarisation data in a single setting. Our task setup involves dialogues between a pair of speakers discussing museum exhibits, grounding the conversation in images and texts. Then, both interlocutors summarise the dialogue resulting in a secondary dialogue summarisation dataset. This paper presents the dialogue and summarisation corpora, as well as the software used for data collection. The corpus consists of 43 conversations (13.7k words) and 61 summaries (2.0k words), and will be released along with the data collection interface. - 2023.inlg-main.34 + 2023.inlg-main.34 2023.inlg-main.34.Supplementary_Attachment.pdf howcroft-etal-2023-building 10.18653/v1/2023.inlg-main.34 + + This version corrects descriptive corpus statistics, because some conversations, summaries, and participants were erroneously excluded.
Generating Multiple Questions from Presentation Transcripts: A Pilot Study on Earnings Conference Calls diff --git a/data/xml/2023.iwslt.xml b/data/xml/2023.iwslt.xml index e693e656b6..de22ca75ec 100644 --- a/data/xml/2023.iwslt.xml +++ b/data/xml/2023.iwslt.xml @@ -367,7 +367,7 @@ SoumiMaitiML researcher WilliamChenCarnegie Mellon University XinjianLiCarnegie Mellon University - YifanPengCarnegie Mellon University + YifanPengCarnegie Mellon University SiddhantAroraStudent at Carnegie Mellon Univeristy ShinjiWatanabeCarnegie Mellon University 235-240 diff --git a/data/xml/2023.latechclfl.xml b/data/xml/2023.latechclfl.xml index c3a75d732c..6798865ba6 100644 --- a/data/xml/2023.latechclfl.xml +++ b/data/xml/2023.latechclfl.xml @@ -138,7 +138,7 @@ What do Humor Classifiers Learn? An Attempt to Explain Humor Recognition Models - MarcioInácioUniversity of Coimbra + MarcioLima InácioUniversity of Coimbra GabrielaWick-pedroUniversidade Federal de São Carlos HugoGoncalo OliveiraCISUC, DEI, University of Coimbra 88-98 diff --git a/data/xml/2023.matching.xml b/data/xml/2023.matching.xml index b4293301f1..cb8322962a 100644 --- a/data/xml/2023.matching.xml +++ b/data/xml/2023.matching.xml @@ -93,7 +93,7 @@ Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering JinheonBaek - AlhamAji + Alham FikriAji AmirSaffari 70-98 Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks, based on their internal knowledge stored in parameters during pre-training. However, such internalized knowledge might be insufficient and incorrect, which could lead LLMs to generate factually wrong answers. Furthermore, fine-tuning LLMs to update their knowledge is expensive. To this end, we propose to augment the knowledge directly in the input of LLMs. 
Specifically, we first retrieve the relevant facts to the input question from the knowledge graph based on semantic similarities between the question and its associated facts. After that, we prepend the retrieved facts to the input question in the form of the prompt, which is then forwarded to LLMs to generate the answer. Our framework, Knowledge-Augmented language model PromptING (KAPING), requires no model training, thus completely zero-shot. We validate the performance of our KAPING framework on the knowledge graph question answering task, that aims to answer the user’s question based on facts over a knowledge graph, on which ours outperforms relevant zero-shot baselines by up to 48% in average, across multiple LLMs of various sizes. diff --git a/data/xml/2023.nllp.xml b/data/xml/2023.nllp.xml index 5eaf9b1650..17dd782c65 100644 --- a/data/xml/2023.nllp.xml +++ b/data/xml/2023.nllp.xml @@ -7,7 +7,7 @@ CatalinaGoanta IliasChalkidis LeslieBarrett - Gerasimos (Jerry)Spanakis + GerasimosSpanakis NikolaosAletras Association for Computational Linguistics
Singapore
diff --git a/data/xml/2023.sicon.xml b/data/xml/2023.sicon.xml index 826c31c62e..f3e9bf5f3a 100644 --- a/data/xml/2023.sicon.xml +++ b/data/xml/2023.sicon.xml @@ -71,7 +71,7 @@ <fixed-case>BC</fixed-case>ause: Reducing group bias and promoting cohesive discussion in online deliberation processes through a simple and engaging online deliberation tool LucasAnastasiou - AnnaDe LibboNA + AnnaDe Liddo 39-49 Facilitating healthy online deliberation in terms of sensemaking and collaboration of discussion participants proves extremely challenging due to a number of known negative effects of online communication on social media platforms. We start from concerns and aspirations about the use of existing online discussion systems as distilled in previous literature, we then combine them with lessons learned on design and engineering practices from our research team, to inform the design of an easy-to-use tool (BCause.app) that enables higher quality discussions than traditional social media. We describe the design of this tool, highlighting the main interaction features that distinguish it from common social media, namely: i. the low-cost argumentation structuring of the conversations with direct replies; ii. and the distinctive use of reflective feedback rather than appreciative-only feedback. We then present the results of a controlled A/B experiment in which we show that the presence of argumentative and cognitive reflective discussion elements produces better social interaction with less polarization and promotes a more cohesive discussion than common social media-like interactions. 
2023.sicon-1.5 diff --git a/data/xml/2023.swisstext.xml b/data/xml/2023.swisstext.xml index 3baa071256..84e50e44d7 100644 --- a/data/xml/2023.swisstext.xml +++ b/data/xml/2023.swisstext.xml @@ -17,7 +17,7 @@ swisstext - 2023.swisstext-1.0 + 2023.swisstext-1.0 swisstext-2023-edition diff --git a/data/xml/2023.tacl.xml b/data/xml/2023.tacl.xml index ca41be3c16..5154e6037f 100644 --- a/data/xml/2023.tacl.xml +++ b/data/xml/2023.tacl.xml @@ -1091,5 +1091,251 @@ sherborne-etal-2023-optimal + + Testing the Predictions of Surprisal Theory in 11 Languages + Ethan G.Wilcox + TiagoPimentel + ClaraMeister + RyanCotterell + Roger P.Levy + 10.1162/tacl_a_00612 + Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages. 
+ 1451–1470 + 2023.tacl-1.82 + wilcox-etal-2023-testing + + + Shared Lexical Items as Triggers of Code Switching + ShulyWintner + SafaaShehadi + YuliZeira + DoreenOsmelak + YuvalNov + 10.1162/tacl_a_00613 + Why do bilingual speakers code-switch (mix their two languages)? Among the several theories that attempt to explain this natural and ubiquitous phenomenon, the triggering hypothesis relates code-switching to the presence of lexical triggers, specifically cognates and proper names, adjacent to the switch point. We provide a fuller, more nuanced and refined exploration of the triggering hypothesis, based on five large datasets in three language pairs, reflecting both spoken and written bilingual interactions. Our results show that words that are assumed to reside in a mental lexicon shared by both languages indeed trigger code-switching, that the tendency to switch depends on the distance of the trigger from the switch point and on whether the trigger precedes or succeeds the switch, but not on the etymology of the trigger words. We thus provide strong, robust, evidence-based confirmation to several hypotheses on the relationships between lexical triggers and code-switching. + 1471–1484 + 2023.tacl-1.83 + wintner-etal-2023-shared + + + Learning More from Mixed Emotions: A Label Refinement Method for Emotion Recognition in Conversations + JintaoWen + GengTu + RuiLi + DazhiJiang + WenhuaZhu + 10.1162/tacl_a_00614 + One-hot labels are commonly employed as ground truth in Emotion Recognition in Conversations (ERC). However, this approach may not fully encompass all the emotions conveyed in a single utterance, leading to suboptimal performance. Regrettably, current ERC datasets lack comprehensive emotionally distributed labels. To address this issue, we propose the Emotion Label Refinement (EmoLR) method, which utilizes context- and speaker-sensitive information to infer mixed emotional labels. 
EmoLR comprises an Emotion Predictor (EP) module and a Label Refinement (LR) module. The EP module recognizes emotions and provides context/speaker states for the LR module. Subsequently, the LR module calculates the similarity between these states and ground-truth labels, generating a refined label distribution (RLD). The RLD captures a more comprehensive range of emotions than the original one-hot labels. These refined labels are then used for model training in place of the one-hot labels. Experimental results on three public conversational datasets demonstrate that our EmoLR achieves state-of-the-art performance. + 1485–1499 + 2023.tacl-1.84 + wen-etal-2023-learning + + + Hallucinations in Large Multilingual Translation Models + Nuno M.Guerreiro + Duarte M.Alves + JonasWaldendorf + BarryHaddow + AlexandraBirch + PierreColombo + André F. T.Martins + 10.1162/tacl_a_00615 + Hallucinated translations can severely undermine and raise safety issues when machine translation systems are deployed in the wild. Previous research on the topic focused on small bilingual models trained on high-resource languages, leaving a gap in our understanding of hallucinations in multilingual models across diverse translation scenarios. In this work, we fill this gap by conducting a comprehensive analysis—over 100 language pairs across various resource levels and going beyond English-centric directions—on both the M2M neural machine translation (NMT) models and GPT large language models (LLMs). Among several insights, we highlight that models struggle with hallucinations primarily in low-resource directions and when translating out of English, where, critically, they may reveal toxic patterns that can be traced back to the training data. We also find that LLMs produce qualitatively different hallucinations to those of NMT models. Finally, we show that hallucinations are hard to reverse by merely scaling models trained with the same data. 
However, employing more diverse models, trained on different data or with different procedures, as fallback systems can improve translation quality and virtually eliminate certain pathologies. + 1500–1517 + 2023.tacl-1.85 + guerreiro-etal-2023-hallucinations + + + <fixed-case>P</fixed-case>anini<fixed-case>QA</fixed-case>: Enhancing Patient Education Through Interactive Question Answering + PengshanCai + ZonghaiYao + FeiLiu + DakuoWang + MeghanReilly + HuixueZhou + LingxiLi + YiCao + AlokKapoor + AdarshaBajracharya + DanBerlowitz + HongYu + 10.1162/tacl_a_00616 + A patient portal allows discharged patients to access their personalized discharge instructions in electronic health records (EHRs). However, many patients have difficulty understanding or memorizing their discharge instructions (Zhao et al., 2017). In this paper, we present PaniniQA, a patient-centric interactive question answering system designed to help patients understand their discharge instructions. PaniniQA first identifies important clinical content from patients’ discharge instructions and then formulates patient-specific educational questions. In addition, PaniniQA is also equipped with answer verification functionality to provide timely feedback to correct patients’ misunderstandings. Our comprehensive automatic & human evaluation results demonstrate our PaniniQA is capable of improving patients’ mastery of their medical instructions through effective interactions.1 + 1518–1536 + 2023.tacl-1.86 + cai-etal-2023-paniniqa + + + Discover, Explain, Improve: An Automatic Slice Detection Benchmark for Natural Language Processing + WenyueHua + LifengJin + LinfengSong + HaitaoMi + YongfengZhang + DongYu + 10.1162/tacl_a_00617 + Pretrained natural language processing (NLP) models have achieved high overall performance, but they still make systematic errors. 
Instead of manual error analysis, research on slice detection models (SDMs), which automatically identify underperforming groups of datapoints, has attracted escalating attention in Computer Vision for both understanding model behaviors and providing insights for future model training and designing. However, little research on SDMs and quantitative evaluation of their effectiveness has been conducted on NLP tasks. Our paper fills the gap by proposing a benchmark named “Discover, Explain, Improve (DEIm)” for classification NLP tasks along with a new SDM Edisa. Edisa discovers coherent and underperforming groups of datapoints; DEIm then unites them under human-understandable concepts and provides comprehensive evaluation tasks and corresponding quantitative metrics. The evaluation in DEIm shows that Edisa can accurately select error-prone datapoints with informative semantic features that summarize error patterns. Detecting difficult datapoints directly boosts model performance without tuning any original model parameters, showing that discovered slices are actionable for users.1 + 1537–1552 + 2023.tacl-1.87 + hua-etal-2023-discover + + + Pre-train, Prompt, and Recommendation: A Comprehensive Survey of Language Modeling Paradigm Adaptations in Recommender Systems + PengLiu + LemeiZhang + Jon AtleGulla + 10.1162/tacl_a_00619 + The emergence of Pre-trained Language Models (PLMs) has achieved tremendous success in the field of Natural Language Processing (NLP) by learning universal representations on large corpora in a self-supervised manner. The pre-trained models and the learned representations can be beneficial to a series of downstream NLP tasks. This training paradigm has recently been adapted to the recommendation domain and is considered a promising approach by both academia and industry.
In this paper, we systematically investigate how to extract and transfer knowledge from pre-trained models learned by different PLM-related training paradigms to improve recommendation performance from various perspectives, such as generality, sparsity, efficiency and effectiveness. Specifically, we propose a comprehensive taxonomy to divide existing PLM-based recommender systems w.r.t. their training strategies and objectives. Then, we analyze and summarize the connection between PLM-based training paradigms and different input data types for recommender systems. Finally, we elaborate on open issues and future research directions in this vibrant field. + 1553–1571 + 2023.tacl-1.88 + liu-etal-2023-pre + + + An Efficient Self-Supervised Cross-View Training For Sentence Embedding + PeeratLimkonchotiwat + WuttikornPonwitayarat + LalitaLowphansirikul + CanUdomcharoenchaikit + EkapolChuangsuwanich + SaranaNutanong + 10.1162/tacl_a_00620 + Self-supervised sentence representation learning is the task of constructing an embedding space for sentences without relying on human annotation efforts. One straightforward approach is to finetune a pretrained language model (PLM) with a representation learning method such as contrastive learning. While this approach achieves impressive performance on larger PLMs, the performance rapidly degrades as the number of parameters decreases. In this paper, we propose a framework called Self-supervised Cross-View Training (SCT) to narrow the performance gap between large and small PLMs. To evaluate the effectiveness of SCT, we compare it to 5 baseline and state-of-the-art competitors on seven Semantic Textual Similarity (STS) benchmarks using 5 PLMs with the number of parameters ranging from 4M to 340M. 
The experimental results show that SCT outperforms the competitors for PLMs with less than 100M parameters in 18 of 21 cases.1 + 1572–1587 + 2023.tacl-1.89 + limkonchotiwat-etal-2023-efficient + + + General then Personal: Decoupling and Pre-training for Personalized Headline Generation + Yun-ZhuSong + Yi-SyuanChen + LuWang + Hong-HanShuai + 10.1162/tacl_a_00621 + Personalized Headline Generation aims to generate unique headlines tailored to users’ browsing history. In this task, understanding user preferences from click history and incorporating them into headline generation pose challenges. Existing approaches typically rely on predefined styles as control codes, but personal style lacks explicit definition or enumeration, making it difficult to leverage traditional techniques. To tackle these challenges, we propose General Then Personal (GTP), a novel framework comprising user modeling, headline generation, and customization. We train the framework using tailored designs that emphasize two central ideas: (a) task decoupling and (b) model pre-training. With the decoupling mechanism separating the task into generation and customization, two mechanisms, i.e., information self-boosting and mask user modeling, are further introduced to facilitate the training and text control. Additionally, we introduce a new evaluation metric to address existing limitations. Extensive experiments conducted on the PENS dataset, considering both zero-shot and few-shot scenarios, demonstrate that GTP outperforms state-of-the-art methods. Furthermore, ablation studies and analysis emphasize the significance of decoupling and pre-training.
Finally, the human evaluation validates the effectiveness of our approaches.1 + 1588–1607 + 2023.tacl-1.90 + song-etal-2023-general + + + Removing Backdoors in Pre-trained Models by Regularized Continual Pre-training + BiruZhu + GanquCui + YangyiChen + YujiaQin + LifanYuan + ChongFu + YangdongDeng + ZhiyuanLiu + MaosongSun + MingGu + 10.1162/tacl_a_00622 + Recent research has revealed that pre-trained models (PTMs) are vulnerable to backdoor attacks before the fine-tuning stage. The attackers can implant transferable task-agnostic backdoors in PTMs, and control model outputs on any downstream task, which poses severe security threats to all downstream applications. Existing backdoor-removal defenses focus on task-specific classification models and they are not suitable for defending PTMs against task-agnostic backdoor attacks. To this end, we propose the first task-agnostic backdoor removal method for PTMs. Based on the selective activation phenomenon in backdoored PTMs, we design a simple and effective backdoor eraser, which continually pre-trains the backdoored PTMs with a regularization term in an end-to-end approach. The regularization term removes backdoor functionalities from PTMs while the continual pre-training maintains the normal functionalities of PTMs. We conduct extensive experiments on pre-trained models across different modalities and architectures. The experimental results show that our method can effectively remove backdoors inside PTMs and preserve benign functionalities of PTMs with a few downstream-task-irrelevant auxiliary data, e.g., unlabeled plain texts. The average attack success rate on three downstream datasets is reduced from 99.88% to 8.10% after our defense on the backdoored BERT. The codes are publicly available at https://github.com/thunlp/RECIPE. 
+ 1608–1623 + 2023.tacl-1.91 + zhu-etal-2023-removing + + + Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation + PatrickFernandes + AmanMadaan + EmmyLiu + AntónioFarinhas + Pedro HenriqueMartins + AmandaBertsch + José G. C.de Souza + ShuyanZhou + TongshuangWu + GrahamNeubig + André F. T.Martins + 10.1162/tacl_a_00626 + Natural language generation has witnessed significant advancements due to the training of large language models on vast internet-scale datasets. Despite these advancements, there exists a critical challenge: These models can inadvertently generate content that is toxic, inaccurate, and unhelpful, and existing automatic evaluation metrics often fall short of identifying these shortcomings. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide an overview of recent research that has leveraged human feedback to improve natural language generation. First, we introduce a taxonomy distilled from existing research to categorize and organize the varied forms of feedback. Next, we discuss how feedback can be described by its format and objective, and cover the two approaches proposed to use feedback (either for training or decoding): directly using feedback or training feedback models. We also discuss existing datasets for human-feedback data collection, and concerns surrounding feedback collection. Finally, we provide an overview of the nascent field of AI feedback, which uses large language models to make judgments based on a set of principles and minimize the need for human intervention. We also release a website of this survey at feedback-gap-survey.info. 
+ 1643–1668 + 2023.tacl-1.92 + fernandes-etal-2023-bridging + + + <fixed-case>A</fixed-case>fri<fixed-case>S</fixed-case>peech-200: Pan-<fixed-case>A</fixed-case>frican Accented Speech Dataset for Clinical and General Domain <fixed-case>ASR</fixed-case> + TobiOlatunji + TejumadeAfonja + AdityaYadavalli + Chris ChinenyeEmezue + SahibSingh + Bonaventure F. P.Dossou + JoanneOsuchukwu + SalomeyOsei + Atnafu LambeboTonja + NaomeEtori + ClintonMbataku + 10.1162/tacl_a_00627 + Africa has a very poor doctor-to-patient ratio. At very busy clinics, doctors could see 30+ patients per day—a heavy patient burden compared with developed countries—but productivity tools such as clinical automatic speech recognition (ASR) are lacking for these overworked clinicians. However, clinical ASR is mature, even ubiquitous, in developed nations, and clinician-reported performance of commercial clinical ASR systems is generally satisfactory. Furthermore, the recent performance of general domain ASR is approaching human accuracy. However, several gaps exist. Several publications have highlighted racial bias with speech-to-text algorithms and performance on minority accents lags significantly. To our knowledge, there is no publicly available research or benchmark on accented African clinical ASR, and speech data is non-existent for the majority of African accents. We release AfriSpeech, 200hrs of Pan-African English speech, 67,577 clips from 2,463 unique speakers across 120 indigenous accents from 13 countries for clinical and general domain ASR, a benchmark test set, with publicly available pre-trained models with SOTA performance on the AfriSpeech benchmark. 
+ 1669–1685 + 2023.tacl-1.93 + olatunji-etal-2023-afrispeech + + + <fixed-case>M</fixed-case>iss<fixed-case>M</fixed-case>odal: Increasing Robustness to Missing Modality in Multimodal Sentiment Analysis + RonghaoLin + HaifengHu + 10.1162/tacl_a_00628 + When applying multimodal machine learning in downstream inference, both joint and coordinated multimodal representations rely on the complete presence of modalities as in training. However, modal-incomplete data, where certain modalities are missing, greatly reduces performance in Multimodal Sentiment Analysis (MSA) due to varying input forms and semantic information deficiencies. This limits the applicability of the predominant MSA methods in the real world, where the completeness of multimodal data is uncertain and variable. The generation-based methods attempt to generate the missing modality, yet they require complex hierarchical architecture with huge computational costs and struggle with the representation gaps across different modalities. Diversely, we propose a novel representation learning approach named MissModal, devoting to increasing robustness to missing modality in a classification approach. Specifically, we adopt constraints with geometric contrastive loss, distribution distance loss, and sentiment semantic loss to align the representations of modal-missing and modal-complete data, without impacting the sentiment inference for the complete modalities. Furthermore, we do not demand any changes in the multimodal fusion stage, highlighting the generality of our method in other multimodal learning systems. Extensive experiments demonstrate that the proposed method achieves superior performance with minimal computational costs in various missing modalities scenarios (flexibility), including severely missing modality (efficiency) on two public MSA datasets. 
+ 1686–1702 + 2023.tacl-1.94 + lin-hu-2023-missmodal + + + Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision + EugeneKharitonov + DamienVincent + ZalánBorsos + RaphaëlMarinier + SertanGirgin + OlivierPietquin + MattSharifi + MarcoTagliasacchi + NeilZeghidour + 10.1162/tacl_a_00618 + We introduce SPEAR-TTS, a multi-speaker text-to-speech (TTS) system that can be trained with minimal supervision. By combining two types of discrete speech representations, we cast TTS as a composition of two sequence-to-sequence tasks: from text to high-level semantic tokens (akin to “reading”) and from semantic tokens to low-level acoustic tokens (“speaking”). Decoupling these two tasks enables training of the “speaking” module using abundant audio-only data, and unlocks the highly efficient combination of pretraining and backtranslation to reduce the need for parallel data when training the “reading” component. To control the speaker identity, we adopt example prompting, which allows SPEAR-TTS to generalize to unseen speakers using only a short sample of 3 seconds, without any explicit speaker representation or speaker labels. Our experiments demonstrate that SPEAR-TTS achieves a character error rate that is competitive with state-of-the-art methods using only 15 minutes of parallel data, while matching ground-truth speech in naturalness and acoustic quality. + 1703–1718 + 2023.tacl-1.95 + kharitonov-etal-2023-speak + + + <fixed-case>R</fixed-case>e<fixed-case>COGS</fixed-case>: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation + ZhengxuanWu + Christopher D.Manning + ChristopherPotts + 10.1162/tacl_a_00623 + Compositional generalization benchmarks for semantic parsing seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. 
This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark (Kim and Linzen, 2020). COGS poses generalization splits that appear impossible for present-day models, which could be taken as an indictment of those models. However, we show that the negative results trace to incidental features of COGS LFs. Converting these LFs to semantically equivalent ones and factoring out capabilities unrelated to semantic interpretation, we find that even baseline models get traction. A recent variable-free translation of COGS LFs suggests similar conclusions, but we observe this format is not semantically equivalent; it is incapable of accurately representing some COGS meanings. These findings inform our proposal for ReCOGS, a modified version of COGS that comes closer to assessing the target semantic capabilities while remaining very challenging. Overall, our results reaffirm the importance of compositional generalization and careful benchmark task design. + 1719–1733 + 2023.tacl-1.96 + wu-etal-2023-recogs + + + Data-driven Parsing Evaluation for Child-Parent Interactions + ZoeyLiu + EmilyPrud’hommeaux + 10.1162/tacl_a_00624 + We present a syntactic dependency treebank for naturalistic child and child-directed spoken English. Our annotations largely follow the guidelines of the Universal Dependencies project (UD [Zeman et al., 2022]), with detailed extensions to lexical and syntactic structures unique to spontaneous spoken language, as opposed to written texts or prepared speech. Compared to existing UD-style spoken treebanks and other dependency corpora of child-parent interactions specifically, our dataset is much larger (44,744 utterances; 233,907 words) and contains data from 10 children covering a wide age range (18–66 months). 
We conduct thorough dependency parser evaluations using both graph-based and transition-based parsers, trained on three different types of out-of-domain written texts: news, tweets, and learner data. Out-of-domain parsers demonstrate reasonable performance for both child and parent data. In addition, parser performance for child data increases along children’s developmental paths, especially between 18 and 48 months, and gradually approaches the performance for parent data. These results are further validated with in-domain training. + 1734–1753 + 2023.tacl-1.97 + liu-prudhommeaux-2023-data + + + <fixed-case>QA</fixed-case>meleon: Multilingual <fixed-case>QA</fixed-case> with Only 5 Examples + PriyankaAgrawal + ChrisAlberti + FantineHuot + JoshuaMaynez + JiMa + SebastianRuder + KuzmanGanchev + DipanjanDas + MirellaLapata + 10.1162/tacl_a_00625 + The availability of large, high-quality datasets has been a major driver of recent progress in question answering (QA). Such annotated datasets, however, are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are fine-tuned, thus avoiding costly annotation. Prompt tuning the PLM with only five examples per language delivers accuracy superior to translation-based baselines; it bridges nearly 60% of the gap between an English-only baseline and a fully-supervised upper bound fine-tuned on almost 50,000 hand-labeled examples; and consistently leads to improvements compared to directly fine-tuning a QA model on labeled examples in low resource settings. 
Experiments on the TyDiqa-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.1 + 1754–1771 + 2023.tacl-1.98 + agrawal-etal-2023-qameleon + diff --git a/data/xml/2023.wmt.xml b/data/xml/2023.wmt.xml index 8f952c0205..31b7929943 100644 --- a/data/xml/2023.wmt.xml +++ b/data/xml/2023.wmt.xml @@ -1193,7 +1193,7 @@ <fixed-case>NICT</fixed-case>-<fixed-case>AI</fixed-case>4<fixed-case>B</fixed-case>’s Submission to the <fixed-case>I</fixed-case>ndic <fixed-case>MT</fixed-case> Shared Task in <fixed-case>WMT</fixed-case> 2023 RajDabre JayGala - PranjalChitale + Pranjal A.Chitale 941–949 In this paper, we (Team NICT-AI4B) describe our MT systems that we submit to the Indic MT task in WMT 2023. Our primary system consists of 3 stages: Joint denoising and MT training using officially approved monolingual and parallel corpora, backtranslation and, MT training on original and backtranslated parallel corpora. We observe that backtranslation leads to substantial improvements in translation quality up to 4 BLEU points. We also develop 2 contrastive systems on unconstrained settings, where the first system involves fine-tuning of IndicTrans2 DA models on official parallel corpora and seed data used in AI4Bharat et al, (2023), and the second system involves a system combination of the primary and the aforementioned system. Overall, we manage to obtain high-quality translation systems for the 4 low-resource North-East Indian languages of focus. 2023.wmt-1.88 diff --git a/data/xml/2024.bucc.xml b/data/xml/2024.bucc.xml new file mode 100644 index 0000000000..728839447f --- /dev/null +++ b/data/xml/2024.bucc.xml @@ -0,0 +1,166 @@ + + + + + Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024 + PierreZweigenbaum + ReinhardRapp + SergeSharoff + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.bucc-1 + bucc + + + 2024.bucc-1.0 + bucc-2024-building + + + On a Novel Application of <fixed-case>W</fixed-case>asserstein-<fixed-case>P</fixed-case>rocrustes for Unsupervised Cross-Lingual Alignment of Embeddings + GuillemRamírez + RumenDangovski + PreslavNakov + MarinSoljacic + 1–11 + 2024.bucc-1.1 + ramirez-etal-2024-novel + + + Modeling Diachronic Change in <fixed-case>E</fixed-case>nglish Scientific Writing over 300+ Years with Transformer-based Language Model Surprisal + JuliusSteuer + Marie-PaulineKrielke + StefanFischer + StefaniaDegaetano-Ortlieb + MariusMosbach + DietrichKlakow + 12–23 + 2024.bucc-1.2 + steuer-etal-2024-modeling + + + <fixed-case>PORTULAN</fixed-case> <fixed-case>E</fixed-case>xtra<fixed-case>GLUE</fixed-case> Datasets and Models: Kick-starting a Benchmark for the Neural Processing of <fixed-case>P</fixed-case>ortuguese + Tomás FreitasOsório + BernardoLeite + HenriqueLopes Cardoso + LuísGomes + JoãoRodrigues + RodrigoSantos + AntónioBranco + 24–34 + 2024.bucc-1.3 + osorio-etal-2024-portulan + + + Invited Talk: The Way Towards Massively Multilingual Language Models + FrançoisYvon + 35 + 2024.bucc-1.4 + yvon-2024-invited + + + Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets + ZiLong + ZhenHaoTang + XianghuaFu + JianChen + ShilongHou + JinzeLyu + 36–50 + 2024.bucc-1.5 + long-etal-2024-exploring + + + Exploring the Potential of Large Language Models in Adaptive Machine Translation for Generic Text and Subtitles + AbdelhadiSoudi + MohamedHannani + KristofVan Laerhoven + EleftheriosAvramidis + 51–58 + 2024.bucc-1.6 + soudi-etal-2024-exploring + + + <fixed-case>INCLURE</fixed-case>: a Dataset and Toolkit for Inclusive <fixed-case>F</fixed-case>rench Translation + PaulLerner + CyrilGrouin + 59–68 + 2024.bucc-1.7 + lerner-grouin-2024-inclure + + + <fixed-case>B</fixed-case>n<fixed-case>PC</fixed-case>: A Gold Standard Corpus for Paraphrase Detection in 
<fixed-case>B</fixed-case>angla, and its Evaluation + SouravSaha + Zeshan AhmedNobin + Mufassir AhmadChowdhury + Md. Shakirul Hasan KhanMobin + Mohammad RuhulAmin + SudiptaKar + 69–84 + 2024.bucc-1.8 + saha-etal-2024-bnpc + + + Creating Clustered Comparable Corpora from <fixed-case>W</fixed-case>ikipedia with Different Fuzziness Levels and Language Representativity + AnnaLaskina + EricGaussier + GaelleCalvary + 85–93 + 2024.bucc-1.9 + laskina-etal-2024-creating + + + <fixed-case>E</fixed-case>u<fixed-case>R</fixed-case>e<fixed-case>C</fixed-case>o: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research + MarcKupietz + PiotrBanski + NilsDiewald + BeataTrawinski + AndreasWitt + 94–103 + 2024.bucc-1.10 + kupietz-etal-2024-eureco + + + Building Annotated Parallel Corpora Using the <fixed-case>ATIS</fixed-case> Dataset: Two <fixed-case>UD</fixed-case>-style treebanks in <fixed-case>E</fixed-case>nglish and <fixed-case>T</fixed-case>urkish + NeslihanCesur + AslıKuzgun + MehmetKose + Olcay TanerYıldız + 104–110 + 2024.bucc-1.11 + cesur-etal-2024-building + + + Bootstrapping the Annotation of <fixed-case>UD</fixed-case> Learner Treebanks + AriannaMasciolini + 111–117 + 2024.bucc-1.12 + masciolini-2024-bootstrapping + + + <fixed-case>S</fixed-case>we<fixed-case>D</fixed-case>iagnostics: A Diagnostics Natural Language Inference Dataset for <fixed-case>S</fixed-case>wedish + FelixMorger + 118–124 + 2024.bucc-1.13 + morger-2024-swediagnostics + + + Multiple Discourse Relations in <fixed-case>E</fixed-case>nglish <fixed-case>TED</fixed-case> Talks and Their Translation into <fixed-case>L</fixed-case>ithuanian, <fixed-case>P</fixed-case>ortuguese and <fixed-case>T</fixed-case>urkish + DenizZeyrek + GiedrėValūnaitė Oleškevičienė + AmaliaMendes + 125–134 + 2024.bucc-1.14 + zeyrek-etal-2024-multiple + + + mini-<fixed-case>CIEP</fixed-case>+ : A Shareable Parallel Corpus of Prose + AnnemarieVerkerk + LuigiTalamo + 135–143 + 2024.bucc-1.15 + 
verkerk-talamo-2024-mini + +
+
diff --git a/data/xml/2024.caldpseudo.xml b/data/xml/2024.caldpseudo.xml index 4c66234bd2..111ce2f1e0 100644 --- a/data/xml/2024.caldpseudo.xml +++ b/data/xml/2024.caldpseudo.xml @@ -30,6 +30,7 @@ Missed recognition of named entities while de-identifying clinical narratives poses a critical challenge in protecting patient-sensitive health information. Mitigating name recognition errors is essential to minimize risk of patient re-identification. In this paper, we emphasize the need for stratified sampling and enhanced contextual considerations concerning Name Tokens using a fine-tuned Longformer BERT model for clinical text de-identifcation. We introduce a Hidden in Plain Sight (HIPS) Markov-based replacement technique for names to mask name recognition misses, revealing a significant reduction in name leakage rates. Our experimental results underscore the impact on addressing name recognition challenges in BERT-based de-identification systems for heightened privacy protection in electronic health records. 2024.caldpseudo-1.1 simancek-vydiswaran-2024-handling +
Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain @@ -42,6 +43,7 @@ Since medical text cannot be shared easily due to privacy concerns, synthetic data bears much potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice as they contain much more diversity. 2024.caldpseudo-1.2 nishiyama-etal-2024-assessing + Automatic Detection and Labelling of Personal Data in Case Reports from the <fixed-case>ECHR</fixed-case> in <fixed-case>S</fixed-case>panish: Evaluation of Two Different Annotation Approaches @@ -52,6 +54,7 @@ In this paper we evaluate two annotation approaches for automatic detection and labelling of personal information in legal texts in relation to the ambiguity of the labels and the homogeneity of the annotations. For this purpose, we built a corpus of 44 case reports from the European Court of Human Rights in Spanish language and we annotated it following two different annotation approaches: automatic projection of the annotations of an existing English corpus, and manual annotation with our reinterpretation of their guidelines. Moreover, we employ Flair on a Named Entity Recognition task to compare its performance in the two annotation schemes. 2024.caldpseudo-1.3 sierro-etal-2024-automatic + <fixed-case>PSILENCE</fixed-case>: A Pseudonymization Tool for International Law @@ -61,6 +64,7 @@ Since the announcement of the GDPR, the pseudonymization of legal documents has become a high-priority task in many legal organizations. 
This means that for making public a document, it is necessary to redact the identity of certain entities, such as witnesses. In this work, we present the first results obtained by PSILENCE, a pseudonymization tool created for redacting semi-automatically international arbitration documents in English. PSILENCE has been built using a Named Entity Recognition (NER) system, along with a Coreference Resolution system. These systems allow us to find the people that we need to redact in a clustered way, but also to propose the same pseudonym throughout one document. This last aspect makes it easier to read and comprehend a redacted legal document. Different experiments were done on four different datasets, one of which was legal, and the results are promising, reaching a Macro F-score of up to 0.72 on the legal dataset. 2024.caldpseudo-1.4 cabrera-diego-gheewala-2024-psilence + Deidentifying a <fixed-case>N</fixed-case>orwegian Clinical Corpus - an Effort to Create a Privacy-preserving <fixed-case>N</fixed-case>orwegian Large Clinical Language Model @@ -74,6 +78,7 @@ The study discusses the methods and challenges of deidentifying and pseudonymizing Norwegian clinical text for research purposes. The results of the NorDeid tool for deidentification and pseudonymization on different types of protected health information were evaluated and discussed, as well as the extension of its functionality with regular expressions to identify specific types of sensitive information. The research used a clinical corpus of adult patients treated in a gastro-surgical department in Norway, which contains approximately nine million clinical notes. The study also highlights the challenges posed by the unique language and clinical terminology of Norway and emphasizes the importance of protecting privacy and the need for customized approaches to meet legal and research requirements. 
2024.caldpseudo-1.5 ngo-etal-2024-deidentifying + Extending Off-the-shelf <fixed-case>NER</fixed-case> Systems to Personal Information Detection in Dialogues with a Virtual Agent: Findings from a Real-Life Use Case @@ -85,6 +90,7 @@ We present the findings and results of our pseudonymisation system, which has been developed for a real-life use-case involving users and an informative chatbot in the context of the COVID-19 pandemic. Message exchanges between the two involve the former group providing information about themselves and their residential area, which could easily allow for their re-identification. We create a modular pipeline to detect PIIs and perform basic deidentification such that the data can be stored while mitigating any privacy concerns. The use-case presents several challenging aspects, the most difficult of which is the logistic challenge of not being able to directly view or access the data due to the very privacy issues we aim to resolve. Nevertheless, our system achieves a high recall of 0.99, correctly identifying almost all instances of personal data. However, this comes at the expense of precision, which only reaches 0.64. We describe the sensitive information identification in detail, explaining the design principles behind our decisions. We additionally highlight the particular challenges we’ve encountered. 2024.caldpseudo-1.6 mina-etal-2024-extending + Detecting Personal Identifiable Information in <fixed-case>S</fixed-case>wedish Learner Essays @@ -97,6 +103,7 @@ Linguistic data can — and often does — contain PII (Personal Identifiable Information). Both from a legal and ethical standpoint, the sharing of such data is not permissible. According to the GDPR, pseudonymization, i.e. the replacement of sensitive information with surrogates, is an acceptable strategy for privacy preservation. 
While research has been conducted on the detection and replacement of sensitive data in Swedish medical data using Large Language Models (LLMs), it is unclear whether these models handle PII in less structured and more thematically varied texts equally well. In this paper, we present and discuss the performance of an LLM-based PII-detection system for Swedish learner essays. 2024.caldpseudo-1.7 szawerna-etal-2024-detecting + Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts @@ -112,6 +119,7 @@ Large language models in public-facing industrial applications must accurately process data for the domain in which they are deployed, but they must not leak sensitive or confidential information when used. We present a process for anonymizing training data, a framework for quantitatively and qualitatively assessing the effectiveness of this process, and an assessment of the effectiveness of models fine-tuned on anonymized data in comparison with commercially available LLM APIs. 2024.caldpseudo-1.8 gardiner-etal-2024-data + When Is a Name Sensitive? Eponyms in Clinical Text and Implications for De-Identification @@ -123,6 +131,7 @@ Clinical data, in the form of electronic health records, are rich resources that can be tapped using natural language processing. At the same time, they contain very sensitive information that must be protected. One strategy is to remove or obscure data using automatic de-identification. However, the detection of sensitive data can yield false positives. This is especially true for tokens that are similar in form to sensitive entities, such as eponyms. These names tend to refer to medical procedures or diagnoses rather than specific persons. Previous research has shown that automatic de-identification systems often misclassify eponyms as names, leading to a loss of valuable medical information. In this study, we estimate the prevalence of eponyms in a real Swedish clinical corpus. 
Furthermore, we demonstrate that modern transformer-based de-identification systems are more accurate in distinguishing between names and eponyms than previous approaches. 2024.caldpseudo-1.9 vakili-etal-2024-name + Did the Names <fixed-case>I</fixed-case> Used within My Essay Affect My Score? Diagnosing Name Biases in Automated Essay Scoring @@ -135,6 +144,7 @@ Automated essay scoring (AES) of second-language learner essays is a high-stakes task as it can affect the job and educational opportunities a student may have access to. Thus, it becomes imperative to make sure that the essays are graded based on the students’ language proficiency as opposed to other reasons, such as personal names used in the text of the essay. Moreover, most of the research data for AES tends to contain personal identifiable information. Because of that, pseudonymization becomes an important tool to make sure that this data can be freely shared. Thus, our systems should not grade students based on which given names were used in the text of the essay, both for fairness and for privacy reasons. In this paper we explore how given names affect the CEFR level classification of essays of second language learners of Swedish. We use essays containing just one personal name and substitute it for names from lists of given names from four different ethnic origins, namely Swedish, Finnish, Anglo-American, and Arabic. We find that changing the names within the essays has no apparent effect on the classification task, regardless of whether a feature-based or a transformer-based model is used. 2024.caldpseudo-1.10 munoz-sanchez-etal-2024-names + diff --git a/data/xml/2024.case.xml b/data/xml/2024.case.xml index 7a928fde53..7d260a74fc 100644 --- a/data/xml/2024.case.xml +++ b/data/xml/2024.case.xml @@ -29,6 +29,7 @@ 2024.case-1.1 2024.case-1.1.SupplementaryMaterial.txt fellman-etal-2024-future +
Fine-Tuning Language Models on <fixed-case>D</fixed-case>utch Protest Event Tweets @@ -54,6 +55,7 @@ 2024.case-1.3 2024.case-1.3.SupplementaryMaterial.txt bakker-etal-2024-timeline + Leveraging Approximate Pattern Matching with <fixed-case>BERT</fixed-case> for Event Detection @@ -63,6 +65,7 @@ 2024.case-1.4 2024.case-1.4.SupplementaryMaterial.txt tanev-2024-leveraging + Socio-political Events of Conflict and Unrest: A Survey of Available Datasets @@ -75,6 +78,7 @@ 2024.case-1.5 2024.case-1.5.SupplementaryMaterial.txt olsen-etal-2024-socio + Evaluating <fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case>’s Ability to Detect Hate Speech in <fixed-case>T</fixed-case>urkish Tweets @@ -85,6 +89,7 @@ 2024.case-1.6 2024.case-1.6.SupplementaryMaterial.txt dehghan-yanikoglu-2024-evaluating + <fixed-case>YY</fixed-case>ama@Multimodal Hate Speech Event Detection 2024: Simpler Prompts, Better Results - Enhancing Zero-shot Detection with a Large Multimodal Model @@ -94,6 +99,7 @@ 2024.case-1.7 2024.case-1.7.SupplementaryMaterial.txt yamagishi-2024-yyama + <fixed-case>RACAI</fixed-case> at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Improving Detection of Hate Speech by Extending <fixed-case>LLM</fixed-case> Predictions with Handcrafted Features @@ -103,6 +109,7 @@ 2024.case-1.8 2024.case-1.8.SupplementaryMaterial.txt pais-2024-racai + <fixed-case>CLTL</fixed-case>@Multimodal Hate Speech Event Detection 2024: The Winning Approach to Detecting Multimodal Hate Speech and Its Targets @@ -113,6 +120,7 @@ 2024.case-1.9 2024.case-1.9.SupplementaryMaterial.txt wang-markov-2024-cltl + <fixed-case>HAM</fixed-case>i<fixed-case>S</fixed-case>o<fixed-case>N</fixed-case>-Generative at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Stance Detection using generative large language models @@ -123,6 +131,7 @@ 2024.case-1.10 2024.case-1.10.SupplementaryMaterial.txt fraile-hernandez-penas-2024-hamison + <fixed-case>JRC</fixed-case> at 
<fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Lexicon-based Detection of Hate Speech @@ -142,6 +151,7 @@ 2024.case-1.12 2024.case-1.12.SupplementaryMaterial.txt rodriguez-garcia-centeno-2024-hamison + <fixed-case>NLPD</fixed-case>ame at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Mistral Sequence Classification with <fixed-case>PEFT</fixed-case> for Hate Speech, Targets and Stance Event Detection @@ -151,6 +161,7 @@ 2024.case-1.13 2024.case-1.13.SupplementaryMaterial.txt christodoulou-2024-nlpdame + <fixed-case>AAST</fixed-case>-<fixed-case>NLP</fixed-case> at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Ensemble-Based Climate Activism Stance and Hate Speech Detection : Leveraging Pretrained Language Models @@ -161,6 +172,7 @@ 2024.case-1.14 2024.case-1.14.SupplementaryMaterial.txt el-sayed-nasr-2024-aast + <fixed-case>ARC</fixed-case>-<fixed-case>NLP</fixed-case> at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Stance and Hate Speech Detection by Generative and Encoder Models Optimized with Tweet-Specific Elements @@ -172,6 +184,7 @@ 2024.case-1.15 2024.case-1.15.SupplementaryMaterial.txt kaya-etal-2024-arc + <fixed-case>HAM</fixed-case>i<fixed-case>S</fixed-case>o<fixed-case>N</fixed-case>-Ensemble at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Ensemble of <fixed-case>R</fixed-case>o<fixed-case>BERT</fixed-case>a, Llama 2, and Multi-task for Stance Detection @@ -184,6 +197,7 @@ 2024.case-1.16 2024.case-1.16.SupplementaryMaterial.txt rodriguez-garcia-etal-2024-hamison + <fixed-case>M</fixed-case>ason<fixed-case>P</fixed-case>erplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles @@ -198,6 +212,7 @@ 2024.case-1.17 2024.case-1.17.SupplementaryMaterial.txt ganguly-etal-2024-masonperplexity + <fixed-case>M</fixed-case>ason<fixed-case>P</fixed-case>erplexity at 
<fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Integrating Advanced Ensemble Techniques and Data Augmentation for Climate Activism Stance and Hate Event Identification @@ -211,6 +226,7 @@ 2024.case-1.18 2024.case-1.18.SupplementaryMaterial.txt bin-emran-etal-2024-masonperplexity + <fixed-case>AAST</fixed-case>-<fixed-case>NLP</fixed-case> at Multimodal Hate Speech Event Detection 2024 : A Multimodal Approach for Classification of Text-Embedded Images Based on <fixed-case>CLIP</fixed-case> and <fixed-case>BERT</fixed-case>-Based Models. @@ -221,6 +237,7 @@ 2024.case-1.19 2024.case-1.19.SupplementaryMaterial.txt el-sayed-nasr-2024-aast-nlp + <fixed-case>CUET</fixed-case>_<fixed-case>B</fixed-case>inary_<fixed-case>H</fixed-case>ackers at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: A Comprehensive Evaluation and Superior Performance of Transformer-Based Models in Hate Speech Event Detection and Stance Classification for Climate Activism @@ -232,6 +249,7 @@ 2024.case-1.20 2024.case-1.20.SupplementaryMaterial.txt farsi-etal-2024-cuet + <fixed-case>HAM</fixed-case>i<fixed-case>S</fixed-case>o<fixed-case>N</fixed-case>-baselines at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: A Study on the Use of External Data for Hate Speech and Stance Detection @@ -252,6 +270,7 @@ 2024.case-1.22 2024.case-1.22.SupplementaryMaterial.txt narayan-biswal-2024-z + Bryndza at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented <fixed-case>GPT</fixed-case>-4 and <fixed-case>LL</fixed-case>a<fixed-case>MA</fixed-case> @@ -266,6 +285,7 @@ 2024.case-1.23 2024.case-1.23.SupplementaryMaterial.txt suppa-etal-2024-bryndza + <fixed-case>IUST</fixed-case> at <fixed-case>C</fixed-case>limate<fixed-case>A</fixed-case>ctivism 2024: Towards Optimal Stance Detection: A Systematic Study of Architectural Choices and Data Cleaning Techniques @@ 
-276,6 +296,7 @@ 2024.case-1.24 2024.case-1.24.SupplementaryMaterial.txt mahmoudi-eetemadi-2024-iust + <fixed-case>VRLL</fixed-case>ab at <fixed-case>HSD</fixed-case>-2<fixed-case>L</fixed-case>ang 2024: <fixed-case>T</fixed-case>urkish Hate Speech Detection Online with <fixed-case>T</fixed-case>urkish<fixed-case>BERT</fixed-case>weet @@ -286,6 +307,7 @@ 2024.case-1.25 2024.case-1.25.SupplementaryMaterial.txt najafi-varol-2024-vrllab + Transformers at <fixed-case>HSD</fixed-case>-2<fixed-case>L</fixed-case>ang 2024: Hate Speech Detection in <fixed-case>A</fixed-case>rabic and <fixed-case>T</fixed-case>urkish Tweets Using <fixed-case>BERT</fixed-case> Based Architectures @@ -296,6 +318,7 @@ 2024.case-1.26 2024.case-1.26.SupplementaryMaterial.txt singhal-bedi-2024-transformers + <fixed-case>R</fixed-case>e<fixed-case>BERT</fixed-case> at <fixed-case>HSD</fixed-case>-2<fixed-case>L</fixed-case>ang 2024: Fine-Tuning <fixed-case>BERT</fixed-case> with <fixed-case>A</fixed-case>dam<fixed-case>W</fixed-case> for Hate Speech Detection in <fixed-case>A</fixed-case>rabic and <fixed-case>T</fixed-case>urkish @@ -307,6 +330,7 @@ 2024.case-1.27 2024.case-1.27.SupplementaryMaterial.txt yagci-etal-2024-rebert + <fixed-case>D</fixed-case>etective<fixed-case>R</fixed-case>e<fixed-case>DAS</fixed-case>ers at <fixed-case>HSD</fixed-case>-2<fixed-case>L</fixed-case>ang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages @@ -318,6 +342,7 @@ 2024.case-1.28 2024.case-1.28.SupplementaryMaterial.txt qachfar-etal-2024-detectiveredasers + Detecting Hate Speech in <fixed-case>T</fixed-case>urkish Print Media: A Corpus and A Hybrid Approach with Target-oriented Linguistic Knowledge @@ -333,6 +358,7 @@ 2024.case-1.29 2024.case-1.29.SupplementaryMaterial.txt uludogan-etal-2024-detecting + Team Curie at <fixed-case>HSD</fixed-case>-2<fixed-case>L</fixed-case>ang 2024: Hate Speech Detection in 
<fixed-case>T</fixed-case>urkish and <fixed-case>A</fixed-case>rabic Tweets using <fixed-case>BERT</fixed-case>-based models @@ -361,6 +387,7 @@ 2024.case-1.31 2024.case-1.31.SupplementaryMaterial.txt thapa-etal-2024-extended + Overview of the Hate Speech Detection in <fixed-case>T</fixed-case>urkish and <fixed-case>A</fixed-case>rabic Tweets (<fixed-case>HSD</fixed-case>-2<fixed-case>L</fixed-case>ang) Shared Task at <fixed-case>CASE</fixed-case> 2024 @@ -375,6 +402,7 @@ 2024.case-1.32 2024.case-1.32.SupplementaryMaterial.txt uludogan-etal-2024-overview + Stance and Hate Event Detection in Tweets Related to Climate Activism - Shared Task at <fixed-case>CASE</fixed-case> 2024 @@ -392,6 +420,7 @@ 2024.case-1.33 2024.case-1.33.SupplementaryMaterial.txt thapa-etal-2024-stance + A Concise Report of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text diff --git a/data/xml/2024.cawl.xml b/data/xml/2024.cawl.xml new file mode 100644 index 0000000000..17a00f737b --- /dev/null +++ b/data/xml/2024.cawl.xml @@ -0,0 +1,99 @@ + + + + + Proceedings of the Second Workshop on Computation and Written Language (CAWL) @ LREC-COLING 2024 + KyleGorman + EmilyPrud'hommeaux + BrianRoark + RichardSproat + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.cawl-1 + cawl + ws + + + 2024.cawl-1.0 + cawl-2024-computation + + + <fixed-case>P</fixed-case>ars<fixed-case>T</fixed-case>ext: A Digraphic Corpus for <fixed-case>T</fixed-case>ajik-<fixed-case>F</fixed-case>arsi Transliteration + RayyanMerchant + KevinTang + 1–7 + Despite speaking dialects of the same language, Persian speakers from Tajikistan cannot read Persian texts from Iran and Afghanistan. This is due to the fact that Tajik Persian is written in the Tajik-Cyrillic script, while Iranian and Afghan Persian are written in the Perso-Arabic script. As the formal registers of these dialects all maintain high levels of mutual intelligibility with each other, machine transliteration has been proposed as a more practical and appropriate solution than machine translation. Unfortunately, Persian texts written in both scripts are much more common in print in Tajikistan than online. This paper introduces a novel corpus meant to remedy that gap: ParsText. ParsText contains 2,813 Persian sentences written in both Tajik-Cyrillic and Perso-Arabic manually collected from blog pages and news articles online. This paper presents the need for such a corpus, previous and related work, data collection and alignment procedures, corpus statistics, and discusses directions for future work. + 2024.cawl-1.1 + merchant-tang-2024-parstext + + + A Joint Approach for Automatic Analysis of Reading and Writing Errors + WiekeHarmsen + CatiaCucchiarini + Roelandvan Hout + HelmerStrik + 8–17 + Analyzing the errors that children make on their ways to becoming fluent readers and writers can provide invaluable scientific insights into the processes that underlie literacy acquisition. To this end, we present in this paper an extension of an earlier developed spelling error detection and classification algorithm for Dutch, so that reading errors can also be automatically detected from their phonetic transcription. 
The strength of this algorithm lies in its ability to detect errors at Phoneme-Corresponding Unit (PCU) level, where a PCU is a sequence of letters corresponding to one phoneme. We validated this algorithm and found good agreement between manual and automatic reading error classifications. We also used the algorithm to analyze written words by second graders and phonetic transcriptions of read words by first graders. With respect to the writing data, we found that the PCUs ‘ei’, ‘eu’, ‘g’, ‘ij’ and ‘ch’ were most frequently written incorrectly; for the reading data, these were the PCUs ‘v’, ‘ui’, ‘ng’, ‘a’ and ‘g’. This study presents a first attempt at developing a joint method for detecting reading and writing errors. In future research this algorithm can be used to analyze corpora containing reading and writing data from the same children. + 2024.cawl-1.2 + 2024.cawl-1.2.OptionalSupplementaryMaterial.zip + harmsen-etal-2024-joint + + + Tool for Constructing a Large-Scale Corpus of Code Comments and Other Source Code Annotations + LunaPeck + SusanBrown + 18–22 + The sublanguage of source code annotations—explanatory natural language writing that accompanies programming source code—is little-studied in linguistics. To facilitate research into this domain, we have developed a program prototype that can extract code comments and changelogs (i.e. commit messages) from public, open-source code repositories, with automatic tokenization and part-of-speech tagging on the extracted text. The program can also automatically detect and discard “commented-out” source code in data from Python repositories, to prevent it from polluting the corpus, demonstrating that such sanitization is likely feasible for other programming languages as well. With the current tool, we have produced a 6-million word corpus of English-language comments extracted from three different programming languages: Python, C, and C++. 
+ 2024.cawl-1.3 + peck-brown-2024-tool + + + Tokenization via Language Modeling: the Role of Preceding Text + RastislavHronsky + EmmanuelKeuleers + 23–35 + While language models benefit immensely from their capacity to model large context (i.e., sequence of preceding tokens), the role of context is unclear in text tokenization, which is, in many cases, language model-driven to begin with. In this paper, we attempt to explore the role in three different writing systems and using three different text tokenization strategies (word-based, Morfessor, and BPE). In the first experiment, we examined how the size of context used for predicting the next token affects the ranking of the segmentation strategies i.t.o. language model surprisal. This effect was very writing system specific: minimal in case of English, and rank-reversing due to increased context size and token granularity in case of Turkish and Chinese. In the second experiment, we examined how context alters segmentation hypotheses when using language models to identify word boundaries. In this case, the effect was subtle: using context-aware, rather than context-free segment scores improved boundary recognition accuracy by up to 0.5%, once baseline effects were exploited. + 2024.cawl-1.4 + hronsky-keuleers-2024-tokenization + + + Abbreviation Across the World’s Languages and Scripts + KyleGorman + BrianRoark + 36–42 + Detailed taxonomies for non-standard words, including abbreviations, have been developed for speech and language processing, though mostly with reference to English. In this paper, we examine abbreviation formation strategies in a diverse sample of more than 50 languages, dialects and scripts. 
The resulting taxonomy—and data about which strategies are attested in which languages—provides key information needed to create multilingual systems for abbreviation expansion, an essential component for speech processing and text understanding. + 2024.cawl-1.5 + gorman-roark-2024-abbreviation + + + Now You See Me, Now You Don’t: ‘Poverty of the Stimulus’ Problems and Arbitrary Correspondences in End-to-End Speech Models + Daanvan Esch + 43–52 + End-to-end models for speech recognition and speech synthesis have many benefits, but we argue they also face a unique set of challenges not encountered in conventional multi-stage hybrid systems, which relied on the explicit injection of linguistic knowledge through resources such as phonemic dictionaries and verbalization grammars. These challenges include handling words with unusual grapheme-to-phoneme correspondences, converting between written forms like ‘12’ and spoken forms such as ‘twelve’, and contextual disambiguation of homophones or homographs. We describe the mitigation strategies that have been used for these problems in end-to-end systems, either implicitly or explicitly, and call out that the most commonly used mitigation techniques are likely incompatible with newly emerging approaches that use minimal amounts of supervised audio training data. We review best-of-both-world approaches that allow the use of end-to-end models combined with traditional linguistic resources, which we show are increasingly straightforward to create at scale, and close with an optimistic outlook for bringing speech technologies to many more languages by combining these strands of research. + 2024.cawl-1.6 + van-esch-2024-now + + + Towards Fast Cognate Alignment on Imbalanced Data + LoganBorn + M. WillisMonroe + KathrynKelley + AnoopSarkar + 53–58 + Cognate alignment models purport to enable decipherment, but their speed and need for clean data can make them unsuitable for realistic decipherment problems. 
We seek to draw attention to these shortcomings in the hopes that future work may avoid them, and we outline two techniques which begin to overcome the described problems. + 2024.cawl-1.7 + born-etal-2024-towards + + + Simplified <fixed-case>C</fixed-case>hinese Character Distance Based on Ideographic Description Sequences + YixiaWang + EmmanuelKeuleers + 59–66 + Character encoding systems have long overlooked the internal structure of characters. Ideographic Description Sequences, which explicitly represent spatial relations between character components, are a potential solution to this problem. In this paper, we illustrate the utility of Ideographic Description Sequences in computing edit distance and finding orthographic neighbors for Simplified Chinese characters. In addition, we explore the possibility of using Ideographic Description Sequences to encode spatial relations between components in other scripts. + 2024.cawl-1.8 + wang-keuleers-2024-simplified + +
+
diff --git a/data/xml/2024.cl.xml b/data/xml/2024.cl.xml new file mode 100644 index 0000000000..1ef12c5465 --- /dev/null +++ b/data/xml/2024.cl.xml @@ -0,0 +1,127 @@ + + + + + Computational Linguistics, Volume 50, Issue 1 - March 2024 + MIT Press +
Cambridge, MA
+ March + 2024 + cl + 50 + 1 + + + My Big, Fat 50-Year Journey + MarthaPalmer + 10.1162/coli_a_00499 + My most heartfelt thanks to ACL for this tremendous honor. I’m completely thrilled. I cannot tell you how surprised I was when I got Iryna’s email. It is amazing that my first ACL conference since 2019 in Florence includes this award. What a wonderful way to be back with all of my friends and family here at ACL. I’m going to tell you about my big fat 50-year journey. What have I been doing for the last 50 years? Well, finding meaning, quite literally in words. Or in other words, exploring how computational lexical semantics can support natural language understanding. This is going to be quick. Hold onto your hats, here we go. + 1–24 + 2024.cl-1.1 + palmer-2024-big + + + Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation + JianhuiPang + BaosongYang* + Derek FaiWong* + YuWan + DayihengLiu + Lidia SamChao + JunXie + 10.1162/coli_a_00496 + The utilization of monolingual data has been shown to be a promising strategy for addressing low-resource machine translation problems. Previous studies have demonstrated the effectiveness of techniques such as back-translation and self-supervised objectives, including masked language modeling, causal language modeling, and denoise autoencoding, in improving the performance of machine translation models. However, the manner in which these methods contribute to the success of machine translation tasks and how they can be effectively combined remains an under-researched area. In this study, we carry out a systematic investigation of the effects of these techniques on linguistic properties through the use of probing tasks, including source language comprehension, bilingual word alignment, and translation fluency. We further evaluate the impact of pre-training, back-translation, and multi-task learning on bitexts of varying sizes. 
Our findings inform the design of more effective pipelines for leveraging monolingual data in extremely low-resource and low-resource machine translation tasks. Experiment results show consistent performance gains in seven translation directions, which provide further support for our conclusions and understanding of the role of monolingual data in machine translation. + 25–47 + 2024.cl-1.2 + pang-etal-2024-rethinking + + + How Is a “Kitchen Chair” like a “Farm Horse”? Exploring the Representation of Noun-Noun Compound Semantics in Transformer-based Language Models + MarkOrmerod + Jesús Martínezdel Rincón + BarryDevereux + 10.1162/coli_a_00495 + Despite the success of Transformer-based language models in a wide variety of natural language processing tasks, our understanding of how these models process a given input in order to represent task-relevant information remains incomplete. In this work, we focus on semantic composition and examine how Transformer-based language models represent semantic information related to the meaning of English noun-noun compounds. We probe Transformer-based language models for their knowledge of the thematic relations that link the head nouns and modifier words of compounds (e.g., KITCHEN CHAIR: a chair located in a kitchen). Firstly, using a dataset featuring groups of compounds with shared lexical or semantic features, we find that token representations of six Transformer-based language models distinguish between pairs of compounds based on whether they use the same thematic relation. Secondly, we utilize fine-grained vector representations of compound semantics derived from human annotations, and find that token vectors from several models elicit a strong signal of the semantic relations used in the compounds. 
In a novel “compositional probe” setting, where we compare the semantic relation signal in mean-pooled token vectors of compounds to mean-pooled token vectors when the two constituent words appear in separate sentences, we find that the Transformer-based language models that best represent the semantics of noun-noun compounds also do so substantially better than in the control condition where the two constituent words are processed separately. Overall, our results shed light on the ability of Transformer-based language models to support compositional semantic processes in representing the meaning of noun-noun compounds. + 49–81 + 2024.cl-1.3 + ormerod-etal-2024-kitchen + + + Universal Generation for <fixed-case>O</fixed-case>ptimality <fixed-case>T</fixed-case>heory Is <fixed-case>PSPACE</fixed-case>-Complete + SophieHao + 10.1162/coli_a_00494 + This article shows that the universal generation problem for Optimality Theory (OT) is PSPACE-complete. While prior work has shown that universal generation is at least NP-hard and at most EXPSPACE-hard, our results place universal generation in between those two classes, assuming that NP ≠ PSPACE. We additionally show that when the number of constraints is bounded in advance, universal generation is at least NL-hard and at most NP^NP-hard. Our proofs rely on a close connection between OT and the intersection non-emptiness problem for finite automata, which is PSPACE-complete in general and NL-complete when the number of automata is bounded. Our analysis shows that constraint interaction is the main contributor to the complexity of OT: The ability to factor transformations into simple, interacting constraints allows OT to furnish compact descriptions of intricate phonological phenomena. 
+ 83–117 + 2024.cl-1.4 + hao-2024-universal + + + Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering + AkshayChaturvedi + SwarnadeepBhar + SoumadeepSaha + UtpalGarain + NicholasAsher + 10.1162/coli_a_00493 + Transformer-based language models have been shown to be highly effective for several NLP tasks. In this article, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model’s inferences in question answering. We then test this notion by observing a model’s behavior on answering questions about a story after performing two novel semantic interventions—deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions for a significant number of cases (∼ 50% for deletion intervention, and ∼ 20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects for deletion intervention by a significant margin (from ∼ 50% to ∼ 6%). We analyze the inner-workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. But we show that this training does not attenuate other aspects of semantic unfaithfulness such as the models’ inability to deal with negation intervention or to capture the predicate–argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate–argument structure. 
While InstructGPT models do achieve very high performance on predicate–argument structure task, they fail to respond adequately to our deletion and negation interventions. + 119–155 + 2024.cl-1.5 + chaturvedi-etal-2024-analyzing + + + On the Role of Morphological Information for Contextual Lemmatization + OliaToporkov + RodrigoAgerri + 10.1162/coli_a_00497 + Lemmatization is a natural language processing (NLP) task that consists of producing, from a given inflected word, its canonical form or lemma. Lemmatization is one of the basic tasks that facilitate downstream NLP applications, and is of particular importance for high-inflected languages. Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is the optimum in terms of downstream performance. In order to address this issue, in this article we empirically investigate the role of morphological information to develop contextual lemmatizers in six languages within a varied spectrum of morphological complexity: Basque, Turkish, Russian, Czech, Spanish, and English. Furthermore, and unlike the vast majority of previous work, we also evaluate lemmatizers in out-of-domain settings, which constitutes, after all, their most common application use. The results of our study are rather surprising. It turns out that providing lemmatizers with fine-grained morphological features during training is not that beneficial, not even for agglutinative languages. In fact, modern contextual word representations seem to implicitly encode enough morphological information to obtain competitive contextual lemmatizers without seeing any explicit morphological signal. 
Moreover, our experiments suggest that the best lemmatizers out-of-domain are those using simple UPOS tags or those trained without morphology and, lastly, that current evaluation practices for lemmatization are not adequate to clearly discriminate between models. + 157–191 + 2024.cl-1.6 + toporkov-agerri-2024-role + + + Stance Detection with Explanations + Rudra RanajeeSaha + Laks V. S.Lakshmanan + Raymond T.Ng + 10.1162/coli_a_00501 + Identification of stance has recently gained a lot of attention with the extreme growth of fake news and filter bubbles. Over the last decade, many feature-based and deep-learning approaches have been proposed to solve stance detection. However, almost none of the existing works focus on providing a meaningful explanation for their prediction. In this work, we study stance detection with an emphasis on generating explanations for the predicted stance by capturing the pivotal argumentative structure embedded in a document. We propose to build a stance tree that utilizes rhetorical parsing to construct an evidence tree and to use Dempster Shafer Theory to aggregate the evidence. Human studies show that our unsupervised technique of generating stance explanations outperforms the SOTA extractive summarization method in terms of informativeness, non-redundancy, coverage, and overall quality. Furthermore, experiments show that our explanation-based stance prediction excels or matches the performance of the SOTA model on various benchmark datasets. + 193–235 + 2024.cl-1.7 + saha-etal-2024-stance + + + Can Large Language Models Transform Computational Social Science? + CalebZiems + WilliamHeld + OmarShaikh + JiaaoChen + ZhehaoZhang + DiyiYang + 10.1162/coli_a_00502 + Large language models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). 
If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the computational social science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation pipeline to measure the zero-shot performance of 13 language models on 25 representative English CSS benchmarks. On taxonomic labeling tasks (classification), LLMs fail to outperform the best fine-tuned models but still achieve fair levels of agreement with humans. On free-form coding tasks (generation), LLMs produce explanations that often exceed the quality of crowdworkers’ gold references. We conclude that the performance of today’s LLMs can augment the CSS research pipeline in two ways: (1) serving as zero-shot data annotators on human annotation teams, and (2) bootstrapping challenging creative generation tasks (e.g., explaining the underlying attributes of a text). In summary, LLMs are poised to meaningfully participate in social science analysis in partnership with humans. + 237–291 + 2024.cl-1.8 + ziems-etal-2024-large + + + Language Model Behavior: A Comprehensive Survey + Tyler A.Chang + Benjamin K.Bergen + 10.1162/coli_a_00492 + Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to unfactual responses, commonsense errors, memorized text, and social biases. 
Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about large language model capabilities, thus providing a resource for applied work and for research in adjacent fields that use language models. + 293–350 + 2024.cl-1.9 + chang-bergen-2024-language + + + <fixed-case>P</fixed-case>olysemy—<fixed-case>E</fixed-case>vidence from Linguistics, Behavioral Science, and Contextualized Language Models + JanoschHaber + MassimoPoesio + 10.1162/coli_a_00500 + Polysemy is the type of lexical ambiguity where a word has multiple distinct but related interpretations. In the past decade, it has been the subject of a great many studies across multiple disciplines including linguistics, psychology, neuroscience, and computational linguistics, which have made it increasingly clear that the complexity of polysemy precludes simple, universal answers, especially concerning the representation and processing of polysemous words. But fuelled by the growing availability of large, crowdsourced datasets providing substantial empirical evidence; improved behavioral methodology; and the development of contextualized language models capable of encoding the fine-grained meaning of a word within a given context, the literature on polysemy recently has developed more complex theoretical analyses. In this survey we discuss these recent contributions to the investigation of polysemy against the backdrop of a long legacy of research across multiple decades and disciplines. Our aim is to bring together different perspectives to achieve a more complete picture of the heterogeneity and complexity of the phenomenon of polysemy. Specifically, we highlight evidence supporting a range of hybrid models of the mental processing of polysemes. 
These hybrid models combine elements from different previous theoretical approaches to explain patterns and idiosyncrasies in the processing of polysemous words that the best known models so far have failed to account for. Our literature review finds that (i) traditional analyses of polysemy can be limited in their generalizability by loose definitions and selective materials; (ii) linguistic tests provide useful evidence on individual cases, but fail to capture the full range of factors involved in the processing of polysemous sense extensions; and (iii) recent behavioral (psycho) linguistics studies, large-scale annotation efforts, and investigations leveraging contextualized language models provide accumulating evidence suggesting that polysemous sense similarity covers a wide spectrum between identity of sense and homonymy-like unrelatedness of meaning. We hope that the interdisciplinary account of polysemy provided in this survey inspires further fundamental research on the nature of polysemy and better equips applied research to deal with the complexity surrounding the phenomenon, for example, by enabling the development of benchmarks and testing paradigms for large language models informed by a greater portion of the rich evidence on the phenomenon currently available. + 351–417 + 2024.cl-1.10 + haber-poesio-2024-polysemy + +
+
diff --git a/data/xml/2024.cl4health.xml b/data/xml/2024.cl4health.xml new file mode 100644 index 0000000000..a80a42db68 --- /dev/null +++ b/data/xml/2024.cl4health.xml @@ -0,0 +1,414 @@ + + + + + Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024 + DinaDemner-Fushman + SophiaAnaniadou + PaulThompson + BrianOndov + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.cl4health-1 + cl4health + ws + + + 2024.cl4health-1.0 + cl4health-2024-patient + + + Improving Sign Language Production in the Healthcare Domain Using <fixed-case>UMLS</fixed-case> and Multi-task Learning + Jonathan DavidMutal + RaphaelRubino + PierretteBouillon + BastienDavid + JohannaGerlach + IreneStrasly + 1–7 + This paper presents a study on Swiss-French sign language production in the medical domain. In emergency care settings, a lack of clear communication can interfere with accurate delivery of health related services. For patients communicating with sign language, equal access to healthcare remains an issue. While previous work has explored producing sign language gloss from a source text, we propose to extend this approach to produce a multichannel sign language output given a written French input. Furthermore, we extend our approach with a multi-task framework allowing us to include the Unified Medical Language System (UMLS) in our model. Results show that the introduction of UMLS in the training data improves model accuracy by 13.64 points. + 2024.cl4health-1.1 + mutal-etal-2024-improving + + + It’s Difficult to Be Neutral – Human and <fixed-case>LLM</fixed-case>-based Sentiment Annotation of Patient Comments + PetterMæhlum + DavidSamuel + Rebecka MariaNorman + ElmaJelin + Øyvind AndresenBjertnæs + LiljaØvrelid + ErikVelldal + 8–19 + Sentiment analysis is an important tool for aggregating patient voices, in order to provide targeted improvements in healthcare services. A prerequisite for this is the availability of in-domain data annotated for sentiment. This article documents an effort to add sentiment annotations to free-text comments in patient surveys collected by the Norwegian Institute of Public Health (NIPH). However, annotation can be a time-consuming and resource-intensive process, particularly when it requires domain expertise. 
We therefore also evaluate a possible alternative to human annotation, using large language models (LLMs) as annotators. We perform an extensive evaluation of the approach for two openly available pretrained LLMs for Norwegian, experimenting with different configurations of prompts and in-context learning, comparing their performance to human annotators. We find that even for zero-shot runs, models perform well above the baseline for binary sentiment, but still cannot compete with human annotators on the full dataset. + 2024.cl4health-1.2 + maehlum-etal-2024-difficult + + + Simulating Diverse Patient Populations Using Patient Vignettes and Large Language Models + DanielReichenpfader + KerstinDenecke + 20–25 + Ensuring equitable access to digital therapeutics (DTx) is essential to avoid healthcare inequalities in an era of increasing digitization. This requires DTx to be tested with users from diverse populations, which is often not realistic due to time and resource constraints. In this paper, we propose the use of large language models (LLMs) to simulate diverse patients. Specifically, we manually create a patient vignette that characterizes a specific population group. Variations of this vignette are used for role-prompting a commercial LLM, GPT-4, instructing the LLM to take on the role described in the patient vignette and act accordingly. We investigate if the LLM stays in its given role. To do this, we simulate a medical anamnesis interview with the role-prompted LLM and analyze its responses for compliance, coherence, correctness, containment, and clarification. Our results show that GPT-4 generates compliant, coherent and clinically valid responses, including information that is not explicitly stated in the provided patient vignette. 
+ 2024.cl4health-1.3 + reichenpfader-denecke-2024-simulating + + + Annotating Emotions in Acquired Brain Injury Patients’ Narratives + SaloméKlein + AmaliaTodirascu + HélèneVassiliadou + MarieKuppelin + JoffreyBecart + ThalassioBriand + ClaraCoridon + FrancineGerhard-Krait + JoéLaroche + JeanUlrich + AgataKrasny-Pacini + 26–36 + In this article, we aim to measure the patients’ progress in recognizing and naming emotions by capturing a variety of phenomena that express emotion in discourse. To do so, we introduce an emotion annotation scheme adapted for Acquired Brain Injury (ABI) patients’ narratives. We draw on recent research outcomes in line with linguistic and psychological theories of emotion in the development of French resources for Natural Language Processing (NLP). From this perspective and following Battistelli et al. (2022) guidelines, our protocol considers several means of expressing emotions, including prototypical expressions as well as implicit means. Its originality lies in the methodology adopted for its creation, as we combined, adapted, and tested several previous annotation schemes to create a tool tailored to our spoken clinical French corpus and its unique characteristics and challenges. + 2024.cl4health-1.4 + klein-etal-2024-annotating + + + Structuring Clinical Notes of <fixed-case>I</fixed-case>talian <fixed-case>ST</fixed-case>-elevation Myocardial Infarction Patients + VittorioTorri + SaraMazzucato + StefanoDalmiani + UmbertoParadossi + ClaudioPassino + SaraMoccia + SilvestroMicera + FrancescaIeva + 37–43 + In recent years, it has become common for patients to get full access to their Electronic Health Records (EHRs), thanks to the advancements in the EHRs systems of many healthcare providers. While this access empowers patients and doctors with comprehensive and real-time health information, it also introduces new challenges, in particular due to the unstructured nature of much of the information within EHRs.
To address this, we propose a pipeline to structure clinical notes, providing them with a clear and concise overview of their health data and its longitudinal evolution, also allowing clinicians to focus more on patient care during consultations. In this paper, we present preliminary results on extracting structured information from anamneses of patients diagnosed with ST-Elevation Myocardial Infarction from an Italian hospital. Our pipeline exploits text classification models to extract relevant clinical variables, comparing rule-based, recurrent neural network and BERT-based models. While various approaches utilized ontologies or knowledge graphs for Italian data, our work represents the first attempt to develop this type of pipeline. The results for the extraction of most variables are satisfactory (f1-score > 0.80), with the exception of the most rare values of certain variables, for which we propose future research directions to investigate. + 2024.cl4health-1.5 + torri-etal-2024-structuring + + + Towards <fixed-case>AI</fixed-case>-supported Health Communication in Plain Language: Evaluating Intralingual Machine Translation of Medical Texts + SilvanaDeilen + EkaterinaLapshinova-Koltunski + SergioHernández Garrido + ChristianeMaaß + JulianHörner + VanessaTheel + SophieZiemer + 44–53 + In this paper, we describe results of a study on evaluation of intralingual machine translation. The study focuses on machine translations of medical texts into Plain German. The automatically simplified texts were compared with manually simplified texts (i.e., simplified by human experts) as well as with the underlying, unsimplified source texts. We analyse the quality of the translations based on different criteria, such as correctness, readability, and syntactic complexity. The study revealed that the machine translations were easier to read than the source texts, but contained a higher number of complex syntactic relations than the human translations. 
Furthermore, we identified various types of mistakes. These included not only grammatical mistakes but also content-related mistakes that resulted, for example, from mistranslations of grammatical structures, ambiguous words or numbers, omissions of relevant prefixes or negation, and incorrect explanations of technical terms. + 2024.cl4health-1.6 + deilen-etal-2024-towards + + + Large Language Models as Drug Information Providers for Patients + LucaGiordano + Maria Piadi Buono + 54–63 + Recently, a significant interest has arisen about the application of Large Language Models (LLMs) in medical settings to enhance various aspects of healthcare. Particularly, the application of such models to improve knowledge access for both clinicians and patients seems very promising but still far from perfect. In this paper, we present a preliminary evaluation of LLMs as drug information providers to support patients in drug administration. We focus on posology, namely dosage quantity and prescription, contraindications and adverse drug reactions and run an experiment on the Italian language to assess both the trustworthiness of the outputs and their readability. The results show that different types of errors affect the LLM answers. In some cases, the model does not recognize the drug name, due to the presence of synonymous words, or it provides untrustworthy information, caused by intrinsic hallucinations. Overall, the complexity of the language is lower and this could contribute to making medical information more accessible to lay people. + 2024.cl4health-1.7 + giordano-di-buono-2024-large + + + Towards Generation of Personalised Health Intervention Messages + ClaraWan Ching Ho + VolhaPetukhova + 64–72 + Self-care is essential in managing chronic diseases when patients cannot always be monitored by medical staff. It therefore fills in the gap to provide patients with advice in improving their conditions in day-to-day practices.
However, the effectiveness of self-interventions in encouraging healthy behaviour is limited, as they are often delivered in the same manner for patients regardless of their demographics, personality and individual preferences. In this paper, we propose strategies to generate personalized health intervention messages departing from assumptions made by theories of social cognition and learning, planned behaviour and information processing. The main task is then defined as a personalised argument generation task. Specifically, an existing well-performing Natural Language Generation (NLG) pipeline model is extended to modulate linguistic features by ranking texts generated based on individuals’ predicted preferences for persuasive messages. Results show that the model is capable of generating diverse intervention messages while preserving the original intended meaning. The modulated interventions were approved by human evaluators as being more understandable and maintaining the same level of convincingness as human-written texts. However, the generated personalised interventions did not show significant improvements in the power to change health-related attitudes and/or behaviour compared to their non-personalised counterparts. This is attributed to the fact that human data collected for the model’s training was rather limited in size and variation. + 2024.cl4health-1.8 + wan-ching-ho-petukhova-2024-towards + + + Analysing Emotions in Cancer Narratives: A Corpus-Driven Approach + Daisy MonikaLal + PaulRayson + Sheila A.Payne + YufengLiu + 73–83 + Cancer not only affects a patient’s physical health, but it can also elicit a wide spectrum of intense emotions in patients, friends, and family members. People with cancer and their carers (family member, partner, or friend) are increasingly turning to the web for information and support.
Despite the expansion of sentiment analysis in the context of social media and healthcare, there is relatively less research on patient narratives, which are longer, more complex texts, and difficult to assess. In this exploratory work, we examine how patients and carers express their feelings about various aspects of cancer (treatments and stages). The objective of this paper is to illustrate with examples the nature of language in the clinical domain, as well as the complexities of language when performing automatic sentiment and emotion analysis. We perform a linguistic analysis of a corpus of cancer narratives collected from Reddit. We examine the performance of five state-of-the-art models (T5, DistilBERT, Roberta, RobertaGo, and NRCLex) to see how well they match with human comparisons separated by linguistic and medical background. The corpus yielded several surprising results that could be useful to sentiment analysis NLP experts. The linguistic issues encountered were classified into four categories: statements expressing a variety of emotions, ambiguous or conflicting statements with contradictory emotions, statements requiring additional context, and statements in which sentiment and emotions can be inferred but are not explicitly mentioned. + 2024.cl4health-1.9 + lal-etal-2024-analysing + + + Study of Medical Text Reading and Comprehension through Eye-Tracking Fixations + OksanaIvchenko + NataliaGrabar + 84–92 + Reading plays a crucial role in cognitive processes, acting as the primary way in which people access and assimilate information. However, the ability to effectively comprehend and understand text is significantly influenced by various factors related to people and text types. We propose to study the reading easiness and comprehension of texts through the eye-tracking technology, which tracks gaze and records eye movement during reading. 
We concentrate on the study of eye-tracking measures related to fixations (average duration of fixations and number of fixations). The experiments are performed on several types of texts (clinical cases, encyclopedia articles related to the medical area, general-language texts, and simplified clinical cases). Eye-tracking measures are analysed quantitatively and qualitatively to draw the reading patterns and analyse how the reading differs across the text types. + 2024.cl4health-1.10 + ivchenko-grabar-2024-study + + + A Neuro-Symbolic Approach to Monitoring Salt Content in Food + AnujaTayal + BarbaraDi Eugenio + DevikaSalunke + Andrew D.Boyd + Carolyn A.Dickens + Eulalia P.Abril + OlgaGarcia-Bedoya + Paula G.Allen-Meares + 93–103 + We propose a dialogue system that enables heart failure patients to inquire about salt content in foods and help them monitor and reduce salt intake. Addressing the lack of specific datasets for food-based salt content inquiries, we develop a template-based conversational dataset. The dataset is structured to ask clarification questions to identify food items and their salt content. Our findings indicate that while fine-tuning transformer-based models on the dataset yields limited performance, the integration of Neuro-Symbolic Rules significantly enhances the system’s performance. Our experiments show that by integrating neuro-symbolic rules, our system achieves an improvement in joint goal accuracy of over 20% across different data sizes compared to naively fine-tuning transformer-based models. + 2024.cl4health-1.11 + tayal-etal-2024-neuro + + + On Simplification of Discharge Summaries in <fixed-case>S</fixed-case>erbian: Facing the Challenges + AnđelkaZečević + MilicaĆulafić + StefanStojković + 104–108 + The simplified information page (SIP) is a simplified discharge summary created to mitigate health risks caused by low medical comprehension. 
One of the most critical aspects of medical comprehension concerns interpreting medication instructions such as proper dosing, frequency, and duration. In our work, we examine the capacities of mainstream Large Language Models (LLMs) such as ChatGPT and Gemini to generate SIP-like medication-oriented pages based on the provided discharge summaries. We are sharing the initial qualitative assessments of our study based on a small collection of discharge summaries in Serbian, pointing to noticed inaccuracies, unfaithful content, and language quality. Hopefully, these findings might be helpful in addressing the multilingual perspective of patient-oriented language. + 2024.cl4health-1.12 + zecevic-etal-2024-simplification + + + Medical-<fixed-case>FLAVORS</fixed-case>: A Figurative Language and Vocabulary Open Repository for <fixed-case>S</fixed-case>panish in the Medical Domain + LuciaPitarch + EmmaAngles-Herrero + YufengLiu + Daisy MonikaLal + JorgeGracia + PaulRayson + JudithRietjens + 109–114 + Metaphors shape the way we think by enabling the expression of one concept in terms of another one. For instance, cancer can be understood as a place from which one can go in and out, as a journey that one can traverse, or as a battle. Giving patients awareness of the way they refer to cancer and different narratives in which they can reframe it has been proven to be a key aspect when experiencing the disease. In this work, we propose a preliminary identification and representation of Spanish cancer metaphors using MIP (Metaphor Identification Procedure) and MetaNet. The created resource is the first openly available dataset for medical metaphors in Spanish. Thus, in the future, we expect to use it as the gold standard in automatic metaphor processing tasks, which will also serve to further populate the resource and understand how cancer is experienced and narrated. 
+ 2024.cl4health-1.13 + pitarch-etal-2024-medical + + + Generating Synthetic Documents with Clinical Keywords: A Privacy-Sensitive Methodology + SimonMeoni + ÉricDe la Clergerie + ThéoRyffel + 115–123 + Electronic Health Records store valuable patient-staff interaction data. These notes, often unstructured to save healthcare personnel time, can be challenging to analyze manually. Proprietary online Large Language Models have demonstrated impressive results in analyzing EHR notes. However, Clinical NLP faces unique challenges due to the sensitive and specialized nature of the data. Sending patient information via external APIs poses privacy risks, and hospitals require customized NLP systems to align with their unique practices. To address these challenges, developing customized LLMs using specific training datasets is crucial. To address this, we propose generating synthetic training data using keywords extracted without confidential information. Furthermore, we introduce a reward mechanism that iteratively refines the quality of synthetic documents. This involves scoring synthetic candidates against real clinical reports using a semantic textual similarity score and performing an alignment step to align the model with its best-scored utterances. + 2024.cl4health-1.14 + meoni-etal-2024-generating + + + Building Certified Medical Chatbots: Overcoming Unstructured Data Limitations with Modular <fixed-case>RAG</fixed-case> + LeonardoSanna + PatrizioBellan + SimoneMagnolini + MarinaSegala + SabaGhanbari Haez + MonicaConsolandi + MauroDragoni + 124–130 + Creating a certified conversational agent poses several issues. The need to manage fine-grained information delivery and the necessity to provide reliable medical information requires a notable effort, especially in dataset preparation. In this paper, we investigate the challenges of building a certified medical chatbot in Italian that provides information about pregnancy and early childhood.
We show some negative initial results regarding the possibility of creating a certified conversational agent within the RASA framework starting from unstructured data. Finally, we propose a modular RAG model to implement a Large Language Model in a certified context, overcoming data limitations and enabling data collection on actual conversations. + 2024.cl4health-1.15 + sanna-etal-2024-building + + + Towards Using Automatically Enhanced Knowledge Graphs to Aid Temporal Relation Extraction + TimotejKnez + SlavkoŽitnik + 131–136 + Temporal relation extraction in medical document analysis is crucial for understanding patient histories and treatment outcomes. This paper introduces a novel approach leveraging a bimodal model integrating textual content and a knowledge graph, to enhance temporal relation extraction. The paper presents ongoing research in constructing an optimal knowledge graph by augmenting PrimeKG with dynamically expanded information using a language model-generated knowledge graph, and further personalize the information with patient-specific graphs tailored for relation prediction. The pipeline for constructing this enriched knowledge graph is detailed, aiming to improve the capabilities of temporal relation extraction models. The preliminary results show that adding a simple knowledge graph to the temporal relation extraction model can significantly increase the performance, achieving new state-of-the-art results. While the research in using enhanced knowledge graphs is still ongoing, this paper lays the groundwork for leveraging common knowledge to advance temporal relation extraction in medical contexts. This approach holds promise for enhancing the understanding of patient histories and treatment outcomes, potentially leading to improved healthcare decision-making and patient care. 
+ 2024.cl4health-1.16 + knez-zitnik-2024-towards + + + Experiments in Automated Generation of Discharge Summaries in <fixed-case>I</fixed-case>talian + LorenzoRuinelli + AmosColombo + MathildeRochat + Sotirios GeorgiosPopeskou + AndreaFranchini + SandraMitrović + Oscar WilliamLithgow + JosephCornelius + FabioRinaldi + 137–144 + Hospital discharge letters are a fundamental component of patient management, as they provide the crucial information needed for patient post-hospital care. However, their creation is very demanding and resource-intensive, as it requires consultation of several reports documenting the patient’s journey throughout their hospital stay. Given the increasing pressures on doctors’ time, tools that can draft a reasonable discharge summary, to be then reviewed and finalized by the experts, would be welcome. In this paper we present a comparative study exploring the possibility of automatic generation of discharge summaries within the context of a hospital in an Italian-speaking region and discuss quantitative and qualitative results. Despite some shortcomings, the obtained results show that a generic generative system such as ChatGPT is capable of producing discharge summaries which are relatively close to the human generated ones, even in Italian. + 2024.cl4health-1.17 + ruinelli-etal-2024-experiments + + + Evaluating <fixed-case>LLM</fixed-case>s for Temporal Entity Extraction from Pediatric Clinical Text in Rare Diseases Context + Judith JeyafreedaAndrew + MarcVincent + AnitaBurgun + NicolasGarcelon + 145–152 + The aim of this work is to extract Temporal Entities from patients’ EHR from a pediatric hospital specialising in Rare Diseases, thus allowing to create a patient timeline relative to diagnosis. We aim to perform an evaluation of NLP tools and Large Language Models (LLM) to test their application in the field of clinical study where data is limited and sensitive. We present a short annotation guideline for temporal entity identification.
We then use the tool EDS-NLP, the Language Model CamemBERT-with-Dates and the LLM Vicuna to extract temporal entities. We perform experiments using three different prompting techniques on the LLM Vicuna to evaluate the model thoroughly. We use a small dataset of 50 EHR describing the evolution of rare diseases in patients to perform our experiments. We show that among the different methods to prompt an LLM, using a decomposed structure of prompting method on the LLM Vicuna produces the best results for temporal entity recognition. The LLM learns from examples in the prompt and decomposing one prompt into several prompts allows the model to avoid confusion between the different entity types. Identifying the temporal entities in EHRs helps to build the timeline of a patient and to learn the evolution of a disease. This is specifically important in the case of rare diseases due to the availability of limited examples. In this paper, we show that this can be made possible with the use of Language Models and LLM in a secure environment, thus preserving the privacy of the patient. + 2024.cl4health-1.18 + andrew-etal-2024-evaluating + + + Generating Distributable Surrogate Corpus for Medical Multi-label Classification + SeijiShimizu + ShuntaroYada + ShokoWakamiya + EijiAramaki + 153–162 + In medical and social media domains, annotated corpora are often hard to distribute due to copyrights and privacy issues. To overcome this situation, we propose a new method to generate a surrogate corpus for a downstream task by using a text generation model. We chose a medical multi-label classification task, MedWeb, in which patient-generated short messages express multiple symptoms. We first fine-tuned text generation models with different prompting designs on the original corpus to obtain synthetic versions of that corpus.
To assess the viability of the generated corpora for the downstream task, we compared the performance of multi-label classification models trained either on the original or the surrogate corpora. The results and the error analysis showed the difficulty of generating a surrogate corpus in multi-label settings, suggesting text generation under complex conditions is not trivial. On the other hand, our experiment demonstrates that the corpus generated with sentinel-based prompting is comparatively viable in a single-label (multiclass) classification setting. + 2024.cl4health-1.19 + shimizu-etal-2024-generating + + + <fixed-case>C</fixed-case>lini<fixed-case>R</fixed-case>es: Publicly Available Mapping of Clinical Lexical Resources + ElenaZotova + MontseCuadros + GermanRigau + 163–172 + This paper presents a human-readable resource for mapping identifiers from various clinical knowledge bases. This resource is a version of UMLS Metathesaurus enriched with WordNet 3.0 and 3.1 synsets, Wikidata items with their clinical identifiers, SNOMED CT to ICD-10 mapping and Spanish ICD-10 codes description. The main goal of the presented resource is to provide semantic interoperability across the clinical concepts from various knowledge bases and facilitate its integration into mapping tools. As a side effect, the mapping enriches already annotated medical corpora for entity recognition or entity linking tasks with new labels. We experiment with the entity linking task, using a corpus annotated both manually and with the mapping method and demonstrate that a semi-automatic way of annotation may be used to create new labels. The resource is available in English and Spanish, although all languages of UMLS may be extracted. The new lexical resource is publicly available.
+ 2024.cl4health-1.20 + zotova-etal-2024-clinires + + + <fixed-case>M</fixed-case>ed<fixed-case>D</fixed-case>ialog-<fixed-case>FR</fixed-case>: A <fixed-case>F</fixed-case>rench Version of the <fixed-case>M</fixed-case>ed<fixed-case>D</fixed-case>ialog Corpus for Multi-label Classification and Response Generation Related to Women’s Intimate Health + XingyuLiu + VincentSegonne + AidanMannion + DidierSchwab + LorraineGoeuriot + FrançoisPortet + 173–183 + This article presents MedDialog-FR, a large publicly available corpus of French medical conversations for the medical domain. Motivated by the lack of French dialogue corpora for data-driven dialogue systems and the paucity of available information related to women’s intimate health, we introduce an annotated corpus of question-and-answer dialogues between a real patient and a real doctor concerning women’s intimate health. The corpus is composed of about 20,000 dialogues automatically translated from the English version of MedDialog-EN. The corpus test set is composed of 1,400 dialogues that have been manually post-edited and annotated with 22 categories from the UMLS ontology. We also fine-tuned state-of-the-art reference models to automatically perform multi-label classification and response generation to give an initial performance benchmark and highlight the difficulty of the tasks. + 2024.cl4health-1.21 + liu-etal-2024-meddialog + + + Exploring the Suitability of Transformer Models to Analyse Mental Health Peer Support Forum Data for a Realist Evaluation + MatthewCoole + PaulRayson + ZoeGlossop + FionaLobban + PaulMarshall + JohnVidler + 184–188 + Mental health peer support forums have become widely used in recent years. The emerging mental health crisis and the COVID-19 pandemic have meant that finding a place online for support and advice when dealing with mental health issues is more critical than ever. 
The need to examine, understand and find ways to improve the support provided by mental health forums is vital in the current climate. As part of this, we present our initial explorations in using modern transformer models to detect four key concepts (connectedness, lived experience, empathy and gratitude), which we believe are essential to understanding how people use mental health forums and will serve as a basis for testing more expansive realist theories about mental health forums in the future. As part of this work, we also replicate previously published results on empathy utilising an existing annotated dataset and test the other concepts on our manually annotated mental health forum posts dataset. These results serve as a basis for future research examining peer support forums. + 2024.cl4health-1.22 + coole-etal-2024-exploring + + + Revisiting the <fixed-case>MIMIC</fixed-case>-<fixed-case>IV</fixed-case> Benchmark: Experiments Using Language Models for Electronic Health Records + JesusLovon-Melgarejo + ThouriaBen-Haddi + JulesDi Scala + Jose G.Moreno + LyndaTamine + 189–196 + The lack of standardized evaluation benchmarks in the medical domain for text inputs can be a barrier to widely adopting and leveraging the potential of natural language models for health-related downstream tasks. This paper revisits an openly available MIMIC-IV benchmark for electronic health records (EHRs) to address this issue. First, we integrate the MIMIC-IV data within the Hugging Face datasets library to allow easy sharing and use of this collection. Second, we investigate the application of templates to convert EHR tabular data to text. Experiments using fine-tuned and zero-shot LLMs on the patient mortality task show that fine-tuned text-based models are competitive against robust tabular classifiers. In contrast, zero-shot LLMs struggle to leverage EHR representations.
This study underlines the potential of text-based approaches in the medical field and highlights areas for further improvement. + 2024.cl4health-1.23 + lovon-melgarejo-etal-2024-revisiting + + + Unraveling Clinical Insights: A Lightweight and Interpretable Approach for Multimodal and Multilingual Knowledge Integration + KanimozhiUma + Marie-FrancineMoens + 197–203 + In recent years, the analysis of clinical texts has evolved significantly, driven by the emergence of BERT-like language models such as PubMedBERT and ClinicalBERT, which have been tailored for the (bio)medical domain and rely on extensive archives of medical documents. While they boast high accuracy, their lack of interpretability and language transfer limitations restrict their clinical utility. To address this, we propose a new, lightweight graph-based embedding method designed specifically for radiology reports. This approach considers the report’s structure and content, connecting medical terms through the multilingual SNOMED Clinical Terms knowledge base. The resulting graph embedding reveals intricate relationships among clinical terms, enhancing both clinician comprehension and clinical accuracy without the need for large pre-training datasets. Demonstrating the versatility of our method, we apply this embedding to two tasks: disease and image classification in X-ray reports. In disease classification, our model competes effectively with BERT-based approaches, yet it is significantly smaller and requires less training data. Additionally, in image classification, we illustrate the efficacy of the graph embedding by leveraging cross-modal knowledge transfer, highlighting its applicability across diverse languages.
+ 2024.cl4health-1.24 + uma-moens-2024-unraveling + + + Automated Question-Answer Generation for Evaluating <fixed-case>RAG</fixed-case>-based Chatbots + Juan JoséGonzález Torres + Mihai BogdanBîndilă + SebastiaanHofstee + DanielSzondy + Quang-HungNguyen + ShenghuiWang + GwennEnglebienne + 204–214 + In this research, we propose a framework to generate human-like question-answer pairs with long or factoid answers automatically and, based on them, automatically evaluate the quality of Retrieval-Augmented Generation (RAG). Our framework can also create datasets that assess hallucination levels of Large Language Models (LLMs) by simulating unanswerable questions. We then apply the framework to create a dataset of question-answer (QA) pairs based on more than 1,000 leaflets about the medical and administrative procedures of a hospital. The dataset was evaluated by hospital specialists, who confirmed that more than 50% of the QA pairs are applicable. Finally, we show that our framework can be used to evaluate LLM performance by using Llama-2-13B fine-tuned in Dutch (Vanroy, 2023) with the generated dataset, and show that the method’s use in testing models with regard to answering unanswerable and factoid questions appears promising. + 2024.cl4health-1.25 + gonzalez-torres-etal-2024-automated + + + Speech Accommodation in Health-Care Interactions: Evidence Using a Mixed-Reality Platform + RoseBaker + Susan C.Bobb + Dai’ShaDowson + ElishaEanes + MakyahMcNeill + HannahRagsdale + AudreyEaves + Joseph G.Lee + KathrinRothermich + 215–219 + Many people in the US use more than one language at home, yet English remains the dominant (L1) language in US society, which can complicate medical encounters. In this study we ask in what ways effective communication can be ensured in health care settings when speakers differ in language proficiency.
One strategy people use is second language (L2) speech accommodation, which is characterized by slowed speech, less complex words, and clearer enunciation. We employ a mixed-reality platform called MURSION to document how a group of Physician Assistant students use speech accommodation during a healthcare encounter. MURSION is a computer-based virtual environment where participants interact with an Avatar controlled by a human interactor in a standardized environment. We record 5-minute interactions between the student and a high or low English proficiency Avatar. Our analyses evaluate lexical choices in L1-L2 interactions with SCOPE (South Carolina Psycholinguistic Metabase) and acoustic properties with PRAAT. Results show that clinical students use slower speech and high frequency words when speaking to a low proficiency virtual patient, indicating sensitivity to the communicative needs of L2 English users. Speech accommodation results will contribute to communication training modules for clinicians to interact efficiently with linguistically diverse populations. + 2024.cl4health-1.26 + baker-etal-2024-speech + + + Enhancing Consumer Health Question Reformulation: Chain-of-Thought Prompting Integrating Focus, Type, and User Knowledge Level + JooyeonLee + Luan HuyPham + ÖzlemUzuner + 220–228 + In this paper, we explore consumer health question (CHQ) reformulation, focusing on enhancing the quality of reformulation of questions without considering interest shifts. Our study introduces the use of the NIH GARD website as a gold standard dataset for this specific task, emphasizing its relevance and applicability. Additionally, we developed other datasets consisting of related questions scraped from Google, Bing, and Yahoo. We augmented, evaluated and analyzed the various datasets, demonstrating that the reformulation task closely resembles the question entailment generation task.
Our approach, which integrates the Focus and Type of consumer inquiries, represents a significant advancement in the field of question reformulation. We provide a comprehensive analysis of different methodologies, offering insights into the development of more effective and user-centric AI systems for consumer health support. + 2024.cl4health-1.27 + lee-etal-2024-enhancing + + + Exploring the Challenges of Behaviour Change Language Classification: A Study on Semi-Supervised Learning and the Impact of Pseudo-Labelled Data + SelinaMeyer + MarcosFernandez-Pichel + DavidElsweiler + David E.Losada + 229–239 + Automatic classification of behaviour change language can enhance conversational agents’ capabilities to adjust their behaviour based on users’ current situations and to encourage individuals to make positive changes. However, the lack of annotated language data of change-seekers hampers the performance of existing classifiers. In this study, we investigate the use of semi-supervised learning (SSL) to classify highly imbalanced texts around behaviour change. We assess the impact of including pseudo-labelled data from various sources and examine the balance between the amount of added pseudo-labelled data and the strictness of the inclusion criteria. Our findings indicate that while adding pseudo-labelled samples to the training data has limited classification impact, it does not significantly reduce performance regardless of the source of these new samples. This reinforces previous findings on the feasibility of applying classifiers trained on behaviour change language to diverse contexts. + 2024.cl4health-1.28 + meyer-etal-2024-exploring + + + Development of a Benchmark Corpus for Medical Device Adverse Event Detection + SusmithaWunnava + David A.Harris + Florence T.Bourgeois + Timothy A.Miller + 240–245 + The U.S. 
Food and Drug Administration (FDA) collects real-world adverse events, including device-associated deaths, injuries, and malfunctions, through passive reporting to the agency’s Manufacturer and User Facility Device Experience (MAUDE) database. However, this system’s full potential remains untapped given the extensive use of unstructured text in medical device adverse event reports and lack of FDA resources and expertise to properly analyze all available data. In this work, we focus on addressing this limitation through the development of an annotated benchmark corpus to support the design and development of state-of-the-art NLP approaches towards automatic extraction of device-related adverse event information from FDA Medical Device Adverse Event Reports. We develop a dataset of labeled medical device reports from a diverse set of high-risk device types that can be used for supervised machine learning. We develop annotation guidelines and manually annotate for nine entity types. The resulting dataset contains 935 annotated adverse event reports, with 12,252 annotated spans across the nine entity types. The dataset developed in this work will be made publicly available upon publication. + 2024.cl4health-1.29 + wunnava-etal-2024-development + + + Using <fixed-case>BART</fixed-case> to Automatically Generate Discharge Summaries from <fixed-case>S</fixed-case>wedish Clinical Text + NilsBerg + HerculesDalianis + 246–252 + Documentation is a regular part of contemporary healthcare practices and one such documentation task is the creation of a discharge summary, which summarizes a care episode. However, to manually write discharge summaries is a time-consuming task, and research has shown that discharge summaries are often lacking quality in various respects. To alleviate this problem, text summarization methods could be applied on text from electronic health records, such as patient notes, to automatically create a discharge summary.
Previous research has been conducted on this topic on text in various languages and with various methods, but no such research has been conducted on Swedish text. In this paper, four datasets extracted from a Swedish clinical corpus were used to fine-tune four BART language models to perform the task of summarizing Swedish patient notes into a discharge summary. Out of these models, the best performing model was manually evaluated by a senior, now retired, nurse and clinical coder. The evaluation results show that the best performing model produces discharge summaries of overall low quality. This is possibly due to issues in the data extracted from the Health Bank research infrastructure, which warrants further work on this topic. + 2024.cl4health-1.30 + berg-dalianis-2024-using + + + Biomedical Entity Linking for <fixed-case>D</fixed-case>utch: Fine-tuning a Self-alignment <fixed-case>BERT</fixed-case> Model on an Automatically Generated <fixed-case>W</fixed-case>ikipedia Corpus + FonsHartendorp + TomSeinen + Erikvan Mulligen + SuzanVerberne + 253–263 + Biomedical entity linking, a main component in automatic information extraction from health-related texts, plays a pivotal role in connecting textual entities (such as diseases, drugs and body parts mentioned by patients) to their corresponding concepts in a structured biomedical knowledge base. The task remains challenging despite recent developments in natural language processing. This report presents the first evaluated biomedical entity linking model for the Dutch language. We use MedRoBERTa.nl as base model and perform second-phase pretraining through self-alignment on a Dutch biomedical ontology extracted from the UMLS and Dutch SNOMED. We derive a corpus from Wikipedia of ontology-linked Dutch biomedical entities in context and fine-tune our model on this dataset. We evaluate our model on the Dutch portion of the Mantra GSC-corpus and achieve 54.7% classification accuracy and 69.8% 1-distance accuracy.
We then perform a case study on a collection of unlabeled, patient-support forum data and show that our model is hampered by the limited quality of the preceding entity recognition step. Manual evaluation of a small sample indicates that, of the correctly extracted entities, around 65% are linked to the correct concept in the ontology. Our results indicate that biomedical entity linking in a language other than English remains challenging, but our Dutch model can be used for high-level analysis of patient-generated text. + 2024.cl4health-1.31 + hartendorp-etal-2024-biomedical + + + Unveiling Voices: Identification of Concerns in a Social Media Breast Cancer Cohort via Natural Language Processing + SwatiRajwal + Avinash KumarPandey + ZhishuoHan + AbeedSarker + 264–270 + We leveraged a dataset of ∼1.5 million Twitter (now X) posts to develop a framework for analyzing breast cancer (BC) patients’ concerns and possible reasons for treatment discontinuation. Our primary objectives were threefold: (1) to curate and collect data from a BC cohort; (2) to identify topics related to uncertainty/concerns in BC-related posts; and (3) to conduct a sentiment intensity analysis of posts to identify and analyze negatively polarized posts. RoBERTa outperformed other models with a micro-averaged F1 score of 0.894 and a macro-averaged F1 score of 0.853 for (1). For (2), we used GPT-4 and BERTopic, and qualitatively analyzed posts under relevant topics. For (3), sentiment intensity analysis of posts followed by qualitative analyses shed light on potential reasons behind treatment discontinuation. Our work demonstrates the utility of social media mining to discover BC patient concerns. Information derived from the cohort data may help design strategies in the future for increasing treatment compliance.
+ 2024.cl4health-1.32 + rajwal-etal-2024-unveiling + + + Intent Detection and Entity Extraction from Biomedical Literature + AnkanMullick + MukurGupta + PawanGoyal + 271–278 + Biomedical queries have become increasingly prevalent in web searches, reflecting the growing interest in accessing biomedical literature. Despite recent research on large-language models (LLMs) motivated by endeavors to attain generalized intelligence, their efficacy in replacing task and domain-specific natural language understanding approaches remains questionable. In this paper, we address this question by conducting a comprehensive empirical evaluation of intent detection and named entity recognition (NER) tasks from biomedical text. We show that Supervised Fine Tuned approaches are still relevant and more effective than general-purpose LLMs. Biomedical transformer models such as PubMedBERT can surpass ChatGPT on the NER task with only 5 supervised examples. + 2024.cl4health-1.33 + mullick-etal-2024-intent + +
+
diff --git a/data/xml/2024.clpsych.xml b/data/xml/2024.clpsych.xml index eb10dac684..5babfa07c2 100644 --- a/data/xml/2024.clpsych.xml +++ b/data/xml/2024.clpsych.xml @@ -46,6 +46,7 @@ Depression is a global concern suffered by millions of people, significantly impacting their thoughts and behavior. Over the years, heightened awareness, spurred by health campaigns and other initiatives, has driven the study of this disorder using data collected from social media platforms. In our research, we aim to gauge the severity of symptoms related to depression among social media users. The ultimate goal is to estimate the user’s responses to a well-known standardized psychological questionnaire, the Beck Depression Inventory-II (BDI). This is a 21-question multiple-choice self-report inventory that covers multiple topics about how the subject has been feeling. Mining users’ social media interactions and understanding psychological states represents a challenging goal. To that end, we present here an approach based on search and summarization that extracts multiple BDI-biased summaries from the thread of users’ publications. We also leverage a robust large language model to estimate the potential answer for each BDI item. Our method involves several steps. First, we employ a search strategy based on sentence similarity to obtain pertinent extracts related to each topic in the BDI questionnaire. Next, we compile summaries of the content of these groups of extracts. Last, we exploit chatGPT to respond to the 21 BDI questions, using the summaries as contextual information in the prompt. Our model has undergone rigorous evaluation across various depression datasets, yielding encouraging results. The experimental report includes a comparison against an assessment done by expert humans and competes favorably with state-of-the-art methods. 2024.clpsych-1.2 aragon-etal-2024-delving +
How Can Client Motivational Language Inform Psychotherapy Agents? @@ -56,6 +57,7 @@ Within Motivational Interviewing (MI), client utterances are coded as for or against a certain behaviour change, along with commitment strength; this is essential to ensure therapists soften rather than persisting goal-related actions in the face of resistance. Prior works in MI agents have been scripted or semi-scripted, limiting users’ natural language expressions. With the aim of automating the MI interactions, we propose and explore the task of automated identification of client motivational language. Employing Large Language Models (LLMs), we compare in-context learning (ICL) and instruction fine-tuning (IFT) with varying training sizes for this identification task. Our experiments show that both approaches can learn under low-resourced settings. Our results demonstrate that IFT, though cheaper, is more stable to prompt choice, and yields better performance with more data. Given the detected motivation, we further present an approach to the analysis of therapists’ strategies for balancing building rapport with clients with advancing the treatment plan. A framework of MI agents is developed using insights from the data and the psychotherapy literature. 2024.clpsych-1.3 hoang-etal-2024-client + Linguistic markers of schizophrenia: a case study of <fixed-case>R</fixed-case>obert <fixed-case>W</fixed-case>alser @@ -79,6 +81,7 @@ Therapist Self-Disclosure (TSD) within the context of psychotherapy entails the revelation of personal information by the therapist. The ongoing scholarly discourse surrounding the utility of TSD, spanning from the inception of psychotherapy to the present day, has underscored the need for greater specificity in conceptualizing TSD. This inquiry has yielded more refined classifications within the TSD domain, with a consensus emerging on the distinction between immediate and non-immediate TSD, each of which plays a distinct role in the therapeutic process. 
Despite this progress in the field of psychotherapy, the Natural Language Processing (NLP) domain currently lacks methodological solutions or explorations for such scenarios. This lacuna can be partly due to the difficulty of attaining publicly available clinical data. To address this gap, this paper presents an innovative NLP-based approach that formalizes TSD as an NLP task. The proposed methodology involves the creation of publicly available, expert-annotated test sets designed to simulate therapist utterances, and the employment of NLP techniques for evaluation purposes. By integrating insights from psychotherapy research with NLP methodologies, this study aims to catalyze advancements in both NLP and psychotherapy research. 2024.clpsych-1.5 shapira-alfi-yogev-2024-therapist + Ethical thematic and topic modelling analysis of sleep concerns in a social media derived suicidality dataset @@ -89,6 +92,7 @@ Objective: A thematic and topic modelling analysis of sleep concerns in a social media derived, privacy-preserving, suicidality dataset. This forms the basis for an exploration of sleep as a potential computational linguistic signal in suicide prevention. Background: Suicidal ideation is a limited signal for suicide. Developments in computational linguistics and mental health datasets afford an opportunity to investigate additional signals and to consider the broader clinical ethical design implications. Methodology: A clinician-led integration of reflexive thematic analysis, with machine learning topic modelling (Bertopic), and the purposeful sampling of the University of Maryland Suicidality Dataset. Results: Sleep as a place of refuge and escape, revitalisation for exhaustion, and risk and vulnerability were generated as core themes in an initial thematic analysis of 546 posts. 
Bertopic analysing 21,876 sleep references in 16,791 posts facilitated the production of 40 topics that were clinically interpretable, relevant, and thematically aligned to a level that exceeded original expectations. Privacy and synthetic representative data, reproducibility, validity and stochastic variability of results, and a multi-signal formulation perspective, are highlighted as key research and clinical issues. 2024.clpsych-1.6 orr-etal-2024-ethical + Automatic Annotation of Dream Report’s Emotional Content with Large Language Models @@ -102,6 +106,7 @@ In the field of dream research, the study of dream content typically relies on the analysis of verbal reports provided by dreamers upon awakening from their sleep. This task is classically performed through manual scoring provided by trained annotators, at a great time expense. While a consistent body of work suggests that natural language processing (NLP) tools can support the automatic analysis of dream reports, proposed methods lacked the ability to reason over a report’s full context and required extensive data pre-processing. Furthermore, in most cases, these methods were not validated against standard manual scoring approaches. In this work, we address these limitations by adopting large language models (LLMs) to study and replicate the manual annotation of dream reports, using a mixture of off-the-shelf and bespoke approaches, with a focus on references to reports’ emotions. Our results show that the off-the-shelf method achieves a low performance probably in light of inherent linguistic differences between reports collected in different (groups of) individuals. On the other hand, the proposed bespoke text classification method achieves a high performance, which is robust against potential biases. Overall, these observations indicate that our approach could find application in the analysis of large dream datasets and may favour reproducibility and comparability of results across studies.
2024.clpsych-1.7 bertolini-etal-2024-automatic + Explainable Depression Detection Using Large Language Models on Social Media Data @@ -112,6 +117,7 @@ Due to the rapid growth of user interaction on different social media platforms, publicly available social media data has increased substantially. The sheer amount of data and level of personal information being shared on such platforms has made analyzing textual information to predict mental disorders such as depression a reliable preliminary step when it comes to psychometrics. In this study, we first proposed a system to search for texts that are related to depression symptoms from the Beck’s Depression Inventory (BDI) questionnaire, and providing a ranking for further investigation in a second step. Then, in this second step, we address the even more challenging task of automatic depression level detection, using writings and voluntary answers provided by users on Reddit. Several Large Language Models (LLMs) were applied in experiments. Our proposed system based on LLMs can generate both predictions and explanations for each question. By combining two LLMs for different questions, we achieved better performance on three of four metrics compared to the state-of-the-art and remained competitive on the one remaining metric. In addition, our system is explainable on two levels: first, knowing the answers to the BDI questions provides clues about the possible symptoms that could lead to a clinical diagnosis of depression; second, our system can explain the predicted answer for each question. 2024.clpsych-1.8 wang-etal-2024-explainable + Analysing relevance of Discourse Structure for Improved Mental Health Estimation @@ -122,6 +128,7 @@ Automated depression estimation has received significant research attention in recent years as a result of its growing impact on the global community. 
Within the context of studies based on patient-therapist interview transcripts, most researchers treat the dyadic discourse as a sequence of unstructured sentences, thus ignoring the discourse structure within the learning process. In this paper we propose Multi-view architectures that divide the input transcript into patient and therapist views based on sentence type in an attempt to utilize symmetric discourse structure for improved model performance. Experiments on the DAIC-WOZ dataset for the binary classification task within depression estimation show advantages of the Multi-view architecture over sequential input representations. Our model also outperforms the current state-of-the-art results and provides new SOTA performance on the test set of the DAIC-WOZ dataset. 2024.clpsych-1.9 agarwal-etal-2024-analysing + Using Daily Language to Understand Drinking: Multi-Level Longitudinal Differential Language Analysis @@ -139,6 +146,7 @@ Analyses for linking language with psychological factors or behaviors predominately treat linguistic features as a static set, working with a single document per person or aggregating across multiple posts (e.g. on social media) into a single set of features. This limits language to mostly shed light on between-person differences rather than changes in behavior within-person. Here, we collected a novel dataset of daily surveys where participants were asked to describe their experienced well-being and report the number of alcoholic beverages they had within the past 24 hours. Through this data, we first build a multi-level forecasting model that is able to capture within-person change and leverage both the psychological features of the person and daily well-being responses. Then, we propose a longitudinal version of differential language analysis that finds patterns associated with drinking more (e.g. social events) and less (e.g. task-oriented), as well as distinguishing patterns of heavy drinkers versus light drinkers.
2024.clpsych-1.10 matero-etal-2024-using + Prevalent Frequency of Emotional and Physical Symptoms in Social Anxiety using Zero Shot Classification: An Observational Study @@ -148,6 +156,7 @@ Social anxiety represents a prevalent challenge in modern society, affecting individuals across personal and professional spheres. Left unaddressed, this condition can yield substantial negative consequences, impacting social interactions and performance. Further understanding its diverse physical and emotional symptoms becomes pivotal for comprehensive diagnosis and tailored therapeutic interventions. This study analyzes the prevalence and frequency of social anxiety symptoms taken from the Mayo Clinic, exploring diverse human experiences by utilizing a large Reddit dataset dedicated to this issue. Leveraging these platforms, the research aims to extract insights and examine a spectrum of physical and emotional symptoms linked to social anxiety disorder. Upholding ethical considerations, the study maintains strict user anonymity within the dataset. By employing a novel approach, the research utilizes BART-based multi-label zero-shot classification to identify and measure symptom prevalence and significance in the form of a probability score for each symptom under consideration. Results uncover distinctive patterns: “Trembling” emerges as a prevalent physical symptom, while emotional symptoms like “Fear of being judged negatively” exhibit high frequencies. These findings offer insights into the multifaceted nature of social anxiety, aiding clinical practices and interventions tailored to its diverse expressions. 2024.clpsych-1.11 rizwan-demsar-2024-prevalent + Comparing panic and anxiety on a dataset collected from social media @@ -158,6 +167,7 @@ The recognition of mental health’s crucial significance has led to a growing interest in utilizing social media text data in current research trends.
However, there remains a significant gap in the study of panic and anxiety on these platforms, despite their high prevalence and severe impact. In this paper, we address this gap by presenting a dataset consisting of 1,930 user posts from Quora and Reddit specifically focusing on panic and anxiety. Through a combination of lexical analysis, emotion detection, and writer attitude assessment, we explore the unique characteristics of each condition. To gain deeper insights, we employ a mental health-specific transformer model and a large language model for qualitative analysis. Our findings not only contribute to the understanding of digital discourse on anxiety and panic but also provide valuable resources for the broader research community. We make our dataset, methodologies, and code available to advance understanding and facilitate future studies. 2024.clpsych-1.12 mitrovic-etal-2024-comparing + Your Model Is Not Predicting Depression Well And That Is Why: A Case Study of <fixed-case>PRIMATE</fixed-case> Dataset @@ -168,6 +178,7 @@ This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts. While previous research relies on social media-based datasets annotated with binary categories, i.e. depressed or non-depressed, recent datasets such as D2S and PRIMATE aim for nuanced annotations using PHQ-9 symptoms. However, most of these datasets rely on crowd workers without the domain knowledge for annotation. Focusing on the PRIMATE dataset, our study reveals concerns regarding annotation validity, particularly for the lack of interest or pleasure symptom. Through reannotation by a mental health professional, we introduce finer labels and textual spans as evidence, identifying a notable number of false positives. Our refined annotations, to be released under a Data Use Agreement, offer a higher-quality test set for anhedonia detection.
This study underscores the necessity of addressing annotation quality issues in mental health datasets, advocating for improved methodologies to enhance NLP model reliability in mental health assessments. 2024.clpsych-1.13 milintsevich-etal-2024-model + Detecting a Proxy for Potential Comorbid <fixed-case>ADHD</fixed-case> in People Reporting Anxiety Symptoms from Social Media Data @@ -178,6 +189,7 @@ We present a novel task that can elucidate the connection between anxiety and ADHD; use Transformers to make progress toward solving a task that is not solvable by keyword-based classifiers; and discuss a method for visualization of our classifier illuminating the connection between anxiety and ADHD presentations. Up to approximately 50% of adults with ADHD may also have an anxiety disorder and approximately 30% of adults with anxiety may also have ADHD. Patients presenting with anxiety may be treated for anxiety without ADHD ever being considered, possibly affecting treatment. We show how data that bears on ADHD that is comorbid with anxiety can be obtained from social media data, and show that Transformers can be used to detect a proxy for possible comorbid ADHD in people with anxiety symptoms. We collected data from anxiety and ADHD online forums (subreddits). We identified posters who first started posting in the Anxiety subreddit and later started posting in the ADHD subreddit as well. We use this subset of the posters as a proxy for people who presented with anxiety symptoms and then became aware that they might have ADHD. We fine-tune a Transformer architecture-based classifier to classify people who started posting in the Anxiety subreddit and then started posting in the ADHD subreddit vs. people who posted in the Anxiety subreddit without later posting in the ADHD subreddit. We show that a Transformer architecture is capable of achieving reasonable results (76% correct for RoBERTa vs. 
under 60% correct for the best keyword-based model, both with 50% base rate). 2024.clpsych-1.14 lee-etal-2024-detecting + Overview of the <fixed-case>CLP</fixed-case>sych 2024 Shared Task: Leveraging Large Language Models to Identify Evidence of Suicidality Risk in Online Posts @@ -204,6 +216,7 @@ This paper presents our approach to the CLPsych 2024 shared task: utilizing large language models (LLMs) for finding supporting evidence about an individual’s suicide risk level in Reddit posts. Our framework is constructed around an LLM with knowledge self-generation and output refinement. The knowledge self-generation process produces task-related knowledge which is generated by the LLM and leads to accurate risk predictions. The output refinement process, later, with the selected best set of LLM-generated knowledge, refines the outputs by prompting the LLM repeatedly with different knowledge instances interchangeably. We achieved highly competitive results comparing to the top-performance participants with our official recall of 93.5%, recall–precision harmonic-mean of 92.3%, and mean consistency of 96.1%. 2024.clpsych-1.16 tran-matsui-2024-team + Exploring Instructive Prompts for Large Language Models in the Extraction of Evidence for Supporting Assigned Suicidal Risk Levels @@ -217,6 +230,7 @@ Monitoring and predicting the expression of suicidal risk in individuals’ social media posts is a central focus in clinical NLP. Yet, existing approaches frequently lack a crucial explainability component necessary for extracting evidence related to an individual’s mental health state. We describe the CSIRO Data61 team’s evidence extraction system submitted to the CLPsych 2024 shared task. The task aims to investigate the zero-shot capabilities of open-source LLM in extracting evidence regarding an individual’s assigned suicide risk level from social media discourse. 
The results are assessed against ground truth evidence annotated by psychological experts, with an achieved recall-oriented BERTScore of 0.919. Our findings suggest that LLMs showcase strong feasibility in the extraction of information supporting the evaluation of suicidal risk in social media discourse. Opportunities for refinement exist, notably in crafting concise and effective instructions to guide the extraction process. 2024.clpsych-1.17 chen-etal-2024-exploring + Psychological Assessments with Large Language Models: A Privacy-Focused and Cost-Effective Approach @@ -225,6 +239,7 @@ This study explores the use of Large Language Models (LLMs) to analyze text comments from Reddit users, aiming to achieve two primary objectives: firstly, to pinpoint critical excerpts that support a predefined psychological assessment of suicidal risk; and secondly, to summarize the material to substantiate the preassigned suicidal risk level. The work is circumscribed to the use of “open-source” LLMs that can be run locally, thereby enhancing data privacy. Furthermore, it prioritizes models with low computational requirements, making it accessible to both individuals and institutions operating on limited computing budgets. The implemented strategy only relies on a carefully crafted prompt and a grammar to guide the LLM’s text completion. Despite its simplicity, the evaluation metrics show outstanding results, making it a valuable privacy-focused and cost-effective approach. This work is part of the Computational Linguistics and Clinical Psychology (CLPsych) 2024 shared task. 2024.clpsych-1.18 blanco-cuaresma-2024-psychological + Incorporating Word Count Information into Depression Risk Summary Generation: <fixed-case>INF</fixed-case>@<fixed-case>U</fixed-case>o<fixed-case>S</fixed-case> <fixed-case>CLP</fixed-case>sych 2024 Submission @@ -234,6 +249,7 @@ Large language model classifiers do not directly offer transparency: it is not clear why one class is chosen over another. 
In this work, summaries explaining the suicide risk level assigned using a fine-tuned mental-roberta-base model are generated from key phrases extracted using SHAP explainability using Mistral-7B. The training data for the classifier consists of all Reddit posts of a user in the University of Maryland Reddit Suicidality Dataset, Version 2, with their suicide risk labels along with selected features extracted from each post by the Linguistic Inquiry and Word Count (LIWC-22) tool. The resulting model is used to make predictions regarding risk on each post of the users in the evaluation set of the CLPsych 2024 shared task, with a SHAP explainer used to identify the phrases contributing to the top scoring, correct and severe risk categories. Some basic stoplisting is applied to the extracted phrases, along with length based filtering, and a locally run version of Mistral-7B-Instruct-v0.1 is used to create summaries from the highest value (based on SHAP) phrases. 2024.clpsych-1.19 preiss-chen-2024-incorporating + Extracting and Summarizing Evidence of Suicidal Ideation in Social Media Contents Using Large Language Models @@ -245,6 +261,7 @@ This paper explores the use of Large Language Models (LLMs) in analyzing social media content for mental health monitoring, specifically focusing on detecting and summarizing evidence of suicidal ideation. We utilized LLMs Mixtral7bx8 and Tulu-2-DPO-70B, applying diverse prompting strategies for effective content extraction and summarization. Our methodology included detailed analysis through Few-shot and Zero-shot learning, evaluating the ability of Chain-of-Thought and Direct prompting strategies. The study achieved notable success in the CLPsych 2024 shared task (ranked top for the evidence extraction task and second for the summarization task), demonstrating the potential of LLMs in mental health interventions and setting a precedent for future research in digital mental health monitoring. 
2024.clpsych-1.20 gyanendro-singh-etal-2024-extracting + Detecting Suicide Risk Patterns using Hierarchical Attention Networks with Large Language Models @@ -255,6 +272,7 @@ Suicide has become a major public health and social concern in the world. This paper looks into a method, through the use of LLMs (Large Language Models), to extract the likely reason for a person to attempt suicide, through analysis of their social media text posts detailing the event; using this data we can extract the reason for the cause, such as mental state, which can provide support for suicide prevention. This submission presents our approach for the CLPsych Shared Task 2024. Our model uses Hierarchical Attention Networks (HAN) and Llama2 for finding supporting evidence about an individual’s suicide risk level. 2024.clpsych-1.21 l-etal-2024-detecting + Using Large Language Models (<fixed-case>LLM</fixed-case>s) to Extract Evidence from Pre-Annotated Social Media Data @@ -265,6 +283,7 @@ For numerous years, researchers have employed social media data to gain insights into users’ mental health. Nevertheless, the majority of investigations concentrate on categorizing users into those experiencing depression and those considered healthy, or on detection of suicidal thoughts. In this paper, we aim to extract evidence of a pre-assigned gold label. We used a suicidality dataset containing Reddit posts labeled with the suicide risk level. The task is to use Large Language Models (LLMs) to extract evidence from the post that justifies the given label. We used Meta Llama 7b and lexicons for solving the task and we achieved a precision of 0.96.
2024.clpsych-1.22 alhamed-etal-2024-using + <fixed-case>X</fixed-case>in<fixed-case>H</fixed-case>ai@<fixed-case>CLP</fixed-case>sych 2024 Shared Task: Prompting Healthcare-oriented <fixed-case>LLM</fixed-case>s for Evidence Highlighting in Posts with Suicide Risk @@ -276,6 +295,7 @@ In this article, we introduce a new method for analyzing and summarizing posts from r/SuicideWatch on Reddit, overcoming the limitations of current techniques in processing complex mental health discussions online. Existing methods often struggle to accurately identify and contextualize subtle expressions of mental health problems, leading to inadequate support and intervention strategies. Our approach combines the open-source Large Language Model (LLM), fine-tuned with health-oriented knowledge, to effectively process Reddit posts. We also design prompts that focus on suicide-related statements, extracting key statements, and generating concise summaries that capture the core aspects of the discussions. The preliminary results indicate that our method improves the understanding of online suicide-related posts compared to existing methodologies. 2024.clpsych-1.23 zhu-etal-2024-xinhai + A Dual-Prompting for Interpretable Mental Health Language Models @@ -289,6 +309,7 @@ Despite the increasing demand for AI-based mental health monitoring tools, their practical utility for clinicians is limited by the lack of interpretability. The CLPsych 2024 Shared Task (Chim et al., 2024) aims to enhance the interpretability of Large Language Models (LLMs), particularly in mental health analysis, by providing evidence of suicidality through linguistic content. We propose a dual-prompting approach: (i) Knowledge-aware evidence extraction by leveraging the expert identity and a suicide dictionary with a mental health-specific LLM; and (ii) Evidence summarization by employing an LLM-based consistency evaluator. 
Comprehensive experiments demonstrate the effectiveness of combining domain-specific information, revealing performance improvements and the approach’s potential to aid clinicians in assessing mental state progression. 2024.clpsych-1.24 jeon-etal-2024-dual + Cheap Ways of Extracting Clinical Markers from Texts @@ -299,6 +320,7 @@ This paper describes the Unibuc Archaeology team’s work for CLPsych’s 2024 Shared Task that involved finding evidence within the text supporting the assigned suicide risk level. Two types of evidence were required: highlights (extracting relevant spans within the text) and summaries (aggregating evidence into a synthesis). Our work focuses on evaluating Large Language Models (LLM) as opposed to an alternative method that is much more memory and resource efficient. The first approach employs an LLM that is used for generating the summaries and is guided to provide sequences of text indicating suicidal tendencies through a processing chain for highlights. The second approach involves implementing a good old-fashioned machine learning tf-idf with a logistic regression classifier, whose representative features we use to extract relevant highlights. 2024.clpsych-1.25 sandu-etal-2024-cheap + Utilizing Large Language Models to Identify Evidence of Suicidality Risk through Analysis of Emotionally Charged Posts @@ -309,6 +331,7 @@ This paper presents our contribution to the CLPsych 2024 shared task, focusing on the use of open-source large language models (LLMs) for suicide risk assessment through the analysis of social media posts. We achieved first place (out of 15 participating teams) in the task of providing summarized evidence of a user’s suicide risk. Our approach is based on Retrieval Augmented Generation (RAG), where we retrieve the top-k (k=5) posts with the highest emotional charge and provide the level of three different negative emotions (sadness, fear, anger) for each post during the generation phase.
2024.clpsych-1.26 uluslu-etal-2024-utilizing + Integrating Supervised Extractive and Generative Language Models for Suicide Risk Evidence Summarization @@ -318,6 +341,7 @@ We propose a method that integrates supervised extractive and generative language models for providing supporting evidence of suicide risk in the CLPsych 2024 shared task. Our approach comprises three steps. Initially, we construct a BERT-based model for estimating sentence-level suicide risk and negative sentiment. Next, we precisely identify high suicide risk sentences by emphasizing elevated probabilities of both suicide risk and negative sentiment. Finally, we integrate generative summaries using the MentaLLaMa framework and extractive summaries from identified high suicide risk sentences and a specialized dictionary of suicidal risk words. SophiaADS, our team, achieved 1st place for highlight extraction and ranked 10th for summary generation, based on recall and consistency metrics, respectively. 2024.clpsych-1.27 tanaka-fukazawa-2024-integrating + Archetypes and Entropy: Theory-Driven Extraction of Evidence for Suicide Risk @@ -341,6 +365,7 @@ varadarajan-etal-2024-archetypes The sponsors were added in the Acknowledgement section since they were missed in the initial submission. + diff --git a/data/xml/2024.codi.xml b/data/xml/2024.codi.xml index 82e0310b12..bafe63fbb4 100644 --- a/data/xml/2024.codi.xml +++ b/data/xml/2024.codi.xml @@ -29,6 +29,7 @@ Although diagrams are fundamental to Rhetorical Structure Theory, their interpretation has received little in-depth exploration. This paper presents an algorithmic approach to accessing the meaning of these diagrams. Three algorithms are presented.
The first of these, called reenactment, recreates the abstract process whereby structures are created, following the dynamic of coherence development, starting from simple relational propositions, and combining these to form complex expressions which are in turn integrated to define the comprehensive discourse organization. The second algorithm, called composition, implements Marcu’s strong nuclearity assumption. It uses a simple inference mechanism to demonstrate the reducibility of complex structures to simple relational propositions. The third algorithm, called compress, picks up where Marcu’s assumption leaves off, providing a generalized, fully scalable procedure for progressive reduction of relational propositions to their simplest accessible forms. These inferred reductions may then be recycled to produce RST diagrams of abridged texts. The algorithms described here are useful in positioning computational descriptions of rhetorical structures as discursive processes, allowing researchers to go beyond static diagrams and look into their formative and interpretative significance. 2024.codi-1.1 potter-2024-algorithmic +
<fixed-case>S</fixed-case>ci<fixed-case>P</fixed-case>ara: A New Dataset for Investigating Paragraph Discourse Structure in Scientific Papers @@ -41,6 +42,7 @@ Good scientific writing makes use of specific sentence and paragraph structures, providing a rich platform for discourse analysis and developing tools to enhance text readability. In this vein, we introduce SciPara, a novel dataset consisting of 981 scientific paragraphs annotated by experts in terms of sentence discourse types and topic information. On this dataset, we explored two tasks: 1) discourse category classification, which is to predict the discourse category of a sentence by using its paragraph and surrounding paragraphs as context, and 2) discourse sentence generation, which is to generate a sentence of a certain discourse category by using various contexts as input. We found that Pre-trained Language Models (PLMs) can accurately identify Topic Sentences in SciPara, but have difficulty distinguishing Concluding, Transition, and Supporting Sentences. The quality of the sentences generated by all investigated PLMs improved with the amount of context, regardless of discourse category. However, not all contexts were equally influential. Contrary to common assumptions about well-crafted scientific paragraphs, our analysis revealed that, paradoxically, paragraphs with complete discourse structures were less readable. 2024.codi-1.2 kiepura-etal-2024-scipara + Using Discourse Connectives to Test Genre Bias in Masked Language Models @@ -53,6 +55,7 @@ This paper presents evidence for an effect of genre on the use of discourse connectives in argumentation. Drawing from discourse processing research on reasoning-based structures, we use fill-mask computation to measure genre-induced expectations of argument realisation, and beta regression to model the probabilities of these realisations against a set of predictors.
Contrasting fill-mask probabilities for the presence or absence of a discourse connective in baseline and finetuned language models reveals that genre introduces biases for the realisation of argument structure. These outcomes suggest that cross-domain discourse processing, but also argument mining, should take into account generalisations about specific features, such as connectives, and their probability related to the genre context. 2024.codi-1.3 dorgeloh-etal-2024-using + Projecting Annotations for Discourse Relations: Connective Identification for Low-Resource Languages @@ -63,6 +66,7 @@ 2024.codi-1.4 2024.codi-1.4.SupplementaryMaterial.zip bourgonje-lin-2024-projecting + Experimenting with Discourse Segmentation of <fixed-case>T</fixed-case>aiwan <fixed-case>S</fixed-case>outhern <fixed-case>M</fixed-case>in Spontaneous Speech @@ -73,6 +77,7 @@ 2024.codi-1.5 2024.codi-1.5.SupplementaryMaterial.tex prevot-wang-2024-experimenting + Actor Identification in Discourse: A Challenge for <fixed-case>LLM</fixed-case>s? @@ -84,6 +89,7 @@ 2024.codi-1.6 2024.codi-1.6.SupplementaryMaterial.gz baric-etal-2024-actor + Quantitative metrics to the <fixed-case>CARS</fixed-case> model in academic discourse in biology introductions @@ -93,6 +99,7 @@ Writing research articles is crucial in any academic’s development and is thus an important component of the academic discourse. The Introduction section is often seen as a difficult task within the research article genre. This study presents two metrics of rhetorical moves in academic writing: step-n-grams and lengths of steps. While scholars agree that expert writers follow the general pattern described in the CARS model (Swales, 1990), this study complements previous studies with empirical quantitative data that highlight how writers progress from one rhetorical function to another in practice, based on 50 recent papers by expert writers. 
The discussion shows the significance of the results in relation to writing instructors and data-driven learning. 2024.codi-1.7 lam-nnamoko-2024-quantitative + Probing of pretrained multilingual models on the knowledge of discourse @@ -103,6 +110,7 @@ 2024.codi-1.8 2024.codi-1.8.SupplementaryMaterial.zip godunova-voloshina-2024-probing + Feature-augmented model for multilingual discourse relation classification @@ -114,6 +122,7 @@ 2024.codi-1.9 2024.codi-1.9.SupplementaryMaterial.zip metheniti-etal-2024-feature + Complex question generation using discourse-based data augmentation @@ -125,6 +134,7 @@ 2024.codi-1.10 2024.codi-1.10.SupplementaryMaterial.zip jahangir-etal-2024-complex + Exploring Soft-Label Training for Implicit Discourse Relation Recognition @@ -134,6 +144,7 @@ This paper proposes a classification model for single label implicit discourse relation recognition trained on soft-label distributions. It follows the PDTB 3.0 framework and it was trained and tested on the DiscoGeM corpus, where it achieves an F1-score of 51.38 on third-level sense classification of implicit discourse relations. We argue that training on soft-label distributions allows the model to better discern between more ambiguous discourse relations. 
2024.codi-1.11 costa-kosseim-2024-exploring + The <fixed-case>ARRAU</fixed-case> 3.0 Corpus @@ -147,6 +158,7 @@ 2024.codi-1.12 2024.codi-1.12.SupplementaryMaterial.zip poesio-etal-2024-arrau + Signals as Features: Predicting Error/Success in Rhetorical Structure Parsing @@ -157,6 +169,7 @@ 2024.codi-1.13 2024.codi-1.13.SupplementaryMaterial.zip pastor-oostdijk-2024-signals + <fixed-case>G</fixed-case>round<fixed-case>H</fixed-case>og: Dialogue Generation using Multi-Grained Linguistic Input @@ -168,6 +181,7 @@ 2024.codi-1.14 2024.codi-1.14.SupplementaryMaterial.zip chernyavskiy-etal-2024-groundhog + Discourse Relation Prediction and Discourse Parsing in Dialogues with Minimal Supervision @@ -179,6 +193,7 @@ Discourse analysis plays a crucial role in Natural Language Processing, with discourse relation prediction arguably being the most difficult task in discourse parsing. Previous studies have generally focused on explicit or implicit discourse relation classification in monologues, leaving dialogue an under-explored domain. Facing the data scarcity issue, we propose to leverage self-training strategies based on a Transformer backbone. Moreover, we design the first semi-supervised pipeline that sequentially predicts discourse structures and relations. Using 50 examples, our relation prediction module achieves 58.4 in accuracy on the STAC corpus, close to supervised state-of-the-art. Full parsing results show notable improvements compared to the supervised models both in-domain (gaming) and cross-domain (technical chat), with better stability. 
2024.codi-1.15 li-etal-2024-discourse + With a Little Help from my (Linguistic) <fixed-case>F</fixed-case>riends: Topic segmentation of multi-party casual conversations @@ -189,6 +204,7 @@ 2024.codi-1.16 2024.codi-1.16.SupplementaryMaterial.zip decker-amblard-2024-little + diff --git a/data/xml/2024.cogalex.xml b/data/xml/2024.cogalex.xml new file mode 100644 index 0000000000..e312bd32a2 --- /dev/null +++ b/data/xml/2024.cogalex.xml @@ -0,0 +1,214 @@ + + + + + Proceedings of the Workshop on Cognitive Aspects of the Lexicon @ LREC-COLING 2024 + MichaelZock + EmmanueleChersoni + Yu-YinHsu + Simonde Deyne + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.cogalex-1 + cogalex + + + 2024.cogalex-1.0 + cogalex-2024-cognitive + + + <fixed-case>CLAVELL</fixed-case> - Cognitive Linguistic Annotation and Visualization Environment for Language Learning + WernerWiniwarter + 1–13 + In this paper we introduce a novel sentence annotation based on radical construction grammar and Uniform Meaning Representation, which covers all levels of linguistic analysis, from interlinear morphemic glossing to PropBank rolesets, WordNet synsets, and Wikipedia page titles as concept identifiers. We visually enhance our annotation by using images to represent concepts, emojis for thematic roles, and color-coding for constructions. The meaning representation is embedded into the syntactic parse by aligning all concepts with the surface tokens in the sentence. The main motivation for developing this type of representation was its use in second language acquisition as part of a Web-based language learning environment. In entertaining and engaging annotation tasks language students assemble the representation step-by-step following a bottom-up strategy. Based on language exposure while performing these exercises, we populate personal idiolectal constructicons representing the students’ current status of second language comprehension. As first use case, we have implemented a solution for Japanese due to its soaring popularity in our language education program and the particular challenges involved with trying to master this language. + 2024.cogalex-1.1 + winiwarter-2024-clavell + + + Individual Text Corpora Predict Openness, Interests, Knowledge and Level of Education + Markus J.Hofmann + Markus T.Jansen + ChristophWigbels + BennyBriesemeister + Arthur M.Jacobs + 14–25 + Here we examine whether the personality dimension of openness to experience can be predicted from the individual google search history. 
By web scraping, individual text corpora (ICs) were generated from 214 participants with a mean number of 5 million word tokens. We trained word2vec models and used the similarities of each IC to label words, which were derived from a lexical approach of personality. These IC-label-word similarities were utilized as predictive features in neural models. For training and validation, we relied on 179 participants and held out a test sample of 35 participants. A grid search with varying number of predictive features, hidden units and boost factor was performed. As model selection criterion, we used R2 in the validation samples penalized by the absolute R2 difference between training and validation. The selected neural model explained 35% of the openness variance in the test sample, while an ensemble model with the same architecture often provided slightly more stable predictions for intellectual interests, knowledge in humanities and level of education. Finally, a learning curve analysis suggested that around 500 training participants are required for generalizable predictions. We discuss ICs as a complement or replacement of survey-based psychodiagnostics. + 2024.cogalex-1.2 + hofmann-etal-2024-individual + + + An Empirical Study on Vague Deictic Temporal Adverbials + SvenjaKenneweg + Brendan BalcerakJackson + JoergDeigmoeller + JulianEggert + PhilippCimiano + 26–31 + Temporal adverbial phrases such as recently and some time ago have a special function in communication and temporal cognition. These adverbials are deictic, in that their meaning is tied to their time of utterance; and they are vague, in that the time periods to which they apply are under-specified in comparison to expressions such as yesterday, which precisely indicates the day before the day of utterance. Despite their vagueness, conversational participants have a mental image of when events described using these adverbials take place. 
We present a study that aims to quantify this mental model in terms of fuzzy or graded membership. To achieve this, we investigated the four English temporal adverbials recently, just, some time ago and long time ago as applied to types of events with different durations and frequencies, by conducting surveys to measure how speakers judge the different adverbials to apply in different time ranges. Our results suggest that it is possible to represent the meanings of deictic vague temporal adverbials geometrically in terms of graded membership within a temporal conceptual space. + 2024.cogalex-1.3 + kenneweg-etal-2024-empirical + + + Symbolic Learning of Rules for Semantic Relation Types Identification in <fixed-case>F</fixed-case>rench Genitive Postnominal Prepositional Phrases + HaniGuenoune + MathieuLafourcade + 32–41 + We are interested in the semantic relations conveyed by polylexical entities in the postnominal prepositional noun phrases form “A de B” (A of B). After identifying a relevant set of semantic relations types, we proceed, using generative AI, to build a collection of phrases, for each semantic relation type identified. We propose an algorithm for creating rules that allow the selection of the relation between A and B in noun phrases of each type. These rules correspond to selecting from a knowledge base the appropriate neighborhood of a given term. For the phrase “désert d’Algérie” carrying the location relation, the term “désert” is identified as a geographical location, and “Algérie” as a country. These constraints are used to automatically learn a set of rules for selecting the location relation for this type of example. Rules are not exclusive as there may be instances that fall under multiple relations. In the phrase “portrait de sa mère - the portrait of his/her mother”, all of depiction, possession, and producer types are a possible match. 
+ 2024.cogalex-1.4 + guenoune-lafourcade-2024-symbolic + + + How Human-Like Are Word Associations in Generative Models? An Experiment in <fixed-case>S</fixed-case>lovene + ŠpelaVintar + MojcaBrglez + AlešŽagar + 42–48 + Large language models (LLMs) show extraordinary performance in a broad range of cognitive tasks, yet their capability to reproduce human semantic similarity judgements remains disputed. We report an experiment in which we fine-tune two LLMs for Slovene, a monolingual SloT5 and a multilingual mT5, as well as an mT5 for English, to generate word associations. The models are fine-tuned on human word association norms created within the Small World of Words project, which recently started to collect data for Slovene. Since our aim was to explore differences between human and model-generated outputs, the model parameters were minimally adjusted to fit the association task. We perform automatic evaluation using a set of methods to measure the overlap and ranking, and in addition a subset of human and model-generated responses were manually classified into four categories (meaning-, positionand form-based, and erratic). Results show that human-machine overlap is very small, but that the models produce a similar distribution of association categories as humans. + 2024.cogalex-1.5 + vintar-etal-2024-human + + + Idiom Complexity in Apple-Pie Order: The Disentanglement of Decomposability and Transparency + IrenePagliai + 49–55 + Both decomposability and transparency investigate the interplay between literality and figurativity in idioms. For this reason, they have often been merged. This study argues that idiom decomposability and transparency are related but conceptually different constructs, thus advocating for their distinction. Leveraging a normed lexicon of Italian and English idioms, the respective effects of decomposability and transparency on idiom meaning recognition are explored via statistical modeling. 
Results show the two variables contribute differently to idiom meaning recognition in the two languages, while the absence of collinearity underscores their distinct contributions. Based on this empirical evidence, the study finally proposes FrameNet and MetaNet as computational tools for modeling idiom decomposability and transparency. This study thus not only substantiates the separation of idiom decomposability and transparency, but also sets a foundation for future interdisciplinary research to bridge the gap in idiom research between empirical psycholinguistics, cognitive linguistics and computational applications. + 2024.cogalex-1.6 + pagliai-2024-idiom + + + What <fixed-case>GPT</fixed-case>-4 Knows about Aspectual Coercion: Focused on “Begin the Book” + SeohyunIm + ChungminLee + 56–67 + This paper explores whether Pre-trained Large Language Models (PLLMs) like GPT-4 can grasp profound linguistic insights into language phenomena such as Aspectual Coercion through interaction with Microsoft’s Copilot, which integrates GPT-4. Firstly, we examined Copilot’s understanding of the co-occurrence constraints of the aspectual verb “begin” and the complex-type noun “book” using the classic illustration of Aspectual Coercion, “begin the book.” Secondly, we verified Copilot’s awareness of both the default interpretation of “begin the book” with no specific context and the contextually preferred interpretation. Ultimately, Copilot provided appropriate responses regarding potential interpretations of “begin the book” based on its distributional properties and context-dependent preferred interpretations. However, it did not furnish sophisticated explanations concerning these interpretations from a linguistic theoretical perspective. On the other hand, by offering diverse interpretations grounded in distributional properties, language models like GPT-4 demonstrated their potential contribution to the refinement of linguistic theories. 
Furthermore, we suggested the feasibility of employing Language Models to construct language resources associated with language phenomena including Aspectual Coercion. + 2024.cogalex-1.7 + im-lee-2024-gpt + + + Can <fixed-case>GPT</fixed-case>-4 Recover Latent Semantic Relational Information from Word Associations? A Detailed Analysis of Agreement with Human-annotated Semantic Ontologies. + SimonDe Deyne + ChunhuaLiu + LeaFrermann + 68–78 + Word associations, i.e., spontaneous responses to a cue word, provide not only a window into the human mental lexicon but have also been shown to be a repository of common-sense knowledge and can underpin efforts in lexicography and the construction of dictionaries. Especially the latter tasks require knowledge about the relations underlying the associations (e.g., Taxonomic vs. Situational); however, to date, there is neither an established ontology of relations nor an effective labelling paradigm. Here, we test GPT-4’s ability to infer semantic relations for human-produced word associations. We use four human-labelled data sets of word associations and semantic features, with differing relation inventories and various levels of annotator agreement. We directly prompt GPT-4 with detailed relation definitions without further fine-tuning or training. Our results show that while GPT-4 provided a good account of higher-level classifications (e.g. Taxonomic vs Situational), prompting instructions alone cannot obtain similar performance for detailed classifications (e.g. superordinate, subordinate or coordinate relations) despite high agreement among human annotators. This suggests that latent relations can at least be partially recovered from word associations and highlights ways in which LLMs could be improved and human annotation protocols could be adapted to reduce coding ambiguity. + 2024.cogalex-1.8 + de-deyne-etal-2024-gpt + + + What’s in a Name? 
Electrophysiological Differences in Processing Proper Nouns in <fixed-case>M</fixed-case>andarin <fixed-case>C</fixed-case>hinese + Bernard A. J.Jap + Yu-YinHsu + LaviniaSalicchi + Yu XiLi + 79–85 + The current study examines how proper names and common nouns in Chinese are cognitively processed during sentence comprehension. EEG data was recorded when participants were presented with neutral contexts followed by either a proper name or a common noun. Proper names in Chinese often consist of characters that can function independently as words or be combined with other characters to form words, potentially benefiting from the semantic features carried by each character. Using cluster-based permutation tests, we found a larger N400 for common nouns when compared to proper names. Our results suggest that the semantics of characters do play a role in facilitating the processing of proper names. This is consistent with previous behavioral findings on noun processing in Chinese, indicating that common nouns require more cognitive resources to process than proper names. Moreover, our results suggest that proper names are processed differently in alphabetic languages than in Chinese. + 2024.cogalex-1.9 + jap-etal-2024-whats + + + Cross-Linguistic Processing of Non-Compositional Expressions in <fixed-case>S</fixed-case>lavic Languages + IuliiaZaitova + IrinaStenger + Muhammad UmerButt + TaniaAvgustinova + 86–97 + This study focuses on evaluating and predicting the intelligibility of non-compositional expressions within the context of five closely related Slavic languages: Belarusian, Bulgarian, Czech, Polish, and Ukrainian, as perceived by native speakers of Russian. Our investigation employs a web-based experiment where native Russian respondents take part in free-response and multiple-choice translation tasks. 
Based on the previous studies in mutual intelligibility and non-compositionality, we propose two predictive factors for reading comprehension of unknown but closely related languages: 1) linguistic distances, which include orthographic and phonological distances; 2) surprisal scores obtained from monolingual Language Models (LMs). Our primary objective is to explore the relationship of these two factors with the intelligibility scores and response times of our web-based experiment. Our findings reveal that, while intelligibility scores from the experimental tasks exhibit a stronger correlation with phonological distances, LM surprisal scores appear to be better predictors of the time participants invest in completing the translation tasks. + 2024.cogalex-1.10 + zaitova-etal-2024-cross + + + Using Language Models to Unravel Semantic Development in Children’s Use of Perception Verbs + Bramvan Dijk + Max J.van Duijn + LiKloostra + MarcoSpruit + BarendBeekhuizen + 98–106 + In this short paper we employ a Language Model (LM) to gain insight into how complex semantics of a Perception Verb (PV) emerge in children. Using a Dutch LM as representation of mature language use, we find that for all ages 1) the LM accurately predicts PV use in children’s freely-told narratives; 2) children’s PV use is close to mature use; 3) complex PV meanings with attentional and cognitive aspects can be found. Our approach illustrates how LMs can be meaningfully employed in studying language development, hence takes a constructive position in the debate on the relevance of LMs in this context. + 2024.cogalex-1.11 + van-dijk-etal-2024-using + + + Representing Abstract Concepts with Images: An Investigation with Large Language Models + LudovicaCerini + AlessandroBondielli + AlessandroLenci + 107–113 + Multimodal metaphorical interpretation of abstract concepts has always been a debated problem in many research fields, including cognitive linguistics and NLP. 
With the dramatic improvements of Large Language Models (LLMs) and the increasing attention toward multimodal Vision-Language Models (VLMs), there has been pronounced attention on the conceptualization of abstracts. Nevertheless, a systematic scientific investigation is still lacking. This work introduces a framework designed to shed light on the indirect grounding mechanisms that anchor the meaning of abstract concepts to concrete situations (e.g. ability - a person skating), following the idea that abstracts acquire meaning from embodied and situated simulation. We assessed human and LLMs performances by a situation generation task. Moreover, we assess the figurative richness of images depicting concrete scenarios, via a text-to-image retrieval task performed on LAION-400M. + 2024.cogalex-1.12 + cerini-etal-2024-representing + + + Big-Five Backstage: A Dramatic Dataset for Characters Personality Traits & Gender Analysis + Vadim A.Porvatov + CarloStrapparava + MarinaTiuleneva + 114–119 + This paper introduces a novel textual dataset comprising fictional characters’ lines with annotations based on their gender and Big-Five personality traits. Using psycholinguistic findings, we compared texts attributed to fictional characters and real people with respect to their genders and personality traits. Our results indicate that imagined personae mirror most of the language categories observed in real people while demonstrating them in a more expressive manner. + 2024.cogalex-1.13 + porvatov-etal-2024-big + + + Interaction of Semantics and Morphology in <fixed-case>R</fixed-case>ussian Word Vectors + YuliaZinova + Rubenvan de Vijver + AnastasiaYablokova + 120–128 + In this paper we explore how morphological information can be extracted from fastText embeddings for Russian nouns. 
We investigate the negative effects of syncretism and propose ways of modifying the vectors that can help to find better representations for morphological functions and thus for out-of-vocabulary words. In particular, we look at the effect of analysing shift vectors instead of original vectors, discuss various possibilities of finding base forms to create shift vectors, and show that using only the high-frequency data is beneficial when looking for structure with respect to the morphosyntactic functions in the embeddings. + 2024.cogalex-1.14 + zinova-etal-2024-interaction + + + Listen, Repeat, Decide: Investigating Pronunciation Variation in Spoken Word Recognition among <fixed-case>R</fixed-case>ussian Speakers + Vladislav IvanovichZubov + ElenaRiekhakaynen + 129–132 + Variability is one of the important features of natural speech and a challenge for spoken word recognition models and automatic speech recognition systems. We conducted two preliminary experiments aimed at finding out whether native Russian speakers regard differently certain types of pronunciation variation when the variants are equally possible according to orthoepic norms. In the first experiment, the participants had to repeat the words with three different types of pronunciation variability. In the second experiment, we focused on the assessment of words with variable and only one standard stress. Our results support the hypothesis that listeners pay the most attention to words with variable stress, less to the variability of soft and hard consonants, and even less to the presence / absence of /j/. Assessing the correct pronunciation of words with variable stress takes significantly more time than assessing words which have only one correct pronunciation variant. These preliminary results show that pronunciation variants can provide new evidence on how a listener accesses the mental lexicon during natural speech processing and chooses among the variants stored in it. 
+ 2024.cogalex-1.15 + zubov-riekhakaynen-2024-listen + + + The Mental Lexicon of Communicative Fragments and Contours: The Remix N-gram Method + EmeseK. Molnár + AndreaDömötör + 133–139 + The classical mental lexicon models represented the lexicon as a list of words. Usage-based models describe the mental lexicon more dynamically, but they do not capture the real-time operation of speech production. In the linguistic model of Boris Gasparov, the notions of communicative fragment and contour can provide a comprehensive description of the diversity of linguistic experience. Fragments and contours form larger linguistic structures than words and they are recognized as a whole unit by speakers through their communicative profile. Fragments are prefabricated units that can be added to or merged with each other during speech production. The contours serve as templates for the utterances by combining specific and abstract linguistic elements. Based on this theoretical framework, our tool applies remix n-grams (combination of word forms, lemmas and POS-tags) to identify similar linguistic structures in different texts that form the basic units of the mental lexicon. + 2024.cogalex-1.16 + k-molnar-domotor-2024-mental + + + Three Studies on Predicting Word Concreteness with Embedding Vectors + MichaelFlor + 140–150 + Human-assigned concreteness ratings for words are commonly used in psycholinguistic and computational linguistic studies. Previous research has shown that such ratings can be modeled and extrapolated by using dense word-embedding representations. However, due to rater disagreement, considerable amounts of human ratings in published datasets are not reliable. We investigate how such unreliable data influences modeling of concreteness with word embeddings. Study 1 compares fourteen embedding models over three datasets of concreteness ratings, showing that most models achieve high correlations with human ratings, and exhibit low error rates on predictions. 
Study 2 investigates how exclusion of the less reliable ratings influences the modeling results. It indicates that improved results can be achieved when data is cleaned. Study 3 adds additional conditions over those of study 2 and indicates that the improved results hold only for the cleaned data, and that in the general case removing the less reliable data points is not useful. + 2024.cogalex-1.17 + flor-2024-three + + + Combining Neo-Structuralist and Cognitive Approaches to Semantics to Build Wordnets for Ancient Languages: Challenges and Perspectives + EricaBiagetti + MartinaGiuliani + SilviaZampetta + SilviaLuraghi + ChiaraZanchi + 151–161 + This paper addresses challenges encountered in constructing lexical databases, specifically WordNets, for three ancient Indo-European languages: Ancient Greek, Latin, and Sanskrit. The difficulties partly arise from adapting concepts and methodologies designed for modern languages to the construction of lexical resources for ancient ones. A further significant challenge arises from the goal of creating WordNets that not only adhere to a neo-structuralist relational view of meaning but also integrate Cognitive Semantics concepts, aiming for a more realistic representation of meaning. This integration is crucial for facilitating studies in diachronic semantics and lexicology, and representing meaning in such a nuanced manner becomes paramount when constructing language resources for theoretical research, rather than for applied tasks, as is the case with lexical resources for ancient languages. The paper delves into these challenges through a case study focused on the TEMPERATURE conceptual domain in the three languages. It outlines difficulties in distinguishing prototypical and non-prototypical senses, literal and non-literal ones, and, within non-literal meanings, between metaphorical and metonymic ones. 
Solutions adopted to address these challenges are presented, highlighting the necessity of achieving maximum granularity in meaning representation while maintaining a sustainable workflow for annotators. + 2024.cogalex-1.18 + biagetti-etal-2024-combining + + + <fixed-case>S</fixed-case>ensory<fixed-case>T</fixed-case>5: Infusing Sensorimotor Norms into T5 for Enhanced Fine-grained Emotion Classification + YuhanXia + QingqingZhao + YunfeiLong + GeXu + JiaWang + 162–174 + Sensory perception and emotion classification have traditionally been considered separate domains. Yet, the significant influence of sensory experiences on emotional responses is undeniable. The natural language processing (NLP) community has often missed the opportunity to merge sensory knowledge with emotion classification. To address this gap, we propose SensoryT5, a neurocognitive approach that integrates sensory information into the T5 (Text-to-Text Transfer Transformer) model, designed specifically for fine-grained emotion classification. This methodology incorporates sensory cues into the T5’s attention mechanism, enabling a harmonious balance between contextual understanding and sensory awareness. The resulting model amplifies the richness of emotional representations. In rigorous tests across various detailed emotion classification datasets, SensoryT5 showcases improved performance, surpassing both the foundational T5 model and current state-of-the-art works. Notably, SensoryT5’s success signifies a pivotal change in the NLP domain, highlighting the potential influence of neurocognitive data in refining machine learning models’ emotional sensitivity. + 2024.cogalex-1.19 + xia-etal-2024-sensoryt5 + +
+
diff --git a/data/xml/2024.coling.xml b/data/xml/2024.coling.xml new file mode 100644 index 0000000000..9f8310e478 --- /dev/null +++ b/data/xml/2024.coling.xml @@ -0,0 +1,51 @@ + + + + + The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) + Torino, Italia + May, 2024 + + + https://lrec-coling-2024.org + + + 2024.bucc-1 + 2024.cawl-1 + 2024.cl4health-1 + 2024.cogalex-1 + 2024.determit-1 + 2024.delite-1 + 2024.dlnld-1 + 2024.dmr-1 + 2024.ecnlp-1 + 2024.eurali-1 + 2024.finnlp-1 + 2024.games-1 + 2024.htres-1 + 2024.humeval-1 + 2024.isa-1 + 2024.ldl-1 + 2024.legal-1 + 2024.lt4hala-1 + 2024.mathnlp-1 + 2024.mwe-1 + 2024.neusymbridge-1 + 2024.nlperspectives-1 + 2024.osact-1 + 2024.parlaclarin-1 + 2024.politicalnlp-1 + 2024.rail-1 + 2024.rapid-1 + 2024.readi-1 + 2024.rfp-1 + 2024.safety4convai-1 + 2024.sigul-1 + 2024.signlang-1 + 2024.tdle-1 + 2024.trac-1 + 2024.unlp-1 + 2024.wildre-1 + + + diff --git a/data/xml/2024.computel.xml b/data/xml/2024.computel.xml index 2aa634a05e..86b3353050 100644 --- a/data/xml/2024.computel.xml +++ b/data/xml/2024.computel.xml @@ -36,6 +36,7 @@ Blackfoot is challenging for English speaking instructors and learners to acquire because it exhibits unique pitch patterns. This study presents MeTILDA (Melodic Transcription in Language Documentation and Application) as a solution to teaching pitch patterns distinct from English. Specifically, we explore ways to improve data visualization through a visualized pronunciation teaching guide called Pitch Art. The working materials can be downloaded or stored in the cloud for further use and collaboration. These features are aimed to facilitate teachers in developing curriculum for learning pronunciation, and provide students with an interactive and integrative learning environment to better understand Blackfoot language and pronunciation. 2024.computel-1.1 chen-etal-2024-cloud +
Technology and Language Revitalization: A Roadmap for the Mvskoke Language @@ -45,6 +46,7 @@ 2024.computel-1.2 2024.computel-1.2.SupplementaryMaterial.zip mainzinger-2024-technology + Investigating the productivity of Passamaquoddy medials: A computational approach @@ -83,6 +85,7 @@ Descriptive linguistics is a sub-field of linguistics that involves the collection and annotation of language resources to describe linguistic phenomena. The transcription of these resources is often described as a tedious task, and Automatic Speech Recognition (ASR) has frequently been employed to support this process. However, the typical research approach to ASR in documentary linguistics often only captures a subset of the field’s diverse reality. In this paper, we focus specifically on one type of data known as grammaticality judgment elicitation in the context of documenting Kréyòl Gwadloupéyen. We show that only a few minutes of speech is enough to fine-tune a model originally trained in French to transcribe segments in Kréyòl. 2024.computel-1.6 le-ferrand-prudhommeaux-2024-automatic + Fitting a Square Peg into a Round Hole: Creating a <fixed-case>U</fixed-case>ni<fixed-case>M</fixed-case>orph dataset of Kanien’kéha Verbs @@ -111,6 +114,7 @@ We investigate the performance of state-of-the-art neural ASR systems in transcribing audio recordings for Hupa, a critically endangered language of the Hoopa Valley Tribe. We also explore the impact on ASR performance when augmenting a small dataset of gold-standard high-quality transcriptions with a) a larger dataset with transcriptions of lower quality, and b) model-generated transcriptions in a self-training approach.
An evaluation of both data augmentation approaches shows that the self-training approach is competitive, producing better WER scores than models trained with no additional data and not lagging far behind models trained with additional lower-quality manual transcriptions instead: the deterioration in WER score is just 4.85 points when all the additional data is used in experiments with the best-performing system, Wav2Vec. These findings have encouraging implications for the use of ASR systems for transcription and language documentation efforts in the Hupa language. 2024.computel-1.9 venkateswaran-liu-2024-looking + Creating Digital Learning and Reference Resources for <fixed-case>S</fixed-case>outhern <fixed-case>M</fixed-case>ichif @@ -136,6 +140,7 @@ We present MunTTS, an end-to-end text-to-speech (TTS) system specifically for Mundari, a low-resource Indian language of the Austro-Asiatic family. Our work addresses the gap in linguistic technology for underrepresented languages by collecting and processing data to build a speech synthesis system. We begin our study by gathering a substantial dataset of Mundari text and speech and train end-to-end speech models. We also delve into the methods used for training our models, ensuring they are efficient and effective despite the data constraints. We evaluate our system with native speakers and objective metrics, demonstrating its potential as a tool for preserving and promoting the Mundari language in the digital age.
2024.computel-1.11 gumma-etal-2024-muntts + End-to-End Speech Recognition for Endangered Languages of <fixed-case>N</fixed-case>epal diff --git a/data/xml/2024.delite.xml b/data/xml/2024.delite.xml new file mode 100644 index 0000000000..7cb464c124 --- /dev/null +++ b/data/xml/2024.delite.xml @@ -0,0 +1,98 @@ + + + + + Proceedings of the First Workshop on Language-driven Deliberation Technology (DELITE) @ LREC-COLING 2024 + AnnetteHautli-Janisz + GabriellaLapesa + LucasAnastasiou + ValentinGold + Anna DeLiddo + ChrisReed + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.delite-1 + delite + + + 2024.delite-1.0 + delite-2024-language + + + <fixed-case>AQ</fixed-case>u<fixed-case>A</fixed-case> – Combining Experts’ and Non-Experts’ Views To Assess Deliberation Quality in Online Discussions Using <fixed-case>LLM</fixed-case>s + MaikeBehrendt + Stefan SylviusWagner + MarcZiegele + LenaWilms + AnkeStoll + DominiqueHeinbach + StefanHarmeling + 1–12 + Measuring the quality of contributions in political online discussions is crucial in deliberation research and computer science. Research has identified various indicators to assess online discussion quality, and with deep learning advancements, automating these measures has become feasible. While some studies focus on analyzing specific quality indicators, a comprehensive quality score incorporating various deliberative aspects is often preferred. In this work, we introduce AQuA, an additive score that calculates a unified deliberative quality score from multiple indices for each discussion post. Unlike other singular scores, AQuA preserves information on the deliberative aspects present in comments, enhancing model transparency. We develop adapter models for 20 deliberative indices, and calculate correlation coefficients between experts’ annotations and the perceived deliberativeness by non-experts to weight the individual indices when combining them into a single deliberative score. We demonstrate that the AQuA score can be computed easily from pre-trained adapters and aligns well with annotations on other datasets that have not been seen during training. The analysis of experts’ vs. non-experts’ annotations confirms theoretical findings in the social science literature.
+ 2024.delite-1.1 + behrendt-etal-2024-aqua + + + A Unified <fixed-case>LLM</fixed-case>-<fixed-case>KG</fixed-case> Framework to Assist Fact-Checking in Public Deliberation + NikolaosGiarelis + CharalamposMastrokostas + NikosKaracapilidis + 13–19 + Fact-checking plays a crucial role in public deliberation by promoting transparency, accuracy, credibility, and accountability. Aiming to augment the efficiency and adoption of current public deliberation platforms, which mostly rely on the abilities of participants to meaningfully process and interpret the associated content, this paper explores the combination of deep learning and symbolic reasoning. Specifically, it proposes a framework that unifies the capabilities of Large Language Models (LLMs) and Knowledge Graphs (KGs), and reports on an experimental evaluation. This evaluation is conducted through a questionnaire asking users to assess a baseline LLM against the proposed framework, using a series of fact-checking metrics, namely readability, coverage, non-redundancy, and quality. The experimental results are promising and confirm the potential of combining the capabilities of these two technologies in the context of public deliberation and digital democracy. + 2024.delite-1.2 + giarelis-etal-2024-unified + + + Can Text Simplification Help to Increase the Acceptance of <fixed-case>E</fixed-case>-participation? + ReginaStodden + PhillipNguyen + 20–32 + This study investigated the effect of text simplification (with and without artificial intelligence support) and the role of participants (author or reader) on the acceptance of e-participation processes. To this end, a near-realistic experimental study with 276 participants was conducted simulating a participatory budgeting process. The results of our study show, on the one hand, that text simplification and the role of participants have no direct influence on the intention to use e-participation.
Although a higher level of participation cannot be achieved by text simplification, our results also show that no negative consequences for usage intention can be expected from text simplification. On the other hand, the results show that people with reading and writing difficulties prefer text simplification for proposals in e-participation. + 2024.delite-1.3 + stodden-nguyen-2024-text + + + Pitfalls of Conversational <fixed-case>LLM</fixed-case>s on News Debiasing + IpekBaris Schlicht + DefneAltiok + MaryanneTaouk + LucieFlek + 33–38 + This paper addresses debiasing in news editing and evaluates the effectiveness of conversational Large Language Models in this task. We designed an evaluation checklist tailored to news editors’ perspectives, obtained generated texts from three popular conversational models using a subset of a publicly available dataset on media bias, and evaluated the texts according to the designed checklist. Furthermore, we examined the models as evaluators for checking the quality of debiased model outputs. Our findings indicate that none of the LLMs are perfect in debiasing. Notably, some models, including ChatGPT, introduced unnecessary changes that may impact the author’s style and create misinformation. Lastly, we show that the models do not perform as proficiently as domain experts in evaluating the quality of debiased outputs. + 2024.delite-1.4 + baris-schlicht-etal-2024-pitfalls + + + Integrating conflict prevention tools into deliberative democracy online platforms + SaraGreco + ChiaraJermini + 39–44 + This paper presents a set of preliminary guidelines for conflict prevention developed within the EU-funded research project ORBIS (“Augmenting participation, co-creation, trust and transparency in Deliberative Democracy at all scales”), whose goal is developing online platforms that enable citizens to enhance their participation in democratic processes, through open discussions around important political topics.
Based on previous research on communication and argumentation in conflict resolution discourse and on the empirical analysis of discussions around deliberative democracy topics, this paper highlights recurrent interpersonal communication problems that might occur in group discussions around complex topics and that, if not handled well, can lead to conflicts; and introduces a first proposal for solutions to help, both through technology and with the assistance of human moderators, participants in such discussions to avoid the development and the escalation of conflicts. + 2024.delite-1.5 + greco-jermini-2024-integrating + + + A Hybrid Human-<fixed-case>AI</fixed-case> Approach for Argument Map Creation From Transcripts + LucasAnastasiou + AnnaDe Liddo + 45–51 + In order to overcome challenges of traditional deliberation approaches that often silo information exchange between synchronous and asynchronous modes, thereby hindering effective deliberation, we present a hybrid framework combining Large Language Models (LLMs) and human-in-the-loop curation to generate argument maps from deliberation transcripts. This approach aims to enhance the efficiency and quality of the generated argument maps, promote transparency, and connect the asynchronous and synchronous deliberation modes. Finally, we outline a realistic deliberation scenario where this process can be successfully integrated. + 2024.delite-1.6 + anastasiou-de-liddo-2024-hybrid + + + Leveraging High-Precision Corpus Queries for Text Classification via Large Language Models + NathanDykes + StephanieEvert + PhilippHeinrich + MerlinHumml + LutzSchröder + 52–57 + We use query results from manually designed corpus queries for fine-tuning an LLM to identify argumentative fragments as a text mining task. The resulting model outperforms both an LLM fine-tuned on a relatively large manually annotated gold standard of tweets and a rule-based approach.
This proof-of-concept study demonstrates the usefulness of corpus queries to generate training data for complex text categorisation tasks, especially if the targeted category has low prevalence (so that a manually annotated gold standard contains only a small number of positive examples). + 2024.delite-1.7 + dykes-etal-2024-leveraging + +
+
diff --git a/data/xml/2024.determit.xml b/data/xml/2024.determit.xml new file mode 100644 index 0000000000..deccd9099d --- /dev/null +++ b/data/xml/2024.determit.xml @@ -0,0 +1,204 @@ + + + + + Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024 + Giorgio Maria DiNunzio + FedericaVezzani + LianaErmakova + HoseinAzarbonyad + JaapKamps + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.determit-1 + determit + ws + + + 2024.determit-1.0 + determit-2024-determit + + + Reproduction of <fixed-case>G</fixed-case>erman Text Simplification Systems + ReginaStodden + 1–15 + The paper investigates the reproducibility of various approaches to automatically simplify German texts and identifies key challenges in the process. We reproduce eight sentence simplification systems including rules-based models, fine-tuned models, and prompting of autoregressive models. We highlight three main issues of reproducibility: the impossibility of reproduction due to missing details, code, or restricted access to data/models; variations in reproduction, hindering meaningful comparisons; and discrepancies in evaluation scores between reported and reproduced models. To enhance reproducibility and facilitate model comparison, we recommend the publication of model-related details, including checkpoints, code, and training methodologies. Our study also emphasizes the importance of releasing system generations, when possible, for thorough analysis and better understanding of original works. In our effort to compare reproduced models, we also create a German sentence simplification benchmark of the eight models across six test sets. Overall, the study underscores the significance of transparency, documentation, and diverse training data for advancing reproducibility and meaningful model comparison in automated German text simplification. + 2024.determit-1.1 + stodden-2024-reproduction + + + Complexity-Aware Scientific Literature Search: Searching for Relevant and Accessible Scientific Text + LianaErmakova + JaapKamps + 16–26 + Abstract: We conduct a series of experiments on ranking scientific abstracts in response to popular science queries issued by non-expert users. We show that standard IR ranking models optimized on topical relevance are indeed ignoring the individual user’s context and background knowledge. 
We also demonstrate the viability of complexity-aware retrieval models that retrieve more accessible relevant documents or ensure these are ranked prior to more advanced documents on the topic. More generally, our results help remove some of the barriers to consulting scientific literature by non-experts and hold the potential to promote science literacy in the general public. Lay Summary: In a world of misinformation and disinformation, access to objective evidence-based scientific information is crucial. The general public ignores scientific information due to its perceived complexity, resorting to shallow information on the web or in social media. We analyze the complexity of scientific texts retrieved for a layperson’s topic, and find a great variation in text complexity. A proof-of-concept complexity-aware search engine is able to retrieve both relevant and accessible scientific information for a layperson’s information need. + 2024.determit-1.2 + ermakova-kamps-2024-complexity + + + Beyond Sentence-level Text Simplification: Reproducibility Study of Context-Aware Document Simplification + JanBakker + JaapKamps + 27–38 + Previous research on automatic text simplification has focused almost exclusively on sentence-level inputs. However, the simplification of full documents cannot be tackled by naively simplifying each sentence in isolation, as this approach fails to preserve the discourse structure of the document. Recent Context-Aware Document Simplification approaches explore various models whose input goes beyond the sentence level. These models achieve state-of-the-art performance on the Newsela-auto dataset, which requires a difficult-to-obtain license to use. We replicate these experiments on an open-source dataset, namely Wiki-auto, and share all training details to make future reproductions easy. Our results validate the claim that models guided by a document-level plan outperform their standard counterparts.
However, they do not support the claim that simplification models perform better when they have access to a local document context. We also find that planning models do not generalize well to out-of-domain settings. Lay Summary: We have access to unprecedented amounts of information, yet the most authoritative sources may exceed a user’s language proficiency level. Text simplification technology can change the writing style while preserving the main content. Recent paragraph-level and document-level text simplification approaches outcompete traditional sentence-level approaches, and increase the understandability of complex texts. + 2024.determit-1.3 + bakker-kamps-2024-beyond + + + Towards Automatic <fixed-case>F</fixed-case>innish Text Simplification + AnnaDmitrieva + JörgTiedemann + 39–50 + Automatic text simplification (ATS/TS) models typically require substantial parallel training data. This paper describes our work on expanding the Finnish-Easy Finnish parallel corpus and making baseline simplification models. We discuss different approaches to document and sentence alignment. After finding the optimal alignment methodologies, we increase the amount of document-aligned data 6.5 times and add a sentence-aligned version of the dataset consisting of more than twelve thousand sentence pairs. Using sentence-aligned data, we fine-tune two models for text simplification. The first is mBART, a sequence-to-sequence translation architecture proven to show good results for monolingual translation tasks. The second is the Finnish GPT model, for which we utilize instruction fine-tuning. This work is the first attempt to create simplification models for Finnish using monolingual parallel data in this language. The data has been deposited in the Finnish Language Bank (Kielipankki) and is available for non-commercial use, and the models will be made accessible through either Kielipankki or public repositories such as Huggingface or GitHub. 
+ 2024.determit-1.4 + dmitrieva-tiedemann-2024-towards + + + A Multilingual Survey of Recent Lexical Complexity Prediction Resources through the Recommendations of the Complex 2.0 Framework + MatthewShardlow + KaiNorth + MarcosZampieri + 51–59 + Lexical complexity prediction is the NLP task aimed at using machine learning to predict the difficulty of a target word in context for a given user or user group. Multiple datasets exist for lexical complexity prediction, many of which have been published recently in diverse languages. In this survey, we discuss nine recent datasets (2018-2024) all of which provide lexical complexity prediction annotations. Particularly, we identified eight languages (French, Spanish, Chinese, German, Russian, Japanese, Turkish and Portuguese) with at least one lexical complexity dataset. We do not consider the English datasets, which have already received significant treatment elsewhere in the literature. To survey these datasets, we use the recommendations of the Complex 2.0 Framework (Shardlow et al., 2022), identifying how the datasets differ along the following dimensions: annotation scale, context, multiple token instances, multiple token annotations, diverse annotators. We conclude with future research challenges arising from our survey of existing lexical complexity prediction datasets. + 2024.determit-1.5 + shardlow-etal-2024-multilingual + + + Plain Language Summarization of Clinical Trials + PolydorosGiannouris + TheodorosMyridis + TatianaPassali + GrigoriosTsoumakas + 60–67 + Plain language summarization, or lay summarization, is an emerging natural language processing task, aiming to make scientific articles accessible to an audience of non-scientific backgrounds. The healthcare domain can greatly benefit from applications of automatic plain language summarization, as results that concern a large portion of the population are reported in large documents with complex terminology. 
However, existing corpora for this task are limited in scope, usually covering conference or journal article abstracts. In this paper, we introduce the task of automated generation of plain language summaries for clinical trials, and construct CARES (Clinical Abstractive Result Extraction and Simplification), the first corresponding dataset. CARES consists of publicly available, human-written summaries of clinical trials conducted by Pfizer. Source text is identified from documents released throughout the life-cycle of the trial, and steps are taken to remove noise and select the appropriate sections. Experiments show that state-of-the-art models achieve satisfactory results in most evaluation metrics. + 2024.determit-1.6 + giannouris-etal-2024-plain + + + Enhancing Lexical Complexity Prediction through Few-shot Learning with Gpt-3 + Jenny AlexandraOrtiz-Zambrano + César HumbertoEspín-Riofrío + ArturoMontejo-Ráez + 68–76 + This paper describes an experiment to evaluate the ability of the GPT-3 language model to classify terms regarding their lexical complexity. This was achieved through the creation and evaluation of different versions of the model: text-Davinci-002 and text-Davinci-003, and prompts for few-shot learning to determine the complexity of the words. The results obtained on the CompLex dataset achieve a minimum average error of 0.0856. Although this is not better than the state of the art (which is 0.0609), it is a well-performing and promising approach to lexical complexity prediction without the need for model fine-tuning. + 2024.determit-1.7 + ortiz-zambrano-etal-2024-enhancing + + + An Approach towards Unsupervised Text Simplification on Paragraph-Level for <fixed-case>G</fixed-case>erman Texts + LeonFruth + RobinJegan + AndreasHenrich + 77–89 + Text simplification as a research field has received attention in recent years for English and other languages; however, German text simplification techniques are lacking thus far.
We present an unsupervised simplification approach for German texts using reinforcement learning (self-critical sequence training). Our main contributions are the adaptation of an existing method for English, the selection and creation of German corpora for this task, and the customization of rewards for particular aspects of the German language. In our paper, we describe our system and an evaluation, including still-present issues and problems due to the complexity of the German language, as well as directions for future research. + 2024.determit-1.8 + fruth-etal-2024-approach + + + Simplification Strategies in <fixed-case>F</fixed-case>rench Spontaneous Speech + LucíaOrmaechea + NikosTsourakis + DidierSchwab + PierretteBouillon + BenjaminLecouteux + 90–102 + Automatic Text Simplification (ATS) aims at rewriting texts into simpler variants while preserving their original meaning, so they can be more easily understood by different audiences. While ATS has been widely used for written texts, its application to spoken language remains unexplored, even if it is not exempt from difficulty. This study aims to characterize the edit operations performed in order to simplify French transcripts for non-native speakers. To do so, we relied on a data sample randomly extracted from the Orféo-CEFC French spontaneous speech dataset. In the absence of guidelines to direct this process, we adopted an intuitive simplification approach, so as to investigate the crafted simplifications based on expert linguists’ criteria, and to compare them with those produced by a generative AI (namely, ChatGPT). The results, analyzed quantitatively and qualitatively, reveal that the most common edits are deletions, and affect oral production aspects, like restarts or hesitations. Consequently, candidate simplifications are typically register-standardized sentences that solely include the propositional content of the input.
The study also examines the alignment between human- and machine-based simplifications, revealing a moderate level of agreement, and highlighting the subjective nature of the task. The findings contribute to understanding the intricacies of simplifying spontaneous spoken language. In addition, the provision of a small-scale parallel dataset derived from such expert simplifications, Propicto-Orféo-Simple, can facilitate the evaluation of speech simplification solutions. + 2024.determit-1.9 + ormaechea-etal-2024-simplification + + + <fixed-case>DARES</fixed-case>: Dataset for <fixed-case>A</fixed-case>rabic Readability Estimation of School Materials + MoEl-Haj + SultanAlmujaiwel + DamithPremasiri + TharinduRanasinghe + RuslanMitkov + 103–113 + This research introduces DARES, a dataset for assessing the readability of Arabic text in Saudi school materials. DARES comprises 13,335 instances from textbooks used in 2021 and covers two subtasks: (a) coarse-grained readability assessment, where the text is classified into different educational levels such as primary and secondary; (b) fine-grained readability assessment, where the text is classified into individual grades. We fine-tuned five transformer models that support Arabic and found that CAMeLBERTmix performed the best in all input settings. Evaluation results showed high performance for the coarse-grained readability assessment task, achieving a weighted F1 score of 0.91 and a macro F1 score of 0.79. The fine-grained task achieved a weighted F1 score of 0.68 and a macro F1 score of 0.55. These findings demonstrate the potential of our approach for advancing Arabic text readability assessment in education, with implications for future innovations in the field.
+ 2024.determit-1.10 + el-haj-etal-2024-dares + + + Legal Text Reader Profiling: Evidences from Eye Tracking and Surprisal Based Analysis + Calogero J.Scozzaro + DavideColla + MatteoDelsanto + AntonioMastropaolo + EnricoMensa + LuisaRevelli + Daniele P.Radicioni + 114–124 + Reading movements and times are a precious cue to follow a reader’s strategy, and to track the underlying effort in text processing. To date, many approaches are being devised to simplify texts to overcome difficulties stemming from sentences that are obscure, ambiguous, or in need of clarification. In the legal domain, ensuring the clarity of norms and regulations is of the utmost importance, as the full understanding of such documents lies at the foundation of core social obligations and rights. This task requires determining which utterances and text excerpts are difficult for which (sort of) reader. This investigation is the aim of the present work. We propose a preliminary study based on eye-tracking data of 61 readers, with focus on individuating different reader profiles, and on predicting reading times of our readers. + 2024.determit-1.11 + scozzaro-etal-2024-legal + + + The Simplification of the Language of Public Administration: The Case of Ombudsman Institutions + GabrielGonzalez-Delgado + BorjaNavarro-Colorado + 125–133 + Language produced by Public Administrations has crucial implications in citizens’ lives. However, its syntactic complexity and the use of legal jargon, among other factors, make it difficult for laypeople and certain target audiences to understand. The NLP task of Automatic Text Simplification (ATS) can contribute to the necessary simplification of this technical language. For that purpose, specialized parallel datasets of complex-simple pairs need to be developed for the training of these ATS systems.
In this position paper, an on-going project is presented, whose main objectives are (a) to extensively analyze the syntactical, lexical, and discursive features of the language of English-speaking ombudsmen, as samples of public administrative language, with special attention to those characteristics that pose a threat to comprehension, and (b) to develop the OmbudsCorpus, a parallel corpus of complex-simple supra-sentential fragments from ombudsmen’s case reports that have been manually simplified by professionals and annotated with standardized simplification operations. This research endeavor aims to provide a deeper understanding of the simplification process and to enhance the training of ATS systems specialized in administrative texts. + 2024.determit-1.12 + gonzalez-delgado-navarro-colorado-2024-simplification + + + Term Variation in Institutional Languages: Degrees of Specialization in Municipal Waste Management Terminology + NicolaCirillo + DanielaVellutino + 134–140 + Institutional Italian is a variety of Italian used in the official communications of institutions, especially in public administrations. Besides legal and administrative languages, it comprises the language used in websites, social media and advertising material produced by public administrations. To understand the lexical profile of institutional languages completely, standard measures of lexical complexity, like the type-token ratio and the percentage of basic vocabulary, should be complemented with the examination of the terminological variation. This study compares the terminology of three types of institutional texts: administrative acts, technical-operational texts, and informative texts. In particular, we collected 86 terms with various degrees of specialization and analysed their distribution within the subcorpora of ItaIst-DdAC_GRU, a corpus composed of institutional texts drafted by Italian municipalities about municipal waste management. 
Results suggest that administrative acts employ high-specialization terms compliant with the law, often in the form of acronyms. Conversely, informative texts contain more low-specialization terms, privileging single-word terms to remain self-contained. Finally, the terminology of technical-operational texts is characterised by standardized and formulaic phrases. + 2024.determit-1.13 + cirillo-vellutino-2024-term + + + <fixed-case>LARGEMED</fixed-case>: A Resource for Identifying and Generating Paraphrases for <fixed-case>F</fixed-case>rench Medical Terms + IoanaBuhnila + AmaliaTodirascu + 141–151 + This article presents a method extending an existing French corpus of paraphrases of medical terms ANONYMOUS with new data from Web archives created during the Covid-19 pandemic. Our method semi-automatically detects new terms and paraphrase markers introducing paraphrases from these Web archives, followed by a manual annotation step to identify paraphrases and their lexical and semantic properties. The extended large corpus LARGEMED could be used for automatic medical text simplification for patients and their families. To automatise data collection, we propose two experiments. The first experiment uses the new LARGEMED dataset to train a binary classifier aiming to detect new sentences containing possible paraphrases. The second experiment aims to use correct paraphrases to train a model for paraphrase generation, by adapting T5 Language Model to the paraphrase generation task using an adversarial algorithm. 
+ 2024.determit-1.14 + buhnila-todirascu-2024-largemed + + + Clearer Governmental Communication: Text Simplification with <fixed-case>C</fixed-case>hat<fixed-case>GPT</fixed-case> Evaluated by Quantitative and Qualitative Research + NadineBeks van Raaij + DaanKolkman + KseniaPodoynitsyna + 152–178 + This research investigates the application of ChatGPT for the simplification of Dutch government letters, aiming to enhance their comprehensibility without compromising legal accuracy. We use a three-stage mixed method evaluation procedure to compare the performance of a naive approach, RoBERTA, and ChatGPT. We select the six most complicated letters from a corpus of 200 letters and use the three approaches to simplify them. First, we compare their scores on four evaluation metrics (ROUGE, BLEU, BLEURT, and LiNT), then we assess the simplifications with a legal and linguistic expert. Finally we investigate the performance of ChatGPT in a randomized controlled trial with 72 participants. Our findings reveal that ChatGPT significantly improves the readability of government letters, demonstrating over a 20% increase in comprehensibility scores and a 19% increase in correct question answering among participants. We also demonstrate the importance of a robust evaluation procedure. + 2024.determit-1.15 + beks-van-raaij-etal-2024-clearer + + + Legal Science and Compute Science: A Preliminary Discussions on How to Represent the “Penumbra” Cone with <fixed-case>AI</fixed-case> + AngelaCondello + Giorgio MariaDi Nunzio + 179–184 + Legal science encounters significant challenges with the widespread integration of AI software across various legal operations. The distinction between signs, senses, and references from a linguistic point of view, as drawn by Gottlob Frege, underscores the complexity of legal language, especially in multilingual contexts like the European Union. 
In this paper, we describe the problems of legal terminology, examining the “penumbra” problem through Herbert Hart’s legal theory of meaning. We also analyze the feasibility of training automatic systems to handle conflicts between different interpretations of legal norms, particularly in multilingual legal systems. By examining the transformative impact of Artificial Intelligence on traditional legal practices, this research contributes to the theoretical discussion about the exploration of innovative methodologies for simplifying complex terminologies without compromising meaning. + 2024.determit-1.16 + condello-di-nunzio-2024-legal + + + Simpler Becomes Harder: Do <fixed-case>LLM</fixed-case>s Exhibit a Coherent Behavior on Simplified Corpora? + MiriamAnschütz + EdoardoMosca + GeorgGroh + 185–195 + Text simplification seeks to improve readability while retaining the original content and meaning. Our study investigates whether pre-trained classifiers also maintain such coherence by comparing their predictions on both original and simplified inputs. We conduct experiments using 11 pre-trained models, including BERT and OpenAI’s GPT 3.5, across six datasets spanning three languages. Additionally, we conduct a detailed analysis of the correlation between prediction change rates and simplification types/strengths. Our findings reveal alarming inconsistencies across all languages and models. If not promptly addressed, simplified inputs can be easily exploited to craft zero-iteration model-agnostic adversarial attacks with success rates of up to 50%. + 2024.determit-1.17 + anschutz-etal-2024-simpler + + + Pre-Gamus: Reducing Complexity of Scientific Literature as a Support against Misinformation + NicoColic + Jin-DongKim + FabioRinaldi + 196–201 + Scientific literature encodes a wealth of knowledge relevant to various users. However, the complexity of scientific jargon makes it inaccessible to all but domain specialists. 
It would be helpful for different types of people to be able to get at least a gist of a paper. Biomedical practitioners often find it difficult to keep up with the information load; but even lay people would benefit from scientific information, for example to dispel medical misconceptions. Besides, in many countries, familiarity with English is limited, let alone scientific English, even among professionals. All this points to the need for simplified access to the scientific literature. We thus present an application aimed at solving this problem, which is capable of summarising scientific text in a way that is tailored to specific types of users, and in their native language. For this objective, we used an LLM that our system queries using user-selected parameters. We conducted an informal evaluation of this prototype using a questionnaire in 3 different languages. + 2024.determit-1.18 + colic-etal-2024-pre + +
+
diff --git a/data/xml/2024.dlnld.xml b/data/xml/2024.dlnld.xml new file mode 100644 index 0000000000..68a2ac7460 --- /dev/null +++ b/data/xml/2024.dlnld.xml @@ -0,0 +1,104 @@ + + + + + Proceedings of the Workshop on Deep Learning and Linked Data (DLnLD) @ LREC-COLING 2024 + GillesSérasset + Hugo GonçaloOliveira + Giedre ValunaiteOleskeviciene + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.dlnld-1 + dlnld + ws + + + 2024.dlnld-1.0 + dlnld-2024-deep + + + Investigating the Impact of Different Graph Representations for Relation Extraction with Graph Neural Networks + MoritzBlum + GennaroNolano + BasilEll + PhilippCimiano + 1–13 + Graph Neural Networks (GNNs) have been applied successfully to various NLP tasks, particularly Relation Extraction (RE). Even though most of these approaches rely on the syntactic dependency tree of a sentence to derive a graph representation, the impact of this choice compared to other possible graph representations has not been evaluated. We examine the effect of representing text through a graph of different graph representations for GNNs that are applied to RE, considering, e.g., a fully connected graph of tokens, of semantic role structures, and combinations thereof. We further examine the impact of background knowledge injection from Knowledge Graphs (KGs) into the graph representation to achieve enhanced graph representations. Our results show that combining multiple graph representations can improve the model’s predictions. Moreover, the integration of background knowledge positively impacts scores, as enhancing the text graphs with Wikidata features or WordNet features can lead to an improvement of close to 0.1 points in F1.
To address this gap, we present TaxoCritic, a novel approach that leverages deep multi-critic RL agents for taxonomy induction while incorporating credit assignment mechanisms. Our system uniquely assesses different sub-actions within the induction process, providing a granular analysis that aids in the precise attribution of credit and blame. We evaluate the effectiveness of multi-critic algorithms in experiments regarding both accuracy and robustness performance in edge identification. By providing a detailed comparison with state-of-the-art models and highlighting the strengths and limitations of our method, we aim to contribute to the ongoing + 2024.dlnld-1.2 + sarhan-etal-2024-taxocritic + + + Combining Deep Learning Models and Lexical Linked Data: Some Insights from the Development of a Multilingual News Named Entity Recognition and Linking Dataset + EmmanuelCartier + EmilePeetermans + 31–44 + This paper presents the methodology and outcomes of a Named Entity Recognition and Linking multilingual news benchmark that leverages both deep learning approaches, by using a fine-tuned transformer model to detect mentions of persons, locations and organisations in text, and Linguistic Linked Open Data, through the use of Wikidata to disambiguate mentions and link them to ontology entries. It shows all the advantages of combining both approaches, not only for building the benchmark but also for fine-tuning detection models. We also outline several research perspectives for improving the accuracy of a combined system and going further in leveraging the complementary approaches.
+ 2024.dlnld-1.3 + cartier-peetermans-2024-combining + + + Deductive Verification of <fixed-case>LLM</fixed-case> Generated <fixed-case>SPARQL</fixed-case> Queries + AlexandreRademaker + GuilhermeLima + Sandro RamaFiorini + Viviane Torresda Silva + 45–52 + Considering the increasing applications of Large Language Models (LLMs) to many natural language tasks, this paper presents preliminary findings on developing a verification component for detecting hallucinations of an LLM that produces SPARQL queries from natural language questions. We suggest a logic-based deductive verification of the generated SPARQL query by checking if the original NL question’s deep semantic representation entails the SPARQL’s semantic representation. + 2024.dlnld-1.4 + rademaker-etal-2024-deductive + + + How to Turn Card Catalogs into <fixed-case>LLM</fixed-case> Fodder + Mary AnnTan + ShufanJiang + HaraldSack + 53–65 + Bibliographical metadata collections describing pre-modern objects suffer from incompleteness and inaccuracies. This hampers the identification of literary works. In addition, titles often contain voluminous descriptive texts that do not adhere to contemporary title conventions. This paper explores several NLP approaches where greater textual length in titles is leveraged to enhance descriptive information. + 2024.dlnld-1.5 + tan-etal-2024-turn + + + Evaluating Large Language Models for Linguistic Linked Data Generation + Maria Piadi Buono + BlerinaSpahiu + VerginicaBarbu Mititelu + 66–75 + Large language models (LLMs) have revolutionized human-machine interaction with their ability to converse and perform various language tasks. This study investigates the potential of LLMs for knowledge formalization using well-defined vocabularies, specifically focusing on OntoLex-Lemon. 
As a preliminary exploration, we test four languages (English, Italian, Albanian, Romanian) and analyze the formalization quality of nine words with varying characteristics applying a multidimensional evaluation approach. While manual validation provided initial insights, it highlights the need for developing scalable evaluation methods for future large-scale experiments. This research aims to initiate a discussion on the potential and challenges of utilizing LLMs for knowledge formalization within the Semantic Web framework. + 2024.dlnld-1.6 + di-buono-etal-2024-evaluating + + + Towards Automated Evaluation of Knowledge Encoded in Large Language Models + Bruno Carlos LuísFerreira + CatarinaSilva + HugoGonçalo Oliveira + 76–85 + Large Language Models (LLMs) have a significant user base and are gaining increasing interest and impact across various domains. Given their expanding influence, it is crucial to implement appropriate guardrails or controls to ensure ethical and responsible use. In this paper, we propose to automate the evaluation of the knowledge stored in LLMs. This is achieved by generating datasets tailored for this specific purpose, in any selected domain. Our approach consists of four major steps: (i) extraction of relevant entities; (ii) gathering of domain properties; (iii) dataset generation; and (iv) model evaluation. In order to materialize this vision, tools and resources were experimented for entity linking, knowledge acquisition, classification and prompt generation, yielding valuable insights and lessons. The generation of datasets for domain specific model evaluation has successfully proved that the approach can be a future tool for evaluating and moving LLMs “black-boxes” to human-interpretable knowledge bases. 
+ 2024.dlnld-1.7 + ferreira-etal-2024-towards + + + Self-Evaluation of Generative <fixed-case>AI</fixed-case> Prompts for Linguistic Linked Open Data Modelling in Diachronic Analysis + FlorentinaArmaselu + ChayaLiebeskind + GiedreValunaite Oleskeviciene + 86–91 + This article addresses the question of evaluating generative AI prompts designed for specific tasks such as linguistic linked open data modelling and refining of word embedding results. The prompts were created to assist the pre-modelling phase in the construction of LLODIA, a linguistic linked open data model for diachronic analysis. We present a self-evaluation framework based on the method known in literature as LLM-Eval. The discussion includes prompts related to the RDF-XML conception of the model, and neighbour list refinement, dictionary alignment and contextualisation for the term revolution in French, Hebrew and Lithuanian, as a proof of concept. + 2024.dlnld-1.8 + armaselu-etal-2024-self + +
+
diff --git a/data/xml/2024.dmr.xml b/data/xml/2024.dmr.xml new file mode 100644 index 0000000000..ed29b4041a --- /dev/null +++ b/data/xml/2024.dmr.xml @@ -0,0 +1,201 @@ + + + + + Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024 + ClaireBonial + JuliaBonn + Jena D.Hwang + ELRA and ICCL +
Torino, Italia
+ May + 2024 + 2024.dmr-1 + dmr + ws + + + 2024.dmr-1.0 + dmr-2024-international + + + <fixed-case>P</fixed-case>rop<fixed-case>B</fixed-case>ank-Powered Data Creation: Utilizing Sense-Role Labelling to Generate Disaster Scenario Data + Mollie FrancesShichman + ClaireBonial + Taylor A.Hudson + AustinBlodgett + FrancisFerraro + RachelRudinger + 1–10 + For human-robot dialogue in a search-and-rescue scenario, a strong knowledge of the conditions and objects a robot will face is essential for effective interpretation of natural language instructions. In order to utilize the power of large language models without overwhelming the limited storage capacity of a robot, we propose PropBank-Powered Data Creation. PropBank-Powered Data Creation is an expert-in-the-loop data generation pipeline which creates training data for disaster-specific language models. We leverage semantic role labeling and Rich Event Ontology resources to efficiently develop seed sentences for fine-tuning a smaller, targeted model that could operate onboard a robot for disaster relief. We developed 32 sentence templates, which we used to make 2 seed datasets of 175 instructions for earthquake search and rescue and train derailment response. We further leverage our seed datasets as evaluation data to test our baseline fine-tuned models. + 2024.dmr-1.1 + shichman-etal-2024-propbank + + + Aspect Variability and the Annotation of Aspect in the <fixed-case>IMAGACT</fixed-case> Ontology of Action + MassimoMoneglia + RossellaVarvara + 11–19 + This paper highlights some theoretical and quantitative issues related to the representation and annotation of aspectual meaning in the IMAGACT corpus-based multimodal ontology of action. 
Given the multimodal nature of this ontology, in which actions are represented through both prototypical visual scenes and linguistic captions, the annotation of aspect in this resource allows us to draw some important considerations about the relation between aspectual meaning and eventualities. The annotation procedure is reported and quantitative data show that, both in the English and Italian corpora, many verbs present aspectual variation, and many eventualities can be represented by locally equivalent verbs with different aspect. The reason why verb aspectual class may vary is investigated. Our analysis makes once more evident that verbs may vary their aspectual properties with respect not only to their argument structure but, more precisely, to the inner qualities of the eventualities they express. Crucially, when eventualities are expressed by equivalent verbs with different aspectual properties, the verbs put on focus different parts of the structure of the eventuality. + 2024.dmr-1.2 + moneglia-varvara-2024-aspect + + + <fixed-case>N</fixed-case>o<fixed-case>VR</fixed-case>ol: A semantic role lexicon of <fixed-case>N</fixed-case>orwegian verbs + HenrikTorgersen + Erlend Ø.Ravnanger + LarsHellan + DagHaug + 20–29 + In this paper, we describe NoVRol, a semantic role lexicon of Norwegian verbs. We start from the NorVal valency lexicon, which describes the syntactic frames of 7.400 verbs. We then enrich each of these frames by annotating, based on the VerbNet annotation scheme, each argument of the verb with the semantic role that it gets. We also encode the syntactic roles of the arguments based on the UD annotation scheme. 
Our resource will facilitate future research on Norwegian verbs, and can at a future stage be expanded to a full VerbNet. + 2024.dmr-1.3 + torgersen-etal-2024-novrol + + + Expanding <fixed-case>R</fixed-case>ussian <fixed-case>P</fixed-case>rop<fixed-case>B</fixed-case>ank: Challenges and Insights for Developing New <fixed-case>SRL</fixed-case> Resources + SkatjeMyers + RomanKhamov + AdamPollins + RebekahTozier + OlgaBabko-Malaya + MarthaPalmer + 30–38 + Semantic role labeling (SRL) resources, such as Proposition Bank (PropBank), provide useful input to downstream applications. In this paper we present some challenges and insights we learned while expanding the previously developed Russian PropBank. This new effort involved annotation and adjudication of all predicates within a subset of the prior work in order to provide a test corpus for future applications. We discuss a number of new issues that arose while developing our PropBank for Russian as well as our solutions. Framing issues include: distinguishing between morphological processes that warrant new frames, differentiating between modal verbs and predicate verbs, and maintaining accurate representations of a given language’s semantics. Annotation issues include disagreements derived from variability in Universal Dependency parses and semantic ambiguity within the text. Finally, we demonstrate how Russian sentence structures reveal inherent limitations to PropBank’s ability to capture semantic data. These discussions should prove useful to anyone developing a PropBank or similar SRL resources for a new language. + 2024.dmr-1.4 + myers-etal-2024-expanding + + + Unveiling Semantic Information in Sentence Embeddings + LeixinZhang + DavidBurian + VojtěchJohn + OndřejBojar + 39–47 + This study evaluates the extent to which semantic information is preserved within sentence embeddings generated from state-of-the-art sentence embedding models: SBERT and LaBSE.
Specifically, we analyzed 13 semantic attributes in sentence embeddings. Our findings indicate that some semantic features (such as tense-related classes) can be decoded from the representation of sentence embeddings. Additionally, we discover the limitation of the current sentence embedding models: inferring meaning beyond the lexical level has proven to be difficult. + 2024.dmr-1.5 + zhang-etal-2024-unveiling + + + A Quantum Theory of Terms and New Challenges to Meaning Representation of Quanterms + DiegoBurgos + 48–53 + This article discusses the challenges to meaning representation of terms posed by a quantum theory of terms (QTT) that was recently reported. We first summarize this theory and then highlight the difficulties of representing quanterms, which is the name we coined for the view that the QTT has of terms as quantum systems by analogy with quantum objects in quantum mechanics. We briefly summarize the representation practices followed to date to record and represent terminology. We use findings reported in the literature to model both terms and quanterms and found that current representations of terms in specialized repositories are collapsed quanterms at the expense of other states of the original quanterm. In this work, both quanterms and collapsed quanterms are mathematically modelled following formulations used in quantum mechanics. These formulations suggest that representations of quanterms need to include information about the probabilities of quanterm states and the role they play in the entanglement of terms for phenomena such as specialized collocations. + 2024.dmr-1.6 + burgos-2024-quantum + + + <fixed-case>VOLARE</fixed-case> - Visual Ontological <fixed-case>LA</fixed-case>nguage <fixed-case>RE</fixed-case>presentation + WernerWiniwarter + 54–65 + In this paper, we introduce a novel meaning representation, which is based on AMR but extends it towards a visual ontological representation. 
We visualize concepts by representative images, and roles by emojis. All concepts are identified either by PropBank rolesets, Wikipedia page titles, WordNet synsets, or Wikidata lexeme senses. We have developed a Web-based annotation environment enabled by augmented browsing and interactive diagramming. As first application, we have implemented a multilingual annotation solution by using English as anchor language and comparing it with French and Japanese language versions. Therefore, we have extended our representation by a translation deviation annotation to document the differences between the language versions. The intended user groups are, besides professional translators and interpreters, students of translation, language, and literary studies. We describe a first use case in which we use novels by French authors and compare them with their English and Japanese translations. The main motivation for choosing Japanese is the soaring popularity of Japanese courses at our university and the particular challenges involved with trying to master this language. + 2024.dmr-1.7 + winiwarter-2024-volare + + + <fixed-case>YARN</fixed-case> is All You Knit: Encoding Multiple Semantic Phenomena with Layers + SiyanaPavlova + MaximeAmblard + BrunoGuillaume + 66–76 + In this paper, we present the first version of YARN, a new semantic representation formalism. We propose this new formalism to unify the advantages of logic-based formalisms while retaining direct interpretation, making it widely usable. YARN is rooted in the encoding of different semantic phenomena as separate layers. We begin by presenting a formal definition of the mathematical structure that constitutes YARN. We then illustrate with concrete examples how this structure can be used in the context of semantic representation for encoding multiple phenomena (such as modality, negation and quantification) as layers built on top of a central predicate-argument structure. 
The benefit of YARN is that it allows for the independent annotation and analysis of different phenomena as they are easy to “switch off”. Furthermore, we have explored YARN’s ability to encode simple interactions between phenomena. We wrap up the work presented by a discussion of some of the interesting observations made during the development of YARN so far and outline our extensive future plans for this formalism. + 2024.dmr-1.8 + pavlova-etal-2024-yarn + + + Argument Sharing in Meaning Representation Parsing + MajaBuljan + StephanOepen + LiljaØvrelid + 77–87 + We present a contrastive study of argument sharing across three graph-based meaning representation frameworks, where semantically shared arguments manifest as reentrant graph nodes. For a state-of-the-art graph parser, we observe how parser performance – in terms of output quality – covaries with overall graph complexity, on the one hand, and presence of different types of reentrancies, on the other hand. We identify common linguistic phenomena that give rise to shared arguments, and therefore node reentrancies, through a small-scale and partially automated annotation study and parallel error analysis of actual parser outputs. Our results provide new insights into the distribution of different types of reentrancies in meaning representation graphs for three distinct frameworks, as well as on the effects that these structures have on parser performance, thus suggesting both novel cross-framework generalisations as well as avenues for focussed parser development.
+ 2024.dmr-1.9 + buljan-etal-2024-argument + + + Mapping <fixed-case>P</fixed-case>rop<fixed-case>B</fixed-case>ank Argument Labels to <fixed-case>C</fixed-case>zech Verbal Valency + JanHajič + EvaFučíková + MarketaLopatkova + ZdeňkaUrešová + 88–100 + For many years, there has been attempts to compare predicate-argument labeling schemas between formalism, typically under the dependency assumptions (even if the annotation by these schemas could have been performed on either constituent-based specifications or dependency ones). Given the growing number of resources that link various lexical resources to one another, as well as thanks to parallel annotated corpora (with or without annotation), it is now possible to do more in-depth studies of those correspondences. We present here a high-coverage pilot study of mapping the labeling system used in PropBank (for English) to Czech, which has so far used mainly valency lexicons (in several closely related forms) for annotation projects, under a different level of specification and different theoretical assumptions. The purpose of this study is both theoretical (comparing the argument labeling schemes) and practical (to be able to annotate Czech under the standard UMR specifications). + 2024.dmr-1.10 + hajic-etal-2024-mapping + + + Lexicalized Meaning Representation (<fixed-case>LMR</fixed-case>) + JorgeBaptista + SóniaReis + JoãoDias + PedroSantos + 101–111 + This paper presents an adaptation of the Abstract Meaning Representation (AMR) framework for European Portuguese. This adaptation, referred to as Lexicalized Meaning Representation (LMR), was deemed necessary to address specific challenges posed by the grammar of the language, as well as various linguistic issues raised by the current version of AMR annotation guidelines. Some of these aspects stemmed from the use of a notation similar to AMR to represent real texts from the legal domain, enabling its use in Natural Language Processing (NLP) applications. 
In this context, several aspects of AMR were significantly simplified (e.g., the representation of multi-word expressions, named entities, and temporal expressions), while others were introduced, with efforts made to maintain the representation scheme as compatible as possible with standard AMR notation. + 2024.dmr-1.11 + baptista-etal-2024-lexicalized + + + Adjudicating <fixed-case>LLM</fixed-case>s as <fixed-case>P</fixed-case>rop<fixed-case>B</fixed-case>ank Adjudicators + JuliaBonn + HarishTayyar Madabushi + Jena D.Hwang + ClaireBonial + 112–123 + We evaluate the ability of large language models (LLMs) to provide PropBank semantic role label annotations across different realizations of the same verbs in transitive, intransitive, and middle voice constructions. In order to assess the meta-linguistic capabilities of LLMs as well as their ability to glean such capabilities through in-context learning, we evaluate the models in a zero-shot setting, in a setting where it is given three examples of another verb used in transitive, intransitive, and middle voice constructions, and finally in a setting where it is given the examples as well as the correct sense and roleset information. We find that zero-shot knowledge of PropBank annotation is almost nonexistent. The largest model evaluated, GPT-4, achieves the best performance in the setting where it is given both examples and the correct roleset in the prompt, demonstrating that larger models can ascertain some meta-linguistic capabilities through in-context learning. However, even in this setting, which is simpler than the task of a human in PropBank annotation, the model achieves only 48% accuracy in marking numbered arguments correctly. To ensure transparency and reproducibility, we publicly release our dataset and model responses. 
+ 2024.dmr-1.12 + bonn-etal-2024-adjudicating + + + Extending <fixed-case>V</fixed-case>erb<fixed-case>N</fixed-case>et’s Verb-Specific Features to Enhance Selectional Preferences of Semantic Roles + Susan WindischBrown + 124–130 + This work proposes expanding the thematic role selectional preferences used in the lexical resource VerbNet as a way to increase the available semantic information in the resource, induce semantically-based subclasses for the more generic VerbNet classes, and create new links across classes. The addition of verb-specific features in the latest version of VerbNet provides a means for adding more specific selectional preferences based on the meaning of a class’s individual member verbs. These features could refine both the instantiated class roles and the new implicit roles introduced in VerbNet version 4. We suggest 49 classes that would benefit from 111 verb-specific selectional preferences and explain how they would enhance VerbNet’s semantic representations. + 2024.dmr-1.13 + brown-2024-extending + + + <fixed-case>C</fixed-case>hinese <fixed-case>UMR</fixed-case> annotation: Can <fixed-case>LLM</fixed-case>s help? + HaiboSun + NianwenXue + JinZhao + LiuluYue + YaoSun + KeerXu + JiaweiWu + 131–139 + We explore using LLMs, GPT-4 specifically, to generate draft sentence-level Chinese Uniform Meaning Representations (UMRs) that human annotators can revise to speed up the UMR annotation process. In this study, we use few-shot learning and Think-Aloud prompting to guide GPT-4 to generate sentence-level graphs of UMR. Our experimental results show that compared with annotating UMRs from scratch, using LLMs as a preprocessing step reduces the annotation time by two thirds on average. This indicates that there is great potential for integrating LLMs into the pipeline for complicated semantic annotation tasks. 
+ 2024.dmr-1.14 + sun-etal-2024-chinese + + + Accelerating <fixed-case>UMR</fixed-case> Adoption: Neuro-Symbolic Conversion from <fixed-case>AMR</fixed-case>-to-<fixed-case>UMR</fixed-case> with Low Supervision + Claire BenetPost + Marie C.McGregor + Maria LeonorPacheco + AlexisPalmer + 140–150 + Despite Uniform Meaning Representation’s (UMR) potential for cross-lingual semantics, limited annotated data has hindered its adoption. There are large datasets of English AMRs (Abstract Meaning Representations), but the process of converting AMR graphs to UMR graphs is non-trivial. In this paper we address a complex piece of that conversion process, namely cases where one AMR role can be mapped to multiple UMR roles through a non-deterministic process. We propose a neuro-symbolic method for role conversion, integrating animacy parsing and logic rules to guide a neural network, and minimizing human intervention. On test data, the model achieves promising accuracy, highlighting its potential to accelerate AMR-to-UMR conversion. Future work includes expanding animacy parsing, incorporating human feedback, and applying the method to broader aspects of conversion. This research demonstrates the benefits of combining symbolic and neural approaches for complex semantic tasks. + 2024.dmr-1.15 + post-etal-2024-accelerating + + + The Relative Clauses <fixed-case>AMR</fixed-case> Parsers Hate Most + XiulinYang + NathanSchneider + 151–161 + This paper evaluates how well English Abstract Meaning Representation parsers process an important and frequent kind of Long-Distance Dependency construction, namely, relative clauses (RCs). On two syntactically parsed datasets, we evaluate five AMR parsers at recovering the semantic reentrancies triggered by different syntactic subtypes of relative clauses. Our findings reveal a general difficulty among parsers at predicting such reentrancies, with recall below 64% on the EWT corpus. 
The sequence-to-sequence models (regardless of whether structural biases were included in training) outperform the compositional model. An analysis by relative clause subtype shows that passive subject RCs are the easiest, and oblique and reduced RCs the most challenging, for AMR parsers. + 2024.dmr-1.16 + yang-schneider-2024-relative + + + Gaining More Insight into Neural Semantic Parsing with Challenging Benchmarks + XiaoZhang + ChunliuWang + Rikvan Noord + JohanBos + 162–175 + The Parallel Meaning Bank (PMB) serves as a corpus for semantic processing with a focus on semantic parsing and text generation. Currently, neural parsers and generators achieve excellent performance on the PMB. This might suggest that such semantic processing tasks have by and large been solved. We argue that this is not the case and that performance scores from the past on the PMB are inflated by non-optimal data splits and test sets that are too easy. In response, we introduce several changes. First, instead of the prior random split, we propose a more systematic splitting approach to improve the reliability of the standard test data. Second, in addition to the standard test set, we propose two challenge sets: one with longer texts including discourse structure, and one that addresses compositional generalization. We evaluate five neural models for semantic parsing and meaning-to-text generation. Our results show that model performance declines (in some cases dramatically) on the challenge sets, revealing the limitations of neural models when confronting such challenges. + 2024.dmr-1.17 + zhang-etal-2024-gaining + +
+
diff --git a/data/xml/2024.dravidianlangtech.xml b/data/xml/2024.dravidianlangtech.xml index aedafda436..27d20a311c 100644 --- a/data/xml/2024.dravidianlangtech.xml +++ b/data/xml/2024.dravidianlangtech.xml @@ -31,6 +31,7 @@ Accented speech classification plays a vital role in the advancement of high-quality automatic speech recognition (ASR) technology. For certain applications, like multi-accented speech classification, it is not always viable to obtain data with accent variation, especially for resource-poor languages. This is one of the major reasons that contributes to the underperformance of the speech classification systems. Therefore, in order to handle speech variability in Indian language speaker accents, we propose a few-shot learning paradigm in this study. It learns generic feature embeddings using an encoder from a pre-trained whisper model and a classification head for classification. The model is refined using LLM’s fine-tuning techniques, such as LoRA and QLoRA, for the six Indian English accents in the Indic Accent Dataset. The experimental findings show that the accuracy of the model is greatly increased by the few-shot learning paradigm’s effectiveness combined with LLM’s fine-tuning techniques. In optimal settings, the model’s accuracy can reach 94% when the trainable parameters are set to 5%. 2024.dravidianlangtech-1.1 r-etal-2024-shot +
Neural Machine Translation for <fixed-case>M</fixed-case>alayalam Paraphrase Generation @@ -41,6 +42,7 @@ This study explores four methods of generating paraphrases in Malayalam, utilizing resources available for English paraphrasing and pre-trained Neural Machine Translation (NMT) models. We evaluate the resulting paraphrases using both automated metrics, such as BLEU, METEOR, and cosine similarity, as well as human annotation. Our findings suggest that automated evaluation measures may not be fully appropriate for Malayalam, as they do not consistently align with human judgment. This discrepancy underscores the need for more nuanced paraphrase evaluation approaches especially for highly agglutinative languages. 2024.dravidianlangtech-1.2 varghese-etal-2024-neural + From Dataset to Detection: A Comprehensive Approach to Combating <fixed-case>M</fixed-case>alayalam Fake News @@ -54,6 +56,7 @@ Identifying fake news hidden as real news is crucial to fight misinformation and ensure reliable information, especially in resource-scarce languages like Malayalam. To recognize the unique challenges of fake news in languages like Malayalam, we present a dataset curated specifically for classifying fake news in Malayalam. This fake news is categorized based on the degree of misinformation, marking the first of its kind in this language. Further, we propose baseline models employing multilingual BERT and diverse machine learning classifiers. Our findings indicate that logistic regression trained on LaBSE features demonstrates promising initial performance with an F1 score of 0.3393. However, addressing the significant data imbalance remains essential for further improvement in model accuracy. 2024.dravidianlangtech-1.3 k-etal-2024-dataset + Social Media Fake News Classification Using Machine Learning Algorithm @@ -65,6 +68,7 @@ The rise of social media has facilitated easier communication, information sharing, and current affairs updates. 
However, the prevalence of misleading and deceptive content, commonly referred to as fake news, poses a significant challenge. This paper focuses on the classification of fake news in Malayalam, a Dravidian language, utilizing natural language processing (NLP) techniques. To develop a model, we employed a random forest machine learning method on a dataset provided by a shared task (DravidianLangTech@EACL 2024). When evaluated on the separate test dataset, our developed model achieved a 0.71 macro F1 measure. 2024.dravidianlangtech-1.4 bade-etal-2024-social + Exploring the impact of noise in low-resource <fixed-case>ASR</fixed-case> for <fixed-case>T</fixed-case>amil @@ -74,6 +78,7 @@ The use of deep learning algorithms has resulted in significant progress in automatic speech recognition (ASR). Robust high-accuracy ASR models typically require thousands or tens of thousands of hours of speech data, but even the strongest models tend to fail under noisy conditions. Unsurprisingly, the impact of noise on accuracy is more drastic in low-resource settings. In this paper, we investigate the impact of noise on ASR in a low-resource setting. We explore novel methods for developing noise-robust ASR models using a small dataset for Tamil, a widely-spoken but under-resourced Dravidian language. We add various noises to the audio data to determine the impact of different kinds of noise (e.g., punctuated vs. constant, man-made vs. natural). We also explore which data augmentation methods are better suited to handling different types of noise. Our results show that all noises, regardless of the type, had an impact on ASR performance, and that upgrading the architecture alone could not mitigate the impact of noise. SpecAugment, the most common data augmentation method, was not as helpful as raw data augmentation, in which noise is explicitly added to the training data.
Raw data augmentation enhances ASR performance on both clean data and noise-mixed data. 2024.dravidianlangtech-1.5 lakshminarayanan-prudhommeaux-2024-exploring + <fixed-case>S</fixed-case>et<fixed-case>F</fixed-case>it: A Robust Approach for Offensive Content Detection in <fixed-case>T</fixed-case>amil-<fixed-case>E</fixed-case>nglish Code-Mixed Conversations Using Sentence Transfer Fine-tuning @@ -86,6 +91,7 @@ Code-mixed languages are increasingly prevalent on social media and online platforms, presenting significant challenges in offensive content detection for natural language processing (NLP) systems. Our study explores how effectively the Sentence Transfer Fine-tuning (Set-Fit) method, combined with logistic regression, detects offensive content in a Tamil-English code-mixed dataset. We compare our model’s performance with five other NLP models: Multilingual BERT (mBERT), LSTM, BERT, IndicBERT, and Language-agnostic BERT Sentence Embeddings (LaBSE). Our model, SetFit, outperforms these models in accuracy, achieving an impressive 89.72%, significantly higher than other models. These results suggest the sentence transformer model’s substantial potential for detecting offensive content in codemixed languages. Our study provides valuable insights into the sentence transformer model’s ability to identify various types of offensive material in Tamil-English online conversations, paving the way for more advanced NLP systems tailored to code-mixed languages. 2024.dravidianlangtech-1.6 pannerselvam-etal-2024-setfit + Findings of the First Shared Task on Offensive Span Identification from Code-Mixed <fixed-case>K</fixed-case>annada-<fixed-case>E</fixed-case>nglish Comments @@ -98,6 +104,7 @@ Effectively managing offensive content is crucial on social media platforms to encourage positive online interactions. 
However, addressing offensive content in code-mixed Dravidian languages faces challenges, as current moderation methods focus on flagging entire comments rather than pinpointing specific offensive segments. This limitation stems from a lack of annotated data and accessible systems designed to identify offensive language sections. To address this, our shared task presents a dataset comprising Kannada-English code-mixed social comments, encompassing offensive comments. This paper outlines the dataset, the utilized algorithms, and the results obtained by systems participating in this shared task. 2024.dravidianlangtech-1.7 ravikiran-etal-2024-findings + Findings of the Shared Task on Hate and Offensive Language Detection in <fixed-case>T</fixed-case>elugu Codemixed Text (<fixed-case>HOLD</fixed-case>-<fixed-case>T</fixed-case>elugu)@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024 @@ -112,6 +119,7 @@ This paper examines the submissions of various participating teams to the task on Hate and Offensive Language Detection in Telugu Codemixed Text (HOLD-Telugu) organized as part of DravidianLangTech 2024. In order to identify content containing harmful information in Telugu codemixed social media text, the shared task pushes researchers and academicians to build models. The dataset for the task was created by gathering YouTube comments and annotating them manually. A total of 23 teams participated and submitted their results to the shared task. The rank list was created by assessing the submitted results using the macro F1-score.
2024.dravidianlangtech-1.8 b-etal-2024-findings + Findings of the Shared Task on Multimodal Social Media Data Analysis in <fixed-case>D</fixed-case>ravidian Languages (<fixed-case>MSMDA</fixed-case>-<fixed-case>DL</fixed-case>)@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024 @@ -131,6 +139,7 @@ This paper presents the findings of the shared task on multimodal sentiment analysis, abusive language detection and hate speech detection in Dravidian languages. Through this shared task, researchers worldwide can submit models for three crucial social media data analysis challenges in Dravidian languages: sentiment analysis, abusive language detection, and hate speech detection. The aim is to build models for deriving fine-grained sentiment analysis from multimodal data in Tamil and Malayalam, and identifying abusive and hate content from multimodal data in Tamil. Three modalities make up the multimodal data: text, audio, and video. YouTube videos were gathered to create the datasets for the tasks. Thirty-nine teams took part in the competition. However, only two teams turned in their findings. The macro F1-score was used to assess the submissions. 2024.dravidianlangtech-1.9 b-etal-2024-findings-shared + Overview of Second Shared Task on Sentiment Analysis in Code-mixed <fixed-case>T</fixed-case>amil and <fixed-case>T</fixed-case>ulu @@ -148,6 +157,7 @@ Sentiment Analysis (SA) in Dravidian codemixed text is a hot research area right now. In this regard, the “Second Shared Task on SA in Code-mixed Tamil and Tulu” at DravidianLangTech (EACL-2024) was organized. Two tasks, namely SA in Tamil-English and Tulu-English code-mixed data, make up this shared task. In total, 64 teams registered for the shared task, out of which 19 and 17 systems were received for Tamil and Tulu, respectively. The performance of the systems submitted by the participants was evaluated based on the macro F1-score.
The best method obtained macro F1-scores of 0.260 and 0.584 for code-mixed Tamil and Tulu texts, respectively. 2024.dravidianlangtech-1.10 sambath-kumar-etal-2024-overview + Overview of the Second Shared Task on Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages: <fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech@<fixed-case>EACL</fixed-case> 2024 @@ -168,6 +178,7 @@ The rise of online social media has revolutionized communication, offering users a convenient way to share information and stay updated on current events. However, this surge in connectivity has also led to the proliferation of misinformation, commonly known as fake news. This misleading content, often disguised as legitimate news, poses a significant challenge as it can distort public perception and erode trust in reliable sources. This shared task consists of two subtasks: Task 1 and Task 2. Task 1 aims to classify a given social media text as original or fake. The goal of Task 2, FakeDetect-Malayalam, is to encourage participants to develop effective models capable of accurately detecting and classifying fake news articles in the Malayalam language into different categories like False, Half True, Mostly False, Partly False, and Mostly True. For this shared task, 33 participants submitted their results. 2024.dravidianlangtech-1.11 subramanian-etal-2024-overview + byte<fixed-case>S</fixed-case>ized<fixed-case>LLM</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages - Unleashing the Power of Custom Subword Tokenization with <fixed-case>S</fixed-case>ubword2<fixed-case>V</fixed-case>ec and <fixed-case>B</fixed-case>i<fixed-case>LSTM</fixed-case> @@ -177,6 +188,7 @@ This paper focuses on detecting fake news in resource-constrained languages, particularly Malayalam.
We present a novel framework combining subword tokenization, Sanskrit-transliterated Subword2vec embeddings, and a powerful Bidirectional Long Short-Term Memory (BiLSTM) architecture. Despite using only monolingual Malayalam data, our model excelled in the FakeDetect-Malayalam challenge, ranking 4th. The innovative subword tokenizer achieves a remarkable 200x compression ratio, highlighting its efficiency in minimizing model size without compromising accuracy. Our work facilitates resource-efficient deployment in diverse linguistic landscapes and sparks discussion on the potential of multilingual data augmentation. This research provides a promising avenue for mitigating linguistic challenges in the NLP-driven battle against deceptive content. 2024.dravidianlangtech-1.12 kodali-manukonda-2024-bytesizedllm + Fida @<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: A Novel Approach to Hate Speech Detection Using Distilbert-base-multilingual-cased @@ -190,6 +202,7 @@ In the contemporary digital landscape, social media has emerged as a prominent means of communication and information dissemination, offering a rapid outreach to a broad audience compared to traditional communication methods. Unfortunately, the escalating prevalence of abusive language and hate speech on these platforms has become a pressing issue. Detecting and addressing such content on the Internet has garnered considerable attention due to the significant impact it has on individuals. The advent of deep learning has facilitated the use of pre-trained deep neural network models for text classification tasks. While these models demonstrate high performance, some exhibit a substantial number of parameters. In the DravidianLangTech@EACL 2024 task, we opted for the Distilbert-base-multilingual-cased model, an enhancement of the BERT model that effectively reduces the number of parameters without compromising performance. 
This model was selected based on its exceptional results in the task. Our system achieved a commendable macro F1 score of 0.6369. 2024.dravidianlangtech-1.13 ullah-etal-2024-fida + Selam@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024:Identifying Hate Speech and Offensive Language @@ -211,6 +224,7 @@ This study describes our team’s active participation in the Hate and Offensive Language Detection in Telugu Codemixed Text (HOLDTelugu) shared task, which is an essential component of the DravidianLangTech@EACL 2024 workshop. The ultimate goal of this collaborative work is to push the bounds of hate speech recognition, especially tackling the issues given by codemixed text in Telugu, where English blends smoothly. Our inquiry offers a complete evaluation of the task’s aims, the technique used, and the precise achievements obtained by our team, providing a full insight into our contributions to this crucial linguistic and technical undertaking. 2024.dravidianlangtech-1.15 achamaleh-etal-2024-tewodros + Lidoma@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Identifying Hate Speech in <fixed-case>T</fixed-case>elugu Code-Mixed: A <fixed-case>BERT</fixed-case> Multilingual @@ -223,6 +237,7 @@ Over the past few years, research on hate speech and offensive content identification on social media has been ongoing. Since most people in the world are not native English speakers, unapproved messages are typically sent in code-mixed language. We accomplished collaborative work to identify the language of code-mixed text on social media in order to address the difficulties associated with it in the Telugu language scenario. Specifically, we participated in the shared task on the dataset provided by the DravidianLangTech organizers for the purpose of identifying hate and non-hate content.
The assignment is to classify each sentence in the provided text into two predetermined groups: hate or non-hate. We developed a model in Python and selected a multilingual BERT model to do the given task. Using a train-development data set, we developed a model, which we then tested on test data sets. An average macro F1 score metric was used to measure the model’s performance. For the task, the model reported an average macro F1 of 0.6151. 2024.dravidianlangtech-1.16 zamir-etal-2024-lidoma + Zavira@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024:<fixed-case>T</fixed-case>elugu hate speech detection using <fixed-case>LSTM</fixed-case> @@ -234,6 +249,7 @@ Hate speech is communication, often oral or written, that incites, stigmatizes, or promotes violence or prejudice against individuals or groups based on characteristics such as race, religion, ethnicity, gender, sexual orientation, or other protected characteristics. This usually involves expressions of hostility, contempt, or prejudice and can have harmful social consequences. Among the broader social landscape, an important problem and challenge facing the medical community is related to the impact of people’s verbal expression. These words have a significant and immediate effect on human behavior and psyche. Repeating such phrases can even lead to depression and social isolation. In an attempt to identify and classify these Telugu text samples in the social media domain, our research applies an LSTM model, and the findings of this experiment are summarized in this paper; out of 27 participants, we obtained 8th place with an F1 score of 0.68.
2024.dravidianlangtech-1.17 ahani-etal-2024-zavira + Tayyab@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024:Detecting Fake News in <fixed-case>M</fixed-case>alayalam <fixed-case>LSTM</fixed-case> Approach and Challenges @@ -246,6 +262,7 @@ Global communication has been made easier by the emergence of online social media, but it has also made it easier for “fake news,” or information that is misleading or false, to spread. Since this phenomenon presents a significant challenge, reliable detection techniques are required to discern between authentic and fraudulent content. The primary goal of this study is to identify fake news on social media platforms and in Malayalam-language articles by using an LSTM (Long Short-Term Memory) model. This research explores this approach in tackling the DravidianLangTech@EACL 2024 tasks. Using LSTM networks to differentiate between real and fake content at the comment or post level, Task 1 focuses on classifying social media text. To precisely classify the authenticity of the content, LSTM models are employed, drawing on a variety of sources such as comments on YouTube. Task 2 is dubbed the FakeDetect-Malayalam challenge, wherein Malayalam-language articles with fake news are identified and categorized using LSTM models. In order to successfully navigate the challenges of identifying false information in regional languages, we use an LSTM model. This algorithm seeks to accurately categorize the multiple classes written in Malayalam. In Task 1, the results are encouraging. LSTM models distinguish between original and fake social media content with an impressive macro F1 score of 0.78 when testing. The LSTM model’s macro F1 score of 0.2393 indicates that Task 2 offers a more complex landscape.
This emphasizes the persistent difficulties in LSTM-based fake news detection across various linguistic contexts and the difficulty of correctly classifying fake news within the context of the Malayalam language. 2024.dravidianlangtech-1.18 zamir-etal-2024-tayyab + <fixed-case>IIITDWD</fixed-case>_<fixed-case>SVC</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-2024: Breaking Language Barriers; Hate Speech Detection in <fixed-case>T</fixed-case>elugu-<fixed-case>E</fixed-case>nglish Code-Mixed Text @@ -257,6 +274,7 @@ Social media platforms have become increasingly popular and are utilized for a wide range of purposes, including product promotion, news sharing, accomplishment sharing, and much more. However, it is also employed for defamatory speech, intimidation, and the propagation of untruths about particular groups of people. Further, hateful and offensive posts spread quickly and often have a negative impact on people; it is important to identify and remove them from social media platforms as soon as possible. Over the past few years, research on hate speech detection and offensive content has grown in popularity. One of the many difficulties in identifying hate speech on social media platforms is the use of code-mixed language. The majority of people who use social media typically share their messages in languages with mixed codes, like Telugu–English. To encourage research in this direction, the organizers of DravidianLangTech@EACL-2024 conducted a shared task to identify hateful content in Telugu-English code-mixed text. Our team participated in this shared task, employing three different models: Xlm-Roberta, BERT, and Hate-BERT. In particular, our BERT-based model secured the 14th rank in the competition with a macro F1 score of 0.65. 
2024.dravidianlangtech-1.19 sai-etal-2024-iiitdwd + Beyond Tech@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech2024 : Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages Using Machine Learning @@ -269,6 +287,7 @@ In the digital age, identifying fake news is essential when fake information travels quickly via social media platforms. This project employs machine learning techniques, including Random Forest, Logistic Regression, and Decision Tree, to distinguish between real and fake news. With the rise of news consumption on social media, it becomes essential to authenticate information shared on platforms like YouTube comments. The research emphasizes the need to stop spreading harmful rumors and focuses on authenticating news articles. The proposed model utilizes machine learning and natural language processing, specifically Support Vector Machines, to aggregate and determine the authenticity of news. To address the challenges of detecting fake news, this paper describes the Machine Learning (ML) models submitted to the “Fake News Detection in Dravidian Languages” shared task at DravidianLangTech@EACL 2024: four different models, namely Naive Bayes, Support Vector Machine (SVM), Random Forest, and Decision Tree. 2024.dravidianlangtech-1.20 shanmugavadivel-etal-2024-beyond + <fixed-case>C</fixed-case>ode_<fixed-case>M</fixed-case>akers@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case> 2024 : Sentiment Analysis in Code-Mixed <fixed-case>T</fixed-case>amil using Machine Learning Techniques @@ -280,6 +299,7 @@ The rising importance of sentiment analysis in online community research is addressed in our project, which focuses on the surge of code-mixed writing in multilingual social media.
Targeting sentiments in texts combining Tamil and English, our supervised learning approach, particularly the Decision Tree algorithm, proves essential for effective sentiment classification. Notably, Decision Tree (accuracy: 0.99, macro average F1 score: 0.39) and Random Forest (accuracy: 0.99, macro average F1 score: 0.35) exhibit high accuracy, while SVM (accuracy: 0.78, macro average F1 score: 0.68), Logistic Regression (accuracy: 0.75, macro average F1 score: 0.62), and KNN (accuracy: 0.73, macro average F1 score: 0.26) also demonstrate commendable results. These findings showcase the project’s efficacy, offering promise for linguistic research and technological advancements. Securing the 8th rank emphasizes its recognition in the field. 2024.dravidianlangtech-1.21 shanmugavadivel-etal-2024-code + <fixed-case>IIITDWD</fixed-case>-zk@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-2024: Leveraging the Power of Language Models for Hate Speech Detection in <fixed-case>T</fixed-case>elugu-<fixed-case>E</fixed-case>nglish Code-Mixed Text @@ -291,6 +311,7 @@ Hateful online content is a growing concern, especially for young people. While social media platforms aim to connect us, they can also become breeding grounds for negativity and harmful language. This study tackles this issue by proposing a novel framework called HOLD-Z, specifically designed to detect hate and offensive comments in Telugu-English code-mixed social media content. HOLD-Z leverages a combination of approaches, including three powerful models: LSTM architecture, Zypher, and openchat_3.5. The study highlights the effectiveness of prompt engineering and Quantized Low-Rank Adaptation (QLoRA) in boosting performance. Notably, HOLD-Z secured the 9th place in the prestigious HOLD-Telugu DravidianLangTech@EACL-2024 shared task, showcasing its potential for tackling the complexities of hate and offensive comment classification.
2024.dravidianlangtech-1.22 shaik-etal-2024-iiitdwd + <fixed-case>DLRG</fixed-case>-<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech@<fixed-case>EACL</fixed-case>2024 : Combating Hate Speech in <fixed-case>T</fixed-case>elugu Code-mixed Text on Social Media @@ -324,6 +345,7 @@ In recent years, there has been a persistent focus on developing systems that can automatically identify the hate speech content circulating on diverse social media platforms. This paper describes the team Transformers’ submission to the Caste/Immigration Hate Speech Detection in Tamil shared task by LT-EDI 2024 workshop at EACL 2024. We used an ensemble approach in the shared task, combining various transformer-based pre-trained models using majority voting. The best macro average F1-score achieved was 0.82. We secured the 1st rank in the Caste/Immigration Hate Speech in Tamil shared task. 2024.dravidianlangtech-1.25 singhal-bedi-2024-transformers-dravidianlangtech + Habesha@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Detecting Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages using Deep Learning @@ -335,6 +357,7 @@ This research tackles the issue of fake news by utilizing the RNN-LSTM deep learning method with optimized hyperparameters identified through grid search. The model’s performance in multi-label classification is hindered by unbalanced data, despite its success in binary classification. We achieved a score of 0.82 in the binary classification task, whereas in the multi-class task, the score was 0.32. We suggest incorporating data balancing techniques for researchers who aim to further this task, aiming to improve results in managing a variety of information. 
2024.dravidianlangtech-1.26 yigezu-etal-2024-habesha + <fixed-case>W</fixed-case>ord<fixed-case>W</fixed-case>izards@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024:Fake News Detection in <fixed-case>D</fixed-case>ravidian Languages using Cross-lingual Sentence Embeddings @@ -347,6 +370,7 @@ The proliferation of fake news in digital media has become a significant societal concern, impacting public opinion, trust, and decision-making. This project focuses on the development of machine learning models for the detection of fake news. Leveraging a dataset containing both genuine and deceptive news articles, the proposed models employ natural language processing techniques, feature extraction and classification algorithms. This paper provides a solution to Fake News Detection in Dravidian Languages - DravidianLangTech 2024. There are two subtasks: Task 1 - The goal of this task is to classify a given social media text into original or fake. We propose an approach for this with the help of a supervised machine learning model – SVM (Support Vector Machine). The SVM classifier achieved a macro F1 score of 0.78 on the test data and a rank of 11. Task 2 is classifying fake news articles in the Malayalam language into different categories, namely False, Half True, Mostly False, Partly False and Mostly True. We have used Naive Bayes, which achieved a macro F1-score of 0.3517 on the test data and a rank of 6. 2024.dravidianlangtech-1.27 anbalagan-etal-2024-wordwizards + Sandalphon@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case>2024: Hate and Offensive Language Detection in <fixed-case>T</fixed-case>elugu Code-mixed Text using Transliteration-Augmentation @@ -359,6 +383,7 @@ Hate and offensive language in online platforms pose significant challenges, necessitating automatic detection methods.
Particularly in the case of codemixed text, which is very common in social media, the complexity of this problem increases due to the cultural nuances of different languages. DravidianLangTech-EACL2024 organized a shared task on detecting hate and offensive language for Telugu. To complete this task, this study investigates the effectiveness of transliteration-augmented datasets for Telugu code-mixed text. In this work, we compare the performance of various machine learning (ML), deep learning (DL), and transformer-based models on both original and augmented datasets. Experimental findings demonstrate the superiority of transformer models, particularly Telugu-BERT, achieving the highest f_1-score of 0.77 on the augmented dataset, ranking the 1^{st} position in the leaderboard. The study highlights the potential of transliteration-augmented datasets in improving model performance and suggests further exploration of diverse transliteration options to address real-world scenarios. 2024.dravidianlangtech-1.28 tabassum-etal-2024-sandalphon + <fixed-case>CUET</fixed-case>_<fixed-case>B</fixed-case>inary_<fixed-case>H</fixed-case>ackers@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech <fixed-case>EACL</fixed-case>2024: Fake News Detection in <fixed-case>M</fixed-case>alayalam Language Leveraging Fine-tuned <fixed-case>M</fixed-case>u<fixed-case>RIL</fixed-case> <fixed-case>BERT</fixed-case> @@ -374,6 +399,7 @@ Due to technological advancements, various methods have emerged for disseminating news to the masses. The pervasive reach of news, however, has given rise to a significant concern: the proliferation of fake news. In response to this challenge, a shared task in DravidianLangTech EACL2024 was initiated to detect fake news and classify its types in the Malayalam language. The shared task consisted of two sub-tasks. Task 1 focused on a binary classification problem, determining whether a piece of news is fake or not.
Task 2 delved into a multi-class classification problem, categorizing news into five distinct levels. Our approach involved the exploration of various machine learning (RF, SVM, XGBoost, Ensemble), deep learning (BiLSTM, CNN), and transformer-based models (MuRIL, Indic-SBERT, m-BERT, XLM-R, Distil-BERT) by emphasizing parameter tuning to enhance overall model performance. As a result, we introduce a fine-tuned MuRIL model that leverages parameter tuning, achieving notable success with an F1-score of 0.86 in task 1 and 0.5191 in task 2. This successful implementation led to our system securing the 3rd position in task 1 and the 1st position in task 2. The source code can be found in the GitHub repository at this link: https://github.com/Salman1804102/DravidianLangTech-EACL-2024-FakeNews. 2024.dravidianlangtech-1.29 farsi-etal-2024-cuet-binary + <fixed-case>P</fixed-case>unny_<fixed-case>P</fixed-case>unctuators@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case>2024: Transformer-based Approach for Detection and Classification of Fake News in <fixed-case>M</fixed-case>alayalam Social Media Text @@ -387,6 +413,7 @@ The alarming rise of fake news on social media poses a significant threat to public discourse and decision-making. While automatic detection of fake news offers a promising solution, research in low-resource languages like Malayalam often falls behind due to limited data and tools. This paper presents the participation of team Punny_Punctuators in the Fake News Detection in Dravidian Languages shared task at DravidianLangTech@EACL 2024, addressing this gap. The shared task focuses on two sub-tasks: 1. classifying social media texts as original or fake, and 2. categorizing fake news into 5 categories. We experimented with various machine learning (ML), deep learning (DL) and transformer-based models as well as processing techniques such as transliteration.
Malayalam-BERT achieved the best performance on both sub-tasks, which earned us 2^{nd} place with a macro f_1-score of 0.87 for the subtask-1 and 11^{th} place with a macro f_1-score of 0.17 for the subtask-2. Our results highlight the potential of transformer models for low-resource languages in fake news detection and pave the way for further research in this crucial area. 2024.dravidianlangtech-1.30 tabassum-etal-2024-punny + <fixed-case>CUET</fixed-case>_<fixed-case>NLP</fixed-case>_<fixed-case>G</fixed-case>ood<fixed-case>F</fixed-case>ellows@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech <fixed-case>EACL</fixed-case>2024: A Transformer-Based Approach for Detecting Fake News in <fixed-case>D</fixed-case>ravidian Languages @@ -400,6 +427,7 @@ In this modern era, many people have been using Facebook and Twitter, leading to increased information sharing and communication. However, a considerable amount of information on these platforms is misleading or intentionally crafted to deceive users, which is often termed as fake news. A shared task on fake news detection in Malayalam organized by DravidianLangTech@EACL 2024 allowed us to address the challenge of distinguishing between original and fake news content in the Malayalam language. Our approach involves creating an intelligent framework to categorize text as either fake or original. We experimented with various machine learning models, including Logistic Regression, Decision Tree, Random Forest, Multinomial Naive Bayes, SVM, and SGD, and various deep learning models, including CNN, BiLSTM, and BiLSTM + Attention. We also explored Indic-BERT, MuRIL, XLM-R, and m-BERT for transformer-based approaches. Notably, our most successful model, m-BERT, achieved a macro F1 score of 0.85 and ranked 4th in the shared task. This research contributes to combating misinformation on social media news, offering an effective solution to classify content accurately.
2024.dravidianlangtech-1.31 osama-etal-2024-cuet + <fixed-case>CUET</fixed-case>_<fixed-case>B</fixed-case>inary_<fixed-case>H</fixed-case>ackers@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech <fixed-case>EACL</fixed-case>2024: Hate and Offensive Language Detection in <fixed-case>T</fixed-case>elugu Code-Mixed Text Using Sentence Similarity <fixed-case>BERT</fixed-case> @@ -413,6 +441,7 @@ With the continuous evolution of technology and widespread internet access, various social media platforms have gained immense popularity, attracting a vast number of active users globally. However, this surge in online activity has also led to a concerning trend by driving many individuals to resort to posting hateful and offensive comments or posts, publicly targeting groups or individuals. In response to these challenges, we participated in this shared task. Our approach involved proposing a fine-tuning-based pre-trained transformer model to effectively discern whether a given text contains offensive content that propagates hatred. We conducted comprehensive experiments, exploring various machine learning (LR, SVM, and Ensemble), deep learning (CNN, BiLSTM, CNN+BiLSTM), and transformer-based models (Indic-SBERT, m-BERT, MuRIL, Distil-BERT, XLM-R), adhering to a meticulous fine-tuning methodology. Among the models evaluated, our fine-tuned L3Cube-Indic-Sentence-Similarity-BERT or Indic-SBERT model demonstrated superior performance, achieving a macro-average F1-score of 0.7013. This notable result positioned us at the 6th place in the task. The implementation details of the task can be found in the GitHub repository.
2024.dravidianlangtech-1.32 farsi-etal-2024-cuet-binary-hackers + <fixed-case>T</fixed-case>ech<fixed-case>W</fixed-case>hiz@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Fake News Detection Using Deep Learning Models @@ -424,6 +453,7 @@ The ever-evolving landscape of online social media has initiated a transformative phase in communication, presenting unprecedented opportunities alongside inherent challenges. The pervasive issue of false information, commonly termed fake news, has emerged as a significant concern within these dynamic platforms. This study delves into the domain of Fake News Detection, with a specific focus on Malayalam. Utilizing advanced transformer models like mBERT, ALBERT, and XMLRoBERTa, our research proficiently classifies social media text into original or fake categories. Notably, our proposed model achieved commendable results, securing a rank of 3 in Task 1 with macro F1 scores of 0.84 using mBERT, 0.56 using ALBERT, and 0.84 using XMLRoBERTa. In Task 2, the XMLRoBERTa model excelled with a rank of 12, attaining a macro F1 score of 0.21, while mBERT and BERT achieved scores of 0.16 and 0.11, respectively. This research aims to develop robust systems capable of discerning authentic from deceptive content, a crucial endeavor in maintaining information reliability on social media platforms amid the rampant spread of misinformation. 2024.dravidianlangtech-1.33 m-etal-2024-techwhiz + <fixed-case>CUET</fixed-case>_<fixed-case>B</fixed-case>inary_<fixed-case>H</fixed-case>ackers@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case> 2024: Sentiment Analysis using Transformer-Based Models in Code-Mixed and Transliterated <fixed-case>T</fixed-case>amil and <fixed-case>T</fixed-case>ulu @@ -437,6 +467,7 @@ Textual Sentiment Analysis (TSA) delves into people’s opinions, intuitions, and emotions regarding any entity. 
Natural Language Processing (NLP) serves as a technique to extract subjective knowledge, determining whether an idea or comment leans positive, negative, neutral, or a mix thereof toward an entity. In recent years, it has garnered substantial attention from NLP researchers due to the vast availability of online comments and opinions. Despite extensive studies in this domain, sentiment analysis in low-resourced languages such as Tamil and Tulu still struggles to handle code-mixed and transliterated content. To address these challenges, this work focuses on sentiment analysis of code-mixed and transliterated Tamil and Tulu social media comments. It explored four machine learning (ML) approaches (LR, SVM, XGBoost, Ensemble), four deep learning (DL) methods (BiLSTM and CNN with FastText and Word2Vec), and four transformer-based models (m-BERT, MuRIL, L3Cube-IndicSBERT, and Distilm-BERT) for both languages. For Tamil, L3Cube-IndicSBERT and ensemble approaches outperformed others, while m-BERT demonstrated superior performance among the models for Tulu. The presented models achieved the 3^{rd} and 1^{st} ranks by attaining macro F1-scores of 0.227 and 0.584 in Tamil and Tulu, respectively. 2024.dravidianlangtech-1.34 eusha-etal-2024-cuet + <fixed-case>B</fixed-case>inary_<fixed-case>B</fixed-case>easts@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case> 2024: Multimodal Abusive Language Detection in <fixed-case>T</fixed-case>amil based on Integrated Approach of Machine Learning and Deep Learning Techniques @@ -451,6 +482,7 @@ Detecting abusive language on social media is a challenging task that needs to be solved effectively. This research addresses the formidable challenge of detecting abusive language in Tamil through a comprehensive multimodal approach, incorporating textual, acoustic, and visual inputs. This study utilized ConvLSTM, 3D-CNN, and a hybrid 3D-CNN with BiLSTM to extract video features.
Several models, such as BiLSTM, LR, and CNN, are explored for processing audio data, whereas for textual content, MNB, LR, and LSTM methods are explored. To further enhance overall performance, this work introduced a weighted late fusion model amalgamating predictions from all modalities. The fusion model was then applied to make predictions on the test dataset. The ConvLSTM+BiLSTM+MNB model yielded the highest macro F1 score of 71.43%. Our methodology allowed us to achieve the 1st rank for multimodal abusive language detection in the shared task. 2024.dravidianlangtech-1.35 rahman-etal-2024-binary + <fixed-case>W</fixed-case>ord<fixed-case>W</fixed-case>izards@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024: Sentiment Analysis in <fixed-case>T</fixed-case>amil and <fixed-case>T</fixed-case>ulu using Sentence Embedding @@ -463,6 +495,7 @@ Sentiment Analysis of Dravidian Languages has begun to garner attention recently as there is more need to analyze emotional responses and subjective opinions present in social media text. As this data is code-mixed and there are not many solutions to code-mixed text out there, we present to you a stellar solution to DravidianLangTech 2024: Sentiment Analysis in Tamil and Tulu task. To understand the sentiment of social media text, we used pre-trained transformer models and feature extraction vectorizers to classify the data with results that placed us 11th in the rankings for the Tamil task and 8th for the Tulu task with F1 scores of 0.12 and 0.30, which shows the efficiency of our approach.
2024.dravidianlangtech-1.36 balaji-etal-2024-wordwizards + <fixed-case>CUET</fixed-case>_<fixed-case>DUO</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech <fixed-case>EACL</fixed-case>2024: Fake News Classification Using <fixed-case>M</fixed-case>alayalam-<fixed-case>BERT</fixed-case> @@ -477,6 +510,7 @@ Distinguishing between fake and original news on social media demands vigilant procedures. This paper introduces the significant shared task on ‘Fake News Detection in Dravidian Languages - DravidianLangTech@EACL 2024’. With a focus on the Malayalam language, this task is crucial in identifying social media posts as either fake or original news. The participating teams contribute immensely to this task through their varied strategies, employing methods ranging from conventional machine-learning techniques to advanced transformer-based models. Notably, the findings of this work highlight the effectiveness of the Malayalam-BERT model, demonstrating an impressive macro F1 score of 0.88 in distinguishing between fake and original news in Malayalam social media content, achieving a commendable rank of 1st among the participants. 2024.dravidianlangtech-1.37 rahman-etal-2024-cuet + Wit Hub@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-2024:Multimodal Social Media Data Analysis in <fixed-case>D</fixed-case>ravidian Languages using Machine Learning Models @@ -489,6 +523,7 @@ The main objective of the task is categorised into three subtasks. Subtask-1 Build models to determine the sentiment expressed in multimodal posts (or videos) in Tamil and Malayalam languages, leveraging textual, audio, and visual components. The videos are labelled into five categories: highly positive, positive, neutral, negative and highly negative. Subtask-2 Design machine models that effectively identify and classify abusive language within the multimodal context of social media posts in Tamil.
The data are categorized into abusive and non-abusive categories. Subtask-3 Develop advanced models that accurately detect and categorize hate speech and offensive language in multimodal social media posts in Dravidian languages. The data points are categorized into Caste, Offensive, Racist and Sexist classes. In this session, the focus is primarily on Tamil language text data analysis. Various combinations of machine learning models have been used to perform each task, with oversampling techniques applied to train models on the imbalanced dataset. 2024.dravidianlangtech-1.38 s-etal-2024-wit + <fixed-case>CUETS</fixed-case>entiment<fixed-case>S</fixed-case>illies@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case>2024: Transformer-based Approach for Sentiment Analysis in <fixed-case>T</fixed-case>amil and <fixed-case>T</fixed-case>ulu Code-Mixed Texts @@ -503,6 +538,7 @@ Sentiment analysis (SA) on social media reviews has become a challenging research agenda in recent years due to the exponential growth of textual content. Although several effective solutions are available for SA in high-resourced languages, it is considered a critical problem for low-resourced languages. This work introduces an automatic system for analyzing sentiment in Tamil and Tulu code-mixed languages. Several ML (DT, RF, MNB), DL (CNN, BiLSTM, CNN+BiLSTM), and transformer-based models (Indic-BERT, XLM-RoBERTa, m-BERT) are investigated for SA tasks using Tamil and Tulu code-mixed textual data. Experimental outcomes reveal that the transformer-based models XLM-R and m-BERT surpassed others in performance for Tamil and Tulu, respectively. The proposed XLM-R and m-BERT models attained macro F1-scores of 0.258 (Tamil) and 0.468 (Tulu) on test datasets, securing the 2^{nd} and 5^{th} positions, respectively, in the shared task.
2024.dravidianlangtech-1.39 tripty-etal-2024-cuetsentimentsillies + Social Media Hate and Offensive Speech Detection Using Machine Learning method @@ -514,6 +550,7 @@ Even though the improper use of social media is increasing nowadays, there is also technology that brings solutions. Here, improper use means posting hate and offensive speech that might harm an individual or group. Hate speech refers to an insult toward an individual or group based on their identities. Spreading it on social media platforms is a serious problem for society. The solution, on the other hand, is the availability of natural language processing (NLP) technology that is capable of detecting and handling such problems. This paper presents the detection of social media’s hate and offensive speech in the code-mixed Telugu language. For this, the task and golden standard dataset were provided for us by the shared task organizer (DravidianLangTech@EACL 2024). To this end, we have employed the TF-IDF technique for numeric feature extraction and used a random forest algorithm for modeling hate speech detection. Finally, the developed model was evaluated on the test dataset and achieved 0.492 macro-F1. 2024.dravidianlangtech-1.40 bade-etal-2024-social-media + <fixed-case>CUETS</fixed-case>entiment<fixed-case>S</fixed-case>illies@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech <fixed-case>EACL</fixed-case>2024: Transformer-based Approach for Detecting and Categorizing Fake News in <fixed-case>M</fixed-case>alayalam Language @@ -527,6 +564,7 @@ Fake news misleads people and may lead to real-world miscommunication and injury. Removing misinformation encourages critical thinking, democracy, and the prevention of hatred, fear, and misunderstanding. Identifying and removing fake news and developing a detection system is essential for reliable, accurate, and clear information. Therefore, a shared task was organized to detect fake news in Malayalam.
This paper presents a system developed for the shared task of detecting and classifying fake news in Malayalam. The approach involves a combination of machine learning models (LR, DT, RF, MNB), deep learning models (CNN, BiLSTM, CNN+BiLSTM), and transformer-based models (Indic-BERT, XLMR, Malayalam-BERT, m-BERT) for both subtasks. The experimental results demonstrate that transformer-based models, specifically m-BERT and Malayalam-BERT, outperformed others. The m-BERT model achieved superior performance in subtask 1 with a macro F1-score of 0.84, and Malayalam-BERT outperformed the other models in subtask 2 with a macro F1-score of 0.496, securing us the 5th and 2nd positions in subtask 1 and subtask 2, respectively. 2024.dravidianlangtech-1.41 tripty-etal-2024-cuetsentimentsillies-dravidianlangtech + <fixed-case>MUCS</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-2024: Role of Learning Approaches in Strengthening Hate-Alert Systems for code-mixed text @@ -553,6 +591,7 @@ Sentiment Analysis (SA) is a field of computational study that analyzes and understands people’s opinions, attitudes, and emotions toward any entity. A review of an entity can be written about an individual, an event, a topic, a product, etc., and such reviews are abundant on social media platforms. The increasing number of social media users and the growing amount of user-generated code-mixed content such as reviews, comments, posts etc., on social media have resulted in a rising demand for efficient tools capable of effectively analyzing such content to detect the sentiments. In spite of this, SA of social media text is challenging because the code-mixed text is complex.
To address SA in code-mixed Tamil and Tulu text, this paper describes the Machine Learning (ML) models submitted by our team - MUCS to “Sentiment Analysis in Tamil and Tulu - DravidianLangTech” - a shared task organized at the European Chapter of the Association for Computational Linguistics (EACL) 2024. A Linear Support Vector Classifier (LinearSVC) and an ensemble of 5 ML classifiers (k Nearest Neighbour (kNN), Stochastic Gradient Descent (SGD), Logistic Regression (LR), LinearSVC, and Random Forest Classifier (RFC)) with hard voting were trained using concatenated features obtained from word and character n-grams vectorized with the Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer and CountVectorizer. Further, a grid search algorithm is employed to obtain optimal hyperparameter values. The proposed ensemble model obtained macro F1 scores of 0.260 and 0.550 for the Tamil and Tulu languages, respectively. 2024.dravidianlangtech-1.43 b-etal-2024-mucs + <fixed-case>I</fixed-case>nnovation<fixed-case>E</fixed-case>ngineers@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech-<fixed-case>EACL</fixed-case> 2024: Sentimental Analysis of <fixed-case>Y</fixed-case>ou<fixed-case>T</fixed-case>ube Comments in <fixed-case>T</fixed-case>amil by using Machine Learning @@ -564,6 +603,7 @@ There is opportunity for machine learning and natural language processing research because of the growing volume of textual data. Although there has been little research done on trend extraction from YouTube comments, sentiment analysis is an intriguing issue because of the poor consistency and quality of the material found there. The purpose of this work is to use machine learning techniques and algorithms to do sentiment analysis on YouTube comments pertaining to popular themes. The findings demonstrate that sentiment analysis is capable of giving a clear picture of how actual events affect public opinion.
This study aims to make it easier for academics to find high-quality sentiment analysis research publications. Data normalisation methods are used to clean an annotated corpus of 1500 citation sentences for the study. For classification, a system utilising machine learning algorithms, namely K-Nearest Neighbour (KNN), Naïve Bayes, SVC (Support Vector Machine), and Random Forest, is built. Metrics like the f1-score and correctness score are used to assess the correctness of the system. 2024.dravidianlangtech-1.44 shanmugavadivel-etal-2024-innovationengineers + <fixed-case>KEC</fixed-case>_<fixed-case>HAWKS</fixed-case>@<fixed-case>D</fixed-case>ravidian<fixed-case>L</fixed-case>ang<fixed-case>T</fixed-case>ech 2024 : Detecting <fixed-case>M</fixed-case>alayalam Fake News using Machine Learning Models @@ -576,6 +616,7 @@ The proliferation of fake news in the Malayalam language across digital platforms has emerged as a pressing issue. By employing Recurrent Neural Networks (RNNs), a type of machine learning model, we aim to distinguish between Original and Fake News in Malayalam and achieved the 9th rank in Task 1. RNNs are chosen for their ability to understand the sequence of words in a sentence, which is important in languages like Malayalam. Our main goal is to develop better models that can spot fake news effectively. We analyze various features to understand what contributes most to this accuracy. By doing so, we hope to provide a reliable method for identifying and combating fake news in the Malayalam language. 2024.dravidianlangtech-1.45 subramanian-etal-2024-kec + diff --git a/data/xml/2024.eacl.xml b/data/xml/2024.eacl.xml index 4d4545149a..88ce0cf81f 100644 --- a/data/xml/2024.eacl.xml +++ b/data/xml/2024.eacl.xml @@ -26,6 +26,7 @@ An increasing amount of research in Natural Language Inference (NLI) focuses on the application and evaluation of Large Language Models (LLMs) and their reasoning capabilities.
Despite their success, however, LLMs are still prone to factual errors and inconsistencies in their explanations, offering limited control and interpretability for inference in complex domains. In this paper, we focus on ethical NLI, investigating how hybrid neuro-symbolic techniques can enhance the logical validity and alignment of ethical explanations produced by LLMs. Specifically, we present an abductive-deductive framework named Logic-Explainer, which integrates LLMs with an external backward-chaining solver to refine step-wise natural language explanations and jointly verify their correctness, reduce incompleteness and minimise redundancy. An extensive empirical analysis demonstrates that Logic-Explainer can improve explanations generated via in-context learning methods and Chain-of-Thought (CoT) on challenging ethical NLI tasks, while, at the same time, producing formal proofs describing and supporting models’ reasoning. As ethical NLI requires commonsense reasoning to identify underlying moral violations, our results suggest the effectiveness of neuro-symbolic methods for multi-step NLI more broadly, opening new opportunities to enhance the logical consistency, reliability, and alignment of LLMs. 2024.eacl-long.1 quan-etal-2024-enhancing +