
Commit 591be9f
H. -> H. Andrew (closes #3814)
mjpost committed Aug 29, 2024
1 parent ff6b5ef commit 591be9f
Showing 4 changed files with 5 additions and 5 deletions.
data/xml/2023.wassa.xml: 2 changes (1 addition, 1 deletion)
@@ -425,7 +425,7 @@
<author><first>Adithya</first><last>V Ganesan</last><affiliation>, State University of New York, Stony Brook</affiliation></author>
<author><first>Yash Kumar</first><last>Lal</last><affiliation>State University of New York, Stony Brook</affiliation></author>
<author><first>August</first><last>Nilsson</last></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
+<author><first>H. Andrew</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
<pages>390-400</pages>
<abstract>Very large language models (LLMs) perform extremely well on a spectrum of NLP tasks in a zero-shot setting. However, little is known about their performance on human-level NLP problems which rely on understanding psychological concepts, such as assessing personality traits. In this work, we investigate the zero-shot ability of GPT-3 to estimate the Big 5 personality traits from users’ social media posts. Through a set of systematic experiments, we find that zero-shot GPT-3 performance is somewhat close to an existing pre-trained SotA for broad classification upon injecting knowledge about the trait in the prompts. However, when prompted to provide fine-grained classification, its performance drops to close to a simple most frequent class (MFC) baseline. We further analyze where GPT-3 performs better, as well as worse, than a pretrained lexical model, illustrating systematic errors that suggest ways to improve LLMs on human-level NLP tasks. The code for this project is available on Github.</abstract>
<url hash="85dbc3d7">2023.wassa-1.34</url>
data/xml/2024.eacl.xml: 2 changes (1 addition, 1 deletion)
@@ -2810,7 +2810,7 @@
<author><first>Matthew</first><last>Matero</last></author>
<author><first>Salvatore</first><last>Giorgi</last></author>
<author><first>Vivek</first><last>Kulkarni</last></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
+<author><first>H. Andrew</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
<pages>454-468</pages>
<abstract>Social science NLP tasks, such as emotion or humor detection, are required to capture the semantics along with the implicit pragmatics from text, often with limited amounts of training data. Instruction tuning has been shown to improve the many capabilities of large language models (LLMs) such as commonsense reasoning, reading comprehension, and computer programming. However, little is known about the effectiveness of instruction tuning on the social domain where implicit pragmatic cues are often needed to be captured. We explore the use of instruction tuning for social science NLP tasks and introduce Socialite-Llama — an open-source, instruction-tuned Llama. On a suite of 20 social science tasks, Socialite-Llama improves upon the performance of Llama as well as matches or improves upon the performance of a state-of-the-art, multi-task finetuned model on a majority of them. Further, Socialite-Llama also leads to improvement on 5 out of 6 related social tasks as compared to Llama, suggesting instruction tuning can lead to generalized social understanding. All resources including our code, model and dataset can be found through [bit.ly/socialitellama](https://bit.ly/socialitellama/).</abstract>
<url hash="91eb1d08">2024.eacl-short.40</url>
data/xml/2024.naacl.xml: 4 changes (2 additions, 2 deletions)
@@ -1749,7 +1749,7 @@
<author><first>Vasudha</first><last>Varadarajan</last></author>
<author><first>Sverker</first><last>Sikström</last></author>
<author><first>Oscar</first><last>Kjell</last></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
+<author><first>H. Andrew</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
<pages>2466-2478</pages>
<abstract>Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments haveshown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introducesthe task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual’s latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory.We empirically evaluate ordering and scoring strategies, organizing into two new methods: a semi-supervised item response theory-basedmethod (ALIRT) and a supervised Actor-Critic model. While we found both methods to improve over non-adaptive baselines, We foundALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ≈ 0.93 after only 3 questions as compared to typically needing at least 7 questions). In general, adaptive language-based assessments of depression and anxiety were able to utilize a smaller sample of language without compromising validity or large computational costs.</abstract>
<url hash="d483a97d">2024.naacl-long.136</url>
@@ -6337,7 +6337,7 @@
<paper id="477">
<title>Large Human Language Models: A Need and the Challenges</title>
<author><first>Nikita</first><last>Soni</last></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
+<author><first>H. Andrew</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
<author><first>João</first><last>Sedoc</last><affiliation>New York University</affiliation></author>
<author><first>Niranjan</first><last>Balasubramanian</last><affiliation>State University of New York, Stony Brook</affiliation></author>
<pages>8631-8646</pages>
data/xml/2024.wassa.xml: 2 changes (1 addition, 1 deletion)
@@ -287,7 +287,7 @@
<title>Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?</title>
<author><first>Nikita</first><last>Soni</last></author>
<author><first>Niranjan</first><last>Balasubramanian</last><affiliation>State University of New York, Stony Brook</affiliation></author>
-<author><first>H.</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
+<author><first>H. Andrew</first><last>Schwartz</last><affiliation>Stony Brook University (SUNY)</affiliation></author>
<author><first>Dirk</first><last>Hovy</last><affiliation>Bocconi University</affiliation></author>
<pages>316-328</pages>
<abstract>Pre-trained language models consider the context of neighboring words and documents but lack any author context of the human generating the text. However, language depends on the author’s states, traits, social, situational, and environmental attributes, collectively referred to as human context (Soni et al., 2024). Human-centered natural language processing requires incorporating human context into language models. Currently, two methods exist: pre-training with 1) group-wise attributes (e.g., over-45-year-olds) or 2) individual traits. Group attributes are simple but coarse — not all 45-year-olds write the same way — while individual traits allow for more personalized representations, but require more complex modeling and data. It is unclear which approach benefits what tasks. We compare pre-training models with human context via 1) group attributes, 2) individual users, and 3) a combined approach on five user- and document-level tasks. Our results show that there is no best approach, but that human-centered language modeling holds avenues for different methods.</abstract>
