From bb69dcf083ec99fbd1d2ca3d17d2db0a684bcd74 Mon Sep 17 00:00:00 2001
From: anthology-assist
Date: Tue, 17 Sep 2024 13:43:32 -0500
Subject: [PATCH] Paper Revision: {2024.acl-long.852}, closes #3879.

---
 data/xml/2024.acl.xml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/data/xml/2024.acl.xml b/data/xml/2024.acl.xml
index f20229ebe1..8e53e8ad6d 100644
--- a/data/xml/2024.acl.xml
+++ b/data/xml/2024.acl.xml
@@ -11145,8 +11145,10 @@
         ArmanCohanYale University
         16103-16120
         Recent LLMs have demonstrated remarkable performance in solving exam-like math word problems. However, the degree to which these numerical reasoning skills are effective in real-world scenarios, particularly in expert domains, is still largely unexplored. This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables. We evaluate a wide spectrum of 27 LLMs, including those specialized in math, coding and finance, with Chain-of-Thought and Program-of-Thought prompting methods. We found that even the current best-performing system (i.e., GPT-4) still significantly lags behind human experts in solving complex numerical reasoning problems grounded in long contexts. We believe DocMath-Eval can be used as a valuable benchmark to evaluate LLMs’ capabilities to solve challenging numerical reasoning problems in expert domains.
-        2024.acl-long.852
+        2024.acl-long.852
         zhao-etal-2024-docmath
+
+        Included experimental results.
         Unintended Impacts of <fixed-case>LLM</fixed-case> Alignment on Global Representation