Commit
Paper Revision: {2024.acl-long.852}, closes #3879.
anthology-assist committed Sep 17, 2024
1 parent 8d9cfd7 commit bb69dcf
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion data/xml/2024.acl.xml
@@ -11145,8 +11145,10 @@
<author><first>Arman</first><last>Cohan</last><affiliation>Yale University</affiliation></author>
<pages>16103-16120</pages>
<abstract>Recent LLMs have demonstrated remarkable performance in solving exam-like math word problems. However, the degree to which these numerical reasoning skills are effective in real-world scenarios, particularly in expert domains, is still largely unexplored. This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning capabilities of LLMs in the context of understanding and analyzing financial documents containing both text and tables. We evaluate a wide spectrum of 27 LLMs, including those specialized in math, coding and finance, with Chain-of-Thought and Program-of-Thought prompting methods. We found that even the current best-performing system (i.e., GPT-4) still significantly lags behind human experts in solving complex numerical reasoning problems grounded in long contexts. We believe DocMath-Eval can be used as a valuable benchmark to evaluate LLMs’ capabilities to solve challenging numerical reasoning problems in expert domains.</abstract>
-<url hash="de5c6157">2024.acl-long.852</url>
+<url hash="0f0bea3b">2024.acl-long.852</url>
<bibkey>zhao-etal-2024-docmath</bibkey>
+<revision id="1" href="2024.acl-long.852v1" hash="de5c6157"/>
+<revision id="2" href="2024.acl-long.852v2" hash="0f0bea3b" date="2024-09-17">Included experimental results.</revision>
</paper>
<paper id="853">
<title>Unintended Impacts of <fixed-case>LLM</fixed-case> Alignment on Global Representation</title>