
Paper Metadata: {2024.acl-long.852} #3880

Open · 3 of 4 tasks
yilunzhao opened this issue Sep 15, 2024 · 1 comment
Labels: correction, metadata, pending

Comments

@yilunzhao

Confirm that this is a metadata correction

  • I want to file corrections to make the metadata match the PDF file hosted on the ACL Anthology.

Anthology ID

2024.acl-long.852

Type of Paper Metadata Correction

  • Paper Title
  • Paper Abstract
  • Author Name(s)

Correction to Paper Title

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents

Correction to Paper Abstract

Recent LLMs have demonstrated remarkable performance in solving exam-like math word problems. However, the degree to which these numerical reasoning skills are effective in real-world scenarios, particularly in expert domains, is still largely unexplored. This paper introduces DocMath-Eval, a comprehensive benchmark specifically designed to evaluate the numerical reasoning capabilities of LLMs in the context of understanding and analyzing specialized documents containing both text and tables. We conduct an extensive evaluation of 48 LLMs with Chain-of-Thought and Program-of-Thought prompting methods, aiming to comprehensively assess the capabilities and limitations of existing LLMs in DocMath-Eval. We found that even the current best-performing system (i.e., GPT-4o) still significantly lags behind human experts in solving complex numerical reasoning problems grounded in long contexts. We believe that DocMath-Eval can serve as a valuable benchmark for evaluating LLMs' capabilities in solving challenging numerical reasoning problems within expert domains.

Correction to Author Name(s)

No response

yilunzhao added the correction and metadata labels on Sep 15, 2024
@yilunzhao (Author)

This is a follow-up correction of #3879. Thanks!

anthology-assist added the pending label on Sep 17, 2024