Update
shintaro-ozaki committed Aug 6, 2024
1 parent 4038033 commit 74dc23f
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions README.md
@@ -1488,3 +1488,21 @@ the particular case of spatial reasoning (which is a fundamental aspect of
commonsense reasoning). We conclude with some suggestions for future work both
to improve the capabilities of language models and to systematise this kind of
dialectical evaluation.
<br>http://arxiv.org/abs/2403.14112v2
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations
We introduce CHARM, the first benchmark for comprehensive, in-depth evaluation
of the commonsense reasoning ability of large language models (LLMs) in
Chinese, which covers both globally known and Chinese-specific commonsense. We
evaluated 7 English and 12 Chinese-oriented LLMs on CHARM, employing 5
representative prompt strategies for improving LLMs' reasoning ability, such as
Chain-of-Thought. Our findings indicate that the LLM's language orientation and
the task's domain influence the effectiveness of the prompt strategy, which
enriches previous research findings. We built closely-interconnected reasoning
and memorization tasks, and found that some LLMs struggle with memorizing
Chinese commonsense, affecting their reasoning ability, while others show
differences in reasoning despite similar memorization performance. We also
evaluated the LLMs' memorization-independent reasoning abilities and analyzed
the typical errors. Our study precisely identified the LLMs' strengths and
weaknesses, providing a clear direction for optimization. It can also serve
as a reference for studies in other fields. We will release CHARM at
https://github.com/opendatalab/CHARM .
