Commit

Update
shintaro-ozaki committed Sep 21, 2024
1 parent 7456e04 commit 4841784
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -2271,3 +2271,18 @@ Turbo can generate questions with adequate general knowledge in both languages,
albeit not as culturally 'deep' as humans. We also observe a higher occurrence
of fluency errors in the Sundanese dataset, highlighting the discrepancy
between medium- and lower-resource languages.
<br>http://arxiv.org/abs/2304.11164v1
Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs
Language models have become very popular recently and many claims have been
made about their abilities, including for commonsense reasoning. Given the
increasingly better results of current language models on previous static
benchmarks for commonsense reasoning, we explore an alternative dialectical
evaluation. The goal of this kind of evaluation is not to obtain an aggregate
performance value but to find failures and map the boundaries of the system.
Dialoguing with the system gives the opportunity to check for consistency and
get more reassurance of these boundaries beyond anecdotal evidence. In this
paper we conduct some qualitative investigations of this kind of evaluation for
the particular case of spatial reasoning (which is a fundamental aspect of
commonsense reasoning). We conclude with some suggestions for future work both
to improve the capabilities of language models and to systematise this kind of
dialectical evaluation.