Commit

Update
shintaro-ozaki committed Aug 5, 2024
1 parent 75da8f8 commit 4038033
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions README.md
@@ -1473,3 +1473,18 @@ commonsense persona knowledge linkers. Additionally, our top-performing model,
Derberta-SynCPKL, secured first place in the CPKL challenge by a 16%
improvement in F1 score. We released both SynCPKL and Derberta-SynCPKL at
https://github.com/irislin1006/CPKL.
<br>http://arxiv.org/abs/2304.11164v1
Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs
Language models have become very popular recently and many claims have been
made about their abilities, including for commonsense reasoning. Given the
increasingly better results of current language models on previous static
benchmarks for commonsense reasoning, we explore an alternative dialectical
evaluation. The goal of this kind of evaluation is not to obtain an aggregate
performance value but to find failures and map the boundaries of the system.
Dialoguing with the system gives the opportunity to check for consistency and
get more reassurance of these boundaries beyond anecdotal evidence. In this
paper we conduct some qualitative investigations of this kind of evaluation for
the particular case of spatial reasoning (which is a fundamental aspect of
commonsense reasoning). We conclude with some suggestions for future work both
to improve the capabilities of language models and to systematise this kind of
dialectical evaluation.
