iEvals : iKala's Evaluator for Large Language Models

iEvals is a framework for evaluating Chinese large language models (LLMs), with a focus on their performance in Traditional Chinese domains. Our goal is to provide a fast evaluation library that is easy to set up, to help assess the performance and guide the use of existing Chinese LLMs.

Currently, we only support evaluation on TMMLU+. We are exploring more domains for future releases, such as knowledge-intensive datasets (CMMLU, C-Eval), context retrieval, and multi-turn conversation datasets.

Installation

pip install git+https://github.com/ikala-corp/ievals.git

Usage

ieval <model name> <series: optional> --top_k <number of in-context examples>
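
The same invocation can be assembled from Python, which may be handy when sweeping over several models. The sketch below only mirrors the usage line above; the helper names `build_ieval_command` and `run_ieval` and the placeholder `my-model` are hypothetical, and valid model and series values are assumed to be whatever the models section lists.

```python
import shutil
import subprocess
from typing import List, Optional


def build_ieval_command(model: str, series: Optional[str] = None,
                        top_k: Optional[int] = None) -> List[str]:
    """Assemble `ieval <model name> <series> --top_k <n>` as an argument list."""
    cmd = ["ieval", model]
    if series is not None:
        cmd.append(series)  # optional positional argument
    if top_k is not None:
        if top_k < 0:
            raise ValueError("top_k must be non-negative")
        cmd += ["--top_k", str(top_k)]
    return cmd


def run_ieval(model: str, series: Optional[str] = None,
              top_k: Optional[int] = None) -> None:
    """Run the CLI if it is on PATH; raise a clear error otherwise."""
    if shutil.which("ieval") is None:
        raise FileNotFoundError("`ieval` not found on PATH; install iEvals first")
    subprocess.run(build_ieval_command(model, series, top_k), check=True)


if __name__ == "__main__":
    # "my-model" is a placeholder, not a real model name.
    print(" ".join(build_ieval_command("my-model", top_k=5)))
```

Building the command as a list rather than a single string avoids shell quoting issues when a model name contains special characters.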

For more details, please refer to the models section.
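
To make the `--top_k` option more concrete: in few-shot evaluation, a number of solved examples are placed in front of the question under test. The sketch below illustrates that general idea for four-choice questions. It is not the prompt format iEvals itself uses; the `Example` type, the helper functions, and the toy questions are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Example:
    question: str
    choices: Sequence[str]  # four options, displayed as A-D
    answer: str             # one of "A", "B", "C", "D"


def format_example(ex: Example, with_answer: bool) -> str:
    """Render one question; solved examples include the answer letter."""
    lines = [ex.question]
    for label, choice in zip("ABCD", ex.choices):
        lines.append(f"{label}. {choice}")
    lines.append(f"Answer: {ex.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(pool: List[Example], target: Example, top_k: int) -> str:
    """Prepend the first `top_k` solved examples from `pool` to the target."""
    shots = [format_example(ex, with_answer=True) for ex in pool[:top_k]]
    shots.append(format_example(target, with_answer=False))
    return "\n\n".join(shots)


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted letters that match the reference letters."""
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    pool = [Example("1 + 1 = ?", ["1", "2", "3", "4"], "B")]
    target = Example("2 + 2 = ?", ["1", "2", "3", "4"], "D")
    print(build_few_shot_prompt(pool, target, top_k=1))
```

With `top_k=0` the prompt contains only the unanswered target question, which corresponds to zero-shot evaluation.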

Coming soon

Chain of Thought (CoT) with few-shot prompting
arXiv paper: detailed analysis of model interior and exterior relations
More tasks

Citation

@article{ikala2023eval,
  title={An Improved Traditional Chinese Evaluation Suite for Foundation Model},
  author={Tam, Zhi-Rui and Pai, Ya-Ting},
  journal={arXiv},
  year={2023}
}

Disclaimer

This is not an officially supported iKala product.

This research code is provided "as-is" to the broader research community. iKala does not promise to maintain or otherwise support this code in any way.
