Closed-source APIs: OpenAI, Azure, Anthropic, Gemini
Open-weight models:
- We rely on TGI for inference; check out the Text Generation Inference section for more details.
- For Qwen models or other local models, please visit the Qwen models or other local models section.
To reduce setup friction, we recommend using text-generation-inference (TGI) for serving open-weight models.
For example, the following sets up a simple TGI instance using Docker:
sudo docker run --gpus '"device=0"' \
--shm-size 1g -p 8020:80 \
-v /volume/saved_model/:/data ghcr.io/huggingface/text-generation-inference:1.1.0 \
--max-input-length 4000 \
--max-total-tokens 4096 \
--model-id GeneZC/MiniChat-3B
Note: for 5-shot settings, you may need to raise --max-input-length above 5200 (with --max-total-tokens raised accordingly) to fit the entire prompt.
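For instance, a minimal sketch of the same Docker invocation with the limits raised for 5-shot prompts (the exact values here are illustrative, and assume the model you serve supports a context window this large):
sudo docker run --gpus '"device=0"' \
--shm-size 1g -p 8020:80 \
-v /volume/saved_model/:/data ghcr.io/huggingface/text-generation-inference:1.1.0 \
--max-input-length 5200 \
--max-total-tokens 5300 \
--model-id <your model id>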
Once the server has warmed up, simply pass the model name and IP:port to the evaluation CLI:
ieval GeneZC/MiniChat-3B --ip_addr 0.0.0.0:8020
For custom models, you might need to provide the token text for the system, user, assistant, and end-of-sentence roles:
ieval GeneZC/MiniChat-3B --ip_addr 0.0.0.0:8020 \
--sys_token "<s> [|User|] " \
--usr_token "<s> [|User|] " \
--ast_token "[|Assistant|]" \
--eos_token "</s>"
To check which models we have already included with a chat prompt, run the command below. (This feature will be deprecated once more models support a chat-prompt formatting function.)
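ieval supported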
For OpenAI models:
ieval gpt-3.5-turbo-0613 --series openai_chat --api_key "<Your OpenAI platform key>"
ieval gpt-3.5-turbo-instruct --series openai_complete --api_key "<Your OpenAI platform key>" --top_k 5
For Gemini, using string parsing of the answers:
ieval gemini-pro --api_key "<Your API Key from https://ai.google.dev/>" --top_k 5
Using LLM parsing of the answers:
API_KEY="<Gemini API Key>" ieval gemini-pro --api_key "<Your API Key from https://ai.google.dev/>" --top_k 5 --parsing_method llm
We do not support models from Vertex AI yet, so the PaLM (bison) series is not supported.
For Anthropic models, using string parsing of the answers:
ieval claude-instant-1 --api_key "<Anthropic API keys>"
Using LLM parsing of the answers:
API_KEY="<Gemini API Key>" ieval claude-instant-1 --api_key "<Anthropic API keys>" --parsing_method llm
For Azure OpenAI models, using string parsing of the answers:
export AZURE_OPENAI_ENDPOINT="https://XXXX.azure.com/"
ieval <your azure model name> --series azure --api_key "<Your API Key>" --top_k 5
Using LLM parsing of the answers:
API_KEY="<Gemini API Key>" ieval <your azure model name> --series azure --api_key "<Your API Key>" --top_k 5 --parsing_method llm
We haven't experimented with instruction-based models from Azure yet, so for instruction-based models you will have to fall back to OpenAI's models.
Before using models from Dashscope, please install it via PyPI:
pip install dashscope==1.13.6
Once installed, you should be able to run the following. Using string parsing of the answers:
ieval <Your model name> --api_key "<Dash Scope API>"
Using LLM parsing of the answers:
API_KEY="<Gemini API Key>" ieval <Your model name> --api_key "<Dash Scope API>"
Supported models: qwen-turbo, qwen-plus, qwen-max, qwen-plus-v1, bailian-v1
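For instance, to evaluate qwen-turbo (one of the supported models above) with string parsing:
ieval qwen-turbo --api_key "<Dash Scope API>"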
For local Hugging Face models, using string parsing of the answers:
CUDA_VISIBLE_DEVICES=1 ieval Qwen/Qwen-7B-Chat
or
CUDA_VISIBLE_DEVICES=1 ieval Qwen/Qwen-7B-Chat --series hf_chat
Using LLM parsing of the answers:
API_KEY="<Gemini API Key>" CUDA_VISIBLE_DEVICES=1 ieval Qwen/Qwen-7B-Chat --series hf_chat --parsing_method llm
If the model is private, you can pass in your Hugging Face read token via the --api_key argument.
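For example (the model name here is a hypothetical placeholder):
CUDA_VISIBLE_DEVICES=1 ieval your-org/your-private-model --series hf_chat --api_key "<Your Hugging Face read token>"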
Before using models from Reka, please install it via PyPI:
pip install reka
Once installed, you should be able to run:
ieval <reka-core, reka-flash, reka-edge> --api_key "<Reka API Key>"
Before using models from Groq, please install it via PyPI:
pip install groq
Once installed, you should be able to run:
ieval llama3-8b-8192 --series groq --api_key "<Groq API : gsk_XXXXX>"
Before using models from Together, please install it via PyPI:
pip install together
Once installed, you should be able to run:
ieval meta-llama/Llama-3-70b-chat-hf --series together --api_key "<Your Together API key>"