IPEX-LLM is a library for running LLMs (large language models) on Intel XPUs (from laptop to GPU to cloud) using INT4 quantization with very low latency, for any PyTorch model.
The integration with IPEX-LLM currently only supports running on Intel CPU.
Please follow setup.md to set up the environment first. Additionally, you will need to install the IPEX-LLM dependencies as shown below.
```bash
pip install .[ipex-llm] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
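To sanity-check the installation, you can load a model through IPEX-LLM's `transformers`-style API with INT4 quantization, independently of the serving stack. This is a minimal sketch; the model path is a placeholder, and any Hugging Face causal LM should work.

```python
# Minimal installation check: load a causal LM with IPEX-LLM's
# transformers-style API and generate a few tokens on CPU.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any local or HF model path

# load_in_4bit=True quantizes the model weights to INT4 at load time.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```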
Please follow the serving document for configuring the parameters. In the configuration file, you need to set `ipexllm` and `load_in_4bit` to `true`. Example configuration files for enabling ipex-llm are available [here](../inference/models/ipex-llm).
```yaml
ipexllm: true
config:
  load_in_4bit: true
```
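For orientation, the snippet below sketches where these keys might sit in a complete model configuration. Only `ipexllm` and `config.load_in_4bit` come from this document; the other fields (`name`, `device`, `model_id_or_path`) are illustrative assumptions, so consult the example files linked above for the authoritative schema.

```yaml
# Hypothetical model configuration enabling IPEX-LLM INT4 loading.
# Only `ipexllm` and `config.load_in_4bit` are confirmed keys; the
# remaining fields are assumed placeholders.
name: llama-2-7b-chat-hf                          # assumed: deployment name
device: cpu                                       # assumed: the integration currently targets Intel CPU
model_id_or_path: meta-llama/Llama-2-7b-chat-hf   # assumed: model source
ipexllm: true                                     # enable the IPEX-LLM integration
config:
  load_in_4bit: true                              # quantize weights to INT4 at load time
```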
Please follow the serving document for deploying and testing.