IPEX-LLM is a library for running LLMs (large language models) on Intel XPUs (from laptop to GPU to cloud) using INT4 quantization with very low latency, for any PyTorch model.
The integration with IPEX-LLM currently only supports running on Intel CPU.
Please follow setup.md to set up the environment first. Additionally, you will need to install the IPEX-LLM dependencies as shown below.
```bash
pip install .[ipex-llm] --extra-index-url https://download.pytorch.org/whl/cpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
```
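To sanity-check the installation, you can load a model through IPEX-LLM's `transformers`-style API with INT4 quantization, independently of the serving stack. This is a minimal sketch; the model path is a placeholder, and any Hugging Face causal LM should work.

```python
# Minimal installation check: load a causal LM with IPEX-LLM's
# transformers-style API and generate a few tokens on CPU.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder: any local or HF model path

# load_in_4bit=True quantizes the model weights to INT4 at load time.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```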
Please follow the serving document for configuring the parameters. In the configuration file, you need to set `ipexllm` and `load_in_4bit` to `true`. Example configuration files for enabling ipex-llm are available [here](../inference/models/ipex-llm).
```yaml
ipexllm: true
config:
  load_in_4bit: true
```
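For orientation, the snippet below sketches where these keys might sit in a complete model configuration. Only `ipexllm` and `config.load_in_4bit` come from this document; the other fields (`name`, `device`, `model_id_or_path`) are illustrative assumptions, so consult the example files linked above for the authoritative schema.

```yaml
# Hypothetical model configuration enabling IPEX-LLM INT4 loading.
# Only `ipexllm` and `config.load_in_4bit` are confirmed keys; the
# remaining fields are assumed placeholders.
name: llama-2-7b-chat-hf                          # assumed: deployment name
device: cpu                                       # assumed: the integration currently targets Intel CPU
model_id_or_path: meta-llama/Llama-2-7b-chat-hf   # assumed: model source
ipexllm: true                                     # enable the IPEX-LLM integration
config:
  load_in_4bit: true                              # quantize weights to INT4 at load time
```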
Please follow the serving document for deploying and testing.