AutoAWQ 4bit quantization #504

Open
irthomasthomas opened this issue Feb 3, 2024 · 0 comments
Labels: llm-experiments (experiments with large language models), llm-quantization (All about Quantized LLM models and serving), MachineLearning (ML Models, Training and Inference)

@irthomasthomas (Owner) commented:

CONTENT:

TITLE: Code search results

DESCRIPTION:


AutoAWQ

| Roadmap | Examples | Issues: Help Wanted |

AutoAWQ is an easy-to-use package for 4-bit quantized models. Compared to FP16, AutoAWQ speeds up models by 3x and reduces memory requirements by 3x. It implements the Activation-aware Weight Quantization (AWQ) algorithm for quantizing LLMs, building on and improving the original AWQ work from MIT.

Latest News 🔥

[2023/12] Mixtral, LLaVa, QWen, Baichuan model support.
[2023/11] AutoAWQ inference has been integrated into 🤗 transformers (a loading sketch follows this list). Now includes CUDA 12.1 wheels.
[2023/10] Mistral (Fused Modules), Bigcode, Turing support, Memory Bug Fix (Saves 2GB VRAM)
[2023/09] 1.6x-2.5x speed boost on fused models (now including MPT and Falcon).
[2023/09] Multi-GPU support, bug fixes, and better benchmark scripts available
[2023/08] PyPi package released and AutoModel class available
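
Following the 2023/11 integration noted above, AWQ checkpoints can be loaded directly through the 🤗 transformers API once autoawq is installed. A minimal sketch, where the checkpoint name is only an illustrative AWQ model from the Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative AWQ checkpoint; any AWQ-quantized Hub model loads the same way.
model_id = "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers reads the AWQ quantization config and dispatches to AutoAWQ's kernels;
# device_map="auto" additionally requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")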

Install

Prerequisites

NVIDIA:

Your NVIDIA GPU(s) must be of Compute Capability 7.5 or higher; Turing and later architectures are supported.
Your CUDA version must be CUDA 11.8 or later (a quick check is sketched after these prerequisites).

AMD:

Your ROCm version must be ROCm 5.6 or later.
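
Before installing, it can help to confirm these prerequisites from Python. A minimal sketch for the NVIDIA case, assuming PyTorch is already installed:

import torch

# Sketch of a pre-install check against the NVIDIA prerequisites above.
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor} (need >= 7.5)")
print(f"CUDA version in this torch build: {torch.version.cuda} (need >= 11.8)")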

Install from PyPI

To install the newest AutoAWQ from PyPI, you need CUDA 12.1 installed.

pip install autoawq

Build from source

For CUDA 11.8, ROCm 5.6, and ROCm 5.7, you can install wheels from the release page:

pip install autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl

Or from the main branch directly:

pip install autoawq@git+https://github.com/casper-hansen/AutoAWQ.git

Or by cloning the repository and installing from source:

git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip install -e .

All three methods will install the latest and correct kernels for your system from AutoAWQ_Kernels.

If your system is not supported (i.e. not on the release page), you can build the kernels yourself by following the instructions in AutoAWQ_Kernels and then install AutoAWQ from source.

Supported models

The detailed support list:

Models Sizes
LLaMA-2 7B/13B/70B
LLaMA 7B/13B/30B/65B
Mistral 7B
Vicuna 7B/13B
MPT 7B/30B
Falcon 7B/40B
OPT 125m/1.3B/2.7B/6.7B/13B/30B
Bloom 560m/3B/7B
GPTJ 6.7B
Aquila 7B
Aquila2 7B/34B
Yi 6B/34B
Qwen 1.8B/7B/14B/72B
BigCode 1B/7B/15B
GPT NeoX 20B
GPT-J 6B
LLaVa 7B/13B
Mixtral 8x7B
Baichuan 7B/13B

Usage

The examples directory shows how to quantize, run inference, and benchmark AutoAWQ models.
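
As a concrete illustration, here is a minimal quantization sketch in the spirit of those examples; the model path and output directory are placeholders, and the config values are the common 4-bit GEMM defaults:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder: any supported model
quant_path = "mistral-7b-awq"                      # placeholder: output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized model and tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)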

INT4 GEMM vs INT4 GEMV vs FP16

There are two versions of the AWQ kernels: GEMM and GEMV. The names refer to how the underlying matrix multiplication runs: GEMV is fastest at batch size 1, while GEMM is faster at larger batch sizes and longer contexts.
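
The version field in the quantization config selects the kernel. For inference on a quantized checkpoint, a minimal sketch reusing the placeholder path from the quantization example above:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "mistral-7b-awq"  # placeholder from the quantization sketch above

# fuse_layers enables the fused modules mentioned in the news section.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

tokens = tokenizer("What is 4-bit quantization?", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))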


@irthomasthomas added the llm-experiments, llm-quantization, and MachineLearning labels on Feb 3, 2024
@irthomasthomas changed the title from Code search results to AutoAWQ 4bit quantization on Feb 3, 2024