
Update README.md (ModelCloud#46)
Qubitium authored Jun 18, 2024
1 parent f206720 commit 47f1c8a
Showing 1 changed file with 28 additions and 4 deletions.

*For more histories please turn to [here](docs/NEWS_OR_UPDATE.md)*


## How is AutoGPTQ-NEXT different from AutoGPTQ?

AutoGPTQ-NEXT is an updated version of AutoGPTQ with the latest bug fixes applied, new features, and better/latest model support, plus a commitment from the ModelCloud.ai team that we, along with the open-source ML community, will make every effort to keep the library up to date with the latest advancements, model support, and bug fixes.

## Mission Statement

We want AutoGPTQ-NEXT to stay highly focused on GPTQ-based quantization and to target inference compatibility with Transformers, vLLM, and SGLang.

## Major Changes vs AutoGPTQ

* `Sym=False` Support. AutoGPTQ main has a broken `sym=False` implementation.
* `lm_head` module quantized inference support for further VRAM reduction.
* PENDING: `lm_head` quantization will be added soon with support from Intel/AutoRound.
* ChatGLM Model Support.
* Better defaults resulting in faster inference.
* Better default PPL with tweaked internal code (results may vary depending on calibration set and GPU usage).
* PENDING: DBRX Model Support.
* Removed non-working, partially working, or fully deprecated features: Peft, ROCM, AWQ Gemm execution via GPTQ kernels, Triton v1 (replaced by v2).
* Fixed Packing Performance regression on high core-count systems.
* Thousands of lines of refactoring/cleanup.
* Complete tests, with every feature and model tested. Anything that does not pass tests will be removed from the repo. We want quality over quantity.
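To illustrate why working `sym=False` support matters, here is a minimal, self-contained sketch (not AutoGPTQ-NEXT code, and the helper names are ours): asymmetric quantization adds a zero-point so the integer grid can shift to cover a one-sided weight range, which typically lowers reconstruction error on skewed distributions compared with a symmetric, zero-centered grid.

```python
def quantize_asym(values, bits=4):
    """Asymmetric (sym=False style) min-max quantization sketch.
    Maps floats onto [0, 2**bits - 1] with a scale and a zero-point,
    then dequantizes back. Illustrative only."""
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0            # avoid zero scale on constant input
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    deq = [(qi - zero_point) * scale for qi in q]
    return q, deq

def quantize_sym(values, bits=4):
    """Symmetric quantization sketch: grid centered at zero, no zero-point."""
    qmax = (1 << bits) // 2 - 1                # e.g. 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    deq = [qi * scale for qi in q]
    return q, deq

# One-sided (all-positive) weights: the asymmetric grid fits them better.
w = [0.1, 0.4, 0.55, 0.7, 0.9, 1.0]
_, d_asym = quantize_asym(w)
_, d_sym = quantize_sym(w)
err_asym = sum((a - b) ** 2 for a, b in zip(w, d_asym))
err_sym = sum((a - b) ** 2 for a, b in zip(w, d_sym))
```

On this skewed example, `err_asym` comes out lower than `err_sym`, which is the intuition behind preferring `sym=False` for layers whose weights are not centered on zero.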

## Platform Support
AutoGPTQ-NEXT is currently Linux-only and requires a Torch/CUDA-capable GPU from NVIDIA. WSL on Windows should work as well. ROCm/AMD support will be re-added in a future version after everything on ROCm has been validated. Only fully validated features will be re-added from the original AutoGPTQ repo.

## Installation

AutoGPTQ-NEXT is available for Linux only. You can install the latest stable release of AutoGPTQ-NEXT from pip with pre-built wheels:
On NVIDIA systems, AutoGPTQ-NEXT does not support [Maxwell or lower](https://qii
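The Maxwell cutoff above can be expressed as a small check. This is a sketch under our assumption that the minimum supported compute capability is 6.0 (Pascal; Maxwell parts are sm_5x) — in practice you would obtain the `(major, minor)` tuple from `torch.cuda.get_device_capability()`:

```python
def gpu_is_supported(capability, minimum=(6, 0)):
    """Return True if a (major, minor) CUDA compute capability tuple
    meets the assumed post-Maxwell minimum. Tuples compare elementwise,
    so (6, 1) >= (6, 0) and (5, 2) < (6, 0)."""
    return tuple(capability) >= tuple(minimum)

# Pascal (sm_6x) and newer pass; Maxwell (sm_5x) does not.
```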

### Install from source

Clone repo:
```bash
git clone https://github.com/Qubitium/AutoGPTQ-NEXT.git && cd AutoGPTQ-NEXT
```

A few packages are required in order to build from source: `pip install numpy gekko pandas`.

Compile:
```bash
pip install -vvv --no-build-isolation -e .
```
