diff --git a/README.md b/README.md
index 455d102c..d96354e0 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,32 @@
 *For more histories please turn to [here](docs/NEWS_OR_UPDATE.md)*
+
+## How is AutoGPTQ-NEXT different from AutoGPTQ?
+
+AutoGPTQ-NEXT is an updated version of AutoGPTQ with the latest bug fixes applied, new features, and better support for the latest models, plus a guarantee from the ModelCloud.ai team that we, along with the open-source ML community, will make every effort to keep the library up to date with the latest advancements, model support, and bug fixes.
+
+## Mission Statement
+
+We want AutoGPTQ-NEXT to be highly focused on GPTQ-based quantization and to target inference compatibility with Transformers, vLLM, and SGLang.
+
+## Major Changes vs AutoGPTQ
+
+* `sym=False` support. `sym=False` is broken on AutoGPTQ main.
+* `lm_head` module quantized inference support for further VRAM reduction.
+* PENDING: `lm_head` quantization will be added soon with support from Intel/AutoRound.
+* ChatGLM model support.
+* Better defaults resulting in faster inference.
+* Better default PPL via tweaked internal code (results may vary depending on calibration set and GPU usage).
+* PENDING: DBRX model support.
+* Removed non-working, partially working, or fully deprecated features: Peft, ROCm, AWQ GEMM execution via GPTQ kernels, Triton v1 (replaced by v2).
+* Fixed packing performance regression on high-core-count systems.
+* Thousands of lines of refactoring/cleanup.
+* Complete tests, with every feature and model tested. Anything that does not pass tests will be removed from the repo. We want quality over quantity.
+
+## Platform Support
+AutoGPTQ-NEXT is currently Linux only and requires a Torch/CUDA-capable GPU from NVIDIA. WSL on Windows should work as well. ROCm/AMD support will be re-added in a future version after everything on ROCm has been validated. Only fully validated features will be re-added from the original AutoGPTQ repo.
+
 
 ## Installation
 
 AutoGPTQ-NEXT is available for Linux only. You can install the latest stable release of AutoGPTQ from pip with pre-built wheels:
@@ -29,14 +55,12 @@
 On NVIDIA systems, AutoGPTQ-NEXT does not support [Maxwell or lower](https://qii
 
 ### Install from source
 
-Clone the source code:
+Clone the repo:
 
 ```bash
 git clone https://github.com/Qubitium/AutoGPTQ-NEXT.git && cd AutoGPTQ-NEXT
 ```
 
-A few packages are required in order to build from source: `pip install numpy gekko pandas`.
-
-Then, install locally from source:
+Compile:
 
 ```bash
 pip install -vvv --no-build-isolation -e .
 ```