AutoAWQ 4bit quantization #504

Open
irthomasthomas opened this issue Feb 3, 2024 · 0 comments
Labels: llm-experiments (experiments with large language models), llm-quantization (All about Quantized LLM models and serving), MachineLearning (ML Models, Training and Inference)

@irthomasthomas (Owner) commented:

CONTENT:

TITLE: Code search results

DESCRIPTION:


AutoAWQ

| Roadmap | Examples | Issues: Help Wanted |

AutoAWQ is an easy-to-use package for 4-bit quantized models. Compared to FP16, AutoAWQ speeds up models by 3x and reduces memory requirements by 3x. It implements the Activation-aware Weight Quantization (AWQ) algorithm for quantizing LLMs, building on and improving the original AWQ work from MIT.

Latest News 🔥

[2023/12] Mixtral, LLaVa, QWen, Baichuan model support.
[2023/11] AutoAWQ inference has been integrated into 🤗 transformers (a loading sketch follows this list). Now includes CUDA 12.1 wheels.
[2023/10] Mistral (Fused Modules), Bigcode, Turing support, Memory Bug Fix (Saves 2GB VRAM)
[2023/09] 1.6x-2.5x speed boost on fused models (now including MPT and Falcon).
[2023/09] Multi-GPU support, bug fixes, and better benchmark scripts available
[2023/08] PyPi package released and AutoModel class available
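
Following the 2023/11 integration noted above, AWQ checkpoints can be loaded directly through the 🤗 transformers API once autoawq is installed. A minimal sketch, where the checkpoint name is only an illustrative AWQ model from the Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative AWQ checkpoint; any AWQ-quantized Hub model loads the same way.
model_id = "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# transformers reads the AWQ quantization config and dispatches to AutoAWQ's kernels;
# device_map="auto" additionally requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")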

Install

Prerequisites

NVIDIA:

Your NVIDIA GPU(s) must be of Compute Capability 7.5 or higher; Turing and later architectures are supported.
Your CUDA version must be CUDA 11.8 or later (a quick check is sketched after these prerequisites).

AMD:

Your ROCm version must be ROCm 5.6 or later.
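
Before installing, it can help to confirm these prerequisites from Python. A minimal sketch for the NVIDIA case, assuming PyTorch is already installed:

import torch

# Sketch of a pre-install check against the NVIDIA prerequisites above.
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor} (need >= 7.5)")
print(f"CUDA version in this torch build: {torch.version.cuda} (need >= 11.8)")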

Install from PyPI

To install the newest AutoAWQ from PyPI, you need CUDA 12.1 installed.

pip install autoawq

Build from source

For CUDA 11.8, ROCm 5.6, and ROCm 5.7, you can install wheels from the release page:

pip install autoawq@https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp310-cp310-linux_x86_64.whl

Or from the main branch directly:

pip install autoawq@git+https://github.com/casper-hansen/AutoAWQ.git

Or by cloning the repository and installing from source:

git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip install -e .

All three methods will install the latest and correct kernels for your system from AutoAWQ_Kernels.

If your system is not supported (i.e. not on the release page), you can build the kernels yourself by following the instructions in AutoAWQ_Kernels and then install AutoAWQ from source.

Supported models

The detailed support list:

Models Sizes
LLaMA-2 7B/13B/70B
LLaMA 7B/13B/30B/65B
Mistral 7B
Vicuna 7B/13B
MPT 7B/30B
Falcon 7B/40B
OPT 125m/1.3B/2.7B/6.7B/13B/30B
Bloom 560m/3B/7B
GPTJ 6.7B
Aquila 7B
Aquila2 7B/34B
Yi 6B/34B
Qwen 1.8B/7B/14B/72B
BigCode 1B/7B/15B
GPT NeoX 20B
GPT-J 6B
LLaVa 7B/13B
Mixtral 8x7B
Baichuan 7B/13B

Usage

The examples directory shows how to quantize, run inference, and benchmark AutoAWQ models.
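
As a concrete illustration, here is a minimal quantization sketch in the spirit of those examples; the model path and output directory are placeholders, and the config values are the common 4-bit GEMM defaults:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder: any supported model
quant_path = "mistral-7b-awq"                      # placeholder: output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits.
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized model and tokenizer.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)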

INT4 GEMM vs INT4 GEMV vs FP16

There are two versions of the AWQ kernels: GEMM and GEMV. The names refer to how the underlying matrix multiplication runs: GEMV is fastest at batch size 1, while GEMM is faster at larger batch sizes and longer contexts.
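
The version field in the quantization config selects the kernel. For inference on a quantized checkpoint, a minimal sketch reusing the placeholder path from the quantization example above:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "mistral-7b-awq"  # placeholder from the quantization sketch above

# fuse_layers enables the fused modules mentioned in the news section.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

tokens = tokenizer("What is 4-bit quantization?", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))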


@irthomasthomas added the llm-experiments, llm-quantization, and MachineLearning labels on Feb 3, 2024
@irthomasthomas changed the title from Code search results to AutoAWQ 4bit quantization on Feb 3, 2024