GPTQ vs EXL2 vs AWQ vs Q4_K_M model sizes : r/Oobabooga #304

irthomasthomas opened this issue Jan 8, 2024 · 0 comments
Labels

- llm: Large Language Models
- llm-benchmarks: testing and benchmarking large language models
- llm-experiments: experiments with large language models
- llm-inference-engines: software to run inference on large language models
- llm-quantization: all about quantized LLM models and serving

Comments

@irthomasthomas (Owner)
GPTQ vs EXL2 vs AWQ vs Q4_K_M model sizes

Mod Post
| Size (MB) | Model |
|-----------|-------|
| 16560 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.000b |
| 17053 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.125b |
| 17463 | Phind-CodeLlama-34B-v2-AWQ-4bit-128g |
| 17480 | Phind-CodeLlama-34B-v2-GPTQ-4bit-128g-actorder |
| 17548 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.250b |
| 18143 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.400b |
| 19133 | Phind_Phind-CodeLlama-34B-v2-EXL2-4.650b |
| 19284 | phind-codellama-34b-v2.Q4_K_M.gguf |
| 19320 | Phind-CodeLlama-34B-v2-AWQ-4bit-32g |
| 19337 | Phind-CodeLlama-34B-v2-GPTQ-4bit-32g-actorder |
I created all these EXL2 quants to compare them against GPTQ and AWQ. The preliminary result: EXL2 4.400b seems to outperform GPTQ-4bit-32g, and EXL2 4.125b seems to outperform GPTQ-4bit-128g, with the EXL2 variant using less VRAM in both cases.
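A useful sanity check on the table above is to convert each file size into effective bits per weight, which makes the EXL2 target bitrates and the GPTQ/GGUF group-size overhead directly comparable. The sketch below assumes the listed sizes are MiB and that CodeLlama-34B has roughly 33.74B parameters; neither assumption is stated in the post, and real files include non-weight overhead (embeddings, scales, metadata), so treat the results as approximate.

```python
# Estimate effective bits per weight (bpw) from a quantized model's file size.
# Assumptions (not stated in the post): sizes are MiB, and CodeLlama-34B has
# ~33.74e9 parameters. Overhead (scales, zero-points, metadata) is included
# in the file size, so these numbers slightly overstate the pure weight bpw.

PARAMS = 33.74e9  # approximate parameter count of CodeLlama-34B (assumption)

def bits_per_weight(size_mib: float, params: float = PARAMS) -> float:
    """Convert a file size in MiB to effective bits per weight."""
    return size_mib * 1024 * 1024 * 8 / params

# A few rows from the table above.
sizes_mib = {
    "EXL2-4.000b":    16560,
    "GPTQ-4bit-128g": 17480,
    "Q4_K_M":         19284,
    "GPTQ-4bit-32g":  19337,
}

for name, mib in sizes_mib.items():
    print(f"{name}: ~{bits_per_weight(mib):.2f} bpw")
```

Under these assumptions, GPTQ-4bit-32g lands well above 4 bpw (the 32-group scales add noticeable overhead), which is consistent with EXL2 4.400b matching it at a smaller footprint.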

I couldn't test AWQ yet because my quantization came out broken, possibly because this particular model uses NTK scaling, so I'll probably have to go through the fun of burning my GPU for another 16 hours to quantize and evaluate a different model before a conclusion can be reached.

Also, I have no idea whether Phind-CodeLlama is actually good; WizardCoder-Python might be better.

Suggested labels

"LLM-Quantization"
