Add GPTQ via Transformers. [Basic] #2365
Conversation
Due to some functionality bugs, I refactored model_adapter quite a bit. It could use further refactoring, I think, but this is a start. Please note that this affects almost all adapters, so proper code review plus testing where possible should be employed.
Official documentation recommends it to be off. I am not entirely sure why this happens; I would need to investigate to sort out how and when it can be on.
Relevant notes are here: https://huggingface.co/docs/transformers/main/main_classes/quantization
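The thread doesn't say which setting this refers to; assuming it is the exllama kernel toggle covered on that page (my guess, not confirmed by the discussion), turning it off would look roughly like this sketch:

```python
# Sketch only, assuming the setting in question is the exllama kernel,
# which the HF docs describe as enabled by default for 4-bit GPTQ models.
from transformers import AutoModelForCausalLM, GPTQConfig

# use_exllama=False falls back to the plain CUDA kernels; older
# transformers releases spelled this option `disable_exllama=True`.
quant_config = GPTQConfig(bits=4, use_exllama=False)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GPTQ",  # illustrative GPTQ checkpoint
    quantization_config=quant_config,
    device_map="auto",
)
```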
Force-pushed from 14c0818 to e4758da.
Updated from master.
Bump. :)
@digisomni Sorry for the delay. This is a big refactor. Please allow some time for me to review.
Should be good to go now.
@merrymercy it looks like @digisomni did their part to get it working. Can we merge it?
Force-pushed from bc36cb1 to ec73e60.
I've rebased again. I recommend reviewing, testing, and merging this sooner rather than later: as people merge PRs with adapter modifications built on the old paradigm, the waters get muddied, and the more time that elapses since the last rebase, the harder rebasing becomes.
I am closing this refactor in favor of using the vLLM worker. If at some point we need to ship features faster than vLLM does, we can resume maintaining FastChat's native model worker, but for now it seems vLLM is going to be faster to the punch.
@digisomni I am sorry to hear that, but we have very limited bandwidth. You are welcome to try it as well. The default HF worker seems ill-suited for high-performance deployment.
@digisomni Really, try SGLang. I have already moved as much as I can (except Mixtral, which I haven't gotten running yet) to SGLang; it's so much better than vLLM and not even comparable to the original worker.
@surak could you file the issues you hit with the SGLang worker?
This enables support for GPTQ via Transformers, which seems the cleanest and most efficient way to do it.
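For context, loading a pre-quantized GPTQ checkpoint through Transformers looks roughly like this (a minimal sketch; the model id is illustrative, and the `optimum` and `auto-gptq` extras must be installed):

```python
# Minimal sketch: Transformers reads the GPTQ quantization config shipped
# in the model repo and dispatches to the quantized kernels automatically.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```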
Also updated format.sh to allow 'greater than or equal to' tool versions.
Note: Perhaps the old quantization path can then be deprecated, since the maintainer recommends using AutoGPTQ. Alternatively, if there is a compelling reason to keep manual support, the GPTQ module should be upgraded to use AutoGPTQ.
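For comparison, the direct AutoGPTQ path the maintainer points to looks roughly like this (a sketch; the model id is illustrative):

```python
# Sketch of loading the same kind of checkpoint with AutoGPTQ directly,
# i.e. the library the Transformers integration builds on.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7B-Chat-GPTQ",  # illustrative checkpoint
    device="cuda:0",
    use_safetensors=True,
)
```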
Closes #2215 #1745 #1671 #2375
Warning: This alters the package requirements for FastChat.