fail to load Mixtral-8x7B-v0.1-GPTQ #4960

Closed
1 task done
AlexDeng-AI opened this issue Dec 17, 2023 · 8 comments
Labels
bug, stale

Comments

@AlexDeng-AI

Describe the bug

I tried to load Mixtral-8x7B-v0.1-GPTQ with the AutoGPTQ loader, but it fails with the error below (the full traceback is reproduced in the Logs section):

2023-12-17 18:20:44 ERROR:Failed to load the model.
ValueError: Trying to set a tensor of shape torch.Size([384, 14336]) in "qweight" (which has shape torch.Size([512, 14336])), this look incorrect.
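
For what it's worth, the two shapes in that ValueError are consistent with a bits mismatch rather than a corrupt file. A minimal sketch of the arithmetic, assuming AutoGPTQ's usual qweight packing of (in_features // 32 * bits, out_features) and Mixtral's 4096 hidden size / 14336 expert MLP width:

```python
# Back-of-the-envelope check of the shapes in the error, assuming
# AutoGPTQ packs qweight as (in_features // 32 * bits, out_features).
in_features, out_features = 4096, 14336  # Mixtral hidden size / expert MLP width

for bits in (3, 4):
    rows = in_features // 32 * bits
    print(f"{bits}-bit qweight: torch.Size([{rows}, {out_features}])")

# 3-bit qweight: torch.Size([384, 14336])  <- shape found in the checkpoint
# 4-bit qweight: torch.Size([512, 14336])  <- shape the model was built to expect
```

If that packing assumption holds, 3-bit weights are being loaded into a model configured for 4-bit, which points at a quantize_config.json that does not match the downloaded revision.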

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Step 1: Download Mixtral-8x7B-v0.1-GPTQ.
Step 2: Select Mixtral-8x7B-v0.1-GPTQ.
Step 3: Click the Load button.
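
For anyone reproducing this outside the WebUI, a minimal sketch that should hit the same code path. The local path is hypothetical, and the BaseQuantizeConfig values are assumptions that must match the revision actually downloaded:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Hypothetical local path; point this at the downloaded snapshot.
path_to_model = "models/Mixtral-8x7B-v0.1-GPTQ"

# The WebUI passes quantize_config=None, which makes AutoGPTQ read
# quantize_config.json from the model directory. Pinning it explicitly
# rules out a stale or mismatched config file; the values below are
# assumptions for a 4-bit, 128g branch.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)

model = AutoGPTQForCausalLM.from_quantized(
    path_to_model,
    model_basename="model",
    device="cuda:0",
    use_safetensors=True,
    quantize_config=quantize_config,
)
```

If this direct load raises the same ValueError, the mismatch is between the checkpoint and its config rather than anything in the WebUI.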

Screenshot

No response

Logs

warnings.warn(
2023-12-17 18:20:22 INFO:Loading Mixtral-8x7B-v0.1-GPTQ...
2023-12-17 18:20:23 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': False, 'disable_exllamav2': False}
2023-12-17 18:20:23 WARNING:You have activated both exllama and exllamav2 kernel. Setting disable_exllama to True and keeping disable_exllamav2 to False
2023-12-17 18:20:44 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/chao/work/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/work/text-generation-webui/modules/models.py", line 89, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/work/text-generation-webui/modules/models.py", line 385, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/work/text-generation-webui/modules/AutoGPTQ_loader.py", line 59, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/auto_gptq/modeling/auto.py", line 129, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 946, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 1494, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/auto_gptq/utils/patch_utils.py", line 53, in set_module_tensor_to_device_patched
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([384, 14336]) in "qweight" (which has shape torch.Size([512, 14336])), this look incorrect.

System Info

Debian 11
AlexDeng-AI added the bug label Dec 17, 2023
@SEVENID

SEVENID commented Dec 17, 2023

See also: #4897 #4882

@killfrenzy96

I'm also unable to load Mixtral 8x7B GPTQ using AutoGPTQ. I'm using the 3bit 128g version from:
https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GPTQ

The WebUI crashes without displaying any further errors when loading the model. A fresh install of the WebUI produces the same results. Other models load and function correctly, including Mixtral 8x7B GGUF using llama.cpp.

Running Windows 10 and an RTX 4090. NVIDIA driver version 546.17.

Logs:

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
C:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: 3 or set allow_custom_value=True.
  warnings.warn(
C:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: 128 or set allow_custom_value=True.
  warnings.warn(
2023-12-18 04:51:12 INFO:Loading Mixtral-8x7B-v0.1-GPTQ-3b-128g...
2023-12-18 04:51:13 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': {0: '23000MiB', 'cpu': '99GiB'}, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': False, 'disable_exllamav2': True}
Press any key to continue . . .
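
Since the GGUF build of the same model loads and runs correctly, here is a fallback sketch using llama-cpp-python directly (the filename is a placeholder for whichever quant was actually downloaded):

```python
from llama_cpp import Llama

# Placeholder filename; use whichever GGUF quant was actually downloaded.
llm = Llama(
    model_path="models/mixtral-8x7b-v0.1.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload as many layers as fit onto the GPU
    n_ctx=4096,
)

out = llm("Mixtral is", max_tokens=32)
print(out["choices"][0]["text"])
```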

@barrymac

This works for me now with the latest version. No other pip modules needed to be installed. The model took about 1003 seconds to load the first time on my 4x V100 SXM2 GPUs.
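
For multi-GPU setups like that, the relevant knob in the AutoGPTQ params logged above is max_memory, which accelerate uses to shard layers across devices. A sketch with illustrative per-GPU budgets for four 16 GiB V100s (the headroom values are guesses, not measured):

```python
from auto_gptq import AutoGPTQForCausalLM

# Illustrative memory budgets for 4x 16 GiB V100s plus CPU offload;
# real values need tuning to leave room for activations.
max_memory = {0: "14GiB", 1: "14GiB", 2: "14GiB", 3: "14GiB", "cpu": "64GiB"}

model = AutoGPTQForCausalLM.from_quantized(
    "models/Mixtral-8x7B-v0.1-GPTQ",  # hypothetical path, as above
    model_basename="model",
    use_safetensors=True,
    max_memory=max_memory,  # lets accelerate split layers across GPUs
)
```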

@AlexDeng-AI
Author

> This works for me now with the latest version. No other pip modules needed to be installed. The model took about 1003 seconds to load the first time on my 4x V100 SXM2 GPUs.

Thanks. I will try it.

@SaidTorres3

> I'm also unable to load Mixtral 8x7B GPTQ using AutoGPTQ. I'm using the 3bit 128g version from: https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GPTQ

Same problem, it's still happening.

@JamesGoldenPlow

Same exact problem here.

@antoineprobst

Same problem. "The WebUI crashes without displaying any further errors when loading the model. A fresh install of the WebUI produces the same results."

github-actions bot added the stale label Mar 19, 2024

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
