fail to load Mixtral-8x7B-v0.1-GPTQ #4960

Closed
1 task done
AlexDeng-AI opened this issue Dec 17, 2023 · 8 comments
Labels
bug, stale

Comments

@AlexDeng-AI

Describe the bug

I tried to load Mixtral-8x7B-v0.1-GPTQ with the AutoGPTQ loader, but it fails with the error below (the full traceback is reproduced in the Logs section):

2023-12-17 18:20:44 ERROR:Failed to load the model.
ValueError: Trying to set a tensor of shape torch.Size([384, 14336]) in "qweight" (which has shape torch.Size([512, 14336])), this look incorrect.
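
For what it's worth, the two shapes in that ValueError are consistent with a bits mismatch rather than a corrupt file. A minimal sketch of the arithmetic, assuming AutoGPTQ's usual qweight packing of (in_features // 32 * bits, out_features) and Mixtral's 4096 hidden size / 14336 expert MLP width:

```python
# Back-of-the-envelope check of the shapes in the error, assuming
# AutoGPTQ packs qweight as (in_features // 32 * bits, out_features).
in_features, out_features = 4096, 14336  # Mixtral hidden size / expert MLP width

for bits in (3, 4):
    rows = in_features // 32 * bits
    print(f"{bits}-bit qweight: torch.Size([{rows}, {out_features}])")

# 3-bit qweight: torch.Size([384, 14336])  <- shape found in the checkpoint
# 4-bit qweight: torch.Size([512, 14336])  <- shape the model was built to expect
```

If that packing assumption holds, 3-bit weights are being loaded into a model configured for 4-bit, which points at a quantize_config.json that does not match the downloaded revision.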

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Step 1: Download Mixtral-8x7B-v0.1-GPTQ.
Step 2: Select Mixtral-8x7B-v0.1-GPTQ.
Step 3: Click the Load button.
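
For anyone reproducing this outside the WebUI, a minimal sketch that should hit the same code path. The local path is hypothetical, and the BaseQuantizeConfig values are assumptions that must match the revision actually downloaded:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

# Hypothetical local path; point this at the downloaded snapshot.
path_to_model = "models/Mixtral-8x7B-v0.1-GPTQ"

# The WebUI passes quantize_config=None, which makes AutoGPTQ read
# quantize_config.json from the model directory. Pinning it explicitly
# rules out a stale or mismatched config file; the values below are
# assumptions for a 4-bit, 128g branch.
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)

model = AutoGPTQForCausalLM.from_quantized(
    path_to_model,
    model_basename="model",
    device="cuda:0",
    use_safetensors=True,
    quantize_config=quantize_config,
)
```

If this direct load raises the same ValueError, the mismatch is between the checkpoint and its config rather than anything in the WebUI.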

Screenshot

No response

Logs

warnings.warn(
2023-12-17 18:20:22 INFO:Loading Mixtral-8x7B-v0.1-GPTQ...
2023-12-17 18:20:23 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': False, 'disable_exllamav2': False}
2023-12-17 18:20:23 WARNING:You have activated both exllama and exllamav2 kernel. Setting disable_exllama to True and keeping disable_exllamav2 to False
2023-12-17 18:20:44 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/chao/work/text-generation-webui/modules/ui_model_menu.py", line 209, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/work/text-generation-webui/modules/models.py", line 89, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/work/text-generation-webui/modules/models.py", line 385, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/work/text-generation-webui/modules/AutoGPTQ_loader.py", line 59, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/auto_gptq/modeling/auto.py", line 129, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/auto_gptq/modeling/_base.py", line 946, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 1494, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "/home/chao/miniconda3/envs/textgen/lib/python3.11/site-packages/auto_gptq/utils/patch_utils.py", line 53, in set_module_tensor_to_device_patched
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([384, 14336]) in "qweight" (which has shape torch.Size([512, 14336])), this look incorrect.

System Info

Debian 11
AlexDeng-AI added the bug label Dec 17, 2023
@SEVENID

SEVENID commented Dec 17, 2023

See also: #4897 #4882

@killfrenzy96

I'm also unable to load Mixtral 8x7B GPTQ using AutoGPTQ. I'm using the 3bit 128g version from:
https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GPTQ

The WebUI crashes without displaying any further errors when loading the model. A fresh install of the WebUI produces the same results. Other models load and function correctly, including Mixtral 8x7B GGUF using llama.cpp.

Running Windows 10 and an RTX 4090. NVIDIA driver version 546.17.

Logs:

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
C:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: 3 or set allow_custom_value=True.
  warnings.warn(
C:\Programs\text-generation-webui\installer_files\env\Lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: 128 or set allow_custom_value=True.
  warnings.warn(
2023-12-18 04:51:12 INFO:Loading Mixtral-8x7B-v0.1-GPTQ-3b-128g...
2023-12-18 04:51:13 INFO:The AutoGPTQ params are: {'model_basename': 'model', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': {0: '23000MiB', 'cpu': '99GiB'}, 'quantize_config': None, 'use_cuda_fp16': True, 'disable_exllama': False, 'disable_exllamav2': True}
Press any key to continue . . .
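
Since the GGUF build of the same model loads and runs correctly, here is a fallback sketch using llama-cpp-python directly (the filename is a placeholder for whichever quant was actually downloaded):

```python
from llama_cpp import Llama

# Placeholder filename; use whichever GGUF quant was actually downloaded.
llm = Llama(
    model_path="models/mixtral-8x7b-v0.1.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload as many layers as fit onto the GPU
    n_ctx=4096,
)

out = llm("Mixtral is", max_tokens=32)
print(out["choices"][0]["text"])
```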

@barrymac

This works for me now with the latest version. No other pip modules needed to be installed. The model took about 1003 seconds to load the first time on my 4x V100 SXM2 GPUs.
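
For multi-GPU setups like that, the relevant knob in the AutoGPTQ params logged above is max_memory, which accelerate uses to shard layers across devices. A sketch with illustrative per-GPU budgets for four 16 GiB V100s (the headroom values are guesses, not measured):

```python
from auto_gptq import AutoGPTQForCausalLM

# Illustrative memory budgets for 4x 16 GiB V100s plus CPU offload;
# real values need tuning to leave room for activations.
max_memory = {0: "14GiB", 1: "14GiB", 2: "14GiB", 3: "14GiB", "cpu": "64GiB"}

model = AutoGPTQForCausalLM.from_quantized(
    "models/Mixtral-8x7B-v0.1-GPTQ",  # hypothetical path, as above
    model_basename="model",
    use_safetensors=True,
    max_memory=max_memory,  # lets accelerate split layers across GPUs
)
```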

@AlexDeng-AI
Author

> This works for me now with the latest version. No other pip modules needed to be installed. The model took about 1003 seconds to load the first time on my 4x V100 SXM2 GPUs.

Thanks. I will try it.

@SaidTorres3

> I'm also unable to load Mixtral 8x7B GPTQ using AutoGPTQ. I'm using the 3bit 128g version from: https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GPTQ

Same problem, it's still happening.

@JamesGoldenPlow

Same exact problem here.

@antoineprobst

Same problem. "The WebUI crashes without displaying any further errors when loading the model. A fresh install of the WebUI produces the same results."

github-actions bot added the stale label Mar 19, 2024

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
