
[FEATURE] DeepSeek V2 Chat Support #48

Closed
Xu-Chen opened this issue Jun 23, 2024 · 6 comments · Fixed by #51
Labels: enhancement (New feature or request)

Comments


Xu-Chen commented Jun 23, 2024

Please add quantization support for DeepSeek V2 Chat.

Related: AutoGPTQ/AutoGPTQ#664

Xu-Chen added the enhancement (New feature or request) label on Jun 23, 2024
Qubitium (Contributor) commented

@LRL-ModelCloud has been assigned to this task. Model has been downloaded and work should be completed soon.


Xu-Chen commented Jun 29, 2024

> @LRL-ModelCloud has been assigned to this task. Model has been downloaded and work should be completed soon.

Can you provide a quantized model of DeepSeek V2 Chat? I encountered an OOM error during the quantization process.


Qubitium commented Jun 29, 2024

@Xu-Chen What GPU model did you use for the DeepSeek V2 quant? I want to check whether the OOM is code related or just because DeepSeek V2 is a little special and requires more VRAM.


Xu-Chen commented Jun 29, 2024

> @Xu-Chen What GPU model did you use for the DeepSeek V2 quant? I want to check whether the OOM is code related or just because DeepSeek V2 is a little special and requires more VRAM.

Quantization fails with this traceback:

File "/home/root/.local/lib/python3.10/site-packages/gptqmodel/models/base.py", line 258, in quantize
    move_to(module, cur_layer_device)
  File "/home/root/.local/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 66, in move_to
    obj = obj.to(device)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Quantization code:

  import torch
  from gptqmodel import GPTQModel, QuantizeConfig

  quantize_config = QuantizeConfig(
      true_sequential=False,
      bits=4,
      group_size=group_size,
      desc_act=desc_act,
  )

  # Cap each of the 8 GPUs at 75GB so accelerate spreads the weights across them.
  max_memory = {i: "75GB" for i in range(8)}

  model = GPTQModel.from_pretrained(
      args.model_id,
      quantize_config,
      trust_remote_code=True,
      device_map="sequential",
      attn_implementation="eager",
      torch_dtype=torch.bfloat16,
      max_memory=max_memory,
  )
  model.quantize(examples)

Is it not possible to load the model onto the GPUs?

GPU: 8 * A800-80GB
RAM: 800GB
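
[Editorial note] The meta-tensor failure above usually means accelerate never materialized some weights: under a max_memory budget, layers that do not fit are kept on the meta device (with offload hooks) and have no data to copy, so the plain .to(device) inside move_to() fails. That reading of the traceback is an assumption, not something confirmed in this thread. A minimal reproduction of the error class:

  import torch

  # Parameters created on the "meta" device carry shape/dtype but no storage;
  # this is how accelerate represents weights it has not loaded yet.
  layer = torch.nn.Linear(4, 4, device="meta")

  try:
      layer.to("cpu")  # same failure mode as move_to() in the traceback above
  except NotImplementedError as e:
      print(e)  # "Cannot copy out of meta tensor; no data! ..."

  # to_empty() allocates (uninitialized) storage instead of copying.
  layer.to_empty(device="cpu")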


Xu-Chen commented Jun 29, 2024

[screenshot: VRAM usage]

Deleting max_memory=max_memory lets it run.

Is there a way to load the model onto the GPUs and then quantize in parallel to improve the quantization speed?

Qubitium (Contributor) commented

Remove all the extra options and use just the base call. GPTQModel will select the best dtype, and accelerate will automatically handle splitting the model weights across your GPUs.

  model = GPTQModel.from_pretrained(
      args.model_id,
      quantize_config,
  )
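
[Editorial note] For completeness, a minimal end-to-end sketch built around that call. It assumes the save_quantized API that GPTQModel inherits from AutoGPTQ; the 4-bit config values and the output path are illustrative, and args.model_id / examples come from the snippet earlier in the thread:

  from gptqmodel import GPTQModel, QuantizeConfig

  quantize_config = QuantizeConfig(bits=4, group_size=128)  # illustrative defaults

  # No device_map / max_memory / dtype overrides: GPTQModel picks the dtype
  # and accelerate handles splitting the weights across the available GPUs.
  model = GPTQModel.from_pretrained(
      args.model_id,
      quantize_config,
      trust_remote_code=True,  # DeepSeek V2 ships custom modeling code
  )

  model.quantize(examples)

  # Hypothetical output directory for the quantized checkpoint.
  model.save_quantized("DeepSeek-V2-Chat-gptq-4bit")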
