
[FEATURE] DeepSeek V2 Chat Support #48

Closed
Xu-Chen opened this issue Jun 23, 2024 · 6 comments · Fixed by #51
Labels: enhancement (New feature or request)

Comments


Xu-Chen commented Jun 23, 2024

Please add quantization support for DeepSeek V2 Chat.

Related: AutoGPTQ/AutoGPTQ#664

Xu-Chen added the enhancement (New feature or request) label on Jun 23, 2024
Qubitium (Contributor) commented

@LRL-ModelCloud has been assigned to this task. Model has been downloaded and work should be completed soon.


Xu-Chen commented Jun 29, 2024

> @LRL-ModelCloud has been assigned to this task. Model has been downloaded and work should be completed soon.

Can you provide a quantized model of DeepSeek V2 Chat? I encountered an OOM error during the quantization process.


Qubitium commented Jun 29, 2024

@Xu-Chen What GPU model did you use for the DeepSeek V2 quant? I want to check whether the OOM is code related or just because DeepSeek V2 is a little special and requires more VRAM.


Xu-Chen commented Jun 29, 2024

> @Xu-Chen What GPU model did you use for the DeepSeek V2 quant? I want to check whether the OOM is code related or just because DeepSeek V2 is a little special and requires more VRAM.

Quantization fails with this traceback:

File "/home/root/.local/lib/python3.10/site-packages/gptqmodel/models/base.py", line 258, in quantize
    move_to(module, cur_layer_device)
  File "/home/root/.local/lib/python3.10/site-packages/gptqmodel/utils/model.py", line 66, in move_to
    obj = obj.to(device)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 804, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Quantization code:

  import torch
  from gptqmodel import GPTQModel, QuantizeConfig

  quantize_config = QuantizeConfig(
      true_sequential=False,
      bits=4,
      group_size=group_size,
      desc_act=desc_act,
  )

  # Cap each of the 8 GPUs at 75GB so accelerate spreads the weights across them.
  max_memory = {i: "75GB" for i in range(8)}

  model = GPTQModel.from_pretrained(
      args.model_id,
      quantize_config,
      trust_remote_code=True,
      device_map="sequential",
      attn_implementation="eager",
      torch_dtype=torch.bfloat16,
      max_memory=max_memory,
  )
  model.quantize(examples)

Is it not possible to load the model onto the GPUs?

GPU: 8 * A800-80GB
RAM: 800GB
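
[Editorial note] The meta-tensor failure above usually means accelerate never materialized some weights: under a max_memory budget, layers that do not fit are kept on the meta device (with offload hooks) and have no data to copy, so the plain .to(device) inside move_to() fails. That reading of the traceback is an assumption, not something confirmed in this thread. A minimal reproduction of the error class:

  import torch

  # Parameters created on the "meta" device carry shape/dtype but no storage;
  # this is how accelerate represents weights it has not loaded yet.
  layer = torch.nn.Linear(4, 4, device="meta")

  try:
      layer.to("cpu")  # same failure mode as move_to() in the traceback above
  except NotImplementedError as e:
      print(e)  # "Cannot copy out of meta tensor; no data! ..."

  # to_empty() allocates (uninitialized) storage instead of copying.
  layer.to_empty(device="cpu")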


Xu-Chen commented Jun 29, 2024

[screenshot: VRAM usage]

Deleting max_memory=max_memory lets it run.

Is there a way to load the model onto the GPUs and then quantize in parallel to improve the quantization speed?

Qubitium (Contributor) commented

Remove all the extra options and use just the base call. GPTQModel will select the best dtype, and accelerate will automatically handle splitting the model weights across your GPUs.

  model = GPTQModel.from_pretrained(
      args.model_id,
      quantize_config,
  )
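
[Editorial note] For completeness, a minimal end-to-end sketch built around that call. It assumes the save_quantized API that GPTQModel inherits from AutoGPTQ; the 4-bit config values and the output path are illustrative, and args.model_id / examples come from the snippet earlier in the thread:

  from gptqmodel import GPTQModel, QuantizeConfig

  quantize_config = QuantizeConfig(bits=4, group_size=128)  # illustrative defaults

  # No device_map / max_memory / dtype overrides: GPTQModel picks the dtype
  # and accelerate handles splitting the weights across the available GPUs.
  model = GPTQModel.from_pretrained(
      args.model_id,
      quantize_config,
      trust_remote_code=True,  # DeepSeek V2 ships custom modeling code
  )

  model.quantize(examples)

  # Hypothetical output directory for the quantized checkpoint.
  model.save_quantized("DeepSeek-V2-Chat-gptq-4bit")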
