
Inference throwing: TypeError: forward() got an unexpected keyword argument 'position_ids' #6

Closed
Qubitium opened this issue Apr 9, 2023 · 7 comments

Comments


Qubitium commented Apr 9, 2023

Env:
Ubuntu 22.04
PyTorch 2.1 nightly, CUDA 11.8
transformers [head]
peft [head]

Reproduction steps:

  • pip install transformers from git HEAD
  • Check out the GPTQ-for-LLaMa cuda branch
  • Generate a 4-bit quantized model with --act-order --sequential
  • Convert the 4-bit model from the last step to a Triton model using the GPTQ-triton convert script
  • Copy quant.py and custom_autotune.py from GPTQ-triton into the GPTQ-for-LLaMa cuda branch
  • Start the inference code using Gradio, with minor changes

Result:

  • The tokenizer loads
  • The quantized model loads
  • model.generate() throws the following error on input

Is the quantized code not compatible with transformers [head]? Or am I doing something wrong?

    gen_output = model.generate(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/generation/utils.py", line 2524, in sample
    outputs = self(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py", line 292, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: forward() got an unexpected keyword argument 'position_ids'

Generation code

    # Excerpt from the Gradio inference script; `inputs`, `DEV`, `model`, and the
    # sampling parameters come from the surrounding code.
    import torch
    from transformers import GenerationConfig

    input_ids = inputs["input_ids"].to(DEV)

    generation_config = GenerationConfig(
        do_sample=True,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
        length_penalty=length_penalty,
    )

    with torch.no_grad():
        gen_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
        )
@fpgaminer
Owner

Good catch, thank you. I'll fix things up for the latest transformers head. In the meantime, you could try transformers @ commit a92e0ad2e20ef4ce28410b5e05c5d63a5a304e65
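
For context, the traceback means the patched attention module's forward() predates the position_ids keyword that newer transformers versions now pass into each decoder layer's self_attn call. A minimal sketch of a signature that tolerates it is below; the class name is hypothetical and the parameter list is only an approximation of the transformers interface at the time, not the actual repository code:

    # Illustrative only -- not the actual GPTQ-triton code. Newer transformers
    # versions call self_attn(..., position_ids=...), so a patched attention
    # module has to accept that keyword (or swallow unknown kwargs).
    import torch.nn as nn

    class QuantLlamaAttentionSketch(nn.Module):  # hypothetical name
        def forward(
            self,
            hidden_states,
            attention_mask=None,
            position_ids=None,    # the keyword the old signature was missing
            past_key_value=None,
            output_attentions=False,
            use_cache=False,
            **kwargs,             # tolerate future keywords transformers may add
        ):
            # ... the quantized attention computation would go here ...
            raise NotImplementedError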


Qubitium commented Apr 10, 2023

@fpgaminer Your latest Triton updates are really fast. Can't believe it. GPTQ-for-LLaMa ported your new code over, and it finally made the triton branch not only usable but the fastest in all my real-world tests.

Btw, I'm not sure if it's transformers-related or Triton-related, but beam search doesn't appear to work. I expected a slowdown as num_beams goes up, but I get the same tokens/s back, which doesn't make much sense. Does Triton need to implement beam search, or should that be handled by the higher-level transformers API? I'm trying to isolate why beams aren't functioning. Thanks.


fpgaminer commented Apr 10, 2023

Does Triton need to implement beam search, or should that be handled by the higher-level transformers API?

That should be handled in the transformers library.
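
For reference, beam search is enabled entirely through the generation API, so the quantized kernels don't need to know about it. A minimal sketch, reusing the variable names from the generation snippet above:

    # Beam search is selected purely at the transformers API level:
    with torch.no_grad():
        gen_output = model.generate(
            input_ids=input_ids,
            num_beams=4,           # any value > 1 switches generate() to beam search
            do_sample=False,       # plain (non-sampled) beam search
            max_new_tokens=max_new_tokens,
            return_dict_in_generate=True,
        )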

@fpgaminer
Owner

FYI, there's a 10% performance regression in the latest transformers library (commit 7dcd870 and onward). I've opened an issue over there for it. I'm going to hold off on updating my code for now and simply recommend sticking to pre-7dcd870 commits of transformers.
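
If anyone wants to verify the regression locally, a rough tokens-per-second measurement run against each transformers commit is enough to see the difference. A quick sketch, again reusing the model and input_ids from the snippet above:

    import time
    import torch

    # Rough throughput check: generate a fixed number of tokens and divide by
    # wall-clock time; run the same script against each transformers commit.
    with torch.no_grad():
        torch.cuda.synchronize()
        start = time.time()
        out = model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=128)
        torch.cuda.synchronize()
        elapsed = time.time() - start

    new_tokens = out.shape[-1] - input_ids.shape[-1]
    print(f"{new_tokens / elapsed:.1f} tokens/s")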

@fpgaminer
Owner

Update:

As of the latest GPTQ-triton commit (3daf413), transformers HEAD is supported again. I'm working upstream to fix the performance regression in transformers.

@Qubitium
Author

Currently quantizing 30B 4-bit using the repo's new quantize script and will do some testing later. Will post findings here.

@Qubitium
Author

@fpgaminer Confirmed the transformers [head] compat issue is fixed with a quantized 30B 4-bit model using your repo's quantize script. However, I found a beam search issue at #11
