'LlamaAWQForCausalLM' object has no attribute 'config' #26970

Closed
OriginalGoku opened this issue Oct 21, 2023 · 10 comments

@OriginalGoku
System Info

I am trying to run a CodeLlama model on Colab with a free GPU.
The code was copied from here:
https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-AWQ

Who can help?

@ArthurZucker
@younesbelkada
@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The code is pretty simple:
!pip3 install autoawq
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

from transformers import pipeline

# model_name_or_path = "TheBloke/CodeLlama-13B-Instruct-AWQ"
model_name_or_path = "TheBloke/CodeLlama-7B-Instruct-AWQ"


# Load model
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                          trust_remote_code=True, safetensors=True)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)


prompt = "Tell me about AI"
# This was the default prompt and I did not change it
prompt_template=f'''[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:

{prompt}

[/INST]

'''

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

and here is the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-6fa1284003fe> in <cell line: 5>()
      3 
      4 print("*** Pipeline:")
----> 5 pipe = pipeline(
      6     "text-generation",
      7     model=model,

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    842         )
    843 
--> 844     model_config = model.config
    845     hub_kwargs["_commit_hash"] = model.config._commit_hash
    846     load_tokenizer = type(model_config) in TOKENIZER_MAPPING or model_config.tokenizer_class is not None

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1693             if name in modules:
   1694                 return modules[name]
-> 1695         raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
   1696 
   1697     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'LlamaAWQForCausalLM' object has no attribute 'config'

Expected behavior

When I do the inference with the following code, everything works:

print("\n\n*** Generate:")

tokens = tokenizer(
    prompt_template,
    return_tensors='pt'
).input_ids.cuda()

# Generate output
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=512
)

print("Output: ", tokenizer.decode(generation_output[0]))
@younesbelkada
Contributor

Hi @OriginalGoku
It seems you are using from awq import AutoAWQForCausalLM, which is not an object from transformers. We will soon integrate AWQ into transformers. cc @SunMarc for visibility

@ptanov

ptanov commented Nov 9, 2023

Hi @OriginalGoku, you could try passing
model=model.model,

@OriginalGoku
Author

Hi @ptanov
I did not understand your code

@younesbelkada
Contributor

Hi everyone,
AWQ is now integrated in transformers, so you can use it directly through the AutoModelForCausalLM interface; make sure to first run pip install -U transformers. Check out this demo: https://colab.research.google.com/drive/1HzZH89yAXJaZgwJDhQj9LqSBux932BvY?usp=sharing to see how to use the AWQ integration, and this documentation section: https://huggingface.co/docs/transformers/main_classes/quantization#awq-integration for more details
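
For reference, here is a minimal sketch of loading the same checkpoint through the transformers integration. It assumes a recent transformers with AWQ support (and autoawq installed); the checkpoint name is the one from this issue, and device_map="auto" additionally requires accelerate.

# Minimal sketch: loading an AWQ checkpoint via the transformers integration.
# The quantization config is read from the checkpoint itself, so a plain
# from_pretrained call is enough.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/CodeLlama-7B-Instruct-AWQ"

model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Because this is a regular transformers model (with a .config), pipeline() works.
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
)
print(pipe("Tell me about AI")[0]["generated_text"])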

@ptanov

ptanov commented Nov 10, 2023

Hi @ptanov I did not understand your code

@OriginalGoku, instead of

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

write

pipe = pipeline(
    "text-generation",
    model=model.model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

@ptanov

ptanov commented Nov 14, 2023

Hi everyone, AWQ is now integrated in transformers, so you can use it directly through the AutoModelForCausalLM interface; make sure to first run pip install -U transformers. Check out this demo: https://colab.research.google.com/drive/1HzZH89yAXJaZgwJDhQj9LqSBux932BvY?usp=sharing to see how to use the AWQ integration, and this documentation section: https://huggingface.co/docs/transformers/main_classes/quantization#awq-integration for more details

Hi @younesbelkada, is there any way to set fuse_layers=True (as in AutoAWQForCausalLM.from_quantized)? This option seems to improve the overall performance of autoawq significantly.

@younesbelkada
Contributor

Hi @ptanov
Yes, I am working on it here: #27411, and indeed I can confirm the huge performance boost. For now it seems to work fine on Llama & Mistral checkpoints; it will require autoawq==0.1.7 (coming soon). cc @casper-hansen for visibility
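
For anyone landing here later: once #27411 is merged, layer fusing is enabled through the quantization config rather than a fuse_layers argument. Below is a rough sketch of what that looks like; the parameter names (do_fuse, fuse_max_seq_len) are taken from that PR and the AWQ documentation, so double-check them against your transformers version.

# Rough sketch: enabling AWQ fused modules via the transformers quantization config.
from transformers import AutoModelForCausalLM, AwqConfig

quantization_config = AwqConfig(
    bits=4,
    fuse_max_seq_len=512,  # maximum sequence length the fused kernels are prepared for
    do_fuse=True,          # replaces fuse_layers=True from autoawq's from_quantized
)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/CodeLlama-7B-Instruct-AWQ",
    quantization_config=quantization_config,
    device_map="auto",
)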

@casper-hansen

I have been working hard on making 0.1.7 ready! And it soon will be. After that, you will get the equivalent speedup straight from transformers - stay tuned


github-actions bot commented Dec 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@younesbelkada
Contributor

Closing as #27411 has been merged!
