[Bug Report] Different results from HuggingFace when using the GPT2 small example #685

nreHieW · 2024-07-27T07:16:05Z

If you are submitting a bug report, please fill in the following details and use the tag [bug].

Describe the bug
The output from gpt2 small is different from the HF implementation. I searched for the gpt2 implementation specifically and there doesn't seem to be anything on this.

Code example
Please try to provide a minimal example to reproduce the bug. Error messages and stack traces are also helpful.

import transformer_lens
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# TransformerLens side: load GPT-2 small and disable the default BOS token
# so both models see the same token sequence.
transformer_lens_model = transformer_lens.HookedTransformer.from_pretrained("gpt2-small").cuda()
transformer_lens_model.cfg.default_prepend_bos = False

logits, activations = transformer_lens_model.run_with_cache("Hello World")

# Hugging Face side: same checkpoint, same prompt.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()
inputs = tokenizer("Hello World", return_tensors="pt").to("cuda")
outputs = model(**inputs, output_hidden_states=True)

# Expected True, but the two sets of logits do not match.
print(torch.allclose(logits, outputs.logits))
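The thread does not state a cause, so the following is only a diagnostic sketch under stated assumptions. `torch.allclose` uses tight default tolerances (`rtol=1e-05`, `atol=1e-08`), and `HookedTransformer.from_pretrained` applies weight processing by default (including centering the unembedding), which can shift every logit at a position by the same constant without changing the predicted distribution. If that is what is happening here, raw logits would differ while log-softmax outputs agree. The helper names below (`log_softmax`, `compare_logits`) are hypothetical, and the example uses plain Python lists with made-up numbers so it runs without a GPU or model download; on real tensors the equivalent step would be `torch.log_softmax(logits, dim=-1)`.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a flat list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


def compare_logits(a, b, atol=1e-4):
    """Compare two logit vectors both raw and after log-softmax.

    A large raw difference with a near-zero log-softmax difference means the
    vectors differ only by a constant offset, i.e. they encode the same
    distribution over tokens.
    """
    raw = max_abs_diff(a, b)
    normalized = max_abs_diff(log_softmax(a), log_softmax(b))
    return {
        "raw_max_diff": raw,
        "log_softmax_max_diff": normalized,
        "same_distribution": normalized <= atol,
    }


# Toy data (not real model output): the second vector is the first plus a constant.
reference = [2.0, -1.0, 0.5, 3.25]
shifted = [x + 7.5 for x in reference]

report = compare_logits(reference, shifted)
print(report)
```

If the log-softmax difference on the real outputs is also large, a constant offset is not the explanation. In that case, loading with `HookedTransformer.from_pretrained_no_processing("gpt2-small")` and comparing with a looser `atol` may help narrow down where the two implementations diverge.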

System Info
Describe the characteristic of your environment:
This was run on the free tier of Google Colab

Checklist

I have checked that there is no similar issue in the repo (required)

bryce13950 mentioned this issue Jul 27, 2024

[Bug Report] Qwen model implementation is too inaccurate #683

Open

1 task
