
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. #581

Closed
mclanza opened this issue May 17, 2024 · 6 comments
Labels
type/bug An issue about a bug

Comments

@mclanza

mclanza commented May 17, 2024

🐛 Describe the bug

Following the instructions at https://github.com/allenai/OLMo used to work a few days ago. I tried again yesterday and today, and now it throws the following warning/error:
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
and no results are shown, as they used to be.

This is the code that I'm trying to run

from hf_olmo import * # registers the Auto* classes

from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")

message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

I also tried tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True), but the same message is shown.

I also tried allenai/olmo-7b (lowercase), and the same message is shown.

What am I missing? What could have changed since the last time that it worked?

Thanks!

Versions

python --version && pip freeze
Python 3.9.6
ai2-olmo==0.3.0
antlr4-python3-runtime==4.9.3
boto3==1.34.107
botocore==1.34.107
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.4
fsspec==2024.5.0
google-api-core==2.19.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
huggingface-hub==0.21.4
idna==3.7
Jinja2==3.1.4
jmespath==1.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
omegaconf==2.3.0
packaging==24.0
pillow==10.3.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
regex==2024.5.15
requests==2.31.0
rich==13.7.1
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.0.0
torchaudio==2.0.1
torchvision==0.15.1
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
urllib3==1.26.18

@mclanza mclanza added the type/bug An issue about a bug label May 17, 2024
@2015aroras
Collaborator

Hi Marie,

We recently integrated OLMo directly into the transformers library, which required changing the checkpoints to use. You will probably want allenai/OLMo-7B-hf instead, or allenai/OLMo-1.7-7B-hf if you would like to use our recently released, improved 7B model. I have updated the README with the new information: #589. Sorry for the inconvenience.

@mclanza
Author

mclanza commented May 23, 2024

Thanks @2015aroras !

I tried it with the following code

from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf", trust_remote_code=True)

message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

and it hangs after loading the checkpoint shards.

I tried it myself, and some colleagues tried it too, on different machines, environments, and networks. Any ideas that could help us?

Thanks!


Versions

python --version && pip freeze
Python 3.9.6
ai2-olmo==0.3.0
antlr4-python3-runtime==4.9.3
boto3==1.34.111
botocore==1.34.111
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.4
fsspec==2024.5.0
google-api-core==2.19.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
huggingface-hub==0.21.4
idna==3.7
Jinja2==3.1.4
jmespath==1.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
omegaconf==2.3.0
packaging==24.0
pillow==10.3.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
rich==13.7.1
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.2.2
torchaudio==2.2.2
torchvision==0.17.2
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
urllib3==1.26.18

@mclanza
Author

mclanza commented May 23, 2024

It hangs here:
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)


Code:

from transformers import AutoModelForCausalLM, AutoTokenizer

print('Step 1...')
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")

print('Step 2...')
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

message = ["Language modeling is "]
print('Step 3...')
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)

print('Step 4...')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

print('Step 5...')
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

Result: the script prints up to "Step 4..." and then hangs (screenshot omitted).


The same happens with allenai/OLMo-7B-hf, both with and without the trust_remote_code=True argument.

@2015aroras
Collaborator

trust_remote_code=True is not needed for our -hf models.

Without looking too deeply, my guess is that this is running on the CPU (since nothing is telling it to run on a GPU). The simplest solution would be to pass device_map="auto" to AutoModelForCausalLM.from_pretrained (and to pip install accelerate if needed). You may also append .to("cuda") to your inputs to move them to the GPU. Other than that, you can set max_new_tokens=1 just to see whether things are actually hanging or the model is just slow.

Aside: the model might not fit on a single GPU.
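Putting those suggestions together, here is a minimal sketch (assumptions: a CUDA-capable machine, accelerate installed, and the allenai/OLMo-1.7-7B-hf checkpoint discussed above; on a CPU-only machine the model still loads but generation will be very slow):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" asks accelerate to place the weights on the available
# GPU(s); without it the 7B model runs on CPU, where generation can take hours.
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1.7-7B-hf", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

message = ["Language modeling is "]

# Move the input tensors onto the same device as the model.
device = "cuda" if torch.cuda.is_available() else "cpu"
inputs = tokenizer(message, return_tensors="pt",
                   return_token_type_ids=False).to(device)

# Start with max_new_tokens=1 as a quick liveness check; raise it back to 100
# once you have confirmed the model responds at all.
response = olmo.generate(**inputs, max_new_tokens=1, do_sample=True,
                         top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

This block downloads the full 7B checkpoint on first run, so it is not something to try on a metered connection; the max_new_tokens=1 trick is only a diagnostic.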

@mclanza
Author

mclanza commented May 23, 2024

Exactly! Thanks for your reply! It took much longer (hours), but it finally worked. I'll try your suggestions. Thanks again!


@mclanza
Author

mclanza commented May 23, 2024

Thanks for the detailed replies. It works now, so I'm closing the issue. Thanks again!

@mclanza mclanza closed this as completed May 23, 2024