
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. #581

Closed
mclanza opened this issue May 17, 2024 · 6 comments
Labels
type/bug An issue about a bug

Comments

@mclanza

mclanza commented May 17, 2024

🐛 Describe the bug

Following the instructions at https://github.com/allenai/OLMo used to work a few days ago. I tried again yesterday and today, and now it throws the following warning/error:
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
and no results are shown, as they used to be.

This is the code that I'm trying to run

from hf_olmo import * # registers the Auto* classes

from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B")

message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

I also tried tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B", trust_remote_code=True), but the same message is shown.

I also tried allenai/olmo-7b (lowercase), and the same message is shown.

What am I missing? What could have changed since the last time that it worked?

Thanks!

Versions

python --version && pip freeze
Python 3.9.6
ai2-olmo==0.3.0
antlr4-python3-runtime==4.9.3
boto3==1.34.107
botocore==1.34.107
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.4
fsspec==2024.5.0
google-api-core==2.19.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
huggingface-hub==0.21.4
idna==3.7
Jinja2==3.1.4
jmespath==1.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
omegaconf==2.3.0
packaging==24.0
pillow==10.3.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
regex==2024.5.15
requests==2.31.0
rich==13.7.1
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.0.0
torchaudio==2.0.1
torchvision==0.15.1
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
urllib3==1.26.18

@mclanza mclanza added the type/bug An issue about a bug label May 17, 2024
@2015aroras
Collaborator

Hi Marie,

We recently integrated OLMo directly into the transformers library, which required changing the checkpoints to use. You will probably want allenai/OLMo-7B-hf instead, or allenai/OLMo-1.7-7B-hf if you would like to use our recently released, improved 7B model. I have updated the README with the new information: #589. Sorry for the inconvenience.

@mclanza
Author

mclanza commented May 23, 2024

Thanks @2015aroras !

I tried it with the following code

from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf", trust_remote_code=True)

message = ["Language modeling is "]
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

and it hangs after loading the checkpoint shards.

I tried it myself, and some colleagues tried it too, on different machines, environments, and networks. Any ideas that could help us?

Thanks!


Versions

python --version && pip freeze
Python 3.9.6
ai2-olmo==0.3.0
antlr4-python3-runtime==4.9.3
boto3==1.34.111
botocore==1.34.111
cached_path==1.6.2
cachetools==5.3.3
certifi==2024.2.2
charset-normalizer==3.3.2
filelock==3.13.4
fsspec==2024.5.0
google-api-core==2.19.0
google-auth==2.29.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
huggingface-hub==0.21.4
idna==3.7
Jinja2==3.1.4
jmespath==1.0.1
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
omegaconf==2.3.0
packaging==24.0
pillow==10.3.0
proto-plus==1.23.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
Pygments==2.18.0
python-dateutil==2.9.0.post0
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.2
rich==13.7.1
rsa==4.9
s3transfer==0.10.1
safetensors==0.4.3
six==1.16.0
sympy==1.12
tokenizers==0.19.1
torch==2.2.2
torchaudio==2.2.2
torchvision==0.17.2
tqdm==4.66.4
transformers==4.40.2
typing_extensions==4.11.0
urllib3==1.26.18

@mclanza
Author

mclanza commented May 23, 2024

It hangs here:
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)


Code:

from transformers import AutoModelForCausalLM, AutoTokenizer

print('Step 1...')
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1.7-7B-hf")

print('Step 2...')
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

message = ["Language modeling is "]
print('Step 3...')
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)

print('Step 4...')
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)

print('Step 5...')
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

Result: the script prints up to "Step 4..." and then hangs (screenshot omitted).


The same happens with allenai/OLMo-7B-hf, both with and without the trust_remote_code=True argument.

@2015aroras
Collaborator

trust_remote_code=True is not needed for our -hf models.

Without looking too deeply, my guess is that this is running on the CPU (since nothing is telling it to run on a GPU). The simplest solution would be to pass device_map="auto" to AutoModelForCausalLM.from_pretrained (and to pip install accelerate if needed). You may also append .to("cuda") to your inputs to move them to the GPU. Other than that, you can set max_new_tokens=1 just to see whether things are actually hanging or the model is just slow.

Aside: the model might not fit on a single GPU.
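Putting those suggestions together, here is a minimal sketch (assumptions: a CUDA-capable machine, accelerate installed, and the allenai/OLMo-1.7-7B-hf checkpoint discussed above; on a CPU-only machine the model still loads but generation will be very slow):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" asks accelerate to place the weights on the available
# GPU(s); without it the 7B model runs on CPU, where generation can take hours.
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-1.7-7B-hf", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1.7-7B-hf")

message = ["Language modeling is "]

# Move the input tensors onto the same device as the model.
device = "cuda" if torch.cuda.is_available() else "cpu"
inputs = tokenizer(message, return_tensors="pt",
                   return_token_type_ids=False).to(device)

# Start with max_new_tokens=1 as a quick liveness check; raise it back to 100
# once you have confirmed the model responds at all.
response = olmo.generate(**inputs, max_new_tokens=1, do_sample=True,
                         top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```

This block downloads the full 7B checkpoint on first run, so it is not something to try on a metered connection; the max_new_tokens=1 trick is only a diagnostic.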

@mclanza
Author

mclanza commented May 23, 2024

Exactly! Thanks for your reply! It took much longer (hours), but it finally worked. I'll try your suggestions. Thanks again!


@mclanza
Author

mclanza commented May 23, 2024

Thanks for the detailed replies. It works now, so I'm closing the issue. Thanks again!

@mclanza mclanza closed this as completed May 23, 2024