
[LlamaFamiliy] add a tip about dtype #25794

Merged
merged 5 commits into huggingface:main on Aug 28, 2023

Conversation

ArthurZucker
Collaborator

What does this PR do?

Add a warning=True tip to the Llama2 doc to make sure people are not confused.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 28, 2023

The documentation is not available anymore as the PR was closed or merged.

Member

@LysandreJik LysandreJik left a comment


Looks good to me!


The `dtype` of the online weights is mostly irrelevant unless you are using `torch_dtype="auto"` when initializing a model, e.g. `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype="auto")`. The reason is that the model will first be downloaded (using the `dtype` of the checkpoints online), then cast to `torch`'s default `dtype` (`torch.float32`), and finally, if a `torch_dtype` is provided in the config, that one will be used.

Training the model in `float16` is not recommended and known to produce `nan`, as suche the model should be trained in `bfloat16`.
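
For illustration, a minimal sketch of the loading behavior described in the tip above (the `meta-llama/Llama-2-7b-hf` checkpoint name is just a placeholder; any Llama checkpoint behaves the same way):

```python
from transformers import AutoModelForCausalLM

# Without torch_dtype, the weights end up in torch's default dtype (float32),
# regardless of the dtype the checkpoint was saved in.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
print(model.dtype)  # torch.float32

# With torch_dtype="auto", the dtype recorded in the checkpoint's config is used.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype="auto"
)
print(model.dtype)  # whatever config.json specifies, e.g. torch.float16
```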
Member


Suggested change
- Training the model in `float16` is not recommended and known to produce `nan`, as suche the model should be trained in `bfloat16`.
+ Training the model in `float16` is not recommended and known to produce `nan`, as such the model should be trained in `bfloat16`.
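
To make the `bfloat16` recommendation concrete, a small sketch assuming the Trainer API (model name and output directory are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Load the weights directly in bfloat16 rather than float16.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# With the Trainer, prefer bf16 over fp16 to avoid the `nan` losses mentioned above.
training_args = TrainingArguments(output_dir="llama2-bf16", bf16=True)
```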


@spikeliu spikeliu Aug 30, 2023


Should the doc also tell people that using convert_llama_weights_to_hf.py causes the confusion mentioned above? It writes torch_dtype as bfloat16 in the config file, but when the model is loaded without setting torch_dtype="auto", the parameters are cast to float32; model.config.torch_dtype still says torch.bfloat16, yet the actual memory usage doubles. And what is the best practice for dtype when only doing inference: is bfloat16 good enough, or is float32 better? For llama-2-70B I think people would care, as the memory difference is huge (think of a single compute node with 4 * A100 40GB compared to 4 * A100 80GB; 7B and 13B are no trouble for either, but the first one won't be able to load the float32 70B model).
Actually, there is another confusion: in the latest convert_llama_weights_to_hf.py, model.config.torch_dtype is assigned torch.float16 before save_pretrained is called, but after I run the script with the pre-downloaded model and check the model's config.json file, torch_dtype is still set to bfloat16.

model.config.torch_dtype = torch.float16

Maybe directly setting model.config.torch_dtype to a different value doesn't take effect in the final dumped files?
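
For reference, a sketch of the mismatch described above, assuming a checkpoint converted locally with convert_llama_weights_to_hf.py into `./llama-2-7b-hf` (hypothetical path):

```python
from transformers import AutoModelForCausalLM

# config.json written by the conversion script records torch_dtype: bfloat16.
model = AutoModelForCausalLM.from_pretrained("./llama-2-7b-hf")

# The config still reports the dtype recorded at conversion time...
print(model.config.torch_dtype)        # torch.bfloat16
# ...but without torch_dtype="auto" the parameters are loaded in float32,
# so memory usage is roughly double what the config suggests.
print(next(model.parameters()).dtype)  # torch.float32
```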

Collaborator Author


Yep, that `model.config.torch_dtype = torch.float16` line is a typo, I'll remove it!



@ArthurZucker ArthurZucker marked this pull request as ready for review August 28, 2023 09:40
@ArthurZucker ArthurZucker merged commit de13970 into huggingface:main Aug 28, 2023
3 checks passed
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023
* add a warning=True tip to the Llama2 doc

* code llama needs a tip too

* doc nit

* build PR doc

* doc nits

Co-authored-by: Lysandre <[email protected]>

---------

Co-authored-by: Lysandre <[email protected]>
blbadger pushed a commit to blbadger/transformers that referenced this pull request Nov 8, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 18, 2023