[LlamaFamily] add a tip about dtype #25794
Conversation
The documentation is not available anymore as the PR was closed or merged.
Looks good to me!
docs/source/en/model_doc/llama2.md (reviewed hunk):

> The `dtype` of the online weights is mostly irrelevant, unless you are using `torch_dtype="auto"` when initializing a model using `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype = "auto")`. The reason is that the model will first be downloaded ( using the `dtype` of the checkpoints online) then it will be casted to the default `dtype` of `torch` (becomes `torch.float32`) and finally, if there is a `torch_dtype` provided in the config, it will be used.
>
> Training the model in `float16` is not recommended and known to produce `nan`, as suche the model should be trained in `bfloat16`.
Suggested change:
- Training the model in `float16` is not recommended and known to produce `nan`, as suche the model should be trained in `bfloat16`.
+ Training the model in `float16` is not recommended and known to produce `nan`, as such the model should be trained in `bfloat16`.
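For readers following along, a minimal sketch of the loading behavior the quoted tip describes (the checkpoint name and the exact printed dtypes are illustrative):

```python
from transformers import AutoModelForCausalLM

# Without torch_dtype, the checkpoint is loaded and cast to torch's default
# dtype (torch.float32), whatever dtype the online weights were saved in.
model_default = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
print(next(model_default.parameters()).dtype)  # torch.float32

# With torch_dtype="auto", the dtype recorded in the checkpoint's config
# (config.torch_dtype) is used instead.
model_auto = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype="auto"
)
print(next(model_auto.parameters()).dtype)  # e.g. torch.bfloat16
```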
Should the doc also tell people that using `convert_llama_weights_to_hf.py` will cause the confusion mentioned above? The script writes `torch_dtype` as `bfloat16` in the config file, but when loading the model without setting `torch_dtype="auto"`, the parameters are cast to `float32`; `model.config.torch_dtype` still says `torch.bfloat16`, yet the actual memory usage doubles. And what is the best practice for `dtype` when only doing inference? Is `bfloat16` good enough, or is `float32` better? For llama-2-70B, I think people would care, as the memory difference is huge (think of a single compute node with 4 * A100 40GB compared to 4 * A100 80GB: 7B and 13B are no trouble for either, but the first one won't be able to load the `float32` 70B model; see the rough arithmetic sketched below).
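To make that size gap concrete, a rough back-of-the-envelope for the weights alone (editor's sketch; the 70e9 parameter count is approximate and activations/KV cache are ignored):

```python
# Approximate weight memory for a ~70B-parameter model (weights only).
params = 70e9

fp32_gb = params * 4 / 1e9   # ~280 GB in float32
bf16_gb = params * 2 / 1e9   # ~140 GB in bfloat16

print(f"float32 weights:  ~{fp32_gb:.0f} GB")
print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")

# 4 x A100 40GB ~= 160 GB total: too small for the float32 weights.
# 4 x A100 80GB ~= 320 GB total: fits them, with limited headroom.
```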
Actually, there is another confusion: in the latest `convert_llama_weights_to_hf.py`, `model.config.torch_dtype` is assigned `torch.float16` before `save_pretrained` is called, but after I run the script with the pre-downloaded model and check the model's `config.json` file, `torch_dtype` is still set to `bfloat16`.
`model.config.torch_dtype = torch.float16`

Maybe directly setting `model.config.torch_dtype` to a different value won't take effect in the final dumped files?
Yep, this line is a typo, I'll remove it!
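The question above can also be checked directly after conversion; a minimal sketch, using illustrative local paths (as far as I can tell, `save_pretrained` records the dtype of the actual parameters in `config.json`, so a manual assignment to `model.config.torch_dtype` gets overwritten):

```python
import json
import torch
from transformers import AutoModelForCausalLM

# "path/to/converted-llama" and "/tmp/llama-dtype-check" are illustrative paths.
model = AutoModelForCausalLM.from_pretrained("path/to/converted-llama")

model.config.torch_dtype = torch.float16   # manual override before saving
model.save_pretrained("/tmp/llama-dtype-check")

# Inspect what actually landed in config.json.
with open("/tmp/llama-dtype-check/config.json") as f:
    print(json.load(f)["torch_dtype"])
# If this prints the dtype of the saved parameters rather than "float16",
# the manual override had no effect on the dumped file.
```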
* add a warning=True tip to the Llama2 doc
* code llama needs a tip too
* doc nit
* build PR doc
* doc nits

Co-authored-by: Lysandre <[email protected]>
What does this PR do?
Add a `warning=True` tip to the Llama2 doc to make sure people are not confused.