Set prepend_bos to false by default for Bloom model family #775

degenfabian · 2024-11-08T03:24:33Z

Description

This PR addresses Issue #774. Bloom models output insensible output when the prepend_bos flag is set to True, so for usability this flag should be set to False by default for the Bloom model family. I'm happy about ideas to implement this more cleanly, because with the current implementation the flag can practically not be set to True anymore, unless it is overwritten when specifically calling a function like generate. I think that's an unpretty solution but since I can't see a use case in which it would make sense to set the flag to True I think it's not terrible, but again very happy for suggestions.

Fixes #774

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Screenshots

For Screenshots of the difference in the generation of the models please see Issue #774.

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

degenfabian · 2024-11-08T04:01:33Z

Interestingly, adding this if clause makes the acceptance test fail although it fixes the outputs being totally off. I will investigate this further tomorrow.

degenfabian · 2024-11-08T15:40:45Z

Okay I just had a look, and the previous expected loss value for the bloom-560m model was wrong and not what I got as loss when I ran the prompt with the huggingface implementation. Setting prepend_bos to False by default has brought the loss of the TransformerLens implementation much closer to the loss of the hugging face implementation, but it is still quite not there. I at the moment am not sure how to fix this, but I hope this might get fixed by some other inaccuracy tweaks across the project. For the time being I will focus on that and hope that this will be resolved by that too.

degenfabian · 2024-11-08T15:45:18Z

Interestingly, I just ran the hugging face implementation on Google Colab to see if I would get a different value than on my Mac, and indeed I did. The loss value I got from Colab now matches what we get from TransformerLens with prepend_bos set to false close enough to be within the error margin, so hooray I guess. I'm slightly confused by this, does anyone have an idea as to why I would get a different loss from the hugging face implementation on my Mac then on Google Colab?

degenfabian · 2024-11-08T18:17:26Z

I just tested this with gpt2-small, and here the output with prepend_bos set to True is not completely unusable but still deviates from Huggingface. If we just want to match HuggingFace as closely as possible, I could imagine adding this in loading_from_pretrained.py and setting the prepend_bos model accordingly for each model we load from Huggingface, which also seems like a much prettier solution then what I did so far. I would be happy to do that and rename this PR and issue, what do you think @bryce13950 ?

bryce13950 · 2024-11-11T19:57:11Z

Let's talk about this a bit to get up to speed, and figure out if we can find where the divergence is.

Fabian Degen added 3 commits November 8, 2024 04:19

fix prepend_bos to False by default for bloom model family

465f587

add comment

8300dd5

edit documentation

5878cbe

degenfabian marked this pull request as ready for review November 8, 2024 03:34

degenfabian marked this pull request as draft November 8, 2024 04:06

fix wrong expected value for bloom-560m model loss

8579529

fix expected loss value for bloom model computed with google colab

958f611

degenfabian marked this pull request as ready for review November 8, 2024 15:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set prepend_bos to false by default for Bloom model family #775

Set prepend_bos to false by default for Bloom model family #775

degenfabian commented Nov 8, 2024 •

edited

Loading

degenfabian commented Nov 8, 2024

degenfabian commented Nov 8, 2024

degenfabian commented Nov 8, 2024

degenfabian commented Nov 8, 2024 •

edited

Loading

bryce13950 commented Nov 11, 2024

Set prepend_bos to false by default for Bloom model family #775

Are you sure you want to change the base?

Set prepend_bos to false by default for Bloom model family #775

Conversation

degenfabian commented Nov 8, 2024 • edited Loading

Description

Type of change

Screenshots

Checklist:

degenfabian commented Nov 8, 2024

degenfabian commented Nov 8, 2024

degenfabian commented Nov 8, 2024

degenfabian commented Nov 8, 2024 • edited Loading

bryce13950 commented Nov 11, 2024

degenfabian commented Nov 8, 2024 •

edited

Loading

degenfabian commented Nov 8, 2024 •

edited

Loading