NanoGPT conversion did not handle the case when there were no biases in the model #629

Open
wants to merge 6 commits into dev
Conversation

@dashstander commented Jun 7, 2024

Description

convert_nanogpt_weights had two issues:

  1. It lacked the attention mask and the IGNORE tensor.
  2. It did not correctly handle the case where the nanogpt model was configured to not have biases in the linear layers.
    Because of these issues, loading the converted weights into a HookedTransformer would fail due to the missing tensors. (If the masking tensor is not supposed to be checkpointed, then there is a separate issue: HookedTransformer won't load a checkpoint without it.)

I have not added tests or rewritten documentation. There are no existing tests, and the only documentation I can find pertaining to this issue is a comment stating that the code worked both with and without biases.
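
For concreteness, here is a minimal sketch of the kind of handling described above, not the PR's actual diff. It assumes a nanoGPT-style config exposing n_embd, n_head, block_size, and bias, and state-dict keys following the TransformerLens HookedTransformer layout (blocks.{l}.attn.b_Q, blocks.{l}.attn.mask, blocks.{l}.attn.IGNORE, blocks.{l}.mlp.b_in, ...); the exact key names and shapes may differ in the real converter.

```python
import torch


def add_missing_bias_and_mask_tensors(state_dict, layer, cfg):
    """Fill in tensors a bias-free nanoGPT checkpoint never stores.

    Illustrative only: `cfg` is assumed to mirror nanoGPT's GPTConfig
    (n_embd, n_head, block_size, bias), and the key names follow the
    HookedTransformer layout as I understand it.
    """
    d_model = cfg.n_embd
    d_head = cfg.n_embd // cfg.n_head
    d_mlp = 4 * cfg.n_embd

    if not cfg.bias:
        # nanoGPT drops nn.Linear biases entirely when bias=False, so the
        # converted state dict has to supply explicit zero tensors instead.
        state_dict[f"blocks.{layer}.attn.b_Q"] = torch.zeros(cfg.n_head, d_head)
        state_dict[f"blocks.{layer}.attn.b_K"] = torch.zeros(cfg.n_head, d_head)
        state_dict[f"blocks.{layer}.attn.b_V"] = torch.zeros(cfg.n_head, d_head)
        state_dict[f"blocks.{layer}.attn.b_O"] = torch.zeros(d_model)
        state_dict[f"blocks.{layer}.mlp.b_in"] = torch.zeros(d_mlp)
        state_dict[f"blocks.{layer}.mlp.b_out"] = torch.zeros(d_model)

    # The causal mask and the IGNORE value are per-block buffers that
    # HookedTransformer expects; without them, loading the converted
    # checkpoint fails with missing-key errors (the failure described above).
    state_dict[f"blocks.{layer}.attn.mask"] = torch.tril(
        torch.ones(cfg.block_size, cfg.block_size, dtype=torch.bool)
    )
    state_dict[f"blocks.{layer}.attn.IGNORE"] = torch.tensor(float("-inf"))
```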

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@bryce13950 (Collaborator) commented
Thanks for finding this, and fixing it for everyone. There is just one type error that needs to be resolved when creating your tensor for mlp.b_in in your else block. The variable it is complaining about, d_mlp, could be None here, so there needs to be either an error thrown or a default set if it is None at this point. I don't mind which it is. In all likelihood it will be set by then, but we need to account for the possibility that it is not.
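
A hedged sketch of the guard being asked for, assuming cfg.d_mlp is Optional on the config object; the key name, error message, and the choice to raise rather than fall back to a default are illustrative, not the PR's final code:

```python
import torch


def zero_mlp_in_bias(state_dict, layer, cfg):
    # cfg.d_mlp may still be None when the no-bias branch runs, so fail with
    # a clear error instead of letting torch.zeros receive a None size.
    if cfg.d_mlp is None:
        raise ValueError("cfg.d_mlp must be set before converting mlp.b_in")
    state_dict[f"blocks.{layer}.mlp.b_in"] = torch.zeros(cfg.d_mlp)
```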
