
[Bug Report] RMSNormPre in TransformerLens may differ from the Llama source code? #657

Open
wangyifei0047 opened this issue Jul 6, 2024 · 1 comment

Comments

@wangyifei0047

In modeling_llama.py, LlamaRMSNorm's forward returns the learned weight multiplied by the scaled hidden_states, as in the screenshot below:

[screenshot: LlamaRMSNorm.forward in modeling_llama.py]
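For reference, a minimal sketch of what that forward roughly does (paraphrased from memory, not the exact HuggingFace source):

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # Scale by the reciprocal RMS of the hidden states...
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        # ...then multiply by the learned weight: the weight IS applied here.
        return self.weight * hidden_states
```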

The RMSNormPre definition in TransformerLens: it seems that this module only returns the scaled hidden_states, with no weight at all.

[screenshot: RMSNormPre.forward in TransformerLens]
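For comparison, roughly what RMSNormPre computes (simplified, hook points omitted; a sketch, not the exact TransformerLens source):

```python
import torch
import torch.nn as nn

class RMSNormPre(nn.Module):
    def __init__(self, eps=1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        # Only divide by the RMS of the residual stream; this module has
        # no learnable weight parameter, so nothing else is applied.
        scale = (x.pow(2).mean(-1, keepdim=True) + self.eps).sqrt()
        return x / scale
```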

How TransformerBlock uses RMSNormPre: it seems that in TransformerBlock's forward pass, the LlamaRMSNorm weight is never applied.

[screenshot: TransformerBlock.forward in TransformerLens]
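A toy sketch of that wiring (reusing the RMSNormPre sketch above; the real block also has hook points, key/query/value inputs, and an MLP branch, so this is schematic only, not the actual TransformerLens code):

```python
import torch
import torch.nn as nn

class ToyTransformerBlock(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.ln1 = RMSNormPre()                    # scaling only, no weight
        self.attn = nn.Linear(d_model, d_model)    # stand-in for the attention layer

    def forward(self, resid_pre: torch.Tensor) -> torch.Tensor:
        normalized_resid_pre = self.ln1(resid_pre)  # no norm weight applied here...
        attn_out = self.attn(normalized_resid_pre)  # ...it is expected to live in W_Q/W_K/W_V
        return resid_pre + attn_out
```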

I want to hook the values after RMSNorm is applied to each residual stream, so I tried to find the RMSNorm parameters and noticed something odd.

  • [x] I have checked that there is no similar issue in the repo (required)
@4gatepylon

Have you tried comparing intermediate values using hooks? It may be that the RMSNorm weights are folded into the weights of a subsequent layer.
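For example, one way to check this (a sketch; the model name is illustrative, `fold_ln=False` keeps the norm weight inside the norm module instead of folding it into the next layer, and `blocks.0.ln1.hook_normalized` is the hook point right after the norm):

```python
import torch
from transformer_lens import HookedTransformer

MODEL = "meta-llama/Llama-2-7b-hf"  # illustrative; any Llama-style checkpoint

# Default loading (fold_ln=True) folds the RMSNorm weight into the next layer's
# weight matrices, so ln1/ln2 become RMSNormPre and hold no weight themselves.
model_folded = HookedTransformer.from_pretrained(MODEL, fold_ln=True)
# fold_ln=False keeps the weight inside the norm module (RMSNorm, not RMSNormPre).
model_unfolded = HookedTransformer.from_pretrained(MODEL, fold_ln=False)

tokens = model_folded.to_tokens("Hello world")
_, cache_folded = model_folded.run_with_cache(tokens)
_, cache_unfolded = model_unfolded.run_with_cache(tokens)

# hook_normalized sits right after the norm. With folding it is only the scaled
# residual stream; without folding the learned weight has already been applied.
print(cache_folded["blocks.0.ln1.hook_normalized"][0, 0, :5])
print(cache_unfolded["blocks.0.ln1.hook_normalized"][0, 0, :5])
```

If you want the "weight-applied" values on each residual stream, loading with `fold_ln=False` and reading `hook_normalized` should give you that directly.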
