Cream/AutoFormer/model/module/qkv_super.py
Lines 72 to 77 in 4a13c40
I think there's something wrong with the way weight sharing is done here. I think this code should be:
N = weight.size(0) // 3
sample_weight = torch.cat([sample_weight[i*N:i*N+sample_out_dim//3, :] for i in range(3)], dim=0)
To make this more intuitive, I drew a schematic diagram showing how the 4-head and 5-head self-attention layers share Linear.weight.
Maybe I misunderstood the implementation here. Could you help check it?
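To make the proposed slicing concrete, here is a minimal sketch that only lists which rows of the fused weight it would select. It assumes the q, k and v blocks are stacked one after another along dim 0 of Linear.weight, which is my reading and not something confirmed by the maintainers. The helper name `blockwise_rows` and the toy sizes (head width 2, 5 heads in the super-network, 4 heads sampled) are hypothetical and chosen only for illustration.

```python
def blockwise_rows(total_out_dim, sample_out_dim):
    """Row indices kept by the proposed slicing.

    Assumes (not confirmed) that the fused weight stacks the q, k and v
    blocks along dim 0, each block having total_out_dim // 3 rows.
    """
    n = total_out_dim // 3        # rows per q/k/v block in the super weight
    keep = sample_out_dim // 3    # rows kept from each block when sampling
    rows = []
    for i in range(3):
        # Take the first `keep` rows of block i, mirroring
        # sample_weight[i*N : i*N + sample_out_dim//3, :]
        rows.extend(range(i * n, i * n + keep))
    return rows


# Hypothetical toy sizes: head width 2, 5 heads in the super-network, 4 sampled.
head_dim = 2
super_rows = 3 * 5 * head_dim     # 30 rows in the fused qkv weight
sub_rows = 3 * 4 * head_dim       # 24 rows after sampling 4 heads

rows = blockwise_rows(super_rows, sub_rows)
print(rows)
```

Under that layout assumption, the 4-head configuration reuses the leading rows of each of the q, k and v blocks of the 5-head weight, which is the sharing pattern the diagram is meant to show.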