Yes, it sorts the columns of the linear layer by their group index, so act-order matrices can be processed the same way as non-act-order matrices. At inference time, the input to the linear layer is reordered with the same permutation, and you end up with the same output: WX = P_col(W) P_row(X), or XW = P_col(X) P_row(W), depending on how you look at it. This way you only have to apply one fixed permutation to the input vector, which turns out to be much more efficient than the random access needed to load a different set of group parameters for every individual column.
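To make the identity concrete, here is a minimal pure-Python sketch. It is not exllama's actual code: the names `g_idx`, `perm`, and `matvec`, the tiny shapes, and the WX convention (columns of W index input features) are all assumptions made for illustration. It only checks that sorting the columns by group index and applying the same permutation to the input leaves the output unchanged.

```python
import random

def matvec(W, x):
    """Compute y = W x for a row-major list-of-lists W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

random.seed(0)
rows, cols, group_size = 3, 8, 4

# Small integer matrix and input so the comparison below is exact.
W = [[random.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]
x = [random.randint(-3, 3) for _ in range(cols)]

# Hypothetical act-order group index: column j uses the quantization
# parameters of group g_idx[j], so groups are scattered across columns.
g_idx = [1, 0, 1, 0, 0, 1, 1, 0]

# One fixed permutation that sorts the column indices by group (stable sort).
perm = sorted(range(cols), key=lambda j: g_idx[j])

# Done once, offline: reorder the columns of W.
W_sorted = [[row[j] for j in perm] for row in W]
g_sorted = [g_idx[j] for j in perm]

# Done at inference time: reorder the input with the same permutation.
x_perm = [x[j] for j in perm]

y_ref = matvec(W, x)                # original column order
y_perm = matvec(W_sorted, x_perm)   # sorted columns, permuted input

print(y_ref == y_perm)  # the two outputs match
print(g_sorted)         # groups are now contiguous
```

After the sort, column j belongs to group j // group_size, so a kernel can walk the groups sequentially, and the only extra work per forward pass is gathering the input through `perm`.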
If the weights are reordered so that group_idx is sequential across the columns, the activation order seems to be affected. So does exllama also reorder the activations during inference?