This repository has been archived by the owner on Mar 20, 2024. It is now read-only.
The example sequence provided for simulating a 'vdecompress' instruction looks like it would produce incorrect results under certain configurations.
From what I can gather, with SEW=8 the viota.m results wrap around after 255, and vrgather.vv can only index the first 256 elements (here, bytes) of the source vector. The provided sequence therefore only works for SEW>8, or if VLMAX has been explicitly checked and found to be <=256.
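To make the failure mode concrete, here is a minimal Python model of the two-instruction sequence. It is only a sketch: `viota`, `vrgather_masked`, and `decompress` are hypothetical helper names, the wrap-around of `viota` at the index width is the behaviour assumed in this issue rather than something verified on hardware, and VL=300 is an arbitrary value chosen to exceed 256.

```python
def viota(mask, index_bits):
    """Model of viota.m: element i receives the number of set mask bits
    below i, truncated to index_bits (the wrap-around assumed in this issue)."""
    result, count = [], 0
    for bit in mask:
        result.append(count % (1 << index_bits))
        count += bit
    return result


def vrgather_masked(dest, src, index, mask):
    """Model of a masked vrgather: active elements read src[index[i]]
    (0 if the index is out of range); inactive elements keep dest[i]."""
    vlmax = len(src)
    return [
        (src[idx] if idx < vlmax else 0) if active else old
        for old, idx, active in zip(dest, index, mask)
    ]


def decompress(packed, dest, mask, index_bits):
    """The two-step sequence: viota, then masked vrgather."""
    return vrgather_masked(dest, packed, viota(mask, index_bits), mask)


vl = 300                                # hypothetical VL > 256 at SEW=8
mask = [1] * vl                         # every element active
packed = [i % 251 for i in range(vl)]   # arbitrary byte values
dest = [0] * vl

with_8bit = decompress(packed, dest, mask, 8)
with_16bit = decompress(packed, dest, mask, 16)

print(with_8bit[:256] == packed[:256])  # do the first 256 elements agree?
print(with_8bit[256], packed[256])      # element 256 under 8-bit indices vs. intended value
print(with_16bit == packed)             # does the 16-bit index variant match everywhere?
```

In this model the 8-bit variant matches the intended result only for the first 256 elements; from element 256 onward the index has wrapped back to 0.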
A length-agnostic approach for SEW=8 might be: switch to SEW=16 with doubled LMUL, run viota, switch back to SEW=8, then use vrgatherei16 in place of vrgather. This only works for LMUL<=4, because the 16-bit index operand needs twice the LMUL of the data, but LMUL is user-controllable.
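The LMUL<=4 limit follows from simple arithmetic, sketched below in Python. The helper names `index_emul` and `vlmax` are hypothetical, and the sketch assumes the usual rules that an operand's EMUL is (EEW / SEW) * LMUL, that EMUL may not exceed 8, and that VLEN is at most 65536 bits.

```python
from fractions import Fraction

MAX_VLEN_BITS = 65536  # assumed upper bound on VLEN


def index_emul(sew, lmul, index_eew=16):
    """EMUL of the index operand: (index EEW / SEW) * LMUL."""
    return Fraction(index_eew, sew) * Fraction(lmul)


def vlmax(vlen, sew, lmul):
    """Number of elements in a vector register group: LMUL * VLEN / SEW."""
    return int(Fraction(lmul) * vlen / sew)


# Integer LMUL settings for which 16-bit indices fit within EMUL <= 8 at SEW=8.
supported = [lmul for lmul in (1, 2, 4, 8) if index_emul(8, lmul) <= 8]
print(supported)

# Largest element count reachable at SEW=8, LMUL=4, compared with the
# number of distinct values a 16-bit index can hold.
print(vlmax(MAX_VLEN_BITS, 8, 4), 1 << 16)
```

Under these assumptions a 16-bit index can address every element at SEW=8 and LMUL<=4 for any permitted VLEN, which is what makes the approach length-agnostic.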
If my understanding is correct, I recommend clarifying this in the documentation.