Provided vdecompress example may not work with SEW=8, VLMAX>256 #893

zingaburga · 2023-06-22T00:04:43Z

The provided example to simulate a 'vdecompress' instruction seems like it'd behave unexpectedly under certain configurations.

From what I can gather, with SEW=8, viota.m results wrap around after 255, and vrgather.vv indices can only address the first 256 elements of the source vector group. Thus the provided sequence will only work for SEW>8, or if VLMAX has been explicitly checked and found to be <=256.
A length-agnostic approach for SEW=8 might be: change to SEW=16 with double the LMUL, run viota.m, change back to SEW=8, then use vrgatherei16.vv (this only works for LMUL<=4, since the 16-bit index vector needs EMUL=2*LMUL, but LMUL is user-controllable).
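
To illustrate the wrap-around, here is a minimal Python sketch that models the behaviour rather than running real vector code. The helpers `viota`, `vrgather` and `vdecompress` are hypothetical names written for this illustration, and the model assumes the masked gather leaves inactive destination elements undisturbed. It compares 8-bit indices against 16-bit indices on a 300-element vector, which stands in for a SEW=8 configuration with VLMAX>256.

```python
def viota(mask, index_bits):
    """Model of viota.m: element i receives the number of set mask bits
    before position i, truncated to index_bits bits."""
    out, count = [], 0
    for bit in mask:
        out.append(count % (1 << index_bits))
        count += bit
    return out


def vrgather(src, idx):
    """Model of vrgather: out[i] = src[idx[i]], or 0 if idx[i] is out of range."""
    return [src[i] if i < len(src) else 0 for i in idx]


def vdecompress(src, mask, dest, index_bits):
    """Model of the viota + masked vrgather sequence.
    Inactive elements keep their value from dest (assumed undisturbed)."""
    idx = viota(mask, index_bits)
    gathered = vrgather(src, idx)
    return [g if m else d for g, m, d in zip(gathered, mask, dest)]


# 300 byte-sized elements, i.e. more than 256 elements at SEW=8.
vl = 300
src = [i % 251 for i in range(vl)]   # packed data, each value fits in 8 bits
mask = [1] * vl                      # all elements active, so result should equal src
dest = [0] * vl

wrong = vdecompress(src, mask, dest, index_bits=8)    # indices wrap after 255
right = vdecompress(src, mask, dest, index_bits=16)   # vrgatherei16-style indices

print("8-bit indices correct: ", wrong == src)
print("16-bit indices correct:", right == src)
```

In this model the first 256 results agree in both cases, and element 256 onward reads from the start of the source again when the indices are only 8 bits wide.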

If my understanding is correct, I recommend clarifying this in the documentation.
