[proposal] Add vector intrinsics for loading into lane 0 and setting other lanes to 0 #269

Open
rsandifo-arm opened this issue Jul 20, 2023 · 1 comment

@rsandifo-arm
Contributor

It would be useful to have vector intrinsics that load lane 0 from memory and set the other elements to zero. E.g.:

  • int8x16_t vfoo_s8(const int8_t *) → LDR Bn, [Xn]
  • int16x8_t vfoo_s16(const int16_t *) → LDR Hn, [Xn]
  • ….

The same thing would work for SVE.

GCC does at least optimise something like:

#include <arm_neon.h>

float32x2_t f(float32_t *ptr)
{
    float32x2_t vec = {};
    vec = vld1_lane_f32(ptr, vec, 0);
    vec = vld1_lane_f32(ptr + 2, vec, 1);
    return vec;
}

to:

        ldr     s0, [x0], 8
        ld1     {v0.s}[1], [x0]
        ret

and LLVM behaves similarly, but that seems a bit indirect.
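
For comparison, here is a minimal sketch of how the first proposed intrinsic could be approximated today with existing intrinsics. The helper names `load_lane0_zero_s8` and `load_lane0_zero_s8_ref` are hypothetical and not part of ACLE; `vfoo_s8` is only the placeholder name used above. The NEON version is compiled only when `__ARM_NEON` is defined, and the scalar function merely spells out the lane contents I would expect. Whether a compiler folds the NEON version into a single `LDR Bn` is an assumption, not something this sketch guarantees.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical stand-in for the proposed vfoo_s8: insert *ptr into
 * lane 0 of an all-zero vector, using intrinsics that already exist. */
static inline int8x16_t load_lane0_zero_s8(const int8_t *ptr)
{
    return vld1q_lane_s8(ptr, vdupq_n_s8(0), 0);
}
#endif

/* Portable scalar model of the expected result:
 * out[0] holds the loaded element, out[1..15] are zero. */
static inline void load_lane0_zero_s8_ref(const int8_t *ptr, int8_t out[16])
{
    memset(out, 0, 16);
    out[0] = *ptr;
}
```

The point of a dedicated intrinsic would be to state this intent directly instead of relying on the compiler to recognise the zero-then-insert pattern.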

@vhscampos
Member

Hi, thanks for the issue report.
If possible, we encourage you to contribute a pull request that addresses this issue. We will be happy to review it.
