This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

i64x2.eq instruction #381

Merged
merged 1 commit into from
Jan 30, 2021

Conversation

Maratyszcza
Contributor

@Maratyszcza Maratyszcza commented Oct 9, 2020

Introduction

This is a proposal to add a 64-bit variant of the existing eq instruction. ARM64 and x86 (since SSE4.1) natively support this instruction, and on ARMv7 NEON and SSE2 it can be efficiently emulated with 3-4 instructions.
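
As a rough illustration of what the proposed instruction computes, here is a scalar C model. It assumes the usual WebAssembly SIMD comparison convention (a lane is all ones when the comparison holds and all zeros otherwise); the type `v128_i64x2` and the function `i64x2_eq_ref` are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Two 64-bit lanes standing in for a 128-bit vector (illustrative type). */
typedef struct { uint64_t lane[2]; } v128_i64x2;

/* Scalar model of i64x2.eq: each result lane is all ones if the
 * corresponding input lanes are equal, and zero otherwise. */
static v128_i64x2 i64x2_eq_ref(v128_i64x2 a, v128_i64x2 b) {
    v128_i64x2 y;
    for (int i = 0; i < 2; i++) {
        y.lane[i] = (a.lane[i] == b.lane[i]) ? UINT64_MAX : 0;
    }
    return y;
}
```

Lanes that differ only in their upper 32 bits compare as unequal, which is the case the 32-bit emulation sequences have to handle.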

Applications

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instruction can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX instruction set

  • i64x2.eq
    • y = i64x2.eq(a, b) is lowered to VPCMPEQQ xmm_y, xmm_a, xmm_b

x86/x86-64 processors with SSE4.1 instruction set

  • i64x2.eq
    • y = i64x2.eq(a, b) is lowered to MOVDQA xmm_y, xmm_a + PCMPEQQ xmm_y, xmm_b

x86/x86-64 processors with SSE2 instruction set

  • i64x2.eq
    • y = i64x2.eq(a, b) is lowered to:
      • MOVDQA xmm_y, xmm_a
      • PCMPEQD xmm_y, xmm_b
      • PSHUFD xmm_tmp, xmm_y, 0xB1
      • PAND xmm_y, xmm_tmp
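
The four-instruction SSE2 sequence above can be sketched as a portable C model, one loop per instruction. This is an illustration of the lane logic only, not real intrinsics code; the type `xmm_i32x4` and the function `i64x2_eq_sse2_model` are hypothetical names. Lanes 0-1 form the low 64-bit element and lanes 2-3 the high one.

```c
#include <stdint.h>

/* Four 32-bit lanes standing in for an XMM register (illustrative type). */
typedef struct { uint32_t lane[4]; } xmm_i32x4;

/* Model of the SSE2 lowering of i64x2.eq, one step per instruction. */
static xmm_i32x4 i64x2_eq_sse2_model(xmm_i32x4 a, xmm_i32x4 b) {
    xmm_i32x4 y = a;                 /* MOVDQA xmm_y, xmm_a */
    xmm_i32x4 tmp;

    /* PCMPEQD xmm_y, xmm_b: compare each 32-bit half independently. */
    for (int i = 0; i < 4; i++)
        y.lane[i] = (y.lane[i] == b.lane[i]) ? UINT32_MAX : 0;

    /* PSHUFD xmm_tmp, xmm_y, 0xB1: swap lanes 0<->1 and 2<->3, so each
     * half sees the comparison result of its partner half. */
    for (int i = 0; i < 4; i++)
        tmp.lane[i] = y.lane[i ^ 1];

    /* PAND xmm_y, xmm_tmp: a 64-bit element is equal only if both of
     * its 32-bit halves compared equal. */
    for (int i = 0; i < 4; i++)
        y.lane[i] &= tmp.lane[i];

    return y;
}
```

The same compare, swap-halves, AND structure appears in the ARMv7 NEON lowering, with VREV64.32 playing the role of the shuffle.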

ARM64 processors

  • i64x2.eq
    • y = i64x2.eq(a, b) is lowered to CMEQ Vy.2D, Va.2D, Vb.2D

ARMv7 processors with NEON instruction set

  • i64x2.eq
    • y = i64x2.eq(a, b) is lowered to:
      • VCEQ.I32 Qy, Qa, Qb
      • VREV64.32 Qtmp, Qy
      • VAND Qy, Qtmp

@ngzhian
Member

ngzhian commented Oct 9, 2020

Any reason why we specifically want only i64x2.eq?
In #101 we decided that the set of i64x2 instructions we would keep did not include any of the comparisons.

@Maratyszcza
Contributor Author

@ngzhian Of course, I'd rather have a full set of compare instructions, but ordered comparisons are hard to emulate in the absence of hardware support. On the other hand, emulating a 64-bit equality compare is trivial, and it is natively supported in our baseline ISAs (SSE4.1 and ARM64 NEON).

@ngzhian
Member

ngzhian commented Oct 9, 2020

It looks incomplete that we only have i64x2.eq, and no other i64x2 comparisons.

How useful will it be to add only this instruction? Are there use cases that adding this instruction alone is sufficient to unlock?

@Maratyszcza
Contributor Author

I don't have any use-cases in mind, just trying to orthogonalize the instruction set.

@lemaitre

lemaitre commented Oct 9, 2020

I have no code to present, but a use case for that is when vectorizing code that mixes doubles and integers: in order to limit the number of shuffles (going back and forth to 32-bit elements), one would use 64-bit integers.
There, it would be nice to have an i64x2.eq.
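
A minimal scalar sketch of that kind of mixed code, written in C for illustration. It assumes a loop where a 64-bit integer key decides which of two doubles to keep, so the equality mask has the same lane width as the doubles and can drive a bitwise select directly. The function name `select_f64x2_by_i64x2_eq` and its parameters are hypothetical and not taken from any real codebase.

```c
#include <stdint.h>
#include <string.h>

/* For each of two lanes: keep x[i] where key[i] == wanted[i],
 * otherwise keep fallback[i]. Hypothetical helper for illustration. */
static void select_f64x2_by_i64x2_eq(const int64_t key[2], const int64_t wanted[2],
                                     const double x[2], const double fallback[2],
                                     double out[2]) {
    for (int i = 0; i < 2; i++) {
        /* Models one lane of i64x2.eq: all ones if equal, else zero. */
        uint64_t mask = (key[i] == wanted[i]) ? UINT64_MAX : 0;

        /* Reinterpret the doubles as 64-bit patterns without aliasing issues. */
        uint64_t xb, fb, rb;
        memcpy(&xb, &x[i], sizeof xb);
        memcpy(&fb, &fallback[i], sizeof fb);

        /* Models one lane of v128.bitselect driven by the 64-bit mask. */
        rb = (xb & mask) | (fb & ~mask);
        memcpy(&out[i], &rb, sizeof rb);
    }
}
```

Because the mask lanes and the double lanes are both 64 bits wide, no narrowing or widening shuffle is needed between the comparison and the select.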

tlively added a commit to llvm/llvm-project that referenced this pull request Oct 30, 2020
As proposed in WebAssembly/simd#381. Since it is still
in the prototyping phase, it is only accessible via a target builtin function
and a target intrinsic.

Depends on D90504.

Differential Revision: https://reviews.llvm.org/D90508
@tlively
Member

tlively commented Oct 30, 2020

This has been prototyped in LLVM (but not Binaryen) as __builtin_wasm_eq_i64x2. It should be usable from tot Emscripten in a few hours as long as you don't use optimization flags at link time.

@omnisip

omnisip commented Dec 7, 2020

> Any reason why we specifically want only i64x2.eq?
> In #101 we decided that the set of i64x2 instructions we would keep did not include any of the comparisons.

Ditto on the same question. When the pcmpgtq case was posted to Stack Overflow, a response was provided that produced a high-quality result for both SSE2 and ARMv7+NEON.

tlively added a commit to tlively/binaryen that referenced this pull request Dec 11, 2020
tlively added a commit to WebAssembly/binaryen that referenced this pull request Dec 12, 2020
@Maratyszcza Maratyszcza mentioned this pull request Dec 23, 2020
@Maratyszcza
Contributor Author

Added examples of applications

@abrown
Contributor

abrown commented Jan 11, 2021

I actually think this would be nice to add if it didn't have orthogonality implications. Could we just merge this without i64x2.ne and the i64x2 comparisons?

Member

@dtig dtig left a comment


Approved for merge as of #419.

@tlively tlively merged commit c8c0de2 into WebAssembly:master Jan 30, 2021
ngzhian added a commit to ngzhian/simd that referenced this pull request Feb 2, 2021
These instructions were added in WebAssembly#381 and WebAssembly#411 respectively.

The binary opcodes for these are still not finalized, I'm using what V8
is using for now.
ngzhian added a commit that referenced this pull request Feb 3, 2021
These instructions were added in #381 and #411 respectively.

The binary opcodes for these are still not finalized, I'm using what V8
is using for now.
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Mar 25, 2021
7 participants