This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

i64x2.ne instruction #411

Merged
merged 2 commits into from
Jan 30, 2021

Maratyszcza
Contributor

Introduction

This is a proposal to add a 64-bit variant of the existing ne instruction. It is motivated by the proposal to add a 64-bit variant of the eq instruction in #381 and by the decision on #351 to keep the ne instructions. The only instruction set that supports this instruction natively is AMD XOP, but on ARM64 and x86 (since SSE4.1) the lowering is no worse than for the other ne forms.
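For reference, the lane-wise behaviour being proposed can be modelled in a few lines of Python. This is only an illustrative sketch: the function names (`i64x2_ne`, `i64x2_eq`, `v128_not`) are hypothetical helpers, and a vector is represented as a list of two integers. It assumes the usual WebAssembly SIMD convention that a comparison yields an all-ones lane for true and an all-zeros lane for false.

```python
MASK64 = (1 << 64) - 1  # all-ones 64-bit lane, the "true" result of a comparison


def i64x2_eq(a, b):
    """Model of i64x2.eq: all ones where the 64-bit lanes match, else zero."""
    return [MASK64 if (x & MASK64) == (y & MASK64) else 0 for x, y in zip(a, b)]


def v128_not(v):
    """Bitwise NOT of every lane."""
    return [lane ^ MASK64 for lane in v]


def i64x2_ne(a, b):
    """Model of i64x2.ne: all ones where the 64-bit lanes differ, else zero."""
    return [MASK64 if (x & MASK64) != (y & MASK64) else 0 for x, y in zip(a, b)]


a, b = [1, 2], [1, 3]
result = i64x2_ne(a, b)  # lane 0 is equal, lane 1 differs
```

Most of the lowerings in this proposal rely on the identity `i64x2_ne(a, b) == v128_not(i64x2_eq(a, b))`: an equality compare followed by a bitwise inversion.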

Mapping to Common Instruction Sets

This section illustrates how the new WebAssembly instruction can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.

x86/x86-64 processors with AVX512F and AVX512VL instruction sets

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to VPCMPEQQ xmm_y, xmm_a, xmm_b + VPTERNLOGQ xmm_y, xmm_y, xmm_y, 0x55
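The immediate 0x55 acts as a bitwise NOT here, which can be checked with a small Python model of the ternary-logic operation. This is a sketch under stated assumptions: `vpternlog` is a hypothetical helper operating on plain integers, and it follows the documented rule that, for each bit position, the bits of the destination, second and third operands form a 3-bit index into the 8-bit truth table given by the immediate.

```python
MASK64 = (1 << 64) - 1


def vpternlog(a, b, c, imm8, width=64):
    """Bitwise ternary logic on integers.

    For every bit position i, the bits (a_i, b_i, c_i) form the index
    (a_i << 2) | (b_i << 1) | c_i, and the result bit is imm8[index].
    """
    result = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> idx) & 1) << i
    return result


# With all three operands equal, only table entries 0 and 7 are ever used.
# 0x55 = 0b01010101 has bit 0 set and bit 7 clear, so every bit is inverted.
y = 0x0123456789ABCDEF
inverted = vpternlog(y, y, y, 0x55)
```

More generally, 0x55 selects the table entries whose lowest index bit is zero, so it computes the NOT of the third operand regardless of the other two. This avoids loading an all-ones constant from memory, as the AVX and SSE sequences do.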

x86/x86-64 processors with XOP instruction set

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to VPCOMNEQQ xmm_y, xmm_a, xmm_b

x86/x86-64 processors with AVX instruction set

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to VPCMPEQQ xmm_y, xmm_a, xmm_b + VPXOR xmm_y, xmm_y, [wasm_i64x2_splat(-1)]

x86/x86-64 processors with SSE4.1 instruction set

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to:
      • MOVDQA xmm_y, xmm_a
      • PCMPEQQ xmm_y, xmm_b
      • PXOR xmm_y, [wasm_i64x2_splat(-1)]

x86/x86-64 processors with SSE2 instruction set

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to:
      • MOVDQA xmm_y, xmm_a
      • PCMPEQD xmm_y, xmm_b
      • PSHUFD xmm_tmp, xmm_y, 0xB1
      • PAND xmm_y, xmm_tmp
      • PXOR xmm_y, [wasm_i64x2_splat(-1)]
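SSE2 has no 64-bit equality compare, so the sequence above builds one from 32-bit compares: a 64-bit lane is equal only if both of its 32-bit halves are equal. The Python sketch below models each step on lists of integers. All helper names are hypothetical, and the vector is assumed to be laid out with the low 32-bit half of each 64-bit lane first.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_i32x4(v):
    """Split two 64-bit lanes into four 32-bit lanes, low half first."""
    out = []
    for lane in v:
        lane &= MASK64
        out += [lane & MASK32, lane >> 32]
    return out


def to_i64x2(w):
    """Join four 32-bit lanes back into two 64-bit lanes."""
    return [w[0] | (w[1] << 32), w[2] | (w[3] << 32)]


def pcmpeqd(x, y):
    """32-bit lane-wise equality: all ones if equal, else zero."""
    return [MASK32 if p == q else 0 for p, q in zip(x, y)]


def pshufd(x, imm8):
    """Result lane i is the source lane selected by bits 2i+1..2i of imm8."""
    return [x[(imm8 >> (2 * i)) & 3] for i in range(4)]


def i64x2_ne_sse2(a, b):
    y = pcmpeqd(to_i32x4(a), to_i32x4(b))  # PCMPEQD: compare 32-bit halves
    tmp = pshufd(y, 0xB1)                  # PSHUFD: swap halves within each 64-bit lane
    y = [p & q for p, q in zip(y, tmp)]    # PAND: equal only if both halves are equal
    y = [p ^ MASK32 for p in y]            # PXOR with all ones: eq becomes ne
    return to_i64x2(y)


# Lane 0 differs only in its high half; lane 1 is identical.
result = i64x2_ne_sse2([0x100000005, 7], [0x200000005, 7])
```

The shuffle constant 0xB1 maps lanes [0, 1, 2, 3] to [1, 0, 3, 2], so after the PAND both halves of each 64-bit lane hold the same combined result.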

ARM64 processors

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to CMEQ Vy.2D, Va.2D, Vb.2D + MVN Vy.16B, Vy.16B

ARMv7 processors with NEON instruction set

  • i64x2.ne
    • y = i64x2.ne(a, b) is lowered to:
      • VCEQ.I32 Qy, Qa, Qb
      • VREV64.32 Qtmp, Qy
      • VAND Qy, Qtmp
      • VMVN Qy, Qy

@abrown
Contributor

abrown commented Jan 11, 2021

I was in favor of #351 (removing ne altogether) so I'm less favorably disposed to this one. I think the main argument for adding it is orthogonality, right? And I have felt that we should be putting more weight on performance implications than orthogonality.

Member

@dtig dtig left a comment

This is approved for merge as of #419

@tlively tlively merged commit 394330d into WebAssembly:master Jan 30, 2021
ngzhian added a commit to ngzhian/simd that referenced this pull request Feb 2, 2021
These instructions were added in WebAssembly#381 and WebAssembly#411 respectively.

The binary opcodes for these are still not finalized, I'm using what V8
is using for now.
ngzhian added a commit that referenced this pull request Feb 3, 2021
These instructions were added in #381 and #411 respectively.

The binary opcodes for these are still not finalized, I'm using what V8
is using for now.