[RyuJit/x64] Generates extra movsxd when using a byte value as an address offset #8465
Likely, but unfortunately it's not trivial. Making the memory load produce a 64-bit value directly is problematic due to the way loads are represented in the JIT. But it may be possible to turn the cast that follows it into a no-op because the register ( As a workaround you can cast the
The
Right, it generates
@dotnet/jit-contrib
Looks like we should be able to handle this by making the cast operand contained (when it's an indir). Codegen needs a bit of work; it doesn't seem possible to handle this right now.
@RussKeldorph I'm working on a fix in dotnet/coreclr#12676, you can assign it to me.
As far as I can tell this also affects ARM64:

```csharp
static long Test(ref sbyte x) => (long)x;
```

generates

```
G_M55886_IG01:
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
G_M55886_IG02:
        39800000          ldrsb   x0, [x0]
        93407C00          sxtw    x0, x0
G_M55886_IG03:
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
```

@sdmaclea The
@mikedn Correct, the `sxtw` is redundant.
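Why the `sxtw` is a no-op: `ldrsb` with an `x` destination already sign-extends the byte to the full 64 bits, so bit 31 is just another copy of the sign bit and re-extending from bit 31 cannot change anything. The sketch below is a minimal C model under that assumption; the function names (`model_ldrsb_x`, `model_sxtw`, `sxtw_after_ldrsb_is_noop`) are hypothetical helpers, not JIT or ARM APIs. It checks all 256 possible byte values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `ldrsb x0, [x0]`: load a signed byte and
   sign-extend it directly to 64 bits. */
int64_t model_ldrsb_x(int8_t byte) {
    return (int64_t)byte;
}

/* Hypothetical model of `sxtw x0, w0`: take the low 32 bits of the
   register and sign-extend them to 64 bits. Written without
   implementation-defined signed conversions. */
int64_t model_sxtw(int64_t reg) {
    uint32_t low = (uint32_t)(uint64_t)reg;
    if (low & 0x80000000u) {
        /* Negative in 32-bit two's complement: magnitude is ~low + 1. */
        uint32_t magnitude = ~low + 1u;
        return -(int64_t)magnitude;
    }
    return (int64_t)low;
}

/* Returns true if applying sxtw after ldrsb leaves every one of the
   256 possible byte values unchanged, i.e. the sxtw is a no-op. */
bool sxtw_after_ldrsb_is_noop(void) {
    for (int v = INT8_MIN; v <= INT8_MAX; v++) {
        int64_t loaded = model_ldrsb_x((int8_t)v);
        if (model_sxtw(loaded) != loaded) {
            return false;
        }
    }
    return true;
}
```

`sxtw_after_ldrsb_is_noop()` returning true is exactly the condition under which codegen could drop the `sxtw` after a sign-extending byte load.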
I've just noticed similar redundant instructions when narrowing (u)int->(u)short as well. Here, the i0-i3 variables are

```
235:                 dest[0] = (ushort)i0;
00007FFC74A01C41  movzx       r9d,r9w
00007FFC74A01C45  mov         word ptr [r8],r9w
236:                 dest[1] = (ushort)i1;
00007FFC74A01C4A  movzx       r9d,r10w
00007FFC74A01C4E  mov         word ptr [r8+2],r9w
237:                 dest[2] = (ushort)i2;
00007FFC74A01C53  movzx       r9d,r11w
00007FFC74A01C57  mov         word ptr [r8+4],r9w
```

Would this be the same issue?
Nope, different issue. What is
BTW, after running into I don't know how many bugs while working on this issue, I'm finally back on it. Hopefully I'll fix it this month.
Sorry, that was a false alarm. It repros under netcoreapp2.0 but not under netcoreapp2.1 or current master. |
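For context on why the `movzx` in the netcoreapp2.0 narrowing listing is redundant: the following `mov word ptr [...], r9w` stores only the low 16 bits of the register, and `movzx r9d, r9w` never changes those bits. A minimal C model under that assumption, with hypothetical helper names (`model_movzx_r32_r16`, `model_store16`, `movzx_before_store16_is_noop`) that are not part of any real API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of `movzx r9d, r9w`: keep the low 16 bits of the
   register and clear everything above them. */
uint32_t model_movzx_r32_r16(uint32_t reg) {
    return reg & 0xFFFFu;
}

/* Hypothetical model of `mov word ptr [r8], r9w`: a 16-bit store writes
   only the low 16 bits of the register to memory. */
void model_store16(uint16_t *dest, uint32_t reg) {
    *dest = (uint16_t)(reg & 0xFFFFu);
}

/* Returns true if the stored value is the same with and without the
   preceding movzx, i.e. the movzx is a no-op for this store. */
bool movzx_before_store16_is_noop(uint32_t reg) {
    uint16_t with_movzx = 0;
    uint16_t without_movzx = 0;
    model_store16(&with_movzx, model_movzx_r32_r16(reg));
    model_store16(&without_movzx, reg);
    return with_movzx == without_movzx;
}
```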
Looks fixed today:

```
; Assembly listing for method Program:LUTMap(ulong,ulong,ulong,int) (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] ( 11, 32   )    long  ->  rcx
;  V01 arg1         [V01,T01] (  8, 26   )    long  ->  rdx
;  V02 arg2         [V02,T02] (  6, 18   )    long  ->   r8         single-def
;  V03 arg3         [V03,T04] (  3,  3   )     int  ->   r9         single-def
;  V04 loc0         [V04,T03] (  3,  6   )    long  ->  rax         single-def
;# V05 OutArgs      [V05    ] (  1,  1   )  struct ( 0) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;
; Lcl frame size = 0

G_M55172_IG01:        ;; offset=0x0000
                                                ;; size=0 bbWeight=1 PerfScore 0.00
G_M55172_IG02:        ;; offset=0x0000
       movsxd   rax, r9d
       lea      rax, [rcx+rax-0x04]
       cmp      rcx, rax
       ja       SHORT G_M55172_IG04
       align    [0 bytes for IG03]
                                                ;; size=13 bbWeight=1 PerfScore 2.50
G_M55172_IG03:        ;; offset=0x000D
       movzx    r10, byte  ptr [rcx]
       vmovss   xmm0, dword ptr [r8+4*r10]
       vmovss   dword ptr [rdx], xmm0
       movzx    r10, byte  ptr [rcx+0x01]
       vmovss   xmm0, dword ptr [r8+4*r10]
       vmovss   dword ptr [rdx+0x04], xmm0
       movzx    r10, byte  ptr [rcx+0x02]
       vmovss   xmm0, dword ptr [r8+4*r10]
       vmovss   dword ptr [rdx+0x08], xmm0
       movzx    r10, byte  ptr [rcx+0x03]
       vmovss   xmm0, dword ptr [r8+4*r10]
       vmovss   dword ptr [rdx+0x0C], xmm0
       add      rcx, 4
       add      rdx, 16
       cmp      rcx, rax
       jbe      SHORT G_M55172_IG03
                                                ;; size=75 bbWeight=4 PerfScore 135.00
G_M55172_IG04:        ;; offset=0x0058
       ret
                                                ;; size=1 bbWeight=1 PerfScore 1.00
```
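For readers who want the loop in `G_M55172_IG03` in source form, here is a hypothetical C sketch that mirrors it; it is not the issue author's C# `LUTMap` source. It assumes the Windows x64 argument registers seen in the listing: `src` (rcx) holds the bytes, `dst` (rdx) receives floats, `lut` (r8) is a 256-entry float table, and `len` (r9d) is the byte count. The name `lut_map` is illustrative only.

```c
#include <stdint.h>

/* Hypothetical sketch mirroring the RyuJIT loop above, not the original
   C# LUTMap. Like the listing, it handles 4 bytes per iteration while at
   least 4 remain and has no tail loop, so trailing bytes stay unmapped. */
void lut_map(const uint8_t *src, float *dst, const float *lut, int len) {
    for (int i = 0; i + 4 <= len; i += 4) {
        /* Each byte is zero-extended and used directly as the table index;
           in the listing this is `movzx r10, byte ptr [...]` followed by a
           load from [r8+4*r10], with no movsxd in between. */
        dst[i + 0] = lut[src[i + 0]];
        dst[i + 1] = lut[src[i + 1]];
        dst[i + 2] = lut[src[i + 2]];
        dst[i + 3] = lut[src[i + 3]];
    }
}
```

The `i + 4 <= len` condition corresponds to the listing's `lea rax, [rcx+rax-0x04]` end pointer with the unsigned `ja`/`jbe` comparisons.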
In image processing, it's common to map byte values (8-bit-per-channel pixels) through a lookup table. RyuJit x64 apparently uses a 32-bit `movzx` to read byte values, then widens using `movsxd` before using them as address offsets.

Here's a quick sample program demonstrating the issue:

RyuJit x64 generates the following for the `LUTMap` method:

Can it be modified to use the 64-bit `movzx` in cases like this?

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium
impact:small
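Why the `movsxd` is redundant here: on x64, any write to a 32-bit register clears bits 32-63, so after a 32-bit `movzx` from a byte the full 64-bit register already holds the correct index. Bit 31 is always 0, so sign-extending from it with `movsxd` cannot change the value. The C model below checks this for all 256 byte values; `model_movzx_r32`, `model_movsxd` and `model_movzx_r64` are hypothetical helper names, not real APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 32-bit `movzx eax, byte ptr [...]`: the byte is
   zero-extended to 32 bits, and because x64 clears bits 32-63 on a 32-bit
   register write, the 64-bit register holds the same value. */
uint64_t model_movzx_r32(uint8_t byte) {
    uint32_t low = byte;
    return (uint64_t)low;
}

/* Hypothetical model of `movsxd rax, eax`: sign-extend the low 32 bits
   of the register to 64 bits. */
uint64_t model_movsxd(uint64_t reg) {
    uint64_t low = reg & 0xFFFFFFFFULL;
    return (low & 0x80000000ULL) ? (low | 0xFFFFFFFF00000000ULL) : low;
}

/* Hypothetical model of the 64-bit `movzx rax, byte ptr [...]`. */
uint64_t model_movzx_r64(uint8_t byte) {
    return (uint64_t)byte;
}

/* Returns true if, for all 256 byte values, the movsxd leaves the movzx
   result unchanged and that result equals the 64-bit movzx. */
bool movsxd_after_movzx_is_noop(void) {
    for (int v = 0; v <= UINT8_MAX; v++) {
        uint64_t reg = model_movzx_r32((uint8_t)v);
        if (model_movsxd(reg) != reg || reg != model_movzx_r64((uint8_t)v)) {
            return false;
        }
    }
    return true;
}
```

Because both `movzx` forms leave identical 64-bit values, the 32-bit form plus dropping the `movsxd` is enough.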