Remove netfx-specific code from projects that no longer build in netfx configurations #27114

TimLovellSmith · 2018-08-11T00:52:01Z

TLDR; basically I think there are a bunch of small tweaks to the code which could make this faster, along with the 'bigger' tweaks (risks) of using unsafe code, and really baking little-endian assumptions into the implementation, which I am thinking of doing a pull request for - high level QN: does it sound useful?

I saw a thread today talking about System.Compression.Crc32Helper using relatively a lot of CPU. Idly I wondered how this could be possible, so I copied the code from this repo into a project, came up with a dumb synthetic benchmark where I did millions of CRC32 computation on an array of length 12 for a few seconds, and looked at what the VS 2017 profiler (for .net framework, AnyCPU, release mode) told me the line costs where.

What I found was that the hottest line was the inner statement of the latter loop:

             for (int i = 0; i < endBytes; i++)
            {
                crc32 = s_crcTable_0[(crc32 ^ buffer[offset++]) & 0x000000FF] ^ (crc32 >> 8);
            }

Which for me was compiled (JITted?) as native code:

           for (int i = 0; i < endBytes; i++)
00E108A8  cmp         dword ptr [ebp-14h],0  
00E108AC  jle         00E108EF  
00E108AE  mov         eax,dword ptr [ebp-28h]  
00E108B1  mov         esi,dword ptr [eax+4]  
            {
                crc32 = s_crcTable_0[(crc32 ^ buffer[offset++]) & 0x000000FF] ^ (crc32 >> 8);
00E108B4  mov         dword ptr [ebp-20h],edi  
00E108B7  inc         edi  
00E108B8  mov         eax,dword ptr ds:[03CE353Ch]  
00E108BD  mov         edx,dword ptr [ebp-20h]  
00E108C0  mov         ecx,dword ptr [ebp-28h]  
00E108C3  cmp         edx,esi  
00E108C5  jae         00E108FE  
00E108C7  movzx       edx,byte ptr [ecx+edx+8]  
00E108CC  xor         edx,ebx  
00E108CE  and         edx,0FFh  
00E108D4  cmp         edx,dword ptr [eax+4]  
00E108D7  jae         00E108FE  
00E108D9  mov         eax,dword ptr [eax+edx*4+8]  
00E108DD  shr         ebx,8  
00E108E0  xor         eax,ebx  
00E108E2  mov         ebx,eax  
            for (int i = 0; i < endBytes; i++)
00E108E4  inc         dword ptr [ebp-1Ch]  
00E108E7  mov         eax,dword ptr [ebp-1Ch]  
00E108EA  cmp         eax,dword ptr [ebp-14h]  
00E108ED  jl          00E108B4  
            }

The thing that jumped out at me first while looking at this was the 'jae' checks, which I guessed might be array bound checks! Using 'unsafe' should make this faster right? (maybe)

So I tried it out. I didn't notice an obviously significant change from that, but while trying to understand the disassembly further, I realized that the way 'slice by 8' was being implemented by JIT or whoever did the assembler here was not very optimal. It was literally reading 1 byte at a time from the array and zero extending, then xoring all the zero extended words, just to read (and XOR) all the asserted-as-little-endian bytes into the uint32 variable 'crc32?' I think no C or assembly programmer would do it that way...

                        crc32 ^= unchecked((uint)(buffptr[offset] | buffptr[offset + 1] << 8 | buffptr[offset + 2] << 16 | buffptr[offset + 3] << 24));
01AE0748  movzx       eax,byte ptr [ebx+esi]  
01AE074C  movzx       edx,byte ptr [ebx+esi+1]  
01AE0751  shl         edx,8  
01AE0754  or          eax,edx  
01AE0756  movzx       edx,byte ptr [ebx+esi+2]  
01AE075B  shl         edx,10h  
01AE075E  or          eax,edx  
01AE0760  movzx       edx,byte ptr [ebx+esi+3]  
01AE0765  shl         edx,18h  
01AE0768  or          eax,edx  
01AE076A  xor         edi,eax

So I switched that line out with crc32 ^= ((uint*)buffptr)[offset / 4]'

and it came out with something that looked much more sensible to me, assuming I introduced no bug

00DA0748  mov         eax,edi  
00DA074A  test        eax,eax  
00DA074C  jns         00DA0751  
00DA074E  add         eax,3  
00DA0751  sar         eax,2  
00DA0754  xor         esi,dword ptr [ebx+eax*4]

this seems to shave several percents off how many CPU samples I was spending on those particular lines, which were hot lines dotnet/corefx#2 and dotnet/corefx#3 for my scenario. And happily, these are the lines which are expected to be in the 'hot path' when processing nice big buffer inputs...

There was still some sorta 'crazy looking' inverse operations happening with multiplying and dividing (using right shift) by 4 or 8 to get array offsets which should be able to be canceled out by better manual assembly code... and the fact that both 'offset' and 'i' are getting incremented each time around the loop by 8 and 1 each, is a bit odd, you should only really need one loop counter....

The text was updated successfully, but these errors were encountered:

danmoseley · 2018-08-11T09:40:15Z

@jkotas what is our bar for introducing unsafe?

It may be worth trying SIMD here. There are instructions for CRC

filipnavara · 2018-08-11T11:43:18Z

Please don't bake little endian assumptions into code unconditionally. The code may be reused in Mono, which runs on big endian platforms. It's ok to offer an optimized version and guard it by BitConverter.IsLittleEndian, which is treated as intrinsic by the runtime.

jkotas · 2018-08-11T13:17:14Z

https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/Crc32Helper.Managed.cs is dead code in CoreFX. It was used in the netfx implementation of System.IO.Compression that we are not building in CoreFX anymore. The thing to do here is to cleanup netfx specific stuff from https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System.IO.Compression.csproj.

It may be worth trying SIMD here. There are instructions for CRC

The native implementation of CRC32 that corefx is using has SIMD already: https://github.com/dotnet/corefx/blob/bffef76f6af208e2042a2f27bc081ee908bb390b/src/Native/Windows/clrcompression/zlib-intel/crc32.c#L473

filipnavara · 2018-08-11T13:37:24Z

@jkotas A bit off-topic, but what is the current status of WASM targetting with CoreFX + CoreRT? Someone was asking for WASM-compatible implementation of DeflateStream and that could still be a good use for the managed code...

jkotas · 2018-08-11T13:41:39Z

It is being worked on as a side-project: https://github.com/dotnet/corefx/issues/26107. The WASM implementation should be able to use the stock C implementation just fine. I do not think there is a strong reason to introduce managed implementation just for the WASM.

danmoseley · 2018-08-13T16:55:28Z

There may be other cleanup of project file content stemming from eade3b3 (which removed some netfx configurations). For example src\System.Net.Http\ref\System.Net.Http.csproj has '$(TargetGroup)' == 'netfx' although this commit removed the netfx configuration from src/System.Net.Http/ref/Configurations.props.

@TimLovellSmith do you have any interest in cleaning these up? No particular reason why you should,...

TimLovellSmith · 2018-08-13T19:00:17Z

Well, I can only think of one particular reason - I would like there to be some fruit from this discussion. E.g. making a non-existing performance issue look non-existent. I will need to get to grips with the rest of the contribution process, but can give it a go.

TimLovellSmith · 2018-08-15T19:44:24Z

So... sanity checks wanted!
I found conditions like '$(TargetGroup)' == 'netfx' in a bunch of projects.

These ones look still intended to be needed:

CoreFx.Private.TestUtilities.csproj
Microsoft.XmlSerializer.Generator.Tests.csproj
System.Diagnostics.DiagnosticSource.csproj
System.Diagnostics.DiagnosticSource.Tests.csproj
System.IO.Packaging.csproj
System.Numerics.Vectors.Tests.csproj
System.Runtime.Tests.csproj

Whereas these next other ones, seem to be the prime targets for cleanup - it looks like 'netfx' configuration is no longer a thing for these, and I should clean up the project accordingly:

System.Data.Common.csproj
System.IO.Compression.csproj
System.Diagnostics.StackTrace.csproj
System.Net.Http.csproj
System.Net.Sockets.csproj
System.Runtime.InteropServices.RuntimeInformation.csproj
System.Runtime.Serialization.Primitives.csproj
System.Runtime.Serialization.Xml.csproj
System.Security.Cryptography.Algorithms.csproj
System.Security.SecureString.csproj
System.Threading.Overlapped.csproj
System.Xml.XPath.XDocument.csproj

and perhaps even

System.Diagnostics.Debug.Tests.csproj

By the way I also found 'netfx' configuration exists in the Configuration.props for various other projects not mentioned above, even though they don't directly test for TargetGroup 'netfx' in their project file. I haven't analyzed why, so let me know if that seems surprising...

danmoseley changed the title ~~Should I optimize code of managed Crc32Helper?~~ Remove netfx-specific code from projects that no longer build in netfx configurations Aug 13, 2018

ViktorHofer closed this as completed in dotnet/corefx#31818 Aug 26, 2018

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 3.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove netfx-specific code from projects that no longer build in netfx configurations #27114

Remove netfx-specific code from projects that no longer build in netfx configurations #27114

TimLovellSmith commented Aug 11, 2018

danmoseley commented Aug 11, 2018

filipnavara commented Aug 11, 2018

jkotas commented Aug 11, 2018 •

edited

Loading

filipnavara commented Aug 11, 2018

jkotas commented Aug 11, 2018

danmoseley commented Aug 13, 2018

TimLovellSmith commented Aug 13, 2018

TimLovellSmith commented Aug 15, 2018

Remove netfx-specific code from projects that no longer build in netfx configurations #27114

Remove netfx-specific code from projects that no longer build in netfx configurations #27114

Comments

TimLovellSmith commented Aug 11, 2018

danmoseley commented Aug 11, 2018

filipnavara commented Aug 11, 2018

jkotas commented Aug 11, 2018 • edited Loading

filipnavara commented Aug 11, 2018

jkotas commented Aug 11, 2018

danmoseley commented Aug 13, 2018

TimLovellSmith commented Aug 13, 2018

TimLovellSmith commented Aug 15, 2018

jkotas commented Aug 11, 2018 •

edited

Loading