Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove netfx-specific code from projects that no longer build in netfx configurations #27114

Closed
TimLovellSmith opened this issue Aug 11, 2018 · 8 comments · Fixed by dotnet/corefx#31818
Labels
area-System.IO.Compression enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@TimLovellSmith
Copy link
Contributor

TLDR; basically I think there are a bunch of small tweaks to the code which could make this faster, along with the 'bigger' tweaks (risks) of using unsafe code, and really baking little-endian assumptions into the implementation, which I am thinking of doing a pull request for - high level QN: does it sound useful?

I saw a thread today talking about System.Compression.Crc32Helper using relatively a lot of CPU. Idly I wondered how this could be possible, so I copied the code from this repo into a project, came up with a dumb synthetic benchmark where I did millions of CRC32 computation on an array of length 12 for a few seconds, and looked at what the VS 2017 profiler (for .net framework, AnyCPU, release mode) told me the line costs where.

What I found was that the hottest line was the inner statement of the latter loop:

             for (int i = 0; i < endBytes; i++)
            {
                crc32 = s_crcTable_0[(crc32 ^ buffer[offset++]) & 0x000000FF] ^ (crc32 >> 8);
            }

Which for me was compiled (JITted?) as native code:

           for (int i = 0; i < endBytes; i++)
00E108A8  cmp         dword ptr [ebp-14h],0  
00E108AC  jle         00E108EF  
00E108AE  mov         eax,dword ptr [ebp-28h]  
00E108B1  mov         esi,dword ptr [eax+4]  
            {
                crc32 = s_crcTable_0[(crc32 ^ buffer[offset++]) & 0x000000FF] ^ (crc32 >> 8);
00E108B4  mov         dword ptr [ebp-20h],edi  
00E108B7  inc         edi  
00E108B8  mov         eax,dword ptr ds:[03CE353Ch]  
00E108BD  mov         edx,dword ptr [ebp-20h]  
00E108C0  mov         ecx,dword ptr [ebp-28h]  
00E108C3  cmp         edx,esi  
00E108C5  jae         00E108FE  
00E108C7  movzx       edx,byte ptr [ecx+edx+8]  
00E108CC  xor         edx,ebx  
00E108CE  and         edx,0FFh  
00E108D4  cmp         edx,dword ptr [eax+4]  
00E108D7  jae         00E108FE  
00E108D9  mov         eax,dword ptr [eax+edx*4+8]  
00E108DD  shr         ebx,8  
00E108E0  xor         eax,ebx  
00E108E2  mov         ebx,eax  
            for (int i = 0; i < endBytes; i++)
00E108E4  inc         dword ptr [ebp-1Ch]  
00E108E7  mov         eax,dword ptr [ebp-1Ch]  
00E108EA  cmp         eax,dword ptr [ebp-14h]  
00E108ED  jl          00E108B4  
            }

The thing that jumped out at me first while looking at this was the 'jae' checks, which I guessed might be array bound checks! Using 'unsafe' should make this faster right? (maybe)

So I tried it out. I didn't notice an obviously significant change from that, but while trying to understand the disassembly further, I realized that the way 'slice by 8' was being implemented by JIT or whoever did the assembler here was not very optimal. It was literally reading 1 byte at a time from the array and zero extending, then xoring all the zero extended words, just to read (and XOR) all the asserted-as-little-endian bytes into the uint32 variable 'crc32?' I think no C or assembly programmer would do it that way...

                        crc32 ^= unchecked((uint)(buffptr[offset] | buffptr[offset + 1] << 8 | buffptr[offset + 2] << 16 | buffptr[offset + 3] << 24));
01AE0748  movzx       eax,byte ptr [ebx+esi]  
01AE074C  movzx       edx,byte ptr [ebx+esi+1]  
01AE0751  shl         edx,8  
01AE0754  or          eax,edx  
01AE0756  movzx       edx,byte ptr [ebx+esi+2]  
01AE075B  shl         edx,10h  
01AE075E  or          eax,edx  
01AE0760  movzx       edx,byte ptr [ebx+esi+3]  
01AE0765  shl         edx,18h  
01AE0768  or          eax,edx  
01AE076A  xor         edi,eax  

So I switched that line out with crc32 ^= ((uint*)buffptr)[offset / 4]'

and it came out with something that looked much more sensible to me, assuming I introduced no bug

00DA0748  mov         eax,edi  
00DA074A  test        eax,eax  
00DA074C  jns         00DA0751  
00DA074E  add         eax,3  
00DA0751  sar         eax,2  
00DA0754  xor         esi,dword ptr [ebx+eax*4]  

this seems to shave several percents off how many CPU samples I was spending on those particular lines, which were hot lines dotnet/corefx#2 and dotnet/corefx#3 for my scenario. And happily, these are the lines which are expected to be in the 'hot path' when processing nice big buffer inputs...

There was still some sorta 'crazy looking' inverse operations happening with multiplying and dividing (using right shift) by 4 or 8 to get array offsets which should be able to be canceled out by better manual assembly code... and the fact that both 'offset' and 'i' are getting incremented each time around the loop by 8 and 1 each, is a bit odd, you should only really need one loop counter....

@danmoseley
Copy link
Member

@jkotas what is our bar for introducing unsafe?

It may be worth trying SIMD here. There are instructions for CRC

@filipnavara
Copy link
Member

Please don't bake little endian assumptions into code unconditionally. The code may be reused in Mono, which runs on big endian platforms. It's ok to offer an optimized version and guard it by BitConverter.IsLittleEndian, which is treated as intrinsic by the runtime.

@jkotas
Copy link
Member

jkotas commented Aug 11, 2018

https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System/IO/Compression/Crc32Helper.Managed.cs is dead code in CoreFX. It was used in the netfx implementation of System.IO.Compression that we are not building in CoreFX anymore. The thing to do here is to cleanup netfx specific stuff from https://github.com/dotnet/corefx/blob/master/src/System.IO.Compression/src/System.IO.Compression.csproj.

It may be worth trying SIMD here. There are instructions for CRC

The native implementation of CRC32 that corefx is using has SIMD already: https://github.com/dotnet/corefx/blob/bffef76f6af208e2042a2f27bc081ee908bb390b/src/Native/Windows/clrcompression/zlib-intel/crc32.c#L473

@filipnavara
Copy link
Member

@jkotas A bit off-topic, but what is the current status of WASM targetting with CoreFX + CoreRT? Someone was asking for WASM-compatible implementation of DeflateStream and that could still be a good use for the managed code...

@jkotas
Copy link
Member

jkotas commented Aug 11, 2018

It is being worked on as a side-project: https://github.com/dotnet/corefx/issues/26107. The WASM implementation should be able to use the stock C implementation just fine. I do not think there is a strong reason to introduce managed implementation just for the WASM.

@danmoseley
Copy link
Member

There may be other cleanup of project file content stemming from eade3b3 (which removed some netfx configurations). For example src\System.Net.Http\ref\System.Net.Http.csproj has '$(TargetGroup)' == 'netfx' although this commit removed the netfx configuration from src/System.Net.Http/ref/Configurations.props.

@TimLovellSmith do you have any interest in cleaning these up? No particular reason why you should,...

@danmoseley danmoseley changed the title Should I optimize code of managed Crc32Helper? Remove netfx-specific code from projects that no longer build in netfx configurations Aug 13, 2018
@TimLovellSmith
Copy link
Contributor Author

Well, I can only think of one particular reason - I would like there to be some fruit from this discussion. E.g. making a non-existing performance issue look non-existent. I will need to get to grips with the rest of the contribution process, but can give it a go.

@TimLovellSmith
Copy link
Contributor Author

So... sanity checks wanted!
I found conditions like '$(TargetGroup)' == 'netfx' in a bunch of projects.

These ones look still intended to be needed:

  • CoreFx.Private.TestUtilities.csproj
  • Microsoft.XmlSerializer.Generator.Tests.csproj
  • System.Diagnostics.DiagnosticSource.csproj
  • System.Diagnostics.DiagnosticSource.Tests.csproj
  • System.IO.Packaging.csproj
  • System.Numerics.Vectors.Tests.csproj
  • System.Runtime.Tests.csproj

Whereas these next other ones, seem to be the prime targets for cleanup - it looks like 'netfx' configuration is no longer a thing for these, and I should clean up the project accordingly:

  • System.Data.Common.csproj
  • System.IO.Compression.csproj
  • System.Diagnostics.StackTrace.csproj
  • System.Net.Http.csproj
  • System.Net.Sockets.csproj
  • System.Runtime.InteropServices.RuntimeInformation.csproj
  • System.Runtime.Serialization.Primitives.csproj
  • System.Runtime.Serialization.Xml.csproj
  • System.Security.Cryptography.Algorithms.csproj
  • System.Security.SecureString.csproj
  • System.Threading.Overlapped.csproj
  • System.Xml.XPath.XDocument.csproj

and perhaps even

  • System.Diagnostics.Debug.Tests.csproj

By the way I also found 'netfx' configuration exists in the Configuration.props for various other projects not mentioned above, even though they don't directly test for TargetGroup 'netfx' in their project file. I haven't analyzed why, so let me know if that seems surprising...

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 3.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO.Compression enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
5 participants