Rewrite csc2.exe/vbc2.exe in C# and merge with csc.exe/vbc.exe #137

Closed
agocke opened this issue Jan 29, 2015 · 9 comments
Labels
Area-Compilers Concept-Portability The issue deals with portable code (portable libraries, etc.). Feature Request Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented Verified

Comments

@agocke
Member

agocke commented Jan 29, 2015

As it says in the title, get rid of all native code and make using the compiler server a switch on csc/vbc.

@VSadov
Member

VSadov commented Jan 29, 2015

Getting rid of native code would be nice!!

@mikedn

mikedn commented Feb 4, 2015

Is the compiler server actually useful? I did some testing on a large solution (>100 projects, >500,000 lines) and it seems to be slower than the non-server approach.

By non-server approach I mean an IL-merged and NGENed Roslyn compiler: a single executable that depends only on .NET Framework assemblies and nothing else.

The server version takes 57 seconds to build the solution while the non-server version takes only 52 seconds. The native version of the compiler takes 43 seconds to complete the build.

All tests have been run twice and the time of the second run has been taken. That means that JIT time should not impact the server version result because the server was already running when I started the second build. All tests have been run using the same msbuild command: msbuild x.sln /t:build /m on a previously cleaned solution.
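The measurement procedure described above (run the same build command on a cleaned solution twice and keep only the second timing, so that the compiler server is already running) can be scripted. The sketch below is illustrative only and is not the method actually used in this thread; the helper name `time_warm_build` and the `msbuild x.sln /t:clean` step are assumptions.

```python
import subprocess
import time
from typing import Optional, Sequence


def time_warm_build(build_cmd: Sequence[str],
                    clean_cmd: Optional[Sequence[str]] = None,
                    runs: int = 2) -> float:
    """Run build_cmd `runs` times and return the wall-clock seconds of the last run.

    Earlier runs are discarded: they only serve to warm up the JIT, the disk
    cache, and (if used) the compiler server process. If clean_cmd is given,
    it runs before every build so each timed build starts from a clean tree.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    elapsed = 0.0
    for _ in range(runs):
        if clean_cmd is not None:
            subprocess.run(list(clean_cmd), check=True)  # cleaning is not timed
        start = time.perf_counter()
        subprocess.run(list(build_cmd), check=True)  # raises if the build fails
        elapsed = time.perf_counter() - start
    return elapsed


# Hypothetical usage; "x.sln" is the placeholder name used in the thread and
# the /t:clean step is an assumption about how the solution was cleaned:
#   time_warm_build(["msbuild", "x.sln", "/t:build", "/m"],
#                   clean_cmd=["msbuild", "x.sln", "/t:clean"])
```

Keeping only the last run mirrors the reasoning in the comment: by then the server process (if any) is already up, so its startup and JIT cost is excluded from the comparison.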

@pharring
Contributor

pharring commented Feb 4, 2015

The compiler server (VBCSCompiler.exe) helps in two areas:

  1. Startup time (if the server is already running)
  2. Caching of metadata references across projects

I'm surprised by your server versus non-server results. That's contrary to all the performance data we have. Was the non-server version running as 32-bit or 64-bit? (The server runs 64-bit if available.) How many cores?

Is the IL-merging step necessary to get that kind of speed-up? If so, what's your explanation? More inlining opportunities? De-virtualization? Better code-gen for explicit generic instantiations?

@jmarolf
Contributor

jmarolf commented Feb 4, 2015

I would love to get to the bottom of this with a few more questions:

  1. What solution did you use for testing? (Is there any way we could look at this solution for our own testing?) Alternatively, what are your numbers for compiling roslyn itself (>100 projects, > 1,000,000 lines)?
  2. What were the exact, reproducible steps you used to create the non-server version?
  3. What hardware was this test run on? We run tests on 2-core, 4-core, and 6-core machines. Where was this machine in that spectrum?

@mikedn

mikedn commented Feb 5, 2015

  • The tests were run on Win8.1 x86, so everything is 32-bit.
  • The CPU is a quad-core Haswell i5 at 3.10 GHz.
  • The machine has 8 GB of RAM but, of course, the 32-bit OS only uses around 3.5 GB. Max private bytes for the compiler server was around 450 MB, and only for a short time.
  • The OS and compilers are on an SSD, but the solution is on an HDD.
  • The solution is a proprietary application, so I cannot provide it for testing. It has 133 projects, all C#, most of them class libraries. Most of the code is concentrated in ~10 projects; around half of the projects have fewer than 5,000 lines of code. The 500,000 line count comes from the VS code metrics tool, and that tool is very aggressive at removing non-code. It also excludes auto-generated code (web references, for example), and we have quite a bit of it. The raw line count is significantly higher, something like 800,000.
  • The main reason I merged the assemblies is that I can't ngen them individually: ngen errors out with "strong name validation failed". A side effect of the merging is that the strong name is discarded and ngen stops complaining.
  • I can only guess that merging also decreases the startup time because there are fewer assembly binds that need to be performed.
  • The ILMerge command was ilmerge /lib:. /target:exe /closed /v4 /internalize /out:csc6.exe csc.exe Microsoft.CodeAnalysis.CSharp.Desktop.dll Microsoft.CodeAnalysis.CSharp.dll Microsoft.CodeAnalysis.Desktop.dll Microsoft.CodeAnalysis.dll System.Collections.Immutable.dll System.Reflection.Metadata.dll. The resulting csc6.exe was simply ngened with ngen install csc6.exe.
  • I did a test with a merged and ngened VBCSCompiler.exe. It's as fast as my non-server csc6.
  • I can't build Roslyn.sln due to a missing project. Roslyn2013.sln builds in 64 seconds on the same machine, with its own compiler toolset. As far as I can tell, my merged compiler can't be used with the Roslyn solution because of analyzers: they probably fail to load as a side effect of merging, and without analyzers the performance numbers would be skewed.

I may try to re-run the tests on a 64-bit OS on the same hardware, but that will have to wait until the weekend. It appears to me that the last two points more or less settle the issue.

Even if merging happens to offer some performance benefits in some scenarios, it's not an option because it breaks analyzers. If ngen offers some performance benefits, then there's nothing stopping the server from taking advantage of it (except perhaps the huge size of the native images: the merged ones are ~33 megabytes, and non-merged ones would probably be even larger).

@pharring
Contributor

pharring commented Feb 5, 2015

Thanks for the details, @mikedn

To be honest, we haven't done any performance testing on an x86 OS in a long while. However, the Roslyn compiler is processor neutral so the advantages of x86 over x64 (smaller data structures; "classic" JIT versus RyuJIT) should apply equally to both server and non-server scenarios. We have performed a comparison of x86 (running in Wow64) versus x64 on a 64-bit OS and shown that x86 is faster (~10% on large solutions such as yours) but, for the compiler server, where we keep metadata references in memory long-term, we want the address space benefits that x64 gives us.

When installed via Visual Studio, or the MSBuild tools installer, we do ngen the compiler pieces. We also do IBC training to get a small boost. Importantly, we also use "partial ngen" to cut down on the final native image size. In any case, as long as you're always comparing "ngen versus ngen" or "non-ngen versus non-ngen", then the differences should be slight.

The signing issue is a good point. You could probably modify the build to sign with your own private key (instead of delay-signing with the Roslyn public key).

I guess analyzers fail to load in the ILMerge version because the analyzers reference compiler assemblies which won't be present.

Anyway, I look forward to seeing further comparisons and I'll do my own sanity checking here.

@agocke
Member Author

agocke commented Feb 5, 2015

One of the early things that we observed was that address space utilization was a problem in the server on 32-bit machines and it would sometimes OOM and fall back to in-proc compilation. I would probably want to confirm that's not happening.

@mikedn

mikedn commented Feb 7, 2015

Well, I reran the tests on a 64-bit OS: the latest Win10 preview. I'm not sure what version of .NET Framework it has, but it has to be a 4.6 preview version because it contains RyuJIT. Same hardware and same solution, but for the server test I used the binaries that ship with VS2015 CTP5 (I didn't find any ngen images for them, but yes, they do contain IBC data).

Surprisingly or not, the x64 results are exactly the opposite. The server version is faster, it builds that solution in ~50s. My merged/ngened compiler takes around 63s.
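To put the x86 and x64 runs on a common scale, each timing can be expressed as a relative change against a baseline. The helper below is only illustrative arithmetic on the timings quoted in this thread (57 s server vs. 52 s merged/ngened vs. 43 s native on the x86 OS; ~50 s server vs. ~63 s merged/ngened on x64); `relative_change` is a hypothetical name.

```python
def relative_change(time_s: float, baseline_s: float) -> float:
    """Fractional change of time_s relative to baseline_s (positive means slower)."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return time_s / baseline_s - 1.0


# Timings in seconds, as quoted in this thread: (label, time, baseline).
comparisons = [
    ("x86 OS: server vs. merged/ngened csc6", 57, 52),
    ("x86 OS: merged/ngened csc6 vs. native compiler", 52, 43),
    ("x64 OS: merged/ngened csc6 vs. server", 63, 50),
]

for label, t, base in comparisons:
    print(f"{label}: {relative_change(t, base):+.1%}")
```

On these numbers the server was about 10% slower than the merged/ngened compiler on the x86 OS, whereas on x64 the merged/ngened compiler was about 26% slower than the server.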

The strange thing is that I tried to build the solution using the x86 tools (Wow64) and the results are pretty much identical to the x64 results. I was expecting results similar to the ones I got on the x86 OS, or at least some variation in the numbers due to the significant difference in memory usage (see the end of my post).

And before you ask, yes, I double-checked and triple-checked. I kept an eye on Task Manager to make sure that the right compiler was used: 32/64/no-server/server. I reran the tests on the x86 OS and got the same results as 3 days ago.

Well, I have no idea what's going on with the x86 OS, but I suppose it can safely be discarded as a weird case.

"One of the early things that we observed was that address space utilization was a problem in the server on 32-bit machines and it would sometimes OOM and fall back to in-proc compilation. I would probably want to confirm that's not happening."

During my x86 tests? As far as I could tell, that didn't happen; given the CPU and memory usage, the server was doing all the work. On x86 the server peaked at around 450 megabytes, and I got the same value under Wow64. The x64 server peaked at around 1 gigabyte.

@gafter gafter added the Concept-Portability The issue deals with portable code (portable libraries, etc.). label Feb 25, 2015
@theoy theoy added this to the 1.0 (stable) milestone Mar 5, 2015
@agocke
Member Author

agocke commented Mar 26, 2015

Closed by 2f77a18

@agocke agocke closed this as completed Mar 26, 2015
@gafter gafter added Resolution-Fixed The bug has been fixed and/or the requested behavior has been implemented and removed 3 - Working labels Apr 19, 2015
@agocke agocke removed their assignment Apr 27, 2015
@gafter gafter added the Verified label Jun 2, 2015
@gafter gafter self-assigned this Jun 2, 2015
agocke added a commit to agocke/roslyn that referenced this issue Aug 5, 2015
@jaredpar jaredpar removed this from the 1.0 (stable) milestone Nov 23, 2015
8 participants